I am hypothetically attempting to fit an ARIMAX model with auto.arima (from the R forecast package), where
y = a given proportion observed as a daily time series
and my xregs are temperature and velocity metrics, lagged up to 60 days, in a matrix named xvars with this structure: column 1 = Temp, column 2 = Velocity, column 3 = Temp Lag 1, column 4 = Velocity Lag 1, ..., column 121 = Temp Lag 60, column 122 = Velocity Lag 60.
Two possible model specifications involving lag 60 would then be
library(forecast)
m_60 <- auto.arima(y, xreg = xvars[, 1:122])     # all lags 0 to 60 of both variables
m2_60 <- auto.arima(y, xreg = xvars[, 121:122])  # lag 60 of both variables only
m_60 is selected as an ARIMA(0,0,0), presumably because of white noise or noisy data.
m2_60 is selected as an ARIMA(5,0,0).
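For reference, here is a minimal sketch of how a matrix with this column layout could be built in base R. The simulated temp and velocity vectors, the series length of 400, and the lag_vec helper are placeholders of mine, not the real data or code.

```r
# Hypothetical stand-in data; the real Temp and Velocity series are not shown here.
set.seed(1)
n <- 400
temp <- rnorm(n)
velocity <- rnorm(n)

max_lag <- 60

# Shift a vector back by k days, padding the first k entries with NA.
lag_vec <- function(x, k) c(rep(NA, k), head(x, length(x) - k))

# For each lag 0..60, bind the Temp and Velocity columns side by side,
# which gives the interleaved layout (Temp, Velocity, Temp Lag 1, ...).
xvars <- do.call(cbind, lapply(0:max_lag, function(k) {
  block <- cbind(lag_vec(temp, k), lag_vec(velocity, k))
  suffix <- if (k == 0) "" else paste0("_Lag", k)
  colnames(block) <- paste0(c("Temp", "Velocity"), suffix)
  block
}))

dim(xvars)
colnames(xvars)[c(1, 2, 3, 4, 121, 122)]
```

The first 60 rows contain NA wherever a lag reaches back before the start of the series, so I assume those rows would need to be dropped from both y and xvars before fitting, so that every model is compared on the same observations.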
Since I still have quite a bit to read on the subject, my naive questions are:
1.) Is it inappropriate to lag exogenous regressors this far back, given that the lags may start to overlap with seasonality or other trends and factors? What is an appropriate or acceptable number of days to lag the x regressors?
2.) The m_60 specification gives a better AIC even after the penalty for the additional parameters/degrees of freedom. Can this model be trusted, given that it is lagged out so far and includes every xreg in the xvars matrix up to and including lag 60? I also want to avoid overfitting.
3.) Is there a better diagnostic or performance metric than AIC for this suite of models (e.g. ACF, PACF, MASE, etc.)?
4.) Most forum threads seem to discuss lagging the response series itself in a standard ARIMA model, so I could not find much on lagging the xregs in an ARIMAX.
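To make question 3 concrete, this is the kind of holdout comparison I have in mind, sketched on simulated data. The series, the 250/50 train/test split, and the lag of 10 in the Ljung-Box test are all arbitrary assumptions for illustration, not my actual setup.

```r
library(forecast)

# Simulated stand-in data: y depends on Temp plus AR(1) noise (an assumption).
set.seed(1)
n <- 300
x <- matrix(rnorm(n * 2), ncol = 2,
            dimnames = list(NULL, c("Temp", "Velocity")))
y <- as.numeric(0.5 * x[, "Temp"] + arima.sim(list(ar = 0.6), n = n))

# Hold out the last 50 days for out-of-sample evaluation.
train <- 1:250
test <- 251:300

fit <- auto.arima(y[train], xreg = x[train, ])

# Forecast the holdout period using the held-out regressor values.
fc <- forecast(fit, xreg = x[test, ])

# Training-set and test-set error measures, including MASE.
acc <- accuracy(fc, y[test])
print(acc)

# Rough check for leftover autocorrelation in the residuals.
print(Box.test(residuals(fit), lag = 10, type = "Ljung-Box"))
```

The same steps could be repeated for each candidate xreg subset, comparing the test-set rows of the accuracy() output rather than AIC alone.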
Thank you in advance for any help. I would also appreciate pointers to any pertinent literature on the subject.
question from:
https://stackoverflow.com/questions/65925645/arimax-model-lagged-x-regressor-variable-specification-within-model-call-and-as