# Regression From a Forecasting Perspective


Zhenhao Gong, University of Connecticut

## Welcome

This course is designed to be:

1. Introductory
2. Led by interesting questions and applications
3. Less math, useful, and fun!

Most important:

Feel free to ask any questions!

Enjoy!

## Regression in time series data

Basic regression model:

$$y_t = \beta_0 + \beta_1 x_t + \varepsilon_t, \qquad \varepsilon_t \overset{iid}{\sim} (0, \sigma^2), \qquad t = 1, \dots, T,$$

where $\beta_0$, $\beta_1$, and $\sigma^2$ are called the model's parameters. The index $t$ keeps track of time.
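To make the model concrete, the following minimal Python sketch draws a sample from it. The parameter values, the uniform distribution for $x_t$, the Gaussian shape of $\varepsilon_t$, and the function name `simulate_regression` are illustrative assumptions; the model itself only requires iid disturbances with mean 0 and variance $\sigma^2$.

```python
import random


def simulate_regression(T, beta0, beta1, sigma, seed=0):
    """Draw (x_t, y_t), t = 1..T, from y_t = beta0 + beta1 * x_t + eps_t.

    Illustrative assumptions: x_t ~ Uniform(0, 10) and eps_t ~ iid N(0, sigma**2).
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    x = [rng.uniform(0, 10) for _ in range(T)]
    y = [beta0 + beta1 * xt + rng.gauss(0.0, sigma) for xt in x]
    return x, y


# Hypothetical parameter values.
x, y = simulate_regression(T=100, beta0=1.0, beta1=0.5, sigma=1.0)
print(len(x), len(y))  # 100 100
```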

## Conditional expectation

If the regression model postulated above holds, then the expected value of $y$ conditional on $x = x^*$ is

$$E(y \mid x^*) = \beta_0 + \beta_1 x^*,$$

so the regression function is the conditional expectation of $y$. In fact, as we will see later, the expectation of future $y$ conditional on the available information is a particularly good forecast: it minimizes the expected squared forecast error.
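A small simulation can illustrate why the conditional expectation makes a good forecast. This minimal sketch assumes the true parameters are known (the values are hypothetical) and uses mean squared error as the yardstick; the competitor forecast, shifted by a constant bias of 1, is a made-up alternative for comparison.

```python
import random

rng = random.Random(42)
beta0, beta1, sigma = 1.0, 0.5, 1.0  # hypothetical true parameters
x = [rng.uniform(0, 10) for _ in range(10_000)]
y = [beta0 + beta1 * xt + rng.gauss(0.0, sigma) for xt in x]


def mse(forecasts, actuals):
    """Mean squared forecast error."""
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)


cond_mean = [beta0 + beta1 * xt for xt in x]  # E(y | x) forecasts
biased = [f + 1.0 for f in cond_mean]         # hypothetical competitor, off by 1

print(mse(cond_mean, y))                   # should be close to sigma**2 = 1
print(mse(cond_mean, y) < mse(biased, y))  # True
```

The conditional-mean forecast's MSE is roughly $\sigma^2$, the noise no forecast can remove; the biased forecast adds roughly the squared bias on top of that.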

## OLS Estimation

We assume the model sketched above is true in the population and estimate the unknown parameters by solving the problem

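$$\min_{\beta_0, \beta_1} \sum_{t=1}^{T} \varepsilon_t^2 = \min_{\beta_0, \beta_1} \sum_{t=1}^{T} \left[ y_t - \beta_0 - \beta_1 x_t \right]^2,$$

that is, we minimize the sum of the squared error terms.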

We denote the set of estimated parameters by $\hat{\beta}$, and its elements by $\hat{\beta}_0$ and $\hat{\beta}_1$.
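For the one-regressor model, setting the derivatives of the sum of squares to zero gives closed-form solutions, $\hat{\beta}_1 = \sum_t (x_t - \bar{x})(y_t - \bar{y}) / \sum_t (x_t - \bar{x})^2$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$. A minimal Python sketch of these formulas; the data set and the function name `ols_simple` are made up for illustration.

```python
def ols_simple(x, y):
    """Closed-form OLS estimates (beta0_hat, beta1_hat) for y_t = beta0 + beta1 * x_t + eps_t."""
    T = len(x)
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    sxy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0_hat, b1_hat = ols_simple(x, y)
print(round(b0_hat, 4), round(b1_hat, 4))  # 0.05 1.99
```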

## Fitted values

The fitted values, or in-sample forecasts, are
$$\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 x_t, \qquad t = 1, \dots, T.$$

The corresponding residuals, or in-sample forecast errors, are
$$e_t = y_t - \hat{y}_t, \qquad t = 1, \dots, T.$$

Remark: Systematic patterns in forecast errors indicate that the forecasting model is inadequate; forecast errors from a good forecasting model must be unforecastable!
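A minimal Python sketch of fitted values and residuals, on made-up data and with illustrative function names. It also checks two properties that OLS residuals satisfy by construction when the regression includes an intercept: they sum to zero and are orthogonal to the regressor. Because these hold automatically, they say nothing about systematic patterns over time; diagnostics such as the Durbin-Watson statistic and the residual plot address that.

```python
def ols_simple(x, y):
    """Closed-form OLS estimates for y_t = beta0 + beta1 * x_t + eps_t."""
    T = len(x)
    x_bar, y_bar = sum(x) / T, sum(y) / T
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    beta1 = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) / sxx
    return y_bar - beta1 * x_bar, beta1


def fitted_and_residuals(x, y):
    b0, b1 = ols_simple(x, y)
    y_hat = [b0 + b1 * xt for xt in x]         # in-sample forecasts
    e = [yt - yh for yt, yh in zip(y, y_hat)]  # in-sample forecast errors
    return y_hat, e


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat, e = fitted_and_residuals(x, y)
print(abs(sum(e)) < 1e-9)                                # True
print(abs(sum(xt * et for xt, et in zip(x, e))) < 1e-9)  # True
```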

## In-sample vs. out-of-sample forecasts

In-sample forecasting: forecasting for an observation that was part of the data sample.

Out-of-sample forecasting: forecasting for an observation that was not part of the data sample.

Example: if you use data from 1990-2013 to fit the model and then forecast 2011-2013, those are in-sample forecasts. But if you use only 1990-2010 to fit the model and then forecast 2011-2013, those are out-of-sample forecasts.
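The split in this example can be sketched in Python. The annual series is made up (a linear trend plus an alternating ±0.5 wiggle), the time index serves as the only regressor, and the helper names are illustrative; all of these are assumptions. The in-sample forecasts for 2011-2013 come from a model fitted on data that include those years, while the out-of-sample forecasts use only 1990-2010.

```python
def ols_simple(x, y):
    """Closed-form OLS estimates for y_t = beta0 + beta1 * x_t + eps_t."""
    T = len(x)
    x_bar, y_bar = sum(x) / T, sum(y) / T
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    beta1 = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) / sxx
    return y_bar - beta1 * x_bar, beta1


years = list(range(1990, 2014))            # 1990-2013
x = [float(i) for i in range(len(years))]  # time index as the regressor
y = [2.0 + 0.3 * xt + (0.5 if i % 2 else -0.5) for i, xt in enumerate(x)]  # hypothetical data

cut = years.index(2011)           # position of the first forecast year
target = range(cut, len(years))   # positions of 2011-2013


def forecast_mse(b0, b1):
    """Mean squared error of the forecasts for 2011-2013."""
    return sum((y[i] - (b0 + b1 * x[i])) ** 2 for i in target) / len(target)


# In-sample: fit on 1990-2013, which already includes 2011-2013.
in_sample_mse = forecast_mse(*ols_simple(x, y))
# Out-of-sample: fit on 1990-2010 only, then forecast 2011-2013.
out_of_sample_mse = forecast_mse(*ols_simple(x[:cut], y[:cut]))
print(in_sample_mse, out_of_sample_mse)
```

Only the out-of-sample number mimics a real forecasting situation, because the model never saw the years it is forecasting.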

## Multiple regressors

Multiple linear regression model:

$$y_t = \beta_0 + \beta_1 x_t + \beta_2 z_t + \varepsilon_t, \qquad \varepsilon_t \overset{iid}{\sim} (0, \sigma^2), \qquad t = 1, \dots, T.$$

We estimate the coefficients by OLS. The fitted values are $\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 x_t + \hat{\beta}_2 z_t$, and the corresponding residuals are
$$e_t = y_t - \hat{y}_t, \qquad t = 1, \dots, T.$$

Remark: Each estimated coefficient gives the weight put on the corresponding variable in forming the best linear forecast of y.
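With more than one regressor, the OLS estimates solve the normal equations $X'X\hat{\beta} = X'y$, where $X$ stacks a column of ones, $x_t$, and $z_t$. The pure-Python sketch below uses a hand-written Gaussian elimination (chosen instead of a numerical library only to keep it dependency-free; the names `solve` and `ols` are illustrative). It recovers the coefficients of made-up data generated exactly, with no noise, from $\beta_0 = 1$, $\beta_1 = 2$, $\beta_2 = -0.5$.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w


def ols(y, *regressors):
    """OLS with an intercept; returns [beta0_hat, beta1_hat, ...]."""
    X = [[1.0, *row] for row in zip(*regressors)]  # design matrix with a column of ones
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical data
z = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [1.0 + 2.0 * xt - 0.5 * zt for xt, zt in zip(x, z)]
print([round(b, 6) for b in ols(y, x, z)])  # [1.0, 2.0, -0.5]
```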

## Forecasting results: EViews format

## Std. Error

Std. Error: indicates the sampling variability, and hence the reliability, of each estimated coefficient.

95% confidence interval (a large-sample approximation): $[\hat{\beta} - 1.96\,SE(\hat{\beta}),\ \hat{\beta} + 1.96\,SE(\hat{\beta})]$.

Large coefficient standard errors translate into wide confidence intervals, i.e., imprecise estimation.

t-Statistic: $t = \hat{\beta} / SE(\hat{\beta})$, which tests the hypothesis of variable irrelevance, $\beta = 0$ (if $\beta = 0$, the variable contributes nothing and can be dropped).
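For the one-regressor case these quantities have simple formulas: with $s^2 = SSR/(T-2)$, the slope's standard error is $SE(\hat{\beta}_1) = \sqrt{s^2 / \sum_t (x_t - \bar{x})^2}$, the t-statistic is $\hat{\beta}_1 / SE(\hat{\beta}_1)$, and the interval uses the large-sample critical value 1.96. A minimal Python sketch on made-up data, with an illustrative function name; in small samples, a t critical value with $T-k$ degrees of freedom would give a somewhat wider interval.

```python
import math


def slope_inference(x, y):
    """Slope estimate, its standard error, t-statistic, and approximate 95% CI."""
    T = len(x)
    x_bar, y_bar = sum(x) / T, sum(y) / T
    sxx = sum((xt - x_bar) ** 2 for xt in x)
    b1 = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yt - b0 - b1 * xt) ** 2 for xt, yt in zip(x, y))
    s2 = ssr / (T - 2)                 # k = 2 estimated coefficients
    se_b1 = math.sqrt(s2 / sxx)
    t_stat = b1 / se_b1                # tests H0: beta1 = 0
    ci = (b1 - 1.96 * se_b1, b1 + 1.96 * se_b1)
    return b1, se_b1, t_stat, ci


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, se_b1, t_stat, ci = slope_inference(x, y)
print(round(se_b1, 4), round(t_stat, 1), ci)
```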

## Prob.

Probability value (p-value): the probability of getting a value of the t-statistic at least as large in absolute value as the one actually obtained, assuming that the irrelevance hypothesis is true.

The smaller the p-value, the stronger the evidence against irrelevance.

P-values below 0.05 are viewed as very strong evidence against irrelevance.
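A two-sided p-value for a given t-statistic can be sketched with Python's standard-library `statistics.NormalDist`. This uses the standard normal distribution as a large-sample approximation; regression packages typically use the t distribution with $T-k$ degrees of freedom, which gives somewhat larger p-values in small samples. The function name is illustrative.

```python
from statistics import NormalDist


def p_value_two_sided(t_stat):
    """P(|Z| >= |t_stat|) for Z ~ N(0, 1): a large-sample approximation to the p-value."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


for t in (0.5, 1.5, 3.0):
    p = p_value_two_sided(t)
    verdict = "below 0.05" if p < 0.05 else "not below 0.05"
    print(f"t = {t}: p = {p:.4f} ({verdict})")
```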

## Sum squared residuals

Sum squared residuals: records the minimized value of the sum of squared residuals,
$$SSR = \sum_{t=1}^{T} e_t^2.$$

It serves as an input to other diagnostics that we’ll discuss shortly.
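The SSR is easy to compute for any candidate coefficients, which makes "minimized value" concrete. In this minimal sketch (made-up data, illustrative function name), nudging the OLS estimates away from their values raises the SSR.

```python
def ssr(x, y, b0, b1):
    """Sum of squared residuals for candidate coefficients (b0, b1)."""
    return sum((yt - b0 - b1 * xt) ** 2 for xt, yt in zip(x, y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Closed-form OLS estimates for these data.
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
b1_hat = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
b0_hat = y_bar - b1_hat * x_bar

best = ssr(x, y, b0_hat, b1_hat)
print(round(best, 3))  # 0.107

# Moving either coefficient away from the OLS values raises the SSR.
for d0, d1 in [(0.1, 0.0), (0.0, 0.1), (-0.1, 0.05)]:
    print(ssr(x, y, b0_hat + d0, b1_hat + d1) > best)  # True each time
```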

## Log likelihood

Log likelihood: the likelihood function is the joint density function of the data, viewed as a function of the model parameters.

Maximum likelihood estimation (MLE) chooses the parameter values that maximize the likelihood.

When the regression disturbances are normally distributed, MLE of the coefficients is equivalent to minimizing the sum of squared residuals.
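Assuming normally distributed disturbances, and with $\sigma^2$ replaced by its maximum likelihood estimate $SSR/T$, the log likelihood reduces to $-\frac{T}{2}\left[1 + \log(2\pi) + \log(SSR/T)\right]$, which is decreasing in the SSR; that is why maximizing it and minimizing the SSR pick the same coefficients. A minimal Python sketch that checks this closed form against a direct sum of normal log densities, using hypothetical residuals and illustrative function names:

```python
import math


def concentrated_loglik(e):
    """Gaussian log likelihood with sigma^2 set to its MLE, SSR / T."""
    T = len(e)
    ssr = sum(et ** 2 for et in e)
    return -T / 2 * (1 + math.log(2 * math.pi) + math.log(ssr / T))


def loglik_direct(e):
    """The same quantity, summed observation by observation from the N(0, sigma^2) density."""
    s2 = sum(et ** 2 for et in e) / len(e)
    return sum(-0.5 * math.log(2 * math.pi * s2) - et ** 2 / (2 * s2) for et in e)


e = [0.06, -0.13, 0.18, -0.21, 0.10]  # hypothetical residuals
print(abs(concentrated_loglik(e) - loglik_direct(e)) < 1e-9)  # True
# Halving every residual shrinks the SSR and raises the log likelihood.
print(concentrated_loglik([0.5 * et for et in e]) > concentrated_loglik(e))  # True
```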

## F-statistic

F-statistic: tests the hypothesis that the coefficients of all variables in the regression except the intercept are jointly zero. The formula is
$$F = \frac{(SSR_{res} - SSR)/(k-1)}{SSR/(T-k)},$$
where $SSR_{res}$ is the sum of squared residuals from a restricted regression that contains only an intercept, and $k$ is the number of right-hand-side variables, including the constant term.

The F-statistic thus examines how much the SSR increases when all the variables except the constant are dropped.

If it increases by a great deal, there’s evidence that at least one of the variables has predictive content.
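A minimal Python sketch of the formula, with illustrative function names and hypothetical numbers. The restricted, intercept-only regression forecasts every $y_t$ with the sample mean $\bar{y}$, so $SSR_{res} = \sum_t (y_t - \bar{y})^2$.

```python
def ssr_intercept_only(y):
    """SSR of the restricted regression on a constant: the fitted value is the sample mean."""
    y_bar = sum(y) / len(y)
    return sum((yt - y_bar) ** 2 for yt in y)


def f_statistic(ssr_res, ssr, T, k):
    """F = [(SSR_res - SSR) / (k - 1)] / [SSR / (T - k)]."""
    return ((ssr_res - ssr) / (k - 1)) / (ssr / (T - k))


print(f_statistic(ssr_res=100.0, ssr=40.0, T=23, k=3))  # (60 / 2) / (40 / 20) = 15.0

# Hypothetical y and its OLS residuals from a one-regressor fit (k = 2).
y = [2.1, 3.9, 6.2, 7.8, 10.1]
e = [0.06, -0.13, 0.18, -0.21, 0.10]
F = f_statistic(ssr_intercept_only(y), sum(et ** 2 for et in e), T=len(y), k=2)
print(F)
```

With a single regressor, this F equals the square of the slope's t-statistic.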

## SER

S.E. of regression (SER):
$$SER = \sqrt{s^2} = \sqrt{\frac{\sum_{t=1}^{T} e_t^2}{T-k}}.$$

$s^2$ is the sample variance of the observed residuals $e_t$ (with a degrees-of-freedom correction). It is a natural estimator of $\sigma^2$, the population variance of the unobserved disturbances $\varepsilon_t$.

$s^2$ is used to assess the goodness of fit of the model, as well as the magnitude of its forecast errors.

The larger $s^2$, the worse the model's fit and the larger the forecast errors. (Rule of thumb: SER should be no more than about 10% to 15% of $\bar{y}$.)
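A minimal Python sketch of the SER and the rule-of-thumb check; the residuals, the mean of $y$, and the function name are hypothetical.

```python
import math


def ser(e, k):
    """Standard error of the regression: sqrt(SSR / (T - k))."""
    T = len(e)
    s2 = sum(et ** 2 for et in e) / (T - k)  # degrees-of-freedom-corrected residual variance
    return math.sqrt(s2)


e = [1.0, -1.0, 2.0, -2.0]  # hypothetical residuals
print(ser(e, k=2))          # sqrt(10 / 2) = sqrt(5), about 2.236

y_bar = 20.0                          # hypothetical sample mean of y
print(ser(e, k=2) / y_bar <= 0.15)    # True: about 11% of the mean
```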

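## R-squared

R-squared ($R^2$):
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{t=1}^{T} e_t^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2},$$
where $SS_{res} = \sum_{t=1}^{T} (y_t - \hat{y}_t)^2$ is the residual sum of squares and $SS_{tot} = \sum_{t=1}^{T} (y_t - \bar{y})^2$ is the total sum of squares.

$R^2$ is the percentage of the variance of $y$ explained by the variables included in the regression.

For an OLS regression that includes a constant, R-squared must be between zero and one.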
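A minimal Python sketch of the formula. The data and the residuals from a one-regressor OLS fit are made up, and the function name is illustrative.

```python
def r_squared(y, e):
    """R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum(et ** 2 for et in e)
    ss_tot = sum((yt - y_bar) ** 2 for yt in y)
    return 1 - ss_res / ss_tot


y = [2.1, 3.9, 6.2, 7.8, 10.1]        # hypothetical data
e = [0.06, -0.13, 0.18, -0.21, 0.10]  # its OLS residuals
r2 = r_squared(y, e)
print(round(r2, 4))  # 0.9973
```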

## Adjusted R-squared

Adjusted R-squared ($\bar{R}^2$):
$$\bar{R}^2 = 1 - \frac{\frac{1}{T-k}\sum_{t=1}^{T} e_t^2}{\frac{1}{T-1}\sum_{t=1}^{T} (y_t - \bar{y})^2},$$
where $k$ is the number of right-hand-side variables, including the constant term.

It incorporates adjustments for the degrees of freedom used in fitting the model.

It is a more trustworthy goodness-of-fit measure than $R^2$ in multiple regression models.
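A minimal Python sketch comparing $\bar{R}^2$ with $R^2$ on the same made-up data ($k = 2$: a constant and one regressor); function names are illustrative. The degrees-of-freedom adjustment pulls $\bar{R}^2$ below $R^2$.

```python
def r_squared(y, e):
    """R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    return 1 - sum(et ** 2 for et in e) / sum((yt - y_bar) ** 2 for yt in y)


def adjusted_r_squared(y, e, k):
    """Adjusted R^2: both sums of squares divided by their degrees of freedom."""
    T = len(y)
    y_bar = sum(y) / T
    num = sum(et ** 2 for et in e) / (T - k)            # SSR per residual degree of freedom
    den = sum((yt - y_bar) ** 2 for yt in y) / (T - 1)  # total variation per degree of freedom
    return 1 - num / den


y = [2.1, 3.9, 6.2, 7.8, 10.1]        # hypothetical data
e = [0.06, -0.13, 0.18, -0.21, 0.10]  # its OLS residuals
print(round(adjusted_r_squared(y, e, k=2), 4))  # 0.9964
print(adjusted_r_squared(y, e, k=2) < r_squared(y, e))  # True
```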

## AIC

Akaike info criterion (AIC):
$$AIC = e^{\left(\frac{2k}{T}\right)} \frac{\sum_{t=1}^{T} e_t^2}{T}.$$

Like $s^2$, the AIC is an estimate of the out-of-sample forecast error variance, but it penalizes degrees of freedom more harshly. It is used to select among competing forecasting models: the model with the smallest AIC is preferred.
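A minimal Python sketch of the AIC formula above used for model selection. The two candidate models, their SSRs, $T$, and the function names are hypothetical: model B adds one regressor that lowers the SSR only slightly, and AIC prefers the smaller model A.

```python
import math


def aic(ssr, T, k):
    """AIC in the slide's form: exp(2k / T) * SSR / T."""
    return math.exp(2 * k / T) * ssr / T


def s2(ssr, T, k):
    """Degrees-of-freedom-corrected residual variance, for comparison."""
    return ssr / (T - k)


# Hypothetical competing models estimated on the same T = 50 observations.
T = 50
candidates = {
    "A (k=2)": aic(ssr=100.0, T=T, k=2),
    "B (k=3)": aic(ssr=99.5, T=T, k=3),  # one extra regressor, barely lower SSR
}
best = min(candidates, key=candidates.get)
print(best)  # A (k=2)
print(aic(100.0, T, 2) > s2(100.0, T, 2))  # True: AIC's penalty is harsher here
```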

## SIC/BIC

Schwarz/Bayesian information criterion (SIC/BIC):
$$SIC = T^{\left(\frac{k}{T}\right)} \frac{\sum_{t=1}^{T} e_t^2}{T}.$$

An alternative to the AIC with the same interpretation, but a still harsher degrees-of-freedom penalty.
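All three measures multiply $SSR/T$ by a penalty factor: $T/(T-k)$ for $s^2$, $e^{2k/T}$ for AIC, and $T^{k/T}$ for SIC. A minimal Python sketch comparing the factors for a hypothetical $T = 100$ and $k = 3$ (function names illustrative):

```python
import math


def penalty_factors(T, k):
    """Factors that multiply SSR / T in s^2, AIC, and SIC (slide forms)."""
    return {
        "s2": T / (T - k),
        "AIC": math.exp(2 * k / T),
        "SIC": T ** (k / T),
    }


def sic(ssr, T, k):
    """SIC in the slide's form: T**(k/T) * SSR / T."""
    return T ** (k / T) * ssr / T


factors = penalty_factors(T=100, k=3)
print({name: round(f, 3) for name, f in factors.items()})  # {'s2': 1.031, 'AIC': 1.062, 'SIC': 1.148}
```

Since $T^{k/T} = e^{k \ln T / T}$, SIC's factor exceeds AIC's whenever $\ln T > 2$, i.e., for $T \ge 8$, so in all but tiny samples SIC penalizes extra coefficients more heavily.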

## Durbin-Watson Statistic

The Durbin-Watson (DW) statistic tests for serial correlation in regression disturbances. It works within the context of the model

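$$y_t = \beta_0 + \beta_1 x_t + \beta_2 z_t + \varepsilon_t,$$
$$\varepsilon_t = \phi \varepsilon_{t-1} + v_t, \qquad v_t \overset{iid}{\sim} (0, \sigma^2), \qquad t = 1, \dots, T.$$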

The regression disturbance is serially correlated when $\phi \neq 0$ ($\varepsilon_t$ follows an AR(1) process). The hypothesis of interest is that $\phi = 0$.


The formula for the DW statistic is
$$DW = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}.$$

DW takes values in the interval [0, 4]. A value near 2 indicates no autocorrelation; a value toward 0 indicates positive autocorrelation; a value toward 4 indicates negative autocorrelation.

As a rough rule of thumb, if DW is less than 1.5, there may be cause for alarm.
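A minimal Python sketch of the DW formula. Two tiny hand-made residual series show the extremes, and simulated AR(1) disturbances (standing in for regression residuals, an assumption for illustration) show the link to $\phi$: DW is approximately $2(1 - \hat{\rho})$, where $\hat{\rho}$ is the first-order sample autocorrelation of the residuals, so it sits near 2 when $\phi = 0$ and well below 2 when $\phi$ is large and positive. Function names are illustrative.

```python
import random


def durbin_watson(e):
    """DW = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_{t=1}^T e_t^2."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den


def simulate_ar1(phi, T, seed=0):
    """Hypothetical disturbances eps_t = phi * eps_{t-1} + v_t with v_t ~ iid N(0, 1)."""
    rng = random.Random(seed)
    eps, prev = [], 0.0
    for _ in range(T):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        eps.append(prev)
    return eps


print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: toward 4, negative autocorrelation
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0: toward 0, positive autocorrelation

for phi in (0.0, 0.8):
    dw = durbin_watson(simulate_ar1(phi, T=2000))
    flag = "cause for alarm" if dw < 1.5 else "no alarm"
    print(f"phi = {phi}: DW = {dw:.2f} ({flag})")
```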

## Residual plot

Plot the actual data ($y_t$), the fitted values ($\hat{y}_t$), and the residuals ($e_t = y_t - \hat{y}_t$) in a single graph to assess the adequacy of the model.
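A minimal sketch of such a graph with matplotlib, which is an assumed choice of plotting library. Actual and fitted values share the left axis; the residuals get their own right-hand axis on the same time scale. The data, the function name `residual_plot`, and the output filename are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to a file, no display needed
import matplotlib.pyplot as plt


def residual_plot(y, y_hat):
    """Actual, fitted, and residual series in a single graph."""
    t = list(range(1, len(y) + 1))
    e = [yt - yh for yt, yh in zip(y, y_hat)]
    fig, ax = plt.subplots(figsize=(8, 4))
    h_actual, = ax.plot(t, y, marker="o", label="Actual")
    h_fitted, = ax.plot(t, y_hat, linestyle="--", label="Fitted")
    ax.set_xlabel("t")
    ax_resid = ax.twinx()  # residuals on their own scale, same time axis
    h_resid, = ax_resid.plot(t, e, color="gray", label="Residual")
    ax_resid.axhline(0.0, color="gray", linewidth=0.5)
    ax.legend(handles=[h_actual, h_fitted, h_resid], loc="upper left")
    return fig


y = [2.1, 3.9, 6.2, 7.8, 10.1]          # hypothetical actual values
y_hat = [2.04, 4.03, 6.02, 8.01, 10.0]  # hypothetical fitted values
fig = residual_plot(y, y_hat)
fig.savefig("residual_plot.png")        # hypothetical output file
```

In the resulting graph, look for runs of same-signed residuals or other systematic patterns; a good model's forecast errors should look unforecastable.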