# Lecture 1: Introduction to Forecasting


UCSD, January 9 2017

Allan Timmermann1

1UC San Diego

Timmermann (UCSD) Forecasting Winter, 2017 1 / 64

1 Course objectives

2 Challenges facing forecasters

3 Forecast Objectives: the Loss Function

4 Common Assumptions on Loss

5 Specific Types of Loss Functions

6 Multivariate loss

7 Does the loss function matter?

8 Informal Evaluation Methods

9 Out-of-Sample Forecast Evaluation

10 Some easy and hard to predict variables

11 Weak predictability but large economic gains


Course objectives: Develop

Skills in analyzing, modeling and working with time series data from

finance and economics

Ability to construct forecasting models and generate forecasts

formulating a class of models – using information intelligently

model selection

estimation – making best use of historical data

Develop creativity in posing forecasting questions, collecting and

using often incomplete data

which data help me build a better forecasting model?

Ability to critically evaluate and compare forecasts

reasonable (simple) benchmarks

skill or luck? Overfitting (data mining)

Compete or combine?


Ranking forecasters: Mexican inflation


Forecast situations

Forecasts are used to guide current decisions that affect the future

welfare of a decision maker (forecast user)

Predicting my grade – updating information on the likely grade as the

course progresses

Choosing between a fixed-rate mortgage (interest rate fixed for 20

years) versus a floating-rate (variable) mortgage

Depends on interest rate and inflation forecast

Political or sports outcomes – prediction markets

Investing in the stock market. How volatile will the stock market be?

Predicting Chinese property prices. Supply and demand considerations,

economic growth

Structural versus reduced-form approaches

Depends on the forecast horizon: 1 month vs 10 years


Forecasting and decisions

Credit card company deciding which transactions are potentially

fraudulent and should be denied (in real time)

requires fitting a model to past credit card transactions

binary data (zero-one)

Central Bank predicting the state of the economy – timing issues

Predicting which fund manager (if any) or asset class will outperform

Forecasting the outcome of the world cup:

http://www.goldmansachs.com/our-thinking/outlook/world-cup-sections/world-cup-book-2014-statistical-model.html


Forecasting the outcome of the world cup


Key issues

Decision maker’s actions depend on predicted future outcomes

Trade off relative costs of over- or underpredicting outcomes

Actions and forecasts are inextricably linked

good forecasts are expected to lead to good decisions

bad forecasts are expected to lead to poor decisions

Forecast is an intermediate input in a decision process, rather than an

end product of separate interest

Loss function weighs the cost of possible forecast errors – like a utility

function uses preferences to weigh different outcomes


Loss functions

Forecasts play an important role in almost all decision problems where

a decision maker’s utility or wealth is affected by his current and

future actions and depend on unknown future events

Central Banks

Forecast inflation, unemployment, GDP growth

Action: interest rate; monetary policy

Trade off cost of over- vs. under-predictions

Firms

Forecast sales

Action: production level, new product launch

Trade off inventory vs. stock-out/goodwill costs

Money managers

Forecast returns (mean, variance, density)

Action: portfolio weights/trading strategy

Trade off Risk vs. return


Ways to generate forecasts

Rule of thumb. Simple decision rule that is not optimal, but may be

robust

Judgmental/subjective forecast, e.g., expert opinion

Combine with other information/forecasts

Quantitative models

“… an estimated forecasting model provides a characterization of what

we expect in the present, conditional upon the past, from which we

infer what to expect in the future, conditional upon the present and the

past. Quite simply, we use the estimated forecasting model to

extrapolate the observed historical data.” (Frank Diebold, Elements of

Forecasting).

Combine different types of forecasts


Forecasts: key considerations

Forecasting models are simplified approximations to a complex reality

How do we make the right shortcuts?

Which methods seem to work in general or in specific situations?

Economic theory may suggest relevant predictor variables, but is silent

about functional form, dynamics of forecasting model

combine art (judgment) and science

how much can we learn from the past?


Forecast object – what are we trying to forecast?

Event outcome: predict if a certain event will happen

Will a bank or hedge fund close?

Will oil prices fall below $40/barrel in 2017?

Will Europe experience deflation in 2017?

Event timing: it is known that an event will happen, but unknown

when it will occur

When will US stocks enter a “bear” market (Dow drops by 10%)?

Time-series: forecasting future values of a continuous variable by

means of current and past data

Predicting the level of the Dow Jones Index on March 15, 2017


Forecast statement

Point forecast

Single number summarizing “best guess”. No information on how

certain or precise the point forecast is. Random shocks affect all

time-series so a non-zero forecast error is to be expected even from a

very good forecast

Ex: US GDP growth for 2017 is expected to be 2.5%

Interval forecast

Lower and upper bound on outcome. Gives a range of values inside

which we expect the outcome will fall with some probability (e.g., 50%

or 95%). Confidence interval for the predicted variable. Length of

interval conveys information about forecast uncertainty.

Ex: 90% chance US GDP growth will fall between 1% and 4%

Density or probability forecast

Entire probability distribution of the future outcome

Ex: US GDP growth for 2017 is Normally distributed N(2.5,1)


Forecast horizon

The best forecasting model is likely to depend on whether we are

forecasting 1 minute, 1 day, 1 month or 1 year ahead

We refer to an h-step-ahead forecast, where h (short for “horizon”) is the number of time periods ahead that we predict

Often you hear the argument that “fundamentals matter in the long

run, psychological factors are more important in the short run”


Information set

Do we simply use past values of a series itself or do we include a

larger information set?

Suppose we wish to forecast some outcome y for period T + 1 and have historical data on this variable for t = 1, …, T. The univariate information set consists of the series itself up to time T:

I_T^{univariate} = {y_1, …, y_T}

If data on other series, z_t (typically an N × 1 vector), are available, we have a multivariate information set:

I_T^{multivariate} = {y_1, …, y_T, z_1, …, z_T}

It is often important to establish whether a forecast can benefit from

using such additional information


Loss function: notations

Outcome: Y

Forecast: f

Forecast error: e = Y − f

Observed data: Z

Loss function: L(f, Y) → R

maps the inputs f, Y to the real number line R

yields a complete ordering of forecasts

describes in relative terms how costly it is to make forecast errors


Loss Function Considerations

Choice of loss function that appropriately measures trade-offs is

important for every facet of the forecasting exercise and affects

which forecasting models are preferred

how parameters are estimated

how forecasts are evaluated and compared

Loss function reflects the economics of the decision problem

Financial analysts’ forecasts: Hong and Kubik (2003), Lim (2001)

Analysts tend to bias their earnings forecasts (walk-down effect)

Sometimes a forecast is best viewed as a signal in a strategic game

that explicitly accounts for the forecast provider’s incentives


Constructing a loss function

For profit maximizing investors the natural choice of loss is the

function relating payoffs (through trading rule) to the forecast and

realized returns

Link between loss and utility functions: both are used to minimize risk

arising from economic decisions

Loss is sometimes viewed as the negative of utility

U(f ,Y ) ≈ −L(Y , f )

The majority of forecasting papers use simple ‘off the shelf’ statistical loss functions such as Mean Squared Error (MSE)


Common Assumptions on Loss

Granger (1999) proposes three ‘required’ properties for error loss functions, L(f, y) = L(y − f) = L(e):

A1. L(0) = 0 (minimal loss of zero for a perfect forecast)

A2. L(e) ≥ 0 for all e

A3. L(e) is monotonically non-decreasing in |e|:
L(e1) ≥ L(e2) if e1 > e2 > 0
L(e1) ≥ L(e2) if e1 < e2 < 0

A1: normalization
A2: imperfect forecasts are more costly than perfect ones
A3: regularity condition – bigger forecast mistakes are (weakly) costlier than smaller mistakes (of the same sign)
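As a sanity check, the three Granger properties can be verified numerically for the loss functions introduced later in the lecture (squared error, lin-lin, linex). A minimal Python sketch; the function names and grid are illustrative, not part of the lecture:

```python
import numpy as np

def mse_loss(e):
    return e**2

def linlin_loss(e, alpha=0.25):
    return np.where(e > 0, (1 - alpha) * e, -alpha * e)

def linex_loss(e, a2=1.0):
    return np.exp(a2 * e) - a2 * e - 1

errors = np.linspace(-3.0, 3.0, 601)  # grid containing e = 0
for loss in (mse_loss, linlin_loss, linex_loss):
    L = loss(errors)
    assert abs(loss(0.0)) < 1e-12              # A1: L(0) = 0
    assert (L >= -1e-12).all()                 # A2: L(e) >= 0
    pos, neg = L[errors >= 0], L[errors <= 0]  # A3: non-decreasing in |e|
    assert (np.diff(pos) >= -1e-12).all()
    assert (np.diff(neg) <= 1e-12).all()
```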
Additional Assumptions on Loss
Symmetry:
L(y − f) = L(f − y), i.e., L(e) = L(−e)
Granger and Newbold (1986, p. 125): “.. an assumption of symmetry
about the conditional mean ... is likely to be an easy one to accept ...
an assumption of symmetry for the cost function is much less
acceptable.”
Homogeneity: for some positive function h(a) :
L(ae) = h(a)L(e)
scaling doesn’t matter
Differentiability of loss with respect to the forecast (regularity
condition)
Squared Error (MSE) Loss
L(e) = ae², a > 0

Satisfies the three Granger properties

Homogeneous, symmetric, differentiable everywhere

Convex: penalizes large forecast errors at an increasing rate

Optimal forecast:

f* = arg min_f ∫ (y − f)² p_Y(y) dy

First order condition:

f* = ∫ y p_Y(y) dy = E(y)

The optimal forecast under MSE loss is the conditional mean
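This result can be checked by brute force: for a skewed outcome distribution, the constant forecast with the smallest average squared error sits at the mean, not at the median. A small Python sketch with simulated lognormal data and a grid search over forecasts (all names and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=0.5, size=20_000)  # right-skewed outcome

# Average squared-error loss of every constant forecast f on a grid
grid = np.linspace(0.0, 4.0, 401)
avg_loss = ((y[:, None] - grid[None, :]) ** 2).mean(axis=0)
f_star = grid[avg_loss.argmin()]

# The minimizing forecast is (close to) the mean, not the smaller median
assert abs(f_star - y.mean()) < 0.02
assert y.mean() - np.median(y) > 0.05
```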


Piece-wise Linear (lin-lin) Loss

L(e) = (1 − α) e 1_{e>0} − α e 1_{e≤0}, 0 < α < 1

1_{e>0} = 1 if e > 0, otherwise 1_{e>0} = 0 (indicator variable)

Weight on positive forecast errors: (1− α)

Weight on negative forecast errors: α

Lin-lin loss satisfies the three Granger properties and is homogeneous and differentiable everywhere with respect to f, except at zero

Lin-lin loss does not penalize large errors as much as MSE loss

Mean absolute error (MAE) loss arises if α = 1/2:

L(e) = |e|


MSE vs. piece-wise Linear (lin-lin) Loss

[Figure: loss L(e) plotted against e, comparing MSE loss with lin-lin loss for α = 0.25, α = 0.5 (MAE loss), and α = 0.75]

Optimal forecast under lin-lin Loss

Expected loss under lin-lin loss:

E_Y[L(Y − f)] = (1 − α) E[(Y − f) 1_{Y>f}] − α E[(Y − f) 1_{Y≤f}]

First order condition:

f* = P_Y^{−1}(1 − α)

P_Y: CDF of Y

The optimal forecast is the (1 − α) quantile of Y

α = 1/2: the optimal forecast is the median of Y

As α increases towards one, the optimal forecast moves further into the left tail of the predicted outcome distribution
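The quantile result can be confirmed by simulation: minimizing average lin-lin loss over a grid of constant forecasts recovers the (1 − α) quantile. A Python sketch under illustrative assumptions (standard normal outcomes, α = 0.25):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(50_000)
alpha = 0.25

def linlin(e, alpha):
    return np.where(e > 0, (1 - alpha) * e, -alpha * e)

grid = np.linspace(-2.0, 2.0, 401)
avg_loss = np.array([linlin(y - f, alpha).mean() for f in grid])
f_star = grid[avg_loss.argmin()]

# f* is (close to) the (1 - alpha) quantile, here the 0.75 quantile of N(0,1)
assert abs(f_star - np.quantile(y, 1 - alpha)) < 0.05
```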


Optimal forecast of N(0,1) variable under lin-lin loss

[Figure: optimal forecast f* of an N(0,1) variable plotted against α; f* declines from the right tail to the left tail as α rises from 0 to 1]

Linex Loss

L(e) = exp(a2 e) − a2 e − 1, a2 ≠ 0

Differentiable everywhere

Asymmetric: a2 controls both the degree and direction of asymmetry

a2 > 0 : loss is approximately linear for e < 0 and approximately exponential for e > 0

Large underpredictions are very costly (f < y , so e = y − f > 0)

The converse is true when a2 < 0

MSE versus Linex Loss

[Figure: loss L(e) plotted against e, comparing MSE loss with right-skewed linex loss (a2 = 1) and left-skewed linex loss (a2 = −1)]

Linex Loss

Suppose Y ∼ N(µ_Y, σ²_Y). Then

E[L(e)] = exp(a2(µ_Y − f) + (a2²/2) σ²_Y) − a2(µ_Y − f) − 1

Optimal forecast:

f* = µ_Y + (a2/2) σ²_Y

Under linex loss, the optimal forecast depends on both the mean and variance of Y (µ_Y and σ²_Y) as well as on the curvature parameter of the loss function, a2

Optimal bias under Linex Loss for N(0,1) variable

[Figure: densities of the forecast error e under MSE loss and under linex loss with a2 = 1 and a2 = −1, illustrating the optimal bias]

Multivariate Loss Functions

Multivariate MSE loss with n errors e = (e1, ..., en)′:

MSE(A) = e′Ae

A is a positive definite n × n matrix

This satisfies the basic assumptions for a loss function

When A = I_n, covariances can be ignored and the loss function simplifies to MSE(I_n) = E[e′e] = ∑_{i=1}^n e_i², i.e., the sum of the individual mean squared errors

Does the loss function matter?
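The optimal-bias formula f* = µ_Y + (a2/2)σ²_Y can be checked by simulation: under linex loss with a2 = 1 and a standard normal outcome, the loss-minimizing constant forecast should be biased upward by about 0.5. A Python sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, a2 = 0.0, 1.0, 1.0
y = rng.normal(mu, sigma, size=200_000)

def linex(e, a2):
    return np.exp(a2 * e) - a2 * e - 1

grid = np.linspace(-1.0, 2.0, 301)
avg_loss = np.array([linex(y - f, a2).mean() for f in grid])
f_star = grid[avg_loss.argmin()]

# Theoretical optimum: f* = mu + (a2 / 2) * sigma**2 = 0.5, a deliberate bias
assert abs(f_star - (mu + 0.5 * a2 * sigma**2)) < 0.05
```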
Cenesizoglu and Timmermann (2012) compare statistical and economic measures of forecasting performance across a large set of stock return prediction models with time-varying mean and volatility

Economic performance is measured through the certainty equivalent return (CER), i.e., the risk-adjusted return

Statistical performance is measured through mean squared error (MSE)

Performance is measured relative to that of a constant expected return (prevailing mean) benchmark

Common for forecast models to produce worse mean squared error (MSE) but better return performance than the benchmark

The relation between statistical and economic measures of forecasting performance can be weak

Does loss function matter? Cenesizoglu and Timmermann

Percentage of models with worse statistical but better economic performance than prevailing mean (CT, 2012)

CER is certainty equivalent return
Sharpe is the Sharpe ratio
RAR is risk-adjusted return
RMSE is root mean squared (forecast) error

Example: Directional Trading system

Consider the decisions of a risk-neutral ‘market timer’ whose utility is linear in the return on the market portfolio (y):

U(δ(f), y) = δy

Investor’s decision rule, δ(f): go ‘long’ one unit in the risky asset if a positive return is predicted (f > 0), otherwise go short one unit:

δ(f) = +1 if f ≥ 0, −1 if f < 0

Let sign(y) = 1 if y > 0, otherwise sign(y) = 0. Payoff:

U(y, δ(f)) = (2 sign(f) − 1) y

Sign and magnitude of y and sign of f matter to trader’s utility
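A small simulation of the directional rule makes the point concrete: only the sign of the forecast enters the position, yet the rule earns a positive average payoff whenever the forecast has some directional skill. The return process and coefficient below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 10_000
x = rng.standard_normal(T)            # signal observed before taking the position
y = 0.2 * x + rng.standard_normal(T)  # realized market return

f = 0.2 * x                            # forecast of y
delta = np.where(f >= 0, 1.0, -1.0)    # long/short one unit
payoff = delta * y                     # equals (2*sign(f) - 1) * y

# The rule earns a positive average payoff despite a low forecast R^2
assert payoff.mean() > 0.05
```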


Example: Directional Trading system (cont.)

Which forecast approach is best under the directional trading rule?

Since the trader ignores information about the magnitude of the

forecast, an approach that focuses on predicting only the sign of the

excess return could make sense

Leitch and Tanner (1991) studied forecasts of T-bill futures:

Professional forecasters reported predictions with higher mean squared

error (MSE) than those from simple time-series models

Puzzling since the time-series models incorporate far less information

than the professional forecasts

When measured by their ability to generate profits or correctly forecast

the direction of future interest rate movements the professional

forecasters did better than the time-series models

Professional forecasters’ objectives are poorly approximated by MSE loss – they are closer to directional or ‘sign’ loss


Common estimates of forecasting performance

Define the forecast error e_{t+h|t} = y_{t+h} − f_{t+h|t}. Then

MSE = T⁻¹ ∑_{t=1}^T e²_{t+h|t}

RMSE = √( T⁻¹ ∑_{t=1}^T e²_{t+h|t} )

MAE = T⁻¹ ∑_{t=1}^T |e_{t+h|t}|

Directional accuracy (DA): let I_{x_{t+1}>0} = 1 if x_{t+1} > 0, otherwise I_{x_{t+1}>0} = 0. Then an estimate of DA is

DA = T⁻¹ ∑_{t=1}^T I_{y_{t+h} × f_{t+h|t} > 0}
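These estimates are straightforward to compute from a sequence of forecasts and outcomes; a minimal Python helper (the function name and the toy data are illustrative):

```python
import numpy as np

def forecast_metrics(y, f):
    """MSE, RMSE, MAE and directional accuracy for forecasts f of outcomes y."""
    e = y - f
    mse = np.mean(e**2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(e)),
        "DA": np.mean(y * f > 0),  # share of correctly predicted signs
    }

m = forecast_metrics(np.array([1.0, -2.0, 3.0]), np.array([0.5, 1.0, 2.0]))
```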


Forecast evaluation

ft+h|t : forecast of yt+h given information available at time t

Given a sequence of forecasts, f_{t+h|t}, and outcomes, y_{t+h}, t = 1, …, T, it is natural to ask if the forecast was “optimal” or obviously deficient

Questions posed by forecast evaluation are related to the

measurement of predictive accuracy

Absolute performance measures the accuracy of an individual

forecast relative to the outcome, using either an economic

(loss-based) or a statistical metric

Relative performance compares the performance of one or several

forecasts against some benchmark


Forecast evaluation (cont.)

Forecast evaluation amounts to understanding if the loss from a given

forecast is “small enough”

Informal methods – graphical plots, decompositions

Formal methods – distribution of test statistic for sample averages of

loss estimates can depend on how the forecasts were constructed, e.g.

which estimation method was used

The method (not only the model) used to construct the forecast

matters – expanding vs. rolling estimation window

Formal evaluation of an individual forecast requires testing whether

the forecast is optimal with respect to some loss function and a

specific information set

Rejection of forecast optimality suggests that the forecast can be

improved


Efficient Forecast: Definition

A forecast is efficient (optimal) if no other forecast using the available data, x_t ∈ I_t, can be used to generate a smaller expected loss

Under MSE loss:

f̂*_{t+h|t} = arg min_{f̂(x_t)} E[(y_{t+h} − f̂(x_t))²]

If we can use information in I_t to produce a more accurate forecast, then the original forecast would be suboptimal

Efficiency is conditional on the information set

weak-form forecast efficiency tests include only past forecasts and past outcomes: I_t = {y_t, y_{t−1}, …, f̂_{t|t−1}, e_{t|t−1}, …}

strong-form efficiency tests extend this to include all other variables x_t ∈ I_t

Optimality under MSE loss

First order condition for an optimal forecast under MSE loss:

E[ ∂(y_{t+h} − f_{t+h|t})² / ∂f_{t+h|t} ] = −2 E[y_{t+h} − f_{t+h|t}] = −2 E[e_{t+h|t}] = 0

Similarly, conditional on information at time t, I_t:

E[e_{t+h|t} | I_t] = 0

The expected value of the forecast error must equal zero given current information, I_t

Test E[e_{t+h|t} x_t] = 0 for all variables x_t ∈ I_t known at time t

If the forecast is optimal, no variable known at time t can predict its future forecast error e_{t+h|t}. Otherwise the forecast wouldn’t be optimal

If I can predict that my forecast will be too low, I should increase my forecast
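In practice the orthogonality condition E[e_{t+h|t} x_t] = 0 is often checked by regressing realized forecast errors on candidate predictors and testing whether the slope is zero. A sketch with simulated data and plain OLS (variable names and the simulated design are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 2_000
x = rng.standard_normal(T)                # candidate predictor known at time t
e_opt = rng.standard_normal(T)            # errors of a forecast that used x optimally
e_bad = 0.5 * x + rng.standard_normal(T)  # errors that x still predicts

def slope_tstat(e, x):
    """t-statistic on beta in the regression e = alpha + beta * x + u."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, e, rcond=None)
    resid = e - X @ beta
    s2 = resid @ resid / (len(e) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se

assert abs(slope_tstat(e_opt, x)) < 4.0  # no evidence the error is predictable
assert abs(slope_tstat(e_bad, x)) > 5.0  # x predicts the error: forecast is suboptimal
```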


Optimality properties under Squared Error Loss

1. Optimal forecasts are unbiased: the forecast error e_{t+h|t} has zero mean, both conditionally and unconditionally:

E[e_{t+h|t}] = E[e_{t+h|t} | I_t] = 0

2. h-period forecast errors e_{t+h|t} are uncorrelated with information available at the time the forecast was computed, I_t. In particular, single-period forecast errors, e_{t+1|t}, are serially uncorrelated:

E[e_{t+1|t} e_{t|t−1}] = 0

3. The variance of the forecast error e_{t+h|t} increases (weakly) in the forecast horizon h:

Var(e_{t+h+1|t}) ≥ Var(e_{t+h|t}) for all h ≥ 1

Optimality properties under Squared Error Loss (cont.)

Forecasts should be unbiased. Why? If they were biased, we could improve the forecast simply by correcting for the bias

Suppose f_{t+1|t} is biased:

y_{t+1} = 1 + f_{t+1|t} + ε_{t+1}, ε_{t+1} ∼ WN(0, σ²)

The bias-corrected forecast f*_{t+1|t} = 1 + f_{t+1|t} is more accurate than f_{t+1|t}

Forecast errors should be unpredictable:

Suppose y_{t+1} − f_{t+1|t} = e_{t+1} = 0.5 e_t + ε_{t+1}, so the one-step forecast error is serially correlated

Adding back 0.5 e_t yields a more accurate forecast: f*_{t+1|t} = f_{t+1|t} + 0.5 e_t is better than the original f_{t+1|t}

The variance of the forecast error increases in the forecast horizon

We learn more information as we get closer to the forecast “target”
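The serial-correlation example can be verified by simulation: generating errors with e_{t+1} = 0.5 e_t + ε_{t+1} and adding back 0.5 e_t leaves only the innovation, which has a smaller variance. A short Python sketch (parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
T = 5_000
eps = rng.standard_normal(T)

# Serially correlated one-step forecast errors: e_{t+1} = 0.5 * e_t + eps_{t+1}
e = np.zeros(T)
for t in range(1, T):
    e[t] = 0.5 * e[t - 1] + eps[t]

# Adding back 0.5 * e_t to the forecast leaves only the innovation eps_{t+1}
e_corrected = e[1:] - 0.5 * e[:-1]

assert np.mean(e_corrected**2) < np.mean(e[1:]**2)
```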


Informal evaluation methods (Greenbook forecasts)

Time-series graph of forecasts and outcomes {f_{t+h|t}, y_{t+h}}, t = 1, …, T

[Figure: actual and forecast annualized GDP growth (top panel) and inflation rate (bottom panel), 1965–2010]

Informal evaluation methods (Greenbook forecasts)

Scatterplots of {f_{t+h|t}, y_{t+h}}, t = 1, …, T

[Figure: actual outcomes plotted against forecasts for GDP growth and the inflation rate]

Informal evaluation methods (Greenbook Forecasts)

Plots of f_{t+h|t} − y_t against y_{t+h} − y_t: directional accuracy

[Figure: predicted changes plotted against actual changes for GDP growth and the inflation rate]

Informal evaluation methods (Greenbook forecasts)

Plot of forecast errors e_{t+h} = y_{t+h} − f_{t+h|t}

[Figure: forecast errors over time for GDP growth and the inflation rate, 1965–2010]

Informal evaluation methods

Theil (1961) suggested the following decomposition:

E[y − f]² = E[(y − Ey) − (f − Ef) + (Ey − Ef)]²
= (Ey − Ef)² + (σ_y − σ_f)² + 2 σ_y σ_f (1 − ρ)

MSE depends on

the squared bias, (Ey − Ef)²

the squared difference in standard deviations, (σ_y − σ_f)²

the correlation between the forecast and the outcome, ρ
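The decomposition is an algebraic identity, so it can be confirmed exactly on any sample as long as population-style moments (ddof = 0) are used throughout. A Python check with simulated data (the forecast process is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.standard_normal(1_000)
f = 0.3 + 0.8 * y + 0.5 * rng.standard_normal(1_000)  # biased, noisy forecast

mse = np.mean((y - f)**2)
bias2 = (y.mean() - f.mean())**2
sd_gap2 = (y.std() - f.std())**2          # np.std defaults to ddof=0, matching the identity
rho = np.corrcoef(y, f)[0, 1]
corr_term = 2 * y.std() * f.std() * (1 - rho)

# Theil's decomposition holds exactly (up to floating point)
assert abs(mse - (bias2 + sd_gap2 + corr_term)) < 1e-10
```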


Pseudo out-of-sample Forecasts

Simulated (“pseudo”) out-of-sample (OoS) forecasts seek to mimic the “real time” updating underlying most forecasts

What would a forecaster have done (historically) at a given point in

time?

Method splits data into an initial estimation sample (in-sample

period) and a subsequent evaluation sample (OoS period)

Forecasts are based on parameter estimates that use data only up to

the date when the forecast is computed

As the sample expands, the model parameters get updated, resulting

in a sequence of forecasts

Why do out-of-sample forecasting?

control for data mining – harder to “game”

feasible in real time (less “look-ahead” bias)


Pseudo out-of-sample forecasts (cont.)

Out-of-sample (OoS) forecasts impose the constraint that the parameter estimates of the forecasting model only use information available at the time the forecast was computed

Only information known at time t can be used to estimate and select the forecasting model and generate forecasts f_{t+h|t}

Many variants of OoS forecast estimation methods exist. These can be illustrated for the linear regression model

y_{t+1} = β′x_t + ε_{t+1}

f̂_{t+1|t} = β̂′_t x_t

β̂_t = ( ∑_{s=1}^t ω(s, t) x_{s−1} x′_{s−1} )⁻¹ ( ∑_{s=1}^t ω(s, t) x_{s−1} y_s )

Different methods use different weighting functions ω(s, t)
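The weighting schemes described on the following slides (expanding, rolling, discounted) can all be expressed through ω(s, t) in the estimator above. A sketch for the scalar-regressor case; the function name, simulated data, and default parameters are illustrative:

```python
import numpy as np

def weighted_beta(y, x, t, scheme="expanding", window=60, lam=0.99):
    """Estimate beta in y_s = beta * x_{s-1} + eps_s from observations s = 1..t,
    using the weighting function omega(s, t) of the chosen scheme."""
    s = np.arange(1, t + 1)
    if scheme == "expanding":
        w = np.ones(t)
    elif scheme == "rolling":
        w = (s > t - window).astype(float)
    elif scheme == "discounted":
        w = lam ** (t - s)
    else:
        raise ValueError(scheme)
    xw = w * x[:t]                      # weighted x_{s-1}, s = 1..t
    return (xw @ y[1:t + 1]) / (xw @ x[:t])

rng = np.random.default_rng(7)
T = 500
x = rng.standard_normal(T + 1)
y = np.zeros(T + 1)
y[1:] = 0.5 * x[:-1] + rng.standard_normal(T)  # true beta = 0.5

b_exp = weighted_beta(y, x, T, "expanding")
b_roll = weighted_beta(y, x, T, "rolling", window=60)
b_disc = weighted_beta(y, x, T, "discounted", lam=0.98)
```

The expanding estimate uses all t observations and is typically the most precise; the rolling and discounted estimates trade precision for robustness to parameter change.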


Expanding window

Expanding or recursive estimation windows put equal weight on all observations s = 1, …, t to estimate the parameters of the model:

ω(s, t) = 1 for 1 ≤ s ≤ t, and 0 otherwise

As time progresses, the estimation sample grows larger, I_t ⊆ I_{t+1}

If the parameters of the model do not change (“stationarity”), the expanding window approach makes efficient use of the data and leads to consistent parameter estimates

If model parameters are subject to change, the approach leads to biased forecasts

The approach works well empirically due to its use of all available data, which reduces the effect of estimation error on the forecasts

Expanding window

[Figure: timeline from 1 to T−1 showing the estimation window expanding through t, t+1, t+2, …]

Rolling window

Rolling window uses an equal-weighted kernel of the most recent ω̄ observations to estimate the parameters of the forecasting model:

ω(s, t) = 1 for t − ω̄ + 1 ≤ s ≤ t, and 0 otherwise

Only one ‘design’ parameter: ω̄ (the length of the window)

A practical way to account for slowly-moving changes in the data generating process

Does this address “breaks”?

the window is too long immediately after a break

and too short further away from it

Rolling window

[Figure: timeline showing a rolling estimation window of fixed length ω̄ ending at t and sliding forward through t+1, t+2, …, T−1]

Fixed window

Fixed window uses only the first ω̄0 observations to estimate, once and for all, the parameters of the forecasting model:

ω(s, t) = 1 for 1 ≤ s ≤ ω̄0, and 0 otherwise

This method is typically employed when the costs of estimation are very high, so re-estimating the model with new data is prohibitively expensive or impractical in real time

The method also makes analytical results easier to derive

Fixed window

[Figure: timeline showing a fixed estimation window from 1 to ω̄0, with forecasts made at t, t+1, t+2, …, T−1]

Exponentially declining weights

In the presence of model instability, it is common to discount past observations using weights that get smaller, the older the data

Exponentially declining weights take the following form:

ω(s, t) = λ^{t−s} for 1 ≤ s ≤ t, and 0 otherwise

With 0 < λ < 1, this method is sometimes called discounted least squares, as the discount factor λ puts less weight on past observations

Comparisons

Expanding estimation window: the number of observations available for estimating the model parameters increases with the sample size

the effect of estimation error gets reduced

Fixed/rolling/discounted window: parameter estimation error continues to affect the forecasts even as the sample grows large

model parameter estimates are inconsistent

Forecasts vary more under the short (fixed and rolling) estimation windows than under the expanding window

US stock index

Monthly US stock returns

Monthly inflation

US T-bill rate

US Stock market volatility

Example: Portfolio Choice under Mean-Variance Utility

T-bills with known payoff r_f vs. stocks with uncertain return r^s_{t+1} and excess return r_{t+1} = r^s_{t+1} − r_f

W_t = $1: initial wealth

ω_t: portion of the portfolio held in stocks at time t

(1 − ω_t): portion of the portfolio held in T-bills

W_{t+1}: future wealth

W_{t+1} = (1 − ω_t) r_f + ω_t (r_{t+1} + r_f) = r_f + ω_t r_{t+1}

The investor chooses ω_t to maximize mean-variance utility:

E_t[U(W_{t+1})] = E_t[W_{t+1}] − (A/2) Var_t(W_{t+1})

E_t[W_{t+1}] and Var_t(W_{t+1}): conditional mean and variance of W_{t+1}

Portfolio Choice under Mean-Variance Utility (cont.)
Suppose stock returns follow the process

r_{t+1} = µ + x_t + ε_{t+1}, x_t ∼ (0, σ²_x), ε_{t+1} ∼ (0, σ²_ε), cov(x_t, ε_{t+1}) = 0

x_t: predictable component given information at t

ε_{t+1}: unpredictable innovation (shock)

Uninformed investor’s (no information on x_t) stock holding:

ω*_t = arg max_{ω_t} { ω_t µ + r_f − (A/2) ω²_t (σ²_x + σ²_ε) } = µ / (A(σ²_x + σ²_ε))

E[U(W_{t+1}(ω*_t))] = r_f + µ² / (2A(σ²_x + σ²_ε)) = r_f + S² / (2A)

S = µ / √(σ²_x + σ²_ε): unconditional Sharpe ratio

Portfolio Choice under Mean-Variance Utility (cont.)

The informed investor knows x_t. His stock holdings are

ω*_t = (µ + x_t) / (A σ²_ε)

E_t[U(W_{t+1}(ω*_t))] = r_f + (µ + x_t)² / (2A σ²_ε)

The average (unconditional expectation) value of this is

E[E_t[U(W_{t+1}(ω*_t))]] = r_f + (µ² + σ²_x) / (2A σ²_ε)

Increase in expected utility due to knowing the predictor variable:

E[U^{inf}] − E[U^{uninf}] = σ²_x / (2A σ²_ε) = R² / (2A(1 − R²))

Plausible empirical numbers, e.g., R² = 0.005 and A = 3, give an annualized certainty equivalent return of about 1%

Lecture 2: Univariate Forecasting Models

UCSD, January 18 2017

Allan Timmermann

UC San Diego

Timmermann (UCSD) ARMA Winter, 2017 1 / 59

1 Introduction to ARMA models

2 Covariance Stationarity and Wold Representation Theorem

3 Forecasting with ARMA models

4 Estimation and Lag Selection for ARMA Models

Choice of Lag Order

5 Random walk model

6 Trend and Seasonal Components

Seasonal components

Trended Variables

Introduction: ARMA models

When building a forecasting model for an economic or financial variable, the variable’s own past time series is often the first thing that comes to mind

Many time series are persistent

The effect of past and current shocks takes time to evolve

Auto Regressive Moving Average (ARMA) models

Work hors