
An Introduction to Causality

Chris Hansman

Empirical Finance: Methods and Applications Imperial College Business School

Week Two

January 18, 2021

1/95

Last Week’s Lecture: Two Parts

(1) Introduction to the conditional expectation function (CEF)
(2) Ordinary Least Squares and the CEF

2/95

Today’s Lecture: Four Parts

(1) Analyzing an experiment in R

Comparing means (t-test) and regression

(2) Causality and the potential outcomes framework
What do we mean when we say X causes Y?

(3) Linear regression, the CEF, and causality

How do we think about causality in a regression framework?

(4) Instrumental Variables (if time)

The intuition behind instrumental variables

3/95

Part 1: Analyzing an Experiment

Last year I performed an experiment for my MODES scores
Can I bribe students to get great teaching evaluations?

Two sections: morning (9:00-12:00) vs. afternoon (1:00-4:00)
Evaluation day: gave candy only to the morning students

Compared evaluations across the two: scored 1-5

4/95

Part 1: Analyzing an Experiment

Let’s define a few variables:

yi : Teaching evaluation for student i from 1-5

Di : Treatment status (candy vs. no candy)

Di = 1 if student received candy
Di = 0 otherwise

How do we see if the bribe was effective?

Are evaluations higher, on average, for students who got candy?

E[yi|Di = 1] > E[yi|Di = 0]

Equivalently:

E[yi|Di = 1]−E[yi|Di = 0] > 0

5/95

Plotting the Difference in (Conditional) Means

[Figure: average MODES score (scale 0-5) by treatment group, 0 = no candy, 1 = candy]

6/95

Estimating the Difference in (Conditional) Means

Two exercises in R

What is the difference in means between the two groups?

What is the magnitude of the t-statistic from a t-test for a difference in these means (two sample, equal variance)?

7/95

Regression Provides Simple way to Analyze an Experiment

yi = β0 + β1 Di + vi

β1 gives the difference in means

The t-statistic is equivalent to a (two sample, equal variance) t-test

8/95
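The lecture does this exercise in R; as an illustrative sketch in Python (with simulated scores, since the raw MODES data are not reproduced here), the OLS slope on the treatment dummy equals the difference in group means, and its homoskedastic t-statistic matches the two-sample equal-variance t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated evaluations: 80 control, 80 treated (hypothetical numbers)
y0 = rng.normal(3.3, 0.6, 80)   # no candy
y1 = rng.normal(4.3, 0.6, 80)   # candy
y = np.concatenate([y0, y1])
D = np.concatenate([np.zeros(80), np.ones(80)])

# OLS of y on a constant and D
X = np.column_stack([np.ones(len(y)), D])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (len(y) - 2)               # residual variance
se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

t_reg = beta[1] / se_b1
t_test = stats.ttest_ind(y1, y0, equal_var=True).statistic

print(beta[1], y1.mean() - y0.mean())  # identical: slope = difference in means
print(t_reg, t_test)                   # identical t-statistics
```

The equivalence is exact: the pooled variance of the t-test equals the regression's residual variance, and the dummy's standard error reduces to the familiar sqrt(s2 (1/n0 + 1/n1)).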

Part 2: Causality and the potential outcomes framework

Differences in conditional means often represent correlations, not causal effects

The potential outcomes framework helps us define a notion of causality, and understand the assumptions necessary for conditional expectations to reflect causal effects

Experiments allow the CEF to capture causal effects

What's really key is a certain assumption: conditional independence

9/95

Conditional Means and Causality

In economics/finance we often want more than conditional means

Interested in causal questions:

Does a change in X cause a change in Y?

How do corporate acquisitions affect the value of the acquirer?
How does a firm's capital structure impact investment?

Does corporate governance affect firm performance?

10/95

Thinking Formally About Causality

Consider the example we just studied

Compare evaluations (1-5) for two (equal sized) groups: Treated with Candy vs. Not Treated

Group      Sample Size   Evaluation   Standard Error
No Candy   80            3.32         0.065
Candy      80            4.33         0.092

The treated group provided significantly higher evaluations. Difference in conditional means:

E[yi|Di = 1] − E[yi|Di = 0] = 4.33 − 3.32 = 1.01

So does candy cause higher scores?

11/95

Any Potential Reasons Why This Might not be Causal?

What if students in the morning class would have given me higher ratings even without candy?

Maybe I teach better in the morning? Or morning students are more generous

We call this a "selection effect"

What if students in the morning respond better to candy? Perhaps they are hungrier

For both of these, would need to answer the following question:

What scores would morning students have given me without candy?

I’ll never know…

12/95

The Potential Outcomes Framework

Ideally, how would we find the impact of candy on evaluations (yi)?

Imagine we had access to two parallel universes and could observe:

The exact same student (i)

At the exact same time

In one universe they receive candy—in the other they do not

And suppose we could see the student's evaluations in both worlds

Define the variables we would like to see, for each individual i:

yi1 = evaluation with candy
yi0 = evaluation without candy

13/95

The Potential Outcomes Framework

If we could see both yi1 and yi0 the impact would be easy to find:

The causal effect or treatment effect for individual i is defined as

yi1 −yi0

Would answer our question—but we never see both yi1 and yi0!

Some people call this the "fundamental problem of causal inference"

Intuition: there are two "potential" worlds out there

The treatment variable Di decides which one we see:

yi = yi1 if Di = 1
yi = yi0 if Di = 0

14/95

The Potential Outcomes Framework

We can never see the individual treatment effect

yi1 −yi0

We are typically happy with population level alternatives

For example, the average treatment effect:

Average Treatment Effect = E[yi1 −yi0] = E[yi1]−E[yi0]

This is usually what's meant by the "effect" of x on y

We often aren't even able to see the average treatment effect

We typically only see conditional means

15/95

So What Do Differences in Conditional Means Tell You?

In the MODES example, we compared:

E[yi|Di = 1] − E[yi|Di = 0] = 1.01

(both conditional means can be estimated from the data)

Or, written in terms of potential outcomes:

⇒ E[yi1|Di = 1] − E[yi0|Di = 0] = 1.01 ≠ E[yi1] − E[yi0]

Why is this not equal to E[yi1] − E[yi0]?

E[yi1|Di = 1] − E[yi0|Di = 0]
  = E[yi1|Di = 1] − E[yi0|Di = 1]   (Average Treatment Effect for the Treated Group)
  + E[yi0|Di = 1] − E[yi0|Di = 0]   (Selection Effect)

16/95

So What Do Differences in Conditional Means Tell You?

E[yi1|Di = 1] − E[yi0|Di = 0]
  = E[yi1|Di = 1] − E[yi0|Di = 1]   (Average Treatment Effect for the Treated Group)
  + E[yi0|Di = 1] − E[yi0|Di = 0]   (Selection Effect)
  ≠ E[yi1] − E[yi0]                 (Average Treatment Effect)

So our estimate could be different from the average effect of treatment E[yi1] − E[yi0] for two reasons:

(1) The morning section might have given better reviews anyway:

E[yi0|Di = 1] − E[yi0|Di = 0] > 0   (Selection Effect)

(2) Candy matters more in the morning:

E[yi1|Di = 1] − E[yi0|Di = 1] ≠ E[yi1] − E[yi0]   (ATT ≠ Average Treatment Effect)

17/95
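The decomposition above is an algebraic identity, which a quick simulation can confirm. A sketch with hypothetical potential outcomes, where treatment is deliberately made correlated with yi0 to create a selection effect (all numbers are illustrative, not the MODES data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
y0 = rng.normal(3.0, 0.5, n)             # potential outcome without candy
y1 = y0 + 1.0                            # constant treatment effect of 1
D = (y0 + rng.normal(0, 0.5, n)) > 3.0   # treatment correlated with y0: selection

observed_diff = y1[D].mean() - y0[~D].mean()   # E[y1|D=1] - E[y0|D=0]
att = (y1[D] - y0[D]).mean()                   # ATT (= 1 by construction)
selection = y0[D].mean() - y0[~D].mean()       # selection effect (> 0 here)

print(observed_diff, att + selection)  # equal: the decomposition is an identity
```

Here the observed difference in means overstates the true effect of 1 by exactly the selection term, since units with high yi0 were more likely to be treated.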

What are the Benefits of Experiments?

Truly random experiments solve this "identification" problem. Why?

Suppose Di is chosen randomly for each individual

This means that Di is independent of (yi1, yi0) in a statistical sense:

(yi1, yi0) ⊥ Di

Intuition: potential outcomes yi1 and yi0 are unrelated to treatment

18/95

What are the Benefits of Experiments?

(yi1,yi0) ⊥ Di

Sidenote: if two random variables are independent (X ⊥ Z ):

E[X|Z] = E[X]

Hence in an experiment:

E[yi1|Di = 1] = E[yi1] and E[yi0|Di = 0] = E[yi0]

so

E[yi1|Di = 1] − E[yi0|Di = 0] = E[yi1] − E[yi0]

Both conditional means can be estimated, and their difference is the Average Treatment Effect!

19/95

What are the Benefits of Experiments?

Why does independence fix the two problems with using E[yi1|Di = 1] − E[yi0|Di = 0]?

(1) The selection effect is now 0:

E[yi0|Di = 1] − E[yi0|Di = 0] = 0   (Selection Effect)

(2) Average treatment effect for treated group now accurately measures average treatment effect in the whole sample:

E[yi1|Di = 1]−E[yi0|Di = 1] = E[yi1]−E[yi0]

Average Treatment Effect for the Treated Group

20/95

The Conditional Independence Assumption

Of course an experiment is not strictly necessary, as long as (yi1, yi0) ⊥ Di

This happens, but it is not likely in most practical applications

Slightly more reasonable is the conditional independence assumption

Let Xi be a set of control variables

Conditional Independence: (yi1,yi0) ⊥ Di|Xi

Independence holds within a group with the same characteristics Xi

E[yi1|Di = 1,Xi]−E[yi0|Di = 0,Xi] = E[yi1 −yi0|Xi]

21/95

A (silly) Example of Conditional Independence

Suppose I randomly treat (Di = 1) 75% of the morning class (with candy)

And randomly treat (Di = 1) 25% of the afternoon class

And suppose I am a much better teacher in the morning

Then

(yi1,yi0) ̸⊥ Di

Because E[yi0|Di = 1] > E[yi0|Di = 0]

22/95

23/95

A (silly) Example of Conditional Independence

Let xi = 1 for the morning class, xi = 0 for afternoon

We can estimate the means for all groups:

Afternoon, no candy:    E[yi|Di = 0, xi = 0] = 3.28
Afternoon, with candy:  E[yi|Di = 1, xi = 0] = 3.78
Morning, no candy:      E[yi|Di = 0, xi = 1] = 3.95
Morning, with candy:    E[yi|Di = 1, xi = 1] = 4.45

24/95

A (silly) Example of Conditional Independence

If we try to calculate the difference in means directly:

E[yi|Di = 1] = (1/4) E[yi|Di = 1, xi = 0] + (3/4) E[yi|Di = 1, xi = 1] = 4.2825

E[yi|Di = 0] = (3/4) E[yi|Di = 0, xi = 0] + (1/4) E[yi|Di = 0, xi = 1] = 3.4475

Our estimate is contaminated because the morning class is better:

E[yi|Di = 1] − E[yi|Di = 0] = 4.2825 − 3.4475 = 0.835

25/95

A (silly) Example of Conditional Independence

E[yi|Di = 0, xi = 0] = 3.28 and E[yi|Di = 1, xi = 0] = 3.78
E[yi|Di = 0, xi = 1] = 3.95 and E[yi|Di = 1, xi = 1] = 4.45

However, within each class treatment is random: (yi1, yi0) ⊥ Di | xi

So we may recover the average treatment effect conditional on xi:

E[yi1 − yi0|xi = 0] = ?
E[yi1 − yi0|xi = 1] = ?

26/95

A (silly) Example of Conditional Independence

E[yi|Di = 0, xi = 0] = 3.28 and E[yi|Di = 1, xi = 0] = 3.78
E[yi|Di = 0, xi = 1] = 3.95 and E[yi|Di = 1, xi = 1] = 4.45

However, within each class treatment is random: (yi1, yi0) ⊥ Di | xi

So we may recover the average treatment effect conditional on xi:

For the afternoon:

E[yi1 − yi0|xi = 0] = 3.78 − 3.28 = 0.5

For the morning:

E[yi1 − yi0|xi = 1] = 4.45 − 3.95 = 0.5

In this case (equal class sizes):

E[yi1 − yi0] = (1/2) E[yi1 − yi0|xi = 0] + (1/2) E[yi1 − yi0|xi = 1] = 0.5

27/95
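The arithmetic on the last few slides can be reproduced directly from the four group means (a Python sketch for illustration; the course's own exercises are in R):

```python
# Group means from the slides, keyed by (D, x): x = 1 morning, x = 0 afternoon
means = {
    (0, 0): 3.28, (1, 0): 3.78,   # afternoon
    (0, 1): 3.95, (1, 1): 4.45,   # morning
}

# 75% of the morning class and 25% of the afternoon class are treated,
# so among treated students 3/4 are morning (and the reverse for untreated)
naive = (0.25 * means[(1, 0)] + 0.75 * means[(1, 1)]) \
      - (0.75 * means[(0, 0)] + 0.25 * means[(0, 1)])
print(naive)  # 0.835: contaminated, the morning class is better anyway

# Within each class treatment is random, so condition on x and average
ate = 0.5 * (means[(1, 0)] - means[(0, 0)]) + 0.5 * (means[(1, 1)] - means[(0, 1)])
print(ate)    # 0.5: the average treatment effect
```

Conditioning on the class recovers the 0.5 effect because the within-class comparisons difference out the morning/afternoon level difference.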

Part 3: Causality and Regression

When does regression recover a causal effect?

Need conditional mean independence

Threats to recovering causal effects

Omitted variables and measurement error

Controlling for confounding variables

28/95

Causality and Regression

When does linear regression capture a causal effect?

Start with a simple case: constant treatment effects

Suppose yi depends only on two random vars.: Di ∈ {0, 1} and vi

yi = α + ρ Di + vi

Di is some treatment (say candy)

vi is absolutely everything else that impacts yi
(how good at R you are, how much you liked Paolo's course, etc.)

Set: E[vi] = 0

29/95

Causality and Regression

yi = α + ρ Di + vi

Can then write potential outcomes:

yi1 = α + ρ + vi
yi0 = α + vi

Because ρ is constant, individual and average treatment effects are:

yi1 − yi0 = E[yi1 − yi0] = ρ

So ρ is what we want, the effect of treatment

But suppose we don’t know ρ, and only see yi and Di

30/95

Causality and Regression

yi = α + ρ Di + vi

Suppose we regress yi on Di, and recover β1^OLS. When will β1^OLS = ρ?

Because Di is binary, we have:

β1^OLS = E[yi|Di = 1] − E[yi|Di = 0]
       = E[α + ρ + vi|Di = 1] − E[α + vi|Di = 0]
       = ρ + E[vi|Di = 1] − E[vi|Di = 0]

31/95

Causality and Regression

β1^OLS = ρ + E[vi|Di = 1] − E[vi|Di = 0]   (Selection Effect)

So when does β1^OLS = ρ?

It holds under the independence assumption (yi1, yi0) ⊥ Di

Since yi1 = α + ρ + vi and yi0 = α + vi:

(yi1, yi0) ⊥ Di ⇐⇒ vi ⊥ Di

This independence means:

E[vi|Di = 1] = E[vi|Di = 0] = E[vi]

32/95

Causality and Regression

β1^OLS = ρ + E[vi|Di = 1] − E[vi|Di = 0]   (Selection Effect)

So when does β1^OLS = ρ?

β1^OLS = ρ even under a weaker assumption than independence:

Mean Independence: E[vi|Di] = E[vi]

33/95

When will β1^OLS = ρ?

Suppose we don't know ρ:

yi = α + ρ Di + vi

Our regression coefficient captures the causal effect (β1^OLS = ρ) if:

E[vi|Di] = E[vi]

The conditional mean is the same for every Di

More intuitive: E[vi|Di] = E[vi] implies that Corr(vi, Di) = 0

34/95

What if vi and Di are Correlated?

Our regression coefficient captures the causal effect (β1^OLS = ρ) if:

E[vi|Di] = E[vi]

This implies that Corr(vi, Di) = 0

So anytime Di and vi are correlated:

β1^OLS ≠ ρ and β0^OLS ≠ α

Anytime Di is correlated with anything else unobserved that impacts yi

35/95

Causality and Regression: Continuous xi

Suppose there is a continuous xi with a causal relationship with yi:

A 1-unit increase in xi increases yi by a fixed amount β1

e.g. an hour of studying increases your final grade by β1

Tempting to write:

yi = β0 + β1 xi

But in practice other things impact yi: again call these vi

yi = β0 + β1 xi + vi

e.g. intelligence also matters for your final grade

36/95

OLS Estimator Fits a Line Through the Data

[Figure: scatter of Y against X with the fitted line β0^OLS + β1^OLS X]

37/95

Causality and Regression: Continuous xi

yi = β0 + β1 xi + vi

Regression coefficient captures the causal effect (β1^OLS = β1) if:

E[vi|xi] = E[vi]

Fails anytime Corr(xi, vi) ≠ 0

An aside: we have used similar notation for 3 different things:

1. β1: the causal effect on yi of a 1-unit change in xi
2. β1^OLS = Cov(xi, yi)/Var(xi): the population regression coefficient
3. βˆ1^OLS: the sample regression coefficient (sample analogues of Cov and Var)

38/95

The Causal Relationship Between X and Y

[Figure: the causal line β0 + β1 X in the (X, Y) plane; a single data point (xi, yi) sits at vertical distance vi from the line, since yi = β0 + β1 xi + vi]

39/95

If E[vi|xi] = E[vi] then β1^OLS = β1

[Figure: the fitted line β0^OLS + β1^OLS X coincides with the causal line β0 + β1 X]

39/95

What if vi and xi are Positively Correlated?

[Figure: observations with larger vi tend to have larger xi; points i and j shown above and below the causal line β0 + β1 X]

40/95

If Corr(vi, xi) ≠ 0 then β1^OLS ≠ β1

[Figure: the fitted line β0^OLS + β1^OLS X is tilted away from the causal line β0 + β1 X]

41/95

An Example from Economics

Consider the model for wages:

Wagesi = β0 + β1 Si + vi

Where Si is years of schooling

Are there any reasons that Si might be correlated with vi?

If so, this regression won't uncover β1

42/95

Examples from Corporate Finance

Consider the model for leverage:

Leveragei = α + β Profitabilityi + vi

Why might we have trouble recovering β?

(1) Unprofitable firms tend to have higher bankruptcy risk and should have lower leverage than more profitable firms (tradeoff theory)

⇒ corr(Profitabilityi, vi) > 0
⇒ E[vi|Profitabilityi] ≠ E[vi]

(2) Unprofitable firms have accumulated lower profits in the past and may have to use debt financing, implying higher leverage (pecking order theory)

⇒ corr(Profitabilityi, vi) < 0
⇒ E[vi|Profitabilityi] ≠ E[vi]
43/95

One reason vi and xi might be correlated?

Suppose that we know yi is generated by the following:

yi = β0 + β1 xi + γ ai + ei

Where xi and ei are uncorrelated, but Corr(ai, xi) > 0

Could think of yi as wages, xi as years of schooling, ai as ability

Suppose we see yi and xi but not ai, and have to consider the model:

yi = β0 + β1 xi + vi,  with vi = γ ai + ei

44/95

A Quick Review: Properties of Covariance

A few properties of covariance. If W, X, Z are random variables:

Cov(W, X + Z) = Cov(W, X) + Cov(W, Z)

Cov(X, X) = Var(X)

If a and b are constants:

Cov(aW, bX) = ab Cov(W, X)

Cov(a + W, X) = Cov(W, X)

Finally, remember that correlation is just the covariance scaled:

Corr(X, Z) = Cov(X, Z) / √(Var(X) Var(Z))

I’ll switch back and forth between them sometimes

45/95

Omitted Variables Bias

So if we have:

yi = β0 + β1 xi + vi

What will the regression of yi on xi give us?

Recall that the regression coefficient is β1^OLS = Cov(yi, xi)/Var(xi):

β1^OLS = Cov(yi, xi)/Var(xi)
       = Cov(β0 + β1 xi + vi, xi)/Var(xi)
       = β1 Cov(xi, xi)/Var(xi) + Cov(vi, xi)/Var(xi)
       = β1 + Cov(vi, xi)/Var(xi)

46/95

Omitted Variables Bias

So β1^OLS is biased:

β1^OLS = β1 + Cov(vi, xi)/Var(xi)   (Bias)

If vi = γ ai + ei with

Corr(ai, xi) ≠ 0 and Corr(ei, xi) = 0

we can characterize this bias in simple terms:

β1^OLS = β1 + Cov(γ ai + ei, xi)/Var(xi) = β1 + γ Cov(ai, xi)/Var(xi)   (Bias)

47/95

Omitted Variables Bias

β1^OLS = β1 + γ Cov(ai, xi)/Var(xi) = β1 + γ δ1^OLS

Where δ1^OLS is the coefficient from the regression:

ai = δ0^OLS + δ1^OLS xi + ηi^OLS

48/95

Omitted Variables Bias

Good heuristic for evaluating OLS estimates:

β1^OLS = β1 + γ δ1^OLS   (Bias = γ δ1^OLS)

γ: relationship between ai and yi

δ1^OLS: relationship between ai and xi

Might not be able to measure γ or δ1^OLS, but can often make a good guess

49/95
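The omitted-variables formula β1^OLS = β1 + γ δ1^OLS can be checked by simulation. A sketch with made-up parameters (β1 = 1, γ = 0.5; nothing here comes from the lecture's data):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
beta1, gamma = 1.0, 0.5              # true causal effect and ability loading
a = rng.normal(0, 1, n)              # ability (omitted from the regression)
x = 0.8 * a + rng.normal(0, 1, n)    # schooling, correlated with ability
e = rng.normal(0, 1, n)
y = 2.0 + beta1 * x + gamma * a + e

def slope(dep, reg):
    """Bivariate OLS slope of dep on reg."""
    return np.cov(reg, dep)[0, 1] / np.var(reg, ddof=1)

b_ols = slope(y, x)    # biased short regression of y on x
delta = slope(a, x)    # delta1: coefficient from regressing a on x

print(b_ols, beta1 + gamma * delta)  # approximately equal: the OVB formula
```

With Corr(ai, xi) > 0 and γ > 0 the bias is positive, so the short regression overstates β1, matching the schooling-and-ability heuristic on this slide.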

Impact of Schooling on Wages

Suppose wages (yi) are determined by:

yi = β0 + β1 xi + γ ai + ei

and we see years of schooling (xi) but not ability (ai)

Corr(xi, ai) > 0 and Corr(yi, ai) > 0

We estimate:

yi = β0 + β1 xi + vi

And recover:

β1^OLS = β1 + γ δ1^OLS   (Bias = γ δ1^OLS)

50/95

Impact of Schooling on Wages

β1^OLS = β1 + γ δ1^OLS   (Bias = γ δ1^OLS)

Is our estimated β1^OLS larger or smaller than β1?

menti.com

51/95

Controlling for a Confounding Variable

yi = β0 + β1 xi + γ ai + ei

Suppose we are able to observe ability

e.g. an IQ test is all that matters

For simplicity, let xi be binary:

xi = 1 if individual i has an MSc, 0 otherwise

Suppose we regress yi on xi and ai:

β1^OLS = E[yi|xi = 1, ai] − E[yi|xi = 0, ai]
       = E[β0 + β1 + γ ai + ei|xi = 1, ai] − E[β0 + γ ai + ei|xi = 0, ai]

52/95

Controlling for a Confounding Variable

β1^OLS = E[β0 + β1 + γ ai + ei|xi = 1, ai] − E[β0 + γ ai + ei|xi = 0, ai]

Canceling out terms gives:

β1^OLS = β1 + E[ei|xi = 1, ai] − E[ei|xi = 0, ai]

So our β1^OLS = β1 if the following condition holds:

E[ei|xi, ai] = E[ei|ai]

This is called Conditional Mean Independence

A slightly weaker version of our conditional independence assumption (yi1, yi0) ⊥ xi | ai

53/95

Example: Controlling for a Confounding Variable

54/95

Republican Votes and Income: South and North

55/95

Republican Votes and Income: South and North

56/95

Controlling for a Confounding Variable

Suppose we run the following regression:

repvotesi = β0 + β1 incomei + vi

What is βˆ1^ols? menti.com…

So does being rich decrease Republican votes?

Suppose we run separately in the South and North:

repvotesi = β0 + β1 incomei + vi   (South)
repvotesi = β0 + β1 incomei + vi   (North)

What is βˆ1^ols in the South?

Within regions, income is positively associated with Republican votes

57/95

Controlling for a Confounding Variable

Now suppose instead we run the following regression:

repvotesi = β0 + β1 incomei + γ southi + ei

Where southi = 1 for southern states

We estimate βˆ1^ols = 0.340

This is just the (weighted) average of the within-region slopes:

βˆ1^ols ≈ (1/2) βˆ1^ols(South) + (1/2) βˆ1^ols(North)

(the weights in general do not have to be 1/2)

58/95

Why is This? Regression Anatomy

Suppose we have the following (multivariate) regression:

yi = β0^OLS + β1^OLS xi + Zi′γ + vi

Here Zi is a potentially multidimensional set of controls

Then the OLS estimator is algebraically equal to:

β1^OLS = Cov(yi, x̃i)/Var(x̃i)

Where x̃i is the residual from a regression of xi on Zi:

x̃i = xi − (δ0^OLS + Zi′δ^OLS)

59/95

Why is This? Regression Anatomy

β1^OLS = Cov(yi, x̃i)/Var(x̃i)

Where x̃i is the residual from a regression of xi on Zi:

x̃i = xi − (δ0^OLS + Zi′δ^OLS)

The coefficient from a multiple regression is the same as that from a single regression!

After first subtracting (partialling out) the part of xi explained by Zi

60/95
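Regression anatomy (the Frisch-Waugh-Lovell result) is easy to verify numerically. A sketch with simulated data and a single control standing in for the South dummy:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
z = rng.normal(0, 1, n)                        # one control (e.g. a region variable)
x = 1.0 + 0.7 * z + rng.normal(0, 1, n)
y = 2.0 + 0.3 * x - 1.5 * z + rng.normal(0, 1, n)

def ols(X, dep):
    return np.linalg.lstsq(X, dep, rcond=None)[0]

ones = np.ones(n)
# Coefficient on x from the multivariate regression of y on (1, x, z)
b_multi = ols(np.column_stack([ones, x, z]), y)[1]

# Partial out z from x, then run a simple regression of y on the residual
Zdes = np.column_stack([ones, z])
x_til = x - Zdes @ ols(Zdes, x)
b_anatomy = np.cov(y, x_til)[0, 1] / np.var(x_til, ddof=1)

print(b_multi, b_anatomy)  # identical up to floating point
```

The equality is algebraic, not asymptotic: the residual x̃ is orthogonal (in sample) to the controls, so the simple slope on x̃ reproduces the multivariate coefficient exactly.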

Republican Votes and Income: South and North

61/95

Residualizing W.R.T. South Removes Difference in Income

62/95

OLS Finds Line of Best Fit on the “Residuals”

63/95


Measurement Error

An omitted variable is a common reason why β1^OLS ≠ β1

There are several others, including measurement error

Suppose yi is generated by:

yi = β0 + β1 xi* + ei

But we can't exactly see xi*; instead we see:

xi = xi* + ηi

where Corr(xi*, ηi) = 0

Why might this happen?

64/95

Measurement Error

yi = β0 + β1 xi* + ei

xi = xi* + ηi ⇒ xi* = xi − ηi

⇒ yi = β0 + β1 xi − β1 ηi + ei

yi = β0 + β1 xi + vi,  with vi = −β1 ηi + ei

65/95

Measurement Error

So we can only estimate:

yi = β0 + β1 xi + vi,  where vi = −β1 ηi + ei

But because of the measurement error: Corr(xi, vi) ≠ 0

Why?

And our estimates are off!

Cov(yi, xi)/Var(xi) = β1^OLS ≠ β1

66/95
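For classical measurement error the bias has a well-known form: β1^OLS ≈ β1 · Var(xi*)/(Var(xi*) + Var(ηi)), attenuation toward zero (this "reliability ratio" formula is a standard result, not stated on the slide). A simulation sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
x_star = rng.normal(0, 1, n)             # true regressor, Var = 1
eta = rng.normal(0, 1, n)                # measurement error, Var = 1
x = x_star + eta                         # what we actually observe
y = 1.0 + 2.0 * x_star + rng.normal(0, 1, n)   # true beta1 = 2

b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
reliability = np.var(x_star) / (np.var(x_star) + np.var(eta))  # about 1/2 here

print(b_ols, 2.0 * reliability)  # both roughly 1: attenuated toward zero
```

With equal signal and noise variance, half the estimated effect disappears; the noisier the measurement, the stronger the attenuation.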

Measurement Error

Measurement error in xi is really also an omitted variables problem, with ηi as the omitted variable.

What happens if we mismeasure yi ?

67/95

Part 4: An Overview of IV

Assumptions for a valid instrument

Two interpretations of an IV:
The ratio of two OLS coefficients
Two stage least squares

An example: the impact of schooling on wages

68/95

Hope remains for β

Suppose Corr(xi, vi) ≠ 0 but we want to estimate the causal effect β:

yi = β0 + β1 xi + vi

For now keep yi = wages, xi = years of schooling in mind

An instrument (or instrumental variable) can help

69/95

What makes a good instrument?

Suppose there is a variable zi that should be totally unrelated to yi For example: should the quarter you were born in impact your wage?

70/95

A sample of birth quarters

[Figure: number of observations by quarter of birth (1-4)]

71/95

What makes a good instrument?

Suppose there is a variable zi that should be totally unrelated to yi For example: should your birth quarter impact your wage?

Except for one thing

zi has a direct impact on xi

72/95

In the US birth quarter influences years of schooling

[Figure: average years of education by quarter of birth (1-4), ranging from about 12.6 to 12.9]

73/95

In the US birth quarter influences years of schooling

Many US states mandate that students begin school in the calendar year when they turn 6

School years start in mid- to late- year (e.g., September)

Many states mandate students stay in school until they turn 16

74/95

In the US birth quarter influences years of schooling

Consider two students

(1) One born in February (Q1)

(2) One born in November (Q4)

Q1 student: starting school age ≈ 6.5

Q4 student: starting school age < 6

There is then variation in schooling completed when each turns 16 (Q4 > Q1)

75/95

Intuition behind instrumental variables

Suppose we really believe that zi should have no impact on yi except by changing xi

e.g. no possible other way birth quarter could influence wages

Then the impact of zi on yi should tell us something about the effect we are looking for

Any impact of birth quarter on wages must be because it raises education

76/95

Birth quarter impacts wages

[Figure: average yearly wage in dollars by quarter of birth (1-4), ranging from about $22,500 to $23,000]

77/95

Intuition behind instrumental variables

Being born in 4th quart. vs. 1st quart. increases yearly wages by: ≈ $300

If this all happens because quarter of birth increases education, how do we recover the impact of education on wages?

Being born in 4th quart. vs. 1st quart. increases education by: ≈ 0.2 years

So each year of schooling increases wages by:

β^iv = [impact of birth quart. (zi) on wages (yi)] / [impact of birth quart. (zi) on education (xi)] = $300 / 0.2 = $1,500

78/95

Formalizing the Instrumental Variables Assumptions

yi = β0 + β1 xi + vi

Our informal assumption: zi should change xi, but have absolutely no other impact on yi. Formally:

1. Cov[zi, xi] ≠ 0 (Instrument Relevance)
   Intuition: zi must change xi

2. Cov[zi, vi] = 0 (Exclusion Restriction)
   Intuition: zi has absolutely no other impact on yi

Recall that vi is everything else outside of xi that influences yi

79/95

Hope remains for β

1. Instrument Relevance: Cov[zi, xi] ≠ 0
2. Exclusion Restriction: Cov[zi, vi] = 0

Under these assumptions, we may consistently estimate β^IV = β:

β^IV = Cov(yi, zi)/Cov(xi, zi) = β

This ratio should look familiar from our example…

And Cov(yi, zi) and Cov(xi, zi) can be estimated from the data!

80/95

Getting βIV in practice

An alternative way of writing this:

β^IV = Cov(yi, zi)/Cov(xi, zi) = [Cov(yi, zi)/Var(zi)] / [Cov(xi, zi)/Var(zi)]

The numerator is the coefficient from a regression of yi on zi; the denominator is the coefficient from a regression of xi on zi

81/95

Getting βIV in practice

This means that if we run two regressions:

1. "First stage" impact of zi on xi:

xi = α1 + φ zi + ηi

2. "Reduced form" impact of zi on yi:

yi = α2 + ρ zi + ui

Then we can write:

β^IV = [Cov(yi, zi)/Var(zi)] / [Cov(xi, zi)/Var(zi)] = ρ^OLS/φ^OLS

(the reduced-form impact of zi on yi divided by the first-stage impact of zi on xi)

82/95

Getting β^IV in practice: Two stage least squares

A more common way of estimating β^IV:

1. Estimate φ^OLS in the first stage:

xi = α1 + φ zi + ηi

2. Predict the part of xi explained by zi:

x̂i = α1^OLS + φ^OLS zi

3. Regress yi on the predicted x̂i in a second stage:

yi = α2 + β x̂i + ui

Note that:

β^2ndStage = Cov(yi, x̂i)/Var(x̂i) = β

83/95

Getting βIV in practice: Two stage least squares

β^2ndStage = Cov(yi, x̂i)/Var(x̂i)

= Cov(yi, α1^OLS + φ^OLS zi)/Var(α1^OLS + φ^OLS zi)

= φ^OLS Cov(yi, zi) / (φ^OLS φ^OLS Var(zi))

= ρ^OLS/φ^OLS

= β^IV = β

84/95
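The three ways of computing the IV estimate (covariance ratio, reduced form over first stage, and 2SLS) are algebraically identical, which a simulation can confirm. A sketch with made-up coefficients loosely mimicking the schooling example (binary instrument, endogenous regressor):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
z = (rng.random(n) < 0.25).astype(float)   # e.g. born in Q1 (illustrative)
v = rng.normal(0, 1, n)                    # unobservables, built into x below
x = 12.0 + 0.2 * z + 0.8 * v + rng.normal(0, 1, n)   # first stage
y = 5.0 + 1.5 * x + v                      # true beta = 1.5; OLS is biased up

def slope(dep, reg):
    return np.cov(reg, dep)[0, 1] / np.var(reg, ddof=1)

b_iv = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]   # Cov(y,z)/Cov(x,z)
b_ratio = slope(y, z) / slope(x, z)              # reduced form / first stage

# Two stage least squares
phi0, phi1 = np.linalg.lstsq(np.column_stack([np.ones(n), z]), x, rcond=None)[0]
x_hat = phi0 + phi1 * z
b_2sls = np.cov(y, x_hat)[0, 1] / np.var(x_hat, ddof=1)

print(b_iv, b_ratio, b_2sls)  # all three identical; near 1.5 in large samples
```

The three estimators agree exactly in any sample (the Var(zi) and φ^OLS terms cancel); only their common sampling error around the true β shrinks with n.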

Two stage least squares works with more Xs and Zs

The same approach works with multiple X′s, Z′s and with control variables:

Yi = α + Xi′β + Wi′θ + vi

Xi = [xi1, …, xiM]′ with Cov(xim, vi) ≠ 0 ("endogenous variables")
Wi = [wi1, …, wiL]′ (control variables)
Zi = [zi1, …, ziK]′ with E[zi vi] = 0 (instruments)

Note that K must be at least as large as M:

At least as many instruments as endogenous variables

85/95

Two stage least squares works with multiple Xs and Zs

For each xim, run the first stage regression:

xim = δ + Zi′γ + Wi′φ + εi

Generate predicted values:

x̂im = δ^OLS + Zi′γ^OLS + Wi′φ^OLS

Run the second stage regression:

Yi = α + X̂i′β + Wi′θ + vi

Where X̂i = [x̂i1, …, x̂iM]′

β^2ndStage = β

86/95

Back to Example IV: Education and Wages

Adapted from Angrist and Krueger, 1991

Does an additional year of education lead to higher wages?

yi =β0+β1xi+vi

Are there any concerns about OVB here?

What are the requirements for a valid instrument zi ?

87/95

Example IV: Education and Wages

Angrist and Krueger use the quarter of birth as an instrument for education

Many US states mandate that students begin school in the calendar year when they turn 6

School years start in mid- to late- year (e.g., September)

Many states mandate students stay in school until they turn 16

88/95

Example IV: Education and Wages

Consider two students

(1) One born in February (Q1)

(2) One born in November (Q4)

Q1 student: starting school age ≈ 6.5

Q4 student: starting school age < 6

There is then variation in schooling completed when each turns 16 (Q4 > Q1)

89/95

First Stage

90/95

Reduced Form

91/95

Example IV: Education and Wages

A simple instrument:

zi = 1{Quarter of Birth = 1}

First stage:

xi = γ0 + γ1 zi + εi

Do you expect γ1^OLS > 0 or γ1^OLS < 0?

Predict: x̂i = γ0^OLS + γ1^OLS zi

Second stage:

yi = β0 + β1 x̂i + vi
92/95
OLS and IV estimates of the economic returns to schooling
93/95
Today’s Lecture: Four Parts
(1) Analyzing an experiment in R
Comparing means (t-test) and regression
(2) Causality and the potential outcomes framework
What do we mean when we say X causes Y?
(3) Linear regression, the CEF, and causality
How do we think about causality in a regression framework?
(4) Instrumental Variables (if time)
The intuition behind instrumental variables
94/95
Next Week
Next week: introduction to panel data
95/95