# An Introduction to Causality

An Introduction to Causality
Chris Hansman
Empirical Finance: Methods and Applications Imperial College Business School
Week Two
January 18, 2021
1/95

Last Week’s Lecture: Two Parts
(1) Introduction to the conditional expectation function (CEF)
(2) Ordinary Least Squares and the CEF
2/95

Today’s Lecture: Four Parts
(1) Analyzing an experiment in R
- Comparing means (t-test) and regression
(2) Causality and the potential outcomes framework
- What do we mean when we say X causes Y?
(3) Linear regression, the CEF, and causality
- How do we think about causality in a regression framework?
(4) Instrumental Variables (if time)
- The intuition behind instrumental variables
3/95

Part 1: Analyzing an Experiment
- Last year I performed an experiment for my MODES scores
- Can I bribe students to get great teaching evaluations?
- Two sections: morning (9:00-12:00) vs. afternoon (1:00-4:00)
- Evaluation day: gave candy only to the morning students
- Compared evaluations, scored 1-5, across the two sections
4/95

Part 1: Analyzing an Experiment
- Let's define a few variables:
- yi: teaching evaluation for student i, from 1 to 5
- Di: treatment status (candy vs. no candy); Di = 1 if student i received candy, Di = 0 otherwise
- How do we see if the bribe was effective?
- Are evaluations higher, on average, for students who got candy?
  E[yi|Di = 1] > E[yi|Di = 0]
- Equivalently:
  E[yi|Di = 1] − E[yi|Di = 0] > 0
5/95

Plotting the Difference in (Conditional) Means

[Figure: bar chart of the average MODES score (0-5) by treatment status (candy = 1, no candy = 0)]

6/95

Estimating the Difference in (Conditional) Means
- Two exercises in R:
- What is the difference in means between the two groups?
- What is the magnitude of the t-statistic from a t-test for a difference in these means (two-sample, equal variance)?
7/95

Regression Provides a Simple Way to Analyze an Experiment

yi = β0 + β1 Di + vi

- β1 gives the difference in means
- Its t-statistic is equivalent to the (two-sample, equal-variance) t-test
8/95
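The lecture does this exercise in R; below is an equivalent sketch in Python on simulated data (the actual MODES file is not reproduced here, so the group means are made up). It shows that the OLS slope on a binary treatment equals the difference in means, and that its t-statistic matches the two-sample, equal-variance t-test:

```python
import random, math

random.seed(1)
n = 80   # students per section, as in the lecture's example

# Simulated evaluations on a 1-5 scale; the true means here are hypothetical.
y0 = [min(5.0, max(1.0, random.gauss(3.3, 0.6))) for _ in range(n)]   # no candy
y1 = [min(5.0, max(1.0, random.gauss(4.3, 0.8))) for _ in range(n)]   # candy

mean0, mean1 = sum(y0) / n, sum(y1) / n
diff = mean1 - mean0

# Two-sample, equal-variance t-test.
var0 = sum((v - mean0) ** 2 for v in y0) / (n - 1)
var1 = sum((v - mean1) ** 2 for v in y1) / (n - 1)
sp2 = ((n - 1) * var0 + (n - 1) * var1) / (2 * n - 2)   # pooled variance
t_test = diff / math.sqrt(sp2 * (1 / n + 1 / n))

# Regression of y on the treatment dummy D.
y = y0 + y1
d = [0.0] * n + [1.0] * n
N = 2 * n
ybar, dbar = sum(y) / N, sum(d) / N
sxy = sum((di - dbar) * (yi - ybar) for di, yi in zip(d, y))
sxx = sum((di - dbar) ** 2 for di in d)
b1 = sxy / sxx                  # OLS slope
b0 = ybar - b1 * dbar
rss = sum((yi - b0 - b1 * di) ** 2 for di, yi in zip(d, y))
t_ols = b1 / math.sqrt(rss / (N - 2) / sxx)

print(b1, diff)        # the slope is exactly the difference in means
print(t_ols, t_test)   # and the two t-statistics coincide
```

The equality is algebraic, not a coincidence: with a binary regressor the fitted values are the two group means, so the regression's residual variance is exactly the pooled variance used by the t-test.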

Part 2: Causality and the potential outcomes framework
- Differences in conditional means often represent correlations
- Not causal effects
- The potential outcomes framework helps us define a notion of causality
- And understand the assumptions necessary for conditional expectations to reflect causal effects
- Experiments allow the CEF to capture causal effects
- What's really key is a certain assumption: conditional independence
9/95

Conditional Means and Causality
- In economics/finance we often want more than conditional means
- We are interested in causal questions: does a change in X cause a change in Y?
- How do corporate acquisitions affect the value of the acquirer?
- How does a firm's capital structure impact investment?
- Does corporate governance affect firm performance?
10/95

- Consider the example we just studied
- Compare evaluations (1-5) for two (equal-sized) groups: treated with candy vs. not treated

| Group    | Sample Size | Evaluation | Standard Error |
|----------|-------------|------------|----------------|
| No Candy | 80          | 3.32       | 0.065          |
| Candy    | 80          | 4.33       | 0.092          |

- The treated group provided significantly higher evaluations
- Difference in conditional means:
  E[yi|Di = 1] − E[yi|Di = 0] = 1.01
- So does candy cause higher scores?
11/95

Any Potential Reasons Why This Might Not Be Causal?

- What if students in the morning class would have given me higher ratings even without candy?
- Maybe I teach better in the morning?
- Or morning students are more generous
- We call this a "selection effect"
- What if students in the morning respond better to candy?
- Perhaps they are hungrier
- For both of these, we would need to answer the following question:
- What scores would morning students have given me without candy?
- I'll never know…
12/95

The Potential Outcomes Framework
- Ideally, how would we find the impact of candy on evaluations (yi)?
- Imagine we had access to two parallel universes and could observe:
- The exact same student (i)
- At the exact same time
- In one universe they receive candy; in the other they do not
- And suppose we could see the student's evaluations in both worlds
- Define the variables we would like to see, for each individual i:
  yi1 = evaluation with candy
  yi0 = evaluation without candy
13/95

The Potential Outcomes Framework
- If we could see both yi1 and yi0, the impact would be easy to find:
- The causal effect or treatment effect for individual i is defined as
  yi1 − yi0
- This would answer our question, but we never see both yi1 and yi0!
- Some people call this the "fundamental problem of causal inference"
- Intuition: there are two "potential" worlds out there
- The treatment variable Di decides which one we see:
  yi = yi1 if Di = 1
  yi = yi0 if Di = 0
14/95

The Potential Outcomes Framework
- We can never see the individual treatment effect
  yi1 − yi0
- We are typically happy with population-level alternatives
- For example, the average treatment effect:
  Average Treatment Effect = E[yi1 − yi0] = E[yi1] − E[yi0]
- This is usually what's meant by the "effect" of x on y
- We often aren't even able to see the average treatment effect
- We typically only see conditional means
15/95

So What Do Differences in Conditional Means Tell You?
- In the MODES example, we compared two conditional means, both of which we can estimate:
  E[yi|Di = 1] − E[yi|Di = 0] = 1.01
- Or, written in terms of potential outcomes:
  E[yi1|Di = 1] − E[yi0|Di = 0] = 1.01 ≠ E[yi1] − E[yi0]
- Why is this not equal to E[yi1] − E[yi0]?
  E[yi1|Di = 1] − E[yi0|Di = 0]
    = E[yi1|Di = 1] − E[yi0|Di = 1]   (average treatment effect for the treated group)
    + E[yi0|Di = 1] − E[yi0|Di = 0]   (selection effect)
16/95

So What Do Differences in Conditional Means Tell You?
E[yi1|Di = 1] − E[yi0|Di = 0]
  = E[yi1|Di = 1] − E[yi0|Di = 1]   (average treatment effect for the treated group)
  + E[yi0|Di = 1] − E[yi0|Di = 0]   (selection effect)
  ≠ E[yi1] − E[yi0]                 (average treatment effect)

- So our estimate could be different from the average effect of treatment E[yi1] − E[yi0] for two reasons:
(1) The morning section might have given better reviews anyway:
    E[yi0|Di = 1] − E[yi0|Di = 0] > 0   (selection effect)
(2) Candy matters more in the morning:
    E[yi1|Di = 1] − E[yi0|Di = 1] ≠ E[yi1] − E[yi0]
    (average treatment effect for the treated group ≠ average treatment effect)
17/95

What are the Benefits of Experiments?
- Truly random experiments solve this "identification" problem. Why?
- Suppose Di is chosen randomly for each individual
- This means that Di is independent of (yi1, yi0) in a statistical sense:
  (yi1, yi0) ⊥ Di
- Intuition: the potential outcomes yi1 and yi0 are unrelated to treatment
18/95

What are the Benefits of Experiments?
(yi1, yi0) ⊥ Di

- Sidenote: if two random variables are independent (X ⊥ Z):
  E[X|Z] = E[X]
- Hence in an experiment:
  E[yi1|Di = 1] = E[yi1] and E[yi0|Di = 0] = E[yi0]
- so
  E[yi1|Di = 1] − E[yi0|Di = 0] = E[yi1] − E[yi0]
- Both terms on the left can be estimated, and the right-hand side is the average treatment effect!
19/95

What are the Benefits of Experiments?
- Why does independence fix the two problems with using E[yi1|Di = 1] − E[yi0|Di = 0]?
(1) The selection effect is now 0:
    E[yi0|Di = 1] − E[yi0|Di = 0] = 0
(2) The average treatment effect for the treated group now accurately measures the average treatment effect in the whole sample:
    E[yi1|Di = 1] − E[yi0|Di = 1] = E[yi1] − E[yi0]
20/95
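A quick simulation makes the argument concrete (a Python sketch; the data and assignment rules are entirely hypothetical). Potential outcomes are built with a true average treatment effect of 1; random assignment recovers it, while assignment that favors high-yi0 students does not:

```python
import random

random.seed(0)
N = 100_000

# Hypothetical potential outcomes: yi1 = yi0 + 1, so the true ATE is exactly 1.
y0 = [random.gauss(3.5, 0.5) for _ in range(N)]
y1 = [v + 1.0 for v in y0]

def diff_in_means(d):
    treated = [y1[i] for i in range(N) if d[i] == 1]
    control = [y0[i] for i in range(N) if d[i] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

# (a) Random assignment: (yi1, yi0) independent of Di.
d_random = [1 if random.random() < 0.5 else 0 for _ in range(N)]
dm_random = diff_in_means(d_random)

# (b) Selection: students with higher yi0 are more likely to be treated.
d_select = [1 if y0[i] + random.gauss(0, 0.5) > 3.5 else 0 for i in range(N)]
dm_select = diff_in_means(d_select)

print(dm_random)  # close to 1.0: the ATE
print(dm_select)  # well above 1.0: the ATE plus a positive selection effect
```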

The Conditional Independence Assumption
- Of course an experiment is not strictly necessary, as long as
  (yi1, yi0) ⊥ Di
- This happens, but it is not likely in most practical applications
- Slightly more reasonable is the conditional independence assumption
- Let Xi be a set of control variables
  Conditional Independence: (yi1, yi0) ⊥ Di | Xi
- Independence holds within a group with the same characteristics Xi:
  E[yi1|Di = 1, Xi] − E[yi0|Di = 0, Xi] = E[yi1 − yi0|Xi]
21/95

A (silly) Example of Conditional Independence
- Suppose I randomly treat (Di = 1) 75% of the morning class (with candy)
- And randomly treat (Di = 1) 25% of the afternoon class
- And suppose I am a much better teacher in the morning
- Then
  (yi1, yi0) ̸⊥ Di
- Because E[yi0|Di = 1] > E[yi0|Di = 0]
22/95

[Figure: average MODES scores by treatment status (Di) and class (morning vs. afternoon)]

23/95

A (silly) Example of Conditional Independence
- Let xi = 1 for the morning class, xi = 0 for the afternoon
- We can estimate the means for all four groups:
- Afternoon, no candy: E[yi|Di = 0, xi = 0] = 3.28
- Afternoon, with candy: E[yi|Di = 1, xi = 0] = 3.78
- Morning, no candy: E[yi|Di = 0, xi = 1] = 3.95
- Morning, with candy: E[yi|Di = 1, xi = 1] = 4.45
24/95

A (silly) Example of Conditional Independence
- If we try to calculate the difference in means directly:
  E[yi|Di = 1] = (1/4)·E[yi|Di = 1, xi = 0] + (3/4)·E[yi|Di = 1, xi = 1] = 4.28
  E[yi|Di = 0] = (3/4)·E[yi|Di = 0, xi = 0] + (1/4)·E[yi|Di = 0, xi = 1] = 3.45
- Our estimate is contaminated because the morning class is better:
  E[yi|Di = 1] − E[yi|Di = 0] = 4.28 − 3.45 = 0.835
25/95

A (silly) Example of Conditional Independence
E[yi|Di = 0, xi = 0] = 3.28 and E[yi|Di = 1, xi = 0] = 3.78
E[yi|Di = 0, xi = 1] = 3.95 and E[yi|Di = 1, xi = 1] = 4.45

- However, within each class treatment is random: (yi1, yi0) ⊥ Di | xi
- So we may recover the average treatment effect conditional on xi:
  E[yi1 − yi0|xi = 0] = ?   E[yi1 − yi0|xi = 1] = ?
26/95

A (silly) Example of Conditional Independence
E[yi|Di = 0, xi = 0] = 3.28 and E[yi|Di = 1, xi = 0] = 3.78
E[yi|Di = 0, xi = 1] = 3.95 and E[yi|Di = 1, xi = 1] = 4.45

- However, within each class treatment is random: (yi1, yi0) ⊥ Di | xi
- So we may recover the average treatment effect conditional on xi:
- For the afternoon:
  E[yi1 − yi0|xi = 0] = 3.78 − 3.28 = 0.5
- For the morning:
  E[yi1 − yi0|xi = 1] = 4.45 − 3.95 = 0.5
- In this case:
  E[yi1 − yi0] = (1/2)·E[yi1 − yi0|xi = 0] + (1/2)·E[yi1 − yi0|xi = 1] = 0.5
27/95
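Using the four group means from the slides, the contaminated comparison and the conditional one can be reproduced in a few lines (a Python sketch rather than the course's R):

```python
# Group means taken from the slides; shares reflect the hypothetical design:
# candy went to 75% of the morning class and 25% of the afternoon class.
means = {(0, 0): 3.28, (1, 0): 3.78,   # key is (D, x); afternoon is x = 0
         (0, 1): 3.95, (1, 1): 4.45}   # morning is x = 1

# Naive comparison: the treated are 3/4 morning, the controls 3/4 afternoon.
naive = (0.25 * means[(1, 0)] + 0.75 * means[(1, 1)]) \
      - (0.75 * means[(0, 0)] + 0.25 * means[(0, 1)])

# Conditioning on x: average the within-class differences.
ate = 0.5 * (means[(1, 0)] - means[(0, 0)]) \
    + 0.5 * (means[(1, 1)] - means[(0, 1)])

print(round(naive, 3))  # 0.835, contaminated by the better morning class
print(round(ate, 2))    # 0.5, the within-class treatment effect
```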

Part 3: Causality and Regression
- When does regression recover a causal effect?
- We need conditional mean independence
- Threats to recovering causal effects:
- Omitted variables and measurement error
- Controlling for confounding variables
28/95

Causality and Regression
- When does linear regression capture a causal effect?
- Suppose yi depends only on two random variables: Di ∈ {0, 1} and vi
  yi = α + ρ Di + vi
- Di is some treatment (say, candy)
- vi is absolutely everything else that impacts yi
- How good you are at R, how much you liked Paolo's course, etc.
- Set E[vi] = 0
29/95

Causality and Regression
yi = α + ρ Di + vi

- We can then write the potential outcomes:
  yi1 = α + ρ + vi
  yi0 = α + vi
- Because ρ is constant, the individual and average treatment effects coincide:
  yi1 − yi0 = E[yi1 − yi0] = ρ
- So ρ is what we want: the effect of treatment
- But suppose we don't know ρ, and only see yi and Di
30/95

Causality and Regression
yi =α+ρDi +vi
- Suppose we regress yi on Di and recover β1^OLS. When will β1^OLS = ρ?
- Because Di is binary, we have:
  β1^OLS = E[yi|Di = 1] − E[yi|Di = 0]
         = E[α + ρ + vi|Di = 1] − E[α + vi|Di = 0]
         = ρ + E[vi|Di = 1] − E[vi|Di = 0]
31/95

Causality and Regression
β1^OLS = ρ + E[vi|Di = 1] − E[vi|Di = 0]
             (selection effect)

- So when does β1^OLS = ρ?
- It holds under the independence assumption (yi1, yi0) ⊥ Di
- Since yi1 = α + ρ + vi and yi0 = α + vi:
  (yi1, yi0) ⊥ Di ⇔ vi ⊥ Di
- This independence means
  E[vi|Di = 1] = E[vi|Di = 0] = E[vi]
32/95

Causality and Regression
β1^OLS = ρ + E[vi|Di = 1] − E[vi|Di = 0]
             (selection effect)

- So when does β1^OLS = ρ?
- β1^OLS = ρ even under a weaker assumption than independence:
  Mean Independence: E[vi|Di] = E[vi]
33/95

When will β1^OLS = ρ?

- Suppose we don't know ρ:
  yi = α + ρ Di + vi
- Our regression coefficient captures the causal effect (β1^OLS = ρ) if:
  E[vi|Di] = E[vi]
- The conditional mean of vi is the same for every Di
- More intuitively: E[vi|Di] = E[vi] implies that Corr(vi, Di) = 0
34/95

What if vi and Di are Correlated?
- Our regression coefficient captures the causal effect (β1^OLS = ρ) if:
  E[vi|Di] = E[vi]
- This implies that Corr(vi, Di) = 0
- So anytime Di and vi are correlated:
  β1^OLS ≠ ρ and β0^OLS ≠ α
- That is, anytime Di is correlated with anything else unobserved that impacts yi
35/95

Causality and Regression: Continuous xi
- Suppose there is a continuous xi with a causal relationship with yi:
- A 1-unit increase in xi increases yi by a fixed amount β1
- e.g., an hour of studying increases your final grade by β1
- It is tempting to write:
  yi = β0 + β1 xi
- But in practice other things impact yi; again call these vi:
  yi = β0 + β1 xi + vi
36/95

OLS Estimator Fits a Line Through the Data

[Figure: scatter of Y against X with the fitted OLS line β0^OLS + β1^OLS X]

37/95

Causality and Regression: Continuous xi

yi = β0 + β1 xi + vi

- The regression coefficient captures the causal effect (β1^OLS = β1) if:
  E[vi|xi] = E[vi]
- This fails anytime Corr(xi, vi) ≠ 0
- An aside: we have used similar notation for 3 different things:
  1. β1: the causal effect on yi of a 1-unit change in xi
  2. β1^OLS = Cov(xi, yi)/Var(xi): the population regression coefficient
  3. β̂1^OLS = Ĉov(xi, yi)/V̂ar(xi): the sample regression coefficient (sample analogues of the covariance and variance)
38/95

The Causal Relationship Between X and Y

[Figure: the causal line β0 + β1X; a single data point yi = β0 + β1 xi + vi lies a vertical distance vi from the line]

If E[vi|xi] = E[vi], then β1^OLS = β1

[Figure: the fitted OLS line β0^OLS + β1^OLS X coincides with the causal line β0 + β1X]

39/95

What if vi and xi are Positively Correlated?

[Figure: points with larger xi tend to have larger vi, lying above the causal line β0 + β1X]

40/95

If Corr(vi, xi) ≠ 0, then β1^OLS ≠ β1

[Figure: the fitted OLS line β0^OLS + β1^OLS X diverges from the causal line β0 + β1X]

41/95

An Example from Economics
- Consider the model for wages:
  Wagesi = β0 + β1 Si + vi
- where Si is years of schooling
- Are there any reasons that Si might be correlated with vi?
- If so, this regression won't uncover β1
42/95

Examples from Corporate Finance
- Consider the model for leverage:
  Leveragei = α + β Profitabilityi + vi
- Why might we have trouble recovering β?
(1) Unprofitable firms tend to have higher bankruptcy risk and should have lower leverage than more profitable firms (tradeoff theory)
    ⇒ corr(Profitabilityi, vi) > 0 ⇒ E[vi|Profitabilityi] ≠ E[vi]
(2) Unprofitable firms have accumulated lower profits in the past and may have to use debt financing, implying higher leverage (pecking order theory)
    ⇒ corr(Profitabilityi, vi) < 0 ⇒ E[vi|Profitabilityi] ≠ E[vi]

43/95

One reason vi and xi might be correlated?

- Suppose that we know yi is generated by the following:
  yi = β0 + β1 xi + γ ai + ei
- where xi and ei are uncorrelated, but Corr(ai, xi) > 0
- We could think of yi as wages, xi as years of schooling, and ai as ability
- Suppose we see yi and xi but not ai, and have to consider the model:
  yi = β0 + β1 xi + vi, where vi = γ ai + ei
44/95

A Quick Review: Properties of Covariance
- A few properties of covariance. If W, X, Z are random variables:
  Cov(W, X + Z) = Cov(W, X) + Cov(W, Z)
  Cov(X, X) = Var(X)
- If a and b are constants:
  Cov(aW, bX) = ab·Cov(W, X)
  Cov(a + W, X) = Cov(W, X)
- Finally, remember that correlation is just the covariance scaled:
  Corr(X, Z) = Cov(X, Z) / (√Var(X)·√Var(Z))
- I'll switch back and forth between them sometimes
45/95

Omitted Variables Bias
- So if we have:
  yi = β0 + β1 xi + vi
- What will the regression of yi on xi give us?
- Recall that the regression coefficient is β1^OLS = Cov(yi, xi)/Var(xi):
  β1^OLS = Cov(yi, xi)/Var(xi)
         = Cov(β0 + β1 xi + vi, xi)/Var(xi)
         = β1·Cov(xi, xi)/Var(xi) + Cov(vi, xi)/Var(xi)
         = β1 + Cov(vi, xi)/Var(xi)
46/95

Omitted Variables Bias
- So β1^OLS is biased:
  β1^OLS = β1 + Cov(vi, xi)/Var(xi)
                (bias)
- If vi = γ ai + ei with
  Corr(ai, xi) ≠ 0 and Corr(ei, xi) = 0
  we can characterize this bias in simple terms:
  β1^OLS = β1 + Cov(γ ai + ei, xi)/Var(xi)
         = β1 + γ·Cov(ai, xi)/Var(xi)
                (bias)
47/95

Omitted Variables Bias
β1^OLS = β1 + γ·Cov(ai, xi)/Var(xi)
       = β1 + γ·δ1^OLS

- where δ1^OLS is the coefficient from the regression:
  ai = δ0^OLS + δ1^OLS xi + ηi^OLS
48/95

Omitted Variables Bias
- A good heuristic for evaluating OLS estimates:
  β1^OLS = β1 + γ·δ1^OLS
                (bias)
- γ: the relationship between ai and yi
- δ1^OLS: the relationship between ai and xi
- We might not be able to measure γ or δ1^OLS, but we can often make a good guess
49/95
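The omitted-variables formula can be verified by simulation. The sketch below uses made-up parameter values: schooling x is built to be correlated with unobserved ability a, and the short regression's slope lands exactly where β1 + γ·δ1^OLS predicts:

```python
import random

random.seed(42)
N = 100_000
b0, b1, g = 1.0, 0.5, 2.0   # hypothetical "true" parameters

a = [random.gauss(0, 1) for _ in range(N)]                      # unobserved ability
x = [0.8 * ai + random.gauss(0, 1) for ai in a]                 # schooling, Corr(a, x) > 0
y = [b0 + b1 * xi + g * ai + random.gauss(0, 1) for xi, ai in zip(x, a)]

def ols_slope(reg, dep):
    mr, md = sum(reg) / N, sum(dep) / N
    num = sum((r - mr) * (d - md) for r, d in zip(reg, dep))
    return num / sum((r - mr) ** 2 for r in reg)

beta_ols = ols_slope(x, y)    # short regression of y on x, omitting a
delta_ols = ols_slope(x, a)   # regression of the omitted variable on x

print(beta_ols)            # ≈ 1.48, far above the true b1 = 0.5
print(b1 + g * delta_ols)  # the OVB formula reproduces it
```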

Impact of Schooling on Wages
- Suppose wages (yi) are determined by:
  yi = β0 + β1 xi + γ ai + ei
- and we see years of schooling (xi) but not ability (ai), with
  Corr(xi, ai) > 0 and Corr(yi, ai) > 0
- We estimate:
  yi = β0 + β1 xi + vi
- And recover:
  β1^OLS = β1 + γ·δ1^OLS
                (bias)
50/95

Impact of Schooling on Wages
β1^OLS = β1 + γ·δ1^OLS
              (bias)

- Is our estimated β1^OLS larger or smaller than β1?
- menti.com
51/95

Controlling for a Confounding Variable
yi = β0 + β1 xi + γ ai + ei

- Suppose we are able to observe ability
- e.g., an IQ test is all that matters
- For simplicity, let xi be binary
- xi = 1 if individual i has an MSc, 0 otherwise
- Suppose we regress yi on xi and ai:
  β1^OLS = E[yi|xi = 1, ai] − E[yi|xi = 0, ai]
         = E[β0 + β1 + γ ai + ei|xi = 1, ai] − E[β0 + γ ai + ei|xi = 0, ai]
52/95

Controlling for a Confounding Variable
β1^OLS = E[β0 + β1 + γ ai + ei|xi = 1, ai] − E[β0 + γ ai + ei|xi = 0, ai]

- Canceling out terms gives:
  β1^OLS = β1 + E[ei|xi = 1, ai] − E[ei|xi = 0, ai]
- So β1^OLS = β1 if the following condition holds:
  E[ei|xi, ai] = E[ei|ai]
- This is called Conditional Mean Independence
- It is a slightly weaker version of our conditional independence assumption (yi1, yi0) ⊥ xi | ai
53/95

Example: Controlling for a Confounding Variable

[Figure: scatter of Republican vote share against median county income]

54/95

Republican Votes and Income: South and North

[Figure: the same scatter, with southern and northern counties marked separately]

55/95

Republican Votes and Income: South and North

[Figure: within each region, the fitted line relating Republican vote share to income slopes upward]

56/95

Controlling for a Confounding Variable
- Suppose we run the following regression:
  repvotesi = β0 + β1 incomei + ei
- What is β̂1^ols? menti.com…
- So does being rich decrease Republican votes?
- Suppose we run it separately in the South and North:
- What is β̂1^ols in the South?
- Within regions, income is positively associated with Republican votes
57/95

Controlling for a Confounding Variable
- Now suppose instead we run the following regression:
  repvotesi = β0 + β1 incomei + γ southi + ei
- where southi = 1 for southern states
- We estimate β̂1^ols = 0.340
- This is just the (weighted) average of the two within-region coefficients:
  β̂1^ols ≈ (1/2)·β̂1^ols(South) + (1/2)·β̂1^ols(North)
- (the weights in general do not have to be 1/2)
58/95

Why is This? Regression Anatomy
- Suppose we have the following (multivariate) regression:
  yi = β0^OLS + β1^OLS xi + Zi′γ + vi
- Here Zi is a potentially multidimensional set of controls
- Then the OLS estimator is algebraically equal to:
  β1^OLS = Cov(yi, x̃i)/Var(x̃i)
- where x̃i is the residual from a regression of xi on Zi:
  x̃i = xi − (δ0^OLS + Zi′δ^OLS)
59/95

Why is This? Regression Anatomy
β1^OLS = Cov(yi, x̃i)/Var(x̃i)

- where x̃i is the residual from a regression of xi on Zi:
  x̃i = xi − (δ0^OLS + Zi′δ^OLS)
- The coefficient from a multiple regression is the same as that from a single regression!
- After first subtracting (partialling out) the part of xi explained by Zi
60/95
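The partialling-out result can be checked numerically. A Python sketch on hypothetical data: the coefficient on x from the two-regressor closed form matches the single regression of y on the residualized x:

```python
import random

random.seed(7)
N = 50_000

# Hypothetical data: regressor of interest x, control z, true slope on x = 1.5.
z = [random.gauss(0, 1) for _ in range(N)]
x = [0.6 * zi + random.gauss(0, 1) for zi in z]
y = [2.0 + 1.5 * xi - 3.0 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

def cov(a, b):
    ma, mb = sum(a) / N, sum(b) / N
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / N

# Coefficient on x in the multiple regression of y on (x, z): two-regressor closed form.
den = cov(x, x) * cov(z, z) - cov(x, z) ** 2
b1_multi = (cov(y, x) * cov(z, z) - cov(y, z) * cov(x, z)) / den

# Regression anatomy: residualize x on z, then regress y on the residual.
d1 = cov(x, z) / cov(z, z)
d0 = sum(x) / N - d1 * sum(z) / N
x_tilde = [xi - (d0 + d1 * zi) for xi, zi in zip(x, z)]
b1_anatomy = cov(y, x_tilde) / cov(x_tilde, x_tilde)

print(b1_multi, b1_anatomy)  # identical, and both close to 1.5
```

The two numbers agree to floating-point precision because the equality is algebraic (the Frisch-Waugh-Lovell theorem), not a sampling result.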

Republican Votes and Income: South and North

[Figure: Republican vote share against median county income, by region]

61/95

Residualizing W.R.T. South Removes Difference in Income

[Figure: Republican vote share against (residualized) median county income; after partialling out South, the regional difference in income disappears]

62/95

OLS Finds Line of Best Fit on the "Residuals"

[Figure: the fitted line through the residualized data slopes upward]

63/95

Measurement Error
- An omitted variable is a common reason why β1^OLS ≠ β1
- There are several others, including measurement error
- Suppose yi is generated by:
  yi = β0 + β1 xi* + ei
- But we can't see xi* exactly; instead we see:
  xi = xi* + ηi, where Corr(xi*, ηi) = 0
- Why might this happen?
64/95

Measurement Error
yi = β0 + β1 xi* + ei
xi = xi* + ηi ⇒ xi* = xi − ηi
⇒ yi = β0 + β1 xi + (−β1 ηi + ei), where vi = −β1 ηi + ei
yi = β0 + β1 xi + vi
65/95

Measurement Error
- So we can only estimate:
  yi = β0 + β1 xi + vi, with vi = −β1 ηi + ei
- But because of the measurement error, Corr(xi, vi) ≠ 0
- Why? xi contains ηi and vi contains −β1 ηi, so Cov(xi, vi) = −β1·Var(ηi) ≠ 0
- And our estimates are off!
  Cov(yi, xi)/Var(xi) = β1^OLS ≠ β1
66/95
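The classic consequence is attenuation bias: the slope shrinks toward zero by the factor Var(xi*)/(Var(xi*) + Var(ηi)). A Python sketch with made-up parameter values, where that factor is 1/2:

```python
import random

random.seed(3)
N = 100_000
b1 = 2.0   # hypothetical true slope

x_star = [random.gauss(0, 1) for _ in range(N)]   # the true regressor
eta = [random.gauss(0, 1) for _ in range(N)]      # classical measurement error
x = [xs + e for xs, e in zip(x_star, eta)]        # what we actually observe
y = [b1 * xs + random.gauss(0, 1) for xs in x_star]

def ols_slope(reg, dep):
    mr, md = sum(reg) / N, sum(dep) / N
    num = sum((r - mr) * (d - md) for r, d in zip(reg, dep))
    return num / sum((r - mr) ** 2 for r in reg)

slope_noisy = ols_slope(x, y)      # attenuated: b1 * Var(x*)/(Var(x*) + Var(eta))
slope_true = ols_slope(x_star, y)  # recovers b1 when the true regressor is seen

print(slope_noisy, slope_true)
```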

Measurement Error
- Measurement error in xi is really also an omitted-variables problem
- with ηi as the omitted variable
- What happens if we mismeasure yi?
67/95

Part 4: An Overview of IV
- Assumptions for a valid instrument
- Two interpretations of an IV:
- The ratio of two OLS coefficients
- Two-stage least squares
- An example: the impact of schooling on wages
68/95

Hope remains for β1

- Suppose Corr(xi, vi) ≠ 0, but we want to estimate the causal effect β1:
  yi = β0 + β1 xi + vi
- For now, keep yi = wages and xi = years of schooling in mind
- An instrument (or instrumental variable) can help
69/95

What makes a good instrument?
- Suppose there is a variable zi that should be totally unrelated to yi
- For example: should the quarter you were born in impact your wage?
70/95

A sample of birth quarters

[Figure: number of observations by quarter of birth (1-4)]

71/95

What makes a good instrument?
- Suppose there is a variable zi that should be totally unrelated to yi
- For example: should your birth quarter impact your wage?
- Except for one thing:
- zi has a direct impact on xi
72/95

In the US birth quarter influences years of schooling
[Figure: average years of education by quarter of birth, ranging from about 12.6 to 12.9]

73/95

In the US birth quarter influences years of schooling
- Many US states mandate that students begin school in the calendar year when they turn 6
- School years start in mid- to late-year (e.g., September)
- Many states mandate that students stay in school until they turn 16
74/95

In the US birth quarter influences years of schooling
- Consider two students:
(1) One born in February (Q1)
(2) One born in November (Q4)
- Q1 student: starting school age ≈ 6.5
- Q4 student: starting school age < 6
- There is then variation in schooling completed when each turns 16 (Q4 > Q1)
75/95

Intuition behind instrumental variables
- Suppose we really believe that zi has no impact on yi except by changing xi
- e.g., there is no other possible way birth quarter could influence wages
- Then the impact of zi on yi should tell us something about the effect we are looking for
- Any impact of birth quarter on wages must be because it raises education
76/95

Birth quarter impacts wages

[Figure: average yearly wage in dollars by quarter of birth, ranging from about $22,500 to $23,000]

77/95

Intuition behind instrumental variables
- Being born in Q4 vs. Q1 increases yearly wages by ≈ $300
- If this all happens because quarter of birth increases education, how do we recover the impact of education on wages?
- Being born in Q4 vs. Q1 increases education by ≈ 0.2 years
- So each year of schooling increases wages by:
  βiv = (impact of birth quarter zi on wages yi) / (impact of birth quarter zi on education xi)
      = $300 / 0.2 = $1500
78/95

Formalizing the Instrumental Variables Assumptions
yi = β0 + β1 xi + vi

- Our informal assumption: zi should change xi but have absolutely no other impact on yi. Formally:
1. Cov[zi, xi] ≠ 0 (Instrument Relevance)
   - Intuition: zi must change xi
2. Cov[zi, vi] = 0 (Exclusion Restriction)
   - Intuition: zi has absolutely no other impact on yi
- Recall that vi is everything else outside of xi that influences yi
79/95

Hope remains for β1

1. Instrument Relevance: Cov[zi, xi] ≠ 0
2. Exclusion Restriction: Cov[zi, vi] = 0

- Under these assumptions, we may consistently estimate βIV = β1:
  βIV = Cov(yi, zi)/Cov(xi, zi) = β1
- This ratio should look familiar from our example…
- And Cov(yi, zi) and Cov(xi, zi) can be estimated from the data!
80/95

Getting βIV in practice
- An alternative way of writing this:
  βIV = Cov(yi, zi)/Cov(xi, zi)
      = [Cov(yi, zi)/Var(zi)] / [Cov(xi, zi)/Var(zi)]
      = (coefficient from regressing yi on zi) / (coefficient from regressing xi on zi)
81/95

Getting βIV in practice
- This means that if we run two regressions:
1. "First stage": the impact of zi on xi
   xi = α1 + φ zi + ηi
2. "Reduced form": the impact of zi on yi
   yi = α2 + ρ zi + ui
- Then we can write:
  βIV = [Cov(yi, zi)/Var(zi)] / [Cov(xi, zi)/Var(zi)] = ρOLS/φOLS
  (the impact of zi on yi divided by the impact of zi on xi)
82/95

Getting βIV in practice: Two-stage least squares

- A more common way of estimating βIV:
1. Estimate φOLS in the first stage:
   xi = α1 + φ zi + ηi
2. Predict the part of xi explained by zi:
   x̂i = α1^OLS + φ^OLS zi
3. Regress yi on the predicted x̂i in a second stage:
   yi = α2 + β x̂i + ui
- Note that:
  β2ndStage = Cov(yi, x̂i)/Var(x̂i) = β
83/95

Getting βIV in practice: Two stage least squares
β2ndStage = Cov(yi, x̂i)/Var(x̂i)
          = Cov(yi, α1 + φOLS zi)/Var(α1 + φOLS zi)
          = φOLS·Cov(yi, zi) / (φOLS·φOLS·Var(zi))
          = ρOLS/φOLS
          = βIV = β
84/95
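Both routes to βIV can be checked in one simulation (a Python sketch with a made-up endogenous regressor and a valid instrument): the covariance ratio and the two-stage procedure agree exactly, and both undo the OLS bias:

```python
import random

random.seed(11)
N = 100_000
b0, b1 = 1.0, 0.8   # hypothetical true coefficients

z = [random.gauss(0, 1) for _ in range(N)]   # instrument: shifts x only
a = [random.gauss(0, 1) for _ in range(N)]   # unobserved confounder
x = [0.5 * zi + ai + random.gauss(0, 1) for zi, ai in zip(z, a)]
y = [b0 + b1 * xi + 1.5 * ai + random.gauss(0, 1) for xi, ai in zip(x, a)]

def cov(u, w):
    mu, mw = sum(u) / N, sum(w) / N
    return sum((ui - mu) * (wi - mw) for ui, wi in zip(u, w)) / N

b_ols = cov(x, y) / cov(x, x)   # biased upward: x is correlated with the confounder
b_iv = cov(y, z) / cov(x, z)    # IV as a ratio of covariances

# Two-stage least squares: first stage, predicted values, second stage.
phi = cov(x, z) / cov(z, z)
a1 = sum(x) / N - phi * sum(z) / N
x_hat = [a1 + phi * zi for zi in z]
b_2sls = cov(y, x_hat) / cov(x_hat, x_hat)

print(b_ols)         # well above the true 0.8
print(b_iv, b_2sls)  # identical, and both close to 0.8
```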

Two stage least squares works with more Xs and Zs
- The same approach works with multiple X's, Z's, and with control variables:
  yi = α + Xi′β + Wi′θ + vi
- Xi = [xi1, …, xiM]′ with Cov(xim, vi) ≠ 0 ("endogenous variables")
- Wi = [wi1, …, wiL]′ (control variables)
- Zi = [zi1, …, ziK]′ with E[zi vi] = 0 (instruments)
- Note that K must be at least as large as M
- At least as many instruments as endogenous variables
85/95

Two stage least squares works with multiple Xs and Zs

- For each xim, run the first-stage regression:
  xim = δ + Zi′γ + Wi′φ + εi
- Generate predicted values:
  x̂im = δOLS + Zi′γOLS + Wi′φOLS
- Run the second-stage regression:
  yi = α + X̂i′β + Wi′θ + vi
- where X̂i = [x̂i1, …, x̂iM]′:
  β2ndStage = β
86/95

Back to Example IV: Education and Wages
- Adapted from Angrist and Krueger (1991)
  yi = β0 + β1 xi + vi
- Are there any concerns about OVB here?
- What are the requirements for a valid instrument zi?
87/95

Example IV: Education and Wages
- Angrist and Krueger use quarter of birth as an instrument for education
- Many US states mandate that students begin school in the calendar year when they turn 6
- School years start in mid- to late-year (e.g., September)
- Many states mandate that students stay in school until they turn 16
88/95

Example IV: Education and Wages
- Consider two students:
(1) One born in February (Q1)
(2) One born in November (Q4)
- Q1 student: starting school age ≈ 6.5
- Q4 student: starting school age < 6
- There is then variation in schooling completed when each turns 16 (Q4 > Q1)
89/95

First Stage

[Figure: first-stage relationship between quarter of birth and years of education]

90/95

Reduced Form

[Figure: reduced-form relationship between quarter of birth and average wages]

91/95

Example IV: Education and Wages
􏰒 A simple instrument:
zi = 1{Quarter of Birth = 1}
􏰒 First stage:
xi =γ0+γ1xi+εi 􏰒 Do you expect γOLS >0 or γOLS <0? 􏰒 Predict xˆ = γOLS +γOLSz i01i 􏰒 Second stage: 11 y i = β 0 + β 1 xˆ i + v i 92/95 OLS and IV estimates of the economic returns to schooling 93/95 Today’s Lecture: Four Parts (1) Analyzing an experiment in R 􏰒 Comparing means (t-test) and regression (2) Causality and the potential outcomes framework 􏰒 What do we mean when we say X causes Y (3) Linear regression, the CEF, and causality 􏰒 How do we think about causality in a regression framework? (4) Instrumental Variables (if time) 􏰒 The intuition behind instrumental variables 94/95 Next Week 􏰒 Next week: introduction to panel data 95/95