
OLS and the Conditional Expectation Function
Chris Hansman
Empirical Finance: Methods and Applications Imperial College Business School
Week One
January 11th and 12th, 2021

This Week
- Course Details
  - Basic housekeeping
  - Course tools: Menti, R, and RStudio
  - Introduction to tidy data
- OLS and the Conditional Expectation Function
  - Review and properties of the CEF
  - Review, implementation, and value of OLS

Course Details: Contact
- Lecturer: Chris Hansman
  - Email: chansman@imperial.ac.uk
  - Office: 53 Prince's Gate, 5.01b
  - Phone: +44 (0)20 7594 1044
- TA: Davide Benedetti
  - Email: d.benedetti@imperial.ac.uk

Course Details: Assessment
- Two assignments
  - Assignment 1 (25%)
    - Assigned Tuesday of Week 3
    - Due by 4pm on Tuesday of Week 5
  - Assignment 2 (25%)
    - Assigned Tuesday of Week 6
    - Due by 5:30pm Tuesday of Week 8
- Final Exam (50%)

Course Details: Tentative Office Hours and Tutorials
- Tentative office hours
  - Tuesdays from 17:30-18:30
  - Or by appointment
- Formal tutorials will begin in Week 2
  - Davide will be available this week to help with R/RStudio

Course Details: Mentimeter
- On your phone (or computer) go to Menti.com

Course Details: R and R-Studio
- Make sure you have the most up-to-date version of R:
  - https://cloud.r-project.org/
- And an up-to-date version of RStudio:

Course Details: In Class Exercises
- Throughout the module we'll regularly do hands-on exercises
- Let's start with a quick example:
  - On the Insendi course page, find the data: ols basic.csv
  - 5 variables: Y, X, Y sin, Y 2, Y nl
  - Load the data into R, and run an OLS regression of Y on X
  - What is the coefficient on X?
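A minimal sketch of the exercise. Since the course file isn't available here, the snippet simulates a stand-in data frame (the true intercept and slope below are assumptions for illustration); with the real file you would load it with `read.csv()` instead.

```r
# Simulated stand-in for the course data (ols basic.csv);
# the column layout and true coefficients here are assumptions
set.seed(1)
n <- 200
X <- rnorm(n)
Y <- 1 + 2 * X + rnorm(n)        # assumed true model: intercept 1, slope 2
df <- data.frame(Y = Y, X = X)

# With the real course file you would instead run, e.g.:
# df <- read.csv("ols_basic.csv")

fit <- lm(Y ~ X, data = df)      # OLS regression of Y on X
beta_x <- coef(fit)["X"]         # the coefficient on X
```

With 200 simulated observations, `beta_x` should land close to the assumed slope of 2.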

Course Details: Projects in R-Studio
- For those with RStudio set up:
  - Open RStudio and select File ⇒ New Project ⇒ New Directory ⇒ New Project
  - Name the directory "EF lecture 1" and locate it somewhere convenient
- Each coursework should be completed in a unique project folder

Course Details: R set up
- Download all data files from the hub and place them in EF lecture 1
  - s p price.csv
  - ols basics.csv
  - ames testing.csv
  - ames training.csv

Course Details: The Tidyverse
- The majority of the coding we do will utilize the tidyverse

  "The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures."

- For an excellent introduction and overview:

install.packages("tidyverse")
library(tidyverse)

Course Details: Tidy Data
- The tidyverse is structured around tidy datasets
- There are three interrelated rules which make a dataset tidy:
  1. Each variable must have its own column
  2. Each observation must have its own row
  3. Each value must have its own cell
- For the theory underlying tidy data:
  - http://www.jstatsoft.org/v59/i10/paper

An Example of Tidy Data

An Example of Non-Tidy Data

Fixing An Observation Scattered Across Rows: spread()
tidy2 <- table2 %>%
  spread(key = type, value = count)

Another Example of Non-Tidy Data

Fixing Columns as Values: gather()
tidy4a <- table4a %>%
  gather(`1999`, `2000`, key = "year", value = "cases")

Introducing the Pipe: %>%
- You'll notice that both of these operations utilize a "pipe": %>%
- A tool for clearly expressing a sequence of multiple operations
- Can help make code easy to read and understand
- Consider evaluating the following: x = √(log(e^9))
- Could write it as:

x <- sqrt(log(exp(9)))

- Or with pipes:

x <- 9 %>%
  exp() %>%
  log() %>%
  sqrt()

This Week: Two Parts
(1) Introduction to the conditional expectation function (CEF)
  - Why is the CEF a useful (and widely used) summary of the relationship between variables Y and X?
(2) Ordinary Least Squares and the CEF
  - Review, implementation, and the utility of OLS

Part 1: The Conditional Expectation Function
- Overview
- Key takeaway: a useful tool for describing the relationship between variables Y and X
- Why: (at least) three nice properties:
  1. Law of iterated expectations
  2. CEF decomposition property
  3. CEF prediction property

Review: Expectation of a Random Variable Y
- Suppose Y is a random variable with a finite number of outcomes y1, y2, ..., yk occurring with probabilities p1, p2, ..., pk
- The expectation of Y is:

  E[Y] = Σ_{i=1}^{k} y_i p_i

- For example: if Y is the value of a (fair) die roll:

  E[Y] = 1×(1/6) + 2×(1/6) + 3×(1/6) + 4×(1/6) + 5×(1/6) + 6×(1/6) = 3.5

- Suppose instead Y is a (continuous) random variable whose CDF F(y) admits density f(y)
- The expectation of Y is:

  E[Y] = ∫ y f(y) dy

- This is just a number!
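The die-roll expectation above can be checked in a couple of lines of R, both directly (outcome times probability) and by simulation:

```r
# E[Y] for a fair six-sided die: sum of outcome * probability
outcomes <- 1:6
probs <- rep(1 / 6, 6)
EY <- sum(outcomes * probs)   # 3.5

# The sample mean of many simulated rolls approaches E[Y]
set.seed(1)
rolls <- sample(outcomes, size = 1e5, replace = TRUE)
mean(rolls)
```

With 100,000 rolls the sample mean sits within a few hundredths of 3.5.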

The Conditional Expectation Function (CEF)
- We are often interested in the relationship between some outcome Y and a variable (or set of variables) X
- A useful summary is the conditional expectation function: E[Y|X]
  - Gives the expectation of Y when X takes any particular value
- Formally, if f_y(·|X) is the conditional density of Y|X:

  E[Y|X] = ∫ z f_y(z|X) dz

- E[Y|X] is a random variable itself: a function of the random X
  - Can think of it as E[Y|X] = h(X)
- Alternatively, evaluate it at particular values: for example X = 0.5
  - E[Y|X = 0.5] is just a number!
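When X is discrete, the CEF is simply the mean of Y within each value of X. A quick sketch (the data-generating model E[Y|X] = 2X is an assumption for illustration):

```r
# With a discrete X, E[Y|X] is just the mean of Y within each value of X
set.seed(1)
n <- 1e4
X <- sample(c(0, 0.5, 1), n, replace = TRUE)
Y <- 2 * X + rnorm(n)         # assumed toy model: E[Y|X] = 2X

cef <- tapply(Y, X, mean)     # one number per value of X
cef["0.5"]                    # E[Y|X = 0.5]: just a number, close to 1
```

`cef` as a whole is a function of X (the random-variable view); `cef["0.5"]` evaluated at one value is just a number, as the slide says.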

Unconditional Expectation of Height for Adults: E[H]
[Histogram of adult heights in inches (x-axis 54 to 78), with the mean marked: E[H] = 67.5 in.]

Conditional Expectation of Height by Age: E[H|Age]
[Plot: height (inches, 30 to 80) against age (0 to 40), with the conditional expectations E[H|Age] marked at ages 5, 10, 15, 20, 25, 30, 35, and 40]

Why the Conditional Expectation Function?
- E[Y|X] is not the only function that relates Y to X
- For example, consider the 95th percentile of Y given X: P95(Y|X)

[Plot: height distributions (54 to 78 inches) with P95[H|G=Male] and P95[H|G=Female] marked]

- But E[Y|X] has a bunch of nice properties

Property 1: The Law of Iterated Expectations
  E_X[E[Y|X]] = E[Y]

- Example: let Y be yearly wages for MSc graduates
  - E[Y] = £1,000,900
- Two values for X: {RMFE, Other}
  - Say 10% of MSc students are RMFE, 90% in other programs
  - E[Y|X = RMFE] = £10,000,000
  - E[Y|X = Other] = £1,000
- The expectation works like always (just over E[Y|X] instead of Y):

  E[E[Y|X]] = E[Y|X = RMFE]×P[X = RMFE] + E[Y|X = Other]×P[X = Other]
            = £10,000,000×0.1 + £1,000×0.9
            = £1,000,900
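The wage example, and the law of iterated expectations more generally, can be verified numerically: the probability-weighted average of the group means equals the overall mean exactly (the simulated wage magnitudes below are arbitrary assumptions):

```r
# The MSc wage example: E[E[Y|X]] as a probability-weighted average
p   <- c(Other = 0.9, RMFE = 0.1)            # P[X = x]
cef <- c(Other = 1000, RMFE = 10000000)      # E[Y|X = x]
EY  <- sum(cef * p)                          # 1,000,900

# The same law on simulated data: group means, weighted by group
# frequencies, recover the overall mean exactly
set.seed(1)
x <- sample(c("RMFE", "Other"), 1000, replace = TRUE, prob = c(0.1, 0.9))
y <- rnorm(1000, mean = ifelse(x == "RMFE", 10, 1))  # assumed toy wages
group_means <- tapply(y, x, mean)
group_freqs <- table(x) / length(x)
sum(group_means * group_freqs)               # equals mean(y)
```

Note `tapply` and `table` both order groups alphabetically, so the names line up before multiplying.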

Property 1: The Law of Iterated Expectations
  E[E[Y|X]] = E[Y]

- Not true, for example, for the 95th percentile: E[P95[Y|X]] ≠ P95[Y]

Property 2: The CEF Decomposition Property
- Any random variable Y can be broken down into two pieces:

  Y = E[Y|X] + ε

- Where the residual ε has the following properties:
  (i) E[ε|X] = 0 ("mean independence")
  (ii) ε is uncorrelated with any function of X
- Intuitively this property says we can break down Y into two parts:
  (i) The part of Y "explained by" X: E[Y|X]
    - This is the (potentially) useful part when predicting Y with X
  (ii) The part of Y unrelated to X: ε
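Both properties of the residual can be seen in a small simulation: with discrete X, subtracting the within-group mean leaves a residual that averages to zero in every group and is uncorrelated with any function of X (the quadratic model below is an assumption for illustration):

```r
# Decompose Y into E[Y|X] + eps with a discrete X
set.seed(1)
x <- sample(1:3, 1e4, replace = TRUE)
y <- x^2 + rnorm(1e4)          # assumed toy model: E[Y|X] = X^2
cef_hat <- ave(y, x)           # E[Y|X] evaluated at each observation's x
eps <- y - cef_hat

tapply(eps, x, mean)           # numerically zero in each group: E[eps|X] = 0
cor(eps, x^3)                  # numerically zero: uncorrelated with h(X)
```

Because the within-group sums of `eps` are exactly zero, the correlation with any function of `x` is zero up to floating point, not just approximately.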

Property 2: Proof
  Y = E[Y|X] + ε

(i) E[ε|X] = 0 ("mean independence"):

  ε = Y − E[Y|X]
  ⇒ E[ε|X] = E[Y − E[Y|X] | X] = E[Y|X] − E[Y|X] = 0

(ii) ε is uncorrelated with any function of X:

  Cov(ε, h(X)) = E[h(X)ε] − E[h(X)] E[ε]
               = E[h(X)ε]          (E[ε] = 0; how come? E[ε] = E[E[ε|X]] = 0)
               = E[E[h(X)ε|X]]     (iterated expectations)
               = E[h(X) E[ε|X]]
               = E[h(X)·0] = 0

Property 3: The CEF Prediction Property
- Out of any function of X, E[Y|X] is the best predictor of Y
- In other words, E[Y|X] is the "closest" function to Y on average
- What do we mean by closest?
  - Consider any function of X, say m(X)
  - m(X) is close to Y if the difference (or "error") is small: Y − m(X)
  - Close is about magnitude; treat positive/negative the same...
  - m(X) is also close to Y if the squared error is small: (Y − m(X))^2
- E[Y|X] is the closest, in this sense, in expectation:

  E[Y|X] = argmin_{m(X)} E[(Y − m(X))^2]

- "Minimum mean squared error"
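A quick in-sample illustration of the minimum-MSE claim: with a nonlinear CEF, the conditional mean beats a rival function of X, here a linear fit chosen arbitrarily as the competitor (the log model is an assumption for illustration):

```r
# The conditional mean has lower mean squared error than any other m(X);
# compare it to a linear fit as one arbitrary rival
set.seed(1)
x <- sample(1:5, 1e4, replace = TRUE)
y <- log(x) + rnorm(1e4)          # assumed nonlinear CEF: E[Y|X] = log(X)

m_cef <- ave(y, x)                # sample conditional mean, a function of x
m_lin <- fitted(lm(y ~ x))        # a rival function of x

mse <- function(e) mean(e^2)
mse(y - m_cef) <= mse(y - m_lin)  # TRUE: the CEF wins
```

In-sample the group means minimize squared error among all functions of x, so the inequality holds exactly, mirroring the population argument on the next slide.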

Property 3: Proof (Just for Fun)
- Out of any function of X, E[Y|X] is the best predictor of Y:

  E[Y|X] = argmin_{m(X)} E[(Y − m(X))^2]

- To see this, note:

  (Y − m(X))^2 = ([Y − E[Y|X]] + [E[Y|X] − m(X)])^2
               = [Y − E[Y|X]]^2 + [E[Y|X] − m(X)]^2
                 + 2 [E[Y|X] − m(X)] · [Y − E[Y|X]]
                   (call this h(X))   (this is ε)

  ⇒ E[(Y − m(X))^2] = E[(Y − E[Y|X])^2] + E[(E[Y|X] − m(X))^2] + 2 E[h(X)·ε]

  where the first term is unrelated to m(X), the second is minimized when m(X) = E[Y|X], and the last is 0 by the decomposition property

Summary: Why We Care About Conditional Expectation Functions
- Useful tool for describing the relationship between Y and X
- Several nice properties
- Most statistical tests come down to comparing E[Y|X] at certain values of X
  - Classic example: experiments

Part 2: Ordinary Least Squares
- Linear regression is arguably the most popular modeling approach across every field in the social sciences
  - Transparent, robust, relatively easy to understand
  - Provides a basis for more advanced empirical methods
  - Extremely useful when summarizing data
- Plenty of focus on the technical aspects of OLS last term
  - Focus today on an applied perspective

Review of OLS in Three Parts
1. Overview
  - Intuition and review of population and sample regression algebra
  - Connection with the conditional expectation function
  - Estimating a linear regression in R
2. An Example: Predicting Home Prices
3. Rounding Out Some Details
  - Scaling and implementation

OLS Part 1: Overview
[Scatter plot of Y against X]

OLS Estimator Fits a Line Through the Data
[The same scatter plot with the fitted line β0^OLS + β1^OLS X drawn through the data]

A Line Through the Data: Example in R
[R scatter plot: simulated data, Y (−5 to 10) against X (−2 to 2)]

A Line Through the Data: Example in R
[The same scatter plot with the fitted OLS line added]

How Do We Choose Which Line?
[Plot: a candidate line β0 + β1X through the data]

One Data Point
[Plot: the line β0 + β1X with one data point highlighted at x_i]

vi: Observation i’s Deviation from β0 +β1xi
[Plot: v_i, the vertical deviation of observation i from the line value β0 + β1 x_i]

One Data Point
[Plot: the full decomposition y_i = β0 + β1 x_i + v_i]

Choosing the Regression Line
- For any line β0 + β1X, the data point (y_i, x_i) may be written as:

  y_i = β0 + β1 x_i + v_i

- v_i will be big if β0 + β1 x_i is "far" from y_i
- v_i will be small if β0 + β1 x_i is "close" to y_i
- We refer to v_i as the residual

Choosing the (Population) Regression Line
  y_i = β0 + β1 x_i + v_i

- An OLS regression simply chooses the β0^OLS, β1^OLS that make v_i as "small" as possible on average
- How do we define "small"?
  - Want to treat positive/negative the same: consider v_i^2
- Choose β0^OLS, β1^OLS to minimize:

  E[v_i^2] = E[(y_i − β0 − β1 x_i)^2]

(Population) Regression Anatomy
  {β0^OLS, β1^OLS} = argmin_{β0, β1} E[(y_i − β0 − β1 x_i)^2]

- In this simple case with only one x_i, β1^OLS has an intuitive definition:

  β1^OLS = Cov(y_i, x_i) / Var(x_i)
  β0^OLS = ȳ − β1^OLS x̄
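The covariance-over-variance formula can be checked directly against `lm()` on simulated data (the true coefficients below are arbitrary assumptions):

```r
# The bivariate OLS slope is Cov(y, x) / Var(x); lm() agrees
set.seed(1)
x <- rnorm(500)
y <- 2 + 3 * x + rnorm(500)    # assumed true model for the check

b1 <- cov(y, x) / var(x)       # slope from the anatomy formula
b0 <- mean(y) - b1 * mean(x)   # intercept from the anatomy formula

fit <- lm(y ~ x)
c(b0, b1)                      # matches coef(fit) up to floating point
```

This only works one-for-one with a single regressor; with multiple regressors the slope on each variable involves partialling out the others.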

Regression Anatomy (Matrix Notation)
  y_i = β0 + β1 x_i + v_i

- You will often see more concise matrix notation:

  β = [β0, β1]'    X_i = [1, x_i]'    (both 2×1)

  y_i = X_i'β + v_i

- This lets us write the OLS coefficients as:

  β^OLS = argmin_β E[(y_i − X_i'β)^2]
  ⇒ β^OLS = E[X_i X_i']^{-1} E[X_i y_i]

(Sample) Regression Anatomy
  β^OLS = argmin_β E[(y_i − X_i'β)^2]
  β^OLS = E[X_i X_i']^{-1} E[X_i y_i]

- Usually we do not explicitly know these expectations, so we compute sample analogues:

  β̂^OLS = argmin_β (1/N) Σ_{i=1}^N (y_i − X_i'β)^2
  ⇒ β̂^OLS = (X'X)^{-1}(X'Y)

- Where

  X = [1 x_1; 1 x_2; ...; 1 x_N]  (N×2)    Y = [y_1; y_2; ...; y_N]  (N×1)
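The sample formula β̂^OLS = (X'X)^{-1}(X'Y) can be computed by hand with matrix algebra and compared to `lm()` (the simulated model is an assumption for illustration):

```r
# betahat = (X'X)^{-1} X'Y computed directly, vs. lm()
set.seed(1)
n <- 300
x <- rnorm(n)
y <- 1 - 2 * x + rnorm(n)                 # assumed true model for the check

X <- cbind(1, x)                          # N x 2 design matrix: (1, x_i) rows
betahat <- solve(t(X) %*% X) %*% (t(X) %*% y)

fit <- lm(y ~ x)
drop(betahat)                             # matches coef(fit) up to floating point
```

In practice `lm()` uses a QR decomposition rather than inverting X'X, which is more numerically stable, but the two agree here to floating-point precision.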

This Should (Hopefully) Look Familiar