
# Non-Parametrics

Chris Hansman

Empirical Finance: Methods and Applications, Imperial College Business School

March 1-2, 2021

## Non-Parametrics

1. Kernel Density Estimation
2. Non-Parametric Regression

## Kernel Density Estimation

1. Parametric vs. non-parametric approaches
2. Histograms and the uniform kernel
3. Different bandwidths
4. Different kernels

## Estimating Densities

- Suppose we see $n = 100$ draws from a continuous random variable $X$: $x_1, x_2, \dots, x_n$
- We are often interested in the distribution of $X$:
  - CDF: $F_X(u) = P(X \le u)$
  - PDF: $f_X(u) = \frac{dF_X(u)}{du}$
- How do we uncover the distribution of $X$ from the data?

## Scatter Plot of $x_1, x_2, \dots, x_n$

[Figure: scatter plot of the 100 draws]

## Estimating Densities

- How do we uncover the distribution of $X$ from the data? $x_1, x_2, \dots, x_n$
- Parametric Approach
  - One strategy is to assume we know the form of the distribution
    - e.g. Normal or $\chi^2$
  - But we don't know the particular parameters:
    - Use the data to estimate the unknown parameters
  - For example: we know $X \sim N(\mu, \sigma^2)$, but we don't know $\mu$ or $\sigma^2$
  - Estimate: $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  - Estimate: $\hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \hat{\mu})^2$
  - Plot $N(\hat{\mu}, \hat{\sigma}^2)$
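The parametric recipe above can be sketched in a few lines. The data here are a hypothetical stand-in for the slides' 100 draws (generated, for illustration, with the location and scale that the next slide reports as estimates), since the original sample is not reproduced:

```python
import numpy as np

# Hypothetical stand-in sample of n = 100 draws (the slides' data are not available).
rng = np.random.default_rng(0)
x = rng.normal(loc=-0.75, scale=9.24, size=100)
n = len(x)

mu_hat = x.sum() / n                              # sample mean
sigma2_hat = ((x - mu_hat) ** 2).sum() / (n - 1)  # unbiased sample variance

def normal_pdf(u, mu, sigma2):
    """Density of N(mu, sigma2) at u; plotting this over a grid gives the fitted curve."""
    return np.exp(-0.5 * (u - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
```

With the estimates in hand, the entire "parametric density" is pinned down: the only thing the data contribute are the two numbers $\hat{\mu}$ and $\hat{\sigma}^2$.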

## Normal Density with Estimated $\hat{\mu} = -0.75$, $\hat{\sigma} = 9.24$

[Figure: fitted normal density over the data range]

## Downsides of the Parametric Approach

- In practice, we often don't know the underlying distribution
  - e.g. The assumption of normality may provide a very bad fit
- Non-parametric Approach
  - No assumptions about the underlying distribution
  - Recover the density directly from the data
  - Simplest form: the histogram

## Histogram Built from $x_1, x_2, \dots, x_n$

[Figure: histogram of the 100 draws]

## Histograms

- Histograms (appropriately scaled) provide a non-parametric approach to the density
- But they have a few downsides:
  - They don't provide a smooth, continuous distribution
  - Lots of holes in the distribution when the bins are small
  - Uninformative when the bins are big
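The "appropriately scaled" point can be made concrete: a histogram whose bars are rescaled so their total area is 1 is itself a density estimate. A minimal sketch with hypothetical data standing in for the slides' sample:

```python
import numpy as np

# Hypothetical stand-in for the slides' 100 draws.
rng = np.random.default_rng(0)
x = rng.normal(size=100)

# density=True rescales bar heights so the bars integrate to 1,
# turning the histogram into a (piecewise-constant) density estimate.
counts, edges = np.histogram(x, bins=10, density=True)
widths = np.diff(edges)
total_area = (counts * widths).sum()  # equals 1 by construction
```

Changing `bins` reproduces the trade-off on the following slides: many narrow bins leave holes, few wide bins wash out the shape.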

## Histogram Built from $x_1, x_2, \dots, x_n$: Bin Size = 1

[Figure: histogram with bin size 1]

## Histogram Built from $x_1, x_2, \dots, x_n$: Bin Size = 5

[Figure: histogram with bin size 5]

## Histogram Built from $x_1, x_2, \dots, x_n$: Bin Size = 20

[Figure: histogram with bin size 20]

## Kernel Density Estimation: Uniform Kernel

- To uncover smoother non-parametric densities we use a technique called kernel density estimation
- There are many different versions ("choices of kernel"), but let's start with one very similar to a histogram
- Suppose we are interested in estimating $\hat{f}(u)$ for any $u$
- First, let's count how many $x_i$ are "near" $u$
- We'll define "near" as within $\frac{1}{2}$ of $u$ in either direction:

$$\text{number of } x_i \text{ near } u = \sum_{i=1}^{n} \mathbf{1}\left\{|u - x_i| \le \tfrac{1}{2}\right\}$$

[Figure: the interval from $u - 1/2$ to $u + 1/2$ around $u$]

## Kernel Density Estimation: Uniform Kernel

- To turn this count into a density, just scale by $n$:

$$\hat{f}(u) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\left\{|u - x_i| \le \tfrac{1}{2}\right\}$$

- The average number of $x_i$ near $u$ (per unit of $x$), scaled by the $n$ observations
- A density
- Note that $\int_{-\infty}^{\infty} \hat{f}(u)\,du = 1$
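This estimator is one line of code: the share of observations within $1/2$ of $u$. A sketch, with hypothetical data in place of the slides' sample, including a numerical check that the estimate integrates to 1:

```python
import numpy as np

# Hypothetical stand-in sample.
rng = np.random.default_rng(0)
x = rng.normal(size=100)

def f_hat(u, x):
    """Uniform-kernel estimate: share of observations within 1/2 of u."""
    return np.mean(np.abs(u - x) <= 0.5)

# Riemann-sum check that f_hat integrates to (approximately) 1:
# each x_i contributes an interval of width 1 and height 1/n.
step = 0.01
grid = np.arange(x.min() - 1, x.max() + 1, step)
area = sum(f_hat(u, x) for u in grid) * step
```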

## Kernel Density Estimation: Uniform Kernel

[Figure: uniform-kernel density estimate of the 100 draws]

## Kernel Density Estimation: Uniform Kernel

- Naturally, we can adjust the definition of "near" depending on the context
- For example, define "near" as within 1 of $u$ in either direction:

$$\sum_{i=1}^{n} \mathbf{1}\{|u - x_i| \le 1\}$$

- Doubling "near" $\Rightarrow$ divide by 2 to keep things comparable:

$$\sum_{i=1}^{n} \frac{\mathbf{1}\{|u - x_i| \le 1\}}{2}$$

  - The number of $x_i$ near $u$ per unit of $x$
- To get a density:

$$\hat{f}(u) = \frac{1}{n}\sum_{i=1}^{n} \frac{\mathbf{1}\{|u - x_i| \le 1\}}{2}$$

## Kernel Density Estimation: Uniform Kernel

- We call the function

$$K(z) = \frac{\mathbf{1}\{|z| \le 1\}}{2}$$

  the uniform (or box, or rectangular) kernel
- Note that above we evaluate:

$$K(u - x_i) = \frac{\mathbf{1}\{|u - x_i| \le 1\}}{2}$$

- We can write the density in terms of the kernel:

$$\hat{f}(u) = \frac{1}{n}\sum_{i=1}^{n} K(u - x_i)$$
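Written this way, the estimator is just an average of kernel evaluations. A minimal sketch of the uniform kernel and the resulting estimator:

```python
import numpy as np

def K(z):
    """Uniform (box) kernel: height 1/2 on [-1, 1], zero elsewhere.
    Symmetric and integrates to 1."""
    return 0.5 * (np.abs(np.asarray(z, dtype=float)) <= 1)

def f_hat(u, x):
    """Density estimate as the average of kernel weights across observations."""
    return np.mean(K(u - x))
```

Swapping in a different `K` (Gaussian, Epanechnikov, ...) changes the estimator without touching `f_hat`, which is the point of writing it in kernel form.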

## What Defines a Kernel?

- Typically, a kernel is a function $K(\cdot)$ that satisfies two properties:
  1. $K(\cdot)$ integrates to 1: $\int_{-\infty}^{\infty} K(z)\,dz = 1$
  2. Symmetry: $K(-z) = K(z)$
- You can think of it as a weighting function

## Kernel Density Estimation: Different Bandwidths

$$K(u - x_i) = \frac{\mathbf{1}\{|u - x_i| \le 1\}}{2}$$

- By adjusting the definition of "near" $u$, we get smoother densities
- For example, define "near" as within 3:

$$\text{number of } x_i \text{ within 3 of } u = \sum_{i=1}^{n} \mathbf{1}\left\{\frac{|u - x_i|}{3} \le 1\right\}$$

- Average number of $x_i$ near $u$ (per unit):

$$\frac{\text{number of } x_i \text{ near } u}{\text{unit}} = \frac{1}{6}\sum_{i=1}^{n} \mathbf{1}\left\{\frac{|u - x_i|}{3} \le 1\right\} = \frac{1}{3}\sum_{i=1}^{n} K\left(\frac{u - x_i}{3}\right)$$

- Then we can estimate the density as:

$$\hat{f}(u) = \frac{1}{n} \cdot \frac{1}{3}\sum_{i=1}^{n} K\left(\frac{u - x_i}{3}\right)$$

## Uniform Kernel Density Estimation: Bandwidth = 3

[Figure: uniform-kernel density estimate with bandwidth 3]

## Kernel Density Estimation: Different Bandwidths

$$K(u - x_i) = \frac{\mathbf{1}\{|u - x_i| \le 1\}}{2}$$

- In general, we can estimate our density as:

$$\hat{f}_h(u) = \frac{1}{n} \cdot \frac{1}{h}\sum_{i=1}^{n} K\left(\frac{u - x_i}{h}\right) = \frac{1}{n}\sum_{i=1}^{n} K_h(u - x_i)$$

- We call $h$ the bandwidth
- Larger bandwidth $\Rightarrow$ smoother
- Note that for any choice of $h$:

$$\int_{-\infty}^{\infty} \hat{f}_h(u)\,du = 1$$
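The general bandwidth version adds one parameter to the earlier sketch. A check (on hypothetical stand-in data) that the estimate integrates to 1 for several choices of $h$:

```python
import numpy as np

def K(z):
    """Uniform kernel on [-1, 1]."""
    return 0.5 * (np.abs(z) <= 1)

def f_hat(u, x, h):
    """Kernel density estimate with bandwidth h; larger h gives a smoother curve."""
    return np.mean(K((u - x) / h)) / h

# Hypothetical stand-in sample.
rng = np.random.default_rng(0)
x = rng.normal(size=100)

# For any h, the estimate integrates to 1 (up to Riemann-sum error).
step = 0.01
grid = np.arange(x.min() - 10, x.max() + 10, step)
areas = {h: sum(f_hat(u, x, h) for u in grid) * step for h in (1, 3, 6)}
```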

## Kernel Density Estimation: Bandwidth = 6

[Figure: uniform-kernel density estimate with bandwidth 6]

## Kernel Density Estimation: Different Kernels

- The uniform kernel is one of the simplest:

$$\frac{1}{n}\sum_{i=1}^{n} K(u - x_i) = \frac{1}{n}\sum_{i=1}^{n} \frac{\mathbf{1}\{|u - x_i| \le 1\}}{2}$$

- There are many other choices of kernel that do a better job
- In fact, we can choose any function $K(z)$ such that:

$$\int_{-\infty}^{\infty} K(z)\,dz = 1$$

- Common choice: Gaussian

$$K(z) = \phi(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}$$
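A sketch of the Gaussian choice; relative to the uniform version, swapping the kernel function is the only change:

```python
import numpy as np

def K_gauss(z):
    """Gaussian kernel: the standard normal pdf, phi(z)."""
    return np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)

def f_hat(u, x, h):
    """Gaussian KDE with bandwidth h: every observation gets a positive weight."""
    return np.mean(K_gauss((u - x) / h)) / h
```

Because $\phi$ is strictly positive, the resulting density estimate is smooth everywhere, unlike the step-shaped uniform-kernel estimate.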

## Kernel Density Estimation: Different Kernels

- For any choice of $K_h$: $K_h(u - x_i)$ gives a weight for observation $x_i$
- Uniform ($h = 1$):
  - Weight $= \frac{1}{2}$ if $x_i$ is within 1 of $u$
  - 0 otherwise
- Gaussian:
  - Weight is positive for all $x_i$
  - But declines with distance from $u$
- By taking the average of these weights (across all $x_i$), we get an estimate of the density at any point $u$:

$$\hat{f}_h(u) = \frac{1}{n}\sum_{i=1}^{n} K_h(u - x_i)$$

## Different Kernels

## Different Kernels

[Figure: density estimates from different kernels overlaid on the data]

## Kernel Density Estimation: Epanechnikov

- A frequently used kernel is the Epanechnikov:

$$K(z) = \frac{3}{4}\left(1 - z^2\right)\,\mathbf{1}\{|z| \le 1\}$$
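Assuming the standard form of the Epanechnikov kernel, $K(z) = \frac{3}{4}(1 - z^2)$ on $[-1, 1]$ and zero outside (the slide's own formula is truncated here), a sketch:

```python
import numpy as np

def K_epanechnikov(z):
    """Epanechnikov kernel in its standard form: (3/4)(1 - z^2) on [-1, 1],
    zero elsewhere. Symmetric and integrates to 1."""
    z = np.asarray(z, dtype=float)
    return 0.75 * (1 - z ** 2) * (np.abs(z) <= 1)
```

Like the uniform kernel it has bounded support, but it tapers smoothly to zero at the edges instead of jumping.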