
MAST 90138: MULTIVARIATE STATISTICAL TECHNIQUES 1 INTRODUCTION

Goal of the course: Describe, understand and discover properties of data in p > 1 dimensions.

We are interested in analysing a sample of n random vectors X1, …, Xn (generically denoted by Xi for i = 1, …, n), each belonging to Rp:

X1 = (X11, …, X1p) ∈ Rp
X2 = (X21, …, X2p) ∈ Rp
⋮
Xn = (Xn1, …, Xnp) ∈ Rp .

This means that we have a sample of n individuals, and that for each individual we observe p variables (sometimes called features).

Lecture notes originally by Prof. 1

Example:

• Consider a health study involving n = 100 patients.

• On each patient we measure p = 4 quantities: age, weight, body mass index, systolic blood pressure.

• For the ith individual, where i = 1, …, 100, we observe Xi = (Xi1, Xi2, Xi3, Xi4).

• Xi1 = age of ith patient
Xi2 = weight of ith patient
Xi3 = body mass index of ith patient
Xi4 = systolic blood pressure of ith patient.


• Often we gather all the observations into a single n × p matrix

X = ( X11 … X1p )
    ( X21 … X2p )
    (  ⋮       ⋮ )
    ( Xn1 … Xnp )

Each Xij, for i = 1, …, n and j = 1, …, p, is a random variable. X has n rows and p columns.

The ith row represents the p variables corresponding to the ith individual; the jth column represents the jth variable for all n individuals.

• We call the value taken by Xi = (Xi1, …, Xip) the observed value, or realization.

• Often we use a lower case letter to denote the observed value, i.e. xi = (xi1, …, xip) is the realization of the random vector Xi.
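The health-study setup above can be sketched numerically. The course's computations are typically done in R; the following is an equivalent illustration in Python with NumPy, with purely made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 4  # 100 patients, 4 variables per patient

# Simulated n x p data matrix X: row i holds (age, weight, BMI, systolic BP)
# for the i-th patient.  The ranges below are illustrative only.
X = np.column_stack([
    rng.uniform(20, 80, n),    # age
    rng.uniform(50, 110, n),   # weight (kg)
    rng.uniform(18, 35, n),    # body mass index
    rng.uniform(95, 160, n),   # systolic blood pressure
])

print(X.shape)   # (100, 4): n rows (individuals), p columns (variables)
x_1 = X[0]       # realization x_1 of the random vector X_1
print(x_1.shape) # (4,): the p observed variables for patient 1
```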


What we will do in the course:

• The techniques learned in basic statistics courses are not focused on multivariate data.

• E.g. how would we draw a scatterplot in more than p = 2 dimensions? How can we represent multivariate data graphically?

• In this course we will learn techniques especially designed for multi- variate data.


• Revision of basic matrix results needed in the course.

• Revision of basic results about multivariate data and multinormal distributions.

As these are just revisions, we won't spend much time on examples, but the reference book has examples.


2 REVIEW OF MATRIX PROPERTIES

Sections 2.1, 2.2, 2.3, 2.4, 2.6 and 2.7 in Härdle and Simar (2015).

2.1 ELEMENTARY OPERATIONS

A matrix A is a system of numbers with n rows and p columns:

A = ( a11 a12 … a1p )
    ( a21 a22 … a2p )
    (  ⋮           ⋮ )
    ( an1 an2 … anp )

We can refer to A as (aij) or (aij)1≤i≤n, 1≤j≤p.

The dimension of the matrix is denoted by n × p, where n is the number of rows and p is the number of columns.


We use Aᵀ to denote the transpose of the matrix A:

Aᵀ = ( a11 a21 … an1 )
     ( a12 a22 … an2 )
     (  ⋮           ⋮ )
     ( a1p a2p … anp )

(rows become columns and columns become rows).


Special matrices and vectors (Table 2.1, taken from Härdle and Simar):

| Name | Definition | Notation |
|---|---|---|
| Scalar | p = n = 1 | a |
| Column vector | p = 1 | a |
| Row vector | n = 1 | aᵀ |
| Vector of ones | (1, …, 1)ᵀ, n entries | 1ₙ |
| Vector of zeros | (0, …, 0)ᵀ, n entries | 0ₙ |
| Square matrix | n = p | A(p × p) |
| Diagonal matrix | aij = 0 for i ≠ j, n = p | diag(aii) |
| Identity matrix | diag(1, …, 1), p entries | Ip |
| Unit matrix | aij = 1, n = p | 1ₙ1ₙᵀ |
| Symmetric matrix | aij = aji | |
| Null matrix | aij = 0 | 0 |
| Upper triangular matrix | aij = 0 for i > j | |
| Idempotent matrix | AA = A | |
| Orthogonal matrix | AᵀA = I = AAᵀ | |

We also write (aij) for A, and A(n × p) to indicate the numbers of rows and columns. Vectors are matrices with one column and are denoted as x or x(p × 1). Note that we use small letters for scalars as well as for vectors.

Matrix operations:

• Sum of two n × p matrices A = (aij) and B = (bij):

D = A + B = ( a11+b11 a12+b12 … a1p+b1p )
            (    ⋮                     ⋮ )
            ( an1+bn1 an2+bn2 … anp+bnp )

Thus the (i, j)th element dij of D (found at the ith row and jth column of D) is equal to aij + bij, the sum of the corresponding elements of A and B. D is also an n × p matrix.

• Likewise,

A − B = ( a11−b11 a12−b12 … a1p−b1p )
        (    ⋮                     ⋮ )
        ( an1−bn1 an2−bn2 … anp−bnp )


• A constant c times an n × p matrix A = (aij) gives the following n × p matrix:

cA = ( ca11 ca12 … ca1p )
     (   ⋮             ⋮ )
     ( can1 can2 … canp )

so that each aij is multiplied by c.

• Product of two matrices A, an n × p matrix, and B, a p × m matrix: AB = C, where C is an n × m matrix whose (i, j)th element, for i = 1, …, n and j = 1, …, m, is given by:

cij = Σ_{k=1}^p aik bkj .
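The element-by-element formula for cij can be checked directly against a library matrix product. A small sketch in Python/NumPy (illustrative random matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 3, 4, 2
A = rng.standard_normal((n, p))
B = rng.standard_normal((p, m))

# Compute C = AB element by element from c_ij = sum_k a_ik * b_kj
C = np.empty((n, m))
for i in range(n):
    for j in range(m):
        C[i, j] = sum(A[i, k] * B[k, j] for k in range(p))

# The result matches NumPy's built-in matrix product
assert np.allclose(C, A @ B)
print(C.shape)  # (3, 2): an n x m matrix
```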


• Scalar product (or dot product) of two p-dimensional vectors x = (x1, …, xp)ᵀ (a column vector) and y = (y1, …, yp)ᵀ:

xᵀy = Σ_{j=1}^p xj yj

is a real number (not a vector).

Properties of matrix operations:

A + B = B + A
A(B + C) = AB + AC
A(BC) = (AB)C
(Aᵀ)ᵀ = A
(AB)ᵀ = BᵀAᵀ

BUT, in general:

AC ≠ CA
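These properties can be verified numerically. A quick Python/NumPy check (illustrative random matrices; note that for random matrices, AC = CA essentially never holds):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))
C = rng.standard_normal((2, 2))

assert np.allclose(A + B, B + A)                # commutativity of addition
assert np.allclose(A @ (B + C), A @ B + A @ C)  # distributivity
assert np.allclose(A @ (B @ C), (A @ B) @ C)    # associativity
assert np.allclose(A.T.T, A)                    # (A^T)^T = A
assert np.allclose((A @ B).T, B.T @ A.T)        # (AB)^T = B^T A^T

# But matrix multiplication is not commutative in general:
print(np.allclose(A @ C, C @ A))  # almost surely False for random matrices
```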


Matrix Characteristics:

• Rank: The rank of an n × p matrix A, denoted by rank(A), is defined as the maximum number of linearly independent rows (or columns).

A set of k rows (or columns) a1, …, ak of A is said to be linearly independent if none of them can be expressed as a linear combination of the other k − 1 rows (or columns): it is not possible to write

aj = Σ_{i ≠ j} ci ai

for any choice of coefficients ci, or equivalently

Σ_{i=1}^k ci ai = 0  ⇒  c1 = … = ck = 0.

We always have

rank(A) ≤ min(n, p) .


• Trace: The trace of a square p × p matrix A, denoted by tr(A), is the sum of its diagonal elements:

tr(A) = Σ_{i=1}^p aii .

• Determinant: The determinant of a square p × p matrix A, denoted by det(A) or |A|, is a number computed from the matrix which plays an important role in all sorts of problems. For a 2 × 2 matrix

A = ( a11 a12 )
    ( a21 a22 )

it is computed by

|A| = a11 a22 − a21 a12 .

For larger matrices, we compute it recursively. If you have forgotten how a determinant is computed, see Appendix A.2.3 of Mardia et al. (1979) or
https://en.wikipedia.org/wiki/Determinant


• Inverse: If |A| ≠ 0, the inverse of a square p × p matrix A exists. It is denoted by A−1 and is such that

AA−1 = A−1A = Ip

(Ip = p × p identity matrix).

See Härdle and Simar, page 57, for how to compute the inverse.

We have

|A−1| = 1/|A| .
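Rank, trace, determinant and inverse are all available in standard numerical libraries. A sketch in Python/NumPy on a small concrete matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(np.linalg.matrix_rank(A))  # 2 (rows are linearly independent)
print(np.trace(A))               # 1 + 4 = 5.0
print(np.linalg.det(A))          # 1*4 - 3*2 = -2.0 (up to rounding)

# |A| != 0, so the inverse exists and satisfies A A^{-1} = A^{-1} A = I_p
Ainv = np.linalg.inv(A)
assert np.allclose(A @ Ainv, np.eye(2))
assert np.allclose(Ainv @ A, np.eye(2))

# |A^{-1}| = 1/|A|
assert np.isclose(np.linalg.det(Ainv), 1 / np.linalg.det(A))
```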


• Eigenvalues and eigenvectors of a square p × p matrix A:

The (non-zero) p × 1 vector v is an eigenvector of A with eigenvalue λ if

Av = λv .

Note that λ is a number (not a vector).

If the matrix A is symmetric then there are p real eigenvalues and eigenvectors. The eigenvalues are not necessarily all different from each other. All eigenvalues satisfy

|A − λIp| = 0

(they are the p roots of the above polynomial of order p in λ).

In practice we can compute them with software, e.g. R.

Constant multiples of an eigenvector v with eigenvalue λ are also eigenvectors with eigenvalue λ.

Therefore we usually define eigenvectors to be scaled so that they have norm 1:

∥v∥ = √(vᵀv) = 1 .


• Suppose the square p × p matrix A has eigenvalues λ1, …, λp.

Let Λ be the diagonal matrix Λ = diag(λ1, …, λp), with the λi's on the diagonal and 0 everywhere else. Then we have

det(A) = |A| = |Λ| = Π_{i=1}^p λi

and

tr(A) = tr(Λ) = Σ_{i=1}^p λi .
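These facts are easy to check numerically. A Python/NumPy sketch on a small symmetric matrix (eigh is NumPy's routine for symmetric matrices and returns real eigenvalues in ascending order):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric, so p real eigenvalues exist

lam, V = np.linalg.eigh(A)
print(lam)  # approximately [1, 3], in ascending order

# Each column of V is an eigenvector of norm 1 satisfying A v = lambda v
for j in range(2):
    assert np.allclose(A @ V[:, j], lam[j] * V[:, j])
    assert np.isclose(np.linalg.norm(V[:, j]), 1.0)

# det(A) = product of eigenvalues, tr(A) = sum of eigenvalues
assert np.isclose(np.linalg.det(A), np.prod(lam))
assert np.isclose(np.trace(A), np.sum(lam))
```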


Properties of Matrix Characteristics

For A(n × n), B(n × n), c ∈ R:

tr(A + B) = tr A + tr B
tr(cA) = c tr A
|cA| = cⁿ|A|
|AB| = |BA| = |A||B|

For A(n × p), B(p × n):

tr(AB) = tr(BA)
rank(A) ≤ min(n, p)
rank(A) ≥ 0
rank(A) = rank(Aᵀ)
rank(AᵀA) = rank(A)
rank(A + B) ≤ rank(A) + rank(B)
rank(AB) ≤ min{rank(A), rank(B)}

For A(n × p), B(p × q), C(q × n):

tr(ABC) = tr(BCA) = tr(CAB)
rank(ABC) = rank(B) for nonsingular (square) A and C

For A(p × p):

|A⁻¹| = |A|⁻¹
rank(A) = p if and only if A is nonsingular.

Summary

→ The determinant |A| is the product of the eigenvalues of A.
→ The inverse of a matrix A exists if |A| ≠ 0.
→ The trace tr(A) is the sum of the eigenvalues of A.

2.2 SPECTRAL DECOMPOSITIONS

Spectral decomposition

• Spectral decomposition: Suppose A is a square and symmetric p × p matrix and let λ1, …, λp denote its p eigenvalues and v1, …, vp denote the associated p × 1 eigenvectors, of norm 1 and orthogonal to each other.

Note: two p × 1 vectors v and w are orthogonal if

vᵀw = Σ_{i=1}^p vi wi = 0 .


Then we can always express A in the following way, which is called the spectral decomposition of A:

A = Σ_{j=1}^p λj vj vjᵀ .

This can also be written in matrix form: if we let

Λ = diag(λ1, …, λp)

and

Γ = (v1, …, vp),

the p × p orthogonal matrix whose columns are the p eigenvectors, then

A = ΓΛΓᵀ .


• In the above notation, if A = ΓΛΓᵀ, then taking a power of A, for example Aᵅ, we find

Aᵅ = ΓΛᵅΓᵀ .

This is because the vj's are orthogonal and of norm 1. For example,

A² = ΓΛΓᵀ ΓΛΓᵀ = ΓΛ²Γᵀ .

This also works for negative powers if A is invertible (which happens if and only if the eigenvalues are all nonzero). For example,

A⁻¹ = ΓΛ⁻¹Γᵀ

(A⁻¹: the inverse of the matrix A).

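The spectral decomposition and the power rule above can be verified numerically. A Python/NumPy sketch on a small symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric

lam, Gamma = np.linalg.eigh(A)  # A = Gamma Lambda Gamma^T
Lam = np.diag(lam)

# Spectral decomposition: A = sum_j lambda_j v_j v_j^T
A_rebuilt = sum(lam[j] * np.outer(Gamma[:, j], Gamma[:, j]) for j in range(2))
assert np.allclose(A, A_rebuilt)
assert np.allclose(A, Gamma @ Lam @ Gamma.T)

# Powers: A^alpha = Gamma Lambda^alpha Gamma^T, e.g. alpha = 2
assert np.allclose(A @ A, Gamma @ np.diag(lam**2) @ Gamma.T)

# Negative powers work because all eigenvalues are nonzero here:
assert np.allclose(np.linalg.inv(A), Gamma @ np.diag(1 / lam) @ Gamma.T)
```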

Singular value decomposition

More generally, a similar decomposition exists for matrices that are not necessarily square. In particular, any n × p matrix A with rank r can be decomposed as

A = ΓΛ∆ᵀ ,

where the n × r matrix Γ and the p × r matrix ∆ are column orthonormal, which means that their columns are orthonormal, that is,

Γᵀ Γ = ∆ᵀ ∆ = Ir ,

and

Λ = diag(λ1^{1/2}, …, λr^{1/2}),

where each λi > 0.

The λi's are the nonzero eigenvalues of the matrices AAᵀ and AᵀA; the columns of Γ and ∆ are the corresponding r eigenvectors of those matrices.

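A numerical check of the singular value decomposition, sketched in Python/NumPy (illustrative random matrix; `full_matrices=False` returns the n × r and p × r column-orthonormal factors):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))  # n = 5, p = 3; rank r = 3 almost surely

Gamma, s, DeltaT = np.linalg.svd(A, full_matrices=False)
Delta = DeltaT.T

# A = Gamma Lambda Delta^T with Lambda = diag(s)
assert np.allclose(A, Gamma @ np.diag(s) @ Delta.T)

# Column orthonormality: Gamma^T Gamma = Delta^T Delta = I_r
assert np.allclose(Gamma.T @ Gamma, np.eye(3))
assert np.allclose(Delta.T @ Delta, np.eye(3))

# s**2 are the nonzero eigenvalues of A^T A (and of A A^T)
eigs = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # descending, like s
assert np.allclose(s**2, eigs)
```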

2.3 QUADRATIC FORMS

• A quadratic form Q(x) of the p-vector x = (x1, …, xp)ᵀ is defined by

Q(x) = Σ_{i=1}^p Σ_{j=1}^p aij xi xj = xᵀAx ,

where aij is the (i, j)th element of a symmetric p × p matrix A.

• If

Q(x) ≥ 0 for all x ≠ 0,

then the matrix A is called positive semi-definite, which is denoted by A ≥ 0.

• However, if the quadratic form satisfies

Q(x) > 0 for all x ≠ 0,

then the matrix A is called positive definite, which is denoted by A > 0.

• A > 0 is equivalent to all the eigenvalues of A being positive: λ1 > 0, …, λp > 0. Then |A| > 0 and A⁻¹ exists.

• If A ≥ 0 and rank(A) = r < p, then p − r eigenvalues of A are equal to zero while the other r are strictly positive.
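The eigenvalue characterization gives a practical way to check positive definiteness. A Python/NumPy sketch (illustrative matrices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Q(x) = x^T A x is positive for a few nonzero test vectors x
for x in [np.array([1.0, 0.0]), np.array([1.0, -1.0]), np.array([-2.0, 3.0])]:
    assert x @ A @ x > 0

# Equivalent check: all eigenvalues positive  =>  A > 0
lam = np.linalg.eigvalsh(A)
assert np.all(lam > 0)
assert np.linalg.det(A) > 0  # hence |A| > 0 and A^{-1} exists

# A rank-deficient A >= 0: rank 1, so p - r = 1 eigenvalue equals zero
B = np.array([[1.0, 1.0],
              [1.0, 1.0]])
lamB = np.linalg.eigvalsh(B)
assert np.isclose(min(lamB), 0.0) and max(lamB) > 0
```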
2.4 GEOMETRICAL ASPECTS

Distance

• The Euclidean distance d(x, y) between two vectors x, y ∈ Rp is defined by

d(x, y) = √( Σ_{i=1}^p (xi − yi)² ) = √( (x − y)ᵀ(x − y) ) .

Example in R², where x = (x1, x2) and y = (y1, y2): (figure omitted).

• A weighted version of this distance can be defined as

d(x, y) = √( Σ_{i=1}^p wi (xi − yi)² ) = √( (x − y)ᵀW(x − y) ) ,

where each wi > 0 and W = diag(w1, …, wp).

• This can be further generalised into the following distance:

d(x, y) = √( (x − y)ᵀA(x − y) ) ,

where A is a positive definite matrix.
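Both forms of the Euclidean distance agree, as does the weighted version. A Python/NumPy sketch on concrete vectors:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

# Euclidean distance: coordinate form and matrix form agree
d1 = np.sqrt(np.sum((x - y) ** 2))
d2 = np.sqrt((x - y) @ (x - y))
assert np.isclose(d1, d2)
print(d1)  # sqrt(9 + 16 + 0) = 5.0

# Weighted distance with W = diag(w_1, ..., w_p), each w_i > 0
w = np.array([1.0, 0.25, 4.0])
W = np.diag(w)
dw = np.sqrt((x - y) @ W @ (x - y))
assert np.isclose(dw, np.sqrt(np.sum(w * (x - y) ** 2)))
```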

Norm

• The (Euclidean) norm of a vector x ∈ Rp is defined by

∥x∥ = √( Σ_{i=1}^p xi² ) = √(xᵀx) .

• A unit vector is a vector of norm 1.

• This can be generalised into a norm with respect to a positive definite matrix A:

∥x∥A = √(xᵀAx) .

Angle between two vectors

• The angle θ between two vectors x, y ∈ Rp is defined through the cosine of θ:

cos(θ) = xᵀy / (∥x∥ ∥y∥) .

If xᵀy = 0, then the angle θ is equal to π/2 (the vectors are orthogonal).

• From trigonometry, in a right-angled triangle ACB with right angle at C, the cosine of the angle at A is equal to the length b of the segment AC divided by the length c of the hypotenuse AB. Applied to the triangle formed by x and its projection on y, this gives

|cos(θ)| = ∥px∥ / ∥x∥ ,

where px is called the projection of x on y.

• Since cos(θ) = xᵀy/(∥x∥∥y∥), we can find px by taking

px = (cos(θ) ∥x∥) y/∥y∥ = (xᵀy / ∥y∥²) y ,

where y/∥y∥ is the unit vector in the direction of y.

• With respect to a general metric A, the angle is defined through

cos(θ) = xᵀAy / (∥x∥A ∥y∥A) ,

and px = (xᵀAy / ∥y∥A²) y is the projection of x on y with respect to the metric A.
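The angle and projection formulas can be checked on concrete vectors. A Python/NumPy sketch:

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([1.0, 0.0])

# cos(theta) = x^T y / (||x|| ||y||)
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_theta)  # 3/5 = 0.6

# Projection of x on y: p_x = (x^T y / ||y||^2) y
p_x = (x @ y) / (y @ y) * y
print(p_x)  # [3. 0.]

# |cos(theta)| = ||p_x|| / ||x||
assert np.isclose(abs(cos_theta), np.linalg.norm(p_x) / np.linalg.norm(x))

# The residual x - p_x is orthogonal to y
assert np.isclose((x - p_x) @ y, 0.0)
```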
Rotation

• When we work with vectors in Rp we generally describe them through a system of p axes (see figure on p.9 here for example: the arrows repre- sent the axes) and give the coordinates of x in that p coordinate system.

• In multivariate statistics it is sometimes useful to rotate the axes (all of them at the same time) by an angle θ, creating in this way a new p coordinate system.

• In R², we can describe a rotation of angle θ via the orthogonal matrix

Γ = (  cos(θ)  sin(θ) )
    ( −sin(θ)  cos(θ) ) .


Specifically, if the original set of axes is rotated counter-clockwise about the origin by an angle θ, then the new coordinates y of a point with coordinates x in the original system of axes are given by

y = Γx .

If the rotation is clockwise, then instead we have

y = Γᵀx .

• More generally, premultiplying a vector x by an orthogonal matrix Γ geometrically corresponds to a rotation of the system of axes.

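A rotation by θ = π/2 makes this concrete. A Python/NumPy sketch:

```python
import numpy as np

theta = np.pi / 2  # rotate the axes counter-clockwise by 90 degrees

Gamma = np.array([[ np.cos(theta), np.sin(theta)],
                  [-np.sin(theta), np.cos(theta)]])

# Gamma is orthogonal: Gamma^T Gamma = I
assert np.allclose(Gamma.T @ Gamma, np.eye(2))

x = np.array([1.0, 0.0])
y = Gamma @ x            # coordinates of the same point in the rotated axes
print(np.round(y, 10))   # [ 0. -1.]

# A clockwise rotation (Gamma^T) undoes it
assert np.allclose(Gamma.T @ y, x)
```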

3 MEAN, COVARIANCE, CORRELATION

Sections 3.1, 3.2, 3.3 in Härdle and Simar (2015).

3.1 MEAN

• The mean μ ∈ Rp of a random vector X = (X1, …, Xp)ᵀ is defined by

μ = ( μ1 )   ( E(X1) )
    (  ⋮ ) = (   ⋮   ) .
    ( μp )   ( E(Xp) )

• In practice we can't compute μ (we don't observe the whole population), but we can estimate it from a sample X1, …, Xn by the sample mean

X̄ = ( X̄1 )
    (  ⋮ ) ,
    ( X̄p )

where, for j = 1, …, p,

X̄j = (1/n) Σ_{i=1}^n Xij

is the sample mean of the jth component Xj.

• Recall the notation

X = ( X11 … X1p )
    ( X21 … X2p )
    (  ⋮       ⋮ )
    ( Xn1 … Xnp )

and 1n = (1, …, 1)ᵀ, a column vector of length n. We can express X̄ in matrix notation as

X̄ = n⁻¹ Xᵀ 1n .
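The matrix form of the sample mean agrees with the column-wise averages. A Python/NumPy sketch (illustrative random data matrix):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 4
X = rng.standard_normal((n, p))  # illustrative n x p data matrix

ones = np.ones(n)                 # the vector 1_n
xbar_matrix = (1 / n) * X.T @ ones  # matrix form: n^{-1} X^T 1_n
xbar_direct = X.mean(axis=0)        # column-wise sample means

assert np.allclose(xbar_matrix, xbar_direct)
print(xbar_matrix.shape)  # (4,): one sample mean per variable
```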

3.2 COVARIANCE MATRIX

• The covariance σXY between two random variables X and Y is a measure of the linear dependence between those random variables:

σXY = cov(X, Y) = E(XY) − E(X)E(Y) .

In particular, σXX = var(X).

If X and Y are independent then σXY = 0. However, σXY = 0 does not imply that X and Y are independent (there could be a nonlinear dependence).

[Figures: scatterplots of Y against X illustrating positive covariance and negative covariance.]
Can you picture a situation where X and Y are dependent but have zero covariance?

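One standard answer to the question above: take X symmetric about 0 and Y = X². Then Y is a deterministic function of X (so the two are clearly dependent), yet cov(X, Y) = E(X³) − E(X)E(X²) = 0. A quick simulation check in Python/NumPy (the course's computations would typically use R):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal(100_000)  # X ~ N(0, 1), symmetric about 0
Y = X ** 2                        # Y depends entirely on X

# Yet the (sample) covariance is close to 0, because E(X^3) = 0
cov_XY = np.cov(X, Y)[0, 1]
print(abs(cov_XY) < 0.1)  # True: only the *linear* dependence is zero
```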