19. DIGRESSION ABOUT CORRELATION COEFFICIENTS
Problem 254. Given constant scalars a \ne 0 and c \ne 0, with b and d arbitrary, show that corr[x, y] = \pm corr[ax + b, cy + d], with the + sign valid if a and c have the same sign, and the − sign otherwise.
Answer. Start with cov[ax + b, cy + d] = ac cov[x, y] and go from there.
Besides the simple correlation coefficient \rho_{xy} between two scalar variables y and x, one can also define the squared multiple correlation coefficient \rho_{y(x)}^2 between one scalar variable y and a whole vector of variables x, and the partial correlation coefficient \rho_{12.x} between two scalar variables y_1 and y_2, with a vector of other variables
x “partialled out.” The multiple correlation coefficient measures the strength of
x “partialled out.” The multiple correlation coefficient measures the strength of
a linear association between y and all components of x together, and the partial
correlation coefficient measures the strength of that part of the linear association
between y 1 and y 2 which cannot be attributed to their joint association with x. One
can also define partial multiple correlation coefficients. If one wants to measure the
linear association between two vectors, then one number is no longer enough, but
one needs several numbers, the “canonical correlations.”
19.1. A Unified Definition of Correlation Coefficients

The multiple or partial correlation coefficients are usually defined as simple correlation coefficients involving the best linear predictor or its residual. But all these correlation coefficients share the property that they indicate a proportionate reduction in the MSE. See e.g. [Rao73, pp. 268–70]. Problem 255 makes this point for the simple correlation coefficient:
Problem 255. 4 points Show that the proportionate reduction in the MSE of the best predictor of y, if one goes from predictors of the form y^* = a to predictors of the form y^* = a + bx, is equal to the squared correlation coefficient between y and x. You are allowed to use the results of Problems 229 and 240. To set notation, call the minimum MSE in the first prediction (Problem 229) MSE[constant term; y], and the minimum MSE in the second prediction (Problem 240) MSE[constant term and x; y]. Show that

(19.1.2)  \frac{MSE[\text{constant term}; y] - MSE[\text{constant term and } x; y]}{MSE[\text{constant term}; y]} = \frac{(\operatorname{cov}[y,x])^2}{\operatorname{var}[y]\operatorname{var}[x]} = \rho_{yx}^2.
Answer. The minimum MSE with only a constant is var[y] and (18.2.32) says that MSE[constant
term and x; y] = var[y]−(cov[x, y])2 / var[x]. Therefore the difference in MSE’s is (cov[x, y])2 / var[x],
and if one divides by var[y] to get the relative difference, one gets exactly the squared correlation
coefficient.
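This answer can be spot-checked numerically. The moment values below are hypothetical (any choice giving a valid covariance matrix works); the check is a sketch of the identity, not part of the original text:

```python
import numpy as np

# Hypothetical second moments for (x, y); any choice with
# cov_xy^2 < var_x * var_y (a valid covariance matrix) works.
var_x, var_y, cov_xy = 2.0, 3.0, 1.5

mse_const = var_y                          # best constant predictor: MSE = var[y]
mse_const_x = var_y - cov_xy**2 / var_x    # best predictor a + bx, per (18.2.32)

reduction = (mse_const - mse_const_x) / mse_const
rho_sq = cov_xy**2 / (var_x * var_y)       # squared correlation coefficient

print(np.isclose(reduction, rho_sq))       # True
```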
Multiple Correlation Coefficients. Now assume x is a vector while y remains a scalar. Their joint mean vector and dispersion matrix are

(19.1.3)  \begin{bmatrix} x \\ y \end{bmatrix} \sim \left( \begin{bmatrix} \mu \\ \nu \end{bmatrix},\; \sigma^2 \begin{bmatrix} \Omega_{xx} & \omega_{xy} \\ \omega_{xy}^\top & \omega_{yy} \end{bmatrix} \right).
By theorem ??, the best linear predictor of y based on x has the formula

(19.1.4)  y^* = \nu + \omega_{xy}^\top \Omega_{xx}^- (x - \mu).
y^* has the following additional extremal value property: no linear combination b^\top x has a higher squared correlation with y than y^*. This maximal value of the squared correlation is called the squared multiple correlation coefficient

(19.1.5)  \rho_{y(x)}^2 = \frac{\omega_{xy}^\top \Omega_{xx}^- \omega_{xy}}{\omega_{yy}}.
The multiple correlation coefficient itself is the positive square root, i.e., it is always
nonnegative, while some other correlation coefficients may take on negative values.
The squared multiple correlation coefficient can also be defined in terms of the proportionate reduction in MSE. It is equal to the proportionate reduction in the MSE of
the best predictor of y if one goes from predictors of the form y ∗ = a to predictors
of the form y^* = a + b^\top x, i.e.,

(19.1.6)  \rho_{y(x)}^2 = \frac{MSE[\text{constant term}; y] - MSE[\text{constant term and } x; y]}{MSE[\text{constant term}; y]}.
There are therefore two natural definitions of the multiple correlation coefficient.
These two definitions correspond to the two formulas for R2 in (18.3.6).
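The extremal property behind these definitions can be illustrated numerically. The dispersion blocks below are hypothetical; the check confirms that no linear combination b^\top x has a squared correlation with y exceeding \rho_{y(x)}^2 from (19.1.5), and that the maximum is attained at b = \Omega_{xx}^- \omega_{xy}:

```python
import numpy as np

# Hypothetical dispersion of (x, y), with x a 2-vector.
Omega_xx = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
omega_xy = np.array([0.8, 0.3])
omega_yy = 1.5

# Squared multiple correlation coefficient, equation (19.1.5)
rho_sq = omega_xy @ np.linalg.solve(Omega_xx, omega_xy) / omega_yy

def corr_sq(b):
    """Squared correlation between b'x and y."""
    return (b @ omega_xy) ** 2 / ((b @ Omega_xx @ b) * omega_yy)

rng = np.random.default_rng(0)
bs = rng.standard_normal((1000, 2))
print(all(corr_sq(b) <= rho_sq + 1e-12 for b in bs))                     # True
print(np.isclose(corr_sq(np.linalg.solve(Omega_xx, omega_xy)), rho_sq))  # True
```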
Partial Correlation Coefficients. Now assume y = \begin{bmatrix} y_1 & y_2 \end{bmatrix}^\top is a vector with two elements and write

(19.1.7)  \begin{bmatrix} x \\ y_1 \\ y_2 \end{bmatrix} \sim \left( \begin{bmatrix} \mu \\ \nu_1 \\ \nu_2 \end{bmatrix},\; \sigma^2 \begin{bmatrix} \Omega_{xx} & \omega_{y1} & \omega_{y2} \\ \omega_{y1}^\top & \omega_{11} & \omega_{12} \\ \omega_{y2}^\top & \omega_{21} & \omega_{22} \end{bmatrix} \right).
Let y^* be the best linear predictor of y based on x. The partial correlation coefficient \rho_{12.x} is defined to be the simple correlation between the residuals, corr[(y_1 - y_1^*), (y_2 - y_2^*)]. This measures the correlation between y_1 and y_2 which is “local,” i.e., which does not follow from their association with x. Assume for instance that both y_1 and y_2 are highly correlated with x. Then they will also have a high correlation with each other. Subtracting y_i^* from y_i eliminates this dependency on x, therefore any remaining correlation is “local.” Compare [Krz88, p. 475].
The partial correlation coefficient can also be defined as the relative reduction in the MSE if one adds y_1 to x as a predictor of y_2:

(19.1.8)  \rho_{12.x}^2 = \frac{MSE[\text{constant term and } x; y_2] - MSE[\text{constant term, } x, \text{ and } y_1; y_2]}{MSE[\text{constant term and } x; y_2]}.
Problem 256. Using the definitions in terms of MSEs, show that the following relationship holds between the squares of multiple and partial correlation coefficients:

(19.1.9)  1 - \rho_{2(x,1)}^2 = (1 - \rho_{21.x}^2)(1 - \rho_{2(x)}^2)

Answer. In terms of the MSE, (19.1.9) reads

(19.1.10)  \frac{MSE[\text{constant term, } x, \text{ and } y_1; y_2]}{MSE[\text{constant term}; y_2]} = \frac{MSE[\text{constant term, } x, \text{ and } y_1; y_2]}{MSE[\text{constant term and } x; y_2]} \cdot \frac{MSE[\text{constant term and } x; y_2]}{MSE[\text{constant term}; y_2]}
From (19.1.9) follows the following weighted average formula:

(19.1.11)  \rho_{2(x,1)}^2 = \rho_{2(x)}^2 + (1 - \rho_{2(x)}^2)\,\rho_{21.x}^2
An alternative proof of (19.1.11) is given in [Gra76, pp. 116/17].
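Both (19.1.9) and the weighted average formula (19.1.11) can be checked numerically from the MSE definitions. The joint dispersion matrix of (x, y_1, y_2) below is hypothetical, with x a scalar for simplicity:

```python
import numpy as np

# Hypothetical dispersion of (x, y1, y2), in that order;
# any positive definite choice works.
S = np.array([[2.0, 0.8, 0.6],
              [0.8, 1.5, 0.7],
              [0.6, 0.7, 1.2]])

def mse(pred, target):
    """Minimum MSE of the best linear predictor of variable `target`
    from a constant term plus the variables listed in `pred`."""
    if not pred:
        return S[target, target]
    Sp = S[np.ix_(pred, pred)]
    sp = S[pred, target]
    return S[target, target] - sp @ np.linalg.solve(Sp, sp)

rho2_2x  = (mse([], 2) - mse([0], 2)) / mse([], 2)        # rho^2_{2(x)}
rho2_2x1 = (mse([], 2) - mse([0, 1], 2)) / mse([], 2)     # rho^2_{2(x,1)}
rho2_21x = (mse([0], 2) - mse([0, 1], 2)) / mse([0], 2)   # rho^2_{21.x}

print(np.isclose(1 - rho2_2x1, (1 - rho2_21x) * (1 - rho2_2x)))  # True: (19.1.9)
print(np.isclose(rho2_2x1, rho2_2x + (1 - rho2_2x) * rho2_21x))  # True: (19.1.11)
```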
Mixed cases: One can also form multiple correlation coefficients with some of the variables partialled out. The dot notation used here is due to Yule, [Yul07]. The notation, definition, and formula for the squared correlation coefficient are

(19.1.12)  \rho_{y(x).z}^2 = \frac{MSE[\text{constant term and } z; y] - MSE[\text{constant term, } z, \text{ and } x; y]}{MSE[\text{constant term and } z; y]}

(19.1.13)  = \frac{\omega_{xy.z}^\top \Omega_{xx.z}^- \omega_{xy.z}}{\omega_{yy.z}}
19.2. Correlation Coefficients and the Associated Least Squares Problem
One can also define the correlation coefficients as proportionate reductions in the objective functions of the associated GLS problems. However, one must reverse predictor and predictand, i.e., one must look at predictions of a vector x by linear functions of a scalar y.
Here it is done for multiple correlation coefficients: The value of the GLS objective function if one predicts x by the best linear predictor x^*, which is the minimum attainable when the scalar observation y is given and the vector x can be chosen freely, as long as it satisfies the constraint x = \mu + \Omega_{xx} q for some q, is

(19.2.1)  SSE[y; \text{best } x] = \min_{x \text{ s.t.} \dots} \begin{bmatrix} x - \mu \\ y - \nu \end{bmatrix}^\top \begin{bmatrix} \Omega_{xx} & \omega_{xy} \\ \omega_{xy}^\top & \omega_{yy} \end{bmatrix}^- \begin{bmatrix} x - \mu \\ y - \nu \end{bmatrix} = (y - \nu)\,\omega_{yy}^-\,(y - \nu)
On the other hand, the value of the GLS objective function when one predicts x by the best constant x = \mu is

(19.2.2)  SSE[y; x = \mu] = \begin{bmatrix} o \\ y - \nu \end{bmatrix}^\top \begin{bmatrix} \Omega_{xx}^- + \Omega_{xx}^- \omega_{xy} \omega_{yy.x}^- \omega_{xy}^\top \Omega_{xx}^- & -\Omega_{xx}^- \omega_{xy} \omega_{yy.x}^- \\ -\omega_{yy.x}^- \omega_{xy}^\top \Omega_{xx}^- & \omega_{yy.x}^- \end{bmatrix} \begin{bmatrix} o \\ y - \nu \end{bmatrix}

(19.2.3)  = (y - \nu)\,\omega_{yy.x}^-\,(y - \nu).
The proportionate reduction in the objective function is

(19.2.4)  \frac{SSE[y; x = \mu] - SSE[y; \text{best } x]}{SSE[y; x = \mu]} = \frac{(y - \nu)^2/\omega_{yy.x} - (y - \nu)^2/\omega_{yy}}{(y - \nu)^2/\omega_{yy.x}}

(19.2.5)  = \frac{\omega_{yy} - \omega_{yy.x}}{\omega_{yy}} = 1 - \frac{\omega_{yy.x}}{\omega_{yy}} = \rho_{y(x)}^2
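This chain rests on two facts that can be cross-checked numerically with hypothetical dispersion blocks: the lower-right block of the inverse of the joint dispersion matrix in (19.2.2) equals 1/\omega_{yy.x}, and (\omega_{yy} - \omega_{yy.x})/\omega_{yy} coincides with \rho_{y(x)}^2 from (19.1.5):

```python
import numpy as np

# Hypothetical dispersion blocks for (x, y), with x a 2-vector.
Omega_xx = np.array([[2.0, 0.5],
                     [0.5, 1.0]])
omega_xy = np.array([0.8, 0.3])
omega_yy = 1.5

Omega = np.block([[Omega_xx, omega_xy[:, None]],
                  [omega_xy[None, :], np.array([[omega_yy]])]])

# omega_yy.x = omega_yy - omega_xy' Omega_xx^{-1} omega_xy
omega_yy_x = omega_yy - omega_xy @ np.linalg.solve(Omega_xx, omega_xy)

# Lower-right entry of the partitioned inverse, as used in (19.2.2)
print(np.isclose(np.linalg.inv(Omega)[-1, -1], 1 / omega_yy_x))  # True

# The ratio in (19.2.5) equals the squared multiple correlation (19.1.5)
rho_sq = omega_xy @ np.linalg.solve(Omega_xx, omega_xy) / omega_yy
print(np.isclose((omega_yy - omega_yy_x) / omega_yy, rho_sq))    # True
```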
19.3. Canonical Correlations
Now what happens with the correlation coefficients if both predictor and predictand are vectors? In this case one has more than one correlation coefficient. One first
finds those two linear combinations of the two vectors which have highest correlation,
then those which are uncorrelated with the first and have second highest correlation,
and so on. Here is the mathematical construction needed:
Let x and y be two column vectors consisting of p and q scalar random variables,
respectively, and let
(19.3.1)  V\begin{bmatrix} x \\ y \end{bmatrix} = \sigma^2 \begin{bmatrix} \Omega_{xx} & \Omega_{xy} \\ \Omega_{yx} & \Omega_{yy} \end{bmatrix},
where Ω xx and Ω yy are nonsingular, and let r be the rank of Ω xy . Then there exist
two separate transformations
(19.3.2)  u = Lx, \quad v = My
such that
(19.3.3)  V\begin{bmatrix} u \\ v \end{bmatrix} = \sigma^2 \begin{bmatrix} I_p & \Lambda \\ \Lambda^\top & I_q \end{bmatrix}
where Λ is a (usually rectangular) diagonal matrix with only r diagonal elements
positive, and the others zero, and where these diagonal elements are sorted in descending order.
Proof: One obtains the matrix \Lambda by a singular value decomposition of \Omega_{xx}^{-1/2} \Omega_{xy} \Omega_{yy}^{-1/2} = A, say. Let A = P^\top \Lambda Q be its singular value decomposition with fully orthogonal matrices, as in equation (A.9.8). Define L = P \Omega_{xx}^{-1/2} and M = Q \Omega_{yy}^{-1/2}. Therefore L \Omega_{xx} L^\top = I, M \Omega_{yy} M^\top = I, and L \Omega_{xy} M^\top = P \Omega_{xx}^{-1/2} \Omega_{xy} \Omega_{yy}^{-1/2} Q^\top = P A Q^\top = \Lambda.
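The construction in this proof can be sketched numerically. The joint dispersion matrix below is hypothetical (a sample covariance of random data, with \sigma^2 = 1), \Omega^{-1/2} is computed by eigendecomposition, and numpy's SVD returns A = U\,\mathrm{diag}(\lambda)\,V^\top, which corresponds to P = U^\top and Q = V^\top in the notation above:

```python
import numpy as np

rng = np.random.default_rng(42)
p, q = 3, 2

# Hypothetical joint dispersion of (x, y): sample covariance of
# correlated random data, positive definite with probability one.
Z = rng.standard_normal((200, p + q)) @ rng.standard_normal((p + q, p + q))
S = np.cov(Z.T)
Oxx, Oxy, Oyy = S[:p, :p], S[:p, p:], S[p:, p:]

def inv_sqrt(X):
    """Symmetric inverse square root via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return V @ np.diag(w ** -0.5) @ V.T

A = inv_sqrt(Oxx) @ Oxy @ inv_sqrt(Oyy)
U, lam, Vt = np.linalg.svd(A)       # A = U diag(lam) Vt, lam sorted descending
L = U.T @ inv_sqrt(Oxx)             # u = Lx
M = Vt @ inv_sqrt(Oyy)              # v = My

Lam = np.zeros((p, q))              # rectangular diagonal matrix Lambda
Lam[:q, :q] = np.diag(lam)

print(np.allclose(L @ Oxx @ L.T, np.eye(p)))   # True
print(np.allclose(M @ Oyy @ M.T, np.eye(q)))   # True
print(np.allclose(L @ Oxy @ M.T, Lam))         # True: canonical correlations
```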
The next problems show how one gets from this the maximization property of
the canonical correlation coefficients:
Problem 257. Show that for every p-vector l and q-vector m,

(19.3.4)  corr(l^\top x, m^\top y) \le \lambda_1

where \lambda_1 is the first (and therefore biggest) diagonal element of \Lambda. Equality in (19.3.4) holds if l = l_1, the first row in L, and m = m_1, the first row in M.

Answer: If l or m is the null vector, then there is nothing to prove. If neither of them is a null vector, then one can, without loss of generality, multiply them with appropriate scalars so that p = (L^{-1})^\top l and q = (M^{-1})^\top m satisfy p^\top p = 1 and
q^\top q = 1. Then

(19.3.5)  V\begin{bmatrix} l^\top x \\ m^\top y \end{bmatrix} = V\begin{bmatrix} p^\top L x \\ q^\top M y \end{bmatrix} = V\left( \begin{bmatrix} p^\top & o^\top \\ o^\top & q^\top \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} \right) = \sigma^2 \begin{bmatrix} p^\top & o^\top \\ o^\top & q^\top \end{bmatrix} \begin{bmatrix} I_p & \Lambda \\ \Lambda^\top & I_q \end{bmatrix} \begin{bmatrix} p & o \\ o & q \end{bmatrix} = \sigma^2 \begin{bmatrix} 1 & p^\top \Lambda q \\ q^\top \Lambda^\top p & 1 \end{bmatrix}

Since the matrix at the righthand side has ones in the diagonal, it is the correlation matrix, i.e., p^\top \Lambda q = corr(l^\top x, m^\top y). Therefore (19.3.4) follows from Problem 258.
Problem 258. If \sum_i p_i^2 = \sum_i q_i^2 = 1, and \lambda_i \ge 0, show that |\sum_i p_i \lambda_i q_i| \le \max_i \lambda_i. Hint: first get an upper bound for |\sum_i p_i \lambda_i q_i| through a Cauchy–Schwarz-type argument.

Answer. \left(\sum_i p_i \lambda_i q_i\right)^2 \le \left(\sum_i p_i^2 \lambda_i\right)\left(\sum_i q_i^2 \lambda_i\right) \le (\max_i \lambda_i)^2.
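The bound in this problem can be spot-checked by Monte Carlo, with hypothetical nonnegative \lambda_i and random unit vectors p and q:

```python
import numpy as np

rng = np.random.default_rng(7)
lam = rng.uniform(0.0, 1.0, size=5)   # hypothetical nonnegative lambda_i

ok = True
for _ in range(1000):
    p = rng.standard_normal(5); p /= np.linalg.norm(p)   # sum p_i^2 = 1
    q = rng.standard_normal(5); q /= np.linalg.norm(q)   # sum q_i^2 = 1
    ok &= abs(np.sum(p * lam * q)) <= lam.max() + 1e-12

print(ok)  # True
```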
Problem 259. Show that for every p-vector l and q-vector m such that l^\top x is uncorrelated with l_1^\top x, and m^\top y is uncorrelated with m_1^\top y,

(19.3.6)  corr(l^\top x, m^\top y) \le \lambda_2

where \lambda_2 is the second diagonal element of \Lambda. Equality in (19.3.6) holds if l = l_2, the second row in L, and m = m_2, the second row in M.
Answer. If l or m is the null vector, then there is nothing to prove. If neither of them is a
null vector, then one can, without loss of generality, multiply them with appropriate scalars so that
p = (L^{-1})^\top l and q = (M^{-1})^\top m satisfy p^\top p = 1 and q^\top q = 1. Now write e_1 for the first unit vector, which has a 1 as first component and zeros everywhere else:

(19.3.7)  cov[l^\top x, l_1^\top x] = cov[p^\top L x, e_1^\top L x] = \sigma^2 p^\top e_1 = \sigma^2 p_1.

This covariance is zero iff p_1 = 0; in the same way, cov[m^\top y, m_1^\top y] = \sigma^2 q_1 is zero iff q_1 = 0. Furthermore one also needs the following, directly from the proof of Problem 257:

(19.3.8)  V\begin{bmatrix} l^\top x \\ m^\top y \end{bmatrix} = V\begin{bmatrix} p^\top L x \\ q^\top M y \end{bmatrix} = V\left( \begin{bmatrix} p^\top & o^\top \\ o^\top & q^\top \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} \right) = \sigma^2 \begin{bmatrix} p^\top & o^\top \\ o^\top & q^\top \end{bmatrix} \begin{bmatrix} I_p & \Lambda \\ \Lambda^\top & I_q \end{bmatrix} \begin{bmatrix} p & o \\ o & q \end{bmatrix} = \sigma^2 \begin{bmatrix} p^\top p & p^\top \Lambda q \\ q^\top \Lambda^\top p & q^\top q \end{bmatrix}

Since the matrix at the righthand side has ones in the diagonal, it is the correlation matrix, i.e., p^\top \Lambda q = corr(l^\top x, m^\top y). Equation (19.3.6) follows from Problem 258 if one lets the subscript i start at 2 instead of 1.
Problem 260. (Not eligible for in-class exams) Extra credit question for good
mathematicians: Reformulate the above treatment of canonical correlations without
the assumption that Ω xx and Ω yy are nonsingular.
19.4. Some Remarks about the Sample Partial Correlation Coefficients
The definition of the partial sample correlation coefficients is analogous to that of the partial population correlation coefficients: Given two data vectors y and z, and the matrix X (which includes a constant term), let M = I - X(X^\top X)^{-1} X^\top be