
Chapter 19. Digression about Correlation Coefficients




Problem 254. Given constant scalars a ≠ 0 and c ≠ 0, with b and d arbitrary. Show that corr[x, y] = ± corr[ax + b, cy + d], with the + sign valid if a and c have the same sign, and the − sign otherwise.

Answer. Start with cov[ax + b, cy + d] = ac cov[x, y] and go from there.
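The invariance claimed in Problem 254 is easy to check numerically. The following sketch is not part of the original text; it uses numpy with simulated data, and the `corr` helper and all variable names are my own:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(size=1000)

def corr(u, v):
    # simple correlation coefficient cov[u, v] / sqrt(var[u] var[v])
    return np.cov(u, v)[0, 1] / np.sqrt(np.var(u, ddof=1) * np.var(v, ddof=1))

r = corr(x, y)
# a and c with the same sign: correlation unchanged
assert np.isclose(corr(3 * x + 1, 2 * y - 5), r)
# a and c with opposite signs: correlation flips sign
assert np.isclose(corr(-3 * x + 1, 2 * y - 5), -r)
```

The shifts b and d drop out because covariances and variances are unaffected by adding constants; the scalars a and c cancel except for their joint sign.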



Besides the simple correlation coefficient ρ_xy between two scalar variables y and x, one can also define the squared multiple correlation coefficient ρ²_{y(x)} between one scalar variable y and a whole vector of variables x, and the partial correlation coefficient ρ_{12.x} between two scalar variables y_1 and y_2, with a vector of other variables x "partialled out." The multiple correlation coefficient measures the strength of a linear association between y and all components of x together, and the partial correlation coefficient measures the strength of that part of the linear association between y_1 and y_2 which cannot be attributed to their joint association with x. One can also define partial multiple correlation coefficients. If one wants to measure the linear association between two vectors, then one number is no longer enough; one needs several numbers, the "canonical correlations."

19.1. A UNIFIED DEFINITION OF CORRELATION COEFFICIENTS

The multiple or partial correlation coefficients are usually defined as simple correlation coefficients involving the best linear predictor or its residual. But all these correlation coefficients share the property that they indicate a proportionate reduction in the MSE. See e.g. [Rao73, pp. 268–70]. Problem 255 makes this point for the simple correlation coefficient:

Problem 255. 4 points. Show that the proportionate reduction in the MSE of the best predictor of y, if one goes from predictors of the form y* = a to predictors of the form y* = a + bx, is equal to the squared correlation coefficient between y and x. You are allowed to use the results of Problems 229 and 240. To set notation, call the minimum MSE in the first prediction (Problem 229) MSE[constant term; y], and the minimum MSE in the second prediction (Problem 240) MSE[constant term and x; y]. Show that

(19.1.2)
\[ \frac{\operatorname{MSE}[\text{constant term}; y] - \operatorname{MSE}[\text{constant term and } x; y]}{\operatorname{MSE}[\text{constant term}; y]} = \frac{(\operatorname{cov}[y,x])^2}{\operatorname{var}[y]\,\operatorname{var}[x]} = \rho^2_{yx}. \]

Answer. The minimum MSE with only a constant is var[y], and (18.2.32) says that MSE[constant term and x; y] = var[y] − (cov[x, y])²/var[x]. Therefore the difference in MSEs is (cov[x, y])²/var[x], and if one divides by var[y] to get the relative difference, one gets exactly the squared correlation coefficient.
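Identity (19.1.2) can be verified on simulated data. The sketch below is my own illustration (variable names and data are hypothetical); it computes both minimum MSEs directly from sample moments:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5000)
y = 2.0 + 1.5 * x + rng.normal(size=5000)

# MSE[constant term; y]: the best constant predictor is the mean, minimum MSE is var[y]
mse_const = np.var(y)

# MSE[constant term and x; y]: best affine predictor a + bx via the moment formulas
b = np.cov(x, y, bias=True)[0, 1] / np.var(x)
a = y.mean() - b * x.mean()
mse_affine = np.mean((y - (a + b * x)) ** 2)

# proportionate reduction in MSE equals the squared correlation coefficient
reduction = (mse_const - mse_affine) / mse_const
rho2 = np.cov(x, y, bias=True)[0, 1] ** 2 / (np.var(x) * np.var(y))
assert np.isclose(reduction, rho2)
```

When all moments are computed from the same sample with the same normalization, the two sides agree up to floating point, not merely approximately.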






Multiple Correlation Coefficients. Now assume x is a vector while y remains a scalar. Their joint mean vector and dispersion matrix are

(19.1.3)
\[ \begin{bmatrix} x \\ y \end{bmatrix} \sim \begin{bmatrix} \mu \\ \nu \end{bmatrix}, \quad \sigma^2 \begin{bmatrix} \Omega_{xx} & \omega_{xy} \\ \omega_{xy}^\top & \omega_{yy} \end{bmatrix}. \]

By theorem ??, the best linear predictor of y based on x has the formula

(19.1.4)
\[ y^* = \nu + \omega_{xy}^\top \Omega_{xx}^{-} (x - \mu). \]



y* has the following additional extremal value property: no linear combination b^⊤x has a higher squared correlation with y than y*. This maximal value of the squared correlation is called the squared multiple correlation coefficient

(19.1.5)
\[ \rho^2_{y(x)} = \frac{\omega_{xy}^\top \Omega_{xx}^{-} \omega_{xy}}{\omega_{yy}}. \]

The multiple correlation coefficient itself is the positive square root: it is always nonnegative, while some other correlation coefficients may take on negative values.

The squared multiple correlation coefficient can also be defined in terms of the proportionate reduction in MSE. It is equal to the proportionate reduction in the MSE of the best predictor of y if one goes from predictors of the form y* = a to predictors of the form y* = a + b^⊤x, i.e.,

(19.1.6)
\[ \rho^2_{y(x)} = \frac{\operatorname{MSE}[\text{constant term}; y] - \operatorname{MSE}[\text{constant term and } x; y]}{\operatorname{MSE}[\text{constant term}; y]}. \]

There are therefore two natural definitions of the multiple correlation coefficient. These two definitions correspond to the two formulas for R² in (18.3.6).
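The agreement of the two definitions, (19.1.5) via the dispersion blocks and (19.1.6) via the MSE reduction, can be demonstrated with sample analogues. This numerical sketch is my own (simulated data, hypothetical coefficients); when every moment comes from the same sample, the two numbers coincide exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10000, 3
x = rng.normal(size=(n, p))
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# sample analogues of the dispersion blocks Omega_xx, omega_xy, omega_yy
Oxx = np.cov(x, rowvar=False)
oxy = np.array([np.cov(x[:, j], y)[0, 1] for j in range(p)])
oyy = np.var(y, ddof=1)

# definition (19.1.5): omega_xy' Omega_xx^- omega_xy / omega_yy
rho2_formula = oxy @ np.linalg.solve(Oxx, oxy) / oyy

# definition (19.1.6): proportionate reduction in MSE when x joins the constant
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
mse_full = np.mean((y - X @ beta) ** 2)
mse_const = np.mean((y - y.mean()) ** 2)
rho2_mse = (mse_const - mse_full) / mse_const

assert np.isclose(rho2_formula, rho2_mse)
```

The common scaling factors (here the 1/(n−1) in each sample moment) cancel in the ratio, which is why the in-sample identity is exact.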

Partial Correlation Coefficients. Now assume y = (y_1, y_2)^⊤ is a vector with two elements and write

(19.1.7)
\[ \begin{bmatrix} x \\ y_1 \\ y_2 \end{bmatrix} \sim \begin{bmatrix} \mu \\ \nu_1 \\ \nu_2 \end{bmatrix}, \quad \sigma^2 \begin{bmatrix} \Omega_{xx} & \omega_{y1} & \omega_{y2} \\ \omega_{y1}^\top & \omega_{11} & \omega_{12} \\ \omega_{y2}^\top & \omega_{21} & \omega_{22} \end{bmatrix}. \]

Let y* be the best linear predictor of y based on x. The partial correlation coefficient ρ_{12.x} is defined to be the simple correlation between the residuals, corr[(y_1 − y_1^*), (y_2 − y_2^*)]. This measures the correlation between y_1 and y_2 which is "local," i.e., which does not follow from their association with x. Assume for instance that both y_1 and y_2 are highly correlated with x. Then they will also have a high correlation with each other. Subtracting y_i^* from y_i eliminates this dependency on x; therefore any remaining correlation is "local." Compare [Krz88, p. 475].
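The residual-based definition can be illustrated with a small simulation, my own sketch rather than anything from the original text. Here y_1 and y_2 share no "local" association at all; their raw correlation is entirely driven by x, and partialling x out removes it:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000
x = rng.normal(size=n)
# y1 and y2 are both driven by x; their disturbances are independent
y1 = 2 * x + rng.normal(size=n)
y2 = 3 * x + rng.normal(size=n)

def residual(y, x):
    # residual of the best affine (constant plus x) predictor of y
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

r_raw = np.corrcoef(y1, y2)[0, 1]                               # high, inherited from x
r_partial = np.corrcoef(residual(y1, x), residual(y2, x))[0, 1]  # near zero

assert r_raw > 0.7
assert abs(r_partial) < 0.05
```

With these coefficients the population correlation is 6/√50 ≈ 0.85, while the population partial correlation is 0; the sample values reflect that.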






The partial correlation coefficient can also be defined as the relative reduction in the MSE if one adds y_1 to x as a predictor of y_2:

(19.1.8)
\[ \rho^2_{12.x} = \frac{\operatorname{MSE}[\text{constant term and } x; y_2] - \operatorname{MSE}[\text{constant term, } x, \text{ and } y_1; y_2]}{\operatorname{MSE}[\text{constant term and } x; y_2]}. \]

Problem 256. Using the definitions in terms of MSEs, show that the following relationship holds between the squares of multiple and partial correlation coefficients:

(19.1.9)
\[ 1 - \rho^2_{2(x,1)} = (1 - \rho^2_{21.x})(1 - \rho^2_{2(x)}) \]

Answer. In terms of the MSE, (19.1.9) reads

(19.1.10)
\[ \frac{\operatorname{MSE}[\text{constant term, } x, \text{ and } y_1; y_2]}{\operatorname{MSE}[\text{constant term}; y_2]} = \frac{\operatorname{MSE}[\text{constant term, } x, \text{ and } y_1; y_2]}{\operatorname{MSE}[\text{constant term and } x; y_2]} \cdot \frac{\operatorname{MSE}[\text{constant term and } x; y_2]}{\operatorname{MSE}[\text{constant term}; y_2]} \]

From (19.1.9) follows the following weighted average formula:

(19.1.11)
\[ \rho^2_{2(x,1)} = \rho^2_{2(x)} + (1 - \rho^2_{2(x)})\,\rho^2_{21.x} \]

An alternative proof of (19.1.11) is given in [Gra76, pp. 116/17].
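Decomposition (19.1.11) also holds exactly for the sample analogues, since the same MSE cancellation as in (19.1.10) goes through in-sample. The following check is my own sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
x = rng.normal(size=(n, 2))
y1 = x @ np.array([1.0, 1.0]) + rng.normal(size=n)
y2 = x @ np.array([2.0, -1.0]) + 0.5 * y1 + rng.normal(size=n)

def resid(y, Z):
    # residual of the least squares fit of y on a constant and the columns of Z
    X = np.column_stack([np.ones(len(y)), Z])
    return y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

def r2(y, Z):
    # proportionate reduction in MSE: constant only versus constant plus Z
    return 1 - np.mean(resid(y, Z) ** 2) / np.var(y)

rho2_2x  = r2(y2, x)                          # rho^2_{2(x)}
rho2_2x1 = r2(y2, np.column_stack([x, y1]))   # rho^2_{2(x,1)}
rho2_21x = np.corrcoef(resid(y2, x), resid(y1, x))[0, 1] ** 2  # rho^2_{21.x}

# weighted average formula (19.1.11)
assert np.isclose(rho2_2x1, rho2_2x + (1 - rho2_2x) * rho2_21x)
```

The partial term here is computed from residuals, matching the residual-based definition given above; the exactness of the check is an instance of the Frisch–Waugh reasoning for least squares.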






Mixed cases: One can also form multiple correlation coefficients with some of the variables partialled out. The dot notation used here is due to Yule, [Yul07]. The notation, definition, and formula for the squared correlation coefficient are

(19.1.12)
\[ \rho^2_{y(x).z} = \frac{\operatorname{MSE}[\text{constant term and } z; y] - \operatorname{MSE}[\text{constant term, } z, \text{ and } x; y]}{\operatorname{MSE}[\text{constant term and } z; y]} \]

(19.1.13)
\[ = \frac{\omega_{xy.z}^\top \Omega_{xx.z}^{-} \omega_{xy.z}}{\omega_{yy.z}} \]
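The equivalence of the MSE form (19.1.12) and the partialled-moment form (19.1.13) can be checked with sample analogues, where the partialled moments Ω_xx.z, ω_xy.z, ω_yy.z are second moments of residuals after regressing on z. This is my own numerical sketch, not from the text:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
z = rng.normal(size=(n, 2))
x = z @ np.array([[1.0, 0.0], [0.5, 1.0]]) + rng.normal(size=(n, 2))
y = x @ np.array([1.0, -1.0]) + z @ np.array([0.5, 0.5]) + rng.normal(size=n)

def resid(v, Z):
    # residual of v on a constant and the columns of Z (works columnwise too)
    X = np.column_stack([np.ones(len(Z)), Z])
    return v - X @ np.linalg.lstsq(X, v, rcond=None)[0]

def mse(y, Z):
    return np.mean(resid(y, Z) ** 2)

# (19.1.12): relative MSE reduction when x joins a model already containing z
rho2_mse = (mse(y, z) - mse(y, np.column_stack([z, x]))) / mse(y, z)

# (19.1.13): the same number from partialled-out second moments
ex = resid(x, z)          # x with z partialled out
ey = resid(y, z)          # y with z partialled out
Oxx_z = ex.T @ ex
oxy_z = ex.T @ ey
oyy_z = ey @ ey
rho2_cov = oxy_z @ np.linalg.solve(Oxx_z, oxy_z) / oyy_z

assert np.isclose(rho2_mse, rho2_cov)
```

Because the intercept is included in every z-regression, the residuals have mean zero and the raw cross-products play the role of the partialled covariances.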



19.2. Correlation Coefficients and the Associated Least Squares Problem

One can define the correlation coefficients also as proportionate reductions in the objective functions of the associated GLS problems. However, one must reverse predictor and predictand, i.e., one must look at predictions of a vector x by linear functions of a scalar y.

Here it is done for multiple correlation coefficients: The value of the GLS objective function if one predicts x by the best linear predictor x*, which is the minimum attainable when the scalar observation y is given and the vector x can be chosen freely, as long as it satisfies the constraint x = μ + Ω_xx q for some q, is

(19.2.1)
\[ \operatorname{SSE}[y; \text{best } x] = \min_{x = \mu + \Omega_{xx} q} \begin{bmatrix} (x-\mu)^\top & (y-\nu) \end{bmatrix} \begin{bmatrix} \Omega_{xx} & \omega_{xy} \\ \omega_{xy}^\top & \omega_{yy} \end{bmatrix}^{-} \begin{bmatrix} x-\mu \\ y-\nu \end{bmatrix} = (y-\nu)\,\omega_{yy}^{-1}\,(y-\nu). \]



On the other hand, the value of the GLS objective function when one predicts x by the best constant x = μ is

(19.2.2)
\[ \operatorname{SSE}[y; x = \mu] = \begin{bmatrix} o^\top & (y-\nu) \end{bmatrix} \begin{bmatrix} \Omega_{xx}^{-} + \Omega_{xx}^{-}\omega_{xy}\,\omega_{yy.x}^{-1}\,\omega_{xy}^\top\Omega_{xx}^{-} & -\Omega_{xx}^{-}\omega_{xy}\,\omega_{yy.x}^{-1} \\ -\omega_{yy.x}^{-1}\,\omega_{xy}^\top\Omega_{xx}^{-} & \omega_{yy.x}^{-1} \end{bmatrix} \begin{bmatrix} o \\ y-\nu \end{bmatrix} \]

(19.2.3)
\[ = (y-\nu)\,\omega_{yy.x}^{-1}\,(y-\nu). \]

The proportionate reduction in the objective function is

(19.2.4)
\[ \frac{\operatorname{SSE}[y; x=\mu] - \operatorname{SSE}[y; \text{best } x]}{\operatorname{SSE}[y; x=\mu]} = \frac{(y-\nu)^2/\omega_{yy.x} - (y-\nu)^2/\omega_{yy}}{(y-\nu)^2/\omega_{yy.x}} \]

(19.2.5)
\[ = 1 - \frac{\omega_{yy.x}}{\omega_{yy}} = \frac{\omega_{yy} - \omega_{yy.x}}{\omega_{yy}} = \rho^2_{y(x)}. \]






19.3. Canonical Correlations

Now what happens with the correlation coefficients if both predictor and predictand are vectors? In this case one has more than one correlation coefficient. One first

finds those two linear combinations of the two vectors which have highest correlation,

then those which are uncorrelated with the first and have second highest correlation,

and so on. Here is the mathematical construction needed:

Let x and y be two column vectors consisting of p and q scalar random variables, respectively, and let

(19.3.1)
\[ \operatorname{V}\begin{bmatrix} x \\ y \end{bmatrix} = \sigma^2 \begin{bmatrix} \Omega_{xx} & \Omega_{xy} \\ \Omega_{yx} & \Omega_{yy} \end{bmatrix}, \]

where Ω_xx and Ω_yy are nonsingular, and let r be the rank of Ω_xy. Then there exist two separate transformations

two separate transformations

(19.3.2)



u = Lx,



v = My



such that

(19.3.3)
\[ \operatorname{V}\begin{bmatrix} u \\ v \end{bmatrix} = \sigma^2 \begin{bmatrix} I_p & \Lambda \\ \Lambda^\top & I_q \end{bmatrix} \]




where Λ is a (usually rectangular) diagonal matrix with only r diagonal elements positive, and the others zero, and where these diagonal elements are sorted in descending order.

Proof: One obtains the matrix Λ by a singular value decomposition of Ω_xx^{-1/2} Ω_xy Ω_yy^{-1/2} = A, say. Let A = P^⊤ΛQ be its singular value decomposition with fully orthogonal matrices, as in equation (A.9.8). Define L = P Ω_xx^{-1/2} and M = Q Ω_yy^{-1/2}. Therefore LΩ_xx L^⊤ = I, MΩ_yy M^⊤ = I, and LΩ_xy M^⊤ = P Ω_xx^{-1/2} Ω_xy Ω_yy^{-1/2} Q^⊤ = P A Q^⊤ = Λ.
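This construction translates directly into a numerical procedure. The sketch below is my own (numpy conventions: `np.linalg.svd` returns A = U diag(s) Vᵀ, so U plays the role of P^⊤ above); the dispersion matrix is built from simulated data so that the singular values are genuine canonical correlations:

```python
import numpy as np

def inv_sqrt(S):
    # symmetric inverse square root of a positive definite S via eigendecomposition
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

rng = np.random.default_rng(6)
# a valid joint dispersion matrix of p + q = 5 variables, estimated from simulated data
data = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
C = np.cov(data, rowvar=False)
p, q = 3, 2
Oxx, Oxy, Oyy = C[:p, :p], C[:p, p:], C[p:, p:]

A = inv_sqrt(Oxx) @ Oxy @ inv_sqrt(Oyy)
U, s, Vt = np.linalg.svd(A)     # A = U diag(s) Vt; s holds the canonical correlations
L = U.T @ inv_sqrt(Oxx)         # u = Lx
M = Vt @ inv_sqrt(Oyy)          # v = My

# the transformed variables are standardized ...
assert np.allclose(L @ Oxx @ L.T, np.eye(p))
assert np.allclose(M @ Oyy @ M.T, np.eye(q))
# ... and cross-correlated only pairwise: L Oxy M' equals the rectangular diagonal Lambda
Lam = np.zeros((p, q))
Lam[:q, :q] = np.diag(s)
assert np.allclose(L @ Oxy @ M.T, Lam)
# canonical correlations lie in [0, 1] and come out sorted in descending order
assert np.all(s <= 1 + 1e-12) and np.all(np.diff(s) <= 1e-12)
```

The descending ordering required by the theorem comes for free, since SVD routines return singular values in that order.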

The next problems show how one gets from this the maximization property of the canonical correlation coefficients:

Problem 257. Show that for every p-vector l and q-vector m,

(19.3.4)
\[ \operatorname{corr}(l^\top x, m^\top y) \le \lambda_1 \]

where λ_1 is the first (and therefore biggest) diagonal element of Λ. Equality in (19.3.4) holds if l = l_1, the first row in L, and m = m_1, the first row in M.

Answer: If l or m is the null vector, then there is nothing to prove. If neither of them is a null vector, then one can, without loss of generality, multiply them with appropriate scalars so that p = (L^{-1})^⊤ l and q = (M^{-1})^⊤ m satisfy p^⊤p = 1 and q^⊤q = 1. Then

(19.3.5)
\[ \operatorname{V}\begin{bmatrix} l^\top x \\ m^\top y \end{bmatrix} = \operatorname{V}\begin{bmatrix} p^\top L x \\ q^\top M y \end{bmatrix} = \operatorname{V}\begin{bmatrix} p^\top u \\ q^\top v \end{bmatrix} = \sigma^2 \begin{bmatrix} p^\top & o^\top \\ o^\top & q^\top \end{bmatrix} \begin{bmatrix} I_p & \Lambda \\ \Lambda^\top & I_q \end{bmatrix} \begin{bmatrix} p & o \\ o & q \end{bmatrix} = \sigma^2 \begin{bmatrix} p^\top p & p^\top \Lambda q \\ q^\top \Lambda^\top p & q^\top q \end{bmatrix} \]

Since the matrix at the right-hand side has ones in the diagonal, it is the correlation matrix, i.e., p^⊤Λq = corr(l^⊤x, m^⊤y). Therefore (19.3.4) follows from Problem 258.

Problem 258. If ∑_i p_i² = ∑_i q_i² = 1, and λ_i ≥ 0, show that |∑_i p_i λ_i q_i| ≤ max_i λ_i. Hint: first get an upper bound for |∑_i p_i λ_i q_i| through a Cauchy–Schwarz-type argument.

Answer.
\[ \Big(\sum_i p_i \lambda_i q_i\Big)^2 \le \Big(\sum_i p_i^2 \lambda_i\Big)\Big(\sum_i q_i^2 \lambda_i\Big) \le (\max_i \lambda_i)^2. \]



Problem 259. Show that for every p-vector l and q-vector m such that l^⊤x is uncorrelated with l_1^⊤x, and m^⊤y is uncorrelated with m_1^⊤y,

(19.3.6)
\[ \operatorname{corr}(l^\top x, m^\top y) \le \lambda_2 \]

where λ_2 is the second diagonal element of Λ. Equality in (19.3.6) holds if l = l_2, the second row in L, and m = m_2, the second row in M.

Answer. If l or m is the null vector, then there is nothing to prove. If neither of them is a null vector, then one can, without loss of generality, multiply them with appropriate scalars so that p = (L^{-1})^⊤ l and q = (M^{-1})^⊤ m satisfy p^⊤p = 1 and q^⊤q = 1. Now write e_1 for the first unit vector, which has a 1 as first component and zeros everywhere else:

(19.3.7)
\[ \operatorname{cov}[l^\top x, l_1^\top x] = \operatorname{cov}[p^\top L x, e_1^\top L x] = \sigma^2\, p^\top e_1 = \sigma^2 p_1, \]

since V[Lx] = σ² I_p. This covariance is zero iff p_1 = 0; in the same way, cov[m^⊤y, m_1^⊤y] = σ² q_1 is zero iff q_1 = 0. Furthermore one also needs the following, directly from the proof of Problem 257:

(19.3.8)
\[ \operatorname{V}\begin{bmatrix} l^\top x \\ m^\top y \end{bmatrix} = \operatorname{V}\begin{bmatrix} p^\top L x \\ q^\top M y \end{bmatrix} = \operatorname{V}\begin{bmatrix} p^\top u \\ q^\top v \end{bmatrix} = \sigma^2 \begin{bmatrix} p^\top & o^\top \\ o^\top & q^\top \end{bmatrix} \begin{bmatrix} I_p & \Lambda \\ \Lambda^\top & I_q \end{bmatrix} \begin{bmatrix} p & o \\ o & q \end{bmatrix} = \sigma^2 \begin{bmatrix} p^\top p & p^\top \Lambda q \\ q^\top \Lambda^\top p & q^\top q \end{bmatrix} \]

Since the matrix at the right-hand side has ones in the diagonal, it is the correlation matrix, i.e., p^⊤Λq = corr(l^⊤x, m^⊤y). Equation (19.3.6) follows from Problem 258 if one lets the subscript i start at 2 instead of 1.



Problem 260. (Not eligible for in-class exams) Extra credit question for good

mathematicians: Reformulate the above treatment of canonical correlations without

the assumption that Ω xx and Ω yy are nonsingular.

19.4. Some Remarks about the Sample Partial Correlation Coefficients

The definition of the partial sample correlation coefficients is analogous to that of the partial population correlation coefficients: Given two data vectors y and z, and the matrix X (which includes a constant term), and let M = I − X(X^⊤X)^{-1}X^⊤ be


