18. MEAN-VARIANCE ANALYSIS IN THE LINEAR MODEL
ε_i = y_i − µ, and define the vectors y = [y_1 y_2 ··· y_n]′, ι = [1 1 ··· 1]′, and ε = [ε_1 ε_2 ··· ε_n]′. Then one can write the model in the form

(18.1.1)    y = ιµ + ε,    ε ∼ (o, σ²I)
The notation ε ∼ (o, σ²I) is shorthand for E[ε] = o (the null vector) and V[ε] = σ²I (σ² times the identity matrix, which has 1's in the diagonal and 0's elsewhere). µ is the deterministic part of all the y_i, and ε_i is the random part.
Model 2 is “simple regression” in which the deterministic part µ is not constant
but is a function of the nonrandom variable x. The assumption here is that this
function is differentiable and can, in the range of the variation of the data, be approximated by a linear function [Tin51, pp. 19–20]. I.e., each element of y is a
constant α plus a constant multiple of the corresponding element of the nonrandom
vector x plus a random error term: y_t = α + x_t β + ε_t, t = 1, . . . , n. This can be
written as
(18.1.2)
\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}
= \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \alpha
+ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \beta
+ \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}
= \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
  \begin{bmatrix} \alpha \\ \beta \end{bmatrix}
+ \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}
or

(18.1.3)    y = Xβ + ε,    ε ∼ (o, σ²I)
18.1. THREE VERSIONS OF THE LINEAR MODEL
Problem 228. 1 point Compute the matrix product

\begin{bmatrix} 1 & 2 & 5 \\ 0 & 3 & 1 \end{bmatrix}
\begin{bmatrix} 4 & 0 \\ 2 & 1 \\ 3 & 8 \end{bmatrix}

Answer.

\begin{bmatrix} 1 & 2 & 5 \\ 0 & 3 & 1 \end{bmatrix}
\begin{bmatrix} 4 & 0 \\ 2 & 1 \\ 3 & 8 \end{bmatrix}
= \begin{bmatrix} 1\cdot4+2\cdot2+5\cdot3 & 1\cdot0+2\cdot1+5\cdot8 \\ 0\cdot4+3\cdot2+1\cdot3 & 0\cdot0+3\cdot1+1\cdot8 \end{bmatrix}
= \begin{bmatrix} 23 & 42 \\ 9 & 11 \end{bmatrix}
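As a quick sanity check (not part of the original text), the product in Problem 228 can be verified with NumPy; each entry of the result is the dot product of a row of the first factor with a column of the second.

```python
import numpy as np

# The two factors from Problem 228
A = np.array([[1, 2, 5],
              [0, 3, 1]])
B = np.array([[4, 0],
              [2, 1],
              [3, 8]])

# Entry (i, j) of A @ B is the dot product of row i of A with column j of B
AB = A @ B
print(AB)  # [[23 42]
           #  [ 9 11]]
```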
If the systematic part of y depends on more than one variable, then one needs
multiple regression, model 3. Mathematically, multiple regression has the same form
(18.1.3), but this time X is arbitrary (except for the restriction that all its columns
are linearly independent). Model 3 has Models 1 and 2 as special cases.
Multiple regression is also used to “correct for” disturbing influences. Let me
explain. A functional relationship, which makes the systematic part of y dependent on some other variable x, will usually only hold if other relevant influences are kept
constant. If those other influences vary, then they may affect the form of this functional relation. For instance, the marginal propensity to consume may be affected
by the interest rate, or the unemployment rate. This is why some econometricians
(Hendry) advocate that one should start with an “encompassing” model with many
explanatory variables and then narrow the specification down by hypothesis tests.
Milton Friedman, by contrast, is very suspicious about multiple regressions, and
argues in [FS91, pp. 48/9] against the encompassing approach.
Friedman does not give a theoretical argument but argues by an example from
Chemistry. Perhaps one can say that the variations in the other influences may have
more serious implications than just modifying the form of the functional relation:
they may destroy this functional relation altogether, i.e., prevent any systematic or
predictable behavior.
            observed    unobserved
random      y           ε
nonrandom   X           β, σ²
18.2. Ordinary Least Squares
In the model y = Xβ + ε, where ε ∼ (o, σ²I), the OLS-estimate β̂ is defined to be that value β = β̂ which minimizes

(18.2.1)    SSE = (y − Xβ)′(y − Xβ) = y′y − 2y′Xβ + β′X′Xβ.
Problem 184 shows that in model 1, this principle yields the arithmetic mean.
Problem 229. 2 points Prove that, if one predicts a random variable y by a
constant a, the constant which gives the best MSE is a = E[y], and the best MSE one
can get is var[y].
Answer. E[(y − a)²] = E[y²] − 2a E[y] + a². Differentiate with respect to a and set to zero to get a = E[y]. Plugging this back in gives E[(y − E[y])²] = var[y]. One can also differentiate first and then take the expected value: E[2(y − a)] = 0.
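A small numerical illustration of Problem 229 (a sketch of my own, not from the text): over a grid of constant predictors, the sample mean minimizes the average squared error, and the minimal value is the sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=2.0, size=10_000)

def mse(a):
    """Average squared error when predicting every y by the constant a."""
    return np.mean((y - a) ** 2)

# Scan a grid of candidate constants; the best one should sit at the sample mean
grid = np.linspace(0.0, 6.0, 601)
best = grid[np.argmin([mse(a) for a in grid])]
print(best, y.mean(), mse(y.mean()), y.var())
```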
We will solve this minimization problem using the first-order conditions in vector
notation. As a preparation, you should read the beginning of Appendix C about
matrix differentiation and the connection between matrix differentiation and the
Jacobian matrix of a vector function. All you need at this point is the two equations
(C.1.6) and (C.1.7). The chain rule (C.1.23) is enlightening but not strictly necessary
for the present derivation.
The matrix differentiation rules (C.1.6) and (C.1.7) allow us to differentiate
(18.2.1) to get
(18.2.2)    ∂SSE/∂β = −2y′X + 2β′X′X.

Transpose it (because it is notationally simpler to have a relationship between column vectors), set it to zero while at the same time replacing β by β̂, and divide by 2, to get the "normal equation"

(18.2.3)    X′y = X′Xβ̂.
Due to our assumption that all columns of X are linearly independent, X′X has an inverse and one can premultiply both sides of (18.2.3) by (X′X)⁻¹:

(18.2.4)    β̂ = (X′X)⁻¹X′y.

If the columns of X are not linearly independent, then (18.2.3) has more than one solution, and the normal equation is also in this case a necessary and sufficient condition for β̂ to minimize the SSE (proof in Problem 232).
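The closed form (18.2.4) is easy to check numerically. The following sketch (variable names and simulated data are my own) solves the normal equation X′Xβ̂ = X′y directly and compares the result with NumPy's least-squares routine; np.linalg.solve is used instead of forming (X′X)⁻¹ explicitly, which is the numerically preferred route.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept plus one regressor
beta_true = np.array([2.0, -1.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Solve the normal equation X'X beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Reference answer from NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat, beta_lstsq)
```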
Problem 230. 4 points Using the matrix differentiation rules

(18.2.5)    ∂w′x/∂x = w′
(18.2.6)    ∂x′Mx/∂x = 2x′M

for symmetric M, compute the least-squares estimate β̂ which minimizes

(18.2.7)    SSE = (y − Xβ)′(y − Xβ)

You are allowed to assume that X′X has an inverse.

Answer. First you have to multiply out

(18.2.8)    (y − Xβ)′(y − Xβ) = y′y − 2y′Xβ + β′X′Xβ.

The matrix differentiation rules (18.2.5) and (18.2.6) allow us to differentiate (18.2.8) to get

(18.2.9)    ∂SSE/∂β = −2y′X + 2β′X′X.
Transpose it (because it is notationally simpler to have a relationship between column vectors), set it to zero while at the same time replacing β by β̂, and divide by 2, to get the "normal equation"

(18.2.10)    X′y = X′Xβ̂.

Since X′X has an inverse, one can premultiply both sides of (18.2.10) by (X′X)⁻¹:

(18.2.11)    β̂ = (X′X)⁻¹X′y.
Problem 231. 2 points Show the following: if the columns of X are linearly independent, then X′X has an inverse. (X itself is not necessarily square.) In your proof you may use the following criteria: the columns of X are linearly independent (this is also called: X has full column rank) if and only if Xa = o implies a = o. And a square matrix has an inverse if and only if its columns are linearly independent.

Answer. We have to show that any a which satisfies X′Xa = o is itself the null vector. From X′Xa = o follows a′X′Xa = 0, which can also be written ‖Xa‖² = 0. Therefore Xa = o, and since the columns of X are linearly independent, this implies a = o.
Problem 232. 3 points In this Problem we do not assume that X has full column rank; it may be arbitrary.

• a. The normal equation (18.2.3) always has at least one solution. Hint: you are allowed to use, without proof, equation (A.3.3) in the mathematical appendix.
Answer. With this hint it is easy: β̂ = (X′X)⁻X′y is a solution.
• b. If β̂ satisfies the normal equation and β is an arbitrary vector, then

(18.2.12)    (y − Xβ)′(y − Xβ) = (y − Xβ̂)′(y − Xβ̂) + (β̂ − β)′X′X(β̂ − β).

Answer. This is true even if X has deficient rank, and it will be shown here in this general case. To prove (18.2.12), write (18.2.1) as SSE = [(y − Xβ̂) − X(β − β̂)]′[(y − Xβ̂) − X(β − β̂)]; since β̂ satisfies (18.2.3), the cross product terms disappear.
• c. Conclude from this that the normal equation is a necessary and sufficient condition characterizing the values β̂ minimizing the sum of squared errors (18.2.12).

Answer. (18.2.12) shows that the normal equations are sufficient. For necessity of the normal equations let β̂ be an arbitrary solution of the normal equation; we have seen that there is always at least one. Given β̂, it follows from (18.2.12) that for any solution β* of the minimization, X′X(β* − β̂) = o. Use (18.2.3) to replace X′Xβ̂ by X′y to get X′Xβ* = X′y.
It is customary to use the notation Xβ̂ = ŷ for the so-called fitted values, which are the estimates of the vector of means η = Xβ. Geometrically, ŷ is the orthogonal projection of y on the space spanned by the columns of X. See Theorem A.6.1 about projection matrices.

The vector of differences between the actual and the fitted values is called the vector of "residuals" ε̂ = y − ŷ. The residuals are "predictors" of the actual (but
unobserved) values of the disturbance vector ε . An estimator of a random magnitude
is usually called a “predictor,” but in the linear model estimation and prediction are
treated on the same footing, so it is not necessary to distinguish between the
two.
You should understand the difference between disturbances and residuals, and
between the two decompositions
(18.2.13)    y = Xβ + ε = Xβ̂ + ε̂
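The decomposition (18.2.13) and the normal equation together say that the residual vector ε̂ is orthogonal to every column of X. A minimal numerical sketch (my own names and data, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat      # fitted values X beta_hat
resid = y - y_hat         # residuals epsilon_hat

# X'y = X'X beta_hat is equivalent to X'(y - X beta_hat) = o
print(X.T @ resid)        # numerically zero
```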
Problem 233. 2 points Assume that X has full column rank. Show that ε̂ = My where M = I − X(X′X)⁻¹X′. Show that M is symmetric and idempotent.

Answer. By definition, ε̂ = y − Xβ̂ = y − X(X′X)⁻¹X′y = (I − X(X′X)⁻¹X′)y. Idempotent, i.e. MM = M:

(18.2.14)    MM = (I − X(X′X)⁻¹X′)(I − X(X′X)⁻¹X′) = I − X(X′X)⁻¹X′ − X(X′X)⁻¹X′ + X(X′X)⁻¹X′X(X′X)⁻¹X′ = I − X(X′X)⁻¹X′ = M
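The properties claimed in Problem 233 can also be confirmed numerically; a sketch under my own setup (a random X, which has full column rank with probability one):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 20, 3
X = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

y = rng.normal(size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
# M should be symmetric and idempotent, and My should be the residual vector
print(np.allclose(M, M.T), np.allclose(M @ M, M))
```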
Problem 234. Assume X has full column rank. Define M = I − X(X′X)⁻¹X′.

• a. 1 point Show that the space M projects on is the space orthogonal to all columns in X, i.e., Mq = q if and only if X′q = o.

Answer. X′q = o clearly implies Mq = q. Conversely, Mq = q implies X(X′X)⁻¹X′q = o. Premultiply this by X′ to get X′q = o.

• b. 1 point Show that a vector q lies in the range space of X, i.e., the space spanned by the columns of X, if and only if Mq = o. In other words, {q : q = Xa for some a} = {q : Mq = o}.

Answer. First assume Mq = o. This means q = X(X′X)⁻¹X′q = Xa with a = (X′X)⁻¹X′q. Conversely, if q = Xa then Mq = MXa = Oa = o.
Problem 235. In 2-dimensional space, write down the projection matrix on the diagonal line y = x (call it E), and compute Ez for the three vectors a = [2 1]′, b = [2 2]′, and c = [3 2]′. Draw these vectors and their projections.
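The projection matrix onto the line spanned by a vector v is vv′/(v′v); for the diagonal line y = x one can take v = [1 1]′, which gives an E with every entry equal to 1/2. The text prints no answer to Problem 235, so treat the following as my own numerical check:

```python
import numpy as np

v = np.array([1.0, 1.0])           # direction of the diagonal line y = x
E = np.outer(v, v) / (v @ v)       # projection matrix v v' / (v'v)

a = np.array([2.0, 1.0])
b = np.array([2.0, 2.0])
c = np.array([3.0, 2.0])
print(E)          # [[0.5 0.5]
                  #  [0.5 0.5]]
print(E @ a, E @ b, E @ c)
```

Note that b already lies on the line, so its projection is b itself.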
Assume we have a dependent variable y and two regressors x1 and x2 , each with
15 observations. Then one can visualize the data either as 15 points in 3-dimensional
space (a 3-dimensional scatter plot), or 3 points in 15-dimensional space. In the
first case, each point corresponds to an observation, in the second case, each point
corresponds to a variable. In this latter case the points are usually represented
as vectors. You only have 3 vectors, but each of these vectors is a vector in 15-dimensional space. But you do not have to draw a 15-dimensional space to draw these vectors; these 3 vectors span a 3-dimensional subspace, and ŷ is the projection of the vector y on the space spanned by the two regressors not only in the original
15-dimensional space, but already in this 3-dimensional subspace. In other words,
[DM93, Figure 1.3] is valid in all dimensions! In the 15-dimensional space, each
dimension represents one observation. In the 3-dimensional subspace, this is no
longer true.
Problem 236. "Simple regression" is regression with an intercept and one explanatory variable only, i.e.,

(18.2.15)    y_t = α + βx_t + ε_t

Here X = [ι x] and β = [α β]′. Evaluate (18.2.4) to get the following formulas for β̂ = [α̂ β̂]′:

(18.2.16)    \hat\alpha = \frac{\sum x_t^2 \sum y_t - \sum x_t \sum x_t y_t}{n \sum x_t^2 - (\sum x_t)^2}

(18.2.17)    \hat\beta = \frac{n \sum x_t y_t - \sum x_t \sum y_t}{n \sum x_t^2 - (\sum x_t)^2}
Answer.

(18.2.18)    X'X = \begin{bmatrix} \iota' \\ x' \end{bmatrix}\begin{bmatrix} \iota & x \end{bmatrix} = \begin{bmatrix} \iota'\iota & \iota'x \\ x'\iota & x'x \end{bmatrix} = \begin{bmatrix} n & \sum x_t \\ \sum x_t & \sum x_t^2 \end{bmatrix}

(18.2.19)    (X'X)^{-1} = \frac{1}{n\sum x_t^2 - (\sum x_t)^2}\begin{bmatrix} \sum x_t^2 & -\sum x_t \\ -\sum x_t & n \end{bmatrix}

(18.2.20)    X'y = \begin{bmatrix} \iota'y \\ x'y \end{bmatrix} = \begin{bmatrix} \sum y_t \\ \sum x_t y_t \end{bmatrix}

Therefore (X′X)⁻¹X′y gives equations (18.2.16) and (18.2.17).
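Formulas (18.2.16) and (18.2.17) can be checked against a generic least-squares routine; a small sketch with simulated data (my own setup, not from the text):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 25
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=n)

Sx, Sy = x.sum(), y.sum()
Sxx, Sxy = (x * x).sum(), (x * y).sum()
det = n * Sxx - Sx ** 2

alpha_hat = (Sxx * Sy - Sx * Sxy) / det   # (18.2.16)
beta_hat = (n * Sxy - Sx * Sy) / det      # (18.2.17)

# Reference: regress y on [ones, x] with NumPy's least-squares solver
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)
print(alpha_hat, beta_hat, coef)
```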
Problem 237. Show that

(18.2.21)    \sum_{t=1}^{n} (x_t - \bar x)(y_t - \bar y) = \sum_{t=1}^{n} x_t y_t - n \bar x \bar y
(Note, as explained in [DM93, pp. 27/8] or [Gre97, Section 5.4.1], that the left
hand side is computationally much more stable than the right.)
Answer. Simply multiply out.
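Identity (18.2.21) is easy to verify numerically; the stability remark means the centered left-hand side avoids the cancellation that hits the raw-sum right-hand side when x̄ and ȳ are large relative to the variation in the data. A sketch with well-scaled data of my own:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
y = rng.normal(size=n)

lhs = ((x - x.mean()) * (y - y.mean())).sum()    # centered sum
rhs = (x * y).sum() - n * x.mean() * y.mean()    # raw-sum form
print(lhs, rhs)  # agree in double precision for well-scaled data
```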
Problem 238. Show that (18.2.17) and (18.2.16) can also be written as follows:

(18.2.22)    \hat\beta = \frac{\sum (x_t - \bar x)(y_t - \bar y)}{\sum (x_t - \bar x)^2}

(18.2.23)    \hat\alpha = \bar y - \hat\beta \bar x
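The demeaned forms (18.2.22)-(18.2.23) agree in exact arithmetic with the raw-sum versions (18.2.16)-(18.2.17); a quick numerical sketch of that equivalence (my own simulated data):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 40
x = rng.normal(size=n)
y = 0.7 - 1.2 * x + rng.normal(size=n)
xbar, ybar = x.mean(), y.mean()

# Demeaned forms (18.2.22) and (18.2.23)
beta_hat = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
alpha_hat = ybar - beta_hat * xbar

# Raw-sum forms (18.2.16) and (18.2.17)
det = n * (x * x).sum() - x.sum() ** 2
beta_raw = (n * (x * y).sum() - x.sum() * y.sum()) / det
alpha_raw = ((x * x).sum() * y.sum() - x.sum() * (x * y).sum()) / det
print(beta_hat - beta_raw, alpha_hat - alpha_raw)  # both numerically zero
```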