
Chapter 18. Mean-Variance Analysis in the Linear Model



εi = yi − µ, and define the vectors

(18.1.1)    y = (y1, y2, …, yn)⊤,    ι = (1, 1, …, 1)⊤,    ε = (ε1, ε2, …, εn)⊤.

Then one can write the model in the form

y = ιµ + ε        ε ∼ (o, σ²I)

The notation ε ∼ (o, σ²I) is shorthand for E[ε] = o (the null vector) and V[ε] = σ²I (σ² times the identity matrix, which has 1's on the diagonal and 0's elsewhere). µ is the deterministic part of all the yi, and εi is the random part.

Model 2 is “simple regression” in which the deterministic part µ is not constant but is a function of the nonrandom variable x. The assumption here is that this function is differentiable and can, in the range of the variation of the data, be approximated by a linear function [Tin51, pp. 19–20]. I.e., each element of y is a constant α plus a constant multiple of the corresponding element of the nonrandom vector x plus a random error term: yt = α + xtβ + εt, t = 1, …, n. This can be written as

(18.1.2)    (y1, …, yn)⊤ = (1, …, 1)⊤ α + (x1, …, xn)⊤ β + (ε1, …, εn)⊤ = [1 x1; …; 1 xn] (α, β)⊤ + (ε1, …, εn)⊤,

where [1 x1; …; 1 xn] denotes the n×2 matrix whose t-th row is (1, xt), or

(18.1.3)    y = Xβ + ε        ε ∼ (o, σ²I)

18.1. THREE VERSIONS OF THE LINEAR MODEL






Problem 228. 1 point Compute the matrix product of the 2×3 matrix [1 2 5; 0 3 1] with the 3×2 matrix [4 0; 2 1; 3 8] (rows separated by semicolons).

Answer.

[1 2 5; 0 3 1] [4 0; 2 1; 3 8] = [1·4+2·2+5·3  1·0+2·1+5·8; 0·4+3·2+1·3  0·0+3·1+1·8] = [23 42; 9 11]

If the systematic part of y depends on more than one variable, then one needs

multiple regression, model 3. Mathematically, multiple regression has the same form

(18.1.3), but this time X is arbitrary (except for the restriction that all its columns

are linearly independent). Model 3 has Models 1 and 2 as special cases.

Multiple regression is also used to “correct for” disturbing influences. Let me

explain. A functional relationship, which makes the systematic part of y dependent

on some other variable x will usually only hold if other relevant influences are kept

constant. If those other influences vary, then they may affect the form of this functional relation. For instance, the marginal propensity to consume may be affected

by the interest rate, or the unemployment rate. This is why some econometricians






(Hendry) advocate that one should start with an “encompassing” model with many

explanatory variables and then narrow the specification down by hypothesis tests.

Milton Friedman, by contrast, is very suspicious about multiple regressions, and

argues in [FS91, pp. 48/9] against the encompassing approach.

Friedman does not give a theoretical argument but argues by an example from

Chemistry. Perhaps one can say that the variations in the other influences may have

more serious implications than just modifying the form of the functional relation:

they may destroy this functional relation altogether, i.e., prevent any systematic or

predictable behavior.

            observed    unobserved
random      y           ε
nonrandom   X           β, σ²

18.2. Ordinary Least Squares

In the model y = Xβ + ε, where ε ∼ (o, σ²I), the OLS-estimate β̂ is defined to be that value β = β̂ which minimizes

(18.2.1)    SSE = (y − Xβ)⊤(y − Xβ) = y⊤y − 2y⊤Xβ + β⊤X⊤Xβ.

Problem 184 shows that in model 1, this principle yields the arithmetic mean.






Problem 229. 2 points Prove that, if one predicts a random variable y by a constant a, the constant which gives the best MSE is a = E[y], and the best MSE one can get is var[y].

Answer. E[(y − a)²] = E[y²] − 2a E[y] + a². Differentiate with respect to a and set to zero to get a = E[y]. One can also differentiate first and then take the expected value: E[2(y − a)] = 0.
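The claim can also be illustrated numerically: among constant predictors, the sample mean minimizes the average squared error. A small sketch with invented data (the sample values are hypothetical):

```python
y = [2.0, 3.0, 5.0, 6.0, 9.0]  # hypothetical sample

def mse(a):
    """Average squared error of predicting every y_i by the constant a."""
    return sum((yi - a) ** 2 for yi in y) / len(y)

mean = sum(y) / len(y)  # 5.0

# The mean beats every other constant tried on a fine grid:
best = min((a / 10 for a in range(0, 151)), key=mse)
print(mean, best)  # both 5.0

var = mse(mean)  # the best attainable MSE equals the variance of the sample
```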



We will solve this minimization problem using the first-order conditions in vector

notation. As a preparation, you should read the beginning of Appendix C about

matrix differentiation and the connection between matrix differentiation and the

Jacobian matrix of a vector function. All you need at this point are the two equations (C.1.6) and (C.1.7). The chain rule (C.1.23) is enlightening but not strictly necessary

for the present derivation.

The matrix differentiation rules (C.1.6) and (C.1.7) allow us to differentiate

(18.2.1) to get

(18.2.2)    ∂SSE/∂β⊤ = −2y⊤X + 2β⊤X⊤X.

Transpose it (because it is notationally simpler to have a relationship between column vectors), set it to zero while at the same time replacing β by β̂, and divide by 2, to get the “normal equation”

(18.2.3)    X⊤y = X⊤Xβ̂.






Due to our assumption that all columns of X are linearly independent, X⊤X has an inverse and one can premultiply both sides of (18.2.3) by (X⊤X)⁻¹:

(18.2.4)    β̂ = (X⊤X)⁻¹X⊤y.

If the columns of X are not linearly independent, then (18.2.3) has more than one solution, and the normal equation is also in this case a necessary and sufficient condition for β̂ to minimize the SSE (proof in Problem 232).
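Formula (18.2.4) is easy to check numerically. A sketch using NumPy, with invented data (the design matrix, true coefficients, and noise level are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])  # intercept plus one regressor
beta_true = np.array([1.0, 2.0])                         # hypothetical true coefficients
y = X @ beta_true + rng.normal(scale=0.1, size=20)       # small noise

# (18.2.4): beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)

# beta_hat satisfies the normal equation (18.2.3): X'y = X'X beta_hat
assert np.allclose(X.T @ y, X.T @ X @ beta_hat)
```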

Problem 230. 4 points Using the matrix differentiation rules

(18.2.5)    ∂w⊤x/∂x⊤ = w⊤

(18.2.6)    ∂x⊤Mx/∂x⊤ = 2x⊤M

for symmetric M, compute the least-squares estimate β̂ which minimizes

(18.2.7)    SSE = (y − Xβ)⊤(y − Xβ)

You are allowed to assume that X⊤X has an inverse.

Answer. First you have to multiply out

(18.2.8)    (y − Xβ)⊤(y − Xβ) = y⊤y − 2y⊤Xβ + β⊤X⊤Xβ.

The matrix differentiation rules (18.2.5) and (18.2.6) allow us to differentiate (18.2.8) to get

(18.2.9)    ∂SSE/∂β⊤ = −2y⊤X + 2β⊤X⊤X.

Transpose it (because it is notationally simpler to have a relationship between column vectors), set it to zero while at the same time replacing β by β̂, and divide by 2, to get the “normal equation”

(18.2.10)    X⊤y = X⊤Xβ̂.

Since X⊤X has an inverse, one can premultiply both sides of (18.2.10) by (X⊤X)⁻¹:

(18.2.11)    β̂ = (X⊤X)⁻¹X⊤y.



Problem 231. 2 points Show the following: if the columns of X are linearly independent, then X⊤X has an inverse. (X itself is not necessarily square.) In your proof you may use the following criteria: the columns of X are linearly independent (this is also called: X has full column rank) if and only if Xa = o implies a = o. And a square matrix has an inverse if and only if its columns are linearly independent.

Answer. We have to show that any a which satisfies X⊤Xa = o is itself the null vector. From X⊤Xa = o follows a⊤X⊤Xa = 0, which can also be written ‖Xa‖² = 0. Therefore Xa = o, and since the columns of X are linearly independent, this implies a = o.



Problem 232. 3 points In this Problem we do not assume that X has full column rank; it may be arbitrary.

• a. The normal equation (18.2.3) always has at least one solution. Hint: you are allowed to use, without proof, equation (A.3.3) in the mathematical appendix.

Answer. With this hint it is easy: β̂ = (X⊤X)⁻X⊤y is a solution.



• b. If β̂ satisfies the normal equation and β is an arbitrary vector, then

(18.2.12)    (y − Xβ)⊤(y − Xβ) = (y − Xβ̂)⊤(y − Xβ̂) + (β − β̂)⊤X⊤X(β − β̂).

Answer. This is true even if X has deficient rank, and it will be shown here in this general case. To prove (18.2.12), write (18.2.1) as SSE = ((y − Xβ̂) − X(β − β̂))⊤((y − Xβ̂) − X(β − β̂)); since β̂ satisfies (18.2.3), the cross product terms disappear.
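The decomposition (18.2.12) can be verified numerically for an arbitrary β. A NumPy sketch with invented data (dimensions and random seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(15, 3))  # full column rank with probability 1
y = rng.normal(size=15)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solves the normal equation (18.2.3)
beta = rng.normal(size=3)                     # an arbitrary vector

def sse(b):
    """Sum of squared errors (18.2.1) at coefficient vector b."""
    r = y - X @ b
    return r @ r

# (18.2.12): SSE(beta) = SSE(beta_hat) + quadratic form in (beta - beta_hat)
lhs = sse(beta)
rhs = sse(beta_hat) + (beta - beta_hat) @ X.T @ X @ (beta - beta_hat)
assert np.isclose(lhs, rhs)
```

Since the quadratic form on the right is nonnegative, the identity makes it visible that no β can do better than β̂.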



• c. Conclude from this that the normal equation is a necessary and sufficient condition characterizing the values β̂ minimizing the sum of squared errors (18.2.1).

Answer. (18.2.12) shows that the normal equations are sufficient. For necessity of the normal equations, let β̂ be an arbitrary solution of the normal equation; we have seen that there is always at least one. Given β̂, it follows from (18.2.12) that for any solution β* of the minimization, X⊤X(β* − β̂) = o. Use (18.2.3) to replace X⊤Xβ̂ by X⊤y to get X⊤Xβ* = X⊤y.



It is customary to use the notation Xβ̂ = ŷ for the so-called fitted values, which are the estimates of the vector of means η = Xβ. Geometrically, ŷ is the orthogonal projection of y on the space spanned by the columns of X. See Theorem A.6.1 about projection matrices.

The vector of differences between the actual and the fitted values is called the vector of “residuals” ε̂ = y − ŷ. The residuals are “predictors” of the actual (but unobserved) values of the disturbance vector ε. An estimator of a random magnitude is usually called a “predictor,” but in the linear model estimation and prediction are treated on the same footing, therefore it is not necessary to distinguish between the two.

You should understand the difference between disturbances and residuals, and

between the two decompositions

(18.2.13)    y = Xβ + ε = Xβ̂ + ε̂

Problem 233. 2 points Assume that X has full column rank. Show that ε̂ = My where M = I − X(X⊤X)⁻¹X⊤. Show that M is symmetric and idempotent.

Answer. By definition, ε̂ = y − Xβ̂ = y − X(X⊤X)⁻¹X⊤y = (I − X(X⊤X)⁻¹X⊤)y. M is symmetric because M⊤ = I⊤ − (X(X⊤X)⁻¹X⊤)⊤ = I − X(X⊤X)⁻¹X⊤ = M. Idempotent, i.e. MM = M:

(18.2.14)    MM = (I − X(X⊤X)⁻¹X⊤)(I − X(X⊤X)⁻¹X⊤) = I − X(X⊤X)⁻¹X⊤ − X(X⊤X)⁻¹X⊤ + X(X⊤X)⁻¹X⊤X(X⊤X)⁻¹X⊤ = I − X(X⊤X)⁻¹X⊤ = M.

Problem 234. Assume X has full column rank. Define M = I − X(X⊤X)⁻¹X⊤.

• a. 1 point Show that the space M projects on is the space orthogonal to all columns in X, i.e., Mq = q if and only if X⊤q = o.

Answer. X⊤q = o clearly implies Mq = q. Conversely, Mq = q implies X(X⊤X)⁻¹X⊤q = o. Premultiply this by X⊤ to get X⊤q = o.

• b. 1 point Show that a vector q lies in the range space of X, i.e., the space spanned by the columns of X, if and only if Mq = o. In other words, {q : q = Xa for some a} = {q : Mq = o}.

Answer. First assume Mq = o. This means q = X(X⊤X)⁻¹X⊤q = Xa with a = (X⊤X)⁻¹X⊤q. Conversely, if q = Xa then Mq = MXa = Oa = o.



Problem 235. In 2-dimensional space, write down the projection matrix on the diagonal line y = x (call it E), and compute Ez for the three vectors a = (2, 1)⊤, b = (2, 2)⊤, and c = (3, 2)⊤. Draw these vectors and their projections.

Assume we have a dependent variable y and two regressors x1 and x2, each with 15 observations. Then one can visualize the data either as 15 points in 3-dimensional space (a 3-dimensional scatter plot), or as 3 points in 15-dimensional space. In the first case, each point corresponds to an observation; in the second case, each point corresponds to a variable. In this latter case the points are usually represented as vectors. You only have 3 vectors, but each of these vectors is a vector in 15-dimensional space. But you do not have to draw a 15-dimensional space to draw these vectors; these 3 vectors span a 3-dimensional subspace, and ŷ is the projection of the vector y on the space spanned by the two regressors not only in the original 15-dimensional space, but already in this 3-dimensional subspace. In other words, [DM93, Figure 1.3] is valid in all dimensions! In the 15-dimensional space, each dimension represents one observation. In the 3-dimensional subspace, this is no longer true.

Problem 236. “Simple regression” is regression with an intercept and one explanatory variable only, i.e.,

(18.2.15)    yt = α + βxt + εt

Here X = [ι x] and β = (α, β)⊤. Evaluate (18.2.4) to get the following formulas for β̂ = (α̂, β̂)⊤:

(18.2.16)    α̂ = (Σ xt² Σ yt − Σ xt Σ xtyt) / (n Σ xt² − (Σ xt)²)

(18.2.17)    β̂ = (n Σ xtyt − Σ xt Σ yt) / (n Σ xt² − (Σ xt)²)

Answer.

(18.2.18)    X⊤X = [ι x]⊤ [ι x] = [ι⊤ι  ι⊤x; x⊤ι  x⊤x] = [n  Σ xt; Σ xt  Σ xt²]

(18.2.19)    (X⊤X)⁻¹ = (1 / (n Σ xt² − (Σ xt)²)) [Σ xt²  −Σ xt; −Σ xt  n]

(18.2.20)    X⊤y = [ι⊤y; x⊤y] = [Σ yt; Σ xtyt]

Therefore (X⊤X)⁻¹X⊤y gives equations (18.2.16) and (18.2.17).
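The closed-form solutions (18.2.16)–(18.2.17) are straightforward to evaluate directly. A pure-Python sketch with invented data (the x and y values below are made up for illustration):

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical regressor
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical responses
n = len(x)

sx  = sum(x)                                  # Σ x_t
sy  = sum(y)                                  # Σ y_t
sxx = sum(xt * xt for xt in x)                # Σ x_t²
sxy = sum(xt * yt for xt, yt in zip(x, y))    # Σ x_t y_t
den = n * sxx - sx ** 2                       # n Σ x_t² − (Σ x_t)²

alpha_hat = (sxx * sy - sx * sxy) / den  # (18.2.16)
beta_hat  = (n * sxy - sx * sy) / den    # (18.2.17)
print(alpha_hat, beta_hat)  # ≈ 0.05 and ≈ 1.99 with these numbers
```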



Problem 237. Show that

(18.2.21)    Σ_{t=1}^n (xt − x̄)(yt − ȳ) = Σ_{t=1}^n xtyt − n x̄ ȳ

(Note, as explained in [DM93, pp. 27/8] or [Gre97, Section 5.4.1], that the left hand side is computationally much more stable than the right.)

Answer. Simply multiply out.



Problem 238. Show that (18.2.17) and (18.2.16) can also be written as follows:

(18.2.22)    β̂ = Σ(xt − x̄)(yt − ȳ) / Σ(xt − x̄)²

(18.2.23)    α̂ = ȳ − β̂ x̄
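The centered forms (18.2.22)–(18.2.23) give the same estimates as (18.2.16)–(18.2.17), and the identity (18.2.21) can be checked along the way. A pure-Python sketch (same kind of invented data as before):

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical regressor
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical responses
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# (18.2.22): slope from centered cross-products
beta_hat = (sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y))
            / sum((xt - xbar) ** 2 for xt in x))
# (18.2.23): intercept from the means
alpha_hat = ybar - beta_hat * xbar

# (18.2.21): Σ(x_t − x̄)(y_t − ȳ) equals Σ x_t y_t − n x̄ ȳ
lhs = sum((xt - xbar) * (yt - ybar) for xt, yt in zip(x, y))
rhs = sum(xt * yt for xt, yt in zip(x, y)) - n * xbar * ybar
```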


