18. MEAN-VARIANCE ANALYSIS IN THE LINEAR MODEL
ε_i = y_i − µ, and define the vectors y = [y_1 y_2 ··· y_n]′, ι = [1 1 ··· 1]′, and ε = [ε_1 ε_2 ··· ε_n]′. Then one can write the model in the form

(18.1.1)    y = ιµ + ε,    ε ∼ (o, σ²I)
The notation ε ∼ (o, σ²I) is shorthand for E[ε] = o (the null vector) and V[ε] = σ²I (σ² times the identity matrix, which has 1's in the diagonal and 0's elsewhere). µ is the deterministic part of all the y_i, and ε_i is the random part.
Model 2 is “simple regression” in which the deterministic part µ is not constant
but is a function of the nonrandom variable x. The assumption here is that this
function is differentiable and can, in the range of the variation of the data, be approximated by a linear function [Tin51, pp. 19–20]. I.e., each element of y is a
constant α plus a constant multiple of the corresponding element of the nonrandom
vector x plus a random error term: y_t = α + x_t β + ε_t, t = 1, . . . , n. This can be
written as
(18.1.2)
\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}
= \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \alpha
+ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \beta
+ \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}
= \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
  \begin{bmatrix} \alpha \\ \beta \end{bmatrix}
+ \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix}
or

(18.1.3)    y = Xβ + ε,    ε ∼ (o, σ²I)
18.1. THREE VERSIONS OF THE LINEAR MODEL
Problem 228. 1 point Compute the matrix product

\begin{bmatrix} 1 & 2 & 5 \\ 0 & 3 & 1 \end{bmatrix}
\begin{bmatrix} 4 & 0 \\ 2 & 1 \\ 3 & 8 \end{bmatrix}

Answer.

\begin{bmatrix} 1 & 2 & 5 \\ 0 & 3 & 1 \end{bmatrix}
\begin{bmatrix} 4 & 0 \\ 2 & 1 \\ 3 & 8 \end{bmatrix}
= \begin{bmatrix} 1\cdot4+2\cdot2+5\cdot3 & 1\cdot0+2\cdot1+5\cdot8 \\ 0\cdot4+3\cdot2+1\cdot3 & 0\cdot0+3\cdot1+1\cdot8 \end{bmatrix}
= \begin{bmatrix} 23 & 42 \\ 9 & 11 \end{bmatrix}
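As a quick sanity check (not part of the original text), the product in Problem 228 can be verified with NumPy; each entry of the result is the dot product of a row of the first factor with a column of the second.

```python
import numpy as np

# The two factors from Problem 228
A = np.array([[1, 2, 5],
              [0, 3, 1]])
B = np.array([[4, 0],
              [2, 1],
              [3, 8]])

# Entry (i, j) of A @ B is the dot product of row i of A with column j of B
AB = A @ B
print(AB)  # [[23 42]
           #  [ 9 11]]
```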
If the systematic part of y depends on more than one variable, then one needs
multiple regression, model 3. Mathematically, multiple regression has the same form
(18.1.3), but this time X is arbitrary (except for the restriction that all its columns
are linearly independent). Model 3 has Models 1 and 2 as special cases.
Multiple regression is also used to “correct for” disturbing influences. Let me
explain. A functional relationship, which makes the systematic part of y dependent on some other variable x, will usually only hold if other relevant influences are kept
constant. If those other influences vary, then they may affect the form of this functional relation. For instance, the marginal propensity to consume may be affected
by the interest rate, or the unemployment rate. This is why some econometricians
(Hendry) advocate that one should start with an “encompassing” model with many
explanatory variables and then narrow the specification down by hypothesis tests.
Milton Friedman, by contrast, is very suspicious about multiple regressions, and
argues in [FS91, pp. 48/9] against the encompassing approach.
Friedman does not give a theoretical argument but argues by an example from
Chemistry. Perhaps one can say that the variations in the other influences may have
more serious implications than just modifying the form of the functional relation:
they may destroy this functional relation altogether, i.e., prevent any systematic or
predictable behavior.
            observed    unobserved
random      y           ε
nonrandom   X           β, σ²
18.2. Ordinary Least Squares
In the model y = Xβ + ε, where ε ∼ (o, σ²I), the OLS-estimate β̂ is defined to be that value β = β̂ which minimizes

(18.2.1)    SSE = (y − Xβ)′(y − Xβ) = y′y − 2y′Xβ + β′X′Xβ.
Problem 184 shows that in model 1, this principle yields the arithmetic mean.
Problem 229. 2 points Prove that, if one predicts a random variable y by a
constant a, the constant which gives the best MSE is a = E[y], and the best MSE one
can get is var[y].
Answer. E[(y − a)²] = E[y²] − 2a E[y] + a². Differentiate with respect to a and set to zero to get a = E[y]. Plugging this back in gives E[(y − E[y])²] = var[y]. One can also differentiate first and then take the expected value: E[2(y − a)] = 0.
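A small numerical illustration of Problem 229 (a sketch of my own, not from the text): over a grid of constant predictors, the sample mean minimizes the average squared error, and the minimal value is the sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=3.0, scale=2.0, size=10_000)

def mse(a):
    """Average squared error when predicting every y by the constant a."""
    return np.mean((y - a) ** 2)

# Scan a grid of candidate constants; the best one should sit at the sample mean
grid = np.linspace(0.0, 6.0, 601)
best = grid[np.argmin([mse(a) for a in grid])]
print(best, y.mean(), mse(y.mean()), y.var())
```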
We will solve this minimization problem using the first-order conditions in vector
notation. As a preparation, you should read the beginning of Appendix C about
matrix differentiation and the connection between matrix differentiation and the
Jacobian matrix of a vector function. All you need at this point is the two equations
(C.1.6) and (C.1.7). The chain rule (C.1.23) is enlightening but not strictly necessary
for the present derivation.
The matrix differentiation rules (C.1.6) and (C.1.7) allow us to differentiate
(18.2.1) to get
(18.2.2)    ∂SSE/∂β = −2y′X + 2β′X′X.

Transpose it (because it is notationally simpler to have a relationship between column vectors), set it to zero while at the same time replacing β by β̂, and divide by 2, to get the "normal equation"

(18.2.3)    X′y = X′Xβ̂.
Due to our assumption that all columns of X are linearly independent, X′X has an inverse and one can premultiply both sides of (18.2.3) by (X′X)⁻¹:

(18.2.4)    β̂ = (X′X)⁻¹X′y.

If the columns of X are not linearly independent, then (18.2.3) has more than one solution, and the normal equation is also in this case a necessary and sufficient condition for β̂ to minimize the SSE (proof in Problem 232).
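The closed form (18.2.4) is easy to check numerically. The following sketch (variable names and simulated data are my own) solves the normal equation X′Xβ̂ = X′y directly and compares the result with NumPy's least-squares routine; np.linalg.solve is used instead of forming (X′X)⁻¹ explicitly, which is the numerically preferred route.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept plus one regressor
beta_true = np.array([2.0, -1.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Solve the normal equation X'X beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Reference answer from NumPy's least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat, beta_lstsq)
```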
Problem 230. 4 points Using the matrix differentiation rules

(18.2.5)    ∂w′x/∂x = w′
(18.2.6)    ∂x′Mx/∂x = 2x′M

for symmetric M, compute the least-squares estimate β̂ which minimizes

(18.2.7)    SSE = (y − Xβ)′(y − Xβ)

You are allowed to assume that X′X has an inverse.

Answer. First you have to multiply out

(18.2.8)    (y − Xβ)′(y − Xβ) = y′y − 2y′Xβ + β′X′Xβ.

The matrix differentiation rules (18.2.5) and (18.2.6) allow us to differentiate (18.2.8) to get

(18.2.9)    ∂SSE/∂β = −2y′X + 2β′X′X.
Transpose it (because it is notationally simpler to have a relationship between column vectors), set it to zero while at the same time replacing β by β̂, and divide by 2, to get the "normal equation"

(18.2.10)    X′y = X′Xβ̂.

Since X′X has an inverse, one can premultiply both sides of (18.2.10) by (X′X)⁻¹:

(18.2.11)    β̂ = (X′X)⁻¹X′y.
Problem 231. 2 points Show the following: if the columns of X are linearly independent, then X′X has an inverse. (X itself is not necessarily square.) In your proof you may use the following criteria: the columns of X are linearly independent (this is also called: X has full column rank) if and only if Xa = o implies a = o. And a square matrix has an inverse if and only if its columns are linearly independent.

Answer. We have to show that any a which satisfies X′Xa = o is itself the null vector. From X′Xa = o follows a′X′Xa = 0, which can also be written ‖Xa‖² = 0. Therefore Xa = o, and since the columns of X are linearly independent, this implies a = o.
Problem 232. 3 points In this Problem we do not assume that X has full column rank; it may be arbitrary.

• a. The normal equation (18.2.3) always has at least one solution. Hint: you are allowed to use, without proof, equation (A.3.3) in the mathematical appendix.
Answer. With this hint it is easy: β̂ = (X′X)⁻X′y is a solution.
• b. If β̂ satisfies the normal equation and β is an arbitrary vector, then

(18.2.12)    (y − Xβ)′(y − Xβ) = (y − Xβ̂)′(y − Xβ̂) + (β̂ − β)′X′X(β̂ − β).

Answer. This is true even if X has deficient rank, and it will be shown here in this general case. To prove (18.2.12), write (18.2.1) as SSE = [(y − Xβ̂) − X(β − β̂)]′[(y − Xβ̂) − X(β − β̂)]; since β̂ satisfies (18.2.3), the cross product terms disappear.
• c. Conclude from this that the normal equation is a necessary and sufficient condition characterizing the values β̂ minimizing the sum of squared errors (18.2.12).

Answer. (18.2.12) shows that the normal equations are sufficient. For necessity of the normal equations let β̂ be an arbitrary solution of the normal equation; we have seen that there is always at least one. Given β̂, it follows from (18.2.12) that for any solution β* of the minimization, X′X(β* − β̂) = o. Use (18.2.3) to replace X′Xβ̂ by X′y to get X′Xβ* = X′y.
It is customary to use the notation Xβ̂ = ŷ for the so-called fitted values, which are the estimates of the vector of means η = Xβ. Geometrically, ŷ is the orthogonal projection of y on the space spanned by the columns of X. See Theorem A.6.1 about projection matrices.

The vector of differences between the actual and the fitted values is called the vector of "residuals" ε̂ = y − ŷ. The residuals are "predictors" of the actual (but
unobserved) values of the disturbance vector ε . An estimator of a random magnitude
is usually called a “predictor,” but in the linear model estimation and prediction are
treated on the same footing, so it is not necessary to distinguish between the
two.
You should understand the difference between disturbances and residuals, and
between the two decompositions
(18.2.13)    y = Xβ + ε = Xβ̂ + ε̂
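The decomposition (18.2.13) and the normal equation together say that the residual vector ε̂ is orthogonal to every column of X. A minimal numerical sketch (my own names and data, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat      # fitted values X beta_hat
resid = y - y_hat         # residuals epsilon_hat

# X'y = X'X beta_hat is equivalent to X'(y - X beta_hat) = o
print(X.T @ resid)        # numerically zero
```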
Problem 233. 2 points Assume that X has full column rank. Show that ε̂ = My where M = I − X(X′X)⁻¹X′. Show that M is symmetric and idempotent.

Answer. By definition, ε̂ = y − Xβ̂ = y − X(X′X)⁻¹X′y = (I − X(X′X)⁻¹X′)y. Idempotent, i.e. MM = M:

(18.2.14)    MM = (I − X(X′X)⁻¹X′)(I − X(X′X)⁻¹X′) = I − X(X′X)⁻¹X′ − X(X′X)⁻¹X′ + X(X′X)⁻¹X′X(X′X)⁻¹X′ = I − X(X′X)⁻¹X′ = M
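The properties claimed in Problem 233 can also be confirmed numerically; a sketch under my own setup (a random X, which has full column rank with probability one):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 20, 3
X = rng.normal(size=(n, k))
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

y = rng.normal(size=n)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
# M should be symmetric and idempotent, and My should be the residual vector
print(np.allclose(M, M.T), np.allclose(M @ M, M))
```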
Problem 234. Assume X has full column rank. Define M = I − X(X′X)⁻¹X′.

• a. 1 point Show that the space M projects on is the space orthogonal to all columns in X, i.e., Mq = q if and only if X′q = o.

Answer. X′q = o clearly implies Mq = q. Conversely, Mq = q implies X(X′X)⁻¹X′q = o. Premultiply this by X′ to get X′q = o.

• b. 1 point Show that a vector q lies in the range space of X, i.e., the space spanned by the columns of X, if and only if Mq = o. In other words, {q : q = Xa for some a} = {q : Mq = o}.

Answer. First assume Mq = o. This means q = X(X′X)⁻¹X′q = Xa with a = (X′X)⁻¹X′q. Conversely, if q = Xa then Mq = MXa = Oa = o.
Problem 235. In 2-dimensional space, write down the projection matrix on the diagonal line y = x (call it E), and compute Ez for the three vectors a = [2 1]′, b = [2 2]′, and c = [3 2]′. Draw these vectors and their projections.
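The projection matrix onto the line spanned by a vector v is vv′/(v′v); for the diagonal line y = x one can take v = [1 1]′, which gives an E with every entry equal to 1/2. The text prints no answer to Problem 235, so treat the following as my own numerical check:

```python
import numpy as np

v = np.array([1.0, 1.0])           # direction of the diagonal line y = x
E = np.outer(v, v) / (v @ v)       # projection matrix v v' / (v'v)

a = np.array([2.0, 1.0])
b = np.array([2.0, 2.0])
c = np.array([3.0, 2.0])
print(E)          # [[0.5 0.5]
                  #  [0.5 0.5]]
print(E @ a, E @ b, E @ c)
```

Note that b already lies on the line, so its projection is b itself.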
Assume we have a dependent variable y and two regressors x1 and x2 , each with
15 observations. Then one can visualize the data either as 15 points in 3-dimensional
space (a 3-dimensional scatter plot), or 3 points in 15-dimensional space. In the
first case, each point corresponds to an observation, in the second case, each point
corresponds to a variable. In this latter case the points are usually represented
as vectors. You only have 3 vectors, but each of these vectors is a vector in 15-dimensional space. But you do not have to draw a 15-dimensional space to draw these vectors; these 3 vectors span a 3-dimensional subspace, and ŷ is the projection of the vector y on the space spanned by the two regressors not only in the original
15-dimensional space, but already in this 3-dimensional subspace. In other words,
[DM93, Figure 1.3] is valid in all dimensions! In the 15-dimensional space, each
dimension represents one observation. In the 3-dimensional subspace, this is no
longer true.
Problem 236. "Simple regression" is regression with an intercept and one explanatory variable only, i.e.,

(18.2.15)    y_t = α + βx_t + ε_t

Here X = [ι x] and β = [α β]′. Evaluate (18.2.4) to get the following formulas for β̂ = [α̂ β̂]′:

(18.2.16)    \hat\alpha = \frac{\sum x_t^2 \sum y_t - \sum x_t \sum x_t y_t}{n \sum x_t^2 - (\sum x_t)^2}

(18.2.17)    \hat\beta = \frac{n \sum x_t y_t - \sum x_t \sum y_t}{n \sum x_t^2 - (\sum x_t)^2}
Answer.

(18.2.18)    X'X = \begin{bmatrix} \iota' \\ x' \end{bmatrix}\begin{bmatrix} \iota & x \end{bmatrix} = \begin{bmatrix} \iota'\iota & \iota'x \\ x'\iota & x'x \end{bmatrix} = \begin{bmatrix} n & \sum x_t \\ \sum x_t & \sum x_t^2 \end{bmatrix}

(18.2.19)    (X'X)^{-1} = \frac{1}{n\sum x_t^2 - (\sum x_t)^2}\begin{bmatrix} \sum x_t^2 & -\sum x_t \\ -\sum x_t & n \end{bmatrix}

(18.2.20)    X'y = \begin{bmatrix} \iota'y \\ x'y \end{bmatrix} = \begin{bmatrix} \sum y_t \\ \sum x_t y_t \end{bmatrix}

Therefore (X′X)⁻¹X′y gives equations (18.2.16) and (18.2.17).
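Formulas (18.2.16) and (18.2.17) can be checked against a generic least-squares routine; a small sketch with simulated data (my own setup, not from the text):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 25
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + 0.5 * rng.normal(size=n)

Sx, Sy = x.sum(), y.sum()
Sxx, Sxy = (x * x).sum(), (x * y).sum()
det = n * Sxx - Sx ** 2

alpha_hat = (Sxx * Sy - Sx * Sxy) / det   # (18.2.16)
beta_hat = (n * Sxy - Sx * Sy) / det      # (18.2.17)

# Reference: regress y on [ones, x] with NumPy's least-squares solver
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x]), y, rcond=None)
print(alpha_hat, beta_hat, coef)
```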
Problem 237. Show that

(18.2.21)    \sum_{t=1}^{n} (x_t - \bar x)(y_t - \bar y) = \sum_{t=1}^{n} x_t y_t - n \bar x \bar y
(Note, as explained in [DM93, pp. 27/8] or [Gre97, Section 5.4.1], that the left
hand side is computationally much more stable than the right.)
Answer. Simply multiply out.
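Identity (18.2.21) is easy to verify numerically; the stability remark means the centered left-hand side avoids the cancellation that hits the raw-sum right-hand side when x̄ and ȳ are large relative to the variation in the data. A sketch with well-scaled data of my own:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
y = rng.normal(size=n)

lhs = ((x - x.mean()) * (y - y.mean())).sum()    # centered sum
rhs = (x * y).sum() - n * x.mean() * y.mean()    # raw-sum form
print(lhs, rhs)  # agree in double precision for well-scaled data
```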
Problem 238. Show that (18.2.17) and (18.2.16) can also be written as follows:

(18.2.22)    \hat\beta = \frac{\sum (x_t - \bar x)(y_t - \bar y)}{\sum (x_t - \bar x)^2}

(18.2.23)    \hat\alpha = \bar y - \hat\beta \bar x
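The demeaned forms (18.2.22)-(18.2.23) agree in exact arithmetic with the raw-sum versions (18.2.16)-(18.2.17); a quick numerical sketch of that equivalence (my own simulated data):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 40
x = rng.normal(size=n)
y = 0.7 - 1.2 * x + rng.normal(size=n)
xbar, ybar = x.mean(), y.mean()

# Demeaned forms (18.2.22) and (18.2.23)
beta_hat = ((x - xbar) * (y - ybar)).sum() / ((x - xbar) ** 2).sum()
alpha_hat = ybar - beta_hat * xbar

# Raw-sum forms (18.2.16) and (18.2.17)
det = n * (x * x).sum() - x.sum() ** 2
beta_raw = (n * (x * y).sum() - x.sum() * y.sum()) / det
alpha_raw = ((x * x).sum() * y.sum() - x.sum() * (x * y).sum()) / det
print(beta_hat - beta_raw, alpha_hat - alpha_raw)  # both numerically zero
```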