
Chapter 17. The Mean Squared Error as an Initial Criterion of Precision






that not only the MSE of all linear transformations, but also all other nonnegative definite quadratic loss functions involving these vectors (such as the trace of the MSE-matrix, which is an often-used criterion) are minimized. In order to formulate and prove this, we first need a formal definition of the MSE-matrix. We write $\mathcal{MSE}$ for the matrix and $\operatorname{MSE}$ for the scalar mean squared error. The MSE-matrix of $\hat{\phi}$ as an estimator of $\phi$ is defined as

(17.1.1)    $\mathcal{MSE}[\hat{\phi}; \phi] = \mathcal{E}\bigl[(\hat{\phi} - \phi)(\hat{\phi} - \phi)^\top\bigr].$



Problem 241. 2 points Let $\theta$ be a vector of possibly random parameters, and $\hat{\theta}$ an estimator of $\theta$. Show that

(17.1.2)    $\mathcal{MSE}[\hat{\theta}; \theta] = \mathcal{V}[\hat{\theta} - \theta] + \bigl(\mathcal{E}[\hat{\theta} - \theta]\bigr)\bigl(\mathcal{E}[\hat{\theta} - \theta]\bigr)^\top.$



Don't assume the scalar result but make a proof that is good for vectors and scalars.

Answer. For any random vector $x$ follows

$\mathcal{E}[xx^\top] = \mathcal{E}\bigl[(x - \mathcal{E}[x] + \mathcal{E}[x])(x - \mathcal{E}[x] + \mathcal{E}[x])^\top\bigr]$
$\quad = \mathcal{E}\bigl[(x - \mathcal{E}[x])(x - \mathcal{E}[x])^\top\bigr] + \mathcal{E}\bigl[x - \mathcal{E}[x]\bigr]\,\mathcal{E}[x]^\top + \mathcal{E}[x]\,\mathcal{E}\bigl[x - \mathcal{E}[x]\bigr]^\top + \mathcal{E}[x]\,\mathcal{E}[x]^\top$
$\quad = \mathcal{V}[x] + O + O + \mathcal{E}[x]\,\mathcal{E}[x]^\top,$

since the cross terms vanish: $\mathcal{E}[x - \mathcal{E}[x]] = o$. Setting $x = \hat{\theta} - \theta$ the statement follows.



If $\theta$ is nonrandom, formula (17.1.2) simplifies slightly, since in this case $\mathcal{V}[\hat{\theta} - \theta] = \mathcal{V}[\hat{\theta}]$. In this case, the MSE-matrix is the covariance matrix plus the squared bias matrix. If $\theta$ is nonrandom and in addition $\hat{\theta}$ is unbiased, then the MSE-matrix coincides with the covariance matrix.
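The decomposition (17.1.2) can be checked numerically. The following sketch (our own illustration; the parameter values and variable names are made up for the example) simulates a deliberately biased Gaussian estimator and compares the empirical MSE-matrix with the covariance-plus-squared-bias formula:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([10.0, 10.0])             # nonrandom true parameter
bias = np.array([1.0, -0.5])               # E[theta_hat - theta], chosen for the example
cov = np.array([[2.0, 0.3], [0.3, 1.0]])   # V[theta_hat - theta]

# Simulate theta_hat = theta + bias + Gaussian noise with covariance `cov`
draws = rng.multivariate_normal(theta + bias, cov, size=200_000)
err = draws - theta                         # realizations of theta_hat - theta

mse_empirical = err.T @ err / len(err)      # estimates E[(th - t)(th - t)']
mse_formula = cov + np.outer(bias, bias)    # V + (bias)(bias)' per (17.1.2)
print(np.max(np.abs(mse_empirical - mse_formula)))  # small Monte Carlo error
```

With $2 \times 10^5$ draws the two matrices agree to a couple of decimal places.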

Theorem 17.1.1. Assume $\hat{\phi}$ and $\tilde{\phi}$ are two estimators of the parameter $\phi$ (which is allowed to be random itself). Then conditions (17.1.3), (17.1.4), and (17.1.5) are equivalent:

(17.1.3)    For every constant vector $t$, $\operatorname{MSE}[t^\top\hat{\phi}; t^\top\phi] \le \operatorname{MSE}[t^\top\tilde{\phi}; t^\top\phi]$

(17.1.4)    $\mathcal{MSE}[\tilde{\phi}; \phi] - \mathcal{MSE}[\hat{\phi}; \phi]$ is a nonnegative definite matrix

(17.1.5)    For every nnd $\Theta$, $\mathcal{E}\bigl[(\hat{\phi} - \phi)^\top \Theta (\hat{\phi} - \phi)\bigr] \le \mathcal{E}\bigl[(\tilde{\phi} - \phi)^\top \Theta (\tilde{\phi} - \phi)\bigr].$



Proof. Call $\mathcal{MSE}[\tilde{\phi}; \phi] = \sigma^2 \Xi$ and $\mathcal{MSE}[\hat{\phi}; \phi] = \sigma^2 \Omega$. To show that (17.1.3) implies (17.1.4), simply note that $\operatorname{MSE}[t^\top\hat{\phi}; t^\top\phi] = \sigma^2 t^\top \Omega t$ and likewise $\operatorname{MSE}[t^\top\tilde{\phi}; t^\top\phi] = \sigma^2 t^\top \Xi t$. Therefore (17.1.3) is equivalent to $t^\top(\Xi - \Omega)t \ge 0$ for all $t$, which is the defining property making $\Xi - \Omega$ nonnegative definite.

Here is the proof that (17.1.4) implies (17.1.5):

$\mathrm{E}\bigl[(\hat{\phi} - \phi)^\top \Theta (\hat{\phi} - \phi)\bigr] = \mathrm{E}\bigl[\operatorname{tr}\bigl((\hat{\phi} - \phi)^\top \Theta (\hat{\phi} - \phi)\bigr)\bigr] = \mathrm{E}\bigl[\operatorname{tr}\bigl(\Theta(\hat{\phi} - \phi)(\hat{\phi} - \phi)^\top\bigr)\bigr] = \operatorname{tr}\bigl(\Theta\,\mathcal{E}[(\hat{\phi} - \phi)(\hat{\phi} - \phi)^\top]\bigr) = \sigma^2 \operatorname{tr}(\Theta\Omega)$

and in the same way

$\mathrm{E}\bigl[(\tilde{\phi} - \phi)^\top \Theta (\tilde{\phi} - \phi)\bigr] = \sigma^2 \operatorname{tr}(\Theta\Xi).$

The difference in the expected quadratic forms is therefore $\sigma^2 \operatorname{tr}\bigl(\Theta(\Xi - \Omega)\bigr)$. By assumption, $\Xi - \Omega$ is nonnegative definite. Therefore, by theorem A.5.6 in the Mathematical Appendix, or by Problem 242 below, this trace is nonnegative.

To complete the proof, (17.1.5) has (17.1.3) as a special case if one sets $\Theta = tt^\top$.



17.1. COMPARISON OF TWO VECTOR ESTIMATORS



Problem 242. Show that if $\Theta$ and $\Sigma$ are symmetric and nonnegative definite, then $\operatorname{tr}(\Theta\Sigma) \ge 0$. You are allowed to use that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, that the trace of a nonnegative definite matrix is $\ge 0$, and Problem 118 (which is trivial).

Answer. Write $\Theta = RR^\top$; then $\operatorname{tr}(\Theta\Sigma) = \operatorname{tr}(RR^\top\Sigma) = \operatorname{tr}(R^\top\Sigma R) \ge 0$.
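A quick numerical illustration of Problem 242 (a sketch of ours, not from the text): build random symmetric nonnegative definite matrices from square roots and verify both the cyclic-trace identity and the sign of the trace.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random nnd matrices via Theta = RR', Sigma = SS'
R = rng.standard_normal((4, 4))
S = rng.standard_normal((4, 4))
Theta, Sigma = R @ R.T, S @ S.T

# tr(Theta Sigma) = tr(RR' Sigma) = tr(R' Sigma R), a trace of an nnd
# matrix, hence nonnegative
t1 = np.trace(Theta @ Sigma)
t2 = np.trace(R.T @ Sigma @ R)
print(t1, t2)  # equal up to rounding, and nonnegative
```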



Problem 243. Consider two very simple-minded estimators of the unknown nonrandom parameter vector $\phi = \begin{bmatrix}\phi_1 \\ \phi_2\end{bmatrix}$. Neither of these estimators depends on any observations, they are constants. The first estimator is $\hat{\phi} = \begin{bmatrix}11 \\ 11\end{bmatrix}$, and the second is $\tilde{\phi} = \begin{bmatrix}12 \\ 8\end{bmatrix}$.

• a. 2 points Compute the MSE-matrices of these two estimators if the true value of the parameter vector is $\phi = \begin{bmatrix}10 \\ 10\end{bmatrix}$. For which estimator is the trace of the MSE matrix smaller?

Answer. $\hat{\phi}$ has smaller trace of the MSE-matrix.

$\hat{\phi} - \phi = \begin{bmatrix}1 \\ 1\end{bmatrix}, \qquad \tilde{\phi} - \phi = \begin{bmatrix}2 \\ -2\end{bmatrix}$

$\mathcal{MSE}[\hat{\phi}; \phi] = \mathcal{E}\bigl[(\hat{\phi} - \phi)(\hat{\phi} - \phi)^\top\bigr] = \mathcal{E}\Bigl[\begin{bmatrix}1 \\ 1\end{bmatrix}\begin{bmatrix}1 & 1\end{bmatrix}\Bigr] = \begin{bmatrix}1 & 1 \\ 1 & 1\end{bmatrix}$

$\mathcal{MSE}[\tilde{\phi}; \phi] = \begin{bmatrix}4 & -4 \\ -4 & 4\end{bmatrix}$

Note that both MSE-matrices are singular, i.e., both estimators allow an error-free look at certain linear combinations of the parameter vector.
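The computation in part a is small enough to reproduce directly (our own numpy sketch): for a constant estimator of a nonrandom parameter the MSE-matrix reduces to the outer product of its fixed error with itself.

```python
import numpy as np

phi = np.array([10.0, 10.0])       # true parameter vector
phi_hat = np.array([11.0, 11.0])   # first constant estimator
phi_til = np.array([12.0, 8.0])    # second constant estimator

def mse_matrix(est, true):
    # For a constant estimator the expectation is trivial: the
    # MSE-matrix is just (est - true)(est - true)'.
    e = est - true
    return np.outer(e, e)

M_hat = mse_matrix(phi_hat, phi)   # [[1, 1], [1, 1]]
M_til = mse_matrix(phi_til, phi)   # [[4, -4], [-4, 4]]
print(np.trace(M_hat), np.trace(M_til))  # 2.0 8.0 -> phi_hat wins on trace
```

Both matrices are rank one, which is the singularity noted in the text.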



• b. 1 point Give two vectors $g = \begin{bmatrix}g_1 \\ g_2\end{bmatrix}$ and $h = \begin{bmatrix}h_1 \\ h_2\end{bmatrix}$ satisfying $\operatorname{MSE}[g^\top\hat{\phi}; g^\top\phi] < \operatorname{MSE}[g^\top\tilde{\phi}; g^\top\phi]$ and $\operatorname{MSE}[h^\top\hat{\phi}; h^\top\phi] > \operatorname{MSE}[h^\top\tilde{\phi}; h^\top\phi]$ ($g$ and $h$ are not unique; there are many possibilities).

Answer. With $g = \begin{bmatrix}1 \\ -1\end{bmatrix}$ and $h = \begin{bmatrix}1 \\ 1\end{bmatrix}$ for instance we get $g^\top\hat{\phi} - g^\top\phi = 0$, $g^\top\tilde{\phi} - g^\top\phi = 4$, $h^\top\hat{\phi} - h^\top\phi = 2$, $h^\top\tilde{\phi} - h^\top\phi = 0$; therefore $\operatorname{MSE}[g^\top\hat{\phi}; g^\top\phi] = 0$, $\operatorname{MSE}[g^\top\tilde{\phi}; g^\top\phi] = 16$, $\operatorname{MSE}[h^\top\hat{\phi}; h^\top\phi] = 4$, $\operatorname{MSE}[h^\top\tilde{\phi}; h^\top\phi] = 0$. An alternative way to compute this is e.g.

$\operatorname{MSE}[g^\top\tilde{\phi}; g^\top\phi] = \begin{bmatrix}1 & -1\end{bmatrix}\begin{bmatrix}4 & -4 \\ -4 & 4\end{bmatrix}\begin{bmatrix}1 \\ -1\end{bmatrix} = 16.$
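The "alternative way" above is just the quadratic form $t^\top \mathcal{MSE}\,t$ from Theorem 17.1.1; a short sketch of ours evaluates it for both vectors and both MSE-matrices:

```python
import numpy as np

M_hat = np.array([[1.0, 1.0], [1.0, 1.0]])    # MSE-matrix of phi_hat
M_til = np.array([[4.0, -4.0], [-4.0, 4.0]])  # MSE-matrix of phi_til

g = np.array([1.0, -1.0])
h = np.array([1.0, 1.0])

# Scalar MSE of t'phi_hat as estimator of t'phi is t' (MSE-matrix) t
print(g @ M_hat @ g, g @ M_til @ g)   # 0.0 16.0 -> phi_hat better for g
print(h @ M_hat @ h, h @ M_til @ h)   # 4.0 0.0  -> phi_til better for h
```

This is exactly why neither estimator dominates the other in the matrix sense of (17.1.4).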

−1



• c. 1 point Show that neither $\mathcal{MSE}[\hat{\phi}; \phi] - \mathcal{MSE}[\tilde{\phi}; \phi]$ nor $\mathcal{MSE}[\tilde{\phi}; \phi] - \mathcal{MSE}[\hat{\phi}; \phi]$ is a nonnegative definite matrix. Hint: you are allowed to use the mathematical fact that if a matrix is nonnegative definite, then its determinant is nonnegative.

Answer.

(17.1.6)    $\mathcal{MSE}[\tilde{\phi}; \phi] - \mathcal{MSE}[\hat{\phi}; \phi] = \begin{bmatrix}3 & -5 \\ -5 & 3\end{bmatrix}$

Its determinant is negative, and the determinant of its negative is also negative.
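The determinant test from the hint can be run directly (our own sketch): a nonnegative definite matrix has nonnegative determinant, and both the difference (17.1.6) and its negative fail that test.

```python
import numpy as np

# Difference of the two MSE-matrices from (17.1.6)
D = np.array([[3.0, -5.0], [-5.0, 3.0]])

# det(D) = 9 - 25 < 0, and for 2x2 matrices det(-D) = det(D),
# so neither D nor -D can be nonnegative definite
print(np.linalg.det(D), np.linalg.det(-D))  # both negative
```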



CHAPTER 18

Sampling Properties of the Least Squares Estimator

The estimator $\hat{\beta}$ was derived from a geometric argument, and everything which we showed so far are what [DM93, p. 3] calls its numerical as opposed to its statistical properties. But $\hat{\beta}$ also has nice statistical or sampling properties. We are assuming right now the specification given in (14.1.3), in which $X$ is an arbitrary matrix of full column rank, and we are not assuming that the errors must be Normally distributed. The assumption that $X$ is nonrandom means that repeated samples are taken with the same $X$-matrix. This is often true for experimental data, but not in econometrics. The sampling properties which we are really interested in are those where the $X$-matrix is also random; we will derive those later. For this later derivation, the properties with fixed $X$-matrix, which we are going to discuss presently, will be needed as an intermediate step. The assumption of fixed $X$ is therefore a preliminary technical assumption, to be dropped later.

In order to know how good the estimator $\hat{\beta}$ is, one needs the statistical properties of its "sampling error" $\hat{\beta} - \beta$. This sampling error has the following formula:

(18.0.7)    $\hat{\beta} - \beta = (X^\top X)^{-1}X^\top y - (X^\top X)^{-1}X^\top X\beta = (X^\top X)^{-1}X^\top(y - X\beta) = (X^\top X)^{-1}X^\top\varepsilon$



From (18.0.7) follows immediately that $\hat{\beta}$ is unbiased, since $\mathcal{E}\bigl[(X^\top X)^{-1}X^\top\varepsilon\bigr] = o$. Unbiasedness does not make an estimator better, but many good estimators are unbiased, and it simplifies the math.

We will use the MSE-matrix as a criterion for how good an estimator of a vector of unobserved parameters is. Chapter 17 gave some reasons why this is a sensible criterion (compare [DM93, Chapter 5.5]).

18.1. The Gauss Markov Theorem

Returning to the least squares estimator $\hat{\beta}$, one obtains, using (18.0.7), that

(18.1.1)    $\mathcal{MSE}[\hat{\beta}; \beta] = \mathcal{E}\bigl[(\hat{\beta} - \beta)(\hat{\beta} - \beta)^\top\bigr] = (X^\top X)^{-1}X^\top\,\mathcal{E}[\varepsilon\varepsilon^\top]\,X(X^\top X)^{-1} = \sigma^2(X^\top X)^{-1}.$



This is a very simple formula. Its most interesting aspect is that this MSE matrix

does not depend on the value of the true β. In particular this means that it is

bounded with respect to β, which is important for someone who wants to be assured

of a certain accuracy even in the worst possible situation.
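Formula (18.1.1) can be verified by simulation under the fixed-$X$ assumption. The sketch below (our own; the design matrix and parameter values are arbitrary) exploits (18.0.7): the sampling error is $(X^\top X)^{-1}X^\top\varepsilon$, so it can be simulated from the disturbances alone.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma = 50, 3, 2.0
# Fixed design matrix with an intercept column, full column rank
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])

# By (18.0.7), beta_hat - beta = (X'X)^{-1} X' eps: simulate many draws
P = np.linalg.inv(X.T @ X) @ X.T
E = sigma * rng.standard_normal((n, 100_000))  # columns are disturbance draws
errs = P @ E                                    # sampling errors, one per sample

mse_empirical = errs @ errs.T / E.shape[1]
mse_formula = sigma**2 * np.linalg.inv(X.T @ X)  # (18.1.1)
print(np.max(np.abs(mse_empirical - mse_formula)))  # small Monte Carlo error
```

Note the empirical MSE-matrix depends on $X$ and $\sigma^2$ only, never on the true $\beta$, exactly as the text points out.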

Problem 244. 2 points Compute the MSE-matrix $\mathcal{MSE}[\hat{\varepsilon}; \varepsilon] = \mathcal{E}\bigl[(\hat{\varepsilon} - \varepsilon)(\hat{\varepsilon} - \varepsilon)^\top\bigr]$ of the residuals as predictors of the disturbances.

Answer. Write $\hat{\varepsilon} - \varepsilon = M\varepsilon - \varepsilon = (M - I)\varepsilon = -X(X^\top X)^{-1}X^\top\varepsilon$; therefore $\mathcal{MSE}[\hat{\varepsilon}; \varepsilon] = \mathcal{E}\bigl[X(X^\top X)^{-1}X^\top\varepsilon\varepsilon^\top X(X^\top X)^{-1}X^\top\bigr] = \sigma^2 X(X^\top X)^{-1}X^\top$. Alternatively, start with $\hat{\varepsilon} - \varepsilon = y - \hat{y} - \varepsilon = X\beta - X\hat{\beta} = X(\beta - \hat{\beta})$. This allows one to use $\mathcal{MSE}[\hat{\varepsilon}; \varepsilon] = X\,\mathcal{MSE}[\hat{\beta}; \beta]\,X^\top = \sigma^2 X(X^\top X)^{-1}X^\top$.
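Problem 244's answer can also be checked by simulation (our own sketch, with an arbitrary small design matrix): the prediction error $\hat{\varepsilon} - \varepsilon = -X(X^\top X)^{-1}X^\top\varepsilon$ should have second-moment matrix $\sigma^2 X(X^\top X)^{-1}X^\top$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma = 8, 3, 1.5
X = rng.standard_normal((n, k))

H = X @ np.linalg.inv(X.T @ X) @ X.T   # projector X(X'X)^{-1}X'
M = np.eye(n) - H                       # residual-maker matrix

# eps_hat - eps = (M - I) eps = -H eps; Monte Carlo check that
# E[(eps_hat - eps)(eps_hat - eps)'] = sigma^2 H
E = sigma * rng.standard_normal((n, 200_000))  # columns are draws of eps
D = (M - np.eye(n)) @ E                        # eps_hat - eps for each draw
mse_emp = D @ D.T / E.shape[1]
print(np.max(np.abs(mse_emp - sigma**2 * H)))  # small Monte Carlo error
```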



Problem 245. 2 points Let $v$ be a random vector that is a linear transformation of $y$, i.e., $v = Ty$ for some constant matrix $T$. Furthermore $v$ satisfies $\mathcal{E}[v] = o$. Show that from this follows $v = T\hat{\varepsilon}$. (In other words, no other transformation of $y$ with zero expected value is more "comprehensive" than $\hat{\varepsilon}$. However there are many other transformations of $y$ with zero expected value which are as "comprehensive" as $\hat{\varepsilon}$.)

Answer. $\mathcal{E}[v] = TX\beta$ must be $o$ whatever the value of $\beta$. Therefore $TX = O$, from which follows $TM = T$. Since $\hat{\varepsilon} = My$, this gives immediately $v = T\hat{\varepsilon}$. (This is the statistical implication of the mathematical fact that $M$ is a deficiency matrix of $X$.)



Problem 246. 2 points Show that $\hat{\beta}$ and $\hat{\varepsilon}$ are uncorrelated, i.e., $\operatorname{cov}[\hat{\beta}_i, \hat{\varepsilon}_j] = 0$ for all $i, j$. Defining the covariance matrix $\mathcal{C}[\hat{\beta}, \hat{\varepsilon}]$ as that matrix whose $(i, j)$ element is $\operatorname{cov}[\hat{\beta}_i, \hat{\varepsilon}_j]$, this can also be written as $\mathcal{C}[\hat{\beta}, \hat{\varepsilon}] = O$. Hint: The covariance matrix satisfies the rules $\mathcal{C}[Ay, Bz] = A\,\mathcal{C}[y, z]\,B^\top$ and $\mathcal{C}[y, y] = \mathcal{V}[y]$. (Other rules for the covariance matrix, which will not be needed here, are $\mathcal{C}[z, y] = (\mathcal{C}[y, z])^\top$, $\mathcal{C}[x + y, z] = \mathcal{C}[x, z] + \mathcal{C}[y, z]$, $\mathcal{C}[x, y + z] = \mathcal{C}[x, y] + \mathcal{C}[x, z]$, and $\mathcal{C}[y, c] = O$ if $c$ is a vector of constants.)

Answer. $A = (X^\top X)^{-1}X^\top$ and $B = I - X(X^\top X)^{-1}X^\top$, therefore $\mathcal{C}[\hat{\beta}, \hat{\varepsilon}] = \sigma^2(X^\top X)^{-1}X^\top\bigl(I - X(X^\top X)^{-1}X^\top\bigr) = O$.
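The algebra in Problem 246 reduces to $AB^\top = O$, which a few lines of numpy confirm for an arbitrary full-rank design (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 7, 3
X = rng.standard_normal((n, k))

A = np.linalg.inv(X.T @ X) @ X.T       # beta_hat = A y
B = np.eye(n) - X @ A                   # eps_hat  = B y (B is the matrix M)

# C[Ay, By] = A V[y] B' = sigma^2 A B'; algebraically A B' = O, so the
# covariance vanishes regardless of sigma^2
print(np.max(np.abs(A @ B.T)))          # ~ 0 up to floating-point rounding
```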



Problem 247. 4 points Let $y = X\beta + \varepsilon$ be a regression model with intercept, in which the first column of $X$ is the vector $\iota$, and let $\hat{\beta}$ be the least squares estimator of $\beta$. Show that the covariance matrix between $\bar{y}$ and $\hat{\beta}$, which is defined as the matrix (here consisting of one row only) that contains all the covariances

(18.1.2)    $\mathcal{C}[\bar{y}, \hat{\beta}] \equiv \begin{bmatrix}\operatorname{cov}[\bar{y}, \hat{\beta}_1] & \operatorname{cov}[\bar{y}, \hat{\beta}_2] & \cdots & \operatorname{cov}[\bar{y}, \hat{\beta}_k]\end{bmatrix}$

has the following form: $\mathcal{C}[\bar{y}, \hat{\beta}] = \frac{\sigma^2}{n}\begin{bmatrix}1 & 0 & \cdots & 0\end{bmatrix}$ where $n$ is the number of observations. Hint: That the regression has an intercept term as first column of the $X$-matrix means that $Xe^{(1)} = \iota$, where $e^{(1)}$ is the unit vector having 1 in the first place and zeros elsewhere, and $\iota$ is the vector which has ones everywhere.

Answer. Write both $\bar{y}$ and $\hat{\beta}$ in terms of $y$, i.e., $\bar{y} = \frac{1}{n}\iota^\top y$ and $\hat{\beta} = (X^\top X)^{-1}X^\top y$. Therefore

(18.1.3)    $\mathcal{C}[\bar{y}, \hat{\beta}] = \frac{1}{n}\iota^\top\,\mathcal{V}[y]\,X(X^\top X)^{-1} = \frac{\sigma^2}{n}\iota^\top X(X^\top X)^{-1} = \frac{\sigma^2}{n}e^{(1)\top}X^\top X(X^\top X)^{-1} = \frac{\sigma^2}{n}e^{(1)\top}.$
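Evaluating (18.1.3) numerically for an arbitrary design with intercept (a sketch of ours) shows the row vector collapsing to $\frac{\sigma^2}{n}\begin{bmatrix}1 & 0 & \cdots & 0\end{bmatrix}$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, sigma = 12, 3, 1.0
# Intercept as first column, so X e1 = iota as in the hint
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
iota = np.ones(n)

# (18.1.3): C[ybar, beta_hat] = (sigma^2 / n) iota' X (X'X)^{-1}
cov_ybar_bhat = (sigma**2 / n) * iota @ X @ np.linalg.inv(X.T @ X)
print(np.round(cov_ybar_bhat, 10))  # [sigma^2/n, 0, 0] up to rounding
```

Only the intercept coefficient is correlated with $\bar{y}$; the slope coefficients are not.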



Theorem 18.1.1. Gauss-Markov Theorem: $\hat{\beta}$ is the BLUE (Best Linear Unbiased Estimator) of $\beta$ in the following vector sense: for every nonrandom coefficient vector $t$, $t^\top\hat{\beta}$ is the scalar BLUE of $t^\top\beta$, i.e., every other linear unbiased estimator $\tilde{\phi} = a^\top y$ of $\phi = t^\top\beta$ has a bigger MSE than $t^\top\hat{\beta}$.

Proof. Write the alternative linear estimator $\tilde{\phi} = a^\top y$ in the form

(18.1.4)    $\tilde{\phi} = \bigl(t^\top(X^\top X)^{-1}X^\top + c^\top\bigr)y$

then the sampling error is

(18.1.5)    $\tilde{\phi} - \phi = \bigl(t^\top(X^\top X)^{-1}X^\top + c^\top\bigr)(X\beta + \varepsilon) - t^\top\beta = \bigl(t^\top(X^\top X)^{-1}X^\top + c^\top\bigr)\varepsilon + c^\top X\beta.$
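The content of the Gauss-Markov theorem can be illustrated numerically (our own sketch; the particular $t$ and perturbation are arbitrary choices): any linear estimator $a^\top y$ that stays unbiased must have weights $a$ equal to the least squares weights plus a vector $c$ with $c^\top X = o^\top$, and any such perturbation strictly increases the variance $\sigma^2 a^\top a$.

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, sigma = 30, 3, 1.0
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
t = np.array([0.0, 1.0, 0.0])          # target: the second coefficient

a_ls = X @ np.linalg.inv(X.T @ X) @ t  # LS weights: t' beta_hat = a_ls' y

# Perturb by c with c'X = 0, which preserves unbiasedness (c'X beta = 0
# for every beta); take c from the orthogonal complement of col(X)
Q, _ = np.linalg.qr(X, mode="complete")
c = Q[:, k]                             # unit vector orthogonal to col(X)
a_alt = a_ls + 0.7 * c

# Both satisfy a'X = t' (unbiasedness); the alternative has larger variance
print(np.allclose(a_ls @ X, t), np.allclose(a_alt @ X, t))
print(sigma**2 * a_ls @ a_ls, sigma**2 * a_alt @ a_alt)
```

Because $a_{\text{ls}}^\top c = 0$, the alternative's variance exceeds the LS variance by exactly $\sigma^2 \|0.7\,c\|^2$, mirroring the decomposition the proof is heading toward.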


