23. THE MEAN SQUARED ERROR AS AN INITIAL CRITERION OF PRECISION
For our purposes, therefore, the estimator (or predictor) $\hat\phi$ of the unknown parameter (or unobserved random variable) $\phi$ is no worse than the alternative $\tilde\phi$ if $\operatorname{MSE}[\hat\phi;\phi] \le \operatorname{MSE}[\tilde\phi;\phi]$. This is a criterion which can be applied before any observations are collected and actual estimations are made; it is an "initial" criterion regarding the expected average performance in a series of future trials (even though, in economics, usually only one trial is made).
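To make the notion of an "initial" criterion concrete, here is a small Monte Carlo sketch (not from the text; the normal model, sample size, and the 0.9 shrinkage factor are arbitrary illustrative choices) that approximates the expected average performance of two scalar estimators over many repeated trials:

```python
import numpy as np

rng = np.random.default_rng(0)
phi_true = 0.5            # the unknown parameter (used here only to score the estimators)
n, trials = 20, 100_000   # observations per trial, number of repeated trials

samples = rng.normal(phi_true, 1.0, size=(trials, n))
phi_hat   = samples.mean(axis=1)         # estimator 1: the sample mean (unbiased)
phi_tilde = 0.9 * samples.mean(axis=1)   # estimator 2: a shrunken, biased alternative

print(np.mean((phi_hat   - phi_true) ** 2))   # empirical MSE of estimator 1
print(np.mean((phi_tilde - phi_true) ** 2))   # empirical MSE of estimator 2
```

Which of the two comes out ahead depends on the true $\phi$ and on $n$; the point is only that the comparison refers to average behavior over repeated trials, not to any one realized sample.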
23.1. Comparison of Two Vector Estimators
If one wants to compare two vector estimators, say $\hat\phi$ and $\tilde\phi$, it is often impossible to say which of the two estimators is better. It may be the case that $\hat\phi_1$ is better than $\tilde\phi_1$ (in terms of MSE or some other criterion), but $\hat\phi_2$ is worse than $\tilde\phi_2$. And even if every component $\phi_i$ is estimated better by $\hat\phi_i$ than by $\tilde\phi_i$, certain linear combinations $t^\top\phi$ of the components of $\phi$ may be estimated better by $t^\top\tilde\phi$ than by $t^\top\hat\phi$.
Problem 294. 2 points Construct an example of two vector estimators $\hat\phi$ and $\tilde\phi$ of the same random vector $\phi = \begin{bmatrix}\phi_1 & \phi_2\end{bmatrix}^\top$, so that $\operatorname{MSE}[\hat\phi_i;\phi_i] < \operatorname{MSE}[\tilde\phi_i;\phi_i]$ for $i = 1, 2$ but $\operatorname{MSE}[\hat\phi_1+\hat\phi_2;\phi_1+\phi_2] > \operatorname{MSE}[\tilde\phi_1+\tilde\phi_2;\phi_1+\phi_2]$. Hint: it is easiest to use an example in which all random variables are constants. Another hint: the geometric analog would be to find two vectors in a plane, $\hat\phi$ and $\tilde\phi$, such that in each component (i.e., projection on the axes) $\hat\phi$ is closer to the origin than $\tilde\phi$, but in the projection on the diagonal $\tilde\phi$ is closer to the origin than $\hat\phi$.
Answer. In the simplest counterexample, all variables involved are constants: $\phi = \begin{bmatrix}0\\ 0\end{bmatrix}$, $\hat\phi = \begin{bmatrix}1\\ 1\end{bmatrix}$, and $\tilde\phi = \begin{bmatrix}-2\\ 2\end{bmatrix}$. Then $\operatorname{MSE}[\hat\phi_i;\phi_i] = 1 < 4 = \operatorname{MSE}[\tilde\phi_i;\phi_i]$ for $i = 1, 2$, but $\operatorname{MSE}[\hat\phi_1+\hat\phi_2;\phi_1+\phi_2] = 4 > 0 = \operatorname{MSE}[\tilde\phi_1+\tilde\phi_2;\phi_1+\phi_2]$.
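A quick numerical check of this counterexample (an illustrative sketch, not part of the original answer):

```python
import numpy as np

phi       = np.array([0.0, 0.0])   # the true (constant) parameter vector
phi_hat   = np.array([1.0, 1.0])   # first "estimator" (a constant)
phi_tilde = np.array([-2.0, 2.0])  # second "estimator" (a constant)

# Componentwise, phi_hat wins in every coordinate ...
print((phi_hat - phi) ** 2, (phi_tilde - phi) ** 2)   # [1. 1.]  vs  [4. 4.]

# ... but for the linear combination phi_1 + phi_2, phi_tilde wins.
t = np.array([1.0, 1.0])
print((t @ (phi_hat - phi)) ** 2, (t @ (phi_tilde - phi)) ** 2)   # 4.0  vs  0.0
```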
One can say unambiguously that the vector $\hat\phi$ is a no worse estimator than $\tilde\phi$ only if its MSE is smaller or equal for every linear combination. Theorem 23.1.1 will show that this is the case if and only if the MSE-matrix of $\hat\phi$ is smaller, by a nonnegative definite matrix, than that of $\tilde\phi$. If this is so, then theorem 23.1.1 says that not only the MSEs of all linear transformations, but also all other nonnegative definite quadratic loss functions involving these vectors (such as the trace of the MSE-matrix, which is an often-used criterion) are minimized. In order to formulate and prove this, we first need a formal definition of the MSE-matrix. We write $\mathbf{MSE}$ (in boldface) for the matrix and $\operatorname{MSE}$ for the scalar mean squared error. The MSE-matrix of $\hat\phi$ as an estimator of $\phi$ is defined as

(23.1.1)    $\mathbf{MSE}[\hat\phi;\phi] = E\bigl[(\hat\phi-\phi)(\hat\phi-\phi)^\top\bigr].$
Problem 295. 2 points Let $\theta$ be a vector of possibly random parameters, and $\hat\theta$ an estimator of $\theta$. Show that

(23.1.2)    $\mathbf{MSE}[\hat\theta;\theta] = V[\hat\theta-\theta] + \bigl(E[\hat\theta-\theta]\bigr)\bigl(E[\hat\theta-\theta]\bigr)^\top.$

Do not assume the scalar result, but give a proof that is valid for vectors and scalars alike.
Answer. For any random vector $x$,
$$E[xx^\top] = E\bigl[(x - E[x] + E[x])(x - E[x] + E[x])^\top\bigr]$$
$$= E\bigl[(x - E[x])(x - E[x])^\top\bigr] + E\bigl[x - E[x]\bigr]\,E[x]^\top + E[x]\,E\bigl[x - E[x]\bigr]^\top + E[x]\,E[x]^\top$$
$$= V[x] + O + O + E[x]\,E[x]^\top.$$
Setting $x = \hat\theta - \theta$, the statement follows.
If $\theta$ is nonrandom, formula (23.1.2) simplifies slightly, since in this case $V[\hat\theta - \theta] = V[\hat\theta]$. In this case the MSE-matrix is the covariance matrix plus the squared bias matrix. If $\theta$ is nonrandom and in addition $\hat\theta$ is unbiased, then the MSE-matrix coincides with the covariance matrix.
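A numerical sanity check of decomposition (23.1.2) for a nonrandom $\theta$ (an illustrative sketch; the shrunken-mean estimator, sample size, and error distribution are arbitrary choices, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([1.0, 3.0])      # nonrandom true parameter vector
n, trials = 10, 200_000

# A deliberately biased estimator: 0.8 times the sample mean of N(theta, I) data.
x = rng.normal(theta, 1.0, size=(trials, n, 2))
theta_hat = 0.8 * x.mean(axis=1)                   # shape (trials, 2)

err = theta_hat - theta
mse_matrix = err.T @ err / trials                  # E[(theta_hat - theta)(theta_hat - theta)']
bias = err.mean(axis=0)                            # E[theta_hat - theta]
cov = np.cov(theta_hat.T, bias=True)               # V[theta_hat - theta] = V[theta_hat]

print(mse_matrix)
print(cov + np.outer(bias, bias))                  # matches the MSE-matrix, as in (23.1.2)
```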
Theorem 23.1.1. Assume $\hat\phi$ and $\tilde\phi$ are two estimators of the parameter $\phi$ (which is allowed to be random itself). Then conditions (23.1.3), (23.1.4), and (23.1.5) are equivalent:
(23.1.3)    For every constant vector $t$: $\operatorname{MSE}[t^\top\hat\phi;\,t^\top\phi] \le \operatorname{MSE}[t^\top\tilde\phi;\,t^\top\phi]$;

(23.1.4)    $\mathbf{MSE}[\tilde\phi;\phi] - \mathbf{MSE}[\hat\phi;\phi]$ is a nonnegative definite matrix;

(23.1.5)    For every nonnegative definite $\Theta$: $E\bigl[(\hat\phi-\phi)^\top\Theta(\hat\phi-\phi)\bigr] \le E\bigl[(\tilde\phi-\phi)^\top\Theta(\tilde\phi-\phi)\bigr]$.
Proof. Write $\mathbf{MSE}[\tilde\phi;\phi] = \sigma^2\Xi$ and $\mathbf{MSE}[\hat\phi;\phi] = \sigma^2\Omega$. To show that (23.1.3) implies (23.1.4), simply note that $\operatorname{MSE}[t^\top\hat\phi;\,t^\top\phi] = \sigma^2 t^\top\Omega t$ and likewise $\operatorname{MSE}[t^\top\tilde\phi;\,t^\top\phi] = \sigma^2 t^\top\Xi t$. Therefore (23.1.3) is equivalent to $t^\top(\Xi - \Omega)t \ge 0$ for all $t$, which is the defining property making $\Xi - \Omega$ nonnegative definite.
Here is the proof that (23.1.4) implies (23.1.5):
$$E\bigl[(\hat\phi-\phi)^\top\Theta(\hat\phi-\phi)\bigr] = E\bigl[\operatorname{tr}\bigl((\hat\phi-\phi)^\top\Theta(\hat\phi-\phi)\bigr)\bigr] = E\bigl[\operatorname{tr}\bigl(\Theta(\hat\phi-\phi)(\hat\phi-\phi)^\top\bigr)\bigr] = \operatorname{tr}\bigl(\Theta\,E[(\hat\phi-\phi)(\hat\phi-\phi)^\top]\bigr) = \sigma^2\operatorname{tr}(\Theta\Omega),$$
and in the same way
$$E\bigl[(\tilde\phi-\phi)^\top\Theta(\tilde\phi-\phi)\bigr] = \sigma^2\operatorname{tr}(\Theta\Xi).$$
The difference of the two expected quadratic forms is therefore $\sigma^2\operatorname{tr}\bigl(\Theta(\Xi-\Omega)\bigr)$. By assumption, $\Xi - \Omega$ is nonnegative definite. Therefore, by theorem A.5.6 in the Mathematical Appendix, or by Problem 296 below, this trace is nonnegative.
To complete the proof, note that (23.1.5) has (23.1.3) as a special case if one sets $\Theta = tt^\top$.
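Condition (23.1.4) is straightforward to check numerically, since a symmetric matrix is nonnegative definite exactly when its smallest eigenvalue is nonnegative. Here is a small helper sketch (the function name and tolerance are my own choices, not from the text):

```python
import numpy as np

def mse_dominates(mse_tilde, mse_hat, tol=1e-12):
    """Return True if mse_tilde - mse_hat is nonnegative definite, i.e. if the
    estimator behind mse_hat is no worse for every linear combination (23.1.4)."""
    diff = np.asarray(mse_tilde, dtype=float) - np.asarray(mse_hat, dtype=float)
    eigenvalues = np.linalg.eigvalsh((diff + diff.T) / 2)   # symmetrize for safety
    return eigenvalues.min() >= -tol
```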
Problem 296. Show that if $\Theta$ and $\Sigma$ are symmetric and nonnegative definite, then $\operatorname{tr}(\Theta\Sigma) \ge 0$. You are allowed to use that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, that the trace of a nonnegative definite matrix is $\ge 0$, and Problem 129 (which is trivial).

Answer. Write $\Theta = RR^\top$; then $\operatorname{tr}(\Theta\Sigma) = \operatorname{tr}(RR^\top\Sigma) = \operatorname{tr}(R^\top\Sigma R) \ge 0$.
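An empirical spot check of this trace fact (illustrative only; the matrices are generated as $AA^\top$ so that they are nonnegative definite by construction):

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(1000):
    a, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
    theta, sigma = a @ a.T, b @ b.T          # both nonnegative definite by construction
    assert np.trace(theta @ sigma) >= 0
print("trace(Theta @ Sigma) was nonnegative in every draw")
```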
Problem 297. Consider two very simple-minded estimators of the unknown nonrandom parameter vector $\phi = \begin{bmatrix}\phi_1\\ \phi_2\end{bmatrix}$. Neither of these estimators depends on any observations; they are constants. The first estimator is $\hat\phi = \begin{bmatrix}11\\ 11\end{bmatrix}$, and the second is $\tilde\phi = \begin{bmatrix}12\\ 8\end{bmatrix}$.

• a. 2 points Compute the MSE-matrices of these two estimators if the true value of the parameter vector is $\phi = \begin{bmatrix}10\\ 10\end{bmatrix}$. For which estimator is the trace of the MSE-matrix smaller?
Answer. $\hat\phi$ has the smaller trace of the MSE-matrix:
$$\hat\phi - \phi = \begin{bmatrix}1\\ 1\end{bmatrix}, \qquad \mathbf{MSE}[\hat\phi;\phi] = E\bigl[(\hat\phi-\phi)(\hat\phi-\phi)^\top\bigr] = E\!\left[\begin{bmatrix}1\\ 1\end{bmatrix}\begin{bmatrix}1 & 1\end{bmatrix}\right] = \begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix},$$
$$\tilde\phi - \phi = \begin{bmatrix}2\\ -2\end{bmatrix}, \qquad \mathbf{MSE}[\tilde\phi;\phi] = \begin{bmatrix}4 & -4\\ -4 & 4\end{bmatrix},$$
so the traces are $2$ and $8$ respectively. Note that both MSE-matrices are singular, i.e., both estimators allow an error-free look at certain linear combinations of the parameter vector.
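The same computation in a few lines (illustrative; it simply mirrors the numbers above):

```python
import numpy as np

phi       = np.array([10.0, 10.0])
phi_hat   = np.array([11.0, 11.0])
phi_tilde = np.array([12.0,  8.0])

mse_hat   = np.outer(phi_hat - phi,   phi_hat - phi)     # [[1, 1], [1, 1]]
mse_tilde = np.outer(phi_tilde - phi, phi_tilde - phi)   # [[4, -4], [-4, 4]]
print(np.trace(mse_hat), np.trace(mse_tilde))            # 2.0 vs 8.0
```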
• b. 1 point Give two vectors $g = \begin{bmatrix}g_1\\ g_2\end{bmatrix}$ and $h = \begin{bmatrix}h_1\\ h_2\end{bmatrix}$ satisfying $\operatorname{MSE}[g^\top\hat\phi;\,g^\top\phi] < \operatorname{MSE}[g^\top\tilde\phi;\,g^\top\phi]$ and $\operatorname{MSE}[h^\top\hat\phi;\,h^\top\phi] > \operatorname{MSE}[h^\top\tilde\phi;\,h^\top\phi]$ ($g$ and $h$ are not unique; there are many possibilities).
Answer. With $g = \begin{bmatrix}1\\ -1\end{bmatrix}$ and $h = \begin{bmatrix}1\\ 1\end{bmatrix}$, for instance, we get $g^\top\hat\phi - g^\top\phi = 0$, $g^\top\tilde\phi - g^\top\phi = 4$, $h^\top\hat\phi - h^\top\phi = 2$, $h^\top\tilde\phi - h^\top\phi = 0$; therefore $\operatorname{MSE}[g^\top\hat\phi;\,g^\top\phi] = 0$, $\operatorname{MSE}[g^\top\tilde\phi;\,g^\top\phi] = 16$, $\operatorname{MSE}[h^\top\hat\phi;\,h^\top\phi] = 4$, $\operatorname{MSE}[h^\top\tilde\phi;\,h^\top\phi] = 0$. An alternative way to compute this is, e.g.,
$$\operatorname{MSE}[g^\top\tilde\phi;\,g^\top\phi] = \begin{bmatrix}1 & -1\end{bmatrix}\begin{bmatrix}4 & -4\\ -4 & 4\end{bmatrix}\begin{bmatrix}1\\ -1\end{bmatrix} = 16.$$
• c. 1 point Show that neither $\mathbf{MSE}[\hat\phi;\phi] - \mathbf{MSE}[\tilde\phi;\phi]$ nor $\mathbf{MSE}[\tilde\phi;\phi] - \mathbf{MSE}[\hat\phi;\phi]$ is a nonnegative definite matrix. Hint: you are allowed to use the mathematical fact that if a matrix is nonnegative definite, then its determinant is nonnegative.

Answer.

(23.1.6)    $\mathbf{MSE}[\tilde\phi;\phi] - \mathbf{MSE}[\hat\phi;\phi] = \begin{bmatrix}3 & -5\\ -5 & 3\end{bmatrix}$

Its determinant is negative, and the determinant of its negative is also negative; therefore neither difference is nonnegative definite.
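A numerical confirmation (illustrative), using the determinant hint and, equivalently, the eigenvalues of the difference matrix:

```python
import numpy as np

mse_hat   = np.array([[1.0,  1.0], [ 1.0, 1.0]])
mse_tilde = np.array([[4.0, -4.0], [-4.0, 4.0]])

diff = mse_tilde - mse_hat               # [[3, -5], [-5, 3]]
print(np.linalg.det(diff))               # -16.0 < 0, so diff is not nonnegative definite
print(np.linalg.det(-diff))              # -16.0 < 0, so -diff is not nonnegative definite either
print(np.linalg.eigvalsh(diff))          # [-2., 8.]: one negative, one positive eigenvalue
```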
CHAPTER 24

Sampling Properties of the Least Squares Estimator
The estimator $\hat\beta$ was derived from a geometric argument, and everything we have shown so far is what [DM93, p. 3] calls its numerical, as opposed to its statistical, properties. But $\hat\beta$ also has nice statistical or sampling properties. We are assuming right now the specification given in (18.1.3), in which $X$ is an arbitrary matrix of full column rank, and we are not assuming that the errors must be Normally distributed. The assumption that $X$ is nonrandom means that repeated samples are taken with the same $X$-matrix. This is often true for experimental data, but not in econometrics. The sampling properties we are really interested in are those in which the $X$-matrix is random as well; we will derive those later. For this later derivation, the properties with fixed $X$-matrix, which we are going to discuss presently, will be needed as an intermediate step. The assumption of fixed $X$ is therefore a preliminary technical assumption, to be dropped later.
In order to know how good the estimator $\hat\beta$ is, one needs the statistical properties of its "sampling error" $\hat\beta - \beta$. This sampling error has the following formula:

(24.0.7)    $\hat\beta - \beta = (X^\top X)^{-1}X^\top y - (X^\top X)^{-1}X^\top X\beta = (X^\top X)^{-1}X^\top(y - X\beta) = (X^\top X)^{-1}X^\top\varepsilon$

It follows immediately from (24.0.7) that $\hat\beta$ is unbiased, since $E[(X^\top X)^{-1}X^\top\varepsilon] = o$. Unbiasedness does not make an estimator better, but many good estimators are unbiased, and it simplifies the math.
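A brief simulation sketch of this unbiasedness claim under a fixed design (the particular design matrix, coefficient values, and error distribution below are arbitrary illustrative choices, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 50, 20_000
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])   # fixed X, full column rank
beta = np.array([2.0, -0.5])

estimates = np.empty((trials, 2))
for t in range(trials):
    eps = rng.standard_normal(n)        # errors need not be Normal; Normality is not assumed here
    y = X @ beta + eps
    estimates[t] = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS estimate for this sample

print(estimates.mean(axis=0))   # averages close to beta, illustrating E[beta_hat] = beta
```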
We will use the MSE-matrix as a criterion for how good an estimator of a vector
of unobserved parameters is. Chapter 23 gave some reasons why this is a sensible
criterion (compare [DM93, Chapter 5.5]).