
Chapter 23. The Mean Squared Error as an Initial Criterion of Precision




For our purposes, therefore, the estimator (or predictor) φ̂ of the unknown parameter (or unobserved random variable) φ is no worse than the alternative φ̃ if MSE[φ̂; φ] ≤ MSE[φ̃; φ]. This is a criterion which can be applied before any observations are collected and actual estimations are made; it is an “initial” criterion regarding the expected average performance in a series of future trials (even though, in economics, usually only one trial is made).

23.1. Comparison of Two Vector Estimators

If one wants to compare two vector estimators, say φ̂ and φ̃, it is often impossible to say which of the two estimators is better. It may be the case that φ̂₁ is better than φ̃₁ (in terms of MSE or some other criterion), but φ̂₂ is worse than φ̃₂. And even if every component φᵢ is estimated better by φ̂ᵢ than by φ̃ᵢ, certain linear combinations t⊤φ of the components of φ may be estimated better by t⊤φ̃ than by t⊤φ̂.

Problem 294. 2 points Construct an example of two vector estimators φ̂ and φ̃ of the same random vector φ = [φ₁ φ₂]⊤, so that MSE[φ̂ᵢ; φᵢ] < MSE[φ̃ᵢ; φᵢ] for i = 1, 2 but MSE[φ̂₁ + φ̂₂; φ₁ + φ₂] > MSE[φ̃₁ + φ̃₂; φ₁ + φ₂]. Hint: it is easiest to use an example in which all random variables are constants. Another hint: the geometric analog would be to find two vectors in a plane, φ̂ and φ̃. In each component (i.e., in the projection on the axes), φ̂ is closer to the origin than φ̃; but in the projection on the diagonal, φ̃ is closer to the origin than φ̂.



Answer. In the simplest counterexample, all variables involved are constants: φ = [0 0]⊤, φ̂ = [1 1]⊤, and φ̃ = [−2 2]⊤. Then MSE[φ̂ᵢ; φᵢ] = 1 < 4 = MSE[φ̃ᵢ; φᵢ] for i = 1, 2, but MSE[φ̂₁ + φ̂₂; φ₁ + φ₂] = 4 > 0 = MSE[φ̃₁ + φ̃₂; φ₁ + φ₂].



One can say unambiguously that the vector φ̂ is a no worse estimator than φ̃ only if its MSE is smaller than or equal to that of φ̃ for every linear combination of the components. Theorem 23.1.1 will show that this is the case if and only if the MSE-matrix of φ̂ is smaller, by a nonnegative definite matrix, than that of φ̃. If this is so, then theorem 23.1.1 also says that not only the MSEs of all linear transformations, but also all other nonnegative definite quadratic loss functions involving these vectors (such as the trace of the MSE-matrix, which is an often-used criterion), are minimized. In order to formulate and prove this, we first need a formal definition of the MSE-matrix. We write MSE for both the matrix and the scalar mean squared error; the arguments make clear which is meant. The MSE-matrix of φ̂ as an estimator of φ is defined as

(23.1.1)    MSE[φ̂; φ] = E[(φ̂ − φ)(φ̂ − φ)⊤].
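As a quick numerical illustration of definition (23.1.1), the following Python sketch approximates the MSE-matrix of a deliberately biased estimator by averaging outer products of the estimation error over repeated draws; the parameter values and the noise distribution are made up for the example.

import numpy as np

rng = np.random.default_rng(0)

# True (nonrandom) parameter vector and a deliberately biased estimator:
# phi_hat = phi + bias + noise.  All values are arbitrary illustrations.
phi = np.array([10.0, 10.0])
bias = np.array([0.5, -0.5])

n_trials = 200_000
noise = rng.normal(scale=1.0, size=(n_trials, 2))
errors = bias + noise                 # phi_hat - phi in each trial

# (23.1.1): MSE[phi_hat; phi] = E[(phi_hat - phi)(phi_hat - phi)^T],
# approximated by the average outer product of the estimation error.
mse_matrix = errors.T @ errors / n_trials
print(mse_matrix)   # close to I + bias bias^T = [[1.25, -0.25], [-0.25, 1.25]]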






Problem 295. 2 points Let θ be a vector of possibly random parameters, and θ̂ an estimator of θ. Show that

(23.1.2)    MSE[θ̂; θ] = V[θ̂ − θ] + (E[θ̂ − θ])(E[θ̂ − θ])⊤.

Don’t assume the scalar result but make a proof that is good for vectors and scalars.

Answer. For any random vector x it follows that

E[xx⊤] = E[(x − E[x] + E[x])(x − E[x] + E[x])⊤]
       = E[(x − E[x])(x − E[x])⊤] + E[x − E[x]] E[x]⊤ + E[x] E[(x − E[x])⊤] + E[x] E[x]⊤
       = V[x] + O + O + E[x] E[x]⊤,

since E[x − E[x]] = o. Setting x = θ̂ − θ, the statement follows.



If θ is nonrandom, formula (23.1.2) simplifies slightly, since in this case V[θ̂ − θ] = V[θ̂]. In this case, the MSE-matrix is the covariance matrix plus the squared bias matrix. If θ is nonrandom and in addition θ̂ is unbiased, then the MSE-matrix coincides with the covariance matrix.
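The decomposition (23.1.2) can also be checked exactly on a small discrete distribution, where all expectations are finite sums; the two-point distribution in this sketch is an arbitrary choice for illustration.

import numpy as np

# A two-point distribution for x = theta_hat - theta, with probabilities p:
# expectations are finite sums, so (23.1.2) holds to machine precision.
p = np.array([0.3, 0.7])
x = np.array([[ 1.0, 2.0],    # value of theta_hat - theta in state 1
              [-1.0, 0.5]])   # value of theta_hat - theta in state 2

mean = p @ x                                               # E[x]
mse = sum(p[i] * np.outer(x[i], x[i]) for i in range(2))   # E[x x^T]
cov = sum(p[i] * np.outer(x[i] - mean, x[i] - mean) for i in range(2))

# MSE-matrix = covariance matrix + "squared bias" matrix
assert np.allclose(mse, cov + np.outer(mean, mean))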

Theorem 23.1.1. Assume φ̂ and φ̃ are two estimators of the parameter φ (which is allowed to be random itself). Then conditions (23.1.3), (23.1.4), and (23.1.5) are equivalent:

(23.1.3)    For every constant vector t, MSE[t⊤φ̂; t⊤φ] ≤ MSE[t⊤φ̃; t⊤φ];

(23.1.4)    MSE[φ̃; φ] − MSE[φ̂; φ] is a nonnegative definite matrix;

(23.1.5)    For every nnd Θ, E[(φ̂ − φ)⊤Θ(φ̂ − φ)] ≤ E[(φ̃ − φ)⊤Θ(φ̃ − φ)].



Proof. Call MSE[φ̃; φ] = σ²Ξ and MSE[φ̂; φ] = σ²Ω. To show that (23.1.3) implies (23.1.4), simply note that MSE[t⊤φ̂; t⊤φ] = σ²t⊤Ωt and likewise MSE[t⊤φ̃; t⊤φ] = σ²t⊤Ξt. Therefore (23.1.3) is equivalent to t⊤(Ξ − Ω)t ≥ 0 for all t, which is the defining property making Ξ − Ω nonnegative definite.

Here is the proof that (23.1.4) implies (23.1.5):

E[(φ̂ − φ)⊤Θ(φ̂ − φ)] = E[tr((φ̂ − φ)⊤Θ(φ̂ − φ))] = E[tr(Θ(φ̂ − φ)(φ̂ − φ)⊤)]
                      = tr(Θ E[(φ̂ − φ)(φ̂ − φ)⊤]) = σ² tr(ΘΩ),

and in the same way

E[(φ̃ − φ)⊤Θ(φ̃ − φ)] = σ² tr(ΘΞ).

The difference of the expected quadratic forms is therefore σ² tr(Θ(Ξ − Ω)). By assumption, Ξ − Ω is nonnegative definite. Therefore, by theorem A.5.6 in the Mathematical Appendix, or by Problem 296 below, this trace is nonnegative.

To complete the proof, note that (23.1.5) has (23.1.3) as a special case if one sets Θ = tt⊤.
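For intuition, here is a sketch that spot-checks the equivalence of (23.1.3), (23.1.4), and (23.1.5) on randomly generated MSE-matrices; the construction Ξ = Ω + AA⊤ guarantees by design that the difference is nonnegative definite.

import numpy as np

rng = np.random.default_rng(1)

# Two MSE-matrices with Xi - Omega nonnegative definite by construction:
# Omega is an arbitrary Gram matrix, and Xi = Omega + A A^T.
B = rng.normal(size=(3, 3))
Omega = B @ B.T                      # plays the role of MSE[phi_hat; phi]
A = rng.normal(size=(3, 2))
Xi = Omega + A @ A.T                 # plays the role of MSE[phi_tilde; phi]

# (23.1.4): all eigenvalues of the difference are >= 0
assert np.all(np.linalg.eigvalsh(Xi - Omega) >= -1e-9)

# (23.1.3): t^T Omega t <= t^T Xi t for every t (spot-check random t)
for t in rng.normal(size=(1000, 3)):
    assert t @ Omega @ t <= t @ Xi @ t + 1e-9

# (23.1.5): tr(Theta Omega) <= tr(Theta Xi) for a random nnd Theta
C = rng.normal(size=(3, 3))
Theta = C @ C.T
assert np.trace(Theta @ Omega) <= np.trace(Theta @ Xi) + 1e-9
print("spot-checks passed")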



Problem 296. Show that if Θ and Σ are symmetric and nonnegative definite, then tr(ΘΣ) ≥ 0. You are allowed to use that tr(AB) = tr(BA), that the trace of a nonnegative definite matrix is ≥ 0, and Problem 129 (which is trivial).



Answer. Write Θ = RR⊤; then tr(ΘΣ) = tr(RR⊤Σ) = tr(R⊤ΣR) ≥ 0, since R⊤ΣR is itself nonnegative definite.
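The same trace argument can be replayed numerically; the sketch below factors a random positive definite Θ with a Cholesky decomposition (playing the role of Θ = RR⊤) and confirms tr(ΘΣ) = tr(R⊤ΣR) ≥ 0.

import numpy as np

rng = np.random.default_rng(2)

def random_nnd(d):
    # A Gram matrix M M^T is symmetric nonnegative definite for any M.
    M = rng.normal(size=(d, d))
    return M @ M.T

Theta = random_nnd(4)
Sigma = random_nnd(4)

# Factor Theta = R R^T (Cholesky; Theta is positive definite with
# probability one here).  By cyclicity of the trace,
# tr(Theta Sigma) = tr(R R^T Sigma) = tr(R^T Sigma R) >= 0,
# since R^T Sigma R is itself nonnegative definite.
R = np.linalg.cholesky(Theta)
lhs = np.trace(Theta @ Sigma)
rhs = np.trace(R.T @ Sigma @ R)
assert np.isclose(lhs, rhs) and lhs >= 0
print(lhs)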



Problem 297. Consider two very simple-minded estimators of the unknown nonrandom parameter vector φ = [φ₁ φ₂]⊤. Neither of these estimators depends on any observations; they are constants. The first estimator is φ̂ = [11 11]⊤, and the second is φ̃ = [12 8]⊤.



• a. 2 points Compute the MSE-matrices of these two estimators if the true value of the parameter vector is φ = [10 10]⊤. For which estimator is the trace of the MSE-matrix smaller?






Answer. φ̂ has the smaller trace of the MSE-matrix. Since φ̂ − φ = [1 1]⊤,

MSE[φ̂; φ] = E[(φ̂ − φ)(φ̂ − φ)⊤] = [ 1  1 ]
                                   [ 1  1 ]

with trace 2; and since φ̃ − φ = [2 −2]⊤,

MSE[φ̃; φ] = [  4  −4 ]
             [ −4   4 ]

with trace 8. Note that both MSE-matrices are singular, i.e., both estimators allow an error-free look at certain linear combinations of the parameter vector.
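These MSE-matrices, their traces, and their singularity take only a few lines to verify, since for constant estimators the expectation in (23.1.1) is trivial:

import numpy as np

phi       = np.array([10.0, 10.0])   # true parameter vector
phi_hat   = np.array([11.0, 11.0])   # first constant estimator
phi_tilde = np.array([12.0,  8.0])   # second constant estimator

# For constant estimators the MSE-matrix is just the outer product
# of the (nonrandom) estimation error.
mse_hat   = np.outer(phi_hat - phi,   phi_hat - phi)     # [[1, 1], [1, 1]]
mse_tilde = np.outer(phi_tilde - phi, phi_tilde - phi)   # [[4, -4], [-4, 4]]

print(np.trace(mse_hat), np.trace(mse_tilde))   # 2.0 8.0
# Both matrices are rank one, hence singular, as noted above:
print(np.linalg.matrix_rank(mse_hat), np.linalg.matrix_rank(mse_tilde))   # 1 1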



• b. 1 point Give two vectors g = [g₁ g₂]⊤ and h = [h₁ h₂]⊤ satisfying MSE[g⊤φ̂; g⊤φ] < MSE[g⊤φ̃; g⊤φ] and MSE[h⊤φ̂; h⊤φ] > MSE[h⊤φ̃; h⊤φ] (g and h are not unique; there are many possibilities).

Answer. With g = [1 −1]⊤ and h = [1 1]⊤, for instance, we get g⊤φ̂ − g⊤φ = 0, g⊤φ̃ − g⊤φ = 4, h⊤φ̂ − h⊤φ = 2, and h⊤φ̃ − h⊤φ = 0; therefore MSE[g⊤φ̂; g⊤φ] = 0, MSE[g⊤φ̃; g⊤φ] = 16, MSE[h⊤φ̂; h⊤φ] = 4, and MSE[h⊤φ̃; h⊤φ] = 0. An alternative way to compute this is, e.g.,

MSE[g⊤φ̃; g⊤φ] = [ 1  −1 ] [  4  −4 ] [  1 ]  = 16.
                           [ −4   4 ] [ −1 ]



• c. 1 point Show that neither MSE[φ̂; φ] − MSE[φ̃; φ] nor MSE[φ̃; φ] − MSE[φ̂; φ] is a nonnegative definite matrix. Hint: you are allowed to use the mathematical fact that if a matrix is nonnegative definite, then its determinant is nonnegative.

Answer.

(23.1.6)    MSE[φ̃; φ] − MSE[φ̂; φ] = [  3  −5 ]
                                      [ −5   3 ]

Its determinant, 9 − 25 = −16, is negative, and the determinant of its negative is also negative; so neither difference is nonnegative definite.
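A short check of this answer: the determinants of the difference (23.1.6) and of its negative are both −16, and the eigenvalues of the difference have mixed signs, so neither matrix is nonnegative definite.

import numpy as np

diff = np.array([[ 3.0, -5.0],
                 [-5.0,  3.0]])   # MSE[phi_tilde; phi] - MSE[phi_hat; phi], (23.1.6)

# A nonnegative definite matrix has a nonnegative determinant; both the
# difference and its negative fail this test, so neither is nnd.
print(np.linalg.det(diff), np.linalg.det(-diff))   # both approx. -16.0
print(np.linalg.eigvalsh(diff))                    # [-2., 8.]: mixed signs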



CHAPTER 24



Sampling Properties of the Least Squares

Estimator

The estimator β̂ was derived from a geometric argument, and everything which we have shown so far is what [DM93, p. 3] calls its numerical, as opposed to its statistical, properties. But β̂ also has nice statistical or sampling properties. We are assuming right now the specification given in (18.1.3), in which X is an arbitrary matrix of full column rank, and we are not assuming that the errors must be Normally distributed. The assumption that X is nonrandom means that repeated samples are taken with the same X-matrix. This is often true for experimental data, but not in econometrics. The sampling properties which we are really interested in are those where the X-matrix is random as well; we will derive those later. For that later derivation, the properties with fixed X-matrix, which we are going to discuss presently, will be needed as an intermediate step. The assumption of fixed X is therefore a preliminary technical assumption, to be dropped later.

In order to know how good the estimator β̂ is, one needs the statistical properties of its “sampling error” β̂ − β. This sampling error has the following formula:

(24.0.7)    β̂ − β = (X⊤X)⁻¹X⊤y − (X⊤X)⁻¹X⊤Xβ = (X⊤X)⁻¹X⊤(y − Xβ) = (X⊤X)⁻¹X⊤ε.

From (24.0.7) it follows immediately that β̂ is unbiased, since E[(X⊤X)⁻¹X⊤ε] = o. Unbiasedness does not make an estimator better, but many good estimators are unbiased, and it simplifies the math.
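The identity (24.0.7) and the unbiasedness of β̂ are easy to confirm by simulation with a fixed X-matrix and non-Normal errors; the design matrix, coefficients, and error distribution in this sketch are arbitrary choices for the illustration.

import numpy as np

rng = np.random.default_rng(3)

# Fixed design matrix X of full column rank, true beta, and errors that
# are centered but not Normal (uniform), matching the text's assumptions.
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
XtX_inv_Xt = np.linalg.inv(X.T @ X) @ X.T

n_samples = 20_000
eps = rng.uniform(-1.0, 1.0, size=(n_samples, n))   # E[eps] = o
y = (X @ beta)[:, None] + eps.T                     # one column per sample
beta_hat = XtX_inv_Xt @ y                           # OLS for every sample

# identity (24.0.7): beta_hat - beta = (X'X)^{-1} X' eps, sample by sample
assert np.allclose(beta_hat - beta[:, None], XtX_inv_Xt @ eps.T)

# unbiasedness: the average of beta_hat over repeated samples taken with
# the same X-matrix approaches beta
print(beta_hat.mean(axis=1))   # close to [1.0, 2.0, -0.5]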

We will use the MSE-matrix as a criterion for how good an estimator of a vector

of unobserved parameters is. Chapter 23 gave some reasons why this is a sensible

criterion (compare [DM93, Chapter 5.5]).


