23. THE MEAN SQUARED ERROR AS AN INITIAL CRITERION OF PRECISION
For our purposes, therefore, the estimator (or predictor) $\hat\phi$ of the unknown parameter (or unobserved random variable) $\phi$ is no worse than the alternative $\tilde\phi$ if $\operatorname{MSE}[\hat\phi;\phi] \le \operatorname{MSE}[\tilde\phi;\phi]$. This is a criterion which can be applied before any observations are collected and actual estimations are made; it is an "initial" criterion regarding the expected average performance in a series of future trials (even though, in economics, usually only one trial is made).
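To make the notion of an "initial" criterion concrete, here is a small Monte Carlo sketch (not from the text; the normal model, sample size, and the 0.9 shrinkage factor are arbitrary illustrative choices) that approximates the expected average performance of two scalar estimators over many repeated trials:

```python
import numpy as np

rng = np.random.default_rng(0)
phi_true = 0.5            # the unknown parameter (used here only to score the estimators)
n, trials = 20, 100_000   # observations per trial, number of repeated trials

samples = rng.normal(phi_true, 1.0, size=(trials, n))
phi_hat   = samples.mean(axis=1)         # estimator 1: the sample mean (unbiased)
phi_tilde = 0.9 * samples.mean(axis=1)   # estimator 2: a shrunken, biased alternative

print(np.mean((phi_hat   - phi_true) ** 2))   # empirical MSE of estimator 1
print(np.mean((phi_tilde - phi_true) ** 2))   # empirical MSE of estimator 2
```

Which of the two comes out ahead depends on the true $\phi$ and on $n$; the point is only that the comparison refers to average behavior over repeated trials, not to any one realized sample.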
23.1. Comparison of Two Vector Estimators
If one wants to compare two vector estimators, say $\hat\phi$ and $\tilde\phi$, it is often impossible to say which of the two estimators is better. It may be the case that $\hat\phi_1$ is better than $\tilde\phi_1$ (in terms of MSE or some other criterion), but $\hat\phi_2$ is worse than $\tilde\phi_2$. And even if every component $\phi_i$ is estimated better by $\hat\phi_i$ than by $\tilde\phi_i$, certain linear combinations $t^\top\phi$ of the components of $\phi$ may be estimated better by $t^\top\tilde\phi$ than by $t^\top\hat\phi$.
Problem 294. 2 points Construct an example of two vector estimators $\hat\phi$ and $\tilde\phi$ of the same random vector $\phi = \begin{bmatrix}\phi_1 & \phi_2\end{bmatrix}^\top$, so that $\operatorname{MSE}[\hat\phi_i;\phi_i] < \operatorname{MSE}[\tilde\phi_i;\phi_i]$ for $i = 1, 2$ but $\operatorname{MSE}[\hat\phi_1+\hat\phi_2;\phi_1+\phi_2] > \operatorname{MSE}[\tilde\phi_1+\tilde\phi_2;\phi_1+\phi_2]$. Hint: it is easiest to use an example in which all random variables are constants. Another hint: the geometric analog would be to find two vectors in a plane, $\hat\phi$ and $\tilde\phi$, such that in each component (i.e., projection on the axes) $\hat\phi$ is closer to the origin than $\tilde\phi$, but in the projection on the diagonal $\tilde\phi$ is closer to the origin than $\hat\phi$.
Answer. In the simplest counterexample, all variables involved are constants: $\phi = \begin{bmatrix}0\\ 0\end{bmatrix}$, $\hat\phi = \begin{bmatrix}1\\ 1\end{bmatrix}$, and $\tilde\phi = \begin{bmatrix}-2\\ 2\end{bmatrix}$. Then $\operatorname{MSE}[\hat\phi_i;\phi_i] = 1 < 4 = \operatorname{MSE}[\tilde\phi_i;\phi_i]$ for $i = 1, 2$, but $\operatorname{MSE}[\hat\phi_1+\hat\phi_2;\phi_1+\phi_2] = 4 > 0 = \operatorname{MSE}[\tilde\phi_1+\tilde\phi_2;\phi_1+\phi_2]$.
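A quick numerical check of this counterexample (an illustrative sketch, not part of the original answer):

```python
import numpy as np

phi       = np.array([0.0, 0.0])   # the true (constant) parameter vector
phi_hat   = np.array([1.0, 1.0])   # first "estimator" (a constant)
phi_tilde = np.array([-2.0, 2.0])  # second "estimator" (a constant)

# Componentwise, phi_hat wins in every coordinate ...
print((phi_hat - phi) ** 2, (phi_tilde - phi) ** 2)   # [1. 1.]  vs  [4. 4.]

# ... but for the linear combination phi_1 + phi_2, phi_tilde wins.
t = np.array([1.0, 1.0])
print((t @ (phi_hat - phi)) ** 2, (t @ (phi_tilde - phi)) ** 2)   # 4.0  vs  0.0
```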
One can say unambiguously that the vector $\hat\phi$ is a no worse estimator than $\tilde\phi$ only if its MSE is smaller or equal for every linear combination. Theorem 23.1.1 will show that this is the case if and only if the MSE-matrix of $\hat\phi$ is smaller, by a nonnegative definite matrix, than that of $\tilde\phi$. If this is so, then theorem 23.1.1 says that not only the MSEs of all linear transformations, but also all other nonnegative definite quadratic loss functions involving these vectors (such as the trace of the MSE-matrix, which is an often-used criterion) are minimized. In order to formulate and prove this, we first need a formal definition of the MSE-matrix. We write $\mathbf{MSE}$ (in boldface) for the matrix and $\operatorname{MSE}$ for the scalar mean squared error. The MSE-matrix of $\hat\phi$ as an estimator of $\phi$ is defined as

(23.1.1)    $\mathbf{MSE}[\hat\phi;\phi] = E\bigl[(\hat\phi-\phi)(\hat\phi-\phi)^\top\bigr].$
Problem 295. 2 points Let $\theta$ be a vector of possibly random parameters, and $\hat\theta$ an estimator of $\theta$. Show that

(23.1.2)    $\mathbf{MSE}[\hat\theta;\theta] = V[\hat\theta-\theta] + \bigl(E[\hat\theta-\theta]\bigr)\bigl(E[\hat\theta-\theta]\bigr)^\top.$

Do not assume the scalar result, but give a proof that is valid for vectors and scalars alike.
Answer. For any random vector $x$,
$$E[xx^\top] = E\bigl[(x - E[x] + E[x])(x - E[x] + E[x])^\top\bigr]$$
$$= E\bigl[(x - E[x])(x - E[x])^\top\bigr] + E\bigl[x - E[x]\bigr]\,E[x]^\top + E[x]\,E\bigl[x - E[x]\bigr]^\top + E[x]\,E[x]^\top$$
$$= V[x] + O + O + E[x]\,E[x]^\top.$$
Setting $x = \hat\theta - \theta$, the statement follows.
If $\theta$ is nonrandom, formula (23.1.2) simplifies slightly, since in this case $V[\hat\theta - \theta] = V[\hat\theta]$. In this case the MSE-matrix is the covariance matrix plus the squared bias matrix. If $\theta$ is nonrandom and in addition $\hat\theta$ is unbiased, then the MSE-matrix coincides with the covariance matrix.
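A numerical sanity check of decomposition (23.1.2) for a nonrandom $\theta$ (an illustrative sketch; the shrunken-mean estimator, sample size, and error distribution are arbitrary choices, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
theta = np.array([1.0, 3.0])      # nonrandom true parameter vector
n, trials = 10, 200_000

# A deliberately biased estimator: 0.8 times the sample mean of N(theta, I) data.
x = rng.normal(theta, 1.0, size=(trials, n, 2))
theta_hat = 0.8 * x.mean(axis=1)                   # shape (trials, 2)

err = theta_hat - theta
mse_matrix = err.T @ err / trials                  # E[(theta_hat - theta)(theta_hat - theta)']
bias = err.mean(axis=0)                            # E[theta_hat - theta]
cov = np.cov(theta_hat.T, bias=True)               # V[theta_hat - theta] = V[theta_hat]

print(mse_matrix)
print(cov + np.outer(bias, bias))                  # matches the MSE-matrix, as in (23.1.2)
```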
Theorem 23.1.1. Assume $\hat\phi$ and $\tilde\phi$ are two estimators of the parameter $\phi$ (which is allowed to be random itself). Then conditions (23.1.3), (23.1.4), and (23.1.5) are equivalent:
(23.1.3)    For every constant vector $t$: $\operatorname{MSE}[t^\top\hat\phi;\,t^\top\phi] \le \operatorname{MSE}[t^\top\tilde\phi;\,t^\top\phi]$;

(23.1.4)    $\mathbf{MSE}[\tilde\phi;\phi] - \mathbf{MSE}[\hat\phi;\phi]$ is a nonnegative definite matrix;

(23.1.5)    For every nonnegative definite $\Theta$: $E\bigl[(\hat\phi-\phi)^\top\Theta(\hat\phi-\phi)\bigr] \le E\bigl[(\tilde\phi-\phi)^\top\Theta(\tilde\phi-\phi)\bigr]$.
Proof. Write $\mathbf{MSE}[\tilde\phi;\phi] = \sigma^2\Xi$ and $\mathbf{MSE}[\hat\phi;\phi] = \sigma^2\Omega$. To show that (23.1.3) implies (23.1.4), simply note that $\operatorname{MSE}[t^\top\hat\phi;\,t^\top\phi] = \sigma^2 t^\top\Omega t$ and likewise $\operatorname{MSE}[t^\top\tilde\phi;\,t^\top\phi] = \sigma^2 t^\top\Xi t$. Therefore (23.1.3) is equivalent to $t^\top(\Xi - \Omega)t \ge 0$ for all $t$, which is the defining property making $\Xi - \Omega$ nonnegative definite.
Here is the proof that (23.1.4) implies (23.1.5):
$$E\bigl[(\hat\phi-\phi)^\top\Theta(\hat\phi-\phi)\bigr] = E\bigl[\operatorname{tr}\bigl((\hat\phi-\phi)^\top\Theta(\hat\phi-\phi)\bigr)\bigr] = E\bigl[\operatorname{tr}\bigl(\Theta(\hat\phi-\phi)(\hat\phi-\phi)^\top\bigr)\bigr] = \operatorname{tr}\bigl(\Theta\,E[(\hat\phi-\phi)(\hat\phi-\phi)^\top]\bigr) = \sigma^2\operatorname{tr}(\Theta\Omega),$$
and in the same way
$$E\bigl[(\tilde\phi-\phi)^\top\Theta(\tilde\phi-\phi)\bigr] = \sigma^2\operatorname{tr}(\Theta\Xi).$$
The difference of the two expected quadratic forms is therefore $\sigma^2\operatorname{tr}\bigl(\Theta(\Xi-\Omega)\bigr)$. By assumption, $\Xi - \Omega$ is nonnegative definite. Therefore, by theorem A.5.6 in the Mathematical Appendix, or by Problem 296 below, this trace is nonnegative.
To complete the proof, note that (23.1.5) has (23.1.3) as a special case if one sets $\Theta = tt^\top$.
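Condition (23.1.4) is straightforward to check numerically, since a symmetric matrix is nonnegative definite exactly when its smallest eigenvalue is nonnegative. Here is a small helper sketch (the function name and tolerance are my own choices, not from the text):

```python
import numpy as np

def mse_dominates(mse_tilde, mse_hat, tol=1e-12):
    """Return True if mse_tilde - mse_hat is nonnegative definite, i.e. if the
    estimator behind mse_hat is no worse for every linear combination (23.1.4)."""
    diff = np.asarray(mse_tilde, dtype=float) - np.asarray(mse_hat, dtype=float)
    eigenvalues = np.linalg.eigvalsh((diff + diff.T) / 2)   # symmetrize for safety
    return eigenvalues.min() >= -tol
```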
Problem 296. Show that if $\Theta$ and $\Sigma$ are symmetric and nonnegative definite, then $\operatorname{tr}(\Theta\Sigma) \ge 0$. You are allowed to use that $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, that the trace of a nonnegative definite matrix is $\ge 0$, and Problem 129 (which is trivial).

Answer. Write $\Theta = RR^\top$; then $\operatorname{tr}(\Theta\Sigma) = \operatorname{tr}(RR^\top\Sigma) = \operatorname{tr}(R^\top\Sigma R) \ge 0$.
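An empirical spot check of this trace fact (illustrative only; the matrices are generated as $AA^\top$ so that they are nonnegative definite by construction):

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(1000):
    a, b = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
    theta, sigma = a @ a.T, b @ b.T          # both nonnegative definite by construction
    assert np.trace(theta @ sigma) >= 0
print("trace(Theta @ Sigma) was nonnegative in every draw")
```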
Problem 297. Consider two very simple-minded estimators of the unknown nonrandom parameter vector $\phi = \begin{bmatrix}\phi_1\\ \phi_2\end{bmatrix}$. Neither of these estimators depends on any observations; they are constants. The first estimator is $\hat\phi = \begin{bmatrix}11\\ 11\end{bmatrix}$, and the second is $\tilde\phi = \begin{bmatrix}12\\ 8\end{bmatrix}$.

• a. 2 points Compute the MSE-matrices of these two estimators if the true value of the parameter vector is $\phi = \begin{bmatrix}10\\ 10\end{bmatrix}$. For which estimator is the trace of the MSE-matrix smaller?
Answer. $\hat\phi$ has the smaller trace of the MSE-matrix:
$$\hat\phi - \phi = \begin{bmatrix}1\\ 1\end{bmatrix}, \qquad \mathbf{MSE}[\hat\phi;\phi] = E\bigl[(\hat\phi-\phi)(\hat\phi-\phi)^\top\bigr] = E\!\left[\begin{bmatrix}1\\ 1\end{bmatrix}\begin{bmatrix}1 & 1\end{bmatrix}\right] = \begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix},$$
$$\tilde\phi - \phi = \begin{bmatrix}2\\ -2\end{bmatrix}, \qquad \mathbf{MSE}[\tilde\phi;\phi] = \begin{bmatrix}4 & -4\\ -4 & 4\end{bmatrix},$$
so the traces are $2$ and $8$ respectively. Note that both MSE-matrices are singular, i.e., both estimators allow an error-free look at certain linear combinations of the parameter vector.
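The same computation in a few lines (illustrative; it simply mirrors the numbers above):

```python
import numpy as np

phi       = np.array([10.0, 10.0])
phi_hat   = np.array([11.0, 11.0])
phi_tilde = np.array([12.0,  8.0])

mse_hat   = np.outer(phi_hat - phi,   phi_hat - phi)     # [[1, 1], [1, 1]]
mse_tilde = np.outer(phi_tilde - phi, phi_tilde - phi)   # [[4, -4], [-4, 4]]
print(np.trace(mse_hat), np.trace(mse_tilde))            # 2.0 vs 8.0
```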
• b. 1 point Give two vectors $g = \begin{bmatrix}g_1\\ g_2\end{bmatrix}$ and $h = \begin{bmatrix}h_1\\ h_2\end{bmatrix}$ satisfying $\operatorname{MSE}[g^\top\hat\phi;\,g^\top\phi] < \operatorname{MSE}[g^\top\tilde\phi;\,g^\top\phi]$ and $\operatorname{MSE}[h^\top\hat\phi;\,h^\top\phi] > \operatorname{MSE}[h^\top\tilde\phi;\,h^\top\phi]$ ($g$ and $h$ are not unique; there are many possibilities).
Answer. With $g = \begin{bmatrix}1\\ -1\end{bmatrix}$ and $h = \begin{bmatrix}1\\ 1\end{bmatrix}$, for instance, we get $g^\top\hat\phi - g^\top\phi = 0$, $g^\top\tilde\phi - g^\top\phi = 4$, $h^\top\hat\phi - h^\top\phi = 2$, $h^\top\tilde\phi - h^\top\phi = 0$; therefore $\operatorname{MSE}[g^\top\hat\phi;\,g^\top\phi] = 0$, $\operatorname{MSE}[g^\top\tilde\phi;\,g^\top\phi] = 16$, $\operatorname{MSE}[h^\top\hat\phi;\,h^\top\phi] = 4$, $\operatorname{MSE}[h^\top\tilde\phi;\,h^\top\phi] = 0$. An alternative way to compute this is, e.g.,
$$\operatorname{MSE}[g^\top\tilde\phi;\,g^\top\phi] = \begin{bmatrix}1 & -1\end{bmatrix}\begin{bmatrix}4 & -4\\ -4 & 4\end{bmatrix}\begin{bmatrix}1\\ -1\end{bmatrix} = 16.$$
• c. 1 point Show that neither $\mathbf{MSE}[\hat\phi;\phi] - \mathbf{MSE}[\tilde\phi;\phi]$ nor $\mathbf{MSE}[\tilde\phi;\phi] - \mathbf{MSE}[\hat\phi;\phi]$ is a nonnegative definite matrix. Hint: you are allowed to use the mathematical fact that if a matrix is nonnegative definite, then its determinant is nonnegative.

Answer.

(23.1.6)    $\mathbf{MSE}[\tilde\phi;\phi] - \mathbf{MSE}[\hat\phi;\phi] = \begin{bmatrix}3 & -5\\ -5 & 3\end{bmatrix}$

Its determinant is negative, and the determinant of its negative is also negative; therefore neither difference is nonnegative definite.
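A numerical confirmation (illustrative), using the determinant hint and, equivalently, the eigenvalues of the difference matrix:

```python
import numpy as np

mse_hat   = np.array([[1.0,  1.0], [ 1.0, 1.0]])
mse_tilde = np.array([[4.0, -4.0], [-4.0, 4.0]])

diff = mse_tilde - mse_hat               # [[3, -5], [-5, 3]]
print(np.linalg.det(diff))               # -16.0 < 0, so diff is not nonnegative definite
print(np.linalg.det(-diff))              # -16.0 < 0, so -diff is not nonnegative definite either
print(np.linalg.eigvalsh(diff))          # [-2., 8.]: one negative, one positive eigenvalue
```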
CHAPTER 24

Sampling Properties of the Least Squares Estimator
The estimator $\hat\beta$ was derived from a geometric argument, and everything we have shown so far is what [DM93, p. 3] calls its numerical, as opposed to its statistical, properties. But $\hat\beta$ also has nice statistical or sampling properties. We are assuming right now the specification given in (18.1.3), in which $X$ is an arbitrary matrix of full column rank, and we are not assuming that the errors must be Normally distributed. The assumption that $X$ is nonrandom means that repeated samples are taken with the same $X$-matrix. This is often true for experimental data, but not in econometrics. The sampling properties we are really interested in are those in which the $X$-matrix is random as well; we will derive those later. For this later derivation, the properties with fixed $X$-matrix, which we are going to discuss presently, will be needed as an intermediate step. The assumption of fixed $X$ is therefore a preliminary technical assumption, to be dropped later.
In order to know how good the estimator $\hat\beta$ is, one needs the statistical properties of its "sampling error" $\hat\beta - \beta$. This sampling error has the following formula:

(24.0.7)    $\hat\beta - \beta = (X^\top X)^{-1}X^\top y - (X^\top X)^{-1}X^\top X\beta = (X^\top X)^{-1}X^\top(y - X\beta) = (X^\top X)^{-1}X^\top\varepsilon$

It follows immediately from (24.0.7) that $\hat\beta$ is unbiased, since $E[(X^\top X)^{-1}X^\top\varepsilon] = o$. Unbiasedness does not make an estimator better, but many good estimators are unbiased, and it simplifies the math.
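A brief simulation sketch of this unbiasedness claim under a fixed design (the particular design matrix, coefficient values, and error distribution below are arbitrary illustrative choices, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 50, 20_000
X = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])   # fixed X, full column rank
beta = np.array([2.0, -0.5])

estimates = np.empty((trials, 2))
for t in range(trials):
    eps = rng.standard_normal(n)        # errors need not be Normal; Normality is not assumed here
    y = X @ beta + eps
    estimates[t] = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS estimate for this sample

print(estimates.mean(axis=0))   # averages close to beta, illustrating E[beta_hat] = beta
```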
We will use the MSE-matrix as a criterion for how good an estimator of a vector
of unobserved parameters is. Chapter 23 gave some reasons why this is a sensible
criterion (compare [DM93, Chapter 5.5]).