12. A SIMPLE EXAMPLE OF ESTIMATION
1. The location parameter of the Normal distribution is its expected value, and
by the weak law of large numbers, the probability limit for n → ∞ of the sample
mean is the expected value.
2. The expected value µ is sometimes called the "population mean," while ȳ is the sample mean. This terminology indicates that there is a correspondence between population quantities and sample quantities, which is often used for estimation. This is the principle of estimating the unknown distribution of the population by the empirical distribution of the sample. Compare Problem 63.
3. This estimator is also unbiased. By definition, an estimator t of the parameter θ is unbiased if E[t] = θ. ȳ is an unbiased estimator of µ, since E[ȳ] = µ.
4. Given n observations y₁, . . . , yₙ, the sample mean is the number a = ȳ which minimizes (y₁ − a)² + (y₂ − a)² + · · · + (yₙ − a)². One can say it is the number whose squared distance to the given sample numbers is smallest. This idea is generalized in the least squares principle of estimation. It follows from the following frequently used fact:
5. In the case of normality the sample mean is also the maximum likelihood
estimate.
12.1. Sample Mean as Estimator of the Location Parameter
Problem 183. 4 points Let y₁, . . . , yₙ be an arbitrary vector and α an arbitrary number. As usual, $\bar y = \frac{1}{n}\sum_{i=1}^n y_i$. Show that

(12.1.1)    $\sum_{i=1}^n (y_i - \alpha)^2 = \sum_{i=1}^n (y_i - \bar y)^2 + n(\bar y - \alpha)^2$
Answer.

(12.1.2)    $\sum_{i=1}^n (y_i - \alpha)^2 = \sum_{i=1}^n \bigl((y_i - \bar y) + (\bar y - \alpha)\bigr)^2$

(12.1.3)    $= \sum_{i=1}^n (y_i - \bar y)^2 + 2\sum_{i=1}^n (y_i - \bar y)(\bar y - \alpha) + \sum_{i=1}^n (\bar y - \alpha)^2$

(12.1.4)    $= \sum_{i=1}^n (y_i - \bar y)^2 + 2(\bar y - \alpha)\sum_{i=1}^n (y_i - \bar y) + n(\bar y - \alpha)^2$

Since the middle term is zero, (12.1.1) follows.
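The decomposition (12.1.1) can be spot-checked numerically; a minimal sketch (the sample values and α are arbitrary choices, not from the text):

```python
import random

# Numeric check of the decomposition (12.1.1):
# sum (y_i - alpha)^2 = sum (y_i - ybar)^2 + n * (ybar - alpha)^2
random.seed(0)
y = [random.gauss(0, 1) for _ in range(10)]
alpha = 0.7
n = len(y)
ybar = sum(y) / n

lhs = sum((yi - alpha) ** 2 for yi in y)
rhs = sum((yi - ybar) ** 2 for yi in y) + n * (ybar - alpha) ** 2
assert abs(lhs - rhs) < 1e-12  # identity holds up to rounding
```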
Problem 184. 2 points Let y be an n-vector. (It may be a vector of observations of a random variable y, but it does not matter how the yᵢ were obtained.) Prove that the scalar α which minimizes the sum

(12.1.5)    $(y_1 - \alpha)^2 + (y_2 - \alpha)^2 + \cdots + (y_n - \alpha)^2 = \sum_i (y_i - \alpha)^2$

is the arithmetic mean α = ȳ.

Answer. Use (12.1.1).
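The minimizing property can also be seen numerically: evaluating the sum of squares on a grid of α values around the sample mean, the minimum sits exactly at ȳ (data and grid are arbitrary illustrative choices):

```python
import random

# Evaluate the sum of squared deviations on a grid of alpha values
# and confirm the grid minimizer is the sample mean itself.
random.seed(1)
y = [random.gauss(2, 1) for _ in range(25)]
ybar = sum(y) / len(y)

def sse(a):
    return sum((yi - a) ** 2 for yi in y)

grid = [ybar + 0.01 * k for k in range(-200, 201)]  # grid contains ybar at k = 0
best = min(grid, key=sse)
assert abs(best - ybar) < 1e-12  # strict convexity puts the minimum at ybar
```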
Problem 185. Give an example of a distribution in which the sample mean is
not a good estimate of the location parameter. Which other estimate (or estimates)
would be preferable in that situation?
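One possible illustration (an assumed example, not the text's own answer): for heavy-tailed distributions such as the Cauchy, which has no expected value, the sample mean does not settle down as n grows, while the sample median remains a reliable estimate of the location parameter.

```python
import math
import random
import statistics

# Sketch: draw standard Cauchy variates via the inverse-CDF transform
# tan(pi * (U - 1/2)); the location parameter is 0.
random.seed(2)
sample = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(100_000)]
med = statistics.median(sample)
assert abs(med) < 0.05  # the median is close to the true location 0
# The sample mean, by contrast, is dominated by extreme draws
# and does not converge to the location parameter.
```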
12.2. Intuition of the Maximum Likelihood Estimator

In order to make intuitively clear what is involved in maximum likelihood estimation, look at the simplest case y = µ + ε, ε ∼ N(0, 1), where µ is an unknown parameter. In other words: we know that one of the functions shown in Figure 1 is the density function of y, but we do not know which.

Assume we have only one observation y. What, then, is the MLE of µ? It is that µ̃ for which the value of the likelihood function, evaluated at y, is greatest. I.e., you look at all possible density functions and pick the one which is highest at point y, and use the µ which belongs to this density as your estimate.
[Figure 1. Possible Density Functions for y — four Normal densities centered at µ₁, µ₂, µ₃, µ₄, with the observation marked on the axis.]
2) Now assume two independent observations of y are given, y1 and y2 . The
family of density functions is still the same. Which of these density functions do we
choose now? The one for which the product of the ordinates over y1 and y2 gives
the highest value. For this the peak of the density function must be exactly in the
middle between the two observations.
3) Assume again that we made two independent observations y₁ and y₂ of y, but this time not only the expected value but also the variance of y is unknown; call it σ². This gives a larger family of density functions to choose from: they do not only differ by location, but some are low and fat and others tall and skinny.

For which density function is the product of the ordinates over y₁ and y₂ the largest again? Before even knowing our estimate of σ² we can already tell what µ̃ is: it must again be (y₁ + y₂)/2. Then among those density functions which are centered over (y₁ + y₂)/2, there is one which is highest over y₁ and y₂. Figure 4 shows the densities for standard deviations 0.01, 0.05, 0.1, 0.5, 1, and 5. All curves, except the last one, are truncated at the point where the resolution of TeX can no longer distinguish between their level and zero. For the last curve this point would only be reached at the coordinates ±25.

[Figure 2. Two observations, σ² = 1]

[Figure 3. Two observations, σ² unknown]
4) If we have many observations, then the density pattern of the observations,
as indicated by the histogram below, approximates the actual density function of y
itself. That likelihood function must be chosen which has a high value where the
points are dense, and which has a low value where the points are not so dense.
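The graphical argument in 2) can be checked numerically: with σ = 1 fixed, the product of the two Normal ordinates over y₁ and y₂ is maximized when µ sits at their midpoint. A minimal sketch (the observations are arbitrary illustrative values):

```python
import math

# Likelihood for two observations under N(mu, 1): the product of the
# two density ordinates, maximized over a grid of candidate mu values.
def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

y1, y2 = 1.3, 4.1  # arbitrary observations

def likelihood(mu):
    return normal_pdf(y1, mu) * normal_pdf(y2, mu)

grid = [k * 0.001 for k in range(0, 6000)]
mu_hat = max(grid, key=likelihood)
assert abs(mu_hat - (y1 + y2) / 2) < 0.002  # peak is at the midpoint 2.7
```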
[Figure 4. Only those centered over the two observations need to be considered]

[Figure 5. Many Observations]
12.2.1. Precision of the Estimator. How good is ȳ as an estimate of µ? To answer this question we need some criterion how to measure "goodness." Assume your business depends on the precision of the estimate µ̂ of µ. It incurs a penalty (extra cost) amounting to (µ̂ − µ)². You don't know beforehand what this error will be, but the expected value of this "loss function" may be an indication how good the estimate is. Generally, the expected value of a loss function is called the "risk," and for the quadratic loss function E[(µ̂ − µ)²] it has the name "mean squared error of µ̂ as an estimate of µ," written MSE[µ̂; µ]. What is the mean squared error of ȳ? Since E[ȳ] = µ, it is E[(ȳ − E[ȳ])²] = var[ȳ] = σ²/n.
Note that the MSE of ȳ as an estimate of µ does not depend on µ. This is convenient, since usually the MSE depends on unknown parameters, and therefore one usually does not know how good the estimator is. But it has more important advantages. For any estimator ỹ of µ it follows that MSE[ỹ; µ] = var[ỹ] + (E[ỹ] − µ)². If ỹ is linear (perhaps with a constant term), then var[ỹ] is a constant which does not depend on µ; therefore the MSE is a constant if ỹ is unbiased, and a quadratic function of µ (a parabola) if ỹ is biased. Since a parabola is an unbounded function, a biased linear estimator therefore has the disadvantage that for certain values of µ its MSE may be very high. Some estimators are very good when µ is in one area, and very bad when µ is in another area. Since our unbiased estimator ȳ has bounded MSE, it will not let us down, wherever nature has hidden the µ.
On the other hand, the MSE does depend on the unknown σ². So we have to estimate σ².
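A Monte Carlo sketch of the claim MSE[ȳ; µ] = var[ȳ] = σ²/n (all parameter values are arbitrary choices for illustration):

```python
import random

# Repeatedly draw samples of size n from N(mu, sigma^2), compute ybar,
# and average the squared error; it should approach sigma^2 / n = 0.4.
random.seed(3)
mu, sigma, n, reps = 5.0, 2.0, 10, 200_000
total = 0.0
for _ in range(reps):
    ybar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    total += (ybar - mu) ** 2
mse = total / reps
assert abs(mse - sigma ** 2 / n) < 0.01  # Monte Carlo estimate near 0.4
```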
12.3. Variance Estimation and Degrees of Freedom
It is not so clear what the best estimator of σ² is. At least two possibilities are in common use:

(12.3.1)    $s_m^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \bar y)^2$    or

(12.3.2)    $s_u^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar y)^2.$
Let us compute the expected value of our two estimators. Equation (12.1.1) with α = E[y] allows us to simplify the sum of squared errors so that it becomes easy to take expected values:

(12.3.3)    $E\Bigl[\sum_{i=1}^n (y_i - \bar y)^2\Bigr] = \sum_{i=1}^n E[(y_i - \mu)^2] - n\,E[(\bar y - \mu)^2]$

(12.3.4)    $= \sum_{i=1}^n \sigma^2 - n\,\frac{\sigma^2}{n} = (n-1)\sigma^2,$

because E[(yᵢ − µ)²] = var[yᵢ] = σ² and E[(ȳ − µ)²] = var[ȳ] = σ²/n. Therefore, if we use as estimator of σ² the quantity

(12.3.5)    $s_u^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar y)^2$

then this is an unbiased estimate.
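A simulation sketch of this result (parameters arbitrary): averaged over many samples, the divisor-(n − 1) estimator $s_u^2$ is close to σ², while the divisor-n estimator $s_m^2$ is biased downward by the factor (n − 1)/n:

```python
import random

# Compare the two variance estimators (12.3.1) and (12.3.2) by simulation.
random.seed(4)
mu, sigma, n, reps = 0.0, 1.5, 5, 200_000
su_sum = sm_sum = 0.0
for _ in range(reps):
    y = [random.gauss(mu, sigma) for _ in range(n)]
    ybar = sum(y) / n
    sse = sum((yi - ybar) ** 2 for yi in y)
    su_sum += sse / (n - 1)  # unbiased version s_u^2
    sm_sum += sse / n        # biased version s_m^2
assert abs(su_sum / reps - sigma ** 2) < 0.03                 # ~ 2.25
assert abs(sm_sum / reps - (n - 1) / n * sigma ** 2) < 0.03   # ~ 1.80
```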
Problem 186. 4 points Show that

(12.3.6)    $s_u^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar y)^2$

is an unbiased estimator of the variance. List the assumptions which have to be made about yᵢ so that this proof goes through. Do you need Normality of the individual observations yᵢ to prove this?
Answer. Use equation (12.1.1) with α = E[y]:

(12.3.7)    $E\Bigl[\sum_{i=1}^n (y_i - \bar y)^2\Bigr] = \sum_{i=1}^n E[(y_i - \mu)^2] - n\,E[(\bar y - \mu)^2]$

(12.3.8)    $= \sum_{i=1}^n \sigma^2 - n\,\frac{\sigma^2}{n} = (n-1)\sigma^2.$

You do not need Normality for this.
For testing, confidence intervals, etc., one also needs to know the probability distribution of $s_u^2$. For this look up once more Section 5.9 about the Chi-Square distribution. There we introduced the terminology that a random variable q is distributed as a σ²χ² iff q/σ² is a χ². In our model with n independent normal variables yᵢ with the same mean and variance, the variable $\sum_{i=1}^n (y_i - \bar y)^2$ is a $\sigma^2\chi^2_{n-1}$. Problem 187 gives a proof of this in the simplest case n = 2, and Problem 188 looks at the case n = 3. But it is valid for higher n too. Therefore $s_u^2$ is a $\frac{\sigma^2}{n-1}\chi^2_{n-1}$. This is remarkable: the distribution of $s_u^2$ does not depend on µ. Now use (5.9.5) to get the variance of $s_u^2$: it is $\frac{2\sigma^4}{n-1}$.
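The stated variance $2\sigma^4/(n-1)$ of $s_u^2$ can likewise be checked by simulation; a sketch with σ = 1 and n = 4, so the target value is 2/3 (choices arbitrary):

```python
import random

# Simulate many values of s_u^2 and compare their empirical variance
# with the theoretical value 2 * sigma^4 / (n - 1).
random.seed(5)
sigma, n, reps = 1.0, 4, 300_000
vals = []
for _ in range(reps):
    y = [random.gauss(0.0, sigma) for _ in range(n)]
    ybar = sum(y) / n
    vals.append(sum((yi - ybar) ** 2 for yi in y) / (n - 1))
mean = sum(vals) / reps
var = sum((v - mean) ** 2 for v in vals) / reps
assert abs(var - 2 * sigma ** 4 / (n - 1)) < 0.02  # target 2/3
```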
Problem 187. Let y₁ and y₂ be two independent Normally distributed variables with mean µ and variance σ², and let ȳ be their arithmetic mean.

• a. 2 points Show that

(12.3.9)    $\mathrm{SSE} = \sum_{i=1}^{2} (y_i - \bar y)^2 \sim \sigma^2\chi^2_1$

Hint: Find a Normally distributed random variable z with expected value 0 and variance 1 such that SSE = σ²z².
Answer.

(12.3.10)    $\bar y = \frac{y_1 + y_2}{2}$

(12.3.11)    $y_1 - \bar y = \frac{y_1 - y_2}{2}$

(12.3.12)    $y_2 - \bar y = -\frac{y_1 - y_2}{2}$

(12.3.13)    $(y_1 - \bar y)^2 + (y_2 - \bar y)^2 = \frac{(y_1 - y_2)^2}{4} + \frac{(y_1 - y_2)^2}{4} = \frac{(y_1 - y_2)^2}{2}$

(12.3.14)    $= \sigma^2\left(\frac{y_1 - y_2}{\sqrt{2\sigma^2}}\right)^2,$

and since $z = (y_1 - y_2)/\sqrt{2\sigma^2} \sim N(0, 1)$, its square is a $\chi^2_1$.
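The identity used in this answer, SSE = σ²z², holds exactly for every pair of draws; a quick numerical check (µ and σ are arbitrary):

```python
import math
import random

# Verify (y1 - ybar)^2 + (y2 - ybar)^2 == sigma^2 * z^2
# with z = (y1 - y2) / sqrt(2 * sigma^2), for one arbitrary pair of draws.
random.seed(6)
mu, sigma = 3.0, 2.0
y1, y2 = random.gauss(mu, sigma), random.gauss(mu, sigma)
ybar = (y1 + y2) / 2
sse = (y1 - ybar) ** 2 + (y2 - ybar) ** 2
z = (y1 - y2) / math.sqrt(2 * sigma ** 2)
assert abs(sse - sigma ** 2 * z ** 2) < 1e-9  # exact up to rounding
```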
• b. 4 points Write down the covariance matrix of the vector

(12.3.15)    $\begin{bmatrix} y_1 - \bar y \\ y_2 - \bar y \end{bmatrix}$

and show that it is singular.

Answer. (12.3.11) and (12.3.12) give

(12.3.16)    $\begin{bmatrix} y_1 - \bar y \\ y_2 - \bar y \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2} \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = D\,y$
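The algebra behind part b can be spot-checked numerically: D is symmetric and idempotent, so the covariance matrix of the vector is σ²DDᵀ = σ²D, whose determinant is zero. A sketch with σ² = 1 (the completion of the argument via idempotence is an assumption here, since the chunk breaks off mid-answer):

```python
# D from equation (12.3.16); verify D is idempotent and that the
# covariance matrix sigma^2 * D * D^T = sigma^2 * D is singular.
sigma2 = 1.0
D = [[0.5, -0.5], [-0.5, 0.5]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Dt = [[D[j][i] for j in range(2)] for i in range(2)]  # D is symmetric: D^T = D
DDt = matmul(D, Dt)
cov = [[sigma2 * DDt[i][j] for j in range(2)] for i in range(2)]
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
assert all(abs(DDt[i][j] - D[i][j]) < 1e-12
           for i in range(2) for j in range(2))  # D * D^T = D (idempotent)
assert abs(det) < 1e-12  # covariance matrix is singular
```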