
12. A SIMPLE EXAMPLE OF ESTIMATION



1. The location parameter of the Normal distribution is its expected value, and

by the weak law of large numbers, the probability limit for n → ∞ of the sample

mean is the expected value.

2. The expected value µ is sometimes called the “population mean,” while ȳ is the sample mean. This terminology indicates that there is a correspondence between population quantities and sample quantities, which is often used for estimation. This is the principle of estimating the unknown distribution of the population by the empirical distribution of the sample. Compare Problem 63.

3. This estimator is also unbiased. By definition, an estimator t of the parameter θ is unbiased if E[t] = θ. ȳ is an unbiased estimator of µ, since E[ȳ] = µ.

4. Given n observations y_1, . . . , y_n, the sample mean is the number a = ȳ which minimizes (y_1 − a)² + (y_2 − a)² + · · · + (y_n − a)². One can say it is the number whose squared distance to the given sample numbers is smallest. This idea is generalized in the least squares principle of estimation. It follows from the following frequently used fact:
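This least-squares property of the sample mean is easy to check numerically. The following sketch (with made-up data of our own, not from the text) compares the sum of squared deviations at ȳ with the sum at nearby candidate values:

```python
# Illustrative check: the sample mean minimizes sum_i (y_i - a)^2.
def sum_sq(y, a):
    return sum((yi - a) ** 2 for yi in y)

y = [2.0, 3.0, 7.0, 8.0]
ybar = sum(y) / len(y)  # 5.0

# Scan a grid of candidate values a around ybar; the minimizer is ybar.
candidates = [ybar + d / 10 for d in range(-50, 51)]
best = min(candidates, key=lambda a: sum_sq(y, a))
print(best)  # 5.0
```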

5. In the case of normality the sample mean is also the maximum likelihood

estimate.



12.1. SAMPLE MEAN AS ESTIMATOR OF THE LOCATION PARAMETER



Problem 183. 4 points Let y_1, . . . , y_n be an arbitrary vector and α an arbitrary number. As usual, ȳ = (1/n) ∑_{i=1}^n y_i. Show that

(12.1.1)    ∑_{i=1}^n (y_i − α)² = ∑_{i=1}^n (y_i − ȳ)² + n(ȳ − α)²



Answer.

(12.1.2)    ∑_{i=1}^n (y_i − α)² = ∑_{i=1}^n ((y_i − ȳ) + (ȳ − α))²
(12.1.3)                        = ∑_{i=1}^n (y_i − ȳ)² + 2 ∑_{i=1}^n (y_i − ȳ)(ȳ − α) + ∑_{i=1}^n (ȳ − α)²
(12.1.4)                        = ∑_{i=1}^n (y_i − ȳ)² + 2(ȳ − α) ∑_{i=1}^n (y_i − ȳ) + n(ȳ − α)²

Since the middle term is zero, (12.1.1) follows.
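The decomposition (12.1.1) can be verified on a small made-up sample (our own illustration):

```python
# Numerical check of (12.1.1):
# sum_i (y_i - a)^2 = sum_i (y_i - ybar)^2 + n * (ybar - a)^2
y = [1.0, 4.0, 6.0, 9.0]
a = 2.5
n = len(y)
ybar = sum(y) / n  # 5.0
lhs = sum((yi - a) ** 2 for yi in y)
rhs = sum((yi - ybar) ** 2 for yi in y) + n * (ybar - a) ** 2
print(lhs, rhs)  # 59.0 59.0
```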



Problem 184. 2 points Let y be an n-vector. (It may be a vector of observations of a random variable y, but it does not matter how the y_i were obtained.) Prove that the scalar α which minimizes the sum

(12.1.5)    (y_1 − α)² + (y_2 − α)² + · · · + (y_n − α)² = ∑_i (y_i − α)²

is the arithmetic mean α = ȳ.

Answer. Use (12.1.1).



Problem 185. Give an example of a distribution in which the sample mean is not a good estimate of the location parameter. Which other estimate (or estimates) would be preferable in that situation?

12.2. Intuition of the Maximum Likelihood Estimator

In order to make intuitively clear what is involved in maximum likelihood estimation, look at the simplest case y = µ + ε, ε ∼ N(0, 1), where µ is an unknown parameter. In other words: we know that one of the functions shown in Figure 1 is the density function of y, but we do not know which.

Assume we have only one observation y. What is then the MLE of µ? It is that µ̃ for which the value of the likelihood function, evaluated at y, is greatest. I.e., you look at all possible density functions and pick the one which is highest at point y, and use the µ which belongs to this density as your estimate.






[Figure 1. Possible Density Functions for y — several unit-variance Normal densities centered at µ1, µ2, µ3, µ4; the plot itself is not reproduced here.]

2) Now assume two independent observations of y are given, y1 and y2 . The

family of density functions is still the same. Which of these density functions do we

choose now? The one for which the product of the ordinates over y1 and y2 gives

the highest value. For this the peak of the density function must be exactly in the

middle between the two observations.
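A small grid search illustrates this claim (our own sketch, with unit variance as in the text and observation values of our choosing): the product of the two Normal ordinates is maximized when the density is centered at the midpoint of the observations.

```python
# Illustrative MLE-by-grid-search: with sigma = 1 and two observations,
# the product of ordinates f(y1; mu) * f(y2; mu) peaks at mu = (y1 + y2) / 2.
import math

def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

y1, y2 = 1.0, 3.0
mus = [i / 100 for i in range(0, 401)]  # candidate centers 0.00 .. 4.00
best_mu = max(mus, key=lambda m: normal_pdf(y1, m) * normal_pdf(y2, m))
print(best_mu)  # 2.0, the midpoint of y1 and y2
```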

3) Assume again that we made two independent observations y1 and y2 of y, but this time not only the expected value but also the variance of y is unknown, call it σ². This gives a larger family of density functions to choose from: they do not only differ by location, but some are low and fat and others tall and skinny.

For which density function is the product of the ordinates over y1 and y2 the largest again? Before even knowing our estimate of σ² we can already tell what µ̃ is: it must again be (y1 + y2)/2. Then among those density functions which are centered





[Figure 2. Two observations, σ² = 1 — plot not reproduced.]

[Figure 3. Two observations, σ² unknown — plot not reproduced.]

over (y1 + y2 )/2, there is one which is highest over y1 and y2 . Figure 4 shows the

densities for standard deviations 0.01, 0.05, 0.1, 0.5, 1, and 5. All curves, except

the last one, are truncated at the point where the resolution of TEX can no longer

distinguish between their level and zero. For the last curve this point would only be

reached at the coordinates ±25.

4) If we have many observations, then the density pattern of the observations,

as indicated by the histogram below, approximates the actual density function of y

itself. That likelihood function must be chosen which has a high value where the

points are dense, and which has a low value where the points are not so dense.






[Figure 4. Only those centered over the two observations need to be considered — plot not reproduced.]

[Figure 5. Many Observations — plot not reproduced.]

12.2.1. Precision of the Estimator. How good is ȳ as estimate of µ? To answer this question we need some criterion how to measure “goodness.” Assume your business depends on the precision of the estimate µ̂ of µ. It incurs a penalty (extra cost) amounting to (µ̂ − µ)². You don’t know what this error will be beforehand, but the expected value of this “loss function” may be an indication how good the estimate is. Generally, the expected value of a loss function is called the “risk,” and for the quadratic loss function E[(µ̂ − µ)²] it has the name “mean squared error of µ̂ as an estimate of µ,” write it MSE[µ̂; µ]. What is the mean squared error of ȳ? Since E[ȳ] = µ, it is E[(ȳ − E[ȳ])²] = var[ȳ] = σ²/n.
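The claim MSE[ȳ; µ] = σ²/n can be checked by simulation. The following sketch uses parameter values of our own choosing:

```python
# Monte Carlo check (illustrative): the MSE of the sample mean is sigma^2 / n.
import random
random.seed(0)

mu, sigma, n, reps = 10.0, 2.0, 5, 200_000
se = 0.0
for _ in range(reps):
    ybar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    se += (ybar - mu) ** 2
print(se / reps, sigma**2 / n)  # both close to 0.8
```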

Note that the MSE of ȳ as an estimate of µ does not depend on µ. This is convenient, since usually the MSE depends on unknown parameters, and therefore one usually does not know how good the estimator is. But it has more important advantages. For any estimator ỹ of µ follows MSE[ỹ; µ] = var[ỹ] + (E[ỹ] − µ)². If ỹ is linear (perhaps with a constant term), then var[ỹ] is a constant which does not depend on µ; therefore the MSE is a constant if ỹ is unbiased, and a quadratic function of µ (a parabola) if ỹ is biased. Since a parabola is an unbounded function, a biased linear estimator has the disadvantage that for certain values of µ its MSE may be very high. Some estimators are very good when µ is in one area, and very bad when µ is in another area. Since our unbiased estimator ȳ has bounded MSE, it will not let us down, wherever nature has hidden the µ.

On the other hand, the MSE does depend on the unknown σ². So we have to estimate σ².
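The decomposition MSE[ỹ; µ] = var[ỹ] + (E[ỹ] − µ)² can also be verified numerically. The shrunk estimator 0.5·ȳ below is our own example of a biased linear estimator, not one taken from the text:

```python
# Illustrative check of MSE[t; mu] = var[t] + (E[t] - mu)^2, using the
# (hypothetical) biased linear estimator t = 0.5 * ybar.
import random
random.seed(1)

mu, sigma, n, reps = 4.0, 1.0, 10, 100_000
ts = []
for _ in range(reps):
    ybar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    ts.append(0.5 * ybar)

mean_t = sum(ts) / reps
var_t = sum((t - mean_t) ** 2 for t in ts) / reps
mse = sum((t - mu) ** 2 for t in ts) / reps
# The empirical MSE equals empirical variance plus squared empirical bias.
print(mse, var_t + (mean_t - mu) ** 2)
```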






12.3. Variance Estimation and Degrees of Freedom

It is not so clear what the best estimator of σ² is. At least two possibilities are in common use:

(12.3.1)    s²_m = (1/n) ∑_{i=1}^n (y_i − ȳ)²     or
(12.3.2)    s²_u = (1/(n−1)) ∑_{i=1}^n (y_i − ȳ)².

Let us compute the expected value of our two estimators. Equation (12.1.1) with α = E[y] allows us to simplify the sum of squared errors so that it becomes easy to take expected values:

(12.3.3)    E[∑_{i=1}^n (y_i − ȳ)²] = ∑_{i=1}^n E[(y_i − µ)²] − n E[(ȳ − µ)²]
(12.3.4)                           = ∑_{i=1}^n σ² − n (σ²/n) = (n − 1)σ²





because E[(y_i − µ)²] = var[y_i] = σ² and E[(ȳ − µ)²] = var[ȳ] = σ²/n. Therefore, if we use as estimator of σ² the quantity

(12.3.5)    s²_u = (1/(n−1)) ∑_{i=1}^n (y_i − ȳ)²

then this is an unbiased estimate.
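This can be seen in a simulation (parameter values below are our own choices): the divisor-(n−1) estimator s²_u is centered on σ², while the divisor-n version s²_m underestimates it by the factor (n−1)/n.

```python
# Simulation (illustrative): E[s_u^2] = sigma^2, E[s_m^2] = (n-1)/n * sigma^2.
import random
random.seed(2)

sigma2, n, reps = 9.0, 4, 200_000
su_sum = sm_sum = 0.0
for _ in range(reps):
    y = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    ybar = sum(y) / n
    sse = sum((yi - ybar) ** 2 for yi in y)
    su_sum += sse / (n - 1)
    sm_sum += sse / n
print(su_sum / reps, sm_sum / reps)  # approx 9.0 and 6.75
```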

Problem 186. 4 points Show that

(12.3.6)    s²_u = (1/(n−1)) ∑_{i=1}^n (y_i − ȳ)²

is an unbiased estimator of the variance. List the assumptions which have to be made about y_i so that this proof goes through. Do you need Normality of the individual observations y_i to prove this?

Answer. Use equation (12.1.1) with α = E[y]:

(12.3.7)    E[∑_{i=1}^n (y_i − ȳ)²] = ∑_{i=1}^n E[(y_i − µ)²] − n E[(ȳ − µ)²]
(12.3.8)                           = ∑_{i=1}^n σ² − n (σ²/n) = (n − 1)σ².

You do not need Normality for this.



For testing, confidence intervals, etc., one also needs to know the probability distribution of s²_u. For this look up once more Section 5.9 about the Chi-Square distribution. There we introduced the terminology that a random variable q is distributed as a σ²χ² iff q/σ² is a χ². In our model with n independent normal variables y_i with same mean and variance, the variable ∑_{i=1}^n (y_i − ȳ)² is a σ²χ²_{n−1}. Problem 187 gives a proof of this in the simplest case n = 2, and Problem 188 looks at the case n = 3. But it is valid for higher n too. Therefore s²_u is a (σ²/(n−1))χ²_{n−1}. This is remarkable: the distribution of s²_u does not depend on µ. Now use (5.9.5) to get the variance of s²_u: it is 2σ⁴/(n−1).
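A quick Monte Carlo sketch (our own, under the Normality assumption, with sample size chosen by us) of the claim var[s²_u] = 2σ⁴/(n − 1):

```python
# Monte Carlo check (illustrative): under Normality with sigma^2 = 1 and
# n = 6, var[s_u^2] should be 2 * sigma^4 / (n - 1) = 0.4.
import random
random.seed(3)

sigma2, n, reps = 1.0, 6, 200_000
vals = []
for _ in range(reps):
    y = [random.gauss(0.0, 1.0) for _ in range(n)]
    ybar = sum(y) / n
    vals.append(sum((yi - ybar) ** 2 for yi in y) / (n - 1))

m = sum(vals) / reps                       # approx sigma^2 = 1.0
v = sum((x - m) ** 2 for x in vals) / reps  # approx 2 sigma^4 / (n-1) = 0.4
print(m, v)
```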

Problem 187. Let y₁ and y₂ be two independent Normally distributed variables with mean µ and variance σ², and let ȳ be their arithmetic mean.

• a. 2 points Show that

(12.3.9)    SSE = ∑_{i=1}^2 (y_i − ȳ)² ∼ σ²χ²₁

Hint: Find a Normally distributed random variable z with expected value 0 and variance 1 such that SSE = σ²z².





Answer.

(12.3.10)    ȳ = (y₁ + y₂)/2
(12.3.11)    y₁ − ȳ = (y₁ − y₂)/2
(12.3.12)    y₂ − ȳ = −(y₁ − y₂)/2
(12.3.13)    (y₁ − ȳ)² + (y₂ − ȳ)² = (y₁ − y₂)²/4 + (y₁ − y₂)²/4 = (y₁ − y₂)²/2
(12.3.14)                         = σ² ((y₁ − y₂)/√(2σ²))²

and since z = (y₁ − y₂)/√(2σ²) ∼ N(0, 1), its square is a χ²₁.



• b. 4 points Write down the covariance matrix of the vector

(12.3.15)    [ y₁ − ȳ ]
             [ y₂ − ȳ ]

and show that it is singular.

Answer. (12.3.11) and (12.3.12) give

(12.3.16)    [ y₁ − ȳ ]   [  1/2  −1/2 ] [ y₁ ]
             [ y₂ − ȳ ] = [ −1/2   1/2 ] [ y₂ ] = D y
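A numerical sketch of the singularity (our own illustration): with cov[y] = σ²I, the covariance matrix of Dy is σ²DDᵀ; since D is symmetric and idempotent this equals σ²D, whose determinant is zero.

```python
# Illustrative check: cov[D y] = sigma^2 * D * D^T when cov[y] = sigma^2 * I.
# D is symmetric idempotent, so D D^T = D, and det(sigma^2 * D) = 0.
sigma2 = 2.0  # any positive value works
D = [[0.5, -0.5], [-0.5, 0.5]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

ddt = matmul(D, D)  # D is symmetric, so D^T = D; the product equals D again
cov = [[sigma2 * ddt[i][j] for j in range(2)] for i in range(2)]
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
print(cov)  # [[1.0, -1.0], [-1.0, 1.0]]
print(det)  # 0.0
```

The determinant vanishes because the two components always sum to zero, so the vector lives on a one-dimensional subspace.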


