34. ASYMPTOTIC PROPERTIES OF OLS
There are two examples where this is not the case. First look at the model $y_t = \alpha + \beta t + \varepsilon_t$. Here

$$X = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ \vdots & \vdots \\ 1 & n \end{pmatrix}. \qquad \text{Therefore} \qquad X^\top X = \begin{pmatrix} 1+1+\cdots+1 & 1+2+3+\cdots+n \\ 1+2+3+\cdots+n & 1+4+9+\cdots+n^2 \end{pmatrix} = \begin{pmatrix} n & n(n+1)/2 \\ n(n+1)/2 & n(n+1)(2n+1)/6 \end{pmatrix},$$

and

$$\frac{1}{n} X^\top X \to \begin{pmatrix} 1 & \infty \\ \infty & \infty \end{pmatrix}.$$

Here the assumption (34.0.5) does not hold, but one can still prove consistency and asymptotic normality; the estimators converge even faster than in the usual case.
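As a numerical sanity check (my own illustration, not part of the text), the closed-form entries of $X^\top X$ for the trend model can be verified in Python; the sample size n = 1000 is an arbitrary choice:

```python
import numpy as np

# Trend model y_t = alpha + beta*t + eps_t: the rows of X are (1, t), t = 1..n.
n = 1000
t = np.arange(1, n + 1)
X = np.column_stack([np.ones(n), t])
XtX = X.T @ X

# Closed-form entries: sum t = n(n+1)/2 and sum t^2 = n(n+1)(2n+1)/6.
expected = np.array([[n, n * (n + 1) / 2],
                     [n * (n + 1) / 2, n * (n + 1) * (2 * n + 1) / 6]])
assert np.allclose(XtX, expected)

# (1/n) X'X does not converge: all entries except the (1,1) entry grow with n.
print(XtX / n)
```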
The other example is the model $y_t = \alpha + \beta\lambda^t + \varepsilon_t$ with a known $\lambda$ satisfying $-1 < \lambda < 1$. Here

$$X^\top X = \begin{pmatrix} 1+1+\cdots+1 & \lambda+\lambda^2+\cdots+\lambda^n \\ \lambda+\lambda^2+\cdots+\lambda^n & \lambda^2+\lambda^4+\cdots+\lambda^{2n} \end{pmatrix} = \begin{pmatrix} n & (\lambda-\lambda^{n+1})/(1-\lambda) \\ (\lambda-\lambda^{n+1})/(1-\lambda) & (\lambda^2-\lambda^{2n+2})/(1-\lambda^2) \end{pmatrix}.$$

Therefore

$$\frac{1}{n} X^\top X \to \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},$$

which is singular. In this case, a consistent estimate of $\lambda$ does not exist: future observations depend on $\lambda$ so little that even with infinitely many observations there is not enough information to get the precise value of $\lambda$.
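The singular limit is easy to see numerically (a sketch of my own; the value $\lambda = 0.5$ is an arbitrary choice for illustration):

```python
import numpy as np

# Model y_t = alpha + beta*lambda**t: the columns of X are (1, lambda**t), t = 1..n.
lam = 0.5
for n in (10, 100, 1000):
    t = np.arange(1, n + 1)
    X = np.column_stack([np.ones(n), lam ** t])
    Q_n = X.T @ X / n
    print(n, Q_n)  # approaches [[1, 0], [0, 0]], a singular matrix
```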
We will show that under assumption (34.0.5), $\hat\beta$ and $s^2$ are consistent. However, this assumption is really too strong for consistency. A weaker set of assumptions is the Grenander conditions; see [Gre97, p. 275]. To write down the Grenander conditions, remember that presently $X$ depends on $n$ (in that we only look at the first $n$ elements of $y$ and the first $n$ rows of $X$); therefore the column vectors $x_j$ also depend on $n$ (although we are not indicating this here). Therefore $x_j^\top x_j$ depends on $n$ as well, and we will make this dependency explicit by writing $x_j^\top x_j = d_{nj}^2$. Then the first Grenander condition is $\lim_{n\to\infty} d_{nj}^2 = +\infty$ for all $j$. Second: for all $j$, $\lim_{n\to\infty} \max_{i=1,\dots,n} x_{ij}^2 / d_{nj}^2 = 0$ (there is a typo in Greene; he leaves the max out). Third: the sample correlation matrix of the columns of $X$ other than the constant term converges to a nonsingular matrix.
Consistency means that the estimates converge in probability towards the true value. For $\hat\beta$ this can be written as $\operatorname{plim}_{n\to\infty} \hat\beta_n = \beta$. This means by definition that for all $\varepsilon > 0$, $\lim_{n\to\infty} \Pr[|\hat\beta_n - \beta| \le \varepsilon] = 1$.

The probability limit is one of several concepts of limits used in probability theory. We will need the following properties of the plim here:
(1) For nonrandom magnitudes, the probability limit is equal to the ordinary limit.
(2) It satisfies the Slutsky theorem, that for a continuous function $g$,
(34.0.6)    $\operatorname{plim} g(z) = g(\operatorname{plim}(z)).$
(3) If the MSE-matrix of an estimator converges towards the null matrix, then the estimator is consistent.
(4) Khinchine's theorem: the sample mean of an i.i.d. distribution is a consistent estimate of the population mean, even if the distribution does not have a population variance.
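Property (4) can be illustrated by simulation (my own sketch, not from the text). The $t$-distribution with 2 degrees of freedom is chosen because it has a population mean (zero) but no finite variance; the seed and sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Student-t with 2 degrees of freedom: the mean exists (= 0), the variance is infinite.
# Khinchine's theorem still guarantees the sample mean is consistent for that mean.
means = {n: rng.standard_t(df=2, size=n).mean() for n in (10**3, 10**5, 10**7)}
for n, m in means.items():
    print(n, m)
```

Despite the infinite variance, the sample means shrink towards zero as $n$ grows.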
34.1. Consistency of the OLS estimator
For the proof of consistency of the OLS estimators $\hat\beta$ and $s^2$ we need the following result:

(34.1.1)    $\operatorname{plim} \frac{1}{n} X^\top \varepsilon = o.$

I.e., the true $\varepsilon$ is asymptotically orthogonal to all columns of $X$. This follows immediately from $\operatorname{MSE}[o; X^\top\varepsilon/n] = \mathcal{E}[X^\top\varepsilon\varepsilon^\top X/n^2] = \sigma^2 X^\top X/n^2$, which converges towards $O$: by (34.0.5), $X^\top X/n$ converges to $Q$, while the remaining factor $1/n$ goes to zero.
In order to prove consistency of $\hat\beta$ and $s^2$, transform the formulas for $\hat\beta$ and $s^2$ in such a way that they are written as continuous functions of terms each of which converges for $n \to \infty$, and then apply Slutsky's theorem. Write $\hat\beta$ as

(34.1.2)    $\hat\beta = \beta + (X^\top X)^{-1} X^\top \varepsilon = \beta + \Bigl(\frac{X^\top X}{n}\Bigr)^{-1} \frac{X^\top \varepsilon}{n}$

(34.1.3)    $\operatorname{plim} \hat\beta = \beta + \lim \Bigl(\frac{X^\top X}{n}\Bigr)^{-1} \operatorname{plim} \frac{X^\top \varepsilon}{n}$

(34.1.4)    $\phantom{\operatorname{plim} \hat\beta} = \beta + Q^{-1} o = \beta.$
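The argument above can be watched at work in a small simulation (my own sketch; the coefficient values, error scale, and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([2.0, -1.0])  # hypothetical true coefficients

for n in (100, 10_000, 1_000_000):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    eps = rng.normal(scale=3.0, size=n)
    y = X @ beta + eps
    # OLS estimate: as n grows, beta_hat approaches the true beta.
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(n, beta_hat)
```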
Let us look at the geometry of this when there is only one explanatory variable. The specification is therefore $y = x\beta + \varepsilon$. The assumption is that $\varepsilon$ is asymptotically orthogonal to $x$. In small samples, it only happens by sheer accident with probability 0 that $\varepsilon$ is orthogonal to $x$; only $\hat\varepsilon$ is. But now let us assume the sample grows larger, i.e., the vectors $y$ and $x$ become very high-dimensional observation vectors, i.e., we are drawing here a two-dimensional subspace out of a very high-dimensional space. As more and more data are added, the observation vectors also become longer and longer. But if we divide each vector by $\sqrt{n}$, then the lengths of these normalized vectors stabilize. The squared length of the vector $\varepsilon/\sqrt{n}$ has the plim $\sigma^2$. Furthermore, assumption (34.0.5) means in our case that $\operatorname{plim}_{n\to\infty} \frac{1}{n} x^\top x$ exists and is nonsingular. This is the squared length of $\frac{1}{\sqrt{n}} x$. I.e., if we normalize the vectors by dividing them by $\sqrt{n}$, then they do not get longer but converge towards a finite length. And the result (34.1.1) $\operatorname{plim} \frac{1}{n} x^\top \varepsilon = 0$ means now that with this normalization, $\varepsilon/\sqrt{n}$ becomes more and more orthogonal to $x/\sqrt{n}$. I.e., if $n$ is large enough, asymptotically not only $\hat\varepsilon$ but also the true $\varepsilon$ is orthogonal to $x$, and this means that asymptotically $\hat\beta$ converges towards the true $\beta$.
For the proof of consistency of $s^2$ we need, among others, that $\operatorname{plim} \frac{\varepsilon^\top\varepsilon}{n} = \sigma^2$, which is a consequence of Khinchine's theorem. Since $\hat\varepsilon^\top\hat\varepsilon = \varepsilon^\top M \varepsilon$ it follows

$$\frac{\hat\varepsilon^\top\hat\varepsilon}{n-k} = \frac{n}{n-k}\,\varepsilon^\top\Bigl(\frac{I}{n} - \frac{X}{n}\Bigl(\frac{X^\top X}{n}\Bigr)^{-1}\frac{X^\top}{n}\Bigr)\varepsilon = \frac{n}{n-k}\Bigl(\frac{\varepsilon^\top\varepsilon}{n} - \frac{\varepsilon^\top X}{n}\Bigl(\frac{X^\top X}{n}\Bigr)^{-1}\frac{X^\top\varepsilon}{n}\Bigr) \to 1\cdot\sigma^2 - o^\top Q^{-1} o = \sigma^2.$$
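A simulation makes the convergence of $s^2$ concrete (my own sketch; the true variance $\sigma^2 = 4$, the coefficients, and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2 = 4.0  # hypothetical true error variance
beta = np.array([1.0, 0.5])

for n in (50, 5_000, 500_000):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    # s^2 = eps_hat' eps_hat / (n - k): approaches sigma^2 as n grows.
    s2 = resid @ resid / (n - X.shape[1])
    print(n, s2)
```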
34.2. Asymptotic Normality of the Least Squares Estimator
To show asymptotic normality of an estimator, multiply the sampling error by $\sqrt{n}$, so that the variance is stabilized.

We have seen $\operatorname{plim} \frac{1}{n} X^\top \varepsilon = o$. Now look at $\frac{1}{\sqrt{n}} X^\top \varepsilon_n$. Its mean is $o$ and its covariance matrix is $\sigma^2 \frac{X^\top X}{n}$. The shape of the distribution, due to a variant of the Central Limit Theorem, is asymptotically normal: $\frac{1}{\sqrt{n}} X^\top \varepsilon_n \to N(o, \sigma^2 Q)$. (Here the convergence is convergence in distribution.)

We can write $\sqrt{n}(\hat\beta_n - \beta) = \bigl(\frac{X^\top X}{n}\bigr)^{-1}\bigl(\frac{1}{\sqrt{n}} X^\top \varepsilon_n\bigr)$. Therefore its limiting covariance matrix is $Q^{-1} \sigma^2 Q Q^{-1} = \sigma^2 Q^{-1}$; therefore $\sqrt{n}(\hat\beta_n - \beta) \to N(o, \sigma^2 Q^{-1})$ in distribution. One can also say: the asymptotic distribution of $\hat\beta$ is $N(\beta, \sigma^2 (X^\top X)^{-1})$.
From this follows $\sqrt{n}(R\hat\beta_n - R\beta) \to N(o, \sigma^2 R Q^{-1} R^\top)$, and therefore

(34.2.1)    $n(R\hat\beta_n - R\beta)^\top \bigl(R Q^{-1} R^\top\bigr)^{-1} (R\hat\beta_n - R\beta) \to \sigma^2 \chi^2_i,$

where $i$ is the number of rows of $R$. Divide by $s^2$ and replace in the limiting case $Q$ by $X^\top X/n$ and $s^2$ by $\sigma^2$ to get

(34.2.2)    $\frac{(R\hat\beta_n - R\beta)^\top \bigl(R(X^\top X)^{-1} R^\top\bigr)^{-1} (R\hat\beta_n - R\beta)}{s^2} \to \chi^2_i$

in distribution. All this is not a proof; the point is that in the denominator the distribution is divided by the increasingly bigger number $n-k$, while in the numerator it is divided by the constant $i$; therefore asymptotically the denominator can be considered 1.

The central limit theorems only say that for $n \to \infty$ these converge towards the $\chi^2$, which is asymptotically equal to the $F$ distribution. It is easily possible that before one gets to the limit, the $F$-distribution is better.
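The limiting $\chi^2_i$ behavior of the statistic (34.2.2) can be sketched by Monte Carlo (my own illustration, not from the text; the sample size, number of replications, restriction matrix $R$, and the deliberately non-normal $t(5)$ errors are all arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 2000
beta = np.array([1.0, 2.0, -1.0])
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])  # i = 2 restrictions: both slopes
i = R.shape[0]

stats = []
for _ in range(reps):
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ beta + rng.standard_t(df=5, size=n)  # non-normal errors on purpose
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - 3)
    d = R @ (beta_hat - beta)
    # Statistic (34.2.2): asymptotically chi^2 with i degrees of freedom.
    stats.append(d @ np.linalg.inv(R @ XtX_inv @ R.T) @ d / s2)

stats = np.array(stats)
print(stats.mean(), "vs. chi^2 mean", i)
```

Even with non-normal disturbances, the Monte Carlo mean of the statistic is close to $i$, the mean of the $\chi^2_i$ distribution.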
Problem 393. Are the residuals $y - X\hat\beta$ asymptotically normally distributed?

Answer. Only if the disturbances are normal, otherwise of course not! We can show that $\sqrt{n}(\hat\varepsilon - \varepsilon) = -\sqrt{n}\, X(\hat\beta - \beta) \sim N(o, \sigma^2 X Q^{-1} X^\top)$ asymptotically.
Now these results also go through if one has stochastic regressors. [Gre97, 6.7.7] shows that the above condition (34.0.5), with the lim replaced by plim, holds if the $x_i$ and $\varepsilon_i$ are an i.i.d. sequence of random variables.
Problem 394. 2 points In the regression model with random regressors $y = X\beta + \varepsilon$, you only know that $\operatorname{plim} \frac{1}{n} X^\top X = Q$ is a nonsingular matrix, and $\operatorname{plim} \frac{1}{n} X^\top \varepsilon = o$. Using these two conditions, show that the OLS estimate is consistent.

Answer. $\hat\beta = (X^\top X)^{-1} X^\top y = \beta + (X^\top X)^{-1} X^\top \varepsilon$ due to (24.0.7), and

$$\operatorname{plim}(X^\top X)^{-1} X^\top \varepsilon = \operatorname{plim}\Bigl(\Bigl(\frac{X^\top X}{n}\Bigr)^{-1} \frac{X^\top \varepsilon}{n}\Bigr) = Q^{-1} o = o.$$
CHAPTER 35
Least Squares as the Normal Maximum Likelihood Estimate
Now assume $\varepsilon$ is multivariate normal. We will show that in this case the OLS estimator $\hat\beta$ is at the same time the Maximum Likelihood Estimator. For this we need to write down the density function of $y$. First look at one $y_t$, which is $y_t \sim N(x_t^\top \beta, \sigma^2)$, where

$$X = \begin{pmatrix} x_1^\top \\ \vdots \\ x_n^\top \end{pmatrix},$$

i.e., $x_t^\top$ is the $t$th row of $X$. $x_t$ itself is written as a column vector, since we follow the "column vector convention." The (marginal) density function for this one observation is
(35.0.3)    $f_{y_t}(y_t) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(y_t - x_t^\top \beta)^2 / 2\sigma^2}.$

Since the $y_t$ are stochastically independent, their joint density function is the product, which can be written as

(35.0.4)    $f_y(y) = (2\pi\sigma^2)^{-n/2} \exp\Bigl(-\frac{1}{2\sigma^2}(y - X\beta)^\top (y - X\beta)\Bigr).$
To compute the maximum likelihood estimator, it is advantageous to start with the log likelihood function:

(35.0.5)    $\log f_y(y; \beta, \sigma^2) = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(y - X\beta)^\top (y - X\beta).$
Assume for a moment that $\sigma^2$ is known. Then the MLE of $\beta$ is clearly equal to the OLS $\hat\beta$: maximizing (35.0.5) over $\beta$ amounts to minimizing the sum of squares $(y - X\beta)^\top (y - X\beta)$. Since $\hat\beta$ does not depend on $\sigma^2$, it is also the maximum likelihood estimate when $\sigma^2$ is unknown. $\hat\beta$ is a linear function of $y$. Linear transformations of normal variables are normal. Normal distributions are characterized by their mean vector and covariance matrix. The distribution of the MLE of $\beta$ is therefore $\hat\beta \sim N(\beta, \sigma^2 (X^\top X)^{-1}).$
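This equivalence can be checked numerically (my own sketch; the data-generating values and seed are arbitrary). The gradient of the log likelihood (35.0.5) with respect to $\beta$ is $X^\top(y - X\beta)/\sigma^2$, which vanishes exactly at the solution of the normal equations, i.e., at the OLS estimate:

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma2 = 200, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

def loglik(b, s2=sigma2):
    # Log likelihood (35.0.5) for known sigma^2.
    r = y - X @ b
    return -0.5 * (n * np.log(2 * np.pi * s2) + r @ r / s2)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient of the log likelihood w.r.t. beta is X'(y - X b)/sigma^2;
# at the OLS solution it is numerically zero (the normal equations).
grad = X.T @ (y - X @ beta_hat) / sigma2
print(grad)

# Perturbing beta_hat in any direction lowers the likelihood.
for delta in (np.array([0.1, 0.0]), np.array([0.0, -0.1])):
    assert loglik(beta_hat + delta) < loglik(beta_hat)
```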