7. CHEBYSHEV INEQUALITY, WEAK LAW OF LARGE NUMBERS, AND CENTRAL LIMIT THEOREM
to.”) One does not need to know the full distribution of y for that, only its expected
value and standard deviation. We will give here a proof only if y has a discrete
distribution, but the inequality is valid in general. Going over to the standardized
variable z = (y − µ)/σ we have to show Pr[|z|≥k] ≤ 1/k². Assuming z assumes the
values z_1, z_2, ... with probabilities p(z_1), p(z_2), ..., then

(7.1.2)   Pr[|z|≥k] = Σ_{i : |z_i|≥k} p(z_i).
Now multiply by k²:

(7.1.3)   k² Pr[|z|≥k] = Σ_{i : |z_i|≥k} k² p(z_i)

(7.1.4)   ≤ Σ_{i : |z_i|≥k} z_i² p(z_i)

(7.1.5)   ≤ Σ_{all i} z_i² p(z_i) = var[z] = 1.
The Chebyshev inequality is sharp for all k ≥ 1. Proof: the random variable
which takes the value −k with probability 1/(2k²) and the value +k with probability
1/(2k²), and 0 with probability 1 − 1/k², has expected value 0 and variance 1, and
the ≤-sign in (7.1.1) becomes an equal sign.
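As a numerical sanity check (this code is mine, not part of the text), one can verify that this three-point distribution has mean 0, variance 1, and tail probability exactly equal to the Chebyshev bound 1/k²:

```python
# Sanity check of the distribution that makes Chebyshev sharp:
# values -k, 0, +k with probabilities 1/(2k^2), 1 - 1/k^2, 1/(2k^2).
def check_sharpness(k):
    vals = [-k, 0.0, k]
    probs = [1 / (2 * k**2), 1 - 1 / k**2, 1 / (2 * k**2)]
    mean = sum(v * p for v, p in zip(vals, probs))
    var = sum((v - mean)**2 * p for v, p in zip(vals, probs))
    tail = sum(p for v, p in zip(vals, probs) if abs(v) >= k)
    return mean, var, tail

for k in (1.0, 2.0, 3.0):
    mean, var, tail = check_sharpness(k)
    assert abs(mean) < 1e-12 and abs(var - 1.0) < 1e-12
    assert abs(tail - 1 / k**2) < 1e-12   # the Chebyshev bound is attained
```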
Problem 115. [HT83, p. 316] Let y be the number of successes in n trials of a
Bernoulli experiment with success probability p. Show that

(7.1.6)   Pr[|y/n − p| < ε] ≥ 1 − 1/(4nε²).

Hint: first compute what Chebyshev will tell you about the lefthand side, and then
you will need still another inequality.
Answer. E[y/n] = p and var[y/n] = pq/n (where q = 1 − p). Chebyshev says therefore

(7.1.7)   Pr[|y/n − p| ≥ k√(pq/n)] ≤ 1/k².

Setting ε = k√(pq/n), therefore 1/k² = pq/(nε²), one can rewrite (7.1.7) as

(7.1.8)   Pr[|y/n − p| ≥ ε] ≤ pq/(nε²).

Now note that pq ≤ 1/4 whatever their values are.
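The inequality (7.1.6) can also be checked by exact binomial computation; the following sketch (with illustrative choices of n, p, and ε, which are mine) is not part of the text:

```python
from math import comb

# Exact left-hand side Pr[|y/n - p| < eps] for y ~ Binomial(n, p)
def lhs(n, p, eps):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if abs(k / n - p) < eps)

# The bound 1 - 1/(4 n eps^2) from (7.1.6) must hold in every case
for n, p, eps in [(50, 0.3, 0.1), (100, 0.5, 0.08), (200, 0.7, 0.05)]:
    assert lhs(n, p, eps) >= 1 - 1 / (4 * n * eps**2)
```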
Problem 116. 2 points For a standard normal variable, Pr[|z|≥1] is approximately 1/3; please look up the precise value in a table. What does the Chebyshev inequality say about this probability? Also, Pr[|z|≥2] is approximately 5%; again look up the precise value. What does Chebyshev say?
Answer. Pr[|z|≥1] = 0.3174, while the Chebyshev inequality says that Pr[|z|≥1] ≤ 1. Also, Pr[|z|≥2] = 0.0456, while Chebyshev says it is ≤ 0.25.
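Instead of a table, the exact normal tail probabilities can be computed from the error function; this snippet is an illustration of mine, not from the text:

```python
from math import erf, sqrt

def two_sided_tail(k):
    """Pr[|z| >= k] for a standard normal z, via the error function."""
    return 2 * (1 - 0.5 * (1 + erf(k / sqrt(2))))

assert abs(two_sided_tail(1) - 0.3173) < 5e-4   # table value ~0.3174
assert abs(two_sided_tail(2) - 0.0455) < 5e-4   # table value ~0.0456
# The Chebyshev bounds 1 and 0.25 hold but are far looser
assert two_sided_tail(1) <= 1 and two_sided_tail(2) <= 0.25
```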
7.2. The Probability Limit and the Law of Large Numbers
Let y_1, y_2, y_3, ... be a sequence of independent random variables all of which
have the same expected value µ and variance σ². Then ȳ_n = (1/n) Σ_{i=1}^n y_i has
expected value µ and variance σ²/n. I.e., its probability mass is clustered much
more closely around the value µ than that of the individual y_i. To make this
statement more precise we
need a concept of convergence of random variables. It is not possible to define it in
the “obvious” way that the sequence of random variables y n converges toward y if
every realization of them converges, since it is possible, although extremely unlikely,
that e.g. all throws of a coin show heads ad infinitum, or follow another sequence
for which the average number of heads does not converge towards 1/2. Therefore we
will use the following definition:
The sequence of random variables y 1 , y 2 , . . . converges in probability to another
random variable y if and only if for every δ > 0
(7.2.1)   lim_{n→∞} Pr[|y_n − y| ≥ δ] = 0.
One can also say that the probability limit of y_n is y, in formulas

(7.2.2)   plim_{n→∞} y_n = y.

In many applications, the limiting variable y is a degenerate random variable, i.e., it
is a constant.
The Weak Law of Large Numbers says that, if the expected value exists, then the
probability limit of the sample means of an ever increasing sample is the expected
value, i.e., plim_{n→∞} ȳ_n = µ.
Problem 117. 5 points Assuming that not only the expected value but also the
variance exists, derive the Weak Law of Large Numbers, which can be written as

(7.2.3)   lim_{n→∞} Pr[|ȳ_n − E[y]| ≥ δ] = 0 for all δ > 0,

from the Chebyshev inequality

(7.2.4)   Pr[|x − µ| ≥ kσ] ≤ 1/k²

where µ = E[x] and σ² = var[x].
Answer. From nonnegativity of probability and the Chebyshev inequality for x = ȳ_n follows
0 ≤ Pr[|ȳ_n − µ| ≥ kσ/√n] ≤ 1/k² for all k. Set k = δ√n/σ to get 0 ≤ Pr[|ȳ_n − µ| ≥ δ] ≤ σ²/(nδ²). For any fixed
δ > 0, the upper bound converges towards zero as n → ∞, and the lower bound is zero, therefore
the probability itself also converges towards zero.
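A small Monte Carlo illustration of this convergence (the uniform distribution, δ, and the repetition counts below are my arbitrary choices):

```python
import random

random.seed(0)

# Fraction of sample means that land at least delta away from mu = 0.5,
# for averages of n independent uniform(0, 1) draws
def freq_outside(n, delta=0.05, reps=2000):
    count = 0
    for _ in range(reps):
        ybar = sum(random.random() for _ in range(n)) / n
        if abs(ybar - 0.5) >= delta:
            count += 1
    return count / reps

# The deviation probability shrinks as n grows, as the Weak Law predicts
assert freq_outside(10) > freq_outside(1000)
```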
Problem 118. 4 points Let y_1, ..., y_n be a sample from some unknown probability
distribution, with sample mean ȳ = (1/n) Σ_{i=1}^n y_i and sample variance
s² = (1/n) Σ_{i=1}^n (y_i − ȳ)². Show that the data satisfy the following "sample
equivalent" of the Chebyshev inequality: if k is any fixed positive number, and m is
the number of observations y_j which satisfy |y_j − ȳ| ≥ ks, then m ≤ n/k². In symbols,

(7.2.5)   #{y_i : |y_i − ȳ| ≥ ks} ≤ n/k².

Hint: apply the usual Chebyshev inequality to the so-called empirical distribution of
the sample. The empirical distribution is a discrete probability distribution defined
by Pr[y = y_i] = k/n, when the number y_i appears k times in the sample. (If all y_i are
different, then all probabilities are 1/n.) The empirical distribution corresponds to
the experiment of randomly picking one observation out of the given sample.
Answer. The only thing to note is: the sample mean is the expected value in that empirical
distribution, the sample variance is the variance, and the relative number m/n is the probability:

(7.2.6)   #{y_i : y_i ∈ S} = n Pr[S]
• a. 3 points What happens to this result when the distribution from which the
y_i are taken does not have an expected value or a variance?
Answer. The result still holds but ȳ and s² do not converge as the number of observations
increases.
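The sample version of the inequality can be verified on any data set; the numbers below are an arbitrary illustration of mine:

```python
from math import sqrt

data = [2.0, 3.5, 3.6, 4.1, 9.9, 1.2, 3.3, 3.4, 3.8, 15.0]
n = len(data)
ybar = sum(data) / n
# sample variance with divisor n, as in Problem 118
s = sqrt(sum((y - ybar)**2 for y in data) / n)

for k in (1.5, 2.0, 3.0):
    m = sum(1 for y in data if abs(y - ybar) >= k * s)
    assert m <= n / k**2   # the sample Chebyshev inequality (7.2.5)
```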
7.3. Central Limit Theorem
Assume all y_i are independent and have the same distribution with mean µ,
variance σ², and also a moment generating function. Again, let ȳ_n be the sample
mean of the first n observations. The central limit theorem says that the probability
distribution of

(7.3.1)   (ȳ_n − µ)/(σ/√n)

converges to a N(0, 1). This is a different concept of convergence than the probability
limit; it is convergence in distribution.
Problem 119. 1 point Construct a sequence of random variables y 1 , y 2 . . . with
the following property: their cumulative distribution functions converge to the cumulative distribution function of a standard normal, but the random variables themselves
do not converge in probability. (This is easy!)
Answer. One example would be: all y i are independent standard normal variables.
Why do we have the funny expression (ȳ_n − µ)/(σ/√n)? Because this is the standardized
version of ȳ_n. We know from the law of large numbers that the distribution of
ȳ_n becomes more and more concentrated around µ. If we standardize the sample
averages ȳ_n, we compensate for this concentration. The central limit theorem tells
us therefore what happens to the shape of the cumulative distribution function of ȳ_n.
If we disregard the fact that it becomes more and more concentrated (by multiplying
it by a factor which is chosen such that the variance remains constant), then we see
that its geometric shape comes closer and closer to a normal distribution.
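To see this convergence in distribution numerically, one can standardize sample means of a deliberately skewed distribution; the exponential distribution and the sample sizes here are my illustrative choices, not from the text:

```python
import random
from math import sqrt, erf

random.seed(2)

# Standardized mean of n exponential(1) draws (mu = sigma = 1)
def standardized_mean(n):
    ybar = sum(random.expovariate(1.0) for _ in range(n)) / n
    return (ybar - 1.0) / (1.0 / sqrt(n))

reps, n = 5000, 400
freq = sum(1 for _ in range(reps) if standardized_mean(n) <= 1.0) / reps
Phi_1 = 0.5 * (1 + erf(1 / sqrt(2)))   # standard normal cdf at 1, ~0.8413
assert abs(freq - Phi_1) < 0.03        # empirical cdf is close to the normal one
```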
Proof of the Central Limit Theorem: By Problem 120,

(7.3.2)   (ȳ_n − µ)/(σ/√n) = (1/√n) Σ_{i=1}^n (y_i − µ)/σ = (1/√n) Σ_{i=1}^n z_i,   where z_i = (y_i − µ)/σ.
Let m_3, m_4, etc., be the third, fourth, etc., moments of z_i; then the m.g.f. of z_i is

(7.3.3)   m_{z_i}(t) = 1 + t²/2! + m_3 t³/3! + m_4 t⁴/4! + ···

Therefore the m.g.f. of (1/√n) Σ_{i=1}^n z_i is (multiply and substitute t/√n for t):

(7.3.4)   (1 + t²/(2!n) + m_3 t³/(3! n^{3/2}) + m_4 t⁴/(4!n²) + ···)^n = (1 + w_n/n)^n
where

(7.3.5)   w_n = t²/2! + m_3 t³/(3!√n) + m_4 t⁴/(4!n) + ···.

Now use Euler's limit, this time in the form: if w_n → w for n → ∞, then
(1 + w_n/n)^n → e^w. Since our w_n → t²/2, the m.g.f. of the standardized ȳ_n
converges toward e^{t²/2}, which is that of a standard normal distribution.
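The Euler limit invoked in the last step can be illustrated numerically; here c mimics the m_3 term that vanishes like 1/√n (all numbers are my arbitrary choices):

```python
from math import exp, sqrt

t, c = 1.3, 0.7
w = t**2 / 2
errors = []
for n in (10**2, 10**4, 10**6):
    wn = w + c / sqrt(n)                 # w_n -> w as n grows
    errors.append(abs((1 + wn / n)**n - exp(w)))

# (1 + w_n/n)^n approaches e^w: the error shrinks monotonically here
assert errors[0] > errors[1] > errors[2]
```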
The Central Limit Theorem is an example of emergence: independently of the
distributions of the individual summands, the distribution of the sum has a very
specific shape, the Gaussian bell curve. The signals turn into white noise. Here
emergence is the emergence of homogeneity and indeterminacy. In capitalism, much
more specific outcomes emerge: whether one quits the job or not, whether one sells
the stock or not, whether one gets a divorce or not, the outcome for society is to
perpetuate the system. Few activities fail to have this outcome.
Problem 120. Show in detail that (ȳ_n − µ)/(σ/√n) = (1/√n) Σ_{i=1}^n (y_i − µ)/σ.
Answer. lhs = (√n/σ)(ȳ_n − µ) = (√n/σ)((1/n) Σ_{i=1}^n y_i − (1/n) Σ_{i=1}^n µ) = (√n/σ)(1/n) Σ_{i=1}^n (y_i − µ) = (1/√n) Σ_{i=1}^n (y_i − µ)/σ = rhs.
Problem 121. 3 points Explain clearly, in words, what the law of large numbers
means, what the Central Limit Theorem means, and what their difference is.
Problem 122. (For this problem, a table is needed.) [Lar82, exercise 5.6.1,
p. 301] If you roll a pair of dice 180 times, what is the approximate probability that
the sum seven appears 25 or more times? Hint: use the Central Limit Theorem (but
don’t worry about the continuity correction, which is beyond the scope of this class).
Answer. Let x_i be the random variable that equals one if the i-th roll is a seven, and zero
otherwise. Since 7 can be obtained in six ways (1+6, 2+5, 3+4, 4+3, 5+2, 6+1), the probability
to get a 7 (which is at the same time the expected value of x_i) is 6/36 = 1/6. Since x_i² = x_i,
var[x_i] = E[x_i] − (E[x_i])² = 1/6 − 1/36 = 5/36. Define x = Σ_{i=1}^{180} x_i. We need Pr[x≥25]. Since x
is the sum of many independent identically distributed random variables, the CLT says that x is
asymptotically normal. Which normal? That which has the same expected value and variance as
x. E[x] = 180 · (1/6) = 30 and var[x] = 180 · (5/36) = 25. Therefore define y ∼ N(30, 25). The
CLT says that Pr[x≥25] ≈ Pr[y≥25]. Now y≥25 ⟺ y − 30 ≥ −5 ⟺ −(y − 30) ≤ +5 ⟺
(30 − y)/5 ≤ 1. But z = (30 − y)/5 is a standard Normal, therefore Pr[(30 − y)/5 ≤ 1] = F_z(1), i.e.,
the cumulative distribution of the standard Normal evaluated at +1. One can look this up in a
table; the probability asked for is .8413. Larson uses the continuity correction: x is discrete, and
Pr[x≥25] = Pr[x>24]. Therefore Pr[y≥25] and Pr[y>24] are two alternative good approximations;
but the best is Pr[y≥24.5] = .8643. This is the continuity correction.
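The two approximations can be reproduced with the error function instead of a table; this check is mine, not part of the exercise:

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

mu, sd = 180 / 6, sqrt(180 * 5 / 36)       # E[x] = 30, sd = 5
plain = 1 - Phi((25 - mu) / sd)            # Pr[y >= 25]   = Phi(1)
corrected = 1 - Phi((24.5 - mu) / sd)      # Pr[y >= 24.5] = Phi(1.1)
assert abs(plain - 0.8413) < 5e-4
assert abs(corrected - 0.8643) < 5e-4
```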
CHAPTER 8
Vector Random Variables
In this chapter we will look at two random variables x and y defined on the same
sample space U , i.e.,
(8.0.6)   x : U ∋ ω ↦ x(ω) ∈ R   and   y : U ∋ ω ↦ y(ω) ∈ R.
As we said before, x and y are called independent if all events of the form x ≤ x
are independent of any event of the form y ≤ y. But now let us assume they are
not independent. In this case, we do not have all the information about them if we
merely know the distribution of each.
The following example from [Lar82, example 5.1.7. on p. 233] illustrates the
issues involved. This example involves two random variables that have only two
possible outcomes each. Suppose you are told that a coin is to be flipped two times
and that the probability of a head is .5 for each flip. This information is not enough
to determine the probability of the second flip giving a head conditionally on the
first flip giving a head.
For instance, the above two probabilities can be achieved by the following experimental setup: a person has one fair coin and flips it twice in a row. Then the
two flips are independent.
But the probabilities of 1/2 for heads and 1/2 for tails can also be achieved as
follows: The person has two coins in his or her pocket. One has two heads, and one
has two tails. If at random one of these two coins is picked and flipped twice, then
the second flip has the same outcome as the first flip.
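The two setups can be simulated to show identical marginal probabilities but different joint behavior (the function names and repetition counts are my illustrative choices):

```python
import random

random.seed(1)

def fair_coin_twice():
    # one fair coin, two independent flips
    return random.random() < 0.5, random.random() < 0.5

def pick_trick_coin():
    # pick the two-headed or the two-tailed coin; both flips then agree
    heads = random.random() < 0.5
    return heads, heads

def estimate(experiment, reps=20000):
    first = second = both = 0
    for _ in range(reps):
        a, b = experiment()
        first += a
        second += b
        both += a and b
    return first / reps, second / reps, both / reps

p1, q1, both1 = estimate(fair_coin_twice)
p2, q2, both2 = estimate(pick_trick_coin)
assert abs(p1 - 0.5) < 0.02 and abs(p2 - 0.5) < 0.02   # same marginals
assert abs(both1 - 0.25) < 0.02                        # independent flips
assert abs(both2 - 0.5) < 0.02                         # perfectly dependent
```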
What do we need to get the full picture? We must consider the two variables not
separately but jointly, as a totality. In order to do this, we combine x and y into one
entity, a vector (x, y)⊤ ∈ R². Consequently we need to know the probability measure
induced by the mapping U ∋ ω ↦ (x(ω), y(ω))⊤ ∈ R².
It is not sufficient to look at random variables individually; one must look at
them as a totality.
Therefore let us first get an overview of all possible probability measures on the
plane R². In strict analogy with the one-dimensional case, these probability measures