

5.2 Expected Values, Covariance, and Correlation



PROPOSITION

Let X and Y be jointly distributed rv's with pmf p(x, y) or pdf f(x, y) according
to whether the variables are discrete or continuous. Then the expected value of
a function h(X, Y), denoted by E[h(X, Y)] or μ_{h(X,Y)}, is given by

E[h(X, Y)] = \begin{cases} \sum_x \sum_y h(x, y) \cdot p(x, y) & \text{if } X \text{ and } Y \text{ are discrete} \\[4pt] \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y) \cdot f(x, y)\, dx\, dy & \text{if } X \text{ and } Y \text{ are continuous} \end{cases}


Example 5.13



Five friends have purchased tickets to a certain concert. If the tickets are for seats

1–5 in a particular row and the tickets are randomly distributed among the five, what

is the expected number of seats separating any particular two of the five? Let X and

Y denote the seat numbers of the first and second individuals, respectively. Possible

(X, Y) pairs are {(1, 2), (1, 3), . . . , (5, 4)}, and the joint pmf of (X, Y) is



p(x, y) = \begin{cases} 1/20 & x = 1, \ldots, 5;\; y = 1, \ldots, 5;\; x \ne y \\ 0 & \text{otherwise} \end{cases}



The number of seats separating the two individuals is h(X, Y) = |X − Y| − 1. The
accompanying table gives h(x, y) for each possible (x, y) pair.

                           y
    h(x, y)     1     2     3     4     5
         1      —     0     1     2     3
         2      0     —     0     1     2
    x    3      1     0     —     0     1
         4      2     1     0     —     0
         5      3     2     1     0     —

Thus

E[h(X, Y)] = \sum_{(x, y)} h(x, y) \cdot p(x, y) = \sum_{x=1}^{5} \sum_{y \ne x} (|x - y| - 1) \cdot \frac{1}{20} = 1
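This kind of discrete expected value is easy to verify by brute-force enumeration of the joint pmf. A minimal Python sketch (the function and variable names are arbitrary):

    from itertools import permutations

    # Joint pmf of (X, Y): probability 1/20 on each ordered pair of distinct seats 1..5
    pairs = list(permutations(range(1, 6), 2))   # 20 ordered pairs
    p = 1 / len(pairs)

    def h(x, y):
        # number of seats separating the two individuals
        return abs(x - y) - 1

    expected = sum(h(x, y) * p for x, y in pairs)
    print(expected)   # 1.0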



Example 5.14






In Example 5.5, the joint pdf of the amount X of almonds and amount Y of cashews

in a 1-lb can of nuts was

f(x, y) = \begin{cases} 24xy & 0 \le x \le 1,\; 0 \le y \le 1,\; x + y \le 1 \\ 0 & \text{otherwise} \end{cases}



If 1 lb of almonds costs the company $1.00, 1 lb of cashews costs $1.50, and 1 lb of

peanuts costs $.50, then the total cost of the contents of a can is

h(X, Y) = (1)X + (1.5)Y + (.5)(1 − X − Y) = .5 + .5X + Y

(since 1 − X − Y of the weight consists of peanuts). The expected total cost is



E[h(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x, y) \cdot f(x, y)\, dx\, dy
           = \int_{0}^{1} \int_{0}^{1-x} (.5 + .5x + y) \cdot 24xy\, dy\, dx = \$1.10
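The double integral can also be checked numerically. A minimal sketch, assuming SciPy is available for two-dimensional quadrature:

    from scipy.integrate import dblquad

    # joint pdf f(x, y) = 24xy on the triangle 0 <= y <= 1 - x, 0 <= x <= 1
    # dblquad integrates func(y, x) with y as the inner variable
    integrand = lambda y, x: (0.5 + 0.5 * x + y) * 24 * x * y

    expected_cost, _ = dblquad(integrand, 0, 1, lambda x: 0.0, lambda x: 1.0 - x)
    print(f"${expected_cost:.2f}")   # $1.10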







The method of computing the expected value of a function h(X1, . . . , Xn) of

n random variables is similar to that for two random variables. If the Xi's are discrete, E[h(X1, . . . , Xn)] is an n-dimensional sum; if the Xi's are continuous, it is

an n-dimensional integral.






Covariance

When two random variables X and Y are not independent, it is frequently of interest

to assess how strongly they are related to one another.

DEFINITION



The covariance between two rv’s X and Y is

Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]
          = \begin{cases} \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\, p(x, y) & X, Y \text{ discrete} \\[4pt] \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f(x, y)\, dx\, dy & X, Y \text{ continuous} \end{cases}

That is, since X − μX and Y − μY are the deviations of the two variables from their respective mean values, the covariance is the expected product of deviations. Note that Cov(X, X) = E[(X − μX)²] = V(X).

The rationale for the definition is as follows. Suppose X and Y have a strong positive relationship to one another, by which we mean that large values of X tend to occur with large values of Y and small values of X with small values of Y. Then most of the probability mass or density will be associated with (x − μX) and (y − μY) either both positive (both X and Y above their respective means) or both negative, so the product (x − μX)(y − μY) will tend to be positive. Thus for a strong positive relationship, Cov(X, Y) should be quite positive. For a strong negative relationship, the signs of (x − μX) and (y − μY) will tend to be opposite, yielding a negative product. Thus for a strong negative relationship, Cov(X, Y) should be quite negative. If X and Y are not strongly related, positive and negative products will tend to cancel one another, yielding a covariance near 0. Figure 5.4 illustrates the different possibilities. The covariance depends on both the set of possible pairs and the probabilities. In Figure 5.4, the probabilities could be changed without altering the set of possible pairs, and this could drastically change the value of Cov(X, Y).

Figure 5.4  p(x, y) = 1/10 for each of ten pairs corresponding to indicated points; (a) positive
covariance; (b) negative covariance; (c) covariance near zero



Example 5.15



The joint and marginal pmf's for X = automobile policy deductible amount and Y =
homeowner policy deductible amount in Example 5.1 were

                         y
    p(x, y)        0      100     200
         100      .20     .10     .20
    x
         250      .05     .15     .30

       x     |  100    250              y     |   0     100    200
     pX(x)   |  .5     .5             pY(y)   |  .25    .25    .5






from which μX = Σ x·pX(x) = 175 and μY = 125. Therefore,

Cov(X, Y) = \sum_{(x, y)} (x - 175)(y - 125)\, p(x, y)
          = (100 - 175)(0 - 125)(.20) + \ldots + (250 - 175)(200 - 125)(.30)
          = 1875
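With only six mass points, the covariance is straightforward to verify by enumeration. A minimal Python sketch, encoding the joint pmf above as a dictionary:

    # joint pmf of (X, Y) from Example 5.15
    pmf = {(100, 0): .20, (100, 100): .10, (100, 200): .20,
           (250, 0): .05, (250, 100): .15, (250, 200): .30}

    mu_x = sum(x * p for (x, y), p in pmf.items())   # 175.0
    mu_y = sum(y * p for (x, y), p in pmf.items())   # 125.0
    cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())
    print(mu_x, mu_y, cov)                            # 175.0 125.0 1875.0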

The following shortcut formula for Cov(X, Y ) simplifies the computations.



PROPOSITION

Cov(X, Y) = E(XY) − μX · μY

According to this formula, no intermediate subtractions are necessary; only at the end
of the computation is μX · μY subtracted from E(XY). The proof involves expanding
(X − μX)(Y − μY) and then taking the expected value of each term separately.



Example 5.16 (Example 5.5 continued)



The joint and marginal pdf's of X = amount of almonds and Y = amount of cashews
were

f(x, y) = \begin{cases} 24xy & 0 \le x \le 1,\; 0 \le y \le 1,\; x + y \le 1 \\ 0 & \text{otherwise} \end{cases}

f_X(x) = \begin{cases} 12x(1 - x)^2 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}

with fY(y) obtained by replacing x by y in fX(x). It is easily verified that μX = μY = 2/5,
and



E(XY) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} xy\, f(x, y)\, dx\, dy = \int_{0}^{1} \int_{0}^{1-x} xy \cdot 24xy\, dy\, dx
      = 8 \int_{0}^{1} x^2 (1 - x)^3\, dx = \frac{2}{15}



Thus Cov(X, Y) = 2/15 − (2/5)(2/5) = 2/15 − 4/25 = −2/75. A negative covariance is reasonable
here because more almonds in the can implies fewer cashews.
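The moments in this example can also be checked symbolically. A minimal sketch, assuming SymPy is available:

    import sympy as sp

    x, y = sp.symbols('x y', nonnegative=True)
    f = 24 * x * y                         # joint pdf on the triangle x + y <= 1

    mu_x = sp.integrate(x * f, (y, 0, 1 - x), (x, 0, 1))       # 2/5
    e_xy = sp.integrate(x * y * f, (y, 0, 1 - x), (x, 0, 1))   # 2/15
    cov = e_xy - mu_x * mu_x                                    # mu_Y = mu_X by symmetry
    print(mu_x, e_xy, cov)                                      # 2/5 2/15 -2/75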



It might appear that the relationship in the insurance example is quite strong since Cov(X, Y) = 1875, whereas Cov(X, Y) = −2/75 in the nut example would seem to imply quite a weak relationship. Unfortunately, the covariance has a serious defect that makes it impossible to interpret a computed value. In the insurance example, suppose we had expressed the deductible amount in cents rather than in dollars. Then 100X would replace X, 100Y would replace Y, and the resulting covariance would be Cov(100X, 100Y) = (100)(100)Cov(X, Y) = 18,750,000. If, on the other hand, the deductible amount had been expressed in hundreds of dollars, the computed covariance would have been (.01)(.01)(1875) = .1875. The defect of covariance is that its computed value depends critically on the units of measurement. Ideally, the choice of units should have no effect on a measure of strength of relationship. This is achieved by scaling the covariance.






Correlation

DEFINITION



The correlation coefficient of X and Y, denoted by Corr(X, Y), ρX,Y, or just ρ,
is defined by

\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y}



Example 5.17



It is easily verified that in the insurance scenario of Example 5.15, E(X²) = 36,250,
σX² = 36,250 − (175)² = 5625, σX = 75, E(Y²) = 22,500, σY² = 6875, and σY = 82.92.
This gives

ρ = 1875/[(75)(82.92)] = .301
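The same enumeration idea verifies the marginal variances and the correlation coefficient. A minimal Python sketch:

    from math import sqrt

    pmf = {(100, 0): .20, (100, 100): .10, (100, 200): .20,
           (250, 0): .05, (250, 100): .15, (250, 200): .30}

    mu_x = sum(x * p for (x, y), p in pmf.items())
    mu_y = sum(y * p for (x, y), p in pmf.items())
    var_x = sum(x**2 * p for (x, y), p in pmf.items()) - mu_x**2      # 5625
    var_y = sum(y**2 * p for (x, y), p in pmf.items()) - mu_y**2      # 6875
    cov = sum(x * y * p for (x, y), p in pmf.items()) - mu_x * mu_y   # 1875 (shortcut formula)
    rho = cov / (sqrt(var_x) * sqrt(var_y))
    print(rho)   # about 0.3015, i.e. the .301 above up to rounding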







The following proposition shows that ρ remedies the defect of Cov(X, Y) and
also suggests how to recognize the existence of a strong (linear) relationship.



PROPOSITION



1. If a and c are either both positive or both negative,

   Corr(aX + b, cY + d) = Corr(X, Y)

2. For any two rv's X and Y, −1 ≤ Corr(X, Y) ≤ 1.

Statement 1 says precisely that the correlation coefficient is not affected by a linear change in the units of measurement (if, say, X = temperature in °C, then 9X/5 + 32 = temperature in °F). According to Statement 2, the strongest possible positive relationship is evidenced by ρ = +1, whereas the strongest possible negative relationship corresponds to ρ = −1. The proof of the first statement is sketched in Exercise 35, and that of the second appears in Supplementary Exercise 87 at the end of the chapter. For descriptive purposes, the relationship will be described as strong if |ρ| ≥ .8, moderate if .5 < |ρ| < .8, and weak if |ρ| ≤ .5.

If we think of p(x, y) or f(x, y) as prescribing a mathematical model for how the two numerical variables X and Y are distributed in some population (height and weight, verbal SAT score and quantitative SAT score, etc.), then ρ is a population characteristic or parameter that measures how strongly X and Y are related in the population. In Chapter 12, we will consider taking a sample of pairs (x1, y1), . . . , (xn, yn) from the population. The sample correlation coefficient r will then be defined and used to make inferences about ρ.

The correlation coefficient ρ is actually not a completely general measure of the strength of a relationship.

PROPOSITION



1. If X and Y are independent, then ρ = 0, but ρ = 0 does not imply independence.

2. ρ = +1 or −1 iff Y = aX + b for some numbers a and b with a ≠ 0.

This proposition says that ρ is a measure of the degree of linear relationship between X and Y, and only when the two variables are perfectly related in a linear manner will ρ be as positive or negative as it can be. A ρ less than 1 in absolute value indicates only






that the relationship is not completely linear, but there may still be a very strong nonlinear relation. Also, ρ = 0 does not imply that X and Y are independent, but only that there is complete absence of a linear relationship. When ρ = 0, X and Y are said to be uncorrelated. Two variables could be uncorrelated yet highly dependent because there is a strong nonlinear relationship, so be careful not to conclude too much from knowing that ρ = 0.



Example 5.18



Let X and Y be discrete rv’s with joint pmf



p(x, y) = \begin{cases} 1/4 & (x, y) = (-4, 1),\ (4, -1),\ (2, 2),\ (-2, -2) \\ 0 & \text{otherwise} \end{cases}



The points that receive positive probability mass are identified on the (x, y) coordinate system in Figure 5.5. It is evident from the figure that the value of X is completely determined by the value of Y and vice versa, so the two variables are completely dependent. However, by symmetry μX = μY = 0 and

E(XY) = (−4)(1/4) + (−4)(1/4) + (4)(1/4) + (4)(1/4) = 0

so Cov(X, Y) = E(XY) − μX · μY = 0 and thus ρX,Y = 0. Although there is perfect dependence, there is also complete absence of any linear relationship!

Figure 5.5  The population of pairs for Example 5.18
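A quick numerical check of the zero covariance, enumerating the four mass points:

    pmf = {(-4, 1): .25, (4, -1): .25, (2, 2): .25, (-2, -2): .25}

    e_xy = sum(x * y * p for (x, y), p in pmf.items())
    print(e_xy)   # 0.0, so Cov(X, Y) = 0 even though Y is completely determined by X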







A value of ρ near 1 does not necessarily imply that increasing the value of

X causes Y to increase. It implies only that large X values are associated with

large Y values. For example, in the population of children, vocabulary size and number of cavities are quite positively correlated, but it is certainly not true that cavities

cause vocabulary to grow. Instead, the values of both these variables tend to increase

as the value of age, a third variable, increases. For children of a fixed age, there is

probably a very low correlation between number of cavities and vocabulary size. In

summary, association (a high correlation) is not the same as causation.



EXERCISES



Section 5.2 (22–36)



22. An instructor has given a short quiz consisting of two parts. For a randomly selected student, let X = the number of points earned on the first part and Y = the number of points earned on the second part. Suppose that the joint pmf of X and Y is given in the accompanying table.

                             y
       p(x, y)       0       5      10      15
            0       .02     .06     .02     .10
       x    5       .04     .15     .20     .10
           10       .01     .15     .14     .01

    a. If the score recorded in the grade book is the total number of points earned on the two parts, what is the expected recorded score E(X + Y)?
    b. If the maximum of the two scores is recorded, what is the expected recorded score?

23. The difference between the number of customers in line at the express checkout and the number in line at the superexpress checkout in Exercise 3 is X1 − X2. Calculate the expected difference.

24. Six individuals, including A and B, take seats around a circular table in a completely random fashion. Suppose the seats






are numbered 1, . . . , 6. Let X = A’s seat number and Y = B’s

seat number. If A sends a written message around the table

to B in the direction in which they are closest, how many

individuals (including A and B) would you expect to handle

the message?

25. A surveyor wishes to lay out a square region with each side

having length L. However, because of measurement error, he

instead lays out a rectangle in which the north–south sides

both have length X and the east–west sides both have length

Y. Suppose that X and Y are independent and that each

is uniformly distributed on the interval [L − A, L + A]
(where 0 < A < L). What is the expected area of the resulting rectangle?

26. Consider a small ferry that can accommodate cars and buses.

The toll for cars is $3, and the toll for buses is $10. Let X and

Y denote the number of cars and buses, respectively, carried

on a single trip. Suppose the joint distribution of X and Y is

as given in the table of Exercise 7. Compute the expected

revenue from a single trip.

27. Annie and Alvie have agreed to meet for lunch between noon

(0:00 P.M.) and 1:00 P.M. Denote Annie’s arrival time by X,

Alvie’s by Y, and suppose X and Y are independent with pdf’s



fX(x) = 3x², 0 ≤ x ≤ 1 (0 otherwise)        fY(y) = 2y, 0 ≤ y ≤ 1 (0 otherwise)

What is the expected amount of time that the one who arrives first must wait for the other person? [Hint: h(X, Y) = |X − Y|.]



28. Show that if X and Y are independent rv’s, then E(XY) = E(X) · E(Y). Then apply this in Exercise 25. [Hint: Consider the continuous case with f(x, y) = fX(x) · fY(y).]

29. Compute the correlation coefficient ρ for X and Y of Example 5.16 (the covariance has already been computed).

30. a. Compute the covariance for X and Y in Exercise 22.
    b. Compute ρ for X and Y in the same exercise.

31. a. Compute the covariance between X and Y in Exercise 9.
    b. Compute the correlation coefficient ρ for this X and Y.

32. Reconsider the minicomputer component lifetimes X and Y as described in Exercise 12. Determine E(XY). What can be said about Cov(X, Y) and ρ?

33. Use the result of Exercise 28 to show that when X and Y are independent, Cov(X, Y) = Corr(X, Y) = 0.

34. a. Recalling the definition of σ² for a single rv X, write a formula that would be appropriate for computing the variance of a function h(X, Y) of two random variables. [Hint: Remember that variance is just a special expected value.]
    b. Use this formula to compute the variance of the recorded score h(X, Y) [= max(X, Y)] in part (b) of Exercise 22.

35. a. Use the rules of expected value to show that Cov(aX + b, cY + d) = ac Cov(X, Y).
    b. Use part (a) along with the rules of variance and standard deviation to show that Corr(aX + b, cY + d) = Corr(X, Y) when a and c have the same sign.
    c. What happens if a and c have opposite signs?

36. Show that if Y = aX + b (a ≠ 0), then Corr(X, Y) = +1 or −1. Under what conditions will ρ = +1?



5.3 Statistics and Their Distributions

The observations in a single sample were denoted in Chapter 1 by x1, x2, . . . , xn.

Consider selecting two different samples of size n from the same population distribution. The xi's in the second sample will virtually always differ at least a bit from those in the first sample. For example, a first sample of n = 3 cars of a particular type might result in fuel efficiencies x1 = 30.7, x2 = 29.4, x3 = 31.1, whereas a second sample may give x1 = 28.8, x2 = 30.0, and x3 = 31.1. Before we obtain data, there is uncertainty about the value of each xi. Because of this uncertainty, before the data becomes available we view each observation as a random variable and denote the sample by X1, X2, . . . , Xn (uppercase letters for random variables).

This variation in observed values in turn implies that the value of any function of the sample observations—such as the sample mean, sample standard deviation, or sample fourth spread—also varies from sample to sample. That is, prior to obtaining x1, . . . , xn, there is uncertainty as to the value of x̄, the value of s, and so on.



Example 5.19



Suppose that material strength for a randomly selected specimen of a particular type

has a Weibull distribution with parameter values ␣ ϭ 2 (shape) and ␤ ϭ 5 (scale).






The corresponding density curve is shown in Figure 5.6. Formulas from Section 4.5
give

μ = E(X) = 4.4311        μ̃ = 4.1628        σ² = V(X) = 5.365        σ = 2.316

The mean exceeds the median because of the distribution's positive skew.

Figure 5.6  The Weibull density curve for Example 5.19



We used MINITAB to generate six different samples, each with n = 10, from

this distribution (material strengths for six different groups of ten specimens each).

The results appear in Table 5.1, followed by the values of the sample mean, sample

median, and sample standard deviation for each sample. Notice first that the ten

observations in any particular sample are all different from those in any other sample. Second, the six values of the sample mean are all different from one another, as

are the six values of the sample median and the six values of the sample standard

deviation. The same is true of the sample 10% trimmed means, sample fourth spreads,

and so on.



Table 5.1  Samples from the Weibull Distribution of Example 5.19

Observation   Sample 1    Sample 2    Sample 3    Sample 4    Sample 5    Sample 6
     1          6.1171     5.07611     3.46710     1.55601     3.12372     8.93795
     2          4.1600     6.79279     2.71938     4.56941     6.09685     3.92487
     3          3.1950     4.43259     5.88129     4.79870     3.41181     8.76202
     4          0.6694     8.55752     5.14915     2.49759     1.65409     7.05569
     5          1.8552     6.82487     4.99635     2.33267     2.29512     2.30932
     6          5.2316     7.39958     5.86887     4.01295     2.12583     5.94195
     7          2.7609     2.14755     6.05918     9.08845     3.20938     6.74166
     8         10.2185     8.50628     1.80119     3.25728     3.23209     1.75468
     9          5.2438     5.49510     4.21994     3.70132     6.84426     4.91827
    10          4.5590     4.04525     2.12934     5.50134     4.20694     7.26081

    x̄           4.401      5.928       4.229       4.132       3.620       5.761
    x̃           4.360      6.144       4.608       3.857       3.221       6.342
    s           2.642      2.062       1.611       2.124       1.678       2.496






Furthermore, the value of the sample mean from any particular sample can be

regarded as a point estimate (“point” because it is a single number, corresponding to

a single point on the number line) of the population mean μ, whose value is known

to be 4.4311. None of the estimates from these six samples is identical to what is

being estimated. The estimates from the second and sixth samples are much too

large, whereas the fifth sample gives a substantial underestimate. Similarly, the sample standard deviation gives a point estimate of the population standard deviation. All

six of the resulting estimates are in error by at least a small amount.

In summary, the values of the individual sample observations vary from sample

to sample, so in general the value of any quantity computed from sample data, and the

value of a sample characteristic used as an estimate of the corresponding population

characteristic, will virtually never coincide with what is being estimated.
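The sampling variability illustrated by Table 5.1 is easy to reproduce by simulation. A minimal sketch, assuming NumPy is available (the seed is arbitrary, so the resulting numbers will differ from those in Table 5.1):

    import numpy as np

    rng = np.random.default_rng(1)
    shape, scale = 2.0, 5.0                  # Weibull parameters from Example 5.19

    for i in range(6):                       # six samples, each of size n = 10
        sample = scale * rng.weibull(shape, size=10)
        print(f"sample {i + 1}: mean={sample.mean():.3f}, "
              f"median={np.median(sample):.3f}, sd={sample.std(ddof=1):.3f}")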





DEFINITION



A statistic is any quantity whose value can be calculated from sample data.

Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore, a statistic is a random variable and will be denoted

by an uppercase letter; a lowercase letter is used to represent the calculated or

observed value of the statistic.



Thus the sample mean, regarded as a statistic (before a sample has been selected or an experiment carried out), is denoted by X̄; the calculated value of this statistic is x̄. Similarly, S represents the sample standard deviation thought of as a statistic, and its computed value is s. If samples of two different types of bricks are selected and the individual compressive strengths are denoted by X1, . . . , Xm and Y1, . . . , Yn, respectively, then the statistic X̄ − Ȳ, the difference between the two sample mean compressive strengths, is often of great interest.

Any statistic, being a random variable, has a probability distribution. In particular, the sample mean X̄ has a probability distribution. Suppose, for example, that n = 2 components are randomly selected and the number of breakdowns while under warranty is determined for each one. Possible values for the sample mean number of breakdowns X̄ are 0 (if X1 = X2 = 0), .5 (if either X1 = 0 and X2 = 1 or X1 = 1 and X2 = 0), 1, 1.5, . . . . The probability distribution of X̄ specifies P(X̄ = 0), P(X̄ = .5), and so on, from which other probabilities such as P(1 ≤ X̄ ≤ 3) and P(X̄ ≥ 2.5) can be calculated. Similarly, if for a sample of size n = 2, the only possible values of the sample variance are 0, 12.5, and 50 (which is the case if X1 and X2 can each take on only the values 40, 45, or 50), then the probability distribution of S² gives P(S² = 0), P(S² = 12.5), and P(S² = 50). The probability distribution of a statistic is sometimes referred to as its sampling distribution to emphasize that it describes how the statistic varies in value across all samples that might be selected.



Random Samples

The probability distribution of any particular statistic depends not only on the population distribution (normal, uniform, etc.) and the sample size n but also on the method of sampling. Consider selecting a sample of size n = 2 from a population consisting of just the three values 1, 5, and 10, and suppose that the statistic of interest is the sample variance. If sampling is done “with replacement,” then S² = 0 will result if X1 = X2. However, S² cannot equal 0 if sampling is “without replacement.” So P(S² = 0) = 0 for one sampling method, and this probability is






positive for the other method. Our next definition describes a sampling method

often encountered (at least approximately) in practice.

DEFINITION



The rv’s X1, X2, . . . , Xn are said to form a (simple) random sample of size

n if

1. The Xi's are independent rv's.
2. Every Xi has the same probability distribution.

Conditions 1 and 2 can be paraphrased by saying that the Xi's are independent and identically distributed (iid). If sampling is either with replacement or from an infinite (conceptual) population, Conditions 1 and 2 are satisfied exactly. These conditions will be approximately satisfied if sampling is without replacement, yet the sample size n is much smaller than the population size N. In practice, if n/N ≤ .05 (at most 5% of the population is sampled), we can proceed as if the Xi's form a random sample. The virtue of this sampling method is that the probability distribution of any statistic can be more easily obtained than for any other sampling method.

There are two general methods for obtaining information about a statistic’s

sampling distribution. One method involves calculations based on probability rules,

and the other involves carrying out a simulation experiment.



Deriving a Sampling Distribution

Probability rules can be used to obtain the distribution of a statistic provided that it

is a “fairly simple” function of the Xi's and either there are relatively few different X

values in the population or else the population distribution has a “nice” form. Our

next two examples illustrate such situations.



Example 5.20



A large automobile service center charges $40, $45, and $50 for a tune-up of four-,

six-, and eight-cylinder cars, respectively. If 20% of its tune-ups are done on four-cylinder cars, 30% on six-cylinder cars, and 50% on eight-cylinder cars, then the probability distribution of revenue from a single randomly selected tune-up is given by

    x       |  40     45     50
   p(x)     |  .2     .3     .5          with μ = 46.5, σ² = 15.25          (5.2)



Suppose on a particular day only two servicing jobs involve tune-ups. Let X1 = the revenue from the first tune-up and X2 = the revenue from the second. Suppose that X1 and X2 are independent, each with the probability distribution shown in (5.2) [so that X1 and X2 constitute a random sample from the distribution (5.2)]. Table 5.2 lists possible (x1, x2) pairs, the probability of each [computed using (5.2) and the assumption of independence], and the resulting x̄ and s² values. Now to obtain the probability distribution of X̄, the sample average revenue per tune-up, we must consider each possible value x̄ and compute its probability. For example, x̄ = 45 occurs three times in the table with probabilities .10, .09, and .10, so

pX̄(45) = P(X̄ = 45) = .10 + .09 + .10 = .29

Similarly,

pS²(50) = P(S² = 50) = P(X1 = 40, X2 = 50  or  X1 = 50, X2 = 40) = .10 + .10 = .20






Table 5.2  Outcomes, Probabilities, and Values of x̄ and s² for Example 5.20

    x1      x2      p(x1, x2)      x̄        s²
    40      40         .04         40         0
    40      45         .06         42.5      12.5
    40      50         .10         45        50
    45      40         .06         42.5      12.5
    45      45         .09         45         0
    45      50         .15         47.5      12.5
    50      40         .10         45        50
    50      45         .15         47.5      12.5
    50      50         .25         50         0



The complete sampling distributions of X̄ and S² appear in (5.3) and (5.4).

    x̄         |  40     42.5    45     47.5    50
   pX̄(x̄)      |  .04    .12     .29    .30     .25                          (5.3)

    s²        |  0      12.5    50
   pS²(s²)    |  .38    .42     .20                                          (5.4)



Figure 5.7 pictures a probability histogram for both the original distribution (5.2) and the X̄ distribution (5.3). The figure suggests first that the mean (expected value) of the X̄ distribution is equal to the mean 46.5 of the original distribution, since both histograms appear to be centered at the same place.

Figure 5.7  Probability histograms for the underlying distribution and the X̄ distribution in Example 5.20



From (5.3),

\mu_{\bar{X}} = E(\bar{X}) = \sum \bar{x}\, p_{\bar{X}}(\bar{x}) = (40)(.04) + \ldots + (50)(.25) = 46.5 = \mu

Second, it appears that the X̄ distribution has smaller spread (variability) than the original distribution, since probability mass has moved in toward the mean. Again from (5.3),

\sigma_{\bar{X}}^2 = V(\bar{X}) = \sum \bar{x}^2 \cdot p_{\bar{X}}(\bar{x}) - \mu_{\bar{X}}^2
                   = (40)^2(.04) + \ldots + (50)^2(.25) - (46.5)^2
                   = 7.625 = \frac{\sigma^2}{2} = \frac{15.25}{2}



The variance of X̄ is precisely half that of the original variance (because n = 2).
The mean value of S² is

\mu_{S^2} = E(S^2) = \sum s^2 \cdot p_{S^2}(s^2) = (0)(.38) + (12.5)(.42) + (50)(.20) = 15.25 = \sigma^2






That is, the X̄ sampling distribution is centered at the population mean μ, and the S² sampling distribution is centered at the population variance σ².
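The sampling distributions (5.3) and (5.4) can be generated mechanically by enumerating all (x1, x2) pairs, which is what Table 5.2 does by hand. A minimal Python sketch of that enumeration:

    from itertools import product
    from collections import defaultdict
    from statistics import mean, variance

    pop = {40: .2, 45: .3, 50: .5}                       # distribution (5.2)

    pmf_xbar, pmf_s2 = defaultdict(float), defaultdict(float)
    for (x1, p1), (x2, p2) in product(pop.items(), repeat=2):
        pmf_xbar[mean([x1, x2])] += p1 * p2
        pmf_s2[variance([x1, x2])] += p1 * p2            # sample variance, divisor n - 1

    e_xbar = sum(v * p for v, p in pmf_xbar.items())                  # 46.5
    v_xbar = sum(v**2 * p for v, p in pmf_xbar.items()) - e_xbar**2   # 7.625
    e_s2 = sum(v * p for v, p in pmf_s2.items())                      # 15.25
    print(dict(pmf_xbar), dict(pmf_s2), e_xbar, v_xbar, e_s2)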

If four tune-ups had been done on the day of interest, the sample average revenue X̄ would be based on a random sample of four Xi's, each having the distribution (5.2). More calculation eventually yields the pmf of X̄ for n = 4 as

    x̄       |  40      41.25    42.5     43.75    45       46.25    47.5     48.75    50
   pX̄(x̄)    |  .0016   .0096    .0376    .0936    .1761    .2340    .2350    .1500    .0625



From this, μX̄ = 46.50 = μ and σ²X̄ = 3.8125 = σ²/4. Figure 5.8 is a probability histogram of this pmf.



Figure 5.8  Probability histogram for X̄ based on n = 4 in Example 5.20







Example 5.20 should suggest first of all that the computation of pX̄(x̄) and pS²(s²) can be tedious. If the original distribution (5.2) had allowed for more than three possible values 40, 45, and 50, then even for n = 2 the computations would have been more involved. The example should also suggest, however, that there are some general relationships between E(X̄), V(X̄), E(S²), and the mean μ and variance σ² of the original distribution. These are stated in the next section. Now consider an example in which the random sample is drawn from a continuous distribution.



Example 5.21



Service time for a certain type of bank transaction is a random variable having an exponential distribution with parameter λ. Suppose X1 and X2 are service times for two different customers, assumed independent of each other. Consider the total service time To = X1 + X2 for the two customers, also a statistic. The cdf of To is, for t ≥ 0,



F_{T_o}(t) = P(X_1 + X_2 \le t) = \iint_{\{(x_1, x_2):\, x_1 + x_2 \le t\}} f(x_1, x_2)\, dx_1\, dx_2
           = \int_{0}^{t} \int_{0}^{t - x_1} \lambda e^{-\lambda x_1} \cdot \lambda e^{-\lambda x_2}\, dx_2\, dx_1
           = \int_{0}^{t} [\lambda e^{-\lambda x_1} - \lambda e^{-\lambda t}]\, dx_1
           = 1 - e^{-\lambda t} - \lambda t e^{-\lambda t}



The region of integration is pictured in Figure 5.9.

Figure 5.9  Region of integration to obtain cdf of To in Example 5.21
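The derived cdf can be sanity-checked by simulation: the empirical proportion of simulated sums X1 + X2 falling at or below t should agree with the formula. A minimal sketch, assuming NumPy, with arbitrary values of λ and t:

    import numpy as np

    lam, t = 0.5, 3.0                      # arbitrary values for the check
    rng = np.random.default_rng(0)

    x1 = rng.exponential(1 / lam, size=1_000_000)   # NumPy parameterizes by scale = 1/lambda
    x2 = rng.exponential(1 / lam, size=1_000_000)
    empirical = np.mean(x1 + x2 <= t)
    analytic = 1 - np.exp(-lam * t) - lam * t * np.exp(-lam * t)
    print(empirical, analytic)             # both approximately 0.442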


