5.2 Expected Values, Covariance, and Correlation
PROPOSITION

Let X and Y be jointly distributed rv's with pmf p(x, y) or pdf f(x, y) according to whether the variables are discrete or continuous. Then the expected value of a function h(X, Y), denoted by E[h(X, Y)] or μ_{h(X,Y)}, is given by

$$E[h(X, Y)] = \begin{cases} \displaystyle\sum_{x}\sum_{y} h(x, y)\, p(x, y) & \text{if $X$ and $Y$ are discrete} \\[8pt] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x, y)\, f(x, y)\, dx\, dy & \text{if $X$ and $Y$ are continuous} \end{cases}$$

Example 5.13
Five friends have purchased tickets to a certain concert. If the tickets are for seats
1–5 in a particular row and the tickets are randomly distributed among the five, what
is the expected number of seats separating any particular two of the five? Let X and
Y denote the seat numbers of the first and second individuals, respectively. Possible
(X, Y) pairs are {(1, 2), (1, 3), . . . , (5, 4)}, and the joint pmf of (X, Y) is
$$p(x, y) = \begin{cases} \dfrac{1}{20} & x = 1, \ldots, 5;\ y = 1, \ldots, 5;\ x \ne y \\[4pt] 0 & \text{otherwise} \end{cases}$$

The number of seats separating the two individuals is h(X, Y) = |X − Y| − 1. The
accompanying table gives h(x, y) for each possible (x, y) pair.
            y
 h(x, y) |  1    2    3    4    5
---------+------------------------
      1  |  —    0    1    2    3
      2  |  0    —    0    1    2
 x    3  |  1    0    —    0    1
      4  |  2    1    0    —    0
      5  |  3    2    1    0    —
Thus

$$E[h(X, Y)] = \sum\sum_{(x, y)} h(x, y)\, p(x, y) = \sum_{x=1}^{5}\,\sum_{\substack{y=1 \\ y \ne x}}^{5} (|x - y| - 1)\cdot \frac{1}{20} = 1 \qquad \blacksquare$$
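A quick way to verify this value is to enumerate the 20 equally likely ordered seat pairs directly; the short Python sketch below (an illustration only, not part of the example) does exactly that:

```python
from itertools import permutations

# All 20 ordered seat pairs (x, y) with x != y, each having probability 1/20.
pairs = list(permutations(range(1, 6), 2))
expected_separation = sum((abs(x - y) - 1) / 20 for x, y in pairs)
print(expected_separation)  # 1.0
```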
Example 5.14
In Example 5.5, the joint pdf of the amount X of almonds and amount Y of cashews
in a 1-lb can of nuts was
$$f(x, y) = \begin{cases} 24xy & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1 \\ 0 & \text{otherwise} \end{cases}$$
If 1 lb of almonds costs the company $1.00, 1 lb of cashews costs $1.50, and 1 lb of
peanuts costs $.50, then the total cost of the contents of a can is
$$h(X, Y) = (1)X + (1.5)Y + (.5)(1 - X - Y) = .5 + .5X + Y$$

(since 1 − X − Y of the weight consists of peanuts). The expected total cost is
$$E[h(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x, y)\, f(x, y)\, dx\, dy = \int_{0}^{1}\int_{0}^{1-x} (.5 + .5x + y)\cdot 24xy\, dy\, dx = \$1.10 \qquad \blacksquare$$
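The double integral can also be checked numerically. The sketch below assumes SciPy is available (the text itself does not rely on software) and evaluates the expected cost with dblquad:

```python
from scipy.integrate import dblquad

# dblquad integrates the inner variable (y) first, so the integrand is func(y, x).
cost, _ = dblquad(lambda y, x: (0.5 + 0.5 * x + y) * 24 * x * y,
                  0, 1,              # outer limits for x
                  lambda x: 0,       # lower y limit
                  lambda x: 1 - x)   # upper y limit enforces x + y <= 1
print(round(cost, 2))                # 1.1
```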
The method of computing the expected value of a function h(X1, . . . , Xn) of
n random variables is similar to that for two random variables. If the Xi's are discrete, E[h(X1, . . . , Xn)] is an n-dimensional sum; if the Xi's are continuous, it is
an n-dimensional integral.
Covariance
When two random variables X and Y are not independent, it is frequently of interest
to assess how strongly they are related to one another.
DEFINITION

The covariance between two rv's X and Y is

$$\operatorname{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \begin{cases} \displaystyle\sum_{x}\sum_{y} (x - \mu_X)(y - \mu_Y)\, p(x, y) & X, Y \text{ discrete} \\[8pt] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f(x, y)\, dx\, dy & X, Y \text{ continuous} \end{cases}$$

That is, since X − μ_X and Y − μ_Y are the deviations of the two variables from their respective mean values, the covariance is the expected product of deviations. Note that Cov(X, X) = E[(X − μ_X)²] = V(X).
The rationale for the definition is as follows. Suppose X and Y have a strong positive relationship to one another, by which we mean that large values of X tend to occur with large values of Y and small values of X with small values of Y. Then most of the probability mass or density will be associated with (x − μ_X) and (y − μ_Y) either both positive (both X and Y above their respective means) or both negative, so the product (x − μ_X)(y − μ_Y) will tend to be positive. Thus for a strong positive relationship, Cov(X, Y) should be quite positive. For a strong negative relationship, the signs of (x − μ_X) and (y − μ_Y) will tend to be opposite, yielding a negative product. Thus for a strong negative relationship, Cov(X, Y) should be quite negative. If X and Y are not strongly related, positive and negative products will tend to cancel one another, yielding a covariance near 0. Figure 5.4 illustrates the different possibilities. The covariance depends on both the set of possible pairs and the probabilities. In Figure 5.4, the probabilities could be changed without altering the set of possible pairs, and this could drastically change the value of Cov(X, Y).
Figure 5.4  p(x, y) = 1/10 for each of ten pairs corresponding to indicated points; (a) positive covariance; (b) negative covariance; (c) covariance near zero
Example 5.15
The joint and marginal pmf's for X = automobile policy deductible amount and Y = homeowner policy deductible amount in Example 5.1 were
               y
 p(x, y) |   0     100    200
---------+--------------------
 x  100  |  .20    .10    .20
    250  |  .05    .15    .30

    x    |  100    250             y    |   0     100    200
---------+---------------      --------+---------------------
  pX(x)  |  .5     .5            pY(y)  |  .25    .25    .5
from which μ_X = Σ x p_X(x) = 175 and μ_Y = 125. Therefore,

$$\operatorname{Cov}(X, Y) = \sum\sum_{(x, y)} (x - 175)(y - 125)\, p(x, y) = (100 - 175)(0 - 125)(.20) + \cdots + (250 - 175)(200 - 125)(.30) = 1875 \qquad \blacksquare$$
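For readers who would like to verify this sum, a minimal Python sketch (purely illustrative) that tabulates the joint pmf and accumulates the products of deviations is:

```python
# Joint pmf of Example 5.15: keys are (auto deductible x, homeowner deductible y).
pmf = {(100, 0): .20, (100, 100): .10, (100, 200): .20,
       (250, 0): .05, (250, 100): .15, (250, 200): .30}

mu_x = sum(x * p for (x, _), p in pmf.items())   # 175.0
mu_y = sum(y * p for (_, y), p in pmf.items())   # 125.0
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in pmf.items())
print(cov)                                       # 1875.0
```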
The following shortcut formula for Cov(X, Y) simplifies the computations.

PROPOSITION

$$\operatorname{Cov}(X, Y) = E(XY) - \mu_X \cdot \mu_Y$$

According to this formula, no intermediate subtractions are necessary; only at the end of the computation is μ_X · μ_Y subtracted from E(XY). The proof involves expanding (X − μ_X)(Y − μ_Y) and then taking the expected value of each term separately.
Example 5.16 (Example 5.5 continued)
The joint and marginal pdf's of X = amount of almonds and Y = amount of cashews were

$$f(x, y) = \begin{cases} 24xy & 0 \le x \le 1,\ 0 \le y \le 1,\ x + y \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad f_X(x) = \begin{cases} 12x(1 - x)^2 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

with f_Y(y) obtained by replacing x by y in f_X(x). It is easily verified that μ_X = μ_Y = 2/5, and
$$E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f(x, y)\, dx\, dy = \int_{0}^{1}\int_{0}^{1-x} xy \cdot 24xy\, dy\, dx = 8\int_{0}^{1} x^2(1 - x)^3\, dx = \frac{2}{15}$$
Thus Cov(X, Y) = 2/15 − (2/5)(2/5) = 2/15 − 4/25 = −2/75. A negative covariance is reasonable here because more almonds in the can implies fewer cashews.  ■
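The same numbers can be obtained by numerical integration; the sketch below, which again assumes SciPy is available, recovers E(XY) and the covariance:

```python
from scipy.integrate import dblquad

f = lambda y, x: 24 * x * y   # joint pdf on the triangle 0 <= x, 0 <= y, x + y <= 1

e_xy, _ = dblquad(lambda y, x: x * y * f(y, x), 0, 1, lambda x: 0, lambda x: 1 - x)
e_x,  _ = dblquad(lambda y, x: x * f(y, x),     0, 1, lambda x: 0, lambda x: 1 - x)
e_y,  _ = dblquad(lambda y, x: y * f(y, x),     0, 1, lambda x: 0, lambda x: 1 - x)
print(e_xy, e_xy - e_x * e_y)   # 0.1333... and -0.02666... (= 2/15 and -2/75)
```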
It might appear that the relationship in the insurance example is quite strong since Cov(X, Y) = 1875, whereas Cov(X, Y) = −2/75 in the nut example would seem
to imply quite a weak relationship. Unfortunately, the covariance has a serious defect
that makes it impossible to interpret a computed value. In the insurance example,
suppose we had expressed the deductible amount in cents rather than in dollars. Then
100X would replace X, 100Y would replace Y, and the resulting covariance would be
Cov(100X, 100Y) = (100)(100)Cov(X, Y) = 18,750,000. If, on the other hand, the deductible amount had been expressed in hundreds of dollars, the computed covariance would have been (.01)(.01)(1875) = .1875. The defect of covariance is that its
computed value depends critically on the units of measurement. Ideally, the choice
of units should have no effect on a measure of strength of relationship. This is
achieved by scaling the covariance.
Correlation
DEFINITION

The correlation coefficient of X and Y, denoted by Corr(X, Y), ρ_{X,Y}, or just ρ, is defined by

$$\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \cdot \sigma_Y}$$
Example 5.17
It is easily verified that in the insurance scenario of Example 5.15, E(X²) = 36,250, σ_X² = 36,250 − (175)² = 5625, σ_X = 75, E(Y²) = 22,500, σ_Y² = 6875, and σ_Y = 82.92. This gives

$$\rho = \frac{1875}{(75)(82.92)} = .301 \qquad \blacksquare$$
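A short Python sketch (illustrative only; it repeats the pmf of Example 5.15 and applies the shortcut formula Cov(X, Y) = E(XY) − μ_X·μ_Y) reproduces these quantities:

```python
pmf = {(100, 0): .20, (100, 100): .10, (100, 200): .20,
       (250, 0): .05, (250, 100): .15, (250, 200): .30}

mu_x, mu_y = 175, 125
e_xy = sum(x * y * p for (x, y), p in pmf.items())                    # 23,750
cov = e_xy - mu_x * mu_y                                              # 1875 (shortcut formula)
sd_x = (sum(x**2 * p for (x, _), p in pmf.items()) - mu_x**2) ** 0.5  # 75.0
sd_y = (sum(y**2 * p for (_, y), p in pmf.items()) - mu_y**2) ** 0.5  # 82.92...
print(cov / (sd_x * sd_y))                                            # 0.3015...
```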
The following proposition shows that ρ remedies the defect of Cov(X, Y) and also suggests how to recognize the existence of a strong (linear) relationship.
PROPOSITION
1. If a and c are either both positive or both negative,
   Corr(aX + b, cY + d) = Corr(X, Y)

2. For any two rv's X and Y, −1 ≤ Corr(X, Y) ≤ 1.
Statement 1 says precisely that the correlation coefficient is not affected by a linear change in the units of measurement (if, say, X = temperature in °C, then 9X/5 + 32 = temperature in °F). According to Statement 2, the strongest possible positive relationship is evidenced by ρ = +1, whereas the strongest possible negative relationship corresponds to ρ = −1. The proof of the first statement is sketched in Exercise 35, and that of the second appears in Supplementary Exercise 87 at the end of the chapter. For descriptive purposes, the relationship will be described as strong if |ρ| ≥ .8, moderate if .5 < |ρ| < .8, and weak if |ρ| ≤ .5.
If we think of p(x, y) or f(x, y) as prescribing a mathematical model for how the
two numerical variables X and Y are distributed in some population (height and weight,
verbal SAT score and quantitative SAT score, etc.), then ρ is a population characteristic or parameter that measures how strongly X and Y are related in the population.
In Chapter 12, we will consider taking a sample of pairs (x1, y1), . . . , (xn, yn) from the
population. The sample correlation coefficient r will then be defined and used to make
inferences about ρ.
The correlation coefficient is actually not a completely general measure of
the strength of a relationship.
PROPOSITION
1. If X and Y are independent, then ρ = 0, but ρ = 0 does not imply independence.
2. ρ = 1 or −1 iff Y = aX + b for some numbers a and b with a ≠ 0.
This proposition says that ρ is a measure of the degree of linear relationship between X and Y, and only when the two variables are perfectly related in a linear manner will ρ be as positive or negative as it can be. A ρ less than 1 in absolute value indicates only that the relationship is not completely linear, but there may still be a very strong nonlinear relation. Also, ρ = 0 does not imply that X and Y are independent, but only that there is complete absence of a linear relationship. When ρ = 0, X and Y are said to be uncorrelated. Two variables could be uncorrelated yet highly dependent because there is a strong nonlinear relationship, so be careful not to conclude too much from knowing that ρ = 0.
Example 5.18
Let X and Y be discrete rv’s with joint pmf
$$p(x, y) = \begin{cases} \dfrac{1}{4} & (x, y) = (-4, 1), (4, -1), (2, 2), (-2, -2) \\[4pt] 0 & \text{otherwise} \end{cases}$$
The points that receive positive probability mass are identified on the (x, y) coordinate system in Figure 5.5. It is evident from the figure that the value of X is completely
determined by the value of Y and vice versa, so the two variables are completely
dependent. However, by symmetry μ_X = μ_Y = 0 and E(XY) = (−4)(1/4) + (−4)(1/4) + (4)(1/4) + (4)(1/4) = 0, so Cov(X, Y) = E(XY) − μ_X · μ_Y = 0 and thus ρ_{X,Y} = 0. Although there is perfect dependence, there is also complete absence of any linear relationship!
Figure 5.5  The population of pairs for Example 5.18
■
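The zero covariance is easy to confirm directly from the four-point pmf; a minimal sketch (again purely illustrative) is:

```python
# Four equally likely points; X and Y are perfectly (but nonlinearly) dependent.
pmf = {(-4, 1): .25, (4, -1): .25, (2, 2): .25, (-2, -2): .25}

e_x  = sum(x * p for (x, _), p in pmf.items())      # 0.0
e_y  = sum(y * p for (_, y), p in pmf.items())      # 0.0
e_xy = sum(x * y * p for (x, y), p in pmf.items())  # 0.0
print(e_xy - e_x * e_y)                             # Cov(X, Y) = 0.0, so rho = 0
```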
A value of ρ near 1 does not necessarily imply that increasing the value of
X causes Y to increase. It implies only that large X values are associated with
large Y values. For example, in the population of children, vocabulary size and number of cavities are quite positively correlated, but it is certainly not true that cavities
cause vocabulary to grow. Instead, the values of both these variables tend to increase
as the value of age, a third variable, increases. For children of a fixed age, there is
probably a very low correlation between number of cavities and vocabulary size. In
summary, association (a high correlation) is not the same as causation.
EXERCISES
Section 5.2 (22–36)
22. An instructor has given a short quiz consisting of two parts.
For a randomly selected student, let X ϭ the number of points
earned on the first part and Y ϭ the number of points earned
on the second part. Suppose that the joint pmf of X and Y is
given in the accompanying table.
                       y
 p(x, y) |   0     5     10    15
---------+------------------------
      0  |  .02   .06   .02   .10
 x    5  |  .04   .15   .20   .10
     10  |  .01   .15   .14   .01
a. If the score recorded in the grade book is the total number of points earned on the two parts, what is the expected recorded score E(X + Y)?
b. If the maximum of the two scores is recorded, what is the
expected recorded score?
23. The difference between the number of customers in line at the express checkout and the number in line at the superexpress checkout in Exercise 3 is X1 − X2. Calculate the expected difference.
24. Six individuals, including A and B, take seats around a circular table in a completely random fashion. Suppose the seats
are numbered 1, . . . , 6. Let X = A's seat number and Y = B's
seat number. If A sends a written message around the table
to B in the direction in which they are closest, how many
individuals (including A and B) would you expect to handle
the message?
25. A surveyor wishes to lay out a square region with each side
having length L. However, because of measurement error, he
instead lays out a rectangle in which the north–south sides
both have length X and the east–west sides both have length
Y. Suppose that X and Y are independent and that each
is uniformly distributed on the interval [L − A, L + A] (where 0 < A < L). What is the expected area of the resulting rectangle?
26. Consider a small ferry that can accommodate cars and buses.
The toll for cars is $3, and the toll for buses is $10. Let X and
Y denote the number of cars and buses, respectively, carried
on a single trip. Suppose the joint distribution of X and Y is
as given in the table of Exercise 7. Compute the expected
revenue from a single trip.
27. Annie and Alvie have agreed to meet for lunch between noon
(0:00 P.M.) and 1:00 P.M. Denote Annie’s arrival time by X,
Alvie’s by Y, and suppose X and Y are independent with pdf’s
$$f_X(x) = \begin{cases} 3x^2 & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases} \qquad f_Y(y) = \begin{cases} 2y & 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$
What is the expected amount of time that the one who
arrives first must wait for the other person? [Hint: h(X, Y) = |X − Y|.]
28. Show that if X and Y are independent rv's, then E(XY) = E(X) · E(Y). Then apply this in Exercise 25. [Hint: Consider the continuous case with f(x, y) = fX(x) · fY(y).]
29. Compute the correlation coefficient for X and Y of
Example 5.16 (the covariance has already been computed).
30. a. Compute the covariance for X and Y in Exercise 22.
b. Compute ρ for X and Y in the same exercise.
31. a. Compute the covariance between X and Y in Exercise 9.
b. Compute the correlation coefficient for this X and Y.
32. Reconsider the minicomputer component lifetimes X and
Y as described in Exercise 12. Determine E(XY ). What can
be said about Cov(X, Y) and ρ?
33. Use the result of Exercise 28 to show that when X and Y are
independent, Cov(X, Y) = Corr(X, Y) = 0.
34. a. Recalling the definition of σ² for a single rv X, write a
formula that would be appropriate for computing the variance of a function h(X, Y) of two random variables.
[Hint: Remember that variance is just a special expected
value.]
b. Use this formula to compute the variance of the recorded
score h(X, Y) [= max(X, Y)] in part (b) of Exercise 22.
35. a. Use the rules of expected value to show that Cov(aX + b, cY + d) = ac Cov(X, Y).
b. Use part (a) along with the rules of variance and standard
deviation to show that Corr(aX + b, cY + d) = Corr(X, Y)
when a and c have the same sign.
c. What happens if a and c have opposite signs?
36. Show that if Y = aX + b (a ≠ 0), then Corr(X, Y) = +1 or −1. Under what conditions will ρ = +1?
5.3 Statistics and Their Distributions
The observations in a single sample were denoted in Chapter 1 by x1, x2, . . . , xn.
Consider selecting two different samples of size n from the same population distribution. The xi's in the second sample will virtually always differ at least a bit from those in the first sample. For example, a first sample of n = 3 cars of a particular type might result in fuel efficiencies x1 = 30.7, x2 = 29.4, x3 = 31.1, whereas a second sample may give x1 = 28.8, x2 = 30.0, and x3 = 31.1. Before we obtain data,
there is uncertainty about the value of each xi. Because of this uncertainty, before the
data becomes available we view each observation as a random variable and denote
the sample by X1, X2, . . . , Xn (uppercase letters for random variables).
This variation in observed values in turn implies that the value of any function
of the sample observations—such as the sample mean, sample standard deviation, or
sample fourth spread—also varies from sample to sample. That is, prior to obtaining
x1, . . . , xn, there is uncertainty as to the value of x̄, the value of s, and so on.
Example 5.19
Suppose that material strength for a randomly selected specimen of a particular type
has a Weibull distribution with parameter values α = 2 (shape) and β = 5 (scale).
The corresponding density curve is shown in Figure 5.6. Formulas from Section 4.5 give

μ = E(X) = 4.4311    μ̃ = 4.1628    σ² = V(X) = 5.365    σ = 2.316

The mean exceeds the median because of the distribution's positive skew.
Figure 5.6  The Weibull density curve for Example 5.19
We used MINITAB to generate six different samples, each with n = 10, from
this distribution (material strengths for six different groups of ten specimens each).
The results appear in Table 5.1, followed by the values of the sample mean, sample
median, and sample standard deviation for each sample. Notice first that the ten
observations in any particular sample are all different from those in any other sample. Second, the six values of the sample mean are all different from one another, as
are the six values of the sample median and the six values of the sample standard
deviation. The same is true of the sample 10% trimmed means, sample fourth spreads,
and so on.
Table 5.1  Samples from the Weibull Distribution of Example 5.19

 Observation |  Sample 1  Sample 2  Sample 3  Sample 4  Sample 5  Sample 6
      1      |   6.1171   5.07611   3.46710   1.55601   3.12372   8.93795
      2      |   4.1600   6.79279   2.71938   4.56941   6.09685   3.92487
      3      |   3.1950   4.43259   5.88129   4.79870   3.41181   8.76202
      4      |   0.6694   8.55752   5.14915   2.49759   1.65409   7.05569
      5      |   1.8552   6.82487   4.99635   2.33267   2.29512   2.30932
      6      |   5.2316   7.39958   5.86887   4.01295   2.12583   5.94195
      7      |   2.7609   2.14755   6.05918   9.08845   3.20938   6.74166
      8      |  10.2185   8.50628   1.80119   3.25728   3.23209   1.75468
      9      |   5.2438   5.49510   4.21994   3.70132   6.84426   4.91827
     10      |   4.5590   4.04525   2.12934   5.50134   4.20694   7.26081
     x̄       |   4.401    5.928     4.229     4.132     3.620     5.761
     x̃       |   4.360    6.144     4.608     3.857     3.221     6.342
     s       |   2.642    2.062     1.611     2.124     1.678     2.496
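An equivalent simulation can be run in almost any statistical environment. The sketch below uses NumPy rather than MINITAB (so the particular numbers will differ from Table 5.1 and also depend on the seed); it generates six samples of n = 10 from this Weibull distribution and prints each sample's mean, median, and standard deviation:

```python
import numpy as np

shape, scale = 2.0, 5.0                    # alpha = 2, beta = 5, as in Example 5.19
rng = np.random.default_rng(1)             # arbitrary seed; results vary with it

for i in range(6):                         # six samples, each of size n = 10
    x = scale * rng.weibull(shape, size=10)
    print(f"Sample {i + 1}: mean = {x.mean():.3f}, "
          f"median = {np.median(x):.3f}, s = {x.std(ddof=1):.3f}")
```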
Furthermore, the value of the sample mean from any particular sample can be
regarded as a point estimate (“point” because it is a single number, corresponding to
a single point on the number line) of the population mean μ, whose value is known
to be 4.4311. None of the estimates from these six samples is identical to what is
being estimated. The estimates from the second and sixth samples are much too
large, whereas the fifth sample gives a substantial underestimate. Similarly, the sample standard deviation gives a point estimate of the population standard deviation. All
six of the resulting estimates are in error by at least a small amount.
In summary, the values of the individual sample observations vary from sample
to sample, so in general the value of any quantity computed from sample data, and the
value of a sample characteristic used as an estimate of the corresponding population
characteristic, will virtually never coincide with what is being estimated.
■
DEFINITION
A statistic is any quantity whose value can be calculated from sample data.
Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore, a statistic is a random variable and will be denoted
by an uppercase letter; a lowercase letter is used to represent the calculated or
observed value of the statistic.
Thus the sample mean, regarded as a statistic (before a sample has been selected or an experiment carried out), is denoted by X̄; the calculated value of this statistic is x̄. Similarly, S represents the sample standard deviation thought of as a statistic, and its computed value is s. If samples of two different types of bricks are selected and the individual compressive strengths are denoted by X1, . . . , Xm and Y1, . . . , Yn, respectively, then the statistic X̄ − Ȳ, the difference between the two sample mean compressive strengths, is often of great interest.
Any statistic, being a random variable, has a probability distribution. In particular, the sample mean X̄ has a probability distribution. Suppose, for example, that n = 2 components are randomly selected and the number of breakdowns while under warranty is determined for each one. Possible values for the sample mean number of breakdowns X̄ are 0 (if X1 = X2 = 0), .5 (if either X1 = 0 and X2 = 1 or X1 = 1 and X2 = 0), 1, 1.5, . . . . The probability distribution of X̄ specifies P(X̄ = 0), P(X̄ = .5), and so on, from which other probabilities such as P(1 ≤ X̄ ≤ 3) and P(X̄ ≥ 2.5) can be calculated. Similarly, if for a sample of size n = 2 the only possible values of the sample variance are 0, 12.5, and 50 (which is the case if X1 and X2 can each take on only the values 40, 45, or 50), then the probability distribution of S² gives P(S² = 0), P(S² = 12.5), and P(S² = 50). The probability distribution of a statistic is sometimes referred to as its sampling distribution to emphasize that it describes how the statistic varies in value across all samples that might be selected.
Random Samples
The probability distribution of any particular statistic depends not only on the population distribution (normal, uniform, etc.) and the sample size n but also on the
method of sampling. Consider selecting a sample of size n = 2 from a population consisting of just the three values 1, 5, and 10, and suppose that the statistic of interest is the sample variance. If sampling is done "with replacement," then S² = 0 will result if X1 = X2. However, S² cannot equal 0 if sampling is "without replacement." So P(S² = 0) = 0 for one sampling method, and this probability is
positive for the other method. Our next definition describes a sampling method
often encountered (at least approximately) in practice.
DEFINITION
The rv’s X1, X2, . . . , Xn are said to form a (simple) random sample of size
n if
1. The Xi's are independent rv's.
2. Every Xi has the same probability distribution.
Conditions 1 and 2 can be paraphrased by saying that the Xi's are independent and identically distributed (iid). If sampling is either with replacement or from an infinite (conceptual) population, Conditions 1 and 2 are satisfied exactly. These conditions will be approximately satisfied if sampling is without replacement, yet the sample size n is much smaller than the population size N. In practice, if n/N ≤ .05 (at most 5% of the population is sampled), we can proceed as if the Xi's form a random sample. The virtue of this sampling method is that the probability distribution
of any statistic can be more easily obtained than for any other sampling method.
There are two general methods for obtaining information about a statistic’s
sampling distribution. One method involves calculations based on probability rules,
and the other involves carrying out a simulation experiment.
Deriving a Sampling Distribution
Probability rules can be used to obtain the distribution of a statistic provided that it
is a “fairly simple” function of the Xi's and either there are relatively few different X
values in the population or else the population distribution has a “nice” form. Our
next two examples illustrate such situations.
Example 5.20
A large automobile service center charges $40, $45, and $50 for a tune-up of four-, six-, and eight-cylinder cars, respectively. If 20% of its tune-ups are done on four-cylinder cars, 30% on six-cylinder cars, and 50% on eight-cylinder cars, then the probability distribution of revenue from a single randomly selected tune-up is given by

   x   |  40   45   50
 p(x)  |  .2   .3   .5        with μ = 46.5, σ² = 15.25        (5.2)
Suppose on a particular day only two servicing jobs involve tune-ups. Let X1 = the revenue from the first tune-up and X2 = the revenue from the second. Suppose that X1 and X2 are independent, each with the probability distribution shown in (5.2) [so that X1 and X2 constitute a random sample from the distribution (5.2)]. Table 5.2 lists possible (x1, x2) pairs, the probability of each [computed using (5.2) and the assumption of independence], and the resulting x̄ and s² values. Now to obtain the probability distribution of X̄, the sample average revenue per tune-up, we must consider each possible value x̄ and compute its probability. For example, x̄ = 45 occurs three times in the table with probabilities .10, .09, and .10, so

$$p_{\bar{X}}(45) = P(\bar{X} = 45) = .10 + .09 + .10 = .29$$

Similarly,

$$p_{S^2}(50) = P(S^2 = 50) = P(X_1 = 40, X_2 = 50 \text{ or } X_1 = 50, X_2 = 40) = .10 + .10 = .20$$
Table 5.2  Outcomes, Probabilities, and Values of x̄ and s² for Example 5.20

 x1  |  x2  | p(x1, x2) |  x̄    |  s²
 40  |  40  |    .04    | 40    |   0
 40  |  45  |    .06    | 42.5  | 12.5
 40  |  50  |    .10    | 45    |  50
 45  |  40  |    .06    | 42.5  | 12.5
 45  |  45  |    .09    | 45    |   0
 45  |  50  |    .15    | 47.5  | 12.5
 50  |  40  |    .10    | 45    |  50
 50  |  45  |    .15    | 47.5  | 12.5
 50  |  50  |    .25    | 50    |   0
The complete sampling distributions of X̄ and S² appear in (5.3) and (5.4).

    x̄       |  40    42.5   45    47.5   50
 p_X̄(x̄)     |  .04   .12    .29   .30    .25              (5.3)

    s²      |   0    12.5   50
 p_S²(s²)   |  .38   .42    .20                           (5.4)
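The tabulation leading to (5.3) and (5.4) can be automated; a brief enumeration sketch (illustrative only, using just the Python standard library) is:

```python
from collections import defaultdict
from itertools import product
from statistics import mean, variance

p = {40: .2, 45: .3, 50: .5}                 # single tune-up distribution (5.2)

xbar_pmf, s2_pmf = defaultdict(float), defaultdict(float)
for x1, x2 in product(p, repeat=2):          # all nine (x1, x2) pairs, independence
    prob = p[x1] * p[x2]
    xbar_pmf[mean([x1, x2])] += prob         # builds (5.3)
    s2_pmf[variance([x1, x2])] += prob       # builds (5.4); variance uses n - 1
print(dict(xbar_pmf))   # approx {40: .04, 42.5: .12, 45: .29, 47.5: .30, 50: .25}
print(dict(s2_pmf))     # approx {0: .38, 12.5: .42, 50: .20}
```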
Figure 5.7 pictures a probability histogram for both the original distribution (5.2) and the X̄ distribution (5.3). The figure suggests first that the mean (expected value) of the X̄ distribution is equal to the mean 46.5 of the original distribution, since both histograms appear to be centered at the same place.
Figure 5.7  Probability histograms for the underlying distribution and the X̄ distribution in Example 5.20
From (5.3),

$$\mu_{\bar{X}} = E(\bar{X}) = \sum \bar{x}\, p_{\bar{X}}(\bar{x}) = (40)(.04) + \cdots + (50)(.25) = 46.5 = \mu$$

Second, it appears that the X̄ distribution has smaller spread (variability) than the original distribution, since probability mass has moved in toward the mean. Again from (5.3),

$$\sigma^2_{\bar{X}} = V(\bar{X}) = \sum \bar{x}^2 \cdot p_{\bar{X}}(\bar{x}) - \mu^2_{\bar{X}} = (40)^2(.04) + \cdots + (50)^2(.25) - (46.5)^2 = 7.625 = \frac{15.25}{2} = \frac{\sigma^2}{2}$$

The variance of X̄ is precisely half that of the original variance (because n = 2). The mean value of S² is

$$\mu_{S^2} = E(S^2) = \sum s^2 \cdot p_{S^2}(s^2) = (0)(.38) + (12.5)(.42) + (50)(.20) = 15.25 = \sigma^2$$
That is, the X̄ sampling distribution is centered at the population mean μ, and the S² sampling distribution is centered at the population variance σ².
If four tune-ups had been done on the day of interest, the sample average revenue X̄ would be based on a random sample of four Xi's, each having the distribution (5.2). More calculation eventually yields the pmf of X̄ for n = 4 as
    x̄     |  40      41.25   42.5    43.75   45      46.25   47.5    48.75   50
 p_X̄(x̄)   |  .0016   .0096   .0376   .0936   .1761   .2340   .2350   .1500   .0625
From this, μ_X̄ = 46.50 = μ and σ²_X̄ = 3.8125 = σ²/4. Figure 5.8 is a probability histogram of this pmf.
Figure 5.8  Probability histogram for X̄ based on n = 4 in Example 5.20
■
Example 5.20 should suggest first of all that the computation of p_X̄(x̄) and p_{S²}(s²) can be tedious. If the original distribution (5.2) had allowed for more than three possible values 40, 45, and 50, then even for n = 2 the computations would have been more involved. The example should also suggest, however, that there are some general relationships between E(X̄), V(X̄), E(S²), and the mean μ and variance σ² of the original distribution. These are stated in the next section. Now consider an example in which the random sample is drawn from a continuous distribution.
Example 5.21
Service time for a certain type of bank transaction is a random variable having an exponential distribution with parameter λ. Suppose X1 and X2 are service times for two different customers, assumed independent of each other. Consider the total service time To = X1 + X2 for the two customers, also a statistic. The cdf of To is, for t ≥ 0,

$$F_{T_o}(t) = P(X_1 + X_2 \le t) = \iint_{\{(x_1, x_2):\, x_1 + x_2 \le t\}} f(x_1, x_2)\, dx_1\, dx_2$$

$$= \int_0^t \int_0^{t - x_1} \lambda e^{-\lambda x_1} \cdot \lambda e^{-\lambda x_2}\, dx_2\, dx_1 = \int_0^t \left[\lambda e^{-\lambda x_1} - \lambda e^{-\lambda t}\right] dx_1 = 1 - e^{-\lambda t} - \lambda t e^{-\lambda t}$$
The region of integration is pictured in Figure 5.9.
Figure 5.9  Region of integration to obtain the cdf of To in Example 5.21 (the triangle bounded by the axes and the line x1 + x2 = t)
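The closed form can be checked against a simulation; the sketch below (illustrative, with an arbitrarily chosen λ and t, and using NumPy's exponential generator, whose argument is the scale 1/λ) compares the empirical and exact values of P(To ≤ t):

```python
import numpy as np

lam, t = 2.0, 1.5                                # arbitrary illustrative values
rng = np.random.default_rng(0)

x1 = rng.exponential(1 / lam, size=1_000_000)    # exponential with rate lam
x2 = rng.exponential(1 / lam, size=1_000_000)
empirical = np.mean(x1 + x2 <= t)                # simulated P(To <= t)
exact = 1 - np.exp(-lam * t) - lam * t * np.exp(-lam * t)
print(empirical, exact)                          # both approximately 0.801
```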