CHAPTER 14
Making Use of Associations Tests

Learning Objectives
• To learn what is meant by an "association" between two variables
• To examine various relationships that may be construed as associations
• To understand where and how cross-tabulations with Chi-square analysis are applied
• To become familiar with the use and interpretation of correlations
• To learn how to obtain and interpret cross-tabulations, Chi-square findings, and correlations with SPSS

"Where We Are"
1 Establish the need for marketing research.
2 Define the problem.
3 Establish research objectives.
4 Determine research design.
5 Identify information types and sources.
6 Determine methods of accessing data.
7 Design data collection forms.
8 Determine the sample plan and size.
9 Collect data.
10 Analyze data.
11 Prepare and present the final research report.

Significant Associations Can Help Managers Make Better Decisions

Sima Vasa, Partner and CEO, Paradigm Sample™

Sima Vasa is Partner and Chief Executive Officer of Paradigm Sample. She has 20 years of experience in building and growing market research businesses, specifically in the technology space. Positions she has held include Senior Partner at Momentum Market Intelligence, President of NPD Techworld at the NPD Group, Vice President of the Technology Division at NPD, and a member of the IBM Market Intelligence Group. During her tenure at NPD, Vasa spearheaded a $19 million business unit revenue increase in three years to $40 million by refocusing priorities and aggressively expanding the business portfolio.
As humans we are always searching for associations in life. What types
of recreational activities give you the most enjoyment? What types of foods
are tasteful to you? By knowing these associations, we can add enjoyment to our lives. Marketing managers are also searching for associations.
Which advertising copy will be associated with the highest awareness level
of the advertised brand? Which type of salesperson compensation and
reward packages will result in the highest level of satisfaction and the lowest turnover? By understanding these associations, marketers can achieve
higher sales and net profits while keeping costs low. This helps them earn
a higher return on assets (ROA), which in turn gives the owners of the firm a higher return on net worth (RONW).
Marketing managers learn these associations through trial and error,
just as you learn which activities give you the most pleasure or which
foods you prefer. However, sometimes the learning process takes too
long or being wrong is too costly in a business situation. Trial and error
is not always the best way to learn. In these situations a manager can
obtain information and examine it to see if there are associations. But,
what if information collected yesterday shows there is
an association, as in when ad copy B has the highest
awareness scores? How do we know that pattern of association will occur if we collect the information tomorrow? Or, next month? Is that pattern of association a
true pattern that exists in the population or only in the
one sample of information we collected? Fortunately, statisticians have given us ways to answer these questions. You will learn how to find significant associations by reading this chapter.

Text and Images: By permission, Sima Vasa, Paradigm Sample™
Paradigm Sample provides market research insight through innovative data access to
high-value audiences based on exclusive panels and partner panels along with proven mobile
technologies. To learn about Paradigm Sample’s innovative IdeaShifters and GlobalShifters
panels, refer to www.paradigmsample.com.
This chapter illustrates the usefulness of statistical analyses beyond simple descriptive
measures, statistical inference, and differences tests. Often, as we have described in the
opening comments of this chapter, marketers are interested in relationships among variables. For example, Frito-Lay wants to know what kinds of people and under what circumstances these people choose to buy Cheetos, Fritos, Lay’s potato chips, and any of the other
items in the Frito-Lay line. The Chevrolet Division of General Motors wants to know what
types of individuals would respond favorably to the various style changes proposed for the
Cruze. A newspaper wants to understand
the lifestyle characteristics of its subscribers so that it can modify or change
sections in the newspaper to better suit
its audience. Furthermore, the newspaper
desires information about various types
of subscribers so it can communicate this
information to its advertisers, helping
them in copy design and advertisement
placement within the various newspaper
sections. For all of these cases, statistical
procedures called associative analyses
are available to help identify answers to
these questions. Associative analyses
determine whether a stable relationship
exists between two variables; they are the
central topic of this chapter.
Associative analyses determine whether a stable relationship exists between two variables.

We begin the chapter by describing the four different types of relationships possible between two variables. Then
we describe cross-tabulations and indicate how a cross-tabulation can be used to determine
whether a statistically significant association exists between the two variables. From cross-tabulations, we move to a general discussion of correlation coefficients, and we illustrate the
use of Pearson product moment correlations. As in our previous analysis chapters, we show
the SPSS steps to perform these analyses and the resulting output.
Types of Relationships Between Two Variables
A relationship is a
consistent and systematic
linkage between the levels
or labels for two variables.
A nonmonotonic
relationship means two
variables are associated
but only in a general
sense.
As you learned in Chapter 8, every scale has unique descriptors, called levels or labels, that
identify the different demarcations of that scale. The term levels implies that the variable is
either an interval or a ratio scale, while the term labels implies that the level of measurement is not scale, typically nominal. An example of a simple label is "yes" or "no," as when a respondent is labeled as a buyer (yes) or nonbuyer (no) of a particular product or service. Of course, if the
researcher measures how many times a respondent bought a product, the level would be the
number of times, and the scale would satisfy the assumptions of a ratio scale.
A relationship is a consistent and systematic linkage between the levels for two scale
variables or between the labels for two nominal variables. This linkage is statistical, not
necessarily causal. A causal linkage is one in which it is certain one variable affected the
other; with a statistical linkage, there is no certainty because some other variable might
have had some influence. Nonetheless, statistical linkages or relationships often provide
insights that lead to understanding even though we do not know if there is a cause-and-effect relationship. For example, if we found a relationship that most marathon runners purchase "Vitaminwater," we would understand that its ingredients are important to marathoners. Associative analysis procedures are useful because they determine if there is a
consistent and systematic relationship between the presence (label) or amount (level) of
one variable and the presence (label) or amount (level) of another variable. There are four
basic types of relationships between two variables: nonmonotonic, monotonic, linear, and
curvilinear. A discussion of each follows.
Nonmonotonic Relationships
A nonmonotonic relationship is one in which the presence (or absence) of the label for
one variable is systematically associated with the presence (or absence) of the label for
another variable. The term nonmonotonic means essentially that there
is no discernible direction to the relationship, but a relationship exists.
For example, McDonald's, Burger King, and Wendy's all know from experience that morning customers typically purchase coffee whereas noon customers typically purchase soft drinks. The relationship is in no way exclusive; there is no guarantee that a morning customer will always order coffee or that an afternoon customer will always order a soft drink. In general, though, this relationship exists, as can be seen in Figure 14.1. The nonmonotonic relationship is simply that the morning customer tends to purchase breakfast foods such as eggs, biscuits, and coffee, and the afternoon customers tend to purchase lunch items such as burgers, fries, and soft drinks. So, the "morning" label is associated with the "coffee" label while the "noon" label is associated with the "soft drink" label. In other words, with a nonmonotonic relationship, when you find the presence of one label for a variable, you tend to find the presence of another specific label of another variable: breakfast diners typically order coffee. But the association is general, and we must state each one by spelling it out verbally. In other words, we know only the general pattern of presence or nonpresence with a nonmonotonic relationship. We show you how to measure nonmonotonic relationships using Chi-square analysis later in this chapter.

FIGURE 14.1 Example of a Nonmonotonic Relationship Between Drink Orders and Meal Type at McDonald's (chart of drink orders: coffee dominates at breakfast, soft drinks dominate at lunch)
Monotonic Relationships
In monotonic relationships, the researcher can assign
a general direction to the association between two variables. There are two types of monotonic relationships:
increasing and decreasing. Monotonic increasing relationships are those in which one variable increases as the
other variable increases. As you would guess, monotonic
decreasing relationships are those in which one variable
increases as the other variable decreases. You should note
that in neither case is there any indication of the exact
amount of change in one variable as the other changes.
Monotonic means that the relationship can be described
only in a general directional sense. Beyond this, precision in the description is lacking. For example, if a company increases its advertising, we would expect its sales
to increase, but we do not know the amount of the sales
increase. Monotonic relationships are not within the scope of this textbook, so we will simply mention them here.
That higher SPF sunscreen blocks more ultraviolet rays is a monotonic relationship, meaning that the relationship is not exact.
Linear Relationships
Next, we turn to a more precise relationship—and one that is very easy to envision. A linear
relationship is a “straight-line association” between two scale variables. Here, knowledge
of the amount of one variable will automatically yield knowledge of the amount of the other
variable as a consequence of applying the linear or straight-line formula that is known to exist
between them. In its general form, a straight-line formula is as follows:
A monotonic relationship
means you know the
general direction
(increasing or decreasing)
of the relationship
between two variables.
Formula for a straight line
y = a + bx
where
y = the dependent variable being estimated or predicted
a = the intercept
b = the slope
x = the independent variable used to predict the dependent variable
The terms intercept and slope should be familiar to you, but if they are a bit hazy, do not
be concerned as we describe the straight-line formula in detail in the next chapter. We also
clarify the terms independent and dependent in Chapter 15.
It should be apparent that a linear relationship is much more precise and contains a great
deal more information than does a nonmonotonic or a monotonic relationship. By simply substituting the values of a and b, an exact amount can be determined for y given any value of x. For example, if Jack-in-the-Box estimates that every customer will spend about $9 per lunch visit,
it is easy to use a linear relationship to estimate how many dollars of revenue will be associated
with the number of customers for any given location. The following equation would be used:
Straight-line formula example
y = 0 + 9 times number of customers
In this example, x is the number of customers. If 100 customers come to a Jack-in-the-Box location, the associated expected total revenue would be $0 plus $9 times 100, or $900. If 200 customers are expected to visit the location, the expected total revenue would be $0 plus $9 times 200, or $1,800. To be sure, the Jack-in-the-Box location would not derive exactly $1,800 for 200 customers, but the linear relationship shows what is expected to happen, on average. In subsequent sections of the textbook, we describe correlation and regression analysis, both of which rely on linear relationships.

A linear relationship means the two variables have a "straight-line" relationship.

Linear relationships are quite precise.

A curvilinear relationship means some smooth curve pattern describes the association.
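As a quick illustration of the straight-line formula in the Jack-in-the-Box example above, here is a minimal sketch in Python. The function name and the customer counts beyond 100 and 200 are illustrative assumptions; the intercept ($0) and slope ($9 per customer) come from the example.

```python
def expected_revenue(customers, intercept=0.0, slope=9.0):
    """Apply the straight-line formula y = a + bx to estimate revenue.

    The intercept (a) and slope (b) reflect the Jack-in-the-Box example:
    $0 base revenue plus about $9 per lunch customer.
    """
    return intercept + slope * customers

# A few illustrative customer counts (x values)
for x in (100, 200, 350):
    print(f"{x} customers -> expected revenue ${expected_revenue(x):,.2f}")
# 100 customers -> expected revenue $900.00
# 200 customers -> expected revenue $1,800.00
# 350 customers -> expected revenue $3,150.00
```

Because the relationship is linear, doubling the number of customers exactly doubles the expected revenue; actual locations will scatter around these average values.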
Curvilinear Relationships
Finally, curvilinear relationships are those in which one variable is associated with another
variable, but the relationship is described by a curve rather than a straight line. In other words,
the formula for a curved relationship is used rather than the formula for a straight line. Many
curvilinear patterns are possible. The relationship may be an S-shape, a J-shape, or some
other curved-shape pattern. Curvilinear relationships are beyond the scope of this text; nonetheless, it is important to list them as a type of relationship that can be investigated through the
use of special-purpose statistical procedures.
Characterizing Relationships Between Variables
Depending on its type, a relationship can usually be characterized in three ways: by its
presence, direction, and strength of association. We need to describe these before taking up
specific statistical analyses of associations between two variables.
The presence of a
relationship between two
variables is determined by
a statistical test.
Direction means that you
know if the relationship is
positive or negative, while
pattern means you know
the general nature of the
relationship.
Strength means you
know how consistent the
relationship is.
Presence
Presence refers to the finding that a systematic relationship exists between the two variables
of interest in the population. Presence is a statistical issue. By this statement, we mean that the
marketing researcher relies on statistical significance tests to determine if there is sufficient
evidence in the sample to support the claim that a particular association is present in the population. The chapter on statistical inference introduced the concept of a null hypothesis. With
associative analysis, the null hypothesis states there is no association (relationship) present in
the population and the appropriate statistical test is applied to test this hypothesis. If the test
results reject the null hypothesis, then we can state that an association (relationship) is present in the population (at a certain level of confidence). We describe the statistical tests used in
associative analysis later in this chapter.
Direction (or Pattern)
You have seen that in the cases of monotonic and linear relationships, associations may be
described with regard to direction. For a linear relationship, if b (slope) is positive, then the
linear relationship is increasing; if b is negative, then the linear relationship is decreasing.
So the direction of the relationship is straightforward with linear relationships.
For nonmonotonic relationships, positive or negative direction is inappropriate, because
we can only describe the pattern verbally.1 It will soon become clear to you that the scaling
assumptions of variables having nonmonotonic association negate the directional aspects
of the relationship. Nevertheless, we can verbally describe the pattern of the association
as we have in our examples using presence or absence, and that statement substitutes
for direction.
Strength of Association
Finally, when present (that is, when statistically significant), the association between two variables can be described in terms of its strength, commonly using words such as "strong,"
“moderate,” “weak,” or some similar characterization. That is, when a consistent and systematic association is found to be present between two variables, it is then up to the marketing
researcher to ascertain the strength of association. Strong associations are those in which
there is a high probability of the two variables exhibiting a dependable relationship, regardless of the type of relationship being analyzed. A low degree of association, on the other hand,
is one in which there is a low probability of the two variables exhibiting a dependable relationship. The relationship exists between the variables, but it is less evident.

TABLE 14.1 Step-by-Step Procedure for Analyzing Relationships

Step 1. Choose variables to analyze.
Description: Identify which variables you think might be related.

Step 2. Determine the scaling assumptions of the chosen variables.
Description: For purposes of this chapter, both must be either scale (interval or ratio) or categorical (nominal) variables.

Step 3. Use the correct relationship analysis.
Description: For two nominal variables (distinct categories), use cross-tabulation; for two scale variables, use correlation.

Step 4. Determine if the relationship is present.
Description: If the analysis shows the relationship is statistically significant, it is present.

Step 5. If present, determine the direction of the relationship.
Description: A linear (scale variables) relationship will be either increasing or decreasing; a nonmonotonic relationship (nominal scales) will require looking for a pattern.

Step 6. If present, assess the strength of the relationship.
Description: With correlation, the size of the coefficient denotes strength; with cross-tabulation, the pattern is subjectively assessed.
There is an orderly procedure for determining the presence, direction, and strength of a
relationship, which is outlined in Table 14.1. As can be seen in the table, you must first decide what type of relationship can exist between the two variables of interest: nonmonotonic
or linear. As you will learn in this chapter, the answer to this question depends on the scaling
assumptions of the variables; as we illustrated earlier, nominal scales can embody only imprecise, pattern-like relationships, but scale variables (interval or ratio scales) can incorporate
very precise and linear relationships. Once you identify the appropriate relationship type as
either nonmonotonic or linear, the next step is to determine whether that relationship actually
exists in the population you are analyzing. This step requires a statistical test, and again, we
describe the proper tests beginning with the next section of this chapter.
When you determine that a true relationship does exist in the population by means of the
correct statistical test, you then establish its direction or pattern. Again, the type of relationship dictates how you describe its direction. You might have to inspect the relationship in a
table or graph, or you might need only to look for a positive or negative sign before the computed statistic. Finally, the strength of the relationship remains to be judged. Some associative
analysis statistics, such as correlations, indicate the strength in a straightforward manner—
that is, just by their absolute size. With nominal-scaled variables, however, you must inspect
the pattern to judge the strength. We describe this procedure—the use of cross-tabulations—
next, and we describe correlation analysis later in this chapter.
Based on scaling
assumptions, first
determine the type of
relationship, and then
perform the appropriate
statistical test.
Cross-Tabulations
Cross-tabulation and the associated Chi-square value we are about to explain are used to assess
if a nonmonotonic relationship exists between two nominally scaled variables. Remember that
nonmonotonic relationships are those in which the presence of the label for one nominally scaled
variable coincides with the presence or absence of the label for another nominally scaled variable
such as lunch buyers ordering soft drinks with their meals. (Actually, cross-tabulation can be used
for any two variables with well-defined labels, but it is best demonstrated with nominal variables.)
Cross-Tabulation Analysis
A cross-tabulation consists of rows and columns defined by the categories classifying each variable.

When investigating the relationship between two nominally scaled variables, we typically use "cross-tabs," or a cross-tabulation table, defined as a table in which data is
compared using a row and column format. A cross-tabulation table is sometimes referred to
as an “r × c” (or r-by-c) table because it is comprised of rows by columns. The intersection
of a row and a column is called a cross-tabulation cell. As an example, let’s take a survey where there are two types of individuals: buyers of Michelob Light beer and nonbuyers
of Michelob Light beer. There are also two types of occupations: professional workers who
might be called "white collar" employees and manual workers who are sometimes referred to as "blue collar" workers. There is no requirement that the number of rows and columns
are equal; we are just using a 2 × 2 cross-tabulation to keep the example as simple as possible. A cross-tabulation table for our Michelob Light beer survey is presented in Table 14.2.
The columns are in vertical alignment and are indicated in this table as either “Buyer” or
“Nonbuyer” of Michelob Light, whereas the rows are indicated as “White Collar” or “Blue
Collar” for occupation. Additionally, there is a “Totals” column and row.
A cross-classification
table can have four types
of numbers in each cell:
frequency, raw percentage,
column percentage, and
row percentage.
Raw percentages are cell
frequencies divided by the
grand total.
Types of Frequencies and Percentages in a Cross-Tabulation Table
Look at the frequencies table in Table 14.2A. It shows how the cell counts sum to the row, column, and grand totals, which will help you learn the terminology and understand how the numbers are computed. The frequencies table contains the raw numbers determined from the preliminary tabulation.2 The upper left-hand cell number is a frequency cell that counts people in the sample who are both
white-collar workers and buyers of Michelob Light (152), and the cell frequency to its right
identifies the number of individuals who are white-collar workers who do not buy Michelob
Light (8). These cell numbers represent raw counts or frequencies—that is, the number of
respondents who possess the quality indicated by the row label as well as the quality indicated
by the column label. The cell frequencies can be summed to determine the row totals and
column totals. For example, Buyer/White Collar (152) and Nonbuyer/White Collar (8) sum to
160, while Buyer/White Collar (152) and Buyer/Blue Collar (14) sum to 166. Similarly, the
row and column totals sum to equal the grand total of 200. Take a few minutes to become familiar
with the terms and computations in the frequencies table as they will be referred to in the
following discussion.
Table 14.2B illustrates how at least three different sets of percentages can be computed
for cells in the table. These three percentages tables are: the raw percentages table, the column
percentages table, and the row percentages table.
The first table in Table 14.2B shows that the raw frequencies can be converted to raw
percentages by dividing each by the grand total. The raw percentages table contains the percentages of the raw frequency numbers just discussed. The grand total location now has 100%
TABLE 14.2A Cross-Tabulation Frequencies Table for a Michelob Light Survey

Frequencies Table
                                      Type of Buyer
Occupational Status       Buyer     Nonbuyer     Totals (Row Totals)
White Collar               152          8          160
Blue Collar                 14         26           40
Totals (Column Totals)     166         34          200 (Grand Total)
TABLE 14.2B Cross-Tabulation Percentages Tables for a Michelob Light Survey

Raw Percentages Table
Occupational Status    Buyer            Nonbuyer         Totals
White Collar           76% (152/200)     4% (8/200)      80% (160/200)
Blue Collar             7% (14/200)     13% (26/200)     20% (40/200)
Totals                 83% (166/200)    17% (34/200)    100% (200/200)

Column Percentages Table
Occupational Status    Buyer            Nonbuyer         Totals
White Collar           92% (152/166)    24% (8/34)       80% (160/200)
Blue Collar             8% (14/166)     76% (26/34)      20% (40/200)
Totals                100% (166/166)   100% (34/34)     100% (200/200)

Row Percentages Table
Occupational Status    Buyer            Nonbuyer         Totals
White Collar           95% (152/160)     5% (8/160)     100% (160/160)
Blue Collar            35% (14/40)      65% (26/40)     100% (40/40)
Totals                 83% (166/200)    17% (34/200)    100% (200/200)
(or 200/200) of the grand total. Above it are 80% and 20% for the raw percentages of white-collar and blue-collar respondents, respectively, in the sample. Compute a couple of the cells just to verify that you understand how they are derived. For instance, 152 ÷ 200 = 76%. The row and column totals are simply the sums of the corresponding cell percentages.

Two additional cross-tabulation tables can be presented, and these are more valuable in revealing underlying relationships. The column percentages table divides each raw frequency by its column total raw frequency. That is, the formula is as follows:

Formula for a column cell percent:
Column cell percent = Cell frequency ÷ Total of cell frequencies in that column

For instance, it is apparent that of the nonbuyers, 24% are white-collar and 76% are blue-collar. Note the reverse pattern for the buyers group: 92% of the buyers are white-collar and only 8% are blue-collar. You are beginning to see the nonmonotonic relationship: in the presence of "white collar" we have the presence of buying.
The row percentages table presents the data with the row totals as the 100% base for
each. That is, a row cell percentage is computed as follows:
Formula for a row cell percent:
Row cell percent = Cell frequency ÷ Total of cell frequencies in that row

Row (column) percentages are row (column) cell frequencies divided by the row (column) total.
Now, it is possible to see that, of the white-collar respondents, 95% are buyers and 5%
are nonbuyers. As you compare the row percentages table to the column percentages table,
you should detect the relationship between occupational status and Michelob Light beer preference. Can you state it at this time?
Unequal percentage concentrations of individuals in a few cells, as we have in this example, illustrate the possible presence of a nonmonotonic association. If we had found that approximately 25% of the sample had fallen in each of the four cells, no relationship would be found to exist; it would be equally probable for a white-collar or blue-collar worker to be either a buyer or nonbuyer. However, the large concentrations of individuals in two particular cells here suggest that there is a high probability that a buyer of Michelob Light beer is also a white-collar worker, and there is also a tendency for nonbuyers to work in blue-collar occupations. In other words, there is probably an association between occupational status and the
beer-buying behavior of individuals in the population represented by this sample. However,
as noted in step 4 of our procedure for analyzing relationships (Table 14.1), we must test the
statistical significance of the apparent relationship before we can say anything more about it.
The test will tell us if this pattern would occur again if we repeated the study.
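As a rough illustration of how the tables in Tables 14.2A and 14.2B can be produced from raw survey data, the sketch below uses Python with pandas (an assumed dependency; the chapter itself uses SPSS's Crosstabs procedure for this). The individual responses are reconstructed here simply to match the cell counts of the Michelob Light example.

```python
import pandas as pd

# Rebuild responses to match the cell counts in Table 14.2A:
# 152 white-collar buyers, 8 white-collar nonbuyers,
# 14 blue-collar buyers, 26 blue-collar nonbuyers (n = 200).
rows = (
    [("White Collar", "Buyer")] * 152 + [("White Collar", "Nonbuyer")] * 8 +
    [("Blue Collar", "Buyer")] * 14 + [("Blue Collar", "Nonbuyer")] * 26
)
data = pd.DataFrame(rows, columns=["Occupation", "Buyer Status"])

# Frequencies table (Table 14.2A), with row and column totals
print(pd.crosstab(data["Occupation"], data["Buyer Status"], margins=True))

# Raw percentages: each cell divided by the grand total
print(pd.crosstab(data["Occupation"], data["Buyer Status"], normalize="all") * 100)

# Column percentages: each cell divided by its column total
print(pd.crosstab(data["Occupation"], data["Buyer Status"], normalize="columns") * 100)

# Row percentages: each cell divided by its row total
print(pd.crosstab(data["Occupation"], data["Buyer Status"], normalize="index") * 100)
```

The column percentages reproduce the 92%/8% and 24%/76% splits discussed above, and the row percentages reproduce the 95%/5% and 35%/65% splits.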
Chi-Square Analysis
Chi-square analysis
assesses the statistical
significance of
nonmonotonic
associations in cross-tabulation tables.
Expected frequencies
are calculated based on
the null hypothesis of
no association between
the two variables under
investigation.
Chi-square (χ²) analysis is the examination of frequencies for two nominal-scaled variables in a
cross-tabulation table to determine whether the variables have a statistically significant nonmonotonic relationship.3 The formal procedure for Chi-square analysis begins when the researcher formulates a statistical null hypothesis that the two variables under investigation are not associated in
the population. Actually, it is not necessary for the researcher to state this hypothesis in a formal
sense, for Chi-square analysis always implicitly takes this hypothesis into account. In other words,
whenever we use Chi-square analysis explicitly with a cross-tabulation, we always begin with the
assumption that no association exists between the two nominal-scaled variables under analysis.4
Observed and Expected Frequencies
The statistical procedure is as follows. The cross-tabulation table in Table 14.2A contains
observed frequencies, which are the actual cell counts in the cross-tabulation table. These
observed frequencies are compared to expected frequencies, which are defined as the theoretical frequencies that are derived from the null hypothesis of no association between the two
variables. The degree to which the observed frequencies depart from the expected frequencies
is expressed in a single number called the Chi-square statistic. The computed Chi-square
statistic is then compared to a table Chi-square value (at a chosen level of significance) to
determine whether the computed value is significantly different from zero.
The expected frequencies are those that would be found if there were no association between the two variables. Remember, this is the null hypothesis. About the only “difficult” part
of Chi-square analysis is in the computation of the expected frequencies. The computation is
accomplished using the following equation:
Formula for an expected cross-tabulation cell frequency:
Expected cell frequency = (Cell column total × Cell row total) ÷ Grand total
The application of this equation generates a number for each cell that would occur if no
association existed. Returning to our Michelob Light beer example, you were told that 160
white-collar and 40 blue-collar consumers had been sampled, and it was found that there
were 166 buyers and 34 nonbuyers of Michelob Light. The expected frequency for each cell, assuming no association, is calculated with the expected cell frequency formula as follows:
Calculations of expected cell frequencies using the Michelob Light beer example:

White-collar buyer    = (160 × 166) ÷ 200 = 132.8
White-collar nonbuyer = (160 × 34) ÷ 200 = 27.2
Blue-collar buyer     = (40 × 166) ÷ 200 = 33.2
Blue-collar nonbuyer  = (40 × 34) ÷ 200 = 6.8

Notes: Buyers total = 166; Nonbuyers total = 34; White-collar total = 160; Blue-collar total = 40; Grand total = 200.
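The expected-frequency formula is simply the outer product of the row totals and the column totals divided by the grand total. Here is a minimal sketch, assuming numpy as a dependency, using the totals listed in the notes above.

```python
import numpy as np

row_totals = np.array([160, 40])    # White Collar, Blue Collar
col_totals = np.array([166, 34])    # Buyer, Nonbuyer
grand_total = row_totals.sum()      # 200 (equals col_totals.sum())

# Expected cell frequency = (cell column total * cell row total) / grand total
expected = np.outer(row_totals, col_totals) / grand_total
print(expected)
# [[132.8  27.2]
#  [ 33.2   6.8]]
```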
The Computed χ² Value
Next, compare the observed frequencies to these expected frequencies. The formula for this
computation is as follows:
Chi-square formula:

χ² = Σ [ (Observedᵢ − Expectedᵢ)² ÷ Expectedᵢ ], with the sum taken over all cells i = 1 to n

where
Observedᵢ = observed frequency in cell i
Expectedᵢ = expected frequency in cell i
n = number of cells
Applied to our Michelob Light beer example (the observed frequencies are in Table 14.2A; the expected frequencies are computed above):

Calculation of Chi-square value (Michelob example):

χ² = (152 − 132.8)²/132.8 + (8 − 27.2)²/27.2 + (14 − 33.2)²/33.2 + (26 − 6.8)²/6.8 = 81.64
You can see from the equation that each observed frequency is compared (via subtraction) to its expected frequency, and the difference is squared to eliminate negative values and avoid the cancellation effect. This squared value is divided by the expected frequency to adjust for cell size differences, and
these amounts are summed across all of the cells. If there are many large deviations of observed
frequencies from the expected frequencies, the computed Chi-square value will increase; but if
there are only a few slight deviations from the expected frequencies, the computed Chi-square
number will be small. In other words, the computed Chi-square value is really a summary indication of how far away from the expected frequencies the observed frequencies are found to be. As
such, it expresses the departure of the sample findings from the null hypothesis of no association.
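Here is a minimal sketch, assuming numpy, that applies the Chi-square formula to the observed frequencies from Table 14.2A and the expected frequencies computed above; it reproduces the value of 81.64 (to rounding).

```python
import numpy as np

observed = np.array([[152.0, 8.0],    # White Collar: Buyer, Nonbuyer
                     [14.0, 26.0]])   # Blue Collar:  Buyer, Nonbuyer
expected = np.array([[132.8, 27.2],
                     [33.2, 6.8]])

# Chi-square = sum over all cells of (observed - expected)^2 / expected
chi_square = ((observed - expected) ** 2 / expected).sum()
print(round(chi_square, 2))  # 81.64
```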
The Chi-Square Distribution
Now that you’ve learned how to calculate a Chi-square value, you need to know if it is statistically significant. In previous chapters, we described how the normal curve or z distribution, the F distribution, and Student’s t distribution, all of which exist in tables, are used
The computed Chi-square
value compares observed
to expected frequencies.
The Chi-square statistic
summarizes how far
away from the expected
frequencies the observed
cell frequencies are found
to be.
388
Chapter 14 • Making Use of assoCiations tests
Do these blue-collar workers want their boss to buy them a
Michelob Light for doing a great job? Cross-tabulation can
answer this question.
Photo: AISPIX by Image Source/Shutterstock
Formula for Chi-Square
degrees of freedom
by a computer statistical program to determine level of
significance. Chi-square analysis requires the use of a
different distribution. The Chi-square distribution is
skewed to the right, and the rejection region is always
at the right-hand tail of the distribution. It differs from
the normal and t distributions in that it changes its shape
depending on the situation at hand, and it does not have
negative values. Figure 14.2 shows examples of two
Chi-square distributions.
The Chi-square distribution’s shape is determined by
the number of degrees of freedom. The figure shows that
the more the degrees of freedom, the more the curve’s
tail is pulled to the right. In other words, the more the
degrees of freedom, the larger the Chi-square value must
be to fall in the rejection region for the null hypothesis.
It is a simple matter to determine the number of degrees of freedom. In a cross-tabulation table, the degrees
of freedom are found through the following formula:
Formula for Chi-square degrees of freedom:
Degrees of freedom = (r − 1)(c − 1)
where
r = the number of rows
c = the number of columns
The Chi-square
distribution’s shape
changes depending on
the number of degrees
of freedom.
A table of Chi-square values contains critical points that determine the break between
acceptance and rejection regions at various levels of significance. It also takes into account
the numbers of degrees of freedom associated with each curve. That is, a computed Chisquare value says nothing by itself—you must consider the number of degrees of freedom in
the cross-tabulation table because more degrees of freedom are indicative of higher critical
Chi-square table values for the same level of significance. The logic of this situation stems
from the number of cells. With more cells, there is more opportunity for departure from the
FIGURE 14.2 The Chi-Square Curve's Shape Depends on Its Degrees of Freedom (curves are shown for 4 and 6 degrees of freedom; the rejection region is the right-hand end of the curve)