CHAPTER 15
Understanding Regression Analysis Basics
Learning Objectives
• To understand the basic concept of prediction
• To learn how marketing researchers use regression analysis
• To learn how marketing researchers use bivariate regression analysis
• To see how multiple regression differs from bivariate regression
• To learn how to obtain and interpret multiple regression analyses with SPSS
“Where We Are”
1 Establish the need for marketing research.
2 Define the problem.
3 Establish research objectives.
4 Determine research design.
5 Identify information types and sources.
6 Determine methods of accessing data.
7 Design data collection forms.
8 Determine the sample plan and size.
9 Collect data.
10 Analyze data.
11 Prepare and present the final research report.
A Marketing Research Practitioner’s Comments on Multiple Regression Analysis
William D. Neal, Founder and Senior Partner, SDR Consulting
Marketing research practitioners have two basic missions: describe the current state of the marketplace and predict how the marketplace will react to changes in current product offerings or the introduction of new offerings. Prediction is the more difficult of these two missions simply because so many variables must be measured.
Consider the introduction of a new type of yogurt. To predict sales among regular yogurt purchasers, one would need to initially measure awareness of the new offering, consumer importance of differing product attributes (e.g., taste, calories, claimed benefits, package size), relative price, availability, and so forth. Some of these product attributes may be important predictors of sales, and some may not. But it is obvious that it is highly unlikely one single attribute will adequately predict sales. Rather it is most likely it will require a combination of measures to provide an acceptable sales prediction model.
One of the best tools for unraveling this prediction complexity is linear regression. Executed correctly, regression can help you understand whether the variables you have measured can aid in predicting sales (or consideration, or the likelihood of choice). Linear regression is one of the fundamental and most important tools in the researcher’s toolbox.
When I first studied regression in graduate school, we were required
to do the analysis manually using only a simple hand calculator. That
was a great learning experience and taught me how to analyze residuals
to understand whether the data was truly linear and whether the data
was a good fit to the model I developed. In this age, we have powerful
computers and sophisticated software to do all those things. However, since the advent of these new tools, I have also seen too many
instances where researchers who lack a deep understanding of regression analysis produce a regression-based model they represented to
be a good predictor when it is not, leading to erroneous conclusions.
Thus, a note of caution is in order: Test it, test it, and test it again.
Text and Images: By permission, William D. Neal,
Founder and Senior Partner, SDR Consulting.
As you can surmise from reading the opening vignette, this chapter takes up the subject of
multiple regression analysis. Undoubtedly, your reading of William Neal’s description
has left you with more questions than answers. For example, what is a residual? Truly
linear? Good fit? Your questions should alert you to the fact that we are going to describe a complex analytical technique. We will endeavor to describe regression analysis in a slow and methodical manner, and when we end our description, we will warn you that, while you have learned to
run it and interpret its findings, we have barely scratched the surface of this complicated analysis.
Bivariate Linear Regression Analysis
In this chapter, we will deal exclusively with linear regression analysis, a predictive model technique often used by marketing researchers. However, regression analysis is a complex statistical technique with a large number of requirements and nuances.1 You must understand that this
chapter offers only a basic introduction to this area, and as we will warn you toward the end of
the material, a great many aspects of regression analysis are beyond the scope of this textbook.
We define regression analysis as a predictive analysis technique in which one or more
variables are used to predict the level of another by use of the straight-line formula. Bivariate
regression means only two variables are being analyzed, and researchers sometimes refer to
this case as simple regression. We will review the equation for a straight line and introduce
basic terms used in regression. We also describe some basic computations and significance
with bivariate regression.
A straight-line relationship underlies regression,
and it is a powerful predictive model. Figure 15.1 illustrates a straight-line relationship, and you should refer to
it as we describe the elements in a general straight-line
formula. The formula for a straight line is:
Formula for a straight-line relationship
$$y = a + bx$$
where
y = the predicted variable
x = the variable used to predict y
a = the intercept, or point where the line cuts the y axis when x = 0
b = the slope, or the change in y for any 1 unit change in x
With bivariate regression, one variable is used to predict another variable using the straight-line formula.
Figure 15.1 General Equation for a Straight Line in Graph Form (x on the horizontal axis, y on the vertical axis; a = intercept, the point on the y-axis that the line cuts when x = 0; b = the slope, the change in the line for each one-unit change in x)
The straight-line equation is the basis of regression analysis.
You should recall the straight-line relationship we described underlying the correlation coefficient: When the scatter diagram for two variables appears as a thin ellipse, there is a high correlation between them. Regression is directly related to correlation by the underlying straight-line relationship.
Basic Concepts in Regression Analysis
We now define the variables and show how the intercept and slope are computed.
Independent and Dependent Variables As we indicated, bivariate regression analysis is a case in which only two variables are involved in the predictive model. When we use only two variables, one is termed dependent and the other is termed independent. The dependent variable
is that which is predicted, and it is customarily termed y in the regression straight-line equation. The independent variable is that which is used to predict the dependent variable, and
it is the x in the regression formula. We must quickly point out that the terms dependent and
independent are arbitrary designations and are customary to regression analysis. There is no
cause-and-effect relationship or true dependence between the dependent and the independent
variable. It is strictly a statistical relationship, not causal, that may be found between these
two variables.
The least squares criterion
used in regression analysis
guarantees that the
“best” straight-line slope
and intercept will be
calculated.
Computing the Slope and the Intercept To compute a (intercept) and b (slope),
you must work with a number of observations of the various levels of the dependent
variable paired with different levels of the independent variable, identical to the scatter
diagrams we illustrated previously when we were demonstrating how to perform
correlation analysis.
The formulas for calculating the slope (b) and the intercept (a) are rather complicated, but
some instructors are in favor of their students learning these formulas, so we have included
them in Marketing Research Insight 15.1.
When SPSS or any other statistical analysis program computes the intercept and the
slope in a regression analysis, it does so on the basis of the least squares criterion. The least
squares criterion is a way of guaranteeing that the straight line that runs through the points
on the scatter diagram is positioned to minimize the squared vertical distances away from the line of
the various points. In other words, if you draw a line where the regression line is calculated
and calculate the vertical distances of all the points away from that line (called residuals), it would be
impossible to draw any other line that would result in a lower sum of the squares of those distances.
The least squares criterion guarantees that the line is the one with the lowest total squared
residuals. Each residual is squared to avoid a cancellation effect of positive and negative
residuals.
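The least squares criterion can be checked numerically. The following is a minimal sketch, assuming five made-up (x, y) pairs that are not from this chapter; it computes the least squares line from deviations about the means (algebraically equivalent to the sums-based formulas) and confirms that nudging the intercept or slope only increases the sum of squared residuals.

```python
# Hypothetical (x, y) pairs chosen only for illustration; not data from the text.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sum_squared_residuals(a, b):
    """Total squared vertical distance of the points from the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Least squares slope and intercept computed from deviations about the means.
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x

best = sum_squared_residuals(a, b)

# Nudge the intercept and slope in several directions; none should do better.
nudged = [sum_squared_residuals(a + da, b + db)
          for da in (-0.5, 0.0, 0.5)
          for db in (-0.2, 0.0, 0.2)
          if (da, db) != (0.0, 0.0)]

print("least squares line: a =", round(a, 2), "b =", round(b, 2))
print("sum of squared residuals on that line:", round(best, 2))
print("smallest value among the nudged lines:", round(min(nudged), 2))
```

Squaring matters here: the raw residuals about the least squares line sum to zero, so an unsquared total could not distinguish a good line from a poor one.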
To learn about linear regression, launch www.youtube.com, and search for “Intro to Linear Regression.”
How to Improve a Regression Analysis Finding
When a researcher wants to improve a regression analysis, the researcher can use a
scatter diagram to identify outlier pairs of points. An outlier2 is a data point that is substantially outside the normal range of the data points being analyzed. As one author has noted,
outliers “stick out like sore thumbs.”3 When using a scatter diagram to identify outliers,4 the
researcher draws an ellipse that encompasses most of the points that appear to be in an elliptical pattern.5 He or she then eliminates outliers from the data and reruns the regression
analysis. Generally, this approach will improve the regression analysis results.
In regression, the
independent variable
is used to predict the
dependent variable.
Marketing Research Insight 15.1: Practical Application
How to Calculate the Intercept and Slope of a Bivariate Regression
In this example, we are using the Novartis pharmaceuticals company sales territory and number of salespersons data found in Table 15.1. Intermediate regression calculations are included in Table 15.2.
Table 15.1 Bivariate Regression Analysis Data and Intermediate Calculations
Territory (i)   Sales ($ millions) (y)   Number of salespersons (x)   xy       x²
1               102                      7                            714      49
2               125                      5                            625      25
3               150                      9                            1,350    81
4               155                      9                            1,395    81
5               160                      9                            1,440    81
6               168                      8                            1,344    64
7               180                      10                           1,800    100
8               220                      10                           2,200    100
9               210                      12                           2,520    144
10              205                      12                           2,460    144
11              230                      12                           2,760    144
12              255                      15                           3,825    225
13              250                      14                           3,500    196
14              260                      15                           3,900    225
15              250                      16                           4,000    256
16              275                      16                           4,400    256
17              280                      17                           4,760    289
18              240                      18                           4,320    324
19              300                      18                           5,400    324
20              310                      19                           5,890    361
Sums            4,325                    251                          58,603   3,469
Averages        216.25                   12.55
The formula for computing the regression parameter b is:
Formula for b, the slope, in bivariate regression
$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
where
x_i = an x variable value
y_i = a y value paired with each x_i value
n = the number of pairs
The calculations for b, the slope, are as follows:
Calculation of b, the slope, in bivariate regression using Novartis sales territory data
$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{20 \times 58603 - 251 \times 4325}{20 \times 3469 - 251^2} = \frac{1172060 - 1085575}{69380 - 63001} = \frac{86485}{6379} = 13.56$$
Notes:
n = 20
Sum xy = 58603
Sum of x = 251
Sum of y = 4325
Sum of x² = 3469
The formula for computing the intercept is:
Formula for a, the intercept, in bivariate regression
$$a = \bar{y} - b\bar{x}$$
The computations for a, the intercept, are as follows:
Calculation of a, the intercept, in bivariate regression using Novartis sales territory data
$$a = \bar{y} - b\bar{x} = 216.25 - 13.56 \times 12.55 = 216.25 - 170.178 = 46.07$$
Notes:
ȳ (average of y) = 216.25
x̄ (average of x) = 12.55
In other words, the bivariate regression equation has been found to be:
Novartis sales regression equation
y = 46.07 + 13.56 x
The interpretation of this equation is as follows. The intercept indicates predicted annual sales of $46.07 million for a territory with no salespersons (an extrapolation, because every observed territory has at least 5 salespersons), and predicted sales
increase $13.56 million annually with each additional salesperson.
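The hand calculation above can be reproduced in a few lines. This is a minimal sketch in plain Python that uses the sums-based formulas for b and a with the 20 territory pairs transcribed from Table 15.1; the function name `bivariate_regression` is a label chosen here for illustration, not something from the text or from SPSS.

```python
# Novartis territory data transcribed from Table 15.1.
sales = [102, 125, 150, 155, 160, 168, 180, 220, 210, 205,
         230, 255, 250, 260, 250, 275, 280, 240, 300, 310]   # y, $ millions
salespersons = [7, 5, 9, 9, 9, 8, 10, 10, 12, 12,
                12, 15, 14, 15, 16, 16, 17, 18, 18, 19]       # x


def bivariate_regression(x, y):
    """Return (intercept a, slope b) from the sums-based least squares formulas."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean of y minus slope times mean of x
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = bivariate_regression(salespersons, sales)
print(f"slope b = {b:.2f}")
print(f"intercept a = {a:.2f}")
```

Because the sketch carries the slope at full precision, its intercept comes out near 46.10 rather than the 46.07 obtained above by rounding the slope to 13.56 before computing a. The difference is a rounding effect, not a different method.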
Multiple Regression Analysis
We follow up our introduction to bivariate regression analysis by discussing multiple regression analysis. You will find that all of the concepts in bivariate regression apply to multiple
regression analysis, except you will be working with multiple independent variables.
There is an underlying
general conceptual model
in multiple regression
analysis.
An Underlying Conceptual Model
A model is a structure that ties together various constructs and their relationships. It is beneficial for the marketing manager and the market researcher to have some sort of model in
mind when designing the research plan. The bivariate regression equation is a model that ties
together an independent variable and its dependent variable. The dependent variables that
interest market researchers are typically sales, potential sales, or some attitude held by those
who make up the market. For example, in the Novartis example, the dependent variable was
territory sales. If Dell Computers commissioned a survey, it might want information on those
who intend to purchase a Dell computer, or it might want information on those who intend to
buy a competing brand as a means of understanding these consumers and perhaps dissuading
them. The dependent variable would be purchase intentions for Dell computers. If Maxwell
House Coffee were considering a brand of gourmet iced coffee, it would want to know how
coffee drinkers feel about gourmet iced coffee; attitudes toward buying, preparing, and drinking iced coffee would be the dependent variable.
Figure 15.2 provides a general conceptual model that fits many marketing research situations, particularly those that are investigating consumer behavior. A general conceptual model identifies independent and dependent variables and shows their expected basic relationships to one another. In Figure 15.2, you can see that purchases, intentions to purchase, and preferences are in the center, meaning they are dependent. The surrounding concepts are possible independent variables. That is, any one could be used to predict any dependent variable. For example, one’s intentions to purchase an expensive automobile like a Lexus could depend on one’s income. It could also depend on friends’ recommendations (word of mouth), one’s opinions about how a Lexus would enhance one’s self-image, or experiences riding in or driving a Lexus.
In truth, consumers’ preferences, intentions, and actions are potentially influenced by a
great number of factors as would be evident if you listed all of the subconcepts that make up
each concept in Figure 15.2. For example, there are probably a dozen demographic variables;
there could be dozens of lifestyle dimensions, and a person is exposed to a great many types
of advertising media every day. Of course, in the problem definition stage, the researcher and
manager reduce the myriad of independent variables down to a manageable number to be
included on the questionnaire. That is, they have the general model structure in Figure 15.2
in mind, but they identify and measure specific variables that pertain to the problem at hand.
Because bivariate regression analysis treats only one dependent–independent pair, it would
take a great many bivariate regression analyses to account for all possible relevant dependent–
independent pairs of variables in a general model such as Figure 15.2. Fortunately, there is no
need to perform a great many bivariate regressions, as multiple regression analysis is a much
better tool, and a technique we are about to describe in some detail.
Active Learning
The General Conceptual Model for Global Motors
Understandably, Nick Thomas, CEO of Global Motors, a new division of a large automobile
manufacturer, ZEN Motors, wants everyone to intend to purchase a new gasoline alternative technology automobile; however, this will not be the case due to different beliefs and
predispositions in the driving public. Regression analysis will assist Nick by revealing what
variables are good predictors of intentions to buy the various new technology automobile
models under consideration at Global Motors. What is the general conceptual model apparent in the Global Motors survey dataset?
In order to answer this question and to portray the general conceptual model in the format of Figure 15.2, you must inspect the several variables in this SPSS dataset or otherwise
come up with a list of the variables in the survey. Using any “Desirability” variable as the dependent variable, diagram the general types of independent or predictor variables that are
apparent in this study. Comment on the usefulness of this general conceptual model to Nick
Thomas; that is, assuming that the regression results are significant, what marketing strategy
implications will become apparent?
Figure 15.2 A General Conceptual Model for Multiple Regression Analysis (center, dependent: Purchases; Intentions to Purchase; Preferences; or Satisfaction. Surrounding, independent: Attitudes, Opinions, Feelings; Past Behavior, Experience, Knowledge; Media Exposure, Word of Mouth; Demographics, Lifestyle)
The researcher and manager identify, measure, and analyze specific variables that pertain to the general conceptual model in mind.
Multiple regression means you have more than one independent variable to predict a single dependent variable.
With multiple regression, you work with a regression plane rather than a line.
A multiple regression equation has two or more independent variables (x’s).
Multiple Regression Analysis Described
Multiple regression analysis is an expansion of bivariate regression analysis in that more
than one independent variable is used in the regression equation. The addition of independent variables complicates the conceptualization by adding more dimensions or axes to the
regression situation. But it makes the regression model more realistic because, as we have
just explained with our general model discussion, predictions normally depend on multiple
factors, not just one.
Basic Assumptions in Multiple Regression Consider our Novartis example with the
number of salespeople as the independent variable and territory sales as the dependent variable. A second independent variable, such as advertising levels, can be added to the equation.
The addition of a second variable turns the regression line into a regression plane because
there are three dimensions if we were to try to graph it: territory sales (Y), number of salespeople (X1), and advertising level (X2). A regression plane is the shape traced by the predicted values of the dependent variable in multiple regression analysis with two independent variables. If other independent variables are added to the regression
analysis, it would be necessary to envision each one as a new and separate axis existing at
right angles to all other axes. Obviously, it is impossible to draw more than three dimensions
at right angles. In fact, it is difficult to even conceive of a multiple dimension diagram, but the
assumptions of multiple regression analysis require this conceptualization.
Everything about multiple regression is largely equivalent to bivariate regression except
you are working with more than one independent variable. The terminology is slightly different in places, and some statistics are modified to take into account the multiple aspect, but for
the most part, concepts in multiple regression are analogous to those in the simple bivariate
case. We note these similarities in our description of multiple regression.
The equation in multiple regression has the following form:
Multiple regression equation
$$y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_m x_m$$
where
y = the dependent, or predicted, variable
x_i = independent variable i
a = the intercept
b_i = the slope for independent variable i
m = the number of independent variables in the equation
As you can see, the addition of other independent variables has done nothing more than
to add b_i x_i terms to the equation. We have retained the basic y = a + bx straight-line formula,
except now we have multiple x variables, and each one is added to the equation, changing y
by its individual slope. The inclusion of each independent variable in this manner preserves
the straight-line assumptions of multiple regression analysis. This is sometimes known as
additivity because each new independent variable is added to the regression equation.
Let’s look at a multiple regression analysis result so you can better understand the multiple regression equation. Here is a possible result using our Lexus example:
Lexus purchase intention multiple regression equation example
Intention to purchase a Lexus = 2
+ 1.0 × attitude toward Lexus (1–5 scale)
− 0.5 × attitude toward current auto (1–5 scale)
+ 1.0 × income level (1–10 scale)
Notes:
a = 2
b1 = 1.0
b2 = −0.5
b3 = 1.0
This multiple regression equation says that you can predict a consumer’s level of intention to buy
a Lexus if you know three variables: (1) attitude toward the Lexus brand, (2) attitude
toward the automobile he/she owns now, and (3) income level using a scale with 10 income
levels. Further, we can see the impact of each of these variables on Lexus purchase intentions.
Here is how to interpret the equation. First, the intercept of 2 is the baseline intention level
from which the prediction starts before the three variables are taken into account. Attitude toward Lexus is measured on a 1–5 scale;
with each attitude scale point, intention goes up one point. That is, an individual with a strong
positive attitude of 5 will have a greater intention than one with a strong negative attitude of 1.
With attitude toward the current automobile he/she owns (for example, a potential Lexus
buyer may currently own a Cadillac or a BMW), the intention decreases by .5 for each level
on the 5-point scale. Of course we are assuming that these potential buyers own automobile makes other than a Lexus. Finally, the intention increases by 1 with each increasing
income level.
Here is a numerical example for a potential Lexus buyer whose Lexus attitude is 4,
current automobile make attitude is 3, and income is 5:
Calculation of Lexus purchase intention using the multiple regression equation
Intention to purchase a Lexus = 2
+ 1.0 × 4
− 0.5 × 3
+ 1.0 × 5
= 9.5
Notes:
Intercept = 2
Attitude toward Lexus (x1) = 4
Attitude toward current auto (x2) = 3
Income level (x3) = 5
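The same calculation can be written as a small function, which makes the additive structure of the equation explicit. This is a minimal sketch using the illustrative Lexus coefficients given above; the dictionary keys and the function name `predict_intention` are names invented here for readability.

```python
# Coefficients copied from the illustrative Lexus equation in the text.
INTERCEPT = 2.0
SLOPES = {
    "attitude_toward_lexus": 1.0,          # b1, 1-5 scale
    "attitude_toward_current_auto": -0.5,  # b2, 1-5 scale
    "income_level": 1.0,                   # b3, 1-10 scale
}


def predict_intention(values):
    """Add each slope times its variable's value to the intercept."""
    return INTERCEPT + sum(SLOPES[name] * values[name] for name in SLOPES)


# The potential buyer from the worked example.
buyer = {
    "attitude_toward_lexus": 4,
    "attitude_toward_current_auto": 3,
    "income_level": 5,
}
intention = predict_intention(buyer)
print(intention)  # 9.5
```

Changing one value while holding the others fixed shifts the prediction by exactly that variable's slope, which is the additivity property described earlier.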
Multiple regression is a powerful tool, because it tells us what factors are related to the
dependent variable, how (the sign) each factor influences the dependent variable, and how
much (the size of bi) each factor influences it.
While you have not yet learned how to run multiple regression analysis on SPSS, you
have sufficient knowledge to realize that this analysis can provide interesting insights into
consumer behavior. Marketing Research Insight 15.2 presents an application of multiple
regression analysis in the social media marketing research arena.
As with bivariate regression analysis in which we alluded to the correlation between y
and x, it is possible to inspect the strength of the linear relationship between the independent
variables and the dependent variable with multiple regression. Multiple R, used in this chapter to mean the squared multiple correlation (R², reported by SPSS as R Square) and also called the coefficient of determination, is a handy measure of the strength of the overall linear relationship. As with bivariate regression analysis, the multiple regression analysis model assumes
that a straight-line (plane) relationship exists among the variables. Multiple R ranges from 0
to +1.0 and represents the proportion of variance in the dependent variable “explained,” or accounted for,
by the combined independent variables. High multiple R values indicate that the regression
plane applies well to the scatter of points, whereas low values signal that the straight-line
model does not apply well. At the same time, a multiple regression result is an estimate of
the population multiple regression equation, and, as was the case with other estimated population parameters, it is necessary to test for statistical significance.
Multiple R is like a lead indicator of the multiple regression analysis findings. As you
will see soon, it is one of the first pieces of information provided in a multiple regression output. Many researchers mentally convert the multiple R value into a percentage. For example,
a multiple R of .75 means that the regression findings will explain 75% of the variance in the dependent
variable. The greater the explanatory power of the multiple regression finding, the better and
more useful it is for the researcher.
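The “explained” proportion is conventionally computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below, offered as an illustration rather than as SPSS's procedure, applies that ratio to the one-predictor Novartis data from Table 15.1; with several independent variables the same ratio is used, with predictions taken from the regression plane instead of a line.

```python
# Novartis territory data transcribed from Table 15.1.
sales = [102, 125, 150, 155, 160, 168, 180, 220, 210, 205,
         230, 255, 250, 260, 250, 275, 280, 240, 300, 310]   # y, $ millions
salespersons = [7, 5, 9, 9, 9, 8, 10, 10, 12, 12,
                12, 15, 14, 15, 16, 16, 17, 18, 18, 19]       # x


def r_squared(x, y):
    """Proportion of variance in y accounted for by the least squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    # Variation left over after using the line to predict y.
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Variation in y around its own mean, before any prediction.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_residual / ss_total


r2 = r_squared(salespersons, sales)
print(f"R squared = {r2:.3f}")
```

A value near 1 means the points sit close to the fitted line; a value near 0 means the line predicts little better than the mean of y.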
Before we show you how to run a multiple regression analysis using SPSS, consider this
caution: The independence assumption stipulates that the independent variables must be
statistically independent and uncorrelated with one another. The independence assumption
is crucial because if it is violated, the estimated slopes become unstable and the multiple regression findings cannot be trusted. The presence
of moderate or stronger correlations among the independent variables is termed multicollinearity, and it violates the independence assumption of multiple regression analysis
when it occurs.6 It is up to the researcher to test for and remove multicollinearity if it
is present.
Multiple R indicates how
well the independent
variables can predict the
dependent variable in
multiple regression.
With multiple regression,
the independent
variables should have
low correlations with one
another.
Marketing Research Insight 15.2: Social Media Marketing
Multiple Regression Analysis Gives Insights into Students’ Use of Twitter and Facebook
Two researchers recently took note of the widespread adoption of Twitter and Facebook by university students.7 They
pondered the possible factors and reasons why these two
social media tools are so popular. Using adoption of innovations theory, they identified several independent variables that
might be related to the use of one or both vehicles. Drawing
samples of undergraduate students from both a large Midwest university and a large Southeastern university, they used
an online survey to obtain information about some demographic variables (such as gender), several behavioral variables
(such as amount of mobile phone usage), and a number of
other variables (such as popularity or degree to which friends
use Twitter/Facebook).
The researchers then used multiple regression analysis
with these items treated as independent variables, and the
degree of use of Twitter/Facebook as the dependent variables. As reported by the researchers, the statistically significant independent variables for amount of use of Twitter and
Facebook are summarized in the following table by “Yes.”
A “No” means that no statistically significant relationship
was found.
Multicollinearity can be
assessed and eliminated
in multiple regression with
the VIF statistic.
Variable
Amount of mobile phone usage
How long used account
Friends expect me to use
Attitude toward Twitter/Facebook
To pass the time
Substitute for face-to-face
My friends use it
Twitter Use Facebook Use
Yes
No
Yes
Yes
No
Yes
No
No
Yes
Yes
No
Yes
No
Yes
The findings reveal that university students use Twitter if: (1) they are
heavy mobile phone users, (2) they believe their friends expect them to tweet,
(3) they have a positive attitude toward Twitter, and (4) they prefer
to use Twitter in place of face-to-face meetings with their friends.
On the other hand, university students use Facebook if: (1) they
are established users of Facebook, (2) they have a positive attitude
toward Facebook, (3) they are bored and want to do something to
pass the time, and (4) many of their friends use it. It is interesting to
note that gender and age were not found to be significantly related
to either the use of Twitter or Facebook.
The way to avoid multicollinearity is to use warning statistics issued by most statistical analysis programs to identify this problem. One commonly used method is the variance
inflation factor (VIF). The VIF is a single number, and a rule of thumb is that as long as the
VIF is less than 10, multicollinearity is not a concern. With a VIF of greater than 10 associated with any independent variable in the multiple regression equation, it is prudent to remove
that variable from consideration or to otherwise reconstitute the set of independent variables.8
In other words, when examining the output of any multiple regression, the researcher should
inspect the VIF number associated with each independent variable that is retained in the final
multiple regression equation by the procedure. If the VIF is greater than 10, the researcher
should remove that variable from the independent variable set and rerun the multiple regression.9 This iterative process is used until only independent variables that are statistically significant and that have acceptable VIFs are in the final multiple regression equation.
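The text does not give the VIF formula, so the sketch below assumes the standard definition: for each independent variable, regress it on the remaining independent variables, take that regression's R², and compute VIF = 1 / (1 − R²). The data and variable names are made up for illustration, and the small solver is a plain-Python stand-in for what a statistics package does internally.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R squared from regressing y on the given predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] * n] + [list(col) for col in columns]  # one list per term
    k = len(design)
    # Normal equations (X'X) coefs = X'y for the least squares coefficients.
    xtx = [[sum(design[i][t] * design[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    xty = [sum(design[i][t] * y[t] for t in range(n)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)
    predicted = [sum(coefs[i] * design[i][t] for i in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_residual = sum((y[t] - predicted[t]) ** 2 for t in range(n))
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - ss_residual / ss_total


def vif(columns):
    """Variance inflation factor for each column, regressed on all the others."""
    factors = []
    for j, target in enumerate(columns):
        others = [col for i, col in enumerate(columns) if i != j]
        factors.append(1.0 / (1.0 - r_squared(others, target)))
    return factors


# Hypothetical survey variables: income closely tracks age, household size does not.
age = [25, 32, 41, 38, 50, 29, 46, 57]
income = [30, 38, 49, 45, 61, 35, 55, 68]
household_size = [3, 1, 4, 2, 5, 2, 3, 1]

vifs = vif([age, income, household_size])
for name, value in zip(["age", "income", "household_size"], vifs):
    flag = "exceeds 10" if value > 10 else "acceptable"
    print(f"{name}: VIF = {value:.1f} ({flag})")
```

Under the rule of thumb above, a variable whose VIF exceeds 10 would be removed and the regression rerun; here the near-duplicate pair of age and income is the kind of overlap the statistic is meant to expose.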
Integrated Case
Global Motors: How to Run and Interpret Multiple Regression Analysis on SPSS
Running multiple regression first requires specification of the dependent and independent
variables. Let’s select the desirability of the standard size gasoline automobile model as
the dependent variable, and think about a general conceptual model that might pertain to
Global Motors. We already know from basic marketing strategy that demographics are
often used for target marketing, and we have hometown size, age, income, education, and
household size. Also, beliefs are often useful for predicting market segments, and we have
some variables that pertain to beliefs about gasoline emissions and global warming. To
summarize, we have determined our conceptual model: the desirability of a standard-size
gasoline automobile related to (1) household demographics and (2) beliefs about global
warming. Where appropriate, we have recoded the ordinal demographic variables with
midpoints to convert them to ratio scales.
The ANALYZE-REGRESSION-LINEAR command sequence is used to run a multiple
regression analysis, and the variable, Desirability: standard size gasoline model, is selected
as the dependent variable, while the other nine are specified as the independent variables. You
will find this annotated SPSS clickstream in Figure 15.3.
As the computer output in Figure 15.4 shows, the multiple R value (Adjusted R Square
in the Model Summary table) indicating the strength of relationship between the independent
variables and the dependent variable is .235, signifying that there is some linear relationship
present. Next, the printout reveals that the ANOVA F is significant, signaling that the null
hypothesis of no linear relationship is rejected, and it is justifiable to use a straight-line relationship to model the variables in this case.
It is necessary in multiple regression analysis to test for statistical significance of the
bi (beta) determined for each independent variable. In other words, you must determine
whether sampling error is influencing the results and giving a false reading. One must test whether each bi
differs significantly from zero (the null hypothesis being that it equals zero) through the use of separate t tests. The
SPSS output in Figure 15.4 indicates the levels of statistical significance in the Coefficients
table in the column labeled “Sig.”; we have highlighted in yellow the cases where the significance level is .05 or less (95% level of confidence). It is apparent that size of hometown,
gender, number of people in the household, age, income, and the two attitude variables are
statistically significant. The other independent variables fail this test, meaning that their computed betas must be treated as zeros. No VIF value is greater than 10, so multicollinearity is
not a concern here.
The SPSS ANALYZE-REGRESSION-LINEAR command is used to run multiple regression.
With multiple regression, look at the significance level of each calculated beta.
Figure 15.3 SPSS Clickstream for Multiple Regression Analysis
Source: Reprint courtesy of International Business Machines Corporation, © SPSS, Inc., an IBM Company.
Figure 15.4 SPSS Output for Multiple Regression Analysis
Source: Reprint courtesy of International Business Machines Corporation, © SPSS, Inc., an IBM Company.
A trimmed regression means that you eliminate the nonsignificant independent variables and rerun the regression.
Run trimmed regressions iteratively until all betas are significant.
“Trimming” the Regression for Significant Findings
What do you do with the mixed significance results we have just found in our multiple regression analysis? Before we answer this question, you should be aware that this mixed result
is very likely, so understanding how to handle it is vital to developing the ability to perform
multiple regression analysis successfully.
It is standard practice in multiple regression analysis to systematically eliminate one by
one those independent variables that are shown to be insignificant through a process called
trimming. You successively rerun the trimmed model and inspect the significance levels each
time. This series of eliminations or iterations helps to achieve the simplest model by eliminating the nonsignificant independent variables. The trimmed multiple regression model with all
significant independent variables is presented in Figure 15.5.
This trimming process enables the marketing researcher to think in terms of fewer
dimensions within which the dependent variable relationship operates. Successive iterations sometimes cause the multiple R to decrease somewhat, and it is advisable to
scrutinize this value after each run. You can see that the new multiple R is now .236, so in
our example, there has been very little change. Iterations will also cause the beta values and
the intercept value to shift slightly; consequently, it is necessary to inspect all significance
levels of the betas once again. Through a series of iterations, the marketing researcher finally arrives at the final regression equation expressing the salient independent variables
and their linear relationships with the dependent variable. A concise predictive model has
been found.