16. GENERAL PRINCIPLES OF ECONOMETRIC MODELLING
associated evidence.” The other extreme is “‘Data-driven’ approaches, where models
are developed to closely describe the data . . . These suffer from sample dependence in
that accidental and transient data features are embodied as tightly in the model as
permanent aspects, so that extension of the data set often reveal predictive failure.”
Hendry proposes the following useful distinction of four levels of knowledge:
A Consider the situation where we know the complete structure of the process
which generates economic data and the values of all its parameters. This is the
equivalent of a probability theory course (example: rolling a perfect die), but involves
economic theory and econometric concepts.
B Consider a known economic structure with unknown values of the parameters.
Equivalent to an estimation and inference course in statistics (example: independent
rolls of an imperfect die and estimating the probabilities of the different faces) but
focusing on econometrically relevant aspects.
C is “the empirically relevant situation where neither the form of the data-generating process nor its parameter values are known.” (Here one does not know whether the rolls of the die are independent, or whether the probabilities of the different faces remain constant.) This level involves model discovery, evaluation, data mining, model-search procedures, and associated methodological issues.
D Forecasting the future when the data outcomes are unknown. (Example: a model of money demand under financial innovation.)
The example of Keynes’s consumption function in [Gre97, pp. 221/22] sounds at first as if it were close to level B, but in the course of the discussion Greene moves more and more toward level C. It is remarkable here that economic theory usually does not yield
functional forms. Greene then says: the most common functional form is the linear
one c = α + βx with α > 0 and 0 < β < 1. He does not mention the aggregation
problem hidden in this. Then he says: “But the linear function is only approximate;
in fact, it is unlikely that consumption and income can be connected by any simple
relationship. The deterministic relationship is clearly inadequate.” Here Greene
uses a random relationship to model a relationship which is quantitatively “fuzzy.”
This is an interesting and relevant application of randomness.
A sentence later Greene backtracks from this insight and says: “We are not so
ambitious as to attempt to capture every influence in the relationship, but only those
that are substantial enough to model directly.” The “fuzziness” is not due to a lack
of ambition of the researcher, but the world is inherently quantitatively fuzzy. It is
not that we don’t know the law, but there is no law; not everything that happens in
an economy is driven by economic laws. Greene’s own example, in Figure 6.2, that
during the war years consumption was below the trend line, shows this.
Greene’s next example is the relationship between income and education. This
illustrates multiple instead of simple regression: one must also include age, and then
also the square of age, even if one is not interested in the effect which age has, but
in order to “control” for this effect, so that the effects of education and age will not
be confounded.
Problem 224. Why should a regression of income on education include not only
age but also the square of age?
Answer. Because income does not rise linearly with age: the effect of an additional year of age on income becomes smaller as age increases, and a quadratic term in age captures this diminishing effect.
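The point can be illustrated with a small simulated regression (a sketch; the data-generating coefficients are invented for illustration and are not from Greene):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
age = rng.uniform(20, 65, n)
educ = rng.uniform(8, 20, n)
# Income rises with age at a decreasing rate: positive age term,
# negative age-squared term (illustrative coefficients).
income = 5.0 * educ + 3.0 * age - 0.03 * age**2 + rng.normal(0, 5, n)

# Regression with age only, versus age and age squared
X1 = np.column_stack([np.ones(n), educ, age])
X2 = np.column_stack([np.ones(n), educ, age, age**2])
b1, *_ = np.linalg.lstsq(X1, income, rcond=None)
b2, *_ = np.linalg.lstsq(X2, income, rcond=None)

# The quadratic specification recovers the concave age profile;
# omitting age**2 forces a straight line through a curved relationship.
print(b2)  # coefficient on age**2 comes out negative
```

Controlling for both age and age squared keeps the concave age profile from being confounded with the education effect.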
Critical Realist approaches are [Ron02] and [Mor02].
CHAPTER 17
Causality and Inference
This chapter establishes the connection between critical realism and Holland and
Rubin’s modelling of causality in statistics as explained in [Hol86] and [WM83, pp.
3–25] (and the related paper [LN81] which comes from a Bayesian point of view). A
different approach to causality and inference, [Roy97], is discussed in chapter/section
2.8. Regarding critical realism and econometrics, also [Dow99] should be mentioned:
this is written by a Post Keynesian econometrician working in an explicitly realist
framework.
Everyone knows that correlation does not mean causality. Nevertheless, experience shows that statisticians can on occasion make valid inferences about causality. It is therefore legitimate to ask: how and under which conditions can causal
conclusions be drawn from a statistical experiment or a statistical investigation of
nonexperimental data?
Holland starts his discussion with a description of the “logic of association”
(= a flat empirical realism) as opposed to causality (= depth realism). His model
for the “logic of association” is essentially the conventional mathematical model of
probability by a set U of “all possible outcomes,” which we described and criticized
on p. 12 above.
After this, Rubin describes his own model (developed together with Holland).
Rubin introduces “counterfactual” (or, as Bhaskar would say, “transfactual”) elements since he is not only talking about the value a variable takes for a given
individual, but also the value this variable would have taken for the same individual
if the causing variables (which Rubin also calls “treatments”) had been different.
For simplicity, Holland assumes here that the treatment variable has only two levels:
either the individual receives the treatment, or he/she does not (in which case he/she
belongs to the “control” group). The correlational view would simply measure the
average response of those individuals who receive the treatment, and of those who
don’t. Rubin recognizes in his model that the same individual may or may not be
subject to the treatment, therefore the response variable has two values, one being
the individual’s response if he or she receives the treatment, the other the response
if he or she does not.
A third variable indicates who receives the treatment. I.e., he has the “causal indicator” s which can take two values, t (treatment) and c (control), and two variables y_t and y_c, which, evaluated at individual ω, indicate the responses this individual would give in case he was subject to the treatment, and in case he was not.
Rubin defines y_t − y_c to be the causal effect of treatment t versus the control c. But this causal effect cannot be observed. We cannot observe how those individuals who received the treatment would have responded if they had not received the treatment, despite the fact that this non-actualized response is just as real as the response which they indeed gave. This is what Holland calls the Fundamental Problem of Causal Inference.
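Rubin’s setup can be sketched in a few lines of simulation (the variable names and numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
# Potential outcomes: y_t(ω) and y_c(ω) exist for every individual ω,
# but only one of them is ever observed.
y_t = rng.normal(10, 2, n)               # response if treated
y_c = rng.normal(7, 2, n)                # response if in the control group
s = rng.integers(0, 2, n).astype(bool)   # causal indicator: who is treated

causal_effect = y_t - y_c                # defined for each ω, never observable
observed = np.where(s, y_t, y_c)         # what the statistician actually sees

# The Fundamental Problem of Causal Inference: for treated individuals
# y_c is missing, for controls y_t is missing.
print(observed)
print(causal_effect)  # known here only because this is a simulation
```

The simulation can print the individual causal effects only because it generated both potential outcomes; real data never contain both.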
Problem 225. Rubin excludes race as a cause because the individual cannot do
anything about his or her race. Is this argument justified?
Does this Fundamental Problem mean that causal inference is impossible? Here
are several scenarios in which causal inference is possible after all:
• Temporal stability of the response, and transience of the causal effect.
• Unit homogeneity.
• Constant effect, i.e., y_t(ω) − y_c(ω) is the same for all ω.
• Independence of the response with respect to the selection process regarding who gets the treatment.
As an example of this last case, consider the following problem.
Problem 226. Our universal set U consists of patients who have a certain disease. We will explore the causal effect of a given treatment with the help of three
events, T , C, and S, the first two of which are counterfactual, compare [Hol86].
These events are defined as follows: T consists of all patients who would recover
if given treatment; C consists of all patients who would recover if not given treatment (i.e., if included in the control group). The event S consists of all patients
actually receiving treatment. The average causal effect of the treatment is defined as
Pr[T ] − Pr[C].
• a. 2 points Show that
(17.0.6)
Pr[T] = Pr[T|S] Pr[S] + Pr[T|S̄](1 − Pr[S])
and that
(17.0.7)
Pr[C] = Pr[C|S] Pr[S] + Pr[C|S̄](1 − Pr[S])
Which of these probabilities can be estimated as the frequencies of observable outcomes
and which cannot?
Answer. This is a direct application of (2.7.9). The problem here is that for all ω ∈ S̄, i.e., for those patients who do not receive treatment, we do not know whether they would have recovered if given treatment, and for all ω ∈ S, i.e., for those patients who do receive treatment, we do not know whether they would have recovered if not given treatment. In other words, neither Pr[T|S̄] nor Pr[C|S] can be estimated as the frequencies of observable outcomes.
• b. 2 points Assume now that S is independent of T and C, because the subjects
are assigned randomly to treatment or control. How can this be used to estimate those
elements in the equations (17.0.6) and (17.0.7) which could not be estimated before?
Answer. In this case, Pr[T|S] = Pr[T|S̄] and Pr[C|S̄] = Pr[C|S]. Therefore, the average causal effect can be simplified as follows:
Pr[T] − Pr[C] = Pr[T|S] Pr[S] + Pr[T|S̄](1 − Pr[S]) − (Pr[C|S] Pr[S] + Pr[C|S̄](1 − Pr[S]))
= Pr[T|S] Pr[S] + Pr[T|S](1 − Pr[S]) − (Pr[C|S̄] Pr[S] + Pr[C|S̄](1 − Pr[S]))
(17.0.8)
= Pr[T|S] − Pr[C|S̄]
• c. 2 points Why were all these calculations necessary? Could one not have defined from the beginning that the causal effect of the treatment is Pr[T|S] − Pr[C|S̄]?
Answer. Pr[T|S] − Pr[C|S̄] is only the empirical difference in recovery frequencies between
those who receive treatment and those who do not. It is always possible to measure these differences,
but these differences are not necessarily due to the treatment but may be due to other reasons.
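A small simulation illustrates the point of parts b and c: under random assignment, the observable difference Pr[T|S] − Pr[C|S̄] recovers the average causal effect Pr[T] − Pr[C] (the recovery probabilities below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
# T: would recover if treated; C: would recover if not treated.
T = rng.random(n) < 0.7
C = rng.random(n) < 0.4
# S: actually treated, assigned at random, independently of T and C.
S = rng.random(n) < 0.5

true_effect = T.mean() - C.mean()        # Pr[T] - Pr[C], needs counterfactuals
observable = T[S].mean() - C[~S].mean()  # Pr[T|S] - Pr[C|S̄], estimable from data

print(true_effect, observable)           # close to each other, near 0.3
```

If S were correlated with T and C (say, healthier patients more likely to be treated), the observable difference would drift away from the true effect.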
The main message of the paper is therefore: before drawing causal conclusions one should ascertain whether one of the conditions that make causal conclusions possible applies.
In the rest of the paper, Holland compares his approach with other approaches.
Suppes’s definitions of causality are interesting:
• If r < s denote two time values, event C_r is a prima facie cause of E_s iff Pr[E_s|C_r] > Pr[E_s].
• C_r is a spurious cause of E_s iff it is a prima facie cause of E_s and for some q < r < s there is an event D_q so that Pr[E_s|C_r, D_q] = Pr[E_s|D_q] and Pr[E_s|C_r, D_q] ≥ Pr[E_s|C_r].
• Event C_r is a genuine cause of E_s iff it is a prima facie but not a spurious cause.
This is quite different from Rubin’s analysis. Suppes concentrates on the causes of a
given effect, not the effects of a given cause. Suppes has a Popperian falsificationist
view: a hypothesis is good if one cannot falsify it, while Holland has the depth-realist
view which says that the empirical is only a small part of reality, and which looks at
the underlying mechanisms.
Problem 227. Construct an example of a probability field with a spurious cause.
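One possible construction can be checked by direct enumeration (a sketch; the numbers are arbitrary illustrative choices): take an earlier event D_q that is a common cause of C_r and E_s, with C_r and E_s conditionally independent given D_q.

```python
from itertools import product

# A finite probability field over (D_q, C_r, E_s). Given D_q, the events
# C_r and E_s are conditionally independent; the probabilities are
# illustrative assumptions.
def p(d, c, e):
    pd = 0.5                        # Pr[D_q]
    pc = 0.8 if d else 0.2          # Pr[C_r | D_q], Pr[C_r | not D_q]
    pe = 0.8 if d else 0.2          # Pr[E_s | D_q], Pr[E_s | not D_q]
    return (pd if d else 1 - pd) * (pc if c else 1 - pc) * (pe if e else 1 - pe)

def pr(pred):
    return sum(p(d, c, e) for d, c, e in product([0, 1], repeat=3) if pred(d, c, e))

pE    = pr(lambda d, c, e: e)
pE_C  = pr(lambda d, c, e: e and c) / pr(lambda d, c, e: c)
pE_CD = pr(lambda d, c, e: e and c and d) / pr(lambda d, c, e: c and d)
pE_D  = pr(lambda d, c, e: e and d) / pr(lambda d, c, e: d)

print(pE_C > pE)                                     # prima facie cause
print(abs(pE_CD - pE_D) < 1e-9 and pE_CD >= pE_C)    # spurious per Suppes
```

Here Pr[E_s|C_r] = 0.68 > Pr[E_s] = 0.5, so C_r is a prima facie cause, yet conditioning on the common cause D_q screens C_r off, making it spurious.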
Granger causality (see chapter/section 67.2.1) is based on the idea: knowing
a cause ought to improve our ability to predict. It is more appropriate to speak
here of “noncausality” instead of causality: a variable does not cause another if
knowing that variable does not improve our ability to predict the other variable.
Granger formulates his theory in terms of a specific predictor, the BLUP, while
Holland extends it to all predictors. Granger works in a time-series framework,
while Holland gives a more general formulation. Holland’s formulation strips off the
unnecessary detail in order to get at the essence of things. Holland defines: x is not
a Granger cause of y relative to the information in z (which in the time-series context
contains the past values of y) if and only if x and y are conditionally independent
given z. Problem 40 explains why this can be tested by testing predictive power.
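The predictive-power test can be sketched in a toy simulation (an illustrative least-squares comparison, not Granger’s own test statistic): when past x enters the data-generating process for y, adding past x to the predictor information z = (past y) reduces the residual sum of squares noticeably.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    # y depends on its own past and on past x: here x Granger-causes y.
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

# Predict y[t] from z = y[t-1] alone, then from (y[t-1], x[t-1]).
Z  = np.column_stack([np.ones(n - 1), y[:-1]])
Zx = np.column_stack([np.ones(n - 1), y[:-1], x[:-1]])
target = y[1:]

def rss(X):
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    return ((target - X @ b) ** 2).sum()

# Including past x improves the prediction substantially, so x is not
# Granger-noncausal for y given z.
print(rss(Z), rss(Zx))
```

If the 0.8 coefficient were set to zero, the two residual sums of squares would be nearly equal, which is exactly the conditional-independence formulation of Granger noncausality.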