16. GENERAL PRINCIPLES OF ECONOMETRIC MODELLING
associated evidence.” The other extreme is “‘Data-driven’ approaches, where models
are developed to closely describe the data . . . These suffer from sample dependence in
that accidental and transient data features are embodied as tightly in the model as
permanent aspects, so that extension of the data set often reveal predictive failure.”
Hendry proposes the following useful distinction of four levels of knowledge:
A Consider the situation where we know the complete structure of the process
which generates economic data and the values of all its parameters. This is the
equivalent of a probability theory course (example: rolling a perfect die), but involves
economic theory and econometric concepts.
B Consider a known economic structure with unknown values of the parameters.
Equivalent to an estimation and inference course in statistics (example: independent
rolls of an imperfect die and estimating the probabilities of the different faces) but
focusing on econometrically relevant aspects.
C is “the empirically relevant situation where neither the form of the data-generating process nor its parameter values are known.” (Here one does not know whether the rolls of the die are independent, or whether the probabilities of the different faces remain constant.) This level involves model discovery, evaluation, data mining, model-search procedures, and associated methodological issues.
D Forecasting the future when the data outcomes are unknown. (Example: a model of money demand under financial innovation.)
The example of Keynes’s consumption function in [Gre97, pp. 221/22] sounds at first as if it were close to level B, but in the course of the discussion Greene moves more and more toward level C. It is remarkable here that economic theory usually does not yield
functional forms. Greene then says: the most common functional form is the linear
one c = α + βx with α > 0 and 0 < β < 1. He does not mention the aggregation
problem hidden in this. Then he says: “But the linear function is only approximate;
in fact, it is unlikely that consumption and income can be connected by any simple
relationship. The deterministic relationship is clearly inadequate.” Here Greene
uses a random relationship to model a relationship which is quantitatively “fuzzy.”
This is an interesting and relevant application of randomness.
A sentence later Greene backtracks from this insight and says: “We are not so
ambitious as to attempt to capture every influence in the relationship, but only those
that are substantial enough to model directly.” The “fuzziness” is not due to a lack
of ambition of the researcher, but the world is inherently quantitatively fuzzy. It is
not that we don’t know the law, but there is no law; not everything that happens in
an economy is driven by economic laws. Greene’s own example, in Figure 6.2, that
during the war years consumption was below the trend line, shows this.
Greene’s next example is the relationship between income and education. This
illustrates multiple instead of simple regression: one must also include age, and then
also the square of age, even if one is not interested in the effect which age has, but
in order to “control” for this effect, so that the effects of education and age will not
be confounded.
Problem 224. Why should a regression of income on education include not only
age but also the square of age?
Answer. Because income does not rise linearly with age: the effect of an additional year of age on income becomes smaller as age increases, and a quadratic term in age captures this diminishing effect.
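The point can be illustrated with a small simulated regression (a sketch; the data-generating coefficients are invented for illustration and are not from Greene):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
age = rng.uniform(20, 65, n)
educ = rng.uniform(8, 20, n)
# Income rises with age at a decreasing rate: positive age term,
# negative age-squared term (illustrative coefficients).
income = 5.0 * educ + 3.0 * age - 0.03 * age**2 + rng.normal(0, 5, n)

# Regression with age only, versus age and age squared
X1 = np.column_stack([np.ones(n), educ, age])
X2 = np.column_stack([np.ones(n), educ, age, age**2])
b1, *_ = np.linalg.lstsq(X1, income, rcond=None)
b2, *_ = np.linalg.lstsq(X2, income, rcond=None)

# The quadratic specification recovers the concave age profile;
# omitting age**2 forces a straight line through a curved relationship.
print(b2)  # coefficient on age**2 comes out negative
```

Controlling for both age and age squared keeps the concave age profile from being confounded with the education effect.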
Critical Realist approaches are [Ron02] and [Mor02].
CHAPTER 17
Causality and Inference
This chapter establishes the connection between critical realism and Holland and
Rubin’s modelling of causality in statistics as explained in [Hol86] and [WM83, pp.
3–25] (and the related paper [LN81] which comes from a Bayesian point of view). A
different approach to causality and inference, [Roy97], is discussed in chapter/section
2.8. Regarding critical realism and econometrics, also [Dow99] should be mentioned:
this is written by a Post Keynesian econometrician working in an explicitly realist
framework.
Everyone knows that correlation does not mean causality. Nevertheless, experience shows that statisticians can on occasion make valid inferences about causality. It is therefore legitimate to ask: how and under which conditions can causal
conclusions be drawn from a statistical experiment or a statistical investigation of
nonexperimental data?
Holland starts his discussion with a description of the “logic of association”
(= a flat empirical realism) as opposed to causality (= depth realism). His model
for the “logic of association” is essentially the conventional mathematical model of
probability by a set U of “all possible outcomes,” which we described and criticized
on p. 12 above.
After this, Rubin describes his own model (developed together with Holland).
Rubin introduces “counterfactual” (or, as Bhaskar would say, “transfactual”) elements since he is not only talking about the value a variable takes for a given
individual, but also the value this variable would have taken for the same individual
if the causing variables (which Rubin also calls “treatments”) had been different.
For simplicity, Holland assumes here that the treatment variable has only two levels:
either the individual receives the treatment, or he/she does not (in which case he/she
belongs to the “control” group). The correlational view would simply measure the
average response of those individuals who receive the treatment, and of those who
don’t. Rubin recognizes in his model that the same individual may or may not be
subject to the treatment, therefore the response variable has two values, one being
the individual’s response if he or she receives the treatment, the other the response
if he or she does not.
A third variable indicates who receives the treatment. I.e., he has the “causal indicator” s which can take two values, t (treatment) and c (control), and two variables y_t and y_c, which, evaluated at individual ω, indicate the responses this individual would give in case he was subject to the treatment, and in case he was not.
Rubin defines y_t − y_c to be the causal effect of treatment t versus the control c. But this causal effect cannot be observed. We cannot observe how those individuals who received the treatment would have responded if they had not received the treatment, despite the fact that this non-actualized response is just as real as the response which they indeed gave. This is what Holland calls the Fundamental Problem of Causal Inference.
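Rubin’s setup can be sketched in a few lines of simulation (the variable names and numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
# Potential outcomes: y_t(ω) and y_c(ω) exist for every individual ω,
# but only one of them is ever observed.
y_t = rng.normal(10, 2, n)               # response if treated
y_c = rng.normal(7, 2, n)                # response if in the control group
s = rng.integers(0, 2, n).astype(bool)   # causal indicator: who is treated

causal_effect = y_t - y_c                # defined for each ω, never observable
observed = np.where(s, y_t, y_c)         # what the statistician actually sees

# The Fundamental Problem of Causal Inference: for treated individuals
# y_c is missing, for controls y_t is missing.
print(observed)
print(causal_effect)  # known here only because this is a simulation
```

The simulation can print the individual causal effects only because it generated both potential outcomes; real data never contain both.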
Problem 225. Rubin excludes race as a cause because the individual cannot do
anything about his or her race. Is this argument justified?
Does this Fundamental Problem mean that causal inference is impossible? Here
are several scenarios in which causal inference is possible after all:
• Temporal stability of the response, and transience of the causal effect.
• Unit homogeneity.
• Constant effect, i.e., y_t(ω) − y_c(ω) is the same for all ω.
• Independence of the response with respect to the selection process regarding who gets the treatment.
As an example of this last case, consider the following problem.
Problem 226. Our universal set U consists of patients who have a certain disease. We will explore the causal effect of a given treatment with the help of three
events, T , C, and S, the first two of which are counterfactual, compare [Hol86].
These events are defined as follows: T consists of all patients who would recover
if given treatment; C consists of all patients who would recover if not given treatment (i.e., if included in the control group). The event S consists of all patients
actually receiving treatment. The average causal effect of the treatment is defined as
Pr[T ] − Pr[C].
• a. 2 points Show that
(17.0.6)
Pr[T] = Pr[T|S] Pr[S] + Pr[T|S̄](1 − Pr[S])
and that
(17.0.7)
Pr[C] = Pr[C|S] Pr[S] + Pr[C|S̄](1 − Pr[S])
Which of these probabilities can be estimated as the frequencies of observable outcomes
and which cannot?
Answer. This is a direct application of (2.7.9). The problem here is that for all ω ∈ S̄, i.e., for those patients who do not receive treatment, we do not know whether they would have recovered if given treatment, and for all ω ∈ S, i.e., for those patients who do receive treatment, we do not know whether they would have recovered if not given treatment. In other words, neither Pr[T|S̄] nor Pr[C|S] can be estimated as the frequencies of observable outcomes.
• b. 2 points Assume now that S is independent of T and C, because the subjects
are assigned randomly to treatment or control. How can this be used to estimate those
elements in the equations (17.0.6) and (17.0.7) which could not be estimated before?
Answer. In this case, Pr[T|S] = Pr[T|S̄] and Pr[C|S̄] = Pr[C|S]. Therefore, the average causal effect can be simplified as follows:
Pr[T] − Pr[C] = Pr[T|S] Pr[S] + Pr[T|S̄](1 − Pr[S]) − (Pr[C|S] Pr[S] + Pr[C|S̄](1 − Pr[S]))
= Pr[T|S] Pr[S] + Pr[T|S](1 − Pr[S]) − (Pr[C|S̄] Pr[S] + Pr[C|S̄](1 − Pr[S]))
(17.0.8)
= Pr[T|S] − Pr[C|S̄]
• c. 2 points Why were all these calculations necessary? Could one not have defined from the beginning that the causal effect of the treatment is Pr[T|S] − Pr[C|S̄]?
Answer. Pr[T|S] − Pr[C|S̄] is only the empirical difference in recovery frequencies between
those who receive treatment and those who do not. It is always possible to measure these differences,
but these differences are not necessarily due to the treatment but may be due to other reasons.
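A small simulation illustrates the point of parts b and c: under random assignment, the observable difference Pr[T|S] − Pr[C|S̄] recovers the average causal effect Pr[T] − Pr[C] (the recovery probabilities below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
# T: would recover if treated; C: would recover if not treated.
T = rng.random(n) < 0.7
C = rng.random(n) < 0.4
# S: actually treated, assigned at random, independently of T and C.
S = rng.random(n) < 0.5

true_effect = T.mean() - C.mean()        # Pr[T] - Pr[C], needs counterfactuals
observable = T[S].mean() - C[~S].mean()  # Pr[T|S] - Pr[C|S̄], estimable from data

print(true_effect, observable)           # close to each other, near 0.3
```

If S were correlated with T and C (say, healthier patients more likely to be treated), the observable difference would drift away from the true effect.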
The main message of the paper is therefore: before drawing causal conclusions one should ascertain whether one of the conditions that make causal conclusions possible applies.
In the rest of the paper, Holland compares his approach with other approaches.
Suppes’s definitions of causality are interesting:
• If r < s denote two time values, event C_r is a prima facie cause of E_s iff Pr[E_s|C_r] > Pr[E_s].
• C_r is a spurious cause of E_s iff it is a prima facie cause of E_s and for some q < r < s there is an event D_q so that Pr[E_s|C_r, D_q] = Pr[E_s|D_q] and Pr[E_s|C_r, D_q] ≥ Pr[E_s|C_r].
• Event C_r is a genuine cause of E_s iff it is a prima facie but not a spurious cause.
This is quite different from Rubin’s analysis. Suppes concentrates on the causes of a
given effect, not the effects of a given cause. Suppes has a Popperian falsificationist
view: a hypothesis is good if one cannot falsify it, while Holland has the depth-realist
view which says that the empirical is only a small part of reality, and which looks at
the underlying mechanisms.
Problem 227. Construct an example of a probability field with a spurious cause.
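One possible construction can be checked by direct enumeration (a sketch; the numbers are arbitrary illustrative choices): take an earlier event D_q that is a common cause of C_r and E_s, with C_r and E_s conditionally independent given D_q.

```python
from itertools import product

# A finite probability field over (D_q, C_r, E_s). Given D_q, the events
# C_r and E_s are conditionally independent; the probabilities are
# illustrative assumptions.
def p(d, c, e):
    pd = 0.5                        # Pr[D_q]
    pc = 0.8 if d else 0.2          # Pr[C_r | D_q], Pr[C_r | not D_q]
    pe = 0.8 if d else 0.2          # Pr[E_s | D_q], Pr[E_s | not D_q]
    return (pd if d else 1 - pd) * (pc if c else 1 - pc) * (pe if e else 1 - pe)

def pr(pred):
    return sum(p(d, c, e) for d, c, e in product([0, 1], repeat=3) if pred(d, c, e))

pE    = pr(lambda d, c, e: e)
pE_C  = pr(lambda d, c, e: e and c) / pr(lambda d, c, e: c)
pE_CD = pr(lambda d, c, e: e and c and d) / pr(lambda d, c, e: c and d)
pE_D  = pr(lambda d, c, e: e and d) / pr(lambda d, c, e: d)

print(pE_C > pE)                                     # prima facie cause
print(abs(pE_CD - pE_D) < 1e-9 and pE_CD >= pE_C)    # spurious per Suppes
```

Here Pr[E_s|C_r] = 0.68 > Pr[E_s] = 0.5, so C_r is a prima facie cause, yet conditioning on the common cause D_q screens C_r off, making it spurious.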
Granger causality (see chapter/section 67.2.1) is based on the idea: knowing
a cause ought to improve our ability to predict. It is more appropriate to speak
here of “noncausality” instead of causality: a variable does not cause another if
knowing that variable does not improve our ability to predict the other variable.
Granger formulates his theory in terms of a specific predictor, the BLUP, while
Holland extends it to all predictors. Granger works in a time-series framework,
while Holland gives a more general formulation. Holland’s formulation strips off the
unnecessary detail in order to get at the essence of things. Holland defines: x is not
a Granger cause of y relative to the information in z (which in the time-series context
contains the past values of y) if and only if x and y are conditionally independent
given z. Problem 40 explains why this can be tested by testing predictive power.
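The predictive-power test can be sketched in a toy simulation (an illustrative least-squares comparison, not Granger’s own test statistic): when past x enters the data-generating process for y, adding past x to the predictor information z = (past y) reduces the residual sum of squares noticeably.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    # y depends on its own past and on past x: here x Granger-causes y.
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

# Predict y[t] from z = y[t-1] alone, then from (y[t-1], x[t-1]).
Z  = np.column_stack([np.ones(n - 1), y[:-1]])
Zx = np.column_stack([np.ones(n - 1), y[:-1], x[:-1]])
target = y[1:]

def rss(X):
    b, *_ = np.linalg.lstsq(X, target, rcond=None)
    return ((target - X @ b) ** 2).sum()

# Including past x improves the prediction substantially, so x is not
# Granger-noncausal for y given z.
print(rss(Z), rss(Zx))
```

If the 0.8 coefficient were set to zero, the two residual sums of squares would be nearly equal, which is exactly the conditional-independence formulation of Granger noncausality.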