Causal Inference in Python Blog

Causal Inference in Python, or Causalinference in short, is a software package that implements various statistical and econometric methods used in the field variously known as Causal Inference, Program Evaluation, or Treatment Effect Analysis.

Through a series of blog posts on this page, I will illustrate the use of Causalinference, as well as provide high-level summaries of the underlying econometric theory with the non-specialist audience in mind. Source code for the package can be found at its GitHub page, and detailed documentation is available at causalinferenceinpython.org.

Weighting

In this post we will look at one additional treatment effect estimator — the so-called doubly-robust weighting estimator.

First, it turns out that under unconfoundedness, the following two equalities are true: $$\mathrm{E}\left[\frac{D Y}{p(X)}\right] = \mathrm{E}[Y(1)] \quad \mbox{and} \quad \mathrm{E}\left[\frac{(1-D)Y}{1-p(X)}\right] = \mathrm{E}[Y(0)].$$

This in turn suggests that the expectations of the potential outcomes can be estimated by the corresponding sample averages: $$\hat{\mathrm{E}}[Y(1)] = \frac{1}{N} \sum_{i=1}^N \frac{D_i Y_i}{p(X_i)} \quad \mbox{and} \quad \hat{\mathrm{E}}[Y(0)] = \frac{1}{N} \sum_{i=1}^N \frac{(1-D_i) Y_i}{1-p(X_i)}.$$

The difference between these two averages is thus a valid estimator of the average treatment effect \(\mathrm{E}[Y(1)-Y(0)]\). This estimator, also known as the Horvitz-Thompson estimator, is closely related to the inverse probability weighted estimators one might see in the missing data literature. In general, inverse probability weighting is used to inflate the weights for subjects who are underrepresented, thereby eliminating the bias that missing data might introduce. In our case, because the propensity score \(p(X)\) represents the probability of observing \(Y(1)\), inverse weighting by \(p(X)\) gives us exactly the right adjustment for eliminating the bias from selection.
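To make this concrete, here is a minimal NumPy sketch of the Horvitz-Thompson estimator. It is not part of the Causalinference API; the helper name and the assumption that Y, D, and pX are arrays holding the outcomes, treatment indicators, and known propensity scores are mine.

import numpy as np

def horvitz_thompson_ate(Y, D, pX):
    # Inverse-probability-weighted estimates of E[Y(1)] and E[Y(0)]
    Ey1 = np.mean(D * Y / pX)
    Ey0 = np.mean((1 - D) * Y / (1 - pX))
    return Ey1 - Ey0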

Since the true propensity score is rarely known in practice, we typically use the estimated propensity score \(\hat{p}\) and the modified estimator $$\hat{\mathrm{E}}[Y(1)-Y(0)] = \left(\sum_{i=1}^N \frac{D_i}{\hat{p}(X_i)}\right)^{-1} \sum_{i=1}^N \frac{D_i Y_i}{\hat{p}(X_i)} - \left(\sum_{i=1}^N \frac{1-D_i}{1-\hat{p}(X_i)}\right)^{-1} \sum_{i=1}^N \frac{(1-D_i) Y_i}{1-\hat{p}(X_i)}.$$ Unlike the simple averages above, this version normalizes the weights so that they sum to one within each treatment group, which tends to behave better in finite samples.
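In code, the normalized version differs from the sketch above only in that each group's weights are rescaled to sum to one. Again a hypothetical helper, assuming p_hat holds the estimated propensity scores:

import numpy as np

def ipw_ate(Y, D, p_hat):
    # Weights for treated and control observations
    w1 = D / p_hat
    w0 = (1 - D) / (1 - p_hat)
    # Normalize each group's weights to sum to one
    return np.sum(w1 * Y) / np.sum(w1) - np.sum(w0 * Y) / np.sum(w0)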

Alternatively, it is possible to compute the above estimator by running weighted least squares with the regression function $$Y_i = \alpha + \beta D_i + \varepsilon_i,$$

with weights given by $$\hat{\lambda}_i = \frac{1}{(1-\hat{p}(X_i))^{1-D_i} \hat{p}(X_i)^{D_i}}.$$

Expressed this way, it is easy to modify the estimator to further control for covariates, by including them in the regression function: $$Y_i = \alpha + \beta D_i + \gamma' X_i + \varepsilon_i.$$

Running weighted least squares on the above regression function with weights \(\hat{\lambda}\) yields the so-called doubly-robust estimator. This estimator has the property that as long as either the specification of the propensity score or the specification of the regression function is correct, it will be consistent for the true average treatment effect.
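For readers who want to see the weighted least squares formulation spelled out, here is an illustrative sketch using statsmodels. This is a stand-in of my own for exposition, not a reproduction of what Causalinference does internally:

import numpy as np
import statsmodels.api as sm

def doubly_robust_ate(Y, D, X, p_hat):
    # lambda_i = 1 / ((1 - p_hat)^(1 - D) * p_hat^D)
    lam = 1.0 / ((1 - p_hat) ** (1 - D) * p_hat ** D)
    # Design matrix [1, D, X]; the coefficient on D estimates the ATE
    Z = sm.add_constant(np.column_stack((D, X)))
    return sm.WLS(Y, Z, weights=lam).fit().params[1]

Since running weighted least squares on this regression with weights \(\hat{\lambda}\) is exactly what yields the doubly-robust estimator, est_via_weighting below computes the same quantity.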

To compute this estimator using Causalinference, we simply run est_via_weighting, as follows:

>>> from causalinference import CausalModel
>>> from causalinference.utils import vignette_data
>>> Y, D, X = vignette_data()
>>> causal = CausalModel(Y, D, X)
>>> causal.est_propensity_s()
>>> causal.est_via_weighting()
>>> print(causal.estimates)

Treatment Effect Estimates: Weighting

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE     17.989      1.443     12.469      0.000     15.161     20.816

Unfortunately, despite the apparent desirability of the double robustness property, it does not apply in this case, as we have misspecified both the propensity score and the regression function. Furthermore, because the estimated propensity scores enter the denominator, noise in those estimates can generate considerable bias, especially when some scores are close to zero or one. As a result, the estimated average treatment effect we see above turns out to be quite far from the true value of 10. For a more detailed discussion of the relative merits of weighting estimators, see Imbens and Rubin (2015).

References

Imbens, G. & Rubin, D. (2015). Causal inference in statistics, social, and biomedical sciences: An introduction. Cambridge University Press.