Causal Inference in Python Blog

Causal Inference in Python, or Causalinference in short, is a software package that implements various statistical and econometric methods used in the field variously known as Causal Inference, Program Evaluation, or Treatment Effect Analysis.

Through a series of blog posts on this page, I will illustrate the use of Causalinference, as well as provide high-level summaries of the underlying econometric theory with the non-specialist audience in mind. Source code for the package can be found at its GitHub page, and detailed documentation is available at causalinferenceinpython.org.

Matching

One way to overcome the problem of excessive extrapolation by least squares is to take the unconfoundedness assumption at its word and nonparametrically match subjects with similar covariate values. As we shall see, least squares still plays an important role under this approach, but only a local one.

Recall that unconfoundedness says that conditional on \(X\), treatment assignment is as good as random. This means that conditional on \(X\), we should be able to estimate the conditional average treatment effect \(\mathrm{E}[Y(1)-Y(0)|X]\) by simply computing the difference between the average outcomes of the treated and control subjects that share similar covariate values. Once the conditional average treatment effects have been identified and estimated, we should then be able to recover the unconditional average treatment effect by aggregating them appropriately. This is the matching estimator of Abadie and Imbens (2006) in a nutshell.
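The aggregation step is just the law of iterated expectations: averaging the conditional average treatment effects over the distribution of \(X\) recovers the overall effect, $$\mathrm{E}[Y(1)-Y(0)] = \mathrm{E}\big[\mathrm{E}[Y(1)-Y(0)|X]\big].$$ Aggregating over the covariate distribution of the treated or control subsample instead yields the ATT or ATC reported in the output below.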

More specifically, the estimator matches each unit \(i\) in the sample with a unit \(m(i)\) in the opposite treatment group, where $$m(i) = \mathrm{argmin}_{j: D_j \neq D_i} \|X_j - X_i\|.$$

Here \(\|X_j - X_i\|\) denotes some measure of distance between the covariate vectors \(X_j\) and \(X_i\). More precisely, its square is given by $$\|X_j - X_i\|^2 = (X_j-X_i)'\, W\, (X_j-X_i),$$ where \(W\) is a weighting matrix; since squaring does not change the minimizer, we can work directly with the squared distance.

By varying the positive-definite weighting matrix \(W\) we can obtain different measures of distance. One reasonable candidate for \(W\) is the inverse variance matrix \(\mathrm{diag}\{\hat{\sigma}_1^{-2}, \ldots, \hat{\sigma}_K^{-2}\}\), where \(\hat{\sigma}_k\) denotes the sample standard deviation of the \(k\)th covariate. Using this weighting matrix ensures that each covariate is put on a comparable scale before being aggregated.
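To make the mechanics concrete, here is a minimal NumPy sketch of how a single unit's nearest match can be found under inverse variance weighting. This is not part of Causalinference; the function name nearest_match and the toy data are made up for illustration, and in practice the candidate pool would be restricted to the opposite treatment group:

>>> import numpy as np
>>> def nearest_match(x_i, X_pool, W):
...     """Index of the unit in X_pool closest to x_i under the metric W."""
...     diffs = X_pool - x_i                                # matching discrepancies X_j - X_i
...     dists = np.einsum('nk,kl,nl->n', diffs, W, diffs)   # (X_j-X_i)' W (X_j-X_i) for each j
...     return np.argmin(dists)
...
>>> X_demo = np.random.rand(100, 3)          # toy covariate matrix
>>> W = np.diag(1 / X_demo.var(axis=0))      # inverse variance weighting matrix
>>> nearest_match(X_demo[0], X_demo[1:], W)  # index of the unit closest to subject 0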

Once the matching is complete, we can estimate the subject-level treatment effect by calculating the difference in observed outcomes between the subject and its matched twin. Averaging over these individual treatment effect estimates gives an estimate of the overall average treatment effect.
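In symbols, with a single match per unit, the estimator of the overall average treatment effect is $$\hat{\tau} = \frac{1}{N}\sum_{i=1}^{N}(2D_i - 1)\left(Y_i - Y_{m(i)}\right),$$ where the factor \(2D_i - 1\) simply flips the sign of the difference for control units, whose matched twins are treated.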

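The examples below assume that causal is an instance of CausalModel, carried over from the earlier posts in this series. Readers starting fresh can construct a stand-in using the package's random_data utility, though the estimates obtained will then differ from the ones reported below:

>>> from causalinference import CausalModel
>>> from causalinference.utils import random_data
>>> Y, D, X = random_data()   # simulated outcome, treatment status, and covariates
>>> causal = CausalModel(Y, D, X)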
In Causalinference, we can implement this matching estimator and display the results as follows:

>>> causal.est_via_matching()
>>> print(causal.estimates)

Treatment Effect Estimates: Matching

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE     14.245      1.038     13.728      0.000     12.211     16.278
           ATC     10.288      1.815      5.669      0.000      6.731     13.845
           ATT     16.796      0.940     17.866      0.000     14.953     18.638

While the basic matching estimator is theoretically sound, its performance above leaves something to be desired: the ATE estimate of 14.245 is still quite far from the true value of 10. One reason is that in practice, the matching of one subject to another is rarely perfect. To the extent that a matching discrepancy exists, i.e., that \(X_i\) and \(X_{m(i)}\) are not equal, the matching estimator of the subject-level treatment effect will generally be biased.

It turns out it is possible to correct for this bias. The bias arises because the matched twin's outcome stands in for the unit's unobserved potential outcome at a slightly different covariate value. In particular, one can show that the unit-level bias for a treated unit is equal to $$\mathrm{E}[Y(0)|X=X_i] - \mathrm{E}[Y(0)|X=X_{m(i)}],$$ which is zero only when the match is exact.

A popular way of adjusting for this bias is to assume a linear specification for the conditional expectation function of \(Y(0)\) given \(X\), and to approximate the above term by the inner product of the matching discrepancy \(X_i - X_{m(i)}\) and the slope coefficient vector from an ancillary regression. The same principle of course applies to control units, with the roles of \(Y(0)\) and \(Y(1)\) reversed.
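Concretely, if \(\mathrm{E}[Y(0)|X] = \alpha_0 + \beta_0'X\), the bias term above reduces to $$\beta_0'\left(X_i - X_{m(i)}\right),$$ so subtracting the estimate \(\hat{\beta}_0'(X_i - X_{m(i)})\), with \(\hat{\beta}_0\) obtained by regressing observed outcomes on covariates within the control group, removes the bias up to sampling error in \(\hat{\beta}_0\).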

Although it might seem like we are back to assuming a linear regression function as was the case with OLS, the role played by the linear approximation is quite different here. In the OLS case, we are using the linearity assumption to extrapolate globally across the covariate space. In the current scenario, however, the linear approximation is only applied locally, to matched units whose covariate values were already quite similar.

To invoke bias adjustment in Causalinference, we simply supply True to the optional argument bias_adj, as follows:

>>> causal.est_via_matching(bias_adj=True)
>>> print(causal.estimates)

Treatment Effect Estimates: Matching

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      9.624      0.245     39.354      0.000      9.145     10.103
           ATC      9.642      0.270     35.776      0.000      9.114     10.170
           ATT      9.606      0.318     30.159      0.000      8.981     10.230

As we can see above, the resulting ATE estimate is now much closer to the true ATE of 10.

In addition to bias adjustment, est_via_matching accepts two other optional parameters worth mentioning. The first is weights, which allows users to supply their own positive-definite weighting matrix for calculating distances between covariate vectors. The second is matches, which allows users to implement multiple matching by supplying an integer greater than 1. Setting matches=3, for instance, will match the three closest units in the opposite group to each subject. In general, increasing this number introduces bias (since less ideal matches are being included) but lowers variance (as the counterfactual estimates are less dependent on any single unit). Typically it is advised that the number of matches be kept under 4, though there are no hard-and-fast rules.
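As an illustration, the two options can be combined with bias adjustment. The following sketch supplies an identity matrix as the weighting matrix, which amounts to unweighted Euclidean distance, and requests three matches per subject (the covariate matrix X here is the one from the stand-in setup above):

>>> import numpy as np
>>> W = np.identity(X.shape[1])   # identity weighting: plain Euclidean distance
>>> causal.est_via_matching(weights=W, matches=3, bias_adj=True)
>>> print(causal.estimates)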

References

Abadie, A. & Imbens, G. (2006). Large sample properties of matching estimators for average treatment effects. Econometrica, 74, 235-267.