I admit I'm relatively new to propensity scores and causal analysis. What's different about the operation, and why is it better than adding subpopulation covariates in a regression? After matching, the difference in averages between the subjects who participated in the intervention and those who did not can be interpreted as the impact of the program. I think a more satisfactory answer to your question lies not in the math behind propensity scores but in their logic. One caution: when a subject's estimated propensity score is close to zero, the inverse of the propensity score becomes extremely large, asymptotically infinite.
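The extreme-weight problem is easy to see with a toy calculation (the scores below are made up purely for illustration):

```python
# Illustrative only: inverse-propensity weights explode as the
# estimated propensity score approaches zero.
scores = [0.5, 0.1, 0.01, 0.001]   # hypothetical estimated propensity scores
weights = [1 / p for p in scores]  # inverse-propensity weights
print(weights)
# the subject with score 0.001 receives weight 1000 and can dominate the estimate
```

This is why analysts often inspect or truncate the largest weights before using them.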
Stata version 13 and later also offers the built-in command teffects psmatch. Whether they are 'better' depends only on that property, which will vary from problem to problem. Let Y0 and Y1 denote the potential outcomes under control and treatment, respectively. For example, you could match each observation with its three nearest neighbors with:

teffects psmatch (y) (t x1 x2), nn(3)

Postestimation

By default teffects psmatch does not add any new variables to the data set.
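To make the idea of nearest-neighbor matching on the propensity score concrete, here is a minimal Python sketch with fabricated numbers. It matches each treated unit to the control with the closest score and averages the outcome differences (an ATT-style estimate); this is only the core idea, not what teffects psmatch does internally (which includes with-replacement matching and variance corrections):

```python
# Hypothetical data: (estimated propensity score, observed outcome)
treated = [(0.60, 12.0), (0.70, 15.0)]
controls = [(0.55, 10.0), (0.72, 11.0), (0.30, 8.0)]

def att(treated, controls):
    """1-nearest-neighbor matching on the propensity score."""
    diffs = []
    for score_t, y1 in treated:
        # match each treated unit to the control with the closest score
        _, y0 = min(controls, key=lambda c: abs(c[0] - score_t))
        diffs.append(y1 - y0)
    return sum(diffs) / len(diffs)

print(att(treated, controls))  # average of (12-10) and (15-11) = 3.0
```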
However, weighting is likely to increase random error in the estimates, and to bias the estimated standard errors downward, even when selection mechanisms are well understood. However, this cannot be done with people, because people differ from one another: they come in different shapes and sizes, ages, ethnicities, and so on. All the other assumptions are essentially the same between regression and matching. Matching by propensity scores eliminates the linearity assumption, but, because some observations may not be matched, you may not be able to say anything about certain groups. We are going to stratify the response variable. Certainly, one of the appeals of weighting rather than matching is that it should make the overall process more suitable for sitting inside a bootstrap.
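The point about bootstrapping can be sketched as follows: because each unit carries a weight rather than a discrete match, a resample of units can simply be re-weighted and re-averaged. The data below are fabricated, and for brevity the weights are taken as given (a full analysis would re-fit the propensity model inside each replicate):

```python
import random

random.seed(1)
# Hypothetical (outcome, inverse-propensity weight) pairs
data = [(1.0, 2.0), (2.0, 1.5), (3.0, 4.0), (2.5, 1.2), (1.8, 3.1)]

def weighted_mean(sample):
    return sum(y * w for y, w in sample) / sum(w for _, w in sample)

# Resample units with replacement and recompute the weighted mean each time
reps = sorted(weighted_mean(random.choices(data, k=len(data)))
              for _ in range(1000))
print(reps[25], reps[975])  # a simple 95% percentile interval
```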
This presents a statistical advantage, certainly, but nothing more. As the procedure only controls for observed variables, any hidden bias due to latent variables may remain after matching. A regression on the matched control and treatment data, even using the same explanatory variables as were used in the matching model, helps address the inevitable lack of complete balance between the two groups. It can also easily be implemented manually. The basic syntax of the teffects command when used for propensity score matching is:

teffects psmatch (outcome) (treatment covariates)

In this case the basic command would be:

teffects psmatch (y) (t x1 x2)

However, the default behavior of teffects is not the same as psmatch2, so we'll need to use some options to get the same results.
We thus strongly recommend switching from psmatch2 to teffects psmatch, and this article will help you make the transition. However, it is very important that characteristics which may have been affected by the treatment are not included. General concerns with matching have also been raised: it has been argued that hidden bias may actually increase because matching on observed variables may unleash bias due to dormant unobserved confounders. Propensity scores are used to reduce selection bias by equating groups based on these covariates.
However, no compelling underlying theoretical reason has been presented. Most importantly, they should be able to immediately begin using inverse propensity weighting in their research, using any statistical software program. Approximately 650 households were excluded after the matching process due to the unavailability of sufficiently similar households. Note that these observation numbers are only valid in the current sort order, so make sure you can recreate that order if needed.
A recent paper by Abadie and Imbens (2012), of Harvard University and the National Bureau of Economic Research, established how to take into account that propensity scores are estimated, and teffects psmatch relies on their work. But if the two groups do not have substantial overlap, then substantial error may be introduced. Once all relevant covariates are selected for inclusion, a logit or a probit regression is performed and the predicted probabilities are obtained. If investigators have a good causal model, it seems better just to fit the model without weights. Here are the average incomes of the treatment and non-treatment groups using the full set of inverse probability weights, and another set truncated at 10. Regression simply extrapolates without checking for overlap, so extrapolations can give poor predictions.
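Truncating weights at 10 can be sketched as follows. The incomes and propensity scores below are fabricated for illustration (they are not the study's numbers); the point is only how capping the largest weights changes a weighted average:

```python
# Hypothetical incomes with their estimated propensity scores
incomes = [30_000.0, 45_000.0, 52_000.0, 80_000.0]
scores = [0.40, 0.25, 0.05, 0.02]

full = [1 / p for p in scores]             # 2.5, 4.0, 20.0, 50.0
truncated = [min(w, 10.0) for w in full]   # cap every weight at 10

def wmean(y, w):
    """Weighted mean of y given weights w."""
    return sum(a * b for a, b in zip(y, w)) / sum(w)

print(wmean(incomes, full))       # dominated by the two small-score subjects
print(wmean(incomes, truncated))  # truncation pulls their influence back
```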
The possibility of bias arises because the apparent difference in outcome between these two groups of units may depend on characteristics that affected whether or not a unit received a given treatment, rather than on the effect of the treatment per se. Since the goal of this propensity score model is to obtain the best estimated probability of treatment assignment, one is not concerned with over-parameterizing this model. This regression has an N of 666: 333 from the treated group and 333 from the control group. He divides the rats into two groups and tests the effects of the drug on one of them, the treatment group.
The authors concluded that access to piped water reduced disease prevalence by 21% and illness duration by 29%. Moreover, in some cases, weighting will increase the bias in estimated causal parameters. For example, if you are studying a worker training program, all the enrollees may be men while the control, non-participant population is composed of both men and women. This procedure matches cases and controls by utilizing random draws from the controls, based on a specified set of key variables. In addition to the answers here, I would also suggest you check out the answers to the question chl cited.
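The random-draw procedure can be sketched as follows, assuming exact matching on the key variables. The records, variable names (`sex`, `age_band`), and the without-replacement choice are all hypothetical illustrations, not the procedure's actual specification:

```python
import random

random.seed(0)
cases = [{"id": 1, "sex": "M", "age_band": "30-39"},
         {"id": 2, "sex": "F", "age_band": "40-49"}]
controls = [{"id": 10, "sex": "M", "age_band": "30-39"},
            {"id": 11, "sex": "M", "age_band": "30-39"},
            {"id": 12, "sex": "F", "age_band": "40-49"}]

def match(cases, controls, keys=("sex", "age_band")):
    """For each case, randomly draw one control that matches on the key variables."""
    pairs = []
    pool = list(controls)
    for case in cases:
        eligible = [c for c in pool if all(c[k] == case[k] for k in keys)]
        pick = random.choice(eligible)  # random draw among matching controls
        pool.remove(pick)               # match without replacement
        pairs.append((case["id"], pick["id"]))
    return pairs

pairs = match(cases, controls)
print(pairs)
```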