Propensity Score Analysis









The PS is a probability. In fact, it is a conditional probability of being exposed given a set of covariates, Pr(E+|covariates). We can calculate a PS for each subject in an observational study regardless of her actual exposure.

Once we have a PS for each subject, we then return to the real world of exposed and unexposed. We can match exposed subjects with unexposed subjects with the same (or very similar) PS. Within each matched pair, both subjects had the same probability of being exposed, so which of the two was actually exposed is effectively “random.”


Propensity score analysis (PSA) arose as a way to achieve exchangeability between exposed and unexposed groups in observational studies without relying on traditional model building. Exchangeability is critical to our causal inference.

In experimental studies (e.g. randomized controlled trials with equal allocation), the probability of being exposed is 0.5. Thus, the probability of being unexposed is also 0.5. The probability of being exposed or unexposed is the same. Therefore, a subject’s actual exposure status is random.

This equal probability of exposure makes us feel more comfortable asserting that the exposed and unexposed groups are alike on all factors except their exposure. Therefore, we say that we have exchangeability between groups.

One of the biggest challenges with observational studies is that exposure is not assigned at random: the probability of being in the exposed or unexposed group differs from subject to subject.

There are several occasions where an experimental study is not feasible or ethical. But we still would like the exchangeability of groups achieved by randomization. PSA helps us to mimic an experimental study using data from an observational study.

Conducting PSA

5 Briefly Described Steps to PSA
1. Decide on the set of covariates you want to include.
2. Use logistic regression to obtain a PS for each subject.
3. Match exposed and unexposed subjects on the PS.
4. Check the balance of covariates in the exposed and unexposed groups after matching on PS.
5. Calculate the effect estimate and standard errors with this matched population.

1. Decide on the set of covariates you want to include.
This is the critical step in your PSA. We use these covariates to predict our probability of exposure. We want to include all predictors of the exposure and none of the effects of the exposure. We do not consider the outcome in deciding upon our covariates. We may include confounders and interaction variables. If we are in doubt about a covariate, we include it in our set of covariates (unless we think that it is an effect of the exposure).

2. Use logistic regression to obtain a PS for each subject.
We use the covariates to predict the probability of being exposed (which is the PS). The more true predictors of exposure we include, the better our prediction of the probability of being exposed. We calculate a PS for all subjects, exposed and unexposed.

Using numbers and Greek letters:
ln(PS / (1 - PS)) = β0 + β1X1 + … + βpXp
PS = exp(β0 + β1X1 + … + βpXp) / (1 + exp(β0 + β1X1 + … + βpXp))
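As a minimal sketch of step 2, the two formulas above can be written in plain Python. The toy data, the function names, and the use of simple gradient ascent to estimate the coefficients are illustrative assumptions, not part of the original text; in practice a statistical package's logistic regression routine would be used instead.

```python
import math

def inv_logit(z):
    """Map a linear predictor z to a probability, avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def propensity_score(beta, x):
    """PS = exp(b0 + b1*x1 + ...) / (1 + exp(b0 + b1*x1 + ...)).

    beta[0] is the intercept; beta[1:] pair with the covariates in x.
    """
    z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return inv_logit(z)

def fit_logistic(X, exposed, lr=0.1, n_iter=5000):
    """Estimate beta by plain gradient ascent on the log-likelihood.

    Illustrative only: real software uses faster, more robust optimizers.
    """
    beta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for x, e in zip(X, exposed):
            resid = e - propensity_score(beta, x)
            grad[0] += resid
            for j, xj in enumerate(x):
                grad[j + 1] += resid * xj
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: one covariate per subject and an exposure indicator.
X = [[2.0], [3.0], [4.0], [5.0], [6.0], [7.0]]
exposed = [0, 0, 1, 0, 1, 1]

beta = fit_logistic(X, exposed)
# A PS is calculated for every subject, exposed and unexposed alike.
scores = [propensity_score(beta, x) for x in X]
for x, e, s in zip(X, exposed, scores):
    print(f"covariate={x[0]:.1f} exposed={e} PS={s:.3f}")
```

Each score is the subject's predicted probability of exposure given the covariate, regardless of whether that subject was actually exposed.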

3. Match exposed and unexposed subjects on the PS.
We want to match the exposed and unexposed subjects on their probability of being exposed (their PS). If we cannot find a suitable match, then that subject is discarded. Discarding a subject can introduce bias into our analysis.

Several methods for matching exist. The most common is nearest-neighbor matching within calipers. The nearest neighbor is the unexposed subject whose PS is nearest to the PS of our exposed subject.

We may not be able to find an exact match, so we accept a PS within certain caliper bounds. We set an a priori value for the calipers, typically between +/-0.01 and +/-0.05. With calipers narrower than 0.01, the estimate can become highly variable because we have difficulty finding matches and must discard those subjects (incomplete matching). With calipers wider than 0.05, we may be less confident that our exposed and unexposed subjects are truly exchangeable (inexact matching). Typically, 0.01 is chosen as the cutoff.

The ratio of exposed to unexposed subjects is variable. 1:1 matching may be done, but oftentimes matching with replacement is done instead to allow for better matches. Matching with replacement allows for the unexposed subject that has been matched with an exposed subject to be returned to the pool of unexposed subjects available for matching.

There is a trade-off in bias and precision between matching with replacement and without (1:1). Matching with replacement allows for reduced bias because of better matching between subjects. Matching without replacement has better precision because more subjects are used.
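The matching step can be sketched as a greedy nearest-neighbor search within calipers. This is one simple way to do it, written under assumptions: the function name, the toy PS values, and the caliper of 0.03 are hypothetical, and greedy matching depends on the order in which exposed subjects are processed. Dedicated matching software offers optimal and other variants.

```python
def match_nearest_within_caliper(ps_exposed, ps_unexposed, caliper=0.01, replace=False):
    """Greedy nearest-neighbor matching on the PS.

    Returns a list of (exposed_index, unexposed_index) pairs. Exposed subjects
    with no unexposed subject within the caliper stay unmatched (discarded).
    """
    available = set(range(len(ps_unexposed)))
    pairs = []
    for i, ps_e in enumerate(ps_exposed):
        if not available:
            break
        # Closest available unexposed subject by absolute PS difference;
        # sorting first makes ties resolve to the lowest index.
        j = min(sorted(available), key=lambda k: abs(ps_unexposed[k] - ps_e))
        if abs(ps_unexposed[j] - ps_e) <= caliper:
            pairs.append((i, j))
            if not replace:
                available.remove(j)  # each unexposed subject is used only once
    return pairs

# Hypothetical propensity scores.
ps_exposed = [0.30, 0.31, 0.70, 0.90]
ps_unexposed = [0.305, 0.45, 0.72, 0.33]

without = match_nearest_within_caliper(ps_exposed, ps_unexposed, caliper=0.03)
with_repl = match_nearest_within_caliper(ps_exposed, ps_unexposed, caliper=0.03, replace=True)
print("without replacement:", without)
print("with replacement:   ", with_repl)
```

In this toy example the second exposed subject gets a closer match when replacement is allowed, because the best unexposed subject is returned to the pool, while matching without replacement uses more distinct unexposed subjects. The last exposed subject has no unexposed subject within the caliper and is discarded in both cases.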

4. Check the balance of covariates in the exposed and unexposed groups after matching on PS.
Substantial overlap in covariates between the exposed and unexposed groups must exist for us to make causal inferences from our data. This is true in all models, but in PSA, it becomes visually very apparent. If there is no overlap in covariates (i.e. if we have no overlap of propensity scores), then all inferences would be made off-support of the data (and thus, conclusions would be model dependent).
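One rough way to check overlap numerically is to compute the range of PS values covered by both groups and flag subjects outside it. The min/max definition of common support used here is an assumed convention chosen for illustration, and the function names and PS values are hypothetical.

```python
def common_support(ps_exposed, ps_unexposed):
    """Return (low, high), the PS range covered by both groups, or None."""
    low = max(min(ps_exposed), min(ps_unexposed))
    high = min(max(ps_exposed), max(ps_unexposed))
    return (low, high) if low <= high else None

def off_support(ps, support):
    """Indices of subjects whose PS falls outside the common-support range."""
    if support is None:
        return list(range(len(ps)))
    low, high = support
    return [i for i, p in enumerate(ps) if p < low or p > high]

# Hypothetical propensity scores for each group.
ps_exposed = [0.35, 0.50, 0.65, 0.80, 0.95]
ps_unexposed = [0.05, 0.20, 0.40, 0.55, 0.70]

support = common_support(ps_exposed, ps_unexposed)
print("common support:", support)
print("exposed off support:  ", off_support(ps_exposed, support))
print("unexposed off support:", off_support(ps_unexposed, support))
```

Subjects flagged as off support have no comparable counterpart in the other group, so any inference about them would rest on the model rather than on the data.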

We can use three tools to assess our balance of covariates. First, we can create a histogram of the PS for exposed and unexposed groups. Second, we can assess the standardized difference. Third, we can assess the bias reduction.

Standardized difference = 100 * (mean(x exposed) - mean(x unexposed)) / sqrt((SD^2 exposed + SD^2 unexposed) / 2)

A standardized difference of more than 10% is considered bad: our covariates are distributed too differently between the exposed and unexposed groups for us to feel comfortable assuming exchangeability between groups.

Bias reduction = 1 - (|standardized difference matched| / |standardized difference unmatched|)

We would like to see substantial reduction in bias from the unmatched to the matched analysis. What substantial means is up to you.
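The standardized difference and bias reduction formulas can be sketched for a single covariate as follows. The ages are made-up illustrative values, the function names are hypothetical, and the use of the sample variance (n - 1 denominator) for SD^2 is an assumption.

```python
import math
import statistics

def standardized_difference(x_exposed, x_unexposed):
    """100 * (mean difference) / sqrt((SD^2 exposed + SD^2 unexposed) / 2)."""
    mean_diff = statistics.mean(x_exposed) - statistics.mean(x_unexposed)
    pooled_sd = math.sqrt(
        (statistics.variance(x_exposed) + statistics.variance(x_unexposed)) / 2
    )
    return 100 * mean_diff / pooled_sd

def bias_reduction(std_diff_matched, std_diff_unmatched):
    """1 - |standardized difference matched| / |standardized difference unmatched|."""
    return 1 - abs(std_diff_matched) / abs(std_diff_unmatched)

# Hypothetical ages before matching.
age_exposed_all = [60, 65, 70, 75]
age_unexposed_all = [40, 50, 55, 65]
# Hypothetical ages among the matched subjects.
age_exposed_matched = [60, 65, 70, 75]
age_unexposed_matched = [61, 64, 70, 74]

sd_unmatched = standardized_difference(age_exposed_all, age_unexposed_all)
sd_matched = standardized_difference(age_exposed_matched, age_unexposed_matched)
print(f"unmatched: {sd_unmatched:.1f}%  matched: {sd_matched:.1f}%")
print(f"bias reduction: {bias_reduction(sd_matched, sd_unmatched):.2f}")
```

In this toy example the standardized difference for age falls from well above 10% before matching to below 10% after matching. The same calculation would be repeated for every covariate in the PS model.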

5. Calculate the effect estimate and standard errors with this matched population.
Estimate of the average treatment effect on the treated (ATT) = sum(y exposed - y unexposed) / # of matched pairs

Standard errors may be calculated using bootstrap resampling methods.

The resulting matched pairs can also be analyzed using standard statistical methods, e.g. Kaplan-Meier or Cox proportional hazards models. You can also include the PS in the final analysis model as a continuous measure, or create quartiles of the PS and stratify.
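The ATT estimate and a bootstrap standard error from step 5 can be sketched as follows. The outcome values, function names, number of resamples, and seed are all hypothetical. Resampling the matched pairs is a simplification that treats the matching as fixed, so it ignores uncertainty from estimating the PS and forming the matches.

```python
import random
import statistics

def att(pair_outcomes):
    """ATT = sum(y exposed - y unexposed) / number of matched pairs."""
    return sum(y_e - y_u for y_e, y_u in pair_outcomes) / len(pair_outcomes)

def bootstrap_se(pair_outcomes, n_boot=2000, seed=0):
    """Standard error of the ATT from resampling matched pairs with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(pair_outcomes)
    estimates = []
    for _ in range(n_boot):
        resample = [pair_outcomes[rng.randrange(n)] for _ in range(n)]
        estimates.append(att(resample))
    return statistics.stdev(estimates)

# Hypothetical outcomes for five matched pairs: (y exposed, y unexposed).
matched_outcomes = [(12.0, 10.0), (9.0, 9.5), (15.0, 11.0), (8.0, 7.0), (11.0, 10.5)]

att_estimate = att(matched_outcomes)
se_estimate = bootstrap_se(matched_outcomes)
print(f"ATT = {att_estimate:.2f}, bootstrap SE = {se_estimate:.2f}")
```

With only five pairs the standard error is large relative to the estimate, which fits the point made later that PSA works best in large samples.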

A few more notes on PSA
PSA can be used for dichotomous or continuous exposures.
Because PSA can only address measured covariates, complete implementation should include sensitivity analysis to assess unobserved covariates.
PSA can be carried out in SAS, R, and Stata using add-ons that are available for download.
Though PSA has traditionally been used in epidemiology and biomedicine, it has also been used in educational testing (Rubin is one of the founders) and ecology (EPA has a website on PSA!).

Strengths and Limitations of PSA

Can include interaction terms in calculating the PS.
PSA uses one score instead of multiple covariates in estimating the effect. This allows an investigator to use dozens of covariates, which is not usually possible in traditional multivariable models because of limited degrees of freedom and zero count cells arising from stratifications of multiple covariates.
Can be used for dichotomous and continuous exposures (PSA for continuous exposures is an area of ongoing research).
Patients included in an observational study may be a more representative sample of “real world” patients than an RCT would provide.
Since we don’t use any information on the outcome when calculating the PS, no analysis based on the PS will bias effect estimation.
We avoid off-support inference.
We rely less on p-values and other model specific assumptions.
We don’t need to know causes of the outcome to create exchangeability.

The most serious limitation is that PSA only controls for measured covariates.
Group overlap must be substantial (to enable appropriate matching).
Matching on observed covariates may open backdoor paths in unobserved covariates and exacerbate hidden bias.
PSA works best in large samples to obtain a good balance of covariates.
If a subject has missing covariate data, we cannot calculate a PS for that subject.
Does not take into account clustering (problematic for neighborhood-level research).


Textbooks & Chapters

Oakes JM and Johnson PJ. 2006. Propensity score matching for social epidemiology in Methods in Social Epidemiology (eds. JM Oakes and JS Kaufman), Jossey-Bass, San Francisco, CA.
Simple and clear introduction to PSA with worked example from social epidemiology.

Hirano K and Imbens GW. 2005. The propensity score with continuous treatments in Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives: An Essential Journey with Donald Rubin’s Statistical Family (eds. A Gelman and XL Meng), John Wiley & Sons, Ltd, Chichester, UK.
Discussion of using PSA for continuous treatments.

Methodological Articles

Rosenbaum PR and Rubin DB. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1); 41-55.
Seminal article on PSA.

Rosenbaum PR and Rubin DB. 1985. The bias due to incomplete matching. Biometrics, 41(1); 103-116.
Discussion of the bias due to incomplete matching of subjects in PSA.

D’Agostino RB. 1998. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statist Med, 17; 2265-2281.
A further discussion of PSA with worked examples. Includes calculations of standardized differences and bias reduction.

Joffe MM and Rosenbaum PR. 1999. Invited commentary: Propensity scores. Am J Epidemiol,150(4); 327-333.
Discussion of the uses and limitations of PSA. Also includes discussion of PSA in case-cohort studies.

Application Articles

Kumar S and Vollmer S. 2012. Does access to improved sanitation reduce childhood diarrhea in rural India? Health Econ. DOI: 10.1002/hec.2809
Applies PSA to sanitation and diarrhea in children in rural India. Lots of explanation on how PSA was conducted in the paper. Good example.

Suh HS, Hay JW, Johnson KA, and Doctor JN. 2012. Comparative effectiveness of statin plus fibrate combination therapy and statin monotherapy in patients with type 2 diabetes: use of propensity-score and instrumental variable methods to adjust for treatment-selection bias. Pharmacoepidemiol and Drug Safety. DOI: 10.1002/pds.3261
Applies PSA to therapies for type 2 diabetes. Also compares PSA with instrumental variables.

Rubin DB. 2001. Using propensity scores to help design observational studies: Application to the tobacco litigation. Health Serv Outcomes Res Method, 2; 169-188.
More advanced application of PSA by one of PSA’s originators.

Landrum MB and Ayanian JZ. 2001. Causal effect of ambulatory specialty care on mortality following myocardial infarction: A comparison of propensity score and instrumental variable analysis. Health Serv Outcomes Res Method, 2; 221-245.
A good clear example of PSA applied to mortality after MI. Comparison with IV methods.

Bingenheimer JB, Brennan RT, and Earls FJ. 2005. Firearm violence exposure and serious violent behavior. Science, 308; 1323-1326.
Interesting example of PSA applied to firearm violence exposure and subsequent serious violent behavior.


Statistical Software Implementation
Software for implementing matching methods and propensity scores:

For SAS macro: Computerized matching of cases to controls using the greedy matching algorithm with a fixed number of controls per case.
vmatch: Computerized matching of cases to controls using variable optimal matching.

SAS documentation:

Intro to Stata:
http://help.pop.psu/edu/help-by-statistical-method/propensity-metching/Intro to P-score_Sp08.pdf

For R program:

General Information on PSA
Good introduction to PSA from Kaltenbach:

Slides from Thomas Love 2003 ASA presentation: Propensity.pdf

Resources (handouts, annotated bibliography) from Thomas Love:

Explanation and example from ecology of PSA:


An online workshop on Propensity Score Matching is available through EPIC

Estimating the Effects of Mental Health Interventions in Non-experimental Settings
Elizabeth Stuart
Th-F, June 14-15, 2012, 8:30 am – 4:30 pm