Construction of Complex Survey Weights
Overview
Because sampling for and conducting large-scale population surveys is costly and often impractical, methodologies for complex survey design, sampling, weighting and data analysis were developed. These methods were refined over the course of the 20th century and have been implemented widely in complex survey design and data analysis. While there are few general step-by-step walkthroughs of survey weight construction, the resources listed at the end of this page will help with various pieces of the weighting process. SUDAAN and R do, however, provide worked examples in their documentation, which I have found to be extremely useful.
Description
Before we get into practical approaches to weighting, I will provide some definitions of common terminology:
Basic Sample Types
Simple Random Sample– A sample chosen at random from a complete sampling frame, in which all units have an equal probability of selection. This is the simplest method to analyze, since no weights are needed to correct for unequal selection probabilities.
Stratified Sample– A sample chosen from mutually exclusive, meaningful groups, or strata, in a sampling frame. This approach is applied when population strata differ in meaningful ways that are of interest or consequence to effect estimates and variance estimation. Within strata, simple random samples are often drawn.
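A minimal Python sketch (standard library only) of the stratified design, with a simple random sample drawn within each stratum. The frame, the `region` stratum variable, and the function name `stratified_srs` are hypothetical. Each sampled unit is tagged with its inclusion probability n_h / N_h and a base weight N_h / n_h, the inverse of that probability, which is the starting point for design weights.

```python
import random

def stratified_srs(frame, strata_key, n_per_stratum, seed=None):
    """Draw a simple random sample without replacement within each stratum.

    frame: list of dicts, each carrying a stratum label under `strata_key`.
    n_per_stratum: dict mapping stratum label -> number of units to draw (n_h).
    Returns sampled records tagged with inclusion probability n_h / N_h and
    base weight N_h / n_h (the inverse of the selection probability).
    """
    rng = random.Random(seed)
    # Group the frame into mutually exclusive strata.
    strata = {}
    for unit in frame:
        strata.setdefault(unit[strata_key], []).append(unit)
    sample = []
    for label, units in strata.items():
        n_h, N_h = n_per_stratum[label], len(units)
        for unit in rng.sample(units, n_h):  # SRS without replacement within the stratum
            sample.append({**unit, "incl_prob": n_h / N_h, "base_w": N_h / n_h})
    return sample

# Hypothetical frame: 100 urban and 20 rural units; rural units are oversampled.
frame = [{"id": i, "region": "urban"} for i in range(100)]
frame += [{"id": 100 + i, "region": "rural"} for i in range(20)]
sample = stratified_srs(frame, "region", {"urban": 10, "rural": 10}, seed=1)
```

Rural units are sampled at five times the urban rate (0.5 versus 0.1), so their base weights are smaller (2 versus 10); within each stratum the base weights sum to the stratum size in the frame (100 urban, 20 rural).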
Basic Sampling Terms
Weights– Adjustment factors assigned to each individual that account for their probability of selection, as well as other factors including non-response and post-stratification. Often the product of multiple weights is used as the final analysis weight.
Primary Sampling Unit (PSU)– The first unit to be sampled, per the study design. Given a three-stage sample selected first by State of residence, second by gender, and third by age, State of residence would be the PSU. In a simple random sample, each individual is their own PSU.
Strata– Groupings of individuals who share a common characteristic of interest in study design or analysis. For example, common strata in complex surveys include sex, race and age. Strata can also be geographical or temporal in nature.
Finite Population Correction– A factor applied in the calculation of standard errors when sampling without replacement; it matters when the sampling fraction (sample size divided by population size) becomes large.
Sampling With Replacement– The drawing of sample units from a population in which each sampled unit is returned to the population before the next draw, so the same unit can be selected more than once.
Sampling Without Replacement– The drawing of sample units without returning those already sampled to the study population, so each unit can be selected at most once.
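The finite population correction and its link to sampling without replacement can be sketched as follows. This minimal example assumes simple random sampling and uses the common textbook form of the correction, (1 − n/N), applied to the variance of a sample mean (some texts use the closely related (N − n)/(N − 1)); the function name `se_mean_srs` and the data are hypothetical. Under sampling with replacement the correction is dropped, because the draws are independent.

```python
import math
import statistics

def se_mean_srs(sample, population_size, with_replacement=False):
    """Standard error of a sample mean under simple random sampling.

    Without replacement, the variance s^2 / n is multiplied by the finite
    population correction (1 - n / N); with replacement, no correction is applied.
    """
    n = len(sample)
    s2 = statistics.variance(sample)  # sample variance, n - 1 denominator
    fpc = 1.0 if with_replacement else 1 - n / population_size
    return math.sqrt(fpc * s2 / n)

data = [2, 4, 4, 4, 5, 5, 7, 9]                  # hypothetical sample, n = 8
se_wor = se_mean_srs(data, population_size=16)   # sampling fraction n/N = 0.5
se_wr = se_mean_srs(data, population_size=16, with_replacement=True)
```

Here half the population is sampled, so the correction halves the variance and shrinks the standard error by a factor of about 0.71 (√0.5). When n/N is small, the correction is close to 1 and has little effect.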
A Typology of Weights
Design Weights– Often constructed to adjust for sample design, including oversampling of certain strata.
Non-Response– Constructed to adjust for non-response among strata and substrata.
Post-Stratification– Constructed to standardize study effect estimates to a particular population, such as the population of New York State, or the continental United States.
Attrition– Attrition weights are constructed to account for attrition of the study sample over multiple waves.
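One common way to construct a non-response adjustment is the weighting-class method: within each adjustment class, respondents' weights are inflated so that they also represent the non-respondents in that class. The sketch below assumes that method (the model-based approaches mentioned for SUDAAN below are an alternative); the record layout and the name `nonresponse_adjust` are hypothetical. Attrition weights for later waves can often be built in a similar way, treating units lost to follow-up as non-respondents.

```python
from collections import defaultdict

def nonresponse_adjust(records):
    """Weighting-class non-response adjustment.

    records: list of dicts with "cls" (adjustment class), "w" (current weight),
    and "responded" (bool). Within each class, respondents' weights are multiplied
    by (sum of weights of all sampled units) / (sum of weights of respondents);
    non-respondents get weight 0. Assumes every class has at least one respondent.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for r in records:
        total[r["cls"]] += r["w"]
        if r["responded"]:
            responding[r["cls"]] += r["w"]
    adjusted = []
    for r in records:
        factor = total[r["cls"]] / responding[r["cls"]] if r["responded"] else 0.0
        adjusted.append({**r, "w": r["w"] * factor})
    return adjusted

# Hypothetical sample: half of the "young" class responded, all of the "old" class did.
sample = [{"cls": "young", "w": 10.0, "responded": i < 2} for i in range(4)]
sample += [{"cls": "old", "w": 5.0, "responded": True} for _ in range(2)]
adjusted = nonresponse_adjust(sample)
```

Young respondents' weights double (10 to 20), old respondents' weights are unchanged, and the total weight (50) is preserved. A class with no respondents would have to be collapsed with a similar class before applying this adjustment.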
Practical Approaches to Constructing Complex Survey Weights
Construction of complex survey weights is always rooted in study design and usually involves several layers of adjustment, including many of the types of weights in the typology above. Thus, while the final weighting adjustment will depend on the analysis and the study, it will often be the product of multiple types of weights.
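As a minimal sketch of this layering, assume each respondent already carries a design weight and a non-response factor. A post-stratification factor then scales the weighted count in each post-stratum to a known population total, and the final weight is the product of all three components. The field names, the `poststrat_factors` helper, and the population totals are hypothetical.

```python
import math
from collections import defaultdict

def poststrat_factors(records, weights, pop_totals):
    """Per-record post-stratification factor: known population total divided by
    the weighted sample count of the record's post-stratum ("ps")."""
    counts = defaultdict(float)
    for r, w in zip(records, weights):
        counts[r["ps"]] += w
    return [pop_totals[r["ps"]] / counts[r["ps"]] for r in records]

# Hypothetical respondents with a design weight and a non-response factor attached.
respondents = [
    {"ps": "F", "design": 18.0, "nonresponse": 1.25},
    {"ps": "F", "design": 2.0, "nonresponse": 1.0},
    {"ps": "M", "design": 18.0, "nonresponse": 1.25},
]
pop_totals = {"F": 30.0, "M": 25.0}  # hypothetical known population counts

# Layers 1 and 2: design weight times non-response factor.
interim = [r["design"] * r["nonresponse"] for r in respondents]
# Layer 3: post-stratification factors computed from the interim weights.
ps = poststrat_factors(respondents, interim, pop_totals)
# Final weight: the product of all component adjustments.
final = [math.prod((r["design"], r["nonresponse"], f)) for r, f in zip(respondents, ps)]
```

After the last layer, the final weights sum to each post-stratum's population total (30 for F, 25 for M). Keeping each component in its own field also makes it easy to inspect every layer separately.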
Below, I give a brief summary of common statistical packages and their weighting features.
R– R provides the most flexible system available for constructing and analyzing weights. The survey package, which is well documented, provides a number of functions for constructing weights, including postStratify() and rake(). It also includes a wide array of analytic procedures and will handle nearly all types of sampling designs. One convenient feature of survey analysis in R is that users create a single survey design object (with svydesign()) containing all relevant weighting adjustments and apply it to their analyses.
SAS– While SAS has not traditionally been the “go-to” software for constructing weights or analyzing complex surveys, a number of new procedures included in recent editions have made it a much more viable program for these purposes. SAS supports a range of survey designs and boasts a growing number of analytic procedures for complex surveys.
Stata– Stata comes with a wide variety of procedures for analyzing weighted survey data, and some for estimating weights. While it cannot handle all survey designs, it may be the most user-friendly program for survey analysis. The design and weights are declared once with svyset and can then be used in any analysis through the svy: prefix, without complicated code.
SUDAAN– SUDAAN features perhaps the most straightforward procedures for constructing survey weights. In particular, the WTADJUST procedure allows for the production of non-response, attrition, and post-stratification weights using a model-based approach. In addition, the newer WTADJX procedure extends WTADJUST with several updates and additions, including raking. Weights can be easily trimmed as needed. SUDAAN has a number of functions for analysis of complex surveys and can support nearly any study design.
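Raking (iterative proportional fitting), offered by both R and SUDAAN, adjusts weights to match several sets of marginal totals at once: it post-stratifies to each margin in turn and repeats until all margins agree. The Python sketch below illustrates the technique only; it is not the interface of R's rake() or of SUDAAN, and the function names, record layout, and margins are hypothetical.

```python
from collections import defaultdict

def weighted_margin(records, weights, var):
    """Weighted count of each category of `var`."""
    totals = defaultdict(float)
    for r, w in zip(records, weights):
        totals[r[var]] += w
    return totals

def rake(records, margins, max_iter=100, tol=1e-8):
    """Rake starting weights ("w") to known marginal totals.

    margins: dict mapping variable name -> {category: population total}; every
    record's category must appear in it. Each pass rescales the weights to match
    one margin at a time; iteration stops once every weighted margin is within
    `tol` of its target, or after max_iter passes.
    """
    weights = [r["w"] for r in records]
    for _ in range(max_iter):
        for var, targets in margins.items():
            current = weighted_margin(records, weights, var)
            weights = [w * targets[r[var]] / current[r[var]] for r, w in zip(records, weights)]
        gap = max(
            abs(weighted_margin(records, weights, var)[c] - t)
            for var, targets in margins.items()
            for c, t in targets.items()
        )
        if gap < tol:
            break
    return weights

# Hypothetical sample with one record per sex-by-age cell and unequal starting weights.
records = [
    {"sex": "F", "age": "young", "w": 2.0},
    {"sex": "F", "age": "old", "w": 1.0},
    {"sex": "M", "age": "young", "w": 1.0},
    {"sex": "M", "age": "old", "w": 1.0},
]
margins = {"sex": {"F": 60, "M": 40}, "age": {"young": 30, "old": 70}}
raked = rake(records, margins)
```

Unlike post-stratification on the full sex-by-age cross-classification, raking needs only the marginal totals, which helps when joint population counts are unavailable.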
Readings
Textbooks & Chapters
Heeringa, S.G., West, B.T., & Berglund, P.A. (2010). Applied survey data analysis. CRC Press.
Lumley, T. (2011). Complex surveys: A guide to analysis using R (Vol. 565). John Wiley & Sons.
Methodological Articles
Brick, J. M., & Kalton, G. (1996). Handling missing data in survey research. Statistical Methods in Medical Research, 5(3), 215-238.
Kalton, G., Flores Cervantes, I., Zheng, H., Little, R. J., Wu, C., Luan, Y., … & Hedlin, D. (1998). Weighting methods. New Methods for Survey Research, 79.
Application Articles
Abraham, K. G., Maitland, A., & Bianchi, S. M. (2006). Nonresponse in the American Time Use Survey: Who is missing from the data and how much does it matter? Public Opinion Quarterly, 70(5), 676-703.
Chen, Q., Gelman, A., Tracy, M., Norris, F. H., & Galea, S. (2012). Weighting Adjustments for Panel Nonresponse.
Kessler, R. C., Heeringa, S. G., Colpe, L. J., Fullerton, C. S., Gebler, N., Hwang, I., … & Ursano, R. J. (2013). Response bias, weighting adjustments, and design effects in the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). International Journal of Methods in Psychiatric Research, 22(4), 288-302.
Lemeshow, S., Letenneur, L., Dartigues, J. F., Lafont, S., Orgogozo, J. M., & Commenges, D. (1998). Illustration of analysis taking into account complex survey considerations: the association between wine consumption and dementia in the PAQUID study. American Journal of Epidemiology, 148(3), 298-306.
Lumley, T. (2004). Analysis of complex survey samples. Journal of Statistical Software, 9(1), 1-19.
Websites
Applied Survey Data Analysis (Website that serves as a companion to the book of the same name, with many worked examples using publicly available data sets)
Survey Analysis in R (Maintained by Thomas Lumley, creator of the R Survey package)
UCLA Statistical Computing
- General Intro to Survey Analysis: https://stats.oarc.ucla.edu/other/mult-pkg/seminars/svy-intro/
- Choosing the Correct Analysis for Various Survey Designs
- Sample coding setups for commonly used survey data sets: https://stats.oarc.ucla.edu/other/mult-pkg/faq/sample-setups-for-commonly-used-survey-data-sets/
Courses
Analysis of Complex Survey Design (@ Columbia University, EPIC summer program)
Analysis Methods for Complex Sample Survey Data (@ University of Michigan, Institute of Social Research, Summer Institute)