Confounders in Time-Series Regression
Overview
Time series analysis has proven very useful in environmental epidemiology, particularly for understanding the effect of common exposures on health outcomes over time. Such exposures include, but are not limited to, pollen, air pollution, weather, drinking water quality, walking, or any other time-varying environmental agent. Recent literature has demonstrated the utility of time series regression for estimating short-term associations between time-varying exposures and outcomes. Questions about the effect of a common exposure measured daily for an individual or a city can therefore be addressed through time series regression. However, alongside autocorrelation and over-dispersion, confounding specific to time series is a major source of bias in estimating and interpreting measures of short-term association between exposure and outcome. The goal of this piece is to identify where confounding can arise in time series regression analysis of short-term associations and to describe accepted ways to control for it.
Description
Managing confounders by modeling
Plotting exposure and outcome separately over time is a necessary first step for understanding non-linear relationships before conducting time series regression analysis. These plots often reveal temporal confounders, such as seasonality and long-term trends, which contribute to confounding bias. Measured and unmeasured time-varying confounders can also bias the exposure-outcome relationship.
For epidemiological questions about short-term variation in exposure, a generalized additive model (e.g. a log-linear semi-parametric model) can be used for regression modeling. This type of model allows both explicit parameters, used as explanatory variables, and non-parametric functions, used as smoothers. It suits the data because outcome values are discrete counts of events (e.g. deaths or disease cases) at each time point. A Poisson regression model is appropriate for count data as long as there is no over-dispersion (variance > mean). That assumption is rarely met in practice, and over-dispersion is usually expected, which makes a strict Poisson distribution inappropriate. A negative binomial or quasi-Poisson model is then more suitable; the quasi-Poisson approach amounts to adding an overdispersion (scale) parameter to the Poisson regression model.
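As a rough first check on the Poisson assumption, the sample variance of the counts can be compared with their sample mean. The sketch below uses only the Python standard library; the function name and the daily counts are hypothetical and purely illustrative. It is a crude screen, since it ignores covariates, and a model-based dispersion statistic would be preferred once a regression has been fitted.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return the sample variance divided by the sample mean of the counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean count is zero; the ratio is undefined")
    return variance(counts) / m


# Hypothetical daily death counts (illustrative values, not real data).
daily_deaths = [12, 15, 9, 30, 11, 14, 28, 10, 13, 35, 8, 16]

ratio = dispersion_ratio(daily_deaths)
# A ratio well above 1 suggests over-dispersion relative to a Poisson model.
print(f"variance / mean = {ratio:.2f}")
```

A ratio near 1 is consistent with a Poisson model; a ratio well above 1 points toward a quasi-Poisson or negative binomial model.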
Temporal-based Confounders
Temporal confounders are time patterns of known (or unknown) duration that affect both the exposure and the outcome. Common examples are seasonality and inter-annual trends. Consider a time series relating air pollution exposure to mortality over multiple years. Ozone concentrations in urban environments are typically highest in summertime, whereas mortality rises in wintertime, a season not known for harmful ozone levels. Without adjustment, the seasonal pattern in mortality distorts the estimated effect of ozone; in reality, seasonal increases in infectious diseases such as influenza may partially explain the wintertime uptick in mortality. Incorporating a smooth function over defined time periods, such as a moving average, or adjusting for autocorrelation through ARIMA (see Box-Jenkins page) can be considered as an initial approach to decomposing (or detrending) temporal confounding. However, temporal confounders must also be included as explicit terms in the regression model to adjust for their biasing effect.
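The moving-average idea can be sketched in a few lines. The example below is a minimal illustration in plain Python, assuming a centered window of odd length; the function names are hypothetical, and positions without a full window are left as None rather than padded.

```python
def centered_moving_average(values, window):
    """Centered moving average; positions without a full window get None."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            smoothed.append(None)  # not enough neighbours on one side
        else:
            segment = values[i - half : i + half + 1]
            smoothed.append(sum(segment) / window)
    return smoothed


def detrend(values, window):
    """Subtract the moving-average trend from each observation."""
    trend = centered_moving_average(values, window)
    return [None if t is None else v - t for v, t in zip(values, trend)]


# A purely linear series has no variation left after detrending.
print(detrend([1, 2, 3, 4, 5], 3))
```

The detrended series keeps the short-term fluctuations that a short-term association analysis is interested in, while the slow trend is removed.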
Time-Varying Confounders
Other measured and unmeasured time-varying confounders can bias the relationship between the main exposure and an outcome. As in generalized linear models generally, including measured confounders as additional explanatory variables adjusts for them when estimating the association between the time-varying main exposure and the outcome.
Unmeasured confounders cannot be included directly as explanatory variables in a time series regression, but their confounding effect can be reduced by adding smoothing functions of time to the model. Typically, a spline term (e.g. a penalized, natural, or cubic spline) is added to a pre-existing crude model to generate a smoothed nonlinear fit over time. This approach can be computationally complex and difficult to interpret, but it is flexible in capturing longer-term variation from temporal trends such as seasons and from covariates that were not measured or foreseen. A penalization function can be used to temper the temptation to overfit, although it can affect the predicted averages and the interpretability of the model.
Other modeling approaches that can account for confounders include time-stratified (piecewise) modeling, Fourier functions, and lagging.
Time-stratified modeling splits the study period into intervals of a chosen length. The approach creates an indicator variable for each unique interval across the entire study period, which yields a separate coefficient for each interval in the regression model. A major limitation is the large number of indicator variables added to the model. In addition, the interval breaks are often not based on a biological or ecological rationale and therefore may lack a plausible causal basis.
Fourier terms provide a smooth function of time built from trigonometric (sine and cosine) functions. The fit reflects periodic patterns over time, such as seasonality. A disadvantage of this approach is that Fourier terms are not entirely flexible: the pattern repeats identically each period, so it cannot capture natural year-to-year variation in seasonal patterns over a multi-year study.
Lastly, accounting for delayed effects of exposure with a single lag or a distributed lag (multiple lags) model is a useful approach. Lag terms allow exposure-outcome associations to be estimated beyond the same day. This approach can answer how the effect of a unit increase in exposure at a given time point extends into the future.
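To make the indicator-variable burden concrete, the sketch below labels each day of one year with a stratum. The choice of year, calendar month, and day of week as the stratum definition is an assumption made for illustration, as is the function name; other interval definitions are possible.

```python
from datetime import date, timedelta


def stratum_label(day):
    """Label a date by calendar year, month, and day of week (Monday = 0)."""
    return (day.year, day.month, day.weekday())


# One full year of daily dates (2020 is a leap year, so 366 days).
start = date(2020, 1, 1)
days = [start + timedelta(days=k) for k in range(366)]

labels = [stratum_label(d) for d in days]
n_strata = len(set(labels))

# Each distinct label would need its own indicator variable in the model.
print(f"{len(days)} days fall into {n_strata} strata")
```

Even a single year produces dozens of strata under this definition, which is the bookkeeping cost described above.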
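A minimal sketch of how such terms could be generated for a daily series is shown below. The annual period of 365.25 days, the number of harmonics, and the function name are assumptions for illustration; the resulting columns would be entered as explanatory variables in the regression model.

```python
import math


def fourier_terms(day_index, period=365.25, harmonics=2):
    """Return [sin_1, cos_1, sin_2, cos_2, ...] for one time point."""
    terms = []
    for k in range(1, harmonics + 1):
        angle = 2.0 * math.pi * k * day_index / period
        terms.append(math.sin(angle))
        terms.append(math.cos(angle))
    return terms


# Terms for the first three days of a series, two harmonics each.
for t in range(3):
    print(t, [round(v, 4) for v in fourier_terms(t)])
```

Because each term repeats exactly after one period, the fitted seasonal curve is identical in every year, which is the inflexibility noted above.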
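Lagged exposure terms can be built by shifting the exposure series. The sketch below is a plain-Python illustration with hypothetical function names; days for which the lagged value is unavailable are marked None and would normally be dropped before model fitting.

```python
def lagged(values, lag):
    """Position t holds the value from t - lag, or None if it is unavailable."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return [values[t - lag] if t - lag >= 0 else None for t in range(len(values))]


def lag_matrix(values, max_lag):
    """Row t is [x_t, x_(t-1), ..., x_(t-max_lag)] for a distributed lag model."""
    columns = [lagged(values, lag) for lag in range(max_lag + 1)]
    return [list(row) for row in zip(*columns)]


# Hypothetical daily exposure values (illustrative only).
exposure = [10, 12, 9, 15]
print(lagged(exposure, 1))
print(lag_matrix(exposure, 2))
```

A single-lag model uses one shifted column; a distributed lag model enters several columns of the lag matrix together.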
Managing confounders by methodological design
An alternative way to manage confounders is through an epidemiological design known as case crossover. Under this design, the comparison is made within an individual or a group, through self-matching before and after the onset of an exposure. Because each subject serves as their own control, the design promotes exchangeability in an observational study and can thus support causal inference from a case-crossover analysis. Used as an equivalent to time series regression analysis, specifically for common exposures with acute and transient effects, the case-crossover approach can provide a strong basis for estimating a causal effect.
Conditional logistic regression (CLR) is the traditional method for analyzing matched data; it defines a stratum for each matched case-control set in a data set. CLR is predominantly used for estimating measures of association in case-crossover studies. However, it has long been established that the application of CLR limits the equivalence between time series regression analysis and case-crossover analysis. Unlike time series regression, CLR cannot account for over-dispersion or autocorrelation through adjustable parameters, both of which are relatively easy to control for in time series regression.
Conditional Poisson regression (CPR) is an alternative approach for the analysis of case-crossover studies. CLR has traditionally been considered an acceptable and, very often, reliable approach for matched epidemiological designs. Its use as a special case of time series analysis is justified particularly for frequent (e.g. daily) and common exposures and their covariates. However, when considering the equivalence between time series regression analysis and time-stratified case crossover, CLR introduces some biases: it cannot adjust for the autocorrelation and over-dispersion that accompany transient effects of acute exposure events. By firming up the equivalence between time series regression and case-crossover analyses, we can potentially draw causal inferences from time series data, so long as they are confirmed by a case-crossover analysis using CPR. CPR allows adjustment for such biases and can also “simplify” the bookkeeping and processing of a plethora of stratum-based indicators.
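One common way to form the matched sets is time-stratified referent selection, in which the control days for a case day are the other days in the same calendar month that fall on the same weekday. The sketch below assumes that scheme; the function name is hypothetical and other referent schemes exist.

```python
from datetime import date, timedelta


def referent_days(case_day):
    """Other dates in the same year and month sharing the case day's weekday."""
    referents = []
    # Same-weekday dates lie at multiples of 7 days; 28 days covers any month.
    for offset in range(-28, 29, 7):
        candidate = case_day + timedelta(days=offset)
        same_month = (candidate.year, candidate.month) == (case_day.year, case_day.month)
        if offset != 0 and same_month:
            referents.append(candidate)
    return referents


# Control days for a hypothetical event on Wednesday, 15 January 2020.
for d in referent_days(date(2020, 1, 15)):
    print(d.isoformat())
```

Each case day and its referent days together form one stratum in the conditional analysis.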
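The equation that the symbol definitions below refer to is not shown. A sketch of the conditional Poisson model they appear to describe, assuming the formulation in Armstrong et al. (2014) listed under Readings, is:

```latex
Y_{i,s} \mid Y_{\cdot s} \sim \mathrm{Multinomial}\left(\{\pi_{i,s}\},\; Y_{\cdot s}\right),
\qquad
\pi_{i,s} = \frac{\exp\left(\beta^{T} x_{i,s}\right)}{\sum_{j \in s} \exp\left(\beta^{T} x_{j,s}\right)}
```

Here j is an assumed index running over the observations within stratum s. Conditioning on the stratum total Y.s cancels the stratum-specific intercepts, which is why no stratum indicators need to be estimated.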
Where i = unique group, s = stratum, Y.s = sum of events in each stratum, β = row vector of main exposure coefficients, x = row vector of main exposure, and superscript T denotes transpose.
Readings
Textbooks & Chapters
Hilbe, J. (2007) Addendum Chapter to Negative Binomial Regression. Cambridge University Press.
This chapter was useful for interpreting model outputs from Poisson and negative binomial regression models.
Peng, R.D. and Dominici, F. (2008) Statistical Methods for Environmental Epidemiology with R: A Case Study in Air Pollution and Health. Springer
A great and clearly written reference on time series regression in R. The book provides statistical background and applications, and R scripts are provided throughout for walkthrough demonstrations. The NMMAPS data library is no longer available for use; install the tsModel package and use the balt data set for the exercises.
Rothman, K.J., Greenland, S., and Lash, T.L. (2008) Modern Epidemiology 3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins.
The chapter on environmental epidemiology provides a brief summary and overview on time series and case crossover design.
Articles
Armstrong, B.G., Gasparrini, A., Tobias, A. Conditional Poisson models: a flexible alternative to conditional logistic case cross-over analysis. BMC Med Res Methodol. 2014; 14(1)
This paper introduces the application and benefits of conditional Poisson modeling for a case-crossover study. Side-by-side comparisons of conditional logistic, standard Poisson, and conditional Poisson models are provided, along with R and Stata scripts for conducting these methods.
Bhaskaran, K., Gasparrini, A., Hajat, S., Smeeth, L., and Armstrong, B. Time series regression studies in environmental epidemiology. Intl Journal of Epidemiology. 2013; 42
This is a great primer on time series regression techniques and their extensions for short-term associations. The paper provides a user-friendly walkthrough of time series regression model building.
Jaakkola, J.J.K. Case-crossover design in air pollution epidemiology. Eur Respir J. 2003; 21
This paper reviews the case-crossover design and highlights its applications to air pollution studies in individuals.
Lu, Y. and Zeger, S.L. On the equivalence of case-crossover and time series methods in environmental epidemiology. Biostatistics. 2006; 8(2)
The authors provide a mathematical explanation of how using conditional logistic regression (CLR) to estimate relative risks from a case-crossover study is nearly equivalent to the Poisson regression analysis used in time series. However, the two approaches diverge in CLR’s inability to correct for autocorrelation and over-dispersion.
Mittleman, M. and Mostofsky. Exchangeability in the case-crossover design. International Journal of Epidemiology. 2014
This review explores how the case-crossover design promotes exchangeability in observational studies. The paper also highlights the importance of identifying and controlling for confounding, selection bias, and autocorrelation.
Maclure, M. and Mittleman, M.A. Should we use a case-crossover design? Annual Rev. Public Health. 2000; 21
This review provides an in-depth layout of the nuts & bolts in a case-crossover study.
Richardson, D.B., Langholz, B. Background stratified Poisson regression analysis of cohort data. Radiat Environ Biophys. 2012; 51
This paper was one of the first to showcase the use of conditional Poisson regression and to demonstrate its equivalence to background stratified Poisson regression without the need for stratum-specific indicators.
Software
R Packages to Consider
gnm
The gnm package fits generalized nonlinear models. Its eliminate argument can be used to fit conditional Poisson models without estimating stratum-specific indicators.
tsModel
The tsModel package is specific for fitting time series models and generating time series model terms.
Websites
Coghlan, A. Using R for Time Series Analysis. http://a-little-book-of-r-for-time-series.readthedocs.org/en/latest/src/timeseries.html#arima-models
A very helpful introduction to R and to time series analysis using R code from start to analytical finish.
Pennsylvania State University. Department of Statistics Online Programs. STAT 504: Analysis of Discrete Data. Poisson Regression Model. Notes. Retrieved from https://onlinecourses.science.psu.edu/stat504/node/165
A great module on Poisson regression with notes and examples with SAS and R scripts to follow along.