Repeated Measures Analysis









This page briefly describes repeated measures analysis and provides an annotated resource list.


This page looks specifically at generalized estimating equations (GEE) for repeated measures analysis and compares GEE to other methods of repeated measures.

Longitudinal Studies

  • Longitudinal studies take repeated measurements on the same individuals over time, whereas cross-sectional studies record a single outcome per individual

    • Observations from an individual tend to be correlated and the correlation must be taken into account for valid inference.
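
One way to see why the correlation matters is to look at the variance of a subject's mean over repeated measurements. The sketch below is only an illustration under assumed values (5 visits, variance 4, and an equal correlation of 0.6 between any two visits are all hypothetical). It sums the entries of the covariance matrix directly, and shows that treating positively correlated measurements as independent would overstate the precision of the mean.

```python
def variance_of_mean(n, sigma2, rho):
    """Variance of the mean of n equally correlated observations.

    Sums the full covariance matrix: sigma2 on the diagonal and
    rho * sigma2 everywhere else, then divides by n squared.
    """
    total = 0.0
    for j in range(n):
        for k in range(n):
            total += sigma2 if j == k else rho * sigma2
    return total / n ** 2


# Hypothetical values: 5 visits per subject, variance 4 at each visit.
n, sigma2 = 5, 4.0
independent = variance_of_mean(n, sigma2, rho=0.0)
correlated = variance_of_mean(n, sigma2, rho=0.6)

print(f"variance of subject mean, rho=0.0: {independent:.3f}")
print(f"variance of subject mean, rho=0.6: {correlated:.3f}")
print(f"inflation factor: {correlated / independent:.2f}")
```

The inflation factor equals 1 + (n - 1) * rho for this equal-correlation case, so the effect grows with both the number of repeated measurements and the strength of the correlation.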

Example of repeated measurements
Three different types of diets are randomly assigned to a group of men. Each man is assigned one diet, and the men are weighed weekly for one year. The treatment, diet type, is the between-subjects factor. Time is the within-subject factor. In this study we would be interested in whether the change in weight over time differs by diet, i.e. is there a diet by time interaction? The covariance structure of the observed data is what makes repeated measures data unique: measurements from the same subject may be correlated, and the correlation should be modeled if it exists.

Ways data can be correlated

  • Multivariate Data- a person's weight and height measured simultaneously

  • Clustered Data- weight for all members in various families

  • Longitudinal Data- weight taken repeatedly over time on the same individuals

  • Spatially correlated data- replace time with one or more spatial dimensions

GEE can take into account the correlation of within-subject data (longitudinal studies) and other studies in which data are clustered within subgroups.

Failure to take the correlation into account would make the regression estimates (Bs) less efficient, meaning they would be more widely scattered around the true population value. Standard errors computed as if the observations were independent would also be unreliable.

The GEE method was developed by Liang and Zeger (1986) in order to produce regression estimates when analyzing repeated measures with non-normal response variables.

Generalized Estimating Equations

  • Can be thought of as an extension of generalized linear models (GLM) to longitudinal data

  • Instead of attempting to model the within-subject covariance structure, GEE models the average response

  • The goal is to make inferences about the population while accounting for the within-subject correlation

    • For every one-unit increase in a covariate across the population, GEE tells us how much the average response would change

  • GEE estimates are the same as Ordinary Least Squares (OLS) estimates if the dependent variable is treated as normally distributed (identity link) and no correlation within subjects is assumed


  • The response variable (Y) can be either categorical or continuous.

    • Yij represents the response for subject i measured at time point j (j=1,2,…,ni). Each Yij can be, for example, a continuous, binomial, or multinomial response.

      • The responses within a subject are correlated, i.e. not independent

    • The explanatory variables, X=(X1, X2, X3,…,Xk), can be discrete, continuous, or a combination. Xi is an ni x k matrix of covariates.
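
The link between GEE and OLS can be checked numerically. With an identity link and an independence working correlation, the GEE estimating equations reduce to the sum over subjects of Xi'(Yi - Xi B) = 0, which are the OLS normal equations. The sketch below uses made-up data (the subject ids, covariate values, and responses are all hypothetical), fits a simple OLS line to the pooled observations, and then evaluates that estimating function at the OLS solution. It is a minimal illustration, not a general GEE solver.

```python
# Hypothetical long-format data: subject id -> list of (x, y) pairs.
data = {
    1: [(0, 2.1), (1, 2.9), (2, 4.2)],
    2: [(0, 1.0), (1, 2.2)],            # unbalanced: only two visits
    3: [(0, 3.0), (1, 3.8), (2, 5.1)],
}


def ols(points):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def gee_score_independence(data, intercept, slope):
    """GEE estimating function for an identity link and an independence
    working correlation: sum over subjects of Xi' (yi - mu_i)."""
    score_intercept = score_slope = 0.0
    for visits in data.values():
        for x, y in visits:
            residual = y - (intercept + slope * x)
            score_intercept += residual
            score_slope += x * residual
    return score_intercept, score_slope


pooled = [pair for visits in data.values() for pair in visits]
b0, b1 = ols(pooled)
score = gee_score_independence(data, b0, b1)

print(f"OLS intercept={b0:.4f}, slope={b1:.4f}")
print("GEE score at the OLS solution:", score)  # both components are ~0
```

Because the score is zero at the OLS solution, the two methods give the same point estimates in this special case. The standard errors can still differ, since GEE software typically reports robust (sandwich) standard errors.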

There are three main types of link and variance functions:

  • Normally-distributed response

    • g(uij)=uij “Identity link”

    • no transformation of the mean uij is applied; the mean is modeled directly

    • For normally distributed data

  • Binary response (Bernoulli)

    • g(uij)=log[uij/(1-uij)] “Logit link”

    • For binary dependent variables

    • Keeps the fitted probabilities between 0 and 1, because the linear predictor can take any real value and is mapped back to the interval from 0 to 1

  • Poisson response

    • g(uij)=log(uij) “Log link”

    • For count data

    • Regression coefficients are the expected change in the log of the mean of the dependent variable for each one-unit change in a covariate

  • A probit link can also be used for binary or ordered dependent variables, and a cumulative logit link for ordered multinomial data

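  • NOTE: The regression coefficients that result from GEE models with logit and log links are usually exponentiated before they are interpreted, giving odds ratios and rate ratios respectively. Probit coefficients do not have this interpretation when exponentiated.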
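
The link functions above, and the reason coefficients from logit and log links are exponentiated, can be sketched in a few lines. The coefficient values b0 and b1 below are hypothetical and chosen only for illustration; the function names are not taken from any particular package.

```python
import math


def logit(mu):
    """Logit link: maps a probability in (0, 1) to the real line."""
    return math.log(mu / (1 - mu))


def inv_logit(eta):
    """Inverse logit: maps a linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))


def log_link(mu):
    """Log link: maps a positive mean (e.g. a count rate) to the real line."""
    return math.log(mu)


def inv_log(eta):
    """Inverse log link: maps a linear predictor back to a positive mean."""
    return math.exp(eta)


def odds(p):
    return p / (1 - p)


# Hypothetical intercept and slope on the linear-predictor scale.
b0, b1 = -1.0, 0.5

# Logit link: a one-unit increase in x multiplies the odds by exp(b1).
odds_ratio = odds(inv_logit(b0 + b1)) / odds(inv_logit(b0))

# Log link: a one-unit increase in x multiplies the mean by exp(b1).
rate_ratio = inv_log(b0 + b1) / inv_log(b0)

print(f"exp(b1)    = {math.exp(b1):.4f}")
print(f"odds ratio = {odds_ratio:.4f}")
print(f"rate ratio = {rate_ratio:.4f}")
```

In both cases the ratio matches exp(b1), which is why the exponentiated coefficient is reported as an odds ratio (logit link) or a rate ratio (log link).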

Correlation Matrices

  • The goal of specifying a working correlation structure is to estimate B more efficiently. Incorrect specification can reduce the efficiency of the parameter estimates, although the estimates remain consistent as long as the model for the mean is correct.

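  • Autoregressive Correlation Structure- for data that are correlated within clusters over time

    • within-subject correlations are set as an exponential function of the lag between observations, so measurements closer together in time are more strongly correlated; the order of the lag is determined by the researcher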

  • Exchangeable- within-subject observations are equally correlated

    • No logical ordering for observations within a cluster; usually appropriate for data that are clustered within a subject but are not time-series data

  • Unstructured- free estimation of the within-subject correlation

    • estimates all possible correlations between within-subject responses and includes them in the estimation of the variances
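
To make the structures above concrete, the sketch below builds independence, exchangeable, and first-order autoregressive (AR(1)) working correlation matrices for a subject with a few equally spaced visits. The function names and the values of rho are assumptions made for illustration. An unstructured matrix is not built here because each of its off-diagonal entries is estimated separately from the data.

```python
def independence(n):
    """No within-subject correlation: the identity matrix."""
    return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]


def exchangeable(n, rho):
    """Every pair of observations within a subject has correlation rho."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]


def ar1(n, rho):
    """AR(1): correlation is rho raised to the lag between observations."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]


def show(name, matrix):
    print(name)
    for row in matrix:
        print("  " + "  ".join(f"{value:.3f}" for value in row))


# Hypothetical setting: 4 equally spaced visits per subject.
show("independence", independence(4))
show("exchangeable, rho=0.3", exchangeable(4, 0.3))
show("AR(1), rho=0.5", ar1(4, 0.5))
```

In the AR(1) matrix the correlation between the first and fourth visits is 0.5 cubed, or 0.125, whereas the exchangeable matrix keeps it at 0.3 regardless of how far apart the visits are.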

Highlights of GEE

  • Can be used on non-normal data

  • Uses all available data for each subject

  • Accounts for correlations between binary outcomes across time within the same individual

  • Allows for specification of both time-varying and individual difference variables

Other methods for repeated measures:

  • Repeated measures ANOVA

    • Not preferred, since it requires balanced and complete data sets and normally distributed response variables, and does not allow for the analysis of covariates that change over time.

    • Data are in the form of one row per subject

    • If there is no control group, use a One-way repeated-measures ANOVA

      • Here you are answering the question: “How does trial affect Y?”

    • If there is a control group, use a Two-way repeated-measures ANOVA

      • Investigating the interaction between Group*Trial

      • Here you are answering the question: “How does Trial affect Y differently across Groups?”

  • Paired t-test

    • Compares the mean of a measurement taken twice on the same subjects by testing the within-subject differences. Can only be used for two time points.

  • Mixed modeling

    • Data are in the form of one row per subject per trial

    • Analysis is via maximizing likelihood of observed values

    • Can handle balanced as well as unbalanced or missing within subject data

    • Fixed effects- the differences or changes in the dependent variable that are attributed to an independent (predictor) variable

      • Their value is the same (fixed) for everyone in a group

    • Random effects- have values that vary randomly within and/or between individuals

    • Mixed=fixed+random
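
The two data layouts mentioned above (one row per subject for repeated measures ANOVA, one row per subject per trial for mixed models) can be converted into each other. The sketch below reshapes a small, made-up diet data set from the wide layout to the long layout; the column names, subject ids, and weights are all hypothetical. It also shows why the long layout copes with missing within-subject data: only the missing measurement is dropped, not the whole subject.

```python
# Hypothetical wide-format data: one row per subject.
wide = [
    {"subject": 1, "diet": "A", "week1": 82.0, "week2": 81.4, "week3": 80.9},
    {"subject": 2, "diet": "B", "week1": 90.5, "week2": 90.1, "week3": None},
]


def wide_to_long(rows, id_cols, time_cols):
    """Return one record per subject per measured time point.

    Time points are numbered 1, 2, ... in the order given by time_cols.
    A missing value (None) removes that single record only.
    """
    long_rows = []
    for row in rows:
        for time_index, col in enumerate(time_cols, start=1):
            value = row[col]
            if value is None:
                continue  # skip the missed visit, keep the subject's other visits
            record = {c: row[c] for c in id_cols}
            record["week"] = time_index
            record["weight"] = value
            long_rows.append(record)
    return long_rows


long_data = wide_to_long(wide, ["subject", "diet"], ["week1", "week2", "week3"])
for record in long_data:
    print(record)
```

Subject 2 contributes two rows instead of three, so the available measurements are still used, whereas an analysis that needs complete rows in the wide layout would have to drop or impute that subject.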


Textbooks & Chapters

Ballinger G.A. (2004). Using generalized estimating equations for longitudinal data analysis, Organizational Research Methods, 7:127-150.

Hardin J.W., Hilbe J.M. (2003). Generalized Estimating Equations, New York: Chapman and Hall.

Applied Longitudinal Data Analysis: Modeling Change and Event Occurrence by Judith D. Singer and John B. Willett

Modeling Longitudinal Data by Robert Weiss

Applied Longitudinal Analysis by Garrett M. Fitzmaurice, Nan M. Laird and James H. Ware

Methodological Articles

Hubbard AE, Ahern J, Fleischer NL, Van der Laan M, Lippman SA, Jewell N, Bruckner T, Satariano WA. To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health. Epidemiology. 2010 Jul;21(4):467-74. doi: 10.1097/EDE.0b013e3181caeb90.
Compares mixed models and GEE

Odueyungbo A, Browne D, Akhtar-Danesh N, Thabane L. Comparison of generalized estimating equations and quadratic inference functions using data from the National Longitudinal Survey of Children and Youth (NLSCY) database. BMC Med Res Methodol. 2008 May 9;8:28. doi: 10.1186/1471-2288-8-28.
Shows how to use the quadratic inference functions in GEE

Goldstein H. Tutorial in Biostatistics: Longitudinal Data Analysis (Repeated Measures) in Clinical Trials. Stat Med. 2000 Jul 15;19(13):1821.
Reviews and summarizes longitudinal data analysis in clinical trials. Uses five clinical trials with longitudinal outcomes. Reviews some software (note: published in 1999).

Ballinger GA. Using Generalized Estimating Equations for Longitudinal Data Analysis. Organizational Research Methods April 2004 7: 127-150, doi:10.1177/1094428104263672
A good overview of GEE with examples (no code however)

Application Articles

Ma Y, Mazumdar M, Memtsoudis SG. Beyond Repeated Measures ANOVA: advanced statistical methods for the analysis of longitudinal data in anesthesia research. Reg Anesth Pain Med. 2012 Jan-Feb;37(1):99-105. doi: 10.1097/AAP.0b013e31823ebc74.
Paper comparing GEE to other repeated measures analysis models (mixed models and RM-ANOVA)

Hanley JA, Negassa A, Edwardes MD, Forrester JE. Statistical Analysis of Correlated Data Using Generalized Estimating Equations: An Orientation. Am J Epidemiol. 2003 Feb 15;157(4):364-75.
Paper describing GEE method for epidemiologists

Hu FB, Goldberg J, Hedeker D, Flay BR, Pentz MA. Comparison of Population-Averaged and Subject-Specific Approaches for Analyzing Repeated Binary Outcomes. Am J Epidemiol. 1998 Apr 1;147(7):694-703.
A comparison of generalized estimating equation and random-effects approaches to analyzing binary outcomes from longitudinal studies: illustrations from a smoking prevention study


Powerpoint presentations on GEE & repeated measures analyses:

PDF resources:

- Describes how to implement repeated measures analyses in SAS
- Compares strategies for analyzing repeated measures data in SAS and SPSS

Examples of research using GEE

1. Tamers, S. L., et al. (2014). “The impact of stressful life events on excessive alcohol consumption in the French population: findings from the GAZEL cohort study.” PLoS One 9(1): e87653.

2. Stopka, T. J., et al. (2014). “Is crime associated with over-the-counter pharmacy syringe sales? Findings from Los Angeles, California.” Int J Drug Policy 25(2): 244-250.

3. Patterson, A. C. and G. Veenstra (2010). “Loneliness and risk of mortality: a longitudinal investigation in Alameda County, California.” Soc Sci Med 71(1): 181-186.

4. Lin, K. C., et al. (2010). “Time-varying nature of risk factors for the longitudinal development of disability in older adults with arthritis.” J Epidemiol 20(6): 460-467.


Course in Mailman’s Biostatistics department: Analysis of Longitudinal Data (P8157)

Course at CUNY: BIOS 75300 – Analysis of Longitudinal Data

Cornell Statistical Consulting Unit workshop:

NCU online version of course notes in pdf (Specifically Ch 12)

UCLA seminar with videos