Extensions to Multinomial Regression









This page briefly describes approaches to working with multinomial response variables, with extensions to clustered data structures and nested disease classification.


Multinomial (Polytomous) Logistic Regression
This technique extends binary logistic regression to multinomial responses, where the outcome has more than two categories. Because it uses the observations from all outcome categories in the likelihood estimation of the parameters and their variances, it provides more power than fitting separate binary logistic regressions, each of which uses only the observations from two outcome categories. One of the major assumptions of this technique is that the outcome responses are independent. Because there are more than two outcome categories, more than one logit model is fit to the data: one outcome category serves as the reference, and each logit model compares one of the remaining categories to it. Together, the logit models make up the polytomous regression model, and collectively they are used to predict the probability of each outcome.
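As a minimal sketch of how the separate logit models combine into outcome probabilities, the Python snippet below fixes the linear predictor of the reference category at zero and normalizes the exponentiated predictors. The function name `category_probabilities` and the coefficient values are hypothetical and chosen only for illustration; they do not come from any fitted model.

```python
import math


def category_probabilities(x, coefs):
    """Return the predicted probability of each outcome category.

    x     : list of covariate values for one subject
    coefs : one coefficient list per non-reference category,
            intercept first, then one slope per covariate
    The reference category is returned first.
    """
    etas = [0.0]  # reference category: linear predictor fixed at 0
    for beta in coefs:
        etas.append(beta[0] + sum(b * xi for b, xi in zip(beta[1:], x)))
    shift = max(etas)  # subtract the maximum for numerical stability
    weights = [math.exp(eta - shift) for eta in etas]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical coefficients for a three-category outcome and one covariate:
# category 1 vs reference, and category 2 vs reference.
coefs = [[0.5, 1.0], [-1.0, 0.25]]
probs = category_probabilities([2.0], coefs)
print(probs)
```

Each log ratio `log(probs[k] / probs[0])` recovers the linear predictor of the k-th logit model, which is why every coefficient is interpreted relative to the reference category.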

Multinomial (Polytomous) Logistic Regression for Correlated Data
When the data are clustered, the non-independence of the observations is a nuisance, and you only want to adjust for it in order to obtain correct standard errors, a marginal model should be used to estimate population-average effects. A recent paper by de Rooij and Worku suggests using a multinomial logistic regression model to obtain the parameter estimates and a clustered bootstrap approach to obtain correct standard errors. They provide SAS code for this technique.
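The clustered bootstrap idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SAS code from de Rooij and Worku: the data are simulated, the function names (`simulate`, `log_odds_ratios`, `cluster_bootstrap_se`) are hypothetical, and the model is restricted to a single binary covariate so that the multinomial logit slopes have a closed form (log odds ratios against the reference category) and no optimizer is needed. The key step is that whole clusters, not individual observations, are resampled with replacement.

```python
import math
import random
from collections import Counter
from statistics import stdev


def log_odds_ratios(rows, n_categories=3):
    """Multinomial logit slopes for one binary covariate (closed form).

    rows: list of (x, y) pairs, x in {0, 1}, y in {0, ..., K-1};
    category 0 is the reference. Returns None if a needed cell is empty.
    """
    counts = Counter(rows)
    betas = []
    for k in range(1, n_categories):
        cells = [counts[(1, k)], counts[(0, 0)], counts[(1, 0)], counts[(0, k)]]
        if min(cells) == 0:
            return None
        betas.append(math.log(cells[0] * cells[1] / (cells[2] * cells[3])))
    return betas


def cluster_bootstrap_se(clusters, estimator, n_boot=500, seed=0):
    """Bootstrap standard errors obtained by resampling whole clusters."""
    rng = random.Random(seed)
    ids = list(clusters)
    draws = []
    for _ in range(n_boot):
        sample = []
        for cid in rng.choices(ids, k=len(ids)):  # clusters drawn with replacement
            sample.extend(clusters[cid])
        estimate = estimator(sample)
        if estimate is not None:  # skip replicates with an empty cell
            draws.append(estimate)
    return [stdev(column) for column in zip(*draws)]


def simulate(n_clusters=60, cluster_size=20, seed=1):
    """Simulated clustered data; a shared cluster effect induces correlation."""
    rng = random.Random(seed)
    clusters = {}
    for cid in range(n_clusters):
        u = rng.gauss(0.0, 1.0)
        rows = []
        for _ in range(cluster_size):
            x = rng.randint(0, 1)
            etas = [0.0, 0.2 + 0.8 * x + u, -0.3 + 0.4 * x - u]
            y = rng.choices([0, 1, 2], weights=[math.exp(e) for e in etas])[0]
            rows.append((x, y))
        clusters[cid] = rows
    return clusters


clusters = simulate()
pooled = [row for rows in clusters.values() for row in rows]
estimates = log_odds_ratios(pooled)  # point estimates ignore the clustering
std_errors = cluster_bootstrap_se(clusters, log_odds_ratios)
print(estimates, std_errors)
```

The point estimates come from the ordinary (independence) fit to the pooled data; only the standard errors are taken from the spread of the estimates across cluster-level resamples. With more covariates, `log_odds_ratios` would be replaced by a call to whatever multinomial logistic regression routine is in use.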

Chatterjee Approach for determining etiologic heterogeneity of disease subtypes
This technique is beneficial in situations where subtypes of a disease are defined by multiple characteristics of the disease. This technique accounts for the potentially large number of subtype categories and adjusts for correlation between characteristics that are used to define subtypes. This allows the researcher to examine associations between risk factors and disease subtypes after accounting for the correlation between disease characteristics.
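To see why the number of subtype categories grows quickly, and how a second-stage model can keep the parameter count manageable, consider the Python sketch below. It illustrates only the general idea of expressing each subtype-specific log odds ratio as a baseline plus one additive contrast per disease characteristic; it is not Chatterjee's estimation procedure. The characteristics, their levels, the function names, and the numeric values are all hypothetical.

```python
from itertools import product

# Hypothetical disease characteristics and their levels.
characteristics = {
    "ER": ["neg", "pos"],
    "PR": ["neg", "pos"],
    "grade": ["low", "mid", "high"],
}

# Cross-classifying the characteristics defines the subtypes: 2 * 2 * 3 = 12.
subtypes = list(product(*characteristics.values()))


def n_second_stage_params(chars):
    """One baseline parameter plus (levels - 1) contrasts per characteristic."""
    return 1 + sum(len(levels) - 1 for levels in chars.values())


def subtype_log_odds_ratio(subtype, theta0, theta1):
    """Additive second-stage form: baseline plus one contrast per characteristic.

    theta1[name][level] is the contrast for that level relative to the
    first level of the characteristic; missing levels contribute 0.
    """
    return theta0 + sum(
        theta1[name].get(level, 0.0)
        for name, level in zip(characteristics, subtype)
    )


# Hypothetical second-stage parameters for a single risk factor.
theta0 = 0.1
theta1 = {"ER": {"pos": 0.4}, "PR": {"pos": -0.2}, "grade": {"mid": 0.0, "high": 0.3}}

print(len(subtypes), n_second_stage_params(characteristics))
print(subtype_log_odds_ratio(("pos", "neg", "high"), theta0, theta1))
```

In this hypothetical setup, a separate log odds ratio for each of the 12 subtypes is replaced by 5 second-stage parameters per risk factor, and each contrast describes how the association changes with one characteristic while the others are held fixed.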


Textbooks & Chapters

  • Kleinbaum DG, Kupper LL, Nizam A, Muller KE. Chapter 23: Polytomous and Ordinal Logistic Regression, from Applied Regression Analysis And Other Multivariable Methods, 4th Edition. United States: Duxbury, 2008.

  • Hosmer DW and Lemeshow S. Chapter 8: Special Topics, from Applied Logistic Regression, 2nd Edition. New York: John Wiley & Sons, Inc., 2000.

  • Agresti, Alan. Categorical data analysis. Vol. 359. John Wiley & Sons, 2002.

  • Menard, Scott. Applied logistic regression analysis. Vol. 106. Sage, 2002.

    • These two books (Agresti & Menard) provide a gentle and condensed introduction to multinomial regression and a solid review of logistic regression. I would advise reading them first and then proceeding to the other books.

Methodological Articles

de Rooij M and Worku HM. A warning concerning the estimation of multinomial logistic models with correlated responses in SAS. Computer Methods and Programs in Biomedicine. 2012. Epub ahead of print.
This article is a critique of the 2007 Kuss and McLerran article. The authors provide an alternative method for handling multinomial regression with correlated data from a population-average perspective.

Kuss O and McLerran D. A note on the estimation of multinomial logistic models with correlated responses in SAS. Computer Methods and Programs in Biomedicine. 2007; 87: 262-269.
This article provides SAS code for Conditional and Marginal Models with multinomial outcomes. Their methods are critiqued by the 2012 article by de Rooij and Worku.

Chatterjee N. A Two-Stage Regression Model for Epidemiologic Studies with Multivariate Disease Classification Data. Journal of the American Statistical Association. 2004; 99(465): 127-138.
This article describes the statistics behind this approach for dealing with multivariate disease classification data. No software code is provided, but this technique is available with Matlab software.

Erdem, Tugba, and Zeynep Kalaylioglu. “A Monte Carlo Simulation Study to Assess Performances of Frequentist and Bayesian Methods for Polytomous Logistic Regression.” COMPSTAT’2010 Book of Abstracts (2008): 352.
The authors design a Monte Carlo simulation study to assess three methods for estimating the regression parameters of the two-stage polytomous regression model.

Ananth, Cande V., and David G. Kleinbaum. “Regression models for ordinal responses: a review of methods and applications.” International journal of epidemiology 26.6 (1997): 1323-1333.
This article offers a brief overview of models fitted to data with ordinal responses. Models reviewed include, but are not limited to, polytomous logistic regression models, cumulative logit models, and adjacent-category logistic models. The models are compared, their coefficients interpreted, and their use in epidemiological data assessed. This assessment is illustrated via an analysis of data from the perinatal health program.

Application Articles

Example applications of Multinomial (Polytomous) Logistic Regression

Biesheuvel CJ, Vergouwe Y, Steyerberg EW, Grobbee DE, Moons KGM. Polytomous logistic regression analysis could be applied more often in diagnostic research. Journal of Clinical Epidemiology. 2008;61(2):125-34.
This article provides a simple introduction to the core principles of polytomous logistic regression models and their advantages and disadvantages, via an illustrated example in the context of cancer research. The researchers also present a simplified blueprint for practical application of the models.

Bender, Ralf, and Ulrich Grouven. “Ordinal logistic regression in medical research.” Journal of the Royal College of Physicians of London 31.5 (1997): 546-551.
The purpose of this article is to offer a non-technical overview of the proportional odds model for ordinal data and to explain its relationship to the polytomous regression model and the binary logistic model.

Example applications of Multinomial (Polytomous) Logistic Regression for Correlated Data

Hedeker, Donald. “A mixed‐effects multinomial logistic regression model.” Statistics in medicine 22.9 (2003): 1433-1446.
The purpose of this article is to explain and describe mixed-effects multinomial logistic regression models and their parameter estimation. A practical application of the model is also described in the context of health services research, using data from the McKinney Homeless Research Project.

Example applications of the Chatterjee Approach

Garcia-Closas M, Brinton LA, Lissowska J et al. Established breast cancer risk factors by clinically important tumour characteristics. British Journal of Cancer. 2006; 95: 123-129.

Sherman ME, Rimm DL, Yang XR, et al. Variation in breast cancer receptor and HER2 levels by etiologic factors: A population-based analysis. International Journal of Cancer. 2007; 121: 1079-1085.


These websites provide programming code for multinomial logistic regression with non-correlated data:

SAS code for multinomial logistic regression

Stata code for multinomial logistic regression

R code for multinomial logistic regression


An online course offered by statistics.com covering several logistic regression models (proportional odds logistic regression, multinomial (polytomous) logistic regression, etc.).

An intermediate-level, interactive online workshop on logistic regression; one module covers multinomial (polytomous) logistic regression.

http://sites.stat.psu.edu/~jls/stat544/lectures.html and http://sites.stat.psu.edu/~jls/stat544/lectures/lec19.pdf
The course website of Dr. Joseph L. Schafer on categorical data, which includes lecture notes on (polytomous) logistic regression.

An online course offered by Penn State University. A succinct overview of (polytomous) logistic regression is posted, along with suggested readings and a case study with both SAS and R code and output.