This list builds on the Principal Components Analysis (PCA) page and the Exploratory Factor Analysis (EFA) page on this site. This resource is intended to serve as a guide for researchers who are considering PCA or EFA as a data reduction technique. The resources outlined below are intended to complement those already on the technique-specific webpages.
Theoretical/Statistical background and comparisons
These two publications compare the two methods and present opposing views on whether EFA and PCA should be used on the same dataset.
“Determine the appropriate statistical analysis to answer research questions a priori…It is inappropriate to run PCA and EFA with your data. PCA includes correlated variables with the purpose of reducing the numbers of variables and explaining the same amount of variance with fewer variables (principal components). EFA estimates factors, underlying constructs that cannot be measured directly.”
Jolliffe IT, Morgan BJ. Principal component analysis and exploratory factor analysis. Statistical methods in medical research 1992;1:69-95.
“Despite their different formulations and objectives, it can be informative to look at the results of both techniques on the same data set. Each technique gives different insights into the data structure, with PCA concentrating on explaining the diagonal elements, and factor analysis the off-diagonal elements, of the covariance matrix, and both may be useful.”
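The contrast drawn in this quotation can be illustrated numerically. The following is a minimal sketch, not taken from either publication: it assumes a hypothetical six-variable correlation matrix built from made-up loadings, and it uses principal-axis factoring as one simple stand-in for EFA. The function names (top_k_loadings, paf_loadings, residual_summary) are illustrative only.

```python
import numpy as np

# Hypothetical loadings for 6 variables on 2 factors (made-up numbers).
true_loadings = np.array([
    [0.80, 0.00],
    [0.70, 0.00],
    [0.60, 0.00],
    [0.00, 0.75],
    [0.00, 0.65],
    [0.00, 0.55],
])
communalities = (true_loadings ** 2).sum(axis=1)
# Correlation matrix implied by the factor model: shared part plus unique variances.
R = true_loadings @ true_loadings.T + np.diag(1.0 - communalities)


def top_k_loadings(matrix, k):
    """Loadings built from the k largest eigenpairs of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(matrix)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]           # indices of the k largest
    vals = np.clip(eigvals[order], 0.0, None)       # guard against tiny negatives
    return eigvecs[:, order] * np.sqrt(vals)


def paf_loadings(R, k, n_iter=200):
    """Principal-axis factoring: iterate with communalities on the diagonal of R."""
    # Squared multiple correlations are a common choice of starting communalities.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)
        loadings = top_k_loadings(reduced, k)
        h2 = (loadings ** 2).sum(axis=1)
    return loadings


def residual_summary(R, loadings):
    """Mean absolute residual on the diagonal and off the diagonal."""
    resid = R - loadings @ loadings.T
    off_diagonal = resid[~np.eye(len(R), dtype=bool)]
    return np.abs(np.diag(resid)).mean(), np.abs(off_diagonal).mean()


pca_diag, pca_off = residual_summary(R, top_k_loadings(R, 2))   # PCA uses R as is
fa_diag, fa_off = residual_summary(R, paf_loadings(R, 2))       # EFA uses reduced R

print(f"PCA: mean |diag resid| = {pca_diag:.3f}, mean |off-diag resid| = {pca_off:.3f}")
print(f"EFA: mean |diag resid| = {fa_diag:.3f}, mean |off-diag resid| = {fa_off:.3f}")
```

Under these assumptions, the two-component PCA solution leaves smaller residuals on the diagonal (it accounts for more of each variable's total variance), while the two-factor solution leaves smaller residuals off the diagonal (it reproduces the correlations between variables more closely). Real data will show the pattern less cleanly than this constructed example.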
There are a number of other books and resources cited on the Advanced Epidemiology page for each method. Many resources cover both techniques but don’t necessarily compare and contrast the two. The online resources at the end of this handout provide introductory material and comparison of the two methods.
The overall goal of this guide is to help researchers navigate the junctures of the decision tree below by sharing literature that compares the use of PCA, EFA, and other data reduction techniques.
The papers below are reviews of the use of PCA, EFA, and other data reduction techniques in the public health and health literature.
This paper is more theoretical and reviews the underlying theory for PCA, EFA (and their connection) along with structural equation models and MIMIC using well-being and poverty indices as a case study.
Krishnakumar J, Nagar AL. On Exact Statistical Properties of Multidimensional Indices Based on Principal Components, Factor Analysis, MIMIC and Structural Equation Models. Social Indicators Research 2008;86:481-96.
Systematic review of major depressive disorder classification systems and statistical methods used to identify symptom dimensions or latent classes. Based on 20 articles with 34 analyses, the authors found an equal number of factor analyses and PCAs conducted, often with the same scales and measures or on the same sample.
van Loo HM, de Jonge P, Romeijn JW, Kessler RC, Schoevers RA. Data-driven subtypes of major depressive disorder: a systematic review. BMC Med 2012;10:156.
This paper reviews 47 studies that used PCA and compares the methods, challenges, and mistakes encountered when using PCA for composite health measures. The paper suggests repeating the analysis across samples and using complementary methods such as factor analysis.
Coste J, Bouee S, Ecosse E, Leplege A, Pouchot J. Methodological issues in determining the dimensionality of composite health measures using principal component analysis: case illustration and suggestions for practice. Quality of life research: an international journal of quality of life aspects of treatment, care and rehabilitation 2005;14:641-54.
This paper outlines common mistakes and errors with EFA from a review of 60 studies in psychology journals. Provides useful suggestions for improved practices related to use of EFA and reporting in journals.
Henson RK, Roberts JK. Use of Exploratory Factor Analysis in Published Research: Common Errors and Some Comment on Improved Practice. Educational and Psychological Measurement 2006;66:393-416.
This paper reviews the use of EFA and key decisions when conducting EFA (reviewing 28 papers from high-impact nursing journals). Findings reported that PCA was used more often than EFA (61% vs. 39%), though no paper explained why PCA was chosen over EFA. The paper outlines practical recommendations for addressing flawed and out-of-date “rules of thumb” for PCA and EFA use.
Gaskin CJ, Happell B. On exploratory factor analysis: A review of recent evidence, an assessment of current practice, and recommendations for future use. International Journal of Nursing Studies 2014;51:511-21.
Nutritional epidemiology comparison of reduced rank regression, partial least-squares regression, and PCA.
DiBello JR, Kraft P, McGarvey ST, Goldberg R, Campos H, Baylin A. Comparison of 3 Methods for Identifying Dietary Patterns Associated With Risk of Disease. American journal of epidemiology 2008;168:1433-43.
Social epidemiology example. The authors concluded that using a single variable might be as good as developing principal components.
Hurtado D, Kawachi I, Sudarsky J. Social capital and self-rated health in Colombia: The good, the bad and the ugly. Social science & medicine 2011;72:584-90.
Built environment research and development of neighborhood deprivation index using PCA.
Messer LC, Laraia BA, Kaufman JS, et al. The Development of a Standardized Neighborhood Deprivation Index. Journal of urban health: Bulletin of the New York Academy of Medicine 2006;83:1041-62.
Nutritional epidemiology study of dietary patterns and association with laryngeal cancer. Comparison of dietary patterns and whether they allow better explanation of determinants compared to individual components of dietary patterns.
De Stefani E, Boffetta P, Ronco AL, Deneo-Pellegrini H, Acosta G, Mendilaharsu M. Dietary patterns and risk of laryngeal cancer: an exploratory factor analysis in Uruguayan men. International journal of cancer Journal international du cancer 2007;121:1086-91.
Nutritional epidemiology study comparing two dietary patterns generated through EFA (“traditional cooking” and “fruits and vegetables” pattern) with a hypothesis-driven Dietary Approaches to Stop Hypertension (DASH) pattern. No significant trends were found when comparing all three patterns, though women in Q3 of DASH were at lower risk than those in Q1.
Schulze MB, Hoffmann K, Kroke A, Boeing H. Risk of hypertension among women in the EPIC-Potsdam Study: comparison of relative risk estimates for exploratory and hypothesis-oriented dietary patterns. American journal of epidemiology 2003;158:365-73.
Social epidemiology paper using PCA and EFA synonymously: authors write they “conducted an exploratory factor analysis using principal components analysis.” EFA yielded two factors that reflected Perceived and Enacted Sexual Stigma among LBQ women (based on items on a sexual stigma scale).
Logie CH, Earnshaw V. Adapting and Validating a Scale to Measure Sexual Stigma among Lesbian, Bisexual and Queer Women. PloS one 2015;10:e0116198.
Built environment paper exploring environmental contributors to drug abuse using 32 variables for census tracts. Four factors (representing 55.8% of variance) were identified. The authors make the point that EFA can be more policy relevant by helping to distinguish between the influence of economic well-being, violence, and social disorganization (three of the factors).
Bell DC, Carlson JW, Richard AJ. The social ecology of drug use: a factor analysis of an urban environment. Subst Use Misuse 1998;33:2201-17.
Short course on PCA and EFA by Jose Manuel Roche at Oxford University Poverty and Human Development Initiative with lecture video, slides, exercise files, reading list and links to other resources. Available here: http://www.ophi.org.uk/principal-components-analysis-and-factor-analysis-2010
Two introductory lessons on PCA and EFA from Mike Clark, PhD at University of North Texas and Elizabeth Root at University of Colorado. Explains the difference in the variance between the two methods. These lectures also have a useful explanation of factor analysis scales along with guidance on what variables to include in analysis:
A resource page on EFA and PCA from University of Wisconsin psychology department: http://psych.wisc.edu/henriques/pca.html
Five-video (2-hour) introduction and tutorial for EFA and PCA from Econometrics Academy (by Ani Katchova). Interesting to note that the example conducts EFA and PCA on the same dataset. https://www.youtube.com/playlist?list=PLRW9kMvtNZOjaStLK9ldf_Yc8MB6TkCUx
More resources from Econometrics Academy available here: https://sites.google.com/site/econometricsacademy/
Theoretical lecture on principal components analysis from “Opinionated Lessons in Statistics” by Bill Press, University of Texas. The main caution relates to over-interpreting the meaning of components: https://www.youtube.com/watch?v=frWqIUpIxLg&index=43&list=PLUAHeOPjkJseXJKbuk9-hlOfZU9Wd6pS0
A written tutorial on Principal Components Analysis. Lindsay I Smith February 26, 2002. Accessed March 15, 2015. Available at: https://courses.cs.washington.edu/courses/cse528/09sp/pca.pdf
Brief written tutorial on Exploratory (and Confirmatory) Factor Analysis from Jamie DeCoster at University of Alabama. “Overview of Factor Analysis.” Accessed March 16, 2005. Available at: http://stat-help.com/factor.pdf