Public Health Data Science

The MS in Biostatistics Public Health Data Science Track (MS/PHDS) is designed for students interested in careers as biostatisticians applying statistical methods in health-related research settings. The MS/PHDS Track provides core training in biostatistical theory, methods, and applications, but adds a distinct emphasis on modern approaches to statistical learning, reproducible and transparent code, and data management. It is an appropriate program both for students who intend to conclude their studies with the MS degree and for those who want to pursue a PhD in biostatistics.

All MS/PHDS candidates begin their studies in the fall semester. The length of the MS/PHDS program varies with the background, training, and experience of the candidate, but the usual period needed to complete the 36-credit MS/PHDS degree is two years (four semesters). In addition to fulfilling their course work, all MS/PHDS students complete a one-term practicum and a capstone experience.


Through a curriculum of 36 credit hours of course work, a practicum, and the capstone experience, the MS/PHDS track provides students with the skills necessary for a career as a public health data scientist and a rigorous grounding in traditional biostatistics.

In addition to achieving the MS in Biostatistics core competencies, students in the PHDS Track gain the following specific competencies in four areas: public health and collaborative research, the foundations of applied data science, teaching biostatistics, and biostatistical research. Upon satisfactory completion of the MS/PHDS, graduates will be able to:

Public Health and Collaborative Research

  • Formulate and prepare a written statistical plan for analysis of public health research data that clearly reflects the research hypotheses of the proposal in a manner that resonates with both co-investigators and peer reviewers;
  • Prepare written summaries of quantitative analyses for journal publication, presentations at scientific meetings, grant applications, and review by regulatory agencies;

Foundations of Applied Data Science

  • Develop expertise in one or more statistical software and database management packages (often R and SQL, among others) routinely used by data science professionals;
  • Implement a reproducible workflow for data analysis projects, including robust project organization, transparent data management, and reproducible analysis results;
  • Develop and execute analysis strategies that use traditional statistical tools or modern approaches to statistical learning, depending on the nature of the scientific questions of interest;
  • Identify the uses to which data management can be put in practical statistical analysis, including the establishment of standards for documentation, archiving, auditing, and confidentiality; guidelines for accessibility; security; structural issues; and data cleaning;

Teaching Biostatistics

  • Review and illustrate selected principles of study design, probability theory, estimation, hypothesis testing, statistical learning, and data analytic techniques for public health students enrolled in introductory-level graduate public health courses; and

Biostatistical Research

  • Apply probabilistic, statistical, and data scientific reasoning to structure thinking and solve a wide range of problems in public health.

Course Requirements

MS/PHDS graduates are expected to master the mathematical and biostatistical concepts and techniques presented in the curriculum’s required courses. Each student's program is designed on an individual basis in consultation with a faculty advisor, taking into consideration the student's prior educational experience.

Students who have mastered an academic area through previous training may have the corresponding course requirement waived. Some students, such as those with undergraduate majors in statistics or mathematics, may apply to have several courses waived. Students wishing to waive one or more courses must request approval in writing from their advisors and the Director of Academic Programs. These students must still complete a minimum of 36 points to earn the MS/PHDS degree.

Required Courses

Below is the required course work. Students consult their faculty advisors before registering for classes to plan their programs based on their individual backgrounds, goals, and the appropriate sequencing of courses. Waiving a required course (with prior written approval of the faculty advisor and the Director of Academic Programs) allows students to take other, higher-level classes.

  • P6400: Principles of Epidemiology
  • P8105: Data Science I
  • P8106: Data Science II*
  • P8109: Statistical Inference
  • P8130: Biostatistical Methods I
  • P8131: Biostatistical Methods II
  • P8180: Relational Databases and SQL Programming for Research and Data Science
  • P8185: Capstone Consulting Seminar


*Students who have a strong mathematics background and/or have already taken coursework in basic machine learning methods can substitute P9120 Topics in Statistical Learning and Data Mining I for P8106 Data Science II.


Students choose four or more courses from the list below or from alternatives approved by their academic advisors.

  • Statistical Computing with SAS
  • Survival Analysis
  • Advanced Statistical and Computational Methods in Genetics and Genomics
  • Graphical Models for Complex Health Data
  • Analysis of Longitudinal Data
  • Latent Variable and Structural Equation Modeling for Health Sciences
  • Topics in Advanced Statistical Computing
  • Topics in Statistical Learning and Data Mining


Sample Timeline

Below is a sample timeline for MS/PHDS candidates. Course schedules change from year to year, so class days and times in future years will differ from this sample; check the current course schedule for each year on the course directory page.

Fall I

  • P6400: Principles of Epidemiology
  • P8104: Probability
  • P8105: Data Science I
  • P8130: Biostatistical Methods I

Spring I

  • P8109: Statistical Inference
  • P8106: Data Science II
  • P8131: Biostatistical Methods II

Fall II

  • P8180: Relational Databases and SQL Programming for Research and Data Science

Spring II

  • P8185: Capstone Consulting Seminar
  • Completion of practicum requirements

Practicum Requirement

One term of practical experience is required of all students, providing educational opportunities that are different from, and supplementary to, the more academic aspects of the program. The practicum may be fulfilled during the school year or over the summer. Arrangements are made on an individual basis in consultation with faculty advisors, who must approve both the proposed practicum project before it begins and the report submitted at the conclusion of the practicum experience. Students are required to give a poster presentation at the department’s Annual Practicum Poster Symposium, which is held in early May.

Capstone Experience

A formal, culminating experience for the MS degree is required for graduation. The capstone consulting seminar is designed to enable students to demonstrate their ability to integrate their academic studies with the role of biostatistical consultant/collaborator, which will constitute the major portion of their future professional practice.

As part of the seminar, students are required to attend several sessions of the Biostatistics Consulting Service (BCS). The BCS offers advice on data analysis and on appropriate methods of data presentation for publications, and provides design recommendations for public health and clinical research, including the preparation of grant proposals. Biostatistics faculty and research staff members conduct all consultation sessions, with students observing, modeling, and participating in the consultations.

In the capstone seminar, students present their experience and the statistical issues that emerged in their consultations, developing statistical report writing and presentation skills essential to their professional practice in biomedical and public health research projects.


Paul McCullough
Director of Academic Programs
Department of Biostatistics
Columbia University