Electronic Medical Records Boot Camp: Biostatistical methods for analyzing EMR data

August 22-23, 2024 | Livestream, virtual

Registration is open! Join us for the next Electronic Medical Records Boot Camp on August 22-23, 2024. 

The Electronic Medical Records Boot Camp is a two-day intensive training of seminars and hands-on analytical sessions that provides an overview of electronic health data opportunities, statistical challenges, and the latest techniques.


Subscribe for updates on registration and scholarship dates, deadlines, and announcements.

Jump to:  Overview  |  Prerequisites  |  Instructors  |  Scholarships  |  Locations  |  Testimonials  |  Registration Fees  |  Additional Information

Electronic Medical Records Training Overview

Summer 2024 dates: Livestream, online training August 22-23, 2024; 10:00am - ~5:30pm ET.

Over the last decade, Electronic Health Record (EHR) and Electronic Medical Record (EMR) systems have been increasingly implemented at US hospitals. Huge amounts of detailed longitudinal patient information, including lab tests, medications, disease status, and treatment outcomes, have been accumulated and are available electronically. Extensive effort has been dedicated to developing advanced clinical data processing and data management in order to integrate patient data into a computable collection of rich longitudinal patient profiles. EMRs and EHRs provide unprecedented opportunities for cohort-wide investigations and knowledge discovery. They are important data resources for building predictive models for disease diagnosis and prognosis, thus enabling personalized medicine.

Despite this great potential, analyzing such large, scattered, and heterogeneous observational patient data is still technically challenging. This two-day intensive workshop covers the opportunities and potential of EMR/EHR data for health and medical studies, the statistical challenges and pitfalls of analyzing EMR/EHR data, and the latest techniques for addressing those challenges, followed by hands-on computer lab sessions and case studies to put concepts into practice.

By the end of the electronic medical records training, participants will be familiar with the following topics:

  • Power and potential of EMR/EHR data
  • Open-access datasets across the world
  • Preparation, transformation, and integration of EMR/EHR data
  • Confounding, bias, and missing data in EMR/EHR data, and statistical methods for addressing these challenges
  • Statistical methods for comparative effectiveness
  • Statistical methods for predictive analysis

Audience and Requirements

Investigators from any institution and from all career stages are welcome to attend, and we particularly encourage trainees and early-stage investigators to participate. There are four requirements to attend this training:

  1. Each participant must have an introductory background in statistics.
  2. Each participant must be familiar with R.
  3. Each participant must have a laptop/computer with the latest versions of R and RStudio downloaded and installed prior to the first day of the workshop. R and RStudio are available for free download and installation on Mac, PC, and Linux devices.
  4. Each participant must apply for access to MIMIC-III data, which requires completing specific HIPAA training to receive credentials.


Instructors

Shuang Wang, PhD, Department of Biostatistics, Columbia University. Dr. Wang is Professor of Biostatistics at the Mailman School of Public Health. Her research focuses on methodological development in observational studies using electronic health records data and multi-omics data, especially methods for multiple domain fusion or multi-omics integration.

Tian Gu, PhD, Department of Biostatistics, Columbia University. Dr. Gu is an Assistant Professor of Biostatistics. She has a broad interest in developing innovative statistical methods and easy-to-use computational tools to advance precision health by integrating real-world data and evidence collected from diverse populations and large datasets. Her research interests include robust and efficient data integration in precision health research, methods for biobank and electronic health records data, and disparity research.


Scholarships

Training scholarships are available for the Electronic Medical Records Boot Camp.


Locations

Summer 2024: The EMR Boot Camp is a livestream, remote training that takes place over live, online video on August 22-23, 2024, from 10:00am to approximately 5:30pm ET. Please note that this training is not a self-paced, pre-recorded online training.


Testimonials

"A very comprehensive overview of statistical and machine learning approaches used to analyze EHR data, applicable to trainees from various backgrounds." - Fellow at Children's Hospital of Philadelphia, 2023

"This is a very informative and rigorous boot camp which helps understand how to use EHR data for answering clinical questions." - Senior research associate at Illinois Institute of Technology, 2023

"It was a great workshop and I highly recommend to those working with health data." - Graduate research assistant at University of Nebraska Medical Center, 2023

Additional Testimonials

"This is a good short course and provided a good review of working with EHR data primarily for predictive analysis." - Postdoc at The Ohio State University, 2022

"The boot camp had a good balance of lectures and labs to solidify concepts and help us apply the information." - Faculty member at Rutgers University, 2022

"This bootcamp was helpful for understanding what kinds of clinical questions can be answered with large-scale EMR data. There were novel techniques introduced that I was not familiar with, and the code review was helpful for understanding how to implement them." - Non-profit staff member at Whitman-Walker Institute, 2021

"The training materials (codings and books) are very practical. It inspires me to apply various analytical techniques in my EMR projects and produce better and more robust results." - Student at University of California San Diego, 2021

"I was excited to learn new techniques and R packages for EMR analysis." - Staff member at Michigan DHHS, 2021

"The boot camp covered an impressive array of sophisticated inference and modeling techniques. I felt much more comfortable with complicated, multi-dimensional EHR data after this bootcamp." - Postdoc at Memorial Sloan Kettering Cancer Center, 2021

Registration Fees

Registration category  |  Early-Bird Rate (through 6/10/24)  |  Regular Rate (6/11/24 - 8/15/24)  |  Columbia Discount*
Student/Postdoc/Trainee  |  $995  |  $1,195  |  10%
Faculty/Academic Staff/Non-Profit Organizations/Government Agencies  |  $1,195  |  $1,395  |  10%
Corporate/For-Profit Organizations  |  $1,395  |  $1,595  |  N/A


*Columbia Discount: This discount is valid for any active student, postdoc, staff, or faculty at Columbia University. If paying by credit card, use your Columbia email address during the registration process to automatically have the discount applied. If paying by internal transfer within Columbia, submit this Columbia Internal Transfer Request form to receive further instructions. Please note: filling out this form is not the same as registering for a training and does not guarantee a training seat.  

Invoice Payment: If you would prefer to pay by invoice/check, please submit this Invoice Request form to receive further instructions. Please note: filling out this form is not the same as registering for a training and does not guarantee a training seat.

Registration Fee: This fee includes course material, which will be made available to all participants both during and after the conclusion of the training.

Cancellations: Cancellation notices must be received via email at least 30 days prior to the training start date in order to receive a full refund, minus a $75 administrative fee. Cancellations received via email 14-29 days prior to the training are eligible for a 75% refund, minus a $75 administrative fee. Please email your cancellation notice to Columbia.EMR@gmail.com. Due to workshop capacity and preparation, we regret that we are unable to refund registration fees for cancellations received fewer than 14 days prior to the training.

If you are unable to attend the training, we encourage you to send a substitute within the same registration category. Please inform us of the substitute via email at least one week prior to the training so that we can include them on attendee communications, updated registration forms, and materials. Should the substitute fall within a different registration category, your credit card will be credited or charged accordingly. Please email substitute inquiries to Columbia.EMR@gmail.com. In the event Columbia must cancel the event, your registration fee will be fully refunded.

Additional Information

The Electronic Medical Records Boot Camp is hosted by Columbia University's SHARP Program at the Mailman School of Public Health.
