Machine Learning Boot Camp: Analyzing Biomedical and Health Data


Machine Learning Boot Camp - Columbia University SHARP Training

The most recent Machine Learning Boot Camp was August 27-28, 2020. Sign up below to hear about the next offering!


The Machine Learning Boot Camp is a two-day intensive training that combines seminars with hands-on R sessions to provide an overview of concepts, techniques, and data analysis methods with applications in biomedical research.


Subscribe for updates on registration and scholarship dates, deadlines, and announcements.



Summer 2020 dates: Live-stream, online training August 27-28, 2020; 10:00am - 4:00pm EDT

This two-day intensive training will provide a broad introduction to machine learning methodology with applications in biomedical research. Taught by a team of biostatisticians, the Boot Camp will integrate seminar lectures with hands-on R lab sessions to put concepts into practice. Emphasis will be given to supervised (e.g., penalized methods, classification and decision trees, survival forests) and unsupervised methods (e.g., clustering algorithms, dimensionality reduction) with numerous case studies and biomedical applications. The workshop will conclude with an overview and example(s) of ‘deep learning’ approaches.

By the end of the boot camp, participants will be familiar with the following topics:

  • Penalized Regression Methods (Ridge and Lasso)
  • Support Vector Machines
  • Decision Trees (Random Forest)
  • Predicting Survival Outcomes (Cox Regression/Lasso, Survival Forests)
  • Clustering Algorithms
  • Principal Component Analysis (PCA)
  • Deep Learning – An Illustrative Overview

Investigators at all career stages are welcome to attend, and we particularly encourage trainees and early-stage investigators to participate.


There are three prerequisites/requirements to attend:

  1. Each participant must have an introductory background in statistics.
  2. Each participant must be familiar with R. The main software used for the workshop will be R/RStudio, so we strongly recommend that participants have a basic understanding of this software before attending the training.
  3. Each participant is required to bring a personal laptop with R/RStudio installed prior to the first day of the workshop, as all lab sessions will be done on your personal laptop. R is available for free download and installation on Mac, PC, and Linux devices. Please review the R Installation Guide below.


Basic R knowledge is required for the boot camp as noted in prerequisites above. 

  • R Installation Guide: R is the free software programming language we will use in the boot camp. Use this installation guide to choose the correct version for your laptop (Mac/Windows) and install it prior to the first day of the boot camp.

If you have any specific questions about R and RStudio in the context of the Machine Learning Boot Camp, please email us.


Noah Simon, PhD, Department of Biostatistics, School of Public Health, University of Washington. Dr. Simon’s methodological interests include computationally efficient methods for predictive modeling with high-dimensional, complex data, and the design of adaptive clinical trials.

Jean Feng, PhD, Department of Biostatistics and Epidemiology, School of Medicine, University of California, San Francisco. Dr. Feng's research interests include the interpretability and reliability of machine learning methods for biomedical applications, particularly those involving black-box models.

Cody Chiuzan, PhD, Department of Biostatistics, Mailman School of Public Health, Columbia University. Dr. Chiuzan’s research interests concern the development of adaptive early-phase designs for oncology trials, including questions on the optimal study designs and endpoints for early-phase immune- and targeted-oncology agents. Dr. Chiuzan is the Director for Educational Initiatives of the CTSA Biostatistics, Epidemiology and Research Design (BERD) Resource.


Yifei Sun, PhD, Department of Biostatistics, Mailman School of Public Health, Columbia University. June 2019, June 2020.


Training scholarships are available for the Machine Learning Boot Camp.


COVID-19 Update: The Machine Learning Boot Camp will no longer take place in person due to the COVID-19 pandemic. The Boot Camp will instead be a live-stream, remote training held over online video on August 27-28, 2020, from 10am to 4pm EDT. Please note that this is not a self-paced, pre-recorded online training.


"This was an amazing Boot Camp, complex information was delivered well with examples and the speakers had clear experience with the topics. Really appreciated the demonstrations in the code and live modifications that made the process clear. I really enjoyed the livestream vs. in person setting. More relaxed pace and had no problem having my questions answered (no commute)." - Katherine D., Faculty member from Columbia University, June 2020 virtual training

"The workshop introduces up-to-date concepts and provides training using timely examples with well-integrated insights." - Postdoc from MIT, June 2020 virtual training

"A fantastic overview of lots of fundamental ML concepts! The code snippets are very handy and the lecturers do a great job of teaching you what you need to know to use these methods in your own work. I just wish it was longer/we covered more!" - Vinyas H., Student from University of Toronto, June 2020 virtual training

"The boot camp gave me a good overview and hands-on training in R for real world data" - Staff attendee from a for-profit corporation, June 2020 virtual training

"The course gave a very practical and clear overview of the main ML methods people are using. What I learned will be very helpful for me to understand when a collaborator is using these methods on my data or as a starting point for when I want to delve deeper into a specific method. The teaching quality was superb." - Robert C., Faculty member from Columbia University, June 2020 virtual training

"Exceeded my expectations. Covered both conventional and cutting-edge methods of machine learning with both depth and breadth, implementing on real-world examples." - Student from Mount Sinai, June 2020 virtual training

"The training sections and hands-on programming exercises were very clear and interactive. Instructors are very knowledgeable and helpful." - Attendee from University of Iowa, June 2020 virtual training

"As someone with some statistical background but very little experience in machine learning specifically, I appreciated the overview of so many different topics, which introduced me to a wide array of applicable methods that I may choose to explore in my future work." - Student from Columbia University, June 2020 virtual training

"This course was a very helpful introduction to machine learning and included tools for the practical application of the concepts that were discussed." - Caleb I., Faculty member from Columbia University, June 2020 virtual training

"This boot camp was excellent in providing an introduction to machine learning. The quality of instruction was outstanding." - Victoria C., Research Biostatistician from Weill Cornell Medicine, 2019

"This is a great graduate-level workshop to understand the similarities and differences between traditional statistical modeling and machine learning. The level and pace are good, as are the Rmarkdown examples." - Anonymous Faculty member, 2019

"I enjoyed the ML boot camp. The instructors are highly knowledgeable of statistics and ML as well as helpful. The intro to R session (1 hour) was not long enough and it was very rushed due to the fact that we were scheduled to go right into the actual boot camp workshop immediately afterwards. The days were long but not overly taxing. Overall, for someone with no background in using R or ML, I feel that I learned a tremendous amount! Thanks!" - Greg D, Faculty member from University of Delaware, 2019

"Instructors are super dedicated and training materials are well prepared. I definitely feel more confident in applying statistical learning methods in my work." - Xian W, Research Biostatistician from Weill Cornell Medicine, 2019 

"An excellent bootcamp that gives a good overview of machine learning as a concept as well as specific approaches." - Haotian W., Postdoc from Columbia University, 2019

"This was a great boot camp for people with a firm understanding of principles of statistics and machine learning, who are looking to deepen their knowledge, understanding, and application of machine learning in their research projects." - Marta J., Assistant Research Scientist from UCSD, 2019

"It was a great introduction to ML and it provided me with the right tools to apply these techniques in my own research." - Sujith R., Faculty member from University of Mississippi, 2019


COVID-19 Update: With the training being offered virtually, we are passing along any and all costs saved to attendees.

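Early-Bird Rate: through 4/22/20 (originally 4/15/20). Regular Rate: 4/23/20 - 8/20/20 (originally from 4/16/20).

  • Student/Postdoc/Trainee: Early-Bird Rate $825 (originally $1,150); Regular Rate $975 (originally $1,350); Columbia Discount* 10%
  • Faculty/Academic Staff/Non-Profit Organizations: Early-Bird Rate $975 (originally $1,350); Regular Rate $1,125 (originally $1,550); Columbia Discount* 10%
  • Corporate/For-Profit Organizations: Early-Bird Rate $1,125 (originally $1,550); Regular Rate $1,275 (originally $1,750); Columbia Discount* NA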


*Columbia Discount: This discount is valid for any active student, postdoc, staff, or faculty member at Columbia University. To access the Columbia discount, email us for instructions and specify whether you are paying by credit card or by internal transfer within Columbia.

Invoice Payment and Group Registrations: If you would prefer to pay by invoice/check, or would like to pay for a group of registrants, please email us with details.

Registration Fee: includes course material. Course material will be available to all students after the workshop.

Cancellations: For summer 2020, no administrative fees will be assessed due to the evolving COVID-19 situation. Cancellation notices must be received via email at least 14 days prior to the workshop start date in order to receive a full refund. Please send your cancellation notice by email. Due to workshop capacity and preparation, we regret that we are unable to refund registration fees for cancellations after this date. The exception is a new COVID-19 restriction that impedes virtual attendance: any cancellation made fewer than 14 days before a training because of a COVID-19 restriction beyond your control (institutional policy, shift in work responsibilities, etc.) will be fully refunded, and no administrative fee will be assessed. Because of the significant resources required to develop these trainings, you will be asked to submit supporting documentation (e.g., employer email notice, local regulations) for any COVID-19-related cancellation made fewer than 14 days before a given training.

If you are unable to attend the training, we encourage you to send a substitute within the same registration category. Please inform us of the substitute via email at least one week prior to the training so that we can include them on attendee communications, updated registration forms, and materials. Should the substitute fall within a different registration category, your credit card will be credited or charged accordingly. Please send substitute inquiries by email. In the event Columbia must cancel the event, your registration fee will be fully refunded.




Want updates on new Boot Camp details or registration deadlines? Subscribe here.

Questions? Email the Boot Camp team here.

The Machine Learning Boot Camp is hosted by Columbia University's Department of Environmental Health Sciences and Department of Biostatistics in the Mailman School of Public Health, and the Irving Institute for Clinical and Translational Research: Biostatistics, Epidemiology, and Research Design (BERD) Educational Resource.