Python Data Wrangling Boot Camp: Introduction to Data Wrangling, Cleaning and Manipulation with the Python Programming Language

June 6-7, 2024 | In-person training

Registration is open! Join us for the next Python Data Wrangling Boot Camp on June 6-7, 2024.

The Python Data Wrangling Boot Camp is a two-day intensive course that combines concept-focused seminars with hands-on exercises pairing Python fundamentals with practical data wrangling and analysis.


Subscribe for updates on scholarship dates, deadlines, and announcements


Jump to: Overview | Audience & Requirements | Instructor | Scholarships | Locations | Testimonials | Registration Fees | Additional Information

Boot Camp Overview

Summer 2024 dates: In-person training June 6-7, 2024; 9:00am - ~5:00pm ET

Python is one of the world's most popular programming languages. It is versatile enough to create sophisticated data visualizations and powerful enough to run complex machine learning models. Python was also designed to be easy to learn and use, making it an excellent tool for anyone looking to strengthen their data gathering and analytical skills.

This two-day course introduces the Python programming language and demonstrates how to use it for essential data wrangling, manipulation, and cleaning tasks on real-world biomedical data. Bringing together scalable methods and popular libraries for data manipulation, basic statistical analysis, and visualization, this boot camp gives participants the tools and background they need to get started with Python for data work. Through hosted notebooks, participants will leave the workshop with functioning code that they can apply to their own data sets. Participants will receive orientation videos before the real-time sessions so they can familiarize themselves with the Jupyter Notebook/Google Colab environment; all code samples will be available in this format for participant use.

By the end of the workshop, participants will be able to:

  • Load and explore data sets in Python
  • Join, reconcile and otherwise clean up messy data sets
  • Do basic statistical analyses, including linear and logistic regression
  • Render exploratory visualizations

Audience and Requirements

Investigators from any institution and from all career stages are welcome to attend, and we particularly encourage trainees and early-stage investigators to participate.

No prior programming experience is required to participate in this workshop. However, participants must have (or create) an unrestricted Google account for working with sample notebooks (via Google Colab) and data sets. Participants will also be expected to complete a brief survey and watch up to 3 hours of pre-recorded introductory material before the start of the real-time workshop activities.


Instructor

The Summer 2024 instructing team is being finalized but will be comparable to the 2023 lineup below.

Training Director: Susan McGregor, Associate Research Scholar, Columbia University Data Science Institute (DSI). McGregor has been teaching Python for data analysis and wrangling to learners from diverse backgrounds for almost a decade. Her book, Practical Python Data Wrangling & Data Quality, is available from O'Reilly Media.

Guest Speaker:

Jennifer Brite, DrPH, is an Assistant Professor of Public Health at York College, The City University of New York (CUNY). Trained in epidemiology and biostatistics, she has applied public health experience in building and analyzing complex datasets using survey and administrative data through her work at the New York City Department of Health and Mental Hygiene (DOHMH) and Memorial Sloan Kettering Cancer Center (MSKCC). She also facilitated predictive modeling during the COVID-19 pandemic in a partnership between Columbia University, New York University, and DOHMH. She is currently leading a pilot study to gather indigent burial data nationwide (The US Public Burial Data Collection, USPBDC) in order to understand its determinants and distribution, as well as to more broadly examine economic and social disadvantage across the life course.


Scholarships

Training scholarships are available for the Python Data Wrangling Boot Camp.


Locations

Summer 2024: The Python Data Wrangling Boot Camp is a live, in-person training taking place June 6-7 at the Columbia University Irving Medical Campus in NYC. All training start and end times are in EDT.

More information on travel, lodging, and getting around NYC.

Testimonials

"Well structured workshop that gave a broad idea on what to consider when structuring your data for analysis." - Postdoc at University of Pennsylvania, 2023

"It was a nice workshop for beginners to start using Python for data science projects." - Faculty member at University of Missouri-Kansas City, 2023

"This training was a very helpful introduction to Python for data science and analysis. We went over the most popular packages written in Python used to perform many of the tasks that I had previously been using in commercial softwares (e.g., SPSS). Since this training, I have already started using Python in my own work and have been very satisfied with how much I learned [at the training]!" - Graduate Student at Arizona State University, 2022

Additional Testimonials

"Great training for Python beginners.  Provides you with several different data wrangling techniques and enough example information to begin exploring your own data." - Postdoc at UNC Chapel Hill, 2022

"The training helped a lot, especially in starting to understand python. My focus is on bioinformatic analysis, and all the data manipulation in the course will be helpful." - Postdoc at Albany Medical Center, 2022

"I appreciated the emphasis on fundamental, practical skills. It's difficult to find workshops that are appropriate for beginners, but don't spend too much time only on easy to understand basics, and this one hit the mark." - Research Scientist at Arizona State University, 2022

"For a beginner the training was at a very good level to get started on my own. I feel I walked away with a lot of knowledge that can be applied to my own research and projects." - Kimberly R., Faculty member at Mercy College, 2021

"Susan clearly explained both high level concepts and specific syntax." - Amanda N., Government staff member, City of Atlanta City Auditor's Office, 2021

"The python training was excellent! Even as an early beginner in coding the instructors were fantastic and made the information easy and accessible to understand." - Michael F., Postdoc, 2021

"Great foundation for Python and has inspired me to develop my skills in it! Directly applicable to my research. Susan was a wonderful instructor - knowledgeable, enthusiastic and helpful!" - Anonymous Postdoc, 2021

Registration Fees

Registration Category | Early-Bird Rate (through 4/10/24) | Regular Rate (4/11/24 - 5/30/24) | Columbia Discount*
Student/Postdoc/Trainee | $1,195 | $1,395 | 10%
Faculty/Academic Staff/Non-Profit Organizations/Government Agencies | $1,395 | $1,595 | 10%
Corporate/For-Profit Organizations | $1,595 | $1,795 | NA


*Columbia Discount: This discount is valid for any active student, postdoc, staff, or faculty at Columbia University. If paying by credit card, use your Columbia email address during the registration process to automatically have the discount applied. If paying by internal transfer within Columbia, submit this Columbia Internal Transfer Request form to receive further instructions. Please note: filling out this form is not the same as registering for a training and does not guarantee a training seat.  

Invoice Payment: If you would prefer to pay by invoice/check, please submit this Invoice Request form to receive further instructions. Please note: filling out this form is not the same as registering for a training and does not guarantee a training seat.

Registration Fee: This fee includes course material, which will be made available to all participants both during and after the training. The fee does not include travel or accommodation costs.

Cancellations: Cancellation notices must be received via email at least 30 days prior to the training start date to receive a full refund, minus a $75 administrative fee. Cancellation notices received via email 14-29 days prior to the training are eligible for a 75% refund, minus a $75 administrative fee. Please email your cancellation notice to [email address missing]. Due to workshop capacity and preparation, we regret that we are unable to refund registration fees for cancellations received fewer than 14 days prior to the training.

If you are unable to attend the training, we encourage you to send a substitute within the same registration category. Please inform us of the substitute via email at least one week prior to the training so that we can include them on attendee communications, updated registration forms, and materials. Should the substitute fall within a different registration category, your credit card will be credited or charged accordingly. Please email substitute inquiries to [email address missing]. In the event Columbia must cancel the event, your registration fee will be fully refunded.

Additional Information

The Python Data Wrangling Boot Camp is hosted by the Columbia Mailman School of Public Health's SHARP Program.
