Python Data Wrangling Boot Camp: Introduction to Data Wrangling, Cleaning and Manipulation with the Python Programming Language

The next Python Data Wrangling Boot Camp will be held on July 24-25, 2023. Sign up below to be notified when registration opens!

The Python Data Wrangling Boot Camp is a two-day intensive course that combines concept-focused seminars with hands-on exercises pairing Python fundamentals with practical data wrangling and analysis.


 Subscribe for updates on registration and scholarship dates, deadlines, and announcements.

Jump to: Overview | Audience & Requirements | Instructor | Scholarships | Locations | Testimonials | Registration Fees | Additional Information


Boot Camp Overview

Summer 2023 dates: In-person at Columbia Mailman School of Public Health July 24-25, 2023; 10am EDT - ~5pm EDT

Python is one of the world's most popular programming languages. It is versatile enough to create sophisticated data visualizations and powerful enough to run complex machine learning models. Python was also designed to be easy to learn and use, making it an excellent tool for anyone looking to enhance their data gathering and analytical skill set.

This two-day course will provide an introduction to the Python programming language and demonstrate how it can be used to do essential data wrangling, manipulation and cleaning tasks using real-world biomedical data. Bringing together scalable methods and popular libraries for data manipulation, basic statistical analysis and visualization, this boot camp will provide participants with all the necessary tools and background for getting started with Python for data work. Through hosted notebooks, participants will leave the workshop with functioning code that they can then apply to their own data sets. Participants will receive orienting videos before the real-time sessions so they can familiarize themselves with the Jupyter Notebook/Google Colab environment; all code samples will be available in this format for participant use.

By the end of the workshop, participants will be able to:

  • Load and explore data sets in Python
  • Join, reconcile and otherwise clean up messy data sets
  • Do basic statistical analyses, including linear and logistic regression
  • Render exploratory visualizations
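
To give a flavor of the first three outcomes, here is a minimal sketch of a load, clean, join and analyze workflow. It is not taken from the course materials: the data, column names and function names are all hypothetical, and it uses only the Python standard library rather than whichever data libraries the workshop teaches. Visualization is left out to keep the example short.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data, invented for illustration only.
VISITS_CSV = """patient_id,age,systolic_bp
P001, 34 ,118
P002,51,
P003,47,131
P004,62,140
"""

SITES_CSV = """patient_id,site
P001,Clinic A
P003,Clinic B
P004,Clinic A
"""


def load_rows(text):
    """Load: parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_visits(rows):
    """Clean: strip stray whitespace and drop rows with a missing reading."""
    cleaned = []
    for row in rows:
        values = {key: value.strip() for key, value in row.items()}
        if not values["systolic_bp"]:
            continue  # skip incomplete records
        cleaned.append({
            "patient_id": values["patient_id"],
            "age": int(values["age"]),
            "systolic_bp": float(values["systolic_bp"]),
        })
    return cleaned


def join_site(visits, sites):
    """Join: attach each patient's site by matching on patient_id."""
    site_by_id = {row["patient_id"]: row["site"] for row in sites}
    return [
        {**visit, "site": site_by_id.get(visit["patient_id"], "unknown")}
        for visit in visits
    ]


def fit_line(xs, ys):
    """Analyze: ordinary least-squares slope and intercept for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


records = join_site(clean_visits(load_rows(VISITS_CSV)), load_rows(SITES_CSV))
slope, intercept = fit_line(
    [r["age"] for r in records], [r["systolic_bp"] for r in records]
)
print(f"{len(records)} clean records; fitted slope = {slope:.2f} per year of age")
```

In practice, dedicated data libraries handle each of these steps in a line or two; the longhand version above is only meant to show what each step does.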

Audience and Requirements

Investigators from any institution and from all career stages are welcome to attend, and we particularly encourage trainees and early-stage investigators to participate.

No prior programming experience is required to participate in this workshop. However, participants must have (or create) an unrestricted Google account for working with sample notebooks (via Google Colab) and data sets. In addition, participants will be expected to complete a brief survey and watch up to three hours of pre-recorded introductory material before the start of the real-time workshop activities.

Per Columbia University policy, on-campus participants must be vaccinated and should be prepared to show proof of vaccination while on campus. Any changes to on-campus policies will be posted here and communicated to participants.


Instructor

Training Director: Susan McGregor, Associate Research Scholar, Columbia University Data Science Institute (DSI). McGregor has been teaching Python for data analysis and wrangling to learners from diverse backgrounds for almost a decade. Her book, Practical Python Data Wrangling & Data Quality, is available from O'Reilly Media.


Scholarships

Training scholarships are available for the Python Data Wrangling Boot Camp.


Locations

Summer 2023: The Python Data Wrangling Boot Camp is an in-person training held at the Columbia Mailman School of Public Health (722 W. 168th Street in NYC) on July 24-25, 2023; 10am EDT - ~5pm EDT.

More information on travel, lodging, and getting around NYC.


Testimonials

"Great training for Python beginners. Provides you with several different data wrangling techniques and enough example information to begin exploring your own data." - Postdoc at UNC Chapel Hill, 2022

"This training was a very helpful introduction to Python for data science and analysis. We went over the most popular packages written in Python used to perform many of the tasks that I had previously been using in commercial softwares (e.g., SPSS). Since this training, I have already started using Python in my own work and have been very satisfied with how much I learned [at the training]!" - Graduate Student at Arizona State University, 2022

"The training helped a lot, especially in starting to understand python. My focus is on bioinformatic analysis, and all the data manipulation in the course will be helpful." - Postdoc at Albany Medical Center, 2022

"I appreciated the emphasis on fundamental, practical skills. It's difficult to find workshops that are appropriate for beginners, but don't spend too much time only on easy to understand basics, and this one hit the mark." - Research Scientist at Arizona State University, 2022

"For a beginner the training was at a very good level to get started on my own. I feel I walked away with a lot of knowledge that can be applied to my own research and projects." - Kimberly R., Faculty member at Mercy College, 2021

"Susan clearly explained both high level concepts and specific syntax." - Amanda N., Government staff member, City of Atlanta City Auditor's Office, 2021

"The python training was excellent! Even as an early beginner in coding the instructors were fantastic and made the information easy and accessible to understand." - Michael F., Postdoc, 2021

"Great foundation for Python and has inspired me to develop my skills in it! Directly applicable to my research. Susan was a wonderful instructor - knowledgeable, enthusiastic and helpful!" - Anonymous Postdoc, 2021

Registration Fees

Registration Category | Early-Bird Rate (through 5/15/23) | Regular Rate (5/16/23 - 7/5/23) | Columbia Discount*
Student/Postdoc/Trainee | $1,175 | $1,375 | 10%
Faculty/Academic Staff/Non-Profit Organizations/Government Agencies | $1,375 | $1,575 | 10%
Corporate/For-Profit Organizations | $1,575 | $1,775 | NA

*Columbia Discount: This discount is valid for any active student, postdoc, staff, or faculty at Columbia University. To access the Columbia discount, email for instructions and specify the following: 1) your registration category from the table above, 2) whether you are paying by credit card or by internal transfer within Columbia, and 3) for an internal transfer, the registration category from the table above and whether grant funds are being used.

Invoice Payment and Group Registrations: If you would prefer to pay by invoice/check or would like to pay for a group of registrants, please email with the following details: 1) full attendee name(s) and applicable registration category from the table above, and 2) payment method (credit card, invoice, wire).

Registration Fee: This fee includes course material, which will be made available to all participants both during the training and after its conclusion.

Cancellations: Cancellation notices must be received via email at least 30 days prior to the training start date in order to receive a full refund, minus a $75 administrative fee. Cancellations for which notice is received via email 14-29 days prior to the training are eligible for a 75% refund, minus a $75 administrative fee. Please send your cancellation notice by email. Due to workshop capacity and preparation, we regret that we are unable to refund registration fees for cancellations made fewer than 14 days prior to the training.

If you are unable to attend the training, we encourage you to send a substitute within the same registration category. Please inform us of the substitute via email at least one week prior to the training so that we can include them on attendee communications, updated registration forms, and materials. Should the substitute fall within a different registration category, your credit card will be credited or charged accordingly. Please send substitute inquiries by email. In the event Columbia must cancel the event, your registration fee will be fully refunded.

Additional Information

The Python Data Wrangling Boot Camp is hosted by the Columbia Mailman School of Public Health's SHARP Program.
