Sep. 05 2019

Data Science Initiative Forges New Connections

Although obscured in the cloud, data science undergirds every corner of our digital lives: every Google search, Amazon purchase, and Uber ride arrives thanks to an ever-more-powerful set of quantitative techniques, including those described as artificial intelligence. Lately, Columbia Mailman scientists have been harnessing these cutting-edge tools to answer research questions beyond the reach of traditional statistical methods, for instance by adapting techniques used in modern weather forecasting to predict seasonal outbreaks of influenza.

Now a new initiative aims to further elevate the School’s data science game in partnership with the Columbia University Data Science Institute (DSI), a university-wide effort to support the gathering and interpreting of data for social good. Beginning this fall, post-doctoral fellows of Columbia Mailman School of Public Health will be eligible for the program that provides training in areas such as machine learning—a branch of artificial intelligence—and network science, which could be used to shed light on intricate systems like the human microbiome.

“Data science gives us new tools to analyze complex challenges like climate change that present multiple, overlapping threats to human health,” says Gary Miller, the School’s Vice Dean for Research Strategy and Innovation, who laid the groundwork for the DSI partnership. “Every area of public health can benefit from these approaches.”

Before they learn how to “train” a computer model to look for patterns and connections in the data, fellows are steeped in the ethics of data science: how to use “data for good,” in the words of Jeanette M. Wing, Avanessians Director of the Data Science Institute and Professor of Computer Science. The responsible use of analytical tools is ever more important when the computer model affects our health, says Miller, a professor of Environmental Health Sciences. “We need to take care to uncover hidden biases that could exacerbate disparities [in health outcomes, such as between ethnic groups] if the system was modeled incorrectly.”

Research collaborations through the DSI are already underway. Kai Ruggeri in Health Policy and Management and Marianthi-Anna Kioumourtzoglou in Environmental Health Sciences are working with John Paisley at Columbia Engineering to use Bayesian machine learning techniques to understand why patients in New York City miss appointments and what can be done to help them keep those appointments.

Elsewhere at Columbia Mailman, there are other signs of accelerating interest in data science. A two-semester course in data science offered through Biostatistics is among the most popular electives for master’s students; other departments have started similar classes. Miller expects that training in advanced quantitative methods will soon be available schoolwide. “Each year, more students arrive here having some familiarity with data science,” he says, “and all of them want to learn more.”