A collage of photos of speakers and audience members. Photos: Lucas Hoeffel

Data Science Summit Spotlights the Quantitative Cutting Edge

January 18, 2024

The third Data Science in Public Health Summit at Columbia University Mailman School of Public Health showcased the innovative ways that data scientists are using emerging quantitative techniques to address public health challenges, from climate change to cancer to the health risks of beauty products and more. The January 11 event also considered challenges and opportunities in the fast-moving field. (Watch a video of the event below.)

The Summit took place on the heels of a potentially game-changing announcement from New York Governor Kathy Hochul. On January 9, the governor proposed the creation of Empire AI, a partnership between New York State, private benefactors, and universities—including Columbia—to create a powerful AI infrastructure. Speaking at the Summit, Jeanette Wing, Executive Vice President for Research and professor of computer science at Columbia, said the facility would bring “access to computational resources that no one university can afford.” She said she was “incredibly proud” that Columbia University is a founding partner of the planned initiative.

In opening remarks, Gary Miller, Vice Dean for Research Strategy and Innovation and professor of environmental health sciences, said Columbia Mailman, already at the vanguard of data science in public health science and education, is positioning itself to go further and take advantage of potential opportunities like Empire AI. This past fall, Jeff Goldsmith, associate professor of biostatistics, was named Associate Dean of Data Science. He is leading the School’s strategy in data science in partnership with Dean Linda P. Fried; Gary Miller; Kiros Berhane, chair of Biostatistics; and others.

In a keynote address, Rafael Irizarry, professor of biostatistics at the Harvard T.H. Chan School of Public Health, conceptualized data science in public health as a collaborative enterprise bringing together specialists from biostatistics, computer science, computational biology, systems biology, biomedical informatics, and beyond. Likewise, wrangling, analyzing, managing, and storing data are all important skills in data science, but mastery of them does not adequately describe any one practitioner. “I don’t like using the term data scientist,” Irizarry said, arguing that data science should be understood as a collective effort that leverages wide-ranging skills and expertise.

Throughout the day, researchers from Columbia Mailman and the broader Columbia community highlighted innovations in data science, from collection to analysis. Many involved the interpretation of visual signals.

Ami Zota, associate professor of environmental health sciences, is using technology to identify beauty product ingredients from photos submitted by study participants—one part of a larger project to expand the quantity and quality of data on the health risks of these products. John Wright, associate professor of electrical engineering, is studying methods to process visual data from videos—an approach with potential applications in the analysis of environmental chemical mixtures, in collaboration with Columbia Mailman’s Marianthi-Anna Kioumourtzoglou, associate professor in environmental health sciences.

At the intersection of climate and health, Robbie Parks, assistant professor of environmental health sciences, is working with Columbia computer scientist Utkarsh Mall and others to analyze images in Google Street View to improve the identification of flooding risk during hurricanes. Xiao Wu, assistant professor of biostatistics, is using satellite data to evaluate the effectiveness of low-intensity fires to mitigate severe wildfires—literally, fighting fire with fire.

In the area of clinical imaging analysis, Mary Beth Terry, professor of epidemiology, is using AI deep learning to evaluate mammograms. Separately, Despina Kontos, professor of radiology, is employing her own methods to analyze breast imaging data. Seonjoo Lee, associate professor of biostatistics in psychiatry, analyzes brain imaging data to quantify an individual’s relative susceptibility to Alzheimer’s disease and other age-related cognitive challenges.

Other researchers presented on challenges in data science and how to overcome them.

Daniel Malinsky, assistant professor of biostatistics, said algorithmic bias related to issues like race reflects the “unfair reality behind the data.” Bias can be corrected using statistical techniques, he said—but only if researchers are willing to engage with ethical questions. Sorcha Brophy, assistant professor of health policy and management, highlighted the need for upgrades to data infrastructure in federally supported health centers that serve low-income communities. Sen Pei, an assistant professor of environmental health sciences who builds predictive models for infectious disease, is overcoming the scarcity of disease surveillance data by drawing from social media, cellular data, and other sources.

In remarks to attendees, Dean Fried said data science and new tools like AI and machine learning are “indispensable to our mission” to advance a healthy and just world. She said the School is “investing to accelerate the adoption and advancement of leading-edge tools for research and practice, from big data and algorithms to computational infrastructure,” under an ethical framework, and in collaboration with colleagues with varied expertise. Advances in data science, she said, “are a reminder that change is and will be constant and we have to be inspired continuously to advance health in new and innovative ways.”

2024 Data Science for Public Health Summit Keynote: The Bright Future of Applied Statistics