
Biostatistician Designs AI Tools for Early Diagnosis and Treatment of Alzheimer’s and Other Complex Diseases
As Alzheimer’s disease begins to disrupt neurocognitive function, the symptoms can be so subtle that even family and loved ones struggle to discern the changes, delaying diagnosis even as damage accumulates. The challenge is greater still for clinicians and scientists seeking signs of the earliest biochemical disruptions associated with Alzheimer’s in their quest for early diagnosis and effective treatment.
Biostatistician Zhonghua Liu, ScD, a Columbia Mailman School assistant professor of biostatistics since 2022, sees a chance to turn the tables on Alzheimer’s—as well as myriad other complex, neurodegenerative diseases—by developing novel computational tools that comb through genetic biobanks and tease apart cause and effect to reveal protein biomarkers characteristic of early disease.
Liu’s first such tool, dubbed MR-SPI (Mendelian Randomization by Selecting genetic instruments and Post-selection Inference), triangulates among data on genetic variants, structural and functional changes in proteins associated with Alzheimer’s, and disease outcomes to highlight potential targets for therapeutic intervention. In December 2024, Liu and colleagues at Columbia reported in Cell Genomics that they had identified several key proteins that exhibit structural alterations linked to Alzheimer’s risk. Even more exciting, they found multiple FDA-approved drugs that target those proteins. “The existing drugs can be repurposed to treat Alzheimer’s disease,” says Liu, who has since been contacted by multiple pharmaceutical representatives curious to learn more about his methods. “We still need randomized clinical trials, but this work provides promising targets and opportunities.”
You’ve aspired to work as a statistician since before you started college. How has the field changed in the past two decades?
Liu: I think of myself as part of a new generation of biostatisticians in the era of artificial intelligence that combines knowledge of traditional statistics, machine learning, and AI, along with biomedicine—including epidemiology, genomics, and public health.
After finishing your doctorate in biostatistics and epidemiology at Harvard in 2015, you worked as an unpaid visiting scholar at Columbia Mailman and then as a quantitative strategist at Morgan Stanley. Why?
Liu: I was living in New York City with my wife, who was a PhD student at Columbia Business School, while I was working remotely as a postdoc at Harvard. I began collaborating with faculty here on a project using machine learning methods to analyze genetic variants. Later, while my wife was still pursuing her PhD, we were expecting our daughter. I’ve always been drawn to quantitative problem-solving, and I wanted to stay close to family in New York, so I joined Morgan Stanley to explore how statistical and computational methods are applied in finance. It was a fascinating experience that exposed me to large-scale data analytics, modeling, and decision systems.
You spent four years in Hong Kong. What brought you back to Columbia Mailman?
Liu: Emotionally, mentally, I’m tied to the Columbia community. I even kept my expired Columbia ID as a souvenir when I joined the faculty at the University of Hong Kong. During a 2021 visit to Hong Kong, Professor of Biostatistics Ian McKeague mentioned to me that the department was hiring. Columbia was actually the only place I applied.
Your doctoral dissertation focused on metabolomics—molecular markers related to obesity. Now you work on the cascade of biological processes that give rise to Alzheimer’s and even cancer. What ties the work together?
Liu: My dissertation focused on metabolomics and developing statistical methods for analyzing multiple correlated traits like sleep patterns, blood tests, and insulin resistance. I later developed methods to disentangle direct and indirect effects of an environmental exposure on disease outcomes, possibly mediated by epigenetics. The COVID pandemic deepened my appreciation for the power of data-driven methods and strengthened my commitment to applying statistics and AI to improve human health.
Artificial intelligence gets a lot of hype. In your Cell Genomics paper, you used AlphaFold3, an AI tool that predicts the structure of proteins. What’s your take on AI?
Liu: Artificial intelligence has opened a new era in structural biology and deepened our understanding of life at the molecular level. AI tools like AlphaFold3 can now learn directly from millions of protein sequences and structures to make computer-based predictions of three-dimensional structures that are nearly as precise as results from real laboratory experiments. Currently, my team is developing the first AI-driven pipeline that integrates human whole-exome sequencing data with protein 3D structural modeling to investigate causal pathways leading to Alzheimer’s disease. We also use generative AI to design therapeutic proteins that specifically bind to Alzheimer’s disease–associated targets.
You’ve developed and released multiple software packages that implement your statistical and machine learning methods and posted them online as open source, so other investigators can use them, too. What do you get from this kind of collaboration?
Liu: Open-source development keeps my work transparent, reproducible, and impactful. By releasing my software publicly, I’ve seen how other researchers adapt and extend these methods to address scientific problems I hadn’t originally envisioned. That has led to new methodological challenges, formal partnerships, and interdisciplinary projects, creating a continuous feedback loop between methodology and application that I find deeply fulfilling as both a statistician and a scientist.
