Fall 2025 Departmental Seminars & Lectures

These are the archived Departmental Seminars & Lectures from Fall of 2025.

During the Fall and Spring semesters, the Department of Biostatistics holds regular Thursday seminars, called the Levin Lecture Series, on a wide variety of topics of interest to both students and faculty. Each semester there are also often guest lectures outside the regular Thursday Levin Lecture Series, providing a robust schedule that covers the wide range of topics in Biostatistics. The speakers are invited guests who spend the day of their seminar discussing their research with Biostatistics faculty and students.

Many seminars this semester are held on Zoom and can be joined via the link here or with Meeting ID: 963 2560 9671 and Passcode: 698339. Links are also available on the individual talk entries.

In-person seminars will have no Zoom option.

Fall 2025 Schedule

Thursday, September 4th, Zoom, 11:45am
Levin Lecture

James Zou, PhD
Associate Professor of Biomedical Data Science and, by courtesy, of Computer Science and of Electrical Engineering
Stanford University

Computational Biology in the Age of AI Agents

Abstract:

AI agents—large language models equipped with tools and reasoning capabilities—are emerging as powerful research enablers. This talk will explore how computational biology is particularly well-positioned to benefit from rapid advances in agentic AI. I’ll first introduce the Virtual Lab—a collaborative team of AI scientist agents conducting in silico research meetings to tackle open-ended research projects. As an example application, the Virtual Lab designed new nanobody binders to recent Covid variants that we experimentally validated. Then I will present CellVoyager, a data science agent that analyzes complex genomics data to derive new insights. I will conclude by discussing limits of agents and a roadmap for human researcher-AI collaboration.

Thursday, September 11th, ARB 8th Floor Auditorium, 11:45am
Levin Lecture

Bingxin Zhao, PhD
Assistant Professor of Statistics and Data Science
Assistant Professor of Medicine, Division of Translational Medicine and Human Research (secondary appointment)
University of Pennsylvania

Resampling-based Pseudo-training in Genomic Predictions

Abstract:
In this talk, I will present a resampling-based pseudo-training framework for genomic prediction that enables model development using only summary-level data. We show that generating pseudo-training and validation statistics from summary results achieves asymptotic equivalence to conventional training while avoiding the need for individual-level datasets. Simulations and real data applications suggest that pseudo-training performs comparably to standard approaches with large datasets and substantially better when tuning data are limited. We highlight two platforms built on this framework: PennPRS (https://pennprs.org/), a cloud-based computing infrastructure supporting large-scale, no-code polygenic risk score training with purely summary data resources, and GCB-Hub (https://www.gcbhub.org/), which applies pseudo-training to proteome-wide association studies for protein-disease mapping and drug discovery. Together, these advances demonstrate how resampling-based pseudo-training methods can broaden accessibility, scalability, and impact of genomic prediction across diverse biomedical research settings.

Thursday, September 18th, Zoom, 11:45am
Levin Lecture

Jingyi Jessica Li, PhD
Professor & Program Head, Biostatistics Program; Donald and Janet K. Guthrie Endowed Chair in Statistics, Public Health Sciences Division; Fred Hutch Cancer Center Affiliate Professor Biostatistics
University of Washington

Nullstrap: A Simple, High-Power, and Fast Framework for FDR Control in Variable Selection for Diverse High-Dimensional Models

Abstract:
Balancing false discovery rate (FDR) control with high statistical power is a central challenge in high-dimensional variable selection. Existing methods often degrade data through knockoffs or splitting, leading to power loss. We propose Nullstrap, a framework that controls FDR without altering the original data. Nullstrap generates synthetic null data by fitting a null model under the null hypothesis and applies the same estimation to both original and synthetic datasets. This parallel structure resembles the likelihood ratio test, serving as its numerical analog. A data-driven correction procedure adjusts null estimates, enabling variable selection with theoretical guarantees: asymptotic FDR control at any desired level and power converging to one. Nullstrap is fast, stable, and broadly applicable across linear, generalized linear, Cox, and graphical models. Simulations indicate that Nullstrap maintains robust FDR control and outperforms the knockoff filter and data splitting in power (0.95 vs. 0.50 and 0.70) and efficiency (≈ 30×). While all three methods are randomized, Nullstrap is more stable (Jaccard 0.98 vs. 0 and 0.42). In a triple-omics time-to-labor dataset, the knockoff filter and data splitting fail to identify variables in most of 70 runs with different random seeds, whereas Nullstrap consistently selects predictors, achieves > 90% predictive accuracy, and is three orders of magnitude faster.

Thursday, September 25th, Zoom, 11:45am
Levin Lecture

Brian Caffo, PhD, MS
Professor, Department of Biostatistics
Johns Hopkins University Bloomberg School of Public Health

Does AI need to be artificial? Does it need to be intelligent?

In this talk we consider the fascinating possibility of organoid intelligence (OI). Organoid intelligence uses human-derived pluripotent stem cells to create neural clusters with measurable neuronal activity. We discuss a recent JHU effort in OI from a team of neuroscientists, engineers, signal processors and statisticians. The goal is to use organoids to perform complex computing tasks through stimulation and response. Measurement is obtained from an electrode shell custom-designed for three-dimensional measurements. Apart from OI, organoid electrophysiology experiments are useful for studying human genetic disorders and toxicity through another phenotype in vitro. Here, we will focus on the statistical challenges underlying this novel form of measurement. Time permitting, we will discuss other efforts in biocomputing.

Thursday, October 2nd, Zoom, 11:45am
Levin Lecture

Natalie Dean, PhD
Associate Professor, Department of Biostatistics and Bioinformatics, Department of Epidemiology
Emory University Rollins School of Public Health

Challenges in Estimating Vaccine Effectiveness Against Progression to Severe Disease

Abstract:
Vaccines can reduce an individual’s risk of infection and their risk of progression to disease given infection. The latter effect is less commonly estimated but is relevant for risk communication and vaccine impact modeling. Using a motivating example from the COVID-19 literature, we note how vaccine effectiveness against progression can appear to increase over time in settings where true biological strengthening is unlikely. We use mathematical modeling to demonstrate how this phenomenon can occur when there is an underlying vulnerable subpopulation with poor vaccine response against infection and progression. As a result, the earliest infections are among those with the weakest protection against disease. We describe a modeling framework to link underlying immunology and post-vaccination outcomes that we use to further examine this problem. This work highlights methodological challenges in isolating a vaccine’s effect on progression to severe disease after infection.

Thursday, October 9th, Zoom, 11:45am
Levin Lecture

KC Gary Chan, PhD
Professor, Health Services; Professor, Biostatistics
University of Washington School of Public Health

Robust and efficient semiparametric inference for the stepped wedge design

Abstract:
Stepped wedge designs (SWDs) are increasingly used to evaluate longitudinal cluster-level interventions but pose substantial challenges for valid inference. Because crossover times are randomized, intervention effects are intrinsically confounded with secular time trends, while heterogeneous cluster effects, complex correlation structures, baseline covariate imbalances, and unreliable standard errors from few clusters further complicate statistical inference. We propose a unified semiparametric framework for estimating possibly time-varying intervention effects in SWDs that directly addresses these issues. A nonstandard development of semiparametric efficiency theory is required to accommodate correlated observations within clusters, non-identically distributed outcomes across clusters due to varying cluster-period sizes, and weakly dependent treatment assignments that are hallmarks of SWDs. The resulting estimator of treatment contrast is consistent and asymptotically normal even under misspecification of the covariance structure and control cluster-period means, and achieves the semiparametric efficiency bound when both are correctly specified. To facilitate inference for trials with few clusters, we introduce a permutation-based procedure to better capture finite-sample variability and a leave-one-out correction to mitigate plug-in bias. We further discuss how effect modification can be naturally incorporated, and imbalanced precision variables can be accommodated via a simple adjustment closely related to post-stratification, a novel connection of independent interest. Simulations and application to a public health trial demonstrate the robustness and efficiency of the proposed method relative to standard approaches.

Thursday, October 16th, Hess Commons, 11:45am
Levin Lecture

Andrew An Chen, PhD
Assistant Professor, Department of Public Health Sciences
Medical University of South Carolina

Methodological Considerations in Applying Brain Charts to New Samples

Abstract:

Multi-site national and international imaging consortia have formed with the goal of precisely characterizing the human brain across the lifespan. These consortia have succeeded in collecting large samples of brain magnetic resonance imaging (MRI) scans to estimate sex-specific trajectories of brain phenotypes across age, often called brain charts. The promise of brain charts is that future researchers and clinicians will be able to assess a new scan for deviations from this healthy trajectory. However, the implementation of these charts in practice is severely limited by differences across study sites, also known as site effects. Here, we first discuss several projects in harmonization of MRI data specifically tailored to this normative modeling setting. Then, we leverage advancements in model uncertainty quantification to propose new ways to calibrate brain charts, as an alternative to harmonizing data. Finally, we apply our approaches to the Lifespan Brain Chart Consortium (LBCC) to assess generalizability to new scans from both healthy individuals and Alzheimer's disease (AD) patients. Based on our findings, we provide methodological recommendations for applying fitted brain charts to new sites.

Thursday, October 23rd, Zoom, 11:45am
Levin Lecture

GuanNan Wang, PhD
Assistant Professor, Mathematics
College of William & Mary

Boosting Biomedical Imaging Analysis via Distributed Functional Regression and Synthetic Surrogates

Abstract:
Understanding how scalar covariates influence spatial patterns in medical imaging data, such as neuroimaging or organ-level functional images, is a central challenge in modern biomedical research. The rapid expansion of large-scale imaging studies has heightened the need for statistical frameworks that are both interpretable and computationally scalable. In this talk, I will introduce a new class of domain-aware functional regression models, where spatially varying coefficients link scalar predictors to imaging responses defined over complex 3D domains. Our Distributed Image-on-Scalar Regression framework employs a triangulation-based domain decomposition strategy, enabling efficient parallel estimation with trivariate penalized splines. This design preserves global spatial structure while flexibly accommodating subregion-specific heterogeneity. To address additional challenges posed by incomplete or noisy imaging data, I will also discuss the use of synthetic surrogates generated with modern AI tools. Rather than imputing missing values directly, these synthetic surrogates can serve as auxiliary data that can be jointly analyzed with observed images, improving efficiency while maintaining robustness to imputation error. Together, these advances pave the way for scalable, uncertainty-aware statistical analysis of high-dimensional biomedical imaging.

Thursday, October 30th, Zoom, 9:00am
Levin Lecture

Bibhas Chakraborty, PhD
Associate Professor, Centre for Quantitative Medicine; Interim Director, Centre for Quantitative Medicine
Duke-NUS Medical School

Innovative Trial Designs in Mobile Health Using Reinforcement Learning

Abstract:

Mobile health (mHealth) interventions (e.g., motivational text-messages or nudges to promote healthy behaviors) are becoming increasingly common in tandem with advances in mobile and wearable sensor technologies. In this talk, we will discuss an innovative trial design arising in mHealth, namely, the micro-randomized trial (MRT) that involves sequential, within-person randomization over many instances. The basic MRT design can be further improved to make it adaptive, thereby enabling it to learn from accumulated data as the trial progresses. This is appealing from an ethical perspective since the adaptive learning tends to make better interventions available to the trial participants. Adaptive learning in such trial designs is often operationalized via Reinforcement Learning algorithms. Specifically, we will discuss the role of a particular algorithm called Thompson sampling in designing adaptive MRTs. Theoretical as well as simulation results will be shown to validate the proposed approach. An mHealth clinical trial will be discussed in detail as a case study.

Thursday, November 6th, Hess Commons, 11:45am
Levin Lecture

Weng Kee Wong, PhD
Professor of Biostatistics
University of California Los Angeles Fielding School of Public Health

Nature-inspired Metaheuristics as a General-Purpose Optimization Tool in Statistical Research

Metaheuristics have been widely used in engineering and computer science to tackle various types of optimization problems and are now increasingly used across disciplines. In particular, nature-inspired metaheuristic algorithms are increasingly popular in industry and AI for solving all kinds of complex and high-dimensional optimization problems. Interestingly, these algorithms seem to be still relatively underused in the statistical research community. I present an overview of nature-inspired metaheuristics and some of their applications in statistics. The main appealing features of these algorithms are their speed, flexibility, availability of codes in different platforms, and ease of implementation and usage. Above all, they are virtually assumptions free, which allows us to apply them to solve a huge range of challenging optimization tasks, including cases when the objective function or functions are non-differentiable or not explicitly specified. I will discuss recent applications to find challenging optimal designs for non-linear models commonly used in the biomedical sciences, how nature-inspired metaheuristics can design more efficient early phase clinical trials, and their recent applications to tackle non-design optimization problems in statistics.

Thursday, November 13th, Zoom, 11:45am
Levin Lecture

Lina Montoya, PhD
Assistant Professor, Department of Biostatistics; Assistant Professor, School of Data Science and Society
University of North Carolina Gillings School of Global Public Health

Effects among the affected

In this talk, she will discuss a causal estimand that elucidates how response to an earlier treatment (e.g., treatment initiation) modifies the effect of a later treatment (e.g., treatment discontinuation), thus learning if there are effects among the (un)affected. Specifically, she considers a working marginal structural model summarizing how the average effect of a later treatment varies as a function of the (estimated) conditional average effect of an earlier treatment. She defines the estimand to be a data-adaptive causal parameter, allowing for estimation of the conditional average treatment effect using machine learning without making strong smoothness assumptions. She shows how a sequentially randomized design can be used to identify this causal estimand, and she describes a targeted maximum likelihood estimator for the resulting statistical estimand, with influence curve-based inference. Throughout, she uses the “Adaptive Strategies for Preventing and Treating Lapses of Retention in HIV Care” trial (NCT02338739) as an illustrative example, showing that discontinuation of conditional cash transfers for HIV care adherence was most harmful among those who initially benefited most from them.

Thursday, November 20th, Zoom, 11:45am
Levin Lecture

Didong Li, PhD
Assistant Professor, Department of Biostatistics
University of North Carolina Gillings School of Global Public Health

Statistics in the Age of AI: Theory, Methods, and Data

Artificial Intelligence (AI) has surged in popularity, creating both opportunities and challenges for statistics. In this talk, I will present three recent directions from my lab that reflect our efforts to engage with the age of AI. First, I will discuss theoretical results for decoder-based generative models, providing statistical foundations that connect latent dimension, approximation error, and model complexity. Second, I will discuss a method to use embeddings from large language models to enhance high-dimensional hypothesis testing, a widely used statistical tool in scientific domains, motivated by problems in cancer genomics where traditional methods are underpowered. I will also discuss extensions to genetic studies, where we curated annotations for 8.9 billion genetic variants from the human genome, and obtained embeddings of these 8.9 billion variants for downstream tasks such as GWAS and phenotype prediction. Finally, I will switch to an infrastructural view, introducing STimage-1K4M, one of the first and largest publicly available spatial transcriptomics datasets curated by my group, consisting of 1,149 slides and more than 4 million pathology image–gene expression pairs across 10 species and 50 tissue types. This resource has been downloaded over 192,000 times on HuggingFace and has facilitated the training of multiple foundation models. Together, these examples illustrate how theory, methodology, and data curation advance both statistics and AI.

Thursday, December 4th, Zoom, 11:45am
Levin Lecture

Mingyao Li, PhD
Professor of Biostatistics in Biostatistics and Epidemiology
University of Pennsylvania Perelman School of Medicine

AI-powered tissue maps: uniting spatial omics and pathology imaging

Spatial omics technologies have revolutionized biomedical research by providing high-resolution, spatially resolved molecular profiles that offer unprecedented insights into tissue structure and function. However, their widespread application is hindered by high costs, long turnaround times, and limited tissue coverage. In contrast, most spatial omics platforms also generate high-resolution histology images from the same tissue slices. Histopathology remains the clinical gold standard for disease diagnosis, widely used in practice due to its cost-effectiveness and rapid processing. In this talk, I will introduce several recently developed tools designed to bridge the gap between spatial omics and histology, leveraging their complementary strengths. These innovations aim to enhance the affordability and scalability of spatial omics, expanding its accessibility for both biomedical research and clinical applications.

Thursday, December 11th, Hess Commons, 11:45am
Levin Lecture

Falco J. Bargagli Stoffi, PhD
Assistant Professor, Department of Biostatistics
University of California Los Angeles Fielding School of Public Health

Stable Discovery of Treatment Effect Modifiers

Identifying covariates that modify treatment effects is a critical problem in causal inference. Yet existing data-adaptive methods lack rigorous error control, risking spurious findings that fail to replicate. We propose a method combining pseudo-outcomes with a novel cross-fitted stability selection algorithm to achieve finite-sample false discovery control for effect modifiers. We prove that selection probabilities are asymptotically unbiased, converging to oracle probabilities at parametric rate under doubly robust pseudo-outcome estimation. False discovery is controlled at the nominal level while maintaining power to detect genuine heterogeneity. We demonstrate the method on simulations and real-world data.
