Spring 2026 Departmental Seminars & Lectures

During the Fall and Spring semesters, the Department of Biostatistics holds regular seminars on Thursdays, called the Levin Lecture Series, on a wide variety of topics of interest to both students and faculty. Each semester there are also guest lectures outside the regular Thursday Levin Lecture Series, providing a robust schedule that covers the wide range of topics in Biostatistics. The speakers are invited guests who spend the day of their seminar discussing their research with Biostatistics faculty and students. All Levin Lectures will be hosted over Zoom with the following credentials: Meeting ID: 913 0905 0869; Passcode: 556019.


Spring 2026 Schedule

Thursday, February 12th, Zoom
Levin Lecture 

Liangyuan Hu, PhD
Associate Professor of Biostatistics and Epidemiology, Rutgers School of Public Health

Bayesian Machine Learning for Causal Inference and Real-World Evidence

Abstract: 

In this talk, I will present a suite of Bayesian machine learning methods that strengthen causal inference with complex real-world data. First, I introduce riAFT-BART, a random-intercept accelerated failure time model that uses Bayesian additive regression trees to estimate causal effects of multiple treatments on clustered time-to-event outcomes. The approach flexibly captures nonlinear covariate effects and heterogeneous treatment responses in hierarchical data. I pair this model with a new Bayesian sensitivity analysis that quantifies how unmeasured confounding could alter posterior causal conclusions. Second, for longitudinal observational studies with time-to-event outcomes, I develop an alternative survival g-formula that embeds BART within the evolving generative components to reduce bias from model misspecification. Focusing on binary time-varying treatments, I propose a class of discrete-time survival g-formulas that incorporate longitudinal balancing scores for both static and dynamic treatment strategies, along with posterior sampling algorithms for inference. I also present a loss-based Bayesian sensitivity analysis that propagates uncertainty while assessing departures from the no unmeasured time-varying confounding assumption. I illustrate these methods in two applications: (i) using the National Cancer Database to compare three treatment strategies for high-risk localized prostate cancer with riAFT-BART and its sensitivity framework, and (ii) applying the new survival g-formula to electronic health record data from the Yale New Haven Health System.
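
As a rough schematic (the notation here is illustrative, not the speaker's exact specification), a random-intercept accelerated failure time model with a BART regression surface can be written as:

    % Schematic riAFT-BART form; notation illustrative only
    \[
      \log T_{ij} = f(A_{ij}, \mathbf{X}_{ij}) + b_i + \epsilon_{ij},
      \qquad b_i \sim N(0, \tau^2),
    \]
    % where T_{ij} is the event time for subject j in cluster i, A_{ij} the treatment,
    % X_{ij} the covariates, and b_i a cluster-level random intercept; the regression
    % surface f(.) receives a BART prior, so nonlinear covariate effects and
    % heterogeneous treatment responses are learned from the data.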


Thursday, February 19th, Hess Commons
Levin Lecture

Xinyi Li, PhD
Assistant Professor, Mathematical and Statistical Sciences, Clemson University

From Functional PCA to Precision Medicine: Regression Inference and Individualized Treatment Rules with Imaging Features

Abstract: 

Modern studies increasingly pair clinical features with high-dimensional imaging, where each scan can be viewed as a function living in a Hilbert space. This talk introduces a unified approach that incorporates imaging data as interpretable features via functional principal component analysis (FPCA). First, we discuss a framework for linear regression with Hilbert-space-valued covariates that provides asymptotically normal inference and bootstrap uncertainty quantification, explicitly accounting for the fact that FPCA bases are estimated from data. Second, we use the proposed multi-dimensional FPCA features from imaging to estimate individualized treatment regimes under standard causal assumptions, enabling treatment decisions informed by patient-specific imaging patterns along with risk factors. The proposed methods are applied to Alzheimer's Disease Neuroimaging Initiative (ADNI) data, where PET scans and genetic and demographic covariates are used to model cognitive outcomes and guide personalized treatment strategies.
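
For readers unfamiliar with the FPCA-scores-as-features idea, the Python sketch below illustrates it under simplifying assumptions (curves observed on a common dense grid, synthetic placeholder data, an ordinary linear model downstream); it does not reproduce the talk's inferential corrections for the estimated basis.

    # Minimal sketch: extract FPCA scores from curves on a common grid and use them
    # as scalar features in a regression (illustrative only; synthetic data).
    import numpy as np
    from sklearn.linear_model import LinearRegression

    def fpca_scores(curves, n_components=3):
        """curves: (n_subjects, n_gridpoints) array of functional observations."""
        centered = curves - curves.mean(axis=0)          # subtract the mean function
        U, s, Vt = np.linalg.svd(centered, full_matrices=False)
        scores = U[:, :n_components] * s[:n_components]  # subject-level FPC scores (up to a grid constant)
        eigenfunctions = Vt[:n_components]               # estimated principal component functions
        return scores, eigenfunctions

    rng = np.random.default_rng(0)
    curves = rng.normal(size=(100, 50))                  # placeholder imaging "curves"
    clinical = rng.normal(size=(100, 2))                 # placeholder scalar covariates
    outcome = rng.normal(size=100)                       # placeholder cognitive outcome

    scores, _ = fpca_scores(curves, n_components=3)
    X = np.column_stack([clinical, scores])              # combine scalar and FPCA features
    model = LinearRegression().fit(X, outcome)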


Thursday, February 26th, Zoom
Levin Lecture 

Nancy R. Zhang, PhD
Ge Li and Ning Zhao Professor of Statistics and Data Science, The Wharton School, University of Pennsylvania

Data Integration in Spatial and Single Cell Omics: What Is Erased, and Can You Recover It?

Abstract: In single-cell and spatial biology, data integration refers to the alignment of cells across samples and modalities and is a ubiquitous challenge affecting all downstream analyses. The goal of cell integration is to identify cells across data sets that share the same biological state, a state that may be obscured by technical differences.

In this talk, I will cast the cell integration problem on a continuum from weak to strong linkage, depending on the strength of feature sharing between experiments. First, I will examine integration across data modalities with weak linkage. This arises when there are few shared features between the data being integrated, for example, between single-cell RNA sequencing data and spatial proteomics data. For this, I will present MaxFuse, a method that leverages higher-order relationships among all features, including unshared features, to achieve accurate integration. Next, we consider the scenario of data alignment within the same modality in clinical-scale studies. For this setting, I will show that existing paradigms are overly aggressive, erasing disease and treatment effects and introducing severe data distortion. I will introduce a "pool-of-controls" experimental design concept to disentangle biological variation from unwanted variation. Based on this, I will describe CellANOVA, a novel statistical model and scalable algorithm that recovers biological signals lost during batch integration and corrects integration-related data distortion. Through these two contrasting paradigms, I will share the key lessons learned and the remaining challenges in this field.


Thursday, March 5th, Zoom
Levin Lecture 

Ji-Hyun Lee, DrPH
Professor, Department of Biostatistics, University of Florida

Conversations with a Collaborative Biostatistician: The Quiet Power of Everyday Statisticians

Abstract: 

In a world that prizes headline breakthroughs, the steady, collaborative work of everyday statisticians often flies under the radar. In this interview‑style seminar, I will share a series of candid conversations that illustrate how routine statistical thinking, cross‑disciplinary teamwork, and inclusive leadership drive scientific discovery and improve patient outcomes. I’ll also highlight the initiatives I launched as ASA’s 2025 President to broaden community engagement, advance data‑science practice, and spotlight the tangible impact that ordinary experts have on meaningful research. Join me to uncover the quiet power that underpins modern science.


Thursday, March 12th, Zoom
Levin Lecture 

Vadim Zipunnikov, PhD
Professor, Department of Biostatistics, Johns Hopkins University Bloomberg School of Public Health 

Developing more sensitive endpoints by leveraging novel statistical methods for Digital Health Technologies (DHTs) data

Abstract: 

Digital Health Technologies (DHTs) are now used to continuously track physical activity and sleep in many clinical studies. These data provide exciting opportunities to develop novel, more sensitive clinical trial endpoints. There is, however, a large gap between the complexity of DHT data and the statistical methodology available to fully leverage their potential. This talk will discuss the recent development of novel DHT-centric statistical methods that can provide more sensitive endpoints by extracting and fusing information from temporal, distributional, and time-series aspects of DHT data. The talk will also emphasize why timing matters: many digital signals exhibit strong diurnal and circadian structure, and failure to account for timing can obscure clinically meaningful effects.
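
As a loose illustration only (synthetic data and hypothetical feature names, not the speaker's methods), the snippet below computes simple temporal and distributional summaries of minute-level activity counts, the kind of raw ingredients such endpoints build on.

    # Illustrative only: temporal and distributional summaries of minute-level activity.
    import numpy as np

    minutes_per_day = 24 * 60
    activity = np.random.default_rng(1).gamma(2.0, 50.0, size=(7, minutes_per_day))  # 7 days, synthetic

    hourly_profile = activity.reshape(7, 24, 60).mean(axis=(0, 2))   # diurnal (temporal) profile
    quantiles = np.quantile(activity, [0.1, 0.5, 0.9])               # distributional summary
    total_daily = activity.sum(axis=1).mean()                        # average daily activity volume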


Thursday, March 26th, Zoom
Levin Lecture 

Vanessa Didelez, PhD
Professor of Statistics, Leibniz Institute for Prevention Research and Epidemiology

A critical review of causal mediation analysis

Abstract: 

In this presentation, I will reflect on why a formal causal treatment of mediation is conceptually and technically demanding, even though it feels so natural to speak of direct and indirect effects. Standard causal mediation analysis relies on nested counterfactuals and a cross-world independence assumption to define and identify natural (in)direct effects (Andrews & Didelez, 2021). This is problematic in itself, and even more so in longitudinal or survival settings; I will explain why identification in these latter cases is essentially hopeless. An alternative interventionist view (Robins & Richardson, 2011), closely aligned in spirit with the decision-theoretic approach of Dawid (2021), leads in the longitudinal/time-to-event case to separable treatment effects (Didelez, 2019; Stensrud et al., 2022). I will examine whether and how this approach addresses the challenges. Finally, when assumptions fail, partial identification through bounds can be considered; for separable effects, these bounds are closely related to existing bounds for specific natural and path-specific effects (Breum et al., 2025).
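
For orientation, the natural effects mentioned above are conventionally defined through nested counterfactuals; the display below is the standard textbook formulation for a binary exposure and is not specific to any of the cited frameworks.

    % Natural direct and indirect effects via nested counterfactuals,
    % for exposure A, mediator M, outcome Y, and baseline covariates C
    \[
      \mathrm{NDE} = E\{Y(1, M(0))\} - E\{Y(0, M(0))\}, \qquad
      \mathrm{NIE} = E\{Y(1, M(1))\} - E\{Y(1, M(0))\},
    \]
    % identified (beyond the usual no-unmeasured-confounding conditions) only under
    % the cross-world independence assumption
    \[
      Y(a, m) \perp\!\!\!\perp M(a^{*}) \mid C \qquad \text{for } a \neq a^{*},
    \]
    % which involves counterfactuals from two incompatible interventions and is
    % therefore not testable in any experiment.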


Thursday, April 2nd, Zoom
Levin Lecture 

Yi Song, PhD
Assistant Professor, Department of Biostatistics, State University of New York (SUNY) at Buffalo

Debiasing Differentially Private Time-to-Event Data

Abstract: 

Sharing time-to-event data is essential for enabling collaborative research, designing effective interventions, and advancing patient care. However, sharing exact survival curves poses significant privacy risks. To protect individual privacy and mitigate the risk of membership inference attacks, various privacy-preserving solutions have been proposed. The differential privacy (DP) framework, in particular, offers strong and rigorous protection for data sharing. However, the noise injection required by DP can distort the probability density of the data, resulting in low utility and invalid statistical inference. In this work, we propose methods to mitigate these biases in differentially private, right-censored time-to-event data using a deconvolution framework via kernel density estimation. We provide a bias-corrected nonparametric estimator for the marginal distribution of event times and develop a corrected score approach for regression analysis to ensure valid inference under privacy-preserving noise.
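
As a minimal sketch of the deconvolution idea (ignoring censoring, with an illustrative noise scale and bandwidth; not the authors' estimator): for additive Laplace noise with scale b, the Fourier relation gives f_X = f_W − b²·f_W″, so an ordinary kernel density estimate of the noisy data can be corrected by subtracting b² times its second derivative.

    # Minimal sketch of Laplace-noise deconvolution via kernel density estimation.
    # Data, noise scale, and bandwidth are illustrative; censoring is ignored.
    import numpy as np

    def gaussian_kde_and_correction(w, grid, h, b):
        """Naive KDE of the noisy data and a Laplace-deconvolution correction."""
        u = (grid[:, None] - w[None, :]) / h
        phi = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
        f_noisy = phi.mean(axis=1) / h                      # ordinary KDE of W = X + noise
        f_second = ((u**2 - 1) * phi).mean(axis=1) / h**3   # second derivative of that KDE
        return f_noisy, f_noisy - b**2 * f_second           # corrected estimate of f_X

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=2000)               # hypothetical event times
    b = 0.5                                                 # Laplace scale from the DP mechanism
    w = x + rng.laplace(scale=b, size=x.size)               # privatized (noisy) observations

    grid = np.linspace(0, 10, 200)
    f_noisy, f_corrected = gaussian_kde_and_correction(w, grid, h=0.4, b=b)

The talk's corrected-score approach for regression under privacy noise is not shown here.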


Thursday, April 9th, Zoom
Levin Lecture 

Min Zhang, PhD
Vanke Chair Professor, Vanke School of Public Health, Tsinghua University

Modeling Time-varying Effects of Recurrent Exposures: A Time-Adapted Exponential Model to Assess Impact of Post-LVAD Bleeding on Mortality

Abstract: 


Bleeding is a common and recurrent adverse event in patients following left ventricular assist device implantation and is associated with an increased risk of mortality. Understanding its impact poses several challenges: (i) bleeding can occur at any time post-implantation; (ii) its effect may vary over time; and (iii) bleeding events often recur. However, no existing method addresses all these challenges simultaneously. In this talk, we introduce the Time-Adapted Exponential (TAE) model, which accommodates recurrent bleeding events and incorporates an exponential time-adapted term in the Cox model to characterize both the transient effect at the onset of each bleeding event and the evolving effect over time. The TAE model also provides a framework to assess whether the effect of bleeding varies over time and whether different bleeding events have homogeneous effects. We derive the asymptotic properties of the TAE estimator and evaluate its performance through simulation studies. Application to the INTERMACS database reveals a decaying pattern in mortality risk following each bleeding event. Furthermore, all bleeding events are associated with an increased risk of mortality, with the first event having a greater impact than subsequent ones.
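
One plausible schematic of such a hazard (the exact TAE specification may differ) attaches to each past bleeding event an effect that is largest at onset and decays exponentially with time since the event:

    % Schematic hazard with an exponential time-adapted term for recurrent events
    \[
      \lambda(t \mid \mathbf{X}) = \lambda_0(t)\,
      \exp\!\Big\{ \mathbf{X}^{\top}\boldsymbol{\beta}
        + \sum_{k:\,T_k \le t} \gamma_k\, e^{-\rho (t - T_k)} \Big\},
    \]
    % where T_k is the onset time of the k-th bleeding event, gamma_k its effect at
    % onset, and rho the decay rate; testing whether the gamma_k are equal corresponds
    % to asking whether bleeding events have homogeneous effects.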


Wednesday, April 15th, Zoom
Levin Lecture 

Chengchun Shi, PhD
Associate Professor of Data Science, London School of Economics and Political Science

Demystifying LLM Reasoning using U-statistics Theory

Abstract: Group relative policy optimization (GRPO), a core methodological component of DeepSeekMath and DeepSeek-R1, has emerged as a cornerstone for scaling reasoning capabilities of large language models. Despite its widespread adoption and the proliferation of follow-up works, the theoretical properties of GRPO remain less studied. This paper provides a unified framework to understand GRPO through the lens of classical U-statistics. We demonstrate that the GRPO policy gradient is inherently a U-statistic, allowing us to characterize its mean squared error (MSE), derive the finite-sample error bound and asymptotic distribution of the suboptimality gap for its learned policy. Our findings reveal that GRPO is asymptotically equivalent to an oracle policy gradient algorithm -- one with access to a value function that quantifies the goodness of its learning policy at each training iteration -- and achieves asymptotically optimal performance within a broad class of policy gradient algorithms. Furthermore, we establish a universal scaling law that offers principled guidance for selecting the optimal group size. Empirical experiments further validate our theoretical findings, demonstrating that the optimal group size is universal, and verify the oracle property of GRPO.
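
As a toy illustration of the group-relative advantage at the core of GRPO (placeholder rewards and log-probabilities, not DeepSeek's implementation), each response sampled for the same prompt is scored against its own group's mean and spread rather than a learned value function; this group-averaged structure is what the U-statistics analysis operates on.

    # Toy sketch of GRPO's group-relative advantage (illustrative values only).
    import numpy as np

    def group_relative_advantages(rewards, eps=1e-8):
        """Standardize rewards within a group of responses to the same prompt."""
        rewards = np.asarray(rewards, dtype=float)
        return (rewards - rewards.mean()) / (rewards.std() + eps)

    rewards = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]   # e.g. 1 = correct answer, 0 = incorrect
    log_probs = np.log(np.random.default_rng(0).uniform(0.1, 0.9, size=len(rewards)))  # placeholder

    advantages = group_relative_advantages(rewards)
    surrogate_objective = np.mean(advantages * log_probs)   # advantage-weighted policy gradient term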
