Levin Lecture Series Colloquium Seminars

Lectures are in-person only unless marked otherwise. To ask about attending a seminar virtually via Zoom, please email Erin Elliott, Programs Coordinator (ee2548@cumc.columbia.edu).

During the Fall and Spring semesters, the Department of Biostatistics holds seminars, called the Levin Lecture Series, on a wide variety of topics of interest to both students and faculty. The speakers are occasionally departmental faculty members, but more often they are invited guests who spend the day of their seminar discussing their research with Biostatistics faculty and students.

Spring 2024 Levin Lectures

January 31st, 8th Floor Auditorium, 11:45am

(Note: This lecture will take place on a Wednesday)

Kathryn Roeder, PhD
UPMC Professor of Statistics and Life Sciences
Carnegie Mellon University Dietrich College of Humanities and Social Sciences

Methods for removing the effect of unmeasured confounders in high throughput screens, such as CRISPR and single-cell differential expression

Abstract: In the setting of bulk tissue, confounding is known to be important when testing for differential expression (DE). The confounding is thought to arise from unmeasured covariates, which correlate with the primary variable of interest. Multiple testing procedures can be severely biased in this setting, leading to a plethora of false discoveries. In the past decade, many statistical methods have been proposed to adjust for confounders, but these methods were developed prior to the advent of single-cell transcriptome, proteome and epigenome readouts. Building on the newest results in the statistical literature, we have developed new methods that are better suited to these current data types. We also extend these methods toward more flexible modeling assumptions. Simulations and real data analyses show this approach performs well for DE testing and the analysis of CRISPR screens.


February 8th, 8th Floor Auditorium, 11:45am

Denise Esserman, PhD
Professor of Biostatistics
Yale School of Public Health

Talk Title & Abstract TBA

February 15th, 8th Floor Auditorium, 11:45am

Jason Roy, PhD
Professor and Chair, Department of Biostatistics and Epidemiology
Rutgers School of Public Health

Bayesian Semiparametric Model for Sequential Treatment Decisions with Informative Timing

Abstract: We develop a Bayesian semi-parametric model for estimating the impact of dynamic treatment rules on survival among patients diagnosed with pediatric acute myeloid leukemia (AML). The data consist of a subset of patients enrolled in the phase III AAML1031 clinical trial in which patients move through a sequence of four treatment courses. At each course, they undergo treatment that may or may not include anthracyclines (ACT). While ACT is known to be effective at treating AML, it is also cardiotoxic and can lead to early death for some patients. Our task is to estimate the potential survival probability under hypothetical dynamic ACT treatment strategies, but there are several impediments. First, since ACT was not randomized in the trial, its effect on survival is confounded over time. Second, subjects initiate the next course depending on when they recover from the previous course, making timing potentially informative of subsequent treatment and survival. Third, patients may die or drop out before ever completing the full treatment sequence. We develop a generative Bayesian semi-parametric model based on Gamma Process priors to address these complexities. At each treatment course, the model captures subjects' transition to subsequent treatment or death in continuous time under a given rule. A g-computation procedure is used to compute a posterior over potential survival probability that is adjusted for time-varying confounding. Using this approach, we conduct posterior inference for the efficacy of hypothetical treatment rules that dynamically modify ACT based on evolving cardiac function.

February 22nd, 8th Floor Auditorium, 11:45am

Zijun Gao, PhD
Assistant Professor of Data Sciences and Operations
University of Southern California Marshall School of Business

Selective randomization inference for adaptive studies

Abstract: Many clinical trials are structured with multiple stages, where data analysis is conducted after each stage to inform subsequent participant recruitment and treatment allocation. This adaptive approach allows for early elimination of ineffective treatments or targeted recruitment of subpopulations showing potential benefits. Analyzing such trials presents challenges as the data is utilized twice: first for selecting the design and null hypothesis, and then for testing the chosen hypothesis using the data generated under the selected design. Classical statistical methods are inadequate as they require pre-specified data generating mechanisms and null hypotheses. Existing solutions are often limited in scope, tailored to specific designs. In this work, we propose a general framework capable of handling diverse designs and adaptive choices. Our approach leverages post-selection inference principles to develop a selective randomization p-value. Notably, it does not necessitate assumptions about the distribution of outcomes or covariates, or the dependency structure among participants. We demonstrate that our method enhances statistical power compared to other valid tests while maintaining control over the selective type-I error in simulated data and hypothetical clinical trials.

February 29th, 8th Floor Auditorium, 11:45am

Feifang Hu, PhD
Professor of Statistics
George Washington University Columbian College of Arts & Sciences

New Covariate-Adaptive Randomization Procedures and Their Properties

Abstract: Ensuring balanced covariates is crucial in successful comparative studies exploring causal effects, like causal inference, online A/B testing, and clinical trials. Despite relying on randomized experiments, chance imbalances persist, exacerbated by the era of big data. While existing literature mainly tackles discrete covariate balance, the use of covariate-adaptive randomization (CAR) for continuous covariates is limited, especially when aiming beyond initial data balancing. In this presentation, we unveil a range of CAR techniques tailored to achieve balance across varied covariate characteristics, including quadratic and interaction terms. Our framework doesn’t just bring together various existing methods; it introduces a significantly broader array of innovative CAR procedures. Demonstrating superior balancing capabilities, these procedures outshine existing methods. Uniquely, both the convergence rate and its proof represent groundbreaking contributions to CAR. These enhanced balancing properties notably improve the precision of estimating treatment effects, especially in the presence of nonlinear covariate effects. Through empirical studies, we showcase the exceptional and reliable performance of these procedures.

March 7th, 8th Floor Auditorium, 11:45am

Carmen Tekwe, PhD
Associate Professor
Indiana University Bloomington School of Public Health

SIMEX approach to estimation of the sparse conditional functional quantile regression with measurement error

Abstract: Quantile regression is a semiparametric approach used for modelling associations between variables. It is most helpful when the covariates have a complex relationship with the location, scale, and shape of the outcome distribution. Despite its robustness to distributional assumptions and outliers in the outcome, regression quantiles may be biased in the presence of measurement error in the covariates. While studies have investigated the case of scalar-valued covariates, the impact of function-valued covariates contaminated with error has not yet been examined. We present an instrumental variable approach for consistently estimating linear quantile regression models that include a function-valued covariate measured with error. A two-stage approach to estimation is proposed. In the first stage, an instrumental variable is used to obtain a reasonable estimate of the covariance matrix for the measurement error. In the second stage, the simulation extrapolation (SIMEX) approach for measurement error correction is used to simulate additional measurement error with increasing variance, which is added to the observed measure for the true function-valued covariate. The standard quantile check function is minimized after adding the simulated additional measurement error to the surrogate or observed function-valued covariate prone to error. Standard errors are estimated by means of point-wise nonparametric bootstrap. We present a simulation study to assess the robustness of the proposed estimator in the presence of measurement errors. The proposed methods are applied to the NHANES database to assess the relationship between wearable-device-based measures of physical activity and body mass index among U.S. adults.

March 21st, Hammer 312, 11:45am

Thomas Richardson, PhD
Professor, Department of Statistics
University of Washington

Single World Intervention Graphs: A simple framework for unifying graphs and potential outcomes with applications to mediation analysis

Abstract: Causal models based on potential outcomes, also known as counterfactuals, were introduced by Neyman (1923) and extended to observational settings by Rubin (1974). Causal Directed Acyclic Graphs (DAGs) are another approach, originally introduced by Wright (1921), but subsequently significantly generalized and extended by Spirtes et al. (1993), Pearl (1995), and Dawid (2002), among others.

In this talk I will first present a simple approach to unifying these two approaches via Single-World Intervention Graphs (SWIGs). The SWIG encodes the counterfactual independences associated with a specific hypothetical intervention on a set of treatment variables. The nodes on the SWIG are the corresponding counterfactual random variables. This represents a counterfactual model originally introduced by Robins (1986) using event trees.

Malinsky et al. (2019) show that this synthesis leads to a simplification of the do-calculus of Pearl (1995) that clarifies and separates the underlying concepts.

Recently we have also shown that a (minimal) version of the SWIG framework is equivalent to a reformulation of the causal decision diagrams of Dawid (2021).

By expanding the graph, SWIGs may also be used to describe a novel interventionist approach to mediation analysis whereby treatment is decomposed into multiple separable components. This provides a means of discussing direct effects without reference to “cross-world” independence assumptions, nested counterfactuals or interventions on the mediator. The theory preserves the dictum “no causation without manipulation” and makes questions of mediation empirically testable in future randomized controlled trials.

This is joint work with James M. Robins (Harvard) and Ilya Shpitser (Johns Hopkins).

March 28th, 8th Floor Auditorium, 11:45am

Ying Lu, PhD
Professor of Biomedical Data Science and, by courtesy, of Epidemiology
Stanford University School of Medicine

A Desirability of Outcome Ranking (DOOR) Approach to Evaluate Disease Severity and Treatment Benefit Based on Individualized Importance of Symptom Domains


Abstract: Complex disorders usually affect multiple symptom domains measured by several outcomes. We recently proposed a novel composite desirability of outcome ranking (DOOR) approach for patient-centered evaluation of overall treatment benefits and developed the Patient-Ranked Order of Function (PROOF) outcome for evaluating efficacy in amyotrophic lateral sclerosis (ALS) clinical trials. In this talk, I will introduce the motivation for the DOOR approach to incorporating individualized patient preferences, review our previous work using the composite endpoint in clinical trials, and extend this approach to a cohort study setting. We will report the results of the 2021 and 2022 surveys of ALS patients in the Netherlands ALS registry, covering their associations with survival time, factors associated with PROOF outcomes, and analysis of longitudinal data on the composite DOOR ranking. This is joint work with Dr. van Eijk and colleagues at Stanford University and UMC Utrecht.


1. van Eijk RPA, van den Berg LH, Lu Y. Composite endpoint for ALS clinical trials based on patient preference: Patient-Ranked Order of Function (PROOF). J Neurol Neurosurg Psychiatry. 2022 May;93(5):539-546. doi: 10.1136/jnnp-2021-328194. Epub ahead of print. PMID: 34921121.

2. Lu Y, Zhao Q, Zou J, Yan S, Tamaresis J, Nelson L, Tu XM, Chen J, Tian L. A Composite Endpoint for Treatment Benefit According to Patient Preference. Statistics in Biopharmaceutical Research. 2022 Jul;14(4):408-422. doi: 10.1080/19466315.2022.2085783.

April 4th, 8th Floor Auditorium, 11:45am

Rebecca Betensky, PhD
Chair of the Department of Biostatistics, Professor of Biostatistics
New York University School of Global Public Health

Estimation and regression for sequentially-truncated data

Abstract: In observational cohort studies with complex sampling schemes, truncation arises when the time to event of interest is observed only when it falls below or exceeds another random time, i.e., the truncation time. In more complex settings, observation may require a particular ordering of event times; we refer to this extension of the traditional paradigm as sequential truncation and partial sequential truncation. I first describe nonparametric and semiparametric maximum likelihood estimators for the distribution of the event time of interest under these truncation schemes. I then describe methods for regression modeling in this complex setting using the tool of pseudo-observations (POs). POs are jackknife-like constructs that estimate an individual's contribution to an estimand. They are attractive in this setting because they obviate the need to directly account for the sequential truncation in the regression model of interest. Importantly, they may not be used when the truncation depends on the covariates that explain the time-to-event of interest; in this case a modified PO approach is available. We consider both the Cox and accelerated failure time (AFT) models. We evaluate our approach in simulation studies and in application to an Alzheimer's cohort study.

April 11th, 8th Floor Auditorium, 11:45am

Grace Yi, PhD
Professor, Canada Research Chair in Data Science (Tier 1), Department of Statistical and Actuarial Sciences, Department of Computer Science
University of Western Ontario

Enhancing Survival Data Analysis: Graphical Proportional Hazards Measurement Error Models

Abstract: In survival data analysis, the Cox proportional hazards (PH) model is perhaps the most widely used model to feature the dependence of survival times on covariates. While many inference methods have been developed under such a model or its variants, those models are not adequate for handling data with complex structured covariates. High-dimensional survival data often entail several features: (1) many covariates are inactive in explaining the survival information, (2) active covariates are associated in a network structure, and (3) some covariates are error-contaminated. To handle such survival data, we propose graphical PH measurement error models and develop inferential procedures for the parameters of interest. Our proposed models significantly enlarge the scope of the usual Cox PH model and have great flexibility in characterizing survival data. Theoretical results are established to justify the proposed methods. Numerical studies are conducted to assess the performance of the proposed methods.


April 18th, 8th Floor Auditorium, 11:45am

Bei Jiang, PhD
Associate Professor, Department of Mathematical and Statistical Sciences
University of Alberta

Conformalized Fairness via Quantile Regression

Abstract: Algorithmic fairness has received increased attention in socially sensitive domains. While rich literature on mean fairness has been established, research on quantile fairness remains sparse but vital. To fulfill great needs and advocate the significance of quantile fairness, we propose a novel framework to learn a real-valued quantile function under the fairness requirement of Demographic Parity with respect to sensitive attributes, such as race or gender, and thereby derive a reliable fair prediction interval. Using optimal transport and functional synchronization techniques, we establish theoretical guarantees of distribution-free coverage and exact fairness for the induced prediction interval constructed by fair quantiles. A hands-on pipeline is provided to incorporate flexible quantile regressions with an efficient fairness adjustment post-processing algorithm. We demonstrate the superior empirical performance of this approach on several benchmark datasets. Our results show the model’s ability to uncover the mechanism underlying the fairness-accuracy trade-off in a wide range of societal and medical applications.

April 25th, 8th Floor Auditorium, 11:45am

Hongtu Zhu, PhD
Professor, Department of Biostatistics
University of North Carolina, Chapel Hill Gillings School of Global Public Health

Revolutionizing Medical Image Data Analysis: Uniting AI and Statistics for Breakthroughs and Challenges

Abstract: This talk provides an insightful overview of integrating artificial intelligence (AI) and statistical methods in medical image data analysis. It is structured into three key sections:

  1. Introduction to Medical Image Data Analysis: This section sets the stage by outlining the fundamentals and significance of medical image analysis in healthcare, charting its evolution and current applications.
  2. State-of-the-Art AI Applications and Statistical Challenges: Here, we explore the impact of AI, particularly deep learning, on medical imaging, and address the accompanying statistical challenges, such as data quality and model interpretability.
  3. Opportunities for Statisticians: The final section highlights the critical role of statisticians in refining AI applications in medical imaging, focusing on opportunities for advancing algorithmic accuracy and integrating statistical rigor.

The talk aims to demonstrate the crucial synergy between AI and statistics in enhancing medical image analysis, emphasizing the evolving challenges and the vital contributions of statisticians in this domain.

May 2nd, 8th Floor Auditorium, 11:45am

Annie Qu, PhD
Chancellor’s Professor, Department of Statistics
Donald Bren School of Information & Computer Sciences, University of California, Irvine

Optimal Individualized Treatment Rule For Combination Treatments Under Budget Constraints

Abstract: The individualized treatment rule (ITR), which recommends an optimal treatment based on individual characteristics, has drawn considerable interest from many areas such as precision medicine, personalized education, and personalized marketing. Existing ITR estimation methods mainly adopt one of two or more treatments. However, a combination of multiple treatments could be more powerful in various areas. In this talk, we propose a novel Double Encoder Model (DEM) to estimate the individualized treatment rule for combination treatments. The proposed double encoder model is a nonparametric model which not only flexibly incorporates complex treatment effects and interaction effects among treatments, but also improves estimation efficiency via the parameter-sharing feature. In addition, we tailor the estimated ITR to budget constraints through a multi-choice knapsack formulation, which enhances our proposed method under restricted-resource scenarios. In theory, we provide the value reduction bound with or without budget constraints, and an improved convergence rate with respect to the number of treatments under the DEM. Our simulation studies show that the proposed method outperforms existing ITR estimation methods in various settings. We also demonstrate the superior performance of the proposed method on PDX data, where it recommends optimal combination treatments to shrink tumor size in colorectal cancer.

May 9th, 8th Floor Auditorium, 11:45am

Karen Bandeen-Roche, PhD
Professor, Department of Biostatistics 
Johns Hopkins University Bloomberg School of Public Health

Novel Approaches for Characterizing Physical Resilience in Older Adults: The Study of Physical Resilience and Aging

Abstract: Resilience, the ability to recover quickly from stressors, has emerged as a major gerontological concept aiming to promote more consistently positive outcomes for older adults. Translating the resilience concept to achieve said benefits poses numerous biostatistical challenges. This talk addresses three: identifying factors that influence the stress response from single-arm studies, characterizing the fitness of one’s physiology governing stress response, and addressing the considerable challenges arising from stress-response data collection with older adults. A methodology to correct mathematical coupling biases embedded in regression of pre-post resilience phenotype changes on baseline factors will be presented. The method performs strongly in simulation studies and evidences usefulness in practice. The fitness of stress-response physiology is conceptualized as a dynamical system whose functioning is then governed by differential equations: A simple methodology to approximate the governing equations from stimulus-response experiment repeated measures is presented. Simulation studies frame the methodology’s performance under a constellation of designs. These highlight challenges for feasible data from stimulus-response designs in older adults. The talk concludes by elucidating these challenges and discussing alternative approaches. It is rooted in the Study of Physical Resilience and Aging (“SPRING”), which implemented stimulus-response experiments to characterize physiological fitness in older adults scheduled for major stressors of either total knee replacement, incident hemodialysis, or bone marrow transplant for hematological cancers. Our study lays groundwork to better forecast and foster older adults’ resilience to clinical stressors.