Levin Lecture Series Colloquium Seminars
Lectures are in-person only unless marked otherwise.
To attend a seminar that offers a virtual option via Zoom, please email Erin Elliott, Programs Coordinator (ee2548@cumc.columbia.edu).
During the Fall and Spring semesters, the Department of Biostatistics holds seminars, called the Levin Lecture Series, on a wide variety of topics of interest to both students and faculty. The speakers are occasionally departmental faculty members but are most often invited guests who spend the day of their seminar discussing their research with Biostatistics faculty and students.
Fall 2024 Levin Lectures
September 5th, ARB 8th Floor Auditorium, 11:45am
Kevin Josey, PhD
Assistant Professor, Department of Biostatistics & Informatics
Colorado School of Public Health
Causal Inference using Variables Measured with Error
Abstract:
In both the scientific application and development of causal inference methods, it is often implicitly assumed that all relevant variables are measured without error. However, in many contexts obtaining error-free measurements of an outcome, exposure, or confounding variable may be unreasonable or even impossible. In these scenarios, the presence of measurement error can subsequently invalidate fundamental assumptions necessary for causal inference. Despite the extensive literature studying the impact of measurement error in associational studies, the development of methods at the intersection of measurement error and causal inference is in a relatively early stage. This presentation will first examine a variety of methods for addressing measurement error in causal analyses. Subsequently, we propose implementing a class of estimators applicable to general causal quantities that is conventionally used for unmeasured confounding to instead address bias induced by measurement error. Under standard double sampling schemes, the proposed estimator is shown to be competitive with existing approaches in a simulation study. We illustrate our method with observational electronic health record data on HIV outcomes from the Vanderbilt Comprehensive Care Clinic.
September 12th, ARB Room 532A/B, 11:45am
Shuangning Li, PhD
Assistant Professor of Econometrics and Statistics
University of Chicago Booth School of Business
Causal Inference in the Presence of Interference: Estimation and Testing Problems
Abstract:
In causal inference, "interference" refers to a scenario where the treatment assigned to one unit affects the observed outcomes of other units. In a wide variety of applied settings, such interference effects not only exist but are of considerable interest. In this talk, I will present some tools I have developed to conduct statistical inference in the presence of such interference.
1. Estimation:
I will begin by discussing estimation problems, focusing on a study that examines large-sample asymptotics for treatment effect estimation under network interference, where the interference graph is a random draw from a graphon. For direct effects, we demonstrate that popular estimators in this setting are significantly more accurate than previously suggested. For indirect effects, we propose a new consistent estimator in a setting where no other consistent estimators currently exist.
2. Testing:
If time permits, I will then discuss testing problems. I will present a study focused on testing for interference in A/B testing with increasing allocation. Specifically, we introduce two permutation tests designed to detect the existence of interference, each valid under different assumptions. These procedures have been implemented at LinkedIn to detect potential interference across all their marketplace experiments.
September 19th, ARB 8th Floor Auditorium, 11:45am
Shuangge (Steven) Ma, PhD
Department Chair and Professor of Biostatistics
Yale School of Medicine
Modeling Emotional Expressions for Multiple Cancers via a Linguistic Analysis of an Online Health Community
Abstract:
The diagnosis and treatment of cancer can evoke a variety of adverse emotions. Online health communities (OHCs) provide a safe platform for cancer patients and those closely related to express emotions without fear of judgement or stigma. In the literature, linguistic analysis of OHCs is usually limited to a single disease and based on methods with various technical limitations. In this article, we analyze posts from September 2010 to September 2022 on nine cancers that are publicly available at the American Cancer Society’s Cancer Survivors Network (CSN). We propose a novel network analysis technique based on a latent space model. The proposed approach decomposes the emotional expression semantic networks into an across-cancer time-independent component (which describes the “baseline” that is shared by multiple cancers), a cancer-specific time-independent component (which describes cancer-specific properties), and an across-cancer time-dependent component (which accommodates temporal effects on multiple cancer communities). For the second and third components, respectively, we consider a novel clustering structure and a change point structure. A penalization approach is proposed, and its theoretical and computational properties are carefully examined. The analysis of the CSN data leads to sensible networks and deeper insights into emotions for cancer overall and specific cancer types.
September 26th, ARB 8th Floor Auditorium, 11:45am
Leilei Zeng, PhD
Professor/Associate Chair - Research
University of Waterloo, Department of Statistics and Actuarial Science
A Mixture Hidden Markov Model for Multiple Types of Disease
Abstract: Multistate models are widely used for analyzing longitudinal data on disease progression over time. Many diseases manifest differently and what appears to be a coherent collection of symptoms is often the expression of a variety of distinct disease subtypes, each with a different rate of onset of symptoms and progression. We propose a mixture hidden Markov model (MHMM), where the underlying process is characterized by a finite mixture of multiple Markov chains, one for each disease subtype, while the observation process contains states corresponding to the common symptomatic stages of these diseases. Information on type of disease is partially available and reflects the pathway through certain hidden states in the corresponding disease process, facilitating the estimation of parameters involved in the proposed models. The method is demonstrated on a dataset to model the development and progression of dementia caused by Alzheimer's disease and non-AD dementia.
October 3rd, ARB 8th Floor Auditorium, 11:45am
Qi Long, PhD
Professor of Biostatistics
University of Pennsylvania, Perelman School of Medicine, Department of Biostatistics, Epidemiology & Informatics
Advancing Responsible Statistical and AI/ML Methods for Analysis of Complex EHR data
Rapid advances in technologies have enabled generation and collection of vast amounts of health data in research studies, from healthcare delivery, and from other real-world sources. While such rich data offer great promise in advancing intelligent and equitable health and medicine, they present daunting analytical challenges. One notable example is the multi-modal data from electronic health records (EHR) that are recorded at irregular time intervals with varying frequencies and include structured data such as labs and vitals, codified data such as diagnosis and procedure codes, and unstructured data such as clinical notes and pathology reports. They are typically incomplete and fraught with other data errors and biases. What’s more, data gaps and errors in EHRs are often unequally distributed across patient groups: People with less access to care, often people of color or with lower socioeconomic status, tend to have more incomplete EHRs. Such data bias, if not adequately addressed, would lead to biased results and exacerbate health inequities. In this talk, I will share my research group’s work on developing robust statistical and AI/ML methods for addressing these challenges including some recent work on large language models (LLMs). Our research experience has demonstrated that a trans-disciplinary data science approach that involves collaboration between statisticians, informaticians, computer scientists, and physician scientists can accelerate innovation in harnessing the transformative power of EHR to tackle complex real-world problems and exert powerful impact in medicine. To this end, I will also discuss some open questions and opportunities for future research.
October 10th, ARB Hess Commons, 11:45am
Menggang Yu, PhD
Professor, Biostatistics
University of Michigan, School of Public Health
Covariate-Balancing Weights for Causal Generalization with Target Sample Summary Information
In this talk, we focus on estimating the average treatment effect (ATE) of a target population when individual-level data from a source population and summary-level data (e.g., first or second moments of certain covariates) from the target population are available. In the presence of heterogeneous treatment effect, the ATE of the target population can be different from that of the source population when distributions of treatment effect modifiers are dissimilar in these two populations, a phenomenon also known as covariate shift. Many methods have been developed to adjust for covariate shift, but most require individual covariates from a representative target sample. We develop nonparametric weights for the treated and control groups within the source sample by calibration to the summary-level information from the target sample. Our approach also seeks additional covariate balance between the treated and control groups in the source sample. We will demonstrate statistical properties and numerical results of the resulting estimator.
October 17th, ARB Hess Commons, 11:45am
Laura Hatfield, PhD
Senior Fellow, NORC
University of Chicago
Transporting Difference-in-Differences Estimates for Health Equity Evaluations
The Medicare program provides medical insurance for most adults aged 65 years and older in the United States. To improve the cost, quality, and outcomes of Medicare beneficiaries, the Center for Medicare and Medicaid Innovation (CMMI) designs and tests novel payment and delivery models. CMMI has recently pledged to put equity at the center of its demonstrations and evaluations. However, robust methods to estimate equity impacts using quasi-experimental designs are lacking. This paper addresses the problem of transporting treatment effect estimates from CMMI models, most commonly using difference-in-differences designs, to equity-relevant target populations. We extend methods developed by Renson et al. (2023) to transport difference-in-differences treatment effects. Specifically, we apply and extend these methods to transport the effects of Comprehensive Primary Care Plus (CPC+) to a target population of Black fee-for-service (FFS) Medicare beneficiaries living outside the original 18 CPC+ regions. Our application poses a unique problem in that the treatment status of the units to which we wish to transport inferences cannot be observed. Therefore, we conducted a simulation study in which we simulated practice-level spending in sample and target units, calibrating to values from the literature and varying key parameters to create multiple realistic scenarios that varied the representativeness of the sample relative to the target population. Across our simulation scenarios, transporting the treatment effect yielded median treatment effects that varied as much as the total estimated effect. We also explored the sensitivity of the methods to violations of assumptions. I also discuss connections to our research on formulating target estimands for equity evaluations and developing identification and estimation strategies for those estimands.
October 24th, ARB Hess Commons, 11:45am
Sandrah Proctor Eckel, PhD
Associate Professor of Population and Public Health Sciences
University of Southern California, Keck School of Medicine
Title & Abstract TBA
October 31st, ARB 8th Floor Auditorium, 11:45am
Oscar Madrid Padilla, PhD
Assistant Professor in the Department of Statistics
University of California, Los Angeles
Multilayer random dot product graphs: Estimation and online change point detection
We study the multilayer random dot product graph (MRDPG) model, an extension of the random dot product graph to multilayer networks. To estimate the edge probabilities, we deploy a tensor-based methodology and demonstrate its superiority over existing approaches. Moving to dynamic MRDPGs, we formulate and analyse an online change point detection framework. At every time point, we observe a realization from an MRDPG. Across layers, we assume fixed shared common node sets and latent positions but allow for different connectivity matrices. We propose efficient tensor algorithms under both fixed and random latent position cases to minimize the detection delay while controlling false alarms. Notably, in the random latent position case, we devise a novel nonparametric change point detection algorithm based on density kernel estimation that is applicable to a wide range of scenarios, including stochastic block models as special cases. Our theoretical findings are supported by extensive numerical experiments, with the code available online.
November 7th, ARB Hess Commons, 11:45am
Gang Li, PhD
Professor of Biostatistics
University of California, Los Angeles, Fielding School of Public Health
Title & Abstract TBA
November 14th, ARB 8th Floor Auditorium, 11:45am
Michael Hudgens, PhD
Professor and Chair, Department of Biostatistics
University of North Carolina, Gillings School of Global Public Health
Causal Inference in Infectious Disease Prevention Studies
This talk will provide a high-level overview of the development and application of causal inference methods to infectious disease prevention studies, with particular focus on vaccines. Examples will include drawing inference about vaccine effects on post-infection outcomes, immunological correlates of vaccine protection, spillover effects of vaccines, and waning of vaccine effects over time.
November 21st, ARB Hess Commons, 11:45am
Cliff Meyer, PhD
Senior Research Scientist
Harvard T.H. Chan School of Public Health
Computational Biology
Title & Abstract TBA
December 5th, ARB Hess Commons, 11:45am
Zhengwu Zhang, PhD
Assistant Professor
University of North Carolina, Statistics & Operations Research
Title & Abstract TBA