Columbia Biostatistics Annual Research Symposium (CBARS)

Illuminating the Future of Health through Data: The Power of Innovation in Biostatistics and Health Data Science

This two-day event showcases cutting-edge research and the ways Biostatistics faculty, students, and partners promote trans-disciplinary public health science. Join us for the inaugural Columbia Biostatistics Annual Research Symposium (CBARS) as we engage alumni; foster collaboration with partners in the academic, corporate, and governmental sectors; and partake in a free exchange of research ideas, thoughtful deliberations, and strategic implementation.

The keynote address will be given by Tianxi Cai, ScD, the John Rock Professor of Population and Translational Data Science, and a Professor of Biomedical Informatics at Harvard University. Dr. Shahram Ebadollahi, technologist, entrepreneur, senior executive and thought leader, will present an industry keynote on "Data Science and AI in Pharma - Current State, Challenges and Opportunities". Scientific session topics include machine learning, climate change data science, and more. Read below for more information on individual speakers and topics.

2023 Schedule

For its inaugural year, CBARS will span two days. September 26th will feature morning and afternoon sessions running from 8:30am to 7:00pm, with meals provided. September 27th will be a half-day from 8:30am to 2:00pm, also with meals provided.

The symposium will take place in The Faculty Club in the Vagelos College of Physicians and Surgeons building, 630 West 168th St., 4th Floor. 

September 26, Morning Session (8:30am – 1:30pm)

8:30am – 9:00am Registration & Breakfast

Continental breakfast, coffee, and tea will be served. 

9:00am – 10:00am Welcome, Overview, and Opening 

Departmental and Symposium Overviews will be presented by Dr. Kiros Berhane and the Planning Committee Co-Chairs. This will be followed by brief opening remarks from Dean Linda Fried and a special departmental recognition event.

10:00am – 10:30am Break & Group Photo

A short break to allow attendees to mingle and network, followed by a group photo to commemorate the event.

10:30am – 11:30am Corporate/Industry Partnerships Panel Discussion

Jianying Hu
IBM Fellow; Global Science Leader, AI for Healthcare and Director of HCLS Research at IBM Research

Haoda Fu
Associate Vice President & Enterprise Lead, Machine Learning and Artificial Intelligence, Advanced Analytics and Data Sciences, Eli Lilly and Company

Yue Shentu
Executive Director, Biostatistics, Merck


11:30am – 12:30pm Academic Keynote Lecture

Tianxi Cai from Harvard University will present a keynote lecture:

"Crowdsourcing with Multi-institutional EHR to Improve Reliability of Real World Evidence - Opportunities and Challenges"
The wide adoption of electronic health records (EHR) systems has made large clinical datasets available for discovery research. EHR data, linked with biorepositories, are a valuable new source for deriving real-world, data-driven prediction models of disease risk and progression. Yet these data also pose analytical difficulties, especially when the aim is to leverage multi-institutional EHR data. Synthesizing information across healthcare systems is challenging due to heterogeneity and privacy constraints. Statistical challenges also arise from the high dimensionality of the feature space. In this talk, I’ll discuss analytical approaches for mining EHR data to improve the reliability and generalizability of the real-world evidence generated from these analyses. These methods will be illustrated using EHR data from Mass General Brigham and the Veterans Health Administration. 

12:30pm – 1:30pm Lunch and Poster Viewing

Lunch will be served alongside the opportunity to view scientific posters from the Biostatistics department.

September 26, Afternoon Session (1:30pm – 7:00pm)

1:30pm – 2:30pm Corporate/Industry Keynote Lecture

Dr. Shahram Ebadollahi

"Data Science and AI in Pharma - Current State, Challenges and Opportunities"

2:30pm – 3:45pm ROADMAP Working Group Session

Machine Learning and Inferential Methods for Precision Medicine and mHealth Applications 

Dr. Ming Yuan (Professor, Department of Statistics; Associate Director, Data Science Institute, Columbia University) 
Title: TBA 

Kelly Zhang (Postdoctoral researcher at Columbia Business School; incoming faculty at Imperial College London) 
“Statistical inference after using online reinforcement learning for digital health interventions” 
Online reinforcement learning is increasingly used in digital intervention experiments to personalize treatment delivery to users over time. We provide methods to perform a variety of statistical analyses using longitudinal data collected by reinforcement learning (RL) algorithms in digital health interventions. In this work, we focus on data collected by online RL algorithms that learn across users, i.e., that use the data of multiple users to learn and inform treatment decisions. This data type is important because, when outcomes are highly noisy (as is the case for digital interventions), RL algorithms that learn from the data of multiple users can substantially reduce noise and learn faster. At the same time, developing inferential theory for this data type is challenging because online RL algorithms that combine data across users induce dependence between the collected user data trajectories. We develop a general inferential approach for this non-i.i.d. data type that allows one to perform a variety of statistical analyses via general Z-estimation. This inferential approach will be used for the primary analysis of Oralytics, a mobile health clinical trial designed to promote quality tooth-brushing via feedback and educational messages delivered by a personalizing RL algorithm. 

Eun-Jeong Oh (Assistant Professor at Northwell) 
“Penalized regressions for optimal interventions in personalized medicine” 

Ken Cheung (Professor of Biostatistics, Vice Dean for Faculty, MSPH) 
“Bayesian multi-variable monotone regression” 
In this talk, I will introduce a new Bayesian multi-variable monotone regression method, called iPIPE, and briefly discuss its computational issues. The method is applied to data sets and simulation settings where the number of variables is much higher than what the isotonic regression literature has considered. Simulations show that iPIPE-based credible intervals achieve nominal coverage probability and are more precise than those from unconstrained estimation. 
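Background sketch. The isotonic regression literature that iPIPE builds on starts from the single-variable case, where the least-squares monotone fit can be computed with the classical pool-adjacent-violators algorithm (PAVA). The Python sketch below shows only that textbook baseline, assuming one ordered predictor and a non-decreasing constraint; it is not iPIPE, which is Bayesian and targets many variables. The function name `pava` is our own.

```python
def pava(y, w=None):
    """Weighted least-squares non-decreasing fit via pool adjacent violators."""
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points pooled].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    # Expand each pooled block back to one fitted value per observation.
    fit = []
    for mean, _, n in blocks:
        fit.extend([mean] * n)
    return fit


print(pava([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

The out-of-order pair (3, 2) is pooled into its mean, 2.5, which is the smallest change that restores monotonicity in the least-squares sense.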

3:45pm – 4:15pm Break & Refreshments

Light refreshments will be served alongside an opportunity to view scientific posters from the Biostatistics department.

4:15pm – 5:30pm Causal Inference Learning Group Session

PART I: Causal Inference for Equitable Clinical Decision Making 

Safiya Sirota (Doctoral student, Department of Biostatistics) 
“A health equity perspective on data-driven treatment decisions in cardiovascular care: risk assessments versus individualized treatment rules” 
It is standard in clinical care to base medical decisions on estimated risk scores, e.g., to assign antihypertensive medications based on the risk of adverse cardiac events, as currently recommended by national ACC/AHA cardiovascular guidelines. We investigate the consequences of this practice in cardiovascular care from the perspective of health equity and health disparities. Complex associations between racial/ethnic categories, social determinants of health, and other disease risk factors may lead to disparities in treatment allocation that are exacerbated, not mitigated, by risk-based decision-making. An alternative is to base decisions on individualized treatment rules (ITRs), which are rules sensitive to causal effect heterogeneity that optimally direct therapies to patients based on their individual characteristics. We investigate how allocations based on ITRs may mitigate disparities in treatment assignment, using both simulated data and real data from a large observational cohort study. We find that recommending treatment according to the ITR paradigm may have substantial consequences for treatment recommendations and possibly for health disparities.
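Illustrative sketch. The contrast between the two paradigms can be seen in a toy allocation: a risk-based rule treats whoever crosses a risk threshold, while an ITR-style rule treats whoever is estimated to benefit. In the Python sketch below, the patients, groups, threshold, and the precomputed `risk` and `benefit` values are all invented for illustration and are not from the study; estimating the individual benefit (the conditional treatment effect) is the hard part the talk addresses and is not shown.

```python
from collections import defaultdict


def treatment_rates(patients, rule):
    """Share of patients recommended treatment, by group, under a decision rule."""
    treated = defaultdict(int)
    total = defaultdict(int)
    for p in patients:
        total[p["group"]] += 1
        treated[p["group"]] += int(rule(p))
    return {g: treated[g] / total[g] for g in total}


def risk_rule(p, threshold=0.10):
    # Risk-based rule: treat when predicted baseline risk reaches a (hypothetical) threshold.
    return p["risk"] >= threshold


def itr_rule(p, min_benefit=0.0):
    # ITR-style rule: treat when the estimated individual risk reduction is positive.
    return p["benefit"] > min_benefit


# Invented toy patients: "risk" is predicted baseline risk, "benefit" an estimated risk reduction.
patients = [
    {"group": "A", "risk": 0.15, "benefit": 0.00},
    {"group": "A", "risk": 0.12, "benefit": 0.02},
    {"group": "B", "risk": 0.08, "benefit": 0.03},
    {"group": "B", "risk": 0.05, "benefit": 0.01},
]

print(treatment_rates(patients, risk_rule))
print(treatment_rates(patients, itr_rule))
```

With these toy values, the risk rule treats everyone in group A and no one in group B, while the ITR-style rule treats half of group A and all of group B: the kind of shift in allocation across groups the talk examines.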

Kara Rudolph (Assistant Professor, Department of Epidemiology) 
“Learning individualized treatment strategies for reducing opioid use disorder relapse” 
The opioid epidemic in the United States continues to be a public health emergency, exacerbated by the COVID-19 pandemic. There are three FDA-approved medications for opioid use disorder (MOUD) that have demonstrated benefit: 1) buprenorphine, 2) methadone, and 3) extended-release injection naltrexone. However, there was no quantitative guidance about which medication would work best for which patients. In addition, dose and dose adjustment of buprenorphine and methadone are likely important factors in treatment effectiveness, but providers were tasked with making these choices with little or no quantitative, evidence-based guidance. For both of these aspects of MOUD treatment, there may not be a one-size-fits-all best approach. Rather, the "optimal" medication, dose, and dose adjustment may depend on person-level factors. We 1) learn optimal treatment rules for matching patients to an MOUD medication and 2) estimate the risk of OUD relapse under different dosing strategies. 

Amy Pitts (Doctoral candidate, Department of Biostatistics) 
“Using a separable effects model to overcome extreme positivity violation and distinguish the causal effects of surgery and anesthesia” 
The U.S. Food and Drug Administration has cautioned that prenatal exposure to anesthetic drugs during the third trimester may have neurotoxic effects. However, there is limited clinical evidence available to substantiate this recommendation. To explore this claim, we analyze data from the nationwide Medicaid Analytic Extract from 1999 through 2013, which linked over 16 million deliveries to mothers enrolled in Medicaid during pregnancy. The main goal of this project is to estimate the causal effect of mothers receiving anesthesia during pregnancy on the diagnosis of attention-deficit/hyperactivity disorder (ADHD) in the child. Isolating the effect of anesthesia from the effect of the surgical procedure is challenging because the two are deterministically linked, which induces a strong positivity violation. We use the separable effects model of Robins and Richardson (2010) to isolate the effect of anesthesia from that of the surgical procedure by blocking effects through variables that are assumed to completely mediate the causal pathway from surgery to ADHD. We discuss the identification assumptions required for our approach, derive sensitivity analyses for their violations, and discuss the results of our application to the real data. 


PART II: Foundations of Causal Inference for Complex Observational Data  

Melanie Mayer (Doctoral candidate, Department of Biostatistics) 
“Statistical methods for transporting an environmental mixture effect” 
Transportability methods use data from a sample population to estimate an exposure effect in a non-overlapping target population of interest. Transporting an environmental mixture effect, where one evaluates the effect of multiple, continuous and correlated exposures using observational data, comes with a set of complexities, however, that have yet to be addressed. We borrow concepts from the causal inference framework to extend transportability methods to environmental mixtures analyses by (1) identifying scenarios where we can and cannot transport based on formalized assumptions and (2) developing a statistical approach for estimating the transported effect via matching and flexible modeling to account for differences across populations and non-linear/interaction effects conjointly. Our proposed method expands the data sources available for estimating an environmental mixture effect to help motivate policy decisions targeted to the populations they are intended for.

Charlotte Fowler (Doctoral candidate, Department of Biostatistics) 
“Accounting for effect modification by latent disease state for individual causal estimation among bipolar participants in mobile health studies” 
Individuals with bipolar disorder cycle through disease states such as depression, mania, and euthymia. The heterogeneous nature of the disease across states complicates the evaluation of interventions for bipolar disorder patients, as varied interventional success is observed within and across individuals. In fact, we hypothesize that disease state acts as an effect modifier for the causal effect of a given intervention on health outcomes. We propose an N-of-1 approach using an adapted autoregressive hidden Markov model, applied to longitudinal mobile health data collected from individuals with bipolar disorder. This method allows us to identify the latent disease state from daily survey responses and treat it as an effect modifier between the exposure and outcome of interest. We then employ a g-formula to estimate this effect. We compare the performance of our proposed method with naive approaches across different simulation scenarios and in an application to a multi-year smartphone study of bipolar patients, evaluating the individual effect of physical activity on sleep while controlling for confounding and modification by latent disease state.
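Illustrative sketch. As a simplified picture of applying the g-formula with disease state as an effect modifier, the Python sketch below standardizes over a discrete confounder separately within each disease state. It assumes, hypothetically, that the latent state has already been decoded (the talk infers it with an adapted autoregressive hidden Markov model, which is not shown), that exposure and confounder are discrete, and that observations can be treated as independent, which ignores the time-series structure of the actual N-of-1 analysis. Function names and the toy rows are our own.

```python
from collections import defaultdict


def g_formula_by_state(rows, a1=1, a0=0):
    """Nonparametric g-formula (standardization) within each disease state.

    rows: iterable of (state, confounder, exposure, outcome) tuples.
    Returns {state: E[Y^a1 | S = s] - E[Y^a0 | S = s]}.
    """
    sums = defaultdict(float)    # (s, l, a) -> sum of outcomes
    counts = defaultdict(int)    # (s, l, a) -> number of observations
    l_counts = defaultdict(int)  # (s, l)    -> number of observations
    s_counts = defaultdict(int)  # s         -> number of observations
    for s, l, a, y in rows:
        sums[(s, l, a)] += y
        counts[(s, l, a)] += 1
        l_counts[(s, l)] += 1
        s_counts[s] += 1

    effects = {}
    for s in s_counts:
        mean_y = {a1: 0.0, a0: 0.0}
        for (s_l, l), n_l in l_counts.items():
            if s_l != s:
                continue
            p_l = n_l / s_counts[s]  # empirical P(L = l | S = s)
            for a in (a1, a0):
                n = counts.get((s, l, a), 0)
                if n == 0:  # positivity: each (state, confounder, exposure) cell must be observed
                    raise ValueError(f"no data for state={s!r}, confounder={l!r}, exposure={a!r}")
                # Weight the cell mean E[Y | A = a, L = l, S = s] by P(L = l | S = s).
                mean_y[a] += p_l * sums[(s, l, a)] / n
        effects[s] = mean_y[a1] - mean_y[a0]
    return effects


# Toy rows (invented): state, confounder level, exposure (1 = active day), outcome (e.g., sleep hours).
rows = [
    ("depressed", 0, 1, 5.0), ("depressed", 0, 0, 3.0),
    ("depressed", 1, 1, 7.0), ("depressed", 1, 0, 6.0),
    ("euthymic", 0, 1, 4.0), ("euthymic", 0, 0, 4.0),
]
print(g_formula_by_state(rows))
```

In the toy data the standardized effect is 1.5 in the depressed state and 0.0 in the euthymic state, which is what effect modification by state looks like in this simplified setting.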

Zhonghua Liu (Assistant Professor, Department of Biostatistics) 
“Leveraging interactions for robust Mendelian randomization” 
The instrumental variable method is widely used in the health and social sciences for identification and estimation of causal effects in the presence of potential unmeasured confounding. To improve efficiency, multiple instruments are routinely used, raising concerns about bias due to possible violation of the instrumental variable assumptions. To address such concerns, we introduce a new class of G-estimators that are guaranteed to remain consistent and asymptotically normal for the causal effect of interest by leveraging interactions among candidate instruments, some of which may be invalid. We provide formal semiparametric efficiency theory supporting our results. Simulation studies and applications to UK Biobank data demonstrate the superior empirical performance of the proposed estimators compared with competing methods. 

5:30pm – 7:00pm Reception

A reception will be held after the day's events to allow attendees to discuss and network freely. 

September 27, Morning Session (8:30am – 2:00pm)

8:30am – 9:00am Registration & Breakfast

Continental breakfast, coffee, and tea will be served.

9:00am – 10:15am Environmental Statistics Working Group Session

Modeling the Future: Harnessing Environmental Statistics and Data Science to Navigate Climate Change
As our world grapples with the intricate challenges of climate change, understanding and utilizing the data behind our changing environment has never been more critical. This session will delve deep into the innovative statistical methods and analytical techniques directing our response to environmental threats. Our speakers will shed light on how data science and environmental statistics can pave the way for informed, proactive, and resilient decision-making, with a glimpse into the forward-thinking initiatives of the Columbia Climate School.

Daniel Malinsky (Assistant Professor, Department of Biostatistics) 
“Estimating the longitudinal causal effects of ambient air pollution exposures on chronic lung disease” 
This talk will present ongoing work on estimating the causal effects of air pollution exposures on chronic lung disease outcomes using marginal structural models (MSMs). We examine data from a multi-site cohort study that has been collecting data for more than 20 years on a diverse group of older American adults. Participants have been linked to ambient air pollution exposure levels for several pollutants of interest; we focus on ozone, NOx, and PM2.5. We will discuss challenges and approaches to estimating the effects of these continuous exposures while accounting for time-varying confounding. 

Xiao Wu (Assistant Professor, Department of Biostatistics) 
“Low-intensity fires mitigate the risk of catastrophic wildfires in California's forests” 
The increasing frequency of severe wildfires across the globe demands a shift in landscape management to mitigate their consequences. The role of managed, low-intensity fire as a driver of beneficial fuel treatment in fire-adapted ecosystems has drawn interest in both scientific and policy venues. Using a synthetic control approach to analyze twenty years of satellite-based fire activity data across 124,186 km² of forests in California, we provide evidence that low-intensity fires substantially reduce the risk of future high-intensity fires. In conifer forests, the risk of high-intensity fire is reduced by 64.0% [95% CI: 41.2%–77.9%] in areas recently burned at low intensity relative to comparable unburned areas, and protective effects last for at least six years [lower bound of one-sided 95% CI: 6 yr]. These findings support a transition from policies focused on fire suppression to ones emphasizing restoration, through increased use of prescribed fire, cultural burning, and managed wildfire, of a fire regime that approximates pre-suppression, pre-colonial conditions in California. 

Jeffrey Shaman (Professor of Environmental Health Sciences, Professor of Climate, Interim Dean of the Columbia Climate School) 
“Columbia Climate School: Research, Practice and Analytics” 

Discussant: Kiros Berhane (Cynthia and Robert Citron-Roslyn and Leslie Goldstein Professor of Biostatistics and Chair, Department of Biostatistics) 

10:15am – 10:30am Break & Networking

A short break to allow attendees to mingle and network.

10:30am – 11:45am Functional Data Analysis Working Group Session

Functional Data Methods in Neuroimaging and Public Health
This session will highlight the recent work of several members of the Functional Data Analysis Working Group (FDAWG) who will be presenting methodological advances and novel applications for modeling imaging data and other function-valued data objects. 

Xin Ma (Assistant Professor, Department of Biostatistics) 
“Multi-task learning with high-dimensional noisy images” 
High-dimensional data have become increasingly prevalent in biomedical research, offering exciting opportunities for novel scientific discoveries. However, analyzing such data poses unique challenges, particularly when dealing with imaging data that are high-dimensional, spatially correlated, and subject to measurement errors. In this project, we focus on joint learning of noisy imaging data from multiple sources. We represent the imaging data as functional objects using wavelet bases and use a grouped penalty to control sparsity levels across data sources. We also propose a correction procedure for imaging data contaminated with measurement errors. Extensive simulations and application to the Alzheimer’s Disease Neuroimaging Initiative study illustrate superior performance over existing methods in both prediction and feature selection. 

Maddie Stoms (Doctoral candidate, Department of Biostatistics) 
“Estimation of menstrual cycle day using cross-sectional biomarker measurements” 
Many health-related outcomes and exposures vary over the menstrual cycle. Accounting for this source of variation could improve model accuracy and statistical power. However, cycle day is difficult to measure through self-report or to estimate using available methods, and it is routinely overlooked in studies involving women's health. Our goal is to provide an accurate estimate of menstrual cycle day (i.e., the number of days since the start of the cycle) using hormone values derived from a single spot urine sample. We construct a likelihood for the latent cycle day based on observed hormone levels; this likelihood uses, as a reference, patterns of hormonal variation obtained from a unique study that followed a sample of women over a complete cycle. Our results suggest that we can estimate the cycle day within three days for the majority of observations based on single spot urine samples. This approach can be applied to ongoing and future studies in which menstrual cycle day is a potentially important variable, and it may refine results based on extant datasets that obtained urine samples. 
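Illustrative sketch. The core idea, scoring each candidate cycle day by how well one sample's hormone levels match reference hormone patterns, can be sketched as a grid search over days. The Python sketch below assumes, hypothetically, independent Gaussian deviations around per-day reference means and SDs; the two reference curves are invented toy shapes, not data from the reference study, and the authors' likelihood may differ. All names are our own.

```python
import math


def estimate_cycle_day(obs, ref_mean, ref_sd):
    """Return (day, log-likelihood) for the cycle day that best explains the observed hormones.

    obs: dict hormone -> observed level from one spot sample.
    ref_mean, ref_sd: dict hormone -> list of per-day reference mean / SD,
        where index 0 corresponds to cycle day 1 (hypothetical reference curves).
    """
    n_days = len(next(iter(ref_mean.values())))
    best_day, best_ll = None, -math.inf
    for d in range(n_days):
        ll = 0.0
        for hormone, x in obs.items():
            mu, sd = ref_mean[hormone][d], ref_sd[hormone][d]
            # Normal log-density of the observed level, given cycle day d + 1.
            ll += -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mu) ** 2 / (2 * sd ** 2)
        if ll > best_ll:
            best_day, best_ll = d + 1, ll
    return best_day, best_ll


# Toy 28-day reference curves (invented for illustration):
# hormone "E" peaks mid-cycle; hormone "P" stays flat, then rises in the second half.
days = range(28)
ref_mean = {
    "E": [10.0 - 0.5 * abs(d - 13) for d in days],
    "P": [0.0 if d < 14 else float(d - 13) for d in days],
}
ref_sd = {h: [1.0] * 28 for h in ref_mean}

day, _ = estimate_cycle_day({"E": 8.0, "P": 0.0}, ref_mean, ref_sd)
print(day)  # 10
```

Hormone E alone would tie day 10 with day 18 (both have E = 8.0 in the toy curves); the second hormone breaks the tie, which illustrates why combining several hormones sharpens the estimate.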

Angel Garcia de la Garza (Assistant Professor of Biostatistics, Albert Einstein College of Medicine) 
“Adaptive Functional Principal Component Analysis”

Baoyi Shi (Doctoral candidate, Department of Biostatistics) 
“Nonparametric functional data modeling of pharmacokinetic processes with applications in dynamic PET imaging” 
Modeling a pharmacokinetic process typically involves solving a system of linear differential equations and estimating the parameters upon which the functions depend. In order for this approach to be valid, it is necessary that a number of fairly strong assumptions hold, assumptions involving various aspects of the kinetic behavior of the substance being studied. In many situations, such models are understood to be simplifications of the "true" kinetic process. While in some circumstances such a simplified model may be a useful (and close) approximation to the truth, in some cases, important aspects of the kinetic behavior cannot be represented. We present a nonparametric approach, based on principles of functional data analysis, to modeling of pharmacokinetic data. We illustrate its use through application to data from a dynamic PET imaging study of the human brain. 

11:45am – 12:45pm Introduction to Working Groups

An abbreviated introduction to the other working groups of the Biostatistics department and their research.

12:45pm – 1:00pm Concluding Remarks

Dr. Kiros Berhane will give concluding remarks wrapping up the inaugural Columbia Biostatistics Annual Research Symposium.

1:00pm – 2:00pm Lunch & Poster Presentation Awards

Join us for lunch and the presentation of poster awards as a final celebration of the symposium.

2023 Videos

Industry Panel 

Corporate & Industry Partners Panel Discussion – Haoda Fu & Yue Shentu

Academic Keynote - Dr. Tianxi Cai

CBARS Academic Keynote – Dr. Tianxi Cai

Industry Keynote - Dr. Shahram Ebadollahi

CBARS Industry Keynote - Dr. Shahram Ebadollahi


Keynote Speakers

Dr. Shahram Ebadollahi

Dr. Shahram Ebadollahi is a technologist, entrepreneur, senior executive, and thought leader in applications of data science and AI in healthcare and life sciences. He is currently an Operating Partner at a French private equity firm, specializing in identifying opportunities and investing in companies addressing challenges across the broad healthcare ecosystem.

Until recently, he was the Chief Data Science and AI Officer at Novartis, the Swiss pharmaceutical company, where he was responsible for all matters related to data strategy and management, data science, and artificial intelligence. In this capacity, he oversaw the design, execution, and delivery of multiple strategic, cross-enterprise programs to create data-driven decision-making systems and platforms. To support these efforts, Dr. Ebadollahi set up an industry-first “AI Innovation Lab” for Novartis, where he hired world-class AI talent in the areas of NLP/G, imaging, and causal inference. He also sponsored and managed a unique, industry-first partnership with Microsoft on all matters regarding data science and AI innovation and their applications to challenges in the life sciences industry. 

Prior to Novartis, Shahram was the co-strategist and technical founder of IBM Watson Health and its first employee. In this role, he set up and ran multiple functions (e.g., Innovation, Business Development and Partnerships, Data Strategy, Patent Strategy, and Technology Development) and served as the Chief Science Officer and spokesperson for IBM in the area of health. Shahram was also instrumental in leading multiple major acquisitions for the newly founded business and in establishing strategic partnerships. During his tenure at IBM, he also created the Blockchain business in healthcare and created and led the Healthcare Research agenda and teams, which produced innovative scientific work and a broad patent portfolio in applications of data science and machine learning to healthcare challenges. 

Dr. Ebadollahi has been very involved with the Columbia community and was instrumental in establishing the “Columbia-IBM Center for Blockchain and Data Transparency”. In addition, he serves on Columbia University’s School of Engineering and Applied Sciences Board of Visitors and on the advisory boards of the Data Science Institute (cross-University) and the Deming Center (Columbia Business School). 

Dr. Ebadollahi serves on various boards as an independent director and scientific advisory member. He received his PhD and MBA degrees from Columbia University.

Tianxi Cai, Harvard University

Tianxi Cai, PhD (Biostatistics), is Professor of Biomedical Informatics (HMS) and Biostatistics (HSPH) and the John Rock Professor of Population and Translational Data Science at Harvard University. Dr. Cai co-directs the VERITY Bioinformatics Core at Brigham and Women’s Hospital and an Applied Bioinformatics Core at the Veterans Health Administration. She is the founding director of the Translational Data Science Center for a Learning Health System at HMS and HSPH. She also directs the Big Data Analytics Core at Harvard Medical School, providing statistical and biomedical informatics support to both the Harvard research community and external research groups, including the VA and industry. Dr. Cai’s research team has successfully developed statistical and informatics tools for analyzing complex big biomedical data from large-scale studies, including multi-institutional electronic health records, cohort studies, disease registries, genomic studies, and randomized clinical trials. 




Industry Panel Speakers

Jianying Hu

IBM Fellow; Global Science Leader, AI for Healthcare and Director of HCLS Research at IBM Research

Jianying Hu (Ph.D.) is an IBM Fellow; Global Science Leader, AI for Healthcare and Director of HCLS Research at IBM Research; and Adjunct Professor at the Icahn School of Medicine at Mount Sinai. Dr. Hu joined IBM in 2003 after working at Bell Labs. She has over 30 years of experience conducting and leading research on machine learning, data mining, statistical pattern recognition, and signal processing applied to medical informatics, business analytics, and multimedia content analysis, with a recent focus on accelerated scientific discovery in health through scalable AI technologies. Dr. Hu has published over 150 peer-reviewed scientific papers and holds 50 patents. She served on the Computational Science Advisory Board of the Michael J. Fox Foundation from 2017 to 2018, and currently serves on the External Advisory Board of the NIH AIM-AHEAD Program and on the National Academies of Sciences, Engineering, and Medicine (NASEM) Committee on Establishing a Framework for Emerging Science, Technology and Innovation in Health and Medicine. She has served as Associate Editor for many journals, including IEEE TPAMI, IEEE TIP, and Pattern Recognition, and currently serves on the Journals and Publications Committee of AMIA, the Editorial Board of JAMIA Open, and the Advisory Board of JHIR. Dr. Hu is a fellow of the American College of Medical Informatics (ACMI), the International Academy of Health Sciences Informatics (IAHSI), IEEE, and the International Association for Pattern Recognition (IAPR). She received the Asian American Engineer of the Year Award in 2013.

Haoda Fu

Associate Vice President & Enterprise Lead, Machine Learning and Artificial Intelligence, Advanced Analytics and Data Sciences, Eli Lilly and Company

Dr. Haoda Fu is an Associate Vice President and an Enterprise Lead for Machine Learning, Artificial Intelligence, and Digital Connected Care at Eli Lilly and Company. Dr. Fu is a Fellow of the ASA (American Statistical Association) and an IMS (Institute of Mathematical Statistics) Fellow. He is also an adjunct professor of biostatistics at the University of North Carolina at Chapel Hill and Indiana University School of Medicine. Dr. Fu received his Ph.D. in statistics from the University of Wisconsin-Madison in 2007 and joined Lilly afterward. Since joining Lilly, he has been very active in statistics and data science methodology research. He has more than 100 publications in areas such as Bayesian adaptive design, survival analysis, recurrent event modeling, personalized medicine, indirect and mixed treatment comparison, joint modeling, Bayesian decision making, and rare events analysis. 

In recent years, his research has focused on machine learning and artificial intelligence. His research has been published in various top journals, including JASA, JRSS, Biometrika, Biometrics, ACM, IEEE, JAMA, and Annals of Internal Medicine. He has taught machine learning and AI at major industry conferences, including an FDA workshop. He has served on the boards of directors of statistics organizations and as a program chair and committee chair for organizations such as ICSA, ENAR, and the ASA Biopharm Section. He serves on the COPSS Snedecor Award committee from 2022 to 2026 and will also serve as an associate editor for JASA Theory and Methods beginning in 2023. 

Yue Shentu

Executive Director, Biostatistics, Merck

Yue Shentu is an executive director at Merck Research Laboratories. He is currently the section head of late-development statistics in oncology, overseeing the thoracic and head & neck indications. Yue holds a Ph.D. in statistics from Rutgers University, and his research interests include adaptive design and subgroup identification in clinical trials.