Awarded Projects

"Quantitative disease risk scores for common diseases, with application to eMERGE" 

Principal Investigator: Iuliana Ionita-Laza
Funding Agent: NIH (R01AG066107)

"A Data Science Framework for Empirically Evaluating and Deriving Reproducible and Transferrable RDoC"

(9/1/2020-6/30/2025) and (12/1/2020-11/30/2022)

Principal Investigator(s): Ying Liu and Seonjoo Lee

Funding Agent: NIH (R01MH124106)

Abstract: To advance the understanding of psychopathology using dimensional constructs of measurements from multiple units of analysis, we propose a reproducible statistical framework for validating and deriving RDoC constructs with relevance to psychopathology. We will use multi‐modal neuroimaging, behavioral, and clinical/self‐report data from multiple samples to develop this framework. The design of our study consists of analyzing large, nationally representative samples, validating the results in local clinically enriched samples, and transferring information from the large community samples to local clinical samples.

"Conditional Quantile Random Forest with Biomedical and Biological Applications"


Principal Investigator: Ying Wei

Funding Agent: NSF/NIH (DMS-1953527)

Abstract: Modern biology and biomedical science are experiencing a wave of machine learning applications as biological data sets become increasingly large and complex. Among these methods, the random forest is particularly appealing and has gained great popularity in biology studies, genomic data analysis, and biomedical science. Random forests offer great flexibility in modeling complex data and associations while retaining a degree of interpretability and a transparent decision mechanism. The proposal aims to develop a new framework of conditional quantile random forest (CQRF), which largely generalizes the existing approaches. The proposal will investigate its potential in advancing biology and biomedical science, with focused applications analyzing electronic medical records and genomic data. Once carried out, the proposed work could lead to new knowledge discoveries and new precision interventions in biomedical science.

"Statistical method for neural mechanisms mediating and moderating cognitive system in Alzheimer’s Disease and aging research"

(1/15/2020 ‐ 12/31/2024)

Principal Investigator: Seonjoo Lee

Funding Agent: NIA (R01AG062578)

Abstract: This project aims to develop statistical methods to study brain changes and cognitive decline related to normal aging and Alzheimer’s disease pathology, and the role of cognitive reserve (CR). We will demonstrate that the developed statistical methods offer improved accuracy and robustness over current tools. First, we will develop tools for identifying robust relationships between neurodegeneration or pathology markers and brain function (network expression measured by task fMRI) in the presence of CR as a moderator. Second, we will derive the neural substrate of CR using resting-state functional MRI and task fMRI and then develop statistical tools to test the moderation effect of the imaging CR proxies. Third, we will develop sparse moderated mediation methods for high-dimensional predictors and mediators, accounting for moderation, to test whether network expression during cognitive tasks mediates the effect of brain changes (measured via multimodal structural MRI) on cognitive performance, cognitive decline, and dementia transition, and whether the derived neural substrate of CR moderates the mediation.

"Analysis and visualization of adverse events and patient-reported outcomes that reflected the overall treatment toxicity burden"


Principal Investigator: Shing Lee

Funding Agent: Hope Foundation (HF CU19-2753)

Abstract:  This project will develop innovative methods for summarizing, analyzing, and visualizing adverse event data captured from physicians and patients in clinical trials.  The methods will incorporate toxicity severity, duration, timing, and trajectory to better reflect the overall toxicity burden.  This project is a collaboration with the SWOG Cancer Consortium using completed SWOG studies that encompass a variety of cancer treatments, particularly targeted therapies and immunotherapies.  With feedback from key stakeholders such as patient advocates and researchers, the methods developed in this project have the potential to be directly applied in the reporting of future SWOG clinical trials to improve the analysis and interpretation of adverse event data in all phases of cancer drug development and care delivery.  

"Bayesian exposure-response analysis for immunoassays data with measurement errors"


Principal Investigator: Qixuan Chen

Funding Agent: NIEHS (R21ES029668)

Abstract: The proposed research will introduce new Bayesian approaches for estimating the nonlinear exposure-response relationship between a continuous environmental exposure and a binary disease outcome and for assessing the combined health effect of environmental exposure mixtures, in which the exposures are measured with errors but external calibration data are available to correct the errors. The proposed methods will be applied to the New York City Neighborhood Asthma and Allergy Study to assess the effects of indoor allergens on asthma morbidity among asthmatic children, with the indoor allergen concentrations measured using immunoassays. The findings of this study will provide important insights for the intervention and prevention of asthma morbidity among inner-city children with asthma.

"Identifying Bio‐signatures of Suicidal Subtypes in Veterans"


Methods Core Lead: Hanga Galfalvy

Funding Agent: James Peters VA Medical Center

Abstract: This project aims to develop sophisticated diagnostic tools for preventing future suicidal behavior in high-risk US Veterans.

"Inferential methods for functional data from wearable devices"


Principal Investigator: Ian McKeague

Funding Agent: NIA (R01AG062401)

Abstract: This is a project to develop new statistical methods for comparing groups of subjects in terms of health outcomes that are assessed using data from wearable devices. Inexpensive wearable sensors for health monitoring are now capable of generating massive amounts of data collected longitudinally, up to months at a time. The project will develop inferential methods that can deal with the complexity of such data. A serious challenge is the presence of unmeasured time-dependent confounders (e.g., circadian and dietary patterns), making direct comparisons or borrowing strength across subjects untenable unless the studies are carried out in controlled experimental conditions. Generic data mining and machine learning tools have been widely used to provide predictions of health status from such data. However, such tools cannot be used for significance testing of covariate effects, which is necessary for designing precision medicine interventions, for example, without taking the inherent model selection or the presence of the unmeasured confounders into account. To overcome these difficulties, the systematic development of inferential methods for functional outcome data obtained from wearable devices will be carried out. 

"Big Data Methods for Comprehensive Similarity-based Risk Prediction"

(2/12/2019 - 1/31/2024)

Principal Investigator: Shuang Wang

Funding Agent: NLM (R01LM013061)

Abstract: The project focuses on developing a novel data science pipeline that includes a clinical data processing pipeline to format comprehensive patient health determinants from a variety of clinical, genomic, and socio-environmental data sources, and a clinical-outcome-prediction framework that optimally fuses relevant patient health determinants to define patient similarity for improved clinical risk predictions.

"Statistical Methods for the Assessment of Social Engagement in Psychosis Using Digital Technologies"

(9/19/2018 - 8/31/2022)

Principal Investigator: Linda Valeri

Funding Agent: NIMH (K01MH118477)

Abstract: Complex psychiatric diseases, such as chronic psychotic disorder, are major public health issues in the United States. The proposed research provides an innovative framework and develops powerful and computationally efficient statistical methods to integrate active (e.g. survey) and passive (e.g. GPS, text, and call log) data streams from mobile sensors for the discovery of behavioral targets of treatment for chronic psychosis. 

Key publications and products:

  • L, Coull BA, Zigler C, Valeri L. Bayesian data fusion: probabilistic sensitivity analysis for unmeasured confounding using informative priors based on secondary data. (2021). Biometrics, in press.
  • Zhu Y, Jackson J, Centorrino F, Fitzmaurice GM, Valeri L. Meta-analysis of the total effect decomposition in the presence of multiple mediators: Integrating evidence across trials for schizophrenia treatment. (2020).  Epidemiology, in press.
  • Bellavia A, Centorrino F, Jackson JW, Fitzmaurice G, Valeri L. The role of weight gain in explaining the effects of antipsychotic drugs on positive and negative symptoms: An analysis of the CATIE schizophrenia trial. Schizophr Res. 2019 04; 206:96-102.
  • Discacciati A, Bellavia A, Lee JJ, Mazumdar M, Valeri L. Med4way: a Stata command to investigate mediating and interactive mechanisms using the four-way effect decomposition. Int J Epidemiol. 2018, 48(1):15-20
  • Shi B, Choirat C, Coull BA, VanderWeele TJ, Valeri L. CMAverse: An R package for reproducible causal mediation analysis. (2020). Submitted. R package.
  • Wang A, Devick K, Navas-Acien A, Coull BA, Valeri L. BKMR-CMA: A Novel R command for mediation analysis with multiple continuous exposures. (2020). R package.

"Integrative Learning to Combine Evidence for Personalized Treatment Strategies"


Principal Investigator: Yuanjia Wang

Funding Agent: NIMH (R21MH117458)

Abstract: Treatment responses for mental disorders are inadequate and considerable heterogeneity is observed, in part because an individual patient's clinical, psychosocial, and/or biological markers are not accounted for when selecting treatments among available options. This research proposes novel analytic methods to discover new powerful, yet interpretable personalized treatment strategies and integrate evidence of strategies identified in multiple prior studies to increase robustness and reproducibility.

Key publications and products: 

  • Qiu X, Wang Y. (2019). Composite Interaction Tree for Simultaneous Learning Optimal Individualized Treatment Rules and Subgroups. Statistics in Medicine. 38:2632–2651.
  • Chen Y, Wang Y, Zeng D (2020). Synthesizing Independent Stagewise Trials for Optimal Dynamic Treatment Regimes. Statistics in Medicine. 39(28): 4107-4119.
  • Chen Y, Zeng D, Wang Y (2020). Learning Individualized Treatment Rules for Multiple-Domain Latent Outcomes. Journal of the American Statistical Association. In press.

"Efficient Statistical Learning Methods for Personalized Medicine Using Large Scale Biomedical Data" 


Principal Investigator: Yuanjia Wang

Funding Agent: NIH-NIGMS (R01GM124104)

Abstract: This project aims to develop novel and scalable statistical learning methods to analyze electronic health records (EHRs) and use two real-world, high-quality EHR databases for personalized medicine research. The methods will handle the non-experimental nature of data collection processes, along with heterogeneous data types, dynamic treatment sequences, and the trade-off between benefit and risk outcomes. The results will complement the current knowledge base for individual patient care using evidence generated from patients in real-world clinical practices.

Key publications and products:

  • Wang Q, Xie S, Wang Y, Zeng D (2020). Survival-Convolution Models for Predicting COVID-19 Cases and Assessing Effects of Response Strategies. Frontiers in Public Health. 8:325. Code is available on our GitHub website. Our forecasts are included in the COVID-19 Forecast Hub and used by the CDC.
  • Wu P, Xu T, Wang Y (2019). Learning Personalized Treatment Rules from Electronic Health Records Using Topic Modeling Feature Extraction. 2019 IEEE Proceedings on Data Science and Advanced Analytics (DSAA). Washington D.C., USA, 2019. In press.
  • Wu P, Zeng D, Fu H, Wang Y (2020). Using Electronic Health Records to Improve Optimal Treatment Rules in Randomized Trials. Biometrics. In press.
  • Wu P, Zeng D, Wang Y. (2020). Matched Learning for Optimizing Individualized Treatment Strategies Using Electronic Health Records. Journal of the American Statistical Association. 115:529, 380-392.

"Optimizing and Personalizing Interventions For Schizophrenia Across the Lifespan" 

(4/1/2018 ‐ 12/31/2021)

Methods Core Lead: Melanie Wall

Funding Agent: NIMH

Abstract: The goal of OPAL is to accelerate the adaptation, development, and implementation of effective, personalized treatments in real‐world settings for people diagnosed with schizophrenia who commonly have impaired social and occupational functioning, experience persistent psychotic and mood symptoms, and are at risk for disability and premature death.

"Statistical Methods for Early Disease Prediction and Treatment Strategy Estimation Using Biomarker Signatures"


Principal Investigator: Yuanjia Wang

Funding Agent: NINDS

Abstract: The ultimate goal of neuropsychiatric research is to develop experimental therapeutics to delay disease onset, slow disease progression, and provide effective treatment at each stage of the disease. This proposal aims to develop new statistical approaches to integrate complementary sources of information from genomic measures, brain imaging biomarkers, and early clinical signs to characterize disease mechanism, progression, and treatment responses, and thereby inform the design of clinical trials and the discovery of optimal personalized therapies.

Key publications and products:

  • Xie S, Li X, McColgan P, Scahill S, Zeng D, Wang Y (2019). Identifying Disease-Associated Biomarker Network Features Through Conditional Graphical Model. Biometrics. 71(3): 772–781.
  • Sun M, Wang Y. (2018). Nonlinear Model with Random Inflection Points for Modeling Neurodegenerative Disease Progression. Statistics in Medicine. 37:4721–4742.
  • Sun M, Zeng D, Wang Y (2020). Modeling Temporal Biomarkers With Semiparametric Nonlinear Dynamical Systems. Biometrika. In press.

"Advanced Modeling Techniques for Brain Imaging Data with PET"


Principal Investigator: Todd Ogden

Funding Agent: NIBIB (R01EB024526)

Abstract: Positron emission tomography (PET) represents a powerful tool for investigating the biological basis of depression, Alzheimer's disease, and other neuropsychiatric diseases. To analyze PET data and address the broader scientific questions, we propose to develop powerful new analysis techniques that will model data across many subjects at once, allowing for greater flexibility and precision. These advances will also relax the requirements for PET imaging, allowing for much greater clinical applicability.

"Identifying Reproducible Brain Signatures of Obsessive‐Compulsive Profiles"

(8/1/2017 ‐ 4/30/2022)

Principal Investigator(s): Melanie Wall and Blair Simpson

Funding Agent: NIMH (R01MH113250)

Abstract:  Obsessive‐compulsive disorder (OCD) is a prevalent and disabling disorder, and fewer than half of OCD patients become well with current treatments. This study seeks to identify reproducible neuroimaging signatures associated with cognitive and clinical profiles that are common in individuals with OCD and that transcend countries/cultures. Identifying brain signatures of measurable behaviors and clinical symptoms will provide robust new treatment targets and help pave the way to precision psychiatry where individual brain signatures can help guide treatment choices.

"Functional data analytics for kinematic assessments of motor control"


Principal Investigator: Jeff Goldsmith

Funding Agent: NINDS (R01NS097423)

Abstract: We propose to develop models that address gaps in the statistical literature exposed by data produced in experiments using kinematic data to assess motor control, skill, learning, and recovery following stroke. In such experiments, subjects make repeated motions that are recorded in their entirety, producing a rich dataset that allows unique insights into motor control. Analyses in the neuroscience literature have to date focused on simple summaries of this data, reducing hundreds of motions to single numbers. In place of this immense reduction, we propose a collection of models using a functional data analytic perspective to provide a comprehensive framework for the analysis of such data.

Key publications and products:

  • D. Backenroth, J. Goldsmith, M. D. Harran, J. C. Cortes, J. W. Krakauer, and T. Kitago (2018). Modeling motor learning using heteroskedastic functional principal components analysis. Journal of the American Statistical Association, 113 1003-1015. 
  • R. Kundert, J. Goldsmith, J. Veerbeek, J. W. Krakauer, and A. R. Luft (2019). What the proportional recovery rule is (and is not): methodological and statistical considerations. Neurorehabilitation and Neural Repair, 33 876-887. 
  • J. C. Cortes, J. Goldsmith, M. Harran, J. Xu, N. Kim, A. R. Luft, P. Celnik, J. W. Krakauer, and T. Kitago (2017). A short and distinct time window for recovery of arm motor control after stroke revealed with a global measure of trajectory kinematics. Neurorehabilitation and Neural Repair, 31 552-560.
  • J. Goldsmith, T. Kitago (2016). Assessing Systematic Effects of Stroke on Motor Control using Hierarchical Function-on-Scalar Regression. Journal of the Royal Statistical Society: Series C, 65 215-236.

"Novel Methods for Evaluation and Implementation of Behavioral Intervention Technologies for Depression"

(5/20/2016 - 1/31/2021)

Principal Investigator: Ken Cheung

Funding Agent: NIMH (R01MH109496)

Abstract: Major depressive disorder (MDD) is common and imposes a high societal cost in terms of quality of life, work productivity, functional status, morbidity, and mortality. The national healthcare system will not be able to meet the needs of the population with standard one-to-one intensive psychological treatments. With the growing use of smartphones, behavioral intervention technologies have become a viable and scalable option to deliver psychotherapy to the population. This research proposes novel evaluation and implementation concepts and methods that aim to extend our capacity to adopt these technologies in an evidence-based manner.

"Develop Quantile Analysis Tools for Sequencing and EQTL Studies"


Principal Investigator: Ying Wei

Funding Agent: NHGRI (R01HG008980)

Abstract: The project will develop quantile analysis tools for studying expression quantitative trait loci (eQTLs) in single or multiple tissues and for identifying associations between infrequent or rare variants and complex human traits using next-generation sequencing data. Once complete, the developed methods and their applications have great potential to deepen and expand existing knowledge in genetics and to contribute significantly to the field of statistics as well.

"Integrative methods for the identification of causal variants in mental disorder"


Principal Investigator: Iuliana Ionita-Laza

Funding Agent: NIMH (R01MH106910)

Abstract: Autism Spectrum Disorders and Schizophrenia are common diseases with a major impact on public health. The proposed integrative statistical methods and their direct applications to psychiatric diseases will lead to a better understanding of the biological mechanisms underlying these disorders, with important implications for disease treatment.

"Statistical Methods for Neural Mechanisms mediating cognitive system in mental health"

(10/1/2015 - 5/31/2019)

Principal Investigator: Seonjoo Lee

Funding Agent: NIA (K01AG051348)

Abstract: The overarching aim of my K01 Mentored Research Development Award is to acquire training that will allow me to pursue a line of mental health research related to cognition and to develop novel statistical methods to support the emerging research in cognitive neuroimaging and mental health.