To evaluate the clinical utility of new biomarkers for risk prediction, a crucial step is to measure their predictive accuracy in prospective studies. However, it is often infeasible to obtain marker values for all study participants. The nested case-control (NCC) design is a useful, cost-effective strategy in such settings. Under the NCC design, markers are ascertained only for cases and for a fraction of controls sampled randomly from the risk sets. This outcome-dependent sampling generates a complex data structure that poses a challenge for analysis. Existing methods for analyzing NCC studies focus primarily on association measures. When there is a single marker of interest, we propose a class of non-parametric estimators for commonly used accuracy measures. Asymptotic theory for the proposed estimators is derived to account for both the outcome-dependent missingness and the correlation induced by finite-population sampling under the NCC design. When there are multiple markers under investigation, we extend the proposed procedures to derive an optimal composite risk score for prediction. We provide inference procedures for the prediction accuracy of the risk score, as well as for comparisons between two risk scores. The new procedures are illustrated with data from the Nurses' Health Study to evaluate the accuracy of biomarkers and genetic markers for predicting the risk of developing rheumatoid arthritis.
Dept of Biostatistics
biostats [at] columbia [dot] edu