Not So Fast: Mailman Biostatisticians Raise Concerns Over Cancer Screening Study
Late last year, The Lancet published the long-anticipated results of the largest ovarian cancer screening trial to date. The study of more than 200,000 women over a 14-year period examined a new screening protocol for a disease called the “silent killer” because its symptoms often don’t manifest until the cancer has spread. While results were inconclusive, the investigators presented enticing evidence in the paper that the screening was effective. Shortly after, Abcodia, the company behind the screening test, made it available commercially.
But earlier this month, the Food and Drug Administration issued a “safety communication” statement recommending against the screening test, and a week later, Abcodia voluntarily pulled their product, the $295 ROCA (“Risk of Ovarian Cancer Algorithm”) test, from the market. The FDA action came on the heels of a June editorial in American Family Physician authored by a group of experts—including two Mailman School biostatisticians—that pointed to uncertainties in the Lancet study and expressed reservations about the marketing of the screening test.
The proprietary algorithm, developed by Harvard biostatistician Steven J. Skates, works by assessing changes in levels of a protein biomarker called CA-125 over time. In the 2012 study known as the United Kingdom Collaborative Trial of Ovarian Cancer Screening (UKCTOCS), postmenopausal women were randomized to one of three groups: multimodal screening (MMS) using ROCA, transvaginal ultrasound (USS), or no screening. On the surface, the results of UKCTOCS, published in The Lancet several years ahead of the study’s conclusion, were very promising.
For women enrolled in the MMS arm, who were followed up by ultrasound screening when increasing CA-125 was found, ovarian cancer was diagnosed earlier than for those not screened. Even more exciting, the researchers reported a significant reduction in risk of death for women in the subset screened annually for at least seven years. Yet at a February meeting called by the Ovarian Cancer Research Fund that gave rise to the June editorial, Mailman’s Bruce Levin and Cody Chiuzan and others voiced serious concerns about the research and underlined the significant downside of imprecise screening.
A test that is insufficiently specific would generate many false positives—which at the least would give women a bad scare, and at the most, lead to unnecessary surgery, chemotherapy, and radiation. On the other hand, a test that is insufficiently sensitive would miss cancers, potentially delaying necessary treatment.
While there was nothing fraudulent about the UKCTOCS study, the Mailman biostatisticians say its most promising findings stem from several misleading statistical contortions.
For starters, they question why it would take seven years to show a survival benefit for the screening test. In the typical screening trial, explains Levin, a professor of Biostatistics, it may take several years until enrolled patients develop a disease, but in the UKCTOCS trial, the survival curves in both the ROCA and no screening arms overlap perfectly for about ten years, a period during which many women had died. Experts at the February meeting said there was no plausible explanation for the delay in mortality reduction, except perhaps as an artifact of shifting demographics as older study participants dropped out, a possibility the study had not explored but which is now being examined. “Older women might be less likely to go through all the repeated screenings,” posits Chiuzan, an assistant professor of Biostatistics.
Another, more technical issue relates to a mismatch between certain published p-values indicating statistical significance and confidence intervals for mortality reductions indicating insignificance. “Statistics 101 says these two methods ought to agree,” Chiuzan says. To arrive at the findings reported in The Lancet, the investigators employed a complex statistical model for the cumulative incidence curves, undertaken only after they deemed the original method, a Cox proportional hazards model, to be suboptimal. According to Levin and Chiuzan, biostatisticians generally abhor this kind of post hoc methodological rejiggering. And it turns out the significant p-value referred to a different hypothesis than the one concerning mortality reduction.
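To see why a p-value and a confidence interval would normally be expected to agree, here is a minimal sketch in Python. It assumes a simple two-sided Wald test under a normal approximation, which is not the model used in the UKCTOCS analysis, and the estimate and standard error are hypothetical numbers chosen only for illustration. The function name `wald_summary` is likewise invented for this example.

```python
from statistics import NormalDist


def wald_summary(estimate, std_error, level=0.95):
    """Two-sided p-value and confidence interval from the same Wald statistic."""
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    z = estimate / std_error
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    # Critical value for the requested level (about 1.96 for 95 percent)
    z_crit = std_normal.inv_cdf(0.5 + level / 2.0)
    lower = estimate - z_crit * std_error
    upper = estimate + z_crit * std_error
    return p_value, (lower, upper)


# Hypothetical log hazard ratio and standard error, NOT taken from the trial
p_value, (lower, upper) = wald_summary(estimate=-0.16, std_error=0.10)

significant = p_value < 0.05
ci_excludes_zero = lower > 0.0 or upper < 0.0

print(f"p = {p_value:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print("p-value significant:", significant)
print("CI excludes no effect:", ci_excludes_zero)
```

When both quantities come from the same test statistic, as here, they cannot disagree: the interval excludes zero exactly when the p-value falls below 0.05. A mismatch therefore suggests that the two numbers answer different questions.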
The Final Analysis
The two biostatisticians say the UKCTOCS researchers deserve credit for organizing such a complex and ambitious study, and were suitably cautious in their reporting. Others have raised the possibility of financial bias in the Lancet paper: Ian J. Jacobs, one of two lead authors, is also a co-inventor of ROCA and has a financial stake in its success. But Levin says in no way do the study’s shortcomings rise to the level of fundamental errors of the kind he recently helped expose in the PACE trial for myalgic encephalomyelitis (a.k.a. chronic fatigue syndrome).
“The real problem was the overenthusiasm of the investigators, with or without the financial impetus, to put spin on the findings that should not yet be touted as life-saving,” says Levin. “The bottom line is that the screening test is not ready for primetime. We need more evidence of a benefit.”
The UKCTOCS study continues for another three years. Will additional data make a difference? We’ll just have to wait and see. But according to the Mailman biostatisticians, the bar is always high for screening tests—particularly for a rare disease like ovarian cancer.
As any introductory biostatistics lecture makes clear, even if you have a screening test with 99 percent sensitivity and 99 percent specificity used in a population where one in a hundred people have the disease, you’ll get a lot of false positives. “Half the time you’ll scare the hell out of a patient and cause anxiety, stress, or other psychosocial consequences while they’re not actually diseased,” says Levin.
And this is better than the situation for ovarian cancer: while MMS did correctly identify substantially more cancers among those testing positive than did ultrasound alone, still, more than half of the positives were false positives. Says Levin, “That’s why we need to be cautious.”
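The arithmetic behind Levin’s “half the time” remark can be checked with a short calculation of positive predictive value, the share of positive results that are true positives. The sketch below uses the 99 percent sensitivity, 99 percent specificity, and one-in-a-hundred prevalence from the classroom example above. The second call uses a one-in-a-thousand prevalence, a hypothetical figure chosen only to show the effect of a rarer disease and not an estimate for ovarian cancer.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a person who tests positive actually has the disease."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Classroom example from the article: 99% sensitive, 99% specific, 1 in 100 diseased
ppv = positive_predictive_value(0.99, 0.99, 0.01)

# Hypothetical rarer disease (1 in 1,000), for illustration only
rare_ppv = positive_predictive_value(0.99, 0.99, 0.001)

print(f"PPV at 1 in 100 prevalence:   {ppv:.2f}")
print(f"PPV at 1 in 1,000 prevalence: {rare_ppv:.2f}")
```

With prevalence at one in a hundred, true and false positives are equally common, so only half of the positive results are real. As the disease gets rarer, the same test produces a smaller share of true positives.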