Abstract
PURPOSE Multiple cancer screening tests have been advocated for the general population; however, clinicians and patients are not always well-informed of screening burdens. We sought to determine the cumulative risk of a false-positive screening result and the resulting risk of a diagnostic procedure for an individual participating in a multimodal cancer screening program.
METHODS Data were analyzed from the intervention arm of the ongoing Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial, a randomized controlled trial to determine the effects of prostate, lung, colorectal, and ovarian cancer screening on disease-specific mortality. The 68,436 participants, aged 55 to 74 years, were randomized to screening or usual care. Women received serial serum tests to detect cancer antigen 125 (CA-125), transvaginal sonograms, posteroanterior-view chest radiographs, and flexible sigmoidoscopies. Men received serial chest radiographs, flexible sigmoidoscopies, digital rectal examinations, and serum prostate-specific antigen tests. Fourteen screening examinations for each sex were possible during the 3-year screening period.
RESULTS After 14 tests, the cumulative risk of having at least 1 false-positive screening test is 60.4% (95% CI, 59.8%–61.0%) for men, and 48.8% (95% CI, 48.1%–49.4%) for women. The cumulative risk after 14 tests of undergoing an invasive diagnostic procedure prompted by a false-positive test is 28.5% (95% CI, 27.8%–29.3%) for men and 22.1% (95% CI, 21.4%–22.7%) for women.
CONCLUSIONS For an individual in a multimodal cancer screening trial, the risk of a false-positive finding is about 50% or greater by the 14th test. Physicians should educate patients about the likelihood of false positives and resulting diagnostic interventions when counseling about cancer screening.
Annals Journal Club selection—see inside back cover or http://www.annfammed.org/AJC/.
INTRODUCTION
Numerous cancer screening tests are promoted to the healthy public.1–8 The motivating factor behind regular cancer screening is the theory that the earlier one detects a malignancy or premalignancy, the more likely treatment is to be effective in increasing lifespan while minimizing harms caused by the therapy.9 Although this model has intuitive appeal, it is often used without actual proof in hand and without full consideration of potential adverse consequences. The most common potential adverse consequence is a false-positive result, which often brings with it physical, psychological, and economic burdens of further diagnostic testing.10–13
The false-positive rate of a single screening test has been studied, but the cumulative false-positive rate of repeating the test at regular intervals is infrequently reported,14–17 and the cumulative false-positive rate of multiple tests has not, to our knowledge, been reported at all. The ongoing Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial is designed to assess the benefits and harms of screening for 4 major causes of cancer mortality. As such, it represents an ideal opportunity to assess the cumulative false-positive rate and resulting diagnostic procedure rates of a combined-modality screening program.
METHODS
Methods for the PLCO Cancer Screening Trial
The PLCO trial is a randomized, 2-armed trial to determine the effects of prostate, lung, colorectal, and ovarian cancer screening on disease-specific mortality.18–23 Ten screening centers are participating. Randomization began in 1993 and concluded in 2001. The total number of enrolled participants was 154,935. Individuals were eligible to enroll if they were aged between 55 and 74 years with no history of prostate, lung, colorectal, or ovarian cancer. Further exclusion criteria included current treatment for any cancer except basal or squamous skin cancer; prior removal of the entire prostate, 1 lung, or the entire colon; concurrent participation in another cancer screening or primary prevention study; use of finasteride within 6 months of enrollment; and, after 1995, men reporting more than 1 prostate-specific antigen test in the 3 years before enrollment, or any colonoscopy, sigmoidoscopy, or barium enema in the 3 years before enrollment.
Eligible participants were randomly assigned to the control arm (normal health care routines), or to the intervention arm. Within the control arm, patients were informed about screening tests for PLCO cancers and could have potentially received testing outside the confines of the trial through private physicians. Randomization to the intervention arm meant that women were offered annual tests for the cancer antigen 125 (CA-125) and transvaginal ultrasonography for ovarian cancer for 4 years, posteroanterior chest radiographs for lung cancer at baseline and yearly for 2 years (non-smokers) or 3 years (smokers), and baseline and 3- or 5-year flexible sigmoidoscopy for colorectal cancer. Men were offered the same tests for lung and colorectal cancers; additionally, they received annual digital rectal examinations (DREs) and prostate-specific antigen (PSA) blood tests for 4 years.
The order in which the tests were offered to a given patient during a round of screening varied, with the exception that PSA had to be performed before a DRE, and serum to test for CA-125 was drawn before a transvaginal ultrasonography was offered. Flexible sigmoidoscopy was usually scheduled to be the last procedure during a round of testing, as it was believed that delaying the more unpleasant examination might improve overall compliance. Positive screening results were specifically defined in the study protocol,18 but follow-up was left to the discretion of the participants’ personal physicians, who were notified of all abnormal results. According to PLCO protocol, colonic lesions found during flexible sigmoidoscopy were not immediately biopsied and removed; instead, they prompted referral to the participant’s primary physician for follow-up. Criteria for a positive screening examination are listed in Table 1.24
Methods for the PLCO False-Positive Screening Study
Study Population and Eligibility
Enrollment exclusions used by the PLCO trial as a whole applied. We began with the intervention arm of the PLCO trial (n = 77,464). Additional exclusion criteria for this study were (1) death before the first screening test (n = 39); (2) missing all screening tests (n = 5,028); and (3) inadequate follow-up time (less than 3 years after the last screening test taken) (n = 3,961) (Figure 1). We observed the remaining 68,436 participants up to a maximum possible 14 tests (although not all participants consented to or received all 14 examinations). At the time of the initial screening and 3 years later, participants were offered all 4 tests (chest radiograph, flexible sigmoidoscopy, CA-125, and transvaginal ultrasonography for women; and chest radiograph, flexible sigmoidoscopy, PSA, and DRE for men); 1 and 2 years after the initial screening, participants were offered all tests except flexible sigmoidoscopy.
Flexible sigmoidoscopy frequency was decreased from every 3 to every 5 years during the trial to better match changing clinical practices. Given our 3-year timeframe, flexible sigmoidoscopies offered at year 5 were not included in this study. Approximately 25% of participants had the opportunity to undergo 2 flexible sigmoidoscopies; the rest were scheduled for only 1 (at baseline). As a posteroanterior chest radiograph was offered only twice after the baseline examination for nonsmokers, not all participants underwent the fourth chest radiograph at year 3.
Definitions
False-Positive Result
By consensus of the investigators, a positive screen was considered a false positive if the participant had at least 3 years of follow-up after the positive result and the target cancer was not diagnosed by that time. For flexible sigmoidoscopy, advanced adenomas (including those with villous histopathologic findings, or severe cellular dysplasia, or that were ≥1 cm in diameter) were considered true positives for the base-case analysis. There is ongoing debate concerning the target lesion in colorectal cancer screening. Although clinicians generally agree that colorectal lesions 1 cm and larger warrant work-up, discordant recommendations are found for smaller polyps, with some clinicians advocating for the removal of any observable lesion.25 The natural history of all of these lesions (how often they progress to cancer and how long it takes), including advanced adenomas, however, remains poorly understood.26 We therefore chose the midpoint along this spectrum of opinions for our primary analysis.
Diagnostic Follow-Up
Diagnostic follow-up for positive results was defined by consensus of investigators and divided into the following categories: (1) minimally invasive procedure (ie, simple endoscopy with conscious sedation); (2) moderately invasive procedure (ie, tissue removal, more involved instrumentation, or general anesthesia, including laparoscopy); and (3) major thoracic, abdominal, or pelvic surgery (ie, prostatectomy or colectomy). (The adjective “invasive” is not intended as a value-laden term but is being used according to the medical definition: “Involving puncture or incision of the skin or insertion of an instrument or foreign material into the body” [http://www.dorlands.com/].) Repeated screening examinations, physical examinations, chart reviews, and basic imaging examinations were not considered invasive procedures and did not contribute to these categories.
The causal relationship between a diagnostic procedure and a positive finding on a screening examination was determined by trained chart abstractors at each study center. An organ-specific standardized diagnostic evaluation form was completed for every positive screening result that captured follow-up care through 1 year after the screening date.
Statistical Methods
We sought to answer the question, What is the probability that an individual entering a multimodal screening program will obtain at least 1 false-positive test in a given number of tests? The approach applied a model for a single screening test to multiple tests.27 The model is a type of survival analysis based upon number of tests taken rather than time elapsed. The tests are ordered by appearance and analyzed by tests-to-event rather than by time-to-event; this method allows for appropriate handling of missing tests while enhancing use of available data. For example, if a participant took the first 7 tests, skipped 3 tests, and then returned for the last 4 tests, his or her results contribute to the first 11 points along the cumulative incidence curve. A participant no longer contributes to the curve either after the first false-positive result (event reached) or after the last test taken (censored).
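The tests-to-event idea can be illustrated with a minimal sketch. The code below is not the model of reference 27; it is a plain product-limit (Kaplan-Meier-style) estimator indexed by test number rather than by time, and it assumes non-informative censoring. The function name and the input layout (one list of results per participant, in the order the tests were actually taken, with True marking a false-positive result) are hypothetical.

```python
from typing import List, Sequence


def cumulative_false_positive_risk(
    histories: Sequence[Sequence[bool]], max_tests: int = 14
) -> List[float]:
    """Product-limit estimate of P(at least 1 false positive by test t), t = 1..max_tests.

    Each history lists one participant's results in the order the tests were
    taken (skipped tests are simply absent); True marks a false positive.
    """
    at_risk = [0] * max_tests  # participants taking a t-th test with no earlier event
    events = [0] * max_tests   # first false positives occurring at the t-th test

    for history in histories:
        for t, is_false_positive in enumerate(history[:max_tests]):
            at_risk[t] += 1
            if is_false_positive:
                events[t] += 1
                break  # event reached: no further contribution to the curve
        # Leaving the loop without an event means censoring after the last test taken.

    risks: List[float] = []
    event_free = 1.0
    for n, d in zip(at_risk, events):
        if n > 0:
            event_free *= 1.0 - d / n
        risks.append(1.0 - event_free)
    return risks


# Toy data (hypothetical): four participants, at most three tests each.
toy_histories = [[False, True], [False, False, False], [True], [False]]
toy_risks = cumulative_false_positive_risk(toy_histories, max_tests=3)
```

In this toy input, a participant who took the first 7 tests, skipped 3, and returned for 4 more would simply be represented by a list of 11 results, matching the example in the text.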
Because participants could receive different screening tests at each testing point, the model treats each test in order of appearance as a composite of all modalities. It makes a single informative censoring assumption, applied only to individuals who did not receive a false-positive result during the study period and who took some, but not all, 14 tests: the unobserved probability of receiving a first false-positive result among the censored tests of various modalities follows a geometric distribution with a constant hazard.
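The constant-hazard assumption has a simple closed form. As an illustration only (how the model of reference 27 applies it in detail is not reproduced here), if each unobserved test carries the same probability h of yielding a first false positive, the probability of at least 1 false positive within k such tests is 1 − (1 − h)^k. The function name and the example value of h are hypothetical.

```python
def prob_first_false_positive_within(k: int, hazard: float) -> float:
    """P(first false positive occurs within k tests) under a geometric distribution."""
    if k < 0 or not 0.0 <= hazard <= 1.0:
        raise ValueError("k must be >= 0 and hazard must lie in [0, 1]")
    return 1.0 - (1.0 - hazard) ** k


# Hypothetical example: 3 unobserved tests, each with a 10% chance of a first false positive.
example_probability = prob_first_false_positive_within(3, 0.10)
```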
This method was also used to determine the cumulative risk of any single invasive diagnostic procedure resulting from a false-positive test result. Sensitivity analysis was performed by using a recently published modified Kaplan-Meier approach.28
RESULTS
The demographic characteristics of the PLCO trial have been described elsewhere.23,29 There were 68,436 participants who underwent at least 1 screening test and had adequate follow-up (Figure 1). Most participants (65%) ranged in age from 55 to 64 years, were white, non-Hispanic (88.3%), and had a college education or higher (57.5%). Participant sex was equally distributed (50% men, 50% women).
False-Positive and Follow-up Diagnostic Procedure Rates
The maximum number of false-positive results observed in a woman was 8, and in a man, 10, from a potential 14 (Table 2). Of the 68,436 participants, 43.1% (n = 29,517) had 1 or more false-positive findings: 35.3% of women and 50.9% of men. In men, flexible sigmoidoscopy accounted for the greatest percentage of false-positive results (26.8%), then chest radiograph (18.6%), DRE (15.0%), and PSA (10.4%). In women, flexible sigmoidoscopy accounted for the greatest percentage of false-positive results (17.2%), then chest radiograph (16.3%), transvaginal ultrasonography (7.8%), and CA-125 (3.0%) (Table 3).
Of the participants only 3.0% had minimally invasive diagnostic procedures, and 15.8% underwent moderately invasive procedures; 1.6% of participants incurred major surgery. When examining diagnostic responses by screening modality, flexible sigmoidoscopy resulted in the highest rates of minimally and moderately invasive diagnostic follow-up (for men, 3.4% and 15.8%, respectively; for women, 3.0% and 10.1%, respectively). Transvaginal ultrasonography accounted for the preponderance of major surgeries resulting from false-positive findings. Of the screened women 3% underwent a major surgical procedure for false-positive findings on a transvaginal sonogram.
Cumulative Risk of a False-Positive Result
For men, the risk of having at least 1 false-positive finding is 36.7% (95% CI, 36.2%–37.3%) by the 4th screening test (the end of the first day of screening), and 60.4% (95% CI, 59.8%–61.0%) by the 14th examination. Women have a 26.2% risk (95% CI, 25.7%–26.8%) by the 4th screening test, and 48.8% risk (95% CI, 48.1%–49.4%) by the 14th examination. Sensitivity analysis by modified Kaplan-Meier approach provides similar results at 14 tests: 57.2% for men (95% CI, 56.3%–58.2%) and 44.1% for women (95% CI, 43.4%–44.7%). The cumulative risk of receiving at least 1 false-positive result in the 14 tests is displayed in Figure 2, panel A. The exact estimates and 95% confidence intervals for each test number (t) can be found in Supplemental Table 1, available online at http://www.annfammed.org/cgi/content/full/7/3/212/DC1.
When comparing individual screening modalities, flexible sigmoidoscopy accounted for the highest cumulative false-positive risk: 41.8% (95% CI, 40.9%–42.7%) for men and 29.2% (95% CI, 28.3%–30.0%) for women after 2 examinations. Chest radiograph accounted for the next highest false-positive risk, with a 22.3% (95% CI, 21.4%–23.2%) probability after 4 tests for men and a 21.5% (95% CI, 20.6%–22.4%) probability for women. The cumulative risk of receiving at least 1 false-positive result by screening modality is displayed in Figure 2, panel B. The exact estimates and 95% confidence intervals for each test number (t) can be found in Supplemental Table 2, available online at http://www.annfammed.org/cgi/content/full/7/3/212/DC1.
Cumulative Risk of an Invasive Diagnostic Procedure Resulting from a False-Positive Result
For a woman, the cumulative risk of undergoing a false-positive–prompted invasive diagnostic procedure* is 12.3% (95% CI, 11.8%–12.8%) after 4 tests, and 22.1% (95% CI, 21.4%–22.7%) after 14 examinations. For men, the risk is slightly higher: 17.2% (95% CI, 16.7%–17.6%) after 4 tests, and 28.5% (95% CI, 27.8%–29.3%) after 14 examinations. Sensitivity analysis confirmed similar percentages: for women, the risk after 14 tests is 20.6% (95% CI, 20.4%–20.9%); for men, 27.5% (95% CI, 27.2%–27.9%). The cumulative probability of receiving at least 1 invasive diagnostic procedure is displayed in Figure 3, panel A. The exact estimates and 95% confidence intervals for each test number (t) are shown in Supplemental Table 3, available online at http://www.annfammed.org/cgi/content/full/7/3/212/DC1.
We also compared cumulative risks of false-positive–prompted invasive diagnostic interventions by individual modalities. Flexible sigmoidoscopy accounts for the greatest probability of a false-positive–prompted invasive diagnostic procedure: a 30.1% (95% CI, 29.2%–31.1%) probability after 2 tests for men, and a 22.1% (95% CI, 21.2%–23.0%) probability for women. A false-positive transvaginal sonogram is associated with the next highest risk, with a 6.7% (95% CI, 5.6%–7.7%) probability after 4 tests. The probability of receiving at least 1 invasive diagnostic procedure by screening modality is displayed in Figure 3, panel B. The exact estimates and 95% confidence intervals for each test number (t) are shown for all modalities in Supplemental Table 4, available online at http://www.annfammed.org/cgi/content/full/7/3/212/DC1.
DISCUSSION
The cumulative risk of an individual obtaining a false-positive result in a multimodal screening program increases with the number of screening tests; by the 4th screening test (which in the PLCO trial would mean the end of day 1), the risk is about 37% for men and 26% for women. By the 14th test, the risk is approximately 60% for men and 49% for women. The risk of undergoing any false-positive–prompted invasive diagnostic procedure is about 17% for men and 12% for women after 4 tests, and 29% for men and 22% for women after 14 screening tests.
Historically, recommendations for individual screening modalities have varied and will likely continue to vary across health care systems. The overarching principle shown by this study, however—that of rapidly increasing cumulative false-positive rates associated with multiphasic screening—remains important and underappreciated when recommending screening. To our knowledge, this study is the first to use randomized trial data to estimate the general magnitude of false positives that might be expected with a multimodal screening program; the fundamental approach and techniques used in this work should be applicable across a wide range of screening regimens.
The PLCO trial does not include an evaluation of mammography; however, as this modality is in widespread use, the associated false-positive rates are of clear importance to women and clinicians. Elmore et al estimated the cumulative incidence of false positives associated with mammography to be 49.1% after 10 examinations.14 To put this estimate in context with our own findings, the cumulative incidence of a false-positive mammogram was estimated at approximately 18% after 3 examinations (the maximum number that would have been observed during our study’s timeframe). Although this percentage is not directly additive to our cumulative incidence curve, it is a mathematical certainty that adding another testing modality would increase the final risk of obtaining at least 1 false positive for women.
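To see why an added modality can only raise the cumulative risk, consider a rough calculation. It assumes, purely for illustration, that false positives from mammography are independent of those from the PLCO tests, which the data here cannot confirm; the combined figure is therefore a back-of-the-envelope value and not a study result. The function and variable names are hypothetical.

```python
def combined_risk(risk_a: float, risk_b: float) -> float:
    """P(at least 1 false positive from either program), assuming independence."""
    return 1.0 - (1.0 - risk_a) * (1.0 - risk_b)


plco_women_14_tests = 0.488  # reported cumulative risk for women after 14 tests
mammography_3_exams = 0.18   # approximate estimate cited in the text (3 examinations)

# Illustrative only: relies on the independence assumption stated above.
combined = combined_risk(plco_women_14_tests, mammography_3_exams)
```

Under this assumption the combined risk comes to roughly 58%, which is higher than either component alone but lower than their simple sum (about 67%), consistent with the statement that the percentages are not directly additive.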
Our study was a test-to-event rather than time-to-event analysis: we included any participant who took at least 1 screening examination sometime during the course of the 3 rounds (or 14 possible tests), and we analyzed the test results by total number rather than by time elapsed to allow for appropriate handling of missing tests while maximizing the use of available data. Individuals in the intervention arm who received no screening examinations were not included in the results: we chose this approach because we believe it to be most accurate and informative for the real-world scenario of a patient visiting a clinic with the intention of being screened and who wants to understand the risks involved if he or she undergoes at least some proportion of a multiphasic program of screening.
Additionally, not all participants received all 14 tests; as a result, the calculated false-positive cumulative incidence curves may underestimate true false-positive rates for a multiphasic screening program with 100% compliance. We believe, however, that the population included in this analysis—with good but not perfect compliance—and the cumulative risks obtained from this population best approximate what could be expected in real-world practice.
Because a given individual could receive a different screening modality at each point along the cumulative risk curves, the curves show the general accumulated burden better than they serve as precise indicators of risk at a given time. That is, the curves provide rule-of-thumb estimates at the point of each screening examination taken (t), but most accurately predict exact percentages at the end of the first day of screening and once all of the tests have been undergone (at t = 14). The curves provide rough approximations of time-to-event: if all tests are taken according to protocol, estimated rates at t = 4 are equivalent to the end of day 1 of testing, rates at t = 7 are equal to year 1 of testing, rates at t = 10 are equal to year 2 of testing, and rates at t = 14 are equal to year 3 of testing. Our methods, however, maximize the use of available data from variably compliant individuals and analyze tests by total number rather than elapsed time. For that reason, it is more accurate to interpret results by final number of tests taken than by years within the screening program.
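The correspondence between test number and screening round follows from the per-protocol schedule described in the Methods (4 tests at baseline, 3 each at years 1 and 2, and 4 at year 3). A minimal sketch, with hypothetical variable names, assuming full compliance and a participant eligible for every test:

```python
from itertools import accumulate

# Maximum number of tests offered per round under the protocol described in the Methods.
tests_per_round = {"baseline (day 1)": 4, "year 1": 3, "year 2": 3, "year 3": 4}

# Cumulative test number (t) reached at the end of each round.
cumulative_t = dict(zip(tests_per_round, accumulate(tests_per_round.values())))
# {'baseline (day 1)': 4, 'year 1': 7, 'year 2': 10, 'year 3': 14}
```

For participants who skipped tests or were not eligible for every examination, a given value of t falls later in calendar time than this mapping suggests, which is why results are better read by number of tests taken.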
A steep rise in the cumulative risk for false-positive examination results can be seen during the initial screening session (tests 1 to 4, Figure 2, panel A). There is likely to be a subset of healthy individuals in the general population who do not have cancer but chronically express a benign abnormality detectable by blood test, radiographic test, or endoscopy. Upon initial screening, these individuals would likely have been immediately evaluated further, and once a false positive was identified, these individuals did not contribute further to the overall multimodal risk curve.
The classification of follow-up procedures into 3 categories is subjective in that the diagnostic tests performed were widely disparate; different clinicians may have conflicting opinions about what relegates a procedure to a minimally or moderately invasive category. For this reason, we have presented the full list of procedures and how they fit in our own classification system for the sake of transparency.
Clinicians may have varying thresholds for what they consider a true- vs false-positive result. Some researchers currently estimate a 5- to 15-year lead-time for PSA and prostate cancer.30 Our study used a standardized 3-year negative follow-up period across all modalities to define a false positive; we did not feel it appropriate, absent definitive knowledge, to vary the time-interval criteria for a false positive by individual modality.
In the case of colorectal cancer, as previously noted, some clinicians argue that any adenoma, regardless of size, should be considered a true-positive result. We performed a sensitivity analysis to examine the effect that including any adenoma would have on false-positive rates in the PLCO trial. The percentage of men receiving false-positive flexible sigmoidoscopy results using this broader definition changes from 26.8% to 20.2%, and for women, from 17.2% to 13.5%. (In this situation, the remaining false positives encompassed such findings as inflammation, hyperplastic polyps, or a lack of demonstrable lesion on diagnostic follow-up.) The percentage of participants receiving at least 1 false positive of any modality shifts from 43.1% to 39.5% (for men, 50.9% to 46.6%; for women, 35.3% to 32.3%). If, conversely, the definition of a true-positive is restricted to overt cancer only, as some PLCO investigators favored, the percentage of men receiving at least 1 false-positive flexible sigmoidoscopy result increases to 30.3%; in women, 19.3%. For all modalities combined, the absolute percentage of participants receiving at least 1 false positive rises to 45.2% (men, 53.5%; women, 36.8%). Establishing a threshold for referral to colonoscopy after abnormal flexible sigmoidoscopy is complex. Randomized trials in Europe used biopsy at the time of flexible sigmoidoscopy to assist in triaging patients to reduce the number sent for colonoscopy,31,32 an approach not used in the PLCO trial.33 Ultimately, understanding the benefit of a higher compared with a lower rate of referral for colonoscopy after flexible sigmoidoscopy must await the outcome of the randomized trials.
Since the PLCO trial began, the use of flexible sigmoidoscopy in the United States has become less common, while the use of screening colonoscopy has increased substantially with the introduction of Medicare reimbursement coverage for this procedure.34 The Centers for Disease Control and Prevention estimates that approximately 2.8 million flexible sigmoidoscopy examinations were performed in 2002; flexible sigmoidoscopy also remains a recommended screening modality by many professional organizations, including the US Preventive Services Task Force and, jointly, the American Cancer Society, the US Multi-society Task Force on Colorectal Cancer, and the American College of Radiology.35–37 Because colonoscopy allows evaluation of more of the total area of the colon than does flexible sigmoidoscopy (and thus, has the potential to discover more abnormalities of uncertain significance), it is likely that substituting this modality would have increased the cumulative incidence of false positives. Our observed flexible sigmoidoscopy false-positive rate could thus be considered a conservative estimate of the false-positive risks for colonoscopy.
The rate of prostatic biopsy follow-up (ie, the percentage of moderately invasive procedures) for abnormal PSA results may appear low to some in the urologic community (44%). This percentage does not include men whose prostatic biopsies resulted in a diagnosis of cancer (ie, true positives), and thus it represents only a fraction of the total number of biopsies performed in the PLCO trial. Our cutoff point for an abnormal PSA was 4 ng/mL; however, because there is no absolute value below which a man can be assured he does not have prostate cancer,38 some researchers and clinicians have used lower cut-points for screening (between 2.5 and 4 ng/mL). Attempting to increase sensitivity in this manner would likely increase the cumulative false-positive rates we report here.
Our study has several limitations. The PLCO trial examines a series of cancer screening tests, but not all of these modalities are used by a majority of the public. Nevertheless, all have been publicly advocated at one time by professional organizations or advocacy groups. The trial participants may not be entirely representative of the general population in that by agreeing to join a long-term cancer screening trial, they may be more involved in health promotion activities and may have higher screening compliance rates than nonparticipants.29
The false-positive and diagnostic follow-up rates found in this study have important practical implications, especially given that the study was limited to 4 rounds of screening examinations. Most formal screening recommendations are open-ended; none advise stopping after 4 rounds. Given the rate at which false-positive results can accumulate in multimodal screening programs and the potential iatrogenic burden these regimens can generate in healthy individuals, we propose that future guidelines begin determining risk-benefit equations for entire screening regimens rather than continuing to evaluate individual tests separately.
This study has developed a clearer picture in a previously unexplored aspect of burden and risks associated with scheduled multimodal cancer screening programs.
The benefit of regular combined-modality cancer screenings in reducing mortality is not yet known; we will not have that information until the conclusion of the PLCO trial. Physicians and patients must therefore examine the balance of known risks vs potential benefits to determine the most appropriate course of action for each individual.
Acknowledgments
The authors wish to thank all of the men and women who agreed to participate in this landmark clinical trial, and all of the centers, researchers, clinicians, and health professionals that have run the trial. We are grateful to the PLCO Steering Committee for providing exceptionally useful feedback as this study was being developed.
Footnotes
* Invasive categories were previously defined as minimally invasive procedure, moderately invasive procedure, or major surgical procedure. These categories exclude repeated screening examinations, chart reviews, physical examinations, and basic radiographic imaging.
* Drs Croswell and Kramer contributed equally as lead authors of this article.
Conflicts of interest: All authors report no potential conflicts of interest relevant to this article.
Funding support: Funding for the overall Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial was provided by the US National Cancer Institute. No additional funding was provided for the design and conduct of this study.
Trial registration: Screening for Cancer of the Prostate, Lung, Colon, Rectum, or Ovaries in Older Patients. NCT00002540. http://clinicaltrials.gov.
Disclaimer: The views expressed in this article are those of the authors and do not necessarily represent the views of the US Federal government or the National Institutes of Health.
- Received for publication April 30, 2008.
- Revision received August 15, 2008.
- Accepted for publication August 20, 2008.
- © 2009 Annals of Family Medicine, Inc.