Abstract
PURPOSE We assessed whether multivariate models and clinical decision rules can be used to reliably diagnose influenza.
METHODS We conducted a systematic review of MEDLINE, bibliographies of relevant studies, and previous meta-analyses. We searched the literature (1962–2010) for articles evaluating the accuracy of multivariate models, clinical decision rules, or simple heuristics for the diagnosis of influenza. Each author independently reviewed and abstracted data from each article; discrepancies were resolved by consensus discussion. Where possible, we calculated sensitivity, specificity, predictive value, likelihood ratios, and areas under the receiver operating characteristic curve.
RESULTS A total of 12 studies met our inclusion criteria. No study prospectively validated a multivariate model or clinical decision rule, and no study performed a split-sample or bootstrap validation of such a model. Simple heuristics such as the so-called fever and cough rule and the fever, cough, and acute onset rule were each evaluated by several studies in populations of adults and children. The areas under the receiver operating characteristic curves were 0.70 and 0.79, respectively. We could not calculate a single summary estimate, however, as the diagnostic threshold varied among studies.
CONCLUSIONS The fever and cough, and the fever, cough, and acute onset heuristics have modest accuracy, but summary estimates could not be calculated. Further research is needed to develop and prospectively validate clinical decision rules to identify patients requiring testing, empiric treatment, or neither.
- Influenza
- clinical decision rule
- systematic review
- diagnosis
- prediction model
- decision model
- clinical rule
INTRODUCTION
Accurate diagnosis of influenza is important for several reasons. If the probability of disease exceeds the treatment threshold or is below the testing threshold, no further testing is needed. If office-based testing is performed, its interpretation depends on the pretest probability of disease. In addition, although a systematic review found that neuraminidase inhibitors are of only modest benefit in patients with undifferentiated influenza-like illness, greater benefit was seen in patients who actually had laboratory-confirmed influenza.1 Accurate diagnosis is also helpful because it enables a more accurate prognosis, implicitly rules out other diagnoses, and guides patient education. Two previous meta-analyses,2,3 however, showed that individual findings on the history and physical examination have only modest accuracy for the clinical diagnosis of influenza (Table 1). These studies did find that certain combinations of variables, such as fever plus cough, fever plus cough plus acute onset,3 and fever plus presentation within 3 days,2 had positive likelihood ratios for influenza between 4.0 and 5.4. These results suggest that clinical decision rules (CDRs) that integrate data from several clinical findings and are developed using multivariate methods might be helpful.
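For reference, the likelihood ratios cited here and throughout this review update the pretest probability through the odds form of Bayes' theorem, a standard relationship restated below:

```latex
\text{pretest odds} = \frac{p_{\text{pre}}}{1 - p_{\text{pre}}}, \qquad
\text{posttest odds} = \text{pretest odds} \times LR, \qquad
p_{\text{post}} = \frac{\text{posttest odds}}{1 + \text{posttest odds}}
```

For example, a positive likelihood ratio of 5 raises a 20% pretest probability to roughly 56% (odds 0.25 × 5 = 1.25, and 1.25/2.25 ≈ 0.56).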
Economic analyses have shown that diagnostic testing is cost-effective only when the pretest probability of influenza is low or intermediate, whereas empiric therapy may be appropriate for patients seeking care within 36 hours of symptom onset if the pretest probability is high.4–6 CDRs to accurately identify patients who are at low, moderate, or high risk of influenza could thus identify patients for whom testing or empiric therapy may be appropriate, and others who need neither. Such identification would help physicians avoid unnecessary testing and treatment, and potentially reduce health care costs. The goal of this study was therefore to identify and evaluate the accuracy and validity of existing CDRs for the diagnosis of influenza.
METHODS
We undertook a systematic review of studies reported between 1962 and 2010 evaluating combinations of symptoms and CDRs for the diagnosis of influenza. As this was a secondary literature review, institutional review board approval was not required. We defined a CDR as a point score, equation, or algorithm developed using multivariate methods. We limited our search to CDRs using elements of the history and physical examination, including vital signs. We performed a multipronged search of the relevant medical literature. To be included in our review, a study had to (1) provide data on the accuracy of a combination of symptoms or CDR using elements of the history and physical examination in patients with respiratory tract infection, (2) enroll patients prospectively using a cohort (not case-control) design, and (3) use an adequate reference standard. We defined an adequate reference standard as any reference laboratory test for the diagnosis of influenza. As point-of-care tests are not sufficiently sensitive3 to be appropriate reference standards, we excluded studies taking that approach.
Our initial search of PubMed used the following strategy: (influenza[Title/Abstract]) AND (diagnosis[Title/Abstract]) AND (multivariate[Title/Abstract] OR logistic[Title/Abstract] OR “prediction model”[Title/Abstract] OR “decision model”[Title/Abstract] OR “decision rule”[Title/Abstract] OR “clinical model”[Title/Abstract] OR “clinical rule”[Title/Abstract]), limited to articles with abstracts and to human studies.
This search yielded 45 studies, of which 7 appeared potentially relevant and were reviewed in more detail.7–13 Next, we searched the Clinical Queries feature of PubMed using the parameters “Clinical prediction guide (narrow)” and “influenza.” This search yielded 181 articles, of which 5 were possibly relevant,14–18 but on closer review, all dealt with the prognosis rather than the diagnosis of influenza. Next, we used the “Related articles” feature of PubMed’s Clinical Queries service to find studies indexed with key words similar to those of a particularly relevant study, that of Stein and colleagues.8 This process yielded 244 articles, of which 9 were potentially relevant; 5 of these had not been found by the other search strategies.2,3,19–21 A search of the references of the 12 articles deemed potentially relevant identified 10 articles for closer review and 3 additional studies for inclusion.22–24 Finally, we searched Google Scholar using the terms “influenza clinical decision” but did not identify any new studies among the first 200 returned results. The Cochrane Controlled Trials Register was not searched because it is limited to studies of therapy.
Studies were abstracted in parallel and discrepancies were resolved by consensus. Study design elements (eg, size of the study, reference standard) were evaluated for each study to assess its validity. We queried authors where data were missing for multivariate models, such as the cutoff for an abnormal score, but had limited success. The search was initially performed on January 26, 2010, and was repeated on July 8, 2010, as part of the revision of this article. One potential study25 was identified during that final search, but was not included because it did not meet the inclusion criterion of prospective data collection.
RESULTS
Our preliminary search identified 12 studies that, on review of the abstract, appeared to meet the inclusion criteria. We excluded several articles after review of the full publication because they used a case-control design,26 did not gather original data,2 or provided information only on the predictive accuracy of a white blood cell count,9 which we did not consider a CDR. This process left 9 studies that met our inclusion criteria. A review of the bibliographies of the 12 studies initially deemed relevant identified an additional 10 articles that appeared to fit the inclusion criteria, of which 3 were included after review of the full article. We ultimately included a total of 12 studies in the systematic review (Table 2).
The prevalence of influenza in the included studies ranged from 6.6% to 79%, and studies had widely varying inclusion criteria. Some included only children,20 whereas others included only older adults.13,21 Most were conducted in the outpatient setting, although 1 study was limited to emergency department patients20 and 2 studies were limited to inpatients.21,22 One study13 prospectively followed a large group of elderly patients through the flu season and recorded symptoms of those who sought care (many of the patients never had any respiratory symptoms during the study period).
The studies were generally of good quality (Table 3), in part because of our inclusion criteria. Although a number of studies developed multivariate models, none included any kind of validation of these models, whether prospective or by a split-sample or bootstrap procedure (Table 4).
Variables common to 2 or more studies included fever, cough, headache, and vaccination status. Some studies used variables that were difficult to generalize, such as the week of the study,7 or used definitions that were not reproducible, such as “increased influenza activity.”10 A number of studies did not report details of the multivariate model, such as the constant, β coefficients, odds ratios, and the cutoff for defining a positive test,7,8,10,12,13 whereas others did not report the accuracy of the model in terms of sensitivity, specificity, predictive value, likelihood ratios, or area under the receiver operating characteristic curve.13,21
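The sketch below illustrates why these reporting gaps matter: to apply a multivariate model at the point of care, a clinician needs the constant, the β coefficients, and the cutoff that defines a positive result. All values here are hypothetical and chosen only for illustration; none are taken from the studies reviewed.

```python
import math

# Hypothetical logistic-regression CDR for influenza. The constant, the
# beta coefficients, and the cutoff below are invented for illustration;
# they are NOT taken from any study in this review.
CONSTANT = -2.0
BETAS = {
    "fever": 1.1,        # eg, temperature >38.0 degrees C
    "cough": 0.9,
    "acute_onset": 0.7,  # eg, onset within 48 hours
    "vaccinated": -0.4,
}
CUTOFF = 0.5  # predicted probability at or above which the rule is "positive"


def predicted_probability(findings):
    """Convert presence (1) or absence (0) of findings into a probability."""
    logit = CONSTANT + sum(beta * findings.get(name, 0)
                           for name, beta in BETAS.items())
    return 1.0 / (1.0 + math.exp(-logit))


def rule_positive(findings):
    """The rule is usable only if the cutoff for a positive result is reported."""
    return predicted_probability(findings) >= CUTOFF


patient = {"fever": 1, "cough": 1, "acute_onset": 1, "vaccinated": 0}
print(predicted_probability(patient), rule_positive(patient))  # ~0.67, True
```

Without the constant, coefficients, and cutoff, none of these steps can be reproduced, which is why such models cannot be validated or applied by others.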
Several studies evaluated simple clinical heuristics such as the fever and cough rule; the fever, cough, and acute onset rule; and the cough, headache, and pharyngitis rule (Table 5). The fever and cough rule was evaluated in 5 studies, and the fever, cough, and acute onset rule was evaluated in 4 studies. The positive likelihood ratio for these rules ranged from 1.7 to 6.5, and the negative likelihood ratio ranged from 0.3 to 0.8. The ratio of the positive to the negative likelihood ratio (a measure of the ability to discriminate between diseased and nondiseased individuals) ranged from 3.6 to 21.7.8,11–13,21,22 The area under the receiver operating characteristic curve was 0.70 for the fever and cough rule (Figure 1) and 0.79 for the fever, cough, and acute onset rule (Figure 2). However, the studies were too heterogeneous, and the diagnostic threshold varied too extensively, to estimate summary measures of accuracy for these simple heuristics. In general, surveillance studies and those with broader inclusion criteria had lower sensitivity and higher specificity,8,13,22 whereas studies enrolling patients with influenza-like illness had higher sensitivity but lower specificity.11,12,21 Only a single study evaluated a point score,13 and that study had by far the lowest prevalence of influenza, because patients were enrolled and then reported any symptom occurring during flu season. No study prospectively validated a point score or multivariate model, or used a split-sample or bootstrap procedure to evaluate such a model.
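The accuracy measures quoted above follow directly from a rule's 2×2 table against the reference standard. The short sketch below shows how they are derived; the counts are hypothetical and do not come from any included study.

```python
# Hypothetical 2x2 table for a fever-and-cough style rule against a
# laboratory reference standard (counts invented for illustration only).
true_pos, false_neg = 60, 40    # patients with laboratory-confirmed influenza
false_pos, true_neg = 30, 170   # patients without influenza

sensitivity = true_pos / (true_pos + false_neg)   # 0.60
specificity = true_neg / (true_neg + false_pos)   # 0.85
lr_positive = sensitivity / (1 - specificity)     # 4.0
lr_negative = (1 - sensitivity) / specificity     # ~0.47
lr_ratio = lr_positive / lr_negative              # ~8.5, the discrimination measure

print(f"Sens {sensitivity:.2f}, Spec {specificity:.2f}, "
      f"LR+ {lr_positive:.1f}, LR- {lr_negative:.2f}, LR+/LR- {lr_ratio:.1f}")
```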
DISCUSSION
Although influenza is common and an important source of morbidity and mortality, studies of the diagnosis of this infection are largely small, use varied inclusion criteria and reference standards, and do not report their results in a way that would assist clinicians. In many cases, the inclusion criteria for the study (fever plus at least 1 other symptom) are also among the variables being evaluated for their accuracy, a potential source of bias. No study has prospectively evaluated a clinical score, CDR, or multivariate model.
Only simple clinical heuristics, such as the fever and cough rule and the fever, cough, and acute onset rule, have been prospectively validated. Their sensitivity and specificity varied considerably, however, and it was not possible to calculate summary measures of accuracy for these rules. In part, this inability was due to varying selection criteria and to the use of the variables under evaluation as part of the inclusion criteria for the patients studied. It could be argued that it is inappropriate to combine data from a population-based study with a low prevalence of influenza13 with data from 3 outpatient studies. It is interesting, however, that the positive and negative likelihood ratios in the studies of Govaert et al13 and Stein et al8 were almost identical despite their different populations. In addition, the results of all 5 studies evaluating the fever and cough rule closely follow the same receiver operating characteristic curve, suggesting that they measure the same underlying construct. Because all 5 studies defined fever similarly (a temperature greater than 37.8°C or 38°C), the variation in accuracy more likely reflects differences in how cough was implicitly defined or measured across studies.
The findings of our systematic review provide guidance for the design and conduct of future studies. For example, polymerase chain reaction should be used as the reference standard rather than culture because of its greater sensitivity for the detection of influenza.10,28 Future studies should also ensure that they have an adequate sample size and include a broad range of patients with either respiratory tract infection or suspected influenza (without regard to whether they have fever, cough, or other symptoms).
Studies to date of influenza diagnosis have not prospectively validated multivariate models, an important next step in this area of research. In addition to multivariate models, researchers should validate point scores (based on multivariate models) that are simpler to use at the point of care. They should also explore alternative analytic methods, such as classification and regression trees and artificial neural networks. The latter have been widely used in biomedical research to develop classification and pattern recognition tools.29 Originally designed to mimic the behavior of neurons, these networks are “trained” on one set of data and tested or validated on another. Mathematically, they are similar to a fully saturated multivariate model. Although these networks are prone to overfitting, the bootstrap procedure can identify the point at which error in the test set begins to rise because of overfitting.
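As a rough illustration of the split-sample validation that the included studies lacked, the sketch below derives a logistic model on one random half of a simulated cohort and measures discrimination on the held-out half. The data are entirely synthetic and the use of scikit-learn is only one possible implementation; nothing here represents a model proposed in the literature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Simulated cohort: 4 binary findings (eg, fever, cough, acute onset,
# headache) and laboratory-confirmed influenza status. Purely synthetic.
n = 1000
X = rng.integers(0, 2, size=(n, 4))
logit = -2.0 + 1.1 * X[:, 0] + 0.9 * X[:, 1] + 0.7 * X[:, 2] + 0.2 * X[:, 3]
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

# Split-sample validation: derive the model on one half of the data and
# measure discrimination (area under the ROC curve) on the held-out half.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

auc_derivation = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
auc_validation = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC: derivation {auc_derivation:.2f}, validation {auc_validation:.2f}")
```

A validation AUC appreciably lower than the derivation AUC signals overfitting, which is precisely the information that none of the included studies reported.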
When evaluating a patient with acute respiratory tract infection, a clinician must use information from the history and physical examination to decide among 3 courses of action: (1) rule out influenza and consider other diagnoses, (2) order a point-of-care test, or (3) treat empirically for influenza. These options are illustrated in the threshold diagram in Figure 3. Researchers have not adequately considered this clinical context. In addition to developing and validating a model, future researchers should identify the most useful test and treatment thresholds, either by surveying physicians to determine when they are comfortable ruling out or ruling in influenza, or by performing a quantitative analysis based on the harms and benefits of testing and treating.30 Once established, these thresholds would be used to guide model development. Models for the diagnosis of influenza should identify at least 3 groups for whom different management strategies are indicated: low risk (do not pursue further testing or treatment for influenza), moderate risk (consider confirmatory testing), and high risk (give empiric therapy if the patient is seeking care within 48 hours of symptom onset).
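A minimal sketch of this threshold logic follows. The test and treatment thresholds are hypothetical placeholders; the actual values would have to be established by physician survey or formal decision analysis, as described above.

```python
# Hypothetical test and treatment thresholds; the actual values would need
# to be derived by physician survey or harm-benefit (decision) analysis.
TEST_THRESHOLD = 0.10   # below this probability: neither test nor treat
TREAT_THRESHOLD = 0.50  # at or above this probability: consider empiric therapy


def management_strategy(probability_of_influenza, hours_since_onset):
    """Map an estimated probability of influenza to one of the 3 actions."""
    if probability_of_influenza < TEST_THRESHOLD:
        return "rule out influenza; consider other diagnoses"
    if probability_of_influenza < TREAT_THRESHOLD:
        return "order a point-of-care test"
    if hours_since_onset <= 48:
        return "treat empirically for influenza"
    # Beyond 48 hours the benefit of empiric antiviral therapy is reduced;
    # management would then depend on clinical judgment.
    return "high probability, but presentation too late for empiric therapy"


print(management_strategy(0.76, hours_since_onset=24))
```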
What are clinicians to do? During influenza season, patients with fever and cough, especially if the onset was acute, have a high likelihood of influenza and do not require further testing unless complications such as pneumonia are suspected. For example, given a 33% pretest probability and using the primary care data from the study by Stein and colleagues,8 the posttest probability of influenza in such a patient is 76%. In the so-called shoulder season leading up to and following the peak of influenza season, a patient with fever, cough, and acute onset has a 42% likelihood of having influenza, assuming a 10% pretest probability; conversely, a patient without this symptom triad has only a 3% likelihood of having influenza. These estimates can help clinicians and patients make informed decisions about care while CDRs undergo more rigorous evaluation.
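As a check on the 76% figure, applying the odds form of Bayes' theorem from the Introduction and assuming a positive likelihood ratio of about 6.5 (consistent with the upper end of the range in Table 5 and with the posttest probability quoted above):

```latex
\text{pretest odds} = \frac{0.33}{0.67} \approx 0.49, \qquad
\text{posttest odds} \approx 0.49 \times 6.5 \approx 3.2, \qquad
p_{\text{post}} = \frac{3.2}{1 + 3.2} \approx 0.76
```

The shoulder-season figures follow the same arithmetic with a 10% pretest probability and likelihood ratios of roughly 6.5 and 0.3, giving posttest probabilities of about 42% and 3%, respectively.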
Footnotes
- Conflicts of interest: authors report none.
- Received for publication April 6, 2010.
- Revision received August 2, 2010.
- Accepted for publication September 1, 2010.
- © 2011 Annals of Family Medicine, Inc.