Abstract
The goal of evidence-based clinical guidelines is to improve the value of health care by recommending treatments with favorable benefit/harm ratios. Achieving this goal requires use of evidence-grading systems that explicitly address strength of evidence in terms of external validity (generalizability), internal validity, and patient-oriented outcomes. To be clinically useful, guidelines should also incorporate patient preferences, particularly when evidence is weak. The National Heart, Lung and Blood Institute recently published Expert Panel Report 3: Guidelines for the Diagnosis and Management of Asthma (EPR-3). This special report addresses the extent to which current guidelines adhere to the principles enunciated above by using EPR-3 as the prime example. EPR-3 used an unconventional evidence-grading system that emphasized precision and consistency (statistical significance, large sample sizes, and/or consistency of results) at the expense of patient-oriented outcomes and generalizability (applicability to the general population). EPR-3 did not report information on numbers needed to treat or numbers needed to harm, which are useful in eliciting patient preferences via shared decision making. Asthma guidelines (and others) are limited by lack of a generalizable research base, flawed evidence grading, and lack of attention to patient preferences. An evidence-grading system based on applicable populations, patient-oriented outcomes, and shared decision making might improve physician and patient guideline adherence and improve asthma outcomes.
Annals Journal Club selection—see inside back cover or http://www.annfammed.org/AJC/.
INTRODUCTION
Effective implementation of guidelines requires 3 interrelated processes: (1) an explicit assessment of the strength of the best available medical evidence (evidence quality), (2) application of clinical judgment in the care of individual patients, and (3) elicitation of patient preferences via shared decision making. Most current evidence-based guidelines address only the first of these 3 processes (evidence quality). Implementation requires the addition of tailoring guidelines to individuals because “evidence alone is never sufficient to make a clinical decision.”1 With a few notable exceptions,2,3 current versions of evidence-based guidelines do not address patient preferences or include tools for shared decision making. Although patient preferences have not historically been at the heart of evidence-based medicine,4 an emerging consensus seems to be developing that patient preferences should be included,5–7 particularly when evidence is weak.8 The growing body of evidence-based shared decision-making tools should facilitate elicitation of patient preferences.9 This line of reasoning suggests that guidelines should present all relevant benefits (as numbers needed to treat [NNT]) and harms (as numbers needed to harm [NNH]) of the best medical evidence in terms understandable to physicians (and patients) in order to facilitate the shared decision-making process.
The National Heart, Lung and Blood Institute recently published its updated guidelines on asthma: Expert Panel Report 3: Guidelines for the Diagnosis and Management of Asthma (EPR-3).10 The first expert panel report, published in 1991,11 was criticized by Berg and Moy12 for lack of an evidence-based process for developing and communicating the guidelines. The EPR-3, the third such report, states that it has overcome these critical flaws by adopting an evidence-based methodology. This special report aims to illustrate strengths and deficiencies of current evidence-based guidelines using EPR-3 as the prime example. It should be noted, however, that some or all of the guideline limitations discussed in this analysis may apply to other disease guidelines as well as to asthma.13
ASSESSING EVIDENCE QUALITY: 4 IMPORTANT QUESTIONS
Practitioners and patients assessing evidence-based guidelines should ask themselves at least 4 questions: Are study results true, correct, or valid (internal validity); do the results relate to something that I or my patients value (patient-oriented outcomes); are the results applicable to me or to my patients (generalizability or external validity); and are all clinically relevant benefits (and harms) considered?1 Evidence-grading systems include a hierarchy of evidence that usually acknowledges the primacy of randomized controlled trials (RCTs), then nonexperimental (observational) studies, then expert opinion.1 Some systems, including the Strength of Recommendation Taxonomy (SORT),14–16 additionally emphasize the primacy of patient-oriented (eg, morbidity, mortality, quality of life) outcomes, then surrogate (eg, laboratory, histopathologic) outcomes. Regarding generalizability, evidence-grading systems and guidelines may neglect to emphasize that the results of internally valid RCTs using patient-oriented outcomes may not be applicable to the populations specified in a guideline recommendation and should not always be assumed to provide high-quality evidence for therapy recommendations.13 Clinicians must therefore remain vigilant about the validity of recommendations of guidelines that claim to be evidence based. In addition, clinicians should be aware that some guidelines contain considerable material that has not been critically assessed for evidence quality (see Supplemental Appendix 1, available online at http://www.annfammed.org/cgi/content/full/7/4/364/DC1 for further details).
EVIDENCE-GRADING SYSTEMS
The EPR-3 described an explicit literature retrieval, review, and evidence-grading process for the sections of the guideline covering the 4 components of asthma management: (1) measures of asthma assessment and monitoring, (2) education for a partnership in asthma care, (3) control of environmental factors and comorbid conditions that affect asthma, and (4) medications. The EPR-3 ranked newer evidence using the evidence-grading system described in Table 1. An evidence category A recommendation for randomized trials required “a consistent pattern of findings in the population for which the recommendation was made” and on “substantial numbers of studies involving substantial numbers of participants.” The system assigned primacy to quantity and consistency of evidence without mentioning generalizability or patient-oriented outcomes. Inexplicably, the EPR-3 system downgraded evidence-based meta-analyses, including those produced by the Cochrane Collaboration, to B-level evidence. In support of this strategy, EPR-3 cited a study by Jadad et al,17 but this citation actually reported that Cochrane reviews were more rigorous and better reported than reviews published in peer-reviewed journals or funded by industry. Importantly, the EPR-3 grading system did not explicitly assess the quality of individual studies, and there was no transparency in the linkage between individual study quality and strength of recommendation. These deficiencies are illustrated in example 1. The EPR-3 grading system also insufficiently addressed generalizability, as shown in example 2.
Table 1. Ranking the Evidence: Expert Panel Report 3 (EPR-3): Guidelines for the Diagnosis and Management of Asthma
The SORT evidence-grading system has been adopted by most family medicine journals.14–16 SORT emphasizes the primacy of patient-oriented over surrogate outcomes and explicitly describes criteria for internal validity (Figure 1). In keeping with well-accepted principles of evidence-based medicine, SORT also acknowledges the primacy of evidence-based meta-analyses of high-quality RCTs.1
Figure 1. Strength of Recommendation Taxonomy (SORT): evidence-grading system for individual studies.
Reproduced by permission from the American Board of Family Medicine.
Here are some examples of how the EPR-3 guideline recommendations appear differently when viewed through the SORT lens:
Example 1: The authors recommend various allergen-avoidance measures, including encasing mattresses in allergen-impermeable covers.1 They cite 10 supporting references, including an editorial, a review, a before-after study, a study on rhinitis, a study involving multiple interventions and allergens, and a study excluded from a Cochrane review18 because it included some patients who did not have dust mite sensitivity. The remaining 5 trials did not show a benefit for mattress encasings.18 The EPR-3 did not cite the Cochrane meta-analysis of 49 RCTs (2,733 patients) finding no evidence for effectiveness.18
Example 2: Despite using the phrase “in the populations for which the recommendation was made” (Table 1), the EPR-3 does not acknowledge that evidence-based recommendations for inhaled corticosteroids (ICS) in asthma apply exclusively to nonsmokers. First, current smokers and those who have consumed more than 10 pack-years are routinely excluded from pharmaceutical-sponsored trials in pursuit of Food and Drug Administration approval for ICS in asthma. Second, there are several non–pharmaceutical-funded trials showing that ICS treatment in asthmatic smokers does not improve patient-oriented outcomes.19–21 Furthermore, in nonsmoking asthmatics with less than a 10 pack-year smoking history, a recent trial sponsored by the National Heart, Lung and Blood Institute found that ICS did not improve patient-oriented outcomes in 33 (46%) of 72 asthma patients.22
The EPR-3 guidelines were produced by a group of American experts. A very similar guideline (Global Initiative for Asthma [GINA]) was produced by a group of experts worldwide.23 The GINA report uses the identical EPR-3 grading system (Table 1) but states that it is considering adoption of a different grading system (Grading of Recommendations Assessment, Development and Evaluation [GRADE])5 in future guidelines. GRADE was developed by a widely representative group of international guideline developers.6 Like SORT, GRADE clearly separates evidence quality assessment from strength of recommendation and includes patient-oriented outcomes. Importantly, GRADE also specifies generalizability and explicitly acknowledges values and preferences.8,24–27 For the clinician, GRADE endorses elicitation of patient preferences, particularly when evidence is weak.6 For the policy maker, GRADE incorporates resource allocation and consensus building.26,27 GRADE has been endorsed by the American Thoracic Society but has not yet found its way into contemporary asthma guidelines.28
PATIENT PREFERENCES
The EPR-3 guidelines recommend daily low-dose ICS for mild persistent asthma (p. 343), yet prescribing of ICS for mild persistent asthma is controversial.29 There are few generalizable effectiveness trials that yield information for shared decision making, but those that are available appear to support a weak recommendation. The only available SORT level-1 trial of the population effectiveness of ICS in mild persistent asthma of recent onset is START,30 a large, multinational, randomized effectiveness trial in children and adults that did not exclude smokers. START randomized 7,241 patients in 32 countries to inhaled budesonide or placebo once daily for 3 years. The primary outcome was time to first “severe asthma-related event” defined as admission, emergency treatment, or death from asthma. The 3 components were not reported separately, so it is not possible to judge the actual clinical severity of outcome events, although no deaths were reported. One hundred ninety-eight (5.5%) of 3,568 patients assigned placebo and 117 (3.3%) of 3,597 assigned budesonide had a “severe asthma-related event” (NNT = 44 to prevent 1 exacerbation over 3 years, P <.001). No information was provided on whether smokers and nonsmokers responded differentially to treatment. Regarding harms, low-dose budesonide caused growth retardation in children aged 5 to 15 years (–0.43 cm/y, P <.001). Growth retardation in children taking ICS has also been reported in another study.31 START provides the kind of information needed for shared decision making. Nowhere in the EPR-3 guidelines is there mention of the number needed to treat (NNT), number needed to harm (NNH), or lack of effectiveness of ICS in smokers. Including this kind of information in future guidelines would facilitate shared decision making and would allow individual patients (or their parents) to apply their personal values. 
Clinicians should realize, however, that NNTs derived from poor-quality asthma RCTs may not be valid for shared decision making (see Supplemental Appendix 2, available online at http://www.annfammed.org/cgi/content/full/7/4/364/DC1 for an example).
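The NNT quoted above for START can be reproduced from the reported event counts. The following is a minimal sketch in Python, not part of the original report: the function name and structure are illustrative assumptions, and the convention of rounding the NNT up to the next whole patient is assumed because the source does not state how the figure was rounded.

```python
import math


def number_needed_to_treat(events_control, n_control, events_treated, n_treated):
    """Return (control risk, treated risk, absolute risk reduction, NNT).

    Illustrative helper (hypothetical name). The NNT is the reciprocal of
    the absolute risk reduction, rounded up to a whole patient (assumed
    convention).
    """
    control_risk = events_control / n_control
    treated_risk = events_treated / n_treated
    arr = control_risk - treated_risk
    if arr <= 0:
        raise ValueError("No risk reduction: NNT is undefined for these counts.")
    return control_risk, treated_risk, arr, math.ceil(1 / arr)


# START event counts as reported in the text: 198 of 3,568 on placebo and
# 117 of 3,597 on budesonide had a "severe asthma-related event" over 3 years.
control_risk, treated_risk, arr, nnt = number_needed_to_treat(198, 3568, 117, 3597)

print(f"Placebo risk:    {control_risk:.1%}")
print(f"Budesonide risk: {treated_risk:.1%}")
print(f"Absolute risk reduction: {arr:.2%}")
print(f"NNT over 3 years: {nnt}")
```

Stating the absolute risk reduction alongside the NNT is one way to show patients how small the underlying difference is (roughly 2 events per 100 patients over 3 years), which is the kind of framing shared decision making relies on.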
Shared decision making depends on the quality and completeness of the underlying research, as well as on the completeness of its presentation in guidelines. An analytical review of 504 trials of ICS found significant associations between specific study design characteristics and the likelihood of reporting statistically significant results for adverse effects.32 Compared with other study designs, the likelihood of reporting adverse effects was lower in randomized trials, in those with an efficacy design, in those studying only nonspecific clinical adverse effects, and in studies of children; studies focusing on specific adverse effects—ie, growth, cortisol, and higher steroid doses—were more likely to report adverse effects.32 In a comparison of 275 industry-funded and 229 non–industry-funded studies, industry funding was significantly associated with many of the study designs reporting fewer adverse effects, including more randomized trials, more studies stating efficacy as an aim, more studies of only nonspecific clinical or laboratory results, fewer studies of specific adverse effects, and fewer studies stating safety as an aim or as the only aim.32 In addition to a decreased likelihood for industry-funded studies to report statistically significant adverse effects, the authors of industry-funded studies that did report such adverse effects were less likely than other authors to conclude that these were clinically important. For example, of studies that found statistically significant adverse effects, 41.8% of 79 industry-funded studies vs 11.4% of 132 non–industry-funded studies concluded that the treatment was safe (prevalence ratio 3.68; 95% CI, 2.14–6.33).32 The authors of the analytical review stated:
[E]xtrapolation of statistical to clinical significance is based on subjective criteria, so we cannot estimate if (industry-funded studies) are too benevolent or (non–industry-funded) studies are too cautious. However, we postulate that having information on source of funding will help readers of these studies have a better informed and balanced judgment on the authors’ interpretations.32
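The prevalence ratio cited above (41.8% of 79 vs 11.4% of 132) can be checked with a short calculation. The sketch below is illustrative only and rests on 2 assumptions not stated in the source: the underlying counts are reconstructed from the rounded percentages as 33 of 79 and 15 of 132, and the confidence interval is computed with the standard log-transformed (Katz) method, which may not be the method the review's authors used.

```python
import math


def prevalence_ratio_with_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Return (ratio, lower, upper) for the ratio of two proportions.

    Illustrative helper (hypothetical name). Uses the log-transformed
    (Katz) interval; z=1.96 gives an approximate 95% confidence interval.
    """
    ratio = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(ratio) for two independent proportions.
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Counts reconstructed from the reported percentages (an assumption):
# 41.8% of 79 is about 33 studies; 11.4% of 132 is about 15 studies.
ratio, lower, upper = prevalence_ratio_with_ci(33, 79, 15, 132)

print(f"Prevalence ratio: {ratio:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

Under these assumptions the result agrees with the reported ratio and interval to 2 decimal places, which suggests the reconstructed counts are plausible, although the source does not confirm them.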
A current characteristic of the National Institutes of Health asthma expert panel report guideline development process is that many of the same experts who sit on the panel also perform the research—often industry funded—that forms the basis for the guidelines.
IMPLICATIONS FOR PRIMARY CARE PRACTICE
Do the identified deficiencies negate the guidelines entirely? ICS treatment is currently the cornerstone of asthma control, but there is growing awareness that ICS treatment has significant limitations, as described above. Will patients who stop smoking regain ICS responsiveness? This important clinical question cannot be answered at this time because this particular research question will never be recognized by an asthma research agenda that continues to exclude smokers. Other parts of the guideline, including asthma self-management, are supported by better evidence. One wonders, however, how much better management outcomes would be if the limitations of ICS treatments were recognized and addressed.
The implications for primary care in general are as follows: (1) be vigilant and skeptical when reading evidence-based guidelines; (2) in particular, recognize the limitations of current asthma treatments; (3) share this information in a balanced way with your patients; and (4) at every opportunity be an advocate for more clinically relevant research. Look for shared decision-making tools that are unbiased and that can be used in the office. Consider whether any guideline recommendations made without such information are useful.
CONCLUDING REMARKS
Fewer than one-third of asthma patients nationwide adhere to the expert panel report guideline recommendations for asthma treatment.33 Some believe that limitations in primary care physicians’ knowledge are largely responsible for lack of guideline adherence. More attention should be paid to the limitations of the guidelines themselves, and of their underlying evidence base, as causes of the current gap between recommendation and practice. An Agency for Healthcare Research and Quality technology assessment on asthma treatment commissioned by the expert panel34 states:
The overriding priority is to develop a national research agenda for long-term studies to improve the effectiveness of asthma management. Short-term drug efficacy studies are over-represented in the current literature. It is imperative to develop an evidence base that supports clinical decisionmaking on the intensity of treatment, optimization of medication regimens, and utility of disease management interventions for various asthma populations.34
This special report advocates that future asthma guidelines (1) should adopt a conventional evidence-grading system that accounts for generalizability and patient-oriented outcomes, and (2) should present all relevant outcomes as NNT and NNH to facilitate shared decision making. GRADE appears to be the most comprehensive grading system available to account for evidence quality, patient-important outcomes, and patient values and preferences. Adoption of GRADE or its equivalent would serve to highlight the limitations of current asthma research, promote effectiveness research in a broad sample of the kind of asthma patients encountered daily, and discourage efficacy research in highly selected nonrepresentative sub-samples. When coupled with effective decision aids,9 such guidelines might improve the overall effectiveness of care delivery.
Footnotes
Conflicts of interest: none reported
- Received for publication June 30, 2008.
- Revision received November 14, 2008.
- Accepted for publication December 1, 2008.
- © 2009 Annals of Family Medicine, Inc.