Abstract
PURPOSE We undertook a study to estimate the sufficiently important difference (SID) for the common cold. The SID is the smallest benefit that an intervention would require to justify costs and risks.
METHODS Benefit-harm tradeoff interviews (in-person and telephone) assessed SID in terms of overall severity reduction using evidence-based simple-language scenarios for 4 common cold treatments: vitamin C, the herbal medicine echinacea, zinc lozenges, and the unlicensed antiviral pleconaril.
RESULTS Response patterns to the 4 scenarios in the telephone and in-person samples were not statistically distinguishable and were merged for most analyses. The scenario based on vitamin C led to a mean SID of 25% (95% confidence interval [CI], 23%–27%). For the echinacea-based scenario, mean SID was 32% (95% CI, 30%–34%). For the zinc-based scenario, mean SID was 47% (95% CI, 43%–51%). The scenario based on preliminary antiviral trials yielded a mean SID of 57% (95% CI, 53%–61%). Multivariate analyses suggested that (1) between-scenario differences were substantive and reproducible in the 2 samples, (2) presence or severity of illness did not predict SID, and (3) SID was not influenced by age, sex, tobacco use, ethnicity, income, or education. Despite consistencies supporting the model and methods, response patterns were diverse, with wide spreads of individual SID values within and among treatment scenarios.
CONCLUSIONS Depending on treatment specifics, people want, on average, a 25% to 57% reduction in overall illness severity to justify the costs and risks of popular cold treatments. Randomized trial evidence does not support benefits this large. This model and these methods should be further developed for use in other disease entities.
- Clinical significance
- common cold
- effect size
- important difference
- outcomes
- quality of life
- questionnaires
- respiratory tract infections
- evidence-based medicine
- health policy research
- quantitative methods
- randomized clinical trial
INTRODUCTION
It is generally agreed that randomized controlled trials (RCTs) provide the least biased evidence regarding the effects of interventions on health-related outcomes, and hence are important for medical decision making and public policy. These principles apply to treatments aimed at reducing symptom burden, screening tests designed to detect disease, or preventive therapies aimed at avoiding diseases altogether.
Randomized trials are powered to detect specific effect sizes. The larger the number of participants, the smaller the effect size that can be detected. A trial that is too small may miss an effect size that might be important, whereas a larger trial might demonstrate an effect that is too small to be clinically significant. Following this rationale, many experts now agree that trials should be powered to detect a minimal important difference.1–3 This conceptual entity was first defined in 1989 to be “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troubling side effects and excessive cost, a change in the patient’s management.”4 Although an important addition to the theory and practice of evidence-based medicine, minimal important difference is limited in that it does not account for negatively valued consequences of medical interventions (harms), such as costs, side effects, and risks of adverse effects.
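The inverse relationship between sample size and detectable effect size can be illustrated with the standard normal-approximation formula for a two-arm comparison of means. This is a generic sketch, not a calculation from this study: the function name, the two-sided alpha of .05, and the power of 0.80 are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per arm to detect a standardized
    difference in means (Cohen's d), using the normal approximation
    n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Smaller target effects require many more participants per arm.
for d in (0.8, 0.5, 0.2):
    print(d, n_per_group(d))
```

Under these assumptions, halving the effect size a trial is powered to detect roughly quadruples the required number of participants, which is why the choice of target difference matters so much.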
In 2005 we defined “sufficiently important difference” (SID) as “the smallest amount of patient-valued benefit that an intervention would require in order to justify associated costs, risks, and other harms.”5 We consider SID to be very similar to the “smallest worthwhile effect” concept described elsewhere.6–9 The advantage of SID is that it extends the notion of minimal important difference, has an operational definition, and can be estimated using benefit-harm tradeoff methods. Using reduction of illness duration in the common cold as the desired benefit (outcome), we previously showed how benefit-harm tradeoffs could serve as a method of SID estimation.10 In this article, we report on a second sample of respondents and assess SID as overall severity reduction benefit.
METHODS
Using community advertisement, we interviewed respondents with acute respiratory tract infection, presumed viral (common cold). This study was carried out alongside other common cold research projects using shared study promotion and screening methods, one of which was an RCT testing echinacea, placebo, and doctor-patient interaction.11 Another project was aimed at assessing validity of the Wisconsin Upper Respiratory Symptom Survey (WURSS),12 an illness-specific quality-of-life questionnaire instrument. For the WURSS validation study, participants were enrolled within 48 hours of their first cold symptom, then monitored with daily questionnaires until the illness had resolved. To provide data for the current SID study, participants in the WURSS validation study were asked benefit-harm tradeoff questions at enrollment (intake, within 48 hours of first cold symptom) and again at exit, after the illness had resolved. These participants make up the prospective, in-person group of the SID study. The same benefit-harm tradeoff questions were administered to a second group of participants interviewed by telephone. Participants responding by telephone also had self-described colds but were interviewed only once and had extended inclusion criteria allowing symptoms for up to 7 days.
To be eligible for either arm of this study, prospective adult participants had to answer “yes” to the question, “Do you think that you have a cold or are coming down with a cold?” They also had to report at least 1 of 4 cold symptoms (sneezing, runny nose, nasal obstruction, or sore throat), and to have a total Jackson score of at least 2 points. Jackson scores13–15 are simple sums of severity ratings (1 = mild, 2 = moderate, 3 = severe) for 8 symptoms: those noted above plus cough, headache, chilliness, and malaise. For prospective participants with eye or nose itching, sneezing, or history of allergy, the interviewer, as well as the participant, had to be convinced that the participant’s symptoms indicated a cold, not an allergic illness. Interviewers were also given permission to exclude potential participants whom they suspected were dishonest when reporting symptoms or whom they believed would not be competent to follow study protocol. These instances occurred rarely. Interviewers questioned prospective participants by telephone and then again in person using semistructured interview techniques aimed at stimulating recall to enhance accuracy of symptom reporting. All interviewers were trained and supervised by the senior author (B.B.), a family physician and anthropologist who had substantial experience with interview methods and patients with acute respiratory tract infection. The protocol was approved by the Institutional Review Board of the University of Wisconsin School of Medicine and Public Health.
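The symptom-based part of these eligibility rules can be sketched as follows. This is an illustration only: the function and variable names are hypothetical, a rating of 0 for an absent symptom is an assumption (the text gives only the 1 to 3 ratings), and the interviewer judgments about allergy, honesty, and competence are not represented.

```python
CORE_SYMPTOMS = ("sneezing", "runny nose", "nasal obstruction", "sore throat")
OTHER_SYMPTOMS = ("cough", "headache", "chilliness", "malaise")


def jackson_score(ratings):
    """Sum severity ratings over the 8 Jackson symptoms.

    ratings maps symptom name -> 1 (mild), 2 (moderate), or 3 (severe);
    symptoms missing from the mapping are treated as absent (0, an assumption).
    """
    return sum(ratings.get(s, 0) for s in CORE_SYMPTOMS + OTHER_SYMPTOMS)


def meets_symptom_criteria(thinks_has_cold, ratings):
    """Self-reported cold, at least 1 of the 4 core symptoms, Jackson score >= 2."""
    has_core_symptom = any(ratings.get(s, 0) > 0 for s in CORE_SYMPTOMS)
    return thinks_has_cold and has_core_symptom and jackson_score(ratings) >= 2


# Hypothetical caller: mild sore throat plus moderate cough gives a score of 3.
example = {"sore throat": 1, "cough": 2}
print(jackson_score(example), meets_symptom_criteria(True, example))
```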
Interviews began with the following statement:
We are trying to understand what people think about when making decisions about treating their colds. Specifically, we’re interested in how much benefit you would want to expect in exchange for the costs and possible side effects of a given treatment. Benefit can come in the form of reduced severity and/or decreased duration of illness. I would like to describe 4 different cold treatments, then have you tell me whether or not you would want to take these medicines, and why or why not.
Next, the participant was presented with 1 of the following scenarios:
A 10-cent vitamin pill must be taken 3 times daily for the first 3 days of your cold. There are no significant risks or side effects to this treatment. It is unlikely that the length of your cold would be reduced significantly. Severity of symptoms might be reduced by as much as 30%.
A 20-cent lozenge must be dissolved in the mouth every 2 to 3 hours while awake for the first 3 days of your cold. Side effects may include bad taste, and, very occasionally, nausea. It is possible that the length of the cold could be reduced slightly. Severity of symptoms might be reduced by as much as 30%.
A 50-cent dropperful of an herbal extract must be taken 3 times each day for the first 3 days of your cold. Side effects are limited to bad taste. It is possible that the length of the cold could be reduced slightly. Severity of symptoms might be reduced by as much as 30%.
A $2 prescription-only pill must be taken 3 times daily for the first 3 days of the cold. Side effects are unknown. Preliminary data suggests an average 24-hour reduction in the length of your cold. Severity of symptoms might be reduced by as much as 30%.
The scenarios were presented in varied order, so that each scenario had an approximately equal chance of being considered first, second, third, or last. After each scenario was presented, participants were asked, “Would you take this treatment?” and then, “Why?” or “Why not?” Brief notes were taken regarding the answers to these qualitative questions. Next, participants who had answered “yes” to the original question were asked: “Would you take this [treatment] if it were able to reduce severity by 20%?” If the answer was still “yes,” the hypothetical severity reduction was lowered to “10%,” then if still “yes,” it was lowered to “5%,” and, finally, “any?” If the original answer was “no,” severity reduction benefit was increased to “40%,” then if still “no,” it was increased to “50%,” then “75%.” Severity reduction SID was defined as the smallest severity reduction that justified the treatment scenario for that participant.
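The titration sequence described above can be sketched as a short procedure. The function names and the simulated respondent are hypothetical; only the sequence of benefit levels (30%, then 20%, 10%, 5%, and "any," or 40%, 50%, and 75%) comes from the interview protocol.

```python
DESCENDING = (20, 10, 5, "any")  # asked in turn after "yes" at the initial 30%
ASCENDING = (40, 50, 75)         # asked in turn after "no" at the initial 30%


def elicit_sid(would_take):
    """Return the smallest severity reduction the respondent accepts.

    would_take(level) -> bool, where level is a percent reduction or "any".
    Returns a percent, "any", or "never".
    """
    if would_take(30):
        sid = 30
        for level in DESCENDING:
            if not would_take(level):
                break
            sid = level
        return sid
    for level in ASCENDING:
        if would_take(level):
            return level
    return "never"


def threshold_respondent(threshold):
    """Hypothetical respondent who accepts any offer at or above a threshold."""
    return lambda level: threshold <= (0 if level == "any" else level)


for t in (0, 8, 25, 45, 100):
    print(t, elicit_sid(threshold_respondent(t)))
```

Because only a few discrete levels are offered, the recorded SID is the first offered level at or above a respondent's true threshold, which is why the resulting data are ordinal.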
The scenarios represent our interpretation of best evidence available when the study began. Although potential severity reduction benefits were varied beyond what many experts would consider reasonable (ie, a 75% overall severity reduction is unlikely), the initial scenarios were designed to be realistic. Toward this end, we reviewed trial reports and systematic reviews16–19 and aimed for a brief scenario that was both easy to understand and evidence-based.
After benefit-harm trade-off interviews were completed, we gathered descriptive data, including age, sex, tobacco use, ethnicity, income, and educational achievement. Responses were scored on paper forms by the interviewer for telephone interviews, and directly on paper by the participant for the in-person interviews. Data were entered twice, then cross-checked, with discrepancies resolved by comparison with paper sheets. Statistical analysis began with tabular and graphical inspection, followed by assessment of missing data and outliers. We then proceeded to correlation matrices, analysis of variance, and multivariate regression models. We assessed potential relationships of demographic variables (age, sex, smoking, ethnicity, education, income) and Jackson severity scores with the primary SID severity reduction outcome variable. Whereas data representing SID were ordinal in nature, the underlying SID domain was considered to be continuous. We did not assume that SID distributions would be normal and used nonparametric as well as parametric methods in our analysis strategy. Multivariate models were developed using PROC MIXED in SAS 9.1.3 (SAS Institute, Cary, NC, 2001). These models assessed within-person effects assuming a compound symmetry variance matrix.
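For readers unfamiliar with the term, a compound symmetry structure assumes that all repeated measures from the same person share one variance and one common covariance. The sketch below builds such a matrix in plain Python; it is not the SAS model used in the study, and the dimension and numeric values are hypothetical.

```python
def compound_symmetry(n_measures, variance, correlation):
    """Return an n x n covariance matrix with equal variances on the
    diagonal and one common covariance everywhere else."""
    covariance = variance * correlation
    return [
        [variance if i == j else covariance for j in range(n_measures)]
        for i in range(n_measures)
    ]


# Hypothetical values: 4 scenario responses per person, variance of 400
# (an SD of 20 percentage points), and a within-person correlation of 0.3.
for row in compound_symmetry(4, 400.0, 0.3):
    print(row)
```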
RESULTS
From May 6, 2003, when the study began, until August 22, 2005, when data collection ended, 983 people contacted our research team, and 253 enrolled in 1 of the 2 groups reported here. Of the 730 not enrolled in this study, 217 joined another study, 201 did not meet inclusion criteria, 128 declined to participate, and 43 were simply calling for information. Some 141 could not be categorized meaningfully. Of those excluded, 55 were thought to have allergy or an illness other than a cold, 35 had symptoms for more than 7 days, 25 were younger than 18 years, 19 were considered unreliable after the screening interview, and 67 were excluded for a variety of other reasons. (Our screening protocol allowed people to be excluded for more than one reason.)
Of the 253 participants enrolled in this study, 162 completed a single benefit-harm trade-off interview by telephone. The remaining 91 were enrolled in person within 48 hours of first cold symptom, were followed up with daily reports until their cold had resolved, and then were interviewed again in person using the same questions they were asked at intake. Thus, the database for this report includes data from 162 telephone and 182 in-person interviews, representing SID values for the 253 participants.
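The participant flow and interview counts reported above can be checked with simple arithmetic. The variable names below are illustrative; all counts are taken from the text.

```python
contacted = 983
enrolled = {"telephone": 162, "in_person": 91}
not_enrolled = {
    "joined another study": 217,
    "did not meet inclusion criteria": 201,
    "declined to participate": 128,
    "calling for information": 43,
    "could not be categorized": 141,
}

total_enrolled = sum(enrolled.values())
total_not_enrolled = sum(not_enrolled.values())
# In-person participants were interviewed twice (intake and exit).
interviews = enrolled["telephone"] + 2 * enrolled["in_person"]

print(total_enrolled, total_not_enrolled, interviews)  # prints: 253 730 344
```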
Table 1 shows that our sampled population was reasonably diverse in terms of income and education, but mostly female (67.8%) and mostly white (68.4%). Jackson scores were similar for telephone interviews (mean 9.6; SD 3.7) and the intake interviews (mean 9.7; SD 3.6). Demographic measures were collected for all participants in the prospective in-person group, but for only 117 of the 162 participants responding by telephone, because of interviewer error during the first few weeks of the study. To calculate indicators of central tendency and variability for the SID variable, “any” and “never” responses were conservatively assigned values of 5% and 88%, respectively.
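The scoring rule for “any” and “never” responses, followed by a mean and confidence interval, can be sketched as below. The response list is made up for illustration, and the normal-approximation interval is an assumption; the text does not state how the reported intervals were computed.

```python
import math
from statistics import NormalDist, mean, stdev

# Numeric values assigned to the two open-ended responses, as described above.
SCORES = {"any": 5, "never": 88}


def to_percent(response):
    """Map "any"/"never" to their assigned values; pass numeric levels through."""
    return SCORES.get(response, response)


def mean_with_ci(responses, level=0.95):
    """Return (mean, lower, upper) using a normal-approximation interval."""
    values = [to_percent(r) for r in responses]
    m = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


# Hypothetical responses, not study data.
print(mean_with_ci(["any", 10, 30, 50, "never"]))
```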
As in our previous study,10 the scenario based on vitamin C received the most favorable ratings, with a mean severity benefit SID of 24.6% (95% CI, 23%–27%). Of the 253 participants, 77 (30%) said they would take the vitamin regardless of any severity benefit, and 8 (3%) said they would not even if global severity reduction was 75% or greater (Table 2). The scenario based on the herbal medicine echinacea yielded the next most favorable responses, with a mean SID of 31.9% (95% CI, 30%–34%). Among the 253 participants, 39 (15%) said they would take the herbal medicine regardless of severity benefit, and 18 (7%) said they would not regardless of benefit. The scenario based on zinc lozenges led to a mean SID of 46.9% (95% CI, 43%–51%), with 12 (5%) saying they would take the lozenge regardless of severity benefit, and 37 (15%) refusing. Finally, the antiviral scenario had a mean severity benefit SID of 57.2% (95% CI, 53%–61%), with 12 (5%) saying they would take an antiviral treatment regardless of benefit, and 94 (37%) rejecting such treatment. The range of treatment-specific responses is illustrated in Figure 1.
Extending the analysis a step further, we calculated a value to represent SID for cold treatments in general. Averaging across all 4 scenarios, the general mean severity benefit SID was calculated to be 40.3% (95% CI, 39.3%–41.4%). To reflect values of those who were sick, and to represent participants equally, only intake and telephone interview data were averaged when computing these values.
Scatter plots, correlation matrices, and simple linear regressions were used to look for potential relationships between SID and the covariates of age, sex, ethnicity, education, household income, Jackson severity score, and nature of interview (telephone, intake, exit). Next, multivariate regression equations were constructed to account for potential covariate influence on SID values. Few statistically significant associations were found, and, given the number of comparisons made, those that were found could be due to chance (Table 3). In both bivariate and full multivariate models, sex was significantly associated with SID for lozenges (coefficient = 0.09; P < .05), but not for other treatments. Higher education was significantly associated with lower SID for vitamins in bivariate analysis (fixed-effect F score = 2.98; P < .05) and in the multivariate model (fixed-effect F score = 2.57; P < .05), but there was no clear dose-response relationship. Education was not significant in other estimates of SID. No other association reached statistical significance.
DISCUSSION
This study represents the second phase of benefit-harm trade-off interviews aimed at assessing sufficiently important difference (SID) for common cold. In the first study,10 reduction in duration of illness was the primary benefit domain under investigation. In the present study, overall severity reduction was assessed. The 2 studies yielded very similar results. In both studies the 4 treatment scenarios followed the same order of preference, with the vitamin requiring the least benefit to justify treatment, then an herbal medicine, then lozenge, and finally a prescription pill. In both studies, neither demographic indicators (age, sex, ethnicity, education, income) nor severity of illness at time of interview appeared to influence SID value judgments.
In both studies, heterogeneity characterized both between-person and between-scenario distributions. For example, although many participants would accept a treatment for small benefit (10% or less), many others required larger benefits (50% or greater). Similar between-person differences were seen in the first study, with many saying they would take treatments for small duration reductions (12 hours or less for a 6-day cold), but many others indicating the need for larger benefits (36 hours or more).10 Despite this heterogeneity, specific treatment scenarios yielded distinctive response distributions. In the current study, for example, mean severity-reduction SID for the prescription antiviral drug scenario (57%) was more than twice that resulting from the vitamin C scenario (25%). This between-scenario difference was even more pronounced in the first study, where the mean SID reduction in length of a 6-day cold was 83 hours (57%) for the prescription pill scenario, but just 26 hours (18%) for the vitamin.10
Qualitative responses in the 2 studies were also similar and helped make sense of response patterns. Participant responses suggested that between-scenario differences were partially due to negatively valued costs and risks. For the prescription pill scenario, responses were influenced by the implication that potential side effects were not well known. Additionally, the need to see a physician was reported as a barrier by some participants. Regarding the lozenge scenario, several participant responses suggested that the use of the word “nausea” negatively influenced responses. Overall, we interpret these findings to support the SID concept and the benefit-harm trade-off method. Effect size in and of itself is not sufficient to justify treatment. Instead, expected benefits should be interpreted within the specific context of associated costs and risks.
Another consistent finding, and one more pronounced in the first study, was the trimodal nature of the SID data distribution. Some people will take a treatment regardless of the benefit under consideration. Some will not, even when hypothetical benefits are large. The majority, however, require a certain amount of benefit to justify costs and risks, which makes sense and fits with both clinical experience and psychological theory. People are (somewhat) rational, and make choices based on perceived likelihood and magnitude of both positively and negatively valued outcomes, but since different people value health-related domains differently, there is diversity in SID magnitudes across populations.
When benefits and harms are made explicit and portrayed in simple language, population distributions of SID are characteristic, reproducible, and largely unaffected by age, sex, ethnicity, income, education, and severity of illness at time of interview. By including 2 sets of participants (in-person and telephone) with varying degrees of illness severity, we were able to show that SID value judgments are influenced by positive and negative aspects of treatments, but not by demographic characteristics, and, in this study, not by severity of illness at time of interview. We must caution that this finding may not generalize to other illness conditions. Indeed, we suspect that preference patterns may be more stable for common cold than for other disease entities, as most people have experienced numerous colds and thus have had a chance to solidify their expectations, values, and preferences regarding cold treatments.
It is relatively clear that existing cold treatments do not provide the SID desired by our participants. Although space does not permit a review of available evidence,20–22 the existence of any benefit is controversial for most, if not all, cold treatments. Even the more optimistic interpretations of existing evidence fall short of supporting overall severity reductions in the 25% to 57% range indicated as necessary by our participants.
This conclusion, however, may be trivial compared with the implications should similar results be found for other medical treatments, where effect sizes are usually quite modest. For example, best evidence suggests that a person with mild or moderate depression might expect a 1- or 2-point reduction on the Hamilton Depression Rating Scale, where 20 points is considered indicative of depression.23–25 For Alzheimer-type dementia, where 100 points may be expected on the Alzheimer Disease Assessment Scale cognitive function subscale, a 1- to 3-point improvement may be attributable to cholinesterase inhibitors.26–28 For both classes of medication, monetary costs are substantial and side effects are common. If patients (or their families or caregivers) were provided simple descriptions of likely benefits and harms, how many would choose these treatments? Of the millions currently taking these medicines, how many have been provided an accurate description of expected benefits, costs, and risk of harm?
In our opinion, there have been too few investigations into health values in general and into the nature of clinical significance in particular, which is unfortunate, as medical decision making, as well as trial design and interpretation, is inextricably linked to these conceptual entities. The introduction and development of minimal important difference were milestones, as benefits reaching the minimal important difference threshold must not only be of statistical significance but must also be recognized and valued by patients. The concept of a SID raises the bar another notch, as effect size meeting a SID must also be sufficient to justify costs and risks. Finally, we would like to note that the benefit-harm trade-off methods portrayed here and previously5,10 are only one method of estimating SID. Others will surely be invented, as SID (smallest worthwhile effect6–9) serves as both the effect size for which trials should be powered and as the benchmark by which they should be judged.
Acknowledgments
The Department of Family Medicine at the University of Wisconsin School of Medicine and Public Health provided an institutional base and collegial support. Laurie Draheim, Shari Barlow, and Rob Maberry conducted the interviews. Gay Thomas managed study promotion and other activities. We would also like to acknowledge the many research respondents who, during a time of illness, generously contributed their time, energy, and ideas.
Footnotes
- Conflicts of interest: none reported
- Funding support: This work was made possible by a career development grant from the Robert Wood Johnson Foundation Generalist Physician Faculty Scholars Program.
- Received for publication September 19, 2006.
- Revision received December 19, 2006.
- Accepted for publication December 26, 2006.
- © 2007 Annals of Family Medicine, Inc.