Abstract
PURPOSE Current health literacy screening instruments for health care settings are either too long for routine use or available only in English. Our objective was to develop a quick and accurate screening test for limited literacy available in English and Spanish.
METHODS We administered candidate items for the new instrument and also the Test of Functional Health Literacy in Adults (TOFHLA) to English-speaking and Spanish-speaking primary care patients. We measured internal consistency with Cronbach’s α and assessed criterion validity by measuring correlations with TOFHLA scores. Using TOFHLA scores <75 to define limited literacy, we plotted receiver operating characteristic (ROC) curves and calculated likelihood ratios for cutoff scores on the new instrument.
RESULTS The final instrument, the Newest Vital Sign (NVS), is a nutrition label that is accompanied by 6 questions and requires 3 minutes for administration. It is reliable (Cronbach α = 0.76 in English and 0.69 in Spanish) and correlates with the TOFHLA. The area under the ROC curve is 0.88 for the English version and 0.72 for the Spanish version. Patients with more than 4 correct responses are unlikely to have low literacy, whereas fewer than 4 correct answers indicate the possibility of limited literacy.
CONCLUSION NVS is suitable for use as a quick screening test for limited literacy in primary health care settings.
INTRODUCTION
In 2004, the Institute of Medicine (IOM), the Agency for Healthcare Research and Quality (AHRQ), and the American Medical Association (AMA) all issued reports on health literacy.1–3 Health literacy is the degree to which individuals have the capacity to obtain, process, and understand basic health information and services needed to make appropriate health decisions.4 It involves the ability to use and interpret text, documents, and numbers effectively. These skills might seem to be distinct, but they are highly correlated with one another.1,5–7
The IOM, AHRQ, and AMA reports all noted that large segments of the American population—as many as one half of all adults—lack the literacy skills needed to function adequately in a health care environment. They would not, for example, be able to reliably and consistently determine the proper dose of cold medicine for a child, nor would they be able to read and understand informed consent documents.8,9 Individuals with limited literacy come from all segments of society, and most are white, native-born Americans.10
Individuals with limited literacy have less knowledge about their health problems,11–16 more hospitalizations,17,18 higher health care costs,19,20 and poorer health status21–25 than those with adequate literacy. The relation between limited literacy and these factors is consistent across studies and persists after adjusting for confounding sociodemographic variables. With awareness of patients’ literacy skills, health information can be tailored for delivery to patients in a format they can understand.26
Although health literacy is a complex and multifaceted construct, researchers have developed instruments that assess literacy skills using health-context materials. Two such literacy assessments are widely used. One is the Test of Functional Health Literacy in Adults (TOFHLA),27,28 which is the instrument most often used for literacy assessment in health care research. The TOFHLA is available in English and Spanish and has good psychometric characteristics, but the time required to administer it (18 to 22 minutes for the full version and 7 to 10 minutes for a short version) precludes its use in busy primary care settings.29 The second test, the Rapid Estimate of Adult Literacy in Medicine (REALM), can be administered quickly (less than 3 minutes), but it, too, has limitations. In particular, the REALM is available only in English.5,28 This report describes the validation of a new rapid literacy assessment instrument in both English and Spanish, using the TOFHLA as the reference standard.
METHODS
Overview
We recruited English-speaking and Spanish-speaking patients from primary care clinics. The full TOFHLA and candidate test items for the new health literacy instrument were administered to the patients. Statistical tests were then used to determine which of the candidate test items from the new instrument best correlated with results of the TOFHLA. The University of Arizona Human Subjects Protection Committee approved our methods, and all participants gave informed consent.
Instruments
Test of Functional Health Literacy in Adults
The TOFHLA is a 2-part test that is available in both English and Spanish. The first part provides participants with medical information or instructions about various scenarios, such as instructions on a prescription label or instructions about preparation for a diagnostic procedure. Participants review the scenarios and then answer questions that test their understanding of the information in the scenarios. The second part of the TOFHLA is based on the Cloze method, in which participants are given passages of text about medical topics with selected words deleted and replaced with blank spaces. The participants must fill in the blank spaces using words selected from a multiple-choice list of options, identifying the words most appropriate to the context of the passage.
TOFHLA scores can range from 0 to 100, with higher scores indicating better literacy. Scores of <60 represent inadequate literacy, 60 to 74 represent marginal literacy, and ≥75 represent adequate literacy. Individuals with TOFHLA scores in the inadequate or marginal range (ie, score of <75) would likely have trouble understanding written material that requires a 7th-grade reading level or higher, and often need assistance to fully understand instructions for their medical care.
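The score bands described above can be expressed as a small lookup. The following is only an illustrative sketch, not part of the TOFHLA scoring materials: the function names are hypothetical, and a score of exactly 75 is assumed to count as adequate, consistent with the ">74" definition used elsewhere in this report.

```python
def classify_tofhla(score: float) -> str:
    """Map a TOFHLA total score (0-100) to the literacy band described in the text.

    Assumption: a score of exactly 75 is treated as adequate.
    """
    if not 0 <= score <= 100:
        raise ValueError("TOFHLA scores range from 0 to 100")
    if score < 60:
        return "inadequate"
    if score < 75:
        return "marginal"
    return "adequate"


def has_limited_literacy(score: float) -> bool:
    """Limited literacy = inadequate or marginal band (TOFHLA score < 75)."""
    return classify_tofhla(score) != "adequate"


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    for s in (42, 60, 74, 75, 98):
        print(s, classify_tofhla(s), has_limited_literacy(s))
```

Collapsing the inadequate and marginal bands into one "limited literacy" flag mirrors the binary reference standard used for the ROC analyses in this report.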
Scores on the TOFHLA correlate with scores on standardized reading tests used in general education, such as the Revised Wide Range Achievement Test (r = 0.74)27,32 and also with scores on the REALM (r = 0.84).27
Newest Vital Sign
Our new literacy screening tool, the Newest Vital Sign (NVS), was developed from a series of scenarios. Patients read health-related information and then demonstrated their ability to use it by answering questions about the scenarios. The questions were scored as either correct or incorrect according to a scoring key provided to the interviewers. The Spanish version was developed by translation and back-translation of the English version.
The development of the new instrument involved serially testing candidate scenarios and candidate questions on more than 1,000 patients. This preliminary testing was undertaken with the same patient population (though different patients) on which final testing was performed.
The original scenarios were developed by a panel of health literacy experts based on the concepts and types of scenarios used in health literacy research and in general literacy (reading and writing) assessments, such as the National Adult Literacy Survey33 and the Instrument for Diagnosis of Reading,34 and in health-literacy assessment instruments, such as the TOFHLA.
The scenarios and questions were refined after feedback from patients, interviewers, and data analysts about the clarity and ease of scoring of items. Ultimately, they were reduced to 5 candidate scenarios: (1) instructions from a prescription for headache medication, (2) a consent form for coronary angiography, (3) heart failure self-care instructions, (4) a nutrition label from an ice cream container, and (5) instructions for asthma medication that included a tapering steroid dose. These candidate scenarios varied in the type of literacy skills needed for understanding. Some, such as the angiography consent form and heart failure self-care instructions, emphasized the ability to understand text. Others, such as the nutrition label, placed greater emphasis on the ability to use numbers and mathematical concepts (numeracy). The inclusion of scenarios that involved both reading and numeracy skills was driven by research indicating that these skills are highly correlated with one another,1,5–7 plus an intuitive understanding that patients must be able to use and understand both text and numbers if they are to successfully deal with today’s health care system.
A total of 21 questions (3 to 6 questions per scenario) accompanied the 5 scenarios. This 21-item pool of questions was administered to 500 participants in this study.
A final short form of the test, the NVS, was selected from this 21-item pool on the basis of its psychometric properties. The NVS uses 1 scenario (the ice cream nutrition label) and 6 questions (Figures 1a and 1b, and Supplemental Appendix 1, which is available online only at: http://www.annfammed.org/cgi/content/full/3/6/514/DC1). The reliability, validity, and accuracy of the 1-scenario 6-item final version are reported in detail in this report. A summary of the reliability and validity of the other scenarios is provided in Table 1.
Participants
Participants were patients from 3 primary care practices in Tucson, Ariz, all of which are affiliated with the University of Arizona College of Medicine. Two practices are operated by the college’s faculty practice plan, and 1 is a publicly funded clinic located in a primarily Spanish-speaking area of Tucson.
Sample Size
Based on pilot data indicating that the prevalence of adequate health literacy in our patient population is about 75% (defined as a score >74 on the TOFHLA), sampling error calculations revealed that with a random sample of 492 patients, we could be 95% confident that 71.5% to 78.5% of the participants would score >74 on the TOFHLA. Thus, we collected data on 500 patients (250 in each language group). This sample size provided a power of 0.90 for an independent samples t test, when mean differences between groups were equal to 0.40 standard deviations and α was set at 0.05.
Participant Recruitment
Participants had to be 18 years old or older, speak English or Spanish, have visual acuity sufficient to read the instruments being tested, and have grossly normal cognitive function that was adequate to interact with study personnel. Participants completing the interview received a $20 supermarket gift certificate.
Bilingual project staff interviewers approached consecutive patients in the waiting rooms of the 3 clinics during specified periods. If the patient agreed and met eligibility criteria, the interviewer administered both the TOFHLA and the candidate items for the NVS. We gave patients a hard copy of the NVS nutrition label to hold and to which they could refer, as needed, while the interviewer asked the 6 questions out loud. Patients were tested in the language they preferred for reading.
This process continued until 250 English-speaking and 250 Spanish-speaking patients had been tested. The order of administering the tests was alternated so that interviewers administered the NVS questions first to even-numbered participants and the TOFHLA first to odd-numbered participants.
Data Analysis
We used means, standard deviations (SD), standard errors of the mean, histograms, t tests, and analysis of covariance to summarize the participants’ demographic characteristics and their performance on the tests. The TOFHLA was scored according to the instructions provided with the instrument. Candidate items on the NVS test were scored by giving 1 point for each correct answer.
Analyses of the psychometric properties of the English and Spanish versions of the NVS test and the TOFHLA were conducted separately using identical methods. Reliability of the NVS was assessed in terms of internal consistency (Cronbach α). Criterion validity was determined by calculating the correlation (Pearson r) between scores on the NVS and TOFHLA. We quantified the relative accuracy of age, educational level, and NVS scores as predictors of adequate literacy (defined by TOFHLA scores >74) by computing their receiver operating characteristic (ROC) curves. The ROC curves were used to calculate the sensitivity and specificity for selected cutoff scores on the NVS test. Stratum-specific likelihood ratios were calculated for each NVS score.
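For readers unfamiliar with the reliability and validity statistics named above, the following sketch shows one standard way to compute Cronbach α from item-level scores and Pearson r between two total scores. It is an illustration under stated assumptions, not the software used in this study: the function names are hypothetical, sample variances (n − 1 denominator) are assumed, and the data are invented.

```python
from math import sqrt
from statistics import mean, variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of item scores (one row per participant).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(item_scores[0])
    columns = list(zip(*item_scores))  # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)


if __name__ == "__main__":
    # Hypothetical 0/1 item scores for 5 participants on a 6-item test.
    items = [
        [1, 1, 1, 1, 0, 1],
        [0, 0, 1, 0, 0, 0],
        [1, 1, 1, 1, 1, 1],
        [0, 1, 0, 0, 0, 0],
        [1, 0, 1, 1, 1, 0],
    ]
    totals = [sum(row) for row in items]
    reference = [88, 61, 97, 55, 80]  # hypothetical reference-test scores
    print(cronbach_alpha(items), pearson_r(totals, reference))
```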
RESULTS
We enrolled 250 English-speaking participants to validate the English version of the NVS (NVS-E) and 250 Spanish-speaking participants to validate the Spanish version (NVS-S). These 500 participants represented approximately 80% of the individuals asked to participate in the study. Participants’ demographic characteristics are shown in Table 2.
Table 2 also provides a comparison of the NVS with TOFHLA scores of the English and Spanish samples before and after adjustment for sex and educational level. The English sample had significantly higher scores than the Spanish sample on both instruments; these differences could not be explained by differences in sex and educational level.
Newest Vital Sign: English
The number of correct items on the NVS-E ranged from 0 to 6 (mean = 3.4 ± 1.9). The time required to administer the 6 items was recorded for a series of 24 participants; the average time was 2.9 minutes (SD 1.2 minutes; range = 1.5–6.2 minutes). Total scores on the English version of the TOFHLA (TOFHLA-E) ranged from 12 to 100 (mean = 86.3 ± 14.1). As shown in Figure 2, the distribution of TOFHLA-E scores was severely negatively skewed.
There was no significant difference in scores between men (mean = 3.3 ± 2.0; n = 55) and women (mean = 3.5 ± 1.9; n = 195) on the NVS-E (P >.05). In contrast, there was significant sex bias on the TOFHLA-E, with men scoring significantly lower (mean = 82.5 ± 17.1; n = 55) than women (mean = 87.4 ± 13.0; n = 195; P <.001). This bias is not the result of confounding with age or educational level, as there were no significant differences between men and women in either of these variables (P >.05).
Reliability, Validity, and Accuracy
The internal consistency of the NVS-E was good (Cronbach α = 0.76), as was the criterion validity (r = 0.59, P <.001). Supplemental Appendix 2 (available online only at http://www.annfammed.org/cgi/content/full/3/6/514/DC1) plots the relationship between scores on the NVS-E and the TOFHLA-E.
The area under the ROC curve for predicting TOFHLA-E scores was 0.88 (95% CI, 0.84–0.93; P <.001) for the NVS-E, substantially higher than the 0.72 (95% CI, 0.63–0.81; P <.001) found for educational level or the 0.71 (95% CI, 0.63–0.79; P <.001) found for age. Thus, the NVS-E score was more accurate than educational level or age.
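The area under the ROC curve has a direct interpretation: it is the probability that a randomly chosen patient with limited literacy has a lower screening score than a randomly chosen patient with adequate literacy, with ties counted as one half. The sketch below computes the area in that pairwise way. It is not the procedure used in this study, the function name is hypothetical, and the scores are invented.

```python
def roc_auc(scores_limited, scores_adequate):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Lower screening scores are taken to indicate limited literacy, so a pair
    counts as concordant when the limited-literacy patient scores lower.
    Ties contribute 0.5.
    """
    concordant = 0.0
    for s_lim in scores_limited:
        for s_adq in scores_adequate:
            if s_lim < s_adq:
                concordant += 1.0
            elif s_lim == s_adq:
                concordant += 0.5
    return concordant / (len(scores_limited) * len(scores_adequate))


if __name__ == "__main__":
    # Hypothetical screening scores (0-6) grouped by reference-standard status.
    limited = [0, 1, 1, 2, 3]
    adequate = [2, 3, 4, 5, 6, 6]
    print(roc_auc(limited, adequate))
```

An area of 0.5 corresponds to a predictor no better than chance, and 1.0 to perfect separation of the two groups.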
The ROC curve for the NVS-E showed that a score of <2 on the NVS-E had a sensitivity of 72% and specificity of 87% for predicting limited literacy (TOFHLA score <75), whereas a score of <4 had a sensitivity of 100% and a specificity of 64%. Stratum-specific likelihood ratios for cutoff scores on the NVS-E are shown in Table 3 under English. These ratios show, for example, that getting only 1 item correct on the NVS-E has a stratum-specific likelihood ratio of 5.1 for marginal or inadequate literacy, a 4-fold increase over that seen with getting 2 items correct.
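To make the cutoff statistics concrete, the following sketch shows how sensitivity, specificity, and a stratum-specific likelihood ratio could be computed from paired screening scores and reference-standard classifications. The function names and the data are hypothetical; the sketch assumes that a screening score below the cutoff counts as a positive screen and that limited literacy is defined by a TOFHLA score <75.

```python
def sensitivity_specificity(nvs_scores, limited, cutoff):
    """Sensitivity and specificity when 'NVS score < cutoff' is a positive screen.

    nvs_scores: screening scores (0-6); limited: parallel list of booleans,
    True when the reference standard indicates limited literacy.
    """
    pairs = list(zip(nvs_scores, limited))
    tp = sum(1 for s, lim in pairs if lim and s < cutoff)
    fn = sum(1 for s, lim in pairs if lim and s >= cutoff)
    tn = sum(1 for s, lim in pairs if not lim and s >= cutoff)
    fp = sum(1 for s, lim in pairs if not lim and s < cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def stratum_likelihood_ratio(nvs_scores, limited, stratum):
    """P(score == stratum | limited) divided by P(score == stratum | adequate)."""
    pairs = list(zip(nvs_scores, limited))
    n_limited = sum(1 for _, lim in pairs if lim)
    n_adequate = len(pairs) - n_limited
    in_limited = sum(1 for s, lim in pairs if lim and s == stratum)
    in_adequate = sum(1 for s, lim in pairs if not lim and s == stratum)
    return (in_limited / n_limited) / (in_adequate / n_adequate)


if __name__ == "__main__":
    # Hypothetical data: 4 patients with limited literacy, 6 with adequate.
    scores = [0, 1, 1, 3, 2, 4, 5, 6, 1, 6]
    is_limited = [True, True, True, True, False, False, False, False, False, False]
    print(sensitivity_specificity(scores, is_limited, cutoff=2))
    print(stratum_likelihood_ratio(scores, is_limited, stratum=1))
```

Unlike a single cutoff, a stratum-specific ratio uses only the patients who obtained exactly that score, which is why each score can carry its own ratio.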
Newest Vital Sign: Spanish
The number of correct items on the NVS-S ranged from 0 to 6 (mean = 1.6 ± 1.5). The time required to administer the 6 items was recorded for a series of 36 participants; the average time was 3.4 minutes (SD 1.2 minutes; range = 2.1–8.2 minutes). Total scores on the Spanish version of the TOFHLA (TOFHLA-S) ranged from 8 to 100 (mean = 75.7 ± 18.5). As shown in Figures 2C and 2D, the distribution of NVS-S scores was quite different from that of TOFHLA-S scores; NVS-S scores were positively skewed while TOFHLA-S scores were negatively skewed.
There were no significant differences (P >.05) between men (n = 29) and women (n = 221) on the NVS-S (mean = 1.5 ± 1.5, and mean = 1.6 ± 1.5, respectively) or on the TOFHLA-S (mean = 71.0 ± 18.2, and mean = 76.3 ± 18.5, respectively). Nor were there any significant differences between men and women in age or educational level (P >.05).
Reliability, Validity, and Accuracy
The internal consistency of the NVS-S was good (Cronbach α = 0.69), as was the correlation with the TOFHLA (r = 0.49, P <.001). Supplemental Appendix 2 plots the relationship between scores on the NVS-S and the TOFHLA-S.
The area under the ROC curve for predicting TOFHLA-S scores was 0.72 (95% CI, 0.66–0.79; P <.001) for the NVS-S, slightly higher than the 0.69 (95% CI, 0.62–0.76; P <.001) found for educational level or the 0.64 (95% CI, 0.56–0.71; P <.001) found for age. The ROC curve for the NVS-S showed that scoring <2 on the NVS-S had a sensitivity of 77% and a specificity of 57% for predicting limited literacy (TOFHLA-S score <75), whereas scoring <4 had a sensitivity of 100% and a specificity of 19%. Stratum-specific likelihood ratios for cutoff scores on the NVS-S are shown in Table 3 under Spanish.
DISCUSSION
In samples of English-speaking and Spanish-speaking primary care patients in the southwestern United States, the NVS—a 6-item test based on the ability to read and apply information from a nutrition label—was a reliable and accurate measure of literacy with high sensitivity for detecting persons with limited literacy. The NVS is the first literacy screening test available in both English and Spanish that can be administered in approximately 3 minutes. It will permit clinicians or health care administrators to rapidly assess literacy in their patients.
Properties and Clinical Significance of the NVS
The NVS had good sensitivity; in fact, based on the distribution of scores (Figure 2), NVS may be more sensitive than the TOFHLA to marginal health literacy. Its specificity, although less than optimal, is similar to or better than that of other widely used clinical screening methods, such as questionnaires to detect alcohol abuse,35,36 breast self-examinations to screen for cancer,37 and methods to detect arthritis38 and measure osteoporosis risk.39 Although the specificity of NVS may result in overestimating the percentage of patients with limited literacy, using the test can alert physicians to patients who may need more attention and help physicians focus on physician-patient communication using recommended techniques.26,40
All patients who score >4 on the NVS will have adequate literacy when measured by the TOFHLA. A score <4 on the NVS, on the other hand, indicates the possibility of limited literacy. Clinicians should be particularly careful in their communication with patients who score <2, as they have a greater than 50% chance of having marginal or inadequate literacy skills. Such patients cannot be reliably identified by questions about their education level, because education does not always predict literacy; it only measures the number of years an individual attended school. Indeed, about one quarter of participants who scored at the very lowest of 5 literacy levels on the US Department of Education’s National Adult Literacy Survey were high school graduates.33
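A likelihood ratio can be converted into a post-test probability through the odds form of Bayes' theorem, which shows how a low NVS score translates into a probability of limited literacy. The sketch below is illustrative only. It takes the stratum-specific likelihood ratio of 5.1 reported in the Results for 1 correct item on the NVS-E, and it assumes a pretest probability of limited literacy of 25%, a figure inferred here from the pilot estimate that about 75% of patients had adequate literacy rather than one reported for this calculation. The function name is hypothetical.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability and a likelihood ratio to a post-test probability.

    post-test odds = pretest odds * likelihood ratio
    """
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


if __name__ == "__main__":
    # Assumed pretest probability (0.25) and the likelihood ratio of 5.1
    # reported for 1 correct item on the English version.
    p = post_test_probability(0.25, 5.1)
    print(f"{p:.2f}")  # prints 0.63
```

Under these assumptions the post-test probability is about 63%, which is in line with the statement that patients scoring <2 have a greater than 50% chance of marginal or inadequate literacy. A different pretest probability would give a different result.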
The Role of Numeracy in Health Literacy
Of the 5 candidate scenarios evaluated in this validation study, the one most effective at discriminating low literacy from adequate literacy was the scenario requiring the most complex numeracy skills (Table 1). As discussed earlier, our testing included scenarios that assessed both reading and numeracy skills, but in every analysis we performed, the nutrition label with its quantitative-numerical questions was the best predictor of literacy (using the TOFHLA as a reference standard). That the questions from the nutrition label scenario had such high internal consistency suggests answering those questions involved not only a math skill, but also a locate-the-information skill (by reading and comprehending) and an abstract reasoning skill (eg, imagining they have an allergy to peanuts, noting that even vanilla ice cream can have a peanut product in it, and reasoning that peanut oil is probably not good for you if you are allergic to peanuts).
Furthermore, the use of a nutrition label to assess health literacy is intuitively appealing because nutrition labels are familiar items that are important parts of health management for many chronic diseases. They are also used for health promotion in that many healthy people use information on nutrition labels to help achieve healthy eating habits. Patients’ ability to understand and use the information on nutrition labels has been the subject of study in a host of health promotion and epidemiology research projects, both in the United States and elsewhere.41–43
Limitations
We compared the NVS to the full version of the TOFHLA, not the short-version TOFHLA that is in more widespread use. The full TOFHLA, however, is the standardized instrument from which the short version was derived, so its psychometric properties are an appropriate reference standard for the development of new instruments.
Health literacy is a complex construct that encompasses many aspects of how individuals use health information and the health care system. Our test, like the TOFHLA and the REALM, measures reading and interpretation skills (ie, general literacy, reasoning, and the ability to use numbers) as applied to material with health content, rather than all aspects of health literacy.32,44–45
The psychometric properties of the Spanish version of the NVS, although adequate to screen patients for limited literacy, were not as good as those of the English version. This fact may stem from the greater heterogeneity of language and culture among our Spanish-speaking patients, who come from all regions of South America, Central America, and Mexico.
Finally, the primary care practices in this study were selected because of their high percentage of Spanish-speaking patients, and among the Spanish-speaking participants the percentage of male patients was relatively small. These practices do not, therefore, have demographics that are fully representative of all primary care practices in the United States. Testing of the NVS on other patient populations could further validate the accuracy of the instrument.
The NVS has advantages over currently available instruments. Specifically, it is available in Spanish, whereas the REALM is not, and it can be administered much more quickly than the TOFHLA. The NVS also does not have the ceiling effect seen with the TOFHLA and, therefore, particularly in the English version, the NVS provides better discrimination of skill levels among individuals in the upper part of the distribution of literacy skills. Future investigations should examine (1) how to best introduce and implement NVS in primary care practice, (2) the validity of NVS in other primary care practices and also in non-primary care settings, (3) whether raising clinicians’ awareness of patients’ literacy by using NVS results in improved clinician-patient communication and better health outcomes, and (4) whether a similar nutrition label scenario can assess literacy in speakers of languages other than English and Spanish.
Acknowledgments
The authors thank Barbara DeBuono, MD, MPH, and Lisa Dieter of Pfizer, Inc, and Ana Rita Gonzales, ScD, of Fleishman-Hillard for their support of this project. The authors also acknowledge and appreciate the assistance of John Blackburn, Katie O’Brien, and Berenis Romo, who conducted health literacy testing. Additional thanks to Maria Chavez, research specialist in the University of Arizona Department of Family and Community Medicine, for her help with data entry.
Footnotes
- Conflicts of interest: Drs Weiss, DeWalt, and Pignone have received research support and honoraria from Pfizer’s Health Literacy/Clear Health Communication Initiative. None of these individuals have any other financial interest in the Initiative or in Pfizer, Inc.
- Funding support: This study was supported by funding from the Pfizer Clear Health Communication Initiative.
- Received for publication April 30, 2005.
- Revision received July 26, 2005.
- Accepted for publication July 31, 2005.
- © 2005 Annals of Family Medicine, Inc.