Abstract
PURPOSE We set out to develop and validate a patient-reported instrument for measuring experiences and outcomes related to patient safety in primary care.
METHOD The instrument was developed in a multistage process supported by an international expert panel and informed by a systematic review of instruments, a meta-synthesis of qualitative studies, 4 patient focus groups, 18 cognitive interviews, and a pilot study. The trial version of Patient Reported Experiences and Outcomes of Safety in Primary Care (PREOS-PC) covered 5 domains and 11 scales: practice activation (1 scale); patient activation (1 scale); experiences of patient safety events (1 scale); harm (6 scales); and general perceptions of patient safety (2 scales). The questionnaire was posted to 6,736 patients in 45 practices across England. We used “gold standard” psychometric methods to evaluate its acceptability, reliability, structural and construct validity, and ability to discriminate among practices.
RESULTS 1,244 completed questionnaires (18.5%) were returned. Median item-specific response rate was 91.3% (interquartile range 28.0%). No major ceiling or floor effects were observed. All 6 multi-item scales showed high internal consistency (Cronbach’s α 0.75–0.96). Factor analysis, correlation between scales, and known group analyses generally supported structural and construct validity. The scales demonstrated a heterogeneous ability to discriminate between practices. The final version of PREOS-PC consisted of 5 domains, 8 scales, and 58 items.
CONCLUSIONS PREOS-PC is a new multi-dimensional patient safety instrument for primary care developed with experts and patients. Initial testing shows its potential for use in primary care, and future developments will further address its use in actual clinical practice.
INTRODUCTION
Patient safety, defined by the World Health Organization as “the prevention of errors and adverse effects to patients associated with health care,”1 is of growing interest in primary care systems.2 Despite its potential impact on population health, major gaps remain in our understanding of primary care patient safety, particularly due to the lack of appropriate measurement methods,2 which limits our ability to obtain reliable and repeatable rates of safety events for improvement efforts and for research into their fundamental underlying causes and mechanisms.
Current tools rely almost exclusively on information supplied by health care providers (eg, safety culture questionnaires and voluntary reporting of safety events).3 A growing body of evidence, however, suggests that patients are sensitive to and able to recognize a range of problems in health care delivery4,5 that are not identified by traditional systems of health care monitoring.6,7 Patient reports constitute a reliable source of information8,9 and have potential to improve the systematic detection of problems in health care.10–13
Our recent systematic review of primary care patient-reported safety measures showed that such instruments largely focus on a small number of relevant dimensions, mostly related to medication problems, and do not allow for a comprehensive assessment of care safety.14
We aimed therefore to develop a patient-reported instrument for comprehensively measuring experiences and outcomes of patient safety in primary care, and to test its psychometric properties.
METHODS
Following quality standards for instrument development and evaluation,15 we developed the new measure in 3 steps: (1) developing the framework for questionnaire domains based on the literature and expert consensus; (2) identifying and piloting relevant domains and items; and (3) psychometric testing for characteristics including acceptability, internal consistency, construct validity, and response bias.
Conceptual Framework
Two members of the research team, supported by 2 external experts (see Acknowledgments), reviewed and discussed the conceptual models proposed for patient safety in primary care.1,16–22 Consensus emerged on 3 necessary elements for patients’ safety events: (1) patient interaction with the health care system, including self-management; (2) standards of care (with failure to adhere to them possibly due to error, but also due to other causes); and (3) actual or potential harm to patients, conceptualized as deterioration in health, including physical, mental, and social well-being. An event was hence defined as “harm or potential harm to 1 or more patients due either to an interaction with the health care system that fails to adhere to accepted standards of care (ie, that is affected by error or systemic dysfunction), or to the intrinsic risks of health care interventions.”
We extracted domains from a meta-synthesis of qualitative studies on patients’ experiences and perceptions of patient safety in general practices: factors contributing to safety events, experiences of safety events (active failures and harm), and patient and provider responses to safety events.23 Additional domains and themes were obtained from 4 focus groups with 27 primary care users,24 and from 23 instruments identified in our previous systematic review.14
After removing redundant domains and combining overlapping ones, 5 main domains emerged: practice activation (what does the practice do to create a safe environment); patient activation (how proactive is the patient in relation to his or her safety); experiences of patient safety events (errors); outcomes of patient safety events (harm); and overall perceptions of patient safety (how safe patients perceive their practice to be).
Item Identification and Instrument Refinement
An expert committee composed of 5 international experts in patient safety in primary care, 3 local experts, and 2 members of the public (see Acknowledgments) was convened to support the development of the questionnaire (Figure 1).
Items were extracted from previous instruments14 to generate an item pool, which was further populated with items proposed by the development team based on the literature reviews and the focus groups. Response scales were homogenized wherever feasible. A first draft of the questionnaire was produced and then revised in an iterative process (4 iterations over 12 months) informed by repeated feedback from the expert committee.
Four waves of cognitive testing using the think-aloud technique were undertaken, including 13 individual interviews lasting 45 to 60 minutes carried out with members of the public purposefully selected to represent a range of sociodemographic backgrounds.25
In a pilot with 1,975 patients in 26 English general practices, we tested the feasibility of administering a pretrial version of the instrument, the Patient Reported Experiences and Outcomes of Safety in Primary Care questionnaire (PREOS-PC 0.1); the resulting information also fed into an additional round of expert committee feedback and 5 additional cognitive interviews.
Psychometric Evaluation
In June 2014, the trial version of the questionnaire was sent to 6,736 patients registered in 45 practices purposefully sampled to ensure maximal variation in practice size and levels of deprivation and distributed across 5 regions in the North, Center, and South of England. Each practice sent the questionnaire to a computer-generated random sample of 150 patients aged 18 years and older who had had at least 1 contact with the practice in the last 12 months. Due to funding constraints, a reminder was feasible only for 10 practices, and it was sent after an interval of approximately 2 weeks.
Information on practice characteristics is available in Supplemental Appendix 1 (http://www.annfammed.org/content/14/3/253/suppl/DC1). Practices were asked to complete PC-SafeQuest,26 a measure of health care professionals’ perceptions of the safety climate of their practice. Ethical approval was granted by Nottingham Research Ethics Committee (Reference 13/EM/0258; July 2013).
The acceptability of the questionnaire was evaluated through examination of individual item response rates. Scale scores were calculated as the percentage of the maximum score achievable on all items, with scores ranging from 0 (very dissatisfied, totally disagree, etc) to 100 (very satisfied, totally agree, etc). Where responses were missing for 50% or more of the items in a scale, the scale was scored as missing; otherwise a score was derived from the available items without imputation.
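The scoring rule above can be sketched as follows; this is a minimal illustration under the assumption that each item is answered on a 0-to-maximum numeric scale (the actual response options vary across the questionnaire):

```python
def scale_score(responses, max_per_item):
    """Score a scale as a percentage of the maximum achievable score.

    responses: one value per item, with None marking a missing response.
    Per the rule described in the text, the scale is scored as missing
    (None) when 50% or more of its items are unanswered; otherwise the
    available items are averaged without imputation and rescaled to 0-100.
    """
    n_items = len(responses)
    answered = [r for r in responses if r is not None]
    if n_items - len(answered) >= n_items / 2:
        return None  # too much missing data: score the scale as missing
    return 100.0 * sum(answered) / (len(answered) * max_per_item)
```

For example, a 4-item scale scored 0 to 4 per item with one item missing, `scale_score([4, 2, 0, None], 4)`, returns 50.0, whereas two missing items out of 4 yield a missing scale score.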
Internal consistency was deemed acceptable where inter-item correlation coefficients were at least 0.3,27 and Cronbach’s α was at least 0.7.28 Test-retest reliability was analyzed using 1-way random-effects intra-class correlations (ICC), with a threshold ICC of at least 0.7, using data from a sample of 235 respondents who had been invited to complete the instrument twice approximately 2 weeks apart.15
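For illustration, Cronbach’s α can be computed from the item variances and the variance of the total score; a minimal sketch (complete cases only, hypothetical data):

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a multi-item scale.

    items: a list of k lists, each holding one item's scores across the
    same respondents (complete cases only, for simplicity).
    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(v) for v in items)
    return k / (k - 1) * (1 - item_var / variance(totals))
```

Two perfectly correlated items give α = 1.0; α approaches 0 as items become unrelated. Mean inter-item correlation complements α because α also rises mechanically with the number of items.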
Confirmatory factor analysis was conducted to examine the construct validity of the pre-hypothesized scales. Goodness-of-fit statistics examined included the Satorra-Bentler χ2 statistic, comparative fit index (CFI), and standardized root-mean residual (SRMR). We followed Hu and Bentler’s recommendation for model evaluation,29 applying the combinational rule of CFI greater than 0.95 and SRMR less than 0.09. Construct validity was further examined by means of (1) pre-specified group differences, testing whether mean scores discriminated among defined groups of (a) users in line with hypothesized differences (age, ethnicity, language, country of origin, number of long-term conditions and of medications) and (b) practices (practice size, deprivation, proportion of patients aged at least 65 years, and safety climate as characterized by PC-SafeQuest); and (2) observed correlations among PREOS-PC scales with a priori hypothesized relationships.
To examine the performance of each scale as a measure of safety at the practice level, we calculated the standard error of a practice mean score as a measure of precision of measurement and the reliability coefficient (based on the between-practice intra-cluster correlation coefficient) as a measure of ability to discriminate between practices. Both measures are influenced by sample size: we based them on the mean number of patients per practice, but also estimated the sample size required to achieve reliable discrimination between practice scores at the 0.7 level.
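The paper does not spell out its formulas, but a standard approach applies the Spearman-Brown relationship to the between-practice ICC: the reliability of a practice mean of m patient scores is m·ICC / (1 + (m − 1)·ICC), which can be inverted to give the sample size needed for a target reliability. A sketch under that assumption:

```python
def practice_mean_reliability(icc, m):
    """Reliability of a practice mean of m patient scores (Spearman-Brown)."""
    return m * icc / (1 + (m - 1) * icc)

def patients_needed(icc, target=0.7):
    """Patients per practice needed for the practice mean to reach the
    target reliability (algebraic inverse of the formula above)."""
    return target * (1 - icc) / (icc * (1 - target))
```

Consistent with the Results, a between-practice ICC of roughly 0.02 to 0.03 implies that on the order of 100 patients per practice are needed to reach reliability 0.7.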
Finally, post-hoc sensitivity analyses were carried out to examine the magnitude of potential response bias. In the subgroup of practices where reminders were sent, we used hierarchical regression models (adjusting for clustering effect) to compare patient characteristics and scale scores between patients responding to initial invitations and those responding to reminders. In order to account for skewed score distributions, bootstrap methods (50 samples) were used.
All data manipulation and analysis was conducted using Stata version 12.0 (StataCorp LP).
RESULTS
PREOS-PC
The Patient Reported Experiences and Outcomes of Safety in Primary Care (PREOS-PC) instrument invites patients to report on their perceptions and experiences concerning the safety of the health care received in their primary care practice over the past 12 months (Table 1). The trial version (PREOS-PC 0.2) contained 54 standardized items and 7 open-ended questions. Forty-two standardized items were distributed across 11 scales covering all 5 domains. The remaining 12 standardized items captured details on a specific event (where did the event occur; what actions were taken, etc) and therefore were not part of any scale, since their purpose was descriptive rather than evaluative.
Response Rate
The overall response rate was 18.5% (1,244/6,736), an average of 28 responses per practice. The response rate for patients who received a reminder (29.6%; 354/1,195) was almost double that for patients who did not (16.1%; 890/5,541).
Compared with the overall characteristics of all eligible patients registered in the 45 participating practices, respondents were more likely to be female (59% vs 51%), at least 65 years old (39% vs 19%), and of white ethnicity (91% vs 82%) (Table 2). In our sensitivity analyses comparing demographic characteristics and scale scores between patients responding to initial invitations and those responding to reminders, we observed that the youngest and oldest age groups and those taking fewer than 4 medications were less likely to respond to the first mailing (Supplemental Appendix 2, http://www.annfammed.org/content/14/3/253/suppl/DC1). No differences in scores between those 2 groups were observed for any of the scales, however.
Acceptability
Median item response rate was 91.3% (interquartile range 69.6% to 92.4%). When items were ranked according to nonresponse, all items in the lowest quartile pertained to the “experiences of the most recent safety problem” construct.
There was no evidence of significant ceiling or floor effects except for 2 items: “harm causing increased personal needs” and “harm causing increased financial needs” (80.1% and 80.4% of patients reporting “not at all,” respectively).
Reliability
The 6 pre-hypothesized multi-item scales demonstrated high internal consistency (Cronbach’s α, 0.75 to 0.96) and adequate homogeneity (inter-item correlations, 0.22 to 0.83) (Table 3). Test-retest intra-class correlation coefficients, however, were above the 0.7 standard for only 2 of the 11 scales (practice activation and harm specific to the health domain).
Practice-Level Precision and Discrimination
Taking a standard error of 5 points on the scale of 0 to 100 as indicating good precision, practice mean scores for all the globally applicable scales except patient activation demonstrated high precision. Practice means on the subset of specific scales (ie, patients who reported harm), however, showed very low precision (in all cases a standard error of more than 13 points).
Between-practice ICCs were mostly low (less than 0.03), suggesting that patient scores only weakly clustered within practices. This is reflected in the low reliability coefficients (all less than 0.7), indicating that although precise, the practice mean scores do not discriminate well between practices in terms of patient perceptions of safety. For most scales, however, a sample of around 100 patients would be sufficient to produce scores that discriminate well (ie, with reliability of at least 0.7).
Validity
Structural Validity
Confirmatory factor analysis was performed on the 5 multi-item scales with more than 2 items and provided evidence for high structural validity (Supplemental Appendix 3, http://www.annfammed.org/content/14/3/253/suppl/DC1). Three of the models met Hu and Bentler’s criteria,29 suggesting adequate goodness-of-fit. Moderately high item-total correlations, high internal consistency coefficients, and the results of the factor analysis indicated that each scale measures a single construct, and that the items can be combined to produce summary scores.
Construct Validity
The great majority of pairwise correlations supported our pre-specified hypotheses (Supplemental Appendix 4, http://www.annfammed.org/content/14/3/253/suppl/DC1). Whereas the results from the analyses of hypothesized differences between groups of patients generally supported the construct validity of the scales examined, the results from the analyses based on practice characteristics were largely inconclusive (Table 4).
Further Modifications and Final Version of PREOS-PC
Final modifications were made to PREOS-PC based on the results of the psychometric analyses (Supplemental Appendix 5, http://www.annfammed.org/content/14/3/253/suppl/DC1). The modifications mostly concerned the 3 single-item scales in the harm domain (“time to recover from overall harm”; “amount of overall harm experienced”; and “impact of overall harm on overall health”). They were removed because they measured constructs very similar to the 3 multi-item harm-related scales that remained in the questionnaire, which demonstrated better psychometric properties. The final version of PREOS-PC includes 58 items and 8 scales (Supplemental Appendix 6, http://www.annfammed.org/content/14/3/253/suppl/DC1).
DISCUSSION
The PREOS-PC instrument has been developed as a tool to provide a comprehensive measure of patient-centered evaluations of patient safety in primary care, filling a gap identified in a previous systematic review.14 It was developed following the highest standards of instrument development, and this study provides preliminary evidence supporting its reliability and validity.
Strengths and Limitations
This study presents a number of methodological strengths. Evidence of the content and face validity of PREOS-PC is supported by the development of the conceptual model, the preparatory qualitative work undertaken,23 a systematic review of instruments,14 and the iterative process of questionnaire development, which was supported by an expert committee. The questionnaire covers all of the key dimensions of our conceptual framework for primary care patient safety. It was piloted in a large sample of adults registered at a wide range of practices across England. Well-established procedures for the assessment of patient-reported instruments15 were applied to examine its reliability and validity.
In terms of limitations, our study had a low response rate (18.5%), substantially lower than response rates from similar large-scale surveys such as the GP Patient Survey,30 which had a response rate of 39%. The subsample of patients who received a reminder demonstrated a substantially higher response rate (29.6%); it seems reasonable to anticipate that the inclusion of a second reminder (as was the case for the GP Patient Survey) could have increased the response rate even further.
Nonresponse can constitute a bias, since nonrespondents might differ from respondents on the key measures of interest. Meta-analyses suggest that, as long as rigorous probability sampling processes (such as those used in our study) are followed, the association between response rates and nonresponse bias within samples is generally weak.31 Our post-hoc analyses showed that although the low response rate resulted in an over-representation of elderly and polymedicated patients, this did not affect the scale scores, suggesting that response bias did not significantly limit our estimates of the psychometric properties of the instrument.
We observed skewed score distributions for a number of items and scales. Skew is common, however, in questionnaires assessing patients’ views of medical care32,33 and does not necessarily limit the ability to reliably distinguish practices and patient subgroups with sufficient sample sizes such as ours.34
The acceptability of the “Most recent safety problem” section was relatively low, with only 60% of eligible participants adequately completing that section. This could be partially explained by potentially unclear instructions in the branching question preceding that section. This has subsequently been amended to increase clarity. It may also suggest, however, that some patients are reluctant to provide what might be considered overly detailed information about the safety problems experienced.
A substantial proportion of the scales included a low number of items, and 5 of them were based on single items. This constitutes a limitation, since short scales usually offer lower accuracy and reliability than scales with a higher number of items. Also, test-retest reliability could not be examined for 4 of the harm scales due to an insufficient number of harm cases. This has minor implications for the instrument, since 3 of these scales have been excluded from the final version. Five of the remaining scales demonstrated low test-retest reliability, suggesting that they are not adequately stable over time. This might point to interpretation issues; further cognitive testing is needed to inform potential item modification.
We computed scale scores for patients responding to more than 50% of scale items. Measurement errors will be somewhat larger for patients close to the 50% threshold; a stricter threshold, however, would result in more patients being fully excluded from the calculation of practice-level scores, potentially increasing the error and bias on those scores, particularly if item nonresponse is related to patient characteristics or experience. We considered 50% to offer a reasonable balance between these 2 sources of error and bias. Also, analyses of the psychometric properties were not stratified by levels of service use, and therefore we cannot ascertain the extent to which reliability of the scales was influenced by the number of interactions that patients had with their primary care providers.
Finally, some features of the scales are worth noting, namely the extremely high Cronbach’s α for “harm specific health domains” (0.96, which may suggest item redundancy); the low inter-item correlation in the “experiences of safety problems” scale (0.22, which suggests that the problems were largely independent of one another); and the low test-retest coefficient for “harm: health care, personal care, and financial needs” (−0.02, presumably a result of the low number of patients reporting harm in our retest sample).
Future Steps
Further work is needed before general application of the instrument. Additional developments will include the assessment of the instrument’s responsiveness to change (important if the instrument is to be used as an outcome measure in intervention studies). The development of formal methods for interpretation of the scores is pending, although provider benchmarking may in itself substantially contribute to this aim. In addition, further work comparing levels of patient safety as measured with PREOS-PC against other measures of the concept is still needed to support the validity of the instrument. Although versions of the current length may be appropriate for research purposes, shorter versions may present some advantages for service improvement. Rasch modeling is especially suitable for identifying redundant items.35 This work is currently underway, as is the examination of the acceptability and validity of alternative methods of administration (online and in the practice). Future steps will also include the translation of PREOS-PC into a number of different languages, and its cross-cultural adaptation and validation.
In sum, then, PREOS-PC provides a comprehensive measure of patient-reported experiences and outcomes of safety in primary care. Results from psychometric analysis support its internal consistency and validity, though findings for test-retest reliability were mixed. Further work is needed before general application of the instrument.
Acknowledgments
We would like to thank the following participants and organizations involved in the development of the questionnaire:
Dr Itziar Larizgoitia (WHO), Prof Tony Avery (University of Nottingham), Prof Stephen Campbell (University of Manchester), Prof Charles Vincent (University of Oxford), Dr Angela Coulter (University of Oxford), Dr Sarah P Slight (University of Durham), Dr Umesh Kadam (University of Keele and previously Arthritis Research UK Primary Care Centre), Ms Liz Thomas (Action against Medical Accidents (AvMA)), Mr Derek Shaw, and Mr Antony Chuter (members of the public) for their participation as members of the Expert Committee.
Dr Daniela Gonçalves Bradley and Dr Suzanne Shale (Oxford University) for their contribution in the development of the conceptual framework of patient safety.
Mr Antony Chuter (member of the public) and Ms Liz Taylor (Action against Medical Accidents (AvMA)) for their support in recruiting members of the public for the focus groups.
Ms Kate Marsden (University of Nottingham), Dr Katherine Perryman (University of Manchester), Ms Jane Barnett (University of Southampton), Dr Ian Litchfield (University of Birmingham), Ms Sally Thomas (University of Keele) and all the health professionals in the 45 practices involved in this project for distributing the questionnaires to the patients as part of its pilot-testing.
Dr Brian Bell for providing the data on the characteristics of the practices.
Finally, we would like to thank all the patients and members of the public who participated in the cognitive interviews and focus groups and who completed the survey.
Footnotes
Conflicts of interest: authors report none.
Funding support: This research is part-funded by the UK National Institute for Health Research School for Primary Care Research (NIHR SPCR). The views expressed are those of the authors and not necessarily those of the NIHR, the NHS, or the Department of Health.
Supplementary materials: Available at http://www.AnnFamMed.org/content/14/3/253/suppl/DC1/.
- Received for publication August 20, 2015.
- Revision received December 7, 2015.
- Accepted for publication December 21, 2015.
- © 2016 Annals of Family Medicine, Inc.