Abstract
PURPOSE Physicians’ self-assessments used to guide quality improvement or board certification activities often do not correlate with more objective measures, and physicians may spend valuable time on activities that reinforce their strengths instead of addressing gaps. Our objective was to study whether viewing quality measures, with peer comparisons, would affect the selection of certification activities.
METHODS We conducted a cluster-randomized controlled trial—the Trial of Data Exchange for Maintenance of certification and Raising Quality (TRADEMaRQ)—with 4 partner organizations during 2015-2017. Physicians were presented their quality data within their online certification portfolios before (intervention) vs after (control) they chose board certification activities. The primary outcome was whether the selected activity addressed a quality gap (a quality area in which the physician scored below the mean for the study population).
RESULTS Of 2,570 invited physicians, 254 completed the study: 130 in the intervention group and 124 in the control group. Nearly one-fifth of participating physicians did not complete any certification activities during the study, and a sizable minority of those in the intervention group (18.4%) never reviewed their quality dashboard. Overall, just 27.2% of completed certification activities addressed a quality gap, and there was no significant difference in this outcome between the intervention and control groups in either bivariate or adjusted analyses (odds ratio = 1.28; 95% CI, 0.90-1.82).
CONCLUSIONS Physicians did not use quality performance data in choosing certification activities. Certification boards are being pressed to make their programs relevant to practice, less burdensome, and supportive of quality improvement in alignment with value-based payment models. Using practice data to drive certification choices would meet these goals.
- electronic health records
- health information technology
- clinical quality measures
- quality improvement
- randomized controlled trial
- primary care
- quality of care
- continuing medical education
- practice-based research
INTRODUCTION
Nearly 20 years ago, the Institute of Medicine (now the National Academy of Medicine) detailed the uneven quality of the US health care system in its landmark report Crossing the Quality Chasm.1 Focused continuing medical education and quality improvement (QI) offer physicians options to address poor quality of care. The American Board of Medical Specialties member boards’ response to the Quality Chasm series produced the Maintenance of Certification process, which aims to help address gaps in quality by requiring physicians to regularly demonstrate lifelong learning and participation in QI. These activities are often ineffective, however, because physicians commonly choose activities in areas where they are already competent, comfortable, or both.2-4 Physicians who perceive their knowledge gaps are more likely to show improvement from participation in continuing medical education activities.5 Like most health professionals,6 however, physicians are poor at assessing their own competence when compared with formal external assessment.7 Relying on physicians’ self-assessment to guide their choices of learning activities conflicts with physician learning theories suggesting that physicians generally select self-directed activities in response to immediate clinical problems.8 If physicians are not able to accurately self-assess their practice gaps, they may be spending valuable time on activities that are unlikely to improve care.
Ahead of the Continuing Board Certification: Vision for the Future Commission report of the American Board of Medical Specialties,9 the American Board of Family Medicine (ABFM) set goals to reduce the burden of its certification program and provide more meaningful tools for practice assessment and targeting QI.10,11 As part of this goal, the ABFM created the PRIME registry to enable whole-panel electronic health record (EHR) data to produce quality measure dashboards that enable reporting for value-based payment and that feed into certification activities. The ABFM recognized that many of its diplomates may not participate in PRIME and wanted to test quality measure exchanges with health systems to provide an alternative pathway. The Trial of Data Exchange for Maintenance of certification and Raising Quality (TRADEMaRQ) was conducted in partnership with organizations (health systems) to reinforce quality assessment and improvement, make certification activities more relevant to practice needs, and reduce clinician burden. Our objective was to study whether viewing quality measures with peer comparison data would affect the types of self-assessment and QI efforts that family physicians chose as part of their ABFM certification process.
METHODS
Study Design
TRADEMaRQ was a practice-level cluster-randomized trial. The ABFM partnered with 4 organizations having mature, physician-level clinical quality measure processes: Group Health Cooperative of Puget Sound, now Kaiser Permanente Washington (KPWA); Kaiser Permanente Colorado (KPCO); OCHIN (a non-profit health care innovation center focused on the needs of community health centers, small practices, and critical access hospitals); and Southeast Texas Medical Associates (SETMA). Physicians were eligible for the study if they were participating in ABFM certification. Our clinical partners recruited family physicians in their networks or participating networks via face-to-face and e-mail solicitations. The ABFM provided study information in written form or via webinar, as desired by the organization, after which those physicians wanting to participate provided consent to their organization. We used a single-blind cluster randomization procedure at the clinic level to assign physicians to an intervention group or a control group. With each subsequent enrollment wave, physicians in a clinic that had already been randomized were automatically assigned to that group.
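A minimal sketch of this wave-based, clinic-level assignment logic is shown below. It is illustrative only; the clinic identifiers, arm labels, and random-number source are assumptions, not the study's actual randomization implementation.

```python
import random

def assign_wave(physicians_by_clinic, existing_assignments, rng=random.Random(2015)):
    """Assign each clinic in an enrollment wave to a study arm.

    Clinics randomized in an earlier wave keep their original arm;
    clinics new to the study are randomized 1:1 at the clinic level,
    so every physician in a clinic shares one assignment.
    """
    assignments = dict(existing_assignments)
    for clinic in physicians_by_clinic:
        if clinic not in assignments:  # only clinics not yet randomized
            assignments[clinic] = rng.choice(["intervention", "control"])
    return assignments

# Hypothetical example: wave 2 adds clinic C; clinics A and B keep their prior arms.
wave2 = {"A": ["dr_1", "dr_2"], "B": ["dr_3"], "C": ["dr_4", "dr_5"]}
prior = {"A": "intervention", "B": "control"}
print(assign_wave(wave2, prior))
```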
Quality measure data were transmitted to the ABFM every 2 to 4 weeks, depending on partner capabilities, and instantiated into physicians’ certification portfolios to allow presentation of performance and peer comparisons. Participants in the intervention group were presented with their quality measure dashboard inside their ABFM physician portfolio before choosing a certification activity (Supplemental Figure 1). We hypothesized that physicians who spent more time reviewing their quality dashboard would be more likely to select an activity that reflected quality gaps, so we built a time-tracking feature into this section of the application to capture time spent reviewing the dashboard. Control group participants were shown their quality dashboard after making a choice and could use its data to complete a QI activity.
During the study, participants had the same ABFM continuous certification requirements as nonparticipants, which included completing, on average, 3 certification activities of their choice in a 3-year period, with 1 being a QI activity. These activities were a diverse set of knowledge self-assessments, clinical self-assessments, and QI activities. Participants were not required to complete any specific activity, and the intervention was solely providing quality data before starting an activity. The study was designed to run 2 years but was extended to 3 years to increase the chance each participant would complete an activity. Additionally, some activities could be completed without accessing the portfolio.
Measures
Participating organizations agreed on 19 certified quality measures for the study (Supplemental Table 1). These encompassed both process and outcome measures, some of which could be affected by physician action alone (eg, diabetic foot examination) and others of which also depended on patient action. Data on performance were presented in the portfolio as a dashboard along with the mean value for the study population overall to support peer comparison. Clinic site was provided by the partner organization at enrollment. Other demographic data were obtained from ABFM administrative databases and included age, sex, degree type, international medical graduate status, and most recent ABFM certification examination score. We hypothesized that physicians with higher board scores would be more likely to select activities reflecting lower performance.12
Our primary outcome was whether the clinical topic or focus of a certification activity addressed a quality gap by aligning with a clinical quality measure on which the physicians’ performance was below the study population mean. Two physician authors (R.L.P., L.E.P.) reviewed the list of activities completed and came to consensus on the match of measures to activities. Our secondary outcome was time spent reviewing the quality dashboard. We counted only time reviewing the dashboard in the 2 weeks before starting an activity for main analysis but, in a sensitivity analysis, extended this window to 3 weeks.
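As an illustration of how this primary outcome can be operationalized, a short sketch follows. The measure identifiers and data structures are hypothetical; in the study, the match of measures to activities was made by reviewer consensus rather than by code.

```python
def addresses_quality_gap(activity_measures, physician_scores, population_means):
    """Return True if an activity maps to any measure on which the physician
    scored below the study population mean (ie, the activity addresses a gap).

    activity_measures: measure IDs judged (by reviewer consensus in the study)
        to match the activity's clinical topic.
    physician_scores / population_means: mappings of measure ID -> performance.
    """
    return any(
        measure in physician_scores
        and physician_scores[measure] < population_means[measure]
        for measure in activity_measures
    )

# Hypothetical example: a diabetes self-assessment mapped to 2 measures.
scores = {"dm_foot_exam": 0.62, "dm_a1c_control": 0.81}
means = {"dm_foot_exam": 0.70, "dm_a1c_control": 0.75}
print(addresses_quality_gap(["dm_foot_exam", "dm_a1c_control"], scores, means))  # True
```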
Analytic Strategy
First, we computed descriptive statistics for our sample and their ABFM certification activities. Differences in physician characteristics and certification activities between the control and intervention groups were then assessed with χ2 tests or t tests. We also calculated the time spent reviewing the dashboard in the 2 weeks and in the 3 weeks before starting an activity.
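For instance, group comparisons of this kind could be run as follows. This is a generic sketch with placeholder values, not the study's data or its SAS code.

```python
import numpy as np
from scipy import stats

# Hypothetical values only; not study data.
# Chi-square test for a categorical characteristic (eg, sex by study group):
counts = np.array([[70, 60],   # intervention: male, female
                   [65, 59]])  # control: male, female
chi2, p_chi2, dof, expected = stats.chi2_contingency(counts)

# t test for a continuous characteristic (eg, age by study group):
rng = np.random.default_rng(0)
age_intervention = rng.normal(48, 9, 130)
age_control = rng.normal(48, 9, 124)
t_stat, p_t = stats.ttest_ind(age_intervention, age_control)

print(f"chi-square P = {p_chi2:.3f}; t test P = {p_t:.3f}")
```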
For the primary outcome, choice of certification activity was the unit of analysis. We used a logistic regression model, nested by physician, assessing whether the certification activity reflected a clinical area in which the physician’s performance was below the population mean. Control variables included physician characteristics and a dummy variable for organization. In a sensitivity analysis, we excluded physicians in SETMA as they had a publicly available quality-reporting system for the measures in the study and may have already known their performance. For our secondary outcome, we limited the sample to choices by intervention physicians who viewed their dashboard anytime within either 2 weeks or 3 weeks before beginning an activity and included an indicator variable for whether the time spent was above the mean time reviewing the dashboard (80 seconds).
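The study's models were fit in SAS (see below); purely as an illustrative sketch, a physician-clustered logistic model of this general form could be specified with generalized estimating equations in Python. The simulated data, column names, and the exchangeable working correlation structure are assumptions for illustration, not the study's actual specification.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulated stand-in data: one row per completed certification activity
# (the unit of analysis), nested within physicians. Column names are hypothetical.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "physician_id": rng.integers(0, 100, n),
    "intervention": rng.integers(0, 2, n),
    "male": rng.integers(0, 2, n),
    "age": rng.normal(48, 9, n),
    "organization": rng.choice(["A", "B", "C", "D"], n),
    "addresses_gap": rng.integers(0, 2, n),
})

# Logistic (binomial) GEE with activities clustered by physician,
# exchangeable working correlation, and an organization indicator.
model = smf.gee(
    "addresses_gap ~ intervention + age + male + C(organization)",
    groups="physician_id",
    data=df,
    family=sm.families.Binomial(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(np.exp(result.params))  # exponentiated coefficients = odds ratios
```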
The study was powered for a different, originally proposed outcome: the change in quality measures between study groups. With an α of .05, a randomization unit of 6 physicians (the average number of physicians per clinical site), a standard deviation of change of 20%, and an intraclass correlation coefficient of 0.08, we estimated 80% power to detect a 2% difference in low-density lipoprotein cholesterol goals if 2,761 physicians enrolled and a 4% difference if 969 physicians enrolled. We conducted a post hoc power analysis for the revised primary outcome. The original study period was 2 years; however, the study was extended by a year to allow a full 3-year certification cycle.
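Under the standard variance-inflation approach to cluster randomization (an assumption about the form of the calculation, not a reproduction of the original one), the stated average cluster size of 6 physicians and intraclass correlation coefficient of 0.08 imply a design effect of

$$\mathrm{DEFF} = 1 + (m - 1)\,\rho = 1 + (6 - 1)(0.08) = 1.40,$$

meaning each physician in this cluster-randomized design contributes roughly 1/1.40 ≈ 0.71 as much information as an independently randomized physician would.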
All analyses were conducted in SAS version 9.4 (SAS Institute Inc). The study was approved by the American Academy of Family Physicians’ institutional review board after all parties entered into a nationwide authority consortium agreement approved by the Office for Human Research Protections.
RESULTS
We contacted 2,570 physicians about the study, of whom 2 were ineligible (Figure 1). A total of 269 physicians enrolled and were randomized, over multiple waves, with 137 assigned to the intervention group and 132 to the control group. Thirteen left the trial because of changing jobs, retirement, or withdrawal. Our final sample therefore included 130 physicians in the intervention group and 124 in the control group, with characteristics balanced between groups (Table 1).
Overall, 80.3% of participants completed at least 1 certification activity during the 3-year study period (Supplemental Table 2), and participants completed more than 100 unique activities. Intervention physicians were less likely to complete an activity than control physicians (75.4% vs 85.5%, P = .04). Almost 20% of intervention physicians never reviewed their quality dashboard, with variation across organizations ranging from 10.9% to 34.6%, and only 45.4% viewed their dashboard within the 2 weeks before starting a certification activity (Table 2). The median time spent reviewing the dashboard in those 2 weeks was 82 seconds (interquartile range = 52 to 155 seconds), with organization-level medians ranging from 0 to 86 seconds.
The mean percentage of certification activity choices that reflected a quality gap (an area where the physician’s quality performance was below the study population mean) was 28.5% overall, with no difference between intervention and control groups (31.1% vs 25.9%) (Table 3). Additionally, there was no significant between-group difference within each organization.
In adjusted nested logistic regression models, there was no significant association of intervention vs control study group with selecting a certification activity that addressed a quality gap (odds ratio [OR] = 1.28; 95% CI, 0.90-1.82) (Table 4). Male physicians had higher odds of selecting such certification activities (OR = 1.51; 95% CI, 1.06-2.16), while physicians having board examination scores in the middle and top tertiles had lower odds. There was significant variation by organization. Results were similar in our sensitivity analysis that excluded participants in SETMA (Supplemental Table 3).
In the analyses restricted to intervention group physicians (Table 5), there was no significant association between time spent reviewing the quality dashboard and selecting a certification activity that addressed a quality gap using either a 2-week look-back period (OR = 1.09; 95% CI, 0.46-2.55) or a 3-week look-back period (OR = 0.91; 95% CI, 0.39-2.08).
Our post hoc power analysis found that we had 7% power to detect a statistically significant difference between the intervention and control groups for the primary outcome.
DISCUSSION
Physicians in health systems with the capability to provide quality measures largely did not use the data provided in their ABFM physician portfolio to select certification activities aimed at addressing quality areas with lower performance. The negative result of this study is concerning as physicians are under increasing time pressures with EHR documentation and nondirect patient care tasks,13,14 which may be a root cause of burnout.15 Aligning certification activities with practice improvement using data already gathered from the EHR should reduce burden while also increasing the relevance and meaningfulness of these activities.
Our results indicate that nearly 3 in 4 certification activities completed by physicians in this trial did not focus on quality areas where their practice data showed room for improvement. This finding is in keeping with existing literature showing that physicians often choose continuing medical education activities in clinical areas of strength.2-4 Given that nearly 20% of our physicians never reviewed their ABFM portfolio quality dashboard and only 45% reviewed it within the 2 weeks before beginning an activity, it is not surprising that we found no evidence that quality data were used to guide the selection of certification activities. Physicians may have accessed an internal quality measure dashboard, but these data should have been similar to those in the ABFM portfolio, albeit without the peer benchmark; it is unknown whether these measures were used to help drive practice-level QI efforts.
Assumptions of our study were that lower quality measure scores reflect lower levels of medical knowledge on a clinical topic, and that physicians will choose knowledge activities to increase their clinical quality scores. A study of general internists suggests that physicians’ medical knowledge, as measured on a certification examination, significantly moderated the association between practice infrastructure and quality of care.16 This association suggests that although quality measure performance may be driven by knowledge of clinic workflows and guidelines, physicians with more medical knowledge are better able to translate those principles into improved patient care. We did not find evidence that higher ABFM examination scores influenced our outcome. Additionally, there may be different motivation in choosing a knowledge activity vs a QI activity, but we had insufficient power to test each separately.
Multicomponent audit and feedback programs using practice data can prompt physicians to perform QI activities that focus on practice gaps with increased relevance.17-19 Our study tested only whether providing comparative performance data could increase alignment of certification activity choices with quality gaps. This nudge played to physicians’ intrinsic motivation to improve care, which has been recommended as a way to improve quality programs,20 and included other components of audit and feedback known to improve care, such as orientation to patient outcome data and support from the health care system.19 Future studies to increase the relevance of quality data to guide certification activity choice may need to be more grounded in the everyday clinic environment, use team approaches, and provide practice facilitation and support over multiple touch points to see an effect.
We found significant variation across organizations in the odds of physicians choosing certification activities related to quality measures with lower performance, with SETMA physicians having the highest odds of selecting such activities. SETMA posts all quality data by physician on its website, so these physicians likely knew their data regardless of the trial. Because of organizational or clinic-level group QI efforts, choice of QI activity may have been driven by factors outside the physician’s control.
Our study had limitations. First, although the study was powered for a different primary outcome, we fell short of our recruitment goal, and post hoc analyses showed the study was underpowered; to partially compensate, we extended the trial by 1 year to capture more certification activity selections. The low participation rate, however, threatens the generalizability of the study results and introduces volunteer bias. Second, we had multiple data issues with our partner organizations, including failure of data transmission, errors in measure calculation, and differences in measure standardization, despite using e-certified quality measures and agreeing on the measures before study initiation. These problems, described in full in another article,21 created implementation issues and occasionally produced incorrect data on dashboards, which may have affected physicians’ use and trust of the dashboard. Third, we studied only choice of certification activity, while physicians may have used their quality data to guide their choice of other continuing medical education activities.
In conclusion, we found that providing physicians their quality measures inside their certification portfolio did not drive their selection of certification activities. Multicomponent interventions with multiple touch points are likely needed to change physician behavior.
Acknowledgments
We would like to thank Miriam Dickson, PhD, for assistance with randomization. Bo Fang, PhD, also contributed to the analysis.
Footnotes
Conflicts of interest: authors report none.
Funding support: This work was supported by an Agency for Healthcare Research and Quality grant (R21 HS 022583-01A1).
Disclaimer: The views expressed are solely those of the authors and do not necessarily represent official views of the authors’ affiliated institutions or funder.
Previous presentations: This work was presented at the North American Primary Care Research Group Annual Meeting; November 16-20, 2019; Toronto, Ontario, Canada.
- Received for publication September 22, 2020.
- Revision received June 24, 2021.
- Accepted for publication July 19, 2021.
- © 2022 Annals of Family Medicine, Inc.