Abstract
PURPOSE Care continuity is foundational to the clinician/patient relationship; however, little has been done to operationalize continuity of care (CoC) as a clinical quality measure. The American Board of Family Medicine developed the Primary Care CoC clinical quality measure as part of the Measures That Matter to Primary Care initiative.
METHODS Using 12-month Optum Clinformatics Data Mart claims data, we calculated the Bice-Boxerman Continuity of Care Index for each patient, which we rolled up to create an aggregate, physician-level CoC score. The physician quality score is the percent of patients with a Bice-Boxerman Index ≥0.7 (70%). We tested validity in 2 ways. First, we explored the validity of using 0.7 as a threshold for patient CoC within the Optum claims database to validate its use for reflecting patient-level continuity. Second, we explored the validity of the physician CoC measure by examining its association with patient outcomes. We assessed reliability using signal-to-noise methodology.
RESULTS Mean performance on the measure was 27.6%; performance ranged from 0% to 100% (n = 555,213 primary care physicians). Higher levels of CoC were associated with lower levels of care utilization. The measure indicated acceptable levels of validity and reliability.
CONCLUSIONS Continuity is associated with desirable health and cost outcomes as well as patient preference. The CoC clinical quality measure meets validity and reliability requirements for implementation in primary care payment and accountability. Care continuity is important and complementary to access to care, and prioritizing this measure could help shift physician and health system behavior to support continuity.
Key words:
- quality
- measures
- continuity of care
- value
- National Quality Forum
- merit-based incentive payment system
- Quality Payment Program
- family medicine
INTRODUCTION
Continuity of care (CoC) is a central tenet of primary care and is associated with fewer hospitalizations and emergency department (ED) visits, better patterns of care utilization, lower costs for patients with chronic conditions and residents of long-term care facilities, and lower mortality.1-13 Despite this, the translation of CoC from a research construct to a clinical quality measure had been limited to a single National Quality Forum (NQF)-endorsed measure (Continuity of Primary Care for Children With Medical Complexity [NQF #3153]; endorsed in 2017), for which endorsement was removed in 2020 because the measure was withdrawn by the developer. Measures endorsed by the NQF, a not-for-profit and non-partisan organization, serve as an important foundation to improve value and safety in health care. To address the lack of clinical quality measurement of a central tenet of primary care and to promote CoC as a quality indicator for primary care physicians (PCPs), the American Board of Family Medicine developed and rigorously tested the validity and reliability of the Measuring the Value-Functions of Primary Care: Physician-Level Continuity of Care measure as a component of its Measures That Matter to Primary Care initiative. The measure received full NQF endorsement in December 2021 (NQF #3617).
The primary goal of the present study was to present the methods and results of the validity and reliability testing for the CoC measure required for NQF endorsement. Although each quality measure is unique, our goal for documenting the measure-testing methodology, outcomes, and key components of the measure application was to help primary care researchers understand the translation from a research measure to a fully endorsed clinical quality measure. Whereas the analytical and statistical tasks involved in clinical quality measure testing resemble those of a research project, this study serves as a representation of the highly rigorous NQF evaluation process, in which a candidate measure addresses both significance and feasibility and also provides evidence-based information for it to be useful for quality improvement.14
METHODS
Data and Population
We used claims data from the Optum Clinformatics Data Mart (CDM, SES version 3.0; Optum Inc) database for a 12-month period (July 1, 2018-June 30, 2019). The Optum CDM comprises administrative claims for large commercial and Medicare Advantage health plans, containing medical claims and laboratory results for 15 to 18 million annual covered lives spanning 50 states. The analysis was limited to patients who had ≥2 primary care visits during the observation year. Primary care visits were defined as visits to PCPs in the outpatient setting. In the Optum CDM, we operationalized this definition with 2 criteria: the health care services categorization code (01) identified PCPs, which included family physicians, general internists, obstetricians/gynecologists, and pediatricians, and place-of-service codes (01, 02, 03, 04, 11, 12, 13, 14, 15, 16, 17, 41, 42, 49, 50, 53, 57, 60, or 71) identified the outpatient setting.
From Patient to Physician Continuity of Care Measure
Multiple measures of CoC, including the Usual Provider Continuity, Modified Continuity, and Herfindahl Indices, exist and have been shown to be highly correlated, suggesting that they are similar.5 We chose to use the Bice-Boxerman Continuity of Care Index (BBI) because it has been used to study CoC in primary care,15-19 it appears in a measure previously endorsed by the NQF, which signals that organization’s belief in its validity, and it does not require one to attribute a patient to a specific PCP. The BBI attempts to capture the dispersion of visits across a set of physicians for individual patients rather than the aggregate population.20 That is, instead of just considering the percent of visits with an attributed PCP, the measure considers how many different PCPs were seen and how many times each was seen. The full formula is as follows1:
$$\mathrm{CoC} = \frac{\sum_{i=1}^{k} \left(\mathrm{TotalVisits}_{phy_i}\right)^{2} - \mathrm{TotalVisits}_{phy_{all}}}{\mathrm{TotalVisits}_{phy_{all}} \left(\mathrm{TotalVisits}_{phy_{all}} - 1\right)}$$

where $\mathrm{TotalVisits}_{phy_{all}}$ reflects the total number of PCP visits by an individual during the observation period, $\mathrm{TotalVisits}_{phy_i}$ reflects the total number of visits to PCP $i$, and $k$ is the number of different PCPs seen. In the case in which all visits were with the same physician, those values are equivalent and would result in a CoC of 1 (the maximum). If an individual saw multiple PCPs but only saw each one a single time, the numerator would be equal to 0 and produce a CoC of 0 (the minimum). Different combinations of the number of different PCPs seen and the number of times seen would produce values between 0 and 1, with higher values indicating greater continuity. In this way, patients are not assigned or attributed to a single PCP, and the BBI reflects a patient-centered measure of continuity.
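As an illustration of the patient-level calculation, the following minimal sketch computes the BBI from a list of physician identifiers, one per PCP visit. It assumes the standard published form of the index (sum of squared per-physician visit counts minus total visits, divided by total visits times total visits minus 1). The function name and the example identifiers are hypothetical and are not taken from the study's analytic code.

```python
from collections import Counter


def bice_boxerman_index(visit_physician_ids):
    """Return the Bice-Boxerman Index for one patient's PCP visits.

    visit_physician_ids: one physician identifier per visit.
    The index is undefined for fewer than 2 visits because the
    denominator N * (N - 1) would be 0.
    """
    total_visits = len(visit_physician_ids)
    if total_visits < 2:
        raise ValueError("BBI is undefined for fewer than 2 visits")
    # Number of visits to each distinct physician.
    visits_per_physician = Counter(visit_physician_ids)
    sum_of_squares = sum(n * n for n in visits_per_physician.values())
    return (sum_of_squares - total_visits) / (total_visits * (total_visits - 1))


# Hypothetical visit patterns.
print(bice_boxerman_index(["A", "A", "A", "A"]))  # all visits with one PCP
print(bice_boxerman_index(["A", "B", "C"]))       # each PCP seen once
print(bice_boxerman_index(["A", "A", "A", "B"]))  # 3 of 4 visits with one PCP
```

In this sketch, a patient with 3 of 4 visits to the same PCP has an index of 0.5, which illustrates that the index is stricter than the simple share of visits with the most frequently seen physician.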
To create a physician-based quality measure, we needed to determine how to consider patient-level CoC from the physician perspective. Although there is no standard for what constitutes high or low continuity, research has shown that patients with a BBI ≥0.7 often experience better outcomes.5 Therefore, we considered 0.7 a reasonable threshold for patient-level CoC. Our physician-level quality measure is then calculated as the number of patients seen who have a CoC of ≥0.7 divided by the total number of patients seen by that physician. For example, if a physician saw 8 patients during the observation period, and 5 had a CoC of ≥0.7, the physician’s performance during that period is calculated as 5 divided by 8, or 62.5%. (It should be noted that the BBI requires that patients have ≥2 visits to any PCP during the observation period, so the physician’s measure only includes those patients seen who had ≥2 PCP visits, ≥1 of which was with that physician.) The physician CoC score ranges from 0% to 100%, and higher values indicate better performance (see formula below).

$$\text{Physician CoC} = \frac{\text{number of patients seen by the physician with CoC} \geq 0.7}{\text{total number of patients seen by the physician with} \geq 2 \text{ PCP visits}} \times 100\%$$
From a statistical standpoint, the physician CoC is essentially a sample proportion, which will inform the methods used to estimate reliability.
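The roll-up from patients to physicians can be sketched as follows. This is an illustrative outline only: the data layout (a dictionary of visit lists keyed by patient), the function names, and the physician identifiers are assumptions made for the example. A patient with ≥2 PCP visits is counted in the denominator of every physician seen at least once and in the numerator when the patient's index is ≥0.7.

```python
from collections import Counter, defaultdict

BBI_THRESHOLD = 0.7  # patient-level cut point described in the text


def bbi(visits):
    """Bice-Boxerman Index for a list of physician ids (one per visit)."""
    n = len(visits)
    counts = Counter(visits)
    return (sum(c * c for c in counts.values()) - n) / (n * (n - 1))


def physician_coc_scores(visits_by_patient):
    """Map physician id -> (numerator, denominator, percent).

    visits_by_patient: dict of patient id -> list of physician ids,
    one entry per PCP visit in the observation period.
    """
    numerators = defaultdict(int)
    denominators = defaultdict(int)
    for visits in visits_by_patient.values():
        if len(visits) < 2:
            continue  # the index needs >= 2 PCP visits
        high_continuity = bbi(visits) >= BBI_THRESHOLD
        for physician in set(visits):  # each physician seen at least once
            denominators[physician] += 1
            if high_continuity:
                numerators[physician] += 1
    return {
        physician: (numerators[physician], total, 100.0 * numerators[physician] / total)
        for physician, total in denominators.items()
    }


# Hypothetical panel reproducing the 5-of-8 example from the text.
example_visits = {f"high_{i}": ["dr_x", "dr_x"] for i in range(5)}
example_visits.update({f"low_{i}": ["dr_x", "dr_y"] for i in range(3)})
scores = physician_coc_scores(example_visits)
print(scores["dr_x"])  # (5, 8, 62.5)
```

Because each score is a count of successes over a count of eligible patients, it can be handled as a sample proportion in the reliability analysis.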
Empirical Testing Requirements for the National Quality Forum
The NQF requires that organizations empirically demonstrate several components of a proposed quality measure to receive endorsement of the measure, which represents an approval by the NQF for the adequacy of the measure for use in practice. Specifically, the NQF requires descriptive statistics of measure performance as well as testing of its validity and reliability. Validity is often determined by correlating performance on the proposed quality measure to subsequent patient outcomes thought to be related to the activities involved in the proposed quality measure. If a correlation exists, that is evidence that the quality measure reflects care that could influence those specific patient outcomes. For reliability, the NQF recommends using a specific signal-to-noise analysis described elsewhere.14 The analytic methods described below reflect our attempt to meet the NQF’s empirical requirements.
Measure Performance
We constructed descriptive statistics for the performance of the physician CoC measure for all tested entities (physicians). These statistics included mean, SD, SE, 95% CI, median, range, and interquartile range of scores across the measured entities.
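The descriptive statistics listed above could be assembled along the following lines. This is a sketch under stated assumptions: the scores are hypothetical, the 95% CI uses a normal approximation (mean ± 1.96 × SE), and quartiles use the default method of Python's `statistics.quantiles`; the text does not specify which conventions were used in the actual analysis.

```python
import math
import statistics


def summarize_scores(scores):
    """Descriptive statistics for a list of physician CoC scores (percent)."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)           # sample standard deviation
    se = sd / math.sqrt(n)                  # standard error of the mean
    q1, median, q3 = statistics.quantiles(scores, n=4)
    return {
        "n": n,
        "mean": mean,
        "sd": sd,
        "se": se,
        "ci95": (mean - 1.96 * se, mean + 1.96 * se),  # normal approximation
        "median": median,
        "range": (min(scores), max(scores)),
        "iqr": (q1, q3),
    }


# Hypothetical scores for 6 physicians.
summary = summarize_scores([0, 0, 10, 20, 50, 100])
for name, value in summary.items():
    print(name, value)
```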
Testing the Validity of the Physician Continuity of Care Measure
Because our measure incorporates both the patient-level BBI and a physician-level construct of the BBI, validity was tested in 2 ways. First, the validity of using 0.7 as a threshold for the patient-level BBI calculation within the Optum claims database was explored to see whether it correlated with better patient outcomes. Specifically, we used logistic regression to estimate the association between achieving the threshold of 0.7 and the odds of ≥1 ED visit in the measurement year. Emergency department visits were selected as an outcome of interest on the basis of the extensive literature that has previously associated greater levels of continuity with lower levels of ED utilization. We adjusted the logistic model for patient sex and race. This analysis was designed to look for evidence of convergent validity (ie, higher patient CoC scores associated with lower odds of having an ED visit).
Second, we explored the validity of the physician-level CoC measure by examining its association with patient outcomes. Using the same data, we calculated for each physician the percentage of patients in their measure denominator who had ≥1 ED visit during the measurement period. The hypothesis was that physicians with higher CoC scores (ie, a greater percentage of patients with CoC ≥0.7) would have a lower percentage of patients experiencing ≥1 ED visit. To examine this association, we used linear regression (PROC REG in SAS; SAS Institute Inc) to estimate 2 models, each using the percent of patients with ≥1 ED visit as the response variable. The first model used physician CoC scores as the only independent variable. The second model also included physician specialty as a covariable to account for differences across specialties. For the second model, we used reference coding (with Pediatrics as the reference). Because the Other category included >1 specialty (such as Maternal Specialist or Adult Medical Specialist) but accounted for only 7% of the physicians, those physicians were excluded from the analysis.
Testing the Reliability of the Physician Continuity of Care Measure
Reliability reflects the ability of a measure to accurately discriminate between entities that truly differ in quality. This was assessed in the current analysis using a signal-to-noise methodology. Specifically, we used the beta-binomial model, as recommended by the NQF and described by Adams,21 to evaluate the reliability of the physician-level measure. This method of signal-to-noise analysis attempts to estimate the variability in the quality measure between physicians as well as the variability within physicians, where the former reflects the signal and the latter the noise. Because the variability of a proportion depends, in part, on the proportion itself (ie, the variance of a proportion is greatest at 0.5 and decreases as it gets closer to 0 or 1), this method calculates a reliability score for each physician. Assessing overall reliability involves evaluating the distribution of reliability scores across the measured population. A reliability of 0 implies that all variability is due to measurement error, whereas a reliability of 1 indicates that all variability is due to real differences in performance. Reliability scores closer to 1 reflect better reliability. A general rule of thumb is that a reliability score of ≥0.7 is acceptable.5 Therefore, evaluating reliability involves evaluating how much of the distribution of all reliability scores is at or above that threshold. Because ≥2 PCP visits are required, the measure is only applicable for patients who have >1 PCP visit during the measurement period. This can limit the number of patients eligible for the denominator for some PCPs; in our results, we observed that many PCPs had small sample sizes (denominators <5) for this measure (we did not calculate how many were excluded because of only 1 PCP visit). Therefore, in addition to examining the reliability scores for the entire population, we also examined the reliability scores among subsets of physicians with larger denominators (≥5 and ≥10 patients).
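The per-physician signal-to-noise calculation can be sketched as follows, assuming the form commonly attributed to the beta-binomial approach: reliability equals between-physician variance divided by the sum of between-physician variance and the physician's own binomial sampling variance, p(1 − p)/n. The beta parameters below are hypothetical placeholders; in practice they would be estimated by fitting a beta-binomial model to all physicians' numerators and denominators, a step this sketch omits.

```python
def between_physician_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution (the 'signal')."""
    total = alpha + beta
    return (alpha * beta) / (total ** 2 * (total + 1))


def physician_reliability(numerator, denominator, between_variance):
    """Signal / (signal + noise) for one physician's proportion."""
    p = numerator / denominator
    within_variance = p * (1 - p) / denominator  # binomial sampling 'noise'
    return between_variance / (between_variance + within_variance)


# Hypothetical beta-binomial parameters (assumed to be already estimated).
signal = between_physician_variance(alpha=0.5, beta=1.3)

print(physician_reliability(5, 8, signal))    # small denominator
print(physician_reliability(50, 80, signal))  # same proportion, larger denominator
print(physician_reliability(0, 3, signal))    # score of 0%: within-variance is 0
```

Two properties of this formula are visible in the example. For a fixed proportion, reliability rises with the denominator, and a physician scoring exactly 0% or 100% has an estimated within-physician variance of 0 and therefore a reliability of 1 regardless of the denominator.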
RESULTS
Patient Population
The data set included a total of 555,213 physicians. Geographically, all 50 states and Puerto Rico were represented, with the number of physicians per state ranging from 135 (Puerto Rico) to 42,343 (Texas). The 5,478,835 patients included in the analysis (ie, those with ≥2 PCP visits during the study period) were more often female (58%), and the race/ethnicity breakdown was 70% White, 10% Hispanic, 10% Black, 5% Asian, and 4% unknown. The sample had a full range of patient age, with 26% aged <35 years, 21% aged 35-54 years, 14% aged 55-64 years, 19% aged 65-74 years, and 19% aged ≥75 years.
Measure Performance
The mean physician performance on the measure across all 555,213 physicians was 27.6%, meaning that, on average, slightly more than one-quarter of a physician’s eligible patients had a continuity score ≥0.7. The SD was 30.6% (suggesting a large amount of variability in physician performance), and the median was 18.0% (indicating a right-skewed distribution, with a tail toward higher performance, because the median is less than the mean). The 25th percentile was 0%, meaning that at least one-fourth of physicians had the worst possible score for continuity. However, the maximum score was 100%, and the 75th percentile was 50%, suggesting substantial spread in physician performance.
Validity
Validity testing of the threshold of 0.7 for patient-level continuity showed that achieving that threshold was significantly associated with decreased odds of having ≥1 ED visit (Table 1; adjusted odds ratio = 0.718, P < .0001). Greater physician-level continuity (ie, greater proportion of the patient panel having ≥0.7 continuity) was associated with a lower percentage of that physician’s patients having ≥1 ED visit in both the unadjusted and adjusted models (Table 2).
Table 1. Validation of Patient-Level Continuity-of-Care Threshold: Odds Ratio Estimates From Adjusted Logistic Regression
Table 2. Validation of Physician-Level Continuity-of-Care Measure: Simple Linear Regression on Percent of Patients With ≥1 ED Visit
Reliability
Figure 1 and its accompanying data table show changes in the reliability testing results on samples with different restrictions. The reliability testing for the entire population of physicians produced a mean reliability of 0.85, a median reliability of 0.95, and a minimum reliability of 0.27. After limiting the sample to physicians with ≥5 and ≥10 patients in the denominator, the minimum reliability increased to 0.49 and 0.65, respectively, whereas the 90th and 75th percentiles decreased. Across all 3 samples, the mean reliability remained in the mid-0.80s. Limiting the reliability analyses to physician samples with ≥5 and ≥10 patients with ≥2 visits excluded 44% and 61%, respectively, of the physicians.
Figure 1. Reliability testing results and accompanying data table.
DISCUSSION
Drawing on a national sample of primary care physicians and a previously validated CoC index, we assessed the properties of physician-level CoC for primary care as a clinical quality measure. The results of reliability testing suggest that the measure meets or exceeds acceptable criteria, based on the majority of reliability scores >0.7. This remained true when the sample was limited to those with ≥5 and ≥10 patients in the denominator. When limiting the sample, the minimum reliability increased from 0.27 to 0.65, suggesting that many low-reliability values might be a function of small sample size as opposed to inherent reliability of the measure itself. The decrease in some of the upper percentiles in the limited samples suggested that a number of reliability values of 1.0 were also excluded, but the 75th and 90th percentiles decreased only slightly, to 0.95 and 0.99, respectively, reflecting that there were many extremely high reliability values, even with larger sample sizes.
In addition, our findings of the significant negative association between physician-level continuity and percentage of patients with any ED visits in a national sample of >500,000 primary care physicians suggest empirical validity and utility of this measure. Other studies have also shown an association between care continuity and care utilization. A Canadian study with similar methods to those in the present study observed an increased rate of ED use among those with low and medium levels of care continuity with a PCP.11 A study of US Medicare patients with dementia reported that compared with those in the highest-continuity group, those with the lowest levels of care continuity had greater levels of hospitalization, ED visits, computed tomography, and overall medical spending.10 In another Medicare-based study, patients in the highest quintile for care continuity had 14.1% lower health care expenditures and 16.1% lower hospitalization rates than those in the lowest-continuity quintile.5 These studies suggest a link between greater care continuity and lower levels of utilization and cost.
The present results should be viewed in light of several limitations. First, the BBI requires ≥2 visits; therefore, it does not include the entire PCP patient population. Second, many physicians practice team-based care, in which a patient has the option to see 1 of multiple clinicians, potentially including a nurse practitioner or physician assistant. Although visits to these types of clinicians would not decrease the BBI as calculated in the present study (because these clinician types were not classified as PCPs), the PCP would also not get credit for these visits, even though they could strengthen continuity within the care team. The structure and makeup of care teams can vary across practices, which makes it challenging to establish a consistent definition of which providers constitute a care team. Identification of care teams is needed to further study and measure team-based continuity of care and assess how this measure could be applied in care settings. Our intent is that future iterations of the measure will incorporate any new research in this area. Finally, many primary care clinics have separated acute, same-day visits from wellness and chronic care visits, aiming to improve access. On the basis of this continuity measure, that separation could be deleterious to outcomes, reflecting a tension between access and continuity worthy of attention. This emphasizes the importance of measuring both access and continuity in primary care settings and studying their relations to outcomes and value.
There is a movement to reengineer primary care practices to achieve the Triple Aim of health reform—better health, improved patient experience, and more affordable costs.22 The 10 building blocks of high-performing primary care are part of a conceptual model that guides practice improvement.22 Four of the 10 are foundational—engaged leadership, data-driven improvement, empanelment, and team-based care—and assist with the implementation of the other 6: patient-team partnership, population management, CoC, prompt access to care, comprehensiveness, and care coordination. In this model, we see that patient empanelment is foundational but insufficient for implementation of CoC.22 The Centers for Medicare and Medicaid Services (CMS) currently has a Quality Payment Program improvement activity using empanelment (CMS IA_PM_12: Population Empanelment).23 Empanelment identifies the patients and populations for whom the Merit-Based Incentive Payment System–eligible physician or group and/or care team is responsible and is the foundation for relationship continuity. It will be important to assess the outcome of empanelment, which is the measure we propose.
The CoC quality measure and the research surrounding it should be a signal to practices and health systems that care continuity is important and should be complementary to efforts to improve access to care. Studies have repeatedly shown that high levels of CoC in primary care are associated with lower care utilization and costs and are preferred by patients and clinicians. Without this measure, access trumps continuity because it also aligns with practice efficiency; a highly valued continuity measure could help balance this tension and drive different appointment and practice strategies. This measure also aligns with the CMS Meaningful Measures Initiative domains of Strengthen Person and Family Engagement as Partners in Their Care, and the Meaningful Measures Area of Care is Personalized and Aligned With Patient’s Goals, and Patient’s Experience of Care. Currently, the physician-level CoC quality measure is being used in the CMS Quality Payment Program Merit-Based Incentive Payment System for individual physicians and group practices and has been used by the PRIME Qualified Clinical Data Registry since 2018. The American Board of Family Medicine is also developing a Merit-Based Incentive Payment System Value Pathway, of which the CoC measure will be a part. The present test of validity and reliability should help the CMS in its adoption of CoC for general use in primary care.
Footnotes
Conflicts of interest: authors report none.
Funding support: The American Board of Family Medicine and the American Board of Family Medicine Foundation.
Disclaimer: The views expressed in this article are the authors’ own and are not an official position of the American Board of Family Medicine, the American Board of Family Medicine Foundation, or The Center for Professionalism and Value in Health Care.
- Received for publication August 20, 2021.
- Revision received July 6, 2022.
- Accepted for publication July 29, 2022.
- © 2022 Annals of Family Medicine, Inc.