All Quality Metrics are Wrong; Some Quality Metrics Could Become Useful

Michael E. Johansen; Andrew S. Detty; Jonathan Doo Young Yun

doi:10.1370/afm.250087

Published eLetters

Published on: (17 May 2025)
Why Metrics Fail When Meaningful Care Is the Goal
Rebeca Tenajas, Medical Doctor, Master in Medicina Clínica, Family Medicine Department, Arroyomolinos Community Health Centre, Spain
Other Contributors:
David Miraut, Independent Researcher
Dear Editor,

In reading Johansen and colleagues’ editorial (1) on the variable utility of quality metrics in primary care, we were struck by how faithfully their observations echo a warning articulated half a century ago by Charles Goodhart: “When a measure becomes a target, it ceases to be a good measure” (2). Goodhart’s insight, originally framed in monetary policy, has found a close companion in Campbell’s formulation that quantitative indicators used for decision-making become “subject to corruption pressures … apt to distort the very processes they are intended to monitor” (3). Taken together, these dicta provide a useful lens through which to interpret the growing empirical record that Johansen et al marshal.

The United Kingdom’s Quality and Outcomes Framework (QOF) offers the longest-running body of data on pay-for-performance (P4P) in ambulatory care. A 2012 systematic review found improvements in documentation but little persuasive evidence of patient benefit and several signals of gaming and crowd-out of non-incentivised activities (4). Subsequent population-level work detected no acceleration in mortality decline attributable to the scheme, despite substantial expenditure (5). These mixed outcomes illustrate Goodhart’s and Campbell’s concerns: once financial reward is tied to a scorecard, effort accrues to what is counted, not necessarily to what matters.

The administrative burden attached to measurement is now quantifiable. Saraswathula and colleagues estimated that a single US tertiary hospital spends about US $6.1 million annually merely to report mandatory quality indicators, with claims-based measures accounting for most of the cost (6). Such resources could plausibly yield greater population health gains if redirected toward direct care or thoughtfully designed quality-improvement research. Compounding the problem, frontline time is finite: simulation work suggests that delivering all guideline-recommended preventive and chronic care for an average patient panel would require 27 clinical hours per day, an impossibility that highlights the need to prioritise rather than multiply metrics (7,8).

Even when incentives are large, effect sizes may be modest. After more than a decade of the US Hospital Readmissions Reduction Program, refined analyses show only small absolute reductions in readmission rates, partly offset by growth in observation-status stays (9). In practice, clinicians experience these programmes less as levers for improvement than as triggers of moral distress and task illegitimacy, as documented by Brulin and Teoh (10) in the study cited by the editorial and discussed in its related eLetters (11).

Johansen et al rightly call for rigorous pre- and post-implementation evaluation. We would add that evaluability itself should become a design criterion: a metric that cannot be risk-adjusted at reasonable cost, or that lacks a plausible causal pathway to patient-centred outcomes, should not progress beyond the pilot phase. Where evidence shows negligible or waning benefit, planned de-implementation is as essential as roll-out. This stance accords with emerging implementation-science guidance on “de-adoption” of low-value practices, and it reinforces the editorialists’ plea to confine incentives to measures that are time-limited, low-cost, and under clinicians’ control.

Finally, the maxim often attributed to Einstein, “Not everything that counts can be counted,” is more than a rhetorical flourish (12). It reminds us that primary care is built on strong relationships, careful decision-making, and adapting care to each patient’s situation. These important aspects of care are hard to measure with numbers, but they are what build trust and lead to the outcomes people care about most. The editorial shows, with evidence, that focusing too much on detailed performance metrics, without considering warnings like those from Goodhart and Campbell, can erode these essential characteristics of good primary care.

REFERENCES:

1. Johansen ME, Detty AS, Yun JDY. All Quality Metrics are Wrong; Some Quality Metrics Could Become Useful. Ann Fam Med. 2025 Mar 1;23(2):91–2.

2. Goodhart’s law. In: Wikipedia [Internet]. 2025 [cited 2025 May 16]. Available from: https://en.wikipedia.org/w/index.php?title=Goodhart%27s_law&oldid=128507...

3. Campbell DT. Assessing the impact of planned social change. Eval Program Plann. 1979 Jan 1;2(1):67–90.

4. Gillam SJ, Siriwardena AN, Steel N. Pay-for-Performance in the United Kingdom: Impact of the Quality and Outcomes Framework—A Systematic Review. Ann Fam Med. 2012 Sep 1;10(5):461–8.

5. Ryan AM, Krinsky S, Kontopantelis E, Doran T. Long-term evidence for the effect of pay-for-performance in primary care on mortality in the UK: a population study. The Lancet. 2016 Jul 16;388(10041):268–74.

6. Saraswathula A, Merck SJ, Bai G, Weston CM, Skinner EA, Taylor A, et al. The Volume and Cost of Quality Metric Reporting. JAMA. 2023 Jun 6;329(21):1840–7.

7. Porter J, Boyd C, Skandari MR, Laiteerapong N. Revisiting the Time Needed to Provide Adult Primary Care. J Gen Intern Med. 2023 Jan 1;38(1):147–55.

8. Tenajas R, Miraut D. Unhurried Conversations in a Hurried System: Lessons from Spanish Primary Care. Ann Fam Med. 2025 Jan 19;23(1):eLetter.

9. Figueroa JF, Wadhera RK. A Decade of Observing the Hospital Readmission Reductions Program—Time to Retire an Ineffective Policy. JAMA Netw Open. 2022 Nov 17;5(11):e2242593.

10. Brulin E, Teoh K. Performance-Based Reimbursement, Illegitimate Tasks, Moral Distress, and Quality Care in Primary Care: A Mediation Model of Longitudinal Data. Ann Fam Med. 2025 Mar 1;23(2):145–50.

11. Tenajas R, Miraut D. Performance Incentives and Their Unintended Consequences for Family Physicians. Ann Fam Med. 2025 Apr 20;23(2):eLetter.

12. Tenajas R, Miraut D. Evaluating Physician Value through Compassion. Ann Fam Med. 2024 Sep 24;22(4):eLetter.

Competing Interests: None declared.
Published on: (25 February 2025)
RE: All Quality Metrics are Wrong: the origin of that aphorism
Mark H. Ebell, Professor of Family Medicine, Michigan State University
I appreciate the content and spirit of the editorial entitled "All Quality Metrics are Wrong; Some Quality Metrics Could Become Useful". However, the title strikes me as perhaps being an adaptation of Sir Muir Gray's quote about screening for disease: "All screening programmes do harm; some do good as well, and, of these, some do more good than harm at reasonable cost."(1) I had the privilege of meeting Sir Gray and Sir Iain Chalmers, who were co-founders of the Cochrane Collaboration, while cycling through Britain a few years before that quote appeared in 2005. I just thought it would be good to highlight the likely origin of the aphorism and its continuing relevance today.

Sincerely,

Mark H. Ebell MD, MS
Professor of Family Medicine
College of Human Medicine
Michigan State University

1. Gray JAM, Patnick J, Blanks RG. Maximising benefit and minimising harm of screening. Br Med J 2008; 336(7642):480–483. doi: 10.1136/bmj.39470.643218.94

Competing Interests: None declared.