THE EVOLUTION OF KNOWLEDGE ASSESSMENT: ABFM’S STRATEGY GOING FORWARD
======================================================================

* Warren P. Newton
* Thomas R. O’Neill
* David W. Price

Knowledge is a foundation of the public trust in physicians. Knowledge drives diagnosis, treatment, and the shared decision making central to health care. Assessment of medical knowledge has thus been a cornerstone of Board Certification since 1916, when the American Board of Ophthalmology developed the first Board examination. As ABFM rethinks its certification portfolio, it is appropriate to revisit the scientific rationale for assessment of knowledge. What follows frames the key questions and evidence that drive ABFM policy and describes next steps.

## Do Physicians Know What They Know?

It is common for physicians to be confident about what they know. Unfortunately, however, the accuracy of self-assessment of knowledge is uneven and often poor. In the 1990s, Kruger and Dunning conducted a series of experiments with undergraduate students examining their ability to self-assess against objective criteria “in [1 of 3] domain[s] in which knowledge, wisdom, or savvy was crucial: humor, logical reasoning, and English grammar.”1 In each test, there was a relatively narrow range of self-perceived ability, but a much wider range was seen in actual test scores. In each case, the highest objective performers somewhat underestimated their ability and performance. Importantly, however, the lowest performers substantially overestimated their abilities. Subsequently, a substantial literature has documented that the Dunning-Kruger effect is pervasive across different professions,2 including medicine.3

Further complicating self-assessment of clinical knowledge is the challenge of keeping up to date. Modern health care is dynamic, with ongoing and important changes in practice standards. There is good evidence that more experienced physicians may be less likely to apply up-to-date practice guidelines.4 This is particularly challenging in a generalist specialty like family medicine. Consider, for example, recent changes in evidence and practice in areas including COVID diagnosis and management, a new generation of effective agents for diabetes, point-of-care ultrasound, and new guidelines for depression screening and overuse of x-rays for back pain.

The primary goal of ABFM’s major knowledge assessments is thus to provide objective, independent assessment of the knowledge necessary to be board-certified. This is true for both the 1-day Family Medicine Certification Examination and the Family Medicine Certification Longitudinal Assessment (FMCLA).

## Is Assessment of Clinical Knowledge Valid and Fair?

Concerns about bias in standardized testing for undergraduate admissions have led some to question all multiple-choice testing. ABFM, however, believes that assessment of clinical knowledge can be valid and fair. Psychometrics emerged in the 1880s5; since then, many procedures have been developed for assessing specialty-specific clinical knowledge.6,7 Procedures have also been developed for setting standards8 and for detecting questions that may be biased9,10 for or against specific groups of examinees. Standards for testing organizations11,12 have been developed to promote best testing practices. ABFM’s strategies for certification, quality control, and revalidation have been anchored in these foundations.

Some clinicians believe that it is easy to write multiple-choice questions. Like meta-analysis, however, writing clinical multiple-choice questions is easy to do but hard to do well. Substantial effort goes into the development of questions used for ABFM certification. Practicing clinicians write questions, experienced editors work on them, and 2 different committees of practicing clinicians review them. ABFM pre-tests all questions on large samples of examinees, and, since 2013, has regularly used differential item functioning analysis10 to review all examination questions for evidence of racial, ethnic, or gender bias. Although a few questions have been removed as a result of this ongoing review, there seems to be no substantial systematic question-level bias: indeed, our questions slightly favor underrepresented minorities.13 We are committed to continuing this review process and will extend it to rurality.
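To make the idea of differential item functioning concrete, the following is a minimal sketch of the Mantel-Haenszel procedure cited above,9 written in Python. It is an illustration only and is not a description of ABFM’s own analysis pipeline; the function names, the `"ref"`/`"focal"` group labels, and the example counts are all hypothetical. Examinees are matched on total score, a 2×2 table (group by correct/incorrect) is built for each score level, and the tables are pooled into a common odds ratio. A ratio near 1 suggests that matched examinees in both groups have similar odds of answering the item correctly.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(responses):
    """Pool per-score 2x2 tables into a Mantel-Haenszel common odds ratio.

    responses: iterable of (group, total_score, correct), where group is
    "ref" or "focal" (hypothetical labels) and correct is 0 or 1.
    Returns (alpha, delta), or None if the ratio cannot be computed.
    """
    # For each matched score level, count [correct, incorrect] per group.
    strata = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, score, correct in responses:
        strata[score][group][0 if correct else 1] += 1

    numerator = 0.0
    denominator = 0.0
    for table in strata.values():
        a, b = table["ref"]    # reference group: correct, incorrect
        c, d = table["focal"]  # focal group: correct, incorrect
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n

    if numerator == 0 or denominator == 0:
        return None  # not enough information to estimate the odds ratio

    alpha = numerator / denominator
    # Conventional rescaling to a delta metric; values near 0 suggest
    # little DIF, and negative values indicate the item was harder for
    # the focal group than for matched reference-group examinees.
    delta = -2.35 * math.log(alpha)
    return alpha, delta


def expand(group, score, n_correct, n_incorrect):
    """Turn summary counts into individual (group, score, correct) rows."""
    return [(group, score, 1)] * n_correct + [(group, score, 0)] * n_incorrect


if __name__ == "__main__":
    # Made-up counts for one item at a single matched score level.
    item = expand("ref", 10, 8, 2) + expand("focal", 10, 4, 6)
    print(mantel_haenszel_dif(item))
```

In practice, a flagged item would typically also be checked for statistical significance and then reviewed by content experts before any decision to remove it; the statistic alone identifies a difference, not its cause.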

An important advantage of standardized testing is fairness compared with oral examinations and other alternatives. Setting a passing standard that is uniform for all examinees is crucial. The ABFM reviews the passing standard once every 3 years. To help set the standard, ABFM collects extensive data from a pool of participants that is much larger than that of most certification boards. Although setting a standard includes many different analyses, it is ultimately a policy decision. The ABFM Board of Directors reviews the results from several standard-setting procedures8 as well as information about the history of the passing standard over time and predictions for future pass rates. The Board then decides to either retain the existing standard or increase or decrease it by a specific amount.

## Can Board Certification Support Learning and Keeping Up to Date?

Although the major purpose of ABFM certification examinations is to assess knowledge on behalf of the public, there has also been substantial recent interest in using the certification process to support learning and keeping up to date. There is substantial evidence that knowledge decays over time. A “forgetting curve” was first empirically described by Ebbinghaus.14 Studies have since confirmed that, without reinforcement, much newly learned information is quickly forgotten, with further forgetting occurring more slowly over time.15 How and at what level of granularity this applies to clinical knowledge is less clear. Other knowledge, experiences, and cognitive processes as well as normal aging can interfere with information retrieval.16
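The shape of a forgetting curve can be sketched with a simple exponential decay model. This is a common textbook simplification rather than the function Ebbinghaus himself fitted, and the `stability_days` parameter and the values used below are assumptions chosen only for illustration. The sketch shows the qualitative pattern described above: the largest losses come early, and later losses are smaller.

```python
import math


def retention(days_elapsed, stability_days):
    """Fraction of material retained under an illustrative exponential model.

    stability_days is a hypothetical parameter: the number of days after
    which retention has fallen to about 37% (1/e) without reinforcement.
    """
    if stability_days <= 0:
        raise ValueError("stability_days must be positive")
    return math.exp(-days_elapsed / stability_days)


if __name__ == "__main__":
    # Print retention at a few arbitrary time points.
    for day in (0, 1, 7, 30):
        print(day, round(retention(day, stability_days=5.0), 3))
```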

Assessment can drive learning. Ample evidence published over the past 20 years shows the knowledge retrieval that occurs with testing is more effective than “traditional studying” for knowledge retention.17 Many medical schools have recently added frequent regular testing during the basic science years, and students and residents now commonly prepare for exams by using question banks. Feedback after questions further enhances learning and retention.17,18 Finally, there is evidence that the enhancement effect of assessment works both with rote information and more complex skills19 such as clinical reasoning.

These findings drive ABFM’s goal to leverage assessment to support learning and keeping up to date. We aim to help Diplomates identify gaps in knowledge so that they can turn to the AAFP and other sources for focused education. FMCLA will be a mainstay of our efforts. As described in a recent editorial,20 the FMCLA pilot has demonstrated significant learning among participants. Among more than 11,000 Diplomates enrolled in the pilot, 95% used references to answer questions and 84% sought information after completing the test. Eighty-two percent indicated that they had changed their practice as a result of participating in FMCLA, and 89% reported incorporating FMCLA into their usual approach to keeping up with medical knowledge.

Interestingly, stakes seem to matter. Internal data allow us to contrast the performance of over 11,000 Diplomates in the FMCLA pilot with the performance of over 20,000 Diplomates on Continuous Knowledge Self Assessment (CKSA). The 2 activities use a similar pool of questions, have the same IT platform, require 25 questions per quarter, and have the same psychometric scale. Diplomates are more likely to have a higher percentage of correct answers on the higher-stakes FMCLA than on the lower-stakes CKSA. This is also true for the 4,701 Diplomates who have done both activities simultaneously. Assessment used for summative evaluations of knowledge thus seems to serve a valuable role in facilitating learning.

## Conclusion

In service to the public, ABFM will continue to provide independent assessment of the knowledge necessary for Board-certified family physicians throughout their careers. At the same time, we will support continuous learning and keeping up to date. Linking assessments to board certification supports learning. More broadly, working with the AAFP and other partners, our role is to guide self-learning: to provide objective information to Diplomates about gaps in knowledge to help them target their CME. In addition, as part of our commitment to advancing the scientific basis of Board certification, we are collaborating with the American Board of Internal Medicine and the American Board of Medical Specialties to commission an extensive independent review of the evidence undergirding Board certification.21 This comprehensive review will address the foundations of what we do: Do cognitive skills need to be kept current? Is self-assessment of knowledge sufficient? Does testing enhance learning? Do consequences matter? Finally, it is important to keep in mind that knowledge is only one dimension of excellence in clinical care, along with professionalism, the personal commitment to keep up to date and to improve care. Board Certification in Family Medicine is, and always has been, about more than knowledge.

*   © 2021 Annals of Family Medicine, Inc.

## References

1.  Kruger J, Dunning D. Unskilled and unaware of it: how difficulties in recognizing one’s own incompetence lead to inflated self-assessments. J Pers Soc Psychol. 1999;77(6):1121-1134.

2.  Zell E, Krizan Z. Do people have insight into their abilities? A meta-synthesis. Perspect Psychol Sci. 2014;9(2):111-125.

3.  Davis DA, Mazmanian PE, Fordis M, Van Harrison R, Thorpe KE, Perrier L. Accuracy of physician self-assessment compared with observed measures of competence: a systematic review. JAMA. 2006;296(9):1094-1102.

4.  Choudhry NK, Fletcher RH, Soumerai SB. Systematic review: the relationship between clinical experience and quality of health care. Ann Intern Med. 2005;142(4):260-273.

5.  Anastasi A. Psychological Testing, 5th ed. Macmillan Publishing Co Inc; 1982.

6.  Raymond MR. A practical guide to practice analysis for credentialing examinations. Educ Meas. 2002;21(3):25-37.

7.  O’Neill TR, Peabody MR, Stelter KL, Puffer JC, Brady JE. Validating the test plan specifications for the American Board of Family Medicine’s Certification Examination. J Am Board Fam Med. 2019;32(6):876-882.

8.  Cizek GJ, Bunch MB. Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests. Sage Publications; 2007.

9.  Holland PW, Thayer DT. Differential item performance and the Mantel-Haenszel procedure. In: Wainer H, Braun H, eds. Test Validity. Lawrence Erlbaum Associates, Inc; 1988:129-145.

10. O’Neill TR, Peabody MR, Puffer JC. The ABFM begins to use differential item functioning. Ann Fam Med. 2013;11(6):578-579.

11. AERA, APA, NCME. Standards for Educational and Psychological Testing. American Educational Research Association; 2014.

12. National Organization for Competency Assurance. Standards for the Accreditation of Certification Programs. National Organization for Competency Assurance; 2004.

13. O’Neill TR, Wang T, Newton WP. The American Board of Family Medicine’s 8 years of experience with differential item functioning. J Am Board Fam Med. Under review.

14. Ebbinghaus H. Memory: A Contribution to Experimental Psychology. Teachers College, Columbia University; 1885.

15. Murre JMJ, Dros J. Replication and analysis of Ebbinghaus’ forgetting curve. PLoS One. 2015;10(7):e0120644.

16. Schwartz BL, Benjamin AS, Bjork RA. The inferential and experiential bases of metamemory. Curr Dir Psychol Sci. 1997;6(5):132-137.

17. Larsen DP, Butler AC, Roediger HL III. Test-enhanced learning in medical education. Med Educ. 2008;42(10):959-966.

18. Butler AC, Roediger HL III. Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing. Mem Cognit. 2008;36(3):604-616.

19. Phillips JL, Heneka N, Bhattarai P, Fraser C, Shaw T. Effectiveness of the spaced education pedagogy for clinicians’ continuing professional development: a systematic review. Med Educ. 2019;53(9):886-902.

20. Newton W, Baxley E, O’Neill T, Fain R, Rode K, Stelter K. Family medicine certification longitudinal assessment becomes permanent. J Am Board Fam Med. In press.

21. Fraundorf S. Examining the foundational science behind continuing board certification. Presented at Annual American Board of Medical Specialties Virtual Conference, September 23, 2020.