HIGH-STAKES KNOWLEDGE ASSESSMENT AT ABFM: WHAT WE HAVE LEARNED AND HOW IT IS USEFUL

Warren P. Newton; Thomas R. O’Neill; Ting Wang

doi:10.1370/afm.2811

Clinical knowledge is fundamental to the social contract between medicine and society. As 1 of the 6 core competencies, appropriate clinical knowledge is effortfully acquired, constantly updated through practice and learning, and regularly assessed independently through board certification—and patients care a lot about it.

It is thus important for ABFM to regularly review the validity of ABFM high-stakes knowledge assessments. In comparison with other common assessments of clinical knowledge—the ward attending who sees the medical student on rounds and asks some questions, patient satisfaction surveys, a medical school specialty advisor who writes a letter of recommendation—a well-constructed multiple-choice exam potentially provides a more standardized approach, greater reliability and scalability, and much less expense. In an age of increased understanding of structural racism, however, it is important to ask whether board certification exams are biased against certain racial and ethnic groups. In recent years, many standardized tests have been accused of bias.^1,2

In this context, the recent report of O’Neill et al provides important information.³ ABFM began to collect data on race and ethnicity of its Diplomates in 2013 in order to assess its high-stakes multiple-choice questions for bias. Differential Item Functioning (DIF) analysis is the industry standard approach to screening questions for bias.^4,5 Briefly, DIF analysis screens multiple-choice questions for differential impact across racial and ethnic groups, controlling for the ability of the test-taker. Any items that are identified by this statistical screening process are then reviewed by a panel of physicians from underrepresented racial and ethnic groups, who are charged to assess whether the underlying clinical concept is appropriate for family physicians. This report summarizes 8 years of DIF testing. The data suggest that about 11% of our questions show a degree of differential performance across groups, but overall, there was no significant advantage to one group over another. Furthermore, close review by the DIF panel concluded that only 0.1% of the questions had an identifiable source of bias that was not an important aspect of family medicine. So, after 8 years, we have determined that there are some questions we will not use going forward, but it is a very modest number. A similar report was published in Academic Medicine about the United States Medical Licensing Examination Step 1 by the National Board of Medical Examiners.⁶ Modern national psychometric tests at the licensure and board certification level seem to have minimized bias of individual questions against major racial and ethnic groups. Given the importance of testing to health equity, ABFM will continue to monitor its questions for bias.
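
As a rough illustration of the statistical screening step described above, the sketch below computes the Mantel-Haenszel common odds ratio, one widely used DIF statistic. The text does not say which DIF procedure ABFM applies, so the choice of Mantel-Haenszel, the function names `mantel_haenszel_dif` and `make_records`, the record layout, and the response counts are all assumptions made for illustration only.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Estimate DIF for one item with the Mantel-Haenszel procedure.

    records: iterable of (group, ability_stratum, correct) tuples, where
    group is "reference" or "focal" and correct is 0 or 1. This layout is
    a hypothetical one chosen for the sketch.
    Returns (common_odds_ratio, mh_d_dif).
    """
    # Build one 2x2 table per ability stratum; index 0 = wrong, 1 = right.
    tables = defaultdict(lambda: {"reference": [0, 0], "focal": [0, 0]})
    for group, stratum, correct in records:
        tables[stratum][group][correct] += 1

    numerator = 0.0
    denominator = 0.0
    for table in tables.values():
        ref_wrong, ref_right = table["reference"]
        foc_wrong, foc_right = table["focal"]
        total = ref_wrong + ref_right + foc_wrong + foc_right
        numerator += ref_right * foc_wrong / total
        denominator += ref_wrong * foc_right / total

    if numerator == 0 or denominator == 0:
        raise ValueError("not enough data in both groups to estimate DIF")

    odds_ratio = numerator / denominator
    # ETS delta scale: values near 0 mean little DIF; negative values mean
    # the item is harder for the focal group at equal ability.
    return odds_ratio, -2.35 * math.log(odds_ratio)


def make_records(group, stratum, n_right, n_wrong):
    """Expand counts into individual response records (illustrative data)."""
    return [(group, stratum, 1)] * n_right + [(group, stratum, 0)] * n_wrong


# Made-up counts: within each ability stratum the reference group answers
# correctly more often, so this item would be sent on for panel review.
sample = (
    make_records("reference", "low", 12, 8)
    + make_records("focal", "low", 4, 6)
    + make_records("reference", "high", 18, 2)
    + make_records("focal", "high", 7, 3)
)
odds_ratio, delta = mantel_haenszel_dif(sample)
print(f"MH odds ratio = {odds_ratio:.2f}, MH D-DIF = {delta:.2f}")
```

Stratifying by ability is what separates item-level bias from genuine group differences in knowledge: an odds ratio near 1 within strata means test-takers of equal ability perform alike on the item regardless of group. A statistical flag alone does not establish bias, which is why flagged items go to the physician panel for content review.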

Furthermore, valid knowledge assessments can help track trends in education. Driven by its commitment to improving health equity, ABFM has begun to look at trajectories of knowledge acquisition by race and ethnicity among family medicine residents. Wang et al publish their results this month in Family Medicine.⁷ Importantly, the In-Training Exam is set on the same psychometric scale as the ABFM Certification Exam, making it possible to characterize the trajectory of knowledge acquisition across the 3 years of residency training up to and including the initial certification examination. Figure 1 illustrates their findings. Their analysis has 3 important findings. First, different racial and ethnic groups start residency at different levels of mean scores on the exam. ABFM believes that the magnitude of these differences is meaningful. Given that residency occurs at the end of a long educational pathway, this should be seen as a wake-up call for undergraduate educational institutions and medical schools (and other educational institutions even further upstream) to address the disparities in clinical knowledge entering residency. Working closely with the ACGME Family Medicine Review Committee, ABFM has demonstrated the value of focused emphasis on clinical knowledge in reducing disparities of examination scores between international and US medical graduates.⁸ Can medical schools focus more effectively on acquisition of clinical knowledge, even as they pay attention to other competencies? A second finding is that the trajectory of clinical knowledge acquisition is similar across racial and ethnic groups; thus, at some level our national residency system provides equality of opportunity. This is fundamental and reassuring. Third, our residencies are not improving equity for historically marginalized groups. ABFM believes that our collective goal should be that all racial and ethnic groups have a similar level of clinical knowledge at the end of residency. This should be one of the goals of our collective efforts in residency redesign.⁹

Figure 1.

Trajectory of knowledge acquisition of family medicine residents by race and ethnicity.

PGY1 = postgraduate year 1; PGY2 = postgraduate year 2; PGY3 = postgraduate year 3; FMCE = family medicine certification exam.

Valid knowledge assessment tools also allow us to assess the impact of the COVID pandemic on clinical knowledge acquisition. Figure 2 shows the trends in family medicine In-Training Exams on a national sample. It is important to keep in mind that these data represent over 14,000 residents, so finding statistically significant differences is not surprising. What is surprising, however, is the magnitude of the differences. A reasonable estimate for a substantively significant difference is about 30-40 points on this scale, which is also about one-half of the standard deviation of the certification exam.¹⁰ This is likely to be a meaningful difference and may suggest that the growth of knowledge of family medicine residents has slowed significantly over the last 2 years.
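
The distinction between statistical and substantive significance can be made concrete with a small sketch. The standard deviation of 70 points and the even split of roughly 14,000 residents into two comparison groups are assumptions chosen only to be consistent with the half-standard-deviation rule of thumb above; they are not values reported by ABFM, and the function names are hypothetical.

```python
import math


def standardized_difference(score_change, sd):
    """Effect size: the score change expressed in standard deviation units."""
    return score_change / sd


def z_statistic(score_change, sd, n_per_group):
    """Two-sample z statistic, assuming equal SDs and equal group sizes."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return score_change / standard_error


ASSUMED_SD = 70.0            # hypothetical scale SD, not a reported value
ASSUMED_N_PER_GROUP = 7000   # hypothetical split of about 14,000 residents

for change in (5.0, 35.0):
    effect = standardized_difference(change, ASSUMED_SD)
    z = z_statistic(change, ASSUMED_SD, ASSUMED_N_PER_GROUP)
    statistically_significant = abs(z) > 1.96   # conventional 5% threshold
    substantively_significant = abs(effect) >= 0.5  # half-SD rule of thumb
    print(change, round(effect, 2), statistically_significant,
          substantively_significant)
```

Under these assumptions, even a 5-point change clears the conventional threshold for statistical significance because the sample is so large, while only a change on the order of 35 points reaches half a standard deviation. That is why the magnitude of the difference, rather than its p value, is the quantity of interest here.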

Figure 2.

Trends in family medicine in-training and certification examination scores by training year.

PGY1 = postgraduate year 1; PGY2 = postgraduate year 2; PGY3 = postgraduate year 3; FMCE = family medicine certification exam.

It is important to be cautious in interpreting these results. We are currently analyzing the etiology of this difference. As the graph shows, there is substantial variation from year to year, and our results may represent routine process variation. Importantly, this year’s exams do not show a change in initial certification rate. There also may be many confounding factors, including dramatic changes in rotations and didactics, staffing shortages, changes in undergraduate preparation, rapid growth of family medicine residencies, effects on residents’ personal lives, and perhaps changes in the students becoming family medicine residents.

What are the implications? ABFM will continue to track and report the findings. We believe, however, that the specialty needs to pay attention and redouble its efforts in the education of residents. This generation of residents has demonstrated tremendous professionalism by leaning in and doing whatever was necessary to take care of their patients during the pandemic; they are heroes and heroines. At the same time, breadth and depth of clinical knowledge are fundamentally important to the social responsibility of family physicians, and we must reach out and support residents and residencies to reinvigorate didactics¹¹ and recreate meaningful clinical experiences even as we redesign residencies.^9–12 Building back is crucial.

References

1. Davis D, Dorsey JK, Franks RD, Sackett PR, Searcy CA, Zhao X. Do racial and ethnic group differences in performance on the MCAT exam reflect test bias? Acad Med. 2013; 88(5): 593–602. doi:10.1097/ACM.0b013e318286803a
2. Rosales J, Walker T. The racist beginnings of standardized testing. Published 2021. Accessed Feb 14, 2021. https://www.nea.org/advocating-for-change/new-from-nea/racist-beginnings-standardized-testing
3. O’Neill TR, Wang T, Newton WP. The American Board of Family Medicine’s 8 years of experience with differential item functioning. JABFM. 2022; 35(1): 18–25. doi:10.3122/jabfm.2022.01.210208
4. Shealy R, Stout W. A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika. 1993; 58(2): 159–194.
5. Holland PW, Wainer H. Differential Item Functioning. Lawrence Erlbaum Publishing; 1993.
6. Rubright JD, Jodoin M, Woodward S, Barone MA. Differential item functioning analysis of United States Medical Licensing Examination Step 1 items. Acad Med. Published online Dec 14, 2021. doi:10.1097/ACM.0000000000004567
7. Wang T, O’Neill TR, Eden AR, et al. Racial/ethnic group trajectory differences in exam performance among US family medicine residents. Fam Med. Forthcoming. 2022.
8. Puffer JC, Peabody MR, O’Neill TR. Performance of graduating residents on the American Board of Family Medicine certification examination 2009-2016. JABFM. 2017; 30: 570–571. doi:10.3122/jabfm.2017.05.170065
9. Family Medicine. 2021;53(7, theme issue). Accessed Sep 27, 2021. https://journals.stfm.org/familymedicine/2021/july-august/
10. Norman GR, Sloan JA, Wyrwich KW. Interpretation of Changes in Health-Related Quality of Life: The Remarkable Universalities of Half a Standard Deviation. Vol 41. Lippincott Williams & Wilkins; 2003.
11. Zakrajsek T, Newton W. Promoting active learning in residency didactic sessions. Fam Med. 2021; 53(7): 608–610. doi:10.22454/FamMed.2021.894932
12. Newton WP, Magill M. Re-envisioning family medicine residency education: from theory to practice. JABFM. 2021; 34(6): 1268–1271. doi:10.3122/jabfm.2021.06.210395