The purpose of the American Board of Family Medicine (ABFM) certification/maintenance of certification examination is to measure the basic knowledge necessary to deliver high-quality care to patients and their families. More than 25 years ago, the ABFM became the first American Board of Medical Specialties (ABMS) board to introduce criterion-based methodology to establish the passing threshold for its examination. A criterion-referenced examination is one in which a particular score is required to pass, and the performance of other examinees has no bearing on who passes or fails. In other words, all candidates taking the examination could theoretically pass if they met or exceeded the criterion-referenced passing score. Furthermore, the exam is equated across forms and administrations, meaning candidates are neither advantaged nor disadvantaged by receiving a particular version of the exam or by taking it at a particular time of the year.
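To make the distinction concrete, the brief Python sketch below is illustrative only: the candidate scores are invented, and the 390 passing standard and 200-800 scaled-score range are taken from the discussion later in this article. It shows a pass/fail decision that depends solely on the fixed criterion and not at all on how other candidates performed.

```python
# A minimal sketch of criterion-referenced scoring. The passing standard
# (390) and scaled-score range (200-800) come from this article; the
# candidate scores below are invented for illustration.

PASSING_SCORE = 390  # fixed criterion on the 200-800 scaled-score range

def is_passing(scaled_score: int) -> bool:
    """Pass/fail is judged against the criterion alone, not against peers."""
    return scaled_score >= PASSING_SCORE

# Every candidate can pass if each meets the criterion; no one fails
# merely because others scored higher.
cohort = [412, 390, 655, 508, 447]
print([is_passing(score) for score in cohort])  # all True
```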
It should be apparent, therefore, that the ABFM is not interested in comparing the performance of one candidate with another, but rather in comparing a candidate’s performance against the criterion-based passing threshold. Our ability to do so became more precise in 2006, when we moved to a new psychometric model, Item Response Theory (IRT), to develop and score the examination. Among its many advantages over the Classical Test Theory model that we had employed for more than 35 years, IRT provides greater discrimination and precision around the passing threshold. However, it also provides less useful information about those who score very well or very poorly, and that is one of the major reasons why we recently discontinued the use of percentile ranks associated with a candidate’s score. Reporting percentile ranks can be problematic and potentially misleading for examinees, and we would like to demonstrate why.
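This trade-off is easiest to see through the test information function. The sketch below uses the two-parameter logistic (2PL) model, a common IRT model, with invented item parameters; the ABFM's operational model and item pool are not described here. It illustrates that a form whose items are targeted at the passing threshold measures most precisely there, with markedly larger standard errors at the extremes of the ability scale.

```python
import math

# Illustrative 2PL sketch of why an IRT-scored exam is most precise near
# the passing threshold. An item's Fisher information, a^2 * P * (1 - P),
# peaks at its difficulty b; the parameters below are invented, not ABFM's.

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response at ability level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information contributed by one 2PL item at theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

# A form targeted near the passing threshold (theta = 0 on this scale).
items = [(1.2, -0.2), (1.0, 0.0), (1.4, 0.1), (0.9, 0.3)]

for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    info = sum(item_information(theta, a, b) for a, b in items)
    sem = 1.0 / math.sqrt(info)  # standard error of measurement
    print(f"theta = {theta:+.1f}   information = {info:.2f}   SEM = {sem:.2f}")
```

In this toy example, information peaks near theta = 0 and the standard error of measurement roughly triples by theta = ±3, which is exactly why scores far from the passing threshold carry less usable information.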
Because candidates who apply for the examination include both recently trained residents seeking certification for the first time and seasoned family physicians seeking to maintain their certification, the cohort of family physicians who sit for the examination each year is quite diverse. The demographic characteristics, experience levels, geographic locations, and even scope of practice of the physicians in each sample vary considerably. This was particularly true of the cohorts that took the examination in 2010, 2011, and 2012.
Prior to 2005, the ABFM granted certification for 7-year periods. Beginning in 2005, a policy change within our Maintenance of Certification for Family Physicians (MC-FP) program created the possibility for family physicians to earn a 3-year extension of their certificate, thereby extending the period between examinations to 10 years. As a result of this policy change, the ABFM experienced a 3-year period in which the number of family physicians seeking to maintain their certification was very low, while the number of family physicians who had previously failed and were attempting to recertify was disproportionately high. This phenomenon is best demonstrated by comparing the 2009 and 2010 exam cohorts.
In Table 1, percentile ranks are reported for both the 2009 and 2010 MC-FP exams. The passing standard in both years was 390, on a reported scaled-score range of 200 to 800. Because the cohorts of initial certifiers (primarily residents) in 2009 and 2010 were relatively stable, the percentile rank for these candidates changed little from 2009 to 2010 (about 2 percentile points). For those attempting to maintain their certification, however, a scaled score of 390 placed a candidate at the 15th percentile in 2009 but at the 31st percentile in 2010. Scanning Table 1 reveals other substantial differences as well.
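The arithmetic behind this shift is simple: a percentile rank is relative to whoever happens to be in the reference cohort. The sketch below simulates two cohorts (the normal-distribution parameters are invented, tuned only to roughly reproduce the 15th- and 31st-percentile figures above) and shows the same scaled score of 390 landing at very different ranks.

```python
import random

# Percentile rank depends entirely on the reference cohort. These score
# distributions are simulated for illustration; they are not ABFM data.

def percentile_rank(score: float, cohort: list[float]) -> float:
    """Percent of the reference cohort scoring below the given score."""
    below = sum(1 for s in cohort if s < score)
    return 100.0 * below / len(cohort)

random.seed(0)
# A stronger 2009-like cohort versus a weaker 2010-like cohort (when more
# previously unsuccessful candidates were retesting).
cohort_2009 = [random.gauss(475, 80) for _ in range(10_000)]
cohort_2010 = [random.gauss(430, 80) for _ in range(10_000)]

for year, cohort in (("2009", cohort_2009), ("2010", cohort_2010)):
    rank = percentile_rank(390, cohort)
    print(f"Scaled score 390 in {year}: {rank:.0f}th percentile")
```

The same function explains the comparison in the next paragraph as well: an identical score yields a different rank depending on whether the reference group is MC-FP peers or initial certifiers.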
Interestingly, many examinees can recall their percentile ranking but not their scaled score. It is easy to understand why some examinees may be interested in learning how well they performed relative to their peers. Yet, as the example above shows, percentile rankings can mislead both examinees and the general public. A ranking that portrays an examinee as more knowledgeable than he or she truly is inflates the examinee’s perceived ability and misleads the public. For example, consider an MC-FP candidate in 2010 who scored 450 on the exam and wanted to compare rankings with other candidates. This examinee would rank at the 51st percentile among his or her MC-FP peers, but only at the 40th percentile when compared with candidates seeking initial certification.
The practice of reporting percentile rankings can also introduce other undesirable elements into the score-reporting process. By its very nature, reporting percentile ranks means some people will be pleased with their ranking while others will not. Persons at the top end of the scale will certainly feel great about themselves, knowing they outperformed the vast majority of their peers on a national examination. For those unfortunate examinees who happened to fail the exam, however, it can be rather embarrassing to realize that, say, 96% of one’s peers performed better. When an examination is criterion-referenced, the only thing that really matters is one’s performance relative to the minimum passing standard. After all, someone who scores 500 on the MC-FP examination is not “more certified” than someone who passed with a score of 400. We contend that by reporting scores properly and directing examinees toward the appropriate criteria for making meaningful inferences, we can be more responsible in our score reporting while preserving the dignity of those who inevitably fail.