Family Medicine Updates

THE RELIABILITY OF ABFM EXAMINATIONS: IMPLICATIONS FOR TEST-TAKERS

Kenneth D. Royal, PhD, and James C. Puffer, MD
The Annals of Family Medicine September 2011, 9 (5) 463-464; DOI: https://doi.org/10.1370/afm.1303

A common theme among family physicians who have repeatedly performed poorly on the ABFM Maintenance of Certification (MC-FP) Examination is the complaint that they received a score identical or nearly identical to their score on a previous administration of the exam. From their perspective, it is a mystery why they received the same score (or a very similar score) despite additional study time and preparation. Physicians often assume a mix-up has occurred and ask whether results from their previous attempt might have been reported in error. A psychometric review, however, makes clear that no mistake has occurred. In fact, we expect many test-takers to receive comparable scores on future attempts. This expectation rests on the psychometric concept of reliability.

Overview of Reliability

Reliability is perhaps one of the oldest, yet most misunderstood, concepts in the measurement and assessment arena. Researchers of all experience levels commonly assert that their instruments are reliable. The truth is that there is no such thing as a reliable instrument; only the scores produced by an assessment have the property of reliability. The reliability of scores from any test depends on the characteristics of the test, the test administration, and the group of examinees. It is the interaction among these 3 elements that determines the reliability of results for any test.

Let us briefly discuss each of these 3 elements. Test characteristics typically include test length, item type, and item quality. Generally speaking, longer tests produce more reliable scores than shorter tests. With regard to item type, objective items such as multiple-choice questions typically produce more reliable scores than subjective items such as essays. Item quality is also important, as poor-quality items tend to reduce reliability. In addition, good-quality items should vary sufficiently in difficulty to discriminate effectively among examinees. Discrimination is useful because it helps identify which examinees possess the knowledge necessary to answer an item correctly. Those who possess the most knowledge have the greatest probability of answering difficult items correctly. Over the course of a lengthy examination, distinctions between examinees become clearer, and we are better able to determine how much knowledge an examinee possesses.
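The link between test length and reliability is often summarized in classical test theory by the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor k with items comparable to the existing ones: ρ* = kρ / (1 + (k - 1)ρ). The sketch below is a minimal illustration under that assumption; the article does not say the ABFM uses this formula, and the starting reliability of .70 is hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing test length by `length_factor`
    (2 = twice as many comparable items, 0.5 = half as many)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    k = length_factor
    return k * reliability / (1 + (k - 1) * reliability)


# Hypothetical test whose scores currently have a reliability of .70.
for k in (0.5, 1, 2, 3):
    print(f"{k}x length -> predicted reliability {spearman_brown(0.70, k):.3f}")
```

Under these assumptions, doubling the hypothetical test raises the predicted reliability from .70 to about .82, while halving it lowers the prediction. The gain shrinks as reliability approaches 1, which is why lengthening helps most when scores start out only moderately reliable.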

Conditions of administration are also important. These include physical conditions in the testing room (eg, temperature and noise), exam instructions, and time limits. Our testing vendor goes to great lengths to keep these factors as constant as possible across administrations of our examination. Variation in these conditions could affect some examinees differently, producing scores that vary for reasons other than an examinee knowing more or less about the content. The ABFM acknowledges that disruptions such as excessive noise or other distractions can introduce additional error into one’s score, potentially invalidating results, and we have policies in place to rectify such situations when they occur. Other administration factors, such as instructions and time limits, apply equally to everyone unless a disability is documented, in which case extra time and possibly other accommodations may be permitted.

Finally, the characteristics of the group of examinees are also important. As mentioned previously, a good test should contain a considerable number of items with varying degrees of difficulty. But what happens when a good test is taken by a very homogeneous sample, say, all high achievers with similar levels of knowledge? Although the test may be psychometrically sound, the examinees vary so little that their scores cannot be reliably differentiated. When this happens, low reliability estimates are produced, and many researchers quickly dismiss the instrument (or assessment) as being of poor quality. For this reason, a reliability estimate is not the measure of exam quality but only one measure of it. For a test to produce reliable scores, the ability of examinees must also vary sufficiently. When there is a wide range of ability in a group, reliable distinctions between what an examinee knows and does not know can be made.
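To illustrate how a homogeneous group depresses a reliability estimate, the sketch below computes KR-20 (the internal-consistency coefficient named in the Linacre reference) for the same hypothetical 60-item test given to two simulated groups: one with a broad range of ability and one made up of similar high achievers. Responses are generated from a Rasch model, and every ability, item difficulty, and group size is invented for illustration; none of it is ABFM data.

```python
import math
import random
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson Formula 20 for dichotomously scored (0/1) items.

    `responses` has one row per examinee and one column per item.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    total_var = pvariance([sum(row) for row in responses])
    # Sum of item variances p * (1 - p), where p is the proportion correct.
    pq_sum = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


def simulate(abilities: list[float], difficulties: list[float],
             rng: random.Random) -> list[list[int]]:
    """Hypothetical 0/1 responses generated under a Rasch model."""
    rows = []
    for theta in abilities:
        rows.append([
            1 if rng.random() < 1 / (1 + math.exp(-(theta - b))) else 0
            for b in difficulties
        ])
    return rows


rng = random.Random(2011)
items = [rng.uniform(-2, 2) for _ in range(60)]      # one 60-item test for both groups
wide = [rng.gauss(0.0, 1.5) for _ in range(1000)]    # broad range of ability
narrow = [rng.gauss(1.0, 0.2) for _ in range(1000)]  # homogeneous high achievers

r_wide = kr20(simulate(wide, items, rng))
r_narrow = kr20(simulate(narrow, items, rng))
print(f"KR-20, heterogeneous group: {r_wide:.2f}")
print(f"KR-20, homogeneous group:   {r_narrow:.2f}")
```

The items are identical for both groups, so the much lower coefficient for the homogeneous group reflects the examinees, not the test. This is the situation in which an instrument can be wrongly dismissed as poor.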

Empirical Example and Interpretation

Although no strict guidelines for minimum levels of reliability exist, many measurement experts agree with the recommendations of Nunnally and Bernstein.1 According to these recommendations, a set of test scores should have a reliability of at least .90 if important decisions are to be made based on those scores, and estimates between .80 and .89 are considered reasonably reliable. The 2009 ABFM MC-FP examination had a reliability estimate of .94, a very high estimate of internal consistency. This estimate indicates that approximately 94% of the observed variance in scores is due to systematic differences in examinee performance and 6% is due to chance. Equivalently, perfect reliability (1.0) minus the observed reliability (.94) leaves .06 (6%), the proportion of observed variance attributable to measurement error.
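The interpretation above treats reliability as the ratio of systematic (true-score) variance to observed-score variance, so the error share is simply 1 minus the reliability. The sketch below applies that arithmetic and the Nunnally and Bernstein thresholds to the reported .94; the function name and the wording of the labels are illustrative paraphrases, not official ABFM categories.

```python
def interpret_reliability(r: float) -> dict:
    """Split observed-score variance into systematic and error shares and
    label r using the .90 and .80 thresholds described in the text."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if r >= 0.90:
        label = "adequate for important decisions"
    elif r >= 0.80:
        label = "reasonably reliable"
    else:
        label = "below the thresholds discussed"
    return {
        "systematic_share": r,   # true-score variance / observed variance
        "error_share": 1.0 - r,  # error variance / observed variance
        "label": label,
    }


result = interpret_reliability(0.94)  # 2009 MC-FP estimate
print(f"{result['systematic_share']:.0%} systematic, "
      f"{result['error_share']:.0%} error: {result['label']}")
```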

Implications for High-Stakes Testing

In many ways, high estimates of reliability echo for test-takers the old adage, “If you always do what you’ve always done, you’ll always get what you’ve always gotten.” For an examinee who has a history of scoring very high on the exam, this will typically work in the examinee’s favor, although it should be made abundantly clear that it is not a guarantee. Test-takers who have previously failed an examination, on the other hand, may find this news disconcerting. This is not to say, however, that they are incapable of improving. With a significantly improved approach to exam preparation, most examinees who have failed previously are capable of making the gains necessary to pass this examination. It all begins with asking the right question and preparing an effective study plan.

Examinees should not ask themselves, “What do I have to do to reach the minimum score necessary for passing?” but rather, “How can I become a more knowledgeable physician?” Physicians whose goal is simply to pass the test have misguided intentions and, possibly, a misguided preparation strategy. One’s goal should not be to pass the exam, but rather to become a better family physician. With an increased fund of medical knowledge, the chances of passing the examination will improve naturally as a result of actual learning. If, however, one’s goal is simply to receive a passing score, the examinee will likely end up trying to anticipate examination items and otherwise resorting to methods similar to “cramming.” Neither spending exorbitant amounts of time and energy memorizing content solely to regurgitate it later nor working to improve test-taking skills, such as identifying distracters, works well on a high-stakes, criterion-referenced examination such as ours, which measures one’s fund of medical knowledge.

As we have demonstrated previously, simply being a good test-taker is not likely to significantly improve one’s chances of passing a high-stakes certification exam.2 In addition, the scoring methods used for our exams estimate one’s ability from correct and incorrect responses to items of varying degrees of difficulty. When both person ability and item difficulty are mapped onto a single continuum, it becomes clear from a psychometric perspective what an examinee knows and what he or she does not.3 Therefore, only when a physician has taken an improved approach to exam preparation, particularly one that focuses on increasing his or her fund of medical knowledge, can he or she realistically expect to advance along that continuum of ability.
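The article does not name its scoring model, but a single continuum shared by person ability and item difficulty is characteristic of the Rasch model discussed in the cited Linacre reference. The sketch below assumes that model, with item difficulties already known, and estimates ability by maximum likelihood using Newton-Raphson. The function names and the five difficulties are hypothetical and are not the ABFM’s operational scoring procedure.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """Rasch model: probability that a person of ability theta answers an
    item of difficulty b correctly (both measured in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses: list[int], difficulties: list[float],
                     iterations: int = 50) -> float:
    """Maximum-likelihood ability estimate via Newton-Raphson.

    Zero and perfect scores have no finite estimate, so they are rejected.
    """
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("no finite estimate for a zero or perfect score")
    theta = 0.0
    for _ in range(iterations):
        probs = [p_correct(theta, b) for b in difficulties]
        gradient = raw - sum(probs)                    # observed minus expected score
        information = sum(p * (1 - p) for p in probs)  # test information at theta
        step = max(-1.0, min(1.0, gradient / information))  # damp large jumps
        theta += step
        if abs(step) < 1e-8:
            break
    return theta


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical item difficulties
print(f"3 of 5 correct: {estimate_ability([1, 1, 1, 0, 0], difficulties):.2f} logits")
print(f"4 of 5 correct: {estimate_ability([1, 1, 1, 1, 0], difficulties):.2f} logits")
```

Under this model the raw score is a sufficient statistic: any pattern with 3 of these 5 items correct yields the same estimate. Moving up the continuum therefore requires answering more items correctly across the range of difficulty, not recognizing particular items.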

Conclusion

It is important to emphasize clearly and directly that an examinee of marginal ability, or someone with a history of previous failures, is likely to continue to fail the MC-FP exam if he or she continues with the same preparation approach or otherwise uses study methods that do not promote actual and sustained learning. Improving test-taking skills will be of minimal benefit, as high-stakes examinations are not a measure of one’s test-taking skills; the MC-FP examination is constructed so that the influence of test-taking skills is negligible. Examinees should understand that the only legitimate way to improve their performance on the MC-FP Examination is to increase their fund of medical knowledge and their decision-making ability in clinical scenarios, because that is what the exam measures. When examinees make real gains in these areas, they are most likely to receive higher scores. Note that the ABFM Web site provides information about its exams intended to help family physicians understand both the type and amount of content they might expect to see, as well as tips for developing a study plan.4 Using this information can help improve performance on our examinations.

© Annals of Family Medicine, Inc.

References

  1. Nunnally JC, Bernstein IH. Psychometric Theory. 3rd ed. New York, NY: McGraw-Hill; 1994.
  2. O’Neill TR, Royal KD, Puffer JC. Performance on the American Board of Family Medicine (ABFM) certification examination: are superior test-taking skills alone sufficient to pass? J Am Board Fam Med. 2011;24(2):175–180.
  3. Linacre JM. KR-20 or Rasch reliability: which tells the “truth”? Rasch Measurement Transactions. 1997;11(3):580–581. http://www.rasch.org/rmt/rmt113l.htm.
  4. American Board of Family Medicine. Examination Descriptions. 2010. https://www.theabfm.org/cert/exams.aspx. Accessed Nov 11, 2010.