Every high-stakes examination should have a set of test specifications that describes the content of the examination. This includes the number of questions presented to candidates, the content categories included in the exam, and the percentage of questions devoted to each category. These test specifications, often called the exam “blueprint,” provide guidelines for developing examinations but can also provide information to candidates that will help them prepare for the exam. After the examination is given, the content categories can be used to provide feedback about performance, which can help guide candidates in their future study and can also help them to understand what contributed to their overall score.
The American Board of Family Medicine (ABFM) is currently developing a new blueprint for its examinations. The first blueprint for ABFM examinations was based on medical specialties that were considered to be part of the training of family physicians. These included internal medicine, which made up more than one-fourth of the exam, as well as surgery, obstetrics, gynecology, pediatrics, psychiatry, and community medicine. Geriatrics was also included in the blueprint, even though it was not a specialty or subspecialty at the time, because it was still considered an important part of care across the full life cycle. The current blueprint for our examinations is based primarily on body systems, as shown in Table 1.[1] This blueprint was put into effect in 2006 and is used for all our examinations, including the Certification Examination, the longitudinal assessments that have recently been offered by ABFM, and the In-Training Examination taken by residents each year.
The rationale behind this classification system was that it mirrored the way physicians were trained and the way many medical textbooks are organized. One problem we have had with this blueprint, however, is that some categories are not clearly defined. We provide examples of topics that fall under Population-Based Care and Patient-Based Systems but not clear definitions. Another problem has been the Nonspecific category, which is defined as problems that affect multiple body systems, such as sarcoidosis, but candidates are not clear about what is included in this category. In addition, it is difficult to classify test questions consistently because it is not always clear which body system is primary. For example, should a question about managing hypertension in a patient with kidney disease be classified as Cardiovascular or Nephrologic? Should a question about osteoporosis be classified as Musculoskeletal or as Endocrine? There is no best answer, as in both cases either choice would be logical. Given the thousands of questions in our database, and the broad spectrum of family medicine, trying to create a long list of decision rules has also proved to be impractical.
Some categories of the current blueprint are also less useful than others in terms of test development and candidate feedback because they are so small, reflecting the frequency of these problems. The most obvious example is the Male Reproductive category, which is only 1% of the exam. Combining these small categories has not been possible because body systems do not combine in a logical way.
In thinking about a new version of the blueprint, one major goal has been to use categories that reflect the ways family physicians think about their practices. While family medicine does include all of the organ systems in the current blueprint, family physicians don’t typically think about their practices being more devoted to the cardiovascular system than the respiratory system, for example. We would also like the blueprint categories to be large enough to be useful for providing candidate feedback, to be distinct from one another, and to be semantically parallel. For example, the major categories should not include both organ systems and etiologies of disease because these are not parallel. Medical textbooks often include chapters on both organ systems and etiologies, which works because most textbooks have an index to help find information. When assembling an examination, however, it would be hard to know how to categorize questions about skin infections if the blueprint contained a category for infections and one for dermatology. This could also be confusing to candidates preparing for an exam. In addition, it is important that the categories are defined by everyone in a similar way, including both candidates and test developers.
Figure 1 indicates the limitations that we have to consider when creating a new blueprint. This figure shows that our examination content is limited by the multiple-choice format. ABFM examinations, including the In-Training Examination, are taken by more than 20,000 physicians each year, so multiple-choice questions are the most efficient and practical method of assessment we currently have available. This format does not, however, lend itself to assessing certain skills such as physician-patient interactions, and the blueprint needs to take this into account. We also need to keep in mind that the goal of our examination is to determine who should be certified in family medicine. Questions that are too easy or too difficult should not be included because they do not help determine whether someone has the cognitive knowledge necessary to be certified. For example, even though minor upper respiratory infections are commonly seen by family physicians, questions about diagnosing this problem are unlikely to help determine who should be certified. On the other end of the spectrum, questions about appropriate use of chemotherapy for stage-IV cancer are also not useful for making our certification decisions.
Most test blueprints use unidimensional models like the one we have used in the past, and those that are more detailed use a hierarchical outline form that may go 4 or 5 levels deep. This makes it more difficult to retrieve certain types of information. For example, if body systems are used as the top-level category and many of them have infectious problems nested under them in various locations, it becomes difficult to know how many questions on the exam address infectious disease. Instead of following this model, we decided to use a multidimensional approach that would allow us to look at exam content across one dimension and then re-sort it and look across another dimension. We have several years of experience working with an in-house multidimensional model based on a disease staging system developed by Dr Joseph Gonnella at Thomas Jefferson University.[2] The categories included body system, etiology, urgency, type of skill, age, and sex. Although we never published this classification system as a blueprint, it proved to be valuable for sorting test questions when reviewing a draft of an exam so that similar items would be located near each other, and it was also useful for querying the question bank.
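The re-sorting described above can be illustrated with a minimal Python sketch. It is not the ABFM's in-house system: the `Question` record, the `count_by` and `sort_by` helpers, and the four sample items are all hypothetical and were invented for illustration. The point is only that when each question carries an independent tag for every dimension, a count across any one dimension (such as etiology) does not depend on how the other dimensions are nested.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """One exam item tagged independently on each dimension (hypothetical schema)."""
    qid: str
    body_system: str
    etiology: str
    urgency: str
    age: str


# Hypothetical items, invented for illustration only.
BANK = [
    Question("Q1", "Dermatologic", "Infectious", "Acute", "Adult"),
    Question("Q2", "Respiratory", "Infectious", "Acute", "Child"),
    Question("Q3", "Cardiovascular", "Degenerative", "Chronic", "Older adult"),
    Question("Q4", "Respiratory", "Allergic", "Chronic", "Child"),
]


def count_by(questions, dimension):
    """Count questions per category along a single dimension."""
    return Counter(getattr(q, dimension) for q in questions)


def sort_by(questions, *dimensions):
    """Order questions so that items sharing the given dimensions sit together."""
    return sorted(questions, key=lambda q: tuple(getattr(q, d) for d in dimensions))


if __name__ == "__main__":
    # How many items address infectious disease, regardless of body system?
    print(count_by(BANK, "etiology")["Infectious"])
    # Re-sort the same bank along different dimensions for draft review.
    print([q.qid for q in sort_by(BANK, "age", "urgency")])
```

In a hierarchical outline the first query would require walking every body-system branch; with flat tags it is a single count over one field.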
To help us develop and test the new blueprint, we put together a small group of family physicians we had worked with in the past who were familiar with our exams. Some of them had also served on committees that developed examinations that were administered by other specialty boards, including Geriatric Medicine and Adolescent Medicine. They had experience with the blueprints used by the American Board of Internal Medicine and the American Board of Pediatrics so they had some sense of alternative ways to describe exam content. There was also a mix of backgrounds that included both academia and private practice.
We decided that we would start with 2 primary dimensions that reflected the populations and the types of problems seen by family physicians. We decided on Age as 1 dimension and Urgency/Duration of Disease as the second dimension. The group went through an iterative process of classifying several hundred questions, with periodic discussions and comparisons that eventually led to the subcategories shown in Table 2.
In addition to the primary categories, they developed a list of problems that can be included under the subcategories. These include common presentations such as abdominal/pelvic pain, fatigue, fever, and headache, as well as common conditions such as hypertension, diabetes, and cardiac disease. In addition, there are categories for types of preventive care and knowledge about pharmacology and disease processes.
We realized that we would need some brief definitions to clarify the difference between such terms as urgent and acute. In terms of urgency, acute may mean a problem that requires immediate attention, but in terms of duration it may simply mean a problem that lasts for a limited time. Definitions are particularly needed for the age category, because the group chose the term older adult rather than geriatric in order to include problems that become more common starting around age 50, such as arthritis and Parkinson’s disease.
The final step in implementing a new blueprint will be to determine percentages for these categories and to gather evidence that these percentages are appropriate for defining the content of the exam. In the past we have looked at data from the National Ambulatory Medical Care Survey (NAMCS) from the National Center for Health Statistics.[3] The survey provides information about the frequency of ambulatory care visits to family physicians for a large number of problems, which helps to support the blueprint. It is limited, however, by the fact that it does not include nonambulatory settings where family physicians see patients, such as nursing homes, emergency departments, and hospitals.
In addition, it is important to keep in mind that the NAMCS data is strictly a frequency-based guide to the problems that family physicians see in an ambulatory setting. The frequency of particular health problems is not the sole criterion for evaluating the knowledge and skills that family physicians need, however. If that were the case, a large number of our questions would be devoted to upper respiratory infections and ear infections. Minor problems such as these are common, but they have a lower potential for harm than some less common problems such as meningitis. The ABFM has recently worked on an Index of Harm that can be associated with the NAMCS data. The Index of Harm for the diagnoses listed in NAMCS was assigned by a group of practicing family physicians, and these values were used in studies that evaluated how well the current ABFM blueprint represents both the Index of Harm and frequency, based on the 2012 NAMCS data.[4,5] We expect to use a similar methodology to produce the initial content category weights with the new blueprint.
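To make the contrast between frequency and potential for harm concrete, the following Python sketch compares two ways of deriving weights. It does not reproduce the methodology of the cited studies, which is not described here. The multiplicative combination (visits times harm score), the `frequency_weights` and `harm_adjusted_weights` helpers, the visit counts, and the harm scores are all assumptions chosen for illustration and are not NAMCS or Index of Harm values.

```python
def frequency_weights(visits):
    """Share of the exam each problem would get if weighted by visit frequency alone."""
    total = sum(visits.values())
    return {problem: count / total for problem, count in visits.items()}


def harm_adjusted_weights(visits, harm):
    """Share of the exam when each visit count is scaled by a harm score.

    Assumption: frequency and harm combine by simple multiplication.
    This rule is illustrative only.
    """
    raw = {problem: visits[problem] * harm[problem] for problem in visits}
    total = sum(raw.values())
    return {problem: value / total for problem, value in raw.items()}


# Hypothetical visit counts and harm scores, invented for illustration only.
VISITS = {"Upper respiratory infection": 900, "Hypertension": 600, "Meningitis": 1}
HARM = {"Upper respiratory infection": 1, "Hypertension": 3, "Meningitis": 9}

if __name__ == "__main__":
    by_frequency = frequency_weights(VISITS)
    by_harm = harm_adjusted_weights(VISITS, HARM)
    for problem in VISITS:
        print(f"{problem}: {by_frequency[problem]:.3f} -> {by_harm[problem]:.3f}")
```

Under these invented numbers, the share given to upper respiratory infections falls and the shares given to hypertension and meningitis rise once harm is taken into account, which is the direction of adjustment the paragraph above argues for.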
The design of an examination used to make a decision about whether a physician should be certified should be evidence-informed but not evidence-based. We need to ask about problems that carry a significant potential for harm, we need to place extra emphasis on problems that require training and skill to diagnose and manage, and we need to ask about how to maintain patient health. The blueprint should take all of these factors into account if board certification is to be meaningful to the public.
Footnotes
The authors have no conflicts of interest.
© 2019 Annals of Family Medicine, Inc.