Abstract
BACKGROUND Practice-based research networks (PBRNs) replicating the National Ambulatory Medical Care Survey (NAMCS) must sample for a full year to account for presumed seasonal variation in illnesses. This study evaluated the effects of seasonality on diagnoses within NAMCS family physician data.
METHODS Using combined data from the 1995–1998 NAMCS, we analyzed diagnostic clusters that accounted for more than 1% of total visits for seasonality. Seasons were coded categorically as dummy variables with summer as the reference category. A logistic regression was performed on the full data set with each diagnosis as the outcome. To examine how well alternative sampling strategies replicate a full year of data, a simulation study was carried out drawing 50 samples of 1,000 visits each for winter-summer and spring-fall sampling periods.
RESULTS We found 23 diagnostic clusters with a frequency of more than 1%, of which 10 showed seasonal variation (P ≤.001), primarily between winter and summer. If sampling were restricted to spring, the diagnostic clusters of pregnancy and coronary artery disease would account for less than 1% of visits. All other diagnostic clusters, though changing rank order, would account for more than 1% if sampled in any single quarter. In the simulated sampling strategies, visit prevalence dropped below 1% for at least 1 diagnosis in 24 of 50 spring-fall samples compared with 20 of 50 winter-summer samples (P >.20).
CONCLUSIONS There is little seasonal variation in the 23 diagnoses that occur in more than 1% of visits to family physicians. There is, however, important seasonal variation in the rank order of these diagnoses. A sampling strategy that uses any quarter of the year but spring (March, April, May) could be used to understand what diagnoses are frequently seen within a PBRN.
- Practice-based research network
- seasonal variation
- family practice/statistics & numerical data
- statistics
- logistic models
INTRODUCTION
In the past decade, practice-based research networks (PBRNs) have developed as essential laboratories for studying primary care.1–9 Few networks can guarantee that their patient population matches that of the United States as a whole, and many networks are established with the express purpose of studying selected patient populations. As the number of regional networks grows, it becomes increasingly important for these networks to be able to describe their patient population.
One way a PBRN can understand how its patient population compares with that of another PBRN or the rest of the United States is to replicate the National Ambulatory Medical Care Survey (NAMCS). In fact, a number of PBRNs were asked to closely replicate NAMCS for the Agency for Healthcare Research and Quality during 2001–2002 with a survey called the PRImary care Network Survey (PRINS).
NAMCS is currently a yearly survey that provides a probabilistic sample of all nonmilitary, nonprison, ambulatory care in the United States.10 Though the survey instrument changes slightly from year to year, the sampling methods are standardized, and most of the survey items can be compared across years. The NAMCS samples practices for the course of a full year, presumably to account for seasonal variation in illnesses clinicians diagnose and treat.
Outside primary care, seasonal variation has been documented in admission rates to intensive care units (United Kingdom),11 visits to the emergency department (United States),12 and all-cause mortality (United Kingdom).13,14 Although many clinicians intuitively believe there is seasonal variation in the patient problems that primary care physicians diagnose and treat, few studies have documented this finding. The few published studies we found show seasonal variation in the diagnosis of ischemic heart disease in general practice (United Kingdom)15 and in the number of visits to primary care physicians (Sweden).16 The Swedish study, based on a 14-year observation period from 1969 to 1982, found that visits to primary care physicians declined during July and August in relation to a decline in diagnoses related to respiratory tract infections, and that there were no appreciable differences for the rest of the year. To date, we are unaware of any literature describing the effects of seasonality on the relative frequency of specific diagnoses in primary care.
Even with scant evidence supporting seasonal variation in diagnoses or use of primary care, the NAMCS protocol requires 1 full year of data collection. Thus, for a PBRN to replicate the NAMCS, it must collect data for an entire year, with considerable investment in network resources as well as clinician and staff time. The time and cost could be considerably reduced if the NAMCS data could be collected at fewer points in time throughout the year while yielding results similar to a full year of data collection.
In this study we evaluate the effects of seasonality on diagnoses using 4 years of NAMCS data for visits to family physicians. The null hypothesis for the study was that there is no difference in the relative frequency of diagnoses in different quarters of the year.
METHODS
This study used the combined data from the 1995–1998 NAMCS data sets. Four years of data were combined to enhance the generalizability of results, given the year-to-year variability of diagnoses within the NAMCS. We included all visits to family physicians, excluding visits referred from another physician. The data elements used for analysis include date of visit, patient age, patient sex, geographic region, and all diagnoses. The NAMCS identifies 4 geographic regions (Northeast, Midwest, South, and West). Geographic region and survey year were used as categorical covariates in the logistic regression analyses described below. The NAMCS allows the recording of up to 3 diagnoses per visit, and diagnoses are coded using the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). For this study all 3 diagnoses were pooled, and the codes were truncated to the fourth digit. We then analyzed the top 100 diagnostic codes for the pooled data set. Each diagnosis ranked below the top 100 is so infrequent (less than 0.2% of visits) that adding these diagnoses does not affect the aggregated outcomes.
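The pooling and truncation steps above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes each visit's diagnoses are available as dotted ICD-9-CM strings (for example, "250.00"). The function names and sample codes are hypothetical.

```python
from collections import Counter

def truncate_icd9(code):
    """Truncate a dotted ICD-9-CM code to its fourth digit ('250.00' -> '250.0')."""
    category, _, decimals = code.partition(".")
    return f"{category}.{decimals[0]}" if decimals else category

def top_codes(visits, n=100):
    """Pool up to 3 diagnoses per visit and return the n most frequent truncated codes."""
    counts = Counter()
    for diagnoses in visits:
        for code in diagnoses[:3]:  # NAMCS records at most 3 diagnoses per visit
            counts[truncate_icd9(code)] += 1
    return counts.most_common(n)

# Hypothetical visits, each a list of recorded diagnosis codes.
visits = [["401.9", "250.00"], ["465.9"], ["401.1", "272.4", "250.02"]]
print(top_codes(visits, n=1))  # [('250.0', 2)]
```

Pooling before counting means a visit with 3 diagnoses contributes 3 counts. This matches pooling the 3 diagnosis fields into a single list.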
Selected diagnoses were combined using a modification of the diagnostic clusters of Schneeweiss et al.17 Because hyperlipidemia, hypercholesterolemia, and viral syndrome were used infrequently when the diagnostic clusters were developed, these diagnoses were not assigned to a cluster. To account for their increased frequency, we made the following modifications to the Schneeweiss et al diagnostic clusters: hyperlipidemia and hypercholesterolemia were added as a new cluster, viral syndrome was added to the upper respiratory tract cluster, and back sprain was included in the back pain cluster instead of the joint sprain cluster. Each of these diagnoses (hyperlipidemia, hypercholesterolemia, and viral syndrome) is among the top 40 codes used by family physicians from 1995 to 1998. We moved the back sprain diagnosis because we believed it fit more closely with the generic back pain codes than with peripheral joint sprains. Back sprain ranks 61st within the full data set; therefore, its inclusion in or exclusion from the back pain cluster has little impact on the results. A list of modified diagnostic clusters is available from the authors.
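The following is a minimal sketch of how the cluster modifications might be applied, assuming the base clusters are available as a lookup table keyed by diagnosis. The labels in the hypothetical `BASE_CLUSTERS` table are placeholders. The actual clusters are defined on ICD-9-CM code lists, which are available from the authors and are not reproduced here.

```python
# Hypothetical excerpt of a base cluster lookup keyed by diagnosis label.
BASE_CLUSTERS = {
    "acute upper respiratory infection": "upper respiratory tract infection",
    "low back pain": "back pain",
    "ankle sprain": "joint sprain",
    "back sprain": "joint sprain",
}

def modified_clusters(base):
    """Return a copy of the base lookup with the 3 modifications described in the text."""
    clusters = dict(base)
    # New cluster for lipid disorders (labeled here by the name used in the Results).
    clusters["hyperlipidemia"] = "hypercholesterolemia"
    clusters["hypercholesterolemia"] = "hypercholesterolemia"
    # Viral syndrome joins the upper respiratory tract cluster.
    clusters["viral syndrome"] = "upper respiratory tract infection"
    # Back sprain moves from joint sprain to back pain.
    clusters["back sprain"] = "back pain"
    return clusters

def assign_cluster(diagnosis, clusters):
    """Return the cluster for a diagnosis, or None if it is not assigned to one."""
    return clusters.get(diagnosis)

clusters = modified_clusters(BASE_CLUSTERS)
print(assign_cluster("back sprain", clusters))  # back pain
```

Copying the base table before editing it leaves the unmodified clusters available for comparison.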
The top 20 diagnoses (using the modified diagnostic clusters) and the age-sex profiles for family practice visits were compared with those for general practice visits in the combined data set to determine whether the data from these 2 groups should be combined. We used chi-square tests and t tests to compare the top 20 diagnostic clusters and the age-sex profiles. The proportion of cases differed significantly for 12 of the 20 diagnoses, as did the age-sex profiles, indicating that these 2 groups were not comparable. Thus, we did not include the general practice group in the analysis. The NAMCS weighted estimates of visits were used to compute the total visits by season for family physicians using SUDAAN software.18 We did not weight diagnoses for this analysis, because we do not attempt to generalize the diagnostic cluster findings to the entire population or to specific populations.
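The seasonal visit totals described above are weighted sums of visits. The sketch below assumes each visit record carries a season label and a NAMCS visit weight. It computes point estimates only. It does not reproduce the design-based standard errors that SUDAAN derives from the survey's sampling structure. The function name is hypothetical.

```python
from collections import defaultdict

def weighted_visits_per_year(records, n_years):
    """records: iterable of (season, visit_weight) pairs pooled over n_years.

    Returns the estimated annual number of visits per season (point estimates
    only; no design-based variance is computed)."""
    totals = defaultdict(float)
    for season, weight in records:
        totals[season] += weight
    return {season: total / n_years for season, total in totals.items()}

# Hypothetical records pooled over 2 survey years.
records = [("summer", 1000.0), ("winter", 1500.0), ("summer", 500.0)]
print(weighted_visits_per_year(records, n_years=2))  # {'summer': 750.0, 'winter': 750.0}
```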
All diagnostic clusters that accounted for more than 1% of the total visits were analyzed for seasonality. A total of 13,149 nonreferred primary care visits were coded for up to 3 diagnoses. A total of 22,380 diagnoses were coded for these visits; 21,669 of the total diagnoses were from nonreferred visits, and 16,415 were included in the 23 diagnoses analyzed for seasonality. Month of visit was used initially to categorize each visit as follows: summer (June, July, August), fall (September, October, November), winter (December, January, February), and spring (March, April, May). Seasons were coded categorically as dummy variables, with summer as the reference category. To determine whether there were seasonal differences in the probability of encountering a particular diagnosis in a visit, controlling for survey year, a logistic regression was performed with each diagnosis as the dependent variable (present or absent) and season as a categorical independent variable. Geographic region and year of survey were included as covariates (both categorical).
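The summer-reference dummy coding and the unadjusted season effects for a single diagnostic cluster can be sketched as follows. When season is the only predictor, the maximum-likelihood logistic regression coefficients equal the log odds ratios of each season versus summer, so the sketch computes those directly. This simplification rests on stated assumptions. It omits the region and survey-year covariates used in the study. It also assumes every season has at least 1 visit with the diagnosis and 1 without it. All names are hypothetical.

```python
import math

# Original season definition; summer (June-August) is the reference category.
SEASON_OF_MONTH = {
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
}
NON_REFERENCE = ("fall", "winter", "spring")

def season_dummies(month):
    """0/1 indicators for fall, winter, and spring; a summer visit is all zeros."""
    season = SEASON_OF_MONTH[month]
    return [int(season == s) for s in NON_REFERENCE]

def unadjusted_log_odds_ratios(visits):
    """visits: iterable of (month, has_diagnosis) pairs for one diagnostic cluster.

    Returns the log odds ratio of each season versus summer. With season as the
    only predictor these equal the maximum-likelihood logistic coefficients; the
    study's region and survey-year covariates are NOT included here."""
    counts = {s: [0, 0] for s in ("summer",) + NON_REFERENCE}  # [with, without]
    for month, has_diagnosis in visits:
        counts[SEASON_OF_MONTH[month]][0 if has_diagnosis else 1] += 1

    def log_odds(season):
        with_dx, without_dx = counts[season]
        return math.log(with_dx / without_dx)

    return {s: log_odds(s) - log_odds("summer") for s in NON_REFERENCE}

# Hypothetical visits: (month, diagnosis present?)
visits = ([(7, True)] + [(7, False)] * 3 + [(1, True)] * 3 + [(1, False)]
          + [(10, True)] + [(10, False)] * 3 + [(4, True)] * 2 + [(4, False)] * 2)
print(season_dummies(1))  # [0, 1, 0]
print({s: round(v, 3) for s, v in unadjusted_log_odds_ratios(visits).items()})
# {'fall': 0.0, 'winter': 2.197, 'spring': 1.099}
```

Adjusting for region and survey year would require fitting the full model with a statistics package. The sketch only shows how the summer reference category shapes the interpretation of each coefficient.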
We also examined visits by sex and age as a percentage of total visits for each season. Age was divided into the following categories for this analysis: 0 to 17 years, 18 to 45 years, 46 to 65 years, and older than 65 years. Because seasonal variation in diagnoses may not follow the standard seasons, we shifted each season 1 month forward and 1 month back around the months of July (summer), October (fall), January (winter), and April (spring), and repeated the evaluations of seasonality. This analysis resulted in 3 separate P values for each of the 23 diagnostic clusters, corresponding to 3 different ways of dividing up the year, and thus shows how sensitive the seasonality of specific diagnostic clusters is to the definition of each season. We used a P value of ≤.001 as the cutoff for statistical significance to take into account the large sample size and multiple comparisons (the Bonferroni-adjusted cutoff would be P = .002 for 23 diagnoses). We sorted the diagnostic clusters into 4 groups based on the strength and robustness of the seasonality results. The first group showed strong seasonality, robust to seasonal definition, with all 3 P ≤.001. The second group showed moderate seasonality, with 2 of the 3 P ≤.001. The third group showed low seasonality, with 1 of 3 P ≤.001, and the final group showed no seasonality, with all P >.001. We present actual P values from the logistic regressions described above. We also evaluated each diagnostic cluster for the number of quarters of the year in which it fell below the 1% inclusion level.
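The shifted seasonal definitions and the 4-group classification can be sketched as follows. The sketch assumes one reading of the text: each 3-month window is shifted 1 month earlier or later while still containing its anchor month (July, October, January, or April). The function names are hypothetical.

```python
SEASONS = ("summer", "fall", "winter", "spring")

def season_of(month, shift=0):
    """Map a month (1-12) to a season.

    shift=0 is the original definition (summer = June-August). shift=-1 or +1
    moves every 3-month window 1 month earlier or later (an assumed reading of
    the text); each window still contains July, October, January, or April."""
    start = 6 + shift  # first month of summer under this definition
    return SEASONS[((month - start) % 12) // 3]

def seasonality_group(p_values, alpha=0.001):
    """Classify a cluster by how many of its 3 P values (one per definition) are <= alpha."""
    n_significant = sum(p <= alpha for p in p_values)
    return {3: "strong", 2: "moderate", 1: "low", 0: "none"}[n_significant]

def bonferroni_threshold(alpha=0.05, n_tests=23):
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

print([season_of(m, shift=-1) for m in (5, 8, 11, 2)])  # ['summer', 'fall', 'winter', 'spring']
print(seasonality_group([0.0004, 0.0009, 0.03]))  # moderate
print(round(bonferroni_threshold(), 4))  # 0.0022
```

The modular arithmetic treats the year as circular. As a result, the winter window can span December and January without special cases.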
To examine the ability of alternative sampling strategies to replicate the full year of data, we carried out a simulation study using random subsampling to simulate data collection approaches that a PBRN might actually undertake.17 We drew 50 samples of 1,000 randomly selected visits each, without replacement, from the NAMCS data set described above for both the winter-summer and spring-fall sampling strategies. For each random sample, we determined the visit prevalence rate for each diagnostic cluster, yielding a set of 50 visit prevalence rates per cluster for winter-summer and for spring-fall. Next, we calculated the mean visit prevalence rate for each diagnostic cluster across all 50 samples for the 2 sampling strategies. Finally, we determined the standard deviation and the 95% nonparametric confidence interval (2.5th–97.5th percentiles) of the visit prevalence rate for each diagnostic cluster under both sampling strategies.19 We also recorded the total number of samples with at least 1 diagnostic cluster in which visit prevalence fell below 1%.
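The resampling procedure can be sketched in Python under simplifying assumptions. Each visit is represented as a set of diagnostic cluster names. The input list is already restricted to the months of one sampling strategy (winter-summer or spring-fall). Percentiles use linear interpolation, which is one of several common definitions. Names and defaults are hypothetical and do not come from the authors' code.

```python
import random
import statistics

def percentile(sorted_values, q):
    """Linear-interpolation percentile (0 <= q <= 100) of an already sorted list."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def simulate(visits, clusters, n_samples=50, sample_size=1000, threshold=0.01, seed=0):
    """Draw n_samples subsamples without replacement and summarize visit prevalence.

    visits: list of sets of cluster names (one set per visit) for one strategy.
    Returns (summary, below_counts), where below_counts[i] is the number of
    clusters whose visit prevalence fell under the threshold in sample i."""
    rng = random.Random(seed)
    rates = {c: [] for c in clusters}
    below_counts = []
    for _ in range(n_samples):
        sample = rng.sample(visits, sample_size)  # without replacement
        below = 0
        for c in clusters:
            prevalence = sum(1 for v in sample if c in v) / sample_size
            rates[c].append(prevalence)
            if prevalence < threshold:
                below += 1
        below_counts.append(below)
    summary = {}
    for c, values in rates.items():
        ordered = sorted(values)
        summary[c] = {
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),
            "ci95": (percentile(ordered, 2.5), percentile(ordered, 97.5)),
        }
    return summary, below_counts

# Synthetic population: cluster "a" in 10% of visits, cluster "b" in under 1%.
population = [({"a"} if i % 10 == 0 else set()) | ({"b"} if i % 150 == 1 else set())
              for i in range(5000)]
summary, below_counts = simulate(population, ["a", "b"])
samples_with_drop = sum(1 for n in below_counts if n > 0)
mean_below = statistics.mean(below_counts)  # mean clusters under 1% per sample
```

Running the function once per strategy, with a different month-filtered visit list each time, yields the 2 sets of 50 prevalence rates that the text compares.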
RESULTS
Within the 4 years of NAMCS data, 23 diagnostic clusters accounted for more than 1% of the total diagnoses. These 23 diagnostic clusters represent 75.8% of all diagnoses recorded during this period. Table 1 includes the weighted number of visits predicted for each season per year. Based on weighted NAMCS data, the estimated number of visits to family physicians within the United States does not show seasonality (P >.20). Table 2 includes the overall percentage of visits for each diagnosis, along with the percentage of visits for each season based on the original definition of seasons.
Of the 23 diagnostic clusters, 10 showed seasonal variability at the P ≤.001 level in the original seasonal model (Table 3). Seasonal variability is seen primarily between the winter and summer quarters (Table 2), though rhinitis shows greater variation between spring and fall, and coronary artery disease shows greater variation between summer and spring.
Table 3 also shows the effects of reallocating the months within each season. Hypercholesterolemia, back pain, and neck pain show seasonal variation under one of the redefined seasonal models but not under the original definition. No diagnostic cluster shows seasonality in 2 of the 3 seasonal definitions without also showing seasonality in the original model. Six diagnostic clusters show robust seasonality, with significant differences in percentage of visits across the year in all 3 seasonal models. Three diagnostic clusters (rhinitis, obesity, and pregnancy) have moderately robust seasonality. Four diagnostic clusters (hypertension, hypercholesterolemia, back pain, and neck pain) each have 1 of 3 P values ≤.001. The remaining diagnostic clusters show no evidence of seasonality.
An alternative approach to analyzing the information concerning seasonality is to consider which diagnostic clusters would not be represented at the greater than 1% level if sampling were restricted to less than 1 full year. If sampling were restricted to a single quarter of the year, a common data collection technique for other PBRN studies, then the diagnostic clusters of pregnancy and coronary artery disease would drop below the threshold of 1% of visits in the spring quarter (March, April, May). All other diagnostic clusters, though changing rank order, would remain at greater than 1% if sampled only in a single quarter of the year.
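A sketch of the single-quarter check follows. It assumes visits are grouped by quarter and that each visit is represented as a set of diagnostic cluster names. The data structure and function name are hypothetical.

```python
from collections import Counter

def clusters_below_threshold(visits_by_quarter, clusters, threshold=0.01):
    """visits_by_quarter: dict mapping quarter -> list of sets of cluster names
    (one set per visit). Returns quarter -> sorted list of clusters whose share
    of that quarter's visits falls below the threshold."""
    result = {}
    for quarter, visits in visits_by_quarter.items():
        counts = Counter(c for v in visits for c in v)
        result[quarter] = sorted(c for c in clusters if counts[c] / len(visits) < threshold)
    return result

# Hypothetical data: 1 pregnancy visit of 200 in spring (0.5%), 3 of 100 in summer (3%).
quarters = {
    "spring": [{"pregnancy"}] + [set()] * 199,
    "summer": [{"pregnancy"}] * 3 + [set()] * 97,
}
print(clusters_below_threshold(quarters, ["pregnancy"]))  # {'spring': ['pregnancy'], 'summer': []}
```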
Because of the seasonality found in the diagnostic clusters of pregnancy and coronary artery disease (diagnostic clusters whose prevalence is related to the sex of the patient), the distribution of visits by sex for the original 4 seasons was examined. No significant difference in percentage of visits by sex was found. Because of the seasonality in diagnostic clusters that are highly correlated with age, such as otitis media, upper respiratory tract infection, and asthma (more frequently observed in children) or coronary artery disease, obesity, and pregnancy (more frequently observed in adults), the seasonality of visits by age was examined. Visits by age-group displayed seasonality (P <.001), primarily in the 0- to 17-year-old group: 32% of the visits by this age-group occurred in winter compared with 19.4% in summer. Visits in all other age-groups were more evenly distributed across seasons (Table 4).
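The age categories and the seasonal distribution of each age-group's visits can be computed as in the sketch below. It assumes ages are recorded in whole years. The function names and example counts are hypothetical.

```python
from collections import Counter, defaultdict

def age_group(age):
    """Age categories used in the text: 0-17, 18-45, 46-65, and older than 65 years."""
    if age <= 17:
        return "0-17"
    if age <= 45:
        return "18-45"
    if age <= 65:
        return "46-65"
    return ">65"

def season_share_by_age(visits):
    """visits: iterable of (age, season) pairs.

    Returns age group -> season -> percentage of that age group's visits."""
    counts = defaultdict(Counter)
    for age, season in visits:
        counts[age_group(age)][season] += 1
    shares = {}
    for group, by_season in counts.items():
        total = sum(by_season.values())
        shares[group] = {s: 100 * n / total for s, n in by_season.items()}
    return shares

# Hypothetical counts for 100 visits by 5-year-olds.
visits = [(5, "winter")] * 32 + [(5, "summer")] * 19 + [(5, "fall")] * 24 + [(5, "spring")] * 25
print(season_share_by_age(visits)["0-17"]["winter"])  # 32.0
```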
Finally, to simulate a sampling strategy based on sampling patients from 2 of the 4 seasons, we drew 50 random samples of 1,000 patients each for the summer-winter and spring-fall sampling time frames.19 In each of the 50 samples for the 2 sampling strategies, we determined the visit prevalence rates for each of the 23 diagnostic clusters. For both sampling methods, the 95% nonparametric confidence intervals overlap the full-year curve for all diagnostic clusters. We also reviewed the number of samples for which the visit prevalence dropped below 1% (Table 5). Across the 50 samples, visit prevalence dropped below 1% for a total of 32 diagnostic clusters (out of 1,150 possible diagnostic clusters; 23 diagnostic clusters times 50 samples) using the spring-fall sampling strategy, compared with 21 diagnostic clusters using the winter-summer sampling strategy.
Table 5 shows, for each diagnostic cluster, how often its visit prevalence fell below the 1% threshold under the winter-summer and spring-fall sampling strategies. Visit prevalence decreased to less than 1% for at least 1 diagnosis in 24 of 50 spring-fall samples compared with 20 of 50 winter-summer samples (P >.20). The mean number of diagnostic clusters per sample with a visit prevalence of less than 1% was 0.80 (SD 1.01) for winter-summer compared with 1.12 (SD 1.26) for spring-fall. These rates (and means) are not significantly different for the 2 sampling strategies. The data from the resampling process are presented graphically (Figures 1 and 2) and in table format (Table 6) as online-only supplemental data, which can be found at http://www.annfammed.org/cgi/content/full/2/5/411/DC1.
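The text does not name the test behind P >.20 for 24 of 50 versus 20 of 50 samples. The comparison can be reproduced as follows, assuming a Pearson chi-square test without continuity correction on the 2 × 2 table. The helper function is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, P value with 1 degree of freedom)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = sum(
        (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2) for j in range(2)
    )
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# Samples with at least 1 cluster below 1%: spring-fall 24 of 50, winter-summer 20 of 50.
stat, p = chi_square_2x2(24, 26, 20, 30)
print(round(stat, 3), round(p, 2))  # 0.649 0.42
```

Applying a continuity correction would make the P value larger. Under either version, the result is consistent with P >.20.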
DISCUSSION
Seasonal variability of visit-related diagnostic clusters is considerable for family physicians. Although this variability creates different absolute rank orders of diagnostic clusters depending on the time of year sampled, it has much less effect on the overall list of frequently seen diagnoses. Furthermore, the percentage of visits by various age-groups also varies across seasons. Thus, a PBRN wishing to replicate PRINS/NAMCS data collection should consider the issues of sampling time frame in relation to the intended use of the data. If the absolute rank order of diagnoses or the age distribution of the patients in a network is important, then careful attention to sampling time frames is critical. If the intent is to understand what diagnoses are frequently encountered (eg, more than 1%) within the network, then our results suggest that any quarter other than spring could be used for data collection.
The seasonality of many diagnostic clusters can be attributed to the infectious processes typically seen more frequently in the winter months in North America. This seasonality also corresponds with the greater percentage of visits by children in the winter. The seasonality associated with obesity, coronary artery disease, and pregnancy is not intuitively obvious, as these diagnostic clusters are for chronic conditions or are associated with multiple visits over more than 6 months. It is possible that visits relating to obesity and coronary artery disease might be linked opportunistically to other types of visits that do display seasonality. For instance, these diagnoses may be linked to health maintenance visits. Because children make more frequent visits to family physicians in the winter, adults may make fewer health maintenance visits in winter than in other seasons, artificially lowering the frequency of coronary artery disease diagnoses in winter. The diagnostic cluster physical examination did not display seasonality, making this possibility less likely. Nonetheless, the physical examination cluster includes both adults and children; thus, the possibility that adult visits vary by season was not tested.
The possibility of one diagnostic cluster being linked to a second to account for seasonality was not examined. Alternatively, if the total percentage of visits by men was unevenly distributed across the year, then the diagnoses of coronary artery disease and pregnancy could reflect that variation. Because of the higher prevalence of coronary artery disease and no prevalence of pregnancy in men, any quarter of the year with a differentially high or low percentage of men could account for the variation found for these 2 diagnoses. This possibility was examined, and no significant seasonal variation by sex was seen. Furthermore, the coronary artery disease diagnosis represents the chronic condition, not an acute coronary event, further obfuscating potential causes for this finding.
We found 10 diagnostic clusters with seasonal variation for all 3 or 2 of 3 seasonal definitions. Data collection time frames of less than 1 year, unless thoughtfully selected, are likely to have a substantial effect on these diagnostic clusters. Not only is the rank order for these diagnostic clusters likely to change, but in the case of coronary artery disease and pregnancy, the diagnostic cluster may drop below the 1% threshold in restricted data collection time frames.
We are unable to corroborate the findings of Ejlertsson,16 who found seasonality in the number of visits to general practitioners as a result of a lower number of visits in summer. Although our analysis also shows fewer visits in summer, this difference is not statistically significant. Ejlertsson attributed his findings, which derive from 14 years of claims data for all of Sweden, to the lower number of visits for acute infectious processes in summer. Those diagnoses are also less common in summer in our data set. The NAMCS data set is not nearly as robust as the data Ejlertsson used, and this may mask such a finding. Alternatively, the NAMCS data could indicate that family physicians essentially fill their office schedules year-round and that the lower total number of summer visits is due to clinicians taking more time off during the summer.
To establish the value of PBRNs as primary care laboratories, the similarities and differences among networks must be described and understood. NAMCS and the more recent PRINS survey, specifically developed for primary care, are currently among the best available methods for describing primary care populations and the clinical care they receive. Current methods dictate that NAMCS data be collected for an entire year, yet this protocol is costly and often impractical. The ability to collect data at fewer points in time could decrease costs to PBRNs, increase practice and clinician participation, and minimize the frustration of both groups, resulting in higher-quality data and better estimates. The findings of the present study indicate that diagnoses collected at 2 points in time are similar to those collected across the course of a year. Our resampling analysis suggests that if less than a full year of data collection is used, either a summer-winter or fall-spring sampling method would reasonably approximate sampling across the full year. In fact, whereas the absolute rank of a given diagnostic cluster would change, the actual list of most frequent diagnostic clusters would show little variation if any single quarter other than spring were used for data collection.
Footnotes
- Conflicts of interest: none reported
- Funding support: This research was supported in part by the Primary Care Research Unit, University of Colorado Health Sciences Center, Denver, Colo, and HRSA Administrative Unit Grant #HP00054 5 D12.
- Received for publication January 29, 2003.
- Revision received June 24, 2003.
- Accepted for publication July 20, 2003.
- © 2004 Annals of Family Medicine, Inc.