Abstract
PURPOSE We aimed to analyze regional variations in the assignment of International Classification of Diseases, 10th Revision (ICD-10) codes to acute respiratory infections, seeking to identify notable anomalies that suggest diverse diagnoses of the same condition.
METHODS We analyzed national weekly diagnosis data for acute respiratory infections (ICD-10 codes J00-J22) in Poland from 2010 to 2019, covering all 380 county-equivalent administrative regions and encompassing 292 million consultations. Data were aggregated into age brackets. We calculated the Kendall tau correlations between shares of particular diagnoses.
RESULTS We found staggering differences across regions in applied diagnoses that persisted even after disaggregating the data into age groups. The differences did not seem to stem from different levels of health care use, as there was no consistent pattern suggesting variability in milder diagnoses. Instead, there were numerous pairs of strongly negatively correlated codes implying classification ambiguity, with the most problematic diagnosis being J06 (acute upper respiratory infections of multiple and unspecified sites), which was used almost interchangeably with a diverse range of others, especially J00 (common cold) and J20 (bronchitis).
CONCLUSIONS To the best of our knowledge, this is the first study using observable anomalies to analyze regional coding variability for the same respiratory infection. Although some of these discrepancies may raise concerns about misdiagnosis, the majority of cases involving interchangeably used codes did not seem to substantially impact treatment or prognosis. This suggests that ICD codes may have clinical ambiguities and could face challenges not only in fulfilling their intended purpose of generating internationally comparable health data but also in their use for comprehensive government health planning.
- International Classification of Diseases
- coding, medical
- respiratory tract infections
- community health planning
- epidemiology
- primary health care
INTRODUCTION
The World Health Organization (WHO) has emphasized that “The goal of having ICD [International Classification of Diseases] in place for health data collection is to generate comparable health data at the international level,”1 and it intends that this classification will be a “clinically relevant classification system.”1 Few studies directly address whether this goal has been achieved, although some provide indirect insight. The issue was theoretically addressed by Cimino2 in a highly influential 1998 publication on hierarchical medical terminology. His later coauthored works, however, highlighted limitations in implementing detailed electronic health records, including increased physician workload without compelling evidence of any benefit.3 He and colleagues also acknowledged some diagnostic variability, although it arose from differing laboratory thresholds.4
There are reports showing some medical variations in the interpretation of symptoms of respiratory infection, although most seem to indicate genuine challenges in diagnosis. Some imply notable controversy among medical specialists regarding diagnoses based on recorded lung sounds, primarily attributed to low specificity and sensitivity in detecting pneumonia or bronchitis based on wheezing and crackles.5 Similarly, studies demonstrate a marked enhancement in the accuracy of symptom-based diagnoses when augmented with tests detecting a specific pathogen, as observed in the case of influenza.6 A study on international discrepancies of medical coding, however, despite having a relatively small sample, found that respiratory infections were less problematic than other diagnoses.7
The anomalies in diagnosis rates could also be seen as a result of a flawed incentive structure. For example, because of isolation requirements under COVID-19, people without paid sick leave were disinclined to obtain a diagnosis.8 Reimbursement and supervision mechanisms may also lead to different coding; overly vague codes may potentially lead to claim denial.9 Martinez et al10 noted that the same physicians who are more likely to diagnose potential exceptions such as sinusitis, pharyngitis, or bronchitis are, even after correcting for diagnosis, more likely to prescribe antibiotics for exactly those conditions.
Acute respiratory infection (ARI) represents a cluster of clinically similar conditions for which the International Classification of Diseases, 10th Revision (ICD-10) offers numerous potentially applicable diagnosis codes (codes J00-J22). Clinicians are tasked with selecting among these codes (assigning a diagnosis) based on available clinical, laboratory, and imaging findings for a given patient. In this study, we analyzed comparability of data based on the assignment of ICD-10 codes to ARIs. Our goal was to identify substantial regional differences in the frequency of these codes, indicating heterogeneous diagnoses.
METHODS
Data Source
We obtained from the Polish National Healthcare Fund weekly diagnosis data for ARIs (ICD-10 codes J00-J22), organized by International Organization for Standardization (ISO) week. The data spanned all 380 Polish regions (“powiat,” which are equivalent to administrative counties) for the period 2010-Week 1 to 2019-Week 52. Because of patient confidentiality, when the weekly number of diagnoses in a particular age group was between 1 and 4, we denoted it as less than 5. For the purpose of calculation, such counts were assumed to be 2, except for patients aged older than 100 years, for whom they were assumed to be 1.
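The substitution rule for censored counts can be sketched as follows. This is a minimal illustration rather than the authors' code: the `"<5"` marker string, the function name `impute_count`, and the `age_lower_bound` argument (the lower bound of the record's age group) are assumptions made for the example.

```python
def impute_count(raw, age_lower_bound):
    """Return a usable integer count for one weekly record.

    Censored records (hypothetically marked "<5") become 2, or 1 when the
    age group is older than 100 years; all other values are parsed as integers.
    """
    if isinstance(raw, str) and raw.strip() == "<5":
        return 1 if age_lower_bound > 100 else 2
    return int(raw)


# Hypothetical weekly records: (reported count, lower bound of the age group)
records = [("<5", 40), (17, 40), ("<5", 101), ("8", 70)]
imputed = [impute_count(raw, age) for raw, age in records]
```

Replacing a censored range of 1 to 4 with a fixed value keeps every region-week in the totals while limiting the distortion that any single censored cell can introduce.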
These 292 million visits in Poland’s single-payer system with universal free coverage encompassed almost all primary health care visits. Compensation for visits primarily relied on a fixed monthly payment per enrolled patient, thereby discouraging intentional manipulation of recorded codes. The codes were recorded by physicians, and although the system allows for use of numerous codes or subcodes, for practical purposes, only a single respiratory primary code is typically selected.
Statistical Analysis
After the data were cleaned and consolidated, which merged seldom-used zoonotic influenza codes with unspecified influenza (J09 to J11), the number of visits was categorized by region into age brackets: children (aged 0-17 years), middle-aged adults (aged 18-64 years), and older adults (aged 65 years and older).
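As a minimal sketch of this categorization, the snippet below sums visits per region and age bracket. It assumes, purely for illustration, that each input row carries a single age in years; the function names and the `(region, age, visits)` row layout are hypothetical and not taken from the study's code.

```python
from collections import Counter


def age_bracket(age):
    """Map an age in years to one of the three brackets used in the analysis."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 17:
        return "children"
    if age <= 64:
        return "middle-aged adults"
    return "older adults"


def visits_by_bracket(rows):
    """Sum visits per (region, bracket) from rows of (region, age, visits)."""
    totals = Counter()
    for region, age, visits in rows:
        totals[(region, age_bracket(age))] += visits
    return totals


# Hypothetical rows for two regions
rows = [("A", 5, 10), ("A", 17, 3), ("A", 18, 4), ("A", 64, 1), ("A", 65, 7), ("B", 90, 2)]
totals = visits_by_bracket(rows)
```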
To rule out the possibility that any differences were a product of demographic composition, we calculated the proportion of each diagnosis in every region separately within each age bracket. We then analyzed the correlation between these proportions using the Kendall tau coefficient (τ); values closer to 1 indicate stronger positive correlation, and values closer to −1 indicate stronger negative correlation. A network plot was constructed to visualize correlations where τ was less than −0.1 or greater than 0.1. Diagnoses with the strongest negative correlation, suggesting that patients with specific symptoms could receive either diagnosis based on local approaches, were plotted to visualize whether regional differences had a relevant impact on diagnosis.
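The correlation step can be sketched as follows for a single age bracket. This is an illustration under stated assumptions, not the authors' implementation: the column names `region`, `code`, and `visits`, the function names, and the toy data are hypothetical, and the tau-b variant (the default of `scipy.stats.kendalltau`) is assumed because the text does not name a variant.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import kendalltau


def diagnosis_shares(visits):
    """Rows: regions; columns: ICD-10 codes; values: share of the region's visits."""
    counts = visits.pivot_table(
        index="region", columns="code", values="visits", aggfunc="sum", fill_value=0
    )
    return counts.div(counts.sum(axis=1), axis=0)


def correlated_pairs(shares, threshold=0.1):
    """Return (code_a, code_b, tau) for pairs with tau < -threshold or tau > threshold."""
    edges = []
    for code_a, code_b in combinations(shares.columns, 2):
        tau, _p_value = kendalltau(shares[code_a], shares[code_b])
        # A NaN tau (for example, a constant column) fails this comparison and is skipped.
        if abs(tau) > threshold:
            edges.append((code_a, code_b, float(tau)))
    return edges


# Hypothetical counts for four regions and three codes
toy = pd.DataFrame({
    "region": ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3,
    "code": ["J00", "J06", "J20"] * 4,
    "visits": [10, 40, 10, 20, 30, 5, 30, 20, 5, 40, 10, 10],
})
edges = correlated_pairs(diagnosis_shares(toy))
```

The resulting edge list is the kind of input a network plot needs: each retained pair becomes an edge whose weight is τ.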
We conducted statistical analysis and data visualization in Python 3.10 (Python Software Foundation) using standard Python data-processing libraries such as pandas 1.4.3 and numpy 1.26.0rc1. The τ values were calculated using the Scientific Python (SciPy) library 1.11.2, and locally weighted scatterplot smoothing (Loess) was performed with statsmodels 0.14.0. Data visualization used networkx 3.2.1, the plotly library 5.14.1, and geopandas 0.14.0.
RESULTS
The relative prevalences of particular diagnosis codes (diagnoses) applied to ARI for the 380 Polish regions/counties are presented for each age group in Table 1. The data show that a few diagnoses effectively dominate, accompanied by a wide range of their regional proportions. However, this finding alone does not establish whether these differences reflect true epidemiologic variations or merely coding artifacts.
Distribution of ARIs by ICD-10 Diagnosis Codes for Polish Regions During 2010-2019
Figure 1 presents correlations between shares of a particular diagnosis, with the existence of strongly negatively correlated codes implying coding ambiguity. In all age groups, the most troublesome code was J06 (acute upper respiratory infections of multiple and unspecified sites). Among children, it was used almost interchangeably with J00 (common cold). For the middle-aged population, J06 was used as an alternative for almost any other upper respiratory infection, and the same was true to a lesser extent among older adults. In each age group, certain infections were designated as either lower respiratory infections (J20 - bronchitis) or upper ones (J00 or J06). Similarly, the same conditions could be diagnosed as either J02 (pharyngitis) or as a more severe condition, especially J10 (influenza due to other identified influenza virus) or J15 (bacterial pneumonia, not elsewhere classified). There was one negative correlation that could plausibly be attributed to differences in use of medical services: a weak negative relationship observed in both adult groups between the common cold and pneumonia.
Network Plots Showing Correlations of Diagnoses by Age Group
Note: See Table 1 footnotes for code definitions.
Some diagnosis codes showed an interesting subtle positive correlation, particularly noticeable among nuanced lower respiratory infections. Across all groups, J04 (laryngitis and tracheitis) was positively correlated with J11 (influenza due to an unidentified influenza virus), which may seem counterintuitive. As our analysis was conducted before lateral flow tests were issued,11 however, this positive correlation may instead imply the co-occurrence of detailed diagnoses. This mechanism becomes clearer with another pair of diagnoses showing a positive correlation across all groups: J13 (pneumonia due to Streptococcus pneumoniae) and J14 (pneumonia due to Haemophilus influenzae), suggesting that certain facilities implemented more rigorous testing protocols.
In order to ascertain whether the observed relationship was a systemic problem or the wide ranges were merely caused by a minuscule number of miscoding clinicians, we assessed selected cases with the strongest negative correlation for all groups in pairs (Figure 2). Assuming that the same criteria had been consistently applied in each region, the plotted lines would be effectively flat, except for a few outliers in the top and bottom percentile. Nevertheless, although such outliers were detectable, there was no constant segment; this implies that the observed patterns were not due to isolated miscoding but rather reflected a broader lack of consensus on diagnostic delimitation.
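The reasoning about flat versus sloped curves can be illustrated with a short sketch. It assumes, hypothetically, that each region is summarized by the counts of the two codes in an ambiguous pair; the function names and the 10% trimming fraction are choices made for the example, not values from the study.

```python
def sorted_region_curve(region_counts):
    """Sorted per-region probability of code A within the pair (A, B).

    region_counts maps region -> (count_a, count_b); regions with no visits
    for either code are skipped.
    """
    probabilities = [
        count_a / (count_a + count_b)
        for count_a, count_b in region_counts.values()
        if count_a + count_b > 0
    ]
    return sorted(probabilities)


def central_spread(curve, trim=0.1):
    """Range of the curve after dropping a fraction `trim` from each end."""
    k = int(len(curve) * trim)
    core = curve[k:len(curve) - k] if k else curve
    return core[-1] - core[0]


# Consistent coding with two outlier regions: the trimmed curve is flat.
consistent = {f"r{i}": (50, 50) for i in range(8)}
consistent.update({"low": (1, 99), "high": (99, 1)})

# No shared criterion: probabilities rise steadily across regions.
divergent = {f"r{i}": (i * 10, 100 - i * 10) for i in range(10)}

flat_spread = central_spread(sorted_region_curve(consistent))
sloped_spread = central_spread(sorted_region_curve(divergent))
```

Under consistent criteria the spread of the trimmed curve stays near zero even when a few regions miscode heavily, whereas a steadily rising curve yields a large spread.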
Relative Region (Powiat) Probabilities of Receiving a Particular Diagnosis From Ambiguous Pairs
Note: See Table 1 footnotes for code definitions.
Finally, in order to verify whether any discernible regional pattern could explain the observed differences, we plotted relationships for the pair with the strongest negative correlation, detected for pair J00 and J06 among children (Figure 3). The map did not show any consistent regional differences. The only noticeable relationship was decreasing dispersion with increasing region diagnosis count, suggesting that the observed phenomena are not indicative of better diagnosis in larger cities, but rather that as diagnosis count increases, random factors begin to cancel out.
Relative Region (Powiat) Probabilities of Receiving a Particular Diagnosis for the Pair of J00 and J06 Among Children
J00 = acute nasopharyngitis [common cold]; J06 = acute upper respiratory infections of multiple and unspecified sites; Loess = locally weighted scatterplot smoothing.
DISCUSSION
To the best of our knowledge, this is the first study where observable anomalies were used to analyze the extent of regional variability in coding for the same respiratory infection. The observed differences in ICD-10 diagnosis code assignment for ARI appear very large, and although literature on the subject acknowledges some genuine diagnostic problems, these problems are insufficient to explain the magnitude of mismatch; thus, the most plausible explanation appears to be ambiguity of the codes. The International Classification of Diseases, 11th Revision (ICD-11) is highly unlikely to mitigate this problem because it introduces yet more ARI diagnosis codes.12
Potential Explanations
Given studies indicating that patients seldom seek medical care for viral infections, including the majority of respiratory syncytial virus or influenza cases,13 and that there are substantial differences in care-seeking behavior between European countries with public health care systems,14,15 there could have been modest variations in the use or availability of medical services. If issues such as differences in perceived severity or effective access (eg, in rural regions or during peak infection seasons) influence medical visits, the number of severe cases should remain constant while the number of milder cases varies. This scenario would result in the common cold showing strong negative correlations with other conditions, and severe conditions being strongly positively correlated with each other. Such a pattern is only weakly supported by our data, however; instead, the formed pairs suggest that the observed process stems from classification differences.
We did identify some potential local mechanisms that could compromise data quality. Although the per capita payment system does not directly incentivize diagnosis manipulation, subtle indirect incentives exist. Polish regulations do require physicians to refund inappropriate reimbursements, although this was deemed more of a problem in cases of some particularly liability-laden ambiguous regulations16 or off-label use of medication.17 In the case of antibiotics, despite the known issue of overprescribing,18 there does not appear to be pressure to ensure that codes are unquestionably suitable for such treatments. A study of Polish adult patients, albeit having a small sample, found that most were given prescriptions for antibiotics for any major respiratory infection category, excluding the common cold.19 The lack of testing11 meant diagnoses were uncertain, leading to the incentive to avoid influenza diagnoses because of the extra epidemiologic reporting requirements.20 Nevertheless, these mechanisms do not appear strong enough to explain regional variability, especially considering their nationwide impact.
Although one could speculate that the observed phenomenon reflects only some poor record-keeping at the local level, deeper analysis uncovers a more substantial problem: the ICD system, intended for global use, encounters challenges even in countries with a seemingly satisfactory profile, such as Poland, a European Union member classified as having a high-income economy by the World Bank. Scrutinizing the overall performance of medical statistics in cases where independent cross-validation is possible, Poland is estimated to have missed approximately one-third of COVID-19 deaths,21 whereas the United States missed one-fifth.22 In correctly attributing influenza-associated deaths, neither country was successful.11,23 The observed problems therefore suggest that ICD codes are unlikely to serve their intended global purpose effectively, and one should exercise extra caution when drawing epidemiologic conclusions or making international comparisons based on these codes. This is also implicitly suggested by monitoring programs on influenza-like illnesses, where the adopted methodology bears no resemblance to the ICD classification and is instead symptom based.24 Moreover, it is hard to demonstrate the relevance of ICD classification even to ARI treatment, as antibiotic use guidelines not only rely on specific symptoms and their severity, but also tend to be based on a somewhat different classification of respiratory illnesses.25,26
Limitations
Poland’s inconsistent application of codes suggests that achieving the intended globally comparable data will be challenging, particularly in less developed countries. Definitive conclusions, however, await replication of this study elsewhere. The data presented in the Supplemental Appendix and Supplemental Figures 1-5 leave some possibility that beyond miscoding, there could also be subtle genuine variability related to regional differences in visits among older adults.
Conclusions
Our findings suggest that ICD codes may have clinical ambiguities. These codes could therefore face challenges not only in fulfilling their intended purpose of generating internationally comparable health data but also in their use for comprehensive government health planning.
Footnotes
Conflicts of interest: authors report none.
- Received for publication January 27, 2024.
- Revision received April 21, 2024.
- Accepted for publication August 14, 2024.
- © 2025 Annals of Family Medicine, Inc.