Original Research

Using Machine Learning to Predict Primary Care and Advance Workforce Research

Peter Wingrove, Winston Liaw, Jeremy Weiss, Stephen Petterson, John Maier and Andrew Bazemore
The Annals of Family Medicine July 2020, 18 (4) 334-340; DOI: https://doi.org/10.1370/afm.2550
Peter Wingrove1,2 (corresponding author: pmw27@pitt.edu)
Winston Liaw2,3
Jeremy Weiss4
Stephen Petterson2
John Maier5
Andrew Bazemore2

1 University of Pittsburgh, School of Medicine, Pittsburgh, Pennsylvania
2 Robert Graham Center, Washington, DC
3 University of Houston, College of Medicine, Department of Health Systems and Population Health Sciences, Houston, Texas
4 Carnegie Mellon University, Pittsburgh, Pennsylvania
5 University of Pittsburgh, Department of Biomedical Informatics, Pittsburgh, Pennsylvania

Abstract

PURPOSE To develop and test a machine-learning–based model to predict primary care and other specialties using Medicare claims data.

METHODS We used 2014-2016 prescription and procedure Medicare data to train 3 sets of random forest classifiers (prescription only, procedure only, and combined) to predict specialty. Self-reported specialties were condensed to 27 categories. Physicians were assigned to testing and training cohorts, and random forest models were trained and then applied to 2014-2016 data sets for the testing cohort to generate a series of specialty predictions. Comparing the predicted specialty to self-report, we assessed performance with F1 scores and area under the receiver operating characteristic curve (AUROC) values.

RESULTS A total of 564,986 physicians were included. The combined model had a greater aggregate (macro) F1 score (0.876) than the prescription-only (0.745; P <.01) or procedure-only (0.821; P <.01) model. Mean F1 scores across specialties in the combined model ranged from 0.533 to 0.987. The mean F1 score was 0.920 for primary care. The mean AUROC value for the combined model was 0.992, with values ranging from 0.982 to 0.999. The AUROC value for primary care was 0.982.

CONCLUSIONS This novel approach showed high performance and provides a near real-time assessment of current primary care practice. These findings have important implications for primary care workforce research in the absence of accurate data.

Key words
  • biostatistical methods
  • workforce
  • Medicare

INTRODUCTION

Approximately 1 in 8 Americans works in health care.1 Translating that into better health depends on the presence of an effective workforce, and many believe the system needs to address shortages and maldistribution.2–4 In response, Congress established the National Health Care Workforce Commission, though it was never funded.1

A primary task of the Commission was to analyze data that would inform responses to threats. For example, organizations have projected increasing shortages of primary care physicians,4–7 underscoring the need for coordination across agencies and timely, accurate data.8

Unfortunately, the data needed are inadequate. Workforce data sets—the American Medical Association’s Masterfile and the Centers for Medicare and Medicaid Services’ (CMS) National Plan and Provider Enumeration System—have limitations. The Masterfile is a registry that documents medical school, residency, and fellowship training. Whereas training information is accurate, the registry relies on voluntary, self-reported responses for updates.9 Thus, the Masterfile’s accuracy decreases as clinicians age, reduce their hours, or change the type of care they deliver.7,9

The National Plan and Provider Enumeration System similarly has difficulty reflecting actual practice.10,11 Congress requires that physicians, regardless of Medicare participation, have unique identifiers—National Provider Identifiers (NPIs). The NPI specialty is self-reported, and there are neither requests for updated information, nor mechanisms to determine whether providers are clinically active.9 Clinicians are instructed to report changes within 30 days, though there are no penalties for failing to do so.9

Even with timely data, misclassification remains a risk. Workforce projections use the most recent residency to categorize specialties. The first problem with this approach is that the services delivered might be inconsistent with the residency, eg, family medicine residency graduates might be practicing dermatology. Second, it disregards the contributions of physicians in other specialties and of nonphysicians, eg, a rural cardiologist might be practicing primary care.

The method described below overcomes these limitations by evaluating current behavior to infer specialty. Integrating the additional data has the potential to improve accuracy and serve as a check on traditional approaches. Prescription and procedure data are available via the CMS,12 and technological advances allow us to apply emerging techniques. Machine learning, which develops algorithms to detect patterns, has been used to predict myriad outcomes, including cancer survival and myocardial infarctions.13–17 It has also been applied to Medicare billing data to predict physician specialty and identify fraud; however, that work was not restricted to physicians, did not combine specialties performing similar roles, and did not incorporate prescribing data and, as a result, had low accuracy.18

The present study combined prescription and procedure data to predict specialty. Rather than relying on training history, we propose a new method that assesses prescriptions and procedures to determine specialty. The objectives were to describe prescriptions and procedures by specialty, to combine prescription and procedure data with machine learning to develop algorithms that predict physician specialties, and to test model performance against self-reported specialty.

METHODS

Data Sources

The American Academy of Family Physicians Institutional Review Board approved this study. For this cross-sectional study, we used the 2014-2016 CMS Medicare Fee-For-Service Provider Utilization and Payment Data: Part D Prescriber Public Use Files to identify prescriptions.19 These data sets include information regarding beneficiaries enrolled in Medicare Part D (70% of all beneficiaries), information about providers (eg, NPI and self-reported specialty), and prescriptions (except for over-the-counter drugs).

To identify procedures, we used the 2014-2016 CMS Medicare Fee-For-Service Provider Utilization and Payment Data: Physician and Other Supplier Public Use Files.20 In this Medicare Part B data set, procedures were identified with Healthcare Common Procedure Coding System codes. To protect privacy in these data sets, drugs and procedures were not reported by NPI if there were ≤10 claims.

Variables

To assess the same cohort of physicians, the analysis was restricted to nonpediatric physicians appearing in all 3 years (though they only needed to appear in either the procedure or the prescription data set for a given year). To maintain consistency, physicians were included only if they self-reported the same specialty across all 3 years. We excluded nonphysicians and physician assistants (PAs) and nurse practitioners (NPs) because their subspecialties were not listed. We assigned physicians from specialties with a low number of physicians, or for which multiple specialties practice in similar ways, to 1 of 27 larger specialties (eg, internal medicine and family medicine were relabeled as primary care). To avoid rare drugs or procedures, we restricted the analysis to the 850 most common prescriptions and 1,500 most common procedure codes and excluded items that did not appear in all 3 years. For each year, we characterized physicians by whether they prescribed or performed each of the 2,350 prescriptions/procedures; we did not account for the number of times they prescribed or performed each.

Physicians were then randomly assigned to 2 groups of the same size (Train and Test). Each physician in the Train and Test groups had a data set of associated prescription/procedure behavior for each of the 3 years.
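The feature construction and split can be illustrated with a short R sketch. This is a minimal sketch rather than the study's code: the input objects (partd_2016, partb_2016, specialty_lookup) and their column names (npi, drug_name, hcpcs_code, specialty) are assumed for illustration, and the sketch builds one year of indicators, whereas in the study physicians were assigned once to the Train or Test group and kept there for all 3 years.

    # Minimal sketch, not the study's code: binary indicator features for 1 year from
    # hypothetical Part D (partd_2016) and Part B (partb_2016) data frames, followed by
    # a random split of physicians into equal Train and Test groups.
    make_features <- function(partd, partb, top_drugs, top_procs) {
      npis <- sort(union(partd$npi, partb$npi))
      rx <- sapply(top_drugs, function(d) as.integer(npis %in% partd$npi[partd$drug_name == d]))
      px <- sapply(top_procs, function(p) as.integer(npis %in% partb$npi[partb$hcpcs_code == p]))
      data.frame(npi = npis, rx, px, check.names = FALSE)  # 1 = prescribed/performed at least once
    }

    features <- make_features(partd_2016, partb_2016, top_drugs, top_procs)
    features$specialty <- factor(
      specialty_lookup$specialty[match(features$npi, specialty_lookup$npi)]
    )

    # Random assignment to 2 groups of the same size; drop the identifier before modeling
    keep <- setdiff(names(features), "npi")
    set.seed(1)
    train_idx  <- sample(nrow(features), nrow(features) %/% 2)
    train_2016 <- features[train_idx, keep]
    test_2016  <- features[-train_idx, keep]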

Deriving the Algorithm

Random forest is an ensemble learning method that builds many slightly different decision trees by incorporating random variation and generates an output based on the class value predicted most frequently across the trees. This minimizes overfitting and makes the analysis robust to imbalanced data by limiting the pool of possible variables available at each split.21,22 We selected this method for its conceptual simplicity and favorable statistical properties.23

To begin, we trained a separate random forest model (the combined model, consisting of both prescription and procedure data) for each year. Each random forest consisted of 200 trees and had a pool of 100 possible variables at each node. Changes in hyper-parameters failed to significantly improve these models over the default settings, with the exception of slightly better performance with more possible variables at each node than the default setting; we selected a value of 100 for simplicity. We chose to run 3 separate models as an alternative to cross-validation. Because the prescription and procedure patterns associated with each specialty should be stable across each year, applying 3 separate random forest models to each year of Test data was a robust way to generate many sets of predictions and assess how consistent the method was at predicting specialty. Though these are imbalanced data, various methods to account for this, including undersampling the larger specialties and weighting the smaller specialties, improved performance for some specialties at the expense of others. Because the goal was accurate prediction for physicians regardless of specialty, we chose to leave the data unbalanced.
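A sketch of this step with the ranger package follows; it assumes that each year's training data frame (train_2014, train_2015, train_2016, and the corresponding test_* objects, all hypothetical names) contains the 2,350 binary indicators plus a specialty factor.

    # Minimal sketch, not the published code: 1 combined random forest per year,
    # with 200 trees and 100 candidate variables at each split.
    library(ranger)

    train_years <- list(`2014` = train_2014, `2015` = train_2015, `2016` = train_2016)
    test_years  <- list(`2014` = test_2014,  `2015` = test_2015,  `2016` = test_2016)

    combined_models <- lapply(train_years, function(df) {
      ranger(
        dependent.variable.name = "specialty",
        data      = df,
        num.trees = 200,
        mtry      = 100   # pool of possible variables at each node
      )
    })

    # Apply each yearly model to each year of Test data: 3 x 3 = 9 sets of predictions
    predictions <- lapply(combined_models, function(m)
      lapply(test_years, function(td) predict(m, data = td)$predictions)
    )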

Validating the Algorithm

To assess consistency, we applied each of the 3 random forest models to each of the 3 years of Test data, giving 9 sets of predictions based on the physicians in the Test group. The 9 sets of predictions were compared with self-reported specialty to generate an F1 score (harmonic mean of precision [positive predictive value] and recall [sensitivity]) for each specialty, and a macro F1 score, calculated on the average precision and recall of all specialties. We reported these values as an average across the 9 sets of predictions. We used the 2016 random forest on the 2016 Test data to create sample receiver operating characteristic curves and calculate area under the curve (AUC) values for each specialty.

The F1 score was selected as the primary measure instead of AUC value because of class imbalance. The F1 score is ideal in that it does not take into account true negatives (which will be large no matter what specialty is examined). The F1 score will be low for a given specialty if a significant number of false negatives or false positives occur, and as a result, F1 score can be low for individual specialties even if the model predicts most other specialties well. Because of the large number of true negatives when predicting small specialties, specificity (true negatives/[true negatives + false positives]) can be high even when there are many false positives and precision (true positives/[true positives + false positives]) is low. The high specificity over a large range of sensitivities leads to high AUC values.
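These metrics can be computed directly from a cross-tabulation of predicted vs self-reported specialty. The sketch below, which assumes the objects from the earlier sketches, shows per-specialty precision, recall, and F1 for a single prediction set, the macro F1 built from average precision and recall, and one-vs-rest AUC values via pROC; the probability forest shown is one way to obtain class probabilities and is not necessarily how the study generated its ROC curves.

    # Minimal sketch, assumed: per-specialty precision, recall, and F1 for 1 prediction set;
    # the study averaged these quantities over the 9 prediction sets.
    specialty_metrics <- function(truth, pred) {
      t(sapply(levels(truth), function(s) {
        tp <- sum(pred == s & truth == s)
        fp <- sum(pred == s & truth != s)
        fn <- sum(pred != s & truth == s)
        precision <- tp / (tp + fp)
        recall    <- tp / (tp + fn)
        c(precision = precision, recall = recall,
          f1 = 2 * precision * recall / (precision + recall))
      }))
    }

    pred_2016 <- predictions[["2016"]][["2016"]]   # 2016 model applied to 2016 Test data
    m <- specialty_metrics(test_2016$specialty, pred_2016)
    macro_f1 <- 2 * mean(m[, "precision"]) * mean(m[, "recall"]) /
                (mean(m[, "precision"]) + mean(m[, "recall"]))

    # One-vs-rest AUC per specialty with pROC, using a probability forest for class probabilities
    library(pROC)
    prob_2016 <- ranger(dependent.variable.name = "specialty", data = train_2016,
                        num.trees = 200, mtry = 100, probability = TRUE)
    probs <- predict(prob_2016, data = test_2016)$predictions   # physicians x specialties
    auc_by_specialty <- sapply(colnames(probs), function(s)
      as.numeric(auc(roc(test_2016$specialty == s, probs[, s]))))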

Prescription- and Procedure-Only Subanalyses

We generated 3 additional random forests using only the prescription variables and removing physicians with no prescription data available. We did the same for the procedure variables, removing physicians with no procedure data.

We used the 3 prescription-only models to generate 9 predictions (eg, the 2016 prescription-only model can generate predictions using the 2014, 2015, and 2016 Test data sets) based on the Test data sets, looking only at variables for prescriptions. We did the same for the 3 procedure-only models. We then generated an F1 score for each specialty and macro F1 scores for the prescription-only and procedure-only sets of predictions.

Statistical Analysis

We used 2-sided paired t tests to assess whether the performance of the combined method differed from the prescription-only or procedure-only method, by specialty as well as macro F1 score. Data are presented as mean (%) or mean (95% CI). We considered P <.05 to be statistically significant.
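For example, the comparison of macro F1 scores between the combined and procedure-only models might look like the following sketch; the two 9-element vectors of macro F1 scores are assumed.

    # Minimal sketch, assumed: 2-sided paired t test on the 9 macro F1 scores
    # from the combined model vs the 9 from the procedure-only model.
    t.test(macro_f1_combined, macro_f1_procedure_only, paired = TRUE)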

Aggregate Analysis

We summed the predicted number of physicians in each specialty for the 9 predictions generated by the combined random forests, averaged the counts, and compared them with the specialty distribution of the Test set to assess whether the overall predicted physician counts were in line with the actual Test set counts. To assess model consistency at the individual physician level, we used the 3 combined (2014-2016) models to generate 3 predictions from the 2016 data for physicians in the Test set. We defined model agreement as all 3 models predicting the same specialty. We focused on a single year of prescribing and procedural data because, even though we excluded physicians who did not self-report a consistent specialty across all 3 years, a physician's actual specialty could still have changed from year to year. Applying the 2014-2016 random forest models to the 2016 Test data alone removed the possibility of apparent inconsistency arising when a physician's behavior changed across the years; in that situation, disagreement among predictions of the 2014, 2015, and 2016 specialty might have reflected the model working as intended. We then categorized physicians according to whether their self-reported specialties did or did not match the predictions.
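A sketch of the agreement calculation, assuming the three combined models and the 2016 Test data from the earlier sketches:

    # Minimal sketch, assumed: agreement of the 3 yearly combined models on the 2016 Test
    # data, and whether the agreed-upon prediction matches self-reported specialty.
    p14 <- as.character(predict(combined_models[["2014"]], data = test_2016)$predictions)
    p15 <- as.character(predict(combined_models[["2015"]], data = test_2016)$predictions)
    p16 <- as.character(predict(combined_models[["2016"]], data = test_2016)$predictions)

    agree <- p14 == p15 & p14 == p16
    mean(agree)                                              # proportion with full model agreement
    mean(agree & p16 == as.character(test_2016$specialty))   # agree and match self-report
    mean(agree & p16 != as.character(test_2016$specialty))   # agree on a nonmatching specialty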

Statistical analyses were performed with Stata version 15.0 (StataCorp, LLC). The random forest models were run with the ranger package in R, and AUC was calculated in R with the pROC package.24

National Provider Identifiers

Despite its flaws, self-report via NPI is an appropriate reference standard. First, it effectively deals with the concern that historical training differs from current practice by divorcing specialty categorization from residency training. This would remain an issue if we used the American Medical Association’s Masterfile. Second, by only including those physicians who appeared in the prescribing or procedural data set, we excluded those not clinically active. Our models are based on the aggregate behavior of a large number of physicians, and we hypothesized that they are not meaningfully influenced by the small number of physicians with inaccurate self-reported specialty.

RESULTS

We included 564,986 physicians (n = 282,493 each in the Train and Test groups). A breakdown by specialty for the Train and Test sets is shown in Table 1. The smallest specialty was allergy/immunology, comprising 0.6% of the physicians in both data sets, and the largest was primary care, comprising 35.6% and 35.9% of the Train and Test sets, respectively. Using prescription data only, approximately 40% of physicians identified as primary care compared with approximately 34% using procedure data only (Supplemental Table 1). Psychiatrists exhibited a similar pattern, with more appearing in the prescription data set than the procedure data set. The inverse was true for specialists who routinely perform procedures.

Table 1

Prescriptions and Procedures and Comparison of Train and Test Data Sets, by Specialty

Primary care physicians prescribed the greatest mean number of unique drugs (61.4), more than 50% more than the next greatest group (cardiologists, 38.1) (Table 1). Radiologists had the greatest mean number of unique procedure codes (35.7).

Comparing the combined and procedure-only predictions, the combined model was significantly better for 18 (66.7%) specialties, worse for 8 (29.6%), and no different for 1 (3.7%) (Table 2; see Supplemental Table 2 for recall, negative predictive, and positive predictive values). Comparing the combined to prescription-only predictions, 19 (70.4%) were significantly better, 6 (22.2%) were worse, and 2 (7.4%) were no different. Macro F1 scores also showed statistically significant differences; the combined model (0.876) was more than 0.05 greater than the procedure-only model (0.821) and more than 0.10 greater than the prescription-only model (0.745).

Table 2

F1 Scores for Random Forests, by Specialty and Type of Training Data

With respect to the overall robustness of the combined model, 22 specialties (81.5%) had mean F1 scores > 0.80, and 15 (55.6%) had scores > 0.90 (Table 2). The 3 worst specialties were plastic surgery (0.533), physical medicine and rehabilitation (0.586), and neurosurgery (0.650), and the combined model was significantly better than the procedure-only and prescription-only models for all 3 of these specialties. No specialty had a score of < 0.500 for the combined model. The F1 score for the combined model for primary care was 0.920.

These performance characteristics translated to high AUC values (Supplemental Table 3); 22 specialties (81.5%) had AUC values > 0.99. The lowest AUC was for primary care (0.982).

These models also generated relatively accurate predictions for specialty counts (Table 3). Nineteen (70.4%) of the predicted counts for specialties were within 5% of the actual counts. The models underestimated the number of physicians in several specialties, including infectious disease, neurosurgery, physical medicine and rehabilitation, and plastic surgery. In contrast, the model overestimated the number of physicians practicing primary care by 3.7%.

Table 3

Predicted vs Actual Counts of Physicians, by Specialty, for Combined Random Forest Models

With respect to consistency, the 3 models predicted the same specialty for 97.0% of physicians, when applied to the same year of Test prescription and procedure data (2016) (Table 4). Among these, 89.4% were consistently predicted as the specialty that matched their self-report, whereas 7.6% were consistently predicted as a nonmatching specialty. These values were 98.3%, 92.6%, and 5.8%, respectively, for primary care.

Table 4

Model Agreement and Specialty Match Using 2016 Data

DISCUSSION

In this study, we developed high-performing models to predict specialties. With noted exceptions, these models exhibited high F1 scores and AUC values, especially in comparison to earlier work.18

For several specialties, including neurosurgery and physical medicine and rehabilitation, the models’ performance was suboptimal. We hypothesize that these specialties have high overlap with other specialties, making classification difficult. This finding was not true for primary care, suggesting that the constellation of procedures and prescriptions is also important. Whereas primary care shares prescriptions and procedures with a broad range of specialties, few share its breadth.

Our method has implications for primary care workforce studies. For example, this approach can be used to identify primary care PAs/NPs, who do not have mandated residencies and have eluded classification.25 Workforce projections have been hampered by these limitations. For example, across 40 state workforce assessments, 60% did not include PAs/NPs, citing inadequate data as justification for their exclusion.26 To capture the contribution of PAs/NPs, researchers have relied on surveys and state licensing data,27,28 which have response rates of 20% to 30%.29,30

Our approach also enhances the accuracy and granularity of projections. As noted, workforce projections rely on training though this might not reflect current practice.5–7 Our approach provides a near real-time assessment of behavior. This subtle distinction might affect residencies created and policies supported. This method also allows for identification of physicians not easily categorized such as those providing HIV care.31

There are several limitations to the study. First, we excluded physicians not billing Medicare, only participating in Medicare Advantage, or only providing pediatric care. Physicians had to prescribe drugs or perform procedures >10 times to appear in the data set. A national all-payer claims database would overcome these limitations. Second, we evaluated a single technique in this analysis. Whereas random forest models are broadly used, it is possible that other techniques or changes to parameters might improve accuracy.32 Third, we only included physicians appearing in 3 consecutive years. These analyses need to be repeated with a cohort that involves physicians with less longitudinal data to determine if results are similar. Fourth, we were unable to understand the motivations behind scope deviations, eg, a family physician could practice differently because of unique disease patterns in their service area. Understanding these motivations via a qualitative approach would provide additional context. Finally, we used self-reported specialty for training and testing. As mentioned, this database does not have a penalty for out-of-date information, though physicians are instructed to report changes.9

In summary, we report a novel method for identifying primary care physicians. These models exhibit high performance, and because they identify the practice patterns of specialties, they can be used to identify primary care PAs and NPs. By assessing current practice rather than historical training, this approach has the potential to change how the primary care workforce is tracked.

Footnotes

  • Conflicts of interest: authors report none.

  • To read or post commentaries in response to this article, see it online at https://www.AnnFamMed.org/content/18/4/334.

  • Prior presentation: 2017 North American Primary Care Research Group Annual Meeting; November 17-21, 2017; Montreal, Canada.

  • Supplemental materials: available at https://www.AnnFamMed.org/content/18/4/334/suppl/DC1/.

  • Received for publication February 22, 2019.
  • Revision received November 27, 2019.
  • Accepted for publication January 6, 2020.
  • © 2020 Annals of Family Medicine, Inc.

References

1. Buerhaus PI, Retchin SM. The dormant National Health Care Workforce Commission needs congressional funding to fulfill its promise. Health Aff (Millwood). 2013;32(11):2021–2024.
2. Committee on the Governance and Financing of Graduate Medical Education, Board on Health Care Services, Institute of Medicine. Graduate Medical Education That Meets the Nation's Health Needs. Eden J, Berwick D, Wilensky G, eds. National Academies Press; 2014. https://pubmed.ncbi.nlm.nih.gov/25340242/. Accessed May 19, 2020.
3. Chen C, Petterson S, Phillips RL, Mullan F, Bazemore A, O'Donnell SD. Toward graduate medical education (GME) accountability: measuring the outcomes of GME institutions. Acad Med. 2013;88(9):1267–1280.
4. Council on Graduate Medical Education. Towards the Development of a National Strategic Plan for Graduate Medical Education. 23rd Report. https://www.hrsa.gov/sites/default/files/hrsa/advisory-committees/graduate-medical-edu/reports/April2017.pdf. Published 2017. Accessed May 19, 2020.
5. Association of American Medical Colleges. The Complexities of Physician Supply and Demand: Projections from 2013 to 2025. https://www.kff.org/wp-content/uploads/sites/3/2015/03/ihsreportdownload.pdf. Published 2015. Accessed Jun 26, 2016.
6. Duchovny N, Trachtman S, Werble E; Congressional Budget Office. Projecting Demand for the Services of Primary Care Doctors. https://www.cbo.gov/system/files/115th-congress-2017-2018/workingpaper/52748-workingpaper.pdf. Published 2017. Accessed May 19, 2020.
7. Petterson SM, Liaw WR, Tran C, Bazemore AW. Estimating the residency expansion required to avoid projected primary care physician shortages by 2035. Ann Fam Med. 2015;13(2):107–114.
8. Starfield B, Shi L, Macinko J. Contribution of primary care to health systems and health. Milbank Q. 2005;83(3):457–502.
9. Bindman AB. Using the National Provider Identifier for health care workforce evaluation. Medicare Medicaid Res Rev. 2013;3(3):E1–E10.
10. Will KK, Williams J, Hilton G, Wilson L, Geyer H. Perceived efficacy and utility of postgraduate physician assistant training programs. JAAPA. 2016;29(3):46–48.
11. American Nurses Association. Nurse Practitioner Perspective on Education and Post-graduate Training. https://www.nursingworld.org/practice-policy/nursing-excellence/official-position-statements/id/nurse-practitioner-perspective-on-education/. Published 2014. Accessed May 19, 2020.
12. Centers for Medicare and Medicaid Services. Medicare Provider Utilization and Payment Data: Part D Prescriber. https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Part-D-Prescriber.html. Published May 2017. Updated Nov 2019. Accessed May 19, 2020.
13. Gupta S, Tran T, Luo W, et al. Machine-learning prediction of cancer survival: a retrospective study using electronic administrative records and a cancer registry. BMJ Open. 2014;4(3):e004007.
14. Weiss JC, Natarajan S, Peissig PL, McCarty CA, Page D. Machine learning for personalized medicine: predicting primary myocardial infarction from electronic health records. AI Mag. 2012;33(4):33–45.
15. Zhai H, Brady P, Li Q, et al. Developing and evaluating a machine learning based algorithm to predict the need of pediatric intensive care unit transfer for newly hospitalized children. Resuscitation. 2014;85(8):1065–1071.
16. Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS One. 2017;12(4):e0174944.
17. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA. 2016;316(22):2402–2410.
18. Bauder RA, Khoshgoftaar TM, Richter AN, Herland M. Predicting medical provider specialties to detect anomalous insurance claims. In: 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI); San Jose, CA; 2016:784–790.
19. Centers for Medicare and Medicaid Services. Medicare Fee-For-Service Provider Utilization & Payment Data Part D Prescriber Public Use File: A Methodological Overview. https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/Prescriber_Methods.pdf. Published 2019. Accessed Jun 26, 2020.
20. Centers for Medicare and Medicaid Services. Medicare Fee-For-Service Provider Utilization & Payment Data Physician and Other Supplier Public Use File: A Methodological Overview. https://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/Medicare-Physician-and-Other-Supplier-PUF-Methodology.pdf. Published 2014. Updated 2019. Accessed May 19, 2020.
21. Liaw A, Wiener M. Classification and regression by randomForest. R News. 2002;2/3:18–22.
22. Khoshgoftaar TM, Golawala M, Van Hulse J. An empirical study of learning from imbalanced data using random forest. In: 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI); Patras, Greece; 2007:310–317.
23. Breiman L. Random forests. Mach Learn. 2001;45:5–32.
24. Wright MN, Ziegler A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw. 2017;77(1):1–17.
25. Wiltse Nicely KL, Fairman J. Postgraduate nurse practitioner residency programs: supporting transition to practice. Acad Med. 2015;90(6):707–709.
26. Morgan P, Strand De Oliveira J, Short NM. Physician assistants and nurse practitioners: a missing component in state workforce assessments. J Interprof Care. 2011;25(4):252–257.
27. Doescher MP, Andrilla CH, Skillman SM, Morgan P, Kaplan L. The contribution of physicians, physician assistants, and nurse practitioners toward rural primary care: findings from a 13-state survey. Med Care. 2014;52(6):549–556.
28. Spetz J, Fraher E, Li Y, Bates T. How many nurse practitioners provide primary care? It depends on how you count them. Med Care Res Rev. 2015;72(3):359–375.
29. American Academy of Physician Assistants. Physician Assistant Census Report: Results From the 2010 AAPA Census. https://www.aapa.org/wp-content/uploads/2016/12/2010_AAPA_Census_Report.pdf. Published 2011. Accessed May 19, 2020.
30. US Department of Health and Human Services, Health Resources and Services Administration, National Center for Health Workforce Analysis. Projecting the Supply and Demand for Primary Care Practitioners Through 2020. https://bhw.hrsa.gov/sites/default/files/bhw/nchwa/projectingprimarycare.pdf. Published 2013. Accessed May 19, 2020.
31. Gilman B, Bouchery E, Barrett K, et al. HIV Clinician Workforce Study: Final Report. Cambridge, MA: Mathematica Policy Research; 2013.
32. Verikas A, Gelzinis A, Bacauskiene M. Mining data with random forests: a survey and results of new tests. Pattern Recognition. 2011;44(2):330–349.