Table 2.

Discriminative Performance (ROC-AUC), Calibration, and Brier Scores for the NoMicro and NeedMicro Predictive Models Under Internal (Emergency Department) and External (Primary Care) Validation

Model	ROC-AUC (95% CI^a)		Calibration Decile Linear Fit R² (95% CI^a)		Scaled Brier Score (95% CI^a)
Model	Primary Care^b	Emergency Department^c	Primary Care^b	Emergency Department^c	Primary Care^b	Emergency Department^c
NoMicro/XGB	0.84 (0.8-0.88)	0.86 (0.86-0.87)	0.98 (0.83-0.98)	>0.99 (0.99-1.0)	0.34 (0.25-0.42)	0.34 (0.33-0.36)
NoMicro/RF	0.85 (0.81-0.89)	0.85 (0.84-0.85)	0.94 (0.77-0.97)	>0.99 (0.98-1.0)	0.37 (0.27-0.46)	0.3 (0.28-0.32)
NoMicro/ANN	0.85 (0.81-0.89)	0.86 (0.85-0.86)	0.97 (0.86-0.98)	>0.99 (0.99-1.0)	0.35 (0.26-0.43)	0.33 (0.32-0.35)
NeedMicro/XGB	NA^d	0.88 (0.87-0.88)	NA^d	>0.99 (0.99-1.0)	NA^d	0.4 (0.38-0.42)

ANN = artificial neural networks; AUC = area under the curve; NA = not applicable; R² = coefficient of determination; RF = random forests; ROC = receiver operating characteristic; XGB = extreme gradient boosting (XGBoost).
^a Estimates and 95% CIs were computed from 2,000 bootstrap replicates stratified by pathogenicity, using the percentile method.
^b External validation on the primary care data set.
^c Internal validation on the emergency department data set.
^d The NeedMicro classifier cannot be validated on the primary care data set because urine microscopy data are missing from almost all of its records.
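
The interval procedure described in footnote a can be sketched as follows. This is a minimal illustration and not the authors' code. The function names (`roc_auc`, `scaled_brier`, `stratified_bootstrap_ci`) are hypothetical, and the data are synthetic. The scaled Brier score is assumed here to be 1 minus the Brier score divided by the Brier score of a constant prediction at the observed prevalence; the table does not state which scaling was used.

```python
import random

def roc_auc(y_true, y_score):
    """ROC-AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def scaled_brier(y_true, y_prob):
    """Assumed scaling: 1 - Brier / Brier of a constant prevalence prediction."""
    n = len(y_true)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    prevalence = sum(y_true) / n
    return 1 - brier / (prevalence * (1 - prevalence))

def stratified_bootstrap_ci(y_true, y_score, metric, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate plus an approximate percentile CI from stratified bootstrap replicates."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(y_true) if y == 1]
    neg_idx = [i for i, y in enumerate(y_true) if y == 0]
    stats = []
    for _ in range(n_boot):
        # Resample each class separately so every replicate keeps the class counts.
        idx = rng.choices(pos_idx, k=len(pos_idx)) + rng.choices(neg_idx, k=len(neg_idx))
        stats.append(metric([y_true[i] for i in idx], [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return metric(y_true, y_score), lower, upper

# Synthetic example only (not study data): 30 positives, 70 negatives.
demo_rng = random.Random(42)
labels = [1] * 30 + [0] * 70
scores = [min(1.0, max(0.0, demo_rng.gauss(0.65 if y else 0.35, 0.15))) for y in labels]

# Fewer replicates than the 2,000 in footnote a, to keep the demo quick.
auc, auc_lo, auc_hi = stratified_bootstrap_ci(labels, scores, roc_auc, n_boot=500)
sbs, sbs_lo, sbs_hi = stratified_bootstrap_ci(labels, scores, scaled_brier, n_boot=500)
print(f"ROC-AUC {auc:.2f} ({auc_lo:.2f}-{auc_hi:.2f})")
print(f"Scaled Brier {sbs:.2f} ({sbs_lo:.2f}-{sbs_hi:.2f})")
```

Resampling within each outcome class keeps the prevalence fixed in every replicate, so both metrics stay defined even when one class is small. The pairwise AUC is quadratic in sample size and is meant only for small illustrations.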