Median Quality, Accuracy, and Bias Scores Assigned by Humans and ChatGPT to Articles Overall and Stratified by Journal
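Article group | No. | Human Adjudicated Quality, Median (IQR) | Human Adjudicated Accuracy, Median (IQR) | Human Adjudicated Bias, Median (IQR) | GPT Predicted Quality, Median (IQR) | GPT Predicted Accuracy, Median (IQR) | GPT Predicted Bias, Median (IQR) |
---|---|---|---|---|---|---|---|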
Overall | 140 | 90.0 (87.0-92.5) | 92.5 (89.0-95.0) | 0 (0-7.5) | 90.0 (85.0-90.0) | 90.0 (90.0-95.0) | 0 (0) |
Journal | |||||||
Ann Intern Med | 10 | 90.0 (89.6-95.0) | 93.75 (85.6-95.0) | 0 (0) | 90.0 (90.0-90.0) | 95.0 (95.0-95.0) | 0 (0) |
Ann Fam Med | 10 | 88.25 (86.2-90.0) | 90.75 (87.5-93.3) | 1.25 (0-10.6) | 90.0 (90.0-93.8) | 95.0 (91.25-95.0) | 0 (0) |
Chest | 10 | 93.75 (88.8-95.0) | 95.0 (90.6-95.0) | 0 (0-3.8) | 90.0 (81.25-90.0) | 90.0 (90.0-93.8) | 0 (0) |
J Am Coll Cardiol | 10 | 90.0 (88.0-92.5) | 92.5 (89.5-95.0) | 0 (0-7.5) | 85.0 (85.0-90.0) | 90.0 (90.0-90.0) | 0 (0) |
JAMA Surg | 10 | 88.5 (85.5-90.0) | 89.75 (87.5-90.0) | 2.5 (0-7.5) | 87.5 (80.0-90.0) | 90.0 (90.0-90.0) | 0 (0) |
JAMA | 10 | 90.0 (83.6-91.5) | 91.75 (89.6-94.8) | 0 (0) | 90.0 (90.0-90.0) | 90.0 (90.0-90.0) | 0 (0) |
J Gen Intern Med | 10 | 95.0 (91.25-96.88) | 95.0 (93.1-97.5) | 0 (0-5.6) | 90.0 (90.0-90.0) | 90.0 (90.0-95.0) | 0 (0) |
Lancet Neurol | 10 | 90.0 (84.6-94.4) | 90.5 (89.3-94.4) | 0 (0-3.8) | 90.0 (85.0-90.0) | 90.0 (90.0-90.0) | 0 (0) |
Lancet Psychiatry | 10 | 87.5 (85.0-91.9) | 90.0 (86.6-94.4) | 0 (0-10.5) | 90.0 (86.25-90.0) | 95.0 (90.0-95.0) | 0 (0) |
Lancet Respir Med | 10 | 87.5 (86.8-87.5) | 89.75 (85.3-91.9) | 0 (0-21.4) | 90.0 (85.0-90.0) | 90.0 (90.0-90.0) | 0 (0) |
Am J Obstet Gynecol | 10 | 89.5 (85.1-90.0) | 92.5 (88.9-94.3) | 5.0 (0-10.5) | 90.0 (90.0-90.0) | 90.0 (90.0-90.0) | 0 (0) |
N Engl J Med | 10 | 91.25 (87.1-92.5) | 91.25 (89.2-92.5) | 0 (0) | 90.0 (90.0-93.8) | 95.0 (90.0-95.0) | 0 (0) |
Nat Med | 10 | 89.25 (87.5-92.9) | 94.25 (91.4-96.4) | 0 (0-1.9) | 90.0 (90.0-90.0) | 90.0 (90.0-93.8) | 0 (0) |
Am J Epidemiol | 10 | 91.25 (88.1-96.2) | 91.0 (90.0-96.9) | 2.5 (0-9.4) | 87.5 (80.0-93.8) | 90.0 (90.0-93.8) | 0 (0) |
Am J Epidemiol = American Journal of Epidemiology; Am J Obstet Gynecol = American Journal of Obstetrics and Gynecology; Ann Fam Med = Annals of Family Medicine; Ann Intern Med = Annals of Internal Medicine; ChatGPT = Chat Generative Pretrained Transformer; GPT = Generative Pretrained Transformer; IQR = Interquartile range; J Am Coll Cardiol = Journal of the American College of Cardiology; JAMA = Journal of the American Medical Association; JAMA Surg = JAMA Surgery; J Gen Intern Med = Journal of General Internal Medicine; Lancet Neurol = The Lancet Neurology; Lancet Psychiatry = The Lancet Psychiatry; Lancet Respir Med = The Lancet Respiratory Medicine; N Engl J Med = The New England Journal of Medicine; Nat Med = Nature Medicine.
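Each cell in the table reports a median with its interquartile range. As a rough illustration of how such a cell can be derived from a set of per-article scores, the sketch below uses Python's standard `statistics` module. The scores are made-up values, not data from the study, and the helper names `median_iqr` and `format_cell` are hypothetical. The table does not state which quartile convention was used, so the `"inclusive"` interpolation method here is an assumption; other conventions can give slightly different quartiles.

```python
import statistics

def median_iqr(scores):
    """Return (median, first quartile, third quartile) for a list of scores.

    Assumption: quartiles use linear interpolation over the observed range
    (the "inclusive" method); the source table does not specify a method.
    """
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return statistics.median(scores), q1, q3

def format_cell(scores):
    """Format scores in the table's 'median (Q1-Q3)' layout."""
    med, q1, q3 = median_iqr(scores)
    return f"{med:g} ({q1:g}-{q3:g})"

# Hypothetical scores for 10 articles (illustrative only, not study data).
hypothetical_scores = [80, 85, 85, 90, 90, 90, 90, 95, 95, 95]
print(format_cell(hypothetical_scores))
```

When the first and third quartiles coincide, the range collapses to a single value, which may be why some cells show only one number in parentheses, such as 0 (0).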