Annals of Family Medicine
Research Article: Original Research

Quality, Accuracy, and Bias in ChatGPT-Based Summarization of Medical Abstracts

Joel Hake, Miles Crowley, Allison Coy, Denton Shanks, Aundria Eoff, Kalee Kirmer-Voss, Gurpreet Dhanda and Daniel J. Parente
The Annals of Family Medicine March 2024, 22 (2) 113-120; DOI: https://doi.org/10.1370/afm.3075
Joel Hake
Department of Family Medicine and Community Health, University of Kansas Medical Center, Kansas City, Kansas
MD
Miles Crowley
Department of Family Medicine and Community Health, University of Kansas Medical Center, Kansas City, Kansas
MD, MPH
Allison Coy
Department of Family Medicine and Community Health, University of Kansas Medical Center, Kansas City, Kansas
MD
Denton Shanks
Department of Family Medicine and Community Health, University of Kansas Medical Center, Kansas City, Kansas
DO, MPH
Aundria Eoff
Department of Family Medicine and Community Health, University of Kansas Medical Center, Kansas City, Kansas
MD
Kalee Kirmer-Voss
Department of Family Medicine and Community Health, University of Kansas Medical Center, Kansas City, Kansas
MD
Gurpreet Dhanda
Department of Family Medicine and Community Health, University of Kansas Medical Center, Kansas City, Kansas
MD
Daniel J. Parente
Department of Family Medicine and Community Health, University of Kansas Medical Center, Kansas City, Kansas
MD, PhD
  • For correspondence: dparente@kumc.edu

Abstract

PURPOSE Worldwide clinical knowledge is expanding rapidly, but physicians have little time to review the scientific literature. Large language models (eg, Chat Generative Pretrained Transformer [ChatGPT]) might help summarize and prioritize research articles for review. However, large language models sometimes “hallucinate” incorrect information.

METHODS We evaluated ChatGPT’s ability to summarize 140 peer-reviewed abstracts from 14 journals. Physicians rated the quality, accuracy, and bias of the ChatGPT summaries. We also compared human ratings of relevance to various areas of medicine to ChatGPT relevance ratings.

RESULTS ChatGPT produced summaries that were 70% shorter (mean abstract length of 2,438 characters decreased to 739 characters). Summaries were nevertheless rated as high quality (median score 90, interquartile range [IQR] 87.0-92.5; scale 0-100), high accuracy (median 92.5, IQR 89.0-95.0), and low bias (median 0, IQR 0-7.5). Serious inaccuracies and hallucinations were uncommon. Classification of the relevance of entire journals to various fields of medicine closely mirrored physician classifications (nonlinear standard error of the regression [SER] 8.6 on a scale of 0-100). However, relevance classification for individual articles was much more modest (SER 22.3).
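
The two headline figures above can be checked with a short calculation. The following is a minimal sketch, not the authors' analysis code: `percent_reduction` reproduces the 70% figure from the two reported mean lengths, and `standard_error_of_regression` assumes the conventional definition of SER (the square root of the residual sum of squares divided by the residual degrees of freedom). The abstract does not state which nonlinear model or degrees of freedom were used, so the `n_params` argument and the example ratings are hypothetical.

```python
import math


def percent_reduction(original_len: float, summary_len: float) -> float:
    """Percentage by which the summary is shorter than the original."""
    return 100.0 * (original_len - summary_len) / original_len


def standard_error_of_regression(observed, predicted, n_params):
    """sqrt(RSS / (n - n_params)); n_params is an assumed model size."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more observations than fitted parameters")
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(rss / dof)


# Mean lengths reported in the abstract (characters).
reduction = percent_reduction(2438, 739)
print(f"Length reduction: {reduction:.1f}%")  # 69.7%, reported as 70%

# Hypothetical relevance ratings on a 0-100 scale (illustration only).
physician_ratings = [80, 60, 40, 20]
model_ratings = [75, 65, 35, 25]
ser = standard_error_of_regression(physician_ratings, model_ratings, n_params=2)
print(f"SER: {ser:.2f}")
```

Because SER is expressed in the same units as the rating scale, a value of 8.6 corresponds to a typical deviation of roughly 9 points on the 0-100 scale, whereas 22.3 corresponds to a deviation of more than 20 points.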

CONCLUSIONS Summaries generated by ChatGPT were 70% shorter than the mean abstract length and were characterized by high quality, high accuracy, and low bias. Conversely, ChatGPT had only modest ability to classify the relevance of articles to medical specialties. We suggest that ChatGPT can help family physicians accelerate review of the scientific literature and have developed software (pyJournalWatch) to support this application. Life-critical medical decisions should remain based on complete, critical, and thoughtful evaluation of the full text of research articles in context with clinical guidelines.

Key words:
  • artificial intelligence
  • large language models
  • ChatGPT
  • primary care research
  • critical assessment of scientific literature
  • bias
  • text mining
  • text analysis
  • Received for publication April 21, 2023.
  • Revision received October 13, 2023.
  • Accepted for publication November 17, 2023.
  • © 2024 Annals of Family Medicine, Inc.
Vol. 22, Issue 2
March/April 2024