Editorial

The AI Moonshot: What We Need and What We Do Not

José E. Rodríguez and Yves Lussier
The Annals of Family Medicine January 2025, 23 (1) 7; DOI: https://doi.org/10.1370/afm.240602
José E. Rodríguez, MD, FAAP
Department of Family & Preventive Medicine, University of Utah, Salt Lake City, Utah
For correspondence: Jose.Rodriguez@hsc.utah.edu

Yves Lussier, MD, FACMI
Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah

Published eLetters

  • Published on: (19 May 2025)
    Governing AI to Serve the Patient in Front of Us
    • Rebeca Tenajas, Medical Doctor, Master's in Clinical Medicine, Family Medicine Department, Arroyomolinos Community Health Centre, Spain
    • Other Contributors:
      • David Miraut, Independent Researcher

    The editorial by Rodríguez and Lussier (1) invites those of us who practice medicine to articulate, with equal parts pragmatism and ambition, what kinds of artificial-intelligence tools we genuinely need at the point of care. As two family medicine researchers working in Spain, we share the authors’ wish to steer the current “moonshot” rhetoric toward solutions that actually lighten our clinical workload, protect the therapeutic relationship, and respect the realities of diverse health-system contexts. In the lines that follow we reflect on their argument through the lens of the World Health Organization’s Global Initiative on AI for Health (GI-AI4H), described in a recent Nature commentary (2). Our aim is to show how the governance scaffolding proposed by GI-AI4H can channel primary-care–driven priorities, making ambient AI scribes (3), multilingual note generation, and smarter informational triage not only technically feasible but also safe, equitable, and sustainable.

    The workload data that frame the Annals piece are now well documented: outpatient physicians often spend more time interacting with the electronic health record than with patients, a pattern associated with measurable burnout and intent to leave practice (4). Longitudinal analyses of academic primary-care clinicians show that inbox volume and after-hours clicks have continued to rise even after the acute phase of the COVID-19 pandemic (5). These observations explain why the workshop participants placed ambient AI scribes at the top of their wish list. Early qualitative studies suggest that such systems, when integrated into routine consultations, are perceived by physicians as reducing clerical load and improving conversational flow (6), and pragmatic randomised trials are now under way to quantify their effect on time-in-notes and burnout scores (7). The technology has yet to reach full maturity, however, as evidenced by our experimental observations in Spanish healthcare facilities and hospitals.

    Yet the same evidence base that highlights opportunity also flags ethical risks (8). Speech-to-text engines can mis-transcribe under-represented dialects and strong accents, and large language models may introduce factual errors or fabricate content when assembling structured notes (9–13). The GI-AI4H ethical framework provides concrete guardrails for managing such risks, building on WHO guidance that positions transparency, explainability, and human oversight as prerequisites for clinical deployment (14). Importantly, the GI-AI4H places special emphasis on the participation of low- and middle-income countries (LMICs) in standard-setting, acknowledging that systems trained exclusively on data from high-income settings will propagate bias and jeopardise safety when exported elsewhere (2). In our own Spanish clinics, where encounters routinely alternate between Castilian, Catalan, Galician, Basque, and Arabic, we see direct relevance: multilingual note generation must be evaluated not only for transcription accuracy but also for its potential to exclude patients whose languages are poorly represented in the training corpus (15).

    Rodríguez and Lussier caution against another avalanche of diagnostic aids or risk calculators and instead call for tools that “free brain space” for relational work. Their skepticism resonates with studies showing that cardiovascular risk scores, although recommended by guidelines, remain under-used because integration hurdles and time constraints outstrip perceived benefit (16). The GI-AI4H regulatory agenda, which draws on the International Medical Device Regulators Forum’s Good Machine Learning Practice principles (17), offers a pathway for distinguishing high-value ambient documentation support from lower-yield predictive widgets. By requiring developers to document intended use, data provenance, and performance metrics in real-world sub-groups, the framework favours applications that demonstrably reduce administrative load without displacing clinical judgement.

    Crucially, GI-AI4H recognises that compliance processes must be proportionate to resource levels. Here Spain’s experience as a mid-income European country intersects with LMIC concerns (18): excessive documentation or costly post-market surveillance can deter smaller vendors from bringing context-appropriate solutions to market. The initiative’s proposal for shared repositories of de-identified conversational data (or even federated ones), accessible through tiered governance arrangements, could lower entry barriers while maintaining oversight (2). Such repositories would also facilitate independent benchmarking of AI scribes across languages, an urgent need given the rapid commercialisation highlighted by recent investment trends in clinical note-taking platforms.

    The workshop narrative describes the cognitive friction of searching an EHR for an ECG that appears five times yet is nowhere to be found. Studies of automatic summarisation show that large language models can assemble coherent SOAP notes from raw transcripts, but faithfulness (the degree to which generated text reflects the source conversation) remains imperfect, especially when audio is noisy or accents unfamiliar (19). Implementation frameworks advocated by GI-AI4H emphasise co-design with end-users and pilot testing in local settings before scale-up (20). In Spain’s decentralized health system, such pilots would need to account for autonomous-community variations in EHR architecture and data-sharing policies.

    Beyond note generation, Rodríguez and Lussier imagine AI that resolves conflicting specialist recommendations. Here we see potential synergy with the Focus Group on AI for Health (FG-AI4H) assessment protocols, which propose standardised validation tasks for triage and decision-support models (21). Embedding ambient scribes within these validation pipelines could allow the same speech-based interface to surface guideline discrepancies in real time, offering primary-care clinicians a concise, hierarchically organised summary rather than another risk score. Early evidence from randomised pilots suggests that physicians spend up to 30% less after-hours time on documentation when ambient AI is introduced, without detectable decline in note quality, although rigorous peer-reviewed studies are still scarce (22). In our experience, however, the reality has been quite the opposite: we have had to invest additional effort in correcting transcription errors and the misplacement of information in standardized forms. Hopefully, these tools will continue to improve over time.

    The Nature commentary underscores GI-AI4H’s commitment to transferring technical capacity to LMICs through on-the-ground collaborations (2). Automated visual evaluation of cervical images, for example, has achieved diagnostic performance on par with expert colposcopists in National Institutes of Health–supported studies (23). Yet uptake remains limited by infrastructure gaps and data-poverty paradoxes, as described in recent analyses of AI deployment inequalities (24). The parallel with primary-care documentation is instructive: both use-cases entail shifting cognitive labour from clinicians to machines, but both risk amplifying disparities if broadband connectivity, hardware availability, or language coverage are missing. The GI-AI4H operational pillar therefore proposes resource-pooling agreements in which member states share annotated datasets and model checkpoints while retaining sovereignty over patient data (2).

    An often overlooked dimension of AI scribes is their environmental footprint. Life-cycle assessments suggest that training and running large language models entail non-trivial energy and water consumption, and that the net ecological benefit of digital interventions cannot be assumed a priori (25). The GI-AI4H paper hints at integrating climate metrics into operational governance but leaves practical details open. We argue that any procurement decision for ambient AI in primary care should include disclosures of model size, hardware requirements, and strategies for carbon emissions mitigation, echoing the sustainability clauses now common in European public-sector tenders. If multilingual note generation saves clinician hours but demands power-hungry GPUs in every consultation room, the societal trade-off becomes ambiguous. Conversely, cloud-based inference served from renewable-powered data centres may yield a genuine emissions reduction by avoiding human scribes’ commuting and paper printouts. Transparent reporting, as mandated by the GI-AI4H operations pillar, will allow health systems to weigh these factors.

    The Annals editorial (1) urges the profession to declare openly what we need and do not need from AI developers. In our view, three lines of inquiry align with GI-AI4H governance and merit immediate investment. First, rigorous head-to-head trials comparing ambient AI with human scribes across multiple languages, measuring not only documentation time but also patient-reported experience and clinical appropriateness. The trial design registered recently in the United States represents a promising template (7). Second, implementation science studies examining how ambient AI alters team workflows, inbox dynamics, and inter-professional communication, building on qualitative explorations of physician perspectives (6). Third, cost-effectiveness analyses that incorporate environmental externalities and regulatory compliance costs, consistent with WHO guidance on health-technology assessment.

    In each domain, GI-AI4H offers a mechanism for pooling data and harmonising evaluation metrics. Shared repositories of de-identified consultation audio, annotated for speech-recognition accuracy and note fidelity, would accelerate independent benchmarking while safeguarding privacy through tiered access controls. Spain’s existing networks of teaching centres, linked via the National Health System’s digital-health platform, could contribute to and benefit from such repositories. The multilingual nature of our patient population means that improvements made here would generalise to other linguistically diverse settings, reinforcing the GI-AI4H principle of mutual benefit.

    Rodríguez and Lussier’s appeal to seize a moonshot moment need not encourage grandiose claims. Rather, it challenges our community to articulate plain, evidence-based requirements for AI that supports, not supplants, family medicine. The GI-AI4H framework, by integrating ethical safeguards, proportionate regulation, context-sensitive implementation, and long-term operational stewardship, provides the institutional architecture within which those requirements can be met. Ambient AI scribes that truly reduce documentation burden; multilingual note generators that respect linguistic diversity; and decision-support tools that reconcile specialist advice without adding new clicks—all are technically feasible, but their real-world value depends on the governance choices we make now.

    We therefore close with a pragmatic invitation: let GI-AI4H become one of the main forums in which primary-care clinicians, informaticians, regulators, and patient advocates co-design evaluation protocols that reflect the day-to-day realities described in the Annals workshop. If we succeed, the result will not be a singular “moonshot” but a series of incremental, verifiable gains, each one making it a little easier to focus our attention where it belongs: on the patient in front of us.

    REFERENCES:

    1. Rodríguez JE, Lussier Y. The AI Moonshot: What We Need and What We Do Not. Ann Fam Med. 2025 Jan 1;23(1):7.

    2. Muralidharan V, Ng MY, AlSalamah S, Pujari S, Kalra K, Singh R, et al. Global Initiative on AI for Health (GI-AI4H): strategic priorities advancing governance across the United Nations. npj Digit Med. 2025 Apr 23;8(1):1–4.

    3. Blaseg E, Huffstetler A. Artificial Intelligence Scribes Shape Health Care Delivery. [cited 2025 May 16]; Available from: https://www.aafp.org/pubs/afp/issues/2025/0400/graham-center-artificial-...

    4. Tawfik D, Bayati M, Liu J, Nguyen L, Sinha A, Kannampallil T, et al. Predicting Primary Care Physician Burnout From Electronic Health Record Use Measures. Mayo Clin Proc. 2024 Sep 1;99(9):1411–21.

    5. Arndt BG, Micek MA, Rule A, Shafer CM, Baltus JJ, Sinsky CA. More Tethered to the EHR: EHR Workload Trends Among Academic Primary Care Physicians, 2019-2023. Ann Fam Med. 2024 Jan 1;22(1):12–8.

    6. Shah SJ, Crowell T, Jeong Y, Devon-Sand A, Smith M, Yang B, et al. Physician Perspectives on Ambient AI Scribes. JAMA Netw Open. 2025 Mar 24;8(3):e251904.

    7. Mafi JN. A Randomized Controlled Trial of Two Ambient Artificial Intelligence Scribe Technologies to Improve Documentation Efficiency and Reduce Physician Burnout [Internet]. University of California, Los Angeles; 2025. Available from: https://ichgcp.net/clinical-trials-registry/NCT06792890

    8. Tenajas R, Miraut D. The 24 Big Challenges of Artificial Inteligence Adoption in Healthcare: Review Article. Acta Medica Ruha. 2023 Sep 20;1(3):432–67.

    9. Ramprasad S, Ferracane E, Selvaraj SP. Generating more faithful and consistent SOAP notes using attribute-specific parameters. In: Proceedings of the 8th Machine Learning for Healthcare Conference [Internet]. PMLR; 2023 [cited 2025 May 18]. p. 631–49. Available from: https://proceedings.mlr.press/v219/ramprasad23a.html

    10. Tenajas R, Miraut D. The Hidden Risk of AI Hallucinations in Medical Practice. Ann Fam Med. 2025 Mar 16;23(1):eLetter.

    11. Tenajas R, Miraut D. The Risks of Artificial Intelligence in Academic Medical Writing. Ann Fam Med. 2024 Feb 10;(2023):eLetter.

    12. Tenajas-Cobo R, Miraut-Andrés D. Riesgos en el uso de Grandes Modelos de Lenguaje para la revisión bibliográfica en Medicina. Investig En Educ Médica. 2024 Jan 9;13(49):141.

    13. Tenajas R, Miraut D. El pulso de la Inteligencia Artificial y la alfabetización digital en Medicina: Nuevas herramientas, viejos desafíos. Rev Medica Hered. 2023 Oct;34(4):232–3.

    14. World Health Organization. Ethics and governance of artificial intelligence for health: large multi-modal models. WHO guidance [Internet]. World Health Organization; 2024. 98 p. Available from: https://www.who.int/publications/i/item/9789240029200

    15. Tenajas-Cobo R, Miraut-Andrés D. El futuro del médico en la era de la inteligencia artificial. Investig En Educ Médica. 2024;13(49):138–9.

    16. Tuzzio L, O’Meara ES, Holden E, Parchman ML, Ralston JD, Powell JA, et al. Barriers to Implementing Cardiovascular Risk Calculation in Primary Care: Alignment With the Consolidated Framework for Implementation Research. Am J Prev Med. 2021 Feb 1;60(2):250–7.

    17. International Medical Device Regulators Forum, Artificial Intelligence/Machine Learning-enabled Working Group. Good Machine Learning Practice for Medical Device Development: Guiding Principles [Internet]. International Medical Device Regulators Forum (IMDRF); 2024 Jun. Report No.: IMDRF/AIWG/N73 DRAFT:2024. Available from: https://www.fda.gov/medical-devices/software-medical-device-samd/good-ma...

    18. Tenajas R, Miraut D. Learning from the Amazon to Build Resilient Rural Health Systems. Ann Fam Med. 2025 May 18;22(1):eLetter.

    19. Leong HY, Gao YF, Ji S, Kalaycioglu B, Pamuksuz U. A GEN AI Framework for Medical Note Generation [Internet]. arXiv; 2024 [cited 2025 May 18]. Available from: http://arxiv.org/abs/2410.01841

    20. Abdulazeem HM, Meckawy R, Schwarz S, Novillo-Ortiz D, Klug SJ. Knowledge, attitude, and practice of primary care physicians toward clinical AI-assisted digital health technologies: Systematic review and meta-analysis. Int J Med Inf. 2025 Sep 1;201:105945.

    21. Tenajas R, Miraut D, Illana CI, Alonso-Gonzalez R, Arias-Valcayo F, Herraiz JL. Recent Advances in Artificial Intelligence-Assisted Ultrasound Scanning. Appl Sci. 2023 Jan;13(6):3693.

    22. Wen LS. Opinion | This technology is becoming beloved by doctors and patients alike. The Washington Post [Internet]. 2025 Mar 25 [cited 2025 May 18]; Available from: https://www.washingtonpost.com/opinions/2025/03/25/ambient-ai-health-car...

    23. Dellino M, Cerbone M, d’Amati A, Bochicchio M, Laganà AS, Etrusco A, et al. Artificial Intelligence in Cervical Cancer Screening: Opportunities and Challenges. AI. 2024 Dec;5(4):2984–3000.

    24. Yu L, Zhai X. Use of artificial intelligence to address health disparities in low- and middle-income countries: a thematic analysis of ethical issues. Public Health. 2024 Sep 1;234:77–83.

    25. Bratan T, Heyen NB, Hüsing B, Marscheider-Weidemann F, Thomann J. Hypotheses on environmental impacts of AI use in healthcare. J Clim Change Health. 2024 Mar 1;16:100299.

    Competing Interests: None declared.
  • Published on: (20 March 2025)
    The Promise of AI: Enhancing Care, Equity, and Physician-Patient Interaction
    • Ezra N. S. Lockhart, Associate Professor, Marriage & Family Therapy, Northcentral University, USA

    José E. Rodríguez and Yves Lussier’s article The AI Moonshot: What We Need and What We Do Not presents an informed perspective on the potential benefits of artificial intelligence (AI) within family medicine, while also highlighting the pitfalls of overcomplicating or misdirecting AI development. In responding to their work, I draw upon a broader framework that integrates humanistic psychology, systems thinking, and decolonial ethics. These frameworks allow for a more nuanced understanding of how AI can be deployed meaningfully in healthcare—ensuring that it fosters relationships, respects cultural contexts, and addresses systemic inequities rather than reinforcing them.

    1. AI as a Tool for Reducing Administrative Burden and Enhancing Efficiency
    AI has significant potential to streamline the administrative aspects of healthcare, particularly through the optimization of electronic health records (EHRs) and related systems. As the authors highlight, current EHR systems are fragmented and redundant, imposing cognitive and procedural burdens on physicians. This inefficiency distracts healthcare providers from their core role: engaging meaningfully with patients.

    Reducing Redundancy: AI can reduce the time physicians spend navigating these fragmented records by organizing and consolidating data. AI could automate tasks such as note-taking, reducing the time spent on documentation and enabling physicians to engage more fully with their patients.

    Facilitating Collaborative Care: AI can bridge gaps between healthcare teams by organizing information across different specialties and reducing confusion. This functionality ensures that all members of the care team can access coherent, up-to-date information that aligns with the patient’s overall treatment plan.

    These applications directly address the administrative overload that currently burdens family physicians and could restore some balance to their workload, allowing them to focus on the patient-provider relationship.

    2. AI and Patient-Centered Communication
    A central tenet of family medicine is the relational aspect of care, where physicians build trust with patients and work collaboratively to navigate complex health challenges. AI must not only support physicians in their clinical duties but also serve to enhance communication between physician and patient.

    Bridging the Technocratic Gap: AI can serve as an intermediary to translate complex medical information into digestible formats, facilitating understanding and improving patient compliance. For instance, AI could provide explanations of medical conditions, treatment options, and lifestyle changes in ways that are culturally sensitive and easily understood by patients, even if they face language barriers or lack health literacy.

    Supporting Marginalized Communities: AI, when designed with cultural humility, can be used to address health disparities by creating tools that are specifically tailored to the needs of marginalized communities. This may include providing materials in patients’ preferred languages, offering accessible self-management resources, and ensuring that healthcare access is not limited by socioeconomic factors.

    These AI applications can help close the gap between physician expertise and patient understanding, ensuring that patients are not only informed but also empowered to make decisions that align with their values and needs.

    3. AI as a Facilitator of Relational Care
    While AI has the potential to optimize workflow and communication, it must not overshadow the relational nature of healthcare. Family physicians do much more than diagnose and treat—they build lasting relationships, provide emotional support, and help patients navigate the often bewildering health system.

    Enhancing Physician-Patient Interaction: By automating routine administrative tasks, AI can free up physicians’ cognitive resources, enabling them to focus on what matters most: fostering trust and providing compassionate care. In particular, AI could act as a scribe during patient interactions, allowing physicians to be present with their patients rather than distracted by note-taking.

    Maintaining Empathy in the Face of Technology: It is critical that AI tools are designed in a way that complements—not diminishes—the empathic and relational dimensions of care. The integration of AI into clinical practice should focus on enhancing the physician’s ability to attend to the emotional and social needs of their patients, rather than reducing them to a series of data points.

    This approach ensures that AI supports rather than diminishes the human aspects of healthcare, aligning with the broader goal of delivering care that is both technically sound and deeply human.

    4. AI as a Solution to Structural and Systemic Healthcare Inequities
    At its best, AI can be a powerful force for promoting health equity, particularly for historically underserved and marginalized populations. Family medicine, with its emphasis on community health and preventative care, is particularly well-positioned to benefit from AI’s capacity to reduce barriers to care.

    Addressing Healthcare Access: AI could streamline processes like prior authorization, ensuring that patients receive the care they need without unnecessary delays or bureaucratic obstacles. Additionally, AI could assist in identifying gaps in care for patients with chronic conditions, ensuring that no one falls through the cracks.

    Cultural Sensitivity and Inclusion: AI must be designed with an awareness of the sociocultural contexts in which healthcare is delivered. This means developing tools that recognize and account for the diverse backgrounds of patients, including their socioeconomic status, language, and cultural practices. AI could thus help physicians better understand the full complexity of a patient’s life, including familial and socio-ethical factors that may influence treatment decisions.

    In this way, AI has the potential to challenge and dismantle existing inequities within healthcare, making it more accessible, inclusive, and just.

    5. Ethical Considerations and the Importance of Cultural Humility
    As AI technologies become more integrated into healthcare, it is imperative that they are developed with ethical and cultural sensitivity. AI should not merely be a tool for efficiency—it must also respect and uphold the dignity and autonomy of patients.

    Avoiding Bias and Discrimination: AI tools must be carefully designed to avoid perpetuating systemic biases, which are already prevalent in healthcare. This includes ensuring that AI does not reinforce health disparities or marginalize vulnerable groups, but instead promotes equity and fairness.

    Ethical Deployment of AI: The integration of AI into healthcare systems must be guided by ethical principles that prioritize patient autonomy, informed consent, and privacy. The use of AI should align with decolonial ethics, resisting the trend toward the commodification of healthcare and preserving the human elements that make care meaningful.

    AI, when ethically developed and deployed, can act as a tool for promoting not just efficiency but also justice in healthcare.

    Competing Interests: None declared.
  • Published on: (16 March 2025)
    The Hidden Risk of AI Hallucinations in Medical Practice
    • Rebeca Tenajas, Medical Doctor, Master's in Clinical Medicine, Family Medicine Department, Arroyomolinos Community Health Centre, Spain
    • Other Contributors:
      • David Miraut, Independent Researcher

    Dear Editor,

    We write this letter as a Spanish family physician and an independent researcher, both deeply engaged in the ongoing debates surrounding artificial intelligence (AI) in medical practice (1,2). Our shared perspective arises from years of clinical work in primary care settings, coupled with a commitment to exploring how technological innovations can best serve both patients and professionals. Recent advances in Large Language Models (LLMs), such as ChatGPT or Gemini, have ignited our enthusiasm for the transformation they might bring to healthcare, while at the same time raising serious questions about the reliability and potential pitfalls of such systems (3,4), especially as they become integrated into sensitive tasks like medical diagnosis and treatment planning.

    An essential point of concern that has recently garnered much attention is the phenomenon commonly termed “hallucinations”. These hallucinations occur when an LLM generates output that appears plausible but is factually incorrect or entirely fabricated. In the legal sphere, the starkest examples have been instances in which judges have confronted fictional citations or misapplied case law, all traced back to LLM-generated documents (5). However, the medical domain faces a different and potentially graver danger. A lawyer citing a fabricated case usually triggers an alarm that can be checked in legal databases (6). But for a busy family physician working in a demanding clinic, a subtle misstep, such as a misplaced clinical guideline, an incorrect dosage, or an invented side effect, may not raise immediate suspicion. This risk is particularly acute in complex diagnostic contexts, where real-time decision-making can be literally a matter of life and death.

    We have come across several troubling anecdotes of fabricated references to scientific articles (7,8), a phenomenon that is often easy to spot once the actual journals are consulted. It is one thing when the AI invents an author’s name or confounds the details of a publication; such errors, though worrisome, are readily identified through diligent verification of sources. Far more insidious are the nuanced distortions that mimic plausible findings while omitting crucial details (9). For instance, an LLM may claim that a certain imaging test is the “gold standard” for a specific condition, citing partial or outdated evidence rather than wholly inventing a paper-like research report. The appearance of verisimilitude lowers our guard, making us more likely to accept the AI’s counsel without the thorough scrutiny that genuine critical appraisals demand. Recent commentaries, such as those by Liebrenz and colleagues in The Lancet Digital Health (10) and by Biswas in Radiology (11), highlight how such hallucinations pose ethical and clinical challenges that extend beyond mere academic inconvenience.

    From the vantage point of a family physician, the attractiveness of LLM-based tools lies in their potential to enhance efficiency. AI-driven applications promise a degree of rapid, wide-ranging literature synthesis that could theoretically allow doctors to make better decisions in less time. Certain pioneering studies, such as the work by Kung et al. (12), even propose that these models might bolster medical education by rapidly generating useful summaries of complex subjects and facilitating on-demand tutoring for medical students and residents. When implemented responsibly, there is no doubt that AI can unlock new levels of efficiency (13), offering second opinions within seconds or sifting through electronic health records to spot patterns that would be imperceptible to the human eye (14,15).

    Yet what is less frequently discussed, and what concerns us as practicing and researching professionals, is the practical burden of supervising these tools. In principle, a diagnostic support system should lighten the workload by providing reliable, evidence-based suggestions. However, once a physician learns that LLMs can hallucinate (and often do so in ways not easily detected), we encourage them to verify each statement, reference, and recommendation these systems generate. This elevated demand for vigilance could paradoxically make the adoption of AI more time-consuming than traditional approaches, at least in the early stages of integration. In an era when consultations are already time-pressured and administrative tasks are ever increasing, the prospect of continually fact-checking a voluminous AI output can feel more burdensome than beneficial.

    Moreover, our independent research on the integration of AI into medical workflows indicates that family physicians may bear a disproportionate share of this supervisory load. Primary care requires a breadth of knowledge covering pediatrics, geriatrics, chronic disease management, mental health, and more. An LLM cannot be an all-in-one expert across these domains simultaneously without occasionally tripping into the kind of oversights that humans can detect only after meticulous analysis. While the AI might identify correlations that our human brains might miss, it can just as easily invent them. This precarious mix of potential brilliance, “black-box” opacity, and imaginative failures underscores the importance of expert oversight, yet that oversight itself demands additional training, greater familiarity with AI’s capabilities and limitations (16), and an extension of the current standards of practice.

    Furthermore, adopting LLM-based tools must involve active consideration of ethical and legal implications. The accountability question remains largely unresolved: when an LLM suggests a flawed diagnosis that a physician follows, who is held responsible? Professional guidelines and regulatory frameworks are beginning to adapt, but the process remains in its infancy. In our dialogue with colleagues, we sense a collective concern that the hurried deployment of AI solutions, often motivated by cost-effectiveness or the desire for technological advancement, might outpace the development of robust safeguards and protocols. The real risk is that we adopt these solutions wholesale without building a “fail-safe” mechanism to catch subtle errors before they result in patient harm.

    In closing, we urge a balanced approach: one that wholeheartedly embraces AI’s promise of improved efficiency and expanded analytical capacity, but with an unwavering commitment to independent verification and clinical judgment. As Spanish professionals (one of us in the trenches of family medicine, the other studying these developments from an applied research standpoint), we believe these concerns must be voiced transparently in the scientific literature and in clinical practice guidelines. Our fear is that in the eagerness to adopt new technology, we neglect the fundamental principle of “primum non nocere” (first, do no harm). We remain hopeful that concerted efforts in research and development, such as those examining LLM performance in formal medical examinations and specialized diagnostic tasks, will continue to expose weaknesses and sharpen the reliability of these systems. In the meantime, we advocate for cautious, stepwise integration of AI tools into medical practice, always accompanied by vigilant oversight from trained professionals, even in the finest details (17).

    REFERENCES:

    1. Rodríguez JE, Lussier Y. The AI Moonshot: What We Need and What We Do Not. Ann Fam Med. 2025 Jan 1;23(1):7.

    2. Tenajas R, Miraut D. El pulso de la Inteligencia Artificial y la alfabetización digital en Medicina: Nuevas herramientas, viejos desafíos. Rev Medica Hered. 2023 Oct;34(4):232–3.

    3. Tenajas R, Miraut D. The 24 Big Challenges of Artificial Inteligence Adoption in Healthcare: Review Article. Acta Medica Ruha. 2023 Sep 20;1(3):432–67.

    4. Tenajas R, Miraut D. The Risks of Artificial Intelligence in Academic Medical Writing. Ann Fam Med. 2024 Feb 10;(2023):eLetter. [cited 2025 Mar 16]. Available from: https://www.annfammed.org/content/risks-artificial-intelligence-academic...

    5. Dahl M, Magesh V, Suzgun M, Ho DE. Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models. J Leg Anal. 2024 Jan 1;16(1):64–93.

    6. Magesh V, Surani F, Dahl M, Suzgun M, Manning CD, Ho DE. Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools [Internet]. arXiv; 2024 [cited 2025 Mar 16]. Available from: http://arxiv.org/abs/2405.20362

    7. Buchanan J, Hill S, Shapoval O. ChatGPT Hallucinates Non-existent Citations: Evidence from Economics. Am Econ. 2024 Mar 1;69(1):80–7.

    8. Schrager S, Seehusen DA, Sexton S, Richardson CR, Neher J, Pimlott N, et al. Use of AI in Family Medicine Publications: A Joint Editorial From Journal Editors. Ann Fam Med. 2025 Jan 1;23(1):1–4.

    9. Tenajas-Cobo R, Miraut-Andrés D. Riesgos en el uso de Grandes Modelos de Lenguaje para la revisión bibliográfica en Medicina. Investig En Educ Médica. 2024 Jan 9;13(49):141.

    10. Liebrenz M, Schleifer R, Buadze A, Bhugra D, Smith A. Generating scholarly content with ChatGPT: ethical challenges for medical publishing. Lancet Digit Health. 2023 Mar 1;5(3):e105–6.

    11. Biswas S. ChatGPT and the Future of Medical Writing. Radiology. 2023 Apr;307(2):e223312.

    12. Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2(2):e0000198. [cited 2025 Mar 16]. Available from: https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig....

    13. Tenajas R, Miraut D. Echoes in space: Online training and AI’s potential in advancing ultrasound competency. WFUMB Ultrasound Open. 2023 Dec 1;1(2):100015.

    14. Tenajas R, Miraut D, Illana CI, Alonso-Gonzalez R, Arias-Valcayo F, Herraiz JL. Recent Advances in Artificial Intelligence-Assisted Ultrasound Scanning. Appl Sci. 2023 Jan;13(6):3693.

    15. Tenajas R, Miraut D. El renacimiento tecnológico de la Radiología: la revolución open source y la inteligencia artificial. Rev Cuba Informática Médica. 2023;15(2).

    16. Tenajas R, Miraut D. Ecografía Inteligente. Rev Cuba Informática Médica. 2023;15(2).

    17. Tenajas R, Miraut D. Rethinking ultrasound probe maintenance in the era of AI. WFUMB Ultrasound Open. 2023 Dec 1;1(2):100014.

    Competing Interests: None declared.