The editorial by Rodríguez and Lussier (1) invites those of us who practice medicine to articulate, with equal parts pragmatism and ambition, what kinds of artificial-intelligence tools we genuinely need at the point of care. As two family medicine researchers working in Spain, we share the authors’ wish to steer the current “moonshot” rhetoric toward solutions that actually lighten our clinical workload, protect the therapeutic relationship, and respect the realities of diverse health-system contexts. In the lines that follow we reflect on their argument through the lens of the World Health Organization’s Global Initiative on AI for Health (GI-AI4H), described in a recent Nature commentary (2). Our aim is to show how the governance scaffolding proposed by GI-AI4H can channel primary-care–driven priorities, making ambient AI scribes (3), multilingual note generation, and smarter informational triage not only technically feasible but also safe, equitable, and sustainable.
The workload data that frame the Annals piece are now well documented: outpatient physicians often spend more time interacting with the electronic health record than with patients, a pattern associated with measurable burnout and intent to leave practice (4). Longitudinal analyses of academic primary-care clinicians show that inbox volume and after-hours clicks have continued to rise even after the acute phase of the COVID-19 pandemic (5). These observations explain why the workshop participants placed ambient AI scribes at the top of their wish list. Early qualitative studies suggest that such systems, when integrated into routine consultations, are perceived by physicians as reducing clerical load and improving conversational flow (6), and pragmatic randomised trials are now under way to quantify their effect on time-in-notes and burnout scores (7). The technology has yet to reach full maturity, however, as our own observations in Spanish health centres and hospitals attest.
Yet the same evidence base that highlights opportunity also flags ethical risks (8). Speech-to-text engines can mis-transcribe under-represented dialects and strong accents, and large language models may introduce factual errors or fabricate content when assembling structured notes (9–13). The GI-AI4H ethical framework provides concrete guardrails for managing such risks, building on WHO guidance that positions transparency, explainability, and human oversight as prerequisites for clinical deployment (14). Importantly, the GI-AI4H places special emphasis on the participation of low- and middle-income countries (LMICs) in standard-setting, acknowledging that systems trained exclusively on data from high-income settings will propagate bias and jeopardise safety when exported elsewhere (2). In our own Spanish clinics, where encounters routinely alternate between Castilian, Catalan, Galician, Basque, and Arabic, we see direct relevance: multilingual note generation must be evaluated not only for transcription accuracy but also for its potential to exclude patients whose languages are poorly represented in the training corpus (15).
Rodríguez and Lussier caution against another avalanche of diagnostic aids or risk calculators and instead call for tools that “free brain space” for relational work. Their skepticism resonates with studies showing that cardiovascular risk scores, although recommended by guidelines, remain under-used because integration hurdles and time constraints outstrip perceived benefit (16). The GI-AI4H regulatory agenda, which draws on the International Medical Device Regulators Forum’s Good Machine Learning Practice principles (17), offers a pathway for distinguishing high-value ambient documentation support from lower-yield predictive widgets. By requiring developers to document intended use, data provenance, and performance metrics in real-world sub-groups, the framework favours applications that demonstrably reduce administrative load without displacing clinical judgement.
Crucially, GI-AI4H recognises that compliance processes must be proportionate to resource levels. Here Spain’s experience as a mid-income European country intersects with LMIC concerns (18): excessive documentation or costly post-market surveillance can deter smaller vendors from bringing context-appropriate solutions to market. The initiative’s proposal for shared repositories of de-identified conversational data (or even federated ones), accessible through tiered governance arrangements, could lower entry barriers while maintaining oversight (2). Such repositories would also facilitate independent benchmarking of AI scribes across languages, an urgent need given the rapid commercialisation highlighted by recent investment trends in clinical note-taking platforms.
The workshop narrative describes the cognitive friction of searching an EHR for an ECG that appears five times yet is nowhere to be found. Studies of automatic summarisation show that large language models can assemble coherent SOAP notes from raw transcripts, but faithfulness (the degree to which generated text reflects the source conversation) remains imperfect, especially when audio is noisy or accents unfamiliar (19). Implementation frameworks advocated by GI-AI4H emphasise co-design with end-users and pilot testing in local settings before scale-up (20). In Spain’s decentralised health system, such pilots would need to account for autonomous-community variations in EHR architecture and data-sharing policies.
Beyond note generation, Rodríguez and Lussier imagine AI that resolves conflicting specialist recommendations. Here we see potential synergy with the Focus Group on AI for Health (FG-AI4H) assessment protocols, which propose standardised validation tasks for triage and decision-support models (21). Embedding ambient scribes within these validation pipelines could allow the same speech-based interface to surface guideline discrepancies in real time, offering primary-care clinicians a concise, hierarchically organised summary rather than another risk score. Early evidence from randomised pilots suggests that physicians spend up to 30% less after-hours time on documentation when ambient AI is introduced, without detectable decline in note quality, although rigorous peer-reviewed studies are still scarce (22). In our experience, however, the reality has been quite the opposite: we have had to invest additional effort in correcting transcription errors and the misplacement of information in standardised forms. Hopefully, these tools will continue to improve over time.
The Nature commentary underscores GI-AI4H’s commitment to transferring technical capacity to LMICs through on-the-ground collaborations (2). Automated visual evaluation of cervical images, for example, has achieved diagnostic performance on par with expert colposcopists in National Institutes of Health–supported studies (23). Yet uptake remains limited by infrastructure gaps and data-poverty paradoxes, as described in recent analyses of AI deployment inequalities (24). The parallel with primary-care documentation is instructive: both use-cases entail shifting cognitive labour from clinicians to machines, but both risk amplifying disparities if broadband connectivity, hardware availability, or language coverage are missing. The GI-AI4H operational pillar therefore proposes resource-pooling agreements in which member states share annotated datasets and model checkpoints while retaining sovereignty over patient data (2).
An often overlooked dimension of AI scribes is their environmental footprint. Life-cycle assessments suggest that training and running large language models entail non-trivial energy and water consumption, and that the net ecological benefit of digital interventions cannot be assumed a priori (25). The GI-AI4H paper hints at integrating climate metrics into operational governance but leaves practical details open. We argue that any procurement decision for ambient AI in primary care should include disclosures of model size, hardware requirements, and strategies for carbon emissions mitigation, echoing the sustainability clauses now common in European public-sector tenders. If multilingual note generation saves clinician hours but demands power-hungry GPUs in every consultation room, the societal trade-off becomes ambiguous. Conversely, cloud-based inference served from renewable-powered data centres may yield a genuine emissions reduction by avoiding human scribes’ commuting and paper printouts. Transparent reporting, as mandated by the GI-AI4H operations pillar, will allow health systems to weigh these factors.
The Annals editorial (1) urges the profession to declare openly what we need and do not need from AI developers. In our view, three lines of inquiry align with GI-AI4H governance and merit immediate investment. First, rigorous head-to-head trials comparing ambient AI with human scribes across multiple languages, measuring not only documentation time but also patient-reported experience and clinical appropriateness. The trial design registered recently in the United States represents a promising template (7). Second, implementation science studies examining how ambient AI alters team workflows, inbox dynamics, and inter-professional communication, building on qualitative explorations of physician perspectives (6). Third, cost-effectiveness analyses that incorporate environmental externalities and regulatory compliance costs, consistent with WHO guidance on health-technology assessment.
In each domain, GI-AI4H offers a mechanism for pooling data and harmonising evaluation metrics. Shared repositories of de-identified consultation audio, annotated for speech-recognition accuracy and note fidelity, would accelerate independent benchmarking while safeguarding privacy through tiered access controls. Spain’s existing networks of teaching centres, linked via the National Health System’s digital-health platform, could contribute to and benefit from such repositories. The multilingual nature of our patient population means that improvements made here would generalise to other linguistically diverse settings, reinforcing the GI-AI4H principle of mutual benefit.
Rodríguez and Lussier’s appeal to seize a moonshot moment need not encourage grandiose claims. Rather, it challenges our community to articulate plain, evidence-based requirements for AI that supports, not supplants, family medicine. The GI-AI4H framework, by integrating ethical safeguards, proportionate regulation, context-sensitive implementation, and long-term operational stewardship, provides the institutional architecture within which those requirements can be met. Ambient AI scribes that truly reduce documentation burden; multilingual note generators that respect linguistic diversity; and decision-support tools that reconcile specialist advice without adding new clicks—all are technically feasible, but their real-world value depends on the governance choices we make now.
We therefore close with a pragmatic invitation: let GI-AI4H become one of the main forums where primary-care clinicians, informaticians, regulators, and patient advocates co-design evaluation protocols that reflect the day-to-day realities described in the Annals workshop. If we succeed, the result will not be a singular “moonshot” but a series of incremental, verifiable gains, each one making it a little easier to focus our attention where it belongs: on the patient in front of us.
REFERENCES:
1. Rodríguez JE, Lussier Y. The AI Moonshot: What We Need and What We Do Not. Ann Fam Med. 2025 Jan 1;23(1):7.
2. Muralidharan V, Ng MY, AlSalamah S, Pujari S, Kalra K, Singh R, et al. Global Initiative on AI for Health (GI-AI4H): strategic priorities advancing governance across the United Nations. Npj Digit Med. 2025 Apr 23;8(1):1–4.
3. Blaseg E, Huffstetler A. Artificial Intelligence Scribes Shape Health Care Delivery. [cited 2025 May 16]; Available from: https://www.aafp.org/pubs/afp/issues/2025/0400/graham-center-artificial-...
4. Tawfik D, Bayati M, Liu J, Nguyen L, Sinha A, Kannampallil T, et al. Predicting Primary Care Physician Burnout From Electronic Health Record Use Measures. Mayo Clin Proc. 2024 Sep 1;99(9):1411–21.
5. Arndt BG, Micek MA, Rule A, Shafer CM, Baltus JJ, Sinsky CA. More Tethered to the EHR: EHR Workload Trends Among Academic Primary Care Physicians, 2019-2023. Ann Fam Med. 2024 Jan 1;22(1):12–8.
6. Shah SJ, Crowell T, Jeong Y, Devon-Sand A, Smith M, Yang B, et al. Physician Perspectives on Ambient AI Scribes. JAMA Netw Open. 2025 Mar 24;8(3):e251904.
7. Mafi JN. A Randomized Controlled Trial of Two Ambient Artificial Intelligence Scribe Technologies to Improve Documentation Efficiency and Reduce Physician Burnout [Internet]. University of California, Los Angeles; 2025. Available from: https://ichgcp.net/clinical-trials-registry/NCT06792890
8. Tenajas R, Miraut D. The 24 Big Challenges of Artificial Intelligence Adoption in Healthcare: Review Article. Acta Medica Ruha. 2023 Sep 20;1(3):432–67.
9. Ramprasad S, Ferracane E, Selvaraj SP. Generating more faithful and consistent SOAP notes using attribute-specific parameters. In: Proceedings of the 8th Machine Learning for Healthcare Conference [Internet]. PMLR; 2023 [cited 2025 May 18]. p. 631–49. Available from: https://proceedings.mlr.press/v219/ramprasad23a.html
10. Tenajas R, Miraut D. The Hidden Risk of AI Hallucinations in Medical Practice. Ann Fam Med. 2025 Jan 19;23(1):eLetter.
11. Tenajas R, Miraut D. The Risks of Artificial Intelligence in Academic Medical Writing. Ann Fam Med. 2024 Feb 10;(2023):eLetter.
12. Tenajas-Cobo R, Miraut-Andrés D. Riesgos en el uso de Grandes Modelos de Lenguaje para la revisión bibliográfica en Medicina. Investig En Educ Médica. 2024 Jan 9;13(49):141.
13. Tenajas R, Miraut D. El pulso de la Inteligencia Artificial y la alfabetización digital en Medicina: Nuevas herramientas, viejos desafíos. Rev Medica Hered. 2023 Oct;34(4):232–3.
14. World Health Organization. Ethics and governance of artificial intelligence for health: large multi-modal models. WHO guidance [Internet]. World Health Organization; 2024. 98 p. Available from: https://www.who.int/publications/i/item/9789240029200
15. Tenajas-Cobo R, Miraut-Andrés D. El futuro del médico en la era de la inteligencia artificial. Investig En Educ Médica. 13(49):138–9.
16. Tuzzio L, O’Meara ES, Holden E, Parchman ML, Ralston JD, Powell JA, et al. Barriers to Implementing Cardiovascular Risk Calculation in Primary Care: Alignment With the Consolidated Framework for Implementation Research. Am J Prev Med. 2021 Feb 1;60(2):250–7.
17. International Medical Device Regulators Forum, Artificial Intelligence/Machine Learning-enabled Working Group. Good Machine Learning Practice for Medical Device Development: Guiding Principles [Internet]. International Medical Device Regulators Forum (IMDRF); 2024 Jun. Report No.: IMDRF/AIWG/N73 DRAFT:2024. Available from: https://www.fda.gov/medical-devices/software-medical-device-samd/good-ma...
18. Tenajas R, Miraut D. Learning from the Amazon to Build Resilient Rural Health Systems. Ann Fam Med. 2025 May 18;22(1):eLetter.
19. Leong HY, Gao YF, Ji S, Kalaycioglu B, Pamuksuz U. A GEN AI Framework for Medical Note Generation [Internet]. arXiv; 2024 [cited 2025 May 18]. Available from: http://arxiv.org/abs/2410.01841
20. Abdulazeem HM, Meckawy R, Schwarz S, Novillo-Ortiz D, Klug SJ. Knowledge, attitude, and practice of primary care physicians toward clinical AI-assisted digital health technologies: Systematic review and meta-analysis. Int J Med Inf. 2025 Sep 1;201:105945.
21. Tenajas R, Miraut D, Illana CI, Alonso-Gonzalez R, Arias-Valcayo F, Herraiz JL. Recent Advances in Artificial Intelligence-Assisted Ultrasound Scanning. Appl Sci. 2023 Jan;13(6):3693.
22. Wen LS. Opinion | This technology is becoming beloved by doctors and patients alike. The Washington Post [Internet]. 2025 Mar 25 [cited 2025 May 18]; Available from: https://www.washingtonpost.com/opinions/2025/03/25/ambient-ai-health-car...
23. Dellino M, Cerbone M, d’Amati A, Bochicchio M, Laganà AS, Etrusco A, et al. Artificial Intelligence in Cervical Cancer Screening: Opportunities and Challenges. AI. 2024 Dec;5(4):2984–3000.
24. Yu L, Zhai X. Use of artificial intelligence to address health disparities in low- and middle-income countries: a thematic analysis of ethical issues. Public Health. 2024 Sep 1;234:77–83.
25. Bratan T, Heyen NB, Hüsing B, Marscheider-Weidemann F, Thomann J. Hypotheses on environmental impacts of AI use in healthcare. J Clim Change Health. 2024 Mar 1;16:100299.