Dear Editor,
As a family physician and an independent researcher based in Spain, we read with great interest the recent article by Mazur et al. examining an AI-based voice biomarker tool to detect moderate to severe depression (1). We would like to share our perspective on the applicability of these findings to Spanish primary care and, more broadly, to settings where family physicians frequently encounter individuals presenting with varied symptoms that can mask or hint at depressive disorders. In particular, we aim to highlight key considerations regarding methodologic validity, the broader clinical implications, and ethical aspects of deploying voice-based tools for early disease detection in our region.
The authors’ cross-sectional evaluation demonstrates a reasonable balance of sensitivity and specificity (71.3% and 73.5%, respectively) when comparing the AI-generated voice biomarker analysis to the PHQ-9 at a cutoff score of 10. We recognize that these figures approximate ranges typically reported for classical psychiatric screening tools (2). In our clinical practice, the PHQ-9 has proven an efficient method for the routine identification of depression, as it is straightforward to use and interpret. Voice biomarkers may further ease the process by incorporating natural speech into the screening workflow, thereby offering a noninvasive and potentially less time-consuming option for both clinicians and patients. Nonetheless, the authors rightly underscore the importance of addressing the potential for false positives and false negatives: a missed case of depression may hamper early referral to mental health services, while an overestimation could cause unwarranted anxiety in patients and increase clinical workload.
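To make the concern about false positives and false negatives concrete, the reported sensitivity and specificity can be converted into predictive values with Bayes' rule. The following is a minimal sketch, not an analysis from Mazur et al.: the prevalence values are hypothetical assumptions chosen only for illustration, and the function name `predictive_values` is our own.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a screening test at a given prevalence."""
    # Expected proportions of the screened population in each cell
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # P(depressed | positive result)
    npv = true_neg / (true_neg + false_neg)  # P(not depressed | negative result)
    return ppv, npv


# Sensitivity and specificity as cited in this letter (PHQ-9 cutoff of 10)
SENSITIVITY = 0.713
SPECIFICITY = 0.735

# Hypothetical prevalence values, assumed here for illustration only
for prevalence in (0.05, 0.10, 0.20):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under an assumed prevalence of 10%, for example, only about 23% of positive results would correspond to true cases, while about 96% of negative results would be correct. This is why confirmatory clinical assessment after a positive screen remains essential.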
One of the most compelling arguments in favor of voice-based screening is its potential to extend beyond depression. Studies examining speech analysis in neurocognitive disorders have reported that subtle changes in vocal features may appear before the emergence of overt cognitive decline, particularly in early Alzheimer’s disease (3). At research centers such as the Barcelonaβeta Brain Research Center (4,5) and Ace Alzheimer Center de Barcelona (6), the exploration of speech and linguistic patterns stands as a promising frontier for detecting presymptomatic stages of dementia. Our own clinical experience in Spanish primary care reaffirms the urgency of improving early detection strategies for Alzheimer’s and other neurodegenerative conditions: many such disorders often remain masked until clinical manifestations become pronounced, at which point the window for effective intervention has narrowed substantially.
Beyond dementias, voice analysis has also been explored in Parkinson’s disease (7). While our main interest pertains to depression detection, the notion that the same or similar technologies could be employed across a spectrum of neurologic and psychiatric conditions is attractive from a health systems perspective. In Spain, as in many other countries with a strong primary care network, integrating a single versatile and cost-effective tool into daily practice may help general practitioners recognize subtle warning signs and accelerate specialist referral. The prospect of merging evidence-based questionnaires with AI-aided voice assessment is especially promising in areas plagued by physician shortages, where more efficient screening protocols could help clinicians prioritize the most at-risk patients.
At the same time, we must be cognizant of ethical, regulatory, and privacy considerations inherent in voice recording and data analysis, especially under Europe’s General Data Protection Regulation (GDPR) (8). Mazur et al. emphasize patient confidentiality and informed consent, areas which inevitably come to the forefront when considering widespread adoption in public health services. Maintaining patient trust is key, and that involves establishing robust data-management protocols, ensuring complete transparency about the use of recorded samples, and clearly communicating the scope and limitations of any algorithmic conclusions.
Another challenge lies in properly training and guiding clinicians. From our vantage point, family physicians would require structured education to interpret machine outputs. Clinical decision-making could become overly reliant on AI if practitioners do not remain actively engaged with the interpretation process. We therefore echo the authors’ assertion that voice biomarker tools should be positioned as adjunctive instruments, rather than replacements, for clinical interviews. As these technologies advance, it may be beneficial to develop formal training modules at the undergraduate or postgraduate level, as well as continuing education programs, to facilitate their safe and appropriate use in real-world primary care.
Furthermore, linguistic variability must be considered. Spain is home to several co-official languages, in addition to numerous regional dialects. If algorithms rely on features specific to Castilian Spanish or English speech, their performance might differ substantially in Catalan, Basque, or Galician contexts. We concur with the authors’ view that representative data sets are essential for machine learning models. Collecting robust speech samples from multiple linguistic groups—and ensuring that each group’s speech patterns are used to refine the algorithm—would help mitigate potential biases and inaccuracies in future iterations.
Finally, there is the matter of cost-effectiveness and implementation feasibility. While voice-based AI holds clear advantages, including minimal intrusion and lower operational costs once integrated, its true economic impact on health systems remains to be explored. However, considering that mental and neurodegenerative disorders pose a significant burden to individuals and healthcare budgets alike, investing in tools that detect these conditions earlier could reduce costly interventions and improve quality of life in the long term (9). Indeed, as Topol has argued, the future of “high-performance medicine” likely involves convergent technologies—where human judgment is supported by artificial intelligence, resulting in more accurate diagnoses and personalized care (10).
We applaud Mazur et al. for taking an important step toward validating an AI-based voice biomarker within a large sample. Their findings can inform both Spanish and international efforts aimed at early depression detection. However, additional studies are needed to determine whether voice-based technology might prove similarly reliable across diverse languages, to understand better how best to integrate it into primary care workflows without undermining the patient–physician relationship, and to ensure we adhere to the highest ethical standards. Going forward, well-designed pilot programs in Spanish healthcare centers could offer valuable insights, including acceptance among practitioners and patients, logistical demands, and the interplay between AI-based screening and conventional clinical assessments.
In conclusion, the approach described by Mazur et al. exemplifies a growing interest in harnessing machine learning for clinical applications, and we see considerable promise in voice biomarker technology. We endorse the authors’ caution that such methodologies be validated rigorously and that they serve as a complement to, rather than a replacement for, comprehensive clinical evaluations. We believe this technology, if implemented responsibly, could be of genuine value for Spanish primary care in detecting a wide range of disorders spanning mental health and neurodegeneration. We look forward to the continued evolution of this technology and the collaborative international efforts that will be required to make it safe, equitable, and genuinely beneficial to patients.
REFERENCES
1. Mazur A, Costantino H, Tom P, Wilson MP, Thompson RG. Evaluation of an AI-Based Voice Biomarker Tool to Detect Signals Consistent With Moderate to Severe Depression. Ann Fam Med. 2025 Jan 1;23(1):60–5.
2. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606–13.
3. Fraser KC, Meltzer JA, Rudzicz F, Garrard P. Linguistic Features Identify Alzheimer’s Disease in Narrative Speech. J Alzheimer’s Dis. 2016 Jan 1;49(2):407–22.
4. Cattaneo G, Bartrés-Faz D, Morris TP, Sánchez JS, Macià D, Tormos JM, et al. The Barcelona Brain Health Initiative: Cohort description and first follow-up. PLOS ONE. 2020 Feb 11;15(2):e0228754.
5. Cattaneo G, Bartrés-Faz D, Morris TP, Sánchez JS, Macià D, Tarrero C, et al. The Barcelona Brain Health Initiative: A Cohort Study to Define and Promote Determinants of Brain Health. Front Aging Neurosci [Internet]. 2018 Oct 11 [cited 2025 Feb 28];10. Available from: https://www.frontiersin.org/journals/aging-neuroscience/articles/10.3389...
6. García-Gutiérrez F, Marquié M, Muñoz N, Alegret M, Cano A, de Rojas I, et al. Harnessing acoustic speech parameters to decipher amyloid status in individuals with mild cognitive impairment. Front Neurosci [Internet]. 2023 Sep 7 [cited 2025 Feb 28];17. Available from: https://www.frontiersin.org/journals/neuroscience/articles/10.3389/fnins...
7. Little M, McSharry P, Roberts S, Costello D, Moroz I. Exploiting nonlinear recurrence and fractal scaling properties for voice disorder detection. Nat Preced. 2007;1–1.
8. Regulation (EU) 2016/679 of the European Parliament and of the Council (General Data Protection Regulation). Off J Eur Union. 2016.
9. Cuijpers P, Beekman ATF, Reynolds CF. Preventing Depression: A Global Priority. JAMA. 2012 Mar 14;307(10):1033–4.
10. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56.