Abstract
PURPOSE The US Preventive Services Task Force recommends screening for depression in the general adult population. Although screening questionnaires for depression and anxiety exist in primary care settings, electronic health tools such as computerized adaptive tests based on item response theory can advance screening practices. This study evaluated the validity of the Computerized Adaptive Test for Mental Health (CAT-MH) for screening for major depressive disorder (MDD) and assessing MDD and anxiety severity among adult primary care patients.
METHODS We approached 402 English-speaking adults for participation from a primary care clinic, of whom 271 adults (71% female, 65% black) participated. Participants completed modules from the CAT-MH (Computerized Adaptive Diagnostic Test for MDD, CAT–Depression Inventory, CAT–Anxiety Inventory); brief paper questionnaires (9-item Patient Health Questionnaire [PHQ-9], 2-item Patient Health Questionnaire [PHQ-2], Generalized Anxiety Disorder 7-item Scale [GAD-7]); and a reference-standard interview, the Structured Clinical Interview for DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition) Diagnoses.
RESULTS On the basis of the interview, 31 participants met criteria for MDD and 29 met criteria for generalized anxiety disorder (GAD). The diagnostic accuracy of the Computerized Adaptive Diagnostic Test for MDD (area under curve [AUC] = 0.85) was similar to that of the PHQ-9 (AUC = 0.84) and higher than that of the PHQ-2 (AUC = 0.76) for MDD screening. Using the interview as the reference standard, the accuracy of the CAT–Anxiety Inventory (AUC = 0.93) was similar to that of the GAD-7 (AUC = 0.97) for assessing anxiety severity. The patient-preferred screening method was assessment via tablet/computer with audio.
CONCLUSIONS Computerized adaptive testing could be a valid and efficient patient-centered screening strategy for depression and anxiety screening in primary care settings.
- screening
- depression
- anxiety
- mental health
- symptom assessment
- surveys and questionnaires
- health informatics
- electronic health records
- vulnerable populations
- primary care
- practice-based research
INTRODUCTION
Major depressive disorder (MDD) and generalized anxiety disorder (GAD) affect nearly 10% of adults1 and are largely managed in primary care settings.2,3 However, at least one-half of patients with depression in primary care are not recognized or adequately treated.4–7 Adequately treating MDD and GAD is imperative, given the adverse health outcomes and high health care costs that follow when these conditions go untreated.8–13
A crucial first step to improving depression and anxiety outcomes is adequate screening.14 The most commonly used screening tools in primary care are paper based and have a limited number of predetermined questions.15–18 However, nearly 90% of US primary care physicians have electronic health records (EHRs),19 presenting the opportunity to leverage electronic tools for screening.
Computerized adaptive tests (CATs) are electronic tools that create personalized assessments by adaptively varying the questions administered based on patient responses to previous questions. By design, CATs minimize measurement uncertainty and have greater precision than traditional self-report assessments. Several CATs for depression and anxiety have been developed,20–37 including the Computerized Adaptive Test for Mental Health (CAT-MH). The CAT-MH comprises a suite of assessments, including ones for MDD screening,38 MDD severity,39,40 and anxiety severity.41 It was developed using multidimensional item response theory and random forests to capture the multidimensional nature of psychological disorders.27
The CAT-MH has been validated for adults presenting for outpatient psychiatric treatment,38,39,41,42 but has yet to be validated among adult primary care populations. Because the prevalence of depression and anxiety may be comparatively lower among the latter, different questions in the item bank may be more appropriate; thus, validation in this population is warranted. Also, the use of CATs, based on multidimensional item response theory, has great potential in primary care to increase the efficiency of assessing mental and physical health. We therefore evaluated the validity of the CAT-MH for MDD screening and for assessing depression and anxiety severity in adult primary care patients.
METHODS
Participants
Participants were adults aged 18 years or older presenting to the internal medicine clinic in an urban, academic medical center. Individuals were eligible if they spoke English, they could see and hear study directions, they screened negative for dementia,43,44 and their physician assented to approaching them for recruitment.
Procedure
This study was approved by the medical center’s institutional review board and monitored by a data safety monitoring board. Patients were approached for participation while waiting to meet with their physician. All participants provided informed consent. Study activities took place in private clinic rooms with only the participant and assessor present.
We rotated the order of assessments (ie, interview, CAT-MH, brief questionnaires) weekly to reduce bias due to testing order. If a participant expressed active suicidal or homicidal ideation, plan, or intent, the assessor notified the patient’s physician and the study principal investigator (N.L.). Safety assessments were conducted, with follow-up as necessary. At the end of the study, participants received a referral list of mental health resources and a $10 gift card.
Measures
Demographics
Participants reported their age, sex, race, ethnicity, education, income level, and medical history.
Clinical Interview
Trained assessors (A.K.G. and A.M.) administered the Structured Clinical Interview for DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition), Research Version (SCID), a semistructured interview, to assess for MDD and GAD.45 The SCID is generally considered the reference-standard psychiatric assessment and has been used in past validation studies.42,46 Interviewers were not aware of the results of the other assessments.
Computerized Adaptive Test for Mental Health
Participants completed the CAT-MH without assistance using a tablet computer. They could both read and listen to the questions, and used the tablet touchscreen to provide responses; headphones were offered for privacy. The CAT-MH was delivered via a secure, Health Insurance Portability and Accountability Act (HIPAA)-compliant server.
The Computerized Adaptive Diagnostic Test for MDD (CAD-MDD) was administered to screen for MDD,38 and the CAT-Depression Inventory (CAT-DI) was administered to assess depression severity among patients who screened positive for MDD on the former.39 The CAD-MDD and CAT-DI select questions from an item bank of 389 possible questions. Questions are adaptively administered until a precise symptom severity estimate is achieved. The first question is selected randomly from the middle of the severity range; additional items are selected based on their information content conditional on the current severity score determined by the items already administered. CAT-DI scores range from 0 to 100 and are grouped as normal (less than 50), mild symptoms (50 to 65), moderate symptoms (66 to 75), and severe symptoms (higher than 75).39
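The adaptive loop described above can be illustrated with a minimal sketch. It assumes a unidimensional two-parameter logistic (2PL) model with yes/no items, which is a deliberate simplification: the CAT-MH is based on multidimensional item response theory, and its item parameters, scoring algorithm, and stopping rule are not reproduced here. The item bank, parameter values, stopping threshold, and function names are all hypothetical, and the sketch starts at the item most informative at the prior mean rather than drawing the first item at random.

```python
import math

def prob_endorse(theta, a, b):
    """2PL probability of endorsing an item at severity theta (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * p * (1 - p)."""
    p = prob_endorse(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate_theta(responses, grid):
    """Posterior mean and SD of theta on a grid, with a standard normal prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior
        for (a, b), endorsed in responses:
            p = prob_endorse(t, a, b)
            w *= p if endorsed else (1.0 - p)
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target=0.4, max_items=10):
    """Administer items until the posterior SD is at or below se_target."""
    grid = [i / 10.0 for i in range(-40, 41)]
    remaining = dict(bank)
    responses = []
    theta, se = 0.0, 1.0  # start at the prior mean and prior SD
    while remaining and len(responses) < max_items and se > se_target:
        # Select the unused item with the most information at the current estimate.
        name = max(remaining, key=lambda k: item_information(theta, *remaining[k]))
        params = remaining.pop(name)
        responses.append((params, answer(name)))
        theta, se = estimate_theta(responses, grid)
    return theta, se, len(responses)

# Hypothetical 9-item bank: name -> (discrimination a, severity location b).
BANK = {f"item{i}": (1.5, -2.0 + 0.5 * i) for i in range(9)}

# Hypothetical respondent who endorses only items located below 0.
theta, se, n_items = run_cat(BANK, answer=lambda name: BANK[name][1] < 0.0)
```

Selecting each item by its information at the current estimate is what lets an adaptive test stop after a handful of questions drawn from a much larger bank.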
The CAT-Anxiety Inventory (CAT-ANX) assesses anxiety severity based on 431 possible questions. Scores range from 0 to 100 and are grouped as normal (less than 35), mild symptoms (35 to 50), moderate symptoms (51 to 65), and severe symptoms (higher than 65).41
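The severity bands for the CAT-DI and CAT-ANX can be written as simple lookups. This is a sketch using only the cut points stated above; the function names are ours, and the handling of non-integer scores that fall between two stated bands (for example, between 65 and 66 on the CAT-DI) is an assumption.

```python
def cat_di_severity(score):
    """Map a CAT-DI score (0 to 100) to the severity bands given in the text."""
    if not 0 <= score <= 100:
        raise ValueError("CAT-DI scores range from 0 to 100")
    if score < 50:
        return "normal"
    if score <= 65:
        return "mild"
    if score <= 75:  # assumption: scores between 65 and 66 count as moderate
        return "moderate"
    return "severe"

def cat_anx_severity(score):
    """Map a CAT-ANX score (0 to 100) to the severity bands given in the text."""
    if not 0 <= score <= 100:
        raise ValueError("CAT-ANX scores range from 0 to 100")
    if score < 35:
        return "normal"
    if score <= 50:
        return "mild"
    if score <= 65:  # assumption: scores between 50 and 51 count as moderate
        return "moderate"
    return "severe"
```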
The CAT-MH is distributed by Adaptive Testing Technologies, of which author R.D.G. is a founder. He contributed to the study design and writing, but was not responsible for data acquisition or analysis. Information regarding use of the CAT-MH is available from the company (https://adaptivetestingtechnologies.com/).
Brief Questionnaires
Participants self-administered brief questionnaires using paper and pen.
The 9-item Patient Health Questionnaire (PHQ-9) assesses MDD symptoms over the past 2 weeks and has been validated for screening and severity assessment in primary care.18,47,48 Scores range from 0 to 27; scores of 10 and higher indicate likely MDD.18 Scores are grouped as normal or minimal symptoms (0 to 4), mild symptoms (5 to 9), moderate symptoms (10 to 14), moderately severe symptoms (15 to 19), and severe symptoms (20 and higher).18
The 2-item Patient Health Questionnaire (PHQ-2) contains the first 2 questions of the PHQ-9, which assess depressed mood and anhedonia over the past 2 weeks.17,47 The PHQ-2 has been validated in primary care, with a sensitivity of 0.61 and specificity of 0.92.17,49 Scores range from 0 to 6, and a score of 3 or higher is a common cutoff for indicating likely MDD.
The Generalized Anxiety Disorder 7-item Scale (GAD-7) assesses GAD symptoms, with good internal and test-retest reliability for detecting GAD in primary care.16 Scores range from 0 to 21; scores of 10 and higher indicate likely GAD.16 Scores are grouped as mild symptoms (5 to 9), moderate symptoms (10 to 14), and severe symptoms (15 and higher).16
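The scoring rules for the three brief questionnaires can be sketched as follows. Each item is scored 0 to 3, which yields the total score ranges given above. The function names and the returned dictionary layout are ours, and the label for GAD-7 totals below 5 is a placeholder because the text does not name that band.

```python
def _total(items, n_items):
    """Sum item scores after checking the count and the 0-to-3 item range."""
    if len(items) != n_items or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError(f"expected {n_items} items scored 0 to 3")
    return sum(items)

def score_phq(items):
    """Score the PHQ-9 and the PHQ-2 (its first 2 items) from 9 item responses."""
    phq9 = _total(items, 9)
    phq2 = items[0] + items[1]
    if phq9 <= 4:
        severity = "normal or minimal"
    elif phq9 <= 9:
        severity = "mild"
    elif phq9 <= 14:
        severity = "moderate"
    elif phq9 <= 19:
        severity = "moderately severe"
    else:
        severity = "severe"
    return {
        "phq9": phq9,
        "phq9_severity": severity,
        "phq9_positive": phq9 >= 10,  # likely MDD
        "phq2": phq2,
        "phq2_positive": phq2 >= 3,   # common cutoff for likely MDD
    }

def score_gad7(items):
    """Score the GAD-7 from 7 item responses."""
    total = _total(items, 7)
    if total <= 4:
        severity = "below mild range"  # placeholder label, not named in the text
    elif total <= 9:
        severity = "mild"
    elif total <= 14:
        severity = "moderate"
    else:
        severity = "severe"
    return {"gad7": total, "gad7_severity": severity, "gad7_positive": total >= 10}
```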
We used a self-administered paper questionnaire to assess participant preference for screening delivery method. Participants were asked, “When answering questions about your mood, what format did you like best?” and “What format did you like least?” Response options to both questions were online (CAT-MH; on the tablet/computer), interview (SCID; with an assessor), and paper and pencil (questionnaires).
Statistical Analysis
To assess performance of the CAT-MH compared with that of the brief questionnaires, we calculated that 270 participants were needed to estimate an expected area under the curve (AUC) of 0.90 with a 5% margin of error.50
We performed descriptive analyses, including tests of associations between SCID diagnoses and participant self-reported histories of MDD and GAD. The diagnostic performance of the CAT-MH and questionnaires was compared with that of the SCID using receiver operating characteristic curve analysis.51 Agreement between the CAT-MH and brief questionnaires for MDD and anxiety severity was assessed using κ statistics.52 As κ statistics require an equal number of categories between variables, we calculated 2 κ values for MDD severity, by collapsing the fourth PHQ-9 severity category (moderately severe) into the third (moderate) or fifth (severe) category. These groupings were selected to merge clinically similar categories (eg, moderately severe and severe), not based on an empirical distribution of the responses.
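The category collapsing and the κ calculation can be illustrated with a short sketch. It assumes unweighted Cohen's κ, because the text does not state whether a weighted form was used, and it is written in Python rather than SPSS. The six paired ratings are made up for illustration and are not study data.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa: (observed - expected) / (1 - expected)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the marginal frequencies of each category.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1.0 - expected)  # undefined if expected == 1

def collapse_phq9(levels, into):
    """Merge 'moderately severe' into 'moderate' or 'severe' to get 4 categories."""
    if into not in ("moderate", "severe"):
        raise ValueError("into must be 'moderate' or 'severe'")
    return [into if lv == "moderately severe" else lv for lv in levels]

# Made-up severity levels for 6 hypothetical participants.
cat_di = ["normal", "normal", "mild", "moderate", "severe", "severe"]
phq9 = ["normal", "mild", "mild", "moderately severe", "moderately severe", "severe"]

kappa_with_moderate = cohen_kappa(cat_di, collapse_phq9(phq9, "moderate"))
kappa_with_severe = cohen_kappa(cat_di, collapse_phq9(phq9, "severe"))
```

Computing κ under both groupings shows how sensitive the agreement estimate is to where the extra PHQ-9 category is placed.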
We used logistic regression models to compare anxiety severity scores with SCID diagnoses and generate predicted probabilities of GAD for specific CAT-ANX scores. The Kruskal-Wallis H test and χ2 test were used to assess associations between patients’ preferred screening delivery method and demographics.
Analyses were conducted using SPSS version 22 (IBM Corp). Statistical significance was set at a 2-sided P value <.05.
RESULTS
Participant Characteristics
Figure 1 presents the study flow diagram. Of 402 patients approached, 271 (67%) completed the study assessments. Table 1 shows their sociodemographic and clinical characteristics, which reflect those of the overall clinic population.
On the basis of the SCID, 31 participants met criteria for MDD and 29 met criteria for GAD. SCID-diagnosed MDD was associated with self-reported depression (odds ratio [OR] = 19.2; 95% CI, 7.8-46.7) and anxiety (OR = 14.3; 95% CI, 6.2-33.2). SCID-diagnosed GAD was associated with self-reported depression (OR = 9.1; 95% CI, 3.9-21.2) and anxiety (OR = 13.6; 95% CI, 5.7-32.5).
Screening for MDD
The CAD-MDD identified 42 participants as screening positive for likely MDD, whereas the PHQ-9 identified 37 participants. The CAD-MDD administered 4.2 (SD 0.5) questions on average (range, 4 to 6), and the median time to completion was 42 seconds (interquartile range [IQR] = 34 to 60).
With the SCID as the reference, accuracy of the CAD-MDD (AUC = 0.85) was similar to that of the PHQ-9 (AUC = 0.84) and higher than that of the PHQ-2 (AUC = 0.76) (Table 2). Agreement for MDD screening between the CAD-MDD and PHQ-9 (κ = 0.66 ± 0.07; P <.001) was higher than agreement between the CAD-MDD and PHQ-2 (κ = 0.45 ± 0.08; P <.001).
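For readers less familiar with these metrics, the AUC equals the probability that a randomly chosen participant with the diagnosis scores higher on the screener than a randomly chosen participant without it, with ties counted as one-half. The sketch below computes that quantity directly, along with sensitivity and specificity at a cutoff. The scores and diagnoses are made up for illustration, and the function names are ours.

```python
def auc(scores, diagnoses):
    """Area under the ROC curve as the proportion of correctly ordered pairs."""
    pos = [s for s, d in zip(scores, diagnoses) if d]
    neg = [s for s, d in zip(scores, diagnoses) if not d]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, diagnoses, cutoff):
    """Sensitivity and specificity when scores >= cutoff count as screen positive."""
    tp = sum(1 for s, d in zip(scores, diagnoses) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, diagnoses) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, diagnoses) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, diagnoses) if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Made-up screener totals and reference-standard diagnoses (1 = meets criteria).
example_scores = [2, 4, 7, 9, 11, 12, 15, 20]
example_diagnoses = [0, 0, 0, 1, 0, 1, 1, 1]

example_auc = auc(example_scores, example_diagnoses)
example_sens, example_spec = sensitivity_specificity(example_scores, example_diagnoses, cutoff=10)
```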
Assessing Depression Severity
The CAT-DI administered 7.6 (SD 1.9) questions on average (range, 5 to 15), and the median time to completion was 71 seconds (IQR = 52 to 93). CAT-DI scores strongly correlated with PHQ-9 scores (r = 0.76; P <.001). The CAT-DI and PHQ-9 severity levels had fair agreement, regardless of whether participants classified as having moderately severe symptoms by the PHQ-9 were grouped with those having moderate symptoms (κ = 0.26 ± 0.09; P = .001) or with those having severe symptoms (κ = 0.22 ± 0.09; P = .008).
Assessing Anxiety Severity
The CAT-ANX administered 11.8 (SD 4.1) questions on average (range, 5 to 22), and the median time to completion was 94 seconds (IQR = 67 to 150). Compared with the SCID, the CAT-ANX and the GAD-7 performed similarly well (AUC = 0.93 and 0.97, respectively) (Table 2). Participants’ odds of SCID-diagnosed GAD increased with each 1-unit increase in CAT-ANX score (measured on a 100-point scale) (OR = 1.10; 95% CI, 1.07-1.13) and each 1-category increase in severity (normal, mild, moderate, severe) (OR = 6.4; 95% CI, 3.7-10.9). The CAT-ANX scores correlated with GAD-7 scores (ρ = 0.74; P <.001). There was fair agreement between severity classifications on the CAT-ANX and GAD-7 (κ = 0.40 ± 0.06; P <.001). Figure 2 shows the probability that a patient would meet criteria for GAD based on the SCID given various CAT-ANX scores, as predicted by logistic regression analysis.
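The odds ratios above are per 1-unit and per 1-category changes, so larger changes compound multiplicatively under the logistic model. The sketch below shows that arithmetic. It also shows how a predicted probability would be obtained from a fitted model, but the intercept was not reported, so the value used here is a placeholder chosen only to make the example run; it does not reproduce Figure 2.

```python
import math

def odds_ratio_for_change(or_per_unit, delta):
    """Odds ratio for a change of `delta` units, given the per-unit odds ratio."""
    return or_per_unit ** delta

def predicted_probability(intercept, or_per_unit, score):
    """Logistic model probability: 1 / (1 + exp(-(intercept + ln(OR) * score)))."""
    log_odds = intercept + math.log(or_per_unit) * score
    return 1.0 / (1.0 + math.exp(-log_odds))

# Reported per-unit OR of 1.10: a 10-point higher CAT-ANX score
# multiplies the odds of GAD by about 2.6.
or_10_points = odds_ratio_for_change(1.10, 10)

# Reported per-category OR of 6.4: moving up 2 severity categories
# multiplies the odds by about 41, assuming the effect is the same at each step.
or_2_categories = odds_ratio_for_change(6.4, 2)

# Placeholder intercept (hypothetical, not from the study).
HYPOTHETICAL_INTERCEPT = -8.0
p_at_40 = predicted_probability(HYPOTHETICAL_INTERCEPT, 1.10, 40)
p_at_60 = predicted_probability(HYPOTHETICAL_INTERCEPT, 1.10, 60)
```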
Preferred Screening Delivery Method
Participants preferred using the tablet computer most often (53%), followed by the interview (33%), and, lastly, paper-and-pencil questionnaires (14%) (Supplemental Figure 1A, available at http://www.annfammed.org/content/17/1/23/suppl/DC1/). The majority of participants (64%) rated paper-and-pencil questionnaires as their least preferred screening method (Supplemental Figure 1B, available at http://www.annfammed.org/content/17/1/23/suppl/DC1/).
Preference for screening by tablet was more common among black individuals than among nonblack individuals (χ2 = 7.8; P = .02). There was no association between preferred screening method and age, sex, education, income, self-reported depression or anxiety, or SCID-diagnosed depression or anxiety.
DISCUSSION
To improve depression and anxiety detection and management in primary care, efficient and accurate screening tools are essential. We evaluated CATs among adult primary care patients and demonstrated that the CAT-MH is a valid instrument for screening for MDD and assessing depression and anxiety severity compared with reference-standard interviews. Also, the CAT-MH had higher accuracy than the commonly used PHQ-2 for depression screening.15,49 Participants preferred delivery by tablet computer over interview and paper-based questionnaires, highlighting the acceptability of this screening approach.
The CAD-MDD performance was comparable to that of the PHQ-9 for MDD screening and administered fewer questions on average (4 vs 9). The CAT-ANX performance also was comparable to that of the GAD-7 for assessing anxiety but required more questions (12 vs 7). The CAD-MDD outperformed the PHQ-2 for screening for MDD and required only 2 additional questions on average. As the CAD-MDD median completion time was 42 seconds, efficiency was not sacrificed. Compared with past CAD-MDD validation studies,38,42 this study showed lower sensitivity but higher specificity, which may be due to a lower prevalence of MDD in primary care than in psychiatric clinics. Our results also may differ because the clinic population has a much higher proportion of black individuals (65%) compared with past CAT-MH study populations (eg, 10%38 and 5%42). Future work to tailor the item bank questions, the algorithm, or both for primary care patients may improve the accuracy and efficiency of the CAT-MH.
CATs using multidimensional item-response theory and cloud-based assessments offer potential advantages over traditional written assessments for use in medicine. Ideally, the growing integration of EHRs in primary care19 could enable implementation of CATs in clinical practice.22,23,25,53–56 Patients’ test responses could be added to EHRs in real time and to searchable forms to automate development of disease-specific population registries. The online format can incorporate modules for additional mental health concerns (eg, suicidality57) and can be modified in real time. Using cloud-based assessments may enhance possibilities for patients to self-administer these tools, including outside of the clinic.58 Because clinicians can immediately access patients’ responses, physicians can monitor symptom changes without necessarily requiring in-person visits. Further, as the same questions on the CAT-MH are not repeatedly administered, patients can be routinely assessed in or out of the clinic without producing response bias due to repeated administration of the same questions using traditional instruments.40 The CAT-MH was developed using multidimensional item response theory, which permits measurement of complex traits such as depression and anxiety, and allows for much larger item banks than CATs based on unidimensional item response theory. These features offer advantages over other electronic tools that have been tested in primary care, such as the PsyScan e-tool,59 Patient-Reported Outcome Measure Information System (PROMIS) symptom measures (for which CATs are available),54,60,61 and the Adaptive Pediatric Symptom Checklist.62
The potential impact of health-related technologies for improving mental health assessment may be tempered by their limited integration in EHRs, however, as EHRs in many health care systems are not yet capable of integrating stand-alone programs. Also, in general, the impact of screening tools on mental health outcomes is limited because clinician assessment is necessary for diagnosis.
Limitations of this study should be noted. Our sample was disproportionately female, black, and well educated (with one-half having a college degree or higher). Replication in other populations and non-US samples is warranted and would allow for detailed analyses of severity scores. We recruited a small number of participants in accordance with our sample size calculation, which precludes analyzing findings among subpopulations. Lastly, we compared the CAT-MH with paper questionnaires, rather than with electronic questionnaires, because the former are commonly used in clinical practice. Participant preferences may have been influenced by delivery mode differences, however.
In conclusion, CATs could be a valid, highly efficient, and patient-centered approach for depression and anxiety screening and assessment in primary care patients. In this first validation study in primary care, the CAT-MH had similar diagnostic accuracy and correlated well with the PHQ-9 and GAD-7, and was delivered in a way that patients preferred.
Footnotes
* Both authors contributed equally to this work.
Conflicts of interest: Dr Gibbons is a founder of Adaptive Testing Technologies, the company that distributes the CAT-MH. This activity has been reviewed and approved by the University of Chicago. All other authors report none.
To read or post commentaries in response to this article, see it online at http://www.AnnFamMed.org/content/17/1/23.
Funding support: This work was supported by the National Institutes of Health (grants K23 DK097283, R01 MH100155, R01 MH66302, and F32 HD089586), the Agency for Healthcare Research & Quality (grant T32 HS000078), and an Innovation Award from the University of Chicago Medicine.
Previous presentations: Portions of this work have been presented at the Society for General Internal Medicine Annual Meeting; April 19-22, 2017; Washington, DC; and the Midwest Society for General Internal Medicine Regional Meeting; September 14-15, 2017; Chicago, Illinois.
Supplemental Materials: Available at http://www.AnnFamMed.org/content/17/1/23/suppl/DC1/.
- Received for publication December 7, 2017.
- Revision received August 21, 2018.
- Accepted for publication September 10, 2018.
- © 2019 Annals of Family Medicine, Inc.