Abstract
PURPOSE Over 95% of patients who screen positive on the Patient Health Questionnaire-9 (PHQ-9) suicide risk item do not attempt or die by suicide, which could lead to unnecessary treatment and/or misallocation of limited resources. The present study seeks to determine if suicide risk screening can be meaningfully improved to identify the highest-risk patients.
METHODS Patients eligible to receive medical treatment from the US Department of Defense medical system were recruited from 6 military primary care clinics located at 5 military installations around the United States. Patients completed self-report measures including the PHQ-9 and 16 items from the Suicide Cognitions Scale (SCS) during routine primary care clinic visits. Postbaseline suicidal behaviors (suicide attempts, interrupted attempts, and aborted attempts) were assessed by evaluators who were blind to screening results using the Self-Injurious Thoughts and Behaviors Interview.
RESULTS Among 2,744 patients, 13 (0.5%) engaged in suicidal behavior in the 30 days after screening and 28 (1.0%) displayed suicidal behavior in the 90 days after screening. Multiple SCS items differentiated patients with suicidal behavior less than 30 days after screening positive for suicide risk. Augmenting the PHQ-9 suicide risk item with SCS items improved the identification of patients who were most likely to have suicidal behavior within a month of screening positive without sacrificing sensitivity.
CONCLUSION Among primary care patients who screen positive for suicide risk on the PHQ-9, SCS items improved screening efficiency by identifying those patients who are most likely to engage in suicidal behavior within the next 30 days.
INTRODUCTION
From 1999 to 2017, the US suicide rate increased by more than 33%.1 Suicides have increased at an even faster rate among military personnel and veterans.2 Nearly one-half of suicide decedents visit primary care during the months immediately preceding their deaths.1,3,4 Expanded suicide risk screening in primary care may therefore improve suicide prevention efforts.3 At one time, the Joint Commission recommended universal screening of all patients for suicidal ideation, but the 2020 revision of The Joint Commission’s National Patient Safety Goals does not require universal screening. Other groups have also not recommended universal screening due to insufficient evidence regarding its potential benefits and harms.5-8
Where suicide risk screening is conducted, a common approach is to use the Patient Health Questionnaire-2 (PHQ-2),9 which assesses the frequency of depressed mood and anhedonia in the past 2 weeks. Patients who screen positive for depression on the PHQ-2 (ie, total score of 3 or higher) are then administered the remaining 7 items of the PHQ-9,10 which includes a single item (item 9) that asks about the frequency of “thoughts that you would be better off dead, or of hurting yourself in some way” during the past 2 weeks. This approach is supported by research showing that higher scores on item 9 correlate with increased risk for subsequent suicidal behavior, including suicide death,11,12 but findings are tempered by lengthy follow-up periods and low accuracy. For example, for each patient who screens positive on item 9 and then attempts suicide there are approximately 200 patients who screen positive but do not go on to attempt suicide.11,12
Screening results often drive clinical decision making; poor accuracy could therefore lead to a range of unintended effects, including misallocation of limited resources and initiation of unnecessary treatments, especially treatments that carry increased risk for adverse effects (eg, psychotropic medication or involuntary hospitalization). Improved methods for distinguishing which patients are most likely to attempt suicide in the near term could help primary care clinicians better identify those patients who are at higher risk and warrant more immediate attention.
This study was designed to determine if suicide risk screening can be meaningfully improved to identify high-risk patients. To this end, we evaluated 2 approaches to augmenting responses to PHQ-9 item 9 to assess suicide risk among primary care patients. First, we assessed if any of the other PHQ-9 items could improve upon the use of item 9 alone to identify the patients who were most likely to attempt suicide in the near term. Second, we evaluated the performance of 16 items extracted from the Suicide Cognitions Scale (SCS),13 a self-report questionnaire that asks patients to rate the extent to which they agree or disagree with a series of statements that are commonly endorsed by suicidal individuals. The SCS was selected for testing due to prior research showing that SCS item responses prospectively distinguish patients who attempt suicide from those with suicidal ideation only.13,14 We tested these 2 approaches to improving suicide risk screening with data collected in military primary care clinics. The setting is important because of the rapid rise in military suicides over the past 2 decades2 and because primary care and family practice clinics are the type of clinic most commonly visited during the 30 days preceding suicidal behavior.4
METHODS
The PRImary care Screening Methods (PRISM) study was a multisite, prospective cohort study of patients recruited from 6 military primary care clinics across the United States from July 2015 through August 2018. The sites were selected to represent a range of clinic types (eg, small community clinics to large medical centers) and 5 branches of the US military (Air Force, Army, Coast Guard, Marine Corps, Navy). This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines for cohort studies. Details about PRISM rationale and methods have been published.15 This study was approved by the Naval Health Research Center’s Institutional Review Board.
Participants
Participants were 2,744 primary care patients ranging from 18 to 89 years of age with a mean (SD) age of 40.4 (19.6) years. Patient characteristics are summarized in Table 1. Patients were included if they were aged 18 years or older, eligible to receive medical services from the Department of Defense, able to understand and read the English language, and able to complete the informed consent process. The only exclusion criterion was the presence of a medical or psychiatric condition that diminished capacity for providing informed consent (eg, acute intoxication, psychosis).
Procedures
Patients were recruited from waiting rooms in 6 primary care clinics located at 5 military installations across the United States, either before or after a routine medical visit. A trained research associate was located in the waiting room of each clinic at a table visible to all patients. Research associates invited patients seated in the waiting room to learn more about the study. Interested patients completed the informed consent process, after which they were provided with a WiFi-enabled computer tablet to complete a baseline self-report survey. After completing the survey, patients returned the tablet and were allowed to select a small token of appreciation for their participation (eg, t-shirt, $5 gift card to a local coffee shop). Participants were contacted 6 and 12 months postbaseline to complete a telephone-based interview assessing the occurrence of suicidal behaviors. Participants received a $50 electronic gift card for each completed follow-up interview.
Measures
Patient Health Questionnaire-9
The Patient Health Questionnaire-9 (PHQ-9) is an empirically supported self-report scale that assesses the frequency of depression symptoms during the preceding 2 weeks (0 = “not at all” to 3 = “nearly every day”).10 Item responses are summed to obtain an overall indicator of depressive symptom severity. Item 9 assesses the frequency of “thoughts that you would be better off dead, or of hurting yourself in some way” during the past 2 weeks. In the present study, nonzero responses on item 9 were considered positive for elevated suicide risk. Because we used item 9 as an independent variable, we summed item responses for the first 8 items only (PHQ-8) as a metric of depression symptoms (α = .90).
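The scoring rules described above can be illustrated with a minimal Python sketch. The function name `score_phq9` and the returned field names are hypothetical and were not part of the study procedures; the PHQ-2 cutoff of 3 or higher is the threshold described in the Introduction, and the sketch assumes that all 9 items were answered.

```python
from typing import Dict, Sequence, Union


def score_phq9(responses: Sequence[int]) -> Dict[str, Union[int, bool]]:
    """Score nine PHQ-9 item responses, each coded 0 to 3 (hypothetical helper)."""
    if len(responses) != 9 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected nine item responses coded 0-3")
    phq2_total = responses[0] + responses[1]
    return {
        # PHQ-2: first 2 items (depressed mood, anhedonia); 3 or higher is positive
        "phq2_total": phq2_total,
        "phq2_positive": phq2_total >= 3,
        # PHQ-8: first 8 items only, so item 9 can serve as a separate predictor
        "phq8_total": sum(responses[:8]),
        # Any nonzero response on item 9 counts as a positive suicide risk screen
        "item9_positive": responses[8] > 0,
    }


example = score_phq9([1, 2, 0, 0, 0, 0, 0, 0, 1])
```

In this illustrative call, the respondent screens positive on both the PHQ-2 and item 9, and the PHQ-8 total excludes the item 9 response.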
Suicide Cognitions Scale
The Suicide Cognitions Scale (SCS)13 is a self-report scale that assesses thoughts and perceptions commonly endorsed by suicidal patients. Respondents are asked to rate their agreement with each statement (1 = “strongly disagree” to 5 = “strongly agree”), and item responses are summed to obtain an overall indicator of suicide risk severity. Various combinations of SCS items have been shown to be correlated with suicidal ideation and future suicidal behaviors in outpatient and inpatient psychiatric samples.13,14,16-18 Because many suicidal patients underreport suicide-related thoughts due to concerns about the consequences of self-disclosure,19 we used only the 16 SCS items (out of 18 total) that did not include the word “suicide” (α = .97). We transformed SCS item responses from a 1-5 scale to a 0-4 scale for ease of interpretation, and summed the items to create a modified SCS total score.
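The rescaling step can be sketched as follows. The function name `modified_scs_total` is hypothetical, and the sketch assumes complete responses to the 16 retained items; it is an illustration of the transformation described above, not the study's actual scoring code.

```python
from typing import Sequence


def modified_scs_total(responses: Sequence[int]) -> int:
    """Shift 16 SCS responses from the 1-5 scale to 0-4 and sum them.

    Hypothetical helper: the resulting total ranges from 0 to 64.
    """
    if len(responses) != 16 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("expected sixteen item responses coded 1-5")
    # Subtracting 1 from each response maps "strongly disagree" to 0
    return sum(r - 1 for r in responses)


lowest = modified_scs_total([1] * 16)
highest = modified_scs_total([5] * 16)
```

After the shift, a respondent who strongly disagrees with every statement receives a total of 0 rather than 16, which is the interpretive convenience the transformation is meant to provide.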
Self-Injurious Thoughts and Behaviors Interview
The Self-Injurious Thoughts and Behaviors Interview is an empirically supported clinician-administered interview that assesses the occurrence and features of multiple suicide-related thoughts and behaviors.20,21 The interview asks about the occurrence of suicide attempts (“Have you made an actual attempt to kill yourself in which you had at least some intent to die?”), aborted suicide attempts (“Have you been close to killing yourself and at the last minute decide not to kill yourself?”), and interrupted suicide attempts (“Have you been very close to killing yourself and at the last minute someone or something else stopped you?”), consistent with the Self-Directed Violence Classification System.22 Positive endorsement of any of these behaviors was classified as suicidal behavior. Patients reporting suicidal behavior during follow-up interviews were subsequently asked to report the date on which the first behavior occurred. Timing of postbaseline suicidal behavior was determined by computing the number of days from enrollment to the first reported suicidal behavior.
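The timing computation can be sketched in Python as shown below. The function names are hypothetical, and treating the window boundary as inclusive (a behavior on day 30 counts as occurring within 30 days) is an assumption made for illustration; the source does not specify how boundary cases were handled.

```python
from datetime import date
from typing import Optional


def days_to_first_behavior(enrolled: date, first_behavior: Optional[date]) -> Optional[int]:
    """Days from enrollment to the first reported suicidal behavior, or None if none reported."""
    if first_behavior is None:
        return None
    return (first_behavior - enrolled).days


def within_window(days: Optional[int], window: int) -> bool:
    """True if a postbaseline behavior occurred within `window` days (inclusive; an assumption)."""
    return days is not None and 0 <= days <= window


# Illustrative dates only, not study data
elapsed = days_to_first_behavior(date(2016, 1, 1), date(2016, 2, 15))
in_30 = within_window(elapsed, 30)
in_90 = within_window(elapsed, 90)
```

With these illustrative dates, the behavior falls outside the 30-day window but inside the 90-day window.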
Data Analytic Approach
The primary outcome was suicidal behavior, which combined suicide attempts, aborted attempts, and interrupted attempts. Our power analysis for this outcome indicated that a sample size of 2,744 yields 80% power to detect a small effect size (odds ratio [OR] >1.6), assuming a 2-tailed α = .05 and an estimated 0.5% event rate during the first month postbaseline. Follow-up data were missing from 952 (34.7%) participants. Preliminary assessment of missing values suggested data were missing at random23 (see Supplemental Appendix for details, available at https://www.AnnFamMed.org/lookup/suppl/doi:10.1370/afm.2729/-/DC1). We therefore used multiple imputation with an expectation maximization algorithm, because this approach has been shown to yield unbiased results even with rates of missing data that exceed the rate observed in this study.24 Results of simulation studies have found that, for data sets with 30% missing data, 10 imputations yield a 2% loss in power as compared with 100 imputations.25 We therefore imputed 10 data sets to balance computational efficiency with preservation of statistical power at 78% for our main analyses. To assess for potential bias, we compared results based on pooled data from the imputed data sets to results based on complete data, and found that they did not meaningfully differ. Additional details about our missing data analysis and multiple imputation procedures are available in the Supplemental Appendix.
Several preliminary analyses were also conducted to describe the sample. Pearson’s correlation coefficients and χ2 tests of association were used to identify correlates of positive screens at baseline. Univariate and multivariate logistic regression models were used to identify correlates of suicidal behavior occurring within 30 days and 90 days postbaseline.
The primary aim of the project was to improve the identification of the highest-risk primary care patients who screened positive for suicide risk, ie, those who engaged in suicidal behavior within 30 and 90 days of screening positive for suicide risk on the PHQ-9. We therefore sought to maximize specificity, consistent with the recommendations of Kraemer.26 To accomplish this goal, we used receiver operating characteristic (ROC) analysis to compute the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) associated with each PHQ-9 and SCS item among patients who screened positive at baseline. All analyses were conducted using SPSS version 26 software (IBM Corp).
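For readers less familiar with these metrics, the following Python sketch shows how each is derived from a 2 × 2 table of screening result by outcome. The function name and the counts are hypothetical and are not study data; the analyses reported here were run in SPSS on multiply imputed data, so this sketch illustrates the definitions only.

```python
from typing import Dict


def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute screening metrics from true/false positive and negative counts."""
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("each margin of the 2 x 2 table must be nonzero")
    return {
        "sensitivity": tp / (tp + fn),  # share of patients with the outcome who screened positive
        "specificity": tn / (tn + fp),  # share of patients without the outcome who screened negative
        "ppv": tp / (tp + fp),          # share of positive screens followed by the outcome
        "npv": tn / (tn + fn),          # share of negative screens not followed by the outcome
    }


# Hypothetical counts chosen for illustration only
metrics = screening_metrics(tp=8, fp=92, fn=2, tn=898)
```

Because the outcome is rare, PPV can remain low even when sensitivity and specificity are both high, which is why this study emphasized specificity and PPV.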
RESULTS
Sixty-six (2.4%) participants declined to answer the suicide risk screening item 9. Of the 262 (9.5% of the total sample) patients who screened positive for suicide risk on the PHQ-9 at baseline, 177 (67.6%) endorsed “several days,” 51 (19.5%) endorsed “more than half the days,” and 34 (13.0%) endorsed “nearly every day.” Compared with patients who screened negative at baseline, those screening positive at baseline were significantly more likely to report a previous attempted suicide (6.4% vs 33.3%; χ2 (1) = 209.8, P <.001), scored significantly higher on the PHQ-9 (mean [SD] 5.0 [5.0] vs 15.4 [5.9]; t(2,676) = 31.5, P <.001), and scored significantly higher on the modified SCS (mean [SD] 4.5 [8.0] vs 24.7 [16.0]; t(2,583) = 33.9, P < .001). Sex, race, ethnicity, age, and branch of service were unrelated to screening result.
Correlates of Follow-Up Suicide Attempts
Thirteen (0.5%) patients reported suicidal behavior during the first 30 days postbaseline and 28 (1.0%) reported suicidal behavior during the first 90 days postbaseline. Five of the 13 patients (38.5%) with suicidal behavior during the first 30 days postbaseline and 11 of the 28 patients (39.3%) with suicidal behavior during the first 90 days postbaseline screened negative at baseline (see Supplemental Table 1, available at https://www.AnnFamMed.org/lookup/suppl/doi:10.1370/afm.2729/-/DC1). Intercorrelations among predictor variables and results of the univariate and multivariate logistic regression models predicting follow-up suicidal behaviors are summarized in Supplemental Table 2 and Supplemental Table 3 (both available at https://www.AnnFamMed.org/lookup/suppl/doi:10.1370/afm.2729/-/DC1). In the univariate models, modified SCS total score, PHQ-9 item 9, prior suicide attempts, and PHQ-8 total score were associated with significantly increased risk for suicidal behavior during both time frames. In the multivariate models, the modified SCS total score was associated with significantly increased risk for suicidal behavior within 90 days of baseline but fell shy of statistical significance within 30 days of baseline.
ROC Analysis
Results of ROC analysis are summarized in Supplemental Table 4, https://www.AnnFamMed.org/lookup/suppl/doi:10.1370/afm.2729/-/DC1. Only 1 PHQ-8 item (item 5 = “poor appetite or overeating”) had a statistically significant area under the curve (AUC) during the first 30 days postbaseline, indicating responses identified patients with suicidal behavior at better than chance. None of the PHQ items had statistically significant AUC values for the first 90 days postbaseline. This is in contrast to the SCS items which all had statistically significant AUC values during the first 30 and first 90 days postbaseline. Item 8 (“It is unbearable when I get this upset”), item 13 (“I can’t imagine anyone being able to withstand this kind of pain”), and item 16 (“I don’t deserve to live another moment”) had 3 of the largest AUC values during the first 30 and 90 days postbaseline. In each case, the optimal cutoff score was an item response of 2 (“neutral”) or higher (“agree” or “strongly agree”) on a 0 to 4 scale.
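As an aid to interpreting the AUC values, the sketch below computes the AUC for a single ordinal item as the probability that a randomly chosen patient with the outcome scores higher than a randomly chosen patient without it, counting ties as one-half. The function name and the example scores are hypothetical, and the sketch does not perform the significance testing reported above.

```python
from typing import Sequence


def auc_from_scores(event_scores: Sequence[float], nonevent_scores: Sequence[float]) -> float:
    """AUC as the proportion of event/nonevent pairs ranked correctly (ties count 0.5)."""
    if not event_scores or not nonevent_scores:
        raise ValueError("both groups must contain at least one score")
    concordant = 0.0
    for e in event_scores:
        for n in nonevent_scores:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(event_scores) * len(nonevent_scores))


# Hypothetical item responses on the 0-4 scale, not study data
auc = auc_from_scores(event_scores=[3, 4, 2], nonevent_scores=[0, 1, 2, 0])
```

An AUC of 0.5 corresponds to chance-level discrimination, and values approaching 1.0 indicate that higher item responses are concentrated among patients who went on to engage in suicidal behavior.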
Augmenting the PHQ-9 Item 9 With SCS Items
A positive screen on PHQ-9 item 9 combined with a positive screen on SCS items 8, 13, or 16 was associated with an increased probability of suicidal behavior in the near term (Supplemental Table 5, https://www.AnnFamMed.org/lookup/suppl/doi:10.1370/afm.2729/-/DC1). For each of these SCS items, the probability of suicidal behavior within the following 30 days increased monotonically with higher item scores, indicating that higher scores signaled increasing risk for suicidal behavior in the near term. Performance metrics for the PHQ-2, PHQ-9, and the PHQ-9/SCS combinations (ie, PHQ-9 with SCS item 8, 13, or 16) are summarized in Table 2. As can be seen, all of the 2-item screeners improved the 30-day PPV, with items 13 and 16 performing better than item 8. Items 8 and 13 did not reduce sensitivity, meaning the improved accuracy did not increase the number of missed cases. By contrast, 30-day sensitivity dropped with item 16 because the number of missed cases increased from 6 out of 13 to 8 out of 13.
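The 2-item decision rule can be expressed as a minimal Python sketch. The function name and parameter names are hypothetical; the rule assumes the SCS item has already been rescaled to 0 to 4 and uses the cutoff of 2 or higher identified in the ROC analysis.

```python
def two_item_positive(phq9_item9: int, scs_item: int, scs_cutoff: int = 2) -> bool:
    """Flag a patient as positive on a 2-item screener (hypothetical helper).

    phq9_item9: PHQ-9 item 9 response coded 0-3.
    scs_item:   one SCS item (eg, item 8, 13, or 16) rescaled to 0-4.
    """
    if phq9_item9 not in (0, 1, 2, 3):
        raise ValueError("PHQ-9 item 9 must be coded 0-3")
    if scs_item not in (0, 1, 2, 3, 4):
        raise ValueError("SCS item must be coded 0-4")
    # Both conditions must hold: any endorsement of item 9 AND SCS item at or above cutoff
    return phq9_item9 > 0 and scs_item >= scs_cutoff


flagged = two_item_positive(phq9_item9=1, scs_item=3)
```

Because the rule requires a positive PHQ-9 item 9 response, it can only narrow the group of positive screens; it cannot identify patients who screened negative on item 9.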
DISCUSSION
Although positive endorsement of the PHQ-9 suicide risk item 9 is correlated with significantly elevated risk for suicidal behavior during the following year, the vast majority (>95%) of primary care patients who screen positive on this item do not go on to attempt suicide.11,12 Because screening results often influence treatment allocation, improved accuracy could improve clinical decision making. The present study found that the accuracy of suicide risk screening using the PHQ-9 can be significantly enhanced among primary care patients by augmenting this widely used scale with 1 or more self-report items from the SCS.
Three of the SCS items contributed to meaningful improvements in the identification of patients who went on to engage in suicidal behaviors within 30 days and 90 days of screening. For example, only 4.1% of patients who screened positive on the PHQ-9 suicide risk item 9 engaged in suicidal behavior within the next 30 days, but when the PHQ-9 suicide risk item 9 was combined with SCS item 13 (the best-performing SCS item), this percentage nearly doubled, to 7.1% of patients. Critically, the improvement in screening accuracy did not come with an increase in false negatives. Overall, the performance statistics of the combined PHQ-9/SCS item 13 screening tool (sensitivity = 0.538, specificity = 0.960, PPV = 0.071) were better than those achieved from complex suicide prediction models using predictive analytic methods like machine learning.27 Simon et al,28 for instance, found that a machine learning algorithm developed to identify primary care patients who attempted suicide during the following 3 months based on electronic medical record data yielded lower performance statistics (sensitivity = 0.482, specificity = 0.951, PPV = 0.025). If replicated in future research, our results may represent a simple, accessible, and low-cost suicide screening method for health care systems that do not have the resources to develop computationally intensive data analytic models.
The performance of the SCS items stands in contrast to the performance of the first 8 items of the PHQ-9, of which only 1 significantly distinguished those patients who engaged in suicidal behavior within 30 days of screening positive for suicide risk. Collectively, these findings suggest 2 conclusions. First, the observed improvements in accuracy are not attributable to the mere addition of any second item. Second, the SCS is measuring something that is more strongly related to suicide risk than depression symptoms. This pattern converges with previous research supporting the validity of the SCS as an indicator of elevated risk for suicidal behaviors.13,14
An important caveat is that the primary aim of our analysis was to improve the identification of the highest risk patients among those who screened positive for suicide risk on the PHQ-9, not to improve the identification of high-risk patients who were missed by the PHQ-9. Consistent with this objective, we focused on maximizing specificity and PPV without adversely impacting sensitivity. Efforts to improve the identification of patients who are missed by the PHQ-9 (ie, maximizing sensitivity) would likely result in a different combination of items.26 Additional research aimed at this objective is warranted.
Strengths of this study include its prospective design and enrollment across multiple sites located in various geographic regions of the United States. Our use of researcher interviews instead of medical record data to assess follow-up suicide attempts is another strength, as previous research has found that medical record data significantly underestimate the occurrence of suicidal behaviors when compared with researcher interviews.29,30 This strength is counterbalanced by a limitation: researcher interviews are more prone to missing data due to participant attrition, as occurred in this study. Although we used a robust method for handling missing data (multiple imputation), conclusions should nonetheless be considered within the context of this limitation. Another limitation involves the restriction of data collection to military health care beneficiaries, which may limit the generalizability of results to nonmilitary populations and health care systems. Also, because we were unable to track how many unique patients declined to participate or chose not to approach our recruitment station, we are unable to assess if our study sample differed from the entire population of patients who were eligible to participate. Further, the administration of all 16 items of the SCS to patients, rather than administering only a single item, may have influenced patient response patterns. Finally, it is possible that stigma influenced participants’ willingness to endorse suicidal behaviors during follow-up. Results and conclusions should therefore be considered preliminary until further testing of a 2-item screening assessment can be accomplished.
CONCLUSIONS
Our findings suggest that the accuracy of the PHQ-9 item 9, a suicide risk screening tool widely used in primary care medical settings, can be meaningfully improved with the addition of a single item from the SCS. Of the 3 SCS items that improved the identification of those patients who were most likely to engage in suicidal behaviors soon after reporting thoughts of death or self-harm during a primary care visit, 2 SCS items improved specificity and PPV without reducing sensitivity: “It is unbearable when I get this upset” (item 8) and “I can’t imagine anyone being able to withstand this kind of pain” (item 13). Augmenting the PHQ-9 with one of these SCS items could provide a simple, accessible, and low-cost method for more accurately identifying primary care patients who require more immediate clinical intervention.
Footnotes
To read or post commentaries in response to this article, go to https://www.AnnFamMed.org/content/19/6/492/tab-e-letters.
Conflicts of interest: C.J.B. has received research grants awarded to The University of Utah and The Ohio State University from the Department of Defense, National Institutes of Health, the Boeing Company, and Cardinal Health Foundation. He is a paid consultant in research design, data analysis, and report writing for Oui Therapeutics LLC and Neurostat Analytical Solutions. He receives royalties from Guilford Publications, Routledge, and Taylor and Francis Publishing. He is a principal of Anduril LLC, which conducts training workshops and provides mental health consultation focused on suicide prevention and PTSD. M.H.A. has received support from grants awarded to the University of Colorado from the Department of Defense, National Institutes of Health, and the Substance Abuse and Mental Health Services Administration. He is also a paid research and training consultant for Receptor Life Sciences, Allergan, and Signant Health. A.M.M. has received research grants awarded to The University of Utah from the Department of Defense. All other authors report none.
Funding support: This project was supported by the Office of the Assistant Secretary of Defense for Health Affairs, through the Defense Medical Research and Development Program under Award No. W81XWH-14-1-0272 (PI: Bryan) and work unit no. N1426. The views expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the Department of the Navy, Department of Defense, nor the US Government.
Disclaimer: C.A.C., C.J.T., and M.D.W. are military service members or employees of the US Government. This work was prepared as part of their official duties. Title 17, USC §105 provides that copyright protection under this title is not available for any work of the US Government. Title 17, USC §101 defines a US Government work as work prepared by a military service member or employee of the US Government as part of that person’s official duties.
Supplemental materials: Available at https://www.AnnFamMed.org/lookup/suppl/doi:10.1370/afm.2729/-/DC1.
- Received for publication November 10, 2020.
- Revision received February 5, 2021.
- Accepted for publication February 9, 2021.
- © 2021 Annals of Family Medicine, Inc.