Classifying depression severity in online chats through human-coded psycholinguistic analysis using DSM-5 and Beck Depression Inventory

Authors

Ross Azura Zahit Universiti Malaysia Sarawak
Amalia Madihie Universiti Malaysia Sarawak
Salmah Mohamad Yusoff Universiti Malaysia Sarawak
Ida Juliana Hutasuhut Universiti Malaysia Sarawak
Mohamad Azhari Abu Bakar Universiti Malaysia Sarawak
Mohamad Hardyman Barawi Universiti Malaysia Sarawak
Syahrul Nizam Junaini Universiti Malaysia Sarawak
Nur Haziyah Amni Raimaini Universiti Malaysia Sarawak

DOI:

https://doi.org/10.33736/jcshd.10058.2025

Keywords:

online chats, depression, mental health, social media, psycholinguistic

Abstract

Despite advances in artificial intelligence, accurately detecting the severity of depression in online communication remains a challenge, underscoring the need for expert-led psycholinguistic analysis. This study applies such an approach to examine depression and other mental health issues in online chat data. A random sample of 4,000 chat entries was analysed. Five mental health professionals independently classified each entry against DSM-5 and Beck Depression Inventory criteria as no depression, mild, moderate, severe, or unknown, and their classifications were tested for inter-rater reliability. Human coding adds clinical nuance to the classification. Average inter-rater reliability was high, indicating substantial agreement among raters. Overall, 7% of chats exhibited some level of depression (2% mild, 2% moderate, 3% severe), 19% indicated other mental health issues such as anxiety, and 58% were ambiguous. These findings suggest that psycholinguistic analysis of online communication has strong potential for early detection of mental health issues. Integrating such analysis into digital tools could improve early identification on online platforms, enabling timely intervention and better mental health support within communities.
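
The abstract does not name the agreement statistic used. As an illustration only, the sketch below assumes Fleiss' kappa, a common choice when a fixed number of raters assign each item to one of several nominal categories. The function name, the category labels, and the toy ratings are hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical label set mirroring the five categories named in the abstract.
CATEGORIES = ["none", "mild", "moderate", "severe", "unknown"]


def fleiss_kappa(ratings, categories=CATEGORIES):
    """Fleiss' kappa for items rated by the same number of raters.

    ratings: list of items; each item is a list of labels, one per rater.
    Returns NaN when every rating falls in a single category, because
    chance-corrected agreement is undefined in that case.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    for item in ratings:
        if len(item) != n_raters:
            raise ValueError("every item needs the same number of ratings")
        if any(label not in categories for label in item):
            raise ValueError("unknown category label in ratings")

    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_expected = sum(
        (sum(c[cat] for c in counts) / total) ** 2 for cat in categories
    )

    if p_expected == 1.0:
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data: two chat entries, five raters each (not real study data).
toy = [
    ["none", "none", "none", "none", "mild"],
    ["severe", "severe", "severe", "moderate", "moderate"],
]
print(round(fleiss_kappa(toy), 3))
```

Under the widely used Landis and Koch benchmarks, kappa values between 0.61 and 0.80 are described as "substantial" agreement, which matches the wording of the abstract, though the study may have used a different statistic or threshold.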

Published

2025-09-30

How to Cite

Zahit, R. A., Madihie, A., Mohamad Yusoff, S., Hutasuhut, I. J., Abu Bakar, M. A., Barawi, M. H., Junaini, S. N., & Nur Haziyah Amni Raimaini. (2025). Classifying depression severity in online chats through human-coded psycholinguistic analysis using DSM-5 and Beck Depression Inventory. Journal of Cognitive Sciences and Human Development, 11(2), 214–231. https://doi.org/10.33736/jcshd.10058.2025

Issue

Vol. 11 No. 2 (2025): Journal of Cognitive Sciences and Human Development

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.