바로가기메뉴

본문 바로가기 주메뉴 바로가기

logo

Topic Modeling Insomnia Social Media Corpus using BERTopic and Building Automatic Deep Learning Classification Model

Journal of the Korean Society for Information Management / Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2022, v.39 no.2, pp.111-129
https://doi.org/10.3743/KOSIM.2022.39.2.111
Young Soo Ko (Yonsei University)
Soobin Lee (Yonsei University)
Minjung Cha (Yonsei University)
Seongdeok Kim (Yonsei University)
Juhee Lee (Yonsei University)
Ji Yeong Han (Yonsei University)
Min Song (Yonsei University)
  • Downloaded
  • Viewed

Abstract

Insomnia is a chronic disease in modern society, with the number of new patients increasing by more than 20% in the last 5 years. Insomnia is a serious disease that requires diagnosis and treatment because the individual and social problems that occur when there is a lack of sleep are serious and the triggers of insomnia are complex. This study collected 5,699 data from ‘insomnia’, a community on ‘Reddit’, a social media that freely expresses opinions. Based on the International Classification of Sleep Disorders ICSD-3 standard and the guidelines with the help of experts, the insomnia corpus was constructed by tagging them as insomnia tendency documents and non-insomnia tendency documents. Five deep learning language models (BERT, RoBERTa, ALBERT, ELECTRA, XLNet) were trained using the constructed insomnia corpus as training data. As a result of performance evaluation, RoBERTa showed the highest performance with an accuracy of 81.33%. In order to in-depth analysis of insomnia social data, topic modeling was performed using the newly emerged BERTopic method by supplementing the weaknesses of LDA, which is widely used in the past. As a result of the analysis, 8 subject groups (‘Negative emotions’, ‘Advice and help and gratitude’, ‘Insomnia-related diseases’, ‘Sleeping pills’, ‘Exercise and eating habits’, ‘Physical characteristics’, ‘Activity characteristics’, ‘Environmental characteristics’) could be confirmed. Users expressed negative emotions and sought help and advice from the Reddit insomnia community. In addition, they mentioned diseases related to insomnia, shared discourse on the use of sleeping pills, and expressed interest in exercise and eating habits. As insomnia-related characteristics, we found physical characteristics such as breathing, pregnancy, and heart, active characteristics such as zombies, hypnic jerk, and groggy, and environmental characteristics such as sunlight, blankets, temperature, and naps.

keywords
topic modeling, insomnia, social media, BERTopic, deep learning model

Journal of the Korean Society for Information Management