Journal of the Korean Society for Information Management, Korean Society for Information Management

421

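An Analysis of the Effect of Bestseller Rank on Public Library Circulation: Focusing on Panel Data Analysis

이종욱(Department of Library and Information Science, Kyungpook National University) ; 강우진(Department of Library and Information Science, Graduate School, Kyungpook National University) ; 박중규(Department of Psychology, Kyungpook National University) 2021, Vol.38, No.4, pp.1-23 https://doi.org/10.3743/KOSIM.2021.38.4.001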

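Abstract (translated from Korean)

This study used panel analysis to examine how the rank of books on a bestseller list affects their average number of loans in public libraries. Using data provided by the National Library of Korea through the Culture Big Data Platform, we built a public library loan data set for the 179 books under analysis covering 104 weeks, from January 1, 2018 to December 29, 2019, and we built a weekly bestseller list data set for the same period from the YES24 website. To identify the relationship between public library loans and bestseller rank accurately, we compared three models that exploit the characteristics of panel data: a linear regression model, a fixed-effects model, and a random-effects model. The fixed-effects model proved the most suitable. Analyzing the 179 books with fewer than 47 weeks of missing rank data with the fixed-effects model showed that when a book's bestseller rank falls by one place, its average number of public library loans decreases by 0.108, a statistically significant effect. The effect of bestseller rank on average loans also differed by subject classification. The study empirically confirms that bestseller rank influences library borrowing behavior and suggests that public libraries should consider sociocultural context, including bestseller lists, when predicting user demand and setting collection development policy.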

Abstract

The purpose of this study is to analyze the effects of bestseller ranks on book circulation in public libraries. To this end, weekly data sets of library circulation and bestseller lists for 179 books from January 1, 2018 to December 29, 2019 were constructed from data collected from BigData MarketC and YES24. Three methods for analyzing panel data (linear regression, fixed-effects, and random-effects models) were compared, and the fixed-effects model fit better than the other two. A visual inspection showed that the average ranks of bestsellers were associated with their public library circulation. The fixed-effects analysis further showed that a one-rank decline of a book on the bestseller list decreases its average circulation by 0.108, with the size of the effect varying by subject. The study empirically demonstrates the impact of a bestseller list on people’s borrowing behavior, suggesting that public libraries need to consider sociocultural context as well as bestseller lists to predict user needs and to formulate collection development policy.
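
As a rough illustration of the fixed-effects idea named in the abstract, the sketch below computes the within (demeaned) slope for a single regressor. The function name `within_slope`, the tuple layout, and the toy numbers are assumptions made for illustration only; they are not the study's data, variables, or estimation code.

```python
from collections import defaultdict


def within_slope(rows):
    """Fixed-effects (within) slope for one regressor.

    rows: iterable of (book_id, rank, loans) tuples, one per book-week.
    """
    by_book = defaultdict(list)
    for book_id, rank, loans in rows:
        by_book[book_id].append((rank, loans))

    sxy = sxx = 0.0
    for obs in by_book.values():
        mean_rank = sum(r for r, _ in obs) / len(obs)
        mean_loans = sum(l for _, l in obs) / len(obs)
        for rank, loans in obs:
            # Demeaning removes each book's time-invariant level (its fixed effect).
            sxy += (rank - mean_rank) * (loans - mean_loans)
            sxx += (rank - mean_rank) ** 2
    return sxy / sxx


# Hypothetical toy panel: two books, three weeks each (not the study's data).
toy = [
    ("A", 1, 10.0), ("A", 2, 9.5), ("A", 3, 9.0),
    ("B", 10, 3.0), ("B", 11, 2.5), ("B", 12, 2.0),
]
print(within_slope(toy))
```

On this toy panel the within slope is -0.5: each book's own level is removed before the slope is estimated, so only week-to-week changes within a book contribute.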

422

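Exploring Opinions on COVID-19 Vaccines through an Analysis of Twitter Posts

정우진(Department of Library and Information Science, Sungkyunkwan University) ; 김규리(Department of Library and Information Science, Sungkyunkwan University) ; 유승희(Sungkyunkwan University) ; 주영준(Sungkyunkwan University) 2021, Vol.38, No.4, pp.113-128 https://doi.org/10.3743/KOSIM.2021.38.4.113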

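Abstract (translated from Korean)

To understand public opinion on coronavirus disease 2019 (COVID-19; hereafter "coronavirus") vaccines, this study analyzed vaccine-related posts on Twitter. We collected and analyzed 45,413 posts written on Twitter during the year from March 16, 2020 to March 15, 2021 that contained the name of a coronavirus vaccine as a keyword. Twelve vaccine keywords were used for data collection; in descending order of the number of posts collected, they are ‘Pfizer’, ‘AstraZeneca’, ‘Moderna’, ‘Janssen’, ‘Novavax’, ‘Sinopharm’, ‘Sinovac’, ‘Sputnik’, ‘Bharat’, ‘CanSino’, ‘Chumakov’, and ‘Vector’. The collected posts were examined with both manual and automated methods, using keyword analysis, sentiment analysis, and topic modeling to explore opinions about the vaccines. The results show that reactions to the vaccines were negative overall, and that anxiety about the aftereffects of vaccination and distrust of vaccine efficacy were the main negative factors. In contrast, the expectation that vaccination would curb the spread of the coronavirus was identified as a positive social factor. Whereas previous studies sought to gauge the social climate around coronavirus vaccines through mass media data such as news, this study has academic significance in that it gauges public opinion by collecting social media data and applying several analysis methods, including keyword analysis, sentiment analysis, and topic modeling. The results also have a practical implication: they can contribute to vaccination promotion policies that reflect the social climate around vaccines.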

Abstract

In this study, we aimed to understand public opinion on COVID-19 vaccines. To this end, we analyzed COVID-19 vaccine-related Twitter posts. We collected 45,413 tweets posted from March 16, 2020 to March 15, 2021 that included COVID-19 vaccine names as keywords. The 12 vaccine names used for data collection were, in order of the number of collected posts, ‘Pfizer’, ‘AstraZeneca’, ‘Moderna’, ‘Janssen’, ‘Novavax’, ‘Sinopharm’, ‘Sinovac’, ‘Sputnik V’, ‘Bharat’, ‘CanSino’, ‘Chumakov’, and ‘VECTOR’. The collected posts were analyzed both manually and automatically through keyword analysis, sentiment analysis, and topic modeling to understand opinions about the investigated vaccines. According to the results, there were generally more negative posts about vaccines than positive ones. Anxiety about the aftereffects of vaccination and distrust in the efficacy of vaccines were identified as the major negative factors. In contrast, the anticipation that vaccination would suppress the spread of the coronavirus was identified as a positive social factor. Unlike previous studies that investigated opinions about COVID-19 vaccines through mass media data such as news articles, this study explores the opinions of social media users using keyword analysis, sentiment analysis, and topic modeling. In addition, the results can be used by governmental institutions to make vaccination promotion policies that reflect the social atmosphere.
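
The abstract ranks the vaccine keywords by the number of collected posts. A minimal sketch of that kind of tally is shown below, assuming simple case-insensitive substring matching; the function name `count_keyword_posts` and the sample posts are hypothetical and do not come from the study's data or collection pipeline.

```python
from collections import Counter


def count_keyword_posts(posts, keywords):
    """Count how many posts mention each keyword (case-insensitive substring match)."""
    counts = Counter()
    for text in posts:
        lowered = text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                counts[keyword] += 1
    # most_common() sorts keywords by descending post count.
    return counts.most_common()


# Hypothetical posts, invented for illustration.
sample_posts = [
    "Got my Pfizer shot today",
    "Pfizer or Moderna?",
    "AstraZeneca side effects",
    "pfizer booster",
    "Moderna appointment booked",
]
ranking = count_keyword_posts(sample_posts, ["Pfizer", "AstraZeneca", "Moderna"])
print(ranking)
```

A post that names two vaccines is counted once for each, which is one of several reasonable conventions; the abstract does not say how such posts were handled.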

423

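A Study on the Development of a Digital Compilation Management System for Local Culture Contents: Focusing on the Case of "The Encyclopedia of Korean Local Culture"

김수영(Korean Studies Information Center, The Academy of Korean Studies) 2009, Vol.26, No.3, pp.213-237 https://doi.org/10.3743/KOSIM.2009.26.3.213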

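Abstract (translated from Korean)

Local culture refers to the entire heritage, including history and tradition, customs and daily life, and art, relics, and historic sites, that has been handed down without interruption from the past to the present within the natural environment of a region. The Academy of Korean Studies has produced digital content from this local culture and used it to compile “The Encyclopedia of Korean Local Culture”. Because local culture content has the characteristics of records, it follows records management principles such as provenance and hierarchical listing. A system for compiling and managing it aims to be a circular knowledge information management system, one in which basic materials, fragmentary information, and advanced information circulate within a single system and help generate new knowledge. Users of such a system can not only collect materials directly through it but also gather data from elsewhere, and can process the collected data to create new knowledge. However, it is very difficult to build a database without damaging the semantic features embedded in the structure of diverse local culture content, and the work requires repeated rounds of proofreading over a long period, so a system is needed in which document compilation, proofreading, and service are carried out simultaneously. Using “The Encyclopedia of Korean Local Culture” as a case, this paper therefore presents an XML-based digital compilation management system that can express the structural information of documents without damaging the semantic features of local culture content, which includes many old documents, and introduces the functions extended for managing local culture content in the system developed in this study.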

Abstract

Local culture is a cultural heritage that has come down from generation to generation in the natural environment of a region. It includes history, tradition, natural features, art, and historic relics. The Academy of Korean Studies has compiled “The Encyclopedia of Korean Local Culture” using local culture contents. Local culture contents have the features of records, such as authenticating the source and managing a hierarchical structure. Thus, to deal with local culture contents, a “circular knowledge information management system” is sought that helps basic, fragmentary, and high-level information circulate within the system to create new knowledge. A user of this circular knowledge information management system is able not only to collect data directly in it but also to fetch data from other databases. Processing the collected data then helps to create new knowledge. However, it is very difficult to preserve the meaning-bearing features of the original hierarchy contained in the various kinds of local culture contents when building a new database. Moreover, this kind of work needs many rounds of correction over a long period of time. A system in which compilation, correction, and service can be done simultaneously is therefore needed. In this study, focusing on the case of “The Encyclopedia of Korean Local Culture”, I propose an XML-based digital compilation management system that can express hierarchy information and preserve the semantic features of local culture contents, which contain many ancient documents, and introduce the expanded functions developed to manage contents in the system.

424

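Big Deals, Open Access, Google Scholar, and the Subscription to Electronic Scholarly Information at University Libraries

심원식(Sungkyunkwan University) 2012, Vol.29, No.4, pp.143-163 https://doi.org/10.3743/KOSIM.2012.29.4.143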

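Abstract (translated from Korean)

At present, university libraries in Korea and abroad acquire electronic scholarly information mainly through a subscription model known as the big deal, in which bundles of hundreds or thousands of electronic journals are contracted for multiple years at a fixed rate of price increase. This model, which began in the mid-1990s, brought many advantages to university libraries and their users. As the prices of these packages have kept rising, however, doubts have been raised about its sustainability. So far no concrete alternative to the subscription-based model has been proposed other than pay-per-view, and library budgets remain a serious potential flashpoint. The open access movement, which began in the early 2000s, is removing barriers to the publication and distribution of scholarly journals in various ways. Open access publishing is growing at double-digit rates every year, and open access journal articles are showing both quantitative and qualitative growth, with their share in the Scopus and Web of Science citation databases approaching 20%. Google Scholar, launched in 2004, is growing into a convenient search and access tool for the journal articles of most scholarly publishers. Although there are concerns about its criteria for selecting journals, its limited search functions, and monopolization, Google Scholar deserves serious attention as an alternative to university library databases. University library budget problems, the spread of open access publishing, and the growth of free tools such as Google Scholar are seen as disruptive changes that could replace the subscription-based model, and university librarians need to work out concrete responses to the new environment.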

Abstract

The dominant model of acquiring scholarly content at academic libraries is the so-called big deal, in which libraries subscribe to a bundle of hundreds, if not thousands, of journals in a multi-year contract with a fixed annual rate increase. The big deal, which started in the mid-1990s, offered a number of advantages for academic libraries and their users. However, escalating prices for these packages have become a serious issue, casting doubt on the sustainability of the subscription-based model. At the moment, there appears to be no viable alternative other than the pay-per-view method being tested at some libraries. Libraries’ budget situation will remain a key factor that might change the situation. Open access started in the 2000s as a vehicle to eliminate barriers to publishing and distributing peer-reviewed scholarly journal articles. Open access publishing is seeing double-digit annual growth. Open access articles now account for close to 20% of two major citation databases, Scopus and Web of Science. Google Scholar, which debuted in late 2004, is now a popular tool for discovering and accessing scholarly articles from a vast selection of journals around the world. There is a call for taking Google Scholar seriously as a potential replacement for library databases, amid concerns regarding the quality of the journals indexed, limited search capabilities vis-à-vis library databases, and the monopolization of public goods. Escalating budget problems, the rapid growth of open access publishing, and the emergence of powerful free tools such as Google Scholar need to be taken seriously, as these forces might bring disruptive changes to the existing subscription-based model of scholarly content at academic libraries.
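
To make the "fixed annual rate increase" concrete, the sketch below compounds a contract price over several years. The starting cost of 100 units, the 5% rate, and the function name `projected_costs` are hypothetical figures chosen for illustration; the abstract does not report actual contract terms.

```python
def projected_costs(base_cost, annual_increase, years):
    """Return the yearly cost of a contract whose price rises by a fixed rate each year."""
    return [base_cost * (1 + annual_increase) ** t for t in range(years)]


# Hypothetical figures: 100 cost units in the first year, 5% fixed annual increase.
costs = projected_costs(100.0, 0.05, 5)
print([round(c, 2) for c in costs])
```

Because the increase compounds, the price in each year is the previous year's price times the same factor, so the gap against a flat budget widens every year.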

425

A Study on Metadata Elements for Developing KORMARC Data Fields for Archival Records

박진희(Jeonbuk National University) 2005, Vol.22, No.3, pp.351-378 https://doi.org/10.3743/KOSIM.2005.22.3.351

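Abstract (translated from Korean)

This study defined metadata elements for developing KORMARC data fields for archival records so that records can be searched and used in existing library information systems. The results are summarized as follows. First, in addition to the seven areas presented in ISAD(G)2, the study added a conservation area and a physical description area. Because previous studies pointed out that the 26 elements presented in ISAD(G)2 are insufficient for institutions that need detailed descriptive elements, sub-elements for each area were selected by synthesizing the results of the analysis. Second, to reflect the particular characteristics of Korean records in the descriptive elements, the study added record description elements selected by analyzing the forms of paper official documents and electronic documents specified in the Enforcement Rules of the Office Management Regulations and the Act on the Promotion of the Digitalization of Administrative Affairs, etc. for the Realization of Electronic Government. It also added the elements prescribed in the Enforcement Decree of the Act on the Management of Records by Public Institutions: disclosure status and grade, disclosure date, disclosure scope, retention period, preservation grade, preservation value, and description of the condition of records. Third, for records management, the study developed the 512 creation dates note, 5[…] finding aids note, 583 action note, and 584 accumulation note data fields, and reorganized and added identifier codes for the 245 title statement, 300 physical description, 306 playing time, 506 restriction on access note, 534 original version note, 535 location of originals/duplicates note, 540 terms governing use and reproduction note, 541 immediate source of acquisition note, […] biographical or historical note, 581 publication note, and 850 holding institution data fields.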

Abstract

The study intended to develop KORMARC for archives in order to integrate archives with library materials. The results of the study can be summarized as follows. (1) Two areas, conservation and physical description, are added to the seven areas of ISAD(G)2. The study has also shown that the existing 26 elements of ISAD(G)2 are not enough to satisfy the information demands of institutions and their users. (2) For the use of domestic archives in particular, the study has added the description elements of archives that appear in the Government Regulations of Office Management and in the document forms that are specified by law for the sake of computerization. The study has also added the elements specified in the Public Record Management Law: possible release and grade, release dates, release range, conservation periods, conservation grade, conservation value, and the status description of archives. (3) The study has developed the following data fields to be added to KORMARC: 512 creation dates note, 5[…] finding aids note, 583 action note, and 584 accumulation note. It also reorganizes and adds the indicators of the 245 title statement, 300 physical description, 306 playing time, 506 restriction on access note, 534 original version note, 535 location of originals/duplicates note, 540 terms governing use and reproduction notes, 541 immediate source of acquisition note, […] biographical or historical note, 581 publication note, and 850 holding institution data fields.

426

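BERTopic-Based Topic Modeling of Insomnia Social Data and Construction of a Deep Learning Model for Automatically Classifying Insomnia-Tendency Documents

고영수(Master's student, Department of Library and Information Science, Yonsei University) ; 이수빈(Doctoral student, Department of Library and Information Science, Yonsei University) ; 차민정(Social Omics Research Center, Yonsei University) ; 김성덕(Master's student, Department of Library and Information Science, Yonsei University) ; 이주희(Master's student, Department of Library and Information Science, Yonsei University) ; 한지영(Master's student, Department of Library and Information Science, Yonsei University) ; 송민(Department of Library and Information Science, Yonsei University) 2022, Vol.39, No.2, pp.111-129 https://doi.org/10.3743/KOSIM.2022.39.2.111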

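Abstract (translated from Korean)

Insomnia is a chronic disease of modern society, with the number of patients having increased by more than 20% over the last five years. The personal and social problems caused by a lack of sleep are serious, and the factors that trigger insomnia act in combination, so diagnosis and treatment are important. This study collected 5,699 data items from ‘insomnia’, the insomnia community on ‘Reddit’, a social media platform where opinions are expressed freely, and built an insomnia corpus by tagging them as insomnia-tendency or non-tendency documents based on the International Classification of Sleep Disorders (ICSD-3) criteria and guidelines prepared with advice from a psychiatrist. Five deep learning language models (BERT, RoBERTa, ALBERT, ELECTRA, XLNet) were trained on this corpus, and in the performance evaluation RoBERTa scored highest on accuracy, precision, recall, and F1. For an in-depth analysis of the insomnia social data, topic modeling was carried out with BERTopic, a recent method that addresses weaknesses of the widely used LDA. Hierarchical clustering identified eight topic groups (‘negative emotions’, ‘advice, help, and gratitude’, ‘insomnia-related diseases’, ‘sleeping pills’, ‘exercise and eating habits’, ‘physical characteristics’, ‘activity characteristics’, ‘environmental characteristics’). Users expressed negative emotions in the insomnia community and sought help and advice. They also mentioned diseases related to insomnia, discussed the use of sleeping pills, and expressed interest in exercise and eating habits. The insomnia-related characteristics found were physical characteristics such as breathing, pregnancy, and the heart; activity characteristics such as zombies, hypnic jerks, and grogginess; and environmental characteristics such as sunlight, blankets, temperature, and naps.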

Abstract

Insomnia is a chronic disease in modern society, with the number of patients increasing by more than 20% in the last five years. It requires diagnosis and treatment because the individual and social problems caused by a lack of sleep are serious and the triggers of insomnia are complex. This study collected 5,699 data items from ‘insomnia’, a community on ‘Reddit’, a social media platform where opinions are expressed freely. Based on the International Classification of Sleep Disorders (ICSD-3) standard and guidelines prepared with the help of experts, an insomnia corpus was constructed by tagging the items as insomnia-tendency or non-insomnia-tendency documents. Five deep learning language models (BERT, RoBERTa, ALBERT, ELECTRA, XLNet) were trained using the constructed corpus as training data. In the performance evaluation, RoBERTa showed the highest performance, with an accuracy of 81.33%. For an in-depth analysis of the insomnia social data, topic modeling was performed using BERTopic, a newly emerged method that addresses the weaknesses of LDA, which has been widely used in the past. The analysis identified eight subject groups (‘Negative emotions’, ‘Advice and help and gratitude’, ‘Insomnia-related diseases’, ‘Sleeping pills’, ‘Exercise and eating habits’, ‘Physical characteristics’, ‘Activity characteristics’, ‘Environmental characteristics’). Users expressed negative emotions and sought help and advice from the Reddit insomnia community. In addition, they mentioned diseases related to insomnia, discussed the use of sleeping pills, and expressed interest in exercise and eating habits. As insomnia-related characteristics, we found physical characteristics such as breathing, pregnancy, and the heart; activity characteristics such as zombies, hypnic jerks, and grogginess; and environmental characteristics such as sunlight, blankets, temperature, and naps.
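
The models above are compared on accuracy, precision, recall, and F1. As a reminder of how those four scores relate for a two-class task, here is a minimal sketch; the function name `binary_metrics` and the toy labels are assumptions for illustration and are unrelated to the study's corpus or evaluation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a two-class labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = insomnia-tendency document, 0 = non-tendency document.
truth = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(truth, predicted)
print(scores)
```

In this toy case there are three true positives, three true negatives, one false positive, and one false negative, so all four scores come out to 0.75.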

427

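A Study on Cognitive Distance between Separately Shelved Materials in Children's Libraries Using Multidimensional Scaling

김효윤(Yeonsu Cheonghak Library) ; 조재인(Incheon National University) 2017, Vol.34, No.1, pp.51-71 https://doi.org/10.3743/KOSIM.2017.34.1.051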

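Abstract (translated from Korean)

This study used multidimensional scaling (MDS) and K-means cluster analysis to compare the desired cognitive distances between separately shelved materials as perceived by about 200 children's library users, consisting of lower-grade elementary students, upper-grade elementary students, and parents, and examined through several cases how these cognitive distances are reflected in actual children's libraries. MDS is a technique that evaluates the similarity or attributes of the objects under analysis and projects them into a space. It is used mainly for market diagnosis in marketing, but it can also be applied to propose an ideal physical layout by analyzing users' cognitive distances with respect to products or facilities. The analysis showed differences in cognitive distance between separately shelved materials among the lower-grade, upper-grade, and parent groups, with especially large differences in the cognitive distance between infant and children's materials and computer materials, and between infant materials and children's materials. When we checked how the analyzed cognitive distance structure is reflected in three children's libraries in Y-gu, no library had a spatial layout that perfectly matched the cognitive structure of any one group. In all of them, however, infant and children's materials were separated from computer materials, and infant materials from children's materials, so the cognitive distances of parents and elementary students were judged to be partially reflected.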

Abstract

This study conducted a survey of 200 lower-grade elementary students, upper-grade elementary students, and parents (adults) to measure the recognition distance between materials that are shelved separately in a children’s library, and compared the recognition distances of each user group using multidimensional scaling and K-means cluster analysis. Multidimensional scaling (MDS) is a technique for projecting the cognitive state into space by evaluating the similarity or attributes of the analysis target. Although it is mainly used for market diagnosis in marketing, it can also be applied to propose an ideal physical layout by analyzing the distances. The main findings are as follows. First, elementary school students think that child, baby, and computer materials should be adjacent, as one group. The perception of adults (parents) differs greatly from that of the students: they think computer materials should form a separate group, apart from child and baby materials. Second, upper-grade elementary students and adults (parents) also want to separate their main reading materials from baby books; both want a quiet reading space separated from babies. Third, an examination of three children’s libraries in Y-gu, Incheon showed how this recognition distance structure is reflected in real libraries: no library has a layout that matches the recognition structure of a particular group perfectly, but the recognition structures of adults and elementary students are partially reflected, because baby and child materials are separated from computer materials, and baby materials from child materials, in all three.
It is difficult to claim that the recognition structure of a user group, especially that of children, is an absolute condition to consider in placing materials in a children’s library. However, understanding how user groups perceive the materials can provide important evidence for offering service spaces differentiated by user and for placing library resources effectively.
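
The abstract pairs MDS with K-means cluster analysis. The sketch below shows only the clustering half, as plain Lloyd's K-means on 2-D points, assuming the MDS coordinates have already been computed elsewhere. The coordinates, the material labels, and the choice of starting centroids are hypothetical and are not taken from the study.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's k-means on 2-D points, starting from the given centroids."""
    def nearest(point):
        return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point)].append(point)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return [nearest(point) for point in points], centroids


# Hypothetical MDS map positions for four kinds of separately shelved material.
positions = {
    "infant": (0.0, 0.1),
    "child": (0.2, 0.0),
    "computer": (2.0, 2.1),
    "multimedia": (2.2, 1.9),
}
points = list(positions.values())
labels, centers = kmeans(points, [points[0], points[2]])
print(dict(zip(positions, labels)))
```

With these made-up coordinates the infant and child materials fall into one cluster and the other two into a second cluster. Real K-means results depend on the starting centroids, which is why fixed ones are passed in here.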

428

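Deep Learning-Based Analysis of Depression Tendency in Korean Social Media Text

박서정(Department of Library and Information Science, Yonsei University) ; 이수빈(Department of Library and Information Science, Yonsei University) ; 김우정(Department of Psychiatry, Yongin Severance Hospital, Yonsei University College of Medicine) ; 송민(Department of Library and Information Science, Yonsei University) 2022, Vol.39, No.1, pp.91-117 https://doi.org/10.3743/KOSIM.2022.39.1.091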

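Abstract (translated from Korean)

The number of patients with depression is increasing every year in Korea and around the world. Yet most people with mental illness do not recognize that they are ill and so do not receive appropriate treatment. Because untreated depressive symptoms can develop into suicide, anxiety, and other psychological problems, early detection and treatment of depression are very important for improving mental health. To address this problem, this study proposes a deep learning-based depression tendency model that uses Korean social media text. Data were collected from Naver Knowledge iN, Naver Blog, Hidoc, and Twitter and annotated with classes defined by the number of depressive symptoms, using the DSM-5 diagnostic criteria for major depressive disorder. TF-IDF analysis and word co-occurrence analysis were then performed to examine the characteristics of each class in the resulting corpus. To build a depression tendency classifier that draws on diverse text features, word embedding, dictionary-based sentiment analysis, and LDA topic modeling were also carried out, producing an embedded text, a sentiment score, and a topic number for each document to use as text features. Classification with the KorBERT algorithm achieved the highest accuracy, 83.28%, when the embedded text was combined with both the sentiment score and the topic of the document. By building a Korean depression tendency classifier whose performance is improved through diverse text features, the study lays a foundation for detecting potential depression among users of Korean online communities early, enabling prompt treatment and prevention and contributing to the mental health of Korean society.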

Abstract

The number of patients with depression in Korea and around the world is increasing every year. However, most people with mental illness are not aware that they are ill, so they do not receive adequate treatment. If depressive symptoms are neglected, they can lead to suicide, anxiety, and other psychological problems. Early detection and treatment of depression are therefore very important for improving mental health. To address this problem, this study presents a deep learning-based depression tendency model using Korean social media text. After data were collected from Naver Knowledge iN, Naver Blog, Hidoc, and Twitter, the DSM-5 diagnostic criteria for major depressive disorder were used to define and annotate classes according to the number of depressive symptoms. TF-IDF analysis and word co-occurrence analysis were then performed to examine the characteristics of each class of the constructed corpus. In addition, word embedding, dictionary-based sentiment analysis, and LDA topic modeling were performed to build a depression tendency classification model using various text features. The embedded text, sentiment score, and topic number of each document were calculated and used as text features. The highest accuracy, 83.28%, was achieved when depression tendency was classified with the KorBERT algorithm by combining both the sentiment score and the topic of the document with the embedded text. This study builds a Korean depression tendency classification model with improved performance using various text features. It is significant in that it lays a foundation for detecting potential depression early among users of Korean online communities, enabling rapid treatment and prevention and thereby helping to promote the mental health of Korean society.
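
The abstract uses TF-IDF to characterize each class of the corpus. A minimal sketch of one common TF-IDF variant follows, with term frequency as count divided by document length and inverse document frequency as ln(N / df). The abstract does not say which weighting the authors used, so this variant, the function name `tf_idf`, and the toy token lists are all assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already tokenized documents (not from the study's corpus).
toy_docs = [
    ["sleep", "tired", "sad"],
    ["sad", "alone", "sad"],
    ["sleep", "music", "walk"],
]
toy_weights = tf_idf(toy_docs)
print(toy_weights[0])
```

Terms that occur in fewer documents get a larger weight: in the first toy document, "tired" (one document) outweighs "sleep" (two documents) even though each appears once.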
