Research Trends in Record Management Using Unstructured Text Data Analysis

홍덕용; 허준석

doi:10.14404/JKSARM.2023.23.4.073

P-ISSN: 1598-1487
E-ISSN: 2671-7247

Vol.23 No.4

Original Korean title: 비정형 텍스트 데이터 분석을 활용한 기록관리 분야 연구동향

Journal of Korean Society of Archives and Records Management (한국기록관리학회지), P-ISSN 1598-1487; E-ISSN 2671-7247

2023, v.23 no.4, pp.73-89

https://doi.org/10.14404/JKSARM.2023.23.4.073

홍덕용 (Records Management Specialist, Suyeong-gu Office, Busan Metropolitan City)
허준석 (CEO, ㈜에이티앤아이)

홍덕용, & 허준석. (2023). 비정형 텍스트 데이터 분석을 활용한 기록관리 분야 연구동향. 한국기록관리학회지, 23(4), 73-89, https://doi.org/10.14404/JKSARM.2023.23.4.073

Abstract

This study applies text mining techniques to the Korean abstracts of domestic record management research, which are unstructured text data, analyzing keyword frequency and the distance between keywords to identify research trends in the field. To this end, the journal institutional statistics (registered journals, candidate journals) of the Korea Citation Index (KCI) were searched by major category (interdisciplinary studies) and middle category (library and information science). Of the 28 journals retrieved, 1,157 articles from 7 registered journals were extracted, and their 77,578 keywords were visualized. The analysis used Word2vec together with t-Distributed Stochastic Neighbor Embedding (t-SNE) and Scattertext. First, a frequency analysis of the 77,578 keywords obtained from the 1,157 articles confirmed that keywords such as “record management” (889 times), “analysis” (888 times), “archive” (742 times), “record” (562 times), and “utilization” (449 times) were treated as major topics by researchers. Second, Word2vec was used to generate vector representations of the keywords, and the similarity distances between them were examined and then visualized using t-SNE and Scattertext. In the visualization results, record management research was divided into two groups. In the first group (past), keywords such as “archiving,” “national record management,” “standardization,” “official documents,” and “record management systems” occurred frequently. In the second group (current), keywords such as “community,” “data,” “record information service,” “online,” and “digital archives” were receiving the most attention.
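The two analysis steps named in the abstract, keyword frequency counting and similarity between keyword vectors, can be illustrated with a minimal sketch. This is not the authors' pipeline: the sample data is invented, the function names are hypothetical, and plain co-occurrence count vectors stand in for the Word2vec embeddings used in the study.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical, pre-tokenized abstracts (illustrative data, not from the study).
abstracts = [
    ["record management", "archive", "standardization"],
    ["record management", "analysis", "archive"],
    ["digital archives", "community", "data"],
    ["data", "community", "analysis"],
]


def keyword_frequencies(docs):
    """Count how often each keyword appears across all documents."""
    return Counter(kw for doc in docs for kw in doc)


def cooccurrence_vectors(docs):
    """Build one sparse count vector per keyword from within-document
    co-occurrence (a simple stand-in for Word2vec embeddings)."""
    vectors = {}
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            vectors.setdefault(a, Counter())[b] += 1
            vectors.setdefault(b, Counter())[a] += 1
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


freq = keyword_frequencies(abstracts)
vectors = cooccurrence_vectors(abstracts)

# Most frequent keywords, then the similarity of two keyword pairs.
print(freq.most_common(3))
print(cosine_similarity(vectors["community"], vectors["data"]))
print(cosine_similarity(vectors["community"], vectors["standardization"]))
```

In this toy data, keywords that share context words score higher than keywords that never appear with the same neighbors, which is the intuition behind grouping keywords by vector distance before projecting them with a method such as t-SNE.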

Keywords: Research Trends in Record Management, Big Data, Text Mining, t-SNE, Scattertext
