  • P-ISSN1013-0799
  • E-ISSN2586-2073
  • KCI

Automatic Extraction of References for Research Reports using Deep Learning Language Model

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2023, v.40 no.2, pp.115-135
https://doi.org/10.3743/KOSIM.2023.40.2.115
Yukyung Han (Korea Information Society Development Institute)
Wonsuk Choi (Korea Information Society Development Institute)
Minchul Lee (Kakao Enterprise Corp.)

Abstract

This study assesses how effectively deep learning language models can extract references automatically and so support the efficient construction of a reference database for research reports. Unlike academic journals, research reports are difficult to process automatically because reference formatting varies across institutions. To address this, we introduce a task that separates references from non-reference phrases, in addition to the metadata extraction task commonly used for reference extraction. The datasets cover several types of text: references from the research reports of a particular institution, references from academic journals, and a combination of academic journal references and non-reference texts. Two deep learning language models, RoBERTa+CRF and ChatGPT, were compared on three tasks: extracting metadata, categorizing data types, and separating the original text. The deep learning language models were highly effective, achieving maximum F1-scores of 95.41% for metadata extraction and 98.91% for categorization of data types and separation of the original text. These results offer insight into how deep learning language models and different types of datasets can be used to construct reference databases for research reports that contain both reference and non-reference texts.
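
As a rough illustration of the metadata extraction task, the sketch below treats it as sequence labeling: each token of a reference string carries a BIO tag, tagged spans are grouped into metadata fields, and predictions are scored with a span-level F1. The label names (AUTHOR, YEAR, TITLE), the sample tokens, and the exact-match scoring rule are assumptions made for this example; the abstract does not state the tag set or the evaluation protocol the authors used.

```python
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (label, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Group a BIO tag sequence into labeled spans."""
    spans: List[Span] = []
    label, start = None, 0
    # A trailing "O" acts as a sentinel so the last open span is flushed.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current span
        if label is not None:
            spans.append((label, start, i))
            label = None
        # A stray "I-" without a preceding "B-" is leniently treated as a start.
        if tag.startswith("B-") or tag.startswith("I-"):
            label, start = tag[2:], i
    return spans


def spans_to_fields(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Turn tagged tokens into a metadata record (first span per label wins)."""
    fields: Dict[str, str] = {}
    for label, start, end in bio_to_spans(tags):
        fields.setdefault(label, " ".join(tokens[start:end]))
    return fields


def span_f1(gold: List[str], pred: List[str]) -> float:
    """Span-level F1: a predicted span counts only if label and bounds match."""
    gold_spans, pred_spans = set(bio_to_spans(gold)), set(bio_to_spans(pred))
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference string, already tokenized.
tokens = ["Kim", "J.", "2020", "Deep", "Learning", "Review"]
gold = ["B-AUTHOR", "I-AUTHOR", "B-YEAR", "B-TITLE", "I-TITLE", "I-TITLE"]
pred = ["B-AUTHOR", "I-AUTHOR", "B-YEAR", "B-TITLE", "I-TITLE", "O"]

fields = spans_to_fields(tokens, gold)
score = span_f1(gold, pred)
```

Here the predicted title span stops one token early, so only two of the three gold spans match exactly. A tagger such as RoBERTa+CRF would produce the `pred` sequence; the grouping and scoring steps are independent of which model supplies the tags.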

Keywords
research reports, automatic reference extraction, deep learning language model, natural language processing, named entity recognition
Submission Date
2023-05-15
Revised Date
2023-06-03
Accepted Date
2023-06-10
