Automatic Extraction of References for Research Reports using Deep Learning Language Model

Han Yukyung; 한유경; Choi Wonsuk; 최원석; Lee Minchul; 이민철

doi:10.3743/KOSIM.2023.40.2.115

P-ISSN1013-0799
E-ISSN2586-2073
KCI

Vol.40 No.2

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073

2023, v.40 no.2, pp.115-135

https://doi.org/10.3743/KOSIM.2023.40.2.115

Yukyung Han (Korea Information Society Development Institute)
Wonsuk Choi (Korea Information Society Development Institute)
Minchul Lee (Kakao Enterprise Corp.)

Han, Y., Choi, W., & Lee, M. (2023). Automatic Extraction of References for Research Reports using Deep Learning Language Model. Journal of the Korean Society for Information Management, 40(2), 115-135, https://doi.org/10.3743/KOSIM.2023.40.2.115

Abstract

The purpose of this study is to assess the effectiveness of using deep learning language models to extract references automatically and create a reference database for research reports in an efficient manner. Unlike academic journals, research reports present difficulties in automatically extracting references due to variations in formatting across institutions. In this study, we addressed this issue by introducing the task of separating references from non-reference phrases, in addition to the commonly used metadata extraction task for reference extraction. The study employed datasets that included various types of references, such as those from research reports of a particular institution, academic journals, and a combination of academic journal references and non-reference texts. Two deep learning language models, namely RoBERTa+CRF and ChatGPT, were compared to evaluate their performance in automatic extraction. They were used to extract metadata, categorize data types, and separate original text. The research findings showed that the deep learning language models were highly effective, achieving maximum F1-scores of 95.41% for metadata extraction and 98.91% for categorization of data types and separation of the original text. These results provide valuable insights into the use of deep learning language models and different types of datasets for constructing reference databases for research reports including both reference and non-reference texts.
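The abstract reports F1-scores for metadata extraction but does not say how they were computed. As a minimal sketch, assuming the common named entity recognition convention of BIO tags scored by exact span match, the metric could be calculated as below. The label names (`AUTHOR`, `TITLE`, `YEAR`) and the helper functions are hypothetical and are not taken from the paper.

```python
from typing import List, Set, Tuple

# (label, start index, end index exclusive); hypothetical representation
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect labeled spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    label, start = None, 0
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            spans.add((label, start, i))
            label = None
        # Open a span on "B-", or on a stray "I-" with no span open.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


def span_f1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over labeled spans."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    precision = true_pos / len(pred_spans) if pred_spans else 0.0
    recall = true_pos / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if true_pos else 0.0
    return precision, recall, f1


# Toy reference string tokenized into 8 tokens (assumed labels).
gold = ["B-AUTHOR", "I-AUTHOR", "O", "B-TITLE", "I-TITLE", "I-TITLE", "O", "B-YEAR"]
# The prediction truncates the title span, so only 2 of 3 spans match.
pred = ["B-AUTHOR", "I-AUTHOR", "O", "B-TITLE", "I-TITLE", "O", "O", "B-YEAR"]
precision, recall, f1 = span_f1(gold, pred)
```

Under this convention a partially recovered field, such as the truncated title above, counts as both a false positive and a false negative, which makes the score stricter than token-level accuracy.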

keywords: research reports, automatic reference extraction, deep learning language model, natural language processing, named entity recognition

Submission Date: 2023-05-15

Revised Date: 2023-06-03

Accepted Date: 2023-06-10
