
Automatic Generation of Bibliographic Metadata with Reference Information for Academic Journals

Journal of the Korean Society for Library and Information Science, (P)1225-598X; (E)2982-6292
2022, v.56 no.3, pp.241-264
https://doi.org/10.4275/KSLIS.2022.56.3.241




Abstract

Bibliographic metadata helps researchers make effective use of the publications they need and grasp academic trends in their own fields. Manual creation of this metadata is costly and time-consuming, yet it is nontrivial to automate metadata construction with rule-based methods because article forms and styles vary widely across publishers and academic societies. To overcome these difficulties, this study proposes a two-step extraction process, based on rules and deep neural networks, for generating bibliographic metadata of scientific articles. First, the extraction target areas in an article are identified by a deep neural network-based model; then the details in those areas are analyzed and subdivided into the relevant metadata elements. The proposed model also includes a model for generating reference summary information, which separates the end of the body text from the starting point of the references, extracts individual references with an essential rule set, and identifies all the bibliographic items in each reference with a deep neural network. In addition, to examine the feasibility of a model that generates the bibliographic information of academic papers without pre- and post-processing, we conducted an in-depth comparative experiment with various settings and configurations. In this experiment, the method proposed in this paper showed higher performance.
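The rule-based part of the reference step (separating the end of the text from the start of the references, then extracting individual references) can be illustrated with a minimal sketch. The abstract does not give the paper's actual rule set, so the heading patterns, the entry-marker patterns, and the function names below are all hypothetical assumptions chosen for illustration. The deep neural network step that labels bibliographic items within each reference is not covered.

```python
import re

# Hypothetical heading rule: a line consisting only of a reference heading.
# These patterns are assumptions, not the rule set used in the paper.
REFERENCE_HEADING = re.compile(
    r"^\s*(references|bibliography|참고문헌)\s*$", re.IGNORECASE
)

# Hypothetical entry-marker rule: "[1]", "1." or "(1)" at the start of a line.
ENTRY_START = re.compile(r"^\s*(\[\d+\]|\d+\.|\(\d+\))\s+")


def find_reference_start(lines):
    """Return the index of the line after the last reference heading, or None."""
    start = None
    for i, line in enumerate(lines):
        if REFERENCE_HEADING.match(line):
            start = i + 1  # keep the last heading found in the document
    return start


def split_references(lines):
    """Split the reference area into individual reference strings."""
    start = find_reference_start(lines)
    if start is None:
        return []
    refs = []
    for line in lines[start:]:
        text = line.strip()
        if not text:
            continue  # skip blank lines between entries
        if ENTRY_START.match(line) or not refs:
            # A new entry begins: drop the marker and start a new reference.
            refs.append(ENTRY_START.sub("", line).strip())
        else:
            # A wrapped line: append it to the current reference.
            refs[-1] += " " + text
    return refs


sample = [
    "Body text ends here.",
    "References",
    "[1] Kim, J. (2020). A title.",
    "    Journal of Things, 1(2), 3-4.",
    "[2] Lee, S. (2021). Another title.",
]
for ref in split_references(sample):
    print(ref)
```

Fixed rules like these break easily when a style omits entry markers or when a wrapped line happens to begin with a number followed by a period, which is consistent with the abstract's point that purely rule-based extraction struggles with the variety of article styles.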

Keywords
NLP, Information Extraction, Reference Extraction, Metadata Extraction, Language Model
