
A Study on Automatic Classification of Subject Headings Using BERT Model

Journal of the Korean Society for Library and Information Science, (P)1225-598X; (E)2982-6292
2023, v.57 no.2, pp.435-452
https://doi.org/10.4275/KSLIS.2023.57.2.435
Yong-Gu Lee

Abstract

This study experimented with the automatic classification of subject headings using a BERT-based transfer learning model and analyzed its performance, both by main class of the Korean Decimal Classification (KDC) and by the category type of the subject headings. Six datasets were constructed from the Korean national bibliographies according to how frequently each subject heading was assigned, and titles were used as classification features. On the dataset containing 3,506 subject headings (1,539,076 records), classification performance reached a micro F1 score of 0.6059 and a macro F1 score of 0.5626. By KDC main class, performance was good for General Works, Natural Science, Technology, and Language, and low for Religion and Arts. By category type of subject heading, the plant, legal name, and product name categories showed high performance, whereas the national treasure/treasure category showed low performance. In larger datasets, the proportion of subject headings that cannot be assigned increases, which lowers final performance; improvements are therefore needed to raise classification performance for low-frequency subject headings.
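The abstract reports both micro F1 and macro F1. Micro F1 pools true positives, false positives, and false negatives over all headings, so frequent headings dominate. Macro F1 averages the per-heading F1 scores, so every heading counts equally. A macro score below the micro score is therefore consistent with weaker results on low-frequency headings. The following is a minimal sketch of the two averages, not the author's evaluation code. It assumes that each record carries a set of gold and predicted headings (multi-label). It also assumes that the macro average is taken over every heading that appears in either set. The abstract does not state these details, and the function name `micro_macro_f1` is hypothetical.

```python
from collections import defaultdict


def f1(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Hypothetical helper: gold and pred are lists of label sets, one set per record.

    Assumption: the macro average runs over every label seen in gold or pred.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # label -> [tp, fp, fn]
    for g, p in zip(gold, pred):
        for label in g & p:   # predicted and correct
            counts[label][0] += 1
        for label in p - g:   # predicted but not in the gold set
            counts[label][1] += 1
        for label in g - p:   # in the gold set but missed
            counts[label][2] += 1

    # Micro: sum the raw counts over all labels, then compute a single F1
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)

    # Macro: compute F1 per label, then take the unweighted mean
    macro = sum(f1(*c) for c in counts.values()) / len(counts) if counts else 0.0
    return micro, macro


if __name__ == "__main__":
    # Toy data: frequent heading "A" is always right; rare "B" and "C" are missed
    gold = [{"A"}, {"A", "B"}, {"C"}]
    pred = [{"A"}, {"A"}, {"B"}]
    micro, macro = micro_macro_f1(gold, pred)
    print(f"micro={micro:.4f} macro={macro:.4f}")  # micro=0.5714 macro=0.3333
```

In this toy example the single frequent heading is classified perfectly. It lifts the pooled micro score, while the two missed rare headings pull the macro average down much further.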

Keywords
Automatic Classification, Deep Learning, BERT Model, Automated Subject Indexing, Automated Subject Classification
