Research on Multi-faceted News Article Classification Models Classifying Subjects, Geographies and Genres

Journal of the Korean Society for Library and Information Science, (P)1225-598X; (E)2982-6292
2024, v.58 no.3, pp.65-89
https://doi.org/10.4275/KSLIS.2024.58.3.065
Hyojin Lee
Sung-Pil Choi

Abstract

This study developed models that classify news articles by topic, genre, and region using a Korean pre-trained language model. To this end, a new news article classification scheme was designed with reference to the classification schemes of domestic media outlets. The topic and genre classifiers were implemented as hierarchical models that link main categories to their subcategories, and their performance was compared with that of an integrated category model. The evaluation showed that the hierarchical models categorized articles more precisely than the integrated category model when categories were ambiguous or overlapping. For region, a model was built to classify articles into 18 categories; it achieved high performance because regional news articles clearly reflect regional characteristics in their text. The study demonstrates the effectiveness of classifying news articles from multiple perspectives (topic, genre, and region) and suggests the potential for a multi-dimensional news article classification service that meets user needs.
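
The difference between a hierarchical decision and an integrated (single-level) decision can be illustrated with a minimal sketch. This is not the authors' implementation: the category names, the scores, and the functions predict_hierarchical and predict_flat are hypothetical, the raw scores stand in for the outputs of a fine-tuned language model, and the integrated model is assumed here to be a single argmax over all subcategories.

```python
import math

# Hypothetical two-level label scheme (not the paper's actual categories).
HIERARCHY = {
    "politics": ["election", "diplomacy"],
    "economy": ["finance", "real_estate"],
}


def softmax(scores):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_hierarchical(main_scores, sub_scores):
    """Choose the main category first, then the best subcategory under it."""
    mains = list(main_scores)
    main_probs = softmax([main_scores[m] for m in mains])
    main = mains[main_probs.index(max(main_probs))]
    # Only children of the predicted main category compete at the second level.
    children = HIERARCHY[main]
    child_probs = softmax([sub_scores[c] for c in children])
    sub = children[child_probs.index(max(child_probs))]
    return main, sub


def predict_flat(sub_scores):
    """Assumed integrated baseline: one argmax over every subcategory."""
    sub = max(sub_scores, key=sub_scores.get)
    main = next(m for m, subs in HIERARCHY.items() if sub in subs)
    return main, sub


# Made-up scores for one article whose subcategory signal is ambiguous.
main_scores = {"politics": 2.0, "economy": 0.5}
sub_scores = {"election": 0.9, "diplomacy": 0.4, "finance": 1.1, "real_estate": 0.2}

hierarchical_result = predict_hierarchical(main_scores, sub_scores)
flat_result = predict_flat(sub_scores)
```

In this toy case the main-level decision constrains the subcategory choice, so the two approaches can disagree when subcategory scores overlap across main categories; whether the hierarchical choice is the correct one depends on the data, which is what the paper's evaluation addresses.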

Keywords
BERT Model, News Article Classification, Hierarchical Classification Model, Multi-Class Classification Model, Multidimensional Classification
