  • P-ISSN 1013-0799
  • E-ISSN 2586-2073
  • KCI

An Analytical Study on Automatic Classification of Domestic Journal Articles Using Random Forest

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2019, v.36 no.2, pp.57-77
https://doi.org/10.3743/KOSIM.2019.36.2.057

Abstract

Random Forest (RF), a representative ensemble technique, was applied to the automatic classification of journal articles in the field of library and information science. In particular, I ran experiments on the main factors that affect classification performance (number of trees, feature selection, and training set size) for the task of automatically assigning class labels to domestic journal articles. Through these experiments, I explored ways to optimize the performance of RF on imbalanced datasets in real environments. The results suggest that, for the automatic classification of domestic journal articles, RF can be expected to perform best with a tree number interval of 100 to 1000 (C), a small feature set (10%) selected by the chi-square statistic (CHI), and the largest training sets (9-10 years).
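
As an illustration of the chi-square (CHI) feature selection step mentioned in the abstract, the following is a minimal sketch in plain Python. It is not the author's implementation: the function names, the toy documents, and the one-vs-rest treatment of a single target label are assumptions made for this example. It scores each term against one class label using the standard 2x2 contingency form of the chi-square statistic and keeps the top fraction of terms (10% by default, matching the setting reported above).

```python
import math


def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents with the target label that contain the term
    b: documents without the target label that contain the term
    c: documents with the target label that lack the term
    d: documents without the target label that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (an empty row or column) carries no signal.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_top_features(docs, labels, target, fraction=0.10):
    """Rank terms by chi-square against `target` and keep the top fraction.

    docs:   list of sets of terms (hypothetical bag-of-words input)
    labels: list of sets of class labels, one set per document
            (sets allow more than one label per article)
    """
    vocab = sorted(set().union(*docs))
    scores = {}
    for term in vocab:
        a = b = c = d = 0
        for terms, label_set in zip(docs, labels):
            has_term = term in terms
            is_target = target in label_set
            if has_term and is_target:
                a += 1
            elif has_term:
                b += 1
            elif is_target:
                c += 1
            else:
                d += 1
        scores[term] = chi_square(a, b, c, d)
    # Always keep at least one term; ties are broken alphabetically.
    k = max(1, math.ceil(len(vocab) * fraction))
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k]


# Toy data, invented for illustration only.
docs = [
    {"retrieval", "index"},
    {"retrieval", "query"},
    {"library", "index"},
    {"library", "service"},
]
labels = [{"IR"}, {"IR"}, {"LIB"}, {"LIB"}]
selected = select_top_features(docs, labels, "IR", fraction=0.4)
```

Note that the statistic is two-sided: a term that is strongly absent from the target class scores as high as one that is strongly present. The reduced term set would then be passed to the RF classifier; that training step is omitted here.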

Keywords
automatic classification, automatic annotation, digital curation, journal articles, random forest (RF), multi-label classification, imbalanced data, feature selection
