An Analytical Study on Automatic Classification of Domestic Journal Articles Using Random Forest

Pan Jun Kim (김판준)

doi:10.3743/KOSIM.2019.36.2.057

P-ISSN 1013-0799
E-ISSN 2586-2073
Indexed in KCI (Korea Citation Index)

Journal of the Korean Society for Information Management, 2019, Vol. 36, No. 2, pp. 57-77

Kim, P. J. (2019). An Analytical Study on Automatic Classification of Domestic Journal Articles Using Random Forest. Journal of the Korean Society for Information Management, 36(2), 57-77. https://doi.org/10.3743/KOSIM.2019.36.2.057

Abstract

Random Forest (RF), a representative ensemble technique, was applied to the automatic classification of journal articles in the field of library and information science. In particular, I conducted experiments on the main factors that affect classification performance, namely the number of trees, feature selection, and training set size, when automatically assigning class labels to articles in domestic journals. Through these experiments, I explored ways to optimize RF performance on the imbalanced datasets found in real-world environments. The results suggest that, for the automatic classification of domestic journal articles, RF can be expected to achieve its best classification performance with a tree-number interval of 100 to 1,000 (C), a small feature set (10%) selected by the chi-square statistic (CHI), and the largest training sets (9 to 10 years).
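One component of the best-performing setup above is chi-square (CHI) feature selection that keeps only a small share (10%) of the vocabulary. The following is a minimal, stdlib-only sketch of that idea, not the author's implementation. It assumes the common 2x2 term/class contingency form of the statistic, χ²(t, c) = N(AD − CB)² / ((A+C)(B+D)(A+B)(C+D)), and, because articles may carry several labels, it scores each term by its maximum χ² over all classes. The paper's exact variant (max vs. average over classes, tokenization, term weighting) is not stated here; the function names and the toy articles are hypothetical.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for one term/class 2x2 contingency table.

    a: articles in the class that contain the term
    b: articles outside the class that contain the term
    c: articles in the class that lack the term
    d: articles outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # A zero marginal means the term or class carries no information here.
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_features_chi2(docs, labels, ratio=0.10):
    """Keep the top `ratio` share of terms, ranked by max chi-square over classes.

    docs:   list of token lists, one per article (hypothetical preprocessing)
    labels: list of label sets; multi-label, so an article may have several classes
    Returns (selected_terms, scores).
    """
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    classes = sorted(set().union(*labels))
    n = len(docs)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in classes:
            pairs = list(zip(doc_sets, labels))
            a = sum(1 for s, ls in pairs if term in s and cls in ls)
            b = sum(1 for s, ls in pairs if term in s and cls not in ls)
            c = sum(1 for s, ls in pairs if term not in s and cls in ls)
            d = n - a - b - c
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    # Number of terms to keep, rounded to the nearest integer, at least one.
    k = max(1, round(len(vocab) * ratio))
    # Highest score first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Hypothetical toy data: four tokenized articles with their class labels.
docs = [
    ["library", "catalog", "metadata"],
    ["library", "catalog", "user"],
    ["retrieval", "ranking", "query"],
    ["retrieval", "query", "user"],
]
labels = [{"cataloging"}, {"cataloging"}, {"IR"}, {"IR"}]

selected, scores = select_features_chi2(docs, labels, ratio=0.5)
print(selected)  # terms that separate the two classes perfectly rank first
```

In this toy set, "user" appears equally in both classes and scores 0, so it is dropped first. The selected terms would then serve as input features for a random forest, for example scikit-learn's `RandomForestClassifier`, whose `n_estimators` parameter sets the number of trees, the factor the study varies over intervals such as 100 to 1,000.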

Keywords: automatic classification, automatic annotation, digital curation, journal articles, random forest (RF), multi-label classification, imbalanced data, feature selection
