An Experimental Study on Feature Ranking Schemes for Text Classification

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2023, v.40 no.1, pp.1-21
https://doi.org/10.3743/KOSIM.2023.40.1.001
Pan Jun Kim (Silla University)

Abstract

This study examines the performance of feature ranking schemes as an efficient feature selection method for text classification. To date, most feature ranking schemes have been based on document frequency, and relatively few have used term frequency. Therefore, single ranking metrics based on either term frequency or document frequency were first evaluated as feature selection methods for text classification, and combination ranking schemes that use both were then evaluated. Classification experiments were conducted on two data sets (Reuters-21578 and 20NG) with five classifiers (SVM, NB, ROC, TRA, RNN), and 5-fold cross-validation and t-tests were applied to ensure the reliability of the results. Among the single ranking schemes, the document frequency-based metric (chi) performed well overall. In addition, there was no significant difference between the best-performing single ranking metric and the combination ranking schemes. Therefore, when sufficient training documents are available, using the document frequency-based single ranking metric (chi) is the more efficient feature selection method for text classification.
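The abstract does not give the formula behind its document frequency-based chi metric. As a rough illustration only, the following sketch computes the standard chi-square statistic for each term/category pair from a 2×2 table of document counts, then ranks terms by their maximum score across categories. The max-over-categories aggregation, the function names `chi_square` and `rank_features_chi`, and the toy corpus are assumptions for illustration; the paper's exact formulation and aggregation may differ.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for a 2x2 term/category table of document counts.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (e.g. the term occurs in every document): no evidence.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def rank_features_chi(docs: List[Set[str]], labels: List[str]) -> List[Tuple[str, float]]:
    """Rank terms by max chi-square over categories, using document frequency only.

    Assumption: aggregating per-category scores with max(); other
    aggregations (e.g. a category-weighted average) are also common.
    """
    n = len(docs)
    cat_sizes: Dict[str, int] = defaultdict(int)           # documents per category
    df: Dict[str, int] = defaultdict(int)                  # documents containing term
    df_in_cat: Dict[Tuple[str, str], int] = defaultdict(int)
    for terms, label in zip(docs, labels):
        cat_sizes[label] += 1
        for t in terms:  # each term counted once per document (document frequency)
            df[t] += 1
            df_in_cat[(t, label)] += 1

    scores: Dict[str, float] = {}
    for t in df:
        best = 0.0
        for cat, size in cat_sizes.items():
            a = df_in_cat.get((t, cat), 0)
            b = df[t] - a
            c = size - a
            d = n - size - b
            best = max(best, chi_square(a, b, c, d))
        scores[t] = best
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical toy corpus: each document is reduced to its set of terms.
    docs = [{"wheat", "price"}, {"wheat", "export"}, {"game", "score"}, {"game", "price"}]
    labels = ["grain", "grain", "sport", "sport"]
    ranking = rank_features_chi(docs, labels)
    top_k = [term for term, _ in ranking[:2]]
    print(ranking)
    print(top_k)  # ['game', 'wheat']
```

Keeping the top-k terms of such a ranking yields the reduced feature set passed to a classifier. In the toy data, `price` appears once in each category and therefore scores 0, while `game` and `wheat`, each confined to one category, score highest.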

Keywords
text classification, text categorization, feature selection, feature ranking schemes, document frequency, term frequency
Submission Date: 2023-02-01
Revised Date: 2023-03-08
Accepted Date: 2023-03-17
