INTERNATIONAL JOURNAL OF CONTENTS, (P)1738-6764; (E)2093-7504
2005, v.1 no.2, pp.26-30
Kang Yun-Hee

Abstract

With the rapid growth of information on the Internet, it is becoming increasingly difficult to find and organize useful information. To reduce information overload, automatic text classification is needed to handle the enormous number of documents. A Support Vector Machine (SVM) is a model whose output is computed as a weighted sum of kernel function outputs. This paper describes a classifier for web documents in the field of Information Technology that uses an SVM to learn a model constructed from the training sets and their representative terms. The basic idea is to exploit, by simple statistical methods, the distribution of the representative terms' meanings in the coherent thematic texts of each category. The vector-space model is applied to represent the documents in each category, using a feature selection scheme based on TF-IDF. To the feature selection we apply a category factor, which represents the effect of a term within a category. Experiments show the results of categorization and their correlation with vector length.
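
As a rough illustration of the two building blocks named in the abstract, the following Python sketch computes TF-IDF weights, keeps the highest-weighted terms as features, and evaluates an SVM-style decision function as a weighted sum of kernel outputs. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the particular TF-IDF variant (relative term frequency times log inverse document frequency), the linear kernel, and the coefficients are all hypothetical, and the category factor is omitted because the abstract does not define it.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def select_top_terms(vectors, k):
    """Keep the k terms with the highest summed TF-IDF weight (assumed criterion)."""
    totals = Counter()
    for vec in vectors:
        totals.update(vec)  # adds the weights term by term
    return [t for t, _ in totals.most_common(k)]


def linear_kernel(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def svm_decision(x, support_vectors, coeffs, bias, kernel=linear_kernel):
    """Weighted sum of kernel outputs: sum_i coeffs[i] * K(sv_i, x) + bias."""
    return sum(a * kernel(sv, x) for a, sv in zip(coeffs, support_vectors)) + bias


# Toy data, invented for illustration only.
docs = [["svm", "kernel", "svm"], ["web", "kernel"]]
vecs = tfidf_vectors(docs)
features = select_top_terms(vecs, 1)

# Hypothetical support vectors and coefficients; a trained SVM would supply these.
score = svm_decision({"svm": 1.0}, [{"svm": 2.0}, {"web": 1.0}], [0.5, -0.5], 0.1)
```

In this toy example a term that occurs in every document ("kernel") receives weight zero, so it is never selected as a feature; the sign of `score` would give the predicted class in a two-class setting.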

Keywords
Document classification, Vector-space model, SVM, Feature selection, Category factor
