Improving the Performance of SVM Text Categorization with Inter-document Similarities

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2005, v.22 no.3, pp.261-287
https://doi.org/10.3743/KOSIM.2005.22.3.261

Abstract

The purpose of this paper is to explore ways to improve the performance of an SVM (Support Vector Machine) text classifier using inter-document similarities. SVMs are a powerful machine learning technique for automatic document classification. This paper proposes an SVM approach to text categorization based on feature representation with document vectors. In this approach, document vectors are used as feature values instead of term weights. Experiments show that an SVM classifier with document vector features can improve document classification performance. For the sake of run-time efficiency, two methods are developed: one selects document vector features, and the other uses category centroid vector features instead. Experiments on these two methods show that they can improve on the performance of conventional methods with index term features.
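The feature representation described in the abstract can be illustrated with a small sketch. It is not the paper's implementation. It assumes that each feature value is the cosine similarity between a document and a reference vector, where the reference vectors are either the training documents themselves or one centroid per category. The raw term-frequency weighting, the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def term_vector(text):
    """Sparse raw term-frequency vector (a stand-in for real term weighting)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def category_centroids(train_vectors, labels):
    """Average the term vectors of each category into one centroid vector."""
    sums = defaultdict(Counter)
    counts = Counter(labels)
    for vec, label in zip(train_vectors, labels):
        sums[label].update(vec)  # Counter.update adds the weights
    return {
        label: {t: w / counts[label] for t, w in total.items()}
        for label, total in sums.items()
    }


def similarity_features(doc_vec, reference_vectors):
    """One feature per reference vector: similarity of the document to it."""
    return [cosine(doc_vec, ref) for ref in reference_vectors]


# Toy training set (hypothetical data, for illustration only).
train_texts = [
    "stock market price",
    "market trade stock",
    "soccer match goal",
    "goal keeper match",
]
labels = ["finance", "finance", "sports", "sports"]
train_vecs = [term_vector(t) for t in train_texts]

new_doc = term_vector("stock price rally")

# Variant 1: one feature per training document.
doc_features = similarity_features(new_doc, train_vecs)

# Variant 2: one feature per category centroid (far fewer dimensions).
cents = category_centroids(train_vecs, labels)
category_order = sorted(cents)
centroid_features = similarity_features(
    new_doc, [cents[c] for c in category_order]
)
```

In this sketch the document-based variant yields as many features as there are reference documents, while the centroid variant yields one feature per category, which is the kind of dimensionality reduction that motivates the run-time efficiency argument. Either feature list could then be passed to any SVM implementation in place of index term weights.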

Keywords

automatic document classification, text categorization, SVM classifier, classification features, document similarity
