The Study on the Effective Automatic Classification of Internet Document Using the Machine Learning

Journal of Korean Library and Information Science Society, (P)2466-2542;
2001, v.32 no.3, pp.307-330
노영희

Abstract

This study evaluated the performance of categorization methods that use the kNN classifier. Most sample-based automatic text categorization techniques, such as the kNN classifier, reduce the feature set of the training documents. We sought to determine what percentage reduction in the feature set yields high performance. In addition, the kNN classifier must find the k training documents most similar to each test document. We sought to identify the most appropriate value of k through experiments.
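The two parameters the abstract names, the fraction of features retained and the number of neighbors k, can be illustrated with a minimal sketch. The abstract does not state how features were ranked, how similarity was measured, or how neighbor votes were combined, so the choices below (document-frequency ranking, cosine similarity over term counts, similarity-weighted voting) are assumptions made for illustration only, as are all function names and the toy data.

```python
import math
from collections import Counter


def select_features(docs, keep_fraction):
    """Keep the top `keep_fraction` of terms, ranked by document frequency.

    Document-frequency ranking is an assumption; the abstract does not
    say which feature selection criterion was used.
    """
    df = Counter(term for tokens in docs for term in set(tokens))
    ranked = sorted(df, key=lambda t: (-df[t], t))  # ties broken alphabetically
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return set(ranked[:n_keep])


def vectorize(tokens, vocab):
    """Term-count vector restricted to the selected vocabulary."""
    return Counter(t for t in tokens if t in vocab)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(test_tokens, train, vocab, k):
    """Label a test document by similarity-weighted vote of its k nearest
    training documents. `train` is a list of (tokens, label) pairs."""
    query = vectorize(test_tokens, vocab)
    scored = sorted(
        ((cosine(query, vectorize(tokens, vocab)), label) for tokens, label in train),
        key=lambda pair: -pair[0],
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy training set, not data from the study.
train = [
    ("library catalog classification".split(), "library"),
    ("library classification scheme".split(), "library"),
    ("neural network learning".split(), "ml"),
    ("machine learning classifier".split(), "ml"),
]
full_vocab = select_features([tokens for tokens, _ in train], keep_fraction=1.0)
half_vocab = select_features([tokens for tokens, _ in train], keep_fraction=0.5)

label_full = knn_classify("learning classifier network".split(), train, full_vocab, k=3)
label_half = knn_classify("learning classifier network".split(), train, half_vocab, k=3)
```

An experiment of the kind the abstract describes would repeat the classification step over a grid of `keep_fraction` and `k` values on held-out test documents and compare the resulting accuracy.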

Keywords
Automatic Text Categorization Techniques, kNN Classifier
