기계학습을 기반으로 한 인터넷 학술문서의 효과적 자동분류에 관한 연구

The Study on the Effective Automatic Classification of Internet Document Using the Machine Learning

Journal of Korean Library and Information Science Society (한국도서관·정보학회지), (P)2466-2542;
2001, v.32 no.3, pp.307-330
노영희 (Ewha Womans University International Information Center)

Abstract

This study evaluated the performance of text categorization with the kNN classifier. Most example-based automatic text categorization techniques, including the kNN classifier, reduce the feature set of the training documents, so we examined what percentage of feature reduction yields high performance. In addition, the kNN classifier must find the k training documents most similar to each test document, so we experimentally determined the most appropriate value of k.

Keywords
Automatic Text Categorization Techniques, kNN Classifier
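
The two experimental variables named in the abstract, the share of features kept and the value of k, can be illustrated with a minimal sketch. The abstract does not state the feature selection criterion, the similarity measure, or the voting rule, so the sketch assumes document frequency ranking, cosine similarity, and similarity-weighted voting. The function names and the toy documents are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def select_features(docs, keep_fraction):
    """Keep the top share of terms ranked by document frequency (assumed criterion)."""
    df = Counter(term for doc in docs for term in set(doc))
    ranked = sorted(df, key=lambda t: (-df[t], t))  # ties broken alphabetically
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return set(ranked[:n_keep])


def vectorize(doc, vocab):
    """Term-frequency vector restricted to the selected features."""
    return Counter(t for t in doc if t in vocab)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(test_doc, train_docs, train_labels, vocab, k):
    """Assign the label with the largest summed similarity among the k nearest documents."""
    query = vectorize(test_doc, vocab)
    scored = sorted(
        ((cosine(query, vectorize(d, vocab)), y) for d, y in zip(train_docs, train_labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, label in scored:
        votes[label] += sim
    return votes.most_common(1)[0][0]


def accuracy(test_docs, test_labels, train_docs, train_labels, keep_fraction, k):
    """Share of test documents classified correctly for one (keep_fraction, k) setting."""
    vocab = select_features(train_docs, keep_fraction)
    hits = sum(
        knn_classify(d, train_docs, train_labels, vocab, k) == y
        for d, y in zip(test_docs, test_labels)
    )
    return hits / len(test_docs)


# Hypothetical toy corpus: tokenized documents with two categories.
train_docs = [
    ["library", "classification", "catalog"],
    ["library", "catalog", "index"],
    ["neural", "network", "learning"],
    ["machine", "learning", "network"],
]
train_labels = ["lis", "lis", "ml", "ml"]
test_docs = [["catalog", "library"], ["learning", "network"]]
test_labels = ["lis", "ml"]

# Grid over the two variables the abstract names: feature share and k.
for keep_fraction in (0.5, 1.0):
    for k in (1, 3):
        score = accuracy(test_docs, test_labels, train_docs, train_labels, keep_fraction, k)
        print(f"keep={keep_fraction:.0%} k={k} accuracy={score:.2f}")
```

On a real collection the grid would cover more reduction levels and k values, and the score would be computed on held-out documents; the toy corpus here only shows the shape of the experiment.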
