
The Experimental Study on the Relationship between Hierarchical Agglomerative Clustering and Compound Nouns Indexing

Journal of the Korean Society for Library and Information Science, (P)1225-598X; (E)2982-6292
2004, v.38 no.4, pp.179-192


Abstract

In this paper, we show that the results of document clustering can change dramatically depending on how compound nouns are indexed. First, we introduce an automatic indexing engine specialized for Korean word analysis, which also serves as the backbone of an automatic document clustering system. We then describe in detail the hierarchical agglomerative clustering (HAC) method, one of the most widely used clustering methods today. The experiments reported in the final part of the paper lead to the conclusion that the different modes of indexing compound nouns affect the outcome of HAC.
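The effect described in the abstract can be illustrated with a minimal sketch. It is not the paper's engine: the documents are hypothetical, pre-segmented English toy data (a tuple stands for a compound noun), the mode names `whole`, `parts`, and `both` are invented for illustration, and cosine similarity with group-average linkage is an assumption, since the abstract does not state which similarity measure or linkage criterion was used.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: a tuple is a compound noun, a string a simple noun.
DOCS = [
    [("information", "retrieval"), ("document", "clustering"), "experiment"],
    [("image", "processing"), ("speech", "recognition"), "experiment"],
    [("information", "theory"), ("document", "indexing"), "survey"],
    [("image", "compression"), ("speech", "synthesis"), "survey"],
]

def index_terms(doc, mode):
    """Build a bag of index terms; `mode` decides how compounds are indexed."""
    terms = []
    for item in doc:
        if isinstance(item, tuple):
            if mode in ("whole", "both"):
                terms.append("_".join(item))  # compound kept as one term
            if mode in ("parts", "both"):
                terms.extend(item)            # compound split into components
        else:
            terms.append(item)
    return Counter(terms)

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hac(vectors, k):
    """Group-average agglomerative clustering, stopped at k clusters."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best, best_pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            sims = [cosine(vectors[i], vectors[j])
                    for i in clusters[x] for j in clusters[y]]
            avg = sum(sims) / len(sims)
            if avg > best:
                best, best_pair = avg, (x, y)
        x, y = best_pair                      # x < y, so deleting y is safe
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]
    return sorted(clusters)

def cluster_by_mode(docs, mode, k=2):
    """Index every document in the given mode, then cluster."""
    return hac([index_terms(d, mode) for d in docs], k)

for mode in ("whole", "parts", "both"):
    print(mode, cluster_by_mode(DOCS, mode))
```

On this toy corpus, indexing compounds as whole terms groups documents by their shared simple nouns, while splitting compounds into components groups them by shared components, so the same four documents fall into different clusters under the two modes.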

Keywords
Automatic Indexing, Document Clustering, Korean Morphological Analysis

