The Experimental Study on the Relationship between Hierarchical Agglomerative Clustering and Compound Nouns Indexing

조현양; 최성필

P-ISSN 1225-598X
E-ISSN 2982-6292

Vol.38 No.4

Original title (Korean): 계층적 결합형 문서 클러스터링 시스템과 복합명사 색인방법과의 연관관계 연구

Journal of the Korean Society for Library and Information Science (한국문헌정보학회지), P-ISSN 1225-598X; E-ISSN 2982-6292

2004, vol. 38, no. 4, pp. 179-192

조현양 (Kyonggi University)
최성필 (Korea Institute of Science and Technology Information)

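Citation: 조현양, & 최성필. (2004). The Experimental Study on the Relationship between Hierarchical Agglomerative Clustering and Compound Nouns Indexing. Journal of the Korean Society for Library and Information Science, 38(4), 179-192.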

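Abstract (translated from the Korean)

This paper analyzes the output of a hierarchical agglomerative document clustering system under several different methods of indexing compound nouns. It first describes the Korean indexing engine and the HAC (Hierarchical Agglomerative Clustering) engine, and then the three compound-noun analysis modes that the Korean indexing engine provides. It also illustrates the features of the implemented clustering engine and the techniques used to speed it up. In the experiments, document clustering is performed under each of the three compound-noun indexing methods, and the analysis of the results shows that the compound-noun indexing method directly affects the outcome of document clustering.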

keywords: Automatic Indexing, Document Clustering, Korean Morphological Analysis

Abstract

In this paper, we show that the result of document clustering can change dramatically depending on how compound nouns are indexed. First, we introduce the automatic indexing engine specialized for Korean word analysis, which also serves as the backbone of the automatic document clustering system. We then describe in detail the hierarchical agglomerative clustering (HAC) method, one of the widely used clustering methodologies today. The experiments carried out in the final part of the paper lead to the conclusion that the mode used for indexing compound nouns affects the outcome of HAC.
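
The effect described in the abstract can be illustrated with a minimal sketch. It is not the authors' engine: the three mode names (`whole`, `parts`, `both`), the tiny `COMPOUNDS` lexicon, the English stand-in tokens, cosine similarity over term counts, and average linkage are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical lexicon: compound noun -> component nouns (English stand-ins).
COMPOUNDS = {"documentclustering": ["document", "clustering"]}

# Toy corpus; each document is already tokenized into nouns.
DOCS = [
    ["documentclustering", "evaluation"],
    ["document", "clustering", "method"],
    ["evaluation", "metric", "method"],
]

def index_terms(tokens, mode):
    """Build a term-count vector under one illustrative compound-noun mode.

    'whole': keep the compound as a single index term
    'parts': index only its component nouns
    'both' : index the compound and its components
    """
    if mode not in ("whole", "parts", "both"):
        raise ValueError(f"unknown mode: {mode}")
    terms = []
    for tok in tokens:
        parts = COMPOUNDS.get(tok)
        if parts is None or mode == "whole":
            terms.append(tok)
        elif mode == "parts":
            terms.extend(parts)
        else:  # 'both'
            terms.append(tok)
            terms.extend(parts)
    return Counter(terms)

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hac(vectors, n_clusters):
    """Average-linkage HAC: repeatedly merge the most similar cluster pair."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return sorted(sorted(c) for c in clusters)

def cluster_with_mode(docs, mode, n_clusters):
    """Index every document under `mode`, then cluster the resulting vectors."""
    return hac([index_terms(d, mode) for d in docs], n_clusters)

if __name__ == "__main__":
    for mode in ("whole", "parts", "both"):
        print(mode, cluster_with_mode(DOCS, mode, 2))
```

On this toy corpus, the `whole` mode groups documents 0 and 2, because the undecomposed compound shares no term with document 1, while the `parts` and `both` modes group documents 0 and 1. The same clustering procedure therefore yields different partitions purely because of the indexing choice.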
