Selection of Cluster Hierarchy Depth and Initial Centroids in Hierarchical Clustering using K-Means Algorithm

이신원; 안동언; 정성종

doi:10.3743/KOSIM.2004.21.4.173

P-ISSN 1013-0799
E-ISSN 2586-2073
KCI

Vol.21 No.4

Journal of the Korean Society for Information Management (정보관리학회지), P-ISSN 1013-0799; E-ISSN 2586-2073

2004, v.21 no.4, pp.173-185

https://doi.org/10.3743/KOSIM.2004.21.4.173

이신원 (Jungwon University)
안동언 (Jeonbuk National University)
정성종 (Jeonbuk National University)

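이신원, 안동언, & 정성종. (2004). Selection of Cluster Hierarchy Depth and Initial Centroids in Hierarchical Clustering using K-Means Algorithm. Journal of the Korean Society for Information Management, 21(4), 173-185. https://doi.org/10.3743/KOSIM.2004.21.4.173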

Abstract (translated from Korean)

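As information and communication technology advances, the volume of information grows and user queries return ever longer result lists, so fast, high-quality document clustering algorithms play an important role. Many papers report good performance with hierarchical clustering methods, but these methods take a long time to run. The K-means algorithm, by contrast, offers a way to reduce time complexity. In this paper, we implemented Condor, a hierarchical clustering system, so that it supports simple, high-quality, and efficient information retrieval. The system uses the K-Means algorithm and, by adjusting the cluster hierarchy depth and the initial centroid values, achieved a precision of 88%.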

keywords: document clustering, K-Means algorithm, cluster hierarchy depth, cluster initial value, hierarchical clustering, cluster centroid

Abstract

Fast, high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, with a large number of variables, K-means has a time complexity that is linear in the number of documents, but it is thought to produce inferior clusters. In this paper, we compare the Condor system, which uses the K-Means algorithm, against the conventional method in which the initial centroids are fixed in advance, and find that our method improves performance considerably.
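
The abstract names two tuning knobs, cluster hierarchy depth and initial centroids, without giving the procedure. The sketch below is only an illustration of how those knobs can interact when K-means is applied recursively to build a hierarchy; it is not the Condor implementation. The names `build_hierarchy`, `farthest_point_seeds`, and `max_depth` are hypothetical, and farthest-point seeding is a stand-in heuristic assumed here for illustration, not the initialization method proposed in the paper.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    return [sum(col) / len(points) for col in zip(*points)]


def farthest_point_seeds(points: List[Vector], k: int) -> List[Vector]:
    """Assumed seeding heuristic: start from the first point, then
    repeatedly add the point farthest from all seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(squared_distance(p, s) for s in seeds)))
    return seeds


def kmeans(points: List[Vector], centroids: List[Vector], max_iter: int = 20) -> List[List[Vector]]:
    """Lloyd's algorithm from caller-supplied initial centroids.
    Returns one group of points per centroid (a group may be empty)."""
    centroids = [list(c) for c in centroids]
    groups: List[List[Vector]] = []
    for _ in range(max_iter):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: squared_distance(p, centroids[i]))
            groups[nearest].append(p)
        updated = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:  # assignments are stable
            break
        centroids = updated
    return groups


def build_hierarchy(points: List[Vector], k: int, max_depth: int, depth: int = 0) -> Dict:
    """Split each cluster into up to k children until max_depth is reached."""
    node: Dict = {"depth": depth, "size": len(points), "children": []}
    if depth >= max_depth or len(points) <= k:
        node["points"] = points  # leaf: keep the documents here
        return node
    groups = [g for g in kmeans(points, farthest_point_seeds(points, k)) if g]
    if len(groups) < 2:  # no useful split, so stop descending
        node["points"] = points
        return node
    node["children"] = [build_hierarchy(g, k, max_depth, depth + 1) for g in groups]
    return node


# Toy two-dimensional "document vectors" forming two obvious groups.
docs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
tree = build_hierarchy(docs, k=2, max_depth=1)
print([child["size"] for child in tree["children"]])
```

In this sketch a larger `max_depth` yields finer-grained leaf clusters at the cost of more K-means runs, while the seeding function determines where each run starts; both are the kind of parameters the abstract says were adjusted.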
