Hierarchic Document Clustering in OPAC

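This study was conducted to identify the optimal hierarchic clustering model for applying hierarchic clustering in an OPAC, so that library holdings can be classified into a hierarchical structure and browsed. An experimental collection was built from monographs and theses in library and information science, and the following were tested as variables: indexing techniques (automatic indexing of title words, and integrated indexing with controlled terms), term weighting techniques (absolute frequency and binary frequency), similarity coefficients (Dice, Jaccard, Pearson, cosine, and squared Euclidean), and clustering methods (between-groups average linkage, within-groups average linkage, and complete linkage). Apart from the between-groups average linkage method and the squared Euclidean measure, the remaining similarity coefficients and clustering methods produced relatively good clusters; the best clusters were produced when integrated controlled-term indexing with binary weights was clustered with complete linkage or between-groups average linkage. However, between-groups average linkage with the Jaccard coefficient was closer to the decimal classification structure.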

keywords: online catalog, OPAC, document clustering, hierarchic clustering, automatic classification, browsing, similarity coefficient

Abstract

This study seeks a hierarchic clustering model for document classification and browsing in OPAC systems. Two automatic indexing techniques (with and without controlled terms), two term weighting methods (term frequency and binary weight), five similarity coefficients (Dice, Jaccard, Pearson, Cosine, and Squared Euclidean), and three hierarchic clustering algorithms (Between Average Linkage, Within Average Linkage, and Complete Linkage) were tested on a collection of 175 books and theses on library and information science. The best document clusters resulted from the Between Average Linkage or Complete Linkage method with the Jaccard or Dice coefficient, applied to automatic indexing with controlled terms represented as binary vectors. The clusters from Between Average Linkage with the Jaccard coefficient more closely resembled a decimal classification structure.
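The combination the abstract singles out (binary term vectors, the Jaccard or Dice coefficient, and between-groups average or complete linkage) can be illustrated with a minimal pure-Python sketch. This is not the study's implementation: the four toy documents, their index terms, and the function names `jaccard`, `dice`, `cluster_similarity`, and `agglomerate` are all assumptions made for illustration. Each document is held as a set of terms, which is equivalent to a binary vector.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient: shared terms divided by all distinct terms."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice coefficient: twice the shared terms divided by the summed set sizes."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cluster_similarity(c1, c2, docs, sim, linkage):
    """Similarity between two clusters, each a tuple of document ids."""
    pair_sims = [sim(docs[i], docs[j]) for i in c1 for j in c2]
    if linkage == "complete":
        # Complete linkage: the least similar cross-cluster pair decides.
        return min(pair_sims)
    # Between-groups average linkage: mean over all cross-cluster pairs.
    return sum(pair_sims) / len(pair_sims)


def agglomerate(docs, sim=jaccard, linkage="average"):
    """Repeatedly merge the most similar pair of clusters; return the merge history."""
    clusters = [(doc_id,) for doc_id in sorted(docs)]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(
            combinations(clusters, 2),
            key=lambda pair: cluster_similarity(pair[0], pair[1], docs, sim, linkage),
        )
        score = cluster_similarity(c1, c2, docs, sim, linkage)
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 + c2)
        merges.append((c1, c2, score))
    return merges


# Hypothetical toy collection: document id -> set of index terms (binary weights).
docs = {
    "d1": {"library", "catalog", "opac"},
    "d2": {"library", "catalog", "search"},
    "d3": {"clustering", "similarity"},
    "d4": {"clustering", "similarity", "hierarchy"},
}

merges = agglomerate(docs, sim=jaccard, linkage="average")
for left, right, score in merges:
    print(left, "+", right, f"similarity={score:.3f}")
```

The merge history read from first to last gives the hierarchy from the leaves upward, which is the structure a browsing interface would expose. Passing `sim=dice` or `linkage="complete"` swaps in the other options named in the abstract without changing the rest of the sketch.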

Vol.21 No.1

OPAC에서 자동분류 열람을 위한 계층 클러스터링 연구 (A Study of Hierarchic Clustering for Automatic Classification and Browsing in OPAC)

정보관리학회지