A Comparative Study on Clustering Methods for Grouping Related Tags

Journal of the Korean Society for Library and Information Science, (P)1225-598X; (E)2982-6292
2009, v.43 no.3, pp.399-416
https://doi.org/10.4275/KSLIS.2009.43.3.399

Abstract

This study compared methods for clustering related tags in order to improve search and exploration in the tag space. The experiments used 10 Delicious tags and the strongly related tags extracted from 300 documents for each tag; hierarchical and non-hierarchical clustering methods were then applied to the tag co-occurrence data. The results were evaluated by measuring cluster relevance. Ward's method with the cosine coefficient, which is known to perform well in term clustering, performed best and showed a consistent clustering tendency. Furthermore, the analysis indicated that cluster membership among related tags reflects users' tagging purposes or interests and can help disambiguate word senses. Tag clusters should therefore be helpful for improving search and exploration in the tag space.
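As a rough illustration of the best-performing configuration named above, the following Python sketch clusters tags by Ward's method on co-occurrence vectors. It is not the study's implementation: the tag names, context tags, and counts are invented toy data, and the function names `cosine` and `ward_clusters` are hypothetical. It also assumes one particular way of pairing Ward's criterion with the cosine coefficient, namely scaling each vector to unit length so that squared Euclidean distance equals 2 * (1 - cosine).

```python
import math


def cosine(u, v):
    """Cosine coefficient between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ward_clusters(vectors, k):
    """Merge clusters by Ward's criterion until k clusters remain.

    Assumption: vectors are scaled to unit length first, so squared
    Euclidean distance between two of them equals 2 * (1 - cosine).
    Returns a list of clusters, each a sorted list of row indices.
    """
    unit = []
    for v in vectors:
        n = math.sqrt(sum(a * a for a in v)) or 1.0
        unit.append([a / n for a in v])

    # Each cluster is a pair: (member indices, centroid vector).
    clusters = [([i], u) for i, u in enumerate(unit)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ci), (mj, cj) = clusters[i], clusters[j]
                d2 = sum((a - b) ** 2 for a, b in zip(ci, cj))
                # Ward cost: increase in within-cluster sum of squares.
                cost = len(mi) * len(mj) / (len(mi) + len(mj)) * d2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        size = len(mi) + len(mj)
        centroid = [(len(mi) * a + len(mj) * b) / size for a, b in zip(ci, cj)]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append((mi + mj, centroid))
    return [sorted(members) for members, _ in clusters]


# Toy data (invented): rows are related tags, columns are counts of
# co-occurrence with four hypothetical context tags.
tags = ["python", "java", "recipe", "baking"]
cooccurrence = [
    [5, 4, 0, 0],
    [4, 5, 0, 1],
    [0, 0, 5, 4],
    [0, 1, 4, 5],
]

if __name__ == "__main__":
    for group in ward_clusters(cooccurrence, 2):
        print([tags[i] for i in group])
```

With real data, each row would hold a related tag's co-occurrence counts over the collected documents, and the number of clusters `k` would have to be chosen separately, for example by inspecting the merge costs.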

Keywords
Tag Clustering, Term Clustering, Related Tags, Tag-based Retrieval

