  • P-ISSN 1013-0799
  • E-ISSN 2586-2073
  • KCI

An Experimental Study on Selecting Association Terms Using Text Mining Techniques

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2006, v.23 no.3, pp.147-165
https://doi.org/10.3743/KOSIM.2006.23.3.147


Abstract

This study ran experiments to find the best method for selecting additional terms related to an initial query term. Association term sets were generated with the support, confidence, and lift measures of the Apriori algorithm, and with the similarity measures GSS, Jaccard coefficient, cosine coefficient, Sokal & Sneath 5, and mutual information. The term selection methods were evaluated by the precision of the association terms and by the overlap ratio between the association terms and the indexing terms of relevant documents. The Apriori algorithm and GSS achieved the highest performance.
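As a rough illustration of the measures named in the abstract, the sketch below scores one term pair from document-level co-occurrence counts. It is not the authors' implementation. The function name `term_association`, the toy `docs` collection, and the treatment of each document as a set of terms are assumptions made for this example. The GSS formula follows its common definition in the text categorization literature, P(a,b)P(¬a,¬b) − P(a,¬b)P(¬a,b), which the abstract itself does not spell out. Sokal & Sneath 5 and mutual information are left out.

```python
from math import sqrt


def term_association(docs, a, b):
    """Score the association between terms a and b.

    docs is a list of sets, one set of terms per document (an assumption
    of this sketch; the paper's own document representation is not given).
    """
    n = len(docs)
    n_a = sum(1 for d in docs if a in d)               # documents containing a
    n_b = sum(1 for d in docs if b in d)               # documents containing b
    n_ab = sum(1 for d in docs if a in d and b in d)   # documents containing both
    if n == 0 or n_a == 0 or n_b == 0:
        raise ValueError("both terms must occur in a non-empty collection")

    # Association rule measures for the rule a -> b.
    support = n_ab / n
    confidence = n_ab / n_a
    lift = confidence / (n_b / n)

    # Similarity coefficients over binary document occurrence.
    jaccard = n_ab / (n_a + n_b - n_ab)
    cosine = n_ab / sqrt(n_a * n_b)

    # GSS coefficient, using its commonly cited definition (assumed here).
    p_ab = n_ab / n
    p_a_only = (n_a - n_ab) / n
    p_b_only = (n_b - n_ab) / n
    p_neither = (n - n_a - n_b + n_ab) / n
    gss = p_ab * p_neither - p_a_only * p_b_only

    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "jaccard": jaccard,
        "cosine": cosine,
        "gss": gss,
    }


# Hypothetical four-document collection, used only to exercise the function.
docs = [
    {"text", "mining", "apriori"},
    {"text", "mining"},
    {"text", "retrieval"},
    {"query", "retrieval"},
]
scores = term_association(docs, "text", "mining")
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Ranking every candidate term by one of these scores against the initial query term and keeping the top few yields one association term set per measure, which is the kind of set the evaluation in the abstract compares.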

Keywords
text mining, association terms, similarity measures, Apriori algorithm, term clustering
