Combining Multiple Sources of Evidence to Enhance Web Search Performance

The Web is rich with sources of information that go beyond the contents of documents, such as hyperlinks and manually classified directories of Web documents (e.g., Yahoo). This research extends past fusion IR studies, which have repeatedly shown that combining multiple sources of evidence (i.e., fusion) can improve retrieval performance, by investigating the effects of combining three distinct retrieval approaches for Web IR: the text-based approach that leverages document texts, the link-based approach that leverages hyperlinks, and the classification-based approach that leverages Yahoo categories. Retrieval results of the text-, link-, and classification-based methods were combined using variations of the linear combination formula to produce fusion results, which were then compared to the individual retrieval results using traditional retrieval evaluation metrics. Fusion results were also examined to ascertain the significance of overlap (i.e., the number of systems that retrieve a document) in fusion. The analysis suggests that the solution spaces of the text-, link-, and classification-based retrieval methods are diverse enough for fusion to be beneficial. It also reveals important characteristics of the fusion environment, such as the effects of system parameters and the relationship between overlap, document ranking, and relevance.
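The abstract does not state which variations of the linear combination formula were used, so the following is only a minimal sketch of the general idea under stated assumptions: each system's scores are min-max normalized, the fused score is a weighted sum over the systems that retrieved a document, and an optional variant multiplies that sum by the overlap count (in the spirit of the CombMNZ scheme from the fusion literature). The function names, weights, scores, and document identifiers are all hypothetical and are not taken from the study.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one system's scores to [0, 1]; a constant list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_fusion(runs, weights, boost_overlap=False):
    """Fuse per-system scores with a weighted sum (assumed formula).

    runs:    {system_name: {doc_id: raw_score}}
    weights: {system_name: weight}
    Returns the fused ranking, the fused scores, and the overlap
    (number of systems that retrieved each document).
    """
    fused = defaultdict(float)
    overlap = defaultdict(int)
    for name, scores in runs.items():
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += weights[name] * s
            overlap[doc] += 1
    if boost_overlap:
        # Overlap-boosted variant: reward documents found by more systems.
        for doc in fused:
            fused[doc] *= overlap[doc]
    # Sort by descending fused score; break ties by document id.
    ranking = sorted(fused, key=lambda d: (-fused[d], d))
    return ranking, dict(fused), dict(overlap)


# Hypothetical toy results for the three approaches named in the abstract.
runs = {
    "text": {"d1": 12.0, "d2": 8.0, "d3": 4.0},
    "link": {"d2": 0.9, "d4": 0.5, "d1": 0.1},
    "class": {"d2": 3.0, "d5": 1.0},
}
weights = {"text": 0.5, "link": 0.3, "class": 0.2}

ranking, fused, overlap = linear_fusion(runs, weights)
print(ranking)
print(overlap)
```

In this toy data, the document retrieved by all three systems rises to the top of the fused ranking, which illustrates the kind of relationship between overlap and ranking that the study examines. The weights play the role of system parameters and would need to be tuned on real data.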

Keywords: Fusion, Web search, Information retrieval

Vol.45 No.3

Journal of Korean Library and Information Science Society (한국도서관·정보학회지)