This study proposes a topic distillation algorithm that ranks the relevant sites selected from retrieved web pages, and evaluates its performance. The algorithm calculates the topic score of a site from the site's hierarchical structure. The experiment uses the TREC .GOV test collection and the set of TREC-2004 queries for the topic distillation task. The results show that the algorithm returned at least two relevant sites among the top ten retrieved results. An in-depth analysis of the list of relevant sites provided by TREC-2004 revealed that the definition of topic distillation had not been strictly applied when the relevant sites were selected. When the retrieved sites and sub-sites were re-evaluated against a revised list of relevant sites, the performance of the proposed algorithm improved significantly.