A Study on Information Resource Evaluation for Text Categorization

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2007, v.24 no.4, pp.305-321
https://doi.org/10.3743/KOSIM.2007.24.4.305

Abstract

The purpose of this study is to examine whether the information resources that human indexers consult during the indexing process are effective for text categorization. More specifically, information resources drawn from bibliographic information as well as from the full text were explored using a data set of typical scientific journal articles. The experimental results showed that information resources such as citations, source title, and title did not differ significantly from the full text, whereas keywords did differ significantly from the full text. These findings indicate that the information resources consulted by human indexers can be considered good candidates for text categorization aimed at automatic subject term assignment.

Keywords
Text Categorization, Automatic Indexing, Information Resources, Subject Indexing Process
