Classification Performance Analysis of Cross-Language Text Categorization using Machine Translation

Journal of the Korean Society for Library and Information Science, (P)1225-598X; (E)2982-6292
2009, v.43 no.1, pp.313-332
https://doi.org/10.4275/KSLIS.2009.43.1.313

Abstract

Cross-language text categorization (CLTC) automatically classifies documents in one language using a training set from another language. In this study, collections suitable for CLTC were extracted from KTSET. The classification performance of several CLTC methods was compared using an SVM classifier and machine translation. The results showed that classification performance ranked in the order of poly-lingual training, training-set translation, and test-set translation. However, training-set translation can be regarded as the most useful CLTC method, because it makes efficient use of machine translation and is easily adapted to general environments. Low performance, on the other hand, was shown to result from feature reduction or from features with no subject characteristics, both of which arose during the machine translation step of CLTC.
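The three CLTC strategies named in the abstract can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not the study's setup: the word-for-word dictionaries stand in for a machine translation engine, a term-frequency centroid classifier stands in for the SVM, the training language is assumed to be English and the test language Korean, and poly-lingual training is approximated by combining the original training set with its translation. All names (`KO_TO_EN`, `translate`, `train`, `classify`) and the toy documents are hypothetical.

```python
from collections import Counter

# Hypothetical toy "machine translation": word-for-word lookup tables.
# A real system would call an MT engine; these tables are illustration only.
KO_TO_EN = {"축구": "soccer", "경기": "match", "주식": "stock", "시장": "market"}
EN_TO_KO = {en: ko for ko, en in KO_TO_EN.items()}


def translate(tokens, table):
    """Translate token by token; words missing from the table are dropped,
    which mimics the feature loss that translation can introduce."""
    return [table[t] for t in tokens if t in table]


def train(docs):
    """Build one term-frequency centroid per class (stand-in for the SVM)."""
    centroids = {}
    for tokens, label in docs:
        centroids.setdefault(label, Counter()).update(tokens)
    return centroids


def classify(centroids, tokens):
    """Return the class whose centroid shares the most term mass with the doc."""
    def score(label):
        return sum(centroids[label][t] for t in tokens)
    return max(sorted(centroids), key=score)


# Assumed toy data: labeled English training documents, one Korean test document.
train_en = [(["soccer", "match"], "sports"), (["stock", "market"], "economy")]
test_ko = ["축구", "경기"]

# 1) Training-set translation: translate the training set into the test
#    language, then classify test documents as they are.
train_ko = [(translate(tokens, EN_TO_KO), label) for tokens, label in train_en]
pred_train_translation = classify(train(train_ko), test_ko)

# 2) Test-set translation: train on the original training set, then
#    translate each test document into the training language.
pred_test_translation = classify(train(train_en), translate(test_ko, KO_TO_EN))

# 3) Poly-lingual training (approximation): train one classifier on the
#    original and the translated training documents together.
pred_poly_lingual = classify(train(train_en + train_ko), test_ko)

print(pred_train_translation, pred_test_translation, pred_poly_lingual)
```

In this sketch, training-set translation runs the translator once over the training set, whereas test-set translation must translate every incoming document at classification time. Any word the translator drops is a lost feature, which is the kind of feature reduction the abstract associates with lower performance.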

Keywords
Cross-Language Text Categorization, CLTC, Document Classification, Multilingual Classification, Poly-Lingual Training, Cross-Language Training

