한국비블리아학회지

A Study on Statistical Feature Selection with Supervised Learning for Word Sense Disambiguation

한국비블리아학회지, ISSN (P)1229-2435; (E)2799-4767
2011, v.22 no.2, pp.5-25
https://doi.org/10.14699/kbiblia.2011.22.2.005
Lee, Yong-Gu

Abstract

This study aims to identify the most effective statistical feature selection method and context window size for supervised word sense disambiguation. Features were selected by four methods: information gain, document frequency, chi-square, and relevancy. The comparison of weights showed that identifying the most appropriate features can improve word sense disambiguation performance. Among the four methods, information gain performed best. The SVM classifier was not affected by feature selection and performed better with a larger feature set and a larger context size. The Naive Bayes classifier performed best with 10 percent of the feature set, and the kNN classifier with less than 10 percent of the feature set. When feature selection methods are applied to word sense disambiguation, combining a small feature set with a larger context window, or a large feature set with a small context window, yields the greatest performance improvement.
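To make the procedure concrete, the following is a minimal Python sketch of how context-window features could be scored with three of the selection methods named in the abstract (document frequency, information gain, and chi-square). It is not the author's implementation: the toy sentences, the function names, the symmetric window, and the choice of taking the maximum chi-square over senses are all assumptions made for illustration, and the relevancy score is omitted because its exact definition is not given here.

```python
import math
from collections import Counter

def context_window(tokens, target, size):
    """Words within `size` positions on either side of the first `target`."""
    i = tokens.index(target)
    return set(tokens[max(0, i - size):i] + tokens[i + 1:i + 1 + size])

def entropy(counts):
    """Shannon entropy (bits) of a collection of class counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def score_features(instances):
    """instances: list of (set_of_context_words, sense_label) pairs.
    Returns {term: {"df": ..., "ig": ..., "chi2": ...}}."""
    n = len(instances)
    senses = Counter(label for _, label in instances)
    base = entropy(senses.values())
    vocab = set().union(*(words for words, _ in instances))
    scores = {}
    for term in vocab:
        with_term = Counter(lab for words, lab in instances if term in words)
        without_term = senses - with_term  # per-sense counts lacking the term
        df = sum(with_term.values())       # document frequency
        # Information gain: entropy reduction from knowing term presence.
        ig = (base
              - (df / n) * entropy(with_term.values())
              - ((n - df) / n) * entropy(without_term.values()))
        # Chi-square per sense from a 2x2 table; keep the maximum
        # (an assumed aggregation, chosen for illustration).
        chi2 = 0.0
        for sense, sense_total in senses.items():
            a = with_term[sense]           # term present, this sense
            b = df - a                     # term present, other senses
            c = sense_total - a            # term absent, this sense
            d = n - a - b - c              # term absent, other senses
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                chi2 = max(chi2, n * (a * d - c * b) ** 2 / denom)
        scores[term] = {"df": df, "ig": ig, "chi2": chi2}
    return scores

def select_top(scores, method, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t][method], t))[:k]

# Hypothetical toy data: the ambiguous word "bank" with two senses.
SENTENCES = [
    ("he deposited money in the bank account yesterday", "finance"),
    ("the bank approved the loan and money transfer", "finance"),
    ("they fished from the river bank at dawn", "river"),
    ("the river bank was muddy after the flood", "river"),
]

WINDOW = 3  # context window size, in words on each side of the target
instances = [(context_window(s.split(), "bank", WINDOW), sense)
             for s, sense in SENTENCES]
scores = score_features(instances)
print(select_top(scores, "ig", 3))
```

The selected terms would then serve as the feature set for a classifier such as Naive Bayes, kNN, or SVM; varying `WINDOW` and `k` corresponds to the context size and feature set size that the study compares.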

keywords
word sense disambiguation, statistical feature selection, context size, Naive Bayes classifier, kNN classifier