Kiwi: Developing a Korean Morphological Analyzer Based on Statistical Language Models and Skip-Bigram

Korean Journal of Digital Humanities, (E)3058-311X
2024, v.1 no.1, pp.109-136
https://doi.org/10.23287/KJDH.2024.1.1.6
Min-chul Lee (Kakao)

Abstract

One of the central challenges in Korean morphological analysis is ambiguity. Different combinations of morphemes with entirely different base forms can share the same surface form in Korean, so a model must take context into account to analyze them correctly. The morphological analyzer Kiwi addresses this issue by combining a statistical language model that captures local context with a Skip-Bigram model that captures global context. The proposed method achieved an average accuracy of 86.7% in resolving ambiguities, outperforming existing open-source morphological analyzers, particularly deep learning-based ones, which typically achieve between 50% and 70%. In addition, thanks to its optimized, lightweight model, Kiwi runs faster than other analyzers, which makes it well suited to analyzing large volumes of text. Released as open source, Kiwi is widely used in fields such as text mining, natural language processing, and the humanities because of these features. Although this study improved both the accuracy and the efficiency of morphological analysis, it remains limited in handling the out-of-vocabulary problem and in analyzing Korean dialects, and further improvement is needed in these areas.
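
The idea of pairing a local-context score with a global-context score can be illustrated with a small sketch. The Python code below is not Kiwi's implementation: the scoring functions, the `weight` parameter, and every probability are hypothetical values invented for illustration. It uses the surface form "나는", which can be read as 나/NP + 는/JX ("I" plus a topic particle) or as 날/VV + 는/ETM ("flying"), and shows how a non-adjacent context morpheme such as 새/NNG ("bird") might tip the choice.

```python
import math

# Toy local model: log P(current morpheme | previous morpheme).
# All numbers are invented for illustration, not taken from the paper.
LOCAL = {
    ("<s>", "나/NP"): math.log(0.30),
    ("나/NP", "는/JX"): math.log(0.50),
    ("<s>", "날/VV"): math.log(0.10),
    ("날/VV", "는/ETM"): math.log(0.40),
}

# Toy skip-bigram model: a bonus when a morpheme co-occurs with a
# non-adjacent context morpheme elsewhere in the sentence (hypothetical values).
SKIP = {
    ("날/VV", "새/NNG"): 2.0,
    ("나/NP", "학생/NNG"): 1.5,
}

UNSEEN = math.log(1e-4)  # fallback log-probability for unseen adjacent pairs


def local_score(morphemes):
    """Sum the adjacent-pair log-probabilities over one candidate analysis."""
    prev, total = "<s>", 0.0
    for m in morphemes:
        total += LOCAL.get((prev, m), UNSEEN)
        prev = m
    return total


def skip_score(morphemes, context):
    """Sum the co-occurrence bonuses between the candidate and the context."""
    return sum(SKIP.get((m, c), 0.0) for m in morphemes for c in context)


def best_analysis(candidates, context, weight=1.0):
    """Return the candidate with the highest combined local + global score."""
    return max(
        candidates,
        key=lambda c: local_score(c) + weight * skip_score(c, context),
    )


CANDIDATES = [["나/NP", "는/JX"], ["날/VV", "는/ETM"]]

if __name__ == "__main__":
    print(best_analysis(CANDIDATES, ["새/NNG"]))    # context: "bird"
    print(best_analysis(CANDIDATES, ["학생/NNG"]))  # context: "student"
```

In this toy setup the local model alone prefers the pronoun reading, and the skip-bigram bonus overrides it only when a supporting context morpheme appears.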

Keywords
Korean, NLP, Morphological Analyzer, Disambiguation, Language Model

Submission Date
2024-04-30
Revised Date
2024-05-26
Accepted Date
2024-05-26