Kiwi: Developing a Korean Morphological Analyzer Based on Statistical Language Models and Skip-Bigram

Lee Min-chul; 이민철

doi:10.23287/KJDH.2024.1.1.6

Korean Journal of Digital Humanities, E-ISSN 3058-311X

2024, v.1 no.1, pp.109-136

https://doi.org/10.23287/KJDH.2024.1.1.6

Min-chul Lee (Kakao)

Lee, M. (2024). Kiwi: Developing a Korean Morphological Analyzer Based on Statistical Language Models and Skip-Bigram. Korean Journal of Digital Humanities, 1(1), 109-136, https://doi.org/10.23287/KJDH.2024.1.1.6

Abstract

Ambiguity is one of the central challenges in Korean morphological analysis. It arises because combinations of morphemes with entirely different base forms can share the same surface form, so a model must consider context to analyze them accurately. The morphological analyzer Kiwi addresses this problem by combining a statistical language model that captures local context with a Skip-Bigram model that captures global context. The proposed method achieved an average accuracy of 86.7% in resolving ambiguities, outperforming existing open-source morphological analyzers, particularly deep learning-based ones, which typically achieve between 50% and 70%. In addition, thanks to its optimized, lightweight model, Kiwi runs faster than other analyzers, which makes it useful for analyzing large volumes of text. Owing to these features, Kiwi, released as open source, is widely used in fields such as text mining, natural language processing, and the humanities. Although this study improved both the accuracy and the efficiency of morphological analysis, Kiwi remains limited in handling the out-of-vocabulary problem and in analyzing Korean dialects, and further improvement is needed in these areas.

Keywords: Korean, NLP, Morphological Analyzer, Disambiguation, Language Model

Submission Date: 2024-04-30

Revised Date: 2024-05-26

Accepted Date: 2024-05-26
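
The abstract describes combining a local-context language model with a Skip-Bigram model for global context. The following is a minimal sketch of that general idea, not Kiwi's actual formulation: it assumes a plain bigram table for the local model, an additive log-score for non-adjacent morpheme pairs as the skip model, and an unweighted sum of the two. All names (`local_score`, `skip_score`, `best_analysis`), the window size, and every probability value are hypothetical and chosen only for illustration. The example uses the surface form 나는 in 하늘을 나는 새, which can be read as 나/NP + 는/JX ("I" plus topic particle) or 날/VV + 는/ETM ("flying").

```python
import math

FLOOR = math.log(1e-6)  # score for unseen adjacent pairs (assumed back-off)


def local_score(morphs, bigram_logp):
    """Local context: sum of log-probabilities over adjacent morpheme pairs."""
    return sum(bigram_logp.get(pair, FLOOR) for pair in zip(morphs, morphs[1:]))


def skip_score(morphs, skip_logw, window=3):
    """Wider context: sum of association scores over non-adjacent pairs
    that lie at most `window` positions apart. Unseen pairs contribute 0."""
    total = 0.0
    for i, left in enumerate(morphs):
        for j in range(i + 2, min(i + window + 1, len(morphs))):
            total += skip_logw.get((left, morphs[j]), 0.0)
    return total


def best_analysis(candidates, bigram_logp, skip_logw):
    """Pick the candidate with the highest combined (local + skip) score."""
    return max(
        candidates,
        key=lambda m: local_score(m, bigram_logp) + skip_score(m, skip_logw),
    )


# Two candidate analyses of the same surface string.
CAND_PRONOUN = ["하늘/NNG", "을/JKO", "나/NP", "는/JX", "새/NNG"]
CAND_VERB = ["하늘/NNG", "을/JKO", "날/VV", "는/ETM", "새/NNG"]

# Invented toy values, not estimated from any corpus.
BIGRAM_LOGP = {
    ("하늘/NNG", "을/JKO"): math.log(0.2),
    ("을/JKO", "나/NP"): math.log(0.02),
    ("나/NP", "는/JX"): math.log(0.5),
    ("는/JX", "새/NNG"): math.log(0.01),
    ("을/JKO", "날/VV"): math.log(0.01),
    ("날/VV", "는/ETM"): math.log(0.3),
    ("는/ETM", "새/NNG"): math.log(0.02),
}
SKIP_LOGW = {
    ("하늘/NNG", "날/VV"): math.log(4.0),  # "sky" and "fly" associate strongly
    ("하늘/NNG", "나/NP"): math.log(0.8),
}

if __name__ == "__main__":
    chosen = best_analysis([CAND_PRONOUN, CAND_VERB], BIGRAM_LOGP, SKIP_LOGW)
    print(" + ".join(chosen))
```

With these toy numbers the adjacent-pair score alone slightly favors the pronoun reading, while the skip term linking 하늘/NNG to 날/VV tips the combined score toward the verb reading. The point of the sketch is only that evidence from a non-adjacent morpheme can overturn a decision made from adjacent context.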
