바로가기메뉴

본문 바로가기 주메뉴 바로가기

logo

  • P-ISSN1013-0799
  • E-ISSN2586-2073
  • KCI

Performance Comparison of Automatic Classification Using Word Embeddings of Book Titles

Journal of the Korean Society for Information Management / Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2023, v.40 no.4, pp.307-327
https://doi.org/10.3743/KOSIM.2023.40.4.307
Yong-Gu Lee (Kyungpook National University)

Abstract

To analyze the impact of word embedding on book titles, this study utilized word embedding models (Word2vec, GloVe, fastText) to generate embedding vectors from book titles. These vectors were then used as classification features for automatic classification. The classifier utilized the k-nearest neighbors (kNN) algorithm, with the categories for automatic classification based on the DDC (Dewey Decimal Classification) main class 300 assigned by libraries to books. In the automatic classification experiment applying word embeddings to book titles, the Skip-gram architectures of Word2vec and fastText showed better results in the automatic classification performance of the kNN classifier compared to the TF-IDF features. In the optimization of various hyperparameters across the three models, the Skip-gram architecture of the fastText model demonstrated overall good performance. Specifically, better performance was observed when using hierarchical softmax and larger embedding dimensions as hyperparameters in this model. From a performance perspective, fastText can generate embeddings for substrings or subwords using the n-gram method, which has been shown to increase recall. The Skip-gram architecture of the Word2vec model generally showed good performance at low dimensions(size 300) and with small sizes of negative sampling (3 or 5).

keywords
Word2vec, GloVe, fastText, word embedding, automatic classification, Dewey Decimal Classification(DDC), Word2vec, GloVe, fastText
Submission Date
2023-11-20
Revised Date
2023-12-08
Accepted Date
2023-12-13

Journal of the Korean Society for Information Management