A Comparative Study on Topic Modeling of LDA, Top2Vec, and BERTopic Models Using LIS Journals in WoS

Journal of the Korean Society for Library and Information Science, (P)1225-598X; (E)2982-6292
2024, v.58 no.1, pp.5-30
https://doi.org/10.4275/KSLIS.2024.58.1.005
Yong-Gu Lee
SeonWook Kim

Abstract

This study extracts topics from experimental data using three topic modeling methods (LDA, Top2Vec, and BERTopic) and compares the characteristics of, and differences between, the models. The experimental data consist of 55,442 papers published in 85 library and information science journals indexed in the Web of Science (WoS). The experiment had two stages: the first set of topic modeling results was obtained using each model's default parameters, and the second was obtained by setting the same optimal number of topics for every model. In the first stage, the LDA, Top2Vec, and BERTopic models generated markedly different numbers of topics (100, 350, and 550, respectively). The Top2Vec and BERTopic models thus divided the topics approximately three to five times more finely than the LDA model. The models also differed substantially in the mean and standard deviation of documents per topic. The LDA model assigned many documents to a relatively small number of topics, while the BERTopic model showed the opposite trend. In the second stage, with every model generating the same 25 topics, the Top2Vec model tended to assign more documents per topic on average and showed small deviations between topics, so documents were distributed evenly across the 25 topics. When the models were compared on whether they produced similar topics, the LDA and Top2Vec models shared 18 similar topics out of 25 (72%). This high proportion suggests that Top2Vec is the model more similar to LDA. A more comprehensive comparison would require expert evaluation of whether the documents assigned to each topic are thematically accurate.
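The two comparison measures named in the abstract, the mean and standard deviation of documents per topic and the share of similar topics, can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy topic assignments are hypothetical, and the choice of population standard deviation is an assumption, since the abstract does not say which variant was used.

```python
from collections import Counter
from statistics import mean, pstdev


def docs_per_topic_stats(assignments):
    """Return (topic count, mean docs per topic, population std dev).

    `assignments` holds one topic label per document. Hypothetical helper;
    using the population standard deviation is an assumption.
    """
    sizes = list(Counter(assignments).values())
    return len(sizes), mean(sizes), pstdev(sizes)


def similar_topic_share(n_similar, n_topics):
    """Percentage of topics judged similar between two models."""
    return 100.0 * n_similar / n_topics


# Toy data (made up): ten documents spread over three topics.
skewed = [0] * 8 + [1] * 1 + [2] * 1   # most documents fall in one topic
even = [0] * 4 + [1] * 3 + [2] * 3     # documents spread evenly

print(docs_per_topic_stats(skewed))
print(docs_per_topic_stats(even))

# Figure reported in the abstract: 18 similar topics out of 25.
print(similar_topic_share(18, 25))  # 72.0
```

Both toy assignments have the same mean, so only the standard deviation separates the skewed distribution from the even one. This is why the abstract reports both values when contrasting the models.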

Keywords
Topic Modeling, LDA, Top2Vec, BERTopic, Library and Information Science
