  • P-ISSN 1010-0695
  • E-ISSN 2288-3339

Application of text-mining technique and machine-learning model with clinical text data obtained from case reports for Sasang constitution diagnosis: a feasibility study

Journal of Korean Medicine, (P)1010-0695; (E)2288-3339
2024, v.45 no.3, pp.193-210
Jinseok Kim
So-hyun Park
Roa Jeong
Eunsu Lee
Yunseo Kim
Hyundong Sung
Jun-Sang Yu

Abstract

Objectives: We analyzed Sasang constitution case reports using text mining to derive network analysis results, and designed a machine-learning classification algorithm to select a model suitable for classifying Sasang constitution from text data.

Methods: Case reports on Sasang constitution published from January 1, 2000, to December 31, 2022, were searched. In total, 343 papers were selected, yielding 454 cases. Extracted texts were preprocessed and tokenized with the Python-based KoNLPy package, and each morpheme was vectorized using TF-IDF values. Word cloud visualization and centrality analysis identified the keywords mainly used for classifying Sasang constitution in clinical practice. To select the most suitable classification model for diagnosing Sasang constitution, the performance of five models (XGBoost, LightGBM, SVC, Logistic Regression, and Random Forest Classifier) was evaluated using accuracy and F1-Score.

Results: Word cloud visualization and centrality analysis identified specific keywords for each constitution. Logistic regression showed the highest accuracy (0.839416), while random forest classifier showed the lowest (0.773723). Based on F1-Score, XGBoost scored the highest (0.739811), and random forest classifier scored the lowest (0.643421).

Conclusions: This is the first study to analyze constitution classification by applying text mining and machine learning to case reports, providing a concrete research model for follow-up research. The keywords selected through text mining were confirmed to effectively reflect the characteristics of each Sasang constitution type. Based on text data from case reports, the most suitable machine learning models for diagnosing Sasang constitution are logistic regression and XGBoost.
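
The abstract names TF-IDF vectorization and two evaluation metrics without giving formulas, so the following is only a minimal sketch of how such steps might look, not the authors' implementation. It assumes documents are already tokenized (the study used KoNLPy on Korean text; the English tokens here are hypothetical stand-ins), uses one common TF-IDF variant (tf = count / document length, idf = ln(N / df) + 1), and assumes macro-averaged F1, which the abstract does not specify. All data and labels are invented for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {token: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = ln(N / df) + 1.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            tok: (c / len(doc)) * (math.log(n_docs / df[tok]) + 1.0)
            for tok, c in counts.items()
        })
    return vectors


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, pre-tokenized case texts (not data from the study).
docs = [
    ["cold", "hands", "indigestion"],
    ["heat", "sweating", "constipation"],
    ["cold", "indigestion", "fatigue"],
]
vectors = tfidf(docs)

# Hypothetical labels: a classifier that always predicts the majority class.
y_true = ["Soeumin", "Soeumin", "Soeumin", "Soeumin", "Soyangin"]
y_pred = ["Soeumin", "Soeumin", "Soeumin", "Soeumin", "Soeumin"]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

In this toy run, a token that occurs in only one document ("hands") receives a larger weight than one shared across documents ("cold"), and a majority-class predictor scores well on accuracy but poorly on macro F1. Under these assumptions, that gap is one way the two metrics can rank models differently, as the reported results do for logistic regression and XGBoost.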

Keywords
Data Mining, Machine Learning, Case Reports, Sasang Constitutional Medicine
