Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

Tuberculosis & Respiratory Diseases, 2023, v.86 no.3, pp.203-215
https://doi.org/10.4046/trd.2022.0048
Chang Dong Yeo, M.D., Ph.D. (The Catholic University of Korea, Seoul, Korea)
Oh Beom Kwon, M.D. (Division of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, College of Medicine, The Catholic University of Korea, Seoul)
Solji Han, B.Ec. (Department of Applied Statistics, Yonsei University, Seoul)
Hwa Young Lee, M.D., Ph.D. (Seoul St. Mary's Hospital)
Hye Seon Kang, M.D. (The Catholic University of Korea, Seoul, Korea)
Sung Kyoung Kim, M.D. (The Catholic University of Korea, Seoul, Korea)
Ju Sang Kim, M.D. (The Catholic University of Korea, Incheon, Korea)
Chan Kwon Park, M.D. (Department of Internal Medicine, Yeouido St. Mary’s Hospital, College of Medicine)
Sang Haak Lee, M.D., Ph.D. (St. Paul’s Hospital, College of Medicine, The Catholic University of Korea, Seoul)
Seung Joon Kim, M.D., Ph.D. (Seoul St. Mary’s Hospital, The Cancer Research Institute, College of Medicine)
Jin Woo Kim, M.D., Ph.D. (Uijeongbu St. Mary's Hospital)

Abstract

Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models.

Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models, the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM), were implemented. The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. Implementation was done after dataset splitting in set III. Predictive performance was evaluated by R² and mean squared error (MSE) in the three sets.

Results: A total of 1,487 patients were included in sets I and III and 896 patients were included in set II. In set I, the R² value was 0.27, and in set II, LightGBM was the best model with the highest R² value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was the best model with the highest R² value of 0.56 and the lowest MSE of 174.07.

Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.
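
The evaluation scheme described in the Methods (a 70:30 train/test split, five-fold cross-validation on the training portion, and scoring by R² and MSE) can be sketched in plain Python. This is a minimal illustration only, not the authors' code: the function names, the random seed, and the way folds are assigned are all assumptions made for the example, and no model fitting or patient data is included.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def split_70_30(n, seed=0):
    """Shuffle the indices 0..n-1 and split them 70:30 into train and test.

    The seed is a hypothetical choice made here for reproducibility.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * 0.7)
    return idx[:cut], idx[cut:]


def five_folds(train_idx, k=5):
    """Yield (fit, validation) index lists for k-fold cross-validation.

    Folds are formed by striding through the already shuffled training
    indices, which is one simple way to get near-equal fold sizes.
    """
    folds = [train_idx[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, validation


# Example with the cohort size reported for sets I and III.
train_idx, test_idx = split_70_30(1487)
cv_splits = list(five_folds(train_idx))
```

In this arrangement the test indices are never seen during cross-validation, so hyperparameters are tuned on the training portion only and R² and MSE are then reported on the held-out 30%.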

Keywords
Lung Cancer, Chronic Obstructive Pulmonary Disease, Postoperative Lung Function, Linear Regression, Machine Learning
