J. Renuka Prasanna
An explainable machine learning framework for lung cancer hospital length of stay prediction.
Published on: September 06, 2023
Original authors: Belal Alsinglawi et al. (2022) (DOI: 10.1038/s41598-021-04608-7)
Predicting the length of stay (LOS) in the ICU for lung cancer patients is important because LOS can be used to manage hospital resources and improve patient outcomes. Traditional LOS calculation methods are not disease-specific and can be inaccurate. Machine learning algorithms can predict LOS more accurately, but they often suffer from class imbalance problems. This study proposes a new machine learning framework for predicting LOS in lung cancer patients. The framework addresses the class imbalance problem by using six different class-balancing algorithms. It also uses a feature selection method to reduce the dimensionality of the data and improve the performance of the machine learning algorithms.

Methodology: The proposed framework is based on a dataset of lung cancer patients. The results show that the framework can accurately predict LOS in lung cancer patients. The framework also outperforms machine learning models that do not address the class imbalance problem.
1. The study used the MIMIC-III dataset, a large collection of de-identified health-related data for adult patients who stayed in the ICU between 2001 and 2012 at Beth Israel Deaconess Medical Center (BIDMC), a Harvard Medical School teaching hospital in Massachusetts, USA.
2. The dataset was pre-processed to handle missing values, outliers, and raw data.
3. The target class (LOS) was discretized into two categories: short LOS (0–7 days) and long LOS (>7 days).
4. Categorical variables were transformed into numerical variables.
5. Two feature selection techniques were used: clinical significance (CS) and recursive feature elimination (RFE).
6. The class-balancing techniques used in the study were:
(a) Random oversampling
(b) SMOTE (Synthetic Minority Oversampling Technique)
(c) ADASYN (Adaptive Synthetic Sampling Approach for Imbalanced Learning)
(d) SMOTEENN (SMOTE + Edited Nearest Neighbor)
(e) SMOTETomek (SMOTE + Tomek Links)
(f) Borderline SMOTE
(g) Undersampling

Results:
1. A random forest (RF) model was used to classify lung cancer patients into short and long LOS.
2. The class-balancing methods compared were random oversampling, SMOTE, ADASYN, SMOTEENN, SMOTETomek, Borderline SMOTE, and undersampling.
3. The best results were obtained with the ADASYN oversampling method, which achieved index of balanced accuracy (IBA) scores of 100% and 96% with the CS and RFE feature selection methods, respectively.
4. The SMOTE oversampling method also achieved good results, with IBA scores of 96% and 98% with CS and RFE, respectively.
5. The undersampling methods performed poorly, with an IBA score of 0% for both ENN and Tomek Links.
6. The combinations of over- and undersampling, SMOTETomek and SMOTEENN, also achieved good results, with IBA scores of 96% and 94% with CS and RFE, respectively.

Impact of research: This study can have an impact in the following areas:
1. Improved care for lung cancer patients: Predicting LOS can help clinicians make better decisions about the care of lung cancer patients. This can lead to earlier interventions and treatment, which can improve patient outcomes.
2. Reduced healthcare costs: The research can help identify patients who are at risk of a long LOS, so that clinicians can take steps to prevent or shorten their stay. This can reduce healthcare costs, both for the individual patient and for the healthcare system as a whole.
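Three of the steps summarized above can be illustrated with a minimal sketch: discretizing LOS into short (0–7 days) and long (>7 days) classes, balancing the classes by random oversampling, and scoring predictions with the index of balanced accuracy. This is not the study's code. The function names, the toy stay values, and the weighting factor alpha = 0.1 are assumptions made for illustration only; the IBA formula used is the common definition (1 + alpha * (TPR - TNR)) * TPR * TNR, with long LOS treated as the positive class.

```python
import random
from collections import Counter

# From the summary: short LOS is 0-7 days, long LOS is more than 7 days.
LONG_STAY_THRESHOLD_DAYS = 7


def label_los(days):
    """Map a length of stay in days to 'short' (0-7) or 'long' (>7)."""
    return "long" if days > LONG_STAY_THRESHOLD_DAYS else "short"


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def index_balanced_accuracy(y_true, y_pred, positive="long", alpha=0.1):
    """IBA = (1 + alpha * (TPR - TNR)) * TPR * TNR.
    alpha = 0.1 is an illustrative assumption, not a value from the study."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity
    tnr = tn / (tn + fp) if (tn + fp) else 0.0  # specificity
    return (1 + alpha * (tpr - tnr)) * tpr * tnr


# Toy data (hypothetical stays in days, not taken from MIMIC-III).
stays = [2, 3, 5, 1, 7, 4, 6, 12, 20]
labels = [label_los(d) for d in stays]
balanced_stays, balanced_labels = random_oversample(stays, labels)

print("before:", Counter(labels))
print("after: ", Counter(balanced_labels))

# A model that always predicts 'short' never detects a long stay,
# so its true positive rate, and therefore its IBA, is zero.
always_short = ["short"] * len(labels)
print("IBA, always 'short':", index_balanced_accuracy(labels, always_short))
```

The last lines show why IBA is a useful score for imbalanced data: a classifier that ignores the minority class can still look accurate overall, yet its IBA collapses to zero. Synthetic methods such as SMOTE and ADASYN differ from the random oversampling shown here in that they generate new minority samples by interpolation rather than duplicating existing rows.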