

Prediction of Patient Readmission in Intensive
Care Unit Using Machine Learning Techniques
Hila Delouya
Advisor: Dr. Yehudit Aperstein
Client: Beilinson Intensive Care Unit
Industrial Engineering

Project Overview:
Patients readmitted to an ICU during the same hospitalization have an increased risk of death, longer lengths of stay, and higher costs. The model can support the decision-making process used when discharging patients from the intensive care unit at Beilinson Hospital.
Project Challenges:
Imbalanced dataset (6% readmission rate)
Large portion of missing values (14%)
High correlation between the features
Very few positive instances to learn from (271 patients)
The Model
Inputs (laboratory tests & bedside monitors) → Data Preprocessing (balancing the data & dealing with missing values) → Classification (ensemble models) → Output (readmission probability)
Fig. 1: Patient flowchart. 7,503 ICU admissions (6,095 patients). 1,738 patients excluded: 658 died in the ICU, 147 died in the following 14 days, 933 with an ICU stay < 24 hours. Of the remaining 4,357 patients with ICU LOS ≥ 24h who were discharged alive from the ICU, 271 (6%) were readmitted within 7 days of ICU discharge and 4,086 were discharged alive after a single ICU stay.
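The cohort construction in Fig. 1 amounts to a short filtering step. Below is a minimal sketch, assuming a pandas DataFrame `admissions` with one row per ICU admission; the column names (`icu_los_hours`, `died_in_icu`, `died_within_14d`, `readmitted_within_7d`) are hypothetical, not the project's actual schema.

```python
import pandas as pd

def build_cohort(admissions: pd.DataFrame) -> pd.DataFrame:
    """Apply the exclusion criteria from the patient flowchart (Fig. 1)."""
    return admissions[
        (admissions["icu_los_hours"] >= 24)     # drop the 933 stays < 24 hours
        & (~admissions["died_in_icu"])          # drop the 658 ICU deaths
        & (~admissions["died_within_14d"])      # drop the 147 deaths within 14 days
    ]

# The positive label is readmission within 7 days of ICU discharge:
# y = build_cohort(admissions)["readmitted_within_7d"].astype(int)
```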
Preprocessing
Fig. 2: SMOTE example
Splitting the data into 70% train and 30% test.
Balancing the training set (1:1) with SMOTE sampling.
Applying the ensemble model XGBoost, based on decision trees.
Validation: 10-fold cross-validation with 10 repeats.
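As a rough illustration of these four steps, here is a minimal sketch assuming scikit-learn, imbalanced-learn, and xgboost; the poster does not name its implementation, so the libraries and hyperparameters are assumptions, and the data below is a synthetic placeholder.

```python
import numpy as np
from sklearn.model_selection import (train_test_split,
                                     RepeatedStratifiedKFold, cross_val_score)
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

# Placeholder data with a rare positive class (~6%, as in the real cohort)
X = np.random.rand(500, 20)
y = (np.random.rand(500) < 0.06).astype(int)

# Step 1: 70% train / 30% test, stratified to preserve the readmission rate
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

# Steps 2-3: SMOTE sits inside the pipeline, so the 1:1 balancing is applied
# only to the training folds, never to the validation or test data
pipe = Pipeline([
    ("smote", SMOTE(sampling_strategy=1.0, random_state=42)),
    ("xgb", XGBClassifier(eval_metric="logloss")),
])

# Step 4: 10-fold cross-validation with 10 repeats, scored by AUC
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=42)
scores = cross_val_score(pipe, X_train, y_train, scoring="roc_auc", cv=cv)
print(f"CV AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Keeping SMOTE inside the cross-validation pipeline matters: resampling before the split would leak synthetic copies of test-fold positives into the training folds and inflate the AUC.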
Dealing with missing values:
1. Imputation (KNN & other)
2. Discretization
3. Binarization
4. Smoothing
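A hedged sketch of option 1 (KNN imputation) using scikit-learn's KNNImputer; the neighbour count and the toy lab values are illustrative, not the project's settings.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy lab-test matrix with NaNs marking missing values
X_missing = np.array([[7.40, np.nan, 98.0],
                      [7.21, 140.0, np.nan],
                      [7.45, 138.0, 97.0],
                      [7.33, 142.0, 99.0]])

# Each gap is filled with the mean of that feature over the k most similar
# patients, where similarity is computed on the jointly observed features
imputer = KNNImputer(n_neighbors=2)
print(imputer.fit_transform(X_missing))
```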
Synthetic Minority Over-sampling Technique (SMOTE):
Undersampling the majority class by 135%.
Oversampling the minority class by 300% (K = 8 nearest neighbours).
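Reading these percentages in the convention of the original SMOTE paper (300% oversampling = 3 synthetic instances per real positive, built from K = 8 nearest neighbours), a sketch with imbalanced-learn could look as follows; the mapping of "135%" to a sampling target is an assumption here (majority reduced to 135% of the enlarged minority).

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

X = np.random.rand(1000, 5)
y = (np.random.rand(1000) < 0.06).astype(int)   # ~6% positives
n_pos = int(y.sum())

# Oversample by 300%: each positive spawns 3 synthetic neighbours (K = 8)
smote = SMOTE(sampling_strategy={1: 4 * n_pos}, k_neighbors=8, random_state=42)
X_os, y_os = smote.fit_resample(X, y)

# Undersample the majority down to ~135% of the new minority size (assumed)
under = RandomUnderSampler(sampling_strategy={0: int(1.35 * y_os.sum())},
                           random_state=42)
X_bal, y_bal = under.fit_resample(X_os, y_os)
print(np.bincount(y_bal))   # class counts after resampling
```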
Ensemble Models
Fig. 3: Ensemble model
Fig. 4: Boosting example
Extreme Gradient Boosting (XGBoost) is an optimized distributed gradient boosting system that builds additive tree models.
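A sketch of an XGBoost classifier producing the readmission probability that feeds the discharge decision; the hyperparameters are illustrative, as the poster does not report the tuned values.

```python
import numpy as np
from xgboost import XGBClassifier

X = np.random.rand(400, 10)
y = (np.random.rand(400) < 0.06).astype(int)

model = XGBClassifier(
    n_estimators=200,      # number of additive trees
    max_depth=4,           # depth of each tree
    learning_rate=0.1,     # shrinkage applied to each new tree
    scale_pos_weight=(y == 0).sum() / max(y.sum(), 1),  # upweight rare positives
    eval_metric="logloss",
)
model.fit(X, y)

# Output: estimated probability of readmission within 7 days of ICU discharge
print(model.predict_proba(X[:5])[:, 1])
```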
Other Implemented Methods
Model stacking (GBM or linear regression).
Bagging of similar non-readmitted patients (k bags).
Asymmetric AdaBoost (assigning higher initial weights to the positive instances; see the sketch after this list).
Feature selection by correlation threshold.
Clustering by patient similarity and creating K different ensemble models.
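Two of these methods fit in a few lines: asymmetric AdaBoost (higher initial weights on the positives, passed as sample weights) and feature selection by correlation threshold. The 0.9 threshold and the 10x weight ratio are assumptions, not the project's values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier

X = pd.DataFrame(np.random.rand(300, 8), columns=[f"f{i}" for i in range(8)])
y = (np.random.rand(300) < 0.06).astype(int)

# Feature selection: drop one feature from every pair correlated above 0.9
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
X_sel = X.drop(columns=[c for c in upper.columns if (upper[c] > 0.9).any()])

# Asymmetric AdaBoost: initial boosting weights favour the positive instances
w = np.where(y == 1, 10.0, 1.0)
ada = AdaBoostClassifier(n_estimators=100, random_state=42)
ada.fit(X_sel, y, sample_weight=w / w.sum())
```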
Results
1. XGBoost AUC - 75.2%
2. AdaBoost AUC - 70.8%
3. Random forest AUC - 67.1%
4. Adabag AUC - 66%
5. Bagging AUC - 65.18%
The best classifier - XGBoost.
Results of previous research:
1. Ouanes, 2012: AUC 0.74 ± 0.06
2. Vieira, 2013: AUC 0.72 ± 0.04
3. Afonso, 2013: AUC 0.68 ± 0.03
4. Fialho, 2013: AUC 0.64 ± 0.10
The model improves the doctor's decision by 50%.
[ROC plot: True Positive Rate (Sensitivity) vs. False Positive Rate (1-Specificity)]
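For completeness, a minimal sketch of how a test-set AUC and the ROC axes above are computed, mirroring the earlier sketches on synthetic placeholder data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X = np.random.rand(600, 10)
y = (np.random.rand(600) < 0.06).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)

model = XGBClassifier(eval_metric="logloss").fit(X_tr, y_tr)
prob = model.predict_proba(X_te)[:, 1]

fpr, tpr, _ = roc_curve(y_te, prob)   # the FPR/TPR axes of the ROC plot
print(f"Test AUC: {roc_auc_score(y_te, prob):.3f}")
```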