Mediclaim Processing

Project to build and test classification models and neural networks to predict medical insurance claim acceptances.

NOTE: This project was completed over 8 weeks to fulfill the Capstone Project requirement of the PGP in Artificial Intelligence and Machine Learning at BITS Pilani – Hyderabad Campus. The following is a paraphrased version of the 92-page project report written in collaboration with three peers: Gandhi Gannamaneni, Mohammed Riaz, and Sai Gudipati.

Time Period: April 2020 – May 2020

Introduction


The objective of the project is to develop machine learning models for the healthcare industry that help insurance providers lower potential claim denials and reduce the operational cost of back-and-forth communication between the claim submitter and the provider. This, in turn, accelerates claim disbursement, saving time for both parties.

To this end, historical medical claim data was analyzed and used as the training basis for several machine learning models. The models were then compared on the test data using performance metrics such as Accuracy, Precision, Recall, F1 Score, and AUC.

Dataset Information


The problem is a binary classification task, as each claim falls into one of two categories: Accepted or Denied. The dataset contained information about 470k claims, each described by 21 features. The class imbalance was extreme, with 99.6% of the dataset being Accepted claims and the remaining 0.4% being Denied claims.

Data Preprocessing


Multiple data preprocessing activities were performed on the dataset before the models were trained. The following are a few of the data preprocessing activities:

  • Deleting the following:
    • Rows with an invalid Denial Code
    • Fields with over 60% null values
    • Irrelevant features, dropped based on input from a domain expert
  • Outlier Analysis: Boxplots were generated for a few critical features, and the data points identified as outliers were removed from the dataset. For example, a scatter plot of Claim Charge Amount against Provider Payment Amount revealed rows where the provider paid more than the claimed amount; these rows were treated as outliers and removed (see the sketch after this list).
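As an illustration, the payment-versus-charge filter can be written as a simple pandas mask. This is a minimal sketch; the file name and exact column names are assumptions based on the feature names mentioned above:

```python
import pandas as pd

# Load the claims data (file name is hypothetical).
df = pd.read_csv("claims.csv")

# A row is an outlier if the provider paid out more than was charged.
# Column names are assumed from the features named in the report.
mask = df["Provider Payment Amount"] > df["Claim Charge Amount"]
print(f"Dropping {mask.sum()} outlier rows")

df = df[~mask].reset_index(drop=True)
```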

After imputing missing values and removing features and data points as described above, the resulting dataset contained about 327k rows.

Generating Machine Learning Models


To generate the best prediction model for the project, the following tasks were performed:

  • 70-30 train-test split
  • Data scaling
  • Training multiple models
  • Hyperparameter tuning
  • Handling overfitting
  • GridSearchCV with K-Fold cross-validation to further fine-tune hyperparameters (sketched below)
  • Model evaluation on metrics such as Recall, Precision, F1 Score, and ROC-AUC
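The sketch below illustrates this workflow for one of the models, assuming a feature matrix `X` and a binary target `y` (1 = Denied); the parameter grid is illustrative, not the grid used in the project:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 70-30 train-test split, stratified to preserve the heavy class imbalance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Scaling and the classifier share one pipeline so the scaler is fit
# only on the training folds during cross-validation.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(random_state=42)),
])

# Illustrative search space, not the project's actual grid.
param_grid = {
    "clf__n_estimators": [100, 300],
    "clf__max_depth": [10, 20, None],
}

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="recall",  # denials (the positive class) are what we want to catch
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```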

Models Built


The models were built in two phases. The first phase involved building classification models with scikit-learn, which plays host to many different classification algorithms; the following were chosen:

  • Decision Tree
  • Random Forest
  • SVM
  • XGBoost – gbtree
  • XGBoost – gblinear

Each of the above models was also trained on a second dataset, generated by applying SMOTE (Synthetic Minority Over-sampling Technique) to the cleaned dataset to tackle the class imbalance; the table in the Results section reports both variants.
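A minimal sketch of the oversampling step, assuming the imbalanced-learn package and the train split from the earlier sketch. SMOTE is applied to the training data only, so the test set keeps its natural class distribution:

```python
from collections import Counter

from imblearn.over_sampling import SMOTE

# Synthesize new minority-class (Denied) samples in the training split only.
smote = SMOTE(random_state=42)
X_train_sm, y_train_sm = smote.fit_resample(X_train, y_train)

print("Before:", Counter(y_train))
print("After: ", Counter(y_train_sm))
```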

The second phase used the H2O library, its AutoML framework, and a deep neural network (DNN). The following models were built (a minimal sketch of the H2O setup follows the list):

  • H2O – Gradient Boost
  • H2O – Gradient Boost (balance_classes = True)
  • H2O – Random Forest
  • H2O – Random Forest (balance_classes = True)
  • H2O – XGBoost
  • Stacked Ensemble – 1
  • Stacked Ensemble – 2
  • AutoML
  • DNN
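A minimal sketch of the H2O setup, shown for the balanced Gradient Boosting variant; the file name and the target column name (`claim_status`) are assumptions:

```python
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()

# Load the cleaned dataset (file name is hypothetical) and mark the
# target as categorical so H2O treats the problem as classification.
claims = h2o.import_file("claims_cleaned.csv")
claims["claim_status"] = claims["claim_status"].asfactor()

train, test = claims.split_frame(ratios=[0.7], seed=42)
features = [c for c in claims.columns if c != "claim_status"]

# balance_classes=True oversamples the minority class during training,
# mirroring the "(BC=True)" variants in the results below.
gbm = H2OGradientBoostingEstimator(balance_classes=True, seed=42)
gbm.train(x=features, y="claim_status", training_frame=train)

print(gbm.model_performance(test).auc())
```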

Results


scikit-learn Models Comparison Report:

The following table lists the performance metric scores of the different models; all values are percentages.

| Model | Training Accuracy | Testing Accuracy | ROC | Recall | Precision | F1 |
| --- | --- | --- | --- | --- | --- | --- |
| Decision Tree | 99.72 | 99.36 | 89.70 | 71.13 | 76.23 | 73.59 |
| Decision Tree (SMOTE) | 99.65 | 97.42 | 93.63 | 85.21 | 30.93 | 45.38 |
| Random Forest | 99.72 | 99.40 | 95.97 | 72.89 | 77.97 | 75.34 |
| Random Forest (SMOTE) | 99.65 | 97.45 | 97.75 | 85.56 | 31.21 | 45.74 |
| SVM | 1.26 | 1.26 | 49.32 | 100.00 | 1.26 | 2.48 |
| SVM (SMOTE) | 64.31 | 52.83 | 74.70 | 77.29 | 2.03 | 3.96 |
| XGBoost – gbtree | 99.53 | 99.44 | 99.35 | 70.95 | 82.08 | 76.11 |
| XGBoost – gblinear | 98.74 | 98.74 | 85.74 | 0.00 | 0.00 | 0.00 |
| XGBoost – gbtree (SMOTE) | 96.59 | 96.52 | 99.26 | 93.84 | 25.74 | 40.39 |
| XGBoost – gblinear (SMOTE) | 86.59 | 91.73 | 95.45 | 86.09 | 11.79 | 20.75 |

There are two ways of looking at these results. First, if the objective is to reduce administrative costs, the focus is on correctly predicting denials, i.e. minimizing False Negatives (denied claims that the model predicts as accepted). Using Recall as the defining evaluation metric, the following are the best models:

  • XGBoost with gbtree (SMOTE)
  • Random Forest (SMOTE)
  • Decision Tree (SMOTE)

Second, if the provider's concerns are taken into consideration, correctly predicting accepted claims is also important, i.e. False Positives must be reduced as well. Using F1 Score, which balances Precision and Recall, as the defining evaluation metric, the following are the best models (a sketch of computing these metrics follows this list):

  • XGBoost with gbtree
  • Random Forest
  • Decision Tree
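For reference, these metrics can be computed from a fitted model's test-set predictions; a minimal sketch, reusing `search`, `X_test`, and `y_test` from the earlier GridSearchCV sketch:

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

y_pred = search.predict(X_test)              # hard class labels
y_prob = search.predict_proba(X_test)[:, 1]  # probability of the Denied class

print("Recall:   ", recall_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("F1:       ", f1_score(y_test, y_pred))
print("ROC-AUC:  ", roc_auc_score(y_test, y_prob))
```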


H2O Models Comparison Report:

The following table lists the performance metric scores of the different models; all values except Logloss are percentages.

| Model | Training Accuracy | Testing Accuracy | ROC | Recall | Precision | F1 | Logloss |
| --- | --- | --- | --- | --- | --- | --- | --- |
| H2O-GB | 99.35 | 99.25 | 98.01 | 71.75 | 63.53 | 67.35 | 2.23 |
| H2O-GB (BC=True) | 93.23 | 99.15 | 98.11 | 52.58 | 80.88 | 63.73 | 2.76 |
| H2O-RF | 99.16 | 99.18 | 97.54 | 70.56 | 55.57 | 62.17 | 2.72 |
| H2O-RF (BC=True) | 92.31 | 99.12 | 98.10 | 52.84 | 73.98 | 61.65 | 2.89 |
| H2O-XG | 99.43 | 99.32 | 98.96 | 72.98 | 68.84 | 70.02 | 2.23 |
| Stacked Ensemble-1 | 99.52 | 99.32 | 98.19 | 74.32 | 68.14 | 71.09 | 2.73 |
| Stacked Ensemble-2 | 99.23 | 99.12 | 96.35 | 64.03 | 57.34 | 60.50 | 3.77 |
| AutoML | 99.37 | 99.34 | 98.16 | 70.05 | 67.07 | 68.53 | 2.33 |
| DNN | 92.25 | 99.14 | 97.83 | 68.86 | 57.16 | 62.47 | 4.08 |

Taking the same two perspectives for the H2O models, we can determine the following. For the first perspective, where administrative costs are to be reduced, the following are the best models:

  • H2O Gradient Boosting – H2O-GB (BC=True)
  • H2O RandomForestEstimator – H2O-RF (BC=True)
  • H2O XGBoostEstimator – H2O-XG

For the second perspective, where correctly predicting accepted claims is important, the following are the best models (a short AutoML sketch follows the list):

  • Stacked Ensemble-1
  • H2O XGBoostEstimator - H2O-XG
  • AutoML
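Of the models above, AutoML automates the model search. A short sketch, reusing the H2O frame, feature list, and (assumed) target column from the earlier H2O sketch:

```python
from h2o.automl import H2OAutoML

# Try a bounded number of models, including stacked ensembles,
# and rank them on a leaderboard.
aml = H2OAutoML(max_models=10, seed=42)
aml.train(x=features, y="claim_status", training_frame=train)

print(aml.leaderboard.head())
```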