Data Mining and Machine Learning Algorithms for Workers' Compensation Early Severity Prediction

Mathews, David
Middle Tennessee State University
Although the number of workers' compensation claims has been declining over the last two decades, the average cost per claim has been steadily increasing. Identifying the factors that contribute to severe claims, and managing those claims effectively early in the claim life cycle, could reduce costs for employers and insurers. This research project uses machine learning algorithms to predict a binary severity outcome variable. A text mining algorithm, the Correlated Topic Model, was used to convert textual description fields into topics. Support Vector Machines and Regularized Logistic Regression were implemented for severity classification and variable selection, respectively. Because severe and non-severe outcomes were imbalanced in the training data, a balancing method was applied to match the number of severe and non-severe claims. Optimal model parameters for both algorithms were selected using a profitability metric and 10-fold cross-validation. Data processing techniques are discussed, and a mathematical exposition of the machine learning algorithms is provided. The open-source statistical programming software R was used in this project.
Data Mining, Machine Learning, Predictive Analytics, Workers' Compensation
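The abstract combines two steps: balancing the severe and non-severe classes, then choosing model parameters by 10-fold cross-validation. The Python sketch below illustrates how these steps fit together. It is not the author's R implementation, and it rests on three assumptions:

* The abstract does not name its balancing method, so the sketch uses random undersampling of the majority class.
* The profitability metric is not defined in the abstract, so scoring is delegated to a hypothetical `fit_and_score` callback that the reader supplies. For example, it could fit an SVM and return a profit figure.
* The helper names `undersample`, `k_fold_indices`, and `select_parameter` are illustrative only.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def undersample(labels: Sequence[int], seed: int = 0) -> List[int]:
    """Return indices of a balanced subset (assumed method: random undersampling).

    Keeps every row of the minority class and an equal-sized random sample
    of the majority class. Labels are 1 = severe, 0 = non-severe.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    return sorted(kept)


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle positions 0..n-1 and deal them into k folds of near-equal size."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def select_parameter(
    candidates: Sequence[float],
    n: int,
    fit_and_score: Callable[[float, List[int], List[int]], float],
    k: int = 10,
) -> Tuple[float, Dict[float, float]]:
    """Pick the candidate with the highest mean held-out score over k folds.

    fit_and_score(param, train_idx, test_idx) is a hypothetical callback:
    it should train on train_idx and return a score (e.g. profit) on test_idx.
    """
    folds = k_fold_indices(n, k)
    means: Dict[float, float] = {}
    for c in candidates:
        scores = []
        for f, test_idx in enumerate(folds):
            # Training rows are every position not in the held-out fold.
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            scores.append(fit_and_score(c, train_idx, test_idx))
        means[c] = sum(scores) / len(scores)
    best = max(candidates, key=lambda c: means[c])
    return best, means
```

For example, with 20 severe and 180 non-severe claims, `undersample` keeps 40 rows, 20 from each class. `select_parameter` then works on positions 0 to 39 of that balanced subset. In a real model, `fit_and_score` would map those positions back through the returned indices before fitting. Balancing before splitting, as in this sketch, is one possible design. Another option is to balance only within each training fold, which keeps the held-out folds at the original class ratio.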