Statistical Computing Schemes for Proteomics Data Processing and Insurance Solvency Modeling

Xiong, Lu
Middle Tennessee State University
The accumulation of big data, such as medical and insurance data, calls for more advanced computational methods of statistical analysis. In this interdisciplinary computational science research, we study the mathematical method of multi-resolution analysis (MRA), the statistical techniques of Bayes classifiers and Markov Random Fields (MRF), and the computing tools of pyramid image matching and Markov Chain Monte Carlo (MCMC), and we develop new statistical computing schemes for two applications: Imaging Mass Spectrometry (IMS) proteomic data analysis and insurance solvency modeling.
IMS is an important tool for discovering biomarkers and detecting cancer early. However, the high dimensionality of IMS data makes processing difficult, and the development of computational methods for IMS data analysis lags behind the progress of the technology itself. To overcome this difficulty, we propose an MRA method that reduces the dimensionality of IMS data. By transforming IMS data into the wavelet coefficient space and analyzing it from low-resolution to high-resolution scales, an idea inspired by the pyramid image matching technique, we reduce the computational complexity while still selecting important biomarkers. To improve IMS classification, we select feature variables from the wavelet coefficients and use a Bayes classifier to classify each IMS pixel based on its feature variables. To incorporate the spatial information in IMS data, we exploit the Markovianity of cancer growth, namely that the state (cancer or non-cancer) of a sample point (pixel) is largely determined by the configuration of its neighborhood system, and we model this dependence with an MRF. The algorithm is implemented using MCMC sampling, and its result is probabilistic, which provides more information than a deterministic result. We also test different neighborhood definitions.
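The MCMC-MRF step can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes a binary Ising-type prior on a 4-neighborhood, takes per-pixel class log-likelihoods (for example from a Bayes classifier) as given, and uses Gibbs sampling as the MCMC method. The function name `gibbs_mrf` and the parameters `beta`, `n_sweeps`, and `burn_in` are hypothetical.

```python
import numpy as np

def gibbs_mrf(loglik, beta=1.0, n_sweeps=200, burn_in=50, seed=0):
    """Estimate per-pixel P(label = 1) under an assumed Ising-type MRF prior.

    loglik: array of shape (H, W, 2); loglik[i, j, k] is the log-likelihood of
            the pixel's features under class k (0 = non-cancer, 1 = cancer).
    beta:   hypothetical smoothing strength; larger values favor labels that
            agree with their neighbors.
    """
    rng = np.random.default_rng(seed)
    h, w, _ = loglik.shape
    labels = np.argmax(loglik, axis=2)            # start from the pixel-wise classifier
    counts = np.zeros((h, w))
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed 4-neighborhood
    for sweep in range(n_sweeps):
        for i in range(h):
            for j in range(w):
                # Count in-bounds neighbors and how many are currently labeled 1.
                n1, n_total = 0, 0
                for di, dj in offsets:
                    a, b = i + di, j + dj
                    if 0 <= a < h and 0 <= b < w:
                        n1 += labels[a, b]
                        n_total += 1
                n0 = n_total - n1
                # Conditional log-odds of label 1 given the data and the neighbors.
                log_odds = (loglik[i, j, 1] - loglik[i, j, 0]) + beta * (n1 - n0)
                log_odds = np.clip(log_odds, -50.0, 50.0)  # avoid overflow in exp
                p1 = 1.0 / (1.0 + np.exp(-log_odds))
                labels[i, j] = rng.random() < p1
        if sweep >= burn_in:
            counts += labels                      # accumulate post-burn-in samples
    # Fraction of retained sweeps in which each pixel was labeled 1.
    return counts / (n_sweeps - burn_in)
```

The returned array holds an estimated probability for every pixel rather than a hard label, which is the sense in which the result is probabilistic. Replacing `offsets` with, say, an 8-neighborhood is one way to experiment with different neighborhood definitions.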
As another application of statistical computing techniques, we study insurance solvency modeling. Solvency is one of the most important measures of an insurance company's financial health. It is directly related to the financial security of the company and to the benefits of its policyholders. Current solvency prediction methods are predominantly deterministic rather than probabilistic, and a deterministic method cannot provide information such as percentiles and probabilities, as a probabilistic method can. In this application, we design an innovative model that predicts captive insurance solvency using a probabilistic method based on Monte Carlo simulation. Starting from a pre-built financial report for a captive insurer, we simulate future losses according to a loss distribution to predict solvency scores in the coming years. The solvency score ranges from 0 to 1 and measures the probability that any of the future Insurance Regulatory Information System (IRIS) ratios falls outside its upper and lower bounds. Users can define these bounds according to their business situations.
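The Monte Carlo scoring idea can be sketched as follows. This is an illustration under simplifying assumptions, not the model developed in this work: losses are assumed lognormal, premium is held constant, surplus changes only through the underwriting result, and a single premium-to-surplus ratio stands in for the full set of IRIS ratios. The function name `breach_probability` and all of its parameters are hypothetical.

```python
import numpy as np

def breach_probability(surplus0, premium, loss_mu, loss_sigma, years=5,
                       bounds=(0.0, 3.0), n_sims=10_000, seed=0):
    """Estimate the probability that a ratio leaves its bounds in any future year.

    bounds: user-defined (lower, upper) limits for the premium-to-surplus ratio;
            the default values are placeholders, not regulatory thresholds.
    """
    rng = np.random.default_rng(seed)
    # Simulated annual losses, shape (n_sims, years); lognormal by assumption.
    losses = rng.lognormal(mean=loss_mu, sigma=loss_sigma, size=(n_sims, years))
    # Year-end surplus: starting surplus plus the cumulative underwriting result.
    surplus = surplus0 + np.cumsum(premium - losses, axis=1)
    # Premium-to-surplus ratio; a non-positive surplus is treated as a breach.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(surplus > 0, premium / surplus, np.inf)
    lower, upper = bounds
    breached = (ratio < lower) | (ratio > upper)
    # Share of simulated paths with at least one out-of-bounds year.
    return float(breached.any(axis=1).mean())
```

Because each simulated path is retained, the same arrays could also be used to report percentiles of surplus or of the ratio in a given year, which a single deterministic projection cannot provide.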
The data experiments show that the MRA method in proteomic data analysis selects important biomarkers and achieves higher classification accuracy with lower computational complexity. The experiments with the MCMC-MRF method show that it improves classification accuracy significantly. In addition, the captive insurance solvency model designed in this research can be a useful tool for captive managers and gives more probabilistic information than the traditional deterministic IRIS models.
Bayes classifier, Big data analysis, Captive Insurance Solvency Mod, Monte Carlo Markov chain, Proteomics Data Processing, Wavelet methods