New Algorithms for Supervised Dimension Reduction

Zhang, Ning
Middle Tennessee State University
Advances in data collection and storage over the past decades have led to information overload in most sciences and ushered in the era of big data. Data sets that are both large and high dimensional are now ubiquitous in many scientific domains. They pose mathematical challenges, offer new opportunities, and are bound to give rise to new theoretical developments. Dimension reduction seeks a low dimensional representation of high dimensional data. It promotes understanding of the data structure through visualization and enhances the predictive performance of machine learning algorithms by mitigating the “curse of dimensionality.” As a result, dimension reduction methods play an increasingly important role in data analysis. This dissertation contributes new algorithms for supervised dimension reduction that handle high dimensional data more efficiently.

The first new algorithm is overlapping sliced inverse regression (OSIR). Sliced inverse regression (SIR) is a pioneering tool for supervised dimension reduction. It identifies the lower dimensional subspace spanned by the significant factors, known as the effective dimension reduction (EDR) space. OSIR refines SIR with an overlapping slicing scheme and estimates both the EDR space and the number of effective factors more accurately. We show that the overlapping procedure has the potential to identify the information contained in the derivatives of the inverse regression curve, which helps to explain the superiority of OSIR. We prove that the OSIR algorithm is √n-consistent. We also propose the use of bagging and bootstrapping techniques to further improve the accuracy of OSIR.

Online learning has attracted great attention because of the increasing demand for systems that can learn and evolve. When the data to be processed are also high dimensional and dimension reduction is needed for visualization or improved prediction, online dimension reduction plays an essential role. We propose four new online learning approaches for supervised dimension reduction: incremental sliced inverse regression, covariance-free incremental sliced inverse regression, incremental overlapping sliced inverse regression, and covariance-free incremental overlapping sliced inverse regression. All four methods update the EDR space quickly and efficiently as new observations arrive. The effectiveness and efficiency of all four algorithms are verified by simulations and real data applications.
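
For readers unfamiliar with the baseline method, the following is a minimal sketch of classical batch SIR, the algorithm that OSIR and the incremental variants build on. It is not the dissertation's OSIR or online implementation. The function name `sir_directions`, the default of 10 equal-size non-overlapping slices, and the simulated data are illustrative assumptions.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=1):
    """Classical sliced inverse regression (illustrative sketch).

    Returns a (p, n_directions) matrix of estimated EDR directions
    and the eigenvalues of the slice-mean covariance, largest first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize the predictors: Z = (X - mean) @ cov^{-1/2}.
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # Sort by the response and cut into non-overlapping slices
    # (assumption: roughly equal slice sizes).
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance of the slice means of Z.
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of M, mapped back to the original scale.
    w, v = np.linalg.eigh(M)              # ascending eigenvalues
    top = v[:, ::-1][:, :n_directions]    # largest first
    B = inv_sqrt @ top
    B /= np.linalg.norm(B, axis=0)
    return B, w[::-1]

# Usage on simulated data with a single true direction (assumed setup).
rng = np.random.default_rng(0)
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
X = rng.standard_normal((2000, 5))
y = (X @ beta) ** 3 + 0.1 * rng.standard_normal(2000)
B, eigvals = sir_directions(X, y)
print("absolute cosine with true direction:", abs(B[:, 0] @ beta))
```

The eigenvalues returned alongside the directions are what one would inspect to judge how many effective factors are present: in this sketch, a sharp drop after the first eigenvalue suggests a one-dimensional EDR space.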
Statistics, Computer science