A Comparative Study on Two Strategies for Distributed Classification

Date
2018-05-30
Authors
Xu, Honglan
Publisher
Middle Tennessee State University
Abstract
Distributed learning is an effective tool for processing big data. A simple and effective distributed learning approach is the divide-and-conquer method. It first partitions the whole data set into multiple subsets. A base learning algorithm is then applied to each subset. Finally, the results from these subsets are combined. In the classification setting, many classification algorithms can be used in the second stage; typical ones include logistic regression and support vector machines. For the third stage, either voting or averaging can be used as the coupling strategy. In this thesis, empirical studies are conducted to thoroughly compare the effectiveness of these two coupling strategies. Averaging is found to be more effective in most scenarios.
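A minimal sketch of the three-stage pipeline described above may help fix ideas. It assumes logistic regression fit by plain batch gradient descent as the base learner and synthetic two-class Gaussian data; all helper names (`partition`, `fit_logistic`, `predict_vote`, `predict_average`, `make_data`) are hypothetical. The abstract does not state exactly what is averaged, so here averaging is assumed to mean averaging the local models' linear decision scores, which for linear classifiers is equivalent to classifying with the averaged coefficients. This is an illustration only, not the thesis's experimental code.

```python
import math
import random

# Hypothetical helpers sketching the divide-and-conquer pipeline:
# stage 1 partitions the data, stage 2 fits a base learner per subset,
# stage 3 couples the local models by voting or by averaging.


def partition(data, k):
    """Stage 1: split the data into k disjoint, roughly equal subsets."""
    return [data[i::k] for i in range(k)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, x):
    """Linear decision score w.x + b of one local model."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fit_logistic(subset, epochs=200, lr=0.1):
    """Stage 2 (assumed base learner): logistic regression fit by batch
    gradient descent on the mean log-loss. Labels are 0/1."""
    d = len(subset[0][0])
    w, b, n = [0.0] * d, 0.0, len(subset)
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for x, y in subset:
            err = sigmoid(score((w, b), x)) - y
            for j in range(d):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict_vote(models, x):
    """Stage 3, voting: majority of the local 0/1 labels (ties go to 1)."""
    votes = sum(1 for m in models if score(m, x) > 0)
    return 1 if 2 * votes >= len(models) else 0


def predict_average(models, x):
    """Stage 3, averaging (assumed form): average the local linear scores.
    For linear models this equals scoring with the averaged coefficients."""
    avg = sum(score(m, x) for m in models) / len(models)
    return 1 if avg > 0 else 0


def make_data(n, rng):
    """Synthetic two-class data: Gaussian clouds centred at (-1,-1) and (1,1)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        c = 1.0 if y == 1 else -1.0
        data.append(([rng.gauss(c, 1.0), rng.gauss(c, 1.0)], y))
    return data


def accuracy(predict, models, data):
    """Fraction of points whose predicted label matches the true label."""
    return sum(predict(models, x) == y for x, y in data) / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    train, test = make_data(2000, rng), make_data(1000, rng)
    models = [fit_logistic(s) for s in partition(train, 10)]
    print("voting    accuracy:", accuracy(predict_vote, models, test))
    print("averaging accuracy:", accuracy(predict_average, models, test))
```

Running the script prints the test accuracy of each coupling strategy. Voting discards each local model's confidence and keeps only its label, whereas averaging keeps the magnitude of each score; on a toy problem this easy the two may give nearly identical numbers, so no ranking between them should be read off this sketch.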