Technical Reports
HPL-2011-25
Training Set Compression by Incremental Clustering
Li, Dalong; Simske, Steven
HP Laboratories
Keyword(s): Clustering, Support vector machine, KNN, Pattern recognition, CONDENSE.
Abstract: Training set compression reduces the size of a training set without degrading classification accuracy. A smaller training set makes training more efficient and saves storage space. In this paper, an incremental clustering algorithm, the Leader algorithm, is used to reduce the size of a training set by effectively subsampling it. Experiments on several standard data sets using SVM and KNN as classifiers indicate that the proposed method is more efficient than CONDENSE in reducing the size of the training set without degrading the classification accuracy. While the compression ratio of the CONDENSE method is fixed, the proposed method offers a variable compression ratio through the cluster threshold value.
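As an illustration of the idea in the abstract, the following is a minimal sketch of Leader-style incremental clustering used to subsample a labeled training set. It is not the authors' implementation. The function name `leader_compress`, the Euclidean distance, the rule that a sample is compared only against leaders of its own class, and the toy data are all assumptions made for this example; the report itself may define these details differently.

```python
import math

def leader_compress(samples, labels, threshold):
    """Single pass over the data; keep one representative (leader) per cluster.

    Assumptions for this sketch: Euclidean distance, and a sample is only
    compared against leaders that share its class label.
    """
    leaders, leader_labels = [], []
    for x, y in zip(samples, labels):
        # The sample is absorbed if a same-class leader lies within the threshold.
        absorbed = any(
            ly == y and math.dist(x, lx) <= threshold
            for lx, ly in zip(leaders, leader_labels)
        )
        if not absorbed:
            # Otherwise the sample starts a new cluster and becomes its leader.
            leaders.append(x)
            leader_labels.append(y)
    return leaders, leader_labels

# Hypothetical toy data: two tight groups plus one class-1 point near the class-0 group.
samples = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.05, 0.05)]
labels = [0, 0, 1, 1, 1]

leaders, leader_labels = leader_compress(samples, labels, threshold=0.5)
print(len(samples), "->", len(leaders), "training samples kept")
```

In this sketch, raising `threshold` merges more samples into each cluster and keeps fewer leaders, which is how a cluster threshold can provide a variable compression ratio. The retained leaders and their labels would then be passed to a classifier such as SVM or KNN in place of the full training set.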
9 Pages
Additional Publication Information: To be published in Journal of Pattern Recognition Research, Vol 6, No 1 (2011).
External Posting Date: February 21, 2011 [Fulltext]. Approved for External Publication
Internal Posting Date: February 21, 2011 [Fulltext]