A Statistical Combined Classifier and its Application to Region and Image Classification
Simske, Steven J.
HPL-2005-179
External
Keyword(s): archiving; zoning analysis; image classification; classifier; binary classification; normal; combined classifiers
Abstract: A new method for combining classifiers is introduced for two problem types. (1) Archiving and re-purposing are automated using zoning analysis that performs segmentation (region boundary definition), classification (region typing) and bit-depth determination. For throughput reasons, zoning analysis is often performed on a low-resolution (e.g. 50-100 ppi) representation of the document. At these resolutions, heuristic metrics for classification are required. Reported here are metrics for distinguishing photos and color drawings, and a novel classification technique based solely on the statistics of each heuristic metric. The statistical technique allows ready combination of multiple binary classifiers, and provides a lower classification error than simple voting or metric-confidence techniques. This technique permits new metrics to improve the overall classification. The benefit of this technique on archival optimization is shown. (2) The classification of documents with sparse text, and video analysis, both rely on accurate image classification. We herein present a method for binary classification that accommodates any number of individual classifiers. Each individual classifier is defined by the critical point between its two means, and its relative weighting is inversely proportional to its expected error rate. Using 10 simple image analysis metrics, we distinguish a set of "natural" and "city" scenes, providing a "semantically meaningful" classification. The optimal combination of 5 of these 10 classifiers provides 85.8% accuracy on a small (120 image) feasibility corpus. When this feasibility corpus is then split into half training and half testing images, the mean accuracy of the optimum set of classifiers was 81.7%. Accuracy as high as 90% was obtained for the test set when training percentage was increased.
These results demonstrate that an accurate classifier can be constructed from a large pool of simple classifiers through the use of the statistical ("Normal") classification method described herein.
Notes: 19 Pages
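
The weighting idea in the abstract (each classifier defined by a critical point between its two class means, weighted inversely to its expected error rate) can be illustrated with a minimal sketch. The abstract does not give the formulas, so the details below are assumptions, not the report's method: the critical point is placed the same number of standard deviations from each class mean, the expected error is taken from the Normal tail beyond that point, and the classifiers are combined by a weighted vote. The names `fit_metric` and `classify` and the class labels "A" and "B" are hypothetical.

```python
import math
from statistics import mean, stdev


def normal_cdf(z):
    """Cumulative distribution function of the standard Normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fit_metric(values_a, values_b):
    """Fit one single-metric binary classifier from training values of two classes."""
    mu_a, mu_b = mean(values_a), mean(values_b)
    sd_a, sd_b = stdev(values_a), stdev(values_b)
    # Assumption: the critical point lies equally many standard
    # deviations (z) away from each of the two class means.
    critical = (mu_a * sd_b + mu_b * sd_a) / (sd_a + sd_b)
    z = abs(mu_b - mu_a) / (sd_a + sd_b)
    # Assumption: expected error is the Normal tail mass past the critical point.
    expected_error = 1.0 - normal_cdf(z)
    return {
        "critical": critical,
        "a_is_low": mu_a < mu_b,
        # Weight inversely proportional to expected error (guarded against zero).
        "weight": 1.0 / max(expected_error, 1e-9),
    }


def classify(classifiers, sample):
    """Weighted vote over all metrics; `sample` holds one value per classifier."""
    score = 0.0
    for clf, x in zip(classifiers, sample):
        votes_a = (x < clf["critical"]) == clf["a_is_low"]
        score += clf["weight"] if votes_a else -clf["weight"]
    return "A" if score >= 0 else "B"
```

In this sketch a metric whose class distributions are well separated receives a large weight and dominates the vote, while a metric with heavily overlapping distributions (expected error near 50%) contributes little.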