Technical Reports

HP Labs



A Statistical Combined Classifier and its Application to Region and Image Classification

Simske, Steven J.

HPL-2005-179
External

Keyword(s): archiving; zoning analysis; image classification; classifier; binary classification; normal; combined classifiers

Abstract: A new method for combining classifiers is introduced for two problem types.

(1) Archiving and re-purposing are automated using zoning analysis, which performs segmentation (region boundary definition), classification (region typing) and bit-depth determination. For performance throughput reasons, zoning analysis is often performed on a low-resolution (e.g. 50-100 ppi) representation of the document. At these resolutions, heuristic metrics for classification are required. Reported here are metrics for distinguishing photos from color drawings, and a novel classification technique based solely on the statistics of each heuristic metric. The statistical technique allows ready combination of multiple binary classifiers, and provides a lower classification error than simple voting or metric-confidence techniques. It also allows new metrics to be added to improve the overall classification. The benefit of this technique for archival optimization is shown.

(2) The classification of documents with sparse text, as well as video analysis, relies on accurate image classification. Here we present a method for binary classification that accommodates any number of individual classifiers. Each individual classifier is defined by the critical point between its two means, and its relative weighting is inversely proportional to its expected error rate. Using 10 simple image analysis metrics, we distinguish a set of "natural" and "city" scenes, providing a "semantically meaningful" classification. The optimal combination of 5 of these 10 classifiers provides 85.8% accuracy on a small (120-image) feasibility corpus. When this feasibility corpus is then split into half training and half testing images, the mean accuracy of the optimum set of classifiers was 81.7%. Accuracy as high as 90% was obtained on the test set when the training percentage was increased.
These results demonstrate that an accurate classifier can be constructed from a large pool of simple classifiers through the use of the statistical ("Normal") classification method described herein.

Notes: 19 Pages
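The following is a minimal Python sketch of the combination rule described in part (2) of the abstract. Each binary classifier is defined by a critical point between its two class means, and its weight is inversely proportional to its expected error rate. This is not the report's implementation. It rests on these assumptions:

- Each metric's two classes are summarized by their means and population standard deviations.
- The critical point is placed where a value lies the same number of standard deviations from both means.
- The error rate is estimated on the training data, with half an error added so that a perfect metric still gets a finite weight.
- The classifiers are combined by a weighted vote.

The names `NormalClassifier`, `fit_metric`, `fit` and `predict` are hypothetical.

```python
# Hypothetical sketch of combining binary "Normal" classifiers by inverse-error weighting.
# Assumptions (not from the report): population std-devs, a critical point equidistant
# from both means in std-dev units, smoothed training error, and a weighted vote.
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Sequence


@dataclass
class NormalClassifier:
    critical_point: float  # threshold between the two class means
    high_label: int        # label predicted for values above the threshold
    weight: float          # inverse of the estimated error rate

    def predict(self, x: float) -> int:
        return self.high_label if x > self.critical_point else 1 - self.high_label


def fit_metric(values: Sequence[float], labels: Sequence[int],
               eps: float = 1e-6) -> NormalClassifier:
    """Fit one binary classifier to a single metric (labels must be 0 or 1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    mu0, mu1 = mean(a), mean(b)
    s0, s1 = pstdev(a) + eps, pstdev(b) + eps  # eps avoids a zero spread
    # Point between the means that lies the same number of std-devs from each mean.
    crit = (mu0 * s1 + mu1 * s0) / (s0 + s1)
    clf = NormalClassifier(crit, 1 if mu1 > mu0 else 0, weight=0.0)
    errors = sum(clf.predict(v) != y for v, y in zip(values, labels))
    # Weight = 1 / smoothed error rate, where error rate = (errors + 0.5) / (n + 1).
    clf.weight = (len(values) + 1) / (errors + 0.5)
    return clf


def fit(rows: Sequence[Sequence[float]], labels: Sequence[int]) -> List[NormalClassifier]:
    """Fit one classifier per metric (column) of the training rows."""
    return [fit_metric([row[j] for row in rows], labels) for j in range(len(rows[0]))]


def predict(classifiers: Sequence[NormalClassifier], row: Sequence[float]) -> int:
    """Weighted vote: each classifier adds +weight for label 1 and -weight for label 0."""
    score = sum(c.weight if c.predict(x) == 1 else -c.weight
                for c, x in zip(classifiers, row))
    return 1 if score > 0 else 0
```

For example, suppose each training row holds one value per image-analysis metric and the labels are 0 ("city") and 1 ("natural"). Then `fit(rows, labels)` returns one classifier per metric, and `predict(classifiers, row)` returns the weighted-vote label. Because of the inverse-error weighting, a few reliable metrics can outvote several unreliable ones. Under these assumptions, that is what distinguishes this rule from the simple voting the abstract compares against.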

© 2009 Hewlett-Packard Development Company, L.P.