HP Labs Technical Reports




Model-Independent Measure of Regression Difficulty

Zhang, Bin; Elkan, Charles; Dayal, Umeshwar; Hsu, Meichun

HPL-2000-5

Keyword(s): data mining; machine learning; model fitting; regression; exploratory data analysis

Abstract: We prove an inequality bound for the variance of the error of a regression function plus its non-smoothness as quantified by the Uniform Lipschitz condition. The coefficients in the inequality are calculated based on training data with no assumptions about how the regression function is learned. This inequality, called the Unpredictability Inequality, allows us to evaluate the difficulty of the regression problem for a given dataset, before applying any regression method. The Inequality gives information on the tradeoff between prediction error and how sensitive predictions must be to predictor values. The Unpredictability Inequality can be applied to any convex subregion of the space X of predictors. We improve the effectiveness of the Inequality by partitioning X into multiple convex subregions via clustering, and then applying the Inequality on each subregion. Experimental results on genuine data from a manufacturing line show that, combined with clustering, the Unpredictability Inequality provides considerable insight and help in selecting a regression method.
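The abstract's workflow — partition the predictor space into convex subregions, then assess predictability in each region from training data alone, before fitting any model — can be illustrated with a toy sketch. This is not the paper's Unpredictability Inequality (its coefficients and the Lipschitz term are not reproduced here); it merely uses within-region response variance as a crude, model-independent difficulty proxy, with equal-count intervals of a one-dimensional predictor standing in for clustering.

```python
# Illustrative sketch only: stands in for the paper's Unpredictability
# Inequality, which bounds prediction-error variance plus a Lipschitz
# non-smoothness term. Here we follow the same model-independent workflow
# but score each convex subregion by plain response variance instead.
from statistics import pvariance

def region_difficulty(xs, ys, n_regions=3):
    """Split a 1-D predictor into equal-count intervals (convex subregions)
    and return the response variance inside each region."""
    pairs = sorted(zip(xs, ys))
    size = max(1, len(pairs) // n_regions)
    regions = [pairs[i:i + size] for i in range(0, len(pairs), size)]
    return [pvariance([y for _, y in r]) for r in regions if len(r) > 1]

# A response that is smooth on the left half of the predictor range but
# noisy on the right: the right-hand region shows much higher variance,
# flagging it as harder to predict before any regression is attempted.
xs = [i / 10 for i in range(40)]
ys = [x if x < 2.0 else x + ((-1) ** i) * 1.5 for i, x in enumerate(xs)]
print(region_difficulty(xs, ys, n_regions=2))
```

A regression method would then be chosen, or tuned, with the high-variance subregions in mind, which is the kind of guidance the paper reports on the manufacturing-line data.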

19 Pages


