HP Labs Technical Reports
Model-Independent Measure of Regression Difficulty
 Zhang, Bin; Elkan, Charles; Dayal, Umeshwar; Hsu, Meichun
 HPL-2000-5
Keyword(s): data mining; machine learning; model fitting; regression; exploratory data analysis
Abstract: We prove an inequality bound on the variance of the error of a regression function plus its non-smoothness as quantified by the Uniform Lipschitz condition. The coefficients in the inequality are calculated from training data with no assumptions about how the regression function is learned. This inequality, called the Unpredictability Inequality, allows us to evaluate the difficulty of the regression problem for a given dataset before applying any regression method. The Inequality gives information on the tradeoff between prediction error and how sensitive predictions must be to predictor values. The Unpredictability Inequality can be applied to any convex subregion of the space X of predictors. We improve the effectiveness of the Inequality by partitioning X into multiple convex subregions via clustering, and then applying the Inequality on each subregion. Experimental results on genuine data from a manufacturing line show that, combined with clustering, the Unpredictability Inequality provides considerable insight and help in selecting a regression method.
  19 Pages
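The abstract's core idea, that a Lipschitz smoothness constraint plus the training data alone already limit how well any regression function can fit, can be illustrated with a simple sketch. This is not the paper's Unpredictability Inequality; it is an elementary pairwise bound under the same Uniform Lipschitz assumption, and all names below are hypothetical. If f is L-Lipschitz, then |y_i - y_j| <= |r_i| + |r_j| + L*||x_i - x_j|| for residuals r = y - f(x), so the larger of the two residuals is at least (|y_i - y_j| - L*||x_i - x_j||)/2.

```python
import numpy as np

def lipschitz_error_lower_bound(X, y, L):
    """Lower bound on the worst-case training residual of ANY
    L-Lipschitz regression function f.

    For each pair (i, j):
        |y_i - y_j| <= |r_i| + |r_j| + L * ||x_i - x_j||
    so  max(|r_i|, |r_j|) >= (|y_i - y_j| - L * ||x_i - x_j||) / 2.
    The maximum of this quantity over all pairs is a model-free
    lower bound, computable before fitting anything.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise distances in predictor space and gaps in the response.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    gaps = np.abs(y[:, None] - y[None, :]) - L * dists
    return max(0.0, float(gaps.max()) / 2.0)

# Sweeping L traces the tradeoff the abstract describes: a smoother
# (smaller-L) predictor cannot fit the training data as closely.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 2))
y = np.sin(8.0 * X[:, 0]) + 0.1 * rng.normal(size=50)
for L in (0.0, 2.0, 8.0):
    print(L, lipschitz_error_lower_bound(X, y, L))
```

As in the paper's clustering refinement, the same bound could be applied separately within each convex subregion of X (e.g., clusters from k-means), which typically tightens it because distant, unrelated pairs no longer dominate.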