Technical Reports

HP Labs


Dependence of Clustering Algorithm Performance on Clustered-ness of Data

Zhang, Bin

HPL-2001-91

Keyword(s): clustering; K-Means; K-Harmonic Means; EM; Data Mining

Abstract: Intuitively, clustering algorithms should work better on datasets that have well-separated clusters. We found the contrary for center-based clustering algorithms, including K-Means, K-Harmonic Means, and EM. We generated 1200 synthetic datasets with varying ratios of inter-cluster variance to within-cluster variance, which we call the clustered-ness of the dataset. We ran K-Means, K-Harmonic Means, and EM on these datasets and found that the ratio of the performance to the global optimum grows with increasing clustered-ness. The dependence of clustering algorithm performance on other parameters, namely quality of initialization and dimensionality of data, is also demonstrated.
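
To make the clustered-ness ratio concrete, the following is a minimal sketch of how such a quantity might be computed for a labeled synthetic dataset. The abstract does not give the exact formula, so the definitions used here (size-weighted squared distance of cluster means from the overall mean, divided by the summed squared distance of points from their own cluster mean) are an assumption, and the names make_dataset and clusteredness are hypothetical helpers, not taken from the report.

```python
import random
from statistics import fmean


def make_dataset(centers, spread, n_per_cluster, seed=0):
    """Hypothetical generator: Gaussian points around each given center."""
    rng = random.Random(seed)
    points, labels = [], []
    for k, center in enumerate(centers):
        for _ in range(n_per_cluster):
            points.append([rng.gauss(x, spread) for x in center])
            labels.append(k)
    return points, labels


def clusteredness(points, labels):
    """Assumed definition: inter-cluster variance / within-cluster variance.

    Both variances are sums of squared distances; the common 1/n factor
    cancels in the ratio. Raises ZeroDivisionError if every point sits
    exactly on its cluster mean.
    """
    dim = len(points[0])
    overall = [fmean(p[d] for p in points) for d in range(dim)]

    # Group the points by their cluster label.
    groups = {}
    for p, k in zip(points, labels):
        groups.setdefault(k, []).append(p)

    between = 0.0  # spread of cluster means around the overall mean
    within = 0.0   # spread of points around their own cluster mean
    for members in groups.values():
        mean = [fmean(p[d] for p in members) for d in range(dim)]
        between += len(members) * sum((m - o) ** 2 for m, o in zip(mean, overall))
        within += sum(
            sum((x - m) ** 2 for x, m in zip(p, mean)) for p in members
        )
    return between / within


if __name__ == "__main__":
    centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    for spread in (0.5, 2.0, 5.0):
        pts, lbl = make_dataset(centers, spread, n_per_cluster=50)
        print(spread, clusteredness(pts, lbl))
```

Under this assumed definition, shrinking the spread around fixed centers raises the ratio, which is one way a family of datasets with varying clustered-ness could be produced. For the four one-dimensional points 0, 2, 10, 12 with labels 0, 0, 1, 1, the ratio works out to 100 / 4 = 25.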

12 Pages


© 2009 Hewlett-Packard Development Company, L.P.
