Hewlett-Packard
WW
Search
Assistance
HP Labs Home
Spacer
Research
News
Job Openings
Technical Reports
Spacer
Locations
Bristol, UK
Israel
Japan
Palo Alto, USA

Spacer

 

 

HP Labs Technical Reports



Click here for full text: PDF

K-Harmonic Means - A Data Clustering Algorithm

Zhang, Bin; Hsu, Meichun; Dayal, Umeshwar

HPL-1999-124

Keyword(s): clustering; K-Means; K-Harmonic Means; data mining

Abstract: Data clustering is one of the common techniques used in data mining. A popular performance function for measuring goodness of data clustering is the total within-cluster variance, or the total mean-square quantization error (MSE). The K-Means (KM) algorithm is a popular algorithm which attempts to find a K- clustering which minimizes MSE. The K-Means algorithm is a centerbased clustering algorithm. The dependency of the K-Means performance on the initialization of the centers is a major problem; a similiar issue exists for an alternative algorithm, Expectation Maximization(EM), although to a lesser extent. In this paper, we propose a new clustering method called the K-Harmonic Means algorithm (KHM). KHM is a center- based clustering algorithm which uses the Harmonic Averages of the distances from each data point to the centers as components to its performance function. It is demonstrated that K-Harmonic Means is essentially insensitive to the initialization of the centers. In certain cases, K-Harmonic Means significantly improves the quality of clustering results comparing with both K-Means and EM, which are the two most popular clustering algorithms used in data exploration and data compression. A unified view of the three performance functions, K-Means', K-Harmonic Means' and EM's, are given for comparison. Experimental results of KHM comparing with KM on high dimensional data and visualization of the animation of the convergence of all three algorithms using 2-dimensional data are given.

25 Pages

Back to Index


HP Bottom Banner
Terms of Use Privacy Statement