Tech Report: HPL-1999-124: K-Harmonic Means

HP Labs Technical Reports

Click here for full text:

K-Harmonic Means - A Data Clustering Algorithm

Zhang, Bin; Hsu, Meichun; Dayal, Umeshwar

HPL-1999-124

Keyword(s): clustering; K-Means; K-Harmonic Means; data mining

Abstract: Data clustering is one of the common techniques used in data mining. A popular performance function for measuring goodness of data clustering is the total within-cluster variance, or the total mean-square quantization error (MSE). The K-Means (KM) algorithm is a popular algorithm which attempts to find a K- clustering which minimizes MSE. The K-Means algorithm is a centerbased clustering algorithm. The dependency of the K-Means performance on the initialization of the centers is a major problem; a similiar issue exists for an alternative algorithm, Expectation Maximization(EM), although to a lesser extent. In this paper, we propose a new clustering method called the K-Harmonic Means algorithm (KHM). KHM is a center- based clustering algorithm which uses the Harmonic Averages of the distances from each data point to the centers as components to its performance function. It is demonstrated that K-Harmonic Means is essentially insensitive to the initialization of the centers. In certain cases, K-Harmonic Means significantly improves the quality of clustering results comparing with both K-Means and EM, which are the two most popular clustering algorithms used in data exploration and data compression. A unified view of the three performance functions, K-Means', K-Harmonic Means' and EM's, are given for comparison. Experimental results of KHM comparing with KM on high dimensional data and visualization of the animation of the convergence of all three algorithms using 2-dimensional data are given.

25 Pages

Back to Index