Hewlett-Packard
WW
Search
Assistance
HP Labs Home
Spacer
Research
News
Job Openings
Technical Reports
Spacer
Locations
Bristol, UK
Israel
Japan
Palo Alto, USA

Spacer

 

 

HP Labs Technical Reports



Click here for full text: PDF

A Local Search Approach to K-Clustering

Zhang, Bin; Kleyner, Gary; Hsu, Meichun

HPL-1999-119

Keyword(s): local search algorithm; data clustering; K-means; clustering aggregated data; clustering large data sets; data compression, vector quantization

Abstract: Data clustering is one of the common techniques used in data mining. A popular performance function for measuring goodness of the K-clustering is the total within-cluster variance, or the total mean-square quantization error (MSE). The K-Means (KM) algorithm is a popular algorithm which attempts to find a K- clustering which minimizes MSE. In this paper, we approach the min-MSE clustering problem by way of a Local Search (LS) algorithm, and analytically derive a clustering algorithm which we call LKM. A number of analyses of LKM are given; in particular, we prove that the set of local optima that can trap KM is a superset of those that can trap LKM. The experimental results also show that LKM converges faster and better than KM. More importantly, LKM naturally extends to an aggregated version, called A-LKM, which can be applied to the problem of clustering large data sets. A-LKM is a clustering algorithm which clusters subsets of data points, or subclusters, instead of individual data points. It can be used to cluster, for example, a large data set that has been aggregated through an algorithm such as the Phase 1 of the BIRCH algorithm ([ZRL96]), with the intention of fitting the aggregated data into the main memory to enable main memory-based clustering. We prove that A-LKM, as applied to the problem of clustering subclusters, preserve the monotone convergence property. Experimental results also show that A-LKM performs better than A-KM, in clustering aggregated data.

20 Pages

Back to Index


HP Bottom Banner
Terms of Use Privacy Statement