Incorporating Stability and Error-based Constraints for a Novel Partitional Clustering Algorithm
Abstract
Data clustering is one of the major areas of data mining. The bisecting clustering algorithm is
one of the most widely used algorithms for high-dimensional datasets, but its performance degrades
as the dimensionality increases. In addition, selecting a cluster for further bisection is a
challenging task. To overcome these drawbacks, we developed a novel partitional clustering
algorithm called HB-K-Means (High dimensional Bisecting K-Means). To
improve the performance of this algorithm, we incorporate two constraints, a stability-based
measure and the Mean Square Error (MSE), resulting in the CHB-K-Means (Constraint-based
High dimensional Bisecting K-Means) algorithm. The CHB-K-Means algorithm generates two
initial partitions. Subsequently, it calculates the stability and MSE for each partition generated.
Inference techniques are applied on the stability and MSE values of the two partitions to select
the next partition for the re-clustering process. This process is repeated until K
clusters are obtained. In our experiments, the algorithm achieves an average clustering
accuracy of 75%. A comparative analysis against traditional algorithms shows that the
proposed approach achieves higher clustering accuracy, with an increase in
computation time.
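
The bisect, score, and re-bisect loop described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the stability measure or the inference rule, so both are assumptions here: stability is approximated as the agreement between repeated 2-means runs on the same cluster, and the cluster with the largest MSE x stability score is chosen for the next bisection. All function names (two_means, agreement, evaluate, chb_sketch) are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def mse(points):
    """Mean squared distance of the points to their own centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points) / len(points)

def two_means(points, rng, iters=20):
    """Plain 2-means (Lloyd iterations); returns one 0/1 label per point."""
    centers = rng.sample(points, 2)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                  for p in points]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a side is empty
                centers[k] = centroid(members)
    return labels

def agreement(a, b):
    """Fraction of points labelled consistently, ignoring a 0/1 label swap."""
    same = sum(x == y for x, y in zip(a, b))
    return max(same, len(a) - same) / len(a)

def evaluate(points, rng, runs=5):
    """Return (MSE, stability, reference labels) for one cluster.

    ASSUMPTION: stability is the mean agreement between a reference
    bisection and (runs - 1) re-runs; the paper may define it differently.
    """
    ref = two_means(points, rng)
    others = [two_means(points, rng) for _ in range(runs - 1)]
    stability = sum(agreement(ref, o) for o in others) / len(others)
    return mse(points), stability, ref

def chb_sketch(points, k, seed=0):
    """Bisect repeatedly until k clusters exist (or nothing can be split)."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        best = None
        for i, c in enumerate(clusters):
            if len(c) < 2:
                continue
            err, stab, labels = evaluate(c, rng)
            if 0 < sum(labels) < len(c):  # both halves are non-empty
                score = err * stab  # ASSUMPTION: stand-in selection rule
                if best is None or score > best[0]:
                    best = (score, i, labels)
        if best is None:
            break
        _, i, labels = best
        chosen = clusters.pop(i)
        clusters.append([p for p, lab in zip(chosen, labels) if lab == 0])
        clusters.append([p for p, lab in zip(chosen, labels) if lab == 1])
    return clusters
```

Calling chb_sketch(points, k) on a list of equal-length numeric tuples returns a list of k clusters when the points are distinct and k does not exceed their number. Re-evaluating every cluster on each pass keeps the sketch short but repeats work, so a practical version would cache the scores.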