Nonnegative matrix factorization based consensus for clusterings with a variable number of clusters

Imran Khan, Zongwei Luo*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

Consensus clustering aggregates a set of base clusterings into a single ensemble clustering that is better than the individual base clusterings. It is particularly useful for identifying clusters in heterogeneous data. This paper presents a new approach that generates a set of good-quality base clusterings and aggregates them into a single clustering solution. The approach consists of two phases. In the first phase, we present a new tree-based $k$-means algorithm for building different base clusterings. The algorithm builds a cluster tree, which yields one base clustering. The tree generation process uses two stopping criteria that are based on the underlying data distribution of the data set. By varying the input parameter of the tree generation algorithm, we produce multiple cluster trees, each of which yields a base clustering with a variable number of clusters. In the second phase, we propose a new nonnegative matrix factorization-based consensus method that combines the base clusterings into a final clustering. We also investigate the quality and diversity of the base clusterings, which often have a large influence on the performance of consensus clustering. Experimental results on various real-world and synthetic data sets demonstrate that the proposed algorithm outperforms well-known algorithms in terms of clustering accuracy.
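The abstract does not specify the consensus formulation, so the following is only a minimal sketch of one common way to apply nonnegative matrix factorization to base clusterings with differing numbers of clusters. It assumes a co-association matrix (the fraction of base clusterings in which two points share a cluster) factorized with standard multiplicative updates. The function names `coassociation` and `nmf_consensus`, and all of their parameters, are hypothetical and are not taken from the paper; the tree-based $k$-means phase is not reproduced here.

```python
import numpy as np


def coassociation(labelings):
    """Fraction of base clusterings in which each pair of points shares a cluster.

    `labelings` has one row per base clustering and one column per data point.
    Rows may use different numbers of clusters.
    """
    labelings = np.asarray(labelings)
    m, n = labelings.shape
    C = np.zeros((n, n))
    for labels in labelings:
        # Pairwise indicator: 1 where two points carry the same label.
        C += labels[:, None] == labels[None, :]
    return C / m


def nmf_consensus(C, k, n_iter=500, n_restarts=5, seed=0, eps=1e-9):
    """Assumed consensus step: factorize C ~ W @ H and read clusters off W.

    Uses multiplicative updates for the Frobenius objective and keeps the
    restart with the lowest reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    best_err, best_labels = np.inf, None
    for _ in range(n_restarts):
        W = rng.random((n, k)) + eps
        H = rng.random((k, n)) + eps
        for _ in range(n_iter):
            H *= (W.T @ C) / (W.T @ W @ H + eps)
            W *= (C @ H.T) / (W @ H @ H.T + eps)
        err = np.linalg.norm(C - W @ H)
        if err < best_err:
            # Rescale columns of W by the norm of the matching row of H so that
            # the argmax is not affected by the W/H scaling ambiguity.
            scores = W * np.linalg.norm(H, axis=1)
            best_err, best_labels = err, scores.argmax(axis=1)
    return best_labels
```

Calling `nmf_consensus(coassociation(labelings), k)` on a list of equal-length label vectors returns one consensus label per data point. The number of consensus clusters `k` is supplied by the caller in this sketch, whereas the paper may determine it differently.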

Original language: English
Article number: 8542938
Pages (from-to): 73158-73169
Number of pages: 12
Journal: IEEE Access
Volume: 6
Publication status: Published - 2018
Externally published: Yes

Keywords

  • Consensus clustering
  • base clusterings
  • cluster tree
  • consensus function

ASJC Scopus subject areas

  • General Computer Science
  • General Materials Science
  • General Engineering
