Variable weighting in fuzzy k-Means clustering to determine the number of clusters

Imran Khan

doi:10.1109/TKDE.2019.2911582

Computer Science

Research output: Contribution to journal › Article › peer-review

39 Citations (Scopus)

Abstract

One of the most significant problems in cluster analysis is determining the number of clusters in unlabeled data, which most clustering algorithms require as input. Some methods have been developed to address this problem. However, little attention has been paid to algorithms that are insensitive to the initialization of cluster centers and that use variable weights to recover the number of clusters. To fill this gap, we extend the standard fuzzy k-means clustering algorithm. The extended algorithm automatically determines the number of clusters by iteratively calculating the weights of all variables and the membership value of each object in all clusters. Two new steps are added to the fuzzy k-means clustering process. The first introduces a penalty term that makes the clustering process insensitive to the initial cluster centers. The second uses a formula to iteratively update the variable weights in each cluster based on the current partition of the data. Experimental results on real-world and synthetic datasets show that the proposed algorithm effectively determines the correct number of clusters when initialized with different numbers of cluster centroids. We also tested the proposed algorithm on gene data to determine a subset of important genes.
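As a point of reference, the standard fuzzy k-means procedure that the abstract describes extending can be sketched as follows. This is a minimal pure-Python sketch of the baseline only, assuming squared Euclidean distance and a fuzzifier m = 2; it does not include the paper's penalty term or its variable-weight update, whose formulas are not given on this page. The function name `fuzzy_k_means` and its parameters are illustrative and not taken from the paper.

```python
import math


def fuzzy_k_means(points, init_centers, m=2.0, iters=100, tol=1e-6):
    """Baseline fuzzy k-means (assumed sketch: no penalty term, no variable weights).

    points       -- list of equal-length numeric tuples
    init_centers -- initial cluster centers, one tuple per cluster
    m            -- fuzzifier (> 1); larger values give softer memberships
    Returns (centers, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j.
    """
    dim = len(points[0])
    k = len(init_centers)
    centers = [list(c) for c in init_centers]
    memberships = []
    for _ in range(iters):
        # Membership update: u[i][j] is proportional to d(i, j)^(-2 / (m - 1)).
        memberships = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            if min(d2) == 0.0:
                # The point coincides with a center: use a crisp membership.
                zero = d2.index(0.0)
                row = [1.0 if j == zero else 0.0 for j in range(k)]
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                total = sum(inv)
                row = [v / total for v in inv]
            memberships.append(row)
        # Center update: mean of all points, weighted by u^m.
        new_centers = []
        for j in range(k):
            w = [row[j] ** m for row in memberships]
            total = sum(w)
            new_centers.append(
                [sum(wi * p[t] for wi, p in zip(w, points)) / total
                 for t in range(dim)]
            )
        # Stop once no center moves by more than tol.
        shift = max(math.dist(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships
```

Note that this baseline takes the number of clusters as given through `init_centers` and its result can depend on those starting values, which are the two limitations the abstract says the extended algorithm is designed to address.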

Original language	English
Article number	8692620
Pages (from-to)	1838-1853
Number of pages	16
Journal	IEEE Transactions on Knowledge and Data Engineering
Volume	32
Issue number	9
DOIs	https://doi.org/10.1109/TKDE.2019.2911582
Publication status	Published - Sep 1 2020

ASJC Scopus subject areas

Information Systems
Computer Science Applications
Computational Theory and Mathematics

Access to Document

10.1109/TKDE.2019.2911582

Other files and links

Link to publication in Scopus

Cite this

@article{b8522b2b56494bb18ccb79cea9e49d1b,

title = "Variable weighting in fuzzy k-Means clustering to determine the number of clusters",

abstract = "One of the most significant problems in cluster analysis is determining the number of clusters in unlabeled data, which most clustering algorithms require as input. Some methods have been developed to address this problem. However, little attention has been paid to algorithms that are insensitive to the initialization of cluster centers and that use variable weights to recover the number of clusters. To fill this gap, we extend the standard fuzzy $k$-means clustering algorithm. The extended algorithm automatically determines the number of clusters by iteratively calculating the weights of all variables and the membership value of each object in all clusters. Two new steps are added to the fuzzy $k$-means clustering process. The first introduces a penalty term that makes the clustering process insensitive to the initial cluster centers. The second uses a formula to iteratively update the variable weights in each cluster based on the current partition of the data. Experimental results on real-world and synthetic datasets show that the proposed algorithm effectively determines the correct number of clusters when initialized with different numbers of cluster centroids. We also tested the proposed algorithm on gene data to determine a subset of important genes.",

keywords = "Fuzzy k-means, clustering, data mining, number of clusters, variable weighting",

author = "Imran Khan",

note = "Publisher Copyright: {\textcopyright} 1989-2012 IEEE.",

year = "2020",

month = sep,

day = "1",

doi = "10.1109/TKDE.2019.2911582",

language = "English",

volume = "32",

pages = "1838--1853",

journal = "IEEE Transactions on Knowledge and Data Engineering",

issn = "1041-4347",

publisher = "IEEE Computer Society",

number = "9",

}

TY - JOUR

T1 - Variable weighting in fuzzy k-Means clustering to determine the number of clusters

AU - Khan, Imran

PY - 2020/9/1

Y1 - 2020/9/1

N2 - One of the most significant problems in cluster analysis is determining the number of clusters in unlabeled data, which most clustering algorithms require as input. Some methods have been developed to address this problem. However, little attention has been paid to algorithms that are insensitive to the initialization of cluster centers and that use variable weights to recover the number of clusters. To fill this gap, we extend the standard fuzzy k-means clustering algorithm. The extended algorithm automatically determines the number of clusters by iteratively calculating the weights of all variables and the membership value of each object in all clusters. Two new steps are added to the fuzzy k-means clustering process. The first introduces a penalty term that makes the clustering process insensitive to the initial cluster centers. The second uses a formula to iteratively update the variable weights in each cluster based on the current partition of the data. Experimental results on real-world and synthetic datasets show that the proposed algorithm effectively determines the correct number of clusters when initialized with different numbers of cluster centroids. We also tested the proposed algorithm on gene data to determine a subset of important genes.

AB - One of the most significant problems in cluster analysis is determining the number of clusters in unlabeled data, which most clustering algorithms require as input. Some methods have been developed to address this problem. However, little attention has been paid to algorithms that are insensitive to the initialization of cluster centers and that use variable weights to recover the number of clusters. To fill this gap, we extend the standard fuzzy k-means clustering algorithm. The extended algorithm automatically determines the number of clusters by iteratively calculating the weights of all variables and the membership value of each object in all clusters. Two new steps are added to the fuzzy k-means clustering process. The first introduces a penalty term that makes the clustering process insensitive to the initial cluster centers. The second uses a formula to iteratively update the variable weights in each cluster based on the current partition of the data. Experimental results on real-world and synthetic datasets show that the proposed algorithm effectively determines the correct number of clusters when initialized with different numbers of cluster centroids. We also tested the proposed algorithm on gene data to determine a subset of important genes.

KW - Fuzzy k-means

KW - clustering

KW - data mining

KW - number of clusters

KW - variable weighting

UR - http://www.scopus.com/inward/record.url?scp=85090324994&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85090324994&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2019.2911582

DO - 10.1109/TKDE.2019.2911582

M3 - Article

AN - SCOPUS:85090324994

SN - 1041-4347

VL - 32

SP - 1838

EP - 1853

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

IS - 9

M1 - 8692620

ER -