Optimization of K-Means algorithm in grouping data using the statistical gap method

Alfiansyah Hasibuan; Djubir R.E. Kembuan; Christine Takarina Meitty Manoppo; Medi Hermanto Tinambunan

doi:10.35335/idss.v6i3.149

Published: Aug 24, 2023

Keywords:

Algorithm, clustering, EuStockMarkets, K-Means, Statistical gaps

Issue

Vol. 6 No. 3 (2023): September : Intelligent Decision Support System (IDSS)

Section

Articles

Alfiansyah Hasibuan

Universitas Negeri Manado, Manado, Indonesia

Djubir R.E. Kembuan

Universitas Negeri Manado, Manado, Indonesia

Christine Takarina Meitty Manoppo

Universitas Negeri Manado, Manado, Indonesia

Medi Hermanto Tinambunan

Universitas Negeri Manado, Manado, Indonesia

Abstract

This study examines the core concepts of the K-Means algorithm, including its algorithmic framework, computation steps, and practical applications. The calculations use EuStockMarkets, one of the built-in R datasets, and the analysis was carried out in RStudio. The purpose of the study is to optimize the K-Means algorithm when clustering a dataset by minimizing the objective function defined for the clustering process. Based on the results, each of the groups formed can be profiled. In the grouping results, the Statistical Gap method reached an accuracy of 75.7% in optimizing the clusters of the dataset, and minimizing the objective function by grouping with K-Means gave a result of 92.9%. The smaller the percentage in this grouping process, the better the clusters of the dataset are optimized. The authors apply the K-Means clustering algorithm to minimize the objective function when grouping the EuStockMarkets dataset, which consists of 4 variables, and use the Statistical Gap method to optimize the clusters of the dataset.
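The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' R code: it assumes the "Statistical Gap method" refers to the gap statistic of Tibshirani, Walther and Hastie, uses synthetic four-variable data in place of EuStockMarkets, and the function names (`kmeans`, `gap_statistic`, `make_blobs`) and the farthest-first initialisation are illustrative choices only.

```python
import math
import random
import statistics


def kmeans(points, k, rng, n_init=3, max_iter=50):
    """Lloyd's k-means; returns (labels, wss) for the best of n_init restarts."""
    best = None
    for _restart in range(n_init):
        # Farthest-first initialisation (an illustrative choice, not from the paper).
        centroids = [rng.choice(points)]
        while len(centroids) < k:
            centroids.append(
                max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
        for _step in range(max_iter):
            # Assignment step: attach every point to its nearest centroid.
            labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
            # Update step: move each centroid to the mean of its members.
            updated = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                updated.append(tuple(map(statistics.fmean, zip(*members)))
                               if members else centroids[j])
            if updated == centroids:
                break
            centroids = updated
        # Objective function: within-cluster sum of squared distances.
        wss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or wss < best[1]:
            best = (labels, wss)
    return best


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (best_k, gaps, errors) using the usual gap-statistic selection rule."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    # Reference sets: uniform samples over the bounding box of the data.
    refs = [[tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in points]
            for _ in range(n_refs)]
    gaps, errors = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(kmeans(points, k, rng)[1])
        ref_logs = [math.log(kmeans(ref, k, rng)[1]) for ref in refs]
        gaps.append(statistics.fmean(ref_logs) - log_w)
        errors.append(statistics.pstdev(ref_logs) * math.sqrt(1 + 1 / n_refs))
    # Choose the smallest k with Gap(k) >= Gap(k+1) - s(k+1).
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - errors[k]:
            return k, gaps, errors
    return k_max, gaps, errors


def make_blobs(rng, centers, n_per_blob=40, sd=0.5):
    """Synthetic stand-in data: Gaussian blobs around the given centers."""
    return [tuple(rng.gauss(c, sd) for c in center)
            for center in centers for _ in range(n_per_blob)]


# Hypothetical 4-variable data with three well-separated groups.
data = make_blobs(random.Random(42), [(10, 0, 0, 0), (0, 10, 0, 0), (0, 0, 10, 0)])
best_k, gaps, errors = gap_statistic(data, k_max=5)
print("suggested number of clusters:", best_k)
```

In this sketch the quantity K-Means minimizes is the within-cluster sum of squares returned as `wss`, and the gap statistic compares its logarithm against the same quantity on uniformly distributed reference data. To try it on the real dataset, the four EuStockMarkets columns would need to be exported from R and loaded as a list of 4-tuples.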

How to Cite

Hasibuan, A., Kembuan, D. R., Manoppo, C. T. M., & Tinambunan, M. H. (2023). Optimization of K-Means algorithm in grouping data using the statistical gap method. Journal of Intelligent Decision Support System (IDSS), 6(3), 112-120. https://doi.org/10.35335/idss.v6i3.149

Copyright and Licensing

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
