Optimization of K-Means algorithm in grouping data using the statistical gap method
Abstract
This study examines the core concepts of the K-Means algorithm: its algorithmic framework, computation steps, and practical applications. The calculations use the EuStockMarkets dataset available in R Studio, which consists of 4 variables, and R Studio is also the tool used for the analysis. The purpose of the study is to optimize the K-Means algorithm when clustering a dataset by minimizing the objective function defined for the clustering process. The authors apply K-Means to group the observations in EuStockMarkets and use the Statistical Gap method to optimize the clusters formed from the dataset. The results make it possible to profile each group that is formed. The Statistical Gap method reached an accuracy of 75.7% in optimizing the clusters of the dataset, and a value of 92.9% was obtained from minimizing the objective function when grouping with K-Means. According to the authors, the smaller the percentage in this grouping process, the better the clusters of the dataset are optimized.
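The two ingredients named in the abstract, K-Means minimizing a within-cluster objective and a gap-based criterion for choosing the number of clusters, can be illustrated with a minimal sketch. It is not the authors' R workflow. It assumes that the "Statistical Gap method" refers to the gap statistic of Tibshirani, Walther and Hastie, uses synthetic 4-variable data in place of EuStockMarkets, and all function names (`kmeans`, `best_wss`, `gap_statistic`, `choose_k`) are hypothetical helpers written for this illustration.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, iterations=50):
    """Lloyd's algorithm. Returns (centroids, labels, within-cluster sum of squares)."""
    centroids = rng.sample(points, k)  # random data points as starting centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        converged = new_labels == labels
        labels = new_labels
        if converged:
            break
    wss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wss


def best_wss(points, k, rng, restarts=5):
    """Lowest objective value over several random restarts."""
    return min(kmeans(points, k, rng)[2] for _ in range(restarts))


def gap_statistic(points, k_max, rng, n_refs=10):
    """Gap(k) = mean(log W_ref) - log W_data, with uniform reference data
    drawn from the bounding box of the observations (an assumed choice)."""
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    gaps, errors = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(best_wss(points, k, rng))  # assumes the objective is > 0
        ref_logs = []
        for _ in range(n_refs):
            ref = [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                   for _ in points]
            ref_logs.append(math.log(best_wss(ref, k, rng)))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((v - mean_ref) ** 2 for v in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)
        errors.append(sd * math.sqrt(1 + 1 / n_refs))
    return gaps, errors


def choose_k(gaps, errors):
    """Smallest k with Gap(k) >= Gap(k+1) - s(k+1); falls back to the largest k."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - errors[i + 1]:
            return i + 1
    return len(gaps)


# Synthetic stand-in data: three Gaussian blobs in 4 dimensions (not EuStockMarkets).
rng = random.Random(0)
centers = [(0, 0, 0, 0), (10, 10, 10, 10), (20, 0, 20, 0)]
points = [tuple(rng.gauss(c, 1.0) for c in center)
          for center in centers for _ in range(30)]

gaps, errors = gap_statistic(points, k_max=5, rng=rng)
k_selected = choose_k(gaps, errors)
print("Gap values:", [round(g, 3) for g in gaps])
print("Selected k:", k_selected)
```

The selected `k` can be compared with the three blobs that were generated. The outcome depends on the random seed, the number of restarts, and the number of reference datasets, so this is best read as an illustration of the procedure, not a reproduction of the reported 75.7% and 92.9% figures.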

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.