Research on Performance Optimization of K-Means Algorithm on Large Dataset

Authors

  • Zhiyu Jiang, Wuhan Donghu University, Wuhan 430200, Hubei, China

Keywords

K-Means algorithm, Big datasets, Performance optimization, Clustering effect, Computational efficiency

Abstract

This article examines performance optimization methods for the K-Means algorithm on large datasets, with the goal of improving its efficiency and accuracy in large-scale data processing. Through theoretical analysis, it explores how K-Means can be optimized to address the challenges it faces on big datasets and to meet the demand for efficient data clustering in the big data era. The article focuses on the basic principles and practical methods of performance optimization, and aims to contribute new research results on applying K-Means to large-scale data processing.

Published

2024-10-27

How to Cite

Jiang, Z. (2024). Research on Performance Optimization of K-Means Algorithm on Large Dataset. International Journal of Advance in Applied Science Research, 3, 59–66. Retrieved from https://h-tsp.com/index.php/ijaasr/article/view/47
