Research on Performance Optimization of K-Means Algorithm on Large Dataset
Keywords:
K-Means algorithm, Big datasets, Performance optimization, Clustering effect, Computational efficiency
Abstract
This article examines methods for optimizing the performance of the K-Means algorithm on large datasets, with the goal of improving its efficiency and accuracy in large-scale data processing. Through theoretical analysis, it explores how K-Means can be adapted to the challenges that big datasets pose, in response to the demand for efficient data clustering in the big data era. The discussion focuses on the basic principles and practical methods of performance optimization, with the aim of contributing new results on the use of K-Means in large-scale data processing.