Modeling of Speech Recognition Based on Deep Learning

Min Zhang

Authors

Min Zhang, School of Computer Science, Xianyang Normal University, Xianyang, Shaanxi 712000, China

Keywords:

Deep learning, Speech recognition, Feature extraction, DFSMN model

Abstract

Speech recognition is being applied ever more widely, and intelligent speech recognition is correspondingly important. This article reviews the working principles and classification of speech recognition systems and describes the design of the system's development environment and framework. It then follows the pipeline for deep learning-based Chinese speech recognition: collecting speech datasets, preprocessing the speech data, extracting features, and building the acoustic and language models. The resulting system can record speech directly or upload pre-recorded speech to a server for Chinese recognition, and it can translate the recognized Chinese speech into English. The work provides a foundation for further research on speech recognition.
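The preprocessing stage named in the abstract can be illustrated with a minimal sketch. The abstract does not say which steps the article uses, so the steps below (pre-emphasis, framing, Hamming windowing) are assumptions drawn from common speech front ends, and the function names, the 0.97 pre-emphasis coefficient, and the 25 ms / 10 ms frame settings are illustrative choices rather than the author's configuration.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1] (assumed step)."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[i] - alpha * signal[i - 1] for i in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames, dropping any trailing partial frame."""
    if len(signal) < frame_len:
        return []
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]


def hamming(n):
    """Symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def preprocess(signal, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Pre-emphasize, frame, and window a mono signal (hypothetical settings)."""
    frame_len = sample_rate * frame_ms // 1000
    hop = sample_rate * hop_ms // 1000
    window = hamming(frame_len)
    frames = frame_signal(pre_emphasis(signal), frame_len, hop)
    return [[s * w for s, w in zip(frame, window)] for frame in frames]


if __name__ == "__main__":
    # One second of a synthetic 440 Hz tone stands in for recorded speech.
    sr = 16000
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    frames = preprocess(tone, sample_rate=sr)
    print(len(frames), len(frames[0]))
```

In a full system, each windowed frame would then go to a feature extractor (for example filterbank or MFCC features) before reaching the acoustic model; that stage is omitted here because the abstract does not specify it.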

