Control Systems and Computers, N4, 2024, Article 5

Control Systems and Computers, 2024, Issue 4 (308), pp. 

Yevhen Mrozek, PhD Student, Department of Speech Recognition and Synthesis, International Research and Training Center for Information Technologies and Systems NAS and MES of Ukraine,
40, Akademika Glushkova Avenue, Kyiv, Ukraine, 03187, ORCID: https://orcid.org/0009-0008-4989-5016, zekamrozek@gmail.com

ANALYSIS OF MODERN APPROACHES TO SPEECH RECOGNITION TASKS

Introduction. Modern approaches to speech recognition are needed because of the rapid development of artificial intelligence and the demand for more accurate and faster human-computer interaction in areas such as voice assistants, translation, and automation. The topic is increasingly relevant given the growing volume of generated audio data and the need for real-time processing, particularly in Ukrainian contexts, where multiple languages and dialects coexist. Several approaches to speech recognition, analysis, and transcription currently exist, including neural-network-based methods, speaker diarization, noise removal, and data structuring. However, creating a universal solution that meets the needs of multilingual environments and handles unstructured audio data effectively remains an open challenge.

Purpose. To review existing tools and algorithms for solving speech recognition tasks, particularly for Ukrainian.

Methods. Speech recognition, deep learning, transformers.

Results. The theoretical foundations of speech recognition approaches and models were reviewed with a view to building a knowledge base for a multilingual spoken dialogue system. Examples of effectively improving transcription accuracy for languages with limited data were also explored, along with possible steps to increase system speed. Candidate datasets for model training were discussed.

Conclusion. A structured review of modern methods for processing and analyzing multilingual audio files was provided, outlining their advantages, disadvantages, and unresolved issues.

Keywords: speech recognition, neural networks, machine learning.