
https://doi.org/10.15407/csc.2023.04.019

Control Systems and Computers, 2023, Issue 4 (304), pp. 19-28

UDC 004.934

Sazhok M.M., Ph.D. (Eng.), Head of the Department, International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov Ave., 40, Kyiv, 03187, Ukraine, ORCID: https://orcid.org/0000-0003-1169-6851, E-mail: sazhok@gmail.com,

Robeiko V.V., Research Fellow, Taras Shevchenko National University of Kyiv, Glushkov Ave., 4g, Kyiv, 03022, Ukraine, ORCID: https://orcid.org/0000-0003-2266-7650, E-mail: valia.robeiko@gmail.com,

Smoliakov Ye.A., International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov Ave., 40, Kyiv, 03187, Ukraine, ORCID: https://orcid.org/0000-0002-8272-2095, E-mail: egorsmkv@gmail.com,

Zabolotko T.O., International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov Ave., 40, Kyiv, 03187, Ukraine, ORCID: https://orcid.org/0009-0002-1575-3091, E-mail: wariushas@gmail.com,

Seliukh R.A., Research Associate, International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov Ave., 40, Kyiv, 03187, Ukraine, ORCID: https://orcid.org/0000-0003-2230-8746, E-mail: vxml12@gmail.com,

Fedoryn D.Ya., Research Associate, International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov Ave., 40, Kyiv, 03187, Ukraine, ORCID: https://orcid.org/0000-0002-4924-225X, E-mail: enomaj@gmail.com,

Yukhymenko O.A., Research Fellow, International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov Ave., 40, Kyiv, 03187, Ukraine, ORCID: https://orcid.org/0000-0001-5868-8547, E-mail: enomaj@gmail.com

Modeling Domain Openness in Speech Information Technologies

The paper addresses the need to apply automatic speech transcription systems across diverse subject domains, covering a wide range of acoustic conditions, individual speaker characteristics and content contexts, and taking elements of multilingualism into account. The described approaches to modeling broad classes of noise and interference and to removing vocabulary restrictions made it possible to improve the robustness of the developed speech information technologies and systems to domain openness.

Full text is available in Ukrainian.

Keywords: speech, speech signal, analysis, recognition, automatic speech signal transcription systems, speech information technologies.
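
Since the full text is in Ukrainian and the code behind the described noise-and-interference modeling is not reproduced here, the sketch below is only an illustration of the general technique the abstract refers to: corrupting clean speech with additive noise at a randomly drawn signal-to-noise ratio, a standard augmentation step when training transcription systems for open acoustic conditions. The function mix_at_snr, the SNR range and the synthetic signals are hypothetical choices, not taken from the paper.

    import numpy as np

    def mix_at_snr(speech, noise, snr_db):
        """Additively mix noise into speech at a target SNR in dB."""
        # Tile or trim the noise clip so it covers the whole utterance.
        if len(noise) < len(speech):
            noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
        noise = noise[:len(speech)]
        # Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db.
        speech_power = np.mean(speech ** 2) + 1e-12
        noise_power = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
        return speech + scale * noise

    # Hypothetical usage: corrupt a clean utterance at a random SNR drawn
    # from a range spanning mild to severe acoustic conditions.
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)   # stand-in for 1 s of 16 kHz speech
    noise = rng.standard_normal(4000)    # stand-in for a recorded noise clip
    noisy = mix_at_snr(clean, noise, snr_db=rng.uniform(0.0, 20.0))

Drawing the SNR (and, in practice, the noise type) at random for every training example is what lets a single model cover a wide class of acoustic conditions rather than one fixed recording setup.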

  1. Ugan, E.Y., Huber, C., Hussain, J., Waibel, A. (2023). “Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition”. arXiv preprint arXiv:2210.08992v2 [cs.CL], 3 Jul 2023.
  2. Lyudovyk, T., Pylypenko, V. (2014). “Code-switching speech recognition for closely related languages. In Spoken Language Technologies for Under-Resourced Languages”. International Research and Training Center for Information Technologies and Systems, Kyiv, Ukraine.
  3. Lovenia, H., Cahyawijaya, S., Winata, G. I., Xu, P., Yan, X., Liu, Z., …, Fung, P. (2021). “ASCEND: A spontaneous Chinese-English dataset for code-switching in multi-turn conversation”. arXiv preprint arXiv:2112.06223 [cs.CL], 3 May 2022.
  4. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., …, Schwarz, P. (2011). “The Kaldi speech recognition toolkit”. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society. IEEE Catalog No.: CFP11SRW-USB.
  5. Sazhok, M., Robeiko, V. (2013). “Lexical Stress-Based Morphological Decomposition and Its Application for Ukrainian Speech Recognition”. In Text, Speech, and Dialogue: 16th International Conference, TSD 2013, Pilsen, Czech Republic, September 1-5, 2013. Proceedings 16, pp. 327-334. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_42
  6. Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., …, Ochiai, T. (2018). “ESPnet: End-to-end speech processing toolkit”. In Proc. Interspeech 2018. arXiv preprint arXiv:1804.00015. https://doi.org/10.21437/Interspeech.2018-1456
  7. Sazhok, M.M., Seliukh, R.A., Fedoryn, D.Ya., Yukhymenko, O.A., Robeiko, V.V. (2019). “Automatic Speech Recognition for Ukrainian Broadcast Media Transcribing”. Control Systems and Computers, No. 6 (264), pp. 46-57. https://doi.org/10.15407/csc.2019.06.046
  8. Yao, Z., Wu, D., Wang, X., Zhang, B., Yu, F., Yang, C., …, Lei, X. (2021). “WeNet: Production oriented streaming and non-streaming end-to-end speech recognition toolkit”. In Proc. Interspeech 2021, Brno, Czechia. arXiv preprint arXiv:2102.01547. https://doi.org/10.21437/Interspeech.2021-1983
  9. Lu, Y. J., Chang, X., Li, C., Zhang, W., Cornell, S., Ni, Z., …, Watanabe, S. (2022). “ESPnet-SE++: Speech enhancement for robust speech recognition, translation, and understanding”. In Proc. Interspeech 2022, pp. 5458-5462. arXiv preprint arXiv:2207.09514. https://doi.org/10.21437/Interspeech.2022-10727
  10. Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., Sutskever, I. (2023). “Robust speech recognition via large-scale weak supervision”. In International Conference on Machine Learning, pp. 28492-28518. PMLR. arXiv preprint arXiv:2212.04356 [eess.AS]. https://arxiv.org/abs/2212.04356
  11. Vintsiuk, T.K., Sazhok, M.M., Seliukh, R.A., Fedoryn, D.Ya., Yukhymenko, O.A., Robeiko, V.V. (2018). “Automatic recognition, understanding and synthesis of speech signals in Ukraine”. Control Systems and Computers, No. 6 (278), pp. 7-24. https://doi.org/10.15407/usim.2018.06.007 (In Ukrainian).
  12. Sazhok, M., Seliukh, R., Fedoryn, D., Yukhymenko, O., Robeiko, V. (2020). “Written form extraction of spoken numeric sequences in speech-to-text conversion for Ukrainian”. In CEUR Workshop Proceedings, pp. 442-451. https://ceur-ws.org/Vol-2604/paper32.pdf
  13. Sazhok, M.M., Poltyeva, A., Robeiko, V., Seliukh, R., Fedoryn, D. (2021). “Punctuation Restoration for Ukrainian Broadcast Speech Recognition System based on Bidirectional Recurrent Neural Network and Word Embeddings”. In Proceedings of the 4th International Conference on Computational Linguistics and Intelligent Systems 2021 (COLINS-2021), pp. 300-310. https://ceur-ws.org/Vol-2870/paper25.pdf
  14. Zasukha, D. (2023). “Using Thumbnail Length Bounds To Improve Audio Thumbnailing For Beatles Songs”. Shtuchnyi Intelekt (Artificial Intelligence), 28 (1), pp. 60-65 (In Ukrainian). https://doi.org/10.15407/jai2023.01.060
  15. Pariente, M., Cornell, S., Cosentino, J. et al. (2020). “Asteroid: the PyTorch-based audio source separation toolkit for researchers”. In Proc. Interspeech 2020. arXiv preprint arXiv:2005.04132. https://doi.org/10.21437/Interspeech.2020-1673
  16. Huh, M., Ray, R., Karnei, C. (2023). “A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit”. arXiv preprint arXiv:2303.00510. https://doi.org/10.48550/arXiv.2303.00510

Received 09.11.2023