Control Systems and Computers, N6, 2019, Article 5

https://doi.org/10.15407/csc.2019.06.046

Control Systems and Computers, 2019, Issue 6 (284), pp. 46-57.

UDC 004.934

Sazhok M.M., Ph.D. (Eng.), head of the department, International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov ave, 40, Kyiv, 03187, Ukraine, E-mail: sazhok@gmail.com,

Seliukh R.A., Research Associate, International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov ave, 40, Kyiv, 03187, Ukraine,

Fedoryn D.Ya., Research Associate, International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov ave, 40, Kyiv, 03187, Ukraine,

Yukhymenko O.A., Research fellow, International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov ave, 40, Kyiv, 03187, Ukraine,

Robeiko V.V., Research fellow, Taras Shevchenko National University of Kyiv, Glushkov ave., 4g, 03022, Kyiv, Ukraine

Automatic Speech Recognition for Ukrainian Broadcast Media Transcribing

A speech-to-text conversion scheme was implemented that presents the recognition results for broadcast recordings in a form convenient both for a novice user and for further automatic processing. In particular, the resulting text makes clear what is being discussed, factual material (names, numbers, dates, etc.) is tracked, punctuation marks ease reading, and the overall cost of manually editing the text into a final transcript is reduced.
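The readable-transcript idea above can be illustrated with a toy post-processing step. The sketch below is purely illustrative and is not the authors' system: the paper relies on neural models for recognition and punctuation restoration (cf. ref. 13), whereas this hypothetical `format_transcript` helper only shows the kind of surface formatting, attaching predicted punctuation tokens and restoring capitalization, that makes raw ASR output easier to read and edit.

```python
# Illustrative sketch only (hypothetical helper, not the paper's method):
# given raw ASR tokens interleaved with predicted punctuation tokens,
# attach punctuation to the preceding word and capitalize sentence starts.

def format_transcript(raw_tokens):
    """Turn a flat token stream into a readable transcript string."""
    out = []
    capitalize_next = True
    for tok in raw_tokens:
        if tok in {".", ",", "?", "!"}:
            if out:
                out[-1] += tok              # attach punctuation to previous word
            capitalize_next = tok != ","    # a new sentence follows . ? !
        else:
            out.append(tok.capitalize() if capitalize_next else tok)
            capitalize_next = False
    return " ".join(out)

# raw ASR output with predicted punctuation tokens interleaved
raw = ["the", "meeting", "starts", "at", "ten", ".", "please", "be", "on", "time", "."]
print(format_transcript(raw))
# -> The meeting starts at ten. Please be on time.
```

In a real system this formatting stage would also restore numerals and dates from their spoken forms, which is part of what reduces manual editing cost.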


Keywords: speech, speech signal, analysis, recognition, understanding, synthesis.

  1. Vintsiuk, T.K., 1987. Analysis, Recognition and Interpretation of Speech Signals. Kyiv: Naukova dumka, 264 p.
  2. Furui, S., 2005. “50 years of progress in speech and speaker recognition”. In Proc. of 10th Int. Conf. “Speech and Computer”, Patras, Greece, pp. 1—9.
  3. Hinton, G., Deng, L., Yu, D., Dahl, G. et al., 2012. “Deep Neural Networks for Acoustic Modeling in Speech Recognition”. IEEE Signal Processing Magazine, 29(6), pp. 82-97.
    https://doi.org/10.1109/MSP.2012.2205597
  4. Tan, Z.-H., Sarkar, A.K., Dehak, N., 2019. “rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method”. Computer Speech and Language.
  5. Mohri, M., Pereira, F., Riley, M., 2008. “Speech recognition with weighted finite-state transducers”. Springer Handbook of Speech Processing. Springer-Verlag, pp. 559-584.
    https://doi.org/10.1007/978-3-540-49127-9_28
  6. Allauzen, C., Riley, M., Schalkwyk, J., Skut, W., Mohri, M., 2007. “OpenFst: A General and Efficient Weighted Finite-State Transducer Library”. In Proc. CIAA.
  7. Povey, D., Ghoshal, A., Boulianne, G. et al., 2011. “The Kaldi Speech Recognition Toolkit”. In Proc. IEEE 2011 Workshop on Automatic Speech Recognition and Understanding.
  8. Shyrokov, V.A., Manako, V.V., 2001. “Orhanizatsiya resursiv natsionalnoyi slovnykovoyi bazy” [Organization of the national dictionary database resources]. Movoznavstvo, pp. 3-13. (In Ukrainian).
  9. Robeyko, V.V., Sazhok, M.M., 2011. “Bahatoznachna bahatorivneva model peretvorennya orfohrafichnoho tekstu na fonemnyy” [A multi-valued multilevel model for converting orthographic text into phonemic transcription]. Shtuchnyy intelekt [Artificial Intelligence], 4. Donetsk, pp. 117-125. (In Ukrainian).
  10. CMU Dictionary, http://www.speech.cs.cmu.edu/cgi-bin/cmudict/.
  11. Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., Ouellet, P., 2011. “Front-End Factor Analysis for Speaker Verification”. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), pp. 788-798.
    https://doi.org/10.1109/TASL.2010.2064307
  12. Zewoudie, A.W., Luque, J., Hernando, J., 2018. “The use of long-term features for GMM- and i-vector-based speaker diarization systems”. EURASIP Journal on Audio, Speech, and Music Processing, 14.
    https://doi.org/10.1186/s13636-018-0140-x
  13. Tilk, O., Alumäe, T., 2016. “Bidirectional Recurrent Neural Network with Attention Mechanism for Punctuation Restoration”. In Proc. Interspeech, pp. 3047-3051.
    https://doi.org/10.21437/Interspeech.2016-1517
  14. Safarik, R., Nouza, J., 2017. “Unified Approach to Development of ASR Systems for East Slavic Languages”. In: Camelin N., Esteve Y., Martin-Vide C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol 10583. Springer, Cham.
    https://doi.org/10.1007/978-3-319-68456-7_16
  15. Sazhok, N.N., Robeiko, V.V., Fedoryn, D.Ya., Selyukh, R.A., 2015. “Broadcast Speech-to-Text System for the Ukrainian”. Upravlyayushchie sistemy i mashiny, 6, pp. 66-73. (In Russian).
  16. Sazhok, M.M., Marikovskyy, O.V., Martynenko, M.R., Robeyko, V.V., Selyukh, R.A., Fedoryn, D.Ya., 2016. “Systema avtomatychnoho monitorynhu mediynoho prostoru na osnovi tekhnolohiy rozpiznavannya slukhovykh i zorovykh obraziv” [A system for automatic monitoring of the media space based on auditory and visual pattern recognition technologies]. In Proc. of the international scientific conference “Intelligent Decision-Making Systems and Problems of Computational Intelligence”. Zaliznyy Port, pp. 309-310. (In Ukrainian).

Received 26.11.19