Control Systems and Computers, N6, 2018, Article 1

http://usim.org.ua/arch/2018/6/5.pdf

DOI: https://doi.org/10.15407/usim.2018.06.007

Upr. sist. maš., 2018, Issue 6 (278), pp. 7-24.

UDC 004.934

Vintsuk T.K., Doctor (Eng.), Prof., head of the department, pioneer in speech technology and systems, International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov ave., 40, Kyiv, 03187, Ukraine.

Sazhok M.M., Ph.D. (Eng.), head of the department, International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov ave., 40, Kyiv, 03187, Ukraine, E-mail: sazhok@gmail.com.

Selukh R.A., Research Associate, International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov ave., 40, Kyiv, 03187, Ukraine.

Fedorin D.Ya., Research Associate, International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov ave., 40, Kyiv, 03187, Ukraine.

Yukhimenko O.A., Research fellow, International Research and Training Center for Information Technologies and Systems of the NAS of Ukraine and MES of Ukraine, Glushkov ave., 40, Kyiv, 03187, Ukraine.

Robeyko V.V., Research fellow, Taras Shevchenko National University of Kyiv, Glushkov ave., 4g, 03022, Kyiv, Ukraine

Automatic recognition, understanding and synthesis of speech signals in Ukraine

Introduction. Speech is the most convenient, habitual, accessible and fastest means of communication between people and is, therefore, the most suitable means of communication between humans and machines. This makes the development of automatic speech recognition and synthesis systems a topical task for national science, technology and culture.

Purpose. The purpose is to analyze the current state of automatic recognition, understanding and synthesis of Ukrainian speech, and of translation from Ukrainian Sign Language into spoken Ukrainian, and to outline the main approaches to solving the problems in these areas.

Methods. Modeling human spoken intellectual activity with the analysis-by-synthesis approach, supported by experimental research and validation under real application conditions.
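The analysis-by-synthesis idea named under Methods can be illustrated with a toy isolated-word recognizer: for every hypothesis, a generative model produces the signal it would expect, that expected signal is compared with the observed one, and the best-matching hypothesis is chosen. This is a minimal sketch under stated assumptions, not the authors' implementation. The "generative model" here is a hypothetical lookup of stored one-dimensional reference feature sequences (`synthesize` and the `templates` dictionary are illustrative names), and dynamic-programming alignment (dynamic time warping, DTW) is assumed as the comparison measure.

```python
"""Toy analysis-by-synthesis recognizer (illustrative sketch only).

Assumption: each vocabulary word is "synthesized" by a hypothetical
generative model that simply returns a stored reference feature sequence.
The synthesized sequence is compared with the observed one by dynamic
time warping (DTW), and the best-matching word is returned.
"""
from typing import Dict, List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """DTW with absolute-difference local cost and steps (1,0), (0,1), (1,1)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def synthesize(word: str, templates: Dict[str, List[float]]) -> List[float]:
    """Hypothetical generative model: returns the stored template for `word`."""
    return templates[word]


def recognize(observed: Sequence[float], templates: Dict[str, List[float]]) -> str:
    """Analysis by synthesis: pick the word whose synthesized signal matches best."""
    return min(templates, key=lambda w: dtw_distance(observed, synthesize(w, templates)))


if __name__ == "__main__":
    # Toy vocabulary with made-up feature values (not real speech data).
    toy_templates = {"tak": [1.0, 3.0, 3.0, 1.0], "ni": [0.0, 5.0, 0.0]}
    print(recognize([1.0, 1.0, 3.0, 3.0, 3.0, 1.0], toy_templates))  # prints "tak"
```

The observed sequence is a time-stretched copy of the "tak" template, so DTW aligns it with zero cost, while every frame mismatches "ni". In a real system the lookup would be replaced by a trained generative model of speech units and the one-dimensional values by acoustic feature vectors.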

Results. The proposed methods and algorithms, adapted to specific hardware/software platforms, make it possible to develop speech information systems that meet the growing expectations of potential users. The contemporary spoken information systems described here generalize better and are less sensitive to speaker and domain in speech analysis, and they produce highly natural synthesized speech. Owing to these achievements, spoken information input and retrieval can be partially or fully automated, in particular for Ukrainian.

Conclusion. For decades, methods and algorithms based on the generative model have proven productive in speech technologies and systems, which makes them widely applicable today. The internationally recognized Ukrainian research school builds on its history and traditions, shows steady development, and is ready to tackle emerging problems related to multilingual, multimodal and acoustically adverse environments.

Full text: in Ukrainian.

Keywords: speech, speech signal, analysis, recognition, understanding, synthesis.


Received 05.12.18