Control Systems and Computers

Control Systems and Computers, N4, 2024, Article 4

https://doi.org/10.15407/csc.2024.04.034

Control Systems and Computers, 2024, Issue 4 (308), pp. 34-38.

UDC 681.3.062

Marchenko Oleksandr O., Doctor of Physical and Mathematical Sciences, Professor, Head of the Department, International Research and Training Center for Information Technologies and Systems NAS and MES of Ukraine, Glushkov ave., 40, Kyiv, 03187, Ukraine, omarchenko@univ.kiev.ua

Nasirov Emil M., PhD (Physical and Mathematical Sciences), Senior Researcher, International Research and Training Center for Information Technologies and Systems NAS and MES of Ukraine, Glushkov ave., 40, Kyiv, 03187, Ukraine, enasirov@gmail.com

Volosheniuk Dmytro O., PhD (Technical Sciences), Head of the Laboratory, International Research and Training Center for Information Technologies and Systems NAS and MES of Ukraine, Glushkov ave., 40, Kyiv, 03187, Ukraine, p-h-o-e-n-i-x@ukr.net

Building the Ukrainian-language Training Dataset for Determining the Sentiment Analysis of Texts

Introduction. The volume of news, social network pages, and Internet chats grows every day, and with it the amount of information that carries an emotional load. At the same time, the number of information threats is also growing. Under these conditions, systems that determine the emotional coloring of texts become highly relevant.

Purpose. Emotionally charged messages can be detected and classified with artificial intelligence methods, in particular with neural networks. Training a neural network requires a sample of texts whose emotional coloring has been assessed in advance. Such labeled training datasets exist for English-language news and texts; however, no publicly available labeled dataset of Ukrainian news and texts has been created to date.

Methods. Statistical sentiment analysis methods based on an extended tonality vocabulary were used to detect the tonality of texts.
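The abstract does not describe the scoring procedure itself. As a rough illustration of vocabulary-based tonality detection, here is a minimal sketch, assuming a lexicon that maps words to integer scores in [-2, 2] (similar in spirit to the lang-uk tone-dict-uk vocabulary, whose exact file format is not assumed here) and a simple averaging rule. The toy lexicon, the negation handling, the names `TOY_LEXICON`, `tonality_score` and `classify`, and the 0.5 threshold are all hypothetical and are not the authors' method.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon: word -> tonality score in [-2, 2].
# A real extended vocabulary would be loaded from a file; these entries are illustrative only.
TOY_LEXICON: Dict[str, int] = {
    "добрий": 1,
    "чудовий": 2,
    "радість": 2,
    "поганий": -1,
    "жахливий": -2,
    "загроза": -2,
}

# Assumed negation words; a negation flips the score of the very next token only.
NEGATIONS = {"не", "ні"}

# Lower-case Latin and Cyrillic letters, the Ukrainian-specific letters і, ї, є, ґ,
# and the apostrophe variants used inside Ukrainian words.
TOKEN_RE = re.compile(r"[a-zа-яіїєґ'’ʼ]+")


def tokenize(text: str) -> List[str]:
    """Lower-case the text and extract word tokens."""
    return TOKEN_RE.findall(text.lower())


def tonality_score(text: str, lexicon: Dict[str, int]) -> float:
    """Mean lexicon score of the tokens found in the lexicon (0.0 if none are found)."""
    total = 0
    matched = 0
    negate = False
    for token in tokenize(text):
        if token in NEGATIONS:
            negate = True
            continue
        if token in lexicon:
            score = lexicon[token]
            total += -score if negate else score
            matched += 1
        negate = False  # negation only affects the immediately following token
    return total / matched if matched else 0.0


def classify(text: str, lexicon: Dict[str, int], threshold: float = 0.5) -> str:
    """Map the mean score to a three-way label using a symmetric threshold."""
    score = tonality_score(text, lexicon)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"
```

For example, `classify("Це не поганий фільм", TOY_LEXICON)` returns `"positive"`, because the negation flips "поганий" from -1 to +1. Averaging over matched words rather than summing keeps long and short texts on the same scale. A production system would need lemmatization, since Ukrainian is highly inflected and a surface-form lexicon misses most word forms.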

Results. An extended tonality vocabulary of the Ukrainian language was built. A large corpus of 5,318,783 Ukrainian-language texts of various types, labeled by emotional coloring, was constructed; according to expert assessment, the labeling accuracy is 98%.
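The abstract reports a 98% labeling accuracy from expert assessment but does not say how it was estimated. One common approach, sketched here purely as an assumption, is to have experts relabel a uniform random sample of the corpus, compute the agreement rate, and attach a confidence interval (here the Wilson score interval). The sample size, seed, label values, and function names below are hypothetical and do not reproduce the authors' data.

```python
import math
import random
from typing import List, Sequence, Tuple


def draw_review_sample(corpus_size: int, sample_size: int, seed: int = 0) -> List[int]:
    """Pick distinct corpus indices uniformly at random for expert review."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(corpus_size), sample_size))


def labeling_accuracy(auto_labels: Sequence[str], expert_labels: Sequence[str]) -> float:
    """Share of sampled texts where the automatic label matches the expert label."""
    if not auto_labels or len(auto_labels) != len(expert_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    agreements = sum(a == e for a, e in zip(auto_labels, expert_labels))
    return agreements / len(auto_labels)


def wilson_interval(p: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion p observed on n items (z = 1.96 gives ~95%)."""
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical review: 500 sampled texts, experts disagree with 10 automatic labels.
    indices = draw_review_sample(5_318_783, 500, seed=42)
    auto = ["positive"] * 490 + ["negative"] * 10
    expert = ["positive"] * 500
    acc = labeling_accuracy(auto, expert)
    low, high = wilson_interval(acc, len(auto))
    print(f"reviewed {len(indices)} texts, accuracy {acc:.3f}, 95% CI [{low:.3f}, {high:.3f}]")
```

With 490 of 500 hypothetical sampled labels confirmed, the point estimate is 0.98 and the 95% Wilson interval is roughly 0.96 to 0.99. The Wilson interval is preferred over the plain normal approximation because it stays within [0, 1] and behaves better when accuracy is close to 1.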

Conclusion. The constructed text corpus can be used to train and test neural networks for sentiment analysis of Ukrainian-language texts.

Full text is available in Ukrainian.

Keywords: artificial intelligence, computational linguistics.


Received 01.11.2024
