Control Systems and Computers, N5, 2019, Article 6

https://doi.org/10.15407/csc.2019.05.048

Control Systems and Computers, 2019, Issue 5 (283), pp. 48-61.

UDC 004.83

Pogorilyy S.D., Doctor (Eng.), Professor, Head of the Computer Engineering Department, Faculty of Radiophysics, Electronics and Computer Systems, Taras Shevchenko National University of Kyiv, Glushkov ave., 4G, Kyiv, 03022, Ukraine

Kramov A.A., Postgraduate student, Computer Engineering Department, Faculty of Radiophysics, Electronics and Computer Systems, Taras Shevchenko National University of Kyiv, Glushkov ave., 4G, Kyiv, 03022, Ukraine

Method of Noun Phrase Detection in Ukrainian Texts

Introduction. Natural language processing involves AI-complete tasks that cannot be solved by traditional algorithmic means. Such tasks are commonly addressed with machine learning and the tools of computational linguistics. One text preprocessing task is the detection of noun phrases. The accuracy of this step affects the effectiveness of many other natural language processing tasks. Despite active research in natural language processing, the study of noun phrase detection in Ukrainian texts is still at an early stage.

Purpose. To carry out a comparative analysis of the main methods of noun phrase detection in English and Ukrainian texts; to create a complex method for detecting noun phrases in texts that accounts for the features of the Ukrainian language; and to examine the suggested method experimentally on a corpus of Ukrainian articles.

Results. Different methods of noun phrase detection have been analyzed. The expediency of representing sentences as tree structures has been justified. The key disadvantage of many noun phrase detection methods is that their effectiveness depends heavily on the features of a particular language. Given its unified format of sentence processing and the availability of a trained model for building sentence trees for Ukrainian texts, the Universal Dependencies model has been chosen. A complex method of noun phrase detection in Ukrainian texts that uses Universal Dependencies tools and a named-entity recognition model has been suggested. The effectiveness of the suggested method has been verified experimentally on a corpus of Ukrainian news. Different accuracy metrics of the method have been calculated.
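The tree-based idea can be illustrated with a minimal sketch. It is not the authors' implementation: the `Token` class, the helper names, the hand-annotated sample sentence, and the choice of precision, recall and F1 as accuracy metrics are all assumptions made for illustration. The sketch treats every NOUN or PROPN token in a Universal Dependencies style tree as the head of a candidate noun phrase and collects its subtree; the named-entity recognition step of the suggested method is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word of a sentence in a simplified CoNLL-U style (hypothetical helper)."""
    idx: int      # 1-based position in the sentence
    form: str     # surface form
    upos: str     # Universal POS tag
    head: int     # index of the syntactic head, 0 for the root
    deprel: str   # dependency relation to the head


NOMINAL_UPOS = {"NOUN", "PROPN"}


def subtree_indices(tokens, root_idx, skip_deprels):
    """Return sorted indices of root_idx and all of its dependents."""
    children = {}
    for tok in tokens:
        children.setdefault(tok.head, []).append(tok)
    collected, stack = [], [root_idx]
    while stack:
        current = stack.pop()
        collected.append(current)
        for child in children.get(current, []):
            if child.deprel not in skip_deprels:
                stack.append(child.idx)
    return sorted(collected)


def noun_phrases(tokens, skip_deprels=frozenset({"punct"})):
    """Candidate noun phrases: the subtree of every nominal token."""
    by_idx = {tok.idx: tok for tok in tokens}
    phrases = []
    for tok in tokens:
        if tok.upos in NOMINAL_UPOS:
            idxs = subtree_indices(tokens, tok.idx, skip_deprels)
            phrases.append(" ".join(by_idx[i].form for i in idxs))
    return phrases


def precision_recall_f1(predicted, gold):
    """Set-based exact-match metrics (an assumed choice of metrics)."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hand-annotated illustration, not parser output:
# "Students of the Kyiv university read interesting books."
SAMPLE = [
    Token(1, "Студенти", "NOUN", 4, "nsubj"),
    Token(2, "київського", "ADJ", 3, "amod"),
    Token(3, "університету", "NOUN", 1, "nmod"),
    Token(4, "читають", "VERB", 0, "root"),
    Token(5, "цікаві", "ADJ", 6, "amod"),
    Token(6, "книги", "NOUN", 4, "obj"),
    Token(7, ".", "PUNCT", 4, "punct"),
]

if __name__ == "__main__":
    for phrase in noun_phrases(SAMPLE):
        print(phrase)
```

In practice the tokens would come from a trained Universal Dependencies parser rather than from manual annotation, and nested candidates (such as a phrase contained in a longer one) would need a filtering policy that this sketch does not define.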

Conclusions. The results obtained indicate that the suggested method can be used to find noun phrases in Ukrainian texts. The accuracy of the method can be increased by using named-entity recognition models appropriate to the subject area.


Keywords: natural language processing, noun phrase, Universal Dependencies model, NER model, tree structure of a sentence.


Received 29.10.2019