Control Systems and Computers, N4, 2021, Article 2

 https://doi.org/10.15407/csc.2021.04.013

Control Systems and Computers, 2021, Issue 4 (294), pp. 13-18.

UDC 004.62:004.91

O.O. Letychevskyi, Doctor of Physical and Mathematical Sciences, head of the department,
V.M. Glushkov Institute of Cybernetics, National Academy of Sciences of Ukraine,
Glushkov ave., 40, Kyiv, 03187, Ukraine
oleksandr.letychevskyi@litsoft.com.ua

M.K. Morokhovets, Candidate of Physical and Mathematical Sciences, senior research worker,
V.M. Glushkov Institute of Cybernetics, National Academy of Sciences of Ukraine,
Glushkov ave., 40, Kyiv, 03187, Ukraine
marina.morokhovets@gmail.com

N.M. Shchogoleva, research worker,
V.M. Glushkov Institute of Cybernetics, National Academy of Sciences of Ukraine,
Glushkov ave., 40, Kyiv, 03187, Ukraine
natashch2904@gmail.com

Some Means for Processing Electronic Text Documents

Introduction. Digitization of legislation is an important area that the government has identified as a priority. Creating digital legal documents and verifying their compliance with the law is a necessary task in all areas of jurisprudence.

This raises the task of automatically formalizing a legal document written as free-form text in natural language.

Purpose. Preparing a document for storage in digital format and for further processing may require preliminary work on the original text.

When natural-language texts, legal texts in particular, are analyzed by automatic linguistic tools that process the text sequentially, sentence by sentence, problems of both a local and a global nature arise.

A local problem is created, in particular, by sentences whose considerable length makes them difficult to process with a given text-analysis tool. A global problem arises when the semantic connections between components of different sentences must be taken into account during automatic processing of the text. The purpose of this work is to develop means for overcoming these problems.

Results. A model for structuring long sentences that contain enumerations has been developed, together with a method for eliminating the synonymy of object names in a text intended for automatic analysis.

Conclusion. Marking up sentences that contain enumerations is useful, especially when the text is to be analyzed by a procedure that processes it sentence by sentence. Structuring a sentence with an enumeration makes it possible, on the one hand, to prepare the sentence for processing in parts and, on the other hand, to preserve the integrity of the sentence during such processing.
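
The abstract does not specify the markup model itself. As a purely illustrative sketch, assume that an enumeration is introduced by a colon and that its items are separated by semicolons; the class and function names below are hypothetical and are not taken from the paper. The sketch splits a sentence into processable parts while keeping the original sentence attached to them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StructuredSentence:
    """A sentence split into an introductory stem and enumeration items."""
    original: str      # the untouched sentence, kept to preserve integrity
    stem: str          # text that precedes the enumeration
    items: List[str]   # enumeration items (empty if none was detected)

    def parts(self) -> List[str]:
        """Return shorter fragments that can be processed one at a time."""
        if not self.items:
            return [self.original]
        return [f"{self.stem} {item}" for item in self.items]


def structure_sentence(sentence: str) -> StructuredSentence:
    """Split 'stem: item; item; item.' into a stem and its items.

    Assumption (not from the paper): a colon introduces the enumeration
    and semicolons separate the items. Real legal text needs a richer model.
    """
    stem, sep, tail = sentence.partition(":")
    if not sep or ";" not in tail:
        # No enumeration detected: keep the sentence as a single part.
        return StructuredSentence(sentence, sentence, [])
    items = [part.strip().rstrip(".").strip() for part in tail.split(";")]
    items = [item for item in items if item]
    return StructuredSentence(sentence, stem.strip(), items)


example = ("The tenant shall: pay the rent on time; "
           "keep the premises clean; report any damage.")
structured = structure_sentence(example)
for fragment in structured.parts():
    print(fragment)
```

Each fragment can be passed to a sentence-level analyzer on its own, while the `original` field retains the full sentence so that results for the parts can be related back to it.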

In the method for eliminating the synonymy of names proposed in this paper, both the step of identifying the names of objects and the step of establishing the identity of names require semantic analysis. An oracle was introduced to control the correctness of these steps and thereby improve the reliability of the result.
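
The role of an oracle in such a method can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the candidate names are taken as already identified, the oracle is modeled as a hypothetical callable that confirms whether two names denote the same object, and a lookup table stands in for it in the example.

```python
import re
from typing import Callable, Dict, List

# Hypothetical oracle type: returns True if two names denote the same object.
Oracle = Callable[[str, str], bool]


def unify_names(names: List[str], oracle: Oracle) -> Dict[str, str]:
    """Map every name to a canonical name confirmed by the oracle.

    The first name encountered in each group serves as its canonical form.
    """
    canonical: List[str] = []
    mapping: Dict[str, str] = {}
    for name in names:
        if name in mapping:
            continue
        match = next((c for c in canonical if oracle(c, name)), None)
        if match is None:
            canonical.append(name)
            match = name
        mapping[name] = match
    return mapping


def replace_names(text: str, mapping: Dict[str, str]) -> str:
    """Replace each synonym with its canonical name in a single pass."""
    if not mapping:
        return text
    # Longer names come first so they win over names they contain.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in ordered))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


# Stand-in oracle: a table of name pairs assumed to be confirmed as identical.
known_synonyms = {frozenset({"the Tenant", "the Lessee"})}


def table_oracle(a: str, b: str) -> bool:
    return frozenset({a, b}) in known_synonyms


names = ["the Tenant", "the Lessee", "the Landlord"]
mapping = unify_names(names, table_oracle)
print(replace_names("the Lessee pays the Landlord.", mapping))
```

Keeping the oracle behind a plain callable means that the same unification loop works whether the confirmation comes from a human expert, a dictionary, or an automatic semantic check.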


Keywords: digital legal documents, synonymy of object names, sentence structuring, trustworthiness of artificial intelligence systems.


Received 28.05.2021