Control Systems and Computers, N3, 2023, Article 2

https://doi.org/10.15407/csc.2023.03.015

Control Systems and Computers, 2023, Issue 3 (303), pp. 15-23

UDC 681.3.062

O.O. MARCHENKO, Doctor (physical and math.), professor, head of the department, International Research and Training Center for Information Technologies and Systems of the NAS and MES of Ukraine, ORCID: https://orcid.org/0000-0002-5408-5279, Glushkov ave., 40, Kyiv, 03187, Ukraine, omarchenko@univ.kiev.ua

E.M. NASIROV, PhD (physical and math.), Senior Research Associate, International Research and Training Center for Information Technologies and Systems of the NAS and MES of Ukraine, ORCID: https://orcid.org/0009-0006-9016-2602, Glushkov ave., 40, Kyiv, 03187, Ukraine, enasirov@gmail.com

METHODS OF DIMENSIONS REDUCTION IN TEXT PROCESSING ALGORITHMS

The paper describes dimensionality reduction methods that are widely used in artificial intelligence in general and in computational linguistics in particular, namely non-negative matrix factorization and singular value decomposition, from the standpoint of their use in latent semantic analysis and principal component analysis. The advantages and disadvantages of each method are given. Their computational complexity is investigated, and their performance is compared on dense and sparse matrices of different sizes. The authors also propose using these methods to reduce the dimensionality of multidimensional linguistic data arrays.

Keywords: artificial intelligence, computational linguistics, parallel computations.

Received 29.08.2023