Control Systems and Computers, N3, 2023, Article 3

https://doi.org/10.15407/csc.2023.03.024

Control Systems and Computers, 2023, Issue 3 (303), pp. 24-32

UDC 004.852

O.V. Radchenko, student, National Technical University of Ukraine “Ihor Sikorsky Kyiv Polytechnic Institute”, ORCID: https://orcid.org/0009-0002-5810-4526, 37 Beresteyskyi Avenue, Kyiv 03056, Ukraine, Radchenko.oleh@lll.kpi.ua

V.A. Pavlov, PhD, As. Prof., National Technical University of Ukraine “Ihor Sikorsky Kyiv Polytechnic Institute”, ORCID: https://orcid.org/0000-0002-3293-5308, 37 Beresteyskyi Avenue, Kyiv 03056, Ukraine, pavlov.volodymyr@lll.kpi.ua

O.K. Horodetska, PhD, As. Prof., National Technical University of Ukraine “Ihor Sikorsky Kyiv Polytechnic Institute”, ORCID: https://orcid.org/0000-0003-1288-3528, 37 Beresteyskyi Avenue, Kyiv 03056, Ukraine, o.nosovets@gmail.com

G.A. Korniienko, Senior Lecturer, National Technical University of Ukraine “Ihor Sikorsky Kyiv Polytechnic Institute”, ORCID: https://orcid.org/0000-0003-2104-5745, 37 Beresteyskyi Avenue, Kyiv 03056, Ukraine, galinakor5555@gmail.com

MULTICLASS CLASSIFIER BASED ON BINARY LOGISTIC REGRESSIONS OBTAINED ACCORDING TO THE PRINCIPLES OF GMDH

Introduction. Improving accuracy in classification tasks remains a topical problem. Various approaches have been developed and are applied according to the specifics of the problem formulation and the properties of the feature space. Classifiers based on multiple logistic regression have proved to be among the most effective models.

Purpose. The aim of the paper is to develop an algorithm for solving multiclass classification problems using binary logistic models built by the Stepwise multiple logistic regression algorithm, improved according to the principles of the Group Method of Data Handling (GMDH).

Methods. The paper proposes a modification of the Stepwise algorithm for building binary multivariate logistic regressions. The algorithm's parameters, namely the significance levels of the log-likelihood ratio test for including and excluding model arguments, are optimized according to the principles of the Group Method of Data Handling (GMDH). The optimal parameters are selected by an external criterion that takes into account the balance between classification accuracy on the training and test samples and the balance of classification accuracy across classes. The binary class models obtained by the one-versus-all principle are then combined into a multiclass classifier that returns the class with the maximum likelihood. The classification models obtained by the classical Stepwise algorithm and by the algorithm proposed in the paper are compared on medical data from the publicly available Kaggle platform.
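The building blocks named in the Methods paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are illustrative. The abstract does not give the form of the external criterion, so `balance_criterion` is a hypothetical stand-in that only rewards high accuracy and penalizes imbalance between samples and between classes. The log-likelihood ratio test is written for the case of adding or removing a single argument (one degree of freedom).

```python
import math
import numpy as np


def lr_test_pvalue(loglik_reduced, loglik_full):
    """p-value of the log-likelihood ratio test for one extra argument.

    G = 2 * (ll_full - ll_reduced) is compared with a chi-square
    distribution with 1 degree of freedom, whose survival function
    is erfc(sqrt(G / 2)).
    """
    g = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return math.erfc(math.sqrt(g / 2.0))


def balance_criterion(acc_train, acc_test, per_class_acc):
    """HYPOTHETICAL external criterion (assumed form, not from the paper).

    Higher is better: mean accuracy, minus the train/test accuracy gap,
    minus the spread of per-class accuracies.
    """
    per_class_acc = np.asarray(per_class_acc, dtype=float)
    mean_acc = 0.5 * (acc_train + acc_test)
    sample_gap = abs(acc_train - acc_test)
    class_gap = per_class_acc.max() - per_class_acc.min()
    return mean_acc - sample_gap - class_gap


def combine_one_vs_all(binary_probs):
    """Return the index of the class whose binary model gives the
    highest probability; rows are samples, columns are classes."""
    return np.argmax(np.asarray(binary_probs, dtype=float), axis=1)


# An argument enters the model if its p-value is below the inclusion
# level; in the modified algorithm that level is a tunable parameter.
alpha_in = 0.05  # illustrative value
p = lr_test_pvalue(loglik_reduced=-120.0, loglik_full=-115.0)
print("include argument:", p < alpha_in)

# Outputs of three one-versus-all models for two samples.
print(combine_one_vs_all([[0.2, 0.7, 0.1], [0.6, 0.3, 0.5]]))
```

In a full procedure, a grid of inclusion and exclusion significance levels would be scanned, a Stepwise model fitted for each pair, and the pair with the best external criterion value retained for each binary class model.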

Conclusion. The paper substantiates and demonstrates the advantages of classifiers based on multivariate logistic regressions optimized according to the principles of the Group Method of Data Handling (GMDH) over the classical version of the Stepwise algorithm. The effectiveness of the algorithm in solving multiclass classification problems is shown.

Download full text (in Ukrainian)

Keywords: multiclass classifier, stepwise algorithm, Stepwise, logistic regression, model optimization, external criterion, Group Method of Data Handling.


 Received 19.09.2023