Control Systems and Computers, N2, 2019, Article 3

Upr. sist. maš., 2019, Issue 2 (280), pp. 25-31.

UDC 004.023

Inna V. Stetsenko, D. Eng. Sc., Professor at National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Peremohy ave., 37, Kyiv, 03056, Ukraine

Yuriy S. Talko, Master of Information Systems and Technology, student at National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Peremohy ave., 37, Kyiv, 03056, Ukraine

Compression Methods of Deep Learning Models Based on the Student-Teacher Method

Introduction. Deep neural networks process large volumes of data (datasets) from the outside world, such as images, videos and large data arrays like statistics; when computing resources are limited, this leads to unacceptably long computation times. Compression methods have made it possible to significantly reduce the time needed to compute deep networks and, accordingly, to run them on mobile phones and other devices with limited computing resources. The article presents a compression method that uses a noise regularizer and knowledge distillation.
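To make the saving from compression concrete, the snippet below counts the weights and biases of two fully connected networks for 28×28 MNIST images with 10 classes. A dense layer with n_in inputs and n_out outputs has n_in·n_out weights plus n_out biases. The layer sizes are hypothetical illustrations, not the architectures used in the article.

```python
def dense_param_count(layer_sizes):
    """Number of weights plus biases in a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical sizes for MNIST (784 input pixels, 10 digit classes).
teacher_sizes = [784, 1200, 1200, 10]  # large "teacher" network
student_sizes = [784, 100, 10]         # compact "student" network

teacher_params = dense_param_count(teacher_sizes)
student_params = dense_param_count(student_sizes)
print(teacher_params, student_params, round(teacher_params / student_params, 1))
# prints: 2395210 79510 30.1
```

Because a forward pass through a dense layer costs roughly one multiply-add per weight, the student in this assumed configuration needs about 30 times fewer operations per image than the teacher.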

Purpose. The purpose of the article is to propose an effective way of compressing and training a model by modifying the knowledge distillation method.

Methods. To increase accuracy and reduce the number of errors of the model, a compression method is proposed that adds a regularizer to the student-teacher approach; the regularizer introduces Gaussian noise into the teacher’s knowledge.
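A minimal sketch of such a loss in plain Python follows. It assumes that the teacher’s knowledge is represented by its logits (pre-softmax outputs), that the Gaussian noise is added to those logits, and that temperature-softened targets are then formed as in Hinton et al.’s knowledge distillation. The names `noisy_distillation_loss`, `sigma` (noise standard deviation) and `temperature` are hypothetical; the article’s exact formulation may differ.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(targets, probs, eps=1e-12):
    """Cross-entropy H(targets, probs) between two discrete distributions."""
    return -sum(t * math.log(p + eps) for t, p in zip(targets, probs))


def noisy_distillation_loss(student_logits, teacher_logits,
                            sigma=0.5, temperature=2.0, rng=random):
    """Distillation loss with Gaussian noise on the teacher's logits (assumed form).

    Each call draws fresh noise from N(0, sigma^2) for every teacher logit,
    so the student is matched against a slightly different teacher each time.
    """
    noisy_teacher = [v + rng.gauss(0.0, sigma) for v in teacher_logits]
    soft_targets = softmax(noisy_teacher, temperature)
    student_probs = softmax(student_logits, temperature)
    # Soft-target gradients shrink as 1/T^2, so the loss is scaled by T^2
    # to keep its magnitude comparable across temperatures.
    return temperature ** 2 * cross_entropy(soft_targets, student_probs)


rng = random.Random(0)
teacher = [4.0, 1.0, -2.0]  # hypothetical teacher logits for one example
student = [3.0, 1.5, -1.0]  # hypothetical student logits for the same example
print(noisy_distillation_loss(student, teacher, rng=rng))
```

Setting `sigma=0.0` recovers ordinary distillation, which makes the noise level an explicit knob to tune alongside the temperature.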

Result. The experiments show that, if the data and the noise level are chosen correctly, the number of errors can be reduced to 11%. The proposed method also accelerates training of the student model, because the teacher has already been trained beforehand, and the regularizer reduces the number of mistakes made by the student network.
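The speed-up comes from the teacher being trained once and then frozen: its outputs can be computed a single time, cached and reused, and only the student is updated. The toy sketch below illustrates this with a hypothetical linear softmax student trained by stochastic gradient descent on cached teacher logits, with fresh Gaussian noise drawn at every step. Temperature is fixed at 1 for brevity, and the data, model and hyperparameters are invented for illustration; they do not reproduce the article’s experiments.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax (temperature 1 for simplicity)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, x):
    """Linear student: one weight row per class, bias stored as the last entry."""
    xb = x + [1.0]
    return [sum(w * xi for w, xi in zip(row, xb)) for row in weights]


def mean_cross_entropy(weights, inputs, teacher_logits):
    """Average cross-entropy between clean teacher targets and student outputs."""
    total = 0.0
    for x, t_logits in zip(inputs, teacher_logits):
        q = softmax(t_logits)
        p = softmax(predict(weights, x))
        total -= sum(qk * math.log(pk) for qk, pk in zip(q, p))
    return total / len(inputs)


def train_student(inputs, teacher_logits, n_classes,
                  sigma=0.3, lr=0.1, epochs=200, seed=0):
    """SGD on cached (frozen) teacher logits with fresh Gaussian noise per step."""
    rng = random.Random(seed)
    n_weights = len(inputs[0]) + 1  # features plus bias
    weights = [[0.0] * n_weights for _ in range(n_classes)]
    for _ in range(epochs):
        for x, t_logits in zip(inputs, teacher_logits):
            # The teacher is never re-run or updated; its cached logits are only perturbed.
            targets = softmax([v + rng.gauss(0.0, sigma) for v in t_logits])
            probs = softmax(predict(weights, x))
            xb = x + [1.0]
            # d(cross-entropy)/d(logit_k) = probs[k] - targets[k]
            for k in range(n_classes):
                g = probs[k] - targets[k]
                for j in range(n_weights):
                    weights[k][j] -= lr * g * xb[j]
    return weights


# Toy data: 2 input features, 3 classes, invented teacher outputs.
inputs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
teacher_logits = [[2.0, 0.0, -1.0], [0.0, 2.0, -1.0],
                  [-1.0, 0.0, 2.0], [-1.0, 1.0, 1.0]]

untrained = [[0.0] * 3 for _ in range(3)]
before = mean_cross_entropy(untrained, inputs, teacher_logits)
trained = train_student(inputs, teacher_logits, n_classes=3)
after = mean_cross_entropy(trained, inputs, teacher_logits)
print(f"cross-entropy vs. teacher: before {before:.3f}, after {after:.3f}")
```

In a real setting the cached logits would come from one pass of the trained teacher over the dataset, so each student epoch costs only student forward and backward passes.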

Conclusion. The proposed compression method is based on simulating training from several teachers, which reduces the number of errors compared with the conventional student-teacher approach.
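One way to read “simulating training from several teachers” is that each noise draw turns the single trained teacher into a slightly different virtual teacher. The sketch below is an interpretation, not the article’s code: it draws `n_teachers` (a hypothetical parameter) noisy copies of the teacher’s logits and averages their softened outputs into one soft target, as the outputs of an ensemble of teachers would be averaged.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; subtracting the max keeps exp() stable."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def virtual_teacher_targets(teacher_logits, n_teachers=5, sigma=0.5,
                            temperature=2.0, seed=0):
    """Average the soft outputs of n_teachers noisy copies of one teacher."""
    rng = random.Random(seed)
    averaged = [0.0] * len(teacher_logits)
    for _ in range(n_teachers):
        # Each noise draw plays the role of a different (virtual) teacher.
        noisy = [v + rng.gauss(0.0, sigma) for v in teacher_logits]
        for k, p in enumerate(softmax(noisy, temperature)):
            averaged[k] += p / n_teachers
    return averaged


targets = virtual_teacher_targets([4.0, 1.0, -2.0])
print([round(p, 3) for p in targets])
```

Averaging probabilities rather than logits mirrors a common way of combining ensemble predictions; the student would then be trained against these averaged targets with the usual distillation loss.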

Full text available in Ukrainian.

Received 24.01.2019