Control Systems and Computers, N6, 2018, Article 4

DOI: https://doi.org/10.15407/usim.2018.06.046

Upr. sist. maš., 2018, Issue 6 (278), pp. 46-73.

UDC 581.513

Tymchyshyn Roman M., PhD student,

Volkov Olexander Ye., head of department,

Gospodarchuk Oleksiy Yu., senior research fellow,

Bogachuk Yuriy P., leading research fellow,

International Research and Training Center for Information Technologies and Systems of the NAS and MES of Ukraine, Glushkov ave., 40, Kyiv, 03187, Ukraine

MODERN APPROACHES TO COMPUTER VISION

Introduction. Computer vision covers a wide variety of problems: image segmentation, image processing, classification, scene reconstruction, pose estimation, object detection, trajectory tracking, and others. These problems are cornerstones of artificial intelligence.

The field has been evolving rapidly in recent years, partly because IT industry giants such as Google and Microsoft have joined the research. AI systems are in high demand. Technological advances have enabled applications of computer vision in dozens of industries. Well-known examples include smart stores, biometric authentication, automation of agricultural processes using drones, video surveillance, photo and video quality enhancement, and autonomous parcel delivery by unmanned aerial vehicles. The scope will keep expanding, since the need for artificial intelligence systems increases over time and vision is one of the most informative sensory inputs available to such systems.

Purpose. The number of developments in the field of computer vision is growing exponentially, and staying up to date is not an easy task. There is a wide variety of existing approaches, and choosing the right one can be difficult. The goal of this paper is to present a structured overview of modern computer vision techniques with their advantages and disadvantages, and to identify unresolved problems. Accuracy is not the only quality measure considered: we also take speed and memory into account, which are critical for embedded systems (unmanned aerial vehicles, mobile devices, robotic and satellite systems).

Methods. Fuzzy logic, convolutional neural networks, feature detectors and descriptors.

Results. Fuzzy logic theory has taken recognition to a new level by providing a methodological and algorithmic framework for working with complex and uncertain systems. The introduction of type-2 fuzzy sets has significantly improved accuracy and robustness. The main advantages of fuzzy logic models are their use of expert knowledge and their interpretability. Today fuzzy logic is mainly used as a complement to other systems, improving the decision-making process by handling uncertainty. Researchers often employ this technique to solve image segmentation and filtering problems.
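
As a rough illustration of how a type-2 fuzzy set represents uncertainty, the sketch below evaluates an interval type-2 membership function. It assumes one common textbook form, a Gaussian with a fixed mean and an uncertain standard deviation; the function names, the "bright pixel" set and all parameter values are hypothetical and are not taken from the paper.

```python
import math


def gaussian(x, mean, sigma):
    """Type-1 Gaussian membership grade of x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def it2_membership(x, mean, sigma_lo, sigma_hi):
    """Return (lower, upper) membership grades of an interval type-2 set.

    Assumed form: Gaussian with a fixed mean and a standard deviation
    known only to lie in [sigma_lo, sigma_hi]. The gap between the two
    grades is the footprint of uncertainty at x.
    """
    if not 0 < sigma_lo <= sigma_hi:
        raise ValueError("require 0 < sigma_lo <= sigma_hi")
    lower = gaussian(x, mean, sigma_lo)  # narrowest curve gives the lower bound
    upper = gaussian(x, mean, sigma_hi)  # widest curve gives the upper bound
    return lower, upper


# Hypothetical "bright pixel" set on a 0..255 intensity scale.
for intensity in (128, 200, 255):
    lo, hi = it2_membership(intensity, mean=255, sigma_lo=30, sigma_hi=60)
    print(f"{intensity}: [{lo:.3f}, {hi:.3f}]")
```

A type-1 set would return a single grade per intensity; here each intensity gets an interval, which a downstream rule base can use to express how unsure the expert-defined boundary is.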

Convolutional neural networks (CNNs) make the explicit assumption that the inputs are images. This assumption allows certain properties to be encoded into the architecture and leads to striking results. In some cases CNN architectures have even outperformed humans in classification tasks (e.g. on the ImageNet visual database). This paper presents the architectures with state-of-the-art results in image classification and object detection tasks.
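
One property that the image assumption makes it possible to build into the architecture is weight sharing: a small kernel is reused at every spatial position. The following is a minimal pure-Python sketch of a single convolution pass (no padding, stride 1). The image and the hand-picked kernel are illustrative assumptions; in a trained CNN the kernel weights would be learned.

```python
def conv2d_valid(image, kernel):
    """Slide one small kernel over a 2D image (no padding, stride 1).

    As in most CNN libraries this is technically cross-correlation:
    the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            # The same kernel weights are applied at every (r, c) position.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


# Toy 4x4 image containing a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical horizontal-difference kernel with only two weights.
kernel = [[-1, 1]]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # each row is [0.0, 1.0, 0.0]: the edge is found in every row
```

The kernel holds two weights regardless of the image size, whereas a fully connected layer producing the same 4x3 output from 16 inputs would need 192 weights.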

Feature detectors and descriptors were the most commonly used tools in image processing for years. They remain a good alternative to resource-intensive neural networks. Methods based on feature detectors and descriptors do not require large databases for training. A good fit for these methods is autonomous navigation of unmanned aerial vehicles, where image matching is needed to determine coordinates.
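
To make the image matching step concrete, the sketch below matches binary descriptors (the kind produced by detectors such as BRIEF, ORB or BRISK) by Hamming distance with a mutual nearest-neighbour check. It is a simplified illustration, not the procedure used in the paper: the 8-bit descriptors, the function names and the distance threshold are all assumptions, and real binary descriptors are typically 256 bits or longer.

```python
def hamming(a, b):
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_descriptors(query, train, max_distance):
    """Brute-force nearest-neighbour matching with a mutual (cross) check.

    Returns a list of (query_index, train_index, distance) triples.
    """
    def nearest(desc, candidates):
        # Index of the candidate with the smallest Hamming distance.
        return min(range(len(candidates)),
                   key=lambda k: hamming(desc, candidates[k]))

    matches = []
    for qi, q in enumerate(query):
        ti = nearest(q, train)
        dist = hamming(q, train[ti])
        # Keep the pair only if it is close enough and each descriptor
        # is the other's nearest neighbour.
        if dist <= max_distance and nearest(train[ti], query) == qi:
            matches.append((qi, ti, dist))
    return matches


# Toy 8-bit descriptors from two hypothetical images.
query = [0b10110010, 0b01001101, 0b11110000]
train = [0b01001111, 0b10110011, 0b00001111]

print(match_descriptors(query, train, max_distance=2))
# → [(0, 1, 1), (1, 0, 1)]
```

The third query descriptor has no counterpart within the threshold and is discarded. No training data is involved at any point, which is the property that makes this family of methods attractive on embedded hardware.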

Conclusion. While great progress has been made in recent years, a number of problems remain unsolved. Existing algorithms lack generality. Improving speed usually degrades accuracy. There are no highly accurate algorithms that can solve object detection problems in real time. Accurate computer vision algorithms require significant amounts of memory and computing resources that may not be available on embedded systems. Training deep convolutional neural networks still takes a long time and can reach weeks even on the most powerful computers. There is no clear way to deal with low-quality images.


Keywords. Computer vision, image classification, object detection, image segmentation, image filtering, edge detection, fuzzy logic, neural networks, feature detectors, feature descriptors.


Received 22.11.18
