A Comprehensive Review of Deep Learning Architectures for Computer Vision Applications


  • Arman Sarraf, Department of Electrical and Computer Engineering, Islamic Azad University, North Tehran Branch
  • Mohammad Azhdari, Data Processing Company (DPCO)
  • Saman Sarraf, The Institute of Electrical and Electronics Engineers, Senior Member IEEE


Keywords: Computer Vision, Machine Learning, Image Classification, Semantic Segmentation, Deep Learning, CNN


The emergence of machine learning within artificial intelligence has driven major advances in technology. Modern systems, designed to mimic functions of the human brain, allow practitioners to train models that process, analyze, classify, and predict different classes of data. As a result, machine learning has become an active research area in which scientists and researchers aim to introduce the best-performing networks for these purposes. This article presents computer vision, the implementation of image classification, and deep neural networks, and discusses how models have been designed based on the concept of the human brain. It also introduces the development of the Convolutional Neural Network (CNN) and its various architectures, which have shown strong performance in object detection, face recognition, image classification, and localization. Furthermore, applications of CNNs, including voice recognition, image processing, video processing, and text recognition, are examined closely. A literature review illustrates the significance and the details of CNNs in these applications.






How to Cite

Sarraf, A., Azhdari, M., & Sarraf, S. (2021). A Comprehensive Review of Deep Learning Architectures for Computer Vision Applications. American Scientific Research Journal for Engineering, Technology, and Sciences, 77(1), 1–29. Retrieved from https://www.asrjetsjournal.org/index.php/American_Scientific_Journal/article/view/6712