TY - JOUR
T1 - Deep convolutional neural networks-based Hardware–Software on-chip system for computer vision application
AU - Messaoud, Seifeddine
AU - Bouaafia, Soulef
AU - Maraoui, Amna
AU - Ammari, Ahmed Chiheb
AU - Khriji, Lazhar
AU - Machhout, Mohsen
N1 - Publisher Copyright:
© 2022 Elsevier Ltd
PY - 2022/3
Y1 - 2022/3
N2 - Embedded vision systems are well suited to high-performance, low-latency inspection tasks. As everyday life evolves, it becomes almost imperative to harness artificial intelligence (AI) in vision applications so that these systems become intelligent and able to make decisions close to or similar to those of humans. In this context, integrating AI into embedded systems poses many challenges, since its performance depends on the volume and quality of the data assimilated for learning and improvement. This is constrained by the energy-consumption and cost limits of FPGA-SoCs, which offer limited processing, memory, and communication capacity. Nevertheless, implementing AI algorithms on embedded systems can drastically reduce energy consumption and processing time, while also reducing the costs and risks associated with data transmission. Their efficiency and reliability, however, depend on the designed prototypes. In this scope, this work proposes two different designs for a Traffic Sign Recognition (TSR) application based on a convolutional neural network (CNN) model, followed by three implementations on the PYNQ-Z1. First, the CNN-based TSR application is implemented on the PYNQ-Z1 processor. Given its runtime of around 3.55 s, there is room for improvement by combining the programmable logic (PL) and the processing system (PS) in a hybrid architecture. Therefore, we propose a streaming architecture in which each CNN layer is accelerated by a dedicated hardware accelerator connected through a direct memory access (DMA) interface. This design achieves efficient power consumption, reduced hardware cost, and an optimized execution time of 2.13 s, but still leaves room for design optimization. Finally, we propose a second co-design in which the CNN is accelerated as a single computation engine using a BRAM interface. The implementation results show that this embedded TSR design achieves the best performance of the proposed architectures, with an execution time of about 0.03 s, a computation roof of about 36.6 GFLOPS, and a bandwidth roof of about 3.2 GByte/s.
AB - Embedded vision systems are well suited to high-performance, low-latency inspection tasks. As everyday life evolves, it becomes almost imperative to harness artificial intelligence (AI) in vision applications so that these systems become intelligent and able to make decisions close to or similar to those of humans. In this context, integrating AI into embedded systems poses many challenges, since its performance depends on the volume and quality of the data assimilated for learning and improvement. This is constrained by the energy-consumption and cost limits of FPGA-SoCs, which offer limited processing, memory, and communication capacity. Nevertheless, implementing AI algorithms on embedded systems can drastically reduce energy consumption and processing time, while also reducing the costs and risks associated with data transmission. Their efficiency and reliability, however, depend on the designed prototypes. In this scope, this work proposes two different designs for a Traffic Sign Recognition (TSR) application based on a convolutional neural network (CNN) model, followed by three implementations on the PYNQ-Z1. First, the CNN-based TSR application is implemented on the PYNQ-Z1 processor. Given its runtime of around 3.55 s, there is room for improvement by combining the programmable logic (PL) and the processing system (PS) in a hybrid architecture. Therefore, we propose a streaming architecture in which each CNN layer is accelerated by a dedicated hardware accelerator connected through a direct memory access (DMA) interface. This design achieves efficient power consumption, reduced hardware cost, and an optimized execution time of 2.13 s, but still leaves room for design optimization. Finally, we propose a second co-design in which the CNN is accelerated as a single computation engine using a BRAM interface. The implementation results show that this embedded TSR design achieves the best performance of the proposed architectures, with an execution time of about 0.03 s, a computation roof of about 36.6 GFLOPS, and a bandwidth roof of about 3.2 GByte/s.
KW - Acceleration
KW - CNN
KW - Co-design
KW - FPGA
KW - PYNQ-Z1
UR - http://www.scopus.com/inward/record.url?scp=85123030584&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123030584&partnerID=8YFLogxK
U2 - 10.1016/j.compeleceng.2021.107671
DO - 10.1016/j.compeleceng.2021.107671
M3 - Article
AN - SCOPUS:85123030584
SN - 0045-7906
VL - 98
JO - Computers and Electrical Engineering
JF - Computers and Electrical Engineering
M1 - 107671
ER -