Bi-SAN-CAP: Bi-Directional Self-Attention for Image Captioning

Md Zakir Hossain, Ferdous Sohel, Mohd Fairuz Shiratuddin, Hamid Laga, Mohammed Bennamoun

Research output: Conference contribution

10 Citations (Scopus)

Abstract

In a typical image captioning pipeline, a Convolutional Neural Network (CNN) is used as the image encoder and a Long Short-Term Memory (LSTM) network as the language decoder. LSTMs with attention mechanisms have shown remarkable performance on sequential data, including image captioning, and can retain long-range dependencies in sequential data. However, LSTM computations are hard to parallelize because of their inherently sequential nature. To address this issue, recent works have shown the benefits of self-attention, which is highly parallelizable and requires no temporal dependencies. Existing techniques, however, apply attention in only one direction when computing the context of the words. We propose an attention mechanism called Bi-directional Self-Attention (Bi-SAN) for image captioning, which computes attention in both the forward and backward directions. It achieves performance comparable to state-of-the-art methods.
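The core idea of computing self-attention in both directions can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes standard scaled dot-product self-attention, with a lower-triangular mask for the forward pass (each position attends to itself and earlier positions), an upper-triangular mask for the backward pass (itself and later positions), and per-position concatenation of the two outputs. All function names, weight shapes, and the concatenation scheme are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_self_attention(x, w_q, w_k, w_v, mask):
    """Scaled dot-product self-attention; `mask` marks allowed positions."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = (q @ k.T) / np.sqrt(k.shape[-1])
    scores = np.where(mask, scores, -1e9)  # block disallowed attention links
    return softmax(scores) @ v

def bi_directional_self_attention(x, params_fwd, params_bwd):
    """Hypothetical Bi-SAN sketch: run attention forward (past context)
    and backward (future context), then concatenate per position."""
    t = x.shape[0]
    fwd_mask = np.tril(np.ones((t, t), dtype=bool))  # position i sees j <= i
    bwd_mask = np.triu(np.ones((t, t), dtype=bool))  # position i sees j >= i
    out_fwd = masked_self_attention(x, *params_fwd, fwd_mask)
    out_bwd = masked_self_attention(x, *params_bwd, bwd_mask)
    return np.concatenate([out_fwd, out_bwd], axis=-1)

# Toy usage: 5 word positions, feature dimension 8.
rng = np.random.default_rng(0)
t, d = 5, 8
x = rng.standard_normal((t, d))
params_fwd = [rng.standard_normal((d, d)) for _ in range(3)]  # W_q, W_k, W_v
params_bwd = [rng.standard_normal((d, d)) for _ in range(3)]
y = bi_directional_self_attention(x, params_fwd, params_bwd)
print(y.shape)  # concatenated forward + backward features: (5, 16)
```

Because each attention pass is a single batched matrix product over all positions, both directions parallelize fully, unlike an LSTM's step-by-step recurrence.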

Original language: English
Title of host publication: 2019 Digital Image Computing
Subtitle of host publication: Techniques and Applications, DICTA 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728138572
DOIs
Publication status: Published - December 2019
Published externally: Yes
Event: 2019 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2019 - Perth, Australia
Duration: 2 December 2019 - 4 December 2019

Publication series

Name: 2019 Digital Image Computing: Techniques and Applications, DICTA 2019

Conference

Conference: 2019 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2019
Country/Territory: Australia
City: Perth
Period: 12/2/19 - 12/4/19

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Media Technology
