TY - GEN
T1 - Bi-SAN-CAP: Bi-directional Self-Attention for Image Captioning
T2 - 2019 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2019
AU - Hossain, Md Zakir
AU - Sohel, Ferdous
AU - Shiratuddin, Mohd Fairuz
AU - Laga, Hamid
AU - Bennamoun, Mohammed
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
N2 - In a typical image captioning pipeline, a Convolutional Neural Network (CNN) is used as the image encoder and a Long Short-Term Memory (LSTM) network as the language decoder. LSTM with an attention mechanism has shown remarkable performance on sequential data, including image captioning. LSTM can retain long-range dependencies in sequential data. However, it is hard to parallelize the computations of an LSTM because of its inherently sequential characteristics. To address this issue, recent works have shown the benefits of using self-attention, which is highly parallelizable and requires no temporal dependencies. However, existing techniques apply attention in only one direction to compute the context of the words. We propose an attention mechanism called Bi-directional Self-Attention (Bi-SAN) for image captioning. It computes attention in both the forward and backward directions. It achieves performance comparable to state-of-the-art methods.
AB - In a typical image captioning pipeline, a Convolutional Neural Network (CNN) is used as the image encoder and a Long Short-Term Memory (LSTM) network as the language decoder. LSTM with an attention mechanism has shown remarkable performance on sequential data, including image captioning. LSTM can retain long-range dependencies in sequential data. However, it is hard to parallelize the computations of an LSTM because of its inherently sequential characteristics. To address this issue, recent works have shown the benefits of using self-attention, which is highly parallelizable and requires no temporal dependencies. However, existing techniques apply attention in only one direction to compute the context of the words. We propose an attention mechanism called Bi-directional Self-Attention (Bi-SAN) for image captioning. It computes attention in both the forward and backward directions. It achieves performance comparable to state-of-the-art methods.
KW - Bi-directional Self-Attention
KW - Deep Learning
KW - Image Captioning
KW - Self-Attention
UR - http://www.scopus.com/inward/record.url?scp=85078486323&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078486323&partnerID=8YFLogxK
U2 - 10.1109/DICTA47822.2019.8946003
DO - 10.1109/DICTA47822.2019.8946003
M3 - Conference contribution
AN - SCOPUS:85078486323
T3 - 2019 Digital Image Computing: Techniques and Applications, DICTA 2019
BT - 2019 Digital Image Computing: Techniques and Applications, DICTA 2019
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 2 December 2019 through 4 December 2019
ER -