Visual Attention with Sparse and Continuous Transformations

Visual attention mechanisms have become an important component of neural network models for computer vision applications, allowing them to attend to finite sets of objects or regions and identify relevant features. A key component of attention mechanisms is the differentiable transformation that maps scores representing the importance of each feature into probabilities. The usual choice is the softmax transformation, whose output is strictly dense, assigning probability mass to every image feature. This density is wasteful, since irrelevant features are still taken into consideration, and it makes attention models less interpretable. Until now, visual attention has only been applied to discrete domains, which may lead to a lack of focus, where the attention distribution over the image is too scattered. Inspired by the continuous nature of images, we explore continuous-domain alternatives to discrete attention models. We propose solutions that focus on both the continuity and the sparsity of attention distributions, making them suitable for selecting compact and sparse regions such as ellipses. Continuity encourages the selected regions to be contiguous, while sparsity singles out the relevant features by assigning exactly zero probability to irrelevant parts. We use the fact that the Jacobians of these transformations are generalized covariances to derive efficient backpropagation algorithms for both unimodal and multimodal attention distributions. Experiments on Visual Question Answering show that continuous attention models generate smooth attention maps that seem to relate better to human judgment, while achieving improvements in accuracy over grid-based methods trained on the same data.
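
To make the dense-versus-sparse distinction concrete, the following minimal sketch compares softmax with sparsemax on a small vector of scores. Sparsemax is used here only as one well-known sparse transformation; the abstract does not name the specific transformations used in the talk, so this choice, the function names, and the example scores are assumptions for illustration, and the example covers only the discrete case, not the continuous-domain attention proposed here.

```python
import numpy as np

def softmax(z):
    """Dense transformation: every entry receives strictly positive probability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def sparsemax(z):
    """Euclidean projection of the scores onto the probability simplex.

    Illustrative choice of sparse transformation (an assumption, not
    necessarily the one used in the talk). Low-scoring entries get exactly 0.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in decreasing order
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    support = 1 + k * z_sorted > cumsum    # entries that stay in the support
    k_max = k[support][-1]                 # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max  # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)

# Hypothetical importance scores for four image features.
scores = np.array([1.5, 1.0, 0.1, -1.0])
p_soft = softmax(scores)      # all four entries are strictly positive
p_sparse = sparsemax(scores)  # the two lowest-scoring entries are exactly zero
print(p_soft)
print(p_sparse)
```

Both outputs sum to one, but only the sparse transformation removes the low-scoring features from consideration entirely.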


António Farinhas

António Farinhas is a first-year PhD student at Instituto Superior Técnico (IST), advised by André Martins, with research interests in Machine Learning and Natural Language Processing. He previously obtained his MSc degree in Aerospace Engineering at IST. His MSc thesis, advised by André Martins and Pedro Aguiar, focused on continuous visual attention mechanisms and was part of the NeurIPS 2020 paper “Sparse and Continuous Attention Mechanisms”.

IST/IT