The Explanation Game: Towards Prediction Explainability through Sparse Communication

Explainability is a topic of growing importance in NLP. In this work, we provide a unified perspective of explainability as a communication problem between an explainer and a layperson about a classifier’s decision. We use this framework to compare several prior approaches for extracting explanations, including gradient methods, representation erasure, and attention mechanisms, in terms of their communication success. In addition, we reinterpret these methods in the light of classical feature selection, and we use this as inspiration to propose new embedded methods for explainability based on selective, sparse attention. Experiments in text classification and natural language inference, using different configurations of explainers and laypeople (including both machines and humans), reveal an advantage of attention-based explainers over gradient and erasure methods. Human experiments show promising results on text classification with post-hoc explainers trained to optimize communication success.
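
To make the idea of an attention-based explainer concrete, here is a minimal sketch under stated assumptions. It uses sparsemax (the Euclidean projection of scores onto the probability simplex) as one example of a sparse attention transformation, treats the tokens with nonzero weight as the explainer's message, and assumes communication success is the fraction of examples on which the layperson's prediction matches the classifier's. The abstract does not fix these choices; the function names `explain` and `communication_success` and the token scores are hypothetical and purely illustrative.

```python
from typing import List, Sequence, Tuple


def sparsemax(scores: Sequence[float]) -> List[float]:
    """Project `scores` onto the probability simplex (sparsemax).

    Unlike softmax, sparsemax can give exactly zero weight to low-scoring
    tokens, so its nonzero entries form a natural "selected" subset.
    """
    z_sorted = sorted(scores, reverse=True)
    running_sum = 0.0
    support_size = 0
    support_sum = 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        running_sum += z_k
        # The k-th largest score stays in the support iff
        # 1 + k * z_(k) > z_(1) + ... + z_(k); this holds for k = 1..k*.
        if 1 + k * z_k > running_sum:
            support_size = k
            support_sum = running_sum
    # Threshold tau chosen so that the surviving weights sum to 1.
    tau = (support_sum - 1) / support_size
    return [max(z - tau, 0.0) for z in scores]


def explain(tokens: Sequence[str], scores: Sequence[float]) -> List[Tuple[str, float]]:
    """Hypothetical explainer: the message is the tokens with nonzero attention."""
    weights = sparsemax(scores)
    return [(tok, w) for tok, w in zip(tokens, weights) if w > 0]


def communication_success(classifier_preds: Sequence[str],
                          layperson_preds: Sequence[str]) -> float:
    """Fraction of examples on which the layperson, seeing only the
    explanation, recovers the classifier's decision (assumed metric)."""
    matches = sum(c == l for c, l in zip(classifier_preds, layperson_preds))
    return matches / len(classifier_preds)


if __name__ == "__main__":
    # Made-up attention scores for a toy sentiment example.
    tokens = ["the", "movie", "was", "truly", "wonderful"]
    scores = [0.1, 0.5, 0.0, 1.2, 2.0]
    print(explain(tokens, scores))  # only "truly" and "wonderful" survive
    print(communication_success(["pos", "neg", "pos", "pos"],
                                ["pos", "neg", "neg", "pos"]))  # 0.75
```

The appeal of a sparse transformation here is that the selection is embedded in the model: softmax would give every token some positive weight, so forming a short message would require an extra, arbitrary cutoff such as top-k.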


Marcos Treviso

Marcos is a Ph.D. student in the DeepSPIN Project, supervised by André Martins. His main interests include semi-parametric models and explainability of neural networks. Previously, he obtained an M.Sc. degree in Computer Science and Computational Mathematics at the University of São Paulo (USP), where he worked on NLP and machine learning for sentence segmentation and disfluency detection. Marcos was also an AI research intern at Unbabel in 2018, where he contributed to the OpenKiwi project.

DeepSPIN/IT