Learning with Sparse Latent Structure

Structured representations are a powerful tool in machine learning, and in natural language processing in particular: the discrete, compositional nature of words and sentences leads to natural combinatorial representations such as trees, sequences, segments, and alignments. At the same time, deep hierarchical neural networks with latent representations are increasingly, and successfully, applied to language tasks. Deep networks conventionally perform smooth, soft computations, which result in dense hidden representations.

We study deep models with structured and sparse latent representations, without sacrificing differentiability. This allows for fully deterministic models that can be trained with familiar end-to-end gradient-based methods. We demonstrate sparse and structured attention mechanisms, as well as latent computation graph structure learning, with successful empirical results on large-scale problems including sentiment analysis, natural language inference, and neural machine translation.
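The abstract does not name a specific mechanism. As one illustration of how a mapping can be sparse yet still usable in gradient-based training, the sketch below contrasts softmax, which always yields a dense distribution, with sparsemax (Martins & Astudillo, 2016), a known sparse alternative that projects scores onto the probability simplex and can assign exactly zero weight. This is an assumed example of the general idea, not the method presented in the talk; the function names `softmax` and `sparsemax` are illustrative.

```python
import numpy as np

def softmax(z):
    """Dense mapping: every entry of the output is strictly positive."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex.

    Illustrative sketch of sparsemax (Martins & Astudillo, 2016);
    entries below the threshold tau are set exactly to zero.
    """
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]          # scores in decreasing order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    # Support size: largest k with 1 + k * z_(k) > sum_{j<=k} z_(j)
    in_support = 1 + k * z_sorted > cumsum
    k_z = k[in_support][-1]
    tau = (cumsum[k_z - 1] - 1) / k_z    # threshold
    return np.maximum(z - tau, 0.0)

if __name__ == "__main__":
    scores = [1.0, 0.8, -1.0]
    print("softmax:  ", softmax(scores))
    print("sparsemax:", sparsemax(scores))
```

For the scores `[1.0, 0.8, -1.0]`, softmax gives every entry some weight, while sparsemax returns approximately `[0.6, 0.4, 0.0]`, with the last entry exactly zero. Sparsemax is piecewise linear and differentiable almost everywhere, so it can replace softmax in an attention layer trained end to end.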

Joint work with Claire Cardie, Mathieu Blondel, and André Martins.

Vlad Niculae

Vlad is a postdoc in the DeepSPIN project at the Instituto de Telecomunicações in Lisbon, Portugal. His research aims to bring structure and sparsity to neural network hidden layers and latent variables, using ideas from convex optimization and motivations from natural language processing. He earned a PhD in Computer Science from Cornell University in 2018. He is co-organizing the NAACL 2019 Workshop on Structured Prediction for NLP.

IST DeepSPIN / Instituto de Telecomunicações