In order to make decisions, for instance when purchasing a product, people rely on rich and accurate descriptions, which entail multi-label retrieval processes. However, multi-label classification is challenged by high-dimensional, complex feature spaces and by its dependence on large, accurately annotated datasets. Deep learning approaches have brought a definite breakthrough in performance across numerous machine learning problems, and image classification was undoubtedly one of the tasks where they had the greatest impact. In this presentation, we will focus on the classification of fashion images, using deep learning approaches to tackle the multi-class/multi-label problems in order to generate rich image descriptions. Fashion datasets are challenging because they include a vast number of similar-looking images and are annotated with a large diversity of attributes but few labels per exemplar. To address these issues, we leverage domain knowledge to constrain the (otherwise completely data-driven) solutions. Specifically, we first show how to incorporate knowledge about the structure of the annotations. Secondly, we use context and semantic localization to guide an attention mechanism that shapes the feature space by focusing on visually meaningful regions. Through thorough experimentation, we show the performance gains achieved in both cases.
Exploring Label Structure and Spatial Attention for Fashion Images Classification
May 26, 2020
1:00 pm
Beatriz Ferreira
Beatriz Quintino Ferreira is a PhD student in the NETSyS program, with the Signal and Image Processing Group at Instituto de Sistemas e Robótica. Her main research interests lie at the intersection of Computer Vision and Machine Learning. She is also an advocate of interpretable models, as she deems interpretability fundamental to developing richer, more robust models that humans can more easily understand. She has been a PhD student intern at Farfetch and a visiting scholar at CMU. Some of her recent publications have appeared at KDD and at ICCV workshops.