Evaluating image captions is essential for ensuring both linguistic fluency and accurate semantic alignment with visual content. While reference-free metrics such as CLIPScore have advanced automated caption evaluation, most existing work on learned evaluation metrics remains limited to pointwise, English-centric assessments, leaving significant gaps in the reliability, interpretability, and multilingual inclusivity of vision-and-language evaluation.
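For context, CLIPScore rates a caption without references by scaling the cosine similarity between CLIP embeddings of the image and the caption, typically as 2.5 * max(cos, 0). The sketch below illustrates this computation; it assumes the Hugging Face `transformers` CLIP implementation and is meant only as a simplified illustration, not the exact setup used in the talk.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Illustrative sketch of reference-free CLIPScore:
#   score = 2.5 * max(cos(image_embedding, caption_embedding), 0)
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(image: Image.Image, caption: str) -> float:
    inputs = processor(text=[caption], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Normalize the projected embeddings and take their cosine similarity.
    img_emb = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    txt_emb = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    cos = (img_emb * txt_emb).sum(dim=-1).item()
    return 2.5 * max(cos, 0.0)
```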
In this seminar session, I will explore extensions of current English-centric benchmarks to a multilingual setting, promoting the development of more inclusive evaluation frameworks.
Additionally, I will present two extensions of the CLIPScore metric aimed at improving its interpretability and reliability in real-world applications. Leveraging a model-agnostic conformal risk control framework, I will explore the calibration of CLIPScore distributions with respect to task-specific control variables, addressing both granular assessment of individual word errors within captions and the calibration of raw score distributions into more reliable intervals for caption evaluation, improving the correlation between uncertainty estimates and prediction errors.
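As a simplified illustration of the calibration idea (plain split conformal prediction rather than the full conformal risk control framework presented in the talk), the sketch below turns a raw metric score into an interval whose width is set by absolute residuals on a held-out calibration set; all variable names and the choice of residual score are assumptions for illustration.

```python
import numpy as np

def conformal_interval(cal_scores, cal_targets, test_score, alpha=0.1):
    """Split-conformal interval around a raw metric score.

    cal_scores:  metric predictions on a held-out calibration set
    cal_targets: corresponding reference judgments (e.g. human ratings)
    test_score:  raw metric score for a new caption
    alpha:       miscoverage level (targets 1 - alpha coverage)
    """
    residuals = np.abs(np.asarray(cal_targets) - np.asarray(cal_scores))
    n = len(residuals)
    # Conformal quantile with the finite-sample correction.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    q_hat = np.quantile(residuals, min(q_level, 1.0), method="higher")
    return test_score - q_hat, test_score + q_hat
```

In this toy setting, wider intervals signal less trustworthy scores, which is one way the correlation between uncertainty estimates and prediction errors can be made explicit.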