Nowadays, generative question-answering models (e.g., UnifiedQA) achieve state-of-the-art performance on a variety of datasets. Despite their remarkable performance, these models still produce wrong answers with high confidence scores. The responsible use of such systems in high-risk applications, such as healthcare, requires some guarantees that the model's scores correlate with the correctness of its outputs. One potential approach toward these guarantees is calibration. Despite the vast research on calibration for K-ary classification in machine learning, calibrating text-based systems poses additional challenges, ranging from a combinatorial output space to the many definitions of correctness. In this talk, we will discuss the challenges of calibrating generative question-answering systems, as well as the current state-of-the-art approaches to address them.
On the Calibration of Generative Question-Answering Models: State-of-the-Art and Challenges
May 17, 2022
1:00 pm
Catarina Belém
Catarina Belém is a first-year Ph.D. candidate in Computer Science at the University of California, Irvine (UCI). She currently works under the supervision of Professors Sameer Singh and Padhraic Smyth on the calibration of generative question-answering models. Prior to joining UCI, Catarina worked as a research data scientist in the Responsible AI (FATE) group at Feedzai, where she developed a keen interest in fairness, explainability, and evaluation in AI. Catarina holds an integrated master's degree (BSc+MSc) in Computer Engineering from Instituto Superior Técnico, obtained in 2019. Her main research interests include Machine Learning and Natural Language Processing, with a particular focus on Responsible AI.