Speech-to-text summarization is a time-saving technique for filtering and keeping pace with the daily influx of broadcast news uploaded online. The emergence of powerful deep learning-based language models with impressive text generation capabilities has directed research attention towards summarization systems that produce concise, paraphrased versions of document content, commonly referred to as abstractive summaries. End-to-end modeling for speech-to-text abstractive summarization shows promise because it can generate rich latent representations that directly exploit non-verbal and acoustic information extracted from the audio source. Nevertheless, the lack of large, publicly available corpora of paired audio and summary data in the broadcast news domain poses a challenge for fully supervised approaches to end-to-end modeling. In this presentation, the speaker will discuss his work on a strategy that leverages external data through transfer learning from a pre-trained text-to-text abstractive summarizer.
Towards End-to-end Speech-to-text Abstractive Summarization
June 6, 2023
1:00 pm
Raul Monteiro
Raul Monteiro is an NLP researcher at Priberam Labs. He obtained a Master's degree (MSc) in Engineering Physics from Instituto Superior Técnico in 2023. He conducted his master's thesis in collaboration with Priberam, concentrating on the domain of Speech-to-text Summarization. His research interests primarily revolve around Deep Learning and Speech Processing, with particular focus on Speech Summarization and Spoken Named Entity Recognition.
Latest seminars
Unlocking Latent Discourse Translation in LLMs Through Quality-Aware Decoding
June 17, 2025
Large language models (LLMs) have emerged as strong contenders in machine translation. Yet, they often fall behind specialized neural machine…
Speech as a Biomarker for Disease Detection
May 20, 2025
Today’s overburdened health systems face numerous challenges, exacerbated by an aging population. Speech emerges as a ubiquitous biomarker with strong…
Enhancing Uncertainty Estimation in Neural Networks
May 6, 2025
Neural networks are often overconfident about their predictions, which undermines their reliability and trustworthiness. In this presentation, I will present…
Improving Evaluation Metrics for Vision-and-Language Models
April 22, 2025
Evaluating image captions is essential for ensuring both linguistic fluency and accurate semantic alignment with visual content. While reference-free metrics…