Speech enhancement techniques aim to recover the original clean signal underlying corrupted speech. Such techniques typically operate in the short-time Fourier transform (STFT) domain, where phenomena such as the additivity of background noise, interfering speakers and echoes are easier to model. By contrast, automatic speech recognition (ASR), and most speech-related machine learning applications in general, operate on feature spaces that are non-linear transformations of the STFT. Such spaces provide a more compact representation of the acoustic space, that is, the space of all acoustic realizations for a given task, and thus lead to simpler models. This talk discusses the integration of STFT-domain speech enhancement and ASR using the concepts of uncertainty propagation and uncertainty decoding. It covers conventional speech enhancement in the STFT domain, its associated uncertainty, and various closed-form solutions for propagating that uncertainty into domains suitable for ASR.
Integration of Fourier Domain Speech Enhancement and Automatic Speech Recognition through Uncertainty Propagation
March 6, 2012
1:00 pm
Ramon Astudillo
Ramon F. Astudillo obtained an industrial engineering degree with a specialization in electronics and automatic regulation from the Escuela Politecnica Superior de Ingenieria de Gijón (Spain) in 2005, completing the last two years of the degree on an Erasmus scholarship at the Technische Universität Berlin. In 2006 he worked as an intern at Peiker Acustic, researching model-based speech enhancement. In the same year he was awarded a La Caixa and German Academic Exchange Service (DAAD) scholarship for research towards the Ph.D. degree. He obtained the degree with distinction from the Technische Universität Berlin in 2010 in the fields of speech processing and robust automatic speech recognition. Dr. Astudillo is currently a postdoctoral researcher at INESC-ID in Lisbon, researching both robust speech recognition and robust natural language processing for speech applications in a Bayesian setting. He is also an ISCA member and a reviewer for IEEE-TASLP/SPL, CSL and EURASIP.