Integration of Fourier Domain Speech Enhancement and Automatic Speech Recognition through Uncertainty Propagation

Speech enhancement techniques aim to recover the original clean signal underlying corrupted speech. Such techniques typically operate in the short-time Fourier transform (STFT) domain, where phenomena such as the additivity of background noise, interfering speakers and echoes are easier to model. By contrast, automatic speech recognition (ASR), and most speech-related machine learning applications in general, operate on feature spaces that are non-linear transformations of the STFT. Such spaces provide a more compact representation of the acoustic space, that is, the space of all acoustic realizations for a given task, and thus lead to simpler models. This talk discusses the integration of STFT-domain speech enhancement and ASR using the concepts of uncertainty propagation and uncertainty decoding. It will cover conventional speech enhancement in the STFT domain, its associated uncertainty, and various closed-form solutions for propagating that uncertainty into domains suitable for ASR.
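As a rough illustration of what a closed-form propagation through a non-linear feature transform can look like, the sketch below pushes the mean and variance of a single positive spectral quantity (for example, an enhanced power-spectrum bin) through a logarithm. It assumes the quantity is log-normally distributed and uses moment matching; this is an assumption made for illustration and is not necessarily one of the solutions presented in the talk. The function name `propagate_through_log` is hypothetical.

```python
import math


def propagate_through_log(mean, var):
    """Return (mean, variance) of log(X) for a positive random variable X.

    Assumption (illustrative only): X is log-normal, with the given mean
    and variance in the linear domain. Under that assumption log(X) is
    Gaussian and its parameters follow in closed form by moment matching.
    """
    if mean <= 0.0:
        raise ValueError("mean must be positive")
    if var < 0.0:
        raise ValueError("variance must be non-negative")
    # Variance in the log domain: log(1 + var / mean^2)
    log_var = math.log1p(var / (mean * mean))
    # Mean in the log domain: log(mean) - log_var / 2
    log_mean = math.log(mean) - 0.5 * log_var
    return log_mean, log_var


# Example: an estimate with mean 2.0 and variance 1.0 in the linear domain
log_mean, log_var = propagate_through_log(2.0, 1.0)
print(log_mean, log_var)
```

When the variance is zero the result reduces to the plain logarithm of the point estimate, which is what a conventional enhancement front-end without uncertainty would pass on to the recognizer.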

Ramon Astudillo

Ramon F. Astudillo obtained the industrial engineering degree, with a specialization in electronics and automatic regulation, at the Escuela Politecnica Superior de Ingenieria de Gijón (Spain) in 2005, completing the last two years of this degree with an Erasmus scholarship at the Technische Universität Berlin. In 2006 he worked as an intern at Peiker Acustic, researching model-based speech enhancement. In the same year he was awarded a scholarship from La Caixa and the German Academic Exchange Service (DAAD) for research towards the Ph.D. degree. He received the degree with distinction from the Technische Universität Berlin in 2010, in the fields of speech processing and robust automatic speech recognition. Dr. Astudillo is currently a postdoctoral researcher at INESC-ID in Lisbon, working on both robust speech recognition and robust natural language processing for speech applications in a Bayesian setting. He is also an ISCA member and a reviewer for IEEE-TASLP/SPL, CSL and EURASIP.