Speech enhancement techniques aim to recover the clean signal underlying corrupted speech. Such techniques typically operate in the short-time Fourier transform (STFT) domain, where phenomena such as the additivity of background noise, interfering speakers, and echoes are easier to model. By contrast, automatic speech recognition (ASR), and most speech-related machine learning applications in general, operate on feature spaces that are non-linear transformations of the STFT. Such spaces provide a more compact representation of the acoustic space (the space of all acoustic realizations for a given task) and thus lead to simpler models. This talk discusses the integration of STFT-domain speech enhancement and ASR using the concepts of uncertainty propagation and decoding. It covers conventional speech enhancement in the STFT domain, the uncertainty associated with it, and various closed-form solutions for propagating that uncertainty into domains suitable for ASR.
Integration of Fourier Domain Speech Enhancement and Automatic Speech Recognition through Uncertainty Propagation
March 6, 2012
1:00 pm