Automatic Recognition of Natural Speech


1:30 - 4:45 PM, 13 May, 2002
Presenter: Prof. Douglas O'Shaughnessy
Abstract:

The automatic conversion of conversational speech into text is an interdisciplinary task involving computer science, engineering, acoustics, linguistics, and psychology. This tutorial will discuss the modern techniques of automatic speech recognition, emphasizing the breadth of knowledge needed to approach near-human performance in this complex task. We will first briefly examine human speech production from an acoustic-phonetic view. The standard methods of speech analysis (e.g., FFT and mel-based cepstrum) will be presented and discussed in terms of efficiency and robustness. The differences in objectives between speech coding and speech recognition will be noted. We will present the modern stochastic approaches to speech recognition (i.e., hidden Markov models), with simple examples to emphasize understanding for a non-expert audience. The issues of adequate training corpora and the many trade-offs for different practical applications will be discussed (e.g., continuous vs. isolated-word recognition; small vs. large vocabularies). The differences between read speech and conversational speech will be examined, in terms of disfluencies, variable speaking rate, and increased use of function words. The added difficulties of recognizing speech over the telephone and with hands-free terminals will be explained. The importance of appropriate language models will be emphasized, with both basic N-gram models and more complex class-based and distance models discussed. We will describe the current state of the art in recognition of natural speech, both commercial and research, noting where current systems do well and where they come up short. The possibilities of integrating knowledge-based sources (e.g., aspects of expert systems) into the current stochastic approaches to speech recognition will be examined. Predictions as to the future course of speech recognition research will be made.

About the presenter:
Dr. O'Shaughnessy has worked in the speech communication field for 30 years, first as a student at MIT (BSc and MS in 1972, PhD in 1976), then as director of a research team at INRS in the areas of speech analysis, coding, synthesis, recognition and enhancement. After working on the MITalk synthesis project in the early 1970s, he developed one of the first French text-to-speech systems in the early 1980s. His textbook "Speech Communication: Human and Machine" (Addison-Wesley, 1987, and now in its second edition by IEEE Press, 2000) has been widely used. His most recent focus has been on speech recognition, where his research group publishes regularly in the ICASSP, ICSLP, and Eurospeech Proceedings. He is an associate editor for the Journal of the Acoustical Society of America and just completed a term as associate editor for the IEEE Transactions on Speech and Audio Processing. He also teaches every year as an adjunct professor in the electrical engineering department at McGill University. He is the General Chair for ICASSP-2004 in Montreal.
 
 
© 2002 CMS -||- Email: icassp2002web@securecms.com -||- Last Updated: 23 April, 2002