Spoken and Multimodal Dialog Technology and Systems


8:30 - 11:45 AM, 18 May, 2002
Presenters: Mazin Rahim and Alex Acero
Abstract:

Voice technologies provide opportunities for a new user interface to information access devices where a keyboard and/or screen may not be available. Whether using a cellular phone or a hand-held device, consumers will one day be able to access services and retrieve any information at any time and anywhere through voice, gesture, or a combination of the two modalities.

The goals of this tutorial are the following:

(a) Provide researchers and developers with a theoretical and practical perspective on the technologies behind voice-enabled and multimodal applications
(b) Address the technical challenges that we face today as we continue to transfer these technologies from the labs to the real world
(c) Present an overview of the exciting market opportunities that exist today and in the years ahead in both the business market (e.g., call centers and help desks) and the consumer market (e.g., personal and multimodal wireless hand-held devices, home appliances).

The tutorial will include demonstrations and videos drawn from business and consumer scenarios to illustrate the process of building spoken dialogue and multimodal applications using speech recognition, natural language dialogue and text-to-speech synthesis technologies.

Agenda:

Part I: Automatic speech recognition

  • Robust signal processing: signal acquisition, feature extraction
  • Acoustic modeling: maximum likelihood and discriminative training, hidden Markov modeling, model adaptation, pronunciation modeling
  • Language modeling: stochastic and context-free grammars
  • Decoder: Fast search, small vs. large vocabulary
Part II: Natural Language Dialog
  • Natural language understanding, information extraction and retrieval
  • Topic classification and lexical/semantic parsing
  • Template/stochastic generation
  • System-directive and mixed initiative
  • Dialog evaluation and user interface
Part III: Text to Speech Synthesis
  • Front-end analysis
  • Concatenative synthesis
  • Visual text-to-speech
Part IV: Multimodal Technology
  • Multimodal Integration
  • Multimodal generation and dialogue
Part V: Application Development and Market Opportunities
  • How to build your own voice-enabled or multimodal application
  • Voice portals and speech service/engine providers
  • Market opportunities for business and consumers
  • A vision and a dream! What businesses and consumers should expect by 2005!
  • Demonstrations and videos

About the presenters:
Mazin Rahim received the B.Eng. and Ph.D. degrees from the University of Liverpool, England, in 1987 and 1991, respectively. He joined AT&T Bell Labs in 1990 as a consultant in the area of articulatory speech synthesis. In 1991, he was appointed a research professor at Rutgers University, NJ, where he was engaged in research on neural networks for speech and speaker recognition. He joined Bell Labs in 1993 as a member of technical staff pursuing research in robustness, acoustic modeling and utterance verification for automatic speech recognition. Dr. Rahim is currently a division manager in the Speech Processing Center at AT&T Labs-Research. The major focus of his division is the advancement of AT&T's technologies in interactive speech and multimodal user interfaces. This includes fundamental, forward-looking research in robustness, acoustic and language modeling, and multimodal and spoken language dialog. Dr. Rahim has over fifty publications in the areas of speech and dialog and is the author of the book "Artificial Neural Networks for Speech Analysis/Synthesis" (London: Chapman and Hall, 1994). He holds 10 US patents and is a recipient of several national and international awards. Dr. Rahim is a senior member of the Institute of Electrical and Electronics Engineers (IEEE). He was an associate editor of the IEEE Transactions on Speech and Audio Processing from 1995 to 1999, and chair of the 1999 Workshop on Automatic Speech Recognition and Understanding (ASRU'99). He is currently a member of the IEEE Speech Technical Committee.
  Alex Acero received a Master's degree from the Polytechnic University of Madrid (Spain) in 1985, a Master's degree from Rice University (Houston, TX) in 1987 and a PhD from Carnegie Mellon University (Pittsburgh, PA), all in Electrical Engineering. He joined Apple Computer in 1990, where he worked on the PlainTalk speech recognition system for the Macintosh. In 1991, he joined Telefonica R&D labs, where he was the manager of the speech technology group, working on speech recognition, synthesis and telephony integration for interactive voice response systems. In 1994 he joined Microsoft Research, where he is currently a senior researcher and manager of the speech technology group. Dr. Acero is an affiliate professor at the University of Washington. The speech technology group at Microsoft Research has contributed speech technology (both recognition and synthesis) to several Microsoft products, including Office XP, Windows XP and the SAPI/SDK programming environment. The major focus of this group is to make speech an important modality of an application's user interface; to that end, the group developed MiPad, one of the first speech-centric multimodal applications for handheld devices. Current research includes robustness, acoustic and language modeling, multimodal technologies and spoken language dialog. Dr. Acero is the author of the books "Spoken Language Processing" (Prentice Hall, 2001) and "Acoustical and Environmental Robustness in Automatic Speech Recognition" (Kluwer, 1993), as well as of chapters in 3 other edited books. He has over 50 publications in the areas of speech and dialog and holds 7 US patents. Dr. Acero is a senior member of the Institute of Electrical and Electronics Engineers (IEEE) and chair of the IEEE Signal Processing Society's Speech Technical Committee. He was general co-chair of the 2001 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2001), sponsorship chair of ASRU'99 and publications chair of ICASSP'98. He is an associate editor of Computer Speech and Language (Academic Press, UK) and a reviewer for numerous conferences and journals.
 
 
© 2002 CMS -||- Email: icassp2002web@securecms.com -||- Last Updated: 23 April, 2002