Introduction
From human prehistory to the new media of
the future, speech communication has been and will be the dominant mode of human social
bonding and information exchange. The spoken word is now extended through technological
mediation such as telephony, movies, radio, television, and the Internet. This trend reflects
the primacy of spoken communication in human psychology.
In addition to human-human interaction, this human preference for spoken language
communication finds a reflection in human-machine interaction as well. Most computers
currently utilize a graphical user interface (GUI), based on graphically represented interface
objects and functions such as windows, icons, menus, and pointers. Most computer operating
systems and applications also depend on a user's keyboard strokes and mouse clicks, with a
display monitor for feedback. Today's computers lack the fundamental human abilities to
speak, listen, understand, and learn. Speech, supported by other natural modalities, will be
one of the primary means of interfacing with computers. And, even before speech-based interaction
reaches full maturity, applications in home, mobile, and office segments are incorporating
spoken language technology to change the way we live and work.
A spoken language system needs to have both speech recognition and speech synthesis
capabilities. However, those two components by themselves are not sufficient to build a useful
spoken language system. An understanding and dialog component is required to manage
interactions with the user; and domain knowledge must be provided to guide the system's
interpretation of speech and allow it to determine the appropriate action. For all these components,
significant challenges exist, including robustness, flexibility, ease of integration,
and engineering efficiency. The goal of building commercially viable spoken language systems
has long attracted the attention of scientists and engineers all over the world. The purpose
of this book is to share our working experience in developing advanced spoken language
processing systems with both our colleagues and newcomers. We devote many chapters
to systematically introducing fundamental theories and to highlighting what works well
based on numerous lessons we learned in developing Microsoft's spoken language systems.
1.1. MOTIVATIONS
What motivates the integration of spoken language as the primary interface modality? We
present a number of scenarios, roughly in order of expected technical challenge
and expected time to full deployment.
1.1.1. Spoken Language Interface
There are generally two categories of users who can benefit from adoption of speech as a
control modality in parallel with others, such as the mouse, keyboard, touch-screen, and joystick.
For novice users, functions that are conceptually simple should be directly accessible.
For example, raising the voice output volume of the desktop speakers under software control
is a conceptually simple operation, yet in some GUI systems of today it requires opening one or
more windows or menus and manipulating sliders, check-boxes, or other graphical elements.
This requires some knowledge of the system's interface conventions and structures. For the
novice user, being able to say "raise the volume" would be more direct and natural. For expert
users, the GUI paradigm is sometimes perceived as an obstacle or nuisance and shortcuts are
sought. Frequently these shortcuts allow the power user's hands to remain on the keyboard or
mouse while mixing content creation with system commands. For example, an operator of a
graphic design system for CAD/CAM might wish to specify a text formatting command
while keeping the pointer device in position over a selected screen element.
Speech has the potential to accomplish these functions more powerfully than keyboard
and mouse clicks. Speech becomes more powerful when supplemented by information
streams encoding other dynamic aspects of user and system status, which can be resolved by
the semantic component of a complete multi-modal interface. We expect such multimodal
interactions to proceed based on more complete user modeling, including speech, visual orientation,
natural and device-based gestures, and facial expression, and these will be coordinated
with detailed system profiles of typical user tasks and activity patterns.
In some situations you must rely on speech as an input or output medium. For example,
with wearable computers, it may be impossible to incorporate a large keyboard. When driving,
safety is compromised by any visual distraction, and hands are required for controlling
the vehicle. The ultimate speech-only device, the telephone, is far more widespread than the
PC. Certain manual tasks may also require full visual attention to the focus of the work. Finally,
spoken language interfaces offer obvious benefits for individuals challenged with a
variety of physical disabilities, such as loss of sight or limitations in physical motion and
motor skills. Chapter 18 contains detailed discussion on spoken language applications.