Speech Processing

The study of speech signals. More precisely, it is a special application of digital signal processing to speech signals.

Kinds of speech processing technologies:

Digital Speech Coding - Encode and transmit speech, e.g. telephone.
Automatic Speech Recognition (ASR) - Speech to text conversion
Text to Speech Synthesis (TTS) - Text to speech conversion
Spoken Language Dialogue System - Automated conversations (combination of ASR and TTS), e.g. telephone marketing.
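
As a rough illustration of the first item, the sketch below shows mu-law companding, the kind of logarithmic compression used in telephone speech coding. It uses the continuous mu-law formula and a plain 8-bit quantiser, not the bit-exact segment encoding of a telephony standard. The function names and the [-1, 1] sample range are assumptions made for this example.

```python
import math

MU = 255  # companding constant for 8-bit mu-law (assumed here)


def mu_law_encode(x: float, mu: int = MU) -> int:
    """Compress a sample in [-1, 1] to an integer code in 0..mu."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    # Logarithmic compression: small amplitudes get finer resolution.
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Map [-1, 1] onto the integer range 0..mu.
    return int(round((y + 1) / 2 * mu))


def mu_law_decode(code: int, mu: int = MU) -> float:
    """Expand an integer code in 0..mu back to a sample in [-1, 1]."""
    y = 2 * code / mu - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


if __name__ == "__main__":
    for sample in (-0.5, 0.01, 0.5):
        code = mu_law_encode(sample)
        print(sample, code, mu_law_decode(code))
```

The decoded value is only an approximation of the input. The quantisation error is small for quiet samples and larger for loud ones, which suits speech because most of its samples are low in amplitude.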

Applications of these technologies include:

For more, see [page 14, here]. For a summary chart, see [page 19, here].

Advantages

All applications can be divided into 6 types, through the acronym C3I2PS:

These can also be [see page 12, further divided] into:

Robust - Tolerates noisy input.
Speaker Independent - Works for any speaker.
Small Vocabulary - Accepts only a subset of English, e.g. a fixed set of commands.
Automatic Speech Recognition - Converts speech to intentions.
Forced Alignment - Aligns incoming speech with existing text, e.g. to start interactive media as you read aloud.
Large Vocabulary - Recognises anything you can say.
Continuous Speech Recognition - Lets you speak continuously, without the need for pauses.
Vocal Emotion Detection - Estimates how a person feels from their voice (e.g. Voice Stress Analysis).
Direction Estimation - Estimates the position of a voice from the speech signal, e.g. Speaker Localisation.
Audio Alignment - Makes two voices play at nearly the same time and speed.
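
Direction Estimation and Audio Alignment both depend on finding the time offset between two versions of the same sound. One common way to do this, sketched below, is to pick the lag that maximises the cross-correlation of the two signals. This is a minimal illustration and not the method of any particular system. The function names, the two-microphone far-field geometry and the default speed of sound are all assumptions made for this example.

```python
import math


def estimate_delay(reference, delayed, max_lag):
    """Return the lag in samples at which `delayed` best matches `reference`.

    A positive result means `delayed` lags behind `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag, ignoring samples that fall off the ends.
        score = 0.0
        for n, r in enumerate(reference):
            m = n + lag
            if 0 <= m < len(delayed):
                score += r * delayed[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle(lag, sample_rate, mic_spacing, speed_of_sound=343.0):
    """Convert an inter-microphone lag to an arrival angle in degrees.

    Assumes a far-field source and two microphones `mic_spacing` metres apart.
    """
    sin_theta = speed_of_sound * lag / (sample_rate * mic_spacing)
    sin_theta = max(-1.0, min(1.0, sin_theta))  # guard against rounding past +/-1
    return math.degrees(math.asin(sin_theta))


if __name__ == "__main__":
    mic_a = [0, 0, 1, 2, 3, 2, 1, 0, 0, 0, 0, 0]
    mic_b = [0, 0, 0, 0, 0, 1, 2, 3, 2, 1, 0, 0]  # same pulse, 3 samples later
    lag = estimate_delay(mic_a, mic_b, max_lag=5)
    print(lag, delay_to_angle(lag, sample_rate=16000, mic_spacing=0.2))
```

For alignment, shifting the second signal by the estimated lag lines it up with the first. For direction estimation, the lag between two microphones is converted to an angle. The brute-force loop is kept for readability; real systems usually compute the correlation with an FFT.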