Speech Processing
The study of speech signals. More aptly, a special application of digital signal processing to speech signals.
Kinds of speech processing technologies:
- Digital Speech Coding - Encode and transmit speech, e.g. telephone.
- Automatic Speech Recognition (ASR) - Voice to text parsing
- Text to Speech Synthesis (TTS) - Text to speech conversion
- Spoken Language Dialogue System - Automated conversations (combination of ASR and TTS), e.g. telephone marketing.
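As a rough illustration of speech coding for telephony, here is a minimal sketch of mu-law companding followed by 8-bit quantisation. The notes do not name a particular codec, so this choice is an assumption. The sketch uses the continuous mu-law formula with mu = 255, not the bit-exact piecewise-linear tables of a real telephone codec, and the function names are hypothetical.

```python
import math

MU = 255  # assumed: the mu value commonly used for 8-bit telephony


def mu_law_compress(x, mu=MU):
    """Map a sample x in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantise(y, bits=8):
    """Uniformly quantise y in [-1, 1] to an integer code of the given width."""
    levels = 2 ** bits - 1
    return round((y + 1) / 2 * levels)


def dequantise(code, bits=8):
    """Map an integer code back to a value in [-1, 1]."""
    levels = 2 ** bits - 1
    return code / levels * 2 - 1


def encode(x):
    """Sender side: compress, then quantise to one 8-bit code."""
    return quantise(mu_law_compress(x))


def decode(code):
    """Receiver side: dequantise, then expand."""
    return mu_law_expand(dequantise(code))
```

Compressing before quantising spends more of the 256 codes on quiet samples, where most speech energy lies, so `decode(encode(x))` stays close to `x` even for small amplitudes.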
Applications of these technologies include:
- Extracting information from speech, involving [see page 5, several] stages.
- Enhancing bad quality speech.
- Speech-to-Speech - Automatic translation
- Dictation
For more, see [see page 14, here]. See summary chart [see page 19, here].
Advantages
- hands free control
- eyes free control
- fast
- intuitive
Markets and Applications
- Medicine
- Transport Systems
- Office - Document Dictation
- Military (communications, etc.)
All applications can be divided into 6 types, summarised by the acronym C3I2PS:
- Command & Control
- Communications
- Information
- Intelligence
- Processing
- Storage
These can also be [see page 12, further divided] into:
- Interactive vs non-interactive
- Real-time vs non-real-time
Keywords
- Robust - Robustly handles noisy input
- Speaker Independent - Works for anybody; any speaker is allowed.
- Small Vocabulary - Accepts only a subset of English/commands
- Automatic Speech Recognition - Converts speech to intentions.
- Forced Alignment - Align incoming speech with existing text, e.g. to trigger interactive media as you read along.
- Large Vocabulary - Recognises anything you can say
- Continuous Speech Recognition - Can speak continuously, without need for pauses
- Vocal Emotion Detection - Estimate how a person feels from their voice (e.g. Voice Stress Analysis).
- Direction Estimation - Estimate the position of a voice from the speech signal, e.g. Speaker Localisation.
- Audio Alignment - Make two voices play at nearly the same time and speed.
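To make the alignment keywords more concrete, here is a minimal sketch of dynamic time warping (DTW), one classic way to align two sequences that progress at different speeds. The notes do not say which method is meant, so DTW is an assumption. The function name `dtw_path` is hypothetical, and plain numbers stand in for the per-frame features a real system would compare.

```python
def dtw_path(a, b):
    """Align sequences a and b with classic DTW.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs meaning a[i] is matched with b[j].
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between frames
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance a only (b is slower here)
                cost[i][j - 1],      # advance b only (a is slower here)
            )
    # Backtrack from the end to recover which frames were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path
```

For example, `dtw_path([0, 1, 2, 3], [0, 0, 1, 2, 2, 3])` treats the second sequence as a slower rendition of the first. The returned path says which frames of one recording correspond to which frames of the other, which is the information needed to stretch or shift one voice onto the other.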