Brain Dump

Speech Processing

Tags
speech-processing

The study of speech signals. More precisely, it is the application of digital signal processing (DSP) to speech signals.

Kinds of speech processing technologies:

  • Digital Speech Coding - Encode and transmit speech, e.g. telephone.
  • Automatic Speech Recognition (ASR) - Voice-to-text conversion
  • Text to Speech Synthesis (TTS) - Text-to-speech conversion
  • Spoken Language Dialogue System - Automated conversations (a combination of ASR and TTS), e.g. telephone marketing.
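
As a rough illustration of digital speech coding for telephony, the sketch below applies μ-law companding to a sample and quantises it to 8 bits. It is a minimal sketch of the continuous μ-law formula with μ = 255, not the segmented encoding that telephone standards actually use. The function names are hypothetical and not taken from these notes.

```python
import math

MU = 255  # assumption: the mu value commonly used in mu-law telephony


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantize_8bit(y: float) -> int:
    """Map a companded value in [-1, 1] to an integer code in 0..255."""
    return int(round((y + 1) / 2 * 255))


def dequantize_8bit(code: int) -> float:
    """Map an integer code in 0..255 back to a value in [-1, 1]."""
    return code / 255 * 2 - 1


def encode(x: float) -> int:
    """Compress then quantise one sample (what would be transmitted)."""
    return quantize_8bit(mu_law_compress(x))


def decode(code: int) -> float:
    """Dequantise then expand one received code."""
    return mu_law_expand(dequantize_8bit(code))
```

Companding spends more of the 256 codes on quiet samples, so the error after `decode(encode(x))` is small in absolute terms for low-amplitude speech and larger for loud samples.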

Applications of these technologies include:

  • Extracting information from speech, involving several stages (see page 5).
  • Enhancing poor-quality speech.
  • Speech-to-Speech - Automatic translation
  • Dictation

For more, see page 14. For a summary chart, see page 19.

Advantages

  • hands free control
  • eyes free control
  • fast
  • intuitive

Markets and Applications

  • Medicine
  • Transport Systems
  • Office - Document Dictation
  • Military (communications, etc.)

All applications can be divided into 6 types, summarised by the acronym C3I2PS (three Cs, two Is, P and S):

  • Command & Control
  • Communications
  • Information
  • Intelligence
  • Processing
  • Storage

These can also be further divided (see page 12) into:

  • Interactive vs non-interactive
  • Real-time vs non-Real-time

Keywords

  • Robust - Handles noisy input reliably.
  • Speaker Independent - Works for any speaker, without per-speaker training.
  • Small Vocabulary - Accepts only a subset of English or a fixed set of commands.
  • Automatic Speech Recognition - Converts speech to intentions.
  • Forced Alignment - Aligns incoming speech with existing text, e.g. to start interactive media as you read along.
  • Large Vocabulary - Recognises anything you can say.
  • Continuous Speech Recognition - Lets you speak continuously, without needing pauses between words.
  • Vocal Emotion Detection - Estimates how a person feels from their voice (e.g. Voice Stress Analysis).
  • Direction Estimation - Estimates the position of a voice from the speech signal, e.g. Speaker Localisation.
  • Audio Alignment - Makes two voices play at near enough the same time and speed.
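
Direction Estimation and Audio Alignment both rest on measuring the time offset between two signals. One common way to do this is cross-correlation, sketched below in plain Python. This is only an illustration under stated assumptions: two microphones, a far-field source, and a speed of sound of 343 m/s. The function names and parameters are hypothetical and not taken from these notes.

```python
import math


def best_lag(a, b, max_lag):
    """Return the lag in samples at which b best matches a.

    A positive result means b is a delayed copy of a.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation score for this candidate lag.
        score = 0.0
        for i, x in enumerate(a):
            j = i + lag
            if 0 <= j < len(b):
                score += x * b[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle_deg(lag, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Convert a lag between two microphones into an arrival angle.

    Assumes a far-field source; 0 degrees means the source is broadside
    (equidistant from both microphones).
    """
    ratio = speed_of_sound * (lag / sample_rate) / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against rounding past +/-1
    return math.degrees(math.asin(ratio))
```

For alignment, the same lag could in principle be used to shift one recording so that it lines up with the other. Matching speed as well as time would need more than a single fixed lag.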