Brain Dump

Speech

Tags
speech-processing

Speech is the continuous production of sound in a structured form to convey information. By continuous I mean that speech doesn't require a [see page 2, break] between utterances; a speech recording is a continuous signal.

Speech is the most sophisticated behaviour of the most complex organism in the known universe.

Richness

Speech is [see page 12, rich] in various kinds of information.

Information | Meaning | Example
Linguistic | What's being said (distinguishing meaning) | Ahead vs. A Head
Paralinguistic | How it's being said | Emotion
Extralinguistic | Other speaker-generated behaviour | Breathing

Variability

Speech is also [see page 4, variable]: there are over 7,700,000,000 people in the world and they all speak differently.

Speech also varies based on dialects (different words to reference the same thing in the same language) and accents.

Variation | Description
Dialect | Different words reference the same thing in the same language (eg. "beck" = "stream", in old Yorkshire).
Accent | Different sounds are used.

We [see page 6, divide] variations into:

Variation | Description
Inter-Speaker | Caused by age, gender, physical characteristics, etc.
Intra-Speaker | Caused by physiological factors (eg. bad health), psychological factors (eg. mood) or external factors (eg. environment).

[see page 9, Factors] that can affect your speech output:

Factor | Effect
Noise (static) | Hyper-articulation (the Lombard effect). You speak much louder and clearer, and you put in more effort.
Vibration | Stresses to the chest, vocal tract and jaw greatly diminish your ability to speak.
The Task | Casual conversation vs. reading out loud vs. lecturing all have a different style of speaking.
The Listener | Bah Bah Goo Gah. We speak differently depending on who we're talking to (eg. a baby).
Cognitive Load | When overloaded, people can find it hard to speak or listen because their focus is entirely elsewhere (eg. driving).
Alcohol/Drugs | Impaired motor control can make it difficult to articulate properly.

Consequences

Variability [see page 10, means] that signals we'd like to be the same are actually quite different, and signals we'd like to be different can be very similar (ambiguity).

Disfluency

Normal spontaneous speech [see page 13, contains]:

  • False starts
  • Repeats
  • Filled pauses (uhmms)
  • Non-linguistic utterances (NLUs)
  • Overlaps

These so-called disfluencies can make speech easier for a human to produce and understand.

Multimodality

Speech is not only acoustic, it's also [see page 14, visual]. Seeing a person's lips while they speak helps us tell what they're saying. Visual information can even override acoustic information (the McGurk effect).

Contamination

Speech is [see page 15, contaminated] with countless sources of noise, making it harder to decipher acoustic signals.

Consonants are the part of speech that makes it easier to decipher censored words, but they're often the quiet part of speech, meaning they're affected by noise the most. Vowels are the loud part of speech but are rarely as helpful as consonants.
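
To put a rough number on why the quiet parts suffer most, here's a minimal sketch that compares the signal-to-noise ratio (SNR) of a loud vowel and a quiet consonant against the same background noise. The RMS amplitudes are made-up illustrative values, not measurements, and the helper name snr_db is just something I've picked for this example.

```python
import math


def snr_db(signal_rms: float, noise_rms: float) -> float:
    """Signal-to-noise ratio in decibels, computed from RMS amplitudes."""
    return 20.0 * math.log10(signal_rms / noise_rms)


# Hypothetical RMS amplitudes (assumed for illustration only).
vowel_rms = 0.30      # vowels: the loud part of speech
consonant_rms = 0.03  # consonants: the quiet part of speech
noise_rms = 0.03      # the same background noise contaminates both

vowel_snr = snr_db(vowel_rms, noise_rms)
consonant_snr = snr_db(consonant_rms, noise_rms)

print(f"vowel SNR:     {vowel_snr:.1f} dB")
print(f"consonant SNR: {consonant_snr:.1f} dB")
```

With these assumed values the vowel sits roughly 20 dB above the noise while the consonant is level with it, so the same noise buries the consonant long before it touches the vowel.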