Brain Dump

Linear Prediction

Tags
speech-processing

The ability to [see page 2, predict] the next sample of speech from a weighted sum of previous samples. This exploits the redundancy arising from the Stationary Assumption.

This can typically be done with 10-15 prediction coefficients (i.e. using the previous 10-15 samples).

We define [see page 3, prediction-error] as the difference between the predicted signal and the actual signal. The goal of linear prediction is therefore to minimise the sum of squared prediction errors (the residual signal) over a frame of speech data.

Note: Doing so has the effect of whitening the error signal (removing its structure and making it appear like white noise). See [see page 6, further assumptions].

There are 2 main algorithms for minimising the error: auto-correlation (Durbin's algorithm) and covariance; see [see page 4, comparisons] here.
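A minimal sketch of the auto-correlation method (Levinson-Durbin recursion): compute the frame's autocorrelation up to the LP order, then recursively solve for the coefficients `a[1..p]` (with `a[0] = 1`) and the residual energy. The frame here is synthetic noise standing in for a windowed speech frame; this is illustrative, not a tuned implementation.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve for LP coefficients a (a[0] == 1) from autocorrelation
    values r[0..order] via the Levinson-Durbin recursion.
    Returns (a, residual_energy)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step
        acc = r[i] + a[1:i] @ r[i-1:0:-1]
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)       # residual energy shrinks each step
    return a, err

# Example: a synthetic frame (stand-in for a windowed speech frame)
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
order = 12
# Biased autocorrelation of the frame, lags 0..order
r = np.array([frame[:len(frame) - k] @ frame[k:] for k in range(order + 1)])
a, err = levinson_durbin(r, order)
```

The residual energy `err` returned at the end is exactly the minimised sum-squared prediction error for the frame.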

Analysis vs. Synthesis

A linear prediction filter [see page 5, can] be used in either:

Application | Description
Analysis | Finding the filter coefficients and producing an error signal (the residual)
Synthesis | Reconstructing the original signal from the error signal and the coefficients

This makes linear prediction a great tool for coding: take a signal, transmit the lower-bit-rate error signal (plus coefficients), and reconstruct the signal at the other end.
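The analysis/synthesis pair can be sketched as two filters that are exact inverses: analysis applies the FIR filter A(z) to produce the residual, and synthesis applies the all-pole filter 1/A(z) to recover the signal. The coefficient values below are assumed example values, not from any real frame.

```python
import numpy as np

def analyse(s, a):
    """Analysis: residual e[n] = sum_k a[k] * s[n-k] (FIR filter A(z))."""
    p = len(a) - 1
    e = np.zeros_like(s)
    for n in range(len(s)):
        for k in range(min(n, p) + 1):
            e[n] += a[k] * s[n - k]
    return e

def synthesise(e, a):
    """Synthesis: invert A(z), i.e. s[n] = e[n] - sum_{k>=1} a[k] * s[n-k]."""
    p = len(a) - 1
    s = np.zeros_like(e)
    for n in range(len(e)):
        s[n] = e[n]
        for k in range(1, min(n, p) + 1):
            s[n] -= a[k] * s[n - k]
    return s

a = np.array([1.0, -1.3, 0.8])   # assumed example LP coefficients, a[0] = 1
s = np.random.default_rng(2).standard_normal(200)  # stand-in speech frame
e = analyse(s, a)
s_rec = synthesise(e, a)         # round trip is exact up to float error
```

In a real coder only `e` (coarsely quantised) and `a` are transmitted; the round trip above shows why that is enough to rebuild the signal.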

Note: This is all only possible because speech has structure. The vocal tract generally changes much more slowly than we sample. The original signal is the recorded air-pressure variations caused by the sound; the prediction coefficients encode the changes in the vocal tract, and the error signal encodes the source excitation (the excitation at the larynx).

Assumptions

LP analysis assumes impulse/white-noise excitation (meaning it captures the spectral shaping caused by the glottal source).

LP synthesis uses an all-pole filter, so it ignores any zeros (e.g. from the nasal side branch and the path back down the glottis).

Coefficient Count vs Error Signal

There is a trade-off between the coefficient count (LP order) and the error signal: the fewer coefficients we use, the more information remains in the error signal, and vice versa.
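This trade-off can be seen numerically: fit LP models of increasing order by least squares and watch the residual energy fall. The covariance-style least-squares formulation via `lstsq` is a simplification for illustration; the signal is a synthetic 2nd-order process, so most of the gain comes from the first two coefficients.

```python
import numpy as np

def residual_energy(s, order):
    """Least-squares LP fit of the given order; returns the mean
    squared residual (illustrative covariance-style formulation)."""
    p = order
    # Each row holds the previous p samples; the target is the current one
    X = np.column_stack([s[p - k - 1:len(s) - k - 1] for k in range(p)])
    y = s[p:]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ w) ** 2)

# Synthetic 2nd-order signal: order 2 captures almost all the structure
rng = np.random.default_rng(3)
n = 20000
noise = rng.standard_normal(n)
s = np.zeros(n)
for t in range(2, n):
    s[t] = 0.9 * s[t-1] - 0.5 * s[t-2] + noise[t]

energies = {p: residual_energy(s, p) for p in (1, 2, 4, 8)}
```

Higher orders never increase the residual energy, but past the point where the model matches the signal's structure the gains become negligible, which is why 10-15 coefficients suffice for speech.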

Reflection Coefficients

As an alternative to using the calculated prediction coefficients directly, [see page 8, Durbin's algorithm] produces reflection coefficients, which map directly to changes in the cross-sectional area of the vocal tract (via a lossless-tube model).
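The mapping from reflection coefficients to tube areas can be sketched as below. Note the sign convention for reflection coefficients varies between texts; this sketch assumes the convention where each section's area is scaled by `(1 - k) / (1 + k)`, with an arbitrary starting area.

```python
import numpy as np

def areas_from_reflection(ks, a0=1.0):
    """Map reflection coefficients to lossless-tube section areas,
    assuming the convention A[i+1] = A[i] * (1 - k) / (1 + k).
    a0 is an arbitrary reference area for the first section."""
    areas = [a0]
    for k in ks:
        areas.append(areas[-1] * (1 - k) / (1 + k))
    return np.array(areas)
```

A reflection coefficient of 0 means no area change (no reflection at that junction), which is why `|k| < 1` also serves as a handy stability check on the synthesis filter.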

Applications

Term | Description
Pitch Extraction | Find the pitch frequency by auto-correlation (disregarding noise, e.g. from tongue movement).
[see page 10, Spectrum estimation] | The LP spectrum is a smoothed all-pole approximation to the speech spectrum.
[see page 11, Formant extraction] | A pole pair from LP analysis can model a formant (found by root-solving or peak-picking).
[see page 12, Speech synthesis] | Implement the source-filter model.
[see page 12, Speech recognition] | Analyse the line spectral pairs through various methods to deduce intents.
[see page 13, Linear predictive coding] | Compress speech signals.
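The spectrum-estimation application above is easy to sketch: evaluate A(z) around the unit circle with an FFT and invert it to get the smoothed all-pole envelope in dB. The coefficient vector here is a hypothetical stand-in for output from a prior analysis step.

```python
import numpy as np

def lp_spectrum(a, n_fft=512):
    """Smoothed all-pole LP spectrum: 20*log10 |1 / A(e^{jw})|,
    sampled from 0 to Nyquist via a zero-padded real FFT of a."""
    A = np.fft.rfft(a, n_fft)          # A(z) on the upper unit circle
    return -20.0 * np.log10(np.abs(A))  # |1/A| in dB

# Hypothetical coefficients: a single pole near z = 0.9 (a low-frequency peak)
env_db = lp_spectrum(np.array([1.0, -0.9]))
```

Because the envelope has only `order` poles, it traces the formant peaks while smoothing over the fine harmonic structure of the pitch.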

Linear Predictive Coding

[see page 13, Mixed] excitation LPC extends simple two-state (voiced/unvoiced) excitation, which on its own sounds buzzy, and can offer intelligible speech at 2400 down to 600 bits per second.

Multipulse LPC is a variant which finds the best position and amplitude for each excitation pulse by running an LP analysis loop.

Code-excited LPC builds on multipulse by quantising sets of pulses into a codebook of excitation vectors. This is the basis of most modern speech coding systems, including the [see page 15, GSM speech codec].
