Cepstral Analysis

Tags: speech-processing

Cepstral analysis is a [see page 1, method] for separating the vocal-tract filter response from the excitation.

This takes advantage of the fact that the spectrum of a speech signal is the product of the excitation spectrum and the vocal-tract frequency response.

Recall that: \[ \log(a \cdot b) = \log(a) + \log(b) \] From this and the above observation we know that the log-spectrum of a signal is the sum of the log of the excitation spectrum and the log of the vocal-tract frequency response. Cepstral analysis extracts both of these spectra from a given signal by applying a low-pass and a high-pass filter to the log-spectrum.

Component Extracted	Filter	Description
Excitation Spectrum	High Pass	Suppresses low-time components
Vocal Tract Transfer Function	Low Pass	Suppresses high-time components

A formal description of the process is given [see page 4, here].
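As a quick numerical check of the product-to-sum observation, the sketch below builds a toy signal and compares the two sides. The impulse train, the decaying exponential, the frame length, and all variable names are illustrative assumptions, not part of the original notes, and circular convolution is used so that the DFT product relation holds exactly.

```python
import numpy as np

n = 256
# Hypothetical stand-ins: an impulse train for the excitation and a
# decaying exponential for the vocal-tract impulse response.
excitation = np.zeros(n)
excitation[::30] = 1.0
vocal_tract = 0.9 ** np.arange(n)

E = np.fft.fft(excitation)
H = np.fft.fft(vocal_tract)

# Circular convolution in time corresponds to a product of DFT spectra.
speech = np.real(np.fft.ifft(E * H))
S = np.fft.fft(speech)

# Log magnitude of the product versus the sum of the log magnitudes.
log_speech = np.log(np.abs(S))
log_sum = np.log(np.abs(E)) + np.log(np.abs(H))
print(np.allclose(log_speech, log_sum))
```

Because the two contributions are added rather than multiplied in the log-spectrum, a linear filter can separate them, which is what the table above relies on.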

This is a form of Homomorphic Filtering.

We define the [see page 3, cepstrum] as the spectrum of the log-spectrum of an input signal, i.e. we take the spectrum of a signal, apply a logarithm transform and then treat the result as if it were a time-domain signal.

Note: The real cepstrum is the inverse Fourier transform of the log magnitude spectrum of the original signal and is measured in the Quefrency Domain.
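A minimal sketch of the real cepstrum as defined in the note, together with the low-time / high-time split described earlier. The function names, the cutoff of 20 samples, and the toy frame are assumptions made for illustration; the small offset inside the logarithm is only there to avoid log(0).

```python
import numpy as np

def real_cepstrum(frame):
    """Inverse DFT of the log magnitude spectrum of a real frame."""
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-12)
    return np.real(np.fft.ifft(log_mag))

def lifter(cepstrum, cutoff):
    """Split a cepstrum into low-time and high-time parts at `cutoff` samples."""
    n = len(cepstrum)
    low = np.zeros(n)
    low[:cutoff] = cepstrum[:cutoff]
    # The real cepstrum is symmetric, so keep the mirrored low-time samples too.
    low[n - cutoff + 1:] = cepstrum[n - cutoff + 1:]
    high = cepstrum - low
    return low, high

# Toy frame: impulse-train excitation circularly convolved with a decaying response.
n = 256
excitation = np.zeros(n)
excitation[::30] = 1.0
vocal_tract = 0.9 ** np.arange(n)
frame = np.real(np.fft.ifft(np.fft.fft(excitation) * np.fft.fft(vocal_tract)))

c = real_cepstrum(frame)
low, high = lifter(c, cutoff=20)

# Transforming the low-time part back gives a smoothed log-spectrum
# (an estimate of the vocal-tract contribution).
smoothed_log_spectrum = np.real(np.fft.fft(low))
```

Here `low` plays the role of the low-pass output (vocal tract) and `high` the high-pass output (excitation); the cutoff would need tuning for real speech.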

Mel Frequency Cepstrum Coefficients (MFCCs)

MFCCs are approximations to the full cepstral coefficients (with several [see page 7, properties]) that can be used as features in automatic speech recognition. They are often computed from a DFT whose output is grouped into 20-40 mel-spaced frequency bands.

See the [see page 10, features] of good acoustic features.

Automatic Speech Recognition

The [see page 9, process] of computing these features for ASR is therefore:

FFT to go from the time domain to the frequency-domain spectrum.
Mel transform to get the power in each frequency band.
Log transform of the band powers.
DFT of the log band powers (in practice usually a DCT) to decorrelate them.
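The four steps above can be sketched for a single frame as follows. This is an illustrative sketch under several assumptions that are not in the notes: the mel formula 2595 log10(1 + f/700), triangular filters, a Hamming window, 26 bands, 13 output coefficients, and an unnormalised DCT-II used in place of a plain DFT for the last step. All function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (assumed here).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2), n_bands + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_bands, len(freqs)))
    for i in range(n_bands):
        lo, centre, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (freqs - lo) / (centre - lo)
        falling = (hi - freqs) / (hi - centre)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(frame, sample_rate, n_bands=26, n_coeffs=13):
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2                               # 1. FFT
    band_power = mel_filterbank(n_bands, len(frame), sample_rate) @ power    # 2. mel bands
    log_power = np.log(band_power + 1e-10)                                   # 3. log
    # 4. Unnormalised DCT-II of the log band powers.
    k = np.arange(n_coeffs)[:, None]
    n = np.arange(n_bands)[None, :]
    dct_basis = np.cos(np.pi * k * (n + 0.5) / n_bands)
    return dct_basis @ log_power

# Usage on a synthetic 25 ms frame at 16 kHz (a tone plus a little noise).
sample_rate = 16000
t = np.arange(400) / sample_rate
rng = np.random.default_rng(0)
frame = np.sin(2 * np.pi * 440.0 * t) + 0.1 * rng.standard_normal(400)
coeffs = mfcc(frame, sample_rate)
```

A practical front end would also handle framing, pre-emphasis and normalisation, which are left out here to keep the four steps visible.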

[see page 6, Applications]

Pitch estimation - by finding peak cepstral values among the high-quefrency components
Vocoding
Smoothed spectrum estimation - as an alternative to linear prediction
Automatic speech recognition
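As an illustration of the pitch-estimation application, the sketch below looks for the largest cepstral value within a range of plausible pitch periods. The function name, the 60-400 Hz search range, the Hamming window, and the synthetic frame (an impulse train with an 80-sample period at 8 kHz, filtered by a crude decaying response) are all assumptions made for this example.

```python
import numpy as np

def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate f0 (Hz) from the largest cepstral peak in a plausible pitch range."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)   # offset avoids log(0)
    cepstrum = np.real(np.fft.ifft(log_mag))
    q_lo = int(sample_rate / f_max)   # shortest period considered, in samples
    q_hi = int(sample_rate / f_min)   # longest period considered, in samples
    peak = q_lo + np.argmax(cepstrum[q_lo:q_hi + 1])
    return sample_rate / peak

# Synthetic voiced frame: one pulse every 80 samples at 8 kHz (100 Hz pitch).
sample_rate = 8000
period = 80
excitation = np.zeros(1024)
excitation[::period] = 1.0
vocal_tract = 0.9 ** np.arange(64)
frame = np.convolve(excitation, vocal_tract)[:1024]

f0 = estimate_pitch(frame, sample_rate)
print(f0)
```

Restricting the search to high quefrencies keeps the low-time vocal-tract components from being mistaken for the pitch peak; on real speech a voicing decision would also be needed.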