Speech Frames

Tags: speech-processing

A frame is a fixed-length sequence of quantised signal samples (taken after analogue-to-digital conversion). Signal processing is often done on speech frames.
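A minimal sketch of cutting a digitised signal into fixed-length frames. It assumes the samples are a plain Python list of integers and simply discards an incomplete final frame, although real systems often zero-pad instead. The function name `split_into_frames` is hypothetical.

```python
def split_into_frames(samples, frame_length):
    """Split quantised samples into consecutive, non-overlapping frames
    of exactly frame_length samples (hypothetical helper).
    Trailing samples that do not fill a whole frame are dropped."""
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    n_frames = len(samples) // frame_length
    return [samples[i * frame_length:(i + 1) * frame_length]
            for i in range(n_frames)]


# Usage: 10 samples cut into frames of 4 samples each
samples = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
print(split_into_frames(samples, 4))  # [[3, -1, 4, 1], [-5, 9, 2, -6]]
```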

Because the vocal tract changes shape fairly slowly and its configuration is roughly constant over short intervals, it makes sense to analyse speech one short interval at a time, treating each interval as fixed rather than being misled by the subtle frequency shifts within it <01holmes-speech-synthesis+recognition>.

Frame Size Requirements

The choice of frame size is a compromise between:

Having sufficient data in a frame to make the required measurements.
Having a small enough amount of data that the quasi-stationary assumption of speech still holds.
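To make the first constraint concrete, here is an illustrative calculation: a frame of \(N\) samples at sampling period \(T\) lasts \(NT\) seconds, and an \(N\)-point DFT of that frame has bins spaced \(1/(NT)\) apart, so a longer frame resolves finer frequency detail. Taking a 30 ms frame and, as an assumption, a 16 kHz sampling rate \(f_s = 1/T\):

\[ N = f_s \cdot 30\ \text{ms} = 480, \qquad \Delta f = \frac{1}{NT} = \frac{f_s}{N} = \frac{16000\ \text{Hz}}{480} \approx 33.3\ \text{Hz} \]

The cost is time resolution: everything that happens within those 30 ms is folded into a single measurement, which is why the frame must stay short enough for the quasi-stationary assumption to hold.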

Note: we must also ensure there are enough frames (a high enough frame rate) to capture the non-stationary properties of speech, i.e. how it changes from one frame to the next.

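To accommodate all of these constraints, we often use Block Processing, with:

\(NT = 30\) ms (the frame length, where \(N\) is the number of samples per frame and \(T\) is the sampling period)
\(f_r = 100\) frames per second (a new frame every 10 ms, so successive frames overlap by 20 ms)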
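A minimal sketch of block processing with these two parameters. It converts the 30 ms frame length and 100 fps frame rate into sample counts for a given sampling rate, then cuts overlapping frames that start every hop samples. The helper names `frame_parameters` and `block_frames`, the 8 kHz sampling rate in the usage example, and the choice to return only complete frames are all assumptions for illustration.

```python
def frame_parameters(sample_rate_hz, frame_ms=30.0, frame_rate_fps=100.0):
    """Return (N, hop): samples per frame and samples between frame starts.
    Hypothetical helper; defaults follow NT = 30 ms and f_r = 100 fps."""
    frame_length = round(sample_rate_hz * frame_ms / 1000.0)  # N = NT / T
    hop = round(sample_rate_hz / frame_rate_fps)              # 1 / (f_r T)
    if frame_length <= 0 or hop <= 0:
        raise ValueError("frame length and hop must be at least one sample")
    return frame_length, hop


def block_frames(samples, frame_length, hop):
    """Overlapping frame blocking: frame i covers
    samples[i*hop : i*hop + frame_length]. Only complete frames are kept."""
    if len(samples) < frame_length:
        return []
    n_frames = 1 + (len(samples) - frame_length) // hop
    return [samples[i * hop:i * hop + frame_length] for i in range(n_frames)]


# Usage: one second of (dummy, all-zero) audio sampled at 8 kHz
fs = 8000
N, hop = frame_parameters(fs)
print(N, hop, N - hop)  # 240 80 160  (overlap of 160 samples = 20 ms)
frames = block_frames([0] * fs, N, hop)
print(len(frames))      # 98
```

One second of signal yields 98 complete frames rather than 100 because the last two would run past the end of the signal; padding the signal is a common alternative if every frame position is needed.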

Links to this note

Short Time Energy
Windowing
Zero Crossing Rate