Correlation Attack

Tags: exploit

A form of attack on stream ciphers which [see page 31, exploits] the approximate linear-relationship between a combining functions inputs and outputs.

Example with a Geffe generator

A correlation attack on a Geffe generator works based on the following assumptions. We have:

An endless amount of bits from the input $ i $ and output-stream $ b $
The truth table of the combining function (Geffe generator)
The tap sequence of each LFSR feeding the Geffe generator.
An enumerator over all permutations of the initial key-state for two bits used in a linear expression $ s_i \forall i \in \{e,f\} $

We've already established that $ b_i $ agrees with $ e_i $ 75% of the time. Now if we:

Feed the [see page 28, correct] value for $s_i$ into a combining function and check whether $ i_j $ agrees with $ b_j $ with roughly 75% accuracy, we've found the correct key-state.
Otherwise with the key-state set to an [see page 29, incorrect] value $ i_j $ is expected to agree with $ b_j $ roughly randomly (50% of the time).

This approach of generating possible key values feeding them into the cipher, and observing the bias between the key and output bits to determine whether the generated key is correct can be used with both $ s_e $ and $ s_f $ because they both have 75% correlation with the generator output $ b $.

If we use this approach to find the key-state for both $ e $ and $ f $ we can compute the key-state for $ d $ through a plain brute-force search (substituting in the values for $ s_e $ and $ s_f $) and checking whether the output of a permutation matches the known ciphertext stream $ b $.

Observe that this solution isn't limited to a Geffe generator. Any bit-stream using a combination-function with multiple LFSRs can be attacked so long as the output bit has some bias.

Multi-prong Attack

Once we've performed a successful correlation attack we don't have to fallback to a brute-force attack on the remaining unknown key bits. We can substitute the correct value for bits we know and discover new linear-relations for unknown-bits to perform a subsequent correlation attack on.

See [see page 2, Q3]:

However, consider now only table values where $x_3 == 0$. Now it is the case that $x_1$, has $25%$ agreement ($\frac{2}{8}$) with $f(x_1, x_2, x_3)$. So if we now just look at values of $f(x_1, x_2, x_3)$ and values of the $L1$ stream (i.e. the $x_1$ stream), when $x_3$ is $0$ (and we know which bits these are) then we can launch a correlation attack on $L1$ in much the same way as the attack on $L3$.

Divide and Conquer

Observe that in the above example we have 3 input LFSRs each with their own key. Assuming each key is $32$bits in length we conclude that the compound key of the generator is $ 32+32+32 $bits. This means a brute-force (enumerative) attack will have try on the order of $ 2^{{32}3} = 2^{32+32+32} $ possible permutations of the key (key-space).

With the correlation attack we instead find the bits for each of the streams independently leading to a key-space ($ 3 * 2^{32} = {2}^{35} \approx {2}^{32} $) which is considerably smaller.

The practicality of multiple LFSRs is that the size of the key-space is a product of each indiddual register however this divide-and-conquer approach has reduced this to a sum of key sizes, each attacked in turn.