a5-hack.htm

a5-hack.htm: Posted Dec 21, 1999; a5-hack.htm is a Paper on Cryptanalysis of A5 Stream Cipher; tags | encryption; SHA-256 | 322c828389ebb473ecee8f9432ebe60ade40157b920caa7b12663fdc5c9761a1; Download | Favorite | View
a5-hack.htm

<HTML>
<HEAD>
  <!-- Created with AOLpress/2.0 -->
  <TITLE>Cryptanalysis of Alleged A5 Stream Cipher / On Random Mappings and
  Random Permutations</TITLE>
</HEAD>
<BODY BGCOLOR="#ffffff">
<P ALIGN=Center>
3 May 1998<BR>
Source: Hardcopy from Anonymous<BR>
Thanks to the authors and Springer
<P ALIGN=Center>
Add: <A HREF="#wgc">On Random Mappings and Random Permutations</A><BR>
W. G. Chambers
<P>
  <HR>
<BLOCKQUOTE>
  Walter Fumy (Ed.)
  <H2>
    Advances in Cryptology --<BR>
    EUROCRYPT ' 97
  </H2>
  <H3>
    International Conference on the Theory and <BR>
    Application of Cryptographic Techniques <BR>
    Konstanz, Germany, May 11-15, 1997 <BR>
    Proceedings
  </H3>
  <P>
  Springer
  <P>
  <I>[Excerpt, Pages 239-255]</I>
</BLOCKQUOTE>
<P>
  <HR>
<P>
<I>[Note: Please verify equations by reference to the originals]</I>
<P>
239
<H2 ALIGN=Center>
  Cryptanalysis of Alleged A5 Stream Cipher<BR>
</H2>
<P ALIGN=Center>
<B>Jovan Dj. Golic * </B><BR wp="br1">
<BR wp="br2">
School of Electrical Engineering, University of Belgrade<BR>
Bulevar Revolucije 73, 11001 Beograd, Yugoslavia<BR>
<BLOCKQUOTE>
  <STRONG>Abstract. </STRONG>A binary stream cipher, known as A5, consisting
  of three short LFSRs of total length 64 that are mutually clocked in the
  stop/go manner is cryptanalyzed. It is allegedly used in the GSM standard
  for digital cellular mobile telephones. Very short keystream sequences are
  generated from different initial states obtained by combining a 64-bit secret
  session key and a known 22-bit public key. A basic divide-and-conquer attack
  recovering the unknown initial state from a known keystream sequence is first
  introduced. It exploits the specific clocking rule used and has average
  computational complexity around 2<SUP>40</SUP>. A time-memory trade-off attack
  based on the birthday paradox which yields the unknown internal state at
  a known time for a known keystream sequence is then pointed out. The attack
  is successful if <I>T</I> · <I>M
  <U>></U></I>&nbsp;2<SUP>63.32</SUP> where <I>T</I> and <I>M</I> are the
  required computational time and memory (in 128-bit words), respectively.
  The precomputation time is <EM>O(M) </EM>and the required number of known
  keystream sequences generated from different public keys is about T/102.
  For example, one can choose T <U>~</U> 2<SUP>27.67</SUP> and M
  <U>~</U><SUB> </SUB>2<SUP>35.65</SUP>. To obtain the secret session key from
  the determined internal state, a so-called internal state reversion attack
  is proposed and analyzed by the theory of critical and subcritical branching
  processes. <I>["<U>~</U>" here means approximately]</I><BR wp="br1">
  <BR wp="br2">
</BLOCKQUOTE>
<H3>
  1 Introduction
</H3>
<P>
A common type of keystream generators for additive stream cipher applications
consists of a number of possibly irregularly clocked linear feedback shift
registers (LFSRs) that are combined by a function with or without memory.
Standard cryptographic criteria such as a large period, a high linear complexity,
and good statistical properties are thus relatively easily satisfied, see
[12]. However, such a generator may in principle be vulnerable to various
divide-and-conquer attacks in the known plaintext (or ciphertext-only) scenario,
where the objective is to reconstruct the secret key controlled LFSR initial
states from the known keystream sequence, for a survey see [12] and [5].
In practice, for resynchronization purposes, the internal state of a keystream
generator is reinitialized once in <BR wp="br1">
<BR wp="br2">
___________________
<P>
<SMALL>* This work was done while the author was with the Information Security
Research Centre, Queensland University of Technology, Brisbane, Australia.
Part of this work was carried out while the author was on leave at the Isaac
Newton Institute for Mathematical Sciences, Cambridge, United Kingdom. This
research was supported in part by the Science Fund of Serbia, grant #04M02,
through the Mathematical Institute, Serbian Academy of Science and
Arts.</SMALL>
<P>
  <HR>
<P>
240
<P>
a while by combining the same secret session key with different randomizing
keys (typically transmitted in the clear and called here <EM>public) </EM>into
the secret message keys defining different initial internal states. This
may open new possibilities for the secret key recovery cryptanalytic attacks,
see [3].
<P>
In this paper, a keystream generator consisting of three short binary LFSRs
with known primitive feedback polynomials that are mutually clocked in the
stop/go manner is cryptanalyzed. The LFSR lengths are 19, 22, and 23,
respectively, and the total length is thus 64. Middle taps in each of the
LFSRs are used to define the clock-control sequence, the clocking rule is
such that at least two LFSRs are effectively clocked per each output bit,
and the keystream sequence is formed as the bitwise sum of the three stop/go
clocked LFSR sequences. The 64-bit long secret key is nonlinearly combined
with a 22-bit long public key (frame number) to form the LFSR initial states.
The first 100 output bits are discarded and the message length is only 114
bits (frequent resynchronization). However, the full-duplex communication
mode makes the effective message length of 228 bits. The scheme along with
the code has been made public in [1] and is allegedly used under the name
A5 for stream cipher encryption in the GSM standard for digital cellular
mobile telephones, see [13]. For simplicity, the name A5 is used here throughout.
In a yet unpublished paper [14], it has been observed, perhaps surprisingly,
that the period of the keystream sequence is only slightly bigger than the
period, &nbsp;<U>~</U> 2<SUP>23</SUP>, of the longest LFSR. A possibility
for a divide-and-conquer attack of average complexity 2<SUP>40</SUP> has
been mentioned in [1] and [13]. The attack would consist in guessing the
initial states of the two shorter LFSRs and, then, in computing the longest
LFSR sequence from the known keystream sequence. However, this attack can
not work, because the clocking depends on the unknown longest LFSR sequence
as well. In addition, one has to take care of the first 100 output bits being
discarded as well.
<P>
Although one may in principle imagine that edit distance or edit probability
correlation attacks [4] can be adapted to deal with stop/go clocking, such
attacks are not likely to be successful on A5, because of a very short available
keystream sequence. Due to the bitwise summation, to achieve a divide-and-conquer
effect, one or two LFSRs have to be replaced by their linear models [7],
where linear models of individual LFSRs can be based on the repetition property
only, while linear models of pairs of the LFSRs must involve their feedback
polynomials as well. Instead of the so-called shrunk feedback polynomials
[7], we now have to introduce the expanded feedback polynomials. If the whole
scheme is replaced by the corresponding linear model, one may then even conceive
of a fast correlation attack<STRONG> </STRONG>framework similar to the one
from [6], but the required keystream sequence length would be much bigger
than the one at disposal. On the other hand, the conditional correlation
attack [11] based on the repetition property can not be extended to deal
with A5, because of the specific clocking rule.
<P>
The objective of this paper is to develop cryptanalytic attacks on A5 that
can reconstruct the 64-bit secret key in the known plaintext scenario with
the computational complexity smaller than 264. In Section 2, a more detailed
description of the A5 stream cipher is presented. It is shown that the known
plaintext
<P>
  <HR>
<P>
241
<P>
attacks are very realistic in the GSM applications. In Section 3, a basic
divide-and-conquer attack on A5 with the average computational complexity
2<SUP>40.16</SUP> is introduced. It essentially consists in guessing some
bits of the LFSR states, in recovering the others by solving appropriate
linear equations, and in the LFSR states reversion via the unknown binary
clocking sequences to obtain the LFSR initial states. The last step is needed
since the first 100 output sequence bits are discarded. In Section 4, a
time-memory trade-off attack based on the birthday paradox probabilistic
argument is pointed out. This attack is feasible due to relatively short
internal state size of 64 bits. It can recover the LFSR internal states for
a particular keystream sequence at a particular time and is successful if
<I>T</I> · <I>M <U>></U></I>&nbsp;2<SUP>63.32</SUP>, where <I>T</I>
and <EM>M </EM>are the required computational time and memory, respectively.
The precomputation time is <EM>O(M) </EM>and a sample of T/102 228-bit long
observed keystream sequences generated from the same secret session key and
different public keys is needed. To obtain the secret key, a low-complexity
internal state reversion attack is then proposed in Section 5. It consists
in the reversion of the LFSR internal states, first when the output sequence
is known, then when the output sequence is unknown, and finally when the
secret key is nonlinearly combined with the known public key. The complexity
of the attack is analyzed by the theory of critical and subcritical branching
processes, briefly outlined in the Appendix. Conclusions are given in Section
6.<BR>
<H3>
  2 Description of the Stream Cipher
</H3>
<P>
The stream cipher algorithm to be defined is for simplicity called A5 according
to [1], [13]. The A5 type keystream generator considered is shown in Fig.
1.<BR>
<P ALIGN=Center>
<IMG WIDTH="563" HEIGHT="312" SRC="jgf1.jpg">
<P ALIGN=Center>
<B>Fig. 1.</B> Alleged A5 type keystream generator<BR>
<P>
Let <I>f<SUB>i</SUB></I>(<I>z</I>) =
Sigma<I><SUP><r<SUB>i</SUB></SUP></I><SUB><I>>l</I>=0</SUB>
<I>f<SUB>i,l</SUB> z<SUP>l</SUP></I> denote a known binary primitive feedback
polynomial of LFSR<SUB>i</SUB> of length r<I><SUB>i</SUB></I>, <I>i</I> =
1, 2, 3, and let <I>r</I><SUB>1</SUB> = 19, <I>r</I><SUB>2</SUB> = 22, and
<I>r</I><SUB>3</SUB> = 23. The feedback polynomials specified in [1], [13]
are sparse, but our cryptanalytic methods to be presented do not depend on
their choice.<STRONG> </STRONG>Let <I>S<SUB>i</SUB></I>(0) =
(<I>x<SUB>i</SUB></I>(<I>t</I>))<SUP><I><r<SUB>i</SUB></I>-1></SUP>
<SUB><I>t</I>=0</SUB> denote the initial state of LFSR<I><SUB>i</SUB></I>
and let <I>x<SUB>i</SUB></I> =
(<I>x<SUB>i</SUB></I>(<I>t</I>)<SUP>ºº</SUP>
<SUB><I>t</I>=0</SUB> &nbsp;denote the corresponding maximum-length sequence
of period 2<I><SUP>r<SUB>i</SUB></SUP></I> -1 produced by
LFSR<I><SUB>i</SUB></I> via the linear recursion
<I>x<SUB>i</SUB></I>(<I>t</I>) = Sigma<I><SUP>r<SUB>i</SUB></SUP></I>
<SUB><I>l</I>=1</SUB> <I>f<SUB>i,t</SUB></I>
<I>x<SUB>i</SUB></I>(<I>t</I> - <I>l</I>), <I>t</I> <U>></U><SUB>
</SUB><I>tau<SUB>i</SUB>.</I>
<P>
Let <I>S<SUB>i</SUB></I>(<I>t</I>) =
<I>s<SUB>i,l</SUB></I>(<I>t</I>))<I><SUP>r<SUB>i</SUB></SUP></I>
<SUB><I>l</I>=1</SUB> denote the state of LFSRi at time <I>t</I>
<U>></U> 0 in a scheme with stop/go clocking to be defined below, and
let <I>tau<SUB>i</SUB></I> denote a middle tap from
LFSR<I><SUB>i</SUB></I> used for clock-control. The values suggested in [1]
are <I>tau</I><SUB>1</SUB> = 10, <I>tau</I><SUB>2</SUB> = 11, and
<I>tau</I><SUB>3</SUB> = 2. Then the clock-control sequence <I>C</I> =
(<I>C</I>(<I>t</I>))<SUP>ºº</SUP> <SUB><I>t </I>= 1</SUB> is defined
by
<P ALIGN=Center>
<BR>
<I>C</I>(<I>t</I>) =
<I>g</I>(<I>s</I><SUB>1,<I>tau</I>1</SUB>(<I>t</I> - 1) +
<I>s</I><SUB>2,<I>tau</I>2</SUB>(<I>t</I> - 1) +
<I>s</I><SUB>3,<I>tau</I>3</SUB>(<I>t</I> - 1) &nbsp; &nbsp; &nbsp; &nbsp;
&nbsp;(1)<BR>
<P>
where <I>g</I> is a 4-valued majority function of three binary variables
such that <EM>g</EM>(<I>s</I><SUB>1</SUB>, <I>s</I><SUB>2</SUB>,
<I>s</I><SUB>3</SUB>) + {<I>i</I>, <I>j</I>} if <I>s<SUB>i</SUB></I> =
<I>s<SUB>j</SUB></I> /= <I>s<SUB>k</SUB></I> for <I>i</I> < <I>j</I> and
<I>k</I> /= <I>i</I>, <I>j</I>, and <EM>g</EM>(<I>s</I><SUB>1</SUB>,
<I>s</I><SUB>2</SUB>, <I>s</I><SUB>3</SUB>) = {1, 2, 3} if
<I>s</I><SUB>1</SUB> = <I>s</I><SUB>2</SUB> = <I>s</I><SUB>3</SUB>. The
clock-control value <EM>C(t) </EM>defines which LFSRs are clocked to produce
an output bit <EM>y(t) </EM>as the sum<BR>
<P ALIGN=Center>
<I>y</I>(<I>t</I>) = <I>s</I><SUB>1,l</SUB>(<I>t</I>) +
<I>s</I><SUB>2,l</SUB>(<I>t</I>) + <I>s</I><SUB>3,l</SUB>(<I>t</I>),
&nbsp;<I>t</I> <U>></U> 1. &nbsp; &nbsp; &nbsp; &nbsp;
&nbsp;(2)<BR wp="br1">
<BR wp="br2">
<P>
  <HR>
<P>
242
<P>
Let <I>c<SUB>i</SUB></I> =
(<I>c<SUB>i</SUB></I>(<I>t</I>))<SUP>ºº</SUP> <SUB><I>t </I>= 1</SUB>
&nbsp;&nbsp;denote the binary clocking sequence for
LFSR<I><SUB>i</SUB></I> (it is clocked if <I>c<SUB>i</SUB>(t)</I> = 1 and
not clocked if <I>c<SUB>i</SUB>(t)</I> = 0) which is derived from the
clock-control sequence C in an obvious way. Equation (2) can formally be
used to generate the initial bit <I>y</I>(0) from <I>S</I>(0), so that
<I>y</I> = (<I>y</I>(<I>t</I>))<SUP>ºº</SUP> <SUB><I>t </I>=
0</SUB>&nbsp;&nbsp;is called the output sequence. The first 100 output bits,
(<I>y</I>(<I>t</I>))<SUP>100</SUP> <SUB><I>t </I>= 1</SUB>, are discarded,
the following 114 bits are used as the keystream for one direction of
communication in the full-duplex mode, then the next 100 bits are again
discarded, and the following 114 bits are used as the keystream for the reverse
direction of communication. The encrypted messages are thus very short and
the resynchronization is frequent.
<P>
The LFSR initial states are defined in terms of the secret and public keys.
The public key is a known 22-bit frame number generated by a counter and
hence different for every new message. The 64-bit secret session key is first
loaded into the LFSRs (the all-zero initial state is avoided by setting the
output of the last stage to 1) and the 22-bit public key is then bitwise
added into the feedback path of each of the LFSRs that are mutually clocked
as above. More precisely, if &nbsp;<I>p</I> =
(<I>p</I>(<I>t</I>))<SUP>0</SUP> <SUB><I>t </I>= -21</SUB> denotes the public
key, then for every -21 <U><</U> t <U><</U> 0, the LFSRs are first
stop/go clocked as before and, then, the bit <I>p(t)</I> is added to the
last stage of each of the LFSRs. The LFSR states after these 22 steps, as
a secret message key, represent the initial LFSR states for the keystream
generation.
<P>
The A5 stream cipher is allegedly used to encrypt the links between individual
cellular mobile telephone users and the base station in the GSM system, see
[13]. Therefore, if two users want to communicate to each other via their
base station(s), the same messages get encrypted twice which makes the known
plaintext cryptanalytic attack possible, provided a cooperative insider user
can be established. Note also that the links between the base stations are
not encrypted. For
<P>
  <HR>
<P>
243
<P>
any user, a 64-bit secret session key is generated by another algorithm from
the secret master key specific to the user and a public random 128-bit key
transmitted in the clear from the base station to the user. So, a possible
reconstruction of one or more session keys for a user opens a door for a
cryptanalytic attack on the master key of that user.<BR>
<H3>
  3 Basic Attack
</H3>
<P>
The objective of a divide-and-conquer attack to be presented in this section
is to determine the LFSR initial states from a known keystream sequence
corresponding to only one known plaintext-ciphertext pair. In fact, only
about 64 known successive keystream bits are required. Let <I>S(t)</I> =
(<I>S</I><SUB>1</SUB>(<I>t</I>), <I>S</I><SUB>2</SUB>(<I>t</I>),
<I>S</I><SUB>3</SUB>(<I>t</I>)) denote the whole internal state of A5 at
time t <U>></U> 0, where <I>S</I>(0) is the initial internal state defined
by the secret message key. The known keystream sequence is in fact composed
of two segments (<I>y</I>(<I>t</I>))<SUP>214</SUP> <SUB><I>t </I>=
101</SUB>&nbsp;and (<I>y</I>(<I>t</I>))<SUP>428</SUP> <SUB><I>t </I>=
315</SUB>&nbsp;. The first goal is to reconstruct the internal state
<I>S</I>(101) and the second one is to determine <I>S</I>(0) =
(<I>S</I><SUB>1</SUB>(0), <I>S</I><SUB>2</SUB>(0),
<I>S</I><SUB>3</SUB>(0)) from <I>S</I>(101).
<P>
Recall that <I>c<SUB>i</SUB></I> =
(<I>c<SUB>i</SUB></I>(<I>t</I>))<SUP>ºº</SUP> <SUB><I>t </I>=
1</SUB>&nbsp;denotes the binary clocking sequence for
LFSR<I><SUB>i</SUB></I>, which is clocked if
<I>c<SUB>i</SUB></I>(<I>t</I>) = 1 and not clocked if
<I>c<SUB>i</SUB></I>(<I>t</I>) = 0. If A<I><SUB>i</SUB></I> denotes the
state-transition matrix of regularly clocked LFSR<I><SUB>i</SUB></I>, then
<BR>
<P ALIGN=Center>
<IMG WIDTH="401" HEIGHT="37" SRC="jg15.jpg"><BR>
<P>
with the integer summation in the exponent. Also, let
<I><U>x</U><SUB>i</SUB></I> =
(<I><U>x</U><SUB>i</SUB></I>(<I>t</I>))<SUP>ºº</SUP><SUB>t=0</SUB><SUP>
</SUP>denote the stop/go clocked LFSRi sequence, where<STRONG>
</STRONG><I><U>x</U><SUB>i</SUB></I>(<I>t</I>) =
<I>s<SUB>i</SUB></I><SUB>,1</SUB>(<I>t</I>). In the probabilistic analysis
to follow, a sequence of independent uniformly distributed random variables,
over any finite set, is called purely random. As usual, we keep the same
notation for random variables and their values.
<P>
<STRONG>Proposition 1. </STRONG><I>Assume that the three regularly clocked
LFSR sequences are mutually independent and purely random. Then the 4-valued
clock-control sequence C is purely random and, hence, the binary clocking
sequence ci is a sequence of independent identically distributed binary random
variables with the probability of zero being equal to</I> 1/4.
<P>
<STRONG>Proposition 2. </STRONG>Assume that the three regularly clocked LFSR
<EM>sequences are mutually independent and purely random. Then the bitwise
sum of any two or more stop/go clocked sequences <U>x</U><SUB>i</SUB> is
purely random.</EM>
<P>
It is shown in Section 5 that the state-transition function of A5 is not
one-to-one, so that the set of all reachable internal states at time <EM>t,
t <U>></U></EM> 1, is a subset of the set <I>S</I><SUB>0</SUB> of all
2<SUP>64</SUP> initial states. In particular, only 5 · 2<SUP>61</SUP>
<U>~</U> 2<SUP>63.32</SUP> internal states are reachable for <I>t</I> = 1.
As a consequence, different initial states can give rise to the same internal
state at some time in future or even to the same output sequence too. This
is explained in terms of the theory of branching processes in Section 5.
More precisely, the number of different initial states giving rise to
<P>
  <HR>
<P>
244
<P>
the same internal state at some time in future is very likely linear in that
time and, therefore, relatively small for the times of interest (internal
state reversion when the output is not known, Subsection 5.1). On the other
hand, the number of different initial states yielding the same internal state
at some time in future and the same output sequence is very likely to be
a very small integer (internal state reversion when the output is known,
Subsection 5.2). In addition, since the individual LFSR sequences are
maximum-length sequences with good (low) autocorrelation and crosscorrelation
properties and the combining function is maximum-order correlation immune,
it is highly likely that different output sequences <I>y</I> =
(<I>y</I>(<I>t</I>))<SUP>ºº</SUP> <SUB><I>t </I>= 0</SUB>&nbsp;are
different on the first successive 64 positions,
(<I>y</I>(<I>t</I>))<SUP>63</SUP> <SUB><I>t </I>= 0</SUB>.
<P>
Consequently, it takes about 64 successive keystream bits to check if an
assumed preceding internal state is consistent with the subsequent output
sequence. The expected number of solutions for <I>S</I>(101) is with high
probability a small integer, whereas the the number of solutions for
<I>S</I>(0) (equivalent initial states) is very likely to be relatively small.
<BR>
<H3>
  3.1 Internal state reconstruction
</H3>
<P>
Let <I>S</I>(101) be the internal state to be determined in the first stage
of the attack. Since the number of reachable states <I>S</I>(101) is not
bigger than 2<SUP>63.32</SUP> and the unreachable ones can be simply
characterized by a set of linear equations, in the average complexity analysis
given below we can simply take 63.32 instead of 64. For every <I>i</I> =
1, 2, 3, first guess <EM>n </EM>bits
(<I>s<SUB>i,l</SUB></I>(101))<SUP><I>r<SUB>i</SUB> </I>+ <I>n</I>-1</SUP>
<SUB><I>l</I>=<I>tau</I></SUB><I><SUB>i</SUB> </I>if <EM>n <U><</U>
r<SUB>i</SUB> - tau<SUB>i</SUB> + </EM>1 ,and, if not, then also guess the
next <EM>n </EM>- <I>r<SUB>i</SUB> </I>+ <EM>tau<SUB>i</SUB></EM> - 1 bits
produced by the linear recursion from <I>S<SUB>i</SUB></I>(101). In any case,
one thus obtains 3<EM>n </EM>linearly independent equations for unknown bits
of <I>S</I>(101), provided that <EM>n </EM><U><</U> 19. Since the assumed
bits on average define 4<EM>n</EM>/3 elements of the clock-control sequence,
one can thus form 1 + 4<EM>n</EM>/3&nbsp;additional linear equations, where
the first one is clearly obtained from the first keystream bit <I>y</I>(101)
without using the clock-control sequence. The additional linear equations
are mutually linearly independent, provided that <EM>n <U><</U> </EM>18,
because each one then contains at least two new bits that have not appeared
before. They are linearly independent of the first 3<EM>n </EM>equations
if and only if each of them contains at least one new bit that is not already
guessed. This happens with high probability if<BR>
<P ALIGN=Center>
n < max(<EM>tau</EM><SUB>1</SUB>, <EM>tau</EM><SUB>2</SUB>,
<EM>tau</EM><SUB>3</SUB>) - 1. &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;(4)<BR>
<P>
If not, then the last among the additional equations will necessarily involve
some of the already guessed bits and will with high probability be linearly
dependent on the first 3<EM>n </EM>equations. Suppose first that the condition
&nbsp;(4)<EM> is </EM>satisfied. Then all the obtained linear equations are
with high probability linearly independent, so that the internal state can
be determined uniquely if 1<STRONG> </STRONG>+ 3<I>n</I> + 4<I>n</I>/3
<U>></U> 63.32, that is, if <EM>n <U>></U> </EM>14.38 (it follows that
max(<EM>tau</EM><SUB>1</SUB>, <EM>tau</EM><SUB>2</SUB>,
<EM>tau</EM><SUB>3</SUB>) <U>></U> 16). The obtained state should then
be tested for correctness on additional 3<I>n</I> keystream bits on average
The computational complexity is then <U>~</U> 2<SUP>43.15</SUP> and the total
required keystream
<P>
  <HR>
<P>
245
<P>
sequence length is about 64 successive bits (we keep the fractions since
we deal with the average case complexity).
<P>
Suppose now that max(<EM>tau</EM><SUB>1</SUB>, <EM>tau</EM><SUB>2</SUB>,
<EM>tau</EM><SUB>3</SUB>) <U><</U> 15, which means that the condition
(4) is not satisfied, as is the case in the particular proposal from [1],
where max(<EM>tau</EM><SUB>1</SUB>, <EM>tau</EM><SUB>2</SUB>,
<EM>tau</EM><SUB>3</SUB>) <EM>= tau</EM><SUB>3</SUB><EM>&nbsp;= </EM>12.
In this case, the last of the additional equations are with high probability
linearly dependent and as such can not be used as before, but can be used
to test the linear consistency of the initial guess. If the previous analysis
was extended, then one would get that <EM>n </EM>has to be bigger than 14.38
and that the average complexity would hence increase, contrary to the intuition.
Indeed, one can do better than that. Let initially <EM>n = </EM>10, so that
(4) is satisfied. One thus obtains the total of 1 + 3<I>n</I> + 4<I>n</I>/3
<U>~</U>&nbsp;44.3 linearly independent equations on average. Now, instead
of guessing the next <I>m</I> <U>~</U> 19.02/3 bits on average in each of
the LFSR sequences, we will build a tree structure to sequentially store
all the possibilities for the next bits that are consistent with the additional
linear equations. In each node of the tree one stores the next three input
bits to the majority clock-control function such that the resulting clocking
is consistent with the equations. This approach is in spirit similar to the
inversion attack [8] on nonlinear filter generators. The average number of
branches leaving each node would have been 3/4 · 4 + 1/4 · 8 =
5 if it were not for the additional equations. They on average reduce this
number to 2.5. The required depth of the tree should on average be
4<I>m</I>/3 to obtain the next <I>m</I> guessed bits in each of the LFSR
sequences. So, instead of 2<SUP>3<I>m</I></SUP> possibilities for the next
<I>m</I> bits, we have to check only 2.5<SUP>4<I>m</I>/3</SUP> <U>~</U>
2<SUP>1.76<I>m</I></SUP> <U>~</U> 2<SUP>11.16</SUP> possibilities on average,
under the reasonable independence assumption valid for the so-called
supercritical branching processes, see Theorem 6 from the Appendix. The overall
complexity is then 2<SUP>30+11.16</SUP> <U>~</U>&nbsp;2<SUP>41.16</SUP>.
For comparison, suppose that the clock-control bits are used to produce the
output, that is, <EM>tau</EM><SUB>1 </SUB><EM>= tau</EM><SUB>2</SUB> <EM>=
tau</EM><SUB>3</SUB>&nbsp;<EM>= </EM>1. Then, clearly, only the part of the
process involving the tree applies and the overall complexity is minimum
possible, that is, 2<SUP>1.76·63.32/3</SUP> <U>~</U> 2<SUP>37.15</SUP>.
<P>
To get the average number of trials needed to find the correct internal state
<I>S</I>(101), one should in fact divide by two the complexity figures given
above, e.g., 2<SUP>41.16</SUP> thus reduces to
2<SUP>40.16</SUP>.&nbsp;<BR>
<H3>
  3.2 Internal state reversion via clocking sequences
</H3>
<P>
In the second stage, our objective is to recover the initial LFSR states
from <I>S</I>(101). In view of (3), this can be done by guessing the number
of ones in individual binary clocking sequences, that is, the number of clocks
needed to get <I>S<SUB>i</SUB></I>(101) from <I>S<SUB>i</SUB></I>(0), for
each <I>i</I> = 1, 2, 3. According to Proposition 1, the underlying probability
distribution is binomial with the average number of clocks 0.75 · 101
<U>~</U> 76 and the standard deviation 0.25 · square root 303 <U>~</U>
&nbsp;4.35 for each of the LFSR sequences. If the search is organized in
order of decreasing probabilities for each of the LFSR sequences independently,
the number of trials required to find the correct numbers of clocks is with
high probability not bigger than about 10<SUP>4</SUP> and is at worst about
10<SUP>6</SUP>. For each guess, one first recovers
<I>S<SUB>i</SUB></I>(0) from <I>S<SUB>i</SUB></I>(101) by backward linear
recursion, for each <I>i</I> = 1, 2, 3, and then tests
<P>
  <HR>
<P>
246
<P>
the guess by running the keystream generator forwards to obtain
<I>S</I>(101). Note that multiple solutions for <I>S</I>(0), if they exist,
are all obtained by checking all <U>~</U>&nbsp;10<SUP>6</SUP> possibilities
for the clocking sequences, for any possible <I>S</I>(101) obtained in the
first stage. This number can clearly be reduced by assuming the mutually
constrained rather than independent clocking sequences for individual LFSRs.
In any case, reconstructing the initial state <I>S</I>(0) from <I>S</I>(101)
is much faster than obtaining <I>S</I>(101) itself.<BR>
<H3>
  4 Time-Memory Trade-Off Attack
</H3>
<P>
As was already explained in the previous section, the first 64 successive
output bits of A5, (<I>y</I>(<I>t</I>))<SUP>63</SUP> <SUB><I>t </I>= 0</SUB>,
represent a vectorial boolean function of 64 initial state bits <I>S</I>(0)
such that the number of different initial states <I>S</I>(0) producing the
same 64-bit initial output block is in most cases only 1 or a very small
integer. In fact, since the initial 101 output bits are not used for the
keystream, the initial state bits <I>S</I>(0) should be confined to the
2<SUP>63.32</SUP> values achievable by <I>S</I>(1) which are easily
characterized. As a consequence, for any observed 64 successive keystream
bits, one can find all the preceding internal states yielding these bits
either by exhaustive search over all reachable internal states requiring
2<SUP>63.32</SUP> 64-bit computations and bitwise comparisons or by only
one table lookup requiring 2<I>63.32</I> 64-bit words of memory to store
the inverse of the vectorial boolean function considered. The inverse function,
with multiple preimages if they exist, is found and stored in
2<SUP>63.32</SUP> precomputation time. Let the time and memory required in
these two extreme cases be denoted as <EM>T </EM>= 2<SUP>63.32</SUP>, <EM>M
</EM>= 1 and <I>T</I> = 1, <I>M</I> = 2<SUP>63.32</SUP>, respectively. Is
any meaningful time-memory trade-off based on the birthday paradox possible?
<P>
Assume that,the objective is to recover the preceding internal states for
any observed 64 successive keystream bits in the known plaintext scenario.
Each known keystream sequence of effective length 228 bits provides 102
&nbsp;<U>~</U>&nbsp;2<SUP>6.67</SUP> 64-bit blocks, and, due to the very
small keystream sequence length, it is very likely that the cryptanalyst
knows either all 228 bits or none of them. So, any time-memory trade-off
solely based on these 102 keystream blocks is meaningless. However, we may
consider a sample of all the keystream sequences corresponding to different
initial states (secret message keys) derived from <EM>K </EM>(at most
2<SUP>22</SUP>) different known public keys and a single secret session key.
The reconstruction of any internal state corresponding to a particular public
key is then meaningful if <EM>K </EM>< 2<SUP>22</SUP> and if it leads
to the recovery of the secret session key, which can then be used to decrypt
the ciphertexts obtained from the remaining public keys.
<P>
Let the cryptanalyst form a table of <EM>M </EM>possibly multiple 64-bit
words defining the reachable initial states corresponding to a random sample
of <EM>M </EM>different 64-bit output blocks, and let the table be then sorted
out with respect to the output blocks, which are also stored. Multiple preimages
are all obtained by the internal state reversion given a known output, in
<EM>O(M) </EM>time, see Subsection 5.2. The required precomputation time
for sorting is <EM>M </EM>log M or, approximately, just <I>M</I> if the
logarithmic factor, smaller than 64, is neglected. Altogether, the required
<P>
  <HR>
<P>
247
<P>
precomputation time is thus <EM>O(M). By </EM>the standard birthday paradox
(used in meet-in-the-middle attacks), it then follows that with high probability
at least one of the 102 · <I>K</I> keystream blocks in the observed
sample will<STRONG> </STRONG>coincide with one of the output blocks used
to form the table if <BR>
<P ALIGN=Center>
102 · <I>K</I> · <I>M</I> <U>></U> 2<SUP>63.32</SUP> &nbsp;
&nbsp; &nbsp; &nbsp; &nbsp;(5)<BR>
<P>
where a small multiplicative constant is neglected for simplicity. The time
<I>T</I> needed to find such a keystream block is 102 · <I>K</I> log
<EM>M </EM>or simply 102 · <I>K</I> neglecting the logarithmic factor.
Then only one table lookup gives the desired internal state(s). So, the
time-memory trade-off is possible with <I>T</I> <EM>· M
</EM><U>></U> 2<SUP>63.32</SUP> and <I>T</I> < 102 ·
2<SUP>22</SUP>. For example, if <EM>K </EM>= 2<SUP>15, </SUP>then the time
and memory required are <I>T</I> <U>~</U>&nbsp;2<SUP>21.67</SUP> and <EM>M
</EM><U>~</U> 2<SUP>41.65</SUP> (in 128-bit words), respectively, and the
precomputation time is <EM>O(M). </EM>In an extreme case, when <EM>K </EM>=
2<SUP>21</SUP>, we get <I>T</I> <U>~</U>&nbsp;2<SUP>27.67</SUP> and <EM>M
</EM><U>~</U> 2<SUP>35.65</SUP> <U>~</U> 862 Gbytes, but the secret session
key to be determined can then only be used to decrypt ciphertexts obtained
from the remaining half of the public keys.
<P>
A more general approach for the cryptanalyst would be to analyze the traffic
corresponding to <I>L</I> different sessions for each out of <I>N</I> users.
This increases the sample size (and time) to 102 ·<EM> K · L ·
N, </EM>so that further reduction in <EM>M </EM>is possible, which makes
the attack quite realistic. In this case, a particular user whose secret
session key is to be determined is not known in advance. This, of course,
does not make a difference if the objective is cloning rather than decryption.
Even more generally, one may also allow that K be maximum possible,
2<SUP>22</SUP>, if the cryptanalyst is capable of attacking the algorithm
that combines the secret master key of a user and a public random 128-bit
key into the secret session key. Namely, the determined session key may be
useless for decryption, but may be used for the secret master key reconstruction
with devastating consequences regarding both decryption and cloning.
<P>
The time-memory trade-off attack described clearly applies to arbitrary keystream
generators, and is feasible in the case of A5 because of its relatively short
memory size of only 64 bits. It yields an internal state of A5 at a known
time and is meaningful when coupled with a cryptanalytic attack to be introduced
in the next section which gives all the candidates for the secret session
key. If the internal state is determined at time 101 <U><</U> <I>t</I>
<U><</U> 151, then the attack consists in the reversion of the internal
state to <I>S</I>(101) based on known output, then to <I>S</I>(0) when the
output is not known (due to the first 100 output bits discarded), and finally
to the secret session key when the known public key is incorporated. If the
internal state is determined at time 315 <U><</U> <I>t</I> <U><</U>
365, then the attack consists in the reversion of the internal state to
<I>S</I>(315) based on known output, then to <I>S</I>(214) when the output
is not known, and the rest is the same as in the first case with
<I>S</I>(214) as the internal state. Note that possible multiple solutions
are all obtained. Multiple candidates for the secret session key are then
easily reduced to only one, correct solution by comparing a small number
of already known keystream sequences with the ones generated from the assumed
candidates and known public keys.
<P>
  <HR>
<P>
248<BR>
<H3>
  5 Internal State Reversion via Branching
</H3>
<P>
The objective of the internal state reversion attack to be described in this
section is to find all the secret session keys that combined with a known
public key give rise to a given internal state at a known time. All the internal
states at a known time that are consistent with a known keystream sequence
can be obtained either by the basic internal state reconstruction attack
from Subsection 3.1 or by the time-memory trade-off attack from Section 4.
<P>
The performance of the attack is analyzed by the theory of critical and
subcritical branching processes and its time and space complexities are thus
shown to be both small. Extensive computer experiments on nonlinear filter
generators regarding the so-called generalized inversion attack [9] (where
the whole internal state is recovered starting from its finite input memory
part in a way similar to the internal state reversion) show that the size
of the generated search trees can be well described by the theory of branching
processes.<BR>
<H3>
  5.1 Unknown output
</H3>
<P>
Given an internal state <EM>S(t) </EM>at time <EM>t, t <U>></U> </EM>1,
<I>S</I>(<I>t</I>)
<SUB><IMG ALIGN="Bottom" ALT="[Image]" WIDTH="17" HEIGHT="31" SRC="eff.gif"></SUB>&nbsp;<I>S</I><SUB>0
</SUB>the objective of the reversion attack when the output sequence is not
known is to determine all the internal states <EM>S(t') </EM>at a given previous
time <EM>t' < t </EM>that produce <EM>S(t) </EM>at time <EM>t </EM>by
the state-transition function, whereas the output sequence is not considered
at all. For the reversion to work, the state-transition function, <I>F</I>,
must be easily computable in the reverse direction. Letting
<I>F</I><SUP>-1</SUP> denote the reverse state-transition function,
<I>F</I><SUP>-1</SUP>(<I>S</I>(<I>t</I>)) denotes the set of all
<I>S</I>(<I>t</I> - 1)<EM> </EM>such that <I>F</I>(<I>S</I>(<I>t</I> - 1)
= <I>S</I>(<I>t</I>)<EM>. </EM>The reversion attack then consists in the
recursive computation of the reverse state-transition function starting from
<I>S</I>(<I>t</I>) and up to <I>S</I>(<I>t'</I>)<EM>. </EM>The internal states
obtained can all be stored as nodes in a tree with <EM>t - t' + </EM>1 levels
where the initial level, <EM>n = O, </EM>has one initial node representing
<I>S</I>(<I>t</I>),<EM> </EM>and the level <I>n</I>, 1 <U><</U><EM>
</EM><I>n</I> <U><</U>&nbsp;<I>t - t'</I> contains the nodes representing
all possible <EM>S</EM>(<I>t - n</I>) giving rise to
<I>S</I>(<I>t</I>)<EM>. </EM>The end nodes thus give all the desired internal
states <I>S</I>(<I>t'</I>).<EM> </EM>The main problem here is to estimate
the size of the trees arising from a random <I>S</I>(<I>t</I>),<EM> </EM>that
is, the number of the nodes obtained at each level <EM>n </EM>if
<I>S</I>(<I>t</I>) is<EM> </EM>chosen uniformly at random, and especially
if <EM>n </EM>is not small.
<P>
The state-transition function of A5-is essentially determined by the
clock-control sequence, see (1) and (3). Accordingly, the number of different
states <EM>S</EM>(<I>t</I> - 1) in
<I>F</I><SUP>-1</SUP>(<I>S</I>(<I>t</I>)) is derived by backward clocking
from all the possibilities for <EM>C </EM>(<I>t</I> - 1) and hence only depends
on the following six bits: the three bits
(<I>s</I><SUB>1,<I>tau</I>1</SUB>(<I>t</I>),
<I>s</I><SUB>2,<I>tau</I>2</SUB>(<I>t</I> ),
<I>s</I><SUB>3,<I>tau</I>3</SUB>(<I>t</I>)) which define the clock-control
sequence at the current time <EM>t, C</EM>(<I>t</I>), and the three preceding
bits in the regularly clocked LFSR sequences which, if
min(<EM>tau</EM><SUB>1</SUB>, <EM>tau</EM><SUB>2</SUB>,
<EM>tau</EM><SUB>3</SUB>) <U>></U> 2, all belong to
<EM>S</EM>(<I>t</I>) and are given as (<I>s</I><SUB>1,<I>tau</I>1 -
1</SUB>(<I>t</I>), <I>s</I><SUB>2,<I>tau</I>2 - 1</SUB>(<I>t</I>),
<I>s</I><SUB>3,<I>tau</I>3 -1 </SUB>(<I>t</I>)). Denote these bits by
<I>s</I><SUB>1</SUB>, <I>s</I><SUB>2</SUB>, <I>s</I><SUB>3</SUB> and
<I>s'</I><SUB>1</SUB>, <I>s'</I><SUB>2</SUB>, <I>s'</I><SUB>3</SUB>,
respectively.
<P>
<STRONG>Proposition 3. </STRONG><I>Let (i, j, k) denote a permutation of</I>
(1, 2, 3) . <I>Then the following six events can occur:</I>
<P>
  <HR>
<P>
249
<BLOCKQUOTE>
  - <I>A : &nbsp;for any k, if s'<SUB>i</SUB> = s'<SUB>j</SUB> /=
  s'<SUB>k</SUB> = s<SUB>k</SUB>, then</I> <I>C</I>(<I>t</I> - 1) =
  {<I>i</I>,<I>j</I>}
  <P>
  - <I>B : &nbsp;for any k, if s'<SUB>i</SUB> = s'<SUB>j</SUB> /=
  s'<SUB>k</SUB> /= s<SUB>k</SUB>, then</I> <I>C</I>(<I>t</I> - 1) can take
  no values
  <P>
  - <I>C : &nbsp;if s'</I><SUB>1</SUB><I> = s'</I><SUB>2</SUB><I> /=
  s'</I><SUB>3</SUB><I> = s</I><SUB>1</SUB><I> = s</I><SUB>2</SUB><I> =
  s</I><SUB>3</SUB><I>, then</I> <I>C</I>(<I>t</I> - 1) = {1, 2, 3}
  <P>
  - <I>D : &nbsp;if s'</I><SUB>1</SUB><I> = s'</I><SUB>2</SUB><I> /=
  s'</I><SUB>3</SUB><I> /= s</I><SUB>1</SUB><I> = s</I><SUB>2</SUB><I> =
  s</I><SUB>3</SUB><I>, then</I> <I>C</I>(<I>t</I> - 1) can take every of the
  four values {1, 2}, {1,3}, {2,3}, and {1, 2, 3}
  <P>
  - <I>E : &nbsp;for any k, if s'</I><SUB>1</SUB><I> =
  s'</I><SUB>2</SUB><I> /= s'</I><SUB>3</SUB><I> = s<SUB>i</SUB> =
  s<SUB>j</SUB> /= s<SUB>k</SUB>, then</I> <I>C</I>(<I>t</I> - 1) can take
  every of the two values {<I>i</I>, <I>j</I>} and {1, 2, 3}
  <P>
  - <I>F : &nbsp;for any i, if s'</I><SUB>1</SUB><I> =
  s'</I><SUB>2</SUB><I> /= s'</I><SUB>3</SUB><I> = s<SUB>i</SUB> /=
  s<SUB>j</SUB> = s<SUB>k</SUB>, then</I> <I>C</I>(<I>t</I> - 1) can take every
  of the three values {<I>i</I>, <I>j</I>}, {<I>i</I>, <I>k</I>} and {1, 2,
  3}
</BLOCKQUOTE>
<P>
<STRONG>Proposition 4. </STRONG><I>If an internal state S</I>(<I>t</I>) <I>is
randomly chosen from S</I><SUB>0</SUB> <I>according to uniform distribution,
then the number of solutions for S</I>(<I>t</I> - 1) <I>is a nonnegative
integer random variable Z with the probability distribution</I><BR>
<P ALIGN=Center>
Pr{<I>Z</I> = 0} = 3/8, Pr{<I>Z</I> = 1} = 13/32,
<P ALIGN=Center>
Pr{<I>Z</I> = 2} = Pr{<I>Z</I> = 3} = 3/32, Pr{<I>Z</I> = 4} = 1/32. &nbsp;
&nbsp; &nbsp; &nbsp; &nbsp;(6)<BR>
<P ALIGN=Left>
It follows that the state-transition function of A5 is not one-to-one and
that the fraction of the internal states from <I>S</I><SUB>0</SUB> not reachable
in one step is 3/8 (they are simply characterized by a set of three linear
equations). Let {<I>S</I>(<EM>t - n</EM>)} denote the set of all the internal
states/nodes at level <EM>n </EM>in the tree spanned by the reversion from
a given <EM>S</EM>(<I>t</I>), and let <I>Z<SUB>n</SUB></I> =
|{<I>S</I>(<I>t - n</I>)}| and <I>Y<SUB>n</SUB></I> =
Sigma<SUP>n</SUP><SUB><I>l </I>= 1</SUB> <I>Z<SUB>l</SUB></I>. Both the time
and space complexities of the reversion attack are determined by
<I>Y<SUB>n</SUB></I>. Our objective now is to estimate how large
<EM>Z<SUB>n</SUB> </EM>and <I>Y<SUB>n</SUB></I> can be when
<I>S</I>(<I>t</I>)<EM> </EM>is randomly chosen. Of course, each particular
<I>S</I>(<I>t</I>) uniquely determines the tree (model <B>M</B>'), and if
we assume that regularly clocked LFSR sequences are mutually independent
and purely random (model <B>M</B>), then the tree is random rather than unique
even when <I>S</I>(<I>t</I>) is<EM> </EM>fixed. From the internal state reversion
via the clocking sequences, Subsection 3.2, we know that in both the models
<EM>Z<SUB>n</SUB> </EM><U><</U> <I>n</I><SUP>3</SUP> necessarily holds.
The trees spanned in both the models are expected to be similar if the depth
<EM>n is </EM>smaller than 4/3 of the period of the shortest LFSR,
&nbsp;<U>~</U>&nbsp;4/3 2<SUP>19</SUP>, which is when on average the LFSR
sequences start to repeat themselves in model <B>M'</B>.
<P>
Proposition 4 shows that the associated Galton-Watson <EM>branching process,
</EM>described in the Appendix, has the branching probability distribution
defined by (6), with the expected value and variance µ = 1 and
<I>sigma</I><SUP>2</SUP> = 9/8, respectively. The branching process is
<EM>critical. </EM>The random trees produced by model <B>M</B> and by the
associated branching process are not exactly the same, as random variables.
The reason for this is that in the branching process the branching probability
distribution for a given node is independent of the nodes at the same or
the preceding levels (the history), whereas in model <B>M</B> there is a
weak dependence between the nodes as a result of different internal states
having some clock-control bits in common. This weak dependence affects the
expected values and variances of both <EM>Z<SUB>n</SUB> </EM>and
<I>Y<SUB>n</SUB></I>, but insignificantly.
<P>
  <HR>
<P>
250
<P>
Consequently, if <I>S</I>(<I>t</I>) is uniformly distributed over
<I>S</I><SUB>0</SUB>, then in model <B>M</B>, in view of Theorem 6 from the
Appendix, <I>E</I>(<I>Z<SUB>n</SUB></I>) <U>~</U> &nbsp;1,
Var(<I>Z<SUB>n</SUB></I>) <U>~</U> <I>sigma</I><SUP>2</SUP><I>n</I>, and
Pr{<I>Z<SUB>n</SUB></I> > 0} <U>~</U>
2/(<I>sigma</I><SUP>2</SUP><I>n</I>), where sigma<SUP>2</SUP> <EM>= </EM>9/8.
So, the fraction of the internal states reachable in <I>n</I> steps is about
2/(<I>sigma</I><SUP>2</SUP><I>n</I>). On the other hand, both the computational
time and the storage required for the reversion attack are determined by
the total number of nodes <I>Y<SUB>n</SUB></I>. Theorem 7 from the Appendix
then gives that <I>E</I>(<I>Y<SUB>n</SUB></I>) <U>~</U> <I>n</I> and
Var(<I>Y<SUB>n</SUB></I>) <U>~</U>
<I>sigma</I><SUP>2</SUP><I>n</I><SUP>3</SUP>/3. In view of the Chebyshev's
inequality Pr{|<I>Y<SUB>n</SUB></I> - <I>E</I>(<I>Y<SUB>n</SUB></I>)| >
<I>e</I>} <U><</U>
Var(<I>Y<SUB>n</SUB></I>)/<I>e</I><SUP>2</SUP>, we then get that the total
number of nodes <I>Y<SUB>n</SUB></I> is with high probability
<I>O</I>(<I>n</I> square root <I>n</I>) and the multiplicative constant is
not big. Note that in the case of interest, <EM>n = </EM>101, and the
approximations are expected to be very good.
<P>
It is also interesting to see how large <I>Z<SUB>n</SUB></I> and
<I>Y<SUB>n</SUB></I> can grow when conditioned on the event that the internal
state <I>S</I>(<I>t</I>) is reachable in <EM>n </EM>steps. We know that at
least one such state results from both the basic internal state reconstruction
attack and the time-memory trade-off attack. Theorem 8 from the Appendix
yields that in this case
<I>E</I>(<I>Z<SUB>n</SUB></I>|<I>Z<SUB>n</SUB></I> > 0)
<U>~</U>&nbsp;<I>sigma</I><SUP>2</SUP><I>n</I><EM>/2 </EM>and
Var(<I>Z<SUB>n</SUB></I>|<I>Z<SUB>n</SUB></I> > 0) <U>~</U>
<I>sigma</I><SUP>4</SUP><I>n</I><SUP>2</SUP>/4. This means that the number
of solutions for <I>S</I>(<I>t</I> - <EM>n</EM>)is with high probability
linear in <EM>n, </EM>provided at least one such solution exists. As for
the total number of nodes <I>Y<SUB>n</SUB></I>, we noted in the Appendix
that <I>E</I>(<I>Y<SUB>n</SUB></I>|<I>Z<SUB>n</SUB></I> > 0)&nbsp;=
<I>O</I>(sigma<SUP>2</SUP><I>n</I><SUP>2</SUP>) and
Var(<I>Y<SUB>n</SUB></I>|<I>Z<SUB>n</SUB></I> > 0)&nbsp;=
<I>O</I>(<I>sigma</I><SUP>2</SUP><I>n</I><SUP>4</SUP>) , so that
<I>Y<SUB>n</SUB></I> is then with high probability
<I>O</I>(<I>sigma</I><SUP>2</SUP><I>n</I><SUP>2</SUP>)<EM>. </EM>So, for
<I>n</I> = 101 both the time and space complexities are small, although somewhat
bigger than in the case of a uniformly distributed <I>S</I>(<I>t</I>).
<P>
<EM> </EM>The number, <EM>N, </EM>of starting internal states in the real
reversion attack may be bigger than just one, but is still small,<STRONG>
</STRONG>as will be shown in the following subsection. The time complexity
clearly increases proportionally with <EM>N, </EM>whereas the space complexity,
determined as the maximum tree size over all the starting states, increases
only logarithmically with <I>N</I>, due to the exponential probability
distribution (21) in Theorem 8 from the Appendix.<BR>
<H3>
  5.2 Known output
</H3>
<P>
Given an internal state <I>S</I>(<I>t</I>)<EM> </EM>at time <EM>t, t
<U>></U> </EM>1,<EM> S</EM>(<I>t</I>)<EM>
<SUB><IMG WIDTH="17" HEIGHT="31" SRC="eff.gif"></SUB> S</EM><SUB>0</SUB>,
the objective of the reversion attack when the output sequence is known is
to determine all the internal states <EM>S</EM>(<I>t'</I>)<EM> </EM>at a
given previous time <EM>t' < t </EM>that produce
<EM>S</EM>(<I>t</I>)<EM> </EM>at time <EM>t </EM>by the state-transition
function as well as the known output sequence (<I>y</I>(<I>t -
l</I>))<SUP><I>t - t'</I></SUP> <SUB><I>l </I>= 1</SUB>. This reversion attack
then goes along the same lines as the one when the output sequence is not
known, with a difference that<STRONG> </STRONG>from each level in the tree
spanned, the nodes whose internal states produce the<STRONG> </STRONG>output
bits different from the one known are all removed. The size of the resulting
tree is hence much smaller.
<P>
The output bit produced from <EM>S</EM>(<EM>t - </EM>1)<EM> </EM>at time
<EM>t - </EM>1 depends on the following six bits:
(<I>s</I><SUB>1,1</SUB>(<I>t</I>),<I> s</I><SUB>2,1</SUB>(<I>t</I>),
<I>s</I><SUB>3,1</SUB>(<I>t</I>)) and the preceding three bits in the regularly
clocked LFSR sequences. They are denoted as <EM>z</EM><SUB>1</SUB>,
<EM>z</EM><SUB>2</SUB>, <EM>z</EM><SUB>3</SUB> and
<EM>z'</EM><SUB>1</SUB>, <EM>z'</EM><SUB>2</SUB>,
<EM>z'</EM><SUB>3</SUB>, respectively. The produced output bit is then equal
to <EM>z'<SUB>i </SUB></EM>+ <EM>z'<SUB>j</SUB></EM> +
<EM>z<SUB>k</SUB></EM> if <EM>C</EM>(<EM>t - </EM>1)<EM> = </EM>{<EM>i,
j</EM>}<EM>, </EM>for any {<I>i</I>, <I>j</I>} (as usual, (<I>i</I>,
<I>j</I>, <I>k</I>) is a permutation of (1, 2, 3)), and to
<EM>z'</EM><SUB>1</SUB> + <EM>z'</EM><SUB>2 </SUB>+
<EM>z'</EM><SUB>3</SUB>&nbsp;if <EM>C</EM>(<EM>t </EM>- 1)<EM> = </EM>{1,
2, 3}. An analog of Proposition 3 can then be established, with a difference
that in each of the given six events,<STRONG>
</STRONG><I>C</I>(<I>t</I> - 1) can take every specified value for which,
in addition, the produced output<STRONG> </STRONG>bit coincides with the
one
<P>
  <HR>
<P>
251
<P>
known, <I>y</I>(<I>t</I> - 1). If min(<EM>tau</EM><SUB>1</SUB>,
<EM>tau</EM><SUB>2</SUB>, <EM>tau</EM><SUB>3</SUB>) <U>></U> 3, one can
then derive the following analog of Proposition 4.
<P>
<STRONG>Proposition 5. </STRONG><I>If an internal state S</I>(<I>t</I>) <I>is
randomly chosen from S</I><SUB>0</SUB> <I>according to uniform distribution,
then the number of solutions for S</I>(<I>t</I> - 1) <I>is a nonnegative
integer random variable Z with the probability distribution</I> <BR>
<P ALIGN=Center>
Pr{<I>Z</I> = 0} = 315/512, Pr{<I>Z</I> = 1} = 75/256, Pr{<I>Z</I> = 2} =
9/128,
<P ALIGN=Center>
Pr{<I>Z</I> = 3} = 5/256, Pr{<I>Z</I> = 4} = 1/512. &nbsp; &nbsp; &nbsp;
&nbsp; &nbsp;(7)<BR wp="br1">
<P>
The probabilistic models <B>M</B> and <B>M'</B> are defined in the same way
as before, with a difference that the known output sequence is assumed to
be either fixed or purely random and independent of the LFSR sequences. The
dependence between the nodes in the trees produced by <B>M</B> and <B>M'</B>,
although still relatively weak, is stronger than before due to the six additional
bits controlling the output. The associated branching process is now<EM>
subcritical </EM>with µ = 1/2 and <I>sigma</I><SUP>2</SUP> = 17/32.
The results regarding the probability distribution, the expected values,
and the variances for the random variables <I>Z<SUB>n</SUB></I> and
<I>Y<SUB>n</SUB></I> are<STRONG> </STRONG>then obtained analogously, by applying
the parts of Theorems 6-8 from the Appendix relating to subcritical branching
processes. Consequently, we get that in model <B>M</B>,
<I>E</I>(<I>Z<SUB>n</SUB></I>) <U>~</U> 2<SUP>-<I>n</I></SUP>,
Var(<I>Z<SUB>n</SUB></I>) <U>~</U>
<I>sigma</I><SUP>2</SUP>2<SUP>-(<I>n</I>-2)</SUP>, and
Pr{<I>Z<SUB>n</SUB></I> > 0} <U>~</U>
<I>c</I>2<SUP>-<I>n</I></SUP>, where <I>c</I> is a positive constant that
is obtained numerically as c = lim<SUB>n>ºº</SUB>
2<I><SUP>n</SUP></I> (1- <I>f</I><SUP>(<I>n</I>)</SUP>(0)) <U>~</U> 0.63036,
where <I>f</I><SUP>(<I>n</I>)</SUP> is the self-composition of the generating
function <I>f</I> of the probability distribution defined by (7), see the
Appendix. Also, <I>E</I>(<I>Y<SUB>n</SUB></I>) <U>~</U> 1 and
Var(<I>Y<SUB>n</SUB></I>) <U>~</U> 8 <I>sigma</I><SUP>2</SUP>.
<P>
Conditioning on the event that the starting internal state is reachable in
<EM>n </EM>steps, we get
<I>E</I>(<I>Z<SUB>n</SUB></I>|<I>Z<SUB>n</SUB></I> > 0) <U>~</U>
1/<I>c</I> <U>~</U> 1.586,
Var(<I>Z<SUB>n</SUB></I>|<I>Z<SUB>n</SUB></I> > 0) <U>~</U> 4
<I>sigma</I><SUP>2</SUP>/<I>c</I> - 1/<I>c</I>2 <U>~</U> 0.854,
<I>E</I>(<I>Y<SUB>n</SUB></I>|<I>Z<SUB>n</SUB></I> > 0) =
<EM>O</EM>(<EM>n</EM>)<EM>, </EM>and
Var(<I>Y<SUB>n</SUB></I>|<I>Z<SUB>n</SUB></I> > 0) =
<I>O</I>(<I>n</I><SUP>2</SUP>). The size <I>Y<SUB>n</SUB></I> of the resulting
tree is then <EM>O</EM>(<EM>n</EM>)<EM> </EM>with high probability. In the
case of interest, resulting from the time-memory trade-off attack, we have
that <EM>n <U><</U> </EM>50, so that the obtained trees are very small,
whereas the number of possible solutions for <EM>S</EM>(<EM>t - n</EM>)<EM>
</EM>(<I>S</I>(101)) is with high probability only 1 or a very small positive
integer. <BR>
<P>
<STRONG>5.3 Secret key reconstruction</STRONG>
<P>
Our goal now is to obtain all possible secret session keys from all the
determined initial states <I>S</I>(0) given a known public key <I>p</I> =
(<I>p</I>(<I>t</I>))<SUP>0</SUP><SUB>t= -21</SUB>. Recall that the secret
session key is in fact an internal state of the initialization scheme, which
works in the same way as the keystream generator A5, except that the public
key is bitwise added, in 22 steps, into the feedback path of each of the
LFSRs. Given an initial state <I>S</I>(0), <I>S</I>(0)
<EM><SUB><IMG WIDTH="17" HEIGHT="31" SRC="eff.gif"></SUB></EM>
<I>S</I><SUB>0</SUB>, the objective of the secret key reconstruction attack
is to determine all the internal states <EM>S</EM>(<EM>t'</EM>)<EM> </EM>at
the previous time <EM>t' = -</EM>22 that produce <I>S</I>(0) by the modified
state-transition function <EM>S</EM>(<EM>t</EM>)<EM> =
F</EM><SUB>0</SUB>(<EM>S</EM>(<EM>t</EM> - 1), <I>p</I>(<I>t</I>)), -21
<U><</U> <EM>t <U><</U> </EM>0<EM>, </EM>which also depends on the
known public key sequence
<P>
  <HR>
<P>
252
<P>
<I>p</I>. The modified reverse state-transition function
<I>F</I><SUP>-1</SUP><SUB>0</SUB>(<EM>S</EM>(<EM>t</EM>)<EM>,
p</EM>(<EM>t</EM>))<EM> </EM>then consists of two stages: first, the bit
<EM>p</EM>(<EM>t</EM>)<EM> </EM>is<EM> </EM>added to the last stage of each
of the LFSRs and, second, the LFSRs are clocked backwards according to all
possible values <EM>C</EM>(<EM>t </EM>-1)<EM> </EM>for the clock-control
sequence.
<P>
It is readily seen that the secret key reconstruction can be achieved by
the reversion attack when the output sequence is not known in which the reverse
state-transition function is modified according to the public key p as explained
above. Consequently, both the analysis based on the theory of critical branching
processes and the conclusions derived remain valid for the secret key
reconstruction attack. Since now <I>n</I> = 22 instead of <I>n</I> = 101,
the trees spanned are much smaller in size. Multiple solutions for the secret
session key <I>S</I>(-22) giving rise to the same <I>S</I>(0) are still possible,
but their number is relatively small. All the resulting candidates for the
secret session key are consistent with the used keystream sequence. These
multiple candidates for the secret session key are then easily reduced to
only one, correct solution by comparing a small number of already known keystream
sequences with the ones generated from the assumed candidates and known public
keys.<BR>
<H3>
  6 Conclusions
</H3>
<P>
Several cryptanalytic attacks on a binary stream cipher known as A5 are proposed
and analyzed. The objective of the attacks is to reconstruct the 64-bit secret
session key from one or several known keystream sequences produced by different
22-bit (randomizing) public keys, in the known plaintext scenario which is
shown to be very realistic in the GSM applications. A basic divide-and-conquer
attack with the average computational complexity 2<SUP>40.16</SUP> and negligible
memory requirements is first introduced. It requires only about 64 known
successive keystream bits and gives all possible LFSR initial states consistent
with a known keystream sequence. A time-memory trade-off attack based on
the birthday paradox is then pointed out. The objective of the attack is
to find the LFSR internal states at a known time for a known keystream sequence
corresponding to a known public key. The attack is feasible as the internal
state size of A5 is only 64 bits.
<P>
To obtain the secret session key from the determined LFSR internal states,
an internal state reversion attack is proposed and analyzed by the theory
of critical and subcritical branching processes. It is shown that there typically
exist multiple, but not numerous, candidates for the secret session key that
are all consistent with the used keystream sequence. The unique, correct
solution is then found by checking on a small number of additional keystream
sequences. The secret session key recovered can be used to decrypt the
ciphertexts obtained from the remaining public keys and, possibly, to mount
a cryptanalytic attack on the secret master key of the user as well.
<P>
A simple way of increasing the security level of the A5 stream cipher with
respect to the cryptanalytic attacks introduced is to make the internal memory
size larger. For example, doubling the memory size, from 64 to 128 bits,
is very
<P>
  <HR>
<P>
253
<P>
likely to push the attacks beyond the current technological limits. Note
that the secret session key size need not be increased to 128 bits. In addition,
one can make the clock-control dependent on more than just a single bit in
each of the shift registers by using a balanced nonlinear filter function
applied to each of them individually. The inputs to the filter functions
should be spread over the shift register lengths, respectively, and their
outputs can be combined in the same way as in A5. This increases the complexity
of the basic internal state reconstruction attack. <BR>
<H3>
  Appendix
</H3>
<P>
<STRONG>Branching processes</STRONG>
<P>
The so-called Galton-Watson process, see [10], [2], is a Markov chain
{<I>Z<SUB>n</SUB></I>}<SUP>ºº</SUP><SUB><I>n</I>=0</SUB> on the
nonnegative integers whose transition function is defined in terms of a given
probability distribution
{<I>p<SUB>k</SUB></I>}<SUP>ºº</SUP><SUB><I>k</I>=0</SUB>. The initial
random variable Z<SUB>0</SUB> takes value 1 with probability 1, and for any
<EM>n </EM><U>></U> 1,<EM> </EM>the random variable
<I>Z<SUB>n</SUB></I> conditioned on <I>Z<SUB>n</SUB></I><SUB>-1</SUB> =
<I>i</I> is the sum of <I>i</I> independent identically distributed random
variables with the probability distribution
{<I>p<SUB>k</SUB></I>}<SUP>ºº</SUP><SUB><I>k</I>=0</SUB>. The process
can be regarded as a random (finite or infinite) tree with
<EM>Z<SUB>n</SUB> </EM>being the number of nodes at level <EM>n
<U>></U> </EM>0 where the number of branches leaving any node in the tree
is equal to <I>k</I> with probability <EM>p<SUB>k</SUB>, </EM>independently
of other nodes at the same or previous levels. The generating function
characterizing the probability distribution of <EM>Z<SUB>n</SUB> </EM>can
be expressed as the self-composition of the generating function
<EM>f</EM>(<EM>s</EM>)<EM> =
</EM>Sigma<SUP>ºº</SUP><SUB><I>k</I>=0</SUB>
<EM>p<SUB>k</SUB>s<SUP>k</SUP></EM> of
{<I>p<SUB>k</SUB></I>}<SUP>ºº</SUP><SUB><I>k</I>=0</SUB>, which
is the probablility distribution of Z<SUB>1</SUB>. Precisely, if
<I>f</I><SUP>(<I>n</I>)</SUP>(<I>s</I>), 0 <U><</U> <I>s</I>
<U><</U>1, denotes the generating function of the probability distribution
of <EM>Z<SUB>n</SUB> </EM>and if <EM>f</EM><SUP>(0)</SUP><EM> = s, </EM>then
for every <EM>n <U>></U> </EM>1<EM>,
f</EM><SUP>(<EM>n</EM>)</SUP>(<EM>s</EM>)<EM> =
f</EM>(<EM>f</EM><SUP>(<EM>n-</EM>1)</SUP>(<EM>s</EM>)).
<P>
<EM> </EM>The basic characteristic of a branching process is the expected
number of branches leaving any node, that is,<BR>
<P ALIGN=Center>
µ = <I>E</I>(<I>Z</I><SUB>1</SUB>) =
Sigma<SUP>ºº</SUP><SUB><I>k</I>=0</SUB> <I>kp<SUB>k</SUB></I>.
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;(8)<BR>
<P ALIGN=Left>
A branching process is called subcritical, critical, or supercritical if
µ < 1, µ = 1, or µ > 1, respectively. The extinction
probability defined as the probability of a tree being finite is 1 for
subcritical and critical (provided <I>p</I><SUB>0</SUB> > 0) processes
and smaller than 1 for supercritical processes. We are here only interested
in subcritical and critical processes, whose main properties are given by
the following theorem, see [2], [10]. Let <I>sigma</I><SUP>2</SUP> =
Var(<I>Z</I><SUB>1</SUB>) be the variance of <I>Z</I><SUB>1</SUB>.
<P ALIGN=Left>
<STRONG>Theorem 6. </STRONG><I>In the subcritical case, µ</I> < 1,
<I>for any n</I> <U>></U> 1,
<P ALIGN=Center>
<BR>
<I>E</I>(<I>Z<SUB>n</SUB></I>) = µ<I><SUP>n</SUP></I> &nbsp; &nbsp;
&nbsp; &nbsp; (9)<BR wp="br1">
<BR wp="br2">
<BR>
Var(<I>Z<SUB>n</SUB></I>) =
<I>sigma</I><SUP>2</SUP>µ<SUP><I>n</I>-1</SUP>
1-µ<I><SUP>n</SUP></I>/1 - µ &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;(10)
<P ALIGN=Left>
  <HR>
<P ALIGN=Left>
254
<P ALIGN=Center>
<IMG WIDTH="599" HEIGHT="938" SRC="jg34.jpg"><BR>
<P ALIGN=Left>
  <HR>
<P ALIGN=Left>
255
<P ALIGN=Center>
<BR>
<I>E</I>(<EM>Z<SUB>n</SUB></EM>|<EM>Z<SUB>n</SUB> > </EM>0) ~
<I>sigma</I><SUP>2</SUP><EM>/</EM>2<EM> n</EM> &nbsp; &nbsp; &nbsp; &nbsp;
&nbsp;(22)<BR>
<P ALIGN=Center>
Var(<EM>Z<SUB>n</SUB></EM>|<EM>Z<SUB>n</SUB> > </EM>0) ~
<I>sigma</I><SUP>4</SUP><EM>/</EM>4<EM> n</EM><SUP>2.</SUP> &nbsp; &nbsp;
&nbsp; &nbsp; &nbsp;(23)<BR>
<P>
The probability distribution of the conditioned random variable
<I>Y<SUB>n</SUB></I>|{<I>Z<SUB>n</SUB></I> > 0} is not treated in the
standard books on branching processes like [10] and [2].<EM> </EM>Nevertheless,
the previous theorems and the results regarding the conditioned random variable
<I>Z<SUB>n</SUB></I>|{<I>Z<SUB>n</SUB></I><SUB>+<I>k</I></SUB> > 0} presented
in [2] lead us to conclude that in the subcritical case,
<EM>E</EM>(<EM>Y<SUB>n</SUB></EM>|<EM>Z<SUB>n</SUB> > </EM>0) =
<EM>O</EM>(<EM>n</EM>)<EM> </EM>and
Var(<EM>Y<SUB>n</SUB></EM>|<EM>Z<SUB>n</SUB> > </EM>0) =
<EM>O</EM>(<EM>n</EM><SUP>2</SUP>), whereas in the critical case,
<I>E</I>(<EM>Y<SUB>n</SUB></EM>|<EM>Z<SUB>n</SUB> > </EM>0) =
<EM>O</EM>(<I>sigma</I><SUP>2</SUP><EM>n</EM><SUP>2</SUP>) and
Var(<EM>Y<SUB>n</SUB></EM>|<EM>Z<SUB>n</SUB> > </EM>0) =
<EM>O</EM>(<I>sigma</I><SUP>4</SUP><EM>n</EM><SUP>4</SUP>)<EM>.</EM><BR>
<H3>
  References
</H3>
<P>
1. R. J. Anderson, Internet communication.
<P>
2. K. B. Athreya and P. E. Ney, <EM>Branching Processes. </EM>Berlin:
Springer-Verlag, 1972.
<P>
3. J. Daemen, R. Govaerts, and J. Vandewalle, "Resynchronization weakness
in synchronous stream ciphers," Advances in Cryptology - EUROCRYPT '92,
<EM>Lecture Notes in Computer Science, </EM>vol.<EM> </EM>765, T. Helleseth
ed., Springer-Verlag, pp. 159-167, 1994.
<P>
4. J. Dj. Golic and M. J. Mihaljevic, "A generalized correlation attack on
a class of stream ciphers based on the Levenshtein distance," <EM>Journal
of Cryptology, vol. </EM>3(3), pp. 201-212, 1991.
<P>
5. J. Dj. Golic, "On the security of shift register based keystream generators,"
Fast Software Encryption - Cambridge '93, <EM>Lecture Notes in Computer Science,
</EM>vol.<EM> </EM>809, R. J. Anderson ed., Springer-Verlag, pp. 90-100,
1994.
<P>
6. J. Dj. Golic, "Towards fast correlation attacks on irregularly clocked
shift registers," Advances in Cryptology - EUROCRYPT '95, <EM>Lecture Notes
in Computer Science, </EM>vol.<EM> </EM>921, L. C. Guillou and J.-J. Quisquater
eds., Springer-Verlag, pp. 248-262, 1995.
<P>
7. J. Dj. Golic, "Linear models for keystream generators," <EM>IEEE Trans.
Computers, </EM>vol. C-45, pp. 41-49, Jan. 1996.
<P>
8. J. Dj. Golic, " On the security of nonlinear filter generators," Fast
Software Encryption - Cambridge '96, <EM>Lecture Notes in Computer Science,
</EM>vol.<EM> </EM>1039, D. Gollmann ed., Springer-Verlag, pp. 173-188, 1996.
<P>
9. J. Dj. Golic, A. Clark, and E. Dawson, "Generalized inversion attack on
nonlinear filter generators," submitted.
<P>
10. T. H. Harris, <EM>The Theory of Branching Processes. </EM>Berlin:
Springer-Verlag, 1963.
<P>
11. R. Menicocci, "Cryptanalysis of a two-stage Gollmann cascade generator,"
<EM>Proceedings of SPRC </EM>'93, Rome, Italy, pp. 62-69, 1993.
<P>
12. R. A. Rueppel, "Stream ciphers," <EM>Contemporary Cryptology: The Science
of Information Integrity, </EM>G. Simmons ed., pp. 65-134. New York: IEEE
Press, 1991.
<P>
13. B. Schneier, <EM>Applied Cryptography. </EM>New York: Wiley, 1996.
<P>
14. S. Shepherd and W. Chambers, private communication.
<P>
<I>[End]</I>
<P>
  <HR>
<BLOCKQUOTE>
  <B><A NAME="wgc">Bart Preneel (Ed.)</A></B>
  <H2>
    Fast Software<BR>
    Encryption
  </H2>
  <H3>
    <B>Second International Workshop<BR>
    Leuven, Belgium, December 14-16, 1994<BR>
    Proceedings</B>
  </H3>
  <P>
  Springer
  <P>
  <I>[Excerpt, Pages ?, 26-27]</I>
</BLOCKQUOTE>
<P>
  <HR>
<P>
<I>[? page number]</I><BR>
<H1 ALIGN=Center>
  On Random Mappings and Random Permutations<BR>
</H1>
<H3 ALIGN=Center>
  W. G. Chambers
</H3>
<P ALIGN=Center>
Department of Electronic and Electrical Engineering,<BR>
King's College London, Strand, London WC2R 2LS, UK
<PRE>w.chambers@kcl.ac.uk
</PRE>
<P>
<BR wp="br2">
  <HR width=80%>
<H3>
  1 Introduction
</H3>
<P>
Much work has been done by many people, including the present author, to
prove that certain classes of sequence-generator have guaranteed periods
[1]. Here we examine what happens at the other extreme, with sequence generators
which are finite-state machines modelled as having random next-state tables.
The advantages are a lack of mathematical structure which might provide an
entry for the cryptanalyst, and a huge choice of possibilities; the disadvantages
are that there are no guarantees on anything, and as is well known there
is a risk of getting a very short period.
<P>
Thus we consider a finite-state machine whose state is specified by an integer
in the range 0, ...<I>N</I> - 1, and which has a next-state function <EM>F
</EM>which specifies the <EM>n + </EM>1-th state
<I>s<SUB>n</SUB></I><SUB>+l</SUB> as <EM>F(s<SUB>n</SUB>) </EM>with
<I>s<SUB>n</SUB></I> the <EM>n</EM>-th<EM> </EM>state. The output can also
be regarded as a function of the state, but we are not so much concerned
with this. Evidently the function <EM>F </EM>can be represented by a look-up
table with addresses and entries in the range 0, ...<I>N</I> - 1. Each function
<I>F</I> corresponds to a state-diagram where the states correspond to
<I>N</I> points 0, ...<I>N</I> - 1, with point <I>i</I> (the predecessor)
joined to point <I>j</I> (the successor) by an arrowed line if
<EM>F</EM>(<EM>i</EM>)<EM> </EM>=<EM> j</EM>.<EM> </EM>Points lying on a
loop or <EM>cycle </EM>are called cyclic points; a point <I>i</I> satisfying
<EM>F</EM>(<EM>i<I>)</I> </EM>=<EM> i </EM>is regarded as lying on a cycle
of length 1. Every point has a unique successor, but some points have no
predecessors and some have more than one predecessor. In general the state
diagram will consist of a number of cycles, plus a number of directed trees
rooted on cyclic points; in these trees the arrows are pointing to the root.
<P>
If the next-state function <EM>F</EM> is invertible, so that for every
<I>j</I> in 0, ...N - 1 there is an <I>i</I> satisfying
<EM>F</EM>(<EM>i</EM>)<EM> = </EM><I>j</I>, then the function will be called
a permutation (an <I>N</I>-permutation), and the state-diagram will consist
of a number of cycles without any trees rooted in them.
<P>
There are altogether <EM>N<SUP>N</SUP> </EM><I>N</I>-functions, and <I>N</I>!
<I>N</I>-permutations. Typically the state is represented by a number of
bits, say <EM>n</EM>, so that we have <I>N</I> = 2<I><SUB>n</SUB></I>. Normally
<EM>n </EM>would be of the order of hundreds.
<P>
There is a considerable literature on this topic [2], [3], [4], [5], [6],
[7].
<P>
  <HR>
<P>
<I>[Unknown parts not provided] </I>
<P>
26
<P>
as they run. There are certain practical objections to such a scheme of course,
for instance if the generator needs to be restarted frequently or if random
access to the key-stream is needed, but if one simply needs a long key-stream
without any restarts such a method is feasible and of course it makes it
harder for the cryptanalyst by presenting him as it were with a moving target.
<P>
If we have such a scheme then the internal description of the machine-state
must include the look-up table. Now the point is the following: if the
transformation from one state to the next is invertible, then we have a
permutation for the next-state function, but otherwise we may be dealing
with a general finitestate machine with its likelihood of much smaller loop
sizes. Thus it would seem that the modification of the look-up tables should
be carried out in a way that enables one to undo the transformation in a
unique manner.
<P>
Thus as a simple example consider the following random-number generator proposed
by Bob Jenkins and publicised on the Usenet service. The state variables
are
<PRE>a, b: Unsigned 32-bit integers <BR>m[O] ..m[255]: a lookup table of unsigned 32-bit integers <BR>p: a counter cycling from 0to 255 <BR wp="br1">
</PRE>
<P>
Initially <I>p</I> is set to zero and the other variables to random values.
There are two internal unsigned 32-bit integer variables, <I>x</I> and
<I>y</I>. Addition is carried out "modulo 2<SUP>32</SUP>". We loop indefinitely
on the following instructions:
<PRE>x = m [p];
y = m[RS(x)]; /* RS(x) is a right-shift of x by 24, leaving an 8-bit result */ 
a = R(a); /* R() is a rotation left by 27 bits */ 
a = a + y; /* addition is mod 2<SUP>32</SUP> */
m[p] = a + b; /***/ /* see below */
output(b+y); /* put the value b+y on the output stream */ 
b = x; 
p = (p + 1) mod 256; /* Step p */ <BR wp="br1">
</PRE>
<P>
The instruction marked <I>/***/</I> evidently changes the lookup table. It
is not hard to see that if we know the state variables at the end of this
loop we could determine them at the start, including the value of the modified
table-entry; thus we have what is in effect a permutation. The same thing
would apply if we replaced <I>/***/</I> by <I>m[p] = m[p] + a + b, or m[p]
= mtp] XOR (a + b)</I>, but not if we replaced /***/ by <I>m[p] = m[p] +
a</I>, as we could not then deduce the initial <BR>
value of b from the final values of the state variables.<BR>
<H3>
  8 A Cipher Claimed to Resemble A5
</H3>
<P>
Two other encryption algorithms have recently been publicised on the Internet,
without much theoretical backing. The first is "alleged RC4", which has
similarities to the algorithm just described. Here the next state function
is invertible.
<P>
  <HR>
<P>
27
<P>
The second (announced by <A HREF="http://jya.com/crack-a5.htm#rja">Ross Anderson
and Michael Roe</A>) is purported to be very similar to the A5 cipher used
in the GSM mobile telephone system [11]. It uses three binary linear feedback
shift-registers with known (key-independent) primitive polynomials of degrees
19, 22 and 23 respectively. These registers are initially set in a key-dependent
manner to non-zero values, and on each iteration they are stepped as follows:
A control-bit is taken from a known position near the centre of each register.
If two or three of the control bits are equal to 1, then the registers producing
these bits are stepped. On the other hand if two or three of the control
bits are 0, then the registers producing <EM>these </EM>bits are stepped.
In effect the registers are mutually clock-controlled in a stop/go fashion,
and it is easy to see that there is a probability of 3/4 that a register
is stepped on any iteration of the algorithm. Thus the longest register would
be expected to go through a complete cycle in roughly <I>P</I> =
(2<SUP>23</SUP> - 1) · 4/3 iterations.
<P>
Regarded as a finite-state machine the system has just under
2<SUP>19+22+23</SUP> = 2<SUP>64</SUP> states, and the next-state function
is non-invertible. Thus we would expect to find eventual periodicities of
the order of 2<SUP>32</SUP>, after a precursor sequence of the same order
of length. Instead, in a search for eventual periodicities the author found
237 cases (all distinct), and all of them<STRONG> </STRONG>had periods very
close to small multiples of <I>P</I> = (2<SUP>23</SUP> -1) · 4/3; moreover
just over 40% of these cases had periods very close to <I>P</I> itself. (The
precursor sequences were on the whole a little longer, of lengths something
like 10<I>P</I> on average.) Evidently the shorter registers, one with a
period very close to <I>P</I>/16 and the other with a period very close to
<I>P</I>/2, are locking on to the period of the longest register, with
respectively 16 cycles and 2 cycles for every cycle of the longest register.
Further investigations have shown that this "lock-in" is a robust phenomenon,
occurring independently of the choice of primitive polynomials, and even
occurring if the three sequences of control bits are chosen as random periodic
sequences with periods equal to numbers of the form 2<I><SUP>n</SUP></I>
- 1. This topic is being pursued further.
<P>
The shortness of the period is probably not a hazard in normal use [11],
where only a few hundred bits of output are required between key-changes,
but perhaps one would be advised not to use this scheme as a random-number
generator without further study. <BR>
<H3>
  9 Concluding Remarks
</H3>
<P>
Encryption algorithms in which the look-up tables are continuously modified
have the attractions of high speed and of making the analyst's task harder,
but there may be a lingering doubt about the period, particularly when there
is no significant theory available. The above discussion strongly suggests
that the nextstate function should be invertible, but one might like some
further reassurance. The use of a "rekeyed" cipher is a good way of obtaining
a guaranteed huge period. Here there is "driving" sequence generator D whose
output is used to modify the state of a second generator G which provides
the final output. The generator D has a known or lower-bounded huge value
for its period. Thus a 32-bit cascade generator with a cascade of length
8, as suggested in [1], will have
<P>
<I>[Balance of article not provided]</I>
<P>
  <HR>
<P>
<I>[End]</I>
<P>
HTML by <A HREF="http://jyacom/index.htm">JYA/Urban Deadline</A>
<P>
  <HR>
<P>
See references to A5 cryptanalysis by Ross Anderson, Michael Roe, Bruce Schneier
and Simon Shepherd:
<A HREF="http://jya.com/crack-A5.htm">http://jya.com/crack-A5.htm</A>
<P>
In particular, these classified papers by Simon Shepherd:
<PRE>&nbsp; S J Shepherd, "Cryptanalysis of the GSM A5 Cipher Algorithm",
&nbsp; IEE Colloquium on Security and Cryptography Applications to
&nbsp; Radio Systems, Digest No. 1994/141, Savoy Place, London, 3
&nbsp; June 1994, (COMMERCIAL-IN-CONFIDENCE).

&nbsp; S J Shepherd, "An Approach to the Cryptanalysis of Mobile
&nbsp; Stream Ciphers", IEE Colloquium on Security and Cryptography
&nbsp; Applications to Radio Systems, Digest No. 1994/141, Savoy
&nbsp; Place, London, 3 June 1994, (COMMERCIAL-IN-CONFIDENCE).

<HR>

</PRE>
</BODY></HTML>
Top Authors In Last 30 Days

Red Hat 210 files
Ubuntu 64 files
Debian 27 files
LiquidWorm 11 files
Valentin Lobstein 11 files
nu11secur1ty 8 files
Apple 6 files
Google Security Research 5 files
Ersin Erenler 4 files
E1.Coders 4 files
File Archives

Systems

AIX (429)
Apple (2,078)
BSD (376)
CentOS (58)
Cisco (1,927)
Debian (7,014)
Fedora (1,693)
FreeBSD (1,246)
Gentoo (4,467)
HPUX (880)
iOS (373)
iPhone (108)
IRIX (220)
Juniper (69)
Linux (49,227)
Mac OS X (691)
Mandriva (3,105)
NetBSD (256)
OpenBSD (488)
RedHat (15,501)
Slackware (941)
Solaris (1,611)
SUSE (1,444)
Ubuntu (9,439)
UNIX (9,391)
UnixWare (187)
Windows (6,649)
Other
a5-hack.htm

a5-hack.htm

File Archive:

April 2024

Top Authors In Last 30 Days

File Tags

File Archives

Systems

a5-hack.htm

Share This

a5-hack.htm

File Archive:

April 2024

Top Authors In Last 30 Days

File Tags

File Archives

Systems