<HTML>
<HEAD> <TITLE>How SNOW works</TITLE> </HEAD>
<BODY BGCOLOR="#d8d8d8" TEXT="#000080" LINK="#0000FF">
<H1> How SNOW works </H1>
<P>
This document gives a description of the encoding scheme used
by <B>snow</B>. <P>
<H3> The Nature of Steganography </H3>
<P>
Steganography is the science of concealing messages in other messages.
Some historical techniques have involved invisible ink, subtle indentations
in paper, and even tattooing messages under the hair of messengers.
In the digital age, steganography offers ways to hide messages in
digital audio files and some kinds of images, and even to generate
pseudo-English text that encodes the message. </P>
<P>
Ideally, the carrier message is not noticeably degraded by the presence of
a hidden message. As a result, the most effective techniques tend to
make use of data that contains a lot of redundancy, such as raw audio
and image files. Steganography works much less effectively, if at all,
with efficient compressed formats such as JPEG and MPEG. </P>
<P>
Unfortunately, sending large amounts of raw audio and image data can
arouse suspicion, and the pseudo-English encoding schemes are not
sophisticated enough to fool a human observer. </P>
<H3> Whitespace Steganography </H3>
<P>
The encoding scheme used by <B>snow</B> relies on the fact that spaces
and tabs (known as <EM>whitespace</EM>) at the end of a line are invisible
in almost all text viewing programs.
This allows messages to be hidden in ASCII text without affecting the
text's visual representation. And since trailing spaces and tabs
occasionally occur naturally, their presence alone should not immediately
alert an observer who stumbles across them. </P>
<P>
The <B>snow</B> program runs in two modes: message concealment and
message extraction. During concealment, the following steps are taken. </P>
<DL>
<DD> Message -> optional compression -> optional encryption
-> concealment in text
</DL>
Extraction reverses the process.
<DL>
<DD> Extract data from text -> optional decryption -> optional decompression
-> message
</DL>
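<P>
The two pipelines above can be sketched in Python as follows. This is an
illustration, not <B>snow</B>'s code: the stage functions are supplied by
the caller, and a stage given as <CODE>None</CODE> is skipped, mirroring
the optional steps. The stand-ins used in the example (<CODE>zlib</CODE>
and the hypothetical <CODE>toy_xor</CODE>) are assumptions chosen only so
the sketch runs; they are not <B>snow</B>'s Huffman coder or ICE. </P>

```python
import zlib
from typing import Callable, Optional

# A stage maps bytes to bytes (compression, encryption, or their inverses).
Stage = Callable[[bytes], bytes]


def prepare_for_concealment(message: bytes,
                            compress: Optional[Stage] = None,
                            encrypt: Optional[Stage] = None) -> bytes:
    """Message -> optional compression -> optional encryption.

    The returned bytes are what would then be hidden in the text.
    """
    data = message
    if compress is not None:
        data = compress(data)
    if encrypt is not None:
        data = encrypt(data)
    return data


def recover_after_extraction(data: bytes,
                             decrypt: Optional[Stage] = None,
                             decompress: Optional[Stage] = None) -> bytes:
    """Extracted data -> optional decryption -> optional decompression."""
    if decrypt is not None:
        data = decrypt(data)
    if decompress is not None:
        data = decompress(data)
    return data


# Hypothetical stand-in "cipher" for illustration only: XOR with a constant
# is its own inverse and provides no security. It is NOT ICE.
def toy_xor(data: bytes) -> bytes:
    return bytes(b ^ 0x5A for b in data)


if __name__ == "__main__":
    hidden = prepare_for_concealment(b"meet at dawn", zlib.compress, toy_xor)
    print(recover_after_extraction(hidden, toy_xor, zlib.decompress))  # prints b'meet at dawn'
```

<P>
Note that extraction applies the inverse stages in the opposite order:
decryption must come before decompression, because compression was
applied before encryption. </P>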
<P>
Each of these steps is described in detail below. </P>
<H3> Compression </H3>
<P>
The compression scheme used by <B>snow</B> is a fairly rudimentary
Huffman encoding scheme, where the tables are optimised for English
text. This was chosen because the whitespace encoding scheme provides
very limited storage space in some situations, and a compression
algorithm with low overhead was needed. In other words, short messages
had to compress to even shorter data. Depending on the text, compression
usually reduces the message size by 25-40%. </P>
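<P>
As a rough illustration of how a prefix (Huffman-style) code turns short
English text into fewer bits, here is a Python sketch. The code table
<CODE>TOY_CODE</CODE> is hypothetical and covers only ten characters; it is
not <B>snow</B>'s table, which is optimised for English character
frequencies. Like the tables described later in this document, it has no
codes shorter than three bits, so one or two trailing padding bits can
never decode to a character. </P>

```python
from typing import Dict, List

# Hypothetical toy prefix code: common characters get shorter codes.
# No code is a prefix of another, so the bit stream decodes unambiguously.
TOY_CODE: Dict[str, str] = {
    " ": "000", "e": "001", "t": "010", "a": "011", "o": "100", "n": "101",
    "h": "1100", "s": "1101", "i": "1110", "r": "1111",
}
DECODE: Dict[str, str] = {bits: ch for ch, bits in TOY_CODE.items()}


def huffman_encode(text: str) -> str:
    """Return the bit string for text (raises KeyError for unknown characters)."""
    return "".join(TOY_CODE[ch] for ch in text)


def huffman_decode(bits: str) -> str:
    """Decode a bit string; trailing bits that form no complete code are ignored."""
    out: List[str] = []
    current = ""
    for b in bits:
        current += b
        if current in DECODE:
            out.append(DECODE[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    msg = "the sea air"
    bits = huffman_encode(msg)
    print(len(bits), "bits instead of", 8 * len(msg))  # prints: 37 bits instead of 88
```

<P>
Because this toy alphabet is so small, its savings are not representative
of what a full English table achieves on real text. </P>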
<P>
If you want to compress a long message, or one not containing standard
text, you would be better off compressing the message externally with
a specialized compression program, and bypassing <B>snow</B>'s optional
compression step. This usually results in a better compression ratio. </P>
<H3> Encryption </H3>
<P>
The encryption algorithm built into <B>snow</B> is
<A HREF="../ice/index.html">ICE</A>, a 64-bit block cipher also
designed by the author of <B>snow</B>. It runs in 1-bit cipher-feedback
(CFB) mode, which, although inefficient (requiring a full 64-bit encryption
for each bit of output), provides the best possible security when
different messages are encrypted with the same password: two such
ciphertexts are identical only up to the first bit at which the plaintexts
differ, so little beyond the length of their common prefix is revealed.
Although using the same password many times is theoretically a big no-no,
in the real world it often can't be avoided. </P>
<P>
The lower 7 bits of each character in the password are packed into an array,
which is used to set the encryption key. The ICE encryption algorithm
can operate at different levels, with higher levels using longer keys
and providing more security. The ICE level appropriate for the password
length is used. </P>
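<P>
The password packing can be sketched in Python as below. The function name
is hypothetical, and the bit order (most significant bit first) and the
zero-padding of the final partial byte are assumptions; the text above only
says that the lower 7 bits of each character are packed into an array.
Choosing the ICE level from the packed length is not shown. </P>

```python
def pack_password(password: str) -> bytes:
    """Pack the low 7 bits of each character into bytes, MSB first.

    Assumed layout: the 7-bit values are concatenated in order and the
    final partial byte is padded with zero bits on the right.
    """
    acc = 0      # bits not yet written out
    nbits = 0    # number of valid bits held in acc
    out = bytearray()
    for ch in password:
        acc = (acc << 7) | (ord(ch) & 0x7F)   # keep only the lower 7 bits
        nbits += 7
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1               # drop bits already emitted
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


if __name__ == "__main__":
    key = pack_password("correct horse")
    print(len(key), "bytes")  # 13 chars * 7 bits = 91 bits, so this prints: 12 bytes
```

<P>
Packing 7 bits per character means an 8-character password fills exactly
56 bits (7 bytes), and characters that differ only in the top bit produce
the same key material. </P>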
<P>
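CFB mode makes use of an initialization vector (IV), which is initially
set to the first 64 bits of the key, encrypted with the key itself. Each
time a bit is encrypted, the IV is encrypted, and the leftmost bit of the
encrypted IV is XORed with the bit. The IV is then shifted left one bit,
and the ciphertext bit is inserted on the right. Decryption follows the
same steps, XORing each ciphertext bit with the leftmost bit of the
encrypted IV and then shifting that ciphertext bit into the IV, so only
the encryption direction of ICE is ever needed. </P>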
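<P>
A Python sketch of this 1-bit CFB procedure follows. ICE is not in
Python's standard library, so the hypothetical
<CODE>toy_block_encrypt</CODE> stands in for it using keyed BLAKE2b
(<CODE>hashlib.blake2b</CODE>) truncated to 64 bits. That stand-in is not a
block cipher and this is not <B>snow</B>'s code; it only demonstrates the
feedback structure. A one-way function suffices because CFB never uses the
cipher's decryption direction. Zero-padding keys shorter than 64 bits, and
truncating keys to the 64 bytes BLAKE2b accepts, are further assumptions. </P>

```python
import hashlib
from typing import List

MASK64 = (1 << 64) - 1


def toy_block_encrypt(key: bytes, block: int) -> int:
    """Hypothetical stand-in for ICE: keyed BLAKE2b truncated to 64 bits.

    BLAKE2b accepts keys of at most 64 bytes, so longer keys are truncated.
    """
    digest = hashlib.blake2b(block.to_bytes(8, "big"), key=key[:64],
                             digest_size=8).digest()
    return int.from_bytes(digest, "big")


def initial_iv(key: bytes) -> int:
    """IV = first 64 bits of the key, encrypted under the key itself."""
    first64 = int.from_bytes(key[:8].ljust(8, b"\0"), "big")  # zero-pad short keys
    return toy_block_encrypt(key, first64)


def cfb1(key: bytes, bits: List[int], decrypt: bool = False) -> List[int]:
    """Encrypt (or decrypt) a list of 0/1 bits in 1-bit CFB mode."""
    iv = initial_iv(key)
    out: List[int] = []
    for bit in bits:
        keystream_bit = toy_block_encrypt(key, iv) >> 63   # leftmost bit of encrypted IV
        result = bit ^ keystream_bit
        out.append(result)
        cipher_bit = bit if decrypt else result            # feedback is always the ciphertext bit
        iv = ((iv << 1) | cipher_bit) & MASK64             # shift left, insert on the right
    return out
```

<P>
With the same key, two messages that share a prefix produce ciphertexts
that agree exactly up to the first differing bit and, from then on, depend
on different IV contents. </P>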
<H3> The Encoding Scheme </H3>
<P>
To mark the beginning of a message, a tab is added immediately after
the text on the first line where it will fit. Since extraction does not
begin until a trailing tab is found, the insertion of mail and news
headers containing trailing spaces cannot corrupt the message. </P>
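<P>
On the extraction side, locating this start-of-message tab might look like
the following Python sketch (function names are hypothetical). Lines whose
trailing whitespace contains only spaces, such as an inserted header line,
are skipped. </P>

```python
from typing import List, Optional, Tuple


def trailing_whitespace(line: str) -> str:
    """Return the run of spaces and tabs at the end of a line."""
    return line[len(line.rstrip(" \t")):]


def find_message_start(lines: List[str]) -> Optional[Tuple[int, str]]:
    """Locate the first line whose trailing whitespace contains a tab.

    Returns (line index, whitespace following that tab), or None if no
    trailing tab exists, in which case there is no hidden message.
    """
    for index, line in enumerate(lines):
        ws = trailing_whitespace(line)
        marker = ws.find("\t")
        if marker != -1:
            return index, ws[marker + 1:]
    return None
```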
<P>
Data is written 3 bits at a time, each group coded as 0 to 7 spaces. A
message whose length is not a multiple of 3 bits is padded with zeroes.
During extraction, the one or two extra bits at the end are ignored
(fortunately there are no two-bit Huffman codes to confuse things). </P>
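<P>
The 3-bit grouping and padding can be sketched in Python as below. This
version assumes whole bytes of uncompressed data, so the padding is removed
by discarding trailing bits that do not fill a byte; with Huffman-compressed
data the decoder instead ignores bits that do not complete a code. The
function names are hypothetical. </P>

```python
from typing import List


def bytes_to_bits(data: bytes) -> List[int]:
    """Expand bytes into bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def bits_to_groups(bits: List[int]) -> List[int]:
    """Split bits into 3-bit values (0-7), zero-padding the end."""
    padded = bits + [0] * (-len(bits) % 3)
    return [(padded[i] << 2) | (padded[i + 1] << 1) | padded[i + 2]
            for i in range(0, len(padded), 3)]


def groups_to_bits(groups: List[int]) -> List[int]:
    """Each value 0-7 (a count of spaces) yields three bits."""
    return [(g >> shift) & 1 for g in groups for shift in (2, 1, 0)]


def bits_to_bytes(bits: List[int]) -> bytes:
    """Pack bits into bytes, ignoring trailing bits that do not fill a byte."""
    usable = len(bits) - len(bits) % 8
    out = bytearray()
    for i in range(0, usable, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)
```

<P>
For example, the single byte 0xFF becomes 8 bits, is padded to 9, and is
written as runs of 7, 7 and 6 spaces; on extraction the ninth bit is
dropped. </P>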
<P>
An alternative scheme was considered, where bits were written one at
a time as either a space or a tab. Although this scheme adds fewer
characters per bit (1 vs 1.5), it requires more columns per bit
(4.5 vs 2.67), and column space is the limiting factor. </P>
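<P>
The figures in this comparison can be reproduced as follows, assuming each
3-bit value is equally likely (an average of 3.5 spaces per block), that
each tab-terminated block ends on the next 8-column tab stop, and, for the
one-bit scheme, that a tab costs a full 8 columns. That last assumption is
an interpretation of how the 4.5 figure was obtained; the original does not
state it. </P>

```latex
\bar{s} = \frac{0 + 1 + \cdots + 7}{8} = 3.5 \ \text{spaces per block}

\begin{aligned}
\text{3 bits per block:}\quad
  & \frac{\bar{s} + 1}{3} = \frac{4.5}{3} = 1.5 \ \text{characters/bit},
  && \frac{8}{3} \approx 2.67 \ \text{columns/bit} \\
\text{1 bit per character:}\quad
  & \frac{1}{1} = 1 \ \text{character/bit},
  && \frac{1 + 8}{2} = 4.5 \ \text{columns/bit}
\end{aligned}
```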
<P>
Tabs are used to separate the blocks of spaces. Since each tab advances to
the next tab stop, a 3-bit block usually occupies 8 columns of text, and
given that the default line length is 80 characters, this allows 30 bits
(10 blocks) to be stored on an empty line.
A tab is not appended to the end of a line unless the last 3 bits
coded to zero spaces, in which case it is needed to show that some
bits are actually there. </P>
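<P>
Encoding and decoding one line's worth of blocks might be sketched in
Python as follows. The names are hypothetical, the start-of-message tab is
not handled here, and the rule that a block is written only if its
terminating tab stays within the line length is an assumption about how
fit is judged. Under that rule an empty 80-column line holds exactly 10
blocks, or 30 bits. </P>

```python
from typing import List, Tuple

TAB_WIDTH = 8
LINE_LENGTH = 80  # snow's default line length, per the text above


def next_tab_stop(col: int) -> int:
    """Column reached by a tab typed at column col."""
    return (col // TAB_WIDTH + 1) * TAB_WIDTH


def encode_line(text: str, values: List[int],
                line_length: int = LINE_LENGTH) -> Tuple[str, int]:
    """Append as many 3-bit values (0-7) as fit; return (line, count used)."""
    text = text.rstrip(" \t")
    col = len(text.expandtabs(TAB_WIDTH))
    blocks: List[str] = []
    for v in values:
        end = next_tab_stop(col + v)      # v spaces, then a tab
        if end > line_length:
            break
        blocks.append(" " * v + "\t")
        col = end
    used = len(blocks)
    if used and values[used - 1] != 0:
        blocks[-1] = blocks[-1][:-1]      # final tab only needed after zero spaces
    return text + "".join(blocks), used


def decode_whitespace(ws: str) -> List[int]:
    """Recover the space counts from a line's trailing whitespace."""
    if not ws:
        return []
    pieces = ws.split("\t")
    if ws.endswith("\t"):
        pieces.pop()                      # nothing follows a trailing tab
    return [len(p) for p in pieces]
```

<P>
For example, <CODE>encode_line("Hello", [3, 5])</CODE> yields
<CODE>"Hello"</CODE> followed by three spaces, a tab and five spaces; the
closing tab is dropped because the last block is not empty. </P>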
<P>
If a message will not fit into the available text, empty lines will be
appended and used to contain the overflow. A warning message will also
be produced, since this affects the look of the original text. </P>
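<P>
The overflow rule can be sketched in Python as below. The helper name is
hypothetical, and it assumes the caller already knows how many 3-bit blocks
were left over after the original lines were filled; each appended empty
line holds <CODE>line_length // 8</CODE> blocks (10 blocks, or 30 bits, at
the default 80 columns). </P>

```python
import math
import warnings
from typing import List

TAB_WIDTH = 8


def append_overflow_lines(lines: List[str], leftover_blocks: int,
                          line_length: int = 80) -> List[str]:
    """Return lines plus enough empty lines to hold the leftover 3-bit blocks."""
    per_line = line_length // TAB_WIDTH   # each block ends on a tab stop
    extra = math.ceil(leftover_blocks / per_line)
    if extra:
        warnings.warn(f"message did not fit; appended {extra} empty line(s), "
                      "which changes the look of the text")
    return lines + [""] * extra
```

<P>
For instance, 21 leftover blocks need ceil(21 / 10) = 3 extra lines. </P>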
</BODY>
</HTML>