This version of the Independent JPEG Group's JPEG Software (release 4)
has been modified for 1-bit steganography in JFIF output files.

***IMPORTANT***

Many users have had difficulty using this program.  Much of this
difficulty has stemmed from the fact that 


                  JSTEG *DOES NOT READ JPEG FILES*.


JSTEG reads several other popular (lossless) formats and saves its
output as a JPEG file.  An easy workaround is to run djpeg on the
JPEG in which you want to hide data, writing the result to a TARGA
(tga) file, and then use that tga file as input to cjpeg.  Be aware,
however, that decoding and re-encoding the image causes a slight loss
of image quality.


To compile the package, follow the steps given in the original
README file.

To inject a data file into a JPEG/JFIF image, simply add the option
"-steg filename" to the "cjpeg" command line.  If the data file is too
large for the image, "cjpeg" will inform you.  At this point, you can
compress the data file, increase the quality of the image (thereby
increasing image size), or try a different image.

Extraction of a data file works similarly.  The "-steg filename"
option to "djpeg" writes the extracted steganographic data to the
named file, overwriting any previous contents.  The decoded image is
still sent to standard output, so it is usually redirected to
"/dev/null".


Steganography is the science of hiding data in otherwise plain text or
images.  Here, we are hiding the data inside images stored in the JFIF
format of the JPEG standard.  It was believed that this type of
steganography was impossible, or at least infeasible, since the JPEG
standard uses lossy encoding to compress its data.  Any hidden data
would be overwhelmed by the noise.  The trick used by this
steganographic implementation is to recognize that JPEG encoding is
split into lossy and lossless stages.  The lossy stages use a
discrete cosine transform and a quantization step to compress the
image data; the lossless stage then uses Huffman coding to compress
the quantized data further.  We can therefore insert the
steganographic data between those two stages without risking
corruption.
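
The ordering argument can be illustrated with a small Python sketch.
It is not taken from the JSTEG sources: the sample values, the
quantizer Q, the helper names, and the use of zlib as a stand-in for
the Huffman coder are all assumptions, chosen only to show that bits
written before the lossy step are lost while bits written after it
survive the lossless step.  Skipping of zero coefficients is ignored
here for brevity.

```python
import struct
import zlib

def quantize(coeffs, q):
    """Lossy step: divide by the quantizer and round (information is lost)."""
    return [int(round(c / q)) for c in coeffs]

def lossless_roundtrip(values):
    """Stand-in for the Huffman stage: pack, compress, decompress, unpack."""
    fmt = f"<{len(values)}h"
    packed = struct.pack(fmt, *values)
    restored = zlib.decompress(zlib.compress(packed))
    return list(struct.unpack(fmt, restored))

def set_lsb(value, bit):
    """Replace the lowest-order bit of an integer with the given bit."""
    return (value & ~1) | bit

raw = [183.0, -47.0, 30.0, -21.0, 17.0, 12.0]  # made-up DCT outputs
secret = [1, 0, 1, 1, 0, 0]
Q = 4  # hypothetical quantizer step

# Hiding before the lossy step: the low bits do not survive quantization.
early = quantize([set_lsb(int(c), b) for c, b in zip(raw, secret)], Q)
bits_early = [v & 1 for v in early]

# Hiding between the lossy and lossless steps: the bits come back intact.
late = [set_lsb(v, b) for v, b in zip(quantize(raw, Q), secret)]
bits_late = [v & 1 for v in lossless_roundtrip(late)]
```

With these sample values, bits_late equals secret while bits_early
does not.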

This method has several benefits.  First, the JPEG/JFIF image format
has become the de-facto standard for transmission across USENET and
for storage on FTP sites.  Steganography using these formats would be
innocuous compared to that with most other formats.  Second,
steganographic data in JPEG images is harder to detect with the naked
eye than the same data in raw 8-bit or 24-bit images.  Just as the
lossy raw->encoded conversion tends to wipe out steganographic data
inserted before it, the reverse encoded->raw conversion tends to
smooth over the inaccuracies that the hidden data introduces into the
decoded image.  In addition, the wide control available over image
quantization makes it very difficult to establish whether or not the
inaccuracies which do appear are caused by steganographic data or by
lower-quality quantization.

The JPEG encoding procedure converts an image to the YCbCr colorspace
and divides it into 8x8 blocks of pixels.  Each block is run through a
discrete cosine transform (DCT), and the resulting frequency
coefficients are scaled (quantized) to remove the ones which a human
viewer would not detect under normal conditions.  If steganographic
data is being loaded into the JPEG image, the loading occurs after
this step.  The lowest-order bits of all non-zero frequency
coefficients are replaced with successive bits from the steganographic
source file, and these modified coefficients are sent to the Huffman
coder.  (This choice of encoding slots produces good results, but
there may be better ones.  For example, tests have shown that the
human eye is less sensitive to changes along the Cb and Cr colorspace
axes---we ought to be able to stick more data there.)
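
The bit replacement can be sketched in a few lines of Python.  This
is an illustration rather than the code in the package: the function
names and sample block are hypothetical.  One assumption goes beyond
the text above: the sketch also skips coefficients equal to 1,
because replacing the low bit of a 1 with a 0 would turn it into a
zero coefficient, which the extractor would then skip and lose
synchronization.

```python
def usable(coeff):
    """Zero is skipped per the text; skipping 1 as well is an assumption."""
    return coeff not in (0, 1)

def embed(coeffs, bits):
    """Return a copy of coeffs with successive bits placed in usable LSBs."""
    out = list(coeffs)
    pending = iter(bits)
    for i, c in enumerate(out):
        if not usable(c):
            continue
        bit = next(pending, None)
        if bit is None:          # all bits placed; leave the rest untouched
            return out
        out[i] = (c & ~1) | bit  # works for negative values in Python too
    if next(pending, None) is not None:
        raise ValueError("data too large for this image")
    return out

def extract(coeffs, count):
    """Read back the first `count` bits from the usable coefficients."""
    return [c & 1 for c in coeffs if usable(c)][:count]

block = [12, 0, -3, 1, 5, 0, 2, -1, 0, 7]  # made-up quantized coefficients
secret = [1, 0, 1, 1, 0]
stego = embed(block, secret)
recovered = extract(stego, len(secret))
```

Here stego is [13, 0, -4, 1, 5, 0, 3, -2, 0, 7] and recovered equals
secret.  No usable coefficient can become 0 or 1 under this rule, so
embedding and extraction always visit the same positions.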

The steganographic encoding format (the format of data inserted into
the lowest-order bits of the image) is as follows:

  +-----+-----------     -----+--------------------------------
  |  A  |  B  B  B  . . .  B  |  C  C  C  C  C  C  C  C  C  C  . . .
  +-----+-----------     -----+--------------------------------

  "A" is five bits.  It expresses the length (in bits) of field B.
  Order is most-significant-bit first.

  "B" is some number of bits from zero to thirty-one.  It expresses
  the length (in bytes) of the injection file.  Order is again
  most-significant-bit first.  The range of values for "B" is 0 to
  roughly one billion (about 2^30).

  "C" is the bits in the injection file.  No ordering is implicit on
  the bit stream.
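
The layout of fields A, B and C can be sketched in Python as
follows.  The function names are hypothetical, and since no bit
ordering is specified for field C, the most-significant-bit-first
order used for the file bytes here is an assumption.  The sketch
writes field B with no leading zeroes.

```python
def to_bits(value, width):
    """Most-significant-bit-first list of `width` bits."""
    return [(value >> i) & 1 for i in range(width - 1, -1, -1)]

def from_bits(bits):
    """Inverse of to_bits."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def encode_stream(payload):
    """Build A (5 bits), B (length in bytes), then C (the file bits)."""
    n = len(payload)
    b_width = n.bit_length()          # 0 for an empty file
    if b_width > 31:
        raise ValueError("injection file too large")
    bits = to_bits(b_width, 5) + to_bits(n, b_width)
    for byte in payload:
        bits += to_bits(byte, 8)      # assumed bit order within each byte
    return bits

def decode_stream(bits):
    """Recover the injection file from a bit list (extra bits ignored)."""
    b_width = from_bits(bits[:5])
    start = 5 + b_width
    n = from_bits(bits[5:start])
    return bytes(from_bits(bits[start + 8 * i:start + 8 * i + 8])
                 for i in range(n))

stream = encode_stream(b"hi")
```

For the two-byte file "hi", field A is 00010 (B is two bits long) and
field B is 10 (two bytes), followed by sixteen bits of file data, for
23 bits in total.  Trailing bits read from the rest of the image are
ignored by decode_stream because the length field says where to stop.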

This format is designed to make the steganographic data as innocuous
as possible.  (As one would expect, there is no magic cookie at the
front giving the format).  We are forced to have a length field at the
beginning of the data, since any sort of in-band EOF tag would be
infeasible.

Expressing the length field as a raw series of bits representing an
integer would be dangerous, however; for any sort of small
steganographic file, there would be a long string of zeroes in the
field---very easy to detect.  By stripping off the zeroes and creating
a secondary length field for our primary length field(!), we greatly
reduce the problem.  At five bits, the secondary length field is
small enough that runs of zeroes are not a problem, and it allows a
primary length field of up to thirty-one bits.

There is still a danger in that the sixth bit of the stream will
always be one; this is solved by tacking an extra zero onto the
beginning of the primary length field in half the cases.  This helps
randomize the output, although it reduces the representable data size
to one gigabyte.
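
A minimal Python sketch of the padded length field follows.  The
names are hypothetical, and using a random coin flip to decide the
"half the cases" is an assumption; the text does not say how the
choice is made.  The decoder needs no knowledge of the padding,
because a leading zero does not change the value of the field.

```python
import random

MAX_B_BITS = 31  # largest width that the five-bit field A can express

def encode_length(n, pad=None):
    """Return the bits of A followed by B, optionally padding B with a zero."""
    if pad is None:
        pad = bool(random.getrandbits(1))  # assumed: coin flip per file
    width = n.bit_length() + (1 if pad else 0)
    if width > MAX_B_BITS:
        raise ValueError("length does not fit in field B")
    a = [(width >> i) & 1 for i in range(4, -1, -1)]
    b = [(n >> i) & 1 for i in range(width - 1, -1, -1)]
    return a + b

def decode_length(bits):
    """Read A, then read that many bits of B; padding needs no special case."""
    width = 0
    for bit in bits[:5]:
        width = (width << 1) | bit
    n = 0
    for bit in bits[5:5 + width]:
        n = (n << 1) | bit
    return n

plain = encode_length(300, pad=False)   # sixth bit is 1
padded = encode_length(300, pad=True)   # sixth bit is 0
```

Reserving room for the padding bit leaves thirty bits for the length
itself, which is where the one gigabyte limit comes from: in this
sketch 2^30 - 1 can still be padded, but 2^30 cannot.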

The storage effectiveness for this steganographic technique is
reasonable, but not astounding.  Using the simple encoding criteria
described above, an N kilobyte data file fits when the resulting
JPEG/JFIF file is around C*N kilobytes, where C ranges from eight to
ten.  This is not much worse than raw 24-bit insertion, and tuning
the insertion with regard to colorspace could produce even better
results.  Compressing the steganographic file before injection does
not seem to greatly harm compression of the envelope image; the data
spreading that occurs during injection increases entropy enough for
Huffman coding to work.
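
The rule of thumb above amounts to simple arithmetic, sketched here
in Python.  The function names are hypothetical, and the factor of
eight to ten is only the rough figure quoted above; the real capacity
depends on the image and the quality setting.

```python
def envelope_size_kb(data_kb, c_low=8, c_high=10):
    """Approximate JPEG size range (KB) needed to hold data_kb of data."""
    return data_kb * c_low, data_kb * c_high

def payload_size_kb(jpeg_kb, c_low=8, c_high=10):
    """Approximate data size range (KB) that a JPEG of jpeg_kb can hold."""
    return jpeg_kb / c_high, jpeg_kb / c_low

needed = envelope_size_kb(10)    # (80, 100)
holds = payload_size_kb(400)     # (40.0, 50.0)
```

So a 10 kilobyte data file calls for an output image of roughly 80 to
100 kilobytes, and a 400 kilobyte output image holds roughly 40 to 50
kilobytes.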

Derek Upham
upham@cs.ubc.ca