
                            The Commodore ARC format

  Disclaimer: The description below is a mere extrapolation from the files and
programs I have encountered. Note that  there  is  another  ARC  format  under
MS-DOS which is completely different and is not discussed here.

  ARC is used to compress multiple files into a single archive.  There  is  no
signature at the beginning of the archives.

  Normally archives start at the  beginning of  the  file  but  there  can  be
different extractors prepended to the archive. A reliable  method to  get  the
start offset of the archive is reading the BASIC line at the beginning of  the
extractor, subtracting 6 from the BASIC line number and then  multiplying  the
result by 254. In case the argument of the SYS instruction in the  BASIC  line
starts with the digit 7, you have to subtract 1 more from the start offset.

  The archives contain one or more blocks that consist of a  file  header  and
the compressed data of the file. The original file data is compressed using LZ
compression bundled with a dynamic Huffman algorithm and is protected  with  a
16-bit checksum.

  The file header has the following structure:

  POSITION        DESCRIPTION
  $00             Header version (1 or 2)
  $01             Compression method (0-2 for version 1 and 0-5 for version 2)
  $02-$03         Checksum of the file
  $04-$06         Size of the original file data (3 bytes)
  $07-$08         Size of the packed file data in blocks
  $09             Type of file (PETSCII lowercase character D, P, R, S or U)
  $0A             Length of the original file name (LEN)
  $0B-(LEN-1)     Original name of the file

  The following two entries only exist in version 2 headers:
  LEN             Record length of the original file, if
  (LEN+1)-(LEN+2) Original date stamp of file (MS-DOS format)

  Compression methods are the following: 0  means  that  the  file  is  stored
without any compression, 1 is for RLE (run length encoding), 2 is for  Huffman
algorithm, 3 is for LZ compression, 4 is for Huffman algorithm with RLE and  5
is for LZ compression in one pass.

  The MS-DOS date stamp format packs the last modification date into a word:

  BIT POSITION  DESCRIPTION
   0- 4         Day (1-31)
   5- 9         Month (1-12)
  10-15         Year minus 1980

  The file name is a Pascal-style PETSCII string, its length being  its  first
character. Note that version 1 ARC archives are not designed to store relative
files; only version 2 archives are.

  You can read through an ARC archive with the following algorithm:

  1. Determine the start of the archive, with the above mentioned method.

  2. If you reached the end of the file, stop.

  3. Read in the first part of the header, up to the length of the  file  name
     (11 bytes).

  4. Read in the file name whose length you can fetch from the header.

  5. If the header version is 2, read in the record length and the date  stamp
     (3 bytes). Now you can process the header.

  6. If the file is RLE-packed then read in the RLE control byte (1 byte). Now
     you can process the file.

  7. Add the block count of the packed file size, multiplied by  254,  to  the
     file position of the beginning of the header. By seeking there,  you  get
     to the next header, go to step 2.
