AZID Readme v1.8 build 825 (2002-02-19).
Copyright (C) 1997-2002 Midas <midas@egon.gyaloglo.hu>
-------------------------------------------------------------------------------


Introduction
============

This is the documentation for the AC3 decoder application, Azid. It is
written by Midas <midas@egon.gyaloglo.hu>, copyright (C) 1997-2002
Midas.

The most recent version of this program can be found on the excellent
pages of http://www.doom9.org/



Usage and legal conditions:
---------------------------
  This is a test implementation of standard A/52 from ATSC (Digital Audio
  Compression Standard), and it may contain algorithms covered by pending
  patents. This application may be used solely to prove that bitstreams
  are compliant with this standard, for test and demonstration purposes only.
  Any other use may be prohibited by law in your country. The author has
  no liability regarding this application whatsoever. This application
  may be distributed freely unless prohibited by law.



Overview
--------

This document assumes that you know what AC3 is. I won't give any
introduction to what AC3 is and how it is used. The internet is full
of pages describing what AC3 is all about; the most authoritative are
probably those of the creators of AC3, Dolby: http://www.dolby.com

Now for the technical stuff: AC3 is a digital compression algorithm
which may compress up to 5 full bandwidth channels and one
low-frequency effects channel (with limited bandwidth of 120Hz) into a
bitstream. The size of this compressed bitstream is typically reduced
by a factor of 13 compared with the raw data rate.

The specification for the AC3 decoder can be found in the ATSC
specification A/52 at http://www.dolby.com or http://www.atsc.org.

AC3 encoded audio is divided into frames. One frame holds 1536 samples
per channel, i.e. 32ms of audio at a 48kHz sampling rate. A single
frame is divided into several sub-sections:

      - synchronization
      - bit stream information (BSI)
      - 6 audio blocks
      - auxiliary data (and CRC)

The BSI section contains information regarding the bitstream and the
current frame. It contains information like samplerate, number of
encoded channels, downmix-levels, dynamic compression types, program
contents, etc.

Each audio block contains the actual encoded audio. One block holds 256
samples per channel (approx. 5.3ms at 48kHz). An audio block is atomic;
the audio decoding operation is repeated for each of the six blocks. The
specific details of this operation can be found in the A/52 spec.
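As a quick check of the numbers above, the sketch below derives frame and block durations from the sample counts. It is plain arithmetic, not Azid code; the constant and function names are made up for illustration, and the three rates are the sampling rates A/52 allows (48, 44.1 and 32 kHz).

```python
# Illustrative arithmetic only; names are hypothetical, not from Azid.
SAMPLES_PER_BLOCK = 256   # PCM samples per channel in one audio block
BLOCKS_PER_FRAME = 6      # audio blocks in one AC3 frame
SAMPLES_PER_FRAME = SAMPLES_PER_BLOCK * BLOCKS_PER_FRAME  # 1536


def block_ms(sample_rate_hz: int) -> float:
    """Duration of one audio block in milliseconds."""
    return 1000.0 * SAMPLES_PER_BLOCK / sample_rate_hz


def frame_ms(sample_rate_hz: int) -> float:
    """Duration of one frame (six audio blocks) in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


if __name__ == "__main__":
    for rate in (48000, 44100, 32000):
        print(f"{rate} Hz: frame {frame_ms(rate):.2f} ms, block {block_ms(rate):.2f} ms")
```

At 48 kHz this gives the 32ms frames and roughly 5.3ms blocks mentioned above; at the lower sampling rates the same 1536 samples simply last longer.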


Decoder operation
-----------------

The decoder decodes an audio block into elementary channels of
audio. These elementary channels represent the same channels that
were fed into the ac3 encoder at the studio, like center, left, right,
etc. If there are fewer actual output channels than encoded audio
channels, the decoder must downmix the encoded channels into the
correct number of channels.

In this documentation these channels are called "input channels" and
the channels are named: left (l), right (r), center (c), surround left
(sl or s), surround right (sr) and low-frequency effects channel (lfe).

The downmix operation reduces the number of input channels to the
requested number of speakers. This operation is controlled by the
option -d. It selects how many front and rear speakers the decoder
should decode to. If for example -d2/0 (2 front speakers, no rear
speakers) is selected, everything is mixed into these two
speakers/channels.

Then the audio is fed to the output selector, which controls which of
these channels are routed to the actual speaker outputs. The first
speaker output is named output speaker 0, the next output speaker 1,
etc. The decoder supports up to 6 output speakers.

The -o option controls which channel(s) to output. If -d2/0 is
selected, the left and the right channels may be output with the -ol,r
option. In this case, all channels other than l and r contain no
audio, because -d2/0 only generates audio in the l and r
channels. Other output sequences may also be chosen, and the
same channel may be output several times. For example, -or,l and -oc,c
are both legal options.
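The routing rules above can be pictured as a small lookup. The following is a minimal sketch, not Azid's implementation: it assumes the downmixed channels are available as equal-length sample lists keyed by the channel names used in this README, and that any name the downmix did not produce (as well as the zero-data channel '0' described under -o) yields silence. The function names are hypothetical.

```python
from typing import Dict, List, Sequence

# Channel names accepted by -o, per this README; "0" is the zero-data channel.
VALID_NAMES = {"l", "c", "r", "s", "sl", "sr", "lfe", "0"}
MAX_OUTPUTS = 6


def parse_sequence(arg: str) -> List[str]:
    """Split an -o argument such as 'l,r' into a list of channel names."""
    names = [name.strip() for name in arg.split(",")]
    if not 1 <= len(names) <= MAX_OUTPUTS:
        raise ValueError(f"expected 1 to {MAX_OUTPUTS} channels, got {len(names)}")
    for name in names:
        if name not in VALID_NAMES:
            raise ValueError(f"unknown channel name {name!r}")
    return names


def select_outputs(channels: Dict[str, Sequence[float]], arg: str) -> List[List[float]]:
    """Return interleaved frames: one list of output-speaker samples per time index.

    Output speaker k carries the k-th name in the -o sequence. Channels the
    downmix left empty (missing from `channels`) and "0" are output as silence.
    """
    names = parse_sequence(arg)
    length = len(next(iter(channels.values())))  # all buffers assumed equal length
    frames = []
    for i in range(length):
        frame = []
        for name in names:
            buffer = channels.get(name)
            frame.append(0.0 if name == "0" or buffer is None else buffer[i])
        frames.append(frame)
    return frames


if __name__ == "__main__":
    after_2_0 = {"l": [0.1, 0.2], "r": [0.3, 0.4]}  # what -d2/0 leaves behind
    print(select_outputs(after_2_0, "r,l"))  # left and right swapped
    print(select_outputs(after_2_0, "c,c"))  # center is silent after a 2/0 downmix
```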

The -l and -L options control how the LFE channel is downmixed. The -l
option sets the downmix level of the LFE channel into the LFE (output)
channel, while -L controls the amount of LFE audio mixed into the left
and right channels.

TIP: If -d3/2 decoding is chosen, no downmix is performed and it's
possible to listen to individual channels selectable with the -o
option. E.g. use -ol,r to listen to the left and right channels, or
use -osl,sr to listen to the surround channels.

A special command option named --ch may be used to change the
attributes of each channel individually. This option may be used either
on input channels (named l,c,r,etc.) or on output speakers (numbers
0-5). The attributes may contain a gain (see -g for syntax) and/or a
dynamic compression value (see -c).



NEWS
====

This is the list of new features added to Azid:

v1.8
----

- Fixed DRC bug that caused occasional clicks and pops when decoding
  using normal dynamic compression.

- Fixed WAV-header bug. Azid now generates WAV-files with proper
  headers. This bug occurred on wav32 output, which would generate an
  integer header with float data.

- Added new file output types and renamed them to give them simpler
  and more informative names. See -F option for more information.

- Fixed bug in azid soundcard playback that caused azid to hang on
  ac3-files shorter than 0.5 seconds.

- Fixed the downmix overflow logic. The decoder will now print the
  largest overflow within a block, not the first, making it more
  reliable for finding the maximum level of the file.

- Added a warning output enable/disable function (-w).

- Added statistics output printing the total maximum level of the
  channels, and how many samples overflowed.

- Added support of stdout/stdin streaming. This can be done by using
  the '-' character as input and/or output filename.

- Added proper Ctrl+C handling.

- Fixed the return value (exit code) of azid.exe.

- Added a two-pass maximize function. This can be used to maximize the
  volume of the decoded audio.

- Added sectional decoding with -B and -E. With this function, a
  section of the input file can be decoded instead of the entire
  file.

- Added support for command-file scripts (-i) for easier reuse of options.

- Fixed proper dynamic compression for channel 1 on 1+1 ac3 files




DESCRIPTION
===========

This section describes each setting of the decoder and how it affects the
decoding process.

The command-line syntax of Azid is:

    Azid [options] input.ac3 [output.wav]

The output.wav file is optional. If omitted, Azid sends the audio to your
soundcard. If the -N option is given, no output is produced (neither to a
file nor to the soundcard).

The number of entries in the -o option controls the number of output
channels that Azid will produce. The default option -ol,r will produce
a stereo wav or play stereo sound. If, however, -oc is used, the
wav-file will be mono and the playback will be mono. More than 2 items
in the -o option will create multi-channel output. Depending on your
soundcard (driver), multichannel output might not be possible.


Streaming
---------

If the input file is given as '-', Azid will take its input from the
stdin stream instead of a file, and if the output file is given as '-',
its output will be sent to the stdout stream. This has two major side
effects. First, if the input file is stdin, you cannot use options
that need to seek in the input, like the two-pass maximize (-a). The
same applies to the output: you cannot use the wav output types,
because the same seeking mechanism is used there to write the proper
wav-header.

The second side effect is that the streaming ability of Windows is
broken. To be able to utilize the streaming facility of azid, you need
to run either Linux or Cygwin.
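The two restrictions above boil down to one rule: anything that needs to seek cannot work on a pipe. A minimal sketch of that check, assuming the option names used in this README (-a and the wav file types of -F); the function itself is hypothetical and not part of Azid.

```python
from typing import List

# File types from -F whose header is written using seeks on the output file.
WAV_TYPES = {"wav", "wav24", "wav_float"}


def streaming_conflicts(infile: str, outfile: str, maximize: bool, filetype: str) -> List[str]:
    """Return a list of problems with the requested combination ('-' means a stream)."""
    problems = []
    if infile == "-" and maximize:
        problems.append("two-pass maximize (-a) needs to re-read a seekable input file")
    if outfile == "-" and filetype in WAV_TYPES:
        problems.append(f"the {filetype} output type needs a seekable output file")
    return problems


if __name__ == "__main__":
    print(streaming_conflicts("-", "-", maximize=False, filetype="pcm"))  # no problems
    print(streaming_conflicts("-", "-", maximize=True, filetype="wav"))   # two problems
```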


Decoding overflow
-----------------

When decoding one-to-one (2ch AC3 to 2 channel output, 6ch AC3 to 6
channel output, etc.), small overflows can be observed from time to
time when not using any dynamic compression. This is normal because
of the way AC3 works. Quote from the AC3 specification (p.93, 1st
paragraph):

"... Since the output signal consists of the original signal plus
coding error, it is possible for the output signal to exceed 100%
level even though the original input signal was less than or equal to
100% level."

When a downmix overflow is encountered, the output signal will be
saturated to 0dB FS to prevent overflow (wrap around).
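The difference between saturating and wrapping is easiest to see on a single sample. The sketch below assumes floating-point samples where +/-1.0 is full scale and 16-bit output scaled by 32768; both conventions are assumptions for illustration, not necessarily what Azid uses internally, and the function names are hypothetical.

```python
def to_int16_saturating(sample: float) -> int:
    """Scale to 16-bit and clamp at full scale (saturation)."""
    value = round(sample * 32768.0)
    return max(-32768, min(32767, value))


def to_int16_wrapping(sample: float) -> int:
    """What happens without saturation: the value wraps around to the other sign."""
    value = round(sample * 32768.0)
    return (value + 32768) % 65536 - 32768


if __name__ == "__main__":
    overshoot = 1.05  # coding error pushed the signal 5% past full scale
    print(to_int16_saturating(overshoot))  # 32767: clipped, but still positive
    print(to_int16_wrapping(overshoot))    # large negative value: an audible click
```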



COMMAND OPTIONS
===============


-a, --maximize  (*NEW* v1.8)
--------------

Default: omitted

This option enables the two-pass maximize function of azid. In the
first pass, Azid scans the entire file to find the maximum level. In
the second pass, the audio is decoded with enough gain to bring that
maximum up to 0dB FS.

NOTE: Sometimes when you use this function, the output will still
create downmix overflow warnings. This is normal. It happens because
the signal touches 0dB FS, or because some individual sample value
slightly overloads the output.
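In outline, the two passes work like the sketch below: the first pass only measures the peak, and the second applies one constant gain. This is a simplified illustration operating on samples already held in memory, whereas Azid decodes the file twice; the function names are hypothetical.

```python
import math
from typing import Iterable, List


def find_peak(samples: Iterable[float]) -> float:
    """Pass 1: the largest absolute sample value (1.0 = 0dB FS)."""
    return max((abs(s) for s in samples), default=0.0)


def maximize(samples: List[float]) -> List[float]:
    """Pass 2: apply one constant gain so that the peak lands exactly on 0dB FS."""
    peak = find_peak(samples)
    if peak == 0.0:
        return list(samples)  # silence cannot be maximized
    gain = 1.0 / peak
    return [s * gain for s in samples]


if __name__ == "__main__":
    audio = [0.1, -0.25, 0.5, -0.4]
    peak = find_peak(audio)
    print(f"gain: {20 * math.log10(1.0 / peak):+.1f} dB")
    print(maximize(audio))
```

Because the peak is placed exactly at full scale, any later rounding can push a sample marginally over, which fits the overflow warnings described in the note above.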



-b BOOL, --bsi-log=BOOL 
-----------------------

Default: true

The AC3 bitstream contains a BSI (Bit Stream Information)
section. This section contains information about the bitstream, like
sampling rate, number of channels, and other descriptive information.

This command option enables/disables such print-outs. A typical BSI
print-out looks like this:


      +------ BSI -----
      |  Bitrate: 448 kbit (48 kHz)
      |  Mode: Complete Main (CM)
      |  Audio mode: 2/2  L,R,SL,SR
      |  Surround mix level: -3.0dB
      |  Dialogue level: -27dB
      |  Language: English
      |  Mixlevel: 105dB SPL
      |  Roomtype: Small root, flat monitor
      |  Stream: Copyright protected, Original stream
      +----------------


-B TIME, --begin=TIME  (*NEW* v1.8)
---------------------

Default: #0

This option lets you control when, or where in the file, decoding
should start. The decoder skips frames until the specified time or
frame has been reached, and starts producing output from there.

Please note that azid doesn't simply jump to the indicated time, but
parses through the stream up to the indicated time. This is done to
find the correct point to start decoding.

The argument can be given as a frame number (#num) or as a time
([[HH:]MM:]SS[.mss]). Examples:

  -B#0     Start decoding at frame 0 (inclusive)
  -B#100   Start decoding at frame 100
  -B23     Start decoding at second 23
  -B1:00   Start decoding at one minute
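Both forms of the argument can be reduced to a frame index. The sketch below is a hypothetical parser, not Azid's own: it assumes 32ms frames (1536 samples at 48kHz), reads the optional digits after the seconds as a decimal fraction of a second, and maps a time to the frame that contains it. The same argument syntax is used by -E.

```python
import re

FRAME_MS = 32.0  # assumption: 48 kHz stream, 1536 samples per frame

# [[HH:]MM:]SS[.mss]
_TIME = re.compile(r"(?:(?:(\d+):)?(\d+):)?(\d+)(?:\.(\d{1,3}))?")


def parse_position(arg: str, frame_ms: float = FRAME_MS) -> int:
    """Convert a -B/-E argument ('#num' or a time) to a frame index."""
    if arg.startswith("#"):
        return int(arg[1:])
    m = _TIME.fullmatch(arg)
    if m is None:
        raise ValueError(f"bad time argument {arg!r}")
    hh, mm, ss, frac = m.groups()
    ms = ((int(hh or 0) * 60 + int(mm or 0)) * 60 + int(ss)) * 1000
    if frac:
        ms += int(frac.ljust(3, "0"))  # ".5" -> 500 ms
    return int(ms // frame_ms)         # frame containing that instant


if __name__ == "__main__":
    for arg in ("#0", "#100", "23", "1:00", "1:02:03.5"):
        print(arg, "->", parse_position(arg))
```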



-c COMPR, --dcompr=COMPR
------------------------

Default: none

This option sets the overall dynamic compression in the decoder. This
value is applied to every output speaker.

The bitstream contains information on how much to amplify or attenuate
the sound to decrease the overall dynamic variations (loudness) in the
program contents. Different options exist to choose the desired
dynamic reduction:


  o none     No dynamic compression. The program contents are unchanged.

  o normal   Normal dynamic compression. Off-the-shelf decoders use
             this as a hard-coded default.

  o light    Light dynamic compression. This is 50% (-6db) of the
             reduction/gain that normal dynamic compression would give.

  o heavy    Heavy dynamic compression. Intended for poor listening
             environments with much background noise.

  o inverse  Dynamic expansion. This is the inverse of the light
             dynamic compression, i.e. it makes strong sounds stronger
             and weaker sounds weaker.



-C LEVEL, --clevel=LEVEL
------------------------

Default: BSI

This command option controls the center downmix level into the LR
channels. Normally, the BSI section contains a field which tells the
decoder how to downmix the center channel into the LR channels.

With this option, the user may override the BSI center downmix level
and specify a custom value. Note that this option is only active when
the output decode mode (-d) is 2/x.

Allowable values are gain values (either in db or as a positive
numerical value) or BSI. When BSI is selected, the center downmix
level gets its value from the BSI section.



--ch#=ATTRIB0[,ATTRIB1[,...]]
-----------------------------

This option sets one or more attributes for the given channel. There
are two major types of channels available:

    o The input channels. These are the channels coming directly from
      the decoder prior to downmixing. Each of these channels represents
      one of the channels put into the ac3 encoder at the studio.
      Allowed channel names are: l,c,r,sl,sr,s or lfe.

    o The output channel or speaker. This refers to an output channel
      after downmixing and output selection (-o), and directly to the
      index in the -o option. E.g. '-ol,r' implies that output
      channel 0 is the left channel and output channel 1 is the
      right channel. If '-oc,c --ch0=12db' is used, both outputs will
      contain the center channel, but only the first one will have
      12db gain. Allowed output channel names are: 0,1,2,3,4 and 5.

The attributes may be:

    o Channel gain. This specifies how much the signal on the given
      channel should be amplified/attenuated. Legal values are
      positive numbers or a logarithmic value written with the postfix
      'db'. Examples: --chl=12 --chc=0 --ch0=+3db --ch1=-3db

    o Channel dynamic compression. This specifies the dynamic
      compression to use for that channel. Allowable values are:
      none, light, normal, heavy and inverse. Example: --chc=normal

Multiple attributes are separated by commas, like this:

    --chc=normal,3db  or  --chlfe=light,0.5  or  --ch0=none,-3db
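The attribute syntax can be parsed by checking each comma-separated item against the compression names and treating everything else as a gain. A minimal sketch under these assumptions: gains ending in 'db' (any case) are converted with 10^(dB/20), plain numbers are taken as linear factors, and a later item of the same kind overrides an earlier one. The function names are hypothetical; the gain rule follows the syntax described for -g.

```python
from typing import Optional, Tuple

COMPRESSION_MODES = {"none", "light", "normal", "heavy", "inverse"}


def parse_gain(text: str) -> float:
    """Convert '3db', '-3db', '+3db' or a plain linear factor such as '0.5'."""
    t = text.strip().lower()
    if t.endswith("db"):
        return 10.0 ** (float(t[:-2]) / 20.0)
    value = float(t)
    if value < 0.0:
        raise ValueError("linear gains must not be negative")
    return value


def parse_channel_attributes(arg: str) -> Tuple[Optional[str], Optional[float]]:
    """Split an --ch argument like 'normal,3db' into (compression, linear gain)."""
    compression, gain = None, None
    for item in arg.split(","):
        item = item.strip()
        if item.lower() in COMPRESSION_MODES:
            compression = item.lower()
        else:
            gain = parse_gain(item)
    return compression, gain


if __name__ == "__main__":
    for arg in ("normal,3db", "light,0.5", "none,-3db", "12"):
        print(arg, "->", parse_channel_attributes(arg))
```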



-d FRONT/REAR, --decode=FRONT/REAR
----------------------------------

Default: 2/0

This option selects how many front and rear speakers the decoder
should downmix for. The argument is given as front speakers/rear
speakers.

Note that this option only sets the downmix type, not the actual
output.  The -o option controls which channels to output. This option
does not control the LFE channel (see -l and -L).

Possible values are: 1/0, 2/0, 3/0, 1/1, 2/1, 3/1, 1/2, 2/2, 3/2



-e ERROR_ACTION, --erraction=ERROR_ACTION
-----------------------------------------

Default: zero

This option controls the decoder action in case of bitstream
errors. Possible values are:

    o quit. This causes Azid to quit the entire application if it
      encounters an error in the bitstream.

    o zero. The decoder will skip the current frame of ac3-data and
      pad the output with silence and continue with the next frame of
      data.



-E TIME, --end=TIME  (*NEW* v1.8)
-------------------

This option sets when to stop decoding. (See -B). The argument can be
given as a frame number (as #nn) or as a time (as [[HH:]MM:]SS[.mss]).
The argument is inclusive, i.e. if #100 is given, it will decode frame
100 and then stop.



-f BOOL, --rear-filter=BOOL
---------------------------

Default: off

This option controls rear-channel filtering in 2/0 output mode. The
filter is a 2nd order Butterworth filter with its -3 dB point at 7
kHz. There are two major applications for this feature:

    o To provide proper Pro Logic downmix of the rear channels.

    o To reduce phasing problems in the downmix (washy sound) caused
      by the rear channel downmix into the L and R channels.

Usually the rear channels are phase-shifted by 90 degrees relative to
the front channels, either before or inside the ac3 encoder. This is
done to avoid phasing problems when downmixing the program contents to
two channels. Some sources do not provide this phase shift, which is
why this feature was added.

The filter provides an increasing phase shift according to
frequency. It is 90 deg at 7kHz.

NOTE: This option is only effective when 2/0 output mode is selected
(-d 2/0).
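The README does not state whether the filter is a low-pass; the sketch below assumes it is, because a 2nd order low-pass lags by exactly 90 degrees at its -3 dB point, which matches the description above. The coefficients use the well-known RBJ Audio EQ Cookbook formulas (a bilinear-transform design), not necessarily Azid's own code, and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def butterworth_lowpass(fc_hz: float, fs_hz: float) -> Tuple[List[float], List[float]]:
    """2nd order Butterworth low-pass biquad (RBJ cookbook, Q = 1/sqrt(2)).

    Returns normalized coefficients [b0, b1, b2] and [1, a1, a2].
    """
    w0 = 2.0 * math.pi * fc_hz / fs_hz
    q = 1.0 / math.sqrt(2.0)
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cos_w0) / (2.0 * a0), (1.0 - cos_w0) / a0, (1.0 - cos_w0) / (2.0 * a0)]
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def response(b: Sequence[float], a: Sequence[float], f_hz: float, fs_hz: float) -> complex:
    """Complex frequency response of the biquad at f_hz."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    return (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)


def filter_samples(b: Sequence[float], a: Sequence[float], x: Sequence[float]) -> List[float]:
    """Run the biquad over x (transposed direct form II, zero initial state)."""
    s1 = s2 = 0.0
    y = []
    for v in x:
        out = b[0] * v + s1
        s1 = b[1] * v - a[1] * out + s2
        s2 = b[2] * v - a[2] * out
        y.append(out)
    return y


if __name__ == "__main__":
    b, a = butterworth_lowpass(7000.0, 48000.0)
    h = response(b, a, 7000.0, 48000.0)
    print(f"{20 * math.log10(abs(h)):.2f} dB, {math.degrees(cmath.phase(h)):.1f} deg")
```

At 7 kHz this should report about -3.01 dB and a -90 degree phase, the "90 deg at 7kHz" described above; lower frequencies get proportionally less phase shift.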



-F FILE_TYPE, --filetype=FILE_TYPE
----------------------------------

Default: wav

Selects the file type to generate. Possible values are:

  o wav. Generate "normal" 16-bit wav.

  o wav24. Generate 24-bit integer wav.

  o wav_float. Generate 32-bit floating-point (IEEE) wav.

  o pcm. Generate 16-bit pcm (equal to wav, only without
    the wav-header).

  o pcm32. Generate 32-bit integer PCM.

  o pcm_float. Generate 32-bit floating-point (IEEE) PCM
    output.
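The wav and pcm variants carry the same sample data; the wav types only add a header in front. The minimal sketch below builds the canonical 44-byte RIFF/WAVE header with Python's struct module. The format tag is 1 (WAVE_FORMAT_PCM) for integer samples and 3 (WAVE_FORMAT_IEEE_FLOAT) for float samples, which is the kind of header/data mismatch the v1.8 WAV-header fix refers to. This is a simplified sketch, not Azid's writer: stricter readers may expect WAVE_FORMAT_EXTENSIBLE for multichannel or 24-bit files, and the function name is hypothetical.

```python
import struct

WAVE_FORMAT_PCM = 1
WAVE_FORMAT_IEEE_FLOAT = 3


def wav_header(channels: int, sample_rate: int, bits: int, is_float: bool, data_bytes: int) -> bytes:
    """Canonical 44-byte little-endian RIFF/WAVE header for interleaved samples."""
    fmt_tag = WAVE_FORMAT_IEEE_FLOAT if is_float else WAVE_FORMAT_PCM
    block_align = channels * bits // 8     # bytes per sample frame
    byte_rate = sample_rate * block_align  # bytes per second
    fmt_chunk = struct.pack("<4sIHHIIHH", b"fmt ", 16, fmt_tag, channels,
                            sample_rate, byte_rate, block_align, bits)
    return (struct.pack("<4sI4s", b"RIFF", 36 + data_bytes, b"WAVE")
            + fmt_chunk
            + struct.pack("<4sI", b"data", data_bytes))


if __name__ == "__main__":
    # One second of 2-channel, 32-bit float audio at 48 kHz (like wav_float).
    header = wav_header(2, 48000, 32, True, 48000 * 2 * 4)
    print(len(header), struct.unpack_from("<H", header, 20)[0])
```

Note that both size fields depend on the total amount of data, which a writer usually only knows at the end of decoding.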



-g GAIN, --gain=GAIN
--------------------

Default: 1.0 (or 0db)

This option controls the main (output speaker) gain. The value can be
given in db's (by specifying "db" after the argument) or a positive
numerical value. Examples: -g-3db, -g5.3, -g6db



-i FILE, --script=FILE  (*NEW* v1.8)
----------------------

This option lets you set all azid options from a command script
file. This file uses the following syntax:

  o All lines beginning with '#' or ';' are regarded as comments.

  o Blank lines are ignored.

  o An option is given as command[=argument]. The argument can be
    omitted if the command does not require an argument.

  o The command names are equal to the long names of the command-line
    options (the -- options), written without the leading '--'.

Example:

  gain = 9dB
  filetype = pcm32
  no-output
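Because script commands are just the long option names, a script can be turned back into a command line mechanically. A minimal sketch, assuming leading whitespace before a comment marker is allowed and that spaces around '=' are ignored (as in the example above); the function name is hypothetical.

```python
from typing import List


def script_to_args(text: str) -> List[str]:
    """Translate an azid-style command script into long command-line options."""
    args = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        name, sep, value = line.partition("=")
        if sep:
            args.append(f"--{name.strip()}={value.strip()}")
        else:
            args.append(f"--{name.strip()}")  # option without argument
    return args


if __name__ == "__main__":
    example = "# my settings\ngain = 9dB\nfiletype = pcm32\nno-output\n"
    print(script_to_args(example))
```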



-l LFE_LEVEL, --lfe=LFE_LEVEL
-----------------------------

Default: 0.0

This controls the downmix level of the LFE channel into the LFE output
speaker. I.e. if this option is set to a non-zero value, the LFE
channel output may be listened to with the -olfe option.



-L LRLFE_LEVEL, --lrlfe=LRLFE_LEVEL
-----------------------------------

Default: 1.0 (or 0db)

This controls the downmix-level of the LFE channel into the LR
channels.



-m MONO_MODE, --mono=MONO_MODE
------------------------------

Default: stereo

This option controls what type of 1+1 decoding should be used. A
special channel configuration exists where the stream contains two
mono audio channels (called 1+1). Selectable options:

    o ch1. Route channel 1 into center.

    o ch2. Route channel 2 into center.

    o mono. Route channel 1 + channel 2 into center.

    o stereo. Route channel 1 into left and channel 2 into right.
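The four modes amount to a small routing table from the two mono channels to the decoder's input channels. A minimal sketch; the README does not say how 'mono' combines the channels, so the equal-weight average below (which keeps the sum within full scale) is an assumption, as is the function name.

```python
from typing import Dict, List, Sequence


def route_dual_mono(ch1: Sequence[float], ch2: Sequence[float],
                    mode: str = "stereo") -> Dict[str, List[float]]:
    """Map the two channels of a 1+1 stream onto input channels for the -m mode."""
    if mode == "stereo":
        return {"l": list(ch1), "r": list(ch2)}
    if mode == "ch1":
        return {"c": list(ch1)}
    if mode == "ch2":
        return {"c": list(ch2)}
    if mode == "mono":
        # Assumption: average the two channels so the sum cannot exceed full scale.
        return {"c": [0.5 * (x + y) for x, y in zip(ch1, ch2)]}
    raise ValueError(f"unknown mono mode {mode!r}")


if __name__ == "__main__":
    print(route_dual_mono([0.2, 0.4], [0.6, 0.0], "mono"))
```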



-M BOOL, --matrix-log=BOOL
--------------------------

Default: off

This option makes the decoder print the downmix matrix with its
individual coefficients. A typical print would look something like
this:


      +------ DOWNNMIX MATRIX -----
      |          IN0     IN1     IN2     IN3     IN4     IN5
      |  L  : +0.2426 +0.1716 +0.0000 -0.1716 -0.1716 +0.2426
      |  C  : +0.0000 +0.0000 +0.0000 +0.0000 +0.0000
      |  R  : +0.0000 +0.1716 +0.2426 +0.1716 +0.1716 +0.2426
      |  SL :                 +0.0000 +0.0000 +0.0000
      |  SR :                 +0.0000 +0.0000 +0.0000
      |  LFE:         +0.0000 +0.0000 +0.0000 +0.0000 +0.0000
      +----------------------------

The channels on the top (INx) are the input channels. Which channel
each of these inputs represent can be read from the audio mode section
in the BSI printout:

      |  Audio mode: 2/2  L,R,SL,SR

Here L is IN0, R is IN1, etc. Note that the input channel gain does
not affect the downmix matrix coefficients, while -C and -S do.
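Each row of the printout is one output channel and each column one input channel, so every output sample is a weighted sum of the input samples in the same time slot. The sketch below applies such a matrix. As an example it builds an unnormalized 3/2 to 2/0 matrix modeled on the Lo/Ro equations in A/52 (Lo = L + clev*C + slev*Ls, Ro = R + clev*C + slev*Rs), where clev and slev are the levels that -C and -S override. Azid's printed coefficients include extra scaling and the Dolby surround compatible mode, so they will not match these numbers; the function names are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = Dict[str, Dict[str, float]]  # output channel -> {input channel: coefficient}


def lo_ro_matrix(clev: float, slev: float) -> Matrix:
    """Unnormalized 3/2 -> 2/0 matrix in the A/52 Lo/Ro form."""
    return {
        "l": {"l": 1.0, "c": clev, "sl": slev},
        "r": {"r": 1.0, "c": clev, "sr": slev},
    }


def apply_matrix(matrix: Matrix, inputs: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """Weighted sum per output channel; inputs absent from the stream contribute nothing."""
    n = len(next(iter(inputs.values())))  # all input buffers assumed equal length
    return {
        out_name: [sum(coef * inputs[in_name][i]
                       for in_name, coef in row.items() if in_name in inputs)
                   for i in range(n)]
        for out_name, row in matrix.items()
    }


if __name__ == "__main__":
    m = lo_ro_matrix(clev=0.5, slev=0.25)
    full_scale = {"l": [1.0], "c": [1.0], "r": [0.0], "sl": [1.0], "sr": [0.0]}
    print(apply_matrix(m, full_scale))
```

In the example, full-scale signals in L, C and SL at the same instant sum to 1.75 in the left output, well above 0dB FS, which is why a downmixing decoder has to scale the matrix down or saturate.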



-n BOOL, --norm=BOOL
--------------------

Default: false

This selects whether the decoder should use dialog normalization
reduction. The normal dialogue level in a program is defined as the
reference loudness, 0db. The BSI info variable "dialogue level"
tells how far this dialogue level is below 0db full-scale (FS), i.e.
how much headroom there is above the dialogue level before clipping.

One of Dolby's intentions with this variable is to ensure that all
dialogue is played back at the same volume, regardless of the
program's amount of headroom. This is useful when the movie you're
watching is interrupted by a commercial break, where the headroom
varies enormously. (It prevents blowing your ears off when the break
comes.)

This feature is implemented by attenuating everything such that all
programs have 31 db headroom, regardless of their original headroom.
For a typical program with a -27db dialogue level (27 db headroom),
this will cause a -4db gain.
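The rule above is a single subtraction. A minimal sketch, assuming the BSI dialogue level is given as a negative dB value, as in the printout under -b; the names are hypothetical. Since A/52 dialogue levels range from -1 to -31 dB, the result is always an attenuation of 0 to 30 db.

```python
TARGET_DIALOGUE_LEVEL_DB = -31.0  # every program is brought to 31 db headroom


def dialnorm_gain_db(dialogue_level_db: float) -> float:
    """Gain applied by -n, given the BSI 'Dialogue level' (e.g. -27 for -27dB)."""
    return TARGET_DIALOGUE_LEVEL_DB - dialogue_level_db


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


if __name__ == "__main__":
    g = dialnorm_gain_db(-27.0)
    print(f"{g:+.0f} dB = x{db_to_linear(g):.3f}")
```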



-N, --no-output
---------------

This option causes the decoder not to produce any output, neither to
a wav-file nor to the soundcard. This is ideal for running through
the file to check its validity. It requires no arguments.



-o SEQUENCE, --output=SEQUENCE
------------------------------

Default: l,r

This option controls the selection and sequence of the output
channels. The selectable channels are all input channels
(l,c,r,s,sl,sr and lfe) and a special zero-data channel (0). Up to 6
channels may be listed with this command.

The -d option controls what kind of decoding target to use. This -o
option controls which of these channels to output (and their
sequence). Let's say for example that you have a 4 channel
soundcard. You would like to have left and right on one pair of
outputs and surround left and surround right on the other. To do this
you must specify -ol,r,sl,sr.



-p PRESET, --preset=PRESET
--------------------------

Default: 2ch

Azid has some pre-defined settings. The default is 2ch (2/0 decoding),
from which all other presets are derived. The default command line is:

    -ezero -b1 -z1 -M0 -mstereo -ssurround -d2/0 -ol,r -L1 -l0
    -Cbsi -Sbsi -cnone -n0 -g1

(which is the same as using -p2ch or not using the -p option at
all). The pre-defined options are:

    o 2ch. This is the configuration for 2/0 channel decoding. This
      option is really redundant, since this is the default preset.

    o 4ch. This setting will produce a 4 channel output, 2/2. The
      command-line equivalent is: -d2/2 -ol,r,sl,sr

    o 6ch. This setting will produce a 6 channel output, 3/2+lfe. The
      command-line equivalent is: -d3/2 -L0 -l1 -ol,r,sl,sr,c,lfe



-q, --no-logging
----------------

This option disables the output logging. No BSI info, settings,
or bitstream errors will be shown. This option overrides the -b, -z
and -M options. It requires no argument.



-Q, --no-progress
-----------------

This option will disable the decoding progress indicator. It does not
require any arguments.



-s STEREO_MODE, --stereo=STEREO_MODE
------------------------------------

Default: surround

When 2/0 decoding is selected, this option controls what kind of
stereo downmix is applied.



-S LEVEL, --slevel=LEVEL
------------------------

Default: BSI

This command option controls the surround downmix level into the LR
channels. Normally, the BSI section contains a field which tells the
decoder how to downmix the surround channels into the LR channels.

With this option, the user may override the BSI surround downmix level
and specify a custom value. Note that this option is only active when
the output decode mode (-d) is 2/x and the input stream is either x/1
or x/2.

Allowable values are gain values (either in db or as a positive
numerical value) or BSI. When BSI is selected, the surround downmix
level gets its value from the BSI section.



-w BOOL, --warn=BOOL  (*NEW* v1.8)
--------------------

Default: on

This option selects whether warning output should be printed. Warnings
are messages like "Downmix overflow" etc.



-z BOOL, --set-log=BOOL
-----------------------

Default: on

This option selects whether the current settings should be printed in
an easy-to-read format, like this:

      +------ SETTINGS -----
      |  Input channel configuration:
      |    Left     :  None    compression, +0dB gain
      |    Center   :  None    compression, +0dB gain
      |    Right    :  None    compression, +0dB gain
      |    Sur Left :  None    compression, +0dB gain
      |    Sur Right:  None    compression, +0dB gain
      |    LFE      :  None    compression, +0dB gain
      |  Output configuration: 2/0
      |    Ch0 [Left     ]:  None    compression, +0dB gain
      |    Ch1 [Right    ]:  None    compression, +0dB gain
      |  Output Dual mono mode: Stereo
      |  Output Stereo mode: Dolby surround compatible
      |  LFE levels: To LR +0dB, To LFE -INF
      |  Center   mix level: +40.0dB
      |  Surround mix level: BSI
      |  Dialog normalization: No
      +---------------------
