ADX (file format)

From Wikipedia, the free encyclopedia

CRI ADX

Developer:	CRI Middleware
OS:	Cross-platform
Use:	Codec
License:	Proprietary
Website:	CRI Middleware

ADX is a lossy proprietary audio storage and compression format developed by CRI Middleware specifically for use in video games. The format is similar in principle to ADPCM but offers smaller storage sizes, and the sound quality holds up well given that each sample is stored in only 4 bits. The format also provides a looping feature that has proved useful for background music in games that have adopted it, such as the Dreamcast and later-generation Sonic the Hedgehog games from SEGA.

1 File Format
2 Sample Format
- 2.1 Decoding Samples
3 Sources

File Format

The ADX format's specification is not freely available; however, the internal structure of the most significant elements of the format has been described in various places on the internet. The information given here may be incomplete but is sufficient to build a working codec or transcoder.

The format is inherently big-endian, even when used on little-endian architectures such as the original Xbox or x86 computers. The standard byte size is an octet. The basic structure is outlined below:

Offset	Size (bytes)	Field
0x00	1	0x80
0x01	1	0
0x02	2	Copyright Offset
0x04	3	Unknown
0x07	1	Channel Count
0x08	4	Sample Rate
0x0C	4	Total Samples
0x10	4	Version Mark
0x14	4	Unknown
0x18	4	Loop Enabled (v3)
0x1C	4	Loop begin sample index (v3)
0x20	4	Loop begin byte index (v3)
0x24	4	Loop Enabled (v4)
0x28	4	Loop begin sample index (v4); End index (v3)
0x2C	4	Loop begin byte index (v4); End byte index (v3)
0x30	4	Loop end sample index (v4)
0x34	4	Loop end byte index (v4)
0x38	8	Unknown
0x40	...
???	[CopyrightOffset - 2] -> ASCII String: "(c)CRI"
...	[CopyrightOffset + 4] -> Audio Data

The version mark field should contain the big-endian value 0x01F40400 (hexadecimal) for 'version' 4, or 0x01F40300 for 'version' 3. Fields labeled unknown contain unknown data or otherwise appear to be reserved (i.e. filled with null bytes). A field labeled with only v3 or only v4 is 'unknown' in the other version.
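The header description above can be turned into a small parser. The following is a minimal sketch, assuming the whole file is already in memory; the struct `adx_header_info` and the functions `read_be16`, `read_be32` and `adx_parse_header` are hypothetical names chosen for illustration, and only the fields whose meaning is given above are read. The loop fields are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the header fields described above. */
typedef struct {
    uint16_t copyright_offset;
    uint8_t  channel_count;
    uint32_t sample_rate;
    uint32_t total_samples;
    uint32_t version_mark;
    uint32_t audio_data_start;   /* copyright_offset + 4 */
} adx_header_info;

/* Assemble big-endian values byte by byte, independent of host byte order. */
static uint16_t read_be16(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Returns 0 on success, -1 if the buffer does not match the layout above. */
static int adx_parse_header(const uint8_t *data, size_t size,
                            adx_header_info *out) {
    if (size < 0x14 || data[0] != 0x80 || data[1] != 0x00)
        return -1;

    out->copyright_offset = read_be16(data + 2);
    out->channel_count    = data[7];
    out->sample_rate      = read_be32(data + 8);
    out->total_samples    = read_be32(data + 12);
    out->version_mark     = read_be32(data + 0x10);
    out->audio_data_start = (uint32_t)out->copyright_offset + 4;

    if (out->version_mark != 0x01F40300u && out->version_mark != 0x01F40400u)
        return -1;

    /* "(c)CRI" occupies 6 bytes starting at copyright_offset - 2. */
    if (out->copyright_offset < 2 ||
        (size_t)out->copyright_offset + 4 > size)
        return -1;
    if (memcmp(data + out->copyright_offset - 2, "(c)CRI", 6) != 0)
        return -1;

    return 0;
}
```

Checking the "(c)CRI" string is a cheap sanity test that the copyright offset, and therefore the start of the audio data, was read correctly.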

Sample Format

ADX-encoded audio data is broken into a series of consecutive 18-byte blocks. Each block contains data for one channel only. Blocks are laid out in 'frames': a frame consists of one block for each channel, in ascending channel order (i.e. left channel block, right channel block, left, right, and so on). The layout of a block looks like this:

0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17
Scale		32 4-bit samples

Be aware that the scale is a 16-bit unsigned big-endian integer.
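The frame and block layout can be expressed as a pair of small helpers. This is a sketch under the layout described above; the names `adx_block_offset`, `adx_block_scale` and the two constants are hypothetical, and the returned offset is relative to the start of the audio data rather than the start of the file.

```c
#include <stddef.h>
#include <stdint.h>

enum { ADX_BLOCK_SIZE = 18, ADX_SAMPLES_PER_BLOCK = 32 };

/* Byte offset (relative to the start of the audio data) of the block that
 * holds sample `sample_index` of channel `channel`. */
static size_t adx_block_offset(uint32_t sample_index, unsigned channel,
                               unsigned num_channels) {
    size_t frame = sample_index / ADX_SAMPLES_PER_BLOCK;
    return (frame * num_channels + channel) * ADX_BLOCK_SIZE;
}

/* The scale is the first two bytes of the block, most significant first. */
static uint16_t adx_block_scale(const uint8_t *block) {
    return (uint16_t)((block[0] << 8) | block[1]);
}
```

For a stereo file, sample 32 of the left channel lives in the block at offset 36 (the start of the second frame), and the matching right channel block follows at offset 54.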

Decoding Samples

As noted above, each sample consists of 4 bits. The high 4 bits of each byte hold the first sample and the low 4 bits hold the second.

7	6	5	4	3	2	1	0
First sample				Second sample
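The nibble layout above can be illustrated by unpacking a whole block at once. This is a sketch only; `adx_unpack_block` is a hypothetical helper, and it also applies the two's complement sign conversion that the decoding steps in this section describe, so each output value lies in the range -8 to 7.

```c
#include <stdint.h>

/* Unpack the 32 4-bit samples of one 18-byte block into signed values.
 * Bytes 0-1 hold the scale and are skipped here. */
static void adx_unpack_block(const uint8_t block[18], int8_t out[32]) {
    for (int i = 0; i < 16; i++) {
        uint8_t byte = block[2 + i];
        int hi = byte >> 4;      /* first sample of the pair  */
        int lo = byte & 0x0F;    /* second sample of the pair */
        out[2 * i]     = (int8_t)(hi >= 8 ? hi - 16 : hi);
        out[2 * i + 1] = (int8_t)(lo >= 8 ? lo - 16 : lo);
    }
}
```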

The sample decoding method is reasonably straightforward (demonstrated in C99):

/* Requires <stdint.h>, <string.h> (memcpy) and <arpa/inet.h> (ntohs; <winsock2.h> on Windows) */
/* sample_index is a uint_fast32_t incremented every time a sample has been decoded from every channel */
/* current_channel is a uint_fast8_t that holds the index of the channel currently being decoded (i.e. 0 for left, 1 for right) */
/* audio_data_start is a uint_least32_t byte index of the first byte of audio data in the file (i.e. adx_header->CopyrightOffset + 4) */
/* num_channels is a uint_least8_t channel count, that is 1 for mono, 2 for stereo, etc. (this is adx_header->ChannelCount verbatim) */
/* raw_data is a uint8_t pointer to the start of where the file is located in memory */
/* previous_sample and second_previous_sample are both int_fast32_t's, kept separately for each channel */
/* volume is a signed integer amplification factor */

int_fast64_t sample;                       /* 64 bits wide so that sample * scale * volume cannot overflow */
uint_fast8_t sample_4bit;
uint_fast16_t block_scale;
uint_fast32_t data_index;
uint16_t scale_be;

/* ... Get 4 bit sample ... */
data_index = audio_data_start + (sample_index / 32) * num_channels * 18 + current_channel * 18;
memcpy(&scale_be, &raw_data[data_index], sizeof scale_be); /* Copy the scale out, the block may not be 2-byte aligned */
block_scale = ntohs(scale_be);
data_index += 2 + sample_index % 32 / 2;
sample_4bit = raw_data[data_index];
if (sample_index % 2)                      /* If the sample index [starting at 0] is odd then we are decoding a secondary sample */
    sample_4bit &= 0x0F;
else                                       /* Otherwise it is a primary sample */
    sample_4bit >>= 4;

/* ... Decode 4 bit sample ... */
sample = sample_4bit;
if (sample_4bit & 8) sample -= 16;         /* Check the 4th bit (the sign), if negative then adjust for larger variable */

sample *= (int_fast64_t)block_scale * volume; /* Scale up the sample and amplify (signed arithmetic) */
sample += previous_sample * 0x7298;        /* Incorporate previous sample data */
sample -= second_previous_sample * 0x3350; /* Incorporate previous previous sample data */
sample >>= 14;                             /* Divide the sample by 16384 (assumes an arithmetic shift for negative values) */
if (sample > 32767)                        /* Clamp the sample to the valid range for a 16bit signed sample */
    sample = 32767;
else if (sample < -32768)
    sample = -32768;

second_previous_sample = previous_sample;  /* Update the previous samples for the current channel */
previous_sample = sample;

Locating the desired byte looks more complex than it really is. The long calculation can be replaced by a byte counter that is incremented for every 'audio frame' processed, but the entire calculation is shown for clarity. It is assumed that the entire file has been either mmap-ed or read into memory, although progressive reading (i.e. streaming) of the file is entirely possible as well.

We start at the beginning of the audio section of the ADX file and move the pointer to the 'audio frame' currently being processed. The calculation finds the number of frames already processed; each frame holds one 18-byte block per channel, and all of these must be skipped to reach the frame we want. We then move to the block that belongs to the channel currently being processed (i.e. if we are reading the right channel, we must skip over the left channel block). We need the block's scale value to decode the sample, so we read it and use ntohs to convert it from big-endian to the host byte order if necessary. Finally, to get the sample we move past the 2-byte scale and proceed to the byte we want within the block, dividing by 2 to convert the 4-bit sample index into a byte index.
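The byte-counter alternative mentioned above might look like the following sketch. The function `adx_collect_codes` is a hypothetical helper that only gathers the raw, still unsigned 4-bit codes of one channel; `audio` is assumed to point at the first byte of audio data, and a trailing partial frame is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy the raw 4-bit codes of one channel into `out`, walking the data
 * frame by frame with a running byte offset instead of recomputing the
 * position from the sample index. Returns the number of codes written. */
static size_t adx_collect_codes(const uint8_t *audio, size_t audio_size,
                                unsigned num_channels, unsigned channel,
                                uint8_t *out, size_t max_codes) {
    if (num_channels == 0 || channel >= num_channels)
        return 0;

    size_t frame_size = (size_t)num_channels * 18;
    size_t written = 0;

    /* `offset` always points at the start of the current frame. */
    for (size_t offset = 0; offset + frame_size <= audio_size;
         offset += frame_size) {
        const uint8_t *block = audio + offset + (size_t)channel * 18;
        for (int i = 0; i < 32 && written < max_codes; i++) {
            uint8_t byte = block[2 + i / 2];          /* skip 2-byte scale */
            out[written++] = (i % 2) ? (uint8_t)(byte & 0x0F)
                                     : (uint8_t)(byte >> 4);
        }
    }
    return written;
}
```

The same loop structure suits streaming: each iteration needs only one frame of `num_channels * 18` bytes, so frames can be read from disk one at a time.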

The highest bit of the 4-bit sample is the sign bit (negative when set, two's complement), so the number has to be converted into an 8-bit or larger signed integer for proper arithmetic handling, as few processors can handle 4-bit numbers natively. The next stage is to multiply the sample by the scale, which gives it a meaningful amplitude, and then to amplify it by a volume. Volume values between 0x0 and 0x4000 tend to work best, depending on the audio file in question; higher values are possible, but distortion may become noticeable because the higher amplitudes are clipped when the result is limited to 16 bits.

The next two steps incorporate the previous two decoded samples to bring the sample in line with the others. The previous samples carry across block boundaries, but a separate pair must be kept for each channel, and both values start at 0 in the first audio frame. Lastly, the sample is divided by 16384 using a right shift and then clamped to the 16-bit signed sample range (-32768 to 32767).
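The arithmetic described above, with its per-channel history, can be isolated in one function. This is a sketch rather than a reference implementation: `adx_history` and `adx_decode_code` are hypothetical names, `code` is assumed to be the already sign-converted 4-bit value (-8 to 7), and a 64-bit intermediate is used so that the product of code, scale and volume cannot overflow.

```c
#include <stdint.h>

/* The two previously decoded samples of ONE channel; start both at 0. */
typedef struct {
    int32_t prev1;   /* previous sample        */
    int32_t prev2;   /* second previous sample */
} adx_history;

static int16_t adx_decode_code(int code, uint16_t scale, int32_t volume,
                               adx_history *h) {
    int64_t s = (int64_t)code * scale * volume;   /* scale up and amplify */
    s += (int64_t)h->prev1 * 0x7298;              /* previous sample      */
    s -= (int64_t)h->prev2 * 0x3350;              /* second previous one  */
    s >>= 14;   /* divide by 16384; assumes an arithmetic right shift for
                   negative values, as common compilers provide */

    if (s > 32767)                                /* clamp to 16 bits     */
        s = 32767;
    else if (s < -32768)
        s = -32768;

    h->prev2 = h->prev1;                          /* update the history   */
    h->prev1 = (int32_t)s;
    return (int16_t)s;
}
```

A decoder for a stereo file would keep an array of two `adx_history` values, one per channel, and pass the entry matching the block being decoded. The history is not reset between blocks.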

The decoded samples from each channel may still need to be interleaved together to form a standard interleaved audio stream suitable for use with most sound cards.
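Interleaving is a simple reordering. The sketch below assumes each channel has already been decoded into its own buffer of 16-bit samples; `interleave_pcm16` is a hypothetical helper name, and `out` must have room for `num_channels * samples_per_channel` values.

```c
#include <stddef.h>
#include <stdint.h>

/* channels[c] points to `samples_per_channel` decoded samples of channel c.
 * The output is written sample by sample with the channels in ascending
 * order (left, right, left, right, ... for stereo). */
static void interleave_pcm16(const int16_t *const *channels,
                             unsigned num_channels,
                             size_t samples_per_channel,
                             int16_t *out) {
    for (size_t i = 0; i < samples_per_channel; i++)
        for (unsigned c = 0; c < num_channels; c++)
            out[i * num_channels + c] = channels[c][i];
}
```

A decoder that processes one sample per channel at a time can instead write each result straight to `out[sample_index * num_channels + current_channel]` and skip the intermediate buffers.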