7.1 SOUND Resource Format

by Lance Ewing
Last updated: 18 August 1997
Retrieved from the Internet Archive

NOTE: The original version of this document did not cover every aspect of the sound format. It made no mention that the volume control and noise voice were also part of AGI's sound format. It turns out that the data contained in a sound resource is so much like the data sent to the PCjr's TI chip that I have included a lot of Peter Norton's TI sound chip section from the "Programmer's Guide to the IBM PC".

INTRODUCTION

Most people who think of AGI games remember that they played their music and sounds over the PC speaker. What they may not know is that every sound is composed of four parts: one melody, two accompaniment parts, and one noise part. The IBM PC can only play one note at a time, so all AGI games for the PC play the melody by itself. The other three parts are still included in the data, though, because some PC compatibles, including the IBM PCjr, have more than one sound generator.

HISTORY

According to Donald B. Trivette, author of 'The Official Book of King's Quest', a year before the IBM PCjr was announced IBM asked Sierra to create a game that would show off the new computer's color graphics capabilities. IBM supplied the company with a prototype Junior, and Roberta set to work designing a new type of adventure game. The game produced was called King's Quest. This is important because the IBM PCjr had a different method of sound generation than the IBM compatibles of today. The sound data was stored in a way that made it easy to send to the Junior's sound generators. This format appears to have remained in use right through the AGI games up until 1989-90, when SCI took over, even though the PCjr had long since been surpassed by the 286 and 386.

SOUND AND THE IBM PCjr

The best known source of sound in the Junior is the TI SN76496A sound generator chip. This source has four separate sound voices. Three of these are tone generators and the fourth is a noise source. All four voices have an independent volume control, providing an evenly graduated set of 15 volume levels, plus a zero volume (off). Each of the three pure voices has an independently selected frequency. The noise voice has three preselected frequencies and a fourth option, which borrows the frequency of the third pure voice. The data stored in the AGI games is designed to be sent to these four voices.

THE TONE GENERATORS

A tone is produced on a voice by passing the sound chip a 3-bit register address and then a 10-bit frequency divisor. The register address specifies which voice the tone will be produced on. This is done through port 192 on the IBM PCjr by sending it 2 bytes in the following format:

First Byte

 7  6  5  4  3  2  1  0

 1  .  .  .  .  .  .  .      Identifies first byte (command byte)
 .  R0 R1 R2 .  .  .  .      Register number in TI chip (0, 2, 4).
 .  .  .  .  F6 F7 F8 F9     4 of 10-bits in frequency count.

Second Byte

 7  6  5  4  3  2  1  0

 0  .  .  .  .  .  .  .      Identifies second byte (completing byte)
 .  X  .  .  .  .  .  .      Unused, ignored.
 .  .  F0 F1 F2 F3 F4 F5     6 of 10-bits in frequency count.

 R0 R1 R2

  0  0  0     Holds voice 1 frequency number.
  0  1  0     Holds voice 2 frequency number.
  1  0  0     Holds voice 3 frequency number.

The actual frequency produced is the 10-bit frequency divisor given by F0 to F9 divided into 1/32 of the system clock frequency (3.579 MHz) which turns out to be 111,860 Hz. Keeping all this in mind, the following is the formula for calculating the frequency:

F = 111860 / (((Byte-2 AND 0x3F) * 16) + (Byte-1 MOD 16));

Note: The order of the bytes is reversed in AGI sound data.
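
As a rough illustration, the formula above could be written in Python along the following lines. This is only a sketch: the function and constant names are made up for this example, and the bytes are taken in the PCjr order (command byte first), not the AGI order.

```python
PCJR_TONE_CLOCK_HZ = 111860  # 1/32 of the 3.579 MHz system clock, as given above


def tone_frequency(first_byte: int, second_byte: int) -> float:
    """Return the tone frequency in Hz for a command/completing byte pair."""
    # The low 4 bits of the first byte are the least significant bits (F6..F9).
    low_bits = first_byte & 0x0F
    # The low 6 bits of the second byte are the most significant bits (F0..F5).
    high_bits = second_byte & 0x3F
    divisor = (high_bits << 4) | low_bits
    if divisor == 0:
        # The formula is undefined for a zero divisor; this sketch just refuses it.
        raise ValueError("a divisor of zero has no frequency under this formula")
    return PCJR_TONE_CLOCK_HZ / divisor


def tone_register(first_byte: int) -> int:
    """Return the 3-bit register number (R0 R1 R2) held in the command byte."""
    return (first_byte >> 4) & 0x07
```

For example, the byte pair 0x80, 0x16 gives a divisor of 352 on register 0 (voice 1), which works out to roughly 317.8 Hz.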

ATTENUATION

Each voice in the TI sound chip has an independent sound-level control, which is calculated in terms of decibels of attenuation, or softening. There are four bits used to control the volume. These bits, labeled A0 through A3, can be set independently or added together to produce sixteen volume levels as shown below.

 A0 A1 A2 A3        Value        Attenuation (decibels)

  .  .  .  1          1                    2
  .  .  1  .          2                    4
  .  1  .  .          4                    8
  1  .  .  .          8                   16
  1  1  1  1                           Volume off

When a bit is set on, the sound is attenuated (reduced) by a specific amount: either 2, 4, 8, or 16 decibels. When all four bits are set on, the sound is turned completely off. When all four bits are off, the sound is at its fullest volume.

The attenuation is set by sending a byte of the following format to the TI sound chip:

 7  6  5  4  3  2  1  0

 1  .  .  .  .  .  .  .      Identifies first byte (command byte)
 .  R0 R1 R2 .  .  .  .      Register number in TI chip (1, 3, 5, or 7).
 .  .  .  .  A0 A1 A2 A3     4 attenuation bits

      R0 R1 R2

      0  0  1     Holds voice 1 attenuation.
      0  1  1     Holds voice 2 attenuation.
      1  0  1     Holds voice 3 attenuation.
      1  1  1     Holds noise voice attenuation.

THE NOISE GENERATOR

There are two modes for the noise operation, besides the four frequency selections. One, called periodic noise, produces a steady sound; the other, called white noise, produces a hissing sound. These two modes are controlled by a bit known as the FB bit. When FB is 0, the periodic noise is generated; when FB is 1, the white noise is produced.

Two bits, known as NF0 and NF1, control the frequency at which the noise generator works. Three of the four possible combinations of NF0 and NF1 set an independent noise frequency based on the timer. The fourth combination borrows the frequency from the third of the three pure voices made by the tone generators.

 NF0  NF1       Noise Frequency

  0    0         1,193,180 / 512 = 2330
  0    1         1,193,180 / 1024 = 1165
  1    0         1,193,180 / 2048 = 583
  1    1         Borrowed from voice 3

The noise frequency is set by sending a byte of the following format to the TI sound chip:

 7  6  5  4  3  2  1  0

 1  .  .  .  .  .  .  .      Identifies first byte (command byte)
 .  1  1  0  .  .  .  .      Register number in TI chip (6)
 .  .  .  .  X  .  .  .      Unused, ignored; can be set to 0 or 1
 .  .  .  .  .  FB .  .      1 for white noise, 0 for periodic
 .  .  .  .  .  . NF0 NF1    2 noise frequency control bits
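
A minimal Python sketch of decoding the noise control byte is given below. It simply follows the bit layout and frequency table above; the function name and the dictionary keys are hypothetical.

```python
NOISE_BASE_HZ = 1193180  # base figure used in the noise frequency table above
NOISE_DIVIDERS = {0: 512, 1: 1024, 2: 2048}


def decode_noise_control(byte: int) -> dict:
    """Split a noise control byte into its FB and NF0/NF1 fields."""
    register = (byte >> 4) & 0x07
    if not byte & 0x80 or register != 6:
        raise ValueError("not a noise control byte (expected register 6)")
    white_noise = bool(byte & 0x04)  # FB bit: 1 = white noise, 0 = periodic
    selector = byte & 0x03           # NF0 is bit 1, NF1 is bit 0
    borrows = selector == 3          # fourth combination uses voice 3's frequency
    frequency = None if borrows else NOISE_BASE_HZ / NOISE_DIVIDERS[selector]
    return {
        "white_noise": white_noise,
        "borrows_voice_3": borrows,
        "frequency": frequency,
    }
```

Bit 3 is ignored, as in the layout above. The byte 0xE4, for instance, would decode as white noise at the first preset frequency.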

AGI SOUND FILES

We now know enough about the PCjr's TI sound chip to discuss the AGI sound format. The sound is stored as four separate units of data, one for each voice. Each sound file stored in the VOL files has an 8-byte header which contains offsets into the file. The format is as follows:

Byte	Meaning
0-1	Offset of first voice data.
2-3	Offset of second voice data.
4-5	Offset of third voice data.
6-7	Offset of noise voice data.
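
Reading the header could look something like this in Python. It is a sketch only: the function name is made up, the argument is assumed to be the raw SOUND resource already extracted from the VOL file, and the offsets are read low byte first as stated in Appendix 1.

```python
import struct


def read_voice_offsets(sound: bytes) -> tuple:
    """Return the four voice offsets (voice 1, 2, 3, noise) from the header."""
    if len(sound) < 8:
        raise ValueError("a SOUND resource needs at least an 8-byte header")
    # Four unsigned 16-bit little-endian words.
    return struct.unpack_from("<4H", sound, 0)
```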

The data starting at each voice offset is stored as 5-byte notes which give the frequency and duration of a note played on that voice. The 5 bytes have the following meanings:

Byte	Meaning
0-1	Duration (16-bit word)
2-3	Frequency divisor of the format described in the PCjr section above, except that the two bytes are in the opposite order.
4	Attenuation of the note in the format described above in the PCjr section.

Note that the last three bytes were in the opposite order in version 1 of the AGI interpreter. The above order is the reverse of the order in which they would be output to the TI sound chip.

Each voice's data section in the SOUND resource file is usually terminated by two consecutive 0xFF codes. Another way of checking for the end is to see if it has reached the start of the next voice section, or in the case of the noise voice, the end of the SOUND data.
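
Both end-of-voice checks can be combined when walking through a voice section. The Python sketch below assumes the caller passes the voice's start offset and the offset where the next section (or the end of the data) begins; the function name and the tuple layout are invented for this example.

```python
import struct
from typing import List, Tuple


def read_voice_notes(sound: bytes, start: int, end: int) -> List[Tuple[int, int, int, int]]:
    """Return (duration, byte 2, byte 3, attenuation byte) for each note of one voice."""
    notes = []
    pos = start
    # Stop when fewer than 5 bytes remain before the next section or end of data.
    while pos + 5 <= end:
        # Two consecutive 0xFF codes mark the usual end of a voice.
        if sound[pos] == 0xFF and sound[pos + 1] == 0xFF:
            break
        # Duration is a 16-bit word (low byte first), followed by three single bytes.
        notes.append(struct.unpack_from("<HBBB", sound, pos))
        pos += 5
    return notes
```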

PLAYING THE SOUNDS ON A SOUND CARD

Writing a program to play the tunes will require four pointers that keep track of where in each voice segment the program currently is, since all four voices are played simultaneously. The first voice is the melody and is the voice that is played on the PC speaker in today's PC compatibles, the other voices being ignored. I'd imagine that other platforms such as the Amiga and Macintosh would probably play all three voices.

A program would start by reading each of the four offsets in the header. It would then go through a loop which begins by reading the first note of each voice section. The durations are then monitored and, when each note finishes, the next note for that voice is read. Note that the notes for each voice will usually finish at different times. The program finishes when all of the voice sections have been entirely played. I think this will usually happen for every voice at the same time, but not necessarily.
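
One possible way to organise such a loop is sketched below in Python. Rather than driving real hardware, it only works out the order in which notes would start, using a priority queue in place of four hand-managed pointers. The function name is hypothetical, each note is assumed to be a (duration, payload) pair, and time is counted in whatever unit the duration word uses.

```python
import heapq
from typing import List, Sequence, Tuple


def schedule_notes(voices: Sequence[Sequence[Tuple[int, object]]]) -> List[Tuple[int, int, object]]:
    """Return (start_tick, voice_index, payload) events in playback order."""
    events = []
    # One (next_start_tick, voice_index, position) entry per voice that has notes.
    pending = [(0, v, 0) for v, notes in enumerate(voices) if notes]
    heapq.heapify(pending)
    while pending:
        tick, voice, pos = heapq.heappop(pending)
        duration, payload = voices[voice][pos]
        events.append((tick, voice, payload))
        # When this note finishes, the next note of the same voice starts.
        if pos + 1 < len(voices[voice]):
            heapq.heappush(pending, (tick + duration, voice, pos + 1))
    return events
```

A real player would wait until each start tick arrives and then program the corresponding voice; the scheduling logic stays the same.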

Then of course you could always convert the AGI SOUND to a MIDI file and play that, which will sound a hundred times better :)

CALCULATING FREQUENCIES WHEN PLAYING NOTES ON A SOUND CARD

My program reads in the duration as a 16 bit word. It then loads the two following bytes and calculates the frequency as follows:

Freq. = 111860 / (((Byte-2 AND 0x3F) * 16) + (Byte-3 MOD 16));

The 111860 comes from the PCjr discussion above. Note that the bytes are in the opposite order from that mentioned in the PCjr information.
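
Put together, decoding one 5-byte tone note might look like the Python sketch below. The function name and dictionary keys are made up for illustration, and the handling of a zero divisor is a choice made for this sketch only.

```python
PCJR_TONE_CLOCK_HZ = 111860  # from the PCjr discussion above


def decode_tone_note(note: bytes) -> dict:
    """Decode one 5-byte AGI tone note into its fields."""
    if len(note) != 5:
        raise ValueError("an AGI note is exactly 5 bytes long")
    duration = note[0] | (note[1] << 8)  # 16-bit word, low byte first
    # Byte 2 carries the upper 6 bits, byte 3 the lower 4 bits of the divisor.
    divisor = ((note[2] & 0x3F) * 16) + (note[3] % 16)
    frequency = PCJR_TONE_CLOCK_HZ / divisor if divisor else None
    return {
        "duration": duration,
        "divisor": divisor,
        "frequency": frequency,
        "attenuation": note[4] & 0x0F,
    }
```

Taking the note 03 00 16 80 90 as an example, the duration is 3, the divisor is 352, and the frequency comes out at roughly 317.8 Hz.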

Remember also that the SOUND format includes volume information for each voice. The exact conversion from the decibel values to the volume control on today's sound cards is uncertain at this stage.
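
One plausible starting point, offered purely as an assumption and not as anything confirmed for AGI, is the standard conversion from decibels of attenuation to a linear amplitude factor, gain = 10 ^ (-dB / 20). A Python sketch with a hypothetical function name:

```python
def attenuation_to_gain(value: int) -> float:
    """Map a 4-bit attenuation value to a linear gain between 0.0 and 1.0."""
    value &= 0x0F
    if value == 0x0F:
        return 0.0  # all four bits set: volume off
    decibels = 2 * value  # 2 dB per unit, per the attenuation table
    # Standard amplitude conversion (an assumption here, not taken from the source).
    return 10.0 ** (-decibels / 20.0)
```

The result could then be scaled to whatever volume range a particular sound card or audio library expects.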

APPENDIX 1: SOUND FORMAT SUMMARY

The header consists of four two-byte offsets, one for each voice. The low byte is first, followed by the high byte. Each offset points to the note data for the relevant voice. The note data for a voice consists entirely of five-byte note entries of the following format:

FIRST BYTE
SECOND BYTE

Note duration (low byte and then high byte).

THIRD BYTE

  ---> In the case of a tone voice,

   7  6  5  4  3  2  1  0

   0  .  .  .  .  .  .  .      Always 0.
   .  X  .  .  .  .  .  .      Unused, ignored.
   .  .  F0 F1 F2 F3 F4 F5     6 of 10-bits in frequency count.


  --->  In the case of the noise voice, this byte is equal to zero.

FOURTH BYTE

  ---> In the case of a tone voice,

   7  6  5  4  3  2  1  0

   1  .  .  .  .  .  .  .      Always 1.
   .  R0 R1 R2 .  .  .  .      Register number in TI chip (0, 2, 4).
   .  .  .  .  F6 F7 F8 F9     4 of 10-bits in frequency count.

   F = frequency = 111860 / (((Byte-3 AND 0x3F) * 16) + (Byte-4 MOD 16))
   R = register address


   ---> In the case of the noise voice,

   7  6  5  4  3  2  1  0

   1  .  .  .  .  .  .  .      Always 1.
   .  1  1  0  .  .  .  .      Register number in TI chip (6)
   .  .  .  .  X  .  .  .      Unused, ignored; can be set to 0 or 1
   .  .  .  .  .  FB .  .      1 for white noise, 0 for periodic
   .  .  .  .  .  . NF0 NF1    2 noise frequency control bits

   NF0  NF1       Noise Frequency

    0    0         1,193,180 / 512 = 2330
    0    1         1,193,180 / 1024 = 1165
    1    0         1,193,180 / 2048 = 583
    1    1         Borrowed from voice 3

FIFTH BYTE

   7  6  5  4  3  2  1  0

   1  .  .  .  .  .  .  .      Identifies first byte (command byte)
   .  R0 R1 R2 .  .  .  .      Register number in TI chip (1, 3, 5, or 7).
   .  .  .  .  A0 A1 A2 A3     4 attenuation bits


   A0 A1 A2 A3        Value        Attenuation (decibels)

    .  .  .  1          1                    2
    .  .  1  .          2                    4
    .  1  .  .          4                    8
    1  .  .  .          8                   16
    1  1  1  1                           Volume off


 Register Addresses:

   R0 R1 R2        Parameter

    0  0  0        Voice 1 frequency control number (10 bits)
    0  0  1        Voice 1 attenuation (4 bits)
    0  1  0        Voice 2 frequency control number (10 bits)
    0  1  1        Voice 2 attenuation (4 bits)
    1  0  0        Voice 3 frequency control number (10 bits)
    1  0  1        Voice 3 attenuation (4 bits)
    1  1  0        Noise voice control (4 bits; 3 used)
    1  1  1        Noise voice attenuation (4 bits)

The note data for one voice is terminated by two consecutive 0xFF values.

APPENDIX 2: AGI v1.12 SOUND FORMAT

The sound format used in version 1.12 of the AGI interpreter was quite different from the format described above for AGIv2 and AGIv3. It still uses the PCjr format for the note data but it does not store the duration as a separate field. The best way to describe it is by an example:

90 80 16 B0 A0 15 D0 C0 0E FF E4 00 80 17 A0 16 C0 11 00 80 16 B1 A0 14 C0 12 00 80 16 B2 A0 16 C0 13 00 ...

The first thing to point out is that the PCjr note data is in the opposite order to AGIv2. Secondly, all four parts are included together rather than in separate sections. Taking the above example, let's look at the first note and show the equivalent AGIv2 notation.

90 80 16 --> 03 00 16 80 90

Now, the duration isn't immediately obvious, but we will come to that in a short while. The following three groups of three bytes give the first note for the second part, the third part, and the noise part (at least as far as this example is concerned).

B0 A0 15 --> 03 00 15 A0 B0
D0 C0 0E --> 03 00 0E C0 D0
FF E4 00 --> 33 00 00 E4 FF

The data that follows these four initial notes is basically the set of changes in the note values at each step of 3 duration units. For example,

80 17 --> 03 00 17 80 90

Note that 0x90 doesn't need to be stored because that byte has retained its value. Every 0x00 byte that is encountered is the end of one set of note changes. Each set of note changes is the equivalent of a duration of 3 in the AGIv2 format. Continuing with our example,

A0 16 --> 03 00 16 A0 B0
C0 11 --> 03 00 11 C0 D0

The example now encounters a 0x00 byte which means that the noise voice isn't changed at this point. In fact, from the AGIv2 equivalent note above, you will see that the noise note will not change until 51 (or 0x33) sets of note changes have been processed.

80 16 --> 03 00 16 80 90
B1 A0 14 --> 03 00 14 A0 B1
C0 12 --> 03 00 12 C0 D0

How exactly the AGIv1.12 interpreter knows which voice is having its notes changed, and which bytes of the note are being changed, is not yet certain. On some occasions a set of changes will contain only one byte, which corresponds to one of the bytes that make up one of the voices' note values, but how it knows which one is a mystery to me.

On other occasions, there could be a whole chain of 0x00 bytes, which means that during that whole time none of the voices are changing their note values.
