We are proud to publish the following article by Blake Troise (ThinkSpace, ProtoDome) as part of our contributor articles series. Feel free to leave comments, and do let us know if you would like to send us articles to share with the wider community!
Contributor: Blake Troise
A programmable sound generator (PSG) is an integrated circuit (IC) with the ability to generate sound by synthesizing basic waveforms. PSGs are often called sound chips; however, not all sound chips are PSGs. PSGs were designed to be instructed by software commands and would usually be housed alongside a microprocessor as part of a computer system. The general benefit of the sound chip was that audio processing could be delegated to a dedicated system, freeing processing cycles for other functions, or simply, as Radio-Electronics magazine explained in 1981, “[controlling] music or sound effects from software, without overtaxing the computer”. One of the main reasons for the popularity of PSGs was that microcontrollers capable of generating pitches had become cheap enough to manufacture at the beginning of the 1980s, making them a commercially viable option for computational sound generation as part of an affordable home system. As such, PSGs were commonly utilised in the video game systems of the 1980s to mid 1990s, for example the Nintendo Entertainment System (Ricoh 2A03/2A07), Atari 2600 (Atari TIA), Sega Master System (Texas Instruments SN76489A, Yamaha YM-2413) and numerous others. The chips were also included in early home computers, especially those with gaming capabilities, the most popular example being the Commodore 64’s PSG, the SID chip. Perhaps one of the main reasons for the ubiquity of PSGs in video game applications is the medium’s requirement for a multimedia experience, with the desire for sound effects and music in electronic games predating software games.
With inexpensive computers widespread, the eighties saw programmer and composer practices converge, driven by the accessibility of software and the resulting growth in entertainment software development. Many of these musicians, such as Hirokazu ‘Hip’ Tanaka, treated the computer (in Tanaka’s case, the Nintendo Entertainment System (NES)) as a means of expressing ‘serious’ music and approached their composition as such. Each PSG, however, had a set of musical limitations the composer would have to adhere to. One of the most compositionally influential of these restrictions (and a defining characteristic of computer music of the period) is the PSG’s limited polyphony.
The number of voices a PSG could provide varied between the different chips. The NES’ Ricoh 2A03/2A07 had five separate ‘channels’ (individual functions for generating a single waveform): two pulse wave channels, one triangle channel, one pseudo-random noise generator and a rarely used Delta Modulation Channel (DMC) for playing Differential Pulse-Code Modulation (DPCM) samples (Figure 1). The Commodore 64 had four channels with a very similar set of waveforms to the Nintendo Entertainment System; however, unlike the NES, these were not restricted to a single wave function. Other chips such as the General Instrument AY-3-8910 (found in the MSX computer and Sinclair ZX Spectrum 128, to name two popular examples) included very similar waveforms with a very similar number of channels; in the AY-3-8910’s case, three square wave channels and a pseudo-random noise generator. The limited number of channels each PSG provided posed a significant compositional challenge: how best to maximise the musical content with only a few voices.
Figure 1. Oscilloscope examples of the basic common waveforms various PSGs provided. Top Left: Saw Wave. Top Right: Pulse/Square Wave. Bottom Left: Triangle Wave. Bottom Right: Noise.
Writing music with limited polyphony is not unique to the sound chip; for hundreds of years composers have written for three, two and even single voices. Pieces such as Adagio in Bb (for two clarinets and three basset horns) by Mozart, Jägerlied by Schubert (for two horns or voices), most of Bach’s famous chorales and even solo piano works for a single performer make use of a small collection of monophonic voices (or fingers!) to create music. Even commercial synthesisers during the seventies and eighties would occasionally be dedicated to producing a single, monophonic voice and often, when polyphony was available, were still limited to a maximum of only eight voices. What separates these composers’ works (and other electronic hardware) from PSG composition is the idea of necessity. Whilst traditionally the basic form of the woodwind ensemble is the quintet, this is usually a creative choice and can be “expanded and contracted to meet the needs of the composer”. The piano composer in desperate need of further polyphony can simply add another performer to allow access to a further set of ten (or more) notes. No such luxury was (in most cases) available to the sound chip composer.
Perhaps the simplest (and most common) approach seen in PSG composition was to consider each channel as an individual instrument, a similar method to the aforementioned ensemble writing. The iconic Super Mario Bros. theme by K. Kondo is a good example of this process (Figure 2). All three pitched channels of the Nintendo Entertainment System’s PSG move together as a three-part harmony for the first thematic section. In the second section of the piece, the triangle channel diverges from rhythmic unison and plays a very simple arpeggiating bassline. This approach can be seen in video game soundtracks such as Mega Man 2 (1988) by T. Tateishi and M. Matsumae (programmed by Yoshihiro Sakaguchi), Legacy of the Wizard (1989) by Y. Koshiro and Castlevania (1986) by S. Terashima and K. Yamashita (programmed by H. Maezawa) (Figure 3). In fact, these soundtracks work so much like a traditional three-part ensemble (with a fourth percussion instrument) that they have a plethora of YouTube a cappella covers (of varying success) utilizing the original writing for each channel.
First thematic section.
Second thematic section.
Figure 2. Manuscript representation of the Super Mario Bros theme.
Figure 3. Manuscript representation of the Legacy of the Wizard ‘intro’ theme.
The ubiquity of this practice has resulted in the emergence of a common technique found in Nintendo Entertainment System music, dubbed the Famichord by chip musician Linus ‘LFT’ Akesson (Figure 4). The Famichord is essentially the removal of the fifth from a four-note major or minor seventh chord (Maj7omit5 or m7omit5) to fit the NES’ three-channel limit. Whilst not unusual in wider musical practice, the fact that many composers independently utilised this technique makes it distinctive in PSG compositional procedure. Examples can be found in the NES soundtracks to Mega Man 2 (1988), DuckTales (1989), Super Mario Bros. (1985) and various others (Figure 5).
Figure 4. Manuscript example of the C Major 7th Famichord.
Figure 5. D minor seventh (b. 1) and C major seventh (b. 3) Famichord in Super Mario Bros. ‘Invincibility’ theme.
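The omit-five voicing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not anything taken from a game's sound driver: the names `SEVENTH_CHORDS` and `famichord` are invented here, and pitches are reduced to pitch classes with sharps only.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets above the root for the two four-note seventh chords
# mentioned in the text. The offset 7 is the perfect fifth.
SEVENTH_CHORDS = {
    "maj7": (0, 4, 7, 11),
    "m7": (0, 3, 7, 10),
}


def famichord(root, quality):
    """Return the three pitch classes left once the fifth is omitted."""
    root_index = NOTE_NAMES.index(root)
    intervals = [i for i in SEVENTH_CHORDS[quality] if i != 7]
    return [NOTE_NAMES[(root_index + i) % 12] for i in intervals]


print(famichord("C", "maj7"))  # ['C', 'E', 'B']
print(famichord("D", "m7"))    # ['D', 'F', 'C']
```

Each of the three remaining notes can then be given to one pitched channel, which is all the NES has to offer.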
The ‘ensemble’ method of PSG writing was possibly so popular because it was a very simple way of interfacing with the chip. According to the documentation for the various sound chips, the microcontrollers expected to be instructed in a simple ‘channel, pitch, duration’ format, much as the MIDI (Musical Instrument Digital Interface) standard operates today. A melody is easily built from instructions in this way and can be (and often was) very simply converted from score to code. This could be done by the composer, or given to a programmer to transcribe, which was common practice. The downside to this technique is that the music is limited to as many instruments as there are PSG channels, resulting in a texturally ‘thin’ soundscape. To fill the space, the prevalent mentality of trying to recreate externally composed music on the PSG had to shift to writing music for the PSG, treating the device as a unique, independent sonic medium.
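As a rough illustration of how directly a score maps onto a ‘channel, pitch, duration’ stream, the following Python sketch transcribes a short melody into such events. The `NoteEvent` structure, the channel index and the tick unit are all assumptions made for the example; real chips and drivers each used their own encodings. The frequency helper uses the standard equal-temperament formula with A4 at 440 Hz.

```python
from typing import NamedTuple


class NoteEvent(NamedTuple):
    channel: int   # hypothetical channel index, 0 = first pulse channel
    pitch: int     # MIDI-style note number, 60 = middle C
    duration: int  # length in ticks (an assumed unit)


def transcribe(channel, melody):
    """Turn (pitch, duration) pairs read off a score into channel events."""
    return [NoteEvent(channel, pitch, duration) for pitch, duration in melody]


def frequency_hz(pitch):
    """Equal-tempered frequency for a MIDI-style note number (A4 = 440 Hz)."""
    return 440.0 * 2 ** ((pitch - 69) / 12)


# A C major triad played as a melody on one channel.
events = transcribe(0, [(60, 12), (64, 12), (67, 24)])
for event in events:
    print(event, round(frequency_hz(event.pitch), 2))
```

One event per note per channel is the whole data model, which is why the music can never have more simultaneous parts than the chip has channels.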
The first technique to artificially expand the sonic environment is channel sharing, or splitting parts over a single voice. This technique is especially effective when two instruments can be distinguished from one another utilising the unique functions of the PSG channel. Often, each channel had an alterable amplitude for ADSR (attack, decay, sustain and release) envelope shaping, and pulse waves frequently had the ability to alter their duty cycle (the proportion of each wave cycle spent high rather than low); both are techniques for textural changes and emulation of instrumental transients. Even more basically, a change in pitch will dictate a sequence’s sonic responsibility (Figure 6). Figure 6 is a piece written by the author for a single beeper, demonstrating the channel sharing procedure at its extreme. The bass line is separated from the melody mainly by pitch, yet gives the impression of being an individual instrument. Percussion is discriminated from other instrumentation by rapidly altering the channel’s pitch semi-randomly, with a negative correlation. The final element in the expansion of the soundscape is structural: by frequently altering the instrumental character in a rhythmic fashion, the listener has the impression that multiple voices are present, all achieved on a single voice. This false polyphony significantly alters the compositional process; the chip musician looks for ‘gaps’ in the melody in which to fill out the composition.
Figure 6. Scored extract from original composition for an ATmega328 microcontroller, using only a single voice. Percussion is created by rapidly changing pitch.
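To make the duty cycle concrete, here is a minimal Python sketch that builds one period of a pulse wave from a period length and a duty value. The function name `pulse_wave` and the sample representation (1 for high, 0 for low) are choices made for this example only. The values 12.5%, 25% and 50% are among the settings offered by the NES pulse channels.

```python
def pulse_wave(period, duty, cycles=1):
    """Return `cycles` periods of a pulse wave as a list of 1/0 samples.

    `period` is the cycle length in samples and `duty` is the fraction
    of each cycle spent high.
    """
    high = round(period * duty)
    one_cycle = [1] * high + [0] * (period - high)
    return one_cycle * cycles


print(pulse_wave(8, 0.5))    # [1, 1, 1, 1, 0, 0, 0, 0]  (a square wave)
print(pulse_wave(8, 0.25))   # [1, 1, 0, 0, 0, 0, 0, 0]
print(pulse_wave(8, 0.125))  # [1, 0, 0, 0, 0, 0, 0, 0]
```

The pitch is identical in all three cases because the period is unchanged; only the timbre differs, which is what lets one channel pass for different instruments.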
PSGs often had multiple channels, however, and more waveforms than a single pulse/square wave. The Commodore 64 was particularly flexible as each channel could alter its waveform, allowing for more freedom when composing for the chip. One notable example of channel sharing across several voices is T. Follin’s work on the NES game Silver Surfer (1990), regarded as one of the best PSG soundtracks of the era. Follin utilises the triangle channel as both bass and drums whilst frequently switching melodic ‘licks’ between available channels. Lead instrumentation is given to the two pulse channels, handled by varying ADSR envelopes and altering pulse width; however, when a pulse channel is unused, it doubles up with the bass to provide a thicker texture. Each channel is always doing something, helping to imply a greater polyphony than simply four voices.
The most common use of the channel sharing technique in other soundtrack writing was on the pseudo-noise channel included in various PSGs, typically dedicated to accompanying the melodic or pitched elements with percussion, mimicking the characteristics of an acoustic drum kit (Figure 7). This technique works because a repeated hi-hat figure is still heard over a snare, kick or other percussive element even though, when all are shared on a single channel, it is not actually present at that moment. Again, this fills the soundscape with a ‘false’ polyphony, using listener expectation to ‘fill in the gaps’.
Figure 7. Typical PSG noise channel writing for a drum kit pattern. Kick, snare and hi-hat are all represented on a single channel. Vertical placement is representative of speed of noise randomization, which is perceived as a change in pitch.
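One way to picture this sharing is as a priority merge: at each step, the single noise channel plays the most important drum that has a hit there, and anything beneath it is dropped. The Python sketch below assumes a sixteen-step bar, an invented pattern notation ("x" for a hit, "." for a rest) and an assumed priority of kick over snare over hi-hat; none of this is taken from a particular driver.

```python
# One bar of sixteen steps per drum; "x" marks a hit, "." a rest.
KICK = "x.......x.x....."
SNARE = "....x.......x..."
HIHAT = "x.x.x.x.x.x.x.x."

# Layers listed from highest to lowest priority (an assumption).
LAYERS = [("kick", KICK), ("snare", SNARE), ("hat", HIHAT)]


def share_noise_channel(layers):
    """Merge drum layers onto one channel, keeping the top-priority hit."""
    steps = len(layers[0][1])
    merged = []
    for step in range(steps):
        sound = None
        for name, pattern in layers:
            if pattern[step] == "x":
                sound = name
                break  # lower-priority hits on this step are dropped
        merged.append(sound)
    return merged


merged = share_noise_channel(LAYERS)
print(merged)
print("hi-hat hits kept:", merged.count("hat"), "of", HIHAT.count("x"))
```

With these patterns only three of the eight hi-hat hits survive the merge, yet the regular spacing of those that remain is what invites the listener to hear the figure as continuous.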
Perhaps the most idiosyncratic and recognized feature of PSG writing is the super-fast arpeggio. Essentially, this is the same technique as instrumental channel sharing, but with the unique purpose of providing a ‘harmonic compression’: reducing all explicitly stated harmonic content to the fewest voices possible, liberating channels for other purposes. Both the Famichord and the super-fast arpeggio are solutions to the same problem; however, the Famichord can only accurately represent a chord of four notes or fewer using all available channels, whereas an arpeggio can cover any number of extensions by rapidly iterating through the chord on a single channel (Figure 8). This technique’s conception is sometimes credited to M. Galway with the score to Kong Strikes Back (1985) for the Commodore 64, and it often appears in soundtracks throughout the computer’s lifespan.
Figure 8. A variety of super-fast arpeggio forms, all based on a C Major 9th chord.
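The simplest of these forms, a straight upward cycle through the chord tones, can be sketched as follows. The function name `fast_arpeggio`, the note spelling and the choice of sixty-four steps per bar are assumptions for the example; composers used many other orderings, as Figure 8 suggests.

```python
from itertools import cycle, islice


def fast_arpeggio(chord, steps):
    """Cycle through the chord tones to fill `steps` slots on one channel."""
    return list(islice(cycle(chord), steps))


# C major ninth: five chord tones, more than the NES has pitched channels.
c_major_ninth = ["C5", "E5", "G5", "B5", "D6"]

bar = fast_arpeggio(c_major_ninth, 64)
print(bar[:10])
print(len(bar), "notes on a single channel")
```

All five chord tones are stated while only one channel is occupied, at the cost of sixty-four note events in the bar rather than one per voice.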
The main drawback to the super-fast arpeggio, and many of the channel sharing techniques, is that they are more difficult to program, demand more processing power and use more memory than simple three- or four-part writing. The Famichord may represent less extended harmony, with a less elegant approach to wave function economy; however, it requires only three channel instructions to the sound chip to create a chord. The arpeggiation technique requires multiple instructions to a single channel in a very small space of time (Figure 9). Often, the read-only memory (ROM) on software distribution media was very small and developers would limit their music data due to memory constraints. Gratuitous use of musical content would quickly fill the available memory, encroaching on space needed for the rest of the software.
A l64 o5 @00 c e d g e b g >c <g b e g d e c d
A c e d g e b g >c <g b e g d e c d
A c e d g e b g >c <g b e g d e c d
A c e d g e b g >c <g b e g d e c d
Super-Fast Arpeggio, C Major Ninth, 1 bar
A l1 o5 @00 b
B l1 o5 @00 e
C l1 o5 c
Famichord C Major Seventh, 1 bar
Figure 9. Comparison of a single-bar Famichord command with a single-bar arpeggio command in ppMCK MML.
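The difference in Figure 9 can be put into numbers by counting note commands in each listing. The Python sketch below does this with a regular expression; it assumes the arpeggio bar is a sixteen-note figure repeated four times to fill a bar of sixty-fourth notes, and it treats the note count only as a rough proxy for data size, since the bytes actually stored depend on how a given driver encodes its music.

```python
import re

# One sixteen-note figure, assumed to repeat four times to fill the bar.
ARPEGGIO_FIGURE = "c e d g e b g >c <g b e g d e c d"
ARPEGGIO_BAR = "A l64 o5 @00 " + " ".join([ARPEGGIO_FIGURE] * 4)

FAMICHORD_BAR = "\n".join([
    "A l1 o5 @00 b",
    "B l1 o5 @00 e",
    "C l1 o5 c",
])


def count_notes(mml):
    """Count note commands: lowercase letters a to g standing alone.

    Track names (A, B, C) are uppercase and commands such as l64, o5
    and @00 are not single letters from a to g, so none of them match.
    """
    return len(re.findall(r"\b[a-g]\b", mml))


arpeggio_notes = count_notes(ARPEGGIO_BAR)
famichord_notes = count_notes(FAMICHORD_BAR)
print(arpeggio_notes, famichord_notes)  # 64 3
```

Under these assumptions a single bar of arpeggio costs roughly twenty times as many note events as the Famichord, before any looping or compression a driver might offer.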
It seems that, whilst PSG music was dramatically shaped by polyphonic restrictions, perhaps the most pervasive limiting factor is ultimately the memory the composer has to work with. As with the Famichord versus the super-fast arpeggio scenario, the decision seems to be founded less in compositional choice than in a constant creative balance between musicality and pragmatics. Perhaps the main difference between the Super Mario Bros. and Silver Surfer soundtracks was not how well the latter utilised channel sharing, or the simplicity of the former’s harmonic writing, but how much memory was allocated to each respective composition.