All sound effects and music in Backcountry are generated from code.
This is a common practice in size-constrained games and demos. Even short sound effect clips can take a few kilobytes when saved as MP3, OGG, or WAV.
So instead, the idea is to use an audio synthesis API and its signal processing methods to generate sounds by transforming and modulating sound waves.
Start with an oscillator generating a basic sound wave, e.g. a sine wave.
Modulate its frequency and amplitude.
Optionally, pipe it into a filter.
Optionally, modulate the filter's settings with another oscillator.
Optionally, modulate oscillators' settings with more oscillators.
Repeat a few times to create music.
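The whole chain maps quite directly onto Web Audio API calls. Here's a minimal sketch of the idea; the node types are real, but the frequencies, timings, and values are arbitrary examples rather than anything taken from Backcountry's source:

```ts
// A basic sine oscillator with frequency and amplitude modulation,
// piped into a filter. All values are arbitrary examples.
const ctx = new AudioContext();

const osc = ctx.createOscillator();
osc.type = "sine";
osc.frequency.setValueAtTime(440, ctx.currentTime);
// Modulate the frequency: slide down an octave over half a second.
osc.frequency.exponentialRampToValueAtTime(220, ctx.currentTime + 0.5);

// Modulate the amplitude with a simple decay envelope.
const gain = ctx.createGain();
gain.gain.setValueAtTime(0.5, ctx.currentTime);
gain.gain.exponentialRampToValueAtTime(0.001, ctx.currentTime + 0.5);

// Optionally, pipe the result into a filter.
const filter = ctx.createBiquadFilter();
filter.type = "lowpass";
filter.frequency.value = 1000;

osc.connect(gain).connect(filter).connect(ctx.destination);
osc.start();
osc.stop(ctx.currentTime + 0.5);
```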
In browsers, the Web Audio API can be used for this purpose.
Backcountry uses our custom solution for creating sounds and playing them back in the user's browser.
Here are a few example sound effects which we created for Backcountry:
Gust of wind.
Gold pick-up.
Player's gun shot.
Bandit's gun shot.
Player's click on a walkable tile.
Horse neigh.
This is our favorite sound effect. We didn't have any room left to include a voxel model of a horse, so instead you can only hear them while visiting the town.
Our synth is super simple, which is a nice way of saying that it's very limited :)
It allows creating instrument descriptions, which always use a fixed graph of audio nodes:
Two oscillators, each with a gain envelope, a frequency envelope, and a detune parameter.
A noise buffer source with a gain envelope.
A master frequency filter.
A master LFO (low frequency oscillator) which can control the other oscillators' detune and the filter's frequency.
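Setting up that fixed graph with the Web Audio API could look roughly like the sketch below. The function name and the exact wiring are illustrative simplifications, not Backcountry's actual code:

```ts
// A rough sketch of the fixed node graph of an instrument.
function create_instrument_graph(ctx: AudioContext) {
    // Master frequency filter feeding the output.
    const filter = ctx.createBiquadFilter();
    filter.connect(ctx.destination);

    // Two oscillators, each with its own gain and detune.
    const oscillators = [0, 1].map(() => {
        const osc = ctx.createOscillator();
        const gain = ctx.createGain();
        osc.connect(gain);
        gain.connect(filter);
        return {osc, gain};
    });

    // A noise buffer source with its own gain.
    const noise_buffer = ctx.createBuffer(1, ctx.sampleRate, ctx.sampleRate);
    const samples = noise_buffer.getChannelData(0);
    for (let i = 0; i < samples.length; i++) {
        samples[i] = Math.random() * 2 - 1;
    }
    const noise = ctx.createBufferSource();
    noise.buffer = noise_buffer;
    const noise_gain = ctx.createGain();
    noise.connect(noise_gain);
    noise_gain.connect(filter);

    // The master LFO can modulate the oscillators' detune and the filter's frequency.
    const lfo = ctx.createOscillator();
    const lfo_gain = ctx.createGain();
    lfo.connect(lfo_gain);
    lfo_gain.connect(filter.frequency);
    for (const {osc} of oscillators) {
        lfo_gain.connect(osc.detune);
    }

    return {oscillators, noise, noise_gain, filter, lfo};
}
```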
There's no tracker or sequencer yet. The synth is best suited to creating single-note sound effects, like gun shots, gusts of wind, or gold bar pick-ups.
That didn't stop us from using it for the soundtrack, however.
The plan for 2020 is to write the sequencer.
An instrument is a set of serializable synth parameters. A sequence of notes played using an instrument is called an AudioTrack. A set of tracks is called an AudioClip.
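In TypeScript terms, the data model might look roughly like this; the field names are hypothetical and chosen only to illustrate the relationships:

```ts
// Hypothetical shapes; the actual fields in Backcountry's source differ.
interface Instrument {
    // Serializable synth parameters: oscillator types, gain and frequency
    // envelopes, detune, filter settings, LFO routing, etc.
    [parameter: string]: number | string | boolean;
}

interface AudioTrack {
    instrument: Instrument;
    // Notes as [beat offset, note number] pairs, for example.
    notes: Array<[number, number]>;
}

interface AudioClip {
    tracks: Array<AudioTrack>;
    // The minimum time before another clip may interrupt this one
    // (the Exit field described below).
    exit: number;
}
```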
A single note is played using the play_note function, which sets up the entire AudioNode graph from scratch every time it's called. For each note, it creates the oscillators, connects them to their GainNodes, creates a BiquadFilterNode, etc.
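A stripped-down sketch of that per-note approach, with a single oscillator and a made-up subset of instrument parameters, could look like this:

```ts
// A simplified sketch: build the node graph from scratch for every note.
// Only one oscillator is shown; the parameter names are made up.
function play_note(
    ctx: AudioContext,
    instrument: {osc_type: OscillatorType; filter_freq: number; attack: number},
    freq: number,
    duration: number,
) {
    const now = ctx.currentTime;

    // Master filter for this note.
    const filter = ctx.createBiquadFilter();
    filter.type = "lowpass";
    filter.frequency.value = instrument.filter_freq;
    filter.connect(ctx.destination);

    // One oscillator with a gain envelope; the real graph adds a second
    // oscillator and a noise source.
    const osc = ctx.createOscillator();
    osc.type = instrument.osc_type;
    osc.frequency.value = freq;

    const gain = ctx.createGain();
    gain.gain.setValueAtTime(0, now);
    gain.gain.linearRampToValueAtTime(1, now + instrument.attack);
    gain.gain.exponentialRampToValueAtTime(0.0001, now + duration);

    osc.connect(gain).connect(filter);
    osc.start(now);
    osc.stop(now + duration);
}
```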
This is a different approach from the one employed by other synths commonly used in size-constrained demos. Two popular ones, Sonant-X and Soundbox, preprocess all sound data into AudioBuffers.
Sonant-X then uses AudioBufferSourceNodes to play them back.
Soundbox generates a Wave file from the buffer and plays it back in an <audio> element.
In both cases, the preprocessing step reimplements some of the Web Audio API's signal processing logic to generate samples, and takes some time upfront when the game is loaded.
By using the Web Audio API directly, our approach uses less code and doesn't require the preprocessing step.
As a potential optimization, we could try to reuse GainNodes and FilterNodes, and only create new Oscillators in play_note.
We added rate limiting to our playback logic to prevent sound effects from playing on top of each other. Each AudioSource component stores the time elapsed since the current clip started playing. Each AudioClip has an Exit field which defines the minimum time after which it's OK to play another clip.
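A sketch of that check, reusing the hypothetical AudioClip shape from earlier; the real component and update code differ in detail:

```ts
// Hypothetical field names; the real AudioSource component differs.
interface AudioSourceState {
    current?: AudioClip;  // The clip currently playing, if any.
    time: number;         // Seconds elapsed since the current clip started.
    trigger?: AudioClip;  // A clip requested this frame.
}

function update_audio_source(source: AudioSourceState, delta: number) {
    source.time += delta;
    if (source.trigger) {
        // Only interrupt the current clip after its Exit time has passed.
        if (!source.current || source.time > source.current.exit) {
            source.current = source.trigger;
            source.time = 0;
            // …schedule the new clip's notes for playback here.
        }
        source.trigger = undefined;
    }
}
```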
Once scheduled for playback by an AudioContext, individual notes cannot be canceled. This limitation of the API meant that when changing scenes (e.g. going from the town into the desert), the music scheduled in one scene could play over a track from another scene.
We solved this by closing the current scene's AudioContext instance (which stops all sound) and creating a new one together with the new scene. It wasn't the most elegant approach but we ran out of time to implement a better one.
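In outline, the workaround amounted to something like this (the names are illustrative):

```ts
// Tear down the old context so that nothing it scheduled keeps playing,
// then create a fresh one for the new scene.
let audio_context = new AudioContext();

function change_scene(setup_scene: (ctx: AudioContext) => void) {
    audio_context.close();
    audio_context = new AudioContext();
    setup_scene(audio_context);
}
```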
On some platforms, e.g. on Safari, AudioContext instances start paused and need to be unlocked by a user interaction. This prevents sounds from playing right after page load.
In our case, recreating AudioContexts on each scene change means that every new AudioContext starts paused. For scene transitions triggered by the player walking into a collider, the scene remains silent until the next command issued by the user.
A more correct approach to solving this problem would be to schedule only a few notes from the soundtrack at a time, and keep track of the time remaining until the next batch of notes must be scheduled.
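A sketch of that look-ahead scheduling, with illustrative constants and a hypothetical schedule_note helper; this isn't what Backcountry ships:

```ts
// Keep only a short look-ahead window of notes scheduled, and top it up
// periodically from a timer.
const LOOKAHEAD = 0.5; // Seconds of audio to keep scheduled ahead of now.
const INTERVAL = 100;  // How often to top up the schedule, in milliseconds.

function start_scheduler(ctx: AudioContext, notes: Array<{time: number; freq: number}>) {
    const start = ctx.currentTime;
    let next = 0; // Index of the next note that hasn't been scheduled yet.
    setInterval(() => {
        while (next < notes.length && start + notes[next].time < ctx.currentTime + LOOKAHEAD) {
            schedule_note(ctx, notes[next].freq, start + notes[next].time);
            next++;
        }
    }, INTERVAL);
}

function schedule_note(ctx: AudioContext, freq: number, when: number) {
    const osc = ctx.createOscillator();
    osc.frequency.value = freq;
    osc.connect(ctx.destination);
    osc.start(when);
    osc.stop(when + 0.25);
}
```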
For the music score, heavily inspired by Ennio Morricone's theme from The Good, the Bad and the Ugly, we created instruments for the main melody and the bass line, and then hand-edited the sequence of notes in a text editor.
I wouldn't want to have to do this again.
We were helped by Antoni Korzeniowski, a friend who is a musician. He "remastered" the melody of the main theme and composed the bass line.
We didn't want the main theme to become repetitive. It only plays in full when the player visits the town.
When out on a mission, the recognizable jingle plays from time to time, while the rhythmic and sinister bass line serves as the background for the gameplay.
So how does our custom synthesizer and player compare to other solutions size-wise?
The minified player code is 1630 bytes. It compares favorably to vanilla Sonant-X (5677 bytes), to sonantx-reduced.js modified by Dominik Szablewski for Underrun (3170 bytes), and to Soundbox's player-small.js (2319 bytes).
Compressed with gzip as a single file, our player is only 732 bytes. The other players, minified and compressed, weigh in at 2088 bytes (vanilla Sonant-X), 1324 bytes (sonantx-reduced.js), and 1325 bytes (soundbox/player-small.js). When included in a larger codebase, compression is likely to be even better for all solutions.