Composing & Inserting Music for Pokémon Decomp Projects
Table of contents
- Part 1: Introduction
- Part 2: Understanding the GBA M4A Engine
- Part 3: Understanding Voicegroups
- Part 4: Reaper + poryaaaa CLAP Plugin
- Part 5: Anvil Studio + poryaaaa
- Part 6: Create .wav Files or Listen to Existing Songs
Part 1: Introduction
The Pokémon games use an audio engine called m4a (a.k.a. Sappy or MusicPlayer2000/MP2K), which was a very popular engine used in many GBA games. This guide will cover the capabilities of m4a, as well as how to insert/modify your own songs with a focus on using an audio synthesizer tool called poryaaaa.
Who is this guide for?
This guide assumes you already have a baseline understanding of how to use a Pokémon decomp project (e.g. pokeemerald). It is aimed at people who want to:
- Learn about the m4a engine's capabilities and limitations.
- Import MIDI files into the game and preview them with accurate sound.
- Compose music in a DAW and insert it into the game.
What Is poryaaaa?
Making/inserting music for pokeemerald requires going through a tedious process of (1) making modifications to the MIDI/voices, (2) recompiling the ROM, and (3) testing the song in-game to see if it sounds right.
To remove that pain, poryaaaa is an audio synthesizer that emulates the GBA's m4a sound engine and loads audio data directly from your decomp project. This means you can quickly make edits without needing to repeatedly compile your ROM.
poryaaaa exists as three separate tools:
- poryaaaa.clap: CLAP plugin for DAWs (like Reaper)
- poryaaaa_standalone: Standalone synthesizer GUI that receives and plays MIDI events from other programs (like Anvil Studio)
- poryaaaa_render: Command line tool that creates .wav files or plays songs through your speakers
If you aren't sure which poryaaaa tool you should be using:
- "I use a DAW to compose/tweak music (e.g. Reaper)": poryaaaa.clap CLAP plugin
- "I use a MIDI editor/tracker (e.g. Anvil Studio)": poryaaaa_standalone + virtual MIDI cable (e.g. loopMIDI)
- "I have a MIDI file that already exists in the decomp, and I just want to listen to it or create a .wav": poryaaaa_render
poryaaaa can be downloaded from its Releases page on GitHub.
Part 2: Understanding the GBA M4A Engine
Before we get into inserting/editing music with the help of poryaaaa, we need to learn about the capabilities of the m4a audio engine. Having a decent grasp of how it works and what its limitations are is important for making your own music.
How GBA m4a Engine Works
The m4a engine is the standard GBA sound engine used by a large number of GBA games. It's implemented in pokeemerald across several files, including src/m4a.c, src/m4a_1.s, src/m4a_tables.c.
Songs are represented as sequences of m4a bytecode commands, not MIDI. However, those commands are fairly equivalent to MIDI concepts:
- Note on/off
- Program change (instrument/voice selection)
- Volume, panning, pitch bend, pitch modulation
Each song uses a "voicegroup" that defines up to 128 instruments/voices. For example, Littleroot Town's song's voicegroup contains pizzicato strings, fretless bass, piano, and many others. Typically, songs have their own dedicated voicegroups, but it's very possible and reasonable for multiple songs to use the same voicegroup.
For the rest of this document, we'll use the term "voice" to refer to an instrument/voice within a voicegroup.
Voice Types
Voices come in many different types.
- DirectSound (PCM samples): This is typically the most common and prominent voice type. It plays back pre-recorded audio samples with a volume envelope and optional looping. Most melodic instruments and percussion use this. For example, a violin or piano.
- DirectSound No-Resample: This plays at a fixed pitch regardless of the note. Used for drums/percussion/sound-effects where you want the sample played as-is.
- Square Wave 1: CGB (Game Boy Color) synthesizer. It produces a square wave with configurable duty cycle (12.5%, 25%, 50%, 75%). It supports hardware pitch sweep. The hardware unit that produces this square wave is effectively the same as the hardware from the Game Boy Color.
- See Wikipedia if you don't know what a square wave sounds like or looks like.
- Square Wave 2: Same as Square 1 but without pitch-sweep capability. Use this for a second square voice.
- Programmable Wave: CGB synthesizer that plays back a 16-byte (32-nybble) custom waveform. Useful for bass, pads, and other sounds not achievable with square waves. Again, the hardware unit that produces this programmable wave is effectively the same as the Game Boy Color.
- Noise: CGB noise generator. 15-bit LFSR for white noise, or 7-bit LFSR for metallic/tonal noise. Can be used for percussion like hi-hats, snares, and whooshing sound effects. And once again, this uses the same hardware from the Game Boy Color.
- Keysplit: A meta-voice that selects different sub-voices based on the note played. Used for voices like piano where different directsound samples cover different pitch ranges.
- Drum Kit (keysplit_all): A meta-voice where each MIDI note maps to a completely different voice. This is primarily used for percussion tracks.
ADSR Envelopes
An ADSR volume envelope is a ubiquitous strategy in music and sound design to control how an instrument's volume changes while it's being played. Naturally, all m4a voices have an ADSR configured. (A=Attack, D=Decay, S=Sustain, R=Release).
- Every voice has Attack, Decay, Sustain, Release parameters.
- DirectSound voices: each parameter ranges from 0–255.
- Attack: 1 = very slow fade-in, 255 = instant
- Decay: 0 = no decay, 255 = slowest decay
- Sustain: 0 = silence, 255 = full volume of the note
- Release: 0 = instant, 255 = slowest fade-out
- CGB voices (Square, Wave, Noise): limited ranges
- Attack: 0 = instant, 7 = slowest fade-in
- Decay: 0 = no decay, 7 = slowest decay
- Sustain: 0 = silence, 15 = full volume of the note
- Release: 0 = instant, 7 = slowest fade-out
One limitation with the CGB voices is that m4a is only capable of playing them at 16 distinct volumes (and only 4 for the Programmable Wave voice). This means you can't do very subtle swells or long, smooth fades the way you can with directsound voices. However, in practice, this isn't too limiting.
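To get a feel for how coarse those volume steps are, here is a minimal Python sketch that snaps a smooth fade-out onto a fixed number of volume levels. It assumes the levels are evenly spaced, which is a simplification for illustration rather than a description of the actual hardware, and `quantize` is a hypothetical helper name.

```python
def quantize(level: float, steps: int) -> float:
    """Snap a 0.0-1.0 volume to the nearest of `steps` evenly spaced levels.

    Evenly spaced levels are an assumption made for this illustration.
    """
    top = steps - 1
    return round(level * top) / top


# A smooth 64-point fade from full volume down to silence.
fade = [1 - i / 63 for i in range(64)]

for name, steps in [("Square/Noise", 16), ("Programmable Wave", 4)]:
    distinct = sorted({quantize(v, steps) for v in fade})
    print(f"{name}: {len(distinct)} distinct volumes in the fade")
```

The 64-point fade collapses to 16 distinct values for the square and noise voices and to only 4 for the programmable wave, which is why long fades on CGB voices sound stepped.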
Polyphony and Channel Limits
"Polyphony" is a fancy way of saying "how many voices can be playing simultaneously". m4a has some hard limits, and if your song exceeds the limit at any given point during playback, it will result in some of those voices being ignored and not making sound.
- DirectSound channels: Configurable up to a max of 12. pokeemerald uses 5 by default.
- Increasing this value is generally fine, but more channels means the CPU has to do more work, and could contribute to performance issues in the game.
- CGB channels: Exactly one per type! This is a hardware limitation. For example, the hardware is only capable of playing one programmable wave at a time, so the song can only use one programmable wave voice at a time. Same thing for square 1, square 2, and noise.
- 1× Square 1, 1× Square 2, 1× Programmable Wave, 1× Noise
- To be clear, you cannot have two simultaneous Square 1 notes--you would need to use one Square 1 and one Square 2 to have two square waves playing simultaneously.
- Total tracks: up to 10 MIDI tracks/channels for background music. (sound effects have smaller track limits.)
With 5 directsound voices, a typical song arrangement might be composed of 1 drum kit + 1 bass + 1 melody + 1 harmony + 1 pad. That's already at the 5-simultaneous voice limit! Of course, not all 5 voices would likely be playing throughout the entire song, so you can get creative and use other voices while others aren't playing any notes. Use CGB voices to play more sounds, since they don't contribute to the directsound polyphony limit.
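If you want to sanity-check an arrangement against the directsound limit, the following Python sketch counts the peak number of overlapping notes. It is a rough model only: it treats each note as a plain (start, end) pair and ignores release tails, which in practice can keep a channel busy after the note ends. The function name and the sample note list are made up for illustration.

```python
def max_polyphony(notes):
    """Return the peak number of overlapping notes.

    `notes` is a list of (start_tick, end_tick) pairs for directsound notes.
    """
    events = []
    for start, end in notes:
        events.append((start, 1))   # note begins
        events.append((end, -1))    # note ends
    # Sorting puts a note end (-1) before a note start (+1) at the same tick,
    # so back-to-back notes are not counted as overlapping.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


DIRECTSOUND_LIMIT = 5  # pokeemerald's default directsound channel count

# Hypothetical note data, in ticks.
notes = [(0, 96), (0, 48), (24, 72), (48, 96), (48, 60), (50, 58), (52, 56)]
peak = max_polyphony(notes)
print(f"peak={peak}, over limit: {peak > DIRECTSOUND_LIMIT}")
```

Here the peak is 6, one more than the default limit, so one of those notes would be dropped unless a part is moved to a CGB voice or the channel count is raised.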
Supported MIDI Controls
In pokeemerald, when you run make to build the ROM, the mid2agb tool is responsible for converting MIDI files to .s files, which are the final representation of the song data that go into the ROM. You don't need to understand how mid2agb works, but you do need to understand the MIDI commands that it supports, so that you can use them in your MIDI songs.
- Note On / Note Off: This is the most common MIDI event. It plays notes at a given velocity/volume.
- Program Change: Selects which voice to use (0–127) from the song's voicegroup.
- CC 1 (Mod Wheel): LFO depth (0-127). Controls how deep the note's vibrato goes.
- CC 7 (Volume): Current overall track volume (0–127). Note, this is different from Note On/Off's velocity. The note's velocity combines with this overall track volume to produce the actual volume of a note. You can change this Volume MIDI event while a note is playing, too, whereas a note's velocity only applies once when the note is triggered.
- CC 10 (Pan): Stereo position (64 = center, 0 = hard left, 127 = hard right).
- CC 20 (Bend Range): Pitch bend range, roughly in semitones (default: 2). For example, if set to 12, a full pitch bend would result in an octave of pitch change.
- CC 21 (LFO Speed): Rate of vibrato effect. (0-127, higher means faster)
- Pitch Bend: How far to bend the current note's pitch down or up (-64 to +63). The extent of the pitch bend is controlled by CC 20 (Bend Range) described above.
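The relationship between Pitch Bend and Bend Range can be worked through with a little arithmetic. The Python sketch below assumes a simple linear mapping in which a bend of -64 corresponds to the full bend range downward; the engine's exact rounding may differ, and both function names are hypothetical.

```python
def bend_to_semitones(bend: int, bend_range: int = 2) -> float:
    """Map a pitch bend value (-64..+63) to a semitone offset.

    Assumes a linear mapping where -64 equals the full bend range.
    """
    return bend / 64 * bend_range


def frequency_ratio(semitones: float) -> float:
    """Equal-temperament frequency ratio for a semitone offset."""
    return 2 ** (semitones / 12)


for bend, bend_range in [(-64, 12), (32, 2), (0, 2)]:
    semis = bend_to_semitones(bend, bend_range)
    print(f"bend={bend:+d} range={bend_range}: "
          f"{semis:+.2f} semitones, ratio {frequency_ratio(semis):.4f}")
```

With a bend range of 12, a full downward bend of -64 gives -12 semitones, which halves the frequency (one octave down). With the default range of 2, a bend of +32 raises the pitch by one semitone.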
Part 3: Understanding Voicegroups
What is a Voicegroup?
A voicegroup is a list of up to 128 voices. Each song references exactly one voicegroup, but it's possible for multiple songs to share the same voicegroup.
As mentioned earlier, the MIDI "Program Change" event selects which voice will be used by the track's notes. So, a Program Change of 0 would mean the midi notes will produce sound using the first voice in the voicegroup (which is usually a drumset), Program Change 17 would use voice index 17, and so on.
How Voicegroups are Organized in pokeemerald
Voicegroups are defined in individual files (e.g. sound/voicegroups/<name>.inc).
Sometimes, voices can have sub-voicegroups, like drumsets: sound/voicegroups/drumsets/<name>.inc
For keysplit sub-voicegroups (like trumpet or piano): sound/voicegroups/keysplits/<name>.inc
All voice groups are included via sound/voice_groups.inc. So if you add your own voice group, you need to make sure it's included there.
Note, pokefirered and pokeruby currently use a different layout, which is one giant voicegroups file: sound/voice_groups.inc
Reading a Voicegroup File
Let's take a look at a real voicegroup file, sound/voicegroups/petalburg.inc, and see some examples of voice definitions. The first thing you'll notice is that there are many identical voices, all with type voice_square_1. Songs in pokeemerald tend to define all 128 voices, with instruments somewhat mapped to traditional MIDI programs (e.g. piano at index 1, strings at index 48). The unused voices tend to be set to voice_square_1.
Let's look at the interesting voices:
voice_group petalburg
voice_keysplit_all voicegroup_petalburg_drumset
voice_keysplit voicegroup_piano_keysplit, keysplit_piano
...
voice_square_1 60, 0, 0, 3, 0, 2, 0, 0
voice_square_2 60, 0, 3, 0, 6, 0, 0
...
voice_directsound 60, 0, DirectSoundWaveData_sc88pro_fretless_bass, 255, 253, 0, 149
voice_programmable_wave 60, 0, ProgrammableWaveData_1, 0, 7, 15, 1
These voice definitions are created with the voice_ macros, which can be found in asm/macros/music_voice.inc. Each macro controls the configuration of the voice type. The last 4 numbers are the ADSR (Attack, Decay, Sustain, Release) volume envelope described earlier in this guide. The rest are fairly self-explanatory, and can be referenced in music_voice.inc if necessary.
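Because a voice's position in the file is what a Program Change number refers to, it can be handy to list the voices with their indices. Here is a small Python sketch that does so for voicegroup text. It assumes one macro per line and that assembler comments start with `@`; the function name and the sample voicegroup named `example` are made up for illustration.

```python
def parse_voicegroup(text: str):
    """Return (name, voices); voices is a list of (macro, args) in index order."""
    name = None
    voices = []
    for raw in text.splitlines():
        line = raw.split("@")[0].strip()  # assumption: '@' starts a comment
        if not line:
            continue
        parts = line.split(None, 1)       # macro name, then the rest of the line
        macro = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        if macro == "voice_group":
            name = rest.strip()
        elif macro.startswith("voice_"):
            args = [a.strip() for a in rest.split(",") if a.strip()]
            voices.append((macro, args))
    return name, voices


SAMPLE = """\
voice_group example
    voice_keysplit_all voicegroup_petalburg_drumset
    voice_keysplit voicegroup_piano_keysplit, keysplit_piano
    voice_square_1 60, 0, 0, 3, 0, 2, 0, 0
    voice_directsound 60, 0, DirectSoundWaveData_sc88pro_fretless_bass, 255, 253, 0, 149
"""

name, voices = parse_voicegroup(SAMPLE)
for index, (macro, args) in enumerate(voices):
    print(index, macro, args)
```

In this sample, Program Change 3 would select the fretless bass, whose last four arguments (255, 253, 0, 149) are its ADSR values.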
Personally, when I create new songs, I create minimally-sized voice groups, with no regard for traditional MIDI program instrument numbers. However, that means that listening to the MIDI with a standard audio player will result in gibberish. Whether or not that's important is up to you.
Finding Which Voicegroup a Song Uses
To find or configure which voicegroup a song uses, simply look at sound/songs/midi/midi.cfg. This file controls how mid2agb converts each .mid file. The -G parameter specifies the voicegroup name. This is also where things like the master volume (-V) and reverb (-R) are configured for each song.
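As a rough illustration, the Python sketch below pulls the -G, -V, and -R values out of a single config line. The line layout used here (file name, a colon, then flags with their values attached) and the song name `mus_example` are assumptions for the sake of the example, so check your own midi.cfg for the exact format.

```python
import re


def parse_song_options(line: str):
    """Split one config line into (midi_filename, {flag_letter: value}).

    Assumes "<file>.mid: -X<value> ..." with values attached to the flags.
    """
    filename, _, flags = line.partition(":")
    options = {
        match.group(1): match.group(2)
        # A flag is a '-' at the start of a token, then a letter, then its value.
        for match in re.finditer(r"(?<!\S)-([A-Za-z])(\S*)", flags)
    }
    return filename.strip(), options


# Hypothetical config line, not copied from a real project.
song, options = parse_song_options("mus_example.mid: -E -R50 -G_example -V080")
print(song)
print("voicegroup:", options.get("G"))
print("volume:", options.get("V"), "reverb:", options.get("R"))
```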
Adding Custom Instrument Samples
As described earlier, the "directsound" voice plays a pre-recorded sample (e.g. a short looped recording of a trumpet). The sample files are all .wav files located in sound/direct_sound_samples/. If a sample is meant to be looped in order to play a held note, then the .wav file must contain a loop marker (more on that in a second).
- Prepare/create/obtain your .wav file: it should be 8-bit and mono channel.
  - Typically, you'd want to use a low-ish sample rate like 13379Hz, to keep ROM size smaller.
- To add looping, I recommend using Wavosaur.
  - Open your .wav file.
  - Choose Tools -> Loop -> Create loop points.
  - Adjust the loop start and end until you're happy with how it sounds when looping during playback.
  - Save the new .wav file.
- Place your .wav file in sound/direct_sound_samples/.
- Add an entry for it in sound/direct_sound_data.inc.
- Add it to your voicegroup with voice_directsound.
When you compile the ROM with make, the .wav file is automatically converted to a .bin file, which is what actually goes into the ROM. The wav2agb tool is what performs that conversion.
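Before dropping a sample into the project, you may want to confirm that it really is 8-bit mono. The sketch below uses Python's standard `wave` module to do that. The `check_sample` helper is hypothetical, it does not inspect loop markers, and it only covers the two format requirements listed above.

```python
import io
import wave


def check_sample(wav_file):
    """Return a list of problems found in a .wav (path or file-like object)."""
    problems = []
    with wave.open(wav_file, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, found {wav.getnchannels()} channels")
        if wav.getsampwidth() != 1:
            problems.append(f"expected 8-bit, found {wav.getsampwidth() * 8}-bit")
        print(f"sample rate: {wav.getframerate()} Hz, frames: {wav.getnframes()}")
    return problems


# Build a tiny in-memory 16-bit stereo file to demonstrate a failing check.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(13379)
    out.writeframes(b"\x00\x00\x00\x00" * 100)
buffer.seek(0)
print(check_sample(buffer))
```

To check a real file, pass its path instead of the in-memory buffer, for example `check_sample("my_sample.wav")` (a hypothetical file name).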
Keysplit Tables
Keysplit tables allow mapping different MIDI note ranges to directsound samples. This is used when a single sample doesn't sound good across the entire range of notes. For example, piano and trumpet both use keysplit tables.
Rather than going into details here, it's pretty easy to copy those existing examples if you need to make your own.
Part 4: Reaper + poryaaaa CLAP Plugin
If your use case is "I use a DAW to compose/tweak music (e.g. Reaper)", then this section will be helpful! Below, we walk through how to use poryaaaa with Reaper--a popular DAW.
Step 0: Create a new empty project in Reaper.
File -> New Project
Step 1: Create an empty track.
Track -> Insert New Track or Ctrl + T
Step 2: Name that track something like "poryaaaa master"
- Simply click on the area next to the red circle recording button and type the name
Step 3: Add the poryaaaa CLAP plugin as an FX to the "poryaaaa master" track.
- Click on the "FX" button to the right of the track's name. Or, click on the top slot in the track's Mixer view in the bottom left corner of the screen.
- If you can't find the poryaaaa CLAP plugin there, it means you haven't installed poryaaaa.clap in a location that Reaper knows about. Copy poryaaaa.clap into one of the CLAP plug-in paths listed in Reaper's settings, and then click "Re-scan". After re-scanning, you should be able to find the poryaaaa plugin when adding it to the track.
At this point, your project should look like the image below: one track with the poryaaaa plugin FX, and a poryaaaa GUI that is functional and editable.
Step 4: Create another blank Track. So now you have two total tracks in the project.
Step 5: Drag a .mid file from pokeemerald into that newly-created second track. Import the MIDI with all of the default options checked.
Now, your project should look like this: one "poryaaaa master" track, and all of the individual tracks with actual midi notes on them. (Tip, you can do Ctrl + Mouse Scroll to shrink/expand the vertical size of the tracks).
If you press Play now, you won't hear any music. This is because the song's tracks are only producing MIDI notes right now. No synthesizer is actually listening to those MIDI notes and turning them into audio. That's where poryaaaa comes in. We need to route each of those tracks into the "poryaaaa master" track, so that it receives their MIDI notes and turns them into audio.
To do that, we click on the "Route" button in the Mixer view. Let's start with the first song track (which is track #2).
Add a new "send" from this track to the "poryaaaa master" track.
Configure this new "send" so that it's sending ALL MIDI events to the "poryaaaa master" track's MIDI channel 1.
Now, do that for each of the remaining tracks, but send it to the NEXT available midi channel on the "poryaaaa master" track. So, the next track would send to midi channel 2, the next would send to channel 3, and so on.
And that's it! If you press Play from the start of the song, you should hear the audio. In the poryaaaa GUI, make sure the currently-loaded voice group matches the song you're playing.
We just went through an example where we imported an existing MIDI file. However, the same exact steps apply if you're creating a song from scratch (ignoring the part where we imported an existing MIDI file)!
Assigning Instruments with Program Change
"PC / Program Change" MIDI events are how you specify which voice the current track is playing. To view/modify the Program Change events in Reaper, double click on the track. Then, open the "Event List" view. The "PC"-type events are the ones that select the voice. In this example, it's using voice 48 for Verdanturf's song. If you open sound/voicegroups/verdanturf.inc, you can see that the voice at index 48 is voice_keysplit voicegroup_strings_keysplit, keysplit_strings. So, for example, you can change that line to a different instrument, then press the "Reload" button in poryaaaa's GUI to load that modification.
Typically, the PC event will be one of the very first MIDI events in a track. It's normal to switch to different voices with a PC event at different parts of the song, so you can get creative. Though, it's most common to just use that same voice for the whole duration of the song. (i.e. A track would just always play "trumpet" or "piano".)
Adding Loop Markers
Most songs want to use looping so that they play forever. It would be awkward if the background music in Littleroot Town ended just because it reached the end of the song.
To specify where the song loops, the resulting MIDI file needs to have a pair of text markers. The [ marker is the start of the loop and ] is the end. To add these markers in Reaper, choose Insert -> Marker or right-click on the timeline and select Insert marker....
Exporting Your Song as a MIDI File
Once you're happy with your song and want to insert it into the game, we need to export the project as a MIDI file from Reaper.
Choose File -> Export project MIDI..., and use the following options:
- Multitrack MIDI file (type 1 MIDI file)
- Embed project tempo/time signature changes
- Export project markers as MIDI: markers
Add the resulting MIDI to pokeemerald's sound/songs/midi/ directory. Then, ensure you've updated the following files appropriately with your new song (or if you're replacing an existing song):
- sound/songs/midi/midi.cfg
- sound/song_table.inc
- include/constants/songs.h
Part 5: Anvil Studio + poryaaaa
Anvil Studio has traditionally been a popular tool when preparing MIDI files for ROM hacks, which is why I'm including this section. I don't personally use it, so I'll mainly just be covering how to use poryaaaa with Anvil Studio.
Anvil Studio doesn't support CLAP plugins, so we can't use poryaaaa right out of the box, like in the Reaper guide above. Instead, we'll need to use a "virtual MIDI cable" program to route MIDI events to poryaaaa_standalone.
Installing loopMIDI (Windows)
The virtual MIDI cable we'll use is called loopMIDI. It's very simple to use with almost no setup.
- Download loopMIDI: https://www.tobias-erichsen.de/software/loopmidi.html
- Run loopMIDI, and ensure it has created an actively-running virtual MIDI port. If nothing shows up, click the "+" button in the bottom corner.
Configuring Anvil Studio
With loopMIDI running, open Anvil Studio, and load your MIDI file. We need to set the "Device" on all the tracks to output their MIDI events to the virtual loopMIDI port.
- Navigate to View -> Synthesizers, MIDI + Audio Devices
- Double click on loopMIDI Port in the "MIDI Out Devices" list.
  - Ensure loopMIDI Port is DISABLED in the "Enabled MIDI In devices" list!
  - If you don't see loopMIDI Port in the list, try the following:
    - Close Anvil Studio.
    - Ensure other MIDI-related programs (e.g. a DAW like Reaper) are closed.
    - In Windows Services, restart the "Windows MIDI Service"
    - Reopen Anvil Studio and see if it shows up.
- Return to the Mixer view (View -> Mixer). All of your tracks should list the loopMIDI port in the Device column. This means they are sending their MIDI events to the virtual MIDI port.
At this point, you won't hear any audio if you play the song, which is expected, because nothing is listening to the virtual MIDI port yet!
Connecting poryaaaa_standalone
Now that we've wired up Anvil Studio with the virtual MIDI port, we can use poryaaaa to play the audio.
- Run poryaaaa_standalone.exe (e.g. by double clicking on it). This will show the poryaaaa GUI.
  - poryaaaa automatically binds to the virtual MIDI port, so it's now ready to play audio from the MIDI events originating from Anvil Studio.
- In poryaaaa's GUI, set the project path and voice group appropriately and click "Reload".
- Play the song from Anvil Studio, and you should hear the audio!
Part 6: Create .wav Files or Listen to Existing Songs
poryaaaa comes with a tool called poryaaaa_render. It's a command-line program that lets you listen to existing MIDI files from the game, or create .wav files for them.
See the official poryaaaa README for full usage instructions.
As an example, if I wanted to create a 5-minute rendered .wav file of Petalburg's theme, this command would accomplish that:
poryaaaa_render.exe /path/to/pokeemerald petalburg --midi /path/to/petalburg.mid --output petalburg.wav --reverb 50 --song-volume 80 --total-duration-seconds 300
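If you render songs often, it can be convenient to assemble that command from a script. The Python sketch below rebuilds the exact command shown above using only the arguments that appear in it. The helper name is hypothetical, and reading the second positional argument as the voicegroup name is an assumption, so consult the README before relying on it.

```python
import subprocess


def build_render_command(project_dir, voicegroup, midi_path, output_path,
                         reverb=50, song_volume=80, duration_seconds=300):
    """Assemble the poryaaaa_render argument list used in the example above."""
    return [
        "poryaaaa_render.exe", project_dir,
        voicegroup,  # assumed to be the voicegroup name; see the README
        "--midi", midi_path,
        "--output", output_path,
        "--reverb", str(reverb),
        "--song-volume", str(song_volume),
        "--total-duration-seconds", str(duration_seconds),
    ]


command = build_render_command(
    "/path/to/pokeemerald", "petalburg",
    "/path/to/petalburg.mid", "petalburg.wav",
)
print(" ".join(command))

# To actually render, uncomment the next line (poryaaaa_render must be on your PATH):
# subprocess.run(command, check=True)
```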