As my ongoing audio initiative continues, the advent of easy-access music on Virtual Boy draws near. More information on that project will be on its way, but of course it led me into music theory, so let's talk music theory!
Have you ever sat down and tried to formulate some technical specification to answer the question "what is music"? If one is to implement music on a computer such as the Virtual Boy, that's the first question that has to be answered. I boiled the question down to its most fundamental elements, and came up with the following: music is some arrangement of one or more notes of an audible tone, each with a starting time, a duration and a frequency.
Therefore, in its simplest form, music on Virtual Boy needs to be a program capable of producing such notes. Of course, other characteristics of notes are also useful, such as the exact sound used as a tone, the amplitude or volume of the note, and the note's ability to change its properties while it's playing (such as fading out or rising in pitch). Either way, we're looking at musical notes that start and end.
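For the sake of illustration, here's roughly what such a note might look like as a data structure. The field names and sizes here are just my sketch, not a finalized spec:

```c
#include <stdint.h>

/* One musical note, per the definition above: a starting time, a
   duration and a frequency, plus the extra characteristics mentioned.
   Field names and widths are illustrative only. */
typedef struct {
    uint32_t start;     /* starting time, in engine ticks */
    uint32_t duration;  /* how long the note sounds, in ticks */
    uint16_t frequency; /* pitch, e.g. in hertz */
    uint8_t  channel;   /* which hardware channel plays it */
    uint8_t  volume;    /* amplitude */
} Note;
```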
I know, I know, this seems obvious, but you'll find that happens a lot in programming. You have to think about the simple things because you have to tell the program about those simple things. Computers don't make assumptions.
Virtual Boy supplies audio hardware that gives us some useful features, but also imposes some limitations. We can use five PCM buffers to specify single cycles of wave forms, which is useful for producing different sounds of our design. However, these buffers are only 32 samples in size and they can't be written to while sound is being generated. It gives us five channels that can play simultaneously using any of the five waves, which is wonderful, but at the end of the day limits us to five simultaneous notes. There is also a sixth channel that produces pseudorandom noise, useful for its own little subset of audio effects.
There are some other features in the audio hardware that aren't especially useful for music, or are things that can be simulated through software, so I won't mention them here.
A simple musical track would initialize the PCM buffers with some waveforms, then have a schedule of notes to play on each channel. This is basically what MIDI does. But I think we can take it a step further, which leads us into technical territory...
__________
One of the specific goals of my project is efficiency. Not just fast code, but small data too. I want it to be so technically insignificant that anyone can ask themselves, "Do I have the resources to incorporate this into my project?" and answer yes every time. To that end, a few optimizations can be made.
Data compression is an example of the archetypal double-edged sword. While it reduces the number of bytes needed to represent something, it increases the processing required by necessitating a decoding routine, which often needs additional memory to work in. I'm not keen on using something like deflate on Virtual Boy, since that's a pretty big code and memory requirement that doesn't really fit with my project goals.
Most data compression works by eliminating redundancy in the data. This is easy to do, because the data can simply specify "Remember that block of bytes from earlier? Let's do that another 20 times" instead of actually including that block of bytes 20 more times. Think of something like a drum loop: in many musical tracks, long stretches of percussion are the same notes repeated over and over. So instead of storing them who knows how many times, why not store them once and play them repeatedly?
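As a rough illustration (the names here are hypothetical, not from my actual draft), track data could reference shared patterns instead of repeating them inline:

```c
#include <stdint.h>

/* Hypothetical sketch: a track as a list of references into a shared
   pattern table, rather than raw note data repeated in full. */
typedef struct {
    uint16_t pattern_id; /* index into a shared pattern table */
    uint16_t repeats;    /* play the pattern this many times */
} PatternRef;

/* A drum loop stored once but played 20 times costs one entry: */
static const PatternRef track[] = {
    { 0, 1 },  /* intro pattern, played once */
    { 1, 20 }, /* drum loop pattern, played 20 times */
};
```

The engine can walk a table like this directly from ROM, which is the "no decompression, but no redundancy either" idea in a nutshell.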
The working draft I've got in front of me here allows the music engine to read data directly from ROM without decompressing it or anything, but still eliminate redundancy. This should get the best of both worlds, fingers crossed.
We're also talking video game music, so I want to incorporate some features that are specifically tailored for that. I don't want to spill all the beans right now because I want to be able to hype it up later, but just consider a few things...
• Video game music often repeats to an extent, but generally won't repeat an intro.
• Some music changes a bit depending on game events. Mario underwater almost always sounds different.
• Conversely, some game events are dependent on the music.
__________
The topic of frequency raises an interesting technical challenge. I'm kinda proud of this one. (-:
When interpolating between two values, such as when a note fades out, you smoothly transition from a starting value (hereafter called "Left") to an ending value ("Right"). For volume in a fade, this can be done linearly with the usual interpolation formula:
Value = Left + (Right - Left) * Percent
Frequency doesn't work quite like that, though. At least not in the sense of an audible tone. To increase a note by one octave is to double its frequency. For example, raising 100hz by one octave is 200hz. Raising that another octave is 400hz. Since 100hz is two octaves below 400hz, it could be said that 50% of the way between them is just one octave, or 200hz. This is the intuitive way to notate the frequencies on a keyboard or piano roll.
Using the linear interpolation formula, you'd wind up with 250hz, so that's a snag. What we need is an exponential interpolation, and the formula is strikingly similar. You just kick all the operators up a notch:
Value = Left * (Right / Left) ^ Percent
Easy enough in concept, but have you ever tried to do exponentiation with non-integer powers algorithmically? Yuck. Even using the industry-standard pow() function A) requires double-precision floats which VB doesn't natively support, and B) involves an infinite series and is thus quite slow. It's just really icky all around, but I won't yield!
Further complicating matters is the fact that the Virtual Boy audio hardware doesn't accept hertz directly. It samples from the PCM buffers on a 5MHz timer with a delay given by the CPU. Since there are 32 samples per PCM buffer, one could reason that the tone cycle frequency is 5MHz / 32 = 156250hz. The delay is 11-bit and inverted so that lower values yield lower frequencies, making the actual tone formula the following:
ToneHz = 156250 / (2048 - Value)
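In code form, the mapping from the 11-bit hardware value to hertz looks like this (a direct transcription of the formula, assuming Value stays in the range 0 to 2047):

```c
/* Hertz produced by an 11-bit hardware delay value (0..2047),
   per ToneHz = 156250 / (2048 - Value). */
double value_to_hz(int value)
{
    return 156250.0 / (2048 - value);
}

/* e.g. value_to_hz(1798) is 625.0, since 156250 / 250 = 625. */
```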
In other words, to interpolate between audible frequencies in this music engine, I have to take the following equation:
Left * (Right / Left) ^ Percent = 156250 / (2048 - Value)
... and solve for Value. Fortunately, a keen associate of mine helped me work through it, and I'm pleased to report that we devised an algorithm to get there in fewer than 25 CPU cycles for any input of Left, Right and Percent. The data stored in ROM does not increase in size, and the lookup data is minimal.
I need to come up with some way to reward the other guy for being such a big help.
__________
While I finish up this draft of the spec, let's take a moment to talk music. What problems and solutions have you guys been a part of in your history of working with the technical nitty-gritty of music?