Have You ever heard about the Virtual Barber Shop? If not, then listen to this (use headphones!):
Sounds amazing, right? Well, the recording is almost 20 years old. The technique itself is more than two times older. Yet, there are hardly any games that make use of that and if they are, they are not AAA titles…which is strange because (as You could hear) the effect is incomparable with the stereo we’re used to and the method itself (as You’ll read later on) is very easy to implement. Actually, the only downside is that it works only on headphones.
But what is this magic actually?
The idea is to record..what our ears really hear. How do You do that? Well, you simply put the microphones in the…ears.That’s it. As for binaural recordings, there isn’t really much more to it. The effect is similar to what You can hear in the video above – You feel like you’re really there.And that is because of the obvious fact that microphones placed in your ears record sound waves almost the same as those that would normally hit your inner ear. As a consequence, such a recording already contains all the positional cues that our brain uses to localize the source of sound and other techniques try to reproduce. Moreover, it includes all the effects like scattering and diffraction of the sound wave on the head, pinna and other parts of the body that are inseparable from the sounds that we hear everyday and are very hard, if not impossible to reproduce.
There is one catch though : Everybody is different. The size and shape of the head, dimensions of the pinna and other parts like torso are naturally different for each person, causing slightly different effects for different wave lenghts – therefore recording made inside one person’s ears may sound unnatural for the other. If you would examine a head as a filter then every person’s would have his own, unique filter characteristic that ultimately shapes the final sound that the brain analyses. This leads us to…
Head Related Transfer Function (HRTF)
If our head (mainly, but other parts of the body too) acts as a filter, then HRTF is nothing more than that filter’s transfer function. HRTF is an information on how your head, including the pinna acts on different frequencies of the incoming sound. That information is crucial to our brain and one of the main mechanism of determining the direction of the incoming sound: it is, most importantly, used to determine the location of sound source lying on the cone of confusion (where the Interaural Time Difference and Interaural Level Difference are the same). HRTF is a rather complicated function of frequency and direction. For every direction, the phase and frequency response “of our filter” is different.
For a sound source directly ahead, HRTF is roughly similar for both the left and right ear because of the symmetry of our body. You can see peaks and notches on the plot which are caused by various wave effects, such as reflection of the pinna, diffraction on the head, etc…
For the sound source not on the axis of symmetry the biggest difference is in attenuation – You can see on the plots above that for the sound source to the one side, the ear on the opposite side of the head receives much less acoustic energy (that is intuitive). Generally, the higher the frequency, the higher the attenuation.
There are many detailed papers on the subject of HRTF – the topic was studied extensively and many measurements were done. I’ve skipped a lot of details because i want to focus on the most interesting use of HRTFs from the game audio point of view, which is…
Using HRTF to create virtual surround sound
Let’s say You have a HRTF for some direction V. What would happen if You filtered a raw (mono) signal with that HRTF for of the ears? You get a signal picked by that ear like it would come from the direction V. Apply this logic for the other ear and You have a stereo sound that is almost exactly the same like it would be a binaural recording. See where this is going? It is possible to position a sound source in any point in 3D space, given only the HRTF corresponding to that point. This is one of the most popular use of HRTF – positioning sound sources in a virtual space: a 3D panning. Note, that with standard panning techniques it is possible to position a sound source only in front, in a 2D plane (no elevation). There is no way to position a sound source behind, above or below. It is possible using HRTF.
Because of that, there exists entire databases of HRTF. CIPIC is probably the most popular one. HRTFs are recorded for a set of points that form a grid around the listener.
In practice, HRIR (Head Related Impulse Response) are measured. HRIR is nothing more than an Inverse Fourier Transform of the HRTF. It is the impulse response of “our head”. Since filtering is the process of convolution of the input with filter’s impulse response, the process can be easily reversed and that’s called deconvolution. In this case the filtered sound is recorded by microphones in the ears and the raw signal is used to find the unknown: the impulse response.
To sum up, the process of positioning a sound source using HRIR (HRTF) is very easy: It is simply a convolution of the mono signal with HRIR for left and right ear. With this simple process, You can achieve the same effect that You heard in “Virtual Barber Shop”.
However, there is one obvious problem: You would need an infinite number of HRIRs to cover every possible angle. If You want to be able to position a source in ANY position, some kind of interpolation is needed. I’ll talk about this problem and possible solution in the next post, where I will also show how to program a 3D panner using HRIRs. Thank You for reading.