
Third-party audio spatialization plugins – the future of 3D audio in games?

With the upcoming release of the Oculus Rift and the overall increase of interest in virtual reality (VR) gaming has also come the development of 3D game audio. Sound is finally starting to be recognized as a vital part of immersion in games, especially those using virtual reality technology. Still, the major focus is on graphics, which is where the vast majority of VR development is happening, but at the same time there is an avalanche of free and commercial spatialization engine plugins/libraries that implement 3D (HRTF) panning supported by real-time reverberation with simple room modelling. Examples include 3Dception by Two Big Ears, the Oculus Audio SDK by Oculus VR and Phonon by Impulsonic. You can find many more, but all are very similar when it comes to features; the differences lie in sound quality, performance and price. Almost all of them allow integration with the major game/audio engines, namely Unity, Unreal (only recently), Wwise and FMOD, which brings me to the topic of this post: are such third-party plugins the future of 3D audio in games?

Currently, there are two dominant game engines on the market: Unity and Unreal Engine 4. Both have very basic built-in audio engines and offer no built-in 3D audio/spatialization (note that stereo panning plus distance attenuation is not 3D audio). Neither do the two most popular audio middleware solutions, Wwise and FMOD (both offer 3D audio only via paid plugins). The only free solutions are the libraries/plugins I mentioned above.

Let’s look at Unity. The way all spatialization plugins for Unity work is simple: a script attached to the sound source grabs the audio samples in its OnAudioFilterRead method, processes them on the native (C++) side via a .dll that resides in the Plugins folder and returns them to Unity (managed code) to mix. This, however, is not ideal and induces significant overhead. The people behind Unity seem to have recognized the need for 3D audio and the problem with the current approach, and have included an Audio Spatialization SDK on their recently announced roadmap (full link here):

[Image: Unity roadmap entry for the Audio Spatialization SDK feature]

The feature, however, went from “on track” to “at risk”. Nevertheless, it clearly shows that Unity is developing a way to support these spatialization plugins rather than expanding its own audio engine’s capabilities in that area. Seems like a smart move: why reinvent the wheel? Before we look at how things stand in Unreal Engine, the sketch below illustrates the plugin pattern described above.
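
Here, NativeSpatializerFilter, MySpatializer and Spatialize are purely hypothetical placeholder names (no real library or API is implied); actual plugins differ in what they pass to the native side, but the overall shape is the same: a script intercepts the samples in OnAudioFilterRead and hands them over to native code:

using System.Runtime.InteropServices;
using UnityEngine;

[RequireComponent(typeof(AudioSource))]
public class NativeSpatializerFilter : MonoBehaviour
{
    // Hypothetical native plugin (a .dll in the Plugins folder) exposing a
    // single processing entry point -- placeholder names, not a real API.
    [DllImport("MySpatializer")]
    static extern void Spatialize(float[] buffer, int length, int channels,
                                  float x, float y, float z);

    Vector3 sourcePosition; // cached on the main thread

    void Update()
    {
        // OnAudioFilterRead runs on the audio thread, so the source position
        // is cached here instead of touching the Transform from that thread.
        sourcePosition = transform.position;
    }

    void OnAudioFilterRead(float[] data, int channels)
    {
        // Hand the interleaved samples to the native side for HRTF processing;
        // the result is written back into the same managed buffer, which Unity
        // then mixes as usual.
        Spatialize(data, data.Length, channels,
                   sourcePosition.x, sourcePosition.y, sourcePosition.z);
    }
}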

While Unity has provided an easy way to access raw samples almost from its inception, Unreal offered no way to do so until a few months ago, other than modifying the engine itself (its source is available). In a recent release they integrated the Oculus Audio SDK into the engine itself (available as a plugin), although only in the lower-quality version and only on 64-bit Windows. What’s important is that, while doing so, they created a way to get at raw audio samples by implementing your own audio spatialization plugin. This, however, is nowhere documented. It basically comes down to implementing the IAudioSpatializationPlugin and IAudioSpatializationAlgorithm interfaces, the latter being where raw audio samples can be accessed. This is what the original source code comment says about it:

/**
* IAudioSpatializationAlgorithm
*
* This class represents instances of a plugin that will process spatialization for a stream of audio.
* Currently used to process a mono-stream through an HRTF spatialization algorithm into a stereo stream.
* This algorithm contains an audio effect assigned to every VoiceId (playing sound instance). It assumes
* the effect is updated in the audio engine update loop with new position information.
*
*/

While implementing that, it’s also necessary to implement your own XAPO (XAudio2 custom effect). Other than that it’s an ordinary Unreal plugin (module), so it’s possible to load your own .dll and process the audio there, just like in the case of a Unity plugin. However, the whole process seems hacky and full of pitfalls, and the lack of documentation doesn’t help. Most importantly, it’s 64-bit Windows only. It looks like a byproduct of adding support for the Oculus Audio library that is yet to be fully implemented and officially released, but it’s there. Also, the audio side of Unreal 4 is currently undergoing a heavy redesign, because audio in the engine is implemented differently for every platform, to the point where they admitted that “every platform has almost its own audio engine”. Once they sort that out, it seems very likely they will provide a way to create custom spatialization effects in a cross-platform manner, the way Unity plans to, and the aforementioned interfaces are a big hint.

Since Unity and Unreal are the two giants of the game industry and most games in the near future will be built with them, the future of 3D audio seems to depend on these specialized spatialization libraries/plugins/SDKs. Hopefully, this time, audio development won’t get held back by artificial patents and copyrights, and we won’t end up stuck with a 3D counterpart of Dolby Digital. Most importantly, there is a chance that, thanks to the availability of good 3D audio solutions, audio will finally start catching up with graphics.

Air absorption of sound as a digital filter – Part 2: Implementation in Unity (C#)

In the previous post I described the theory behind designing a digital filter that simulates the absorption of sound propagating through air, using a standardized acoustic absorption model and a simple filter design method. In this post I’ll show how to put that theory into practice by writing a script for the Unity 3D engine that, attached to a sound source, will filter the audio coming from it depending on the player’s distance from the source. Unity is a very popular, fully-featured game engine and environment. I chose it as an example because it allows for a very straightforward implementation of what we want to achieve here, and because the script I’ll write for it (in C#) is simply a set of classes/methods that can easily be reused or rewritten elsewhere.

Unity is a component-driven engine: all functionality of a game object is encapsulated in attachable components that can communicate with each other. This includes all built-in components such as transforms, meshes, shaders, colliders, audio sources etc., as well as user-written scripts. These scripts are nothing more than custom components; they provide a programmatic way to manipulate objects and control other components, and are the only way to implement game logic, custom behaviour, user input handling… basically anything essential for all but the simplest game to work.

Scripts for Unity can be written in C#, JavaScript or Boo; I will use C#. Creating a script is very simple: we provide a definition of a class that inherits from MonoBehaviour, the base class for every script:

using UnityEngine;
using System;

public class SoundAirAbsorption : MonoBehaviour {

    void Start () {
        // initialization
    }

    void Update () {
        // code to be executed every frame
    }

    void OnAudioFilterRead(float[] data, int channels) {
        // filtering here
    }
}

By default we get only two method definitions: Start() and Update(). Start() is called when the script initializes, Update() is called every frame. We will fill them in later; we don’t need them for the actual filtering. Custom filters in Unity are implemented using the OnAudioFilterRead method. It’s called every time a block of audio data from the audio source component (or from the previous filter in the chain) is ready to be processed. As for its parameters: data is an array of floats ranging from -1 to +1 representing interleaved samples, and channels is the number of channels (left, right speaker and so on) in the array. We’ll use that information to deinterleave the data later.
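
Just to illustrate how the callback behaves before we do anything useful with it, here’s a toy component (call it HalfGainFilter; it’s not part of the final script) that simply attenuates whatever passes through it:

using UnityEngine;

public class HalfGainFilter : MonoBehaviour
{
    void OnAudioFilterRead(float[] data, int channels)
    {
        // data is interleaved: for a stereo source channels == 2 and the array
        // holds L, R, L, R, ... pairs, so stepping by 'channels' walks the
        // buffer one frame at a time.
        for (int i = 0; i < data.Length; i += channels)
        {
            for (int ch = 0; ch < channels; ++ch)
                data[i + ch] *= 0.5f; // scale every sample by 0.5 (about -6 dB)
        }
    }
}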

OK, first let’s write our air absorption model. As a reminder, we’re using this set of equations: LINK. An example class could look like this:

class AirModel
{

    double Fr_O; //relaxation frequency of oxygen
    double Fr_N; //relaxation frequency of nitrogen

    const float T0 = 293.15f;   // reference air temperature (20 °C) in kelvins
    const float T01 = 273.16f;  // triple-point isotherm temperature in kelvins

    float T;
    public float Temperature
    {
        get { return T; }
        set
        {
            if ((value >= 0) && (value <= 330))
            {
                T = value;
                updateRelaxFs();
            }
        }
    }

    float hr;
    public float Humidity
    {
        get { return hr; }
        set
        {
            if ((value >= 0) && (value <= 100))
            {
                hr = value;
                updateRelaxFs();
            }
        }
    }

    float ps;
    public float Pressure
    {
        get { return ps; }
        set
        {
            if ((value >= 0) && (value <= 2))
            {
                ps = value;
                updateRelaxFs();
            }
        }
    }

    public AirModel(float temp_kelvin, float h_relative, float atm_pressure = 1f)
    {
        T = temp_kelvin;
        hr = h_relative;
        ps = atm_pressure;
        updateRelaxFs();
    }

    public double getAbsCoeff(float f)
    {
        float F = f/ps;       
        return 20 / Math.Log(10) * F * F * (1.84 * Math.Pow(10,-11) * Math.Sqrt(T/T0) +
            Math.Pow(T/T0,-2.5) * ( 0.01278*Math.Exp(-2239.1/T)/(Fr_O+F*F/Fr_O) + 0.1068*Math.Exp(-3352/T)/(Fr_N+F*F/Fr_N) )) * ps;
    }

    void updateRelaxFs()
    {
        double h = hr * Math.Pow(10, -6.8346 * Math.Pow(T01 / T, 1.261) + 4.6151) / ps;
        Fr_O = 24 + 40400 * h * (0.02 + h) / (0.391 + h) / ps;
        Fr_N = Math.Sqrt(T0 / T) * (9 + 280 * h * Math.Exp(-4.17 * (Math.Pow(T0 / T, 1f / 3) - 1))) / ps;
    }
}

This is rather self-explanatory. Since the model is only valid for specific conditions, changing the temperature, humidity or pressure requires bounds checking. I implemented those members as C# properties because they are more convenient than standard “setX”/“getX” methods in such a simple case. The updateRelaxFs() method is called each time one of the air properties is set; its job is to recalculate the oxygen and nitrogen relaxation frequencies. The getAbsCoeff() method returns the absorption coefficient in dB/m for a given acoustic frequency (and the current atmospheric properties).
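
As a quick usage sketch (not part of the final script; the numbers are just example conditions), the class could be queried like this, e.g. from inside Start():

// 20 degrees Celsius (293.15 K), 50% relative humidity, 1 atm
AirModel air = new AirModel(293.15f, 50f);

// absorption coefficient in dB/m at 1 kHz for the current conditions
double alpha = air.getAbsCoeff(1000f);

// total attenuation over 100 m, converted to a linear gain -- the same
// conversion the filter design below performs for every sampled frequency
double gain = Math.Pow(10, -alpha * 100 / 20);
Debug.Log("1 kHz: " + alpha + " dB/m, linear gain over 100 m: " + gain);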

Next, let’s create a filter class that we will use in our script to do the actual audio filtering. What methods and members should this class have? Starting with the most important ones: an array holding the impulse response (preferably a private field) and a (preferably public) method that creates/updates that impulse response. Such a method must take two parameters: the distance to the sound source and the desired filter length. In the code below I use the algorithm from the previous post:

class AirAbsorbFilter
{
    public readonly AirModel m_AirModel;
    Complex[] m_ImpulseResponse;
    int m_SamplingRate;

    public void updateImpulseResponse(float distance, int N)
    {
        // design filter based on distance and current atmospheric properties
        // 'Frequency sampling' method

        Complex[] H = new Complex[N];
        float df = (float)m_SamplingRate / N; // frequency resolution; the cast avoids integer division

        // sample in (0,fs/2) range
        for (int i = 0; i < H.Length / 2 + 1; ++i)
        {
            // absorption coefficient (dB/m) for the i-th sampled frequency,
            // converted to a linear-scale magnitude for the desired response
            float a = (float)m_AirModel.getAbsCoeff(df * i);
            H[i].Re = Mathf.Pow(10, -a * distance / 20);
        }

        //mirror DFT with respect to N/2+1 sample
        for (int i = H.Length / 2 + 1; i < H.Length; ++i)
            H[i] = H[N - i];

        //do IFFT to get impulse response
        FourierTransform.FFT(H, FourierTransform.Direction.Backward);

        //Aforge's FFT/IFFT is not normalized, divide by N
        for (int i = 0; i < H.Length; ++i)
            H[i] /= N;

        //impulse response
        m_ImpulseResponse = H;
        //shift by N/2
        Util.shiftArray<Complex>(m_ImpulseResponse, N / 2);

        //blackman window
        float[] blackman = Util.blackmanWindow(m_ImpulseResponse.Length);
        for (int i = 0; i < m_ImpulseResponse.Length; ++i)
            m_ImpulseResponse[i] *= blackman[i];

    }
}

This method designs the filter, creates an impulse response and multiplies it by a Blackman window. Parameter N is the filter’s length (the filter’s order plus one), and the impulse response is stored in the m_ImpulseResponse member. The Complex data type and FourierTransform class are part of the AForge.NET library (link at the end). H is the array of sampled (desired) frequency response values: for each sampled frequency we get an absorption coefficient and convert it to a linear-scale magnitude. Note that we sample only the real part of the spectrum. The FourierTransform.Direction.Backward parameter means we perform the inverse Fourier transform, and dividing the result by N is necessary for it to be consistent with the definition of the normalized IDFT (AForge’s backward transform does not apply that scaling). To shift the array in place and apply the window I wrote helper functions that I put in a class called Util:

class Util
{
    public static void shiftArray<T>(T[] array, int N)
    {
        T[] temp = new T[array.Length];
        System.Array.Copy(array, temp, array.Length);
        for (int i = 0; i < array.Length; ++i)
            array[(i + N) % array.Length] = temp[i];
    }

    public static float[] blackmanWindow(int N)
    {

        int M = (N % 2 == 0) ? N / 2 : (N + 1) / 2;

        float[] win = new float[N];
        win[0] = win[N - 1] = 0f;
        for (int i = 1; i < M; ++i)
            win[i] = win[N - 1 - i] = 0.42f - 0.5f * Mathf.Cos(2 * Mathf.PI * i / (N - 1)) + 0.08f * Mathf.Cos(4 * Mathf.PI * i / (N - 1));

        return win;
    }
}

To generate the Blackman window I use the same formula MATLAB uses:

w(n) = 0.42 − 0.5·cos(2πn / (N − 1)) + 0.08·cos(4πn / (N − 1)),  n = 0, 1, …, M − 1, with w(N − 1 − n) = w(n) for the remaining samples,

where N is the length of the window and M is equal to N/2 for N even and (N+1)/2 for N odd.

All right, now that we have code that creates our filter’s impulse response based on distance, let’s write the code that does the filtering (at last!). The impulse response is all we need to filter any signal. The equation for an FIR filter is:

y[n] = h[0]·x[n] + h[1]·x[n−1] + … + h[N−1]·x[n−(N−1)] = Σ h[k]·x[n−k],  summing over k = 0, …, N−1,

where y is the output (filtered) signal, x is the input signal, h is the impulse response and N is the length of the impulse response (filter’s order plus one). We can see that “filtering” with an FIR filter is basically multiplying delayed samples of the input by the filter coefficients (impulse response) and summing them. This is exactly the equation defining linear discrete convolution: FIR filtering means convolving the input with the filter’s impulse response.
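
For reference, a direct time-domain implementation of that sum could look like the sketch below (it could live in a helper class; the final script will use FFT-based convolution instead, for the reasons explained next):

static float[] firFilter(float[] x, float[] h)
{
    float[] y = new float[x.Length];
    for (int n = 0; n < x.Length; ++n)
    {
        float acc = 0f;
        // y[n] = sum over k of h[k] * x[n - k], for the delays that fall
        // inside the input signal
        for (int k = 0; k < h.Length && k <= n; ++k)
            acc += h[k] * x[n - k];
        y[n] = acc;
    }
    return y;
}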

Convolution is a costly operation: in the form above its complexity is O(N·M), where N and M are the lengths of x and h. This can be greatly reduced by using the fact that the Fourier transform of the convolution of two time-domain signals is equal to the product of their Fourier transforms. More precisely, the convolution of x and h equals IDFT( DFT(x)·DFT(h) ) [1]. Using the FFT/IFFT to compute the DFT/IDFT reduces the complexity to O( M·log(M) + N·log(N) + (M+N)·log(M+N) ). There is, however, an issue with this approach: the result of [1] is a circular convolution, not a linear one. Circular convolution is a slightly different operation, and naively using [1] for filtering would yield an erroneous result, but there is a condition under which circular and linear convolution are equal, and the solution is a rather simple one:

Let Nx = length(x) and Nh = length(h). If we zero-pad both x and h to a length of at least Nx+Nh−1 and then perform the circular convolution, the result will be equal to the linear convolution in the first Nx+Nh−1 elements.

Simple, isn’t it? To implement very efficient filtering we just need to zero-pad both the audio signal and the impulse response to a length L >= Nx+Nh−1, perform an FFT on both, multiply the spectra, take the IFFT and keep only the first L samples. Preferably, L should be a power of two so that we take advantage of the fastest version of the FFT; many libraries can only perform FFTs whose length is a power of two, and AForge.NET is one of them. The overhead caused by L being bigger than strictly necessary is negligible compared with the reduction in computation gained from the radix-2 FFT.
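
As a sketch of that recipe in isolation (the filter class below folds the same steps into its filtering method, so this stand-alone helper is illustrative only and assumes AForge.Math is referenced):

static float[] fastConvolve(float[] x, float[] h)
{
    // zero-pad both signals to a power of two >= Nx + Nh - 1
    int K = Mathf.NextPowerOfTwo(x.Length + h.Length - 1);
    Complex[] X = new Complex[K];
    Complex[] H = new Complex[K];
    for (int i = 0; i < x.Length; ++i) X[i].Re = x[i];
    for (int i = 0; i < h.Length; ++i) H[i].Re = h[i];

    // FFT both, multiply the spectra, IFFT (the scaling by K follows AForge's
    // convention, exactly as in the filter class below)
    FourierTransform.FFT(X, FourierTransform.Direction.Forward);
    FourierTransform.FFT(H, FourierTransform.Direction.Forward);
    Complex[] Y = new Complex[K];
    for (int i = 0; i < K; ++i)
        Y[i] = X[i] * H[i] * K;
    FourierTransform.FFT(Y, FourierTransform.Direction.Backward);

    // the first Nx + Nh - 1 samples are the linear convolution
    float[] y = new float[x.Length + h.Length - 1];
    for (int i = 0; i < y.Length; ++i)
        y[i] = (float)Y[i].Re;
    return y;
}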

To implement the filter in our case there’s one last thing to sort out: how to filter in real time, when the input signal is split into blocks/chunks. That is how audio data is routed through the OnAudioFilterRead function: it is invoked every time the next chunk of audio is ready to be filtered, and that chunk is passed as the data parameter. It’s not guaranteed to be the same length every time, but it does consist of samples that come immediately after the samples of the previous call (otherwise there would be audible distortions). We can’t just filter every chunk independently: the sum of convolutions is not equal to the convolution of the sum (sum in the sense of array concatenation). The problem is the so-called edge effect of convolution: there is a start-up transient of Nh − 1 samples (Nh being the length of the impulse response) due to the latency of the filter. If we simply performed a convolution on every call, this effect would occur for every chunk, resulting in distortion of the continuous output.

Eliminating this edge effect is possible by overlapping edge samples from each block, either input or output samples. Two well-known algorithms exist, overlap-save and overlap-add, which overlap input samples and output samples, respectively. In our case we’ll use the overlap-add method: for every block we store the last K − Nx output samples (the tail of the convolution) and add them to the output of the next block.

Here is the filter class with the filtering methods added, plus an array that stores the overlap samples between calls to the filter function:

class AirAbsorbFilter
{
    public readonly AirModel m_AirModel;
    public Complex[] m_ImpulseResponse;
    int m_SamplingRate;
    // two channels - stereo 
    float[][] m_buffers = new float[2][];

    public AirAbsorbFilter(int sampling_rate)
    {
        m_SamplingRate = sampling_rate;
        m_AirModel = new AirModel(293.15f, 50);
        for (int i = 0; i < m_buffers.Length; ++i)
            m_buffers[i] = new float[0];
    }

    public void updateImpulseResponse(float distance, int N)
    {
       // ...
    }

    public void filter(float[] data, int num_ch)
    {
        //deinterleave data for filtering
        float[][] channels = Util.deinterleaveData<float>(data, num_ch);
        //filter every channel
        for (int ch = 0; ch < channels.Length; ++ch)
            filterChannel(channels[ch], ch);
        //interleave and copy back
        Util.interleaveData<float>(channels, data);
    }

    void filterChannel(float[] data, int channel)
    {
        //convolution using OVERLAP-ADD

        // get length that arrays will be zero-padded to
        int K = Mathf.NextPowerOfTwo(data.Length + m_ImpulseResponse.Length - 1);

        //create temporary (zero padded to K) arrays
        Complex[] ir_pad = new Complex[K];
        System.Array.Copy(m_ImpulseResponse, ir_pad, m_ImpulseResponse.Length);
        Complex[] data_pad = new Complex[K];
        for (int i = 0; i < data.Length; ++i)
            data_pad[i].Re = data[i];

        //FFT 
        FourierTransform.FFT(data_pad, FourierTransform.Direction.Forward);
        FourierTransform.FFT(ir_pad, FourierTransform.Direction.Forward);
        //multiply spectra (the extra factor K compensates for AForge's FFT scaling)
        Complex[] ifft = new Complex[K];
        for (int i = 0; i < ifft.Length; ++i)
            ifft[i] = data_pad[i] * ir_pad[i] * K;
        FourierTransform.FFT(ifft, FourierTransform.Direction.Backward);
        //add from buffer 
        for (int i = 0; i < data.Length; ++i)
        {
            data[i] = (float)ifft[i].Re;
            if (i < m_buffers[channel].Length)
                data[i] += m_buffers[channel][i];
        }
        //buffer last (K - data.length) samples
        m_buffers[channel] = new float[K - data.Length];
        for (int i = 0; i < m_buffers[channel].Length; ++i)
            m_buffers[channel][i] = (float)ifft[i + data.Length].Re;
    }
}

The m_buffers[][] array stores the samples to overlap for each channel. I assume 2 channels (stereo), because that’s what Unity uses for 3D audio sources. I also added a constructor which initializes the inner buffer arrays. One necessary step before filtering is to deinterleave the audio data and filter each channel separately (we can’t run the DFT-based fast convolution directly on interleaved samples). I wrote two helper functions for interleaving and deinterleaving and put them in the Util class:

public static T[][] deinterleaveData<T>(T[] data, int num_ch)
{
    T[][] deinterleaved = new T[num_ch][];
    int channel_length = data.Length / num_ch;
    for (int ch = 0; ch < num_ch; ++ch)
    {
        deinterleaved[ch] = new T[channel_length];
        for (int i = 0; i < channel_length; ++i)
            deinterleaved[ch][i] = data[i * num_ch + ch];
    }

    return deinterleaved;
}

public static void interleaveData<T>(T[][] data_in, T[] data_out)
{
    int num_ch = data_in.Length;
    for (int i = 0; i < data_in[0].Length; ++i)
    {
        for (int ch = 0; ch < num_ch; ++ch)
            data_out[i * num_ch + ch] = data_in[ch][i];
    }
}

That would be all for the DSP part (phew!). The only thing left is to integrate our filter class into the Unity script:

using UnityEngine;
using System;
using AForge.Math;

[RequireComponent(typeof(AudioSource))]
public class SoundAirAbsorption : MonoBehaviour {

    public GameObject AudioListener;

    [Range(0, 50)]
    public float Temperature = 20f;
    [Range(0, 100)]
    public float Humidity = 50f;
    [Range(0, 2)]
    public float Pressure = 1f;
    
    AirAbsorbFilter air_filter;
    const float update_rate = 0.05f;
    const int filter_length = 2048;

	void Start () {
        air_filter = new AirAbsorbFilter(audio.clip.frequency);
        InvokeRepeating("updateFilter", 0, update_rate);
	}
	
	void Update () {

	}

    void OnAudioFilterRead(float[] data, int channels)
    {
        if (!ReferenceEquals(air_filter, null))
            air_filter.filter(data, channels);
    }

    void updateFilter()
    {
        float distance_to_source = UnityEngine.Vector3.Distance(AudioListener.transform.position, transform.position);
        air_filter.m_AirModel.Temperature = Temperature + 273.15f; //Celsius to Kelvin
        air_filter.m_AirModel.Humidity = Humidity;
        air_filter.m_AirModel.Pressure = Pressure;
        air_filter.updateImpulseResponse(distance_to_source, filter_length);
    }


}

The AudioListener field stores a reference to the game object that will receive the sound; there is only one audio listener in a scene and it usually sits on the main camera. We need that reference to calculate the distance between the sound source and the receiver. The Temperature, Humidity and Pressure fields are public so they can be manipulated directly from the Unity editor (every public member of a class that inherits from MonoBehaviour can be controlled/assigned in the editor). [Range()] is an attribute used on numeric fields; it determines the range of the slider the editor shows for changing the field’s value. update_rate (in seconds) determines how often the distance to the audio listener is measured and a new impulse response calculated. We pass that value to InvokeRepeating, a built-in Unity way to call a function repeatedly at a given time interval; in this case it’s the updateFilter function.

This is how our script looks inside the Unity editor:
[Image: the SoundAirAbsorption script component shown in the Unity Inspector]

This ends the two-part entry about implementing a digital filter that simulates air absorption of sound. Hopefully some of this will be of help to somebody; I sure had fun writing it. Thanks for reading!

—————————
LINKS:

Aforge.NET library: http://www.aforgenet.com/framework/

Fast convolution (overlap-add & save): http://inst.eecs.berkeley.edu/~ee123/sp14/docs/FastConv.pdf

A good entry on Wikipedia about circular convolution (see the example): http://en.wikipedia.org/wiki/Circular_convolution