VR or 3D audio is often referred to as spatialization: the ability to play a sound as if it were positioned at a particular point in 3D space. The technology is essential for a strong sense of immersion, because sound gives us important cues about where we are in a real three-dimensional environment.
Just like sound localization in the real world, spatialization depends on two main components: direction and distance.
Directional Spatialization with HRTFs
Sounds are transformed by the geometry of our ears and body in ways that depend on the direction they arrive from. Head-Related Transfer Functions (HRTFs) capture these direction-dependent effects, and we rely on them to localize sounds.
How Are HRTFs Captured?
HRTFs are captured by placing a person in an anechoic chamber, putting microphones in their ears, and playing sounds from every possible direction, recording them with those microphones. The captured sounds are then compared with the originals to compute the HRTF that transforms one into the other. This is done for both ears, and a usable sample set requires sounds captured from a large number of discrete directions.
Of course, no two people have the same physical characteristics, and it would be impractical to measure everyone's HRTF, so labs like Microsoft Research and Oculus use a generic reference set that works well for most people, particularly when combined with head tracking.
Once an HRTF set is available, developers can select the appropriate HRTF for the direction a sound should appear to come from and apply it to the signal. This is done either as a time-domain convolution or as an FFT/IFFT pair: in other words, the audio signal is filtered so that it sounds as if it arrives from a particular direction. Summed up in a few words it sounds easy, but it is actually costly and difficult to implement well.
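A minimal sketch of that filtering step, assuming you already have a measured head-related impulse response (HRIR) pair for the desired direction; `fftconvolve` is the FFT/IFFT variant, while `numpy.convolve` would be the direct time-domain form:

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize(mono, hrir_left, hrir_right):
    """Render a mono signal at the direction encoded by an HRIR pair.

    mono       -- 1-D array of samples
    hrir_left  -- measured impulse response for the left ear
    hrir_right -- measured impulse response for the right ear
    Assumes both HRIRs have the same length.
    Returns an (N, 2) stereo array for headphone playback.
    """
    left = fftconvolve(mono, hrir_left)    # filter for the left ear
    right = fftconvolve(mono, hrir_right)  # filter for the right ear
    return np.column_stack([left, right])
```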
Headphones are essential here: each ear must receive its own filtered signal, and with an array of speakers both ears would hear both channels (crosstalk), which would complicate things considerably.
People use head motion to identify and locate sounds in space; without it, our ability to locate sounds in three dimensions is considerably reduced. When listeners turn their heads, the rendered audio must reflect that movement immediately, otherwise the sounds will seem to follow the head and the illusion falls apart.
High-end headsets already track head orientation, and in some cases position as well. If developers feed that tracking data to the audio engine, they can keep sounds anchored in the listener's space.
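A sketch of just the coordinate math involved, assuming the headset reports orientation as a unit quaternion in (w, x, y, z) order; the HRTF lookup itself is left abstract:

```python
import numpy as np

def to_head_relative(source_pos, head_pos, head_quat):
    """Map a world-space source position into the listener's frame.

    head_quat -- unit quaternion (w, x, y, z) from head tracking.
    The HRTF must be chosen for the *head-relative* direction, so this
    is recomputed every frame as the listener turns.
    """
    w, x, y, z = head_quat
    # Rotation matrix corresponding to the head orientation
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    # The inverse rotation (transpose, since R is orthonormal) takes
    # world-space vectors into head space
    rel = R.T @ (np.asarray(source_pos, float) - np.asarray(head_pos, float))
    distance = float(np.linalg.norm(rel))
    return rel / distance, distance  # unit direction for the HRTF lookup
```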
Simulating Distance
HRTFs convey a sound's direction, but not its distance. Humans rely on several cues to judge how far away a sound source is, and each of them can be simulated in software:
Loudness
This is perhaps the easiest cue to simulate and the most reliable one we have. Developers can simply attenuate the sound based on the distance between the listener and the source.
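A sketch of one common attenuation model, an inverse-distance rolloff; real engines offer several curves, and `ref_dist` here is an illustrative parameter:

```python
def distance_gain(distance, ref_dist=1.0):
    """Inverse-distance rolloff: the level halves with each doubling of
    distance beyond ref_dist, and never amplifies closer than that."""
    return ref_dist / max(distance, ref_dist)

# Usage: scale the (already spatialized) signal by the gain
# attenuated = distance_gain(distance) * stereo_signal
```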
Initial Time Delay
This cue, the gap between the arrival of the direct sound and its first reflection, is much harder to replicate: it requires computing the early reflections for a given set of geometry and its surface characteristics, which is computationally expensive and architecturally complicated.
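To make the idea concrete, here is a toy image-source calculation for a single floor reflection; real systems trace many surfaces with their material properties, which is where the cost comes from:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def initial_time_delay(source, listener, floor_y=0.0):
    """Gap (in seconds) between the direct sound and its first floor
    reflection, using the image-source trick: mirroring the source
    through the floor gives the reflected path length directly."""
    sx, sy, sz = source
    direct = math.dist(source, listener)
    # Mirror the source through the floor plane y = floor_y
    image = (sx, 2 * floor_y - sy, sz)
    reflected = math.dist(image, listener)
    return (reflected - direct) / SPEED_OF_SOUND
```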
Direct vs. Reverberant Sound
Also known as the wet/dry mix, this is the output of any system that models late reverberation and reflections accurately: as a source moves away, the direct sound fades while the reverberant energy stays roughly constant. Such systems are usually very expensive.
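A crude sketch of the idea, with the reverb itself treated as a black box; the direct signal fades with distance while the reverberant level holds roughly steady, so the mix drifts wetter as the source recedes (the 0.3 wet gain is illustrative):

```python
def wet_dry_mix(dry, wet, distance, ref_dist=1.0):
    """Blend direct and reverberant versions of a signal so the ratio
    shifts toward the wet side with distance.

    dry -- the direct-path signal (e.g. already HRTF-filtered)
    wet -- the same signal after a reverb; assumed equal length
    """
    direct_gain = ref_dist / max(distance, ref_dist)  # fades with distance
    reverb_gain = 0.3                                 # roughly constant
    return direct_gain * dry + reverb_gain * wet
```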
Motion Parallax
This one is a byproduct of the velocity of a sound source: nearby sources sweep across our field of hearing quickly, while distant ones appear to move slowly, so apparent angular speed hints at distance. It falls out automatically when source positions are updated every frame.
High Frequency Attenuation
HF attenuation due to air absorption is a subtle effect, but it is quite easy to model with a simple low-pass filter whose slope and cutoff frequency are adjusted with distance. It is not as important as the other distance cues, but it shouldn't be ignored.
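A minimal sketch of such a filter: a one-pole low-pass whose cutoff falls with distance (the 20 kHz reference and 50 m constant are illustrative, not measured absorption data):

```python
import numpy as np

def air_absorption(signal, distance, sample_rate=48000):
    """One-pole low-pass with a distance-dependent cutoff, a cheap
    stand-in for high-frequency air absorption."""
    # Cutoff drops as the source gets farther away
    cutoff = max(20000.0 / (1.0 + distance / 50.0), 1000.0)  # Hz
    # Standard one-pole smoothing coefficient for that cutoff
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff / sample_rate)
    out = np.empty(len(signal))
    acc = 0.0
    for i, x in enumerate(signal):
        acc += alpha * (x - acc)  # y[n] = y[n-1] + alpha * (x[n] - y[n-1])
        out[i] = acc
    return out
```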
For more information, see the Oculus developer website.