Opinion: generating immersive sound experiences
14 August 2017
Wilfried Van Baelen, CEO at Auro Technologies, writing exclusively for Installation, says producing 3D audio with different approaches is actually possible.
In music and movies, stereo was all the rage until surround sound appeared on the scene in the 1980s. Today, sound engineers and artists are much more interested in a format called ‘immersive sound’, the generic term I introduced in 2010 at the Tokyo AES Spatial Conference meaning ‘surround sound with height’ or three-dimensional sound.
Although this description is technically correct, there is much more to immersive sound than it suggests.
The appeal of immersive sound is that listeners can finally hear audio reproduced with more depth, transparency, and definition in the localisation of sources. Surround sound formats are only two-dimensional: their speakers are positioned in the horizontal plane around the listener and allow sound to be produced from left to right and from front to back. Immersive sound formats add a third dimension, allowing sound to also be produced in the vertical axis to create a hemisphere of sound around the listener.
The challenge, of course, is reproducing the immersive sound format across various speaker set-ups in a way that is true to life. Additional speakers do not automatically make sound more natural, and the opposite may be true depending on the technology used or creative choices being made.
Object-based vs. channel-based encoding
From a technical perspective, the methods used to reproduce sounds in a three-dimensional environment can be channel-based, object-based or even scene-based. Each comes with pros and cons, and all have been around for 25 years, contrary to what some companies lead consumers to believe (especially in the case of object-based technology). However, combining these technologies into a ‘hybrid’ format is a newer approach, and based on the different ways this is being done, I think all immersive sound formats like Auro-3D, Dolby Atmos, DTS:X and MPEG-H should be classified as such. Marketing departments are misleadingly labelling formats as ‘object-based’ although the channel-based part is still very substantial (i.e. the majority of the sound energy in an average movie mix in a so-called ‘object-based’ format is still channel-based).
So what’s the difference? Dolby Atmos and DTS:X use a two-dimensional channel-based format (5.1/7.1 surround) and need object-based technology to position sound in the vertical axis. Although Auro-3D also permits use of this technology, its hybrid format is based on a three-dimensional channel-based setup that allows the reproduction of a 3D space independent of object-based technology. The channel-based approach has many advantages for delivering the immersive sound experience the creators intended in the most efficient way.
Another misunderstanding is that object-based technology can reproduce the position of the source sounds (the objects) more precisely than channel-based technology. While the metadata of each object describes its localisation, the renderer at playback can only reproduce the sound through speakers at fixed physical locations, hence the importance of speaker layout.
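To illustrate why the speaker layout matters to a renderer, here is a minimal sketch in Python. It is a toy panner, not the renderer of Auro-3D, Dolby Atmos, DTS:X or any other real format. The function names, the `sharpness` parameter and the speaker angles are all assumptions made for this example.

```python
import math

def direction(azimuth_deg, elevation_deg):
    """Unit vector for a direction given as azimuth/elevation in degrees."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))

def render_gains(obj, speakers, sharpness=4):
    """Toy panner (hypothetical): weight each speaker by how closely it
    points towards the object, then normalise to constant power."""
    target = direction(*obj)
    raw = {}
    for name, position in speakers.items():
        dot = sum(t * s for t, s in zip(target, direction(*position)))
        # Speakers facing away from the object contribute nothing.
        raw[name] = max(0.0, dot) ** sharpness
    norm = math.sqrt(sum(g * g for g in raw.values()))
    if norm == 0.0:
        raise ValueError("no speaker points towards this object")
    return {name: g / norm for name, g in raw.items()}

# Illustrative speaker angles (azimuth, elevation); assumptions only,
# not taken from any format specification.
HORIZONTAL_ONLY = {
    "L": (30, 0), "R": (-30, 0), "C": (0, 0),
    "Ls": (110, 0), "Rs": (-110, 0),
}
WITH_HEIGHT_LAYER = {
    **HORIZONTAL_ONLY,
    "HL": (30, 30), "HR": (-30, 30),
    "HLs": (110, 30), "HRs": (-110, 30),
}

ear_level_object = (30, 0)   # metadata: front left, at ear level
raised_object = (30, 40)     # same azimuth, raised 40 degrees

for layout_name, layout in (("horizontal only", HORIZONTAL_ONLY),
                            ("with height layer", WITH_HEIGHT_LAYER)):
    for obj in (ear_level_object, raised_object):
        gains = render_gains(obj, layout)
        print(layout_name, obj, {k: round(v, 3) for k, v in gains.items()})
```

In this toy model, raising the object scales every ear-level speaker's weight by the same factor, so after normalisation the horizontal-only layout produces identical gains for both objects and the height metadata has no audible effect. Once a height layer is present, the raised object is weighted towards the height speakers.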
Channel-based technology can even deliver more spatial precision than object-based technology on a similar speaker layout, because the ‘objects’ in object-based formats are typically mono or stereo sounds, which don’t contain the 3D reflections that are crucial for reproducing a natural 3D sound experience. Those 3D reflections have a time component in both the horizontal and vertical axes. The time component in the vertical axis is not as flexible as object-based formats lead the market to believe and can be better preserved in a true 3D channel-based format.
It’s all about the speaker setup
Most audio set-ups on the market have only two overhead speakers (as in 5.1.2, 7.1.2 or 9.1.2 layouts), which cannot reproduce a true 3D space. This is to blame for much of the confusion related to object-based and channel-based audio. Many people incorrectly believe that object-based audio is the only approach to generating immersive sound experiences, despite the fact that channel-based methods are still widely used across the entertainment and music sectors.
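For readers unfamiliar with labels such as 7.1.2, the sketch below shows how they are commonly read: ear-level speakers, then subwoofer channels, then overhead speakers. The `Layout` type and `parse_layout` function are hypothetical names introduced for this example. Treating a two-part label such as 5.1 as having no overhead speakers is an assumption that holds for conventional surround naming, but not necessarily for every vendor's naming.

```python
from typing import NamedTuple

class Layout(NamedTuple):
    ear_level: int    # speakers in the horizontal plane around the listener
    subwoofers: int   # low-frequency effects channels
    overhead: int     # speakers above the listener

def parse_layout(label: str) -> Layout:
    """Split a label such as '7.1.2' into its three speaker counts."""
    parts = [int(part) for part in label.split(".")]
    if len(parts) == 2:
        # Assumption: a two-part label names no overhead speakers.
        parts.append(0)
    if len(parts) != 3:
        raise ValueError(f"unrecognised layout label: {label!r}")
    return Layout(*parts)

for label in ("5.1", "5.1.2", "7.1.2", "9.1.2"):
    print(label, parse_layout(label))
```

Read this way, each of the three-part layouts named above has exactly two overhead speakers, regardless of how many speakers sit at ear level.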
With a carefully chosen speaker layout, a channel-based system is better suited to capturing and reproducing the crucial information that makes up the natural sound field, including information about the original recording environment, the size and distance of the sources, and many more elements.
That’s why all immersive sound formats include a channel-based component (2D in Dolby Atmos or DTS:X and 3D in Auro-3D) that carries this crucial information alongside the objects.
While it’s important to know the differences between these two approaches, and the types of projects each is best suited to produce, it’s equally imperative for the audio community to understand that the two are not mutually exclusive.
In the meantime, 3D sound continues to exert great influence on the technology behind our entertainment media, and has likewise enhanced much of this content to a level where entertainment can be truly immersive. And while all advances in immersive technology and audio are captivating by nature, it’s important that the audio community tries to understand both the pros and cons of each approach and how each can best be used. At the end of the day, what really matters is the final experience of the listener, rather than the technique used to achieve that experience.