There are many technologies that, when they first appear, seem so futuristic that it is difficult to imagine them ever becoming reality. Artificial intelligence (AI) is a shining example. While people today do not give a second thought to asking their smart speakers to time a boiled egg or put on a music playlist, such interactions were once very much the stuff of science fiction. AI – and the associated field of machine learning (ML) – forms the basis of search engines, recommendation lists and targeted advertising, as well as the ubiquitous Alexa and Echo, with industrial processes and scientific research also relying on it for automating repetitive tasks or adapting existing procedures.
In the audio sector, AI helped enable motorised faders and instant recall of settings on mixing consoles in the 1970s. But in those early days the underlying technology was less important than the results it produced. Even in the more tech-obsessed 90s, CEDAR Audio decided not to publicise the fact that its DH1 dehisser, launched in February 1994, was based on ML. “Some of our engineers recommended we didn’t talk about AI or ML in our product information because, at a time when many people still thought real-time audio restoration was impossible, it might sound like technobabble,” recalls MD Gordon Reid.
Today, the terms are heard in general conversation, but there continues to be a degree of confusion over what AI and ML actually are and how they relate to each other. AI is a branch of computer science that aims to emulate human thought and enable computers to perform tasks that would usually require human intelligence. In practice it serves as an umbrella term for more specialised fields such as robotics, deep learning, intelligent control and data mining. ML is a subset of AI but, rather than following explicitly programmed rules, it takes a statistical approach and learns patterns from data.
By the 2010s, more audio companies were not only exploiting, but also actively promoting the possibilities of AI and ML. CEDAR had continued development using the technologies and in 2012 introduced the DNS 8 Live dialogue noise suppressor, which included a more advanced ML noise estimator. Designed for both live sound and live broadcast, it was used for cleaning up in-ear monitor sound for Sam Smith’s 2018 tour and open-air shows by Robbie Williams in 2019 and 2020.
Gordon Reid adds that the unit, and its successor the DNS 8D, have also been used at events not originally considered by the designers, including political rallies and conferences. “These can have a lot of ambient noise, which makes it hard for in-house sound engineers to turn up the PA system,” he comments. “By reducing or eliminating the noise and spill from the crowd or audience, you can get much better results.”
Biamp is also using AI/ML for noise reduction, although executive vice president of corporate development Joe Andrulis says there are several areas where it can be applied in commercial audio. “AI is a fascinating technology,” he comments. “It’s come into its own this last couple of years in not just the audio industry but almost every area of life. Within our product line we have the Launch functionality, which is able to automatically tune our systems to match the kind of calibration quality you might get from an audio expert.”
Launch is a signal processing system that can configure systems, detect connected devices, profile room acoustics and establish settings. This is now complemented by a recently introduced noise reduction algorithm that, Andrulis says, relies heavily on ML techniques. “Traditional noise reduction algorithms are very good at [dealing with] systematic noise,” he explains. “Things like hums and air conditioners that change gradually over time. They’re not particularly effective at impulse noise sources, such as dropping a ball, crinkling a packet of potato chips or typing on keyboards. Those happen too fast for an algorithm to create a statistical model and then filter them out. AI learning approaches the problem entirely differently because AI algorithms are effectively very fancy and accurate classifiers. You can ask them questions about whether something belongs to a particular group – in this case, does it belong to the group of human speech? If it does not, then exclude it.” The new noise reduction algorithm will be implemented in the Parle range of conferencing audio/video bars, launched last November, with products due to ship in the coming months.
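The classify-then-gate idea Andrulis describes can be sketched in a few lines. The snippet below is a hypothetical illustration, not Biamp’s algorithm: a toy spectral-flatness test stands in for a trained speech classifier (tonal, speech-like frames have low flatness, while broadband impulses are nearly flat), and frames the classifier rejects are simply zeroed.

```python
import numpy as np

def frame_signal(x, frame_len=256):
    """Split a 1-D signal into non-overlapping frames."""
    n = len(x) // frame_len
    return x[: n * frame_len].reshape(n, frame_len)

def is_speech(frame, threshold=0.5):
    """Placeholder classifier. A real system would run a trained model
    here; spectral flatness (geometric mean / arithmetic mean of the
    magnitude spectrum) is a toy stand-in: near 0 for tonal content,
    near 1 for broadband noise or impulses."""
    spec = np.abs(np.fft.rfft(frame)) + 1e-12
    flatness = np.exp(np.mean(np.log(spec))) / np.mean(spec)
    return flatness < threshold

def gate_non_speech(x, frame_len=256):
    """Zero every frame the classifier rejects; pass the rest through."""
    frames = frame_signal(x, frame_len)
    return np.concatenate(
        [f if is_speech(f) else np.zeros_like(f) for f in frames]
    )
```

A tonal 440 Hz signal survives the gate, while white noise (standing in for an impulse-like, broadband event) is removed entirely; a production system would of course crossfade rather than hard-zero frames.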
Other manufacturers exploring the possibilities for AI/ML include Crestron, which is using them for signal processing and cloud-based voice recognition, as well as de-noising. “The beauty of ML is that we can rely on processing cycles to look for trends to automatically optimise audio in our key use cases,” says Ekin Binal, director of product management for audio at Crestron. “Our first use of AI/ML in audio recently launched [as part of] our partnership with Shure to run its IntelliMix Room software DSP on Crestron Flex Unified Communication systems. This leverages AI and ML with a denoiser solution, which can tell the difference between noises and speech. It reduces the noise with virtually no audible effect on the speech, even when the two overlap.”
Several commercial audio and AV companies are now engaging in specific AI/ML research. When it opened its G Innovation Lab last year, Genelec stated that “developments in ML, AI and service-based business models” meant that “new and creative ways of thinking” will be essential in the future. The company’s research and development director, Aki Mäkivirta, comments that the technologies “will not be limited to certain areas of application” but be used for a “wide spectrum” of tools and methods in various segments of the audio market. “It is likely that there will be more intelligent aids and adjustments that can speed up the human production process,” he says. “AI/ML adoption can actually happen without much fuss, almost in the background, only becoming visible via clever controls.”
The potential of AI/ML is not only in individual processes but also as part of a much wider technical environment through the Internet of Things (IoT). The term was coined around the turn of the century but, as author Tom Chatfield noted in his 2011 book 50 Digital Ideas You Really Need to Know, it was only by the 2010s that the concept was becoming a real possibility. The IoT, according to Chatfield, is envisioned as a network of smart computer chips incorporated into a wide variety of scenarios, including electrical systems, domestic appliances and buildings.
Gordon Reid at CEDAR observes that the IoT has huge potential and is something the company began to consider after starting to research blind source separation (BSS) in 2008. BSS is a technology that seeks to separate the sources in a mixed signal containing audio coming from multiple directions. The main example of a BSS application is attempting to solve the cocktail party problem. As people get older, their ability to hear and understand one person speaking in a noisy environment diminishes. This has created a market for devices comprising a small, low-cost microphone array and associated signal processing for separating the sources, which allows the listener to select what they want to hear.
Reid says that when CEDAR began testing BSS system prototypes, the technology did not appear to fit well into its existing product lines. As a result, AudioTelligence was established as a spin-off company to continue the research. It now produces systems for assistive listening that can be used with hearing aids and smartphones. Opportunities are also seen in the in-car ‘infotainment’ market, smart TVs, voice over IP communications, home assistants and domestic appliances. “You don’t want your fridge ordering milk because it hears someone doing that on a TV programme,” Reid explains. “But if you do want to order a pint of milk, the appropriate device should be able to isolate and understand the command even if there is a lot of noise from radios, TVs or even your children running around screaming in the background.” AudioTelligence and CEDAR jointly developed Isolate, which was launched last year and is currently deployed in CEDAR’s Trinity 5 audio surveillance and intelligence gathering technology.
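The cocktail-party problem described above is classically attacked with independent component analysis (ICA), one family of BSS techniques. The sketch below is a minimal, numpy-only FastICA under the textbook assumptions (as many microphone mixtures as sources, a linear instantaneous mix); it is purely illustrative and unrelated to CEDAR’s or AudioTelligence’s proprietary methods.

```python
import numpy as np

def whiten(X):
    """Centre and decorrelate the mixtures (standard ICA preprocessing)."""
    X = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(np.cov(X))
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ X

def fastica(X, n_iter=200, seed=0):
    """Bare-bones symmetric FastICA: recover statistically independent
    sources from linear mixtures, using the tanh non-linearity."""
    Z = whiten(X)
    n, m = Z.shape
    W = np.random.default_rng(seed).standard_normal((n, n))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # fixed-point update: E{g(wZ)Z^T} - E{g'(wZ)} w
        W_new = (G @ Z.T) / m - np.diag((1.0 - G**2).mean(axis=1)) @ W
        # symmetric decorrelation: W <- (W W^T)^{-1/2} W
        d, E = np.linalg.eigh(W_new @ W_new.T)
        W = E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ W_new
    return W @ Z  # estimated sources, up to sign and permutation
```

Mixing a sine wave with an independent square wave and feeding the two mixtures to fastica recovers both sources almost perfectly, up to the sign and ordering ambiguity inherent in BSS; real acoustic scenes involve convolutive (reverberant) mixing, which needs considerably more machinery.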
Biamp’s Joe Andrulis describes the IoT as being about “creating almost every network connected device as an endpoint of some type”, which he says could be an input, an output or both. “By input it could just be some sort of sensor. So almost every speaker becomes an opportunity to go and collect data. That massive data is not going to be processed by a human being trying to figure it all out; it would overwhelm anybody. But AI algorithms eat it for lunch. They’re very efficient in processing vast amounts of data and quickly delivering meaningful, actionable insights from it.”
A specific audio example of this is the Crestron XiO cloud platform, which allows users to check the status of network amplifiers and control or manage updates. “There is a ton of potential,” says Ekin Binal. “Within a conference room we can capture occupancy usage, video playback, mic status and more to build insight into the entire environment and create intelligent dashboards that provide our customers with the ability to see how rooms and sites are performing and being used.”
At Genelec, Aki Mäkivirta observes that the IoT is “increasing in importance” as the methodology for enabling device awareness, management and control. “What precisely will happen within pro audio will depend on future standardisation and adoption of IoT technologies for our market,” he says. “We have to keep in mind that consumer IoT technologies may not be suitable for mission critical professional situations. More powerful technologies may have to be adopted.”
As for the future of AI/ML, Mäkivirta comments that while it shows “great promise”, the benefit “largely depends on creating either the representative AI training material or sufficiently detailed models of human psychoacoustics.” Ekin Binal observes that the future of AI/ML is “all about improving every experience” and while it has endless possibilities, the key will be to find solutions that provide direct value.
Joe Andrulis at Biamp concludes that AI/ML will help in ultimately creating “a natural, immersive AV experience that requires less and less understanding or explicit engagement from people to take full advantage of it.”
People are more accustomed to such concepts today, but it still all sounds like something from the future – or a vision of the future filtered through science fiction. The power and potential of artificial intelligence and machine learning are summed up by Gordon Reid: “When Mr Spock on the 1960s Star Trek asked the ship’s computer a question, people were amazed by the concept of a talking machine. But they really should have been amazed by the fact that the computer knew to listen for him asking the question rather than anyone else speaking at the same time.”