Pioneered by Apple and popularised by Microsoft in April 1992 with Windows 3.1, graphical user interfaces – GUIs – were the first attempt to make computing technology accessible to the masses. Remarkably, given the widespread perception of touchscreens as a new technology, the HP-150 was the first commercially available PC to offer one as an option, nearly a decade before Windows 3.1. The Apple iPhone brought us the wonders of multitouch in 2007.
The quest, of course, has always been to make the electronic devices we use more intuitive. But: if we want something, what could be more intuitive than asking for it? Enter voice recognition. What has transformed its viability – to the point where we now take the likes of Alexa, Cortana and Siri for granted – has been the arrival of the necessary processing horsepower, coupled with advances in AI. Given the proliferation of voice-controlled devices in our homes, it’s apparent that conversing with a computer is no longer the stuff of sci-fi. But: is the technology now ready for commercial prime time – and will it displace interactive screens as our primary interface?
“The obvious advantage of voice as a UI is the ability to communicate with a computer using our own language,” believes Kevin Hague, vice president, technology strategy for Harman International. “Advances in machine learning have enabled computers to finally do a reasonable job understanding our questions. This will be key when interactive displays are in environments where people need specific data and are unfamiliar with how to use the display.”
“Voice is certainly going more mainstream as a user interface with devices and services, although it has largely been limited to interesting proofs of concept and demonstrations to date,” says Chris McIntyre-Brown, associate director, Futuresource Consulting. “The development in consumer devices is based upon the fast-growing smart speaker market and compatible devices, which are spreading throughout the home. Enterprise and customer-facing adoption of voice is not at this level, with manufacturers largely still evaluating voice and how it can be used. The challenge is to make it more than a gimmick – otherwise there may be some tentative adoption, but it has to have more appeal than touch.”
“Voice control systems have the potential to create huge growth in the interactive technologies market, thanks to the simplicity and speed of use they offer,” believes Joel Chimoindes, European commercial director at European AV distributor Maverick AV Solutions. “However, a long journey of research and development is required to ensure that voice activation systems work first time, every time, to ensure that users have complete confidence in their systems.”
“Users know and recognise touch now, but it has taken a long time to get there,” adds McIntyre-Brown. “Voice is not close to that level yet. Discoverability and awareness of voice, knowing the right wake-word and phrasing to interact with a virtual assistant is important. It will take time for users to reach this level of comfort and knowledge.”
Not as long
The long journey to which Chimoindes refers may not be as long as we think. CES earlier this year saw the debut of the Lenovo Smart Display – in effect, a Google Assistant-powered smart speaker with a built-in touchscreen. The device is sensitive to different voices – allowing personalised information to be displayed. There were similar products on show from Sony and LG, as well as from JBL (part of Harman) with its LINK View. Although these devices are currently aimed at consumers, it’s easy to envisage them becoming a universal piece of desk furniture.
With 128 million Alexa-enabled devices predicted to be in use by 2020, and Amazon ‘owning’ 70% of the US smart speaker market, Google may struggle to make inroads – although you underestimate the search behemoth at your peril. Amazon’s dominance hasn’t deterred IBM, however, which in March launched Watson Assistant. Described by the company as an AI enterprise assistant, it’s a service aimed at companies looking to build voice-activated capabilities for their own products. Where it seems to differ most from its competitors is that it is designed for ‘own-brand’ applications – so developers can specify their own wake-up command, or perhaps none at all. Given IBM’s strength in the corporate sector, the company could emerge as a serious contender.
A month after the Watson launch, IBM and Harman announced Voice-Enabled Cognitive Rooms – using IBM’s Watson AI technology and Harman AKG microphones, JBL speakers and AMX AV control and switching systems. The companies said they were targeting medical, corporate, hotel, cruise ship and hospitality applications.
Are Amazon’s Alexa for Business and IBM’s Watson game-changers? Many think so. Alexa for Business provides tools and resources for organisations to set up and manage Alexa devices at scale, enable private skills, and enrol users. It has obvious attractions in the unified communications space: a simple “Alexa, begin the meeting” could, for example, automatically start up the necessary devices and make the connection. IBM’s Watson holds similar potential.
Voice control isn’t entirely without its downsides, though. “The main points of conflict around voice control are definitely simplicity versus security,” says Chimoindes. “It has the potential to make audiovisual solutions easier to use, more accessible and more efficient. However: voice relies heavily on a device which is always listening, raising a huge heap of security issues and causing major concern for enterprises and commercial, retail, education and medical areas alike.”
Domestic users are also aware of the issue, despite Amazon’s attempts to allay their fears. Given current concerns about the harvesting and misuse of personal data, it’s hard to dismiss those fears as mere paranoia.
Natalie Harris-Briggs, vice president of marketing at workspace solutions company Avocor, sees other potential downsides.
“Speed and usability are the primary advantages for control via voice,” she thinks. “The user will be able to quickly gain access to applications and meetings just by using their voice, with no complicated adoption training required to experience the full capabilities of a system or solution.”
“The disadvantages,” she continues, “could be a poor user experience because of inadequate third-party microphone and audio devices. This is an issue that brands in the video collaboration area have already witnessed, and so they have implemented third-party certification, such as Microsoft with Skype for Business and Teams.”