How machine learning can benefit the pro-AV and broadcast markets

Increasingly, organisations are experimenting with machine learning (ML) and greater adoption is taking hold. In fact, according to a McKinsey survey, 39% of organisations have already implemented some form of ML in their business.

While this adoption is still relatively nascent, the prospect of improved efficiencies, customer behaviour prediction, and insightful business intelligence – among other benefits – makes this an appealing technology for organisations across the board.

The pro AV and broadcast markets are no exception. ML is already stimulating new usage models and revenue streams for organisations in this space, not to mention cost savings. Here are four examples of how pro AV and broadcast companies can apply ML.

Region-of-Interest (ROI) Encoding

Streaming and storage costs for large video files and UHD content can quickly add up. Region-of-Interest (ROI) encoding can ease this by reducing the overall bitrate of the content while applying the best video quality (VQ) to the areas the eye is naturally drawn to, particularly faces and people, and lowering the VQ in less important areas such as backgrounds.

The perceived overall quality is still good when viewed naturally, but the output bitrate of the encoder could be reduced from, for example, 5Mbps to 1.5Mbps. That is a 70% saving in bitrate, which translates directly into a 70% saving in streaming costs. For a typical stream to 10,000 viewers, this could save over $700/hour!
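The arithmetic behind an estimate like this can be sketched in a few lines of Python. The article does not state a delivery price, so the $0.05 per gigabyte used below is purely an assumption for illustration; real CDN pricing varies by provider, region and volume, and the function names are hypothetical.

```python
def bitrate_reduction(old_mbps, new_mbps):
    """Fractional bitrate reduction, e.g. 0.7 for a 70% saving."""
    return (old_mbps - new_mbps) / old_mbps


def streaming_saving_per_hour(old_mbps, new_mbps, viewers, price_per_gb):
    """Estimate the hourly delivery-cost saving from a lower encoder bitrate.

    price_per_gb is an assumed delivery price, not a figure from the article.
    """
    saved_mbps = old_mbps - new_mbps
    # Megabits saved per viewer over one hour, converted to gigabytes
    # (8 bits per byte, 1,000 megabytes per gigabyte).
    saved_gb_per_viewer = saved_mbps * 3600 / 8 / 1000
    return saved_gb_per_viewer * viewers * price_per_gb


if __name__ == "__main__":
    print(f"Bitrate reduction: {bitrate_reduction(5.0, 1.5):.0%}")
    saving = streaming_saving_per_hour(5.0, 1.5, viewers=10_000, price_per_gb=0.05)
    print(f"Estimated saving: ${saving:,.2f} per hour")
```

Under that assumed price, the saving works out at a little under $800 per hour, which is consistent with the "over $700/hour" figure; a different price per gigabyte scales the result proportionally.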

The same is true for media storage costs. A 2TB high-throughput drive provisioned in the cloud can cost ~$1,000/month. Using ROI to reduce the encoder output bitrate by 70% means that a smaller and cheaper drive can be provisioned or, more likely, that much more video content can be stored on the same drive.
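As a rough illustration of the storage side, the sketch below estimates how many hours of video fit on a drive at a given bitrate. It assumes a constant bitrate, decimal units (1TB = 1,000,000MB) and no allowance for audio, container or file-system overhead, so the numbers are indicative only; the function name is hypothetical.

```python
def hours_of_video(drive_tb, bitrate_mbps):
    """Hours of video that fit on a drive at a constant encoder bitrate."""
    # Drive capacity in megabits: 1 TB = 1,000,000 MB, 8 bits per byte.
    drive_megabits = drive_tb * 1_000_000 * 8
    seconds = drive_megabits / bitrate_mbps
    return seconds / 3600


if __name__ == "__main__":
    before = hours_of_video(2, 5.0)
    after = hours_of_video(2, 1.5)
    print(f"At 5 Mbps:   {before:,.0f} hours")
    print(f"At 1.5 Mbps: {after:,.0f} hours ({after / before:.1f}x more content)")
```

Under these assumptions, the same 2TB drive holds a little over three times as much content at 1.5Mbps as it does at 5Mbps.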

ROI can also be used to preserve detail in the most important areas in control room applications. For example, if an incident is monitored on a large video wall, it’s important that details can be accurately discerned during the follow-up investigation, and that the footage is usable for training so that mistakes can be learned from and action plans improved. This means preserving high VQ in two kinds of area: text overlays (e.g. clocks), using static co-ordinates for ROI encoding, and faces or people, using dynamic, ML-based co-ordinates.
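One way to picture the combination of static and dynamic regions is as a per-block quality map handed to the encoder for each frame. The sketch below is illustrative only: the 16-pixel block size, the quantisation-parameter (QP) offset values and all names are assumptions, and real encoders expose ROI control through their own interfaces.

```python
from dataclasses import dataclass

BLOCK = 16  # assumed block size in pixels; real encoders define their own


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int


def build_qp_offset_map(frame_w, frame_h, static_rois, dynamic_rois,
                        roi_offset=-6, background_offset=4):
    """Return a 2D list of per-block QP offsets (negative = higher quality)."""
    cols = (frame_w + BLOCK - 1) // BLOCK
    rows = (frame_h + BLOCK - 1) // BLOCK
    qp_map = [[background_offset] * cols for _ in range(rows)]
    for roi in list(static_rois) + list(dynamic_rois):
        # Clamp the rectangle to the frame, then mark every block it touches.
        x0, y0 = max(roi.x, 0), max(roi.y, 0)
        x1 = min(roi.x + roi.w, frame_w)
        y1 = min(roi.y + roi.h, frame_h)
        if x1 <= x0 or y1 <= y0:
            continue  # rectangle lies entirely outside the frame
        for row in range(y0 // BLOCK, (y1 - 1) // BLOCK + 1):
            for col in range(x0 // BLOCK, (x1 - 1) // BLOCK + 1):
                qp_map[row][col] = roi_offset
    return qp_map


if __name__ == "__main__":
    # Static ROI: a clock overlay at fixed co-ordinates (hypothetical values).
    clock_overlay = [Rect(1700, 20, 200, 60)]
    # Dynamic ROI: in practice these boxes would come from an ML detector
    # on every frame; here one face box is hard-coded for illustration.
    detected_faces = [Rect(600, 300, 160, 200)]
    qp_map = build_qp_offset_map(1920, 1080, clock_overlay, detected_faces)
    roi_blocks = sum(row.count(-6) for row in qp_map)
    total_blocks = len(qp_map) * len(qp_map[0])
    print(f"{roi_blocks} of {total_blocks} blocks marked as ROI")
```

The static list stays the same from frame to frame, while the dynamic list is refreshed as people move, so the map would be rebuilt whenever the detections change.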

Intelligent Digital Signage

Targeted advertising is the holy grail for marketers. By using various ML models to analyse the audience in front of a digital sign, it’s possible to serve more relevant and targeted ads based on metrics like age and gender. This makes the signage provider more attractive to advertisers, who will be willing to pay more for better ad presentation. It also generates valuable data for the advertiser, such as viewer interest, which can lead to improved usage of the service and provides monetisable feedback to the manufacturers they represent.

The viewer is also presented with more relevant and personalised ads, for example ones suggesting goods and services they are more likely to be interested in, which improves their overall shopping experience. Alternative ML models can be used in interactive kiosks, replacing touch screens with gesture control to move to the next ad or, in particular, to place orders. The poor hygiene of touch screens in fast food ordering has been highlighted in the press, so switching from physical contact to gestures makes ordering much cleaner and healthier for the customer.

Object Tracking & Windowing

Face detection using ML can be applied in other ways too. Imagine live-streaming a panel discussion about an artist’s work at a local college. This is a low-budget event with a niche audience, so the production budget is going to be very small. A single camera will typically be in use, capturing the whole panel with occasional zooming and panning.

Using ML, it’s possible to have a static 4K camera capture the whole panel, but automatically create extra lower resolution HD windowed outputs around each of the panellists and track them through the conversation. So, from a single 4K camera, it’s possible to have four different output shots to switch between during the live stream – the wide angle and three close-ups. This creates more visual interest and doesn’t require any extra camera equipment to set up – the camera operator can become the video mixer and simply select which frames to stream.
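The windowing step itself is simple geometry once a tracker has supplied a position for each panellist. The sketch below assumes a 3840x2160 source and 1920x1080 output windows, takes face-centre co-ordinates as given (in practice they would come from an ML detector), and uses hypothetical function names; the smoothing factor is likewise an illustrative choice.

```python
def crop_window(center_x, center_y, frame_w=3840, frame_h=2160,
                out_w=1920, out_h=1080):
    """Return (left, top, width, height) of an HD window centred on a point.

    The window is shifted where necessary so it never leaves the 4K frame.
    """
    left = int(center_x) - out_w // 2
    top = int(center_y) - out_h // 2
    left = max(0, min(left, frame_w - out_w))
    top = max(0, min(top, frame_h - out_h))
    return left, top, out_w, out_h


def smooth(previous, current, alpha=0.2):
    """Exponential smoothing so the window does not jitter with each detection."""
    return previous + alpha * (current - previous)


if __name__ == "__main__":
    # Hypothetical face centres for three panellists in the 4K frame.
    panellists = {"left": (700, 900), "centre": (1920, 950), "right": (3300, 880)}
    for name, (cx, cy) in panellists.items():
        print(name, crop_window(cx, cy))
```

Each returned rectangle describes one close-up output; the untouched wide shot is simply the full frame, scaled down if an HD output is needed.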

This approach can be applied, with various ML tracking models, in professional broadcast applications such as sports coverage or in collaboration environments where multiple video conferencing attendees can be tracked automatically.

Speech Recognition

Looking at a different area of ML, it’s also possible to perform speech recognition using natural language processing (NLP) models. This is already apparent in the home, with Alexa, Google and other smart devices that can respond to commands, present information and media, or control aspects of the house. With NLP built into devices, the same capabilities can be applied in professional media, making equipment set-up quicker and less complicated. Because the processing happens on the device, no cloud connection is required, which also removes the need for any related subscription services to perform the same task.

Additionally, it’s possible to automatically transcribe meeting notes using speech-to-text algorithms and summarisation models. It’s also possible to perform regional translation with the potential of almost real-time subtitles in any language, which again could be applied to video conferencing applications, or to more traditional closed-caption systems in broadcast and cinema.

Customers can take advantage of these ML capabilities on Xilinx devices, including the Zynq UltraScale+ MPSoC platform, for AI edge processing. Processing directly at the edge, and without needing a network connection, has tremendous benefits in terms of low-latency performance, and could even be useful in overcoming many concerns around privacy and storage of identification metrics in the cloud. Incorporating these ML capabilities into Xilinx’s adaptable platforms means organisations can monetise analytics, improve workflow efficiency and enhance usability. Ultimately, these integrated ML features allow companies to increase innovation, differentiate themselves and accelerate time-to-market.

Rob Green is senior manager Pro AV and Broadcast at Xilinx
