Bonum Commune Communitatis: Standardizing Machine Learning-Based Video Coding Solutions (How Machine Learning is Used in Video Coding, Part 6)


At this point in the series, I think we can all agree that machine learning will play a key role in the future of video coding. Whether used as an alternative to a standard codec component or in an end-to-end fashion, machine learning will be there for sure.

Standardization is crucial for video coding, as it is for any application on the Internet: video is consumed in every region of the world on hundreds, if not thousands, of different device types. The data must therefore follow a strict, standardized format to overcome this extreme heterogeneity. This is why standards committees such as the Moving Picture Experts Group (MPEG), the Joint Collaborative Team on Video Coding (JCT-VC), the Joint Video Experts Team (JVET), and the Joint Photographic Experts Group (JPEG) exist.

JPEG and MPEG are organized under ISO/IEC JTC 1/SC 29 (Coding of audio, picture, multimedia and hypermedia information). MPEG focuses on standards for multimedia coding, such as video and audio compression, file formats for applications, and transmission. JPEG, on the other hand, addresses the same aspects for still images. The role of JCT-VC and JVET is a bit different: they are joint teams of ITU-T VCEG and ISO/IEC MPEG, formed to design specific video coding standards, namely High Efficiency Video Coding (HEVC) in the case of JCT-VC and Versatile Video Coding (VVC) in the case of JVET.

These standards committees have focused on improving the performance of video coding solutions over the past decades. Now that machine learning is increasingly used in video coding, they have started forming new groups dedicated to these approaches.

JPEG-AI became an official work item in 2021. It aims to provide a learning-based image compression method that targets better visual quality and significantly higher compression efficiency compared to existing image coding standards. In addition, image coding for machines is being considered, so that compressed images can also serve applications such as image processing and computer vision tasks.

MPEG also has an open group on neural network compression, since the efficient transmission of machine learning models will play a key role in video streaming. The group is relatively new; its motivation was the growing importance of machine learning-based tools for applications such as video encoding, classification, and descriptor extraction from video content. The first version of the neural network compression standard was released in 2021, and version 2 is on the way.

In addition, MPEG has an exploration group on video coding for machines. Existing video codecs are designed for human consumption. However, most video today is analyzed by machines, and standard codecs are not well suited to delivering video to machines. The MPEG activity on Video Coding for Machines (VCM) aims to standardize a bitstream format generated by compressing both a video stream and previously extracted features that will be used in machine vision tasks.

There is also Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI), an organization independent of MPEG that aims to develop standards for AI-based data coding.

AI-Based End-to-End Video Coding (MPAI-EEV) is an MPAI project focused on end-to-end video coding with machine learning. Its goal is to develop a method capable of compressing video using ML-based end-to-end data coding technologies, without the constraints of previous video coding standards. Another MPAI project, AI-Enhanced Video Coding (MPAI-EVC), focuses on improving the performance of traditional video codecs by replacing some of their components with machine learning-based methods.

ML-based video coding standards and their start dates

That wraps up this series of blog posts. We started by introducing what video encoding is and how video is delivered via HTTP Adaptive Streaming. We then discussed how machine learning can be used to improve video codec performance and the visual quality of decoded videos, and to provide end-to-end encoding solutions. Finally, we closed the series with this article on the ongoing standardization work on ML-based video coding. I hope you enjoyed reading it, and that it gave you a glimpse into the wide world of video coding.


Ekrem Çetinkaya obtained his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. degree at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networks.

