ISCAS 2023

Abstract

Recently, there has been increasing interest in neural network-based video coding, in both academia and standardization. To foster research in this emerging field, we organized a grand challenge on neural network-based video coding in conjunction with ISCAS 2022 and received many responses from various universities and companies. To continue this activity and track progress year by year, we propose this challenge, i.e., the 2nd grand challenge on neural network-based video coding, held with ISCAS 2023. As in the 1st grand challenge, the submitted neural network-based coding schemes will be evaluated according to their coding efficiency and methodological innovations. Two tracks will be evaluated: hybrid solutions and end-to-end solutions. In the hybrid track, deep network-based coding tools shall be used together with a traditional video coding scheme. In the end-to-end track, the whole video codec shall be built primarily upon deep networks. Participants shall express their interest in this Grand Challenge by sending an email to the organizer Dr. Yue Li (yue.li@bytedance.com) and are invited to submit their proposals as ISCAS papers. The papers will be regularly reviewed and, if accepted, must be presented at ISCAS 2023. The submission instructions for Grand Challenge papers will be communicated by the organizers.

Rationale

In recent years, deep learning-based image/video coding schemes have achieved remarkable progress. As two representative approaches, hybrid solutions and end-to-end solutions have both been investigated extensively. The hybrid solution adopts deep network-based coding tools to enhance traditional video coding schemes, while the end-to-end solution builds the whole compression scheme upon deep networks. In spite of the great advancement of these solutions, numerous challenges remain to be addressed, e.g.:

  1. how to harmonize a deep coding tool with a hybrid video codec, e.g. how to take compression into consideration when developing a deep tool for pre-processing;
  2. how to exploit long-term temporal dependency in an end-to-end framework for video coding;
  3. how to leverage automated machine learning for network architecture optimization to achieve higher coding efficiency;
  4. how to perform efficient bit allocation with deep learning frameworks;
  5. how to achieve the global optimum in rate-distortion trade-offs, e.g., by taking the impact of the current coding step on later frames into account, possibly using reinforcement learning; and
  6. how to achieve better complexity-performance trade-offs.

In view of these challenges, several activities towards improving deep learning-based image/video coding schemes have been initiated. For example, there was a special section on “Learning-based Image and Video Compression” in TCSVT (July 2020) and a special section on “Optimized Image/Video Coding Based on Deep Learning” in OJCAS (Dec. 2021), and the “Challenge on Learned Image Compression (CLIC)” at CVPR has been organized annually since 2018. Meanwhile, JPEG started the JPEG-AI project targeting a neural network-based image compression standard, and JVET also started to explore neural network-based video coding technologies for a potential next-generation video coding standard. In the hope of encouraging more innovative contributions towards resolving the aforementioned challenges within the ISCAS community, we propose this grand challenge.

Requirements and Evaluation

Training Data Set

The following training data sets are recommended:

  1. UVG dataset: http://ultravideo.cs.tut.fi/
  2. CDVL dataset: https://cdvl.org/

Additional training data may also be used, provided that they are described in the submitted document.

Test Specifications

In the test, the proposals will be evaluated on multiple YUV 4:2:0 test sequences at a resolution of 1920x1080. There is no constraint on the reference structure. Note that a neural network must be used in the decoding process.

Evaluation Criteria

The test sequences will be released according to the timeline and the results will be evaluated with the following criteria:

  1. The decoded sequences will be evaluated in 4:2:0 color format.
  2. A weighted PSNR, computed as (6 * PSNR_Y + PSNR_U + PSNR_V) / 8, will be used to evaluate the distortion of the decoded pictures.
  3. The average Bjøntegaard delta PSNR (BD-PSNR), calculated using [1] over all test sequences, will be used to compare coding efficiency.
  4. An anchor, HM 16.22 [2] coded with QPs = {22, 27, 32, 37} under the random access configuration defined in the HM common test conditions [3], will be provided. The released anchor data will include the bit-rates corresponding to the four QPs for each sequence. The proposed method is required to generate four bit-streams for each sequence, targeting the anchor bit-rates corresponding to the four QPs. Additional constraints are listed as follows:
    A. For each sequence, the bit-rate difference between the anchor and the test at the lowest-rate point shall be less than 20%, and the bit-rate difference between the anchor and the test at the highest-rate point shall likewise be less than 20%.
    B. Only one single decoder shall be utilized to decode all the bitstreams.
    C. The intra period in the proposed submission shall be no larger than that used by the anchor in generating the validation and test sequences.
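To make criteria 2–4 concrete, the sketch below computes the weighted PSNR, a BD-PSNR via the standard cubic fit of PSNR versus log bit-rate following [1], and the endpoint bit-rate check of constraint A. This is an illustrative sketch only, not the official evaluation script; all function names and the rate-distortion numbers in the example are hypothetical.

```python
import numpy as np

def weighted_psnr(psnr_y, psnr_u, psnr_v):
    """Weighted PSNR over the three 4:2:0 components: (6*Y + U + V) / 8."""
    return (6.0 * psnr_y + psnr_u + psnr_v) / 8.0

def bd_psnr(rates_anchor, psnr_anchor, rates_test, psnr_test):
    """Average PSNR gain of the test over the anchor (in dB), using a
    third-order polynomial fit of PSNR vs. log10(bit-rate) per [1]."""
    log_ra = np.log10(rates_anchor)
    log_rt = np.log10(rates_test)
    # Fit cubic polynomials PSNR(log10 rate) for anchor and test.
    pa = np.polyfit(log_ra, psnr_anchor, 3)
    pt = np.polyfit(log_rt, psnr_test, 3)
    # Integrate both fits over the overlapping log-rate interval.
    lo = max(log_ra.min(), log_rt.min())
    hi = min(log_ra.max(), log_rt.max())
    int_a = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    int_t = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    # Average vertical gap between the two fitted curves.
    return (int_t - int_a) / (hi - lo)

def endpoint_rates_ok(rates_anchor, rates_test, tol=0.20):
    """Constraint A: the relative bit-rate gap at the lowest and highest
    rate points must be below 20% of the anchor bit-rate."""
    low_ok = abs(rates_test.min() - rates_anchor.min()) / rates_anchor.min() < tol
    high_ok = abs(rates_test.max() - rates_anchor.max()) / rates_anchor.max() < tol
    return low_ok and high_ok

# Example with made-up rate-distortion points (kbps, dB):
anchor_r = np.array([1000.0, 1800.0, 3200.0, 6000.0])
anchor_p = np.array([34.0, 36.0, 38.0, 40.0])
test_r = np.array([1000.0, 1800.0, 3200.0, 6000.0])
test_p = anchor_p + 0.5  # test curve uniformly 0.5 dB above the anchor
```

With the made-up points above, bd_psnr returns 0.5 dB, as expected for a test curve that sits a constant 0.5 dB above the anchor at identical rates.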

Proposed Documents

A docker container with the executable scheme must be submitted for results generation and cross-checking. Each participant is invited to submit an ISCAS paper, which must describe the following items in detail:

  • The methodology;
  • The training data set;
  • Detailed rate-distortion data (comparison with the provided anchor is encouraged).

Complexity analysis of the proposed solutions is encouraged for the paper submission.

Important Dates

  • Aug. 15, 2022: The organizers release the validation set as well as the corresponding test information (e.g., frame rates and intra periods) and template for performance reporting (with rate-distortion points for the validation set)

  • Oct. 24, 2022: Deadline of paper submission (to be aligned with Special Sessions in case of extension) for participants

  • Nov. 08, 2022: Participants upload docker container wherein only one single decoder shall be utilized for the decoding of all the bitstreams

  • Nov. 10, 2022: The organizers release the test sequences (including frame rate, corresponding rate-distortion points, etc.)

  • Dec. 01, 2022: Participants upload compressed bitstreams and decoded YUV files

  • Dec. 14, 2022: Deadline of fact sheets submission for participants

  • Dec. 19, 2022: Paper acceptance notification

  • Feb. 04, 2023: Camera-ready paper submission deadline

  • TBA: Paper presentation at ISCAS 2023

  • TBA: Awards announcement (at the ISCAS 2023 banquet)

Awards

ByteDance will sponsor the awards of this grand challenge. Three categories of awards are expected to be presented. Two top-performance awards will be granted according to performance, one for the hybrid track and one for the end-to-end track. In addition, to foster innovation, a top-creativity award will be given to the most inspiring scheme, as recommended by a committee; it is applicable only to participants whose papers are accepted by ISCAS 2023. The winner of each award (if any) will receive a USD 5,000 prize.

References

[1] G. Bjøntegaard, “Calculation of average PSNR differences between RD-curves,” ITU-T SG16/Q6, Doc. VCEG-M33, Austin, TX, Apr. 2001.
[2] https://vcgit.hhi.fraunhofer.de/jvet/HM/-/tree/HM-16.22
[3] F. Bossen, “Common test conditions and software reference configurations,” JCT-VC, Doc. JCTVC-L1100.