Abstract
Recently, there has been increasing interest in neural network-based (NN-based) video coding, including hybrid, end-to-end, and NN-enhanced schemes. To foster research in this emerging field and provide a benchmark, we propose this Grand Challenge (GC). In this GC, different neural network-based coding schemes will be evaluated according to their coding efficiency and innovations in methodology. Three tracks will be evaluated:
- hybrid neural network-based video codec,
- end-to-end video codec,
- neural network-enhanced VVC encoder.
In the hybrid codec track, deep network-based coding tools shall be used together with traditional video coding schemes. In the end-to-end codec track, the whole video codec shall be built primarily upon deep networks. In the neural network-enhanced VVC encoder track, deep network-based encoding algorithms may be applied in a VVC encoder that generates VVC-compatible bitstreams.
Participants shall express their interest in participating in this Grand Challenge by following the participation instructions and are invited to submit their schemes as ISCAS papers. The papers will go through the regular review process and, if accepted, must be presented at ISCAS 2025. The submission instructions for Grand Challenge papers will be communicated by the organizers. Please contact Dr. Yue Li (yue.li@bytedance.com) for more information.
Rationale
In recent years, deep learning-based image/video coding schemes have achieved remarkable progress. As two representative approaches aimed at future video coding schemes, hybrid solutions and end-to-end solutions have both been investigated extensively. Hybrid solutions adopt deep network-based coding tools to enhance traditional video coding schemes, while end-to-end solutions build the whole compression scheme upon deep networks. In addition, NN-based methods have been widely studied to optimize or speed up encoders compliant with existing popular standards such as HEVC and VVC. Although great advances have been made, numerous challenges remain to be addressed:
- How to harmonize a deep coding tool with a hybrid video codec, for example, how to take the compression process into consideration when developing a deep tool for pre-processing;
- How to exploit long-term temporal dependency in an end-to-end framework for video coding;
- How to leverage automated machine learning-based network architecture optimization for higher coding efficiency;
- How to perform efficient bit allocation with deep learning frameworks;
- How to achieve a better global result in terms of rate-distortion trade-offs, for example, to take the impact of the current step on later frames into account, possibly by using reinforcement learning;
- How to achieve better complexity-efficiency trade-offs;
- How to speed up a VVC encoder via NN-based methods with minimal loss of coding efficiency, or how to use NN-based pre-processing to enhance VVC encoding efficiency.
In view of these challenges, several activities towards improving deep learning-based image/video coding schemes have been initiated. For example, a special section on “Learning-based Image and Video Compression” was published in TCSVT in July 2020; a special section on “Optimized Image/Video Coding Based on Deep Learning” was published in OJCAS in December 2021; and the “Challenge on Learned Image Compression (CLIC)”, organized annually since its inception at CVPR in 2018, moved to DCC in 2024. In the hope of encouraging more innovative contributions towards the aforementioned challenges within the ISCAS community, we have organized this grand challenge since 2022. It has been successfully held for three years (ISCAS 2022, ISCAS 2023, and ISCAS 2024), attracting researchers from all over the world. In response to continued interest from experts in this area, the grand challenge will be held again at ISCAS 2025, with the same tracks and awards.
Awards
ByteDance will sponsor the awards of this grand challenge. Four awards are expected to be presented, contingent upon sufficient participation in each category. Three top-performance awards will be granted based on coding performance, one each for the hybrid track, the end-to-end track, and the VVC encoder-only track. In addition, to foster innovation, a top-creativity award will be given to the most inspiring scheme selected by a committee; it is only applicable to participants whose papers are accepted by ISCAS 2025. The winner of each award (if any) will receive a USD 4375 prize.
Requirements and Evaluation
Training Data Set
It is recommended to use the following training data.
UVG dataset: http://ultravideo.cs.tut.fi/
CDVL dataset: https://cdvl.org/
Additional training data may also be used, provided that they are described in the submitted document.
Test Specifications
In the test, each scheme will be evaluated with multiple YUV 4:2:0 test sequences at a resolution of 1920×1080.
There is no constraint on the reference structure. Note that the neural network must be used in the decoding process of the hybrid track and the end-to-end track, while the VVC reference software VTM will be utilized for decoding bitstreams of the NN enhanced VVC encoder-only track.
Evaluation Criteria
The test sequences will be released according to the timeline and the results will be evaluated with the following criteria:
- The decoded sequences will be evaluated in the 4:2:0 color format.
- A weighted PSNR, computed as (6*PSNR_Y + PSNR_U + PSNR_V)/8, will be used to evaluate the distortion of the decoded pictures (a computation sketch is given after this list).
- Average Bjøntegaard delta rates (BD-Rate) [1] over all test sequences will be computed to compare coding efficiency (a computation sketch follows the anchor description below).
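As a reference, the following minimal sketch illustrates how the per-plane PSNR and the weighted PSNR above can be computed. The function names and the 10-bit peak value (1023) are assumptions based on the common test conditions, not official evaluation code.

```python
import numpy as np

def plane_psnr(ref, rec, peak=1023.0):
    """PSNR of a single Y, U, or V plane (peak=1023 assumes 10-bit content)."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak * peak / mse)

def weighted_psnr(psnr_y, psnr_u, psnr_v):
    """Weighted YUV PSNR used as the distortion metric: (6*PSNR_Y + PSNR_U + PSNR_V) / 8."""
    return (6.0 * psnr_y + psnr_u + psnr_v) / 8.0
```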
Anchors of HM 16.22 [2] and VTM-23.2 [3] coded with QPs = {22, 27, 32, 37} under the random access configurations defined in the HM and VTM common test conditions [4, 5] will be provided. Note that the HM anchor is used for the hybrid and end-to-end tracks, while the VTM anchor is used for the VVC encoder-only track. The released anchor data will include the bit-rates corresponding to the four QPs for each sequence.
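For reference, a minimal sketch of the BD-rate computation is given below. It follows the classic cubic polynomial fit of VCEG-M33 [1] over the four (bit-rate, weighted PSNR) points of the anchor and the proposed codec; this is an illustrative assumption, and the organizers' evaluation tool may use a different interpolation (e.g., piecewise cubic).

```python
import numpy as np

def bd_rate(anchor_rates, anchor_psnrs, test_rates, test_psnrs):
    """Average bit-rate difference (%) at equal quality (negative = bit-rate savings)."""
    log_ra = np.log(np.asarray(anchor_rates, dtype=float))
    log_rt = np.log(np.asarray(test_rates, dtype=float))
    # Fit cubic polynomials of log-rate as a function of PSNR.
    poly_a = np.polyfit(anchor_psnrs, log_ra, 3)
    poly_t = np.polyfit(test_psnrs, log_rt, 3)
    # Integrate over the overlapping PSNR interval.
    lo = max(min(anchor_psnrs), min(test_psnrs))
    hi = min(max(anchor_psnrs), max(test_psnrs))
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_log_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_log_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    # Convert the average log-rate difference back to a percentage.
    return (np.exp(avg_log_t - avg_log_a) - 1.0) * 100.0
```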
Additional constraints for the first two tracks (i.e., the hybrid NN-based and end-to-end video codec) are listed as follows:
- The proposed method is required to generate four bitstreams for each sequence, targeting the anchor bit-rates corresponding to the four QPs. For each sequence, the lowest bit-rate point of the proposed method must lie within 80% to 110% of the anchor bit-rate at its lowest bit-rate point, and the highest bit-rate point of the proposed method must lie within 90% to 120% of the anchor bit-rate at its highest bit-rate point (a sketch of this check follows the list);
- Only one single decoder shall be utilized to decode all the bitstreams;
- The intra period in the proposed submission shall be no larger than that used by the anchor in compressing the validation and test sequences.
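The bit-rate matching rule for these two tracks can be verified with a simple helper such as the hypothetical one below (per-sequence bit-rates in any consistent unit, sorted from the lowest to the highest rate point).

```python
def rate_points_valid(proposed_rates, anchor_rates):
    """Check the rate-matching rule for the hybrid and end-to-end tracks."""
    low_ok = 0.80 * anchor_rates[0] <= proposed_rates[0] <= 1.10 * anchor_rates[0]
    high_ok = 0.90 * anchor_rates[-1] <= proposed_rates[-1] <= 1.20 * anchor_rates[-1]
    return low_ok and high_ok
```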
For the NN-enhanced VVC encoder track, the additional requirements are as follows:
- The submitted docker container shall be able to encode the test sequences and generate VTM-compatible bitstreams;
- The proposed method is required to generate four bitstreams for each sequence, targeting the anchor bit-rates corresponding to the four QPs. For each test point, the bit-rate of the proposed method should be within 90% to 110% of the anchor bit-rate;
- The VTM-23.2 decoder will be used to decode the generated bitstreams into reconstructed YUV files, from which the PSNR values will be calculated. All generated bitstreams MUST be decoded successfully;
- The VTM-23.2 encoder serves as the anchor encoder. For each test point, let T1 denote the encoding time of the proposed encoder and T2 the encoding time of the VTM-23.2 encoder; they shall satisfy T1 <= 70% * T2 (a sketch of this check, together with the bit-rate check, follows the list). Note that T1 and T2 shall be measured on the same platform with a single thread (e.g., Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz, NVIDIA A100-SXM4-80GB GPU). The encoding time comparison will be verified by the organizers.
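A corresponding hypothetical helper for this track, combining the per-point bit-rate rule and the encoding-time rule, might look as follows.

```python
def encoder_track_valid(proposed_rates, anchor_rates, t1, t2):
    """Check the NN-enhanced VVC encoder track constraints.

    proposed_rates / anchor_rates: the four per-sequence bit-rates (same order);
    t1: encoding time of the proposed encoder; t2: encoding time of VTM-23.2.
    """
    rates_ok = all(
        0.90 * ra <= rp <= 1.10 * ra
        for rp, ra in zip(proposed_rates, anchor_rates)
    )
    time_ok = t1 <= 0.70 * t2  # proposed encoder must be at least 30% faster
    return rates_ok and time_ok
```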
Proposed Documents
A docker container with the executable scheme must be submitted for result generation and cross-check. Each participant is invited to submit an ISCAS paper, which must describe the following items in detail.
- The methodology;
- The training data set;
- Detailed rate-distortion data (comparison with the provided anchor is encouraged);
- Complexity analysis of the proposed solutions is encouraged for the paper submission.
References
[1] G. Bjøntegaard, “Calculation of average PSNR differences between RD-curves,” ITU-T SG16/Q6, Doc. VCEG-M33, Austin, TX, USA, Apr. 2001.
[2] https://vcgit.hhi.fraunhofer.de/jvet/HM/-/tree/HM-16.22.
[3] https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/-/tree/VTM-23.2.
[4] Common Test Conditions and Software Reference Configurations for HM (JCTVC-L1100).
[5] VTM and HM common test conditions and software reference configurations for SDR 4:2:0 10 bit video (JVET-AB2010).