TU Berlin

High Performance Video Coding


Computer architecture and video coding have influenced each other throughout their development. Each new generation of video codecs has improved compression by introducing more advanced coding tools, which usually require more computational resources. At the same time, user demand for better quality has pushed the adoption of HD and UHD resolutions, higher bit depths, and richer color formats, which require even more computational resources. Computer architectures, in turn, have improved their performance through technology improvements and increased support for parallelism, mainly Data Level Parallelism (DLP) and Thread Level Parallelism (TLP).

Our main research goal is to optimize video codecs such as H.264/AVC and HEVC/H.265 in order to obtain efficient implementations on contemporary computer architectures. We work on general algorithmic optimizations as well as optimizations for better use of hardware resources.

Research lines

- Implementations on general purpose processors. We work on adaptations and optimizations of video codecs in order to obtain the maximum performance that the architecture can provide. This includes parallelization for multi- and many-core architectures, GPU acceleration, SIMD vectorization, and memory layout optimizations. The main objective is to find an appropriate mapping of the type of parallelism present in video codecs to the type of parallelism offered by recent parallel computer architectures. We also investigate how to reduce power consumption and increase energy efficiency of software video (de)coders by using the low power modes included in most recent microprocessors.

- Implementations using hardware/software codesign. Although general purpose processors can provide the performance required by video codecs, their power, energy, and cost are not acceptable in all applications. As an alternative, especially in mass-market multimedia devices, SoCs with dedicated hardware components are used. To reduce the production cost and energy consumption of these devices, it is crucial to find the right partitioning between hardware and software implementations and to apply efficient interconnect technologies. Our group therefore investigates methods for high-speed, low-cost encoding and decoding of video streams using modern HW/SW codesign techniques, state-of-the-art FPGAs, and embedded processors.

- Algorithms for efficient video coding. State-of-the-art video codecs include many coding tools, each of which allows multiple operation modes. Full search approaches can give the maximum compression and quality levels, but at the cost of impractical complexity. We investigate algorithms for performing efficient video coding using the computational resources offered by state-of-the-art microprocessors. The main research objective is to find the best tradeoff between quality and compression when using all the computational resources of recent high performance microprocessors (with ILP, SIMD, and multicore optimizations enabled).
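As an illustration of mapping codec parallelism onto thread-level parallelism, the sketch below computes a macroblock wavefront schedule. It assumes the commonly described H.264 dependency pattern in which a macroblock needs its left, top-left, top, and top-right neighbours. The function names are hypothetical, and this is not necessarily the strategy used in the group's decoders.

```python
from collections import defaultdict


def wavefront_schedule(mb_cols, mb_rows):
    """Group macroblocks into waves that may be processed concurrently.

    Assumption: macroblock (x, y) depends on its left, top-left, top and
    top-right neighbours, so the earliest step it can run in is x + 2 * y.
    """
    waves = defaultdict(list)
    for y in range(mb_rows):
        for x in range(mb_cols):
            waves[x + 2 * y].append((x, y))
    # Return the waves ordered by step index.
    return [waves[step] for step in sorted(waves)]


def max_parallelism(mb_cols, mb_rows):
    """Largest number of macroblocks that share a single wave."""
    return max(len(wave) for wave in wavefront_schedule(mb_cols, mb_rows))


if __name__ == "__main__":
    # A 1920x1088 frame holds 120 x 68 macroblocks of 16x16 pixels.
    schedule = wavefront_schedule(120, 68)
    print(len(schedule), "steps, at most",
          max_parallelism(120, 68), "macroblocks in parallel")
```

Under this assumption the available parallelism ramps up and down within each frame, which is one reason why the amount of parallelism in the codec has to be matched carefully to the number of cores.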


  • Highly scalable parallel H.264/AVC decoder.
    We have developed a parallel H.264/AVC decoder that can scale to many-core architectures. There are two implementations: one using pthreads, and the other one using the OmpSs programming model. The code is part of the Starbench benchmark developed by our group.
       * Starbench
       * If you have any questions regarding Starbench, please write a mail to our .
  • OpenCL H.264/AVC video decoder for GPUs.
    Our OpenCL H.264/AVC decoder offloads inverse transform and motion compensation onto OpenCL devices. To use it, you need an OpenCL-capable device with a matching driver and the corresponding OpenCL SDK installed.
        *  OpenCL decoder
        * If you have any questions regarding the H.264 OpenCL decoder, please write a mail to our .
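To give an idea of why the inverse transform suits offloading, the sketch below applies the H.264 4x4 inverse core transform to each block independently, which is the kind of per-block work that could be assigned to one work-item on a parallel device. It is written in plain Python for readability and assumes dequantized coefficients as input. The function names are hypothetical, and how the OpenCL decoder actually organizes its kernels is not stated here.

```python
def inverse_transform_4x4(coeffs):
    """Apply the H.264 4x4 inverse core transform to one block.

    coeffs is a 4x4 list of dequantized integer coefficients (row-major).
    Returns the 4x4 residual block after the final rounding shift.
    """

    def butterfly(d0, d1, d2, d3):
        # One-dimensional inverse transform using only adds and shifts.
        e0 = d0 + d2
        e1 = d0 - d2
        e2 = (d1 >> 1) - d3
        e3 = d1 + (d3 >> 1)
        return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]

    rows = [butterfly(*row) for row in coeffs]      # horizontal pass
    cols = [butterfly(*col) for col in zip(*rows)]  # vertical pass, one list per column
    # cols[x][y] is the value at row y, column x: transpose back and round.
    return [[(cols[x][y] + 32) >> 6 for x in range(4)] for y in range(4)]


def inverse_transform_blocks(blocks):
    """Transform every block; no block depends on another block's result."""
    return [inverse_transform_4x4(block) for block in blocks]


if __name__ == "__main__":
    dc_only = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(inverse_transform_blocks([dc_only])[0])
```

Because the blocks are independent, the loop in `inverse_transform_blocks` (a hypothetical helper) can be replaced by a parallel dispatch without changing the result.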


This project receives funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under the projects ENCORE (grant agreement n° 248647) and LPGPU (grant agreement n° 288653).


Chi Ching Chi and Ben Juurlink and C.H. Meenderinck (2010). Evaluation of Parallel H.264 Decoding Strategies for the Cell Broadband Engine. Proceedings International Conference on Supercomputing (ICS)

Chi Ching Chi and Ben Juurlink (2011). A QHD-Capable Parallel H.264 Decoder. Proceedings 25th International Conference on Supercomputing

Arnaldo Azevedo and Ben Juurlink and Cor Meenderinck and Andrei Terechko and Jan Hoogerbrugge and Mauricio Alvarez Mesa and Alex Ramírez and Mateo Valero (2011). A Highly Scalable Parallel Implementation of H.264. Transactions on High-Performance Embedded Architectures and Compilers IV. Springer Berlin Heidelberg, 111-134.

Mauricio Alvarez Mesa and Chi Ching Chi and Ben Juurlink and V. George and T. Schierl (2012). Parallel Video Decoding in the Emerging HEVC Standard. Proceedings of the 37th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2012

Chi Ching Chi and Mauricio Alvarez Mesa and Ben Juurlink and V. George and T. Schierl (2013). Improving the Parallelization Efficiency of HEVC Decoding. Proceedings of the 2012 International Conference on Image Processing (ICIP)

Ben Juurlink and Mauricio Alvarez-Mesa and Chi Ching Chi and Arnaldo Azevedo and Cor Meenderinck and Alex Ramirez (2012). Scalable Parallel Programming Applied to H.264/AVC Decoding. Springer.

Chi Ching Chi and Mauricio Alvarez-Mesa and Jan Lucas and Ben Juurlink and T. Schierl (2013). Parallel HEVC Decoding on Multi- and Many-core Architectures. A Power and Performance Analysis. Journal of Signal Processing Systems

Chi Ching Chi and Mauricio Alvarez Mesa and Ben Juurlink and G. Clare and F. Henry and S. Pateux and T. Schierl (2012). Parallel Scalability and Efficiency of HEVC Parallelization Approaches. IEEE Transactions on Circuits and Systems for Video Technology

Benjamin Bross and Mauricio Alvarez-Mesa and Valeri George and Chi Ching Chi and Tobias Mayer and Ben Juurlink and Thomas Schierl (2013). HEVC Real-time Decoding. Proc. SPIE. Applications of Digital Image Processing XXXVI, 88561R-88561R-11.

Benjamin Bross and Valeri George and Mauricio Alvarez-Mesa and Tobias Mayer and Chi Ching Chi and Jens Brandenburg and Thomas Schierl and Detlev Marpe and Ben Juurlink (2013). HEVC Performance and Complexity for 4K Video. Proc. Third IEEE Int. Conf. on Consumer Electronics - Berlin (ICCE-Berlin), 44-47.

Matthias Göbel (2014). A High-Performance Hardware Accelerator for HEVC Motion Compensation. Proceedings of the Informatiktage 2014, 209-212.

Philipp Habermann (2014). Design and Implementation of a High-Throughput CABAC Hardware Accelerator for the HEVC Decoder. Lecture Notes in Informatics - Seminars, Informatiktage 2014, 213-216.

Philipp Habermann and Chi Ching Chi and Mauricio Alvarez-Mesa and Ben Juurlink (2015). Optimizing HEVC CABAC Decoding with a Context Model Cache and Application-specific Prefetching. Proceedings of the 11th IEEE International Symposium on Multimedia (ISM 2015), 429-434.

Philipp Habermann and Chi Ching Chi and Mauricio Alvarez-Mesa and Ben Juurlink (2017). Application-specific Cache and Prefetching for HEVC CABAC Decoding. IEEE Multimedia, Volume 24, Issue 1, Jan.-Mar. 2017, 72-85.

Philipp Habermann and Chi Ching Chi and Mauricio Alvarez-Mesa and Ben Juurlink (2017). Syntax Element Partitioning for high-throughput HEVC CABAC Decoding. Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2017), 1308-1312.

