TU Berlin

Current Projects



High-Performance Computing (HPC) plays a fundamental role in enabling scientific progress, as improvements in many areas of science critically depend on advances in computational modeling and processing power. The next major milestone for the HPC community is the transition from petascale to exascale, which poses difficult research challenges such as programming models for scientific productivity, scalable system software design, and energy efficiency. We propose the CELERITY environment to support the effective development of energy- and performance-efficient, predictably scalable, and easy-to-program parallel applications targeting large-scale homogeneous and heterogeneous HPC clusters. [more]

LPGPU2: Low-power Parallel Computing on GPUs 2


Low-power GPUs have become ubiquitous. They can be found in domains ranging from wearable and mobile computing to automotive systems. This places an ever-increasing demand on the expected performance and power efficiency of these devices.
Future low-power systems-on-chip will have to provide higher performance and support more complex applications without using additional power.
These demands cannot be met through hardware improvements alone; the software must also fully exploit the available resources. Unfortunately, developers creating low-power GPU software are seriously hindered by the limited quality of current performance analysis tools. In low-power GPU contexts, only a minimal amount of performance information, and essentially no power information, is available to the programmer. As software becomes more complex, optimising it for low-power devices becomes increasingly unmanageable.
Building on the results of the first LPGPU project, this project proposes to aid application developers in creating software for low-power GPUs by providing a complete performance and power analysis process for the programmer. ... [more]

Reconfigurable Computing

Reconfigurable computing devices, such as Field-Programmable Gate Arrays (FPGAs), play an increasingly important role in both embedded systems and High-Performance Computing (HPC). Major industry players are shifting their focus towards FPGAs, most notably Intel, which in 2015 acquired Altera, the second-largest FPGA manufacturer in the world. The high throughput of FPGAs, which stems from their inherent parallelism, coupled with their ability to be reconfigured to suit almost any application, makes them a strong choice for a large number of use cases. In this project line, we research the use of FPGAs for different applications, ranging from machine intelligence to signal processing, as well as operating system support and CAD tools for FPGAs. ... [more]

High Performance Video Coding

Our main research goal is to optimize video codecs such as H.264/AVC and HEVC/H.265 in order to obtain efficient implementations on contemporary computer architectures. We work on general algorithmic optimizations as well as optimizations for better use of hardware resources. ... [more]

Completed Projects

CompVision: Computer Vision Library for Reconfigurable Architectures

This project aims to develop a library of embedded components that can be used to develop various computer vision algorithms in embedded hardware. ... [more]



Film265 delivers a comprehensive approach to support small and medium-sized European VoD services with the most innovative technology in video delivery. It aims to provide them with the technological edge needed to compete in the international market for film distribution on the internet. The core of Film265 is to adapt an HEVC/H.265 video codec for VoD scenarios, including a decoder integrated into a video player and an encoder integrated into a transcoding application in the cloud. This development will be supported by a comparison of HEVC/H.265 with current solutions that use H.264/AVC on full-length, real-life film material. In order to gather feedback on user demands and needs regarding the resulting video playouts, Film265 will develop statistical tools that determine the Quality of Experience by measuring buffering times, performance, delivery rate, the time watched, etc. The tool will mainly consist of a plugin and a player that can be used by any VoD service around the globe. Finally, in order to create a fully market-ready solution, we will create APIs that allow all relevant industry participants to make use of what we have developed. ... [more]

Nexus++: a Hardware Task Management System for Multicore Systems


Since the trend in improving processor performance has shifted toward integrating more cores on the same chip, several parallel programming models have been proposed that aim to ease parallel programming. Examples include Google’s MapReduce, Intel’s TBB, and OmpSs. ... [more]

Low-power Parallel Computing on GPUs


Massively parallel GPUs are now being used in a great variety of market segments, ranging from video games to user interfaces and HPC. There are several signs, however, that the computer and consumer technology industries face major challenges in delivering improved performance and innovation for future entertainment devices. First, game developers have argued that while GPUs are increasing in performance, this is not leading to visual quality improvements because GPUs fundamentally restrict their flexibility. Second, there are signs that GPUs are approaching a "power wall", and architectural innovation is required now to circumvent this wall. Third, there is a lack of GPU tools for comparing multi-core processors (CPUs) with GPUs and for performing GPU program transformations that optimize for performance and power. To address these challenges, this project brings together commercial tools, applications, and GPU designers with academic researchers to analyze real-world mass-market software on comparable graphics processor architectures. ... [more]

Enabling technologies for a programmable many-CORE


Design complexity and power density implications have stopped the trend towards faster single-core processors. The current trend is to double the core count every 18 months, leading to chips with 100+ cores in 10-15 years. Developing parallel applications to harness such multicores is the key challenge for scalable computing systems. The ENCORE project aims to achieve a breakthrough in the usability, code portability, and performance scalability of such multicores. ... [more]



Very Long Instruction Word (VLIW) and so-called Transport Triggered Architectures (TTA) are potentially simpler, and hence more power-efficient, than superscalar architectures, since they do not need hardware to detect instruction-level parallelism. We have developed an FPGA prototype of a hybrid VLIW/TTA architecture named SynZEN. ... [more]



The CluMP! project was funded by Faculty IV to keep digital design knowledge in house and make it accessible to other faculty members without any experience in this area.
Its technical foundation will be a tightly coupled FPGA-based cluster with a focus on low cost, low energy, flexibility, and capabilities for academic research. ... [more]

ComponentC: A Parallel Programming Language for Developing Performance Portable Software

Multicore architectures increase the programming effort significantly. It is expected that future processors will contain more cores, have a heterogeneous architecture, and implement different memory models. These architectural features are currently visible to the programmer and dramatically increase the effort required to create performance-portable software. ... [more]

Automatic loop vectorization


Every common processor architecture supports single-instruction multiple-data (SIMD) instructions, since SIMD instructions are potentially much more (power-)efficient than scalar instructions. However, auto-vectorizing compilers that exploit these instructions, such as GCC, do not achieve the same performance as handwritten code. ... [more]
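
As a rough illustration of what an auto-vectorizing compiler has to work with, the following C sketch contrasts a loop that is usually easy to vectorize with one that is not. It is not taken from the project; the function names `saxpy` and `prefix_sum` are chosen here for illustration only. With GCC, loops like the first are typically vectorized at `-O3`, and `-fopt-info-vec` reports which loops were vectorized; whether that happens for a given target is an assumption to verify rather than a guarantee.

```c
#include <stddef.h>

/* Independent iterations: every y[i] depends only on x[i] and y[i].
   The restrict qualifiers promise that x and y do not overlap, which
   removes one common obstacle to auto-vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Loop-carried dependence: v[i] needs the already updated v[i - 1],
   so the iterations cannot simply be executed in SIMD lanes side by
   side. Handwritten SIMD code would need a different algorithm here. */
void prefix_sum(size_t n, float *v)
{
    for (size_t i = 1; i < n; ++i) {
        v[i] += v[i - 1];
    }
}
```

Calling `saxpy(4, 2.0f, x, y)` with `x = {1, 2, 3, 4}` and `y = {10, 20, 30, 40}` leaves `y` as `{12, 24, 36, 48}`; the result is the same whether or not the compiler vectorizes the loop, only the instruction count differs.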

Starbench parallel benchmark suite

In recent years, a multitude of parallel programming models have been introduced to ease parallel programming. Each programming model brings its own concepts and semantics, which makes it hard to see their impact on performance. Starbench is a benchmark suite that allows different parallel programming models to be compared for embedded and consumer applications. Starbench consists of C/C++ benchmarks and currently covers video coding, image compression, image processing, hashing, artificial intelligence, computer vision, and compression. For each benchmark, an optimized Pthreads version has been developed to serve as a baseline. ... [more]
