TU Berlin

Completed Projects

Low-power Parallel Computing on GPUs

Massively parallel GPUs are now used in a wide variety of market segments, ranging from video games to user interfaces to HPC. There are several signs, however, that the computer and consumer technology industries face major challenges in delivering improved performance and innovation for future entertainment devices. First, game developers have argued that although GPU performance keeps increasing, this does not lead to better visual quality because GPUs fundamentally restrict developers' flexibility. Second, there are signs that GPUs are approaching a "power wall", and architectural innovation is required now to circumvent it. Third, there is a lack of GPU tools for comparing multi-core processors (CPUs) with GPUs and for performing GPU program transformations that optimize for performance and power. To address these challenges, this project brings together commercial tools, applications, and GPU designers with academic researchers to analyze real-world mass-market software on comparable graphics processor architectures... [more]

Enabling technologies for a programmable many-CORE

Growing design complexity, together with enormous power consumption and power dissipation, has brought the trend toward ever faster single-core processors to a halt. Instead, the number of processor cores currently doubles every 18 months, heading toward chips with 100+ cores in 10-15 years. Creating applications that use this many cores efficiently is the decisive challenge in developing scalable computer systems. The ENCORE project aims to achieve a breakthrough in the usability, code portability, and performance of such multicore systems... [more]

SynZEN

Very Long Instruction Word (VLIW) and so-called Transport Triggered Architectures (TTA) are potentially simpler, and hence more power-efficient, than superscalar architectures because they do not need hardware to detect instruction-level parallelism. We have developed an FPGA prototype of a hybrid VLIW/TTA architecture named SynZEN... [more]

CluMP!

The CluMP! project was funded by Faculty IV to keep digital design knowledge in house and to make it accessible to other faculty members who have no experience in this area. The technical foundation will be a tightly coupled, FPGA-based cluster with a focus on low cost, low energy, flexibility, and capabilities for academic research... [more]

ComponentC: A Parallel Programming Language for Developing Performance Portable Software

Multicore architectures significantly increase programming effort. Future processors are expected to contain more cores, have heterogeneous architectures, and implement different memory models. These architectural features are currently visible to the programmer and dramatically increase the effort required to create performance-portable software... [more]

Automatic loop vectorization

Every common processor architecture supports single-instruction multiple-data (SIMD) instructions, since SIMD instructions are potentially much more (power-)efficient than scalar instructions. However, auto-vectorizing compilers that exploit these instructions, such as GCC, do not achieve the same performance as handwritten code... [more]
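
As a rough illustration of the gap described here, the following minimal sketch contrasts a plain scalar loop, which an auto-vectorizing compiler may or may not turn into SIMD code, with a handwritten variant of the same loop. It is not taken from the project: the function names `saxpy_scalar` and `saxpy_simd` and the choice of a SAXPY kernel are assumptions made for this example, and the SSE intrinsics are x86-specific (the code falls back to the scalar loop elsewhere).

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics, available on x86 targets only */
#endif

/* Scalar loop: a candidate for auto-vectorization (for example with
   gcc -O3). `restrict` tells the compiler that x and y do not overlap,
   which removes one common obstacle to vectorizing the loop. */
void saxpy_scalar(size_t n, float a,
                  const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Handwritten variant (hypothetical example, not the project's code):
   processes four floats per iteration with SSE, then finishes the
   remaining elements with a scalar tail loop. */
void saxpy_simd(size_t n, float a,
                const float *restrict x, float *restrict y)
{
    size_t i = 0;
#if defined(__SSE__)
    const __m128 va = _mm_set1_ps(a);          /* broadcast a to 4 lanes */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);       /* unaligned loads */
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#endif
    for (; i < n; ++i)                         /* scalar tail / fallback */
        y[i] = a * x[i] + y[i];
}
```

For a kernel this simple, a modern compiler will often vectorize the scalar loop on its own; the difference the paragraph refers to tends to show up in loops with more complex control flow, data layouts, or dependences.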

Starbench parallel benchmark suite

In recent years, a multitude of parallel programming models have been introduced to ease parallel programming. Each programming model brings its own concepts and semantics, which makes it hard to see their impact on performance. Starbench is a benchmark suite for comparing different parallel programming models on embedded and consumer applications. Starbench consists of C/C++ benchmarks and currently covers video coding, image compression, image processing, hashing, artificial intelligence, computer vision, and compression. For each benchmark, an optimized Pthreads version has been developed to serve as a baseline... [more]
