TU Berlin

Automatic Tuning on Embedded Processors



Modern computer architectures are extremely complex: multiple levels of caches, different forms of parallelism, and power constraints make it hard for traditional compilers to exploit the hardware fully. Because of these limitations, alternative approaches based on autotuners have gained popularity as an effective way to produce high-quality, portable scientific code. An autotuner optimizes a set of library kernels by generating many variants of each kernel, benchmarking every variant on the target platform, and selecting the best-performing one.
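The generate-and-benchmark loop described above can be sketched in a few lines of C. This is only a minimal illustration, not the project's method: the kernel `stencil_blocked`, its `block` tuning parameter, and the helper `autotune_block` are hypothetical names, the search is a plain exhaustive one, and timing uses the coarse standard `clock()` function.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel: 3-point average over a 1D array,
 * processed in chunks of `block` elements (the tuning parameter). */
static void stencil_blocked(const float *in, float *out, size_t n, size_t block)
{
    assert(block > 0);
    for (size_t start = 1; start + 1 < n; start += block) {
        size_t end = start + block;
        if (end > n - 1)
            end = n - 1;            /* never touch the last boundary point */
        for (size_t i = start; i < end; ++i)
            out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
    }
}

/* Run one variant `reps` times and return the elapsed CPU time in seconds. */
static double time_variant(const float *in, float *out, size_t n,
                           size_t block, int reps)
{
    clock_t t0 = clock();
    for (int r = 0; r < reps; ++r)
        stencil_blocked(in, out, n, block);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* Exhaustive search: benchmark every candidate block size on the
 * machine we are running on and return the fastest one. */
size_t autotune_block(const float *in, float *out, size_t n,
                      const size_t *candidates, size_t num_candidates, int reps)
{
    assert(num_candidates > 0);
    size_t best = candidates[0];
    double best_time = time_variant(in, out, n, best, reps);
    for (size_t c = 1; c < num_candidates; ++c) {
        double t = time_variant(in, out, n, candidates[c], reps);
        if (t < best_time) {
            best_time = t;
            best = candidates[c];
        }
    }
    return best;
}
```

A real autotuner would typically explore a much larger variant space (loop order, unrolling, vector width, thread count) and use a more robust timer, but the structure stays the same: enumerate variants, measure on the target, keep the winner.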

The aim of this project is to implement an automatic tuning approach for embedded processors. The target application is a well-known, hard-to-tune problem: stencil computation. The student will write a set of optimized stencil programs for ARM-based embedded processors, exploiting both multi-threading and NEON intrinsics.
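To make the combination of NEON intrinsics and multi-threading concrete, here is a minimal sketch of a 1D 3-point stencil in C. It is an illustration under assumptions, not a prescribed solution: the function name `stencil3_neon` and the averaging stencil are chosen for this example, OpenMP is assumed as the threading mechanism (the project text does not name one), and a scalar fallback is compiled when NEON is not available.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* out[i] = (in[i-1] + in[i] + in[i+1]) / 3 for interior points 1..n-2.
 * Boundary points out[0] and out[n-1] are left untouched. */
void stencil3_neon(const float *in, float *out, size_t n)
{
    assert(in != out);              /* in-place updates are not supported */
    if (n < 3)
        return;

    size_t i = 1;
#if defined(__ARM_NEON)
    /* Number of full 4-wide vectors that fit into the n-2 interior points. */
    size_t vec_count = (n - 2) / 4;
    #ifdef _OPENMP
    #pragma omp parallel for        /* assumed threading model: OpenMP */
    #endif
    for (size_t v = 0; v < vec_count; ++v) {
        size_t j = 1 + 4 * v;
        float32x4_t left  = vld1q_f32(in + j - 1);  /* in[j-1 .. j+2] */
        float32x4_t mid   = vld1q_f32(in + j);      /* in[j   .. j+3] */
        float32x4_t right = vld1q_f32(in + j + 1);  /* in[j+1 .. j+4] */
        float32x4_t sum   = vaddq_f32(vaddq_f32(left, mid), right);
        vst1q_f32(out + j, vmulq_n_f32(sum, 1.0f / 3.0f));
    }
    i = 1 + 4 * vec_count;
#endif
    /* Scalar loop: handles the remainder, or everything without NEON. */
    for (; i + 1 < n; ++i)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) * (1.0f / 3.0f);
}
```

Each variant of such a kernel (vector width, unroll factor, number of threads, block size) is a candidate that an autotuner could benchmark on the target board.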

Keywords: automatic tuning, embedded processor, NEON, stencil computations

Required Skills


Desired Skills

Knowledge about SIMD intrinsics

Contact Persons


  1. M. Christen, O. Schenk, H. Burkhart, "PATUS: A Code Generation and Autotuning Framework For Parallel Iterative Stencil Computations on Modern Microarchitectures," IPDPS ’11: Proceedings of the 2011 IEEE International Parallel & Distributed Processing Symposium, pp. 676-687
  2. Matthias-M. Christen, "PATUS Quickstart" (installation and usage manual of the Patus stencil compiler), University of Lugano, Switzerland, 2012
