# Older Entries

## 13.12.2017: Tamer Dallou successfully completed his PhD defense.

M.Sc. Tamer Dallou successfully completed his PhD defense on Wednesday, 13 December 2017. His thesis is titled "Enhancing the Scalability of Many-core Systems – Towards Utilizing Fine-Grain Parallelism in Task-Based Programming Models".

Congratulations on your success, Dr. Dallou, and we wish you all the best for the future!

## 11.12.2017: Paper from AES/CERN collaboration accepted at PDP 2018.

The paper “Accelerating the RICH Particle Detector Algorithm on Intel Xeon Phi” written by Christina Quast, Angela Pohl, Biagio Cosenza, Ben Juurlink, as well as Rainer Schwemmer (CERN) was accepted at the 26th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP 2018).

In the paper, the authors show how an algorithm for particle classification was accelerated on an Intel Xeon Phi platform using multiple optimization techniques. The work will be presented as a full paper in the “GPU and Many Integrated Core” special session.

## 13.11.2017: AES paper accepted at DATE 2018 conference.

The paper "Optimal DC/AC Data Bus Inversion Coding" by Jan Lucas, Sohan Lal and Ben Juurlink has been accepted as a regular paper at the DATE 2018 conference. The paper presents a new data encoding method that reduces the energy consumed by data transfers in CPUs and GPUs by up to 6%. DATE (Design, Automation and Test in Europe) will be held in March at the International Congress Center Dresden. The selection process was very competitive, with an acceptance rate for regular papers of only 23.7%. More information can be found at https://www.date-conference.com/.
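For readers unfamiliar with data bus inversion (DBI), the sketch below shows the conventional DC and AC variants that such work builds on; it is not the optimal coding proposed in the paper. It assumes an 8-bit lane group with one extra DBI flag wire, a termination scheme in which driving a 0 costs energy (DC case), and that wire toggles cost energy (AC case). The helper names are hypothetical, and the energy of the flag wire itself is ignored.

```python
from __future__ import annotations

# Conventional (non-optimal) DC and AC data bus inversion on an 8-bit group.
# Simplification: the energy spent on the DBI flag wire itself is ignored.
BUS_WIDTH = 8
MASK = (1 << BUS_WIDTH) - 1


def popcount(x: int) -> int:
    return bin(x).count("1")


def dbi_dc(byte: int) -> tuple[int, int]:
    """DC-DBI: invert the byte when more than half of its bits are 0."""
    zeros = BUS_WIDTH - popcount(byte & MASK)
    if zeros > BUS_WIDTH // 2:
        return (~byte) & MASK, 1
    return byte & MASK, 0


def dbi_ac(byte: int, prev_on_wire: int) -> tuple[int, int]:
    """AC-DBI: invert the byte when more than half of the wires would toggle."""
    toggles = popcount((byte ^ prev_on_wire) & MASK)
    if toggles > BUS_WIDTH // 2:
        return (~byte) & MASK, 1
    return byte & MASK, 0


def encode_ac_stream(data: list[int], initial: int = 0) -> list[tuple[int, int]]:
    """Encode a burst with AC-DBI, tracking what the wires actually carry."""
    prev = initial
    out = []
    for b in data:
        enc, flag = dbi_ac(b, prev)
        out.append((enc, flag))
        prev = enc  # the possibly inverted value is now on the wires
    return out


print(dbi_dc(0x01))                    # (254, 1): seven zeros, so invert
print(encode_ac_stream([0xFF, 0x00]))  # [(0, 1), (0, 0)]
```

Each scheme decides per byte using a single cost (zeros or toggles); combining both costs, as the paper's title suggests, requires a different decision procedure that is not reproduced here.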

## 03.11.2017: AES paper accepted at CGO 2018.

The paper "Local Memory-Aware Kernel Perforation" by Daniel Maier, Biagio Cosenza and Ben Juurlink has been accepted at the International Symposium on Code Generation and Optimization (CGO 2018) that will be held in Vienna, Austria.

In the paper, the authors propose a new local memory-aware approach designed for the approximation and acceleration of GPU applications. Experimental evaluations show that the technique accelerates applications from different domains while introducing only a small and acceptable error.
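Perforation in general trades accuracy for speed by skipping part of the input and reconstructing what was skipped. The paper's local memory-aware scheme is not described here, so the following is only a generic sketch under stated assumptions: rows whose index is a multiple of a hypothetical `stride` are loaded, and the skipped rows are linearly interpolated from the nearest loaded rows. Plain Python stands in for a GPU kernel for brevity.

```python
from __future__ import annotations


def reconstruct_perforated_rows(img: list[list[float]], stride: int) -> list[list[float]]:
    """Approximate an image while reading only every `stride`-th row.

    Rows with index divisible by `stride` are "loaded"; rows in between are
    never read and are interpolated from the loaded rows above and below
    (or copied from the row above when no loaded row exists below).
    """
    n = len(img)
    out = []
    for i in range(n):
        lo = (i // stride) * stride  # nearest loaded row at or above i
        hi = lo + stride             # next loaded row below i
        if i == lo or hi >= n:
            out.append(list(img[lo]))
        else:
            t = (i - lo) / stride
            out.append([(1 - t) * a + t * b for a, b in zip(img[lo], img[hi])])
    return out


img = [[0, 0], [99, 99], [2, 4]]
print(reconstruct_perforated_rows(img, 2))  # [[0, 0], [1.0, 2.0], [2, 4]]
```

The middle row is never read, so its (deliberately unrepresentative) values do not influence the result; the error introduced depends on how smooth the input is.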

The International Symposium on Code Generation and Optimization (CGO) provides a premier venue to bring together researchers and practitioners working at the interface of hardware and software on a wide range of optimization and code generation techniques and related issues.

## 20.10.2017: AES Best-in-Class-Awards Summer 2017.

The Embedded Systems Architecture Group has given Best-in-Class Awards to the best students of the 2017 summer term. Kai Norman Clasen (Hot Topics in Embedded Systems), Leonard Wayne Hackel (Embedded Systems Architecture) and Rafael Fritsch (Advanced Computer Architectures) showed outstanding performance in their respective courses. Congratulations!

## 18.10.2017: Ben Juurlink lead guest editor of special issue of International Journal of Reconfigurable Computing.

Prof. Ben Juurlink is lead guest editor of the special issue on “Approximating (Deep) Neural Networks and Approximate Computing Using Reconfigurable Hardware” of the International Journal of Reconfigurable Computing. The journal is published by Hindawi, one of the world’s largest publishers of peer-reviewed, fully Open Access journals. The EU, among others, strongly supports Open Access publishing. The other guest editors of this special issue are Georgios Keramides of Think Silicon, Patras, Greece; Stephan Wong of TU Delft, the Netherlands; Antonio Beck of the Federal University of Rio Grande do Sul in Porto Alegre, Brazil; and Chao Wang of the University of Science and Technology of China in Suzhou, China. A Call for Papers for the special issue has been published.

## 11.10.2017: Ben Juurlink at LCPC workshop.

Ben Juurlink is currently in College Station, Texas, attending the 30th International Workshop on Languages and Compilers for Parallel Computing (LCPC). He will present the paper “Auto-Vectorization in C/C++ Compilers: the Impact of Loop-Level Vectorization and Superword Level Parallelism”, co-authored by Angela Pohl, Biagio Cosenza and Ben Juurlink, which has been accepted as a poster presentation. He will also give an invited talk on “Autotuning Stencil Computations with Structural Ordinal Regression Learning”.

## 02.10.2017, 10:00: Workshop "The Future of Computer Architecture".

• Title:  The Future of Computer Architecture.
• Presenters: Ben Juurlink, Leonel Sousa, Georgios Keramides and Jan Kuper.
• Date and time: Monday, October 2, 2017, 10:00 - 12:15.
• Room: MAR 0.011, MAR building, Marchstr. 23, 10587 Berlin

Program:

| Time | Speaker | Affiliation | Talk title |
|---|---|---|---|
| 10:00 | Prof. Ben Juurlink | TU Berlin | Welcoming Note |
| 10:05 | Prof. Leonel Sousa | Universidade de Lisboa | Cache-aware Roofline: Modeling CPU/GPU upper-bounds for performance, power and energy-efficiency |
| 10:40 | Dr. Georgios Keramides | Think Silicon | Ultralow-Power 3D Micro-GPU for IoT-Class Devices |
| 11:15 | Break | | |
| 11:30 | Dr. Jan Kuper | University of Twente | FPGA design using CλaSH |

## 22.09.2017: New AES-group member: Farzaneh Salehiminapour.

We are pleased to welcome Farzaneh Salehiminapour as a new member of the AES group.
Welcome Farzaneh!

## 22.09.2017: Matthias Göbel attends Deep Learning On-Chip Summer School in Turin.

Matthias Göbel from AES is attending the Deep Learning On-Chip Summer School at the Politecnico di Torino in Turin, Italy. The event covers cutting-edge developments in the field of deep learning, with a special focus on hardware implementations of such techniques. It brings together researchers from all over the world to discuss recent advances and features renowned speakers. More information can be found at http://www.macloc.org/.

## 17.09.2017: Ben Juurlink presents LPGPU2 research results at ScalPerf workshop.

Ben Juurlink has been invited to the ScalPerf workshop to present recent research results. The ScalPerf’17 workshop is held in Bertinoro, Italy, from September 17 to September 22, and this 15th edition focuses on “Storage and Memory Issues in Computing Systems”. Ben Juurlink will present the work “E²MC: Entropy Encoding Based Memory Compression for GPUs”, which was carried out in the context of the LPGPU2 project and was previously presented at the IPDPS conference.

## 11.09.2017: AES at the ARM Research Summit 2017 in Cambridge.

Biagio Cosenza from AES is a speaker at the ARM Research Summit 2017 in Cambridge, UK. The Arm Research Summit is an academic summit to discuss future trends and disruptive technologies across all sectors of computing. It will take place in Cambridge from 11 to 13 September 2017 and will be hosted at Robinson College. The Summit includes talks from leaders in their research fields, demonstrations, networking opportunities and the chance to interact and discuss projects with members of Arm Research.

On Wednesday, 13 September, Dr. Cosenza will present our latest research results on Stencil Autotuning with Ordinal Regression.

List of speakers: https://developer.arm.com/research/summit/speakers
Agenda: https://developer.arm.com/research/summit/agenda#wednesday-13-sept

## 04.09.2017: New AES-group member: Kaijie Fan.

We are pleased to welcome Kaijie Fan as a new member of the AES group. Kaijie is the recipient of a CSC Scholarship and will work on programming models for parallel architectures.

Welcome Kaijie!

## 28.08.2017: AES at DSD 2017.

Matthias Göbel from AES is attending the 2017 Euromicro Conference on Digital System Design (DSD 2017). The conference will be held this year in Vienna, Austria from Aug 30th to Sept 1st. The Euromicro Conference on Digital System Design “addresses all aspects of (embedded, pervasive and high-performance) digital and mixed HW/SW system engineering, covering the whole design trajectory from specification down to micro-architectures, digital circuits and VLSI implementations.”

He will present the paper “A Methodology for Predicting Application-specific Achievable Memory Bandwidth for HW/SW-Codesign” by Matthias Göbel, Ahmed Elhossini and Ben Juurlink. In this work, we present a methodology that assists the designer in making well-founded design decisions for FPGA-SoC-based systems using shared DDR memory by predicting the achievable memory bandwidth for various implementations.

## 03.08.2017: PEGPUM 2018 workshop proposal accepted.

The 6th LPGPU2 Workshop on Power-Efficient GPU and Many-core Computing has been accepted by the HiPEAC 2018 organization committee as a full-day workshop. The workshop will consist of a mix of (a) presentations of short papers submitted in response to the call for papers and reviewed by the organizers, (b) presentations of project results by LPGPU2 consortium members, (c) demonstrations and short tutorials by LPGPU2 consortium members on how to use the tools developed in the LPGPU2 project, and (d) one invited talk by a key person from the mobile or high-performance GPU industry. The organizers and PC members are Ben Juurlink from TU Berlin, Germany, Jan Lucas from TU Berlin, Germany, Martyn Bliss from Samsung Research UK, Georgios Keramides from Think Silicon Ltd, Greece, Henk Corporaal from TU Eindhoven, Netherlands, Paul Keir from University of the West of Scotland, UK, and Ana Lucia Varbanescu from the University of Amsterdam, The Netherlands. The HiPEAC 2018 conference will take place in Manchester, UK.

## Thursday July 20, 2017, 10:30, EN 643/644: "Perceptron Learning for Reuse Prediction". Elvira Teran.

The disparity between last-level cache and memory latencies motivates the search for efficient cache management policies. Recent work in predicting reuse of cache blocks enables optimizations that significantly improve cache performance and efficiency. However, the accuracy of the prediction mechanisms limits the scope of optimization. This paper proposes perceptron learning for reuse prediction. The proposed predictor greatly improves accuracy over previous work. For multi-programmed workloads, the average false positive rate of the proposed predictor is 3.2%, while sampling dead block prediction (SDBP) and signature-based hit prediction (SHiP) yield false positive rates above 7%. The improvement in accuracy translates directly into performance. For single-thread workloads and a 4MB last-level cache, reuse prediction with perceptron learning enables a replacement and bypass optimization to achieve a geometric mean speedup of 6.1%, compared with 3.8% for SHiP and 3.5% for SDBP on the SPEC CPU 2006 benchmarks. On a memory-intensive subset of SPEC, perceptron learning yields 18.3% speedup, versus 10.5% for SHiP and 7.7% for SDBP. For multi-programmed workloads and a 16MB cache, the proposed technique doubles the efficiency of the cache over LRU and yields a geometric mean normalized weighted speedup of 7.4%, compared with 4.4% for SHiP and 4.2% for SDBP.
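To make the idea of "perceptron learning for reuse prediction" concrete, here is a simplified, illustrative sketch rather than the predictor from the talk. It assumes each feature (for example a hashed PC or address bits; the feature set is hypothetical) indexes its own table of small saturating weights, the weights are summed, and a sum above `threshold` predicts that a block will not be reused. Training follows the usual perceptron rule: update on a misprediction or when the output magnitude is within a confidence margin `theta`. All parameter values are assumptions.

```python
from __future__ import annotations


class PerceptronReusePredictor:
    """Simplified perceptron-style reuse predictor (illustrative only)."""

    def __init__(self, num_features=2, table_size=256, threshold=0, theta=10,
                 wmin=-32, wmax=31):
        # One weight table per feature; weights saturate in [wmin, wmax].
        self.tables = [[0] * table_size for _ in range(num_features)]
        self.table_size = table_size
        self.threshold = threshold
        self.theta = theta
        self.wmin, self.wmax = wmin, wmax

    def _slots(self, features):
        return [(t, f % self.table_size) for t, f in enumerate(features)]

    def output(self, features):
        return sum(self.tables[t][i] for t, i in self._slots(features))

    def predict_dead(self, features):
        """True means: predict the block will not be reused."""
        return self.output(features) > self.threshold

    def train(self, features, was_reused):
        y = self.output(features)
        actual_dead = not was_reused
        # Update on a misprediction or a low-confidence output.
        if (y > self.threshold) != actual_dead or abs(y) <= self.theta:
            step = 1 if actual_dead else -1
            for t, i in self._slots(features):
                w = self.tables[t][i] + step
                self.tables[t][i] = max(self.wmin, min(self.wmax, w))


p = PerceptronReusePredictor()
feats = [0x4A3F, 0x12]  # hypothetical hashed PC and address bits
for _ in range(10):
    p.train(feats, was_reused=False)
print(p.predict_dead(feats), p.output(feats))  # True 12
```

Training stops once the sum exceeds the margin, which keeps weights from saturating on easy cases; in a cache, a "dead" prediction could then steer bypass or replacement decisions.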

Elvira is a Ph.D. candidate at Texas A&M University. She received a B.S. in Computer Science from The University of Texas at San Antonio. During her studies she has worked under the supervision of Prof. Daniel A. Jiménez. Her research focuses on improving performance in the memory hierarchy. She will be graduating this coming August.

• Title: Perceptron Learning for Reuse Prediction
• Presenter: Elvira Teran - Texas A&M University
• Date and time: Thursday July 20, 2017, 10:30 - 11:30
• Room: E-N 643/644 (AES seminar room)

## 19.07.2017: Dr. Cosenza to moderate a panel on "Programming Models for the ExaScale Era" at HPCS 2017.

Dr. Biagio Cosenza will be at the IEEE International Conference on High Performance Computing & Simulation (HPCS 2017) in Genoa, Italy.
On Wednesday, 19 July, he will moderate the panel on "Programming Models for the Exascale Era". The panel will be held in the Aurea Ballroom of the Grand Hotel Savoia at 16:30.

Panelists are:

• Marco Aldinucci (University of Turin, Italy)
• Ronald B. Brightwell (Sandia National Laboratories, New Mexico, USA)
• Paul C. Messina (Argonne National Labs and Exascale Project Director, Illinois, USA)
• David W. Walker (Cardiff University, U.K.)

## 17.07.2017: AES Research Presented at ACACES 2017.

Philipp Habermann, Angela Pohl and Ben Juurlink presented their current research during Wednesday’s poster session at the ACACES summer school in Fiuggi, Italy. They showcased their work on Wavefront Processing for HEVC Decoding, Performance Prediction in the LLVM Compiler, and the LPGPU2 project.

## 14.07.2017: AES paper accepted in the ICSTCC 2017 international conference.

The AES paper titled "A Memory Architecture for Data Access Patterns in Multimedia Applications" by Tareq Alawneh and Ahmed Elhossini has been accepted for publication as a full paper at the 21st International Conference on System Theory, Control and Computing (ICSTCC 2017), to be held in Sinaia, Romania, from October 19 to 21, 2017. The paper proposes a new memory organization that exploits the 2D, strided and sequential data access patterns of multimedia applications. This memory organization aims to reduce memory access latency, lower the number of memory accesses and use the memory bandwidth efficiently.

ICSTCC 2017 aims to bring together, in a unique forum, scientists from academia and industry to discuss the state of the art and new trends in system theory, control and computer engineering, and to present recent research results and prospects for development in this rapidly evolving area.

## 12.07.2017: Prof. Dr. Ben Juurlink senior member of the ACM.

The ACM Senior Member Committee has accepted Prof. Juurlink’s nomination as Senior Member of the ACM. His endorsers were Prof. Per Stenström of Chalmers, Dr. Georgios Keramides - CTO of Think Silicon - and Dr. Alex Ramirez of Google. Successful candidates for Senior Member must have demonstrated performance that sets them apart from their peers. In general, this will be reflected in one or more of the following:

• Technical contributions: publications in refereed journals or conference proceedings, textbooks, successful product engineering/development, patents, standards, etc.
• Professional contributions: service to professional societies, review committees, conference committees, standards committees, etc.

## 18.06.2017: AES at ISC High Performance 2017.

Jan Lucas from AES represented LPGPU2 at the ISC High Performance conference, which was held from 18 to 22 June in Frankfurt, Germany.
On 19 June, Jan demonstrated the power measurement testbed developed by TUB for LPGPU2 at the HiPEAC booth and gave an update on the research done by the consortium.

ISC High Performance focuses on HPC technological development and its application in scientific fields, as well as its adoption in commercial environments.

## 12.06.2017: AES-paper accepted at DSD 2017.

The paper "A Methodology for Predicting Application-specific Achievable Memory Bandwidth for HW/SW-Codesign" by Matthias Göbel, Ahmed Elhossini and Ben Juurlink has been accepted at DSD 2017 as a four-page paper. In this paper, we present a methodology that assists the designer in making good design decisions for FPGA-SoC-based systems using shared DDR memory for communication. Our methodology analyzes a software implementation of the application, generates a trace of the memory accesses of one function to be implemented in hardware and subsequently predicts the memory accesses of a functionally equivalent hardware implementation of the selected function. We furthermore propose an IP core that can perform these predicted memory accesses to estimate the achievable memory bandwidth between a functionally equivalent hardware implementation and shared memory. The resulting achievable memory bandwidth estimations demonstrate the feasibility of the presented methodology.

The Euromicro Conference on Digital System Design (DSD) addresses all aspects of (embedded, pervasive and high-performance) digital and mixed HW/SW system engineering, covering the whole design trajectory from specification down to micro-architectures, digital circuits and VLSI implementations. It is a forum for researchers and engineers from academia and industry working on advanced investigations, developments and applications. The 20th edition will be held in Vienna, Austria between Aug 30th and Sep 1st.

## 12.06.2017: AES at SCOPES.

Prof. Ben Juurlink and Dr. Biagio Cosenza from AES are attending the 20th International Workshop on Software and Compilers for Embedded Systems (SCOPES) which is being held on June 12th and 13th, 2017 in Sankt Goar (Germany). Juurlink will present "The LPGPU2 Project - Low-Power Parallel Computing on GPUs", which is authored by Ben Juurlink, Martyn Bliss, Georgios Keramides and Jan Lucas. Cosenza will present "Stencil Autotuning with Ordinal Regression", which is authored by Biagio Cosenza, Juan Durillo, Stefano Ermon and Ben Juurlink.

The SCOPES workshop focuses on the software generation process for modern embedded systems. Topics of interest include all aspects of the compilation and mapping process of embedded single and multi-processor systems.

## Thursday June 1, 2017, 14:00 - 15:00, "Dataflow programming for manycores". Prof. Dr.-Ing. Jeronimo Castrillon.

Title: Dataflow programming for manycores

Presenter: Prof. Dr.-Ing. Jeronimo Castrillon - TU Dresden

Date and time: Thursday June 1, 2017, 14:00 - 15:00

Room: MA 005

Dataflow-based programming has proven to be a good programming model for heterogeneous multi-processor systems on chip in the signal processing and multimedia domains. This is due to a clear separation of computation and communication, well-defined semantics and a strictly distributed state. This talk provides an overview of the MAPS framework for mapping dataflow applications to manycores. It then delves into the details of recently proposed techniques for improving tool scalability as well as the adaptability and robustness of the computed mappings. This includes (i) a mathematical way to exploit symmetries in the problem formulation to reduce the design space, (ii) a runtime approach to select and execute variants of an application under resource constraints, and (iii) an algorithmic approach based on design centering to improve mapping robustness.

Jeronimo Castrillon is a professor in the Department of Computer Science at TU Dresden, where he is also affiliated with the Center for Advancing Electronics Dresden (CfAED). He received the Electronics Engineering degree from the Pontificia Bolivariana University in Colombia in 2004, the master's degree from the ALaRI Institute in Switzerland in 2006 and the Ph.D. degree (Dr.-Ing.) with honors from RWTH Aachen University in Germany in 2013. His research interests lie in methodologies, languages, tools and algorithms for programming complex computing systems. He has more than 50 international publications and has been a member of technical program and organization committees of international conferences and workshops (e.g., DATE, Computing Frontiers, CGO, FPL, ICCS and ESWeek). He is also a regular reviewer for ACM and IEEE journals (e.g., IEEE TCAD, IEEE TPDS, ACM TODAES and ACM TECS). In 2014 Prof. Castrillon co-founded Silexica GmbH, a company that provides programming tools for embedded multicore architectures.

## 29.05.2017: AES at IPDPS 2017.

Biagio Cosenza and Sohan Lal from AES are attending the IEEE International Parallel & Distributed Processing Symposium (IPDPS), which is being held from 29 May to 2 June in Orlando, Florida, USA.
Biagio will present the paper "Autotuning Stencil Computations with Structural Ordinal Regression Learning" which is authored by Biagio Cosenza, Juan Durillo, Stefano Ermon and Ben Juurlink.
Sohan will present the paper "E²MC: Entropy Encoding Based Memory Compression for GPUs" which is authored by Sohan Lal, Jan Lucas and Ben Juurlink.

IPDPS serves as an international forum for engineers and scientists from around the world to present their latest research findings in all aspects of parallel and distributed computing.
It is a premier conference in its field and serves as the flagship event of the IEEE Computer Society Technical Committee on Parallel Processing and is co-sponsored by the ACM-SIGARCH.

## 2.5.2017: AES LPGPU2 team introduces itself on LPGPU2 website.

The AES LPGPU2 team introduces itself on the LPGPU2 website as part of a getting-to-know-us initiative.
In this initiative, every partner of the LPGPU2 consortium posts a short bio of each member of its team.
http://lpgpu.org/wp/

## 26.04.2017: PC memberships Ben Juurlink

Prof. Ben Juurlink, head of the AES research group, is currently a member of several program committees:

• SBAC-PAD (Int. Conf. on Computer Architecture and High Performance Computing)
• SCOPES (Int. Workshop on Software and Compilers for Embedded Systems)
• SAMOS (Int. Conf. on Embedded Computer Systems: Architectures, Modeling, and Simulation)
• PARS (Workshop of the special interest group on parallel algorithms, parallel computer structures and parallel system software within the German Informatics Societies (GI/ITG))
• Euromicro DSD (Euromicro Conference on Digital System Design)

## 24.04.2017: Prof. Castrillon of TU Dresden guest lecturer for an AES course.

We are very pleased to welcome Prof. Jeronimo Castrillon as a guest lecturer for the embedded systems architecture course during this summer. His compiler expertise will add a valuable contribution to the group's curricular activities.

## 07.04.2017: Matthias Göbel, Ahmed Elhossini and Ben Juurlink receive Best Paper Award at ARC 2017

Matthias Göbel, Ahmed Elhossini and Ben Juurlink received the Best Paper Award at the 13th International Symposium on Applied Reconfigurable Computing (ARC 2017) in Delft, NL for their paper "A Quantitative Analysis of the Memory Architecture of FPGA-SoCs". The work is co-authored by Chi Ching Chi and Mauricio Alvarez-Mesa of Spin Digital, a spin-off of AES.

In this paper, we analyze the various memory and communication interconnects found in FPGA-SoCs, particularly the Zynq-7020 and Zynq-7045 from Xilinx and the Cyclone V SE SoC from Intel. Issues such as different access patterns, cache coherence and full-duplex communication are analyzed, for both generic accesses as well as for a real workload from the field of video coding. Furthermore, the paper shows that by carefully choosing the memory interconnect networks as well as the software interface, high-speed memory access can be achieved for various scenarios.

## 07.04.2017: Philipp Habermann and Angela Pohl Receive Student Grants for ACACES 2017

Philipp Habermann and Angela Pohl have been admitted to the "Thirteenth International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems" (ACACES 2017); both were awarded student grants. The summer school is organized by the HiPEAC Network of Excellence. It is a one-week summer school for computer architects and tool builders working in the field of high-performance computer architecture and compilation for computing systems. The school aims at disseminating advanced scientific knowledge and promoting international contacts among scientists from academia and industry.

ACACES 2017 will take place in Fiuggi, Italy, from July 9th to July 15th 2017.

## 15.03.2017: AES supports Open Access.

In support of Open Access, AES is happy to announce that we are making our publications openly accessible to the public. We have submitted our publications to the TU Berlin DepositOnce repository. While the IEEE publications are already available, the ACM publications will follow soon. For a list of the available publications, please see our DepositOnce repository https://depositonce.tu-berlin.de/handle/11303/6101.

## 28.02.2017: AES Finals in Full Swing.

The winter term is coming to an end, and it's time for finals! Today, more than 900 freshmen take the Computer Organisation exam, taught by Prof. Ben Juurlink. Our TAs have spent the last weeks tutoring extensively to prepare everybody for a successful exam. We wish the best of luck to all our AES students!

## 02.02.2017: AES at Intel AI Technical workshop in Munich.

Dr. Ahmed Elhossini is attending the Intel AI Technical Workshop, which is held at the same time as OOP 2017 in Munich. This is a full-day technical workshop for software developers and data scientists. The workshop will be conducted by technical experts from Intel covering the following topics:

• Intel Software development tools and libraries for Machine Learning and Deep Learning

• Optimized Deep Learning frameworks for Intel Architecture

• Tools and frameworks from Nervana Systems

## 16 January 2017: Two AES-papers accepted at IPDPS 2017.

The first paper "E²MC: Entropy Encoding based Memory Compression for GPUs" is authored by Sohan Lal, Jan Lucas and Ben Juurlink. This paper proposes an entropy encoding based memory compression technique for GPUs. The proposed compression technique addresses the key challenges of probability estimation, choosing an appropriate symbol length for encoding, and decompression with low latency. It achieves a higher compression ratio and a larger performance gain than the state of the art.
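As background on what "entropy encoding" means here, the sketch below builds a plain Huffman code over byte symbols and reports the compressed size. It is an assumption-laden illustration, not E²MC: the paper's probability estimation, symbol-length choice and low-latency decompression are not reproduced, and the function names are hypothetical.

```python
from __future__ import annotations

import heapq
from collections import Counter


def huffman_code_lengths(freqs: dict) -> dict:
    """Return the Huffman code length (in bits) for each symbol."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {sym: 1 for sym in freqs}  # a lone symbol still needs one bit
    # Heap entries: (frequency, unique tie-breaker, {symbol: depth so far})
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**a, **b}.items()}
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]


def compressed_bits(data: bytes) -> int:
    """Size of `data` in bits after per-byte Huffman coding (code table excluded)."""
    freqs = Counter(data)
    lengths = huffman_code_lengths(freqs)
    return sum(freqs[s] * lengths[s] for s in freqs)


print(huffman_code_lengths({"a": 5, "b": 2, "c": 1, "d": 1}))
print(compressed_bits(b"aaaaabbcd"))  # 15 bits instead of 72
```

Frequent symbols get short codes, which is where the compression comes from; a hardware design additionally has to estimate the frequencies cheaply and decode fast, which is the part the paper addresses.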

The second paper "Autotuning Stencil Computations with Structural Ordinal Regression Learning" is authored by Biagio Cosenza, Juan Durillo, Stefano Ermon and Ben Juurlink. The paper is a collaboration between TU Berlin, the University of Innsbruck and Stanford University on new machine learning methodologies for automatic tuning. It proposes a new way of automatically tuning stencil computations based on structural learning. Experimental evaluations show that even with a small training set consisting of a few thousand points, our approach performs close to the sub-optimal solution of the problem.

IPDPS is an international conference for engineers and scientists from around the world to present their latest research findings in all aspects of parallel computation. IPDPS is sponsored by the IEEE Computer Society's Technical Committee on Parallel Processing. The 31st edition of IPDPS will be held from May 29 to June 2, 2017 in Orlando, Florida, USA.

## 9 January 2017: AES-paper accepted at CC 2017.

The paper "Static Optimization in PHP 7" by Nikita Popov, Biagio Cosenza, Ben Juurlink, and Dmitry Stogov has been accepted at the International Conference on Compiler Construction (CC) 2017.

This work was carried out in cooperation between the AES group at Technische Universität Berlin and Zend Technologies, and reports on the implementation of purely static bytecode optimizations for PHP 7. The paper presents the adaptation of SSA form for use in PHP and type inference in combination with other typical optimizations such as constant propagation and dead code elimination. The evaluation includes both micro-benchmarks and web frameworks such as MediaWiki and WordPress.
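The two classic optimizations named above can be illustrated on a toy, straight-line SSA program. The sketch below uses a hypothetical tuple-based IR of the form `(dest, op, args)`, not PHP 7's opcache representation, and assumes every operation is free of side effects. Constant propagation folds arithmetic whose operands are known constants; dead code elimination then walks backwards from the return and keeps only instructions whose results are used.

```python
from __future__ import annotations

# Hypothetical IR: (dest, op, args). "const" args hold a literal value,
# "param" has no args, "add"/"mul" args name SSA variables, "ret" returns one.


def fold_constants(instrs):
    """Constant propagation and folding over straight-line SSA code."""
    consts, out = {}, []
    for dest, op, args in instrs:
        if op == "const":
            consts[dest] = args[0]
            out.append((dest, op, args))
        elif op in ("add", "mul") and all(a in consts for a in args):
            x, y = (consts[a] for a in args)
            value = x + y if op == "add" else x * y
            consts[dest] = value
            out.append((dest, "const", (value,)))
        else:
            out.append((dest, op, args))
    return out


def eliminate_dead_code(instrs):
    """Drop instructions whose result is never used (assumes no side effects)."""
    live, kept = set(), []
    for dest, op, args in reversed(instrs):
        if op == "ret" or dest in live:
            kept.append((dest, op, args))
            if op != "const":  # const args are literals, not variable names
                live.update(args)
    return kept[::-1]


program = [
    ("a", "const", (2,)),
    ("b", "const", (3,)),
    ("c", "add", ("a", "b")),
    ("x", "param", ()),
    ("e", "mul", ("c", "x")),
    (None, "ret", ("e",)),
]
print(eliminate_dead_code(fold_constants(program)))
# [('c', 'const', (5,)), ('x', 'param', ()), ('e', 'mul', ('c', 'x')), (None, 'ret', ('e',))]
```

SSA form is what makes both passes this simple: each variable has exactly one definition, so a constant never has to be invalidated and "used" is a single set lookup. Real code with branches additionally needs phi nodes and a worklist, and type inference would propagate types rather than values.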

The International Conference on Compiler Construction (CC) is interested in work on processing programs in the most general sense: analyzing, transforming or executing input that describes how a system operates, including traditional compiler construction as a special case. CC 2017 is the 26th edition of the conference. It will be co-located with CGO, HPCA, and PPoPP and take place Feb 5-6 in Austin, TX, USA.

Dr. Cosenza will be in Austin, Texas, to present this paper on Sunday, 5 February at 14:45.

## 2 January 2017: AES-paper accepted at International Journal of Parallel Programming.

The paper "GPU Parallelization of HEVC In-loop Filters" by Biao Wang, Diego F. de Souza, Mauricio Alvarez-Mesa, Chi Ching Chi, Ben Juurlink, Aleksandar Ilic, Nuno Roma, and Leonel Sousa has been accepted for publication in the International Journal of Parallel Programming. This work was carried out in cooperation between the AES group at Technische Universität Berlin and the INESC-ID group at Universidade de Lisboa. It proposes a more efficient GPU parallelization of the HEVC in-loop filters. Compared with the state of the art, it implements an improved workflow and a more effective thread mapping.

International Journal of Parallel Programming is a forum for the publication of peer-reviewed, high-quality original papers in the computer and information sciences, focusing specifically on programming aspects of parallel computing systems. The journal publishes both original research and survey papers. Fields of interest include: software engineering aspects, advances in parallel algorithms, performance studies, application studies, and so on.

## 22 December 2016: AES-paper accepted at ARC 2017.

The paper "A Quantitative Analysis of the Memory Architecture of FPGA-SoCs" by Matthias Göbel, Ahmed Elhossini, Chi Ching Chi, Mauricio Alvarez-Mesa and Ben Juurlink has been accepted at ARC 2017. In this paper, we analyze the memory architectures of different FPGA-SoCs regarding the available bandwidth. Issues such as different access patterns, cache coherence and full-duplex communication are analyzed, both for generic accesses as well as for a real workload from the field of Video Coding. The devices are compared and their strengths are highlighted. Furthermore, the paper shows that by carefully choosing the memory interconnect networks as well as the software interface, high-speed memory access can be achieved for various scenarios.

ARC aims at bringing together researchers and practitioners of reconfigurable computing with an emphasis on practical applications of this promising technology. The symposium will be held this year in Delft, The Netherlands from April 3rd to April 7th 2017.

## 17-20 December 2016: AES at ICM 2016.

Dr. Ahmed Elhossini is attending the International Conference on Microelectronics. Over the past 27 years, the conference has been held in numerous countries across Southern Europe and Western and Southern Asia. The 28th edition of the conference will be held in Cairo, Egypt. ICM 2016 is technically sponsored by the IEEE, the IEEE Circuits and Systems Society, IEEE Region 8, the IEEE Egypt Section and the IEEE CAS Egypt Chapter.

Dr. Elhossini will present the paper "A Data Access Prediction Unit for Multimedia Applications" by Tareq Alawneh and Ahmed Elhossini. In this paper, we propose a data access prediction unit for multimedia applications. Integrating the proposed unit into the architecture of modern processors can yield a significant performance improvement.

For more information refer to: http://www.ieeeicm2016.org

## December 2016: AES-paper at FPT-2016.

Dr. Ahmed Elhossini is attending the International Conference on Field-Programmable Technology, FPT 2016. FPT is the premier conference in the Asia-Pacific region on field-programmable technologies, including reconfigurable computing devices and systems containing such components. This year the conference will be held in Xi'an, China, from 7 to 9 December 2016.

Dr. Elhossini will present the paper "FPGA based Hardware Accelerator for KAZE Feature Extraction Algorithm" by Lester Kalms, Ahmed Elhossini, and Ben Juurlink, which was accepted as a short paper. The paper presents a hardware accelerator for the KAZE feature extraction algorithm. Feature extraction is an important stage in various computer vision systems. This work enables real-time performance for computer vision systems such as facial recognition. For more information refer to: http://www.icfpt2016.org/

## 20. October 2016: AES-paper accepted at ICM2016.

The paper "A Data Access Prediction Unit for Multimedia Applications" by Tareq Alawneh and Ahmed Elhossini has been accepted at the International Conference on Microelectronics (ICM 2016). In this paper, we propose a data access prediction unit for multimedia applications. Integrating the proposed unit into the architecture of modern processors can yield a significant performance improvement.

The International Conference on Microelectronics has been held in numerous countries across Southern Europe and Western and Southern Asia for the past 27 years. The 28th edition of the conference will be held in Cairo, Egypt. ICM 2016 is technically sponsored by IEEE, the IEEE Circuits and Systems Society, IEEE Region 8, the IEEE Egypt Section, and the IEEE CAS Egypt Chapter. For more information refer to: http://www.ieeeicm2016.org.

## 06. October 2016: AES-paper accepted at FPT-2016.

The paper "FPGA based Hardware Accelerator for KAZE Feature Extraction Algorithm" by Lester Kalms, Ahmed Elhossini, and Ben Juurlink has been accepted at FPT-2016 as a short paper. The paper presents a hardware accelerator for the KAZE feature extraction algorithm. Feature extraction is an important stage in many computer vision systems, and this work enables real-time performance for applications such as facial recognition.

FPT is the premier conference in the Asia-Pacific region on field-programmable technologies, including reconfigurable computing devices and systems containing such components. This year the conference will be held in Xi'an, China, from the 7th to the 9th of December, 2016. For more information refer to: http://www.icfpt2016.org/

## September 19th: AES at MASCOTS 2016.

On September 19th, Jan Lucas will present the paper "ALUPower: Data Dependent Power Consumption in GPUs" by Jan Lucas and Ben Juurlink at MASCOTS 2016. In his talk he will describe how data values impact the power consumption of GPU ALUs and how this effect can be modeled to build more accurate GPU power models. Such models help researchers discover new architectural improvements that reduce the power consumption of GPUs.

## July 18-22, 2016: Biagio Cosenza to moderate a Conference Panel on Resiliency for ExaScale at HPCS 2016.

Dr. Cosenza will be in Innsbruck, Austria, at the IEEE International Conference on High Performance Computing & Simulation (HPCS 2016).
On Tuesday, 19 July 2016, he will moderate the panel "Resiliency in Extreme Scale High Performance Computing Systems and Applications" (http://hpcs2016.cisedu.info/4-program/hpcs2016-panels).
The panel will be held in the Casineum Ballroom of the Hilton Innsbruck Hotel at 15:55.
Panelists are:

• Vassil Alexandrov, Barcelona Supercomputing Center, Spain
• Thomas Ropars, Université Grenoble Alpes, France
• Lorenzo Strigini, City University of London, U.K.

## 30. June 2016: AES-paper accepted at MMSP 2016.

The paper "Efficient HEVC Decoder for Heterogeneous CPU with GPU Systems" by Biao Wang, Diego F. de Souza, Mauricio Alvarez-Mesa, Chi Ching Chi, Ben Juurlink, Aleksandar Ilic, Nuno Roma, and Leonel Sousa has been accepted at MMSP 2016. The work was done in cooperation between the AES group at Technische Universität Berlin and the INESC-ID group at Universidade de Lisboa. It proposes an efficient HEVC decoding scheme in which the CPU and the GPU perform HEVC decoding in parallel.

MMSP 2016 is the 18th International Workshop on Multimedia Signal Processing.
The workshop will take place from 21 to 23 September 2016 in Montreal, Canada.
The workshop is organized by the Multimedia Signal Processing Technical Committee of the IEEE Signal Processing Society.
This year’s theme is "Enhancing the Multimedia Experience in the 21st Century".
The goal of the workshop is to bring together experts from different domains, such as multimedia, engineering, and computer science, to discuss ways of enhancing the multimedia experience in the 21st century.

## 27. June 2016: Film265 project finishes with significant advances on video codecs for online video delivery.

The European consortium Film265 has completed an 18-month innovation project on new video codec technologies for online video delivery in the film industry. The EU-funded project has resulted in several advances in video coding, giving European VoD providers the tools and information required to deploy a new generation of online video services with higher quality, lower bandwidth, and a better understanding of how video codecs affect QoE.

The AES team of TU Berlin was the project coordinator of Film265 and developed the H.265 codec that formed the basis of the project. The technology developed in Film265 has been transferred to Spin Digital (a spin-off of TU Berlin) for inclusion in a new generation of video codec products.

http://www.film265.eu/press-releases/PR_Film265_finalPressRelease_27.06.16.pdf

## 21. June 2016: AES-Paper accepted for IEEE Multimedia Magazine.

The paper "Application-specific Cache and Prefetching for HEVC CABAC Decoding" by Philipp Habermann, Chi Ching Chi, Mauricio Alvarez-Mesa and Ben Juurlink has been accepted for publication in the IEEE Multimedia Magazine.

The authors provide a design space exploration of different cache configurations for HEVC CABAC hardware decoding. It is demonstrated that the decoder throughput can be significantly increased when a cache replaces a bigger context model memory in the critical data path. Furthermore, it is shown that the cache miss rate can be effectively reduced with an application-specific prefetching algorithm and the corresponding optimized memory layout, to the point where it is no longer noticeable. The proposed CABAC decoder can decode high-quality Full HD videos in real time using few hardware resources on a low-power FPGA.
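The announcement does not spell out the cache organization or the prefetcher, so the following is only a minimal Python sketch of the general idea under stated assumptions: context models are grouped into abstract "groups", the cache is a generic LRU cache, and a hypothetical `prefetch_map` stands in for an application-specific prefetcher that knows which group the decoder will request next. It is not the authors' design.

```python
from collections import OrderedDict


def miss_rate(accesses, capacity, prefetch_map=None):
    """Simulate an LRU cache of context-model groups and return its miss rate.

    accesses:     sequence of group ids requested by the decoder
    capacity:     number of groups the cache can hold
    prefetch_map: optional, hypothetical table "after group g, group prefetch_map[g] is needed next"
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0

    def load(group):
        cache[group] = True
        cache.move_to_end(group)
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used group

    for group in accesses:
        if group in cache:
            cache.move_to_end(group)  # hit: refresh recency
        else:
            misses += 1
            load(group)
        if prefetch_map is not None and group in prefetch_map:
            load(prefetch_map[group])  # fetch the next group before it is requested
    return misses / len(accesses)


accesses = [0, 1, 2, 3] * 3  # cyclic pattern larger than the cache
print(miss_rate(accesses, capacity=2))                                         # 1.0
print(miss_rate(accesses, capacity=2, prefetch_map={0: 1, 1: 2, 2: 3, 3: 0}))  # 0.08333333333333333
```

With a cyclic pattern larger than the cache, plain LRU misses on every access, while a prefetcher that knows the access order hides every miss except the first.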

## 17. June 2016: AES-Paper accepted at MASCOTS 2016.

The paper "ALUPower: Data Dependent Power Consumption in GPUs" by Jan Lucas and Ben Juurlink has been accepted at MASCOTS 2016. It describes in detail how data values influence the power consumption of GPUs and how the power consumption can be modelled.
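The summary does not give the model itself. A common first-order proxy for data-dependent dynamic power is switching activity, that is, how many input bits toggle between consecutive operations. The Python sketch below illustrates only that idea: the coefficients `e_static` and `e_per_toggle` are hypothetical placeholders in arbitrary units, not values from the ALUPower paper, and the published model may use different or additional features.

```python
def bit_toggles(prev: int, curr: int, width: int = 32) -> int:
    """Number of bit positions that change between two consecutive operand values."""
    mask = (1 << width) - 1
    return bin((prev ^ curr) & mask).count("1")


def estimate_energy(operands, e_static=1.0, e_per_toggle=0.05, width=32):
    """Toy data-dependent energy model in arbitrary units (hypothetical coefficients).

    Each operation pays a fixed cost plus a cost proportional to the number of
    input bits that toggle relative to the previous operand on the same ALU input.
    """
    energy = 0.0
    prev = 0  # assume the input register starts cleared
    for value in operands:
        energy += e_static + e_per_toggle * bit_toggles(prev, value, width)
        prev = value
    return energy


# Same number of operations, very different switching activity:
print(estimate_energy([0, 0, 0, 0]))                    # 4.0
print(estimate_energy([0, 0xFFFFFFFF, 0, 0xFFFFFFFF]))  # higher: 96 bit toggles in total
```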

The MASCOTS conference is a well-established forum for state-of-the-art research on the measurement, modeling, and performance analysis of computer systems and networks. The 24th edition of the conference will take place September 21-23, 2016, at Imperial College in London. The conference will bring together academics and industry practitioners to present and discuss their latest research results. The technical program for the three-day conference will include keynote talks as well as refereed full and work-in-progress papers.

## 17.05.16: TU Berlin and Spin Digital to demonstrate state-of-the-art video codecs in Cannes.

TUB AES and Spin Digital will be part of a panel at the Cannes Film Festival 2016, presenting a demonstration of an ultra-high-quality HEVC codec for VoD applications in the film industry. The panel and demonstration will take place on May 17th at 4:00 pm at the Palais des Festivals et des Congrès de Cannes.

The Film265 consortium will present an H.265 codec optimized for VoD applications at the NEXT pavilion of the Cannes Film Festival. The presentation consists of a panel discussion and a demonstration screening that shows the codec's improved coding performance. In particular, the current industry standard H.264/AVC (used in most current VoD services) will be compared with the new-generation codec HEVC/H.265, which is specially designed for HD and UHD content.

In the panel, directors of photography (DPs), post-production companies, video codec experts, and film distributors will share their experiences with video compression and its effects on the final user experience.

Screening and Panel: May 17th, 4 p.m., Palais I (Palais des Festivals et des Congrès de Cannes)

Panel speakers:

• Patrice Carré (Moderator): Film director, producer, director of photography.
• Frantz Delbecque: Director of R&D and New Technologies at Eclair Group (www.eclairgroup.com), a leading French post-production house and film laboratory.
• Mauricio Alvarez-Mesa: CEO of Spin Digital, a German company specialized in video codecs for ultra-high quality applications, and Senior Researcher at the Technical University of Berlin.
• Pascal Lebegue: Director of Photography.

The Film265 project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 645500.

## 18.04.2016: Felix Goroncy and Matthias Göbel have been admitted to ACACES 2016.

Felix Goroncy and Matthias Göbel have been admitted to the "Twelfth International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems" (ACACES 2016). The summer school is organized by the HiPEAC Network of Excellence. It is a one-week summer school for computer architects and tool builders working in the field of high-performance computer architecture and compilation for computing systems. The school aims to disseminate advanced scientific knowledge and to promote international contacts among scientists from academia and industry.

ACACES 2016 will take place in Fiuggi, Italy, from July 10th to July 16th 2016.

## April 18-21: Spin Digital at the NAB Show 2016 in Las Vegas.

Spin Digital, a spin-off of the AES group of TU Berlin, will demonstrate an HEVC decoder and media player that is ready for 8K video and beyond at NAB 2016. The demonstration will take place from April 18-21 at the Las Vegas Convention Center, South Upper Hall, Booth SU1521.

The demonstration at NAB will show a real-time HEVC/H.265 decoder and video renderer for 8K, supporting 60 frames per second and high-quality colour formats (4:4:4 and 10-bit). The setup consists of a PC platform connected to an impressive 55-inch 8K monitor from Panasonic. Together, the Spin Digital HEVC video player and the Panasonic 8K monitor form a complete solution for 8K video playback.

Spin Digital provides HEVC codec solutions for professional 8K and 4K needs, including HEVC/H.265 decoding, video rendering, a media player, and HEVC/H.265 encoding. Spin Digital solutions are available for purchase immediately and include a complete package of products, technical support, and consulting and customization services for professional end users.

Chi Ching Chi, Alexander Papachristos, and Mauricio Álvarez-Mesa will be at the booth to provide expert technical information and discuss business opportunities.

## Jan. 2016: Project LPGPU2 has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688759.

Graphics researchers at Samsung Electronics UK have teamed up with mobile graphics specialists Codeplay, Think Silicon and TU Berlin to develop a tool for enabling smartphone batteries to last longer while running advanced video games and using the camera.

The EU Commission has awarded a European GPU consortium a grant of 2.97 million Euros to research and develop a novel tool chain for analysing, visualizing, and improving the power efficiency of applications on mobile Graphics Processor Units (GPUs).

The consortium includes three European technology companies: Codeplay, the Edinburgh-based GPU technology company; Think Silicon, a Greek company offering low gate-count graphics semiconductor IP cores; and Samsung Electronics UK Ltd. TU Berlin (Germany), a European university, completes the group.

The key objectives of the two-and-a-half-year research project are:

• Define new industry standards for resource and performance monitoring, to be widely adopted by embedded GPU hardware vendors (Khronos Group).

• Define a methodology for accurate power estimation for embedded GPUs.

• Enhance existing Dynamic Voltage and Frequency Scaling (DVFS) mechanisms for optimum power management with sustained performance.

• Improve the power efficiency of compute and graphics applications running on mobile GPUs.

• Build a unique power and performance visualization tool which informs application and GPU device driver developers of potential power and performance improvements.

## December 23, 2015: Matthias Göbel, Chi Ching Chi, Mauricio Alvarez-Mesa and Ben Juurlink received a HiPEAC Paper Award for the paper "High Performance Memory Accesses on FPGA-SoCs: A Quantitative Analysis".

Matthias Göbel, Chi Ching Chi, Mauricio Alvarez-Mesa and Ben Juurlink received a HiPEAC Paper Award for the paper "High Performance Memory Accesses on FPGA-SoCs: A Quantitative Analysis" which was published at the 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM 2015).

The authors analyzed the memory bandwidth of an FPGA-SoC, namely Xilinx's Zynq-7000. Their main focus was on two-dimensional memory accesses, which are common in video coding and image processing applications. They implemented various hardware and software components that perform synthetic accesses with a given width and height, and evaluated scenarios such as combining multiple ports or using cache coherency. Furthermore, a memory trace of an HEVC motion compensation unit was used to simulate a real workload. In contrast to other papers, the results showed that the full bandwidth of the memory controller and the DDR chips can be used, so the FPGA and the memory ports themselves cannot be considered bottlenecks. The results also showed that real-time Full-HD HEVC decoding on a Zynq-7000 is possible, while 4K decoding is too ambitious without caching or memory compression techniques.
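To make the notion of a two-dimensional access "with a given width and height" concrete, here is a minimal Python sketch, not the authors' hardware or software components. It assumes a row-major frame buffer whose rows are a hypothetical `stride` bytes apart and counts how many aligned memory bursts a block touches. The 64-byte burst size is an assumption chosen for illustration.

```python
def block_requests(x, y, width, height, stride, bytes_per_pixel=1, base=0):
    """One (address, length) request per row of a width x height block.

    Assumes a row-major frame buffer whose rows are `stride` bytes apart.
    """
    row_bytes = width * bytes_per_pixel
    return [
        (base + (y + r) * stride + x * bytes_per_pixel, row_bytes)
        for r in range(height)
    ]


def bursts_touched(requests, burst_bytes=64):
    """Count the aligned bursts of burst_bytes that the requests overlap."""
    total = 0
    for addr, length in requests:
        first = addr // burst_bytes
        last = (addr + length - 1) // burst_bytes
        total += last - first + 1
    return total


# 8x8 block of 8-bit samples in a frame buffer with 1920-byte rows
inside = block_requests(4, 2, 8, 8, stride=1920)
crossing = block_requests(60, 0, 8, 8, stride=1920)
print(bursts_touched(inside))    # 8: each row fits inside one 64-byte burst
print(bursts_touched(crossing))  # 16: each row crosses a burst boundary
```

Because every block row is a separate short transfer, small blocks use only a fraction of each burst: in the first case, 64 useful bytes are spread over 512 transferred bytes, and a misaligned block doubles the number of bursts.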

## December 16, 2015: Philipp Habermann receives Best Student Paper Award at IEEE ISM 2015.

Philipp Habermann presented the paper "Optimizing HEVC CABAC Decoding with a Context Model Cache and Application-specific Prefetching" at the IEEE International Symposium on Multimedia (ISM 2015) in Miami, FL.
He received the Best Student Paper Award for his work, which was co-authored by Chi Ching Chi, Mauricio Alvarez-Mesa and Ben Juurlink.

The authors provide a design space exploration of different cache configurations for HEVC CABAC hardware decoding. It is demonstrated that the decoder throughput can be significantly increased when a cache replaces a bigger context model memory in the critical data path. Furthermore, it is shown that the cache miss rate can be effectively reduced with an application-specific prefetching algorithm and the corresponding optimized memory layout, to the point where it is no longer noticeable.

## November 6, 2015: Paper “The Neuro Vector Engine: Flexibility to Improve Convolutional Network Efficiency for Wearable Vision” accepted for publication at the International Conference on Design, Automation and Test in Europe (DATE 2016).

The paper “The Neuro Vector Engine: Flexibility to Improve Convolutional Network Efficiency for Wearable Vision” by Maurice Peemen, Bart Mesman, Henk Corporaal, Runbin Shi, Sohan Lal and Ben Juurlink has been accepted for publication at the International Conference on Design, Automation and Test in Europe (DATE) 2016.

The paper is a result of the HiPEAC-funded collaboration between Eindhoven University of Technology and TU Berlin. It presents the Neuro Vector Engine (NVE), a SIMD accelerator for Deep Convolutional Networks (ConvNets) for visual object classification, targeting portable and wearable devices. A detailed comparison with a low-power ARM Cortex-A9 core and an embedded TK1 GPU shows that NVE achieves much higher throughput and energy efficiency. With a power budget of only 54 mW, the proposed accelerator is suitable for embedding in next-generation mobile devices and can bring smart features like real-time visual object classification and speech recognition to our cherished portable companions.

## November 2, 2015: Ben Juurlink PC member of DSD 2016.

Prof. Dr. Ben Juurlink has accepted an invitation to serve on the Program Committee of the 19th Euromicro Conference on Digital System Design (DSD 2016), which will be held in Nicosia, Cyprus, from August 31st to September 2nd.

## October 26, 2015: New Offers for Bachelor / Master Theses

The AES group at TUB offers new Bachelor / Master Theses in the field of High Performance Video Coding. More information at http://www.aes.tu-berlin.de/menue/theses_projects/

## October 12, 2015: AES acquired 4K Cameras for Teaching and Research Activities.

The AES group at TUB has acquired two professional 4K cameras to record lectures for the courses offered by the group. The courses will be offered online through the Coursera platform. A small studio has been set up with these cameras and can be used to shoot high-quality videos for different purposes.

## October 1 and 2, 2015: EIT-Digital online education meeting, Rennes, France.

Ahmed Elhossini is attending the EIT-Digital online education meeting in Rennes, France. The online education program aims to take the existing EIT-Digital Master School program to the next level and reach a wider audience. The program is planned to be hosted on the Coursera platform, with 8 specializations covering all aspects of the typical first year of the master's program in embedded systems. AES@TUB plans to contribute two specializations covering advanced embedded computer architectures and multicore systems.

## September 21, 2015: Paper "Optimizing HEVC CABAC Decoding with a Context Model Cache and Application-specific Prefetching" accepted at the IEEE International Symposium on Multimedia.

The paper "Optimizing HEVC CABAC Decoding with a Context Model Cache and Application-specific Prefetching" by Philipp Habermann, Chi Ching Chi, Mauricio Alvarez-Mesa and Ben Juurlink has been accepted for publication at the IEEE International Symposium on Multimedia (ISM 2015), which will take place from December 14th to 16th in Miami.

The authors provide a design space exploration of different cache configurations for HEVC CABAC hardware decoding. It is demonstrated that the decoder throughput can be significantly increased when a cache replaces a bigger context model memory in the critical data path. Furthermore, it is shown that the cache miss rate can be effectively reduced with an application-specific prefetching algorithm and the corresponding optimized memory layout, to the point where it is no longer noticeable.

## September 18, 2015: Paper "Reducing HEVC Encoding Complexity Using Two-Stage Motion Estimation" accepted in IEEE International Conference on Visual Communications and Image Processing.

The paper "Reducing HEVC Encoding Complexity Using Two-Stage Motion Estimation" by Gabriel Cebrián Márquez, Chi Ching Chi, José Luis Martínez, Pedro Cuenca, Mauricio Álvarez-Mesa, Sergio Sanz-Rodríguez, and Ben Juurlink has been accepted to appear at the IEEE International Conference on Visual Communications and Image Processing (VCIP), to be held from December 13th to 16th in Singapore.

In the paper, the authors analyse how to improve the performance of the HEVC encoder using a two-stage motion estimation approach. The paper is the result of a collaboration between the High-Performance Networks and Architectures (RAAP) group of the University of Castilla-La Mancha (Albacete, Spain) and the AES group of TU Berlin.

## September 9, 2015: Paper "Spatiotemporal SIMT and Scalarization for Improving GPU Efficiency" published in ACM TACO.

The paper "Spatiotemporal SIMT and Scalarization for Improving GPU Efficiency" has been accepted and published in ACM TACO, a well-known journal that focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. The paper shows that combining a new technique called spatiotemporal SIMT with scalarization improves GPU energy efficiency by more than 25%.
See: http://dl.acm.org/citation.cfm?id=2811402

## September 11-15: Spin Digital at IBC 2015 in Amsterdam.

Spin Digital will demonstrate a real-time 8K HEVC/H.265 video decoding system at IBC 2015, taking place from September 11-15 in Amsterdam. The demonstration will take place at RAI Amsterdam in Hall 1, Stand 1.F13. Spin Digital will present the latest version of its very compact 8K HEVC/H.265 decoder for the next generation of Ultra High Definition applications. It uses Spin Digital’s optimized H.265 decoder, which achieves real-time 8K decoding at 60 frames per second with extended color formats (4:4:4, 10-bit) on a commodity workstation. Envisioned applications are immersive video for very large screens and next-generation virtual reality devices.
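For a rough sense of scale behind the real-time claim, the following Python snippet computes the uncompressed output rate a decoder must sustain. It assumes that 8K means 7680 × 4320 pixels and that 4:4:4 sampling stores three 10-bit samples per pixel with no padding. Real decoders may store 10-bit samples in 16-bit words, which would raise the memory traffic further.

```python
def raw_video_rate(width, height, fps, samples_per_pixel, bits_per_sample):
    """Return (pixels per second, uncompressed bits per second) of a video stream."""
    pixels_per_second = width * height * fps
    return pixels_per_second, pixels_per_second * samples_per_pixel * bits_per_sample


# 8K (7680 x 4320) at 60 fps, 4:4:4 sampling (3 samples per pixel), 10 bits per sample
pps, bps = raw_video_rate(7680, 4320, 60, 3, 10)
print(f"{pps / 1e9:.2f} Gpixel/s, {bps / 1e9:.1f} Gbit/s")  # 1.99 Gpixel/s, 59.7 Gbit/s
```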

Mauricio Álvarez-Mesa, Chi Ching Chi, and Sergio Sanz-Rodríguez will be at the booth to provide expert technical information and discuss business opportunities.

Spin Digital, a spin-off of the AES group of TU Berlin, is a provider of high-performance video codecs for the next generation of high-quality video applications. The company specializes in highly efficient implementations of the HEVC/H.265 video coding standard.

## September 4-9: Spin Digital at IFA 2015 in Berlin.

Spin Digital announced that it will demonstrate a real-time 4K HEVC/H.265 video decoding system at the IFA 2015 exhibition, taking place from September 4-9 in Berlin. The demonstration will take place at the Berlin Messe in Hall 11.1, Stand 2 (joint TUB booth). Spin Digital will present its real-time HEVC/H.265 decoder for 4K (UHD-1 Phase 2), supporting 60 frames per second, high dynamic range, and extended color formats (4:4:4 and 10-bit) on a commodity PC.

Mauricio Álvarez-Mesa, Chi Ching Chi, Sergio Sanz-Rodríguez, Haifeng Gao, and Prof. Ben Juurlink will be at the booth to provide expert technical information and discuss business opportunities.

Spin Digital, a spin-off of the AES group of TU Berlin, is a provider of high-performance video codecs for the next generation of high-quality video applications. The company specializes in highly efficient implementations of the HEVC/H.265 video coding standard.

## Aug 24-28, 2015: Biagio Cosenza with two papers at Euro-Par 2015 in Vienna.

Biagio Cosenza will be at Euro-Par 2015 in Vienna for two papers.

The first paper, coauthored with Klaus Kofler and Thomas Fahringer from the University of Innsbruck, is "Automatic Data Layout Optimizations for GPUs". The paper presents an optimization infrastructure based on the Insieme compiler that automatically determines an improved data layout for OpenCL programs written with an array-of-structures (AoS) layout. Authors' copy available at this link: http://www.biagiocosenza.com/papers/KoflerEUROPAR15.pdf
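Why the AoS layout matters on GPUs can be seen from the index arithmetic alone. The Python sketch below is an illustration, not the Insieme-based transformation from the paper (which operates on OpenCL code and may choose other layouts). It compares the flat buffer indices that consecutive work-items touch when they all read the same field, and reorders a flat AoS buffer into a structure-of-arrays (SoA) layout. The function names are hypothetical.

```python
def aos_index(i: int, field: int, num_fields: int) -> int:
    """Flat index of element i's field in an array-of-structures (AoS) buffer."""
    return i * num_fields + field


def soa_index(i: int, field: int, n: int) -> int:
    """Flat index of element i's field in a structure-of-arrays (SoA) buffer of n elements."""
    return field * n + i


def aos_to_soa(buffer: list, num_fields: int) -> list:
    """Reorder a flat AoS buffer into SoA order (all values of field 0, then field 1, ...)."""
    n = len(buffer) // num_fields
    return [buffer[aos_index(i, f, num_fields)] for f in range(num_fields) for i in range(n)]


# Four work-items each read field 0 of a 3-field struct (e.g. x of {x, y, mass}).
print([aos_index(i, 0, 3) for i in range(4)])  # [0, 3, 6, 9]: strided accesses
print([soa_index(i, 0, 4) for i in range(4)])  # [0, 1, 2, 3]: contiguous accesses
print(aos_to_soa([1, 2, 3, 4, 5, 6], 3))       # [1, 4, 2, 5, 3, 6]
```

Contiguous per-field accesses are what GPU memory systems can coalesce into few wide transactions, which is the usual motivation for moving away from AoS.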

The second paper, "Behavioral Spherical Harmonics for Long-Range Agents’ Interaction", will be presented at the satellite Workshop on Parallel and Distributed Agent-Based Simulations (PADABS) on August 24. The paper introduces a new behavioral model based on spherical harmonics to efficiently represent directional information in agent-based simulations. Author's copy available at this link: http://www.biagiocosenza.com/papers/CosenzaPADABS15.pdf
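As a rough illustration of how spherical harmonics can summarize directional information, the Python sketch below projects the directions toward neighboring agents onto the first two bands (l = 0 and l = 1, four coefficients) of the real spherical harmonics and evaluates the resulting smooth directional signal in any query direction. The band limit, the unit weights and the function names are assumptions made for this illustration. The behavioral model in the paper may use more bands and a different weighting.

```python
import math

C0 = 0.5 * math.sqrt(1.0 / math.pi)    # constant of Y_0^0
C1 = math.sqrt(3.0 / (4.0 * math.pi))  # constant of the l = 1 band


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def sh_basis(d):
    """Real spherical harmonics up to band l = 1 for a unit direction d = (x, y, z)."""
    x, y, z = d
    return [C0, C1 * y, C1 * z, C1 * x]


def encode(directions, weights=None):
    """Project weighted directions (e.g. toward neighboring agents) onto 4 SH coefficients."""
    if weights is None:
        weights = [1.0] * len(directions)
    coeffs = [0.0] * 4
    for d, w in zip(directions, weights):
        for k, b in enumerate(sh_basis(normalize(d))):
            coeffs[k] += w * b
    return coeffs


def evaluate(coeffs, d):
    """Reconstruct the smoothed directional signal in direction d."""
    return sum(c * b for c, b in zip(coeffs, sh_basis(normalize(d))))


# One neighbor straight along +x: the signal peaks toward it and is negative behind it.
c = encode([(1.0, 0.0, 0.0)])
print(round(evaluate(c, (1, 0, 0)), 4))   # 0.3183 (= 1/pi)
print(round(evaluate(c, (-1, 0, 0)), 4))  # -0.1592 (= -1/(2*pi))
```

The appeal of this representation is that any number of neighbors collapses into a fixed, small set of coefficients that can be summed and queried cheaply.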

Euro-Par is the prime European conference covering all aspects of parallel and distributed processing, ranging from theory to practice, from small to the largest parallel and distributed systems and infrastructures, from fundamental computational problems to full-fledged applications, from architecture, compiler, language and interface design and implementation, to tools, support infrastructures, and application performance aspects.
The 2015 edition of Euro-Par will be held in Vienna, Austria from August 24 to August 28, 2015. It will be hosted at the Vienna University of Technology and is organized by the Research Group for Parallel Computing.

Further information can be found at the following links:
Euro-Par 2015 http://www.europar2015.org
Euro-Par 2015 accepted papers (main track) http://www.europar2015.org/accepted_papers.html

## August 2015: Ben Juurlink in program committee of SCOPES 2016.

Ben Juurlink has been invited to join the program committee of the next edition of the Workshop on Software and Compilers for Embedded Systems (SCOPES, www.scopesconf.org), which will be held on May 23-25, 2016, in Sankt Goar, Germany, a beautiful and quiet site in the hills on the banks of the Rhine.

The workshop will feature a combination of research papers and research presentations. The research papers will be published in the ACM Digital Library and must therefore present original research results, whereas research presentations may be based on results that have previously been presented in other forums. The workshop will be held in cooperation with ACM SIGBED and EDAA and will be organized by:

• Henk Corporaal, Eindhoven University of Technology (General Chair)
• Peter Marwedel, Dortmund University (Publicity Chair)
• Sander Stuijk, Eindhoven University of Technology (Program Committee Chair)

## July 2015: Ben Juurlink elected to the PARS steering committee for 2016 to 2018.

Prof. Ben Juurlink has been elected to the steering committee of the PARS special interest group (2016-2018). The special interest group on Parallel Algorithms, Parallel Computer Structures and Parallel System Software (PARS) is a joint special interest group of the Gesellschaft für Informatik e.V. (GI) and the Informationstechnische Gesellschaft (ITG). PARS deals with all forms of parallel processing, in particular the interactions between the hardware and software architectures of parallel systems.

## 12th June 2015: AES group at the Berliner Firmenlauf 2015.

The 14th Berliner Firmenlauf (Berlin's Company Run) was held on 12th June 2015.
Prof. Juurlink and other AES group members took part in this annual event in glorious weather, alongside around 12,000 other participants.
The course is 5.5 kilometers long, starting and finishing at the Brandenburg Gate.
All AES runners completed the distance in around 30 minutes or less.
Dr. Mauricio Alvarez-Mesa was the fastest in the group, with a time of 24 minutes.
All AES participants enjoyed the event a lot.
After the run, they gathered at the TU Berlin tent and enjoyed a beer together.

## June 15th 2015: Grant for collaboration with AES.

The HiPEAC network of excellence has awarded a collaboration grant to Diego Felix de Souza of Instituto Superior Técnico in Lisbon, Portugal, to visit and collaborate with the AES group of TU Berlin. Diego is a PhD student of Professor Leonel Sousa, and his main research interests are Graphics Processing Units (GPUs), parallel computing, video coding and processing, image processing, multimedia systems, signal processing architectures, and embedded systems. The reviewers rated his proposal as very ambitious and very likely to lead to a good publication.

## May 25-29, 2015: Tamer Dallou presents a paper at IPDPS 2015, Hyderabad, India.

Tamer Dallou will present the paper "Nexus#: A Distributed Hardware Task Manager for Task-Based Programming Models" authored by Tamer Dallou, Nina Engelhardt, Ahmed Elhossini, and Ben Juurlink at IPDPS. The paper presents the group's recent research outcomes on hardware support for parallel programming models.

IPDPS is an international forum for engineers and scientists from around the world to present their latest research findings in all aspects of parallel computation. The 2015 edition of IPDPS travels to Hyderabad, India, where it is held at the Hyderabad International Convention Centre, India's best convention venue.

In this paper, the authors investigate the hurdles of modern task-based programming models such as OmpSs and introduce an improved version of their hardware accelerator for task graph management, called Nexus#. The architectural design of Nexus# is described in detail, along with several scenarios of its execution pipeline. Nexus# was evaluated using traces of several benchmarks from the Starbench parallel benchmark suite, ranging from parallel workloads like ray-tracing to workloads with more complicated dependency patterns like h264dec. The evaluation showed a significant advance over prior work and underlined the importance of hardware acceleration for the scalability of task-based programming models.
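Nexus# itself is a hardware design, and its internals are not described in this announcement. To illustrate what task graph management means in a task-based model such as OmpSs, here is a minimal Python sketch (a software toy, not the Nexus# architecture). Each submitted task declares the addresses it reads and writes. The manager derives read-after-write, write-after-read and write-after-write dependencies from the most recent writer and the readers of each address, and completing a task releases its successors. The class and method names are hypothetical.

```python
from collections import defaultdict


class TaskGraph:
    """Toy software model of dependency tracking in a task-based runtime."""

    def __init__(self):
        self.last_writer = {}               # address -> id of the most recent writer
        self.readers = defaultdict(list)    # address -> readers since that write
        self.pending = {}                   # task id -> number of unfinished predecessors
        self.successors = defaultdict(set)  # task id -> tasks waiting on it
        self.done = set()

    def submit(self, tid, inputs=(), outputs=()):
        """Register a task; return True if it can start immediately."""
        preds = set()
        for a in inputs:   # read-after-write
            if a in self.last_writer:
                preds.add(self.last_writer[a])
        for a in outputs:  # write-after-write and write-after-read
            if a in self.last_writer:
                preds.add(self.last_writer[a])
            preds.update(self.readers[a])
        preds.discard(tid)
        preds -= self.done  # finished tasks impose no constraint
        for p in preds:
            self.successors[p].add(tid)
        self.pending[tid] = len(preds)
        for a in inputs:
            self.readers[a].append(tid)
        for a in outputs:
            self.last_writer[a] = tid
            self.readers[a] = []
        return self.pending[tid] == 0

    def finish(self, tid):
        """Mark tid complete; return the tasks that became ready (sorted)."""
        self.done.add(tid)
        ready = []
        for s in sorted(self.successors.pop(tid, ())):
            self.pending[s] -= 1
            if self.pending[s] == 0:
                ready.append(s)
        return ready


g = TaskGraph()
g.submit("A", outputs=["x"])                # True: no predecessors
g.submit("B", inputs=["x"], outputs=["y"])  # False: reads x written by A
g.submit("C", inputs=["x"])                 # False: reads x written by A
g.submit("D", outputs=["x"])                # False: must wait for A, B and C
print(g.finish("A"))                        # ['B', 'C']
print(g.finish("B"))                        # []
print(g.finish("C"))                        # ['D']
```

Every submission and completion performs several lookups and updates like these, which is why such bookkeeping can become a bottleneck for fine-grained tasks and why moving it into hardware is attractive.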

## May 25-29, 2015: IEEE TCPP awarded Tamer Dallou a travel grant to attend IPDPS 2015.

The IEEE Computer Society Technical Committee on Parallel Processing (TCPP) has awarded Tamer Dallou a travel grant to attend the 29th IEEE International Parallel & Distributed Processing Symposium, which takes place at the Hyderabad International Convention Centre, Hyderabad, India, on May 25-29, 2015.
Tamer Dallou will present the paper "Nexus#: A Distributed Hardware Task Manager for Task-Based Programming Models" authored by Tamer Dallou, Nina Engelhardt, Ahmed Elhossini, and Ben Juurlink at IPDPS. The paper presents the group's recent research outcomes on hardware support for parallel programming models.

## May 15th, 2015: Paper "Two-Level Sliding-Window VBR Control Algorithm for Video on Demand Streaming" accepted in Signal Processing: Image Communication.

The paper "Two-Level Sliding-Window VBR Control Algorithm for Video on Demand Streaming" by Sergio Sanz-Rodríguez and colleagues from Universidad Carlos III de Madrid (UC3M), Spain, has been accepted in "Signal Processing: Image Communication", an international journal for the design, implementation and use of image communication systems and video codecs.

The paper proposes a two-level variable bit rate (VBR) control algorithm for hierarchical video coding, specifically tailored for the new High Efficiency Video Coding (HEVC) standard. A long-term level monitors the current bit count along a sliding window of a few seconds, comprising several intra periods (IPs) and shifted on an IP basis. This long-term view allows the naturally occurring rate variations to be accommodated at a slow pace, avoiding the annoying sharp quality changes that commonly appear when non-sliding window approaches are used. The bit excesses or deficits observed at this level are distributed evenly to a short-term level mechanism that establishes target bit budgets for a narrower sliding window covering a single IP and shifting on a frame basis. At this level, an adequate quantization parameter is estimated to comply with the designated target bit rate. The recommended test conditions as well as two video sequences a few minutes long and containing scene cuts have been used to assess the proposed VBR controller. Comparisons with a state-of-the-art rate control algorithm show good results in terms of quality consistency, in exchange for moderate rate-distortion performance losses.
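The two budget levels can be sketched in a few lines of Python. This is not the published algorithm. It rests on simplifying assumptions made only for this illustration: a constant target bit rate, equal-length intra periods, an accumulated deviation that is corrected by one window-length's share at each step, equal weights for all frames of an IP (the paper targets hierarchical coding, where frames would normally receive different shares), and no quantization-parameter estimation. The function names are hypothetical.

```python
def long_term_ip_budget(target_bps, ip_seconds, window_ips, bits_in_window):
    """Long-term level: bit budget for the next intra period (IP).

    bits_in_window is the number of bits actually spent on the previous
    window_ips - 1 IPs of the sliding window. The excess (or deficit) with
    respect to the nominal rate is spread evenly over the window_ips IPs,
    so only a fraction of it is corrected in the next IP.
    """
    nominal_ip_bits = target_bps * ip_seconds
    deviation = bits_in_window - nominal_ip_bits * (window_ips - 1)
    return nominal_ip_bits - deviation / window_ips


def short_term_frame_budget(ip_budget, frames_per_ip, recent_frame_bits):
    """Short-term level: bit budget for the next frame.

    The window covers one IP (frames_per_ip frames) and slides by one frame;
    recent_frame_bits holds the bits of the frames already in the window.
    """
    nominal_frame_bits = ip_budget / frames_per_ip
    deviation = sum(recent_frame_bits) - nominal_frame_bits * len(recent_frame_bits)
    return max(nominal_frame_bits - deviation / frames_per_ip, 0.0)


# 1 Mbit/s target, 1 s intra periods, 4-IP window; the previous 3 IPs overspent by 400 kbit.
ip_budget = long_term_ip_budget(1_000_000, 1.0, 4, 3_400_000)
print(ip_budget)  # 900000.0
# 30-frame IPs; the last 29 frames each used 33 kbit instead of the nominal 30 kbit.
print(short_term_frame_budget(ip_budget, 30, [33_000] * 29))  # 27100.0
```

Spreading each deviation over the whole window, rather than correcting it in the next IP or frame, is what keeps the budget, and therefore the quality, from changing abruptly.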

## May 14th-18th 2015: Mauricio Alvarez-Mesa at the Cannes Film Festival 2015.

Mauricio Alvarez-Mesa will participate in the 2015 edition of the Cannes Film Festival. The AES group of TU Berlin and the Cannes Film Festival, among other partners, are working together in the Film265 European project. The main objective of this project is to improve existing Video-on-Demand platforms using the new HEVC/H.265 codec.

During the festival Mauricio Alvarez-Mesa will participate in meetings with the festival organizers, other partners of the Film265 project, and representatives from the media industry attending the different events organized in conjunction with the festival.

## May 12th, 2015: Mauricio Alvarez-Mesa participates in the DG Connect meeting at the European Commission in Luxembourg.

Mauricio Alvarez-Mesa will participate in the "Concertation meeting on activities of the Creativity Unit" organized by the European Commission under the DG Connect program (European Commission Directorate General for Communications Networks, Content & Technology) in Luxembourg.

The aim of the meeting is to bring together ongoing H2020 projects to enable knowledge transfer and the exchange of lessons learned, and to prepare further collaboration by shaping and providing input to the H2020 LEIT ICT work programme in the area of Creativity / Creative Industries.

Mauricio Alvarez-Mesa will make a short presentation about the Film265 project, of which he is the Technical Coordinator.

http://www.film265.eu

## May 7+8, 2015: Three papers from the AES group have been accepted for publication at the 26th PARS-Workshop, Potsdam, Germany.

Three papers from the AES group have been accepted for publication at the 26th PARS-Workshop, which will be organized by the Universität Potsdam on May 7 and 8, 2015, in Potsdam, Germany. The papers will be presented during the workshop. The accepted papers are:

1- "High performance CCSDS image data compression using GPGPUs for space applications" by Sunil Chokkanathapuram Ramanarayanan, Kristian Manthey and Ben Juurlink.

2- "A Proximity Scheme for Instruction Caches in Tiled CMP Architectures" by Tareq Alawneh, Chi Ching Chi, Ahmed Elhossini and Ben Juurlink.

3- "Real-Time Vision System for License Plate Detection and Recognition on FPGA" by Farid Rosli, Ahmed Elhossini and Ben Juurlink.

PARS is a workshop organized by the special interest group on parallel algorithms, parallel computer structures and parallel system software within the German Informatics Societies (GI/ITG). The goal of the biennial PARS Workshop is to present important research within the scope of PARS and to foster an exchange of ideas between the participants. The papers will be included in the workshop proceedings, which will be published in the yearly newsletter of the PARS special interest group (PARS-Mitteilungen, ISSN 0177-0454).

## May 3-5, 2015: Paper "High Performance Memory Accesses on FPGA-SoCs: A quantitative analysis" accepted at FCCM 2015.

The paper "High Performance Memory Accesses on FPGA-SoCs: A quantitative analysis" by Matthias Göbel, Chi Ching Chi, Mauricio Alvarez-Mesa and Ben Juurlink was accepted as a poster at FCCM 2015. The conference takes place from May 3 to 5 in Vancouver, BC. An extended abstract will appear in the proceedings.

This paper presents an analysis of the various memory and communication interconnects found in so-called FPGA-SoCs. For this purpose, an actual device has been chosen, namely the Xilinx Zynq-7000. Issues such as different access patterns, cache coherency and full-duplex communication are analyzed, with a focus on applications from the field of video coding. Furthermore, the paper shows that by carefully choosing the memory interconnect networks as well as the software interface, high-speed memory access can be achieved in various scenarios.

## April 29th, 2015: Prof. Juurlink in PhD defence committee in Eindhoven.

On April 29, Ben Juurlink is a member of the PhD defence committee of Raymond Frijns, who defends his dissertation entitled “Platform-based Design for High-Performance Mechatronic Systems” at the Technical University of Eindhoven.

## April 13th-17th, 2015: Paper presentation at the 11th International Symposium on Applied Reconfigurable Computing (ARC 2015): “An Efficient and Flexible FPGA Implementation of a Face Detection System”.

Dr. Ahmed Elhossini is attending the 11th International Symposium on Applied Reconfigurable Computing (ARC 2015). He will present a paper titled “An Efficient and Flexible FPGA Implementation of a Face Detection System”, authored by Hichem Ben Fakih, Ahmed Elhossini and Ben Juurlink, in the main track of the Symposium. The paper will be included in the proceedings published by Springer in Lecture Notes in Computer Science (LNCS), ISSN: 0302-9743, which will be indexed by ISI Proceedings and EI-Compendex.

The International Symposium on Applied Reconfigurable Computing (ARC) aims to bring together researchers and practitioners of reconfigurable computing, with an emphasis on practical applications of this promising technology. This year's Symposium will feature a series of international invited speakers who will express their views on the future of reconfigurable technology. The Symposium will be hosted by Ruhr-Universität Bochum in Germany from the 13th to the 17th of April 2015 (http://arc2015.esit.rub.de/).

## March 27th, 2015: Ben Juurlink and Mauricio Alvarez Mesa visit Samsung.

Ben Juurlink and Mauricio Alvarez Mesa will visit Samsung R&D Institute UK (SRUK) on March 27 to give a talk entitled “Low Power High Efficiency Video Decoding” and to discuss future collaborations.

## March 16-20, 2015: Spin Digital, a spin-off of the AES group of TU Berlin, presents an 8K HEVC/H.265 demo at CeBIT.

Spin Digital, provider of ultra high performance and low power video codecs, announced that it will demonstrate an 8K HEVC/H.265 video decoding system at the CeBIT 2015 exhibition, taking place from March 16-20 in Hannover. The demonstration will take place at Booth 28 in Hall 9 (joint Berlin-Brandenburg booth).

Spin Digital presents an 8K video decoder based on the recently released version 2 of the HEVC/H.265 standard. Spin Digital’s optimized H.265 decoder achieves 8K real-time performance on a single-processor workstation, making 8K technology ready for deployment in practical use cases.

Mauricio Álvarez-Mesa, Chi Ching Chi, and Sergio Sanz-Rodríguez from the Spin Digital staff will be at the booth to provide expert technical information and discuss business opportunities.

http://spin-digital.com/8k

## March 12th to 14th, 2015: Ben Juurlink gives a talk at DATE conference in Grenoble.

Ben Juurlink will attend the Design, Automation and Test in Europe (DATE) conference from March 12 to March 14 to give a talk in the Hot Topic session “Multi/Many-Core Programming: Where Are We Standing?”. The talk is titled “5% or 5x? The Performance Gap in SIMD Optimization, and Possible Solutions”.

## EIT-ICT Master School Online Education Kick-off Meeting - Eindhoven, 12th and 13th of March 2015

Ahmed Elhossini is attending the kick-off meeting for the Online Education program of the EIT-ICT Master School. The event will be hosted by the Technical University of Eindhoven on the 12th and 13th of March 2015. During the event, the various program specializations will be presented and several workshops will be held to introduce the online environment used to host the courses and their contents. The AES group at TU Berlin is leading the “Architectures and Compiler for Embedded Systems” specialization, which is part of the Embedded Systems major of the online education program.

## February 21st, 2015: Dr. Ahmed Elhossini attends FPGA 2015 in Monterey, California, USA.

Dr. Ahmed Elhossini is attending FPGA 2015 in Monterey, California, USA. A poster titled “An Efficient and Flexible FPGA Implementation of a Face Detection System”, authored by Hichem Ben Fakih, Ahmed Elhossini and Ben Juurlink, will be presented at the conference.

The ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2015) is the premier conference for the presentation of advances in all areas related to FPGA technology. Major vendors present their latest devices and tools to support high-speed reconfigurable computing. The poster abstract will appear in the conference proceedings. The conference is taking place at the Monterey Conference Center.

## February 5th, 2015, 14:00h: EIT-ICT Labs Master School Information Session - Embedded Systems Major.

Looking for a chance to complete your Master's studies at two of the big universities in Europe? You are invited to an information session about the EIT ICT Labs Master School program. The EIT ICT Labs Master School offers a two-year education with eight technical programmes and a minor in Innovation & Entrepreneurship. Twenty-one top European universities, renowned researchers and leading businesses have partnered with EIT ICT Labs to provide technical excellence leading to two master's degrees, the EIT ICT Labs Master's Certificate and hands-on experience (http://www.masterschool.eitictlabs.eu/). This information session introduces the Embedded Systems major of the master program. Five universities participate in this major, offering five different specializations in the area of embedded systems. Prof. Ben Juurlink has been the local coordinator of the Embedded Systems major of the EIT ICT Labs Master School at TU Berlin for the last three years. Prof. Juurlink will give a presentation and answer questions about the school, the program, and the admission process.

Location: EIT-ICT Labs co-location center 6th Floor, TEL building.
Date and Time: February 5th, 2015, 14:00h.

## January 21st 2015: AES well represented at the top HiPEAC event in Amsterdam!

The AES group of TU Berlin will have a very strong presence at the HiPEAC 2015 conference. The HiPEAC conference is the premier European forum for experts in computer architecture, programming models, compilers and operating systems for embedded as well as general-purpose systems. The AES team will organize a workshop, give three invited talks, present three posters from two projects, and present a paper in the main conference paper track.

Prof. Juurlink and Dr. Alvarez-Mesa are co-organizers of the LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2015), which will be held on Wednesday Jan. 21st. PEGPUM will consist of 12 invited talks by researchers from industry and academia, including two keynote speakers: Edvard Sørgård, Senior Principal Graphics Architect at ARM Trondheim (Norway), and Simon McIntosh-Smith, Head of the Microelectronics Group and Senior Lecturer at the University of Bristol (UK).

During the PEGPUM workshop, Prof. Juurlink will give a presentation titled “Low-power parallel processing on GPUs: looking back and forward”, in which he will highlight the main contributions of the LPGPU project, which recently finished successfully. In addition, Jan Lucas, a PhD student from the AES group, will give a presentation titled "Scalarization and Temporal SIMT in GPUs: reducing redundant operations for better performance and higher energy efficiency", in which he will present his latest research on power-efficient GPU architectures.

On Tuesday Jan. 20th, in the main paper track, Dr Alvarez-Mesa will present a paper titled "Low-power High Efficiency Video Decoding using General Purpose Processors". This paper has been accepted for publication in ACM Transactions on Architecture and Code Optimization (TACO) and presents an analysis of techniques for improving the power efficiency of HEVC/H.265 video decoding when using conventional processors.

Also on Tuesday Jan. 20th, Dr Alvarez-Mesa will give an invited talk at the Programmability Issues for Heterogeneous Multicores (MULTIPROG) workshop. The talk, titled "Mapping Video Codecs to Heterogeneous Architectures", will present the experiences of the AES research team in using heterogeneous multicores for video codecs, mainly for the new HEVC/H.265 video coding standard.

On Wednesday Jan. 21st, during the "EU Projects Poster Session", the AES team will present two posters that highlight the main achievements of the LPGPU project. The LPGPU project finished in Oct. 2014, and the AES team made several contributions in the areas of applications, tools and architectures for low-power GPU systems.

Last but not least, on Tuesday Jan. 20th the AES team will present a poster during the "Industrial Poster Session" with the main results of a Technology Transfer Project (TTP) that was awarded by the TETRACOM project to TU Berlin and the Greek company Think Silicon. The goal of this project is to accelerate AES' HEVC/H.265 video decoder using the embedded GPUs developed by Think Silicon.

- https://www.hipeac.org/2015/amsterdam
- http://lpgpu.org
- http://lpgpu.org/wp/pegpum-2015
- http://research.ac.upc.edu/multiprog
- http://www.tetracom.eu

## January 9th 2015: H2020 project Film265 kick-off meeting in Berlin.

The new EU-funded project "Film265" will hold its kick-off meeting in Berlin on January 9th 2015. The meeting is hosted by the AES group of TU Berlin and will be attended by all project partners.

Funded by the European Commission under the Horizon 2020 program, Film265 is a new project that aims to support the creative Video-on-Demand industry with emerging technologies in video delivery. The core of the project consists of adapting the H.265 video codec for VoD scenarios. H.265 is a new video coding standard that provides compression gains of up to 50% compared to the state-of-the-art H.264/AVC, and will be used, among others, for 4K/UHD streaming over the internet. Film265 aims to develop a complete end-to-end H.265 video delivery solution including cloud-based transcoding, streaming delivery, and web playback. As an Innovation Action, the project aims to transfer emerging technologies from research prototype stages into market-ready solutions.

The consortium consists of TU Berlin (Germany), Reelport (France/Germany), Marché du Film - Cinando (France), and LevelK (Denmark). TU Berlin (AES group) has extensive experience in efficient implementations of video codecs; Reelport is a VoD provider for film-related projects; Cinando handles the market of the Cannes Film Festival; and LevelK is a Danish film distributor using VoD technologies. TU Berlin will provide the new video codec, Reelport its existing PicturePipe cloud encoding solution, and Cinando and LevelK will integrate and test the new technologies in their VoD solutions.

## Dec 12th, 2014: Paper "Nexus#: A Distributed Hardware Task Manager for Task-Based Programming Models" accepted at IPDPS 2015.

The paper "Nexus#: A Distributed Hardware Task Manager for Task-Based Programming Models" by Tamer Dallou, Nina Engelhardt, Ahmed Elhossini, and Ben Juurlink has been accepted to appear at the 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), May 25-29, 2015. IPDPS is an international forum for engineers and scientists from around the world to present their latest research findings in all aspects of parallel computation. The 2015 edition of IPDPS will take place in Hyderabad, India, at the Hyderabad International Convention Centre, India's best convention venue.

In this paper, the authors investigate the hurdles of modern task-based programming models such as OmpSs and introduce an improved version of their hardware accelerator for task graph management, called Nexus#. The architectural design of Nexus# is described in detail, along with several scenarios of its execution pipeline. Nexus# was evaluated using traces of several benchmarks from the Starbench parallel benchmark suite, ranging from highly parallel workloads such as ray tracing to workloads with more complicated dependency patterns such as h264dec. The results show significant improvements over prior work and underline the importance of hardware acceleration for the scalability of task-based programming models.

## Dec 10th, 2014: HiPEAC Technology Transfer Award for H.265 team of AES.

The HiPEAC network of excellence has awarded a Technology Transfer Award to Ben Juurlink, Mauricio Alvarez Mesa and Chi Ching Chi for the transfer of some of their H.265 technology to the Greek ultra-low-power IP firm Think Silicon Ltd. The award rewards and celebrates the transfer of research results into industry (be it through technology licensing, through providing dedicated services to an existing company, or through the creation of a new company). The award consists of a certificate and a financial award.

## November 17th, 2014: Paper "An Efficient and Flexible FPGA Implementation of a Face Detection System" accepted at FPGA 2015 (Poster session).

The paper "An Efficient and Flexible FPGA Implementation of a Face Detection System" by Hichem Ben Fakih, Ahmed Elhossini, and Ben Juurlink was accepted as a poster at FPGA 2015, Monterey, California. The abstract will appear in the conference proceedings in February 2015. The paper proposes a hardware architecture based on the Viola-Jones object detection system using Haar-like features. The proposed design is able to detect faces in real time with high accuracy. Speed-up is achieved by exploiting the parallelism in the design, to which multiple classifier cores can be added. To keep the design flexible, classifier cores can be assigned to different images. Moreover, by using different training data, each core can detect a different object type. The development platform is the Xilinx Zynq-7000 SoC, which features a dual-core ARM Cortex-A9 CPU and programmable logic (FPGA). The current implementation focuses on face detection and achieves real-time detection at 16.53 FPS at an image resolution of 640x480 pixels, a speed-up of 6.46 times compared to the equivalent OpenCV software solution.
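The Viola-Jones detector evaluates its Haar-like features on an integral image, so that the sum over any rectangle costs four lookups regardless of the rectangle's size. A minimal software sketch of that building block follows; the function names are illustrative, and the classifier cascade, the trained thresholds, and the parallel hardware classifier cores described above are not modelled.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of rows above (same column) plus the running sum of this row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Because every feature reduces to a handful of table lookups and additions, many features can be evaluated side by side, which is the property a hardware design with several classifier cores can exploit.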

## November 3rd, 2014: Paper "Low Power High Efficiency Video Decoding using General Purpose Processors" accepted in ACM TACO.

The paper "Low Power High Efficiency Video Decoding using General Purpose Processors" by Chi Ching Chi, Mauricio Alvarez-Mesa, and Ben Juurlink has been accepted in ACM Transactions on Architecture and Code Optimization. The paper will appear in a future regular issue and will also be presented at the HiPEAC conference in Amsterdam in January 2015. ACM TACO is a well-known journal that focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. In this paper, the authors investigate how code optimization techniques and the low-power states of general-purpose processors improve the power efficiency of HEVC decoding. To this end, the power and performance efficiency of SIMD instructions, multicore architectures, and low-power active and idle states are analyzed in detail for offline video decoding. In addition, the power efficiency of techniques such as "race to idle" and "exploiting slack" with DVFS is evaluated for real-time video decoding. Results show that "exploiting slack" is more power efficient than "race to idle" on all evaluated platforms, which represent smartphones, tablets, laptops and desktops.
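The trade-off between the two real-time strategies can be explored with a toy energy model. Everything here is an assumption for illustration, not data from the paper: power is modelled as a static part plus a dynamic part growing with f³ (dynamic power scales with V²f, and voltage is assumed to scale linearly with frequency), and all parameter values are made up.

```python
def race_to_idle_energy(work_cycles, period_s, f_max, power, p_idle):
    """Run at f_max until the frame is decoded, then idle for the rest of the period."""
    busy = work_cycles / f_max
    if busy > period_s:
        raise ValueError("deadline missed even at f_max")
    return power(f_max) * busy + p_idle * (period_s - busy)


def slack_energy(work_cycles, period_s, f_min, f_max, power, p_idle):
    """Exploit slack: pick the lowest DVFS frequency that still meets the deadline."""
    f = max(f_min, min(f_max, work_cycles / period_s))
    busy = work_cycles / f
    if busy > period_s:
        raise ValueError("deadline missed even at f_max")
    return power(f) * busy + p_idle * (period_s - busy)


def toy_power(f_hz):
    """Hypothetical power model in watts: 0.5 W static + cubic dynamic term."""
    return 0.5 + 1.0 * (f_hz / 1e9) ** 3


# Hypothetical workload: 0.5 Gcycles per 1 s period, 0.2-2 GHz DVFS range.
race = race_to_idle_energy(0.5e9, 1.0, 2e9, toy_power, p_idle=0.1)
slack = slack_energy(0.5e9, 1.0, 0.2e9, 2e9, toy_power, p_idle=0.1)
```

With these made-up numbers, exploiting slack uses 0.625 J per period versus about 2.2 J for race to idle. A platform with a large static power component could narrow or even reverse this gap, which is why such comparisons have to be measured per platform.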

## November 3rd, 2014: Our new employee Daniele Bortolotti.

We are pleased to welcome Daniele Bortolotti as a new member of our group. He will contribute to the AES-team in research.

Welcome Daniele!

## October 23rd, 2014 - Kick-off Event of the third EIT ICT Labs Master School in Budapest, Hungary.

Ahmed Elhossini will attend the Kick-off Event of the third EIT-ICT Labs Master School in Budapest. The event will take place between the 23rd and 25th of October, 2014. New students of 2014/15 of the EIT ICT Labs Master School will get together for the grand opening of their Master School studies for a unique kick-off event this year in Budapest. For three days, students from 19 leading European universities have the opportunity to meet and collaborate with their fellow students across technical majors, universities and country borders.

Prof. Ben Juurlink has been the coordinator of the local implementation of the EIT-ICT Labs Master School at TU Berlin, as well as of the "Embedded Systems" technical major, for the last three years. Ahmed is a member of the selection committee of the "Embedded Systems" technical major at TU Berlin and will participate in the event to welcome new students and to attend technical meetings about the future of the program.

www.eitictlabs.eu/news-events/events/article/kick-off-event-of-the-third-eit-ict-labs-master-school-in-budapest-hungary/

## October 21st, 2014: Seminar "Recent Advances in Computer Architecture" starts.

The seminar "Recent Advances in Computer Architecture" takes place on Tuesdays from 10:15 to 11:45 in room EN 642. Due to a misunderstanding, the TU Berlin module directory stated that the seminar starts on October 21; the first meeting has therefore been moved to October 21. You are cordially invited to attend this seminar.

## October 17th, 2014: Paper "A new real-time system for image compression on-board satellites" accepted at OBPDC Workshop.

The paper "A new real-time system for image compression on-board satellites" by Kristian Manthey, David Krutz and Ben Juurlink has been accepted at the "On-Board Payload Data Compression Workshop".

Remote sensing sensors are used in various applications in Earth sciences, archeology, intelligence, change detection, planetary research and astronomy. Disaster management after floods or earthquakes, detection of environmental pollution, and fire detection are just a few examples of countless applications. Both the spatial and the spectral resolution of satellite image data increase steadily with new technologies and user requirements, resulting in higher precision and new application scenarios. In the future, it will be possible to derive application-specific information in real time on board the satellite, even from high-resolution images. On the technical side, this means a tremendous increase in the data rate that such systems have to handle. While the memory capacity requirements can still be fulfilled, the transmission capability becomes increasingly problematic.

In this paper, an image compression architecture based on the CCSDS 122.0-B-1 image data compression standard is presented, with region-of-interest support and flexible access to the compressed data. Modifications to the standard permit changing the compression parameters and re-organizing the bit-stream after compression. An additional index of the compressed data is created, which makes it possible to locate individual parts of the bit-stream. On request, stored images can be re-assembled according to the application's needs and as requested by the ground station. Interactive transmission of the compressed data is possible, so that overview images can be transmitted first, followed by detailed information for the regions of interest (ROIs).
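The role of the additional index can be sketched in a few lines: once the offset and length of every part of the bit-stream are recorded, any subset can be cut out and re-assembled without decoding. The split into (region, layer) segments, with layer 0 as the overview and higher layers as detail, and the function names are hypothetical simplifications; this is not the segment structure of CCSDS 122.0-B-1 or of the authors' system.

```python
def pack(segments):
    """Concatenate segments in storage order and build an index.

    `segments` maps a hypothetical (region, layer) key to its compressed bytes;
    the index maps the same key to (offset, length) within the stored stream.
    """
    stream = bytearray()
    index = {}
    for key, data in segments.items():
        index[key] = (len(stream), len(data))
        stream += data
    return bytes(stream), index


def transmission_order(index, roi_regions):
    """Overview (layer 0) of every region first, then detail layers of ROIs only."""
    overview = sorted(k for k in index if k[1] == 0)
    detail = sorted(k for k in index if k[1] > 0 and k[0] in roi_regions)
    return overview + detail


def reassemble(stream, index, keys):
    """Cut the requested parts out of the stored stream using only the index."""
    return b"".join(stream[off:off + n] for off, n in (index[k] for k in keys))
```

For instance, if a ground station marks region 1 as a region of interest, `transmission_order(index, {1})` sends the overview of every region before the detail layers of region 1, and the detail of region 0 is never transmitted.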

## October 9th, 2014: Paper "A parallel H.264/SVC Encoder for High Definition Video Conferencing" accepted in "Signal Processing: Image Communication".

The paper "A parallel H.264/SVC Encoder for High Definition Video Conferencing" by Sergio Sanz-Rodríguez, Mauricio Alvarez-Mesa, Tobias Mayer, and Thomas Schierl has been accepted in "Signal Processing: Image Communication". This article will appear (soon) in a future regular issue. "Signal Processing: Image Communication" is an international journal for the design, implementation and use of image communication systems and video codecs. The paper proposes a video encoder specially developed and configured for high definition (HD) video conferencing. This video encoder meets the following three requirements: H.264/Scalable Video Coding (SVC), parallel encoding on multicore platforms, and parallel-friendly rate control. The first requirement guarantees a minimum quality of service to every end-user receiver over Internet Protocol networks. The second ensures real-time execution; for this purpose, slice-level parallelism for the main encoding loop is combined with block-level parallelism for the upsampling and interpolation filtering processes. The third ensures proper HD video content delivery under given bit rate and end-to-end delay constraints. The experimental results show that the proposed H.264/SVC video encoder is able to operate in real time over a wide range of target bit rates, at the expense of reasonable losses in rate-distortion efficiency due to the partitioning of frames into slices.

## October 8th, 2014: Ben Juurlink @ HiPEAC CS Week in Athens.

Prof. Ben Juurlink is currently attending the HiPEAC Computing Systems Week in Athens (Greece). As always, it consists of exciting keynote and thematic sessions, and includes the Third HiPEAC Industry Partner Program (HIPP) Event as well as a social programme culminating in a guided tour of, and a dinner in, the Acropolis Museum. The program can be found at www.hipeac.net/csw/2014/athens.

## October 6th, 2014: Paper "SIMD Acceleration for HEVC Decoding" accepted in IEEE TCSVT

The paper "SIMD Acceleration for HEVC Decoding" by Chi Ching Chi, Mauricio Alvarez-Mesa, Benjamin Bross, Ben Juurlink, and Thomas Schierl has been accepted in IEEE Transactions on Circuits and Systems for Video Technology. The paper will appear (soon) in a future regular issue. Since its inception in 1991, the IEEE Transactions on Circuits and Systems for Video Technology has been the premier venue for publications in areas of video technology such as video compression, hardware, and systems. Over the past decade, the scope of TCSVT has expanded, and it is now the premier journal in all areas related to video technology and video systems, including video analysis and processing. SIMD instructions have commonly been used to accelerate video codecs. The recently introduced HEVC codec, like its predecessors, is based on the hybrid video coding principle and is therefore also well suited to SIMD acceleration. In this paper, we present SIMD optimizations for the entire HEVC decoder for all major SIMD ISAs. The evaluation has been performed on 14 mobile and PC platforms covering most major architectures released in recent years. With SIMD, a speedup of up to 5x can be achieved for the entire HEVC decoder, resulting in up to 133 fps and 37.8 fps on average on a single core for Main profile 1080p and Main10 profile 2160p sequences, respectively.

## Call for Papers - ARCS 2015.

The 28th GI/ITG International Conference on Architecture of Computing Systems will be held from March 24, 2015 through March 27, 2015.

Important Dates:

- Paper submission deadline: October 6, 2014
- Workshop and tutorial proposals: November 3, 2014
- Notification of acceptance: December 1, 2014 (expected)
- Camera-ready papers: December 15, 2014

## 1st October 2014: Sohan Lal visiting TU Eindhoven for collaboration.

Sohan Lal will be visiting TU Eindhoven, Netherlands, from 1st October until the end of the year for collaborative work. The collaboration is funded by the HiPEAC network of excellence.
Sohan Lal will work on the problem of memory divergence in GPUs in collaboration with the Electronic Systems Group of TU Eindhoven, which is led by Prof. Henk Corporaal.
GPUs are high-throughput processors designed to tolerate long latencies by switching among thousands of threads. However, memory bandwidth is usually a bottleneck.
One of the problems caused by memory divergence is data over-fetch, which wastes memory bandwidth. In this collaborative work, the problems caused by memory divergence will be studied and possible solutions will be sought.
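Data over-fetch can be quantified with a short calculation: the memory system serves a warp's load in fixed-size segments, so scattered addresses pull in many segments whose remaining bytes are never used. The sketch below assumes 32 threads per warp, 4-byte loads, aligned accesses that never straddle a segment boundary, and 128-byte segments (a typical GPU cache-line size); the function names are illustrative and this is not the model used in the collaboration.

```python
def segments_touched(addresses, segment_bytes=128):
    """Distinct memory segments a warp's byte addresses fall into (aligned accesses assumed)."""
    return sorted({addr // segment_bytes for addr in addresses})


def over_fetch_ratio(addresses, access_bytes=4, segment_bytes=128):
    """Bytes moved from memory divided by bytes the threads actually requested."""
    fetched = len(segments_touched(addresses, segment_bytes)) * segment_bytes
    useful = len(set(addresses)) * access_bytes
    return fetched / useful


# Coalesced: 32 threads read consecutive 4-byte words -> one 128-byte segment.
coalesced = [4 * t for t in range(32)]
# Divergent: a 128-byte stride puts every thread in its own segment.
strided = [128 * t for t in range(32)]
```

In the coalesced case exactly the requested 128 bytes are transferred (ratio 1.0), while the strided pattern transfers 4096 bytes to deliver 128 useful ones (ratio 32.0), which is the bandwidth waste referred to above.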

## August 20th, 2014: Mr. Tamer Dallou presents a paper at HPCC 2014, Paris, France.

Mr. Tamer Dallou is presenting the paper "An Integrated Hardware-Software Approach to Task Graph Management" by Nina Engelhardt, Tamer Dallou, Ahmed Elhossini and Ben Juurlink at HPCC 2014, on 20.08.2014.

The HPCC-2014 conference is the 16th IEEE International Conference on High Performance Computing and Communications. It will provide a forum for engineers and scientists in academia, industry, and government to address the profound challenges in this field and to present and discuss their new ideas, research results, applications and experience on all aspects of high performance computing and communications. HPCC-2014 is sponsored by IEEE, the IEEE Computer Society, and the IEEE Technical Committee on Scalable Computing (TCSC). It will take place in Paris, France, on August 20-22, 2014. The conference program and more information can be found at conference.hpcc2014.studiocheik.fr

## August 11th, 2014: Mauricio Álvarez-Mesa gives an invited course at Universidad del Valle (Colombia).

From August 11th to 15th, Dr. Mauricio Álvarez-Mesa will give a short course on "High Performance Video Coding" at the Universidad del Valle in Cali, Colombia. In this course, Dr. Álvarez-Mesa will present the latest developments in video coding, as well as the research done at the AES group on optimized implementations of HEVC/H.265. Dr. Álvarez-Mesa will also meet with professors and graduate students to discuss ongoing collaborations between the "School of Computer Engineering" of Universidad del Valle and the AES group of TU Berlin.

## July 31st, 2014: Nina Engelhardt goes to the University of Hong Kong.

Nina Engelhardt leaves the AES team and will continue her research work at the University of Hong Kong. We are grateful for her good work and wish her all the best for the future.

## July 23rd, 2014: Gervin Thomas successfully completed his PhD defence.

Dipl.-Ing. Gervin Thomas successfully completed his PhD defence on Wednesday July 23rd, 2014. His thesis title was: "A Generic Implementation of a Quantified Predictor Applied to a DRAM Power-Saving Policy". Gervin is the first to obtain his PhD in the AES group since Prof. Juurlink became the chair of the group in January 2010. Congratulations, Dr. Thomas, on your success; we wish you all the best for the future.

## July 22nd, 2014: Oracle Labs visits AES Group.

Eric Sedlar and Michael Haupt from Oracle Labs visited the AES Group on July 22nd. Eric Sedlar is the Vice President and Technical Director of Oracle Labs. Members of the AES group presented their work and engaged in enthralling discussions. Afterwards, Eric Sedlar gave his talk "How I Learned to Stop Worrying and Love Compilers".

## July 9th, 2014: Paper "Parallel H.264/AVC Motion Compensation for GPUs using OpenCL" accepted in IEEE TCSVT.

The paper "Parallel H.264/AVC Motion Compensation for GPUs using OpenCL" by Biao Wang, Mauricio Alvarez Mesa, Chi Ching Chi and Ben Juurlink has been accepted in IEEE Transactions on Circuits and Systems for Video Technology. It will appear soon after the submission of the camera-ready manuscript to the IEEE CASS Publications Office.

IEEE Transactions on Circuits and Systems for Video Technology covers all aspects of visual information relating to video or that have the potential to impact future developments in the field of video technology and video systems, including but not limited to: image/video processing, image/video analysis and computer vision, image/video compression, image/video communication, image/video storage, image/video hardware/software systems, and image/video applications.

## July 7th, 2014: Poster at PUMPS 2014.

A poster from TU Berlin titled “On the Potential and Shortcomings of Temporal SIMT GPUs” was selected for the PUMPS 2014 poster session. It will be presented at the PUMPS poster session by Jan Lucas from TU Berlin.

The fifth edition of the Programming and Tuning Massively Parallel Systems summer school (PUMPS) aims to enrich the skills of researchers, graduate students and teachers with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources such as GPU accelerators.

## June 26th, 2014: Paper "An Integrated Hardware-Software Approach to Task Graph Management" accepted at HPCC 2014.

The paper titled "An Integrated Hardware-Software Approach to Task Graph Management" by Nina Engelhardt, Tamer Dallou, Ahmed Elhossini and Ben Juurlink has been accepted at HPCC 2014.

The HPCC-2014 conference is the 16th IEEE International Conference on High Performance Computing and Communications. It will provide a forum for engineers and scientists in academia, industry, and government to address the profound challenges in this field and to present and discuss their new ideas, research results, applications and experience on all aspects of high performance computing and communications. HPCC-2014 is sponsored by IEEE, the IEEE Computer Society, and the IEEE Technical Committee on Scalable Computing (TCSC). It will take place in Paris, France, on August 20-22, 2014. More information can be found at conference.hpcc2014.studiocheik.fr.

Paper Abstract: Task-based parallel programming models with explicit data dependencies, such as OmpSs, are gaining popularity, due to the ease of describing parallel algorithms with complex and irregular dependency patterns. These advantages, however, come at a steep cost of runtime overhead incurred by dynamic dependency resolution. Hardware support for task management has been proposed in previous work as a possible solution. We present VSs, a runtime library for the OmpSs programming model that integrates the Nexus++ hardware task manager, and evaluate the performance of the VSs-Nexus++ system. Experimental results show that applications with fine-grain tasks can achieve speedups of up to 3.4x, while applications optimized for current runtimes attain 1.3x. Providing support for hardware task managers in runtime libraries is therefore a viable approach to improve the performance of OmpSs applications.
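The dynamic dependency resolution whose overhead the abstract refers to can be sketched in software: each task declares the addresses it reads and writes, and the runtime derives read-after-write, write-after-write and write-after-read edges from the last writer and the current readers of each address. The `TaskGraph` class below is a hypothetical toy illustrating the kind of bookkeeping a hardware task manager such as Nexus++ takes over; it is not the VSs or Nexus++ interface.

```python
from collections import defaultdict


class TaskGraph:
    """Toy dependency tracker for tasks with declared inputs and outputs (hypothetical)."""

    def __init__(self):
        self.last_writer = {}               # address -> last task that writes it
        self.readers = defaultdict(list)    # address -> readers since the last write
        self.pending = {}                   # task -> number of unfinished predecessors
        self.successors = defaultdict(set)  # task -> tasks waiting on it
        self.done = set()

    def submit(self, task, ins=(), outs=()):
        """Register a task; return True if it can start immediately."""
        preds = set()
        for addr in ins:
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])   # read-after-write
        for addr in outs:
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])   # write-after-write
            preds.update(self.readers[addr])        # write-after-read
        preds.discard(task)
        preds -= self.done                          # finished tasks impose no wait
        # Update per-address state only after all edges have been derived.
        for addr in ins:
            self.readers[addr].append(task)
        for addr in outs:
            self.last_writer[addr] = task
            self.readers[addr] = []
        self.pending[task] = len(preds)
        for p in preds:
            self.successors[p].add(task)
        return self.pending[task] == 0

    def finish(self, task):
        """Mark a task as finished; return the tasks that became ready."""
        self.done.add(task)
        ready = []
        for succ in self.successors.pop(task, ()):
            self.pending[succ] -= 1
            if self.pending[succ] == 0:
                ready.append(succ)
        return sorted(ready)
```

Every submission and completion touches several hash tables and lists; for fine-grain tasks this per-task bookkeeping can become comparable to the useful work, which is the overhead that moving task management into hardware is meant to reduce.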

## June 2014: Michael Andersch signs contract with Nvidia.

Michael Andersch, a graduate student in the AES group, has signed a contract with Nvidia. Nvidia Corporation is an American global technology company based in Santa Clara, California. Nvidia manufactures graphics processing units (GPUs), as well as systems-on-a-chip (SoCs) for the mobile computing market. For several years Michael has been a research assistant in the AES group, first contributing to the ENCORE project and thereafter to the FP7 European project LPGPU (www.lpgpu.org). Michael will first be stationed in the Berlin office of Nvidia before moving to sunny California. Although we are a bit sad to lose Michael to Nvidia, we wish him all the best for the future.

## June 12th, 2014: A TETRACOM Technology Transfer Project has been awarded to AES - TU Berlin and Think Silicon.

A technology transfer project called “eGPU accelerated HEVC/H.265 video decoder” has been awarded to AES TU Berlin and Think Silicon. The project is financed by TETRACOM (Technology Transfer in Computing Systems), a coordination action funded by the European Commission under the FP7 program. In this project, researchers from AES TU Berlin and Think Silicon will work together on the design of an HEVC/H.265 video decoder optimized for Think Silicon's embedded GPU (eGPU). The main goal is a complete solution for embedded video applications that require very low power consumption. The project will run for four months, starting on September 1st, 2014. If the results are positive, TU Berlin and Think Silicon will work together on the commercialization of the resulting products.

The AES group of TU Berlin (www.aes.tu-berlin.de) conducts research on computer architecture, ranging from low-power embedded systems to massively parallel high-performance systems. One of its main research lines is the efficient mapping of video (de)coding applications onto parallel computing systems. The group has developed an ultra-fast HEVC/H.265 decoder optimized for multicore architectures.

Think Silicon (www.think-silicon.com) was founded in 2007 and specializes in designing and developing Mobile Computer Graphics Solutions for low-end and mid-range portable devices. In 2014 Think Silicon released a new embedded graphics processing unit called the Nema GPU. Nema GPU is a scalable, manycore, multi-threaded, state-of-the-art data processing design that blends graphics rendering and general computing capabilities.

## June 6th, 2014: HiPEAC collaboration grant to Sohan Lal.

HiPEAC has accepted Sohan Lal's proposal for a three-month collaboration between the Electronic Systems Group of TU Eindhoven, the Netherlands, and the Embedded Systems Architecture Group of TU Berlin, Germany.

Sohan Lal will work on the problem of memory divergence in GPUs in collaboration with the Electronic Systems Group, which is led by Prof. Henk Corporaal.

Memory divergence is one of the key performance bottlenecks for high performance computing on GPUs. Both the Embedded Systems Architecture Group and the Electronic Systems Group have a lot of experience with GPUs, and the collaboration is expected to yield high-quality joint publications.

## May 20th, 2014: Paper accepted for Scientific Programming Journal.

The paper "TACO: A Scheduling Scheme for Parallel Applications on Multicore Architectures", co-authored by Jan Schönherr and Jan Richling of the Operating Systems research group of TU Berlin and Ben Juurlink of AES, has been accepted for publication in the Scientific Programming journal by IOS Press.

## May 15th, 2014: ISCA Travel Grant to Jan Lucas.

Jan Lucas was granted a travel grant by IEEE TCCA to attend ISCA 2014 and present his work on approximate storage in DRAM at "The Memory Forum" workshop.

The 41st International Symposium on Computer Architecture (ISCA) is the premier forum for new ideas and experimental results in computer architecture. ISCA 2014 will be held in Minneapolis, Minnesota during June 14-18, 2014.

The IEEE Computer Society Technical Committee on Computer Architecture (TCCA) is involved with research and development in the integrated hardware and software design of general- and special-purpose uniprocessors and parallel computers. TCCA annually sponsors/cosponsors the International Symposium on Computer Architecture, and with the ACM SIGARCH, it jointly administers the Eckert-Mauchly Award for contributions to computer architecture. TCCA also helps organize special issues of society periodicals and publishes a newsletter periodically, which contains meeting reports, abstracts of technical reports, calls for papers, and other announcements.

## Apr. 30th, 2014: "Sparkk: Quality-Scalable Approximate Storage in DRAM" accepted at "The Memory Forum".

The paper "Sparkk: Quality-Scalable Approximate Storage in DRAM" by Jan Lucas, Mauricio Alvarez-Mesa, Michael Andersch and Ben Juurlink has been accepted at The Memory Forum. The Memory Forum 2014 will be held in conjunction with the 41st International Symposium on Computer Architecture (ISCA-41). The paper presents a novel technique for an improved approximate storage area in DRAM.

## Apr. 14th, 2014: The paper titled "GPGPU Workload Characteristics and Performance Analysis" has been accepted at SAMOS 2014.

The paper titled "GPGPU Workload Characteristics and Performance Analysis" by Sohan Lal, Jan Lucas, Michael Andersch, Mauricio Alvarez-Mesa, Ahmed Elhossini and Ben Juurlink has been accepted at SAMOS 2014.

The International Conference on Embedded Computer Systems: Architectures, MOdeling, and Simulation (SAMOS) was established in 2001 by Prof. Stamatis Vassiliadis, Prof. Ed Deprettere, and Dr. Andy Pimentel as a Dutch Research Seminar located in the tiny town of Agios Konstantinos on the small Greek island of Samos in the Aegean Sea. Since 2001, many scientists from both academia and industry have been involved every year in the different aspects of organizing the conference. More information about SAMOS 2014 can be found at http://samos-conference.com/.

## Wanted: Student assistant (41 hours per month) with teaching duties.

Reference number: 3434 T 25/14
Application deadline: 10.03.2014
Employment period: expected from 01.04.2014 to 31.03.2016

Responsibilities:
Assisting with teaching in the Bachelor's program. Supervision and preparation of the following courses: TechGI1: Digitale Systeme (Digital Systems), TechGI2/TechGI2TI: Rechnerorganisation (Computer Organization), Hardware Praktikum (Hardware Lab)

Requirements:
Bachelor's studies in Computer Engineering (Technische Informatik) or Computer Science (Informatik), completion of the 3rd semester and of the TechGI1 and TechGI2 modules or equivalent qualifications, good knowledge of English and VHDL, willingness to familiarize yourself with new topics

Please send your written application with CV, certificate of enrollment and, if available, a current transcript of records to:

Technische Universität Berlin
Fakultät IV - Elektrotechnik und Informatik
Institut für Technische Informatik und Mikroelektronik (TIME)
Fachgebiet Architektur eingebetteter Systeme (AES)
Sekretariat EN-12
Einsteinufer 17
10587 Berlin

or by e-mail:

## Feb. 28th 2014: Paper accepted for "Informatiktage 2014" by German Informatics Society.

Philipp Habermann's paper "Design and Implementation of a High-Throughput CABAC Hardware Accelerator for the HEVC Decoder" was accepted for the proceedings of the "Informatiktage 2014" conference by GI (German Informatics Society), which will be held from March 27-28, 2014 in Potsdam, Germany.

## Feb. 27th 2014: Paper accepted by German Informatics Society.

The paper "A High-Performance Hardware Accelerator for HEVC Motion Compensation" by Matthias Göbel has been accepted for the proceedings of the "Informatiktage 2014" conference by GI (German Informatics Society) which will be held from March 27-28, 2014 in Potsdam, Germany.

## Feb. 10th, 2014: Paper accepted at GLSVLSI'14.

The paper "A Generic Implementation of a Quantified Predictor for FPGAs" by Gervin Thomas, Ahmed Elhossini and Ben Juurlink has been accepted for oral presentation at the 24th edition of GLSVLSI in Houston, Texas, USA, May 21-23. More information about GLSVLSI'14 can be found at http://www.glsvlsi.org/.

## Wanted: Student assistant (41 hours per month).

Reference number: 3434 T 16/14
Application deadline: 24.02.2014
Employment period: expected from 01.04.2014 to 31.03.2016

Responsibilities:
Assisting with teaching in the Bachelor's program. Supervision and preparation of the following courses: TechGI1: Digitale Systeme (Digital Systems), TechGI2/TechGI2TI: Rechnerorganisation (Computer Organization), Hardware Praktikum (Hardware Lab)

Requirements:
Bachelor's studies in Computer Engineering (Technische Informatik) or Computer Science (Informatik), completion of the 3rd semester and of the TechGI1 and TechGI2 modules or equivalent qualifications, good knowledge of English and VHDL, willingness to familiarize yourself with new topics

Please send your written application with CV, certificate of enrollment and, if available, a current transcript of records to:

Technische Universität Berlin
Fakultät IV - Elektrotechnik und Informatik
Institut für Technische Informatik und Mikroelektronik (TIME)
Fachgebiet Architektur eingebetteter Systeme (AES)
Sekretariat EN-12
Einsteinufer 17
10587 Berlin

E-mail:

## Feb 10th, 2014: Mauricio Álvarez-Mesa gives an invited talk at University of Castilla-La-Mancha, Albacete, Spain.

Dr. Mauricio Álvarez-Mesa will give an invited talk at the Department of Computer Science of the University of Castilla-La-Mancha in Albacete, Spain, on Monday, February 10th. The title of the talk is: "Parallel Video Decoding: Experiences with H.264 and HEVC". In this talk Dr. Álvarez-Mesa will present the latest results of the AES research group on video decoding using parallel architectures. Dr. Álvarez-Mesa will also meet with researchers from the Computer Architecture and Technology group at the Albacete Research Institute of Informatics (I3A) to discuss possible joint projects.

## Wanted: Research assistant (m/f), pay grade 13 TV-L Berliner Hochschulen, for up to 5 years (PhD position)

Reference number: IV - 13/14 (position available from 01.03.2014; application deadline 23.02.2014)

## Jan. 15th, 2014: AES group at HiPEAC 2014 Conference.

A delegation of the AES group will participate in the HiPEAC 2014 conference which will be held in Vienna from January 20th to January 22nd 2014. AES participation includes three presentations at the LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014). The workshop is organized by members of the LPGPU European project, including the AES group from TUB. The presentations from the AES group are: "Power and Energy Efficiency of Video Decoding on Multi-core Architectures" by Chi Ching Chi, "DART: A Decoupled Architecture Exploiting Temporal SIMD" by Jan Lucas, and "Parallel H.264/AVC Motion Compensation for GPUs using OpenCL" by Biao Wang (who got a PhD student registration grant from the HiPEAC 2014 organizing committee). More information about the PEGPUM workshop can be found at lpgpu.org/wp/pegpum-2014/ and about the HiPEAC conference at www.hipeac.net/conference/vienna.

## Jan. 7, 2014: Paper accepted at MULTIPROG.

The paper "Considering Quality-of-Service for Resource Reduction using OpenMP" has been accepted for presentation at the Seventh Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG-2014), to be held in conjunction with the 9th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC) in Vienna, Austria on January 22, 2014. The paper is the result of a collaboration between Artur Podobas, Mats Brorsson and Vladimir Vlassov from KTH in Stockholm, Sweden, and Chi Ching Chi and Ben Juurlink from the AES group of TU Berlin. More information about the workshop can be found at http://multiprog.ac.upc.edu.

## Jan. 2, 2014: AES publishes its H.264 OpenCL decoder.

To employ the power of GPUs for massively parallel processing, this work offloads the parallel kernels of H.264 decoding, namely inverse transform and motion compensation, onto GPUs. At kernel level, a significant speedup is observed compared to a highly optimized CPU SIMD implementation.


## Dec. 17, 2013: Visit to Technische Universität Dresden.

PhD guest Guilherme Calandrini visited TU Dresden to give a presentation entitled "Performance Portability and Energy Issues in Computing Architectures" to the Operating Systems and Security group of Prof. Dr. Hermann Härtig at the Faculty of Computer Science. The group has a strong background in system development, such as the Linux kernel and virtualization. The visit aimed to present the issues in developing energy-efficient applications, which must deal with the different layers of a computing system (from circuit level and architecture design to the operating system and, possibly, the virtual machine). During the visit, he also had the opportunity to introduce the LPGPU project to Prof. Dr. Emil Matus from the Vodafone Chair Mobile Communications Systems, who works on a heterogeneous SoC for communication applications.

For further information about the talk, see os.inf.tu-dresden.de/EZAG/abstracts/abstract_20131217.xml

## Dec 5, 2013, 10h, EN 642: Multi-Protocol Master for Ethernet-Based Fieldbuses - Victor Kozhukhov

Modern CNC controllers use special Ethernet-based protocols to enable real-time control of slave devices. A CNC controller from Schleicher Electronic, which can already operate as a Sercos III master, is extended with the functionality of an EtherCAT master. Both protocols should be usable over the same Ethernet ports of the CNC controller. The user should be able to decide independently whether the CNC controller is to be used as a Sercos III master or as an EtherCAT master. Switching the protocol should require as little effort as possible. In particular, changes to the hardware (except for the contents of the programmable logic) are to be avoided. The CNC controller uses a dual MAC specially optimized for Sercos III masters. When the software starts, the dual MAC of the Sercos III master is loaded as an IP core onto an FPGA integrated into the CNC controller. To allow the CNC controller to be used as an EtherCAT master, a suitable dual MAC is developed. Thus, when the software starts, it can be decided whether the dual MAC for the Sercos III master or the dual MAC for the EtherCAT master is loaded onto the FPGA. The dual PHY, which is integrated into the CNC controller and connected to the FPGA, is suitable for both protocols.

In addition, the software must be adapted so that the CNC controller can operate as an EtherCAT master. An EtherCAT master high-level driver from Acontis Technologies is used to build the EtherCAT master protocol stack. The high-level driver is provided as a precompiled library. The application-specific software can link the high-level driver and use it through a corresponding API. To use the high-level driver together with the dual MAC developed in this work for the EtherCAT master, an Ethernet hardware abstraction layer is implemented.

## Dec 3, 2013: Vice chancellor of Politecnico di Milano visits AES group.

On Tuesday, December 3, the vice chancellor of TU Berlin's partner university Politecnico di Milano, Prof. Donatella Sciuto, will visit the AES group. Prof. Sciuto is a full professor in Computer Engineering at the Dipartimento di Elettronica e Informazione of the Politecnico di Milano. She is Deputy Director of Education at CEFRIEL, where she manages the executive and corporate education and training programs. For more information about Prof. Sciuto and her research interests, see her website.

## Dec 2, 2013, 13h: Crown Scheduling: Energy-Efficient Resource Allocation, Mapping and Discrete Frequency Scaling for Collections of Malleable Streaming Tasks- Prof. Dr. Christoph Kessler.

Time: 1:00 PM, 2 December 2013
Place: 4.064, MAR Building

Abstract:
We investigate the problem of generating energy-optimal code for a collection of streaming tasks that include parallelizable or malleable tasks on a generic manycore processor with dynamic discrete frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently, so that cores typically run several tasks that are scheduled round-robin at user level in a data driven way. A stream of data flows through the tasks and intermediate results are forwarded on-chip to other tasks.
In this presentation we introduce Crown Scheduling, a novel technique for the combined optimization of resource allocation, mapping and discrete voltage/frequency scaling for malleable streaming task sets in order to optimize energy efficiency given a throughput constraint. We present optimal off-line algorithms for separate and integrated crown scheduling based on integer linear programming (ILP). Our energy model considers both static idle power and dynamic power consumption of the processor cores.
Our experimental evaluation of the ILP models for a generic manycore architecture shows that, at least for small and medium-sized task sets, even the integrated variant of crown scheduling can be solved to optimality by a state-of-the-art ILP solver within a few seconds.
We conclude with a short outlook to the new EU FP7 project EXCESS (Execution Models for Energy-Efficient Computing Systems).
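To make the energy/throughput trade-off in the abstract concrete, the sketch below picks one discrete frequency per task under a throughput (deadline) constraint. It is deliberately much simpler than crown scheduling: tasks are not malleable, the task-to-core mapping is given, brute-force search stands in for the ILP, and the power model (dynamic power equal to the cube of the frequency, plus a constant static power while a task runs) is an assumption for illustration. The function names are hypothetical.

```python
from itertools import product


def task_energy(work, freq, p_static):
    # Assumed model: runtime = work / freq, dynamic power = freq**3,
    # static power p_static is paid while the task runs.
    runtime = work / freq
    return (freq ** 3 + p_static) * runtime


def best_frequencies(core_tasks, freqs, deadline, p_static):
    """Pick one discrete frequency per task, minimising total energy while
    every core finishes its tasks within `deadline` (throughput constraint).

    core_tasks: one list of task work amounts per core (mapping is fixed).
    Returns (total_energy, per-core frequency tuples), or None if infeasible.
    Exhaustive search replaces the ILP here; it only scales to toy sizes.
    """
    total, plan = 0.0, []
    for works in core_tasks:  # cores are independent once the mapping is fixed
        best = None
        for choice in product(freqs, repeat=len(works)):
            if sum(w / f for w, f in zip(works, choice)) > deadline:
                continue  # too slow for the required throughput
            energy = sum(task_energy(w, f, p_static)
                         for w, f in zip(works, choice))
            if best is None or energy < best[0]:
                best = (energy, choice)
        if best is None:
            return None
        total += best[0]
        plan.append(best[1])
    return total, plan


print(best_frequencies([[4, 2], [3]], freqs=(1, 2), deadline=4, p_static=1))
# (28.0, [(2, 1), (1,)])
```

In this toy instance, core 0 has to run its larger task at the higher frequency to meet the deadline, while core 1 can stay at the lowest frequency, which is also its cheapest choice under the assumed model.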

Acknowledgements:
This is joint work with Nicolas Melot (Linköping University), Patrick Eitschberger and Jörg Keller (FernUniv. in Hagen, Germany). Partly funded by VR, SeRC, and CUGS.
Based on our recent paper with the same title at Int. Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS-2013), Sep. 2013, Karlsruhe, Germany.

Short Biography:
Christoph W. Kessler (German spelling: Keßler) is a professor for Computer Science at Linköping University, Sweden, where he leads the Programming Environment Laboratory's research group on compiler technology and parallel computing. Christoph Kessler received a PhD degree in Computer Science in 1994 from the University of Saarbrücken, Germany, and a Habilitation degree in 2001 from the University of Trier, Germany.
In 2001 he joined Linköping University, Sweden, as associate professor at the programming environments lab (PELAB) of the computer science department (IDA).
In 2007 he was appointed full professor at Linköping University. His research interests include parallel programming, compiler technology, code generation, optimization algorithms, and software composition. He has published two books, several book chapters and more than 90 scientific papers in international journals and conferences. His contributions include the OPTIMIST retargetable optimizing integrated code generator for VLIW and DSP processors, the PARAMAT approach to pattern-based automatic parallelization, the concept of performance-aware parallel components for optimized composition, the PEPPHER component model and composition tool for heterogeneous multicore/manycore based systems, the SkePU library of tunable generic components for GPU-based systems, and the parallel programming languages Fork and NestStep.

## 27-28 Nov. 2013: Ben Juurlink gives keynote presentation at ICT.OPEN 2013.

Ben Juurlink has been invited to give a keynote presentation in the Embedded Systems track of ICT.OPEN 2013. ICT.OPEN is the principal ICT research conference in the Netherlands and is held on 27-28 November in Eindhoven. The title of his talk is "Lessons Learnt From Parallelizing Video Decoding". More information about the conference can be found at www.ictopen2013.nl/content/speakers.

## 21.11.13, 10h, EN 642: Manycore Agent-Oriented Programming (MAOP)- Silvano Menk and Robert Hering

In our presentation, we will give a short overview of our bachelor thesis. To this end, we will briefly discuss the current state of parallel programming with a special focus on manycore architectures. From this, we will derive our idea for a supposedly intuitive and efficient programming model for manycore architectures, which will be the subject of our thesis. Finally, we will propose a rough work plan and hope for some initial feedback and suggestions.

## November 18-19, 2013: Fusing GPU Kernels at HiPEAC Compiler, Architecture and Tools Conference.

A presentation based on research undertaken by Codeplay and TU Berlin’s AES group as part of the LPGPU project will be given at this year’s HiPEAC Compiler, Architecture and Tools Conference in Haifa, Israel. The talk is titled “Fusing GPU kernels within a novel single-source C++ API” and will be presented by Paul Keir from Codeplay.

Abstract of the talk:

The prospect of GPU kernel fusion is often described in research papers as a standalone command-line tool. Such a tool adopts a usage pattern wherein a user isolates, or annotates, an ordered set of kernels. Given such OpenCL C kernels as input, the tool would output a single kernel, which performs similar calculations, hence minimising costly runtime intermediate load and store operations. Such a mode of operation is, however, a departure from normality for many developers, and is mainly of academic interest.

Automatic compiler-based kernel fusion could provide a vast improvement to the end-user's development experience. The OpenCL Host API, however, does not provide a means to specify opportunities for kernel fusion to the compiler. Ongoing and rapidly maturing compiler and runtime research, led by Codeplay within the LPGPU EU FP7 project, aims to provide a higher-level, single-source, industry-focused C++-based interface to OpenCL. Along with LPGPU's AES group from TU Berlin, we have now also investigated opportunities for kernel fusion within this new framework; utilising features from C++11 including lambda functions; variadic templates; and lazy evaluation using std::bind expressions.

While pixel-to-pixel transformations are interesting in this context, insofar as they demonstrate the expressivity of this new single-source C++ framework, we also consider fusing transformations which utilise synchronisation within workgroups. Hence, convolutions utilising halos and the use of the GPU's local shared memory are also explored.

A perennial problem has therefore been restructured to accommodate a modern C++-based expression of kernel fusion. Kernel fusion thus becomes an integrated component of an extended C++ compiler and runtime.
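For pixel-to-pixel transformations, the core idea of kernel fusion can be illustrated without OpenCL. The Python sketch below is an analogy, not Codeplay's C++ API, and all names are hypothetical: it contrasts running each pointwise kernel in its own pass, which writes a full intermediate buffer every time, with composing the kernels into one function that is applied in a single pass. In a C++ setting, lambdas and variadic templates could play a role similar to `fuse(*kernels)` here.

```python
from functools import reduce


def run_unfused(image, kernels):
    """One pass per kernel; each pass materialises a full intermediate
    buffer, mimicking separate kernel launches with memory round trips."""
    buffers = 0
    data = list(image)
    for kernel in kernels:
        data = [kernel(px) for px in data]
        buffers += 1
    return data, buffers


def fuse(*kernels):
    """Compose pointwise kernels (applied left to right) into one kernel."""
    return lambda px: reduce(lambda acc, k: k(acc), kernels, px)


def run_fused(image, *kernels):
    fused = fuse(*kernels)
    return [fused(px) for px in image], 1  # single pass, single output buffer


def brighten(p):
    return min(p + 20, 255)


def invert(p):
    return 255 - p


def halve(p):
    return p // 2


image = [0, 10, 200, 250]
print(run_unfused(image, [brighten, invert, halve]))  # ([117, 112, 17, 0], 3)
print(run_fused(image, brighten, invert, halve))      # ([117, 112, 17, 0], 1)
```

This simple composition only covers pointwise kernels. Fusing stencils such as convolutions is harder, because each output pixel reads a halo of neighbouring values produced by the previous kernel, which is why the talk also looks at synchronisation within workgroups and local shared memory.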

## Nov. 17-22, 2013: Mr. Tamer Dallou is presenting a paper at MTAGS - SC 2013

Mr. Tamer Dallou is attending "The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2013)" to present a paper at the "6th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers (MTAGS 2013)". The paper is titled "FPGA-Based Prototype of Nexus++ Task Manager" and presents the recent VHDL design and evaluation of Nexus++, our hardware task graph manager for task-based programming models. SC 2013 is one of the principal HPC conferences worldwide and takes place in Denver, CO, USA on Nov. 17-22, 2013.

## 7.11.13, 10:30h, EN 642: Automatic Code Generation for a Microblaze system with ARM NEON SIMD Acceleration - Ilias Timon Poulakis

SIMD (Single Instruction, Multiple Data) accelerators are increasingly deployed in modern CPU architectures. These units can efficiently process certain data, e.g. multimedia formats, improving CPU performance and energy consumption. The Embedded Systems Architecture (AES) research department of the Berlin Institute of Technology currently uses the MicroBlaze processor by Xilinx, which does not support SIMD acceleration natively. Hence, an ARM NEON compatible SIMD accelerator has been attached to the MicroBlaze processor. The two units communicate through a protocol based on FSL (Fast Simplex Link). To use this peculiar architecture efficiently, automatic code generation is needed. Yet, creating a custom compiler is difficult and extremely time-consuming. To avoid this route, this thesis presents an alternative approach in which only existing compiler backends are used. The main idea is to create machine code for MicroBlaze and ARM NEON separately, using their respective existing compiler backends. Code sections executable by ARM NEON have to be located and then appropriately inserted into the MicroBlaze code. In the course of this thesis, a tool that performs these tasks has been successfully implemented, tested and evaluated. This paper focuses on the realization steps taken. The capabilities of the implemented tool are discussed, and an outlook is given on how the approach could be used for a different combination of processor and SIMD accelerator.

## 29–30 October 2013: Ben Juurlink @ Cyber-Physical Systems: Uplifting Europe's Innovation Capacity.

Ben Juurlink is currently attending this two-day event in Brussels, which is devoted to exploring the innovation potential of Cyber-Physical Systems (CPS). The event is organized by the European Commission and discusses how EU Research and Innovation Programmes can stimulate the creation of new industrial platforms led by EU actors and facilitate the matchmaking between future user/customer needs and technology offers. For more information, see http://www.amiando.com/cps-conference.html.

## October 14, 2013: Prof. Juurlink is a member of the PhD defense committee of Yifan He.

Prof. Juurlink is visiting Eindhoven, NL, where he is a member of the PhD defense committee of Yifan He.

Yifan He defends his dissertation entitled "Low Power Architectures for Streaming Applications".

## 24.10.13, 11h, EN 642: Design and Implementation of a high-throughput CABAC Hardware Accelerator for the HEVC Decoder- Philipp Habermann.

HEVC is the new video coding standard of the Joint Collaborative Team on Video Coding. As in its predecessor H.264/AVC, Context-based Adaptive Binary Arithmetic Coding (CABAC) is a throughput bottleneck. Due to strong low-level data dependencies, there is only a very small amount of data-level parallelism that can be exploited using the SIMD extensions of current computer architectures. High-level parallelization is possible in HEVC, but not mandatory. Therefore, another optimization strategy is needed that works independently of the input video. To address this issue, attention was paid to throughput improvements during the standardization of HEVC. The goal of this thesis is to evaluate the hardware acceleration opportunities for the highly sequential HEVC CABAC by exploiting these throughput improvements. The evaluation is limited to transform coefficient decoding, as it is the most time-consuming part of CABAC. The hardware accelerator is implemented on the Digilent ZedBoard, a development board that contains a 667 MHz ARM Cortex-A9 processor together with a closely coupled FPGA and thereby allows efficient hardware-software co-design. The implemented hardware accelerator processes 70 Mbins/s at 75.36 MHz and achieves an 11× speed-up over software transform coefficient decoding for a typical workload. The hardware accelerator has also been integrated into a complete HEVC software decoder, but due to the currently slow hardware-software interface, the overall speed-up is relatively small. However, as the data transfer between hardware and software can be significantly reduced when a full CABAC hardware accelerator is implemented, this is a promising path to pursue in future work.
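As a quick consistency check on the reported figures, dividing the accelerator's throughput by its clock frequency gives the average number of bins decoded per clock cycle:

$$
\frac{70 \times 10^{6}\ \text{bins/s}}{75.36 \times 10^{6}\ \text{cycles/s}} \approx 0.93\ \text{bins/cycle}
$$

In other words, the accelerator sustains close to one bin per cycle on the typical workload.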

## 24.10.13, 10h, EN 642: Design and Implementation of a Hardware Accelerator for HEVC Motion Compensation- Matthias Goebel.

This master thesis focuses on the design and implementation of a motion compensation hardware accelerator for use in HEVC hybrid decoders, i.e. decoders that contain hardware as well as software parts. The motion compensation part of the decoding process is especially suited for such an approach, as it is the most time-consuming part of pure software decoders. The hardware accelerator should combine support for high resolutions and frame rates with a very low demand for resources and power. An optimized software decoder compatible with the reference decoder has been used as a starting point. The platform is the Zynq-7000 All Programmable SoC by Xilinx, which combines an ARM Cortex-A9 dual-core CPU running at 667 MHz with flexible programmable logic resources similar to those used in FPGAs. After some background information on the topics involved, the design space is discussed with a special focus on the level of granularity, the degree of parallelization and the memory access. For the granularity, the PU level has been chosen, as it offers a good trade-off between performance and complexity. The resulting design is then described in detail, and a prototype is implemented and validated. For validation, the Foreign Language Interface (FLI) of Mentor Graphics’ ModelSim HDL simulator has been used. Since an evaluation of the prototype shows promising results, two different memory interfaces (including one using DMA) are added and the complete accelerator is integrated into a Zynq-7000 environment. The necessary modifications to the software decoder for both interfaces are discussed and partially performed. A final evaluation shows an expected frame rate of 4.14 FPS for the complete 1080p decoding process when running the accelerator at 100 MHz.

## 23.10.13: The paper "Considering Quality-of-Service for Resource Reduction using OpenMP" accepted at MCC13.

The paper "Considering Quality-of-Service for Resource Reduction using OpenMP" has been accepted for oral presentation at the 6th Swedish Workshop on Multicore Computing and will be included in the workshop proceedings. The workshop will be held at Halmstad University in Halmstad, Sweden (November 25-26, 2013).

## 15.10.13: The paper "FPGA-Based Prototype of Nexus++ Task Manager" to appear in MTAGS 2013.

The paper "FPGA-Based Prototype of Nexus++ Task Manager" by Tamer Dallou, Ahmed Elhossini and Ben Juurlink has been accepted to appear at the 6th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers, which is co-located with Supercomputing (SC 2013), on November 17th, 2013, in Denver, Colorado, USA.

## AES in California: An NVIDIAn returns.

Tuesday, 01. October 2013

As of October 1st, the graduate student Michael Andersch has re-joined AES research. Michael had spent his summer in Santa Clara, California, where he was employed by NVIDIA during a summer internship. As an intern, Michael worked in the GPU Compute Architecture team, building tools and architecture designs to analyze and improve compute application performance on NVIDIA's next-generation GPU designs. Welcome back, Michael!

## Our new employee Philipp Habermann.

Monday, 28. October 2013

We are pleased to welcome Philipp Habermann as a new member of our group. He will contribute to the AES team in research and teaching. Welcome!

## "Recent Advances in Computer Architecture" will take place in room EN 630!

Attention! Room Change! The Course "Recent Advances in Computer Architecture" (0433 L 334) will take place on Tuesdays from 10:00 to 12:00 in room EN 630.

## The Lab exercises for Multicore Architectures will take place in room TEL 206 Li!

Attention! Room Change!

The Lab exercises for the Multicore Architectures course (LV 0433 L 333) will take place on Mondays from 14:00 to 16:00 in room TEL 206 Li.

## The courses Computer Arithmetics, Multicore Architectures and Recent Advances in Computer Architecture start a week later.

The following courses will start a week later (from October 21, 2013):

- Computer Arithmetics: A Circuit Perspective,

- Multicore Architectures and

- Recent Advances in Computer Architecture.

## 10.10.13, 10h, EN 642: Enhancing Cache Organization for Tiled CMP Architectures - Tareq Alawneh.

Many-core processor architectures have become very common, with leading CPU manufacturers (Intel, AMD, and Tilera) focusing on tiled CMP architectures. Our target system is a tiled CMP architecture consisting of n cores interconnected by a 2D mesh switched network. Each tile has a processor core, private L1 instruction and data caches, a private L2 cache, and a router for on-chip data transfers. Each cache block has a home tile that maintains the directory information for that block; the directory keeps track of the tiles that hold copies of the block. When an access misses in both the private L1 and L2 caches, the block is requested from its home tile. A miss at the home tile is then handled according to the specific coherence protocol implementation. The drawback of this design is that some home tiles may be overloaded with remote requests, which creates a scalability bottleneck. Furthermore, as the processor count increases, the L2 miss latency becomes dominated by the number of message hops needed to reach the particular cache rather than by the time spent accessing the cache itself. These drawbacks can be mitigated by taking other data access patterns into account.

In this study, we analyze this problem and propose ways to alleviate its impact on system performance. One way to improve the performance of the tiled CMP architecture is to access the L2 cache banks of adjacent tiles to fetch requested code cache lines before accessing their assigned home tiles. Such a mechanism reduces the remote L2 access latency, since requested code cache lines may be fetched from the L2 caches of nearby tiles instead of from those of their home tiles. It also reduces the number of accesses to the home tiles. Together, these two effects are expected to improve system performance by reducing network utilization and the average memory access time (AMAT).
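To make the latency argument concrete, the following is a minimal Python sketch of the neighbor-first lookup. It assumes a square mesh, a constant per-hop delay (`HOP_CYCLES`), a fixed L2 access time (`L2_ACCESS_CYCLES`), static address interleaving to home tiles, and a parallel probe of all adjacent tiles. All of these names and numbers are hypothetical illustrations, not parameters from the study.

```python
from __future__ import annotations

# Hypothetical latency parameters (not taken from the study).
HOP_CYCLES = 3         # assumed router + link delay per mesh hop
L2_ACCESS_CYCLES = 10  # assumed L2 bank access time


def tile_xy(tile: int, width: int) -> tuple[int, int]:
    """Map a tile index to (x, y) coordinates on a width x width mesh."""
    return tile % width, tile // width


def hops(a: int, b: int, width: int) -> int:
    """Manhattan distance between two tiles, i.e. mesh hops on a minimal path."""
    ax, ay = tile_xy(a, width)
    bx, by = tile_xy(b, width)
    return abs(ax - bx) + abs(ay - by)


def home_tile(block_addr: int, num_tiles: int) -> int:
    """Assumed static interleaving of cache blocks across home tiles."""
    return block_addr % num_tiles


def neighbors(tile: int, width: int) -> list[int]:
    """Tiles directly adjacent (one hop away) to `tile` in the mesh."""
    x, y = tile_xy(tile, width)
    result = []
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < width:
            result.append(ny * width + nx)
    return result


def l2_miss_latency(requester: int, block_addr: int, width: int,
                    holders: set[int]) -> tuple[int, int]:
    """Return (baseline, neighbor_first) latency in cycles for a private-L2 miss.

    holders: tiles whose L2 currently holds a copy of the block.
    Baseline: round trip to the home tile.
    Neighbor-first: probe all adjacent tiles in parallel (one-hop round trip);
    if no neighbor has the line, fall back to the home tile.
    """
    home = home_tile(block_addr, width * width)
    baseline = 2 * hops(requester, home, width) * HOP_CYCLES + L2_ACCESS_CYCLES
    probe = 2 * HOP_CYCLES + L2_ACCESS_CYCLES
    if any(n in holders for n in neighbors(requester, width)):
        return baseline, probe
    return baseline, probe + baseline


print(l2_miss_latency(0, 15, 4, {1}))  # (46, 16)
```

On a 4x4 mesh, a request from tile 0 for a block homed at tile 15 crosses six hops, while a copy held by the adjacent tile 1 is one hop away. The fallback branch shows the price of the scheme: when no neighbor holds the line, the probe adds to the home-tile latency. This is why the benefit depends on how often nearby tiles actually share code cache lines.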

As future work, we propose another way to improve the tiled CMP architectures by migrating hot cache lines closer to requesting tiles.

## October, 6-10: Prof. Juurlink to ICCD conference.

From October 6 to October 10, Prof. Juurlink will attend the IEEE International Conference on Computer Design in Asheville, North Carolina, USA.

He is the chairman of the Processor Architecture track and will also chair the session on Efficient Cache Architectures.

## Sept. 30th 2013. A delegation from Hunan University, China visits AES TU-Berlin.

A delegation from the University of Hunan (one of the oldest and most important national universities in China) will visit the AES group of TU Berlin on September 30th. The delegation will be introduced to the research activities of the AES group and will discuss opportunities for joint research. It consists of seven faculty members from the School of Computer and Communication, led by Professor Renfa Li.

## 13.09.13: AES TU Berlin presents 4k UHD HEVC/H.265 decoding.

The AES group is proud to present its highly efficient, 4k Ultra HD capable MPEG-HEVC/H.265 decoder setup. The demo setup consists of a 65-inch Samsung UHD TV and a custom mini PC based on the 4th-generation Intel Core processor. Optimizations for the latest-generation processors allow the compact setup to decode UHD video at more than 60 fps, even at higher bit depths, using no more than two threads.

## 26.09.2013, 10h, EN 642: A Cost-Effective Kite State Estimator for Reliable Automatic Control of Kites - Johannes Peschel

Airborne Wind Energy (AWE) is a developing technology that uses tethered wings to harvest wind energy and convert it into electrical energy. Most of the AWE concepts presented in this thesis share one common challenge: estimating the position and orientation of the kite, also called the kite state, especially during highly dynamic flight situations. The focus of this thesis is, first, to investigate whether angular sensors are suitable for obtaining reliable position data and, second, to determine which fusion algorithm can be used to combine the data of the angular and Global Navigation Satellite System (GNSS) sensors. The TU Delft prototype is a suitable testing platform for this purpose. The author added angular sensors to the ground station of the TU Delft AWE system; they measure the elevation and the horizontal displacement of the tether holding the kite. The sensors are mounted on a modular stainless steel construction, which has low wear and a long lifetime. The author used the tether length and the angular data to compute a new position estimate. This position was merged with the positions from the two GNSS sensors already attached to the kite. The angular sensors measured with a resolution of <0.01°. The elevation and azimuth angles of the kite had an error of less than 0.7° as long as the tether force was higher than 2000 N. One of the GNSS sensors provided reliable data during low-force phases. A reliable position in all flight conditions could be obtained by using double exponential smoothing prediction to merge both positions. This development enables the implementation of a reliable kite power control system.
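Double exponential smoothing (Holt's method) tracks both a level and a trend, so it can predict where the kite should be at the next sample. The Python sketch below uses that prediction to merge two position sources for a single coordinate. The fusion rule is a hypothetical illustration, not the thesis's algorithm: take whichever available measurement lies closest to the prediction, and coast on the prediction when neither sensor is usable. The smoothing factors `alpha` and `beta` are also assumed.

```python
from typing import Optional


class DoubleExpSmoother:
    """Holt's double exponential smoothing for one coordinate (e.g. elevation)."""

    def __init__(self, alpha: float, beta: float, initial: float) -> None:
        self.alpha = alpha  # smoothing factor for the level
        self.beta = beta    # smoothing factor for the trend
        self.level = initial
        self.trend = 0.0

    def predict(self, steps: int = 1) -> float:
        """Extrapolate the current level and trend `steps` samples ahead."""
        return self.level + steps * self.trend

    def update(self, measurement: float) -> float:
        """Blend a new measurement into the level, then refresh the trend."""
        prev_level = self.level
        self.level = self.alpha * measurement + (1 - self.alpha) * (self.level + self.trend)
        self.trend = self.beta * (self.level - prev_level) + (1 - self.beta) * self.trend
        return self.level


def fuse(smoother: DoubleExpSmoother,
         angular_pos: Optional[float],
         gnss_pos: Optional[float]) -> float:
    """Hypothetical fusion rule: use the available measurement closest to the
    predicted value; if none is available, coast on the prediction."""
    predicted = smoother.predict()
    candidates = [p for p in (angular_pos, gnss_pos) if p is not None]
    if not candidates:
        return smoother.update(predicted)
    best = min(candidates, key=lambda p: abs(p - predicted))
    return smoother.update(best)
```

In a setup like the one described, `angular_pos` could be passed as `None` whenever the tether force drops below the threshold at which the angular sensors are accurate (2000 N in the abstract). This lets the GNSS reading and the smoothed trend carry the estimate through low-force phases.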

## Sept. 22-27, 2013: Prof. Juurlink to ScalPerf workshop.

Prof. Juurlink has been invited to give a presentation at the ScalPerf (Scalable Approaches to High Performance and High Productivity Computing) workshop which will be held in Bertinoro, Italy from Sept. 22 to Sept. 27, 2013. There he will present his recent article "Amdahl's law for predicting the future of multicores considered harmful". For more information about the workshop see http://www.dei.unipd.it/~versacif/scalperf13/index.html. The article can be accessed via ACM Digital Library http://doi.acm.org/10.1145/2234336.2234338.

## September 15, 2013: "HiPEAC grant: Performance portability for low-power embedded GPUs"

The AES group of TU Berlin has received a collaboration grant from HiPEAC for a three-month visit of Guilherme Calandrini, a PhD student from the University of Alcala in Spain. The visit will focus on performance portability for low-power embedded GPUs using OpenCL. In this collaboration, we aim to create a set of OpenCL benchmarks that can be used to compare the performance and power efficiency of different embedded low-power GPUs. The results of this research will help in understanding the performance and power implications of optimization strategies for different GPU architectures, and in selecting the most appropriate GPU based on well-defined quantitative performance and power metrics.

## 11.09.13: Best paper award at the 3rd IEEE 2013 ICCE-Berlin.

Mauricio Alvarez-Mesa, Chi Ching Chi and Ben Juurlink of the AES group of TU Berlin have won a best paper award at the Third IEEE International Conference on Consumer Electronics-Berlin (ICCE-Berlin) for the paper "HEVC Performance and Complexity for 4K Video". The paper was a joint effort between the AES group of TU Berlin and Fraunhofer HHI.

## September 9, 2013: The AES group will host Mr. Hasan Hassan.

The AES group will host Mr. Hasan Hassan, a student at TOBB University of Economics and Technology (http://etu.edu.tr/en), as an intern who will work on porting computer vision algorithms to GPUs using OpenCL. The internship is organized within the framework of the EU Erasmus Programme and will take place from 9 September 2013 to 20 December 2013. We aim to port several computationally demanding kernels used in various computer vision algorithms to GPUs, which offer a high degree of parallelism.

## September 7, 2013: Prof. Dr. Ben Juurlink at MuCoCoS-2013

The paper "Topology-aware Equipartitioning with Coscheduling on Multicore Systems" by Jan H. Schönherr, Ben Juurlink and Jan Richling will be presented at the 6th International Workshop on Multi-/Many-core Computing Systems (MuCoCoS-2013), which will be held on September 7 in Edinburgh, Scotland, UK, in conjunction with the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT 2013).

MuCoCoS-2013 focuses on language level, system software and architectural solutions for performance portability across different architectures and for automated performance tuning.

## Aug 29th. 2013. AES paper in the SPIE Applications of Digital Image Processing Conference.

The paper "HEVC real-time decoding" by Mauricio Alvarez-Mesa, Chi Ching Chi and Ben Juurlink of the AES group of TU Berlin has been presented at the SPIE Applications of Digital Image Processing Conference, which was held in San Diego, USA, from August 25 to August 29, 2013. The paper was a joint effort between the AES group of TU Berlin and Fraunhofer HHI.

## July 2013: "best-in-class-award" to Philipp Habermann.

Recently, Prof. Juurlink handed out the "best-in-class-award" to Philipp Habermann for the class "Advanced Computer Architecture 2011-2012". The "best-in-class-award" goes to the best-performing student, i.e. the one who achieves the highest grade in Prof. Juurlink's master's courses. The award will also be given this year.

## 14-20 July 2013: Sohan Lal and Jan Lucas from TU Berlin are going to present two posters at HiPEAC ACACES 2013.

Sohan Lal and Jan Lucas from TU Berlin are going to present two posters at HiPEAC ACACES 2013. The two posters will present some of the most recent research results from the project to the public for the first time.

• The poster “Exploring GPGPUs Workload Characteristics and Power Consumption” by Lal et al. will provide interesting insights into the power consumption of GPU workloads and how they are related to the performance characteristics of the workloads.
• The poster “DART: A GPU architecture exploiting temporal SIMD for divergent workloads” by Lucas et al. will present first simulation results for DART, a new GPU architecture developed within the LPGPU consortium.

## 8.07.2013, 13h, EN 642: Implementation and Evaluation of Large Warps in GPUSimPow - Matthias Stroux

Graphics processing units (GPUs) are a special class of parallel processors for massively parallel programs. Originally add-on processors that accelerated graphics-intensive programs, they have developed into general-purpose computing devices for high-performance business and scientific computing. GPUs typically handle branches by serializing the branch paths, which leads to underutilization of their SIMD execution units. Large Warps (LW) is a concept that increases utilization by selecting threads with the same execution path (PC and program state) from larger units, called 'large warps', and combining them into temporary units of execution, the 'sub-warps'. For some programs, LW should therefore significantly increase SIMD utilization. Theoretical considerations also show that for some programs the IPC could increase, or the number of execution cycles decrease, by a factor of more than two. To test this concept with real programs in a complete system, where memory latency and network effects can be taken into account and measured, Large Warps was implemented in GPGPU-Sim 3.x, a software simulator for GPUs, and in the new power simulator GPUSimPow to capture power effects. Results for ideally constructed synthetic benchmarks show the expected effects: where the functional utilization of the SIMD units can be increased, IPC increases too. However, memory effects and the effects of other system components must be taken into account as well. For a number of 'real-world' benchmarks, the positive effect of Large Warps on performance (IPC) can be confirmed.
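Sub-warp formation can be illustrated with a small Python model. It rests on two simplifying assumptions of our own, not the thesis's GPGPU-Sim implementation. First, thread `tid` is bound to SIMD lane `tid % simd_width`. Second, a sub-warp may use each lane at most once and only combines threads that sit at the same PC. The helper `simd_utilization` is likewise a hypothetical metric: useful lanes divided by issued lanes.

```python
from __future__ import annotations

from collections import defaultdict


def form_sub_warps(pcs: list[int], active: list[bool],
                   simd_width: int) -> list[tuple[int, list[int]]]:
    """Pack the active threads of one large warp into sub-warps.

    Thread `tid` is assumed to sit in SIMD lane `tid % simd_width`; a sub-warp
    uses each lane at most once and only combines threads at the same PC.
    """
    # pc -> lane -> thread ids waiting in that lane
    by_pc: dict[int, dict[int, list[int]]] = defaultdict(lambda: defaultdict(list))
    for tid, (pc, is_active) in enumerate(zip(pcs, active)):
        if is_active:
            by_pc[pc][tid % simd_width].append(tid)

    sub_warps = []
    for pc in sorted(by_pc):
        lanes = by_pc[pc]
        while any(lanes.values()):
            # Take at most one pending thread per lane for this sub-warp.
            members = [lanes[lane].pop(0) for lane in range(simd_width) if lanes[lane]]
            sub_warps.append((pc, members))
    return sub_warps


def simd_utilization(sub_warps: list[tuple[int, list[int]]], simd_width: int) -> float:
    """Fraction of issued SIMD lanes that carry an active thread."""
    if not sub_warps:
        return 0.0
    return sum(len(m) for _, m in sub_warps) / (len(sub_warps) * simd_width)


# Eight threads, 4-wide SIMD, alternating between two branch targets.
pcs = [10, 20, 10, 20, 20, 10, 20, 10]
large = form_sub_warps(pcs, [True] * 8, 4)
small = form_sub_warps(pcs[:4], [True] * 4, 4) + form_sub_warps(pcs[4:], [True] * 4, 4)
print(simd_utilization(small, 4), simd_utilization(large, 4))  # 0.5 1.0
```

In this constructed case, two conventional 4-thread warps each issue both branch paths half-empty, for four issues at 50% utilization. The large warp packs the same eight threads into two full sub-warps. This is the kind of ideal synthetic case the abstract mentions; real workloads rarely line up this neatly across lanes.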

## June 12th 2013: A group of computer engineering students from Universidad del Valle (Colombia) visit AES TU-Berlin

A group of computer engineering students, accompanied by Professor Dr. Maria Trujillo from Universidad del Valle in the Colombian city of Cali, will visit TU Berlin on June 12th, 2013. The German Academic Exchange Service (DAAD) has organized and financed this visit, which will allow the students to get to know the research and teaching activities of the AES-TUB group and will also be the starting point for future research collaborations.

## June 12th 2013, 10 a.m., EN 185: Achronix tech talk

The outline of the talk is:

• Overview of the FPGA market;
• Achronix in the High End FPGA market;
• Achronix Value Proposition versus incumbent high end FPGA vendors;
• Intel/Achronix partnership: current 22 nm Tri-Gate process product, 14 nm products 2014/201, 10 nm;
• Products available from Achronix in the second half of 2013;
• SW tools presentation (video);
• HD1000 demo board.

## Summer Semester 2013: A New Course: Computer Arithmetic: Circuit Perspective

The advance of modern embedded systems and their high computational capabilities depend mainly on their ability to perform arithmetic operations efficiently. This course is intended to deepen students' knowledge of the design of embedded arithmetic circuits and of the scientific background of these circuits. It will give the students more detailed insight into the design of arithmetic processing units and more practical experience in implementing digital systems. The students will also gain experience in using hardware description languages to model and implement digital systems, including the implementation of these circuits using VHDL on FPGAs.

## April 21, 2013: AES-Paper at ISPASS 2013.

The paper "Why a Single Chip Causes Massive Power Bills - GPUSimPow: A GPGPU Power Simulator" by Jan Lucas, Sohan Lal, Michael Andersch, Mauricio Alvarez-Mesa and Ben Juurlink has been accepted at the 2013 International Symposium on Performance Analysis of Systems and Software which will be held from April 21-23 in Austin, Texas, US. The paper details much of the work performed by AES' Low-Power GPU group concerning power simulation.

## 12.04.2013, 10h, EN 642: Bachelor Thesis: High-Throughput Communication Interface for the Xilinx XUPV5 Evaluation Platform - Lester Kalms

Because processors in computer systems must handle a growing number of increasingly complex tasks, it makes sense to offload some of these tasks to relieve the processor. Some of these tasks are handled by expansion cards, such as graphics cards or sound cards, which nowadays communicate with the rest of the system via PCI-Express. Peripheral cards can, of course, support other tasks as well. Xilinx, for example, provides a platform for developing an expansion card with the XUPV5 evaluation platform [5]. Communicating with the card requires an interface on both the hardware and the software side, which can be developed with this card and the Xilinx tools. In times of ever-increasing amounts of data, a correspondingly high data throughput is needed, which PCI-Express can in theory provide. This thesis deals with the development of an interface that communicates via PCI-Express and with maximizing its data throughput. It was written to support others who want to develop an efficient expansion card. The second chapter introduces PCI-Express and explains the fundamentals needed for a basic understanding: what PCI-Express is, how communication works, and what data throughput can be achieved. PCI-Express communicates via packets called "Transaction Layer Packets". The third chapter describes the developed system: how the hardware design works as a whole and in detail, how it was implemented, how the software system is built and how it works, and especially how the two systems interact. The following chapters cover the practical work. The fourth chapter describes how a running system that satisfies the requirements was created. The system described in the previous chapter was able to communicate but still produced errors in various situations; the chapter explains what was done to correct these errors, what did not work, and why. The fifth chapter deals with increasing the data throughput and includes measurements; for easier handling and measurement, a user application was implemented. The final chapter comments on and interprets the results and compares them with theory. Finally, the thesis gives an outlook on methods that could still be tested or have not yet been fully tested.

## 8-11 April 2013: a 4K H.265/HEVC real-time decoder at NABShow 2013 in Las Vegas

A 4K H.265/HEVC real-time decoder was presented at the NABShow in Las Vegas, Nevada, USA, during April 8-11, 2013. The demo consisted of a software-based decoder running on a multicore PC connected to an 84-inch 4K TV. It was presented at the Fraunhofer HHI booth C7843. The real-time decoder has been developed as part of a collaboration between the Fraunhofer Heinrich Hertz Institute (HHI) and the AES group of TU Berlin. The demo was presented by Benjamin Bross from Fraunhofer HHI and Mauricio Alvarez-Mesa from Fraunhofer HHI and TU Berlin.

## 14/03/2013, 10h, EN 642: "Migen - a Python toolbox for building complex digital hardware" - Sébastien Bourdeauducq

Despite being faster than schematic entry, hardware design with Verilog and VHDL remains tedious and inefficient for several reasons. The event-driven model introduces issues and manual coding that are unnecessary for synchronous circuits, which represent the lion's share of today's logic designs. Counter-intuitive arithmetic rules result in steeper learning curves and provide fertile ground for subtle bugs in designs. Finally, support for procedural generation of logic (metaprogramming) through "generate" statements is very limited and restricts the ways code can be made generic, reused and organized.
To address those issues, we have developed the Migen FHDL library that replaces the event-driven paradigm with the notions of combinatorial and synchronous statements, has arithmetic rules that make integers always behave like mathematical integers, and most importantly allows the design's logic to be constructed by a Python program. This last point enables hardware designers to take advantage of the richness of the Python language - object oriented programming, function parameters, generators, operator overloading, libraries, etc. - to build well organized, reusable and elegant designs.
Other Migen libraries are built on FHDL and provide various tools such as a system-on-chip interconnect infrastructure, a dataflow programming system, a more traditional high-level synthesizer that compiles Python routines into state machines with datapaths, and a simulator that allows test benches to be written in Python.
URL:  http://milkymist.org/3/migen.html

## 01/02/2013 - 11 a.m.: Master school - ICT Innovation - information event

Invitation to the information event

On February 1st, 2013, at 11 a.m., an information event for the "European Dual Degree Master in ICT Innovation" will take place in room TEL AB (Telefunken tower). We would like to invite all interested students, especially those who will finish their BSc degree by August 2013.

The dual degree master program "ICT Innovation" will start in the winter term 2013/14. The application deadline is April 15th, 2013.

## 31.01.2013, 10h, EN 642: "Composing Execution Times on Multicore Processors" - J. Reinier van Kampenhout

The use of multicore processors in embedded systems promises to reduce space, weight and power requirements while offering increased functionality. To realize these benefits, however, a runtime environment must be able to execute multiple safety-critical applications in parallel with non-critical applications. An underlying problem in multicores is the use of shared resources, which leads to interference between applications and to unpredictable timing behaviour, which is not acceptable for critical applications with hard real-time requirements.

In this research, we will conceive and implement a concept for executing real-time applications on multicore processors with composable timing behaviour. In our approach, we decompose applications into basic blocks whose behaviour is deterministic and can be determined empirically. Using models that capture the essential properties of the hardware and software, we construct a deployment scheme from these blocks. The result is a system on which multiple mixed-criticality applications are executed in parallel, each with a timing behaviour composed from that of its basic blocks. Thus our method guarantees isolation between applications and simplifies worst-case execution time analysis, independent of the hypervisor or OS. Resource usage can furthermore be optimized by dynamically allocating any unused resources to non-critical applications at run time. We will demonstrate the effectiveness of our concept by comparing the variation in execution times with that achieved by purely static scheduling, fixed-priority scheduling and virtualization.

## 21.01.2013: Mr. Tamer Dallou has won a “Best Poster Award” at the HiPEAC 2013.

Mr. Tamer Dallou has won a “Best Poster Award” for our joint poster “Nexus++: A hardware Task Manager for the StarSs Programming Model” at the 8th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC 2013), January 2013, Berlin, Germany.

Abstract:
Recently, several programming models have been proposed that try to ease parallel programming. One of these programming models is StarSs. In StarSs, the programmer has to identify pieces of code that can be executed as tasks, as well as their inputs and outputs. The runtime system (RTS) then determines the dependencies between tasks and schedules ready tasks onto worker cores. Previous work has shown, however, that the StarSs RTS may constitute a bottleneck that limits the scalability of the system, and proposed a hardware task manager called Nexus to eliminate this bottleneck. Nexus has several limitations, however. For example, the number of inputs and outputs of each task is limited to a fixed constant, and Nexus does not support double buffering. Here we present Nexus++, which addresses these as well as other limitations. Experimental results show that double buffering achieves a speedup of 54×, and that Nexus++ significantly enhances the scalability of applications parallelized using StarSs.
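The bookkeeping that the StarSs runtime performs, and that Nexus and Nexus++ move into hardware, can be sketched in software. The sketch derives dependencies from each task's declared inputs and outputs in submission order. It assumes that data regions are identified by hashable keys such as addresses and that tasks are submitted in program order. It is a minimal illustrative model, not the Nexus++ design, and it does not model double buffering or renaming.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    inputs: tuple = ()
    outputs: tuple = ()
    deps: set = field(default_factory=set)


class DependencyTracker:
    """Software model of task dependency resolution from declared inputs/outputs."""

    def __init__(self) -> None:
        self.last_writer: dict = {}  # address -> name of the last task writing it
        self.readers: dict = {}      # address -> names of tasks reading it since that write
        self.tasks: dict = {}

    def submit(self, task: Task) -> None:
        for addr in task.inputs:                 # read-after-write
            if addr in self.last_writer:
                task.deps.add(self.last_writer[addr])
        for addr in task.outputs:
            if addr in self.last_writer:         # write-after-write
                task.deps.add(self.last_writer[addr])
            task.deps.update(self.readers.get(addr, set()))  # write-after-read
        for addr in task.inputs:
            self.readers.setdefault(addr, set()).add(task.name)
        for addr in task.outputs:
            self.last_writer[addr] = task.name
            self.readers[addr] = set()
        task.deps.discard(task.name)
        self.tasks[task.name] = task

    def ready(self, finished: set) -> list:
        """Names of unfinished tasks whose dependencies have all finished."""
        return sorted(n for n, t in self.tasks.items()
                      if n not in finished and t.deps <= finished)


tracker = DependencyTracker()
tracker.submit(Task("T1", outputs=("a",)))
tracker.submit(Task("T2", inputs=("a",), outputs=("b",)))
tracker.submit(Task("T3", inputs=("a",), outputs=("c",)))
tracker.submit(Task("T4", outputs=("a",)))
print(tracker.ready({"T1"}))  # ['T2', 'T3']
```

Every submitted task pays for a table lookup per input and output. Doing this in software for many fine-grained tasks is what the abstract identifies as the scalability bottleneck that a hardware task manager is meant to remove.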

## Jan. 2013: HiPEAC 2013 in Berlin.

The HiPEAC conference will be held in Berlin from Monday, January 21 to Wednesday, January 23, 2013. The HiPEAC conference is the premier forum in Europe for experts in computer architecture, programming models, compilers and operating systems for embedded and general-purpose systems. In 2013, the general chairs will be Ben Juurlink of TU Berlin and Keshav Pingali of the University of Texas, Austin. The program chairs are André Seznec of INRIA Rennes and Lawrence Rauchwerger of Texas A&M University. Paper selection is performed by the ACM journal TACO. More than 500 people attended the HiPEAC 2012 conference in Paris; hopefully HiPEAC 2013 will be equally successful. For more information, stay tuned at http://www.hipeac.net/conference/berlin.

## 2012: Book "Scalable Parallel Programming Applied to H.264/AVC Decoding"

The book "Scalable Parallel Programming Applied to H.264/AVC Decoding", co-authored by Ben Juurlink, Mauricio Alvarez-Mesa, Chi Ching Chi, Arnaldo Azevedo, Cor Meenderinck and Alex Ramirez, has been published by Springer as part of the series SpringerBriefs in Computer Science. The book can be purchased from several internet retailers. More information can be found on the Springer webpage: http://www.springer.com/engineering/signals/book/978-1-4614-2229-7

## Nov. 1, 2012: AES-Paper in the IEEE Transactions on Circuits and Systems for Video Technology

The paper "Parallel Scalability and Efficiency of HEVC Parallelization Approaches" by C.C. Chi, M. Alvarez-Mesa, B. Juurlink, G. Clare, F. Henry, S. Pateux and T. Schierl has been accepted for publication in the IEEE Transactions on Circuits and Systems for Video Technology. The paper is part of a special issue on High Efficiency Video Coding (HEVC) that will appear in December 2012. The paper can now be accessed at IEEE Xplore: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6327343&isnumber=4358651

## Nov. 10, 2012: AES-Paper in the Journal of Signal Processing Systems.

The paper "Parallel HEVC Decoding on Multi- and Many-core Architectures. A Power and Performance Analysis" by C.C. Chi, M. Alvarez-Mesa, J. Lucas, B. Juurlink, and T. Schierl has been accepted for publication in the Journal of Signal Processing Systems. It will appear soon in a special issue on Design and Implementation of Signal Processing Systems.

## Nov. 12, 2012: "SynZEN: A Hybrid TTA/VLIW Architecture with a Distributed Register File" at NORCHIP 2012

The paper "SynZEN: A Hybrid TTA/VLIW Architecture with a Distributed Register File" by S. Hauser, N. Moser and B. Juurlink has been accepted at NORCHIP 2012, the Nordic Microelectronics event, which will be held in Copenhagen, Denmark, on Nov. 12-13, 2012. More information about NORCHIP 2012 can be found at www.norchip.org.

## Oct 23rd, 11h, KIT: High Efficiency Video Coding on Multi- and Many-core Architectures by M. Alvarez-Mesa

Dr. Mauricio Alvarez-Mesa will give an invited talk at the Karlsruhe Institute of Technology titled "High Efficiency Video Coding on Multi- and Many-core Architectures", in which he will present the latest results of AES research on HEVC decoding on parallel architectures. The talk will be held on October 23rd at 11:00 a.m. at the Karlsruher Institut für Technologie (KIT), Institut für Prozessdatenverarbeitung und Elektronik (IPE), Karlsruhe, Germany.

## 11 Oct 2012, 10h, EN 642: Scalable Runtime and OS Abstractions for Mesh-Based MultiCores (Prof. Frank Mueller)

Current trends in microprocessors are to steadily increase the number of cores. As the core count has increased, network-on-chip (NoC) topologies have evolved from buses, via rings and fully connected meshes, to 2D meshes.

This work contributes NoCMsg, a low-level message passing abstraction over NoCs. NoCMsg is specifically designed for large core counts in 2D meshes. Its design ensures deadlock free messaging for wormhole Manhattan-path routing over the NoC. Experimental results on the TilePro hardware platform show that NoCMsg can significantly reduce communication times when compared with other NoC-based message approaches. They further demonstrate the potential of NoC messaging to outperform shared memory abstractions, such as OpenMP, as core counts and inter-process communication increase.
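A Manhattan path on a 2D mesh is a minimal route that only moves along the X and Y axes. The Python sketch below shows dimension-ordered (XY) routing, a classic deadlock-free way to pick such a path: every packet completes all X hops before any Y hop. Forbidding Y-to-X turns rules out cyclic channel dependencies. This is offered only to illustrate the routing setting; the abstract does not say how NoCMsg itself guarantees deadlock freedom, and the helper `has_y_to_x_turn` is hypothetical.

```python
from __future__ import annotations


def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Dimension-ordered (XY) route on a 2D mesh: all X hops first, then all Y hops."""
    x, y = src
    path = [(x, y)]
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


def has_y_to_x_turn(path: list[tuple[int, int]]) -> bool:
    """True if the path ever moves in X after having moved in Y."""
    moved_y = False
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if y1 != y0:
            moved_y = True
        elif x1 != x0 and moved_y:
            return True
    return False


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The path length always equals the Manhattan distance between source and destination, so XY routing gives up no hops compared with other minimal routes. What it gives up is path diversity: all traffic between two tiles follows the same links.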

This work further explores the benefits of novel runtime and operating systems abstractions for large scale multicores. On top of NoCMsg, a distributed OS abstraction is promoted instead of the traditional shared memory view on a chip. This distributed kernel features a pico-kernel per core. Sets of pico-kernels are controlled by micro-kernels, which are topologically centered within a set of cores. Cooperatively, micro-kernels comprise the overall operating system in a peer-to-peer fashion.

Biography: Frank Mueller is a Professor in Computer Science and a member of multiple research centers at North Carolina State University. Previously, he held positions at Lawrence Livermore National Laboratory and Humboldt University Berlin, Germany. He received his Ph.D. from Florida State University in 1994. He has published papers in the areas of parallel and distributed systems, embedded and real-time systems and compilers. He is a member of ACM SIGPLAN, ACM SIGBED and a senior member of the ACM and IEEE Computer Societies as well as an ACM Distinguished Scientist. He is a recipient of an NSF Career Award, an IBM Faculty Award, a Google Research Award and a Fellowship from the Humboldt Foundation.

## 11.Oct 2012: Courses in WS2012/13

The course information for the current semester is now online. We would particularly like to point out the new AES course for bachelor's students.

## Sept 30- Oct 3, 12:"Improving the Parallelization Efficiency of HEVC Decoding" at the ICIP 2012

The paper "Improving the Parallelization Efficiency of HEVC Decoding" by C. C. Chi, M. Alvarez-Mesa, B. Juurlink, V. George and T. Schierl has been accepted at the 2012 IEEE International Conference on Image Processing (ICIP), which will be held in Orlando, Florida, USA, on Sept. 30 - Oct. 3, 2012. This is the second paper resulting from a collaboration between the AES group and the Multimedia Communications Group of the Fraunhofer HHI Institute on the topic of parallel processing for HEVC. More information about ICIP-2012 can be found at http://icip2012.com.

## Sept 10, 2012: "Hardware-Based Task Dependency Resolution for the StarSs Programming Model" at SRMPDS'12

The paper "Hardware-Based Task Dependency Resolution for the StarSs Programming Model" by Tamer Dallou and Ben Juurlink has been accepted at the "SRMPDS'12 - Eighth International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems", which will be held in conjunction with "ICPP'12 - The 2012 International Conference on Parallel Processing" in Pittsburgh, PA on September 10, 2012.
This paper is a result of the research conducted at AES as part of the ENCORE project. More information on SRMPDS can be found at:
http://www.mcs.anl.gov/~kettimut/srmpds/

## Sept 5-8, 2012: "A Novel Predictor-based Power-Saving Policy for DRAM Memories" at the 15th EUROMICRO Conference on Digital System Design (DSD)

The paper "A Novel Predictor-based Power-Saving Policy for DRAM Memories" by Gervin Thomas, Karthik Chandrasekar, Benny Akesson, Ben Juurlink and Kees Goossens has been accepted at the 15th EUROMICRO Conference on Digital System Design (DSD), which will be held in Cesme, Izmir, Turkey, from September 5th to September 8th, 2012. This paper is the result of a collaboration between the AES group (TU Berlin) and the Electronic Systems group (TU Eindhoven). More information about DSD-2012 can be found at http://www.univ-valenciennes.fr/congres/dsd2012/.

## August 27, 2012: "An Optimized Parallel IDCT on Graphics Processing Units" at HeteroPar'2012

The paper "An Optimized Parallel IDCT on Graphics Processing Units" by Biao Wang, Mauricio Alvarez-Mesa, Chi Ching Chi, and Ben Juurlink has been accepted at the 2012 International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar'2012), which will be held on the island of Rhodes, Greece, on August 27, 2012. The paper presents work on offloading the H.264 IDCT kernel to GPUs, which has been conducted at AES as part of the LPGPU project. More information on HeteroPar can be found at http://pm.bsc.es/heteropar12/.

## July 31, 2012: AES has set up a testbed to accurately measure GPU power consumption.

AES has set up a testbed to accurately measure GPU power consumption. This testbed is being used to evaluate power reduction techniques on available GPUs. It will also be used to validate the power modeling of GPUSimPow, the GPU power simulator developed within the LPGPU project. Its high bandwidth and high sampling rates enable it to accurately measure short, sub-millisecond power events.
The measurement software developed by AES allows developers to attribute power consumption to individual kernels.
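Attributing power to individual kernels comes down to integrating sampled power over each kernel's execution window. The Python sketch below applies the trapezoidal rule for that integration. It assumes that sample timestamps and kernel start and end times are available on a common clock in seconds. The function names and the `(name, start, end)` trace format are hypothetical and do not describe the AES software's actual interface.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right


def kernel_energy(times: list[float], watts: list[float],
                  start: float, end: float) -> float:
    """Energy in joules between start and end (seconds), via the trapezoidal rule.

    `times` must be sorted. Only samples inside [start, end] are used, so
    partial intervals at the kernel boundaries are ignored (a simplification
    that matters less the higher the sampling rate).
    """
    lo = bisect_left(times, start)
    hi = bisect_right(times, end)
    t, p = times[lo:hi], watts[lo:hi]
    return sum((t[i + 1] - t[i]) * (p[i] + p[i + 1]) / 2 for i in range(len(t) - 1))


def per_kernel_report(times: list[float], watts: list[float],
                      kernels: list[tuple[str, float, float]]) -> dict[str, tuple[float, float]]:
    """Map each kernel name to (energy in J, mean power in W) over its window."""
    report = {}
    for name, start, end in kernels:
        energy = kernel_energy(times, watts, start, end)
        report[name] = (energy, energy / (end - start))
    return report


# 1 kHz samples; a short high-power phase between 2 ms and 3 ms.
times = [0.000, 0.001, 0.002, 0.003, 0.004]
watts = [50.0, 50.0, 150.0, 150.0, 50.0]
print(per_kernel_report(times, watts, [("k1", 0.000, 0.002), ("k2", 0.002, 0.004)]))
```

The example also shows why sampling speed matters. A kernel that runs for well under a millisecond would contain at most one or two samples at this rate, so its energy could not be resolved reliably without the high sampling rates mentioned above.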

## 16-19 July 2012: "Using OpenMP Superscalar for Parallelization of Embedded and Consumer Applications" at SAMOS XII

The paper "Using OpenMP Superscalar for Parallelization of Embedded and Consumer Applications" by M. Andersch, C.C. Chi and Ben Juurlink has been accepted at the 2012 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), which will be held in Samos, Greece, on July 16-19, 2012. The paper presents the latest results of the research on the OpenMP Superscalar programming model conducted at AES as part of the ENCORE project. More information on SAMOS can be found at http://samos.et.tudelft.nl/samos_xii/html/.

## July 11, 2012: "Nexus++: A hardware Task Manager for the StarSs Programming Model" at ACACES'12

The poster "Nexus++: A hardware Task Manager for the StarSs Programming Model" by Tamer Dallou and Ben Juurlink has been presented at "ACACES'12 - Eighth International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems", which was held in Fiuggi, Italy, on July 8-14, 2012.
This poster presents some of the results of the research conducted at AES as part of the
http://www.hipeac.net/summerschool/

## July 8-14, 2012: Mr. Tamer Dallou attends ACACES 2012.

Mr. Tamer Dallou was awarded a HiPEAC grant to attend the Eighth International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES 2012), held July 8-14, 2012, in Fiuggi, Italy.

The "HiPEAC Summer School" is a one-week summer school for computer architects and compiler builders working in the field of high-performance computer architecture and compilation for embedded systems. The school aims to disseminate advanced scientific knowledge and to promote international contacts among scientists from academia and industry.

## AES group purchases TILE-Gx36 many-core.

The AES group of TU Berlin has purchased a state-of-the-art TILE-Gx36 many-core processor with 36 64-bit processor cores (tiles) from Tilera (tilera.com). Soon, researchers and students of AES will be able to work on this processor. For more information about the TILE-Gx processor family, see http://tilera.com/products/processors/TILE-Gx_Family.

## May 2012: Article in ACM SIGARCH Computer Architecture News.

Ben Juurlink and his PhD graduate Cor Meenderinck have published an article entitled "Amdahl's Law for Predicting the Future of Multicores Considered Harmful" in the current (May 2012) issue of Computer Architecture News, which is published by the ACM Special Interest Group on Computer Architecture (SIGARCH) [http://www.sigarch.org/]. In the article, they consider how the predictions in the influential paper by Hill and Marty [1] change when Gustafson's law is assumed instead of Amdahl's law. They also propose a different scaling equation, the Generalized Scaled Speedup Equation (GSSE), which encompasses both Amdahl's and Gustafson's laws.

[1] Mark D. Hill, Michael R. Marty: Amdahl's Law in the Multicore Era. IEEE Computer 41(7): 33-38 (2008)
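For a program with parallelizable fraction f running on n cores, Amdahl's law (fixed problem size) gives S = 1 / ((1 - f) + f/n), whereas Gustafson's law (problem size scaled with n, with f measured on the parallel system) gives S = (1 - f) + f·n. The short sketch below compares the two textbook formulas; it does not implement the GSSE proposed in the article or Hill and Marty's multicore model.

```python
def amdahl_speedup(f, n):
    """Fixed-size speedup: the serial part (1 - f) is not accelerated."""
    return 1.0 / ((1.0 - f) + f / n)


def gustafson_speedup(f, n):
    """Scaled speedup: f is the parallel fraction of time measured on n cores."""
    return (1.0 - f) + f * n


for n in (16, 64, 256):
    print(f"n={n:4d}  Amdahl={amdahl_speedup(0.99, n):7.2f}  "
          f"Gustafson={gustafson_speedup(0.99, n):7.2f}")
```

With f = 0.99, Amdahl's speedup can never exceed 1/(1 - f) = 100 no matter how many cores are added, while Gustafson's speedup keeps growing linearly with n, which is why the choice of law changes multicore predictions so strongly.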

## HiPEAC '13: Call for papers

The 8th HiPEAC conference will take place in Berlin, Germany, from Monday, January 21 to Wednesday, January 23, 2013.

For submission details, please refer to http://mc.manuscriptcentral.com/taco.

• Workshops/tutorials: June 1, 2012
• Papers: June 18, 2012
• Posters: October 15, 2012
• Early Registration Deadline: December 22, 2012

## May 16, 2012: Prof. Dr. Ben Juurlink at Map2MPSoC/SCOPES.

Prof. Ben Juurlink will give an invited keynote at the 5th Workshop on Mapping of Applications to MPSoCs and the 15th International Workshop on Software and Compilers for Embedded Systems, which will be held May 15-16, 2012, at the beautiful Schloss Rheinfels hotel in St. Goar, Germany (http://www.scopesconf.org/scopes-12/).

## 10.05.2012 - 10h, Room EN 642: REFLEX (Richard Weickelt)

REFLEX is a framework for deeply embedded control systems. It is based on the event-flow model, which strongly supports component-centric development of concurrent applications. Combined with multiple scheduling directives, interrupt handling and power management facilities, it enables developers to create applications that are both deadlock-free and fully predictable.

The library is implemented in C++ and benefits from the language's powerful features. Only a few parts are platform-dependent, and these can be ported to new architectures with very little effort. A standard compiler such as g++ is the only requirement.

REFLEX was developed at TU Cottbus and is released under the BSD license. This meeting gives a brief overview of the framework and its features. After a case study of a real-world product, future research challenges will be discussed.

## 12.04.2012 - 10h: Online satellite image processing (Kristian Manthey)

On 12.04.2012 at 10:00, Mr. Kristian Manthey will give a talk in our research meeting on the topic "Online satellite image processing (Realtime Image compression on reconfigurable Hardware)". Room: EN 642.

Abstract: Optical systems for spaceborne missions face challenging requirements. In recent years, both the spatial and the spectral resolution of the image data have increased, resulting in a tremendous increase in data rate. There are also requirements on image quality and constraints imposed by the environment in which the system is used: an optical system for spaceborne applications must offer very high reliability, low power consumption and low weight, must be radiation tolerant, and must operate in vacuum and over a wide temperature range. As the ground sample distance (GSD) decreases or the swath increases, the amount of data grows significantly. Because the transmission bandwidth to the ground station is limited, the data must be compressed. Depending on the mission requirements, lossless or lossy compression schemes can be used.

Image compression is based on removing redundant information from the image, such as spatial or statistical redundancy, or on removing information that is not needed for further processing. Image compression architectures consist of spatial decorrelation to remove spatial redundancy, followed (in the case of lossy compression) by quantization, and finally entropy coding to remove statistical redundancy. In typical space missions, spatial decorrelation is done by prediction (DPCM), the discrete cosine transform (DCT) or the discrete wavelet transform (DWT). To achieve the best compression results, inter-band decorrelation techniques are also necessary, because image data is correlated across spectral bands, especially when multispectral (MS) sensors are combined with a sensor that is sensitive in all MS channels. DLR plans to develop a satellite camera that performs all tasks (image acquisition, pre-processing, compression, storage, data formatting and communication with the ground station) on a single multi-chip module (MCM).
As a first step, image compression is to be performed directly on the image acquisition module. The goal of this thesis is to investigate scenarios in which the ground station interactively requests and decompresses the image data, and to develop a high-speed image compression system on the image acquisition module.
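Prediction-based (DPCM) decorrelation, one of the options named in the abstract, replaces each pixel by its difference from a predicted value; for smooth image content, the residuals cluster around zero and are cheaper to entropy-code. The sketch below assumes the simplest possible predictor (the left neighbour within a row, with 0 predicted for the first pixel) and lossless coding, and uses zeroth-order empirical entropy as a rough stand-in for the entropy coder. The predictors actually used in space missions or in the DLR system are not described here.

```python
import math
from collections import Counter


def dpcm_encode(image):
    """Lossless DPCM: each residual is the pixel minus its left neighbour."""
    residuals = []
    for row in image:
        prev = 0                      # assumed predictor for the first pixel
        out = []
        for px in row:
            out.append(px - prev)
            prev = px
        residuals.append(out)
    return residuals


def dpcm_decode(residuals):
    """Invert dpcm_encode by accumulating the residuals along each row."""
    image = []
    for row in residuals:
        prev = 0
        out = []
        for r in row:
            prev += r
            out.append(prev)
        image.append(out)
    return image


def entropy_bits(values):
    """Zeroth-order empirical entropy in bits per symbol."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

For the ramp row `[10, 11, 12, 13, 14, 15, 16, 17]`, the residuals are `[10, 1, 1, 1, 1, 1, 1, 1]`: eight distinct pixel values (3 bits per symbol) become a stream dominated by a single symbol, which an entropy coder can represent far more compactly.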

## 25-29 March 2012: M. Alvarez Mesa presents a paper at ICASSP-2012 in Kyoto, Japan

The paper "Parallel video decoding on the emerging HEVC standard" by M. Alvarez-Mesa, C. C. Chi, B. Juurlink, V. George and T. Schierl has been accepted at the 37th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), which will be held in Kyoto, Japan, on March 25-30, 2012. ICASSP is one of the largest technical conferences focused on signal processing and its applications. The paper, the result of a collaboration between the AES group and the Multimedia Communications Group of the Fraunhofer HHI Institute, will be presented by Mauricio Alvarez-Mesa in the session "Parallel and embedded signal processing systems". More information about ICASSP-2012 can be found at http://www.icassp2012.com.

## 26-27 March 2012: Prof. Dr. Ben Juurlink and Sean Halle present their progress in the LPGPU project in Cambridge

Prof. Dr. Ben Juurlink and Sean Halle are going to Cambridge for the first LPGPU face-to-face meeting on March 26 and 27. They will discuss the interactions between the work packages and the low-power industry space, and address open questions about the simulator. Each participant will present their progress in the first year of LPGPU in preparation for the first-year review.

## 25-29 Feb. 2012: Michael Andersch presents a poster at PPoPP in New Orleans.

The paper "Programming Parallel Embedded and Consumer Applications in OpenMP Superscalar" by Michael Andersch, Chi Ching Chi, and Ben Juurlink was accepted as a poster presentation at the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). Student Michael Andersch will present the poster in New Orleans, where the symposium takes place from February 25 to February 29, 2012. For more information about the PPoPP conference, see http://dynopt.org/ppopp-2012/.

## 01.01.2012 - 15.02.2012: EIT ICT Master School launches

The application period for the new Master School runs from January 1 to February 15, 2012. More information at eitictlabs.masterschool.eu

## 10.01.2011 - 16h: A System-Level Approach to Parallelism (Sean Halle)

Talk announcement: A System-Level Approach to Parallelism (Sean Halle)
Tuesday, January 11, 2011, at 16:00 in E-N 360.

## 31.03.2010: Courses offered in summer semester 2010

The courses offered by our group can be found in the Studies and Teaching section. We would particularly like to point out the Master's module "Advanced Computer Architectures" for computer science and computer engineering students, which is being offered for the first time this semester.

## 08.03.2010 - 10h: New architectures for the final scaling of the CMOS world (Professor Luigi Carro)

Talk announcement: New architectures for the final scaling of the CMOS world (Professor Luigi Carro). Monday, 08.03.2010, at 10:00 in FR5516.

## 17.02.2010 - 10h: Evaluation of Parallel H.264 Decoding Strategies on the Cell Broadband Engine (Mr. Chi Ching Chi)

Talk announcement: Evaluation of Parallel H.264 Decoding Strategies on the Cell Broadband Engine (Mr. Chi Ching Chi). Wednesday, 17.02.2010, at 10:00 in FR 3043.

## 12.01.2010: Oral examination in TechGI2 (second repeat examination)

Starting in the summer semester 2010, the module Technische Grundlagen der Informatik 2 (TechGI2) will be taken over by Prof. Juurlink, the new head of the group Architektur eingebetteter Systeme (AES). He will make some changes in how the contents prescribed in the module description are taught, and these changes will also be reflected in the examination questions.
The previous lecturer of the module, Mr. Flik, loses his authorization to examine at the end of the winter semester 2009/10, after which oral examinations on the previous contents will no longer be possible.
For students currently interested in such an oral examination, Mr. Flik offers examination dates until mid-March 2010. The examination days will be scheduled once the first examination requests have been received (flik(at)cs.tu-berlin.de). Please state your program of study, your matriculation number and your earliest preferred date.
The actual examination date will only be assigned once the examination registration required by the examination office has been submitted. This registration must be received at least 7 days before the examination date (in the secretariat of AES or RT).

## 27.11.2009: Professor Dr. Ben Juurlink accepts appointment.

Professor Dr. Ben Juurlink, professor at Delft University of Technology, the Netherlands, has accepted the appointment to the W3 professorship for the group Rechnerarchitektur – Architektur eingebetteter Systeme.