November 17th, 2014: Paper "An Efficient and Flexible FPGA Implementation of a Face Detection System" accepted at FPGA 2015 (poster session).
The paper "An Efficient and Flexible FPGA Implementation of a Face Detection System" by Hichem Ben Fakeh, Ahmed Elhossini, and Ben Juurlink has been accepted as a poster at FPGA 2015 in Monterey, California. The abstract will appear in the conference proceedings in February 2015. The paper proposes a hardware architecture based on the Viola-Jones object detection framework using Haar-like features. The proposed design detects faces in real time with high accuracy. Speed-up is achieved by exploiting parallelism in the design, to which multiple classifier cores can be added. To keep the design flexible, classifier cores can be assigned to different images. Moreover, by using different training data, each core can detect a different object type. The development platform is the Xilinx Zynq-7000 SoC, which combines a dual-core ARM Cortex-A9 CPU with programmable logic (FPGA). The current implementation focuses on face detection and achieves real-time detection at 16.53 FPS on 640x480-pixel images, a speed-up of 6.46 times over the equivalent OpenCV software solution.
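Haar-like features in the Viola-Jones framework are differences of pixel sums over adjacent rectangles, and the framework evaluates them on an integral image so that any rectangle sum costs four lookups regardless of its size. The minimal Python sketch below illustrates only that idea; it is not the hardware design from the paper, and the function names and the two-rectangle feature layout are illustrative assumptions.

```python
def integral_image(img):
    """Return ii with one row/column of zero padding:
    ii[y][x] = sum of img[r][c] for all r < y and c < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum above (rows 0..y-1) plus the running sum of row y.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_haar(ii, x, y, w, h):
    """Hypothetical two-rectangle feature: left w x h block minus the block to its right."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

A detector evaluates many such features per search window in a cascade of classifiers; because each feature needs only a handful of lookups, the cost per window stays constant as feature sizes grow.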
November 3rd, 2014: Paper "Low Power High Efficiency Video Decoding using General Purpose Processors" accepted in ACM TACO.
The paper "Low Power High Efficiency Video Decoding using General Purpose Processors" by Chi Ching Chi, Mauricio Alvarez-Mesa, and Ben Juurlink has been accepted in ACM Transactions on Architecture and Code Optimization (TACO). The paper will appear in a future regular issue and will also be presented at the HiPEAC conference in Amsterdam in January 2015. ACM TACO is a well-known journal covering hardware, software, and system research spanning computer architecture and code optimization. In this paper, the authors investigate how code optimization techniques and the low-power states of general-purpose processors improve the power efficiency of HEVC decoding. To this end, the power and performance efficiency of SIMD instructions, multicore architectures, and low-power active and idle states are analyzed in detail for offline video decoding. In addition, the power efficiency of techniques such as "race to idle" and "exploiting slack" with DVFS is evaluated for real-time video decoding. Results show that "exploiting slack" is more power-efficient than "race to idle" on all evaluated platforms, which represent smartphones, tablets, laptops, and desktops.
November 3rd, 2014: New group member Daniele Bortolotti.
We are pleased to welcome Daniele Bortolotti as a new member of our group. He will contribute to the research of the AES team.
October 23rd, 2014: Kick-off event of the third EIT ICT Labs Master School in Budapest, Hungary.
Ahmed Elhossini will attend the kick-off event of the third EIT ICT Labs Master School in Budapest. The event will take place from October 23rd to 25th, 2014. The new 2014/15 students of the EIT ICT Labs Master School will gather in Budapest for a kick-off event that marks the start of their Master School studies. For three days, students from 19 leading European universities will have the opportunity to meet and collaborate with fellow students across technical majors, universities and national borders.
For the last three years, Prof. Ben Juurlink has coordinated the local implementation of the EIT ICT Labs Master School at TU Berlin as well as the "Embedded Systems" technical major. Ahmed is a member of the selection committee of the "Embedded Systems" technical major at TU Berlin and will take part in the event to welcome the new students and to attend technical meetings about the future of the program.
For more information about the event please visit:
October 21st, 2014: Seminar "Recent Advances in Computer Architecture" starts.
The seminar "Recent Advances in Computer Architecture" takes place on Tuesdays from 10:15 to 11:45 in room EN 642. Due to a misunderstanding, the TU Berlin module directory listed October 21 as the start date; the first meeting has therefore been postponed to October 21. You are cordially invited to attend this seminar.
October 17th, 2014: Paper "A new real-time system for image compression on-board satellites" accepted at the OBPDC Workshop.
The paper "A new real-time system for image compression on-board satellites" by Kristian Manthey, David Krutz and Ben Juurlink has been accepted at the "On-Board Payload Data Compression Workshop" (OBPDC).
Remote sensing is used in applications ranging from Earth sciences, archaeology, intelligence and change detection to planetary research and astronomy. Disaster management after floods or earthquakes, detection of environmental pollution and fire detection are just a few of countless applications. Both the spatial and the spectral resolution of satellite image data increase steadily with new technologies and user requirements, resulting in higher precision and new application scenarios. In the future, it will be possible to derive application-specific information in real time on board the satellite, even from high-resolution images. On the technical side, such systems must handle a tremendous increase in data rate. While memory capacity requirements can still be met, the limited transmission capacity is becoming increasingly problematic.
In this paper, an image compression architecture with region-of-interest support and flexible access to the compressed data, based on the CCSDS 122.0-B-1 image data compression standard, is presented. Modifications to the standard permit changing compression parameters and reorganizing the bit-stream after compression. An additional index of the compressed data is created, which makes it possible to locate individual parts of the bit-stream. On request, stored images can be re-assembled according to the application's needs and the ground station's requests. Interactive transmission of the compressed data is possible, so that overview images can be transmitted first, followed by detailed information for the regions of interest (ROIs).
October 9th, 2014: Paper "A parallel H.264/SVC Encoder for High Definition Video Conferencing" accepted in "Signal Processing: Image Communication".
The paper "A parallel H.264/SVC Encoder for High Definition Video Conferencing" by Sergio Sanz-Rodríguez, Mauricio Alvarez-Mesa, Tobias Mayer, and Thomas Schierl has been accepted in "Signal Processing: Image Communication". The article will appear in a future regular issue. "Signal Processing: Image Communication" is an international journal on the design, implementation and use of image communication systems and video codecs. The paper proposes a video encoder specifically developed and configured for high definition (HD) video conferencing. The encoder combines three features: H.264/Scalable Video Coding (SVC), parallel encoding on multicore platforms, and parallel-friendly rate control. SVC guarantees a minimum quality of service to every receiving end user over Internet Protocol networks. Parallel encoding achieves real-time execution by combining slice-level parallelism in the main encoding loop with block-level parallelism in the upsampling and interpolation filtering stages. The rate control ensures proper delivery of HD video content under given bit-rate and end-to-end delay constraints. The experimental results show that the proposed H.264/SVC encoder operates in real time over a wide range of target bit rates, at the cost of reasonable losses in rate-distortion efficiency caused by partitioning frames into slices.
October 8th, 2014: Ben Juurlink @ HiPEAC CS Week in Athens.
Prof. Ben Juurlink is currently attending the HiPEAC Computing Systems Week in Athens (Greece). As always, it consists of exciting keynotes and thematic sessions, and includes the Third HiPEAC Industry Partner Program (HIPP) Event as well as a social programme culminating in a guided tour of, and a dinner in, the Acropolis Museum. The program can be found at www.hipeac.net/csw/2014/athens.
October 6th, 2014: Paper "SIMD Acceleration for HEVC Decoding" accepted in IEEE TCSVT
The paper "SIMD Acceleration for HEVC Decoding" by Chi Ching Chi, Mauricio Alvarez-Mesa, Benjamin Bross, Ben Juurlink, and Thomas Schierl has been accepted in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT). The paper will appear in a future regular issue. Since its inception in 1991, TCSVT has been the premier venue for publications on video technology such as video compression, hardware, and systems. Over the past decade its scope has expanded, and it is now the premier journal in all areas related to video technology and video systems, including video analysis and processing. SIMD instructions are commonly used to accelerate video codecs. Like its predecessors, the recently introduced HEVC codec is based on the hybrid video coding principle and is therefore also well suited to SIMD acceleration. In this paper we present SIMD optimizations of the entire HEVC decoder for all major SIMD ISAs. The evaluation covers 14 mobile and PC platforms, spanning most major architectures released in recent years. With SIMD, a speedup of up to 5x is achieved over the entire HEVC decoder, resulting in up to 133 fps and 37.8 fps on average on a single core for Main profile 1080p and Main10 profile 2160p sequences, respectively.
Call for Papers - ARCS 2015.
The 28th GI/ITG International Conference on Architecture of Computing Systems will be held from March 24, 2015 through March 27, 2015.
| Milestone | Date |
|---|---|
| Paper submission deadline | October 6, 2014 |
| Workshop and tutorial proposals | November 3, 2014 |
| Notification of acceptance | December 1, 2014 (expected) |
| Camera-ready papers | December 15, 2014 |
More information about ARCS 2015 can be found at www.cister.isep.ipp.pt/arcs2015/.
October 1st, 2014: Sohan Lal visiting TU Eindhoven for collaboration.
Sohan Lal will visit TU Eindhoven, Netherlands, from October 1st until the end of the year for collaborative work. The collaboration is funded by the HiPEAC Network of Excellence.
Sohan Lal will work on the problem of memory divergence in GPUs together with the Electronic Systems Group of TU Eindhoven, led by Prof. Henk Corporaal.
GPUs are high-throughput processors designed to tolerate long latencies by switching among thousands of threads. However, memory bandwidth is usually a bottleneck.
One of the problems caused by memory divergence is data over-fetch, which wastes memory bandwidth. In this collaborative work, the problems caused by memory divergence will be studied and possible solutions will be explored.
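Memory divergence arises when the threads of a warp access addresses that fall into many different memory segments: the hardware then fetches whole segments of which only a few bytes are used. The minimal Python model below illustrates that over-fetch under simplifying assumptions (32 threads per warp, aligned 4-byte accesses, 128-byte segments); these values are illustrative and are not parameters of the planned study.

```python
def transactions_and_overfetch(addresses, access_bytes=4, segment_bytes=128):
    """Model one warp-wide load.

    Returns the number of memory segments fetched and the fraction of fetched
    bytes that no thread actually requested (over-fetch).
    Assumes every access is aligned and never straddles a segment boundary.
    """
    # Each distinct segment touched by the warp costs one memory transaction.
    segments = {addr // segment_bytes for addr in addresses}
    fetched_bytes = len(segments) * segment_bytes
    # Duplicate addresses (several threads reading the same word) count once.
    useful_bytes = len(set(addresses)) * access_bytes
    return len(segments), 1 - useful_bytes / fetched_bytes


# 32 threads reading consecutive 4-byte words: fully coalesced.
coalesced = transactions_and_overfetch([4 * tid for tid in range(32)])
# 32 threads with a 128-byte stride: every thread touches its own segment.
divergent = transactions_and_overfetch([128 * tid for tid in range(32)])
print(coalesced, divergent)
```

In this model the coalesced pattern needs a single transaction with no wasted bytes, while the strided pattern needs 32 transactions and discards 31/32 of the fetched data, which is the bandwidth waste described above.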
August 20th, 2014: Mr. Tamer Dallou presents a paper at HPCC 2014, Paris, France.
Mr. Tamer Dallou is presenting the paper "An Integrated Hardware-Software Approach to Task Graph Management" by Nina Engelhardt, Tamer Dallou, Ahmed Elhossini and Ben Juurlink at HPCC 2014 on August 20, 2014.
HPCC-2014 is the 16th IEEE International Conference on High Performance Computing and Communications. It will provide a forum for engineers and scientists in academia, industry, and government to address the profound challenges in this field and to present and discuss their new ideas, research results, applications and experience on all aspects of high performance computing and communications. HPCC-2014 is sponsored by IEEE, the IEEE Computer Society, and the IEEE Technical Committee on Scalable Computing (TCSC). It will take place in Paris, France, on August 20-22, 2014. The conference program and more information can be found at conference.hpcc2014.studiocheik.fr
August 11th, 2014: Mauricio Álvarez-Mesa gives an invited course at Universidad del Valle (Colombia).
From August 11th to 15th, Dr. Mauricio Álvarez-Mesa will give a short course on "High Performance Video Coding" at the Universidad del Valle in Cali, Colombia. In this course, Dr. Álvarez-Mesa will present the latest developments in video coding as well as the AES group's research on optimized implementations of HEVC/H.265. Dr. Álvarez-Mesa will also meet with professors and graduate students to discuss ongoing collaborations between the "School of Computer Engineering" of Universidad del Valle and the AES group of TU Berlin.
July 31st, 2014: Nina Engelhardt goes to the University of Hong Kong.
Nina Engelhardt leaves the AES team and will continue her research work at the University of Hong Kong. We are grateful for her good work and wish her all the best for the future.
July 23rd, 2014: Gervin Thomas successfully completed his PhD defence.
Dipl.-Ing. Gervin Thomas successfully defended his PhD thesis on Wednesday, July 23rd, 2014. The title of his thesis is "A Generic Implementation of a Quantified Predictor Applied to a DRAM Power-Saving Policy". Gervin is the first to obtain a PhD in the AES group since Prof. Juurlink became chair of the group in January 2010. Congratulations on your success, Dr. Thomas, and we wish you all the best for the future.
July 22nd, 2014: Oracle Labs visits AES Group.
Eric Sedlar and Michael Haupt from Oracle Labs visited the AES group on July 22nd. Eric Sedlar is Vice President and Technical Director of Oracle Labs. Members of the AES group presented their work and engaged in lively discussions. Afterwards, Eric Sedlar gave his talk "How I Learned to Stop Worrying and Love Compilers".
July 9th, 2014: Paper "Parallel H.264/AVC Motion Compensation for GPUs using OpenCL" accepted in IEEE TCSVT.
The paper "Parallel H.264/AVC Motion Compensation for GPUs using OpenCL" by Biao Wang, Mauricio Alvarez Mesa, Chi Ching Chi and Ben Juurlink has been accepted in IEEE Transactions on Circuits and Systems for Video Technology. It will appear soon after the camera-ready manuscript has been submitted to the IEEE CASS Publications Office.
IEEE Transactions on Circuits and Systems for Video Technology covers all aspects of visual information that relate to video or have the potential to impact future developments in video technology and video systems, including but not limited to: image/video processing, image/video analysis and computer vision, image/video compression, image/video communication, image/video storage, image/video hardware/software systems, and image/video applications.
July 7th, 2014: Poster at PUMPS 2014.
A poster from TU Berlin entitled "On the Potential and Shortcomings of Temporal SIMT GPUs" was selected for the PUMPS 2014 poster session. Jan Lucas from TU Berlin will present it there.
The fifth edition of the Programming and Tuning Massively Parallel Systems summer school (PUMPS) aims to enrich the skills of researchers, graduate students and teachers with cutting-edge techniques and hands-on experience in developing applications for many-core processors with massively parallel computing resources such as GPU accelerators.
June 26th, 2014: Paper "An Integrated Hardware-Software Approach to Task Graph Management" accepted at HPCC 2014.
The paper titled "An Integrated Hardware-Software Approach to Task Graph Management" by Nina Engelhardt, Tamer Dallou, Ahmed Elhossini and Ben Juurlink has been accepted at HPCC 2014.
HPCC-2014 is the 16th IEEE International Conference on High Performance Computing and Communications. It will provide a forum for engineers and scientists in academia, industry, and government to address the profound challenges in this field and to present and discuss their new ideas, research results, applications and experience on all aspects of high performance computing and communications. HPCC-2014 is sponsored by IEEE, the IEEE Computer Society, and the IEEE Technical Committee on Scalable Computing (TCSC). It will take place in Paris, France, on August 20-22, 2014. More information can be found at conference.hpcc2014.studiocheik.fr.
Paper Abstract: Task-based parallel programming models with explicit data dependencies, such as OmpSs, are gaining popularity due to the ease of describing parallel algorithms with complex and irregular dependency patterns. These advantages, however, come at the steep cost of runtime overhead incurred by dynamic dependency resolution. Hardware support for task management has been proposed in previous work as a possible solution. We present VSs, a runtime library for the OmpSs programming model that integrates the Nexus++ hardware task manager, and evaluate the performance of the VSs-Nexus++ system. Experimental results show that applications with fine-grain tasks can achieve speedups of up to 3.4x, while applications optimized for current runtimes attain 1.3x. Providing support for hardware task managers in runtime libraries is therefore a viable approach to improve the performance of OmpSs applications.
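Dynamic dependency resolution means that, whenever a task is submitted, the runtime compares its declared inputs and outputs with those of earlier tasks to derive the edges of the task graph. The Python sketch below shows this bookkeeping in its simplest form, loosely modeled on in/out annotations of the kind OmpSs uses; the class and method names are hypothetical, and the code is neither VSs nor Nexus++.

```python
from collections import defaultdict


class DependencyTracker:
    """Hypothetical software resolver for tasks with declared inputs/outputs.

    Builds task-graph edges for read-after-write (RAW), write-after-read (WAR)
    and write-after-write (WAW) hazards on named data items.
    """

    def __init__(self):
        self.last_writer = {}             # data item -> task that wrote it last
        self.readers = defaultdict(list)  # data item -> readers since last write
        self.edges = set()                # (predecessor, successor) pairs

    def submit(self, task, inputs=(), outputs=()):
        for item in inputs:
            # RAW: a reader must wait for the most recent writer.
            if item in self.last_writer:
                self.edges.add((self.last_writer[item], task))
            self.readers[item].append(task)
        for item in outputs:
            # WAR: a writer must wait for all readers of the previous value.
            for reader in self.readers[item]:
                if reader != task:
                    self.edges.add((reader, task))
            # WAW: writes to the same item stay in program order.
            if item in self.last_writer and self.last_writer[item] != task:
                self.edges.add((self.last_writer[item], task))
            self.last_writer[item] = task
            self.readers[item] = []


tracker = DependencyTracker()
tracker.submit("A", outputs=["x"])
tracker.submit("B", inputs=["x"], outputs=["y"])
tracker.submit("C", inputs=["x"])
tracker.submit("D", outputs=["x"])
print(sorted(tracker.edges))
```

Every `submit` call performs these lookups on the critical path of task creation. This per-task cost is the kind of overhead that, according to the abstract, hurts fine-grain tasks most and that a hardware task manager aims to take over.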
June 2014: Michael Andersch signs contract with Nvidia.
Michael Andersch, a graduate student in the AES group, has signed a contract with Nvidia. Nvidia Corporation is an American global technology company based in Santa Clara, California. Nvidia designs graphics processing units (GPUs) as well as systems-on-a-chip (SoCs) for the mobile computing market. For several years, Michael has been a research assistant in the AES group, first contributing to the ENCORE project and then to the FP7 European project LPGPU (www.lpgpu.org). Michael will first be based in Nvidia's Berlin office before moving to sunny California. Although we are a bit sad to lose Michael to Nvidia, we wish him all the best for the future.
June 12th, 2014: A TETRACOM Technology Transfer Project has been awarded to AES - TU Berlin and Think Silicon.
A technology transfer project called "eGPU accelerated HEVC/H.265 video decoder" has been awarded to AES TU Berlin and Think Silicon. The project is financed by TETRACOM (Technology Transfer in Computing Systems), a coordination action funded by the European Commission under the FP7 program. In this project, researchers from AES TU Berlin and Think Silicon will work together on the design of an HEVC/H.265 video decoder optimized for the Think Silicon embedded GPU (eGPU). The main goal is a complete solution for embedded video applications that require very low power consumption. The project will run for four months, starting September 1st, 2014. If the results are positive, TU Berlin and Think Silicon will work together on commercializing the resulting products.
The AES group of TU Berlin (www.aes.tu-berlin.de) conducts research on computer architecture, ranging from low-power embedded systems to massively parallel high-performance systems. One of its main research lines is the efficient mapping of video (de)coding applications onto parallel computing systems. The group has developed an ultra-fast HEVC/H.265 decoder optimized for multicore architectures.
Think Silicon (www.think-silicon.com) was founded in 2007 and specializes in designing and developing mobile computer graphics solutions for low-end and mid-range portable devices. In 2014, Think Silicon released a new embedded graphics processing unit called the Nema GPU. The Nema GPU is a scalable, many-core, multi-threaded, state-of-the-art data processing design that blends graphics rendering and general-purpose computing capabilities.
June 6th, 2014: HiPEAC collaboration grant to Sohan Lal.
HiPEAC has accepted Sohan Lal's proposal for a three-month collaboration between the Electronic Systems Group of TU Eindhoven, Netherlands, and the Embedded Systems Architecture Group of TU Berlin, Germany.
Sohan Lal will work on the problem of memory divergence in GPUs together with the Electronic Systems Group, led by Prof. Henk Corporaal.
Memory divergence is one of the key performance bottlenecks for high performance computing on GPUs. Both the Embedded Systems Architecture Group and the Electronic Systems Group have extensive experience with GPUs, and the collaboration is expected to yield high-quality joint publications.
May 20th, 2014: Paper accepted in the Scientific Programming journal.
The paper "TACO: A Scheduling Scheme for Parallel Applications on Multicore Architectures" by Jan Schönherr and Jan Richling of the Operating Systems research group of TU Berlin and Ben Juurlink of AES has been accepted for publication in the Scientific Programming journal by IOS Press.
May 15th, 2014: ISCA travel grant to Jan Lucas.
Jan Lucas received a travel grant from IEEE TCCA to attend ISCA 2014 and present his work on approximate storage in DRAM at "The Memory Forum" workshop.
The 41st International Symposium on Computer Architecture (ISCA) is the premier forum for new ideas and experimental results in computer architecture. ISCA 2014 will be held in Minneapolis, Minnesota during June 14-18, 2014.
The IEEE Computer Society Technical Committee on Computer Architecture (TCCA) is involved with research and development in the integrated hardware and software design of general- and special-purpose uniprocessors and parallel computers. TCCA annually sponsors/cosponsors the International Symposium on Computer Architecture, and with the ACM SIGARCH, it jointly administers the Eckert-Mauchly Award for contributions to computer architecture. TCCA also helps organize special issues of society periodicals and publishes a newsletter periodically, which contains meeting reports, abstracts of technical reports, calls for papers, and other announcements.
Apr. 30th, 2014: "Sparkk: Quality-Scalable Approximate Storage in DRAM" accepted at "The Memory Forum".
The paper "Sparkk: Quality-Scalable Approximate Storage in DRAM" by Jan Lucas, Mauricio Alvarez Mesa, Michael Andersch and Ben Juurlink has been accepted at The Memory Forum. The Memory Forum 2014 will be held in conjunction with the 41st International Symposium on Computer Architecture (ISCA-41). The paper presents a novel technique for improved approximate storage in DRAM.
Apr. 14th, 2014: Paper "GPGPU Workload Characteristics and Performance Analysis" accepted at SAMOS 2014.
The paper "GPGPU Workload Characteristics and Performance Analysis" by Sohan Lal, Jan Lucas, Michael Andersch, Mauricio Alvarez-Mesa, Ahmed Elhossini and Ben Juurlink has been accepted at SAMOS 2014.
The International Conference on Embedded Computer Systems: Architectures, MOdeling, and Simulation (SAMOS) was established in 2001 by Prof. Stamatis Vassiliadis, Prof. Ed Deprettere, and Dr. Andy Pimentel as a Dutch research seminar held in the tiny town of Agios Konstantinos on the small Greek island of Samos in the Aegean Sea. Since 2001, many scientists from academia and industry have been involved every year in the various aspects of organizing the conference. More information about SAMOS 2014 can be found at http://samos-conference.com/.
Wanted: Student assistant (41 hours per month) with teaching duties.
Reference number: 3434 T 25/14
Duration of employment: expected from April 1, 2014 to March 31, 2016
Duties: teaching support in the Bachelor's program; supervising and preparing the following courses: TechGI1: Digitale Systeme (Digital Systems), TechGI2/TechGI2TI: Rechnerorganisation (Computer Organization), Hardware Praktikum (hardware lab course)
Requirements: Bachelor's studies in Computer Engineering (Technische Informatik) or Computer Science, completion of the 3rd semester and of the modules TechGI1 and TechGI2 or equivalent, good English and VHDL skills, willingness to become familiar with new topics
Please send your written application with CV, certificate of enrollment and, if available, a current transcript of records to:
Technische Universität Berlin
Faculty IV - Electrical Engineering and Computer Science
Institute of Computer Engineering and Microelectronics (TIME)
Embedded Systems Architecture (AES)
Prof. Dr. B. Juurlink
or by e-mail: sekr
Feb. 28th, 2014: Paper accepted for "Informatiktage 2014" by the German Informatics Society.
Philipp Habermann's paper "Design and Implementation of a High-Throughput CABAC Hardware Accelerator for the HEVC Decoder" was accepted for the proceedings of the "Informatiktage 2014" conference by the GI (German Informatics Society), which will be held on March 27-28, 2014 in Potsdam, Germany.
Feb. 27th, 2014: Paper accepted by the German Informatics Society.
The paper "A High-Performance Hardware Accelerator for HEVC Motion Compensation" by Matthias Göbel has been accepted for the proceedings of the "Informatiktage 2014" conference by the GI (German Informatics Society), which will be held on March 27-28, 2014 in Potsdam, Germany.
Feb. 10th, 2014: Paper accepted at GLSVLSI'14.
The paper "A Generic Implementation of a Quantified Predictor for FPGAs" by Gervin Thomas, Ahmed Elhossini and Ben Juurlink has been accepted for oral presentation at the 24th edition of GLSVLSI in Houston, Texas, USA, May 21-23. More information about GLSVLSI'14 can be found at http://www.glsvlsi.org/.
Wanted: Student assistant (41 hours per month).
Reference number: 3434 T 16/14
Duration of employment: expected from April 1, 2014 to March 31, 2016
Duties: teaching support in the Bachelor's program; supervising and preparing the following courses: TechGI1: Digitale Systeme (Digital Systems), TechGI2/TechGI2TI: Rechnerorganisation (Computer Organization), Hardware Praktikum (hardware lab course)
Requirements: Bachelor's studies in Computer Engineering (Technische Informatik) or Computer Science, completion of the 3rd semester and of the modules TechGI1 and TechGI2 or equivalent, good English and VHDL skills, willingness to become familiar with new topics
Please send your written application with CV, certificate of enrollment and, if available, a current transcript of records to:
Technische Universität Berlin
Faculty IV - Electrical Engineering and Computer Science
Institute of Computer Engineering and Microelectronics (TIME)
Embedded Systems Architecture (AES)
Prof. Dr. B. Juurlink
Feb 10th, 2014: Mauricio Álvarez-Mesa gives an invited talk at University of Castilla-La-Mancha, Albacete, Spain.
Dr. Mauricio Álvarez-Mesa will give an invited talk at the Department of Computer Science of the University of Castilla-La-Mancha in Albacete, Spain, on Monday, February 10th. The title of the talk is "Parallel Video Decoding: Experiences with H.264 and HEVC". In this talk, Dr. Álvarez-Mesa will present the latest results of the AES research group on video decoding using parallel architectures. Dr. Álvarez-Mesa will also meet researchers from the Computer Architecture and Technology group at the Albacete Research Institute of Informatics (I3A) to discuss possible joint projects.
Wanted: Research assistant (m/f), salary grade 13 TV-L Berliner Hochschulen, for a maximum of 5 years (with the opportunity to pursue a PhD)
Reference number: IV - 13/14 (position available from March 1, 2014; application deadline: February 23, 2014)
Jan. 15th, 2014: AES group at the HiPEAC 2014 conference.
A delegation of the AES group will participate in the HiPEAC 2014 conference, which will be held in Vienna from January 20th to January 22nd, 2014. AES participation includes three presentations at the LPGPU Workshop on Power-Efficient GPU and Many-core Computing (PEGPUM 2014). The workshop is organized by members of the LPGPU European project, including the AES group from TUB. The presentations from the AES group are: "Power and Energy Efficiency of Video Decoding on Multi-core Architectures" by Chi Ching Chi, "DART: A Decoupled Architecture Exploiting Temporal SIMD" by Jan Lucas, and "Parallel H.264/AVC Motion Compensation for GPUs using OpenCL" by Biao Wang (who received a PhD student registration grant from the HiPEAC 2014 organizing committee). More information about the PEGPUM workshop can be found at lpgpu.org/wp/pegpum-2014/ and about the HiPEAC conference at www.hipeac.net/conference/vienna.
Jan. 7, 2014: Paper accepted at MULTIPROG.
The paper "Considering Quality-of-Service for Resource Reduction using OpenMP" has been accepted for presentation at the Seventh Workshop on Programmability Issues for Heterogeneous Multicores (MULTIPROG-2014), to be held in conjunction with the 9th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC) in Vienna, Austria, on January 22, 2014. The paper is the result of a collaboration between Artur Podobas, Mats Brorsson and Vladimir Vlassov from KTH in Stockholm, Sweden, and Chi Ching Chi and Ben Juurlink from the AES group of TU Berlin. More information about the workshop can be found at http://multiprog.ac.upc.edu.
Jan. 2, 2014: AES publishes its H.264 OpenCL decoder.
To harness GPUs for massively parallel processing, this work offloads the parallel kernels of H.264 decoding, namely inverse transform and motion compensation, to GPUs. At kernel level, a significant speedup is observed compared to a highly optimized CPU SIMD implementation.
For more information and downloads, see our page on "High performance video coding".
Dec. 17, 2013: Visit to Technische Universität Dresden.
Guilherme Calandrini, a visiting PhD student, visited TU Dresden to give a presentation entitled "Performance Portability and Energy Issues in Computing Architectures" to the Operating Systems and Security group of Prof. Dr. Hermann Härtig at the Faculty of Computer Science. The group has a strong background in systems development, such as the Linux kernel and virtualization. The aim of the visit was to present the issues involved in developing energy-efficient applications, which must deal with several layers of the computing stack (from circuit level and architecture design to the operating system and possibly the virtual machine). During the visit, he also had the opportunity to introduce the LPGPU project to Prof. Dr. Emil Matus of the Vodafone Chair Mobile Communications Systems, who works on a heterogeneous SoC for communication applications.
For further information about the talk, see os.inf.tu-dresden.de/EZAG/abstracts/abstract_20131217.xml
Dec 5, 2013, 10:30, EN 642: Real time reconfigurable system for traffic sign detection - Merten Sach.
Dec 5, 2013, 10:00, EN 642: Multi-protocol masters for Ethernet-based fieldbuses - Victor Kozhukhov
Modern CNC controllers use special Ethernet-based protocols to control slave devices in real time. A CNC controller from Schleicher Electronic that can already operate as a Sercos III master is extended with the functionality of an EtherCAT master. Both protocols are to be usable over the same Ethernet ports of the CNC controller. Users should be able to decide for themselves whether the CNC controller operates as a Sercos III master or as an EtherCAT master. Switching protocols should require as little effort as possible; in particular, hardware changes (except for the contents of the programmable logic) are to be avoided. The CNC controller uses a dual MAC optimized specifically for Sercos III masters. When the software starts, the dual MAC of the Sercos III master is loaded as an IP core onto an FPGA integrated into the CNC controller. To allow the CNC controller to be used as an EtherCAT master, a suitable dual MAC is developed. Thus, at software start-up it can be decided whether the dual MAC for the Sercos III master or the dual MAC for the EtherCAT master is loaded onto the FPGA. The dual PHY, which is integrated into the CNC controller and connected to the FPGA, is suitable for both protocols.
In addition, the software must be adapted so that the CNC controller can operate as an EtherCAT master. The EtherCAT master protocol stack is built with an EtherCAT master high-level driver from Acontis Technologies, which is provided as a precompiled library. The application-specific software can include the high-level driver and use it through a corresponding API. To use the high-level driver together with the custom-developed dual MAC for the EtherCAT master, an Ethernet hardware abstraction layer is implemented.
Dec 3, 2013: Vice chancellor of Politecnico di Milano visits AES group.
On Tuesday, December 3, the vice chancellor of TU Berlin's partner university Politecnico di Milano, Prof. Donatella Sciuto, will visit the AES group. Mrs. Sciuto is a full professor in Computer Engineering at the Dipartimento di Elettronica e Informazione of the Politecnico di Milano. She is Deputy Director of Education at CEFRIEL, where she manages the executive education training programs for companies. For more information about Prof. Sciuto and her research interests, visit her website.
Dec 2, 2013, 13h: Crown Scheduling: Energy-Efficient Resource Allocation, Mapping and Discrete Frequency Scaling for Collections of Malleable Streaming Tasks - Prof. Dr. Christoph Kessler.
Time: 1:00 PM, 2 December 2013
Place: 4.064, MAR Building
We investigate the problem of generating energy-optimal code for a collection of streaming tasks that include parallelizable or malleable tasks on a generic manycore processor with dynamic discrete frequency scaling. Streaming task collections differ from classical task sets in that all tasks are running concurrently, so that cores typically run several tasks that are scheduled round-robin at user level in a data-driven way. A stream of data flows through the tasks and intermediate results are forwarded on-chip to other tasks.
In this presentation we introduce Crown Scheduling, a novel technique for the combined optimization of resource allocation, mapping and discrete voltage/frequency scaling for malleable streaming task sets in order to optimize energy efficiency given a throughput constraint. We present optimal off-line algorithms for separate and integrated crown scheduling based on integer linear programming (ILP). Our energy model considers both static idle power and dynamic power consumption of the processor cores.
Our experimental evaluation of the ILP models for a generic manycore architecture shows that, at least for small and medium-sized task sets, even the integrated variant of crown scheduling can be solved to optimality by a state-of-the-art ILP solver within a few seconds.
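The crown structure and the throughput-constrained energy objective can be illustrated with a small Python sketch. It assumes p = 2^k cores, perfect speedup of malleable tasks, dynamic power proportional to f^3 and a constant static power per core; these are simplifications, not the exact energy model of the paper. The exhaustive search stands in for the ILP formulation and is only practical for toy instances, and the ordering of tasks within a period is ignored.

```python
import itertools


def crown_groups(p):
    """Core groups of a crown for p = 2**k cores, indexed like a binary heap:
    group 1 holds all cores, and the children of group i are 2*i and 2*i + 1."""
    groups = {}

    def build(i, cores):
        groups[i] = cores
        if len(cores) > 1:
            half = len(cores) // 2
            build(2 * i, cores[:half])
            build(2 * i + 1, cores[half:])

    build(1, tuple(range(p)))
    return groups  # 2*p - 1 groups in total


def evaluate(schedule, tasks, groups, p, deadline, p_static=1.0, c_dyn=1.0):
    """schedule[j] = (group id, frequency) of task j, whose work is tasks[j] cycles.
    Returns the energy per period, or None if some core exceeds the deadline."""
    load = [0.0] * p
    energy = 0.0
    for (gid, f), work in zip(schedule, tasks):
        cores = groups[gid]
        t = work / (len(cores) * f)  # assumption: perfect speedup on the group
        for c in cores:
            load[c] += t
        energy += len(cores) * t * c_dyn * f ** 3  # assumed dynamic power ~ f^3 per core
    if max(load) > deadline:
        return None  # throughput constraint violated
    return energy + p * p_static * deadline  # static power of all cores over the period


def best_crown_schedule(tasks, p, freqs, deadline):
    """Exhaustive search over (group, frequency) choices; a stand-in for the ILP."""
    groups = crown_groups(p)
    options = [(g, f) for g in groups for f in freqs]
    best = None
    for schedule in itertools.product(options, repeat=len(tasks)):
        e = evaluate(schedule, tasks, groups, p, deadline)
        if e is not None and (best is None or e < best[0]):
            best = (e, schedule)
    return best
```

Under this simplified model the dynamic energy of a task is work * f^2 regardless of group width, so the search favours the lowest frequency that still meets the deadline and wider groups only help feasibility; with realistic parallel efficiency below one, the width choice also affects energy.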
We conclude with a short outlook to the new EU FP7 project EXCESS (Execution Models for Energy-Efficient Computing Systems).
This is joint work with Nicolas Melot (Linköping University), Patrick Eitschberger and Jörg Keller (FernUniv. in Hagen, Germany). Partly funded by VR, SeRC, and CUGS.
The talk is based on our recent paper with the same title at the Int. Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS-2013), Sep. 2013, Karlsruhe, Germany.
Christoph W. Kessler (german spelling: Keßler) is a professor for Computer Science at Linköping University, Sweden, where he leads the Programming Environment Laboratory's research group on compiler technology and parallel computing. Christoph Kessler received a PhD degree in Computer Science in 1994 from the University of Saarbrücken, Germany, and a Habilitation degree in 2001 from the University of Trier, Germany.
In 2001 he joined Linköping University, Sweden, as associate professor at the programming environments lab (PELAB) of the computer science department (IDA).
In 2007 he was appointed full professor at Linköping University. His research interests include parallel programming, compiler technology, code generation, optimization algorithms, and software composition. He has published two books, several book chapters and more than 90 scientific papers in international journals and conferences. His contributions include the OPTIMIST retargetable optimizing integrated code generator for VLIW and DSP processors, the PARAMAT approach to pattern-based automatic parallelization, the concept of performance-aware parallel components for optimized composition, the PEPPHER component model and composition tool for heterogeneous multicore/manycore based systems, the SkePU library of tunable generic components for GPU-based systems, and the parallel programming languages Fork and NestStep.
27-28 Nov. 2013: Ben Juurlink gives keynote presentation at ICT.OPEN 2013.
Ben Juurlink has been invited to give a keynote presentation in the Embedded Systems track of ICT.OPEN 2013. ICT.OPEN is the principal ICT research conference in the Netherlands and is held on 27-28 November in Eindhoven. The title of his talk is "Lessons Learnt From Parallelizing Video Decoding". More information about the conference can be found at www.ictopen2013.nl/content/speakers.
21.11.13, 10h, EN 642: Manycore Agent-Oriented Programming (MAOP) - Silvano Menk and Robert Hering
In our presentation, we want to give a short overview of our bachelor thesis. First, we will briefly discuss the current state of parallel programming, with a special focus on manycore architectures. From this, we will derive our idea for what we hope is an intuitive and efficient programming model for manycore architectures, which will be the subject of our thesis. Finally, we will propose a rough work plan and hope for some initial feedback and suggestions.
November 18-19 2013. Fusing GPU Kernels at HiPEAC Compiler, Architecture and Tools Conference.
A presentation based on research undertaken by Codeplay and TU Berlin’s AES group as part of the LPGPU project will be given at this year’s HiPEAC Compiler, Architecture and Tools Conference in Haifa, Israel. The talk is titled “Fusing GPU kernels within a novel single-source C++ API” and will be presented by Paul Keir from Codeplay.
More information about the conference can be found at: http://software.intel.com/en-us/articles/compilerconf2013
Abstract of the talk:
In research papers, GPU kernel fusion is often presented as a standalone command-line tool. Such a tool adopts a usage pattern in which a user isolates, or annotates, an ordered set of kernels. Given such OpenCL C kernels as input, the tool outputs a single kernel that performs similar calculations, thereby minimising costly intermediate load and store operations at runtime. Such a mode of operation is, however, a departure from normal practice for many developers, and is mainly of academic interest.
Automatic compiler-based kernel fusion could provide a vast improvement to the end-user's development experience. The OpenCL Host API, however, does not provide a means to specify opportunities for kernel fusion to the compiler. Ongoing and rapidly maturing compiler and runtime research, led by Codeplay within the LPGPU EU FP7 project, aims to provide a higher-level, single-source, industry-focused C++-based interface to OpenCL. Along with LPGPU's AES group from TU Berlin, we have now also investigated opportunities for kernel fusion within this new framework, utilising C++11 features including lambda functions, variadic templates, and lazy evaluation using std::bind expressions.
While pixel-to-pixel transformations are interesting in this context, insofar as they demonstrate the expressivity of this new single-source C++ framework, we also consider fusing transformations that use synchronisation within workgroups. Hence, convolutions using halos, and the use of the GPU's local shared memory, are also explored.
A perennial problem has therefore been restructured to accommodate a modern C++-based expression of kernel fusion. Kernel fusion thus becomes an integrated component of an extended C++ compiler and runtime.
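The core idea of fusing pixel-to-pixel transformations can be sketched in Python by composing per-pixel functions, so that one pass over the image replaces one pass (and one intermediate buffer) per kernel. This only illustrates the concept; it is not Codeplay's single-source C++ API, where the same composition would typically be expressed with lambdas and variadic templates. Transformations that read neighbouring pixels, such as convolutions with halos, cannot be fused this simply.

```python
from functools import reduce


def unfused(image, *ops):
    """One pass (kernel launch) per operation; each pass writes a full intermediate buffer."""
    buf = list(image)
    for op in ops:
        buf = [op(px) for px in buf]
    return buf


def fuse(*ops):
    """Compose per-pixel operations into a single per-pixel function (applied left to right)."""
    return lambda px: reduce(lambda acc, op: op(acc), ops, px)


def fused(image, *ops):
    """A single pass over the image: intermediate values never go to a buffer."""
    kernel = fuse(*ops)
    return [kernel(px) for px in image]
```

For example, a brighten, clamp and invert chain gives the same result either way, but the fused version reads and writes the image only once.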
Nov. 17-22, 2013: Mr. Tamer Dallou is presenting a paper at MTAGS - SC 2013
Mr. Tamer Dallou is attending "The International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2013)" to present a paper at the "6th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers (MTAGS 2013)". The paper, titled "FPGA-Based Prototype of Nexus++ Task Manager", presents the recent VHDL design and evaluation of Nexus++, our hardware task graph manager for task-based programming models. SC 2013 is one of the principal HPC conferences worldwide and takes place in Denver, CO, USA, on Nov. 17-22, 2013.
7.11.13, 10:30h, EN 642: Automatic Code Generation for a Microblaze system with ARM NEON SIMD Acceleration - Ilias Timon Poulakis
SIMD (Single Instruction, Multiple Data) accelerators are increasingly deployed in modern CPU architectures. These units can process certain kinds of data, e.g. multimedia formats, efficiently, improving CPU performance and energy consumption. The research department Embedded Systems Architecture (AES) of the Berlin Institute of Technology currently uses the Microblaze processor by Xilinx, which does not support SIMD acceleration natively. Hence, an ARM NEON compatible SIMD accelerator has been attached to the Microblaze processor. The two units communicate through a protocol based on FSL (Fast Simplex Link). To use this unusual architecture efficiently, automatic code generation is needed. However, creating a custom compiler is difficult and very time-consuming. To avoid this route, this thesis presents an alternative approach that uses only existing compiler backends. The main idea is to generate machine code for Microblaze and ARM NEON separately, using their respective existing compiler backends. Code sections executable by ARM NEON have to be located and then appropriately inserted into the Microblaze code. As part of this thesis, a tool that performs these tasks has been successfully implemented, tested and evaluated. This paper focuses on the realization steps taken. The capabilities of the implemented tool are discussed, and an outlook is given on how the approach could be applied to a different combination of processor and SIMD accelerator.
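The splicing step can be sketched as follows. The sketch assumes, hypothetically, that each 32-bit ARM NEON instruction word is sent to the accelerator with a blocking Microblaze `put` on FSL channel 0, that register r18 is free as a scratch register, and that the Microblaze code marks each offloaded region with a placeholder line `@NEON <id>`. The actual FSL protocol of the thesis is not described here, and the transfer of operand data is omitted.

```python
def fsl_forward(word, reg="r18", channel=0):
    """Microblaze assembly that loads a 32-bit NEON instruction word and sends it over FSL.
    The imm prefix supplies the upper 16 bits of the following addik immediate."""
    hi, lo = (word >> 16) & 0xFFFF, word & 0xFFFF
    return [f"imm 0x{hi:04x}", f"addik {reg}, r0, 0x{lo:04x}", f"put {reg}, rfsl{channel}"]


def splice(mb_asm, neon_regions, marker="@NEON"):
    """Replace each hypothetical '@NEON <id>' placeholder in the Microblaze assembly
    with FSL transfers of the NEON machine words compiled separately for that region."""
    out = []
    for line in mb_asm:
        fields = line.split()
        if fields and fields[0] == marker:
            for word in neon_regions[int(fields[1])]:
                out.extend(fsl_forward(word))
        else:
            out.append(line)
    return out
```

Because both code streams come from unmodified compiler backends, the tool only has to locate regions and splice text, which is what keeps this approach cheaper than writing a custom compiler.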
29–30 October 2013: Ben Juurlink @ Cyber-Physical Systems: Uplifting Europe's Innovation Capacity.
Ben Juurlink is currently visiting this two-day event in Brussels, which is devoted to exploring the innovation potential of Cyber-Physical Systems (CPS). The event is organized by the European Commission and discusses how EU Research and Innovation Programmes can stimulate the creation of new industrial platforms led by EU actors and facilitate matchmaking between future user/customer needs and technology offers. For more information, see http://www.amiando.com/cps-conference.html.
October 14, 2013: Prof. Juurlink is a member of the PhD defense committee of Yifan He.
Prof. Juurlink is visiting Eindhoven, NL, where he is a member of the PhD defense committee of Yifan He.
Yifan He defends his dissertation entitled "Low Power Architectures for Streaming Applications".
24.10.13, 11h, EN 642: Design and Implementation of a high-throughput CABAC Hardware Accelerator for the HEVC Decoder - Philipp Habermann.
HEVC is the new video coding standard of the Joint Collaborative Team on Video Coding. As in its predecessor H.264/AVC, Context-based Adaptive Binary Arithmetic Coding (CABAC) is a throughput bottleneck. Due to strong low-level data dependencies, only a very small amount of data-level parallelism can be exploited using the SIMD extensions of current computer architectures. High-level parallelization is possible in HEVC, but not mandatory. Therefore, another optimization strategy is needed that can be used independently of the input video. To address this issue, throughput improvements were considered during the standardization of HEVC. The goal of this thesis is to evaluate the hardware acceleration opportunities for the highly sequential HEVC CABAC by exploiting these throughput improvements. The evaluation is limited to transform coefficient decoding, as it is the most time-consuming part of CABAC. The hardware accelerator is implemented on the Digilent ZedBoard, a development board that contains a 667 MHz ARM Cortex-A9 processor together with a closely coupled FPGA and thereby allows efficient hardware-software co-design. The implemented hardware accelerator processes 70 Mbins/s at 75.36 MHz and achieves an 11× speed-up over software transform coefficient decoding for a typical workload. The hardware accelerator has also been integrated into a complete HEVC software decoder, but due to the current slow hardware-software interface, the overall speed-up is relatively small. However, as the data transfer between hardware and software can be significantly reduced when a full CABAC hardware accelerator is implemented, this is a promising path to pursue in future work.
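The sequential nature of CABAC is visible even in its simplest part, bypass decoding. The Python sketch below follows the arithmetic decoding engine of the HEVC specification (ivlCurrRange initialized to 510, ivlOffset read as 9 bits); context-coded bins, range renormalization and the full coefficient syntax are omitted. Each bin updates the offset that the next bin depends on; one of HEVC's throughput improvements is grouping bypass bins together so that hardware can decode several of them per cycle.

```python
class BypassDecoder:
    """Bypass-bin decoding as in the HEVC arithmetic decoding engine (simplified sketch)."""

    def __init__(self, bits):
        self._bits = iter(bits)
        self.ivl_curr_range = 510              # initial range of the decoding engine
        self.ivl_offset = self._read_bits(9)   # offset is initialized from 9 bits

    def _read_bits(self, n):
        value = 0
        for _ in range(n):
            value = (value << 1) | next(self._bits)
        return value

    def decode_bypass(self):
        # Shift one new bit into the offset and compare against the unchanged range;
        # the next bin cannot start before this comparison is resolved.
        self.ivl_offset = (self.ivl_offset << 1) | self._read_bits(1)
        if self.ivl_offset >= self.ivl_curr_range:
            self.ivl_offset -= self.ivl_curr_range
            return 1
        return 0

    def decode_bypass_bins(self, n):
        """Fixed-length value made of n bypass bins, most significant bin first."""
        value = 0
        for _ in range(n):
            value = (value << 1) | self.decode_bypass()
        return value
```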
24.10.13, 10h, EN 642: Design and Implementation of a Hardware Accelerator for HEVC Motion Compensation - Matthias Goebel.
This master thesis focuses on the design and implementation of a motion compensation hardware accelerator for use in HEVC hybrid decoders, i.e. decoders that contain hardware as well as software parts. The motion compensation part of the decoding process is especially suited for such an approach, as it is the most time-consuming part of pure software decoders. The hardware accelerator should support high resolutions and frame rates while requiring very few resources and little power. An optimized software decoder compatible with the reference decoder has been used as a starting point. As a platform, the Zynq-7000 All Programmable SoC by Xilinx is used, which combines an ARM Cortex-A9 dual-core CPU running at 667 MHz with flexible programmable logic resources similar to those used in FPGAs. After some background information on the involved topics, the design space is discussed with a special focus on the level of granularity, the degree of parallelization and the memory access. For the granularity, the PU level has been chosen, as it offers a good trade-off between performance and complexity. The resulting design is then described in more detail, and a prototype is implemented and validated. For validation, the Foreign Language Interface (FLI) of Mentor Graphics’ ModelSim HDL simulator has been used. As an evaluation of the prototype shows promising results, two different memory interfaces (including one using DMA) are added and the complete accelerator is integrated into a Zynq-7000 environment. The necessary modifications to the software decoder for both interfaces are discussed and partially performed. A final evaluation shows an expected frame rate of 4.14 FPS for the complete 1080p decoding process when running the accelerator at 100 MHz.
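The core computation of luma motion compensation is fractional-sample interpolation with HEVC's 8-tap filters. The following Python sketch interpolates one row horizontally for uni-prediction, using the standard luma filter coefficients and the 14-bit intermediate precision; vertical and 2D filtering, chroma, bi-prediction and weighted prediction are omitted, and edge samples are simply replicated as a stand-in for reference picture padding.

```python
# HEVC luma interpolation filters for quarter, half and three-quarter sample positions
LUMA_FILTERS = {
    1: (-1, 4, -10, 58, 17, -5, 1, 0),
    2: (-1, 4, -11, 40, 40, -11, 4, -1),
    3: (0, 1, -5, 17, 58, -10, 4, -1),
}


def interp_row_uni(row, frac, bit_depth=8):
    """Horizontal luma interpolation of one row followed by uni-prediction rounding."""
    taps = LUMA_FILTERS[frac]
    shift1 = bit_depth - 8           # scales the filter output to 14-bit precision
    shift3 = 14 - bit_depth          # final shift back to the sample bit depth
    offset3 = 1 << (shift3 - 1)
    max_val = (1 << bit_depth) - 1
    out = []
    for x in range(len(row)):
        acc = 0
        for k, c in enumerate(taps):
            # Taps cover integer positions x-3 .. x+4; edges are replicated (padding)
            idx = min(max(x + k - 3, 0), len(row) - 1)
            acc += c * row[idx]
        inter = acc >> shift1
        out.append(min(max((inter + offset3) >> shift3, 0), max_val))
    return out
```

Each output sample needs eight multiply-accumulates per dimension, which is why this stage dominates pure software decoders and maps well to parallel FPGA logic.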
23.10.13: The paper "Considering Quality-of-Service for Resource Reduction using OpenMP" accepted at MCC13.
The paper "Considering Quality-of-Service for Resource Reduction using OpenMP" has been accepted for oral presentation at the 6th Swedish Workshop on Multicore Computing and will be included in the workshop proceedings. The workshop will be held at Halmstad University in Halmstad, Sweden (November 25-26, 2013).
15.10.13: The paper "FPGA-Based Prototype of Nexus++ Task Manager" to appear in MTAGS 2013.
The paper "FPGA-Based Prototype of Nexus++ Task Manager", by Tamer Dallou, Ahmed Elhossini and Ben Juurlink, is accepted to appear at the 6th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers, which is co-located with Supercomputing/SC 2013, on November 17th, 2013, Denver, Colorado, USA.
Abstract: StarSs is one of several programming models that try to ease parallel programming. In StarSs, the programmer has to identify pieces of code that can be executed as tasks, as well as their inputs and outputs. Thereafter, the runtime system (RTS) determines the dependencies between tasks and schedules ready tasks onto worker cores. Previous work has shown, however, that the StarSs RTS may constitute a bottleneck that limits the scalability of the system, and proposed a hardware task management system called Nexus++ to eliminate this bottleneck. The first prototype of Nexus++ was implemented in SystemC. Its architecture also had a nondeterministic multi-cycle search algorithm in its critical path, potentially limiting its scalability. In this paper, we improved the architecture of Nexus++ and employed multi-way set-associative, cache-like data structures to optimize its search algorithm and increase task throughput. We also modeled the new architecture in VHDL and targeted a Virtex-5 FPGA from Xilinx. Experimental results show that the new architecture is very resource-efficient, utilizing only 19% of the target FPGA. They also show that Nexus++ achieves a speedup of up to 81x using some synthetic benchmarks modeled after H.264 decoding. Hence, Nexus++ significantly enhances the scalability of applications parallelized using StarSs.
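A multi-way set-associative lookup bounds the search to a fixed number of tag comparisons, which hardware can perform in parallel in a single cycle instead of a multi-cycle search of variable length. The Python sketch below models such a table; the number of sets, the associativity, the 64-byte granularity and the stored value (for example, the last task that wrote a memory region) are illustrative assumptions, not the Nexus++ parameters.

```python
class SetAssociativeTable:
    """Cache-like table: an address selects one set, which holds at most `ways` entries."""

    def __init__(self, num_sets, ways, line_bytes=64):
        self.num_sets, self.ways, self.line_bytes = num_sets, ways, line_bytes
        self.sets = [[] for _ in range(num_sets)]  # each entry is (tag, value)

    def _index_tag(self, addr):
        block = addr // self.line_bytes
        return block % self.num_sets, block // self.num_sets

    def lookup(self, addr):
        idx, tag = self._index_tag(addr)
        for t, v in self.sets[idx]:  # hardware compares all ways at once
            if t == tag:
                return v
        return None

    def insert(self, addr, value):
        """Insert or update an entry; returns False if the set is full (caller must stall)."""
        idx, tag = self._index_tag(addr)
        entries = self.sets[idx]
        for i, (t, _) in enumerate(entries):
            if t == tag:
                entries[i] = (tag, value)
                return True
        if len(entries) >= self.ways:
            return False
        entries.append((tag, value))
        return True
```

The price of the bounded search is that a full set cannot accept a new entry, so the task manager must stall or evict, a trade-off familiar from hardware caches.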
AES in California: An NVIDIAn returns.
Tuesday, 01. October 2013
As of October 1st, the graduate student Michael Andersch has re-joined AES research. Michael had spent his summer in Santa Clara, California, where he was employed by NVIDIA during a summer internship. As an intern, Michael worked in the GPU Compute Architecture team, building tools and architecture designs to analyze and improve compute application performance on NVIDIA's next-generation GPU designs. Welcome back, Michael!
Our new employee Philipp Habermann.
Monday, 28. October 2013
We are pleased to welcome Philipp Habermann as a new member of our group. He will contribute to the AES team in research and teaching. Welcome!
"Recent Advances in Computer Architecture" will take place in room EN 630!
Attention! Room Change! The Course "Recent Advances in Computer Architecture" (0433 L 334) will take place on Tuesdays from 10:00 to 12:00 in room EN 630.
The lab exercises for Multicore Architectures will take place in room TEL 206 Li!
Attention! Room Change!
The lab exercises for the Multicore Architectures course (LV 0433 L 333) will take place on Mondays from 14:00 to 16:00 in room TEL 206 Li.
The courses Computer Arithmetic, Multicore Architectures and Recent Advances in Computer Architecture start a week later.
The following courses will start a week later (from October 21, 2013):
- Computer Arithmetic: A Circuit Perspective,
- Multicore Architectures and
- Recent Advances in Computer Architecture.
10.10.13, 10h, EN 642: Enhancing Cache Organization for Tiled CMP Architectures - Tareq Alawneh.
Many-core processor architectures have become very common, with the leading CPU manufacturers (Intel, AMD, and TILERA) focusing on tiled CMP architectures. Our target system is a tiled CMP architecture consisting of n cores interconnected by a 2D mesh switched network. Each tile has a processor core, private L1 data and instruction caches, a private L2 cache, and a router for on-chip data transfers. Each cache block has a home tile that maintains the directory information for that block; the directory keeps track of the tiles holding copies of that block. On a miss in both the private L1 and L2 caches, the block is requested from its home tile, and the miss is then handled according to the specific coherence protocol implementation. The drawback of this design is that some home tiles may be overloaded with remote requests, which creates a scalability bottleneck. Furthermore, as the processor count increases, the L2 miss latency will be dominated by the number of message hops needed to reach the particular cache rather than by the time spent accessing the cache itself. These drawbacks can be mitigated by taking other data access patterns into account.
In this study, we analyze this problem and propose ways to alleviate its impact on system performance. One way to improve the performance of the tiled CMP architecture is to access the L2 cache banks of adjacent tiles to fetch the requested code cache lines before accessing their assigned home tiles. Such a mechanism will reduce the remote L2 cache latency, since the requested code cache lines may be fetched from the L2 caches of nearby tiles instead of the L2 caches of their home tiles. Furthermore, the number of accesses to the home tiles will be reduced. Both effects are expected to improve system performance by reducing network utilization and the average memory access time (AMAT).
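The trade-off can be estimated with a simple hop-count model. The Python sketch below assumes an XY-routed 2D mesh (so hops equal the Manhattan distance), home tiles assigned by block address modulo the tile count, and a parallel probe of the adjacent tiles costing one 2-hop round trip; it ignores contention and coherence actions, and the snapshot of L2 contents is hypothetical.

```python
def tile_xy(tile, width):
    return tile % width, tile // width


def hops(a, b, width):
    """Hop count between two tiles with XY routing on a 2D mesh (Manhattan distance)."""
    (ax, ay), (bx, by) = tile_xy(a, width), tile_xy(b, width)
    return abs(ax - bx) + abs(ay - by)


def home_tile(block_addr, num_tiles):
    return block_addr % num_tiles  # assumed static interleaving of blocks over tiles


def neighbors(tile, width, height):
    x, y = tile_xy(tile, width)
    cand = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [cy * width + cx for cx, cy in cand if 0 <= cx < width and 0 <= cy < height]


def l2_miss_hops(requester, block_addr, l2_contents, width, height):
    """Round-trip hops after a local L2 miss when adjacent tiles are probed first.
    l2_contents maps tile -> set of block addresses cached in that tile's L2."""
    for n in neighbors(requester, width, height):
        if block_addr in l2_contents.get(n, set()):
            return 2 * hops(requester, n, width)  # served by an adjacent tile
    home = home_tile(block_addr, width * height)
    return 2 + 2 * hops(requester, home, width)  # failed neighbour probe, then home tile
```

In a 4x4 mesh, a corner tile fetching a block homed in the opposite corner needs 12 hops from the home tile, only 2 if a neighbour holds it, and 14 if the neighbour probe misses, so the benefit depends on how often nearby tiles share the requested lines.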
As future work, we propose another way to improve the tiled CMP architectures by migrating hot cache lines closer to requesting tiles.
October, 6-10: Prof. Juurlink to ICCD conference.
From October 6 to October 10 Prof. Juurlink will visit the IEEE International Conference on Computer Design in Asheville, North Carolina, USA.
He is the chairman of the Processor Architecture track and will also chair the session on Efficient Cache Architectures.
Sept. 30th 2013. A delegation from Hunan University, China visits AES TU-Berlin.
A delegation from Hunan University (one of the oldest and most important national universities in China) will visit the AES group of TU Berlin on September 30th. They will be introduced to the research activities of the AES group and will discuss opportunities for joint research work. The delegation is composed of seven faculty members from the School of Computer and Communication, led by Professor Renfa Li.
13.09.13: AES TU Berlin presents 4k UHD HEVC/H.265 decoding.
The AES group is proud to present its highly efficient 4k Ultra HD capable MPEG-HEVC/H.265 decoder setup. A demo setup has been created with a 65-inch Samsung UHD TV and a custom mini PC based on a 4th-generation Intel Core processor. Optimizations for the latest-generation processors allow the compact setup to decode UHD at more than 60 fps, even at higher bit depths, using no more than two threads.
26.09.2013, 10h, EN 642: A Cost-Effective Kite State Estimator for Reliable Automatic Control of Kites - Johannes Peschel
Airborne Wind Energy (AWE) is a developing technology that uses tethered wings to harvest wind energy and convert it into electrical energy. Most of the AWE concepts presented in this thesis share one common challenge: estimating the position and orientation of the kite, also called the kite state, especially during highly dynamic flight situations. The focus of this thesis is, first, to investigate whether angular sensors are suitable for obtaining reliable position data and, second, to determine which fusion algorithm can be used to combine the data of the angular and Global Navigation Satellite System (GNSS) sensors. The TU Delft prototype is a suitable testing platform for this purpose. The author added angular sensors to the ground station of the TU Delft AWE system that measure the elevation and the horizontal displacement of the tether holding the kite. They are mounted on a modular stainless steel construction, which has low wear and a long lifetime. The author used the tether length and the angular data to obtain a new position. This position was merged with the positions from the two GNSS sensors that were already attached to the kite. The angular sensors were able to measure with a resolution of <0.01°. The elevation and azimuth position of the kite had an error of less than 0.7° as long as the tether force was higher than 2000 N. One of the GNSS sensors provided reliable data during low-force phases. A reliable position in all flight conditions could be obtained by using double exponential smoothing prediction to merge both positions. This development enables the implementation of a reliable kite power control system.
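The two computations named in the abstract can be sketched in Python: converting tether length, elevation and azimuth into a position, assuming a straight tether (which is why accuracy depends on tether force), and Holt's double exponential smoothing for prediction. How the thesis actually merges the angular and GNSS positions is not detailed here; the selection rule based on the 2000 N threshold is a hypothetical illustration, and the smoothing factors are free parameters.

```python
import math


def position_from_tether(length, elevation_deg, azimuth_deg):
    """Kite position relative to the ground station, assuming a straight tether."""
    el, az = math.radians(elevation_deg), math.radians(azimuth_deg)
    return (length * math.cos(el) * math.cos(az),
            length * math.cos(el) * math.sin(az),
            length * math.sin(el))


class DoubleExpSmoother:
    """Holt's double exponential smoothing for one coordinate (level + trend)."""

    def __init__(self, alpha, gamma):
        self.alpha, self.gamma = alpha, gamma
        self.level = self.trend = None

    def update(self, x):
        if self.level is None:
            self.level, self.trend = x, 0.0
            return
        prev = self.level
        self.level = self.alpha * x + (1 - self.alpha) * (prev + self.trend)
        self.trend = self.gamma * (self.level - prev) + (1 - self.gamma) * self.trend

    def predict(self, steps=1):
        return self.level + steps * self.trend


def select_measurement(angular_pos, gnss_pos, tether_force, threshold=2000.0):
    # Hypothetical rule: trust the angular sensors only while the tether is taut
    return angular_pos if tether_force > threshold else gnss_pos
```

The prediction step bridges gaps and smooths switching between the two sources; for example, after the samples 0, 1, 2 with alpha = gamma = 1 the predictor extrapolates the next value as 3.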
Sept. 22-27, 2013: Prof. Juurlink to ScalPerf workshop.
Prof. Juurlink has been invited to give a presentation at the ScalPerf (Scalable Approaches to High Performance and High Productivity Computing) workshop which will be held in Bertinoro, Italy from Sept. 22 to Sept. 27, 2013. There he will present his recent article "Amdahl's law for predicting the future of multicores considered harmful". For more information about the workshop see http://www.dei.unipd.it/~versacif/scalperf13/index.html. The article can be accessed via ACM Digital Library http://doi.acm.org/10.1145/2234336.2234338.
September 15, 2013: "HiPEAC grant: Performance portability for low-power embedded GPUs"
The AES group of TU Berlin has received a collaboration grant from HiPEAC for a three-month visit of Guilherme Calandrini, a PhD student from the University of Alcala in Spain. The visit will focus on performance portability for low-power embedded GPUs using OpenCL. In this collaboration, we aim to create a set of OpenCL benchmarks that can be used to compare the performance and power efficiency of different embedded low-power GPUs. The results of this research will be very useful for understanding the performance and power implications of optimization strategies for different GPU architectures, and for selecting the most appropriate GPUs based on well-defined quantitative performance and power metrics.
11.09.13: Best paper award at the 3rd IEEE 2013 ICCE-Berlin.
Mauricio Alvarez-Mesa, Chi Ching Chi and Ben Juurlink of the AES group of TU Berlin have won a best paper award at the Third IEEE International Conference on Consumer Electronics-Berlin (ICCE-Berlin) for the paper "HEVC Performance and Complexity for 4K Video". The paper was a joint effort between the AES group of TU Berlin and Fraunhofer HHI.
September 9, 2013: The AES group will host Mr. Hasan Hassan.
The AES group will host Mr. Hasan Hassan, a student at TOBB University of Economics and Technology (http://etu.edu.tr/en), as an intern to work on porting computer vision algorithms to GPUs using OpenCL. The internship will be organized within the framework of the EU Erasmus Programme and will take place between 09-09-2013 and 20-12-2013. We aim to port several computationally demanding kernels used in various computer vision algorithms to GPUs, which provide a high degree of parallel processing.
September 7, 2013: Prof. Dr. Ben Juurlink at MuCoCoS-2013
The paper "Topology-aware Equipartitioning with Coscheduling on Multicore Systems" by Jan H. Schönherr, Ben Juurlink and Jan Richling will be presented at the 6th International Workshop on Multi-/Many-core Computing Systems (MuCoCoS-2013), which will be held on September 7 in Edinburgh, Scotland, UK, in conjunction with the 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT 2013).
MuCoCoS-2013 focuses on language level, system software and architectural solutions for performance portability across different architectures and for automated performance tuning.
More information about MuCoCoS-2013 can be found at: www.ida.liu.se/conferences/mucocos2013/
Aug 29th. 2013. AES paper in the SPIE Applications of Digital Image Processing Conference.
The paper "HEVC real-time decoding" by Mauricio Alvarez-Mesa, Chi Ching Chi and Ben Juurlink of the AES group of TU Berlin has been presented at the SPIE Applications of Digital Image Processing Conference, held in San Diego, USA, from August 25 to August 29, 2013. The paper was a joint effort between the AES group of TU Berlin and Fraunhofer HHI.
July 2013: "best-in-class-award" to Philipp Habermann.
Recently, the "best-in-class-award" was handed out by Prof. Juurlink to Philipp Habermann for the class "Advanced Computer Architecture 2011-2012". The "best-in-class-award" is given to the student who achieves the highest grade in Prof. Juurlink's master courses. The award will also be given this year.
14-20 july, 2013: Sohan Lal and Jan Lucas from TU Berlin are going to present two posters at HiPEAC ACACES 2013.
Sohan Lal and Jan Lucas from TU Berlin are going to present two posters at HiPEAC ACACES 2013. The two posters will present some of the most recent research results from the project to the public for the first time.
- The poster “Exploring GPGPUs Workload Characteristics and Power Consumption” by Lal et al. will provide interesting insights into the power consumption of GPU workloads and how they are related to the performance characteristics of the workloads.
- The poster “DART: A GPU architecture exploiting temporal SIMD for divergent workloads” will present first simulation results for DART, a new GPU architecture developed within the LPGPU consortium by Lucas et al.
8.07.2013, 13h, EN 642: Implementation and Evaluation of Large Warps in GPUSimPow - Matthias Stroux
Graphics processors (GPUs) are a special class of parallel processors for massively parallel programs. From additional processors enhancing graphics-intensive programs, they have developed into general-purpose computing devices for high-performance business and scientific computing. GPUs typically handle branches by serializing the branch paths, which leads to underutilization of their SIMD execution units. Large Warps (LW) is a concept to increase utilization by selecting threads with the same execution path (PC and program state) from larger units, called 'Large Warps', into temporary units of execution, the 'Sub-Warps'. For some programs, LW should therefore lead to a significant increase in SIMD utilization. Theoretical considerations also show that for some programs, IPC could increase (or the number of execution cycles decrease) by a factor of more than two. To test this concept with real programs in a complete system, where memory latency and network effects can be taken into account and measured, Large Warps was implemented in GPGPU-Sim 3.x, a software simulator for GPUs, and in the new power simulator GPUSimPow to capture power effects. Results for ideally constructed synthetic benchmarks show the expected effects: where functional execution of SIMD units can be increased, IPC increases too. However, memory effects and effects of other system parts have to be taken into account too. For a number of 'real-world' benchmarks, the positive effect of Large Warps on performance (IPC) can be confirmed.
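The effect of large warps on SIMD utilization can be estimated with an idealized count of issue slots. The Python sketch assumes that a large warp is a list of rows of SIMD width, that threads keep their SIMD lane when sub-warps are formed, and that conventional warps serialize every distinct PC present in a row; reconvergence, scheduling and memory effects are ignored.

```python
def baseline_issues(large_warp):
    """Conventional warps: each row issues once per distinct PC among its active threads."""
    return sum(len({pc for pc in row if pc is not None}) for row in large_warp)


def large_warp_issues(large_warp):
    """Sub-warp formation: threads stay in their lane, so one PC needs as many sub-warps
    as the largest number of threads with that PC in any single lane."""
    width = len(large_warp[0])
    pcs = {pc for row in large_warp for pc in row if pc is not None}
    return sum(
        max(sum(row[lane] == pc for row in large_warp) for lane in range(width))
        for pc in pcs
    )


def simd_utilization(large_warp, issues):
    """Fraction of SIMD lanes doing useful work over all issued instructions."""
    width = len(large_warp[0])
    active = sum(pc is not None for row in large_warp for pc in row)
    return active / (issues * width)
```

For two rows whose threads alternate between two branch paths in complementary lanes, the baseline needs four issues at 50% utilization, while sub-warps need two issues at full utilization; real programs rarely divergence this favourably, which matches the mixed results reported above.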
June 12th 2013: A group of computer engineering students from Universidad del Valle (Colombia) visit AES TU-Berlin
A group of computer engineering students accompanied by professor Dr. Maria Trujillo from Universidad del Valle of the Colombian city of Cali will visit TU Berlin on June 12th 2013. The German Academic Exchange Service (DAAD) has organized and financed this visit, which will allow the students to learn about the research and teaching activities of the AES-TUB group and will also be the starting point for future research collaborations.
June 12th 2013, 10 a.m., EN 185: Achronix tech talk
The outline of the talk is:
- Overview of the FPGA market;
- Achronix in the High End FPGA market;
- Achronix Value Proposition versus incumbent high end FPGA vendors;
- Intel / Achronix partnership: current 22 nm Tri-gate process product, 14 nm products 2014/201, 10 nm;
- Products available at Achronix in the second half of 2013;
- SW tools presentation (video);
- HD1000 demo board;
For more information, see: www.achronix.com
24.05.2013, 14:00h, EN 642: Automatic Code Generation for a MicroBlaze System with ARM NEON SIMD Acceleration - Ilias Poulakis
Summer Semester 2013 (SoSe 13): A New Course: Computer Arithmetic: A Circuit Perspective
The advance of modern embedded systems and their high computational capabilities depend mainly on their ability to perform arithmetic operations efficiently. This course is intended to increase knowledge about the design of embedded arithmetic circuits as well as the scientific background of these circuits. It will help students gain more insight into the design of arithmetic processing units and more practical experience in implementing digital systems. Students will also gain experience in using hardware description languages to model and implement digital systems. The implementation of these circuits using VHDL/FPGA will be included as well.
For more information visit: http://www.aes.tu-berlin.de/menue/lehrveranstaltungen/comparth/
April 21, 2013: AES-Paper at ISPASS 2013.
The paper "Why a Single Chip Causes Massive Power Bills - GPUSimPow: A GPGPU Power Simulator" by Jan Lucas, Sohan Lal, Michael Andersch, Mauricio Alvarez-Mesa and Ben Juurlink has been accepted at the 2013 International Symposium on Performance Analysis of Systems and Software which will be held from April 21-23 in Austin, Texas, US. The paper details much of the work performed by AES' Low-Power GPU group concerning power simulation.
12.04.2013, 10h, EN 642: Bachelor Thesis: High-Throughput Communication Interface for the Xilinx XUPV5 Evaluation Platform - Lester Kalms
As processors in computer systems take on more and more tasks and grow in complexity, it makes sense to offload some of these tasks to relieve the processor. Some of them are handled by expansion cards, such as graphics cards or sound cards, which nowadays communicate with the rest of the system via PCI-Express. Peripheral cards can, of course, support other tasks as well. Xilinx, for example, provides a platform for developing such an expansion card: the XUPV5 evaluation platform. Communicating with the card requires an interface on both the hardware and the software side, which can be developed with this card and the Xilinx tools. With ever-increasing amounts of data, a correspondingly high data throughput is needed, which PCI-Express can in theory deliver. This thesis deals with the development of an interface that communicates via PCI-Express and with maximizing its data throughput. It was written to support others who want to develop an efficient expansion card.
The second chapter introduces PCI-Express and explains the fundamentals needed for a basic understanding: what PCI-Express is, how communication works, and what data throughput can be achieved. PCI-Express communicates via packets called "Transaction Layer Packets" (TLPs). The third chapter describes the developed system: how the hardware design works, both as a whole and in detail, and how it was implemented, as well as how the software system is built, how it works, and especially how the two systems interact. The following chapters cover the practical work. The fourth chapter describes how a working system that satisfies the requirements was created. The system described in the previous chapter was able to communicate, but still exhibited errors in various situations; the chapter explains what was done to correct these errors, what did not work, and why. The fifth chapter deals with increasing the data throughput and includes measurements; for easier handling and measurement, a user application was implemented. The final chapter comments on and interprets the results and compares them with the theory. Finally, it gives an outlook on methods that could still be tested or have not yet been tested completely.
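Why the achievable throughput depends on packet size can be estimated by comparing each TLP's payload with the bytes the link adds around it. The sketch below is a back-of-the-envelope model under stated assumptions: a first-generation link (2.5 GT/s per lane with 8b/10b encoding, i.e. 250 MB/s of raw data per lane and direction), memory-write TLPs with a 12-byte (3DW) header, 2 bytes of physical framing, a 2-byte sequence number and a 4-byte LCRC, and no DLLP traffic (ACKs, flow-control updates). The lane count and payload sizes are illustrative and are not taken from the thesis or the XUPV5 configuration.

```python
# Back-of-the-envelope PCIe Gen1 throughput model (assumptions in the text above).
GEN1_TRANSFERS_PER_S = 2.5e9      # per lane
ENCODING_EFFICIENCY = 8 / 10      # 8b/10b line coding
LANE_BYTES_PER_S = GEN1_TRANSFERS_PER_S * ENCODING_EFFICIENCY / 8  # 250 MB/s

FRAMING_BYTES = 2    # start and end framing symbols (physical layer)
SEQUENCE_BYTES = 2   # data link layer sequence number
LCRC_BYTES = 4       # data link layer CRC


def tlp_overhead(header_bytes=12, ecrc=False):
    """Bytes added to each TLP on the wire in addition to its payload."""
    return header_bytes + FRAMING_BYTES + SEQUENCE_BYTES + LCRC_BYTES + (4 if ecrc else 0)


def tlp_efficiency(payload_bytes, header_bytes=12, ecrc=False):
    """Fraction of link bytes that carry payload for back-to-back TLPs."""
    return payload_bytes / (payload_bytes + tlp_overhead(header_bytes, ecrc))


def effective_throughput(lanes, payload_bytes, header_bytes=12):
    """Upper bound on payload bytes per second in one direction."""
    return lanes * LANE_BYTES_PER_S * tlp_efficiency(payload_bytes, header_bytes)


if __name__ == "__main__":
    for payload in (32, 64, 128, 256):
        mb_s = effective_throughput(lanes=1, payload_bytes=payload) / 1e6
        print(f"x1 link, {payload:3d}-byte payload: {mb_s:6.1f} MB/s")
```

Larger payloads amortize the fixed per-packet overhead, which is one reason why the maximum payload size supported by the system matters when maximizing throughput.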
8-11 April 2013: a 4K H.265/HEVC real-time decoder at NABShow 2013 in Las Vegas
A 4K H.265/HEVC real-time decoder was presented at the NABShow in Las Vegas, Nevada, USA, on April 8-11, 2013. The demo consisted of a software-based decoder running on a multicore PC connected to an 84-inch 4K TV, and was shown at the Fraunhofer HHI booth C7843. The real-time decoder was developed as part of a collaboration between the Fraunhofer Heinrich Hertz Institute (HHI) and the AES group of TU-Berlin. The demo was presented by Benjamin Bross from Fraunhofer HHI and Mauricio Alvarez-Mesa from Fraunhofer HHI and TU-Berlin.
04.04.2013, 10h, EN 642: "How a Single Chip Causes Massive Power Bills - GPUSimPow: A GPGPU Power Simulator" (ISPASS 2013 paper) - Jan Lucas
14/03/2013, 10h, EN 642: "Migen - a Python toolbox for building complex digital hardware" - Sébastien Bourdeauducq
Despite being faster than schematic entry, hardware design with Verilog and VHDL remains tedious and inefficient for several reasons. The event-driven model introduces issues and requires manual coding that is unnecessary for synchronous circuits, which represent the lion's share of today's logic designs. Counter-intuitive arithmetic rules result in steeper learning curves and provide fertile ground for subtle bugs in designs. Finally, support for procedural generation of logic (metaprogramming) through "generate" statements is very limited and restricts the ways in which code can be made generic, reused, and organized.
To address those issues, we have developed the Migen FHDL library, which replaces the event-driven paradigm with the notions of combinatorial and synchronous statements, has arithmetic rules that make integers always behave like mathematical integers, and, most importantly, allows the design's logic to be constructed by a Python program. This last point enables hardware designers to take advantage of the richness of the Python language (object-oriented programming, function parameters, generators, operator overloading, libraries, etc.) to build well-organized, reusable and elegant designs.
Other Migen libraries are built on FHDL and provide various tools such as a system-on-chip interconnect infrastructure, a dataflow programming system, a more traditional high-level synthesizer that compiles Python routines into state machines with datapaths, and a simulator that allows test benches to be written in Python.
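The two central ideas above, synchronous statements that all take effect together at the clock edge and logic generated by an ordinary Python program, can be illustrated with a deliberately tiny model. The `Signal` and `Design` classes and the `shift_register` helper below are hypothetical and invented for this sketch; they are not Migen's API, which should be taken from the Migen documentation.

```python
class Signal:
    """Hypothetical fixed-width unsigned signal; values are truncated to `width` bits."""

    def __init__(self, width, reset=0):
        self.width = width
        self.mask = (1 << width) - 1
        self.value = reset & self.mask


class Design:
    def __init__(self):
        self.comb = []  # (target, fn): target continuously follows fn()
        self.sync = []  # (target, fn): target takes fn() at each clock edge

    def settle(self):
        # Combinatorial statements, assumed to be listed in dependency order.
        for target, fn in self.comb:
            target.value = fn() & target.mask

    def clock(self):
        # Every synchronous right-hand side sees the values from *before* the edge,
        # so statement order does not matter and no simulation races can occur.
        next_values = [(target, fn() & target.mask) for target, fn in self.sync]
        for target, value in next_values:
            target.value = value
        self.settle()


def shift_register(design, din, depth):
    """Metaprogramming: a plain Python loop generates `depth` register stages."""
    stages = [Signal(din.width) for _ in range(depth)]
    previous = din
    for stage in stages:
        # The default argument binds the current `previous` (avoids late binding).
        design.sync.append((stage, lambda src=previous: src.value))
        previous = stage
    return stages


if __name__ == "__main__":
    d = Design()
    din = Signal(4, reset=5)
    taps = shift_register(d, din, depth=3)
    total = Signal(5)  # one bit wider than the operands, so the sum cannot overflow
    d.comb.append((total, lambda: taps[-1].value + taps[-2].value))
    for cycle in range(3):
        d.clock()
        print(cycle, [t.value for t in taps], total.value)
```

Because all synchronous updates are computed before any is committed, the shift register works regardless of the order in which its stages were generated, which is the property an event-driven description has to obtain through careful coding.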
02/01/2013 - 11 a.m.: Master school - ICT Innovation - information event
Invitation to the information event
On February 1st, 2013, at 11 a.m., an information event for the "European Dual Degree Master in ICT innovation" will take place in room TEL AB (Telefunken tower). We would like to invite all interested students, especially those who will finish their BSc degree by August 2013.
The dual degree master's program "ICT Innovation" will start in the winter term 2013/14. The application deadline is April 15th, 2013.
For more information: masterschool.eitictlabs.eu
31.01.2013, 10h, EN 642: "Composing Execution Times on Multicore Processors" - J. Reinier van Kampenhout
The use of multicore processors in embedded systems promises to reduce the space, weight and power requirements while offering increased functionality. To enable these benefits however, a runtime environment must be able to execute multiple safety-critical applications in parallel with non-critical applications. An underlying problem in multicores is the use of shared resources, which leads to interference between applications and unpredictable timing behaviour which is not acceptable for critical applications with hard real-time requirements.
In this research we will conceive and implement a concept for executing real-time applications on multicore processors with composable timing behaviour. In our approach we decompose applications into basic blocks whose behaviour is deterministic and can be determined empirically. Using models that capture the essential properties of the HW and SW, we construct a deployment scheme out of these blocks. The result is a system on which multiple mixed-criticality applications are executed in parallel, each with a timing behaviour that is composed from that of its basic blocks. Our method thus guarantees isolation between applications and simplifies worst-case execution time analysis, independent of the hypervisor or OS. Furthermore, resource usage can be optimized by dynamically allocating unused resources to non-critical applications at run time. We will demonstrate the effectiveness of our concept by comparing the variation in execution times with that achieved by purely static scheduling, fixed-priority scheduling and virtualization.
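The composition step described above can be sketched as follows: measure each basic block in isolation, take a per-block bound, and obtain an application's bound as the sum over the blocks it executes; any remaining budget is capacity that could be lent to non-critical applications. This is a minimal sketch under loud assumptions: blocks are isolated so their times do not depend on co-running applications, and the empirical maximum serves as the per-block bound, whereas a hard real-time system would need a validated, safe bound. All function names are hypothetical.

```python
def block_bounds(samples):
    """Map each basic block to its largest observed execution time (assumed bound)."""
    return {block: max(times) for block, times in samples.items()}


def composed_time(path, bounds):
    """Timing bound of a block sequence: the sum of its blocks' bounds."""
    return sum(bounds[block] for block in path)


def spare_capacity(budget, path, bounds):
    """Time within a critical application's budget that could be given to
    non-critical work at run time (negative means the budget is exceeded)."""
    return budget - composed_time(path, bounds)


if __name__ == "__main__":
    measured = {"init": [120, 118, 125], "filter": [340, 352, 347], "emit": [60, 61]}
    bounds = block_bounds(measured)
    path = ["init", "filter", "filter", "emit"]
    print(composed_time(path, bounds))         # 125 + 352 + 352 + 61 = 890
    print(spare_capacity(1000, path, bounds))  # 110
```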
21.01.2013: Mr. Tamer Dallou has won a “Best Poster Award” at HiPEAC 2013.
Mr. Tamer Dallou has won a “Best Poster Award” for our joint poster “Nexus++: A hardware Task Manager for the StarSs Programming Model” at the 8th International Conference on High-Performance and Embedded Architectures and Compilers (HiPEAC 2013), January 2013, Berlin, Germany.
Recently, several programming models have been proposed that try to ease parallel programming. One of these programming models is StarSs. In StarSs, the programmer has to identify pieces of code that can be executed as tasks, as well as their inputs and outputs. The runtime system (RTS) then determines the dependencies between tasks and schedules ready tasks onto worker cores. Previous work has shown, however, that the StarSs RTS may constitute a bottleneck that limits the scalability of the system, and proposed a hardware task manager called Nexus to eliminate this bottleneck. Nexus has several limitations, however. For example, the number of inputs and outputs of each task is limited to a fixed constant, and Nexus does not support double buffering. Here we present Nexus++, which addresses these as well as other limitations. Experimental results show that double buffering achieves a speedup of 54×, and that Nexus++ significantly enhances the scalability of applications parallelized using StarSs.
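The dependency analysis that the StarSs runtime performs, and that Nexus and Nexus++ move into hardware, can be illustrated with a small software model: each task declares the addresses it reads and writes, and it must wait for the last writer of each input (read-after-write), the last writer of each output (write-after-write) and earlier readers of each output (write-after-read). The `DependencyTracker` class below is a hypothetical sketch of this bookkeeping; it is not the Nexus++ design and omits features such as double buffering.

```python
from collections import defaultdict


class DependencyTracker:
    """Hypothetical software model of address-based task dependency resolution."""

    def __init__(self):
        self.last_writer = {}               # address -> most recent writing task
        self.readers = defaultdict(list)    # address -> readers since that write
        self.pending = {}                   # task -> number of unfinished predecessors
        self.successors = defaultdict(set)  # task -> tasks waiting on it
        self.finished = set()

    def submit(self, task, inputs, outputs):
        """Register a task; return True if it is immediately ready to run."""
        preds = set()
        for addr in inputs:                   # read-after-write
            if addr in self.last_writer:
                preds.add(self.last_writer[addr])
        for addr in outputs:
            if addr in self.last_writer:      # write-after-write
                preds.add(self.last_writer[addr])
            preds.update(self.readers[addr])  # write-after-read
        preds.discard(task)
        preds -= self.finished
        for p in preds:
            self.successors[p].add(task)
        self.pending[task] = len(preds)
        for addr in inputs:
            self.readers[addr].append(task)
        for addr in outputs:
            self.last_writer[addr] = task
            self.readers[addr] = []
        return self.pending[task] == 0

    def complete(self, task):
        """Mark a task as finished; return the tasks that became ready."""
        self.finished.add(task)
        ready = []
        for succ in sorted(self.successors.pop(task, set())):
            self.pending[succ] -= 1
            if self.pending[succ] == 0:
                ready.append(succ)
        return ready


if __name__ == "__main__":
    t = DependencyTracker()
    print(t.submit(1, inputs=[], outputs=["x"]))     # True
    print(t.submit(2, inputs=["x"], outputs=["y"]))  # False (RAW on x)
    print(t.submit(3, inputs=["x"], outputs=["z"]))  # False (RAW on x)
    print(t.submit(4, inputs=[], outputs=["x"]))     # False (WAW on 1, WAR on 2 and 3)
    print(t.complete(1))                             # [2, 3]
    print(t.complete(2), t.complete(3))              # [] [4]
```

In a software RTS every one of these table updates costs instructions on a core, which is the overhead a hardware task manager aims to remove.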
Jan. 2013: HiPEAC 2013 in Berlin.
The HiPEAC conference will be held in Berlin from Monday, January 21, to Wednesday, January 23, 2013. The HiPEAC conference is the premier forum for experts in computer architecture, programming models, compilers and operating systems for embedded and general-purpose systems in Europe. In 2013 the general chairs will be Ben Juurlink of TU Berlin and Keshav Pingali of the University of Texas, Austin. Program chairs are André Seznec of INRIA Rennes and Lawrence Rauchwerger of Texas A&M University. Paper selection is performed by the ACM journal TACO. More than 500 people attended the HiPEAC 2012 conference in Paris. Hopefully HiPEAC 2013 will be just as successful. For more information, stay tuned at http://www.hipeac.net/conference/berlin.
2012: Book "Scalable Parallel Programming Applied to H.264/AVC Decoding"
The book titled "Scalable Parallel Programming Applied to H.264/AVC Decoding", co-authored by Ben Juurlink, Mauricio Alvarez-Mesa, Chi Ching Chi, Arnaldo Azevedo, Cor Meenderinck and Alex Ramirez, has been published by Springer in the series SpringerBriefs in Computer Science. The book can be purchased from several internet retailers. More information can be found on the Springer webpage: http://www.springer.com/engineering/signals/book/978-1-4614-2229-7
Nov. 1, 2012: AES-Paper in the IEEE Transactions on Circuits and Systems for Video Technology
The paper "Parallel Scalability and Efficiency of HEVC Parallelization Approaches" by C.C. Chi, M. Alvarez-Mesa, B. Juurlink, G. Clare, F. Henry, S. Pateux and T. Schierl has been accepted in the IEEE Transactions on Circuits and Systems for Video Technology. The paper is part of a special issue on High Efficiency Video Coding (HEVC) that will appear in December 2012. The paper can now be accessed at IEEE Xplore: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6327343&isnumber=4358651
Nov. 10, 2012: AES-Paper in the Journal of Signal Processing Systems.
The paper "Parallel HEVC Decoding on Multi- and Many-core Architectures. A Power and Performance Analysis" by C.C. Chi, M. Alvarez-Mesa, J. Lucas, B. Juurlink, and T. Schierl, has been accepted in the Journal of Signal Processing Systems. It will appear soon in a special issue about Design and Implementation of Signal Processing Systems.
Nov. 12, 2012: "SynZEN: A Hybrid TTA/VLIW Architecture with a Distributed Register File" at NORCHIP 2012
The paper "SynZEN: A Hybrid TTA/VLIW Architecture with a Distributed Register File" by S. Hauser, N. Moser and B. Juurlink has been accepted at NORCHIP 2012, the Nordic Microelectronics event, which will be held in Copenhagen, Denmark, on Nov. 12-13, 2012. More information about NORCHIP 2012 can be found at www.norchip.org.
Follow us on Twitter: AES_TU_Berlin
Follow us on Twitter: http://twitter.com/#!/AES_TU_Berlin. You can read our messages without registering.
Oct. 23rd, 11h, KIT: High Efficiency Video Coding on Multi- and Many-core Architectures by M. Alvarez-Mesa
Dr. Mauricio Alvarez-Mesa will give an invited talk at the Karlsruhe Institute of Technology titled "High Efficiency Video Coding on Multi- and Many-core Architectures", in which he will present the latest results of the AES research on HEVC decoding on parallel architectures. The talk will be held on October 23rd at 11:00 a.m. at the Karlsruher Institut für Technologie (KIT), Institut für Prozessdatenverarbeitung und Elektronik (IPE), Karlsruhe, Germany.
11 Oct. 2012, 10h, EN 642: Scalable Runtime and OS Abstractions for Mesh-Based MultiCores (Prof. Frank Mueller)
Current trends in microprocessors are to steadily increase the number of cores. As the core count increases, the network-on-chip (NoC) topology has evolved from buses, via rings and fully connected meshes, to 2D meshes.
This work contributes NoCMsg, a low-level message-passing abstraction over NoCs. NoCMsg is specifically designed for large core counts in 2D meshes. Its design ensures deadlock-free messaging for wormhole Manhattan-path routing over the NoC. Experimental results on the TilePro hardware platform show that NoCMsg can significantly reduce communication times compared with other NoC-based message-passing approaches. They further demonstrate the potential of NoC messaging to outperform shared-memory abstractions, such as OpenMP, as core counts and inter-process communication increase.
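Deadlock freedom on a mesh is tied to how messages are routed. A common instance of Manhattan-path (minimal) routing on a 2D mesh is dimension-order XY routing: a message first travels along X, then along Y. Because no message ever turns from the Y dimension back into X, no cyclic dependency between channels can form. The sketch below assumes XY routing as the concrete policy; whether NoCMsg or the TilePro network uses exactly this order is not stated in the abstract, and the function names are hypothetical.

```python
def xy_route(src, dst):
    """Tiles (x, y) visited from src to dst, inclusive, routing X first, then Y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    step = 1 if dx > x else -1
    while x != dx:
        x += step
        path.append((x, y))
    step = 1 if dy > y else -1
    while y != dy:
        y += step
        path.append((x, y))
    return path


def is_dimension_ordered(path):
    """True if the path never moves in X after it has started moving in Y,
    i.e. it contains no Y-to-X turn (the turn XY routing forbids)."""
    moved_y = False
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        if y1 != y0:
            moved_y = True
        elif x1 != x0 and moved_y:
            return False
    return True


if __name__ == "__main__":
    route = xy_route((3, 2), (1, 0))
    print(route)                        # [(3, 2), (2, 2), (1, 2), (1, 1), (1, 0)]
    print(len(route) - 1)               # 4 hops = Manhattan distance
    print(is_dimension_ordered(route))  # True
```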
This work further explores the benefits of novel runtime and operating systems abstractions for large scale multicores. On top of NoCMsg, a distributed OS abstraction is promoted instead of the traditional shared memory view on a chip. This distributed kernel features a pico-kernel per core. Sets of pico-kernels are controlled by micro-kernels, which are topologically centered within a set of cores. Cooperatively, micro-kernels comprise the overall operating system in a peer-to-peer fashion.
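To make "topologically centered" concrete, the sketch below partitions a mesh into square groups of cores and places each group's micro-kernel on the core with the smallest total Manhattan (hop) distance to the other cores of its group. The mesh size, the group size and the distance criterion are assumptions for illustration, not details of the presented system; the function names are hypothetical.

```python
def partition_mesh(width, height, block):
    """Split a width x height mesh into groups of at most block x block cores."""
    groups = []
    for bx in range(0, width, block):
        for by in range(0, height, block):
            groups.append([(x, y)
                           for x in range(bx, min(bx + block, width))
                           for y in range(by, min(by + block, height))])
    return groups


def topological_center(cores):
    """Core with the smallest summed Manhattan distance to all cores of the set
    (ties are broken by sorted order, so the result is deterministic)."""
    def total_distance(c):
        return sum(abs(c[0] - o[0]) + abs(c[1] - o[1]) for o in cores)
    return min(sorted(cores), key=total_distance)


if __name__ == "__main__":
    for group in partition_mesh(6, 6, 3):
        print(topological_center(group))  # one micro-kernel location per group
```

Placing the micro-kernel at the center keeps the worst-case distance between a pico-kernel and its controlling micro-kernel small.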
Biography: Frank Mueller is a Professor in Computer Science and a member of multiple research centers at North Carolina State University. Previously, he held positions at Lawrence Livermore National Laboratory and Humboldt University Berlin, Germany. He received his Ph.D. from Florida State University in 1994. He has published papers in the areas of parallel and distributed systems, embedded and real-time systems and compilers. He is a member of ACM SIGPLAN, ACM SIGBED and a senior member of the ACM and IEEE Computer Societies as well as an ACM Distinguished Scientist. He is a recipient of an NSF Career Award, an IBM Faculty Award, a Google Research Award and a Fellowship from the Humboldt Foundation.
11 Oct. 2012: Courses in the winter semester 2012/13
Sept. 30 - Oct. 3, 2012: "Improving the Parallelization Efficiency of HEVC Decoding" at ICIP 2012
The paper "Improving the Parallelization Efficiency of HEVC Decoding" by C. C. Chi, M. Alvarez-Mesa, B. Juurlink, V. George and T. Schierl has been accepted at the 2012 IEEE International Conference on Image Processing (ICIP), which will be held in Orlando, Florida, USA, on Sept. 30 - Oct. 3, 2012. This paper is the second resulting from a collaboration between the AES group and the Multimedia Communications Group of the Fraunhofer HHI Institute on parallel processing for HEVC. More information about ICIP-2012 can be found at http://icip2012.com.
Sept 10, 2012: "Hardware-Based Task Dependency Resolution for the StarSs Programming Model" at SRMPDS'12
The paper "Hardware-Based Task Dependency Resolution for the StarSs Programming Model" by Tamer Dallou and Ben Juurlink has been accepted at the "SRMPDS'12 - Eighth International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems", which will be held in conjunction with "ICPP'12 - The 2012 International Conference on Parallel Processing" in Pittsburgh, PA on September 10, 2012.
This paper is a result of the research conducted at AES as part of the ENCORE project. More information on SRMPDS can be found at:
Sept 5-8, 2012: "A Novel Predictor-based Power-Saving Policy for DRAM Memories" at the 15th EUROMICRO Conference on Digital System Design (DSD)
The paper "A Novel Predictor-based Power-Saving Policy for DRAM Memories" by Gervin Thomas, Karthik Chandrasekar, Benny Akesson, Ben Juurlink and Kees Goossens has been accepted at the 15th EUROMICRO Conference on Digital System Design (DSD), which will be held in Cesme, Izmir, Turkey, from September 5th to September 8th, 2012. This paper is the result of a collaboration between the AES group (TU-Berlin) and the Electronic Systems group (TU Eindhoven). More information about DSD-2012 can be found at http://www.univ-valenciennes.fr/congres/dsd2012/.
August 27, 2012: "An Optimized Parallel IDCT on Graphics Processing Units" at HeteroPar'2012
The paper "An Optimized Parallel IDCT on Graphics Processing Units" by Biao Wang, Mauricio Alvarez-Mesa, Chi Ching Chi, and Ben Juurlink has been accepted at the 2012 International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar'2012), which will be held on the island of Rhodes, Greece, on August 27, 2012. The paper presents work on offloading the H.264 IDCT kernel to GPUs, conducted at AES as part of the LPGPU project. More information on HeteroPar can be found at http://pm.bsc.es/heteropar12/.
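For readers unfamiliar with the kernel: the H.264 inverse transform for 4x4 residual blocks uses only integer additions and shifts, applied first to the rows and then to the columns, followed by rounding with (x + 32) >> 6. The Python sketch below shows this per-block computation on coefficients that are assumed to be already dequantized. Since every 4x4 block is independent, a GPU version can, for example, assign one thread per block; that mapping is an assumption for illustration, not necessarily the one used in the paper.

```python
def _butterfly(d0, d1, d2, d3):
    """One-dimensional H.264 4x4 inverse transform (integer adds and shifts)."""
    e0 = d0 + d2
    e1 = d0 - d2
    e2 = (d1 >> 1) - d3
    e3 = d1 + (d3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def idct4x4(block):
    """Inverse-transform a 4x4 block (list of 4 rows) of dequantized coefficients."""
    rows = [_butterfly(*row) for row in block]                           # horizontal pass
    cols = [_butterfly(*(rows[r][c] for r in range(4))) for c in range(4)]  # vertical pass
    # cols[c][r] holds output row r, column c; round and scale by 1/64.
    return [[(cols[c][r] + 32) >> 6 for c in range(4)] for r in range(4)]


if __name__ == "__main__":
    dc_only = [[64, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    print(idct4x4(dc_only))  # every residual sample equals 1
```

Python's `>>` on negative integers is an arithmetic (flooring) shift, which matches the behaviour the transform relies on.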
July 31, 2012: AES has set up a testbed to accurately measure GPU power consumption.
AES has set up a testbed to accurately measure GPU power consumption. This testbed is being used to evaluate power reduction techniques on available GPUs. It will also be used to validate the power modeling of GPUSimPow, the GPU power simulator developed within the LPGPU project. Its high bandwidth and high sampling rate enable it to accurately measure short, sub-ms power events.
The AES-developed measurement software allows developers to pinpoint power consumption down to the individual kernel.
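Attributing energy to a kernel amounts to integrating the sampled power P = V * I over the interval in which the kernel runs. The sketch below assumes synchronously sampled voltage and current for a single supply rail and kernel start/end timestamps in the same time base as the samples; a real setup would sum over all GPU rails, and how the AES software obtains kernel boundaries is not described here. The function names are hypothetical.

```python
def kernel_energy(times_s, volts, amps, start_s, end_s):
    """Energy in joules: trapezoidal integration of P = V * I over the samples
    with start_s <= t <= end_s (only the span covered by samples is counted)."""
    window = [(t, v * i) for t, v, i in zip(times_s, volts, amps) if start_s <= t <= end_s]
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(window, window[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


def average_power(times_s, volts, amps, start_s, end_s):
    """Average power in watts over the kernel interval."""
    return kernel_energy(times_s, volts, amps, start_s, end_s) / (end_s - start_s)


if __name__ == "__main__":
    # Synthetic 10 kHz trace: 12 V rail, current rising from 2 A to 5 A while a
    # kernel runs between 0.3 ms and 0.7 ms.
    times = [k / 10000 for k in range(11)]
    volts = [12.0] * 11
    amps = [5.0 if 3 <= k <= 7 else 2.0 for k in range(11)]
    print(kernel_energy(times, volts, amps, 0.0003, 0.0007))
    print(average_power(times, volts, amps, 0.0003, 0.0007))
```

Short kernels are exactly where a high sampling rate matters: with too few samples inside the interval, the integral is dominated by the boundary samples.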
16-19 July 2012: "Using OpenMP Superscalar for Parallelization of Embedded and Consumer Applications" at SAMOS XII
The paper "Using OpenMP Superscalar for Parallelization of Embedded and Consumer Applications" by M. Andersch, C.C. Chi and Ben Juurlink has been accepted at the 2012 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS), which will be held in Samos, Greece, on July 16-19, 2012. The paper is the latest result of the research on the OpenMP Superscalar programming model conducted at AES as part of the ENCORE project. More information on SAMOS can be found at http://samos.et.tudelft.nl/samos_xii/html/.
July 11, 2012: "Nexus++: A hardware Task Manager for the StarSs Programming Model" at ACACES'12
The poster "Nexus++: A hardware Task Manager for the StarSs Programming Model" by Tamer Dallou and Ben Juurlink was presented at "ACACES'12 - Eighth International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems", which was held in Fiuggi, Italy, on 8-14 July, 2012.
This poster presents some of the results of the research conducted at AES as part of the ENCORE project. More information on ACACES'12 can be found at:
8-14 July 2012: Mr. Tamer Dallou attends ACACES 2012.
Mr. Tamer Dallou was awarded a HiPEAC grant to attend the Eighth International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems, ACACES 2012, 8-14 July 2012, Fiuggi, Italy.
The "HiPEAC Summer School" is a one week summer school for computer architects and compiler builders working in the field of high performance computer architecture and compilation for embedded systems. The school aims at the dissemination of advanced scientific knowledge and the promotion of international contacts among scientists from academia and industry.
AES group purchases TILE-Gx36 many-core.
The AES group of TU Berlin has purchased a TILE-Gx36 many-core processor with 36 64-bit processor cores (tiles) from Tilera (tilera.com). Researchers and students of AES will soon be able to work on this state-of-the-art many-core processor. For more information about the TILE-Gx processor family, see http://tilera.com/products/processors/TILE-Gx_Family.
7.06.12-10h: CPU-Exclusive Group Scheduling (Balanced Performance Computing) - Nicolas Schier
May 12: Article in ACM SIGARCH Computer Architecture News.
Ben Juurlink and his former PhD student Cor Meenderinck have published an article entitled "Amdahl's Law for Predicting the Future of Multicores Considered Harmful" in the current (May 2012) issue of Computer Architecture News, which is published by the ACM Special Interest Group on Computer Architecture (SIGARCH) [http://www.sigarch.org/]. In the article, they consider how the predictions in the influential paper by Hill and Marty [1] change when Gustafson's law is assumed instead of Amdahl's law. They also propose a different scaling equation, called the Generalized Scaled Speedup Equation (GSSE), that encompasses both Amdahl's and Gustafson's law.
[1] Mark D. Hill, Michael R. Marty: Amdahl's Law in the Multicore Era. IEEE Computer 41(7): 33-38 (2008)
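The laws compared in the article can be written down directly. With parallel fraction f and n cores, Amdahl's law (fixed problem size) gives S = 1 / ((1 - f) + f / n), and Gustafson's law (problem size scaled with n) gives S = (1 - f) + f * n. Hill and Marty's symmetric multicore model divides a budget of n base core equivalents (BCEs) into cores of r BCEs each, with per-core performance perf(r) = sqrt(r) as assumed in their paper. The GSSE proposed in the article is not reproduced here, since its form is not given in this announcement.

```python
import math


def amdahl(f, n):
    """Fixed problem size: speedup with parallel fraction f on n cores."""
    return 1.0 / ((1.0 - f) + f / n)


def gustafson(f, n):
    """Scaled problem size: the parallel part grows with n."""
    return (1.0 - f) + f * n


def hill_marty_symmetric(f, n, r):
    """Hill & Marty symmetric multicore under Amdahl's assumptions: n BCEs,
    cores of r BCEs each, per-core performance perf(r) = sqrt(r)."""
    perf = math.sqrt(r)
    return 1.0 / ((1.0 - f) / perf + f * r / (perf * n))


if __name__ == "__main__":
    print(f"{amdahl(0.9, 64):.2f}")     # about 8.77
    print(f"{gustafson(0.9, 64):.2f}")  # about 57.70
    for r in (1, 4, 16):
        print(r, f"{hill_marty_symmetric(0.9, 64, r):.2f}")
```

The gap between the first two numbers for the same f and n illustrates why the choice of scaling law changes the outlook for many-core designs so strongly.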
HiPEAC '13: Call for papers
The 8th HiPEAC conference will take place in Berlin, Germany, from Monday, January 21, to Wednesday, January 23, 2013.
The call for papers is available for download as a PDF.
For submission details, please refer to http://mc.manuscriptcentral.com/taco.
- Workshops/tutorials: June 1, 2012
- Papers: June 18, 2012
- Posters: October 15, 2012
- Early Registration Deadline: December 22, 2012
16 May 12: Prof. Dr. Ben Juurlink gives a keynote at Map2MPSoC/SCOPES.
Prof. Ben Juurlink will give an invited keynote at the 5th Workshop on Mapping of Applications to MPSoCs and 15th International Workshop on Software and Compilers for Embedded Systems, which will be held May 15-16 in the beautiful Schloss Rheinfels hotel at St. Goar, Germany (http://www.scopesconf.org/scopes-12/)
10 May 2012, 10h, Room EN 642: REFLEX (Richard Weickelt)
REFLEX is a framework for deeply embedded control systems. It is based on the event-flow model, which strongly supports component-centric development of concurrent applications. In combination with multiple scheduling directives, interrupt handling and power management facilities, developers can create applications that are both deadlock-free and fully predictable.
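The event-flow idea can be sketched in a few lines: components communicate only through event buffers, and writing to a buffer schedules the consuming activity, which later runs to completion without ever blocking, so no component can wait on another. The Python model below uses hypothetical names (`Scheduler`, `Activity`, `EventQueue`) and a single FIFO scheduling policy; it is not the REFLEX C++ API and omits its multiple scheduling directives, interrupt handling and power management.

```python
from collections import deque


class Scheduler:
    """Run-to-completion FIFO scheduler; activities never block."""

    def __init__(self):
        self.ready = deque()

    def run(self):
        while self.ready:
            activity = self.ready.popleft()
            activity.pending = False
            activity.run()


class Activity:
    def __init__(self, scheduler):
        self.scheduler = scheduler
        self.pending = False

    def trigger(self):
        # Schedule at most once, no matter how many events arrive meanwhile.
        if not self.pending:
            self.pending = True
            self.scheduler.ready.append(self)

    def run(self):
        raise NotImplementedError


class EventQueue:
    """FIFO event buffer: assigning a value triggers the consuming activity."""

    def __init__(self, consumer):
        self.items = deque()
        self.consumer = consumer

    def assign(self, value):
        self.items.append(value)
        self.consumer.trigger()


class Scale(Activity):
    def __init__(self, scheduler, factor, output):
        super().__init__(scheduler)
        self.input = EventQueue(self)
        self.factor = factor
        self.output = output

    def run(self):
        while self.input.items:
            self.output.assign(self.input.items.popleft() * self.factor)


class Logger(Activity):
    def __init__(self, scheduler):
        super().__init__(scheduler)
        self.input = EventQueue(self)
        self.log = []

    def run(self):
        while self.input.items:
            self.log.append(self.input.items.popleft())


if __name__ == "__main__":
    sched = Scheduler()
    sink = Logger(sched)
    scale = Scale(sched, 3, sink.input)
    scale.input.assign(1)  # e.g. raised by an interrupt handler
    scale.input.assign(2)
    sched.run()
    print(sink.log)        # [3, 6]
```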
The library is implemented in C++ and benefits from its powerful language features. Only a few parts are platform-dependent, and these can be ported to new architectures with very little effort. A standard compiler such as g++ is the only requirement.
REFLEX was developed at TU Cottbus and is released under the BSD license. In this meeting you will get a brief overview of the framework and its features. After a case study of a real-world product, future research challenges will be discussed.
12.04.2012- 10h: Online satellite image processing (Kristian Manthey)
Mr. Kristian Manthey will give a talk on "Online satellite image processing (Realtime Image compression on reconfigurable Hardware)" on 12.04.2012 at 10 a.m. as part of our research meeting. Room: EN 642.
Abstract: Optical systems for spaceborne missions face challenging requirements. In recent years, both the spatial and the spectral resolution of image data have increased, resulting in a tremendous increase in data rate. There are also requirements on image quality and constraints resulting from the environment in which the system is used: an optical system for spaceborne applications must have very high reliability, low power consumption and low weight, and it must be radiation tolerant and able to operate in vacuum over a wide temperature range. As the ground sample distance (GSD) decreases or the swath increases, the amount of data grows significantly. Because the transmission bandwidth to the ground station is limited, the data must be compressed. Depending on the mission requirements, lossless or lossy compression schemes can be used.
Image compression is based on removing redundant information from the image, such as spatial or statistical redundancy, or removing information that is not needed for further processing. Image compression architectures consist of spatial decorrelation to remove spatial redundancy, followed (in the case of lossy compression) by quantization, and finally entropy coding to remove statistical redundancy. In typical space missions, spatial decorrelation is done by prediction (DPCM), the discrete cosine transform (DCT) or the discrete wavelet transform (DWT). To achieve the best compression results, inter-band decorrelation techniques are necessary, because image data are correlated between bands, particularly when multispectral (MS) sensors are used in combination with a sensor that is sensitive in all MS channels. DLR plans to develop a satellite camera that performs all tasks (image acquisition, pre-processing, compression, storage, data formatting and communication with the ground station) on a single multi-chip module (MCM).
In a first step, image compression is to be performed directly on the image acquisition module. The goal of this thesis is to investigate scenarios in which the ground station interactively requests and decompresses the image data, and to develop a high-speed image compression system on the image acquisition module.
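The simplest of the decorrelation methods named above, lossless DPCM, can be sketched as follows: each sample is predicted from its left neighbour, only the prediction residual is kept, and the decoder reverses the process exactly. The zero-order entropy shows how much statistical redundancy an entropy coder could then remove. The left-neighbour predictor, the zero prediction for the first sample and the function names are assumptions for illustration; the entropy coder itself is not shown.

```python
import math
from collections import Counter


def dpcm_encode(row):
    """Lossless DPCM: residual = sample - left neighbour (0 for the first sample)."""
    residuals = []
    prediction = 0
    for sample in row:
        residuals.append(sample - prediction)
        prediction = sample
    return residuals


def dpcm_decode(residuals):
    """Invert dpcm_encode exactly by accumulating the residuals."""
    row = []
    prediction = 0
    for r in residuals:
        sample = prediction + r
        row.append(sample)
        prediction = sample
    return row


def entropy_bits(values):
    """Zero-order empirical entropy in bits per symbol."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    row = [100, 102, 103, 105, 106, 108, 109, 111]  # a smooth image line
    res = dpcm_encode(row)
    print(res)  # [100, 2, 1, 2, 1, 2, 1, 2]
    print(f"{entropy_bits(row):.3f} -> {entropy_bits(res):.3f} bits/sample")
```

For smooth image content the residuals cluster around small values, which is what makes the subsequent entropy coding effective.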
25-29 March 2012: M. Alvarez-Mesa presents a paper at ICASSP-2012 in Kyoto, Japan
The paper "Parallel video decoding on the emerging HEVC standard" by M. Alvarez-Mesa, C. C. Chi, B. Juurlink, V. George and T. Schierl has been accepted at the 37th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), which will be held in Kyoto, Japan, on March 25 - 30, 2012. ICASSP is one of the largest technical conferences focused on signal processing and its applications. The paper, which is the result of a collaboration between the AES group and the Multimedia Communications Group of the Fraunhofer HHI Institute, will be presented by Mauricio Alvarez-Mesa in the session "Parallel and embedded signal processing systems". More information about ICASSP-2012 can be found at http://www.icassp2012.com.
26-27 March 2012: Prof. Dr. Ben Juurlink and Sean Halle present their progress in the LPGPU project in Cambridge
Prof. Dr. Ben Juurlink and Sean Halle are going to Cambridge for the first LPGPU face-to-face meeting on March 26 and 27. They will discuss interactions between the work packages and the low-power industry space, and tackle simulator questions. Each participant will present their progress in the first year of LPGPU in preparation for the first-year review.
25-29 Feb. 2012: Michael Andersch presents a poster at PPoPP in New Orleans.
The paper "Programming Parallel Embedded and Consumer Applications in OpenMP Superscalar" by Michael Andersch, Chi Ching Chi, and Ben Juurlink was accepted as a poster presentation at the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). The student Michael Andersch will present the poster in New Orleans from February 25 to February 29, 2012. For more information about the PPoPP conference, see http://dynopt.org/ppopp-2012/.
1.01.12 - 15.02.12: EIT ICT Master School launches
The application period for the new Master School runs from January 1 to February 15, 2012. Further information at eitictlabs.masterschool.eu
05.10.2011 - 06.10.2011: LPGPU kick-off meeting
05.07.11 - 07.07.2011: ENCORE F2F meeting at TU Berlin
10.01.2011 -16h: A System-Level Approach to Parallelism (Sean Halle)
Talk announcement: A System-Level Approach to Parallelism (Sean Halle)
Tuesday, January 11, 2011, at 4 p.m. in E-N 360.
02.11.2010: EU launches ENCORE project to develop technologies for future heterogeneous multi-core platforms. ENCORE Press Release 1.
06.07.2010, 15h: Hardware Task Management Support for Task-Based Programming Models: the Nexus System (M.Sc. Cor Meenderinck)
Talk announcement: Hardware Task Management Support for Task-Based Programming Models: the Nexus System (M.Sc. Cor Meenderinck)
Wednesday, July 14, 2010, at 3 p.m. in FR3043
31.03.2010: Courses in the summer semester 2010
The courses offered by our group can be found in the Studium und Lehre (Teaching) section. We would especially like to point out the master's module "Advanced Computer Architectures" for computer science and computer engineering students, which is offered for the first time this semester.
8.03.10, 10h: New architectures for the final scaling of the CMOS world (Professor Luigi Carro)
Talk announcement: New architectures for the final scaling of the CMOS world (Professor Luigi Carro). Monday, March 8, 2010, at 10 a.m. in FR5516.
17.02.2010 - 10h: Evaluation of Parallel H.264 Decoding Strategies on the Cell Broadband Engine (Mr. Chi Ching Chi)
Talk announcement: Evaluation of Parallel H.264 Decoding Strategies on the Cell Broadband Engine (Mr. Chi Ching Chi). Wednesday, February 17, 2010, at 10 a.m. in FR 3043.
12.01.2010: Oral examination in TechGI2 (second resit)
Starting in the summer semester 2010, the module Technische Grundlagen der Informatik 2 (TechGI2) will be taken over by Prof. Juurlink, the new head of the Embedded Systems Architecture (AES) group. He will make some changes to how the content prescribed in the module description is taught, and these changes will also be reflected in the examination questions.
Mr. Flik, who has taught the module so far, loses his examination authorization at the end of the winter semester 2009/10; after that, oral examinations on the previous content will no longer be possible.
For students currently interested in such an oral examination, Mr. Flik offers examination dates until mid-March 2010. The examination days will be fixed once the first examination requests have been received (flik(at)cs.tu-berlin.de). Please state your program of study, your matriculation number and your earliest preferred date.
The actual examination date will only be assigned after the examination registration required by the examination office has been submitted. This registration must be received at least 7 days before the examination date (at the AES or RT secretariat).
01.01.2010: As of today, Professor Dr. Ben Juurlink holds the chair of the department Computer Architecture – Embedded Systems Architecture.
27.11.2009: Professor Dr. Ben Juurlink accepts the appointment.
Professor Dr. Ben Juurlink, professor at Delft University of Technology, the Netherlands, has accepted the appointment to the W3 professorship for the department Computer Architecture – Embedded Systems Architecture.