UCLA - University of California, Los Angeles / OSU - Ohio State University
THURSDAY (02/10/2014) – 16:30 – 18:00 (Auditorium 1 and Auditorium 2)
Optimizing Compilers for High-Performance Computing
For application developers, the era of improved performance "for free" through increased clock speed and on-chip instruction-level parallelism is over. We have seen dramatic and disruptive changes to the computing landscape: multi-core processors have become ubiquitous, and power density and energy considerations have become the primary constraints driving technology directions. Customized accelerators, as illustrated by the work of the UCLA-led Center for Domain-Specific Computing, emerge as a key component for achieving the power efficiency demanded by next-generation computing devices. Non-homogeneous CPU cores and the ever-increasing complexity of Systems-on-Chip are on the roadmap of most manufacturers. In short, computing platforms are now heterogeneous, after decades of mass-marketed homogeneous single-core x86 processors. These developments have dramatically changed the role and expectations of optimizing compilers: we expect them to deliver high application performance on a wide variety of targets, which is absolutely critical for embedded and mainstream computing as well as for the push to exascale computing at the high end. Three key challenges have arisen from these changes, and they must be tackled to deliver the long-anticipated energy and execution-time improvements these architectures were designed for.
Performance portability, to improve developer productivity. Currently an application must be manually tuned, or sometimes even redesigned, for each specific target. We are in desperate need of automated tools that help achieve good performance on a wide variety of targets from a single input source.
Application modeling, to bridge the semantic gap between the application designer and the input to the compilation toolchain. Achieving performance portability requires aggressive program restructuring, which can be done effectively only if the compiler operates on a high-level, semantically rich representation of the implemented algorithms.
Hardware acceleration, to leverage the purpose-optimized computing capabilities of the hardware for a given application. This requires transforming and partitioning the input application to best exploit all hardware accelerators, but also designing and/or configuring the hardware to match the specific computation patterns of an application.
In this talk I will present some current and upcoming research to address the above challenges, crossing the boundaries between application design, compiler optimization, and architectural configuration.
Dr. Louis-Noel Pouchet is currently a Visiting Assistant Professor at the University of California, Los Angeles. He is an active member of the NSF Center for Domain-Specific Computing (CDSC), working on both software and hardware customization. His research focuses on domain-specific languages and compilers for scientific computing, and he has designed numerous optimizing compilation approaches to effectively map applications to CPUs, FPGAs, and SoCs. He is also the author of PolyOpt, PoCC, and PolyBench, three software packages dedicated to polyhedral compilation.
Inria Grenoble Compiler Group
FRIDAY (03/10/2014) – 09:00 – 10:30 (Auditorium 1 and Auditorium 2)
Source-to-source versus back-end / static versus dynamic: Making the most of both, or why compiler architecture must be deeply revisited
Languages, compilers, and run-time systems are among the most important components bridging the gap between applications and hardware. With the continuously increasing power of computers, expectations are evolving toward ever more ambitious, computationally intensive, and complex applications. As desktop PCs become a niche and servers mainstream, three categories of computing will dominate the next decade: mobile, cloud, and super-computing. Diversity, heterogeneity (even on a single chip), and, as a consequence, virtualization of hardware and systems are thus putting more and more pressure on both compilers and run-time systems. Moreover, because of the energy wall, architectures are becoming ever more complex and parallelism ubiquitous at all levels, while the memory-CPU gap continues to widen. To address the performance and energy-consumption challenges raised by silicon companies, compilers and run-time systems must change and, in particular, interact, taking into account the complexity of the host architecture.
Fabrice Rastello received his PhD degree in computer science from ENS Lyon in 2000. He then worked as an engineer for STMicroelectronics for two years before becoming a full-time researcher at Inria, ENS Lyon, France. He is now the team leader of the Inria Grenoble Compiler Group. His past research was on compiler optimizations for DSP/VLIW/media-oriented embedded processors, with a focus on SSA-based optimizations. He is editor and author of the book "SSA-based Compiler Design" (to be published by Springer in early 2015). His expertise also includes affine loop transformations such as tiling, the subject of his PhD thesis.
Recently he started working on: 1. hybrid (both static and dynamic) analyses, and hybrid compilation for loop transformations at the machine level; 2. automatic tools for analyzing the I/O complexity of algorithms, with an application to hardware co-design; 3. software memory management.