Workshop on Integrating HPC and FPGAs
Organizers: Kazutomo Yoshii (Argonne National Laboratory), Taisuke Boku (Center for Computational Sciences, University of Tsukuba), Franck Cappello (Argonne National Laboratory), RIKEN R-CCS
Date: December 11 (Tue) afternoon, 2018
Location: Tenbusu-Naha Hall 3-2-10, Makishi, Naha-shi, Okinawa, Japan http://www.fpt18.sakura.ne.jp/venue.html
Room: Tenbusu Hall, 4th floor
Traditionally, high-performance computing (HPC) platforms have been designed for relatively regular and floating-point-intensive workloads (e.g., LINPACK). Because of emerging artificial intelligence, big data, and data analytics requirements, HPC platforms need to change to accommodate such workloads as well as traditional ones. Since FPGAs have already demonstrated their acceleration potential for these new HPC requirements, integrating FPGAs into HPC is a natural step. In this workshop, HPC experts who have been evaluating reconfigurable architectures and post-Moore technologies for HPC will discuss the challenges and opportunities of such integration. This workshop is co-held with the International Workshop on FPGA for HPC (IWFH), the Joint Laboratory on Extreme Scale Computing (JLESC), and the Field-Programmable Technology Workshop (FPT'18). Audience participation is highly encouraged.
Opening and welcome by Kazutomo Yoshii
Dataflow and Task-Parallelism: Programming models for FPGA accelerated HPC
Recently, FPGAs have been attracting attention as alternative devices for accelerating HPC applications. The dataflow computing model has been a popular abstraction of computation, at both fine and coarse grain, for decades, and it is used as a programming model for FPGAs in systems such as the Maxeler DFE and SPGen (Stream Processor Generator). While these are essentially “static” dataflow models, it may be interesting to extend them with “dynamic” dataflow, as in classic dataflow architectures, to handle the dynamic behavior of systems. On the other hand, global programming models that integrate FPGA computing into the parallel computing of host processors are also important. The OpenMP task and target directives, recently introduced in OpenMP 4.5, can be extended to specify the interface to computation offloaded to an FPGA. Moreover, optimization for FPGAs requires different metrics, such as hardware resource usage, which are very different from those used when optimizing for CPUs and GPUs. In this talk, issues in programming models for FPGA-accelerated HPC are presented.
Mitsuhisa Sato (RIKEN)
Mitsuhisa Sato is a deputy director of the RIKEN Center for Computational Science (R-CCS, renamed from AICS) and, since 2014, the team leader of the architecture development team in the FLAGSHIP 2020 project to develop the Japanese flagship supercomputer at RIKEN. He received the M.S. and Ph.D. degrees in information science from the University of Tokyo in 1984 and 1990, respectively. From 2001 he was a professor in the Graduate School of Systems and Information Engineering, University of Tsukuba, and he served as director of the Center for Computational Sciences, University of Tsukuba, from 2007 to 2013. In October 2010 he was appointed research team leader of the programming environment research team at the Advanced Institute for Computational Science (AICS), RIKEN. He is a professor (Cooperative Graduate School Program) and professor emeritus of the University of Tsukuba.
Hardware and software codesign on OmpSs@FPGA
Recent developments on OmpSs@FPGA have allowed us to increase the performance of the matrix multiplication benchmark by up to 3x over the last year on the Xilinx Zynq UltraScale+ FPGA (AXIOM board). In particular, for smaller, fine-grain blocks of 128x128 single-precision floating-point values, we reach a 1.6x improvement using software-only task dependence management. When we include hardware support for task dependence management, we reach a 2x performance increase. And when using three 256x256 dataflow-optimized blocks with software-only task management, we reach a 3x increase in performance.
The performance increase has come from several factors that we will examine in this presentation: combining task parallelism on the SMP and the FPGA with the "implements" approach; increasing the number of FPGA IP instances for matrix multiply computations; optimizing the FPGA IP with the Xilinx Vivado HLS directive extensions; and incorporating the task dependence management system into the FPGA ("Picos").
This evolution has been possible thanks to improvements to autoVivado, the OmpSs framework that supports automatic code generation for the FPGA. The autoVivado framework offloads the code annotated for the FPGA device to be compiled with the Vivado High-Level Synthesis tool, simplifying the work the programmer needs to do to run different codes on FPGAs. It also supports the integration of the Picos manager as a way of accelerating the task management system, also with the goal of supporting FPGA-only nodes for HPC and data centers.
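A rough sketch of the "implements" approach described above: two implementations of the same task, one for the SMP cores and one for the FPGA, with the runtime free to schedule either. The directive spellings below follow the OmpSs style but are illustrative, not a verbatim OmpSs@FPGA program; a plain C compiler ignores the pragmas and runs everything on the host.

```c
/* Illustrative OmpSs-style sketch of the "implements" pattern (directive
 * spellings are assumptions, not verbatim OmpSs@FPGA syntax). */
#include <assert.h>

#define BS 4  /* toy block size; the talk uses 128x128 and 256x256 blocks */

/* SMP implementation of one blocked matrix-multiply task. */
#pragma omp task in([BS*BS]a, [BS*BS]b) inout([BS*BS]c)
void matmul_block_smp(const float *a, const float *b, float *c)
{
    for (int i = 0; i < BS; i++)
        for (int k = 0; k < BS; k++)
            for (int j = 0; j < BS; j++)
                c[i*BS + j] += a[i*BS + k] * b[k*BS + j];
}

/* Alternative FPGA implementation of the same task: with OmpSs@FPGA this
 * body would be synthesized by Vivado HLS into an FPGA IP instance, and
 * the runtime could dispatch the task to either device. Here it just
 * reuses the host kernel so the sketch stays runnable. */
#pragma omp target device(fpga) implements(matmul_block_smp)
#pragma omp task in([BS*BS]a, [BS*BS]b) inout([BS*BS]c)
void matmul_block_fpga(const float *a, const float *b, float *c)
{
    matmul_block_smp(a, b, c);
}
```

The point of the pattern is that task dependences (in/inout) are declared once per task, so the dependence manager, whether software-only or hardware-assisted like Picos, can schedule blocks on the SMP and the FPGA concurrently.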
Xavier Martorell (BSC)
Xavier Martorell received the M.S. and Ph.D. degrees in Computer Science from the Universitat Politecnica de Catalunya (UPC) in 1991 and 1999, respectively. Since 1992 he has lectured on operating systems, parallel runtime systems, OS administration, and systems for data science. He has been an associate professor in the Computer Architecture Department at UPC since 2001. His research interests cover the areas of operating systems, runtime systems, compilers, and applications for high-performance multiprocessor systems. Dr. Martorell has participated in several long-term research projects with other universities and with industry, primarily in the framework of the European Union ESPRIT, IST, FET, and H2020 programs. He is currently participating in the LEGaTO, EuroEXA, and EPEEC projects, related to FPGA computing. He has coauthored more than 80 publications in international journals and conferences. He has co-advised eight Ph.D. theses and is currently advising three Ph.D. students. He is currently the manager of the Parallel Programming Models team at the Barcelona Supercomputing Center.
Autonomic Management of Reconfigurations in DPR FPGA-based Systems
FPGA-based architectures can offer high flexibility through dynamic partial reconfiguration (DPR) features. This enables switching at runtime between different computations, or between different implementations of the same computation, with different characteristics in resource usage (e.g., FPGA surface) and performance (e.g., QoS). In turn, this allows the system to adapt itself dynamically to uncertainties in the environment or in the data being processed.
In this presentation we consider self-adaptive embedded systems, such as UAVs, involving offline provisioning of several implementations of the embedded functionalities (e.g., video processing). We propose an autonomic management architecture for self-adaptive and self-reconfigurable FPGA-based embedded systems. The control architecture, responsible for deciding on and choosing configurations, is structured in three layers: a mission manager, a reconfiguration manager, and a scheduling manager. In this work we focus on the design of the reconfiguration manager, and we propose a design approach using automata-based discrete control.
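A minimal sketch of what an automata-based reconfiguration manager can look like, under assumptions of our own (the configurations, events, and area policy below are hypothetical, not from the talk): each automaton state is a loaded configuration, events arrive from the mission manager, and the controller rejects transitions that would exceed the area budget of the reconfigurable region.

```c
/* Hypothetical discrete controller for DPR: configurations as automaton
 * states, with a control invariant on FPGA area usage. */
#include <assert.h>

enum config { CFG_IDLE, CFG_VIDEO_LQ, CFG_VIDEO_HQ };
enum event  { EV_START, EV_RAISE_QOS, EV_LOW_BATTERY };

/* Assumed area cost (% of reconfigurable surface) per configuration. */
static const int area[] = { 0, 40, 90 };

/* Transition function: next configuration given the current state, an
 * event from the mission manager, and the remaining area budget. */
enum config step(enum config s, enum event e, int area_budget)
{
    enum config next = s;
    switch (e) {
    case EV_START:       next = CFG_VIDEO_LQ; break;
    case EV_RAISE_QOS:   next = CFG_VIDEO_HQ; break;
    case EV_LOW_BATTERY: next = CFG_VIDEO_LQ; break;
    }
    /* Control invariant: never load a bitstream that overflows the
     * reconfigurable region; otherwise stay in the current state. */
    return area[next] <= area_budget ? next : s;
}
```

In the automata-based approach, such invariants are not hand-coded checks but are enforced by a controller synthesized from the automaton model; the sketch only shows the shape of the resulting transition logic.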
We will also draw perspectives on the use of DPR FPGAs in the context of data centers, as a shared resource for acceleration.
Eric Rutten (INRIA)
Eric Rutten (Ph.D. 1990 and Hab. 1999, University of Rennes, France) works at INRIA in Grenoble in the field of autonomic computing. He currently works on model-based control of self-adaptive and reconfigurable computing systems, using techniques from control theory (particularly discrete event systems), ranging from embedded systems to distributed cloud systems and high-performance computing. Recent results concern the regulation of parallelism and mapping in software transactional memory systems, reconfiguration control for self-adaptive software component-based architectures, and control of dynamically partially reconfigurable FPGAs.
Towards Production HPC with FPGA-Centric Clouds and Clusters
In his SIAM-PP keynote, Paul Coteus described the future of HPC as being, in part, Compute Everywhere. Our take is as follows: As Moore’s Law ends, performance gains will come from having compute capability wherever data moves. Also, designs will be not only application driven, but also adaptable to the application. FPGAs are the ideal devices to explore this future (and perhaps also to implement it). Since the hardware adapts to the application, rather than the reverse, we can obtain very high efficiency at low power. And since FPGAs are hybrid communication-computation processors that can be interconnected directly chip-to-chip, large-scale communication can proceed with both high bandwidth and low latency.
In this talk I give an overview of our work on high-performance computing with FPGA-centric clouds and clusters, built around ten questions.
Martin Herbordt (Boston University)
Martin Herbordt is Professor of Electrical and Computer Engineering at Boston University where he directs the Computer Architecture and Automated Design Lab. He and his group have been working for many years in accelerating HPC applications with FPGAs. More recently their focus has been on system aspects of FPGA clusters and clouds, the latter especially in Bump-in-the-Wire configurations.
RECONF-HPC: Workshop on Reconfigurable High-Performance Computing, held in the same room (Tenbusu Hall, 4th floor)