CaltechAUTHORS
  A Caltech Library Service

Fast Arbitrary Precision Floating Point on FPGA

de Fine Licht, Johannes and Pattison, Christopher A. and Ziogas, Alexandros Nikolaos and Simmons-Duffin, David and Hoefler, Torsten (2022) Fast Arbitrary Precision Floating Point on FPGA. In: 2022 IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE, Piscataway, NJ, pp. 1-9. ISBN 978-1-6654-8332-2. https://resolver.caltech.edu/CaltechAUTHORS:20220614-222241000

PDF (Accepted Version, 610kB). See Usage Policy.


Abstract

Numerical codes that require arbitrary precision floating point (APFP) numbers for their core computation are dominated by elementary arithmetic operations, because the complexity of multiplication grows super-linearly in the number of mantissa bits. On conventional software-based architectures, APFP computations are made exceedingly expensive by the lack of native hardware support: elementary operations must be emulated using instructions that operate on machine-word-sized blocks. In this work, we show how APFP multiplication on compile-time fixed-precision operands can be implemented as deep FPGA pipelines using a recursively defined Karatsuba decomposition on top of native DSP multiplication. Comparing our design, implemented on an Alveo U250 accelerator, to a dual-socket 36-core Xeon node running the GNU Multiple Precision Floating-Point Reliable (MPFR) library, we achieve a 9.8× speedup at 4.8 GOp/s for 512-bit multiplication and a 5.3× speedup at 1.2 GOp/s for 1024-bit multiplication, corresponding to the throughput of more than 351 and 191 CPU cores, respectively. We apply this architecture to general matrix-matrix multiplication, yielding a 10× speedup at 2.0 GOp/s over the Xeon node, equivalent to more than 375 CPU cores, effectively allowing a single FPGA to replace a small CPU cluster. Because some numerical codes, such as semidefinite program solvers, depend heavily on APFP, we expect these gains to translate into real-world speedups. Our configurable and flexible HLS-based code provides a high-level software interface for plug-and-play acceleration and is published as an open-source project.
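The core idea in the abstract, splitting a wide multiplication recursively with Karatsuba until the pieces are narrow enough for a native multiplier, can be illustrated in software. The Python sketch below is not the authors' HLS implementation; it multiplies non-negative integers standing in for mantissas, and `karatsuba` and its cutoff `base_bits` are hypothetical names, with `base_bits` standing in for the width of a native DSP multiplier. Each level replaces the four half-width products of schoolbook splitting with three. Because the operand width `bits` is fixed up front (mirroring compile-time fixed precision), the whole recursion tree is known in advance.

```python
import random


def karatsuba(a: int, b: int, bits: int, base_bits: int = 32) -> int:
    """Multiply non-negative integers a, b assumed to fit in `bits` bits.

    `base_bits` is a hypothetical cutoff standing in for the width of a
    native hardware multiplier (e.g. an FPGA DSP block). At or below it,
    the product is computed directly instead of being split further.
    """
    if base_bits < 3:
        raise ValueError("base_bits must be at least 3 for the recursion to shrink")
    if bits <= base_bits:
        return a * b  # base case: one "native" multiplication

    half = bits // 2       # width of the low halves
    hi_bits = bits - half  # width of the high halves (>= half)
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    # Three recursive products instead of the four of schoolbook splitting.
    z0 = karatsuba(a_lo, b_lo, half, base_bits)
    z2 = karatsuba(a_hi, b_hi, hi_bits, base_bits)
    # The sums can carry into one extra bit, so widen that sub-problem by 1.
    z1 = karatsuba(a_lo + a_hi, b_lo + b_hi, hi_bits + 1, base_bits) - z0 - z2

    # Recombine: a*b = z2 * 2^(2*half) + z1 * 2^half + z0
    return (z2 << (2 * half)) + (z1 << half) + z0


if __name__ == "__main__":
    random.seed(0)
    for width in (512, 1024):  # operand widths quoted in the abstract
        x = random.getrandbits(width)
        y = random.getrandbits(width)
        assert karatsuba(x, y, width) == x * y
    print("ok")
```

Lowering `base_bits` deepens the recursion. Karatsuba trades multiplications for extra additions and shifts at every level, so in hardware the choice of cutoff balances multiplier usage against adder logic, a trade-off this sketch does not model.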


Item Type: Book Section
Related URLs:
  DOI (Article): https://doi.org/10.1109/FCCM53951.2022.9786219
  arXiv (Discussion Paper): https://arxiv.org/abs/2204.06256
ORCID:
  de Fine Licht, Johannes: 0000-0002-1500-7411
  Simmons-Duffin, David: 0000-0002-2937-9515
  Hoefler, Torsten: 0000-0001-9611-7171
Additional Information: © 2022 IEEE. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme grant agreement no. 101002047 and from the European High-Performance Computing Joint Undertaking (JU) under grant agreement no. 101034126. Christopher A. Pattison is supported by Air Force Office of Scientific Research (AFOSR), FA9550-19-1-0360, and thanks Dustin Kenefake for inspiring discussions. David Simmons-Duffin is supported by Simons Foundation grant 488657 (Simons Collaboration on the Non-perturbative Bootstrap) and a DOE Early Career Award under grant no. DE-SC0019085.
Group: Walter Burke Institute for Theoretical Physics, Institute for Quantum Information and Matter
Funders (Funding Agency: Grant Number):
  European Research Council (ERC): 101002047
  European High-Performance Computing Joint Undertaking (JU): 101034126
  Air Force Office of Scientific Research (AFOSR): FA9550-19-1-0360
  Simons Foundation: 488657
  Department of Energy (DOE): DE-SC0019085
DOI: 10.1109/fccm53951.2022.9786219
Record Number: CaltechAUTHORS:20220614-222241000
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20220614-222241000
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 115149
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 14 Jun 2022 22:43
Last Modified: 14 Jun 2022 22:43
