
Fast convolutional neural networks on FPGAs with hls4ml

Aarrestad, Thea and Loncar, Vladimir and Ghielmetti, Nicolò and Pierini, Maurizio and Summers, Sioni and Ngadiuba, Jennifer and Petersson, Christoffer and Linander, Hampus and Iiyama, Yutaro and Di Guglielmo, Giuseppe and Duarte, Javier and Harris, Philip and Rankin, Dylan and Jindariani, Sergo and Pedro, Kevin and Tran, Nhan and Liu, Mia and Kreinar, Edward and Wu, Zhenbin and Hoang, Duc (2021) Fast convolutional neural networks on FPGAs with hls4ml. Machine Learning: Science and Technology, 2 (4). Art. No. 045015. ISSN 2632-2153. doi:10.1088/2632-2153/ac0ea1.

PDF - Published Version
Creative Commons Attribution.

PDF - Accepted Version
See Usage Policy.


We introduce an automated tool for deploying ultra low-latency, low-power deep neural networks with convolutional layers on field-programmable gate arrays (FPGAs). By extending the hls4ml library, we demonstrate an inference latency of 5 µs using convolutional architectures, targeting microsecond latency applications like those at the CERN Large Hadron Collider. Considering benchmark models trained on the Street View House Numbers Dataset, we demonstrate various methods for model compression in order to fit the computational constraints of a typical FPGA device used in trigger and data acquisition systems of particle detectors. In particular, we discuss pruning and quantization-aware training, and demonstrate how resource utilization can be significantly reduced with little to no loss in model accuracy. We show that the FPGA critical resource consumption can be reduced by 97% with zero loss in model accuracy, and by 99% when tolerating a 6% accuracy degradation.
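The two compression ideas named in the abstract, pruning and reduced-precision weights, can be illustrated with a minimal NumPy sketch. This is not the paper's training procedure and does not use the hls4ml or QKeras APIs: the function names `magnitude_prune` and `quantize_fixed_point`, the 50% sparsity target, and the 6-bit width are all assumptions chosen for illustration. It only shows the kind of weight transformation involved, applied once to a random tensor rather than during training.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    Hypothetical helper: `sparsity` is the fraction of weights to remove.
    """
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))  # number of weights to zero out
    if k == 0:
        return weights.copy()
    # Indices of the k smallest magnitudes (order among them is irrelevant).
    drop = np.argpartition(flat, k - 1)[:k]
    mask = np.ones(flat.size, dtype=bool)
    mask[drop] = False
    return weights * mask.reshape(weights.shape)


def quantize_fixed_point(weights, total_bits, int_bits):
    """Round `weights` onto a signed fixed-point grid.

    Assumed convention: `total_bits` is the full width and `int_bits`
    counts the integer bits including the sign bit.
    """
    frac_bits = total_bits - int_bits
    scale = 2.0 ** frac_bits
    lo = -(2 ** (total_bits - 1))      # most negative integer code
    hi = 2 ** (total_bits - 1) - 1     # most positive integer code
    codes = np.clip(np.round(weights * scale), lo, hi)
    return codes / scale


# Toy example: a random 3x3 kernel bank with 16 channels (assumed shape).
rng = np.random.default_rng(0)
w = rng.normal(scale=0.3, size=(3, 3, 16))

pruned = magnitude_prune(w, sparsity=0.5)
quantized = quantize_fixed_point(pruned, total_bits=6, int_bits=1)

print("nonzero weights:", np.count_nonzero(quantized), "of", w.size)
print("distinct values:", np.unique(quantized).size)
```

Pruned weights stay exactly zero after quantization, so the two steps compose: fewer nonzero weights means fewer multipliers, and fewer bits per weight means cheaper ones.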

Item Type: Article
Related URLs:
URL, URL Type, Description
Paper
Item: hls4ml Library
Library
Item: Examples on how to use hls4ml
Item: QKeras library
Item: SVHN dataset
Item: TensorFlow Datasets
ORCID:
Aarrestad, Thea: 0000-0002-7671-243X
Pierini, Maurizio: 0000-0003-1939-4268
Ngadiuba, Jennifer: 0000-0002-0055-2935
Duarte, Javier: 0000-0002-5076-7096
Harris, Philip: 0000-0001-8189-3741
Additional Information: © 2021 The Author(s). Published by IOP Publishing Ltd. Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Received 15 January 2021; Accepted 25 June 2021; Published 16 July 2021.
We acknowledge the Fast Machine Learning collective as an open community of multi-domain experts and collaborators. This community was important for the development of this project. M P, S S and V L are supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (Grant Agreement No. 772369). S J, M L, K P, and N T are supported by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy (DOE), Office of Science, Office of High Energy Physics. P H is supported by a Massachusetts Institute of Technology University grant. Z W is supported by the National Science Foundation under Grant Nos. 1606321 and 115164. J D is supported by the DOE, Office of Science, Office of High Energy Physics Early Career Research program under Award No. DE-SC0021187.
Data availability statement: The data that support the findings of this study are openly available.
Code availability statement: The hls4ml library is available at and archived in the Zenodo platform at 10.5281/zenodo.4161550. The work presented here is based on the Bartsia release, version 0.5.0. For examples on how to use hls4ml, the notebooks in serve as a general introduction. The QKeras library, which also includes AutoQKeras and QTools, is available at The SVHN dataset [17] can be downloaded at or through TensorFlow Datasets at
Funding Agency: Grant Number
European Research Council (ERC): 772369
Department of Energy (DOE): DE-AC02-07CH11359
Massachusetts Institute of Technology (MIT): UNSPECIFIED
Department of Energy (DOE): DE-SC0021187
Subject Keywords: deep learning, FPGA, convolutional neural network
Issue or Number: 4
Record Number: CaltechAUTHORS:20210727-212012924
Persistent URL:
Official Citation: Thea Aarrestad et al 2021 Mach. Learn.: Sci. Technol. 2 045015
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 110042
Deposited By: Tony Diaz
Deposited On: 02 Aug 2021 20:32
Last Modified: 02 Aug 2021 20:32
