Published August 4, 2024 | Submitted
Discussion Paper Open

Gradient-based Automatic Per-Weight Mixed Precision Quantization for Neural Networks On-Chip

  • 1. California Institute of Technology
  • 2. ETH Zurich

Abstract

Model size and inference speed at deployment time are major challenges in many deep learning applications. A promising strategy to overcome these challenges is quantization. However, straightforward uniform quantization to very low precision can result in significant accuracy loss. Mixed-precision quantization, based on the idea that certain parts of the network can accommodate lower precision than others without compromising performance, offers a potential solution. In this work, we present High Granularity Quantization (HGQ), an innovative quantization-aware training method that automatically fine-tunes the per-weight and per-activation precision of ultra-low latency, low power neural networks to be deployed on FPGAs. We demonstrate that HGQ can outperform existing methods by a substantial margin, achieving resource reduction by up to a factor of 20 and latency improvement by a factor of 5 while preserving accuracy.
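The core idea in the abstract — giving each weight its own fixed-point precision and treating the bitwidth as a quantity to be optimized — can be illustrated with a minimal sketch. This is not the HGQ library API; the function names, the rounding scheme, and the bit-cost proxy below are illustrative assumptions. In HGQ-style training the per-weight bitwidths are made differentiable (e.g. via a straight-through estimator) and optimized jointly with the weights against a resource penalty.

```python
import numpy as np

def quantize(w, f):
    """Round each weight onto its own fixed-point grid with f fractional bits.

    With per-weight f, the quantization step for weight i is 2**-f[i]:
    weights that need fine resolution keep more bits, others keep fewer.
    """
    scale = 2.0 ** np.round(f)
    return np.round(w * scale) / scale

def bit_cost(f):
    """Crude proxy for on-chip resource usage: total fractional bits kept."""
    return np.sum(np.maximum(np.round(f), 0.0))

# Hypothetical weights and per-weight bitwidths (as if learned by training).
w = np.array([0.7312, -0.0481, 0.2504])
f = np.array([6.0, 2.0, 4.0])

qw = quantize(w, f)        # per-weight quantized values
err = np.abs(qw - w)       # per-weight quantization error
cost = bit_cost(f)         # total bits, the quantity a resource penalty would shrink
```

A gradient-based scheme would then trade `err` (accuracy) against `cost` (resources), lowering `f[i]` wherever the network tolerates coarser precision.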

Code Availability

We have made our library publicly available under the Apache 2.0 license at https://www.github.com/calad0i/HGQ. The scripts to reproduce the results in this paper are also available at https://www.github.com/calad0i/HGQ-demos under the Apache 2.0 license.

Data Availability

The data used for training and evaluation in this work are all publicly available datasets. The jet tagging dataset is available at https://dx.doi.org/10.5281/zenodo.2603255. The SVHN dataset is available at http://ufldl.stanford.edu/housenumbers/. The muon tracking dataset is available at https://dx.doi.org/10.57967/hf/2084. Results shown in this work can be reproduced using the code available at https://www.github.com/calad0i/HGQ-demos.

Acknowledgement

C.S. is partially supported by the Caltech Danny Koh grad fellowship. C.S. acknowledges partial support from Gunther Dissertori. C.S. and M.S. acknowledge partial support from the U.S. Department of Energy (DOE), Office of Science, Office of High Energy Physics grant DE-SC0011925. T.Å. is supported by the Swiss National Science Foundation Grant No. PZ00P2 201594. J.N., M.S., and C.S. are partially supported by the U.S. Department of Energy (DOE), Office of Science, Office of High Energy Physics "Designing efficient edge AI with physics phenomena" Project (DE-FOA-0002705). J.N. is partially supported by the AI2050 program at Schmidt Futures (Grant G-23-64934). V.L. is supported by the NSF Institute for Accelerated AI Algorithms for Data-Driven Discovery (A3D3), under the NSF grant #PHY-2117997.

Additional Information

C.S. conceived, designed, and implemented the HGQ method and library and performed the experiments. C.S. and V.C. implemented HGQ support in hls4ml. C.S. and T.Å. wrote the manuscript. All authors reviewed and edited the manuscript.

Conflict of Interest

The authors declare no competing interests.

Attached Files

Discussion paper on arXiv: 2405.00645v1.pdf
Paper submitted for publication: HGQ_NML.pdf

Files

2405.00645v2.pdf
Files (1.8 MB)

Additional details

Created:
August 14, 2024
Modified:
August 14, 2024