A Caltech Library Service

Differentially Quantized Gradient Descent

Lin, Chung-Yi and Kostina, Victoria and Hassibi, Babak (2021) Differentially Quantized Gradient Descent. In: 2021 IEEE International Symposium on Information Theory (ISIT). IEEE , Piscataway, NJ, pp. 1200-1205. ISBN 978-1-5386-8209-8.

[img] PDF - Accepted Version
See Usage Policy.

[img] PDF (22 Jun 2021) - Submitted Version
See Usage Policy.


Use this Persistent URL to link to this item:


Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute the gradients while a server decides when to stop iterative computation based on its target accuracy or delay constraints. The only information that the server knows about the problem instance is what it receives from the worker via a rate-limited noiseless communication channel. We introduce the technique we call differential quantization (DQ) that compensates past quantization errors to make the descent trajectory of a quantized algorithm follow that of its unquantized counterpart. Assuming that the objective function is smooth and strongly convex, we prove that differentially quantized gradient descent (DQ-GD) attains a linear convergence rate of max{σ_(GD), ρ_n2^(-R)}, where σ_(GD) is the convergence rate of unquantized gradient descent (GD), ρ_n is the covering efficiency of the quantizer, and R is the bitrate per problem dimension n. Thus at any R ≥ log₂ρ_n/σ_(GD), the convergence rate of DQ-GD is the same as that of unquantized GD, i.e., there is no loss due to quantization. We show a converse demonstrating that no GD-like quantized algorithm can converge faster than max{σ_(GD), 2^(-R)}. Since quantizers exist with ρ_n → 1 as n → ∞ (Rogers, 1963), this means that DQ-GD is asymptotically optimal. In contrast, naively quantized GD where the worker directly quantizes the gradient attains only σ_(GD) + ρ_n2^(-R). The technique of differential quantization continues to apply to gradient methods with momentum such as Nesterov's accelerated gradient descent, and Polyak's heavy ball method. For these algorithms as well, if the rate is above a certain threshold, there is no loss in convergence rate obtained by the differentially quantized algorithm compared to its unquantized counterpart. Experimental results on both simulated and realworld least-squares problems validate our theoretical analysis.

Item Type:Book Section
Related URLs:
URLURL TypeDescription Paper
Kostina, Victoria0000-0002-2406-7440
Alternate Title:Achieving the fundamental convergence-communication tradeoff with Differentially Quantized Gradient Descent, Differentially Quantized Gradient Methods
Additional Information:© 2021 IEEE. This work was supported in part by the National Science Foundation (NSF) under grants CCF-1751356, CCF-1956386, CNS-0932428, CCF-1018927, CCF-1423663 and CCF-1409204, by a grant from Qualcomm Inc., by NASAs Jet Propulsion Laboratory through the President and Directors Fund, and by King Abdullah University of Science and Technology. Stimulating discussions with Dr. Himanshu Tyagi and Dr. Vincent Tan are gratefully acknowledged.
Funding AgencyGrant Number
JPL President and Director's FundUNSPECIFIED
King Abdullah University of Science and Technology (KAUST)UNSPECIFIED
Record Number:CaltechAUTHORS:20200214-105624458
Persistent URL:
Official Citation:C. -Y. Lin, V. Kostina and B. Hassibi, "Differentially Quantized Gradient Descent," 2021 IEEE International Symposium on Information Theory (ISIT), 2021, pp. 1200-1205, doi: 10.1109/ISIT45174.2021.9518254
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:101308
Deposited By: George Porter
Deposited On:14 Feb 2020 19:18
Last Modified:10 May 2022 18:45

Repository Staff Only: item control page