
On the distance between two neural networks and the stability of learning

Bernstein, Jeremy and Vahdat, Arash and Yue, Yisong and Liu, Ming-Yu (2020) On the distance between two neural networks and the stability of learning. (Unpublished)

PDF - Submitted Version
See Usage Policy.




How far apart are two neural networks? This is a foundational question in their theory. We derive a simple and tractable bound that relates distance in function space to distance in parameter space for a broad class of nonlinear compositional functions. The bound distills a clear dependence on depth of the composition. The theory is of practical relevance since it establishes a trust region for first-order optimisation. In turn, this suggests an optimiser that we call Frobenius matched gradient descent, or Fromage. Fromage involves a principled form of gradient rescaling and enjoys guarantees on stability of both the spectra and Frobenius norms of the weights. We find that the new algorithm increases the depth at which a multilayer perceptron may be trained as compared to Adam and SGD and is competitive with Adam for training generative adversarial networks. We further verify that Fromage scales up to a language transformer with over 10⁸ parameters. Code and reproducibility instructions are listed under Related URLs.
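
The abstract describes Fromage only as a principled form of gradient rescaling, so the following is a minimal sketch rather than the authors' released implementation. It assumes a per-layer update in which the gradient is rescaled by the ratio of the weight and gradient Frobenius norms, after which the weights are divided by sqrt(1 + lr^2). Both of those details, the zero-norm fallback, and the names `frobenius_norm` and `fromage_step` are assumptions introduced here for illustration.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def fromage_step(weights, grads, lr=0.01):
    """One Fromage-style update for a single layer (hypothetical sketch).

    Assumed rule, not stated in the abstract:
        W <- (W - lr * (||W||_F / ||g||_F) * g) / sqrt(1 + lr^2)
    """
    w_norm = frobenius_norm(weights)
    g_norm = frobenius_norm(grads)

    # Assumption: fall back to an unscaled gradient step when either norm
    # is zero, so the ratio is never undefined.
    scale = w_norm / g_norm if w_norm > 0.0 and g_norm > 0.0 else 1.0

    # Assumption: a fixed shrink factor that offsets the norm growth
    # caused by the rescaled step.
    shrink = 1.0 / math.sqrt(1.0 + lr * lr)

    return [
        [(w - lr * scale * g) * shrink for w, g in zip(w_row, g_row)]
        for w_row, g_row in zip(weights, grads)
    ]


# Example: a 2x2 layer and a gradient that is orthogonal to the weights.
W = [[3.0, 0.0], [0.0, 4.0]]
G = [[0.0, 1.0], [1.0, 0.0]]
W_new = fromage_step(W, G, lr=0.1)
```

Under these assumptions, the size of each step relative to the weights is set by `lr` alone, and a step along a gradient orthogonal to the weights leaves the layer's Frobenius norm unchanged. That is one way to read the stability claim in the abstract.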

Item Type: Report or Paper (Discussion Paper)
Related URLs: Paper; Code & reproducibility instructions
ORCID:
Bernstein, Jeremy: 0000-0001-9110-7476
Yue, Yisong: 0000-0001-9127-1989
Additional Information: The authors would like to thank Dillon Huff, Jeffrey Pennington and Florian Schaefer for useful conversations. They made heavy use of a codebase built by Jiahui Yu. They are much obliged to Sivakumar Arayandi Thottakara, Jan Kautz, Sabu Nadarajan and Nithya Natesan for infrastructure support. JB is supported by an NVIDIA fellowship.
Record Number: CaltechAUTHORS:20200214-105602886
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 101302
Deposited By: George Porter
Deposited On: 14 Feb 2020 20:57
Last Modified: 11 Nov 2020 00:49
