Published December 2020 | Version public
Book Section - Chapter

On the distance between two neural networks and the stability of learning

Abstract

This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions. The analysis leads to a new distance function called deep relative trust and a descent lemma for neural networks. Since the resulting learning rule seems to require little to no learning rate tuning, it may unlock a simpler workflow for training deeper and more complex neural networks. The Python code used in this paper is available at https://github.com/jxbz/fromage.
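The learning rule referenced in the abstract, Fromage, scales each layer's gradient step relative to that layer's weight norm, which is why the single step-size hyperparameter needs little tuning. A minimal single-layer sketch of this style of update is below; the authoritative implementation lives in the linked repository, and the `eps` guard against a zero gradient norm is an assumption added here for numerical safety.

```python
import numpy as np

def fromage_step(w, g, lr=0.01, eps=1e-12):
    """One Fromage-style update for a single layer.

    The raw step lr * (||w|| / ||g||) * g has relative size lr with
    respect to the layer's weights; dividing by sqrt(1 + lr^2)
    counteracts growth in the weight norm over many steps.
    """
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g)
    step = lr * (w_norm / (g_norm + eps)) * g
    return (w - step) / np.sqrt(1.0 + lr ** 2)

# Example: with lr = 0.1, the step taken is 10% of the weight norm,
# regardless of the raw gradient scale.
w = np.array([3.0, 4.0])          # ||w|| = 5
g = np.array([1000.0, 0.0])       # large gradient, direction (1, 0)
w_new = fromage_step(w, g, lr=0.1)
```

Because the step is measured relative to the weights, rescaling the gradient by any positive constant leaves the update essentially unchanged, which is the stability property the descent lemma formalizes.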

Additional Information

The authors would like to thank Rumen Dangovski, Dillon Huff, Jeffrey Pennington, Florian Schaefer and Joel Tropp for useful conversations. They made heavy use of a codebase built by Jiahui Yu. They are grateful to Sivakumar Arayandi Thottakara, Jan Kautz, Sabu Nadarajan and Nithya Natesan for infrastructure support. JB was supported by an NVIDIA fellowship, and this work was funded in part by NASA.

Additional details

Identifiers

Eprint ID
118580
Resolver ID
CaltechAUTHORS:20221222-180007021

Funding

NVIDIA Corporation
NASA

Dates

Created
2022-12-22
Updated
2022-12-22