Published July 2021 | Version Submitted + Published
Journal Article | Open Access

Learning by Turning: Neural Architecture Aware Optimisation

  • California Institute of Technology

Abstract

Descent methods for deep networks are notoriously capricious: they require careful tuning of step size, momentum and weight decay, and which method will work best on a new benchmark is a priori unclear. To address this problem, this paper conducts a combined study of neural architecture and optimisation, leading to a new optimiser called Nero: the neuronal rotator. Nero trains reliably without momentum or weight decay, works in situations where Adam and SGD fail, and requires little to no learning rate tuning. Also, Nero's memory footprint is the square root of that of Adam or LAMB. Nero combines two ideas: (1) projected gradient descent over the space of balanced networks; (2) neuron-specific updates, where the step size sets the angle through which each neuron's hyperplane turns. The paper concludes by discussing how this geometric connection between architecture and optimisation may impact theories of generalisation in deep learning.
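To make the two ingredients concrete, the following is a minimal NumPy sketch in the spirit of the abstract, not the authors' exact algorithm: each neuron's incoming weights form one row, the gradient is normalised per neuron so the learning rate controls the turning angle of that neuron's hyperplane, and the weights are then projected back onto an assumed "balanced" constraint set (zero mean and unit norm per neuron). The function name nero_style_step, the unit-norm choice, and the hyperparameters lr and eps are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def nero_style_step(W, grad, lr=0.01, eps=1e-8):
    """Illustrative per-neuron projected update (not the paper's exact rule).

    W, grad : arrays of shape (n_neurons, fan_in); each row is one neuron's
    incoming weight vector, i.e. the normal of that neuron's hyperplane.
    """
    # 1. Neuron-specific step: normalise each neuron's gradient so that lr
    #    (approximately) sets the angle through which the hyperplane turns.
    g_norm = np.linalg.norm(grad, axis=1, keepdims=True) + eps
    W = W - lr * grad / g_norm

    # 2. Projection onto the "balanced" constraint set assumed here: each
    #    neuron's weights are re-centred to zero mean and rescaled to unit
    #    norm, so only the neuron's direction changes between steps.
    W = W - W.mean(axis=1, keepdims=True)
    W = W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)
    return W

# Tiny usage example: one update on a random 4-neuron layer with fan-in 8.
rng = np.random.default_rng(0)
W = nero_style_step(rng.standard_normal((4, 8)), rng.standard_normal((4, 8)))
print(np.round(np.linalg.norm(W, axis=1), 3))  # every row has norm 1 after projection
```

Because the projection keeps each neuron's weight vector at a fixed norm and zero mean, the only thing a step can change is the orientation of the neuron's hyperplane, which is the geometric picture the abstract describes.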

Additional Information

© 2021 by the authors.

Attached Files

Published - liu21c.pdf

Submitted - 2102.07227.pdf

Files (3.3 MB)

md5:24f4ee95bc5879d950488d7d9c9b5862 (1.0 MB)
md5:97e79b86f466daa2ad450ca68dd6664e (2.3 MB)

Additional details

Identifiers

Eprint ID
108202
Resolver ID
CaltechAUTHORS:20210225-132711583

Dates

Created
2021-02-26
Created from EPrint's datestamp field
Updated
2023-06-02
Created from EPrint's last_modified field

Caltech Custom Metadata

Caltech groups
Division of Biology and Biological Engineering (BBE), Division of Engineering and Applied Science (EAS)