A Caltech Library Service

ZerO Initialization: Initializing Residual Networks with only Zeros and Ones

Zhao, Jiawei and Schäfer, Florian and Anandkumar, Anima (2021) ZerO Initialization: Initializing Residual Networks with only Zeros and Ones. (Unpublished)

PDF - Submitted Version (see Usage Policy)




Deep neural networks are usually initialized with random weights, with the initial variance chosen carefully to ensure stable signal propagation during training. However, there is no consensus on how to select the variance, and the choice becomes especially challenging as the number of layers grows. In this work, we replace the widely used random weight initialization with a fully deterministic initialization scheme, ZerO, which initializes residual networks with only zeros and ones. By augmenting the standard ResNet architectures with a few extra skip connections and Hadamard transforms, ZerO allows us to start training entirely from zeros and ones. This has many benefits, such as improving reproducibility (by reducing the variance over different experimental runs) and allowing network training without batch normalization. Surprisingly, we find that ZerO achieves state-of-the-art performance on various image classification datasets, including ImageNet, which suggests random weights may be unnecessary for modern network initialization.
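To make the idea in the abstract concrete, below is a minimal, hypothetical NumPy sketch of a deterministic zeros-and-ones initialization. It is based only on the abstract, not on the paper's exact algorithm: `zero_style_init` and `hadamard` are names I chose for illustration, and the actual ZerO scheme applies these ideas inside specific ResNet layers with additional details not shown here.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n must be a
    power of two). The abstract mentions Hadamard transforms as one of the
    architectural additions that make a deterministic start possible."""
    assert n > 0 and (n & (n - 1)) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        # Double the size at each step: [[H, H], [H, -H]]
        H = np.block([[H, H], [H, -H]])
    return H

def zero_style_init(fan_in, fan_out):
    """Illustrative deterministic weight initialization: a (partial)
    identity matrix, so every entry is exactly 0 or 1. Square layers start
    as the identity; non-square layers as an identity padded with zeros.
    This is a sketch of the 'only zeros and ones' idea, not the paper's
    precise per-layer rule."""
    W = np.zeros((fan_out, fan_in))
    for i in range(min(fan_in, fan_out)):
        W[i, i] = 1.0
    return W
```

Note that a Hadamard matrix has entries in {+1, -1}; in this sketch it would serve as a fixed deterministic transform for dimension-changing layers, while the trainable weights themselves start at exactly 0 or 1.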

Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL Type: Paper
ORCID: Anandkumar, Anima (0000-0002-6974-6797)
Record Number: CaltechAUTHORS:20220714-224704502
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 115609
Deposited By: George Porter
Deposited On: 15 Jul 2022 23:04
Last Modified: 15 Jul 2022 23:04
