CaltechAUTHORS
  A Caltech Library Service

Born Again Neural Networks

Furlanello, Tommaso and Lipton, Zachary C. and Tschannen, Michael and Itti, Laurent and Anandkumar, Anima (2018) Born Again Neural Networks. Proceedings of Machine Learning Research, 80. pp. 1607-1616. ISSN 2640-3498. doi:10.48550/arXiv.1805.04770. https://resolver.caltech.edu/CaltechAUTHORS:20190327-085757099

PDF - Published Version (495kB). See Usage Policy.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20190327-085757099

Abstract

Knowledge Distillation (KD) consists of transferring “knowledge” from one machine learning model (the teacher) to another (the student). Commonly, the teacher is a high-capacity model with formidable performance, while the student is more compact. By transferring knowledge, one hopes to benefit from the student’s compactness without sacrificing too much performance. We study KD from a new perspective: rather than compressing models, we train students parameterized identically to their teachers. Surprisingly, these Born-Again Networks (BANs) outperform their teachers significantly, on both computer vision and language modeling tasks. Our experiments with BANs based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10 (3.5%) and CIFAR-100 (15.5%) datasets, by validation error. Additional experiments explore two distillation objectives: (i) Confidence-Weighted by Teacher Max (CWTM) and (ii) Dark Knowledge with Permuted Predictions (DKPP). Both methods elucidate the essential components of KD, demonstrating the effect of the teacher outputs on both predicted and non-predicted classes.
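The distillation objective described in the abstract, training a student on the teacher's softened output distribution, can be sketched as follows. This is a minimal illustration only: a linear model stands in for both networks (identically parameterized, as in the born-again setting), and the array shapes, temperature, and learning rate are illustrative assumptions, not the paper's actual training setup.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of student predictions against the teacher's soft targets."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(-np.mean(np.sum(p * np.log(q + 1e-12), axis=1)))

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))            # toy inputs (hypothetical data)
W_teacher = rng.normal(size=(5, 3))     # stands in for a trained teacher
W_student = np.zeros_like(W_teacher)    # student parameterized identically to the teacher

T, lr = 2.0, 1.0
initial = distill_loss(X @ W_student, X @ W_teacher, T)
for _ in range(4000):
    p = softmax(X @ W_teacher, T)
    q = softmax(X @ W_student, T)
    # Gradient of the soft-target cross-entropy w.r.t. student logits
    # (the 1/T chain-rule factor is folded into the learning rate here).
    grad_logits = (q - p) / len(X)
    W_student -= lr * (X.T @ grad_logits)

final = distill_loss(X @ W_student, X @ W_teacher, T)
```

Because the student has the same capacity as the teacher, the distillation loss can in principle be driven down to the entropy of the teacher's soft targets; the paper's surprising finding is that, for deep networks, the resulting student generalizes better than the teacher it was trained from.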


Item Type: Article
Related URLs:
  URL                                                  URL Type   Description
  http://proceedings.mlr.press/v80/furlanello18a.html  Publisher  Article
  http://arxiv.org/abs/1805.04770                      arXiv      Article
Additional Information: © 2018 by the author(s). This work was supported by the National Science Foundation (grant numbers CCF-1317433 and CNS-1545089), C-BRIC (one of six centers in JUMP, a Semiconductor Research Corporation (SRC) program sponsored by DARPA), and the Intel Corporation. The authors affirm that the views expressed herein are solely their own, and do not represent the views of the United States government or any agency thereof.
Funders:
  Funding Agency                                                                 Grant Number
  NSF                                                                            CCF-1317433
  NSF                                                                            CNS-1545089
  Center for Brain-inspired Computing Enabling Autonomous Intelligence (C-BRIC)  UNSPECIFIED
  Intel                                                                          UNSPECIFIED
DOI: 10.48550/arXiv.1805.04770
Record Number: CaltechAUTHORS:20190327-085757099
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20190327-085757099
Official Citation: Tommaso Furlanello, Zachary Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar. Born Again Neural Networks. Proceedings of the 35th International Conference on Machine Learning, PMLR 80:1607-1616, 2018.
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 94176
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 28 Mar 2019 23:17
Last Modified: 02 Jun 2023 00:41
