CaltechAUTHORS
  A Caltech Library Service

Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods

Janzamin, Majid and Sedghi, Hanie and Anandkumar, Anima (2015) Beating the Perils of Non-Convexity: Guaranteed Training of Neural Networks using Tensor Methods. (Unpublished) http://resolver.caltech.edu/CaltechAUTHORS:20190401-123300368

PDF - Submitted Version (897 kB). See Usage Policy.

Use this Persistent URL to link to this item: http://resolver.caltech.edu/CaltechAUTHORS:20190401-123300368

Abstract

Training neural networks is a challenging non-convex optimization problem, and backpropagation or gradient descent can get stuck in spurious local optima. We propose a novel algorithm based on tensor decomposition for guaranteed training of two-layer neural networks. We provide risk bounds for our proposed method, with a polynomial sample complexity in the relevant parameters, such as input dimension and number of neurons. While learning arbitrary target functions is NP-hard, we provide transparent conditions on the function and the input for learnability. Our training method is based on tensor decomposition, which provably converges to the global optimum under a set of mild non-degeneracy conditions. It consists of simple, embarrassingly parallel linear and multi-linear operations and is competitive with standard stochastic gradient descent (SGD) in terms of computational complexity. Thus, we propose a computationally efficient method with guaranteed risk bounds for training neural networks with one hidden layer.
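To illustrate the kind of primitive the abstract refers to, the sketch below shows a generic tensor power iteration with deflation on a synthetic symmetric, orthogonally decomposable third-order tensor. This is only an assumption-laden toy of the decomposition step common to such method-of-moments approaches, not the authors' full training pipeline (which additionally uses score-function cross-moments, whitening, and separate estimation of the remaining parameters); all function names and parameters here are hypothetical.

    # Minimal sketch: tensor power iteration with deflation (illustrative only).
    import numpy as np

    def tensor_apply(T, u, v):
        """Contract a d x d x d tensor T with vectors u, v, returning T(I, u, v)."""
        return np.einsum('ijk,j,k->i', T, u, v)

    def power_iteration(T, n_restarts=10, n_iters=100, rng=None):
        """Estimate one (eigenvalue, eigenvector) pair of a symmetric 3rd-order tensor."""
        rng = np.random.default_rng() if rng is None else rng
        d = T.shape[0]
        best_val, best_vec = -np.inf, None
        for _ in range(n_restarts):
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)
            for _ in range(n_iters):
                u = tensor_apply(T, u, u)
                u /= np.linalg.norm(u)
            val = np.einsum('ijk,i,j,k->', T, u, u, u)
            if val > best_val:
                best_val, best_vec = val, u
        return best_val, best_vec

    def decompose(T, rank):
        """Greedy deflation: extract `rank` rank-one components one at a time."""
        T = T.copy()
        weights, vecs = [], []
        for _ in range(rank):
            lam, v = power_iteration(T)
            weights.append(lam)
            vecs.append(v)
            T -= lam * np.einsum('i,j,k->ijk', v, v, v)  # subtract recovered component
        return np.array(weights), np.array(vecs)

    if __name__ == "__main__":
        # Build a synthetic orthogonal rank-2 tensor and recover its components.
        d = 5
        rng = np.random.default_rng(0)
        A = np.linalg.qr(rng.standard_normal((d, d)))[0][:, :2]  # orthonormal columns
        true_w = np.array([3.0, 1.5])
        T = sum(w * np.einsum('i,j,k->ijk', a, a, a) for w, a in zip(true_w, A.T))
        w_hat, V_hat = decompose(T, rank=2)
        print(w_hat)  # should be close to [3.0, 1.5]

The per-restart and per-component updates are independent linear and multi-linear operations, which is what makes this style of computation embarrassingly parallel.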


Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL                                URL Type    Description
http://arxiv.org/abs/1506.08473    arXiv       Discussion Paper
Additional Information: We thank Ben Recht for pointing us to Barron's work on approximation bounds for neural networks (Barron, 1993), and Arthur Gretton for discussion about estimation of the score function. We also acknowledge fruitful discussion with Roi Weiss about the presentation of the proof of Theorem 5 on combining estimation and approximation bounds, and his detailed editorial comments on the preliminary version of the draft. We thank Peter Bartlett for detailed discussions and for pointing us to several classical results on neural networks. We are very thankful to Andrew Barron for detailed discussion and for encouraging us to explore alternate incremental training methods for estimating the remaining parameters after the tensor decomposition step; we have added this discussion to the paper. We thank Daniel Hsu for discussion on random design analysis of ridge regression. We thank Percy Liang for discussion about the score function. M. Janzamin is supported by NSF BIGDATA award FG16455. H. Sedghi is supported by NSF Career award FG15890. A. Anandkumar is supported in part by a Microsoft Faculty Fellowship, NSF Career award CCF-1254106, and ONR award N00014-14-1-0665.
Funders:
Funding Agency                    Grant Number
NSF                               FG16455
NSF                               FG15890
Microsoft Faculty Fellowship      UNSPECIFIED
NSF                               CCF-1254106
Office of Naval Research (ONR)    N00014-14-1-0665
Subject Keywords: neural networks, risk bound, method-of-moments, tensor decomposition
Record Number: CaltechAUTHORS:20190401-123300368
Persistent URL: http://resolver.caltech.edu/CaltechAUTHORS:20190401-123300368
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 94321
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 01 Apr 2019 22:52
Last Modified: 01 Apr 2019 22:52
