A Study of Generalization of Stochastic Mirror Descent Algorithms on Overparameterized Nonlinear Models
- Creators
- Azizan, Navid
- Lale, Sahin
- Hassibi, Babak
Abstract
We study the convergence, the implicit regularization and the generalization of stochastic mirror descent (SMD) algorithms in overparameterized nonlinear models, where the number of model parameters exceeds the number of training data points. Due to overpa-rameterization, the training loss has infinitely many global minima where they define a manifold of interpolating solutions. To have an understanding of the generalization performance of SMD algorithms, it is important to characterize which global minima the SMD algorithms converge to. In this work, we first theoretically show that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima, which is usually the case in the high overparameterization setting, using sufficiently small step size, SMD converges to a global minimum. We further prove that this global minimum is approximately the closest one to the initialization in Bregman divergence, demonstrating the approximate implicit regularization of SMD. We then empirically confirm that these theoretical results are observed in practice. Finally, we provide an extensive study of the generalization of SMD algorithms. In our experiments, we show that on the CIFAR-10 dataset, SMD with ℓ₁₀ norm potential (as a surrogate for ℓ∞ ) consistently general-izes better than SGD (corresponding to an ℓ₂ norm potential), which in turn consistently outperforms SMD with ℓ₁ norm potential.
Additional Information
© 2020 IEEE.Additional details
- Eprint ID
- 102604
- DOI
- 10.1109/icassp40776.2020.9053864
- Resolver ID
- CaltechAUTHORS:20200417-131039768
- Created
-
2020-04-17Created from EPrint's datestamp field
- Updated
-
2022-12-23Created from EPrint's last_modified field