CaltechAUTHORS
  A Caltech Library Service

Finding Structure with Randomness: Stochastic Algorithms for Constructing Approximate Matrix Decompositions

Halko, N. and Martinsson, P. G. and Tropp, J. A. (2009) Finding Structure with Randomness: Stochastic Algorithms for Constructing Approximate Matrix Decompositions. California Institute of Technology, Pasadena, CA. (Unpublished) http://resolver.caltech.edu/CaltechAUTHORS:20111012-111324407

Abstract

Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. In particular, these techniques offer a route toward principal component analysis (PCA) for petascale data.

This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis.

The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix.
(i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops), in contrast with O(mnk) for classical algorithms.
(ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can be reorganized to exploit multi-processor architectures.
(iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
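The two-stage scheme summarized in the abstract (random sampling to find a subspace that captures the action of the matrix, then deterministic factorization of the compressed matrix) can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code: the function name `randomized_svd`, the oversampling default `p=10`, the power-iteration parameter `q`, the use of a Gaussian test matrix, and the restriction to real-valued input are all assumptions made here. Note that applying a Gaussian test matrix to a dense input costs O(mnk) flops; the O(mn log(k)) figure quoted for dense matrices relies on a structured random test matrix, which this sketch does not implement.

```python
import numpy as np

def randomized_svd(A, k, p=10, q=0, rng=None):
    """Hypothetical sketch: approximate rank-k SVD of a real matrix A.

    k: target rank; p: oversampling (assumed default); q: power iterations.
    Returns U (m x k), s (k,), Vt (k x n) with A ~= U @ diag(s) @ Vt.
    """
    rng = np.random.default_rng(rng)
    m, n = A.shape
    ell = min(k + p, m, n)  # number of random samples

    # Stage A: sample the range of A with a Gaussian test matrix
    # and orthonormalize the samples.
    Omega = rng.standard_normal((n, ell))
    Q, _ = np.linalg.qr(A @ Omega)

    # Optional power iterations sharpen the basis when singular values
    # decay slowly; re-orthonormalize at each step to limit round-off.
    for _ in range(q):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Stage B: compress A to span(Q) and factor the small matrix
    # deterministically with a dense SVD.
    B = Q.T @ A                                   # ell x n
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat                                 # lift back to m dimensions
    return U[:, :k], s[:k], Vt[:k, :]

# Usage: a 200 x 100 test matrix of exact rank 5.
gen = np.random.default_rng(0)
A = gen.standard_normal((200, 5)) @ gen.standard_normal((5, 100))
U, s, Vt = randomized_svd(A, k=5, rng=1)
err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(err)  # tiny (round-off level), since A is exactly rank 5
```

With `q=0` the sketch reads `A` twice (to form `A @ Omega` and `Q.T @ A`), a constant number of passes that does not grow with `k`; each power iteration adds two more passes in exchange for a more accurate basis when the singular values decay slowly.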


Item Type: Report or Paper (Technical Report)
Additional Information: The authors have benefited from valuable discussions with many researchers, among them Inderjit Dhillon, Petros Drineas, Ming Gu, Edo Liberty, Michael Mahoney, Vladimir Rokhlin, Yoel Shkolnisky, and Arthur Szlam. In particular, we would like to thank Mark Tygert for his insightful remarks on early drafts of this paper. The example in Section 7.3 was provided by François Meyer of the University of Colorado at Boulder. The example in Section 7.4 comes from the FERET database of facial images collected under the FERET program, sponsored by the DoD Counterdrug Technology Development Program Office. The work reported was initiated during the program Mathematics of Knowledge and Search Engines held at IPAM in the fall of 2007. Supported by NSF awards #0748488 and #0610097. Supported by ONR award #N000140810883.
Group: Applied & Computational Mathematics
Funders:
Funding Agency | Grant Number
NSF | 0748488
NSF | 0610097
ONR | N000140810883
Subject Keywords: Dimension reduction, eigenvalue decomposition, interpolative decomposition, Johnson–Lindenstrauss lemma, matrix approximation, parallel algorithm, pass-efficient algorithm, principal component analysis, randomized algorithm, random matrix, rank-revealing QR factorization, singular value decomposition, streaming algorithm.
Other Numbering System:
Other Numbering System Name | Other Numbering System ID
Applied & Computational Mathematics Technical Report | 2009-05
Classification Code: AMS subject classifications [MSC2010]. Primary: 65F30. Secondary: 68W20, 60B20.
Record Number: CaltechAUTHORS:20111012-111324407
Persistent URL: http://resolver.caltech.edu/CaltechAUTHORS:20111012-111324407
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 27187
Collection: CaltechACMTR
Deposited By: Kristin Buxton
Deposited On: 19 Oct 2011 19:42
Last Modified: 26 Dec 2012 14:16
