Compression, inversion, and approximate PCA of dense kernel matrices at near-linear computational complexity
Abstract
Dense kernel matrices Θ∈R^(N×N) obtained from point evaluations of a covariance function G at locations {x_i}_(1≤i≤N)⊂ ℝ^d arise in statistics, machine learning, and numerical analysis. For covariance functions that are Green's functions of elliptic boundary value problems and homogeneously distributed sampling points, we show how to identify a subset S⊂{1,…,N}², with #S=O(N log(N)log^d(N/ϵ)), such that the zero fill-in incomplete Cholesky factorization of the sparse matrix Θ_(i,j)1_((i,j)∈S) is an ϵ-approximation of Θ. This factorization can provably be obtained in complexity O(N log(N) log^d(N/ϵ)) in space and O(N log²(N) log^(2d))(N/ϵ)) in time, improving upon the state of the art for general elliptic operators; we further present numerical evidence that d can be taken to be the intrinsic dimension of the data set rather than that of the ambient space. The algorithm only needs to know the spatial configuration of the x_i and does not require an analytic representation of G. Furthermore, this factorization straightforwardly provides an approximate sparse PCA with optimal rate of convergence in the operator norm. Hence, by using only subsampling and the incomplete Cholesky factorization, we obtain, at nearly linear complexity, the compression, inversion, and approximate PCA of a large class of covariance matrices. By inverting the order of the Cholesky factorization we also obtain a solver for elliptic PDE with complexity O (N log^d(N/ϵ)) in space and O (N log^(2d)(N/ϵ)) in time, improving upon the state of the art for general elliptic operators.
Additional Information
© 2021 Florian Schäfer, T. J. Sullivan, Houman Owhadi. Received by the editors October 24, 2019; accepted for publication (in revised form) October 9, 2020; published electronically April 15, 2021. The first and third authors gratefully acknowledge support by the Air Force Office of Scientific Research and the DARPA EQUiPS Program (award FA9550-16-1-0054, Computational Information Games) and the Air Force Office of Scientific Research (award FA9550-18-1-0271, Games for Computation and Learning). The second author has been supported by the Freie Universitat Berlin within the Excellence Initiative of the German Research Foundation. This collaboration has been facilitated by the Statistical and Applied Mathematical Sciences Institute through the National Science Foundation award DMS-1127914. We would like to thank C. Oates and P. Schröder for helpful discussions, and C. Scovel for many helpful comments and suggestions on an earlier version of this paper (June 2017, arXiv:1706.02205). We would like to thank the two anonymous referees for their constructive feedback, which helped to improve the paper.Attached Files
Published - 19m129526x.pdf
Submitted - 1706.02205.pdf
Files
Name | Size | Download all |
---|---|---|
md5:8754fd1003b4c848e8e30b60a972b0a9
|
5.4 MB | Preview Download |
md5:84fd67e5e68832d44c1b892f68ba7368
|
6.6 MB | Preview Download |
Additional details
- Eprint ID
- 78882
- Resolver ID
- CaltechAUTHORS:20170710-083743909
- Air Force Office of Scientific Research (AFOSR)
- FA9550-16-1-0054
- Air Force Office of Scientific Research (AFOSR)
- FA9550-18-1-0271
- Deutsche Forschungsgemeinschaft (DFG)
- NSF
- DMS-1127914
- Created
-
2017-07-10Created from EPrint's datestamp field
- Updated
-
2021-08-20Created from EPrint's last_modified field