García Trillos, Nicolás and Hoffmann, Franca and Hosseini, Bamdad (2021) Geometric structure of graph Laplacian embeddings. Journal of Machine Learning Research, 22 . pp. 1-55. ISSN 1533-7928. https://resolver.caltech.edu/CaltechAUTHORS:20200331-074327697
![]() |
PDF
- Published Version
Creative Commons Attribution. 565kB |
![]() |
PDF
- Submitted Version
See Usage Policy. 684kB |
Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20200331-074327697
Abstract
We analyze the spectral clustering procedure for identifying coarse structure in a data set x₁,…,x_n, and in particular study the geometry of graph Laplacian embeddings which form the basis for spectral clustering algorithms. More precisely, we assume that the data are sampled from a mixture model supported on a manifold M embedded in Rd, and pick a connectivity length-scale ε>0 to construct a kernelized graph Laplacian. We introduce a notion of a well-separated mixture model which only depends on the model itself, and prove that when the model is well separated, with high probability the embedded data set concentrates on cones that are centered around orthogonal vectors. Our results are meaningful in the regime where ε=ε(n) is allowed to decay to zero at a slow enough rate as the number of data points grows. This rate depends on the intrinsic dimension of the manifold on which the data is supported.
Item Type: | Article | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Related URLs: |
| |||||||||
ORCID: |
| |||||||||
Additional Information: | © 2021 Nicolás García Trillos, Franca Hoffmann, Bamdad Hosseini. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Submitted 8/19; Revised 11/20; Published 3/21. The authors would like to thank Ulrike von Luxburg for pointing them to the paper Schiebinger et al. (2015) which was the starting point of this work. NGT was supported by NSF grant DMS 1912802. FH was partially supported by Caltech's von Karman postdoctoral instructorship. BH is supported in part by a postdoctoral fellowship granted by Natural Sciences and Engineering Research Council of Canada. | |||||||||
Funders: |
| |||||||||
Subject Keywords: | Unsupervised learning, spectral clustering, graph Laplacian, mixture models, continuum limit | |||||||||
Record Number: | CaltechAUTHORS:20200331-074327697 | |||||||||
Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20200331-074327697 | |||||||||
Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. | |||||||||
ID Code: | 102184 | |||||||||
Collection: | CaltechAUTHORS | |||||||||
Deposited By: | Tony Diaz | |||||||||
Deposited On: | 31 Mar 2020 16:09 | |||||||||
Last Modified: | 06 Jul 2021 17:57 |
Repository Staff Only: item control page