CaltechAUTHORS
  A Caltech Library Service

Spectral analysis of weighted Laplacians arising in data clustering

Hoffmann, Franca and Hosseini, Bamdad and Oberai, Assad A. and Stuart, Andrew M. (2022) Spectral analysis of weighted Laplacians arising in data clustering. Applied and Computational Harmonic Analysis, 56 . pp. 189-249. ISSN 1063-5203. doi:10.1016/j.acha.2021.07.004. https://resolver.caltech.edu/CaltechAUTHORS:20200331-075759863

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20200331-075759863

Abstract

Graph Laplacians computed from weighted adjacency matrices are widely used to identify geometric structure in data, and clusters in particular; their spectral properties play a central role in a number of unsupervised and semi-supervised learning algorithms. When suitably scaled, graph Laplacians approach limiting continuum operators in the large data limit. Studying these limiting operators, therefore, sheds light on learning algorithms. This paper is devoted to the study of a parameterized family of divergence form elliptic operators that arise as the large data limit of graph Laplacians. The link between a three-parameter family of graph Laplacians and a three-parameter family of differential operators is explained. The spectral properties of these differential operators are analyzed in the situation where the data comprises two nearly separated clusters, in a sense which is made precise. In particular, we investigate how the spectral gap depends on the three parameters entering the graph Laplacian, and on a parameter measuring the size of the perturbation from the perfectly clustered case. Numerical results are presented which exemplify the analysis and which extend it in the following ways: the computations study situations in which there are two nearly separated clusters, but which violate the assumptions used in our theory; situations in which more than two clusters are present, also going beyond our theory; and situations which demonstrate the relevance of our studies of differential operators for the understanding of finite data problems via the graph Laplacian. The findings provide insight into parameter choices made in learning algorithms which are based on weighted adjacency matrices; they also provide the basis for analysis of the consistency of various unsupervised and semi-supervised learning algorithms, in the large data limit.
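For readers unfamiliar with the objects named in the abstract, the following is a minimal sketch of a graph Laplacian built from a weighted adjacency matrix, and of the spectral gap that appears when the data forms two nearly separated clusters. It uses the standard unnormalized Laplacian L = D - W, not the three-parameter family studied in the paper. The Gaussian kernel, the bandwidth `eps`, the synthetic data, and the function names are all assumptions made for illustration.

```python
import numpy as np


def gaussian_weights(X, eps):
    """Weighted adjacency matrix W_ij = exp(-|x_i - x_j|^2 / (2 eps^2)).

    The Gaussian kernel and bandwidth eps are illustrative assumptions.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * eps ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def unnormalized_laplacian(W):
    """Standard unnormalized graph Laplacian L = D - W (D = diagonal degree matrix)."""
    return np.diag(W.sum(axis=1)) - W


# Synthetic data (assumption): two nearly separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
n = 50
X = np.vstack([
    rng.normal(-1.0, 0.3, size=(n, 2)),
    rng.normal(1.0, 0.3, size=(n, 2)),
])

W = gaussian_weights(X, eps=0.5)
L = unnormalized_laplacian(W)

# L is symmetric, so eigh applies; eigenvalues are returned in ascending order.
evals, evecs = np.linalg.eigh(L)

# The eigenvector of the second-smallest eigenvalue (Fiedler vector) changes
# sign between the two clusters, so thresholding at zero gives a two-way split.
fiedler = evecs[:, 1]
labels = (fiedler > 0).astype(int)

print("three smallest eigenvalues:", evals[:3])
```

In this sketch the smallest eigenvalue is zero up to rounding, the second is small because the clusters are only weakly connected, and the third is much larger. That separation between the second and third eigenvalues is the kind of spectral gap the abstract refers to, here in its simplest finite-data form.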


Item Type: Article
Related URLs:
URL | URL Type | Description
https://doi.org/10.1016/j.acha.2021.07.004 | DOI | Article
https://arxiv.org/abs/1909.06389 | arXiv | Discussion Paper
ORCID:
Author | ORCID
Hoffmann, Franca | 0000-0002-1182-5521
Stuart, Andrew M. | 0000-0001-9091-7266
Additional Information: © 2021 Elsevier Inc. Received 13 September 2019, Revised 18 April 2021, Accepted 30 July 2021, Available online 1 September 2021. The authors are grateful to Nicolás García Trillos for helpful discussions regarding the results in Section 5 concerning various graph Laplacians and their continuum limits. We are also thankful to the anonymous reviewers whose comments and suggestions helped us improve an earlier version of this article. AMS is grateful to AFOSR (grant FA9550-17-1-0185) and NSF (grant DMS 18189770) for financial support. FH was partially supported by Caltech's von Kármán postdoctoral instructorship. BH was partially supported by an NSERC PDF fellowship.
Funders:
Funding Agency | Grant Number
Air Force Office of Scientific Research (AFOSR) | FA9550-17-1-0185
NSF | DMS-18189770
Caltech | UNSPECIFIED
Natural Sciences and Engineering Research Council of Canada (NSERC) | UNSPECIFIED
Subject Keywords: Spectral clustering; Graph Laplacian; Large data limits; Elliptic differential operators; Perturbation analysis; Spectral gap; Differential geometry
Classification Code: AMS subject classifications: 47A75, 62H30, 68T10, 35B20, 05C50
DOI: 10.1016/j.acha.2021.07.004
Record Number: CaltechAUTHORS:20200331-075759863
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20200331-075759863
Official Citation: Franca Hoffmann, Bamdad Hosseini, Assad A. Oberai, Andrew M. Stuart, Spectral analysis of weighted Laplacians arising in data clustering, Applied and Computational Harmonic Analysis, Volume 56, 2022, Pages 189-249, ISSN 1063-5203, https://doi.org/10.1016/j.acha.2021.07.004.
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 102186
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 31 Mar 2020 16:02
Last Modified: 23 Sep 2021 17:28
