A Caltech Library Service

Guaranteed clustering and biclustering via semidefinite programming

Ames, Brendan P. W. (2014) Guaranteed clustering and biclustering via semidefinite programming. Mathematical Programming, 147 (1-2). pp. 429-465. ISSN 0025-5610. doi:10.1007/s10107-013-0729-x.

[img] PDF - Submitted Version
See Usage Policy.


Use this Persistent URL to link to this item:


Identifying clusters of similar objects in data plays a significant role in a wide range of applications. As a model problem for clustering, we consider the densest k -disjoint-clique problem, whose goal is to identify the collection of k disjoint cliques of a given weighted complete graph maximizing the sum of the densities of the complete subgraphs induced by these cliques. In this paper, we establish conditions ensuring exact recovery of the densest k cliques of a given graph from the optimal solution of a particular semidefinite program. In particular, the semidefinite relaxation is exact for input graphs corresponding to data consisting of k large, distinct clusters and a smaller number of outliers. This approach also yields a semidefinite relaxation with similar recovery guarantees for the biclustering problem. Given a set of objects and a set of features exhibited by these objects, biclustering seeks to simultaneously group the objects and features according to their expression levels. This problem may be posed as that of partitioning the nodes of a weighted bipartite complete graph such that the sum of the densities of the resulting bipartite complete subgraphs is maximized. As in our analysis of the densest k -disjoint-clique problem, we show that the correct partition of the objects and features can be recovered from the optimal solution of a semidefinite program in the case that the given data consists of several disjoint sets of objects exhibiting similar features. Empirical evidence from numerical experiments supporting these theoretical guarantees is also provided.

Item Type:Article
Related URLs:
URLURL TypeDescription DOIArticle ReadCube access Paper
Additional Information:© 2013 Springer-Verlag Berlin Heidelberg and Mathematical Optimization Society. Received: 16 February 2012; Accepted: 9 November 2013; Published online: 30 November 2013. This research was supported in part by the Institute for Mathematics and its Applications with funds provided by the National Science Foundation, and by a Postgraduate Scholarship from NSERC (Natural Science and Engineering Research Council of Canada). I am grateful to Stephen Vavasis, Henry Wolkowicz, Levent Tunçel, Shai Ben-David, Inderjit Dhillon, Ben Recht, and Ting Kei Pong for their helpful comments and suggestions. I am especially grateful to Ting Kei for his help implementing the ADMM algorithms used to perform the numerical trials. I would also like to thank Warren Schudy for suggesting some relevant references that were omitted in an earlier version. Finally, I thank the two anonymous reviewers, whose suggestions vastly improved the presentation and organization of this paper.
Funding AgencyGrant Number
Institute for Mathematics and its ApplicationsUNSPECIFIED
Natural Sciences and Engineering Research Council of Canada (NSERC)UNSPECIFIED
Subject Keywords:Semidefinite programming; Clustering; Planted clique; Biclustering; Co-clustering
Issue or Number:1-2
Record Number:CaltechAUTHORS:20141016-101357930
Persistent URL:
Official Citation:Ames, B.P.W. Math. Program. (2014) 147: 429. doi:10.1007/s10107-013-0729-x
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:50441
Deposited By: Tony Diaz
Deposited On:16 Oct 2014 18:23
Last Modified:10 Nov 2021 18:56

Repository Staff Only: item control page