Welcome to the new version of CaltechAUTHORS. Login is currently restricted to library staff. If you notice any issues, please email coda@library.caltech.edu
Published December 2008 | Submitted
Journal Article Open

Analysis of Incomplete Data and an Intrinsic-Dimension Helly Theorem

Abstract

The analysis of incomplete data is a long-standing challenge in practical statistics. When, as is typical, data objects are represented by points in R^d , incomplete data objects correspond to affine subspaces (lines or Δ-flats).With this motivation we study the problem of finding the minimum intersection radius r(L) of a set of lines or Δ-flats L: the least r such that there is a ball of radius r intersecting every flat in L. Known algorithms for finding the minimum enclosing ball for a point set (or clustering by several balls) do not easily extend to higher dimensional flats, primarily because "distances" between flats do not satisfy the triangle inequality. In this paper we show how to restore geometry (i.e., a substitute for the triangle inequality) to the problem, through a new analog of Helly's theorem. This "intrinsic-dimension" Helly theorem states: for any family L of Δ-dimensional convex sets in a Hilbert space, there exist Δ + 2 sets L' ⊆ L such that r(L) ≤ 2r(L'). Based upon this we present an algorithm that computes a (1+ε)-core set L' ⊆ L, |L'| = O(Δ^4/ε), such that the ball centered at a point c with radius (1 +ε)r(L') intersects every element of L. The running time of the algorithm is O(n^(Δ+1)dpoly(Δ/ε)). For the case of lines or line segments (Δ = 1), the (expected) running time of the algorithm can be improved to O(ndpoly(1/ε)).We note that the size of the core set depends only on the dimension of the input objects and is independent of the input size n and the dimension d of the ambient space.

© 2008 Springer Science. Received: 12 April 2007/Revised: 12 June 2008/Accepted: 7 July 2008/Published online: 23 September 2008. Work was done when J. Gao was with Center for the Mathematics of Information, California Institute of Technology. Work was done when M. Langberg was a postdoctoral scholar at the California Institute of Technology. Research supported in part by NSF grant CCF-0346991. Research of L.J. Schulman supported in part by an NSF ITR and the Okawa Foundation.

Attached Files

Submitted - GAOsoda08submitted.pdf

Files

GAOsoda08submitted.pdf
Files (178.8 kB)
Name Size
md5:1ee9dcebb6fe7764dfb2d6c7212707ab
178.8 kB