Published June 2015 | Submitted
Book Section - Chapter Open

Learning Arbitrary Statistical Mixtures of Discrete Distributions


We study the problem of learning very general statistical mixture models on large finite sets from unlabeled samples. Specifically, the model to be learned, mix, is a probability distribution over probability distributions p, where each such p is a probability distribution over [n] = {1,2,...,n}. When we sample from mix, we do not observe p directly, but only indirectly and in a very noisy fashion, by drawing K independent samples from [n] according to p. The problem is to infer mix to high accuracy in transportation (earthmover) distance. We give the first efficient algorithms for learning this mixture model without making any restrictive assumptions on the structure of the distribution mix. We bound the quality of the solution as a function of the size K of each sample and the number of samples used. Our model and results have applications to a variety of unsupervised learning scenarios, including learning topic models and collaborative filtering.
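
The sampling process described in the abstract can be illustrated with a minimal sketch. This is not the authors' learning algorithm. It only simulates the data a learner would observe, and it assumes for illustration that mix has finite support (the abstract allows arbitrary mix). The names `sample_document`, `sample_dataset`, and `example_mix` are hypothetical.

```python
import random

def sample_document(mix, K, rng):
    """Draw one hidden p from mix, then K i.i.d. items from [n] according to p.

    mix is a list of (weight, p) pairs, where p is a list of n probabilities.
    Assumption: mix is finitely supported, purely for illustration.
    Only the K items are returned; p itself stays hidden, as in the model.
    """
    weights = [w for w, _ in mix]
    _, p = rng.choices(mix, weights=weights, k=1)[0]
    n = len(p)
    # Items are labeled 1..n to match [n] = {1, 2, ..., n}.
    return rng.choices(range(1, n + 1), weights=p, k=K)

def sample_dataset(mix, K, num_samples, seed=0):
    """Collect num_samples independent observations, each of size K."""
    rng = random.Random(seed)
    return [sample_document(mix, K, rng) for _ in range(num_samples)]

# Hypothetical mixture over [3] with two equally likely component distributions.
example_mix = [
    (0.5, [0.8, 0.1, 0.1]),
    (0.5, [0.1, 0.1, 0.8]),
]
data = sample_dataset(example_mix, K=4, num_samples=1000)
```

Each entry of `data` is one sample of size K. The learning problem is to recover mix from such entries alone, with accuracy measured in transportation distance.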

Additional Information

© 2015 ACM. Supported in part by the National Basic Research Program of China grants 2015CB358700, 2011CBA00300, 2011CBA00301, and the National NSFC grants 61202009, 61033001, 61361136003. Work performed in part at the Simons Institute for the Theory of Computing. Supported by BSF grant number 2012333, and by the Israeli Center of Excellence on Algorithms. Supported in part by NSF grant 1319745. Work performed in part at the Simons Institute for the Theory of Computing. Supported in part by NSERC grant 32760-06, an NSERC Discovery Accelerator Supplement Award, and an Ontario Early Researcher Award.

Attached Files

Submitted - 1504.02526v1.pdf


