Published June 2015 | Version Submitted
Book Section - Chapter Open

Learning Arbitrary Statistical Mixtures of Discrete Distributions

Abstract

We study the problem of learning from unlabeled samples very general statistical mixture models on large finite sets. Specifically, the model to be learned, mix, is a probability distribution over probability distributions p, where each such p is a probability distribution over [n] = {1,2,...,n}. When we sample from mix, we do not observe p directly, but only indirectly and in very noisy fashion, by sampling from [n] repeatedly, independently K times from the distribution p. The problem is to infer mix to high accuracy in transportation (earthmover) distance. We give the first efficient algorithms for learning this mixture model without making any restricting assumptions on the structure of the distribution mix. We bound the quality of the solution as a function of the size of the samples K and the number of samples used. Our model and results have applications to a variety of unsupervised learning scenarios, including learning topic models and collaborative filtering.

Additional Information

© 2015 ACM. Supported in part by the National Basic Research Program of China grants 2015CB358700, 2011CBA00300, 2011CBA00301, and the National NSFC grants 61202009, 61033001, 61361136003. Work performed in part at the Simons Institute for the Theory of Computing. Supported by BSF grant number 2012333, and by the Israeli Center of Excellence on Algorithms. Supported in part by NSF grant 1319745. Work performed in part at the Simons Institute for the Theory of Computing. Supported in part by NSERC grant 32760-06, an NSERC Discovery Accelerator Supplement Award, and an Ontario Early Researcher Award.

Attached Files

Submitted - 1504.02526v1.pdf

Files

1504.02526v1.pdf

Files (252.8 kB)

Name Size Download all
md5:e3bff3e9533c820511a0583de8699a6b
252.8 kB Preview Download

Additional details

Identifiers

Eprint ID
58890
DOI
10.1145/2746539.2746584
Resolver ID
CaltechAUTHORS:20150715-085732981

Funding

National Basic Research Program of China
2015CB358700
National Basic Research Program of China
2011CBA00300
National Basie Research Program of China
2011CBA00301
National Science Foundation of China (NSFC)
61202009
National Science Foundation of China (NSFC)
61033001
National Science Foundation of China (NSFC)
61361136003
Simons Institute for the Theory of Computing
Binational Science Foundation (USA-Israel)
2012333
Israeli Center of Excellence on Algorithms
NSF
1319745
Natural Sciences and Engineering Research Council of Canada (NSERC)
32760-06
Ontario Early Researcher Award

Dates

Created
2015-07-22
Created from EPrint's datestamp field
Updated
2021-11-10
Created from EPrint's last_modified field