Learning Arbitrary Statistical Mixtures of Discrete Distributions
Abstract
We study the problem of learning from unlabeled samples very general statistical mixture models on large finite sets. Specifically, the model to be learned, mix, is a probability distribution over probability distributions p, where each such p is a probability distribution over [n] = {1,2,...,n}. When we sample from mix, we do not observe p directly, but only indirectly and in very noisy fashion, by sampling from [n] repeatedly, independently K times from the distribution p. The problem is to infer mix to high accuracy in transportation (earthmover) distance. We give the first efficient algorithms for learning this mixture model without making any restricting assumptions on the structure of the distribution mix. We bound the quality of the solution as a function of the size of the samples K and the number of samples used. Our model and results have applications to a variety of unsupervised learning scenarios, including learning topic models and collaborative filtering.
Additional Information
© 2015 ACM. Supported in part by the National Basic Research Program of China grants 2015CB358700, 2011CBA00300, 2011CBA00301, and the National NSFC grants 61202009, 61033001, 61361136003. Work performed in part at the Simons Institute for the Theory of Computing. Supported by BSF grant number 2012333, and by the Israeli Center of Excellence on Algorithms. Supported in part by NSF grant 1319745. Work performed in part at the Simons Institute for the Theory of Computing. Supported in part by NSERC grant 32760-06, an NSERC Discovery Accelerator Supplement Award, and an Ontario Early Researcher Award.Attached Files
Submitted - 1504.02526v1.pdf
Files
Name | Size | Download all |
---|---|---|
md5:e3bff3e9533c820511a0583de8699a6b
|
252.8 kB | Preview Download |
Additional details
- Eprint ID
- 58890
- DOI
- 10.1145/2746539.2746584
- Resolver ID
- CaltechAUTHORS:20150715-085732981
- National Basic Research Program of China
- 2015CB358700
- National Basic Research Program of China
- 2011CBA00300
- National Basie Research Program of China
- 2011CBA00301
- National Science Foundation of China (NSFC)
- 61202009
- National Science Foundation of China (NSFC)
- 61033001
- National Science Foundation of China (NSFC)
- 61361136003
- Simons Institute for the Theory of Computing
- Binational Science Foundation (USA-Israel)
- 2012333
- Israeli Center of Excellence on Algorithms
- NSF
- 1319745
- Natural Sciences and Engineering Research Council of Canada (NSERC)
- 32760-06
- Ontario Early Researcher Award
- Created
-
2015-07-22Created from EPrint's datestamp field
- Updated
-
2021-11-10Created from EPrint's last_modified field