CaltechAUTHORS
  A Caltech Library Service

A Spectral Algorithm for Latent Dirichlet Allocation

Anandkumar, Animashree and Foster, Dean P. and Hsu, Daniel and Kakade, Sham M. and Liu, Yi-Kai (2015) A Spectral Algorithm for Latent Dirichlet Allocation. Algorithmica, 72 (1). pp. 193-214. ISSN 0178-4617. https://resolver.caltech.edu/CaltechAUTHORS:20170920-142816744

[img] PDF - Submitted Version
See Usage Policy.

302Kb

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20170920-142816744

Abstract

Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. The increased representational power comes at the cost of a more challenging unsupervised learning problem for estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of multi-view models and topic models, including latent Dirichlet allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method is based on an efficiently computable orthogonal tensor decomposition of low-order moments.


Item Type:Article
Related URLs:
URLURL TypeDescription
https://doi.org/10.1007/s00453-014-9909-1DOIArticle
https://link.springer.com/article/10.1007%2Fs00453-014-9909-1PublisherArticle
http://rdcu.be/v2FGPublisherFree ReadCube access
https://arxiv.org/abs/1204.6703arXivDiscussion Paper
Additional Information:© 2014 Springer Science+Business Media New York. Received: 01 October 2013; Accepted: 12 June 2014; First Online: 03 July 2014. We thank Kamalika Chaudhuri, Adam Kalai, Percy Liang, Chris Meek, David Sontag, and Tong Zhang for valuable insights. We also thank Rong Ge for sharing preliminary results (in [8]) and the anonymous reviewers for their comments, suggestions, and pointers to references. Part of this work was completed while DH was a postdoctoral researcher at Microsoft Research New England, and while DPF, YKL, and AA were visiting the same lab. AA is supported in part by Microsoft Faculty Fellowship, NSF Career award CCF-1254106, NSF Award CCF-1219234, NSF BIGDATA IIS-1251267 and ARO YIP Award W911NF-13-1-0084.
Funders:
Funding AgencyGrant Number
Microsoft ResearchUNSPECIFIED
NSFCCF-1254106
NSFCCF-1219234
NSFIIS-1251267
Army Research Office (ARO)W911NF-13-1-0084
Subject Keywords:Topic models; Mixture models; Method of moments; Latent Dirichlet allocation
Issue or Number:1
Record Number:CaltechAUTHORS:20170920-142816744
Persistent URL:https://resolver.caltech.edu/CaltechAUTHORS:20170920-142816744
Official Citation:Anandkumar, A., Foster, D.P., Hsu, D. et al. Algorithmica (2015) 72: 193. https://doi.org/10.1007/s00453-014-9909-1
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:81632
Collection:CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On:20 Sep 2017 21:40
Last Modified:03 Oct 2019 18:45

Repository Staff Only: item control page