
Unsupervised learning of transcriptional regulatory networks via latent tree graphical models

Gitter, Anthony and Huang, Furong and Valluvan, Ragupathyraj and Fraenkel, Ernest and Anandkumar, Animashree (2016) Unsupervised learning of transcriptional regulatory networks via latent tree graphical models. (Unpublished)



Gene expression is a readily observed quantification of transcriptional activity and cellular state that enables the recovery of the relationships between regulators and their target genes. Reconstructing transcriptional regulatory networks from gene expression data is a problem that has attracted much attention, but previous work often makes the simplifying (but unrealistic) assumption that regulator activity is represented by mRNA levels. We use a latent tree graphical model to analyze gene expression without relying on transcription factor expression as a proxy for regulator activity. The latent tree model is a type of Markov random field that includes both observed gene variables and latent (hidden) variables, which factorize on a Markov tree. Through efficient unsupervised learning approaches, we determine which groups of genes are co-regulated by hidden regulators and the activity levels of those regulators. Post-processing annotates many of these discovered latent variables as specific transcription factors or groups of transcription factors. Other latent variables do not necessarily represent physical regulators but instead reveal hidden structure in the gene expression such as shared biological function. We apply the latent tree graphical model to a yeast stress response dataset. In addition to novel predictions, such as condition-specific binding of the transcription factor Msn4, our model recovers many known aspects of the yeast regulatory network. These include groups of co-regulated genes, condition-specific regulator activity, and combinatorial regulation among transcription factors. The latent tree graphical model is a general approach for analyzing gene expression data that requires no prior knowledge of which regulators exist, how active they are, or where transcription factors physically bind.
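
As a rough illustration of what "factorize on a Markov tree" means, the sketch below builds the smallest possible latent tree: one hidden binary regulator with three observed binary genes as its children. It is not the authors' algorithm, and every number and name in it (p_h, p_gene_on_given_h, posterior_h, and so on) is a hypothetical choice made for this example. It only shows that the joint distribution is a product of local terms, that summing out the hidden variable leaves the genes correlated, and that the regulator's activity can be inferred from gene expression alone.

```python
from itertools import product

# Hypothetical toy parameters (NOT taken from the paper): one hidden binary
# regulator h and three observed binary genes that depend only on h.
p_h = {0: 0.6, 1: 0.4}                # P(h)
p_gene_on_given_h = {0: 0.1, 1: 0.9}  # P(gene = 1 | h), shared by all genes
N_GENES = 3


def p_gene_given_h(g, h):
    """P(gene = g | h) for a single gene."""
    p_on = p_gene_on_given_h[h]
    return p_on if g == 1 else 1.0 - p_on


def joint(h, genes):
    """Tree factorization: P(h, g_1..g_n) = P(h) * prod_i P(g_i | h)."""
    prob = p_h[h]
    for g in genes:
        prob *= p_gene_given_h(g, h)
    return prob


def marginal(genes):
    """P(g_1..g_n), obtained by summing the hidden regulator out."""
    return sum(joint(h, genes) for h in p_h)


def posterior_h(genes):
    """P(h = 1 | genes): inferred activity of the hidden regulator."""
    return joint(1, genes) / marginal(genes)


# The observed marginal is a valid distribution over the 2^3 gene states.
total = sum(marginal(g) for g in product((0, 1), repeat=N_GENES))

# Genes are dependent once h is hidden: P(g1=1, g2=1) exceeds P(g1=1)^2.
p_g1 = sum(marginal((1, a, b)) for a, b in product((0, 1), repeat=2))
p_g1_g2 = sum(marginal((1, 1, b)) for b in (0, 1))

print("sum of marginal:", total)
print("P(g1=1):", p_g1, " P(g1=1, g2=1):", p_g1_g2)
print("P(h=1 | all genes on):", posterior_h((1, 1, 1)))
print("P(h=1 | all genes off):", posterior_h((0, 0, 0)))
```

In this toy setting the tree structure and parameters are given. The abstract describes the harder, unsupervised problem of learning both from expression data, which this sketch does not attempt.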

Item Type: Report or Paper (Discussion Paper)
Additional Information: FH is supported by NSF BIGDATA IIS-1251267. AA is supported in part by a Microsoft Faculty Fellowship, NSF Career Award CCF-1254106, NSF Award CCF-1219234, and ARO YIP Award W911NF-13-1-0084. EF is supported in part by the Institute for Collaborative Biotechnologies through grant W911NF-09-0001 from the US Army Research Office (the content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred) and by NIH grant R01-GM089903. Authors' Contributions: AG and FH collected the data and performed the computational analysis. FH and RV implemented the latent tree algorithm. AG, FH, EF, and AA designed the study, analyzed the data, and wrote the manuscript. EF and AA supervised the study. All authors read and approved the final manuscript.
Funding Agency                  Grant Number
Microsoft Faculty Fellowship    UNSPECIFIED
Army Research Office (ARO)      W911NF-13-1-0084
Army Research Office (ARO)      W911NF-09-0001
NIH                             R01 GM089903
Record Number: CaltechAUTHORS:20190401-123329660
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 94329
Deposited By: George Porter
Deposited On: 01 Apr 2019 21:50
Last Modified: 03 Oct 2019 21:03
