A Caltech Library Service

Multiple organism gene finding by collapsed Gibbs sampling

Chatterji, Sourav and Pachter, Lior (2004) Multiple organism gene finding by collapsed Gibbs sampling. In: Proceedings of the eighth annual international conference on Research in computational molecular biology (RECOMB '04). ACM, New York, NY, pp. 187-193. ISBN 1-58113-755-9.

Full text is not posted in this repository. Consult Related URLs below.

The Gibbs sampling method has been widely used for sequence analysis since it was successfully applied to the problem of identifying regulatory motif sequences upstream of genes. Numerous variants of the original idea have since emerged; however, in all cases the application has been to finding short motifs in collections of short sequences (typically less than 100 nucleotides long). In this paper we introduce a Gibbs sampling approach for identifying genes in multiple large genomic sequences up to hundreds of kilobases long. This approach leverages the evolutionary relationships between the sequences to improve the gene predictions, without explicitly aligning the sequences. We have applied our method to the analysis of genomic sequence from 14 genomic regions, totaling roughly 1.8 Mb of sequence in each organism. We show that our approach compares favorably with existing ab initio approaches to gene finding, including pairwise comparison-based gene prediction methods that make explicit use of alignments. Furthermore, excellent performance can be obtained with as few as four organisms, and the method overcomes a number of difficulties of previous comparison-based gene finding approaches: it is robust with respect to genomic rearrangements, can work with draft sequence, and is fast (linear in the number and length of the sequences). It can also be seamlessly integrated with Gibbs sampling motif detection methods.

Item Type:Book Section
Related URLs:
ORCID:Pachter, Lior (0000-0002-9164-6231)
Additional Information:© 2004 ACM. Thanks to Simon Cawley for helpful discussions and comments. This work was partially funded with a grant from the NIH (R01: HG2362-1).
Funding Agency:NIH
Grant Number:R01 HG2362-1
Subject Keywords:Gibbs sampling, hidden Markov model, gene finding
Classification Code:J.3 [Computer Applications]: LIFE AND MEDICAL SCIENCES—Biology and genetics
Record Number:CaltechAUTHORS:20170308-141248581
Persistent URL:
Official Citation:Sourav Chatterji and Lior Pachter. 2004. Multiple organism gene finding by collapsed Gibbs sampling. In Proceedings of the eighth annual international conference on Research in computational molecular biology (RECOMB '04). ACM, New York, NY, USA, 187-193. DOI=
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:74919
Deposited By: George Porter
Deposited On:08 Mar 2017 22:30
Last Modified:15 Nov 2021 16:29
