Multigenome DNA sequence conservation identifies Hox cis-regulatory elements
Abstract
To learn how well ungapped sequence comparisons of multiple species can predict cis-regulatory elements in Caenorhabditis elegans, we made such predictions across the large, complex ceh-13/lin-39 locus and tested them transgenically. We also examined how prediction quality varied with different genomes and parameters in our comparisons. Specifically, we sequenced ∼0.5% of the C. brenneri and C. sp. 3 PS1010 genomes, and compared five Caenorhabditis genomes (C. elegans, C. briggsae, C. brenneri, C. remanei, and C. sp. 3 PS1010) to find regulatory elements in 22.8 kb of noncoding sequence from the ceh-13/lin-39 Hox subcluster. We developed the MUSSA program to find ungapped DNA sequences with N-way transitive conservation, applied it to the ceh-13/lin-39 locus, and transgenically assayed 21 regions with both high and low degrees of conservation. This identified 10 functional regulatory elements whose activities matched known ceh-13/lin-39 expression, with 100% specificity and a 77% recovery rate. One element was so well conserved that a similar mouse Hox cluster sequence recapitulated the native nematode expression pattern when tested in worms. Our findings suggest that ungapped sequence comparisons can predict regulatory elements genome-wide.
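The phrase "ungapped DNA sequences with N-way transitive conservation" can be illustrated with a minimal sketch. It is not MUSSA's code or interface. It assumes a fixed-width sliding window and a minimum count of identical bases per window, and it treats a set of windows (one per species) as conserved only if every pair of windows in the set passes that threshold. The function names `window_matches` and `transitive_matches` and the parameters `window` and `threshold` are hypothetical. The search is brute force and ignores reverse complements.

```python
from itertools import combinations


def window_matches(a, b, window, threshold):
    """Return offsets (i, j) where a[i:i+window] and b[j:j+window]
    share at least `threshold` identical positions, with no gaps."""
    hits = set()
    for i in range(len(a) - window + 1):
        wa = a[i:i + window]
        for j in range(len(b) - window + 1):
            wb = b[j:j + window]
            identical = sum(x == y for x, y in zip(wa, wb))
            if identical >= threshold:
                hits.add((i, j))
    return hits


def transitive_matches(seqs, window, threshold):
    """Return offset tuples (one offset per sequence) whose windows
    all match one another pairwise (illustrative N-way requirement)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    # Pairwise ungapped window hits for every pair of sequences.
    pair_hits = {
        (p, q): window_matches(seqs[p], seqs[q], window, threshold)
        for p, q in combinations(range(n), 2)
    }
    # Start from hits between the first two sequences, then extend
    # one sequence at a time, keeping only fully consistent tuples.
    paths = sorted(pair_hits[(0, 1)])
    for q in range(2, n):
        extended = []
        for path in paths:
            for k in range(len(seqs[q]) - window + 1):
                if all((path[p], k) in pair_hits[(p, q)] for p in range(q)):
                    extended.append(path + (k,))
        paths = extended
    return paths


# Toy sequences (invented for illustration) sharing the 6-mer ACGTGA.
toy = ["TTACGTGAAC", "GGACGTGATT", "CACGTGACCC"]
print(transitive_matches(toy, window=6, threshold=6))  # [(2, 2, 1)]
```

Lowering `threshold` below `window` admits imperfect matches, and the all-pairs requirement is what lets additional genomes filter out chance pairwise hits.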
Additional Information
© 2008, Cold Spring Harbor Laboratory Press. Received August 26, 2008; accepted in revised form September 17, 2008. Published in Advance November 3, 2008, doi:10.1101/gr.085472.108.
We dedicate this study to the memory of E.B. Lewis, who pioneered the analysis of Hox clusters at Caltech. We thank C.T. Brown for discussions, N. Mullaney for work on an early version of MUSSA, E. Moon for aid in fosmid library construction, and E. Rubin and his colleagues at the DOE JGI for fosmid sequencing and assembly. We thank L.R. Baugh, C.T. Brown, C. Dalal, J. Green, M. Kato, K. Kiontke, A. Mortazavi, A. Seah, and B. Williams for comments on the manuscript. Some nematode strains used in this work were provided by the Caenorhabditis Genetics Center, which is funded by the NIH National Center for Research Resources (NCRR). Unpublished metazoan genomic sequences were generously provided by the DOE JGI and GeneDB. This work was supported by grants from DOE to B.J.W. and P.W.S., from NASA to B.J.W., from NIH to B.J.W., and from the HHMI, with which P.W.S. is an Investigator.
Supplemental material is available online at www.genome.org. The sequence data from this study have been submitted to GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) under accession nos. FJ362353–FJ36238.
Attached Files
Published - KUNgr08.pdf
Files
Name | Size |
---|---|
md5:5220d0df67fcefb0fbfc81118c1a9a03 | 1.5 MB |
Additional details
- PMCID: PMC2593573
- Eprint ID: 12867
- Resolver ID: CaltechAUTHORS:KUNgr08
- Funders: Department of Energy (DOE); NASA; NIH; Howard Hughes Medical Institute (HHMI)
- Created: 2009-01-06 (from EPrint's datestamp field)
- Updated: 2021-11-08 (from EPrint's last_modified field)