
MAVID: Constrained ancestral alignment of multiple sequences

Bray, Nicolas and Pachter, Lior (2004) MAVID: Constrained ancestral alignment of multiple sequences. Genome Research, 14 (4). pp. 693-699. ISSN 1088-9051. PMCID PMC383315. doi:10.1101/gr.1960404.

PDF - Published Version
Creative Commons Attribution Non-commercial.

PDF - Submitted Version
See Usage Policy.




We describe a new global multiple-alignment program capable of aligning a large number of genomic regions. Our progressive-alignment approach incorporates the following ideas: maximum-likelihood inference of ancestral sequences, automatic guide-tree construction, protein-based anchoring of ab-initio gene predictions, and constraints derived from a global homology map of the sequences. We have implemented these ideas in the MAVID program, which is able to accurately align multiple genomic regions up to megabases long. MAVID is able to effectively align divergent sequences, as well as incomplete unfinished sequences. We demonstrate the capabilities of the program on the benchmark CFTR region, which consists of 1.8 Mb of human sequence and 20 orthologous regions in marsupials, birds, fish, and mammals. Finally, we describe two large MAVID alignments, an alignment of all the available HIV genomes and a multiple alignment of the entire human, mouse, and rat genomes.

Item Type:Article
ORCID:Pachter, Lior (0000-0002-9164-6231)
Additional Information:© 2004 Cold Spring Harbor Laboratory Press. The Authors acknowledge that six months after the full-issue publication date, the Article will be distributed under a Creative Commons CC-BY-NC License (Attribution-NonCommercial 4.0 International License). Accepted November 17, 2003. Received September 10, 2003. [...]ograms. We thank Von Bing Yap for helping with the evolutionary models used in MAVID. Thanks to Ingileif Bryndís Hallgrímsdóttir for her help throughout the project and for her comments on the final manuscript. The data used in the multiple alignment of the CFTR region was generated by the NIH Intramural Sequencing Center and was used subject to their 6-mo hold policy. The HIV sequences were downloaded from the HIV database. Thanks also to the Rat Sequencing Consortium, both for providing the rat sequence to align and for facilitating helpful collaborations and discussions. Finally, we thank the anonymous reviewers for their insightful comments and suggestions. This work was partially supported by funding from the NIH (grant R01-HG02362-01) and the Berkeley PGA grant from the NHLBI. The publication costs of this article were defrayed in part by payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 USC section 1734 solely to indicate this fact.
Funding Agency:National Heart, Lung, and Blood Institute
Grant Number:UNSPECIFIED
Issue or Number:4
PubMed Central ID:PMC383315
Record Number:CaltechAUTHORS:20170307-074220313
Official Citation:MAVID: Constrained Ancestral Alignment of Multiple Sequences. Nicolas Bray and Lior Pachter. Genome Res. April 2004 14: 693-699; doi:10.1101/gr.1960404
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:74826
Deposited By: Tony Diaz
Deposited On:07 Mar 2017 15:54
Last Modified:11 Nov 2021 05:30
