Minimizing the overlap problem in protein NMR: a computational framework for precision amino acid labeling

Sweredoski, Michael J. and Donovan, Kevin J. and Nguyen, Bao D. and Shaka, A. J. and Baldi, Pierre (2007) Minimizing the overlap problem in protein NMR: a computational framework for precision amino acid labeling. Bioinformatics, 23 (21). pp. 2829-2835. ISSN 1460-2059. doi:10.1093/bioinformatics/btm406.


Motivation: Recent advances in cell-free protein expression systems allow specific labeling of proteins with amino acids containing stable isotopes (¹⁵N, ¹³C and ²H), an important feature for protein structure determination by nuclear magnetic resonance (NMR) spectroscopy. Given this labeling ability, we present a mathematical optimization framework for designing a set of protein isotopomers, or labeling schedules, to reduce the congestion in the NMR spectra. The labeling schedules, which are derived by the optimization of a cost function, are tailored to a specific protein and NMR experiment.

Results: For 2D ¹⁵N-¹H HSQC experiments, we can produce an exact solution using a dynamic programming algorithm in under 2 h on a standard desktop machine. Applying the method to a standard benchmark protein, calmodulin, we are able to reduce the number of overlaps in the 500 MHz HSQC spectrum from 10 to 1 using four samples with a true cost function, and from 10 to 4 if the cost function is derived from statistical estimates. On a set of 448 curated proteins from the BMRB database, we are able to reduce the relative percent congestion by 84.9% in their HSQC spectra using only four samples. Our method can be applied in a high-throughput manner on a proteomic scale using the server we developed. On a 100-node cluster, optimal schedules can be computed for every protein coded for in the human genome in less than a month.

Availability: A server for creating labeling schedules for ¹⁵N-¹H HSQC experiments as well as results for each of the individual 448 proteins used in the test set is available at
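
The abstract does not spell out the cost function, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes, for illustration, that each amino acid type is labeled in exactly one sample, that two HSQC peaks "overlap" when they are closer than a fixed tolerance in both dimensions, and that the cost of a schedule is the total number of overlapping pairs across samples. The names `Peak`, `count_overlaps` and `schedule_cost`, the tolerance values and the example chemical shifts are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple


class Peak(NamedTuple):
    residue_type: str  # one-letter amino acid code
    h_ppm: float       # 1H chemical shift in ppm
    n_ppm: float       # 15N chemical shift in ppm


def count_overlaps(peaks: List[Peak], h_tol: float = 0.02, n_tol: float = 0.2) -> int:
    """Count peak pairs closer than both tolerances (assumed overlap criterion)."""
    return sum(
        1
        for a, b in combinations(peaks, 2)
        if abs(a.h_ppm - b.h_ppm) < h_tol and abs(a.n_ppm - b.n_ppm) < n_tol
    )


def schedule_cost(peaks: List[Peak], schedule: Dict[str, int],
                  h_tol: float = 0.02, n_tol: float = 0.2) -> int:
    """Sum overlaps over samples; `schedule` maps amino acid type -> sample index."""
    by_sample: Dict[int, List[Peak]] = {}
    for peak in peaks:
        # A peak appears only in the sample where its amino acid type is labeled.
        by_sample.setdefault(schedule[peak.residue_type], []).append(peak)
    return sum(count_overlaps(group, h_tol, n_tol) for group in by_sample.values())


# Made-up peak list: two crowded pairs (A/G and L/K) plus one isolated peak.
peaks = [
    Peak("A", 8.10, 120.0),
    Peak("G", 8.11, 120.1),
    Peak("L", 8.50, 118.0),
    Peak("K", 8.51, 118.1),
    Peak("A", 7.90, 125.0),
]

uniform = {"A": 0, "G": 0, "L": 0, "K": 0}  # everything labeled in one sample
split = {"A": 0, "G": 1, "L": 0, "K": 1}    # crowded pairs moved to different samples

uniform_cost = schedule_cost(peaks, uniform)
split_cost = schedule_cost(peaks, split)
print(uniform_cost, split_cost)
```

In this toy example, separating the crowded amino acid types into two samples removes the overlaps that a single uniformly labeled sample would show. Searching over all schedules for the minimum cost is the optimization step the abstract attributes to a dynamic programming algorithm; that search is not reproduced here.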

Item Type: Article
ORCID: Sweredoski, Michael J. (0000-0003-0878-3831)
Additional Information: © 2007 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. Received: 21 April 2007; Revision received: 06 July 2007; Accepted: 06 August 2007; Published: 25 September 2007. Work supported by an NIH grant (GM-66763) and a UC Discovery Grant bio05-10533 to A.J.S., and a Laurel Wilkening Faculty Innovation award, a Microsoft Faculty Research Award, an NIH Biomedical Informatics Training grant (LM-07443-01) and an NSF MRI grant (EIA-0321390) to P.B. Conflict of Interest: none declared.
Funding Agency                              Grant Number
University of California, Irvine            bio05-10533
Laurel Wilkening Faculty Innovation Award   UNSPECIFIED
Microsoft Faculty Research Award            UNSPECIFIED
Issue or Number: 21
Record Number: CaltechAUTHORS:20200506-083932399
Persistent URL:
Official Citation: Michael J. Sweredoski, Kevin J. Donovan, Bao D. Nguyen, A.J. Shaka, Pierre Baldi, Minimizing the overlap problem in protein NMR: a computational framework for precision amino acid labeling, Bioinformatics, Volume 23, Issue 21, 1 November 2007, Pages 2829–2835,
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 103021
Deposited By: Tony Diaz
Deposited On: 06 May 2020 16:03
Last Modified: 16 Nov 2021 18:17
