A Caltech Library Service

A Unified Dynamic Programming Framework for the Analysis of Interacting Nucleic Acid Strands: Enhanced Models, Scalability, and Speed

Fornace, Mark E. and Porubsky, Nicholas J. and Pierce, Niles A. (2020) A Unified Dynamic Programming Framework for the Analysis of Interacting Nucleic Acid Strands: Enhanced Models, Scalability, and Speed. ACS Synthetic Biology, 9 (10). pp. 2665-2678. ISSN 2161-5063.

[img] PDF (Additional free energy model details, recursions for the complex ensemble with or without coaxial and dangle stacking, evaluation algebras for each physical quantity, operation orders for each physical quantity, distinguishability issues, additional ...) - Supplemental Material
See Usage Policy.


Use this Persistent URL to link to this item:


Dynamic programming algorithms within the NUPACK software suite enable analysis of nucleic acid sequences over complex and test tube ensembles containing arbitrary numbers of interacting strand species, serving the needs of researchers in molecular programming, nucleic acid nanotechnology, synthetic biology, and across the life sciences. Here, to enhance the underlying physical model, ensure scalability for large calculations, and achieve dramatic speedups when calculating diverse physical quantities over complex and test tube ensembles, we introduce a unified dynamic programming framework that combines three ingredients: (1) recursions that specify the dependencies between subproblems and incorporate the details of the structural ensemble and the free energy model, (2) evaluation algebras that define the mathematical form of each subproblem, (3) operation orders that specify the computational trajectory through the dependency graph of subproblems. The physical model is enhanced using new recursions that operate over the complex ensemble including coaxial and dangle stacking subensembles. The recursions are coded generically and then compiled with a quantity-specific evaluation algebra and operation order to generate an executable for each physical quantity: partition function, equilibrium base-pairing probabilities, MFE energy and proxy structure, suboptimal proxy structures, and Boltzmann sampled structures. For large complexes (e.g., 30 000 nt), scalability is achieved for partition function calculations using an overflow-safe evaluation algebra, and for equilibrium base-pairing probabilities using a backtrack-free operation order. A new blockwise operation order that treats subcomplex blocks for the complex species in a test tube ensemble enables dramatic speedups (e.g., 20–120× ) using vectorization and caching. With these performance enhancements, equilibrium analysis of substantial test tube ensembles can be performed in ≤ 1 min on a single computational core (e.g., partition function and equilibrium concentration for all complex species of up to six strands formed from two strand species of 300 nt each, or for all complex species of up to two strands formed from 80 strand species of 100 nt each). A new sampling algorithm simultaneously samples multiple structures from the complex ensemble to yield speedups of an order of magnitude or more as the number of structures increases above ≈10³. These advances are available within the NUPACK 4.0 code base ( which can be flexibly scripted using the all-new NUPACK Python module.

Item Type:Article
Related URLs:
URLURL TypeDescription
Pierce, Niles A.0000-0003-2367-4406
Additional Information:© 2020 American Chemical Society. Received: December 22, 2019; Published: September 10, 2020. We thank all the NUPACK users that have helped out as beta testers over the years, as well as the many NUPACK users that have emailed to request features or report bugs. We thank J.S. Bois for helpful discussions, J. Huang for assistance developing the NUPACK Python module, and S.J. Schulte for performing preliminary studies. This work was funded by the National Science Foundation (Software Elements NSF-OAC-1835414, INSPIRE NSF-CHE-1643606, Molecular Programming Project NSF-CCF-1317694), by the Programmable Molecular Technology Center (PMTC) within the Beckman Institute at Caltech, by the AWS/IST Cloud Credit Program at Caltech, by a Microsoft Azure sponsorship, by the National Institutes of Health (National Research Service Award T32 GM007616), by a Professorial Fellowship at Balliol College, University of Oxford, and by the Eastman Visiting Professorship at the University of Oxford. Author Contributions: M.E.F. and N.J.P. contributed equally. The authors declare no competing financial interest.
Funding AgencyGrant Number
Caltech Beckman InstituteUNSPECIFIED
Amazon Web ServicesUNSPECIFIED
Microsoft AzureUNSPECIFIED
NIH Predoctoral FellowshipT32 GM007616
University of OxfordUNSPECIFIED
Subject Keywords:RNA, DNA, base-pairing, secondary structure, equilibrium, complex ensemble, test tube ensemble, coaxial and dangle stacking subensembles, free energy model, dynamic programming algorithm, recursion, evaluation algebra, operation order, partition function, base-pairing probability, minimum free energy, concentration structure sampling
Issue or Number:10
Record Number:CaltechAUTHORS:20200911-133138429
Persistent URL:
Official Citation:A Unified Dynamic Programming Framework for the Analysis of Interacting Nucleic Acid Strands: Enhanced Models, Scalability, and Speed. Mark E. Fornace, Nicholas J. Porubsky, and Niles A. Pierce. ACS Synthetic Biology 2020 9 (10), 2665-2678; DOI: 10.1021/acssynbio.9b00523
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:105353
Deposited By: George Porter
Deposited On:14 Sep 2020 14:07
Last Modified:21 Oct 2020 16:24

Repository Staff Only: item control page