A Caltech Library Service

Regular Architecture (RegArch): A standard expression language for describing protein architectures

Ortega, Davi R. and Jensen, Grant J. (2019) Regular Architecture (RegArch): A standard expression language for describing protein architectures. (Unpublished)

PDF - Submitted Version
Creative Commons Attribution Non-commercial No Derivatives.

PDF - Supplemental Material
Creative Commons Attribution Non-commercial No Derivatives.

Domain architecture – the arrangement of features in a protein – exhibits syntactic patterns similar to the grammar of a language. This feature enables pattern mining for protein function prediction, comparative genomics, and studies of molecular evolution and complexity. To facilitate such work, here we propose Regular Architecture (RegArch), an expression language to describe syntactic patterns in protein architectures. Like the well-known Regular Expressions for text, RegArchs codify positional and non-positional patterns of elements into nested JSON objects. We describe the standard and provide a reference implementation in JavaScript to parse RegArchs and match annotated proteins.
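The idea of matching positional and non-positional patterns against an annotated protein can be illustrated with a small sketch. This is not the RegArch specification or the authors' reference implementation: the field names `positional` and `nonPositional`, the helper functions, and the feature labels are all hypothetical, chosen only to show how a nested JSON-style object might encode both kinds of constraint. A protein is modeled here as an ordered list of feature names from N- to C-terminus.

```javascript
// Hypothetical sketch only: the schema below is an assumption made for
// illustration and is NOT the actual RegArch format.

// True if the listed features occur consecutively, in order, somewhere
// in the protein's feature list.
function matchesPositional(features, positional) {
  if (positional.length === 0) return true;
  for (let start = 0; start + positional.length <= features.length; start++) {
    if (positional.every((name, i) => features[start + i] === name)) {
      return true;
    }
  }
  return false;
}

// True if every rule's feature count (anywhere in the protein) lies
// within [min, max]. Position is ignored.
function matchesNonPositional(features, rules) {
  return rules.every(({ feature, min = 1, max = Infinity }) => {
    const count = features.filter((f) => f === feature).length;
    return count >= min && count <= max;
  });
}

// A pattern matches when both its positional and non-positional parts match.
function matchesPattern(features, pattern) {
  const { positional = [], nonPositional = [] } = pattern;
  return (
    matchesPositional(features, positional) &&
    matchesNonPositional(features, nonPositional)
  );
}

// Illustrative pattern: "CheW" immediately followed by "Response_reg",
// and no "TM" feature anywhere in the protein.
const pattern = {
  positional: ["CheW", "Response_reg"],
  nonPositional: [{ feature: "TM", min: 0, max: 0 }],
};

const proteinA = ["CheW", "Response_reg"];
const proteinB = ["TM", "CheW", "Response_reg"];
const proteinC = ["Response_reg", "CheW"];

console.log(matchesPattern(proteinA, pattern)); // true
console.log(matchesPattern(proteinB, pattern)); // false (contains "TM")
console.log(matchesPattern(proteinC, pattern)); // false (wrong order)
```

Keeping the two kinds of constraint in separate fields mirrors the distinction drawn in the abstract: order-dependent rules are checked against the sequence of features, while count-based rules ignore position entirely.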

Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL, URL Type, Description
Paper
Ortega, Davi R. (ORCID 0000-0002-8344-2335)
Jensen, Grant J. (ORCID 0000-0003-1556-4864)
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. bioRxiv preprint first posted online Jun. 22, 2019. The authors would like to thank the developers of MiST3: Luke Ulrich, Vadim Gumerov, and Ogun Adebali for helpful suggestions and comments on the manuscript. We also would like to thank Igor Zhulin for discussions on protein domain architectures that led to the idea of Regular Architecture. We also thank Dr. Catherine M. Oikonomou for helpful discussion and suggestions on the manuscript. This work was made possible through the support of the National Institutes of Health (grant R35 GM122588 to G.J.J.) and the John Templeton Foundation as part of the Boundaries of Life Initiative (grants 51250 & 60973 to G.J.J.).
Funding Agency: Grant Number
NIH: R35 GM122588
John Templeton Foundation: 51250
John Templeton Foundation: 60973
Record Number: CaltechAUTHORS:20190624-114502121
Persistent URL:
Official Citation: Regular Architecture (RegArch): A standard expression language for describing protein architectures. Davi R. Ortega, Grant J. Jensen. bioRxiv 679910; doi:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 96667
Deposited By: Tony Diaz
Deposited On: 24 Jun 2019 18:56
Last Modified: 16 Nov 2021 17:22