Published March 2024 | Version Published
Journal Article Open

Fast and scalable querying of eukaryotic linear motifs with gget elm

  • 1. California Institute of Technology

Abstract

Motivation

Eukaryotic linear motifs (ELMs), or Short Linear Motifs, are protein interaction modules that play an essential role in cellular processes and signaling networks and are often involved in diseases like cancer. The ELM database is a collection of manually curated motif knowledge from scientific papers. It has become a crucial resource for investigating motif biology and recognizing candidate ELMs in novel amino acid sequences. Users can search amino acid sequences or UniProt Accessions on the ELM resource web interface. However, as with many web services, the ELM web interface and API cannot process large-scale queries quickly, which limits their integration into protein function analysis pipelines.

Results

To allow swift, large-scale motif analyses on protein sequences using ELMs curated in the ELM database, we have extended the gget suite of Python and command line tools with a new module, gget elm, which efficiently finds candidate ELMs in user-submitted amino acid sequences and UniProt Accessions without relying on the ELM server. gget elm increases accessibility to the information stored in the ELM database and allows scalable searches for motif-mediated interaction sites in amino acid sequences.
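The idea of searching a sequence locally for candidate motifs can be illustrated with a minimal sketch. It assumes that motifs are expressed as regular expressions, as ELM motif classes are. It is not the gget elm implementation: the function name find_candidate_motifs, the motif names, and the two patterns are hypothetical placeholders and are not taken from the ELM database.

```python
import re

# Hypothetical, illustrative patterns only; NOT entries from the ELM database.
EXAMPLE_MOTIFS = {
    "EXAMPLE_PxxP": r"P..P",           # proline, any two residues, proline
    "EXAMPLE_C_TERM": r"[ST].[VIL]$",  # pattern anchored at the C-terminus
}


def find_candidate_motifs(sequence, motifs=None):
    """Return every regex match of each motif in an amino acid sequence.

    Positions are 1-based and inclusive, as is conventional for proteins.
    """
    if motifs is None:
        motifs = EXAMPLE_MOTIFS
    hits = []
    for name, pattern in motifs.items():
        # Wrapping the pattern in a lookahead reports overlapping matches too.
        for m in re.finditer(f"(?=({pattern}))", sequence):
            hits.append(
                {
                    "motif": name,
                    "start": m.start(1) + 1,
                    "end": m.end(1),
                    "match": m.group(1),
                }
            )
    return hits


if __name__ == "__main__":
    for hit in find_candidate_motifs("MPSTPPTPSPRETSV"):
        print(hit)
```

Because the search runs entirely in local memory, it can be applied to many sequences in a loop without sending requests to a web server. The actual interface and options of gget elm are described in the gget manual.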

Availability and implementation

The manual and source code are available at https://github.com/pachterlab/gget.

Copyright and License

Acknowledgement

We thank the expert curators of the ELM database for providing an excellent resource. We thank Dr Toby Gibson for the valuable feedback on the manuscript. We also thank Candace Rypisi and the rest of the Summer Undergraduate Research Fellowships (SURF) program staff for facilitating valuable research opportunities for undergraduate students and mentorship opportunities for graduate students at Caltech. Illustrations in Fig. 2 were created with BioRender.com.

Contributions

L.L. and L.P. conceived the project after listening to a lecture by Prof. Amy E. Keating. L.L., C.H., and M.K. designed the gget elm approach. L.L. and C.H. wrote the gget elm software, with C.H. being the primary developer under the supervision of L.L. L.L. is the primary developer of the gget software, and M.K. is the primary developer of the ELM resource. L.L. wrote the initial draft of the manuscript. C.H., M.K., and L.P. provided feedback on the manuscript. All authors reviewed and approved the manuscript.

Funding

This work was supported by funding from the Biology and Bioengineering Division at the California Institute of Technology and the Chen Graduate Innovator Grant [CHEN.SYS3.CGIAFY21 to L.L.]. C.H. was supported by the Citadel Global Fixed Income SURF Fellowship. gget was supported by Pachter lab start-up funds.

Data Availability

Supplementary data are available at Bioinformatics online.

Conflict of Interest

None declared.

Files

btae095.pdf

Files (826.3 kB)

Name Size
md5:d5a83b22de16af13c4e30849bc69017a 816.3 kB
md5:d39fb685b7c5b6c4daedcec4c1b267b7 10.0 kB

Additional details

Identifiers

Funding

California Institute of Technology: Division of Biology and Biological Engineering
California Institute of Technology: Tianqiao and Chrissy Chen Institute for Neuroscience (CHEN.SYS3.CGIAFY21)
California Institute of Technology: Summer Undergraduate Research Fellowship

Caltech Custom Metadata

Caltech groups
Division of Biology and Biological Engineering (BBE), Tianqiao and Chrissy Chen Institute for Neuroscience