A Caltech Library Service

Efficient querying of genomic reference databases with gget

Luebbert, Laura and Pachter, Lior (2022) Efficient querying of genomic reference databases with gget. (Unpublished)

PDF (May 27, 2022) - Submitted Version
Creative Commons Attribution.

PDF - Supplemental Material
Creative Commons Attribution.



A recurring challenge in interpreting genomic data is the assessment of results in the context of existing reference databases. Currently, there is no tool implementing automated, easy programmatic access to curated reference information stored in a diverse collection of large, public genomic databases. gget is a free and open-source command-line tool and Python package that enables efficient querying of genomic reference databases, such as Ensembl. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying required for genomic data analysis in a single line of code. The manual and source code are available at

Item Type: Report or Paper (Discussion Paper)
Related URLs:
URL | URL Type | Description
Paper; Item; Manual and source code; Information
Luebbert, Laura (ORCID: 0000-0003-1379-2927)
Pachter, Lior (ORCID: 0000-0002-9164-6231)
Alternate Title: Efficient querying of genomic databases for single-cell RNA-seq with gget
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. Version 1 - May 19, 2022; Version 2 - May 25, 2022; Version 3 - May 27, 2022. We thank Kyung Hoi (Joseph) Min for advice on the command-line interface, Matteo Guareschi for advice on Windows operability, and A. Sina Booeshaghi, Kristján Eldjárn Hjörleifsson, and Ángel Gálvez-Merchán for insightful discussions about gget. Illustrations in Figure 1 and Supplementary Figure 1 were created with Thanks to the wonderful staff at Dash Coffee Bar in Pasadena, who occasionally gave LL free banana bread to sustain this work. LL was supported by funding from the Biology and Bioengineering Division at the California Institute of Technology and the Chen Graduate Innovator Grant CHEN.SYS3.CGIAFY21. LP was supported in part by NIH U19MH114830. Conflict of Interest: none declared.
Group: Tianqiao and Chrissy Chen Institute for Neuroscience
Funding Agency | Grant Number
Caltech Division of Biology and Biological Engineering | UNSPECIFIED
Tianqiao and Chrissy Chen Institute for Neuroscience | CHEN.SYS3.CGIAFY21
Record Number: CaltechAUTHORS:20220520-512432000
Persistent URL:
Official Citation: Efficient querying of genomic reference databases with gget. Laura Luebbert, Lior Pachter. bioRxiv 2022.05.17.492392; doi:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 114820
Deposited By: George Porter
Deposited On: 20 May 2022 17:56
Last Modified: 07 Jul 2022 21:39
