Efficient querying of genomic reference databases with gget
- Creators
- Luebbert, Laura
- Pachter, Lior
Abstract
Motivation: A recurring challenge in interpreting genomic data is the assessment of results in the context of existing reference databases. With the increasing number of command line and Python users, there is a need for tools implementing automated, easy programmatic access to curated reference information stored in a diverse collection of large, public genomic databases.
Results: gget is a free and open-source command line tool and Python package that enables efficient querying of genomic reference databases, such as Ensembl. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying required for genomic data analysis in a single line of code.
Additional Information
© The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
We thank Kyung Hoi (Joseph) Min for advice on the command line interface, Matteo Guareschi for advice on Windows operability, and A. Sina Booeshaghi, Alessandro Groaz, Kristján Eldjárn Hjörleifsson and Ángel Gálvez-Merchán for insightful discussions about gget. Illustrations in Fig. 1 and Supplementary Figure S1 were created with BioRender.com.
This work was supported by funding from the Biology and Bioengineering Division at the California Institute of Technology and the Chen Graduate Innovator Grant [CHEN.SYS3.CGIAFY21 to L.L.]; in part by National Institutes of Health (NIH) [U19MH114830 to L.P.].
Conflict of Interest: none declared.
Attached Files
Published - btac836.pdf
Supplemental Material - btac836_supplementary_data.zip
Files
Checksum | Size |
---|---|
md5:e977586c97073e27a8c796cc80fc3da0 | 7.3 MB |
md5:26fb24ab7c1f2783f9ffc41a88ad4f2b | 1.2 MB |
Additional details
- PMCID
- PMC9835474
- Eprint ID
- 122526
- Resolver ID
- CaltechAUTHORS:20230725-706344000.34
- Caltech Division of Biology and Biological Engineering
- Tianqiao and Chrissy Chen Institute for Neuroscience
- CHEN.SYS3.CGIAFY21
- NIH
- U19MH114830
- Created
- 2023-08-13 (created from EPrint's datestamp field)
- Updated
- 2023-08-14 (created from EPrint's last_modified field)
- Caltech groups
- Tianqiao and Chrissy Chen Institute for Neuroscience, Division of Biology and Biological Engineering