Published May 2024 | Version Published
Journal Article Open

FAIR Header Reference genome: a TRUSTworthy standard

Abstract

The lack of interoperable data standards among reference genome data-sharing platforms inhibits cross-platform analysis while increasing the risk of data provenance loss. Here, we describe the FAIR bioHeaders Reference genome (FHR), a metadata standard guided by the principles of Findability, Accessibility, Interoperability and Reuse (FAIR) in addition to the principles of Transparency, Responsibility, User focus, Sustainability and Technology. The objective of FHR is to provide an extensive set of data serialisation methods and minimum data field requirements while still maintaining extensibility, flexibility and expressivity in an increasingly decentralised genomic data ecosystem. The effort needed to implement FHR is low; FHR’s design philosophy ensures easy implementation while retaining the benefits gained from recording both machine and human-readable provenance.


Acknowledgement

The authors thank Natalie Meyers of the Lucy Family Institute for Data and Society at the University of Notre Dame for conversations about research communities focused on metadata standards, which informed the writing of the manuscript. The authors thank Monica Poelchau of the National Agricultural Library and Sarah Dyer of EMBL-EBI for relevant discussions.

Funding

Adam Wright is supported by the Adaptive Oncology Programme at the Ontario Institute for Cancer Research. During part of this project, David Molik was supported by the USDA Agricultural Research Service (ARS) HQ Research Associate program in Big Data.

A portion of this work was carried out by the Tropical Pest Genetics and Molecular Biology Research Unit, ARS Project number 2040-22430-028-000D.

A portion of this work was carried out by the Arthropod-Borne Animal Diseases Research Unit, ARS Project numbers 3020-32000-018-000-D, 3020-32000-020-000-D, and 3020-32000-019-000-D.

This research used resources provided by the SCINet project of the USDA Agricultural Research Service, ARS project number 0500-00093-001-00-D.

We gratefully acknowledge the WormBase grant (U24HG002223), which funded this research and supported the contributions of Karen Yook, Daniela Raciti, Paul Sternberg, Adam Wright, Lincoln Stein and Scott Cain, whose efforts contributed significantly to this project.

Data Availability

The code published for FHR is in the public domain under 17 U.S.C. § 105 (United States). The code and specification are freely available for use and modification (Table 4).

Files

bbae122.pdf

Files (8.0 MB)

Size | Checksum
1.0 MB | md5:9ea8f5ecc67c52c47f61b9dd1343936f
24.3 kB | md5:4bf28873fe3a6fcc674beb1f68b9db3e
1.0 MB | md5:7c7e505d40f81b222c7e7043bf8f0dda
1.1 MB | md5:cf5565ed6a92e59dd72867e07bd3cd8f
1.2 MB | md5:95447a6f20703e7b70281e2346d50132
1.0 MB | md5:1ce3a9617cb61864614e6a789ab53362
577.3 kB | md5:7212e813d2187b39879e1fce986a1273
882.4 kB | md5:9bfe4ab42a81b8edc38500381026fdc4
1.1 MB | md5:745577ec2e8c30188fb9b629d119ccb3
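Each file in the listing is accompanied by an MD5 checksum. As a minimal sketch, assuming a file has already been downloaded locally, its checksum can be computed with Python's standard hashlib module and compared against the listed value. The helper names md5_of_file and matches_listed_checksum are hypothetical and are not part of FHR or of the repository.

```python
import hashlib
import sys
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: Path, listed: str) -> bool:
    """Compare a file's MD5 with a value written as in the listing, e.g. 'md5:9ea8...'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Hypothetical usage: python verify_md5.py <downloaded-file> [...]
    # Prints each file's checksum in the same "md5:..." form used in the listing.
    for name in sys.argv[1:]:
        print(f"md5:{md5_of_file(Path(name))}  {name}")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size; for files of about 1 MB, as here, reading the whole file at once would also work.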

Additional details

Additional titles

Alternative title
DATA RESOURCES AND ANALYSES FAIR Header Reference genome: A TRUSTworthy standard

Identifiers

ISSN
1477-4054
PMCID
PMC10981671

Related works

Is new version of
Discussion Paper: 10.1101/2023.11.29.569306 (DOI)
Discussion Paper: PMC10705436 (PMCID)

Funding

Ontario Institute for Cancer Research
Agricultural Research Service: 2040-22430-028-000D
Agricultural Research Service: 3020-32000-018-000-D
Agricultural Research Service: 3020-32000-020-000-D
Agricultural Research Service: 3020-32000-019-000-D
Agricultural Research Service: 0500-00093-001-00-D
National Institutes of Health: U24HG002223

Caltech Custom Metadata

Caltech groups
Division of Biology and Biological Engineering (BBE), WormBase