Published April 25, 2024
Journal Article Open

Perspectives on tracking data reuse across biodata resources

Ross, Karen E.
Bastian, Frederic B.
Buys, Matt
Cook, Charles E.
D'Eustachio, Peter
Harrison, Melissa
Hermjakob, Henning
Li, Donghui
Lord, Phillip
Natale, Darren A.
Peters, Bjoern
Sternberg, Paul W. (1)
Su, Andrew I.
Thakur, Matthew
Thomas, Paul D.
Bateman, Alex
Martin, Maria-Jesus
Orchard, Sandra
Magrane, Michele
Ahmad, Shadab
Bowler-Barnett, Emily H.
Bye-A-Jee, Hema
Denny, Paul
Dogan, Tunca
Ebenezer, ThankGod
Fan, Jun
da Costa Gonzales, Leonardo Jose
Hussein, Abdulrahman
Ignatchenko, Alexandr
Insana, Giuseppe
Ishtiaq, Rizwan
Joshi, Vishal
Jyothi, Dushyanth
Kandasaamy, Swaathi
Lock, Antonia
Luciani, Aurelien
Luo, Jie
Lussi, Yvonne
Raposo, Pedro
Rice, Daniel L.
Saidi, Rabie
Santos, Rafael
Speretta, Elena
Stephenson, James
Totoo, Prabhat
Tyagi, Nidhi
Vasudev, Preethi
Warner, Kate
Zaru, Rossana
Wijerathne, Supun
Ibrahim, Khawaja Talal
Kim, Minjoon
Marin, Juan
Bridge, Alan J.
Aimo, Lucila
Argoud-Puy, Ghislaine
Auchincloss, Andrea H.
Axelsen, Kristian B.
Bansal, Parit
Baratin, Delphine
Batista Neto, Teresa M.
Bolleman, Jerven T.
Boutet, Emmanuel
Breuza, Lionel
Cabrera Gil, Blanca
Casals-Casas, Cristina
Coudert, Elisabeth
Cuche, Beatrice
de Castro, Edouard
Estreicher, Anne
Famiglietti, Maria L.
Feuermann, Marc
Gasteiger, Elisabeth
Gehant, Sebastien
Gos, Arnaud
Gruaz, Nadine
Hulo, Chantal
Hyka-Nouspikel, Nevila
Jungo, Florence
Kerhornou, Arnaud
Le Mercier, Philippe
Lieberherr, Damien
Masson, Patrick
Morgat, Anne
Pedruzzi, Ivo
Pilbout, Sandrine
Pourcel, Lucille
Poux, Sylvain
Pozzato, Monica
Pruess, Manuela
Redaschi, Nicole
Rivoire, Catherine
Sigrist, Christian J. A.
Sundaram, Shyamala
Sveshnikova, Anastasia
Wu, Cathy H.
Arighi, Cecilia N.
Chen, Chuming
Chen, Yongxing
Huang, Hongzhan
Laiho, Kati
Lehvaslaiho, Minna
McGarvey, Peter
Natale, Darren A.
Ross, Karen
Vinayaka, C. R.
Wang, Yuqi
Zhang, Jian
UniProt Consortium
  • 1. California Institute of Technology

Abstract

Motivation

Data reuse is a common and vital practice in molecular biology and enables the knowledge gathered over recent decades to drive discovery and innovation in the life sciences. Much of this knowledge has been collated into molecular biology databases, such as UniProtKB, and these resources derive enormous value from sharing data among themselves. However, quantifying and documenting this kind of data reuse remains a challenge.

Results

The article reports on a one-day virtual workshop hosted by the UniProt Consortium in March 2023, attended by representatives from biodata resources, experts in data management, and NIH program managers. Workshop discussions focused on strategies for tracking data reuse, best practices for reusing data, and the challenges associated with data reuse and tracking. Surveys and discussions showed that data reuse is widespread, but critical information for reproducibility is sometimes lacking. Challenges include costs of tracking data reuse, tensions between tracking data and open sharing, restrictive licenses, and difficulties in tracking commercial data use. Recommendations that emerged from the discussion include: development of standardized formats for documenting data reuse, education about the obstacles posed by restrictive licenses, and continued recognition by funding agencies that data management is a critical activity that requires dedicated resources.

Availability and implementation

Summaries of survey results are available at: https://docs.google.com/forms/d/1j-VU2ifEKb9C-sW6l3ATB79dgHdRk5v_lESv2hawnso/viewanalytics (survey of data providers) and https://docs.google.com/forms/d/18WbJFutUd7qiZoEzbOytFYXSfWFT61hVce0vjvIwIjk/viewanalytics (survey of users).

Copyright and License

Acknowledgement

The UniProt Consortium: Alex Bateman, Maria-Jesus Martin, Sandra Orchard, Michele Magrane, Shadab Ahmad, Emily H. Bowler-Barnett, Hema Bye-A-Jee, Paul Denny, Tunca Dogan, ThankGod Ebenezer, Jun Fan, Leonardo Jose da Costa Gonzales, Abdulrahman Hussein, Alexandr Ignatchenko, Giuseppe Insana, Rizwan Ishtiaq, Vishal Joshi, Dushyanth Jyothi, Swaathi Kandasaamy, Antonia Lock, Aurelien Luciani, Jie Luo, Yvonne Lussi, Pedro Raposo, Daniel L. Rice, Rabie Saidi, Rafael Santos, Elena Speretta, James Stephenson, Prabhat Totoo, Nidhi Tyagi, Preethi Vasudev, Kate Warner, Rossana Zaru, Supun Wijerathne, Khawaja Talal Ibrahim, Minjoon Kim and Juan Marin at the EMBL-European Bioinformatics Institute; Alan J. Bridge, Lucila Aimo, Ghislaine Argoud-Puy, Andrea H. Auchincloss, Kristian B. Axelsen, Parit Bansal, Delphine Baratin, Teresa M. Batista Neto, Jerven T. Bolleman, Emmanuel Boutet, Lionel Breuza, Blanca Cabrera Gil, Cristina Casals-Casas, Elisabeth Coudert, Beatrice Cuche, Edouard de Castro, Anne Estreicher, Maria L. Famiglietti, Marc Feuermann, Elisabeth Gasteiger, Sebastien Gehant, Arnaud Gos, Nadine Gruaz, Chantal Hulo, Nevila Hyka-Nouspikel, Florence Jungo, Arnaud Kerhornou, Philippe Le Mercier, Damien Lieberherr, Patrick Masson, Anne Morgat, Ivo Pedruzzi, Sandrine Pilbout, Lucille Pourcel, Sylvain Poux, Monica Pozzato, Manuela Pruess, Nicole Redaschi, Catherine Rivoire, Christian J. A. Sigrist, Shyamala Sundaram and Anastasia Sveshnikova at the SIB Swiss Institute of Bioinformatics; and Cathy H. Wu, Cecilia N. Arighi, Chuming Chen, Yongxing Chen, Hongzhan Huang, Kati Laiho, Minna Lehvaslaiho, Peter McGarvey, Darren A. Natale, Karen Ross, C. R. Vinayaka, Yuqi Wang and Jian Zhang at the Protein Information Resource. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Data Availability

The data underlying this article are available in the article and in its online supplementary material. Summaries of the survey results are available at: https://docs.google.com/forms/d/1j-VU2ifEKb9C-sW6l3ATB79dgHdRk5v_lESv2hawnso/viewanalytics (survey of data providers) and https://docs.google.com/forms/d/18WbJFutUd7qiZoEzbOytFYXSfWFT61hVce0vjvIwIjk/viewanalytics (survey of users).

Supplementary data are available at Bioinformatics Advances online.

Contributions

Karen E. Ross (Conceptualization [lead], Investigation [equal], Writing – original draft [lead], Writing – review and editing [lead]), Frederic B. Bastian (Investigation [equal], Writing – review and editing [supporting]), Matt Buys (Investigation [equal], Writing – review and editing [supporting]), Charles E. Cook (Conceptualization [supporting], Investigation [equal], Writing – review and editing [supporting]), Peter D’Eustachio (Investigation [equal], Writing – review and editing [supporting]), Melissa Harrison (Investigation [equal], Writing – review and editing [supporting]), Henning Hermjakob (Conceptualization [supporting], Investigation [equal], Writing – review and editing [supporting]), Donghui Li (Investigation [equal], Writing – review and editing [supporting]), Phillip Lord (Investigation [equal], Writing – review and editing [supporting]), Darren A. Natale (Investigation [equal], Writing – review and editing [supporting]), Bjoern Peters (Investigation [equal], Writing – review and editing [supporting]), Paul W. Sternberg (Investigation [equal], Writing – review and editing [supporting]), Andrew I. Su (Conceptualization [supporting], Investigation [equal], Writing – review and editing [supporting]), Matthew Thakur (Investigation [equal], Writing – review and editing [supporting]), Paul D. Thomas (Investigation [equal], Writing – review and editing [supporting]), and Alex Bateman (Conceptualization [lead], Investigation [equal], Project administration [lead], Writing – review and editing [lead]).

Conflict of Interest

A.B. is Editor-in-Chief of Bioinformatics Advances, but was not involved in the editorial process of this manuscript.

Funding

This work has been supported by the National Institutes of Health [U24HG007822; U24HG007822-09S1].

Files

vbae057.pdf
Files (992.5 kB)
md5:574afda99e8fd8c97c7201297b8fec09 (16.2 kB)
md5:dde4c275bb5d020f1b33453984f12d15 (976.4 kB)

Additional details

Created: May 9, 2024
Modified: May 9, 2024