Perspectives on tracking data reuse across biodata resources
- Creators
- Ross, Karen E.
- Bastian, Frederic B.
- Buys, Matt
- Cook, Charles E.
- D'Eustachio, Peter
- Harrison, Melissa
- Hermjakob, Henning
- Li, Donghui
- Lord, Phillip
- Natale, Darren A.
- Peters, Bjoern
- Sternberg, Paul W.
- Su, Andrew I.
- Thakur, Matthew
- Thomas, Paul D.
- Bateman, Alex
- Bateman, Alex
- Martin, Maria-Jesus
- Orchard, Sandra
- Magrane, Michele
- Ahmad, Shadab
- Bowler-Barnett, Emily H.
- Bye-A-Jee, Hema
- Denny, Paul
- Dogan, Tunca
- Ebenezer, ThankGod
- Fan, Jun
- da Costa Gonzales, Leonardo Jose
- Hussein, Abdulrahman
- Ignatchenko, Alexandr
- Insana, Giuseppe
- Ishtiaq, Rizwan
- Joshi, Vishal
- Jyothi, Dushyanth
- Kandasaamy, Swaathi
- Lock, Antonia
- Luciani, Aurelien
- Luo, Jie
- Lussi, Yvonne
- Raposo, Pedro
- Rice, Daniel L.
- Saidi, Rabie
- Santos, Rafael
- Speretta, Elena
- Stephenson, James
- Totoo, Prabhat
- Tyagi, Nidhi
- Vasudev, Preethi
- Warner, Kate
- Zaru, Rossana
- Wijerathne, Supun
- Ibrahim, Khawaja Talal
- Kim, Minjoon
- Marin, Juan
- Bridge, Alan J.
- Aimo, Lucila
- Argoud-Puy, Ghislaine
- Auchincloss, Andrea H.
- Axelsen, Kristian B.
- Bansal, Parit
- Baratin, Delphine
- Batista Neto, Teresa M.
- Bolleman, Jerven T.
- Boutet, Emmanuel
- Breuza, Lionel
- Cabrera Gil, Blanca
- Casals-Casas, Cristina
- Coudert, Elisabeth
- Cuche, Beatrice
- de Castro, Edouard
- Estreicher, Anne
- Famiglietti, Maria L.
- Feuermann, Marc
- Gasteiger, Elisabeth
- Gehant, Sebastien
- Gos, Arnaud
- Gruaz, Nadine
- Hulo, Chantal
- Hyka-Nouspikel, Nevila
- Jungo, Florence
- Kerhornou, Arnaud
- Le Mercier, Philippe
- Lieberherr, Damien
- Masson, Patrick
- Morgat, Anne
- Pedruzzi, Ivo
- Pilbout, Sandrine
- Pourcel, Lucille
- Poux, Sylvain
- Pozzato, Monica
- Pruess, Manuela
- Redaschi, Nicole
- Rivoire, Catherine
- Sigrist, Christian J. A.
- Sundaram, Shyamala
- Sveshnikova, Anastasia
- Wu, Cathy H.
- Arighi, Cecilia N.
- Chen, Chuming
- Chen, Yongxing
- Huang, Hongzhan
- Laiho, Kati
- Lehvaslaiho, Minna
- McGarvey, Peter
- Natale, Darren A.
- Ross, Karen
- Vinayaka, C. R.
- Wang, Yuqi
- Zhang, Jian
- UniProt Consortium
Abstract
Data reuse is a common and vital practice in molecular biology and enables the knowledge gathered over recent decades to drive discovery and innovation in the life sciences. Much of this knowledge has been collated into molecular biology databases, such as UniProtKB, and these resources derive enormous value from sharing data among themselves. However, quantifying and documenting this kind of data reuse remains a challenge.
The article reports on a one-day virtual workshop hosted by the UniProt Consortium in March 2023, attended by representatives from biodata resources, experts in data management, and NIH program managers. Workshop discussions focused on strategies for tracking data reuse, best practices for reusing data, and the challenges associated with data reuse and tracking. Surveys and discussions showed that data reuse is widespread, but critical information for reproducibility is sometimes lacking. Challenges include costs of tracking data reuse, tensions between tracking data and open sharing, restrictive licenses, and difficulties in tracking commercial data use. Recommendations that emerged from the discussion include: development of standardized formats for documenting data reuse, education about the obstacles posed by restrictive licenses, and continued recognition by funding agencies that data management is a critical activity that requires dedicated resources.
Summaries of survey results are available at: https://docs.google.com/forms/d/1j-VU2ifEKb9C-sW6l3ATB79dgHdRk5v_lESv2hawnso/viewanalytics (survey of data providers) and https://docs.google.com/forms/d/18WbJFutUd7qiZoEzbOytFYXSfWFT61hVce0vjvIwIjk/viewanalytics (survey of users).
Copyright and License
Acknowledgement
The UniProt Consortium: Alex Bateman, Maria-Jesus Martin, Sandra Orchard, Michele Magrane, Shadab Ahmad, Emily H. Bowler-Barnett, Hema Bye-A-Jee, Paul Denny, Tunca Dogan, ThankGod Ebenezer, Jun Fan, Leonardo Jose da Costa Gonzales, Abdulrahman Hussein, Alexandr Ignatchenko, Giuseppe Insana, Rizwan Ishtiaq, Vishal Joshi, Dushyanth Jyothi, Swaathi Kandasaamy, Antonia Lock, Aurelien Luciani, Jie Luo, Yvonne Lussi, Pedro Raposo, Daniel L. Rice, Rabie Saidi, Rafael Santos, Elena Speretta, James Stephenson, Prabhat Totoo, Nidhi Tyagi, Preethi Vasudev, Kate Warner, Rossana Zaru, Supun Wijerathne, Khawaja Talal Ibrahim, Minjoon Kim and Juan Marin at the EMBL-European Bioinformatics Institute; Alan J. Bridge, Lucila Aimo, Ghislaine Argoud-Puy, Andrea H. Auchincloss, Kristian B. Axelsen, Parit Bansal, Delphine Baratin, Teresa M. Batista Neto, Jerven T. Bolleman, Emmanuel Boutet, Lionel Breuza, Blanca Cabrera Gil, Cristina Casals-Casas, Elisabeth Coudert, Beatrice Cuche, Edouard de Castro, Anne Estreicher, Maria L. Famiglietti, Marc Feuermann, Elisabeth Gasteiger, Sebastien Gehant, Arnaud Gos, Nadine Gruaz, Chantal Hulo, Nevila Hyka-Nouspikel, Florence Jungo, Arnaud Kerhornou, Philippe Le Mercier, Damien Lieberherr, Patrick Masson, Anne Morgat, Ivo Pedruzzi, Sandrine Pilbout, Lucille Pourcel, Sylvain Poux, Monica Pozzato, Manuela Pruess, Nicole Redaschi, Catherine Rivoire, Christian J. A. Sigrist, Shyamala Sundaram and Anastasia Sveshnikova at the SIB Swiss Institute of Bioinformatics; Cathy H. Wu, Cecilia N. Arighi, Chuming Chen, Yongxing Chen, Hongzhan Huang, Kati Laiho, Minna Lehvaslaiho, Peter McGarvey, Darren A. Natale, Karen Ross, C. R. Vinayaka, Yuqi Wang and Jian Zhang at the Protein Information Resource. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Data Availability
The data underlying this article are available in the article and in its online supplementary material. Summaries of the survey results are available at: https://docs.google.com/forms/d/1j-VU2ifEKb9C-sW6l3ATB79dgHdRk5v_lESv2hawnso/viewanalytics (survey of data providers) and https://docs.google.com/forms/d/18WbJFutUd7qiZoEz-bOytFYXSfWFT61hVce0vjvIwIjk/viewanalytics (survey of users).
Supplementary data are available at Bioinformatics Advances online.
Contributions
Karen E. Ross (Conceptualization [lead], Investigation [equal], Writing – original draft [lead], Writing – review and editing [lead]), Frederic B. Bastian (Investigation [equal], Writing – review and editing [supporting]), Matt Buys (Investigation [equal], Writing – review and editing [supporting]), Charles E. Cook (Conceptualization [supporting], Investigation [equal], Writing – review and editing [supporting]), Peter D’Eustachio (Investigation [equal], Writing – review and editing [supporting]), Melissa Harrison (Investigation [equal], Writing – review and editing [supporting]), Henning Hermjakob (Conceptualization [supporting], Investigation [equal], Writing – review and editing [supporting]), Donghui Li (Investigation [equal], Writing – review and editing [supporting]), Phillip Lord (Investigation [equal], Writing – review and editing [supporting]), Darren A. Natale (Investigation [equal], Writing – review and editing [supporting]), Bjoern Peters (Investigation [equal], Writing – review and editing [supporting]), Paul W. Sternberg (Investigation [equal], Writing – review and editing [supporting]), Andrew I. Su (Conceptualization [supporting], Investigation [equal], Writing – review and editing [supporting]), Matthew Thakur (Investigation [equal], Writing – review and editing [supporting]), Paul D. Thomas (Investigation [equal], Writing – review and editing [supporting]), and Alex Bateman (Conceptualization [lead], Investigation [equal], Project administration [lead], Writing – review and editing [lead]).
Conflict of Interest
A.B. is Editor-in-Chief of Bioinformatics Advances, but was not involved in the editorial process of this manuscript.
Funding
This work has been supported by the National Institutes of Health [U24HG007822; U24HG007822-09S1].
Files
File (MD5 checksum) | Size |
---|---|
md5:574afda99e8fd8c97c7201297b8fec09 | 16.2 kB |
md5:dde4c275bb5d020f1b33453984f12d15 | 976.4 kB |
Additional details
- PMCID: PMC11076920
- National Institutes of Health: U24HG007822
- National Institutes of Health: U24HG007822-09S1
- Caltech groups: Division of Biology and Biological Engineering