FAIRsharing, a cohesive community approach to the growth in standards, repositories and policies

In this modern, data-driven age, governments, funders and publishers expect greater transparency and reuse of research data, as well as greater access to and preservation of the data that supports research findings. Community-developed standards, such as those for the identification1 and reporting2 of data, underpin reproducible and reusable research, aid scholarly publishing, and drive both the discovery and evolution of scientific practice. The number of these standardization efforts, driven by large organizations or at the grass root level, has been on the rise since the early 2000s. Thousands of community-developed standards are available (across all disciplines), many of which have been created and/or implemented by several thousand data repositories. Nevertheless, their uptake by the research community, however, has been slow and uneven. This is mainly because investigators lack incentives to follow and adopt standards. The situation is exacerbated if standards are not promptly implemented by databases, repositories and other research tools, or endorsed by infrastructures. Furthermore, the fragmentation of community efforts results in the development of arbitrarily different, incompatible standards. In turn, this leads to standards becoming rapidly obsolete in fast-evolving research areas. As with any other digital object, standards, databases and repositories are dynamic in nature, with a ‘life cycle’ that encompasses formulation, development and maintenance; their status in this cycle may vary depending on the level of activity of the developing group or community. There is an urgent need for a service that enhances the information available on the evolving constellation of heterogeneous standards, databases and repositories, guides users in the selection of these resources, and that works with developers and maintainers of these resources to foster collaboration and promote harmonization. Such an informative and educational service is vital to reduce the knowledge gap among those involved in producing, managing, serving, curating, preserving, publishing or regulating data. A diverse set of stakeholders-representing academia, industry, funding agencies, standards organizations, infrastructure providers and scholarly publishers— both national and domain-specific as well global and general organizations— have come together as a community, representing the core adopters, advisory board members, and/or key collaborators of the FAIRsharing resource. Here, we introduce its mission and community network. We present an evaluation of the standards landscape, focusing on those for reporting data and metadata - the most diverse and numerous of the standards - and their implementation by databases and repositories. We report on the ongoing challenge to recommend resources, and we discuss the importance of making standards invisible to the end users. We report on the ongoing challenge to recommend resources, and we discuss the importance of making standards invisible to the end users. We present guidelines that highlight the role each stakeholder group must play to maximize the visibility and adoption of standards, databases and repositories.


Abstract
In this modern, data-driven age, governments, funders and publishers expect greater transparency and reuse of research data, as well as greater access to and preservation of the data that supports research findings. Community-developed standards, such as those for the identification 1 and reporting 2 of data, underpin reproducible and reusable research, aid scholarly publishing, and drive both the discovery and evolution of scientific practice. The number of these standardization efforts, driven by large organizations or at the grass root level, has been on the rise since the early 2000s.
Thousands of community-developed standards are available (across all disciplines), many of which have been created and/or implemented by several thousand data repositories. Nevertheless, their uptake by the research community, however, has been slow and uneven. This is mainly because investigators lack incentives to follow and adopt standards. The situation is exacerbated if standards are not promptly implemented by databases, repositories and other research tools, or endorsed by infrastructures. Furthermore, the fragmentation of community efforts results in the development of arbitrarily different, incompatible standards. In turn, this leads to standards becoming rapidly obsolete in fast-evolving research areas.
As with any other digital object, standards, databases and repositories are dynamic in nature, with a 'life cycle' that encompasses formulation, development and maintenance; their status in this cycle may vary depending on the level of activity of the developing group or community.
There is an urgent need for a service that enhances the information available on the evolving constellation of heterogeneous standards, databases and repositories, guides users in the selection of these resources, and that works with developers and maintainers of these resources to foster collaboration and promote harmonization. Such an informative and educational service is vital to reduce the knowledge gap among those involved in producing, managing, serving, curating, preserving, publishing or regulating data. A diverse set of stakeholders-representing academia, industry, funding agencies, standards organizations, infrastructure providers and scholarly publishers-both national and domain-specific as well global and general organizations-have come together as a community, representing the core adopters, advisory board members, and/or key collaborators of the FAIRsharing resource. Here, we introduce its mission and community network.
We present an evaluation of the standards landscape, focusing on those for reporting data and metadata -the most diverse and numerous of the standards -and their implementation by databases and repositories. We report on the ongoing challenge to recommend resources, and we discuss the importance of making standards invisible to the end users. We report on the ongoing challenge to recommend resources, and we discuss the importance of making standards invisible to the end users. We present guidelines that highlight the role each stakeholder group must play to maximize the visibility and adoption of standards, databases and repositories.

Mapping the landscape and tracking evolution
Working with and for data producers and consumers, and leveraging on our large network of international collaborators we have iteratively 2,3,4 developed FAIRsharing (https://fairsharing.org), an informative and educational resource that describes and interlinks community-driven standards, databases, repositories, and data policies. As of 10th of July 2018, FAIRsharing has over 2352 records: 1173 standards, 1071 data repositories, 112 data policies (of which 80 are from journals and publishers and 22 from funders). These records focus mainly on the life, agricultural, environmental, biomedical and health sciences, but FAIRsharing is progressively expanding to cover other disciplines. However, quantity is not the end goal. The richness and accuracy of each record are our priorities.
Using community participation, the FAIRsharing team precisely curates information on standards employed for the identification, citation and reporting of data and metadata, via four standards subtypes. Minimum reporting guidelines, also known as guiding principles or checklists, outline the necessary and sufficient information vital for contextualizing and understanding a digital object. Terminology artifacts or semantics, ranging from dictionaries to ontologies, provide definitions and unambiguous identification for concepts and objects. Models and formats define the structure and relationship of information for a conceptual model or schema, and include transmission formats to facilitate the exchange of data between different systems. Identifier schema are formal systems for resources and other digital objects that allow their unique and unambiguous identification. These standards range from generic and multi-disciplinary, to standards that are tailored for specific disciplines. FAIRsharing monitors their evolution, implementation in databases and repositories, and recommendation by journal and funder data policies.
Producers of standards, databases and repositories are able to claim the record(s) for the resource(s) they maintain or have developed; this functionality allows them to gain personal recognition and ensures that the description is accurate and up-to-date. All records and related updates by the maintainers are checked by a FAIRsharing curator. Conversely if a record is updated by a FAIRsharing curator, an email notification is sent to the record claimant which minimizes the introduction of inaccuracies. In communication with the community behind each resource, FAIRsharing assigns indicators to show the status in the resource's life cycle: 'Ready' for use, 'In Development', 'Uncertain' (when any attempt to reach out to the developing community has failed), and 'Deprecated' (when the community no longer mandates its use, together with an explanation where available).
To make standards, databases, repositories and data policies more discoverable and citable, we mint digital object identifiers (DOIs) for each record, which provides a persistent and unique identifier to enable referencing of these resources. In addition, the maintainers of each record can be linked with their Open Research and Contributor IDentifier (ORCID) profile (https://orcid.org).
Citing a FAIRsharing record for a standard, database and repository offers a unique, at-a-glance view of all descriptors and indicators pertaining to a resource, as well as any evidence of adoption or endorsement by a data policy or organisation. Referencing the record together with the resource's main paper (which provides a snapshot of its status at a given time) provides a complete reference for a resource. FAIRsharing has its own record to serve this very purpose: https://doi.org/10.25504/FAIRsharing.2abjs5.
Working with and for the community, FAIRsharing collects the necessary information to ensure that standards, databases, repositories and data policies are aligned with the FAIR data principles 5 . It ensures these resources are Findable (e.g., by providing persistent and unique identifiers, functionalities to register, claim, maintain, inter-link, search and discover them), Accessible (e.g., identifying their level of openness and/or licence type), encouraged to be Interoperable (e.g., highlighting which repositories implement the same standards to structure and exchange data), and Reusable (e.g., knowing the coverage of a standard and its level of endorsement by a number of repositories should encourage its use or extension in neighbouring domains, rather than reinvention). With the goal of being an interoperable component in the ecosystem of other services, FAIRsharing collaborates with many other infrastructure resources to cross-link each record to other registries, as well as within major FAIR-driven global initiatives, research and infrastructure programmes, many of which are generic and cross-disciplinary. Exemplars are listed in Box1, along with the roles that FAIRsharing plays. A 'live', updated list is maintained at https://fairsharing.org/communities/activities. An example is the FAIR Metrics working group (http://fairmetrics.org) 6 , where we work to guide producers of standards, databases and repositories to assess the level of FAIRness of their resource. We will develop measurable indicators, which will be progressively implemented in the FAIRsharing registry. The content within FAIRsharing is • ELIXIR (https://www.elixir-europe.org): an intergovernmental organisation that brings together life science resources from across Europe and beyond to build a sustainable infrastructure to support life science research and its translation to medicine and the environment, the bio-industries and society. FAIRsharing is a service of its Interoperability Platform.
• EOSC Pilot (https://eoscpilot.eu): the European Open Science Cloud (EOSC) Pilot that supports the exploration of a virtual environment with open and seamless services for storage, management, analysis and re-use of research data, across borders and scientific disciplines by federating existing scientific data infrastructures. FAIRsharing is a core element of its proposed metadata catalogues strategy.
• FORCE11 (https://www.force11.org): a community of scholars, librarians, archivists, publishers and research funders that aims to bring about a change in modern scholarly communications through the effective use of information technology. FAIRsharing has a working group in FORCE11 and works with its Data Citation Implementation Pilot, including journals and publishers, to identify criteria and develop tools for the selection of databases and repositories. FAIRsharing is functional element of a distributed FAIRness assessment tool kit, including the FAIRshake system to evaluate digital objects.
• RDA (https://www.rd-alliance.org): an international organization that focuses on the development of infrastructure and community activities to reduce barriers to data sharing and promote the acceleration of data driven innovation worldwide. FAIRsharing has a working group in RDA and works with other RDA activities, e.g., to define and implement a common framework for journal and publisher research data journal policy; and to connect FAIRsharing to data management plans tools.
We say we need standards, but do we use them?
The scientific community, funders and publishers all endorse the concept that common data and metadata standards underpin data reproducibility, ensuring that the relevant elements of a dataset are reported and shared consistently and meaningfully. However, navigating through the many standards available can be discouraging and often unappealing for prospective users. Bound within a particular discipline or domain, reporting standards are fragmented, with gaps and duplications, thereby limiting their combined used. Although standards should stand alone, they should also function well together, especially to better support multi-dimensional data but also the aggregation of pre-existing datasets from one or more disciplines or domains. Although standards should stand alone, they should also function well together, especially to better support multi-dimensional data but also the aggregation of pre-existing datasets from one or more disciplines or domains.
Understanding how they work or how to comply with them takes time and effort. Measuring the uptake of standards, however, is not trivial, and achieving a full picture is practically impossible.
FAIRsharing provides a snapshot of the standards landscape, which is dynamic and will continue to evolve as we engage with more communities and verify the information we house, add new resources, track their life-cycle status and usage in databases and repositories, and link out to examples of training material. FAIRsharing also plays a fundamental role in the activation of the decision-making chain, which is is an essential step towards fostering the wider adoption of standards. When a standard is mature and appropriate standard-compliant systems become available, such as databases and repositories, these must then be channelled to the relevant stakeholder community, who in turn must recommend them (e.g., in data policies) or use them (e.g. to define a data management plan) to facilitate a high-quality research cycle.
As of 10th of July 2018, there are a total of 1173 community standards, 793 of which are specific to the life, agricultural, environmental, biomedical and health sciences, and 45 are generic and multi-disciplinary. 124 reporting guidelines (out of 144), 653 terminology artifacts (out of 737), and 6 identifier schemas (out of 7) are mature and tagged as 'Ready' for use. Table 1 displays the 12 data and metadata standard records, corresponding to the top ten positions, that were most accessed on FAIRsharing during 2017. This ranking, however, shows no direct correlation with the level of standard adoption (by journal and funder data policies, or by databases and repositories) and is perhaps due to their popularity within their direct domain. The ranking is also very variable and can change substantially from year to year, most likely reflecting the activity of their respective research communities and when a standard is actively in development.  'invisible' to users, including publishers/journals. recommended standards, only one (the NCBI Taxonomy) is in the top ten standards most implemented by databases and repositories (as shown in Table 1), whilst one third (ten reporting guidelines and one terminology artifact) are not implemented. are not explicitly mentioned in the journals/publishers data policies. focussing on specific disciplines and areas of study, or the engagement with learned societies and researchers actively supporting and using certain standards. Generally, the community journals/publishers serve is often not familiar with standards, with many standards perceived as a hindrance to data reporting rather than help. Therefore, the current trend is for journals/publishers to recommend generalists and a core set of discipline-specific repositories -although a bigger number of (public and global, project-driven, and institution-based) databases and repositories exist, and the list of those recommended by the various organizations are very different 7 -while very few standards, such as those for data citation standards and the minimum reporting guidelines (the metadata standards more relevant to publication), are recommended. The general opinion is that terminology artifacts and models/formats instead should emerge from a close collaboration between their developing community and the implementing repositories, and remain implicitly suggested.
FAIRsharing, therefore, plays the unique role of highlighting to journals/publishers, as well as researchers and other stakeholders, which terminology artifacts, models/formats, along other standards, each database and repository implements. This, along with community indicators of use and maturity, and emerging global certifications, is essential to inform the selection or recommendation of relevant databases and repositories. FAIRsharing aims to increase the visibility, citation and credit of these community-driven standards, databases and repositories efforts.

The best standards are invisible and transparent
Standards for reporting of data and metadata are essential for data reuse, which drives sciences and discoveries. Minimal reporting guidelines are intended for human consumption and are usually narrative in form and therefore prone to ambiguities, making compliance and validation difficult and approximate. Many of these guidelines, however, already come with (or lead to the development of) associated models/formats and terminology artifacts, which are created to be machine readable (rather than for human consumption). These two types of standards ensure the datasets are harmonized in regard to structure, formatting and annotation, setting the foundation for the development of tools and repositories that enable transparent interpretation, verification, exchange, integrative analysis and comparison of (heterogeneous) data. The goal is to ensure the implementation of these standards in data annotation tools and data repositories, making these standards invisible to the end users.
Models/formats and terminology artifacts are essential to the implementation of the FAIR principles that put a specific emphasis on enhancing the ability of machines to automatically discover and use data. In particular, the 'computability' of standards is core to the development of FAIR metrics to measure the level of compliance of a given dataset against the relevant metadata descriptors. These machine-readable standards provide the necessary quantitative and verifiable measures of the degree by which data meets these reporting guidelines. The latter, on their own, would just be statements of unverifiable good intentions of compliance to given standards.
Delivering tools and practices to create standards-based templates for describing datasets smarter and faster is essential, if we are to use these standards in the authoring of metadata for the variety of data types in the life sciences and other disciplines. Community discussions are ongoing around the need for common frameworks for disciplinary research data management protocols 8 .
Furthermore, research activities to deliver machine-readable standards are already being undertaken by the FAIRsharing team and collaborators 9 ; all outputs will be freely shared for others to develop tools that would make it easy to check the compliance of data to standards.

Committed to community service
The FAIRsharing mission is to increase guidance to consumers of standards, databases, repositories, and data policies, to accelerate the discovery, selection and use of these resources; and producer satisfaction in terms of resource visibility, reuse, adoption and citation. Box 2 illustrates community-provided exemplar use cases that drive our work. This is a major undertaking, but it is a journey we are not doing alone.

Box 2. Exemplar on how FAIRsharing can help different stakeholders.
• Carla (researcher) searches FAIRsharing to identify an established repository, recognized by the journal she plans to submit to, with restricted data access to deposit her sensitive datasets, as recommended by her funder's data policy.
• Andrea (biocurator) searches FAIRsharing for suitable standards to describe a set of experiments; he filters the results by disciplines, focussing on standards implemented by one or more data repositories, with available annotations tools; he also looks for examples of the most up-to-date version of the standards, and the details of a person or support group to contact.
• Alex (standards developer) creates and maintains a personalized collection page on FAIRsharing to list and showcase the set of standards developed by the grass-roots standard organization she is the representative of; she registers the standards and/or claims existing records added by the FAIRsharing team, vetting the descriptions and/or enhancing them by adding indicators of maturity for the standards, and indicating the repositories and tools implementing them; her grass-roots organization uses the collection to maximise the visibility of their standards, promoting adoption outside their immediate community, also favouring reuse in and extensions to other areas.
• Sam (repository manager) registers his data resource at FAIRsharing manually or programmatically, describing terms of deposition and access, adding information on the resource's relationship to other repositories and use of standards, and assessing the level FAIRness of his data repository; he links the record to funding source(s) supporting the resources and the institute(s) hosting it, as well as his ORCID profile to get credit for his role as maintainer of a resource; he receives alerts if a publisher recommends his repository in a data policy; and he uses the DOI assigned to his repository record to cite the evidence of these adoption.
• Andy (policy maker) registers her journal's data policy in FAIRsharing, creating and maintaining an interrelated list of the repositories and standards she recommends to the authors, to deposit and annotated data and other digital assets; she keeps her data policy up-to-date using visualization and comparison functionalities, and consulting the knowledge graph that offers a interactive view of the repositories, tools and standards, as well as receiving customized alerts, e.g. when a repository has changed its data access terms, or when a standards has been superseded by another.
• Bob (data manager) consults FAIRsharing when creating a data management plan to identify the most appropriate reporting guidelines, formats and terminologies for his data types, and formally cites these community standards using their DOIs and/or the 'how to cite this record' statements provided for each resource.
• Kyle (librarian) and her colleagues involved in supporting research data use FAIRsharing to: enrich educational and training material to support scholars to utilize data standards, and conform to journal and funder policies, and to develop guidance that increases capability and skills, and empowers researchers to organize and make their data FAIR.
Collaborative work is happening on many fronts. We are categorizing the records according to discipline and domain via two open application ontologies. This should facilitate more accurate browsing, discovery and selection. To improve our policy registry we are disambiguating between individual journal policies and those by publishers that encompass more journals. This will increase the number of journals covered and more accurately represent the different data policy models currently being pursued by publishers. Selection and decision making is being improved by the enrichment of indicators based on community-endorsed and discipline-specific criteria, such as FAIR metrics and FAIRness level. To maximize the 'look-up service' functionality, and to connect the content to other registries and tools, customizable interfaces for human as well as programmatic access to the data are being created. We are also expanding the existing network graph and creating new visually-accessible statistics (https://fairsharing.org/summary-statistics). Finally, on a 22 monthly-basis we are highlighting featured exemplar resources, as well as adding to the informative and educational material available on FAIRsharing.

Guidance to stakeholders
To foster a culture change within the research community into one where the use of standards, databases and repositories for FAIRer data is pervasive, we need to better promote the existence and value of these resources. First and foremost, we need to paint an accurate picture of the status quo. Several stakeholders can play catalytic roles (Figure 1); specifically, developers/curators of these resources; journal editors and publishers; research data management support staff, trainers and educators; societies, unions and communities alliances; funders; and finally researchers.  https://fairsharing.org/collection/ComputationalModelingCOMBINE). The more adopted a resource is, the greater its visibility. For example, if your standard is implemented by a repository, these two records will be interlinked; thus, if someone is interested in that repository they will see that your standard is used by that resource. If your resource is recommended in a data policy from a journal, funder or other organization, it will be given a 'recommended' ribbon, which is present on the record itself and clearly visible when the resource appears in search results.
For journal publishers or organizations with a data policy, FAIRsharing enables the maintenance of an interrelated list of citable standards and databases, grouping those that the policy recommends to users or their community (e.g., see examples of recommendations created by eight main publishers and journals: https://fairsharing.org/recommendations, including some generalist and many domain-specific databases and repositories). As we continue to map the landscape, journals/publishers can also revise their selections over time, enabling the recommendation of additional resources with more confidence. All journals that do not have such data statements should develop them to ensure all data relating to an article or project are as FAIR as possible.
Finally, journals should also encourage authors to cite the standards, database and repositories they use or develop via the 'how to cite this record' statement, found on each FAIRsharing record, which includes a DOI.
Trainers, educators as well as librarians and those organization and services involved in supporting research data can use FAIRsharing to provide a foundation on which to create or enrich educational lectures, training and teaching material, and to plug it into data management planning tools. These stakeholder communities play a pivotal role to prepare the new generation of scientists and deliver courses and tools that address the need to guide or empower researchers to organize data and to make it FAIR.
Learned societies, international scientific unions and associations, and alliances of these organisations should raise awareness around standards, databases, repositories and data policies, in particular on their availability, scope and value for FAIR and reproducible research; as well as mobilize their community members to take action e.g. 10,11,12 , to promote the use and adoption of key resources, initiate new or participate in existing initiatives to define and implement policies and projects.
Funders can use FAIRsharing to help select the appropriate resources to recommend in their data policy and highlight those resources that awardees should consider when writing their data management plan e.g. 13 . If we are to make FAIR data a reality, funders should recognize standards, as well as databases and repositories, as digital objects in their own right, which have and must have their own associated research, development and educational activities 14 . New funding frameworks need to be created to provide catalytic support for the technical and social activities around standards, in specific domains, within and across disciplines to enhance their implementation in databases and repositories, and the interoperability and reusability of data.
Last but not least, researchers can use FAIRsharing as a lookup resource to identify and cite the standards, databases or repositories that exist for their data and discipline, for example, when creating a data management plan for a grant proposal or funded project; or when submitting a manuscript to a journal, to identify the recommended databases and repositories, as well as the standards they implement to ensure all relevant information about the data is collected at the source.
Today's data-driven science, as well as the growing demand from governments, funders and publishers for FAIRer data, requires greater researcher responsibility. Acknowledging that the ecosystem of guidance and tools is still work in progress, it is essential that researchers develop or enhance their research data management skills, or seek the support of professionals in this area.
Anyone can be a user of FAIRsharing. FAIRsharing brings the producers and consumers of standards, databases, repositories and data policies closer together, with a growing list of adopters we are open to participate in joint projects to develop services for specific stakeholders and communities. Join us or reach out to us, and let's pave the way for FAIRer data together.