
Term Matrix: A novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns

Wood, Valerie and Carbon, Seth and Harris, Midori A. and Lock, Antonia and Engel, Stacia R. and Hill, David P. and Van Auken, Kimberley and Attrill, Helen and Feuermann, Marc and Gaudet, Pascale and Lovering, Ruth C. and Poux, Sylvain and Rutherford, Kim M. and Mungall, Christopher J. (2020) Term Matrix: A novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns. Open Biology, 10 (9). Art. No. 200149. ISSN 2046-2441. PMCID PMC7536087. https://resolver.caltech.edu/CaltechAUTHORS:20200423-151030286

Files:
PDF - Published Version. Creative Commons Attribution. 936Kb
PDF - Submitted Version. Creative Commons Attribution. 1783Kb
MS Excel (Supplemental Table 1) - Supplemental Material. Creative Commons Attribution. 88Kb
MS Excel (Supplemental Table 2) - Supplemental Material. Creative Commons Attribution. 11Kb
MS Excel (Supplemental Table 3) - Supplemental Material. Creative Commons Attribution. 29Kb
MS Excel (Supplemental Table 4) - Supplemental Material. Creative Commons Attribution. 4Kb
MS Excel (Supplemental Table 5) - Supplemental Material. Creative Commons Attribution. 18Kb
MS Excel (Supplemental Table 6) - Supplemental Material. Creative Commons Attribution. 17Kb


Abstract

Biological processes are accomplished by the coordinated action of gene products. Gene products often participate in multiple processes, and can therefore be annotated to multiple Gene Ontology (GO) terms. Nevertheless, processes that are functionally, temporally, and/or spatially distant may have few gene products in common, and co-annotation to unrelated processes likely reflects errors in literature curation, ontology structure, or automated annotation pipelines. We have developed an annotation quality control workflow that uses rules based on mutually exclusive processes to detect annotation errors, based on and validated by case studies including the three we present here: fission yeast protein-coding gene annotations over time; annotations for cohesin complex subunits in human and model species; and annotations using a selected set of GO biological process terms in human and five model species. For each case study, we reviewed available GO annotations, identified pairs of biological processes which are unlikely to be correctly co-annotated to the same gene products (e.g., amino acid metabolism and cytokinesis), and traced erroneous annotations to their sources. To date we have generated 107 quality control rules, and corrected 289 manual annotations in eukaryotes and over 2.5 million automatically propagated annotations across all taxa.
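
The rule-based check described above can be illustrated with a minimal sketch. This is not the Term Matrix implementation: the function name, the gene identifiers and the data layout are hypothetical, and the sketch matches exact term labels only, whereas a real check would presumably also need to account for annotations to more specific descendant terms in the ontology. The one rule shown uses the example pair from the abstract.

```python
def find_rule_violations(annotations, exclusive_pairs):
    """Return (gene, term_a, term_b) for every gene annotated to both terms of a rule.

    annotations: dict mapping a gene identifier to the set of process terms
        it is annotated to (hypothetical layout, for illustration only).
    exclusive_pairs: list of (term_a, term_b) tuples, each naming two
        processes that are not expected to share gene products.
    """
    violations = []
    # Sort genes so the report order is deterministic.
    for gene, terms in sorted(annotations.items()):
        for term_a, term_b in exclusive_pairs:
            if term_a in terms and term_b in terms:
                violations.append((gene, term_a, term_b))
    return violations


# Hypothetical annotation data; gene names are placeholders, not real genes.
annotations = {
    "geneA": {"amino acid metabolic process", "cytokinesis"},
    "geneB": {"cytokinesis"},
    "geneC": {"amino acid metabolic process", "translation"},
}

# One rule, taken from the example pair mentioned in the abstract.
exclusive_pairs = [("amino acid metabolic process", "cytokinesis")]

violations = find_rule_violations(annotations, exclusive_pairs)
for gene, term_a, term_b in violations:
    print(f"{gene}: co-annotated to '{term_a}' and '{term_b}'")
```

A flagged gene is only a candidate for review: as the abstract notes, each suspect co-annotation still has to be traced to its source (literature curation, ontology structure, or an automated pipeline) before it is corrected.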


Item Type: Article
Related URLs:
URL | URL Type | Description
https://doi.org/10.1098/rsob.200149 | DOI | Article
https://doi.org/10.1101/2020.04.21.045195 | DOI | Discussion Paper
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7536087 | PubMed Central | Article
ORCID:
Author | ORCID
Wood, Valerie | 0000-0001-6330-7526
Carbon, Seth | 0000-0001-8244-1536
Harris, Midori A. | 0000-0003-4148-4606
Lock, Antonia | 0000-0003-1179-5999
Engel, Stacia R. | 0000-0001-5472-917X
Hill, David P. | 0000-0001-7476-6306
Van Auken, Kimberley | 0000-0002-1706-4196
Attrill, Helen | 0000-0003-3212-6364
Feuermann, Marc | 0000-0002-4187-2863
Gaudet, Pascale | 0000-0003-1813-6857
Lovering, Ruth C. | 0000-0002-9791-0064
Poux, Sylvain | 0000-0001-7299-6685
Rutherford, Kim M. | 0000-0001-6277-726X
Mungall, Christopher J. | 0000-0002-6601-2165
Additional Information: © 2020 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited. Manuscript received 22/06/2020; Manuscript accepted 06/08/2020; Published online 02/09/2020.

We thank Peter D’Eustachio for Reactome updates and the InterPro group for InterPro2GO mapping updates. We thank Nomi Harris for constructive comments on the manuscript. We also thank the many biocurators, editors and other members of the GO Consortium who have contributed to GO annotations and to the development of the Gene Ontology, and PomBase principal investigator Stephen G. Oliver for ongoing guidance and support of all PomBase activities.

Data accessibility: The GO ontology and annotation datasets are freely available from the Gene Ontology website (see the main downloads page [41]). All other data supporting this article have been uploaded as part of the electronic supplementary material.

Authors' contributions: V.W. conceived the project, generated annotation rules and wrote the initial draft; S.C. and C.J.M. developed Term Matrix; K.M.R. provided bioinformatic support for the fission yeast case study; V.W., A.L., S.R.E., D.P.H., K.V.A., H.A. and R.C.L. corrected annotation errors identified in the study; M.A.H. made extensive text revisions, and prepared the manuscript for submission; D.P.H., K.V.A. and P.G. corrected ontology errors; S.P. and M.F. provided SPKW mapping updates; M.F. and P.G. provided PAINT propagation updates. All authors contributed to the discussion of ideas and manuscript revisions, and read and approved the final manuscript. The authors declare no competing interests.

V.W., A.L., M.A.H. and K.M.R. are supported by the Wellcome Trust via the PomBase project (grant no. 104967/Z/14/Z). S.C., S.R.E., D.P.H., K.V.A., P.G. and C.J.M. are funded via the GO resource, which is supported by the National Human Genome Research Institute (NHGRI) (grant no. U41 HG002273). S.R.E. is also funded by the NHGRI via the Saccharomyces Genome Database (grant no. U41 HG001315) and the Alliance of Genome Resources (grant no. U24 HG010859). K.V.A. is also funded via WormBase, which is supported by the NHGRI (grant no. U24 HG002223), the UK Medical Research Council (grant no. MR/S000453/1) and the UK Biotechnology and Biological Sciences Research Council (grant no. BB/P024602/1). H.A. is funded by the UK Medical Research Council (grant no. MR/N030117/1). R.C.L. is supported by Alzheimer’s Research UK (grant no. ARUK-NAS2017A-1) and by the National Institute for Health Research UCL Hospitals Biomedical Research Centre. The GO Consortium, FlyBase (HA), Mouse Genome Informatics (DPH), the Saccharomyces Genome Database (SRE), and WormBase (KVA) are members of the Alliance of Genome Resources.
Funders:
Funding Agency | Grant Number
Wellcome Trust | 104967/Z/14/Z
NIH | U41 HG002273
NIH | U41 HG001315
NIH | U24 HG010859
NIH | U24 HG002223
Medical Research Council (UK) | MR/S000453/1
Biotechnology and Biological Sciences Research Council (BBSRC) | BB/P024602/1
Medical Research Council (UK) | MR/N030117/1
Alzheimer’s Research UK | ARUK-NAS2017A-1
National Institute for Health Research | UNSPECIFIED
Subject Keywords: gene ontology, quality control, annotation, biocuration
Issue or Number: 9
PubMed Central ID: PMC7536087
Record Number: CaltechAUTHORS:20200423-151030286
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20200423-151030286
Official Citation: Wood V et al. 2020 Term Matrix: a novel Gene Ontology annotation quality control system based on ontology term co-annotation patterns. Open Biol. 10: 200149. http://dx.doi.org/10.1098/rsob.200149
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 102759
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 23 Apr 2020 22:26
Last Modified: 12 Oct 2020 16:11
