CaltechAUTHORS
  A Caltech Library Service

Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology

Mortensen, Jonathan M. and Telis, Natalie and Hughey, Jake J. and Fan-Minogue, Hua and Van Auken, Kimberly and Dumontier, Michel and Musen, Mark A. (2016) Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology. Journal of Biomedical Informatics, 60. pp. 199-209. ISSN 1532-0464. PMCID PMC4836980. http://resolver.caltech.edu/CaltechAUTHORS:20160222-091718577

PDF - Accepted Version (842 kB)
See Usage Policy.

Use this Persistent URL to link to this item: http://resolver.caltech.edu/CaltechAUTHORS:20160222-091718577

Abstract

Biomedical ontologies contain errors. Crowdsourcing, defined as taking a job traditionally performed by a designated agent and outsourcing it to an undefined, large group of people, provides scalable access to humans. Therefore, the crowd has the potential to overcome the limited accuracy and scalability of current ontology quality-assurance approaches. Crowd-based methods have identified errors in SNOMED CT, a large clinical ontology, with an accuracy similar to that of experts, suggesting that crowdsourcing is indeed a feasible approach for identifying ontology errors. This work uses that same crowd-based methodology, as well as a panel of experts, to verify a subset of the Gene Ontology (200 relationships). Experts identified 16 errors, generally in relationships referencing acids and metals. The crowd performed poorly in identifying those errors, with an area under the receiver operating characteristic curve ranging from 0.44 to 0.73, depending on the method's configuration. However, when the crowd verified what experts considered to be easy relationships with useful definitions, they performed reasonably well. Notably, there are significantly fewer Google search results for Gene Ontology concepts than for SNOMED CT concepts. This disparity may account for the difference in performance: fewer search results indicate a more difficult task for the worker. The number of Internet search results could therefore serve as a way to assess which tasks are appropriate for the crowd. These results suggest that the crowd fits better as an expert assistant, completing the easy verification tasks and allowing experts to focus on the difficult ones, rather than as an expert replacement.


Item Type: Article
Related URLs:
    DOI: http://dx.doi.org/10.1016/j.jbi.2016.02.005 (Article)
    Publisher: http://www.sciencedirect.com/science/article/pii/S1532046416000277 (Article)
    PubMed Central: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4836980/ (Article)
ORCID:
    Van Auken, Kimberly: 0000-0002-1706-4196
Additional Information: © 2016 Elsevier B.V. Available online 10 February 2016.
Subject Keywords: Crowdsourcing; Ontology engineering; Gene Ontology
PubMed Central ID: PMC4836980
Record Number: CaltechAUTHORS:20160222-091718577
Persistent URL: http://resolver.caltech.edu/CaltechAUTHORS:20160222-091718577
Official Citation: Jonathan M. Mortensen, Natalie Telis, Jacob J. Hughey, Hua Fan-Minogue, Kimberly Van Auken, Michel Dumontier, Mark A. Musen, Is the crowd better as an assistant or a replacement in ontology engineering? An exploration through the lens of the Gene Ontology, Journal of Biomedical Informatics, Volume 60, April 2016, Pages 199-209, ISSN 1532-0464, http://dx.doi.org/10.1016/j.jbi.2016.02.005. (http://www.sciencedirect.com/science/article/pii/S1532046416000277)
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 64624
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 22 Feb 2016 17:46
Last Modified: 18 Jul 2017 19:55
