A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework

Bandrowski, A. E. and Cachat, J. and Li, Y. and Müller, H. M. and Sternberg, P. W. and Ciccarese, P. and Clark, T. and Marenco, L. and Wang, R. and Astakhov, V. and Grethe, J. S. and Martone, M. E. (2012) A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework. Database: The Journal of Biological Databases and Curation, 2012. Art. No. bas005. ISSN 1758-0463. PMCID PMC3308161.

The breadth of information resources available to researchers on the Internet continues to expand, particularly in light of recently implemented data-sharing policies required by funding agencies. However, the nature of dense, multifaceted neuroscience data and the design of contemporary search engine systems make efficient, reliable and relevant discovery of such information a significant challenge. This challenge is especially pertinent for online databases, whose dynamic content is ‘hidden’ from search engines. The Neuroscience Information Framework (NIF) was funded by the NIH Blueprint for Neuroscience Research to address the problem of finding and utilizing neuroscience-relevant resources such as software tools, data sets, experimental animals and antibodies across the Internet. From the outset, NIF sought to provide an accounting of available resources while developing technical solutions for finding, accessing and utilizing them. The curators, therefore, are tasked with identifying and registering resources, examining data, writing configuration files to index and display data, and keeping the contents current. In the initial phases of the project, all aspects of the registration and curation processes were manual. However, as the number of resources grew, manual curation became impractical. This report describes our experiences and successes with developing automated resource discovery and semi-automated type characterization with text-mining scripts that help the curation team discover, integrate and display new content. We also describe the DISCO framework, a suite of automated web services that significantly reduces the manual curation effort needed to periodically check for resource updates. Lastly, we discuss DOMEO, a semi-automated annotation tool that improves the discovery and curation of resources that are not necessarily website-based (e.g. reagents, software tools).
Although the ultimate goal of automation was to reduce the workload of the curators, it has resulted in valuable analytic by-products that address accessibility, use and citation of resources that can now be shared with resource owners and the larger scientific community.

Item Type: Article
ORCID: Sternberg, P. W. (0000-0002-7699-0173)
Additional Information: © 2012 The Author(s). Published by Oxford University Press. Submitted 19 October 2011; Revised 6 January 2012; Accepted 9 January 2012. We thank Mrs Andrea Stagg and many assistant curators for their hard work on the NIF Registry. This work was supported by and has been funded in whole or in part through the NIH Blueprint for Neuroscience Research with Federal funds from the National Institute on Drug Abuse, National Institutes of Health, Department of Health and Human Services [Contract Number HHSN271200577531C]. P.W.S. is an Investigator with the Howard Hughes Medical Institute. Conflict of interest: None declared.
Funding Agency | Grant Number
National Institute on Drug Abuse | UNSPECIFIED
Howard Hughes Medical Institute (HHMI) | UNSPECIFIED
PubMed Central ID: PMC3308161
Record Number: CaltechAUTHORS:20120706-134849278
Official Citation: A. E. Bandrowski, J. Cachat, Y. Li, H. M. Müller, P. W. Sternberg, P. Ciccarese, T. Clark, L. Marenco, R. Wang, V. Astakhov, J. S. Grethe, and M. E. Martone. A hybrid human and machine resource curation pipeline for the Neuroscience Information Framework. Database 2012: bas005; doi:10.1093/database/bas005; published online March 20, 2012.
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 32285
Deposited By: Ruth Sustaita
Deposited On: 23 Jul 2012 16:02
Last Modified: 03 Oct 2019 03:59
