Published February 2, 2018 | public
Journal Article

PEBank: A Comprehensive Database for Protein Engineering and Design


Recent advances in gene synthesis, microfluidics, deep sequencing, and microarray techniques have made it possible to construct and assay large libraries of variant protein sequences. This rapid generation of large sets of mutational data has significantly enhanced researchers' ability to study how proteins function and to engineer proteins with new and improved properties. Although many groups around the world are currently generating large amounts of protein engineering data, there is no standardized format for reporting this data and no simple mechanism for groups to share the data that they generate. We have developed PEBank (Protein Engineering data Bank), a comprehensive database for protein engineering data where users can store their data as well as query and analyze data submitted by themselves and others. PEBank stores the data in a relational database using a standardized schema that requires full protein sequence information and detailed assay descriptions. These features allow for accurate comparison of measurements made across different proteins and by different groups. PEBank is comprehensive in that it accepts data for several different protein properties, including those related to stability, folding, activity, and binding. PEBank thus provides a central repository for data that is often scattered across many different specialized databases. PEBank features a web interface and a REST API that streamline data deposition and allow for batch input and queries. A suite of analysis tools is provided to allow for discovery and analysis of relationships between mutated sequences. We demonstrate the importance of a standardized format for reporting protein engineering data that allows for accurate comparisons between different data sets and enables future data mining and machine learning approaches to be applied.
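As a rough illustration of what a standardized relational schema with mandatory full sequences and assay descriptions might look like, the following minimal sketch uses Python's built-in sqlite3 module. It is not PEBank's actual schema: the table names, column names, and helper functions (`build_demo_db`, `add_variant`, `compare`) are assumptions made for this example, and the sequences and values in the usage section are made-up placeholders.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative
# assumptions, not PEBank's actual schema.
SCHEMA = """
CREATE TABLE protein (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    -- the full amino acid sequence is mandatory for every entry
    sequence TEXT NOT NULL CHECK (length(sequence) > 0)
);
CREATE TABLE assay (
    id          INTEGER PRIMARY KEY,
    -- the four property classes named in the abstract
    property    TEXT NOT NULL
                CHECK (property IN ('stability', 'folding', 'activity', 'binding')),
    description TEXT NOT NULL,
    units       TEXT NOT NULL
);
CREATE TABLE measurement (
    id         INTEGER PRIMARY KEY,
    protein_id INTEGER NOT NULL REFERENCES protein(id),
    assay_id   INTEGER NOT NULL REFERENCES assay(id),
    value      REAL NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_variant(conn, name, sequence, assay_id, value):
    """Store one variant with its full sequence and one measurement."""
    cur = conn.execute(
        "INSERT INTO protein (name, sequence) VALUES (?, ?)", (name, sequence)
    )
    protein_id = cur.lastrowid
    conn.execute(
        "INSERT INTO measurement (protein_id, assay_id, value) VALUES (?, ?, ?)",
        (protein_id, assay_id, value),
    )
    return protein_id


def compare(conn, assay_id):
    """Return (name, sequence, value) rows for one assay, highest value first."""
    rows = conn.execute(
        "SELECT p.name, p.sequence, m.value "
        "FROM measurement m JOIN protein p ON p.id = m.protein_id "
        "WHERE m.assay_id = ? ORDER BY m.value DESC",
        (assay_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    assay_id = conn.execute(
        "INSERT INTO assay (property, description, units) VALUES (?, ?, ?)",
        ("stability", "Placeholder assay description", "degC"),
    ).lastrowid
    # Toy sequences and values, invented purely for illustration.
    add_variant(conn, "wild type", "MKTAYIAK", assay_id, 50.0)
    add_variant(conn, "K8R", "MKTAYIAR", assay_id, 55.5)
    for row in compare(conn, assay_id):
        print(row)
```

Because every measurement in this sketch links to a complete sequence and a described assay, two entries can be compared without guessing which parent sequence or experimental conditions they refer to, which is the kind of comparability the abstract attributes to a standardized format.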

Additional Information

© 2018 Biophysical Society. Available online 6 February 2018.

Additional details

August 19, 2023
October 18, 2023