Barcode identification for single cell genomics

Tambe, Akshay and Pachter, Lior (2019) Barcode identification for single cell genomics. BMC Bioinformatics, 20. Art. No. 32. ISSN 1471-2105. PMCID PMC6337828. doi:10.1186/s12859-019-2612-0.

PDF - Published Version
Creative Commons Attribution.

PDF (Figs S1-S6) - Supplemental Material
Creative Commons Attribution.

PDF - Submitted Version
Creative Commons Attribution Non-commercial.


Background: Single-cell sequencing experiments use short DNA barcode ‘tags’ to identify reads that originate from the same cell. To recover single-cell information from such experiments, reads must be grouped by their barcode tag, a crucial processing step that precedes other computations. However, this step can be difficult because barcodes are subject to high rates of mismatch and deletion errors. Results: Here we present an approach to identify and error-correct barcodes by traversing the de Bruijn graph of circularized barcode k-mers. Our approach is based on the observation that circularizing a barcode sequence can yield error-free k-mers even when k is large relative to the length of the barcode sequence, a regime that is typical of single-cell barcoding applications. This allows reads to be assigned to consensus fingerprints constructed from k-mers. Conclusion: We show that for single-cell RNA-Seq, circularization improves the recovery of accurate single-cell transcriptome estimates, especially when there is a high number of errors per read. This approach is robust to the type of error (mismatch, insertion, deletion), as well as to the relative abundances of the cells. Sircel, a software package that implements this approach, is described and publicly available.
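The circularization observation in the abstract can be illustrated with a small sketch. This is not the Sircel implementation, and it leaves out the de Bruijn graph traversal entirely; the function names, the 12-base barcode, and the choice of k = 8 are all hypothetical, chosen only to show why wrapping a barcode around on itself can leave some k-mers untouched by an error when k is large relative to the barcode length.

```python
def linear_kmers(barcode: str, k: int) -> list[str]:
    """Return the k-mers of the barcode read as an ordinary linear string."""
    return [barcode[i:i + k] for i in range(len(barcode) - k + 1)]


def circular_kmers(barcode: str, k: int) -> list[str]:
    """Return the k-mers of the barcode treated as a circular string."""
    if not 0 < k <= len(barcode):
        raise ValueError("k must be between 1 and the barcode length")
    # Appending the first k-1 bases lets windows wrap past the end.
    extended = barcode + barcode[:k - 1]
    return [extended[i:i + k] for i in range(len(barcode))]


def error_free_kmers(true_barcode: str, observed: str, k: int, kmers_fn) -> int:
    """Count k-mers of the observed barcode that also occur in the true one."""
    truth = set(kmers_fn(true_barcode, k))
    return sum(kmer in truth for kmer in kmers_fn(observed, k))


# Hypothetical 12-base barcode and a read of it with one mismatch (C -> T).
true_barcode = "ACGTTGCAAGCT"
observed = "ACGTTGTAAGCT"

# Every linear 8-mer of a 12-base barcode overlaps the mismatched position.
print(error_free_kmers(true_barcode, observed, 8, linear_kmers))    # 0
# Four of the twelve circular 8-mers wrap around and avoid the mismatch.
print(error_free_kmers(true_barcode, observed, 8, circular_kmers))  # 4
```

In this toy case a mismatch near the middle of the barcode corrupts every linear 8-mer, while the circular view still yields four intact 8-mers that could contribute to a consensus fingerprint.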

Item Type:Article
ORCID:Pachter, Lior (0000-0002-9164-6231)
Additional Information:© 2019 The Author(s). This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated. Received: 23 May 2017; Accepted: 7 January 2019; Published: 17 January 2019. We thank Jase Gehring and Vasilis Ntranos for helpful comments and feedback during the development of the method. Funding: None. Availability of data and materials: The datasets analyzed here were obtained from previously published datasets, which are available at the NCBI Sequence Read Archive. SRA accession numbers used in this paper are SRR1873277 and SRR5250839. Authors’ contributions: AT and LP conceived of the project. AT wrote the software and analyzed data. AT and LP wrote the manuscript. All authors read and approved the final manuscript. Ethics approval: Not applicable. Consent for publication: Not applicable. The authors declare that they have no competing interests.
Subject Keywords:Single-cell; Barcodes; Barcode identification; de Bruijn graph; Circularization; K-mer counting
PubMed Central ID:PMC6337828
Record Number:CaltechAUTHORS:20181029-144423877
Persistent URL:
Official Citation:Tambe, Akshay, and Lior Pachter. “Barcode identification for single cell genomics.” BMC bioinformatics vol. 20,1 32. 17 Jan. 2019, doi:10.1186/s12859-019-2612-0
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:90476
Deposited By: Tony Diaz
Deposited On:30 Oct 2018 01:56
Last Modified:24 Feb 2022 17:38
