A Caltech Library Service

Addressing the pooled amplification paradox with unique molecular identifiers in single-cell RNA-seq

Gustafsson, Johan and Robinson, Jonathan and Nielsen, Jens and Pachter, Lior (2020) Addressing the pooled amplification paradox with unique molecular identifiers in single-cell RNA-seq. (Unpublished)

PDF (August 5, 2020) - Submitted Version
Creative Commons Attribution.

PDF - Supplemental Material
Creative Commons Attribution.

The incorporation of unique molecular identifiers (UMIs) in single-cell RNA-seq assays allows for the removal of amplification bias in the estimation of gene abundances. We show that UMIs can also be used to address a second problem: incomplete sequencing of the amplified molecules in a sequencing library, which can bias gene abundance estimates. Our method, called BUTTERFLY, is based on a zero-truncated negative binomial estimator and is implemented in the kallisto bustools single-cell RNA-seq workflow. We demonstrate its efficacy using a range of datasets and show that it can invert the relative abundance of certain genes in cases of a pooled amplification paradox.
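
As an illustration of what a zero-truncated negative binomial estimator can do in this setting, the sketch below fits such a distribution to a histogram of copies per UMI and uses the fitted probability of zero copies to estimate how many molecules were never sequenced. It is not BUTTERFLY's implementation: the record does not say what the estimator is fitted to, so the copies-per-UMI histogram, the grid-search fit, the function names, and the example counts are all assumptions made for illustration.

```python
import math

def ztnb_log_pmf(k, size, mean):
    """Log-probability of k >= 1 copies under a zero-truncated negative binomial."""
    p = size / (size + mean)        # NB "success" probability
    log_p0 = size * math.log(p)     # log P(0 copies) before truncation
    log_nb = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
              + log_p0 + k * math.log1p(-p))
    # Renormalise over k >= 1, since molecules with 0 copies are never observed.
    return log_nb - math.log1p(-math.exp(log_p0))

def fit_ztnb(hist, size_grid, mean_grid):
    """Grid-search maximum likelihood; hist maps copies-per-UMI -> number of UMIs."""
    def loglik(size, mean):
        return sum(n * ztnb_log_pmf(k, size, mean) for k, n in hist.items())
    return max(((s, m) for s in size_grid for m in mean_grid),
               key=lambda sm: loglik(*sm))

def estimate_total_molecules(hist, size, mean):
    """Scale the observed molecule count by 1 / (1 - P(0 copies))."""
    p0 = (size / (size + mean)) ** size
    return sum(hist.values()) / (1.0 - p0)

# Hypothetical histogram for one gene: 520 UMIs seen once, 210 seen twice, ...
hist = {1: 520, 2: 210, 3: 90, 4: 40, 5: 15, 6: 5}
grid = [0.25 * i for i in range(1, 41)]   # coarse illustrative grid, 0.25 .. 10
size, mean = fit_ztnb(hist, grid, grid)
total = estimate_total_molecules(hist, size, mean)
print(f"size={size}, mean={mean}, observed={sum(hist.values())}, estimated total={total:.1f}")
```

A grid search keeps the sketch dependency-free; a real analysis would more likely use a numerical optimiser and would need to handle genes with too few molecules to fit reliably.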

Item Type: Report or Paper (Discussion Paper)
Related URLs:
URLURL TypeDescription Paper ItemCode ItemCode
Gustafsson, Johan (ORCID: 0000-0001-5072-2659)
Robinson, Jonathan (ORCID: 0000-0001-8567-5960)
Nielsen, Jens (ORCID: 0000-0002-9955-6003)
Pachter, Lior (ORCID: 0000-0002-9164-6231)
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. Posted July 06, 2020. We thank Pall Melsted, Sina Booeshaghi and Joseph Min for helpful suggestions on the project and on the integration of BUTTERFLY in bustools. Availability of data and materials: Means to access the datasets analyzed during the current study are listed in Supplementary Table S1. The source code as well as Jupyter notebooks for generating the figures is available at: The source code for the branch of bustools used in this project is available at: This work was supported by funding from the Knut and Alice Wallenberg Foundation (J.N.), the National Cancer Institute of the National Institutes of Health under award number F32CA220848 (J.R.), and NIH U19MH114830 (L.P.). The authors declare that they have no competing interests. Authors’ Contribution: Conceptualization, J.G., L.P.; Methodology, J.G., L.P.; Software, J.G.; Writing – Original Draft, J.G., L.P.; Writing – Review & Editing, J.G., L.P., J.R., J.N.; Supervision, L.P., J.R., J.N.; Funding Acquisition, L.P., J.R., J.N.
Funding Agency | Grant Number
Knut and Alice Wallenberg Foundation | UNSPECIFIED
NIH Postdoctoral Fellowship | F32CA220848
Record Number: CaltechAUTHORS:20200707-114817234
Persistent URL:
Official Citation: Addressing the pooled amplification paradox with unique molecular identifiers in single-cell RNA-seq. Johan Gustafsson, Jonathan Robinson, Jens Nielsen, Lior Pachter. bioRxiv 2020.07.06.188003; doi:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 104254
Deposited By: Tony Diaz
Deposited On: 07 Jul 2020 18:57
Last Modified: 06 Aug 2020 22:02