A Caltech Library Service

Modular and efficient pre-processing of single-cell RNA-seq

Melsted, Páll and Booeshaghi, A. Sina and Gao, Fan and da Veiga Beltrame, Eduardo and Lu, Lambda and Hjorleifsson, Kristján Eldjárn and Gehring, Jase and Pachter, Lior (2019) Modular and efficient pre-processing of single-cell RNA-seq. (Unpublished)

PDF - Submitted Version
Creative Commons Attribution.

PDF (Supplementary Figures) - Supplemental Material
Creative Commons Attribution.

PDF (Supplementary Note) - Supplemental Material
Creative Commons Attribution.

MS Excel (Supplementary Table: Benchmark Panel Summary) - Supplemental Material
Creative Commons Attribution.

MS Excel (Supplementary Table: Time & Memory) - Supplemental Material
Creative Commons Attribution.

Analysis of single-cell RNA-seq data begins with the pre-processing of reads to generate count matrices. We investigate algorithm choices for the challenges of pre-processing, and describe a workflow that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near-optimal in speed and memory. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses.
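The abstract's starting point, turning reads into a count matrix, ends with collapsing reads that share a cell barcode and a UMI so that each captured molecule is counted once. The toy sketch below illustrates only that final counting step. It assumes a hypothetical, simplified input in which every record is a (barcode, UMI, gene) tuple already assigned to exactly one gene. It is not the bustools implementation, and it skips barcode error correction and the handling of reads compatible with several genes. The function name `count_matrix` and the example barcodes, UMIs and gene names are illustrative only.

```python
from collections import defaultdict


def count_matrix(records):
    """Collapse (barcode, umi, gene) records into a cell-by-gene UMI count matrix.

    Hypothetical simplified input: each record is assumed to be assigned to
    exactly one gene, with error-free barcode and UMI sequences.
    """
    # Collect the distinct UMIs seen for each (barcode, gene) pair; duplicate
    # reads of the same molecule share a UMI and are therefore counted once.
    umis = defaultdict(set)
    for barcode, umi, gene in records:
        umis[(barcode, gene)].add(umi)

    # Fix a stable row (cell) and column (gene) order.
    barcodes = sorted({b for b, _ in umis})
    genes = sorted({g for _, g in umis})
    row = {b: i for i, b in enumerate(barcodes)}
    col = {g: j for j, g in enumerate(genes)}

    # Dense matrix for readability; real pipelines would use a sparse format.
    matrix = [[0] * len(genes) for _ in barcodes]
    for (barcode, gene), umi_set in umis.items():
        matrix[row[barcode]][col[gene]] = len(umi_set)
    return barcodes, genes, matrix


records = [
    ("AAAC", "TTGA", "GeneA"),  # two reads from the same molecule...
    ("AAAC", "TTGA", "GeneA"),  # ...collapse to one UMI
    ("AAAC", "CCGT", "GeneA"),
    ("GGTC", "TTGA", "GeneB"),
]
barcodes, genes, matrix = count_matrix(records)
print(barcodes, genes, matrix)
```

Here the two duplicate reads for cell `AAAC` contribute a single count, giving a matrix of `[[2, 0], [0, 1]]` over cells `AAAC`, `GGTC` and genes `GeneA`, `GeneB`.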

Item Type: Report or Paper (Discussion Paper)
Melsted, Páll (ORCID: 0000-0002-8418-6724)
Booeshaghi, A. Sina (ORCID: 0000-0002-6442-4502)
da Veiga Beltrame, Eduardo (ORCID: 0000-0002-1529-9207)
Lu, Lambda (ORCID: 0000-0002-7092-9427)
Hjorleifsson, Kristján Eldjárn (ORCID: 0000-0002-7851-1818)
Gehring, Jase (ORCID: 0000-0002-3894-9495)
Pachter, Lior (ORCID: 0000-0002-9164-6231)
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. We thank Vasilis Ntranos and Valentine Svensson for helpful suggestions and comments. We thank Jeff Farrell for the Danio rerio gene annotation used to process SRR6956073, John Schiefelbein for the Arabidopsis thaliana gene annotation used to process SRR8257100, Justin Fear for the Drosophila melanogaster gene annotation used to process SRR8513910, and Junhyong Kim and Qin Zhu for the Caenorhabditis elegans gene annotation used to process SRR8611943. The benchmarking work was made possible, in part, thanks to support from the Caltech Bioinformatics Resource Center.

Author Contributions: PM developed the algorithms for bustools and wrote the software. ASB conceived of and performed the UMI and barcode calculations motivating the algorithms. FG implemented and performed the benchmarking procedure, and curated indices for the datasets. EB designed and produced the comparisons between Cell Ranger and kallisto. LL investigated in detail the performance of different workflows on the 10k mouse neuron data and produced the analysis of that dataset. ASB designed the RNA velocity workflow and performed the RNA velocity analyses. KH developed and investigated the effect of, and optimal choice for, reference transcriptome sequences for pseudoalignment. JG interpreted results and helped to supervise the research. ASB planned, organized and made figures. ASB, EB, PM and LP planned the manuscript. ASB and LP wrote the manuscript.
Funding Agency | Grant Number
Caltech Bioinformatics Resource Center | UNSPECIFIED
Record Number: CaltechAUTHORS:20190617-153352518
Persistent URL:
Official Citation: Modular and efficient pre-processing of single-cell RNA-seq. Páll Melsted, A. Sina Booeshaghi, Fan Gao, Eduardo da Veiga Beltrame, Lambda Lu, Kristján Eldjárn Hjorleifsson, Jase Gehring, Lior Pachter. bioRxiv 673285; doi:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 96485
Deposited By: George Porter
Deposited On: 17 Jun 2019 22:46
Last Modified: 16 Nov 2021 17:21