
Flexible parsing and preprocessing of technical sequences with splitcode

Sullivan, Delaney K. and Pachter, Lior (2023) Flexible parsing and preprocessing of technical sequences with splitcode. (Unpublished)

PDF - Submitted Version
Creative Commons Attribution No Derivatives.


Next-generation sequencing libraries are constructed with numerous synthetic constructs such as sequencing adapters, barcodes, and unique molecular identifiers. Such sequences can be essential for interpreting the results of sequencing assays, and when they contain information pertinent to an experiment, they must be processed and analyzed. We present a tool called splitcode that enables flexible and efficient preprocessing, parsing, and manipulation of sequencing reads. The splitcode program is free, open source, and available for download. This versatile tool will facilitate simple, reproducible preprocessing of reads from libraries constructed for a large array of single-cell and bulk sequencing assays.

Item Type: Report or Paper (Discussion Paper)
Sullivan, Delaney K. (ORCID 0000-0002-8359-6705)
Pachter, Lior (ORCID 0000-0002-9164-6231)
Additional Information: The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license. We thank the laboratory of Mitchell Guttman (Caltech) for discussions which motivated this project. Some of the splitcode code is derived from code written by Páll Melsted (University of Iceland), and we are grateful to him for sharing his code with us. Thanks to A. Sina Booeshaghi for helpful discussions regarding seqspec and splitcode. Illustrations were created with BioRender. D.K.S. was funded by the UCLA-Caltech Medical Scientist Training Program (NIH NIGMS training grant T32 GM008042). L.P. was supported in part by the National Institutes of Health (NIH) grants U19MH114830 and 5UM1HG012077-02. The authors declare no conflicts of interest. Contributions. D.K.S. conceived of the work, developed the methods and software, and drafted the manuscript. L.P. supervised the work. Both authors reviewed and edited the manuscript. Code Availability. The splitcode software is available for download; the version referred to throughout this paper is 0.28.0.
Funding Agency: NIH Predoctoral Fellowship
Grant Number: T32 GM008042
Record Number: CaltechAUTHORS:20230327-441955000.1
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 120411
Deposited By: George Porter
Deposited On: 30 Mar 2023 03:03
Last Modified: 30 Mar 2023 03:03
