Flexible parsing and preprocessing of technical sequences with splitcode
- Creators
- Sullivan, Delaney K.
- Pachter, Lior
Abstract
Next-generation sequencing libraries are constructed with numerous synthetic constructs such as sequencing adapters, barcodes, and unique molecular identifiers. Such sequences can be essential for interpreting results of sequencing assays, and when they contain information pertinent to an experiment, they must be processed and analyzed. We present a tool called splitcode that enables flexible and efficient preprocessing, parsing, and manipulation of sequencing reads. The splitcode program is free, open source, and available for download at http://github.com/pachterlab/splitcode. This versatile tool will facilitate simple, reproducible preprocessing of reads from libraries constructed for a large array of single-cell and bulk sequencing assays.
Additional Information
The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license. We thank the laboratory of Mitchell Guttman (Caltech) for discussions which motivated this project. Some of the splitcode code is derived from code written by Páll Melsted (University of Iceland), and we are grateful to him for sharing his code with us. Thanks to A. Sina Booeshaghi for helpful discussions regarding seqspec and splitcode. Illustrations were created with BioRender: http://biorender.com. D.K.S. was funded by the UCLA-Caltech Medical Scientist Training Program (NIH NIGMS training grant T32 GM008042). L.P. was supported in part by the National Institutes of Health (NIH) grants U19MH114830 and 5UM1HG012077-02. The authors declare no conflicts of interest. Contributions. D.K.S. conceived of the work, developed the methods and software, and drafted the manuscript. L.P. supervised the work. Both authors reviewed and edited the manuscript. Code Availability. The splitcode software is available at http://github.com/pachterlab/splitcode. The version of the splitcode software referred to throughout this paper is version 0.28.0.
Attached Files
Submitted - nihpp-2023.03.20.533521v2.pdf
Files
Name | Size |
---|---|
nihpp-2023.03.20.533521v2.pdf (md5:41d51f31b63cdb948cc44dfcaf33a128) | 303.9 kB |
Additional details
- PMCID
- PMC10055216
- Eprint ID
- 120411
- DOI
- 10.1101/2023.03.20.533521
- Resolver ID
- CaltechAUTHORS:20230327-441955000.1
- NIH Predoctoral Fellowship
- T32 GM008042
- NIH
- U19MH114830
- NIH
- 5UM1HG012077-02
- Created
- 2023-03-30 (created from EPrint's datestamp field)
- Updated
- 2023-06-30 (created from EPrint's last_modified field)
- Caltech groups
- Division of Biology and Biological Engineering