nature portfolio | reporting summary
March 2021
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences
Behavioural & social sciences
Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see
nature.com/documents/nr-reporting-summary-flat.pdf
Ecological, evolutionary & environmental sciences study design
All studies must disclose on these points even when the disclosure is negative.
Study description
Research sample
Sampling strategy
Data collection
Timing and spatial scale
Data exclusions
Reproducibility
Randomization
Blinding
Did the study involve field work?
Yes
No
Field work, collection and transport
Field conditions
Location
Access & import/export
PRJNA763820. Genome assemblies produced from previously published third party data can be accessed at Bioproject PRJNA767332.
Supplementary data and full resolution figures can be accessed on figshare here: https://doi.org/10.6084/m9.figshare.c.5518182.v1.
All accessions used are listed in the supplementary data and in table 2.
In this study we expand the genomic information and produce the first reference genomes for heritable (symbiotic) Rickettsia and its
sister lineage Candidatus Megaira. We performed a large-scale comparative genomic analysis to clarify the evolution of these
neglected groups.
We first used a targeted sequencing approach to produce genome assemblies for the Rickettsia symbionts of midge (Culicoides
impunctatus), bed bug (Cimex lectularius), tsetse fly (Glossina morsitans submorsitans), and spider mite (Bryobia graminum) hosts.
Additionally, we sequenced and constructed draft genomes for Ca. Megaira from the alga Carteria cerasiformis. We further
extracted and assembled 22 genomes (21 Rickettsia and 1 Ca. Megaira) from publicly available arthropod genome sequencing projects
(SRA-NCBI). All invertebrate SRA deposits in NCBI were examined; the resulting pool of samples included beetles, bugs, wasps and
stoneflies (all listed in table 2). The pool reflects the set of previously sequenced insects. Many of these samples do not have
metadata for gender or age range. There was no manipulation involved.
We have previously identified SRA deposits from all available arthropod WGS studies in NCBI containing Rickettsia sequences
(https://doi.org/10.1093/gigascience/giab021). All identified deposits were used for subsequent genome mining. Non-NCBI genomes were
targeted because they were known to be infected with Rickettsia or Megaira. Sample sizes were sufficient because they included all
information available to us. We did not aim to compare frequency of infection with host sample size, owing to the inherent biases
within SRA sequences toward laboratory and model organisms. Our aim was to scrape as much information about symbiotic bacteria
from existing data as possible. The targeted samples were organisms that we knew had symbionts.
Raw reads from Rickettsia-containing SRA deposits were downloaded through the European Nucleotide Archive (ENA). Previously
published Rickettsia genomes were obtained from the GenBank or ENA databases. Excel spreadsheets, csv files, tsv files and database
files (anvi'o 7) were used to collect and record information during the subsequent bioinformatic processes.
Targeted sequencing efforts took place between 2016 and 2020. The SRA-NCBI read deposits were downloaded and processed between
2019 and 2020. This was an almost entirely random screening effort aimed at picking out symbiont genomes dissociated from spatial and
temporal metadata. Ecological data are unavailable or irrelevant for 90% of our samples.
All genome assemblies generated in this study and genomes retrieved from the databases were quality checked, and those that did
not pass our criteria (completeness > 90% and contamination < 2%) were excluded from downstream analyses.
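The exclusion rule above can be illustrated with a minimal sketch. It assumes a tab-separated quality table with hypothetical column names (genome, completeness, contamination) and made-up example values; it is not the authors' script, and the tool that produced the completeness and contamination estimates is not specified here. Only the two thresholds come from the text.

```python
import csv
import io

# Thresholds taken from the exclusion criteria stated above.
COMPLETENESS_MIN = 90.0   # percent; a genome must be strictly above this
CONTAMINATION_MAX = 2.0   # percent; a genome must be strictly below this


def passes_quality(completeness, contamination):
    """Return True if a genome meets both criteria (strict inequalities)."""
    return completeness > COMPLETENESS_MIN and contamination < CONTAMINATION_MAX


def load_quality_table(tsv_text):
    """Parse a TSV string into a list of dicts (column names are assumed)."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def filter_genomes(rows):
    """Return the names of genomes retained for downstream analyses."""
    return [
        row["genome"]
        for row in rows
        if passes_quality(float(row["completeness"]), float(row["contamination"]))
    ]


# Hypothetical example values, for illustration only.
EXAMPLE_TSV = (
    "genome\tcompleteness\tcontamination\n"
    "genome_A\t98.5\t0.4\n"
    "genome_B\t90.0\t0.1\n"
    "genome_C\t95.0\t2.0\n"
    "genome_D\t99.1\t1.9\n"
)

if __name__ == "__main__":
    print(filter_genomes(load_quality_table(EXAMPLE_TSV)))
```

Because both criteria are written as strict inequalities, a genome sitting exactly on a threshold (genome_B and genome_C in the made-up table) is excluded in this sketch; whether boundary cases were treated this way in the study is an assumption.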
The data sources and methods used here are entirely reproducible. All genomes can be downloaded and binned with the same
algorithms to produce the same results.
The exact methods, tools and code used to obtain and assemble the Rickettsia/Megaira genomes from the SRA deposits, and the
scripts for downstream comparative analyses, are available in a GitHub repository
(https://github.com/VibrantStarling/Code-used-to-extract-bacterial-genomes-from-invertebrate-genomes) and in the supplementary data.
All genome metadata and source information can be found in the supplementary data.
Largely not relevant, as there was no manipulation of sample groups. Our sample collection was only as random as the NCBI SRA
database is for arthropods. SRA deposits were grouped by host taxonomy on the NCBI servers. The obtained genomes were assigned
to known groups based on phylogeny, ANI/AAI scores, and previous grouping conventions for Rickettsia and Megaira.
Not relevant to this study, because there was no manipulation of the original samples and the bacterial genomes extracted were
completely unknown to us in advance.
Culicoides impunctatus samples were collected from a wild population in Kinlochleven, Scotland, on the evenings of 2 and 3
September 2020. The weather was cloudy and still, with a temperature of ~14 degrees Celsius.
Kinlochleven, Scotland (56° 42' 50.7''N 4° 57' 34.9''W), 305 m altitude. All other specimens used are either laboratory strains or are
from previous studies.
Culicoides impunctatus (the highland midge) is very prevalent in Scotland and no particular permit is needed for collections. The