ARTICLE
Genomic diversity across the Rickettsia and Candidatus Megaira genera and proposal of genus status for the Torix group
Helen R. Davison¹, Jack Pilgrim¹, Nicky Wybouw², Joseph Parker³, Stacy Pirro⁴, Simon Hunter-Barnett¹, Paul M. Campbell¹,⁵, Frances Blow¹,⁶, Alistair C. Darby¹, Gregory D. D. Hurst¹ & Stefanos Siozios¹
Members of the bacterial genus Rickettsia were originally identified as causative agents of vector-borne diseases in mammals. However, many Rickettsia species are arthropod symbionts and close relatives of Candidatus Megaira, which are symbiotic associates of microeukaryotes. Here, we clarify the evolutionary relationships between these organisms by assembling 26 genomes of Rickettsia species from understudied groups, including the Torix group, and two genomes of Ca. Megaira from various insects and microeukaryotes. Our analyses of the new genomes, in comparison with previously described ones, indicate that the accessory genome diversity and broad host range of Torix Rickettsia are comparable to those of all other Rickettsia combined. Therefore, the Torix clade may play unrecognized roles in invertebrate biology and physiology. We argue this clade should be given its own genus status, for which we propose the name Candidatus Tisiphia.
https://doi.org/10.1038/s41467-022-30385-6
¹ Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool L69 7ZB, UK.
² Terrestrial Ecology Unit, Department of Biology, Faculty of Sciences, Ghent University, Ghent, Belgium.
³ Division of Biology and Biological Engineering, California Institute of Technology, 1200 E California Boulevard, Pasadena, CA 91125, USA.
⁴ Iridian Genomes, Bethesda, MD, USA.
⁵ School of Health and Life Sciences, Faculty of Biology, Medicine and Health, the University of Manchester, Manchester, UK.
⁶ Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY, USA.
email: siozioss@liverpool.ac.uk
NATURE COMMUNICATIONS | (2022) 13:2630 | https://doi.org/10.1038/s41467-022-30385-6 | www.nature.com/naturecommunications
Symbiotic bacteria are vital to the function of most living eukaryotes, including microeukaryotes, fungi, plants, and animals [1–4]. The symbioses formed are often functionally important to the host, with effects ranging from mutualistic to detrimental. Mutualistic symbionts may provide benefits through the biosynthesis of metabolites, or by protecting their hosts against pathogens and parasitoids [5,6]. Parasitic symbionts can be detrimental to the host through resource exploitation or through reproductive manipulation that favours their own transmission over the host's [7,8]. Across these different symbiotic relationships, symbionts are often important determinants of host ecology and evolution.
The Rickettsiales (Alphaproteobacteria) represent an order of largely obligate intracellular bacteria that form symbioses with a variety of eukaryotes [9]. Deianiraea, an extracellular parasite of Paramecium, is the one known exception [10]. Within the Rickettsiales, the family Rickettsiaceae represents a diverse collection of bacteria that infect a wide range of eukaryotic hosts and can act as symbionts, parasites, and pathogens. Perhaps the best-known clade of Rickettsiaceae is the genus Rickettsia, which was initially described as the cause of spotted fever and other rickettsioses in vertebrates that are transmitted by ticks, lice, fleas, and mites [11]. Rickettsia have been increasingly recognised as heritable arthropod symbionts. Since the description of a maternally inherited male-killer in ladybirds [12], we now know that heritable Rickettsia are common in arthropods [13,14]. Further, Rickettsia-host symbioses are diverse, with different symbionts capable of reproductive manipulation, nutritional and protective symbiosis, and influencing thermotolerance and pesticide susceptibility [15–21].
Our understanding of the evolution and diversity of the genus Rickettsia and its allies has increased in recent years, with the taxonomy of Rickettsiaceae developing as more data become available [14,22]. Weinert et al. [14] loosely defined 13 different groups of Rickettsia based on 16S rRNA phylogeny, which showed two early branching clades that appeared genetically distant from other members of the genus. One of these was a symbiont of Hydra, designated Hydra group Rickettsia, which has since been assigned its own genus status, Candidatus Megaira [23]. Ca. Megaira forms a related clade to Rickettsia and is found in ciliates, amoebae, chlorophyte and streptophyte algae, and cnidarians [24]. Members of this clade are found in hosts from aquatic, marine and soil habitats, which include model organisms (e.g., Paramecium, Volvox) and economically important vertebrate parasites (e.g., Ichthyophthirius multifiliis, the ciliate that causes white spot disease in fish) [24]. Whilst symbioses between Ca. Megaira and microeukaryotes are pervasive, there is no publicly available complete genome, and the impact of these symbioses on the host is poorly understood.
A second early branching clade was described from Torix tagoi leeches and is commonly called Torix group Rickettsia [25]. Symbionts in the Torix clade have since been found in a wide range of invertebrate hosts, from midges to freshwater snails to fish-parasitic amoebae [13]. The documented host diversity is wider than that of other Rickettsia groups, which to date are found only in arthropods and their associated vertebrate or plant hosts [14]. Torix clade Rickettsia are known to be heritable symbionts, but their impact on host biology is poorly understood, despite the economic and medical importance of several hosts (including bed bugs, blackflies, and biting midges). The few studies that have examined potential effects on the host report larger body size in leeches [25], a small negative effect on growth rate and reproduction in bed bugs [26], and an association with parthenogenesis in Empoasca leafhoppers [27].
Current data suggest an emerging macroevolutionary scenario in which members of the Rickettsia clade originated as symbionts of microeukaryotes before diversifying to infect invertebrates [23,28,29]. Many symbionts belonging to the Rickettsiaceae (e.g., Ca. Megaira, Candidatus Trichorickettsia, Candidatus Phycorickettsia, Candidatus Sarmatiella and Candidatus Gigarickettsia) circulate in a variety of microeukaryotes [23,30–33]. The Torix group Rickettsia retained a broad range of hosts, from microeukaryotes to arthropods [13]. The remaining members of the genus Rickettsia evolved to be heritable arthropod symbionts and vector-borne pathogens [14,34].
However, a lack of genomic and functional information for symbiotic clades limits our understanding of evolutionary transitions within Rickettsia and its related groups. No Ca. Megaira genome sequences are currently publicly available, and of the 165 Rickettsia genome assemblies available on the NCBI (as of 29 April 2021), only two derive from the Torix clade, and both are draft genomes. In addition, dedicated heritable symbiont clades of Rickettsia, such as the Rhyzobius group, have no available genomic data, and there is a single representative for the Adalia clade. Despite the likelihood that heritable symbiosis with microeukaryotes and invertebrates was the ancestral state for this group of intracellular bacteria, available genomic resources are heavily skewed towards pathogens of vertebrates.
In this study, we establish a richer base of genomic information for heritable symbiont Rickettsia and Ca. Megaira, then use these resources to clarify the evolution of these groups. We broaden available genomic data through a combination of targeted sequencing of strains without complete genomes and metagenomic assembly of Rickettsia strains from arthropod genome projects. We report the first closed circular genome of a Ca. Megaira symbiont from a streptophyte alga (Mesostigma viride) and provide a draft genome for a second Ca. Megaira from a chlorophyte (Carteria cerasiformis). In addition, we present the complete genomes of two Torix Rickettsia from a midge (Culicoides impunctatus) and a bed bug (Cimex lectularius), as well as a draft genome for Rickettsia from a tsetse fly (Glossina morsitans submorsitans, an important vector species) and a new strain from a spider mite (Bryobia graminum). A metagenomic approach established a further 22 draft genomes for insect symbiotic strains, including previously unsequenced Rhyzobius and Meloidae group draft genomes. We use these to conduct pangenomic, phylogenomic, and metabolic analyses of our extracted genome assemblies, with comparisons to existing Rickettsia genomes.
Results and discussion
We have expanded the available genomic data for several Rickettsia groups through a combination of draft and complete genome assembly. This includes an eight-fold increase in available Torix group genomes, and genomes for the previously unsequenced Meloidae and Rhyzobius groups. We further report initial reference genomes for Ca. Megaira.
Complete and closed reference genomes for Torix Rickettsia and Ca. Megaira. The use of long-read sequencing technologies produced complete genomes for two subclades of the Torix group: limoniae (RiCimp) and leech (RiClec). Sequencing depth of the Rickettsia genomes from C. impunctatus (RiCimp) and C. lectularius (RiClec) was 18X and 52X, respectively. The RiCimp genome provides evidence of plasmids in the Torix group (pRiCimp001 and pRiCimp002) (Table 1). Notably, the two plasmids are more similar to each other than to other Rickettsia plasmids. However, both plasmids contain distant homologs of the DnaA_N domain-containing proteins previously found in other Rickettsia plasmids [35]. In addition, only two components of the type IV conjugative transfer system known as RAGEs (Rickettsiales Amplified Genetic Elements) [36] were present on the plasmids, namely homologs of the proteins TrwB/TraD and TraA/MobA. The majority of the RAGE elements, including both the F-like (tra) and P-like type IV components, have been incorporated into the main chromosome. The presence of RAGE elements, together with the fact that conjugation apparatuses have narrow host ranges [37], suggests that horizontal transfer of these plasmids is likely within the Rickettsiaceae and could occur between Torix and the main Rickettsia clade, given that co-infections of these genera have been noted previously [38,39]. We additionally assembled a complete closed reference genome of Ca. Megaira from Mesostigma viride (MegNEIS296) using data from previously published genome sequencing efforts. Like RiCimp, the MegNEIS296 genome contains a plasmid that bears features of other Rickettsia plasmids, including a tra conjugative element and two DnaA_N-like protein paralogs.
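Depth values such as 18X and 52X are conventionally mean per-base coverage. As a minimal sketch of the two usual ways to obtain that number, the code below assumes either a per-position depth profile (for example, parsed from a read-mapping depth report) or a total count of sequenced bases; the function names and toy values are illustrative and are not taken from the authors' pipeline.

```python
# Sketch under stated assumptions; not the authors' pipeline.

def mean_depth(per_base_depths):
    """Mean per-base coverage; positions with zero reads still count toward the mean."""
    if not per_base_depths:
        raise ValueError("depth profile is empty")
    return sum(per_base_depths) / len(per_base_depths)


def expected_depth(total_sequenced_bases, genome_length):
    """Expected coverage = sequenced bases / genome length.

    Works for variable-length long reads if total_sequenced_bases is the summed
    read length. Assumes every read comes from the target genome, so it
    overestimates symbiont depth when host reads are present.
    """
    return total_sequenced_bases / genome_length


def breadth_of_coverage(per_base_depths, min_depth=1):
    """Fraction of genome positions covered by at least `min_depth` reads."""
    covered = sum(1 for d in per_base_depths if d >= min_depth)
    return covered / len(per_base_depths)


if __name__ == "__main__":
    # Toy numbers only: 100,000 reads of 150 bp over a 1.5 Mb genome.
    print(expected_depth(100_000 * 150, 1_500_000))  # 10.0
    print(mean_depth([10, 20, 30]))                  # 20.0
```

Mean depth alone can hide uneven coverage, which is why the breadth helper is included alongside it.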
General features of both complete Torix genomes are consistent with previous genomic studies of the Torix group (Table 1). A single full set of rRNAs (16S, 5S and 23S) and a GC content of ~33% were observed. Notably, the two complete Torix group genomes show a distinct lack of synteny (Supplementary Fig. 1), a genomic feature that is compatible with our phylogenetic analyses, which placed these two lineages in different subclades (leech/limoniae) (Fig. 1 and Supplementary Fig. 3). Gene order breakdown due to intragenomic recombination has previously been associated with the expansion of mobile genetic elements in both Rickettsia [40] and Wolbachia [41], another member of the Rickettsiales. Both the RiCimp and RiClec genomes are predicted to encode a high number of transposable elements, with circa 96 and 119 annotated putative transposases, respectively. This expansion of transposable elements, along with the phylogenetic distance between the two lineages, is likely responsible for the extreme synteny breakdown between RiCimp and RiClec. Of note within the closed reference genomes MegNEIS296 and RiCimp is the presence of a putative non-ribosomal peptide synthetase (NRPS) and a hybrid non-ribosomal peptide/polyketide synthetase (NRPS/PKS), respectively (Supplementary Fig. 2). Although the exact products of these putative pathways are uncertain, in silico prediction by Norine suggests some similarity with both cytotoxic and antimicrobial peptides, hinting at a potential defensive role (Supplementary Fig. 2). Further homology comparison with other taxa did not provide links with any specific functions or phenotypes. An unrelated hybrid NRPS/PKS cluster has previously been reported on a mobile genetic element in Rickettsia buchneri, providing a potential route for horizontal transmission [42]. The strongest blastp hits of MegNEIS296 NRPS proteins occur in Cyanobacteria (Supplementary Fig. 2) [42]. In addition, putative toxin-antitoxin systems similar to one associated with cytoplasmic incompatibility in Wolbachia have recently been observed on the plasmid of Rickettsia felis in a parthenogenetic booklouse [35]. Toxin-antitoxin systems are thought to be part of an extensive bacterial mobilome network associated with reproductive parasitism [43]. A BLAST search found a protein in Oopac6 that is very similar (88% aa identity) to the putative large pLbAR toxin found in R. felis, and a more distantly related protein on the C. impunctatus plasmid (25% aa identity).
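GC content (~33% here) and coding density (Table 1) are simple summaries of a genome sequence and its annotation. The sketch below shows one way to compute them, assuming the sequence is available as a string and CDS coordinates as 1-based inclusive (start, end) pairs such as those in a GFF file; the input format and function names are assumptions for illustration, not the authors' method. Overlapping CDSs are merged so shared bases are counted once.

```python
# Sketch under stated assumptions; not the authors' pipeline.

def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def coding_density(genome_length, cds_intervals):
    """Percent of genome positions covered by at least one CDS.

    cds_intervals: iterable of (start, end), 1-based, inclusive, start <= end.
    CDSs that span the origin of a circular replicon are not handled here.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(cds_intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: bank it and start a new run.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping CDS: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / genome_length


if __name__ == "__main__":
    print(gc_content("ATGCGC"))                                 # about 66.7
    print(coding_density(100, [(1, 10), (5, 20), (31, 40)]))    # 30.0
```

Whether plasmid CDSs and plasmid length enter the calculation changes the result, so the denominator should be stated alongside any reported density.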
Sequencing and de novo assembly of other Rickettsia and Ca. Megaira genomes. Our direct sequencing efforts enabled assembly of draft genomes for a second Ca. Megaira strain from the alga Carteria cerasiformis, and for Rickettsia associated with tsetse flies and Bryobia spider mites. The Rickettsia genome retrieved from a wild-caught tsetse fly, RiTSETSE, is a potentially chimeric assembly of closely related Transitional group Rickettsia. We identified an excess of 3584 biallelic sites (3369 SNPs and 215 indels) when the raw Illumina reads were mapped back to the assembly. The high read depth (104X) indicates that this could be a symbiotic association, reflecting previous observations in tsetse fly cells [44]. However, we cannot exclude the possibility that RiTSETSE is not a heritable symbiont but derives from a transient infection acquired through a recent blood meal.
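The biallelic-site count combines two variant classes (3369 SNPs + 215 indels = 3584 sites). A minimal sketch of that classification is below, assuming variants are supplied as (REF, ALT) string pairs in VCF notation, for example read from the REF and ALT columns of a VCF produced by a variant caller; the record format and the function name are hypothetical. Multi-allelic, missing and symbolic ALT alleles are skipped, and equal-length multi-base substitutions are counted separately rather than forced into either class.

```python
# Sketch under stated assumptions; not the authors' pipeline.
from collections import Counter


def classify_biallelic(records):
    """Count biallelic SNPs and indels from (REF, ALT) pairs in VCF notation.

    A site counts as biallelic when ALT holds exactly one concrete allele.
    """
    counts = Counter()
    for ref, alt in records:
        alleles = alt.split(",")
        allele = alleles[0]
        if len(alleles) != 1 or allele in (".", "*") or allele.startswith("<"):
            counts["skipped"] += 1  # multi-allelic, missing or symbolic ALT
            continue
        if len(ref) == 1 and len(allele) == 1:
            counts["snp"] += 1
        elif len(ref) != len(allele):
            counts["indel"] += 1
        else:
            counts["mnp"] += 1  # equal-length multi-base substitution
    counts["biallelic"] = counts["snp"] + counts["indel"] + counts["mnp"]
    return counts


if __name__ == "__main__":
    demo = [("A", "G"), ("A", "AT"), ("CT", "C"), ("A", "G,T"), ("A", ".")]
    print(classify_biallelic(demo))
```

An excess of such heterozygous-looking sites in a haploid bacterial assembly is what points to a mixture of closely related strains.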
From the SRA accessions, the metagenomic pipeline extracted 29 full symbiont genomes belonging to the Rickettsiales across 24 host species. Five of the 29 were identified as Wolbachia and discarded from further analysis, one was a Rickettsia discarded for low quality, and another was a previously assembled Torix Rickettsia, RiCNE [45]. Thus, 22 high-quality Rickettsia metagenome-assembled genomes were obtained from 21 host species. One beetle (SRR6004191) carried the coinfecting Rickettsia Lappe3 and Lappe4 (Table 2). The high-quality Rickettsia genomes covered the Belli, Torix, Transitional, Rhyzobius, Meloidae and Spotted Fever groups (Table 2 and Supplementary Data 1).
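The filtering step above (29 extracted genomes, minus 5 Wolbachia, 1 low-quality Rickettsia and 1 previously assembled genome, leaving 22) can be written as a predicate over per-genome quality estimates such as CheckM completeness and contamination. The text does not state the cut-offs used, so this sketch assumes the completeness and contamination thresholds of the MIMAG high-quality draft standard (>90% and <5%) purely as placeholder defaults; the record fields, names and most scores are hypothetical.

```python
# Sketch under stated assumptions: thresholds and record layout are placeholders.
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomeBin:
    name: str
    genus: str            # taxonomic assignment of the bin
    completeness: float   # percent, e.g. a CheckM estimate
    contamination: float  # percent, e.g. a CheckM estimate


def keep_bin(bin_, target_genus="Rickettsia", min_completeness=90.0,
             max_contamination=5.0, already_assembled=frozenset()):
    """True if a bin passes the assumed taxonomy, quality and novelty filters."""
    return (
        bin_.genus == target_genus
        and bin_.completeness > min_completeness
        and bin_.contamination < max_contamination
        and bin_.name not in already_assembled
    )


if __name__ == "__main__":
    bins = [
        GenomeBin("Dallo3", "Rickettsia", 93.13, 0.95),  # values quoted in the text
        GenomeBin("wBin", "Wolbachia", 98.0, 1.0),       # hypothetical
        GenomeBin("lowQ", "Rickettsia", 61.0, 2.0),      # hypothetical
        GenomeBin("RiCNE", "Rickettsia", 95.0, 1.0),     # hypothetical scores
    ]
    kept = [b.name for b in bins if keep_bin(b, already_assembled={"RiCNE"})]
    print(kept)  # ['Dallo3']
```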
Beetles, particularly rove beetle (Staphylinidae) species, appear in this study as a possible hotspot of Rickettsia infection. Rickettsia has historically been associated with beetles, including ladybird beetles (Adalia bipunctata), diving beetles (Deronectes sp.) and bark beetles (Scolytinae) [14,17,34,46,47]. Though this is a plausible hotspot, the observation needs to be approached with caution, as it could be an artefact of skewed sampling effort.
Phylogenomic analyses and taxonomic placement of assembled genomes. The phylogeny and network illustrate the distance of Torix from Ca. Megaira and other Rickettsia, along with an extremely high level of within-group diversity in Torix compared to any other group (Fig. 1 and Supplementary Fig. 3). No significant discordance was detected between the core and ribosomal phylogenies. The phylogenies generated using core genomes are consistent with previously identified Rickettsia and host associations based on more limited genetic markers [13,14,48,49]. For instance, Pfluc4 from Proechinophthirus fluctus lice is grouped on the same branch as a previously sequenced Rickettsia from a different individual of P. fluctus [48]. The following groups were identified in the 22 genomes assembled from the SRA screening: 4 Transitional, 1 Spotted Fever, 1 Adalia, 8 Belli and 7 Torix limoniae, with the remaining genome (Oopac6) placed in the Rhyzobius group (Table 2). Targeted sequences were confirmed as Torix limoniae (RiCimp), Torix leech (RiClec), Transitional (RiTSETSE), Ca. Megaira (MegCarteria and MegNEIS296), and a deeply diverging Torix clade provisionally named Moomin (Moomin) (Table 2, Fig. 1, Supplementary Figs. 3 and 4). The extracted Torix genomes include one double infection, giving a total of 10 new genomes across 9 potential host species. The double infection is found within the rove beetle Labidopullus appendiculatus, forming two distinct lineages, Lappe3 and Lappe4 (Fig. 1 and Supplementary Fig. 3).

Table 1 Summary of the closed Ca. Megaira and Torix Rickettsia genomes completed in this project.

| Strain name | MegNEIS296 | RiCimp | RiClec |
|---|---|---|---|
| Group | Ca. Megaira | Torix Rickettsia | Torix Rickettsia |
| Symbiont genome accession | GCA_020410825.1 | GCA_020410785.1 | GCA_020410805.1 |
| Host | Mesostigma viride NIES-296 | Culicoides impunctatus | Cimex lectularius |
| Raw reads accession | SRR8439255, SRX5120346 | SRR16018514, SRR16018513 | SRR16018512, SRR16018511 |
| Total nucleotides (bp) | 1,532,409 | 1,566,468 | 1,611,726 |
| Chromosome size (bp) | 1,448,425 | 1,469,631 | 1,611,726 |
| Plasmids | 1 (83,984 bp) | 2 (77,550 bp + 19,287 bp) | None |
| GC content (%) | 33.9 | 32.9 | 32.8 |
| Number of CDS | 1,359 | 1,397 | 1,544 |
| Avg. CDS length (bp) | 998 | 900 | 874 |
| Coding density (%) | 88.5 | 86 | 84 |
| rRNAs | 3 | 3 | 3 |
| tRNAs | 34 | 34 | 35 |
We also report a putative Rhyzobius group Rickettsia genome extracted from the staphylinid beetle Oxypoda opaca (Oopac6) and a Meloidae group Rickettsia from the firefly Pyrocoelia pectoralis (Ppec13). Both have high completeness and low contamination, and consistently group away from the other draft and completed genomes (Figs. 1, 2, and Supplementary Data 1). MLST analyses demonstrate that these bacteria are most similar to the Rhyzobius and Meloidae groups described by Weinert et al. [14] (Supplementary Fig. 4). Phylogenies including Oopac6 and Ppec13 suggest that Rhyzobius sits as sister group to all other Rickettsia groups, and that Meloidae is more closely associated with Belli (Fig. 1, Supplementary Figs. 3–5). Additional genomes will help clarify these taxa and their relationships to the rest of the Rickettsiaceae. The sequencing data for the wasp Diachasma alloeum used here have previously been described as containing a pseudogenised nuclear insert of Rickettsia material, but not a complete Rickettsia genome [50]. The construction of a full, non-pseudogenised genome with higher read depth than the insect contigs, low contamination (0.95%) and high completeness (93.13%) suggests that these reads likely represent a viable Rickettsia infection in D. alloeum. However, these data do not exclude the presence of an additional nuclear insert. A whole symbiont genome can be incorporated into host DNA, as in the case of Wolbachia [51], as can partial genomes, as with the Ca. Megaira inserts in the Volvox carteri genome [52]. The presence of both the insert and the symbiont needs confirmation through appropriate microscopy methods.
Recombination is low within the core genomes of Rickettsia and Ca. Megaira, but may occur between closely related clades that are not investigated here. Across all genomes, the PHI score is significant in 6 of the 74 core gene clusters, suggesting putative recombination events. However, it is reasonable to assume that most of these may result from systematic error due to the divergent evolutionary processes at work across Rickettsia genomes. Patterns resembling recombination can arise by chance rather than from genuine recombination events, and current phylogenetic methods cannot differentiate between the two [53]. The function of each respective cluster can be found in Supplementary Data 1.
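One way to gauge how many of the 6 significant PHI results out of 74 core gene clusters could arise by chance is a binomial calculation. The text states neither the significance threshold nor whether a multiple-testing correction was applied, so this sketch assumes a nominal α = 0.05 per test and, unrealistically for genes sharing one phylogeny, independent tests. Under those assumptions about 74 × 0.05 = 3.7 false positives are expected even with no recombination at all. The function names are illustrative.

```python
# Sketch under stated assumptions: alpha = 0.05 and independent tests are assumed.
from math import comb


def expected_false_positives(n_tests, alpha):
    """Expected number of nominally significant tests when every null is true."""
    return n_tests * alpha


def prob_at_least(k, n_tests, alpha):
    """P(X >= k) for X ~ Binomial(n_tests, alpha), assuming independent tests."""
    return sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k, n_tests + 1)
    )


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n, alpha, observed = 74, 0.05, 6  # 74 clusters and 6 hits from the text; alpha assumed
    print(expected_false_positives(n, alpha))
    print(prob_at_least(observed, n, alpha))
    print(bonferroni_threshold(alpha, n))
```

If the tests are positively correlated, as core genes on one tree tend to be, the binomial tail is only a rough guide.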
[Fig. 1 image: core-genome maximum likelihood tree; labelled clades Torix (Ca. Tisiphia), Typhus, Transitional, Helvetica, Adalia, Canadensis, Scapularis, Belli, Meloidae, Rhyzobius, Ca. Megaira, Orientia and Spotted Fever; bootstrap key 91–100, 81–90, ≤80; scale bars 0.1]

Fig. 1 Genome-wide phylogeny of Rickettsia and Ca. Megaira. Maximum likelihood (ML) phylogeny of Rickettsia and Ca. Megaira constructed from 74 core gene clusters extracted from the pangenome. New genomes are marked, and bootstrap values based on 1000 replicates are indicated with coloured diamonds (red = 91–100, yellow = 81–90, black ≤ 80). New complete genomes are RiCimp, RiClec and MegNEIS296. Asterisks indicate collapsed monophyletic branches and // represent breaks in the branch. Accessions used are provided in Supplementary Data 1.

Table 2 Summary of draft genomes generated during the current project and their associated hosts. Full metadata, including CheckM completeness scores and levels of contamination, can be found in Supplementary Data 1.

| Strain | Symbiotic bacteria assembly accession | Group | Number of contigs | Total length (bp) | Host name | Host order |
|---|---|---|---|---|---|---|
| Blapp1 | GCA_020404495.1 | Belli | 171 | 1,266,633 | Bembidion lapponicum | Coleoptera |
| Btrans1 | GCA_020404375.1 | Belli | 241 | 1,417,452 | Bembidion nr. transversale OSAC:DRMaddison DNA3205 | Coleoptera |
| Choog2 | GCA_020404365.1 | Belli | 16 | 1,357,829 | Columbicola hoogstraali | Phthiraptera |
| Cmasu2 | GCA_020404525.1 | Transitional | 196 | 1,295,004 | Ceroptres masudai | Hymenoptera |
| Dallo3 | GCA_020404485.1 | Belli | 196 | 990,679 | Diachasma alloeum | Hymenoptera |
| Drufa1 | GCA_020404445.1 | Belli | 14 | 1,364,611 | Degeeriella rufa | Phthiraptera |
| Earac4 | GCA_020881375.1 | Transitional | 96 | 1,350,066 | Ecitomorpha arachnoides | Coleoptera |
| Econn1 | GCA_020881315.1 | Transitional | 238 | 1,070,326 | Eriopis connexa | Coleoptera |
| Gbili3 | GCA_020881275.1 | Torix limoniae (Ca. Tisiphia) | 171 | 1,188,102 | Gnoriste bilineata | Diptera |
| Gdoso1 | GCA_020881245.1 | Belli | 34 | 1,420,758 | Graphium doson | Lepidoptera |
| Lappe3 | GCA_020881125.1 | Torix limoniae (Ca. Tisiphia) | 122 | 1,368,980 | Labidopullus appendiculatus | Coleoptera |
| Lappe4 | GCA_020881075.1 | Torix limoniae (Ca. Tisiphia) | 154 | 1,332,357 | Labidopullus appendiculatus | Coleoptera |
| MegCarteria | GCA_020881215.1 | Ca. Megaira | 72 | 1,298,707 | Carteria cerasiformis | Chlamydomonadales |
| Ofont3 | GCA_020404465.1 | Adalia | 91 | 1,529,137 | Omalisus fontisbellaquei | Coleoptera |
| Oopac6 | GCA_020881235.1 | Rhyzobius | 181 | 1,497,231 | Oxypoda opaca | Coleoptera |
| Pante1 | GCA_020881195.1 | Torix limoniae (Ca. Tisiphia) | 70 | 1,472,610 | Pseudomimeciton antennatum | Coleoptera |
| Pfluc4 | GCA_020404545.1 | Spotted Fever | 7 | 1,251,895 | Proechinophthirus fluctus | Phthiraptera |
| Ppec13 | GCA_020404425.1 | Belli | 90 | 1,426,047 | Pyrocoelia pectoralis | Coleoptera |
| Psono2 | GCA_020881175.1 | Torix limoniae (Ca. Tisiphia) | 163 | 1,492,063 | Platyusa sonomae | Coleoptera |
| RiTSETSE | GCA_020881295.1 | Transitional | 172 | 1,451,997 | Glossina morsitans submorsitans | Diptera |
| S2 | GCA_020404555.1 | Torix limoniae (Ca. Tisiphia) | 103 | 1,251,484 | Sericostoma | Trichoptera |
| Sanch3 | GCA_020881115.1 | Belli | 181 | 1,487,154 | Stiretrus anchorago | Hemiptera |
| Slati1 | GCA_020881155.1 | Transitional | 109 | 1,301,763 | Sceptobius lativentris | Coleoptera |
| Ssp4 | GCA_020404565.1 | Torix limoniae (Ca. Tisiphia) | 87 | 1,231,013 | Sericostoma sp. HW-2014 | Trichoptera |
| Moomin | GCA_020881085.1 | Torix moomin (Ca. Tisiphia) | 204 | 1,137,559 | Bryobia graminum | Trombidiformes |

Gene content, pangenome and metabolic analysis. Across all genomes used in the gene content comparison analysis (Supplementary Fig. 6), Anvi'o identified only 208 core gene clusters, of which 74 are represented by single-copy genes. The accessory genome is particularly large across the main Rickettsia and Torix clades. Out of the 2470 predicted ortholog clusters for the Torix clade, 1296 (52.5%) are found uniquely among the Torix genomes, while for Rickettsia 2460 unique ortholog clusters were predicted from a total of 3811 (64.5%) (Fig. 3). However, if we account for the number of genomes available in each clade, then Torix shows higher rates of gene cluster and unique gene cluster accumulation with each additional genome (Fig. 4). Our results indicate that the main Rickettsia clade, and especially the Torix clade, have a high degree of genome diversity, suggesting a wider repertoire of genes and potentially greater rates of gene turnover. As expected, the more genomes that are included in analyses, the smaller the extracted core genome. However, gene content analysis results for increasingly divergent genomes should always be interpreted with caution, as true homology relationships between genes/proteins might be obscured by their sequence divergence.
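Accumulation curves like those in Fig. 4 can be built from a presence/absence map of gene clusters: for each random ordering of genomes, record the size of the union (pangenome) and of the intersection (core genome) as genomes are added, then summarise mean ± SD across permutations (Fig. 4 uses 100). This minimal sketch assumes gene clusters have already been computed, for example exported from a pangenomics tool such as anvi'o; the function names and toy data are hypothetical, not the authors' code. It also recomputes the unique-cluster percentages from the counts quoted above.

```python
# Sketch under stated assumptions; gene clusters are taken as precomputed sets.
import random
import statistics


def _summarise(values_per_step):
    """(mean, population SD) for each number-of-genomes step."""
    return [(statistics.mean(v), statistics.pstdev(v)) for v in values_per_step]


def accumulation_curves(genomes, n_perm=100, seed=0):
    """Pangenome and core-genome size as genomes are added in random order.

    genomes: dict mapping genome name -> set of gene-cluster IDs.
    Returns {"pan": [...], "core": [...]}; element i is (mean, SD) over
    permutations for i + 1 genomes.
    """
    rng = random.Random(seed)
    names = list(genomes)
    pan = [[] for _ in names]
    core = [[] for _ in names]
    for _ in range(n_perm):
        order = names[:]
        rng.shuffle(order)
        union, shared = set(), None
        for i, name in enumerate(order):
            clusters = genomes[name]
            union |= clusters
            shared = set(clusters) if shared is None else shared & clusters
            pan[i].append(len(union))
            core[i].append(len(shared))
    return {"pan": _summarise(pan), "core": _summarise(core)}


def unique_fraction(clade, others):
    """Share of a clade's clusters (union over its genomes) absent from every genome in `others`."""
    clade_all = set().union(*clade.values())
    other_all = set().union(*others.values())
    return len(clade_all - other_all) / len(clade_all)


if __name__ == "__main__":
    toy = {"g1": {1, 2, 3}, "g2": {2, 3, 4}, "g3": {3, 5}}  # hypothetical clusters
    curves = accumulation_curves(toy, n_perm=100)
    print(curves["pan"][-1], curves["core"][-1])  # every order ends at 5 (pan) and 1 (core)
    # Percentages from the text, recomputed from the stated counts:
    print(round(100 * 1296 / 2470, 1), round(100 * 2460 / 3811, 1))  # 52.5 64.5
```

Comparing clades on the steepness of these curves, rather than on raw totals, is what corrects for the unequal numbers of genomes per clade.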
[Fig. 2 image: (a) AAI network and (b) ANI network; labelled clusters Adalia, Spotted Fever, Belli, Canadensis, Scapularis, Meloidae, Typhus, Ca. Megaira, Torix (Ca. Tisiphia), Orientia, Helvetica, Rhyzobius and Transitional]

Fig. 2 Genus and species level clustering across Rickettsia and Ca. Megaira. Fruchterman-Reingold networks of pairwise (a) Average Amino Acid Identity (AAI) with edge weights >65% similarity and (b) Average Nucleotide Identity (ANI) with edge weights >95% similarity across all genomes. AAI and ANI illustrate genus and species boundaries, respectively. The 13 current cluster names are annotated over the 23 species clusters found in the ANI network. New genomes are named and have a green outline. Node fill colours indicate Rickettsia (dark blue), Ca. Megaira (orange), Torix/Ca. Tisiphia (purple) and the Orientia outgroup (light blue). Source data are provided in Source Data.

Torix is a distinctly separate clade, sharing less than 65% AAI with any Rickettsia or Ca. Megaira genome (Fig. 2). It contains at least five species-level clusters with >95% ANI, which reflect its highly diverse niche in the environment (Fig. 2) [13,54,55]. With only two examples, the true diversity of Ca. Megaira is underestimated here. Overall, our results indicate higher genomic plasticity within the Torix clade, in terms of gene content, compared to Rickettsia.
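The genus- and species-level groupings in Fig. 2 can be read as clusters in a thresholded similarity graph: genomes joined by an edge above the cut-off (AAI > 65% for genus, ANI > 95% for species) fall together. This minimal sketch assumes pairwise AAI/ANI values have already been computed by a dedicated tool and treats a cluster as a connected component of the thresholded graph, which is our reading rather than a procedure stated in the text; genome names and values are hypothetical. Connected components link genomes transitively (single linkage), so two genomes below the cut-off can still share a cluster through an intermediate.

```python
# Sketch under stated assumptions; identities are taken as precomputed.

def threshold_clusters(names, identity, cutoff):
    """Connected components of the graph with an edge wherever identity > cutoff.

    identity: dict mapping frozenset({a, b}) of two distinct genomes -> percent identity.
    Returns a sorted list of sorted clusters; genomes with no edges are singletons.
    """
    parent = {n: n for n in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, value in identity.items():
        if value > cutoff:
            a, b = tuple(pair)
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    ani = {frozenset(("A", "B")): 97.0, frozenset(("B", "C")): 90.0,
           frozenset(("C", "D")): 96.0, frozenset(("A", "C")): 89.0}
    print(threshold_clusters(names, ani, 95.0))  # [['A', 'B'], ['C', 'D']]
```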
We also investigated whether the Torix and Rickettsia clades are enriched for particular COGs (Supplementary Data 1). Among the most highly enriched genes in the Torix clade were genes encoding invasion-associated proteins, such as the exopolysaccharide synthesis protein ExoD (COG3932) and the invasion-associated protein IalB (COG5342), as well as a carbonic anhydrase (COG0288) and a chloramphenicol resistance-associated protein (COG3896). Both carbonic anhydrase and ExoD homologs have already been reported in the Torix clade [45], and our results here further support their importance in Torix biology. ExoD
[Fig. 3 image: bar chart of intersection size (shared gene clusters) across Rickettsia (88), Rhyzobius (1), Torix (Ca. Tisiphia) (12) and Ca. Megaira (2), with bars annotated as known COG or unknown]

Fig. 3 Gene content comparison. Shared and unique gene clusters across the putative genus clusters Rickettsia, Rhyzobius, Torix and Ca. Megaira, as suggested by GTDB-Tk. Vertical coloured bars represent the size of intersections (the number of shared gene clusters) between genomes in descending order, with known COG functions displayed in coral and unknown in blue. Black dots mean the cluster is present, and connected dots represent gene clusters that are present across groups. Numbers in parentheses represent the number of genomes used in the analysis. Source data are provided in Source Data.
[Fig. 4 image: gene clusters versus number of genomes for Rickettsia and Torix (Ca. Tisiphia); panels: gene cluster accumulation, core genome, unique gene cluster accumulation]

Fig. 4 Gene cluster accumulation analysis. a Pangenome accumulation curves. b Core genome accumulation curves. c The unique genome of the Rickettsia (red) and Torix (turquoise) clades as a function of the number of genomes sequenced. Each point represents the mean value, while error bars represent ± standard deviation based on 100 permutations. Source data are provided in Source Data.