Trajectories for the evolution of bacterial CO2-concentrating mechanisms

Significance
The emergence of biological novelty is often coupled to the evolution of Earth’s chemical environment. Here, we studied how the evolution of a bacterial CO2-concentrating mechanism (CCM), a complex, multicomponent system that enables modern CO2-fixing bacteria to grow robustly in environments with low CO2, depends on environmental CO2 levels. Using a “synthetic biological” approach to assay the growth of present-day bacteria engineered to resemble ancient ones, we show that the emergence of bacterial CCMs can be explained if atmospheric CO2 was once much higher than today, consistent with geochemical proxies. Taken together, our results delineate an unexpected “CO2-catalyzed” pathway for the evolution of bacterial CCMs, whose repeated emergence has been challenging to understand.


Manipulation of the C. necator genome
The knockout mutant C. necator ΔA0006 Δcan Δcaa was produced by iterative rounds of homologous recombination (to generate a desired mutation), each followed by sacB counterselection to cure the kanamycin resistance marker integrated at the target locus (3). Homologous recombination was achieved by conjugation with E. coli S17 carrying a mobilizable vector encoding 500 bp homology arms flanking a cassette encoding kanamycin resistance and sacB counterselection. For each individual knockout, a pKD19-mobSacB plasmid was generated with 500 bp homology arms directly flanking the target gene. This plasmid was transferred into C. necator by conjugation with E. coli S17, and transconjugants were plated onto LB agar supplemented with 200 μg/ml kanamycin to select for integrants and 10 μg/ml gentamicin to select against residual E. coli.
Single-integrant colonies were inoculated into LB with 10 μg/ml gentamicin and 20 μg/ml kanamycin and incubated at 30°C until turbid. Genomic integration was verified by colony PCR using a primer pair in which one primer annealed to the genome and the other to the plasmid backbone. Verified colonies were inoculated into salt-free LB (10 g/L tryptone, 5 g/L yeast extract) supplemented with 10 μg/ml gentamicin and 100 mg/ml sucrose and incubated at 30°C for 48-72 hours to select against cells retaining sacB. Strains were then streaked on two different LB plates: one without NaCl but containing 10 μg/ml gentamicin and 50 mg/ml sucrose, and a second with NaCl, 10 μg/ml gentamicin and 200 μg/ml kanamycin. Colonies that grew on sucrose but not on kanamycin were genotyped by colony PCR using a pair of primers that annealed upstream and downstream of the target gene. PCR products were run on an agarose gel to ensure that prospective knockouts were not wild-type revertants. The final strain, C. necator ΔA0006 Δcan Δcaa, was further verified by phenotype: it fails to grow heterotrophically in ambient air but grows under elevated CO2 (4,5).

Plasmid transformation of C. necator
To enable routine electroporation of plasmids into C. necator H16, we first knocked out the hsdR homolog A0006, because removing this restriction enzyme increases electroporation efficiency (3,6). Electrocompetent stocks of C. necator ΔA0006-derived strains (including the various knockouts) were made according to a protocol from (3), with the following modifications. A colony of the strain was inoculated into LB with 10 μg/ml gentamicin. Once turbid, the pre-culture was added to 100 ml fresh medium and grown until it reached an OD600 of 0.6-0.8. ΔA0006 was grown in ambient CO2 and ΔA0006ΔcanΔcaa was grown in 10% CO2. Cells were then chilled with shaking in an ice slurry until they reached 4°C. The culture was split into two 50 ml Falcon tubes and centrifuged at 4000g for 10 minutes at 4°C. The supernatant was decanted, and pellets were washed twice with 50 ml ice-cold sterile water and once with 50 ml 10% glycerol. The pellets were then resuspended in 0.75 ml 10% glycerol and pooled. Aliquots of 100 μl were flash-frozen in liquid nitrogen for storage at -80°C.
For plasmid transformation, a 100 μl aliquot of C. necator was thawed on ice. Upon thawing, 500 ng of plasmid was added and gently mixed, and the mixture was incubated on ice for 5 minutes. The aliquot was then transferred to a 1 mm electroporation cuvette (Bio-Rad Gene Pulser) and pulsed in a Gene Pulser Xcell Microbial System electroporator (2300 V, 200 Ω, 25 µF). The sample was then immediately resuspended in 1 ml of LB supplemented with 10 mg/ml fructose, transferred into a 14 ml round-bottom Falcon tube, and recovered at 30°C for 2 hours (H16 ΔA0006 in ambient CO2, H16 ΔA0006ΔcanΔcaa in 10% CO2). Then, 200 μl was plated on LB agar plates with 10 μg/ml gentamicin, 200 μg/ml kanamycin, and 10 mg/ml fructose and placed in a 30°C incubator at ambient CO2 or 10% CO2 (depending on the strain) for 48 hours.

Modeling, data and analysis
The dual-limitation model was developed in Mathematica 12 (Wolfram), and steady-state solutions were translated to Python for further analysis and plotting. All data analysis was performed using Python 3.8 and Jupyter notebooks. The data and code required to generate all figures are available at https://github.com/flamholz/ccm_evolution.

Modeling the co-limitation of autotrophic growth
Carbonic anhydrase cannot reasonably act as a CO2 pump alone
Our model considers an autotroph with no CCM that uses rubisco to fix CO2 in an environment with fixed extracellular CO2 and HCO3⁻ concentrations, C_out and H_out. We further assume that these extracellular species are in equilibrium with respect to the pH, i.e. that H_out/C_out = K_EQ(pH). We also assume that the intracellular pH equals the extracellular pH, so that the pH-dependent equilibrium constant K_EQ(pH) is the same on both sides of the cell membrane. This assumption of equal pH is not required, but it simplifies the model (7). We now write differential equations describing the time evolution of the intracellular CO2 and HCO3⁻ concentrations, C_in and H_in. At first we ignore the HCO3⁻ dependence of growth, to illustrate why it must be included.
CO2 and HCO3⁻ have diffusion constants of ≈10^3 μm^2/s in water. Over the ≈1 μm lengths of bacterial cells, this corresponds to diffusion timescales of R^2/6D ≈ 10^-4 s, so we assume that their concentrations are spatially homogeneous inside and outside the cell (8). While cytoplasm is more viscous than water (9, 10), these effects depend on the size of the diffusant. Diffusion constants measured for small molecules (< 1 kDa) are about fourfold smaller in cytoplasm than in water (9), which still leaves diffusion timescales below a millisecond over bacterial cell lengths. We also assume that all enzyme-catalyzed reactions follow first-order kinetics, i.e. that substrate concentrations are substantially lower than Michaelis constants ([S] ≪ K_M). These assumptions give the following equations:

$$\frac{dC_{in}}{dt} = \alpha\,(C_{out} - C_{in}) - \gamma\,C_{in} - (\delta\,C_{in} - \phi\,H_{in})$$

$$\frac{dH_{in}}{dt} = \beta\,(H_{out} - H_{in}) + (\delta\,C_{in} - \phi\,H_{in})$$

Here we treat both CO2 and HCO3⁻ as entering the cell passively with "effective permeabilities" α and β.
These effective permeabilities account for the surface area to volume ratio of bacterial cells. For rod-shaped cells around the size of E. coli, this ratio is SA/V ≈ 4 μm^-1 (BNIDs 101792 and 114924), as we discuss below. The term γC_in is a linearized expression for the rate of irreversible CO2 fixation by rubisco, where γ = k_cat[rubisco]/K_M, assuming a Michaelis-Menten formalism and C_in ≪ K_M. In contrast to rubisco, the CA reaction is reversible. As such, (δC_in − ϕH_in) is the balance of the rates of CO2 hydration (δC_in) and HCO3⁻ dehydration (ϕH_in), assuming that each of these reactions is also in its linear regime. The assumption of linearity is not required, but it is also not counterfactual: typical K_M values measured for bacterial rubiscos (11) and carbonic anhydrases (12) are comparable to the equilibrium concentrations of CO2 and HCO3⁻ in water equilibrated with ambient air at 25°C (Figure S11).
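We set both derivatives to 0 and solve for the steady-state values of C_in and H_in:

$$C_{in} = \frac{\alpha(\beta + \phi)\,C_{out} + \beta\phi\,H_{out}}{\beta(\alpha + \gamma + \delta) + \phi(\alpha + \gamma)}, \qquad H_{in} = \frac{\alpha\delta\,C_{out} + \beta(\alpha + \gamma + \delta)\,H_{out}}{\beta(\alpha + \gamma + \delta) + \phi(\alpha + \gamma)}$$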
This calculation implies that CA expression could increase C_in by at most ≈10%. CAs are not coupled to any energy source and therefore cannot increase C_in above C_out. In the absence of CA, C_in = αC_out/(α + γ), which is ≈0.91 C_out for α = 10^4 s^-1 and γ = 10^3 s^-1. This calculation depends, of course, on the rubisco kinetics and expression (γ) and on the membrane permeability to CO2 (α). Rubisco kinetics have been studied in great depth and are well constrained (11,13). Similarly, many generations of physical chemists have studied the permeability of lipid membranes to small molecules and developed theory to estimate membrane permeabilities (14-16). Nonetheless, membrane permeabilities can depend on the lipid composition of the membrane and on the complement of protein channels embedded in it (17).
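The sketch below checks this bound numerically. It assumes the example values used in the next paragraph (C_out = 10 μM, α = 10^4 s^-1, γ = 10^3 s^-1), K_EQ = 10, and β ≈ 1.5 × 10^-2 s^-1 from the permeability estimates in the parameter section. The dehydration constant is tied to the hydration constant through the Haldane relation, ϕ = δ/K_EQ. The very large β values are hypothetical extremes, included only to show that the bound holds even if HCO3⁻ crossed the membrane freely. The function name is illustrative.

```python
"""Sketch: how much can CA alone raise C_in in the naive model (no bicarboxylation)?"""


def naive_steady_state(c_out, k_eq, alpha, beta, gamma, delta):
    """Return steady-state (C_in, H_in) in uM for the naive CA + rubisco model.

    phi is set by the Haldane relation (phi = delta / k_eq), so the CA
    reaction carries no net flux when H_in / C_in = k_eq.
    """
    phi = delta / k_eq
    h_out = k_eq * c_out
    denom = beta * (alpha + gamma + delta) + phi * (alpha + gamma)
    c_in = (alpha * (beta + phi) * c_out + beta * phi * h_out) / denom
    h_in = (alpha * delta * c_out + beta * (alpha + gamma + delta) * h_out) / denom
    return c_in, h_in


C_OUT = 10.0   # uM, extracellular CO2
K_EQ = 10.0    # H_out / C_out near pH 7.1
ALPHA = 1e4    # 1/s, effective CO2 permeability
BETA = 1.5e-2  # 1/s, effective HCO3- permeability (assumed from P_H * SA/V)
GAMMA = 1e3    # 1/s, rubisco activity

# Without CA (delta = 0), C_in reduces to alpha * C_out / (alpha + gamma).
c_no_ca, _ = naive_steady_state(C_OUT, K_EQ, ALPHA, BETA, GAMMA, delta=0.0)

# Scan CA activity, and beta up to hypothetical, very permeable membranes.
best_gain = max(
    naive_steady_state(C_OUT, K_EQ, ALPHA, beta, GAMMA, delta)[0] / c_no_ca
    for delta in (1e-2, 1.0, 1e2, 1e4, 1e6)
    for beta in (BETA, 1e2, 1e6)
)

print(f"C_in without CA: {c_no_ca:.2f} uM ({c_no_ca / C_OUT:.0%} of C_out)")
print(f"largest CA-driven increase in C_in: {best_gain - 1:.1%}")
print(f"upper bound gamma / alpha: {GAMMA / ALPHA:.0%}")
```

In this sketch, raising δ with the estimated β barely moves C_in. Only when β is also made implausibly large does C_in approach C_out, and even then it never exceeds the ≈10% (γ/α) ceiling.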
Assuming that rubisco fixation is the sole growth-limiting reaction, we can estimate the exponential growth rate from C_in by calculating the rubisco fixation rate γC_in ≈ 9x10^3 μM/s. Here we took C_out ≈ 10 μM, which is roughly Henry's law equilibrium with the present-day atmosphere at 25°C (Fig. S16), α = 10^4 s^-1 and γ = 10^3 s^-1. We explain this choice of values in the main text and below. Assuming a cell volume of ≈1 fL (BNIDs 104843, 100004), 9x10^3 μM/s equals a fixation rate of roughly 5x10^6 CO2/s, or ≈10^10 CO2/hr. An E. coli cell of this volume contains ≈10^10 carbon atoms (BNID 103010), and Cyanobacteria do not differ substantially from E. coli in carbon content (compare BNIDs 105530 and 111459). Therefore, assuming no loss of fixed carbon, such a cyanobacterium would double roughly once an hour. Autotrophic respiration, which equals the difference between gross and net fixation, is typically less than 50% of gross fixation, both in pure cyanobacterial cultures (18) and in natural ecosystems (19). This implies a doubling time of at most ≈2 hours.
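The unit conversion in this estimate can be reproduced as shown below. This sketch covers only the arithmetic. It uses the values quoted in the paragraph (C_out ≈ 10 μM, α = 10^4 s^-1, γ = 10^3 s^-1, a cell volume of ≈1 fL, and ≈10^10 carbon atoms per cell) and takes C_in from the no-CA steady state αC_out/(α + γ).

```python
AVOGADRO = 6.022e23      # molecules per mol
CELL_VOLUME_L = 1e-15    # ~1 fL (BNIDs 104843, 100004)
CARBON_PER_CELL = 1e10   # ~1e10 carbon atoms per cell (BNID 103010)


def per_cell_rate(rate_uM_per_s, volume_l=CELL_VOLUME_L):
    """Convert a volumetric rate (uM/s) into molecules per cell per second."""
    return rate_uM_per_s * 1e-6 * volume_l * AVOGADRO


alpha, gamma, c_out = 1e4, 1e3, 10.0
c_in = alpha * c_out / (alpha + gamma)   # no-CA steady state, ~9.1 uM
fixation_uM_s = gamma * c_in             # ~9e3 uM/s
co2_per_s = per_cell_rate(fixation_uM_s)
co2_per_h = co2_per_s * 3600

# Hours needed to fix one cell's worth of carbon, assuming no loss of fixed carbon.
hours_per_cell_carbon = CARBON_PER_CELL / co2_per_h

print(f"fixation: {fixation_uM_s:.2g} uM/s = {co2_per_s:.1e} CO2/s = {co2_per_h:.1e} CO2/h")
print(f"time to fix 1e10 C atoms: {hours_per_cell_carbon:.1f} h")
```

The unrounded product is ≈2 × 10^10 CO2 per hour, so the hourly doubling quoted above is an order-of-magnitude estimate. Allowing up to 50% autotrophic respiration at most doubles the time needed.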
Given the model articulated above, a 10% increase in C_in (e.g. due to CA expression) can increase the rubisco carboxylation rate by at most 10%. As rubisco is required for producing all biomass carbon in autotrophy, a 10% increase in the rate of rubisco carboxylation can increase the exponential growth rate by at most 10%. However, in Figures S5-6 the "rubisco alone" strain did not meaningfully grow in 0.5% CO2, while the strains expressing a CA or Ci transporter grew robustly. These qualitative effects indicated that we should look for a mechanism that can improve growth by more than ≈10%. As described in the main text and the following section, the cellular demand for HCO3⁻, which is required for several anabolic carboxylation reactions (20-23), is one such mechanism.
Notably, CO2 and HCO3⁻ also interconvert spontaneously. The spontaneous reaction is relatively slow, with δ ≈ 10^-2 s^-1 near pH 7 (7,24). This value is similar in scale to β, which is several orders of magnitude smaller than α (7). Therefore, the simplified equations above are a reasonable description near pH 7.
A model of autotrophy including the HCO3⁻ dependence of growth
The calculation above indicates that CA cannot act as a CO2 pump and, therefore, that some factor is missing from the naive model of autotrophy given above. We assume that the missing factor is the ubiquitous dependence of microbial growth on HCO3⁻. This dependence is well documented for heterotrophic microbes, which require carbonic anhydrases for growth in ambient air (4,5,20-22,26). It is argued to stem from the reliance on HCO3⁻-dependent carboxylases in nucleotide, amino acid, and lipid biosynthesis (20-22). This view is supported by simple chemical logic: CO2 is very cell-permeable (see discussion below), so it is unlikely that CO2 is growth-limiting when it is available extracellularly. Experiments in yeast (21) and S. pneumoniae (22) provide further support, showing that supplementing the growth medium with the products of these carboxylation reactions (e.g. fatty acids) rescues growth in ambient air. Moreover, recent experiments show that ambient-air growth of an E. coli CA mutant is rescued when a cyanobacterial Na⁺:HCO3⁻ symporter, sbtA, is expressed (27,28). Similar CA dependencies have been observed in land plants (23), and manually curated metabolic models of autotrophs include these same carboxylation reactions (23,29,30). This indicates that the dependence of growth on HCO3⁻ is very widespread, perhaps even universal.
The primary enzymology research supporting the use of HCO3⁻ as the carboxylation substrate by pivotal anabolic enzymes is extensive. Biotin-dependent carboxylases use HCO3⁻ as part of a multi-step mechanism (31, 32). Examples include acetyl-CoA carboxylase, which produces malonyl-CoA to initiate fatty acid biosynthesis, and pyruvate carboxylase, the primary anaplerotic carboxylase in many organisms.
Carbamoyl phosphate synthetase generates carbamoyl phosphate for arginine and pyrimidine biosynthesis from HCO3⁻, an amine donor (ammonia or glutamine), and 2 ATP (33). Purine biosynthesis likewise requires a carboxyl donor to generate 4-carboxy-5-aminoimidazole ribonucleotide. In bacteria this reaction is typically catalyzed by a pair of enzymes, purE and purK, which use HCO3⁻ (34, 35). Eukaryotes instead typically rely on a "class II" purE that is thought to use CO2 (21,35).
Our experiments demonstrate that both CCMB1 E. coli and the model facultative chemolithoautotroph C. necator depend on CA for robust rubisco-dependent growth at intermediate CO2 levels (0.5% and 1.5%, Fig. 5). Note that C. necator was grown under autotrophic conditions in Figure 5, with H2 as the electron donor and CO2 as the only carbon source (Methods). The C. necator growth defect was fully reversed by expression of the DAB2 Ci transporter (Figure 5), which is understood to produce intracellular HCO3⁻ by vectorial hydration of extracellular CO2. We therefore interpret these data as supporting the hypothesis that C. necator depends on intracellular HCO3⁻ for autotrophic growth in ambient air.
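Following the example of the Farquhar model of photosynthesis (38), we assume that the flux to biomass J_B is the minimum of two fluxes: the CO2-dependent flux through rubisco (γC_in) and the flux through HCO3⁻-dependent carboxylation reactions (ωH_in). This co-limitation is expressed as J_B = min(γC_in, ωH_in/q), where q is the fraction of biomass carbon deriving from HCO3⁻. The exponential growth rate λ can be estimated from J_B by noting that a typical bacterial cell contains ≈10^10 carbon atoms (BNID 103010). For simplicity we ignore the carbon cost of cellular maintenance, though this could be included in future versions of the model. Adding bicarboxylation (ωH_in) and energized Ci uptake (χH_out) to the equations above gives

$$\frac{dC_{in}}{dt} = \alpha\,(C_{out} - C_{in}) - \gamma\,C_{in} - (\delta\,C_{in} - \phi\,H_{in})$$

$$\frac{dH_{in}}{dt} = \beta\,(H_{out} - H_{in}) + \chi\,H_{out} + (\delta\,C_{in} - \phi\,H_{in}) - \omega\,H_{in}$$

Setting both derivatives to 0 gives the steady-state solutions

$$C_{in} = \frac{\alpha(\beta + \phi + \omega)\,C_{out} + \phi(\beta + \chi)\,H_{out}}{(\alpha + \gamma + \delta)(\beta + \omega) + \phi(\alpha + \gamma)}$$

$$H_{in} = \frac{\alpha\delta\,C_{out} + (\alpha + \gamma + \delta)(\beta + \chi)\,H_{out}}{(\alpha + \gamma + \delta)(\beta + \omega) + \phi(\alpha + \gamma)}$$

These values determine the steady-state rates of rubisco carboxylation and bicarboxylation, which in turn determine the biomass production flux and the exponential growth rate.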
It is evident from these expressions that the rate of biomass production, J_B = min(γC_in, ωH_in/q), will depend on H_in in some circumstances and on C_in in others. For example, if we assume δ, ϕ, χ ≈ 0, we find H_in = βK_EQ C_out/(β + ω). Therefore, if CA and Ci uptake activities are negligible and the HCO3⁻ permeability β is much smaller than the bicarboxylation activity ω, H_in will be small and growth will be limited by low bicarboxylation flux. In the following sections we describe how we set reasonable ranges for all model parameters, in order to examine how autotrophic biomass production depends on the activity of rubisco, CA, and Ci uptake systems.
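The steady-state expressions and the co-limitation rule can be evaluated directly, as in the minimal sketch below. It is not the authors' Mathematica or Python code, and it rests on several assumptions:
- Ci uptake is modeled as χH_out.
- ϕ = δ/K_EQ, following the Haldane relation.
- ω = γ/100.
- q in J_B = min(γC_in, ωH_in/q) is read as the 1% fraction of biomass carbon derived from HCO3⁻.

Other defaults (C_out = 15 μM, α ≈ 1.2 × 10^4 s^-1, β ≈ 1.5 × 10^-2 s^-1, γ = 10^3 s^-1) follow the parameter estimates in the next section. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Q_FRACTION = 0.01  # assumed: 1% of biomass carbon comes from HCO3- carboxylation


@dataclass
class Params:
    c_out: float = 15.0    # extracellular CO2, uM
    k_eq: float = 10.0     # H_out / C_out near pH 7.1
    alpha: float = 1.2e4   # effective CO2 permeability, 1/s
    beta: float = 1.5e-2   # effective HCO3- permeability, 1/s
    gamma: float = 1e3     # rubisco activity, 1/s
    delta: float = 0.0     # CA hydration activity, 1/s
    chi: float = 0.0       # energized Ci uptake, 1/s
    omega: float = 10.0    # bicarboxylation activity, 1/s (gamma / 100)

    @property
    def phi(self) -> float:
        """CA dehydration constant fixed by the Haldane relation."""
        return self.delta / self.k_eq

    @property
    def h_out(self) -> float:
        return self.k_eq * self.c_out


def steady_state(p: Params) -> Tuple[float, float]:
    """Closed-form steady state (C_in, H_in) in uM."""
    a, b, g, d, x, w, f = p.alpha, p.beta, p.gamma, p.delta, p.chi, p.omega, p.phi
    denom = (a + g + d) * (b + w) + f * (a + g)
    c_in = (a * (b + f + w) * p.c_out + f * (b + x) * p.h_out) / denom
    h_in = (a * d * p.c_out + (a + g + d) * (b + x) * p.h_out) / denom
    return c_in, h_in


def biomass_flux(p: Params, c_in: float, h_in: float, q: float = Q_FRACTION) -> float:
    """Co-limitation: J_B = min(gamma * C_in, omega * H_in / q), in uM/s."""
    return min(p.gamma * c_in, p.omega * h_in / q)


for label, params in (("no CA", Params()), ("CA, delta = 1e4 /s", Params(delta=1e4))):
    c_in, h_in = steady_state(params)
    j_b = biomass_flux(params, c_in, h_in)
    limited_by = "rubisco" if j_b == params.gamma * c_in else "bicarboxylation"
    print(f"{label}: C_in = {c_in:.2f} uM, H_in = {h_in:.3g} uM, "
          f"J_B = {j_b:.3g} uM/s ({limited_by}-limited)")
```

Under these assumptions, adding CA changes C_in by less than 1% but raises H_in by more than two orders of magnitude. The modeled cell therefore shifts from bicarboxylation-limited to rubisco-limited growth, the qualitative behavior that the naive model cannot capture.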

Choosing realistic ranges for parameter values
For simplicity, we assume that the pH is the same inside and outside the cell. We choose pH 7.1 because the effective pK_a between CO2 and HCO3⁻ is roughly 6.1 at biological salt concentrations (see supplement of (7) for details). According to the Henderson-Hasselbalch relation, pH = pK_a + log10(H_out/C_out), so K_EQ = 10^(pH − pK_a) = 10 at pH 7.1. Since we seek to explain phenotypes observed at relatively low CO2 levels (e.g. ambient air in Fig. 6 and 0.5-1.5% CO2 in Figs. 4-5), we assume that the extracellular CO2 concentration is in Henry's law equilibrium with the present-day atmosphere (≈0.04% CO2). This gives C_out ≈ 15 μM (7, 39) and, with K_EQ = 10, H_out = 150 μM (Fig. S16). For the permeability of the cell membrane to CO2 and HCO3⁻, we use P_C = 3x10^3 μm/s and P_H = 10^(3.2−pH) x 30 μm/s ≈ 4x10^-3 μm/s, following (7). The latter relation calculates the permeability of HCO3⁻ from the pH-dependent abundance and the permeability of H2CO3, assuming that HCO3⁻ itself has negligible permeability compared to H2CO3 because of its charge. This calculation is described in detail in the supplement of (7). We multiply these permeabilities by the surface area to volume ratio SA/V ≈ 4 μm^-1 to obtain α ≈ 1.2 × 10^4 s^-1 and β ≈ 1.5 × 10^-2 s^-1.

We are left to choose ranges for the enzymatic activity parameters γ, δ, ϕ, ω and χ. First, we note that the CA activity parameters δ and ϕ must be consistent with the equilibrium constant K_EQ (i.e. they must obey the Haldane relation). If the CA reaction were allowed to equilibrate, it would carry no net flux, and δC_in = ϕH_in. Under these conditions H_in/C_in = K_EQ, giving ϕ = δ/K_EQ.
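The arithmetic behind these estimates is reproduced below as a sketch. The constants are those quoted in this paragraph. The expression for P_H is taken as given from (7), not derived here.

```python
PKA = 6.1        # effective CO2/HCO3- pKa at biological salt concentrations
PH = 7.1         # assumed intra- and extracellular pH
C_OUT = 15.0     # uM, Henry's-law equilibrium with ~0.04% CO2
P_C = 3e3        # CO2 membrane permeability, um/s
SA_V = 4.0       # surface area to volume ratio, 1/um

# Henderson-Hasselbalch: pH = pKa + log10(H_out / C_out)  =>  K_EQ = 10**(pH - pKa)
k_eq = 10 ** (PH - PKA)
h_out = k_eq * C_OUT

# HCO3- permeability via the neutral H2CO3 species, following the relation quoted from (7)
p_h = 10 ** (3.2 - PH) * 30.0   # um/s

# Effective first-order permeabilities (1/s) = permeability (um/s) * SA/V (1/um)
alpha = P_C * SA_V
beta = p_h * SA_V

print(f"K_EQ = {k_eq:.1f}, H_out = {h_out:.0f} uM")
print(f"P_H = {p_h:.1e} um/s, alpha = {alpha:.1e} /s, beta = {beta:.1e} /s")
```

The six orders of magnitude separating α and β reflect the much higher membrane permeability of neutral CO2 compared with charged HCO3⁻.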
To set ranges for the enzyme activities γ (rubisco carboxylation) and δ (CO2 hydration by CA), we reviewed literature values of k_cat/K_M for rubiscos (11) and CAs (12,40). The geometric mean of measured rubisco k_cat/K_M values is ≈0.2 μM^-1 s^-1, with a multiplicative standard deviation of roughly two-fold (Figure S11). A typical protein concentration might range between 0.1 and 100 μM (25). As rubisco is typically one of the most abundantly expressed proteins in autotrophic cells (41), we extend this range to 0.1 μM to 1 mM, implying that γ ranges from ≈10^-2 to 10^3 s^-1. Note that we use μM units for both the enzyme and the substrate, so that γC_in has units of μM/s carbon consumed. For CA, the geometric mean k_cat/K_M value in the direction of CO2 hydration is ≈20 μM^-1 s^-1, with a multiplicative standard deviation of roughly seven-fold (Figure S11). CA is not typically as highly expressed as rubisco, so a plausible range for δ is perhaps 0.1-10^4 s^-1 when a CA is expressed. As noted above, the spontaneous reaction is characterized by δ ≈ 10^-2 s^-1. Figure S11 also shows that rubisco k_cat values range from roughly 1 to 10 s^-1 (geometric mean 3.3 s^-1, with a multiplicative standard deviation of 1.5-fold). k_cat values for CA-catalyzed CO2 hydration range from ≈10^4 to 10^6 s^-1 (geometric mean 1.3x10^5 s^-1, with a multiplicative standard deviation of 6.4-fold).
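One way to reproduce these ranges is to multiply the geometric-mean k_cat/K_M by the enzyme concentration range and then widen the result by one multiplicative standard deviation in each direction. That widening step is our assumption about how the quoted ranges were rounded; the text does not state this procedure. The 0.1-100 μM concentration range used for CA is likewise an assumption.

```python
import math


def activity_range(kcat_km, mult_sd, conc_lo_uM, conc_hi_uM):
    """First-order rate constant range (1/s) = (kcat/KM) * [enzyme].

    kcat_km is a geometric mean in 1/(uM s); the range is widened by one
    multiplicative standard deviation on each side (an assumption of this sketch).
    """
    return (kcat_km / mult_sd) * conc_lo_uM, (kcat_km * mult_sd) * conc_hi_uM


# Rubisco: kcat/KM ~0.2 /(uM s), ~2-fold spread, 0.1 uM - 1 mM enzyme
gamma_lo, gamma_hi = activity_range(0.2, 2.0, 0.1, 1000.0)
# CA (CO2 hydration): kcat/KM ~20 /(uM s), ~7-fold spread, 0.1-100 uM enzyme (assumed)
delta_lo, delta_hi = activity_range(20.0, 7.0, 0.1, 100.0)

for name, lo, hi in (("gamma", gamma_lo, gamma_hi), ("delta", delta_lo, delta_hi)):
    print(f"{name}: {lo:.2g}-{hi:.2g} 1/s (log10 {math.log10(lo):.1f} to {math.log10(hi):.1f})")
```

Under this reading, the results land within about half an order of magnitude of the quoted ≈10^-2 to 10^3 s^-1 (γ) and 0.1 to 10^4 s^-1 (δ) ranges.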
Only ω and χ remain to be set. We chose ω = γ/q with q = 100. This reflects our assumption that rubisco and bicarboxylation processes contribute to biomass production in a roughly fixed proportion (q), with rubisco responsible for nearly all biomass carbon in autotrophy (we assume 99%) and bicarboxylation for the remainder (1%). We used the same value of q in calculating the biomass flux from the principle of co-limitation, i.e. J_B = min(γC_in, ωH_in/q). This amounts to assuming that the cell regulates its bicarboxylation and rubisco capacities to match their relative contributions to biomass production. Our assumption that ω is proportional to γ can be omitted, but doing so yields a model with an additional free parameter that is challenging to constrain from data.
To set χ, we consider measurements of saturated Ci uptake rates in Cyanobacteria, which are on the order of 10-100 μmol per mg chlorophyll per hour (42). A typical cyanobacterial cell contains ≈10^-11 mg chlorophyll (43), so per-cell rates are at most 10^-9 μmol/hour, or 3x10^-13 μmol/s, into a volume of ≈1.5 μm^3 = 1.5x10^-15 L. Uptake rates in this range would contribute ≈+200 μM/s to dH_in/dt. If χH_out ≤ 200 μM/s and, depending on the pH, H_out = 100-2000 μM, then χ ≤ 2 s^-1. Note that Figures 7 and S12-15 use wider ranges for γ, δ and χ than calculated here, in order to illustrate the behavior of the model with two-dimensional plots.
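The bound on χ follows from a short chain of unit conversions, sketched below with the values quoted in this paragraph. The 100 μmol/(mg Chl·h) figure is the upper end of the cited range.

```python
UPTAKE_UMOL_PER_MG_CHL_H = 100.0  # upper end of 10-100 umol Ci / (mg Chl * h) (42)
CHL_MG_PER_CELL = 1e-11           # mg chlorophyll per cyanobacterial cell (43)
CELL_VOLUME_L = 1.5e-15           # ~1.5 um^3

umol_per_cell_s = UPTAKE_UMOL_PER_MG_CHL_H * CHL_MG_PER_CELL / 3600.0
uptake_uM_s = umol_per_cell_s / CELL_VOLUME_L   # umol / L / s == uM / s

# chi * H_out cannot exceed the maximal uptake rate, so chi <= uptake / H_out.
for h_out_uM in (100.0, 2000.0):
    print(f"H_out = {h_out_uM:.0f} uM -> chi <= {uptake_uM_s / h_out_uM:.2g} 1/s")

chi_max = uptake_uM_s / 100.0  # loosest bound, at the lowest H_out considered
print(f"max uptake ~ {uptake_uM_s:.0f} uM/s, chi_max ~ {chi_max:.1f} 1/s")
```

The unrounded uptake rate is ≈185 μM/s, which the text rounds to ≈200 μM/s.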

On the requirement for bicarbonate for biosynthesis
One way to examine the role of bicarbonate-dependent carboxylation in our model is to set the bicarboxylation rate constant ω = 0. This gives

$$C_{in} = \frac{\alpha(\beta + \phi)\,C_{out} + \phi(\beta + \chi)\,H_{out}}{\beta(\alpha + \gamma + \delta) + \phi(\alpha + \gamma)}, \qquad H_{in} = \frac{\alpha\delta\,C_{out} + (\alpha + \gamma + \delta)(\beta + \chi)\,H_{out}}{\beta(\alpha + \gamma + \delta) + \phi(\alpha + \gamma)}$$

C_in and H_in remain interdependent: the processes that produce H_in, like CO2 hydration by carbonic anhydrase (δ) and active Ci uptake (χ), appear in the equation for C_in, and vice versa. Nonetheless, these processes have a negligible effect on CO2 fixation by rubisco for two reasons. First, C_in alone determines the rubisco rate in our model. Second, literature values for CO2 permeability (α) are high enough that rubisco cannot draw C_in much below C_out, as described above. Figures S13 and S15 illustrate this point by showing that order-of-magnitude changes to γ, δ and χ do not substantially affect C_in. In particular, in Figure S13A, C_in ≈ C_out until rubisco activity γ reaches very high levels (≈10^4 s^-1). As a result, the carboxylation flux γC_in increases with the rubisco activity γ but is unaffected by CA activity (Figure S13G). This is a simple consequence of the high measured CO2 permeability of biological membranes (α). Figure S14 illustrates this point using a model with substantial Ci uptake activity (χ = 100 s^-1) and an unrealistically low value of α = 12 s^-1 (1000-fold smaller than we estimated above). Very low α values enable CA and Ci uptake to act in concert to pump CO2 into the cell. The cell (i) actively takes up HCO3⁻ and (ii) converts HCO3⁻ into CO2 via CA, and (iii) the resulting CO2 is retained only when the membrane permeability to CO2 (α) is much smaller than calculated or measured values (14,16).
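The contrast between realistic and artificially low CO2 permeability can be checked with the ω = 0 solution for C_in. In this sketch, the CA and uptake values (δ = 10^4 s^-1, χ = 100 s^-1), γ = 10^3 s^-1, and the remaining constants are illustrative choices, not the exact settings behind Figures S13-S14. The function name is hypothetical.

```python
def c_in_without_bicarboxylation(c_out, k_eq, alpha, beta, gamma, delta, chi):
    """Steady-state C_in (uM) with omega = 0; phi = delta / k_eq (Haldane relation)."""
    phi = delta / k_eq
    h_out = k_eq * c_out
    denom = beta * (alpha + gamma + delta) + phi * (alpha + gamma)
    return (alpha * (beta + phi) * c_out + phi * (beta + chi) * h_out) / denom


C_OUT, K_EQ, BETA, GAMMA = 15.0, 10.0, 1.5e-2, 1e3

ratios = {}
for alpha in (1.2e4, 12.0):  # estimated alpha vs. the 1000-fold smaller value of Figure S14
    base = c_in_without_bicarboxylation(C_OUT, K_EQ, alpha, BETA, GAMMA, delta=0.0, chi=0.0)
    boosted = c_in_without_bicarboxylation(C_OUT, K_EQ, alpha, BETA, GAMMA, delta=1e4, chi=100.0)
    ratios[alpha] = boosted / base
    print(f"alpha = {alpha:g} /s: C_in/C_out {base / C_OUT:.3f} -> {boosted / C_OUT:.3f} "
          f"({ratios[alpha]:.1f}x with CA + Ci uptake)")
```

With the estimated α, CA plus Ci uptake moves C_in by less than 10%. With α = 12 s^-1, the same activities raise C_in by nearly two orders of magnitude, consistent with the pumping behavior described for Figure S14.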
If order-of-magnitude changes to δ and χ do not affect C_in (when realistic values are used), then the rubisco carboxylation flux cannot change, and we must invoke another mechanism to explain the observed phenotypes. As discussed in the main text and above, we assumed that the underlying mechanism is the ubiquitous requirement for HCO3⁻ as the substrate for biosynthetic carboxylases. Once we described the growth rate as mathematically coupled to both rubisco carboxylation of CO2 and biosynthetic carboxylation of HCO3⁻, we found that changes in CA and Ci uptake activities do produce changes in growth (Figure S13).

A quantitative view of futile cycling
Figures 7C and S15 document the effects of simultaneously varying CA activity (δ) and Ci uptake (χ) on the co-limitation model of autotrophic growth. They show that futile cycling only occurs when both activities are present at high levels. As discussed in the main text, this quantitative view helped us understand why co-expression of CA and Ci uptake activities was not deleterious to CCMB1 or C. necator (Figures 4-5). Instead, it was beneficial to CCMB1, enabling modest growth in ambient air (Figure 6). This understanding relies on a fundamental difference between CA and Ci uptake: Ci uptake is energized and can work against equilibrium, while CAs are not coupled to any energy source and cannot.
Because CAs are not energy-coupled, they cannot cause any leakage or futile cycling on their own. This is clearly seen in Figure 7C or the bottom row of Figure S15: when CA activity was increased while Ci uptake was kept low, the modeled cell did not leak Ci. At most, CA expression can equilibrate the Ci pools on both sides of the membrane (Figure S15A-B). Based on a variety of experiments, Ci uptake systems are thought to use energy to concentrate HCO3⁻ in the cytoplasm, either by pumping extracellular HCO3⁻ or by energy-coupled hydration of CO2 at the cell membrane. The energy sources used range from ATP to redox and ion gradients (36,44,45). Regardless of the underlying mechanism, our current understanding of the CCM requires a high intracellular HCO3⁻ concentration that is, crucially, not in equilibrium with CO2 (7,46,47). This is understood to be why expression of cytoplasmic CA activity is highly deleterious to photosynthesis and growth in model Cyanobacteria (46).
Energy-coupled Ci uptake can therefore concentrate HCO3⁻ in the cytosol, and HCO3⁻ spontaneously dehydrates to CO2 on a timescale of ≈10 s (7,24). High χ values can therefore produce Ci leakage on their own. This is seen in Figures 7C and S15, where very high χ values lead to both CO2 and HCO3⁻ leakage, i.e. J_L,B, J_L,H > 0. Leakage of CO2 indicates that some HCO3⁻ dehydrates to CO2, some of which can be used by rubisco. CA expression amplifies this effect. When δ was increased at high χ, zero leakage (J_L,tot = J_L,B + J_L,H = 0) could be achieved at relatively lower χ (Figure 7C and bottom row of S15) without substantially altering the flux to biomass (J_B, depicted in log scale in Figure S15I). According to our model, therefore, modest co-expression of CA and Ci uptake can reduce the energy expended on pumping and balance the supply of CO2 and HCO3⁻ with the cellular demand for rubisco and bicarboxylation flux.
When δ and χ were both set to high values, the model produced substantial futile cycling, with J_L,tot/J_B ≈ 100 in extreme cases. First, note that these values (δ = χ = 10^3 s^-1) are several orders of magnitude higher than the upper bound we estimated for χ above (≈2 s^-1). Nonetheless, we can ask whether such a leakage rate should be expected to harm growth by comparing the energy expended on Ci pumping with that expended on CO2 fixation. Ci pumping consumes ≈1 ATP/carbon (7,45), while CO2 fixation in the Calvin-Benson-Bassham cycle consumes 2.3 ATP/carbon (48,49). Therefore, J_L,tot/J_B ≈ 100 implies that 40-50 times more cellular energy is expended on Ci pumping than on CO2 fixation (100 × 1/2.3 ≈ 43).

Supplementary Figures

Figure S1: H. neapolitanus CCM mutants grow in 5% CO2 but not in ambient air. Quantification of panel B of Fig. 1. Wild-type H. neapolitanus (WT) grows well in 5% CO2 (dark purple) and in ambient air (0.04% CO2, lighter purple), producing > 10^8 colony forming units per milliliter of culture in both conditions. Mutants lacking genes coding for essential CCM components grow in elevated CO2 (dark purple) but fail to grow in ambient air (light purple). The ΔcsosCA strain lacks the gene coding for the carboxysomal carbonic anhydrase (csosCA). The Δcsos2 strain lacks the gene coding for csos2, an unstructured protein required for carboxysome formation (50,51). Both mutant strains failed to grow in ambient air ("no growth") but grew robustly in 5% CO2 (≈10^8 colony forming units/ml). Bar heights give the mean of counts for three biological replicates, each of which represents the mean of three technical replicates. Error bars give the standard deviation of the mean. See Table S4 for a full description of strains and mutations.

Figure S2: Reproducibility of H. neapolitanus fitness measurements across replicate experiments in the same CO2 environment. All CO2 conditions were assayed via duplicate cultures with biologically independent pre-cultures, except for the 5% CO2 condition, which was assayed in biological quadruplicate. Scatterplots show the correlation between replicates for genes that produced high-confidence fitness measurements in both replicates, with known CCM genes in purple and all other genes in grey. The Pearson correlation R is given for all pairs of replicates plotted and exceeds 0.85 in all cases. Marginal distributions of per-replicate fitness effects are given by the "rug" along the axes. CCM gene disruptions (purple) represent the largest fitness effects observed in lower CO2 conditions, so the range of fitness effects decreases with increasing CO2.

Figure S3: Contributions of H. neapolitanus CCM genes to organismal fitness across five environmental CO2 concentrations.
As in Figure 2, data derive from batch competition assays of a barcoded whole-genome insertional mutagenesis library (RB-TnSeq) developed in (36). Data for the ambient and 5% CO2 conditions are reproduced from that reference, while data for the 0.5%, 1.5% and 10% CO2 conditions were collected for this study. Each competition assay was performed in duplicate, except for the 5% CO2 condition, which was performed in quadruplicate (i.e. biological duplicate in each study). We manually divided CCM-associated genes into several categories based on their known or presumed roles. The correspondence between genes and categories is given in Table S1. The figure plots the fitness effects of knockouts for each gene category as a function of the CO2 level. It includes three additional categories of genes omitted from Figure 2: putative transcriptional regulators of the CCM, rubisco chaperones, and the non-carboxysomal Form II rubisco ("non-carboxysomal rubisco"). The presence of a non-carboxysomal rubisco explains why mutations disrupting the carboxysomal enzyme are not very deleterious in 5-10% CO2: the secondary rubisco is expressed in those conditions (52). Genetic redundancy complicates the interpretation of fitness results for several other gene categories as well. For example, the H. neapolitanus genome encodes 6 carboxysome shell proteins, which differ in their abundances (53) and could have overlapping roles in the carboxysome structure (45,54). Five of these proteins are encoded by genes in the major carboxysome operon (36,45). This arrangement can cause polar effects, where knockout of an upstream gene has a larger effect because it perturbs transcription of genes encoded downstream (55). Likewise, H. neapolitanus has two DAB-type Ci uptake complexes. These complexes are encoded by 2-3 genes each, and both are functional when expressed in E. coli (36,56). This redundancy may explain the complex CO2-dependent phenotypes observed for "Ci transport" genes.
The "regulation" and "rubisco chaperones" categories are more ad hoc, as they group multiple genes with poorly documented roles. Knockout of the rubisco chaperone acRAF, for example, is associated with a sizable CO2-dependent fitness defect, though it is as yet unclear what role this gene plays in rubisco or carboxysome biogenesis in bacteria (1, 57).
Figure S4: Growth curves testing the effect of rubisco encapsulation on the growth of CCMB1 in various CO2 pressures. Each panel displays growth curves of four biological replicates grown in each of the four CO2 pressures marked; the CO2 pressure is denoted by the shade of orange. Figure 3 plots the endpoint densities of these curves (density at 100 hours). The CCMB1 E. coli strain grows in elevated CO2 (1.5 and 5%) when rubisco is expressed ("Rubisco Alone", top left). Expressing the full complement of CCM genes ("Full CCM", top middle) permits growth in all CO2 levels. Omitting the DAB-type Ci transporter from this construct ("Encap. Rub. + CA", top right) nonetheless improves growth above the "Rubisco Alone" baseline in 0.5% and 1.5% CO2. Mutating a single amino acid on rubisco (CbbL Y72R) eliminates carboxysome localization by abolishing CsoS2 binding (51). Introducing this mutation into the "Full CCM" construct ("Cytosolic Rub.", bottom left) abolishes growth in ambient air, as reported in (1), but not in 0.5% CO2 or higher. Therefore, carboxysome localization of rubisco is not required for robust growth in 0.5% CO2. Removing carboxysomal CA activity from the "Encap. Rub. + CA" construct by active-site mutation (CsoSCA C173S) abolishes the growth improvement observed when active CA is present ("Encapsulated Rub. Alone", bottom middle). This result implies that the robust growth of the "Cytosolic Rub." and "Encap. Rub. + CA" strains was due to carbonic anhydrase activity. A negative control strain carrying inactive rubisco ("Encap. Rub. -", CbbL K194M) fails to grow in any condition, as expected. See Table S4 for strains, Table S5 for plasmids and Methods for growth conditions.
Figure S5: Assessment of statistical significance of differences in endpoint culture densities for CCMB1 strains testing rubisco encapsulation. Data and labels are identical to Figure 3, but reordered to group different strains grown in the same CO2 condition. P-values were calculated by comparison to the 'Rubisco Alone' reference strain using a Bonferroni-corrected two-sided Mann-Whitney-Wilcoxon test. '*' denotes p < 0.05, '**' denotes p < 0.01, and '***' denotes p < 0.001. 'ns' denotes 'not significant' at the 5% threshold after Bonferroni correction.
[...] Figure 4, but reordered to group different strains grown in the same CO2 condition. P-values were calculated by comparison to the 'Rubisco Alone' reference strain using a Bonferroni-corrected two-sided Mann-Whitney-Wilcoxon test. '*' denotes p < 0.05, '**' denotes p < 0.01, and '***' denotes p < 0.001. 'ns' denotes 'not significant' at the 5% threshold after Bonferroni correction.
Figure S8: Expression of the cyanobacterial HCO3- transporter sbtA permits growth of rubisco-expressing CCMB1 in 0.5% CO2. Data give the number of colonies formed by each strain in three CO2 conditions on M9 glycerol plates. Colony forming units (CFU) were counted by plating tenfold serial dilutions of a pre-culture (grown in 10% CO2) in each condition. CFU are reported per OD per mL of pre-culture to account for variation in the growth of the pre-culture. Wild-type (WT) denotes E. coli BW25113, the parent strain of CCMB1. Here WT carries two vector control plasmids (pFE-sfGFP and pFA-sfGFP) so that it is resistant to the same antibiotics as the CCMB1 strains described below. As expected, WT grows in all CO2 levels tested. "CCMB1+rubisco" denotes CCMB1:p1A+pFA-sfGFP, where the p1A plasmid expresses the carboxysomal Form IA rubisco from H. neapolitanus and a cyanobacterial phosphoribulokinase (1). Consistent with Figures 3-4, this strain did not grow in 0.5% CO2 but did grow in 10% CO2. "CCMB1+rubisco+sbtA" denotes CCMB1:p1A+pFA-sbtA, which expresses the cyanobacterial HCO3- transporter sbtA (27,28,45) on the pFA backbone. Comparison with the previous strain shows that sbtA expression permitted growth in 0.5% CO2, as highlighted in the right panel focusing on 0.5% CO2. "CCMB1 neg" denotes CCMB1:pFE-sfGFP+pFA-sfGFP. This strain does not express rubisco or phosphoribulokinase and fails to grow in any of the CO2 conditions tested (similar to the rubisco point mutant "En. Rub -" in Figure 3). Anhydrotetracycline (aTc) is used to induce expression from the pFA and pFE plasmids; here all strains are induced with 100 nM aTc added to the agar plates (1). All CCMB1 strains failed to grow on plates lacking aTc (not shown).
[...] (58,59), k_cat gives the substrate-saturated per-active-site rate (top panels, s^-1), K_M denotes the substrate concentration at which the reaction rate reaches half of k_cat (middle panel, μM), and k_cat/K_M gives the per-active-site rate per unit substrate concentration in the limit of low substrate concentrations ([S] ≪ K_M). Rubisco data are drawn from (11) and CA data from (12). Carboxysomal rubiscos are of the form I (FI) variety that is also found in land plants (11,45). The H. neapolitanus genome also encodes an auxiliary form II (FII) rubisco. FII isoforms typically have higher k_cat values, but also lower affinity for CO2 (i.e. higher CO2 K_M values) than FI enzymes (60). Less data is available about the kinetics of form III (FIII) and form II/III (F2/3) rubiscos (61). Notice that K_M values for FI rubiscos are comparable to CO2 concentrations in water equilibrated with the present-day atmosphere at 25°C, indicated by the dashed gray line marked 0.04% CO2 (25).
Likewise, K_M values for CA-catalyzed hydration of CO2 greatly exceed the equilibrium CO2 concentrations. The empirical median k_cat/K_M value is 0.2 μM^-1 s^-1 (interquartile range 0.17-0.27 μM^-1 s^-1) for FI rubiscos and 20 μM^-1 s^-1 (interquartile range 5-98 μM^-1 s^-1) for CA-catalyzed hydration of CO2.
[...] Figure 7B, showing that the model exhibits two regimes: one in which growth is limited by rubisco flux and another in which it is limited by bicarboxylation flux. At low rubisco levels (lighter-colored lines), growth is rubisco-limited. Increasing rubisco activity (darker lines) produces faster growth, but the growth rate is insensitive to increasing CA activity, because slow spontaneous CO2 hydration provides enough HCO3- to keep pace with rubisco. At higher rubisco levels (maroon lines), growth is bicarboxylation-limited, and CA activity must increase for increased rubisco activity to translate into faster growth. (B) Varying Ci uptake activity leads to similar effects. Because we assume a spontaneous level of CO2 hydration even in the absence of CA (CA activity = 10^-2 s^-1), very high Ci uptake activities can increase growth by producing CO2 for rubisco in the rubisco-limited regime. This phenomenon is only apparent when Ci uptake activity is implausibly large and rubisco activity is small. It is nonetheless instructive for understanding the distinction between CA and energized Ci uptake. (C) As our co-limitation model is linear, varying the external CO2 concentration produces a proportional increase in the rubisco flux. Additionally, because we assume that extracellular HCO3- and CO2 are in equilibrium at the given pH, H_out increases proportionally with C_out and supplies sufficient HCO3- by passive diffusion and spontaneous hydration of CO2.
However, growth does not increase in proportion to rubisco activity as it does in panels A-B (solid lines represent values evenly spaced on a log scale). At higher bicarboxylation capacities (which scale with rubisco activity), passive diffusion and spontaneous hydration of CO2 are insufficient to supply the HCO3- required for a proportional increase. This can be seen by comparing the solid maroon line (CA activity = 10^-2 s^-1) with the dashed one (CA activity = 10 s^-1).
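The rubisco flux in the co-limitation model is first order in C_in. A first-order rate law of this kind can be read as the low-substrate limit of the Michaelis-Menten kinetics summarized in the kinetic-parameter caption above: the per-site rate k_cat[S]/(K_M + [S]) approaches (k_cat/K_M)[S] when [S] ≪ K_M. The minimal sketch below compares the two forms. It is an illustration, not the authors' code. The function names and the values k_cat = 3 s^-1 and K_M = 15 μM are hypothetical, chosen only so that k_cat/K_M equals the reported FI median of 0.2 μM^-1 s^-1.

```python
# Minimal sketch: Michaelis-Menten rate vs. its low-substrate (first-order) limit.
# kcat and KM below are hypothetical illustration values, not fitted data.


def mm_rate(kcat: float, km: float, s: float) -> float:
    """Per-active-site rate (s^-1) under Michaelis-Menten kinetics."""
    return kcat * s / (km + s)


def first_order_rate(kcat: float, km: float, s: float) -> float:
    """Low-substrate limit ([S] << KM): rate = (kcat/KM) * [S]."""
    return (kcat / km) * s


if __name__ == "__main__":
    kcat, km = 3.0, 15.0  # s^-1 and uM (hypothetical; kcat/KM = 0.2 uM^-1 s^-1)
    for s in (0.15, 1.5, 15.0, 150.0):  # substrate concentrations in uM
        exact = mm_rate(kcat, km, s)
        approx = first_order_rate(kcat, km, s)
        # The ratio exact/approx equals KM / (KM + [S]).
        print(f"[S]={s:7.2f} uM  MM={exact:.4f}  linear={approx:.4f}  ratio={exact / approx:.3f}")
```

The ratio of the two rates is K_M/(K_M + [S]). Because FI K_M values are comparable to ambient CO2 concentrations, a strictly linear rate law will overstate rubisco flux near saturation.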
Figure S13: Rubisco- and bicarboxylation-limited growth regimes in the co-limitation model. In each panel, the x-axis gives the CA activity (s^-1) and the y-axis the rubisco activity (s^-1). Color in the filled contour plots gives the quantity named in each panel title. We set the CO2 permeability to 1.2 × 10^4 s^-1 and the HCO3- permeability to 1.5 × 10^-2 s^-1, as calculated in the supplementary text. The Ci uptake activity was set to 0 for all panels. (A-B) C_in and H_in are the intracellular CO2 and HCO3- concentrations, respectively. Notice that C_in varies little over orders-of-magnitude changes in rubisco activity and is independent of CA activity, as discussed in the main text. (D-E) J_L,C = -(CO2 permeability)(C_out - C_in) and J_L,H = -(HCO3- permeability)(H_out - H_in) represent the fluxes of CO2 and HCO3- leakage from the cell. J_L,C is positive when C_in > C_out. It is negative when C_in < C_out, in which case there is net passive diffusion of CO2 into the cell. As we set the Ci uptake activity to 0, both leakage fluxes are uniformly negative here, connoting passive uptake of both CO2 and HCO3-. (F) J_L,tot = J_L,C + J_L,H is the total flux of Ci leakage from the cell. Notice that J_L,H contributes negligibly to J_L,tot here because no HCO3- is pumped when the Ci uptake activity is 0. (G) The rubisco carboxylation flux is calculated as (rubisco activity) × C_in. Given these permeability values, the rubisco flux is independent of CA activity (x-axis), because passive diffusion of CO2 across the membrane is sufficient to supply even very high rubisco activities (y-axis). In contrast, panel (H) gives the bicarboxylation flux, (bicarboxylation capacity) × H_in, which varies with both rubisco and CA activity. The dependence on rubisco activity is an artifact of our assumption that bicarboxylation capacity is proportional to rubisco activity. The dependence on CA activity arises because the HCO3- permeability is low enough that passive diffusion of HCO3- across the cell membrane is insufficient at higher bicarboxylation capacities. (I) The flux to biomass is calculated as J_B = min((rubisco activity) × C_in, (bicarboxylation capacity) × H_in / q). When rubisco activity is low, J_B is rubisco-limited, i.e. it depends on rubisco activity but not on CA activity.
When rubisco activity is larger, however, J_B can be bicarboxylation-limited, i.e. it depends on CA activity (via bicarboxylation). Panel (C) gives J_L,tot / J_B as a proxy for the energetic efficiency of growth. Here this value is always negative because J_L,tot < 0.
Figure S14: Unrealistically low CO2 permeabilities permit the co-limitation model to concentrate CO2 intracellularly. In each panel, the x-axis gives the CA activity (s^-1) and the y-axis the rubisco activity (s^-1). Color gives the quantity named in each panel title. Here the CO2 permeability is 12 s^-1, the HCO3- permeability is 1.5 × 10^-2 s^-1, and the Ci uptake activity is 100 s^-1 for all panels. (A-B) C_in and H_in give the intracellular CO2 and HCO3- concentrations, respectively. Given the low CO2 permeability and the Ci uptake capacity, the model can concentrate CO2 such that C_in ≫ C_out = 10 μM. (D-E) J_L,C = -(CO2 permeability)(C_out - C_in) and J_L,H = -(HCO3- permeability)(H_out - H_in) represent the fluxes of CO2 and HCO3- leakage from the cell. Because we use a large Ci uptake activity, both leakage fluxes can adopt large positive values here. (F) J_L,tot = J_L,C + J_L,H is the total flux of Ci leakage from the cell. Notice that J_L,H contributes substantially to J_L,tot here because of substantial HCO3- pumping (Ci uptake activity ≫ 0). (G) The rubisco carboxylation flux is calculated as (rubisco activity) × C_in. It depends strongly on CA activity because CA produces CO2 from pumped HCO3-, as shown in panel A. Panel (H) gives the bicarboxylation flux, (bicarboxylation capacity) × H_in, which also varies with rubisco and CA activity. The dependence on rubisco activity is an artifact of our assumption that bicarboxylation capacity is proportional to rubisco activity. The dependence on CA activity is due to CA-catalyzed conversion of pumped HCO3- (the bicarboxylation substrate) into CO2. (I) The flux to biomass is calculated as J_B = min((rubisco activity) × C_in, (bicarboxylation capacity) × H_in / q). In contrast to Figure S13, biomass flux now depends on CA activity even at low rubisco activities. This is due to the unrealistically low CO2 permeability of 12 s^-1, which is 1000-fold lower than estimated and measured for biological membranes.
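The flux definitions in Figures S13 and S14 can be explored with a small steady-state calculation. The sketch below is a minimal reconstruction under stated assumptions, not the authors' implementation, and the names `Params` and `steady_state` are hypothetical. It makes the following assumptions:

- CA-catalyzed plus spontaneous hydration is a reversible first-order reaction with an assumed equilibrium HCO3-/CO2 ratio `keq`.
- Ci uptake is first-order pumping of external HCO3-.
- Bicarboxylation capacity equals q × rubisco activity; the captions state only proportionality.
- q = 1, C_out = 10 μM and H_out = keq × C_out are illustrative choices.

Only the permeabilities and uptake activity are taken from the captions. Solving the two linear steady-state balances yields C_in and H_in, and from them the leakage fluxes, J_B and the efficiency proxy J_L,tot/J_B.

```python
from dataclasses import dataclass


@dataclass
class Params:
    """Hypothetical parameter set; rates in s^-1, concentrations in uM."""
    perm_co2: float       # CO2 permeability (1.2e4 s^-1 in Fig. S13, 12 s^-1 in Fig. S14)
    perm_hco3: float      # HCO3- permeability (1.5e-2 s^-1 in both figures)
    ca: float             # CA plus spontaneous hydration rate constant (assumed reversible)
    rubisco: float        # rubisco activity: carboxylation flux = rubisco * C_in
    uptake: float = 0.0   # Ci uptake activity: pumping flux = uptake * H_out (assumed form)
    q: float = 1.0        # factor in J_B = min(rubisco*C_in, bicarb*H_in/q) (illustrative)
    keq: float = 10.0     # assumed equilibrium HCO3-/CO2 ratio at the model pH
    c_out: float = 10.0   # external CO2 (uM)


def steady_state(p: Params) -> dict:
    """Solve the two linear steady-state Ci balances by Cramer's rule."""
    h_out = p.keq * p.c_out            # external Ci assumed at equilibrium
    bicarb = p.q * p.rubisco           # assumption: capacity proportional to rubisco
    # CO2 balance:   perm_co2*(c_out - C) - ca*(C - H/keq) - rubisco*C = 0
    a11, a12, b1 = p.perm_co2 + p.ca + p.rubisco, -p.ca / p.keq, p.perm_co2 * p.c_out
    # HCO3- balance: perm_hco3*(h_out - H) + uptake*h_out + ca*(C - H/keq) - bicarb*H = 0
    a21, a22 = -p.ca, p.perm_hco3 + p.ca / p.keq + bicarb
    b2 = (p.perm_hco3 + p.uptake) * h_out
    det = a11 * a22 - a12 * a21
    c_in = (b1 * a22 - a12 * b2) / det
    h_in = (a11 * b2 - a21 * b1) / det

    j_lc = -p.perm_co2 * (p.c_out - c_in)   # CO2 leakage (positive = out of the cell)
    j_lh = -p.perm_hco3 * (h_out - h_in)    # HCO3- leakage (positive = out of the cell)
    j_rub = p.rubisco * c_in                # rubisco carboxylation flux
    j_bic = bicarb * h_in                   # bicarboxylation flux
    j_b = min(j_rub, j_bic / p.q)           # co-limited flux to biomass
    return {
        "c_in": c_in, "h_in": h_in, "j_lc": j_lc, "j_lh": j_lh,
        "j_ltot": j_lc + j_lh, "j_rub": j_rub, "j_bic": j_bic, "j_b": j_b,
        "efficiency": (j_lc + j_lh) / j_b, "pump": p.uptake * h_out,
        "regime": "rubisco" if j_rub <= j_bic / p.q else "bicarboxylation",
    }


if __name__ == "__main__":
    # Fig. S13-like: high CO2 permeability, no uptake -> both leakage fluxes negative.
    s13 = steady_state(Params(perm_co2=1.2e4, perm_hco3=1.5e-2, ca=1.0, rubisco=1.0))
    # Fig. S14-like: low CO2 permeability, strong uptake and CA -> C_in >> C_out.
    s14 = steady_state(Params(perm_co2=12.0, perm_hco3=1.5e-2, ca=1e3, rubisco=1.0, uptake=100.0))
    for name, r in (("S13-like", s13), ("S14-like", s14)):
        print(name, {k: (round(v, 3) if isinstance(v, float) else v) for k, v in r.items()})
```

Summing the two balances makes the CA terms cancel, so at steady state the pumped flux equals J_L,tot plus the rubisco and bicarboxylation fluxes. This identity is a convenient sanity check on any implementation of this kind.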
[...] Panel (A) shows the reactions between the inorganic carbon (Ci) species considered here. These are the solubilization of gaseous CO2 in water, the hydration of aqueous CO2 to HCO3-, and the interconversion of the hydrated Ci species (H2CO3, HCO3-, and CO3^2-) by protonation and deprotonation reactions. As in (7), we define [H_total] to be the sum of the hydrated species. Panel (B) gives the speciation of H_total as a function of pH at an ionic strength of 0.2 M, calculated from formation energies reported in (62). Panel (C) gives the aqueous concentrations of CO2, H2CO3, HCO3-, and CO3^2- as a function of the atmospheric CO2 concentration (pH 7.0 and I = 0.2 M), assuming that CO2 is in Henry's law equilibrium with the atmosphere (39).
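The pH dependence in panel (B) follows from treating H2CO3/HCO3-/CO3^2- as a diprotic acid, and the CO2 curve in panel (C) follows from Henry's law. The sketch below illustrates both with approximate constants at 25°C and zero ionic strength: pKa of true carbonic acid ≈ 3.6, pKa of HCO3- ≈ 10.33, and a Henry's constant for CO2 ≈ 0.034 M/atm. These are assumptions for illustration, not the ionic-strength-corrected values derived from formation energies in (62), so the results will differ somewhat from the figure. The function names are hypothetical.

```python
# Approximate constants at 25 C and zero ionic strength (assumed; not the
# I = 0.2 M values derived from formation energies in the text).
PKA_H2CO3 = 3.6    # H2CO3 <-> HCO3- + H+ (true carbonic acid, not CO2(aq))
PKA_HCO3 = 10.33   # HCO3- <-> CO3^2- + H+
KH_CO2 = 0.034     # Henry's law constant for CO2, mol L^-1 atm^-1


def h_total_speciation(ph: float) -> dict:
    """Fractions of H_total = [H2CO3] + [HCO3-] + [CO3^2-] at a given pH (diprotic acid)."""
    h = 10.0 ** -ph
    k1, k2 = 10.0 ** -PKA_H2CO3, 10.0 ** -PKA_HCO3
    denom = h * h + h * k1 + k1 * k2
    return {"H2CO3": h * h / denom, "HCO3-": h * k1 / denom, "CO3^2-": k1 * k2 / denom}


def aqueous_co2_uM(co2_fraction: float, total_pressure_atm: float = 1.0) -> float:
    """Dissolved CO2 (uM) in Henry's law equilibrium with a gas phase."""
    return KH_CO2 * co2_fraction * total_pressure_atm * 1e6


if __name__ == "__main__":
    for ph in (5.0, 7.0, 9.0, 11.0):
        fractions = h_total_speciation(ph)
        print(ph, {k: round(v, 4) for k, v in fractions.items()})
    # 0.04% CO2 corresponds to a gas-phase mole fraction of 4e-4.
    print("CO2(aq) at 0.04% CO2:", round(aqueous_co2_uM(4e-4), 2), "uM")
```

With these assumed constants, HCO3- makes up more than 99% of H_total at pH 7.0. Air containing 0.04% CO2 at 1 atm corresponds to about 13.6 μM dissolved CO2.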