Occurrence and stability of hetero-hexamer associations formed by β-carboxysome CcmK shell components

The carboxysome is a bacterial micro-compartment (BMC) subtype that encapsulates enzymatic activities necessary for carbon fixation. Carboxysome shells are composed of a relatively complex cocktail of proteins, their precise number and identity being species dependent. Shell components can be classified in two structural families, the most abundant class associating as hexamers (BMC-H) that are thought to be major players in regulating shell permeability. Until recently, these proteins were proposed to associate as homo-oligomers. Genomic data, however, demonstrated the existence of paralogs coding for multiple shell subunits. Here, we studied cross-association compatibilities among BMC-H CcmK proteins of Synechocystis sp. PCC6803. Co-expression in Escherichia coli demonstrated consistent formation of hetero-hexamers combining CcmK1 and CcmK2 or, remarkably, CcmK3 and CcmK4 subunits. Unlike CcmK1/K2 hetero-hexamers, the stoichiometry of incorporation of CcmK3 in associations with CcmK4 was low. Cross-interactions implicating other combinations were weak, highlighting a structural segregation of the two groups that could relate to gene organization. Sequence analysis and structural models permitted the localization of interactions that would favor formation of CcmK3/K4 hetero-hexamers. The crystallization of these CcmK3/K4 associations led to the elucidation of a structure corresponding to the CcmK4 homo-hexamer. Yet, subunit exchange could not be demonstrated in vitro. Biophysical measurements showed that hetero-hexamers are thermally less stable than homo-hexamers, and impeded in forming larger assemblies. These novel findings are discussed within the context of reported data to propose a functional scenario in which minor CcmK3/K4 incorporation in shells would introduce sufficient local disorder to allow the shell remodeling necessary to adapt rapidly to environmental changes.

Introduction was inspected here by co-expressing protein couples in E. coli, and under configurations permitting simultaneous production of the four paralogs. Experimental compatibilities between CcmK1 and CcmK2, as well as between CcmK3 and CcmK4 paralogs could be demonstrated. The formation of hetero-hexamers between CcmK3 and CcmK4 from Synechococcus elongatus PCC7942 (Syn7942) could be shown too, thus confirming and complementing results obtained on the same proteins and on paralogs from Halothece PCC 7418 (Hal7418) [34]. Such observations were reasoned on the basis of sequence identities and compensatory effects for residues implicated in inter-subunit contacts, also by means of dynamic simulations using homology models. The stability of purified hetero-associations was finally investigated by biophysical means and in subunit exchange experiments. Overall, our data together with recent findings strongly suggest that the formation of such hetero-oligomers is likely to occur in cyanobacteria. Although adding to the complexity of BMC shells, this phenomenon might play important roles in modifying the structural robustness and environmental adaptability of BMC shells.

Cloning
Full-length DNA sequences coding for shell proteins from Synechocystis sp. PCC6803, as well as for ccmK3 and ccmK4 from Synechococcus elongatus PCC7942, were synthesized (Genecust and Twist Bioscience, provided in S1 List). Sequences included stretches for N-ter or C-ter tag extensions, except for untagged proteins. To characterize individual proteins, sequences were cloned in pET15b using XbaI/XhoI sites. For co-expression studies, manipulations were carried out in pBlueScript II SK+ (Stratagene) after cloning, between SacI and KpnI sites, a synthetic sequence that comprised four cassettes with independent T7 promoter/lac operator, RBS and T7 terminators (shown in S1 List). For co-expression of protein couples, the fourth and second cassettes were removed stepwise, using AvrII or BamHI/AgeI restriction enzymes (blunt ends prepared by reaction with Klenow fragment LC, Thermofisher), respectively, followed by plasmid recircularization after each step. The different ccmK sequences were integrated using SwaI/BamHI (1st cassette), PacI/AgeI (2nd), MfeI/SalI (3rd) or BsrGI/HindIII (4th), respectively. All sequences are detailed in S1 List. Treatments with BglII/BlpI permitted the transfer of final products to pET-26b. Resulting vectors were used to transform chemically-competent BL21(DE3) E. coli cells, following standard protocols.

Expression, solubility and protein purification
All cell cultures corresponding to combinations with a given His-tagged protein were carried out in parallel, applying strictly the same protocol. Culture volumes were typically 10 to 30 mL. When the cultures growing in LB at 37˚C reached mid-log phase (OD600nm = 0.6-0.8), expression was induced with 0.2 mM IPTG (final conc.). Incubation was continued for 3-4 hours before cells were harvested at 6000 × g and the supernatant (SN) discarded. In studies of expression of 4 combined proteins, in order to increase yields of purified material, experiments were also carried out in ZYM-5052 auto-induction media [35]. After inoculation with 1/100th volume of an overnight saturated pre-culture in LB, incubations were shaken at 220 rpm for 15 hours, at 37˚C.
Cellular lysis was carried out in 1/10th of the culture volume of lysis buffer (20 mM NaPi, 300 mM NaCl, 10 mM imidazole, pH 8), supplemented with DNase I (5 μg/mL final conc.) and lysozyme (0.05 mg/mL). Protease inhibitors aprotinin (10 μM, final conc.), leupeptin (20 μM) and pepstatin (2 μM) were also present. After incubation at room temperature with gentle agitation for 5 to 10 minutes, cells were sonicated at 4˚C. Four cycles of 30 sec sonication at 25% power, spaced by 1 min lags, were applied (SO-VCX130 equipped with a 630-0422 probe, Sonics). The inhibitor phenylmethylsulfonyl fluoride (PMSF, 1 mM) was added right after the first cycle. Insoluble debris was removed by centrifugation for 20 min at 20,000 × g (4˚C). The supernatant (soluble fraction) was applied to cobalt-loaded TALON Superflow metal affinity resin (Clontech) preconditioned in Sol A (20 mM NaPi, 300 mM NaCl, 10 mM imidazole, pH 8.0). After thorough washing with Sol A, elution was performed with Sol B (300 mM imidazole in Sol A). A single fraction was collected, to which EDTA (5 mM final conc.) was added immediately after elution. SDS-polyacrylamide gels (15%) for Coomassie staining or western blots were run using Tris-Glycine-SDS buffer after loading heat-denatured samples (95˚C, 10 min, in loading dye (LD) preparation for SDS-PAGE) prepared from freshly lysed cells, from soluble fractions and from purified material. Loaded volumes of lysed cells and purified fractions were identical for all samples corresponding to combinations with a given His-tagged protein. In the case of soluble fractions, volumes were adjusted taking into consideration absorption values at 280 nm of supernatants.

Western blot analysis
After SDS-PAGE, gel contents were electro-transferred to a PVDF membrane (Immobilon-P, Millipore). Membranes were blocked at room temperature (rt) for 1 h in 5% nonfat dry milk in TBS containing 0.05% Tween 20. Standard protocols were applied for subsequent treatments. Primary antibody immunolabelling was carried out for 1 h at rt with 1:2000 diluted FLAG tag mouse monoclonal antibody (FG4R, ThermoFisher). After incubation with the alkaline phosphatase-conjugated goat anti-mouse IgG (H+L) secondary antibody, and extensive washing, blots were developed with the Sigmafast BCIP/NBT substrate.

Native mass spectrometry
Prior to analysis, protein tags were removed by incubation overnight at 25˚C with turbo TEV protease (GenWay, 5 μg/mL final) in 50 mM Tris pH 8.0 / 300 mM NaCl / 2 mM DTT / 1 mM EDTA. After buffer exchange into Sol C, proteins were concentrated to 1-2 mg/mL. Immediately before spraying, the samples were buffer-exchanged against 150 mM aqueous ammonium acetate at pH 8 using Amicon Ultra-0.5 mL centrifugal filters (MWCO = 10 kDa; Millipore). MS measurements were performed in positive ion mode exactly as described before [23]. For tandem mass spectrometry experiments, precursor ions were isolated in the quadrupole mass analyzer and accelerated into an argon-filled linear hexapole collision cell (P = 3.0 × 10⁻² mbar). Various collision energy offsets were applied upstream of the collision cell.

CcmK3 and CcmK3/K4 homology models and molecular dynamics simulations
Homology models for CcmK3 Syn6803 homo-hexamers were built using SWISS-MODEL and PHYRE2 algorithms. Templates selected for 3D model reconstruction differed between the two, corresponding to the Syn7942 CcmK1/2 (PDB ID 4OX7, residue identity of 53%) using SWISS-MODEL, or the structure of Syn6803 CcmK4 (2A10, 47% identity) with PHYRE2. Hetero-hexamers were recomposed by replacing one of the monomers of the 2A10 CcmK4 structure by CcmK3 modeled monomers (previously superimposed by minimizing RMSD of monomer main-chain atoms). The few side-chain clashes between monomers in recomposed hexamers were relaxed using steepest descent energy minimization.
Molecular dynamics simulations were carried out using the AMBER14 forcefield implemented within YASARA software. After a first energy minimization within YASARA, hexamers were hydrated within a cubic cell with dimensions extending 10 Å beyond edge protein atoms, which was filled with explicit solvent. Periodic boundary conditions were applied. YASARA's pKa utility was used to assign residue protonation states at pH 7.0. The simulation cell was neutralized with NaCl (0.9% (w/v) final concentration) by iteratively placing sodium and chlorine ions at the coordinates with the lowest electrostatic potential. The cut-off for the Lennard-Jones potential and the short-range electrostatics was 8 Å. Long-range electrostatics were calculated using the Particle Mesh Ewald (PME) method with a grid spacing of 1.0 Å, 4th order PME-spline, and PME tolerance of 10⁻⁵ for the direct space sum. The entire system was energy-minimized using steepest descent minimization, in order to remove conformational stress, followed by a simulated annealing minimization until convergence (<0.05 kJ/mol/200 steps). Simulations were run at 298 K, with integration time steps for intra-molecular and inter-molecular forces of 1 fs and 2 fs, respectively. Two identical 20 ns simulations were run starting from the same structure, but differing by the attribution of random initial atom velocities. Intermediate structures were saved every 250 ps. Dihedral angle analysis and generation of figures were carried out with scripts run within Pymol (https://www.pymol.org/).
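As a quick sanity check on the simulation setup, the short Python sketch below converts the 0.9% (w/v) NaCl concentration into molar units and counts the snapshots produced by a 20 ns run saved every 250 ps. The function names are illustrative only and are not part of YASARA; the NaCl molar mass is the standard value of 58.44 g/mol.

```python
# Back-of-the-envelope checks for the simulation parameters quoted above.
# Function names are hypothetical helpers, not YASARA commands.
NACL_MOLAR_MASS = 58.44  # g/mol (Na 22.99 + Cl 35.45)


def w_v_percent_to_molar(percent_w_v: float, molar_mass: float) -> float:
    """Convert a % (w/v) concentration (g per 100 mL) into mol/L."""
    grams_per_litre = percent_w_v * 10.0
    return grams_per_litre / molar_mass


def saved_frames(total_ns: float, interval_ps: float) -> int:
    """Number of snapshots written after t = 0 for a given save interval."""
    return int(round(total_ns * 1000.0 / interval_ps))


nacl_molar = w_v_percent_to_molar(0.9, NACL_MOLAR_MASS)
frames = saved_frames(20.0, 250.0)
print(f"0.9% (w/v) NaCl is about {nacl_molar * 1000:.0f} mM")
print(f"{frames} snapshots per 20 ns run")
```

Under these assumptions, the ionic strength corresponds to roughly physiological saline, and each of the two independent runs yields 80 intermediate structures (excluding the starting structure).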

AFM imaging
Purified protein was diluted 10 to 50-fold with 10 mM NaPi, 300 mM NaCl at pH 6. Typically, 2 μL of the solution was then dispensed onto freshly-cleaved mica and proteins were allowed to adsorb for longer than 10 min. Samples were imaged after dilution with 150 μL of the same buffer. Standard image analysis and treatments were initially performed using NanoScope Analysis software (Bruker). When necessary, AFM images were processed with 0th to 3rd order plane fitting and 0th to 3rd order flattening to reduce XY tilt. For further details on experimental approach and instrumentation, please refer to [23].
Screening of crystallization conditions was performed using a Mosquito drop-dispensing robot (TTPLabtech) and crystallization screens (Qiagen). Drops were prepared by mixing 150 nL of protein solution with an equivalent volume of screening solution, at 12˚C. Crystals formed in drops prepared from reservoirs containing 22% (w/v) PEG 4000, 0.2 M ammonium sulfate and 0.1 M sodium acetate at pH 4.6. Crystals were briefly immersed in the reservoir solution supplemented with 20% ethylene glycol before being cooled to 100 K in a stream of cold gaseous nitrogen. Diffraction data were collected on beamline ID30A3 at ESRF (European Synchrotron Radiation Facility, Grenoble, France) and processed using AUTOPROC (Global-Phasing, Cambridge, UK) and XDS.
The structure was solved using the molecular replacement method, as implemented in PHASER [37]. As the crystal potentially contained both CcmK3 and CcmK4, a model was built from the structure of CcmK4 (PDB entry 2A18), truncating all non-common side chains to alanine. The structure was refined with REFMAC [38] and COOT [39] from the CCP4 suite of programs [40], and deposited in the RCSB databank with PDB code 6SCR.

Protein thermal denaturation studies
Differential Scanning Fluorimetry was used to characterize the thermal stability of selected homo- and hetero-hexamers. Mixtures of 20 μL of the sample (8-10 μM final, considering hexamers) and SYPRO Orange (×100; Invitrogen) in Sol C were subjected to a temperature gradient from 20 to 100˚C with increments of 0.3˚C every 10 sec. Measurements were performed in triplicate in 96-well plates (Bio-Rad) with a real-time PCR CFX96 System (Bio-Rad). Melting temperatures (Tm) were extracted by fitting the full set of data from 2-3 experiments to a sigmoidal function using PRISM software, after normalization of fluorescence intensities.
Temperature-induced aggregation was evaluated using Dynamic Light Scattering. Assays were conducted on 10 μL of the protein sample in Sol C, applying a temperature gradient from 20 to 100˚C. Experiments were performed in a Zetasizer APS instrument (Malvern™, Panalytical Ltd., Malvern, UK). Scattered intensities were measured as a function of temperature (1 point/s, 2˚C/minute), and normalized. Taggr values were extracted by fitting the full set of data from 2-4 experiments to a sigmoidal function using PRISM software.
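The extraction of a transition midpoint (Tm or Taggr) from a normalized melting trace can be illustrated with the following minimal sketch. It assumes a two-parameter Boltzmann sigmoid and uses a coarse grid-search least-squares fit in plain Python; the exact model and optimizer used in PRISM are not specified in the text, and the trace below is synthetic, not data from this study.

```python
import math


def boltzmann(temp, tm, slope):
    """Two-state sigmoid rising from 0 to 1, with midpoint tm."""
    return 1.0 / (1.0 + math.exp((tm - temp) / slope))


def normalise(values):
    """Min-max scale a list of signals to the 0-1 range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def fit_midpoint(temps, signal):
    """Coarse grid-search least-squares fit; returns (tm, slope)."""
    best = (float("inf"), None, None)
    for i in range(551):            # tm from 40.0 to 95.0 in 0.1 steps
        tm = 40.0 + 0.1 * i
        for j in range(15):         # slope from 0.5 to 4.0 in 0.25 steps
            slope = 0.5 + 0.25 * j
            sse = sum((boltzmann(t, tm, slope) - y) ** 2
                      for t, y in zip(temps, signal))
            if sse < best[0]:
                best = (sse, tm, slope)
    return best[1], best[2]


# Synthetic, noise-free trace (invented values, for illustration only)
temps = [20.0 + k for k in range(81)]          # 20 to 100 degrees C
raw = [1000.0 + 5000.0 * boltzmann(t, 70.0, 1.5) for t in temps]
tm_fit, slope_fit = fit_midpoint(temps, normalise(raw))
print(f"Fitted midpoint: {tm_fit:.1f} C (slope {slope_fit:.2f})")
```

A grid search is used here only to keep the example dependency-free; a nonlinear least-squares routine would normally be preferred for real, noisy data.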

Monomer exchange experiments
Untagged Syn6803 K1 and K4 proteins were prepared by overnight treatment of TALON-purified His4-tagged proteins following a described protocol [23]. After reaction, the solutions were flushed through TALON resins, which removed the TEV protease and possible traces of unreacted protein. The flowthrough was collected, buffer-exchanged against solution C, and concentrated to 1-2 mg/mL using 10 kDa MWCO concentrator units.
Subunit exchange studies were carried out in two ways. First, purified His4-K4/K3-FLAG (7 μM final conc, 25 μL total volume) was conditioned in solutions with 50 mM NaPi (at pH 9 or pH 7) or NaAcetate (pH 5), 100 mM NaCl and 100 mM ammonium sulfate. In some cases, the mixture included 14% PEG 3350. In the second type of experiment, the doubly-tagged His4/FLAG homo- or hetero-hexamer (2 μM final conc, 25 μL total volume) in 50 mM NaPi (pH 8.0), 100 mM NaCl, and 0.15 mg/mL (final) of BSA (added to limit protein losses) was incubated in the absence or presence of untagged K1 or K4 (6 μM). Incubations were performed at room temperature, overnight. After addition of Tris buffer (150 mM final, pH 8), the TALON resin was added (25 μL of 50% slurry, preconditioned in Sol A). The mixture was maintained at 4˚C for 5 min, with periodic shaking. After spinning down and removing the supernatant to near dryness, 20 μL of a mixture containing LD (×1.5 final), EDTA (10 mM) and imidazole (300 mM) was added to the resin, and the mixture was heat denatured before proceeding to WB analysis, as described.

Selection of engineered Syn6803 CcmK constructs
Prior to studying the compatibility between the different Syn6803 CcmK components, we established the expression and solubility profiles of untagged proteins in E. coli. Collected data showed good expression of all 4 paralogs. Abundant soluble material was present after lysis in all cases, except for CcmK3 (abbreviated K3, S1 Fig).
The impact of tagging was also examined. Short peptides were selected and placed at either the N- or C-terminus. Overall, C-ter tagging was better tolerated. Thus, CcmK1 (K1) and CcmK2 (K2) were expressed and remained soluble regardless of the peptide tag identity at the C-terminus (S2 Fig). Multiple bands observed for CcmK4 (K4) pointed to proteolysis phenomena. At the N-terminus, expression was restricted to His4 and FLAG constructs, and only K2 and K4 remained soluble. Once again, no band corresponding to soluble K3 could be detected, irrespective of peptide type and tagging side.

Hetero-hexamer formation between Syn6803 CcmK paralogs
To investigate the potential association of paralogs within hexamers, all possible couples of genes coding for CcmK proteins were engineered in a pET26b-based plasmid that included two cassettes for independent expression (each one presenting T7 promoter/lac operator, RBS and T7 terminator) (S3A Fig). Combinations of His4-tagged CcmK and a second FLAG-tagged CcmK were assayed in E. coli, following standard IPTG induction protocols. Bearing in mind results from the previous section, plasmids coding for C-ter tagged K1, K2 and K4 were favored. Combinations with N-ter His4-tagged K4 were also tested to anticipate the possibility that purified yields with the C-ter tagged K4 were low due to the aforementioned proteolytic instability of this construct. Finally, N- or C-ter FLAG-tagged and untagged K3 were also studied, in view of the demonstrated insolubility of the K3 paralog.
Cell contents, proteins remaining soluble after lysis and centrifugation, and purified fractions recovered on TALON resins were analyzed on Coomassie-stained SDS-PAGE gels (S3B Fig). Bands indicative of the occurrence of species with slightly different apparent size allowed us to directly infer co-purification of K1 and K2, irrespective of which of the two was carrying the His4 or FLAG tag. Double bands were also noticed when K4 labeled at the C-ter with His4 and FLAG were expressed together. Absence of proteins was noticed in purified fractions for combinations with K3. This was the case for all 5 combinations with K3-His4 (S3 Fig, 3rd lane). More surprisingly, purified proteins were also absent for combinations between K1-His4 and FLAG-tagged or untagged K3 (white arrows). Although less affected, K2-His4 and K4-His4 levels also diminished when co-expressed with K3, as compared to material purified in combination with other CcmK paralogs. The most intense bands in combinations with K3 were those of  Fig 1), irrespective of which of the two carried the His4 or FLAG peptide. Signals were comparable to those obtained for positive controls (K1 or K2 homo-hexamers). The most remarkable observation was, however, the detection of an intense band when His4-K4 was combined with K3-FLAG. The intensity was, however, weaker than that observed for the K4 homo-hexamers. Some degree of cross-association was noticed between His4-K4 and the FLAG-K3 construct, but also with FLAG-tagged K1 and K2, the latter being confirmed by a moderate intensity band with K4-His4. Not surprisingly, signals were absent in combinations between His4- and N- or C-ter FLAG-tagged K3 constructs, which also failed to reveal any purified material in Coomassie-stained gels (S3 Fig).

Hetero-hexamer formation confirmed by native MS
Fig 1 (caption, partial). After running SDS-PAGE gels and transferring proteins onto PVDF membranes, detection was performed with a mouse anti-FLAG primary antibody followed by a secondary anti-mouse IgG-alkaline phosphatase conjugate. The figure was prepared from two independent western blots that were performed in parallel. White vertical lines highlight the resulting image discontinuities.
https://doi.org/10.1371/journal.pone.0223877.g001

BMC-H proteins are prone to assemble into large patches, observed in vitro but also during recombinant expression inside cells. Therefore, data presented in the previous section might be explained as being the consequence of the purification of mixed assemblies combining different homo-hexamers. To clarify this possibility, three types of experiments were carried out. First, purified fractions that according to WB data contained hetero-hexamers (i.e. combinations between K1 and K2, or between His4-K4 and K3) were analyzed by size-exclusion chromatography (SEC). The His4-K4/K3-FLAG sample eluted at volumes characteristic of homo-hexamers, whereas K1-His4/K2-FLAG showed intermediate behavior between hexamers and dodecamers. This elution behavior had been observed before for Syn6803 K2 homo-hexamers, and is assumed to arise from formation of dodecamers consisting of stacked hexamers [15,23,41]. Second, cells expressing separately each homo-hexamer (His4- or FLAG-tagged) were mixed before lysis and purification. Fractions recovered from TALON resins did not reveal bands of FLAG-tagged proteins in WB. These experiments ruled out the co-purification of assembled homo-hexamers, as well as the occurrence of monomer exchange between homo-hexamers during manipulations.
Finally, samples were inspected by native electrospray ionization mass spectrometry (native ESI-MS), an approach that is well suited for the characterization of molecular associations present in solution. This technique was exploited in a previous study to characterize His4-tagged CcmK homo-hexamers [23]. After improving the molecular homogeneity of TALON-purified fractions by TEV treatment, hetero-hexamers could be detected in the 3500-5000 m/z range (Fig 2). For K1-His4/K2-FLAG, the presence of the K2 paralog was directly evidenced by its two typical charge state distributions (CSD) at m/z 4000-5000 and 5500-6200. The second CSD, however, corresponded to species of lower MW than those detected before for the K2 homo-hexamer [23]. A zoomed view of the hexamer distribution indicated that every charge state was in fact split into several signals, matching hexamers of different stoichiometries (shown in the inset of Fig 2A). After averaging over all charges, the calculated MW matched K1/K2 stoichiometries ranging from 1:5 to 4:2 (S1 Table).
Deviations from values calculated for combinations of monomers were small, below 50 Da. Such monomers were seen in the 1000-2000 m/z range, with masses indicative of loss of the first methionine, something usual for C-ter tagged proteins. Definitively proving the occurrence of hetero-hexamers, the two CcmK monomers were produced when given hexamer species were subjected to collision induced dissociation (CID) (Fig 2A, bottom panel). In addition, no evidence of K2-derived species could be obtained when similar experiments were performed on material purified from pools of cells expressing K1-His4 and K2-FLAG separately (S1 Table).
Analogous conclusions were drawn from data collected on complexes formed between His4-K4 and K3-FLAG (Fig 2B). The most important evidence was the detection of the two monomers dissociating from selected hexamer species submitted to CID. Monomer masses were in excellent agreement with the expected values for TEV-untagged His4-K4 and K3-FLAG (after loss of the first methionine for the latter). The attribution of hexamer peaks in the 3500-5000 m/z region to different stoichiometries was less straightforward than for K1/K2 complexes, a likely consequence of the small MW difference between the two monomers (113 Da, or 244 Da if K3 had lost the first Met). The most intense signal best matched a K3/K4 hetero-hexamer with 1:5 stoichiometry. K4 homo-hexamers were also detected, something that contrasts with data for K1/K2 that only revealed hetero-hexamers. It is also worth mentioning that faint signals were detected for a 71.1 kDa species. This MW closely matches a hypothetical K3 hexamer, which should however not be retained by the TALON resin. Alternatively, they could derive from homo- or hetero-hexamers composed of partially degraded CcmK subunits. Two other minor species, of approx. 37.7 and 66.8 kDa, could not be attributed to any n-mer combination.
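The way a neutral molecular weight is obtained from a charge state distribution, and then averaged over charges, can be sketched as follows. This is a generic illustration of positive-ion ESI charge deconvolution from adjacent peaks, not the processing software used in the study; the 68,000 Da species and the helper names are hypothetical.

```python
PROTON = 1.007276  # Da, mass of a proton


def mz_of(mass: float, z: int) -> float:
    """m/z of an [M + zH]z+ ion."""
    return mass / z + PROTON


def charge_from_adjacent(mz_high: float, mz_low: float) -> int:
    """Charge z of the peak at mz_high, given its neighbour (z + 1) at mz_low."""
    return round((mz_low - PROTON) / (mz_high - mz_low))


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass recovered from one charge state."""
    return z * (mz - PROTON)


# Hypothetical 68,000 Da hexamer observed at charges 15+ to 18+
true_mass = 68000.0
peaks = {z: mz_of(true_mass, z) for z in range(15, 19)}

z16 = charge_from_adjacent(peaks[16], peaks[17])
estimates = [neutral_mass(mz, z) for z, mz in peaks.items()]
mean_mass = sum(estimates) / len(estimates)
print(f"Assigned charge: {z16}+, averaged mass: {mean_mass:.1f} Da")
```

With these invented values, the 16+ and 17+ ions fall near m/z 4251 and 4001, that is, within the 3500-5000 m/z window where the hexamers were detected.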
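The difficulty of assigning K3/K4 stoichiometries can be made concrete by enumerating the expected hexamer masses. In the sketch below, only the 113 Da monomer mass difference (and the 244 Da alternative) comes from the text; the K4 monomer mass is a placeholder chosen for illustration, and K4 is taken as the heavier subunit because loss of the K3 methionine increases the difference.

```python
MET_RESIDUE = 131.19   # Da, average mass of a methionine residue
K4_MONOMER = 12000.0   # Da, hypothetical placeholder, not a value from the study
DELTA = 113.0          # Da, K4 minus K3 monomer mass difference quoted above


def hexamer_masses(k4_mass, delta, subunits=6):
    """Mass of each K3:K4 stoichiometry, keyed by the number of K3 subunits."""
    k3_mass = k4_mass - delta
    return {n: n * k3_mass + (subunits - n) * k4_mass
            for n in range(subunits + 1)}


masses = hexamer_masses(K4_MONOMER, DELTA)
step_da = masses[0] - masses[1]      # shift per K4-to-K3 substitution
step_mz_at_16 = step_da / 16         # same shift seen on a 16+ ion

for n_k3, mass in masses.items():
    print(f"K3:K4 = {n_k3}:{6 - n_k3}  ->  {mass:.0f} Da")
print(f"Spacing: {step_da:.0f} Da, i.e. about {step_mz_at_16:.1f} m/z at 16+")
```

Each substitution shifts the hexamer by only 113 Da, which is about 7 m/z units for a 16+ ion. Note also that 113 Da plus one methionine residue gives roughly 244 Da, consistent with the second figure quoted in the text.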

Simultaneous expression of all CcmK paralogs
Co-expression of CcmK couples demonstrated the structural compatibility between CcmK1 and CcmK2 or between CcmK3 and CcmK4 couples. This mirrors the organization of each CcmK couple at separate chromosomal loci, and might therefore support an independent evolution of each CcmK pair of sequences. Our data, however, did not rule out the possibility of attaining more complex associations in situations of concomitant expression of all four paralogs. Transcriptomic data indicate that all four CcmK paralogs might be expressed simultaneously in Syn6803, depending on environmental conditions [42]. Besides, weak WB signals noticed for combinations between K4 and FLAG-tagged K1 and K2 (Fig 1) suggested that other combinations might lead to more complex associations.

Fig 2 (caption, partial). Positive-ion mode native ESI-MS spectra are presented in the top panels. These spectra are dominated by signals from multiply charged ions of hexamers. Potential higher order assemblies were observed in combinations with CcmK2. Bottom panels present an example of MS-MS collisional activation data collected on hexamer precursor ions selected from the top spectra (indicated with an asterisk). An asymmetric charge partitioning is noticed, hexamer dissociation resulting in two types of monomers and either pentameric species with CcmK1/K2 (left) or tetrameric species with CcmK3/K4 (right). Monomer masses permitted the attribution of peaks to either CcmK1 (red spheres), CcmK2 (green), CcmK3 (orange) or CcmK4 (violet). Species m/z values and charges are indicated for major peaks. Cartoons schematically illustrate the stoichiometries of detected protein complexes and subcomplexes. Molecular weights of neutral species estimated from data from different charge states are compiled in S1 Table.

A similar strategy as presented above was adopted to investigate this point, the main difference being that engineered plasmids included four cassettes permitting independent expression of proteins tagged with different short peptides: His4, StrepTag, FLAG and HA at cassettes 1, 2, 3 and 4, respectively. We limited our screening to eight combinations (schematized in S4A Fig). These included His4 C-ter tagged K1, K3 and K4, as well as N-ter tagged His4-K4 for reasons mentioned above (all in cassette 1). Depending on the identity of the paralog at the first cassette, N- or C-ter FLAG-tagged K3 or K4 were placed at cassette 3, and either K1-HA or K4-HA at cassette 4. All cases included K2-StrepTag at cassette 2. Additionally, a positive control was assayed that consisted of K2-His4 combined with K2 labeled with all three other peptides at cassettes 2 to 4, and which should inform on signal thresholds attained with identical configurations leading to homo-hexamers.
The co-expression of the different combinations was screened in BL21(DE3) cells after classical induction with IPTG or under auto-induction conditions. Six of the eight screened plasmids resulted in sufficient purified protein as to be visualized in Coomassie-stained SDS-PAGE gels. Most intense bands were those obtained with the two plasmids combining

Sequence-structural considerations for the formation of CcmK3 homohexamers or CcmK3/K4 hetero-hexamers
Although well expressed, the low solubility of Syn6803 K3 precluded the investigation of its solution behavior or attempts to elucidate its crystal structure. To understand whether this behavior could be linked to an incorrect folding of the K3 monomer or to the presence of residues incompatible with oligomerization, the amino acid sequence was first aligned with those of other Syn6803 paralogs or with CcmK from Syn7942 and Hal7418 species recently studied by Kerfeld and coworkers [34] (Fig 4A). Identity scores were considerably lower when considering members of the K3 family (56% on average) than within all other CcmK (mean 90% and 68% for K1/K2 and K4 families, respectively) (S2 Table). The conservation rose significantly when comparing positions implicated in inter-monomer contacts in CcmK1/K2 and K4 (96 and 81% average identities, respectively), as noticed before [43]. However, the increase almost vanished when residues at the presumed K3 monomer interfaces were compared (58%). The most remarkable difference, when comparing Syn6803 K3 to other CcmK paralogs, is the presence of the two bulky ionic amino acids Glu38 and Arg39 (Fig 4A) [34]. These residues, also present in Hal7418 K3 but changing to Glu38 and Ser39 in Syn7942, would likely clog the central hole of a potential CcmK3 hexamer. Also in common with the Hal7418 K3, potential secondary structure-disrupting prolines are found in Syn6803 K3 (Pro49 and Pro83). Other notable substitutions, specific to Syn6803 K3, are the replacement of a few small/  , when compared to other CcmK.

Fig 4 (caption, partial). The surface and ribbon representations are shown with one chain colored light blue, neighbor subunits in grey. Panel B presents the water-exposed orientation of several hydrophobic side-chains localized on the predicted α-helix-2 of CcmK3 (balls and sticks representation, with carbon atoms in orange, nitrogens in blue, oxygens in red and sulfur atoms in yellow), as well as Glu38-Arg39 residues lining the central hole. The helix runs from the hexamer edge to the center, and is followed by the stretch comprising S67 to M69, where a residue insertion specific to Syn6803 CcmK3 occurs. The location of the clogged central hole of the hexamer is indicated by the arrow. The model is viewed from the convex face. Panel C shows the location of other diverging residues, like Tyr33, Leu87 or the pore-clogging Glu38-Arg39 residues. The hexamer model is viewed from the concave face and the location of the clogged central pore is indicated by the arrow. Please notice that the side-chains of Arg39, Leu55, Met59 and Val68 lie on the opposite face.
To further visualize these differences, homology models were built using SWISS-MODEL (hereafter CcmK3_1 model) and PHYRE2 (CcmK3_2) algorithms. The resulting monomer models were similar, with average root-mean-squared-deviation (RMSD) of about 0.6 Å calculated for the position of 246 backbone atoms (excluding the C-ter 94-103 residues). Deviations increased slightly for recomposed hexamers (RMSD of 1.0 Å estimated for 1508 backbone atoms). Structural differences were most remarkable for three stretches (S5A Fig): residues 37 to 42 around the hypothetical central hexamer pore; the region comprised between Glu66 and Gly71 that connects helix-2 and strand-4, where the mentioned single residue insertion occurs; and the region delineated by residues Leu87 to Pro91. Besides, different conformers were proposed by the two algorithms for the side-chain of Tyr77. Similarly, the side-chain of Tyr33, which corresponds to Gly or Ser in other paralogs, was also rotated towards the protein surface in CcmK3_1, but pointing towards the monomer core in CcmK3_2. Several hydrophobic residues were also predicted to lie in the water-exposed side of an α-helix (e.g. Leu55 and Met59, which are Ala or hydrophilic residues in other sequences) on the hexamer convex surface ( Fig  4B), something that would increase the aggregation trend of Syn6803 CcmK3.
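For reference, the RMSD values quoted for the two homology models correspond to the following calculation over matched backbone atoms. The sketch assumes the two coordinate sets are already superimposed and listed in the same atom order; it does not perform the optimal superposition itself, and the coordinates are toy values rather than atoms from the models.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate lists (in Å)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: three atoms displaced by 0.6 Å along x (invented coordinates)
model_1 = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_2 = [(0.6, 0.0, 0.0), (2.1, 0.0, 0.0), (3.6, 0.0, 0.0)]
value = rmsd(model_1, model_2)
print(f"RMSD = {value:.2f} Å")
```

In practice, the same formula would be applied to the 246 monomer or 1508 hexamer backbone atoms after superposition in a structure analysis package.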
Without evident reason to favor one model over the other, the two were exploited to construct hetero-hexamers combining 5 K4 units and a single K3 monomer, and their behavior was investigated by molecular dynamics simulations, in comparison to K3 homo-hexamers (S5B Fig). The two K3/K4 hetero-hexamer models built from CcmK3_1 or CcmK3_2 displayed similar structural robustness, rearranging slightly over the first nanoseconds but remaining basically unchanged for the rest of the simulation. Remarkably, a closer inspection indicated that the conformation of Tyr33 and Try77 side-chains of CcmK3_2 model rearranged over the course of simulations to adopt a similar disposition as in the CcmK3_1 monomer. In agreement with these data, the CcmK3_1 homo-hexamer model seemed structurally stable, contrasting with the CcmK3_2-based homo-hexamer that failed to reach convergence. Results were virtually identical in two independent 20 nanosecond MD simulations runs that only differed by the attribution (random seed) of initial atom velocities.
A visualization of modeled structures suggested some interactions that might contribute to contacts between CcmK3_1 and CcmK4. One example could be an ionic interaction between the side-chains of CcmK4 Arg38 and CcmK3 Asp35. The electrostatic properties calculated for the CcmK3_1-based homo-hexamer differ markedly from those of other paralogs (S6B Fig). The isoelectric point (pI) estimated by PropKa for the CcmK3_1 structure is 7.0, compared with pI 6.0 and 5.2 for CcmK1 and CcmK4, respectively. The most striking difference was noticed in the electrostatic surface potential of the concave face, which seemed inverted with regard to those of other paralogs at neutral pH, as pointed out by Sommer et al. [43]. The differences were attenuated when a single CcmK3 monomer was modeled within the CcmK4 assembly (pI 5.3 estimated for the hetero-hexamer). Yet, mostly because of the presence of K3 Glu38 and Arg39, the physical properties of the hetero-hexamer pore would be substantially modified, and the hexagonal symmetry broken.
Overall, the 3D homology models and the results of MD simulations suggest that the stability of CcmK3 might be higher when embedded with CcmK4 than in homo-hexamers. Furthermore, the increased hydrophobicity of the Syn6803 CcmK3 monomer surface, which would accumulate in a homo-hexamer, might be alleviated when CcmK3 is combined with CcmK4 in hetero-hexamers.

Structural investigations of Syn6803 CcmK3/K4 hetero-hexamers
Two experimental approaches were followed to characterize the K3/K4 hetero-hexamer structurally. First, its assembly behavior was monitored by atomic force microscopy (AFM) and compared to that of the K4 homo-hexamer (Fig 5A, left). Although 2D patches formed in some experiments with K3/K4, the assemblies were smaller and less regular than those obtained with His4-tagged K4 homo-hexamers (Fig 5A, right), which basically reproduced previously published results [23]. The assembly plane of K3/K4 samples lay about 2.8 nm above the mica surface, though this value is likely an underestimate caused by high surface coverage. The degree of organization of CcmK1/K2 hetero-hexamers on mica was even lower. Images revealed only individual hexamers, clusters of variable size and some linear arrangements (Fig 5B). We hypothesize that the decreased assembly potential of this last sample could be due in part to its high molecular heterogeneity, as shown by the variable stoichiometries determined by native ESI-MS.
The crystallization of K3/K4 hetero-hexamers was attempted next. We opted to increase K3/K4 abundance by means of pIsep-FPLC, a chromatofocusing-like purification approach (Fig 5C). Our intention was to exploit the expected strong interaction between the cationic resin and the anionic DYKDDDDK FLAG peptide fused to K3. In that manner, the presumed His4-tagged K4 homo-hexamers could be removed, as indicated by the WB analysis of purified fractions, which showed that most of the FLAG signal eluted with the second (most abundant) chromatographic peak. The pooled fractions collected for this peak were used for crystallographic assays.
One condition yielded a hexagonal crystal form diffracting X-rays to 1.80 Å resolution, with two molecules in the asymmetric unit. Unexpectedly, although both CcmK3 and CcmK4 were present in the crystallization solution, the refined structure unambiguously showed only K4 subunits in the asymmetric unit, which organize as canonical homo-hexamers as a result of space group symmetry (structure deposited with PDB code 6SCR, statistics presented in S3 Table). The monomer structures were basically identical to previously solved structures, with only 0.25 and 0.23 Å RMSD for 408 backbone atoms when compared to the structures deposited in the RCSB databank with PDB codes 2A18 and 2A10, respectively. The only novel feature was the observation of a 2D arrangement with similarly oriented K4 hexamers (Fig 5D), which therefore differs from the striped organization described before for the same protein [13], and provides additional proof of the high structural plasticity of BMC-H proteins.

Evaluation of Syn6803 CcmK hetero-hexamer stability
The absence of CcmK3 in crystals prepared from samples of K4/K3 hetero-hexamers pointed to monomer rearrangements occurring during crystallization. To shed light on this possibility, purified His4-K4/K3-FLAG was first incubated overnight at variable pH, and under conditions resembling those that resulted in the formation of CcmK4 crystals. FLAG signals remaining associated with the His4-tagged component were then quantified by WB after sedimentation of the material retained on TALON beads. These experiments failed to reveal any signal drop (Fig 6A, top), independently of the incubation pH or of the presence of the PEG additive, which was included at lower concentrations than in crystallization assays in order to limit protein precipitation. In a similar experiment, we sought to enhance monomer exchange by incubating the K3/K4 hetero-hexamer (combining His4 and FLAG monomers) with a 3-fold molar excess of untagged K4 or K1 homo-hexamers. Exchange of the K3-FLAG monomer for an untagged subunit was expected to result in a decrease of FLAG readings. However, K4/K3 was as stable as the K4 homo-hexamer, with signals remaining similar for incubations carried out in the absence or in the presence of untagged hexamers (Fig 6A, bottom).
The absence of monomer exchange agrees with the consensus view of BMC-H components as robust assemblies. Similar conclusions had been reached in other studies that were conducted on homo-hexamer mixtures and monitored by crosslinking high-mass MALDI-MS or native ESI-MS [44]. Nevertheless, we decided to compare the thermal stability of hetero- vs homo-hexamers by Differential Scanning Fluorimetry (DSF). This technique often permits the monitoring of protein unfolding processes that expose hydrophobic patches, causing an increase in the fluorescence of a probe. Syn6803 K1/K2 and K3/K4 hetero-hexamers exhibited denaturation profiles with strikingly similar midpoint melting temperatures (Tm) of 58.7 and 59.6°C, respectively (S4 Table, Fig 6B). In comparison, Tm values of 89.2 and 62.3°C were measured for K4 and K2, respectively, whereas fluorescence readings with the K1 homo-hexamer changed slowly and weakly with temperature, which prevented the determination of a Tm value.
Thermal stability was further assessed by Dynamic Light Scattering (DLS), which is very sensitive to the aggregation phenomena that often accompany protein unfolding. Yet, care must be taken in interpreting results, especially considering that CcmK proteins are subject to auto-assembly. Overall, measurements of the scattered light intensity upon increasing the temperature produced similar trends as DSF and confirmed the lower stability of hetero-hexamers (Fig 6B). After fitting DLS data to sigmoid functions, Taggr values of 71.3 and 61.3°C were calculated for K1/K2 and K4/K3 hetero-hexamers, respectively (S4 Table). The value for the former was significantly higher than that measured by DSF, suggesting that the fluorescent probe could have had a deleterious effect. In contrast, the values estimated by DSF and DLS for K4/K3 were similar. Most importantly, Taggr values were displaced towards higher temperatures for homo-hexamers. Thus, a Taggr of 80.6°C was calculated for K2. It is noteworthy that Taggr values could not be measured for K1 and K4, the light scattering intensity increasing slowly and unstably in experiments with the former, or starting to increase only when the temperature was above 90°C with the latter (Fig 6B).

Top, FLAG signals recovered bound to TALON beads after reacting with purified His4-K4/K3-FLAG (7 μM monomer concentration) that had been pre-incubated overnight, at pH ranging between 5 and 9, in the absence (top lane) or presence of 14% PEG 3350 (bottom). The first lane corresponds to a sample at pH 7 that was maintained at 4°C throughout the overnight incubation. The bottom panel presents FLAG signals detected for the indicated homo- and hetero-hexamers combining His4- and FLAG-tagged subunits (2 μM final monomer concentration, indicated by black or violet letters, respectively) that were incubated overnight at room temperature and pH 8 in the absence or in the presence of untagged K4 or K1 homo-hexamers (6 μM monomer).
B, Thermal denaturation monitored by DSF (top) and DLS (bottom). Left panels present data collected for CcmK1 (triangles) and CcmK2 (circles) homo-hexamers, as well as for CcmK1/K2 hetero-hexamers (squares). Right panels show data for CcmK4 (squares) and CcmK3/K4 (circles). In DSF experiments, the proteins were incubated with the Sypro dye and the fluorescence of the probe was monitored as temperature was increased. For DLS assays, changes in the intensity of light scattered by hexamer solutions with temperature were recorded. DSF and DLS data were normalized before analysis. The mean of multiple measurements (see S4 Table) is shown with black symbols, together with the fitted sigmoids (red lines). For each trace, only every second recorded data point is displayed to facilitate figure interpretation.
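The extraction of a transition midpoint (Tm or Taggr) from normalized DSF or DLS traces by sigmoid fitting can be sketched as follows. This is only an illustration under stated assumptions: the two-parameter sigmoid, the brute-force grid search, the function names and the synthetic trace (midpoint set to 60°C) are all hypothetical choices, and the fitting software and equation actually used for S4 Table are not specified here.

```python
import math


def sigmoid(temp, t_mid, slope):
    """Two-state transition curve for a signal normalized between 0 and 1."""
    x = (t_mid - temp) / slope
    x = max(min(x, 700.0), -700.0)  # clamp to avoid overflow in exp()
    return 1.0 / (1.0 + math.exp(x))


def fit_midpoint(temps, signal, t_min=20.0, t_max=100.0, step=0.1):
    """Least-squares fit by grid search; returns (t_mid, slope).

    Assumption: the signal was normalized to the 0-1 range beforehand.
    """
    slopes = [0.25 * i for i in range(1, 41)]  # candidate slopes, 0.25 to 10
    n_steps = int(round((t_max - t_min) / step))
    best = None
    for i in range(n_steps + 1):
        t_mid = t_min + step * i
        for slope in slopes:
            sse = sum((sigmoid(t, t_mid, slope) - y) ** 2
                      for t, y in zip(temps, signal))
            if best is None or sse < best[0]:
                best = (sse, t_mid, slope)
    return best[1], best[2]


# Synthetic, noise-free trace (hypothetical midpoint of 60 and slope of 2)
temps = list(range(25, 96, 2))
trace = [sigmoid(t, 60.0, 2.0) for t in temps]
t_mid, slope = fit_midpoint(temps, trace)
print(round(t_mid, 1), round(slope, 2))
```

A grid search is used here only to keep the sketch dependency-free; a nonlinear least-squares routine would normally be preferred for real data, where replicate traces and noise also need to be handled.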

CcmK3 and CcmK4 paralogs from Syn. elongatus PCC 7942 also form hetero-hexamers
To investigate whether CcmK hetero-oligomerization might be common to other species, we applied the same experimental approach to the study of the association between the K3 and K4 paralogs of the model β-cyanobacterium Syn7942. Plasmids were prepared permitting the co-expression of both isoforms tagged at the N- or C-terminus with His6 and FLAG peptides. Protein expression, solubility and purification in E. coli were assessed by the same means as for experiments with the Syn6803 paralogs. Controls were set up to quantify the signals obtained when the same paralog (CcmK4) was co-expressed tagged with both His6 and FLAG tags. While these experiments were under way, Kerfeld and colleagues published evidence of the formation of hetero-hexamers between Syn7942 K3 and K4 [34]. We nevertheless decided to complete this portion of our study, especially because differences between the two experimental designs might yield complementary information.
Bands corresponding to K3 co-purifying with His6-tagged K4 were evident in Coomassie-stained gels, regardless of tag placement (S7P Fig, bottom). K3 bands were nevertheless much fainter than those of K4-His6, contrasting with the similar intensities detected for cellular or soluble fractions (S7C/S7S Fig). The ratio of K3/K4 intensities was higher for His6-K4 co-expressed with K3-FLAG when induction was triggered with IPTG, but the effect can probably be attributed to a much lower expression of the His6-K4 partner. An interesting observation was that, unlike Syn6803 K3, which is basically insoluble, Syn7942 K3 was still present in the soluble fraction when expressed alone, and could even be purified. Also noticeable, purified K4 bands were broad in the Coomassie gels, suggestive of potential degradation of the protein and somewhat resembling observations on Syn6803 K4.
The presence of K3/K4 hetero-hexamers was confirmed by WB. The intensity of FLAG bands in purified fractions was lower for samples containing K3-FLAG than for combinations resulting in K4 homo-hexamers (Fig 7). Taking into consideration that FLAG signals were […]

[…] whereas data on the right correspond to samples produced under autoinduction conditions. Detection on PVDC membranes was performed with a mouse anti-FLAG primary antibody followed by a secondary IgG anti-mouse-alkaline phosphatase fusion. Two raw data images were spliced and rearranged to prepare this figure.

Discussion
The structural characterization of BMC shells is important to understand function, also for engineering novel structures with properties tailored to new applications in synthetic biology. Among other properties, shell permeability, compartment robustness and adaptability to variable environmental conditions will depend on molecular properties of assembly "bricks". Until recently, shell components were presumed to be homo-oligomeric associations exclusively. This scenario, however, neglected the possibility that multiple protein paralogs present in a given organism could cross-associate to form heteromers.
Several lines of evidence argued for the formation of such hetero-associations. Bioinformatics surveys have revealed the existence of multiple BMC-H (up to 15 genes), BMC-T (up to 5) and BMC-P (up to 7) in given organisms [3]. Very often, several paralog genes are found within the same operon, and are expected to result in simultaneous protein expression. Moreover, paralog homologies are high. Another indication might be the existence of regulatory mechanisms that ensure the assembly of a single type of BMC in the many organisms that are equipped with several BMC candidates [45]. An example is the repression in Salmonella enterica of transcription of the ethanolamine utilization (Eut) BMC by 1,2-propanediol (1,2-PD), the substrate (and inducer) of the propanediol utilization (Pdu) BMC [46]. In that study, phenotypic changes of growth on 1,2-PD, symptomatic of a disrupted BMC shell, were noticed when the microorganism was engineered to permit the unregulated expression of Eut proteins or of CcmO from cyanobacteria. We hypothesize that such regulatory mechanisms would serve to control the intrinsic structural promiscuity of BMC shell components, both at the monomer and hexamer levels.
This study demonstrates the structural compatibility of subunits within Syn6803 K1/K2 and K3/K4 hetero-hexamers. Data were more consistent for the Syn6803 K1/K2 combination, results being unaffected by the choice of tagging configuration (Fig 1). WB signal intensities were comparable to those measured for FLAG-labeled homo-hexamers, suggesting that K1/K2 formation might compete efficiently with homo-hexamer formation in vivo. Moreover, analysis of purified samples by native MS/MS, which confirmed the association of subunits within the same hexamer, permitted the detection of almost all possible stoichiometries, supporting the good structural miscibility of the two paralogs.
Most remarkable was the purification of Syn6803 K3 complexes in combination with K4, especially considering that K3 is recurrently found insoluble when expressed alone in E. coli. The formation of the Syn6803 K3/K4 hetero-hexamer was first noticed for the combination of K3-FLAG and His4-K4. This observation was confirmed when the four CcmK were co-expressed together, which in addition demonstrated the viability of other combinations (i.e. K3-His4/FLAG-K4). These experiments also highlighted the structural segregation of K1/K2 from K3/K4 paralogs, the two pairs of paralogs being coincidentally encoded together but in separate operons of Syn6803. Thus, cross-associations between His4-labeled K4 and K1 or K2, which appeared as faint WB signals in studies with CcmK couples (Fig 1), were absent when all CcmK were expressed together, indicating that they might be irrelevant or only happen under given relative expression regimes.
Hetero-hexamers also formed upon co-expression of Syn7942 K3 and K4, confirming and complementing conclusions reported recently on the association of K3/K4 from Syn7942 or Hal7418 [34]. In that study, a His6/StrepTag tandem purification approach permitted the establishment of a 2:4 K3:K4 association stoichiometry, which we also detected for Syn6803 paralogs together with the most prominent 1:5 hexamer. In our study, we also sought to estimate the extent of competition between processes leading to hetero- versus homo-hexamers. Although the interpretation is still complicated by parameters such as the relative expression/solubility levels of each paralog in E. coli or differences in tag tolerance among CcmK paralogs, the overall trend indicated that K3/K4 formation is relatively inefficient. The conclusion is based on: i) weaker Syn6803 K4/K3 WB signals, when compared to FLAG-tagged CcmK4 homo-hexamers; ii) Coomassie bands for K3-His4 significantly less intense than those of co-purifying FLAG-K4 (S4B Fig, case 3); iii) detection by native ESI-MS of CcmK4 homo-hexamers contaminating purified K3/K4 samples (also inferred by pISep-HPLC); iv) the most abundant K3:K4 stoichiometry being 1:5. In our hands, the competing potential of Syn7942 K3 seemed even lower, as evidenced by the weaker WB signals for purified K3/K4 samples when compared to K4/K4 homo-hexamers (Fig 7), and by the small K3:K4 ratio of Coomassie band intensities in K3/K4 purified samples, which contrasted with the comparable or higher K3 intensities detected in soluble fractions (S7 Fig). A close inspection of the position of Syn6803 K3 and K4 interfacial residues pinpointed compensatory residue substitutions that might contribute cooperatively to the cross-association. However, we failed to obtain evidence for a co-evolution of pairs of K3/K4 residues using specialized algorithms (i.e. MISTIC or EVcomplex).
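As a point of reference for the stoichiometries discussed above, the distribution expected if two paralogs mixed randomly into hexamers can be written as a binomial. The sketch below is a simple null model and not the analysis performed in this study; the function name and the incorporation probability `p_minor` are hypothetical, and real distributions will also depend on expression levels, solubility and interface compatibility.

```python
from math import comb


def stoichiometry_distribution(p_minor, n_subunits=6):
    """Probability of finding k minor-paralog subunits (k = 0..n) per oligomer.

    Assumption: each position is filled independently, with probability
    p_minor of being the minor paralog (random-mixing null model).
    """
    return [comb(n_subunits, k) * p_minor ** k * (1 - p_minor) ** (n_subunits - k)
            for k in range(n_subunits + 1)]


# Hypothetical case: the minor paralog fills one position in six on average
dist = stoichiometry_distribution(1 / 6)
for k, prob in enumerate(dist):
    print(f"{k}:{6 - k} minor:major  {prob:.3f}")
```

Under this null model, an average incorporation of one position in six makes the 1:5 species the most populated, with 0:6 and 2:4 as the next most frequent; equal incorporation probabilities would instead populate nearly all stoichiometries.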
Hetero-hexamer stability was supported by molecular dynamics trajectories of 3D hetero-hexamers built from combinations of a CcmK4 crystal structure and 3D homology models. This view was validated experimentally by thermal denaturation data (the measured Tm values of about 60°C are, for instance, comparable to values measured for the HIV-1 capsid protein by similar means [47]), as well as by experiments showing that hetero-associations are reluctant to exchange their constituent monomers. However, DSF and DLS data indicated that Syn6803 K3/K4 and, more surprisingly, K1/K2 hetero-hexamers denature at lower temperatures than the homo-hexamers. The crystallization of CcmK4 from CcmK3/K4 samples might also point to a higher stability of homo-hexamers.
Understanding the reason for CcmK paralog multiplicity remains challenging. Globally, data collected on deletion mutants suggest that CcmK function might differ depending on the β-cyanobacterium and, possibly in relation to this, on environmental conditions. Redundant roles were initially proposed for Syn7942 CcmK3 and CcmK4 to explain that photoautotrophic growth was compromised only in the ΔccmK3-ΔccmK4 double mutant, but not in each individual knockout strain [48]. These observations were recently called partly into question by experiments showing that the ccmK4 knockout grew 2.5 times slower than the WT strain [34], similarly to early observations on a ΔccmK4 Syn6803 strain [49]. These data would therefore point to CcmK3 and CcmK4 playing different roles. Though deletion of the Syn7942 ccmK3 gene was without phenotypic consequence, the systematic co-occurrence of ccmK3/ccmK4 genes in 206 out of 227 β-cyanobacteria genomes [43] should be taken as indicative of a non-redundant role. That neither of the two occurs alone strongly suggests that at least one of the functions of CcmK3 and CcmK4 must differ and possibly be complementary. The importance of CcmK3 could have been missed if it manifested only under certain environmental conditions that were not covered by previous studies. A precedent supports this possibility: the variable growth sensitivity of Syn7942 to deletion of the ccmK4 gene, depending on the culturing pH [34].
A shell-capping role was recently proposed for Syn7942 K3/K4 hetero-hexamers [34]. In such a scenario, K3/K4 associations would position atop other hexamers embedded in the shell layer, presumably modifying their permeability. The interaction would be mediated by ionic residues (E98/R101) present in the C-terminal helix of Syn7942 CcmK3. The proposition was based on the characterization by size-exclusion chromatography of K3/K4 species with the apparent size of a dodecamer (supposedly two stacked hexamers), and on the assumption that K3 assembly must be hampered by the replacement of three residues that participate in inter-hexamer contacts in all other CcmK, and generally across BMC-H. Indeed, AFM data presented here demonstrated a decline in the assembly tendency of Syn6803 K3/K4 and, more surprisingly, of the K1/K2 hetero-hexamers (residues implicated in hexamer contacts are identical in K1 and K2, indicating that the assembly defects are elicited differently, e.g. through the different tendency of each paralog to form curved assemblies [23]). Also agreeing with a peripheral role of K3/K4 complexes, the two YFP-labeled proteins were visualized in carboxysomes within Syn7942 cells [50], whereas only CcmK4 was detected by MS when wild-type carboxysomes were purified [51]. This suggests that peripheral K3/K4, but not shell-integrated K4 homo-hexamers, might have been lost during carboxysome isolation. Unfortunately, other known carboxysome components (e.g. CcmO, CcmN) were not detected either, making it difficult to conclude on the real reasons behind the absence of CcmK3 detection. Investigating the fate of YFP-labeled K3 and K4 after purification of the corresponding fluorescent carboxysomes might help to clarify this point.
It is important to point out, however, that whilst Syn6803 K1/K2 displayed the hexamer-dodecamer equilibrium characteristic of the K2 paralog, SEC-HPLC and native ESI-MS data showed that Syn6803 K3/K4 behaves as a hexamer in solution (consistent with the replacement of E98 by Ala in Syn6803 K3). Although we cannot exclude that dodecamers form under unexplored experimental conditions, our data presently disagree with a shell-capping function of Syn6803 K3/K4.
In our opinion, a scenario with K3/K4 associations embedding within shells should not be fully ruled out. AFM data indicated that K3/K4 assemblies form, although giving rise to 2D networks of lower quality than those obtained with K4. The low K3:K4 stoichiometry and the strong cooperativity of interactions established by shell components might still permit K3/K4 incorporation within shells. Indeed, AFM data showed that the mutation of one or even two of the key residues for inter-hexamer contacts does not always suffice to abolish BMC-H assembly [22,23]. Similarly, R79A and N29A PduA mutants were found to accompany purified Pdu BMC, albeit with phenotypic features indicative of damaged shells [52]. If such a scenario were right, local defects at contacts between K3 and neighboring hexamers might then constitute entry points for the exchange of subunits or the action of dedicated editing machineries (e.g. chaperones or proteases). Though not compulsory, the lower stability of K3/K4 hetero-hexamers might facilitate the remodeling/editing processes. In that manner, shell properties would be readapted to changes in environmental conditions in a simpler and less energy-consuming manner than if new compartments had to be built. Transcriptomic data indicated that the Syn6803 K3-K4 operon is active under almost the full set of screened culturing conditions, whereas transcription of the K1/K2-containing operon seemed triggered by light [42]. It is therefore possible that the K3-K4 to K1-K2 ratio in Syn6803 shells shifts depending on cellular age and/or environmental conditions, as suggested by the increase in the number of K3 and K4 functional units measured for Syn7942 carboxysomes when comparing cells growing in the presence of 3% CO2 or under atmospheric air conditions [50].
Carboxysome aggregation revealed in early stages of biogenesis and later migration within cells might also fit this model [53,54], especially considering that Syn7942 K3 (also K2) seems to interact with components that work to maintain a correct carboxysome distribution [55]. Intriguingly, carboxysome aggregation was also noticed inside ΔCcmK3/K4 Syn7942 double knockout cells [48].
CcmK3 might also serve to regulate the incorporation of other CcmK in shells. Thus, the expression of K3 in E. coli was found to alter the solubility of other co-expressed paralogs (S3 […]

Although these studies were conducted in E. coli, the observation of K3/K4 compatibility between Syn6803, Syn7942 and Hal7418 paralogs strongly suggests that hetero-hexamers must occur in cyanobacteria too. Additional investigations are necessary to clarify this point, and also to establish functional differences among species, to clarify paralog redundancies and to address hetero-hexamer function more specifically. With the exception of the mentioned study of the ΔCcmK4 strain, the physiological importance of individual Syn6803 CcmK1, CcmK2 or CcmK3, or of combined CcmK1/K2 or CcmK3/K4 couples remains unexplored, to the best of our knowledge. Another point that would merit attention is the study of cross-interactions involving BMC-T too (also belonging to Pfam00936). Indeed, we collected preliminary data validating some associations with Syn6803 CcmO and CcmP, and n-ESI-MS data were presented in a Ph.D. thesis, together with a likely over-simplified interpretation that needs to be reconsidered [44]. Future experiments are also necessary to investigate the connection between regulatory mechanisms and BMC-H promiscuity in species harboring several BMC, and to ascertain whether the disrupted shells that form in Eut-unregulated Salmonella enterica could be caused by the potential integration in shells of heteromers combining monomers from different BMC types (and not of homo-hexamers from different BMC) [46].
Some published data will also need to be reinterpreted in the light of these novel findings. For instance, Cai et al. presented data (basically fluorescence images) to support the formation of BMC chimeras integrating CsoS1 (i.e. homo-hexamers) from Prochlorococcus marinus str. MIT9313 into β-carboxysome shells from Syn7942 [7]. Yet, the formation of hetero-hexamers combining CsoS1 and CcmK subunits should not be excluded. In fact, considering that the YFP domain attached to CsoS1 monomers is considerably bulkier than the SUMO domains exploited by the same authors to prevent BMC-H assembly in vitro [56], the resulting CsoS1-YFP hexamers should also be assembly-incompetent. If this reasoning is correct, the fluorescent puncta observed in vivo should be the result of an incorporation of CsoS1-YFP/CcmK hetero-hexamers in the compartment shell.

Supporting information
S1 […] Coomassie-stained SDS-PAGE views of cellular and soluble fractions prepared from BL21 (DE3) strains transformed with pET15b-based plasmids permitting expression of CcmK proteins tagged with the indicated peptides. On top, total cellular expression levels are shown; the bottom part is for material remaining in supernatants after lysis and centrifugation at 20,000 g. A, results collected for the N-ter tagged protein versions. B, similar data collected for C-ter tagged proteins. Please notice that the relative vertical positioning of bands might differ slightly, as a consequence of the image mounting process. Images from seven different gels needed to be spliced and rearranged to prepare the figure. (TIF)

S3 Fig. Hetero-hexamer formation with combined Syn6803 CcmK paralogs. A, Schematic representation of constructed expression vectors. Two ORFs coding for CcmK proteins tagged at either the N- or C-terminus with His4 and FLAG peptides were engineered between T7 promoter and terminator sequences (blue and empty circles, respectively). Studied cases are listed in the table. In bold and underlined are the combinations of the same paralog expected to produce doubly-tagged homo-hexamers (positive controls). Black letters are used for the His4-carrying subunit, violet for the FLAG-tagged proteins. B, Coomassie-stained SDS-PAGE showing total cellular contents (C), soluble material remaining in supernatants after lysis and centrifugation (S) and purified fractions (P). Only the portion of the gels presenting the region where CcmK monomers appear is shown. The approximate position of the different co-expressed partners is indicated on the right. Please notice that, as a consequence of partial proteolysis of C-ter CcmK4 proteins, their position cannot be unambiguously indicated. Besides, the relative vertical positioning of bands in gels might be slightly erroneous, as a consequence of the image mounting process. White arrows indicate the absence of CcmK1-His4 soluble bands in combinations showing expression, whereas black arrows highlight CcmK1-FLAG and CcmK2-FLAG solubility that seems unaffected by the presence of CcmK3. The figure combines data from six different gels spliced and rearranged appropriately. (TIF)

S4 Fig. CcmK hetero-association occurrence upon co-expression of all Syn6803 CcmK. A, Schematic representation of screened vectors. Four ORFs coding for CcmK proteins tagged at either the N- or C-terminus with His4 (black), Strep (blue), FLAG (violet) or HA (red) peptides were engineered between T7 promoter and terminator sequences (blue and empty circles, respectively). Studied combinations are listed in the table below. An asterisk written before or after the paralog name denotes N- or C-terminal tagging, respectively. B, Coomassie-stained SDS-PAGE showing total cellular contents (C), soluble material remaining in supernatants after lysis and centrifugation (S) and purified fractions (P) prepared after culturing BL21(DE3) transformed with the plasmids schematized in panel A in auto-induction media. Results are organized as in the table, depending on the position of the FLAG-tag with regard to the protein placed on the third cassette (violet). Only the portion of the gels presenting the region where CcmK monomers appear is shown. White arrows indicate an absence of CcmK1 soluble bands in combinations showing expression. C, Western blot analysis of TALON-purified fractions. Detection on PVDC membranes was effected stepwise, a first incubation being performed with a mixture of mouse anti-FLAG and mouse anti-HA, followed by a second incubation with a mixture containing a secondary IgG anti-mouse-AP fusion plus streptactin-AP conjugate. The vertical positions of protein bands in images B and C are not strictly matched with each other. The preparation of panels B and C required cropping and rearrangements from data collected in two gels, for each case. Please refer to M&M for further details.

[…] whereas on the right are presented data from samples produced under auto-induction conditions. Top lines indicate the combination of isoforms co-expressed together, as well as the tagging identity. Tag placement at either N-ter or C-ter is indicated with an asterisk written before or after the paralog abbreviation, respectively. The figure combines data from four different gels, which needed to be spliced and rearranged. (TIF)