Giant flagellins form thick flagellar filaments in two species of marine γ-proteobacteria

Flagella, the primary means of motility in bacteria, are helical filaments that function as microscopic propellers composed of thousands of copies of the protein flagellin. Here, we show that many bacteria encode “giant” flagellins, greater than a thousand amino acids in length, and that two species that encode giant flagellins, the marine γ-proteobacteria Bermanella marisrubri and Oleibacter marinus, produce monopolar flagellar filaments considerably thicker than filaments composed of shorter flagellin monomers. We confirm that the flagellum from B. marisrubri is built from its giant flagellin. Phylogenetic analysis suggests that giant flagellins evolved through a stepwise process involving an internal domain duplication followed by acquisition of an additional novel insert. This work illustrates how “the” bacterial flagellum should not be seen as a single, idealised structure, but as a continuum of evolved machines adapted to a range of niches.


Introduction
Flagella are the organelles responsible for motility in diverse bacterial species. Flagella form helical filaments, several microns long, connected to a basal body that spans the cell envelope and functions as a rotary motor [1]. Some species have only a single flagellum, while others possess multiple filaments that form a coherent bundle for swimming [2]. Flagellar motors consist of over 20 different proteins that harness proton- or sodium-motive forces to generate torque [3]. The motor spins the flagellar filament to propel the cell.
The flagellar filament is composed of thousands of flagellin monomers whose architecture features a conserved polymerization core and a variable solvent-exposed region. Flagellins are secreted by the flagellar type III secretion system and travel through the hollow core of the growing filament to assemble into a helical array at the distal tip [4]. Flagellins from diverse species all possess conserved N- and C-terminal regions that fold together discontinuously to form the D0 and D1 structural domains [2,5]: D0 folds from the extreme N- and C-terminal regions, while D1 folds from the adjacent, more internal N- and C-terminal regions. These domains mediate extensive inter-flagellin contacts and polymerize to form the inner tubular section of the flagellar filament. The region between the discontinuous N-terminal and C-terminal segments of D1 forms the solvent-exposed surface of the filament, which, according to the structure of the canonical flagellin from Salmonella enterica serovar Typhimurium (S. Typhimurium), is composed of the D2 and D3 structural domains [5,6].
In contrast to the highly conserved N- and C-terminal flagellin domains, it has long been known that the central region of this protein is highly variable in both sequence and length [7]. Recent studies have also revealed considerable variation in flagellar motor structure together with alternative arrangements of variable surface-exposed flagellin domains that do not fit into the S. Typhimurium paradigm [2,8,9].
Most work on flagellins has focussed on proteins of a similar size to the phase 1 flagellin (FliC) of S. Typhimurium (495 amino acids; 51.6 kDa). However, nature provides multiple examples of 'giant flagellins' greater than a thousand amino acids in length, with large, solvent-exposed hypervariable domains. Here, we explored the phylogenetic and evolutionary context of these giant flagellins through bioinformatic analyses (see also our recent review [10]). We hypothesised that such giant flagellins might assemble into filaments thicker than the canonical flagellar filaments of S. Typhimurium, and confirmed this experimentally by measuring filament thickness in two hydrocarbon-degrading marine γ-proteobacteria with predicted giant flagellins: Bermanella marisrubri Red65, originally isolated from the Red Sea [11], and Oleibacter marinus 2O1, originally isolated from Indonesian seawater [12]. Based on phylogenetic analysis, we propose a mechanism for the evolution of these giant flagellins.

Database searches and phylogenetic analysis of giant flagellins
A search of the September 2017 release of the UniProtKB database returned 4,678 protein sequences that harbour the PFAM domain pf00669 and are longer than S. Typhimurium FliC, including 92 longer than 1,000 aa (S1 File). Although pf00669 represents the N-terminal helical structure common to all flagellins, it is also found in some representatives of the hook-filament junction protein FlgL. To extract only flagellins from the database, hidden Markov models (HMMs) for FliC and FlgL were constructed from established examples of each family (S2 and S3 Files). These HMMs were used in a voting scheme to identify 3,536 true flagellins. To determine phylogenetic relationships, a multiple sequence alignment of the conserved N- and C-terminal flagellin sequences was built using FSA [13], and T-Coffee was used to remove unconserved gaps [14]. A phylogeny was inferred using RAxML [15], visualized with SeaView [16], and annotated with flagellin insert sizes using Inkscape 0.91 and Python scripting.
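The length filter and the FliC-versus-FlgL voting step can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: it assumes that bit scores for each sequence against several FliC and several FlgL profile HMMs have already been computed (for example with HMMER), and the model names, the 25-bit threshold and the majority-vote rule are hypothetical choices.

```python
from __future__ import annotations

from dataclasses import dataclass

FLIC_STYM_LENGTH = 495  # length (aa) of S. Typhimurium FliC, as given in the text
GIANT_CUTOFF = 1000     # 'giant' flagellin threshold (aa), as given in the text


@dataclass
class Candidate:
    seq_id: str
    length: int
    scores: dict[str, float]  # hypothetical: bit score per profile HMM name


def vote(c: Candidate, flic_models, flgl_models, threshold=25.0) -> str:
    """Each model whose bit score clears the threshold casts one vote for its family."""
    flic = sum(c.scores.get(m, float("-inf")) >= threshold for m in flic_models)
    flgl = sum(c.scores.get(m, float("-inf")) >= threshold for m in flgl_models)
    if flic > flgl:
        return "flagellin"
    if flgl > flic:
        return "FlgL"
    return "ambiguous"


def classify(candidates, flic_models, flgl_models, threshold=25.0):
    # Keep only pf00669 proteins longer than S. Typhimurium FliC
    longer = [c for c in candidates if c.length > FLIC_STYM_LENGTH]
    flagellins = [c for c in longer
                  if vote(c, flic_models, flgl_models, threshold) == "flagellin"]
    giants = [c for c in flagellins if c.length > GIANT_CUTOFF]
    return flagellins, giants


# Toy records (hypothetical IDs and scores), not real database entries
candidates = [
    Candidate("seqA", 1200, {"FliC_1": 80.0, "FliC_2": 61.0, "FlgL_1": 12.0}),
    Candidate("seqB", 610, {"FliC_1": 5.0, "FlgL_1": 70.0, "FlgL_2": 44.0}),
    Candidate("seqC", 400, {"FliC_1": 90.0}),  # shorter than FliC: filtered out
]
flagellins, giants = classify(candidates, ["FliC_1", "FliC_2"], ["FlgL_1", "FlgL_2"])
print([c.seq_id for c in flagellins], [c.seq_id for c in giants])
```

Sequences on which the two families tie are flagged as ambiguous rather than forced into either family; how such cases were handled in the original analysis is not stated.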

Strains and growth conditions
Two strains predicted to encode giant flagellins were obtained as live cultures from the German Culture Collection (DSMZ; Table 1). They were grown in 5 mL of Marine Broth 2216 (BD, New Jersey, U.S.A.) supplemented after autoclaving with 1% filter-sterilised Tween 80, in loosely capped 30 mL sterile polystyrene universal tubes at 28 °C with shaking at 200 rpm.

Light and transmission electron microscopy
Initial cultures were grown as described above for 72 h. A 50 μL portion of each culture was then transferred to 5 mL of fresh medium and grown under the same conditions for 24 h. Motility was assessed by placing a 5 μL drop of each culture on a microscope slide and examining it by differential interference contrast microscopy with a 100× oil-immersion objective lens.
For transmission electron microscopy (TEM), 1 mL aliquots of the fresh cultures were washed twice by pelleting the cells at 1,500 × g for 3 min and gently resuspending them in 2-(N-morpholino)ethanesulfonic acid (MES) buffer. The final resuspension used 200 μL of MES buffer to achieve a higher cell density. A 4 μL volume of the final cell suspension was deposited on glow-discharged, carbon-coated TEM grids, and the cells were allowed to settle onto the surface for 1 min. Cells were negatively stained with uranyl acetate and then imaged with a Tecnai T12 transmission electron microscope equipped with a TVIPS camera (FEI, Oregon, U.S.A.).

Whole genome shotgun sequencing and analysis
B. marisrubri genomic DNA was purified from pelleted cells of a 5 mL culture using a FastDNA SPIN Kit for Soil (MP Biomedicals, California, U.S.A.) according to the manufacturer's instructions, except that the final elution used 100 μL of buffer DES. A sequencing library was prepared using a Nextera XT kit and sequenced on a NextSeq 500 in 2 × 150 bp high-output mode (both Illumina, Cambridge, U.K.).
Sequence reads were quality trimmed with Trimmomatic 0.38 [19] and assembled with SPAdes 3.11.1 [20]. Sequence similarity was assessed by calculating the average nucleotide identity between the resulting contigs and NCBI genome NZ_CH724113.1 at http://enveomics.ce.gatech.edu/ani/index [21]. All reads were mapped to the expected flagellin gene sequence using the Geneious mapper and default parameters within Geneious 11.1.2. Mapped reads were separated and mapped again to the SPAdes contigs to locate the giant flagellin gene, and a pairwise alignment was created for the reference and test contig using MAFFT 7.388 [22].
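Two-way average nucleotide identity is, in essence, the mean identity of reciprocal best matches between fragments of the two genomes. A minimal sketch of that final averaging step is shown below, assuming the best-hit tables have already been produced by a BLAST-like search of fragmented genomes. The input layout and the choice to average the two directional identities for each reciprocal pair are assumptions for illustration, not a description of the enveomics implementation.

```python
from statistics import mean, stdev


def two_way_ani(best_ab, best_ba):
    """Two-way ANI from reciprocal best hits.

    best_ab maps each fragment of genome A to (best fragment in B, % identity);
    best_ba does the same from genome B to genome A (hypothetical input layout).
    Returns (ANI, standard deviation, number of reciprocal pairs).
    """
    identities = []
    for frag_a, (frag_b, ident_ab) in best_ab.items():
        reverse = best_ba.get(frag_b)
        # Keep the pair only if B's best hit points back to the same A fragment
        if reverse is not None and reverse[0] == frag_a:
            identities.append((ident_ab + reverse[1]) / 2)
    if not identities:
        return float("nan"), float("nan"), 0
    sd = stdev(identities) if len(identities) > 1 else 0.0
    return mean(identities), sd, len(identities)


# Toy best-hit tables (invented values, not data from this study)
best_ab = {"a1": ("b1", 100.0), "a2": ("b2", 99.8), "a3": ("b9", 90.0)}
best_ba = {"b1": ("a1", 100.0), "b2": ("a2", 99.6), "b9": ("a7", 95.0)}
ani, sd, n = two_way_ani(best_ab, best_ba)
print(f"two-way ANI = {ani:.2f}% (S.D. {sd:.2f}%), n = {n}")
```

In the toy example the pair a3/b9 is discarded because the best hit of b9 is a7, not a3, so only two reciprocal pairs contribute.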

Protein identification by mass spectrometry
Flagella were isolated by acid depolymerisation according to an existing method [23] and then separated by sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) using 8% Novex Tris-Glycine gels (Fisher Scientific, Loughborough, U.K.). Following staining with Bio-Safe Coomassie Stain (Bio-Rad, Watford, U.K.), bands for analysis were excised, washed and digested with trypsin, and the peptides were extracted according to standard procedures [24]. The resulting peptide solution was mixed with α-cyano-4-hydroxycinnamic acid (Sigma-Aldrich, Gillingham, U.K.) as matrix and analysed on an Autoflex Speed MALDI-TOF/TOF mass spectrometer (Bruker Daltonics, Coventry, U.K.). The instrument was controlled by a flexControl method (version 3.4, Bruker Daltonics) optimised for peptide detection and calibrated using peptide standards (Bruker Daltonics). Data were processed in FlexAnalysis (Bruker Daltonics), and the peak lists were submitted for a database search using an in-house Mascot Server (v2.4.1; Matrix Science, London, U.K.). The search was performed against a custom database containing the sequence of interest (tr|Q1N2Y3|Q1N2Y3_9GAMM Flagellin OS=Bermanella marisrubri GN=RED65_06563 PE=3 SV=1) in a background of approximately 200 random E. coli protein sequences, using trypsin/P as the enzyme with a maximum of one missed cleavage, 50 ppm mass tolerance, carbamidomethylation (C) as a fixed modification and oxidation (M) as a variable modification.
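The search parameters above (trypsin/P, at most one missed cleavage, fixed carbamidomethylation of cysteine, 50 ppm tolerance) can be illustrated with a small in silico digest. This sketch only generates candidate peptides, computes their singly protonated monoisotopic masses and checks a ppm window; it does not reproduce Mascot's scoring, variable methionine oxidation is omitted for brevity, and the input sequence is a toy example.

```python
# Monoisotopic amino acid residue masses (Da)
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565            # added once per peptide (free termini)
PROTON = 1.007276            # for singly protonated [M+H]+ ions
CARBAMIDOMETHYL = 57.021464  # fixed modification on every Cys


def cleavage_sites(seq):
    """Trypsin/P: cut after every K or R, even when the next residue is P."""
    return [i + 1 for i, aa in enumerate(seq[:-1]) if aa in "KR"]


def digest(seq, max_missed=1):
    bounds = [0] + cleavage_sites(seq) + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        # j - i - 1 is the number of missed cleavages inside the peptide
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


def mh_plus(peptide):
    """Monoisotopic [M+H]+ mass with carbamidomethylated cysteines."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    mass += CARBAMIDOMETHYL * peptide.count("C")
    return mass + PROTON


def within_ppm(observed, theoretical, tol_ppm=50.0):
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


for pep in digest("AKPRGC"):  # toy sequence, not part of Q1N2Y3
    print(pep, round(mh_plus(pep), 4))
```

Note that, unlike classical trypsin rules, trypsin/P also cleaves before proline, which is why "AK" and "PR" are produced separately here.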

Analysis of giant flagellin evolution
To analyse flagellin evolution, we focused on the Oceanospirillales and closely related species. We extracted sequences specific to the Oceanospirillales for further analysis (S1 File) and initially inspected them using Dotlet [25] to identify homologies and repeats. To identify the boundaries of the DE region, local sequence alignments were performed between pairs of tandem repeats from the same protein; the locally aligned segments were extracted and used to build a multiple sequence alignment, as described above, of all DE regions that occurred as tandem repeats (S4 File). An HMM built from this alignment was used to identify the remaining singly occurring DE regions with appropriate boundaries. Matching sequences were extracted, pooled with the previously identified DE regions, and used to build a second HMM, which was then used to identify DE regions across the Oceanospirillales dataset (S1 Fig and S5 File). A further iteration identified no additional regions, so no further iterations were performed.
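Repeat boundaries were defined from local alignments between tandem repeats of the same protein. The core of such a local alignment is the Smith-Waterman recurrence, sketched below. The alignment tool used in this study is not specified; this sketch uses simple match/mismatch scores and a linear gap penalty as illustrative assumptions, whereas real protein alignments would normally use a substitution matrix such as BLOSUM62 with affine gaps.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b, plus the aligned spans.

    Returns (score, (a_start, a_end), (b_start, b_end)) as half-open indices.
    Uses a linear gap penalty and match/mismatch scores (illustrative values).
    """
    def s(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


repeat_1 = "XXABCDEYY"  # toy sequences standing in for two tandem repeats
repeat_2 = "ZABCDEZ"
score, (a0, a1), (b0, b1) = smith_waterman(repeat_1, repeat_2)
print(score, repeat_1[a0:a1], repeat_2[b0:b1])
```

The returned spans are what would be excised from each repeat as the locally aligned DE segment before building the multiple sequence alignment.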
To determine the scope of DE region occurrence, the DE region HMM was uploaded to the HMMSearch web server (version 2.23.0) at http://www.ebi.ac.uk/Tools/hmmer/search/hmmsearch and searched against Reference Proteomes for significant hits [26].
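Hits from such a search can then be screened for significant matches to proteins other than flagellins, using the E-value cut-off of 0.1 applied in this study. A minimal sketch follows, assuming the results were saved in HMMER3's per-target tabular format (`--tblout`), in which the full-sequence E-value is the fifth whitespace-delimited field and the free-text target description follows the eighteenth; the keyword test on the description is a crude, hypothetical stand-in for proper annotation checks, and the example lines are invented.

```python
def parse_tblout(lines):
    """Parse HMMER3 --tblout per-target lines into (target, E-value, description)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(maxsplit=18)  # 18 fixed columns, then free-text description
        description = fields[18] if len(fields) > 18 else ""
        hits.append((fields[0], float(fields[4]), description))
    return hits


def significant_non_flagellins(hits, max_evalue=0.1):
    """Return significant hits whose description does not mention flagellin."""
    return [h for h in hits if h[1] < max_evalue and "flagellin" not in h[2].lower()]


# Invented example records, not real search output
example = [
    "# target name  accession  query name  accession  E-value ...",
    "seq1 - DE_region - 1.2e-30 105.3 0.1 2.0e-15 55.0 0.0 2.1 2 0 0 2 2 2 2 Flagellin OS=Example sp.",
    "seq2 - DE_region - 5.0 3.1 0.2 9.0 1.0 0.0 1.0 1 0 0 1 1 1 0 Hypothetical protein OS=Example sp.",
]
hits = parse_tblout(example)
print(significant_non_flagellins(hits))  # prints []
```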

Giant flagellins are widespread in nature
Phylogenetic analysis (Fig 1) revealed that giant flagellins have evolved convergently within several different lineages scattered across a range of bacterial phyla and sub-phyla, including the α-, γ- and δ-/ε-subdivisions of the Proteobacteria, the Aquificae, Chrysiogenetes, Deferribacteres, Planctomycetes, Synergistetes and Firmicutes, and the candidate phylum Glassbacteria from the recently described candidate phyla radiation [27].

Giant flagellins produce thick flagella in the Oceanospirillales
To determine whether giant flagellins produce thick flagella, two closely related γ-proteobacterial species with giant flagellins were selected for further investigation: Bermanella marisrubri Red65 and Oleibacter marinus 2O1. Both belong to the Oceanospirillales, an order of marine bacteria many of which are able to consume petroleum hydrocarbons. The B. marisrubri (NCBI: NZ_CH724113.1) and O. marinus (NCBI: NZ_FTOH01000001.1) genomes each encode two proteins annotated as containing the pf00669 PFAM domain. However, in each case, the putative giant flagellin is the only protein predicted by our analyses to belong to the flagellin family, while the shorter protein shows greater similarity to FlgL than to flagellin. Light microscopy revealed almost all B. marisrubri cells to be highly motile and to swim extremely quickly, whereas only a minority of O. marinus cells were motile (S1 and S2 Videos). Under the electron microscope, most B. marisrubri cells, but far fewer O. marinus cells, were flagellated, suggesting that the difference in the proportion of motile cells reflects differential expression of the flagellar apparatus. B. marisrubri cells were very thin (< 0.5 μm), approximately 2-3 μm long and spiral-shaped, with a single polar flagellum. O. marinus cells were shorter (1-2 μm) and curved or spiral in shape, also with a single polar flagellum. In support of our hypothesis, both B. marisrubri and O. marinus produced flagellar filaments at least 30 nm wide, approximately 25% thicker than S. Typhimurium phase 1 filaments (24 nm), whose subunit contains 495 amino acids [5].
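As a quick check of the comparison above, using the two widths quoted (at least 30 nm for the giant-flagellin filaments and 24 nm for S. Typhimurium phase 1 filaments), with $w$ denoting filament width:

```latex
\[
\frac{w_{\text{giant}} - w_{\text{S.\,Typhimurium}}}{w_{\text{S.\,Typhimurium}}}
\;\geq\; \frac{30\,\text{nm} - 24\,\text{nm}}{24\,\text{nm}}
\;=\; \frac{6}{24} \;=\; 0.25
\]
```

that is, an increase in filament diameter of at least about 25%.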
We confirmed the presence and sequence of the expected giant flagellin gene in B. marisrubri DSM 23232 by Illumina short-read whole genome shotgun sequencing (NCBI: SRP150336). Our assembled contigs had a two-way average nucleotide identity of 100% (S.D. 0.14%) with the reference genome. The giant flagellin gene was located on contig 2 (333,297 bp). A pairwise alignment between this contig and the reference contig AAQH01000006 (146,124 bp) showed 99.9% identical bases, including the entire giant flagellin gene, apart from an 84 bp intergenic sequence present in the reference but not in contig 2 (S2 Fig). To confirm that the giant flagellin is the only constituent of the B. marisrubri flagellar filament, we isolated flagella from B. marisrubri cells by acid depolymerisation and detected a 120 kDa band by SDS-PAGE (S3 Fig). Tryptic digests of this band analysed by mass spectrometry yielded peptides matching the expected protein (Q1N2Y3) in the UniProt database, with a protein score of 143 and an E-value of 1.2 × 10⁻¹² (S3 Fig and S6 File). The next-best match had a protein score of 18 and an E-value of 3.7, indicating high confidence in the identity of the expressed protein.
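Percent identity for a pairwise alignment depends on how gap columns are treated, which matters here because the 84 bp intergenic sequence present only in the reference appears as a gap. A minimal sketch of two common conventions follows; the convention behind the 99.9% figure is not stated, so neither option should be read as the one used, and the aligned strings are toy examples.

```python
def percent_identity(aln_a, aln_b, count_gap_columns=True):
    """Percentage of identical columns in a gapped pairwise alignment ('-' = gap).

    If count_gap_columns is True, gap columns stay in the denominator
    (identity over the whole alignment); otherwise only gap-free columns count.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    if count_gap_columns:
        total = len(aln_a)
    else:
        total = sum(1 for x, y in zip(aln_a, aln_b) if x != "-" and y != "-")
    return 100.0 * identical / total if total else 0.0


# Toy alignment: one indel column, otherwise identical
print(percent_identity("ACGT-A", "ACGTTA"))         # gap column counted
print(percent_identity("ACGT-A", "ACGTTA", False))  # gap column ignored
```

Counting gap columns penalises indels such as the 84 bp intergenic difference, whereas ignoring them reports identity only over the bases both sequences share.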

Some giant flagellins contain repeat domains
Next, we sought to understand the evolutionary pathway that gave rise to this giant flagellin family. Sequence analyses confirmed that the B. marisrubri flagellin features a large, non-homologous central insertion in place of the D2 and D3 domains found in S. Typhimurium. Dot plots of this central insert revealed an internal duplication within the corresponding stretch of sequence in the giant flagellins of the Oceanospirillales, as exemplified by the flagellin from B. marisrubri. Analyses using hidden Markov models of the repeat unit (which we have named the Domain-Extension or DE region, to make its lack of homology to D2 and D3 explicit) showed that this region is common in predicted flagellins from the Oceanospirillales and related γ-proteobacteria. Large (>800 aa) and giant (>1,000 aa) flagellins from taxa closely related to B. marisrubri possess two tandem DE regions, while related Oceanospirillales species with intermediate-length flagellins feature only one DE region (Fig 2C). In addition to the tandem DE regions, Oceanospirillales with particularly large giant flagellins (Oleibacter, Thalassolituus, Oleispira and Gynuella species) feature an additional insert between the two DE regions, which we call the DE-eXtension or DX insert. The phylogeny of the post-duplication DE regions was congruent with the flagellin tree, suggesting a direct domain duplication followed by vertical transmission rather than horizontal transfer of the insert.
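The dot-plot logic used to spot the internal duplication can be sketched as a self-comparison in which a dot is placed wherever two sequence windows share enough identical residues; a tandem repeat then appears as a diagonal parallel to the main diagonal, offset by roughly the repeat length. The window size and stringency below are arbitrary illustrative values rather than the Dotlet settings used here, and Dotlet scores windows with a substitution matrix rather than simple identity.

```python
from collections import Counter


def self_dot_plot(seq, window=5, min_identical=4):
    """Set of (i, j) window pairs sharing at least min_identical identical residues."""
    n = len(seq) - window + 1
    dots = set()
    for i in range(n):
        for j in range(n):
            identical = sum(seq[i + k] == seq[j + k] for k in range(window))
            if identical >= min_identical:
                dots.add((i, j))
    return dots


def dominant_offset(dots):
    """Most populated off-main diagonal (offset j - i > 0) and its dot count."""
    counts = Counter(j - i for i, j in dots if j > i)
    return counts.most_common(1)[0] if counts else None


unit = "ACDEFGHIKLMNPQRSTVWY"  # toy 20-residue 'domain', not a real DE sequence
print(dominant_offset(self_dot_plot(unit + unit)))  # (20, 16)
```

The dominant off-diagonal sits at an offset of 20, the length of the duplicated toy unit, which is how a tandem DE duplication would betray its repeat length.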
To determine the wider pattern of occurrence of the DE region, we searched Reference Proteomes with our hidden Markov model using HMMSearch. Significant hits were found in diverse bacteria, including Pseudomonas, Legionella, Campylobacter and Helicobacter species, but not in Salmonella. No significant hits (E-value < 0.1) were observed for any proteins other than annotated flagellins. Although partial structures of the Pseudomonas [2] and Legionella [28] flagellins have been determined, the structure of the DE region itself has not been determined at high resolution. The medium-resolution structure of the Pseudomonas DE region, however, suggests inter-monomer pairwise dimerization of DE domains along the length of each of the 11 protofilaments [2].
The origin of the additional DX insert between the tandem DE regions was less clear. BLAST searches of this sequence against reference genomes revealed no significant hits outside the Oceanospirillales flagellins. The DX insert features a conserved ~35 aa glycine-rich region without internal repeats which, given the lack of other detectable homologs, has probably arisen de novo [29].
How are the tandemly duplicated DE inserts likely to fold? Flagellins adopt discontinuous "out-and-back" topologies, in which the N-terminus lies close to the filament axis, the residues at the sequence midpoint are furthest from the axis and most peripheral, and the C-terminus is again axial, adjacent to the N-terminus. How a flagellin can fold with a tandemly duplicated domain is therefore intriguing. Two possibilities present themselves: either the two DE regions fold internally, forming two tandem continuous DE domains that each correspond to a continuous stretch of sequence (Fig 3A), or the two DE repeats fold discontinuously, each contributing one half of each of the two DE structural domains, as is the case for the D0, D1 and D2 domains in S. Typhimurium (Fig 3B). Dimerization of the DE region of Pseudomonas aeruginosa [2] suggests that a domain duplication would lead to dimerization of the tandem DE regions, favouring the former scenario (Fig 3A). Future structure determination will shed more light on the folding mechanism and evolution of these domains.

The functional advantages of giant flagellins and thick flagella remain unclear
Our phylogenetic analysis suggests an incremental, stepwise and monotonic increase in flagellin size in the B. marisrubri and O. marinus lineage, indicating retention of each successively larger flagellin and, in turn, a consistent selective benefit from each size increase. We can currently only speculate as to the true driving force behind the evolution of giant flagellins. However, their parallel evolution in different (but not all) members of the Oceanospirillales suggests adaptation to a similar niche in each case. More broadly, the diversity of variable domains suggests that the protein has undergone similar processes of incremental adaptation to a myriad of different niches throughout the bacteria.
Multifunctionality also appears to be common among flagellins: flagellins with active metallopeptidase enzyme domains have recently been discovered [30], and the flagellins of enteric bacteria and plant pathogens also serve important functions in cell attachment and in the recognition or modulation of host immunity [1,31]. Flagellins therefore appear to evolve to suit the specific combination of functional requirements imposed by their ecological niche.
The expanded central domain of the giant flagellins could confer a second function beyond motility, as described above, although the proteins reported here show no sequence similarity to any enzyme domains. An alternative explanation is that the increased thickness represents a mechanical adaptation allowing faster or more efficient swimming. In particular, a filament built from larger flagellins might be less flexible owing to steric hindrance, producing a longer, less helical filament. Motility at low viscosities might favour selection of such a less flexible filament, as it would increase the axial thrust per rotation. Some bacterial species, including Vibrio parahaemolyticus and Bradyrhizobium japonicum, display a 'division of labour' by expressing single, thick, polar or subpolar flagella for swimming in liquid media and multiple, thinner, lateral flagella for swarming on surfaces or in viscous media [32–35]. However, B. marisrubri and O. marinus are not predicted to express any other flagellins and so probably rely solely on thick flagella for their motility.
The discovery of thick, polar flagellar filaments in two species of marine bacteria expands the known diversity of flagellar architecture, but also raises more questions. Are other components of the flagellar apparatus correspondingly large? Do thick flagella require alternative structural arrangements to those seen in the common model organisms? How common or widespread is this pattern of flagellation?
In conclusion, this work illustrates how "the" bacterial flagellum is not a single, ideal structure, but is instead a continuum of evolved machines adapted to a wide range of niches [36–38]. It will be interesting to learn how the structure and function of these marine flagellar systems relate to their sequences and how their structures are shaped by their environments.
S4 File. Multiple sequence alignment of all Oceanospirillales Domain Extension regions that were found as tandem repeats. "ins" refers to the "insert" sequence, and "1half" and "2half" denote the first and second repeats, respectively. The file is a plain-text file in ClustalX format that can be opened in any text editor for viewing.