Coevolved Mutations Reveal Distinct Architectures for Two Core Proteins in the Bacterial Flagellar Motor

Switching of bacterial flagellar rotation is caused by large domain movements of the FliG protein triggered by binding of the signal protein CheY to FliM. FliG and FliM form adjacent multi-subunit arrays within the basal body C-ring. The movements alter the interaction of the FliG C-terminal (FliGC) “torque” helix with the stator complexes. Atomic models based on the Salmonella entrovar C-ring electron microscopy reconstruction have implications for switching, but lack consensus on the relative locations of the FliG armadillo (ARM) domains (amino-terminal (FliGN), middle (FliGM) and FliGC) as well as changes during chemotaxis. The generality of the Salmonella model is challenged by the variation in motor morphology and response between species. We studied coevolved residue mutations to determine the unifying elements of switch architecture. Residue interactions, measured by their coevolution, were formalized as a network, guided by structural data. Our measurements reveal a common design with dedicated switch and motor modules. The FliM middle domain (FliMM) has extensive connectivity most simply explained by conserved intra and inter-subunit contacts. In contrast, FliG has patchy, complex architecture. Conserved structural motifs form interacting nodes in the coevolution network that wire FliMM to the FliGC C-terminal, four-helix motor module (C3-6). FliG C3-6 coevolution is organized around the torque helix, differently from other ARM domains. The nodes form separated, surface-proximal patches that are targeted by deleterious mutations as in other allosteric systems. The dominant node is formed by the EHPQ motif at the FliMMFliGM contact interface and adjacent helix residues at a central location within FliGM. The node interacts with nodes in the N-terminal FliGc α-helix triad (ARM-C) and FliGN. ARM-C, separated from C3-6 by the MFVF motif, has poor intra-network connectivity consistent with its variable orientation revealed by structural data. ARM-C could be the convertor element that provides mechanistic and species diversity.


Introduction
Bacterial motility and chemotaxis have been studied extensively for the past few decades. These studies have established two fundamental tenets: 1. the rotation of flagellar motors is energized by membrane ion potentials [1], 2. a signal phospho-relay built around a diffusible, phosphoprotein CheY couples chemoreceptor state [2] to flagellar motor response. Changes in chemoreceptor state triggered by chemotactic stimuli alter motor counter-clockwise (CCW) / clockwise (CW) rotation bias, but do not affect energization of motor rotation. The binding of the CheY signal protein to FliM subunits within the rotor results in large domain movements of the adjacent FliG subunits. FliM and FliG multi-subunit organization and domain interactions are critical to understanding how the movements underlie motor response.
The C-ring, a large multi-subunit assembly within the flagellar basal body composed of the proteins FliG, FliM and FliN, forms the rotor of the bacterial flagellar motor. The C-ring architecture of isolated Salmonella enterica serovar Typhimurium ("Salmonella") basal bodies has been determined by electron microscopy [3]. Atomic models of C-ring architecture, with implications for the switching mechanism, have been developed. The models dock the X-ray structures of the protein components into the electron microscopy reconstruction, guided by cross-link data and mutant analysis [4][5][6] (Fig 1). The switching of Salmonella flagellar rotation sense is "ultra-sensitive", with a high Hill co-efficient for the activated CheY concentration in vivo [7] consistent with the multiple subunits [8][9][10][11]. In addition to the X-ray structures [6,[12][13][14][15][16], NMR of isolated FliG, FliM and CheY complexes have described the protein-protein interactions affected by CheY binding [17]. CheY binds to other sites on FliM and / or FliN once tethered to FliM N [17,18]. The conformational changes triggered by CheY binding could be enhanced by FliM self-association mediated by the pseudo-symmetric 3-layered α/β/α sandwich middle domain (FliM M ) [5]. FliM M and the FliG middle domain (FliG M ) may form the gearbox that relays these changes to FliG C . The penultimate helix, henceforth termed "torque helix", forms a prominent surface ridge in the FliG C-terminal domain (FliG C ). The FliG protein has a N-terminal domain (FliG N ) in addition to FliG M and FliG C , all composed of multiple armadillo (ARM) repeats [6]. The torque helix interacts with the stator Mot complexes [19] and changes orientation during chemotactic stimulation [15,20]. Conserved residues identified from hidden Markov models (HMMs) of Pfam multiple sequence alignments (MSAs) (shown in http://pfam.xfam.org/clan/FliG) include three short sequences ("motifs"). These motifs are GGXG in FliM M , EHPQ in FliG M , MFXF in FliG C (all letters, except X, specify the conserved amino acid; while X denotes variable residue positions). FliG C may be divided into an N-terminal helical triad (ARM-C) and a C-terminal six-helix bundle (C1-6) based on its flexibility around the MFXF motif in H. pylori [15]. The conservation of charged residues in the torque helix, while not absolute, has been noted [12]. The motifs are among the sites that upon mutagenesis yield CW or CCW chemotactic (che) phenotypes [21,22], reviewed in [6,23].
In spite of the above-noted advances, a complete atomic level knowledge of the switching mechanism has not been possible, even for the enteric Salmonella and Escherichia coli that have been the focus of studies thus far. This is due to several factors. 1. The limited resolution of the electron microscopy reconstruction makes consensus on subunit stoichiometry or contacts difficult [24]. 2. Thermotoga maritima, Aquifex aeolicus and Helicobacter pylori FliG Xray structures used for the atomic model show the protein adopts multiple conformations [15]; while basal bodies from these and other species differ from Salmonella in C-ring size [25,26]. Even within one species, C-ring architecture is likely to be altered by adaptive changes [27]. 3. Residue conservation identifies important residues but not the interactions between residue positions required for deciphering the allosteric network involved in the switching mechanism. 4. The C-ring protein-protein interactions documented by NMR and in-situ cross-linking do not fully agree [28]. 5. The chemotactic response of the flagellar motor differs between species. While CheY binding switches rotation sense from CCW to CW in the enteric bacteria; this logic is inverted in Bacillus subtilis [29]. In Rhodobacter sphaeroides and Sinorhizobium mellioti, the motor alternates between rotation stops and starts [30][31][32]. CheY is dephosphorylated at the motor by FliY [33], present with, or instead of, FliN in many species [34]. Part of FliY is homologous to FliM M . This FliY segment could complement or substitute for FliM M interactions with FliG in gram positives. Thus, even if complete knowledge of the switching mechanism were achieved for Salmonella, its general applicability would remain an issue.
We present, here, a novel approach based on covariance analysis of coevolved mutations [35] for identification of the common design principles of the flagellar motor switch. The method has important advantages. First, in common with residue conservation, its conclusions are based on a wide database and, therefore, have generality. Second, it records interactions at single residue detail. This is true also for NMR, but only for isolated complexes of limited size, and in-situ crosslinking, but only for positions selected for study. The disadvantages are Model of the 3D EM reconstruction (http://www.ebi.ac.uk/pdbe/entry/EMD-1887) shows MS ring (green) and C-ring (blue). The MS-ring is embedded in the cytoplasmic membrane while the C-ring protrudes into the cytoplasm. There is a mismatch between the MS-ring and C-ring symmetry. Rectangle denotes likely position of the FliM M FliG MC complex (4FHR.pdb) in the C-ring half proximal to the MSring. The complex comprises FliM M (yellow), FliG M (green), FliG C ARM-C (olive) and C1-6 terminal six-helix bundle (dark green). FliG M consists of ARM-M plus a partially resolved linker. C3-6 = C1-6 four-terminal helices. Orange segments denote EHPQ (black asterisk) and MFVF (red asterisk) motifs. Charged residues on the torque helix within C3-6 are highlighted (red sidechains). The distal C-ring is comprised of the FliM C-terminal domain and FliN. S1 Fig has secondary structure nomenclature.
analysis and interpretation of the large amount of information contained in a coevolution matrix. We developed metrics based on network tools [36] to make the analysis tractable and mapped the correlations onto the atomic structures to facilitate interpretation. We find that FliM M has an unusually compact coevolution network, a feature that is explained by the primacy of the inter-subunit contacts for FliM self-association. FliM M and the FliG C terminal four-helix bundle (C3-6), built around the torque helix, communicate via an allosteric network mediated by a few surface-proximal patches in FliG organized around the EHPQ motif. The patches are targeted by deleterious che mutations, underlining the importance of the network for signal transduction in the switch complex.

Fig 2
gives an overview of the computational strategy. 1. MSAs of FliG and FliM were the basis for all analysis. The information content and conservation score for residue positions was determined to guide subsequent steps. The MSAs were mapped onto structures for identification of conserved surface residues potentially involved in inter-domain interactions. 2. Correlations between residue positions were the main measure of coevolution. We created randomized MSA libraries to estimate the statistical significance of the correlations. The original coevolution matrices were compared against the population of correlation matrices generated from the randomized libraries. Lists of chemotactic mutations in Salmonella based on swarm plate assays were matched to the residue correlation network. The lists were shuffled to score for random matches. 3. A network model of the original coevolution matrices was generated and metrics developed to measure residue, patch and domain coevolution. 4. Phylogenetic tree similarity provided an alternate check for domain coevolution. Replicates were used to assess the robustness of the most likely phylogenetic tree for each domain. 5. Phylogenetic tree topologies were compared by computation of the fit probabilities of the domain MSAs with a reference domain phylogenetic tree. 6. The results were evaluated in the context of available structural knowledge. Custom scripts to perform various tasks were written in C, python and R (http://www.r-project.org). They are available upon request. Procedures for each step are detailed in Methods.

FliM M contacts dominate the FliM M FliG MC coevolution matrix
A coevolution matrix contains a large number of correlations between residue positions. The numbers scale as the square of the protein sequence (e.g. 10 4 possible correlations for a 100 residue protein). The correlations fall into three categories; residual correlations due to finite MSA depth and diversity, correlations due to residue contact either within or between domains and long-range correlations due to allosteric couplings.
Our analysis is based on representation of the coevolution matrix as a network, with residue positions as "nodes" and the correlations between them as "edges". The contribution of residue positions to the network is then obtained as their centrality [37]. The eigenvector centrality, E, is calculated directly from the correlation matrix: where (M) evol is the coevolution matrix and λ the corresponding eigenvalue. We define the mean centrality of "i" residue positions as their weight. W ¼ P n i¼1 E=n. The number of contiguous residue positions, n, is 6, unless otherwise noted. "Node" will henceforth refer to such sixresidue segments in the complete network or its derived sub-networks, rather than individual residues. The weight W measures the network information content contained in a node. It is a product of the mean strength of the correlations formed by the node with other nodes times the number of correlations or its connectivity. Domain-level measures for correlation strength (S M ) and connectivity (C) are defined later in this Section.
We first corrected for residual correlations in order to study the correlations due to protein domain interactions. The residual correlations were characterized by generation of a library of randomized MSAs (n = 100) in which the amino acid residues were shuffled column by column. This method preserved the entropy at residue positions. The randomized MSA library was batch-processed with the PSICOV algorithm [38] to generate a stack of randomized correlation matrices. One example of a randomized matrix is shown together with the mean centrality profile of the randomized MSA library for the FliM M FliG MC complex (4FHR.pdb) (Fig 3A). The centrality of residue positions superimposed with their entropy in the MSA. The  consistency between the two measures shows that the potential fractional contribution of residue positions to the network information content is given by their Shannon Entropy (Methods). The entropy differences between residue positions are created by the finite MSA size and diversity, with an extreme example of low entropy being positions occupied by only acidic (E, D) or basic (K, R) residues. In contrast, all nodes have the same entropy in an ideal random network and, thus, equal W (default value 1). Primary nodes of a coevolved network were then defined as those with W > W MEAN+2σ , where σ is the deviation expected from the randomized MSA library.
We now examined the real coevolution matrix of the FliM M FliG MC complex ( Fig 3B). We obtained the striking result, seen in the centrality profile, that FliM M collectively had greater weight in the composite matrix than FliG MC . Inspection of the FliM M and FliG MC matrices revealed the reason. The FliM M matrix was more densely and uniformly populated than that for FliG MC . The weight δW i of residue position "i" in the difference profile was computed from the equation where E i and E R are the real centrality and randomized library centrality at position i, while M E .

M R
! is the ratio of the real over randomized library centrality means, averaged over the profile. The difference (δW i ) profile confirmed that the difference between the mean FliM M and FliG MC weights exceeded the expected deviations in the centrality profile due to network noise from residual correlations. We sought an explanation for this difference.
Inter-subunit contact correlations account for the high-density of the FliM M coevolution matrix The high-density of the FliM M matrix results from correlations between distant sequence positions. Distant sequence positions imply physical separation. If so, the density of the FliM M matrix could indicate inter-subunit contacts and / or allosteric couplings. We used available structural knowledge based on cross-link data ( Table 1) as well as the X-ray structures to evaluate these possibilities. We screened the T. maritima FliM M (2HP7.pdb) for conserved surface residue positions ( Fig 4A). We reasoned that surface residues that mediate inter-subunit contacts should be conserved for residue type (hydrophobicity or charge) relative to those that do not. The che mutations in Salmonella [21] have been proposed to target sites for FliM self-association [5]. We therefore constructed networks comprising all possible interactions between residue positions equivalent to those targeted in Salmonella and examined their centrality. We assumed that the correlations between the mutated positions had equivalent strength. Binary mask matrices, with same dimensionality as (M) evol , representing the interactions between CW or CCW mutant positions were created; with elements representing correlations between mutated positions having value 1, and other elements value 0. The correlation matrices were obtained by multiplication of [M] evol by the mask matrices. The CCW mutation network had one primary node, while the CW network had several. These nodes, with two exceptions (nodes 1 and 4), mapped within or close to conserved surface residue patches ( Fig 4A).
We recorded the C α -C α physical distance separating correlated residue positions in the T. maritima FliM M structures (Fig 4B). High-scoring correlations were mapped on the structures. A more stringent +3σ threshold (Methods) was used for the single correlations, relative to the +2σ threshold employed for the 6 residue nodes in the centrality profiles, with σ in both cases determined from the randomized libraries. Many correlations were between pairs greater than 20 angstroms apart in the FliM M subunit. The residues localized at subunit surfaces marked by the CW mutations, linking positions in CW nodes 1 (H1), 3 (β2, β3) and 5 (between β1 Ã and H2 Ã ). In-situ cross-link data have shown that these surface elements participate in inter-subunit contacts. The long-range (> 20 angstrom) correlations had comparable values to the contact (< 12 angstrom) correlations. The consequence was that correlation strength had a weak dependence on distance ( Fig 4C). The mean value / fraction above threshold for the contact (< 12 angstrom) population is 1.74 ± 0.72 / 0.15, versus 1.66 ± 0.54 / 0.1 for the non-contact (>12 angstrom) population. The dependence was insensitive to whether FliM M was in isolation, or in complex with FliG MC ; though values were inflated for FliM M correlations in the complex due to inclusion of the low-scoring FliG MC correlations in the normalization. This result implies that the inter-subunit contacts are as important as the intra-subunit contacts that maintain the domain fold.
Coevolution analysis indicates that FliM M interfacial contacts for selfassociation are more conserved than the FliM M contact with FliG M An alternative explanation to inter-subunit contacts is that the high-density of the FliM M coevolution network results from multiple contacts between residue positions due to conformational variability between species that smear out correlations over the coevolution matrix. Superposition of the structures from the evolutionary distant T. maritima and H. pylori species does not support this explanation. The structures have a common fold (Fig 5A), even though Other headers denote the protein pair ("Protein1/Protein2") whose residues are crosslinked. Crosslinked pairs are denoted as "Residue1/Residue1", or as "Residue1/Residue1,Residue2,-"where Residue1 from Protein1 forms multiple crosslinks with Residue1,Residue2,-from Protein2. Superscript denote network nodes identified in Figs 4A and 6A that either include or are adjacent (+ = C-terminal,-= N-terminal) in the sequence to the crosslinked residue.
Residue font denotes it forms a crosslink whose yield is increased by either repellent (italic) or attractant (bold) stimuli. Superscript color denotes either a FliM M CW (italic) or CCW (bold) node. Residue numbers from E. coli and H. pylori have been converted to T.maritima residue numbers based on the MSA doi:10.1371/journal.pone.0142407.t001 Conformational changes triggered by CheY need to propagate along the C-ring, as well as from its distal to proximal end. A quantitative comparison of the correlation strength of the FliM M inter-subunit contacts versus the FliM M FliG M contact could evaluate the dominance of these pathways for chemotactic signal transmission. As noted, the collective FliM M W is determined by intra-domain, rather than FliG MC interactions in the composite network ( Fig 4A). We now developed two metrics for the interactions ("edges") that contribute to the node weight, W.
The first metric, S M is a measure of mean correlation strength.
The second metric, C, is a measure of connectivity The relative strength, S M , and connectivity, C, of the networks that involve the FliM M domain is shown (Fig 5B). The parameters used to compute these metrics from Eqs 3 and 4 are listed in Table 2. The calculations confirm the greater strength, S M , and connectivity, C, of FliM M within the composite network. The C between FliM M residue positions is within 5% of that obtained for the randomized networks and exceeds the C of the composite network twofold. The S M and C of the FliM M correlations with FliG MC are three and two-fold lower respectively than for the FliM M network. Interestingly, while correlations within the FliM M FliG MC contact have increased S M relative to the overall correlations between FliM M and FliG MC as might be expected for contact pairs, C is two-fold lower. The latter result shows that the contact is localized, consistent with the contact map (Fig 5A inset).
The CW and CCW chemotactic networks have greater (10-20%) S M , than the complete FliM M network from which they are derived ( Fig 5B). C is also improved. The change is small since C for the complete FliM M network is already ¾ of the maximum possible. Binary mask green), β = β-sheet (dark green with arrowhead). Asterisks indicate pseudo-symmetric equivalents. Inset: The 2HP7.pdb structure colour coded for conservation (strong (purple)-weak (blue)). ii. Network centrality profile of FliM M alone (gold symbols) is identical to the FliM M profile in the composite FliG M .FliG MC network. Thus intra-domain correlations determine the centrality of FliM M residue positions in the composite network. The horizontal short dashed lines around the zero mean difference (bold dashed line) show the (±σ) deviation expected due to residual correlations (as in 3B). There are no significant peaks in the FliM M difference profile (dotted gold line) or the FliM M FliG MC inter-domain correlations, consistent with the dominance of FliM M intradomain correlations. Centrality profiles of the CW (red) and CCW (blue) chemotactic networks show distinct CW (red arrows) and CCW (blue arrow) primary nodes. With the exception of CW node 4, the nodes are within or adjacent (< 7 residues) to the conserved surface patches. (B) Map of high-scoring correlations (white lines) between residue positions (gold stick side chains) in 2HP7.pdb (gold cartoon C α backbone). Red spheres mark residue positions equivalent to positions targeted by CW mutations in Salmonella. Numbers mark CW primary node segments identified from the centrality profile in A. (C) The distribution of correlation values as a function of the C α -C α distance between the paired residues. Shaded grey area demarcates the contact zone (< 12 angstroms). The short dashed line marks the 3σ threshold for high-scoring correlations.  The high-scoring correlations (coloured lines) between residues (numbered with yellow side-chains) mapped onto the enlarged 4FHR.pdb contact. Line colour denotes correlation strength (strong (orange / red)-weak (purple)). The correlated FliM residues cluster at two locations along the FliM M loop M131-E147 (CCW node 1), namely the G 132 GXG 135 motif and I 144 -G 147 . The correlated FliG residues in the FliG M segment P116-E170 cluster at E 126 HPQ 129 motif plus residues T310, A132 in the adjacent helix and two residues (I162, A163) in the helix neighbouring it. (B) i. The mean strength, S M and connectivity, C of matrices (n = 1000) with elements " P n i¼1 P n i¼1 ði; i þ 1Þ"; of value 1, generated by permutation from the list "I, i+1, i+2 . . . n" of mutated residue positions, were used to create a population of dummy CW or CCW networks to estimate significance. The S M and C of the CW dummy networks generated from the lists were 0.90±0.08 and 0.91±0.05 respectively of the real CW FliM M sub-network. Thus, the CW mutations target the more prominent features of the FliM M network. This is not the case for the CCW mutations. The S M and C of the CCW dummy list was 1.04±0.17 and 1.00±0.10 respectively of the real CCW FliM M network.
The EHPQ motif forms the dominant primary node in the complete FliG network  Table 2).    (Fig 3B). The mean randomized MSA library (black symbols) and the corrected difference (dashed cyan line) profiles are also shown. Vertical lines delineate domains. Horizontal dashed lines mark the expected deviation due to residual correlations (+2σ (red), -σ (black)). Arrows (red) denote primary nodes. The peak modes, numbered from N to C terminal (3HJL.pdb residue positions), are FliG N 86K, FliG M H128 (EHPQ motif) and K161, FliG C K235 (adjacent to MFXF motif), D249, S282 and Q308. The gaps in the profile are due to deletion tolerant sequence segments (yellow patches in 3HJL.pdb (colour coded for conservation as 2HP7.pdb (Fig 4Ai) nodes can be identified in the difference profile. The EHPQ motif forms the dominant node 2 with the highest W, out of the seven nodes identified. As for FliM M FliG MC , we used the available structures to determine the dependence of correlation strength on the physical distance between correlated residues. However, conformational heterogeneity was evident in the FliG MC structures (Fig 6B). Superposition shows that the heterogeneity as assessed by the root mean square deviation (RMSD) is due to the inter-domain linkers, since the individual domain RMSDs are lower than the overall RMSD. Within the subdomains, ARM-C is the most, and the C3-6 the least, heterogeneous. Short-range correlations that represent contact interactions (< 12 angstrom distance (shaded block)) were notably stronger than long-range non-contact (< 12 angstrom) correlations, as shown for the two extreme conformations (1LKV.pdb = extended, 3AJC.pdb = compact) (Fig 6C). Contact correlations have 30% greater strength and over two-fold greater fraction of high-scoring correlations, F (= high-scoring / total), than non-contact values. The mean strength / F for the 3AJC. pdb contact population were 2.11 ± 1.15 / 0.13, versus 1.6 ± 0.52 / 0.05 for the non-contact population. The mean strength / F for the 1LKV.pdb contact population were 2.09 ± 1.13 / 0.13, versus 1.48 ± 0.52 / 0.06 for the non-contact population.
In conclusion, the FliG network is different from the FliM M network in that it has distinct maxima in the centrality profile and contact distance dependence. The FliG domains have comparable W, in the network, in contrast to FliM M and FlIG MC in the composite network.

The torque helix alters the pin-wheel FliG ARM domain network architecture
The contact correlations provide insight into internal ("intra-domain") architecture of the domains. The coevolution matrices for the C3-6 and ARM-M sub-domains are shown together with their centrality profiles (Fig 7). The images (Fig 7A and 7B) show the high-scoring correlations mapped onto the 3HJL.pdb fold. A central helix (H8 in ARM-C, H17 in C3-6) in contact with the surrounding helices forms the core of the fold. H1 is the central helix for N1-4 (S1 Fig). In both N1-4 and ARM-M, these helices constitute the primary nodes of the contact networks. The high-scoring correlations radiate out in a pin-wheel pattern from these hub-helices. In C3-6 the pin-wheel is disrupted and the hub-helix (H17) no longer forms a primary node, even though its internal helix contacts are conserved. Instead, the primary node is now the torque helix (H19). The conserved α-helical architecture of the torque helix and adjacent helices, as well as the ARM-M hub adjacent to the EHPQ motif, is evident as bands four residues apart along the diagonal in the matrices (Fig 7). The loops connecting the torque helix to adjacent helices constitute nodes 6 and 7. The ARM-M hub-helix is adjacent to the dominant EHPQ node 2. The contrast between N1-4 and C3-6 is of interest since both sub-domains have a similar fold [6]. It argues that the torque helix is pivotal to coevolution of the C3-6 fold.
A three-node FliG M FliG C inter-domain network links the EHPQ motif to the C3-6 fold FliG inter-domain networks were characterized by isolation and analysis of off-diagonal blocks within the complete matrix to define domain interactions. Their centrality profiles were compared against the complete FliG profile (Fig 8A). The nodes for the FliG M FliG C interaction network superimposed with the complete network primary nodes 2, 3 4, with a weaker contribution from primary node 7. The nodes were localized at or close to the surface. E. coli cross-link data document the surface proximity of nodes 3 and 7 through formation of FliG oligomers ( Table 1). The same three nodes were also the target for CW mutations in Salmonella, as discerned from the centrality profile of the CW network. Dummy lists were constructed to evaluate statistical significance. The CW network and the dummy lists were both  (Fig 6A). (A) The C3-6 coevolution matrix. The torque helix H19 is the primary node (red arrow) in the centrality profile. The short, hub helix (H17 -black arrow) is adjacent to the linker between the torque-helix and the terminal helix (H20) Image: The highscoring correlations (white lines) mapped onto the 3HJL.pdb C3-6 (cyan backbone, yellow side-chains). H19 (conserved charged residues = red side-chains) and H17 (black backbone) are marked. (B) The ARM-M coevolution matrix. Correlations between a short helix (H8 -black) and surrounding helices form a pin-wheel pattern. The H8 helix adjacent to the EHPQ motif forms the primary node (black arrow) in the ARM-M contact centrality. Image: The high-scoring correlations mapped onto 3HJL.pdb ARM-M. The correlations and backbones are coloured as in A.  Primary node 1 within FliG N and an adjacent surface segment formed nodes for interactions with FliG MC (Fig 8A). The interactions are not expected from the structure of the A. aeolicus full-length FliG in which FliG N is separated by an intervening long helix from the rest of the protein. The long-helix may not be a common feature since it is formed, in part, by a deletiontolerant sequence segment. Cross-link data indicate that FliG N is in spatial proximity to FliG M in E. coli [28], consistent with this idea. The mean FliG C W was notably less than for FliG M in the FliG N and FliG MC interaction network centrality profile, No nodes were identified within the FliG C section of this profile.

The major interactions of the FliG signal transmission pathway
Computation of the S M and C of the FliG short and long-range interaction networks followed the examination of the node weights above. The parameters are listed in Table 3 and the results are summarized as a bar chart (Fig 8B). Among the short-range, intra-domain networks, that for C3-6 has both the greatest S M and C; notably greater than the corresponding metrics for N1-4. The normalized S M value for C3-6 is comparable to FliM M , though C is lower. The intra-domain and inter-domain FliG networks. The values have been normalized relative to the randomized library (mean (thick dashed line) ± σ (thin dashed lines)) as in Fig 5B. (C) The high-scoring long-range (>20 angstrom) correlations (white lines) mapped onto the 3HJL.pdb domains. The C α backbone segments are coloured according to the centrality profiles in A, The numbers denote the three nodes, the white spheres the node residues and yellow side-chains other correlated residues. Insets (bottom left panels) show relative S M (circle diameter) and C (line thickness) of the 3-node networks.
doi:10.1371/journal.pone.0142407.g008 ARM-C connectivity, C (19% of the randomized library value), is markedly worse than for the other modules. The FliG domain interaction networks have S M values that are lower than for the intradomain networks, being only marginally greater than the mean S M for the randomized networks. The C values are two-fold lower than those for the intra-domain networks. These S M differences are consistent with the stronger correlations seen between contact pairs (Fig 6C) that mainly represent intra-domain couplings. The FliG M FliG C stacking contact observed in some structures (3AJC.pdb, 4FHR.pdb) has somewhat higher S M than the overall FliG M FliG C interaction network, analogous to the FliM M FliG M contact. However, its correlations are uniformly distributed over the contact helices (S1 Fig), in contrast to the FliM M FliG M contact (Fig 5A).
We constructed networks from the top three primary nodes ("3-node networks") for the long-range networks to evaluate whether these formed the major determinants for the interdomain interactions. This is the case. The FliG M and FliG C interaction is the strongest. The 3-node network of the FliG N interaction with FliG MC has 1.5 fold greater S M than the complete interaction network, while the 3-node FliG M and FliG C interaction network S M is 2-fold greater ( Table 3, Fig 8B). FliG C3-6, with correlations between nodes 6 and 7 (adjacent to the torque helix) and node 5 (H15 just after ARM-C), has the long-range (> 20 angstroms) network with the best connectivity, C, to complement its strong contact network; while its S M is comparable to the 3-node FliG M and FliG C interaction network. The 3-node (2, 3 and 4) CW network too has improved strength and connectivity ( Fig 8B, Table 3). The 3-node networks have comparable S M but lower C values relative to the FliM M domain ( Table 2). The C value for C3-6 (0.9 (Table 3)) is closest to that for FliM M (0.95 (Table 2)). The high-scoring correlations for the 3-node networks are mapped onto the structures in Fig 8C. The topology makes a contactbased rationale for the inter-node correlations improbable, though contacts may occur as a consequence of mobility [15] as considered in Discussion.
In summary, the covariance analysis identifies a pathway for signal transmission from the EHPQ motif to the torque helix. The pathway is built from a patchwork of inter-connected nodes (2, 3 and 4). Node 4 contains the MFXF motif that dominates the sparsely connected ARM-C network module. The sparse ARM-C connectivity suggests that conformational heterogeneity, seen in the superimposed X-ray structures (Fig 6B) smears out residue correlations. Based on both short and long-range correlations, C3-6 forms a conserved fold. A conserved C3-6 fold is in line with the hypothesis, based on the H. pylori FliG MC structures [15], that FliG C C1-6 responds as a unit to conformational changes within FliM M triggered by CheY. These changes must be relayed, in part, via the EHPQ ARM-M hub (node 2).

The coevolution of FliG M with FliG C is detected by phylogenetic tree similarity
We constructed the phylogenetic tree of the FliG C domain to, first, learn more about its evolution (Fig 9A). The FliG C phylogenetic tree was colour coded to assess clustering. While both monophyletic and paraphyletic branches were observed, the former were predominant. The firmicutes were the most, and the δ-proteobacteria the least, monophyletic. The α-proteobacteria were the most paraphyletic; consistent with their diversity. The monophyletic branching was consistent with the neutral model of molecular evolution that posits that neutral mutations due to genetic drift are retained with selection based on phenotype while deleterious ones are rapidly eliminated ( [39] and references therein). The clustering was disrupted by presence of multiple FliG orthologues in the domain Pfam seed set used for construction of the tree, including two with duplicate flagellar systems in the set of commonly studied species. In some cases, possibly due to horizontal gene transfer, one orthologue localized to a branch for another phylum (eg. V.alginolyticus). In other cases a phylum (eg. α-proteobacteria) was partitioned between disconnected branches with representatives (eg. R. Sphaeroides) divided accordingly.
Second, phylogenetic tree similarity offered an independent alternative, with metrics limited by different factors, to check that S M , was greatest for the interaction of FliG M with FliG C . For the similarity comparison, the FliG C seed sequence MSA was used to extract matching FliG N , FliG M and FliM M sequences from the corresponding MSAs in the Pfam database (Methods). For species with multiple FliG orthologues, the single FliM sequence was paired with each FliG sequence. The FliG C tree was the most compact in terms of branch length, consistent with C3-6 residue coevolution (Fig 9B). Domain phylogenetic tree topologies were compared in duplicate for each of two reference trees (FliG M and FliG C ) to check for self-consistency. Coevolution between FliG C and FliG M was detected regardless of choice of reference tree, while coevolution of these domains with either FliM M or FliG N was not. The sensitivity of similarity measures scales with sequence length and is possibly compromised by the short domain sequences. In any case, similarity detection between the FliG C and FliG M trees supported the evidence from the covariance analysis that the interaction between FliG C and FliG M was the strongest.

Discussion
We have determined residue coevolution for FliM M alone, FliG alone and FliM M FliG MC in complex. We separated intra-domain from inter-domain correlations, identified inter-subunit associations, and assessed network disruption by chemotactic lesions. We developed metrics based on network analysis to measure the correlations. We cannot presently relate the metrics to biochemical parameters such as binding affinity because the coevolution signal may be modulated by a number of factors as illustrated in Fig 10A. PSICOV and related algorithms have been optimized to detect hard-wired, native contacts based on static electrostatic or steric constraints, but a large macromolecular assembly such as the switch complex is likely to form a conformational ensemble with diverse dynamics. However, guided by the structural data, we are able to provide a description of the flagellar switch architecture that reveals both common elements as well as possible sources of mechanistic and species diversity.

The FliM M array forms a concerted switch element
The extended network connectivity of FliM M indicates the importance of the FliM M fold as well as self-association. We take a high mean correlation strength, S M , and connectivity, C, of short-range contact correlations as indicators, most simply, of a compact structural fold that is conserved over species. Our data are consistent with molecular dynamics simulations that reveal the high mechanical stability of α/β/α sandwiches [40]. They are also in line with models that propose a central role for FliM M in triggering switching of rotation sense [20,28]. Monte Carlo simulations of conformational spread in the multi-subunit c ring have shown strong coupling between subunits is required to generate the observed two-state switching behaviour [8]. The conserved FliM M inter-subunit contacts suggested by the long-range correlations are consistent with this requirement and, furthermore, identify FliM M as the key determinant for the proposed conformational spread.
The contacts are known targets for che mutations [5]. They seem to be stabilized for the conformation representative of the Salmonella CCW rotation state, as they are disrupted to a greater extent by CW mutations. Three of the four nodes in the CW mutation coevolution network map to segments previously implicated in FliM self-association. The role of the fourth node is presently unknown. The interfacial surface covered by the coevolved contacts is large. So switching would be attenuated, but not determined by the variations in subunit stoichiometry or localization of the CheY binding sites. Change of one residue (X 0 ) causes change in a unique partner (Y 0 ) to preserve fold. Contacts that produce weak correlations fall into four groups. Diverse: X 0 has multiple partners due to conformational heterogeneity, or variable subunit symmetry in the case of surface residues. Permissive: X 0 tolerates multiple partners due to absence of strong constraints. Only certain residues that disrupt the contact interface are forbidden. Compliant: Y 0 is part of a structural element that is mobile or subject to local denaturation ("melting"). Hinged: X 0 and Y 0 are hinge elements coupled via a chain of residues. Alteration in one hinge triggers compensatory change in the other to preserve orientation. B. Signal transmission in the flagellar switch complex. The FliM M (gold backbone) fold and inter-subunit contacts are both important for its function. Arrows (gold) denote conformational spread in the FliM M array. The FliG C3-6 motor sub-domain (dark-green) is organized around A dedicated motor module The FliG C domain (C3-6) based on its coevolved network as measured by all three metrics (W, S M and C), also has a compact fold. The torque helix H19 is central to the C3-6 coevolution network. The H10 contact correlations modify the pin-wheel architecture found for the other FliG ARM domains. This knowledge supplements the conservation of its charged residues responsible for designation of H19 as a torque helix. For torque helix movements to be entrained to C1-6 global motions [15], it needs to be immobilized by contacts with adjacent helices. Our analysis implies this is the case. Accordingly, we propose that the C3-6 subdomain has been dedicated for motor function.
Primary nodes 6 and 7 flank the torque helix ( Fig 7A) and interact strongly among themselves (Fig 8B and 8C). Node 6 is a binding target for the c-di-GMP binding protein YcgR [41] in presence of c-di-GMP, a molecule that regulates several cellular behaviours. Cross-link data indicate that node 6 residues from neighbouring subunits form adjacent surface patches [4] that may function as allosteric sectors (see below). It will be of interest to determine whether node 6 serves as hinge to control C3-6 movements in response to chemotactic stimulation.

Relay of allosteric sectors
The primary nodes of the coevolved FliG M and FliG C interaction network are the third feature of the common switch architecture. These nodes could constitute an allosteric relay. Studies on dihydrofolate reductase as a model system have shown that inter-connected surface sites, termed "sectors", are preferred locations for allosteric control. These sectors were hot-spots for deleterious mutations [42]. The primary nodes that wire the EHPQ motif to the C3-6 motor domain have the properties observed for the dihydrofolate reductase sectors; namely distributed spatial organization that, in this case, wires the torque helix to multiple distant surface patches. YcgR may then act as allosteric effector. Furthermore, adjacent subunits could play a similar role in the multi-subunit assembly. Cross-links between residues in nodes 2 and 3 and within node 6, result in the formation of E. coli FliG oligomers. The E. coli cross-links could document mobility, analogous to the cross-links between nodes 4 and 7 in H. pylori (Table 1), consistent with transient association of adjacent subunits for allosteric regulation through freezing out of motions [43]. The dominant EHPQ motif node 2, adjacent to the ARM-M hub helix H8, forms one nexus of a two point FliM M FliG M contact. Node 3 includes the GGXG motif and a large conserved surface patch. Node 4 in ARM-C contains the MFXF motif [15]. Nodes 2 and 3 also interact with node 1 in FliG N . The relevance of the FliG N FliG M interaction for the switching mechanism, if any, is not known. The conservation of the motifs as well as the fact that they were targeted by CW chemotactic mutations was prior knowledge. Their coevolution is the new knowledge revealed by the present study.
Phylogenetic tree similarity measures provide independent support for FliG M coevolution with FliG C. The detection of allosteric contacts by covariance analysis is a debated topic [44], since multiple allosteric pathways exist within protein domains [45]. We favour the possibility that signal transmission between FliM M and C3-6 is mediated by allosteric inter-node couplings, but further work is needed, in particular protein dynamics [46], to elucidate these couplings.

Sources of mechanistic and species diversity
The ARM-C sub-domain is an element of particular interest since, although its MFXF motif (node 4) is integral to FliG network architecture, the sub-domain has sparse connectivity. Multiple factors can contribute (Fig 10A), but the structures suggest an explanation. ARM-C is characterized by conformational heterogeneity within and between species (Fig 6B). Segments of this domain are deleted in many species, while the helix linker connecting ARM-C to ARM-M has segments that could not be resolved in a number of X-ray structures. This linker is truncated or absent altogether from many sequences in the MSA, as is the linker between FliG N and ARM-M, and could also contribute to species diversity. ARM-C must report changes in FliM M conformational state triggered by CheY to C3-6, either via FliG M [17] or directly [20]. The coevolution signal for the ARM-M ARM-C stacking contact [28] seen in some T. maritima structures was weak relative to ARM-C ARM-M primary node interactions. There was also no signal for the E. coli ARM-C interaction with FliM M documented by numerous lines of evidence [17,23,28]. The coevolution signal for dynamic contacts may be smeared out by the ARM-C conformational heterogeneity due to the flexible loops. The heterogeneity may generate an ensemble of states from two (CW and CCW) FliM M states, as argued [47] to account for the diversity in motile behaviour seen across species.
A second element that may contribute to diversity is the contact between FliM M and FliG M . The contact is built from two FliM M residue segments in the loop at the pseudo-symmetry centre of the domain in both the T. maritima and H.pylori, structures [14] A two-point contact with flexible spacing provided by the loop accommodates the variable FliM stoichiometry [48], as well as participation of different protein components. Many species with multiple flagellar systems, for instance those identified in Fig 10, have duplicate fliG genes whose products must both associate with a single FliM. Furthermore, FliM subunits may contact FliG C as well as FliG M within the C ring, as proposed for the E. coli flagellar motor [20,49]. Finally, FliY may also contact FliG in addition to FliM in species that have both proteins, H. pylori for example. Strong contact between FliM and FliG is not required if the FliM M inter-subunit contacts are conserved in the common switch design to ensure conformational spread. FliG subunits can then be mobilized by the cooperative transition along the FliM M array to report FliM M conformational state to the proximal FliG C3-6 motor domain.
Our conclusions are summarized in Fig 10B. FliM M and FliG C3-6 form the dedicated switch and motor domains respectively of the switch complex. FliM M self-association is important for its function during chemotaxis, consistent with the proposed role of conformational spread [8]. The FliG ARM-C domain has weak intra-domain connectivity that reflects the conformational heterogeneity captured by the X-ray structures, but its MFXF motif forms a key interaction node. The circuit connecting the switch and motor domains consists of a chain of nodes, of which the EHPQ motif / ARM-M hub helix form the dominant node. The nodes have properties analogous to the sectors described for allosteric networks.

Methods
The Methods sections correspond to the boxes in Fig 2 that outlines the computational strategy.

MSA analysis
Sequences and alignments for the FliG N (PF14842), FliG M (PF14841) and FliG C (PF01706) domains, and FliM M (PF02154) were downloaded from Pfam [50]. The full-sequence Pfam alignments (2000-2600 sequences) are based on construction of a HMM from a curated seed alignment with HMMER3 [51] that was subsequently used to search the sequence database.
The MSAs were inspected with JALVIEW [52]. The Pfam headers were replaced with the more comprehensive Uniprot (http://www.uniprot.org) headers for concatenation of the unaligned and aligned sequences. MSA quality was assessed by measurement of the Shannon entropy of residue positions (S i ).
where p ij is the fraction of sequences at residue position i occupied by amino acid j. The entropy tends to a minimum value as conservation increases. Gaps are treated as another residue. The domain MSAs were downloaded (Pfam) or generated (CONSURF), then concatenated to obtain overall alignments. CONSURF computes residue conservation based on physico-chemical similarity [53] or evolutionary rate reliant on sequence phylogeny [54]. Alignment of the gap regions provided a metric of alignment quality.

Coevolved mutations
We used the PSICOV (precise structural contact prediction using sparse inverse covariance) algorithm [38] to compute correlations between residue positions. PSICOV employs arithmetic product correction [55] and normalized mutual information (nMI) [56] to minimize the effects of phylogenetic bias. Sparse inverse covariance estimation based on the glasso algorithm [57] minimizes indirect couplings. The mutual information (MI) between two positions (i,j) in a MSA is the difference between the sum of the Shannon entropy of the individual positions (S i , S j ) and their joint entropy, S ij . The correlation measure is the direct information, Dij, between two residue positions, where Wij, Wii and Wjj are the inverse of the nMI matrices respectively [58]. The distribution of Dij values is normalized by subtraction of the mean values in the two columns for the residue positions. The coevolution matrix is formed from the normalized Dij values. Shuffling eliminates correlations between residue positions. The comparison of the real correlation value with the distribution of values from a shuffled population provided a statistical estimate of its significance. Significant correlations ("high-scoring" correlations) were taken as those whose Dij values exceeded the distribution mean by 3σ, where σ was the standard deviation of the randomized library distribution.

Network Analysis
The PSICOV coevolution matrices were used to generate a network model, with the residues as nodes and correlations represented by edges. Bio3D [59] was used for computation of the entropy and analysis of model networks. The matrices were analysed with the igraph network library in R (http://www.igraph.org). Their network representations were examined with Cytoscape [60]. The primary nodes of the network were identified as 6 residue segments whose mean weight, W, in the difference centrality exceeded the distribution mean by 2 σ, with σ based on the randomized library distribution.

& 5. Phylogenetic Tree Topology
Domain coevolution was assessed by phylogenetic tree similarity [61]. We paired the headers of the Pfam FliG C seed sequence MSA (80 sequences) to headers in the full-sequence FliG N , FliG M and FliM M MSAs. Approximately maximum-likelihood phylogenetic trees for constructed from the FliG C MSA and each of the paired MSA using Fast Tree [62]. The paired MSAs were then quered to determine the best match to the topology of the FliG C tree. The process was repeated with another tree as reference. The reliability of tree splits was determined from 100 bootstrap replicates. The results were analysed by CONSEL [63]. CONSEL outputs the log-likelihood difference between the reference and query domain MSAs for the reference tree topology (OBS) and the Shimodaira-Hasegawa test probability (SH) that the reference tree topology is generated by the query MSA In contrast to the standard bootstrap probability, SH corrects for bias due to different sequence length. An alternative approach, based on distance matrices between all protein pairs selected from the similarity in residue composition [64] gave similar results, but was not pursued due to its limitations for analysis of paralogs [65]. . Residue positions absent from, or not resolved in, the structure sequence were eliminated with a custom script. The PSICOV algorithm was modified to output residue type together with residue position. The match for residue type ensured the high-scoring correlations were mapped correctly onto structure. Physical distances between correlated residue positions were computed from the C α atoms coordinates in the maps. The C α backbones of domains and complexes in the structures were superimposed to assess conformational heterogeneity with analysis tools in GROMACS version 4.5.5 [66]. Superposition was based on a common set of equivalent residue positions identified from the MSA. Determine of topology used the POPS web server [67] to detect surfaces based on residue solvent accessibility and estimate surface hydrophobicity / hydrophilicity. Conservation based on evolution rate, computed with CONSURF, in combination with the POPS score filtered for conserved surface patches. Results were visualized in VMD (http://www.ks.uiuc.edu/Research/vmd) and Pymol (http://www.pymol.org/).