The LysE Superfamily of Transport Proteins Involved in Cell Physiology and Pathogenesis

The LysE superfamily consists of transmembrane transport proteins that catalyze export of amino acids, lipids and heavy metal ions. Statistical means were used to show that it includes newly identified families including transporters specific for (1) tellurium, (2) iron/lead, (3) manganese, (4) calcium, (5) nickel/cobalt, (6) amino acids, and (7) peptidoglycolipids as well as (8) one family of transmembrane electron carriers. Internal repeats and conserved motifs were identified, and multiple alignments, phylogenetic trees and average hydropathy, amphipathicity and similarity plots provided evidence that all members of the superfamily derived from a single common 3-TMS precursor peptide via intragenic duplication. Their common origin implies that they share common structural, mechanistic and functional attributes. The transporters of this superfamily play important roles in ionic homeostasis, cell envelope assembly, and protection from excessive cytoplasmic heavy metal/metabolite concentrations. They thus influence the physiology and pathogenesis of numerous microbes, being potential targets of drug action.


Introduction
Members of the LysE superfamily have long been known to catalyze solute export [1]. Three families had been shown to comprise this novel superfamily: (i) L-lysine and L-arginine exporters (LysE); (ii) homoserine/threonine resistance proteins (RhtB); and (iii) cadmium ion resistance proteins (CadD) [1]. While LysE and RhtB proteins catalyze export of amino acids, the more distant CadD proteins are involved in efflux of the heavy metal ion, cadmium (Cd2+) [1,2,3]. Most members of these families share similar sizes, around 200 amino acyl residues, similar hydrophobicity plots indicative of 6 transmembrane α-helical segments (TMSs), high degrees of sequence similarity within but not between families and prokaryotic origins [1].
In this paper, we report investigations allowing expansion of the LysE superfamily to include members from all three domains of life. Using computational methods, we demonstrate that the previously established members of this superfamily are homologous to members of eight additional families: (i) tellurium ion resistance proteins (TerC); (ii) iron/lead transporters (ILT); (iii) Mn2+ exporters (MntP); (iv) Ca2+/H+ antiporters-2 (CaCA2); (v) Ni2+/Co2+ transporters (NicO); (vi) neutral amino acid transporters (NAAT); (vii) peptidoglycolipid addressing proteins (GAP); and (viii) transmembrane electron carriers (DsbD).
Accession numbers are provided for protein sequences used to generate the tree in Fig 7, and these accession numbers are found in the zip file "newick and SFT fasta.zip" in the file "FASTA_sequences_superfamily_tree.faa." The newick file of the 100 trees used to generate a consensus SFT tree is also contained in the zip file "newick and SFT fasta.zip." All multiple sequence alignments described in the manuscript are found in the zip file "Multiple_Sequence_Alignments_zip." Improved explanations for obtaining the data are located in the revised figure legends at the end of the manuscript.
Funding: This work was funded by National Institutes of Health (http://grants.nih.gov/grants/oer.htm), Grant #: GM077402. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Identifying Internal Repeats
The multiple alignment file produced by ClustalX was used as the input for IntraCompare, a program for the detection of internal repeats. AveHAS plots generated from the respective multiple alignment files were consulted to locate comparable regions of interest. IntraCompare generates comparison scores, expressed in standard deviations (S.D.), for non-overlapping regions of the same homologous proteins [15].
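A comparison score expressed in S.D. can be read as a z-score: the alignment score of the real pair is compared with the distribution of scores obtained after shuffling one sequence. The sketch below illustrates that idea only. The scoring scheme (match, mismatch and gap values), the shuffle count and the function names are assumptions chosen for illustration, and they do not reproduce the scoring matrices or settings of IntraCompare or GSAT.

```python
import random
import statistics


def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score under a toy scoring scheme."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def comparison_score_sd(a, b, shuffles=200, seed=0):
    """Z-score of the real alignment score against shuffled versions of b."""
    rng = random.Random(seed)
    real = global_alignment_score(a, b)
    null_scores = []
    for _ in range(shuffles):
        shuffled = list(b)
        rng.shuffle(shuffled)  # keeps composition, destroys residue order
        null_scores.append(global_alignment_score(a, "".join(shuffled)))
    sd = statistics.pstdev(null_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (real - statistics.mean(null_scores)) / sd
```

Shuffling preserves amino acid composition, so a high score reflects residue order rather than shared hydrophobic composition, which matters when comparing membrane proteins.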

Motif Analyses
Motif analyses were carried out using the MEME program (The MEME Suite; http://meme.nbcr.net/meme/) [16]. Default settings were used to search for ungapped, conserved residues within a given set of homologues. Results from HMMTOP were used to locate the conserved regions relative to the predicted TMSs. Motifs identified for each family were then compared with those of other families to identify similar patterns of residue conservation.

Construction of Phylogenetic Trees
Phylogenetic trees were derived using multiple programs. RAxML and FastTree methods were explored using raxmlgui [17]. Phylip-formatted multiple alignments generated using ClustalX, Mafft and Probcons were used as inputs to generate FastTree trees for each protein family in this study. In addition, a Phylip-formatted multiple alignment of members from all eleven families was generated with Mafft and used to create a set of 100 trees using the RAxML method of analysis [18]. The Mafft alignment used for the RAxML tree analysis was generated using the Mafft-homologs function with 200 homologs retrieved per input sequence at a threshold of 1e-20 [12]. All FastTree trees and the best tree indicated by the RAxML method were viewed using FigTree. SuperfamilyTree (SFT) [19,20,21,22,23,24,25,26] and TreeView [27] were also utilized. Agreement among the 100 trees was evaluated. FASTA-formatted sequences corresponding to the TC families were used as input to compile tens of thousands of NCBI BLAST bit-scores, upon which the SFT trees were based. The SFT and Fitch programs then generated a default of 100 superfamily trees based on the results. These 100 trees were used to create a consensus tree [19,20,21,22,23,24,25,26]. The parameters for these programs are described in S1 Fig.
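Evaluating agreement among a set of trees, and building a consensus from them, rests on counting how often each grouping of taxa recurs. The following is a minimal sketch of that counting step for rooted trees written as nested tuples. The representation, the function names and the 0.5 majority threshold are assumptions for illustration; this is not the procedure implemented by the SFT or Fitch programs.

```python
from collections import Counter


def clades(tree):
    """Return (leaf_set, clades) for a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found |= child_clades
    found.add(leaves)  # the group of leaves under this internal node
    return leaves, found


def clade_support(trees):
    """Fraction of trees in which each clade appears."""
    counts = Counter()
    for tree in trees:
        _, found = clades(tree)
        counts.update(found)
    return {clade: n / len(trees) for clade, n in counts.items()}


def majority_clades(trees, threshold=0.5):
    """Clades with more than one leaf found in more than `threshold` of trees."""
    return {c: s for c, s in clade_support(trees).items()
            if s > threshold and len(c) > 1}


# Toy topologies with placeholder taxa A-D (not data from this study).
toy_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
]
support = clade_support(toy_trees)
```

In the toy input, the grouping of A with B appears in two of the three trees and would be retained in a majority-rule consensus, whereas the grouping of A with C would not.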

Controls
The Mitochondrial Carrier Family and the LysE superfamily. Members of the mitochondrial carrier (MC) family have been shown to transport keto acids, amino acids, nucleotides, inorganic ions and cofactors across the membranes of mitochondria and other eukaryotic organelles [36,37]. Crystal structures for MC proteins have been elucidated, and these 6-TMS proteins were shown to have arisen via a 2-TMS triplication [28,38,39]. Members of the LysE superfamily, however, are predicted to have arisen via a 3-TMS duplication. Because of the differences in these two evolutionary pathways, MC proteins were selected as a negative control to establish the highest possible comparison score that can be obtained by chance using non-homologous members of two unrelated superfamilies (Tables 2 and 3). The best comparison score between 3-TMS segments of the MC and LysE superfamily members was 10.5 S.D. This score was obtained between proteins of the MC family and the CaCA2 family. The average score for the five best comparisons between LysE superfamily members and the MC family was 9.8 S.D. Although at least 3 TMSs of members of these two superfamilies were included in each alignment, the TMS alignments were poor (S16J and S16K Fig). TMS overlap in the alignments is presented in Table 2. In contrast, the average score for all of the best comparisons of the eleven families of the LysE superfamily with each other (Table 3) is 13.5 S.D., and corresponding TMSs were strongly aligned. Based on these results, we suggest that three conditions are sufficient to provide strong evidence for homology: (1) a standard comparison score of at least 13.0 S.D.; (2) proper alignment of at least 3 TMSs; and (3) a unified evolutionary pathway for all superfamily members (Fig 1). These criteria were satisfied for all eleven families of the LysE superfamily.
Schematic diagrams depicting motifs and highly conserved residues within and between the CaCA2 (C2) and ILT families. Highly conserved residues were identified using alignments generated from Mafft. In Part C, the MEME/MAST Suite was used to generate the graphical logo, and the alignment was presented using the ClustalX2 user interface with the associated Mafft multiple sequence alignment (MSA).
LysE protein, TC# 2.A.75.1.1 (P94633). In addition, a score of 52.0 S.D. was obtained when comparing the full sequences of Bth1 with RhtB protein, TC# 2.A.76.1.5 (P76249). These comparison scores satisfy our statistical standards for homology, and thus, we apply the superfamily principle to confirm that these two families are related (Table 3). The relationships between CadD proteins and LysE and RhtB proteins are not apparent based on our statistical standards for sequence similarity. Additional evidence will be discussed to expand upon these relationships and establish homology.
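The three homology criteria stated in the Controls section (a comparison score of at least 13.0 S.D., alignment of at least 3 TMSs, and a shared evolutionary pathway) can be written as a simple predicate. The sketch below is only a restatement of those criteria; the class and field names are hypothetical, and whether TMSs are "properly" aligned or a pathway is shared still has to be judged from the alignments and hydropathy plots.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    score_sd: float        # comparison score in standard deviations
    aligned_tms: int       # number of TMSs judged to be properly aligned
    shared_pathway: bool   # same proposed evolutionary pathway (e.g. 3-TMS duplication)


def meets_homology_criteria(c, min_score_sd=13.0, min_aligned_tms=3):
    """True when all three criteria from the text are satisfied."""
    return (c.score_sd >= min_score_sd
            and c.aligned_tms >= min_aligned_tms
            and c.shared_pathway)


# The MC negative control reported above: 10.5 S.D., different pathway.
negative_control = Comparison(score_sd=10.5, aligned_tms=3, shared_pathway=False)
# A score and TMS count of the kind reported for between-family comparisons.
candidate = Comparison(score_sd=14.5, aligned_tms=5, shared_pathway=True)
```

Here `meets_homology_criteria(negative_control)` is false on both the score and pathway criteria, while `meets_homology_criteria(candidate)` is true.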
Ca2+/H+ antiporters-2 (CaCA2; TC# 2.A.106). Members of the family of Ca2+/H+ antiporters, CaCA2, contain around 200-350 amino acyl residues, with 6 TMSs, typically with a 3+3 TMS arrangement, and are found in all three
Table 4. Protein families with Demonstrated Internal Repeat Elements. UniProt accession numbers are provided in Column 2. The TMSs aligned refers to the positions of the TMSs from the N-terminus. For 6-TMS proteins, we find the 3-TMS internal repeat elements occur as two tandem 3-TMS elements for all families examined. For 7-TMS proteins, we find the 3-TMS internal repeat elements in the first 6 TMSs, suggesting these 7-TMS proteins have a 3+3+1 topology. The GSAT alignments generated using 20,000 shuffles for these comparisons are presented in Column 6.
Mn2+ exporters (MntP; TC# 2.A.107). Similar to previously established members of the LysE superfamily, members of the MntP family are characterized by a size of around 200 amino acyl residues with 6 TMSs in a 3+3 TMS arrangement. They are found exclusively in bacteria and archaea. A member of this family, YebN, is known to export manganese ions [34,40]. YebN has been suggested to share significant sequence similarity with members of the LysE family of efflux pumps [34]. 6-TMS MntP proteins share sufficient sequence similarity with RhtB, CadD and CaCA2 family members to establish homology (Tables 2 and 3).
Iron/Lead Transporters (ILT; TC# 2.A.108). ILT family members are heavy metal ion transporters specific for iron and/or lead ions. Topological analyses confirmed that most members of the ILT family have 7 conserved TMSs in a 3+3+1 arrangement [31]. ILT protein sizes vary substantially due to the inclusion of large hydrophilic domains near the N-termini in many of these proteins. A majority of family members are found in bacteria and archaea, but some are also found in eukaryotes such as fungi. ILT proteins demonstrate significant sequence similarity with proteins of the CadD, RhtB and CaCA2 families (S5A-S5C Fig). Eli1 is predicted to have 7 TMSs, but HMMTOP and WHAT did not recognize a strongly hydrophobic region between predicted TMS#1 and TMS#2 as a transmembrane segment, thus suggesting that this protein has 8 TMSs. Finally, we compared TMSs 1-3 of the ILT homologue Sso1 (Q97V64) with TMSs 1-3 of the CaCA2 homologue Aan1 (F0Y333). This comparison yielded a score of 15.3 S.D. (S5C Fig). A score of 67.2 S.D. resulted when comparing the full sequences of Sso1 and ILT protein, TC# 2.A.108.3.3 (Q4J7V8). In addition, a score of 52.7 S.D. was obtained when comparing the full sequences of Aan1 and CaCA2 protein, TC# 2.A.106.1.1 (P52876). With this statistical evidence, we conclude that ILT is an additional member of the LysE superfamily. A comparison between ILT and TerC proteins also yielded high comparison scores (Tables 1 and 2).
Tellurium Ion Resistance Proteins (TerC; TC# 2.A.109). Members of the TerC family are believed to function in tellurium ion resistance [41]. These proteins share a 7-TMS core with a 3+3+1 TMS arrangement and are typically found in bacteria and archaea, but are also found in eukaryotic organisms [42]. Sizes for these proteins range from 180 to 350 amino acyl residues, with as many as 9 TMSs. Consistent with the proposed evolutionary pathway (Fig 1), no triplication could be demonstrated for these 9-TMS proteins. TerC members show significant sequence similarities with homologues from a large number of the different families (S6A-S6F Fig).
Of the TerC comparisons, the highest score was observed between TerC and CaCA2 family members (S6F Table 2. These relationships provide further evidence for the inclusion of the TerC family in the LysE superfamily.
Neutral Amino Acid Transporter Family (NAAT; TC# 2.A.95). NAAT family proteins are found exclusively in bacteria and archaea. The majority of these proteins have sizes between 190 and 280 amino acids with 6 predicted TMSs in a 3+3 TMS arrangement. The best characterized member of the NAAT family, SnatA, is involved in the uptake of the neutral amino acids glycine and alanine [35]. Several homologues have been annotated as multiple drug resistance proteins. However, a recent study provided evidence that disagrees with this functional assignment [43]. Significant comparison scores were seen between NAAT proteins and LysE, RhtB, CadD, MntP, and TerC family proteins (S7A-S7E Fig).
The best example of homology is seen with the comparison of TMSs 1-5 of the RhtB homologue Pag1 (L7BNM7) and the NAAT homologue Cba1 (H1S8A2), which yielded a score of 15.  2 (Q45153). These results provided strong evidence that NicO is homologous to the previously discussed families and support further expansion of the LysE superfamily. A significant comparison score between NicO and DsbD was also noted.
Peptidoglycolipid Addressing Protein Family (GAP; TC# 2.A.116). GAP family proteins are typically found in bacteria and are prominent in members of the mycobacterial genus. The majority of these proteins have sizes between 180-290 amino acids with 6 predicted TMSs in a 3+3 TMS orientation. The best characterized member of the GAP family, Q3L890 of Mycobacterium smegmatis, has been reported to play a role in biogenesis of the mycobacterial cell envelope via the transport of peptidoglycolipids [45]. The mechanism by which transport occurs is largely unknown. However, statistical relationships between GAP proteins and members of RhtB and DsbD families were determined (S9A and S10E Figs).
A comparison between sequences containing TMSs 1-5 of the RhtB homologue Hgr1 (F3KVR3) and the GAP homologue Ssp3 (NCBI: WP_019358971.1) yielded a comparison score of 14.5 S.D., demonstrating homology between the two families. A score of 16.6 S.D. was found when comparing the full length sequence of Ssp3 with that of the GAP protein, TC# 2.  [29]. Homology was established between DsbD and the RhtB, CaCA2, MntP, NAAT and GAP family proteins (S10A-S10E Fig).
In exploring these relationships, 6 TMSs of the NAAT homologue Pfu1 (Q8U2T5) were found to align with 6 TMSs of the DsbD homologue Dto1 (K0NNX9), yielding a score of 15.

Topological Analyses
Using ClustalX, Mafft and Probcons, we created multiple alignments for homologues within each family included in our study [11]. The alignments generated with each program showed a high degree of agreement. Because Mafft alignments produced residue patterns comparable to those of ClustalX without excessive expansion of the residue position axis (S11 Fig), Mafft alignments were selected to represent the data. With these Mafft alignments, we generated AveHAS plots to examine the average hydropathy, amphipathicity and similarity of the homologues (S11 Fig). Additionally, AveHAS plots were generated from multiple alignments of homologues for all families with established statistical relationships (Fig 2).
Examining the plots in S11A-S11K Fig, we observe that the homologues for the LysE, RhtB, CadD, CaCA2, MntP, NAAT, NicO, GAP and DsbD families are most similar in regions corresponding to predicted TMS#1 and TMS#6. Furthermore, these figures show that the largest hydrophilic region separates TMSs #3 and #4, corresponding to regions that are highly dissimilar. These analyses support a 3+3 topological arrangement for all LysE superfamily proteins. Homologues of TerC and ILT display a 7-TMS core (S11J-S11K Fig) but share the previous characteristics with LysE, RhtB, CadD, CaCA2 and MntP. With respect to the TerC and ILT proteins, we observe a predicted 3+3+1 topological arrangement (Fig 1), but many ILT family homologues have 8 predicted TMSs, with an additional hydrophobic peak at the N-termini. TerC proteins, on the other hand, can have between 6 and 9 TMSs, and additions may occur in either the C-terminal or the N-terminal regions of the sequences.
Finally, we examined a combined AveHAS plot of all eleven families with established statistical relationships. The plot (Fig 2) reveals a core of 6 TMSs among the different families with a large hydrophilic region separating the aligned core TMS#3 and TMS#4. These results further support a 3+3 TMS arrangement for members of the LysE superfamily.
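The average hydropathy component of such a plot can be illustrated by averaging a per-residue hydropathy scale over each column of a multiple alignment and then smoothing along the sequence. The sketch below uses the Kyte-Doolittle scale with a 19-residue window; the choice of scale, the window size and the function names are assumptions for illustration, since the settings used by AveHAS are not given here, and the amphipathicity and similarity traces are not computed.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydropathy(alignment):
    """Mean hydropathy per alignment column; gaps and unknown symbols are ignored."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for col in range(length):
        values = [KD[seq[col]] for seq in alignment if seq[col] in KD]
        profile.append(sum(values) / len(values) if values else 0.0)
    return profile


def smooth(profile, window=19):
    """Centered sliding-window mean, truncated at the ends of the profile."""
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        smoothed.append(sum(profile[lo:hi]) / (hi - lo))
    return smoothed
```

Peaks in `smooth(column_hydropathy(alignment))` mark candidate TMSs, and a broad trough between the third and fourth peaks would correspond to the large hydrophilic region described above.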

Identifying Internal Repeats
Previous work on the LysE superfamily suggested that members derived from a 3-TMS internal duplication to result in a 3+3 TMS arrangement [1]. A recent examination of ILT transporters suggested a 3+3+1 arrangement with two 3-TMS repeat elements followed by a single extra TMS [31]. In addition, CaCA2 and DsbD proteins have been suggested to contain 3-TMS repeat elements [29,32]. Using IntraCompare and GSAT, we report evidence for internal 3-TMS repeats in several members of the LysE superfamily (Table 4, S12-S15 Figs). This evidence supports the proposed hypothesis that all of these proteins arose via a common intragenic duplication event.
Strong evidence is seen in the 6-TMS CaCA2 Ssp2 protein (S12 Fig). Comparing the first and second halves of the Ssp2 protein (Q2JWH3), TMSs 1-3 and TMSs 4-6 were found to align. The comparison yielded a score of 13.5 S.D., which is sufficient to establish the existence of two homologous internal repeats. The existence of this internal repeat element confirms previous reports regarding the repeating ExGD(KR)(TS) motif in TMS#1 and TMS#4 of the CaCA2 family [32]. Since we have demonstrated that CaCA2 is a member of the LysE superfamily, the other LysE superfamily proteins are presumed to share the same evolutionary pathway.
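A repeated motif such as ExGD(KR)(TS) can be located in a sequence with a regular expression, which makes it easy to check whether occurrences fall in equivalent positions of the two halves of a protein. The sketch below reads (KR) and (TS) as alternative residues at one position and x as any residue, which is an assumption about the notation. The example sequence is invented for illustration and is not a CaCA2 protein.

```python
import re

# E, any residue, G, D, then K or R, then T or S.
# The lookahead allows overlapping occurrences to be reported.
MOTIF = re.compile(r"(?=(E[A-Z]GD[KR][TS]))")


def find_motif(sequence):
    """Return (1-based start position, matched residues) for each occurrence."""
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Invented toy sequence carrying two copies of the motif.
toy_sequence = "MLAEAGDKTLLVVAAGGPPELGDRSIIL"
hits = find_motif(toy_sequence)
```

For the toy sequence, `hits` lists one occurrence starting at position 4 and a second at position 20. With real proteins, the positions would be compared with the predicted TMS boundaries from a topology predictor.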

Motif Analyses
Previous mutation studies on the LysE protein in Corynebacterium glutamicum demonstrated the importance of highly conserved residues in the second and fourth hydrophobic segments of the protein [46]. A highly conserved aspartic acid (D) is present in the second hydrophobic segment of LysE, and its negative charge is essential for translocation of L-lysine. In addition, mutations to the fully conserved asparaginyl (N) and prolyl (P) in the fourth hydrophobic segment reduce export function dramatically. The prolyl residue in particular holds importance for three-dimensional structures of the carrier, and any changes in the neighboring asparaginyl residue would introduce steric hindrance. A fully conserved aspartic acid (D) is also present in the fourth hydrophobic segment, and has been proposed to bind the L-lysine substrate. Change of this aspartic acid (D) to a lysyl (K) residue resulted in an inactive protein. In the present study, motifs identified using the MEME/MAST Suite (www.meme.nbcr.net/meme/) for the different families were compared with one another (Figs 3, 4, 5 and 6, Table 5) [16]. Here we report strongly conserved residues within and between families.
CaCA2 vs. ILT. Sequences of 80 CaCA2 and ILT homologues were combined and found to exhibit a shared motif in TMS#3 in these 6-TMS proteins (Fig 3A and 3B, Table 5). Not only do the two motifs align in the MEME/MAST Suite, but all tested proteins also share many strongly conserved residues. Positions 1-2 of this motif correspond to the second half of TMS#3 that is shared in proteins of the two families. Of the 9 positions, amino acids in positions 1, 3, 5, 6 and 9 consist largely of hydrophobic residues. In positions 1 and 2, both families contain fully conserved phenylalanine (F) and glycine (G) residues, respectively.
At TMS#1 and TMS#4, both families contain two strongly conserved negatively charged amino acyl residues (D/E). Similar to proteins in the CaCA2 and ILT families, conserved negatively charged residues have been found in MntP, CadD and TerC proteins (Figs 3, 4, 5 and 6). With the exception of the CadD proteins, the conserved, negatively charged residues in TMS#1 and TMS#4 within each protein align (S12, S13, S14 and S15 Figs). The D/E residue in these 5 families could have functional significance similar to the D residue in the fourth hydrophobic segment of LysE described previously. However, the biological significance of the conserved, negatively charged residues in TMS#1 is not yet understood. These findings imply an evolutionary relationship between these five families and a closer relationship between CaCA2 and ILT.
MntP vs. CadD. Sequences of 85 MntP and 85 CadD proteins, all containing 6 TMSs, were combined into a single file and shown to share motifs (Fig 4A and 4B, Table 5). The best shared motif in TMS#4 of MntP and CadD proteins was found in all 170 selected proteins. Positions 1-13 in this motif correspond to the second half of TMS#4 that is shared in proteins of these two families. A highly conserved aspartic acid (D) is contained in this shared motif. Differing within the TMS#4 motif are positions 5, 8, 12 and 14. Position 5 is a fully conserved serine (S) in MntP homologues, but is a strongly conserved glycine (G) in CadD homologues. Position 8 is a strongly conserved asparagine residue in CadD homologues, but a strongly conserved alanine in MntP homologues. Additionally, position 12 corresponds to a well-conserved tyrosine in CadD proteins, but a fully conserved glycine in MntP proteins. Finally, we note well-conserved polar amino acids in position 14 for MntP homologues, but a conserved proline residue in CadD homologues.
A shared motif corresponding to the entire TMS#6 in 85 MntP and 85 CadD proteins was identified (Fig 4A and 4B, Table 5). A completely conserved glycine was shared at position 15, and strongly conserved acidic residues occurred at position 21. Finally, well-conserved hydrophobic amino acids were present in positions 6, 9, 10, 12, 14, 16, 18, 19 and 20, providing additional support for a close evolutionary relationship between MntP and CadD proteins.
The strongly conserved residues of the two sets of homologues differ at positions 4, 7, 8, 11, 13 and 22. In position 4, negatively charged amino acids are largely conserved only in MntP homologues. Position 11 is a completely conserved leucine residue in MntP homologues, but either a phenylalanine or a tyrosine in CadD homologues. A glycine is well-conserved at position 13 of CadD homologues, but it is weakly conserved in MntP homologues. Position 22 of CadD homologues shows well-conserved polar amino acids (S, N), while this position in MntP homologues contains a conserved histidyl residue. Finally, we note two unique residues at positions 7 and 8: proline and glycine. Conserved proline residues can be found in CadD only (position 8), while two almost fully conserved glycines are present in MntP homologues (positions 7 and 8). These unique differences may provide insight into the divergence of these proteins and possibly correlate with their differing specificities.
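Position-by-position comparisons of this kind amount to finding the most common residue in each motif column for each family and reporting the columns where both families are well conserved but on different residues. The sketch below illustrates that bookkeeping. The 0.8 cutoff for "strongly conserved", the function names and the toy alignments are assumptions for illustration; the sketch also assumes that the two sets of sequences are rows of the same alignment and therefore share column numbering.

```python
from collections import Counter


def column_consensus(alignment):
    """Most common residue and its frequency for each column, excluding gaps."""
    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            result.append(("-", 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(residues)))
    return result


def differing_positions(family_a, family_b, min_fraction=0.8):
    """1-based columns conserved in both families but on different residues."""
    diffs = []
    pairs = zip(column_consensus(family_a), column_consensus(family_b))
    for pos, ((res_a, frac_a), (res_b, frac_b)) in enumerate(pairs, start=1):
        if frac_a >= min_fraction and frac_b >= min_fraction and res_a != res_b:
            diffs.append((pos, res_a, res_b))
    return diffs


# Invented toy motif columns for two families (not data from this study).
toy_family_a = ["ADSG", "ADSG", "ADSA"]
toy_family_b = ["ADGG", "ADGG", "ADGG"]
toy_diffs = differing_positions(toy_family_a, toy_family_b)
```

In the toy input only column 3 is reported, because column 4 falls below the conservation cutoff in the first family.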
LysE, RhtB and TerC. More distantly related are the motifs within members of the LysE, RhtB and TerC families. Among these three families, two residues in TMS#3 are shared (Figs 5-6, Table 5). In the middle of TMS#3, all three families show a fully conserved glycine. Additionally, a fully conserved leucine, three residues (one helical turn) away from the glycine, can be found. Strongly conserved hydrophobic residues between the fully conserved glycyl and leucyl residues are present. A tyrosine (Y) is also conserved between 88 RhtB and 88 TerC proteins (GxxYL) but is not observed in LysE proteins (GxxxL).

Phylogenetic Tree
Proteins listed in TCDB for each family were used to generate a phylogenetic tree based on tens of thousands of BLAST bit-scores using the SFT1 program (Fig 7) [20]. RhtB, LysE and TerC localize to a single branch. Similarly, CaCA2 clusters with ILT, and CadD clusters with MntP. Based on these branching patterns, members in each of these groupings must be more closely related to each other than to other families, as had been suggested from motif analyses. A tree including all eleven families generated using a Mafft multiple alignment and RAxML with bootstrap values was included for comparison (S17 Fig). The SFT and Mafft trees show remarkable agreement, particularly with respect to family relationships. However, the branches sometimes differ between the two trees (compare Fig 7 with S17 Fig) [19,20,21,22,23,24,25,26]. This and other differences suggest that the phylogenetic distances between the eleven families are too great to allow the generation of accurate multiple sequence alignments. Trees representing each individual family have been constructed using multiple alignments generated by ClustalX, Mafft and ProbCons (S18-S28 Figs).

Discussion
Using rigorous statistical criteria, we have expanded the LysE superfamily nearly four-fold. In addition to the LysE, RhtB and CadD families identified previously, this superfamily now includes the following families: NAAT, CaCA2, MntP, ILT, TerC, NicO, GAP and DsbD. Members of each of these families have been characterized and shown to play roles in transport of amino acids and resistance to heavy metal ions, along with cell surface maintenance. Most families include secondary carrier type transporters catalyzing heavy metal or amino acid efflux, but one family catalyzes amino acid uptake, another catalyzes heavy metal ion uptake, and a third catalyzes transmembrane electron transfer. GAP proteins have not been mechanistically characterized, but based on their inclusion in the LysE superfamily, we tentatively propose that GAP proteins operate as secondary carriers, where the energy source for lipid export is the proton motive force.
Through sequence analyses, we were able to recognize a distinct pattern of homology. That is, LysE, RhtB, NAAT, CaCA2, MntP, ILT, TerC, NicO, GAP and DsbD proved to be homologous in 3 or more TMSs. The aligned TMSs usually fall within the first 3 TMSs, the second 3 TMSs or both. This observation fits the predicted evolutionary pathway presented in Fig 1. The presence of 3-TMS internal repeats supports the conclusion that all members of the LysE superfamily arose from a 3-TMS precursor via the same pathway in which the proposed duplication gave rise to 6 TMSs in a 3+3 TMS arrangement. In some TerC and ILT proteins, the topologies differ from the 3+3 TMS arrangement with the addition of one or two TMSs at the C- or N-terminal end, resulting in a 3+3+1, 3+3+2, or 1+3+3 arrangement.
According to the phylogenetic tree, amino acid exporter families RhtB and LysE branch close to each other, as suggested from previous studies [1]. In contrast to these two amino acid exporter families, TerC, which branches near RhtB and LysE in the tree, has been observed to play roles in tellurium ion resistance. MntP and CadD cluster together, and both are involved in divalent metal cation transport. Likewise, divalent cation transporters of the CaCA2 and ILT families branch in close proximity.
This study suggests that members of the LysE superfamily are involved in ionic homeostasis, protection from excessive cytoplasmic heavy metal/metabolite concentrations, cell envelope assembly and transmembrane electron flow. Many of the family members, however, are still poorly understood from functional and physiological standpoints. In continuing this project, genome context analyses will be conducted on members of each family. This will allow functional predictions, further promoting an understanding of the significance of these proteins. To date, no crystal structures exist for any member of this superfamily, and such studies will be crucial for understanding their mechanistic details. Thus, studies on the LysE superfamily remain in their infancy.
Supporting Information
The tree was generated using the SuperFamilyTree program and viewed using FigTree. It depicts the evolutionary relationship between the 11 different families in this study. Clustering indicates closer phylogenetic relationships. The tree is based on tens of thousands of BLAST bit scores generated with the SFT1 program where every protein was compared with every other protein included in the analysis. The SFT2 program was used to integrate all of the information to show the relationships of the eleven families to each other. Bootstrap values have been added in blue text and located near each node. (TIF)
S1 Supporting Information. FASTA Files for each family. The corresponding zip file contains the FASTA files generated using Protocol1, for comparisons with Protocol2. (ZIP)
S2 Supporting Information. Multiple Sequence Alignments. The corresponding zip file contains the multiple sequence alignment (MSA) outputs generated using ClustalX, Mafft, and ProbCons. These MSAs have been used to generate S17-S28 Figs. (ZIP)
S3 Supporting Information. Newick and SFT FASTA files. The corresponding zip file contains the 100 trees generated from SFT, the consensus tree, the FASTA sequences used to generate the trees, and the newick file for the best tree generated from RAxML analyses (described in S17 Fig). (ZIP)
S4 Supporting Information. MEME Input Sequences for Figs 3-6. The corresponding zip file contains the FASTA files used to conduct MEME Suite analyses shown in Figs 3-6 and described in Table 5. (ZIP)
S5 Supporting Information. S2-S10 Figs Combined PDF. The corresponding PDF file contains the S2-S10 Figs described previously. (PDF)
S6 Supporting Information. S12-S15 Figs Combined PDF. The corresponding PDF file contains the S12-S15 Figs described previously. (PDF)
S7 Supporting Information. S18-S28 Figs Combined PDF. The corresponding PDF file contains the S18-S28 Figs described previously. (PDF)