Systematic Genetic Nomenclature for Type VII Secretion Systems

CITATION: Bitter, W., et al. 2009. Systematic genetic nomenclature for type VII secretion systems. PLoS Pathogens, 5(10): 1-6, doi: 10.1371/journal.ppat.1000507.

Mycobacteria, such as the etiological agent of human tuberculosis, Mycobacterium tuberculosis, are protected by an impermeable cell envelope composed of an inner cytoplasmic membrane, a peptidoglycan layer, an arabinogalactan layer, and an outer membrane. This second membrane consists of covalently linked, tightly packed long-chain mycolic acids [1,2] and noncovalently bound shorter lipids involved in pathogenicity [3][4][5]. To ensure protein transport across this complex cell envelope, mycobacteria use various secretion pathways, such as the SecA1-mediated general secretory pathway [6,7], an alternative SecA2-operated pathway [8], a twin-arginine translocation system [9,10], and a specialized secretion pathway variously named ESAT-6-, SNM-, ESX-, or type VII secretion [11][12][13][14][15][16]. The latter pathway, hereafter referred to as type VII secretion (T7S), has recently become a large and competitive research topic that is closely linked to studies of host-pathogen interactions of M. tuberculosis [17] and other pathogenic mycobacteria [16]. Molecular details are just beginning to be revealed [18][19][20][21][22] showing that T7S systems are complex machineries with multiple components and multiple substrates. Despite their biological importance, there has been a lack of a clear naming policy for the components and substrates of these systems. As there are multiple paralogous T7S systems within the Mycobacteria and orthologous systems in related bacteria, we are concerned that, without a unified nomenclature system, a multitude of redundant and obscure gene names will be used that will inevitably lead to confusion and hinder future progress. In this opinion piece we will therefore propose and introduce a systematic nomenclature with guidelines for name selection of new components that will greatly facilitate communication and understanding in this rapidly developing field of research.
The first T7S-associated protein to be identified was the 6-kD early secreted antigenic target ESAT-6 [23]. This small, highly immunogenic protein lacks a classical N-terminal signal sequence and is present in large amounts in the culture filtrate of M. tuberculosis [23], but is missing from the closely related attenuated live vaccine Mycobacterium bovis bacille Calmette-Guérin (BCG) [24] due to the deletion of region of difference 1 (RD1) [25]. ESAT-6 and its protein partner, the 10-kD culture filtrate protein CFP-10 [26], form a 1:1 protein complex [27] that involves hydrophobic interaction [18,28]. Secretion of ESAT-6 and CFP-10 is required for the pathogenicity of M. tuberculosis [29][30][31]. The absence of ESAT-6 secretion is responsible in part for the attenuation of the BCG and Mycobacterium microti vaccines [13,32,33], as well as for the decrease in virulence of the attenuated M. tuberculosis H37Ra strain [34].
In M. tuberculosis, ESAT-6 and CFP-10 belong to the WXG100 family of 23 small secreted proteins that share a size of approximately 100 amino acids, a helical structure, and a characteristic hairpin bend formed by the conserved Trp-Xaa-Gly (W-X-G) motif [35]. The genes encoding these proteins, many of which represent immunodominant T cell antigens [36], are called esx genes in M. tuberculosis (esxA-W, Table 1) and are arranged in tandem pairs at 11 genomic loci [37]. In five of these genomic loci (ESX-1 to ESX-5), the esx genes are flanked by genes coding for components of secretion machineries involved in the export of the corresponding ESX proteins (Figure 1). These proteins constitute the major building blocks of the T7S systems [11,12,15,16,19]. Four of these regions are also characterized by the presence of genes encoding PE and/or PPE proteins (Figure 1, Table 2), named after their characteristic N-terminal motifs proline-glutamic acid (PE) and proline-proline-glutamic acid (PPE) [38]. Apart from genes localized in these core ESX regions, additional genes situated elsewhere on the chromosome may be required for the function of T7S systems. For example, the rv3616c-rv3614c genes are required for secretion of ESAT-6 and CFP-10 by ESX-1 [39][40][41].
Apart from members of the M. tuberculosis complex, the ESX-1 cluster is also present in a range of mycobacteria, including Mycobacterium kansasii [23] and Mycobacterium leprae [42]. However, experimental work has mainly focused on the ESX-1 system of Mycobacterium marinum [21,22,43][44][45][46][47], a fish pathogen that shows high homology in its ESX loci with M. tuberculosis [48], and the fast grower Mycobacterium smegmatis [49][50][51]. M. marinum has also been used to define a role for the paralogous system ESX-5, which is required for the secretion of PE and PPE proteins [16,52,53]. For the remaining ESX-2, ESX-3, and ESX-4 systems, only very limited predictions of their putative functions can be made. ESX-3 transcriptome data suggest that this system is involved in iron/zinc homeostasis [54,55], which would be consistent with the essential role of ESX-3 in M. tuberculosis [56]. The putative functions of ESX-2 and ESX-4 remain unknown. ESX-4, which harbors a smaller number of genes than other ESX loci (Table 2), appears to represent the most ancestral T7S system in mycobacteria [12]. This hypothesis is based on the observation that ESX-4-like loci are the only ESX clusters that are found in other high GC Gram-positive bacteria, suggesting that the last common ancestor of mycobacteria already harbored an ESX-4 T7S system. Other ESX clusters may have evolved later by gene duplication and gene diversification events. However, the finding that Nocardia farcinica (http://nocardia.nih.go.jp/) contains two T7S systems, one orthologous to ESX-4 and one locus that shows some similarity to all the conserved components of larger T7S systems, suggests that evolution of T7S systems is more complex than previously anticipated. This second T7S locus in N. farcinica even contains two PPE-like genes that were originally thought to be specific for the mycobacteria [38].
T7S-like systems are also found outside the high GC Gram-positive bacteria, since a number of Firmicutes have WXG100 members [35]. However, the loci containing these WXG100 genes are only weakly similar to the mycobacterial T7S systems: in fact, only the gene encoding the FtsK/SpoIIIE-like protein is present. Therefore, these systems should be called WXG100 systems to differentiate them from true T7S systems. Both Staphylococcus aureus and Bacillus anthracis have an active WXG100 system, and the WXG100 system encoded by S. aureus is important for virulence [57,58].
Research in the T7S/ESX field is relatively new but is now rapidly expanding, and we therefore would like to propose a systematic nomenclature for all components involved. Until now, a small number of genes within the different ESX loci of mycobacteria have been named, but for most genes the original genome annotation numbers are used. These gene numbers vary between different species and even between different strains of the same species, and therefore make comparative studies confusing. Our nomenclature is appropriate for all T7S systems in high GC Gram-positive species. Extending this nomenclature to the T7S-like systems of Firmicutes is not recommended, since there are only very few conserved components.
As a starting point for the new nomenclature, we focus on the most studied system, the ESX-1 system of M. tuberculosis, which is the paradigm T7S system. The new nomenclature is given for ESX-1 in M. tuberculosis (Figure 1 and Table 2) and for all ESX systems in various mycobacteria (Table S1). The proposed rules for the nomenclature are as follows:
- Only genes that have homologues in at least four of the mycobacterial ESX systems will get a general name, whereas the locus-specific genes have a more restricted name reflecting their specificity. The reason for this distinction is that the conserved genes are most likely to represent the core components of the secretion system. Moreover, all of the conserved ESX-1 components have been shown to be essential for ESAT-6/CFP-10 secretion in at least one of the mycobacterial species studied (see below). In contrast, many of the locus-specific genes encode secreted proteins, as has been shown for the ESX-1 system (see below). Furthermore, in M. leprae, an organism with an extreme reductive evolution of its genome, almost all of the non-conserved ESX-1 components are pseudogenes, whereas all of the conserved components seem to be intact [42].
- The three-letter acronym for the conserved components will be ecc, for esx conserved component (Figure 1, Table 2). This abbreviation has not been used for other genes in bacteria.
- The ESAT-6 and CFP-10 encoding genes, esxA and esxB, respectively, and the other esx genes (Table 1) will not be renamed. These gene names are informative, well-accepted, and frequently used in the literature. Furthermore, the esx gene products seem to be secreted proteins and do not seem to be components of the secretion system itself, although their presence is required for the secretion of other substrates. The same reasoning is used for the pe and ppe genes. Four of the five systems harbor pe and ppe genes, but for the moment their functions within the T7S systems remain uncertain. Furthermore, various mycobacterial species contain a large number of genes belonging to the pe and ppe families, and it would be confusing to rename some of them. Finally, the subtilisin-like proteases already have an established and descriptive name in the literature, i.e., the mycosins [59]. Therefore, we will not change this name.
Figure 1 (caption, partial). [...] [60]. Note that the channel drawn in the outer membrane of our model refers to a hypothetical pore, whose existence has not been experimentally demonstrated. doi:10.1371/journal.ppat.1000507.g001
- The alphabetic suffix of conserved genes will be based on the gene order in the paradigm ESX-1 system (see Figure 1). This decision is mainly based on the fact that the ESX-1 system is the most studied. The gene order of the different T7S systems is highly variable and it is therefore difficult to propose a logical ordering that would be satisfactory for all systems. The genes of ESX-2, -3, -4, and -5 will therefore be named according to their paralogue in ESX-1 (Table 2 and Table S1), allowing for a direct and relevant comparison. The gene names of each mycobacterial T7S system will include a numeral suffix indicating the ESX cluster to which this gene belongs. In order to avoid confusion with numbering of alleles, the ESX cluster number is indicated in subscript. As shown in Figure 1, the first conserved gene of the ESX-1 cluster will be eccA₁.
- In some of the T7S clusters, the gene encoding the FtsK/SpoIIIE-like protein is split into two genes. Since these gene products clearly form a functional unit, as has also been shown for the two FtsK/SpoIIIE-like proteins of the ESX-1 system [14], the split genes will get a lower case alphabetic suffix, i.e., eccCa₁ and eccCb₁ for the ESX-1 system of M. tuberculosis (Figure 1 and Table 2).
- When working with several different organisms, it can also be useful to indicate the origin of the respective genes. For this we recommend using a two-letter subscript at the end of the gene name. For example, the orthologues of the M. tuberculosis genes eccCa₁ₘₜ and eccCb₁ₘₜ would be eccCa₁ₘₛ and eccCb₁ₘₛ in M. smegmatis.
- The gene names can be converted into the names of their proteins by capitalization, e.g., EccCa₁. Alternatively, once the true function of a protein is known, the name could be changed to indicate this function, as has been done for the secretins of type II and type III secretion systems. If in the future new genes are identified that are essential for the functioning of several T7S systems, these genes could be named similarly using the next alphabetical suffix (eccG, eccH, etc.).
- As discussed above, in addition to the conserved genes, there are also region-specific genes. The role of these genes in ESAT-6/CFP-10 secretion is not entirely clear: some of the encoded proteins seem to be involved in the secretion of T7S substrates in M. marinum, whereas their orthologues show less or no effect on secretion in M. tuberculosis. Recently, it has been shown that a subset of these proteins are in fact also substrates of the ESX-1 system. Thus far, four ESX-1 substrates have been identified in addition to ESAT-6 and CFP-10. These substrates are called EspA [39], EspB [46], EspR [41], and the M. marinum homologue of Rv3864 [22]. The acronym Esp stands for ESX-1 secretion-associated protein. Both rv3864 and espB are located within the ESX-1 cluster, whereas EspA and the secreted regulatory protein EspR are encoded by genes outside the ESX-1 locus. However, the espA gene is part of an operon (rv3616-3614) that has paralogues in the 5′ region of the ESX-1 locus. Therefore, we propose naming all the region-specific genes of the ESX-1 system and the rest of the espA operon esp genes with alphabetical suffixes (see Table 2 and Figure 1). We will follow the espA operon and ESX-1 gene order, with the exception of espB and espR, which are already named. This means that the first gene in the esx-1 operon, whose gene product was recently shown to be a secreted protein in M. marinum, will be named espE. One of the new esp genes, espG, is present with low but significant homology in two other ESX systems (ESX-2 and ESX-3) and should therefore also have a numeral suffix (Figure 1, Table 2).
Table footnote: The number of transmembrane domains varies depending on the prediction programme used (for details see Table S2).
- The nomenclature of esp genes in M. marinum is more complicated, in particular for espA. The genome of M. marinum contains a large gene cluster upstream of the ESX-1 locus, among which are 15 espA-like genes [48]. In addition, there are three more paralogues at other locations in the genome. These genes should all be named espA with a superscript numeral suffix to indicate the exact gene and a subscript "mm" to indicate the species.
- Region-specific genes or genes encoding secreted proteins of the other ESX loci and T7S systems should not be called esp, as this name should be reserved for ESX-1-related genes. If there are important region-specific genes for ESX-2, -3, -4, or -5, a new name has to be introduced.
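As an illustration only, the naming rules above could be sketched in Python as follows. The class `GeneName`, the function `gets_general_name`, and the flattening of printed subscripts into plain text are assumptions made for this sketch; they are not part of the proposal itself, and the superscript numbering suggested for the M. marinum espA paralogues is not modelled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Threshold taken from the first rule: homologues in at least four ESX systems.
MIN_SYSTEMS_FOR_GENERAL_NAME = 4


def gets_general_name(esx_systems_with_homologue: Iterable[int]) -> bool:
    """Return True if a gene qualifies for a general (ecc) name."""
    return len(set(esx_systems_with_homologue)) >= MIN_SYSTEMS_FOR_GENERAL_NAME


@dataclass(frozen=True)
class GeneName:
    """Hypothetical helper that assembles a name from its parts."""
    prefix: str                    # "ecc" (conserved) or "esp" (ESX-1 specific)
    letter: str                    # alphabetic suffix from ESX-1 gene order, e.g. "C"
    split: str = ""                # "a" or "b" for split FtsK/SpoIIIE-like genes
    cluster: Optional[int] = None  # ESX cluster number (a subscript in print)
    species: str = ""              # two-letter origin code (a subscript in print)

    def __post_init__(self) -> None:
        if self.prefix not in ("ecc", "esp"):
            raise ValueError("prefix must be 'ecc' or 'esp'")
        if not (len(self.letter) == 1 and self.letter.isalpha() and self.letter.isupper()):
            raise ValueError("letter must be a single uppercase letter")
        if self.split and not (len(self.split) == 1 and self.split.islower()):
            raise ValueError("split must be a single lowercase letter")
        if self.cluster is not None and self.cluster < 1:
            raise ValueError("cluster must be a positive integer")
        if self.species and not (
            len(self.species) == 2 and self.species.isalpha() and self.species.islower()
        ):
            raise ValueError("species must be a two-letter lowercase code")

    def gene(self) -> str:
        """Gene name with subscripts flattened to plain text (assumption)."""
        cluster = "" if self.cluster is None else str(self.cluster)
        return f"{self.prefix}{self.letter}{self.split}{cluster}{self.species}"

    def protein(self) -> str:
        """Protein name: the gene name with its first letter capitalized."""
        name = self.gene()
        # str.capitalize() would lowercase the rest, so only the first letter is changed.
        return name[0].upper() + name[1:]


if __name__ == "__main__":
    print(GeneName("ecc", "C", "a", 1, "mt").gene())   # eccCa1mt
    print(GeneName("ecc", "C", "b", 1, "ms").gene())   # eccCb1ms
    print(GeneName("ecc", "C", "a", 1).protein())      # EccCa1
    print(gets_general_name({1, 2, 3, 5}))             # True
```

Keeping the cluster number and the species code as separate fields, rather than parsing them out of a flat string, preserves the distinction that the subscript typography is meant to convey.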
In order to ensure wide visibility for this new nomenclature, it will be included in the most extensively used mycobacterial genome databases. As a first step, selected genome browsers available at the Institut Pasteur (http://genolist.pasteur.fr/) and/or the École Polytechnique Fédérale de Lausanne (http://tuberculist.epfl.ch/) will adopt these new rules; other databases could follow this example.
In conclusion, we would like to emphasize that the introduction of a uniform gene nomenclature for other secretion systems in Gram-negative bacteria (type II, type III) has facilitated comparative analysis of these systems. We anticipate that the acceptance/implementation of this proposal will provide similar advantages for the T7S systems.