Evolutionary Relationships of Microbial Aromatic Prenyltransferases

The linkage of isoprenoid and aromatic moieties, catalyzed by aromatic prenyltransferases (PTases), leads to an impressive diversity of primary and secondary metabolites, including important pharmaceuticals and toxins. A few years ago, a hydroxynaphthalene PTase, NphB, featuring a novel ten-stranded β-barrel fold was identified in Streptomyces sp. strain CL190. This fold, termed the PT-barrel, is formed of five tandem ααββ structural repeats and remained exclusive to the NphB family until its recent discovery in the DMATS family of indole PTases. Members of these two families exist only in fungi and bacteria, and all of them appear to catalyze the prenylation of aromatic substrates involved in secondary metabolism. Sequence comparisons using PSI-BLAST do not yield matches between these two families, suggesting that they may have converged upon the same fold independently. However, we now provide evidence for a common ancestry for the NphB and DMATS families of PTases. We also identify sequence repeats that coincide with the structural repeats in proteins belonging to these two families. Therefore we propose that the PT-barrel arose by amplification of an ancestral ααββ module. In view of their homology and their similarities in structure and function, we propose to group the NphB and DMATS families together into a single superfamily, the PT-barrel superfamily.


Introduction
Aromatic prenyltransferases (PTases) catalyze the transfer of isoprenyl moieties to aromatic acceptor molecules, forming C-C bonds. They are key enzymes in the biosynthesis of lipoquinones and of many secondary metabolites in plants, fungi and bacteria [1].
Aromatic PTases of lipoquinone biosynthesis are integral membrane proteins. They contain an aspartate-rich motif (e.g. NDxxD) for binding of the prenyl diphosphate substrate via a Mg2+ ion, similar to the corresponding motif of farnesyl diphosphate synthase [2]. A structural model of the PTase UbiA involved in the biosynthesis of ubiquinone (Fig. 1) has been proposed [3].
In contrast to the PTases of lipoquinone biosynthesis, the aromatic PTase CloQ from Streptomyces roseochromogenes, involved in the formation of clorobiocin (Fig. 1), was found to be a soluble, monomeric 35 kDa protein [4]. CloQ does not contain an NDxxD motif and is active in the absence of Mg2+ or other divalent cations. Kuzuyama et al. [5] identified a similar aromatic prenyltransferase, NphB, involved in the biosynthesis of the prenylated polyketide naphterpin (Fig. 1) in Streptomyces sp. strain CL190. NphB was found to display a hitherto unobserved β-barrel fold, which was termed the PT-barrel (Fig. 2; PDB 1ZB6). It consists of five repetitive ααββ elements. The ten β-strands are arranged in an antiparallel fashion to form a central β-barrel that contains the active center in its spacious lumen, and the α-helices form a solvent-exposed ring around the barrel [6]. The structure of the aforementioned CloQ also displays the PT-barrel fold (Fig. 2; PDB 2XLQ) [7]. PSI-BLAST searches currently reveal 17 further database entries with sequence similarity to NphB and CloQ, 12 of them in bacteria of the genus Streptomyces and five in fungi of the phylum Ascomycota. In silico structure predictions suggest that all these proteins adopt the PT-barrel fold. Eleven of these enzymes have been investigated biochemically, and all of them catalyze the C-prenylation of aromatic compounds, i.e. phenols or phenazines. Fig. 1 shows as examples the reactions catalyzed by the bacterial enzymes CloQ, NphB, SCO7190, Fnq26 and PpzP, and by the fungal enzyme PtfAt. Table S1 (Supplementary Material) lists the references and accession numbers for all these enzymes and for all other proteins included in this study.
The C-prenylation of an aromatic compound is also catalyzed by dimethylallyltryptophan synthase (DMATS; Fig. 1), involved in the biosynthesis of the pharmaceutically important ergot alkaloids in different fungi of the phylum Ascomycota. DMATS shows no sequence similarity, as evaluated with PSI-BLAST, to the bacterial enzyme NphB or orthologs thereof, and is considerably larger than NphB (459 vs. 307 amino acids). Unexpectedly, however, it was found to adopt the same PT-barrel fold as NphB (Fig. 2; PDB 3I4X) [8]. DMATS is the prototype of the fungal indole PTases, involved in the biosynthesis of a large number of complex secondary metabolites in fungi [9]. Currently, approximately 200 close orthologs of DMATS are found in different fungal genomes in the database. The structure of a second member of this group, FtmPT1, has recently been published, and it shows the same fold as DMATS (Fig. 2; PDB 3O2K) [10]. Furthermore, three indole PTases (LtxC, CymD and IptA; Fig. 1) have recently been identified in bacteria [11,12,13], and GenBank currently contains 16 further entries from bacterial genomes with similarity to these three enzymes. Most of these entries are found in genomes of Actinomycetales, but one is from the alphaproteobacterium Methylobacterium sp. 4-46, and LtxC is from a cyanobacterium. In silico structure prediction suggests that all these bacterial indole PTases adopt the PT-barrel fold.
A similarity in sequence between the phenol/phenazine PTases (NphB/CloQ family) and the indole PTases (DMATS/CymD family) is not detectable using BLAST and PSI-BLAST. This raises the question whether the NphB/CloQ family and the DMATS/CymD family may have originated independently and converged on the PT-barrel fold in response to the biochemical challenge of performing an aromatic prenylation reaction, i.e. a reaction corresponding to a Friedel-Crafts alkylation, in an aqueous solution which requires effective shielding of the reactive intermediary allylic cation from reaction with water [8].
Only a limited number of structural solutions are available to a polypeptide chain; protein structures are therefore multiply convergent [14]. In contrast, sequence space is essentially infinite and many sequences are compatible with a particular fold. For this reason, sequence similarity rather than structure similarity is the primary marker of homology. In recent years, the enormous growth of protein sequence and structure databases, coupled with the development of sensitive sequence comparison methods, has shown that proteins may not be as polyphyletic as hitherto assumed [15]. Indeed, many fold families that were previously considered to be analogous, for instance families of the TIM (βα)8-barrel fold, are now thought to be homologous [16,17,18].
In this study, we used a highly sensitive sequence comparison method, HHsearch [19], based on profile hidden Markov models (HMMs), to evaluate the evolutionary origins of the CloQ/NphB and the DMATS/CymD families. Our results indicate that they are homologous. We also include an investigation of the membrane-bound aromatic PTases, e.g. of lipoquinone biosynthesis. They show no evolutionary relationship to the CloQ/NphB and the DMATS/CymD families but display evolutionary connections to other PTases such as protoheme IX farnesyltransferases, chlorophyll a synthases and decaprenyl-phosphate 5-phosphoribosyltransferases.

Materials and Methods
To calculate the root mean square deviation (RMSD) and TM-scores for the proteins included in this study, we used the TM-align server (http://zhanglab.ccmb.med.umich.edu/TM-align/) [20] with default parameters. To evaluate homology between PTases we used HHsearch [21], a sensitive remote homology detection method based on the pairwise comparison of profile hidden Markov models (HMMs). HHsearch was used to perform an all-against-all comparison of the 36 biochemically investigated proteins listed in supplementary Table S1. For each of these 36 proteins, we generated multiple sequence alignments using the buildali script from the HHsearch package. The resulting multiple alignments were used to calculate profile HMMs using HHmake, also from the HHsearch package. The profile HMMs were compared with each other using HHsearch and the results were mapped onto a matrix. All tools were run using default settings.
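The final step above, mapping all-against-all results onto a matrix, can be sketched as follows. This is a minimal illustration only: it assumes the pairwise probabilities have already been parsed into (query, target, probability) tuples, and the function name and the example values are hypothetical, not taken from the study.

```python
def build_probability_matrix(names, hits):
    """Map pairwise match probabilities onto a square matrix.

    names: ordered list of protein names (used for rows and columns).
    hits:  iterable of (query, target, probability) tuples; parsing of
           the search output into this form is assumed and not shown.
    """
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for query, target, prob in hits:
        if query in index and target in index:
            i, j = index[query], index[target]
            # Keep the best probability seen for this ordered pair.
            matrix[i][j] = max(matrix[i][j], prob)
    return matrix


# Illustrative values only, not results from the study.
names = ["NphB", "CloQ", "DMATS"]
hits = [
    ("NphB", "NphB", 100.0),
    ("NphB", "CloQ", 100.0),
    ("CloQ", "NphB", 100.0),
    ("NphB", "DMATS", 60.0),
]
matrix = build_probability_matrix(names, hits)
```

Rows correspond to query profiles and columns to target profiles, so the matrix need not be symmetric; pairs without a reported match stay at 0.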
To gather the amino acid sequences of PTases for cluster analysis, we searched the non-redundant protein sequence database (nr) at NCBI for homologs of NphB from Streptomyces sp. strain CL190 (PDB identifier 1ZB6) and DMATS from Aspergillus fumigatus (3I4X) using the PSI-BLAST algorithm [22] in four iterative steps. Sequences shorter than 200 or longer than 600 amino acid residues were excluded to avoid fragments and multi-domain proteins. The same procedure was applied to search for homologs of UbiA from Escherichia coli (AP_004541), MenA from E. coli (AAB01207) and Slr1736 from Synechocystis sp. PCC 6803 (BAA17774) in four consecutive PSI-BLAST iterations. All identified sequences were pooled and duplicates were removed using the RetrieveSeq tool from the MPI bioinformatics toolkit (http://toolkit.tuebingen.mpg.de) [23]. The sequence XP_003295160 was removed due to the presence of unidentified amino acids in the sequence.
All identified sequences were analyzed and clustered by their pairwise PSI-BLAST P-values [24] with CLANS (http://toolkit.tuebingen.mpg.de/clans; [25]). CLANS treats sequences as point masses in a virtual space which attract or repel each other depending on their pairwise sequence similarities. Clustering was done to equilibrium in 2D at a P-value cutoff of 1E-3 for the cluster map of NphB and DMATS, and 1E-6 for the UbiA, MenA and Slr1736 cluster map, using default settings.
To detect sequence repeats in PT-barrels, we used the highly sensitive de novo repeat detection method HHrepID [26]. HHrepID takes a multiple sequence alignment as input and converts it into a profile HMM. To detect internal sequence repeats, this profile HMM is repeatedly aligned to itself. We extracted sequences of bacterial and fungal indole PTases, and phenol/phenazine PTases from the aforementioned cluster map and calculated multiple sequence alignments with ClustalW [27]. These alignments were then analyzed for the presence of repeats with HHrepID using default settings.
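The length filter and duplicate removal described above can be sketched as follows. This is an illustration under stated assumptions, not the RetrieveSeq implementation: the function name is hypothetical, and duplicates are assumed here to mean identical amino acid sequences.

```python
def filter_and_deduplicate(records, min_len=200, max_len=600):
    """Keep sequences of min_len..max_len residues, dropping exact duplicates.

    records: iterable of (identifier, sequence) pairs.
    Returns the kept pairs in their original order.
    """
    seen = set()
    kept = []
    for identifier, sequence in records:
        seq = sequence.upper()
        if len(seq) < min_len or len(seq) > max_len:
            continue  # likely a fragment or a multi-domain protein
        if seq in seen:
            continue  # identical to a sequence that was already kept
        seen.add(seq)
        kept.append((identifier, seq))
    return kept
```

The first identifier encountered for a given sequence is the one retained; a different tie-breaking rule would be equally valid for this purpose.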
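The idea of point masses that attract or repel each other can be illustrated with a small force-directed layout. The sketch below is not the CLANS algorithm: the force law (attraction proportional to -log10 of the P-value for pairs passing the cutoff, a weak repulsion between all pairs), the parameter values and the function name are assumptions chosen only to show the principle.

```python
import math
import random


def layout(names, pvalues, cutoff=1e-3, iterations=500,
           attract=0.1, repel=0.1, step=0.05, max_move=0.1, seed=0):
    """Place sequences in 2D so that similar pairs end up close together.

    pvalues: dict mapping (name_a, name_b) to a pairwise P-value.
    Only pairs with P <= cutoff attract each other; all pairs repel.
    """
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in names}
    # Attraction weight per pair: -log10(P), for pairs passing the cutoff.
    weights = {}
    for (a, b), p in pvalues.items():
        if p <= cutoff:
            weights[frozenset((a, b))] = -math.log10(max(p, 1e-200))
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in names}
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                w = weights.get(frozenset((a, b)), 0.0)
                # Positive f pulls the pair together, negative pushes apart.
                f = attract * w * dist - repel / dist
                fx, fy = f * dx / dist, f * dy / dist
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        for n in names:
            mx, my = step * force[n][0], step * force[n][1]
            size = math.hypot(mx, my)
            if size > max_move:  # cap the displacement for stability
                mx, my = mx * max_move / size, my * max_move / size
            pos[n][0] += mx
            pos[n][1] += my
    return pos
```

With this toy force law, pairs connected below the cutoff settle at a short equilibrium distance, while unconnected groups drift apart, which is the qualitative behavior a cluster map relies on.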

Results
HMM-HMM comparisons of PTases featuring the PT-barrel: Sequence search methods achieve different levels of sensitivity, depending on the amount of information they incorporate. Sequence-to-sequence methods, such as BLAST [24], are the least sensitive as they use only the information from the pairwise comparison of two sequences, scored by a global substitution matrix. Profile-to-sequence methods, such as the iterated version of BLAST, PSI-BLAST [28], are more sensitive, as they include family-specific information for the query sequence in the form of a position-specific scoring matrix derived from homologous sequences. Profile-to-profile comparison methods, such as COMPASS [29], provide an additional improvement by using family-specific information for both sequences being compared. Incorporation of position-specific gapping probabilities into the profiles yields profile hidden Markov models (HMMs) [30], which are currently our most sensitive tool for the detection of sequence similarity. HHsearch [19,21], an HMM-to-HMM comparison method, has a sensitivity comparable to that of advanced fold recognition methods, despite using only sequence information.
While members of the CloQ/NphB and the DMATS/CymD families display the PT-barrel fold (the structures of NphB and DMATS align at an RMSD of 3.97 Å over 290 aligned residues), they show very little sequence identity (<15%). Nevertheless, many instances are known where proteins with such low sequence identity belong to the same superfamily (e.g. ubiquitins [31]). We therefore used HHsearch to investigate the evolutionary origins of these two families. Biochemically characterized members of (i) the PTases with similarity to NphB/CloQ, (ii) the fungal indole PTases and (iii) the bacterial indole PTases were selected as representatives for HHsearch analysis. As expected, HHsearch assigns a 100% probability of homology to all pairwise matches within each of these three groups (Fig. 3). We also detected matches between the fungal indole PTases (e.g. DMATS) and the bacterial indole PTases (e.g. CymD) at a probability of 100%, confirming their evolutionary relatedness. Likewise, we obtained probability values of 100% for connections between the bacterial phenol PTases NphB and CloQ, and the fungal phenol PTases PtfAt, PtfBf and PtfSc. Strikingly, we obtained several matches between the CloQ/NphB and the DMATS/CymD families at high probabilities (50%-75%). We have previously shown that this level of sequence similarity is indicative of common ancestry [32,33,34,35]. We thus conclude that these two families are homologous.
In the biosynthesis of ubiquinones, menaquinones, plastoquinones and tocopherols, the C-prenylation of aromatic substrates is catalyzed by integral membrane proteins with several membrane-spanning helices [1]. Similar to the soluble farnesyl diphosphate synthase (FPP synthase) [2] and the octaprenyl diphosphate synthase IspB [36] (Fig. 1), these membrane-bound aromatic PTases show conserved NDxxD motifs for the binding of the isoprenoid substrates in the form of Mg2+ complexes. In contrast, all aromatic PTases characterized by the PT-barrel fold are soluble enzymes without the NDxxD motifs. As expected, HHsearch detected matches between the membrane-bound aromatic PTases UbiA of ubiquinone biosynthesis, MenA of menaquinone biosynthesis and Slr1736 of tocopherol biosynthesis, confirming their homology (data not shown). In contrast, these enzymes did not make any connections to the soluble PTases with the PT-barrel fold.
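To make the notion of family-specific, position-specific information concrete, the following sketch derives per-column residue frequencies from a small alignment and scores a sequence against them. It is a simplified illustration, not the PSI-BLAST or HHsearch procedure: the uniform background, the fixed pseudocount, the absence of sequence weighting and all names are assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(alignment):
    """Return one {residue: frequency} dict per alignment column.

    alignment: list of equal-length aligned sequences; '-' marks a gap.
    """
    profile = []
    for col in range(len(alignment[0])):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()} if total else {})
    return profile


def log_odds_score(profile, sequence, pseudocount=0.01):
    """Sum of per-position log2 odds against a uniform background (assumed)."""
    background = 1.0 / len(AMINO_ACIDS)
    score = 0.0
    for freqs, residue in zip(profile, sequence):
        p = freqs.get(residue, 0.0) + pseudocount
        score += math.log2(p / background)
    return score


# Toy alignment for illustration only.
profile = column_frequencies(["ACD", "ACE", "AGD"])
```

A sequence matching the conserved columns scores higher than an unrelated one, whereas a single global substitution matrix would treat every position alike.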
To check for the existence of possible distant homologs of the aromatic PTases with the PT-barrel fold, we ran HHsearch against a database comprising several complete genomes. The search was seeded with the PTases NphB and DMATS. We did not find matches to proteins outside of the CloQ/NphB and the DMATS/CymD families, indicating that the PT-barrel fold is exclusive to them at this time.
Detection of sequence repeats in the PT-barrel: The PT-barrel is a toroidal fold, in which five ααββ structural repeats are arranged in a circular fashion to form a closed barrel. While these five repeats are structurally well superimposable with median RMSDs below 2.5 Å, they do not show clear sequence similarity to each other. Therefore, it has remained unclear whether the symmetry displayed by the PT-barrel is a result of five-fold amplification of a single ααββ unit or of structural convergence. If PT-barrels originated by amplification, we might still find residual sequence similarity between their repeats with highly sensitive sequence comparison tools. For this, we used the de novo repeat detection method HHrepID, which detects internal sequence symmetries by repeatedly aligning the query HMM with itself. HHrepID has been used successfully to detect highly divergent sequence repeats in several folds including TIM (βα)8-barrels [18] and outer membrane β-barrels [34]. We detected five-fold internal sequence symmetry in both the bacterial and the fungal indole PTases at default settings with a P-value better than 1E-4. We also found repeats in the phenol/phenazine PTases, albeit at lower detection stringency. In the indole PTases the detected repeats coincide largely with the ααββ structural units, but in the phenol/phenazine PTases the repeats are shorter and coincide only with the ββ hairpins. While we can substantiate a scenario for the origin of indole PTases by amplification based on the presence of residual sequence similarity between their repeats, the repeats of phenol/phenazine PTases are more divergent and a scenario for their origin cannot be established at this time. We note that this range of internal symmetry among members of the same superfamily is not unique to PT-barrels. β-propellers, for instance, display a wide range of internal symmetry, from near-identical to fully diverged, and an origin by amplification has been proposed for them [35].
Cluster analysis of aromatic PTases: In order to visualize the relationships between the PTases with the PT-barrel fold, we searched the non-redundant protein sequence database at NCBI for homologs of NphB and DMATS and clustered the obtained sequences in CLANS. The resulting cluster map (Fig. 4A) very clearly shows two distinct clusters that correspond to the phenol/phenazine PTases and the indole PTases. The two clusters are connected with each other, further confirming the proposed evolutionary relationship between these two enzyme families. No other groups of proteins with similarity to NphB and DMATS were identified by this PSI-BLAST search, showing that the enzymes with PT-barrel fold are not related to other currently known proteins.
The phenol/phenazine PTases (Fig. 4A; dark orange) comprise 14 bacterial proteins from the genus Streptomyces and 5 fungal proteins from the phylum Ascomycota. The cluster analysis did not show a separation of the bacterial and the fungal enzymes within this family, even at higher clustering stringency. In contrast, the indole PTases can be separated into two subclusters, one of which contains all of the 19 bacterial entries, and the other one all of the 186 fungal entries. This separation is already visible in Fig. 4A, and becomes very clear at higher clustering stringency (data not shown).
We also performed a cluster analysis of the membrane-bound aromatic PTases. We searched the non-redundant protein sequence database at NCBI for homologs of the membrane-bound PTases UbiA, MenA and Slr1736 and clustered them in CLANS. As expected, the map (Fig. 4B) shows distinct but connected clusters for (i) 4-hydroxybenzoate PTases of ubiquinone biosynthesis, e.g. UbiA of E. coli [37], (ii) 1,4-dihydroxy-2-naphthoate 3-prenyltransferases of menaquinone biosynthesis, e.g. MenA of E. coli [38], and (iii) homogentisate PTases of plastoquinone and tocopherol biosynthesis [39]. In addition, this cluster analysis revealed further enzymes to be related to the aromatic PTases of lipoquinone biosynthesis. These include the chlorophyll a synthases and protoheme IX farnesyltransferases, both of which attach phytyl or farnesyl moieties to side chains of tetrapyrrole substrates [40,41]. Another group is formed by the 5-phosphoribose-1-diphosphate:decaprenyl-phosphate 5-phosphoribosyltransferases (DPPRs), which are involved in the biosynthesis of lipids of the bacterial cell wall. The reaction catalyzed by DPPRs is quite different from that catalyzed by aromatic PTases, yet there is obvious sequence similarity between DPPR and UbiA [42]. A last group of database entries related to membrane-bound aromatic PTases comprises hypothetical proteins, mostly from proteobacteria, which consist of two distinct domains: one similar to hydrolases of the HAD superfamily [43], the other one similar to DPPR or UbiA. The function of these proteins is, to our knowledge, unknown.
Figure 3. [...] Table S1, Supplementary Material) were performed using HHsearch. Group and protein names are shown on the left. Cell color indicates HHsearch probability of the match as depicted in the scale on the right. doi:10.1371/journal.pone.0027336.g003

Discussion
The PT-barrel is a novel protein fold that was discovered recently and is found exclusively in microbial secondary metabolic PTases with aromatic substrates. For proteins with the PT-barrel fold, the name ABBA PTases has been suggested previously [6], owing to the αββα succession of the secondary structure elements in the polypeptide chain, which results in the characteristic antiparallel orientation of the β-sheets in the barrel. Our study suggests that all proteins with the PT-barrel fold share a common ancestry and therefore belong to a single superfamily. As shown in Fig. 5, this superfamily can be divided into two families, i.e. the indole PTases and the phenol/phenazine PTases. The state-of-the-art sequence comparison method HHsearch yielded significant matches between these families, indicating a common ancestry. We also found evidence for the origin of the PT-barrel fold by amplification of an ancestral ααββ module.
The family of indole PTases comprises the fungal indole PTases and the bacterial indole PTases, with DMATS and CymD as typical representatives, respectively. It should be noted that the term "indole PTases" is correct for most but not all biochemically investigated members of this family. The exceptions are SirD (NCBI accession AAS92554), which catalyzes the O-prenylation of the phenolic oxygen of tyrosine in sirodesmin biosynthesis [44], VrtC (ADI24928), which C-prenylates a phenolic substrate related to tetracyclines [45], and TdiB (ABU51603), which catalyzes both an indole prenylation and the prenylation of a phenolic moiety during terrequinone A biosynthesis [46].
As expected, HHsearch did not indicate a relationship between the soluble aromatic PTases with the PT-barrel fold, such as NphB or DMATS, and the membrane-bound PTases, such as UbiA of ubiquinone biosynthesis. Therefore, two independent solutions have evolved in nature to solve the biochemical problem of catalyzing an aromatic prenylation reaction in an aqueous environment. The indispensable shielding of the reactive allylic cation, generated from the prenyl diphosphate substrate, is achieved by a barrel of antiparallel β-sheets in the case of the ABBA PTases, and by a deep lipophilic pocket between the transmembrane helices in the case of the membrane-bound aromatic PTases.
All PTases characterized by the PT-barrel fold belong to secondary metabolic pathways; no primary metabolic enzyme with this fold has been discovered yet. In contrast, most of the membrane-bound aromatic PTases are involved in primary metabolism. However, a few enzymes of this group are involved in secondary metabolism. The bacterial PTase AqgD catalyzes the O-prenylation of the secondary metabolite alkyl-methoxyhydroquinone [47], and the fungal PTase XP_751272 is involved in the biosynthesis of pyripyropene A [48]; both show similarity to UbiA of ubiquinone biosynthesis. The bacterial putative PTase BAD07390 is likely to be involved in the biosynthesis of the secondary metabolite BE-40644 [49]; it shows similarity to MenA of menaquinone biosynthesis. The recently characterized bacterial PTase AuaA is involved in the biosynthesis of aurachin D [50] and is located between the UbiA and protoheme IX farnesyltransferase clusters in the map depicted in Fig. 4B.
During our cluster analysis of membrane-bound aromatic PTases, we noticed that many bacterial and fungal genomes contain not one but several genes for (biochemically not yet characterized) proteins annotated as "UbiA prenyltransferase" or similar. For instance, the genome of Salinispora tropica contains two genes annotated as "4-hydroxybenzoate polyprenyltransferase" (YP_001160901) and "UbiA PTase" (YP_001161073). The genome of Catenulispora acidiphila likewise contains a gene annotated as "4-hydroxybenzoate polyprenyltransferase" (YP_003118736) and in addition three "UbiA prenyltransferase" genes (YP_003112865, YP_003115669 and YP_003116365). Both organisms are Gram-positive bacteria, which are believed not to produce ubiquinones [51]. It remains to be shown whether such UbiA-like enzymes may be involved in the biosynthesis of secondary metabolites. In plants, several PTases with homology to enzymes of ubiquinone and plastoquinone biosynthesis have recently been shown to be involved in the biosynthesis of important secondary metabolites [52].
Both the membrane-bound and the soluble aromatic PTases show remarkable promiscuity for their aromatic substrates and have been used for the chemoenzymatic synthesis of new prenylated aromatic compounds [53,54,55,56,57,58]. Protein engineering has allowed altering the substrate specificity of indole PTases [9,59]. Therefore, these PTases may represent promising tools for biotechnological and pharmaceutical research.

Supporting Information
Table S1 Proteins included in this study. (PDF)

Author Contributions
Conceived and designed the experiments: TB VA ANL LH. Performed the experiments: TB VA. Analyzed the data: TB VA. Contributed reagents/materials/analysis tools: VA ANL. Wrote the paper: TB VA OS ANL LH.