A Non-Canonical NRPS Is Involved in the Synthesis of Fungisporin and Related Hydrophobic Cyclic Tetrapeptides in Penicillium chrysogenum

The filamentous fungus Penicillium chrysogenum harbors an astonishing variety of nonribosomal peptide synthetase genes, which encode proteins known to produce complex bioactive metabolites from simple building blocks. Here we report a novel non-canonical tetra-modular nonribosomal peptide synthetase (NRPS) whose adenylation domains all show microheterogenicity towards their respective substrates. By deleting the putative gene and performing comparative metabolite profiling, we identified various unique cyclic tetrapeptides, including fungisporin, and derived linear tetrapeptides that are associated with this NRPS. In combination with substrate predictions for each module, we propose a mechanism involving a 'trans-acting' adenylation domain.


Introduction
Fungal non-ribosomal peptides comprise a large variety of secondary metabolites with remarkable properties, such as antibacterial, antifungal, antiparasitic, anticancer and immunosuppressive activities. These metabolites are produced by large, multifunctional protein complexes called nonribosomal peptide synthetases (NRPSs), which catalyze the stepwise condensation of simple amino acid building blocks into complex molecules. NRPSs have a modular organization, with each module responsible for one discrete chain-elongation step. Each module can be subdivided into domains that carry all the information essential for recognition, activation and modification of the corresponding substrate. At a minimum, a typical NRPS module consists of an adenylation (A) domain, responsible for amino acid activation; a thiolation domain, also known as peptidyl carrier protein (PCP), which binds the activated amino acid; and a condensation (C) domain, which catalyzes peptide-bond formation. These domains are commonly arranged in a (C-A-PCP)n organization. In addition, a variety of optional domains have been described, such as methyltransferase (MT) and epimerization (E) domains [1].
The number of modules and their domain organization within NRPS enzymes control the structure of the final product(s) [1][2][3]. Thus, the order of modules usually corresponds to the sequence of amino acids in the peptide. Many NRPS systems adhere to this mechanistic paradigm, which is often referred to as the 'colinearity rule' [4]. Exceptions to this rule have also been discovered, including iterative NRPSs, which incorporate several residues of the same amino acid into the peptide, and the so-called nonlinear NRPSs, which deviate completely from the standard domain organization and give rise to unexpected products [3,5,6].
The impact of non-ribosomal peptide metabolites on the quality of human life has prompted the pharmaceutical industry to invest in the identification, engineering and heterologous expression of NRPS genes and pathways to enable the rational production of novel compounds [7][8][9]. To understand the basic mechanisms of biosynthesis by these complex NRPSs, detailed studies have been performed over the past few decades. These include structural analysis of adenylation domains, mutational analysis of the substrate specificity of modules, fusion of unrelated modules to produce new products, and identification of helper proteins for optimal activation of adenylation domains [10][11][12].
Although this work has provided detailed insights into catalytic mechanisms, a structure of a complete NRPS that would reveal how modules cooperate in product formation is still lacking. Genome sequencing data and sophisticated bioinformatic analyses of various fungi have revealed many NRPS genes that have not been associated with known secondary metabolites [13][14][15]. Moreover, most of these genes are not expressed when the fungi are grown under laboratory conditions, implying that many more secondary metabolites await discovery.
The filamentous fungus Penicillium chrysogenum is well known for the production of the antibiotic penicillin G, whose tripeptide precursor is synthesized by the tri-modular NRPS δ-(L-α-aminoadipyl)-L-cysteinyl-D-valine synthetase. In addition, other NRPS-derived secondary metabolites, such as the roquefortines and meleagrin, have been reported from P. chrysogenum [16][17][18]. Here, we describe the identification and structural characterization of cyclic tetrapeptides (Figure 1), including the previously identified metabolite fungisporin [19,20], and the discovery of a tetra-modular NRPS with an unusual domain organization whose adenylation domains show microheterogenicity. We propose to name this NRPS HcpA (CAP93139.1), after the Hydrophobic cyclic peptides it produces.

B. Host strains, media, growth conditions and plasmid construction
Deletion of the hcpA gene was carried out in P. chrysogenum strain DS54555, which lacks the penicillin cluster genes and the ku70 gene [18]. This strain was kindly provided by the DSM Biotechnology Center (Delft, Netherlands). A deletion plasmid was constructed by amplifying the flanking regions of the targeted gene and using the MultiSite Gateway Three-Fragment Vector Construction Kit according to the procedure described by Invitrogen, with pDEST R4-R3p as template. Primers used for the construction of the deletion plasmid pDEST R4-R3p PcHcpA (Figure S1) are listed in Table S1. Escherichia coli DH5α (F- Φ80lacZΔM15 Δ(lacZYA-argF)U169 recA1 endA1 hsdR17(rK-, mK+) phoA supE44 λ- thi-1 gyrA96 relA1) was used as host strain for high-frequency transformation and plasmid DNA amplification [21]. All strains were grown on yeast nitrogen base-glucose-yeast extract (YGG) medium for protoplast formation and transformation [22]. Both mutant and host strains of P. chrysogenum were grown on secondary metabolite production medium as described previously [18].

C. Transformation procedure
The deletion plasmid pDEST R4-R3p PcHcpA was transformed into protoplasts of P. chrysogenum DS54555 [23], yielding the ΔhcpA derivative of strain DS54555. The phleomycin resistance gene was used as the selection marker for deletion of the hcpA gene [22,24].

D. Genomic DNA extraction, total RNA extraction, cDNA amplification and qPCR analysis
Genomic DNA (gDNA) was isolated after 96 hours of growth in SMP medium (secondary metabolite production medium) using a modified yeast gDNA isolation protocol [25], in which the fungal mycelium is disrupted in a FastPrep FP120 system (Qbiogene). Isolated gDNA was quantified using a NanoDrop ND-1000. gDNA of the host and the various deletion strains was isolated using the E.Z.N.A. Fungal DNA kit (Omega Bio-tek). Total RNA of the host strain was isolated first after 48 hours of growth in SMP medium and then at 24-hour intervals up to 216 hours, using Trizol reagent (Invitrogen) with additional DNase treatment using the Turbo DNA-free kit (Ambion). Total RNA was quantified with the NanoDrop ND-1000, and 500 ng was used per cDNA reaction. cDNA was synthesized using the iScript cDNA synthesis kit (Bio-Rad) in a 10 µL final volume. The primers used to analyze expression of the hcpA gene were designed to span an intron to avoid amplification from gDNA (Table S2). For expression analyses, the γ-actin gene was used as a control for normalization (Table S2). A no-reverse-transcriptase (RT) control was used to detect gDNA contamination in the isolated total RNA. Expression levels were determined as described [18].
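The section defers the expression calculation to [18]. For orientation only, the sketch below assumes the common 2^-ΔCt approach, normalizing the hcpA Ct to the actin reference Ct of the same cDNA, and encodes the sampling scheme stated above (first at 48 h, then every 24 h up to 216 h). The function names, the 5-cycle threshold for the no-RT control and the example Ct values are hypothetical and are not taken from the study.

```python
"""Hypothetical sketch: relative hcpA expression via 2^-dCt (assumed method, not taken from [18])."""
from typing import Optional

# Sampling scheme from the text: first at 48 h, then every 24 h up to 216 h
SAMPLING_HOURS = list(range(48, 217, 24))


def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Target expression relative to the reference gene, 2^-(Ct_target - Ct_reference).

    Assumes roughly 100% amplification efficiency for both primer pairs.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** -delta_ct


def rt_control_clean(ct_no_rt: Optional[float], ct_with_rt: float, min_gap: float = 5.0) -> bool:
    """True if the no-RT control did not amplify, or amplified much later than the +RT sample.

    min_gap is an illustrative threshold (hypothetical), not a value from the study.
    """
    if ct_no_rt is None:  # no amplification at all in the no-RT reaction
        return True
    return ct_no_rt - ct_with_rt >= min_gap
```

For example, relative_expression(20.0, 18.0) evaluates to 0.25, i.e. a target transcript detected two cycles later than the reference. If the primer efficiencies differ substantially, an efficiency-corrected model would be more appropriate than this simple form.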

E. Southern blotting
Southern blotting was carried out by digesting gDNA (5 µg) with the indicated restriction enzymes. Digested DNA fragments were separated on a 0.8% agarose gel, blotted onto a Zeta-Probe membrane (Bio-Rad) as described earlier [26], and hybridized with the indicated DIG-labeled probes.

F. Metabolite profiling and analysis
1. Sample preparation. Host and deletion strains of P. chrysogenum used for gene assignments were grown in quintuplicate as described above. Samples for the metabolite profiles of the growth curves were likewise taken from five replicates. Metabolite profiling was carried out as described earlier [18], with modifications. Briefly, 4 µL of internal standard mixture (855 nmol/mL ranitidine, 657 nmol/mL reserpine and 114 nmol/mL ampicillin) was added to 100 µL fermentation broth, followed by 400 µL methanol for protein precipitation. The samples were vortexed and centrifuged at 14,000 × g for 10 minutes. 300 µL of supernatant was evaporated for 30 minutes in a SpeedVac (Thermo Scientific, San Jose, CA) and redissolved in 100 µL water. LC-UV-MS analysis was performed on an Agilent 1200 capillary pump (Agilent, Santa Clara, CA) coupled in-line to a Surveyor PDA detector (Thermo Scientific, San Jose, CA) and an LTQ-FT mass spectrometer (Thermo Scientific, San Jose, CA) using electrospray ionization, operated over a scan range of m/z 110 to m/z 2000 in positive/negative ion switching mode. Separation was performed on a Waters Atlantis T3 column (2.1 × 100 mm, 3 µm; Waters, Milford, MA), starting with 98% solvent A (1% acetonitrile and 0.1% formic acid in water) and 2% solvent B (1% water and 0.1% formic acid in acetonitrile) for 1.5 minutes at a flow rate of 300 µL/min. 40% B was reached at 22 minutes and 100% B at 25 minutes. The column was then flushed with 100% B and re-equilibrated to initial conditions.
Peak detection and integration were performed using an in-house tool, followed by statistical tests to discover significantly different features. Finally, the discovered features were integrated using LCquan v.26 (Thermo Scientific, San Jose, CA). Reserpine, an unrelated non-endogenous compound, was used as internal standard.
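The normalization step can be sketched as follows, assuming that each feature's peak area is divided by the reserpine area of the same injection and that host and deletion replicates are then compared by fold change and a Welch t statistic. This is not the in-house tool mentioned above, whose statistics are not described here; the function names are hypothetical, and only the internal standard concentrations come from the text, together with spike and broth volumes assumed here to be 4 µL and 100 µL.

```python
"""Hypothetical sketch of internal-standard normalization and a host vs. deletion comparison.

Not the authors' in-house tool: ratio normalization and the Welch t statistic are assumptions.
"""
import math
from statistics import mean, stdev

# Internal standard mixture from the text (nmol/mL) and assumed volumes (mL)
IS_MIX_NMOL_PER_ML = {"ranitidine": 855.0, "reserpine": 657.0, "ampicillin": 114.0}
SPIKE_ML = 0.004  # 4 uL internal standard mixture (assumed)
BROTH_ML = 0.100  # 100 uL fermentation broth (assumed)


def spiked_nmol(compound: str) -> float:
    """Amount of one internal standard added to each sample."""
    return IS_MIX_NMOL_PER_ML[compound] * SPIKE_ML


def normalize(areas: list[float], reserpine_areas: list[float]) -> list[float]:
    """Divide each feature area by the reserpine area of the same injection."""
    if len(areas) != len(reserpine_areas):
        raise ValueError("one reserpine area is needed per injection")
    return [a / r for a, r in zip(areas, reserpine_areas)]


def compare(host: list[float], deletion: list[float]) -> tuple[float, float]:
    """Return (fold change host/deletion, Welch t statistic) for normalized replicates."""
    mh, md = mean(host), mean(deletion)
    fold = mh / md if md > 0 else math.inf  # feature absent in the deletion strain
    se = math.sqrt(stdev(host) ** 2 / len(host) + stdev(deletion) ** 2 / len(deletion))
    if se == 0.0:
        t = 0.0 if mh == md else math.copysign(math.inf, mh - md)
    else:
        t = (mh - md) / se
    return fold, t
```

For instance, spiked_nmol("reserpine") gives about 2.63 nmol per sample. A feature absent from all deletion-strain replicates yields an infinite fold change rather than an error, which is the situation for the HcpA-dependent metabolites.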
2. Identification of cyclic tetrapeptides. The identity of the cyclic tetrapeptides was determined using samples from liquid cultures of P. chrysogenum and a crude spontaneous precipitate obtained from P. chrysogenum cultures, which contained primarily 1 and 2 (relative concentrations of 70 and 15%, respectively) alongside various minor cyclic tetrapeptides. LC-MSn experiments for the determination of consecutive amino acid losses were performed as described in the metabolite profiling section, with a normalized collision energy of 35%, an isolation width of 1 amu and an activation Q of 0.30. NMR spectra were recorded on a Bruker Avance III 700 MHz NMR spectrometer (Bruker, Billerica, MA) equipped with a 5 mm TCI probe. 2 mg of each sample was dissolved in 0.6 mL anhydrous DMSO, and NMR spectra were acquired at 340 K.
3. Identification of linear tetrapeptides. Linear tetrapeptides were identified by their multiple-stage fragmentation after AQC derivatization [27]. Methanol (400 µL) was added to a 100 µL aliquot of fermentation broth for protein precipitation. Samples were vortexed for 10 minutes and centrifuged for 10 minutes, and 300 µL of the supernatant was evaporated to dryness in a SpeedVac (Thermo Scientific, San Jose, CA). Derivatization was done according to the supplier's procedure by redissolving the sample in 40 µL water, 40 µL borate buffer (pH 8.5) and 20 µL AQC solution. The mixture was vortexed for 10 minutes and heated for 10 minutes at 55 °C.
LC-MSn experiments were conducted on an Agilent 1200 capillary pump (Agilent, Santa Clara, CA) coupled to an LTQ-FT mass spectrometer (Thermo Scientific, San Jose, CA) using electrospray ionization. Separation was performed on a Waters Atlantis T3 column (2.1 × 100 mm, 3 µm; Waters, Milford, MA), starting with 72% solvent A (1% acetonitrile and 0.1% formic acid in water) and 28% solvent B (1% water and 0.1% formic acid in acetonitrile) for 1.5 minutes at a flow rate of 300 µL/min. After 8 minutes, the gradient reached 60% solvent B. Subsequently, the column was flushed with 100% B before being re-equilibrated to initial conditions. Peptide sequences were elucidated by multiple-stage collision-induced dissociation (CID) of the protonated molecule, following consecutive cleavages of amino acid residues starting from the C-terminus of the derivatized linear tetrapeptide. CID was performed with a normalized collision energy of 35%, an isolation width of 1 amu and an activation Q of 0.30.
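The two LC gradients in this section (metabolite profiling and separation of the AQC derivatives) can be captured as piecewise-linear programs. The sketch below assumes that the stated times are measured from injection and that %B changes linearly between them; the flush and re-equilibration steps are omitted because their durations are not given. The program names and the helper function are hypothetical.

```python
"""Sketch: %B at time t for the two LC gradients described in the methods.

Assumes linear segments between the stated time points, times measured from injection,
and ignores the flush/re-equilibration phase, whose timing is not given.
"""
from bisect import bisect_right

# (time in min, % solvent B)
PROFILING = [(0.0, 2.0), (1.5, 2.0), (22.0, 40.0), (25.0, 100.0)]
AQC_DERIVATIVES = [(0.0, 28.0), (1.5, 28.0), (8.0, 60.0)]


def percent_b(program: list[tuple[float, float]], t: float) -> float:
    """Linearly interpolate the solvent B percentage at time t (min)."""
    times = [point[0] for point in program]
    if t <= times[0]:
        return program[0][1]
    if t >= times[-1]:
        return program[-1][1]
    i = bisect_right(times, t) - 1  # segment that contains t
    (t0, b0), (t1, b1) = program[i], program[i + 1]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
```

Under these assumptions, the profiling gradient is at 21% B halfway through its 1.5 to 22 minute ramp (t = 11.75 min), and the AQC gradient is at 44% B at 4.75 minutes.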

Bioinformatic analysis of a tetrapeptide NRPS
Genome sequencing revealed that P. chrysogenum encodes 11 NRPS genes [15]. Microarray expression analysis under glucose-limited chemostat culture conditions, as well as quantitative PCR under shake-flask culture conditions, showed that Pc16g04690 (GenBank protein identifier CAP93139.1) (hcpA) is highly expressed (Figure 2A) [15]. The hcpA gene encodes a large multimodular non-ribosomal peptide synthetase (Figure 3) of 6064 amino acids with a calculated molecular mass of about 670 kDa. HcpA, which shows 54% sequence identity to the orthologous An08g02310 of A. niger, has the domain architecture A1-PCP1-E-C2-A4-A2-PCP2-C3-A3-PCP3-E-C4-PCP4 (A = adenylation, C = condensation, PCP = thiolation, and E = epimerization) (Figure 3A) [15]. A similar domain architecture was deduced for the orthologous A. niger protein, except for an insertion of a 177-amino-acid sequence between the adenylation domains A4 and A2 that shows homology to conserved motifs of an incomplete condensation domain (Co) (Figure 3A). To predict the substrate specificity of the four adenylation domains of the two HcpA proteins, NRPSPredictor2 was used [28]. This program extracted the active-site motifs DAACVAGVAK and DAVIIAAVAK as the signature sequences of the A1 domain in the P. chrysogenum and A. niger HcpA proteins, respectively. The A1 motif exhibits similarity to the signature sequence of the adenylation domain of a bacitracin-producing NRPS that activates phenylalanine. The signature sequences of the A4 domain (DAVSAGVAAK and DMQSAWFICK in the P. chrysogenum and A. niger HcpA, respectively) show homology with the valine-activating adenylation domain of gramicidin synthetase, whereas the A2 domains (DAMTVGGVFK and DVLSTGAICK for the P. chrysogenum and A. niger HcpA, respectively) and A3 domains (DAMFVGGVFK and DAMFVGGIFK for the P. chrysogenum and A. niger HcpA, respectively) have predicted specificities towards phenylalanine and valine, respectively.
The overall architecture of both synthetases is unusual, as the A2 and A4 domains occur adjacent to each other, flanked by a single C and a single PCP domain in a C2-A4-A2-PCP2 pattern. In addition, an incomplete module (C4-PCP4) lacking an adjacent A domain is found at the C-termini of these NRPSs (Figure 3A).

Genetic deletion of the tetrapeptide NRPS and secondary metabolite identification
To identify the secondary metabolites synthesized by HcpA, the corresponding gene was deleted and comparative metabolite profiling was performed on the culture supernatants of the host and deletion strains. P. chrysogenum DS54555 was used as host strain; it is derived from the industrial strain DS17690 and lacks the ku70 gene, which makes it efficient in homologous recombination. Strain DS54555 also lacks the multiple penicillin biosynthetic gene clusters, which facilitates the detection of unknown secondary metabolites in the culture broth because the profile is no longer dominated by β-lactams. The hcpA gene was resequenced from the genome of P. chrysogenum DS54555, and the nucleotide sequence of the open reading frame (GenBank KJ679502), including the promoter region, was found to be identical to the hcpA gene in the sequenced genome of P. chrysogenum Wisconsin 54-1255 [15]. The gene was deleted by homologous recombination using the deletion plasmid pDEST R4-R3p (Figure S1), which contains the flanking regions of hcpA and the phleomycin resistance gene. Colonies were selected on phleomycin-containing agar plates; the mutant colonies had a smooth surface, in contrast to the wrinkled surface of colonies of the parental strain (Figure 4). Deletion of the hcpA gene was confirmed by Southern blot hybridization (Figure 5A).
The host and ΔhcpA strains were grown for 168 hours in SMP medium, followed by comparative metabolite analysis using HPLC-UV-MS. Several secondary metabolites were present in the host but absent in the deletion strain (Figure 6, Table S3). These compounds could be classified into two groups according to their chemical structure. The first group consists of ten cyclic tetrapeptides, which were identified using HPLC-MSn, NMR and a synthetic standard (Figure 1). Upon collisional activation, cyclic tetrapeptides undergo ring opening in the mass spectrometer, resulting in four linear tetrapeptide ions that can be sequenced in the same way as linear peptides. By following the sequential loss of amino acids from the b-ions of each ring-opened linear form, the sequence of the cyclic parent could be determined (Figure S2). The identities of the amino acids corresponding to these losses, as well as their sequence, were additionally confirmed by 1H-NMR and 13C-NMR experiments for compounds 1 and 2 (Tables S4 and S5). To distinguish isoleucine from its isomer leucine, both of which correspond to a loss of 113 Da (C6H11NO) in the mass spectra of compounds 5-10, 1H-, 13C- and various 2D-NMR experiments were conducted (Figure S3, Table S6). Overall, the cyclic tetrapeptides obtained from metabolite profiling contain various combinations of five amino acids (valine, isoleucine, phenylalanine, tyrosine and tryptophan), which makes them highly hydrophobic. They share a common sequence pattern in which two aromatic amino acids are followed by two aliphatic amino acids. The absolute stereochemistry of compound 2 was confirmed by spiking its synthetic standard into a natural extract, which did not produce additional signals in the 1H-NMR spectrum, whereas the intensity of the main signals increased relative to the impurities (Figure S4A and B).
In addition, superimposing the HMBC spectra of the natural and synthetic samples of peptide 2 showed identical correlations and shifts (Figure S5). Furthermore, retention time and MS2 fragmentation did not differ between the extracted compound and the synthetic standard. We conclude that not only the amino acid sequence but also the chirality of the individual amino acids is identical in the synthetic and natural peptide. Therefore, cyclic tetrapeptide 2 has the same stereochemistry as observed for 1 [19], with the first amino acid of each aliphatic and aromatic pair in the D-configuration and the second in the L-configuration. Owing to the low concentrations of cyclic tetrapeptides 3-10, reliable stereochemical information could not be obtained for them. However, as these peptides originate from the same NRPS as 1 and 2 and share the same order of aliphatic and aromatic amino acids, identical stereochemistry is expected.
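The mass bookkeeping behind the ring-opening argument can be sketched as follows: opening the ring at each of its four peptide bonds yields one rotation of the sequence, and each rotation gives a b-ion ladder by successive loss of C-terminal residues. The sketch assumes singly protonated ions and monoisotopic masses computed from elemental formulas; it does not model fragmentation chemistry or relative intensities, and the function names are hypothetical.

```python
"""Sketch: b-ion ladders of the four ring-opened forms of a cyclic tetrapeptide.

Monoisotopic masses, singly protonated ions, whole-residue losses only (no intensities).
"""
# Monoisotopic atomic masses (Da) and the proton mass
ATOM = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276467

# Elemental composition of each residue (free amino acid minus H2O)
RESIDUE_FORMULA = {
    "Val": {"C": 5, "H": 9, "N": 1, "O": 1},
    "Ile": {"C": 6, "H": 11, "N": 1, "O": 1},  # same formula as Leu: MS alone cannot tell them apart
    "Phe": {"C": 9, "H": 9, "N": 1, "O": 1},
    "Tyr": {"C": 9, "H": 9, "N": 1, "O": 2},
    "Trp": {"C": 11, "H": 10, "N": 2, "O": 1},
}


def residue_mass(name: str) -> float:
    return sum(ATOM[element] * count for element, count in RESIDUE_FORMULA[name].items())


def ring_openings(cycle: tuple[str, ...]) -> list[tuple[str, ...]]:
    """Opening the ring at each peptide bond gives one rotation of the sequence."""
    return [cycle[i:] + cycle[:i] for i in range(len(cycle))]


def b_ladder(linear: tuple[str, ...]) -> list[float]:
    """m/z of b_n ... b_1; b_n of a ring-opened ion equals [M+H]+ of the intact cycle."""
    mz = sum(residue_mass(r) for r in linear) + PROTON
    ladder = []
    for residue in reversed(linear):
        ladder.append(mz)
        mz -= residue_mass(residue)  # loss of the current C-terminal residue
    return ladder


fungisporin = ("Phe", "Phe", "Val", "Val")  # compound 1
for form in ring_openings(fungisporin):
    print("-".join(form), [round(m, 4) for m in b_ladder(form)])
```

For fungisporin (1), all four ring-opened forms share the same b4 value (about m/z 493.28, the [M+H]+ of the cycle), whereas their lower b-ions differ; this is what allows the cyclic sequence to be reconstructed. The Ile entry uses the C6H11NO residue formula, which is identical for Leu.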
The second group of identified compounds consists of 18 linear tetrapeptides composed of the same five amino acids as their cyclic analogues (Tables S3 and S7). Like the cyclic peptides, each linear tetrapeptide contains two aliphatic and two aromatic amino acids, in different arrangements that yield various isomeric structures. Because of similar chromatographic properties and identical mass-to-charge ratios, these isomers appear as a single group in the metabolite profiling data (Figure S6A and B). Their structure elucidation is challenging, as minor fragments can be attributed either to fragmentation of low-abundance linear tetrapeptides or to possible sequence scrambling of the major linear tetrapeptides, which has been reported for similar linear peptides [29]. To separate and sequence these isomeric tetrapeptides and to prevent possible sequence scrambling, their N-termini were derivatized using 6-aminoquinolyl-N-hydroxysuccinimidyl carbamate (AQC) (Figure S6C and D) [30]. De novo peptide sequencing was performed by following the consecutive amino acid losses of various b-ions using multiple-stage fragmentation mass spectrometry (Figure S6E and Table S7). As with the cyclic peptides, each linear tetrapeptide contains two consecutive aromatic and/or two consecutive aliphatic amino acids.
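The de novo reading step can be sketched as matching the mass difference between successive b-ions to a residue mass. Because the N-terminal AQC tag is carried by every b-ion, it cancels in these differences, so the sketch does not need the tag mass. The very first loss from the protonated linear peptide is a free amino acid (residue plus water), so the input here is assumed to start at the largest b-ion. The tolerance and function name are hypothetical, and Ile and Leu remain indistinguishable.

```python
"""Sketch: reading residues from consecutive b-ion losses (de novo, C- to N-terminal).

Monoisotopic residue masses; Ile and Leu are isobaric and are reported together.
"""
RESIDUE_MASS = {
    "Val": 99.06841,
    "Ile/Leu": 113.08406,
    "Phe": 147.06841,
    "Tyr": 163.06333,
    "Trp": 186.07931,
}


def read_losses(ions: list[float], tol: float = 0.01) -> list[str]:
    """Assign each mass difference between successive b-ions (heaviest first) to a residue.

    An N-terminal tag such as AQC is present in every b-ion, so it cancels in the differences.
    """
    calls = []
    for heavier, lighter in zip(ions, ions[1:]):
        delta = heavier - lighter
        hits = [name for name, mass in RESIDUE_MASS.items() if abs(delta - mass) <= tol]
        calls.append(hits[0] if hits else f"unassigned({delta:.3f})")
    return calls
```

Given b-ions listed from heaviest to lightest, the returned residues run from the C-terminal side towards the N-terminus; residues are reported in the order in which they are lost.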
In conclusion, several linear and cyclic tetrapeptides with similar structural features were present in the host but absent in the deletion strain, showing that they originate from a single NRPS, namely HcpA.

Expression of the hcpA gene and secondary metabolite production
To examine the expression of the hcpA gene, the host strain was grown for up to 216 hours in SMP medium. Samples were collected during growth for total RNA extraction and analysis of extracellular metabolites. Metabolite concentrations were determined by HPLC-UV-MS, and transcript levels by quantitative PCR using γ-actin as reference gene. The hcpA gene was highly expressed between 120 and 168 hours of growth, paralleled by a 12-fold increase in the concentration of cyclic tetrapeptide 1 in the medium (Figure 2A and B). In general, the concentration of cyclic tetrapeptides was highest around 168 hours of growth, whereas the concentration of the linear tetrapeptides increased with time.
These data suggest that the linear tetrapeptides are derived from the cyclic tetrapeptides by degradation.

Discussion
Here we have demonstrated that the NRPS HcpA is responsible for the biosynthesis of hydrophobic cyclic tetrapeptides. Bioinformatic analysis showed that HcpA has an unusual domain architecture compared with classical linear NRPSs, and no products had previously been associated with it. Through deletion of the hcpA gene and comparative metabolite profiling, ten cyclic tetrapeptides were identified, including the previously described secondary metabolite fungisporin (1) [19]. Based on the non-ribosomal code in combination with the discovered cyclic products, the first NRPS module is specific for phenylalanine. Owing to the adjacent E domain, which epimerizes the activated amino acid, a D-configuration is expected, as observed for product 1 (Figure 1). The next module has an unusual architecture in which the two neighboring adenylation domains A4 and A2 are flanked by C2 and PCP2. As A4 shows high specificity towards valine and A2 shows high homology to A1, A2 is expected to activate phenylalanine like A1, as supported by the structure of 1. Because A4 lacks an adjacent PCP domain and A2 lacks a directly preceding C domain, both of which are necessary for loading and condensation of the substrate, no activity of this module would be expected according to the classical C-A-PCP geometry (Figure 3A). Surprisingly, both products 1 and 2 contain L-phenylalanine at the second position. As module 2 is predicted to be the only module capable of incorporating L-phenylalanine, A2 must indeed be active. Consequently, it seems likely that A4 is skipped, leaving the N-terminal C2 domain of module 2 to catalyze condensation of the first amino acid from module 1 with the second amino acid from module 2, as observed for products 1 and 2. The third module of HcpA contains the domains C3, A3, PCP3 and E in a canonical linear order.
The A3 domain is predicted to activate valine, which agrees with the peptide sequences of 1 and 2, in which D-valine is the third amino acid. The fourth module of the NRPS is an incomplete module consisting of C4 and PCP4; owing to the missing A domain, no activity would be expected. However, the chemical structures of both cyclic tetrapeptides 1 and 2 show incorporation of L-valine as the fourth amino acid. As A4 is the only domain predicted to be specific for valine that lacks an adjacent epimerization domain, it is very likely that this domain is a 'trans-acting' A domain that interacts with C4 and PCP4 to add the last amino acid to the tetrapeptide. Similar architectural flexibility has been observed in yersiniabactin biosynthesis, in which one A domain, located in HMWP2, loads three PCPs located in different modules [31][32][33]. As the linear domain organization of HcpA does not reflect a linear order of substrate incorporation into the final product, non-linear interactions are inferred. Although A4, C4 and PCP4 are not consecutive in the primary sequence, they might still be closely arranged in the three-dimensional structure of the enzyme; structural characterization would be necessary to determine their spatial proximity. Finally, after incorporation of L-valine into the peptide chain, the C and PCP domains of the last module presumably catalyze cyclization of the peptide, leading to the final cyclic structure, as previously observed in other NRPS systems [34,35]. It should be stressed that nonlinear NRPSs form a very heterogeneous group of NRPS systems that deviate from the colinearity rule and show various unusual mechanisms [6]. In an alternative model, A4 might be non-functional, leaving A3 to load two PCPs, namely PCP3 and PCP4, similar to the cysteine-specific A domain of HMWP2 in yersiniabactin biosynthesis [3].
However, more detailed biochemical studies are required to fully understand the interplay between these domains and to confirm the exact biosynthetic mechanisms involved.
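The two loading schemes discussed above can be compared with a small model. It assumes the domain order A1-PCP1-E-C2-A4-A2-PCP2-C3-A3-PCP3-E-C4-PCP4, that residues are incorporated in the order of the PCPs, and that an E domain directly after a PCP yields a D-residue. The names and the simplified chemistry are hypothetical; this is a bookkeeping sketch, not a biochemical model of HcpA.

```python
"""Hypothetical model of HcpA chain assembly under the two loading schemes discussed in the text."""

# Domain order N- to C-terminus, as described for HcpA
ARCH = ["A1", "PCP1", "E", "C2", "A4", "A2", "PCP2", "C3", "A3", "PCP3", "E", "C4", "PCP4"]
# Main substrate of each A domain for fungisporin (1)
SUBSTRATE = {"A1": "Phe", "A2": "Phe", "A3": "Val", "A4": "Val"}


def cis_loading(arch: list[str]) -> dict[str, str]:
    """Map each PCP to the A domain directly preceding it (canonical A-PCP pairing)."""
    return {
        arch[i + 1]: dom
        for i, dom in enumerate(arch[:-1])
        if dom.startswith("A") and arch[i + 1].startswith("PCP")
    }


def orphans(arch: list[str]) -> tuple[list[str], list[str]]:
    """A domains without a directly following PCP, and PCPs without a directly preceding A."""
    loading = cis_loading(arch)
    lone_a = [d for d in arch if d.startswith("A") and d not in loading.values()]
    lone_pcp = [d for d in arch if d.startswith("PCP") and d not in loading]
    return lone_a, lone_pcp


def assemble(arch: list[str], loading: dict[str, str], substrate: dict[str, str]) -> list[str]:
    """Read the chain in PCP order; an E domain right after a PCP gives a D-residue."""
    chain = []
    for i, dom in enumerate(arch):
        if dom.startswith("PCP"):
            epimerized = i + 1 < len(arch) and arch[i + 1] == "E"
            chain.append(("D-" if epimerized else "L-") + substrate[loading[dom]])
    return chain


cis = cis_loading(ARCH)
trans_model = {**cis, "PCP4": "A4"}        # A4 acts in trans on PCP4
alternative_model = {**cis, "PCP4": "A3"}  # A4 inactive; A3 loads both PCP3 and PCP4
```

In this model, orphans(ARCH) reports A4 and PCP4 as the unpaired domains, and both loading schemes return D-Phe, L-Phe, D-Val, L-Val for fungisporin (1). Because A3 and A4 both prefer valine, the structure of 1 alone cannot discriminate between the trans-acting A4 model and the alternative, which is consistent with the call for biochemical studies.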
Besides 1 and 2, eight additional, less abundant cyclic tetrapeptides were found in the host strain but not in the deletion strain (Figure 1). Their peptide sequences resemble that of 1, with two aromatic amino acids followed by two aliphatic amino acids. Although stereochemical information is only available for compounds 1 and 2, it can be assumed that each cyclic tetrapeptide contains one aromatic and one aliphatic amino acid in the D-configuration, namely at the first and third positions. These assumptions, together with the stereochemical structures of 1 and 2, lead to the conclusion that each adenylation domain of HcpA accepts more than one precursor amino acid, a behavior referred to as microheterogenicity: A1 is specific for phenylalanine and tyrosine, and A2 for phenylalanine and, to a lesser extent, tryptophan (Table S8). Together with the two aliphatic-amino-acid-selecting adenylation domains A3 and A4, which preferentially activate valine over isoleucine, 2 × 2 × 2 × 2 = 16 cyclic tetrapeptides are theoretically possible. However, only ten of these were detected in the fermentation broth of P. chrysogenum, indicating that the domains differ in their degree of specificity towards their precursors. Because compounds 1-10 share a chemical scaffold with the tetrapeptides cyclo-(N-MePhe-Ile)2, cyclo-(N-MePhe-Val)2 and cyclo-(N-MePhe-Val-N-MePhe-Ile) reported from Onychocola sclerotica, cardiac ion-channel blocking activity might also be expected for the hydrophobic cyclic peptides presented here [36]. In addition, colonies of the ΔhcpA strain lost their wrinkled surface and had a rather smooth appearance (Figure 4). As this change is attributed to deletion of the hcpA gene, the hydrophobic cyclic peptides 1-10 are likely to be involved.
Possibly, these molecules function analogously to hydrophobins in altering the surface properties and influencing aerial growth. Their exact function, however, remains unclear.
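The count of 16 theoretical products follows from two accepted substrates at each of the four positions. The sketch below enumerates them and checks, with a smallest-rotation key, that no two of these sequences describe the same ring. It assumes the substrate sets stated above and that position 4 is supplied by A4; the function names are hypothetical, and which ten of the 16 were actually detected is given in Figure 1 and is not encoded here.

```python
"""Sketch: cyclic tetrapeptides theoretically accessible from the substrate ranges in the text."""
from itertools import product

# Accepted substrates per position (N- to C-terminal), as inferred in the text
POSITION_SUBSTRATES = [
    ("Phe", "Tyr"),  # position 1, A1 (followed by E: D-residue)
    ("Phe", "Trp"),  # position 2, A2
    ("Val", "Ile"),  # position 3, A3 (followed by E: D-residue)
    ("Val", "Ile"),  # position 4, assumed to be supplied by A4
]


def theoretical_cyclopeptides() -> list[str]:
    """All sequences with one accepted substrate per position."""
    return ["cyclo-(" + "-".join(combo) + ")" for combo in product(*POSITION_SUBSTRATES)]


def canonical(cycle: tuple[str, ...]) -> tuple[str, ...]:
    """Smallest rotation: identical for all sequences that describe the same ring."""
    return min(cycle[i:] + cycle[:i] for i in range(len(cycle)))


def distinct_cycles() -> set[tuple[str, ...]]:
    return {canonical(combo) for combo in product(*POSITION_SUBSTRATES)}
```

The rotation check matters because a cyclic sequence has no fixed starting point; here the fixed aromatic-aromatic-aliphatic-aliphatic pattern ensures that all 16 combinations are distinct rings.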
In addition to the cyclic tetrapeptides, several highly abundant linear tetrapeptides were observed in the culture broth of the host strain but not of the deletion strain. For each cyclic tetrapeptide, several linear tetrapeptides with the same sequence were present. For instance, for cyclic tetrapeptide 1 with the sequence cyclo-(Phe-Phe-Val-Val), three linear tetrapeptides with the sequences Phe-Val-Val-Phe, Val-Phe-Phe-Val and Phe-Phe-Val-Val were found at different ratios. Their concentrations in the medium increased over time, whereas that of their cyclic counterpart decreased after 168 hours (Figure 2B). This leads to the conclusion that the linear peptides originate from degradation of their cyclic counterparts by hydrolysis of a peptide bond, which was observed exclusively between two aromatic amino acids, two aliphatic amino acids, or an aliphatic followed by an aromatic amino acid (Tables S3 and S7). Linear tetrapeptides with an N-terminal aliphatic and a C-terminal aromatic amino acid (such as Val-Val-Phe-Phe from 1) were not detected, indicating that cleavage of a bond between an aromatic and a following aliphatic amino acid is not favored. As cyclic tetrapeptides are relatively stable towards chemical and thermal degradation, enzymatic hydrolysis appears most probable.
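The cleavage pattern described above can be illustrated by enumerating the four possible single-hydrolysis products of a cyclic tetrapeptide and filtering them with the reported empirical rule. This encodes an observation rather than an enzymatic mechanism; the residue classes and function names are hypothetical choices for the sketch.

```python
"""Sketch: linear products from hydrolysing each peptide bond of a cyclic tetrapeptide.

Applies the empirical pattern from the text: products with an N-terminal aliphatic and a
C-terminal aromatic residue (i.e. a cleaved aromatic -> aliphatic bond) were not detected.
"""
AROMATIC = {"Phe", "Tyr", "Trp"}
ALIPHATIC = {"Val", "Ile"}


def hydrolysis_products(cycle: tuple[str, ...]) -> list[tuple[tuple[str, ...], tuple[str, str]]]:
    """Return (linear product, (carbonyl-side, amine-side) residues of the cleaved bond)."""
    n = len(cycle)
    products = []
    for i in range(n):
        # Cleaving the bond from residue i (C=O) to residue i + 1 (NH):
        # the amine-side residue becomes the new N-terminus.
        linear = cycle[i + 1:] + cycle[:i + 1]
        products.append((linear, (cycle[i], cycle[(i + 1) % n])))
    return products


def detected(linear: tuple[str, ...]) -> bool:
    """Empirical filter from the text, not a mechanistic rule."""
    return not (linear[0] in ALIPHATIC and linear[-1] in AROMATIC)


for linear, bond in hydrolysis_products(("Phe", "Phe", "Val", "Val")):
    status = "detected" if detected(linear) else "not detected"
    print("-".join(linear), "from cleaving", "-".join(bond), status)
```

For cyclo-(Phe-Phe-Val-Val), the three products passing the filter are Phe-Val-Val-Phe, Val-Phe-Phe-Val and Phe-Phe-Val-Val, matching the linear peptides listed above, while Val-Val-Phe-Phe, formed by cleaving the Phe-Val bond, is filtered out.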
P. chrysogenum contains a second NRPS that could potentially be involved in tetrapeptide formation, i.e., Pc13g14330. This protein has a linear organization and thus differs from the HcpA protein.
Pc13g14330 is hardly expressed under the batch culture conditions employed in this study. Moreover, overexpression of Pc13g14330 under control of the strong pcbC promoter did not lead to any novel detectable metabolite in the growth medium, and deletion of the Pc13g14330 gene neither affected cyclic tetrapeptide production nor resulted in the loss of other metabolites (unpublished data). We therefore conclude that Pc13g14330 is unrelated to HcpA and not responsible for cyclic tetrapeptide formation. HcpA shows 54% amino acid sequence identity with the orthologous protein from A. niger, which has exactly the same module organization (Figure 3A). Furthermore, all cyclic products 1-10 present in P. chrysogenum were also found in the supernatant of an A. niger strain, while they were absent in the corresponding hcpA deletion strain (unpublished data). It is therefore concluded that the A. niger HcpA ortholog is involved in the production of all cyclic metabolites (1-10) in A. niger. A small difference exists in the organization of the two HcpA proteins: a short additional amino acid sequence is present between domains A4 and A2 in the A. niger enzyme. This sequence shows limited homology to a condensation domain and appears to be an incomplete condensation domain. Hence, one may deduce that this non-canonical situation evolved quite recently. Perhaps a complete C-A-PCP-C-A-PCP module structure was present before, but degenerated after new interactions between the domains evolved.

Conclusions
Cyclic tetrapeptides are a structurally interesting group of fungal peptides that have attracted much interest because of their cardiac ion-channel blocking properties. A biosynthetic mechanism for these metabolites has been deduced from gene deletion experiments in P. chrysogenum combined with comparative metabolite profiling and subsequent structure elucidation. This analysis points to trans-aminoacylation during chain elongation in an NRPS. Furthermore, a distinct microheterogenicity of each adenylation domain towards different amino acid building blocks results in a range of cyclic tetrapeptides produced by a single NRPS.

Figure S3 HSQC spectrum of an isolated mixture of cyclic tetrapeptides in DMSO. The spectrum was used to identify isoleucine in the minor products 5 and 9. Signals corresponding to isoleucine are indicated with circles. TOCSY, COSY and HMBC experiments further confirm this conclusion (data not shown). 1H and 13C chemical shifts are shown in Table S6. (TIF)

Figure S4 1H-NMR spectra of mixtures of cyclic tetrapeptides. A: 1H-NMR spectrum of a precipitate of various cyclic tetrapeptides containing primarily peptides 1 and 2 (bottom), and of the same precipitate spiked with synthetic compound 2 (top), in DMSO at 340 K. Signals corresponding to 2 increased relative to the impurities, whereas no additional signals appeared. B: Zoomed regions of the 1H-NMR spectra of the natural precipitate (bottom) and the precipitate spiked with compound 2 (top). Signals that increased after spiking are marked (*). (TIF)

Figure S5 Superimposed HMBC spectra of synthetic compound 2 (red) and an isolated mixture of cyclic tetrapeptides containing naturally produced compound 2 (black). Correlations between NH and CO are shown, indicating identical shifts for both samples. Assignments can be found in Table S4. (TIF)

Figure S6 Sequencing of the linear isomers 11 (Phe-Phe-Val-Val), 12 (Val-Phe-Phe-Val) and 13 (Phe-Val-Val-Phe). A: Extracted ion chromatogram (EIC) of linear tetrapeptides 11, 12 and 13 obtained with the profiling method; no chromatographic separation could be achieved. B: MS2 fragmentation spectrum of the unseparated linear tetrapeptides 11-13. Although the spectrum is dominated by fragments originating from 11, several lower-abundance fragments originate from fragmentation of 12 and 13 or from possible sequence scrambling of 11. C: Sequences of the linear tetrapeptides after N-terminal AQC derivatization, used to achieve better chromatographic separation and to prevent sequence scrambling. D: Normalized total ion chromatogram (TIC) of AQC-derivatized linear peptides 11-13 after the first C-terminal amino acid loss, showing chromatographic separation and allowing peptide sequencing. E: Individual MS3 fragmentation spectra of the chromatographically separated derivatized peptides 11, 12 and 13, showing the b2 and b1 ions used for peptide sequencing. Linear peptides 14-28 were identified accordingly. (TIF)

Table S6 1H and 13C chemical shifts of isoleucine in compound 5 with sequence cyclo-(Phe-Phe-Val-Ile) and compound 9 with sequence cyclo-(Phe-Phe-Val-Ile), present in an extracted mix of cyclic tetrapeptides. NMR signals corresponding to C=O and NH, as well as the remaining amino acid signals, are not observed (n.o.) due to overlap with the main constituents. δ DMSO (1H/13C) = 2.55/40.50. (DOCX)