Metabolic Engineering Plant Seeds with Fish Oil-Like Levels of DHA

Background: Omega-3 long-chain (≥C20) polyunsaturated fatty acids (ω3 LC-PUFA) have critical roles in human health and development, and studies indicate that deficiencies in these fatty acids can increase the risk or severity of cardiovascular and inflammatory diseases in particular. These fatty acids are predominantly sourced from fish and algal oils, but it is widely recognised that an alternative and sustainable source of eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA) is urgently needed. Since the earliest demonstrations of ω3 LC-PUFA engineering, good progress has been made in engineering the C20 EPA, with seed fatty acid levels similar to those observed in bulk fish oil (∼18%), although undesirable ω6 PUFA levels have also remained high.
Methodology/Principal Findings: Transgenic seed production of the particularly important C22 DHA has been problematic, with many attempts resulting in the accumulation of EPA/DPA but only a few percent of DHA. This study describes the production of up to 15% DHA in Arabidopsis thaliana seed oil with a high ω3/ω6 ratio. This was achieved using a transgenic pathway that increased the C18 fatty acid α-linolenic acid (ALA), which was then converted to DHA by a microalgal Δ6-desaturase pathway.
Conclusions/Significance: The amount of DHA described in this study exceeds the 12% level at which DHA is generally found in bulk fish oil. This is a breakthrough in the development of sustainable alternative sources of DHA, as this technology should be applicable to oilseed crops. One hectare of a Brassica napus crop containing 12% DHA in seed oil would produce as much DHA as approximately 10,000 fish.


Introduction
Omega-3 long-chain (≥C20) polyunsaturated fatty acids (ω3 LC-PUFA, Figure 1) have been a key metabolic engineering target in recent years. The two main ω3 LC-PUFA are eicosapentaenoic acid (EPA, 20:5ω3) and docosahexaenoic acid (DHA, 22:6ω3). Dietary intake of preformed ω3 LC-PUFA is important [1] because in vivo conversion of C18 fatty acids to DHA is relatively poor [2]. This is especially relevant for brain development in infants [3] and for aspects of cardiovascular health [4]. As a result, DHA is now widely included in infant formulae, and pharmaceutical-grade ω3 LC-PUFA therapies are expanding rapidly for the treatment of certain cardiovascular-related diseases. Demand for nutraceutical ω3 LC-PUFA products, including DHA-specific products, is growing rapidly, and an additional, sustainable source of ω3 LC-PUFA is required to complement the existing marine fish oil supply [5].
Good progress has been made in engineering the C20 EPA, with groups reporting seed levels similar to those observed in bulk fish oil (~18%), although ω6 LC-PUFA levels also remained high in these examples [6][7]. The conversion of this C20 fatty acid to the particularly important C22 DHA, however, has been problematic, with many attempts resulting in the accumulation of EPA/DPA and little DHA [6][7][8][9]. The difficulties of achieving high levels of DHA accumulation have been described [10][11]. Key challenges include reducing the co-production of undesirable ω6 fatty acids, maintaining a continuous flux of substrates through the entire pathway without large losses to metabolically inactive pools, and improving the efficiency of the critical Δ5-elongase that converts EPA to DPA (Figure 1). Here we report the construction and characterisation of an engineered pathway that largely overcomes these challenges, resulting in the accumulation of up to 15% DHA in the seed oil of Arabidopsis thaliana.

Results and Discussion
The insert region of the binary vector pJP3416_GA7 (Figure 2) contained seven fatty acid biosynthesis genes driven by seed-specific promoters and a constitutively expressed plant selectable marker. The transgenic pathway was designed to convert oleic acid (OA) to DHA (Figure 1) and consisted of the Lachancea kluyveri Δ12-desaturase, Pichia pastoris Δ15/ω3-desaturase, Micromonas pusilla Δ6-desaturase, Pyramimonas cordata Δ6- and Δ5-elongases, and Pavlova salina Δ5- and Δ4-desaturases. The microalgal genes in this combination (that is, all except the two yeast desaturases) had previously resulted in efficient conversion of native plant substrate fatty acids to DHA (up to 15.9% in leaf TAG) in the transient Nicotiana benthamiana assay system [11]. In pJP3416_GA7, these genes were expressed by the Brassicaceae-active seed-specific promoters A. thaliana FAE1, Linum usitatissimum conlinin1 (Cnl1) and conlinin2 (Cnl2), and the truncated Brassica napus napin promoter (FP1), with the tobacco mosaic virus 5′ untranslated enhancer leader sequence upstream of each fatty acid biosynthesis gene.
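The main ω3 route through the seven transgenic steps can be traced with a short Python sketch. It represents each fatty acid only by chain length, number of double bonds and ω family, and applies three simplified rules: elongation adds two carbons; Δ6-, Δ5- and Δ4-desaturation add a double bond without changing the ω family; and Δ12- and Δ15-desaturation set the ω number to chain length minus the Δ position. This representation is our own simplification for illustration. It ignores the ω6 branch, the broader substrate range of the ω3-desaturase, and which promoter drives each gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FattyAcid:
    carbons: int
    double_bonds: int
    omega: int

    def __str__(self) -> str:
        return f"{self.carbons}:{self.double_bonds}ω{self.omega}"


def apply_step(fa: FattyAcid, kind: str, delta: int = 0) -> FattyAcid:
    if kind == "elongase":
        # Two carbons are added at the carboxyl end; the ω position is unchanged.
        return FattyAcid(fa.carbons + 2, fa.double_bonds, fa.omega)
    if kind == "front_end_desaturase":
        # Δ6/Δ5/Δ4: new double bond on the carboxyl side, ω family unchanged.
        return FattyAcid(fa.carbons, fa.double_bonds + 1, fa.omega)
    if kind == "methyl_end_desaturase":
        # Δ12/Δ15 on a C18 chain: new ω number = carbons - delta (18 - 15 = 3).
        return FattyAcid(fa.carbons, fa.double_bonds + 1, fa.carbons - delta)
    raise ValueError(f"unknown step kind: {kind}")


# Step order and enzyme sources as listed in the text (main ω3 route only).
PATHWAY = [
    ("L. kluyveri Δ12-desaturase", "methyl_end_desaturase", 12),    # OA -> LA
    ("P. pastoris Δ15/ω3-desaturase", "methyl_end_desaturase", 15), # LA -> ALA
    ("M. pusilla Δ6-desaturase", "front_end_desaturase", 6),        # ALA -> SDA
    ("P. cordata Δ6-elongase", "elongase", 0),                      # SDA -> ETA
    ("P. salina Δ5-desaturase", "front_end_desaturase", 5),         # ETA -> EPA
    ("P. cordata Δ5-elongase", "elongase", 0),                      # EPA -> DPA
    ("P. salina Δ4-desaturase", "front_end_desaturase", 4),         # DPA -> DHA
]


def run_pathway(start: FattyAcid) -> list[str]:
    """Apply every step in order and record each intermediate."""
    trace = [str(start)]
    fa = start
    for enzyme, kind, delta in PATHWAY:
        fa = apply_step(fa, kind, delta)
        trace.append(f"{fa} ({enzyme})")
    return trace


for line in run_pathway(FattyAcid(18, 1, 9)):  # start from oleic acid, 18:1ω9
    print(line)
```

Running it walks 18:1ω9 through LA, ALA, SDA, ETA, EPA and DPA to 22:6ω3. This makes it visible why the Δ5-elongase step (EPA to DPA) is the only route to a C22 product.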
The construct was transformed into A. thaliana ecotype Columbia and a fad2 mutant (see Table 1 for the seed fatty acid profiles of the parental lines). T1 seeds from dipped plants were selected for phosphinothricin (PPT) resistance, and T2 seed from surviving plants was harvested and analysed for fatty acid composition (Figure 3, Table 1). Several lines, indicated by brackets in Figure 3, were again selected for PPT resistance, and resistant seedlings for each line were transferred to soil. T3 seeds from these plants were harvested and their seed oil-derived fatty acids were analysed by GC (Table 1). This revealed that pJP3416_GA7 generated significant levels of ω3 LC-PUFA in seed oil. Up to 13.9% DHA was observed in the best T3 event (Columbia#22), with a total of 24.3% new ω3 fatty acids. Similarly, the best event in the fad2 mutant background yielded 20.6% total new ω3 fatty acids, including 11.5% DHA (Table 1). In contrast, new transgenic ω6 fatty acids were present at very low relative levels (Table 1).
Seeds from the Columbia#22 line were sown directly in soil. Southern blot analysis of pooled material from this generation indicated that the line carried three copies of the pJP3416_GA7 construct (data not shown). Seeds from the mature plants were harvested and analysed. Some variation in DHA content was observed in this generation (13.3% ± 1.6, Table 1), suggesting that this three-copy event was not yet homozygous, although no variation in germination rate or seedling establishment was observed. Consistent with this, the DHA level in the best line increased further to 15.1%, with fatty acid profiles otherwise largely similar to those of the T3 generation (Table 1). Real-time quantitative PCR was performed on cDNA synthesised from total RNA isolated from developing T4 embryos, using an endogenous fatty acid biosynthesis gene, β-ketoacyl-acyl carrier protein synthase II (KASII), as the reference gene (Figure 4). The Δ6-desaturase and Δ6-elongase were expressed relatively poorly compared with the other transgenes. 13C NMR regiospecificity analysis was performed on DHA-containing A. thaliana seed oil to determine the positional distribution of DHA on the TAG. We found that the sn-1/3 positions of TAG were significantly enriched for DHA (Figure 5), with little DHA observed at the sn-2 position. This is notable because enrichment at the sn-2 position was recently observed for engineered arachidonic acid (a C20 ω6 LC-PUFA) in Brassica napus [14]. It will be important to examine the positional distribution of DHA in other engineered species. Finally, total lipid was also analysed by triple quadrupole LC-MS to determine the major DHA-containing triacylglycerol (TAG) species (Figure 6). The most abundant DHA-containing TAG species was DHA-18:3-18:3 (nomenclature not descriptive of positional distribution), followed by DHA-18:3-18:2. Tri-DHA TAG was also observed in total seed oil, albeit at low levels.
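Statements such as "enriched at sn-1/3" can be made quantitative by comparing the measured sn-2 share of a fatty acid with the one-third expected if it were distributed randomly over the three TAG positions. The sketch below assumes that the sn-1/3 and sn-2 carbonyl resonances for DHA are fully resolved and integrated with equal response. The function names and integral values are hypothetical placeholders, not measurements from this study.

```python
def sn2_share(sn2_integral: float, sn13_integral: float) -> float:
    """Fraction of one fatty acid's acyl chains esterified at the sn-2 position.

    In 13C NMR of TAG, the sn-1 and sn-3 positions give a single combined
    signal, while sn-2 gives a separate signal for the same carbon.
    """
    total = sn2_integral + sn13_integral
    if total <= 0:
        raise ValueError("integrals must sum to a positive value")
    return sn2_integral / total


# With a random positional distribution, one of the three TAG positions is sn-2.
RANDOM_SN2_SHARE = 1 / 3


def sn2_enrichment(sn2_integral: float, sn13_integral: float) -> float:
    """Ratio to random: >1 means enriched at sn-2, <1 means enriched at sn-1/3."""
    return sn2_share(sn2_integral, sn13_integral) / RANDOM_SN2_SHARE


# Placeholder integrals (hypothetical, not values from Figure 5)
print(f"sn-2 share: {sn2_share(0.1, 0.9):.2f}")
print(f"enrichment vs random: {sn2_enrichment(0.1, 0.9):.2f}")
```

A DHA enrichment value well below 1 would correspond to the sn-1/3 preference described above. The tuna oil comparator mentioned in the Methods represents the opposite (sn-2) case.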
The identities of the two major DHA-containing TAG species (DHA-18:3-18:3 and DHA-18:3-18:2) were further confirmed by Q-TOF MS/MS (data not shown).
pJP3416_GA7 was designed to meet multiple functional objectives. First, we aimed to produce DHA to the exclusion of intermediate and ω6 fatty acids, which required adequate seed expression of all transgenes. The promoters used in this construct had either been described previously or tested in our laboratory for strong seed-specific function in A. thaliana. Second, intergenerational stability without gene silencing was required, so we avoided a simple linear design [9] with identical promoters and polyadenylation regions for each expression cassette. Instead, Rb7 matrix attachment regions (MARs) from Nicotiana tabacum were used as spacer DNA to separate the inverted three-cassette segments, an arrangement that had previously been found to operate effectively (data not shown). Third, correct expression timing was important: the promoters expressing the first part of the pathway should not become active before those expressing subsequent genes. For example, we had found that the FAE1 promoter tended to be active early in seed development relative to FP1, Cnl1 and Cnl2, and therefore did not use it to express any gene whose substrate is a native plant fatty acid. This was intended to avoid the accumulation of intermediates, since transgenic fatty acids would only be produced once the next enzyme in the pathway was already expressed.
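The expression-timing rule can be written as a simple check over consecutive pathway steps: no gene should switch on earlier than the gene immediately downstream of it. Because the rule holds transitively, checking adjacent pairs is sufficient. In this sketch only the qualitative fact that FAE1 is active earlier than FP1, Cnl1 and Cnl2 comes from the text. The numeric onset ranks, the function name `timing_violations` and the example promoter assignment are hypothetical, and the assignment is not the pJP3416_GA7 layout.

```python
# Hypothetical relative activation order during seed development (lower =
# earlier). Only "FAE1 is active earlier than FP1, Cnl1 and Cnl2" comes from
# the text; the numbers themselves are placeholders.
PROMOTER_ONSET = {"FAE1": 0, "FP1": 1, "Cnl1": 1, "Cnl2": 1}


def timing_violations(
    ordered_genes: list[tuple[str, str]], onset: dict[str, int]
) -> list[tuple[str, str]]:
    """Return (upstream, downstream) gene pairs that break the timing rule.

    ordered_genes lists (gene, promoter) in pathway order. A pair is flagged
    when the upstream gene's promoter switches on earlier than the downstream
    gene's promoter, so its product could accumulate before it is converted.
    """
    violations = []
    for (up_gene, up_promoter), (down_gene, down_promoter) in zip(
        ordered_genes, ordered_genes[1:]
    ):
        if onset[up_promoter] < onset[down_promoter]:
            violations.append((up_gene, down_gene))
    return violations


# Hypothetical assignment chosen only to show a violation: an early promoter
# driving the first step, whose substrate (OA) is native to the seed.
example = [
    ("Δ12-desaturase", "FAE1"),
    ("Δ15/ω3-desaturase", "FP1"),
    ("Δ6-desaturase", "Cnl1"),
]
print(timing_violations(example, PROMOTER_ONSET))
```

The flagged pair illustrates the situation the design avoided. Placing FAE1 on a step that acts on a native substrate would let that step's product build up before the next enzyme is present.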
Using A. thaliana ecotype Columbia and a fad2 mutant provided a useful contrast between DHA production in seed naturally high in the polyunsaturated fatty acids linoleic acid (LA) and ALA, and in seed that contained little of these fatty acids but was rich in the precursor OA. One of the key challenges of LC-PUFA engineering in plant seeds is the loss of pathway intermediates from the metabolically active acyl-PC and acyl-CoA pools to TAG before they can be further elongated or desaturated [10]. Earlier studies had demonstrated that ALA in particular was susceptible to accumulation in TAG, so we tested whether transgenic production of LA and ALA in the fad2 mutant would make these substrates more accessible to the Δ6-desaturase. However, pJP3416_GA7 resulted in good DHA production in both backgrounds (Table 1). Although the highest DHA line described above was in the Columbia background, the averages of the selected T3 populations were roughly equal, at 9.7% DHA in the Columbia background and 9.9% in the fad2 mutant. This is likely due to the high activity of the L. kluyveri Δ12-desaturase, which actually yielded slightly higher Δ12-desaturation in the best fad2 mutant event than in the best Columbia event (Table 1). This indicates that native FAD2 activity may not be the most important factor in crop selection, although the ratio between OA and polyunsaturates can indicate other important biochemical differences in a seed.
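Step efficiencies such as Δ12-desaturation are commonly expressed as the proportion of a step's direct product plus everything made from it, relative to substrate plus those products. We assume that convention here because the exact formula behind Table 1 is not given in this section. The function name and input values below are hypothetical placeholders, not data from Table 1.

```python
def conversion_efficiency(substrate: float, downstream: list[float]) -> float:
    """Percent conversion for one pathway step.

    substrate: level of the step's substrate (% of total fatty acids).
    downstream: levels of the step's direct product and of every fatty acid
        made from that product further along the pathway.
    """
    converted = sum(downstream)
    total = substrate + converted
    if total <= 0:
        raise ValueError("substrate and downstream levels sum to zero")
    return 100.0 * converted / total


# Placeholder values (% of total fatty acids), not data from Table 1. For
# Δ12-desaturation, OA is the substrate and LA, ALA and all LC-PUFA derived
# from them are summed as downstream products.
print(conversion_efficiency(20.0, [30.0, 50.0]))
```

The same function applies to each later step by shifting the substrate. For example, ALA is the substrate for Δ6-desaturation, with SDA and everything downstream counted as converted.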
Several characteristics of this pathway are notable. For instance, the ratio of ω3 to ω6 fatty acids was 8/1 when the native substrates LA and ALA were included, or 16/1 when they were excluded. This was likely due both to the ω3 preference of the M. pusilla Δ6-desaturase [12] and to the presence of the broad-specificity P. pastoris ω3-desaturase. This is an extremely desirable quality for an ω3 oil because of the opposing, pro-inflammatory effects of ω6 fatty acids [1]. The low level of intermediate fatty acids in the seed oil is also worth noting. Previous attempts to produce DHA have resulted in the accumulation of high levels of EPA but relatively little conversion to DPA and DHA. Independent pJP3416_GA7 events had consistently high Δ5-elongation and subsequent Δ4-desaturation, indicating that the EPA-to-DPA conversion hurdle had been overcome in these lines. The Δ6-desaturase, by contrast, had consistently low activity relative to the other transgenes. The M. pusilla enzyme is likely an acyl-CoA desaturase and performs well alongside comparable enzymes in yeast and other assay systems [12], but has never exceeded approximately 50% ALA conversion in plant systems. Other groups have observed similar results, and it has been proposed that ALA conversion by acyl-CoA desaturases can be limited in certain plant species [13]. The Δ6-desaturase was also expressed relatively poorly, which likely contributed to its relatively low activity.
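The two ω3/ω6 ratios quoted above differ only in whether the native substrates LA and ALA are counted. A small sketch makes that switch explicit. The classification of "new" fatty acids is an assumed, non-exhaustive list. The function name `omega3_to_omega6` and the example profile are hypothetical and are not a line from Table 1.

```python
# ω family of each counted fatty acid; True marks the native substrates
# (LA, ALA) that are included or excluded. The list of new fatty acids is an
# assumed, non-exhaustive classification.
OMEGA3 = {"ALA": True, "SDA": False, "ETA": False, "EPA": False, "DPA": False, "DHA": False}
OMEGA6 = {"LA": True, "GLA": False, "DGLA": False, "ARA": False}


def omega3_to_omega6(profile: dict[str, float], include_native: bool = True) -> float:
    """ω3/ω6 ratio from a fatty acid profile given as % of total fatty acids."""

    def family_total(family: dict[str, bool]) -> float:
        # Fatty acids missing from the profile count as zero.
        return sum(
            profile.get(fa, 0.0)
            for fa, is_native in family.items()
            if include_native or not is_native
        )

    omega6 = family_total(OMEGA6)
    if omega6 == 0:
        raise ZeroDivisionError("no ω6 fatty acids were counted")
    return family_total(OMEGA3) / omega6


# Placeholder profile (hypothetical values)
profile = {"LA": 6.0, "ALA": 12.0, "EPA": 2.0, "DHA": 10.0, "GLA": 1.0}
print(round(omega3_to_omega6(profile), 2))                        # LA and ALA included
print(round(omega3_to_omega6(profile, include_native=False), 2))  # new fatty acids only
```

Excluding LA and ALA removes most of the ω6 denominator, which is why the "new fatty acids only" ratio is higher whenever transgenic ω6 products are scarce.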
The pJP3416_GA7 vector had several limitations that require further work to fully understand and overcome. Real-time PCR showed that expression levels of the individual transgenes varied considerably, with the Δ6-desaturase and Δ6-elongase having the lowest relative expression. These genes also had the lowest apparent conversion efficiencies, resulting in the accumulation of significant ALA and stearidonic acid (SDA). This is a likely cause of the gene dosage effect whereby a triple-copy event yielded the highest DHA accumulation. Future work will focus on improving construct design to achieve stronger gene expression and so reduce this gene dosage effect. It will also be important to examine the lipidome of developing transgenic DHA seeds to better understand how the pathway operates across the seed lipid pools. Given the high efficiency of the pathway in converting eicosatetraenoic acid (ETA, 20:4ω3) through to DHA, it is tempting to speculate that the P. salina Δ5- and Δ4-desaturases are able to use acyl-CoA substrates; further work is required to confirm this. Finally, the impact of underlying plant seed biochemistry on DHA production can be explored by transforming this pathway into other species.
In conclusion, the production of high levels of DHA has been a major goal of the metabolic engineering community. This study achieved the accumulation of up to 15% DHA in a land plant seed oil, a level that exceeds the 12% generally found in commodity bulk fish oil. A high ω3/ω6 ratio was also observed. We look forward to the application of this technology in crop species: 1 hectare of a Brassica napus crop containing 12% DHA in seed oil would produce as much DHA as approximately 10,000 fish. (This is a simplified calculation based on 10,000 kg fish = 1,000 kg oil = 120 kg DHA, assuming an average fish mass of 1 kg, a fish oil yield of 10% by mass and an average DHA content of 12%. For smaller or less oily fish, the number of equivalent fish increases, and for larger fish it decreases. Similarly, 1 ha B. napus = 2.5 t seed = 1,000 kg oil = 120 kg DHA, which corresponds to a seed oil content of 40%.)
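The fish-equivalence estimate can be reproduced with a few lines of Python. The default parameters are the assumptions stated in the text: 2.5 t seed per hectare, 40% oil (implied by 2.5 t seed giving 1,000 kg oil), 12% DHA in both oils, 1 kg fish and a 10% fish oil yield. The function name and keyword arguments are our own, chosen so that the sensitivity to fish size mentioned in the text can be explored.

```python
def fish_equivalent(
    seed_t_per_ha: float = 2.5,
    seed_oil_fraction: float = 0.40,
    seed_oil_dha: float = 0.12,
    fish_mass_kg: float = 1.0,
    fish_oil_yield: float = 0.10,
    fish_oil_dha: float = 0.12,
) -> float:
    """Number of fish supplying as much DHA as one hectare of B. napus."""
    dha_per_ha_kg = seed_t_per_ha * 1000 * seed_oil_fraction * seed_oil_dha  # 120 kg with defaults
    dha_per_fish_kg = fish_mass_kg * fish_oil_yield * fish_oil_dha          # 0.012 kg with defaults
    return dha_per_ha_kg / dha_per_fish_kg


print(f"{fish_equivalent():,.0f} fish per hectare")
print(f"{fish_equivalent(fish_mass_kg=0.5):,.0f} fish per hectare (0.5 kg fish)")
```

Halving the fish mass doubles the number of equivalent fish, matching the qualitative caveat in the text.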

Binary Vector Construction and A. thaliana Transformation
Fatty acid biosynthesis gene sequences were sourced from yeast and microalgae. The L. kluyveri Δ12-desaturase (GenBank accession BAD08375) was identified by BLAST using known Δ12-desaturase sequences as queries, whilst the other yeast gene, the P. pastoris Δ15/ω3-desaturase, had previously been characterised as having broad ω3-specificity [15]. The microalgal genes had also been described previously and tested in the transient N. benthamiana leaf assay system under the control of both constitutive and seed-specific promoters, as well as in stable seed expression [9][11][12]. The seed-specific promoters used in this study had all been described previously: A. thaliana FAE1 [16], L. usitatissimum Cnl1 and Cnl2 [17][18], and the truncated B. napus napin promoter [19]. The vector was constructed by synthesising (Geneart, Regensburg, Germany) the seven fatty acid biosynthesis expression cassettes, together with MAR spacers [20][21] and tobacco mosaic virus 5′ untranslated enhancer leader sequences [22], as a single 19.75 kb fragment flanked by NotI restriction sites. The fragment was then cloned into the PspOMI site of the binary vector pJP3416. pJP3416 contained the constitutively expressed Streptomyces viridochromogenes phosphinothricin-N-acetyltransferase gene to confer phosphinothricin (PPT) resistance. A. thaliana ecotype Columbia and a fad2 mutant [23] were transformed by Agrobacterium-mediated floral dip [24], and seeds were selected for PPT resistance by germination and establishment on MS media plates containing 3.5 mg/L PPT. DNA was extracted and Southern blots were performed according to established protocols [25]. Total RNA was extracted using an RNeasy Mini Kit (QIAGEN, Doncaster, VIC, Australia), and RT-PCR was performed using a OneStep RT-PCR Kit (QIAGEN).
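A NotI-flanked fragment can be ligated into a PspOMI-cut vector because both enzymes leave the same 4-nt 5′ overhang, GGCC, which is self-complementary. The sketch below derives the overhangs from the standard recognition sites (NotI GC^GGCCGC, PspOMI G^GGCCC) and shows the hybrid junction sequences that result. The function names are hypothetical, only the site-derived bases are modelled, and this illustrates the sticky-end logic rather than describing the authors' cloning procedure.

```python
# Recognition sequence (top strand, 5'->3') and top-strand cut position.
# Both sites are palindromic, so the bottom strand is cut at len(site) - cut.
ENZYMES = {
    "NotI": ("GCGGCCGC", 2),  # GC^GGCCGC
    "PspOMI": ("GGGCCC", 1),  # G^GGCCC
}


def five_prime_overhang(enzyme: str) -> str:
    """Single-stranded 5' overhang left by a palindromic site."""
    site, cut = ENZYMES[enzyme]
    bottom_cut = len(site) - cut
    if cut >= bottom_cut:
        raise ValueError(f"{enzyme} does not leave a 5' overhang")
    return site[cut:bottom_cut]


def ligation_junctions(vector_enzyme: str, insert_enzyme: str) -> tuple[str, str]:
    """Top-strand sequence at the vector|insert and insert|vector junctions.

    Only the site-derived bases are modelled; flanking sequence is ignored.
    """
    v_site, v_cut = ENZYMES[vector_enzyme]
    i_site, i_cut = ENZYMES[insert_enzyme]
    left = v_site[:v_cut] + i_site[i_cut:]   # vector arm, then insert
    right = i_site[:i_cut] + v_site[v_cut:]  # insert, then vector arm
    return left, right


print(five_prime_overhang("NotI"), five_prime_overhang("PspOMI"))
for junction in ligation_junctions("PspOMI", "NotI"):
    regenerated = [name for name, (site, _) in ENZYMES.items() if site in junction]
    print(junction, regenerated or "no site regenerated")
```

Within the modelled bases, neither junction recreates a NotI or PspOMI site.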

Fatty Acid Profile Analysis
Fatty acid profiles were determined on batches of approximately 200 seeds. Fatty acid methyl esters (FAME) were prepared by incubating seeds in 1 N methanolic HCl at 80 °C for 2 h in a glass tube fitted with a Teflon-lined screw cap. The FAME were extracted in hexane before analysis by gas chromatography (GC) using an Agilent Technologies 7890A GC (Palo Alto, California, USA), essentially as described in [26] but equipped with a 30 m SGE BPX70 column. Peaks were quantified with Agilent Technologies ChemStation software (Rev B.04.03 (16), Palo Alto, California, USA) based on the response of a known amount of the external standard GLC-411 (Nu-Chek Prep, Elysian, MN, USA). Selected samples were also analysed on a non-polar GC column and by GC-MS to further confirm fatty acid identification and component quantification. GC was performed using an Agilent Technologies 6890N GC (Palo Alto, California, USA) equipped with a non-polar Equity™-1 fused silica capillary column (15 m × 0.1 mm i.d., 0.1 µm film thickness), a flame ionisation detector (FID), a split/splitless injector and an Agilent Technologies 7683 Series autosampler. Helium was the carrier gas. Samples were injected in splitless mode at an oven temperature of 120 °C. After injection, the oven temperature was raised to 270 °C at 10 °C min⁻¹ and finally to 300 °C at 5 °C min⁻¹. Peaks were quantified with Agilent Technologies ChemStation software. Individual component identifications were confirmed by mass spectral data and by comparing retention time data with authentic and laboratory standards. GC-mass spectrometric (GC-MS) analyses were performed on a Finnigan Thermoquest GCQ GC-MS fitted with an on-column injector and using Thermoquest Xcalibur software (Austin, Texas, USA). The GC was equipped with an HP-5 cross-linked methyl silicone fused silica capillary column (50 m × 0.32 mm i.d.) of similar polarity to that described above. Helium was used as the carrier gas, with operating conditions as previously described [27].
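Quantification against a known amount of an external standard amounts to a single-point calibration. Each sample peak area is scaled by the amount-per-area response of the same component in the standard run, and the results are then normalised to percent of total fatty acids. The sketch below assumes every sample component has a matching component in the standard. The function name, peak areas and standard amounts are hypothetical and are not GLC-411 values or ChemStation output.

```python
def quantify_fame(
    sample_areas: dict[str, float],
    standard_areas: dict[str, float],
    standard_amounts_ug: dict[str, float],
) -> tuple[dict[str, float], dict[str, float]]:
    """Single-point external-standard quantification of FAME peaks.

    Each sample peak is scaled by the response (amount / area) of the same
    component in the standard run, then expressed as % of total fatty acids.
    Raises KeyError if a sample component is missing from the standard.
    """
    amounts = {
        fame: area * standard_amounts_ug[fame] / standard_areas[fame]
        for fame, area in sample_areas.items()
    }
    total = sum(amounts.values())
    percent = {fame: 100.0 * amount / total for fame, amount in amounts.items()}
    return amounts, percent


# Placeholder peak areas and standard amounts (hypothetical values)
amounts, percent = quantify_fame(
    sample_areas={"16:0": 1000.0, "22:6ω3": 600.0},
    standard_areas={"16:0": 2000.0, "22:6ω3": 1500.0},
    standard_amounts_ug={"16:0": 10.0, "22:6ω3": 10.0},
)
print(amounts)
print({fame: round(p, 1) for fame, p in percent.items()})
```

Using per-component responses corrects for differences in detector response between fatty acids, which a plain area-percent calculation would ignore.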

Lipid Extraction and 13C NMR Regiospecificity Analysis
Total lipid was extracted by first crushing seeds under hexane and then transferring them to a glass tube containing 10 mL hexane. The tube was warmed at approximately 55 °C in a water bath, vortexed and centrifuged. The hexane solution was removed, and the procedure was repeated with a further 4 × 10 mL hexane. The extracts were combined and concentrated by rotary evaporation, and TAG was purified by elution through a short silica column using 20 mL of 7% diethyl ether in hexane. Acyl group positional distributions on TAG were determined as previously described [12], using tuna oil as an sn-2 DHA comparator.

Real-Time Quantitative PCR
Gene expression analysis was performed using a Bio-Rad CFX96™ Real-Time PCR system (Bio-Rad, Hercules, CA, USA). Gene-specific primers were designed to have similar melting temperatures and to amplify ~200 bp fragments. PCR reactions were performed in triplicate in 10 µL volumes consisting of iQ™ SYBR Green Supermix (Bio-Rad), 5 µM each primer and 400 ng cDNA. The cycling conditions were 1 cycle of 95 °C for 3 min, followed by 40 cycles of 95 °C for 10 s, 60 °C for 30 s and 68 °C for 30 s. The endogenous fatty acid biosynthesis gene β-ketoacyl-acyl carrier protein synthase II (KASII) was used as the reference gene, and the expression of each gene was calculated relative to it using the 2^−ΔΔCt method.
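The comparative Ct (2^−ΔΔCt) calculation normalises each target's Ct to the reference gene (here KASII) and then to a calibrator sample. The sketch assumes close to 100% amplification efficiency for target and reference, and averages the technical triplicates before taking differences. Which transgene or sample served as the calibrator is not stated in this section, so the calibrator inputs, the function name and all Ct values below are hypothetical placeholders.

```python
from statistics import mean


def relative_expression(
    target_ct: list[float],
    reference_ct: list[float],
    calibrator_target_ct: list[float],
    calibrator_reference_ct: list[float],
) -> float:
    """Comparative Ct (2^-ΔΔCt) estimate of relative expression.

    Each argument holds technical-replicate Ct values (triplicates here).
    Assumes close to 100% amplification efficiency for target and reference.
    """
    delta_ct_sample = mean(target_ct) - mean(reference_ct)
    delta_ct_calibrator = mean(calibrator_target_ct) - mean(calibrator_reference_ct)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Placeholder Ct values. KASII is the reference gene as in the text; the
# transgene and calibrator values are invented for illustration.
kasii = [20.0, 20.1, 19.9]
transgene = [24.0, 24.1, 23.9]
calibrator_transgene = [26.0, 26.1, 25.9]
print(relative_expression(transgene, kasii, calibrator_transgene, kasii))
```

A Ct two cycles lower than the calibrator (after KASII normalisation) corresponds to roughly fourfold higher expression, because each cycle ideally doubles the product.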

TAG Species Analysis with LC-MS
Total lipids were extracted from freeze-dried developing (12 days after flowering) and mature seeds, with tri-C17:0 TAG as the internal standard. The extracted lipids were dissolved in 1 mL of 10 mM butylated hydroxytoluene in butanol:methanol (1:1, v/v) per 5 mg dry material and analysed using an Agilent 1200 series LC and a 6410B electrospray ionisation triple quadrupole LC-MS. Lipids were chromatographically separated using an Ascentis Express RP-Amide column (50 mm × 2.1 mm, 2.7 µm, Supelco) and a binary gradient with a flow rate of 0.