Covering Chemical Diversity of Genetically-Modified Tomatoes Using Metabolomics for Objective Substantial Equivalence Assessment

As metabolomics can provide a biochemical snapshot of an organism's phenotype it is a promising approach for charting the unintended effects of genetic modification. A critical obstacle for this application is the inherently limited metabolomic coverage of any single analytical platform. We propose using multiple analytical platforms for the direct acquisition of an interpretable data set of estimable chemical diversity. As an example, we report an application of our multi-platform approach that assesses the substantial equivalence of tomatoes over-expressing the taste-modifying protein miraculin. In combination, the chosen platforms detected compounds that represent 86% of the estimated chemical diversity of the metabolites listed in the LycoCyc database. Following a proof-of-safety approach, we show that % had an acceptable range of variation while simultaneously indicating a reproducible transformation-related metabolic signature. We conclude that multi-platform metabolomics is an approach that is both sensitive and robust and that it constitutes a good starting point for characterizing genetically modified organisms.


Chemical diversity
The coverage of the chemical diversity of the tomato metabolome was estimated by fetching the physicochemical properties predicted by ACD/Labs software at the ChemSpider database for both detected metabolites and all metabolites mentioned in the LycoCyc database. All 22 properties were downloaded where available but polarizability, LogD at pH 5.5, KOC at pH 5.5 and BCF at pH 5.5 was dropped due to redundancy with other features (correlation ρ > 0.99).

Proof of safety
Proof of safety analysis was performed by adapting the method described by [2] section 2.2 to the metabolomics setting. Two one sided two sample Student's t-tests (with correction for non-equal variances) are used to test the null-hypotheses: H − 0 : µ Transgene − µ C ≤ δ − where δ + and δ − are the acceptable deviations from the control (C). Proof of safety is declared if we can reject both H + 0 and H − 0 at the chosen condence level. For quantitative assays of known compounds δ − and δ + can be estimated using predened limits dervied from e.g. toxicological studies [2]. For untargeted metabolomics where we also prole unknown compounds, such limits are not available. Assuming that the traditional cultivars deviate from the control cultivar in a safe manner, we estimate acceptable deviation by the upper and lower limits of the t-distribution based 90% condence intervals, CI 0.90 , for µ CV,i − µ C for the cultivar (CV) furthest away from the control (C): [CI 0.90,upper (µ CV,i − µ C ); −CI 0.90,lower (µ CV,i − µ C )]) .
Symmetric boundaries were used as described in [2,3] setting δ − = −δ + . This denition estimates the range of the distances between the traditional cultivars and the designated control cultivar.
An alternative denition for the acceptable deviations with assymmetric thresholds was also used and reported in Supporting Figure  This is a more stringent denition where we only accept actually observed deviances from the control (and the range of metabolite levels observed in the control itself). To evaluate the eect of relaxing this denition we also report the results of allowing 10 and 20% deviations from δ − and δ + . The threshold of 20% is the deviation indicated as acceptable by the Nordic Council of Ministers [2]. Both these tests stands in contrast with the often used null hypothesis which tests the evidence for dierence, i.e. the proof of hazard.
In semi-quantitative measurements we can not easily ensure that the measurements are well correlated with the true concentration dierences. Reasoning that all metabolite levels show biological variance across the examined conditions, we test if the assay can capture the biological variation by examining if we can reject where B is the regression coecient matrix for the experiment design, X, onto the metabolite levels Y ; Y = XB + E using the classical ANOVA F-test. Outlier data were suppressed prior to testing H Noise 0 by Winsorizing datum more than three standard deviations away from the median to the maximum or minimum of the measurements within the same limit.

Multivariate statistics
OPLS-DA was done as described in ref. [4]. Briey, OPLS-DA extracts a set of components, meta features, that describe the class related variance in X (metabolite matrix with samples as rows and metabolites as columns). These components are given by the matrix T which are discriminate the sought classes well (high correlation with at least one column in the response matrix Y ). Another set of components are also calculated that describe as much of the class unrelated variance as possible: the T O matrix. X is thus decomposed into three separate matrices as: were W and P O are the weights vector matrices dening the relationship between each variable in X and T and T O . E is the residual matrix: E = X − T W + T O P O . The response matrix Y is a dummy matrix with one column per outcome class with entries corresponding to one for class membership and zero otherwise.
The number of components, n A and n A0 (columns of T and T O ), the model complexity, are user-dened parameters of the model which we estimate by maximizing the prediction accuracy, a, ((T P + T N )/(T P + T N + F P + F N ) [true positives (TN), true negatives (TP), false positives (FP), false negatives (FN)] during ve-fold cross-validation (CV). All CV segments were balanced to contain similar number of members from each class.
Resampling test for signicance of prediction was done by shuing the rows of the metabolite matrix and calculating accuracy during CV 300 times. The signicance p-value then equals the number of equal or better accuracies with shued data compared to non-shued divided by number of repeats. A total of ve dierent OPLS-DA models were estimated in this study, see Supporting Table S2 for a listing. PCA was performed using the pcaMethods package [5]. Both PCA and OPLS-DA was performed on unit-variance scaled metabolite matrices to make abundances comparable across metabolites and dierent platforms.

Univariate testing of metabolites for dierential abundances
In the HC experiment we used the following linear model to test for signicant dierences between transgenic and control lines for each metabolite m: where I is the intercept, g is a coecient describing the genotype (control or transgenic), s describes the ripening stage (red or green), g : s is the interaction term and e the residual. The t-statistic associated with H 0 : g = 0 was used to test for signicant genotype eect (results reported in Supporting Data 3). For the overview Figure in 4c we used: where c indicate the cultivars and x indicates if the sample is coming from a transgene.
In the soil experiment we used the following linear model for each metabolite m: where t indicates the watering treatment (more water or less water), h is the harvesting index (numbering 1 for the sample harvested rst, 2 for the second and so on), r indicate the truss number of sample and g : t the interaction eect between genotype and treatment, results reported in Supporting Data 4 and Supporting Figure S 5b. Acceptable normality of the distributions of the residuals,e, in (1) and (2)

Amount
The harvested fruits were chopped and 1 g fresh weight (FW) of the pieces was put in a 2-ml tube with a 5 mm of Zirconia bead to be used for metabolomics proling and 3 g was saved for ELISA and qRT-PCR assays.
GC-MS an equivalent of 0.6 µg and 6 µg of the derivatized samples were injected.
LC-MS an equivalence of 125 µg was injected.
CE-MS an equivalence of 14 µg was injected.

Growth condition
Seedlings of Solanum lycopersicum were potted in 1/2000 a Wagner pot containing compost soil (Kureha, Tokyo, Japan) for the soil experiment. Seeds were sown in 5 cm Ö 5 cm Ö 5 cm (height Ö length Ö width) rockwool cubes and grown in a hydroponics system (565 mg l −1 NO − 3 , 15.7 mg l −1 NH + 4 , 202.2 mg l −1 PO − 3 , 218.4 mg l −1 K + , 19.9 mg l −1 Mg + 2 , 95.0 mg l −1 Ca + 2 and micronutrients) in an environmentally controlled growth room at 25°C/20°C (light/dark) and 600 ppm CO 2 concentration with a light/dark cycle of 16 h/8 h for the hydroponic culture (HC) experiment. Seedlings were placed in a netted-greenhouse located at the Gene Research Center in University of Tsukuba.

Experimental condition
Same as growth conditions.

Sampling and sampling date
The fruits were harvested in spring (a pilot and HC experiments) and late summer (the soil experiment) in 2006, 2008, and 2009.

Metabolism quenching method
All samples were frozen within 30 s after sampling in liquid nitrogen. The frozen samples were lyophilized.

Chemical analysis meta data 2.2.1 Sample processing and extraction
The lyophilized sample in a 2 ml tube was frozen and then homogenized with a 5 mm of zirconia bead by a Mixer Mill (Retsch, Haan, Germany) at 20 Hz for 1 min. Five mg dry weight (DW) of the lyophilized samples were weighed for GC-MS and LC-MS analyses, while 25 mg DW of the samples for CE-MS analysis.
Extraction and derivatization for GC-MS: Each sample was extracted with a concentration of 2.5 mg DW of tissues per ml extraction medium (methanol / chloroform/water [3:1:1 v/v/v]) containing 10 stable isotope reference compounds: [ 2 H 6 ]-2-hydoxybenzoic acid and [ 13 C 6 ]-glucose using a Retsch mixer mill MM310 at a frequency of 30 Hz for 3 min at 4°C. Each isotope compound was adjusted to a nal concentration of 15 ng µl −1 for each 1-µl injection. After centrifugation for 5 min at 15,100 Ö g, a 200-µl aliquot of the supernatant was drawn and transferred into a glass insert vial. The extracts were evaporated to dryness in an SPD2010 SpeedVac® concentrator from ThermoSavant (Thermo electron corporation, Waltham, MA, USA). For methoximation, 30 µl of methoxyamine hydrochloride (20 mg/ml in pyridine) was added to the sample. After 24 h of derivatization at room temperature, the sample was trimethylsilylated for 1 h using 30 µl of MSTFA with 1% TMCS at 37°C with shaking. Thirty µl of n-heptane was added following silylation. All the derivatization steps were performed in the vacuum glove box VSC-100 (Sanplatec, Japan) lled with 99.9995% (G3 grade) of dry nitrogen.
For methoximation, 30 µl of methoxyamine hydrochloride (20 mg ml −1 in pyridine) was added to the sample. After 24 h of derivatization at room temperature, the sample was trimethylsilylated for 1 h using 30 µl of MSTFA with 1% TMCS at 37°C with shaking. Thirty µl of n-heptane was added following silylation. All the derivatization steps were performed in the vacuum glove box VSC-100 (Sanplatec, Japan) lled with 99.9995% (G3 grade) of dry nitrogen.
Extraction for LC-MS Five mg DW per 150 µl of extraction medium (methanol/water [2:5 v/v] with reference compounds [0.5 mg l −1 avonol-2'-sulfonic acid and 1.0 mg l −1 ampicilin]) each sample was used for the extraction of plant material using a Retsch mixer mill MM310 at a frequency of 20 Hz for 5 min at 4°C. After centrifugation for 10 min at 15,000 Ö g, the supernatant was transferred into a 2 ml tube. Thirty volumes of methanol were added to the tube and then extracted again using the mixer mill at a frequency of 20 Hz for 5 min at 4°C.
After centrifugation for 10 min at 15,000 Ö g, the resulting supernatant was transferred into the tube. Two hundred-µl aliquot of the extracts was ltered using an Oasis® HLB µ-elusion plate (30 µm, Waters Co., Massachusetts, USA). The extracts were evaporated to dryness in an SPD2010 SpeedVac® concentrator from ThermoSavant (Thermo electron corporation, Waltham, MA, USA). The extracts were dissolved by 160 µl of 20% aqueous methanol containing 0.5 mg l −1 lidocaine and d-camphor sulfonic acid.
Extraction for CE-MS Each sample was extracted in 200 volumes of methanol containing 8µM of two reference compounds (methionine sulfone for cation and camphor 10-sulfonic acid for anion analyses) using a Retsch mixer mill MM310 at a frequency of 27 Hz for 1 min. The extracts were then centrifuged at 20,400 Ö g for 3 min at 4°C. Five hundred-µl aliquot of the supernatant was transferred into a tube. Five hundred µl of chloroform and 200 µl of water was added into the tube to perform liquid-liquid distribution. The upper layer was evaporated for 30 min at 45°C by a centrifugal concentrator to obtain two layers. For removing high-molecular-weight compounds such as oligo-sugars, the upper layer was centrifugally ltered through a Millipore 5-kDa cuto lter at 9,100 g for 120 min at 4°C. The ltrate was dried for 120 min by a centrifugal concentrator. The residue was dissolved into 20 µl of water containing 200 µM of internal standards (3-aminopyrrolidine for cation and trimesic acid for anion analyses) that were used for compensation of migration time in the peak annotation step.

GC-TOF/MS conditions
One microliter of each sample was injected in the splitless mode by an CTC CombiPAL autosampler (CTC analytics, Zwin-gen, Switzerland) into an Agilent 6890N gas chromatograph (Agilent Technologies, Wilmingston, USA) equipped with a 30 m Ö 0.25 mm inner diameter fused-silica capillary column with a chemically bound 0.25-µl lm Rtx-5 Sil MS stationary phase (RESTEK, Bellefonte, USA) for metabolome analysis.
Helium was used as the carrier gas at a constant ow rate of 1 ml min −1 . The temperature program for metabolome analysis started with a 2-min isothermal step at 80 C and this was followed by temperature ramping at 30 C to a nal temperature of 320 C, which was maintained for 3.5 min. The transfer line and the ion source temperatures were 250 and 200°C, respectively. Ions were generated by a 70-eV electron beam at an ionization current of 2.0 mA. The acceleration voltage was turned on after a solvent delay of 273 s. Data acquisition was performed on a Pegasus IV TOF mass spectrometer (LECO, St. Joseph, MI, USA) with an acquisition rate of 30 spectra s −1 in the mass range of a mass-to-charge ratio of m/z = 60800. Alkane standard mixtures (C8C20 and C21C40) were purchased from SigmaAldrich (Tokyo, Japan) and were used for calculating the retention index (RI) [7,8]. The normalized response for the calculation of the signal intensity of each metabolite from the mass-detector response was obtained by each selected ion current that was unique in each metabolite MS spectrum to normalize the peak response. For quality control, we injected methylstearate in every 6 samples. Data was normalized using the CCMN algorithm [9].

CE-TOF/MS conditions CE-TOF MS instruments:
All CE-TOFMS experiments were performed using an Agilent CE capillary electrophoresis system (Agilent Technologies, Waldbronn, Germany), an Agilent G3250AA LC/MSD TOF system (Agilent Technologies, Palo Alto, CA), an Agilent 1100 series binary HPLC pump, and the G1603A Agilent CE-MS adapter and G1607A Agilent CE-ESI-MS sprayer kit. The G2201AA Agilent ChemStation software for CE and the Analyst QS software for TOFMS were used. Separation column and electrolytes: Separations were carried out using a fused silica capillary (50 µm i.d. x 100 cm total length) lled with 1 M formic acid for cation analyses or with 20 mM ammonium formate (pH 10.0) for anion analyses as the electrolyte. The capillary temperature was maintained at 20°C.
Sample injection: The sample solutions were injected at 50 mbar for 15 sec (15 nL). The sample tray was cooled below 4°C. Separation parameters: Prior to each run the capillary was ushed with electrolyte for 5 min. The applied voltage for separation was set at 30 kV. Fifty percent (v/v) methanol/water containing 0.5 µM reserpine was delivered as the sheath liquid at 10 µL/min. Ionization: ESI-TOFMS was conducted in the positive ion mode for cation analyses or in the negative ion mode for anion analyses, and the capillary voltage was set at 4 kV.
Dry gas condition: A ow rate of heated dry nitrogen gas (heater temperature 300°C) was maintained at 10 psig.
Voltage settings in TOF/MS: The fragmentor, skimmer, and Oct RFV voltage were set at 110V, 50V, and 160V for cation analyses or at 120V, 60V, and 220V for anion analyses, respectively.
Mass calibration: Automatic recalibration of each acquired spectrum was performed using Mass data acquirement: Exact mass data were acquired at a rate of 1.5 cycles/sec over a 50-1000 m/z range.
Quality control: In an every single sequence analysis (maximum 36 samples) on our CE-MS system, we analyzed the standard compound mixture at the rst and the end of sample analyses. The detected peak area of standard compound mixture was checked in point of reproducible sensitivity. Standard compound mixture composed of major detectable metabolites including amino acids and organic acids, and this mixture was newly prepared at least once a half year. In all analyses in this study, there were no dierences in the sensitivity of standard compounds mixture.

Data processing GC-MS Nonprocessed MS data from GC-TOF/MS analysis were exported in NetCDF format
generated by chromatography processing and mass spectral deconvolutionsoftware, Leco Chro-maTOF version 3.22 (LECO, St. Joseph, MI, USA) to MATLAB 6.5 (Mathworks, Natick, MA, USA), where all data-pretreatment procedures, such as smoothing, alignment, time-window setting, and H-MCR, were carried out [10]. The resolved MS spectra were matched against reference mass spectra using the NIST mass spectral search program for the NIST/EPA/NIH mass spectral library (version 2.0) and our custom software for peak annotation written in JAVA. Peaks were identied or annotated based on RIs and the reference mass spectra comparison to the Golm Metabolome Database (GMD) released from CSB.DB 1 [11]] and our in-house spectral library. The metabolites were identied by comparison with RIs from the library databases (GMD and our own library) and with those of authentic standards, and the metabolites were dened as annotated metabolites on comparison with mass spectra and RIs from these two libraries.
LC-MS The proling data les recorded in the MassLynx format (raw) were converted to the NetCDF format using the DataBridge function of MassLynx 4.1. From the set of NetCDF data les, the data matrix was generated using the MetAlign software (De Vos et al., 2007). By using this procedure, the data matrixes with unit mass data were generated. The data matrices were processed using in-house software written in Perl/Tk. The original peak intensity values were divided with that of the internal standards (lidocaine at m/z 235 [M + H]+ and ()-camphor-10sulfonic acid at m/z 231 [M H] for the positive and negative ion modes, respectively) determined in the same samples to normalize the peak intensity values among the metabolic prole data. CE-MS An original data le (.wi) was converted to an unique binary le (.ki) using in-house software (nondisclosure). Peak picking and alignment were performed using the another in-house software (nondisclosure), peaks were picked and aligned among samples automatically. By contrast with the detected m/z and migration time values of standard compounds including internal standards, peaks were annotated automatically using the same software. For normalization, the individual area of the detected peaks was divided by the peak area of the internal reference standards. Based on the calibration curves for standard compounds, peak area values were converted into values corresponding to amounts.
[2] Hothorn LA, Oberdoerfer R (2006) Statistical analysis used in the nutritional assessment of novel food using the proof of safety. Regul Toxicol Pharmacol 44: 125135.
[3] Tempelman RJ (2004) Experimental design and statistical methods for classical and bioequivalence hypothesis testing with an application to dairy nutrition studies. J Anim Sci 82 E-Suppl: E162E172.   PC1 is correlated to the development stage. PC2 is of unknown biological, unrelated to all known experimental factors. PC3 is correlated to the genotype. (b) The percentages of the peaks for which non-safety could be rejected at p < 0.05 when using asymmetric thresholds for the acceptable limits of deviation from the control and allowing for 0, 10 and 20% deviation from the upper or lower limit.
Supporting Figure   Shown is the observed and theoretical null-hypothesis quantiles of t-values for the signicance of the correlation between miraculin expression levels (protein and mRNA) and each metabolite peak in 7C in the soil experiment.