Comprehensive untargeted metabolomics of Lychnophorinae subtribe (Asteraceae: Vernonieae) in a phylogenetic context

Members of the subtribe Lychnophorinae occur mostly within the Cerrado domain of the Brazilian Central Plateau. The relationships between its 11 genera, as well as between Lychnophorinae and other subtribes belonging to the tribe Vernonieae, have recently been investigated through a phylogeny based on molecular and morphological data. We report the use of a comprehensive untargeted metabolomics approach, combining UHPLC-MS and GC-MS data, followed by multivariate analyses, to assess the congruence between metabolomics data and the phylogenetic hypothesis, as well as the potential of metabolomics as a chemotaxonomic tool. We analyzed 78 species by UHPLC-MS (in both positive and negative ionization modes) and by GC-MS. The metabolic profiles obtained for these species were processed in MetAlign and MSClust, and the resulting matrices were used in SIMCA for hierarchical cluster analysis, principal component analysis and orthogonal partial least squares discriminant analysis. The results showed that metabolomic analyses are mostly congruent with the phylogenetic hypothesis, especially at lower taxonomic levels (within Lychnophora or Eremanthus). Our results confirm that data generated using metabolomics provide evidence for chemotaxonomical studies, especially for phylogenetic inference of the Lychnophorinae subtribe, and insight into the evolution of the secondary metabolites of this group.


Introduction
Vernonieae contains 21 currently recognized subtribes [1][2][3][4]. Among these, the subtribe Lychnophorinae is nearly endemic to Brazil and contains 18 genera and ca. 120 species [1,2,5,6]. Most species are restricted to campo rupestre (literally, rocky fields) in the highlands of southeastern and northeastern Brazil and to the Cerrado domain (Brazilian savanna). From a phytochemical point of view, these species exhibit a high diversity of compounds. Flavonoids and terpenoids have been extensively identified, with reviews noting the absence of diterpenes and the extensive reports of sesquiterpene lactones (SLs) of the germacranolide (specifically the germacrolide and heliangolide sub-types) and guaianolide types [7][8][9][10]. In a [...]
PLOS ONE | https://doi.org/10.1371/journal.pone.0190104 | January 11, 2018
[...] both UHPLC-MS and GC-MS analyses. To avoid analytical variation, the analyses were performed by injecting 10 samples per batch, and all samples were injected within a four-day interval. To detect any variation, the same sample was analyzed before and after each batch. Any deviations in the retention times and peak areas of the internal standard were also checked in all chromatograms.

Instrumentation
The UHPLC-MS and UHPLC-HCD MS/MS experiments were performed using an Accela UHPLC apparatus with a diode array detector (Accela) coupled to an Exactive Plus ESI-Orbitrap mass spectrometer (Thermo Scientific). The GC-MS experiments were performed using a gas chromatograph coupled to a QP2010 EI mass spectrometer (Shimadzu).

UHPLC-MS and UHPLC-HCD MS/MS analyses.
UHPLC-MS and UHPLC-HCD MS/MS analyses were performed using a core-shell column (Kinetex 1.7 μm XB-C18, 150 × 2.1 mm, Phenomenex) connected to a guard cartridge of the same material. Separation was performed at a flow rate of 400 μL·min⁻¹ with a gradient of H₂O-HCO₂H (0.1%, v/v) (A) and CH₃CN (B) as mobile phases; the elution profile was: 0-2 min, 5% B; 2-30 min, 5-100% B; 30-34 min (column washing), 100% B; 34-37 min, 100-5% B; 37-40 min (column equilibration), 5% B. The oven temperature was set at 45 °C. The DAD detector was set to record between 200 and 600 nm, and chromatograms were registered at 254 nm, 270 nm and 330 nm. The column effluent was analyzed by ESI-MS (resolution of 70,000) and ESI-HCD MS/MS (resolution of 35,000) in both positive and negative ionization modes, all acquired simultaneously. The mass spectra were acquired and processed using the software provided by the manufacturer. UHPLC-MS total ion current (TIC) chromatograms were recorded between m/z 150 and 1,200, and the following mass spectrometer parameters were kept the same in all analyses: 1.0 microscans per second; automatic gain control (AGC) target, 1.0e6; maximum inject time, 100 ms; sheath gas flow rate, 30; auxiliary gas flow rate, 10; sweep gas flow rate, 11; capillary temperature, 320 °C; spray voltage in positive ionization mode, 3.6 kV; spray voltage in negative ionization mode, 3.2 kV; S-lens RF level, 50; and HCD normalized collision energy (NCE), 35.0 eV. N₂ was used as the drying, nebulizer and fragmentation gas.
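The elution profile above can be written down as a small table of breakpoints, which makes it easy to check the mobile phase composition at any retention time. The following is a minimal sketch, not the instrument method itself: it assumes that the composition changes linearly between the stated breakpoints, and the names `GRADIENT`, `percent_b` and `solvent_b_volume_ml` are hypothetical.

```python
# Breakpoints (time in min, % B) transcribed from the elution profile in the text.
# Assumption: the composition changes linearly between consecutive breakpoints.
GRADIENT = [(0.0, 5.0), (2.0, 5.0), (30.0, 100.0),
            (34.0, 100.0), (37.0, 5.0), (40.0, 5.0)]


def percent_b(t_min):
    """Return the percentage of mobile phase B (CH3CN) at time t_min."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-40 min run")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


def solvent_b_volume_ml(flow_ml_min=0.4):
    """Volume of B (mL) used in one run: area under the %B curve times flow."""
    total = 0.0
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        total += (b0 + b1) / 2.0 / 100.0 * (t1 - t0) * flow_ml_min
    return total
```

Under these assumptions, `percent_b(16)` gives the composition halfway through the 2-30 min ramp, and `solvent_b_volume_ml()` estimates the acetonitrile consumed per 40 min injection at 400 μL·min⁻¹.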
GC-MS analyses. Analyses by GC-MS were performed in split injection mode at 260 °C, using a DB-5MS capillary column (J&W Agilent) of 30 m × 0.25 mm and film thickness 0.25 μm, with He (79.7 kPa) as the carrier gas at a flow rate of 1.3 mL·min⁻¹. The electron ionization mass spectrometer (EI-MS) detector was operated at an ion source temperature of 250 °C, a trap emission current of 60 μA and an ionization energy of 70 eV. Data were recorded over the whole run in full scan mode between m/z 50 and 500 at a scan rate of 0.30 scan·s⁻¹. The GC oven temperature was initially 100 °C and then rose linearly at 3 °C·min⁻¹ to 300 °C during a 90 min run.
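The oven program can be checked with simple arithmetic: a 3 °C·min⁻¹ ramp from 100 °C to 300 °C takes about 66.7 min, which is shorter than the stated 90 min. The text does not say how the remaining time is spent; the sketch below assumes, purely for illustration, an isothermal hold at the final temperature. The function names are hypothetical.

```python
# Values taken from the text.
START_C = 100.0         # initial oven temperature (deg C)
END_C = 300.0           # final oven temperature (deg C)
RATE_C_PER_MIN = 3.0    # linear ramp rate (deg C per min)
RUN_MIN = 90.0          # stated duration (min)


def ramp_duration_min():
    """Time needed for the linear ramp alone."""
    return (END_C - START_C) / RATE_C_PER_MIN


def oven_temperature(t_min):
    """Oven temperature at time t_min.

    ASSUMPTION (not stated in the text): once END_C is reached, the oven
    holds that temperature until the end of the run.
    """
    if not 0.0 <= t_min <= RUN_MIN:
        raise ValueError("time is outside the run")
    return min(START_C + RATE_C_PER_MIN * t_min, END_C)
```

With these values, `ramp_duration_min()` is roughly 66.7 min, so under the hold assumption the final ~23 min of the run would be isothermal at 300 °C.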
Sample preparation for UHPLC-MS and UHPLC-HCD MS/MS analyses. The leaves of each single plant were dried under air circulation (35 °C, 24 h) and powdered using liquid N₂. The samples were prepared using 10.0 mg of dried powder in a glass vial and extracted with 1.0 mL of a solution of MeOH-H₂O (7:3, v/v) in an ultrasonic bath for 10 min. Each extract was subjected to a clean-up with 500 μL of hexane. An internal standard of hydrocortisone (10.0 μg·mL⁻¹) was added to the extract. Finally, each extract was filtered through a 0.20 μm PTFE membrane and 5.0 μL were injected.
Sample preparation for GC-MS analyses. The leaves were dried and powdered as described above. The samples were prepared using 30.0 mg of dried powder in a glass vial and extracted with 2.0 mL of CH₂Cl₂ in an ultrasonic bath for 30 min. Each extract was transferred to a glass vial and the solvent was evaporated. Prior to analysis, the dried extract was resuspended in CH₂Cl₂ to a concentration of 10.0 mg·mL⁻¹.
Untargeted data processing and multivariate analysis. Mass signals (m/z) from the raw data files were automatically extracted and aligned by MetAlign (Rikilt, Institute of Food Safety) [24], resulting in 1,061 mass signals (GC-MS) at ion intensity higher than 5,000, and 36,861 and 24,482 mass signals in positive and negative electrospray ionization mode (UHPLC-MS), respectively, at ion intensity higher than 10⁵. Mass signals were subsequently re-grouped using MSClust (Netherlands Genomics Initiative/Netherlands Organization for Scientific Research) [25], resulting in 73 (GC-MS), 2,972 (UHPLC-MS negative ionization mode) and 2,974 (UHPLC-MS positive ionization mode) reconstructed spectra, for a total of 6,019. Multivariate analyses (PCA and HCA) were performed using SIMCA P 13.0.3.0 (Umetrics AB, Malmö, Sweden), after subjecting the metabolite signal intensities to both Pareto scaling and log transformation. HCA was performed using Euclidean distances. Subsequently, orthogonal partial least squares discriminant analysis (OPLS-DA) was performed using SIMCA P 13.0.3.0 in accordance with the groups obtained from the PCA and HCA. The parameters used for OPLS-DA were the same as for PCA and HCA.
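The pre-treatment described above (log transformation, Pareto scaling, Euclidean distances for HCA) can be illustrated with a short standard-library sketch. It is not the SIMCA implementation: the order of operations (log first, then scaling), the offset of 1 added before taking logarithms, and all function names are assumptions made for this example.

```python
import math


def log_transform(matrix, offset=1.0):
    """Log10-transform intensities; the offset (an assumption) avoids log(0)."""
    return [[math.log10(v + offset) for v in row] for row in matrix]


def pareto_scale(matrix):
    """Mean-centre each column and divide by the square root of its SD."""
    n = len(matrix)
    columns = list(zip(*matrix))
    means = [sum(col) / n for col in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
           for col, m in zip(columns, means)]
    return [[(v - m) / math.sqrt(s) if s > 0 else 0.0
             for v, m, s in zip(row, means, sds)]
            for row in matrix]


def euclidean_distances(matrix):
    """Pairwise Euclidean distances between samples (rows)."""
    return [[math.dist(a, b) for b in matrix] for a in matrix]


# Hypothetical toy matrix: 3 samples (rows) x 2 reconstructed metabolites (columns).
toy = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
scaled = pareto_scale(toy)
distances = euclidean_distances(scaled)
```

Pareto scaling sits between no scaling and unit-variance scaling: it reduces the dominance of very intense signals while keeping more of the original data structure than autoscaling, which is one common reason for choosing it in metabolomics.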
Chromatographic peak identification. Discriminant variables were provided by OPLS-DA, and a variable importance in projection (VIP) plot provided the variables important for discrimination of each class. Discriminant compounds were then identified according to the UV spectrum and the molecular formulae calculated from accurate mass measurements, both obtained from UHPLC-UV-MS analyses. The UV spectra were used to suggest the secondary metabolite class corresponding to each peak, followed by screening of the molecular formulae against the SciFinder and Dictionary of Natural Products databases. In addition, fragmentation data obtained by UHPLC-UV-HCD MS/MS analyses, as well as comparison with authentic standards whenever possible, were used for structure elucidation and to confirm the peak assignments. Finally, the identified discriminant compounds were compared with the phytochemistry previously reported for the Lychnophorinae subtribe.
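Assigning a molecular formula from an accurate mass amounts to comparing the measured m/z with the theoretical monoisotopic value and expressing the difference in parts per million. The sketch below shows that arithmetic for hydrocortisone (C21H30O5), the internal standard used here. The measured value in the usage line is hypothetical, and the function names are illustrative, not part of any vendor software.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.9949146196,
}
PROTON_MASS = 1.007276467  # u


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items())


def mz_protonated(formula):
    """Theoretical m/z of [M+H]+ (positive ESI mode)."""
    return monoisotopic_mass(formula) + PROTON_MASS


def mz_deprotonated(formula):
    """Theoretical m/z of [M-H]- (negative ESI mode)."""
    return monoisotopic_mass(formula) - PROTON_MASS


def ppm_error(measured_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


hydrocortisone = {"C": 21, "H": 30, "O": 5}
# 363.2170 is a hypothetical measured value, used only to show the calculation.
error = ppm_error(363.2170, mz_protonated(hydrocortisone))
```

A candidate formula is usually retained only when the error falls within the instrument's mass accuracy; the acceptance threshold applied in this study is not stated in the text.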

Multivariate analysis
Aqueous-methanol extracts were prepared from dried leaves and UHPLC-MS-based metabolic fingerprinting was performed for all species, in both positive and negative electrospray ionization (ESI) modes. Similarly, dichloromethane extracts were prepared from dried leaves and GC-MS-based metabolic fingerprinting was performed for all species in electron ionization (EI) mode. The ESI (positive and negative) and EI data sets were processed separately by MetAlign (Rikilt, Institute of Food Safety) [24] and reconstructed by MSClust (Netherlands Genomics Initiative/Netherlands Organization for Scientific Research) [25], followed by multivariate analysis of the combined UHPLC-MS and GC-MS data in SIMCA P 13.0.3.0 (Umetrics AB, Malmö, Sweden).
The principal component analysis (PCA) (Fig 1) and hierarchical cluster analysis (HCA) (Fig 2) of the 78 species showed segregation into four groups, which were assigned 1A, 1B, 1C and 1D, respectively (Table 1). None of the groups corresponded to clades of the phylogeny proposed by Loeuille et al. (2015b). However, most of the species of Eremanthus and Lychnophora sensu stricto were found in groups (1A) and (1D). These groups comprised 17 of the 20 analyzed species of Eremanthus and 16 of the 23 analyzed species of Lychnophora. [...] which was also observed with HCA (Fig 2). These results corroborated that Lychnophoriopsis should be treated as a synonym of Lychnophora, and that L. damazioi is not closely related to Lychnophora, as proposed by Loeuille et al. (2015b).
It is noteworthy that the multivariate analysis showed higher robustness (R2 cumulative = 0.444 and Q2 cumulative = 0.325) when UHPLC-MS and GC-MS data were analyzed together. Combining UHPLC-MS and GC-MS allows analysis of polar and nonpolar metabolites, respectively, and therefore provides a broader and more robust metabolomics analysis. The multivariate analysis using only the UHPLC-MS data (R2 cumulative = 0.291 and Q2 cumulative = 0.167) is shown in the Supporting Material (Figures A and B in S1 File).
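For readers less familiar with these diagnostics, cumulative R2 measures the fraction of variation explained by the fitted model, while cumulative Q2 measures the fraction predicted under cross-validation. A minimal sketch of the conventional definitions follows; the input numbers are invented for illustration and are not the study's sums of squares, and the function names are hypothetical.

```python
def r2_cumulative(ss_total, rss_final):
    """Fraction of the total sum of squares explained after the last component."""
    return 1.0 - rss_final / ss_total


def q2_cumulative(press_ratios):
    """Cumulative Q2 from per-component ratios PRESS_a / SS_(a-1).

    PRESS_a is the cross-validated prediction error sum of squares for
    component a, and SS_(a-1) is the residual sum of squares before it.
    """
    product = 1.0
    for ratio in press_ratios:
        product *= ratio
    return 1.0 - product


# Invented inputs, chosen only to show the scale of the reported values.
example_r2 = r2_cumulative(ss_total=100.0, rss_final=55.6)
example_q2 = q2_cumulative([0.90, 0.75])
```

Because Q2 is estimated on left-out data, it cannot exceed R2 by construction of a well-behaved model, and a smaller gap between the two generally indicates less overfitting.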
Lychnophora is one of the most species-rich genera of the subtribe; therefore, it was analyzed separately by multivariate analysis of the combined UHPLC-MS and GC-MS data in SIMCA P 13.0.3.0 (Umetrics AB, Malmö, Sweden). The multivariate analysis results for the Lychnophora species (Fig 3) are comparable with the phylogeny proposed by Loeuille et al. (2015b). The groups formed in PCA (Fig 3) were assigned 3A, 3B, 3C and 3D, respectively. The hierarchical cluster analysis (HCA) of the Lychnophora species is shown in the Supporting Material (Figure C in S1 File).

Table 1. Species belonging to groups formed in PCA (Fig 1) and HCA (Fig 2).

All species of Bahian Lychnophora belonged to the same group (3B), with Lychnophora santosii (Lst), L. regis (Lre) and L. triflora (Ltr) very close to each other, and L. bishopii (Lbi) a little farther away. It should be noted that Bahian Lychnophora comprises a number of Lychnophora species and Eremanthus leucodendron (Ele), which are restricted to the campos rupestres of the Chapada Diamantina, the northern sector of the Espinhaço mountain range, in the State of Bahia, eastern Brazil. They are morphologically distinct from the rest of the Lychnophora species [5].
Eremanthus, the second most speciose genus of the subtribe, was also analyzed separately by multivariate analysis of the combined UHPLC-MS and GC-MS data in SIMCA P 13.0.3.0 (Umetrics AB, Malmö, Sweden). The multivariate analysis results for the Eremanthus species (Fig 4) were comparable with the phylogeny proposed by Loeuille et al. (2015b). Species classified as Eremanthus segregated into four groups, which were assigned 4A, 4B, 4C and 4D, respectively. The hierarchical cluster analysis (HCA) of the Eremanthus species is shown in the Supporting Material (Figure D in S1 File). Most species belonged to groups (4B) and (4C), whereas only E. capitatus (Eca) and E. polycephalus (Epo) belonged to group (4A). On the other hand, E. pabstii (Epa) was the only species classified as Piptolepis, and it was the only species belonging to group (4D). In addition to species classified as Eremanthus, E. leucodendron (Ele), classified as Bahian Lychnophora, and E. crotonoides (Ecr) were part of group (4B).

Discriminant variables
In accordance with the groups obtained in PCA and HCA, discriminant variables were assessed by orthogonal partial least squares discriminant analysis (OPLS-DA), and a variable importance in projection (VIP) plot provided the variables important for discrimination of each class. Discriminant compounds were then identified for each group formed in the multivariate analysis.
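To make the VIP ranking concrete, the sketch below computes VIP scores with the standard formula for PLS-type models: each variable's squared, normalized weight on a component is weighted by the response variation that the component explains. This is an illustrative assumption about the calculation; the exact VIP variant used by SIMCA for OPLS-DA may differ, and the weights and sums of squares shown are invented.

```python
import math


def vip_scores(weights, ssy_explained):
    """Variable importance in projection for a PLS-type model.

    weights[a][j]     weight of variable j on component a
    ssy_explained[a]  response sum of squares explained by component a
    """
    n_vars = len(weights[0])
    total_ss = sum(ssy_explained)
    scores = []
    for j in range(n_vars):
        acc = 0.0
        for w_a, ss_a in zip(weights, ssy_explained):
            norm_sq = sum(w * w for w in w_a)
            acc += ss_a * (w_a[j] ** 2) / norm_sq
        scores.append(math.sqrt(n_vars * acc / total_ss))
    return scores


# Invented example: 2 components, 3 variables.
example_weights = [[0.8, 0.6, 0.0],
                   [0.0, 0.6, 0.8]]
example_ssy = [3.0, 1.0]
example_vip = vip_scores(example_weights, example_ssy)
```

By construction the squared VIP scores average to 1, which is why variables with VIP above 1 are conventionally regarded as the most influential; the cut-off applied in this study is not stated in the text.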
Concerning the discriminant compounds provided by OPLS-DA for the groups formed in the principal component analysis (PCA) of the 78 species (Fig 1), flavonoids and sesquiterpene lactones were important for segregation of these species, since they were found as discriminants in all four groups; those able to discriminate each group were identified. Table 2 presents the identities of the annotated compounds that accumulate differentially in each group. Detailed information on the identification of these compounds is provided as Supporting Material. In addition, the chemical structures of these compounds are shown in Figure E in S1 File.

Table 2. Identification of discriminant compounds of each group obtained in OPLS-DA of 78 species from the Lychnophorinae subtribe and data taken from UHPLC-MS and UHPLC-HCD MS/MS analyses.
Regarding the discriminant compounds for the groups formed in the PCA of Lychnophora (Fig 3), which was analyzed separately, the identity of almost all SLs varied among the different groups, which allowed discrimination. Species belonging to group (3B) exhibited 15-hydroxy-16α- [...]
These results showed better congruence with the phylogenetic hypothesis when the metabolomic data were restricted to a single genus (Lychnophora or Eremanthus), whereas the metabolomic data seemed rather incongruent when all Lychnophorinae were taken into account. This noisy pattern at a higher taxonomic level probably reflects biological or physiological processes, such as convergence, adaptation or quantitative fluctuations in metabolites, that may obscure the phylogenetic signal. This is noteworthy given the harsh environmental conditions that prevail in the campo rupestre and Cerrado ecosystems, including extremely oligotrophic and acidic substrates, constant wind exposure and an intense fire regime [26]. Further studies are necessary for a better understanding of the processes leading to similar metabolic profiles in species that are not closely related. Here, as in Loeuille et al. (2015b), only one specimen of each species was analyzed for the purpose of mutual comparison. This could also explain the noisy pattern at the higher taxonomic level. It is worth noting that most of the species studied are micro-endemics with very few populations; such a scenario offers a low probability of infraspecific qualitative chemical variation.
Notably, care must be taken when comparing the results of multivariate analyses with a phylogenetic hypothesis. The latter is based on special similarity (homology, i.e., similarity inherited from a common ancestor), whereas multivariate analyses use overall or global similarity without distinguishing between homology and homoplasy (the latter does not reflect evolutionary relationship) [27]. The incongruence noted with the phylogenetic hypothesis when all Lychnophorinae were taken into account may be attributed to such intrinsic methodological differences and to the robustness of the metabolomic analysis combining both UHPLC-MS and GC-MS data, which provided a broader metabolomics approach.
When considering the two most speciose genera, the similarity to the phylogenetic hypothesis became even more evident. The Lychnophora species (Fig 3) classified as Lychnophora by Loeuille et al. (2015b) almost all clustered together into group (3C), as did species classified as Prestelia Alliance, which clustered mainly into the close groups (3D) and (3C). In addition, all species classified as Bahian Lychnophora remained clustered in the same group (3B).
Regarding Eremanthus (Fig 4), species classified in this genus almost all clustered together into group (4C). The remaining species clustered into group (4A) and mainly into group (4B), together with E. crotonoides and E. leucodendron, the only species of that genus classified as Bahian Lychnophora [5]. In addition, E. pabstii was segregated into group (4D), consistent with its classification as the only species of that genus assigned to Piptolepis [5].

Conclusions
We have recently reported the use of a similar approach on a small scale in a study of the genus Vernonia Schreb. [22]. That work was a preliminary study, with a restricted group of ten species, investigating the possibility of using metabolomics for chemotaxonomical purposes. In the present study, we not only greatly expanded the number of sampled species but also used a much more comprehensive raw data set for the statistical analyses, which combined GC-MS and UHPLC-MS (both positive and negative ionization modes) data. Our results using these improvements seem to confirm that the metabolomics approach might be a promising tool for chemotaxonomical studies, but they also highlight the importance of choosing the correct taxonomic level; otherwise, the phylogenetic signal might be obscured by biological or physiological processes that may play a major role in the evolution of lineages in harsh habitats. The use of metabolomics as a primary data source for phylogenetic inference, in addition to molecular and morphological data, would offer an opportunity to obtain a robust phylogenetic hypothesis for the genera of the Lychnophorinae subtribe and insight into the evolution of the secondary metabolites of this group.