Selection for High Oridonin Yield in the Chinese Medicinal Plant Isodon (Lamiaceae) Using a Combined Phylogenetics and Population Genetics Approach

Oridonin is a diterpenoid with anti-cancer activity that occurs in the Chinese medicinal plant Isodon rubescens and some related species. While the bioactivity of oridonin has been well studied, the extent of natural variation in the production of this compound is poorly known. This study characterizes natural variation in oridonin production in order to guide selection of populations of Isodon with highest oridonin yield. Different populations of I. rubescens and related species were collected in China, and their offspring were grown in a greenhouse. Samples were examined for oridonin content, genotyped using 11 microsatellites, and representatives were sequenced for three phylogenetic markers (ITS, rps16, trnL-trnF). Oridonin production was mapped on a molecular phylogeny of the genus Isodon using samples from each population as well as previously published Genbank sequences. Oridonin has been reported in 12 out of 74 species of Isodon examined for diterpenoids, and the phylogeny indicates that oridonin production has arisen at least three times in the genus. Oridonin production was surprisingly consistent between wild-collected parents and greenhouse-grown offspring, despite evidence of gene flow between oridonin-producing and non-producing populations of Isodon. Additionally, microsatellite genetic distance between individuals was significantly correlated with chemical distance in both parents and offspring. Neither heritability nor correlation with genetic distance were significant when the comparison was restricted to only populations of I. rubescens, but this result should be corroborated using additional samples. Based on these results, future screening of Isodon populations for oridonin yield should initially prioritize a broad survey of all species known to produce oridonin, rather than focusing on multiple populations of one species, such as I. rubescens. Of the samples examined here, I. rubescens or I. japonicus from Henan province would provide the best source of oridonin.

The aerial portions of I. rubescens have traditionally been used as a folk medicine in Henan province, central China for the treatment of sore throat, inflammation and gastrointestinal problems [17] where it is usually called donglingcao (???). The medicinal uses of I. rubescens were first documented in the 1977 edition of the Pharmacopoeia of the People's Republic of China [18]. As such, I. rubescens is a relatively recent introduction to the Traditional Chinese Medical literature. Although I. rubescens was not included in editions of the Chinese Pharmacopoeia after 1977, it was reintroduced in the most recent edition [19].
While much work has been done to characterize the chemical structure and diversity of diterpenoids in I. rubescens and other species of Isodon [3,17,20±22], relatively little is known about the evolutionary or ecological context of these abundant compounds [but see 23], or their variation in nature. The main goal of this paper is to summarize the results of a preliminary study of the natural variation of oridonin production using a unique combination of phylogenetic and population genetic approaches. This work was done to guide researchers in selecting populations of I. rubescens and related species that produce the most oridonin, since to-date there is no reported de novo synthesis of oridonin. This project was carried out in the context of a larger endeavor to evaluate more than 200 species of commonly used Chinese medicinal plants for their bioactivity [24]. The study of oridonin production in Isodon was to be a used as an example that could be applied to any potential screening discoveries of promising bioactive chemicals that would require additional plant material for follow-up pharmacological studies and clinical trials.
While oridonin has been extensively studied for therapeutic potential, it should be noted that it is not the only bioactive diterpenoid in Isodon [10]. Furthermore, it is possible that synergy or additivity of multiple diterpenoids may contribute to higher bioactivity of the whole extract of I. rubescens as compared to oridonin alone [16]. Therefore, the results of this study may not necessarily be extended to selection of other diterpenoids in Isodon or for the total bioactivity of Isodon extracts.
The genus Isodon is comprised of about 100 species and occurs primarily in Asia, with a few species in Africa [25]. The center of diversity of the genus is in China, where 77 species of the genus are located [26]. Although the classification of Isodon has been well studied, recent molecular phylogenetic work has placed the traditional subgeneric classification in question [27]. In addition, a study of Isodon in Japan using chloroplast sequence markers found low phylogenetic resolution and many species were polyphyletic. The phylogenetic patterns were attributed to some combination of a recent radiation, hybridization, or incomplete lineage sorting of ancestral polymorphisms in the genus Isodon [28]. Given the low phylogenetic resolution between Isodon species, published studies of the high diversity of diterpenoids in both Isodon [17] and I. rubescens [21], as well as suggestions of variability in diterpenoid production according to growing environment [29,30], the expectation for this study was that oridonin production would be highly variable, both throughout Isodon and within I. rubescens. This variability would be indicated by multiple gains and losses of oridonin production in the evolutionary history of Isodon, and also by variation of oridonin production between individual plants, both within the same population as well as different populations of I. rubescens and also between parents and offspring. To evaluate these expectations, oridonin production was  mapped onto a phylogeny of Isodon to determine the extent of evolutionary lability and to identify species or clades that might be the best sources of the compound. In addition, oridonin production was compared in microsatellite-genotyped individuals of wild-collected parents and greenhouse-grown offspring to determine heritability and populations with the highest chemical production.

Sample Collection
Samples of I. rubescens and related species were collected from the People's Republic of China in October and November of 2009 in Henan, Hubei, and Guizhou provinces. Samples were collected from a total of eight sites, with about 10 to 15 individuals collected at each site. Five populations consisted of the species I. rubescens, four of which were collected in the area of traditional use of donglingcao in Northern Henan province. One population of I. rubescens from Henan province represented a donglingcao cultivation site (JY). Two populations from Guizhou were determined to be the species Isodon enanderianus (Hand.-Mazz.) H.W. Li and one population from southern Henan province was identified as Isodon henryi (Hemsl.) Kudo Ã. In addition, individuals of I. japonicus and I. lophanthoides var. micranthus (C.Y. Wu) H.W. Li were collected growing at the same collection sites of I. rubescens and I. enanderianus, respectively. Location details for each population are shown in Table 1.
The collection of each individual consisted of a pressed voucher specimen, leaves in silica desiccant for DNA and chemical analyses, and seeds, if available. Representative voucher specimens are deposited at the Gray Herbarium of Harvard University (GH) and are noted in Table S1. In addition to wild-grown samples collected for the project, five samples of the herbal medicine donglingcao were purchased in Zhengzhou city, Henan province. Collection activities for this study were carried out under the auspices of a larger collaborative project between Harvard Medical School and the Beijing University of Chinese Medicine. Details about the collaboration agreements and the relevant Chinese government authorities that reviewed them are provided in a publication about the larger project [24]. No specific permits or permissions beyond those in the collaboration agreements were required for the collections in this study. Furthermore, collections were not conducted on privately owned or protected areas and no species of Isodon are endangered or protected. Greenhouse Seeds from parents representing each population were planted on April 7, 2010 in seedling trays. Seeds were grown in 3B Mix potting soil, which includes peat moss, pine bark, perlite, starter nutrients, wetting agent, and limestone (Fafard; Agawam, Massachusetts). A total of 1824 seeds were planted, averaging 27 seeds from each individual that was producing seeds at the time of collection. Seeds were collected from between four and 11 individuals per population. Of the total, 439 seeds germinated, and 308 plants survived to harvest. Seedlings were transplanted after two weeks to individual pots. Plants were arranged randomly with respect to parent population source, and re-arranged periodically during the growing period. Offspring individuals were harvested on August 10, 2010, which represents the typical harvest time of I. rubescens [19]. Leaves were harvested and immediately placed in silica desiccant.

Extraction and Chromatography of Oridonin
Between 200±500 mg of dried leaf samples were ground using a mortar and pestle to prepare for chemical analyses. The ground samples were put in a 40 mL vial (National Scientific Co.) and 10 mL of 1:1 dichloromethane : ethanol were added. The solution was sonicated for 30 min, centrifuged for 5 min, then filtered into a new 40 mL vial using Whatman Grade 1 Filter Paper (Whatman; Piscataway, NJ) without disturbing the pellet. The filtrate was placed in a cold room at 4uC and the filtrant was resuspended in 10 mL of 1:1 dichloromethane:ethanol and left at room temperature overnight. The mixture was then filtered, and resulting filtrate was combined with the filtrate of the first step. The combined filtrate was dried in a Savant SPD2010 Speedvac (Thermo Scientific; Waltham, MA) for 5±6 hrs. After the extract was completely dry, 20 mL of LC/MS grade methanol was added and the resulting mixture sonicated for 30±60 min. Once fully dissolved, 150 mL of the resulting solution was added to a 1 mL syringe (Luer Lock Tuberculin Syringe; BD; Franklin Lakes, NJ) with filter (Acrodisc LC 13 mm Syringe Filter with 0.2 mm PVDF membrane; Pall Corporation; Port Washington, NY), and filtered into a 2 mL screw top vial (Agilent Technologies; Santa Clara, CA) using a vial insert.
Ten mL of extracted and filtered samples were run on an Agilent 1200 Series HPLC/6130 Series MS (Agilent Technologies; Santa Clara, CA), with a Phenomenex Luna C18 column (100 mm64.60 mm 5 micron; Phenomenex; Torrance, CA) using a gradient elution of 15% to 100% CH 3 CN in water with 0.1% formic acid over 35 min at a flow rate of 0.7 mL/min. To quantify oridonin content, standard oridonin was run at seven concentrations from 1.5 ng/mL to 100 ng/mL. Oridonin content was quantified by comparison with a standard curve constructed from integrated peak areas at both UV 254 nm and MS+347. Oridonin content was expressed as percent dry weight (% dry wt.) of the plant.

Heritability and Population Comparison of Oridonin Content
The heritability of oridonin yield was assessed by plotting the average offspring oridonin amount against parent oridonin amount. The regression coefficient (b op ) of offspring oridonin yield against parental yield was estimated and tested for significance using a t-test. The one parent-offspring heritability formula was used to calculate the narrow-sense heritability (h 2 ) [31].

DNA Extraction and Sequencing
To prepare samples for DNA extraction, a small amount of leaf tissue, typically 1 cm 2 , was ground for 20 s in a 2 mL screw cap microcentrifuge tube (Sarstedt; Nu Èmbrecht, Germany) using a 1/ 40 ceramic sphere (MP Biomedicals; Solon, Ohio) in a Thermo-

Phylogenetic Analyses
Sequences were assembled in Geneious Pro 5.4.3 (Biomatters, Auckland, New Zealand). In addition to 18 samples newly sequenced for this paper, 40 samples were included from a previous study [27] to provide phylogenetic context and to help with the analysis of patterns of oridonin evolution. Coleus xanthanthus C. Y. Wu & Y. C. Huang was used as the outgroup, using Genbank sequences from another study [27]. A list of all sequences and associated Genbank identification numbers is provided in Table S1. Sequences were first aligned using MUSCLE [35] as implemented in Geneious, then manually edited in Jalview [36]. Ambiguously aligned regions, such as polynucleotide repeats and regions at the end of sequences were identified by eye, and excluded in Mesquite Version 2.74 [37].
Incongruence between the different gene regions was tested using the incongruence length difference (ILD) test [38] as implemented in PAUP* 4.0b10 [39]. The ILD test was conducted using the heuristic search method with 1000 homogeneity replicates of 1000 random addition sequence replicates each, saving one tree per addition sequence replicate.
Maximum parsimony (MP) analyses were done in PAUP using parsimony ratchet commands created in PAUPRat [40]. Each PAUPRat analysis was done with 1000 ratchet iterations, with 15% of the characters perturbed on each iteration. The PAUPRat commands were implemented five separate times and the most parsimonious trees from each run were concatenated in PAUP to make a strict consensus tree. Support for the MP trees was assessed using parsimony bootstrap. Each bootstrap analysis consisted of 1000 bootstrap replicates each consisting of 100 heuristic search random addition sequence replicates, saving one tree per replicate. The bootstrap analysis was done five times. The bootstrap trees from each analysis were concatenated in PAUP without excluding duplicate trees, and a majority rule consensus tree was constructed using tree weights to provide bootstrap support values.
The best model of evolution for maximum likelihood (ML) and Bayesian analyses was determined using jModeltest 0.1.1 [41,42]. Based on the Akaike information criterion, the TrN+C model was selected for the chloroplast data and the GTR+C model was selected for the ITS data. The ML analyses were conducted in PAUP and in the online version of RAxML [43] as implemented through the Cyberinfrastructure for Phylogenetic Research (CIPRES) web portal [44] using parameters specified by jModeltest. For ML analyses conducted in PAUP, the heuristic search method was used with 100 random addition sequence replicates. Bayesian analyses were conducted using MrBayes 3.1 [45,46]. Each Bayesian analysis consisted of four Markov chains running for 1,000,000 generations, and sampling every 100 generations. Since the exact jModeltest specifications could not be implemented in MrBayes, a general model (i.e., GTR+C) using estimated parameters was used for both the chloroplast and ITS datasets. The trees and parameters corresponding to the first 30% of samples were discarded as the burnin in order to calculate the best tree and posterior probability support values.

Evolutionary Gains and Losses of Oridonin Production
Oridonin is known to occur in at least 12 out of 74 species (16%) of Isodon examined for diterpenoid diversity [20]. In a subset of species known to produce oridonin, production may be polymorphic, as indicated by the fact that some studies have reported oridonin in those species and others have not. For the purposes of testing evolutionary lability of oridonin production, all species with at least one report of oridonin production were coded as`p resent'', whereas species without any report of oridonin production were coded as``absent.'' To evaluate the number of evolutionary gains and losses in oridonin production, the presence or absence of oridonin production was mapped onto the strict consensus tree of the MP analysis using the program Mesquite Version 2.74 [37]. Oridonin presence/absence was empirically determined for the samples newly sequenced in this study, and derived from the literature for other species of Isodon that were included in the phylogeny. Table S2 provides a list of species of Isodon, whether they produce oridonin, and associated references. The amount of evolutionary lability in oridonin production was estimated using the retention index (RI), and by comparing the parsimony tree length of the oridonin character (i.e., # evolutionary steps) with a null distribution created by randomly reshuffling the oridonin presence/absence character among the samples included in the study. The random distribution was created in Mesquite Version 2.74 by reshuffling the character 1000 times. The tree length comparison was conducted when resolving polytomies as``soft,'' since this is the more conservative option for detecting evolutionary gains and losses of a character.

Amplification and Scoring of Microsatellites
Eleven microsatellites were amplified in both parent and offspring samples in multiplex mixtures using fluorescently labeled primers. The resulting amplicons were mixed with a GeneScan 500 LIZ Size Standard (Applied Biosystems) and separated on an Applied Biosystems 3730xl sequencer (Applied Biosystems). Alleles were scored using GeneMapper v3.0 (Applied Biosystems). Additional details about the microsatellite regions and the methods used to amplify them are provided in [47].

Population Genetic Analysis
Statistical tests of population genetic variation were conducted within GENALEX v6.41 [48]. First, the effect of pooling parents and offspring was tested to see if it increased Wright's inbreeding coefficient (F IS ) for each population. Due to a significant increase in F IS in several populations, analyses of population genetic variation involve only parents. Each microsatellite locus was tested for deviation from Hardy-Weinberg equilibrium (HWE) using chisquare tests. Due to the large number of tests (n = 99), the nominal level of statistical significance (a = 0.05) was adjusted by Dunn-S Ï ida Âk correction (12(12a) 1/n ) to 0.00052. The number of alleles (N A ), observed (H O ) and expected heterozygosity (H E ), and fixation index (F = 12(H O /H E )) were averaged across loci for each population. Using the combined microsatellite data from the maternal parent and offspring, the Multi-locus Mating System Program (MLTR; [49]) was employed to calculate the maternal parent inbreeding coefficient (F mat ) and the biparental inbreeding rate (T m 2T s ). The biparental inbreeding rate is estimated here as the difference between the multi-locus population out-crossing rate (T m ) and the single locus out-crossing rate (T s ).
The spatial pattern of microsatellite variation was examined in order to identify distinct population clusters and the relationship between genetic distance and geographic distance. A principal coordinate analysis (PCoA) was first employed to summarize microsatellite genetic variation among all individuals. The eigenvectors of the PCoA were calculated from a covariance matrix with data standardization using the program GENALEX. The clustering of individuals from each population was examined based on the first two principal coordinates. A Bayesian clustering algorithm implemented in the program TESS v2.3.1 [50] was also employed, using the admixture model with correlated allele frequencies [51] to account for any migrants in the dataset, following recommendations of Francois and Durand [52]. TESS was run by setting the cluster (``k'') value incrementally from two to seven with 20 independent runs at each k value. A burn-in period of 75,000 sweeps was followed by MCMC sampling for 500,000 sweeps. The optimal k value was determined by examining the deviance information criterion (DIC) and the 20 independent runs at this value of k were summarized using the program CLUMPP [53] with the Greedy algorithm. The program DISTRUCT [54] was used to graphically display the output.
Microsatellite genetic data were also examined for genetic isolation by distance (IBD) and compared to chemical distance. A matrix of genetic distance [55] was compared to Euclidean and log-transformed Euclidean geographic distance. Genetic distance was also compared to a matrix of untransformed and logtransformed absolute difference of oridonin amounts (% dry wt.) in both parents and offspring. A mantel test of matrix correspondence was conducted using GENALEX, with statistical significance assessed after 999 permutations.

Repeatability of Oridonin Concentration Determination
A total of 144 unique samples were tested for oridonin concentration, including 75 from parent populations, 64 from greenhouse grown offspring, and five of purchased herbal samples. Representative LC/MS chromatograms are provided in Table S4. In order to test the reliability of the analytical method used in this study, a subset of 10 samples were run on multiple days and the resulting oridonin concentration values for those samples were compared. On average, the relative uncertainty in the detected value of oridonin was +/212% of the detected value. Oridonin yield in parents and offspring of each population is shown in Fig. 2.

Heritability of Oridonin Yield
The dry weight of oridonin for each offspring was averaged among siblings and regressed on the value of the maternal parent (Fig. 3) yielding a regression coefficient (b po ) of 0.381 (standard error60.083). This coefficient was highly statistically significant based on a t-test (p = 0.00007). Narrow-sense heritability (h 2 ) was estimated as 0.761. When a similar regression was performed using only populations of I. rubescens (HB, JY, QY, XX, YC), the correlation between parent and offspring oridonin content was no longer statistically significant, yielding a regression coefficient of 0.157 (standard error60.131; p = 0.244).

Phylogenetic Analyses
The aligned chloroplast data included a total of 1579 nucleotide characters, of which 1478 were constant, 51 were parsimony informative, and 50 were parsimony uninformative. The ITS dataset included 572 nucleotide characters, of which 404 were constant, 80 were parsimony informative, and 88 were parsimony uninformative. The ILD test indicated that the ITS partition was significantly different from the chloroplast partition (rps16, trnL-trnF) (P = 0.001), whereas the two chloroplast markers were not significantly incongruent from one another (P = 0.698). Therefore, chloroplast and ITS data were examined separately in all subsequent phylogenetic analyses.
The phylogenetic topologies resulting from the MP, ML, and Bayesian analyses were similar, mainly differing in the amount of resolution of shallower nodes. Parsimony ratchet analyses of the chloroplast data yielded 140 unique equally parsimonious trees (length, L = 130 steps; consistency index, CI = 0.86; retention index, RI = 0.95). The ITS parsimony analysis yielded 960 unique equally parsimonious trees (L = 285; CI = 0.78; RI = 0.84).

Evolutionary Gains and Losses of Oridonin Production
In the context of the chloroplast MP strict consensus phylogeny, oridonin production has five evolutionary steps (CI = 0.06; RI = 0.2), which is within the distribution of randomly reshuffled characters (mean = 7.116; standard deviation = 1.221). Oridonin production has three evolutionary steps (CI = 0.33; RI = 0.89) in the context of the ITS MP strict consensus phylogeny, however is lower than the values for randomly reshuffled characters (mean = 7.9958; standard deviation = 1.479). The pattern of oridonin presence/absence as mapped on the chloroplast and ITS strict consensus phylogenies is shown in Fig. 4.

Population Genetic Diversity and Spatial Genetic Structure of Isodon
Given the lack of phylogenetic resolution between populations collected for this study, the microsatellite data for all populations, regardless of species designation, were analyzed together. The microsatellite loci were first tested for deviations from Hardy-Weinberg equilibrium using chi-square tests (Table S3). After controlling the significance level for multiple testing, only loci 38 and 107 in the rubescens (XX) population had significant tests. Based on the low mean number of alleles across loci and the high fixation index in this population (Table 2), it is likely that these loci deviate from Hardy-Weinberg equilibrium for biological reasons rather than the presence of null alleles. Therefore, these two loci were included in all downstream analyses. The mean number of alleles across loci varied across populations, with enanderianus (QD) and rubescens (JY) having the highest polymorphism, and rubescens (XX) having the least. Since rubescens (JY) is a cultivated population from wild collected parent material, the observation of higher polymorphism might be expected. All populations show a deficit of average observed heterozygosity relative to average expected heterozygosity. The populations rubescens (XX) and rubescens (YC) have the largest fixation indices, but F does not exceed 0.2 in any population. Only rubescens (QY) has a large maternal fixation index (F mat ), but the large standard error makes this estimate unreliable. Similarly, the estimated biparental inbreeding rates (T m 2T s ) have large standard errors, and only two populations, rubescens (QY) and rubescens (JY), have estimated rate greater than zero.
The first two coordinates from the principal coordinate analysis account for 25% and 24% of the genetic variation, respectively. A plot of the first two components (Fig. 5) suggests that three populations form distinct clusters, and the remaining populations exhibit a large degree of overlap. Individuals from rubescens (YC) and enanderianus (QD) can be distinguished along the first coordinate axis, while enanderianus (TR) is distinguished along the second coordinate axis. The DIC values in the Bayesian clustering analysis approach an asymptote at six clusters ( Fig. 6A and 6B), with only a slight increase at seven clusters. No single individual is assigned to the seventh cluster when k = 7. Focusing on k = 6, unique clusters include enanderianus (QD), enanderianus (TR), henryi (XN), japonicus (XX), rubescens (YC), and a cluster where rubescens (JY), rubescens (QY), rubescens (HB), and rubescens (XX) are combined.
A pattern of isolation by distance is evident in the Mantel test of microsatellite variation and geographic distance. There is a significant correlation with Euclidean distance (N = 92, r xy = 0.327, p = 0.001) and the correlation is stronger for logtransformed Euclidean distance (N = 92, r xy = 0.401, p = 0.001). Similarly, when all populations in this study were included, genetic distance was significantly correlated with chemical distance in both parents (N = 73, r xy = 0.200, p = 0.002) and offspring (N = 64, r xy = 0.113, p = 0.005). However, when only populations of I. rubescens were compared (HB, JY, QY, XX, YC) there was no statistical correlation between chemical and genetic distance in either parents (N = 55, r xy = 0.038, p = 0.263) or offspring (N = 44, r xy = 0.031, p = 0.331).

Oridonin Yield
Average yields per population show that oridonin is at higher concentrations in the area of traditional production of donglingcao in Henan province (Fig. 2). Four of these high yield populations comprise a single genetic cluster and all were identified as I. rubescens. The sample of I. japonicus (XX) collected growing sympatrically with I. rubescens had the highest oridonin yield of all individuals. The southern population of I. rubescens from Hubei province (YC) had intermediate values of oridonin and was genetically distinct from I. rubescens in Henan province. Of individuals that produced oridonin, fewer than half were above the 0.25% oridonin dry weight limit specified by the Chinese Pharmacopoeia for samples of donglingcao [19]. Oridonin yield was generally higher in parents than offspring, with 37% of parent samples above the Chinese Pharmacopoeia limit, as compared to 21% of greenhouse grown offspring. The reason for the higher concentration of oridonin in parent samples is unclear, but may be due to lower environmental stress faced by greenhouse-grown offspring (e.g., no insect damage, regular watering) and/or the differences in timing of the plant collection. Interestingly, four out of five purchased herbal samples of donglingcao were above the Chinese Pharmacopoeia limit, despite evidence of adulteration (e. g., multiple banding of PCR amplicons of phylogenetic markers such as ITS; visible leaf material from other species).

Heritability of Oridonin Production
Estimates of oridonin yield show that there is a strong heritable component to the underlying variation in production of the chemical (h 2 = 0.761). This pattern is corroborated by the test of correlation between chemical and genetic distance. However, the contribution of additive genetic variance to the overall genetic variation is not clear, because our estimate of heritability assumes that there is no genotype-environment interaction and we are not able to test this assumption directly. The heritability of oridonin production is less clear when examining only populations of I. rubescens, either in the context of comparing average offspring values with parents to calculate heritability, or by examining correlations between genetic and chemical distance. The pattern of results in these two cases (i.e., all populations vs. only populations of I. rubescens) is suggestive that when seeking populations that produce the most oridonin, the selection of the species with highest yield should be prioritized before selecting populations within species. In the case of Isodon, it would be important to first select a wide taxonomic breadth to test for oridonin content, as opposed to focusing on variation of oridonin production within only one species (e.g., I. rubescens). However, the lack of heritability or correlation with genetic distance in oridonin content of only I. rubescens samples may be due to the smaller sample size. Additional samples of I. rubescens would be helpful to clearly determine whether heritability within I. rubescens is in fact low. This, in turn, would help to determine whether there is a genetic component to variation in oridonin production between populations of a single species, such as I. rubescens, and thus, whether additional sampling for high oridonin yield within species is necessary.

Phylogenetic Origins of Oridonin
The distribution of oridonin production on the chloroplast phylogeny cannot be distinguished from a random mapping of the character. By contrast, the distribution of oridonin production has less homoplasy than expected by chance in the context of the ITS phylogeny. The difference in these results is most likely because I. rubescens is not monophyletic in the chloroplast phylogeny, but is better resolved in the ITS phylogeny (Fig. 4). A phylogenetic study of Isodon species from Japan based on chloroplast markers did not resolve any species as monophyletic [28]. The authors attributed this pattern to incomplete lineage sorting of ancestral polymorphisms or hybridization, and suggested that nuclear gene markers such as ITS might be better at resolving phylogenetic relationships [28]. Because the ITS phylogeny is potentially more reliable, a tentative conclusion is that oridonin arose at least three times in Isodon, or less than would be expected if it were random (i.e., shows phylogenetic inertia). However, given the low resolution of the phylogeny, this number could be an underestimate. Further work with additional nuclear phylogenetic markers would be helpful for clarifying the patterns of oridonin evolution.
Oridonin is present in some populations and absent from others in at least seven of the 12 species of Isodon that have been found to produce the chemical (I. amethystoides, I eriocalyx, I japonicus, I nervosus, I phyllostachys, I. rosthornii, I ternifolius; see Table S2). The inter-specific variability of oridonin production was also corroborated by the low quantity of oridonin found in this study in the sample of I. lophanthoides var. micranthus and one parent sample of I. enanderianus, two species that have previously not been reported to produce oridonin. Furthermore, the offspring of the I. enanderianus sample that had a small oridonin yield did not produce oridonin. Therefore it is reasonable to conclude, as others have [29,30], that oridonin production can occur in some populations and not others of the same species.

Population Genetic Variation and Spatial Genetic Structure
Patterns of microsatellite genetic variability are fairly uniform across the Isodon populations, with the exception of samples from the population rubescens (XX) that have a reduced level of variation. Based on the average number of alleles in this population and heterozygosity patterns across loci, it is unlikely that null alleles are causing this deficit in variation. Instead, it is more likely that the reduction in variation in the rubescens (XX) population is due to biological factors, such as a recent population bottleneck (or founder event), or as a consequence of selfing. We used estimates of the maternal fixation rate and the biparental inbreeding rate [49] to test whether selfing is evident in any of the sampled populations. However, the estimates of inbreeding that we obtained have a large sampling error and cannot be distinguished from zero. While selfing is clearly not a dominant component of the mating strategy of Isodon, we are unable to clearly estimate inbreeding rates in these populations.
A principal coordinate analysis indicates that there is extensive overlap in genetic variation in most populations of Isodon, with clear separation of the three southern populations from Hubei and Guizhou provinces, rubescens (YC), enanderianus (QD), and enanderianus (TR). Notably, these three populations can be distinguished from one another based on the first two principal coordinates. Bayesian analysis of genetic structure suggests that the Isodon samples collected for this study are composed of six population clusters. In addition to the three southern populations, henryi (XN) and japonicus (XX) are also identified as distinct clusters. The clusters identified in the Bayesian analysis are geographically separated, with the exception that the sympatric sample of I. japonicus is distinct from I. rubescens at the Xinxiang (XX) site. One complicating factor of this analysis is that the populations also exhibit isolation by distance. Previous studies have shown that clustering methods can over-predict the number of unique clusters if there are large sampling gaps with respect to geographical distance [56]. However, it is noteworthy that different species of Isodon appear to form distinct clusters, though this may simply due to the fact that with the exception of the sympatric species at XX, the different species are separated by large distances.

Conclusions
1. Oridonin has been reported in 12 out of 74 Isodon species examined for diterpenoids. Oridonin has arisen at least three times in Isodon, but further phylogenetic work is needed to clarify patterns of oridonin evolution. 2. Isodon populations are characterized by high levels of gene flow, even between different species. Gene flow, however, is restricted geographically and genetic differentiation is observed among distant populations as evident in both the isolation-bydistance and structure analysis. 3. Microsatellite genetic distance is correlated with quantitative difference in oridonin production when all individual samples of this study are considered. 4. Oridonin production is strongly heritable when comparing different species of Isodon. The strong heritability is indicative that Isodon would respond well to selection and cultivation of species with high oridonin yield. 5. Oridonin yield does not appear to be heritable or correlated with genetic distance when only samples of I. rubescens are considered. However, additional sampling of multiple populations of I. rubescens is needed to confirm this. 6. Collectively, these results suggest that initial screening of Isodon populations for oridonin production should prioritize a broad survey of all species known to produce oridonin, rather than focusing on multiple populations of one species. 7. Of the samples in this study, populations of I. rubescens or I. japonicus from Henan province would be the best source of oridonin.