Phylogenetic relationships of Vepris (Rutaceae) inferred from chloroplast, nuclear, and morphological data

The tribe Toddalieae Hook. F. (Rutaceae) has been controversial since its inception by Bentham and Hooker. The nine taxa examined, Acronychia J.R. & G.Foster, Diphasia Pierre, Diphasiopsis Mendonca, Fagaropsis Mildbr.ex. Siebenl., Oricia Pierre, Teclea Delile, Toddaliopsis Engl., Toddalia Juss. and Vepris Comm. ex. A. Juss, have been recognized under the tribe Toddalieae or Tribes Acronychia, Phellodendron and Toddalia. More recently Araliopsis Engl., Diphasia, Diphasiopsis, Oricia, Teclea, and Toddaliopsis have been incorporated into the genus Vepris, while Toddalia and Fagaropsis have continued to be recognized as closely related. For this study, sequence data of one non-coding chloroplast region (trnL-F) and one nuclear region (ITS) and various morphological characters, based on Mziray’s taxonomic studies were examined to try to elucidate these relationships. This study found that the taxa Diphasia, Diphasiopsis, Oricia, Teclea, Toddaliopsis, Vepris, Toddalia eugeniifolia Engl. and Toddalia glomerata F. Hoffm. form a monophyletic group. Due to the amount of intrageneric and intraspecific variation, species delimitations were difficult to determine; however, these genera should be united into Vepris. The analyses also confirmed that Toddalia asiatica (L.) Lam., Zanthoxylon sp. and Fagaropsis angolensis (Engl.) H.M. Gardner are the closest relatives to this group.


Introduction
Generic circumscriptions within the tribe Toddalieae Hook. F. (Rutaceae) in Africa has been controversial since its inception. Toddalieae is one of seven tribes given to the Rutaceae by Bentham and Hooker (1867) [1]. This tribe along with the tribe Aurantieae were grouped together in (1867) based on ovary and fruit similarities. Engler (1896) [2] rearranged the family to include six subfamilies and ten tribes. To separate the tribes, he mainly used the number of carpels (2-5 in the Toddalieae). Later he (1917) [3] described a new genus, Humblotiodendron Engl., under the Toddalieae; however, Perrier (1948) [4] reduced it to synonymy in Vepris. Verdoorn (1926) [5] revised the African Toddalieae and recognized seven genera but the available material for study was inadequate and the key to the genera ignored the fact that that many plants are dioecious so male specimens cannot be identified. Since then more material a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 has been collected and additional taxa have been described, allowing the overlap and character-inconsistency between genera to become evident. Engler, (1931 [6]) increased the number of tribes to eleven and subdivided the Tolddalieae into six subtribes (Table 1). A chemosystematic review of the family by da Silva et al. (1988 [7]) proposed major changes in the Englerian scheme. In this proposed scheme Rutoideae and Toddaliodeae are united, the subtribal groups are dispensed with, and most of the taxa are brought together with the Australasian genus Table 1 Kokwaro (1982) [8] did not recognize a higher classification system. Hall and Waterman, (1979) [9] synonymized Oriciopsis Engl. into Vepris. The most recent taxonomic studies in the Toddalieae was completed by Mziray (1992) [10], who, based on morphology, recognized three genera, rather than the original nine; the three genera are Vepris, Toddalia and Fagaropsis. The largest genus is Vepris, which incorporates Araliopsis Engl., Diphasia, Diphasiopsis, Oricia, Teclea, and Toddaliopsis. Accordingly, the nomenclature used in this paper is that of Mziray (1992) [10] ( Table 1). Subfamilial phylogenetic analyses were completed for the Rutaceae by Chase et al. (1999) [11], Groppo et al. (2008) [12], Poon et al. (2007) [13], and Morton and Telmer (2014) [14], using evidence from rbcL, atpB, rps16, trnL-trnF, trnL-F, xdh, and ITS sequence variation. None of the above authors, included taxa from Araliopsis, Diphasia, Diphasiopsis, Oricia, Teclea, or Toddaliopsis. Only two unidentified species of Vepris were included by Groppo et al. (2008) [12], so, their relationship to each other and to other taxa of Rutaceae based on molecular techniques needs to be examined in order to assess the degree of congruence with morphological characters.
The goals of this study are (1) to evaluate the genera within Vepris as recognized by Mziray (1992) [10]; (2) to test the monophyly of the Vepris and to identify the closest relatives; (3) to examine the relationship based on congruence of morphology and molecular characters.

Methods
For this study, one non-coding chloroplast region (trnL-trnF), as well as, one nuclear region (ITS) and various morphological characters were selected. The trnL-trnF region consists of the trnL intron and the trnL-trnF intergenic spacer [15]. ITS consists of three genes that code for the 18S, 5.8S and 26S ribosomal subunits. The three genes are separated by two internal transcribed spacers, ITS1 between 18S and 5.8S and ITS2 between 5.8S and 26S. In addition to the extensive use of the rapidly evolving ITS spacer sequences in phylogenetic studies at lower levels [16,17], the sequences have also served to resolve intrafamilial relationships [18]. Morphological characters were taken from information in Mziray ([10], pages 43-45) on taxonomic studies in Toddalieae.

Taxon sampling & DNA extraction
Vouchers for the 85 species used in this study along with the GenBank accession numbers are listed in the Table 2. Some specimens were collected using a National Geographic Society Grant in association with the University of Dar es Salaam (COSTECH and USD immigration permits were issued). The following taxa were not sequenced due to a lack of material: Araliopsis and Oriciopsis. The total genomic DNA was extracted from (0.5-1.0 g) fresh or dried leaf material. Leaves were ground with a mortar and pestle and subsequently treated with the DNEasy plant DNA extraction kit from Qiagen (Qiagen, Valencia, California, USA) following the manufacturer's protocol. Alignments were made using the Sequencher software program (Gene Codes Corporation, Ann Arbor, MI), for each marker and also the broader trnL-F alignment with sampling across all Rutaceae subfamilies including Meliaceae and Simaroubaceae as outgroups.

trnL-trnF
The trnL intron and the trnL-trnF intergenic spacer for 80 (Oricia samples could not be amplified) species were amplified. PCR was performed using the universal primers trn-c, trn-d, trn- Table 2. GenBank accession numbers for molecular data sets from one chloroplast marker (trnL) and one nuclear marker (ITS). Newly sequenced taxa for this study are in bold with voucher information. All remaining GenBank accession numbers are from previous studies, as indicated by footnotes. References indicated by superscript after number and papers listed below.

3262937, MO KU193658
Phebalium woombye Domin JX307300 [14] (Continued) e, and trn-f as described by Taberlet et al. (1991) [15]. For some samples the entire trnL intron/ trnL-trnF spacer region was amplified with trn-c and trn-f. In others, two separate amplifications were performed, one to amplify the trnL intron with trn-c and trn-d and the other to amplify the trnL-trnF spacer with trn-e and trn-f. The DNA fragment amplified using these

ITS
The amplification of the ITS gene was performed successfully on 35 species using oligonucleotide primers ITS1/ITS4 [19]

Cycle sequencing
The PCR products were cleaned using the QIAGEN QIAquick PCR purification kit (QIAGEN Inc., Chatsworth, California, USA) following the protocols provided by the manufacturer. Cleaned products were then directly sequenced using the ABI PRISM Dye Terminator Cycle Sequencing Ready Kit with AmpliTaq DNA Polymerase (Applied Biosystems Inc., Foster City, California, USA). Unincorporated dye terminators were removed using the QIAGEN DyeEx dye-terminator removal system (QIAGEN Inc.) following the manufacturer's recommendations. Samples were then loaded into an ABI 3100 DNA Sequencer. The sequencing data was analyzed and edited using the Sequencher software program (Gene Codes Corporation, Ann Arbor, Michigan, USA).

Morphological characters
Thirty-three characters are morphological. Seventeen characters were coded as unordered binary and 16 as multistate. All characters were variable. All analyses were conducted as stated in the analysis section. Character states of taxa were taken from Mziray ([10], pages 43-45).

Phylogenetic analysis
Boundaries of the trnL intron, and the ITS nuclear gene were determined by comparison with sequences in Genbank. The following two alignment criteria and methodology were used: (1) when two or more gaps were not identical but overlapping, they were scored as two separate events and (2) phylogenetically informative indels (variable in two or more taxa) were scored as one event at the end of the data set. All DNA sequences reported in the analyses have been deposited in Genbank (Appendix 1). All phylogenetic analyses employed maximum-parsimony with the heuristic search option in PAUP Ã 4.0b8 [20] with uninformative characters excluded. Searches were conducted with 1000 random-taxon-addition replicates with TBR branch swapping, steepest descent, and Mul-Trees selected with all characters and states weighted equally and unordered. All trees from the replicates were then swapped onto completion, all shortest trees were saved, and a strict consensus or majority rule tree was computed. Relative support for individual clades was estimated with the bootstrap method [21]. One thousand pseudoreplicates were performed with uninformative characters excluded. Ten random-taxon-addition heuristic searches for each pseudoreplicate were performed and all minimum-length trees were saved for each search. To reduce bootstrap search times, branches were collapsed if their minimum length was zero ("amb-").
Unambiguous morphological state changes were identified by using a combined analysis and MacClade 4.0 [22].
Bayesian analyses were performed using MrBayes 3.1.1 [23] on the combined molecular datasets accessed through the CIPRES portal [24]. The substitution model for each DNA region was selected with MrModeltest 2.3 [25] under the Akaike information criterion (AIC). The parameters for the Bayesian analyses were as follows: nst = 6; rates = gamma; set autoclose = yes; mcmcp ngnen = 10000; samplefreq = 10; savebriens = yes and the first 25% of the trees were regarded as "burn in". Branch lengths are averaged from the distribution of trees and the posterior probability values (BPP) for the branches reported [25].
For the likelihood analyses, the program MrModelltest 2.3 [25] was used to select the models of nucleotide evolution. Maximum likelihood trees were calculated using the web service for GARLI 2.1 ( [26]., 2014; available at www.molecularevolution.org). A total of 1000 replicates were conducted using the combined datasets and using the GTR + I + G model.
To determine the combinability of the data sets, their data structures were compared using methods outlined by Mason-Gamer and Kellogg (1996) [27], who discussed various ways to assess conflict between data sets. In one method the combination of independent data sets is possible if the trees do not conflict or if conflict receives low bootstrap support. Therefore, each node on the independent trees is tested for congruence against the other. If the nodes do not contain conflicting information, they are congruent and the data sets are combinable. Where there are incongruent nodes, the bootstrap values for each node are examined. If the support is less than 70%, there is no hard conflict and the incongruence is interpreted as being due to chance. In this study the different data sets were analyzed independently and in combination to see how each data set changed or confirmed the tree topologies of each other and to adopt a hypothesis of phylogenetic relationships for the tribes and genera.

Results
The inclusion of gap coding in all data sets containing molecular data resulted in more homoplasy and lack of resolution; therefore, gap coding was not used in the following results. Genbank sequences KU193 625-KU193689 were specifically generated for this study.

Larger trnL-trnF family analysis
Multiple sequence alignment of 78 Rutaceae and two closely related taxa resulted in a data matrix of 1038 characters. No regions were excluded. Of the 1038 positions constituting the aligned trnL-trnF sequences, 690 (66%) were variable and 392 (38%) were parsimony-informative. The analysis recovered 390 equally optimal trees of 1281 steps (CI = 0.54, RI = 0.71; Fig 1).
Vepris is not monophyletic in the majority rule tree because Toddalia eugeniifolia, and Toddalia glomerata fall within the clade.  Combined molecular data using parsimony Following the methods outlined by Mason-Gamer and Kellogg (1996) [27], the data sets were considered combinable. Within each gene analysis, trnL-trnF, and ITS, Vepris was not monophyletic and needs to be re-circumscribed. Among the molecular trees there was only one conflicting node with bootstrap support greater than 75% as follows: within the trnL-trnF two species of Teclea simplicifolia and Teclea nobilis (BS 84%) formed a clade whereas in the ITS two species of Teclea simplicifolia (BS 92%) formed a clade sister to a clade containing Teclea nobilis, Teclea trichocarpa and Toddaliopsis heterophylla. The conflict is in the position of T. nobilis which has no BS support in the ITS clade, therefore congruence exists between the data sets and a combined molecular analysis was completed. Vepris consists of three clades labeled as A, B. and C. Clade A contains ((((Teclea hanangensis and Diphasiopsis fadenii BS 100%) Oricia swynnertonii BS 93%) and Vepris stolzii) (BS 99%)).
In the Bayesian combined analysis, the group of exemplars from Vepris were not monophyletic because Toddalia eugeniifolia and Toddalia glomerata is within the clade; Fig 3. In all but two branches, the posterior probability values were higher than 99%. The only difference between the majority rule tree and the Bayesian tree was in clade B. In the majority rule tree, positions of ((Teclea nobilis, T. trichocarpa) and Toddaliopsis herterophylla)) formed a non-supported clade whereas in Bayesian tree, these taxa formed a grade. In addition, in clade B, a sample of Teclea nobilis in the majority rule tree at the base of the Teclea grandifolia, T.simplicifolia, and Toddalia eugenifolia clade whereas in the Bayesian tree Teclea nobilis was at the base of both the clades. None of the majority rule clades had support over 50% therefore there was no conflict.
In the maximum likelihood analysis, the group of exemplars from Vepris were not monophyletic because of Toddalia eugeniifolia and Toddalia glomerata occur within the clade ; Fig 4. The differences between the majority rule tree and the maximum likelihood trees was in clades B and C. In clade C, in the majority rule tree, Teclea amanuensis and Teclea tricholcarpa formed a non-supported grade to the clade containing Toddaliopsis sansibarensis whereas in the maximum likehood analysis they were unresolved. In clade B, in the majority rule tree, ((Teclea nobilis, T. trichocarpa) and Toddaliopsis herterophylla)) formed a clade whereas in the maximum likehood analysis they were unresolved.

Combined molecular and morphological evidence
A reduced analysis containing 18 taxa was examined using morphological characters taken from Mziray (1992) [10] taxonomic studies in Toddalieae and the ITS and trnL molecular  Phylogenetic relationships of Vepris datasets from above. The selection of taxa was driven from the morphology matrix contained in Mziray (1992) [10]. Multiple sequence alignment of this group resulted in a matrix of 1848 characters, of which 642 (35%) were variable and 299 (16%) were parsimony informative. The analysis recovered 81 equally optimal trees of 717 steps (CI = 0.59, RI = 0.62; Fig. not shown).

Phylogenetic utility of the genes (trnL-trnF and ITS)
The respective numbers of variable and potentially phylogenetically informative characters in each dataset, the consistency indices and the numbers of branches with bootstrap support above 75% can be found in Table 3. The ITS sequences produced more parsimony-informative characters when compared with the trnL-trnF. The ITS gene also had the highest number of resolved branches at or above 75% bootstrap support when compared with the trnL-trnF. The combined parsimony analysis had 20 nodes at or above 75% bootstrap support whereas the trnL-trnF had 3 nodes and the ITS gene had 9 nodes. There was no correlation between the increase of the CI and RI values and the increase or decrease in the number of informative characters.

Monophyly of Vepris and its closest relatives
The assembly of a larger "trnL-F" dataset including 78 taxa of Rutaceae was completed to determine the outgroup relationship of Vepris. Based on this analysis 19 species formed a supported clade (BS 70%) including Diphasia, Diphasiopsis, Teclea, Toddalia, Toddaliopsis, and Vepris. The genus Acronychia, once grouped with the above taxa, form a clade with members of Baurella (= Sarcomelicope), Correa, Diplolaena, Eriostemon, Flindersia, Lunasia, Melicope, Phebalium, Pilocarpus, and Sarcomelicope. At the base of these clades were member of Aurantioideae, Rutoideae and Cneoroideae. We used the genus Harrisonia in the Cneorioideae as the outgroup for the combined molecular analysis.

Circumscription of Vepris
Both independent molecular and combined analysis of the molecular and morphological data supported that fact that the Vepris should be merged with other Rutaceae genera, as previously postulated by Mziray (1992) [10]. The present study examined thirty-three morphological characters, including vegetative, floral, fruit and seed features. The analysis was based on 18 taxa that Mziray (1992) [10] had included in his revision and Acronychia was used as the outgroup because morphological features were readily available. Only four characters provided unequivocal synapomorphies for several clades. The occurance of prickles and scrambling habit were two unequivocal synapomorphies defining Toddalia asiatica, while a stony endocarp feature grouped Toddalia asiatica and Fagaropsis angolensis. Species of Telcea simplicifolia and Toddalia eugeniifolia were grouped based on the unequivocal synapomorphies of unifoliolate leaf type and dominant number of leaflets. Vepris drummondii also shares these features but it was not included in this study and must be further examined. Mziray (1992) [10] stated that it is rather common to find the traditional morphological characters used to be inconsistent within a species or even in the same specimen. In addition, he stated that the variation within Diphasia makes the distinction among Diphasia, Teclea and Vepris very unclear. He goes on to state that his study did not reveal other distinguishing characters with enough stability to maintain all the genera. He found that Vepris, Toddaliopsis, Araliopsis, Diphasiopsis, Teclea, Oricia and Diphasia formed a monophyletic group whereas Toddalia asiatica and Fagaropsis form a clade at the base of the tree. This was also evident in Fig 2 where the above genera were intermixed. Furthermore, taxa which have more than one species such as Vepris stozii, Teclea simplicifolia, Teclea trichocarpa, and Teclea nobilis did not group together (Fig 2). Most characters within these genera contain a great amount of overlap in intrageneric and intraspecific variation. For example, the leaf morphology of Vepris and Teclea contain overlap, whereas some characters (mature seeds and both stages of the hermaphrodite flowers) are difficult to interpret because of the limited amount of material available. All specimens were personally keyed out by the author, but this did not eliminate the confusion. Just as Verdoom (1926) discovered the lack of material at all stages made the task of defining satisfactory diagnostic characters for the genera and species practically impossible. However, Verdoorn's reliance was laid too heavily on just a few morphological characters, mainly on the number of stamens relative to petals and the number of ovary cells. These characters are very plastic and this led to the recognition of rather artificial groups which have proved to be difficult to demarcate. This interspecific variation was made clear based on the molecular dataset and the analysis (Fig 2). Mziray (1992) [10] mentions that Oricia species and some species of Diphasia and Diphasiopsis seem to be allied. He believed this group shared the tendency to become pubescent especially in the inflorescence and fruit; however, he also stated that no clear consistent set of characters seem to hold the group together. We found that Oricia, Teclea hanangensis, Diphasiopsis fadenii and Vepris stolzii did form a strongly supported clade which we call Clade A (BS 99) (Fig 2). Hartley (1975) [28], stated that members of Bauerella Borzi, is sometimes confused with Acronychia because both bear dioecious flowers and opposite leaves. In the larger trnL-F analysis, Acronychia, Bauerella, Sarcomeliocope and Meliocope formed a clade (BS 71). Bauerella is currently acknowledged as a synonym of Sarcomeliocope.

Conclusions
Vepris should be regrouped to contain the following taxa: Vepris, Toddaliopsis, Diphasiopsis, Teclea, Oricia, Diphasia, Toddalia glomerata, and Toddalia eugeniifolia, which are considered to represent a monphyletic group. The above taxa display intrageneric and intraspecific variation. This study also confirms that Toddalia asiatica, Zanthoxylon sp. and Fagaropsis angolensis are the closest relatives to this group.
Based on the number of informative characters and the number of branches with supported, ITS was an excellent candidate for this study. In addition, ITS produced very few alignment difficulties within the ingroup and outgroup, and its tree topology remained consistent with that of the chloroplast gene.
The phylogenetic analysis presented here provided the first study to examine the African Rutaceae using molecular data (chloroplast and nuclear), as well as morphological data. Topics to be addressed in a future study include the use of additional material and genera for determinations of species delimitations and tribal groupings.