Common gardens in teosintes reveal the establishment of a syndrome of adaptation to altitude

In plants, local adaptation across species range is frequent. Yet, much has to be discovered on its environmental drivers, the underlying functional traits and their molecular determinants. Genome scans are popular to uncover outlier loci potentially involved in the genetic architecture of local adaptation, however links between outliers and phenotypic variation are rarely addressed. Here we focused on adaptation of teosinte populations along two elevation gradients in Mexico that display continuous environmental changes at a short geographical scale. We used two common gardens, and phenotyped 18 traits in 1664 plants from 11 populations of annual teosintes. In parallel, we genotyped these plants for 38 microsatellite markers as well as for 171 outlier single nucleotide polymorphisms (SNPs) that displayed excess of allele differentiation between pairs of lowland and highland populations and/or correlation with environmental variables. Our results revealed that phenotypic differentiation at 10 out of the 18 traits was driven by local selection. Trait covariation along the elevation gradient indicated that adaptation to altitude results from the assembly of multiple co-adapted traits into a complex syndrome: as elevation increases, plants flower earlier, produce less tillers, display lower stomata density and carry larger, longer and heavier grains. The proportion of outlier SNPs associating with phenotypic variation, however, largely depended on whether we considered a neutral structure with 5 genetic groups (73.7%) or 11 populations (13.5%), indicating that population stratification greatly affected our results. Finally, chromosomal inversions were enriched for both SNPs whose allele frequencies shifted along elevation as well as phenotypically-associated SNPs. Altogether, our results are consistent with the establishment of an altitudinal syndrome promoted by local selective forces in teosinte populations in spite of detectable gene flow. Because elevation mimics climate change through space, SNPs that we found underlying phenotypic variation at adaptive traits may be relevant for future maize breeding. Author summary Across their native range species encounter a diversity of habitats promoting local adaptation of geographically distributed populations. While local adaptation is widespread, much has yet to be discovered about the conditions of its emergence, the targeted traits, their molecular determinants and the underlying ecological drivers. Here we employed a reverse ecology approach, combining phenotypes and genotypes, to mine the determinants of local adaptation of teosinte populations distributed along two steep altitudinal gradients in Mexico. Evaluation of 11 populations in two common gardens located at mid-elevation pointed to adaptation via an altitudinal multivariate syndrome, in spite of gene flow. We scanned genomes to identify loci with allele frequencies shifts along elevation, a subset of which associated to trait variation. Because elevation mimics climate change through space, these polymorphisms may be relevant for future maize breeding.


Introduction
More modest structural changes include megabase-scale inversions that harbor 121 clusters of SNPs whose frequencies are associated with environmental variation [55, 122 56]. Also, differentiation-and correlation-based genome scans in teosinte populations 123 succeeded in finding outlier SNPs potentially involved in local adaptation [57,58]. 124 But a link with phenotypic variation has yet to be established. 125 In this paper, we genotyped a subset of these outlier SNPs on a broad sample 126 of 28 teosinte populations, for which a set of neutral SNPs was also available; as well 127 as on an association panel encompassing 11 populations. We set up common gardens 128 in two locations to evaluate the association panel for 18 phenotypic traits over two 129 consecutive years. Individuals from this association panel were also genotyped at 38 130 microsatellite markers to enable associating genotypic to phenotypic variation while 131 controlling for sample structure and kinship among individuals. We addressed three 132 among 37 populations (S1 Table)  Nelson, E. Guevara, 2008, Hole-filled seamless SRTM data V4, International Centre 165 for Tropical Agriculture (CIAT), available from http://srtm.csi.cgiar.org). 166 167 We gathered phenotypic data during two consecutive years (2013 and 2014). 168 We targeted 18 phenotypic traits that included six traits related to plant architecture, 169 three traits related to leaves, three traits related to reproduction, five traits related to 170 grains, and one trait related to stomata (S2 Table). Each of the four experimental assays (year-field combinations) encompassed four blocks. In each block, we 172 evaluated one offspring (half-sibs) of ~15 mother plants from each of the 11 teosinte 173 populations using a semi-randomized design. After filtering for missing data, the 174 association panel included 1664 teosinte individuals. We found significant effects of 175 Field, Year and/or their interaction for most traits, and a highly significant Population 176 effect for all of them (model M1, S3 Table). 177 We investigated the influence of altitude on each trait independently. All traits, 178 except for the number of nodes with ears (NoE), exhibited a significant effect of 179 altitude (S3 Table, Table). We obtained higher heritability for grain related 191 traits when mother plant measurements were included in the model with 0.631 (sd = 192 0.246), 0.511 (sd = 0.043) and 0.274 (sd = 0.160) for grain length, weight and width, 193 respectively, suggesting that heritability was under-estimated for other traits where 194 mother plant values were not available. 195

Multivariate analysis of phenotypic variation and correlation between traits. 207
Principal component analysis including all phenotypic measurements 208 highlighted that 21.26% of the phenotypic variation scaled along PC1 (Fig 3A), a PC 209 axis that is strongly collinear with altitude ( Fig 3B). Although populations partly 210 overlapped along PC1, we observed a consistent tendency for population phenotypic 211 differentiation along altitude irrespective of the gradient (Fig 3C). Traits that 212 correlated the most to PC1 were related to grain characteristics, tillering, flowering 213 and to a lesser extent to stomata density ( Fig 3B). PC2 correlated with traits 214 exhibiting a trend toward increase-with-elevation within parviglumis, but decrease-215 with-elevation within mexicana (Fig 3D). Those traits were mainly related to 216 vegetative growth ( Fig 3B). Together, both axes explained 37% of the phenotypic 217 and gradients a and b (Fig 4A & B). TreeMix analysis for a subset of 10 of these 247 populations confirmed those results with an early split separating the lowlands from 248 gradient a (cf. K=2, S6 Fig) followed by the separation of the three mexicana from the 249 remaining populations (Fig 4C). TreeMix further supported three migration edges, a 250 model that explained 98.75% of the variance and represented a significant 251 improvement over a model without admixture (95.7%, Figure S7). This admixture 252 model was consistent with gene flow between distant lowland parviglumis 253 populations from gradient a and b, as well as between parviglumis and mexicana 254 populations ( Fig 4C). Likewise, structure analysis also suggested admixture among 255 some of the lowland populations, and to a lesser extent between the two subspecies 256  Table) Together, these two methods indicated that phenotypic divergence among populations 289 was driven by local selective forces. Altogether, evidence of spatially varying selection at 10 traits (Table 1)  We successfully genotyped 218 (~81%) out of 270 outlier SNPs on a broad set 298 of 28 populations, of which 141 were previously detected in candidate regions for 299 local adaptation [58]. Candidate regions were originally identified from re-sequencing 300 data of only six teosinte populations (S1 Table) following an approach that included 301 high differentiation between highlands and lowlands, environmental correlation, and 302 in some cases their intersection with genomic regions involved in quantitative trait 303 variation in maize. The remaining outlier SNPs (77) were discovered in the present 304 study by performing F ST -scans on the same re-sequencing data (S5 Table). We 305 selected outlier SNPs that were both highly differentiated between highland and 306 lowland populations within gradient (high/low in gradient a or b or both), and 307 between highland and lowland populations within subspecies in gradient b (high/low 308 within parviglumis, mexicana or both). We first employed multiple regression to test for each SNP, whether the 326 pairwise F ST matrix across 28 populations correlated to the environmental (distance 327 along PCenv1) and/or the geographical distance. As expected, we found a 328 significantly greater proportion of environmentally-correlated SNPs among outliers 329 compared with neutral SNPs (χ² =264.07, P-value=2.2 10 -16 ), a pattern not seen with 330 geographically-correlated SNPs. That outlier SNPs displayed a greater isolation-by-331 environment than isolation-by-distance, indicated that patterns of allele frequency 332 differentiation among populations were primarily driven by adaptive processes. We 333 further tested correlations between allele frequencies and environmental variation. 334 Roughly 60.8% (104) of the 171 outlier SNPs associated with at least one of the two first PCenvs, with 87 and 33 associated with PCenv1 and PCenv2, respectively, and 336 little overlap (S5 Table). As expected, the principal component driven by altitude 337 (PCenv1) correlated to allele frequency for a greater fraction of SNPs than the second 338 orthogonal component. Interestingly, we found enrichment of environmentally-339 associated SNPs within inversions both for PCenv1 (χ² = 14.63, P-value=1.30 10 -4 ) 340 and PCenv2 (χ² = 33.77, P-value=6.22 10 -9 ). 341 342 Associating genotypic variation to phenotypic variation. 343 We tested the association between phenotypes and 171 of the outlier SNPs 344 (MAF>5%) using the association panel. For each SNP-trait combination, the sample 345 size ranged from 264 to 1068, with a median of 1004 individuals (S6 Table). We used 346 SSRs to correct for both structure (at K=5) and kinship among individual genotypes. 347 This model (M5) resulted in a uniform distribution of P-values when testing the 348 association between genotypic variation at SSRs and phenotypic trait variation (S10 349  Table). Ninety-three (73.8%) out of the 126 353 associated SNPs were common to at least two traits, and the remaining 33 SNPs were 354 associated to a single trait (S5 Table). The ten traits displaying evidence of spatially 355 varying selection in the Q ST -F ST analyses displayed more associated SNPs per trait 356 (30.5 on average), than the non-spatially varying traits (12.75 on average). 357 A growing body of literature stresses that incomplete control of population 358 stratification may lead to spurious associations [61]. Hence, highly differentiated 359 traits along environmental gradients are expected to co-vary with any variant whose allele frequency is differentiated along the same gradients, without underlying causal 361 link. We therefore expected false positives in our setting where both phenotypic traits 362 and outlier SNPs varied with altitude. We indeed found a slightly significant 363 correlation (r=0.5, P-value=0.03) between the strength of the population effect for 364 each trait -a measure of trait differentiation (S3 Table) -and its number of associated 365 SNPs (S5 Table). 366 To verify that additional layers of structuring among populations did not cause 367 an excess of associations, we repeated the association analyzes considering a 368  Table). Among the 126 SNPs associating with at least 371 one trait at K=5, only 22 were recovered considering 11 populations. An additional 372 SNP was detected with structuring at 11 populations that was absent at K=5. Eight 373 traits displayed no association, and the remaining traits varied from a single 374 associated SNP (Leaf length -LeL and the number of tillers -Til) to 8 associated 375 SNPs for grain weight (S5 Table). Altogether the 23 SNPs recovered considering a neutral genetic structure with 394 11 populations corresponded to 30 associations, 7 of the SNPs being associated to 395 more than one trait (S5 Table). For all these 30 associations except in two cases (FFT 396 with SNP_7, and MFT with SNP_28), the SNP effect did not vary among populations 397 (non-significant SNP-by-population interaction in model M5' when we included the 398 SNP interactions with year*field and population). For a subset of two SNPs, we 399 illustrated the regression between the trait value and the shift of allele frequencies 400 with altitude (Fig 6 A&B). We estimated corresponding additive and dominance 401 effects (S7 Table). In some cases, the intra-population effect corroborated the inter-402 population variation with relatively large additive effects of the same sign ( dominant. In other cases, the results were more difficult to interpret with negligible 405 additive effect but extremely strong dominance (S7 Table,   Among the 171, the subset of 23 phenotypically-associated SNPs (detected 433 when considering the 11-population structure) displayed an excess of elevated LD values -out of 47 pairs of SNPs phenotypically-associated to a same trait, 16 pairs 435 were contained in the 5% higher values of the LD distribution of all outlier SNP pairs. 436 Twelve out of the 16 pairs related to grain weight, the remaining four to leaf 437 coloration, and one pair of SNPs was associated to both traits. Noteworthy was that 438 inversions on chromosomes 1, 4, and 9, taken together, were enriched for 439 phenotypically-associated SNPs (χ² = 8.95, P-value=0.0028). We recovered a 440 borderline significant enrichment with the correction K=5 (χ² = 3.82, P-value=0.051). 441 Finally, we asked whether multiple SNPs contributed independently to the 442 phenotypic variation of a single trait. We tested a multiple SNP model where SNPs 443 were added incrementally when significantly associated (FDR < 0.10). We found 2, 3 444 and 2 SNPs for female, male flowering time and height of the highest ear, 445 respectively (S5 Table) Our multivariate analysis of teosinte phenotypic variation revealed a marked 475 differentiation between teosinte subspecies along an axis of variation (21.26% of the 476 total variation) that also discriminated populations by altitude (Fig 2A & B). The 477 combined effects of assortative mating and environmental elevation variation may 478 generate, in certain conditions, trait differentiation along gradients without underlying 479 divergent selection [77]. While we did not measure flowering time differences among 480 populations in situ, we did find evidence for long distance gene flow between 481 gradients and subspecies (Fig 4 A & C). In addition, several lines of arguments 482 suggest that the observed clinal patterns result from selection at independent traits and 483 is not solely driven by differences in flowering time among populations. First, two distinct methods accounting for shared population history concur with signals of 485 spatially-varying selection at ten out of the 18 traits ( Table 1) Spatially-varying traits that displayed altitudinal trends, collectively defined a 497 teosinte altitudinal syndrome of adaptation characterized by early-flowering, 498 production of few tillers albeit numerous lateral branches, production of heavy, long 499 and large grains, and decrease in stomata density. We also observed increased leaf 500 pigmentation with elevation, although with a less significant signal (S3 Table), 501 consistent with the pronounced difference in sheath color reported between 502 parviglumis and mexicana [78,79]. Because seeds were collected from wild 503 populations, a potential limitation of our experimental setting is the confusion 504 between genetic and environmental maternal effects. Environmental maternal effects 505 could bias upward our heritability estimates. However, our results corroborate 506 previous findings of reduced number of tillers and increased grain weight in mexicana 507 compared with parviglumis [80]. Thus although maternal effects could not be fully 508 discarded, we believe they were likely to be weak. 509 The trend towards depleted stomata density at high altitudes (S3 Fig)  Although we did not formally measure biomass production, the lower number 530 of tillers and higher amount and size of grains in the highlands when compared with 531 the lowlands may reflect trade-offs between allocation to grain production and 532 vegetative growth [88]. Because grains fell at maturity and a single teosinte individual 533 produces hundreds of ears, we were unable to provide a proxy for total grain 534 production. The existence of fitness-related trade-offs therefore still needs to be 535 formally addressed. 536 Beyond trade-offs, our results more generally question the extent of 537 correlations between traits. In maize, for instance, we know that female and male Nevertheless, correction for sample structure is key for statistical associations 557 between genotypes and phenotypes along environmental gradients. This is because 558 outliers that display lowland/highland differentiation co-vary with environmental 559 factors, which themselves may affect traits [95]. Consistently, we found that 73.7% 560 SNPs associated with phenotypic variation at K=5, but only 13.5% of them did so 561 when considering a genetic structure with 11 populations. Except for one, the latter 562 set of SNPs represented a subset of the former. Because teosinte subspecies 563 differentiation was fully accounted for at K=5 (as shown by the clear distinction 564 between mexicana populations and the rest of the samples, Fig 4A), the inflation of 565 significant associations at K=5 is not due to subspecies differentiation, but rather to 566 residual stratification among populations within genetic groups. Likewise, recent 567 studies in humans, where global differentiation is comparatively low [96] have shown 568 that incomplete control for population structure within European samples strongly 569 impacts association results [61,97]. Controlling for such structure may be even more 570 critical in domesticated plants, where genetic structure is inferred a posteriori from 571 genetic data (rather than a priori from population information) and pedigrees are 572 often not well described. Below, we show that considering more than one correction 573 using minor peaks delivered by the Evanno statistic (S5 Fig) can be informative. 574 Considering a structure with 5 genetic groups, the number of SNPs associated 575 per trait varied from 1 to 55, with no association for leaf and grain coloration (S5 576   Table). False positives likely represent a greater proportion of associations at K=5 as 577 illustrated by a slight excess of small P-values when compared with a correction with 578 11 populations for most traits (S10 Fig). Nevertheless, our analysis recovered credible 579 candidate adaptive loci that were no longer associated when a finer-grained 580 population structure was included in the model. For instance at K=5, we detected 581 Sugary1 (Su1), a gene encoding a starch debranching enzyme that was selected during 582 maize domestication and subsequent breeding [98,99]. We found that Su1 was 583 associated with variation at six traits (male and female flowering time, tassel branching, height of the highest ear, grain weight and stomata density) pointing to 585 high pleiotropy. A previous study reported association of this gene to oil content in 586 teosintes [100]. In maize, this gene has a demonstrated role in kernel phenotypic 587 differences between maize genetic groups [101]. Su1 is therefore most probably a 588 true-positive. That this gene was no longer recovered with the 11-population structure 589 correction indicated that divergent selection acted among populations. Indeed, allelic 590 frequency was highly contrasted among populations, with most populations fixed for 591 one or the other allele, and a single population with intermediate allelic frequency. 592 With the 11-population correction, very low power is thus left to detect the effect of 593 Su1 on phenotypes. 594 Although the confounding population structure likely influenced the genetic 595 associations, experimental evidence indicates that an appreciable proportion of the 596 variants recovered with both K=5 and 11 populations are true-positives (S5 Table).  Table). 611 The proportion of genic SNPs associated to phenotypic variation was not 612 significantly higher than that of non-genic SNPs (i.e, SNPs >1kb from a gene) (χ² (df=1) 613 = 0.043, P-value = 0.84 at K=5 and χ ² (df=1) =1.623, P-value =0.020 with 11 614 populations) stressing the importance of considering both types of variants [106]. For 615 instance, we discovered a non-genic SNP (SNP_149) that displayed a strong 616 association with leaf width variation as well as a pattern of allele frequency shift with 617 altitude among populations (Fig 6B). width quantitative trait locus in maize [106]. Corroborating these results, we found consistent association between the only SNP located within this inversion and leaf 635 width variation in teosinte populations (S5 Table) Table) under spatially-varying selection. Because traits co-varied with environmental differences along gradients, however, statistical associations between genotypes and 660 phenotypes largely depended on control of population stratification. Yet, several of 661 the variants we uncovered seem to underlie adaptive trait variation in teosintes. 662 Adaptive teosinte trait variation is likely relevant for maize evolution and breeding. 663 Whether the underlying SNPs detected in teosintes bear similar effects in maize or 664 whether their effects differ in domesticated backgrounds will have to be further 665 investigated.

Description of teosinte populations and sampling. 669
We used 37 teosinte populations of mexicana (16) and parviglumis (21)  670 subspecies from two previous collections [57,58,112] to design our sampling. These 671 populations (S1 Table) are distributed along two altitudinal gradients (Fig 1) We defined an association panel of 11 populations on which to perform a 683 genotype-phenotype association study (S1 Table) populations from a first gradient (a) -two mexicana and three parviglumis, and six 689 populations from a second gradient (b) -one mexicana and five parviglumis (Fig 1).
Finally, we extracted available SNP genotypes generated with the 691 MaizeSNP50 Genotyping BeadChip for 28 populations out of our 37 populations [57] 692 (S1 Table). From this available SNP dataset, we randomly sampled 1000 SNPs found 693 to display no selection footprint [57], hereafter neutral SNPs. Data for neutral SNPs 694 (Data S1) are available at: 10.6084/m9.figshare.9901472. We used this panel of 28 695 populations to investigate correlation with environmental variation. Note that 10 out 696 of the 28 populations were common to our association panel, and genotypes were 697 available for 24 to 34 individuals per population, albeit different from the ones of our 698 association mapping panel. 699

Common garden experiments 700
We used two common gardens for phenotypic evaluation of the association 701 panel (11 populations The original sampling contained 15 to 22 mother plants per population. Eight 710 to 12 grains per mother plant were sown each year in individual pots. After one 711 month, seedlings were transplanted in the field. Each of the four fields (2 locations, 2 712 years) was separated into four blocks encompassing 10 rows and 20 columns. We 713 evaluated one offspring of ~15 mother plants from each of the 11 teosinte populations 714 in each block, using a semi-randomized design, i.e. each row containing one or two 715 individuals from each population, and individuals being randomized within row, 716 leading to a total of 2,640 individual teosinte plants evaluated. 717 718 SSR genotyping and genetic structuring analyses on the association panel 719 In order to quantify the population structure and individual kinship in our 720 association panel, we genotyped 46 SSRs (S4 Table) We employed STRUCTURE Bayesian classification software to compute a 727 genetic structure matrix on individual genotypes. Individuals with over 40% missing 728 data were excluded from analysis. For each number of clusters (K from 2 to 13), we 729 performed 10 independent runs of 500,000 iterations after a burn-in period of 50,000 730 iterations, and combined these 10 replicates using the LargeKGreedy algorithm from 731 the CLUMPP program [118]. We plotted the resulting clusters using DISTRUCT 732 software. We then used the Evanno method [119] to choose the optimal K value. We 733 followed the same methodology to compute a structure matrix from the outlier SNPs. 734 We inferred a kinship matrix K from the same SSRs using SPAGeDI [120]. Grain Color), and one trait related to Stomata (StD: Stomata Density). These traits 765 were chosen because we suspected they could contribute to differences among 766 teosinte populations based on a previous report of morphological characterization on 767 112 teosinte collections grown in five localities [124]. 768 We measured the traits related to plant architecture and leaves after silk 769 emergence. Grain traits were measured at maturity. Leaf and grain coloration were 770 evaluated on a qualitative scale. For stomata density, we sampled three leaves per 771 plant and conserved them in humid paper in plastic bags. Analyses were undertaken at 772 the Institute for Evolution and Biodiversity (University of Münster) as followed: 5mm 773 blade discs were cut out from the mid length of one of the leaves and microscopic 774 images were taken after excitation with a 488nm laser. Nine locations (0.15mm 2 ) per 775 disc were captured with 10 images per location along the z-axis (vertically along the 776 tissue). We automatically filtered images based on quality and estimated leaf stomata 777 density using custom image analysis algorithms implemented in Matlab. For each 778 sample, we calculated the median stomata density over the (up to) nine locations. To 779 verify detection accuracy, manual counts were undertaken for 54 random samples. 780 Automatic and manual counts were highly correlated (R²=0.82), indicating reliable 781 detection (see S1 Annex StomataDetection, Dittberner and de Meaux, for a detailed 782 description   from the available dataset of 1000 neutral SNPs (S1 Table). We performed 100,000 940 iterations, saving the matrix every 500 iterations. We then tested the correlation of 941 these to the last matrix obtained, as well as to an F ST    populations on the first PCA plane with gradients a and b indicated by triangles and circles, respectively. The 11 populations evaluated in common gardens are surrounded by a purple outline. Populations that were previously sequenced to detect selection footprints are shown in bold (S1 Table). B: Correlation circle of the 19 climatic variables on the first PCA plane. Climatic variables indicated as Tn (n from 1 to 11) and Pn (n from 12 to 19) are related to temperature and precipitation, respectively. Altitude, Latitude and Longitude (in blue) were added as supplementary variables, and CEBAJ and SENGUA field locations were added as supplementary individuals. Populations are ranked by altitude. parviglumis populations are shown in green and mexicana in red. Lighter colors are used for gradient 'a' and darker colors for gradient 'b'. Units of measurement correspond to those defined in S2 Table. For male and female flowering time, we report values for all 11 populations although very few individuals from the two most lowland populations (P1a and P2b) flowered.

Supporting information captions
Covariation with altitude was significant for all traits except for the number of nodes with ears on the main tiller (S3 Table).     indicates the 95% threshold of the simulated distributions and the red line refers to the observed difference. In this analysis, we considered as spatially-varying traits those for which the observed difference fell outside the 95% threshold. Note that Plant height was borderline significant. *: Set of traits detected by DRIFTSEL. Figure S9: Genomic F ST -scans on 6 teosinte populations. We computed 4 pairwise-F ST values from 6 populations previously sequenced (S1 Table)    Pairwise LD between 171 SNPs was estimated using r 2 , and corrected for structure at K=5 and kinship computed from 38 SSRs. Blue shaded bars show the 23 SNPs found to associate with at least one phenotype under the 11 populations structure correction. S1 Table. Description of 37 teosinte populations and sets of populations used in the present study by data types.