Genotype networks of 80 quantitative Arabidopsis thaliana phenotypes reveal phenotypic evolvability despite pervasive epistasis

We study the genotype-phenotype maps of 80 quantitative phenotypes in the model plant Arabidopsis thaliana, by representing the genotypes affecting each phenotype as a genotype network. In such a network, each vertex or node corresponds to an individual’s genotype at all those genomic loci that affect a given phenotype. Two vertices are connected by an edge if the associated genotypes differ in exactly one nucleotide. The 80 genotype networks we analyze are based on data from genome-wide association studies of 199 A. thaliana accessions. They form connected graphs whose topography differs substantially among phenotypes. We focus our analysis on the incidence of epistasis (non-additive interactions among mutations) because a high incidence of epistasis can reduce the accessibility of evolutionary paths towards high or low phenotypic values. We find epistatic interactions in 67 phenotypes, and in 51 phenotypes every pairwise mutant interaction is epistatic. Moreover, we find phenotype-specific differences in the fraction of accessible mutational paths to maximum phenotypic values. However, even though epistasis affects the accessibility of maximum phenotypic values, the relationships between genotypic and phenotypic change of our analyzed phenotypes are sufficiently smooth that some evolutionary paths remain accessible for most phenotypes, even where epistasis is pervasive. The genotype network representation we use can complement existing approaches to understand the genetic architecture of polygenic traits in many different organisms.


Introduction
Understanding how genotypes are linked to phenotypes and ultimately to fitness is of central interest for evolutionary biology, because genotypic changes in the form of DNA mutations cause heritable variation in phenotypes, on which natural selection acts. A common approach to study the relationship between genotypes and phenotypes relies on the concept of genotypephenotype maps introduced by Alberch [1]. Such maps have been characterized for phenotypes on multiple levels of biological organization. For example, these maps relate RNA genotypes to RNA phenotypes [2,3], short amino acid sequences to folded proteins [4,5], genotypes of regulatory networks to gene expression phenotypes [6,7], genotypes of metabolic networks to nutrient consumption phenotypes [8,9], and genotypes of developing organisms to the morphologies of macroscopic body structures like teeth [10]. Genotype-phenotype maps have even been characterized for digital organisms [11]. While many early studies used computational approaches, genotype-phenotype maps are increasingly studied experimentally, for instance in a transfer RNA [12], a ribozyme [13], a small nucleolar RNA [14], a green fluorescent protein [15], and the antibiotic resistance beta-lactamase TEM-1 [16]. Both empirical and experimental studies of such maps address fundamental evolutionary questions, for example about the implications of pleiotropy for the evolvability of complex organisms [17], the robustness and persistence of bet-hedging [18], the role of epistasis in evolution [19], or the influence of genetic correlations on mutational robustness and evolvability [20].
A genotype-phenotype map can be represented as a network of genotypes. The vertices (or nodes) of this network are genotypes, that is, nucleotide or amino acid sequences. Two genotypes are connected by an edge (that is, they are neighbors), if the corresponding sequences differ in a single small mutation, such as a nucleotide or amino acid change [2,21,22]. Genotype networks can provide insights into the role of evolutionary constraints for adaptive evolution [22], the role of convergence [23], or the evolution of binding affinities of transcription factors to DNA [24][25][26]. Moreover, they can help us understand a fundamental characteristic of biological entities, namely their evolvability. Evolvability describes the potential of a biological system to bring forth adaptive heritable variation, such as mutations that increase fitness [27,28]. In genotype networks, a succession of such mutations constitutes an accessible path, that is, a path of edges along which fitness increases monotonically. Any one high fitness genotype may be reachable through multiple paths, but only some of these paths may be accessible, while others traverse fitness valleys. This fraction of accessible paths can serve as a proxy of evolvability, because it quantifies how easily high fitness genotypes can be reached through a series of mutations that never decrease fitness [25,[29][30][31][32][33]. One important factor that can affect path accessibility is epistasis, that is, non-additive interactions between mutations, because it can create steps of decreasing fitness along a mutational path, and thus reduce path accessibility [28,[34][35][36][37][38][39].
Here we apply the genotype network concept to experimentally determined organism-level phenotypes in order to study phenotypic accessibility and epistasis. Specifically, we characterized the architecture of genotype networks for 80 different quantitative phenotypes of the thale cress Arabidopsis thaliana, whose association with genotypes has been characterized in multiple genome-wide association studies [40]. A genome-wide association study (GWAS) identifies genomic sites at which nucleotide changes affect a phenotype. For any one genomic position, it quantifies whether different nucleotide variants are statistically associated with the phenotypic value of different individuals in a population. The method was pioneered in human genetics, where it is used to understand complex genetic traits that underlie different diseases [41]. In plants, a GWAS is often used to study the genetic architecture of agronomically important traits like crop yield and quality [42][43][44][45][46]. Arabidopsis thaliana is ideally suited for a GWAS because its natural inbred lines (called accessions) can be maintained via self-fertilization, such that many phenotypes of genetically similar individuals can be scored. Thus, multiple traits in A. thaliana have been successfully analyzed in a GWAS [40,[47][48][49][50][51][52][53][54].
Our work builds on a pioneering study that performed a GWAS on 107 different phenotypes of A. thaliana [40]. Among these 107 phenotypes are 80 quantitative traits. We built a genotype network for each of these 80 phenotypes and characterized its structure. We report pervasive epistasis in these genotype networks. Even though such epistasis can reduce the accessibility of mutational paths to extreme phenotypic values, a substantial fraction of all mutational paths to each of our analyzed genotypes remains accessible.

Building genotype networks whose size depends on the chosen P-value cutoff
We first reconstructed separate genotype networks for each of the 80 quantitative phenotypes. These genotypes fall into four categories, namely flowering, development, defense, and the concentrations of various ions. Phenotypes in each category are highly diverse. For example, flowering phenotypes include the expression level of the MADS-box transcription factor FLOWERING LOCUS C that inhibits flowering until exposure to a prolonged cold period (that is, after vernalization [55]). They also include the number of days required for the formation of first buds at a growth temperature of 16˚C. Developmental phenotypes include the germination rates of seeds that were stored for 28 days under dry conditions, but also the hypocotyl length after seven days of growth. Defense phenotypes include the number of colony forming units of the plant pathogen Pseudomonas syringae pv. tomato DC3000, and the number of offspring of the aphid Myzus persicae. Finally, the ion phenotype category comprises concentrations of two type of ions, those of heavy metals such as nickel or cobalt, and those of trace elements required for processes as diverse as stress resistance (e.g. potassium [56]) and enzyme catalysis (e.g. calcium [57]). We will discuss results for these four phenotypic categories separately throughout the paper.
The number of genomic positions associated with a phenotype depends on the P-value threshold below which a statistical association between phenotypic value and genotype at that position is deemed significant [58]. We first chose a specific P-value cutoff, and considered all genomic positions as associated with a phenotype below this cutoff as sites that affect the phenotype ( Fig 1A). We then concatenated, separately for each accession, the nucleotides at each significant locus into a single nucleotide string. The length of this string is equal to the number of significant positions (Fig 1B). This string is a compact representation of the accession's genotype or, more specifically, of its haplotype, at those genomic positions that affect the phenotype. Each unique string corresponds to one vertex in the genotype network. Two vertices are connected if they are nearest neighbors, that is, if they differ in exactly one nucleotide site ( Fig 1C).
If we choose a large (non-stringent) P-value in this construction, many loci will appear to be associated with the phenotype, and the resulting nucleotide string will be long ( Fig 1B). However, the likelihood that two such strings differ in exactly one position decreases with the length of the string. In consequence, most genotypes may be very different from one another, and only few genotypes will be neighbors. Such a genotype network will typically have many vertices, but it may be fragmented into multiple small connected components -sets of vertices that can be reached from each other through a sequence of edges. As this P-value decreases to become more and more stringent, each genotype will include fewer and fewer genomic positions, resulting in shorter nucleotide strings ( Fig 1B). As these strings become shorter, it also How to build a genotype network, illustrated with a hypothetical example. (A) Genome-wide association studies identify genomic positions affecting a phenotype. This statistical association is expressed with P-values, which are small (� 10 -9 ; black type face in this hypothetical example) for genomic positions strongly associated with a phenotype. They are larger (� 10 -4 ; dark grey type face) for genomic positions more weakly associated with the genotype. Light grey type face indicates nucleotides with no significant association to the phenotype. (B) Nucleotides at positions significantly associated with a phenotype are concatenated into one nucleotide string. The resulting string will be shorter if only strongly associated positions (smaller P-values) are considered (black letters, left hand side) and longer if also more weakly associated positions are considered (larger P-values, right hand side, black and dark grey letters). (C) Each nucleotide string of significantly associated positions becomes one vertex in a graph that we refer to as a genotype (haplotype) network. Two such vertices are connected by an edge if they differ in exactly one nucleotide. Black edges connect shorter nucleotide strings resulting from smaller P-values, and dark grey edges connect longer nucleotide strings resulting from larger (but still significant) P-values. We note that fewer pairs of nucleotide strings can be connected through edges as the length of nucleotide strings (and the P-value) increases. We note further that in biological data, multiple individuals may have the same genotype at all positions associated with the phenotype. For didactic reasons, our hypothetical example uses multiallelic genome positions, whereas the data we analyze comprises only biallelic positions.
https://doi.org/10.1371/journal.pcbi.1008082.g001 becomes more likely that they become identical to each other, and the number of strings with one nucleotide difference will become smaller. Both cases -genotype strings that are too long (large P-values) or too short (small P-values) -are not conducive for network analysis. For each phenotype, we thus explored a broad range of P-values between 10 -4 (largest published value and least stringent) and 10 -9 (most stringent), to identify one that yielded a network with the maximum number of squares (that is, cycles of length four) and a maximum component size. We aimed to maximize the number of squares in a genotype network, because their presence is required for the analysis of epistasis (see Methods).

Connected genotype networks can be built for whole-organism phenotypes
For all 80 phenotypes, we found that intermediate P-value cutoffs between 10 -4.5 and 10 -6 yielded genotype networks with the highest number of squares and the largest component size (S1 Fig, S1 Table and S2 Table). In each of the four phenotype categories (flowering, ions, defense, development), a single P-value lead to one network with a maximum number of squares. The network component with the largest number of squares comprised 39 squares and 45 vertices (flowering phenotypes), 10 squares and 13 vertices (ion phenotypes), 58 squares and 52 vertices (defense phenotypes), and 27 squares and 36 vertices (developmental phenotypes; S1 Fig, S1 Table and S2 Table). Fig 2 shows the structure of these four genotype networks, which represent the following phenotypes: (i) in the flowering category, plant diameter at flowering (phenotype 1, abbreviated as 'FT Diameter Field' in [40], i.e., the largest diameter of the plant on the day where the first flower was observed, Fig 2C); (ii) in the ions category, the concentration of arsenic (phenotype 67, 'As75' in [40]); (iii) in the defense category, the number of colony forming units of the bacterial plant pathogen Pseudomonas viridiflava, to which we refer below as 'bacterial growth' (phenotype 70, 'As CFU2' in [40], Fig 2E); and (iv) in the development category, the diameter of plants grown at 16˚C at eight weeks after germination, which we refer to as 'plant width' (phenotype 14, 'Width 16' in [40], Fig 2G). Pvalues leading to the maximum number of squares in each genotype network did not necessarily yield the maximum component size in each network (S2 Fig and S2 Table). However, since we aimed to study the influence of epistasis on the evolvability of whole-organism phenotypes, we chose P-value cutoffs leading to the largest number of squares in a network component for subsequent analyses. The number of squares in the four example networks is comparable to random networks for all but the defense genotype network, where the number of squares is much higher in the biological data (S1 Text and S3 Fig). For most analyses below, we will present detailed results for the four genotype networks shown in Fig 2, and discuss summary statistics for the remaining 76 genotype networks. At our chosen P-value threshold 13 genome positions (loci) are associated with the flowering phenotype, 10 positions with the ions phenotype, 79 positions with the defense phenotype, and 24 positions with the development phenotype (S1 Table; see S3 Table for annotations of protein coding genes in which these genomic positions lie).

Three of four largest genotype networks show a modular structure
Modularity is a common feature of biological networks [60][61][62], and it has also been reported for genotype networks, for example those of RNA molecules and the hemagglutinin sequences of influenza A viruses [3,23,63]. To identify modules (or 'communities') in our genotype networks, we applied an algorithm developed by Blondel et al. [64] as implemented in Gephi [59]. This revealed that three out of the four networks could be partitioned into two modules. The two main modules found in each network are highlighted by different coloring of vertices in a genotype network (Fig 2A, Fig 2C and Fig 2E). Phenotypic values differed significantly between the two modules in the flowering and defense genotype network ( Fig 2B and Fig 2F; P-values � 0.03007, Wilcoxon Rank-Sum test). However, the biological significance of modularity is difficult to assess in our analysis, because our genotype networks are generally small. We also investigated the relation between the number of mutational steps that separate vertices in a genotype network and the phylogenetic distance between accessions, but found only a weak association between these quantities (S2 Text and S4 Fig). This is not surprising, because loci that affect any one phenotype may diverge at rates different from that of an entire genome, especially if the phenotype is subject to selection.

Pervasive epistasis in genotype networks of whole-organism phenotypes
Because our analyses are based on genomic positions associated with phenotypic change [65] one would expect that neighbors in a genotype network -which differ at exactly one phenotype-associated position -have different phenotypic values. This may not be the case, however, if the phenotypic effects of mutations strongly depend on the genetic background in which they occur, that is, if epistasis is pervasive. To investigate the effect of each single mutational step in a genotype network, we first quantified the amount of phenotypic variation (δ) in our data that is due to non-genetic causes, such as measurement noise. Specifically, we computed for each phenotype the average coefficient of variation for phenotypic values measured from accessions where all phenotype-associated positions had the same genotype (Methods and S1 Table). We considered two phenotypic values as identical if they differed by less than δ.
Based on this criterion, many neighboring genotypes in a genotype network showed identical phenotypic values ( Fig 3A). Motivated by these observations, we next quantified epistasis in a more fine-grained way by following a previously established approach [25]. This approach is based on analyzing individual squares in a genotype network (Fig 3B). It allows us to discriminate between magnitude epistasis, simple sign epistasis, and reciprocal sign epistasis ( Fig  3C). In magnitude epistasis, the phenotypic value of a double mutant is higher than the sum of the phenotypic values of two single mutants (Fig 3C, subpanel ii). In simple sign epistasis, one single mutant has a higher phenotypic value and the second single mutant shows a lower phenotypic value compared to the wild type ( Fig 3C, subpanel iii). In reciprocal sign epistasis, both single mutants have a lower phenotypic value than the wild type ( Fig 3C, subpanel iv). We quantified each of these three kinds of epistasis in each phenotype and grouped the results by our four major phenotype categories ( Fig 3D).
We found epistasis in 67 out of 76 investigated phenotypes, and in 51 phenotypes, each pairwise interaction was epistatic (S1 Table). Epistasis of all three kinds was abundant in each In each network, a vertex (circle) represents a genotype (nucleotide string) and two vertices are connected by an edge if the underlying genotypes differ in one nucleotide. The size of each vertex is proportional to its vertex betweenness, which is the fraction of shortest paths through all vertex pairs that visit this vertex. Each network was laid out by the force-directed Fruchterman-Reingold graph embedding algorithm implemented in Gephi [59]. Networks shown in (A), (C) and (E) can be subdivided into two modules and each vertex in these networks is colored according to its module membership. Vertices of components that are not connected to the largest network component are colored black. Isolated vertices (that is, genotypes without a onemutant neighbor) are not depicted. No modular structure could be detected in the genotype network of plant width (G). phenotype category, affecting up to 100% of all pairwise interactions in a genotype network (S1 Table). For example, the genotype network of the phenotype days until flowering at 10˚C (phenotype 31) contained 17 squares, of which 12 showed magnitude epistasis, four showed simple sign epistasis, and one showed reciprocal sign epistasis. In the ion category, networks of the phenotypes magnesium concentration and selenium concentration (phenotypes 105 and 33) contain eight squares, and each of them showed magnitude or simple sign epistasis. The defense phenotype bacterial disease resistance (phenotype 42) was also found to be highly epistatic. In nine squares, we found five with magnitude epistasis, two with simple sign epistasis and two with reciprocal sign epistasis. Finally, the phenotype days to germinate at 10˚C (development; phenotype 59) contained 14 squares among which 12 showed magnitude epistasis and two showed reciprocal sign epistasis. Moreover, magnitude epistasis occurred in all six squares of the developmental phenotype hypocotyl length (phenotype 44), and we detected simple sign epistasis in all four squares of the developmental phenotype number of days between germination and senescence of the last flower (phenotype 58). In contrast, we found no epistasis in the two defense phenotypes bacterial disease resistance (phenotypes 103 and 83), the two ions phenotype copper and arsenic concentration (phenotypes 24 and 67), and in five developmental phenotypes, namely vegetative growth during and after vernalization (phenotype 72 and 39), secondary seed dormancy after six weeks of cold treatment (phenotype 106), seedling growth rate (phenotype 90) and lesioning measured by dry weight (phenotype 93) (S1 Table).
We also detected a high frequency of epistatic interactions when we chose a different Pvalue cutoff for our network reconstructions (S5 Fig). Specifically, we used for this sensitivity analysis P-values that are 0.5 scores (-log 10 ) higher or lower than P-values leading to genotype networks with the largest number of squares. Moreover, we detected also a high fraction of epistatic interactions when we took potential linkage disequilibrium into account (S3 Text and S6

Accessibility of evolutionary paths to minimum and maximum phenotypic values
A genotype network represents a sample of genotypes from an adaptive landscape, and to study such landscapes, it is useful to determine the number of fitness peaks and the number of accessible (fitness-increasing) paths to these peaks. Our phenotype data do not contain direct measurements of fitness, with the possible exception of fitness-proxies, such as a low number of colony forming units of plant pathogenic bacteria (phenotypes 21 and 70) as indicators of high pathogen resistance. However, even the number of mutational paths to maximal (minimal) phenotypic values along which a phenotype monotonically increases (decreases) can be informative about evolutionary dynamics [25,66,67], because a higher fraction of accessible mutational paths may reflect higher evolvability [28].
To find out how accessible minimum and maximum phenotypic values are for all of our genotype networks, we first enumerated the distribution of the length of all paths through the genotype network associated with each phenotype. Unsurprisingly, this distribution strongly depends on the phenotype (S8 Fig). We then focused only on those paths that reach the phenotype with the highest phenotypic value and determined the fraction of paths that monotonically increase the phenotypic value. We refer to these paths as mutationally accessible [29][30][31]. To ensure comparability of results across phenotypes, we considered only genotype networks where a single vertex is associated with the maximum phenotypic value. Below, we only discuss the fraction of accessible paths to the maximum value in detail, because analyzing the fraction of accessible paths to the phenotypic minimum yielded similar results (S9 Fig). We present results separately for each phenotype category (Fig 4).
The fraction of accessible paths generally decreased with increased path length, but the rate of this decrease differed between phenotypes. For example, all mutational path with a length between 1 and 2 were accessible in all flowering phenotypes (Fig 4A), in all defense phenotypes (Fig 4C), and in all developmental phenotypes (Fig 4D), but not in all ion phenotypes ( Fig 4B).
Specifically, in the ion phenotype boron concentration (phenotype 54), only 45.5% of such mutational paths were accessible (S1 Table), and in the ion phenotypes potassium and sulfur concentration (phenotypes 81 and 91) 50% of paths were accessible. The maximum fraction of accessible mutational paths with a length between two and three assumed its lowest values of 13.3% in the ion phenotype calcium concentration (phenotype 51), and 20.0% in the flowering phenotype days to flowering at 22˚C (phenotype 102). It assumed its highest values of 100% in both the ion phenotype nickel concentration (phenotype 64), and in the developmental phenotypes days to germination at 22˚C (phenotype 45) and vegetative growth rate (phenotype 72). Together, these data illustrate that the ability to increase a phenotype monotonically strongly depends on the phenotype.
Again, we performed a sensitivity analysis for these results. While the length of mutational paths to maximum phenotypic values decreases in smaller networks, a large fraction of mutational paths remains accessible if we construct networks with higher or lower P-value cutoffs (S10 Fig). We conducted this analysis for P-values that vary by one order of magnitude around the default P-value yielding genotype networks with the largest number of squares (S10 Fig).  One factor that influences the fraction of accessible mutational paths is the incidence of sign epistasis, which can render a mutational path inaccessible (Fig 3C and [28,68]). The presence of epistasis strongly affects the fraction of accessible mutational paths in some of the phenotypes we studied. One class of noteworthy examples comprises phenotypes with short mutational paths and nonetheless low accessibility. A case in point is the ion phenotype Boron concentration (phenotype 54), where the average path to the maximum phenotypic value requires 1.9 mutations, but where only 45% of paths are accessible (S1 Table). Consistent with an expectation of substantial epistasis, three out of four squares in the genotype network for this phenotype showed reciprocal sign epistasis and one square showed simple sign epistasis (S1 Table). Another example is the ion phenotype calcium concentration (phenotype 51), where paths to the maximum phenotypic value have a mean number of 2.67 mutational steps, but where only 13.3% of these paths are accessible (S1 Table). For this phenotype, we found reciprocal sign epistasis in four out of six squares, as well as simple sign epistasis and magnitude epistasis in the remaining two squares (S1 Table). The prevalence of reciprocal sign and simple sign epistasis limits the number of accessible paths.
Conversely, a low incidence of epistasis can render even long paths to some phenotypes highly accessible. Among the most extreme examples is the developmental phenotype primary dormancy after seven days of dry storage (phenotype 20). Here, an average of 3.06 mutational steps are needed to reach the maximum phenotypic value, but 100% of all paths are accessible. In the phenotype vegetative growth rate after vernalization (phenotype 39), an average of 3.4 mutational steps are required to arrive at the maximum phenotypic value, and again, 100% of all paths are accessible. The reason is that epistasis is largely absent in both phenotypes, with the exception of one square that shows magnitude epistasis in the latter phenotype (S1 Table). Of note is also the defense phenotype bacterial disease resistance (phenotype 2). Even though it has a long average path of 4.62 mutational steps to the maximum phenotypic value, almost a third (30.8%) of these paths are accessible (S1 Table). The reason is again a rarity of epistasis: 14 out of 27 squares in its genotype network show no epistasis, and four squares show only magnitude epistasis, which does not affect path accessibility (S1 Table). Similarly, in the ion phenotype magnesium concentration (phenotype 105), the average mutational path length is 5.25 steps, 41.7% of all paths are accessible (S1 Table), and none of its squares shows reciprocal sign epistasis.
When considering all phenotypes, we detected a negative correlation between the incidence of reciprocal sign epistasis and the fraction of accessible mutational paths (Kendall's tau = -0.1973; P-value = 0.0414; n = 69). This result is consistent with previous studies demonstrating that reciprocal sign epistasis limits the accessibility of mutational paths [28,68]. The association between simple sign epistasis and the mean fraction of accessible mutational paths was also negative but not significant (Kendall's tau = -0.154; P-value = 0.0943; n = 69), which we attribute to our modest sample size, and to the fact that simple sign epistasis reduces path accessibility less than reciprocal sign epistasis (Fig 3C). Also, we observed no significant correlation between magnitude epistasis and the mean fraction of accessible mutational paths (Kendall's tau = -0.125; P-value = 0.1782; n = 69), which is consistent with the fact that magnitude epistasis does not affect path accessibility (Fig 3C).

Discussion
We constructed genotype networks of whole-organism phenotypes from genome-wide association data for 80 quantitative Arabidopsis thaliana phenotypes that fall into four categories (flowering, ions, defense, and development). Representative networks from each category formed smaller or at most equally large connected components than networks constructed from randomized genotype strings with similar genetic diversity (S3 Fig). However, because our analysis is based on a limited number of 199 genotypes (accessions), the size of this component was generally modest and comprised only up to 115 genotypes. This modest size may help explain why network statistics like maximum vertex betweenness were not distinguishable from that of random networks in two of the four studied genotype networks.
The genotype networks we analyzed reveal that epistasis is widespread for organism-level phenotypes. Such epistasis has also been observed for diverse types of molecular phenotypes, for example those of the green fluorescent protein [15], the beta-lactamase TEM-1 conferring resistance to ampicillin [69], a tRNA gene [12], or in the binding of transcription factors to DNA [25]. Epistasis has also been reported for organism-level phenotypes of A. thaliana, for example for metabolomic phenotypes [70], stem branching [71], or flowering in different environments [72]. However, these studies relied on quantitative trait loci (QTL)-mapping, which provides associations between genomic regions and phenotypes. In contrast, our analysis of epistasis is based on mutations in single genomic positions and thus allows a fine-grained analysis of epistatic interactions that involve single nucleotide changes. In addition, our analysis is built on many (80) and very different kinds of quantitative phenotypes. It shows that epistasis is widespread for organism-level phenotypes and pervasive for some phenotypes, affecting all pairwise genetic interactions in 51 phenotypes (S1 Table; we note that in nine of these cases, only one square was present in the genotype network). Together with previous work, our analysis underscores that epistasis affects all levels of biological organization.
Pervasive sign epistasis influences phenotypic evolvability, because it can render mutational paths to extreme phenotypes inaccessible [28,68]. When quantifying the accessibility of mutational paths for each phenotype, we found that it strongly depends on the phenotype. For example, even short paths of one to two mutational steps were inaccessible in the three ion phenotypes potassium concentration, sulfur concentration, and boron concentration. Conversely, in the ion phenotype magnesium concentration, 41.7% of all paths between five and six mutational steps were accessible. While we found epistasis in almost all studied phenotypes (67 out of 76), it did not prevent the accessibility of maximum phenotypic values for the vast majority of phenotypes. That is, in 69 analyzed phenotypes, at least 10.3 percent of paths to the maximum were accessible.
We note that at least two factors beyond the scope of our work may help increase the accessibility of phenotypes we study here. First, because of the small number of accessions and the resulting small genotype network sizes, we did not consider epistatic interactions between more than two genomic sites, which can open paths that appear inaccessible when only pairwise interactions are considered [73][74][75][76]. Second, even strong and pervasive epistasis may not prevent phenotypic accessibility whenever populations are small and genetic drift is strong. The effective population size of A. thaliana is thought to be small, which may be explained by a strong bottleneck during the last ice age and a high frequency of inbreeding [77,78]. Strong drift can help traverse fitness valleys created by epistasis, and thus render otherwise inaccessible paths accessible [79].
Our small sample size of less than 200 accessions causes further limitations of our study. Phenotypes influenced by rare genetic variants, or by genomic loci with small effects can only be detected in larger samples [80], such as a collection of 1,135 recently genotyped (but not yet as thoroughly phenotyped) A. thaliana accessions [81]. A GWAS with more accessions would also permit the identification of many more loci associated with a truly polygenic trait [58]. For our genotype network analysis, such larger sample sizes would have several consequences. First, because A. thaliana genomes are not highly polymorphic [81], multiple accessions in our genotype networks had identical genotypes at the loci associated with a given phenotype. A larger sample would increase the diversity at those positions, and it would lead to larger connected genotype networks. Larger genotype networks could also strengthen the tentative evidence for modularity we found in our modest data set, which would permit assessing the biological significance of this modularity. Moreover, larger genotype networks may have multiple vertices associated with high phenotypic values, which could further increase the accessibility of paths to extreme phenotypes. Finally, much greater sample sizes could help in analyses that were completely beyond the scope of this work. For example, paths through a large genotype network may affect multiple phenotypes, such that genotype network analysis may help understand not just pleiotropic effects of mutations, but how trade-offs and synergisms between multiple phenotypes affect a phenotype's evolvability.

Data resources
We based our analysis on previously conducted genome-wide association studies in A. thaliana [40], where 199 accessions were phenotyped in 107 experimental conditions, and genotyped with a resequencing microarray covering 248,584 genomic positions. The genomic positions for this customized array were chosen on the basis of genomic polymorphism and linkage disequilibrium data [82]. For each of these phenotypes and genomic positions, the null hypothesis of no statistical association between phenotype and genotype was tested, and those genotypes significantly associated with a phenotype based on the rejection of the null hypothesis at a specific P-value were recorded. We restricted our analyses to all quantitative phenotypes from this study, of which there are 80 (S1 Table). These phenotypes can be grouped into four categories. They comprise phenotypes related to flowering (23 phenotypes), plant development (26 phenotypes), defense (13 phenotypes), and the concentrations of various ions in the plant (18 phenotypes). These ions include heavy metals, but also trace elements like potassium and calcium (S1 Table). The latter elements are, for example, necessary as enzyme co-factors or for stress responses. For each of the quantitative phenotypes, we retrieved the observed phenotypic values for each A. thaliana accession from the database AraPheno [83]. We also retrieved the genomic positions significantly associated with each phenotype, together with the corresponding P-value from the AraGWAS Catalog [84,85]. This database contains recomputed GWAS results based on the imputed genotypes of the 1001 Genomes Consortium [81]. We obtained data for all analyzed genomic positions, that is, we did not exclude positions with rare alleles in our main analyses. However, to assess the sensitivity of our findings to the presence of genomic positions with minor allele counts, we removed genomic positions with minor allele counts lower than five percent (the default setting of the AraGWAS Catalog). Results for this analysis are summarized in S4 Table and S5 Table. We note that not all 199 accessions were used in each association study (S1 Table).

Reconstruction and analysis of genotype networks for each phenotype
We first reconstructed separate genotype networks for each phenotype. To this end, we first chose a specific P-value cutoff, and considered all genomic loci associated with a phenotype below this cutoff as loci that significantly affect the phenotype. For the analyses described below, we varied these P-value cutoffs between 10 -4 and 10 -9 , and note that a large P-value implies that more genomic positions are considered significantly associated with a phenotype (Fig 1B). We then concatenated, separately for each accession, the nucleotides at each significant locus into a single nucleotide string. The length of this string is equal to the number of significant positions (Fig 1B). This string is a compact representation of the accession's genotype or, more specifically, of its haplotype, at those positions significantly associated with the phenotype. The number of these strings is equal to the number of accessions used in the association study for this phenotype. Our approach ensures that all resulting nucleotide strings are of the same length, and that the strings represent the same genomic positions, that is, they are aligned. For all resulting nucleotide strings that were identical among several plant accessions, we retained only one concatenated sequence, and used the mean phenotypic value across all accessions with this genotype string as the phenotypic value associated with this genotype. We call two nucleotide strings one-mutant neighbors if they differed in exactly one position. In all subsequent analyses, we considered only genotype strings that had at least one such neighbor. In the language of graph theory, this means that we analyzed only connected components of a genotype network that comprised at least two vertices [86].
We then used the nucleotide strings that had at least one neighbor as input for the R package igraph [87] to build a network in which each vertex corresponds to one of the strings, that is, to one genotype whose constituent nucleotides are significantly associated with the phenotype. Neighboring vertices correspond to neighboring strings, that is, to those genotype that differ in a single nucleotide. For each phenotype, we used multiple P-value cutoffs ranging between 10 -4 and 10 -9 , that is, we constructed a genotype network for each of these cutoffs, and determined the number of connected components comprising at least two vertices, as well as the size of the largest component, to determine the P-value that yields a network best suited for network analysis. To identify modules in our network, we employed the algorithm of Blondel and colleagues [64] as implemented in Gephi version 0.9.2 [59]. We also used Gephi to display genotype networks.

Creation of random genotype networks
To assess whether properties of the four example genotype networks presented in Fig 2 could be observed by chance alone, we performed randomization tests based on random genotype networks. Specifically, for each of the four example phenotypes plant diameter at flowering, zinc concentration, bacterial growth, and plant width, we created a random genotype network as follows. First, we determined the proportion of each of the four nucleotides in each column of the aligned genotype strings that we had used in creating the network. Second, we generated random nucleotide strings of the same length and with the same nucleotide composition provided by the GWAS data. Specifically, for each position of such a random string, we randomly chose a nucleotide with a probability equal to the proportion of this nucleotide in the corresponding column of the aligned genotype strings. We then built a genotype network for the resulting random strings in the same manner as for the biological genotypes. We repeated this procedure 10,000 times, and created one genotype network from each of the resulting 10,000 sets of random genotype strings. Similar to the analysis of networks based on biological data, we determined the number of squares in the resulting genotype network, the maximum component size, the maximum vertex betweenness, and the number of unique genotypes for each random genotype network.

Reconstruction of phylogenetic relationships
To infer phylogenetic relationships between all 199 A. thaliana accessions, we employed the nucleotides at all 214,052 polymorphic positions present in the resequencing microarray [40]. Since each position on the array corresponds to one (polymorphic) genomic position, we concatenated the genotyped nucleotides for each accession into one contiguous nucleotide string. The strings from different accessions are of equal length (there were no missing data), which obviates the need for aligning them. We then used FastTree version 2.1.10 SSE3 [88] under a generalized time-reversible model (option -gtr) to construct a phylogenetic tree from these sequences with 1,000 bootstrap replicates. The resulting tree is illustrated in S11 Fig and the tree's structure is provided in Newick format in S4 Text. We considered the number of internal nodes separating two accessions as a measure for the phylogenetic distance between them. We determined this number with the function distTips of the R package adephylo [89]. As a complementary distance measure, we also computed the number of nucleotide differences between all pairs of concatenated nucleotide sequences. To find out if one-mutant neighbors -genotypes differing in one nucleotide -in a genotype network also tend to be neighbors in the phylogenetic tree, we examined all pairs of neighboring genotypes in each genotype network and quantified how distant they are in the tree. Whenever one or both neighboring vertices in a genotype network were associated with multiple identical genotypes, we calculated the mean phylogenetic distance (number of intervening internal nodes and pairwise nucleotide differences) between all underlying genomes.

Estimating measurement noise for quantitative phenotypes
To analyze how strongly phenotypes vary among different genotypes in a genotype network, it was necessary to estimate the non-genetic phenotypic variation for each phenotype. Since data from replicate experiments were only available for five phenotypes, we based this estimate on those accessions that had identical genotypes at all loci significantly associated with the phenotype. If the focal phenotype varies among such accessions, the cause is likely to be non-genetic. Specifically, for each of our 80 phenotypes, we identified all vertices in the associated genotype network for which at least three accessions had identical nucleotide sequences. For each such vertex, we then calculated the coefficient of variation of the phenotypic values for these accessions, that is, we divided the standard deviation of the phenotypic values by the mean of these values. Finally, we averaged the coefficients of variation for all vertices and used this average as measure of non-genetic phenotypic variation (δ) for the focal phenotype (S1 Table). Between 2.94% and 80.0% of all vertices in a network contributed to the calculation of δ (S1 Table).

Detection of epistasis
To identify the presence of epistasis and to quantify the incidence of different types of epistasis, we first detected all squares in each genotype network by following a previously established approach [25,90]. A square consists of four genotypes (vertices), which comprise a focal genotype, two of its one-mutant neighbors, and the double mutant that can be formed from the single mutants (Fig 3B and Fig 3C). By convention, we designated the vertex with the highest phenotypic value as the 'double mutant' genotype AB, and we ignored all squares where more than one vertex had this maximal value to ensure meaningful comparisons between squares. We called the 'wild type' genotype 'ab' and the remaining vertices 'Ab' or 'aB'. Without epistasis (that is, when only additive interactions between genotypes occur), the phenotypic values V of the four genotypes should show the following relationship: where V AB is the phenotypic value of the double mutant, V ab is the phenotypic value of the wild type focal genotype and V Ab and V aB are the phenotypic values of the single mutants. If epistasis occurs, its magnitude can be calculated as follows: We considered epistasis to be present only if the absolute value of ε exceeded the threshold of non-genetic phenotypic variation (δ), whose calculation we described in the preceding subsection. We were able to calculate a value for δ for 79 of our 80 phenotypes (see above). For all squares of genotypes where ε < δ, we set the value of ε to zero, thereby allowing a conservative estimation of epistasis. Whenever epistasis was present (ε � δ), we distinguished between magnitude, simple sign, and reciprocal sign epistasis ( [30] and Fig 3C) as follows: Magnitude epistasis : jDV ab!Ab þ DV aB!AB j ¼ jDV ab!Ab j þ jDV aB!AB j ð3Þ where ΔV denotes the difference in phenotypic values caused by one mutational step, for instance ab!Ab.
Sign epistasis : jDV ab!Ab þ DV aB!AB j < jDV ab!Ab j þ jDV aB!AB j ð4Þ Reciprocal sign epistasis : jDV ab!aB þ DV Ab!AB j < jDV ab!aB j þ jDV Ab!AB j ð5Þ We performed this analysis only for 76 phenotypes, because the genotype networks for the two ion phenotypes molybdenum and cobalt concentration (phenotype numbers 73 and 22), as well as for the flowering phenotype number of leaves when the bolt reached five centimeters (phenotype number 36) contained no square. Moreover, the genotype network of the development phenotype days to germinate at 16˚C (phenotype number 60) did not contain a square with a single maximum phenotypic value.

Accessibility of minimum and maximum phenotypic values
We asked how accessible maximum (and minimum) phenotypic values are for all of our genotype networks. Specifically, we enumerated for each phenotype all paths through the associated genotype network that monotonically increase (decrease) the phenotypic value and that reach the genotype with the highest (lowest) phenotypic value. We refer to these paths as mutationally accessible [29][30][31]. We followed a previously established strategy [25] to compute this path. Briefly, we define a mutational path as the smallest number of edges (shortest path) that connects two vertices i and j. We designated the vertex with the highest (lowest) phenotypic value in a genotype as vertex j, iterated through all other vertices i, computed the shortest path between i and j, and identified for each path whether it is accessible, that is, whether the phenotypic value monotonically increases (decreases) along the path's edges. Whenever we found several mutational paths with the same length between any two vertices i and j, we computed the accessibility for each path. We then calculated the fraction of accessible mutational paths, and refer to this fraction as the accessibility of minimum and maximum phenotypic values, respectively. In all these calculations, we considered the phenotypic values of two neighboring vertices as identical if the absolute difference of them does not exceed the coefficient of phenotypic variation (δ) for accessions with the same genotype at all phenotype-associated loci. (D) resulting from 10,000 randomly generated genotype networks. Box plots illustrate results from 10,000 replicates, and do so separately for each phenotype category ("devel": development). Green crosses and numbers in parentheses on the horizontal axis indicate the empirically observed values for the example phenotypes in the flowering (plant diameter at flowering), ions (arsenic concentration), defense (bacterial growth) and development ("devel", plant width) categories. (TIF) S4 Fig. Relationship between mutational path lengths and phylogenetic distance. We compared the mutational path length between two genotypes (accessions) in our genotype networks with the number of branch-points separating these accessions on a phylogenetic tree. (A) Principle of counting internal branch-points. An accession (vertex) in the genotype network corresponds to one leaf on the phylogeny tree shown in S7 Fig. For example, the two leaves shown here as dark grey circles are separated by one internal branch-point (light grey circle), and the two leaves shown as dark green circles are three internal branch-points apart (light green circles). (B) For our four example genotype networks shown in Fig 2, we estimated the length of the shortest path between all vertices. We then sorted all path according to their length (horizontal axis) and calculated the mean number of internal branch-points between two accessions separated by a given path length (vertical axis). We did this individually for the phenotype plant diameter at flowering (dark orange circles), zinc concentration (light orange squares), bacterial growth (light purple diamonds), and plant width (dark purple triangles). None of the four phenotypes showed a strong association between the number of internal branch points and the length of a mutational path (Kendall's tau between -0.021 and 0.388; Pvalues � 0.0473). (C) We extended the same analysis to all 80 phenotypes. Each box plot summarizes rank correlation coefficients (Kendall's tau) for one phenotype category (devl, development). In each box plot, the central horizontal line indicates the median value, lower and upper box limits show the first and third quartile, whiskers indicate values within the 1.5-fold interquartile range, and each open circle shows one data point. The number n of phenotypes in each category with correlations significantly different from zero is shown on the horizontal axes. Panels (D) and (E) are analogous to panels (B) and (C), but show the number of nucleotide differences between two accessions as a phylogenetic distance measure. Again, we observed mainly weak correlations (Kendall's tau between -0.123 and 0.438; Pvalues � 0.0418) between the mutational path length and the number of nucleotide differences. (TIF)

S5 Fig. Fraction of identical phenotypes and prevalence of epistasis obtained with nonoptimal phenotype P-value cutoffs.
To determine how sensitive our quantification of epistasis is to the chosen P-value cutoff, we created genotype networks with non-optimal P-values for all phenotypes (i.e. P-values that do not yield the largest number of squares in a genotype network). Specifically, we used P-values whose -log 10 value is higher (panels A and B) or lower (panels C and D) by 0.5 compared to the individual optimal P-value of each phenotype. (A and C) Fraction of genotype network neighbors with identical phenotypic values as a result of higher (panel A) or lower (panel C) P-values. We consider the absolute difference between the phenotypic values of two neighbors as identical if it does not exceed the coefficient of phenotypic variation (δ) for accessions with the same genotype at all phenotype-associated loci (Methods). We plot the fraction of neighbors with identical phenotypic values (vertical axis) in the form of a box plot for each of the four phenotype categories ("devl.": development). The number of phenotypes in each category is shown on the horizontal axis as sample size (n). (B and D) Frequency of different types of epistasis for higher (panel B) or lower (panel D) P-values to reconstruct genotype networks. The vertical axis shows the fraction of squares with magnitude, simple sign, and reciprocal sign epistasis as a box plot, where data are grouped according to the type of epistasis and phenotype category (horizontal axis; "fl": flowering; "io": ions; "df": defense; "dv": development). In each box plot, the central horizontal line indicates the median value, lower and upper box limit shows the values of the first and third quartile, whiskers indicate values within the 1. For each phenotype, we calculated the mean length of the mutational path from each vertex in the network to the vertex with the minimum phenotypic value. We show the number of paths (left vertical axes) binned according to path length (horizontal axes, path length between 1 and 2 mutational steps, between 2 and 3 mutational steps, etc.) as indicated by the grey histograms. The colored box plot in each bin shows the fraction of paths to the minimum phenotypic value that is accessible (right vertical axes). Each panel shows the result for one phenotype category, namely flowering (A), ions (B), defense (C), and development (D). (TIF) S10 Fig. Distribution of mutational path lengths to maximum phenotypic values, and fraction of accessible paths in genotype networks that we created with non-optimal P-values (that is, P-values that do not yield the largest number of squares in a genotype network). For each phenotype, we reconstructed a genotype network based on P-values whose -log 10 score is higher by 0.5 (panels A to D) or lower by 0.5 (panels E to H) than our default P-value for each phenotype. We then calculated the mean length of the mutational path from each vertex in all networks to the vertex with the maximum phenotypic value. We show the number of paths (left vertical axes) binned according to path length (horizontal axes, path length between 1 and 2 mutational steps, between 2 and 3 mutational steps, etc.) as indicated by the grey histograms. The colored box plot in each bin shows the fraction of paths to the maximum phenotypic value that is accessible (right vertical axes). Each panel shows the result for one phenotype category, namely flowering (A and E), ions (B and F), defense (C and G), and development (D and H). To construct this tree, we first concatenated nucleotides of polymorphic sites in the genomes that were represented on a re-sequencing chip [82]. All such strings had the same length and no position had missing information, that is, the concatenated strings can be aligned without gaps. We used the set of concatenated nucleotide strings as input to FastTree to create a phylogeny with 1,000 bootstraps, and displayed the resulting tree with iTOL. Each leaf corresponds to one accession and leaf labels indicate the accession number.