Characterization of genes and alleles involved in the control of flowering time in grapevine

Grapevine (Vitis vinifera) is one of the most important perennial crop plants in worldwide. Understanding of developmental processes like flowering, which impact quality and quantity of yield in this species is therefore of high interest. This gets even more important when considering some of the expected consequences of climate change. Earlier bud burst and flowering, for example, may result in yield loss due to spring frost. Berry ripening under higher temperatures will impact wine quality. Knowledge of interactions between a genotype or allele combination and the environment can be used for the breeding of genotypes that are better adapted to new climatic conditions. To this end, we have generated a list of more than 500 candidate genes that may play a role in the timing of flowering. The grapevine genome was exploited for flowering time control gene homologs on the basis of functional data from model organisms like A. thaliana. In a previous study, a mapping population derived from early flowering GF.GA-47-42 and late flowering ‘Villard Blanc’ was analyzed for flowering time QTLs. In a second step we have now established a workflow combining amplicon sequencing and bioinformatics to follow alleles of selected candidate genes in the F1 individuals and the parental genotypes. Allele combinations of these genes in individuals of the mapping population were correlated with early or late flowering phenotypes. Specific allele combinations of flowering time candidate genes within and outside of the QTL regions for flowering time on chromosome 1, 4, 14, 17, and 18 were found to be associated with an early flowering phenotype. In addition, expression of many of the flowering candidate genes was analyzed over consecutive stages of bud and inflorescence development indicating functional roles of these genes in the flowering control network.


Introduction
The reproductive developmental cycle of grapevine spans two years (S1 Fig). Grapevine plants need intense light and high temperatures to initiate inflorescences during spring, which develop and flower during the subsequent summer [1]. The ongoing tendency to higher temperatures in spring due to global warming causes earlier bud burst and flowering [2]. As a a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 different expression patterns, which can consequently result in varying manifestations of traits. The determination of these alleles is an important step in the dissection of corresponding traits. Among other approaches, haplotypic information can be obtained from DNA sequence fragments to reconstruct the two haplotypes of a diploid individual. A sequence fragment that covers at least two variant sites in a genome can link those variants together and thus phase them. When fragments are long enough to encompass multiple variant sites and the sequencing coverage is sufficiently high to provide overlaps between fragments, fragments can be assembled to reconstruct longer haplotypes [30].
For haplotype or allele phasing a variant discovery process is necessary beforehand. The two mainly used methods are based on Shotgun Genome Assembly (SGA) or on amplicon sequencing. SGA generates phasing information without knowledge of the surrounding sequence, the library coverage needs to be high and it is computationally very challenging to distinguish paralogous repeats from polymorphism but it does not require sequence information for the loci. Amplicon sequencing, which includes the amplification of a genomic region by PCR, requires sequence information of the target locus for primer design and can be done very effectively. However, it is not practical for large-scale projects [31].
In this work, we used a F 1 population of V. vinifera, with the aim to associate allele sequences of several FTC candidate genes with the phenotype of flowering time in order to identify alleles influencing and controlling this trait using amplicon sequencing. Gene expression was analyzed in different time courses of bud and flower development in order to further investigate and confirm the role of FTC candidate genes.

Plant material
The mapping population GF.GA-47-42 x 'Villard Blanc' was crossed in 1989 using the breeding line GF.GA-47-42 ('Calardis Musque'; 'Bacchus Weiss' x 'Seyval') and the cultivar 'Villard Blanc' (Seibel 6468 x 'Subereux'). The 151 F 1 individuals were planted in the vineyards at the Institute for Grapevine Breeding Geilweilerhof in Siebeldingen (49˚13'05.0"N 8˚02'45.0"E) in Southwestern Germany (www.julius-kuehn.de/en/grapevine-breeding) in 1996. The offspring shows notable segregation for the trait "flowering time" as the maternal breeding line GF.GA-47-42 and its parents are early flowering while the paternal line 'Villard Blanc' as well as its parents flower rather late. QTL analysis for flowering time was carried out using a SSR marker-based genetic map of the biparental population [32].
Phenotyping of the mapping population GF.GA-47-42 x 'Villard Blanc' was performed for flowering time (full bloom) in nine years (1999,(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016) as described in [23] (Table 1, S1 Table). For determination of the median of flowering time for each individual, the days of the flowering period of each year were numbered whereas the first day of the flowering period was numbered with one, the second day with two, etc. These numbers were then divided by the length of the flowering period. The resulting values were used to calculate the median. Values for global radiation and accumulated temperature from November 1 st of the previous year until the day of full bloom were obtained from the DLR (www.wetter.rlp.de) and refer to the location of the vineyard at Siebeldingen, Germany. For gene expression analysis of FTC target genes, leaves, buds, and inflorescences from early flowering GF.GA-47-42 were collected at several consecutive time points starting from latent winter buds until inflorescences shortly before full bloom within the developmental cycle that was completed over the two consecutive years 2012 and 2013. Moreover, in 2013, sampling of buds on consecutive time points before dormancy in winter was continued. The development of the plants was described using BBCH codes [33]. Plant tissue from four different GF.GA-47-42 plants was harvested into liquid nitrogen. We decided in favor of single samples but many time points to detect trends in expression levels. Table 2 shows an overview of the collected samples.

FTC candidate gene prediction
For the identification and characterization of putative flowering time control (FTC) genes, functional data from well studied model species (available from [34]) was used to exploit the grapevine genome for homologous genes. Using BLAST (e-value cut off below 1e-25) [35] protein sequences of candidate genes from A. thaliana and other model species were compared against the Vitis protein sequences (PN40024-12xv0, Genoscope gene prediction 12X.v0 (www.genoscope.cns.fr/externe/GenomeBrowser/Vitis/) and the CRIBI gene prediction 12X. v2 [12]). Results were manually checked for additional evidence from the literature. For functional annotation of FTC candidate genes, the method of reciprocal best hits (RBH) [36] was applied. A RBH pair consists of two sequences from different sets of sequences, each displaying the highest genome wide score in the other data set. Genomic sequences of FTC genes were compared against protein sequences of V. vinifera and A. thaliana with blastx. If a gene displayed several transcripts, the longest sequence was used. Using tblastn the hit showing the highest score was compared back against V. vinifera coding genes. When the original query was found to have the highest score, the resulting RBH pair was considered.
To establish unique genes, we used the Vv (Vitis vinifera) prefix followed, for almost all genes, by the gene name deduced from the Arabidopsis annotation. In many cases the Vitis genome holds several putative homologs for known FTC genes from model crops, leading to low number of RBHs between Vitis and Arabidopsis genes. In order to distinguish these Vitis genes, the one with the highest BLAST score to the query gene got the name extension "a", the second best the "b".

Amplimer design
Genes for targeted allele phasing (target genes) through amplicon sequencing were selected out of the identified FTC candidate genes. The cDNA sequences of target genes were used as query in a BLAST against the grapevine reference sequence PN40024-12xv0. Genomic DNA sequences were extracted in addition to 1,000 bp from the 5'-and 3'-UTR regions. Primers were designed for overlapping amplimers (S2 Table) of up to 8 kb using the tool Primer3 [37].

DNA isolation and amplicon generation
Extraction of genomic DNA was performed from young leaf tissue. The leaf material was grounded under liquid nitrogen and subsequently used for DNA isolation with the DNeasy Plant Maxi Kit (Qiagen, Hilden, Germany) according to manufacturer's protocols. The purified DNA was quality checked via gel electrophoresis and quantified using a NanoDrop spectrophotometer (Peqlab, Erlangen, Germany). Amplicons were amplified by long range PCR (98˚C 30 sec, 15 cycles of 10 sec 98˚C, 30 sec 72˚C-57˚C, 5 min 72˚C, 25 cycles 10 sec 98˚C, 30 sec 58˚C, 5 min 72˚C and finally 2 min 72˚C).
Target gene sequences were amplified from 37 individuals of the mapping population GF. GA-47-42 x 'Villard Blanc' including the parental lines and 35 F 1 individuals with early, intermediate and, late flowering time phenotypes (S3 Table).

Library preparation and amplicon sequencing
Amplicon sequencing was carried out on a MiSeq (Illumina, San Diego, USA) in seven runs. All amplicons belonging to a respective individual were pooled in equimolar amounts, fragmented by sonification using a Bioruptor (Diagenode, Denville, USA) and subsequently used for library preparation. The libraries were prepared as recommended by Illumina (TruSeq DNA Sample Preparation v2 Guide). Adaptor-ligated fragments were size selected on a two percent low melt agarose gel to an average insert size of 500 bp. Fragments that carry adaptors on both ends were enriched by PCR. Final libraries were quantified using PicoGreen (Quant-iT, Fisher Scientific, Schwerte, Germany) on a Fluostar platereader (BMG labtech, Ortenberg, Germany) and quality checked by HS-Chips on a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, USA). Up to 20 libraries were pooled and sequenced on an Illumina MiSeq platform with 2 x 250 bp read length using the Illumina MiSeq v2 reagents. After sequencing, basecalling and demultiplexing and FASTQ file generation was performed using a casavabased in house script.

Read processing and mapping
Adapter trimming of raw reads and quality filtering of reads with a window of four consecutive bases that exhibited a quality value below 30 was performed using Trimmomatic [38]. Bases at the heads and tails of the reads with quality values below 30 were cropped using Trimmomatic. Before and after trimming the tool FastQC (www.bioinformatics.babraham.ac.uk/projects/ fastqc) was used to check the quality of the reads. Between 11.5 and 35.6% (20.2% on average [standard deviation (SD): 5.5%]) of reads were dropped through trimming. Trimmed reads were mapped to the grapevine reference sequence PN40024 12x.v2 [14] using the BWA-MEM algorithm which is suitable for long reads with default parameters [39]. Mapping was performed for each individual separately. Instead of the entire reference sequence the target gene sequences only were chosen for mapping in order to prevent false positive mapping results. The SAM format files were converted to BAM format files and sorted using SAMtools [40]. Readgroups were added and duplicated reads removed using Picard Tools (https:// broadinstitute.github.io/picard/). Besides PCR duplicates unpaired reads were removed from the mapping files. About 15% of amplicons failed to be amplified or sequencing depth was below 20.

Allele phasing of target genes
In order to separate the two alleles of the sequenced target genes (phasing), a workflow using the Genome Analysis Toolkit (GATK) [41] was established (Fig 1). After read alignment, the quality of the alignments was improved in two ways. Firstly, local realignments around InDels were performed using InDelRealigner of GATK [41] to reduce the number of misalignments. Occasionally, the presence of insertions or deletions in individuals with respect to the reference genome sequence leads to misalignments of reads to the reference, especially when InDels are covered at the start or end of a read. Such misalignments lead to many false positive SNPs. Secondly, base quality scores of reads in the aligned mapping files were recalibrated using BaseRecalibrator of GATK in order to correct for variation in quality with machine cycle and sequence context. Thus, more accurate and more widely dispersed quality scores are provided.
Using the HaplotypeCaller of GATK variants were called for each individual separately. The ploidy parameter was set to 12 for variant calling. It was performed in gVCF mode for F 1 individuals and the parental lines of the population GF.GA-47-42 x 'Villard Blanc'. Cases of allele dropout were identified, in which the missing allele leads to genotyping errors. Since we were working with an F 1 population and by applying Mendelian constraints it was possible to determine which allele was missing within the population GF.GA-47-42 x 'Villard Blanc', but its sequence remained unknown. After variant calling, resulting variant files from individuals of the population were merged using GATK's GenotypeGVCFs in order to apply further downstream steps on all samples together. At each position of the input gVCFs, this tool combines all spanning records and outputs them to a new variant file. Raw variants were hard-filtered according to GATK's "Best Practices" recommendations [42,43]. In addition, variants with read coverage depth and genotype quality below 20 were filtered out. For the determination of allele-specific sequences initially physical phasing was performed using HapCUT [30]. Fragments were defined from the sequenced reads. Haplotype-informative reads that cover at least two heterozygous variants were extracted from the aligned file using the tool extractHairs from HapCut and used for the assembly of haplotypes. The information of polymorphic sites was passed to HapCUT through a variant file. A maximum number of 600 iterations were Characterization of genes and alleles involved in the control of flowering time in grapevine used to run HapCut and the reference sequence was provided in order to extract reads covering both SNPs and InDels. Using various python scripts, intervals in which phasing could be performed in individuals of the population GF.GA-47-42 x 'Villard Blanc' including the parents and F 1 individuals were determined and homozygous alternative variants were added to the variant files. Using GATKs FastaAlternateReferenceMaker FASTA-format files with alternate sequences were created for each individual within the regions in which allele phasing could be performed.
A nomenclature system was created for the alleles of genes within the population GF.GA-47-42 x 'Villard Blanc' (S4 Table). The system distinguishes between fourteen different cases, where four, three, or two different allele sequences can be present at a locus or all sequences can be identical. Moreover, it distinguishes between various combinations of two or three different sequences. E, as in E1, E2 and E0, refers to "early" and originates from early flowering GF.GA-47-42, while L, as in L1, L2 and L0 refers to "late" and originates from late flowering 'Villard Blanc'. N means that both GF.GA-47-42 and 'Villard Blanc' share one or more alleles. N1 means that E1 and L1 are alike, while N2 means that E2 and L2 are alike. N means that either L2 and E1 or E2 and L1 are alike. Na means that E1, E2, and L1 are alike. Nb means that E1, E2, and L2 are alike. Nc means that E1, L1, and L2 are alike. Nd means that E2, L1, and L2 are alike. Descriptions for allele combinations that distinguish between which of the two alleles of one parental line is alike the two alleles of the other line (as in NaNa x NaL2) was implemented in order to be able to track patterns of allele combinations throughout QTL regions and closely neighboring genes.
Correlation analysis. To test for the correlation of an allele and the flowering time phenotype, a Wilcoxon Rank-Sum test was carried out between a dichotomous variable (the presence or absence of an allele) and a continuous variable (flowering time). The null hypothesis assumed that the median of flowering time between groups of individuals carrying a certain allele or not is equal. When p-values below 5% were found, the null hypothesis was rejected and an association between an allele and the flowering time phenotype was found to exist.

Marker development and testing of the whole mapping population
After creating haplotype specific allele sequences through amplicon sequencing and the subsequent bioinformatic pipeline, markers were designed for haplotype specific PCRs. Obtained allele sequences of target genes were scanned for InDel structures differing between the parental alleles. Variants with low coverage or low quality were filtered out. In the case that InDels were filtered out, the actual allele sequence can be greater than the calculated one. The sequence information was used for subsequent STS (Sequence-Tagged Sites) marker design with the Primer3 tool [37]. Primers had an optimum Tm of 58-60˚C, with PCR products differing in size between 100-400 bp for multiplexing purposes (S7 Table). Forward primers were labeled at the 5'end with one of the fluorescent dyes 6-FAM (blue), HEX (green), TAMRA (yellow) or ROX (red). Allele distributions were analyzed over all 151 F 1 individuals of the mapping population GF.GA-47-42 x 'Villard Blanc'. PCRs were carried out with the QIAGEN multiplex PCR kit (Qiagen GmbH, Hilden, Germany) following the instructions of the manufacturer in three multiplexes combining different product sizes and fluorescent dyes. Resulting PCR products were analyzed on an ABI 3110xl Genetic Analyzer (Applied Biosystems, Foster City, USA) and the results compared with the respective phenotype of the tested individual (i.e. early, intermediate or late flowering).

RNA extraction and sequencing
Total RNA was extracted from up to 100 mg of liquid nitrogen ground tissue using the Spectrum Plant Total RNA kit (Sigma-Aldrich, Taufkirchen, Germany) according to the manufacturer's instructions for protocol B. After on-column DNase treatment with the DNase I Digest Set (Sigma-Aldrich, Taufkirchen, Germany) the RNA was quantified. RNA-libraries for each time point were prepared according to the Illumina TruSeq RNA Sample Preparation v2 Kit using an input of 1 μg of total RNA. RNA-Seq (1x 135 bp) was performed on an Illumina Rapid HiSeq-1500 Run. One barcoded library was created for each of the time points.

RNA-Seq read processing for analysis of gene expression kinetics
Read trimming and quality control was performed as described above in "Read processing and mapping". Sequence read data are available from SRA accession SRP153932. The reads were mapped to the grapevine reference sequence PN40024 12x.v2 [14] using tophat2 [44] which is capable of performing split read mapping. The maximal intron size was set to 3000, otherwise default parameters were used. Resulting BAM-format files were sorted and indexed using SAMtools [40]. With HTSeq [45] mapped reads were counted for each gene. Using normalized read counts differential gene expression was analyzed using the R-package DESeq2 [46]. DESeq2 performs normalization by calculating a geometric mean for each gene across samples. In each sample the counts for a gene is then divided by this mean. In order to perform an analysis of expression without replicates, the counts were modeled as a smooth function of time, and an interaction term of the condition with the smooth function was included. Likelihood ratio test of DESeq2s with a reduced design, which does not include the interaction term, was then applied. Genes with small p-values from this test are those showing a time-specific effect.

Identification of FTC candidate genes
Functional data from A. thaliana and other model organisms was systematically exploited to identify FTC candidate genes in the Vitis reference genome sequence. More than 500 homologous genes were identified which are distributed over all chromosomes including the unanchored, random part of the sequence (S5 Table). Some of the genes are absent from the CRIBI annotations, but were included in the previous annotations, provided by Genoscope. To our knowledge the majority of the identified FTC candidate genes was not analyzed or even mentioned in a previous publication. As expected, an enrichment of the FTC candidate genes (75) annotated within the FTC QTL regions was found. In several cases we identified more than one homologous sequence in the grapevine genome with a single copy Arabidopsis query. In these cases not necessarily the gene with the highest sequence similarity is the one in the FTC QTL region, nor the one with the highest expression in flowering related tissues. For instance the RAV genes VvRAV1b and VvRAV1c are located within the QTL regions on chr 1 and chr 14, respectively, whereas the RAV1a is located on chr 11 outside of any FTC QTL.
Many of the FTC candidate genes are transcription factors involved in flower development and morphogenesis such as members of the AP2/EREBP family [47] and homeodomain proteins [48]. About eight MYB-transcription factors that participate in cell cycle control in many living taxa [49] were among the identified FTC candidate genes in Vitis. Several other protein families were among the FTC candidate genes, such as a dozen GRAS and FRIGIDA proteins that are involved in flowering time and plant development. FRIGIDA proteins are required for the regulation of flowering time by upregulating FLC expression. Allelic variation at the FRIGIDA locus is an important determinant of natural variation in the timing of flowering [50]. The GRAS (GAI, RGA, SCR) family is a very important family of proteins involved in flowering in grapevine. GRAS proteins participate in GA signaling, which influences numerous aspects of plant growth and development [51]. Remarkably sixteen SQUAMOSA PRO-MOTER BINDING PROTEIN (SBP)-domain proteins, that are known from other plants as transcriptional activators involved in a variety of processes such as flower and fruit development, plant architecture, GA signaling, and the control of early flower development [52] are candidates.

Allele phasing
From our comprehensive list of V. vinifera FTC candidates the 72 most promising genes were chosen as targets for amplicon sequencing (S6 Table), many of which are located in flowering related QTL regions on chr 1, 14, and 17 [23]. The average read depth of coverage was 286 (SD: 276) and for most samples sequencing depth was between 100 and 300. Variants in the analyzed lines were detected with a density between 1.02 and 1.63 variants per 100 bp most of which were SNPs.
In order to link certain alleles of the sequenced candidate genes to the flowering time phenotype, the two alleles of genes had to be reconstructed from the mix of sequenced fragments of the two alleles. The phasing of alleles was performed on the basis of sites polymorphic between the two alleles of a gene.
Aside from recombination events, a parent-offspring pair must share one haplotype for each chromosome and thus one identical-by-descent allele for every gene. Hence, Mendelian constraints could be applied to validate the obtained allele-specific sequence. Alleles of the chosen 72 target genes studied could be identified in 46 cases (S6 Table; S1 File).
In 23 cases four different allele sequences could be found, three allele sequences in 18 cases, two in four cases and in one case (VIT_217s0000g00150; VvFL) only one allele sequence, meaning that all individuals of the population were homozygous for the respective locus. This fits the expectation since grapevine is highly heterozygous. The number of allele sequences has been deduced from regions of the genes in which phasing was performed. The lengths of the phased intervals were between 204 and 8,285 bp (S6 Table).

Correlation analysis of an allele and the flowering time phenotype
Allele sequences of the progeny of the mapping population GF.GA-47-42 x 'Villard Blanc' were compared against the allele sequences of the parental lines to determine the inheritance pattern within the population for each gene. In order to find alleles correlating with the phenotype of flowering time, a correlation analysis between the phased alleles of FTC target genes and flowering time phenotypes was performed. Several sets of phenotypic data were used. For the years 1999, 2009-2016 a correlation analysis was performed using days after January 1 st of the respective year. Additionally for the years 2011-2016 values of accumulated temperature above 3˚C from November 1 st of the previous year and global radiation in KWh/m 2 from January 1 st were considered.
After the reconstruction of inheritance patterns within the parental lines and the 35 analyzed F 1 individuals of the mapping population GF.GA-47-42 x 'Villard Blanc' through the amplicon sequencing approach and subsequent bioinformatic analysis, the numbers of individuals harboring each of the alleles was determined and a correlation analysis between alleles of FTC target genes and the flowering time phenotype was performed for 43 genes. A correlation between alleles and flowering time could be observed for several genes on chr 1, 4, 14, 17, 18, and within unassigned contigs. Correlation values differed depending on whether days, accumulated temperature or global radiation was used as phenotypic data. As an example Fig  2 shows allele combinations in the parental lines of the population GF.GA-47-42 x 'Villard Blanc' and the p-values of the correlation of alleles unique to one of the lines. Values equal and below 0.05 were considered to be significant and the lower the p-value the higher is the correlation. In total for 16 FTC target gene alleles a significant correlation with either an early or late flowering phenotype could be found.
The L2 alleles, inherited from the paternal line 'Villard Blanc', of VvSEP4 (SEPALLATA 4), VvBS2, VvHUA2a, VvRAV1b, and VvGAI1 (chr 1) correlate with late flowering, strengthen the importance of the FTC QTL on chr1. The E1 alleles of the two genes VvWNK6 (V. vinifera WITH NO LYSIN KINASE 6) and VvTM6 (V. vinifera TOMATO MADS-BOX 6), both located on chr 4 and inherited from the early flowering maternal line, were found to strongly correlate with early flowering. The p-values calculated from the median (Fig 2 is p = 0.007 and values down to p = 0.003 were observed for single years. Table 3 shows the p-values of correlation for different sets of phenotypic data related to VvWNK6 and VvTM6. Most of the significant correlations are obvious regardless the year or scale of phenotyping (days after January 1 st , accumulated temperature or global radiation). The differences in correlation among years are due to the seasonal weather conditions of the respective year, which influence both the flowering time and the length of the flowering period. A significant correlation between the E1 allele of VvWNK6 and the flowering time phenotype could not be observed in 2016 for neither days after January 1 st , accumulated temperature or global radiation. In 2015, the correlation was not significant for days after January 1 st but, albeit only slightly, for the other two sets of phenotypic data. Other genes, such as VvMFT (V. vinifera MOTHER of FT and TFL1) showed significant correlation in 2016 but not in 2013.
Compared to the reference sequence, the E1 allele of VvWNK6 (chr 4) was found to harbor a variation in the terminal exon (SNP at chr4:21997435/ C ! T) leading to an amino acid exchange from threonine to methionine.

Application of the pipeline for amplicon sequencing in a heterozygous plant for subsequent marker design
Amplicon sequencing was performed in 35 F 1 individuals and the parents of the mapping population. In order to investigate the resulting allele distributions over all 151 F 1 individuals of the mapping population GF.GA-47-42 x 'Villard Blanc', STS markers were designed from the allele sequences that enabled an easy allele-specific genotyping. The information obtained from amplicon sequencing of the FTC target genes proved usable for both deduction of segregation patterns and marker design for investigating allele distribution over the whole mapping population. Table 4 gives an overview of the segregation patterns as analyzed for all 151 F 1 individuals. From 15 markers 12 showed a segregation pattern matching the segregation pattern that was obtained through allele phasing. The markers GAVBInd_019 and GAVBInd_020 were not designed using the obtained allele sequences of GF.GA-47-42 and 'Villard Blanc', since suitable InDels were not available. Therefore, these markers were designed based on InDels upstream of the phased regions. Observed product sizes can deviate from the expected ones by 1-2 bp due to the limited accuracy of the used fragment analyzing method. Observed and expected product sizes can deviate (markers GAVBInd_004, GAVBInd_014, and GAV-BInd_019) and hence segregation patterns since the measuring method cannot reliably resolve differences in a very low bp range (1-4 bp). See S7 Table for further details.
Expected data were obtained through amplicon sequencing; observed data were gained by analyzing 151 F1 individuals of the mapping population GF.GA-47-42 x 'Villard Blanc' with  Table. https://doi.org/10.1371/journal.pone.0214703.g002 Characterization of genes and alleles involved in the control of flowering time in grapevine STS markers located within the FTC target genes. ab x cd: four alleles/both parents heterozygous, hk x hk: 2 alleles/both parents heterozygous, ef x eg: 3 alleles/both parents heterozygous, lm x ll: 2 alleles/ mother heterozygous, nn x np: 2 alleles, father heterozygous. x: amplification failed. See S6 Table for further information.
Using the results of marker segregation across the 151 F 1 individuals, a correlation analysis between alleles and flowering time phenotypes was performed. The correlation results of marker analysis support those of allele phasing (

Analysis of gene expression kinetics
Variation in expression could be detected in both time courses 2012/2013 and 2013/2014 for various FTC candidate and target genes when testing for time-specific effects. Between consecutive developmental stages of bud differentiation before dormancy (August 2 nd to September 5 th , 2013 time series 1, Table 2) differences in expression could be detected for the MADS transcription factor VvTM8 as well as the protein kinase encoding gene VvWNK5. VvTM8 encodes a MIKC transcription factor whose A. thaliana homologue AtTM8 has been shown to be involved in the specification of flower organ identity [25].
In a time course of dormant buds (BBCH 0) until after bud burst when leaf formation had already begun (BBCH 11-13), 58 of the FTC candidate genes were found to show a BBCH or developmental stage-dependent expression. Several of these genes are squamosa binding proteins, MADS-and MYC transcription factors that are known to influence floral development. Most of these genes show a variation in gene expression due to an up or down regulation towards developmental stages during inflorescence maturation. In order to test for expression variation between consecutive developmental stages of bud development before inflorescence structures become externally visible, inflorescences collected after bud break were excluded  from the analysis. Genes with different expression kinetics when the time course was extended to include visible inflorescences, are those showing a clear variation in gene expression between buds and inflorescence. In total 67 of such "inflorescence-specific genes" were identified (S9 Table). After excluding inflorescences, several genes were found showing an obvious time-dependent expression. They cluster into two groups: genes upregulated in winter during bud dormancy (Fig 4, upper part) and genes upregulated towards inflorescence development (Fig 4,  lower part). Most of these genes encode BZIP-, MADS-or MYC-transcription factors, which  For three selected time points, bud/inflorescence samples and the corresponding leaf from the same node were collected and differential gene expression was analyzed between leaves and the associated bud/inflorescence. Fig 6 shows a heatmap of the FTC candidate genes with expression differences between leaves and buds/inflorescences. With few exceptions, all genes with expression differences between leaves and buds or inflorescences are downregulated or not expressed in leaves.

FTC candidate genes
A large number of FTC candidate genes inside and outside of known flowering QTLs in grapevine were identified. Although the identification relies mostly on sequence homology to previously known genes from other plants, the putative functional connection via e.g. Pfam, literature search or the performed RNA-Seq experiments substantiate the reliability of the prediction. This comprehensive gene list opens the door for investigations on e.g. flowering time networks in the future. On the one hand, compared to Arabidopsis thaliana there is probably an overestimation of FTC candidate genes in Vitis. On the other hand the high complexity and long duration of bud initiation and flower development may require a large number of genes.

Allele phasing of target genes
A workflow for the phasing of amplicon sequenced genes using Illumina short-read sequencing of a diploid organism was established and successfully applied to separate alleles in regions with a length of up to 8.3 kb. By analyzing inheritance patterns within a family of parents and F 1 individuals, we could show that the inheritance of alleles of neighboring genes within a Characterization of genes and alleles involved in the control of flowering time in grapevine QTL remains largely constant throughout the QTL. Since grapevine has a highly heterozygous genome and suffers from inbreeding depression, we used a F 1 mapping population and followed a double pseudo-testcross strategy [55]. Therefore, a lower recombination frequency Characterization of genes and alleles involved in the control of flowering time in grapevine was expected compared to typical F 2 mapping populations in other plant species. The constancy of the inheritance pattern of alleles of closely neighboring genes indicates the functionality and applicability of the established allele phasing method.
For the phasing of alleles, a mapped read or read pair needs to encompass two or more heterozygous sequence positions. The phase of the heterozygous sequence positions can be determined since each read or pair of reads is obtained from a single haplotype. Read lengths after trimming was distributed between 80 and 300 bp with an average insert size of~500 bp. When variants were located farther apart than the maximum length that could be spanned by a read pair, alleles could not be phased despite the presence of variants. Moreover, the allele frequency, calculated from the read coverage of variants can vary despite being amplified from the same allele. The amount of reads covering a variant can differ from one variant to the next. When dealing with extremely biased allele frequencies, this can lead to some variants being detected while others remain undetected. In such cases allele phasing was unsuccessful. Some amplicons could hardly be amplified at all. This is likely due to a high diversity at the primer binding sites between the reference sequence and the plant lines analyzed in this work.
The use of paired-end sequencing is highly advantageous in haplotype phasing as it covers variants that are spaced at distances longer than the technology's read length limit. Read length in high-throughput sequencing is constantly increasing and technologies are evolving rapidly. With the rise of third generation technologies, capable of producing even longer reads, many of the difficulties associated with haplotype phasing might soon be alleviated as such data may permit direct phasing from sequence reads [26].

Correlation analysis
We were able to detect a correlation between alleles of FTC target genes and flowering time for several QTL regions, which supports the role of these regions in the timing of flowering. Flowering time is highly dependent on the weather conditions of the respective and previous year. Therefore, correlation values vary between the years, as observed e.g., for VvWNK6 in 2016 ( Table 3).
Alleles of FTC target genes within a QTL region on chr 1 were found to rather correlate with late, while QTL regions on chr 4 and 14 were found to correlate with early flowering. With one exception, all analyzed F 1 individuals carrying alleles correlating with flowering time from two of the QTL regions on chr 1, 4, and 14 or all three of them were either intermediateearly, early, or very early flowering. The correlation for the QTL regions on chr 4 and 14 was more stable than for chr 1 indicating a stronger affect of these QTLs in the timing of flowering. The investigation of epistatic effects between these QTL regions could contribute to the clarification of the genetic factors that influence and control flowering time in grapevine.
Correlation values between alleles of FTC target genes and flowering time phenotypes could be largely supported by genetic marker analysis. Deviations can be due to the measuring method that can occasionally lead to deviations of up to a few bp in product size. In order to distinguish the maximum putative number of alleles at a single locus within a bi-parental F 1 population of a diploid organism, the marker needs to be capable of distinguishing between four different alleles.
Classic high informative marker analysis requires InDels / SSRs that distinguish between the maximum number of different alleles with polymorphic differences of at least two bp in size at a specific locus. The usage of blocks of tightly linked polymorphisms and treating each haplotype of these blocks as a separate allele can produce highly polymorphic markers. In addition, it also uses SNPs and InDels shorter than two bp to distinguish between the alleles. This leads to a higher resolution compared to classic marker analysis and the detection of a higher number of different alleles.
The correlation of alleles of FTC genes with flowering time phentoypes is based on the genotypic data on one hand, which is obtained through the allele phasing workflow, from amplicon sequencing, mapping and variant calling to the final establishment of allele sequences. On the other hand, the correlation analysis is based on the phenotypic data, which is also prone to errors. Phenotyping of flowering time was performed on a daily basis throughout the flowering phase. Differences in the timing of flowering shorter than one day are therefore not recorded. Moreover, phenotyping is a subjective process when different people work on the recording of phenotypic data and hence a possible error source.
As already mentioned, the timing of flowering depends clearly on environmental parameters, especially weather and climatic conditions. These are most probably non-genetic factors causing the differences in the flowering periods between the respective years. In 2016, for example, flowering in the population GF.GA-47-42 x 'Villard Blanc' started on June 17 th being very late compared to other years (Table 2). However, the flowering period was very short, ending after only 10 days on June 26 th . Global radiation is distributed between~502 and~536 KWh/m 2 at the beginning of flowering in the analyzed population and between~548 and 597 KWh/m 2 at the end of it. While flowering occurred very late in 2016 compared to other years, the amount of global radiation until the first day of the flowering period was less than in the other years. This shows that the amount of solar radiation before flowering initiation was small which might have had an impact on the timing of flowering.
In some cases the p-value of correlation is significant although the medians are nearly equal or equal. This is because the Wilcoxon Rank-Sum test is a rank sum tests and not a median test. It ranks all of the observations from both groups and then sums the ranks from one of the groups and compares it with the expected rank sum. Therefore, it is in rare cases for groups possible to have different rank sums and yet have equal or nearly equal medians.
VvHUA2a of which an amplicon sequenced allele from 'Villard Blanc' was found to correlate with late flowering is a floral homeotic gene. It's homologue in A. thaliana, HUA2, regulates the expression of the floral homeotic class-C gene AGAMOUS (AG) and FLC [56]. This suggests a role of VvHUA2 in the delay of flowering.
An allele of VvGAI1 from late flowering 'Villard Blanc' was found to correlate with late flowering. Mutants of VvGAI1 are insensitive to gibberellic acid and form inflorescences instead of tendrils. These mutants show a correlation between inflorescence development and increased VvFL expression, a floral developmental gene [57]. In A. thaliana, GAI acts as a repressor of LFY and SOC1 and thus represses flowering.
From the amplicon sequenced and early flowering individuals (median data) of the population GF.GA-47-42 x 'Villard Blanc', 90% were found to carry the VvTM6 E1 allele inherited from GF.GA-47-42. Only 10% of plants that carry the other maternal allele are early flowering. VvTM6 is a MADS-box B-class floral identity gene and influences the development of petals and stamen. In A. thaliana, mutants exhibit a transformation of petals to sepals and stamen to carpels. B-class floral homeotic genes either belong to the paleoAPETALA3 or to the PISTIL-LATA (PI) gene lineage, which are paralogous and resulted from a duplication event before the emergence of angiosperms. The paleoAP3 lineage underwent a further duplication event at the base of the core eudicots resulting in the two sublineages euAP3 and TM6 (named after the Tomato MADS-box gene 6) [57]. A TM6 homologue is absent in A. thaliana [53,58,59]. In grapevine, all three B-class floral homeotic genes were found to be highly expressed in inflorescences (Fig 5) but not in leaves (Fig 6). [25] showed that VvTM6 (VvAP3.2) is expressed in fruits, while the expression of VvAP3 (VvAP3.1) and VvPI is more restricted to flowers. Also, [53] showed that the expression of VvTM6 is higher in carpels, fruits, and seeds than in petals. Due to the expression of VvTM6 in carpels and during berry development and ripening, it was suggested to play an important role in grapevine fruit development [25]. The expression of VvTM6 increases towards inflorescence maturation, which is followed by berry formation and ripening. This is consistent with its role during berry development and ripening.
All early flowering amplicon sequenced individuals of the population GF.GA-47-42 x 'Villard Blanc' were observed to carry the E1 allele of VvWNK6 (Fig 3). In A. thaliana WNK6 has been shown to be involved in circadian rhythm [60]. WNKs are a subfamily of serine/threonine protein kinases with a lysine residue essential for ATP-binding, which is located in kinase subdomain I instead of subdomain II as common among all other kinases [61]. It has been suggested that WNK gene family members regulate flowering time in A. thaliana by modulating the photoperiod pathway. For instance, APRR3, a component of the clock-associated APRR1/ TOC1 quintet is a substrate of WNK1 in A. thaliana. T-DNA knockout mutants of AtWNK1 are delayed in flowering time while T-DNA knockout mutants of AtWNK2, 5, and 8 flower early [62]. WNK6 transcription is downregulated in AtABI4 mutants, which show an early flowering phenotype [63]. In A. thaliana, ABI4 negatively regulates flowering through directly promoting FLC transcription, a negative regulator of flowering [64]. This might indicate that VvWNK6 is involved in the delay of flowering. VvWNK6 expression was detected in leaves, buds, and inflorescences of the early flowering GF.GA-47-42. Both alleles E1 and E2 are expressed at a similar level. However, all individuals of the mapping population carrying the E1 allele of VvWNK6 flower early. This suggests that either the E1 allele of VvWNK6 itself might contribute to early flowering or alleles of other nearby-genes inherited together with E1 of VvWNK6. Further analysis should include the investigation of sequence variations leading to an alteration of the amino acid sequence and the functionality of the protein.

Gene expression kinetics
Many of the analyzed FTC candidate genes show variations in expression pattern in the course of the developmental cycle, supporting their role in flowering time control. Genes coding for transcription factors and other proteins involved in inflorescence architecture, floral transition and flower development are usually upregulated after bud burst, while genes coding for proteins that repress flowering in diverse manners typically show an upregulation during bud dormancy (Fig 5). Among the genes showing downregulation towards bud burst and inflorescence maturation are transcription factors involved in circadian rhythm such as VvGRP2A (Glycine Rich Protein 2A), VvRVE1 (REVEILLE1), VvTICb (TIME FOR COFFEE) and VvELF3 (EARLY FLOWERING3). It is not unexpected to detect different gene expression kinetics for genes involved in circadian rhythm since sampling was performed at the same time of the day over the entire time course. However, the period from daybreak until the time of sampling varies throughout the year and the different seasons.
AtGRP7, the homologue of VvGRP2A in A. thaliana, undergoes circadian oscillations with peak levels in the evening [65]. RVE is a MYB-like transcription factor that controls auxin levels, promotes free auxin and hence plant growth during the day [66]. TIC and ELF3 are components of the circadian clock in A. thaliana. ELF3 is a circadian clock gene that contributes to photoperiod-dependent flowering in plants [67][68][69]. Our findings thus suggest a similar impact of these genes in grapevine.
Moreover, genes coding for transcription factors involved in GA biosynthesis were found to be upregulated during bud dormancy. GAs are inhibitors of flowering in many fruit species but their role in grapevine varies with the stage of bud development. The initiation and development of lateral meristems is promoted by GAs as well as their development into tendrils, while inflorescence development is suppressed by GAs. Thus GA is a promoter of flowering at an early stage but acts as an inhibitor of flowering later on and promotes vegetative growth [19]. SPY (SPINDLY), whose Vitis homologue VvSPY was found to be upregulated during bud dormancy, is a negative regulator of GA response in A. thaliana and functions with GI (GIGANTEA) in pathways controlling flowering [70]. In Vitis the role of SPY in GA signaling is still unclear. It could be shown that treatment of grapevine plants at pre-bloom stage with GA led to rachis elongation and a downregulation of VvSPY in the rachis [71]. In A. thaliana GA signaling is initiated through its binding to the GA INSENSITIVE DWARF1 (GID1) receptors. This allows subsequent interaction between GID1 and DELLA proteins (GA INSENSITIVE [GAI], REPRESSOR OF GAI-3 [RGA], RGA-LIKE1 [RGL1], RGL2, and RGL3). DELLA proteins are transcriptional repressors and downregulate GA response genes. In the presence of gibberellin, the stable GID1-GA-DELLA complex is recognized by the SCF SLY1 complex which ubiquintylates the DELLA proteins and causes their degradation by the 26S proteasome [72,73]. It has been reported previously that GID1-transcripts are upregulated during bud dormancy in grapevine while transcripts of DELLA are downregulated [74]. Similarly, we found that the GID1B receptor transcript is upregulated during bud dormancy while the DELLA-protein SLR1-like (SLENDER RICE 1 LIKE) are downregulated. This confirms the promoting role of GID1B in plant growth, and the development of lateral meristems in dormant buds and indicates that SLR1-like is responsive for the mediation of the suppression of inflorescence development through GA.
In our analyses, numerous other genes involved in the repression of floral transition and flower development were found to be upregulated during bud dormancy. HUA2-like genes, which play a role in the repression of floral transition [75], are upregulated during bud dormancy in Vitis. The KNOTTED1-like homeobox gene BP (BREVIPEDICELLUS) was found to be upregulated towards grapevine bud burst and inflorescence maturation. In A. thaliana BP controls distal pedicel growth and thus inflorescence architecture [76,77]. ER (ERECTA) and other KNAT (KNOTTED-LIKE) genes, are involved in inflorescence architecture in A. thaliana [78,79], were also found to be upregulated towards bud burst, which indicates their function in inflorescence development. Genes for SQUAMOSA promoter-binding proteins, known to be involved in flower development [80], were downregulated during bud dormancy while upregulated during flower formation in grapevine. The BEL-like gene (VvBELa and b) and the Vitis STM orthologue VvSBH1 were also found to be upregulated during bud dormancy. STM and the A. thaliana homeobox-gene BEL1 build a complex, which maintains the indeterminacy of the inflorescence meristem [81].
MYC transcription factors VvbHLH74 and VvbHLH63 show large variations in gene expression over time with a peak in expression around March when buds are swelling. CIB1 (cryptochrome-interacting basic-helix-loop-helix), the A. thaliana homologue of VvbHLH63, plays a role in CRY2 (cryptochrome 2)-dependent regulation of flowering time. Cryptochromes (CRY) are blue-light receptors that mediate light response. In yeast and A. thaliana, CIB1 interacts with CRY2 when blue light is available. It promotes CRY2-dependent floral initiation together with additional CIB1-related proteins and stimulates FT transcription [82]. Hence, VvbHLH74 and VvbHLH63 might be involved in light dependent floral initiation.
ELF-like genes as well as a CONSTANS-like gene (VvCOL16) and CDF genes (CYCLING DOF FACTORS) were upregulated during bud dormancy. DOF proteins delay flowering by repressing CO transcription [83]. ELF3, ELF4, and TOC1 function in the primary, phytochrome-mediated light-input pathway to the circadian oscillator in A. thaliana. TOC1 is necessary for light-induced CCA1 (CIRCADIAN CLOCK ASSOCIATED 1)/ LHY (LATE ELONGATED HYPOCOTYL) expression [84]. Mutants of elf4 show attenuated expression of CCA1 and early flowering in non-inductive photoperiods, which is probably caused by elevated amounts of CONSTANS (CO), a gene that promotes floral induction [85]. ELF4 is a flowering pathway gene that may play a key role in signaling processes regulating dormancy induction in grapevine [86].
MBD9, whose A. thaliana homologue-AtMBD9 is related to the inhibition of flowering [87] and suggested to have a role in bud development through an interaction with FLC in A. thaliana [86], is upregulated during bud dormancy in grapevine. VvSPAR2 (SUPPRESSOR OF PHYA RELATED2) is upregulated during bud dormancy and downregulated towards inflorescence development. Its homologue in A. thaliana represses photomorphogenesis by negatively regulating the transcription factor HY5 (ELONGATED HYPOCOTYL 5), which promotes photomorphogenesis [88,89].

Conclusion
Here, we have reported a new workflow for amplicon sequencing including allele phasing in the highly heterozygous species grapevine. Our genetic association study revealed a significant correlation between alleles of selected FTC target genes and flowering time phenotypes within and outside of previously mapped QTL regions for flowering time on chr 1, 4, 14, 17, and 18. The discovery of a correlation between alleles of FTC target genes and the timing of flowering for genes within previously defined QTL regions supports the role of these QTLs in the timing of flowering. The analysis of gene expression kinetics revealed strong changes in expression pattern for many FTC candidate genes over the consecutive developmental stages. A shift between an up-or downregulation in expression mostly occurred between dormant and swelling buds, or toward inflorescence maturation when the young inflorescence structures at the shoots grow out of the buds and become externally visible. These time-dependent expression profiles underline the role of many FTC candidate genes in the control of flowering time. Moreover, many FTC candidate genes were found to be expressed in buds and inflorescences but not in leaves. This tissue specificity further confirms their role in flowering time and floral development.
The knowledge of genes and loci that influence flowering time and play a role in early flowering may allow the selection of genotypes not carrying these alleles through grapevine breeding programs. To meet the expected change of climate conditions late flowering cultivars might be better adapted, especially in the present cool climate areas.
For future research, grapevine cultivars are to be analysed for alleles of flowering time control genes correlating with early or late flowering in order to further investigate the role of these alleles in the timing of flowering and study epistatic and additive effects between QTL regions influencing the timing of flowering.