Genome-wide association study of cassava starch paste properties

An understanding of cassava starch paste properties (CSPP) can contribute to the selection of clones with differentiated starches. This study aimed to identify genomic regions associated with CSPP using different genome-wide association study (GWAS) methods (MLM, MLMM, and Farm-CPU). The GWAS was performed using 23,078 single-nucleotide polymorphisms (SNPs). The rapid viscoanalyzer (RVA) parameters were pasting temperature (PastTemp), peak viscosity (PeakVisc), hot-paste viscosity (Hot-PVisc), cool-paste viscosity (Cold-PVisc), final viscosity (FinalVis), breakdown (BreDow), and setback (Setback). Broad phenotypic and molecular diversity was identified based on the genomic kinship matrix. The broad-sense heritability estimates (h2) ranged from moderate to high magnitudes (0.66 to 0.76). The linkage disequilibrium (LD) declined to between 0.3 and 2.0 Mb (r2 <0.1) for most chromosomes, except chromosome 17, which exhibited an extensive LD. Thirteen SNPs were found to be significantly associated with CSPP, on chromosomes 3, 8, 17, and 18. Only the BreDow trait had no associated SNPs. The regional marker-trait associations on chromosome 18 indicate a LD block between 2907312 and 3567816 bp and that SNP S18_3081635 was associated with SetBack, FinalVis, and Cold-PVisc (all three GWAS methods) and with Hot-PVisc (MLM), indicating that this SNP can track these four traits simultaneously. The variance explained by the SNPs ranged from 0.13 to 0.18 for SetBack, FinalVis, and Cold-PVisc and from 0.06 to 0.09 for PeakVisc and Hot-PVisc. The results indicated additive effects of the genetic control of Cold-PVisc, FinalVis, Hot-PVisc, and SetBack, especially on the large LD block on chromosome 18. One transcript encoding the glycosyl hydrolase family 35 enzymes on chromosome 17 and one encoding the mannose-p-dolichol utilization defect 1 protein on chromosome 18 were the most likely candidate genes for the regulation of CSPP. These results underline the potential for the assisted selection of high-value starches to improve cassava root quality through breeding programs.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.  If the data are held or will be held in a public repository, include URLs, I am pleased to submit an original research article entitled "Genome-wide association study of cassava starch paste properties" for consideration for publication in PLOS One.
Improvements in starch quality are needed to meet the demands of food processing and industrial applications. The pasting properties determine the starch quality and functionality. Dissection of the genetic basis of pasting properties will help to improve starch quality in cassava.
Genome-wide association studies (GWAS) provide an opportunity to decipher genetic architectures of complex traits in crops. GWAS  The results indicate that these traits are strongly correlated and possibly under the control of the same gene.
The regional marker-trait associations on chromosome 18 indicate a LD block between 2907312 and 3567816 bp. In addition to this genomic block associated with several pasting property traits, another portion of chromosome 18 was also identified, which was specifically associated with SetBack at position 17771889. These results indicate that the genetic control of Cold-PVisc, FinalVis, Hot-PVisc, and SetBac generally had additive effects. Therefore, increasing the frequency of these favorable alleles in the right direction would significantly improve the quality of cassava starch.
One transcript encoding glycosyl hydrolases family 35 enzymes on chromosome 17 and one encoding the mannose-p-dolichol utilization defect 1 protein on chromosome 18 were the most likely candidate genes for the regulation of CSPP. These candidate genes and significant SNPs provide the guidance for further understanding the molecular mechanisms of starch pasting properties. These results have promise in the practical implementation of assisted selection to improve cassava starch quality through breeding programs.
This manuscript has not been published and is not under consideration for publication elsewhere. We have no conflicts of interest to disclose, and we have no opposed reviewers. production. In addition, it is a staple food for approximately 800 million people in tropical countries of 55 3 for food security, especially in several African countries [2]. In addition to its consumption fresh, cooked, 57 fried, or in the preparation of typical dishes, cassava is widely used in the food industry more broadly, 58 mainly in the form of dehydrated chips, cassava flour, and starch [3]. Cassava starch has unique 59 physicochemical characteristics, such as a high swelling capacity, high peak viscosity, and low 60 retrogradation tendency [4]. Starch retrogradation is generally considered to have undesirable effects on 61 food products, but is desirable in some applications, such as in the production of breakfast cereals, 62 dehydrated purees, and resistant starches, which have several nutritional properties [5]. Starch is composed 63 primarily of two glucose polymers: amylose, which takes the form of a predominantly linear chain, and 64 amylopectin, which is highly branched and contains many short-chain clusters. Starch granules are 65 generally composed of an amorphous region interspersed with concentric semicrystalline rings, which 66 exhibit alternating amorphous and crystalline materials [6,7]. 67 Starches from different crops exhibit great variation in their properties, which can be amplified by 68 physical (pregelatinization, hydrothermal, and non-thermal processes) and chemical modifications 69 (oxidation, esterification, and etherification) [8]. Native cassava starch has a lesser retrogradation tendency 70 than cereal and potato starches. The different forms (round, oval, polyhedral), sizes (3 to 110 μm), and 71 especially amylose and amylopectin contents of starches provide significant differences in their properties. 72 In addition, starch properties can be altered by the presence of other components, such as proteins and lipids 73 [9]. 74 The pasting and gelatinization properties of starch are important because they define its industrial 75 applications and have a strong impact on the quality of the final product [10]. In addition, current changes 76 in consumers' eating habits have led to a preference for less-processed products, thus increasing the demand 77 for native starches with specific profiles for industrial use. 78 The improvement of pasting properties has become a key objective of cassava breeding programs 79 [11,12]. Genetic studies of starches, especially of QTLs (quantitative trait loci) that control pasting 80 properties and their effects on food quality, have been performed to understand the genetic basis of these 81 quantitative traits. However, these mapping efforts rely exclusively on related genotypes, examining only 82 one small panel of the full genetic diversity of cassava germplasm [11,13] Brazilian states and other countries (e.g., Colombia, Mexico, Venezuela, Panama, and Nigeria) (S1 Table). The planting region is located in a geomorphological unit known as a coastal board, which presents 124 characteristics such as deep soils, flat or slightly undulating topography, and the occurrence of cohesive 125 horizons in some soils. It has a warm and humid tropical climate (Aw to Am according to the Köppen 126 classification) with an average annual rainfall of 1,170 mm (varying from 900 to 1,300 mm) and an average 127 annual temperature of 24.1°C. During the plant growth period, March to August was the rainiest, while 128 September to February was the driest. 129 Starch extraction from cassava roots 130 The analyzed roots were selected during the harvesting process based on their commercial pattern 131 (i.e., no mechanical damage). Subsequently, these roots were transported to the laboratory, washed with 132 running water to remove impurities from the soil, and then processed. The root pulp was sectioned into 133 cubes, with at least 2 kg of root material processed per accession. The roots were ground in a blender with 134 a non-cutting propeller for 1 minute at a root-to-cold water ratio of 1:1. This process was repeated once 135 after a 1-minute break. 136 For particle size analysis, the ground root mixture was filtered through voile-type fabric placed on 137 a sieve (100 μm) over a 5-liter plastic bucket. Thereafter, the slurry was washed with 3.5 L of cold water 138 for starch extraction. The filtrate was then placed in a cold room at 5°C for 12 hours for decantation of the 139 starch. After this time, the supernatant was discarded and the decanted starch at the bottom of the vessel 140 was washed with 20 mL of 95% ethyl alcohol to accelerate the drying process. The alcohol was then 141 6 discarded. Then, the starch was transferred to aluminum trays and conditioned in an oven with forced air 142 circulation at 40°C until completely dry (10-14% moisture). The dry starch was gently macerated with the 143 aid of a mortar and pestle to obtain a fine powder. The starch was then packed in plastic bags, vacuum-144 sealed, and stored at 5°C for further analysis. The starch suspension was then subjected to the following schedule (temperature/time): 50°C for 1 minute; 153 heating from 50 to 95°C (6°C/min); maintenance of the paste at 95°C for 2.5 minutes; cooling from 95 to 154 50ºC (6ºC/minute) and maintenance of the paste at 50ºC for 2 minutes. The suspension was then shaken at 155 160 rpm throughout the analysis, which was performed twice. 156 Paste viscosity was expressed in centipoise (cP) and temperature was expressed in °C. The total 157 analysis time was 13 min for each duplicate. During this period, the following characteristics were 158 evaluated: pasting temperature (PastTemp), peak viscosity (PeakVisc), hot paste viscosity (Hot-PVisc), 159 cool paste viscosity at 50°C (Cold-PVisc), final viscosity (FinalVis), breakdown (BreDow, which is the 160 difference between PeakVisc and Hot-Pvisc), and setback (SetBack, which is the retrogradation tendency 161 or difference between Cold-PVisc and Hot-Pvisc). 162

Data analysis
163 Statistical analyses were performed in R software version 4.0.1 [25] unless otherwise noted. The 164 BLUPs for each of the pasting properties were estimated using the lme4 package from R based on the 165 following mixed linear model: y = Xu + Zg + e, where y is the vector of phenotypic data, u is the general 166 mean (fixed effect), g is the vector of the genotypic effects of the accessions (assumed as random), e is the 167 vector of errors or residuals (random), and X and Z represent the incidence matrices for the respective effects 168 of u and g of the mixed model. The lme4 package was also used to estimate the variance components 169 obtained by the restricted maximum likelihood (REML) and broad-sense heritability (ℎ 2 ). The ℎ 2 values 170 of the traits were estimated according to ℎ 2 = 2 2 + 2 , where 2 is the genotypic variance and 2 is the 171 residual variance. 172

DNA extraction
173 DNA was extracted from young leaves according to the CTAB (cetyltrimethylammonium 174 bromide) protocol as described [26] with few modifications, being the addition of polyvinylpyrrolidone 175 (PVP) and increasing the concentration of 2-mercaptoethanol to 0.4%. DNA quality was assessed by 1. Biolabs, Boston, USA) at 65°C for two hours followed by visualization on agarose gel. 183

Genotyping-by-sequencing (GBS)
184 DNA samples were genotyped at the Genomic Diversity Facility of Cornell University 185 (http://www.biotech.cornell.edu/brc/genomic-diversity-facility). The basic protocol for GBS has been 186 described by [27]. As recommended by [28], the DNA was digested by the enzyme ApeKI, which is a type 187 II restriction endonuclease that recognizes a 5-base degenerate sequence (GCWGC, where W is A or T) in 188 lengths of 100 bp. Binding between the ApeKI cut-off fragments and the adapter was performed after the 189 digestion of the samples and implementation of a multiplex system with 192 samples for sequencing. GBS The SNPs were subjected to a quality filter in TASSEL software [29], where the tags with more 194 than 20% missing data and those with an MAF < 0.05% were discarded. To test the association between 195 pasting properties and gene sequences, the Genome Association and Prediction Integrated Tool (GAPIT) for the marker test. The covariates are adjusted through forward and backward stepwise regression of the 208 mixed model. FarmCPU is also a multi-locus analysis given by the formula: = + + . The latter 209 method incorporates multiple markers as covariates in a stepwise MLM to partially remove the confounding 210 data between testing markers and kinship. FarmCPU uses both the fixed-effect model and the random effect 211 model iteratively. The derived from the markers is used to select the associated ones using the maximum 212 likelihood method. 213 The population structure was inferred using a PCA using the population parameters previously 214 determined (P3D) [31], while the relative kinship coefficients of individual genotypes based on identity 215 (i.e., state) were estimated according to the VanRaden method [32] using all SNP markers. 216 The p-value obtained for each SNP was transformed into a logarithmic scale and later presented 217 in circular Manhattan and quantile-quantile (QQ) plots created using CMplot packages [33]. The significant 218 associations between SNPs were corrected using the Bonferroni correction for multiple tests at the 5% and 219 1% levels of significance (0.05/23,078 and 0.01/23,078, respectively). Only traits that exceeded the 220 threshold value and were consistent across GWAS methods have been reported in this study. The percent 221 phenotypic variance (PV) explained by all significant SNPs was calculated as the squared correlation 222 between the phenotype and genotype of the SNP. 223 Linkage disequilibrium was estimated using the correlation coefficients (r²) between the pairs of 224 loci on each chromosome using the sommer package in R [34], while the IntAssoPlot package [35] provided 225 9 an integrated graphic of GWAS results with linkage disequilibrium (LD) and gene structure information. 226 To investigate the LD decline, r² values were plotted as a function of genetic distance in kilobases (kb) 227 using loess regression. 228 We used the find.cluster function of the adegenet package in R [36] to identify the number of 229 clusters of the germplasm analyzed using the kinship matrix. This function implements successive K-means 230 running with an increasing number of clusters (k) after transforming data using a PCA and then computing 231 the goodness of fit (Bayesian Information Content) for each model to choose the optimal k. Then, the 3D 232 PCA was used to group the cassava germplasm of each k cluster. 233 In silico SNP annotation  In terms of broad-sense heritability (ℎ 2 ), the variations for the numerous traits associated with 256 pasting properties were of moderate to high magnitude ( Table 1)  The PCA plot based on the kinship matrix grouped the 876 cassava clones into five clusters (Fig 275 3, Table S1). In general, weak relationships were identified among cassava accessions based on genomic 276 kinship due to the evolutionary process and differentiated domestication processes of the accessions in this 277 collection. The accessions were grouped into five clusters with sizes ranging from 115 to 271. Group  The LD values were calculated for each pair of SNPs (Fig 3). We observed that LD declined to 295 background levels (r 2 < 0.1) of between 0.3 and 2.0 Mb in most cassava chromosomes (Fig 3). However, 296 chromosome 17 showed atypical behavior, with an extensive megabase-scale LD. Therefore, the LD 297 decayed to r 2 < 0.1 at a physical distance of ~20 Mb for chromosome 17. The relatively rapid LD decays 298 observed in this germplasm panel, especially for chromosomes 4, 6, 7, 8, 9, 10, 11, 12, 13, 15, and 16, are 299 indicative of its potential for reducing QTL intervals and fine mapping in several GWAS analyses. Cold-PVisc, and Hot-PVisc, respectively). 320  SNPs S18_3399799 and S18_3399801 (separated by only two bases) also showed significant and  Fig 4).
The FarmCPU method also revealed the presence of another SNP that was positively associated (81.72 cP) with SetBack ( Table 2, Fig 4) as well as two SNPs on chromosomes 3 (S3_5856578) and 8 (S8_32339820) with negative (-0.46ºC) and positive (1.26ºC) effects on PastTemp, respectively. On the other hand, only one SNP significant for the PeakVisc trait was found, on chromosome 18 (S18_13522313). S18_13522313 was consistently identified by the three GWAS methods with negative effects (-254.66 cP).
Even with the high correlation of PeakVisc × BreDow (0.86) (Fig 1), no SNPs associated with BreDow were identified. This may indicate that this correlation is due to pleiotropic effects and not physical linkage.

2
The variance explained by these markers was very close for SetBack, FinalVis, and Cold-PVisc (r 2 range: 0.13 to 0.18), while the low variances of the PeakVisc and Hot-PVisc traits were explained by the SNPs (r 2 range: 0.06 to 0.09) ( Table 2).
A more detailed analysis of chromosome 18 indicated a block of markers in high LD between positions 2907312 and 3567816. The regional marker-trait associations of SetBack (S1 The genotypes of SNP S17_17327003 exhibited significant differences for Cold-PVisc and FinalVis (i.e., the CG contents were approximately 379 and 329 cP higher than in the GG genotypes) ( Fig   5). On the other hand, SNPs S18_2907312, S18_3081635, S18_3399799, S18_3399801, S18_3567791, and S18_3567816 showed that homozygous genotypes of favorable alleles (A, G, G, A, G, and T, respectively) had high Cold-PVisc, FinalVis, Hot-PVisc, and SetBack, while heterozygous and homozygous alternative alleles exhibited medium and low pasting property traits, respectively. The same trend was identified for SNP S18_3408138, in which the AA, AC, and CC genotypes had ~1,163 cP, 2,315 cP, and 2,937 cP, respectively, for Cold-PVisc, and 1,141 cP, 2,285 cP, and 2,890 cP, respectively, for FinalVis. These results indicated that the genetic control of Cold-PVisc, FinalVis, Hot-PVisc, and SetBack generally had additive effects. Therefore, increasing the frequency of those favorable alleles in the right direction would significantly improve the quality of cassava starch. On the other hand, for SNP S18_17771889, the genotypes TT and CT were similar for SetBack (1,240 cP and 1184 cP, respectively), while the alternative homozygous form was significantly different from the other genotypes (1,126 cP). The same was identified for the significant SNP for PeakVisc (S18_13522313).
The genotypes AA and AG of S18_13522313 had ~5,000 cP, while the genotype GG had ~4,700 cP.
Therefore, it is possible to speculate that the latter SNP on chromosome 18, which is completely separated from the other LD block, had a dominance effect on the expression of the setback trait in cassava starch.

In silico annotation of SNPs associated with pasting properties of cassava starch
For PastTemp, we identified 4 and 12 transcripts, in SNPs S3_5856578 and S8_32339820, respectively, most of which have unknown functions (S3 Table). The transcript of SNP S3_5856578 is associated with polyketide cyclase/dehydrase and lipid transport (Manes.03G059600), which is responsible and Manes.08G162100 is related to phosphatidylinositol 3-and 4-kinase.
SNPs S17_17327003 and S18_3408138 were significantly associated with Cold-PVisc and FinalVis and had 10 and 1 transcripts, respectively (S3 Table). Most transcripts from S17_17327003 have known functions that are largely related to organic solute transporters (Manes.17G040100), beta-  Indeed, starch pasting properties can be affected by environmental conditions since the contents of amylose and amylopectin (the main constituents of starch [9]) vary according to the environment [39].
In the present study, the environment had little influence on the expression of starch properties in the cassava germplasm, predominantly from Latin America, possibly because the accessions were cultivated in only one region, which may have been reflected in the high heritability of these traits.
Regarding the correlations between the traits associated with starch pasting properties, other studies of cassava have also shown a negative correlation between PastTemp × PeakVisc and Hot-PVisc (r = -0.06 and -0.19, respectively) when analyzing biofortified cassava accessions [40]. In another study, [41] also identified a negative correlation between PastTemp × FinalVisc and BreDow (r = -0.58 and -0.55, respectively). In addition, the latter authors reported a strong and positive correlation between PeakVisc × BreDow (r = 0.98) and a high positive correlation between Setback × BreDow (r = 0.64).
According to [42], in general the high correlations among starch pasting properties are due to the nature of the RVA analysis. A potential explanation for the high positive correlation among some of those traits may be that when the temperature reaches 95 °C and decreases to 50 °C (Cold-PVisc), the molecules tend to reorganize (SetBack), which can increase viscosity and maintain it until the end of the analysis (FinalVis).

Density of markers, population structure and linkage disequilibrium
The high density of markers present in the cassava genome significantly affects GWAS efficiency, since in general it tends to improve the resolution of the mapping, and thus the accuracy of genomic associations [7,43]. This effect of marker density is determined by the extent of linkage disequilibrium (LD), since low LD requires a higher density of markers to provide better mapping resolution. However, when LD is higher, a lower density of markers can be used, although, in some cases this leads to low resolution [44]. The mean marker density observed in this study, even after the quality filters were adopted, showed broad representativeness of the cassava genome, with a distribution of 1 SNP/23.31 kb.
Linkage disequilibrium (LD) studies have been used routinely with cassava as a complementary approach to studies of genomic selection [45], genetic diversity and population structure [46,47], QTL mapping [13], as well as GWAS studies [15,17,18,23,47,48]. The extent of LD is a determining factor for GWAS efficiency, and among the methods used to visualize LD decay, the coefficient of determination (r²) approach has been widely used [49]. This method is based on the plotting of LD estimates as a function of the genetic distance in base pairs (bp) between SNPs [50,51].
In the present study, LD decay (r 2 < 0.1) occurred in the region between 0. In another study of cassava, [15] used a panel of 672 accessions from Africa genotyped with 72,279 SNP loci and reported that the LD extended to a distance of 2Mb (r 2 < 0.10). Therefore, our results demonstrate a lesser extent of LD in the cassava germplasm from Latin America, which can be quite useful for trait introgression. Throughout the genome, the extent of LD can be highly variable both within and between populations of the same species. These differences are related, in particular, to the complexity and size of the genome as well as the number of markers used to achieve all existing genetic variations [53].
Evolutionary factors such as mutation, recombination, selection, genetic drift, and crossing patterns also have a strong impact on the extent of LD [54,55]. In general, prior understanding of the LD decay pattern improves statistical power and decreases the rate of false positives in genomic association analyses that target the discovery of QTL-bound genes [56]. 8 GWAS studies have great potential to detect associations between markers and phenotypes of interest. They are directly dependent on a high number of molecular markers that are well-distributed throughout the genome, in addition to a sufficiently large and genetically diverse population [48]. In the present study, the detection of genomic regions associated with the starch pasting properties was carried out based on a diverse set of cassava genotypes (876 accessions) with high phenotypic variation for several agronomic attributes, such as pulp color of the roots, cyanide content, plant height, root yield, dry matter content, and size and shape of the starch granules [57], as well as high molecular diversity [46].

Genome-wide association study for cassava starch traits
Based on the analysis of this germplasm, the 13 SNPs associated with the PeakVisc, Hot-PVisc, SetBack, Cold-PVisc, PastTemp, and FinalVis traits can guide MAS to improve starch pasting properties.
To the best of our knowledge, to date this is one of the first reports of the use of GWAS for mapping these traits in cassava starch. Of the 13 SNPs, 10 are on chromosome 18, and the rest on chromosomes 3, 8, and 17. Of the 10 SNPs on chromosome 18, eight are in an LD block that covers ~660 kb from SNP S18_2907312 to S18_3567816. In the modern cassava breeding pipeline, the MAS of these co-associated genetic loci could simultaneously improve the multi-target traits of cassava root quality. All told, six different genomic regions in high LD with the PeakVisc, Hot-PVisc, SetBack, Cold-PVisc, PastTemp, and FinalVis traits in cassava starch were identified. Similarly, in waxy maize, there are also reports of the presence of QTL clusters co-associated with two or more RVA parameters in conventional biparental QTL mapping [58]. In total, eight QTLs, for BreDow, PeakVisc, SetBack, FinalVis, SetBack, pasting time, and trough viscosity were co-located on chromosomes 2, 3, 4, 5, 7, and 9.
The first attempt to identify QTLs and candidate genes associated with cassava starch paste properties was performed by [11], using a traditional biparental population with 100 lines. In that study, the genetic mapping based on the mean of the field trials identified 15 QTLs affecting five starch pasting viscosities. The most important QTL was co-localized in linkage group 1 for PastTemp (qPT.1LG1).
Although most of the reported QTLs are present in some environments only, [11] also reported one QTL controlling several of the starch pasting properties, such as PeakVisc, Hot-PVisc, Cold-PVisc, and SetBack (RY08.1LG1).
The r 2 explained by the SNPs related to cassava starch paste properties ranged from 0.13 to 0.18 for SetBack, FinalVis, and Cold-PVisc, and from 0.06 to 0.09 for PeakVisc and Hot-PVisc (Table 2). A similar range of r 2 explained by molecular markers was found in maize, in which 72 QTLs detected for 9 seven RVA parameters accounted for 0.06 to 0.18 of the phenotypic variation by analyzing 198 families derived from a cross between two waxy maize parents [58]. In contrast, in biparental QTL mapping of cassava, [11] reported a large phenotypic variance explained by the qPT.1LG1 (0.48) for PastTemp and a range of 0.13 to 0.37 for PeakVisc, Hot-PVisc, Cold-PVisc, and SetBack. However, the limited sample size in mapping populations has adverse consequences on QTL inferences. Through simulations, [59] demonstrated that a limited sample size leads to serious underestimation of the number of QTLs that affect the trait, and secondarily, to overestimation of the effect of any QTL. This could explain the high phenotypic variance of the cassava starch paste properties in the study by [11], who used only 100 lines of an F1 mapping population (Huay Bong 60 × Hanatee).
Regarding the gene action, it is possible to speculate that the traits Cold-PVisc, FinalVis, Hot-PVisc, and SetBack have additive effects due to the significant differences between the genotypes that have one or two copies of the favorable alleles, especially in the LD block between SNP S18_2907312 and S18_3567816. On the other hand, genotypes with at least one copy of the alternative alleles, e.g., TT and CT for S18_17771889 and AA and AG for S18_13522313, showed higher SetBack and PeakVisc averages, respectively (without a significant difference between these two genotypes), whereas the homozygous genotypes for the favorable alleles showed the lowest averages for SetBack (CC) and PeakVisc (GG).
Therefore, a dominance effect may be responsible for the genetic action of these two most distant SNPs on chromosome 18.
The traits Hot-PVisc, Cold-Pvisc, and FinalVis are indicative of the ability of the sample to form a viscous paste after the cooking and cooling process. In general, higher Cold-PVisc and FinalVis occur due to the reassociation of amylose molecules during the cooling phase [5]. Also, they are related to the final texture of the product, determining food quality and acceptability, especially for foods that need to be stored for long periods [60]. In the present study, the LD block from SNP S18_2907312 to S18_3567816 had a high additive effect on these traits; this might support early selection of genotypes to improve these attributes.
Generally, cassava starches with lower PastTemp have some advantages for industrial food and non-food processes, since they form pastes more easily, reducing the energy cost of starch processing for production of certain foods [61]. Therefore, cassava accessions with a sufficient variation in PastTemp should be able to provide a wider range of options for starch industries. Therefore, SNPs on chromosomes 3 (S3_5856578) and 8 (S8_32339820), with negative (-0.46 ºC) and positive (1.26 ºC) effects, respectively, for PastTemp can be useful genomic regions underlying PastTemp for MAS.
On the other hand, the SetBack trait should also increase under this selection, so there will be a greater retrogradation tendency. Although this feature is not desired in some industrial applications, especially producing foods that cannot lose water during storage, starch with high values of SetBack can be used to increase the dietary fiber content as well as modulate the glycemic index due to the lower digestibility of the starch, especially in diet foods [5,62].
For PeakVisc, the dominant effect of the A allele of SNP S18_13522313 can support the selection of genotypes with high-viscosity starches for the development of products with a high thickening power [63]; for example, puddings and gumdrops, as well as fat substitutes in the manufacture of ice creams [64,65]. On the other hand, accessions with the G allele, which has a negative effect, can be used for the manufacture of products that require low viscosity, such as cheese-like products, jellies, processed meats, and extruded snacks, since they generally present lower granule swelling, higher solubility, a higher gelatinization temperature range, and a lower retrogradation tendency [66].
Starch pasting properties are characteristics predominantly controlled by a few genes with a significant effect on phenotypic expression [11]. Previous studies have reported that pleiotropic effects and QTL hotspots are the primary factors affecting pasting properties in rice [39]. The pleiotropic effects are important for the genetic improvement of crops, especially when the gene affects multiple traits of interest, since they allow effective direct selection of genotypes [38]. However, in cassava the moderate to high heritability found in the present study and the large LD block on chromosome 18, and a few other regions associated with pasting traits, suggest that a set of overlapping QTLs can control the same traits. Therefore, it is possible that these correlated effects are primarily due to the genetic linkage instead of pleiotropy.
In the present study, even with the high correlation between PeakVisc × BreDow (r = 0.86), no SNPs were associated with BreDow. In rice, nine QTLs associated with BreDow were detected in different chromosomes and environments, and QTL qBDV11 explained the highest proportion of phenotype variation (43.88%) [39]. Therefore, environmental and even pleiotropic effects may be responsible for the expression of BreDow in cassava starch. However, future studies including cassava starch obtained in different environments × years of cultivation, in addition to another set of germplasm and high-density SNPs, should be analyzed to confirm this hypothesis.

Comparing different GWAS methods
The MLM method detected the largest number of SNPs significantly associated with Cold-PVisc, FinalVis, Hot-PVisc, PeakVisc, and/or Setback on chromosome 18 (S18_2907312, S18_3081635, S18_3399799, S18_3399801, S18_3407893, S18_3408138, S18_3567791, S18_3567816, and S18_13522313), while the MLMM method identified only four SNPs, on chromosomes 17 and 18 (S18_3081635, S18_3399799, S18_13522313, and S17_17327003), associated with Cold-PVisc, FinalVis, Although there have been important improvements in the statistical power of GWAS methods under certain assumptions, the choice of the most appropriate methods for a given species and dataset is still a task that must be investigated. In the specific case of cassava starch paste properties, although we selected the three methods with the highest statistical power for detecting QTLs, there was a large discrepancy in the significance of most SNPs. Therefore, it will still be necessary to validate the genomic regions we found through gene expression analysis or even the development of KASP markers to complement the statistical and biological evidence for the practical use of these markers in molecularassisted selection processes. GH35 is involved in chemical reactions and metabolic pathways associated with carbohydrates, or any group of organic compounds based on the general formula Cx(H2O)y. It includes the formation of carbohydrate derivatives by adding carbohydrate residues to other molecules, through selective interaction with any carbohydrate, which includes monosaccharides, oligosaccharides, and polysaccharides [72]. On the other hand, MPDU1 has a biological function in glycosyl transferase, which consists of a superfamily containing hundreds of families that are generally located in the cytosol and are involved in the synthesis of products such as flavonoids, phenylpropanoids, terpenoids, and steroids, as well as plant hormonal regulation [73]. In addition, glycosyltransferases act in the antioxidative system, which is related to the neutralization and elimination of free radicals, thus preventing damage to biomolecules [74]. The glycosyltransferases, together with the glycosidases, comprise a group of enzymes of great importance in 13 the biosynthesis and remodeling of the cell wall of carbohydrate polymers during different stages of plant development [75].

Transcripts
Other studies have reported that the genes GRMZM2G146028 (AP2/EREBP family transcription factor) on chromosome 4 and GRMZM2G142709 (glycosyltransferase) on chromosome 5 are likely candidate genes for QTLs qPV4-1 and qTV4-1, and qFV5-2, respectively, which regulate PeakVisc, trough viscosity, and FinalVis [58]. In other starchy species such as rice, a transcription factor of the AP2/EREBP family has been identified as a determinant in the physicochemical properties of starch, by regulating the starch biosynthesis underlying the amylopectin structure [76]. Based on the analysis of biparental cassava populations, [11] reported a QTL with greater effect for PastTemp (qPT.1LG1), colocalized with candidate genes encoding the family of glycosyl or glucosyl transferases and hydrolases at the periphery of QTL peaks. Therefore, these results corroborate our demonstration that GH35 and MPDU1 can be effectively associated with several RVA parameters of cassava starch. However, this effective association requires further study, which would involve strategies like the fine mapping of QTLs or knockouts and overexpression of candidate genes in QTL intervals.
Most of the other transcripts present in genomic regions associated with cassava starch paste properties have general functions in numerous cellular processes, including cell division, DNA duplex unwinding, endonucleolysis, endosperm development, protein folding, sorting and degradation, general metabolism, genetic information processing, glycerolipid metabolism, heme oxidation, photosynthesis, lipid metabolic processes, MAPKK activity, metabolism of cofactors and vitamins, negative regulation of the abscisic acid-activated signaling pathway and kinase activity, nucleic acid metabolic processes, phosphorylation, plant hormone signal transduction, protein phosphorylation, proteolysis, proton transmembrane transport, regulation of cyclin-dependent protein serine/threonine kinase activity, regulation of transcription, RNA phosphodiester bond hydrolysis, SCF-dependent proteasomal ubiquitindependent protein catabolic processes, signaling and cellular processes, and steroid biosynthetic processes [77]. This is one of the first studies using GWAS to understand cassava starch paste properties and can contribute to more complex investigations of variations in candidate genes that may affect different accessions, thus serving as a basis for gene cloning related to starch quality, in order to further understand its complex metabolic pathway and pasting properties. Future studies should focus on the resequencing the candidate genes and fine mapping of QTLs, with a large number of accessions and the validation of these candidate genes in different environments.

Final remarks
In this study, the genetic basis of cassava starch paste viscosity was dissected using GWAS with a medium to high density of SNPs markers. GWAS is a tool with enormous potential to accelerate breeding strategies to improve the quality of cassava roots for industrial use, as it allows breeders to make the selection based on the most effective marker-trait associations. In general, few genomic regions were associated with any RVA parameters on cassava chromosomes 3, 8, 17, and 18, indicating that a only a few genes may be responsible for the expression of these traits. Thus, the transfer of these genes/QTLs to different genetic backgrounds via backcrosses or even recurrent selection programs can be easily obtained and even monitored after the development and validation of specific markers for these RVA parameters.
This is a pioneering study on the genetic architecture and the mapping of genomic regions associated with cassava starch paste properties. Two SNPs located close to genomic regions that perform several functions in plant metabolism have been identified. Although they are not directly associated with the genes of starch biosynthesis, they are indicative of the discovery of new alleles that control these variations in cassava germplasm. The candidate genes identified in this study are useful for validation of QTLs, fine mapping, and cloning of genes, to help breeders enhance starch pasting properties through marker-assisted selection. • The funder provided support in the form of fellowship and funds for the research, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.   Table. List of cassava accessions with geographic and genetic origin, state of collection, and entry date in the Embrapa Cassava Germplasm Bank.