Structural and functional analysis of genes with potential involvement in resistance to coffee leaf rust: a functional marker based approach

Physiology-based differentiation of SH genes and Hemileia vastatrix races is the principal method employed for the characterization of coffee leaf rust resistance. Based on the gene-for-gene theory, nine major rust resistance genes (SH1-9) have been proposed. However, these genes have not been characterized at the molecular level. Consequently, the lack of molecular data regarding rust resistance genes or candidates is a major bottleneck in coffee breeding. To address this issue, we screened a BAC library with resistance gene analogs (RGAs), identified RGAs, characterized and explored for any SH related candidate genes. Herein, we report the identification and characterization of a gene (gene 11), which shares conserved sequences with other SH genes and displays a characteristic polymorphic allele conferring different resistance phenotypes. Furthermore, comparative analysis of the two RGAs belonging to CC-NBS-LRR revealed more intense diversifying selection in tomato and grape genomes than in coffee. For the first time, the present study has unveiled novel insights into the molecular nature of the SH genes, thereby opening new avenues for coffee rust resistance molecular breeding. The characterized candidate RGA is of particular importance for further biological function analysis in coffee.


Introduction 40
Coffee is one of the most valuable cash crops in many developing economies as it provides 41 employment opportunities in cultivation, processing and marketing activities, thereby sustaining the 4 79 In coffee, the origin and organization of disease resistance genes have begun to emerge in recent 80 years as part of an effort to understand the role of major rust resistance genes. One such endeavor was the 81 assembly of R genes spanning the S H 3 locus with the objective of tracing the evolution and diversity of 82 LRR domains in three coffee species [6]. Despite the partial sequencing and annotation of several disease 83 resistance genes in Arabica coffee [20], completely sequenced and characterized candidate genes are not 84 yet readily available. Resistance to rust is conferred by nine major genes (S H 1-9) and the corresponding 85 v 1-9 pathogen factors are known for long in the coffee rust pathosystem [3,21]. Nonetheless, molecular 86 and functional characterization of any of the S H genes and the associated regulatory elements is entirely 87 obscure, yet holds immense potential in changing the perspective of rust resistance breeding. Likewise, 88 the use of functional markers that serve as a direct rust resistance screening tool amongst the host-89 differential coffee clones is important but is barely addressed. The lack of a typical candidate rust 90 resistance gene is one of the bottlenecks in coffee breeding. A resistance gene analog (RGA) marker, 91 CARF005, was previously confirmed to share disease resistance ORF region in coffee [20,22]. This 92 polymorphic RGA marker encodes the disease resistance protein domain NB-ARC (nucleotide binding 93 site-ARC: ARC for APAF-1, R protein and , exclusive in coffee cultivars resistant to H.

94
vastatrix [22]. The complete sequencing and molecular characterization would help identify candidate 95 disease resistance genes. The state-of-the-art bioinformatics analysis, availability of differential coffee 96 clones with specific S H genes, structural and functional analysis of conserved domains and associated 97 motifs of candidate RGAs belonging to the S H gene series could greatly advance coffee rust resistance 98 breeding. Therefore, the objectives of this study were to trace the origin of resistance gene analogs 99 (RGAs) involved in coffee rust resistance and perform comparative molecular characterization of selected 100 candidate gene to determine whether it belongs to the S H gene series. We also investigated if any of the 6 136 achieved by incubating the plates at 37 °C for 18 h and shaking at 180 rpm. The identification of clones 137 using the CARF005 insert was performed by grouping and subsequent group decomposition of the 384 138 clones until a single clone was identified as outlined in S1 Fig The single BAC clone isolated using the CARF005 fragment was sequenced using the Illumina 144 HiSeq2000/2500 100PE (paired-end reads) platform at Macrogen (Seoul, South Korea). Paired-end 145 sequence processing and contig assembly were done using SPAdes software [27]. Contigs that matched 146 bacterial genome (E. coli) and sequences of the flanking vector (pCC1BAC TM ) were excluded prior to any 147 downstream sequence processing. The assembled BAC contigs were used to map against a transcriptome 148 constructed from coffee genes that were activated in response to H. vastatrix infection by Tophat 2 [28] 149 and to locate the contig region with active gene expression.

188
Resistance gene screening among the differential coffee clones 8 189 To investigate the linkage of RGAs to known SH genes, differential coffee clones with different 190 SH genes were subjected to RGA screening using the functional marker, CARF005. Of the 21 differential 191 coffee clones, the marker was detected in eight clones as presented in Table 1

213
Unknown race (-), coffee genotypes used as negative control (c), presence/absence of allelic differences among S H genes (+/-) and unknown S H gene (s) (?).    ORFs showed a range of protein arrays most of which had no role in disease resistance and lacked conserved 259 domains. Among the five RGAs detected in either NCBI BLASTp, or BLASTn against the C. canephora 260 genome, genes 9 (unnamed protein product), 10 (putative resistance gene) and 12 (putative resistance gene) 261 had similarity to RGAs as observed by mapping to C. canephora genome as presented in S3  x resistance protein domain and four additional multi-domains featuring the entire protein sequence (Fig 1).

282
These genes can be referred as CC-NBS-LRR, as they comprise the N-terminal CC and LRR C-terminal In addition, annotation of both genes indicates that they encode defense proteins involved in various 290 biological defense as demonstrated in Table 3. Notably, although these genes share 90.24% nucleotide 291 identity, their amino acid sequence identity is only 80.03% (Fig 2). The possibility of substitution mutation 292 events was considered in explaining the diversity. Accordingly, the overall amino acid diversity was 293 attributed to non-synonymous substitution events (non-synonymous/synonymous ratio, ka/ks = 1.5913) in 294 both genes. Further analysis of LRR region showed a higher rate of non-synonymous substitution mutation 295 (ka/ks, non-synonymous/synonymous substitution ratio = 1.9660).

314
Few sites that were specific to each gene were highly affected while most of the binding sites had moderate to 315 no effect as presented in S6 showed high variability in the LRR regions (Fig 2; Fig 3a & c). Although most of the residues are not 319 conserved, the ADP binding site of the NBS domain contained some conserved sites (Fig 3b.II & d.II).

331
where 2 refers to the highest confidence) and 0.29 and 0.17 (C-score ranging from 0-1, where higher score 332 indicates reliable prediction) for nucleotide binding prediction for the two proteins, respectively.

Interlocus comparison of S H genes 335
To investigate the conserved regions in the five RGAs (genes 5, 9, 10, 11 and 12), contigs 3 and 9 336 were queried against three C. canephora and 10 C. arabica specific contigs assembled from BAC clones 337 spanning S H 3 locus from the work of [6]. All the 10 S H 3 contigs matched with contig 3 but with varying 338 alignment lengths and identities as presented in S7

Phylogenetic analysis 345
In an attempt to discern the ancestral relationship of a set of sequences, phylogenetic analysis was 346 performed. Accordingly, two resistance gene families (the NBS-LRR and non-NBS-LRR) were identified, 347 completely sequenced and mapped to chromosome 0 of C. canephora genome with a query coverage of 348 99.94% for genes 5 and 11, 72.68% for gene 9, 33.05% for gene 10 and 97.52% for gene 12. The diversity of 349 the NBS-LRR family was detected by analyzing the ka/ks ratios as presented in

358
Moreover, phylogenetic analysis showed that the tomato gene 5 was closely related to genes 5 and 359 11 of coffee than the gene 11 of both tomato and grape (Fig 4). Within coffee itself, a significant diversity 360 between genes 5 and 11 was detected by the MEGA 7 bootstrap method of the phylogenetic test.  [8,9,42,43]. In the present study, a cluster of two different classes of RGAs resistant to coffee rust, the NBS-

391
Additionally, the mapping against the transcriptome of C. arabica-H. vastatrix interaction suggests that the 392 three other clustered RGAs (genes 9, 10 and 12) have differential-expression during the incompatible 393 interaction. Mapping study has also revealed the presence of reads exclusively mapped to transcripts of 394 pathogen-infected plants at 12 and 24 hai.

412
Mapping of the RGAs to C. canephora, the result from differential clone screening and annotation 413 altogether confirmed that gene 11 locus is descended from C. canephora, hence is a sibling of S H 6-9 [49].

414
Besides, mapping of RGAs to C. canephora was complemented by the differential clone screening for SH

439
Different selection pressures shape the evolution of domains in the NBS-LRR encoding genes. The

440
NBS domain was assumed to be under the purifying selection (a negative selection in which variation is 441 minimized by stabilizing selection) than by the diversifying selection, which acts on the LRR domain [9,62].

442
In contrast, the diversifying selection (positive selection) act on all the domains of genes 5 and 11 (ka/ks >1).

443
This result is contrary to the general assumption that diversifying selection is diluted when the overall non-444 synonymous substitution is considered [6], indicating an intense diversifying selection action on both genes.

445
Further investigation of four more orthologous genes also resulted in similar findings, indicating that the 446 NBS-LRR genes are highly variable due to substitution mutations. As the LRR domains are involved in direct 447 ligand binding, their variability due to non-synonymous substitution is higher than that seen in other domains.

448
This results in the formation of a super-polymorphic region to cope with the continuously evolving pathogen 449 effectors. Similar findings (from different plants, including coffee) on diversifying selection have been 450 reported [6,9,11,38,45,63,64]. Diversifying selection by non-synonymous substitution was detected in non-

454
Based on the phylogenetic tree of orthologous genes originated from related genomes, the six genes 455 could be divided into two groups. Gene 5 from tomato is closely related to genes 5 and 11 from coffee, 456 making the first group, whereas genes 5 and 11 from grape happens to be the second highly diversified group.

497
Mapping was carried out by CDS (coding sequence) BLASTn followed by track assembly on C. canephora 498 genome hub server 5 .