LysM receptors in Coffea arabica: Identification, characterization, and gene expression in response to Hemileia vastatrix

Pathogen‐associated molecular patterns (PAMPs) are recognized by pattern recognition receptors (PRRs) localized on the host plasma membrane. These receptors activate a broad-spectrum and durable defense, which are desired characteristics for disease resistance in plant breeding programs. In this study, candidate sequences for PRRs with lysin motifs (LysM) were investigated in the Coffea arabica genome. For this, approaches based on sequence similarity, conservation of motifs and domains, phylogenetic analysis, and modulation of gene expression in response to Hemileia vastatrix were used. The candidate sequences for PRRs in C. arabica (Ca1-LYP, Ca2-LYP, Ca1-CERK1, Ca2-CERK1, Ca-LYK4, Ca1-LYK5 and Ca2-LYK5) showed high similarity with the reference PRRs used: Os-CEBiP, At-CERK1, At-LYK4 and At-LYK5. Moreover, the ectodomains of these sequences showed high identity or similarity with the reference sequences, indicating structural and functional conservation. The studied sequences are also phylogenetically related to the reference PRRs described in Arabidopsis, rice, and other plant species. All candidate receptors had their expression induced after inoculation with H. vastatrix, from the first sampling time at 6 hours post‐inoculation (hpi). At 24 hpi, there was a significant increase in expression for most of the receptors evaluated, followed by suppression at 48 hpi. The results showed that the candidate sequences for PRRs in the C. arabica genome display high homology with fungal PRRs already described in the literature. In addition, they respond to pathogen inoculation and seem to be involved in the perception or signaling of fungal chitin, acting as receptors or co-receptors of this molecule. These findings represent an advance in the understanding of the basal immunity of this species.

Introduction
Studies demonstrate that PAMP-triggered immunity (PTI) and effector-triggered immunity (ETI) form a continuum, which is necessary for a durable and efficient defense response [11]. Therefore, breeding programs that seek resistance to phytopathogens by increasing the capacity of the recognition system succeed by combining PTI and ETI as the main strategy for obtaining resistant cultivars [14,26]. Few non-model plants, such as barley [27], apple [28,29] and mulberry [30], have had PRRs characterized. Coffea arabica is an important coffee species cultivated in countries such as Brazil, Vietnam, Colombia, and Indonesia and is consumed around the world [31]. PAMP receptors have been scarcely studied in Coffea spp. It is therefore crucial to identify the receptors present in their genome and to determine whether inoculation with pathogens induces a response, thus allowing the use of PRRs in coffee breeding programs.
Rust is the main coffee disease, causing severe losses in productivity in all regions where coffee is cultivated [32,33]. In Brazil, the biotrophic fungus Hemileia vastatrix Berk. & Br, the etiological agent of coffee rust, has caused damage since the 1970s [34,35]. In regions with favorable conditions for the pathogen, the decline in productivity can reach 50% [35]. Chemical control has been used to circumvent such damage; however, the use of tolerant or resistant cultivars is a viable alternative that reduces costs and possible environmental damage [32,36,37]. Therefore, the goals of this study were (i) to identify the pattern recognition receptors (PRRs) for fungi in the C. arabica genome, (ii) to characterize these sequences for protein domains and motifs, and (iii) to analyze the gene expression of these PRRs in C. arabica cultivars with contrasting rust resistance inoculated with H. vastatrix. The data obtained suggest that C. arabica has LysM receptors that act as fungal PAMP receptors, and that the expression of these receptors is stimulated after H. vastatrix inoculation. Our results contribute to the understanding and future use of PRRs in coffee breeding programs.

Identification and characterization of specific PRRs for fungi in the C. arabica genome
The reference PRRs described in the literature for the recognition of fungal PAMPs in Arabidopsis thaliana and Oryza sativa were selected: At-CERK1, At-LYK4, At-LYK5 and Os-CEBiP (Table 1). To identify these receptors, the C. arabica genome (accession UCG-17, variety Geisha) sequenced by the University of California (UC Davis Coffee Genome Project) and partially available in the Phytozome database (https://phytozome.jgi.doe.gov/pz/portal.html) was used. The search was based on sequence similarity and domain conservation. For this, a BLASTp (Align Sequences Protein BLAST) search with default parameters was performed in Phytozome. The C. arabica sequences returned by BLASTp were selected based on the following criteria: e-value ≤ 10⁻⁵, an extracellular domain corresponding to that of the reference sequence used (lysin motifs, LysM), and a transmembrane domain or GPI anchor. The domains were analyzed using SMART (http://smart.embl-heidelberg.de/), TMHMM 2.0 (http://www.cbs.dtu.dk/services/TMHMM/) and PredGPI (http://gpcr.biocomp.unibo.it/predgpi/pred.htm).
After the C. arabica sequences were selected, they were again compared to the reference sequences by phylogenetic analysis. This analysis made it possible to identify which peptide sequences were phylogenetically closest to the reference PRRs, thus allowing the selection of candidate sequences. Additionally, because these PRRs have very similar protein domains, a joint phylogenetic tree with the candidate sequences in C. arabica, the reference PRRs and their homologs (Table 1) was also created to confirm the separation of these groups and the homology of these sequences. The databases used to retrieve the reference sequences were GenBank from the National Center for Biotechnology Information (NCBI), the Arabidopsis Information Resource (TAIR), the Sol Genomics Network, the Apple Genome and Epigenome, and Phytozome. The complete amino acid sequences were aligned with the CLC Genomics Workbench software version 11.0.1 (QIAGEN) (default parameters, "very accurate" alignment mode), and the phylogenetic tree was generated with the MEGA software version 10.1.8 [38] using the Maximum Likelihood method with 1000 bootstrap replications.
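The selection criteria above can be sketched as a simple filter. The snippet below is a minimal illustration, not the pipeline used in the study: the `Hit` record, its field names and the example values are hypothetical, and it assumes that the SMART, TMHMM and PredGPI predictions have already been summarized as booleans for each hit.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


@dataclass
class Hit:
    # Hypothetical record summarizing one BLASTp hit and its domain predictions.
    seq_id: str
    e_value: float
    has_lysm: bool           # LysM motif predicted in the extracellular region
    has_transmembrane: bool  # transmembrane helix predicted
    has_gpi_anchor: bool     # GPI anchor signal predicted


def passes_criteria(hit: Hit) -> bool:
    """Apply the three criteria: e-value, LysM ectodomain, membrane attachment."""
    membrane_attached = hit.has_transmembrane or hit.has_gpi_anchor
    return hit.e_value <= E_VALUE_CUTOFF and hit.has_lysm and membrane_attached


def select_candidates(hits):
    """Return the identifiers of the hits that satisfy all criteria."""
    return [h.seq_id for h in hits if passes_criteria(h)]


# Made-up example values, for illustration only.
example_hits = [
    Hit("seq_A", 1e-40, True, True, False),
    Hit("seq_B", 1e-3, True, True, False),    # fails the e-value cutoff
    Hit("seq_C", 1e-20, True, False, True),   # GPI-anchored, still accepted
    Hit("seq_D", 1e-30, False, True, False),  # no LysM motif
]
selected = select_candidates(example_hits)
```

Accepting either a transmembrane domain or a GPI anchor mirrors the fact that the reference set contains both receptor-like kinases and a GPI-anchored protein (Os-CEBiP).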
To characterize the extracellular regions of the candidate sequences, the lysin motifs (LysM) were used for multiple alignments between the candidate and reference sequences. The LysM motifs of each sequence were predicted by SMART using the extracellular region and aligned by the MAFFT program online version (https://mafft.cbrc.jp/alignment/server/) [39]. After the alignment, the visualization and calculation of the identity and similarity of each of the candidate sequences against the reference sequences were obtained by BioEdit version 7.2.5 [40].
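As an illustration of what the identity and similarity percentages measure, the sketch below computes both from a pair of aligned sequences. It is not a reimplementation of BioEdit: the amino acid groups in `SIMILAR_GROUPS` are a simplified, hypothetical grouping (BioEdit derives similarity from a scoring matrix, so its values may differ), gapped columns are skipped, and similarity is counted here as identical plus similar positions.

```python
# Hypothetical similarity groups, chosen only for illustration.
SIMILAR_GROUPS = [
    set("ILVM"), set("FWY"), set("KRH"), set("DE"),
    set("NQ"), set("ST"), set("AG"),
]


def identity_similarity(aligned_a: str, aligned_b: str):
    """Return (% identity, % similarity) over columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        return 0.0, 0.0
    identity = 100.0 * identical / compared
    similarity = 100.0 * (identical + similar) / compared
    return identity, similarity


# Toy alignment: 3 identical and 2 similar positions over 5 compared columns.
identity, similarity = identity_similarity("ACDE-K", "ACEEGR")
```

Whether gapped columns are excluded from the denominator changes the percentages, so the convention should be stated whenever such values are compared across tools.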
Considering that C. arabica is an allotetraploid (2n = 4x = 44 chromosomes) that originated from natural hybridization between C. canephora and C. eugenioides [41,42], the sequences selected as PRR candidates for arabica coffee (variety Geisha from Phytozome) were also analyzed by BLASTp in the NCBI database (https://www.ncbi.nlm.nih.gov/) against the genome of C. arabica, Red Caturra cultivar (Cara_1.0, GenBank assembly accession: GCA_003713225.1). This genome was deposited after the beginning of this study and has its scaffolds anchored to the chromosomes of each ancestral subgenome. This analysis aimed to verify the possible genomic origin of the studied PRRs.
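The comparison described above amounts to asking, for each candidate, which subgenome holds its best-matching sequence. A minimal sketch follows; the function name, the candidate labels and the identity values are hypothetical and are not taken from Table 4.

```python
def assign_subgenome(best_identity):
    """Return the subgenome whose best hit has the highest percent identity.

    best_identity maps a subgenome label to the percent identity of the best
    BLASTp hit found on that subgenome. Ties are reported as "undetermined".
    """
    top = max(best_identity.values())
    winners = [name for name, pid in best_identity.items() if pid == top]
    return winners[0] if len(winners) == 1 else "undetermined"


# Made-up identities (%), for illustration only.
example = {
    "candidate_1": {"C. canephora": 99.1, "C. eugenioides": 97.4},
    "candidate_2": {"C. canephora": 96.8, "C. eugenioides": 99.5},
    "candidate_3": {"C. canephora": 98.0, "C. eugenioides": 98.0},
}
origins = {name: assign_subgenome(ids) for name, ids in example.items()}
```

A tie is reported explicitly rather than resolved arbitrarily, since small identity differences may reflect assembly quality rather than true origin.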

Primer design
The C. arabica sequences selected as candidates by the phylogenetic analysis were used for primer design. The primers were designed using the Primer Quest software and their quality was analyzed using the Oligo Analyzer software, both available online from IDT (Integrated DNA Technologies, USA). After the primers were designed, they were compared by BLASTn (Standard Nucleotide BLAST) against the NCBI (https://blast.ncbi.nlm.nih.gov/Blast.cgi) and Phytozome databases to confirm their specificity by verifying the absence of complementarity with non-target sequences.

Fungal inoculum preparation
The inoculum used was obtained from leaves of C. arabica naturally infected with H. vastatrix. The pustules of these leaves were scraped, and the spores were placed in microtubes, frozen in liquid nitrogen, and stored in a freezer at -80˚C. To prepare the inoculum, the stored spores were subjected to a 40˚C thermal shock for 10 min and suspended in sterile distilled water, and the suspension was adjusted to 1 × 10⁶ urediniospores/mL. The viability of the inoculum was verified by observing spore germination in glass cavity slides: after the suspension for plant inoculation was prepared, three drops were transferred to glass cavity slides, which were incubated at 25˚C for 48 hours. After incubation, the spores were examined under an optical microscope so that their germination could be observed (S1 Fig).
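Adjusting a suspension to the target concentration is a standard dilution calculation (C1·V1 = C2·V2). The sketch below is only a worked illustration: the function name, the stock concentration and the final volume are hypothetical, and the text does not state how the spores were counted.

```python
TARGET_SPORES_PER_ML = 1e6  # concentration stated in the text


def dilution_volumes(stock_spores_per_ml, final_volume_ml,
                     target=TARGET_SPORES_PER_ML):
    """Return (stock volume, water volume) in mL needed to reach the target."""
    if stock_spores_per_ml < target:
        raise ValueError("stock is already more dilute than the target")
    stock_ml = target * final_volume_ml / stock_spores_per_ml
    return stock_ml, final_volume_ml - stock_ml


# Hypothetical example: a stock counted at 4 x 10^6 spores/mL, 100 mL needed.
stock_ml, water_ml = dilution_volumes(4e6, 100.0)
```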

Plant materials, experimental design, and inoculation
To analyze the gene expression of the selected PRR candidates, seedlings of four C. arabica cultivars were used: two rust-susceptible cultivars, Catuaí Vermelho IAC 144 (CV) and Mundo Novo IAC 367-4 (MN), and two rust-resistant cultivars, Aranãs RV (AR) and Iapar-59 (IP). The experiment was conducted in a randomized complete block design (RCBD) with three replicates and an experimental plot consisting of three plants. The treatments were arranged in a 2 × 3 × 4 factorial scheme, the factors being condition (inoculated and not inoculated), evaluation time (6, 24 and 48 hours post-inoculation, hpi) and cultivar (Catuaí Vermelho IAC 144, Mundo Novo, Aranãs RV, and Iapar-59). The experiment was carried out twice independently. Young plants (3-4 pairs of leaves) were inoculated in a growth chamber with a controlled environment (temperature of 22 ± 2˚C, relative humidity of 90%) favoring disease development. The suspension was sprayed on the abaxial leaf surfaces and the inoculated plants were kept in the dark in a humid chamber according to a previously published methodology [43]. The control plants (sprayed with pure water only) were also sampled at all the evaluated time points. All the leaves collected were immediately frozen in liquid nitrogen and subsequently stored in a freezer at -80˚C. After the treatment and sampling, the plants were kept in a greenhouse until the first symptoms and signs of the pathogen were seen, to confirm that the inoculation was effective (S2 Fig).
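The size of the factorial layout can be checked by enumerating the treatment combinations. The sketch below assumes that each combination occupies one plot per block, which is the usual reading of an RCBD but is not spelled out in the text; the variable names are illustrative.

```python
from itertools import product

conditions = ["inoculated", "non-inoculated"]
times_hpi = [6, 24, 48]
cultivars = ["Catuaí Vermelho IAC 144", "Mundo Novo IAC 367-4",
             "Aranãs RV", "Iapar-59"]
N_BLOCKS = 3         # replicates (blocks) per experiment
PLANTS_PER_PLOT = 3  # plants in each experimental plot

# Every combination of condition, time and cultivar is one treatment.
treatments = list(product(conditions, times_hpi, cultivars))

# Assumption: one plot per treatment in each block.
n_plots = len(treatments) * N_BLOCKS
n_plants = n_plots * PLANTS_PER_PLOT
```

Under this assumption, one experiment comprises 24 treatments, 72 plots and 216 plants.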

RNA extraction and quantification
The leaf samples were ground in liquid nitrogen until a fine powder was obtained. The ground material was stored in an ultrafreezer at -80˚C until the RNA extraction was performed. The extraction was performed using the Plant RNA Purification Reagent (Thermo Fisher). Subsequently, the RNA was treated with DNase (RQ1 RNase-Free DNase, Promega) to remove any residual DNA in the sample. These procedures were performed according to the manufacturers' instructions. The integrity of the RNA was verified on a 1% agarose gel and the RNA was quantified on the NanoDrop One spectrophotometer (Thermo Fisher). All samples used showed absorbance ratios of 1.8-2.0 at 260/280 nm and 260/230 nm, indicating high-quality RNA.

cDNA synthesis and RT-qPCR
An aliquot containing 1 μg of total RNA (treated with DNase) was used for cDNA synthesis using the High-Capacity cDNA Reverse Transcription Kit with RNase Inhibitor (Thermo Fisher). After the synthesis, the cDNA was diluted 5x and stored at -20˚C. The RT-qPCR reactions were performed in the QuantStudio 3 Real-Time PCR System (Applied Biosystems) using the SYBR Green detection system. The amplification conditions were 50˚C for 2 min and 95˚C for 10 min, followed by 40 cycles of 95˚C for 15 s and 60˚C for 1 min, and a final step of 95˚C for 15 s (melting curve). The final reaction volume was 10 μL and contained the following components: 1. For each of the three biological samples, technical triplicates were used, and for each plate an inter-assay sample was used to ensure the reproducibility of the technique. The relative quantification was calculated according to the formula by Pfaffl, 2001 [44]. For data normalization, the expression stability of four reference genes was analyzed: protein 14-3-3 (14-3-3), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), ribosomal protein 24S (24S) and elongation factor 1α (EF1-α) [45-48]. The efficiency correction of the Cq values of these genes was performed with the GenEx Enterprise program (version 7.0) and the stability was verified with the RefFinder tool [49]. The two most stable genes were 14-3-3 and GAPDH (S3 Fig), which were used to normalize the transcription levels of the target genes. The samples with the lowest expression were used as calibrators. The MN 48 hpi sample was used as calibrator, except for Ca1-CERK1 (experiment 2), for which the IP 48 hpi sample was used. The PCR amplification efficiencies and linear regression coefficients were determined using the LinRegPCR software version 2018.0 (Table 2) [50]. The average expression was obtained as the ratio of the sample inoculated with H. vastatrix to the average of the control treatment (without inoculation).
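The efficiency-corrected relative quantification can be sketched as follows. This is a minimal illustration of the Pfaffl ratio, in which efficiencies are expressed as amplification factors (2.0 corresponds to 100% efficiency). Combining the two reference genes through the geometric mean of their factors is an assumption made here for illustration, since the text does not state how they were combined; the function name and the Cq values are hypothetical.

```python
from math import prod


def pfaffl_ratio(e_target, cq_target_cal, cq_target_sample, refs):
    """Efficiency-corrected expression ratio of a sample versus a calibrator.

    e_target is the amplification factor of the target gene.
    refs is a list of (amplification factor, Cq calibrator, Cq sample),
    one tuple per reference gene.
    """
    target_factor = e_target ** (cq_target_cal - cq_target_sample)
    ref_factors = [e ** (cq_cal - cq_smp) for e, cq_cal, cq_smp in refs]
    # Assumption: reference genes are combined by their geometric mean.
    ref_geomean = prod(ref_factors) ** (1.0 / len(ref_factors))
    return target_factor / ref_geomean


# Hypothetical Cq values: the target amplifies 3 cycles earlier in the sample,
# while both reference genes are unchanged.
ratio = pfaffl_ratio(2.0, 25.0, 22.0, [(2.0, 20.0, 20.0), (2.0, 18.0, 18.0)])
```

With measured efficiencies below 2.0, the same Cq shift yields a smaller ratio, which is why the correction matters when primer pairs differ in efficiency.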

Statistical analysis
The relative expression data of the two experiments were subjected to analysis of variance, using the following model:

$$y_{bkiw} = \mu + B_{b(k)} + E_k + C_i + T_w + (EC)_{ki} + (ET)_{kw} + (CT)_{iw} + (ECT)_{kiw} + e_{kiw}$$

where $y_{bkiw}$ is the observed relative expression; $\mu$ is the overall mean; $B_{b(k)}$ is the effect of block b within experiment k; $E_k$ is the effect of experiment k; $C_i$ is the effect of cultivar i; $T_w$ is the effect of time w; $(EC)_{ki}$ is the effect of the interaction between experiment k and cultivar i; $(ET)_{kw}$ is the effect of the interaction between experiment k and time w; $(CT)_{iw}$ is the effect of the interaction between cultivar i and time w; $(ECT)_{kiw}$ is the effect of the interaction between experiment k, cultivar i and time w; and $e_{kiw}$ is the effect of the experimental error, $e_{kiw} \sim N(0, \sigma_e^2)$. Outliers and the assumptions on the model residuals were checked using diagnostic plots in the R software [51].
The interaction between cultivar and time was decomposed and the means of the factor levels were compared by Tukey's test at the 5% significance level. Data analysis was performed using the R software [51].
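As a rough check on the size of this analysis, the degrees of freedom implied by the model can be tallied. The sketch assumes one relative expression value per block, cultivar and time in each experiment (the inoculated-to-control ratio), giving 2 × 3 × 4 × 3 observations per gene; this count is an inference from the design and is not stated explicitly in the text.

```python
# Number of levels of each factor, taken from the experimental design.
E, B, C, T = 2, 3, 4, 3  # experiments, blocks, cultivars, times

df = {
    "experiment": E - 1,
    "block within experiment": E * (B - 1),
    "cultivar": C - 1,
    "time": T - 1,
    "experiment x cultivar": (E - 1) * (C - 1),
    "experiment x time": (E - 1) * (T - 1),
    "cultivar x time": (C - 1) * (T - 1),
    "experiment x cultivar x time": (E - 1) * (C - 1) * (T - 1),
}

# Assumption: one observation per block, cultivar and time in each experiment.
n_obs = E * B * C * T
model_df = sum(df.values())
residual_df = (n_obs - 1) - model_df
```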

Identification and characterization of specific fungal PRR in the C. arabica genome
The BLASTp analysis in Phytozome with the reference PRRs resulted in 4, 10, 12 and 14 sequences in the C. arabica genome for Os-CEBiP, At-LYK5, At-CERK1 and At-LYK4, respectively (Fig 1 and S1 Table). These sequences were selected because they have an e-value ≤ 10⁻⁵, an extracellular region containing lysin motifs (LysM), and a transmembrane domain or GPI anchor. After the phylogenetic analysis, two candidate sequences were selected for LYK4 (Scaffold 612.376 and 952.320) and for LYK5 (Scaffold 628.522 and 1841.91) (Fig 1B and 1D and S1 Table), and four for CERK1 (Scaffold 539.592, 1805.113, 2193.164 and 476.38) (Fig 1A and S1 Table). As the phylogenetic analysis for candidate sequences to the CEBiP protein did not result in significant bootstrap support (Fig 1C), other proteins belonging to the LYP clade (CEBiP-like) described in Arabidopsis and rice were included in a new analysis: At-LYP1 (At-CEBiP / LYM2), At-LYP2 (LYM1), At-LYP3 (LYM3), Os-LYP4 and Os-LYP6 (Table 1).
The new phylogenetic analysis for CEBiP (Fig 1E) showed two distinct clades. Clade one included the sequences Scaffold 506.17 and 1856.2, whereas clade two included the Scaffold sequences 439.212 and 1196.90. Because the latter showed greater similarity with the Os-CEBiP homologue in A. thaliana (At-LYP1), they were selected as candidate sequences for the CEBiP-like receptor (Fig 1 and S1 Table). Moreover, At-LYP2 (LYM1) and At-LYP3 (LYM3), belonging to clade one, are described in the literature for their ability to recognize peptidoglycan, a bacterial PAMP [52]. These sequences formed the nearest clade to the Scaffold 506.17 and 1856.2 sequences, substantiating the choice of the two C. arabica sequences belonging to clade two. Os-LYP4 and Os-LYP6, which play a dual role, recognizing peptidoglycan and chitin [25], were not evaluated in this study.
All the domains found in the coffee candidate sequences correspond to the characteristic domains of the reference sequences. The description of these sequences, including identity and similarity in relation to the reference sequences as well as the gene size, the CDS and the number of exons, is shown in Table 3. The candidate sequences for CERK1, LYK4 and LYK5 have an extracellular LysM domain (with three LysM motifs), a transmembrane domain, and an intracellular kinase domain.
The extracellular lysin motif regions (LysM1, LysM2 and LysM3) of these sequences ranged from 38 to 49 aa. The multiple alignments of these regions with the reference proteins showed high residue conservation, which varied among the studied receptors (Fig 3). Of the eleven residues described as important for the chitin oligomer binding function in At-CERK1 [53,54], eight displayed identity or similarity with the candidate sequences in C. arabica. For Os-CEBiP, only three of the nine residues described [55] were present. In At-LYK5, only one of the three described [23] showed similarity with C. arabica sequences. The tyrosine (Tyr) residue located at position 128 in At-LYK5, considered the fourth chitin-binding residue for this receptor, was not analyzed because it lies between the LysM1 and LysM2 motifs, a region that was not included in the alignment.

Joint phylogenetic analysis and BLASTp against the genome of C. arabica, Caturra red cultivar
A joint phylogenetic tree was created to verify whether the candidate sequences would form distinct clades, including the reference sequences used. This tree was composed of the selected candidate sequences for PRRs in C. arabica, the reference sequences used to search for these PRRs in coffee (At-CERK1, At-LYK4, At-LYK5 and Os-CEBiP) and homologs of these proteins described experimentally in the literature (Table 1). This analysis formed four clades that separated the candidate sequences in coffee with the respective reference proteins used, confirming their phylogenetic relationships (Fig 4).
Clades II and III, corresponding to LYK4 and LYK5, were the most closely related to each other. The coffee LYK4 sequences grouped most closely with the LYK4 homologues in grape, and the coffee LYK5 sequences formed a subclade with the reference sequence At-LYK5 and its homolog in grape (Vv-LYK5-1). In clade IV, belonging to the CEBiP cluster, the candidate sequences in coffee grouped significantly with the Os-CEBiP homologs. The BLASTp analysis in the NCBI database against the genome of C. arabica (Red Caturra cultivar) showed that six candidate sequences for PRRs in C. arabica (variety Geisha) have a greater percentage of identity with sequences belonging to the C. eugenioides subgenome and four have greater identity with the C. canephora subgenome (Table 4). This analysis showed that, for LYK5 and LYP, each of the two candidate sequences had greater identity with a sequence from a different subgenome, and the same was true for the two sets of CERK1 sequences (subclades I and II, Fig 1). For LYK4, both candidate sequences had greater identity with a sequence in the C. eugenioides subgenome.

Primer design
The four sequences selected as candidates for CERK1 in the C. arabica genome by phylogenetic analysis formed two distinct subclades (Fig 1A). For the primer design in the gene expression analysis, the formation of these two subclades was considered, and one primer pair was used for each subclade. They were named Ca1-CERK1 and Ca2-CERK1, respectively, and are referred to as such in the gene expression analysis (Table 2).

Fig 3. Alignment of the LysM motifs between reference sequences and candidate sequences in C. arabica. The LysM motif sequences were aligned using MAFFT and visualized by BioEdit. The numbers at the beginning of each sequence represent the scaffold (candidate sequence in C. arabica). The green line highlights the reference sequence. The purple and gray shading represent identical and similar amino acids, respectively. The percentages of identity and similarity between candidate sequences and references are indicated by * and **, respectively. In red are the critical residues that bind to chitin, and the green arrows indicate residues identical or similar to these regions present in the candidate sequences in C. arabica. The numbers at the end of each sequence represent the size of the LysM motifs in number of amino acids.

Transcriptional response of candidate receptors in C. arabica
To verify the transcriptional responses of the candidate sequences for PRRs in C. arabica, four cultivars with contrasting rust resistance levels were inoculated with H. vastatrix. The inoculum was viable in both tests: the germination test on glass cavity slides (S1 Fig) and the test of its ability to cause disease symptoms and signs in the susceptible cultivars CV and MN (S2 Fig). The resistant cultivars AR and IP presented no symptoms or signs of the disease. The fungal inoculation induced the expression of all candidate receptors in all cultivars and at all studied time points. To a greater or lesser degree, there was an increase in expression from 6 hpi (Fig 5), with the peak varying between 6 and 24 hpi, followed by a decrease at 48 hpi.
The two groups of candidate sequences for CERK1 showed different expression profiles (Fig 5A and 5B) at 24 hpi. Ca1-CERK1 had higher expression than Ca2-CERK1. For the former, expression was seven times higher than that of the control in cultivar MN; for the latter, the highest value, observed in IP, did not reach twice that of the control. When the expression levels over time were analyzed for each cultivar in the two groups (Fig 5A and 5B), there was a significant difference at 24 hpi, except for Ca2-CERK1 in CV. For Ca1-CERK1, the analysis between cultivars (Fig 5A) showed that IP and MN displayed approximately 6- and 7-fold higher expression levels at 24 hpi, respectively, differing significantly from AR and CV. No significant difference was observed at 6 and 48 hpi. For Ca2-CERK1 (Fig 5B), the analysis between cultivars showed that at 6 hpi it was most expressed in CV and MN. At 24 hpi, the highest expression was in IP, and at 48 hpi the same cultivar showed a reduction in expression and had the lowest expression among the cultivars.
A similar profile to CERK1 was observed for the sequences studied as candidates for LYP and LYK5 (Fig 5C-5F). Ca1-LYP and Ca2-LYK5 reached higher expression levels at 24 hpi than Ca2-LYP and Ca1-LYK5; for these genes, however, the candidate sequences were analyzed separately. Considering Ca1-LYP and Ca2-LYP (Fig 5C and 5D), the expression patterns were different at 6 and 24 hpi. The Ca1-LYP expression levels did not reach twice those of the control at 6 hpi, while for Ca2-LYP the highest averages were observed at that time. Moreover, for Ca1-LYP, all cultivars showed expression more than twofold higher at 24 hpi. Therefore, the greatest inductions for Ca2-LYP occurred at 6 hpi, while for Ca1-LYP they happened later, at 24 hpi. The expression differences over time for each cultivar considering Ca1-LYP (Fig 5C) showed that AR and IP had significant differences at 24 hpi, which did not occur in CV and MN. The analysis between cultivars showed that at 6 hpi and 48 hpi there were no differences, but that at 24 hpi IP was the cultivar with the highest expression, reaching a 6-fold increase. Considering Ca2-LYP (Fig 5D), AR and CV showed higher expression at 6 hpi. For IP and MN, the highest expression occurred at 6 and 24 hpi, with no difference between these times. The analysis between cultivars showed that at 6 hpi AR had the highest expression while IP had the lowest. On the other hand, at 24 and 48 hpi, there were no differences between cultivars. However, 48 hpi was the time with the lowest average observed, within and between cultivars.

Table 4. BLASTp analysis of candidate sequences in C. arabica (Geisha) against C. arabica (Red Caturra).

PLOS ONE

For Ca1-LYK5 (Fig 5E), there was a difference between the times for all cultivars, except for AR. The MN cultivar had the highest average at 6 hpi, while IP had the highest at 24 hpi. For the cultivar CV, there were no differences between these two times; both differed only from 48 hpi. In the analysis between cultivars, MN had the highest average at 6 hpi and IP at 24 hpi. At 48 hpi, there were no differences between cultivars, and this time presented the lowest average for all of them. For Ca2-LYK5 (Fig 5F), all cultivars showed differences between the evaluated times, except for CV. The AR and IP cultivars showed significantly higher averages at 24 hpi than at 6 and 48 hpi, with expression about six and eight times higher than that of the control, respectively. For MN, the highest average was also detected at 24 hpi, but it did not differ statistically from that at 6 hpi, only from that at 48 hpi. Between cultivars, there were differences only at 24 hpi, with AR and IP having the highest expression.
The values for Ca-LYK4 were the result of a single primer pair designed for the two candidate sequences. For this receptor, the expression levels at 24 hpi differed within and between the cultivars evaluated. The IP cultivar had the highest average expression, almost 19 times higher than that of the control, followed by MN, with ninefold higher expression. The lowest averages for that time were observed for CV and AR, with expression seven- and sixfold higher, respectively. At 6 and 48 hpi there was no difference within or between cultivars, and the averages for those times were at most twice that of the control.

Fungal PRRs in the C. arabica genome
Understanding basal immunity has been the focus of several studies aimed at identifying the mechanisms governing this line of defense, enabling its use as another tool in the search for plant resistance to pathogens [17]. The description of the PRRs and the study of the modulation of their gene expression in response to H. vastatrix, one of the most devastating pathogens of coffee trees, represent an advance in understanding the basal immunity of this crop. In the present study, candidate sequences for fungal PRRs that are well described in the literature for model plants such as Arabidopsis and rice were studied in C. arabica. We observed that there is more than one candidate sequence for each receptor studied, which may be the result of the ploidy of this species or of duplication of these receptors, a common mechanism in plant genomes [56].
Each of the candidate sequences for LYK5 and LYP (CEBiP-like) presented a higher percentage of identity with one of the C. arabica subgenomes. Therefore, it is possible to infer that one gene of each pair may have come from each of the parental genomes (Table 4). For LYK4, both candidate sequences showed greater identity with the C. eugenioides subgenome, which may indicate duplication events. For CERK1, two sequences had a higher percentage of identity with a sequence from the C. canephora subgenome (subclade I), and the other two (subclade II) with a sequence from the C. eugenioides subgenome. For this receptor, we suggest that both events occurred: besides having a gene from each of the subgenomes, a duplication event of these genes may also have occurred in C. arabica (variety Geisha). However, differences in the quality of the C. arabica genomes (Geisha and Red Caturra) may also affect this conclusion.
The size of the CDS and the organization of exons demonstrated that the genes encoding the LYK4 and LYK5 candidate proteins in C. arabica do not have introns, and the coding sequences consist of a single exon. In fact, compared to CERK1 or CEBiP, these receptors are closer to each other in the phylogenetic analysis. These results (Fig 4) corroborate others described in the literature [53,57] and show a closer evolutionary relationship between these receptors. Homologs of At-LYK4 and At-LYK5 in many plant species have no introns and their coding region consists of a single exon [24,57-60]. For LysM receptors homologous to At-CERK1, the CDS is mostly around 1800 bp long with ten to twelve exons [28,53,61], which is consistent with the CDS size and number of exons found for the CERK1 candidate sequences in coffee, except for Scaffold 539.592, which has a larger coding region, with 2511 bp and 13 exons. However, thirteen exons have also been found in Ps-LYK9, a CERK1-like gene in pea, which is involved in the control of plant immunity and symbiosis formation [61].
Regarding the LYP genes (receptor-like proteins or RLPs) such as Os-CEBiP, the number of exons reported is more variable, from two to six [22,57,62]. In C. arabica, Scaffold 1196.90 and 439.212 presented four and five exons, respectively. The structural pattern of genes, such as the distribution of introns and exons in gene families, supports the identification of orthologs, since this pattern is largely conserved among orthologous genes. Minor differences may be due to evolutionary changes or errors in gene structure predictions [58].

Characterization of domains and motifs (LysM)
Proteins with a LysM domain classified as LYKs (receptor-like kinases or RLKs) are composed of ectodomains containing lysin motifs (LysM), a transmembrane domain and an intracellular kinase. LYP proteins (RLPs), on the other hand, have a LysM ectodomain but no intracellular kinase, and can be anchored to the plasma membrane by a transmembrane domain or a GPI anchor [57,63]. At-CERK1, At-LYK4 and At-LYK5 contain three extracellular LysM motifs, a transmembrane domain and an intracellular kinase, while Os-CEBiP has two extracellular LysM motifs and a GPI anchor [21-23]. The SMART and PredGPI analyses predicted that the amino acid sequences of the PRRs studied in C. arabica have a signal peptide, extracellular LysM motifs, and a transmembrane domain or a putative signal sequence for the GPI anchor, with or without an intracellular kinase. These characteristics separate them into LYKs (Ca1- and Ca2-CERK1, Ca1- and Ca2-LYK5 and Ca-LYK4) and LYPs (Ca1- and Ca2-LYP) (Fig 2) and suggest that they all act as membrane receptors.
As a result of the organization of the domains, these proteins differ in size. LYKs are generally larger than LYPs because they have an additional kinase domain. Protein sequences reported for these classes of receptors are around 500 to 600 aa and 300 to 400 aa, respectively [22,57,64]. Candidate sequences in coffee have equivalent sizes, except for Scaffold 539.592 with 836 aa, which may be a consequence of the size of its coding region.
The extracellular LysM regions of PRRs vary in size in plants, from 35 to 50 aa [56,57]. These regions define the type of PAMP recognized and its binding affinity, in addition to the interaction between receptors and co-receptors [65]. Differences in the chitin-binding properties between the At-CERK1 and Os-CERK1 ectodomains show that these receptors perform differently in Arabidopsis and rice. At-CERK1 and At-LYK5, for instance, bind directly to chitin through their ectodomains containing LysM motifs, with different affinities for the ligand, while At-LYK4 appears to be a co-receptor [21,23,66]. In rice, Os-CERK1 does not bind to chitooligosaccharides, and heterodimerization between Os-CERK1 and Os-CEBiP is necessary for the innate immune response in this species [20,67]. These distinct roles suggest that plants use different chitin binding and signaling strategies [24,68].
In C. arabica, this region varied from 38 to 49 aa and the candidate sequences showed a high degree of identity and/or similarity with the reference LysM sequences used, indicating a conserved extracellular structure [53,55]. For CERK1, eight residues reported as important for chitin binding in Arabidopsis are present in the Scaffold 2193.164 and Scaffold 476.38 sequences (six identical and two similar), suggesting that they can bind chitin. However, complementary data are still needed to clarify which would be the primary receptor and which the co-receptor in the innate immunity of this species, and further studies of chitin-receptor and receptor-receptor interactions are required.

Joint phylogenetic analysis
PRRs are conserved in several plant species [58]. This conservation indicates a fundamental importance of the PAMP recognition system [25]. The joint phylogenetic analysis showed that the sequences selected as candidates for CERK1 in coffee were highly related to Md-CERK1, Md-CERK1-2, Ps-LYK9, Mm-LYK2, Vv-LYK1-1, Vv-LYK1-2, Os-CERK1 and At-CERK1 (Fig 4). All of these proteins have been described as being involved in the defense against fungal pathogens [20, 21, 28-30, 53, 61], suggesting that the studied sequences also participate in the defense responses against this group of phytopathogens. Among the species compared, tomato and grape have greater evolutionary proximity to coffee. Bti9 (Sl-LYK1), a CERK1 homolog in tomato, which grouped more closely to the Scaffold 2193.164 and 476.38 sequences (Ca2-CERK1) in this clade, presents an identity of 58.6% with At-CERK1 [69]. Candidate sequences in coffee, however, showed around 57% identity (Table 3).
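Identity percentages such as those compared above are computed over a pairwise alignment. The sketch below shows one common convention, in which identical positions are divided by the number of aligned columns without gaps. The function name `percent_identity` and the toy sequences are hypothetical, and alignment tools may use a different denominator, so the values it returns need not match those in Table 3.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (one convention among several)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, invented for illustration only.
seq_a = "MKL-SVAG"
seq_b = "MRLTSV-G"
print(round(percent_identity(seq_a, seq_b), 1))
```

Counting gapped columns in the denominator instead would lower the reported value, which is one reason identity figures from different studies are not always directly comparable.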
Bti9 (Sl-LYK1) in tomato interacts with AvrPtoB, an effector of Pseudomonas syringae. The kinase region of Bti9 is the target of this effector, which results in the blocking of PTI signaling [69]. Although Bti9 was described as a bacterial effector target, neither the study by Zeng et al., 2012 [69] nor the later report by Xin and He, 2013 [70] described the interaction of this protein with chitin or its transcriptional profiles in response to fungal pathogens. Nonetheless, Bti9 is a membrane receptor with extracellular LysM motifs and high homology to At-CERK1. Furthermore, At-CERK1 and Os-CERK1, besides playing a role as receptors for fungal PAMPs, also participate as co-receptors for PRRs in bacterial recognition [52,71], which demonstrates the multiple functions of this receptor and makes it a possible target of bacterial and fungal effectors that suppress PTI.
Ca-LYK4, Ca1-LYK5 and Ca2-LYK5, in clades II and III, grouped with the grape receptors Vv-LYK4-1/2 and Vv-LYK5-1 (Fig 4). These grape receptors were shown to be highly expressed during infection by Botrytis cinerea in grapevine fruits [53]. The clustering of Bd-LYK4 in this clade corroborates the results presented by Tombuloglu et al., 2019 [57] for this PRR described in the Brachypodium genome, which presented a greater phylogenetic relationship to At-LYK5. In clade IV, Ca1-LYP and Ca2-LYP grouped with Mm-LYP1, in addition to other homologs. Mm-LYP1 is a receptor described in white mulberry that, besides having a high affinity for chitin, displays a significant increase in transcript levels in fruits and leaves of mulberry affected by popcorn disease. Mm-LYP1 interacts with Mm-LYK2, a homolog of At-CERK1 present in clade I and grouped with the candidate sequences for CERK1 in C. arabica. Mm-LYK2 does not have a high affinity for chitin, but it functions as a co-receptor with an intracellular kinase for PTI signaling [30]. Additionally, in this clade, Hv-CEBiP in barley has been described as recognizing chitin oligosaccharides derived from Magnaporthe oryzae [27], and Mt-LYM2, in Medicago truncatula, demonstrated specific binding to biotinylated N-acetylchitooctaose in a similar way to CEBiP in rice [22,62]. Thus, the receptors cited for the phylogenetic groupings of this study reinforce the possible role of the candidate sequences in C. arabica as PAMP receptors.

Transcriptional response of candidate receptors in C. arabica
PAMPs are defined as highly conserved molecules from microorganisms that have an essential function in their survival or fitness [72,73]. It is suggested that, since PAMPs are essential for the viability or lifestyle of microorganisms, it is less likely that microorganisms avoid host immunity through mutation or deletion in these regions [14,74]. Chitin is a PAMP present in the fungal cell wall. Fragments of N-acetylchitooligosaccharides are released by the breakdown of this PAMP by plant chitinases during plant-fungus interactions. These fragments serve as elicitors for the innate immunity of plants by modifying the transcriptional levels of PRRs [22].
In this study, increases in expression were detected from 6 hpi, showing that all candidate PRRs were stimulated after inoculation with H. vastatrix. The highest mean expression levels were observed at 24 hpi for most receptors, followed by a decrease at 48 hpi (Fig 5). These results describe an initial stimulus with subsequent suppression. Experiments have shown that at 24 hpi it is already possible to detect the penetration of the hypha produced by the appressorium of H. vastatrix into stomata of coffee leaves, in both resistant and susceptible genotypes, and that at 48 hpi the presence of haustoria is already observed [75-77]. In addition, an LRR receptor-like kinase described in this pathosystem has a peak of expression at 24 hpi in compatible and incompatible interactions [78], suggesting that signal exchange between the two organisms is already occurring in this period.
To inhibit PTI, some fungal pathogens secrete proteins containing LysM motifs that compete with plant receptors [79,80]. These proteins seem to impede the detection of chitin polymers or interfere with the functioning of essential molecules in the downstream signaling of basal immunity. The decrease in PRR expression observed in C. arabica leaves at 48 hpi may therefore be related to the suppression of PTI signaling. Fungal effectors such as Ecp6 and ChELP1/2 bind to chitin oligosaccharides released by the action of chitinases and prevent their recognition by the host PRRs [79,81], while effectors like Avr4 protect the chitin of fungal cell walls from degradation by host chitinases [82]. In addition, a study of the H. vastatrix secretome showed that effector candidates expressed in the incompatible interaction (resistance) were more abundant within 24 hours, suggesting that these pre-haustorial effectors could be involved in the attempt to suppress PTI [83].
The expression profiles of the candidate receptors did not differ between the groups of resistant and susceptible cultivars. Although IP showed high levels of expression at 24 hpi for the transcripts Ca1-LYP, Ca2-LYK5 and Ca-LYK4, the susceptible cultivar MN showed equivalent levels of expression for Ca1-CERK1 and Ca2-LYP, and MN and CV showed levels comparable to, or even higher than, those of the resistant cultivar AR for Ca2-CERK1, Ca2-LYP, Ca1-LYK5 and Ca-LYK4 (Fig 5). This result was expected, since basal immunity is characterized by being broad-spectrum and non-specific [12,17]. The resistance of coffee to rust has been reported as pre-haustorial in some genotypes [77,84], in which resistant plants halt the growth of the fungus through mechanisms of pathogen recognition by resistance proteins. Thus, the difference between resistant and susceptible cultivars is generally evidenced in studies of the expression of genes involved in pathogen-specific pathways and not in broad-spectrum receptors, such as PRRs [84].
Additionally, the recognition and signaling of PAMPs occur when PRRs associate and act as part of multiprotein immune complexes on the cell surface [85,86]. Although they share common structural characteristics, these receptors are distinct in terms of expression patterns and recognized epitopes [23,25,52,62]. This suggests that the receptors' roles have evolved independently in different groups of plants [25,71]. Therefore, considering that all candidate receptors in coffee described in this study increased their expression from 6 hpi in all evaluated cultivars, each one may play a role in the basal immunity of C. arabica.

Conclusion
The results indicate that candidate sequences in C. arabica have protein domains and motifs characteristic of fungal PRRs and are homologous to At-CERK1, At-LYK4, At-LYK5 and Os-CEBiP. Additionally, the expression of these genes increased after inoculation with H. vastatrix at all time points and in all cultivars evaluated. Therefore, this study presents an advance in the understanding of the basal immunity of this species. Furthermore, the characterization of PTI receptors in C. arabica opens new perspectives and deserves further study. Assays with purified chitin, for example, will make it possible to unveil in more detail the mechanisms of basal defense signaling in coffee, defining the binding affinities of this PAMP for each of the studied receptors, their co-receptors and the components of the signaling pathway. Gene knockout studies will define the importance of these receptors in the defense of coffee against rust, in addition to clarifying the role of PTI signaling events in the specific ETI responses. Genetic engineering approaches to improve the role of these receptors in the defense response also represent a possibility. Increasing the binding affinity for the target PAMP, for example, could enhance broad-spectrum resistance in coffee, not only to rust but also to other fungal pathogens.