Genome-wide identification of DCL, AGO and RDR gene families and their associated functional regulatory elements analyses in banana (Musa acuminata)

RNA silencing is mediated by the RNA interference (RNAi) pathway gene families, i.e., Dicer-Like (DCL), Argonaute (AGO), and RNA-dependent RNA polymerase (RDR), and their cis-acting regulatory elements. The RNAi pathway is also directly connected with the post-transcriptional gene silencing (PTGS) mechanism, and it controls eukaryotic gene regulation during growth, development, and stress response. Nevertheless, genome-wide identification of the RNAi pathway gene families DCL, AGO, and RDR, and analysis of their regulatory networks with transcription factors, have not been carried out in many fruit crop species, including banana (Musa acuminata). In this study, we performed an in silico genome-wide identification and characterization of DCL, AGO, and RDR genes in banana using integrated bioinformatics approaches. Based on multiple sequence alignment and phylogenetic tree analysis, we identified 3 MaDCL, 13 MaAGO, and 5 MaRDR candidate genes related to the RNAi pathway in the banana genome. These genes correspond to the Arabidopsis thaliana RNAi silencing genes. The analysis of conserved domains, motifs, and gene structure (exon-intron numbers) for the MaDCL, MaAGO, and MaRDR genes showed higher homogeneity within the same gene family. Gene Ontology (GO) enrichment analysis indicated that the identified RNAi genes could be involved in RNA silencing and associated metabolic pathways. A number of important transcription factor (TF) families, e.g., ERF, Dof, C2H2, TCP, GATA, and MIKC_MADS, were identified by network and sub-network analyses between TFs and candidate RNAi gene families. Furthermore, cis-acting regulatory elements related to light-responsive (LR), stress-responsive (SR), hormone-responsive (HR), and other activities (OT) functions were identified in the candidate MaDCL, MaAGO, and MaRDR genes.
These genome-wide analyses of the RNAi gene families provide valuable information on RNA silencing, which should shed light on further characterization of RNAi genes, their regulatory elements, and their functional roles, and might be helpful for banana improvement in breeding programs.

characterized in many economically important plant species such as 28 genes in maize (Zea mays) [18] and tomato (Solanum lycopersicum) [35], 32 genes in rice (Oryza sativa) [
Banana (Musa acuminata) is a perennial, monocotyledonous major fruit crop grown throughout tropical and sub-tropical countries, especially in the African, Asia-Pacific, and Latin American and Caribbean regions [58,59]. Banana contains vitamins (vitamin A, vitamin C, vitamin B6, and vitamin B12), antioxidants, minerals, fiber, starch, sugar, and cellulose. Previous studies showed a significant impact of banana antioxidants on various diseases, for example, anti-hypertension, anti-cancer, anti-diabetes, anti-coronary disease, and anti-diarrhea effects, and defense against infectious disease [60,61]. Thus, various agronomic traits of banana, such as biotic and abiotic stress tolerance, disease and pest resistance, and fruit quality, are of considerable interest to plant breeders. However, the improvement of banana through breeding programs has been a great challenge to breeders for various reasons. Therefore, genetic engineering approaches play a vital role in crop improvement.
So far, characterization and expression analysis of target RNAi pathway genes have been conducted comprehensively in many plant species such as rice [ [48]. However, this approach is limited by the need for skilled personnel, a well-equipped laboratory, extended time, and a substantial experimental budget. As a complement to laboratory-based analysis of RNAi genes, genome-wide information can be obtained for many plant species using integrated bioinformatics approaches, which may save cost, labor, and time.
Therefore, in the present study, we performed a comprehensive in silico analysis for genome-wide identification of the RNAi pathway gene families DCL, AGO, and RDR in the banana genome through bioinformatics approaches, including sequence similarity, phylogenetic relationships, gene structure (domain, motif, and exon-intron numbers), chromosomal localization, GO, sub-cellular localization, regulatory network and sub-network analysis between TFs and candidate genes, and prediction of cis-acting regulatory elements. The complete banana genome sequence [69] provides an excellent opportunity to identify the putative genes related to the RNAi pathway in the entire banana genome, which would be a useful resource for banana improvement programs in the future. Our approach is summarized graphically in Fig 1.
The candidate protein sequences from M. acuminata were retrieved using a significant alignment score (≥50) based on the BLOSUM62 matrix, identity percentage (≥30%), coverage percentage (≥70%), and significant E-values (≤10E-10). Only the primary transcript of each sequence was considered, to avoid redundancy of protein sequences in this study. Genomic information, including the primary transcript, genomic length, chromosomal position of each gene, length of the open reading frame (ORF), and protein length, was downloaded from the M. acuminata genome database deposited in Phytozome. The molecular weight of the identified protein sequences was predicted using the ExPASy Compute pI/Mw tool (https://web.expasy.org/). The DCL, AGO, and RDR genes identified in the M. acuminata genome were named according to their phylogenetic relationships with members of the same gene families previously named in A. thaliana.
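The retrieval thresholds above can be expressed as a simple filter. The following is a minimal sketch only: the `Hit` record, its field names, and the toy hit identifiers are hypothetical and are not the output format of any particular search tool; the E-value cutoff is written as `10e-10` to mirror the notation in the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One similarity-search hit (hypothetical record layout)."""
    subject_id: str
    score: float         # alignment score
    identity_pct: float  # percent identity
    coverage_pct: float  # percent query coverage
    evalue: float


def passes_thresholds(hit, min_score=50.0, min_identity=30.0,
                      min_coverage=70.0, max_evalue=10e-10):
    """Return True if the hit meets all four retrieval criteria."""
    return (hit.score >= min_score
            and hit.identity_pct >= min_identity
            and hit.coverage_pct >= min_coverage
            and hit.evalue <= max_evalue)


def filter_hits(hits):
    """Keep only hits that satisfy every threshold, preserving order."""
    return [h for h in hits if passes_thresholds(h)]


# Toy hits with made-up identifiers and values, for illustration only.
toy_hits = [
    Hit("toy_pass", 120.0, 45.0, 88.0, 1e-30),
    Hit("toy_low_identity", 120.0, 25.0, 88.0, 1e-30),
    Hit("toy_weak_evalue", 120.0, 45.0, 88.0, 1e-3),
    Hit("toy_boundary", 50.0, 30.0, 70.0, 10e-10),
]
kept_ids = [h.subject_id for h in filter_hits(toy_hits)]
```

All four conditions are combined with a logical AND, so a hit that fails any single criterion is discarded.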

Multiple sequence alignments and phylogenetic analysis
The multiple sequence alignments of the DCL, AGO, and RDR protein sequences of both M. acuminata and A. thaliana were conducted using the Clustal-W method [71] in the MEGA 11.0 software [72]. The phylogenetic tree analysis was then carried out on the aligned sequences using the Neighbor-joining method [73], and 1,000 bootstrap replicates were used to assess the inferred evolutionary relationships. The evolutionary distances were computed using the Equal Input method [74].
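The bootstrap step can be illustrated independently of MEGA. The sketch below shows only the resampling idea (drawing alignment columns with replacement to create pseudo-alignments); it does not reproduce MEGA's implementation, the tree building, or the Equal Input distance model. The toy alignment and sequence names are invented for illustration.

```python
import random


def bootstrap_alignment(aligned, rng):
    """Resample alignment columns with replacement.

    `aligned` maps sequence names to equal-length aligned strings.
    The same sampled columns are applied to every sequence so that
    each pseudo-alignment column is an intact original column.
    """
    n_cols = len(next(iter(aligned.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in aligned.items()}


def bootstrap_replicates(aligned, n_replicates=1000, seed=0):
    """Generate pseudo-alignments; a tree would be built from each one."""
    rng = random.Random(seed)
    return [bootstrap_alignment(aligned, rng) for _ in range(n_replicates)]


# Toy two-sequence alignment (invented), 10 replicates for brevity.
toy = {"MaX": "MKV-L", "AtX": "MRV-L"}
reps = bootstrap_replicates(toy, n_replicates=10)
```

In a full analysis, a tree is inferred from every replicate, and the bootstrap value of a clade is the percentage of replicate trees in which that clade appears.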

Conserved domain and motif analysis
To investigate the conserved domains of the DCL, AGO, and RDR gene families in M. acuminata, protein sequences were retrieved and analyzed using the protein family database (Pfam, https://pfam.xfam.org/) (Fig 1). The M. acuminata proteins (MaDCL, MaAGO, and MaRDR) carrying the maximum number of significant functional conserved domains in common with the A. thaliana AtDCL, AtAGO, and AtRDR proteins were selected.
We used the Multiple Expectation Maximization for Motif Elicitation (MEME) webserver to investigate the conserved motifs in all of the predicted DCL, AGO, and RDR proteins (http://meme.sdsc.edu/meme4_3_0/cgi-bin/meme.cgi) [75]. This analysis was performed using the following parameters: (i) an optimum motif width of ≥6 and ≤50; (ii) a maximum number of motifs of 20. Any motifs that did not match the structural domains in each protein family were rejected.

Gene structure and chromosomal localization analysis
The gene structure (domain structure and exon-intron organization) of the predicted genes was analyzed using the online program Gene Structure Display Server (GSDS2.0: https://gsds.cbi.pku.edu.cn) [76]. Furthermore, the exon-intron structures of the predicted genes in M. acuminata were compared with the corresponding A. thaliana gene structures. The chromosomal locations of the predicted MaDCL, MaAGO, and MaRDR genes were mapped using the online tool MapGene2Chromosome V2 (http://mg2c.iask.in/mg2c_v2.0/), taking into account the physical position of the genes in different scaffold locations.

Gene ontology and sub-cellular localization analysis
To determine the relationship of the predicted RNAi-related genes with different clusters of molecular pathways, the GO analysis was performed using the online tool of the Plant Transcription Factor Database (PlantTFDB, http://planttfdb.cbi.pku.edu.cn//) [77]. We determined the corresponding p-values by Fisher's exact test with the Benjamini-Hochberg correction. A p-value <0.05 was considered statistically significant for the GO enrichment results of the respective predicted genes. We predicted the sub-cellular location of the identified MaDCL, MaAGO, and MaRDR proteins in the various organelles of the cell from their protein sequences using the web server Plant Sub-cellular Localization Integrative Predictor (PSI) [78].
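The statistical step can be sketched as follows. This is an illustrative, standard-library version of a one-sided Fisher's exact (hypergeometric) enrichment test followed by the Benjamini-Hochberg adjustment; it is not the PlantTFDB implementation, and the function names and toy numbers are assumptions made for the example.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided enrichment p-value, P(X >= k).

    N: annotated genes in the background, K: background genes with the
    GO term, n: genes in the query list, k: query genes with the GO term.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy numbers (invented): 2 of 4 query genes carry a term found in
# 3 of 10 background genes.
p_toy = fisher_enrichment_p(k=2, n=4, K=3, N=10)
adj_toy = benjamini_hochberg([0.01, 0.04, 0.03, 0.2])
```

A GO term would then be reported as enriched when its adjusted value falls below the chosen significance level (0.05 in this study).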

Regulatory network analysis between TFs and RNAi related genes in M. acuminata
To study the regulatory relationships between TFs and RNAi-related genes in M. acuminata, we used the PlantTFDB (http://planttfdb.cbi.pku.edu.cn//) [77]. Initially, we identified the TFs that were closely associated with RNAi-related genes in M. acuminata. We then constructed a regulatory network and visualized it using the network visualization tool Cytoscape 3.7.1 [79].
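The network construction and the node degree analysis used later to define hub TFs (TFs linked to at least five predicted RNAi genes) can be sketched as below. This is a simplified illustration, not the Cytoscape workflow; the TF identifiers `TF_A` and `TF_B` and the edge list are hypothetical.

```python
from collections import defaultdict


def build_network(edges):
    """Map each TF to the set of RNAi genes it is predicted to regulate."""
    targets = defaultdict(set)
    for tf, gene in edges:
        targets[tf].add(gene)
    return dict(targets)


def hub_tfs(targets, min_degree=5):
    """Return TFs whose degree (number of distinct targets) meets the cutoff."""
    return sorted(tf for tf, genes in targets.items() if len(genes) >= min_degree)


# Hypothetical TF-gene edges; gene names follow the paper's nomenclature
# but the links themselves are invented for illustration.
toy_edges = [
    ("TF_A", "MaDCL1"), ("TF_A", "MaAGO5"), ("TF_A", "MaAGO7"),
    ("TF_A", "MaRDR2"), ("TF_A", "MaRDR6"), ("TF_A", "MaAGO5"),
    ("TF_B", "MaAGO5"), ("TF_B", "MaRDR5"),
]
network = build_network(toy_edges)
hubs = hub_tfs(network)
```

Storing targets in a set means a repeated TF-gene pair counts once toward the degree.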

Promoter cis-acting regulatory elements analysis
To investigate the promoter cis-acting regulatory elements of the MaDCL, MaAGO, and MaRDR gene families, the upstream region (1.5 kb of genomic sequence) of the start codon (ATG) of each RNAi gene was retrieved. We then analyzed the stress-response-associated promoter cis-acting regulatory elements through online prediction using the Signal Scan search program of the PlantCARE database (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/) [80]. We classified the analyzed promoter cis-acting regulatory elements into five groups: LR, SR, HR, OT, and unknown function. The reported promoter cis-acting regulatory elements of known function in MaDCL, MaAGO, and MaRDR are shown separately.
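Retrieving the 1.5 kb upstream region can be sketched as follows. The coordinate convention (0-based forward-strand index of the base that corresponds to the A of the ATG) and the function names are assumptions made for this example; the toy sequences are invented.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq, atg_pos, strand, length=1500):
    """Return up to `length` bases upstream of the start codon.

    atg_pos is the 0-based forward-strand index of the base matching the
    A of ATG. For '-' genes this is the highest coordinate of the codon,
    and the promoter lies at higher coordinates on the forward strand.
    """
    if strand == "+":
        return chrom_seq[max(0, atg_pos - length):atg_pos]
    if strand == "-":
        return reverse_complement(chrom_seq[atg_pos + 1:atg_pos + 1 + length])
    raise ValueError("strand must be '+' or '-'")


# Toy sequences (invented) with a short window for readability.
plus_promoter = upstream_region("AAACCCATGGGG", 6, "+", length=3)
minus_promoter = upstream_region("GGCATCAA", 4, "-", length=3)
```

The region is truncated automatically when a gene lies closer than 1.5 kb to the end of a chromosome or scaffold.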

In silico identification of DCL, AGO, and RDR genes in banana genome
To identify banana RNA silencing genes, the retrieved AtDCL, AtAGO, and AtRDR protein sequences were used as query sequences to construct a Hidden Markov Model (HMM) (Fig 1). Based on the HMM profile analysis, we identified three genes encoding DCL proteins (MaDCLs), 13 genes encoding AGO proteins (MaAGOs), and five genes encoding RDR proteins (MaRDRs) in the banana genome database. The identified RNA silencing genes, their chromosomal locations, structural features (ORF length, gene length, and intron number), and protein profiles (molecular weight of the encoded protein and isoelectric point (pI)) are presented in Table 1. All conserved domains (DEAD, Helicase-C, Dicer-dimer, PAZ, RNase III, and DSRM) were predicted in the polypeptide sequences of the three MaDCL loci, which confirmed their identity as members of the plant DCL family. The ORFs of the identified MaDCLs ranged from 14058 bp to 15156 bp, corresponding to MaDCL1 (GSMUA_Achr8T12350_001) and MaDCL3 (GSMUA_Achr5T10880_001), which potentially encode 1818 and 1598 amino acids (aa), respectively (Table 1). According to the pI values of the MaDCL proteins, MaDCL1 and MaDCL2 showed acidic properties, whereas only MaDCL3 had the highest pI value (8.19), which corresponds to basic properties.
In addition, all 13 identified MaAGO genes exhibited an N-terminal PAZ and a C-terminal PIWI conserved domain in their polypeptide sequences, which are the major components of the plant AGO protein family. The ORFs of the identified MaAGOs ranged from 2841 bp to 15822 bp, corresponding to MaAGO7 (GSMUA_Achr7T26400_001) and MaAGO5 (GSMUA_Achr6T04510_001), which potentially encode 637 and 1237 aa, respectively (Table 1). The genomic length of the identified MaAGO genes likewise varied from 2841 bp to 15822 bp for MaAGO7 and MaAGO5, respectively. The pI values of all predicted MaAGO proteins indicated basic properties (pI 8.81-9.46).
Our HMM analysis predicted the RdRP conserved domain in the MaRDR gene family. The genomic length of the five identified MaRDRs varied from 3754 bp to 13305 bp, corresponding to MaRDR1b (GSMUA_AchrUn_randomT26010_001) encoded protein length 964 aa and 460 aa, respectively. The pI values of the MaRDR proteins indicate that the proteins are more likely to be basic; MaRDR6 has the highest pI value (8.47), which corresponds to basic properties.

Multiple sequence alignment of DCL, AGO and RDR proteins in banana and Arabidopsis
We obtained the multiple sequence alignments by aligning the predicted MaDCL, MaAGO, and MaRDR protein sequences to the reference sequences of AtDCL, AtAGO, and AtRDR (Figs 2-4). The alignment results revealed that the RNase III catalytic sites of the predicted MaDCL proteins, located in the two RNase III domains at the glutamate (E), aspartate (D), aspartate (D), glutamate (E) (EDDE) positions, are conserved with the orthologous AtDCLs, except for AtDCL3, where aspartate (D) is replaced by glutamine (Q) (Fig 2). The three metal-chelating conserved catalytic residues (D = aspartate, D = aspartate, and H = histidine) are found in the PIWI domain; they were first identified in AtAGO1 [45] and are responsible for endonuclease activity [83,84]. Further, the alignment of 10 AtAGO and 13 MaAGO proteins demonstrated the conserved DDH triad residues of the PIWI domains of both AGO sets (Fig 3). Moreover, sequence alignment of the AtRDRs with the predicted MaRDR proteins exhibited the DxDGD catalytic motif of the RdRP conserved domain (Fig 4).
The DDH/H motif was detected in the predicted MaAGO1a, MaAGO1b, MaAGO1c, MaAGO1d, MaAGO1e, MaAGO10a, MaAGO10b, and MaAGO10c proteins, which are similar to the AtAGO1 and AtAGO10 proteins (Table 2). The DDH/P motif was found in the MaAGO4, MaAGO6a, and MaAGO6b proteins, which are similar to the AtAGO6 protein but dissimilar to the AtAGO4 protein (DDH/S) (Table 2). The DDH/N motif was only observed in the MaAGO5 protein, which is dissimilar to the AtAGO5 protein (DDH/H) (Table 2). Another motif, DEH/H, was predicted in the MaAGO7 protein, whereas the DDH/H motif was identified in the AtAGO7 protein (Table 2). In this study, the histidine (H) at the 786th position of the catalytic residues in MaAGO4 and MaAGO5 was replaced by proline (P) and asparagine (N), respectively, where the corresponding Arabidopsis proteins carry the fourth residue serine (S) and histidine (H) at the 798th position (Table 2). Additionally, MaAGO7 showed one replaced PIWI domain catalytic residue: the second residue is glutamate (E) at the 786th position in place of the aspartate (D) residue at the 895th position (Table 2). The motif analysis results revealed that the DDH catalytic residue structure of the PIWI domain is not fully conserved in all MaAGO proteins in banana.
In Arabidopsis, the PIWI domain of AGO proteins contains the DDH/H, DDH/S, and DDH/P motifs, which are necessary for their in vitro endonuclease activity [83,85,86]. The replacements in the conserved DDH/H motif of the PIWI domains in the identified MaAGO proteins may be due to natural mutation or genetic diversity in the MaAGO populations. Replacement of aa residues in MaAGO proteins may reflect reduced endonuclease activity, or the replaced aa residues may contribute to distinct gene functions in banana. Changes in gene function could be confirmed by introducing reporter genes together with the MaAGO genes and observing their expression in a transient expression assay using model plant species such as Nicotiana or Arabidopsis. Further gene function analysis is therefore required to understand the functions of PIWI domains with replacements in the catalytic residues of MaAGO proteins.
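A pattern such as the DxDGD catalytic motif of the RdRP domain can be located in a protein sequence with a simple regular expression, where `x` stands for any residue. The sketch below is illustrative only; the toy sequence is invented and is not a real RDR protein.

```python
import re

# D, any residue, D, G, D
DXDGD = re.compile(r"D.DGD")


def find_dxdgd(protein_seq):
    """Return (1-based start position, matched residues) for each hit."""
    return [(m.start() + 1, m.group()) for m in DXDGD.finditer(protein_seq)]


# Invented toy sequence containing one DxDGD instance.
toy_hits = find_dxdgd("MKTDLDGDSAA")
no_hits = find_dxdgd("MKTAAAGGSAA")
```

Positions are reported as 1-based residue numbers, matching the convention used for catalytic residues in the text. Non-contiguous residues such as the DDH triad cannot be found this way and are read from alignment columns instead.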

Phylogenetic relationship of DCL, AGO and RDR proteins in banana and Arabidopsis
We determined the evolutionary relationships between the DCL, AGO, and RDR proteins of banana and Arabidopsis by phylogenetic tree analysis using each full-length protein sequence (S1-S3 Files and Fig 5A-5C). The phylogenetic tree analysis revealed that the three MaDCL proteins (MaDCL1, MaDCL3, and MaDCL4) were clustered into three groups (Group I-III) along with their corresponding DCL proteins in Arabidopsis, with well-supported bootstrap values (S1 File and Fig 5A). The MaDCL1 and MaDCL3 proteins were closely clustered with AtDCL1 and AtDCL3 in Group II and Group III, respectively. MaDCL1 and MaDCL3 belong to the DCL1 and DCL3 subfamilies on the basis of higher sequence similarity with AtDCL1 and AtDCL3, respectively. The MaDCL4 gene is clustered with AtDCL4 and AtDCL2 in Group I. However, MaDCL4 belongs to the DCL4 subfamily based on higher sequence similarity with the AtDCL4 gene. The members of the DCL family play key roles in the sRNA biogenesis process and are involved in processing long dsRNAs into mature sRNAs [12,14]. Specifically, based on AtDCL1 protein functions, we can assume that MaDCL1 may be involved in development, environmental stress responses, and the flowering mechanism [1,43,44,87]. According to the AtDCL2 and AtDCL3 functions, we can expect that MaDCL3 may generate siRNAs and trans-acting small interfering RNAs (ta-siRNAs) that participate in vegetative phase development, disease resistance, and the flowering mechanism [87,88]. Based on the AtDCL4 function, we can infer that MaDCL4 may be related to ta-siRNA metabolism and act on RNA-directed DNA methylation (RdDM)-mediated epigenetic maintenance during post-transcriptional silencing [22,89].

PLOS ONE
According to phylogenetic tree analysis, the AGO genes of flowering plant species are mainly divided into three clusters: Cluster 1 (AGO1/5/10), Cluster 2 (AGO2/3/7), and Cluster 3 (AGO4/6/8/9) [90], which is consistent with our study. We identified 13 AGO genes from M. acuminata that were classified into five groups (Group I-V) (S2 File and Fig 5B). In Group I, five banana proteins named MaAGO1a, MaAGO1b, MaAGO1c, MaAGO1d, and MaAGO1e were clustered with AtAGO1. These five banana proteins belong to the AGO1 subfamily on the basis of higher sequence similarity with AtAGO1. Group II comprises three banana proteins (MaAGO10a, MaAGO10b, and MaAGO10c) together with the AtAGO10 protein from Arabidopsis. The three banana proteins are similar to the AGO10 subfamily according to higher sequence
The phylogenetic tree analyses also revealed three groups of RDR genes (Group I-III) (S3 File and Fig 5C). The RDR genes obtained from banana were designated MaRDR1a, MaRDR1b, MaRDR2, MaRDR5, and MaRDR6. Group I includes three genes from banana (MaRDR1a, MaRDR1b, and MaRDR2) and two genes from Arabidopsis (AtRDR1 and AtRDR2); MaRDR1a and MaRDR1b belong to the RDR1 subfamily on the basis of higher sequence similarity with the A. thaliana RDR protein AtRDR1. In addition, the MaRDR2 protein belongs to the RDR2 subfamily based on sequence similarity with the A. thaliana RDR protein AtRDR2. Similarly, the MaRDR6 protein is grouped with the AtRDR6 protein (Group II) and belongs to the RDR6 subfamily according to sequence similarity with AtRDR6. In Group III, the MaRDR5 protein is clustered with AtRDR3, AtRDR4, and AtRDR5 but is most closely clustered with AtRDR5. MaRDR5 is included in the RDR5 subfamily on the basis of higher sequence similarity with a single Arabidopsis protein, AtRDR5. RDR proteins can generate dsRNAs from ssRNAs to initiate a signal for the RNA silencing mechanism [95,96].
Based on the AtRDR1 function, we can infer that MaRDR1 could be induced by viral infection or salicylic acid and could be a major component of the RNA silencing pathway, antiviral defense, and transgene silencing, as in many plant species [38,57,97,98]. In relation to the AtRDR2 function, we can assume that MaRDR2 may be involved in generating siRNA and associated with chromatin modification [21,99]. According to the AtRDR6 function, we can expect that MaRDR6 could produce the ta-siRNA precursor and serve in antiviral defense by degrading RNA molecules [100].
In our analysis, MaDCL2, MaAGO2, MaAGO3, MaAGO8, MaAGO9, MaRDR3, and MaRDR4 were found to be absent from the whole banana genome. These results suggest functional diversity among the RNAi genes, which would be helpful for further banana improvement.

Conserved domain and motif analysis of DCL, AGO and RDR proteins in banana and Arabidopsis
The domain analysis results showed that most functional domains are well conserved in the DCL, AGO, and RDR families from banana and Arabidopsis (Fig 6). The MaDCL proteins contained all significant conserved domains: DEAD/Res III, Helicase-C, Dicer-dimer, PAZ, RNase III, and DSRM (Fig 6). Previous studies revealed that these predicted domains play crucial roles in protein activity in plants [22,101,102]. It has also been found that the combined activity of two DCL genes plays a crucial role in plant defense against viral infection [103]. The DCL proteins are responsible for the cleavage of dsRNAs into 21-24 nucleotide long sRNAs. The PAZ domain of DCL proteins mainly functions to bind siRNA and dsRNA, which is cleaved by the two RNase III catalytic functional domains. These sRNAs are loaded into the endonuclease-containing RNA-induced silencing complex (RISC), which guides the AGO proteins to degrade target homologous RNAs whose sequences are complementary to the sRNAs [43,45]. They are also involved in gene silencing at the transcriptional level through chromatin reorganization [21,104].
AGO proteins are mainly characterized by two domains: an N-terminal PAZ domain and a C-terminal PIWI domain [10,24,105-107]. Both PAZ and PIWI domains were predicted in all the MaAGO proteins. The predicted AGO functional domains showed similarity with the previously reported AtAGO proteins [84]. Previous studies have reported that the PAZ and PIWI domains of AGOs play a critical role in RNase activity [24,108,109]. Both domains had identical homology with RNase H, which binds to the 5' end of the siRNA of the target RNA and cleaves the target RNA, thereby demonstrating that the sRNAs are complementary sequences [25,85]. The predicted PAZ and PIWI conserved domains of the MaAGO proteins might play an important functional role in synthesizing the double-stranded RNA into single-stranded RNA and stimulating the target RNA degradation process [25,108,110]. The Gly-rich Ago1 domain predicted in the MaAGO1a, MaAGO1b, and MaAGO1c proteins is similar to that of AtAGO1. The Gly-rich Ago1 domain coordinates binding with the ribosome to enhance AGO protein stimulation for the RNA silencing process [111]. Besides these, the Tify, CCT, and GATA domains were found only in the MaAGO5 protein. The Tify domain is a highly specific conserved domain in plants. Together with the CCT and GATA domains, the Tify domain defines a novel class of TFs that regulate various developmental processes and respond to biotic and abiotic stresses in plants [112-114]. The MaAGO5 protein could possibly play a major role in responses to various stresses in banana, which will be clarified by further detailed characterization of this protein.
RDRs are involved in starting a new RNAi silencing process by synthesizing dsRNAs using single-stranded RNAs (ssRNAs) as templates. A single conserved domain, RdRP, is present in RDR proteins and possesses a catalytic β' subunit of the RdRP motif [33,45,115-117]. In our analysis, we predicted the typical RdRP domain in all MaRDR proteins, which showed similarity with the RdRP conserved domain of the AtRDRs.
Predicting the motifs within a protein sequence provides critical clues for further characterizing their functional regulatory roles in gene expression [75]. We predicted typical, well-distributed, and conserved motifs in the MaDCL, MaAGO, and MaRDR proteins (Fig 7).
We observed a maximum of 20 motifs in both MaDCL and AtDCL proteins. MaDCL1 comprises 20 motifs that are similar to those of its ortholog AtDCL1, so MaDCL1 may be as highly functional as AtDCL1. MaDCL3 and MaDCL4 contained 17 and 15 motifs, whereas 16 and 19 motifs were found in the AtDCL3 and AtDCL4 proteins, respectively. We did not find motif 13 of the DEAD domain in MaDCL3. Moreover, motifs 6, 8, 13, and 17 of the DEAD domain were absent in MaDCL4 as compared with the AtDCL4 protein.
The absence of DEAD motifs in both MaDCL3 and MaDCL4 may indicate functional diversity between Arabidopsis and banana.
We also predicted a maximum of 20 motifs in the MaAGOs. We found all 20 motifs in MaAGO1 (MaAGO1a, MaAGO1b, MaAGO1c, MaAGO1d, and MaAGO1e) and MaAGO10 (MaAGO10a, MaAGO10b, and MaAGO10c), which exhibited higher conservation with their orthologs AtAGO1 and AtAGO10. These results suggest that the MaAGO1 and MaAGO10 proteins are highly homologous to, and may function similarly to, AtAGO1 and AtAGO10. However, some motif variability was observed in AGO4, AGO5, AGO6a, AGO6b, and AGO7 between Arabidopsis and banana. Motif 20 of the ArgoL1 domain was not detected in AtAGO4 but was present in MaAGO4. Motif 12 of the PAZ domain was not found in AtAGO5 but was present in MaAGO5. However, motif 20 of the ArgoL1 domain was absent in MaAGO5 in this analysis. Motif 17 of the ArgoN domain was not detected in MaAGO6a and MaAGO6b. Motif 20 of the ArgoL1 domain was absent in AtAGO6, whereas this motif was present in MaAGO6a only. Importantly, motif 18 of the PAZ domain was not found in AtAGO6 but was present in MaAGO6a and MaAGO6b in our analysis. The presence of conserved catalytic PAZ domain could lead to enhance the function of target RNA

Analysis of gene structure and chromosomal location of DCL, AGO and RDR proteins in banana and Arabidopsis
According to the gene structure analysis, the predicted MaDCL, MaAGO, and MaRDR genes showed well-conserved gene structures similar to those of the reference Arabidopsis genes (Fig 8). The predicted MaDCLs displayed higher exon-intron numbers than the AtDCLs, except for MaDCL4 (Fig 8 and Table 1). The MaDCLs intron numbers [12][13][14][15][16][17]21,[23][24][25]104] demonstrated similarity with AtDCLs. Out of 13 MaAGO genes, 12 exhibited 22-29 introns, whereas MaAGO7 has only 8 introns (Fig 8). The MaAGOs showed the most variable numbers of introns (8-30), which were closely similar to the gene structures of the AtAGOs. The five MaRDR genes displayed 5-10 introns in their gene structures, which showed similarity with the AtRDR gene structures except for MaRDR5 (Fig 8). The higher similarity of the MaDCL, MaAGO, and MaRDR gene structures with their Arabidopsis orthologs suggests closely similar functional roles in the RNAi pathway.
The chromosomal localization analysis demonstrated that the mapped MaDCL, MaAGO, and MaRDR genes were localized in 21 different scaffolds across 11 independent chromosomes, including one unknown chromosome, of the entire banana genome (Fig 9, Table 1). The MaDCLs, MaAGOs, and MaRDRs each showed a unique scaffold position and were distributed throughout the 11 chromosomes of the banana genome.

diverse pattern due to their close genomic location, and further study can be performed under various stress conditions.

Analysis of gene ontology of DCL, AGO and RDR proteins in banana
We inferred the putative biological functions of the predicted RNAi-related genes in detail through GO analysis (Fig 10 and S4 File). The GO analysis results indicated that 9 genes (MaDCL1, MaDCL3, MaDCL4, MaAGO1b, MaAGO4, MaAGO7, MaRDR1a, MaRDR2, and MaRDR6) took part in gene silencing functions (GO: 0016458; p-value: 5.60E-18) and PTGS   60E-21). We drew Venn diagrams to observe the shared GO terms for the three clusters of RNAi-associated gene families, considering biological processes, cellular components, and molecular functions (Fig 10). The Venn diagram analysis showed that many MaDCL, MaAGO, and MaRDR genes share common GO pathways. According to our study, the MaDCL, MaAGO, and MaRDR genes appear to be involved in 96 biological processes (Fig 10D). We also observed that the predicted RNAi-related genes share 11 and 3 common groups of GO pathways for cellular components and molecular functions in banana, respectively (Fig 10E and 10F). These results suggest that large numbers of RNAi genes are associated with different biological processes, cellular components, and molecular functions in banana.

The sub-cellular localization of the predicted proteins in banana
The biological processes of a eukaryotic cell are linked with the sub-cellular localization of specific proteins. The cellular location of a protein helps us to understand its functional role at the cellular level [120,121]. Sub-cellular localization analysis revealed that the identified MaDCL, MaAGO, and MaRDR proteins were localized only in the nucleus, plasma membrane, cytoplasm, and mitochondria (Fig 11). The MaDCL1 protein was distributed in the nucleus and cytoplasmic region, whereas the MaDCL3 protein appeared only in the plasma membrane. The MaDCL4 protein occurred in the nucleus, cytoplasmic region, and plasma membrane. Surprisingly, all MaAGO proteins were predicted to be localized only in the nucleus. Among the 5 MaRDR proteins, MaRDR1a and MaRDR6 were present in the nucleus only. The MaRDR1b protein was predicted in both the nucleus and cytoplasm. The MaRDR2 protein was distributed in the nucleus and plasma membrane. Besides this, the MaRDR5 protein was abundant in mitochondria and the plasma membrane. A computational method based on protein sequences has also been used for sub-cellular localization of DCL, AGO, and RDR proteins in C. sinensis, predicting them in cell organelles such as the nucleus, cytoplasm, plasma membrane, mitochondria, and plastid [5]. The PTGS and transcriptional gene silencing processes occur in the cytoplasm of a cell for targeted mRNA degradation [122,123]. The RNAi proteins DCL, AGO, and RDR are directly involved in RISC-mediated cleavage activities in the PTGS process [119]. The occurrence of PTGS in the cytoplasmic region indicates that the candidate proteins predominantly participate in this process [122,123]. Moreover, previous studies revealed that the Arabidopsis RNAi proteins AGO4 and DCL3 localize in the nucleus and coordinately lead the RNAi silencing process [124]. Our computational prediction provides important clues associated with the functions of the identified DCL, AGO, and RDR proteins in the RNAi pathway.

Regulatory relationship between transcription factors and RNA interference genes in banana
TFs play a key role in various biological processes in living organisms, particularly in plants. Plant TFs are involved in regulating diverse functions, e.g., responses to biotic and abiotic stresses, growth, development, metabolism, and defense against microbial infection [125-129]. In plants, TFs act as molecular switches or key regulators of functional genes that are expressed under particular stress, growth, and developmental conditions. Various TF families, such as MYB, CBF/DREB1, HSF, AP2/EREBP, Dof, ERF, NAC, MIKC_MADS, WRKY, bZIP, TGA6, and BOS1, exist in plants and function under various stress and developmental conditions [128-134].
We identified a total of 180 TFs that can regulate the candidate RNAi genes in the banana genome (S5 File). Based on their TF families, the identified TFs were divided into 22 groups. Among these, the ERF, Dof, C2H2, TCP, and GATA families included 21, 8, 7, 6, and 5 TFs, respectively, and accounted for 54.65% of the total identified TFs (S5 File). Our results indicate that the identified TFs could play significant roles in regulating RNAi genes. Based on network analysis, each identified TF family showed a unique structure and was linked to the candidate RNAi genes (Fig 12A). The ERF family is predominantly associated with the RNAi gene MaAGO5 (Fig 12B, S5 File). In addition, ERF is also related to MaDCL1, MaAGO10b, and MaAGO10c, which are linked to MaAGO5 (Fig 12B).
We also analyzed the sub-network relationship between TFs and the predicted MaDCL, MaAGO, and MaRDR genes (Fig 13). [...] and MaRDR5 genes. By node degree analysis, we identified five hub TFs, each associated with at least five predicted RNAi genes (Fig 13). These hub TFs interacted with twelve RNAi genes: one MaDCL, seven MaAGO, and four MaRDR genes. Of the five hub TFs, three belong to the Dof family, one to the C2H2 family, and one to the MIKC_MADS family (Fig 13). Previous studies suggested that the Dof TF family is associated with DNA-binding activity in the N-terminal and C-terminal regions of target RNAi genes and regulates target gene expression in several plant species. The Dof TF family is involved in i) controlling phenylpropanoid and glucosinolate metabolism, ii) influencing seed germination, iii) controlling stress tolerance, and iv) regulating flowering time [135][136][137][138][139]. A transcriptional repressor, MaDof23, physically interacts with MaERF9 and acts as a regulator of fruit ripening, which could be associated with the regulation of other genes related to cell wall degradation and aroma formation in banana [140].
The MYB TF family is found in both animals and plants. In plants, it is considered the largest TF family; its members contain the MYB domain (a motif of 52-53 aa residues) in their N- and C-terminal regions and have important roles in metabolism, biotic and abiotic stress responses, and defense against pathogen attack [141][142][143]. An R2R3-MYB TF, MaMYB3, was shown to affect fruit ripening by modulating the starch degradation process [144]. In various plant species, the WRKY family is believed to be involved in i) modulating the antagonistic interaction between salicylic acid and jasmonic acid signaling and ii) enhancing defense against necrotrophic pathogens [145,146]. Previous studies have shown that MaWRKY52 conferred resistance against Pratylenchus coffeae, a major destructive nematode of banana [147]. It is also important to note that WRKY family TFs are considered antimicrobial signaling molecules [148]. Prakash and Chakraborty revealed that various TFs, including MYB, WRKY, and NAC TFs, could regulate the expression of RDR gene families under different stress conditions [149]. These TFs are involved mainly in plant growth, development, and responses to a variety of stress conditions [148][150][151][152]. Plant calmodulins and calmodulin-related proteins play significant roles in regulating defense-related genes by interacting with various TFs such as MYB, WRKY, and NAC [153,154]. Overexpression of MusaNAC042 was positively correlated with salinity and drought resistance in Agrobacterium-mediated transgenic banana [155].
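The hub TF selection by node degree described above can be illustrated with a minimal Python sketch. The TF identifiers (`Dof_A`, `C2H2_A`, `ERF_A`) and the edge list are hypothetical placeholders rather than the interactions in S5 File, and the threshold of five targets simply mirrors the criterion stated in the text; the software actually used for the network analysis is not assumed here.

```python
from collections import defaultdict

# Hypothetical TF -> RNAi gene edges. The TF labels are placeholders and the
# pairings are invented for illustration; they are not taken from S5 File.
edges = [
    ("Dof_A", "MaDCL1"), ("Dof_A", "MaAGO5"), ("Dof_A", "MaAGO10b"),
    ("Dof_A", "MaRDR1a"), ("Dof_A", "MaRDR6"),
    ("C2H2_A", "MaAGO5"), ("C2H2_A", "MaAGO10c"),
    ("ERF_A", "MaAGO5"),
]

def hub_tfs(edges, min_degree=5):
    """Return TFs linked to at least `min_degree` distinct target genes."""
    targets = defaultdict(set)
    for tf, gene in edges:
        targets[tf].add(gene)  # a set ignores duplicated edges
    return {
        tf: sorted(genes)
        for tf, genes in targets.items()
        if len(genes) >= min_degree
    }

hubs = hub_tfs(edges)
for tf, genes in hubs.items():
    print(tf, len(genes), genes)
```

Counting distinct targets per TF is the node degree of that TF in a bipartite TF-gene network, so raising or lowering `min_degree` directly changes how many hubs are reported.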
Another important TF family, MIKC_MADS, is also involved in the transcription of several RDR genes. For example, the OsRDR1 gene could enhance the response to the rice stripe virus (RSV) in rice (O. sativa) [156]. Two banana MADS-box family members, MaMADS24 and MaMADS49, have been shown to interact with various proteins associated with fruit development and ripening, such as hormone-response proteins, ethylene signal transduction and biosynthesis-related proteins, starch biosynthesis proteins, and metabolism-related proteins [157]. The regulatory network and sub-network of the predicted putative RNAi genes in M. acuminata represent a diagrammatic evolutionary model that could be explored in detail through the characterization of these predicted genes. Further gene function studies are required to investigate the biosynthesis pathway of calmodulins and calmodulin-related proteins, which could be involved in RNA-related pathways in M. acuminata.

Prediction of cis-acting regulatory elements of DCL, AGO and RDR proteins in banana
The cis-acting regulatory elements (CAREs) are typically non-coding DNA sequences composed of short motifs (5-20 bp) located in the promoter region of target genes [158,159]. By binding to the target sites of CAREs, TFs and transcriptional regulators (up-regulators/down-regulators) control the transcriptional process and act as gene regulators [159]. Advances in high-throughput genome sequencing mean that a large volume of sequence data on economically important crops is deposited each year [160]. These databases can therefore be searched with integrated bioinformatics techniques for functional regulatory elements with specific gene functions within their DNA sequences (typically promoter and enhancer regions). We determined the various important motifs, their functional roles, and their diversity in the predicted DCL, AGO, and RDR genes in banana by CAREs analysis (S6 File; Fig 14).
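As an illustration of how short motifs can be located in an upstream sequence, the following Python sketch scans one strand of a promoter for two motif patterns. It is a simplified stand-in for a database-driven CAREs search, not the procedure used in this study: the promoter string is hypothetical, and the two patterns (the G-box core CACGTG and the DRE core written as [AG]CCGAC) are hand-written consensus cores assumed for the example.

```python
import re

# Illustrative consensus cores only. A real analysis would take motif
# definitions from a curated CARE database, not from this hand-written table.
MOTIFS = {
    "G-box": "CACGTG",
    "DRE core": "[AG]CCGAC",
}

def scan_promoter(seq, motifs=MOTIFS):
    """Return {motif name: [0-based start positions]} for the given strand."""
    seq = seq.upper()
    hits = {}
    for name, pattern in motifs.items():
        # A zero-width lookahead reports overlapping occurrences as well.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
    return hits

# Hypothetical upstream sequence, written 5' to 3'.
promoter = "ttgaCACGTGacctaGCCGACtttaCACGTGcc"
hits = scan_promoter(promoter)
print(hits)
```

Only the given strand is scanned here; non-palindromic motifs would also need a search on the reverse complement.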
The results showed that LR-, SR-, HR-, and OT-associated motifs were present in the upstream regulatory regions of the RNAi genes (S6 File; Fig 14). Photosynthesis is a crucial physiological process that depends on the light response and occurs in the leaf tissue of bananas [161]. The LR motifs were more abundant in the upstream regulatory regions of the RNAi genes than the other motif classes. The LR motifs 3AF1 binding site, AAAC motif, ACA motif, AT1 motif, ATC-motif, ATCT-motif, box-II, chs unit1m1, GA motif, GATT-motif, GTGGC motif, LAMP element, L-box, LS7, and 3AF3 binding site were shared by a large number of the predicted RNAi genes in banana. Other important LR motifs shared by RNAi genes in banana were also recognized in this study, namely ACE, box-4, CAG motif, chs CMA1a, chs CMA2a, Gap box, GATA motif, G-box, G box1, GT1 motif, MRE, Sp1, TCCC motif, and TCT motif. Previous studies have shown that these predicted LR-related motifs are strongly involved in the light response of different species [162][163][164][165]. An in silico analysis of the DCL, AGO, and RDR gene families in C. sinensis also identified very similar LR motifs, which were predicted to be involved in leaf photosynthesis [5]. Therefore, the predicted LR-related motifs could play a significant role in the photosynthesis mechanism in banana leaves, which can increase fruit quality and productivity.
We found that the AT-rich sequence, boxII-like sequence, HD Zip1, HD Zip3, MBS1, motif1, MSA-like, Non-box, and RY elements, which are associated with various plant biological functions, are shared by many of the RNAi genes predicted in banana (Fig 14). Plant hormones, also called plant growth regulators, individually or coordinately regulate plant growth and development [166][167][168][169]. These plant growth regulators have important biological functions in seed germination, plant growth, development, and metabolism [158][170][171][172][173][174]. We also predicted various plant HR motifs, such as O2-site, P-box, TGA elements, Aux RR core, GC motif, TATC box, TCA elements, and TGA box, which are shared by most of the identified RNAi genes in banana. The prediction of HR-related motifs suggests that they have important biological functions in bananas.
Moreover, DRE core, LTR, and TC-rich repeats were shared by several predicted RNAi genes. Several research groups have reported that TC-rich repeats, LTR elements, MBS, and DRE act as SR motifs in different plant species [5][175][176][177][178]. In addition, some unknown CAREs were recognized in this study (S6 File). Collectively, the CAREs shared by the putative RNAi gene families in banana will provide valuable information on their functional roles in plant growth, development, and defense against microbial infection.
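The grouping of predicted motifs into functional classes can be summarized per gene with a short tally, sketched below in Python. The class assignments follow the LR, HR, and SR groupings named in the text for a handful of motifs; the per-gene motif lists and the `unnamed-1` entry are hypothetical and do not reproduce S6 File, and the label `unassigned` is an assumption introduced here for motifs that are not in the lookup table.

```python
from collections import Counter

# Class assignments for a few motifs, following the groupings in the text.
CATEGORY = {
    "G-box": "LR", "Sp1": "LR", "GT1 motif": "LR",
    "P-box": "HR", "TGA elements": "HR",
    "LTR": "SR", "TC-rich repeats": "SR", "DRE core": "SR",
}

# Hypothetical per-gene motif lists; these are not the contents of S6 File.
gene_motifs = {
    "MaDCL1": ["G-box", "Sp1", "LTR", "P-box"],
    "MaAGO5": ["G-box", "GT1 motif", "TC-rich repeats", "unnamed-1"],
}

def category_counts(motifs):
    """Count motifs per functional class; unknown names become 'unassigned'."""
    return Counter(CATEGORY.get(m, "unassigned") for m in motifs)

summary = {gene: category_counts(m) for gene, m in gene_motifs.items()}
for gene, counts in summary.items():
    print(gene, dict(counts))
```

A table of this kind makes it easy to check statements such as the relative abundance of LR motifs across genes.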

Conclusion
In this study, we used a set of integrative bioinformatics approaches for the in silico identification of RNAi pathway genes in the banana genome. In total, 3 DCL, 13 AGO, and 5 RDR RNAi pathway genes were identified. The phylogenetic analysis demonstrated a close evolutionary relationship between banana and Arabidopsis RNAi genes in all subfamilies. Conserved domain, motif, and gene structure analyses revealed the functional similarity of banana and Arabidopsis RNAi genes. GO analysis revealed that most of the identified MaDCL, MaAGO, and MaRDR genes are connected with important biological functions: RNA silencing, defense against pathogens, and metabolic activity. Sub-cellular localization prediction showed that most of the MaDCL, MaAGO, and MaRDR proteins were abundant in the cytoplasm and nucleus, where the PTGS process mainly occurs. Regulatory network and sub-network analyses identified important TF families, namely ERF, Dof, C2H2, TCP, GATA, and MIKC_MADS, that are associated with MaDCL, MaAGO, and MaRDR genes. In addition, CAREs associated with LR, SR, and HR functions were predicted as binding sites for the TFs of the putative MaDCL, MaAGO, and MaRDR genes. Our findings therefore provide valuable information on DCL, AGO, and RDR genes in the banana genome, which might be helpful for the cloning and characterization of banana RNAi genes under wet lab conditions and for the further improvement of these genes in breeding programs for this economically important crop species. Moreover, these findings could advance the in silico analysis of genes related to various important biological pathways in other crop species.