CmMDb: A Versatile Database for Cucumis melo Microsatellite Markers and Other Horticulture Crop Research

Cucumis melo L., a member of the Cucurbitaceae family, ranks among the highest-valued horticultural crops cultivated across the globe. Besides its economic and medicinal importance, Cucumis melo L. is a valuable resource and model system for evolutionary studies of the cucurbit family. However, only a limited number of molecular markers have been reported for Cucumis melo L. so far, which limits the pace of functional genomic research in melon and other similar horticultural crops. We developed the first whole-genome-based microsatellite DNA marker database of Cucumis melo L., a comprehensive web resource that aids in variety identification and physical mapping of the Cucurbitaceae family. The Cucumis melo L. microsatellite database (CmMDb: http://65.181.125.102/cmmdb2/index.html) encompasses 39,072 SSR markers along with their motif repeat, motif length, motif sequence, marker ID, motif type and chromosomal location. The database features a novel automated primer design facility to meet the needs of wet-lab researchers. CmMDb is a freely available web resource that helps researchers select the most appropriate markers for marker-assisted selection in melons and improve breeding strategies.


Introduction
Melon (Cucumis melo L.) (2n = 2x = 24) is an important eudicot diploid horticultural crop with an estimated haploid genome size of 454 Mb. Melon belongs to the Cucurbitaceae family, which also contains other important crops such as cucumber, watermelon and pumpkin. It is cultivated worldwide, mainly in temperate, subtropical and tropical regions [1]. Melon is well known for its biological, medicinal and economic significance, and it is an important fleshy fruit for fresh consumption [2]. Melon displays high variability in its physical, biochemical and phenotypic characteristics, and it also stimulated the work of the precursors of modern genetics [3]. According to USDA statistics for 2009, worldwide melon production reached 31,053,716 tons, an increase of just 0.3% over the previous year [4]. China is the major melon-producing country, with 52% of total global production, followed by the USA, Spain, Turkey and Iran [4].
Various molecular markers have been developed and extensively used to monitor genomic divergence within and among species for breeding and genetic analysis [5]. In recent years, microsatellite markers have gained much importance in genetic and molecular studies because of their co-dominant inheritance, high reproducibility, multi-allelic variation and abundance in the genome. Microsatellite markers have been employed in structural, functional and comparative genomics (syntenic and phylogenetic studies), variety identification, marker-assisted selection and construction of high-density genome maps [6][7][8][9][10][11][12][13]. Despite melon's economic and scientific importance, only a limited number of microsatellite markers and resources exist for it. SSR markers and resources have been developed for various important crops, including bottle gourd [14], watermelon [15], cucumber [16], melon [17], foxtail millet [18][19][20], tomato [21], barley [22], potato [23], sugarcane [24], capsicum [25] and eggplant [26].
A number of genetic and molecular resources have been developed over the last few years in functional genomics, including physical maps [27,28], ESTs [29,30] and microarrays [31,32]. However, very few molecular markers have been developed for melon despite its scientific and economic importance [33]. It is therefore necessary to develop a large-scale catalogue of microsatellite DNA markers by taking advantage of the availability of the whole-genome sequence of melon [34]. Considering the utility of whole-genome-sequence-based microsatellite markers, we identified microsatellite markers from the melon genome and developed a relational database that provides unrestricted access to plant breeders and researchers.

Database processing pipeline
The draft whole-genome sequence of Cucumis melo L. [34] was downloaded in FASTA format from the melonomics website [35] and analysed with the MIcroSAtellite identification tool (MISA) [36] to identify microsatellites using the following search criteria: a minimum of 6 repeat units for dinucleotide repeats (DNRs) and a minimum of 5 repeat units for all other motif types, namely trinucleotide (TNRs), tetranucleotide (TeNRs), pentanucleotide (PNRs) and hexanucleotide repeats (HNRs). Primer pairs were designed from the flanking regions on either side of each identified microsatellite using the integrated Perl 5 interface of MISA and Primer3 [36]. The identified microsatellite markers were mapped onto the melon chromosomes using BLAST [37] with the default E-value. The flanking sequences of the melon microsatellite markers were then searched with BLAST [37] against the genome sequences of cucumber and watermelon to study marker-assisted syntenic relationships among the chromosomes of C. melo, cucumber and watermelon. An E-value cutoff of 1e-05 was considered significant for this BLAST analysis. Finally, the identified microsatellites, their corresponding primer pairs and the syntenic information were stored in a MySQL database [38].

Database architecture
Cucumis melo Microsatellites Database (CmMDb) is an online, interactive relational database. The database was designed on the "Three-Level Schema Architecture" represented in Fig 2 and was developed using MySQL 5.0 [38] to store all the information associated with the microsatellite markers. The CMap [39] schema has been integrated with CmMDb to enable interactive visualization of the physical map and comparative mapping (syntenic information) of the marker data against the genomes of cucumber and watermelon. The schema of CmMDb is shown in
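A minimal sketch of how marker, primer and synteny records could be related is given below. The table and column names are hypothetical rather than CmMDb's actual schema, and Python's built-in `sqlite3` module stands in for MySQL so the example runs without a server; the point is the one-to-many link from each marker to its syntenic hits in cucumber and watermelon.

```python
import sqlite3

# Hypothetical layout: one row per marker, one primer pair per marker,
# and zero or more syntenic hits per marker in cucumber/watermelon.
SCHEMA = """
CREATE TABLE marker (
    marker_id  TEXT PRIMARY KEY,
    chromosome INTEGER NOT NULL,
    start_pos  INTEGER NOT NULL,
    end_pos    INTEGER NOT NULL,
    motif      TEXT NOT NULL,
    motif_type TEXT NOT NULL,     -- DNR, TNR, TeNR, PNR or HNR
    repeats    INTEGER NOT NULL
);
CREATE TABLE primer (
    marker_id    TEXT PRIMARY KEY REFERENCES marker(marker_id),
    forward_seq  TEXT NOT NULL,
    reverse_seq  TEXT NOT NULL,
    forward_tm   REAL,
    reverse_tm   REAL,
    product_size INTEGER
);
CREATE TABLE synteny (
    marker_id    TEXT NOT NULL REFERENCES marker(marker_id),
    species      TEXT NOT NULL,   -- 'cucumber' or 'watermelon'
    target_chrom TEXT NOT NULL,
    evalue       REAL NOT NULL
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def marker_report(conn: sqlite3.Connection, marker_id: str):
    """Join a marker with its primer pair and count its syntenic hits per species."""
    row = conn.execute(
        """SELECT m.marker_id, m.chromosome, m.motif, m.repeats,
                  p.forward_seq, p.reverse_seq, p.product_size
           FROM marker AS m JOIN primer AS p ON p.marker_id = m.marker_id
           WHERE m.marker_id = ?""",
        (marker_id,),
    ).fetchone()
    hits = dict(conn.execute(
        "SELECT species, COUNT(*) FROM synteny WHERE marker_id = ? GROUP BY species",
        (marker_id,),
    ).fetchall())
    return row, hits


conn = build_db()
# Toy values for illustration only, not CmMDb records.
conn.execute("INSERT INTO marker VALUES ('toy1', 1, 100, 111, 'AT', 'DNR', 6)")
conn.execute("INSERT INTO primer VALUES ('toy1', 'ACGTACGTACGTACGTAC', "
             "'TTGCATGCATGCATGCAA', 59.8, 60.1, 180)")
conn.execute("INSERT INTO synteny VALUES ('toy1', 'cucumber', 'Chr4', 1e-30)")
print(marker_report(conn, "toy1"))
```

Keeping synteny hits in their own table lets a marker have any number of cross-species matches without duplicating its marker or primer data.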

User interface
The user-friendly interface for CmMDb was developed using the open-source server-side scripting language PHP 5.4 and HTML to query and retrieve results from the database as per user requirements. The database can be searched either by providing user-specific input (gSSR tab) or by browsing the interactive physical map (Map tab). The 'gSSR' tab provides search parameters to search the entire CmMDb or to select the desired microsatellite markers based on motif repeat, motif length, motif sequence, marker ID, chromosome number and motif type.
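The gSSR search can be read as a set of optional filters combined with AND. The sketch below assumes that interpretation; `Marker` and `search_markers` are hypothetical names, the records are toy data, "motif repeat" is treated as a minimum repeat count, and CmMDb's own PHP/SQL query code is not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Marker:
    marker_id: str
    chromosome: int
    motif: str
    repeats: int
    motif_type: str  # DNR, TNR, TeNR, PNR or HNR

    @property
    def motif_length(self) -> int:
        return len(self.motif)


def search_markers(
    markers: Iterable[Marker],
    *,
    marker_id: Optional[str] = None,
    chromosome: Optional[int] = None,
    motif: Optional[str] = None,
    motif_type: Optional[str] = None,
    min_repeats: Optional[int] = None,
    motif_length: Optional[int] = None,
) -> List[Marker]:
    """Return markers satisfying every criterion that is not None (AND semantics)."""

    def keep(m: Marker) -> bool:
        return (
            (marker_id is None or m.marker_id == marker_id)
            and (chromosome is None or m.chromosome == chromosome)
            and (motif is None or m.motif == motif.upper())
            and (motif_type is None or m.motif_type == motif_type)
            and (min_repeats is None or m.repeats >= min_repeats)
            and (motif_length is None or m.motif_length == motif_length)
        )

    return [m for m in markers if keep(m)]


# Toy records for illustration only.
toy = [
    Marker("toy1", 1, "AT", 12, "DNR"),
    Marker("toy2", 1, "AAG", 5, "TNR"),
    Marker("toy3", 10, "AT", 7, "DNR"),
]
print([m.marker_id for m in search_markers(toy, motif="at", min_repeats=8)])  # ['toy1']
```

Leaving every parameter as `None` returns all markers, which mirrors searching the entire database.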
The results of each query are presented in a table with marker ID, motif sequence, motif repeat, chromosomal location, and start and end positions, along with hyperlinks to the corresponding primer information and physical map (Fig 4A). The 'Primers' hyperlink under the 'Get' column redirects to the primer data page, which lists the forward and reverse primer sequences, their melting temperatures (Tm), lengths and the expected product size (Fig 4B). The 'Position' hyperlink under the 'Physical' column redirects to the CMap interface [39], where the physical locations of the markers in the melon genome can be viewed interactively (Fig 4C). The 'Map' tab provides a CMap-based [39] interactive physical map of all the microsatellite markers on the twelve chromosomes of the melon genome (Fig 5). This interactive map allows users to study syntenic relationships between any or all melon chromosomes and the chromosomes of cucumber and/or watermelon by drawing interactive comparative maps (Fig 6). Finally, the complete marker data of CmMDb can be downloaded from the 'Download' section. In addition to the self-explanatory interface, a detailed tutorial is provided to help users.
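The expected product size on the primer page is the length of the amplicon from the 5' end of the forward primer to the 5' end of the reverse primer. Primer3 reports this value directly from the primer positions it selects; the sketch below only shows what the number measures, by simple in-silico PCR on a toy template. The function names are hypothetical and only exact matches on the given strand are considered.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def expected_product_size(template: str, forward: str, reverse: str) -> Optional[int]:
    """Amplicon length from the forward primer's 5' end to the reverse primer's 5' end.

    Only exact matches on the given (top) strand are considered; returns None
    if either primer site is missing.
    """
    template = template.upper()
    fwd_pos = template.find(forward.upper())
    if fwd_pos == -1:
        return None
    # The reverse primer's sequence appears on the top strand as its reverse
    # complement; look for it downstream of the forward primer.
    rev_site = reverse_complement(reverse)
    rev_pos = template.find(rev_site, fwd_pos + len(forward))
    if rev_pos == -1:
        return None
    return rev_pos + len(rev_site) - fwd_pos


# Toy template: forward site + (AT)7 repeat + reverse-primer site.
template = "GGG" + "ACGTTGCA" + "AT" * 7 + "CCGGAATT" + "TTT"
print(expected_product_size(template, "ACGTTGCA", "AATTCCGG"))  # 30
```

Because the product spans the SSR, allelic differences in repeat number show up as differences in this length, which is what makes such primer pairs usable as markers.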

Results and Discussion
The available whole-genome sequence of melon (375 Mb) [34] was searched for microsatellites, and a total of 39,072 microsatellite markers comprising different kinds of motif repeats (from DNRs to HNRs) were identified, with an average frequency of about 123 microsatellites per megabase (Mb). The DNR type (61.13%) was dominant, followed by the TNR (30.66%), TeNR (5.59%), PNR (1.83%) and HNR (0.77%) types. Chromosome 1 contains the highest number of markers (4,103), while chromosome 10 contains the fewest (1,960) (Table 1). Among DNRs, AT/TA motifs (71.91%) were the most frequent, followed by AG/GA (10.19%), CT/TC (10.08%), AC/CA (4%), TG/GT (3.76%) and GC/CG (0.04%). Among TNRs, AAT/TAA (21.29%) motifs were the most abundant, followed by ATT/TTA (19.11%), AAG/GAA (10.87%) and TTC/CTT (10.49%), whereas CCG/GCC (0.47%) and TGG/GGT (0.77%) motifs were the least abundant. Based on the length of the repeat motifs, 1,835 (4.62%) microsatellite markers were classified as long, hypervariable class I (≥20 bp) types, 4,684 (11.79%) as variable class II (12-19 bp) types and the remaining 32,553 (81.99%) as variable class III (5-11 bp) types. The distribution and frequencies of markers by SSR class, by motif repeat and for the predominant DNR and TNR motifs are given in S1 Table (parts S1, S1.1 and S1.2, respectively).

To study marker-assisted syntenic relationships between melon, cucumber and watermelon chromosomes, the physically mapped melon microsatellite markers were searched with BLAST against the genomes of cucumber and watermelon with an E-value cutoff of 1e-05.
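The motif-type percentages and the average density quoted above are simple tallies: each SSR is typed by motif length (2 = DNR through 6 = HNR), the types are counted, and the total is divided by the length of sequence searched, in Mb. A minimal sketch on toy input follows; `motif_type_summary` is a hypothetical name, and the density it returns depends on which sequence length is passed as the denominator.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Motif length -> motif type label used in the text.
MOTIF_TYPES = {2: "DNR", 3: "TNR", 4: "TeNR", 5: "PNR", 6: "HNR"}


def motif_type_summary(motifs: Iterable[str], searched_mb: float) -> Tuple[Dict[str, float], float]:
    """Percentage of SSRs per motif type and the average number of SSRs per Mb."""
    counts = Counter(MOTIF_TYPES[len(m)] for m in motifs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no SSR motifs given")
    percent = {t: round(100 * counts[t] / total, 2) for t in MOTIF_TYPES.values()}
    density = round(total / searched_mb, 1)
    return percent, density


# Toy input: five SSR motifs found in a 0.05 Mb stretch of sequence.
print(motif_type_summary(["AT", "AT", "AG", "AAT", "AAAG"], 0.05))
```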
The comparative genome mapping showed that 44.46% (17,375) of the melon markers have sequence-based orthology and syntenic relationships with cucumber, while 18.67% (7,298) have such relationships with watermelon. Chromosome 4 of melon showed the maximum synteny with cucumber (1,632 markers), followed by chromosome 6 (1,442 markers), whereas chromosome 3 of melon showed the maximum synteny with watermelon (463 markers), followed by chromosome 1 (447 markers). Chromosome-wise comparisons of microsatellite marker distribution between melon and watermelon, and between melon and cucumber, are presented in S2 and S3 Tables, respectively.
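Counts of this kind can be derived from tabular BLAST output. The sketch below assumes BLAST+ tabular format (`-outfmt 6`, whose default 11th column is the E-value), applies the 1e-05 cutoff, keeps only each marker's best significant hit (an assumption; how multiple hits per marker were resolved is not stated above) and counts markers per (melon chromosome, subject sequence) pair. The subject IDs, the marker-to-chromosome mapping and the function name are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Dict, Tuple

EVALUE_CUTOFF = 1e-5  # significance threshold used for the synteny analysis


def syntenic_counts(blast_tab: str, marker_chrom: Dict[str, int]) -> Tuple[Counter, float]:
    """Count markers per (melon chromosome, subject sequence) from -outfmt 6 text.

    Each marker contributes once, via its lowest-E-value hit at or below the cutoff.
    Returns the pair counts and the fraction of markers with any significant hit.
    """
    best: Dict[str, Tuple[float, str]] = {}
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row:
            continue
        qseqid, sseqid, evalue = row[0], row[1], float(row[10])  # column 11 = evalue
        if evalue > EVALUE_CUTOFF:
            continue
        if qseqid not in best or evalue < best[qseqid][0]:
            best[qseqid] = (evalue, sseqid)
    pairs = Counter((marker_chrom[q], s) for q, (_, s) in best.items())
    return pairs, len(best) / len(marker_chrom)


# Toy hits: m1 has two significant hits, m2's only hit fails the cutoff, m3 has none.
blast_tab = (
    "m1\tChr4\t98.0\t100\t2\t0\t1\t100\t500\t599\t1e-40\t180\n"
    "m1\tChr6\t90.0\t80\t8\t0\t1\t80\t10\t89\t1e-10\t100\n"
    "m2\tChr1\t85.0\t50\t7\t1\t1\t50\t1\t50\t0.01\t40\n"
)
pairs, fraction = syntenic_counts(blast_tab, {"m1": 4, "m2": 6, "m3": 1})
print(dict(pairs), round(fraction, 4))  # {(4, 'Chr4'): 1} 0.3333
```

Multiplying the returned fraction by 100 gives the kind of percentage quoted above for cucumber and watermelon.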
CmMDb serves as a repository of DNA markers and can be employed as a tool in marker-assisted breeding programmes for melon improvement, in genetic diversity studies and in syntenic studies among Cucurbitaceae members. CmMDb supports browsing or searching for a particular marker on the genome interactively and visualizing each marker on its chromosome through the CMap interface [39]. The marker density in the melon whole genome reported in this study is 123 markers/Mb; for comparison, the closest model plant genome is that of Arabidopsis (157 Mb), whereas higher numbers of markers have been reported for other crops such as cucumber (367 Mb), rice (370-490 Mb), poplar (485 Mb), grape (487 Mb), sorghum (818 Mb), soybean (1115 Mb), maize (2365 Mb), wheat (1000 Mb) and pigeon pea (833 Mb) [36].
CmMDb is of great use to melon breeders in molecular breeding. The database offers options that facilitate easy search and retrieval of a specific marker on a chromosome, or on a specific portion of a chromosome, together with its primer pairs and physical map (CMap interface) [39]. It also contains the sequence-based orthology and syntenic relationships of each microsatellite marker with cucumber and watermelon, which can be visualized using the CMap interface, making CmMDb useful for marker-assisted synteny studies between the chromosomes of melon, cucumber and watermelon. CmMDb can also serve as a resource for mining information to design experiments aimed at interpreting novel roles and functions of microsatellites.

Conclusion
CmMDb contains information on genomic microsatellite markers, with unrestricted public access. Genomic SSR markers were examined in the present study on the basis of the available genome sequence data, and a total of 39,072 microsatellite markers were identified in the available whole-genome sequence of melon. Motif frequency decreased with increasing motif length: the DNR type was dominant (61.13%), followed by TNR (30.66%), TeNR (5.59%), PNR (1.83%) and HNR (0.77%). Chromosome 1 carries the highest number of markers, while chromosome 10 carries the fewest. The identified markers can be used to tag specific biological traits of melon, such as biotic or abiotic stress resistance traits (QTLs), to develop high-yielding and resistant melon varieties. Furthermore, researchers who have developed SSR markers in Cucumis melo are invited to submit their marker data to CmMDb to enrich this resource.
Supporting Information
S1 Table. S1. Distribution and frequency of different classes of markers in the Cucumis melo genome. S1.1. Distribution and frequency of different kinds of motifs based on motif repeats in microsatellite markers of Cucumis melo. S1.2. Statistics of predominant motifs found in DNR and TNR microsatellite markers of Cucumis melo.