Genome-wide identification of pistil-specific genes expressed during fruit set initiation in tomato (Solanum lycopersicum)

Fruit set involves the developmental transition of an unfertilized quiescent ovary in the pistil into a fruit. While fruit set is known to involve the activation of signals (including various plant hormones) in the ovary, many biological aspects of this process remain elusive. To further expand our understanding of this process, we identified genes that are specifically expressed in tomato (Solanum lycopersicum L.) pistils during fruit set through comprehensive RNA-seq-based transcriptome analysis using 17 different tissues including pistils at six different developmental stages. First, we identified 532 candidate genes that are preferentially expressed in the pistil based on their tissue-specific expression profiles. Next, we compared our RNA-seq data with publically available transcriptome data, further refining the candidate genes that are specifically expressed within the pistil. As a result, 108 pistil-specific genes were identified, including several transcription factor genes that function in reproductive development. We also identified genes encoding hormone-like peptides with a secretion signal and cysteine-rich residues that are conserved among some Solanaceae species, suggesting that peptide hormones may function as signaling molecules during fruit set initiation. This study provides important information about pistil-specific genes, which may play specific roles in regulating pistil development in relation to fruit set.


Introduction
The pistil is a single reproductive organ that develops into a fruit after fruit set. The efficiency of fruit set is one of the most important traits that determine yield in many fruit-bearing crops such as tomato (Solanum lycopersicum L.). Because of its worldwide production and availability, tomato has been widely accepted as a model system for investigating fruit set. In general, fruit set is induced after successful development of the pistil upon pollination and following fertilization [1]. Through conventional molecular, genetic, and biochemical analyses of tomato, plant hormones such as auxin and gibberellic acid (GA) have been shown to play a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 Therefore, to extend our understanding of the molecular mechanism underlying fruit set and to generate new tools for pistil-specific regulation of fruit set-associated genes, it is important to identify PSGs that are specifically expressed during fruit set initiation.
In this study, we conducted genome-wide analysis of PSGs in tomato by RNA-seq and compared the results with publicly available data. As a result, we identified about one hundred of PSGs including genes encoding signaling-related proteins, several transcription factors, and peptide hormone-like proteins, in addition to many genes of unknown function. Further analysis of these mined genes would increase our understanding of the mechanisms underlying of pistil development and fruit set and would be useful for generating genetic engineering tools, such as tissue-specific promoters.

RNA-seq, processing, mapping of Illumina reads, and detection of PSGs
The 35-nt and 100-nt single-end sequencing analysis was conducted on the Illumina Genome Analyzer IIx system and Illumina HiSeq 2000, respectively. To identify the transcriptome of each tissue, "direct-mapping method" was conducted.
For the direct-mapping method, the quality of Illumina raw FASTQ data was checked by FastQC before and after trimming with Trimmomatic according to the instruction manual (S1 Fig) [37]. After trimming, only sequences with a minimum length of 20 bp were retained. The trimmed sequence data were imported into CLC Genomics Workbench ver 7.0.4 (QIAGEN, Germany) and mapped to the tomato genome SL2.50 and gene model SL2.40. Gene expression data were obtained as gene length by reads per kilobase of exon per million mapped reads (RPKM) values [38]. The data were normalized by the quantile method to reduce obscuring of variation among samples, and a logarithmic transformation part 2 subjects the normalized data after adding 1 to each values to logarithmic transformation for heat map analysis [39,40]. To narrow down the candidate genes expressed specifically in pistils, an RPKM value of 0.5 was used as the cutoff value to determine specific expression in each sample. Based on this criterion, candidate tomato PSGs with values higher than 0.5 in pistils and lower than 0.5 other tissues were identified by comparing the transcriptome data for each tissue.

Data mining of publically available RNA-seq data
To examine the expression patterns of the identified genes in tissues other than pistils, publically available data were downloaded from transcriptome analyses of tomato from the Tomato Functional Genomics Database (http://ted.bti.cornell.edu/cgi-bin/TFGD/digital/home.cgi). Data from nine different vegetative samples from tomato cv. Heinz and wild tomato species S. pimpinellifolium were extracted and investigated to determine whether the candidate genes were expressed in these tissues.
To estimate the regions in the pistil in which the candidate genes are expressed, tissue-specific transcriptome data from the pistils of tomato wild relative S. pimpinellifolium [27] were used to identify genes with expression levels higher than RPM (reads per million mapped reads) = 2 in at least one sample. The expression levels of the top-ten genes in each tissue were then examined. To confirm the expression patterns of the candidate genes in the pistil, their expression levels were also investigated using transcriptome data from tomato cv. 'Moneymaker' [23]. If the expression level was higher than FPKM (Fragments Per Kilobase of exon per Million mapped fragments) 0.5 in at least one sample, it was judged to be an expressed gene. To investigate the responses of the genes to plant hormone treatment, a publically available dataset from the transcriptomes of pollinated or parthenocarpic fruit induced by hormone treatment was utilized [20]. To compare the list of differentially expressed genes with the candidate genes, unigene numbers were converted to ITAG IDs using the Unigene converter in the SGN database.
Gene ontology analysis ITAG IDs of the candidate PSGs were used as input with the AgriGO agricultural gene ontology (GO) analysis tool (http://bioinfo.cau.edu.cn/agriGO/analysis.php) to elucidate enriched GO terms. A false discovery rate (FDR; e-value corrected for list size) of 0.05 was used as the criterion to obtain enriched GO terms.

Gene expression analysis by RT-PCR
To confirm the expression patterns of the candidate genes by RNA-seq analysis, RT-PCR was performed using cDNA samples derived from vegetative and reproductive organs, including young leaves, mature leaves, mature stems, mature roots, flower buds, and flower from 3-weekold plants. To analyze the expression patterns of the genes in ovaries or fruits before/after pollination, RT-PCR was performed using cDNA samples from tomato pistils and fruits at the corresponding developmental stages: A, pistils from 2-2.5 mm flower buds at 10 days before flowering (10 DBF); B, pistils from 3-4 mm flower buds at 7 days before flowering (7 DBF); C, pistils at 1 day before flowering (1 DBF); D, pistils at anthesis/pollination (0 DAF); E, pistils at 5 days after flowering (5 DAF); F, 5 mm ovaries at 7 days after flowering (7 DAF); G, mature green fruits at 33 days after flowering (MG); H, red fruits at 44 days after flowering (RED). Semi-quantitative reverse transcription polymerase chain reaction (RT-PCR) analysis was performed with Mastercycler ProS (Eppendorf, Germany) using an ExTaq Kit (TaKaRa Bio, Japan) and the primer sets listed in S5 Table. As an internal control for expression analysis in different organs, SAND expression was monitored using the primers SAND-F (5'-TTGCTTGGAGGAACAGACG -3') and

Sequence analysis of genes with unknown functions
Protein sequences were downloaded from the Sol genomic network. Sequence alignments were conducted using ClustalW in DDBJ (http://clustalw.ddbj.nig.ac.jp/). Phylogenetic trees were generated using CLC Genomic Workbench. The presence of secretion signals in the small proteins was investigated using SignalP 4.

Transcriptome analysis of various tomato tissues
To obtain transcriptome profiles of various tomato organs in order to identify PSGs, we performed RNA-seq analysis of 17 different floral and vegetative samples at different developmental stages (Fig 1A). We initially selected six different stages for the pistil samples (P1 to P6) and three different stages for the anther samples (A1 to A3). P1 to P3 and A1 to A2 represent samples at pre-anthesis; P1 corresponds to pistils in 2-2.5 mm flower bud, P2 and A1 correspond to pistils and anthers, respectively, in 3-4 mm flower bud, and P3 and A2 correspond to those in flower buds 1 day before flowering (1 DBF), while P4 and A3 correspond to those in flower buds at anthesis (0 DAF). P5 and P6 represent samples from post-anthesis stages: P5 and P6 correspond to pistils/fruits in flowers at 5 days after flowering (5 DAF) and in 5 mm ovaries at 7 days after flowering (7 DAF) samples, respectively. In addition, we used eight samples from different tissues. We conducted 35 nt and 50 nt single reads sequencing by Illumina GAIIx and Hiseq 2000, respectively (S1 Table). We used the "direct-mapping method" to identify sets of PSGs (Fig 1B). In the direct-mapping method, whole sequenced short reads were directly mapped onto the tomato reference genome.
The direct-mapping method is a common approach for transcriptome analysis in which sequence reads are mapped onto the reference genome of a target organism [8,38]. Our RNAseq generated different amounts of raw data ranging from 8.32 to 39.42 million reads. After quality checking and trimming of low quality reads and adapter sequences, we obtained 7.28 to 35.23 million clean reads for mapping (S1 Table). We analyzed the reads using CLC Genomic Workbench ver. 7.0.4, a user-friendly mapping tool; 82.3% to 90.5% of the clean reads from each sample were mapped to the tomato genome SL2.40 [43] (S1 Table). An RPKM cutoff value of 0.5 was utilized to declare a locus expressed, resulting in an average of approximately 25,000 genes above the expression threshold in 17 samples (S2A Fig).
Before isolating PSGs, we examined the quality of our transcriptome data, as we used only one replicate per sample. Initially, to characterize the transcriptome data, we conducted principal component analysis (PCA) with CLC Genomic Workbench ver. 7.0.4. Component 1 explained 91% of the variation, while component 2 explained 2% of the variation, indicating that the two components together explained 93% of the variation of the 17 original variables. Samples from vegetative and reproductive organs were separated into two groups, with samples such as petals and sepals (which are components of reproductive organ but are composed of vegetative cells) located in the middle of the two groups (Fig 2A), indicating specialized transcriptomes. We investigated the expression patterns of the homologs that had already been identified as tissue-specific genes, such as YABBY transcription factor genes. In Arabidopsis, two YABBY transcription factors, CRABS CLAW (CRC) and INNER NO OUTER (INO), show pistil-specific expression and are involved in the pistil and early fruit development [44][45][46]. Thus, we expected the expression patterns of their homologs in tomato to show pistilspecific expression, and we therefore examined this possibility. Three of nine YABBY transcription factor genes found in the tomato genome, SlCRCa (Solyc01g0101240), SlCRCb For vegetative organs, four samples were collected, including mature leaves from 3-week-old plants and young leaves, stems, and roots from 1-week-old plants. For reproductive organs and fruits, 13 samples were collected, including pistils and anthers of 2-2.5 mm buds, pistils and anthers of 3-4 mm buds, pistils at 1 daybefore-flowering (1 DBF), pistils and anthers at anthesis (0 DAF), ovaries of 5-days after flowering (5 DAF), 5 mm ovaries (7 DAF), sepals and petals at anthesis, mature green fruits (MG), and red fruits (RED). (B) Work flow of transcriptome analyses. For the direct-mapping method, whole transcriptome data from short reads were obtained, which were directly mapped onto the tomato reference genome, and expressed genes were identified. https://doi.org/10.1371/journal.pone.0180003.g001 Identification of pistil specific genes in tomato PLOS ONE | https://doi.org/10.1371/journal.pone.0180003 July 6, 2017 with RPKM values greater than 0.5 in at least one pistil sample and less than 0.5 in the other tissue samples using publically available data. Out of 532 genes, 206 were found to be expressed in the transcriptome produced by Pattison et al. (2015) [27]. On the other hand, the expression of 376 genes was detected in at least one sample from many different tissues and conditions, and 275 of these showed RPKM values less than 1 in nine vegetative samples (Floral organ specific genes). Finally, by comparing the two gene sets, 108 genes were found in both sets, identified as pistil-specific genes (PSGs). (C) Heatmap of the expression of 108 genes in 17 different samples. Normalized Log2-transformed expression data were visualized by constructing a heatmap using MeV software. Hierarchical clustering by Pearson correlation was conducted. (D) Gene ontology analysis was performed using AgriGO (http://bioinfo.cau.edu.cn/agriGO/). Only one category, carboxylesterase activity (Solyc05g012050), and SlINO (Solyc05g005240), were specifically expressed in flower buds and flower at the anthesis stage, which is consistent with the results obtained in a previous study [47] (S2B Fig). SlCRCa was expressed in the early stage of pistil development, and SlCRCb and SlINO were expressed through all stages of pistil development, while they were barely expressed in the other tissues (S2B Fig). These data support the quality of the transcriptome dataset. Next, according to RPKM values, we narrowed down the list of genes to those with RPKM values greater than 0.5 in at least one pistil sample and less than 0.5 in the other tissues, resulting in the identification of 532 of the initial candidate PSGs obtained by the direct-mapping method (Fig 2B).
Validation of the expression specificity of the candidate genes using publically available datasets To reconfirm the tissue-specific expression of the 532 candidate genes, we performed comparative analyses between our transcriptome dataset and two publicly available transcriptome datasets (Experiment 1 and Experiment 2) from 26 samples, including vegetative tissues and floral tissues derived from tomato cv. Heinz and wild relative S. pimpinellifolium (strain. LA1589) available in the Tomato Functional genomics database (http://ted.bti.cornell.edu/); Experiment 1 (Exp1; Tomato Genome Consortium, 2012), Experiment 2 (Exp2; accession no. PRJNA179156). As a result, 376 of the 532 candidate genes were detected in at least one of the 26 samples from the public data, suggesting that these genes are most likely expressed in tomato plants. We investigated the expression levels of the 376 genes in nine different vegetative samples. We then excluded genes whose RPKM values were >1 in any of nine vegetative samples and identified 275 genes as "Floral organ-specific genes" (Fig 2B).
Alternatively, to obtain information about the cell types in which the candidate genes are expressed, we investigated their expression patterns in cell-type-specific transcriptome data from pistils of wild tomato (S. pimpinellifolium) [27]. We then selected genes expressed in pistils based on the criterion used by Pattison et al. [27]; genes with RPM values > 2 in at least one sample were chosen. In total, 206 genes were defined as "Pistil expressed genes"; their expression was evident in the pistil, especially after anthesis, while the other 326 genes excluded by this step may not be expressed in the pistil or may be expressed only at the earlier stages than 1 DBF (Fig 2B).
We compared "Floral organ-specific genes" and "Pistil expressed genes" and selected redundant genes, ultimately identifying 108 genes as PSGs by the direct-mapping method ( Fig  2B and 2C, Table 1 and S2 Table). Among these, 56 genes had not been characterized. Public transcriptome data analysis provided information about both the organs and cell types in which the 108 PSGs were expressed. Using cell-type-specific transcriptome dataset from pistils of wild relative S. pimpinellifolium [27], we obtained spatial information about the expression of PSGs within the pistil (S3 Table). Hierarchical heat mapping clearly showed their cell-typespecific expression profiles ( Fig 3A). Remarkably, roughly two-thirds of the genes appeared to show highly tissue-specific expression in the ovule and/or seed tissues (embryo, endosperm, seed coat). While many genes were preferentially expressed in the ovule and the seed tissues except seed coat, several genes were preferentially expressed in the pericarp at anthesis, in the placenta, and in the seed coat after pollination (Fig 3A). For example, five genes were preferentially expressed in the pericarp before pollination: genes encoding cinnamoyl CoA reductase-  . In addition, the expression of 63 out of 108 genes was detected also in the recently published ovary transcriptome dataset derived from cultivated tomato 'Moneymaker' [23], in which RNA-seq analyses against ovule and ovary wall tissue were conducted; their average expression levels were over FPKM of 0.5 [23] (S4 Table). The 55 other genes were not detected in that dataset, indicating that these 55 genes were barely expressed in cultivated tomato or were only expressed in other type of tissues such as the placenta and septum, which were excluded from their experiment.

Validation of gene expression patterns by RT-PCR
We then verified the expression patterns of the PSGs by RT-PCR analysis. Since many of these genes were highly expressed in the ovule and/or seed, especially the embryo (Fig 3A), we initially focused on genes specifically expressed in these tissues. Among the 108 PSG candidates, the top-five PSGs highly expressed in 0 DAF ovules were designated Ovule Preferentially Expressed genes 1-10 (OPE1-5) (S5 Table). We verified the tissue-specific expression of five of these genes by RT-PCR analysis (Fig 3B). OPE1 was preferentially but not exclusively expressed in the pistil at anthesis, OPE2 and OPE5 were specifically expressed in the pistil at both 1 DBF and 0 DAF, and the expression of OPE3 in the pistil was not detected in this experiment. OPE4 was expressed in the pistil at 0 DAF and mature green fruits. We also designated the top-five PSGs that were highly expressed in 4 DAF embryos as Embryo Preferentially Expressed genes 1-5 (EPE1-5) (S5 Table). EPE1, encoding a self-incompatibility protein-like protein according to SGN, might function in pollen-pistil interactions, while most of the EPEs had not been functionally characterized or annotated in previous studies. Like the OPEs, we investigated the expression of the five EPEs (EPE1-5) by RT-PCR to validate their tissue-specific expression patterns. Three genes, EPE1-EPE3, were specifically expressed in the pistil and EPE5 was preferentially expressed in the pistil especially before anthesis, although we failed to detect the expression of EPE4 in our RT-PCR analysis (Fig 3B). EPE1 was specifically expressed in the pistil throughout pistil/fruit development but was not expressed in mature red fruits. EPE3 was also specifically expressed in the pistil, but only after anthesis. EPE2 was expressed exclusively during fruit set initiation between 1 DBF and 0 DAF (Fig 3B). In summary, three OPEs and four EPEs were specifically expressed in pistils, confirming their tissue-specific expression in the pistil (Fig 3B). Therefore, we confirmed the tissue-specific expression of PSGs in the pistil. These results indicate that the direct-mapping method also successfully identified true PSGs. Identification of pistil specific genes in tomato

GO analysis using AgriGO
To elucidate the enriched functional categories of the 108 identified PSGs, we performed GO analysis using AgriGO. A false discovery rate (FDR; e-value corrected for list size) of < 0.05 was used as the criterion to obtain enriched GO terms. Consequently, only one category, Carboxylesterase activity (GO:0004091), showed significant abundance (p-value = 0.0017, FDR = 0.037) (Fig 2D). This category includes five genes (listed in Fig 2D), three of which (Solyc02g069330, Solyc09g011280, and Solyc09g011290) were assigned to the sub-term "Pectinesterase inhibitor". Even though Solyc09g011290 was classified as a "Pectinesterase inhibitor", it was labeled as an "invertase inhibitor homolog" in the SGN database and has higher sequence homology with the invertase inhibitor group that includes invertase inhibitor 1 (INVINH1, Solyc12g099200), which specifically regulates cell wall invertase activity in early developing fruits [49]. Pectin, a major component of the primary cell walls of higher plants, is methyl-esterified by pectin methyltransferase (PMT) before its transport to the cell wall following its biosynthesis in Golgi bodies [50,51], whereas pectin methylesterase (PME) catalyzes the removal of methyl esters from pectin [52-54]. The removal of methyl group from pectin allows carboxyl groups to form Ca 2+ -and Mg 2+ -mediated linkages, leading to the hardening of pectin [55,56]. In addition, pectin methylesterase inhibitors (PMEIs) directly interact with PME and inhibit its activity, affecting pectin composition in the cell wall. Lionetti et al. [57] reported that overexpressing Arabidopsis PMEI increased the degree of pectin methylesterification by approximately 16%, resulting in longer roots due to the promotion of cell elongation. Therefore, the degree of methylation and demethylation of pectin determines the balance between extensibility and rigidity, affecting growth and cell shape. In tomato, PMEU1, a ubiquitously expressed pectin methylesterase gene, is expressed during early fruit development [58]. Terao et al. [59] recently reported the occurrence of rapid pectin metabolism during the early stage of fruit development in tomato: immunolocalization analysis demonstrated that methyl-esterified pectin levels in the ovary increased from 1 DBF to 3 DAF [59]. During fruit set, the transition of cell state from cell division to cell expansion occurs during a short period of time, and the regulation of this process is important for determining the size of the fruit. Therefore, it would be interesting to investigate whether PMEI plays a role in the post-translational regulation of PME and cell wall state during fruit set.
In addition, the pectinesterase inhibitor protein family includes several enzyme inhibitors such as invertase (Beta-fructofuranosidase) inhibitors, each of which has a specific target [60,61]. Solyc09g011290 was annotated as an invertase inhibitor homolog in the SGN database. Invertase inhibitors regulate specific invertases in a post-translational manner, negatively affecting the enzyme activity of their targets [49,61]. We found that Solyc09g011290 was specifically and highly expressed in the ovule/seed (S3 Table). The expression of Solyc09g011290 was induced during anthesis and remained at high levels in the absence of pollination but was down-regulated by pollination and hormone treatments (S6 Fig). Several studies on the cell wall invertase (CWIN) and INVINH1 in tomato suggest that these proteins play important roles in seed set and fruit set by regulating the unloading of sugar from the phloem during the ovary-to-fruit transition [4,49,62,63]. Thus, the expression pattern of Solyc09g011290, the upregulation during flowering and the down-regulation by the fruit-set stimulus (S6 Fig), suggests that Solyc09g011290 may also participate in the modulation of the sugar unloading to unpollinated pistil via post-translational inhibition of invertase activity.

Identification of pistil-specific transcriptional regulators
Pistil-specific transcription factors. Next, we performed similar RT-PCR analyses of several transcription factor genes listed among the PSGs and confirmed the tissue-specific expression of three transcription factor genes ( Fig 3B).
Solyc04g074320 (SlTT1-like) shares high homology (71.5%) with Arabidopsis zinc-finger protein TRANPARENT TESTA1 (TT1) [64]. Arabidopsis TT1 expression is restricted to developing ovules and young seeds and functions in the accumulation of proanthocyanidin pigments in the seed coat [65,66], while in the current study, tomato SlTT1-like transcripts were exclusively detected in the ovule, embryo, and endosperm but not in the seed coat ( Fig  3B). Mazzucato et al. [67] provided evidence that higher anthocyanin content is associated with increased early fruit growth in non-pollinated flowers. Furthermore, there is an evidence that the alteration of the flavonoid pathway via the regulation of biosynthesis genes induces seedless fruit development in both a pollination-dependent and pollination-independent manner [68,69]. Further elucidation of the function of SlTT1-like in the control of flavonoid-related genes may provide insight into the role of flavonoids during fruit set initiation.
SlINO (Solyc05g005240) was identified as a pistil-specific YABBY transcription factor gene ( S2B Fig and Fig 3B). YABBY family proteins contain two conserved domains, i.e., a C2C2 zinc-finger-like domain in their N-termini and a helix-loop-helix domain known as the YABBY domain [70]. In Arabidopsis, two YABBY genes, INO and CRC, show tissue-specific expression in the pistil and are involved in pistil and early fruit development [44][45][46]. Nine YABBY genes were previously identified in tomato, three of which (SlCRCa, Solyc01g0101240; SlCRCb, Solyc05g012050; SlINO, Solyc05g005240) are specifically expressed in the flower bud and in open flowers at anthesis [50]. In the current study, we found that SlCRCa, SlCRCb, and SlINO were preferentially expressed in the pistil (S2B Fig). SlCRCa was expressed in the early stage of pistil development, while SlCRCb and SlINO were expressed during all stage of pistil development. Furthermore, we confirmed the tissue-specific expression of SlINO in pistils by RT-PCR analysis (Fig 3B), suggests its role in the regulation of pistil development [46].
Solyc01g010600 (SlATHB13/23-like), which encodes a homeodomain leucine zipper 1 transcription factor (HD-Zip TF), shares similarity with Arabidopsis class-1 HD-Zip genes AtHB13 and AtHB23 and was specifically expressed in the pistil before anthesis (Fig 3B). The HD-Zip TF family forms a large gene family that is divided into four classes; 58 HD-Zip proteins found in both Arabidopsis and tomato are listed in PlantTFDB version 3.0 (http://planttfdb.cbi.pku. edu.cn) [71,72]. Although the molecular functions of class-1 HD-Zip proteins in the regulation of pistil development remain elusive, AtHB13 and AtHB23 were shown to play negative roles in inflorescence stem elongation by affecting cell division, and AtHB13 also regulates pollen hydration and development [73]. In tomato, class-1 HD-Zip SlHZ24 functions as a transcriptional activator of SlGMP3 (encoding GDP-D -mannosepyrophosphorylase), which plays an important role in the production of the antioxidant ascorbate [74]. In addition, virus-induced gene silencing of class-1 HD-Zip LeHB1 reduced the mRNA accumulation of LeACO1 and inhibited ripening [75]. Further, Lin et al. [75] also reported that ectopic overexpression of LeHB1 led to the conversion of sepals into carpel-like structures. We also found that SlATHB13/ 23-like was highly expressed in the ovary wall in the pistil at anthesis (S3 and S4 Tables), suggesting its regulatory role in carpel and ovary wall development.
Besides, we identified the type-I MADS box gene, Solyc01g106730 among PSGs (Table 1). MADS box genes encode transcription factors which are generally involved in homeotic regulation of reproductive organs and can be divided into two subfamilies (type-I and type-II) according to the presence of conserved domains [75,76]. Some type-II MIKC MADS box genes play key roles as regulators of meristem identity, flowering time, and fruit and seed development [77,78], whereas little information is available about the function of type-I MADS box proteins. Most (38 out of 61) Arabidopsis type-I MADS box genes are expressed in the female gametophyte [79], whereas others exhibit highly specific expression such as in the central cell and embryo sac [80]. For example, Arabidopsis type-I AGAMOUS-LIKE61(AGL61)/DIANA (DIA) is expressed exclusively in the central cell and early endosperm and plays crucial role in endosperm development after fertilization through transcriptional control in the central cells [79][80][81][82]. AtAGL62 also plays important role in the endosperm and seed coat development [83][84][85]. Gene expression analysis using tissue-specific transcriptome data from wild tomato S. pimpinellifolium [27] revealed that the type-I MADS box gene Solyc01g106730 is preferentially expressed in the ovule (S3 and S4 Tables). In addition to the relationship between type-I MADS box genes and seed development, there is an evidence that down-regulation or mutation in type-II MADS box genes, such as TM29, TAP3, TM8, SlAGL11 or SlAGL6, results in parthenocarpy [86][87][88][89][90]. Thus, it would be important to investigate the roles of Solyc01g106730 in pistil, seed, and fruit development.
Pistil-specific peptide hormone-like small peptide genes and receptor-like proteins. The role of peptide hormones in plant signaling pathways is a popular focus of study [91][92][93][94][95]. The peptide hormone signaling system involves two main components: (1) small ligand proteins such as small cysteine-rich peptides (CRPs) and (2) receptor proteins such a leucine-rich receptor-like kinases (LRR-RLKs) [96]. CRPs function as signaling molecules (peptide hormones) in various plant species, which are required for many aspects of development including antimicrobial defense, pollen tube guidance, stomatal patterning, and early embryo patterning [97][98][99][100][101][102][103][104]. CRPs contain four, six, or eight conserved cysteine residues at their C-termini in addition to a secretion signal at their N-termini. Interestingly, a substantial number of PSGs identified in this study encode small proteins (44 out of 108 genes identified by the mappingbased method [40%]) less than 200 amino acids in length ( Table 2). Small proteins are defined as proteins smaller than 200 amino acids according to previous reports [94,97,105]. Since peptide hormone-like small proteins share a conserved structure, we performed a sequence similarity search of the 44 identified small proteins and one TAPETUM DETERMINANT 1 (TPD1)like protein (204 aa) by BLAST analysis and SignalP 4.1 server (http://www.cbs.dtu.dk/services/ SignalP/) manually to investigate whether they have conserved residues or functional domains. Roughly half of these proteins also have a secretion signal in their N-termini (Table 2). Notably, through subsequent sequence analysis of these small proteins, four tissue-specific CRPs including an unknown gene (Solyc06g075200) and two LRR-RLK-like proteins were identified (Listed in Table 2, S5 Fig).
OPE4 (Solyc11g012650) was homologous to Arabidopsis TAPETUM DETERMINANT (AtTPD1, AT4G24972) (6e-37), encoding a peptide hormone that functions as a ligand molecule to regulate the specification of tapetum cells in coordination with receptor protein EMS/ EXS [106,107], while BLAST searches of the tomato genome identified three other homologs, designated SlTPD1L1 (Solyc03g097530), SlTPD1L2/OPE4 (Solyc11g012650), and SlTPD1L3 (Solyc11g006850), based on sequence similarity to AtTPD1, with 59.4% (1e-49), 55% (6e-37), and 50% (4e-33) sequence similarity, respectively. Like AtTPD1, we confirmed the presence of a secretion signal in the N-terminal region and conserved cysteine residues at the C-terminus among the three deduced proteins (S3A Fig). Although the sequence of N-terminal secretion signal region varied among the three proteins, an alignment of each SlTPD1L compared to amino acids 26-179 of AtTPD1 revealed a high degree of similarity (48-56%) (S3B Fig). Although it is known that AtTPD1 is also expressed in inflorescence meristems, floral meristems, carpel primordia, and ovule primordia, its function in these tissues remains unknown [106,107]. The notion that SlTPD1L2/OPE4 (Solyc11g012650) showed pistil-specific expression (Fig 3B), and that OPE4 is shown to be preferentially expressed in the ovule both in S. pimpinellifolium and tomato cultivar 'Moneymaker' (S3 and S4 Tables), it was suggested that SlTPD1L2/OPE4 might play a tissue-specific role in ovule development.
Solyc11g005540, Solyc11g005500, and Solyc05g010190 share sequence similarity with Arabidopsis ECA1-like protein (EC1) genes, which involved in gamete fusion [108] (S4B and S4C  Fig). mRNA from the five Arabidopsis EC1 genes (EC1.1 to EC1.5; belonging to the ECA1 Table 2. List of pistil-specific or preferentially expressed small proteins. [Early Culture Abundant] gametogenesis-related cysteine-rich protein subfamily) is present only in egg cells before fertilization. And, small proteins encoded by EC1 genes are secreted into the outer region of the egg cell to redundantly regulate the fusion of the germ cell to the sperm cell during double fertilization [82,108]. Three tomato EC1-like protein homologs are expressed in the ovule/seed in S. pimpinellifolium (S3 and S4 Tables), while in the current study, we confirmed the pistil-specific expression of Solyc11g005500 and Solyc11g005540 (S4A Fig). Sequence similarity analysis of these protein sequences with AtEC1s identified five tomato homologs, which share six conserved cysteine residues in the C-terminal region and an N-terminal secretion signal peptide (S4B Fig). Therefore, these genes were designated S. lycopersicum SlEC1-like genes (SlECLs; e.g., SlECL1 [Solyc11g005500], SlECL3 [Solyc05g010190], and SlECL5 [Solyc11g00 5540]), suggesting that they play a conserved role in the pistil at fertilization. In addition to several ligand-like, pistil-specific CRPs, we also identified two proteins with RLK-related domain; OPE5 (Solyc10g051370) and Solyc03g096190. OPE5 was expressed in ovule and seed, and Solyc03g096190 was expressed in placenta and septum (S3 Table). In tomato, the LRR-RLK family represents the largest family of RLKs [109]. Molecular genetic studies have shown that LRR-RLKs are involved in a wide range of plant development, such as stem cell maintenance [110,111], cell fate determination and patterning [103,112], and brassinosteroid signaling [113][114][115]. Generally, LRR-RLK proteins, such as EMS1/EXS and CLA-VATA1 (CLV1), function as receptors by interacting with small ligand proteins at LRR domain and subsequently transmit external signals into cells by activating kinase domain, inducing various cellular responses [116][117][118]. While Solyc03g096190 possesses the LRR domain, one transmembrane domain and the cytoplasmic kinase domain, OPE5 possesses only small LRR domain composed of five repeats (S5 Fig). Some receptor-like proteins with extracellular LRR domains lacking the internal kinase domain also have been identified in other plants, such as Arabidopsis CLAVATA2 (CLV2) [119], which regulates various developmental and immunity signaling pathways by interacting with other RLKs including CLV1 [119][120][121][122]. Unraveling the role of pistil-specific RLK like proteins may help understand the detailed mechanism of signal transduction during fruit set. Identification of pistil specific genes in tomato

Relationship between PSGs and fruit set signaling
We further characterized and identified PSGs associated with fruit set in the pistil. Specifically, we investigated their expression in a dataset of differentially expressed genes (DEGs) in the pistils of plants treated with 2,4-D and GA 3 from Tang et al. [20]. Among total 4764 and 6875 DEGs in ovaries undergoing fruit set after induction by plant hormones (auxin and GA) or pollination at 4 DAF compared to unpollinated ovaries at 2 DBF and unpollinated ovaries at 4 DAF, respectively, three genes, including invertase inhibitor homolog (Solyc09g011290), SlATHB13/23-like (Solyc01g010600), SlCKX8 (Solyc10g017990) were included out of 108 PSGs, suggesting their roles in pistil development and/or fruit set initiation (S6 Fig). The mRNA levels of SlATHB13/ 23-like (Solyc01g010600) were significantly reduced at 4 DAF by either plant hormone (auxin or GA) treatment or pollination compared to that at 2 DBF, while no significant difference was found compared to ovaries at 6-days post-emasculation (6 DPE). By contrast, the mRNA levels of SlCKX8 (Solyc10g017990), invertase inhibitor homolog (Solyc09g011290) were significantly lower in ovaries undergoing fruit set compared to those at 6 DPE, indicating that these genes are up-regulated in the absence of fruit set signaling after flowering.

Regulatory regions of PSGs as genetic engineering tools
Tissue-specific promoters are useful tools for regulating the expression of genes of interest in a spatial and temporal manner, which could provide new insights into various biological mechanisms and facilitate their application to molecular breeding, such as generating genetically modified organisms [123]. Constitutive promoters such as the 35S promoter from Cauliflower mosaic virus (CaMV) is widely used to regulate the expression of target genes in plants [124,125]. However, the constitutive regulation of target genes is not always useful, since it sometimes induces additional undesirable effects that hamper agronomic applications. Thus, in the past several decades, many tissue-specific promoters have been isolated, such as fruitspecific promoters [126][127][128][129], root-specific promoters [130][131][132], seed-specific promoters [133][134][135], many pollen and/or anther-specific promoters [36,[136][137][138][139], and so on. Several individual promoters targeting the stigma and/or style in the pistil have also been evaluated in several plant species, such as the stigma-and style-specific thaumatin/PR5-like protein(PsTL1) promoter from Japanese pear (Pyrus serotina) [140][141][142]. Importantly, the use of the ovulespecific DefH9 and INO promoters allows parthenocarpy to be efficiently induced via the ovule-specific activation of auxin signaling without producing substantial undesirable fruit traits in tomato [143]. Moreover, several reproductive tissue-specific promoters, such as sperm cell-and egg cell-specific promoters, have been successfully used to induce targeted mutagenesis by CRISPR/Cas9 [36,144]. For example, Wang et al. [144] used the promoter of egg cellspecific EC1.2 (homolog of SlEC1.2 and SlEC1.3 identified in the current study) for CRISPR/ Cas9 to generate homozygous mutants for multiple target genes in a single generation in Arabidopsis. These findings indicate that pistil-specific promoters, especially ovule-specific promoters, can be highly effective tools for plant breeding. Here, we identified various types of PSGs in tomato, e.g., SlINO and two homologs of Arabidopsis EC1.2, which were specifically expressed in the ovule (Table 2). Thus, these promoters could contribute to tissue-specific regulation or genome editing of target genes to improve fruit set by inducing parthenocarpy.

Conclusion
In conclusion, in this study, we conducted global analysis of tomato PSGs in tomato, which might be involved in the ovary development or fruit set process, by performing RNA-seq analysis and comparisons with publically available data. This study successfully identified several genes encoding signaling-related transcription factor and peptide hormone-like proteins, in addition to many genes with unknown functions (Fig 4). Although their biological functions remain to be determined, our findings lay the foundation for further analysis of the precise gene regulatory network and developmental mechanisms underlying fruit set, in addition to the usage of promoter region of PSGs for genetic engineering and molecular breeding.