iTRAQ protein profile analysis of developmental dynamics in soybean [Glycine max (L.) Merr.] leaves

Zao5241 is an elite soybean [Glycine max (L.) Merr.] line and backbone parent. In this study, we employed iTRAQ to analyze the proteomes and protein expression profiles of Zao5241 during leaf development. We identified 1,245 proteins in all experiments, of which only 45 had been previously annotated. Among overlapping proteins between three biological replicates, 598 proteins with 2 unique peptides identified were reliably quantified. The protein datasets were classified into 36 GO functional terms, and the photosynthesis term was most significantly enriched. A total of 113 proteins were defined as being differentially expressed during leaf development; 41 proteins were found to be differently expressed between two and four week old leaves, and 84 proteins were found to be differently expressed between two and six week old leaves, respectively. Cluster analysis of the data revealed dynamic proteomes. Proteins annotated as electron carrier activity were greatly enriched in the peak expression profiles, and photosynthesis proteins were negatively modulated along the whole time course. This dataset will serve as the foundation for a systems biology approach to understanding photosynthetic development.


Introduction
Soybean [Glycine max (L.) Merr.] is an important oil and grain crop and is also the world's most important source of edible vegetable oil and vegetable protein [1]. Soybean seeds are rich in protein, fat, and other nutritional compounds including isoflavones and oligosaccharides [2]. With the increasing gulf between supply and demand, fully tapping yield potential and producing elite, high yielding, high quality soybean cultivars is essential for production [3]. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 Photosynthetic efficiency directly affects soybean production, so understanding the dynamic development of the leaf is of singular importance.
As the leaf is so important to the basic functions of plant growth including photosynthesis, transpiration, and respiration, it has been the focus of widespread and continuing research. The cell-specific transcriptomes of successive developmental stages were compared in bundle sheath (BS) and mesophyll (M) cells of maize leaves [4], the results showed that the number of genes preferentially transcribed in one or another cell type varies greatly between the different stages of leaf development. The transcriptome of maize leaves were analyzed using Illumina sequencing, the results quantified transcript abundance along a leaf developmental gradient in mature BS and M cells [5]. Differential gene expression analysis was performed in soybean leaf tissues at late developmental stages under drought stress showing that the down regulation of many photosynthesis-related genes can contribute to retardation of growth under drought stress which may serve as an adaptive mechanism for plant survival [6]. However, the research relevant to the dynamic development of soybean leaves is still lacking.
Studying proteins to reveal gene function and the nature of biological phenomena is important. However, biological proteins are complex, and each cell generally has thousands of proteins. Therefore, a technique capable of separating large number of proteins simultaneously is required. Proteomics is a recent developing technology which could be useful for the largescale, comprehensive study of protein structure, modification, action and expression in cells and tissue [7]. Comparative proteomics is the identification of differences and changes in the proteome between states, tissue types, and environments. On one level, this provides the tools and methods for studying the function of life and physiological and pathological phenomena, but these same tools can also be applied to examine the basic laws of life [8][9][10]. There are many methods for proteomics research. Two-dimensional electrophoresis (2-DE) is the main method for separating proteins due to its simplicity and is widely used in various plants proteomics. However, 2-DE results are not sufficiently accurate with some proteins being easily lost and others hard to detect in low abundance [11]. Compared with 2-DE, isotope affinity tag (ICAT) technology significantly improves the ease of separating membrane proteins, can be combined with high performance liquid chromatography, and provides protein isolation results which are more accurate and reliable. However, ICAT technology applies only to proteins containing cysteine residues and identifying small molecule peptides is difficult [12]. iTRAQ (Isobaric Tags for Relative and Absolute Quantitation) is a high-throughput method which can be used to study the relative and absolute quantification of 2-8 samples at the same time with good accuracy and repeatability. iTRAQ is one of the most widely used markers in comparison proteomics [13]. The research technology of iTRAQ quantitative protein has high sensitivity and can identify any type of protein. iTRAQ does not affect protein physical and chemical properties such as the isoelectric point, abundance of the protein itself, molecular weight, and hydrophobicity. At the same time, iTRAQ can detect alkaline, hydrophobic, high or low molecular weight, and low abundance proteins which are difficult to detect using 2-DE. In protein quantitative experiments using iTRAQ, more than 97% of the peptides can be marked efficiently and completely with high repeatability. In addition, through direct tandem application with the mass spectrum, iTRAQ can be automated saving time and effort with a fast reaction rate. All in all, iTRAQ is a good technique to study the plant protein quantity.
In our previously study, we compared leaf proteome specifically the photosynthetic protein expression patterns, between Hobbit and JD17 at same developmental stage [14]. In this research, we compared the protein different expression profiles in three important growth periods of Zao 5241, like V2 (second Trifolicate), R1 (beginning flowering) and R3 (beginning pod). Moreover, we analyzed enriched expressed proteins and revealed dynamic proteomes by cluster analysis. The results reported here represent a necessary step toward providing a useful tool and information for isolating proteins and better understanding the molecular basis underlying developmental dynamics in soybean.

Plant growth conditions and tissue collection
Zao 5241 is one of the most important elite germplasm lines for soybean breeding in the Huang-Huai-Hai region incorporating foreign descent such as Yanli from Japan and Williams from the United States. It is a versatile line with excellent plant type traits, early maturity, high oil content (23.77%), good resistance to four different strains of Soybean Mosaic Virus (SMV), and combining ability. It has been used to develop new soybean varieties and germplasm for twenty years in summer soybean breeding.
The plants were grown in a 50:50 mix of vermiculite and soil in a greenhouse with 12hr/ 12hr light/dark, 30˚C light/ 22˚C dark, and 60% relative humidity in Shijiazhuang (114˚26'E, 38˚03'N). Three growth stages, V2 (second trifolicate), R1 (beginning flowering) and R3 (beginning pod) were chosen as time points to analyze the protein expression profiles in order to analyze and detect different protein expression profiles associated with leave, flower and pod development. Leaves were collected at 2, 4, and 6-weeks 2 hours into the light period. Three biological replicates were carried out, and each sample was pooled from 5 plants.

Protein extraction
Total protein samples were extracted using the cold-acetone method described by Meyer et al. [15]. Leaves were ground in liquid nitrogen and suspended in precooled acetone (-20˚C) containing 10% (v/v) TCA. Then, samples were incubated at -20˚C for 2 h after thorough mixing. Proteins were collected by centrifuging at 30,000 g and 4˚C for 30 min, and the supernatant was carefully removed. To reduce acidity, the protein pellets were re-suspended with precooled acetone and centrifuged again at 20,000 g and 4˚C for 30 min. They were then washed 3 times with cold acetone and re-suspended in 1 ml protein extraction reagent [8 M urea, 4% (w/v) CHAPS, 30 mM HEPES, 1 mM PMSF, 2 mM EDTA and 10 mM DTT], the protein was resolubilized through sonication in an ultrasonic cleaner (SCIENTZ, SB4200D). The supernatant was collected after centrifuging at 20,000 g at 4˚C for 30 min. Protein concentration was determined using a 2-D Quant Kit (General Electric Company, USA). SDS-PAGE was used to verify the protein quality and concentration.

Protein digestion and iTRAQ labeling
Total protein (100 μL) was diluted by equal volume tetraethylammonium bicarbonate (TEAB, pH 8.5; Sigma, St. Louis, MO). Subsequently, modified trypsin (Promega, Madison, WI) was added to the mixture (3.3 μg trypsin/100 μg protein) to digest the protein at 37˚C for 24 hour. The solvent was then removed in a refrigerated centrifuge (Eppendorf, 5417R), and MALDI TOF/TOF (Bruker limited, Coventry, UK) was used to verify the efficiency of protein digestion.
Peptides were labeled with an iTRAQ labeling kit (Applied Biosystems) according to the manufacturer's manual. 2, 4, and 6-week-old samples were labeled with reagent 114-119, respectively. Nine Peptide samples labeled were divided into two iTRAQ groups, each iTRAQ groups were carried out independently (S1 Table). Three independent biological experiments were performed. The peptides were then combined and further fractionated offline using strong cation exchange (SCX) with high performance liquid chromatography (HLPC) system (Shimadzu, Japan) connected to a SCX column (Luna 5 μ column, 4.6 mm I.D. × 250 mm, 5 μm, 100 Å; Phenomenex, Torrence, CA). The retained peptides were eluted using Buffer A (10 mM KH 2 PO 4 in 25% ACN, pH 3.0) and Buffer B (2 M KCl, 10 mM KH 2 PO 4 in 25% ACN, pH 3.0) with a flow rate of 1 mL/min. In total, 38 fractions were collected and combined into 17 fractions to reduce peptide complexity. Subsequently, eluted fractions were lyophilized in a centrifugal speed vacuum concentrator and dissolved with 0.1% formic acid prior to reversephase nanoflow liquid chromatography (nLC) and tandem mass spectrometry (nLC-MS/MS).

MS/MS analysis
MS/MS analysis was performed on an Easy Nano-LC system (Proxeon Biosystems, Odense, Denmark) connected to a hybrid qradrupole/time-of-flight mass spectrometer (MircOTOF-Q II, Bruker, Germany) equipped with a nano electrospray ion source. Peptides of each fraction were equalized before being injected into the Nano-LC-system. The peptides were separated on a C18 analytical reverse phase column of 75μm i.d.x100mm length with Solution A (5% acetonitrile, v/v, 0.1% formic acid, v/v), and Solution B (95% acetonitrile, v/v, 0.1% formic acid, v/v) at a flow rate of 300 nL/min. After equilibrating with 5% Solution B for 10 min, the following gradient schedule would start: 45% Solution B at 80 min, 80% Solution B at 85 min, maintained for 15 min, 5% at 105 min and held for 15 min before ramping back down to the initial solvent condition. The MicroTOF-Q II hybrid MS was used to analyze the fractions. The MS/MS survey scan for all experiments between 50 and 2000 mass-to-charge ratio (m/z). The ionization tip voltage and interface temperature was run as 1250 V and 150˚C.

Protein identification and quantification
All MS data were collected using MicrOTOFcontrol and analyzed using DataAnalysis (Bruker Daltonics). Mascot v2.3.01 (Matrix Science) [16] was used to identify proteins. The Universal Protein Resource (UniProt) [17] database was used as a reference. For biological repeats, spectra from 17 fractions were combined into one file and searched. The parameters were set as follows: specifying trypsin as the digestion enzyme; cysteine carbamidomethylation as fixed modification; iTRAQ8-Plex on N-terminal residue, iTRAQ-8Plex on tyrosine (Y), iTRAQ-8Plex on lysine (K), glutamine-pyroglutamic acid and oxidation on methionine (M) as the variable modification; peptide tolerance was set at 0.05 Da, and MS/MS tolerance was set at 0.05Da. Finally, an additional filter before exporting the data was set: significance threshold P<0.05 (with 95% confidence) and ion score or expected cut-off less than 0.05 (with 95% confidence).
The protein search results were exported by Mascot then normalized and quantified using Scaffold v3.0 [18]. 4-and 6-week-old protein profiles were quantified based on the 2-week-old results.

Protein annotation
Based on the results of the protein search against the UniProt database, DAVID Bioinformatics Resources v6.7 [19] was employed to obtain the functional classification of the proteins based on Gene Ontology (GO) [20] and Clusters of Orthologous Groups (COG) terms [21] and determine the distribution of the gene functions at the macro level. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.genome.jp/kegg/) [22,23] was used to annotate the pathway of these proteins. The enrichment test of functional category proteins was performed by a chi-square test with the defining cutoff as 0.01 under an Arabidopsis thaliana background. A false discovery rate (FDR) significance threshold of 0.05 was used as falsepositive control.

Differentially expressed protein definition and cluster analysis
Proteins with significant changes during leaf development were selected using the method described before [24,25]. 90% confidence (Z-score = 1.645) of log 2 ratios was used to select the proteins whose distribution was removed from the main distribution. The mean and SD from those proteins overlapping in three biological repeats was calculated. The profiles from different time points were calculated independently, and a broad threshold was selected. The result showed the cut-off value for up-regulated proteins was 1.75-fold (mean + Z-score × SD) and down-regulated proteins was 0.69-fold (mean-Z-score × SD). Protein ratios outside this range were defined as being significantly different at P = 0.01.
The expression profiles of the differentially expressed proteins were determined by cluster analysis based on the k-means method with Pearson's correlation distance in Genesis v1.7.6 [26,27].

Identification of leaves proteins
A total of 1,245 proteins were identified in all experiments with the number of unique peptides ranging from 0 to 40. Of all proteins, only 405 (32.53%) had been well annotated previously under a Glycine max background. These were mostly composed of metabolic pathway, biosynthesis of secondary metabolites pathway, and cytoplasm component proteins. Remarkably, a total of 49 Glycine annotated proteins related to photosynthesis were found among the abundant proteins. All of the proteins were also compared against the UniprotKB/Swiss-Prot Arabidopsis thaliana protein database using blastx with an E-value cut-off of 1e-5 revealing 1,212 proteins. For the three biological replicates, 1,002, 982, and 970 proteins (80.48% 78.88%, and 77.91% of total proteins) were identified with an overlap of 727 (72.55%, 74.03%, and 74.95%) proteins. Although the lack of protein information limited identification, this investigation demonstrated the capability and applicability of the iTRAQ technology to quantify the leaf proteome of soybean.

Functional classification and enrichment analysis of proteins
For the iTRAQ dataset, 598 proteins, with at least 2 unique peptides, were used for further analysis. The iTRAQ identified proteins were classified into 36 GO functional terms which are distributed into three main categories according to their functional role: biological process (434), cellular components (460), and molecular function (439) (Fig 1).
There were 17 GO terms in the biological process category. Four out of those GO terms were significantly enriched (P<0. Four out of the ten GO Terms were enriched in the molecular functional category: structural molecule activity (67, P = 2.17E-26), antioxidant activity (19, P = 3.22E-08), electron carrier activity (40, P = 1.56E-06), and catalytic activity (267, P = 1.59E-0.5). A total of 58 proteins out of 598 were annotated with photosynthesis (GO: 0015979), a child term of the biological process category, which was significantly enriched in this dataset (P = 1.70E-44). 31 proteins were classified as electron transport chain (GO: 0022900) which is the father category of photosynthetic electron transport chain (GO: 0009767) and respiratory electron transport chain (GO: 0022904).
For KEGG analysis, those proteins were significantly enriched in Photosynthesis pathway (P = 4.18E-4 for diff-expressed proteins of four/two weeks old leaves, P = 5.45E-6 for diffexpressed proteins of six/two weeks old leaves).

Cluster analysis of differentially expressed proteins and validation with immunoassays
Clusters of differentially expressed proteins were analyzed based on the k-means methods using Pearson's correlation distance in Genesis v1.76. The proteins were divided into seven clusters based on their expression modulation representing the number of profiles indicated by FOM analysis (Fig 4A). Cluster 1 and 2 contained proteins positively or negatively modulated along the whole time course; cluster 3 and cluster 4 contained proteins modulated only during the four to six weeks developmental stage; cluster 5 and cluster 6 contained proteins positively and negatively modulated only at four weeks old leaves, and proteins up-regulated during two to four weeks stage fell into cluster 7 (Fig 4B). Functional categories enrichment were showed in Fig 4C. For example, proteins classified into catalytic activity were greatly enriched in cluster 3, with the proteins being upregulated during the second stage. Differentially-expressed proteins annotated as electron carrier activity were greatly enriched in cluster 6 with peak expression of proteins in the four week old leaves. For each cluster, the enrichment of KEGG analysis based on the DAVID database ( Fig 4C) were showed Photosynthesis pathway was significantly enriched in cluster 2 and cluster 5.
For further validation, the expression patterns of four proteins, CAB1, OEE1, LOX, and SBP were identified by western blotting and the results were consistent with the iTRAQ results, this verified the reliability of the iTRAQ system and the data analysis methods (Fig 5).

Discussion
iTRAQ is a technique with high sensitivity and accuracy, demonstrating a remarkable advantage in simultaneous analysis of multiple samples and subsequently providing the relative quantification on hundreds of proteins at one time [29]. In the present study, an iTRAQ-based proteomic technique was used to assess proteomic changes and identify proteins which were differentially expressed during soybean leaf development. In total, 1,245 proteins were identified. Data from three biological replicates were analyzed and proteins detected by querying data with a soybean protein database. Although soybean protein annotation is limited, a total of 58 proteins out of 598 were annotated with photosynthesis, and 31 proteins were classified as electron transport chain. Pathway analysis showed that photosynthesis was the most iTRAQ protein profile analysis in soybean enriched pathway followed by carbon fixation in photosynthetic organisms and photosynthesis. Differential proteomic analysis of grapevine leaves revealed responses to heat stress and subsequent recovery by iTRAQ. One hundred and seventy-four proteins were differentially expressed under heat stress and/or during the recovery phase, in comparison to unstressed iTRAQ protein profile analysis in soybean controls. Based on MapMan ontology, functional categories for these dysregulated proteins included mainly photosynthesis (about 20%), proteins (13%), and stress (8%). some proteins related to electron transport chain of photosynthesis, antioxidant enzymes, HSPs and other stress response proteins, and glycolysis may play key roles in enhancing grapevine adaptation to and recovery capacity from heat stress. In our preliminary research, proteomic analysis of two soybean cultivars were performed using iTRAQ-base quantitative. Enrichment factor analysis indicated that proteins involved in photosynthesis comprised an important category [14].
To estimate how many and what proteins expressed during the leave developing, the 2-week Protein Profile of Zao5241 was used to produce the reference, 41 and 84 of 1,245 differentially expressed proteins were found in 4 weeks and 6 weeks, respectively. On the basis of GO and pathway enrichment analysis, we concluded that the "photosynthesis was most significantly enriched. KEGG and enrichment analysis of differentially expressed proteins showed that electron carrier activity proteins were significantly enriched in the photosynthesis pathway (P = 4.18E-4 for differentially expressed proteins of four-/two-weeks old leaves, P = 5.45E-6 for diff-expressed proteins of six/two weeks old leaves). The proteomes of soybean leaves and roots under salt treatment were investigated using iTRAQ-based proteomic approach. Protein-protein interaction analysis revealed metabolism, carbohydrate and energy metabolism, protein synthesis and redox homeostasis could be assigned to four high salt stress response networks [30].
Moreover, dynamic proteomes were revealed by cluster analysis. Electron carrier activity proteins were greatly enriched in the peak expression profiles, and photosynthesis proteins were negatively modulated along the whole time course. Further studies for protein function analysis were needed to further clarify the molecular mechanism during leaf development in soybean.
Supporting information S1