Identification of Proteins Related to Epigenetic Regulation in the Malignant Transformation of Aberrant Karyotypic Human Embryonic Stem Cells by Quantitative Proteomics

Previous reports have demonstrated that human embryonic stem cells (hESCs) tend to develop genomic alterations and progress to a malignant state during long-term in vitro culture. This raises concerns about the clinical safety of using cultured hESCs. However, transformed hESCs might serve as an excellent model for determining the process of embryonic stem cell transition. In this study, iTRAQ-based tandem mass spectrometry was used to quantify the proteins of normal and aberrant karyotypic hESCs, ranging from simple to more complex karyotypic abnormalities. We identified and quantified 2583 proteins, and found that the expression levels of 316 proteins, representing at least 23 functional molecular groups, differed significantly between normal and abnormal hESCs. Dysregulated expression of proteins involved in epigenetic regulation was further verified in six pairs of hESC lines in early and late passage. In summary, this study is the first large-scale quantitative proteomic analysis of the malignant transformation of aberrant karyotypic hESCs. The data generated should serve as a useful reference for stem cell-derived tumor progression. Increased expression of both HDAC2 and CTNNB1 is detected as early as the pre-neoplastic stage, and these proteins might serve as prognostic markers in the malignant transformation of hESCs.


Introduction
Human embryonic stem cells (hESCs) derived from the inner cell mass of human embryos hold great promise for future cell- and tissue-replacement therapy because of their unique capacity to self-renew and to differentiate into any cell type. However, concerns have been raised with regard to the safety of hESCs, which commonly undergo adaptive changes during prolonged passaging in vitro, such as increased growth rate, reduced apoptosis and especially karyotypic changes [1][2][3][4][5][6][7][8]. With these changes, culture-adapted hESCs tend towards a transformed phenotype resembling tumor stem cells, which emphasizes the need for thorough analysis of cells destined for clinical applications [2], [3], [5], [6], [8], [9]. It is important to ensure that hESC preparations destined for clinical use are free from cancer-associated genomic alterations. The ability to distinguish karyotypically abnormal hESCs from normal hESCs can provide further insights into this abnormality and offer a screening approach for detecting abnormal cells. Thus, the discovery of cell proteins that improve characterization and are capable of distinguishing particular hESC populations has high potential as an invaluable resource for culturing hESC lines in clinical practice.
By contrast, many studies have indicated that genetic changes of transformed hESCs are associated with neoplasia, which can affect apoptotic pathways, differentiation control or cell cycle, and which may arise in precancerous cells, resulting in uncontrolled and increased growth [2], [3], [5], [6], [8], [9]. Although several mechanisms may contribute to the genomic instability of hESCs, such as abnormal DNA repair systems [10], hypoxia [11] and long interspersed element-1 retrotransposition accommodation [12], the molecular mechanisms underlying the progression and metastasis of transformed hESCs remain poorly understood. Thus, transformed hESCs may serve as an excellent model for characterizing the initial stages that determine the transition of embryonic stem cells into cancerous stem cells.
To date, many attempts have been made to study the global genome of hESCs and its chromatin state [5], [13][14][15]. However, characterization of the proteome of hESCs, especially of aberrant karyotypic hESCs, has only just begun. Some proteomic studies, both non-quantitative and quantitative, have been performed on human ESCs and ECCs [16][17][18][19]. Although useful in providing critical information with regard to the regulators of these two closely related, but developmentally distinct stem cells, these proteomic studies neither permit differential analyses among these populations nor provide quantitative differential analyses of transformed hESCs at different stages of development.
Our previous reports indicated that during long-term culture, chHES-3 cells [20], which were established in our laboratory, acquired genomic alterations. In addition, the anti-apoptotic and proliferative ability of transformed chHES-3 cells gradually increased with increasing karyotypic complexity, with a tendency to progress to malignant cells [8], [9]. Here, we employed iTRAQ-based quantitative and comparative proteomics [21][22][23][24] to quantify the proteins of normal and aberrant karyotypic hESCs, ranging from simple to more complex karyotypic abnormalities. We believe that such analyses will provide a functional characterization that distinguishes transformed from normal hESCs. Such information should advance understanding of the molecular mechanisms that regulate carcinogenesis of karyotypically abnormal hESCs. Additionally, such analyses could help identify the factors involved in hESC proliferation, self-renewal and pluripotency, and thus contribute to the discovery of biomarkers of the transformed phenotype for further monitoring the clinical safety and use of hESCs in therapeutic applications.

Cell Culture
The derivation of hESCs was approved and guided by the local ethical committee of the Reproductive and Genetic Hospital of CITIC-Xiangya, China. The hESC line chHES-3 was established in our laboratory as previously described [20]. chHES-3 was cultured in serum-free medium on mouse embryonic fibroblast cells (MEFs), which were isolated from Chinese Kun-Min White mice at 12.5 d, used at a high density of 6-7×10⁴ cells/cm² and inactivated by mitomycin-C (10 µg/ml) [9]. Every 6-7 days, the chHES-3 cells were passaged using mechanical dissection or a combination of 200 U/ml collagenase IV digestion reagent (Gibco-BRL, USA) followed by mechanical slicing. After multiple passages, chHES-3 cells acquired chromosomal abnormalities when cultured under these suboptimal culture conditions. By periodically examining the karyotype of the hESCs every 5-10 passages, we found that the karyotype of the chHES-3 cell line displayed slowly progressive changes. However, only one karyotype was present after 142 passages, as determined by standard G-banding karyotype analysis [8], [9].
The hESC lines were cultured under identical conditions over a 4 wk period to prevent culture variation, and were constantly monitored for any differentiation events by immunocytochemistry. To completely remove traces of feeder cells, hESCs were cultured on MEFs and subsequently passaged on matrigel-coated plates for five sub-cultures in conditioned medium. Medium prepared for feeder-free cultures was conditioned by exposure to MEFs for 24 h and then supplemented with an additional 4 ng/mL of bFGF. Conditioned medium was filtered using a 0.2-µm filter. Before lysis, ESCs were washed three times with cold PBS to remove traces of cell-culture contaminants from the culture medium. The human embryonal carcinoma cell (hECC) line NTERA-2 cl.D1 (EC) was obtained from the American Type Culture Collection (Manassas, Virginia, USA) and cultured on matrigel-coated plates under conditions described previously for this cell line [25]. All other hESC lines were established and cultured in our laboratory as previously reported [26].

Karyotypic Analysis
Karyotype analysis by standard G-banding was performed on approximately 50 metaphases, and three successive passages of hESCs were analyzed. Human ESCs were cultured in hESC medium containing 0.06 µg/ml colcemid (Sigma, USA) for 2.5 h. After three washes with PBS, the cells were incubated in hESC medium containing 0.05% trypsin and 0.53 mM EDTA (Gibco-BRL, USA) at 37°C for 5-10 min and harvested using standard procedures, followed by standard G-banding for karyotyping. At least 50 metaphase spreads were examined for each sample using an Olympus epi-fluorescence microscope BX51 (Olympus, Tokyo, Japan) with LUCIA KARYOTYPE software (Lucia, Praha, Czech Republic).

Detection of hESC-specific Markers
Antibodies used for immunocytochemical staining and Western blot analyses are summarized in Table S1. hESC-specific markers consisting of OCT4, TRA-1-60, TRA-1-81, SSEA-4, SSEA-3 and SSEA1 were tested by immunostaining. Cells were fixed in 4% paraformaldehyde for 20 min, permeabilized with 0.2% Triton X-100 for 10 min, and blocked in 4% goat serum in PBS for 30 min. Cells were incubated with primary antibody overnight at 4°C. Next, the cells were stained with Alexa Fluor (Invitrogen, USA) secondary antibody for 1 h. Nuclei were then counterstained with 4′,6-diamidino-2-phenylindole (DAPI, KPL, Gaithersburg, MD, USA). Alkaline phosphatase activity was detected according to the protocol of the Fast Red Substrate Pack (Invitrogen, USA).

Protein Preparation and Quantification
Normal chHES-3 cells (Normal) of passage number 30 (P30, where P denotes the in vitro culture passage of the hESCs), and karyotypically aberrant chHES-3 cells with a simple duplication karyotype (SIMP) of passage number 72 (P72) or a complex karyotype (COMP) of passage number 182 (P182) were cultured in 6-well tissue culture plates as clone-like clumps, from which undifferentiated ES cells were selected under an inverted microscope. hECCs were also collected in serum-free medium. The isolated cells were washed three times with ice-cold PBS and collected by centrifugation at 1000 rpm. Cells were suspended in 200 µl of cell lysis buffer (7 M urea, 1 mg/mL DNase I, 1 mM Na3VO4, and 1 mM PMSF) at 4°C. The cell lysate was subjected to intermittent sonication using a Vibra Cell™ high intensity ultrasonic processor (Jencon, Leighton Buzzard, Bedfordshire, UK). The remaining unbroken cells and debris were removed by centrifugation at 12,000×g at 4°C for 10 min. Protein concentrations of the cleared lysates were determined with the 2-D quantification kit (Amersham Biosciences, Uppsala, Sweden) according to the manufacturer's instructions.

Protein Digestion and iTRAQ Labeling
Approximately 100 µg of protein was reduced with 5 mM tris(2-carboxyethyl)phosphine hydrochloride (TCEP) for 60 min at 37°C, alkylated with 10 mM methyl methanethiosulfonate (MMTS) for 20 min at room temperature (RT), and then diluted 10 times with deionized water prior to digestion with 20 µL of 0.25 mg/mL sequencing grade trypsin (Promega, WI, USA) overnight at 37°C. Samples were then dried using a Speedvac (Thermo Electron). The resulting peptides were labeled with iTRAQ reagents according to the manufacturer's supplied protocol (Applied Biosystems, MA, USA). Briefly, digested proteins were reconstituted in 30 µl of dissolution buffer (0.5 M TEAB) and mixed with 70 µl of ethanol-suspended iTRAQ reagents (one iTRAQ reporter tag per protein sample). The samples were labeled with the respective tags as follows: the chHES-3 cell subsets Normal, SIMP and COMP were labeled with the reporter tags 114, 115 and 116, respectively, and hECCs were labeled with reporter tag 117. Labeling reactions were carried out at RT for 60 min before all samples were mixed in a single tube and dried using a Speedvac.

Fractionation of Peptides by Isoelectric Focusing (IEF) on an Immobilized pH Gradient
iTRAQ-labeled tryptic peptide samples were dissolved in 300 µL of 8 M urea and 1% Pharmalyte (Amersham Biosciences). Samples were used to rehydrate IPG strips (pH 3-10, 18 cm long, Amersham Biosciences) for 14 h at 30 V. Peptides were subsequently focused successively for 1 h at 500 V, 1 h at 1000 V, 1 h at 3000 V and 8.5 h at 8000 V to give a total of 68 kV·h on an IPGphor (Amersham Biosciences). The strips were then removed and quickly cut into 36 × 0.5 cm pieces. We performed peptide extractions by incubating the gel pieces in 100 µL of 2% acetonitrile and 0.1% formic acid for 1 h. These fractions were lyophilized in a vacuum concentrator and subjected to C-18 cleanup using a C18 Discovery DSC-18 SPE column (100 mg capacity, Supelco, Sigma-Aldrich). The cleaned fractions were then lyophilized again and stored at -20°C prior to mass spectrometric analysis.

Mass Spectrometric Analysis Using Q-STAR and Data Analysis
Each cleaned-up peptide fraction was resuspended in 20 µl of Buffer A (0.1% formic acid in 2% acetonitrile). Ten microliters of sample was injected into the nano-LC-ESI-MS/MS system for each analysis. Mass spectrometry was performed using a QStar Elite Hybrid ESI Quadrupole time-of-flight tandem mass spectrometer (ESI-Q-TOF-MS/MS, Applied Biosystems, Framingham, MA, USA; MDS-Sciex, Concord, Ontario, Canada) coupled to an online capillary liquid chromatography system (Dionex Ultimate 3000, Amsterdam, The Netherlands). The peptide mixture was separated on a PepMap C-18 RP capillary column (Dionex) at 0.3 µl/min. A 125-min gradient was used, which started with 4% Buffer B (0.1% formic acid in 98% acetonitrile) and 96% Buffer A for 3 min, followed by 3 ramping gradients of 4-10% Buffer B for 7 min, 10-35% Buffer B for 55 min and 35-100% Buffer B for 25 min. The gradient was then held at 100% Buffer B for 15 min and finally at 96% Buffer A for 20 min.
The mass spectrometer was set to perform data acquisition in the positive ion mode, with a selected mass range of 300-1800 m/z. The time of summation of MS/MS events was set to 2 s. This refers to the amount of time allowed for the instrument to accumulate MS/MS events before switching back to the MS scan. The two most abundant charged peptides above a 20-count threshold were selected for MS/MS and dynamically excluded for 30 s with ±50 mDa mass tolerance. Protein identification and quantification for iTRAQ samples were carried out using ProteinPilot software (version 2.0; Applied Biosystems, MDS-Sciex). Following independent analyses, three datasets from biological replicates were searched as one. The search was performed against the International Protein Index (IPI) human database (version 3.41, date of release: March 2008, 72,155 sequences). We searched the database with cysteine modification by MMTS set as a fixed modification. Other parameters included a mass tolerance of up to 0.2 Da, a maximum of one missed trypsin cleavage, oxidation of methionine, N-terminal iTRAQ labeling and iTRAQ-labeled lysine. Relative quantification of proteins by iTRAQ was performed on the MS/MS scans and was the ratio of the areas under the peaks at 114, 115, 116 and 117 Da, which represented the masses of the tags that corresponded to the iTRAQ reagents used to label the samples. Statistical calculation for iTRAQ-based detection and relative quantification was performed using the Paragon Algorithm19 embedded within the ProteinPilot software.
Following data analysis by the ProteinPilot software, the protein summary results were exported into an Excel spreadsheet and manually inspected and processed. Briefly, for protein identification and quantitative analysis, 95% confidence intervals were used. Protein identification had to be based on at least two unique peptides, and the p-values for the relative quantification by iTRAQ had to be P<0.05. Protein hits that did not satisfy these criteria were removed.
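The acceptance criteria above can be expressed as a simple filter. The following is a minimal sketch only, not the spreadsheet procedure actually used: the record type `ProteinHit`, its field names and the example accessions are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    accession: str        # hypothetical identifier, not a real database entry
    unique_peptides: int  # number of unique peptides supporting the identification
    p_value: float        # p-value of the iTRAQ relative quantification


def passes_criteria(hit, min_peptides=2, alpha=0.05):
    """Keep a hit only if it has >= 2 unique peptides and P < 0.05."""
    return hit.unique_peptides >= min_peptides and hit.p_value < alpha


def filter_hits(hits):
    """Return the hits that satisfy both acceptance criteria."""
    return [h for h in hits if passes_criteria(h)]


# Illustrative records (made-up values)
hits = [
    ProteinHit("PROT_A", 5, 0.01),   # kept
    ProteinHit("PROT_B", 1, 0.001),  # removed: only one unique peptide
    ProteinHit("PROT_C", 3, 0.20),   # removed: P >= 0.05
]
kept = filter_hits(hits)
```

Both conditions must hold at once, so a protein with a very small p-value is still discarded when it rests on a single peptide.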

PANTHER Analysis
The PANTHER database was used to elucidate the molecular function, biological process and signaling pathways associated with each individual protein (http://panther.appliedbiosystems.com/).

Reanalysis of Microarray Data
Microarray data of normal chHES-3 cells and aberrant chHES-3 cells with a simple duplication karyotype (SIMP) or a complex karyotype (COMP) were obtained from PubMed GEO datasets (GSM172579, GSM172580, GSM172581, GSM172582). To compare transcriptomic and proteomic data, all proteins with a SWISS-PROT accession number were selected, and these accessions were used to identify the relevant probeset identifiers from the microarray dataset. Protein and mRNA data were combined using the Microsoft Access query tool. Data were processed by normalizing expression values to reference data and generating ratios of mRNA data from each replicate of the experiment with the Partek® Genomics Suite™ software. Ratios were averaged in Excel using the GeoMean (geometric mean) function, and Student's t-test was performed to obtain P values. Hierarchical cluster analysis was performed with Cluster 3.0 software.
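The choice of the geometric mean for averaging ratios can be illustrated with a short sketch. It uses Python's standard library rather than Excel, and the replicate ratios are made-up values chosen only to show the behaviour.

```python
from statistics import geometric_mean, mean


def average_ratio(replicate_ratios):
    """Average expression ratios with the geometric mean (as Excel's GeoMean does)."""
    return geometric_mean(replicate_ratios)


# Illustrative replicate ratios: one 2-fold increase, one 2-fold decrease, one unchanged
ratios = [2.0, 0.5, 1.0]

geo = average_ratio(ratios)   # fold changes up and down cancel symmetrically
arith = mean(ratios)          # arithmetic mean is biased towards the increase
```

Because ratios are multiplicative, a 2-fold increase and a 2-fold decrease should average to no change; the geometric mean has this property, whereas the arithmetic mean of the same values is greater than 1.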

Real-time Quantitative RT-PCR
Total RNA was extracted using Trizol reagent (Gibco BRL, Grand Island, New York, USA) according to the manufacturer's instructions. Two micrograms of RNA per sample was reverse-transcribed into first-strand cDNA using the A3500 reverse transcription system (Promega, USA) in a standard protocol with random oligo (dT) primers. Real-time PCR amplifications were performed according to the manufacturer's instructions on the Roche LightCycler system (Roche Diagnostics, Mannheim, Germany) with SYBR Green I dye, which binds preferentially to double-stranded DNA and enables real-time detection of PCR products. The cDNA was subjected to real-time PCR using the primer pairs listed in Table S2 (Supporting Information) (Origene, Rockville, MD). Briefly, a 20 µl reaction mixture containing 2 µl of cDNA, 2 µl of FastStart DNA Master SYBR Green I mix (Roche Diagnostics, Mannheim, Germany), 0.5 µl of 10 µmol/L PCR forward primer, 0.5 µl of 10 µmol/L PCR reverse primer, 1 µl of 25 mmol/L MgCl2 and 14 µl of H2O was loaded into glass capillary tubes, and cycling was carried out as follows: 50°C for 2 min and 95°C for 5 min, followed by 40 cycles of 95°C for 30 s, 56°C for 30 s and 72°C for 30 s. After each run, the cycle threshold (CT) values were provided by the LightCycler software. A melting curve analysis was performed to determine the specificity of the amplified products. Analysis of relative gene expression was performed using the 2^-ΔΔCT method as described [27]. Evaluation of 2^-ΔΔCT indicates the fold change in gene expression relative to the internal standard gene 28S and takes into account the standard deviation. Individual CT values were based on three separate measurements. The specificity of the PCR amplification was directly verified by melt-curve analysis of the final products in the iCycler. To verify the melting curve data, all PCR products were verified by DNA sequencing.
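As a worked illustration of the 2^-ΔΔCT calculation, the sketch below computes a fold change from triplicate CT values. The CT values are invented for the example, the function name is hypothetical, and the calculation assumes the usual premise of the method that target and reference genes amplify with comparable, near-100% efficiency.

```python
from statistics import mean


def fold_change_ddct(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by 2^-ddCT.

    dCT  = CT(target) - CT(reference gene), computed per condition
    ddCT = dCT(sample) - dCT(control)
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Made-up triplicate CT values; the reference gene stands in for 28S
target_sample = mean([24.0, 24.5, 23.5])    # target gene, aberrant cells
ref_sample = mean([12.0, 12.0, 12.0])       # reference gene, aberrant cells
target_control = mean([26.0, 26.0, 26.0])   # target gene, normal cells
ref_control = mean([12.0, 12.0, 12.0])      # reference gene, normal cells

fc = fold_change_ddct(target_sample, ref_sample, target_control, ref_control)
```

In this example the target crosses threshold two cycles earlier in the sample than in the control after normalization, which corresponds to a 4-fold higher expression.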

Western Blot Analysis
Western blot analyses were performed as described previously [28]. The cells were harvested from flasks, washed twice with cold PBS and lysed in a lysis buffer (50 mmol/L Tris, pH 7.4, 100 mmol/L NaCl, 1 mmol/L MgCl2, 2.5 mmol/L Na3VO4, 1 mmol/L PMSF, 2.5 mmol/L EDTA, 0.5% Triton X-100, 0.5% NP-40, 5 µg/mL each of aprotinin, pepstatin A and leupeptin) for 60 min on ice, followed by centrifugation at 11,000×g for 15 min at 4°C to remove cell debris. Proteins were then quantified by the Bradford reagent assay (Bio-Rad). After addition of 2× loading buffer, 80 µg of lysate was boiled at 95°C for 5 min and separated on 10% or 12% SDS-PAGE gels. Proteins were subsequently electrotransferred to Hybond-P PVDF membranes. After blocking with 5% nonfat dry milk in TBS-T containing 0.1% Tween-20 for 2 h at room temperature, the membranes were probed with anti-DNMT3B, anti-CTNNB1, anti-HDAC2, anti-VIM, anti-DNMT3A, anti-NES, anti-HSPA1A, anti-HIST1H1B, anti-H3K9ac3, anti-H3ac, anti-H4ac, anti-H4K12ac or anti-β-ACTIN diluted 1:1000-1:2000 overnight at 4°C, followed by incubation in a 1:2000 dilution of secondary antibodies conjugated to horseradish peroxidase for 1 h at room temperature. Antibodies are summarized in Table S1. Protein bands were detected using the ECL detection system, followed by exposure on Hyperfilm (Amersham Biosciences). All Western immunoblots were performed at least three times. In each experiment, membranes were also probed with anti-β-ACTIN antibody to correct for differences in protein loading. The Image J image analysis system was used to analyze the Western blot bands and to calculate their grayscale ratio relative to the expression of β-ACTIN.

Quantification of Gene Copy Number
Genomic DNA was isolated from cells using the Qiagen DNeasy extraction kit (Qiagen). Genomic DNA (50 ng) was amplified using the Roche LightCycler system. Gene copy number was compared by the 2^-ΔΔC(t) method and normalized to the results obtained using the nuclear β-globin gene as an endogenous reference gene [27,29]. Individual CT values were based on three separate measurements. All primer sequences used for qPCR are listed in Table S3. The specificity of each primer pair was examined by melting curve analysis, and PCR products were verified by DNA sequencing.

Spontaneous Differentiation of hES Cells
Spontaneous in vitro differentiation was conducted through embryoid body (EB) formation. Human ES colonies were mechanically dissociated into small clumps and detached to grow as aggregates in suspension, forming embryoid bodies in DFSR medium without bFGF. The medium was changed every 2 days. At day 21, EBs were collected for further analysis.

Statistical Analysis
Independent-sample t-tests between groups were used to evaluate the statistical significance of differences in mean values, using SPSS 18.0 for Windows. Homogeneity of variance was analyzed before the independent-sample t-test, in which the statistical significance level for the two variance estimates was P>0.1. If the variances were not equal, the unequal-variances t-test was used. Statistical significance levels were P<0.05 (denoted as *). All P values were two-tailed.
In this study, iTRAQ was used for multiplexed peptide profiling of normal and karyotypically aberrant chHES-3 cells and EC cells, labeled with four independent reagents of the same mass that, upon fragmentation in MS/MS, gave rise to four unique reporter ions (m/z = 114-117). These were subsequently used to quantify the Normal, SIMP and COMP chHES-3 cells and the EC cells, respectively. Examination of the average values and standard deviations of the data from triplicate experiments revealed that the overall variation was less than 30%. LC-MS/MS analysis of 20 SCX fractions from cell lysates generated a total of 190,569 validated MS/MS spectra. Using a confidence cut-off score, defined as a ProtScore value >1.3 (95% confidence), a total of 2583 proteins were identified from 102,640 distinct peptides. The iTRAQ ratios, their respective statistical values and % coverage for the 2583 proteins detected with 95% confidence are shown in Table S4 (Supporting Information). In total, 518 proteins showed more than 1.3-fold changes (ratio <0.77 or >1.3) in expression levels with P<0.05 in SIMP chHES-3 cells as compared to the normal cell line. By contrast, 220 proteins showed more than 1.3-fold changes with P<0.05 in COMP chHES-3 cells (Figure 1A-a). Due to space constraints, 316 proteins with more than 1.3-fold changes in SIMP or COMP chHES-3 cells with P<0.05 as compared to the normal cell line are shown in Table S5.
By contrast, Table 1 provides a partial list of proteins (top 10) identified in this study that were increased or decreased in SIMP or COMP chHES-3 cells, or in EC cells, when compared with Normal chHES-3 cells. MS/MS and iTRAQ reporter ion spectra of representative peptides from 12 proteins with different expression levels in normal and karyotypically aberrant chHES-3 cells and EC cells are shown in Figure 2. The MS/MS spectra and iTRAQ ratios of peptides from LIN28 homolog (LIN28), Rho-associated protein kinase 2 (ROCK2) and Borealin (CDCA8) showed no significant changes (Figure 2A). The MS/MS spectra and iTRAQ ratios of peptides from developmental pluripotency associated 4 (DPPA4), high mobility group protein B1 (HMGB1) and isoform 1 of PC4 and SFRS1-interacting protein (PSIP1) indicated that these proteins were highly expressed in normal chHES-3 cells (Figure 2B). The MS/MS spectra and iTRAQ ratios of peptides from Thy-1 cell surface antigen (THY1), isoform 2 of podocalyxin-like protein (PODXL) and DNA (cytosine-5)-methyltransferase 3A (DNMT3A) indicated that these proteins were highly expressed in SIMP chHES-3 cells (Figure 2C). Furthermore, the panels of Figure 2D show MS/MS spectra of peptides from DNA (cytosine-5)-methyltransferase 3B (DNMT3B), 14-3-3 protein zeta/delta (YWHAZ) and histone deacetylase 2 (HDAC2), which were all highly expressed in both SIMP and COMP chHES-3 cells. Quantitative data were supported by P values wherever more than two peptides were used for quantitation, and each measurement was done with biological replicates. The error factor, which is similar to the SD and gives a measure of the certainty of the average ratio, and the number of peptides (>95% confidence) used for quantitation were also included. The ProteinPilot software program calculates the error factor as 10^(95% confidence error).
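The threshold rule used to call differential expression can be sketched as follows. This is an illustrative sketch, not the procedure run in ProteinPilot or Excel; the function names are hypothetical, and the lower bound 0.77 is simply the rounded reciprocal of 1.3.

```python
def classify_change(ratio, p_value, up=1.3, down=0.77, alpha=0.05):
    """Call a protein 'up', 'down' or 'unchanged' from its iTRAQ ratio and p-value."""
    if p_value >= alpha:
        return "unchanged"  # not statistically significant
    if ratio > up:
        return "up"
    if ratio < down:
        return "down"
    return "unchanged"      # significant, but below the 1.3-fold threshold


def percent_of_identified(n_changed, n_identified=2583):
    """Share of the identified proteome that was called differential, in percent."""
    return round(100.0 * n_changed / n_identified, 2)


# Illustrative calls (ratios and p-values are made up)
calls = [
    classify_change(1.8, 0.01),  # 'up'
    classify_change(0.5, 0.03),  # 'down'
    classify_change(1.8, 0.20),  # 'unchanged': P >= 0.05
    classify_change(1.1, 0.01),  # 'unchanged': within thresholds
]

simp_share = percent_of_identified(518)  # SIMP vs. Normal
comp_share = percent_of_identified(220)  # COMP vs. Normal
```

With the counts reported above, the 518 and 220 changed proteins correspond to 20.05% and 8.52% of the 2583 identified proteins.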
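The relationship between the error factor and the uncertainty of a ratio can be sketched as below. This is a sketch under stated assumptions: it assumes that the 95% confidence error is expressed in log10-ratio space, as the formula implies, and that the corresponding interval for a ratio runs from ratio/EF to ratio×EF. The function names and numbers are illustrative and are not taken from the software.

```python
def error_factor(conf_error_log10):
    """Error factor EF = 10 ** (95% confidence error), error assumed to be in log10 units."""
    return 10.0 ** conf_error_log10


def ratio_interval(ratio, ef):
    """Assumed 95% interval of an average ratio: (ratio / EF, ratio * EF)."""
    return ratio / ef, ratio * ef


# Illustrative values: a 2-fold ratio with an error factor of 2
low, high = ratio_interval(2.0, 2.0)
```

Under these assumptions an error factor of 1 means no uncertainty, and larger values widen the interval symmetrically on the log scale.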

Comparison between the Proteome and Transcriptome of Normal and Karyotypically Aberrant chHES-3 Cells
Further analysis of these data was performed by comparing the relative quantification of proteins with the microarray data. We correlated the 2583 proteins with probe sets from the microarray analysis. A complete comparison was not possible because the Partek® Genomics Suite™ software (Gene Company Limited) could not match certain protein accession numbers to their associated microarray probe sets with a sufficiently high level of certainty. We thus concentrated the data analysis on cases where co-identity of protein and probe set data was assured (Table S6). Based on an analysis of the matched datasets, we found that the vast majority of identified proteins and mRNAs were unaltered, and that there was a low correlation between the protein and microarray changes seen for each gene product (Figure 1). This was in agreement with several other proteomic analyses in multiple systems, which have concluded that the proteome and transcriptome correlate only weakly, if at all [31][32][33][34][35][36].
Venn diagram illustrations of the data show protein expression level changes by iTRAQ analysis (a) and mRNA level changes by expression chip analysis (b) for SIMP and COMP chHES-3 cells and for EC cells as compared with normal chHES-3 cells (Figure 1A). The SIMP chHES-3 cells (iTRAQ 115), COMP chHES-3 cells (iTRAQ 116) and EC cells (iTRAQ 117), as compared with normal chHES-3 cells (iTRAQ 114), showed 518 (20.05%), 220 (8.52%) and 461 (17.85%) differentially expressed proteins, respectively, with 97 proteins in common (Figure 1A-a, Figure 1B). With regard to the changes in mRNA levels of the 2583 proteins identified by iTRAQ analysis, expression chip analysis showed changes in 429, 666 and 953 probe values in SIMP chHES-3 cells, COMP chHES-3 cells and EC cells, respectively, with 195 probes in common (Figure 1A-b).
However, many changes in protein levels could not be ascribed to altered mRNA expression, based on the lack of correlation in each pair-wise comparison of the protein and mRNA ratios between the normal and aberrant karyotypic hESCs (Figure 1B). The SIMP chHES-3 cells (iTRAQ 115) (Figure 1B-a), as compared with normal chHES-3 cells (iTRAQ 114), showed 210 cases (of a total of 518 protein changes) where the proteins were significantly up-regulated, yet the mRNA levels were not significantly changed. In addition, in 225 cases where the proteins were significantly down-regulated, we did not observe any significantly changed mRNA expression levels as compared to normal chHES-3 cells (iTRAQ 114). Only 38 (12 up-regulated and 26 down-regulated) of the 518 protein changes correlated with similar changes seen in mRNA levels. This trend continued when the COMP chHES-3 cells (iTRAQ 116) (Figure 1B-b) were compared with normal chHES-3 cells (iTRAQ 114), which showed 68 cases (of a total of 220 protein changes) where the proteins were significantly up-regulated, yet the mRNA levels were not significantly changed. In addition, there were 88 cases where the proteins were significantly down-regulated, yet the mRNA levels were not significantly changed. Only 25 (7 up-regulated and 18 down-regulated) of the 220 protein changes were seen at the mRNA level as well. The comparison of EC cells (iTRAQ 117) (Figure 1B-c) with normal chHES-3 cells (iTRAQ 114) showed 130 cases (of a total of 461 protein changes) where the proteins were significantly up-regulated, yet the mRNA levels were not significantly changed. Moreover, 172 cases were identified where the proteins were significantly down-regulated, yet the mRNA levels were not significantly changed. Finally, 114 (36 up-regulated and 78 down-regulated) of the 461 protein changes correlated with the same changes seen in mRNA.
Furthermore, when changes in the mRNA expression levels of the 2583 proteins were considered in the SIMP chHES-3 cells as compared with normal chHES-3 cells, 346 genes (123 up-regulated and 223 down-regulated) of a total of 429 with changes in mRNA expression were identified for which the levels of protein expression were unaltered (Figure 1B-d). As for COMP chHES-3 cells, 602 (205 up-regulated and 397 down-regulated) of a total of 666 genes with mRNA expression changes were identified for which protein expression levels were unaltered as compared with normal chHES-3 cells (Figure 1B-e). Finally, there were 794 genes (263 up-regulated and 531 down-regulated) of a total of 953 in EC cells showing changes in mRNA expression levels for which the protein expression levels were unaltered as compared with normal chHES-3 cells.
However, hierarchical cluster analysis of the data for the 2583 proteins (Figure 3A) from iTRAQ analysis (a) and from the associated microarray probe sets (b) showed that Normal chHES-3 cells and SIMP chHES-3 cells were more closely related to each other than to COMP chHES-3 cells, which were more closely related to EC cells. Consistent with the cluster results for all 2583 proteins, hierarchical cluster analysis of the 316 differentially expressed proteins (Figure 3B) from iTRAQ analysis (a) and from the associated microarray probe sets (b) gave similar results. Nevertheless, studies of protein levels in these cells were also required, since the levels of protein expression do not always directly correlate with the transcriptomic changes (Figure 1A-B).
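The pair-wise comparison of protein and mRNA changes amounts to a cross-tabulation of the two sets of calls. The sketch below shows one way to perform such a tabulation; it is not the Access and Excel workflow used in this study, and the gene names and calls are placeholders.

```python
from collections import Counter


def cross_tabulate(protein_calls, mrna_calls):
    """Count (protein call, mRNA call) pairs for genes present in both datasets."""
    counts = Counter()
    for gene, protein_call in protein_calls.items():
        if gene in mrna_calls:  # only genes with an assured protein/probe-set match
            counts[(protein_call, mrna_calls[gene])] += 1
    return counts


# Placeholder calls relative to normal cells: 'up', 'down' or 'unchanged'
protein_calls = {"GENE_A": "up", "GENE_B": "up", "GENE_C": "down", "GENE_D": "down"}
mrna_calls = {"GENE_A": "up", "GENE_B": "unchanged", "GENE_C": "unchanged", "GENE_D": "down"}

table = cross_tabulate(protein_calls, mrna_calls)

# Concordant changes: protein and mRNA move in the same direction
concordant = table[("up", "up")] + table[("down", "down")]
# Protein-only changes: protein altered while mRNA is not significantly changed
protein_only = table[("up", "unchanged")] + table[("down", "unchanged")]
```

In this tabulation, the cases reported above as "protein changed, mRNA not significantly changed" are the ("up", "unchanged") and ("down", "unchanged") cells, and the correlated cases are the two diagonal cells.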

Categorization and Functional Annotation Analysis of Proteins Quantitated in Normal and Aberrant chHES-3 Cells
Functional annotations of the combined lists of proteins from all three experiments are shown in Figure 4. These 316 proteins, which were differentially expressed between normal and aberrant chHES-3 cells, could be classified into 11 functional categories using the PANTHER classification system (http://www. pantherdb.org). The top three molecular function categories were binding (GO:0005488)(39.8%), catalytic activity (GO:0003824) (31.0%), and structural molecule activity (GO:0005198) (11.4%).
Futhermore, significantly over-and under-represented GO biological process and protein class terms for the set of differentially expressed proteins were represented as follows: the top three biological process categories were metabolic process (29.0%), cellular process (16.5%) and cell communication (8.0%). Examples of some early developmental markers high in normal karyotypic hESCs included NES (Nestin), GRB2 (Isoform 1 of Growth factor receptor-bound protein 2), RBM14 (Isoform 1 of RNA-binding protein 14) and HNRNPAB (Heterogeneous nuclear ribonucleoprotein A/B). Several proteins related to hyper-proliferation and suppression of apoptosis such as HDAC2 and API5 (Apoptosis inhibitor 5) were highly expressed in abnormal hESCs. However, proteins that were associated with apoptosis such as HNRNPK (Heterogeneous nuclear ribonucleoprotein K), DFFA (DNA fragmentation factor subunit alpha) and PRKDC (DNA-dependent protein kinase catalytic subunit) were down-regulated in the karyotypically abnormal hESCs.
We also investigated relative mRNA expression levels. Those of EIF3D (eukaryotic translation initiation factor 3, subunit D), HIST1H1B (Histone H1.5) and HNRNPD (AU-rich element RNA binding protein 1, P37 kDa) were consistent with the proteomic analysis, although these genes gave no effective signal in the expression chip data (Figure 5B). Furthermore, some genes, such as DNMT3A, CTNNB1 (β-catenin), HDAC2, SMAD2, NES, and HSPA1A (heat shock 70 kDa protein 1A), showed no significant changes in mRNA expression levels, whereas the proteomic analysis showed that the corresponding proteins were differentially expressed in karyotypically aberrant chHES-3 cells as compared with normal chHES-3 cells (Figure 5B).
To further verify the proteomic data, we studied the expression levels of 8 proteins by Western blot analysis. Of these, DNMT3B, CTNNB1, HDAC2, VIM, and DNMT3A were upregulated in karyotypically aberrant chHES-3 cells or EC cells, whereas NES, HSPA1A and HIST1H1B were downregulated. The loading control, β-ACTIN, showed no obvious changes (Figure 6A). Among these, enhanced expression of HDAC2 protein was observed in karyotypically aberrant chHES-3 and EC cells. HDAC2 is a member of the histone deacetylase family and plays an important role as a transcriptional repressor [42], [43] by interacting with sumoylated substrates, and thus could be associated with decreased histone acetylation. We therefore examined histone acetylation and found decreased acetylation of H3, H4 and H3K9, detected with antibodies recognizing acetylated histone H3 (H3ac), acetylated histone H4 (H4ac) and acetylated histone H3K9 (H3K9ac3), with no obvious change in acetylated histone H4K12 (H4K12ac) (Figure 6B). The balance of histone acetylation is important for proper cellular function, and such changes might contribute to the malignant transformation of karyotypically aberrant chHES-3 and EC cells.

Copy Number Variations and Functional Analysis of Differentially Expressed Proteins Including DNMT3B, VIM, CTNNB1 and HDAC2
This large-scale quantitative proteomic analysis of normal and aberrant karyotypic hESCs should serve as a useful reference set for understanding the malignant transformation of aberrant karyotypic hESCs, a process closely related to changes in DNA methylation, histone acetylation, cell cycle and apoptosis, WNT pathways, and other systems. To verify the iTRAQ data, and to study whether the proteins differentially expressed during malignant transformation were also vulnerable to change during prolonged culture under optimal culture conditions [44], Western blot analysis was performed on six pairs of hESC lines at early and late passages (Figure 7), all of which showed a normal karyotype (data not shown). Western blot analyses showed that both DNMT3B and VIM increased significantly in late passages, with 2.1- and 1.7-fold changes, respectively, as compared with early-passage hESCs (P<0.01). By contrast, the protein expression of both CTNNB1 and HDAC2 displayed no obvious changes, which was consistent with the absence of obvious change in the levels of histone acetylation, including H3, H4 and H3K9. Furthermore, OCT4 and β-ACTIN were detected as a loading control, and showed no obvious change in expression (Figure 7, Figure S2).
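Fold changes of this kind are commonly derived from band intensities normalized to the loading control. The following is a minimal sketch of that arithmetic, assuming densitometry readings in arbitrary units; the function name and the numbers are hypothetical and are not taken from the blots in this study.

```python
def normalized_fold_change(target_early, target_late, loading_early, loading_late):
    """Late/early ratio of a target band after dividing each by its loading control."""
    early = target_early / loading_early
    late = target_late / loading_late
    return late / early


# Hypothetical densitometry readings (arbitrary units), for illustration only.
fold_change = normalized_fold_change(
    target_early=800.0, target_late=1200.0,
    loading_early=4000.0, loading_late=5000.0,
)
raw_ratio = 1200.0 / 800.0
print(f"raw ratio {raw_ratio:.2f}, normalized fold change {fold_change:.2f}")
```

In this made-up case the raw late/early ratio overstates the change because more total protein was loaded in the late-passage lane; dividing by the loading control corrects for that.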
Gene amplification is a common mechanism for over-expression of oncogenes in cancers. We therefore examined copy number variation of the genes with up-regulated protein expression, including DNMT3B, VIM, CTNNB1 and HDAC2, in chHES-3 cells and in the six pairs of hESC samples at different passages mentioned above. Quantitative PCR did not detect copy number gain or loss in the amplified gene regions of DNMT3B, VIM, CTNNB1 and HDAC2, either in late passages of all six hESC samples or in the normal and aberrant karyotypic chHES-3 cells (Figure 8). However, we detected increased mRNA expression of DNMT3B and VIM in those hESC lines (Figure 9) that showed upregulation of the corresponding protein in late passages (Figure 7 and Figure S2). These results indicated that the increased protein expression of DNMT3B and VIM in late-passage cells might result from transcriptional regulation and epigenetic mechanisms, but not from copy number amplification.
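One common way to estimate copy number from quantitative PCR is the comparative Ct calculation, sketched below. The text does not state which calculation was used, so this is an assumption: the sketch assumes 100% amplification efficiency, a reference locus with two copies, and a diploid calibrator sample, and the gain/loss thresholds and Ct values are hypothetical.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_calibrator, ct_ref_calibrator,
                         calibrator_copies=2):
    """Estimate copy number with the comparative Ct (2^-ddCt) calculation."""
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    ddct = delta_sample - delta_calibrator
    return calibrator_copies * 2 ** (-ddct)


def call_copy_number(copies, loss_below=1.5, gain_above=2.5):
    """Classify an estimate; the cut-offs are illustrative, not from the study."""
    if copies < loss_below:
        return "loss"
    if copies > gain_above:
        return "gain"
    return "no change"


# Hypothetical Ct values: (target, reference) in the sample and the calibrator.
unchanged = relative_copy_number(25.0, 20.0, 25.0, 20.0)
amplified = relative_copy_number(24.0, 20.0, 25.0, 20.0)
print(unchanged, call_copy_number(unchanged))
print(amplified, call_copy_number(amplified))
```

A target that reaches threshold one cycle earlier than in the calibrator, relative to the reference locus, is estimated at twice the calibrator copy number under these assumptions.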
Several studies have suggested that the maintenance of multipotential differentiation ability in embryonic stem cells is controlled by epigenetic mechanisms such as DNA methylation and histone modifications [45]. To determine whether increased expression of genes associated with epigenetic modification, such as DNMT3B, in late passages of hESCs affects their pluripotency, we compared the differentiation capability of early- and late-passage hESCs from three cell lines: chHES-8, chHES-69 and chHES-22. All three cell lines were able to differentiate into the three germ layers in vitro. This is consistent with most published studies, which reported that late-culture hESCs still maintained their multi-lineage differentiation ability [46]. However, the expression levels of 8 marker genes representing different germ layers differed more obviously among cell lines than between early and late passages of the same cell line. chHES-8 and chHES-69 cells showed decreased expression of most of the marker genes in late-passage cells. By contrast, chHES-22 cells showed increased expression of most of the marker genes in late passage. These differences in differentiation pattern did not correlate with the expression of DNMT3B and VIM, which was increased in late-passage chHES-8 and chHES-22 cells, whereas chHES-69 cells showed no clear difference in DNMT3B and VIM (see Figure S3; Supporting Information). Based on the current experiments, we did not find any differentiation defects or bias in late-passage hESCs with increased expression of the biomarker genes.
CTNNB1 (β-catenin) is a major transcriptional modulator of the WNT signal transduction pathway, and showed increased expression in karyotypically abnormal hESCs relative to normal cells. To determine whether elevated CTNNB1 leads to a more activated WNT pathway, we analyzed the expression of MYC and CCND1 (Cyclin D1), downstream genes of CTNNB1, in chHES-3 cells and in six hESC lines at early and late passages. We detected increased expression of these genes in aberrant chHES-3 cells but not in the six pairs of hESC lines, which correlated with the variation in CTNNB1 protein expression in these samples (Figure 10). This indicated that CTNNB1 functioned to modulate the activity of the WNT signaling pathway, and implicated it in stem cell-derived tumor initiation and progression.

Discussion
Quantitative proteomics has been shown to be a useful technique for studying the molecular mechanisms in different stages of disease. Isobaric tags for relative and absolute quantification (iTRAQ) analysis is currently one of the most widely used approaches for high-throughput protein quantitation, and enables simultaneous quantitation of up to 8 different biological samples [47], [48]. The aim of this iTRAQ proteomic study was to gain insight into the molecular mechanisms of the transition of normal embryonic stem cells and to provide a screening tool to detect abnormal cells, which would be essential for developing clinical therapies.
Here, we have identified and relatively quantified 2583 proteins with 95% confidence (Table S4) in chHES-3 cell lines of progressive karyotypic change. Hierarchical cluster analysis based on quantified proteins and associated microarray probes demonstrated that chHES-3 cells showed a tendency towards malignant transformation from normal chHES-3 cells to COMP chHES-3 cells, which were closely related to EC cells (Figure 3). This further supported the view that our model successfully simulated the tumorigenic process.
Many factors expressed in early development and those associated with pluripotency were identified, which provides a model system to distinguish pluripotency and oncogenesis. chHES-    Figure S1), which were expressed at such low abundance as to be still detected. However, other more abundant members were expressed in all of the hESCs and hECCs. Several of these are known markers of undifferentiated hESCs that have been previously reported [18]. Microarray analysis is widely used to detect gene expression variations in pre-neoplastic stages and to assist in the identification of reliable markers of tumorigenesis at the mRNA level. However, only a few comparisons have been made between the transcriptome and the associated levels of protein expression [8], [17], [18]. This comparative analysis demonstrates that changes at the mRNA level cannot be assumed to produce concordant changes in protein expression, and it underlines the importance of investigating differences in protein expression directly. Here, hESCs at a specific stage of transformation were collected for both mRNA and proteomic analyses. There was poor statistical agreement between mRNA and protein expression changes for a variety of gene products in karyotypically abnormal hESCs (Figure 1). Modulation of transcription does not directly govern the levels of many proteins, which was also confirmed by the inconsistent results of real-time quantitative RT-PCR validation of differentially expressed proteins in this study (Figure 5B).
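Agreement between mRNA and protein changes of the kind discussed above is often summarized with a rank correlation across gene products. The sketch below computes a Spearman coefficient from paired log2 ratios; the choice of Spearman correlation, the function names and the values are assumptions for illustration, not the statistic or data behind Figure 1.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired log2 ratios (aberrant vs normal) for five gene products.
mrna_log2 = [0.1, 0.0, -0.2, 0.3, 0.05]
protein_log2 = [1.2, -0.8, 0.9, -1.1, 0.4]
rho = spearman(mrna_log2, protein_log2)
print(f"Spearman rho = {rho:.2f}")
```

A coefficient near zero or below, as in this made-up example, would indicate that the direction and size of mRNA changes say little about the corresponding protein changes.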
We developed an interaction network of proteins identified as changing in expression during transformation of hESCs. DNA methylation and histone deacetylation are key factors in the regulation of transcription. Deregulation of epigenetic information in pluripotent cells may also alter the defining properties of stem cells, namely their self-renewal and their differentiation potential, leading to the initiation of cancer. Of particular interest was HDAC2, which belongs to the histone deacetylase family and showed increased expression in karyotypically abnormal hESCs and in EC cells, accompanied by decreased acetylation of histones H3, H4 and H3K9 (Figure 6). It is well established that modifications of histones regulate the architecture of chromatin, which is an important factor in regulating gene expression. HDAC2 can interact with HDAC1, and together they form the catalytic core of a number of complexes, which are targeted to chromatin by sequence-specific transcription factors and repress transcription in cooperation with other chromatin modifiers. HDAC1/2 could directly mediate repressive functions of a number of well-characterized cellular oncogenes and tumor-suppressor genes, leading to aberrant gene expression [42], [49].
Aberrant regulation of HDAC2 has been reported to play a pivotal role in the generation and development of many types of cancer, including hepatocellular carcinoma [50], ovarian cancer [37], gastric cancer [51] and others. Giudice [52] reported that chemical inhibition of HDACs reduces the number of cancer stem cells (CSCs) and inhibits clonogenic sphere formation. The finding that ablation of the HDAC family in many tumor cell lines led to severe proliferation defects or enhanced apoptosis also suggests that it might be a target for cancer therapy.
We further compared the differentially expressed proteins in our early- and late-passage hESCs under optimized culture conditions [44], since we wished to exclude the influence of other changes associated with in vitro culture adaptation. Our data showed that HDAC2 expression was relatively stable in long-term in vitro cultures of normal hESCs (Figure 7), but increased during tumorigenesis, followed by increased histone deacetylation (Figure 6). These observations supported the idea that enhanced HDAC2 expression may be associated with malignant transformation by regulating the architecture of chromatin. Thus HDAC2 could serve as a potential marker for abnormal hESCs with a tendency to progress to the malignant state.
Our data also showed increased levels of proteins associated with DNA methylation, including DNMT3A, DNMT3B, DNMT1 and KPNB1. These proteins might contribute to the increased methylation of CpG islands and silencing of affected target genes, which are frequently found in human cancer [53]. Although DNMT3B, much like HDAC2, increased in karyotypically abnormal hESCs, DNMT3B expression was also enhanced in long-term in vitro culture under optimal conditions (Figure 7), which was consistent with its mRNA expression levels (Figure 9). Recent studies have also shown that DNA methyltransferases, including DNMT3B, were correlated with HDAC1 and HDAC2 and were involved in epigenetic regulation by silencing transcription [54] and promoting cell proliferation and tumorigenesis [29]. DNMT3B can also interact with HDAC1/2 and other components of the epigenetic machinery to establish the chromatin environment [55]. Peter Andrews reported that hESCs might undergo culture adaptation in long-term in vitro culture and that variations in gene expression might reflect the aberrant karyotype of the cells or might result from karyotypically silent epigenetic changes, implying that adaptation reflects an alteration in the balance between self-renewal and differentiation [56]. Long-term in vitro cultured hESCs show a high risk of genomic instability due to culture conditions. We therefore optimized our culture system, including the preparation of feeder cells by irradiation, control of the density of feeder cells and passage by manual cutting [44]. In optimized conditions, hESCs show a stable normal karyotype, even after more than two years of in vitro culture. The increased expression of DNMT3B in culture suggested that it might be relevant to culture adaptation and reflect the progressive adaptation of self-renewing cells to their culture conditions.
However, the possibility of tumorigenesis cannot be excluded, and further research is required to empirically demonstrate or disprove this.
Moreover, aberrant expression of CTNNB1, which is an epigenetically modified protein, induces malignant pathways in normal cells, and abnormal activity of CTNNB1 also exists in malignant progression [57]. In stem cells, CTNNB1 might serve as a multifunctional protein with a central role in stem cell renewal and differentiation [58], [59]. Recent studies have shown that CTNNB1 could regulate Tert expression through interaction with Klf4, and thereby telomere length, which could be critical in human cancer [60]. Here, enhanced expression of CTNNB1 and increased expression of its target genes, accompanied by more serious transformation in hESCs, implied a role in stem cell-derived tumor initiation and progression (Figure 6 and Figure 10). Clearly, further work is needed to fully understand its function in the malignant transformation of stem cells. VIM is one of the mammalian intermediate filament proteins and a feature of proliferating fetal cells. VIM increased in karyotypically abnormal hESCs, and also showed an increased protein level along with increased gene expression (Figure 9) even under optimal culture conditions. This suggested that vimentin might be relevant for cellular adaptation to culture conditions and might reflect the progressive adaptation of self-renewing cells to their culture conditions.

Conclusions
We have shown for the first time the use of iTRAQ-based tandem mass spectrometry to quantify proteins of normal and aberrant karyotypic hESCs from simple to more complex karyotypic abnormalities, with the purpose of elucidating the dynamics of the malignant transformation process seen in hESCs. This study should serve as a useful resource for the discovery of biomarkers of the transformed phenotype, for further monitoring the safety of the clinical use of hESCs. Increased expression of HDAC2 and CTNNB1 might serve as potential prognostic markers in the malignant transformation of hESCs and could be detected as early as the pre-neoplastic stage.

This complex rearrangement contains a reciprocal translocation between chromosomes 1, 6 and 4, as well as an insertion of segment 1q21q25 into band 4p16; a derivative chromosome 2 resulting from two reciprocal translocations, one between chromosomes 2 and 7 and the other between the same chromosome 7 and chromosome 8; an inversion of chromosome 10; and a derivative chromosome 15 resulting from a reciprocal translocation between chromosomes 4 and 15. This is seen as 46,XX,dup(1)(p32p36)t(1;6;4)(q25;q23;p16)ins(4;1)(p16;q21q25),der(2)t(2;7)(q35;qter)t(7;8)(q22;q22),inv(10)(p11q21),der(15)