Analysis of Human Blood Plasma Proteome from Ten Healthy Volunteers from Indian Population

Analysis of any mammalian plasma proteome is a challenge, particularly by mass spectrometry, because albumin and other abundant proteins can mask the detection of low-abundance proteins. As detection of human plasma proteins is valuable in diagnostics, workflows with minimal fractionation prior to mass spectral analysis need to be explored in order to study population diversity in large cohorts of samples. Here, we used a 'reference plasma sample', a pool of plasma from 10 healthy individuals from the Indian population in the age group of 25–60 yrs, including 5 males and 5 females. The 14 most abundant proteins were immunodepleted from plasma, and the depleted sample was then evaluated by three different workflows for proteome analysis using a nanoflow reverse phase liquid chromatography system coupled to an LTQ Orbitrap Velos mass spectrometer. The analysis of the reference plasma sample a) without prefractionation, b) after prefractionation at peptide level by strong cation exchange chromatography and c) after prefractionation at protein level by sodium dodecyl sulfate polyacrylamide gel electrophoresis led to the identification of 194, 251 and 342 proteins respectively. Together, a comprehensive dataset of 517 unique proteins was obtained from the three workflows, including 271 proteins identified with high confidence, i.e. by ≥2 unique peptides in any of the workflows or by a single peptide in any two of the workflows. A total of 70 proteins were common to all three workflows. Some of the proteins were unique to our study and could be specific to the Indian population. The high-confidence dataset obtained from our study may be useful for studying population diversity and in the discovery and validation process for biomarker identification.


Introduction
Determination of the protein constituents of human plasma has been an active area of research for several years [1]. The number of proteins that could be documented depended heavily on the sensitivity of the available detection methods. The list of abundant proteins in the plasma, along with their concentrations, was documented well before mass spectral methods were deployed [2]. The interest in the protein composition of human plasma has largely stemmed from its relevance in clinical diagnostics [2][3][4]. Mass spectral methods became popular in the analysis of plasma as it became increasingly possible to detect very low amounts of peptides and proteins [5][6][7]. There have been international collaborative efforts to examine data from different mass spectral instruments and workflows and to evolve criteria for arriving at a definitive list of proteins present in human plasma [8,9]. Anderson et al merged data from four studies reporting in-depth human plasma proteome analysis, including three published experimental datasets obtained by proteomics approaches based on different methodologies and a fourth dataset drawn from individual published reports on serum or plasma. They reported a non-redundant list of 1,175 gene products, of which 195 proteins appeared in more than one dataset [8]. Another study, based on the separation of proteins largely by gel electrophoresis and off-gel electrophoresis, followed by tryptic digestion and analysis using linear ion trap-Orbitrap (LTQ-Orbitrap) and linear quadrupole ion-trap-Fourier transform mass spectrometers, identified a set of 697 proteins with high confidence in human plasma [10]. Earlier, mass spectral data were analyzed with an improved algorithm, and approximately 1200 proteins were listed as present in the plasma [11].
Population proteomics is a recent and still emerging concept. There have been attempts to investigate protein diversity in human populations, and population-specific modifications/changes in proteins have been documented [12][13][14]. However, population-specific plasma proteomics has not been investigated as extensively as genomic analysis of populations. Standard workflows involve extensive pre-fractionation, which is one of the important limitations on analyzing a larger number of samples to study population diversity or any disease condition in a larger cohort. Hence, in the current study, we have analyzed the plasma proteome of the Indian population using strategies that do not involve extensive fractionation. Here, a 'reference plasma sample', a pool of plasma from 10 healthy individuals, was used for the study. The sample was immunodepleted of the 14 most abundant proteins, followed by evaluation of three different workflows with minimal pre-fractionation. These include analysis after a) no prefractionation, b) prefractionation at peptide level by strong cation exchange (SCX) chromatography and c) prefractionation at protein level by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), followed by nanoscale reverse phase liquid chromatography tandem mass spectrometry (nano-RP-LC-MS/MS).

Sample collection
The Human Ethics Committee at the Centre for Cellular and Molecular Biology (CCMB), Hyderabad, India approved the study. All the blood samples were collected at the dispensary of CCMB, Hyderabad, India from healthy individuals after written informed consent. Blood was collected in EDTA-coated vacutainers from 10 healthy individuals (5 male and 5 female) of Indian origin in the age group of 25-60 yrs. The samples were centrifuged at 1500 × g for 20 min to separate plasma. Equal volumes of plasma from each individual were pooled to get the 'reference plasma sample'. The sample was aliquoted and stored at −80 °C until used for further analysis.

Immunodepletion
The reference plasma sample was immunodepleted using a MARS Hu-14 column (4.6 × 100 mm) on an Agilent HPLC-1100 series system as per the manufacturer's instructions. The Hu-14 column removes albumin, IgG, antitrypsin, IgA, transferrin, haptoglobin, fibrinogen, alpha2-macroglobulin, alpha1-acid glycoprotein, IgM, apolipoprotein AI, apolipoprotein AII, complement C3, and transthyretin. The flowthrough fraction was collected (Figure S1A) and desalted using 5 kDa cutoff spin filters (Agilent Technologies, Santa Clara, CA, USA). Consistency of immunodepletion was confirmed by SDS-PAGE analysis (Figure S1B).

Figure 1. Experimental overview and bioinformatic analysis to study plasma proteome. Reference plasma sample was prepared by pooling equal volumes of plasma from 10 healthy individuals of either sex in the age group of 25-60 years. The sample was immunodepleted of 14 abundant proteins and analyzed using three different workflows: a) no prefractionation, b) prefractionation at peptide level by strong cation exchange (SCX) chromatography and c) prefractionation at protein level by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), using a nano LC-MS/MS approach for in-depth human plasma proteome analysis. The analysis led to the identification of a total of 517 unique proteins from all the three workflows. A total of 271 proteins were identified with high confidence, i.e. identified with ≥2 unique peptides or by a single peptide identified in any of the two workflows. doi:10.1371/journal.pone.0072584.g001

Trypsin digestion
Reference plasma protein sample (500 µg), obtained after immunodepletion, was reduced using dithiothreitol (10 mM DTT) and alkylated using iodoacetamide (50 mM IAA), followed by in-solution digestion with trypsin (Promega, Madison, WI, USA) for 16 h at 37 °C. The digest was then lyophilized and stored at −80 °C until used. These samples were further analyzed by two of the workflows: a) no prefractionation and b) prefractionation by strong cation exchange (SCX) chromatography. The third workflow involved prefractionation of proteins by SDS-PAGE followed by in-gel trypsin digestion. Proteins separated by SDS-PAGE were subjected to destaining followed by in-gel digestion with trypsin (1:100 in 25 mM ammonium bicarbonate) (Promega, Madison, WI, USA) for 16 h at 37 °C. The peptides were extracted and the digest was lyophilized and stored at −80 °C until used.

Workflows
a) No prefractionation. 20 µg of tryptic digest was desalted using a C18 cartridge (Pierce, Rockford, USA) as per the manufacturer's instructions and 2 µg of the digest was used further for LC-MS/MS analysis.
b) Prefractionation at peptide level by SCX chromatographic separation. Tryptic digest from a total of 320 µg protein was resuspended in 1 ml of buffer A [10 mM KH2PO4, 25% (v/v) acetonitrile (ACN), pH 2.9] and separated on an SCX column (Zorbax 300-SCX, 5 µm, 4.5 mm ID × 100 mm, Agilent Technologies, Santa Clara, CA, USA) on an Agilent 1100 series HPLC (Figure S2A). The conditions for SCX fractionation were: flow rate 700 µl/min and a 40 min gradient: 5 min, 0-5% buffer B (buffer A + 1 M KCl); 5 min, 5-10%; 5 min, 10-23%; 5 min, 23-50%; 10 min, 50-100%; 10 min, 100% B. One-minute fractions were collected, vacuum-dried and desalted using a C18 cartridge (Pierce, Rockford, USA). After desalting, consecutive fractions were pooled to get six fractions with comparable peptide quantities, approximated from the SCX chromatograms, and were subjected to LC-MS/MS analysis.
c) Prefractionation at protein level by SDS-PAGE. The Hu-14 immunodepleted and desalted reference plasma sample was fractionated at protein level by 1D-SDS-PAGE. 10 µg of the protein was run on an 11 cm, 4-20% Tris-glycine gradient gel (Invitrogen, Carlsbad, CA, USA) in duplicate for half an hour to get a partially run gel. A total of six bands were excised from the Coomassie-stained gel and cut into small pieces (Figure S2B). The gel pieces were destained with 25 mM ammonium bicarbonate and 50% acetonitrile (ACN) and dehydrated in 100% ACN, followed by drying in a speed vacuum concentrator. In-gel trypsin digestion was performed by rehydrating the dried gel pieces with modified sequencing grade trypsin (0.25 µg; Promega, Madison, WI, USA) for 16 h at 37 °C. Peptides were extracted with 0.3% trifluoroacetic acid (TFA) in 50% acetonitrile (ACN), followed by drying in a speed vacuum concentrator. The samples were desalted using Pepclean C18 cartridges (Pierce, Rockford, USA), dried and subjected to LC-MS/MS analysis.
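The SCX gradient in workflow b) is a piecewise-linear program, and a short sketch can make its timing explicit. The Python below is only an illustration, not part of the published method: it assumes the first step runs from 0 to 5% buffer B and that the flow rate is in microlitres per minute, and the names `SEGMENTS` and `percent_b` are hypothetical.

```python
# Gradient segments as (duration in min, start %B, end %B).
# Assumption: the first step is read as 0-5% buffer B.
SEGMENTS = [
    (5, 0, 5),
    (5, 5, 10),
    (5, 10, 23),
    (5, 23, 50),
    (10, 50, 100),
    (10, 100, 100),
]
FLOW_UL_PER_MIN = 700  # assumption: flow rate in microlitres per minute


def percent_b(t_min):
    """Return %B at time t_min by linear interpolation within a segment."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in SEGMENTS:
        if t_min <= elapsed + duration:
            fraction_done = (t_min - elapsed) / duration
            return start + fraction_done * (end - start)
        elapsed += duration
    raise ValueError("time is beyond the end of the gradient")


# Total run time, and the volume of each one-minute fraction.
total_minutes = sum(duration for duration, _, _ in SEGMENTS)
fraction_volume_ul = FLOW_UL_PER_MIN * 1

print(total_minutes, fraction_volume_ul, percent_b(12.5))
```

Under these assumptions the segment durations add up to the stated 40 min, so collecting one-minute fractions would give 40 fractions before pooling into six.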

LC-MS/MS analysis
Tryptic digests of samples with no prefractionation, fractionation after SCX (6 fractions) and fractionation after SDS-PAGE (6 fractions) were reconstituted in 0.1% formic acid in 5% acetonitrile (ACN). The samples were analyzed by nano-RP-LC-MS/MS using a nano LC (EASY nLC Proxeon, now Thermo Scientific) connected to an LTQ-Velos Orbitrap (Thermo Scientific, Bremen, Germany). A reversed-phase BioBasic C-18 analytical column (5 µm particle size, 300 Å pore size, 75 µm × 10 cm) (Thermo Scientific, Bremen, Germany) with picofrit was used for separation of peptide samples. Tryptic peptides were eluted at a [...] The voltage applied for ionization was 1.7 kV. The precursor ion MS spectra were acquired in the Orbitrap with a resolution of 60,000 at m/z = 400 (mass range 400-2000) with 1 × 10^6 accumulated ions. MS/MS was performed for the twenty most intense precursor ions from each MS scan. The peptides were fragmented in the linear ion-trap by collision-induced dissociation with 35% collision energy, and the resulting fragment ions were detected at a mass resolution of 15,000 (at m/z 400). Data were acquired using Xcalibur software version 2.1 in data dependent mode. The lock mass option was enabled for accurate measurement in both MS and MS/MS modes. The polydimethylcyclosiloxane ion generated during electrospray from ambient air (m/z 445.120025) was used for internal calibration in real time [10,15].
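Lock-mass calibration and mass tolerances are both expressed through relative mass error, so a brief sketch of that arithmetic may help. The Python below is illustrative only: the function names are hypothetical, the 10 ppm default is borrowed from the precursor tolerance reported in the search settings, and the peak-width helper uses the standard definition of resolving power, R = (m/z) / FWHM.

```python
LOCK_MASS_MZ = 445.120025  # polydimethylcyclosiloxane ion used as lock mass


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the absolute mass error is within tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def fwhm_at(mz, resolution):
    """Peak width (FWHM) implied by a resolving power R = mz / FWHM."""
    return mz / resolution


# A hypothetical reading of the lock-mass ion that is 5 ppm too high.
observed = LOCK_MASS_MZ * (1 + 5e-6)
print(round(ppm_error(observed, LOCK_MASS_MZ), 3))
print(within_tolerance(observed, LOCK_MASS_MZ))
print(fwhm_at(400, 60000))
```

At a resolution of 60,000 at m/z 400, the implied peak width is roughly 0.0067 m/z units, which is what makes a ppm-level precursor tolerance meaningful.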

Bioinformatic Analysis and Protein Identifications
The RAW files were analyzed using both the Sequest and Mascot search engines of Proteome Discoverer (Thermo Scientific, Version 1.2) against the IPI (International Protein Index) database version 3.75. The MS/MS search criteria were as follows: mass tolerance of 10 ppm for MS and 0.25 Da for MS/MS mode, trypsin as the enzyme with 1 missed cleavage allowed, and carbamidomethylation of cysteine and methionine oxidation as static and dynamic modifications respectively. For Sequest search analysis, Xcorr was set at 1.9 (1+), 2.2 (2+), and 2.3 (3+), and for Mascot search the cut-off score was set at ≥30. High confidence peptides were used for protein identifications by setting a target false discovery rate (FDR) threshold of 1% at the peptide level. Only unique peptides with high confidence and rank 1 were used for protein identifications. Proteins identified with ≥2 peptides in any workflow or by a single peptide identified in any of the two workflows were considered to be identified with high confidence. Biological function and localization of proteins were obtained from the Gene Ontology database (http://www.geneontology.org). Mass spectrometry raw data are available from PG and RN.
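The high-confidence criterion above can be written as a small rule. The sketch below is one reading of it, not the authors' implementation: it assumes "a single peptide identified in any of the two workflows" means the protein was seen in at least two of the three workflows, and the function name, workflow keys and protein labels are all hypothetical.

```python
def is_high_confidence(unique_peptides_by_workflow):
    """Apply the high-confidence rule to one protein.

    unique_peptides_by_workflow maps a workflow name to the number of
    unique, rank 1, high-confidence peptides found for the protein there.
    """
    counts = [n for n in unique_peptides_by_workflow.values() if n > 0]
    has_multi_peptide_hit = any(n >= 2 for n in counts)
    # Assumption: a single-peptide hit counts only if it recurs in
    # at least two workflows.
    seen_in_two_workflows = len(counts) >= 2
    return has_multi_peptide_hit or seen_in_two_workflows


# Hypothetical proteins used purely for illustration.
examples = {
    "protein_A": {"none": 3, "scx": 0, "sds_page": 0},  # >=2 peptides once
    "protein_B": {"none": 1, "scx": 1, "sds_page": 0},  # 1 peptide, twice
    "protein_C": {"none": 0, "scx": 1, "sds_page": 0},  # 1 peptide, once
}
for name, counts in examples.items():
    print(name, is_high_confidence(counts))
```

Proteins that fail the rule, such as `protein_C` here, correspond to the single-peptide identifications that the study keeps in its comprehensive list but not in its high-confidence set.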

Results and Discussion
The current study has attempted to evaluate workflows involving minimal prefractionation that may be employed for studying population proteomics. As an initial effort, we used a 'reference plasma sample', a pool of plasma from 10 healthy individuals of both sexes and different ages, followed by immunodepletion of the 14 most abundant proteins to improve the identification of low-abundance proteins. The bound and flowthrough fractions were separated clearly (Figure S1A). We explored three different workflows and analyzed the plasma sample after a) no prefractionation, b) prefractionation of the tryptic peptides using SCX chromatography and c) prefractionation at the protein level by SDS-PAGE. Mass spectral data obtained from each workflow were analyzed separately using the Sequest or Mascot search node. The individual data files from analysis of SCX fractions (6 fractions) or SDS-PAGE bands (6 bands) from the Sequest or Mascot search node were merged for protein identifications (Table S1A-F). Data obtained after Mascot or Sequest search analysis for each workflow were then merged, redundant proteins were removed and a final list of proteins was obtained. The number of unique proteins identified after no prefractionation, prefractionation by SCX and prefractionation by SDS-PAGE was 194, 251 and 342, of which 56, 112 and 201 proteins were specific to each workflow respectively. The data from each workflow are summarized in Table S2A-C. After combining the data from all the three workflows, a total of 517 unique or nonredundant proteins was obtained and listed in Table S3. Of these, 271 proteins were identified with ≥2 unique peptides or by a single peptide identified in any of the two workflows. The remaining proteins were identified by a single peptide in any one of the workflows. We compared the proteins identified with high confidence (n = 271) in our study with the datasets of 697 and 1,175 proteins reported by Schenk et al [10] and Anderson et al [8] respectively. 
A total of 121 and 82 proteins were found to be common with these datasets respectively. Overall, a total of 72 proteins were common to the three datasets, i.e. the dataset from the current study and those of Schenk et al and Anderson et al (see Table S3). In our data, although several proteins were initially listed as 'uncharacterized' when searched against the IPI database, a subsequent search in the UniProt database revealed their IDs, which are indicated in Table S3.
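Merging per-workflow protein lists, removing redundancy and counting shared or workflow-specific identifications all reduce to set operations. The following Python sketch shows the idea on toy data; the accession labels are invented placeholders, the function name is hypothetical, and this is not the software used in the study.

```python
def summarize_overlap(workflows):
    """Summarize protein identifications across workflows.

    workflows maps a workflow name to a set of protein accessions.
    Returns the non-redundant union, the proteins common to every
    workflow, and the proteins found in only one workflow.
    """
    all_sets = list(workflows.values())
    union = set().union(*all_sets)
    common = set.intersection(*all_sets)
    specific = {}
    for name, ids in workflows.items():
        others = set().union(
            *(s for other, s in workflows.items() if other != name)
        )
        specific[name] = ids - others
    return union, common, specific


# Toy accession lists, invented for illustration only.
toy = {
    "none": {"P1", "P2", "P3"},
    "scx": {"P2", "P3", "P4"},
    "sds_page": {"P3", "P5"},
}
union, common, specific = summarize_overlap(toy)
print(len(union), sorted(common), {k: sorted(v) for k, v in specific.items()})
```

The same function could be applied to compare a high-confidence list with external datasets, provided all lists use a common accession scheme.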
The proteins identified by single and ≥2 peptides in the three different workflows are shown in Figure 2. Interestingly, the workflows with no prefractionation and with prefractionation at protein level before digestion yielded a greater number of protein IDs with ≥2 unique peptides than single peptide IDs. Only in the workflow in which tryptic peptides were prefractionated on an SCX column was the number of IDs based on a single peptide greater. The number of proteins versus number of unique peptides and molecular weight are shown in Figure 3A and 3B. Most of the proteins were identified with ≥2 peptides. Seventy-seven proteins were identified with ≥10 peptides. While the largest number of identified proteins had molecular weights in the range 21-40 kDa, a large number of high molecular weight proteins were also identified. The distribution is similar to that in the analysis by Schenk et al [10]. The degree of consensus of protein IDs and peptides identified in the three workflows is represented as a Venn diagram (Figure 4). A total of 70 proteins and 245 unique peptides were identified in all the three workflows. These proteins are listed in Table 1. More than 70% of the proteins match the plasma proteins identified by Schenk et al [10] and/or Anderson et al [8], as indicated in Table 1. The cellular localization of these proteins reveals that most of them are extracellular. These proteins participate in various biological processes related to metabolism, immune response, cell growth and/or maintenance, transport and signal transduction, as shown in Figure 5. 
Some of the important biological processes and proteins include cell communication and signal transduction (pigment epithelium-derived factor, insulin-like growth factor-binding protein complex acid labile subunit and plasma retinol binding protein 4); cell growth and/or maintenance (thrombospondin-1, afamin and isoform C of fibulin-1); and transport (serum amyloid A-4 protein, hemopexin, apolipoprotein B-100, apolipoprotein E, vitamin D-binding protein isoform 1 precursor).
Among these, a number of proteins identified here have been implicated in various disease processes. Pigment epithelium-derived factor, a glycoprotein that belongs to the superfamily of serine protease inhibitors, has been shown to be a potent inhibitor of angiogenesis in the mammalian eye, and is involved in the pathogenesis of angiogenic eye diseases [16,17]. Fibulin-1 is reported to be involved in the spread of ovarian cancer in the peritoneal cavity and/or in distal metastases [18]. Plasma retinol binding protein 4 has been reported as a potential biomarker of nephropathy and cardiovascular disease in type 2 diabetic subjects [19]. Apolipoprotein B, involved in transport, has been implicated in cardiovascular diseases [20,21]. Apolipoprotein E (APOE) is largely produced by glial cells and its genotype is reported to be one of the major genetic risk factors for Alzheimer disease [22]. Reduced conversion of vitamin D-binding protein to a macrophage activation factor has been reported to be valuable in determining the risk of disease extension in juvenile idiopathic arthritis (JIA) patients [23]. The presence of such proteins in our datasets suggests that they could be used to study population diversity, and in the discovery and validation process for biomarker identification, in a larger cohort. Interestingly, keratin IDs account for approximately 3.5% of the total proteins in the list of 517 proteins. We also identified keratin isoform 10, a marker for poor prognosis in hepatocellular carcinoma patients after resection [24]. A large number of single peptide-based hits had Mascot and Sequest scores generally considered acceptable. The number of ions obtained in MS/MS was also ≥50% of the theoretically possible fragmentation. These hits were also included in the comprehensive list of proteins in Table S3. By employing the strategy described in this paper, we have been able to identify a large number of proteins present in human plasma. 
The HPLC method of separation of abundant proteins appears to be effective, as the 14 depleted proteins were not observed in the unbound fraction. We have detected a large number of 'classical plasma proteins' in addition to some proteins leaked from tissue. Human plasma is dynamic and it is unlikely that there is an absolute number of proteins. The levels may also vary depending on several factors. The concentration range of proteins detected was from 9-20 µM (hemopexin) to 0.04-0.08 µM (cystatin C), a dynamic range of 10^3. A total of 140 of the 271 proteins identified with high confidence did not match the proteins reported by Schenk et al and Anderson et al; these proteins, marked by a (-) sign in Tables 1 and S3, are unique to this study and could be specific to the Indian population. There have been attempts to investigate protein diversity in human populations implicated in cancer, with various modifications observed in abundant proteins [12,25]. We have noted the presence of some of the abundant proteins such as apolipoprotein CII, apolipoprotein CIII, apolipoprotein E, antithrombin III, and cystatin C even when the sample was analyzed without any prefractionation. Hence, it should be possible to do directed proteomics to examine post-translational modifications or changes in peptide sequence without extensive fractionation of plasma proteins after depletion of abundant proteins, using the power of the Orbitrap Velos mass spectrometer.
The current study was designed to evaluate workflows employing minimal pre-fractionation towards the goal of population proteomics, and not for in-depth analysis to identify a large dataset of proteins from plasma, which may require extensive fractionation. Here, we have used three different workflows to identify proteins in the pooled plasma of individuals from the Indian population and could achieve a comprehensive dataset of 517 unique proteins, with 271 proteins identified with high confidence. Of these, 70 proteins were identified in all the workflows and could be targeted in our future studies, considering that a 'population proteomics' study may focus on a small/targeted dataset analyzed in a larger cohort of samples. Even without fractionation of plasma depleted of abundant proteins, a large number of proteins were detected. The strategies employed here can be applied for quantitative analysis such as iTRAQ labeling of proteins/peptides or label-free quantitation. Although the number of proteins identified in our study is relatively small, we believe that the protein dataset identified with high confidence in our study includes functionally relevant proteins and would be useful for addressing population diversity in the validation process, where a large cohort is required to establish the outcome.

Figure S2 Prefractionation of immunodepleted reference plasma proteins at peptide and protein level using SCX chromatography and SDS-PAGE respectively. (A) SCX chromatogram showing fractionation at peptide level. A total of 320 µg protein was digested with trypsin and peptides were fractionated using an SCX column on an Agilent 1100 series HPLC. After desalting, consecutive fractions were pooled to get six fractions with comparable peptide quantities, approximated from the SCX chromatograms, and were subjected to LC-MS/MS analysis (see methods). (B) SDS-PAGE showing prefractionation at protein level. A total of 10 µg of the protein was separated using SDS-PAGE for half an hour to get a partial run. A total of six bands were excised and subjected to in-gel digestion. The samples were desalted and further subjected to LC-MS/MS analysis (see methods). SCX: strong cation exchange chromatography; SDS-PAGE: sodium dodecyl sulfate polyacrylamide gel electrophoresis. (TIF)

Table S3 A comprehensive dataset of 517 unique proteins identified from all the three workflows. A total of 271 proteins were identified with ≥2 unique peptides or by a single peptide identified in any of the two workflows. Another 246 proteins were identified by a single peptide in any one of the workflows. The workflow(s) in which a protein was identified is shown in the