Comparative proteome analysis of Milnesium tardigradum in early embryonic state versus adults in active and anhydrobiotic state

Tardigrades have fascinated researchers for more than 300 years because of their extraordinary capability to undergo cryptobiosis and survive extreme environmental conditions. However, the survival mechanisms of tardigrades are still poorly understood mainly due to the absence of detailed knowledge about the proteome and genome of these organisms. Our study was intended to provide a basis for the functional characterization of expressed proteins in different states of tardigrades. High-throughput, high-accuracy proteomics in combination with a newly developed tardigrade specific protein database resulted in the identification of more than 3000 proteins in three different states: early embryonic state and adult animals in active and anhydrobiotic state. This comprehensive proteome resource includes protein families such as chaperones, antioxidants, ribosomal proteins, cytoskeletal proteins, transporters, protein channels, nutrient reservoirs, and developmental proteins. A comparative analysis of protein families in the different states was performed by calculating the exponentially modified protein abundance index which classifies proteins in major and minor components. This is the first step to analyzing the proteins involved in early embryonic development, and furthermore proteins which might play an important role in the transition into the anhydrobiotic state.


Introduction
Tardigrades are small invertebrates with a body length of 0.1-1.0 mm. Milnesium tardigradum Doyère (1840) is a carnivorous tardigrade species whose life history has been analyzed with regard to different aspects [1,2]. Tardigrades have been in focus in the last decades because of their remarkable capability to undergo anhydrobiosis and survive physical extremes including high and subzero temperatures [3,4,5,6], high pressure [5,7] and extreme levels of ionizing radiation [8,9,10]. There are two known strategies to cope with water deficiency: the ''desiccation-avoidance strategy'' and the ''desiccation-tolerance strategy'' [11]. The term ''desiccation-avoidance strategy'' describes physiological and morphological adaptations to reduce water loss. For example, the African lungfish builds a waterproof cocoon to prevent excessive dehydration [11]. The ''desiccation-tolerance strategy'' refers to withstanding the dehydrated state. The best example is anhydrobiosis, in which metabolic activity comes to a reversible standstill. During this process, tardigrades contract their legs and form the so-called tun [12], in which they are resistant to extreme environmental conditions.
Even though detailed aspects of the life cycle of tardigrades are already described, there remains a notable absence of detailed knowledge concerning the proteome and genome of these animals, which provides the basis for further investigations including developmental analysis and also characterizing the molecular mechanisms of the protections and survival mechanisms in tardigrades during anhydrobiosis. With our investigation we intended to fill this gap by performing shotgun proteomics on tardigrades using 1D-SDS-PAGE and high sensitivity nanoLC-ESI-MS/MS on an LTQ-Orbitrap mass spectrometer.
To date, only a few transcriptomic [13,14] and proteomic [15,16] studies of M. tardigradum have been published, and these were carried out using EST sequences generated by Sanger sequencing. Using a newly established EST database based on 454 sequencing, we present in this study a comprehensive comparative analysis of the proteome of tardigrades in three different states: early embryonic state (EES), adult tardigrades in active state (AS) and adult tardigrades in anhydrobiotic (tun) state (TS). More than 3000 proteins were identified with high sequence coverage. This comprehensive proteome resource includes different protein families such as chaperones, antioxidants, ribosomal proteins, cytoskeletal proteins, transporters, protein channels, nutrient reservoirs, and developmental proteins. In addition, proteins such as Late Embryogenesis Abundant protein (LEA), which were previously identified by homology search against the NCBInr database [15], are now characterized for the first time by MS/MS analysis using the M. tardigradum database.
Our study presents not only a milestone in analyzing the proteome of tardigrades, but also a comparative analysis of different states of tardigrades using a label-free semi-quantification method. All proteins were quantified by calculating their exponentially modified Protein Abundance Index (emPAI), which allows the classification of proteins in major and minor components and thereby a semi-quantitative analysis of differentially expressed proteins in different states. Applying this method, we firstly compared the proteome of tardigrades in early embryonic state versus adult tardigrades (in both active and tun state) and secondly adult tardigrades in active state versus tun state.

Identification and Classification of Proteins Expressed in M. tardigradum
One-dimensional gel electrophoresis in combination with highly sensitive nanoLC-ESI-MS/MS allowed us to identify proteins on a large scale. We investigated the proteome of M. tardigradum in early embryonic state (EES) and of adult animals in active (AS) and tun state (TS) (Figure 1). The analysis yielded 1982 proteins in EES, 2345 proteins in AS and 2281 proteins in TS. The complete results of database searches and protein identifications for each state including decoy analysis are provided in Tables S1 (EES), S2 (AS) and S3 (TS). Identifications based on one peptide were allowed only in cases where we found the same protein in different gel slices. By setting the search parameters such that they correspond to a match probability of p < 0.01, we kept the false discovery rate (FDR) below 5%. Only the FDRs in gel slices in the low molecular weight range (e.g. slices 26 and 27) were higher than 5%. Since proteins identified in these slices were mostly one-peptide identifications, they were excluded from further analyses. Database search of the MS/MS spectra resulted in proteins that could be separated into two groups: identified proteins with annotation (annotated by Blast search against the SwissProt and NCBInr databases) and those without annotation. Proteins with annotation were classified into different functional groups defined by gene ontology using the Blast2GO program. A summary of all identified proteins and their classification in selected protein families and functional groups is given in Table S4. A broad range of diverse protein families including chaperones, antioxidants, ribosomal proteins, cytoskeletal and motor proteins, transporters, protein channels, nutrient reservoirs, and developmental proteins is present in the results. Identified proteins which could not be annotated using homology search against the SwissProt and NCBInr databases were analyzed for specific protein domains using DomainSweep.
A total of 1135 contigs without annotation were identified, including one-peptide identifications. The DomainSweep analysis yielded 129 proteins with significant protein domains. For another 455 proteins we found putative protein domains. For the remaining 551 contigs we could not obtain any information. The results of the DomainSweep analysis for non-annotated proteins identified with more than one peptide are available in Table S4.
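The per-slice FDR check described in this section can be illustrated with a short sketch. It assumes that the FDR is estimated as the number of decoy matches divided by the number of target matches, which is one common convention and not necessarily the exact calculation used in this study. The hit counts and the names `SliceResult`, `fdr` and `slices_to_exclude` are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class SliceResult:
    slice_id: int
    target_hits: int  # matches against the real database (hypothetical counts)
    decoy_hits: int   # matches against the decoy database (hypothetical counts)


def fdr(result: SliceResult) -> float:
    """Estimate the FDR as decoy hits / target hits (assumed convention)."""
    if result.target_hits == 0:
        return 0.0
    return result.decoy_hits / result.target_hits


def slices_to_exclude(results, max_fdr=0.05):
    """Return the ids of gel slices whose estimated FDR exceeds the cutoff."""
    return [r.slice_id for r in results if fdr(r) > max_fdr]


# Hypothetical example: one well-behaved slice and two low molecular weight slices.
results = [
    SliceResult(slice_id=1, target_hits=400, decoy_hits=8),
    SliceResult(slice_id=26, target_hits=40, decoy_hits=4),
    SliceResult(slice_id=27, target_hits=30, decoy_hits=3),
]
print(slices_to_exclude(results))
```

With these invented counts, only the two low molecular weight slices exceed the 5% cutoff and would be excluded from further analysis.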

Determination of Major Components in Early Embryonic State and of Adult Animals in Active and Tun State
The comparative analysis of tardigrades in different states was performed using a label-free technique based on emPAI. The Protein Abundance Index (PAI) is defined as the ratio of the number of identified tryptic peptides to the number of theoretically observable tryptic peptides for each protein, and the exponentially modified Protein Abundance Index is derived from it as emPAI = 10^PAI - 1 [17]. In our study the emPAI (included in Tables S1, S2 and S3) was only used to give an approximate estimate of relative protein concentration in order to group the proteins into minor and major components for each state. Thus, our data provide an overview of protein classes which are highly abundant in each state.
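The emPAI calculation can be sketched in a few lines. The formula emPAI = 10^PAI - 1 follows the original definition of the index [17]; the function names and the peptide counts below are hypothetical, and real search engines may count observed and observable peptides in slightly different ways.

```python
def pai(n_observed: int, n_observable: int) -> float:
    """Protein Abundance Index: observed / theoretically observable peptides."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return n_observed / n_observable


def empai(n_observed: int, n_observable: int) -> float:
    """Exponentially modified PAI: 10 ** PAI - 1."""
    return 10 ** pai(n_observed, n_observable) - 1


# Hypothetical counts: the same number of observed peptides gives a much
# higher emPAI when the database sequence is short and therefore offers
# only a few observable peptides.
print(empai(6, 20))  # long sequence, many observable peptides
print(empai(6, 2))   # short contig, few observable peptides
```

The second call illustrates why very short contigs can reach very high emPAI values, which is the reason the index is used here only for a coarse grouping into major and minor components.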
Selected proteins associated with diverse processes such as response to stimulus, protection and development were compared based on their emPAI. Data are summarized in Table 1.
To analyze the major components in each state we selected protein hits which showed an emPAI of > 30. We found 38 proteins as major components in EES, of which 20 are without annotation (Table 2). Among the annotated proteins we found 10 members of the large lipid transporter protein (LLTP) superfamily [18] such as apolipophorins and vitellogenins. Heat shock proteins and ribosomal proteins are further proteins of the major component category. 60S ribosomal protein L7 and 40S ribosomal protein S30 are involved in translation, and in particular 60S ribosomal protein L7 is known to be involved in reproduction and embryonic development ending in birth or egg hatching [19]. Two heat shock proteins are present: Hsc71 and sHsp p40 (major egg antigen), which is highly expressed in EES. Furthermore, we found only one protein belonging to the structural constituents of the cytoskeleton (actin-5C). Other cytoskeleton proteins do not seem to be highly expressed in this state.
Proteins without annotation are indicated with an asterisk, in case we found putative candidates in DomainSweep results. For one contig (EZ760287/contig08235:1:820:2) DomainSweep analysis delivered a significant candidate (indicated with #), namely the whey acidic protein (WAP) 4-disulfide core. This protein has a peptidase inhibitor activity. Contig18794:1:101:3 (EZ761369) contains only 33 amino acids and shows a high emPAI of 874.69 in EES. Generally, proteins with short sequences deliver a small number of observable peptides resulting in high emPAI values [17,20]. On the other hand we have performed a relative comparative analysis of different states using the same database. The emPAI of this contig is considerably lower in AS (70.62) and TS (29.55), which means that the high emPAI is in fact due to the higher abundance in EES than AS or TS. Blast search of this contig against NCBInr delivered ribosomal protein L4 (Danio rerio), however with an insufficient e-value. DomainSweep analysis of this contig resulted in ribosomal protein L4/L1e as putative candidate. In-depth proteomics analysis is needed to verify these results.
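The reasoning for contig18794 can be retraced with a small sketch that applies the major component cutoff (emPAI > 30) and compares the same contig across states. The three emPAI values are taken from the text; the function and variable names are hypothetical, and the ratios are only a rough indication of relative abundance, not an exact fold change.

```python
MAJOR_THRESHOLD = 30.0  # cutoff for major components used in this study

# emPAI values of Contig18794:1:101:3 (EZ761369) as reported in the text
empai_by_state = {"EES": 874.69, "AS": 70.62, "TS": 29.55}


def classify(value: float, threshold: float = MAJOR_THRESHOLD) -> str:
    """Group a protein into major or minor components by its emPAI."""
    return "major" if value > threshold else "minor"


classes = {state: classify(v) for state, v in empai_by_state.items()}

# Ratio of the EES value to the value in each adult state. Because the same
# database sequence is used in all states, the short-sequence bias cancels
# out in this within-contig comparison.
ratios = {
    state: empai_by_state["EES"] / value
    for state, value in empai_by_state.items()
    if state != "EES"
}

print(classes)
print({state: round(r, 1) for state, r in ratios.items()})
```

Under this cutoff the contig is a major component in EES and AS but falls just below the threshold in TS.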
We found 53 proteins as major components in adult tardigrades in AS and 49 in TS (Table 3). Comparing the annotated proteins in AS and TS, we found the same three major functional groups: structural constituents of the cytoskeleton, muscle proteins, and members of the LLTP superfamily. Proteins without annotation include contigs (indicated with an asterisk) for which we found putative candidates by DomainSweep analysis.
The same members of the LLTP superfamily are present in AS as well as in TS. These include the following vitellogenin proteins: VTG-1, VTG-2, VTG-4 (2 different contigs), VTG-6 (3 different contigs). The early embryonic state contains all these vitellogenins except for vitellogenin-2. Interestingly, vitellogenin-2 is described to be involved in the biological process of determination of adult lifespan, which means the control of viability and duration in the adult phase of the life cycle [21,22]. Actin-5C (2 contigs), cytoplasmic actin and filamin-A belong to the structural constituents of the cytoskeleton. Myosin heavy chain, paramyosin, myosin-7, myosin-3, and troponin I are muscle proteins, and all except troponin I have motor activity. We found one isoform of myosin heavy chain, troponin I, two members of the heat shock protein family (Hsc 71 and AGAP000941-PA) and 40S ribosomal protein S3 as major components in AS, and annexin A11, transketolase-like protein 2 and 60S acidic ribosomal protein P0 as major components in TS. Contig01971:138:399:3 is annotated as AGAP000941-PA from Anopheles gambiae, which shows high homology to small heat shock proteins. Among the proteins without annotation, 18 are present in both AS and TS.

Proteins Found in One State Only
The proteome analysis yielded 1982 proteins in EES, 2345 proteins in adult tardigrades in AS, and 2281 proteins in TS. A total of 1301 proteins are found in all three states as shown in the Venn diagram in Figure 2. Specific proteins like members of the piwi family are identified only in EES. Piwi-like proteins are developmental proteins that play a central role during gametogenesis [23]. Proteins involved in iron homeostasis like soma ferritin are also found only in EES. In total four contigs annotated as proteins belonging to the ferritin family are identified (Table S4), two of which are present in all three states, one in EES and TS and another one only in EES. Two members of the heat shock protein family are identified only in EES: the small heat shock protein C4 involved in stress response and the 10 kDa heat shock protein belonging to the GroES chaperonin family and involved in protein folding. In addition two different contigs annotated as small heat shock protein major egg antigen (p40) are identified. One is only found in EES and the other one in all states.
Table 1. Semi-quantitative analysis of selected proteins associated with diverse processes such as response to stimulus, protection and development.
A total of 256 proteins are found only in AS, of which 71 proteins are without annotation. The two proteins (EZ760543, EZ762990) with the highest emPAI value are without annotation and DomainSweep analysis delivered no specific protein domains. Dixin, a developmental protein involved in the Wnt signalling pathway, is the protein with the third highest emPAI value. Wnts control development in organisms ranging from nematodes to mammals. The Blast2GO analysis of annotated proteins (Figure 2b) delivered metabolic process, oxidation reduction and proteolysis as abundant categories, which are important processes for a living organism. We identified 199 proteins only in TS, of which 58 are without annotation.
Two proteins without annotation (EZ758977, EZ762549) followed by myosin heavy chain (EZ763186) are the proteins with the highest emPAI value found only in TS. The result of the Blast2GO analysis of annotated proteins is shown in Figure 2c. The first ten biological process categories include three categories involved in response to stimulus, namely heat, oxidative stress and xenobiotic stimulus (Figure 2c). Only the last one is also present in AS (Figure 2b). Heat shock protein 81-2 (Hsp90 family), hypoxia up-regulated protein 1 (Hsp70 family), and two members of the DnaJ protein family are present as chaperones involved in stress response in tun state. Although activation of stress response was expected in TS, it seems there are other processes which are probably associated with anhydrobiosis. Proteins involved in intracellular signaling cascade and phosphorylation are present. Protein amino acid phosphorylation as a biological process category is present in the Blast2GO results of proteins identified only in AS and TS (Figure 2b). However, the proteins involved in phosphorylation seem to be different in the two states. Dual specificity mitogen-activated protein kinase (EZ759901/contig05524:314:1363:2), RAC serine/threonine-protein kinase (EZ760607) and cell division cycle 2-like protein kinase 6 (EZ761193), which are involved in phosphorylation, have been identified only in TS.
Reanalysis of our data including phosphorylation of serine, threonine and tyrosine as a modification delivered 49 phosphoproteins (Table S5). We have identified 13 different phosphoproteins only in EES, 11 phosphoproteins only in AS and another 11 phosphoproteins only in TS. A further seven phosphoproteins are identified in both AS and TS, two phosphoproteins in both EES and AS and another two phosphoproteins in both EES and TS. We found 3 phosphoproteins in all three states. The comparison of the phosphoproteins which are found only in AS or TS shows that almost half of the phosphoproteins in TS are without annotation. Also of major interest are proteins involved in intracellular signaling cascade: calcium-regulated heat stable protein 1 (EZ759268), RAC serine/threonine-protein kinase (EZ760607) and Drebrin-like protein (EZ760971/contig16604:102:1247:3). However, the role of these proteins in relation to desiccation tolerance has to be investigated.
Among diverse proteins only identified in TS we found lipid storage droplets surface-binding protein which is involved in lipid transport and is reported to be required for normal deposition of neutral lipids in the oocytes [24,25]. Lipids represent probably the only nutrient sources during all steps from dehydration (transitional state I) to rehydration (transitional state II) and thus are essential for surviving.

Proteins Overlapping in Two States
Whereas 680 proteins were identified only in adult tardigrades (active and tun), the number of proteins which overlap between EES and adults is considerably lower (108 between EES and AS and 101 between EES and TS), which is expected (Venn diagram in Figure 2). Whereas cellular component organisation and transport are main processes in both EES and AS (Figure 2d), translation, development and biological regulation are abundant categories found in both EES and TS (Figure 2e). Proteins found only in AS and TS are mainly involved in cellular process, oxidation reduction, proteolysis and biological regulation (Figure 2f).
Figure 2. [...] non-overlapping (a, b, c) or partially overlapping (d, e, f) between the different states are analyzed using the Blast2GO program to determine the involved biological processes. The ten major biological processes for non-overlapping proteins are listed in 2a-2c and for partially overlapping proteins in 2d-2f. doi:10.1371/journal.pone.0045682.g002
Proteins involved in metabolic processes are present in TS but reduced to half compared to AS, which is in accordance to the expectation since during anhydrobiosis a metabolic dormancy is described [26,27].
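The counts reported for the three states can be cross-checked with simple Venn diagram arithmetic. The sketch below assumes that the pairwise numbers (108, 101 and 680) refer to proteins shared by exactly two states, which is an interpretation of the text; the number of proteins found only in EES and the total number of distinct proteins are derived here and are not stated explicitly in the text. All names are hypothetical.

```python
# Totals per state and overlap counts as reported in the text
totals = {"EES": 1982, "AS": 2345, "TS": 2281}
all_three = 1301
# Assumed to be proteins shared by exactly two states
pair_only = {("EES", "AS"): 108, ("EES", "TS"): 101, ("AS", "TS"): 680}


def state_only(state: str) -> int:
    """Proteins found in one state only: total minus all shared regions."""
    shared = all_three + sum(n for pair, n in pair_only.items() if state in pair)
    return totals[state] - shared


only = {state: state_only(state) for state in totals}

# Number of distinct proteins across all seven Venn regions
distinct_total = all_three + sum(pair_only.values()) + sum(only.values())

# Full overlap between AS and TS, regardless of presence in EES
as_ts_overlap = all_three + pair_only[("AS", "TS")]

print(only)
print(distinct_total, as_ts_overlap)
```

Under this interpretation the derived values agree with the 256 AS-only and 199 TS-only proteins, the overlap of 1981 proteins between AS and TS, and the statement that more than 3000 proteins were identified in total.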

Discussion
Comprehensive Analysis of the M. tardigradum Proteome
In our previous publication a proteome map of tardigrades was developed utilizing 2D gel electrophoresis and LC-ESI-MS/MS analysis [15]. 2D gel electrophoresis offers high resolution and allows analysis of single spots, which contain at most only a few proteins. In particular, the absence of a comprehensive database at the time of our previous study made the reduction of complexity achieved by 2D gel electrophoresis necessary to increase the number of detected peptides which belong to the same protein.
Since our parallel tardigrade EST sequencing project provided us recently with a high number of new EST sequences generated by 454 sequencing, we could consider the 1D gel electrophoresis as a complementary platform to 2D gel electrophoresis to analyze the proteome of tardigrades (Figure 3). The present study includes a comprehensive proteome resource of M. tardigradum and demonstrates the first comparative analysis of expressed proteins in three different states.
In addition we have reanalyzed the MS/MS data of protein spots from our 2D gel study [15] against the 454 database (Table  S6). Interestingly, our 2D proteomics data of active tardigrades support the quantification analysis based on emPAI. Proteins with a high emPAI could be identified repeatedly in different protein spots on the 2D gel, which indicates the high amount of these proteins in the whole protein extract. For instance major egg antigen (EZ761340) shows a high emPAI of 108.34 and could be identified in 22 protein spots. Other proteins such as vitellogenin, apolipoproteins and actin show the same relation between emPAI and number of spots on the 2D gel.
Although the present 454 protein database is the most comprehensive one available at the moment, it is still an incomplete database. Calculation of emPAI using an incomplete database delivers high values for contigs with very short sequences, which can lead to misinterpretation [17,20]. In these cases the high emPAI is caused by the calculation using a short sequence present in the database and therefore is not related to the amount of the protein. Nevertheless, a comparative analysis of the same protein in different states is possible, since we perform a relative quantification using the same database for all three states. In total we identified more than 3000 proteins, 2460 of which could be functionally annotated by homology search against the SwissProt and NCBInr databases. The results cover two main aspects: (i) Identification of diverse protein families for the first time in M. tardigradum. Of major interest are proteins that have been reported to be related to anhydrobiosis such as heat shock proteins, Late Embryogenesis Abundant protein, aquaporins, and antioxidant proteins. (ii) Comparative analysis of major components in different states. Protein families identified only in early embryonic state deliver new aspects in terms of developmental biology. Comparative analysis of proteins in active versus tun state could bring us closer to understanding the molecular mechanisms during anhydrobiosis.
These two aspects are discussed in the following section by selected protein families.

Comparative Analysis of Proteins Associated with Anhydrobiosis and Survival
Among the numerous proteins identified in this study some proteins have already been reported to play an important role during anhydrobiosis, most importantly Late Embryogenesis Abundant (LEA) proteins. Although the precise role of LEA proteins has not yet been fully elucidated, different studies have reported on the association of these proteins with tolerance to water stress by desiccation [28,29]. The presence of LEA proteins in tardigrades has been shown by analyzing 2D gels prepared from whole protein lysates of M. tardigradum and homology search against NCBInr database [15]. In the present study LEA could be identified also in our tardigrade specific database. Contig EZ759004 shows high similarity to the LEA protein from Alteromonas macleodii. The predicted sequence from M. tardigradum was confirmed by MS/MS analysis of peptides covering 61.9% of the entire sequence (length: 147aa). This protein is up-regulated in adults and shows a 1.2 times higher emPAI in tun state compared to active state. The search for specific protein patterns using DomainSweep (Table S4) resulted in two significant hits (EZ759288, EZ761565) and 5 putative candidates (EZ759004, EZ759235, EZ761969, EZ762343, EZ762913) for LEA proteins. Among these candidates only contigs EZ759004, EZ759288 and EZ759235 are identified with more than one peptide.
Chaperones, in particular heat shock proteins (Hsps), play key roles in cell protection and response to diverse stimuli like stress, heat and hypoxia by preventing protein aggregation (Table 1). The role of Hsps, in particular low molecular weight Hsps, in desiccation tolerance and dormancy is reported in different studies [12,30]. A comprehensive proteomic study of Hsps in tardigrades in active versus tun state has been reported earlier [16]. Different Hsp families are present in our results: Hsp90, Hsp70, Hsp60, Hsp40 and Hsp20, GroES, and GrpE families. We identified three sHsps that are described for the first time in M. tardigradum: the small heat shock protein C4 and 10 kDa heat shock protein (GroES chaperonin family) identified only in the EES and a sHsp (AGAP000941-PA, sHsp 20.6 isoform 3 (EZ759251)) in all three states. In addition other chaperonin families such as TCP-1 and calreticulin were identified in all three states (Table S4). Our semiquantitative analysis indicates an up-regulation of a small heat shock protein (major egg antigen, p40) and furthermore a ferritin homologue in EES of M. tardigradum. Major egg antigen is found in Schistosoma mansoni and is described to be involved in response to heat. In all analyzed states major egg antigen (EZ761340) is the heat shock protein with the highest emPAI, particularly in EES. Artemin, the ferritin homologue identified in Artemia, is reported to protect cells from stress and acts similarly to molecular chaperones such as small heat shock proteins. In studies on Artemia it has been shown that the small heat shock protein and artemin are associated with anhydrobiosis [31]. Since we found p40 and soma ferritin both up-regulated in EES and not in anhydrobiotic state, we assume that these proteins are involved in development and hence are specific markers for the EES. However, the role of p40 and ferritin in anhydrobiotic tardigrades has to be investigated.
An important aspect of desiccation tolerance is protection against free radicals [32,33]. Superoxide dismutases (SODs) are among the most important antioxidant enzymes in defense against ROS and particularly superoxide anion radicals [34,35]. Generally SOD is present in two forms inside the eukaryotic cell, SOD (Cu-Zn) in the cytoplasm and outer mitochondrial space, and SOD (Mn) in the inner mitochondrial space [36]. Both superoxide dismutases SOD (Cu-Zn) (6 contigs) and SOD (Mn) (2 contigs) have been identified in tardigrades (Table S4). The superfamily of glutathione transferases (GSTs) constitutes a further cellular detoxification system [37]. In addition GSTs have roles in cellular physiology such as regulators of cellular pathways of stress response and housekeeping roles in the binding and transport of specific ligands [38]. We have found 27 different contigs that belong to the GST superfamily. The expression of the 1-cysteine (1-Cys) peroxiredoxin family of antioxidants is reported in Arabidopsis thaliana and is shown to be related to dormancy [39]. Different isoforms of peroxiredoxins (8 contigs) are included in our results. Peroxiredoxins and diverse other proteins like catalase, peroxidasin, thioredoxin reductase and glutamate cysteine ligase are described to be involved in response to oxidative stress (Table 1). The comparison of the total emPAI (sum of the emPAI of each protein member) of protein families with antioxidant activity shows that GSTs are approximately 3-fold higher in adults compared to EES (Figure 4), which is probably due to the exposure to higher amounts of endobiotics and xenobiotics. Eggs are laid inside the old cuticle and remain there during the embryonic development. Therefore embryos are not directly attacked by xenobiotics. In contrast Cu-Zn SODs are up-regulated in EES compared to adults (Figure 4).
Studies on mouse embryos fertilized in vitro have shown that thioredoxin and SODs promote their in vitro development [40]. This suggests that protection of embryos from oxidative stress is a prerequisite for their development in vitro. We assume that the up-regulation of Cu-Zn SODs in EES is related to their important roles in development. Comparing active to tun state we observed up-regulation of GSTs and peroxiredoxins in active state and in contrast up-regulation of SODs in tun state.
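The family-level comparison used above, in which the emPAI values of all members of a protein family are summed per state and the totals are then compared, can be sketched as follows. The emPAI values and contig assignments in this example are invented for illustration and do not come from Table S4; the function names are hypothetical as well.

```python
from collections import defaultdict

# (protein family, state, emPAI of one family member); hypothetical values
records = [
    ("GST", "EES", 1.0),
    ("GST", "EES", 2.0),
    ("GST", "AS", 4.0),
    ("GST", "AS", 5.0),
    ("SOD (Cu-Zn)", "EES", 6.0),
    ("SOD (Cu-Zn)", "AS", 3.0),
]


def total_empai(rows):
    """Sum the emPAI of all family members for each (family, state) pair."""
    sums = defaultdict(float)
    for family, state, value in rows:
        sums[(family, state)] += value
    return dict(sums)


def fold_change(sums, family, numerator, denominator):
    """Ratio of the total emPAI of one family between two states."""
    return sums[(family, numerator)] / sums[(family, denominator)]


sums = total_empai(records)
print(fold_change(sums, "GST", "AS", "EES"))
print(fold_change(sums, "SOD (Cu-Zn)", "EES", "AS"))
```

Because emPAI is only a semi-quantitative measure, such ratios are best read as a coarse indication of which families dominate in a given state.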
We identified 350 transmembrane proteins, 53 of which are involved in transmembrane transport. One group of channel proteins that plays an important role in ''desiccation-tolerance strategy'' is the aquaporin (AQP) protein family [11]. AQPs are passive transport channels for water and permit water to move in the direction of an osmotic gradient. Kikawada et al. could show that AQP is involved in the removal of water in the desiccation process en route to anhydrobiosis [11]. Different AQPs are identified in M. tardigradum: AQP 3, AQP 4, AQP 9, AQP 10 (2 contigs). AQP 4 is the most abundant one in all three states and in particular up-regulated in EES. Compared to the active state AQP 4 is 1.2 times more expressed in the tun state. However, the question whether identified AQPs are involved in anhydrobiosis in M. tardigradum needs to be answered by performing functional analysis.
Although a high number of 1981 proteins overlap between AS and TS, there are also numerous proteins (256 in AS and 199 in TS) that are identified only in one state. The Blast2GO analysis of proteins identified only in TS led to the assumption that not only the response to stimuli such as heat and oxidative stress plays an important role in the anhydrobiotic state, but that further processes and mechanisms such as intracellular signalling cascades and phosphorylation are also associated with it. As shown in Table 1 we found not only Hsps involved in response to stimuli, but also two other main groups, namely kinases, in particular those involved in the mitogen-activated protein kinase (MAPK) signaling pathway, and translation initiation factor, which is associated with protein biosynthesis. There are reports of observed changes in protein phosphorylation in plants which were exposed to water deficit, suggesting reversible phosphorylation as a regulator [41]. In particular mitogen-activated protein kinases (MAPKs) and other kinases belonging to the MAPK cascade have been identified in plants in response to dehydration, suggesting that the MAPK cascade is involved in stress signaling [42,43].
The analysis for phosphorylation delivered 49 phosphoproteins (Table S5). As expected the number of identified phosphopeptides was very low, since we did not perform any enrichment steps for phosphopeptides prior to the mass spectrometry analysis. Enrichment steps need high amounts of peptides and have to be optimized. In our study the starting material was limited and did not allow any further procedures. However, we found specific phosphoproteins for each state. Almost half of the phosphoproteins in TS (5 out of 11) are without annotation. The functional analysis of these tardigrade specific proteins has to be investigated.

Comparative Analysis of Proteins Identified in EES Versus AS and TS
Members of the large lipid transfer protein (LLTP) superfamily belong to the major components in all three states. In addition this superfamily is highly expressed in EES compared to adults as shown by calculating the total emPAI (Figure 4). Lipid transport in animals is mediated by members of the LLTP superfamily, which are grouped into three major families: the apoB-like LLTPs, the vitellogenin-like LLTPs, and the microsomal triglyceride transfer protein (MTP)-like LLTPs, or MTPs [18]. In addition to lipid transport they have also been reported to play an important role in animal development [44], reproduction [22], and immunity [45] as well as aging and lifespan regulation [46]. The high abundance of members of the LLTP superfamily in adults can be explained by the fact that we used middle-aged, egg-producing adults in our experiments. ApoB-like LLTPs are represented in our study by three contigs which show high homology to apolipoprotein B, apolipoprotein O and apolipophorin (Table 1). The glycolipophosphoprotein vitellogenin (VTG) is the major precursor of the egg-yolk protein vitellin (Vn), which provides sources of nutrients during embryonic development in oviparous organisms [22,47]. It has been reported that lower vertebrates possess multiple Vtg genes and proteins [47] as has been shown for Danio rerio [48], Xenopus laevis [49], and salmonid fishes [50] as well as for the nematode Caenorhabditis elegans [51]. Similarly multiple Vtg proteins are found in M. tardigradum: VTG-1, VTG-2, VTG-4 (2 contigs) and VTG-6 (3 contigs). Whereas apoB-like LLTPs and vitellogenin-like LLTPs are present abundantly in EES, MTP-like LLTPs are underrepresented and are found only in adult tardigrades.
Other proteins could be identified that are associated with lipid transport and metabolism such as low-density lipoprotein receptors (LDLR family), vigilins (perilipin family) and high density lipoprotein-binding proteins. Also of interest are proteins associated with lipid catabolic process such as lipases. Lipoprotein lipases are assumed to be involved in fatty acid uptake, transport and metabolism. They are also known to serve as yolk proteins in dipteran eggs [52]. All protein categories related to lipid transport, storage and metabolism (Table 1) are markedly up-regulated in EES, since they most likely represent a key source of energy during embryonic development. Similarly it has been shown that lipid metabolic pathways were up-regulated in the C. elegans dauer larval stage [53]. Furthermore association of these proteins with hibernation and dormancy can be expected since it is shown that lipids also serve as the main energy source in hibernating mammals [54]. Lipid metabolism associated proteins are expressed at almost the same level in AS and TS in tardigrades.
Other major components in EES are ribosomal proteins. This is also reflected in the semi-quantitative analysis of ribosomal proteins, which compares the total emPAI of all three states (Figure 4). Furthermore, we found 32 ribosomal proteins that were identified only in EES. This result can be explained by the high demand for synthesis of proteins with diverse functions during development en route to a mature organism. Proteins contributing to the structural integrity of the cytoskeleton, muscle and the vitelline membrane are weakly expressed in EES (Figure 4). Since the vitelline membrane is part of the egg shell, we expected these proteins to be expressed only in mature animals, which is also reflected in our semi-quantitative analysis. Similarly, proteins associated with pathogenesis, such as pathogenesis-related protein 5 (PR-5, thaumatin family), are mainly expressed in adults. Eggs are laid inside the old cuticle and remain there during embryonic development. Therefore, developing embryos are not directly attacked by pathogens. The need for defence mechanisms against pathogens and fungi leads to higher expression of these proteins in adults (Table 1). Chitinases are widely distributed in a broad range of species and are described to be involved in digestion, arthropod molting, defence/immunity and pathogenicity by degrading chitin and chitodextrins of chitin-containing fungal pathogens (for review see [55]). Ophanin, a member of the cysteine-rich secretory protein (CRISP) family, is characterized as a snake venom protein that acts as a neurotoxin by targeting and inhibiting voltage-gated calcium channels on smooth muscle [56]. Two contigs are annotated as ophanin; one is found only in EES and TS, and the other is found in all three states and is upregulated in AS. Since adult tardigrades are carnivorous, we assume that ophanin not only has a defence function but is also used to trap prey animals.

Conclusion
The current study presents the first comparative proteome analysis of tardigrades in different states, which is an important resource for future research in this area. Since the amount of biological material was highly limited, we were not able to perform biological replicates. However, the main focus of our study was to obtain information on highly abundant protein families present in the different life states of tardigrades rather than an accurate quantification of differentially expressed proteins. The semi-quantitative analysis served predominantly to estimate relative protein concentrations in order to group the proteins into minor and major components. This method mainly delivered results when comparing EES with adult animals in AS and TS. The upregulation of specific protein families such as the large lipid transfer protein (LLTP) superfamily and ribosomal proteins in EES could clearly be demonstrated. However, since the majority of 1981 unique proteins overlapped between AS and TS, the applied label-free quantification method needs to be complemented by more accurate techniques such as labeling-based approaches in order to detect even subtle differences in protein expression between AS and TS. The choice of a suitable quantification technology is subject to some limitations. Technologies such as SILAC [57] and 14N/15N [58] metabolic labeling rely on metabolic incorporation of heavy isotopes and are suitable for cell culture, but only in rare cases for whole organisms [59], because the whole food chain of the organism has to be considered for labeling. Furthermore, the number of tardigrades cultivated in the laboratory is limited, and only the homogenization of a high number of individuals yields enough protein to perform experiments with biological replicates. This represents the major limitation in investigating tardigrades and makes quantification a challenging task.

Tardigrade Culture and Sampling
Tardigrades of the species Milnesium tardigradum Doyère (1840) were obtained from Dr. Ralph O. Schill (Department of Zoology, University of Stuttgart, Stuttgart, Germany) as described in our previous study and were maintained in a laboratory culture [12]. Briefly, the culture was grown on agarose plates (3%) (peqGOLD Universal Agarose, peqLAB, Erlangen, Germany) covered with Volvic™ water (Danone Waters, Wiesbaden, Germany) at 20 °C. The juveniles were fed on the green alga Chlorogonium elongatum, the adults on the bdelloid rotifer Philodina citrina. The specimens for the experiments were all of middle age (egg producing), so effects of age can be excluded. Tardigrades were starved for 3 days before harvesting and washed several times with Volvic™ water to avoid contamination with food organisms. Subsequently the animals were transferred to microliter tubes (200 individuals per tube) and the surrounding water was reduced to approx. 1-2 µl. Active (I) and anhydrobiotic (III) states according to Schill et al. [12] and eggs in the early embryonic state (blastula state) according to Suzuki [2] were investigated in this study. For the induction of the anhydrobiotic state (III), animals were desiccated in open microliter tubes (Biosphere SafeSeal Micro Tubes, Sarstedt, Nümbrecht, Germany) exposed to 85% relative humidity (RH) in a chamber containing a saturated solution of KCl (Roth, Karlsruhe, Germany) at 21 °C for 24 h, and subsequently transferred to a chamber containing a saturated MgCl2 solution (Roth, Karlsruhe, Germany), where they were exposed to 33% RH for at least 48 h.
During egg deposition, which is always accompanied by a moult, eggs are laid inside the old cuticle. The average clutch contains about 7 eggs with a minimum of 3 and a maximum of 12. The egg laying process usually takes less than two minutes from the first to the last egg. Egg-containing cuticles (780 eggs in total) were collected 24 h after egg deposition and washed several times with Volvic™ water. Eggs were not separated from the cuticles because this process would damage the eggs. All samples were frozen in liquid nitrogen and stored at −80 °C.

Sample Preparation and One Dimensional Gel Electrophoresis
The animals (200 individuals each for active and tun state) and eggs (blastula, 780 eggs) were homogenized as described before [15] with the slight modification of adding phosphatase inhibitors to the lysis buffer. Briefly, collected animals/eggs were homogenized in 60 µl lysis buffer containing 8 M urea, 4% CHAPS, 30 mM Tris, Protease Inhibitor Mix (GE Healthcare, München, Germany), Phosphatase Inhibitor Cocktail 1+2 (Sigma-Aldrich, München, Germany) and orthovanadate (50 mM), pH 8.5, by ultrasonication (SONOPULS, HD3100, Bandelin Electronic, Berlin, Germany) with 45% amplitude intensity and 1-0.5 sec intervals at 4 °C. Orthovanadate (50 mM) was prepared as described by Thingholm et al. [60]. To inhibit phosphatase activity, 20 µl each of Phosphatase Inhibitor Cocktail 1, Phosphatase Inhibitor Cocktail 2 and orthovanadate (50 mM) were added to 1 ml lysis buffer. After homogenization the samples were shock frozen and stored at −80 °C. For gel electrophoresis, insoluble particles were removed by centrifugation for 2 min at 14,000 g and 4 °C, and the supernatant was quantified using a BCA mini-assay. One-dimensional gel electrophoresis was performed using precast 4-12% Bis-Tris mini gels (Invitrogen, Karlsruhe, Germany) in a MES buffer system. Gels were loaded with 40 µg of protein per lane and stained using protein staining solution from Fermentas (St. Leon-Rot, Germany). The entire lane was cut into 27 equal slices (except slices 26 and 27, which were twice as large) and used for in-gel digestion with trypsin. Since the amount of material was highly limited, no biological replicates could be performed.

Preparation of Peptides and Protein Identification
Tryptic digestion of proteins and extraction of peptides were performed as described [61]. After extraction the solutions were dried in a speed-vac at 37 °C for 2 h. Peptides were redissolved in 5 µl 0.1% TFA by sonication for 15 min and were separated using a nanoAcquity UPLC (Waters GmbH, Eschborn, Germany). Peptides were trapped on a nanoAcquity C18 column, 180 µm × 20 mm, particle size 5 µm (Waters GmbH, Eschborn, Germany). The liquid chromatography separation was performed at a flow rate of 400 nl/min on a BEH 130 C18 column, 100 µm × 100 mm, particle size 1.7 µm (Waters GmbH, Eschborn, Germany). Slices 1-22 were analyzed using a 2 h gradient and slices 23-27 using a 1 h gradient. The 2 h gradient was set as follows: from 0 to 4% B in 1 min, from 4 to 30% B in 80 min, from 30 to 45% B in 10 min, from 45 to 90% B in 10 min, 10 min at 90% B, from 90 to 0% B in 0.1 min, and 10 min at 0% B. The 1 h gradient was set as follows: from 0 to 4% B in 1 min, from 4 to 40% B in 40 min, from 40 to 60% B in 5 min, from 60 to 85% B in 0.1 min, 6 min at 85% B, from 85 to 0% B in 0.1 min, and 9 min at 0% B. Solvent A contained 98.9% water, 1% acetonitrile and 0.1% formic acid; solvent B contained 99.9% acetonitrile and 0.1% formic acid. The nanoUPLC system was coupled online to an LTQ Orbitrap XL mass spectrometer (Thermo Fisher Scientific, Bremen, Germany). Data were acquired by scan cycles of one FTMS scan with a resolution of 60000 at 400 m/z and a range from 370 to 2000 m/z, in parallel with six MS/MS scans in the ion trap of the most abundant precursor ions.
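The two gradient programs can be written down as a list of segments, which makes it easy to check the total run time and to read off the solvent composition at a given time. The following Python sketch is for illustration only: the segment values are taken from the text, whereas the function names and the assumption that %B changes linearly within each segment are ours and are not stated in the instrument method.

```python
# Each segment: (duration in min, %B at start, %B at end), as listed in the text.
GRADIENT_2H = [
    (1.0, 0.0, 4.0),
    (80.0, 4.0, 30.0),
    (10.0, 30.0, 45.0),
    (10.0, 45.0, 90.0),
    (10.0, 90.0, 90.0),
    (0.1, 90.0, 0.0),
    (10.0, 0.0, 0.0),
]
GRADIENT_1H = [
    (1.0, 0.0, 4.0),
    (40.0, 4.0, 40.0),
    (5.0, 40.0, 60.0),
    (0.1, 60.0, 85.0),
    (6.0, 85.0, 85.0),
    (0.1, 85.0, 0.0),
    (9.0, 0.0, 0.0),
]


def total_minutes(program):
    """Sum of all segment durations in minutes."""
    return sum(duration for duration, _start, _end in program)


def percent_b_at(program, t_min):
    """%B at time t_min, assuming linear change within each segment."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for duration, start, end in program:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    raise ValueError("time lies beyond the end of the program")


if __name__ == "__main__":
    print(round(total_minutes(GRADIENT_2H), 1))  # full length of the "2 h" program
    print(round(total_minutes(GRADIENT_1H), 1))  # full length of the "1 h" program
    print(percent_b_at(GRADIENT_2H, 41.0))       # halfway through the 4-30% B ramp
```

Including wash and re-equilibration steps, the listed segments add up to slightly more than the nominal 2 h and 1 h.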
The mgf files were used for database searches with the MASCOT search engine (Matrix Science, London, UK; version 2.2) against a newly developed tardigrade database containing contigs from 454 sequencing (unpublished data). The peptide mass tolerance for database searches was set to 5 ppm and the fragment mass tolerance to 0.6 Da. Carbamidomethylation of C was set as a fixed modification. Variable modifications included oxidation of M and deamidation of NQ. In a separate search we selected phosphorylation of S, T and Y as an additional modification for the identification of phosphopeptides. One missed cleavage site was allowed to account for incomplete trypsin hydrolysis. Furthermore, proteins were considered identified if more than one unique peptide had an individual ion score exceeding the MASCOT identity threshold (ion score cut-off of 24). Identification under the applied search parameters corresponds to a match probability of p < 0.01, where p is the probability that the observed match is a random event.
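To make the search tolerances and the acceptance rule concrete, the following Python sketch converts a ppm tolerance into an absolute m/z window and applies the "more than one unique peptide above the score cut-off" criterion to a protein hit. It does not use or reproduce any MASCOT interface; the function names and the input layout (a mapping from peptide sequence to its best ion score) are hypothetical, and the strict comparison reflects our reading of "exceeding" in the text.

```python
def ppm_window(mz, tol_ppm=5.0):
    """Return the (low, high) m/z limits for a relative tolerance given in ppm."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def is_identified(peptide_scores, score_cutoff=24.0, min_unique_peptides=2):
    """Apply the acceptance rule to one protein.

    peptide_scores: dict mapping each unique peptide sequence to its best
    ion score (hypothetical input layout, chosen for this sketch).
    """
    passing = [seq for seq, score in peptide_scores.items() if score > score_cutoff]
    return len(passing) >= min_unique_peptides


if __name__ == "__main__":
    low, high = ppm_window(1000.0)  # 5 ppm at m/z 1000 is +/- 0.005
    print(low, high)
    print(is_identified({"AAK": 30.0, "CCR": 25.0}))  # two peptides above the cut-off
    print(is_identified({"AAK": 50.0}))               # a single peptide is not enough
```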
The abundance of proteins was estimated by comparing the exponentially modified Protein Abundance Index (emPAI) [17], which was automatically calculated by the MASCOT search engine. We analyzed each slice separately and did not merge the MS/MS data prior to the protein database search, in order to retain the information about the molecular weight of each protein. Since emPAI is defined to represent the absolute protein amount, we manually summed the emPAI values of proteins that were found repeatedly in different slices. The Protein Abundance Index (PAI) is defined as the number of identified peptides divided by the number of theoretically observable tryptic peptides for each protein, and was later converted to the exponentially modified PAI (emPAI = 10^PAI - 1) [17]. The validity of emPAI was demonstrated by determining the absolute abundance of 46 proteins in a mouse whole-cell lysate, which had been measured using synthetic peptides [62]. The emPAI can be used directly for reporting approximate protein abundance in a large-scale analysis, as shown in different studies [63,64,65,66,67].
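The definition of emPAI and the summation over gel slices can be illustrated with a short Python sketch. In this study the emPAI values were reported by MASCOT, so the code only mirrors the published definition (emPAI = 10^PAI - 1 [17]); the function names, the tuple layout of the input and the example counts are hypothetical.

```python
from collections import defaultdict


def pai(n_observed, n_observable):
    """Protein Abundance Index: identified / theoretically observable peptides."""
    if n_observable <= 0:
        raise ValueError("number of observable peptides must be positive")
    return n_observed / n_observable


def empai(n_observed, n_observable):
    """Exponentially modified PAI: 10**PAI - 1."""
    return 10 ** pai(n_observed, n_observable) - 1


def total_empai_per_protein(slice_hits):
    """Sum emPAI over all gel slices in which a protein was found.

    slice_hits: iterable of (protein_id, slice_no, n_observed, n_observable);
    this layout is an assumption made for the sketch.
    """
    totals = defaultdict(float)
    for protein_id, _slice_no, n_observed, n_observable in slice_hits:
        totals[protein_id] += empai(n_observed, n_observable)
    return dict(totals)


# Made-up example counts, not data from this study.
hits = [
    ("contig_A", 3, 5, 10),
    ("contig_A", 4, 10, 10),
    ("contig_B", 12, 0, 8),
]
totals = total_empai_per_protein(hits)

if __name__ == "__main__":
    for protein_id, value in sorted(totals.items()):
        print(protein_id, round(value, 3))
```

Summing per slice rather than pooling the peptide counts keeps the molecular weight information of each slice, which is the reason given above for not merging the MS/MS data before the search.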

Preparation of Tardigrade Protein Database
Assembly of the 454 sequences. One million reads from the 454 sequencing and their de novo assembly by Newbler (454/Roche) were received from GATC (http://www.gatc-biotech.com/de/index.html). From the reads, 400890 clusters were included in the assembly with 85% aligned reads. The assembly yielded 28345 contigs, of which 13076 were longer than 500 bases.

Classification of Proteins
For functional analysis of the identified proteins we used the Blast2GO program, which consists of three main steps: blast to find homologous sequences, mapping to collect GO terms associated with blast hits, and annotation to assign functional terms to query sequences from the pool of GO terms collected in the mapping step [71]. Functional assignment is based on the GO database. Sequence data of identified proteins were uploaded as a multiple FASTA file to the Blast2GO software. We performed the blast step against the public NCBInr database using blastp. Other parameters were kept at default values: an e-value threshold of 1e-3 and a recovery of 20 hits per sequence. Furthermore, the minimal alignment length (hsp filter) was set to 33 to avoid hits with matching regions smaller than 100 nucleotides. QBlast-NCBI was set as Blast mode. An annotation configuration with an e-value-hit-filter of 1.0E-6, an Annotation CutOff of 55 and a GO weight of 5 was selected. To group all identified proteins into selected subgroups of GO categories (molecular function and biological process) we used the combined graph analysis tool. To obtain a compact representation of the information, we selected a sequence filter of 20 [72]. The sequence information of the proteins in every GO subgroup can be exported as a text file.
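The effect of the blast-step settings can be sketched as a simple filter over hit records. This is not Blast2GO code and does not call any Blast2GO or NCBI interface; the BlastHit record and the filter_hits function are hypothetical and only mimic the three settings named above (e-value threshold 1e-3, minimal HSP length 33 amino acids, at most 20 hits per query). Note that 33 amino acids correspond to 99 nucleotides, which is the origin of the stated limit of roughly 100 nucleotides.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical minimal record of one blastp hit."""
    query_id: str
    subject_id: str
    evalue: float
    hsp_length: int  # aligned length in amino acids


def filter_hits(hits, max_evalue=1e-3, min_hsp_length=33, max_hits=20):
    """Keep hits passing both thresholds, at most max_hits best hits per query."""
    by_query = {}
    for hit in hits:
        if hit.evalue <= max_evalue and hit.hsp_length >= min_hsp_length:
            by_query.setdefault(hit.query_id, []).append(hit)
    return {
        query_id: sorted(query_hits, key=lambda h: h.evalue)[:max_hits]
        for query_id, query_hits in by_query.items()
    }


if __name__ == "__main__":
    # Made-up example hits, not results from this study.
    example = [
        BlastHit("contig_1", "subject_a", 1e-30, 120),
        BlastHit("contig_1", "subject_b", 1e-5, 20),   # alignment too short
        BlastHit("contig_1", "subject_c", 0.5, 200),   # e-value too high
    ]
    kept = filter_hits(example)
    print([hit.subject_id for hit in kept["contig_1"]])
```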

Protein Domain Analysis of Proteins without Annotation
Six-frame translations of the cDNA clusters constructed by the authors were run through the DomainSweep pipeline [73], and the significant and putative hits were collected. For each of the protein/domain databases used, different thresholds and rules were established [73]. Domain hits are listed as 'significant' in either of the following two cases:
i. Two or more hits belong to the same INTERPRO [74] family. The task compares all true positive hits of the different protein family databases and groups together those hits which are members of the same INTERPRO family/domain.
ii. The motifs show the same order as described in PRINTS [75] or BLOCKS [76]. Both databases characterize a protein family by a group of highly conserved motifs/segments in a well-defined order. The task compares the order of the identified true positive hits with the order described in the corresponding PRINTS or BLOCKS entry. Only hits in the correct order are accepted.
All other hits above the trusted thresholds are listed as 'putative'. By comparing the peptides identified by mass spectrometry with the six-frame translations, the correct frame and the associated domain information were determined and listed.
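The selection of the reading frame by means of the identified peptides can be sketched as follows. The Python code translates a nucleotide sequence in all six frames using the standard genetic code and counts, for each frame, how many of the identified peptides occur in the translation. It is not part of the DomainSweep pipeline; all function names and the example sequence are hypothetical, and details such as ambiguous bases (translated as 'X' here) are handled in the simplest possible way.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return a dict frame name ('+1'..'+3', '-1'..'-3') -> protein string."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


def peptide_support(seq, peptides):
    """Count, per frame, how many identified peptides occur in its translation."""
    return {
        name: sum(peptide in protein for peptide in peptides)
        for name, protein in six_frame_translations(seq).items()
    }


if __name__ == "__main__":
    contig = "ATGGCCAAGTTTGAACGT"  # made-up sequence, not from the database
    support = peptide_support(contig, ["MAK", "FER"])
    best_frame = max(support, key=support.get)
    print(best_frame, support[best_frame])
```

The frame with the highest peptide support is then the one whose domain hits are reported.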