The Kinome of Pacific Oyster Crassostrea gigas, Its Expression during Development and in Response to Environmental Factors

Oysters play an important role in estuarine and coastal marine habitats, where the majority of humans live. In these ecosystems, environmental degradation is substantial, and oysters must cope with highly dynamic and stressful environmental constraints during their lives in the intertidal zone. The availability of the genome sequence of the Pacific oyster Crassostrea gigas represents a unique opportunity for a comprehensive assessment of the signal transduction pathways that the species has developed to deal with this unique habitat. We performed an in silico analysis to identify, annotate and classify protein kinases in C. gigas, according to their kinase domain taxonomy classification, and compared with kinome already described in other animal species. The C. gigas kinome consists of 371 protein kinases, making it closely related to the sea urchin kinome, which has 353 protein kinases. The absence of gene redundancy in some groups of the C. gigas kinome may simplify functional studies of protein kinases. Through data mining of transcriptomes in C. gigas, we identified part of the kinome which may be central during development and may play a role in response to various environmental factors. Overall, this work contributes to a better understanding of key sensing pathways that may be central for adaptation to a highly dynamic marine environment.

The Pacific oyster Crassostrea gigas belongs to one of the most species-rich but genomically poorly explored phyla, the Mollusca. Here we report the sequencing and assembly of the oyster genome using short reads and a fosmid-pooling strategy, along with transcriptomes of development and stress response and the proteome of the shell. The oyster genome is highly polymorphic and rich in repetitive sequences, with some transposable elements still actively shaping variation. Transcriptome studies reveal an extensive set of genes responding to environmental stress. The expansion of genes coding for heat shock protein 70 and inhibitors of apoptosis is probably central to the oyster's adaptation to sessile life in the highly stressful intertidal zone. Our analyses also show that shell formation in molluscs is more complex than currently understood and involves extensive participation of cells and their exosomes. The oyster genome sequence fills a void in our understanding of the Lophotrochozoa.
Oceans cover approximately 71% of the Earth's surface and harbour most of the phylum diversity of the animal kingdom. Understanding marine biodiversity and its evolution remains a major challenge. The Pacific oyster C. gigas (Thunberg, 1793) is a marine bivalve belonging to the phylum Mollusca, which contains the largest number of described marine animal species 1 . Molluscs have vital roles in the functioning of marine, freshwater and terrestrial ecosystems, and have had major effects on humans, primarily as food sources but also as sources of dyes, decorative pearls and shells, vectors of parasites, and biofouling or destructive agents. Many molluscs are important fishery and aquaculture species, as well as models for studying neurobiology, biomineralization, ocean acidification and adaptation to coastal environments under climate change 2,3 . As the most speciose member of the Lophotrochozoa, phylum Mollusca is central to our understanding of the biology and evolution of this superphylum of protostomes.
As sessile marine animals living in estuarine and intertidal regions, oysters must cope with harsh and dynamically changing environments. Abiotic factors such as temperature and salinity fluctuate wildly, and toxic metals and desiccation also pose serious challenges. Filter-feeding oysters face tremendous exposure to microbial pathogens. Oysters do have a notable physical line of defence against predation and desic-cation in the formation of thick calcified shells, a key evolutionary innovation making molluscs a successful group. However, acidification of the world's oceans by uptake of anthropogenic carbon dioxide poses a potentially serious threat to this ancient adaptation 4 . Understanding biomineralization and molluscan shell formation is, thus, a major area of interest 5 . Crassostrea gigas is also an interesting model for developmental biology owing to its mosaic development with typical molluscan stages, including trochophore and veliger larvae and metamorphosis.
A complete genome sequence of C. gigas would enable a more thorough understanding of oyster biology and the evolution of Lophotrochozoa. One of the main challenges, however, is the high levels of polymorphism present in oysters and many marine invertebrates [6][7][8] . To overcome this, an oyster derived from four generations of full-sibling mating (coefficient of inbreeding, F 5 0.59) was used for genome sequencing and assembly (Supplementary Text B1) through fosmid pooling, next-generation sequencing (NGS) and hierarchical assembling. Combining these genomic data with transcriptomes from different organs, different developmental stages and adults challenged with stressors, in addition to mass spectrometric analysis of shell proteins, allowed us to explore characteristics of the oyster genome and key aspects of molluscan biology related to stress response and shell formation.
Sequencing and hierarchical assembly NGS technology has been successfully applied for de novo genome sequencing and assembly using whole-genome shotgun strategies [9][10][11][12][13] . We initially generated 155-fold Illumina whole-genome shotgun reads (Supplementary Table 1), but could not adequately assemble them owing to high levels of polymorphism and abundant repetitive sequences (Supplementary Text B2 and Supplementary Fig. 1). As possible alternative sequencing strategies-such as the addition of longer Roche 454 reads 12,13 or traditional bacterial artificial chromosome (BAC)-to-BAC sequencing-are expensive, we opted instead for a more cost-effective fosmid-pooling strategy. In brief, a fosmid library was constructed, and 145,170 clones (,tenfold genome coverage) were evenly and randomly assigned into 1,613 pools, each of which was sequenced to ,60-fold depth and assembled separately ( Fig. 1 Table 3). Over 90% of the assembly was covered by the longest 1,670 (14%) scaffolds.
To assess the completeness of the assembly, 105-fold coverage of short-insert library reads (,2 kb) that participated in assembly (Supplementary Table 1) were aligned against the assembly. Over 99% of these reads were successfully mapped, using a combination of Burrows-Wheeler Aligners 15 and the more sensitive LASTZ (Supplementary Fig. 5 and Supplementary Table 4). The integrity of the assembly is further demonstrated by the successful mapping of 99% of the BAC sequenced obtained using the Sanger sequencing technique, and 98% of ,68,000 expressed sequence tags from 454 sequencing (Supplementary Text B5, Supplementary Fig. 6 and Supplementary  Tables 5 and 6). Fosmid pooling has been used for re-sequencing 16,17 , and our results show that the combination of fosmid pooling, NGS and hierarchical assembly provides a new, cost-effective alternative for de novo sequencing and assembly of complex genomes.

Polymorphism and repetitive sequences
To understand polymorphism in the oyster genome, we analysed allelic variation in the assembled genome (inbred) and one resequenced wild oyster (wild) (Supplementary Text C1). The inbred genome contained 3.1 million single-nucleotide polymorphisms and 258,405 short insertion/deletion (indels, 1-40 base pairs (bp)) yielding a sequence polymorphism rate of 0.73%, whereas the wild genome had 3.8 million single-nucleotide polymorphisms and 238,182 indels, or a polymorphism rate of 1.3% (Supplementary Table 7), comparable to previous estimates 18 . This 44% reduction in polymorphism in the inbred genome is smaller than the 59.4% predicted from four generations of brother-sister mating, indicating that selection favouring heterozygotes had occurred 19 . The polymorphism combining inbred and wild (among four haplotypes) was 2.3%, higher than that in most studied animal genomes 20,21 but comparable to that in known high-polymorphism species 7 . In inbred and wild, we found 3,094 short indels located in coding regions inferred to cause frameshift variants in 2,665 genes, providing an important source for recessive lethal mutations.
k-mer-based analysis of the oyster genome showed that ,35% of 17-mers had at least two identical copies in the genome, suggesting an abundance of repetitive sequences ( Supplementary Fig. 1). Similarly, homology searching and ab initio prediction found 202 Mb (36% of the genome) in repetitive sequences (Supplementary Text C2 and  Supplementary Table 8). Over 62% of the detected repeats could not be assigned to known categories, reflecting the paucity of genomic information from molluscs 22 . Large numbers of transposase (359) and reverse-transcriptase (779) gene fragments were detected; over 96% of these had detectable transcripts ( Supplementary Fig. 8). Alignment of the wild sequence against the assembly identified 20,605 deletions (.100 bp), over 80% of which overlapped with detected transposable elements, suggesting that transposable elements may have an active role in shaping genome variation. Using MITE-hunter 23 , we detected 157,007 copies of miniature inverted-repeat transposable elements (MITEs), accounting for a remarkable 8.82% of the genome (Supplementary Text C2.3 and Supplementary Table 9). Pair-wise comparisons show extremely low sequence divergence in some MITE families (Supplementary Fig. 9), indicating that they may still be active.

Gene annotation and developmental genomics
A total of 28,027 genes were predicted encoding 50 amino acids or more by combining de novo prediction and evidence-based searches using reference genomes, oyster expressed sequence tags and transcriptomes from multiple organs and developmental stages (Supplementary Text D1 and E1 and Supplementary Fig. 11 Supplementary Fig. 15).
One notable finding of developmental interest is that the oyster Hox gene cluster is broken into four sections ( Fig. 2) with flanking non-Hox genes ( Supplementary Fig. 16). We did not find a clear  DNA was randomly sheared into fragments. a, b, A 40-kb-insert fosmid library was constructed (a), and 145,170 fosmid clones were randomly selected and assigned into 1,613 pools, each containing 90 clones covering 0.57% of the diploid genome (b). For each pool, three Illumina short-insert barcoded libraries (two 200 bp and one 500 bp) were constructed and ,60-fold coverage of 90-bp reads (20-fold per library) were generated, and assembled using SOAPdenovo with optimizing parameters. Assemblies from each pool were further corrected and reassembled if unexpected connections were detected owing to high similarity sequences from different fosmids, and gaps were filled by local assembly. c, Fosmid scaffolds were split into contigs at unfilled regions, leaving no undetermined bases in the sequences. Each base was assigned a Phred-like quality score determined by its coverage and alignment mismatches, and these sequences were merged into supercontigs using the overlap layout consensus method. Redundancy was removed using self-to-self alignment and sequencing depth information. d, Whole-genome shotgun Illumina libraries (200-bp to 20-kb inserts) from sheared genomic DNA were constructed for mated-pair Illumina sequencing. e, The fosmid supercontigs were linked into scaffolds using (1) the whole-genome shotgun sequences; (2) inferred pairedend information extracted from assembled pool scaffolds with a span size ranging from 50 bp to 37.5 kb; and (3) 225,000 fosmid ends sequenced using Sanger technology.

RESEARCH ARTICLE
Antennapedia gene, which is present in other bivalves such as Pecten and Yoldia 24 ( Supplementary Fig. 17). Disruption of the Hox cluster, as also observed in tunicates, nematodes and drosophilids, has been attributed to the loss of temporal co-linearity and modified developmental control 25 . Supporting this model, we find that Hox genes in the oyster are not activated in an order matching their identity or genomic position, with, for example, HOX4 and HOX1 peaking before gastrulation, LOX5 and POST2 during the trochophore stage and HOX5 during the pediveliger stage (Supplementary Fig. 18 and Supplementary Table 15).

Adaptation to environmental stress
Comparison with seven other sequenced genomes identified 8,654 oyster-specific genes (Supplementary Text E3.1) that are probably important in the evolution and adaptation of oysters and other molluscs. With oysters being the only representative, these genes could be shared by other molluscs. Among these genes, gene ontology terms related to 'protein binding', 'apoptosis', 'cytokine activity' and 'inflammatory response' are highly enriched (P , 0.0001; Supplementary Text E2 and Supplementary Table 17), indicating over-representation of some host-defence genes against biotic and abiotic stress. Manual examination shows that several gene families related to defence pathways, including protein folding, oxidation and anti-oxidation, apoptosis and immune responses, are expanded in C. gigas ( Fig. 3a and Supplementary Table 18). The oyster genome contains 88 heat shock protein 70 (HSP70) genes, which have crucial roles in protecting cells against heat and other stresses, compared with ,17 in humans and 39 in sea urchins. Phylogenetic analysis finds clustering of 71 oyster HSP70 genes to themselves, suggesting that the expansion is specific to the oyster (Supplementary Fig. 19). Also expanded are cytochrome P450 (Supplementary Fig. 20) and multi-copper oxidase gene families, which are important in the biotransformation of endobiotic and xenobiotic chemicals 26 , and extracellular superoxide dismutases, which are important in defence against oxidative stress. The oyster genome has 48 genes coding for inhibitor of apoptosis proteins (IAPs), compared with 8 in humans and 7 in sea urchins, indicating a powerful anti-apoptosis system in oysters. Genes encoding lectin-like proteins, including C-type lectin, fibrinogen-related proteins and C1q domain-containing proteins (C1QDCs), are highly over-represented in the oyster genome (P , 0.0001; Supplementary  Table 18); these genes have important roles in the innate immune response in invertebrates [27][28][29] . Interestingly, many immune-related genes, including genes coding for Gram-negative bacteria-binding proteins, peptidoglycan-recognition proteins, defensin, C-typelectin-domain-containing proteins and C1QDCs, are highly expressed in the digestive gland ( Supplementary Fig. 21), indicating that the digestive system of this filter feeder is an important first-line defence organ against pathogens.
To investigate genome-wide responses to stress, we sequenced 61 transcriptomes from C. gigas subjected to nine stressors, including temperature, salinity, air exposure and heavy metals (Supplementary Text G1 and Supplementary Tables 19 and 20). We found that 5,844 genes were differentially expressed under at least one stressor, and genes responding to different stressors showed significant overlap ( Fig. 3b and Supplementary Fig. 23a). Air exposure induced a response from the largest number of genes (4,420), indicating that air exposure is a major stressor and that oysters have evolved an extensive gene set in defence. Genes differentially expressed in response to stress are more likely to have paralogues (Fig. 3c), suggesting that expansion and selective retention of duplicated defence-related genes are probably important to oyster adaptation. Under most stressors, genes coding for HSPs, histones, IAPs and protein biogenesis were upregulated, and those for protein degradation downregulated, pointing to concerted responses to maintain cellular homeostasis 30 (Supplementary Text G3  and Supplementary Table 21). Genes involved in the unfolded protein response to cellular stress in the endoplasmic reticulum (coding for calreticulin, calnexin, 78-and 94-kDa glucose-regulated proteins) were upregulated, indicating that protein quality control is critical in cellular homeostasis under stress.
Air exposure induced up to 67-fold upregulation of five highly expressed IAPs (Supplementary Fig. 24a). Other inhibitors of apoptosis were also upregulated: BCL2 up to fourfold and BAG up to 12-fold ( Supplementary Fig. 24b). These apoptosis inhibitors were also highly upregulated under heat and low salinity stress. These findings, along with the expansion of IAPs, suggest that a powerful antiapoptosis system exists and may be critical for the amazing endurance of oysters to air exposure and other stresses. The existence of an intrinsic apoptosis pathway in invertebrates has been controversial, and parts of the pathways have only recently been demonstrated for two lophotrochozoans 31,32 . The finding of key genes belonging to both intrinsic (BAX, BAK, BAG, BCL2, BI1 and procaspase) and extrinsic (TNFR and caspase 8) apoptosis pathways indicates that oysters have advanced apoptosis systems. Powerful inhibition of apoptosis as shown by genomic and transcriptomic analyses may be central to the ability of oysters to tolerate prolonged air exposure and other stresses.
Heat stress induced a ,2,000-fold increase in expression of five highly inducible HSP70 genes or a 13.9-fold increase in average expression of all HSP70 genes, amounting to 4.2% of all transcripts ( Supplementary Figs 24c and 25). The genomic expansion and massive upregulation of HSP genes help to explain why C. gigas can tolerate temperatures as high as 49 uC when exposed to summer sun at low tide 33 . HSP genes were also upregulated under other stressors and may be central to the oyster defence against all stresses (Supplementary Fig. 25). HSP genes may also inhibit apoptosis by binding to effector caspases 34 .
Genes involved in signal transduction, including genes coding for G-protein-coupled receptors and Ras GTPase, were also activated by stressors (Supplementary Fig. 24f) and over-represented in the oyster genome (Supplementary Table 11). These regulators may have a role in orchestrating stress responses, which seem to be well coordinated ( Fig. 3a and Supplementary Fig. 25). The expansion of key defence genes and the strong, complex transcriptomic response to stress highlight the sophisticated genomic adaptations of the oyster to sessile life in a highly stressful environment.

Shell formation
Calcified shells provide critical protection against predation and desiccation in sessile marine animals such as oysters. Molluscan shells consist of calcium carbonate (CaCO 3 ) crystals of either aragonite or calcite embedded in an elaborate organic matrix. Two models have been advanced for molluscan shell formation. The matrix model posits that mineralization occurs in a mantle-secreted matrix of chitin, silk fibroin and acidic proteins 35,36 . Chitin and silk proteins are proposed to provide matrix structure, whereas acidic proteins control the nucleation and growth of CaCO 3 crystals. The cellular model suggests that biomineralization is cell-mediated; that is, crystals are formed in haemocytes and then deposited at the mineralization front 37 .
We searched the oyster genome for genes implicated in shell formation in previous studies and examined their expression in different tissues and at different stages (Supplementary Text H1, 2). We also sequenced peptides from shells, mapped them to the genome and identified 259 shell proteins (Supplementary Text H3 and  Supplementary Table 24). Although our search found evidence for the involvement of chitin, we did not find any silk-like proteins encoded in the oyster genome (Supplementary Text H2) but found, instead, many diverse proteins that may have roles in matrix construction and modification. Notably, a gene coding for a fibronectinlike protein was highly expressed at the early developmental stage, when larval shells are formed, in unison with chitin synthase (Fig. 4a) and was mostly expressed in the adult mantle (403 other organ average; Fig. 4b); the fibronectin-like protein was among the most abundant proteins found in oyster shells. Genes coding for laminin and some collagen proteins were also highly expressed in the mantle (Supplementary Fig. 27a) and found in shells. These are typical extracellular matrix (ECM) proteins, and their presence in shells suggests that the shell matrix has similarities to the ECM of animal connective tissues and basal lamina. Unlike silk fibroins that can self assemble 38 , the formation of fibronectin fibrils in the ECM is cell mediated 39 . Oyster fibronectin-like proteins have five type-III domains for integrin binding and cell adhesion. Genes coding for integrins were highly expressed in haemocytes (43 other organ average, Supplementary Fig. 27b). Thus, haemocytes may organize fibronectin-like fibril formation in the shell matrix as they do in ECM.
The involvement of cells in shell formation is further supported by the functional diversity of proteins detected in shells. Many housekeeping proteins, such as elongation factor 1a and ribosomal proteins, were found in the shell; indeed, most oyster shell proteins are not structural proteins but are distributed in diverse metabolic pathways ( Fig. 4c and Supplementary Table 25). This functional diversity of shell proteins mirrors that of cells, which is unexpected under the matrix model. Furthermore, 84% of the 259 shell proteins identified are not classical secreted proteins (Supplementary Text H3.4 and  Supplementary Table 24); they may be part of cells or deposited by exosomes 40 . Supporting the presence of exosomes, 61 of the 259 shell proteins matched proteins in the exosome database 41 . Cells and exosome-like vesicles containing calcite crystals have been observed at the mineralization front 37,42 , although their significance in shell formation is debated. This study provides molecular evidence for their presence inside shells and their probable participation in shell formation.
Many shell proteins are enzymes that may be involved in matrix construction or modification. A homologue of penicillin-binding protein is exclusively expressed in mantle (723 other organ average) and highly abundant in shells ( Supplementary Fig. 27d). Penicillinbinding protein is a transpeptidase that crosslinks glycopeptides in bacterial cell walls 43 and may have similar functions in the shell matrix. Another notable enzyme found is tyrosinase. The oyster

RESEARCH ARTICLE
genome has an expanded set of 26 genes coding for tyrosinase, compared with one in Caenorhabditis elegans and two in humans; most genes coding for tyrosinase are mantle specific (Fig. 4d) and highly enriched among shell proteins (P 5 8 3 10 26 ). Although tyrosinase is a key enzyme in melanogenesis 44,45 , it is most highly expressed in the non-pigmented pallial mantle (Fig. 4d), indicating that it has other functions in the oyster. The mantle secretes tyrosinerich proteins 46 , and oxidation of tyrosine may be essential for shell matrix maturation. Several proteinases and proteinase inhibitors are highly mantle specific and abundant in shells, and may be involved in matrix formation, modification and protection (Supplementary  Table 24). Together, these results indicate that oyster shell matrix is not formed simply by self-assembling silk-like proteins but by diverse proteins through complex assembly and modification processes that may involve haemocytes and exosomes.

Concluding remarks
We sequenced and assembled the genome of the Pacific oyster using an inbred individual, short-read NGS and a new fosmid-pooling and hierarchical assembly strategy. The draft assembly provided insight into a molluscan genome characterized by high polymorphism, abundant repetitive sequences and active transposable elements. Genomic, transcriptomic and proteomic analyses show unique adaptations of oysters to sessile life in a highly stressful intertidal environment and the complexity of shell formation. The oyster genome sequence and comprehensive transcriptome data provide valuable resources for studying molluscan biology and lophotrochozoan evolution, and for genetic improvement of oysters and other important aquaculture species.

METHODS SUMMARY
The sequenced Pacific oyster is an inbred female produced by four generations of brother-sister mating. Genome sequences were produced with Illumina platform using fosmid pooling and assembled with a new hierarchical assembly strategy. Fosmid ends were sequenced by Sanger. Gene models were obtained by integrating results of de novo gene prediction and alignment-based methods based on homology and transcriptomic evidence. Transcriptomes were sequenced with Illumina platform. The proteome of the shell was obtained by mass spectrometry. All methods are described in detail in the Supplementary