Splice Variants of Perlucin from Haliotis laevigata Modulate the Crystallisation of CaCO3

Perlucin is one of the proteins of the organic matrix of nacre (mother of pearl) and plays an important role in biomineralisation. The nacreous layer is found predominantly in mollusc lineages and has been studied most intensively as a component of the shell of the marine Australian abalone Haliotis laevigata. A more detailed analysis of Perlucin will help elucidate some of the still unknown processes in the complex interplay of the organic and inorganic components involved in the formation of nacre, a composite material of great interest not only from a life-science point of view. In this study we discovered three previously unknown Perlucin splice variants of the Australian abalone H. laevigata. The amplified cDNAs range from 562 to 815 base pairs, and the resulting translation products differ predominantly in the absence or presence of a varying number of a C-terminal 10-mer peptide repeat. The splice variants were further confirmed by matrix-assisted laser desorption ionisation time-of-flight mass spectrometry (MALDI-ToF MS) analysis of endogenous Perlucin purified from decalcified abalone shell. Interestingly, the different variants, expressed in E. coli as maltose-binding protein (MBP) fusion proteins, differed strongly in their influence on CaCO3 precipitation, and these differences might be due to a splice variant-specific formation of large protein aggregates influenced by the number of 10-mer peptide repeats. By demonstrating that Perlucin splice variants modulate the crystallisation of calcium carbonate, our results point to a more complex functional regulation of Perlucin. The identification of differentially behaving Perlucin variants may open a new perspective for the field of nacre biomineralisation.


Introduction
The mollusc shell consists of two calcium carbonate polymorphs: an outer layer of calcite and an inner iridescent layer of aragonite, or nacre [1]. Nacre itself consists of aragonite tablets arranged like a "stack of coins" [2,3]. These aragonite tablets are surrounded by the organic matrix, which contains chitin and proteins, such as silk fibroin-like proteins, secreted by the mantle epithelium [4,5]. The interplay of the organic matrix and aragonite underlies the structural robustness of nacre [6], which makes it a composite of interest for materials science. Amorphous CaCO3 is probably formed initially as a precursor of crystal growth [7,8]. If nacre nucleation is induced on glass slides, a sheet of proteins is first secreted onto the substrate and then overlaid by a sheet of CaCO3 calcite. Once this layer is confluent, an abrupt production of aragonite sets in [9]. Belcher et al. [10] showed that regulation, nucleation, growth and aggregation do not require pre-organised organic arrays and that polyanionic proteins are required for aragonite crystal formation. Early shell formation is characterised by predominantly irregular growth of calcite with low protein expression levels, followed by more regular growth of calcite on top of the first nacre layer, controlled by the organic matrix and mediated by a significant increase in protein expression [11]. The water-soluble proteins of abalone nacre studied so far include Perlucin [12–14], Perlustrin [13,15], Perlwapin [16], Perlinhibin [17] and Perlbikunin from H. laevigata, and Lustrin A [18–21], AP7, AP8 [22] and AP24 [20,23–25] from H. rufescens. Further proteins reported to be involved in biomineralisation are Nacrein [26], the N16 family (N16-1, N16-2, N16-3) [27,28], N66, N14 [29], MPP1 [30], and Pif 97 and Pif 80 [31]. Perlucin from H. laevigata was first purified from the nacreous layer and characterised as a water-soluble protein of 155 amino acids [12,13].
Based on the amino acid sequence, a synthetic gene has been designed [GenBank: FB705690.1], although the natural coding sequence had not yet been determined. Perlucin is a C-type lectin whose lectin domain spans the first 130 amino acids and is followed by two repeats of ten amino acids each [12]. It has one predicted N-glycosylation site at position 84. Furthermore, it binds lactose and mannose, most likely via its lectin domain [12]; however, the physiological function of this activity has not yet been investigated.
Together with other proteins of the organic matrix, Perlucin has been proposed to nucleate aragonite crystals [13,32] by binding to aragonite rather than calcite [33]. In calcium carbonate precipitation experiments, native Perlucin clearly accelerates the precipitation of CaCO3 [13]. Atomic force microscopy has shown that 100 µg/mL Perlucin supports nucleation on calcite in saturated CaCO3 solutions [32]. Moreover, when Perlucin is dialysed against a saturated CaCO3 solution, the resulting CaCO3 crystals incorporate Perlucin, as recently confirmed by Weber et al. [34] using green fluorescent protein (GFP)-tagged Perlucin. Purified native Perlucin always appears very heterogeneous, which raised the idea that further variants exist besides the described protein [12]. Here we describe the cloning and expression of cDNAs encoding Perlucin from the mantle epithelium of H. laevigata. Three new splice variants of Perlucin were identified, which differ mainly in the number of a 10-amino-acid repeating unit at their C-termini. It has recently been proposed that aragonite-associated proteins have evolved signature sequence traits of intrinsic disorder, and a predicted "disordered domain" has been identified within the repeat region of a Perlucin variant described here [35]. Interestingly, the number of repeats in recombinant Perlucin strongly influences the precipitation behaviour of CaCO3, suggesting specific physiological roles of the splice variants in the regulation of crystal growth.

RNA Isolation and cDNA Synthesis
A fresh specimen of the Australian abalone H. laevigata was obtained as a kind gift from Great Southern Waters (Great Southern Waters Pty. Ltd., The Esp Indented Head, Victoria, Australia). The 10 cm specimen was cooled on ice for 1 h. The mantle epithelium was then dissected and immediately frozen in liquid nitrogen. The deep-frozen mantle epithelium was homogenised and total RNA was isolated using the SV Total RNA Isolation System (Promega, Madison, USA) following the manufacturer's instructions. RNA concentration and quality were measured using a Nanodrop (Peqlab, Erlangen, Germany). First-strand cDNA was synthesised from 1 ng of total RNA using RevertAid M-MuLV Reverse Transcriptase (Fermentas, Thermo Fisher, Waltham, MA, USA) and an oligo(dT) primer following the manufacturer's protocol. Degenerate primers (sense 5′-TGYTACTGGTTCWSN-3′; antisense 5′-YTTYTGRCAYTGRTARTCRTTCCA-3′) were designed based on the amino acid sequence of Perlucin (GenBank: P82596) from the Australian abalone H. laevigata. PCR was performed using 1 µL of first-strand cDNA and GoTaq polymerase (Promega, Madison, USA) at an annealing temperature of 50 °C. As a control we used actin primers (sense 5′-GTCACCAACTGGGACGA-3′; antisense 5′-ACCTGACCTCGGGAA-3′) based on the available sequence of H. tuberculata actin (GenBank: AM236595.1). PCR products were ligated into the pJET1.2 vector (Fermentas) and transformed into E. coli strain XL1-Blue; clones were sequenced (Max Planck Institute for Marine Microbiology, Bremen, Germany) and the sequences were edited using the Geneious software [36].
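The degenerate primers above use IUPAC ambiguity codes (Y = C/T, W = A/T, S = C/G, R = A/G, N = any base), so each primer is really a pool of oligonucleotides. The following is a minimal Python sketch, assuming the standard IUPAC nucleotide table, that counts how many distinct sequences each pool contains and can enumerate them; the function names are illustrative only.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct concrete sequences encoded by a degenerate primer."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base])
    return n


def expand(primer: str) -> list[str]:
    """Enumerate every concrete oligonucleotide in a degenerate primer pool."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(seq) for seq in product(*choices)]


# Degenerate Perlucin primers from the Methods (written 5' -> 3')
sense = "TGYTACTGGTTCWSN"
antisense = "YTTYTGRCAYTGRTARTCRTTCCA"
print(degeneracy(sense), degeneracy(antisense))
```

With these codes the sense primer corresponds to 32 sequences (Y, W and S two-fold, N four-fold) and the antisense primer to 128 (seven two-fold positions), which is one way to judge how much the effective concentration of any single matching primer is diluted.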

RACE-PCR
To obtain full-length Perlucin cDNA we used rapid amplification of cDNA ends (RACE) as described [37], following the manufacturer's instructions (Roche Applied Science, Mannheim, Germany). To determine the 5′ ends we used a sequence-specific primer (antisense SP1 5′-GTTCAGATCAGAAGCACCAAG-3′) and a second sequence-specific primer (antisense SP2 5′-GACCAAGCCAGTAATTGAAAGC-3′). To determine the 3′ ends we used a third specific primer (sense SP3 5′-ATATGGCTGTGGGAAGGACAACGCC-3′).
To obtain MBP carrying a C-terminal Strep-tag in addition to the N-terminal His-tag, we amplified His-MBP using the oligonucleotides (sense 5′-GCCCATGGGCCATCACCATCACCATCAC-3′) and (antisense 5′-GCGAATTCGCCCTGAAAATAAAGATTCTC-3′) with the pETM 41 vector as template. The product, carrying the NcoI and EcoRI sites introduced by the oligonucleotides, was cloned into the NcoI/EcoRI-digested pET28b HDGF-Strep-tag plasmid [38]. Digestion with NcoI/EcoRI had removed the coding sequence for HDGF, while the coding sequence for the Strep-tag remained in the plasmid.
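A quick way to check that the His-MBP oligonucleotides carry the intended cloning sites is to scan them for the recognition sequences of NcoI (CCATGG) and EcoRI (GAATTC). The sketch below is a plain string search in Python, not a substitute for a cloning tool; positions are 0-based and `find_sites` is a hypothetical helper name.

```python
# Recognition sequences of the two enzymes named in the Methods (both palindromic)
SITES = {"NcoI": "CCATGG", "EcoRI": "GAATTC"}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of each recognition site in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in SITES.items():
        hits[enzyme] = [
            i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site
        ]
    return hits


# His-MBP primers from the Methods, written 5' -> 3'
sense = "GCCCATGGGCCATCACCATCACCATCAC"
antisense = "GCGAATTCGCCCTGAAAATAAAGATTCTC"
print(find_sites(sense))
print(find_sites(antisense))
```

Because both sites are palindromic, scanning only the given strand is sufficient. Each primer carries its site right after two extra 5′ bases, and the sense primer continues with six CAT/CAC codons encoding the His-tag.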

Phylogenetic Analysis
DNA sequences from Haliotis spec. were obtained by BLASTN searches with the Perlucin-R8 cDNA and by screening GenBank for Perlucin from Haliotis spec. For comparison, a range of functionally diverse C-type lectins from vertebrates and invertebrates was selected (Table 1). Full-length protein sequences were first aligned using the T-Coffee server [39] and then truncated at the N- and C-termini so that each fragment started at the first and ended at the last cysteine of the C-type lectin domain. The DNA sequences encoding these protein fragments were back-translated in RevTrans, version 2.0 [40] (http://www.cbs.dtu.dk/services/RevTrans-2.0/web/) and aligned based on the T-Coffee alignment obtained for the amino acid sequences (alignment shown in Supporting Information S1). The best-fitting model for the phylogenetic reconstruction (HKY85 substitution model [41] with gamma-distributed rates in 4 categories and a proportion of invariant sites, HKY85+G+I) was determined using MEGA5 [42] and applied in the phylogenetic calculations with the MrBayes 3.1 [43] plug-in of Geneious [36], using type II antifreeze glycoprotein from Atlantic herring (Clupea harengus) [GenBank: S65819.1] as outgroup.
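RevTrans-style back-translation keeps a DNA alignment in codon frame by threading each sequence's coding DNA onto its aligned protein: every residue consumes one codon and every gap becomes a triple gap. The Python sketch below illustrates that idea together with cutting the alignment between the first and last cysteine. It is not RevTrans itself; defining the cysteine window from a single reference row is an assumption, and the toy sequences are hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map a gapped protein alignment row onto its ungapped coding DNA.

    Each residue consumes the next codon of `cds`; each '-' becomes '---',
    so protein column i corresponds to DNA columns 3*i .. 3*i + 2.
    """
    residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) != 3 * residues:
        raise ValueError("CDS length must equal 3 x number of residues (no stop codon)")
    out, pos = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


def cys_window(aligned_ref: str) -> tuple[int, int]:
    """Alignment columns of the first and last cysteine in a reference row."""
    return aligned_ref.index("C"), aligned_ref.rindex("C")


# Hypothetical toy row and CDS (ATG TGT TGG GGT TGC = M C W G C)
prot_row = "MC-WGC"
cds = "ATGTGTTGGGGTTGC"
dna_row = thread_codons(prot_row, cds)
start, end = cys_window(prot_row)
print(dna_row)
print(dna_row[3 * start:3 * (end + 1)])  # DNA columns of the Cys-to-Cys window
```

Because gaps are inserted as whole codons, a truncation chosen on the protein alignment translates directly into a truncation of the DNA alignment by multiplying column indices by three.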
Germany) were grown in DMEM supplemented with 10% foetal bovine serum, 100 U mL−1 penicillin and 50 µg mL−1 streptomycin at 37 °C in a humidified atmosphere of 5% (vol/vol) CO2. For transient transfection, cells were grown to 80% confluence. Directly before transfection, the medium was changed from 10% to 2% foetal bovine serum. Transfection was carried out using polyethylenimine (Sigma-Aldrich, St. Louis, MO, USA) as described [38]. For protein expression, COS-7 cells were transfected with the indicated amount of pcDNA3 Perlucin plasmids, and the supernatants were harvested after 24 h. Proteins in the cell culture supernatants were precipitated with 4 volumes of acetone overnight at −20 °C, pelleted by centrifugation at 20,000 × g for 20 minutes at 4 °C, dissolved in SDS-PAGE sample buffer and loaded on a 12% polyacrylamide gel.

Protein Expression in E. coli
Overnight cultures of transformed E. coli Rosetta (DE3) pLacI were used to inoculate 1000 mL of LB medium (1:200, vol/vol). The cultures were grown at 37 °C to an OD600 of 0.5 and then induced with 0.01 mM (final concentration) isopropyl-1-thio-β-D-galactopyranoside (Fermentas, Thermo Fisher, Waltham, MA, USA) and 10 mM glucose, followed by constant shaking overnight at 4 °C. Cells were collected by centrifugation at 5,400 × g for 15 min at 4 °C and resuspended in 2% of the original volume of 100 mM Tris/HCl, 500 mM NaCl, pH 8.0. The bacteria were sonicated on ice in four 30-second intervals until lysis was complete, and lysates were clarified by centrifugation at 20,000 × g for 30 min at 4 °C. The soluble recombinant protein was affinity purified first via its N-terminal His-tag using Ni-NTA beads (Macherey and Nagel, Düren, Germany) and then via its C-terminal Strep-tag using StrepTactin beads.

CaCO3 Precipitation Assay
The method was performed as previously described [8,45] with slight modifications. In brief, 250 µL of 20 mM NaHCO3, 500 mM NaCl, pH 8.7, with or without protein, were mixed under constant stirring with 250 µL of 20 mM CaCl2, 500 mM NaCl. MBP-Perlucin or MBP, dialysed against 20 mM NaHCO3, 500 mM NaCl, pH 8.7, was added to the NaHCO3 solution before the CaCl2 solution was added. BSA or setups without any additive served as controls. The pH was recorded every 5 seconds for at least one hour using a thin Ross Micro pH electrode (Thermo Fisher, Waltham, USA) and a pH meter (Portamess pH 910, Knick, Berlin, Germany) connected to a computer. After each measurement the pH electrode was washed for 1 hour with HCl/pepsin (AppliChem, Darmstadt, Germany), then regenerated in 3 M KCl and calibrated. For each splice variant three independent measurements were performed, and the mean and standard deviation were calculated. For the concentration series, only one measurement was performed per condition.
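Averaging the three replicate pH traces point by point gives the mean and standard deviation reported for each splice variant. The following is a minimal sketch using Python's `statistics` module, assuming all runs are sampled on the same 5-second grid and that the sample standard deviation (n − 1) is the intended statistic; the pH values are invented for illustration and `mean_sd_traces` is a hypothetical helper.

```python
import statistics


def mean_sd_traces(replicates: list[list[float]]) -> tuple[list[float], list[float]]:
    """Point-wise mean and sample SD of replicate pH traces on a common time grid.

    Traces are truncated to the shortest run; at least two replicates are needed
    because statistics.stdev requires two or more values.
    """
    n_points = min(len(r) for r in replicates)
    means, sds = [], []
    for i in range(n_points):
        values = [r[i] for r in replicates]
        means.append(statistics.mean(values))
        sds.append(statistics.stdev(values))
    return means, sds


# Three hypothetical runs, one reading every 5 s (illustrative values only)
runs = [
    [8.70, 8.69, 8.60, 8.20],
    [8.71, 8.70, 8.62, 8.25],
    [8.69, 8.68, 8.58, 8.15],
]
times_s = [5 * i for i in range(len(runs[0]))]
mean_ph, sd_ph = mean_sd_traces(runs)
for t, m, s in zip(times_s, mean_ph, sd_ph):
    print(f"{t:>4} s  pH {m:.3f} +/- {s:.3f}")
```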

Western Blot Analysis
For Western blotting, proteins were separated by SDS-PAGE (MiniProtean III; BioRad, Hercules, CA, USA) and transferred to a polyvinylidene difluoride (PVDF) membrane (Millipore, Merck KG, Darmstadt, Germany). The membrane was blocked for one hour with 5% BSA in Tris-buffered saline (TBS) containing 0.15% Tween 20™. Immunodetection was carried out overnight with a polyclonal rabbit anti-Strep-tag antibody (1:1000, IBA, Göttingen, Germany) in blocking buffer. After four washes with TBS, 0.15% Tween 20™, a horseradish peroxidase-conjugated donkey anti-rabbit secondary antibody (1:20,000, Dianova, Hamburg, Germany) was applied for two hours. After four further washes with TBS, 0.15% Tween 20™, the membrane was developed with an enhanced chemiluminescence system (Amersham, GE Healthcare, Chalfont St Giles, Buckinghamshire, UK) and exposed to X-ray film (Amersham).

MALDI-ToF MS
Following 2D electrophoresis, Coomassie Brilliant Blue-stained protein spots were excised from the gels and processed using Sigma's Trypsin Profile IGD Kit as recommended by the manufacturer (Sigma, Deisenhofen, Germany). The extracted peptides were further purified using C18 ZipTips (Millipore, Billerica, MA, USA) and premixed with a saturated matrix solution of α-cyano-4-hydroxycinnamic acid in 60% acetonitrile, 0.1% formic acid for spotting onto the target plate. MALDI-ToF MS analysis was performed on an Autoflex III mass spectrometer (Bruker Daltonics, Bremen, Germany) in positive-ion reflectron mode. For each spectrum, at least 500 shots were accumulated over a mass range of 600–3,500 Da. For calibration, a standard peptide mixture from Bruker Daltonics (Bremen, Germany) was spotted next to each sample and used as an external calibrant. After Perlucin-derived peptides had been identified by a search against the SWISS-PROT database, an internal recalibration was performed using these peptides to achieve the highest mass accuracy.
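Internal recalibration can be pictured as fitting a correction between the measured and theoretical m/z values of the identified Perlucin peptides and applying it to the whole spectrum. The Python sketch below assumes a simple linear correction (instrument TOF calibration models are usually of higher order) and uses hypothetical calibrant masses carrying a 50 ppm systematic offset; the function names are illustrative.

```python
def fit_linear(measured: list[float], theoretical: list[float]) -> tuple[float, float]:
    """Least-squares fit of theoretical = a * measured + b (needs >= 2 distinct masses)."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(theoretical) / n
    sxx = sum((x - mx) ** 2 for x in measured)
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, theoretical))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def recalibrate(mz: float, a: float, b: float) -> float:
    """Apply the fitted linear correction to one measured m/z value."""
    return a * mz + b


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Hypothetical calibrant peptides: theoretical m/z and readings shifted by +50 ppm
theoretical = [1000.0, 2000.0, 3000.0]
measured = [1000.05, 2000.10, 3000.15]
a, b = fit_linear(measured, theoretical)
print(ppm_error(2500.125, 2500.0), ppm_error(recalibrate(2500.125, a, b), 2500.0))
```

In this toy case the 50 ppm offset of an uncalibrated peak drops to essentially zero after recalibration, which is the effect an internal calibration on sample-derived peptides is meant to achieve.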

Liquid Chromatography Electrospray Ionisation Mass Spectrometry
For analysis, peptides eluted from acrylamide gels (see above) were separated using an Ultimate 3000 RSLCnano system (Dionex-LC Packings, Idstein, Germany). Samples were loaded onto a trapping column (Acclaim PepMap Nanotrap, 75 µm × 20 mm) by the loading pump of the system operating at 5 µL/min, with 0.1% trifluoroacetic acid in water as mobile phase. After 6 min, the valve was switched and the sample was eluted onto the analytical separation column (Acclaim PepMap RSLC, 75 µm × 150 mm) at a flow rate of 300 nL/min. The mobile phases were H2O/0.1% formic acid (v/v) (buffer A) and 80% ACN/0.1% formic acid (v/v) (buffer B). Peptides were resolved by gradient elution using a gradient of 2–55% buffer B over 30 min, followed by a gradient of 50–90% buffer B over 2 min. After 5 min at 90% B the gradient returned to 5% buffer B in preparation for the next run. Column effluent was monitored using a 3 nL UV flow cell (214 nm). Mass spectrometric analysis was done by ESI-MS/MS using an LTQ-Orbitrap Velos mass spectrometer (Thermo Fisher, Bremen, Germany) equipped with a nano-electrospray ion source. The mass spectrometer was operated in data-dependent mode. Survey MS scans were acquired in the Orbitrap with the resolution set to 30,000. Up to the 15 most intense ions per scan were fragmented and analysed in the linear ion trap. The raw files were processed using Proteome Discoverer software version 1.2 (Thermo). The peak lists were searched against an in-house Perlucin database using the SEQUEST search engine. The maximum parent and fragment ion mass deviations were set to 8 ppm and 0.8 Da, respectively. Variable modifications included oxidation of methionine and carbamidomethylation of cysteine. The false discovery rate was set to 0.01.
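The 8 ppm precursor tolerance is applied to theoretical peptide masses computed from monoisotopic residue masses. The Python sketch below computes [M+H]+ and multiply charged m/z values, with optional methionine oxidation, for the repeat peptide DSLHANLQQMDSLHANLQQR reported in the Results, and checks an observed value against a ppm window. The residue masses are standard monoisotopic values; the helper names and the observed values in the usage are hypothetical, and this is not how SEQUEST itself is implemented.

```python
# Standard monoisotopic residue masses (Da)
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728
OXIDATION = 15.99491  # variable modification: Met oxidation


def neutral_mass(peptide: str, oxidized_met: int = 0) -> float:
    """Monoisotopic neutral mass of a linear peptide."""
    if oxidized_met > peptide.count("M"):
        raise ValueError("more oxidations than methionines")
    return sum(RESIDUE[aa] for aa in peptide) + WATER + oxidized_met * OXIDATION


def mh_plus(peptide: str, oxidized_met: int = 0) -> float:
    """Singly protonated [M+H]+ mass."""
    return neutral_mass(peptide, oxidized_met) + PROTON


def mz(peptide: str, charge: int, oxidized_met: int = 0) -> float:
    """m/z of the [M+zH]z+ ion."""
    return (neutral_mass(peptide, oxidized_met) + charge * PROTON) / charge


def within_ppm(observed: float, theoretical: float, tol_ppm: float = 8.0) -> bool:
    """True if the observed mass lies within +/- tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


pep = "DSLHANLQQMDSLHANLQQR"
print(round(mh_plus(pep), 4), round(mz(pep, 2), 4), round(mh_plus(pep, 1), 4))
```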

Determination of the Hydrodynamic Diameter
The hydrodynamic diameters of the three splice variants in solution were determined by dynamic light scattering at a scattering angle of 90° using an L.i.SA light scattering instrument (Fraunhofer IFAM, Bremen, Germany) [46]. Each splice variant was measured at a concentration of 100 µg/mL in 20 mM NaHCO3, 500 mM NaCl, pH 8.7, once after one month and again after two months.
The equations used to calculate the theoretical hydrodynamic radii (Rh) of the proteins have been described elsewhere [47,48]. In brief, ln(Rh) = 0.29 ln(N) + 1.56 was applied for native proteins and ln(Rh) = 0.57 ln(N) + 0.79 for denatured proteins, where N is the number of residues.
Whereas the 3′ ends of the cDNAs encoding Perlucin-R5 and Perlucin-R8 are identical, the 3′ end of Perlucin-R0 mRNA differs. This was confirmed by sequencing different clones derived from different cDNA preparations. All these variants contain the same characteristic signal peptide leader sequence, predicted with a high score by SignalP [49].

Perlucin Splice Variants
The nucleotide sequences coding for the N-terminal 130 amino acids of the C-type lectin domain were almost identical in these variants (Figure 1). Sequence deviations leading to three amino acid exchanges (M89I, V129D, R149L) were found at only three positions in several Perlucin-R0 cDNA clones, suggesting the existence of at least one additional allele. Consistent with this finding, additional Perlucin-R0 cDNA clones (562 bp, 149 aa [GenBank: FN674443.1]) were later found that encode the same amino acids at positions 89 and 126 as Perlucin-R5 and Perlucin-R8.
Perlucin-R0 consists essentially of the C-type lectin domain, whereas Perlucin-R5 and Perlucin-R8 carry 5 and 8 repeats, respectively, of the 10-amino-acid sequence SLHA(G)NLQQR(M)D (alternative residues in parentheses). These repeats are highly conserved within the cDNAs of the Perlucin splice variants. According to the NetPhos 2.0 Server [50], the serine residues of the repeats are potential phosphorylation sites with a high score (0.95) unless the preceding repeat contains a methionine instead of an arginine at the eighth position (Figure 1). This results in six potential phosphorylation sites in the repeat region of Perlucin-R8.
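The counting rule described above can be made explicit. The Python sketch below takes, for each repeat, the residue at the variable R/M position and counts the serines predicted as high-scoring phosphorylation sites. The R/M pattern in the example is hypothetical, chosen only to show how two methionine-containing repeats reduce eight repeats to six sites; it is not the actual Perlucin-R8 sequence. Treating the serine of the first repeat as high-scoring is also an assumption.

```python
def high_score_serines(variable_residues: list[str]) -> int:
    """Count repeat serines predicted as high-scoring phosphorylation sites.

    variable_residues[i] is the residue ('R' or 'M') at the variable position
    of repeat i. The serine of repeat i is counted unless repeat i-1 carries M;
    the first repeat has no preceding repeat and is counted (assumption).
    """
    count = 0
    for i in range(len(variable_residues)):
        if i == 0 or variable_residues[i - 1] != "M":
            count += 1
    return count


# Hypothetical R/M pattern for an 8-repeat region (illustrative only)
pattern = ["R", "M", "R", "R", "M", "R", "R", "R"]
print(high_score_serines(pattern))
```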

Phylogeny
In nucleotide databases, several genes of Haliotis species have been annotated as (potential) Perlucins. Since this group is quite heterogeneous, we tested their phylogenetic relationships to Perlucin, focussing on the C-type lectin domains of gene products from Haliotis spec. annotated as Perlucin (Figure 2), in comparison with C-type lectins from other animal species. The resulting tree is divided into two main groups: one represents the C-type lectins from Haliotis spec., whereas the other comprises C-type lectins from vertebrate and other invertebrate species. The Haliotis spec. group is divided into several branches. Interestingly, genes from H. discus discus are distributed over different branches. Only one of these (HddPerl5), which has been experimentally confirmed as Perlucin [14], is closely related to H. laevigata Perlucin. This cluster, which also comprises genes from H. varia and H. asinina, most likely represents true Perlucins (Figure 2).
Another set of genes, including HddPerl3, is less closely related to the Perlucin cluster but clearly separated from the majority of other genes encoding C-type lectins from H. discus discus and H. diversicolor. Given their similarity, these "Perlucin-like" genes might encode active Perlucins, but this needs to be verified experimentally.

C-terminal Repeats in Native Perlucin
The discovery of Perlucin mRNA splice variants encoding proteins with different C-termini raised two important questions: (I) whether these variants occur in native protein from nacre and (II) whether they might account for the heterogeneity of purified Perlucin. When expressed in COS-7 cells, C-terminally Strep-tagged recombinant Perlucin variants migrate in SDS-PAGE at 15 kDa (Perlucin-R0), 20 kDa (Perlucin-R5) and 25 kDa (Perlucin-R8) (Figure 3A). The electrophoretic migration of the recombinant proteins correlates well with that of Perlucin prepared from the shell of H. laevigata using a standardised protocol [13] (Figure 3B). On 2D electrophoresis, native Perlucin appeared as several spots spanning a broad pH range from approximately 5.0 to 9.5 and apparent molecular masses from 27 kDa to 10 kDa (Figure 3C), representing the three prominent distinct bands detectable in SDS-PAGE at 25 kDa, slightly below 25 kDa and 15 kDa (Figure 3B). Seven spots from the 2D electrophoresis (Figure 3C) were selected, digested with trypsin and analysed by MALDI-ToF MS: spots 1 and 2 in the range of 25 kDa, spots 3 and 4 at approximately 20 kDa, spot 5 at 15 kDa and spot 6 below 15 kDa.
Most of the peptide masses expected from the published Perlucin sequence [12] were identified in all samples except spot 3, demonstrating that these spots represent Perlucin (Table 2). The peptide containing the single putative N-glycosylation site was detected only in spot 4 (Figure 1). Because tryptic peptides ionise differently in matrix-assisted laser desorption ionisation (MALDI) and electrospray ionisation (ESI) depending on their sequence, we reanalysed all spots by LC-ESI-MS so as not to miss any sequence information. This revealed an additional peptide, DSLHANLQQMDSLHANLQQR (Figure 1), which contains the repeat sequence found in the C-termini of the splice variants Perlucin-R5 and Perlucin-R8 but not Perlucin-R0. It was detected in spots 1 and 2 and at low intensity in spot 4, but not in spots 5 and 6. These findings clearly demonstrate the presence of Perlucin variants in the organic material of nacre.
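Why the detected peptide spans two repeats can be seen from an in-silico tryptic digest. Trypsin cleaves after K or R unless the next residue is P. In the Python sketch below, which applies only that rule and ignores missed cleavages, a hypothetical fragment of three repeats in which the middle repeat carries M instead of R yields exactly DSLHANLQQMDSLHANLQQR. The fragment is illustrative and not the verified Perlucin-R5/R8 sequence.

```python
def trypsin_digest(sequence: str) -> list[str]:
    """Cleave C-terminal to K or R unless followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder without K/R
    return peptides


# Hypothetical stretch: repeat (R) + repeat (M) + repeat (R) + next D
fragment = "SLHANLQQR" + "DSLHANLQQM" + "DSLHANLQQR" + "D"
print(trypsin_digest(fragment))
```

Because the methionine variant removes a cleavage site, the observed 20-residue peptide is itself evidence that at least one repeat of the native protein carries M at the variable position.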

Recombinant Protein Expression and Purification
For further characterisation, the Perlucin splice variants were expressed in E. coli as fusion proteins carrying His-tagged MBP at the N-terminus and a Strep-tag at the C-terminus. This allowed tandem affinity purification, first via the His-tag and then via the Strep-tag, yielding highly pure proteins as demonstrated by SDS-PAGE. As a control, MBP without Perlucin was prepared in the same way (Figure 4).

Calcium Carbonate Precipitation Assay
Two critical questions were (I) whether the recombinant Perlucin variants modulate the precipitation of CaCO3 and (II) whether the presence or absence of the C-terminal repeats, and their number, influence this function of Perlucin. To investigate this, we used a CaCO3 precipitation assay with the MBP fusion proteins. A similar assay [13] had been used to establish that native Perlucin accelerates the precipitation of CaCO3, and it has also been performed with a recombinant MBP-Perlucin fusion protein [14] derived from H. discus discus. As expected for functional Perlucin, the recombinant MBP-Perlucin fusion proteins accelerated the precipitation of CaCO3 compared with MBP alone. When 100 µg/mL of freshly prepared MBP-Perlucin splice variants was tested, significant differences in the precipitation behaviour of CaCO3 were observed, with the effect increasing from MBP-Perlucin-R0 to MBP-Perlucin-R5 to MBP-Perlucin-R8 (Figure 5B), compared with 100 µg/mL BSA or MBP alone (Figure 5A).
Next, we addressed whether this accelerating effect on CaCO3 precipitation depends on the Perlucin concentration. We therefore set up assays in which different amounts of MBP-Perlucin-R8 were combined with MBP to a total of 75 µg protein.
Under these conditions, 25 µg MBP-Perlucin-R8 barely accelerated the precipitation of CaCO3, whereas with 50 µg MBP-Perlucin-R8 precipitation started after approx. 2,500 seconds. With 75 µg MBP-Perlucin-R8 the acceleration was even stronger, with precipitation starting after approx. 500 seconds and a pH shift from pH 8.6 to pH 7.9 (Figure 6). A similar concentration dependency of CaCO3 precipitation was observed for MBP-Perlucin-R0.
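Onset times such as "approx. 2,500 seconds" require some criterion for when precipitation starts. The text does not state one, so the Python sketch below assumes a simple threshold rule: onset is the first reading more than `drop` pH units below a baseline taken as the mean of the first minute of readings. Both the 0.05 threshold and the 12-reading baseline window are hypothetical parameters, and the trace is invented for illustration.

```python
from typing import Optional


def onset_time(times: list[float], ph: list[float], drop: float = 0.05,
               baseline_points: int = 12) -> Optional[float]:
    """Return the first time at which pH falls more than `drop` below the baseline.

    The baseline is the mean of the first `baseline_points` readings
    (12 readings = 1 min at one reading every 5 s). Returns None if the
    pH never drops that far.
    """
    n = min(baseline_points, len(ph))
    baseline = sum(ph[:n]) / n
    for t, value in zip(times, ph):
        if baseline - value > drop:
            return t
    return None


# Hypothetical trace: flat at pH 8.6 for ~500 s, then falling 0.02 per reading
times = [5 * i for i in range(200)]
ph = [8.6 if i < 100 else 8.6 - 0.02 * (i - 99) for i in range(200)]
print(onset_time(times, ph))
```

A threshold relative to each run's own baseline keeps the criterion independent of small electrode calibration offsets between runs.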

Determination of the Hydrodynamic Diameter
The stronger accelerating effect of MBP-Perlucin-R5 and MBP-Perlucin-R8 on CaCO3 precipitation could be due to the formation of higher-order Perlucin complexes induced by the C-terminal peptide repeats. This hypothesis was supported by the observation that storing Perlucin preparations for longer periods leads to faster precipitation of CaCO3. To investigate the formation of Perlucin oligomers, we determined the hydrodynamic diameters of the different Perlucin variants stored for one or two months in 20 mM NaHCO3, 500 mM NaCl, pH 8.7, at identical protein concentrations. For comparison we used MBP, which shows almost no aggregation (hydrodynamic diameter below 10 nm). In contrast, after one month the hydrodynamic diameters of all Perlucin variant fusion proteins were at least 10 times larger than the theoretically calculated diameters. After two months they had increased further for MBP-Perlucin-R5 and MBP-Perlucin-R8 (5 and 8 repeats, respectively) to almost 100 times the theoretical value; these two variants showed a bimodal particle size distribution and formed aggregates of up to 750 nm in diameter (Table 3).
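The theoretical diameters used for this comparison follow from the empirical relations given in the Methods, ln(Rh) = 0.29 ln(N) + 1.56 for native and ln(Rh) = 0.57 ln(N) + 0.79 for denatured proteins. The Python sketch below evaluates them and expresses a measured DLS diameter as a multiple of the theoretical native diameter. It assumes Rh is obtained in ångström, which the text does not state; the residue count of 155 is that of Perlucin as first characterised, whereas the fusion-protein lengths needed for Table 3 are not given here.

```python
import math


def rh_native(n_residues: int) -> float:
    """Theoretical hydrodynamic radius of a folded protein: ln(Rh) = 0.29 ln(N) + 1.56."""
    return math.exp(0.29 * math.log(n_residues) + 1.56)


def rh_denatured(n_residues: int) -> float:
    """Theoretical hydrodynamic radius of an unfolded protein: ln(Rh) = 0.57 ln(N) + 0.79."""
    return math.exp(0.57 * math.log(n_residues) + 0.79)


def fold_over_theory(measured_diameter_nm: float, n_residues: int) -> float:
    """Ratio of a measured DLS diameter (nm) to the theoretical native diameter.

    Assumes rh_native() returns Rh in angstrom, so diameter_nm = 2 * Rh / 10.
    """
    theoretical_nm = 2.0 * rh_native(n_residues) / 10.0
    return measured_diameter_nm / theoretical_nm


# 155 residues: Perlucin as first purified and characterised (Introduction)
print(round(rh_native(155), 1), round(rh_denatured(155), 1))
```

For a fusion protein, N would be the full length of the MBP-Perlucin construct, which is why the theoretical values grow with the number of repeats even before any aggregation.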
The mass spectrometry data of purified native Perlucin digested with trypsin clearly showed that these variants are present in the shell of H. laevigata, since the peptide NH2-DSLHANLQQMDSLHANLQQR-COOH was identified in protein spots with apparent molecular weights of 25 kDa and 20 kDa, but not in spots 5 and 6 at around 15 kDa (Figure 3). Interestingly, this peptide does not match the sequence reported by Mann et al., but is consistent with the cDNA sequences encoding Perlucin-R5 and Perlucin-R8 (Figure 1). Current database information is insufficient to identify such repeats in Perlucin from other species. However, Perlucin variants with similar repeats may well be identified in other species of the genus Haliotis using a similar strategy.
Several genes or proteins have been annotated as Perlucin or Perlucin-like in DNA and protein databases. To address this remarkable diversity, we performed a phylogenetic analysis (Figure 2). The two proteins for which Perlucin activity has been reported, from H. laevigata (HlaevPerl0) and from H. discus discus (HddPerl5) [12–14,32], are closely related to the putative proteins from H. asinina (Hasi551, Hasi499) and H. varia (HvarPerl). This high degree of similarity suggests that they are Perlucin orthologs and that their assignment as Perlucins is therefore likely to be justified. Several other genes from H. discus discus and H. diversicolor have also been assigned as Perlucins. Only three of these share the main branch with the Perlucins and are therefore considered Perlucin-like proteins; the other genes are only distantly related to Perlucin and cluster in a separate branch. They are therefore unlikely to have Perlucin-like properties, in particular the promotion of CaCO3 crystallisation, and probably represent other C-type lectins. Experimental characterisation of all these proteins is needed to determine which of them can be classified as Perlucin.
Recently, Perlucin-like proteins have been found in the transcriptome of Mytilus edulis [51] and in the genome of Pinctada fucata [52]. Although there is as yet no experimental evidence that these genes are involved in shell formation in bivalves [52], Perlucin and Perlucin-like proteins may be conserved in bivalves and gastropods, having derived from a common ancestor.
In addition to alternative splicing, posttranslational modifications clearly contribute to heterogeneity. One N-glycosylation motif has been identified in Perlucin [12]. Our MALDI-ToF MS data demonstrate that the non-glycosylated protein also occurs, since the corresponding non-glycosylated peptide was identified in the peptide map of spot 4 (20 kDa, Figure 1). This does not, however, exclude the existence of glycosylated forms of Perlucin: since the glycan structures of Haliotis are unknown, the molecular masses of the corresponding glycopeptides cannot be predicted, and no evidence for the non-glycosylated peptide was found in any of the other spots. Another important posttranslational modification might be phosphorylation, which, for example, positively influences calcium binding by Orchestin, a soluble matrix protein of crustacean calcium storage structures [53]. Purified native Perlucin consists of several species with different isoelectric points (Figure 3B), consistent with patterns observed for phosphorylation and supported by several predicted serine phosphorylation sites. An important functional property of Perlucin is its ability to accelerate the precipitation of CaCO3 from a saturated solution [13,32]. This function appears to be modulated by the repeats, since the precipitation rate increases with their number (Figure 5), and H. laevigata Perlucin-R8 led to the most pronounced and fastest CaCO3 precipitation, in a concentration-dependent manner (Figure 5). This is in good agreement with CaCO3 precipitation induced by native Perlucin from H. laevigata [13]. Perlucin from H. discus discus (no repeats) fused to MBP accelerated the precipitation of CaCO3 [14] at a rate similar to H. laevigata Perlucin-R0 (Figure 6), whereas recombinant H. laevigata Perlucin (two repeats) fused to GFP did not accelerate the precipitation of CaCO3 compared with the control protein Concanavalin A, or even prevented precipitation when added as a CaCl2 solution [34].
In contrast to Perlucin, other proteins appear to stabilise supersaturated CaCO3 solutions, as shown for MBP and BSA (Figure 5A). In this context it is important to note that crude soluble matrix proteins from the shells of H. laevigata [45] or Crassostrea virginica [8] inhibit CaCO3 crystal growth. It is therefore tempting to speculate that the balance between crystallisation-inhibiting matrix proteins and Perlucin controls crystal growth in nacre. Our observation that the splice variants promote CaCO3 precipitation at different rates supports the idea that the repeats play a role in fine-tuning aragonite formation. Along these lines, it is interesting that the number of repeats influences the rate of higher-order Perlucin complex formation (Table 3).
Other C-type lectins oligomerise via C-terminal extensions, such as mannose-binding protein A with its collagen-like C-terminus [54], collectins [55] or tetranectins [56]. Although the C-terminal repeats of Perlucin do not resemble collagen, they might act similarly to oligomerise Perlucin, leading to the observed aggregates (Table 3). Evans et al. [35] recently proposed that the repeats described here might represent an intrinsically disordered region. This would explain the protein aggregates of almost 1 µm that we observed for the splice variants R5 and R8. A similar role of intrinsically disordered regions has been proposed for the pearl oyster protein r-n16.3, which oligomerises over a wide pH range and whose oligomer radii increase in the presence of Ca2+ [57], and for the abalone nacre protein AP7 [58]. Furthermore, Parham et al. [59] demonstrated that the N-terminal repeats of the yeast protein Sup35p stabilise protein aggregates, similar to the human prion protein PrP.
Although the dynamic light scattering results should be regarded as preliminary and this phenomenon requires more extensive study, they point to an interesting functional aspect of the repeats: Perlucin-R5 and Perlucin-R8 may oligomerise in a similar way, and this oligomerisation could lead to the large aggregates we found. These large aggregates could, in turn, accelerate the precipitation of CaCO3, fine-tuned by the length of the repeat region.

Conclusions
Alternative splicing of H. laevigata Perlucin leads to variants with different numbers of C-terminal 10-mer peptide repeats. With increasing numbers of repeats, Perlucin forms aggregates with hydrodynamic diameters of up to almost 1 µm. Interestingly, the number of repeats also correlates with the promotion of CaCO3 precipitation, a hallmark of Perlucin.
It has been hypothesised that the accumulation of secreted Perlucin leads to aggregates that serve as nucleation seeds, thereby accelerating the precipitation of CaCO3. In agreement with this hypothesis, native H. laevigata Perlucin and Perlucin-GFP fusion constructs are incorporated into CaCO3 crystals [32,34]. Crystal growth could be further influenced by other proteins of the organic matrix acting as promoters or inhibitors of this process. Along these lines, altered expression and secretion levels of the Perlucin splice variants reported here, with their differing aggregation behaviour, appear well suited to modulating the rate and composition of aragonite platelet formation and thus to fine-tuning nacre formation.
Table 3. Hydrodynamic diameters were determined as described in the Methods section after storage in 20 mM NaHCO3, 500 mM NaCl, pH 8.7 for one or two months. Theoretical hydrodynamic diameters were calculated as described [47,48]. doi:10.1371/journal.pone.0097126.t003