Browse Subject Areas

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

Massively Parallel Amplicon Sequencing Reveals Isotype-Specific Variability of Antimicrobial Peptide Transcripts in Mytilus galloprovincialis

Massively Parallel Amplicon Sequencing Reveals Isotype-Specific Variability of Antimicrobial Peptide Transcripts in Mytilus galloprovincialis

  • Umberto Rosani, 
  • Laura Varotto, 
  • Alberta Rossi, 
  • Philippe Roch, 
  • Beatriz Novoa, 
  • Antonio Figueras, 
  • Alberto Pallavicini, 
  • Paola Venier



Effective innate responses against potential pathogens are essential in the living world and possibly contributed to the evolutionary success of invertebrates. Taken together, antimicrobial peptide (AMP) precursors of defensin, mytilin, myticin and mytimycin can represent about 40% of the hemocyte transcriptome in mussels injected with viral-like and bacterial preparations, and unique profiles of myticin C variants are expressed in single mussels. Based on amplicon pyrosequencing, we have ascertained and compared the natural and Vibrio-induced diversity of AMP transcripts in mussel hemocytes from three European regions.

Methodology/Principal Findings

Hemolymph was collected from mussels farmed in the coastal regions of Palavas (France), Vigo (Spain) and Venice (Italy). To represent the AMP families known in M. galloprovincialis, nine transcript sequences have been selected, amplified from hemocyte RNA and subjected to pyrosequencing. Hemolymph from farmed (offshore) and wild (lagoon) Venice mussels, both injected with 107 Vibrio cells, were similarly processed. Amplicon pyrosequencing emphasized the AMP transcript diversity, with Single Nucleotide Changes (SNC) minimal for mytilin B/C and maximal for arthropod-like defensin and myticin C. Ratio of non-synonymous vs. synonymous changes also greatly differed between AMP isotypes. Overall, each amplicon revealed similar levels of nucleotidic variation across geographical regions, with two main sequence patterns confirmed for mytimycin and no substantial changes after immunostimulation.


Barcoding and bidirectional pyrosequencing allowed us to map and compare the transcript diversity of known mussel AMPs. Though most of the genuine cds variation was common to the analyzed samples we could estimate from 9 to 106 peptide variants in hemolymph pools representing 100 mussels, depending on the AMP isoform and sampling site. In this study, no prevailing SNC patterns related to geographical origin or Vibrio injection emerged. Whether or not the contact with potential pathogens can increase the amount of AMP transcript variants in mussels requires additional study.


Mytilus species (Phylum Mollusca, Class Bivalvia) are intertidal filter-feeders distributed worldwide, anchored to hard substrates in dense communities and widely used as bio-sensors of coastal pollution. Mussel populations of the northern and southern hemisphere probably separated 0.54–1.31 million years ago, far after the trans-Arctic expansion towards North America, and before the divergence between the Atlantic and Mediterranean ecotypes [1]. M. galloprovincialis hybridizes with M. edulis in southwest England and the Mediterranean mussel is now reported in Eastern Asia, California, Chile and Western Australia [1][2].

At different latitudes, mussels face tidal and seasonal fluctuations, changeable pollutant loads and also the surrounding bioma with behavioral changes [3], metabolic adjustments [4] and a variety of defense reactions [5][7]. With the exception of a few metazoan parasites which also somewhat affect the Mytilus species [8][9], mussels seem refractory to diseases and could instead influence the prevalence of pathogens such as Perkinsus spp. and Betanodavirus (Nodaviridae) in other bivalves and fishes, respectively [10]. Like other invertebrates, bivalve molluscs rely on ancient and rapid defenses to fight potential pathogens, and gene-encoded antimicrobial peptides (AMPs) are major humoral components of their immune system.

Host defense peptides are present in virtually all living organisms, with more than 30 AMPs expressed in humans and about 200 peptides identified in insects (approximately 1500 molecules very diverse in sequence and secondary structures are reported in specific databases). [11][13]. Among other structural features, a conserved γ-core motif originated from the bidirectional orientation of specific aminoacid residues including an invariant cysteine array has indicated the evolutionary relatedness of cysteine-stabilized α-β (CS-αβ) AMPs, kinocidins, invertebrate toxins and snake venoms: such unifying structure provides an interesting hypothesis for context-specific action modes, from the perturbation of negatively charged cell membranes and ion channels to the immunoregulatory functions [14].

In the continuous fight with competitors, predators and pathogens, the evolutionary diversification of AMP types and gene families likely occurred through events of gene duplication, shuffling of functional elements and selection for variation at positions adjacent, or integral to, the conserved structural motifs [13], [15]. In Crassostrea gigas, combined mechanisms of sequence diversification (e.g. recombination, parallel homoplasic mutations, indel events) and directional selection have been suggested to explain the remarkable gene multiplicity and variable copy number of defensins and proline-rich peptides, whereas the marked transcript diversity of Cg-bpi, a bactericidal permeability protein, has been mainly referred to the allelic polymorphism of one single gene [16]. Microsatellite-mediated mosaics of sequence elements, low-transcription fidelity and transcript editing support the evidence of about 50 polymorphic genes, and an extraordinary diverse set of Sp185/333 proteins expressed in response to pathogens by the purple sea urchin [17][18]. Worthy of note, the copy number polymorphism of α and β defensin genes with proportional peptide levels in neutrophylic granulocytes, has been related to the individual risk of infection in humans [19][20].

Tens of different AMPs or AMP families have been discovered in marine invertebrates [21]. In the mussels M. galloprovincialis and M. edulis, four different groups of CS- αβ AMPs with multiple isoforms have been discovered and classified according to their primary sequence and secondary structure: defensins reported as MGD1 and MGD2, mytilin A, B, C, D and E, myticin A, B and C, mytimycin, the only strictly antifungal peptide with an EF-hand like domain [22][25]. These AMPs share small size (3.7–4.5 kDa, except mytimycin of 6.2 kDa), positive charge and amphiphilic behavior. Their precursors (pre-pro-peptides) consist of an N-terminal signal peptide, a central mature peptide and a C-terminal extension. Each family is characterized by a cysteine array of 8 (12 in mytimycin) cysteines engaged in intramolecular disulfide bonds.

A broad spectrum of activity, often complementary and not strictly antibacterial, was reported for the mussel defensins, mytilins and myticins [26][27] whereas mytimycin, a 6.2 kDa peptide isolated from normal and immunostimulated mussels, selectively inhibited Neurospora and Fusarium growth [22]. Whether purified in sufficient amounts from cellular fractions or obtained in stable conformations by chemical synthesis or recombinant system, pure peptides are essential to investigate the antibiotic power of the different mussel AMPs.

In situ hybridization and immunolocalization assays performed on mussel hemocytes demonstrated a partially overlapping expression of defensins and mytilins [28]. AMP expression and stored peptides have been observed in several tissues and developmental stages [28][29] to indicate that cells other than hemocytes can produce and release AMPs, a phenomenon well known from frog skin [30] and the male reproductive system of rats [31]. Overall, mussel AMPs display rather complex expression patterns, dependent on developmental stage, seasonality and immunostimulation [32][34]. In M. galloprovincialis, massive EST sequencing confirmed the abundance and transcript diversity of AMPs and other key players of the innate immunity [35]. The AMP precursors represented 26–43% of the hemocyte transcripts in mussels injected with viral-like and bacterial preparations; in particular, 74 precursor and 25 mature peptide variants of myticin C were detected in a sample of only 100 mussels, with unique profiles of transcript variants in single mussels and less common alleles differing at single nucleotide positions from the two most common ones [29], [36].

The myticin C variation is also remarkable compared to mytilin B, as one mussel can produce 2–10 different mytilin B transcripts but silent substitutions restrict the peptide variants to only a few [37]. In spite of the abundance of other AMPs, just one singleton plus 4 similar sequences denote mytimycin in Mytibase, interactive catalogue including 18788 expressed sequence tags (ESTs) of M. galloprovincialis [25]. Sequencing and Southern blot data indicate one gene copy per genome for defensin MGD2, mytilin B and myticin C [38], [36]. Two gene copies or allelic polymorphism could explain the simultaneous presence of two length variants of the mytimycin gene per mussel [39]. The gene copy number of the mussel AMPs need verification since partial gene sequences covering the coding sequence (cds) are only available for MGD1, mytilin B, myticin C and mytimycin [29], [35][39]. Two 3D structures have been established by NMR spectrometry, defensin MGD1 [40] and mytilin B [41].

Thus far no Mytilus genome has been sequenced and, compared to better known model organisms, a limited number of genes have been investigated: for instance those concerning defensin and mytilins [38], heat shock proteins [42], metallothioneins [43] and apoptotic caspases [44]. Also gene-centered studies take advantage of the massive production of ESTs which currently contributes to the identification of molecules and pathways underlying the mussel response to various natural and experimental conditions [45][48]. Among the 67,726 ESTs and 4680 aminoacid sequences publicly available for the Mytilus genus, about 29 and 32%, respectively, refer to M. galloprovincialis (, June 2011). Recently, new ESTs have been produced from the digestive gland, foot, gill and mantle of M. galloprovincialis by advanced sequencing [49]. The so called ‘pyrosequencing’ was the first alternative to the use of chain-terminating inhibitors [50] and it has radically increased the sequencing power as well as the resolution of low-abundance variants [51][53]. Adequate read coverage can assure reliable quantification of single nucleotide changes (SNC) when seeking critical mutations or sequence polymorphisms.

Based on 454 pyrosequencing, we have thoroughly studied the sequence diversity of 9 different AMP precursors expressed in hemocytes of mussels (M. galloprovincialis) farmed in three European regions. Similarly, we have analyzed and compared mussels farmed offshore or living inside the Venice Lagoon (Italy), before and after injection with live Vibrio cells.


In Table 1 we summarized the main features of nine CS- αβ AMPs expressed in M. galloprovincialis.

Following appropriate primer design, we amplified the related transcript sequences from hemolymph pools representing groups of 100 mussels farmed in south France (Pa), northwest Spain (Vi) and northeast Italy (Ve) or native from the industrial canals of the Lagoon of Venice (Ve nc). In addition, groups of 40 offshore-farmed (Ve ft) and lagoon-native (Ve nt) mussels were injected with 107 live Vibrio splendidus cells and similarly processed (Table 2). The resulting 78 PCR products (13 amplicons×6 samples) were then purified, quantified, diluted to the appropriate concentration (∼7*109 molecules/µl) to compose two equimolecular pools for the emulsion PCR (cDNA concentration was 2.3 and 2.5 ng/µl, respectively) and bidirectional sequencing.

Table 2. Description of the mussel haemolymph samples processed for 454 pyrosequencing.

Overall, massively parallel sequencing produced 359,867 output reads and more than 73 Mbases with good quality scores, for a total of 304,621 trimmed reads with 226 bp average length. We discarded about 15% of reads per sample (88% of them shorter than 70 bp) which possibly originated in the PCR amplification or sequencing reaction (short sequences are not expected to bias the amplicon coverage nor the accuracy of SNCs detection).

Total or partial overlapping of the forward and reverse reads allowed the complete coverage of the 13 reference sequences. Hence, 97.5% of the good quality reads correctly mapped against the 9 selected AMPs and could be attributed to the 6 original samples. With the exception of MytM in the sample Ve ft, at least 1034 reads mapped on each AMP precursor transcript (range 1034–10814, File S3). Figure 1 shows relevant differences in the average base coverage calculated per AMP precursor transcript in each sample (3378×, the average read depth per AMP).

Figure 1. Average base coverage (x) per AMP and sample.

Coverage is calculated as total sequenced base divided by the length of the amplified transcript.

Separately for each AMP and sample, we subsequently grouped the reads having the same length and 100% identity in two cluster types: (1) equal to the original transcript and (2) with at least 1 SNC. The last ones were employed for SNC detection and related analysis. Only the SNCs covered 30×, and representing at least 3% of the reads mapping a given AMP, were considered genuine and counted per AMP precursor (Table 3). Based on the SNC counts, the average value of SNC per base calculated for the coding sequence was 0.18. Considering all AMPs together, as much as 134 SNCs were common to the 6 samples and represented 86% of the genuine cds variation (File S4). The SNC frequency values (cds) indicated MytC and MGD1 as the most polymorphic AMP transcripts, opposite to those of mytilins B and C. The fraction of non-synonymous changes ranged from null (MytlC) to 25 (MytC) whereas the related ω values highlighted the opposite cases of MytlC (null) and MytlD (9–15).

Table 3. Number of SNCs detected in the whole AMP transcript precursor and cds, related frequencies, non-synonymous changes and ω values.

All SNC frequency values were used to test statistically the sequence diversity of the 9 selected AMPs (one-way ANOVA, α = 0.001, followed by Tukey's Honestly Significant Difference test, α = 0.05). The null hypothesis (equal SNC frequency between AMPs) was rejected and, according to the Tukey's HSD test we classified the 9 cases from the least changeable mytilins, to the most polymorphic myticins and defensins (Figure 2).

Figure 2. AMP precursors grouped on the basis of their variability.

Grouping based on HSD test, α = 0.05; sample-specific and averaged diversity levels are reported in Table 3.

Common and exclusive SNCs are reported in Figure 3 per geographical region: the common changes represent the majority, with 68, 74 and 78% in the Vi, Ve and Pa samples, respectively. Similar percentages of common SNCs were found in mussels farmed offshore or living wild in the industrial canals of the Venice lagoon (Ve versus Ve nc), injected or not with 107 live V. splendidus (Ve versus Ve ft, Ve nc versus Ve nt). Details on the immune stimulation and related host response are reported elsewhere [34], [35].

Figure 3. Common and exclusive SNCs in the Pa, Vi and Ve samples and paired samples from native and immunostimulated mussels.

Finally, we analyzed the nucleotidic and aminoacidic substitution patterns of each AMP precursor, separately in the samples from Palavas, Vigo and Venice. The percentage of read clusters differing at least 1 SNC ranged from 56% (MytlD) to 100% (MytB and MGDt). The read clusters covered at least 3× were virtually translated into amino acids, and redundancy due to silent substitutions was removed. As reported in Table 4 we could estimate a number of AMP transcript variants ranging from 9 (Pa MytM) to nearly 100 (Ve MytB, Vi MGD1, Ve MGD1 and Ve MGDt).

4. AMP transcript variants and expected number of non-redundant peptides per geographical region.

The facts emerging from 454 pyrosequencing of the Mytimycin and Myticin C amplicons are reported here below in more detail as instructive examples.

The two gene sequences publicly available for the MytM of M. galloprovincialis (FJ804479.1, FJ804478.1) denote 3 exons and 2 introns, and differ only in the length of intron 2; the short and long version of it occurring simultaneously in single mussels [39]. The amplicon designed in the present work covered 429/456 bp, i.e. 94% of the cds.

The MytM pyrosequencing yielded 6645 aligned reads (33, 30 and 35% of them differing from the reference sequence in the Pa, Vi and Ve samples, respectively).

A nucleotide change in position 58 (Thymine in the place of Cytosine) was detected in all reads and was not considered as SNC because it could represent an error occurred during Sanger sequencing of the original Mytibase singleton (MGC05878, File S4 in Supplementary Materials). The AVA software (Roche Life Sciences) grouped the Pa, Vi and Ve MytM reads in 31, 206 and 32 clusters or singletons (27, 199 and 31 of high quality), respectively. In total, 5690 high quality sequences (86%) were translated in amino acids and produced 9–49 expected peptide variants (Table 4).

Irrespective of the geographical origin, we could consistently identify the two most abundant MytM types: a consensus very similar to the original sequence (MytM_1, MGC05878) and a second one (MytM_2) similar to the sequence MytM-P recently described [39]. Jointly, MytM_1 and MytM_2 represent the 87, 78 and 76% of the MytM reads in the samples Pa, Vi and Ve, respectively (Vi MytM display higher sequence variability than the other two samples).

MytM_2 represents 20, 8 and 11% of the MytM reads in samples Pa, Vi and Ve, respectively, and displays 25 SNCs. Eighteen out of 25 changes are non-synonymous (7 in the signal peptide, 8 in the mature peptide and 3 in the C-terminal extension) and confirm the substitutions detected in MytM-P by Sonthi et al. using a Sanger approach. Figure 4 locates all genuine SNCs along the original MytM sequence (cds).

Figure 4. SNCs mapped along the MytM sequence in the Pa, Ve and Vi samples.

Synonymous (empty) and non-synonymous (filled) changes are located on the AMP precursor sequence. Empty horizontal bars indicates cds, mature peptide sequence is underlined.

Partial gene sequences including the cds have been reported for MytC and denote 3 exons and 2 introns, with the mature peptide entirely located in exon 2 [29], [36].

The MytC pyrosequencing yielded 22,119 aligned reads fully covering the cds (71%, 88% and 79% of them differing from the original sequence in the Pa, Vi and Ve samples, respectively). The Pa, Vi and Ve MytC reads could be grouped in 844, 777 and 468 clusters or singletons (823, 752 and 466 of high quality), respectively. In total, 18113 high quality reads (82%) were translated in amino acids and produced 41–93 expected peptide variants (Table 4) without evidence of prevailing variation patterns.

Despite the remarkable number of sequence variants, 98.8% of the MytC peptide clusters retained the typical cysteine array. Figure 5 locates all genuine SNCs along the original sequence of MytC. Amplicon pyrosequencing confirmed 3/5, 17/27 and 19/33 changes previously detected in the signal mature peptide, and C-terminal regions, respectively [36] and revealed additional changeable positions. As observed in MytM, different SNC combinations increase the total number of possible peptide variants.

Figure 5. SNCs mapped along the MytC sequence in the Pa, Ve and Vi samples.

Synonymous (empty) and non-synonymous (filled) changes are located on the AMP precursor sequence. Empty horizontal bars indicates cds, mature peptide sequence is underlined.

The maps locating all synonymous and non-synonymous SNCs along the transcript sequence (cds) of the remaining AMPs are reported in Supplementary Materials (File S5) for comparison.


This study intended to assess, by high-throughput amplicon sequencing, the natural variability of nine AMP precursor sequences found expressed in the Mediterranean mussel. For this purpose, we sampled mussels from farming sites subjected to common European regulations and sanitary controls in three producer countries.

Thirteen sequences almost covering the AMP cds (File S1) have been successfully amplified from hemocyte RNA samples representing mussels farmed in France, Spain and Italy, as well as native mussels of the Venice lagoon area, before and after Vibrio injection, for coupled comparison.

Sample preparation is a decisive step of the sequencing workflow, due to the difficulty in preparing a well-balanced unique amplicon pool for the emulsion PCR and subsequent pyrosequencing. For this purpose, we measured and equalized the concentrations of each amplicon with great attention before pooling. Although the two sequencing half plates produced similar read numbers per sample, a general variability of coverage depth between AMP amplicons was finally evident (Table 3, Figure 1). Nevertheless, we obtained at least 500× amplicon coverage with one only exception (MytM in the Ve ft sample).

Stringent criteria were then used to identify genuine SNCs, removing false positives without losing substantial information, and we could retain most of the output reads (82% and 86% to exemplify Myticin C and Mytimycin, respectively). At first look, the output reads of each AMP appeared very diverse, 88% with at least one SNC on average (100% for MGDt, no matter from which sample). The analysis of SNC frequency per base enabled us to rank the selected AMPs on the basis of the transcript variability (Figure 4). Despite the invariance of the cysteine array, each AMP showed typical levels of diversity irrespective of the geographical origin, with a majority of common SNCs present in all samples (86%, i.e. 134 SNCs) and the Vi sample showing the greatest number of SNCs (292 in total). Moreover, we did not see evidence of increased AMP sequence diversity in farmed and native mussels injected with a high dose of live Vibrio cells. Compared to the samples prepared from 100 mussels (Pa, Vi, Ve, Ve nc), those prepared from 40 mussels (Ve ft, Ve nt) showed fewer SNCs (−11%); a fact indicating that the sample size can limit the amount of detectable sequence variants.

The ratio of non-synonymous vs. synonymous changes (ω) substantiates the evolutionary diversification of the mussel AMP isotypes and suggests the functional advantage of transcript variability for most of the analyzed AMPs (ω values higher than 1, indicative of positively selected residues, were frequently detected). However, the ω values did not reflect precisely the classification based on SNC frequencies and some AMP (e.g. MytlD) may have been subjected to higher evolutionary pressure than others.

The virtual translation of the transcript consensuses resulting from read clustering allowed the identification of AMP isotypes with a relatively low number of SNCs and high sequence diversity according to different SNC combinations (e.g. MytB and MGDt). In the case of mytimycin, we confirmed two major sequence types previously described [39] with no evidence of additional variants. For the remaining AMPs, it was not possible to identify specific patterns of variation (amino acid changes combined together without scheme).

In conclusion, the sequence data reported in this study further emphasize the sequence diversity of mussel AMP precursors. Redundant expression of diverse AMPs with a broad range of action could be regarded as a strategy to reinforce the host response against invaders (foes trying in their turn to escape detection and the host reactions) while the immune system also has to maintain the organism homeostasis with appropriate responses to commensal microbes (friends) and to danger signals released by damaged host cells [54]. On the other hand, environmental factors act as selective force only if they change the distribution of host genotypes (affecting only some genotypes, not all), thus influencing the immune system evolution of the host in the context of its life-history and population traits [55].

The isotype diversity levels found in this study might result from events occurring at DNA level as well as post-transcriptional changes such as deaminase-mediated cytidine to uridine transitions [14], [56][57]. Hence, targeted sequence enrichment and extension strategies applied to genomic DNA could identify active and remnant gene copies of each AMP isotype, and reveal the mechanisms underlying the observed sequence variation.

Materials and Methods

Sampling sites, treatment and RNA extraction

Adult mussels (Mytilus galloprovincialis) with a shell length of 6–8 cm and mixed sex were obtained from commercial shellfish stocks near Palavas (Pa, Mediterranean Sea, France; 43°31′49 N, 03°54′53 E), Ria de Vigo (Vi, Atlantic Ocean, Spain; 42°14′32 N, 08°48′26 E) and off-shore Venice (Ve, North Adriatic Sea, Italy; 45°18′29.8 N, 12°21′32.0 E). In addition, we collected wild mussels from the industrial canals of Porto Marghera (Lagoon of Venice, Italy; 45°27′33.5 N, 12°15′41 E). More than 100 animals per group were sampled.

According to the EU Directive 91/492, mussels cultivated in waters classified A (e.g. ¼ mile off-shore in the Adriatic Sea) can be marketed without depuration and are assumed not to contain potential pathogens nor biotoxins. Due to heavy mixed pollution, shellfishing was prohibited since 1996 in the area from the industrial district (P. Marghera) to the Venice town, though the overall shellfish quality can be improved by 2 month-depuration in type A waters. Mussels farmed offshore or living in the industrial canals (Venice lagoon area) were acclimatized for one week in sea water collected at flood tide (32‰, 22°C) and fed with Isochrisis galbana. Following shell notching, 0.1 ml of exponentially growing bacteria (107 V. splendidus LGP32 cells) were injected into the posterior adductor muscle (samples Ve ft and Ve nt).

Hemolymph (1 ml per animal) was withdrawn from the posterior adductor muscle with a syringe containing 0.2 ml of Alsever solution (27 mM sodium citrate, 2.6 mM citric acid, 114 mM glucose and 72 mM NaCl in distilled water) adjusted at pH 7.4, and used to compose pools, each representative of 10 animals.

Haemocytes were pelleted by 15 min centrifugation at 800 xg (4°C), carefully resuspended in 200 µl of TRIZOL reagent (Invitrogen, Carlsbad, USA) and stored at −80°C until use. Total RNA was isolated according to the manufacturer's instructions and resuspended in RNAse-free water. A further purification step with LiCl 2 M was applied to remove possible contaminants. RNA concentration was measured by UV-spectrometry (ND1000, NanoDrop Technologies, Wilmington, USA) and the RNA integrity was verified by microcapillary electrophoresis (RNA 6000 Nano LabChip, Agilent Technologies, Palo Alto, USA).

Finally, equal quantities of each RNA pool (N = 10) were mixed together to compose a unique pool per sample (N = 100 mussels for samples Pa, Vi, Ve and Ve nc; N = 40 for samples Ve ft and Ve nt).


cDNA was synthesized from 1 µg of total RNA using SuperScript II enzyme and oligo(dT)18 primers (Invitrogen), following the manufacturer's instruction. To increase cDNA yield, the reaction was extended for a second hour, adding 0.5 µl of enzyme. cDNA was then purified with MinElute PCR Purification Kit (Qiagen, Hilden, Germany).

Primer design

Primer design was performed on the raw EST sequences denoting 9 mussel AMP isoforms in Mytibase (

All available ESTs for each selected AMP precursor were aligned using ClustalW [58] and primers were designed on conserved regions flanking the cds, whenever possible, with Primer3 [59]. Due to the pyrosequencing limit of about 250 bp, read length in forward and reverse direction, the maximum length of the PCR products was set at 440 bp (File S1). Degenerated primers have been designed to consider the whole ESTs variability (MytlC, MGD1 and MGDt). Two partially overlapping amplicons were designed for longer cds (MytC, MytlB, and MytlC) or in cases where high sequence variability made the definition of a single primer pair difficult (MytA). Thereby, 13 amplicons were designed in total. Amplicon's specificity was tested using the BlastX algorithm [60].

A tagged sequencing strategy with 5′nucleotide barcodes was implemented to facilitate the parallel processing of multiple samples [61][62]. Briefly, the forward and reverse PCR primers were modified by 5′-addition of 39 unique 5-mer barcodes (File S2). Barcodes enable the identification of the 454 reads corresponding to specific AMP, amplicon and sample so that PCR amplicons derived from multiple reactions could be combined for the sequencing run. To reduce the likelihood of misidentification, barcodes were designed not to contain homopolymers and to differ each one by at least two bases according to Roche Life Science protocols. Finally, 19-mer sequences corresponding to either the 454 Roche A Adaptor (for forward primers) or B Adaptor (for reverse primers) were fused to each PCR primer (Fusion primer).

Thermodynamic properties of Fusion primers were controlled to avoid the formation of hetero- or homo-dymer (OligoAnalyzer 3.1, Melting temperatures were fixed according to Primer3 software [63], adding 5 (±1) °C following HF Phusion polymerase instructions (Finnzymes, Espoo, Finland).

Fusion primers were designed in two sets of 39 (13 amplicons×3 samples), with each primer pair having a unique barcode.

PCR amplification

The PCR amplifications of 78 amplicons were carried out individually in a PCR volume of 50 µl with 20 ng of cDNA template, 1× Phusion HF buffer, 0.2 mM dNTPs, 1 U HF Phusion DNA polymerase, 1.5 µl DMSO and 0.2 µM of both forward and reverse primers. Amplification was performed in a Mastercycler Gradient Thermal Cycler (Eppendorf, Hamburg, Germany) programmed as follows: 98°C for 30 s followed by 35 cycles of 98°C for 10 s, 60–65°C for 20 s, 72°C for 30 s and a final extension step at 72°C for 5 min.

The resulting amplification products were run on a 2% agarose gel and visualized by SYBR Gold staining (Invitrogen) using UV light transillumination (Gel Doc XR System, Bio-Rad, Hercules, USA). Unspecific small products and primer–dimers were removed using the Agencourt AMPure system (AmPure PCR Purification kit, Brea, USA) and amplicons integrity was confirmed with Agilent 2100 Bioanalyzer (DNA-1000 chip). Good quality amplicons were finally used to compose an equimolecular pool; the number of molecules of each amplicon was calculated with the following formula:

  1. C: sample concentration (ng/µl)
  2. NA: Avogadro constant
  3. bpw: average pair basis weight (g)
  4. bp: pair basis number

Massively parallel 454 pyrosequencing (FLX-System, Roche Life Sciences, Branford, USA) was performed by BMR-Genomics ( using two PicoTiter half plates. Reads have been recorded at the Sequence Read Archive accessible at (submission ID: SRA038518.3). Six 454-output files corresponding to the 6 sequenced samples are available (SRR286638.1, SRR286639.1, SRR286640.1, SRR286641.1, SRR287657.1 and SRR287658.1).

Data analysis

Tag sequence were used as keys to part the unprocessed 454 reads into the 6 different samples by means of the GS Amplicon Variant Analyzer Software (AVAST, Roche Life Sciences). Reads in raw format were trimmed using quality score (limit 0.05) and minimum length equal to 100 bp. Subsequently, the output reads were aligned to a backbone consisting of the 9 AMP original transcripts, as obtained from the 13 reference sequences, with CLC Genomic Workbench version 4.6 (CLC Bio, Katrinrbjerg, Denmark). The total number of sequenced bases divided by the length of the amplified transcript provided the average base coverage per AMP.

The reads of each mapping (AMP isotype) showing the same length and 100% similarity were clustered together. Single nucleotide changes (SNCs) were detected considering all the aligned reads of each mapping. Non-specific and low quality matches are ignored during the process and SNCs were considered genuine only when covered at least 30×, with a minimum frequency of 3%, and setting the quality level of the changed base and surrounding bases to at least 20 and 15, respectively. SNCs located in the same codon were merged. The expected amino acid changes in the precursor and cds peptide sequences were deduced by virtual translation, and the ratio (ω) between non-synonymous and synonymous changes was also computed. SNC frequency per base in the cds region was calculated for each AMP and sample with the following formula:

  1. ntseq: cds sequenced nucleotides
  2. n SNC(AMP): number of genuine SNC per amplicon in each sample

To assess possible differences in levels of sequence diversity between the AMP precursors amplified from each of the 6 samples, data were analysed with 1-way ANOVA (α = 0.001). The null hypothesis predicted that all AMPs had the same variation rate in all samples. If not, Tukey's Honestly Significant Difference test (HSD, α = 0.05) could then discriminate different AMP groups. Genuine SNCs were mapped on the 9 AMP sequences using CLC Genomic Workbench. The cds, signal peptide, mature peptide with the cysteine array and C-terminal regions were systematically localized.

Once the correct sequence reading frame was defined the sequences covered at least 3× were virtually translated to investigate the overall patterns of variation for each AMP transcript precursor in the Pa, Vi and Ve samples. Leftover low quality read ends were manually trimmed and redundancy was removed by using Jalview [64].

Supporting Information

File S1.

Number of reads per AMP and sample.


File S2.

Detailed list of all SNCs detected per AMP precursor in the 6 samples, with type of change and related frequency.


File S3.

The maps locating all synonymous and non-synonymous SNCs along the transcript sequence (cds) of the AMPs.


File S4.

The 13 reference sequences designed to cover 9 AMP precursor transcripts of M. galloprovincialis. Sequence number and expected amplicon length with related transcript ID, precursor length, and number of clustered ESTs. All reference sequences are also reported in detail.


File S5.

Fusion primer pairs used for the AMP amplification, primer length, melting temperature and GC percentage.



A special thanks to Francesca Dal Pero for helpful advice.

Author Contributions

Conceived and designed the experiments: PV AP. Performed the experiments: UR. Analyzed the data: UR. Contributed reagents/materials/analysis tools: PV. Wrote the paper: UR PV AR. Evaluated the AMP ESTs, cured primer design and collected Ve hemolymph samples: LV. Provided Pa and Vi samples: PR BN. Critically contributed to the revision of the work: AF BN AP PR PV.


  1. 1. Gérard K, Bierne N, Borsa P, Chenuil A, Féral JP (2008) Pleistocene separation of mitochondrial lineages of Mytilus spp. mussels from Northern and Southern Hemispheres and strong genetic differentiation among southern populations. Mol Phylogenet Evol 49(1): 84–91.
  2. 2. Gilg MR, Hilbish TJ (2003) Patterns of larval dispersal and their effect on the maintenance of a blue mussel hybrid zone in southwestern England. Evolution 57(5): 1061–77.
  3. 3. Galler S (2008) Molecular basis of the catch state in molluscan smooth muscles: a catchy challenge. J Muscle Res Cell Motil 29(2–5): 73–99.
  4. 4. Babarro JM, De Zwaan A (2008) Anaerobic survival potential of four bivalves from different habitats. A comparative survey. Comp Biochem Physiol A Mol Integr Physiol 151(1): 108–13.
  5. 5. Bers AV, D'Souza F, Klijnstra JW, Willemsen PR, Wahl M (2006) Chemical defence in mussels: antifouling effect of crude extracts of the periostracum of the blue mussel Mytilus edulis. Biofouling 22(3–4): 251–9.
  6. 6. Akaishi FM, St-Jean SD, Bishay F, Clarke J, da S Rabitto I, et al. (2007) Immunological responses, histopathological finding and disease resistance of blue mussel (Mytilus edulis) exposed to treated and untreated municipal wastewater. Aquat Toxicol 82(1): 1–14.
  7. 7. Moore MN (2008) Autophagy as a second level protective process in conferring resistance to environmentally-induced oxidative stress. Autophagy 4(2): 254–6.
  8. 8. Rayyan A, Damianidis P, Antoniadou C, Chintiroglou CC (2006) Protozoan parasites in cultured mussels Mytilus galloprovincialis in the Thermaikos Gulf (north Aegean Sea, Greece). Dis Aquat Organ 70(3): 251–4.
  9. 9. Francisco CJ, Hermida MA, Santos MJ (2010) Parasites and symbionts from Mytilus galloprovincialis (Lamark, 1819) (Bivalves: Mytilidae) of the Aveiro Estuary Portugal. J Parasitol 96(1): 200–5.
  10. 10. Gomez DK, Baeck GW, Kim JH, Choresca CH Jr, Park SC (2008) Molecular detection of betanodaviruses from apparently healthy wild marine invertebrates. J Invertebr Pathol 97(3): 197–202.
  11. 11. Takahashi D, Shukla SK, Prakash O, Zhang G (2010) Structural determinants of host defense peptides for antimicrobial activity and target cell selectivity. Biochimie 92(9): 1236–41.
  12. 12. Bucki R, Leszczyńska K, Namiot A, Sokołowski W (2010) Cathelicidin LL-37: a multitask antimicrobial peptide. Arch Immunol Ther Exp (Warsz) 58(1): 15–25.
  13. 13. Tian C, Gao B, Fang Q, Ye G, Zhu S (2010) Antimicrobial peptide-like genes in Nasonia vitripennis: a genomic perspective. BMC Genomics 11: 187.
  14. 14. Yeaman MR, Yount NY (2007) Unifying themes in host defence effector polypeptides. Nat Rev Microbiol 5(9): 727–40.
  15. 15. Pujol N, Zugasti O, Wong D, Couillault C, Kurz CL, et al. (2008) Anti-fungal innate immunity in C. elegans is enhanced by evolutionary diversification of antimicrobial peptides. PLoS Pathog 4(7): e1000105.
  16. 16. Smith LC (2010) Diversification of innate immune genes: lessons from the purple sea urchin. Disease Models & Mechanisms 3: 1–6.
  17. 17. Buckley KM, Terwilliger DP, Smith LC (2008) Sequence variations in 185/333 messages from the purple sea urchin suggest posttranscriptional modifications to increase immune diversity. J Immunol 181(12): 8585–94.
  18. 18. Hamilton CE, Papavasiliou FN, Rosenberg BR (2010) Diverse functions for DNA and RNA editing in the immune system. RNA Biol 7(2): 220–8.
  19. 19. Linzmeier RM, Ganz T (2005) Human defensin gene copy number polymorphisms: comprehensive analysis of independent variation in alpha- and beta-defensin regions at 8p22-p23. Genomics 86(4): 423–30.
  20. 20. Chen Q, Hakimi M, Wu S, Jin Y, Cheng B, et al. (2010) Increased genomic copy number of DEFA1/DEFA3 is associated with susceptibility to severe sepsis in Chinese Han population. Anesthesiology 112(6): 1428–34.
  21. 21. Sperstad SV, Haug T, Blencke HM, Styrvold OB, Li C, Stensvåg K (2011) Antimicrobial peptides from marine invertebrates: Challenges and perspectives in marine antimicrobial peptide discovery. Biotechnol Adv. in press.
  22. 22. Charlet M, Chernysh S, Philippe H, Hétru C, Hoffmann J, et al. (1996) Innate immunity: Isolation of several cysteine-rich antimicrobial peptides from the blood of a mollusc, Mytilus edulis. J Biol Chem 271: 21808–21813.
  23. 23. Hubert F, Noel T, Roch P (1996) A member of the arthropod defensin family from edible Mediterranean mussels (Mytilus galloprovincialis). Eur J Biochem 15;240(1): 302–6.
  24. 24. Mitta G, Vandenbulcke F, Roch P (2000) Original involvement of antimicrobial peptides in mussel innate immunity. FEBS Lett 486(3): 185–90.
  25. 25. Venier P, De Pittà C, Bernante F, Varotto L, De Nardi B, et al. (2009) MytiBase: a knowledge base of mussel (M. galloprovincialis) transcribed sequences. BMC Genomics 10: 72.
  26. 26. Mitta G, Hubert F, Noël T, Roch P (1999) Myticin, a novel cysteine-rich antimicrobial peptide isolated from haemocytes and plasma of the mussel Mytilus galloprovincialis. Eur J Biochem 265(1): 71–8.
  27. 27. Mitta G, Vandenbulcke F, Hubert F, Salzet M, Roch P (2000) Involvement of mytilins in mussel antimicrobial defense. J Biol Chem 275(17): 12954–62.
  28. 28. Mitta G, Vandenbulcke F, Noël T, Romestand B, Beauvillain JC, et al. (2000) Differential distribution and defence involvement of antimicrobial peptides in mussel. J Cell Sci 113(Pt 15): 2759–69.
  29. 29. Costa MM, Dios S, Alonso-Gutierrez J, Romero A, Novoa B, et al. (2009) Evidence of high individual diversity on myticin C in mussel (Mytilus galloprovincialis). Dev Comp Immunol 33(2): 162–70.
  30. 30. Berkowitz BA, Bevins CL, Zasloff MA (1990) Magainins: a new family of membrane-active host defense peptides. Biochem Pharmacol 39(4): 625–9.
  31. 31. Li P, Chan HC, He B, So SC, Chung YW, et al. (2001) An antimicrobial peptide gene found in the male reproductive system of rats. Science 291(5509): 1783–5.
  32. 32. Cellura C, Toubiana M, Parrinello N, Roch P (2007) Specific expression of antimicrobial peptide and HSP70 genes in response to heat-shock and several bacterial challenges in mussels. Fish Shellfish Immunol 22(4): 340–50.
  33. 33. Li H, Toubiana M, Monfort P, Roch P (2009) Influence of temperature, salinity and E. coli tissue content on immune gene expression in mussel: results from a 2005–2008 survey. Dev Comp Immunol 33(9): 974–9.
  34. 34. Li H, Venier P, Prado-Alvárez M, Gestal C, Toubiana M, et al. (2010) Expression of Mytilus immune genes in response to experimental challenges varied according to the site of collection. Fish Shellfish Immunol 28(4): 640–8.
  35. 35. Venier P, Varotto L, Rosani U, Millino C, Celegato B, et al. (2011) Insights into the innate immunity of the Mediterranean mussel Mytilus galloprovincialis. BMC Genomics 12: 69.
  36. 36. Pallavicini A, Costa MM, Gestal C, Dreos R, Figueras A, et al. (2008) High sequence variability of myticin transcripts in hemocytes of immune-stimulated mussels suggests ancient host-pathogen interactions. Dev Comp Immunol 32(3): 213–26.
  37. 37. Parisi MG, Li H, Toubiana M, Parrinello N, Cammarata M, et al. (2009) Polymorphism of mytilin B mRNA is not translated into mature peptide. Mol Immunol 46(3): 384–92.
  38. 38. Mitta G, Hubert F, Dyrynda EA, Boudry P, Roch P (2000) Mytilin B and MGD2, two antimicrobial peptides of marine mussels: gene structure and expression analysis. Dev Comp Immunol 24(4): 381–93.
  39. 39. Sonthi M, Toubiana M, Pallavicini A, Venier P, Roch P (2011) Diversity of Coding Sequences and Gene Structures of the Antifungal Peptide Mytimycin (MytM) from the Mediterranean Mussel, Mytilus galloprovincialis. Mar Biotechnol (NY) 13: 857–867.
  40. 40. Yang YS, Mitta G, Chavanieu A, Calas B, Sanchez JF, et al. (2000) Solution structure and activity of the synthetic four-disulfide bond Mediterranean mussel defensin (MGD-1). Biochemistry 39(47): 14436–47.
  41. 41. Roch P, Yang Y, Toubiana M, Aumelas A (2008) NMR structure of mussel mytilin, and antiviral-antibacterial activities of derived synthetic peptides. Dev Comp Immunol 32(3): 227–38.
  42. 42. Pantzartzi C, Drosopoulou E, Yiangou M, Drozdov I, Tsoka S, et al. (2010) Promoter complexity and tissue-specific expression of stress response components in Mytilus galloprovincialis, a sessile marine invertebrate species. PLoS Comput Biol 6(7): e1000847.
  43. 43. Aceto S, Formisano G, Carella F, De Vico G, Gaudio L (2011) The metallothionein genes of Mytilus galloprovincialis: genomic organization, tissue expression and evolution. Mar Genomics 4(1): 61–8.
  44. 44. Romero A, Estévez-Calvar N, Dios S, Figueras A, et al. (2011) New insights into the apoptotic process in mollusks: characterization of caspase genes in Mytilus galloprovincialis. PLoS One 6(2): e17003.
  45. 45. Gracey AY, Chaney ML, Boomhower JP, Tyburczy WR, Connor K, et al. (2008) Rhythms of gene expression in a fluctuating intertidal environment. Curr Biol 18(19): 1501–7.
  46. 46. Banni M, Negri A, Mignone F, Boussetta H, Viarengo A, et al. (2011) Gene expression rhythms in the mussel Mytilus galloprovincialis (Lam.) across an annual cycle. PLoS One 6(5): e18904.
  47. 47. Lockwood BL, Somero GN (2011) Transcriptomic responses to salinity stress in invasive and native blue mussels (genus Mytilus). Mol Ecol 20(3): 517–29.
  48. 48. Manfrin C, Dreos R, Battistella S, Beran A, Gerdol M, et al. (2011) Mediterranean mussel gene expression profile induced by okadaic acid exposure. Environ Sci Technol 44(21): 8276–83.
  49. 49. Craft JA, Gilbert JA, Temperton B, Dempsey KE, Ashelford K, et al. (2010) Pyrosequencing of Mytilus galloprovincialis cDNAs: Tissue-Specific Expression Patterns. PLoS One 5(1): e8875.
  50. 50. Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci 74: 5463–67.
  51. 51. Thomas RK, Nickerson E, Simons JF, Jänne PA, Tengs T, et al. (2006) Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequencing. Nat Med 12(7): 852–5.
  52. 52. Wang C, Mitsuya Y, Gharizadeh B, Ronaghi M, Shafer RW (2007) Characterization of mutation spectra with ultra-deep pyrosequencing: application to HIV-1 drug resistance. Genome Res 17(8): 1195–201.
  53. 53. Benaglio P, Rivolta C (2010) Ultra high throughput sequencing in human DNA variation detection: a comparative study on the NDUFA3-PRPF31 region. PLoS One 5(9): e13071.
  54. 54. Lazzaro BP, Rolff J (2011) Immunology. Danger, microbes, and homeostasis. Science 332(6025): 43–4.
  55. 55. Schulenburg H, Kurtz J, Moret Y, Siva-Jothy M (2009) Introduction. Ecological immunology. Philos Trans R Soc Lond B Biol Sci 364: 3–14.
  56. 56. Gao B, Zhu S (2010) Identification and characterization of the parasitic wasp Nasonia defensins: positive selection targeting the functional region? Dev Comp Immunol 34(6): 659–68.
  57. 57. Zhang Y, He X, Li X, Fu D, Chen J, et al. (2011) The second bactericidal permeability increasing protein (BPI) and its revelation of the gene duplication in the Pacific oyster, Crassostrea gigas. Fish Shellfish Immunol 30(3): 954–63.
  58. 58. Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 11; 22(22): 4673–4680.
  59. 59. Rozen S, Skaletsky HJ (2000) Primer3 on the WWW for general users and for biologist programmers. In: Krawetz S, Misener S, editors. Bioinformatics Methods and Protocols: Methods in Molecular Biology. pp. 365–386. Humana Press, Totowa, NJ.
  60. 60. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17): 3389–402.
  61. 61. Binladen J, Gilbert MT, Bollback JP, Panitz F, Bendixen C, et al. (2007) The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS One 2(2): e197.
  62. 62. Parameswaran P, Jalili R, Tao L, Shokralla S, Gharizadeh B, et al. (2007) A pyrosequencing tailored nucleotide barcode design unveils opportunities for large-scale sample multiplexing. Nucleic Acids Res 35: 130.
  63. 63. Breslauer KJ, Frank R, Blöcker H, Marky LA (1986) Predicting DNA duplex stability from the base sequence. Proc Natl Acad Sci 83(11): 3746–50.
  64. 64. Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ (2009) Jalview Version 2-a multiple sequence alignment editor and analysis workbench. Bioinformatics 25(9): 1189–1191.