Major ampullate silk gland transcriptomes and fibre proteomes of the golden orb-weavers, Nephila plumipes and Nephila pilipes (Araneae: Nephilidae)

Natural spider silk is one of the world’s toughest proteinaceous materials, yet a truly biomimetic spider silk remains elusive even after several decades of intense focus. In this study, next-generation sequencing was used to produce transcriptomes of the major ampullate gland of two Australian golden orb-weavers, Nephila plumipes and Nephila pilipes, in order to identify highly expressed predicted proteins that may act as co-factors in the construction of the final polymer. Furthermore, proteomics was performed by liquid chromatography tandem mass spectrometry to analyse the natural solid silk fibre of each species and to confirm that highly expressed predicted proteins within the silk gland are present in the final silk product. We assembled the silk gland transcriptomes of N. plumipes and N. pilipes into 69,812 and 70,123 contigs, respectively. Gene expression analysis revealed that silk gene sequences were among the most highly expressed, and we were able to procure silk sequences from both species in excess of 1,300 amino acids. However, some of the genes with the highest expression values could not be identified in our proteomic analysis. Proteome analysis of “reeled” silk fibres of N. plumipes and N. pilipes revealed 29 and 18 proteins, respectively, most of which were identified as silk fibre proteins. This study is the first silk gland-specific transcriptome and proteome analysis for these species and will assist in the future development of a biomimetic spider silk.


Introduction
Spider silk is an outstanding proteinaceous fibre that outperforms other natural and synthetic fibres in tensile strength analyses. In addition to fibre strength, spider silks can also be tough, lightweight, highly extensible/flexible, biodegradable and stable across a broad temperature range [1]. Studies have also demonstrated the biocompatibility of spider silk: spider silk can be implanted into living tissue without eliciting an immune response [2,3]. The potential of all [...] related cob-weavers [31][32][33]; and Correa-Garhwal et al. examined silk expression in male spiders of the family Theridiidae [34].
In Queensland, Australia, Nephila plumipes (Latreille, 1804) and Nephila pilipes (Fabricius, 1793) are commonly encountered golden orb-weaving species [35]. Studies on N. plumipes have reported on the mechanical properties of their silk, the relationship between protein secondary structure and primary amino acid sequence, populations in the urban environment, and copulation behaviour; however, no transcriptome data are available on the MA gland of this species, and only a handful of silk sequences are available for N. pilipes in online databases [36][37][38][39][40][41][42]. In this study, we report on a silk gland-specific transcriptome analysis for these golden orb-weaving species, N. plumipes and N. pilipes. Furthermore, proteomic analysis of silk fibres from these species was undertaken to compare predominant proteins within the silk fibre to predominant proteins expressed within the MA gland transcriptome. This study found that the silk gland transcriptomes of N. plumipes and N. pilipes could be assembled into contiguous sequences, and that proteome analysis of "reeled" silk fibres could confirm, and be used to mine for, spidroins within the transcriptomes. Novel proteins, which may be important constituents in the structure of spider silk, were also discovered in the silk proteome.

Animals and preparation of RNA
Golden orb spiders of the species Nephila plumipes and Nephila pilipes were collected from the Sunshine Coast (26˚41'43.1"S 153˚05'56.7"E) and the Cooloola Coast (25˚54'02.8"S 153˚05'25.6"E) regions of Queensland, Australia, between the months of March and June 2013, and February and April 2016. Specimens were dissected immediately after sacrifice and each major ampullate gland was removed and either stored in RNAlater (Invitrogen) or immediately frozen in liquid nitrogen.
RNA was isolated from a pair of MA glands from an individual spider of each species using two different methods: the PicoPure RNA Isolation kit (Arcturus) and TRIzol Reagent (Invitrogen), according to the manufacturers' instructions, with the following changes. For the PicoPure RNA extraction, an additional RNA purification step was performed using LiCl precipitation. Total RNA (30 μL) was divided into two aliquots for each species, and the volume of each was restored to 30 μL with 15 μL of RNase-free water so that samples were in duplicate. All RNA samples were incubated in 2.5 M LiCl (9.4 μL of 8 M LiCl) and 2.5 volumes (98.5 μL) of 100% EtOH for 1.5 h at -20˚C. After incubation, the RNA was pelleted at 20,000 x g for 20 min at 4˚C and the supernatant discarded. The pellet was washed with 70% EtOH and spun at 20,000 x g for 10 min at 4˚C. The pellet was dried for 10 min at 37˚C. The RNA concentration of the second aliquot from each species was estimated spectrophotometrically (NanoDrop 2000) after rehydration with 30 μL RNase-free water to ensure the OD260/OD280 ratio was between 1.8 and 2.0. Dried samples of the first aliquot were stored at -80˚C until shipment to BGI (China) for de novo RNA sequencing and bioinformatics.
For the TRIzol RNA extraction, all centrifugation steps were performed at 12,000 rpm at 4˚C. RNA concentration and purity were estimated spectrophotometrically (NanoDrop 2000) to ensure the OD260/OD280 ratio was between 1.8 and 2.0. Resuspended RNA was stored at -80˚C until shipment to AGRF (Australia) for de novo RNA sequencing.

Next-generation sequencing, assembly and annotation
Total RNA from paired MA glands from individuals of both spider species was provided to BGI for de novo RNA sequencing and bioinformatics using an Illumina HiSeq 2000, and to AGRF for library construction and paired-end sequencing using an Illumina HiSeq 2500 platform. Raw sequences were assembled into contigs using the CLC Genomics Workbench 9 software (default settings). Protein-coding regions were determined using the open reading frame (ORF) predictor [http://bioinformatics.ysu.edu/tools/OrfPredictor.html]. Blast2GO was utilised for functional annotation of protein-coding regions against the NCBI nr database [43]. Relative expression of genes in the transcriptome was determined based on reads per kilobase of transcript per million mapped reads (RPKM) values, using the de novo RNA-seq tool of the CLC Genomics Workbench 9 software; transcripts per million (TPM) values are also reported.
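As an illustration of how the two expression measures relate, the following minimal Python sketch computes RPKM and TPM from mapped read counts and contig lengths. It is not the CLC Genomics Workbench implementation; the function names, contig names and counts are hypothetical and chosen only for the example.

```python
def rpkm(counts, lengths):
    """Reads per kilobase of transcript per million mapped reads.

    counts  -- dict: contig name -> number of mapped reads
    lengths -- dict: contig name -> contig length in nucleotides
    """
    total_mapped = sum(counts.values())
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped reads)
    return {c: counts[c] * 1e9 / (lengths[c] * total_mapped) for c in counts}


def tpm(counts, lengths):
    """Transcripts per million: length-normalise first, then scale to 1e6."""
    rate = {c: counts[c] / lengths[c] for c in counts}
    rate_sum = sum(rate.values())
    return {c: r * 1e6 / rate_sum for c, r in rate.items()}


# Hypothetical example: two contigs with the same read density.
example_counts = {"contig_a": 100, "contig_b": 300}
example_lengths = {"contig_a": 1000, "contig_b": 3000}
example_rpkm = rpkm(example_counts, example_lengths)
example_tpm = tpm(example_counts, example_lengths)
```

Unlike RPKM, TPM values always sum to one million within a sample, which makes them somewhat easier to compare between libraries of different depth.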
Spidroin sequences of the genus Nephila were obtained from NCBI, compiled and used in a BLASTp search to identify homologous proteins derived from the N. plumipes and N. pilipes gland transcriptomes. Matches were manually assessed to determine conservation. Further, a "spidroin-like" database was created by examining the six translated nucleotide reading frames for the following spidroin-like amino acid motifs: AAAAA, GGYGG, GYGPG, GQQGP, and GAGAGG. Finally, CD-search [https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi?] was utilised to identify specific hits to spidroin protein domains and superfamilies.
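The motif-based mining step can be sketched as follows. This is a minimal Python illustration, assuming the standard genetic code and exact substring matching; it is not the script used in this study, and the function names and the test sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Spidroin-like motifs listed in the text.
MOTIFS = ("AAAAA", "GGYGG", "GYGPG", "GQQGP", "GAGAGG")


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    # Codons containing ambiguous bases are translated as 'X'.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Yield (strand, offset, protein) for all six reading frames."""
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            yield strand, offset, translate(s[offset:])


def motif_hits(seq):
    """Return {(strand, offset): [motifs found]} for frames with a hit."""
    hits = {}
    for strand, offset, protein in six_frames(seq):
        found = [m for m in MOTIFS if m in protein]
        if found:
            hits[(strand, offset)] = found
    return hits


# Hypothetical contig fragment encoding G-Y-G-P-G in the first forward frame.
example_hits = motif_hits("GGTTATGGTCCTGGT")
```

Contigs with at least one hit would then be candidates for a "spidroin-like" database; a real pipeline would probably also require a minimum number of repeats to reduce chance matches.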

Spider silk preparation for proteomics
Spider silk threads were obtained by hand-reeling silk directly from live N. plumipes and N. pilipes spiders. Proteins were extracted from the silk by homogenisation in 100 μL protein extraction buffer (7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 65 mM DTT), to which 100 μL of 30 mM Tris (pH 7.5) was added. The homogenate was vortexed and pulse-centrifuged several times to mix the extraction buffer and Tris solution. The homogenate was then vortexed for 15 min, pulse-centrifuged, and incubated in a sonicating water bath at ambient temperature for 20 min. Undissolved material was pelleted by centrifugation at 12,000 x g for 8 min, and the supernatant containing the dissolved proteins was collected and stored at -80˚C.

Sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) and in-gel trypsin digestion
The spider silk proteins were separated by SDS-PAGE using a 4-20% polyacrylamide gradient gel (Amersham ECL Gel, GE Healthcare Life Sciences) according to the manufacturer's instructions. Samples were prepared 1:1 with 2x SDS sample buffer and incubated at 95˚C for 5 min, then cooled to room temperature prior to pulse centrifugation. Samples were run for 1 h at 160 V (45 mA). Separated proteins were visualised with Coomassie Brilliant Blue (Bio-Rad) according to the staining process recommended by GE Healthcare. Upon completion of electrophoresis, the proteins were precipitated with fixing solution [400 mL of EtOH, 100 mL of acetic acid, 500 mL of distilled water (DI)] for 30 min, followed by immersion in staining solution (1 tablet of PhastGel Blue-R 350 in 400 mL of destaining solution; destaining solution: 250 mL of EtOH, 80 mL of acetic acid, 670 mL DI) for 10 min. The gel was subsequently preserved by immersion in preserving solution (25 mL of 87% (v/v) glycerol in DI, 225 mL destaining solution) for 30 min. Protein sizes were estimated using a Pierce Blue Molecular Weight marker (Thermo Scientific).
Excised protein bands were washed in 50 mM NH4HCO3 at room temperature for 5 min, and then de-stained by incubating gel pieces for 30 min in 50 mM NH4HCO3 in 30% acetonitrile in a sonicating water bath to remove Coomassie Brilliant Blue. Gel pieces were subsequently subjected to in-gel trypsin digestion using the method described elsewhere [44]. The samples were reconstituted in 0.1% formic acid and stored at -20˚C until mass spectrometry.

NanoLC tandem TripleTof MS/MS analyses and protein identification
The spider silk extracts were analysed by LC-MS/MS on a Shimadzu Prominence Nano HPLC (Japan) coupled to a Triple TOF 5600 mass spectrometer (AB SCIEX, Canada) equipped with a nano electrospray ion source. Each extract (7 μL) was injected onto a 50 mm x 300 μm C18 trap column (Agilent Technologies, Australia) at 30 μL/min. The samples were de-salted on the trap column for 5 min using 0.1% formic acid (aq) at 30 μL/min. The trap column was then placed in-line with the analytical nano HPLC column, a 150 mm x 75 μm 300SBC18, 3.5 μm (Agilent Technologies, Australia), for mass spectrometry analysis. A linear gradient of 1-40% solvent B over 35 min at a flow rate of 300 nL/min, followed by a steeper gradient from 40% to 80% solvent B over 5 min, was used for peptide elution. Solvent B was held at 80% for 5 min to wash the column and returned to 1% for equilibration prior to the next sample injection. Solvent A consisted of 0.1% formic acid (aq) and solvent B contained 90/10 acetonitrile/0.1% formic acid (aq). The ionspray voltage was set to 2400 V, declustering potential to 100 V, curtain gas flow to 25, nebuliser gas 1 (GS1) to 12 and the interface heater to 150˚C. The mass spectrometer acquired 500 ms full scan TOF-MS data followed by 20 by 50 ms full scan product ion data in an Information Dependent Acquisition mode. Full scan TOF-MS data were acquired over the mass range 350-1800 and product ion MS/MS data over the range 100-1800. Ions observed in the TOF-MS scan exceeding a threshold of 100 counts and with a charge state of +2 to +5 were set to trigger the acquisition of product ion MS/MS spectra of the resultant 20 most intense ions. The data were acquired and processed using Analyst TF 1.5.1 software (AB SCIEX, Concord, Canada).
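The precursor selection rule described above can be illustrated with a short Python sketch. It assumes a simplified survey scan represented as a list of ions and does not model instrument features that are not stated in the text (for example dynamic exclusion or scan overhead); the class, function and parameter names are hypothetical and are not part of the Analyst TF software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precursor:
    mz: float          # mass-to-charge ratio observed in the TOF-MS scan
    intensity: float   # detector counts
    charge: int        # assigned charge state


def select_precursors(survey, min_counts=100, charges=range(2, 6),
                      top_n=20, mz_range=(350.0, 1800.0)):
    """Pick the ions that would trigger MS/MS in one acquisition cycle."""
    eligible = [p for p in survey
                if p.intensity > min_counts          # exceeds 100 counts
                and p.charge in charges              # charge +2 to +5
                and mz_range[0] <= p.mz <= mz_range[1]]
    # Keep the top_n most intense eligible ions.
    return sorted(eligible, key=lambda p: p.intensity, reverse=True)[:top_n]


def max_cycle_time_ms(survey_ms=500, n_msms=20, msms_ms=50):
    """Upper bound on cycle time when all MS/MS slots are used."""
    return survey_ms + n_msms * msms_ms


# Hypothetical survey scan with four ions.
example_survey = [
    Precursor(mz=500.3, intensity=1200.0, charge=2),
    Precursor(mz=620.8, intensity=90.0, charge=3),    # below threshold
    Precursor(mz=710.4, intensity=800.0, charge=1),   # singly charged
    Precursor(mz=905.5, intensity=3000.0, charge=4),
]
example_selected = select_precursors(example_survey)
```

With the stated settings, a full cycle of one survey scan plus 20 product ion scans takes at most about 1.5 s, ignoring any instrument overhead.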
The LC-MS/MS data were imported into PEAKS Studio (Bioinformatics Solutions Inc., Waterloo, ON, Canada, version 7.0) with the assistance of MS Data Converter (Beta 1.3, http://sciex.com/software-downloads-x2110). The database search included our own Nephila sp. transcriptome-derived protein databases, our "spidroin-like" database made by motif-searching the six translated nucleotide reading frames, and non-redundant protein databases (GenBank and UniProt). De novo sequencing of peptides, database searching and characterisation of specific PTMs were used to analyse the raw data; the false discovery rate (FDR) was set to 1%, and [-10 × log(p)] was calculated accordingly, where p is the probability that an observed match is a random event. PEAKS used the following parameters: (i) precursor ion mass tolerance, 0.1 Da; (ii) fragment ion mass tolerance, 0.1 Da (the error tolerance); (iii) tryptic enzyme specificity with two missed cleavages allowed; (iv) monoisotopic precursor mass and fragment ion mass; (v) a fixed modification of cysteine carbamidomethylation; and (vi) variable modifications including lysine acetylation, deamidation of asparagine and glutamine, oxidation of methionine and conversion of glutamic acid and glutamine to pyroglutamate.
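To make the score scale concrete, the sketch below converts between a match probability p and the corresponding score, assuming the logarithm is base 10 (as the "-10lgP" notation suggests). The function names are hypothetical and this is not PEAKS code.

```python
import math


def score_from_p(p):
    """Return -10 * log10(p) for a match probability 0 < p <= 1."""
    return -10.0 * math.log10(p)


def p_from_score(score):
    """Inverse transformation: probability implied by a given score."""
    return 10.0 ** (-score / 10.0)


# Under this assumption, a score of 20 corresponds to p = 0.01
# and a score of 30 to p = 0.001.
example_p = p_from_score(20.0)
example_score = score_from_p(0.001)
```

Higher scores therefore indicate matches that are less likely to be random events, with every 10-point increase corresponding to a tenfold decrease in p.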

Results and discussion
Nephila plumipes and N. pilipes (Fig 1A) were collected and the MA glands were removed (Fig 1B) for RNA isolation and sequencing. MA gland reference transcriptomes were constructed for each species by combining next-generation sequencing (NGS) data from sequencing runs produced in 2013 with data produced in 2016 (GenBank Accession: SRR6747912, SRR6747911). The combined MA gland transcriptomes for N. plumipes and N. pilipes produced 42,351,802 and 46,060,170 total paired reads, respectively. Paired reads were assembled into 69,812 contiguous sequences (contigs) for N. plumipes with an average length of 685 nucleotides, and into 70,123 contigs for N. pilipes with an average length of 672 nucleotides. ORF prediction returned 67,862 and 67,942 sequences, and Blast2GO allowed annotation of approximately 25% and 29% of all transcripts plus identification of 48 and 35 spidroin contigs, for N. plumipes and N. pilipes, respectively. The total spidroin count was increased to 73 and 60 spidroins for N. plumipes and N. pilipes, respectively (Tables 1 and 2), upon mining for spidroin sequences identified within each corresponding transcriptome-derived silk proteome (S1 and S2 Tables), by analysing the six translated nucleotide reading frames for spidroin-like repeat motifs, and by BLAST searching unique silk sequences found in the related N. clavipes genome reported by Babb et al. [27].
A recent paper reporting on the genome and tissue transcriptomes of the golden orb spider, N. clavipes, identified 28 spidroins [27]. Our transcriptomes included partial sequences with homology to several of the unique spidroins found in this closely related species, including AgSp-a, AgSp-c, Sp-5803, Sp-8175, Sp-74867, and MaSp-c, -d, -g and -h (Tables 1 and 2). In N. plumipes, our study uncovered numerous matches to the N. clavipes spidroin Sp-907, potentially non-overlapping contigs aligning to different regions of the same gene. In N. pilipes, our study found sequences with homology to MiSp-a and -d, and Sp-1339. Our longest assembled spidroin contigs in both species matched the N. clavipes glue-like aggregate spidroin, AgSp-c. Aggregate spidroins, which form evenly spaced droplets along flagelliform prey capture threads, have also been characterised in three other species from the family Araneidae and three species from the family Theridiidae [45,46]. These spidroins vary greatly in length among species, and it appears we have recovered a full-length aggregate spidroin (1609 aa) from N. pilipes. However, aggregate spidroins were not evident in the proteome of either species.
The N. pilipes and N. plumipes silk proteins were obtained after several predominant bands were excised from the Coomassie-stained SDS-PAGE gel (Fig 2), followed by trypsin digestion and LC-MS/MS analysis. This analysis identified 29 and 18 proteins for N. plumipes and N. pilipes, respectively (Tables 3 and 4; -10lgP ≥ 20.00, ≥ 2 peptide matches). Of the N. plumipes silk proteins mapped to the transcriptome, 24 were spidroins, mostly MaSp1-like spidroins, along with MiSp-like spidroins and the novel N. clavipes spidroin Sp-907, confirming the abundance of this spidroin in the transcriptome. Besides spidroins, a cuticle protein and a coiled-coil domain-containing protein were identified in the proteome, along with two proteins that could not be identified. Cuticle proteins have been described previously in the MA gland [7,47]. The coiled-coil domain protein found within the transcriptome and proteome of N. plumipes is potentially interesting, as coiled-coil structures are also found in the silk of the Japanese yellow hornet, Vespa simillima [48]. In N. pilipes, silk proteins accounted for 13 of the 18 proteins mapped back to the corresponding transcriptome. Again, most of these spidroins were MaSp-like and one was a MiSp; the remaining 5 proteins could not be annotated at this stage.
Beyond the matches made to the transcriptome, a further 2,420 and 2,658 de novo only peptides from the silk of N. plumipes and N. pilipes, respectively, were identified with high confidence (average local confidence above 70) that did not match any sequences within the transcriptome. This de novo dataset of unmatched potential proteins was BLASTed against NCBI protein databases; however, the focus of this study is on those proteins relevant to their corresponding transcriptomes. Unlike a genome, a transcriptome can only provide us with genes transcribed at the time of RNA isolation, and this may partly explain the discrepancy between matched and de novo peptides. A further explanation is that the hand-reeled silk also contains silk from different silk glands. Each silk gland ends at its own spigot on the surface of a spinneret. It is possible that as the silk thread passes other spigots during collection, it also collects fibres from other silk glands. However, the spigots closest to the major ampullate spigot on the anterior lateral spinnerets produce pyriform spidroins, and there was no evidence of these spidroins in the proteome [4]. Further, our gel-based extraction method might have missed proteins with relatively low molecular weight or low abundance, such as the cysteine- [...] [8].
Quantitative analyses were undertaken based on reads per kilobase of transcript per million mapped reads (RPKM) values by mapping the 2013 and 2016 data back to a combined de novo reference transcriptome. The 50 most highly expressed sequences of N. pilipes and N. plumipes were manually selected for further annotation (Tables 5 and 6). These abundant sequences were matched to sequences found in the NCBI or UniProt public protein databases (accessed Oct-Dec 2017). Spidroins were, as expected, among the most highly expressed sequences of both datasets, numbering 23 and 26 spidroins for N. plumipes and N. pilipes, respectively.
In both species, major and minor ampullate, and tubuliform spidroins were highly expressed in the MA gland. Interestingly, the N. plumipes sequence with the highest RPKM value could not be characterised based on BLAST protein prediction. Uncharacterised highly expressed sequences will be selected for functional annotation in future work. This study found that the MA gland alone produces six of the seven classes of silk products: MA, minor ampullate, flagelliform, tubuliform (also at times referred to as cylindriform silk), aciniform and aggregate silk products. Several other studies have also found multiple spidroin types expressed in a single gland [6,7,49,50]. The only silk product not found to be produced by the MA gland of either N. pilipes or N. plumipes was pyriform adhesive silk, which is used to attach threads to objects and to each other [51]. The processing duct of the pyriform gland is shorter than most other ducts, suggesting that other silks require more extensive processing, which may explain why this silk is absent from the MA gland transcriptome. However, pyriform products are the least intensively studied of the spider silk repertoire, and the lack of pyriform annotation in our MA databases may reflect poor representation in the public databases at the present time [51,52].
Interestingly, in both N. plumipes and N. pilipes, tubuliform spidroins were found to be highly expressed in the 2016 MA gland transcriptomes yet not expressed in the 2013 transcriptomes (see Tables 1 and 2). Tubuliform silk is produced during reproduction for the formation of egg sacs [4,53]. While no spiders were gravid at the time of dissection, it is possible they were collected and dissected just after the production of an egg sac, or just prior to vitellogenesis, and the 2016 transcriptomes reflect this in their relatively high expression of tubuliform silk transcripts. Expression of tubuliform spidroins in the MA gland has been previously noted in transcriptomic studies [50]. Vasanthavada et al. suggest that spiders can downregulate the production of various silks to maintain MA spidroin synthesis as an energetic trade-off, and Larracas et al. suggest that female spiders may shift synthesis of MA gland spidroins to tubuliform spidroins during the reproductive stage [6,54,55]. Our study did not find tubuliform spidroins in the silk proteome; however, the silk was collected and digested at the same time as the 2013 transcriptomes. It would be interesting to see if tubuliform spidroins could be found within the dragline silk of spiders prior to, during, or just post egg sac production.
Proteins from the corresponding transcriptome with 2 or more peptide matches were BLAST annotated (E-value cut-off 10^-3). Example matching peptides are shown (full list, see S3 Table). PTM, posttranslational modifications. https://doi.org/10.1371/journal.pone.0204243.t003
This study is the first silk gland-specific transcriptome and proteome analysis in these Australian golden orb-weaving species. Major ampullate transcriptome analysis procured sequences for all silk types thus far known for golden orb spiders with the exception of pyriform adhesive silk. We found differential expression of tubuliform silk in the MA gland, suggesting a greater role for this gland in producing tubuliform silks during spider reproduction. The silk proteome analysis resulted in 29 and 18 proteins for N. plumipes and N. pilipes that match their corresponding MA gland transcriptomes.
Supporting information S1