An Insight into the Sialome of the Lone Star Tick, Amblyomma americanum, with a Glimpse on Its Time Dependent Gene Expression

Hard ticks feed for several days or weeks on their hosts. Blood feeding is assisted by tick saliva, which is injected in the host skin regularly, alternating with blood ingestion. Tick saliva contains hundreds or thousands of different peptides and other bioactive compounds that assist feeding by inhibiting their hosts’ blood clotting, platelet aggregation, vasoconstriction, as well as pain and itching. Immunomodulatory and antimicrobial peptides are also found in tick saliva. Molecular characterization of tick salivary compounds, or its sialome (from the Greek sialos = saliva), helps identification of possible antigens that might confer anti-tick immunity, as well as identifying novel pharmacologically active compounds. Amblyomma americanum is a major nuisance tick in Eastern and Southern US, being a vector of Theileria and Ehrlichia bacteria to animals and humans. Presently we report an RNAseq study concerning the salivary glands of adult female A. americanum ticks, which involved sequencing of four libraries collected at different times of feeding. A total of 5,792 coding sequences were deduced from the transcriptome assembly, 3,139 of which were publicly deposited, expanding from the previously available 146 salivary sequences found in GenBank. A remarkable time-dependent transcript expression was found, mostly related to secretory products, supporting the idea that ticks may have several “sialomes” that are expressed at different times during feeding. The molecular nature of this sialome switching remains unknown. The hyperlinked spreadsheet containing the deduced coding sequences can be found at http://exon.niaid.nih.gov/transcriptome/Amb_americanum/Ambame-web.xlsx.


Introduction
Hard ticks feed for several days or weeks on their vertebrate hosts, acquiring a very large meal relative to their size, which may be over 100 fold their pre-engorgement weight. Tick feeding is usually divided into three phases associated with attachment, slow feeding and fast feeding phases [1]. Tick attachment involves the penetration of the skin by the chelicera and secretion of salivary cement that helps to fix the tick at the feeding site. Tick saliva also assists feeding through its complex mixture of compounds that disarms their hosts' hemostasis, which includes platelet aggregation, vasoconstriction and blood clotting, as well as their innate and acquired immunity arms of inflammation in the form of salivary anticomplement and lymphocyte inhibitors, among other activities; antimicrobial peptides have also been found in tick saliva, perhaps controlling microbial infection at the tick bite and in the ingested meal [2][3][4]. Tick saliva composition as revealed by sialotranscriptomes (from the Greek, sialo = saliva) indicates the existence of over 1,000 putative secreted peptides grouped into dozens of protein families [3]. It appears that during the prolonged feeding period of ticks, which may last for weeks, different members of related proteins are expressed at distinct times thus possibly avoiding their hosts' immune response [5].
The tick Amblyomma americanum is a three host tick, widespread in the Eastern and Southeastern United States, being a major nuisance and confirmed vector of tularemia and ehrlichiosis to humans and animals, and Theileria cervi to white deer [6,7]. Amblyomma americanum bites additionally are a possible cause of IgE antibody responses to α-gal in the southeastern United States [8]. The current known distribution of delayed anaphylactic reactions to red meat is similar to known A. americanum distribution. Previously, a sialotranscriptome of this tick has been reported consisting of 3,869 expressed sequence tags (EST's) from which 142 coding sequences (CDS) have been deposited to GenBank [9]. We presently report an extension of this sialome by assembling over 344 million Illumina sequences totaling over 34 billion nucleotides deriving from four salivary gland cDNA libraries made from A. americanum adult female ticks that were unfed, or fed for 12-48 h, 72-144 h and 7-11 days, allowing identification of the changing profile of transcription in salivary glands as feeding progresses. We added 3,139 deduced protein sequences to the Transcriptome Shotgun Annotation (TSA) portal of the National Center for Biotechnology Information (NCBI), providing for a discovery platform for proteomic studies that will serve to mine proteins of antigenic or pharmacological interest. We additionally describe the remarkable change in transcription levels of hundreds of transcripts according to time of feeding, most of a secreted nature, and discuss the possible mechanisms of sialome switching in feeding ticks.

Ethics statement
All animal experiments were performed in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. The protocol of tick blood feeding on the sheep was approved by the Institutional Animal Care and Use Committee of the University of Southern Mississippi (protocol # 10042001). All efforts were made to minimize animal suffering. room temperature and 90% relative humidity under 14/10 hour light/dark photoperiod before infestation on sheep. Ticks were fed on sheep and were either allowed to feed to repletion or removed between 12-264 hours, depending on the experimental protocol. Adult ticks were blood-fed specifically for this study and animal studies were performed in accordance with protocols approved by the Institutional Animal Care and Use Committee (IACUC) at the University of Southern Mississippi.

Tick salivary glands (SG) preparation
The unfed and blood-fed (12,18,24,36,48,72,120,144,168,192,216, 264 hrs post attachment) female adult ticks were dissected within four hours of after removal from the sheep. Tick salivary glands were dissected in ice-cold M-199 buffer [11,12]. After removal, salivary glands were washed gently in the same ice-cold buffer. The dissected SGs were stored immediately after dissection in RNAlater (Invitrogen, Carlsbad, CA, USA) prior to extracting mRNA.

RNA Preparation
Total RNA was isolated from salivary glands dissected from unfed and partially blood-fed adult female ticks using illustra RNAspin Mini RNA isolation kit (GE Healthcare, NJ, USA) following the manufacturer's protocol. The quality of the RNA samples was confirmed by labon-chip analysis using the 2100 Bioanalyzer (Agilent Technologies, Inc. Santa Clara, CA, USA). The total RNA quantity was determined by a Nanodrop, and the total RNA samples (A260/280> 1.

Illumina Sequencing
Total RNA samples were submitted to Otogenetics Corporation (Norcross, GA USA) for RNA-seq assays. Total RNA was treated with the Ambion GLOBINclear-Human Kit to deplete globin mRNA that may have resulted from the blood meal; the Clontech SmartPCR cDNA kit (Clontech Laboratories, Inc, Mountain View, CA USA, Cat#634926) was used to generate 1-2 μg of cDNA from 100 ng of total RNA, following the manual instructions that included 6 min at 68°C followed by 10 min at 70°C during the 14 cycles of the PCR reaction. Restriction digestion was used to remove adaptor sequences and the resulting cDNA was fragmented using Covaris (Covaris, Inc., Woburn, MA USA), profiled using Agilent Bioanalyzer, and subjected to Illumina library preparation using NEBNext reagents (New England Biolabs, Ipswich, MA USA). Agilent Bioanalzyer 2100 was used to assess the quality, quantity, and size distribution of the Illumina libraries. The libraries were then submitted for Illumina HiSeq2000 sequencing according to the standard operation. Paired-end 90 or 100 nucleotide reads were generated and checked for data quality using FASTQC (Babraham Institute, Cambridge, UK). The number of paired ended reads of 100 nt in length for each of the four libraries was: Group I, 106,249,828; Group II 50,190,144; Group III, 99,582,182 and Group IV, 88,887,224 reads. Sequence reads were deposited to the NCBI under Bioproject PRJNA218793, Biosample SAMN02352759 and read files SRR1740607 (unfed), SRR1740608 (12-48h), SRR1740609 (72-144 h) and SRR1740611 (7-11 days).

Bioinformatics Analysis
Assembly of all reads was done using the assemblers Abyss and Soapdenovo-Trans with every other kmer (-k program switch) parameter from 17 to 85 [13][14][15][16][17]. Resulting contigs were reassembled by a pipeline of blastn and cap3 assembler [18] as described earlier [19]. Coding sequences were extracted based on blastx [20] results deriving from several database matches, including a subset of the non-redundant protein database of the National Center for Biotechnology Information (NCBI) containing tick and other invertebrate sequences, as well as the Swissprot and Gene Ontology (GO) databases. The longest open reading frame was also extracted if it had a signal peptide indicative of secretion as evaluated by version 3.0 of the Sig-nalP program [21]. Reads from the four libraries were mapped back into the CDS by blastn with a word size of 25 and allowing one gap. Reads were mapped up to a maximum of five different CDS if the blast scores were the same for all matches. Read number differences for each CDS between libraries were compared by a X 2 test. For heat map display [22] of the CDS temporal expression, the number of reads for each treatment was normalized by multiplying it to the grand total number of reads deriving from all libraries and dividing the product by the total number of reads of the particular library, zeroes were replaced by one, then each row of data was average normalized, and then log(10) transformed. Heatmaps were produced with the programs gplots and heatmap.2 using R [23]. Differential gene expression clustering was done with the program Expander version 6.5 [24], using as input read fragments per thousand nucleotides per million (FPKM) data and the click1 algorithm. More details of the input are available in the results section.
All coding sequences and their reads are available for browsing in the supporting S1 File which also contains hyperlinks to several databases, as explained previously [19,25]. FPKM were calculated for each library [26]. Deduced coding sequences and their translations were deposited at DDBJ/EMBL/GenBank under the accession GBZX00000000. The version described in this paper is the first version, GBZX01000000.

Results and Discussion
Assembly of the 344,909,378 reads from 4 libraries allowed extraction of 5,792 CDS that mapped 143,158,375 reads. Following blast and rpsblast comparisons to several databases, these CDS were classified into 27 categories revealing 49% of the total reads mapping to CDS associated with putative secreted proteins followed by 18% of reads mapping to CDS associated with the protein synthesis machinery, a common finding in previous tick sialotranscriptomes (Table 1). Transposable elements and bacterial/algal sequences were also identified. The secreted CDS group was further classified into 8 major subdivisions as indicated in Table 2. Noticeable are the abundance of members of the Lipocalin group (200 CDS and 27% of the reads of the secreted class) as well as of the Kunitz group (over 100 CDS and > 8% of the reads). The subclass of Glycine rich proteins, which includes cement proteins, contains 46 CDS mapping 16% of the reads. Several previously orphan protein families were deorphanized and novel protein families were discovered. Consult previous publications for further information on the different protein families described in Table 2 [3,19]. The hyperlinked spreadsheet containing all CDS and their classifications can be found in the supporting S1 File. Three thousand and forty one CDS and their translations were deposited in the TSA portal of the NCBI.

Differential gene expression following feeding according to RNAseq data
Comparisons within each CDS of the number of mapped reads or FPKM values deriving from each of the four libraries (made from unfed ticks -UF, and from ticks feeding for 12-48 h, 72-144 h and 11 days) allowed for determining their temporal expression patterns which can be visualized in a heat map, and also on S1 File, worksheet named DiffExp. Fig 1 displays 836 CDS that are significantly not uniformly expressed among the 4 libraries, as assessed by a X 2 test, having a minimum FPKM of 5 in at least one of the 4 time points and showing a standard deviation of 1 or more regarding the log(10) transformation of their average-normalized data (Fig 1). Notice that this heat map was made with log(10) transforms, not the usual log(2) transformations. Indeed, the difference from the blue to red in the Fig. represents 10,000 fold variation. This exquisitely high differential expression of transcripts according to time post-feeding in ticks has been also observed for Ixodes ricinus [27]. Clusterization of these 836 coding sequences with the click algorithm of the program Expander [24], using as input the FPKM data, indicates five time dependent clusters, with exclusion of 19 singleton CDS, which could not be taken into any of the 5 clusters (Fig 2). These clusters clearly define transcripts that increase their expression towards the end of the blood meal (Fig 2A), or are overexpressed in the unfed stage (Fig 2B), at 12-48 h post attachment (Fig 2C), at all stages except unfed ( Fig  2D), or are overexpressed at 72-144 h post attachment, towards the middle of the blood meal (Fig 2E).
While within the complete CDS set of 5,792 sequences, 37% belong to the secreted class ( Table 1), 79% of the differentially expressed set belongs to this class (Table 2 and Fig 3). Of note, known multi-gene families expressed in tick sialotranscriptomes such as lipocalins (Fig  4), Kunitz (Fig 5), Cystatins, metalloproteases, TIL domain containing peptides, members of the 8.9 kDa family and evasins [see [3] for description of these families] have CDS that are remarkably differentially expressed (S1 File, worksheet "DiffExp"). Two cystatin genes from I. scapularis have been shown to change their expression reciprocally during feeding and it was postulated that these changes may reflect a form of antigenic variation of tick proteins exposed to their hosts [28]. Overall, this pattern of differential expression of secreted proteins is extreme and supports the idea that it might have evolved as a mechanism of antigenic variation of salivary proteins. Regarding highly differentially expressed housekeeping proteins, we identify a sulfotransferase that is up regulated toward the end of the feeding period (Aam-16214), which might be associated with detoxification of dopamine agonists of salivary secretion [29] (S1 File, worksheet "DiffExp"). Several glucose dehydrogenase enzymes peak at the first 12-24 hours and might be associated with the use of host glucose as an initial energy source. A FMRFamiderelated neuropeptide (Aam-32932) is maximally expressed in the unfed group, decreasing its levels more than 100 x on the subsequent time periods examined. Proteins associated with apoptotic messages, as expected, increase toward the end of the blood meal [30], as do heme lipoprotein precursors and vitellogenin, and many transporters, including one aquaporin (Aam-38627), sugar transporters and several monocarboxylate transporters. Some transposable elements also show remarkable time dependent expression levels, some being most expressed in unfed and others at the 11 day-derived libraries.  Among the highly differentially expressed CDS, 14 best matched bacterial proteins, 13 of which best matching Coxiella. All these bacterial transcripts were most expressed in unfed ticks. A. americanum has been found to harbor Coxiella sp. at high prevalence [31]. Somewhat surprising was the finding of 11 CDS best matching plant or algal sequences, all of which are most expressed in the 72-144 h library. Whether these sequences represent artifactual contamination of the sample, derive from apicomplexa parasites or some symbiotic association remains to be determined. For the record, plant-like sequences were also found in our previous A. maculatum sialotranscriptome [19], such as that recorded at http://www.ncbi.nlm.nih.gov/ protein/346465067?report = genbank&log$=prottop&blast_rank=1&RID=0DKR925H013.

Invariant transcripts
We also evaluated those CDS that did not change their expression in the various libraries. For that goal, considering only transcripts with FPKM > 10, we divided each contig FPKM value for each library by the average FPKM of all libraries and determined the standard deviation (SD) of these transformed values. Forty four transcripts had a SD < 0.05 indicating remarkable time invariance among the libraries ( Table 3). None of these transcripts belong to the secreted class, and include histones, ubiquitin related proteins, transcription factors and ribosomal proteins that are normally used to normalize transcript tissue expression in qPCR experiments. Further details regarding these transcripts can be found in the worksheet named "ConstantExpression" in the S1 File. This set of results serves to validate the differentially time expressed contigs described above, and for selection of PCR primers for control transcripts when comparing transcript expression using qPCR.

Conclusions and Perspectives
This work provides for an increase from 142 to 3,281 proteins publicly deposited from A. americanum annotated as salivary proteins, thus vastly increasing the repertoire of this tick sialome that might be useful as a platform for protein discovery using mass spectrometry protocols, and to help genome annotations (gene exon/intron borders) when this tick genome is sequenced and assembled.
As found before in other time-dependent tick sialotranscriptome studies, the tick salivary secretory repertoire changes dramatically as blood feeding progresses [27,32]. It appears that ticks have several "built-in" sialomes that are selected for expression as time progresses. Indeed, many transcripts have oscillations on the order of hundreds or thousands fold. This has been proposed before [27] as a mechanism of antigenic variation, as for example, an antigen secreted on Monday would be absent Friday and substituted by a functionally similar protein with different antigenic properties. The question remains whether the sialome switch occurs purely following an internal clock, or whether it responds to a feeding stress sensor. We have previously suggested that sialome switch in A. maculatum might follow epigenetic determined variations in histone acetylation, as serendipitously observed following the knock down of the selenocysteine elongation factor that led to an enormous increase of a sirtuin transcript involved in histone acetylation, and a dramatic change in transcripts associated with secreted salivary products [33]. This hypothesis might be tested by determining the sialotranscriptome of ticks  The Sialome of A. americanum fed artificially where no feeding stressors should change with the meal, compared to transcriptomes of "in vivo" feeding controls. Gene expression patterns displayed by ticks in experiments with naïve and successively infested hosts could also help to elucidate this matter. These sialome variations thus bring challenges to applied research related to tick vaccine antigen selection, as well as to basic research related to the underlining mechanism of sialome switching in ticks.
Supporting Information S1 File. Hyperlinked Excel spreadsheet containing the analyzed coding sequences. (XLSX)