De Novo Transcriptome Analysis and Detection of Antimicrobial Peptides of the American Cockroach Periplaneta americana (Linnaeus)

Cockroaches are surrogate hosts for microbes that cause many human diseases. In spite of their generally destructive nature, cockroaches have recently been found to harbor potentially beneficial and medically useful substances such as drugs and allergens. However, genomic information for the American cockroach (Periplaneta americana) is currently unavailable; therefore, transcriptome and gene expression profiling is needed as an important resource to better understand the fundamental biological mechanisms of this species, which would be particularly useful for the selection of novel antimicrobial peptides. Thus, we performed de novo transcriptome analysis of P. americana that were or were not immunized with Escherichia coli. Using an Illumina HiSeq sequencer, we generated a total of 9.5 Gb of sequences, which were assembled into 85,984 contigs and functionally annotated using Basic Local Alignment Search Tool (BLAST), Gene Ontology (GO), and Kyoto Encyclopedia of Genes and Genomes (KEGG) database terms. Finally, using an in silico antimicrobial peptide prediction method, 86 antimicrobial peptide candidates were predicted from the transcriptome, and 21 of these peptides were experimentally validated for their antimicrobial activity against yeast and Gram-positive and Gram-negative bacteria by a radial diffusion assay. Notably, 11 peptides showed strong antimicrobial activities against these organisms and displayed little or no cytotoxic effects in the hemolysis and cell viability assays. This work provides prerequisite baseline data for the identification and development of novel antimicrobial peptides, which is expected to provide a better understanding of the phenomenon of innate immunity in similar species.


Introduction
Cockroaches (order: Dictyoptera; suborder: Blattaria) are among the known primitive winged insects, with an extremely high diversity of ~4,000 species worldwide. Thirty of these species are considered household insects [1]. The American cockroach Periplaneta americana (Linnaeus) is a synanthropic pest that generally inhabits cosmopolitan to urban areas. Cockroaches survive in warm weather with high moisture conditions as well as in environments that are unfavorable for humans (i.e., sewers and other human-made habitats) [2]. Accordingly, cockroaches physically transmit several human pathogens and allergens from the environment to human habitations [3]. However, the cockroach has also been a beneficial insect for humans, serving as an established model organism for basic research in the fields of neurobiology [4,5], cardiophysiology [6], blood clotting mechanisms [7], gut microbial diversity [8,9], and the discovery of allergenic proteins [10].
Innate immunity is the first line of defense of multicellular organisms against invading microbes such as bacteria, fungi, and viruses. Multicellular organisms thus adapt to microbes via their innate immune system through the rapid synthesis and release of various small peptides known as antimicrobial peptides (AMPs) [11].
In insects, AMPs are synthesized in the fat body and various epithelia and are secreted into the hemolymph. Through the hemolymph, AMPs are directly supplied to the whole body in the context of microbial infection [12]. Moreover, insect autophagy also actively participates along with the innate immunity to evade microbial infections, and these mechanisms have been extensively studied in the Drosophila model. Furthermore, components of the signal transduction cascade, such as pathogen-associated molecular patterns (PAMPs) and pattern recognition receptors (PRRs), are also activated in response to infection [13,14]. These combinatorial molecular mechanisms serve to completely protect the insect/host from microbial infection. Since the first insect AMPs were isolated from Hyalophora cecropia in 1980 [15], 259 insect AMPs have been functionally annotated and classified according to their structural and physicochemical properties [16]. Furthermore, the effects of AMPs on innate immunity, and their corresponding molecular and metabolite/peptide synthesis mechanisms, differ according to their degrees of evolutionary conservation [17].
AMPs have been exploited and developed into effective antibiotic and antimicrobial drugs from a diversity of insect species [18]. In particular, Lee et al. [19] suggested that cockroaches are a good source of antimicrobial agents. They further found that the cockroach (P. americana) brain tissues showed potent broad-spectrum antimicrobial activities, including against antibiotic-resistant bacteria [20]. AMPs are low-molecular-weight and heat-stable proteins, which are typically cationic and often comprise fewer than 100 amino acid residues. Despite the large number of AMPs that have been identified from different insect species, little information on their potential applications is available. In general, AMPs are predicted through in silico approaches based on their derived characteristics, i.e., similarity in physicochemical and structural properties to known AMPs [10,21]. Several reports have indicated that AMPs can be expressed either constitutively or can be induced upon pathogenic challenge [22]. Alternatively, massive developments in high-throughput sequencing technologies have presented a more efficient method for genomic characterization of a species [23]. However, based on the few studies conducted to date, the genetic resources of cockroaches are scarce [1,21,24,25]. Recently, the transcriptome of the German cockroach (Blattella germanica) was reported using next-generation sequencing (NGS) technology, which led to the identification of genes that putatively encode detoxification enzyme systems, insecticide targets, key components in systematic RNA interference, and the immunity and chemoreception pathways [26]. Therefore, identifying new insect AMPs may provide insight into natural interactions between pathogens and proteins. In the present study, we sequenced the P. americana transcriptome using an NGS platform. Libraries representing control and Escherichia coli-immunized P. americana were systematically analyzed for gene expression profiles along with AMP and allergenic protein prediction. This transcriptome data set and AMPs provide a solid baseline for further functional analysis in P. americana.

NGS of the cockroach transcriptome
To obtain high-throughput transcriptome data of P. americana, we used Illumina-based NGS. Total RNA was isolated from E. coli-immunized (18 h after injection) and non-immunized (control) adults. Total RNA was quantitated using a Nanodrop spectrophotometer (Thermo Scientific), and its quality was assessed with the RNA 6000 Nano assay kit (Agilent) and Bioanalyzer 2100 (Agilent). NGS libraries were generated from 1 μg of total RNA using the TruSeq RNA Sample Prep Kit (Illumina), according to the manufacturer's protocol. In brief, the poly-A-containing RNA molecules were purified using poly-T oligo-attached magnetic beads. After purification, the total poly A+ RNA was fragmented into small pieces using divalent cations under elevated temperature. The cleaved mRNA fragments were reverse-transcribed into first-strand cDNA using random primers. Short fragments were purified with a QiaQuick polymerase chain reaction (PCR) extraction kit and resolved with elution buffer for end repair and addition of poly (A). Subsequently, the short fragments were connected with sequencing adapters. Each library was separated by adjoining distinct MID tags. The resulting cDNA libraries were then paired-end sequenced ( [28]. Genes showing a minimum of 2-fold up- and down-regulation were filtered from the isoform expression dataset using |log2(fold-change [FC])| ≥ 1, and the Gene Ontology (GO) annotations were classified using the WEGO webserver [29].

AMP prediction and classification
The deduced amino acid sequences were subjected to AMP prediction analysis by using a modified bioinformatics strategy. Peptide characteristics of molecular propensity (based on physicochemical properties) and aggregation propensity (in vitro and in vivo) were determined, and AMP prediction was established using a predefined bioinformatics strategy with parameters defined previously [30]. In addition to this previous strategy, the allergenic propensity of the peptides was also determined using Allerdictor software [31]. Finally, the AMPs were mapped with the CAMP database [32] and classified as novel and known AMPs. To classify the predicted AMPs as novel, sequences were matched to the CAMP database by using two programs: PatMatch (no mismatch) for sequences below the 20 bp length cutoff [33] and BLASTP (1E-05) for sequences above it. The BLAST results were filtered at a similarity score cutoff of 90. Sequences with observed similarity at the given cutoff values were considered as known AMPs, and others were considered as novel AMPs. Finally, the novel and the known AMPs were manually validated for continuous stretches of amino acids to account for the low-complexity regions and assembly artifacts.
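The known-versus-novel decision described above can be sketched as a small rule. This is an illustrative sketch only, not the authors' pipeline: it assumes the PatMatch and BLASTP searches have already been run, and the names `BlastHit` and `classify_amp`, as well as the handling of a sequence exactly at the length cutoff, are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlastHit:
    """One BLASTP hit against the reference AMP database (hypothetical container)."""
    evalue: float
    similarity: float  # percent similarity reported for the alignment


def classify_amp(length: int,
                 exact_match: bool,
                 blast_hits: List[BlastHit],
                 length_cutoff: int = 20,
                 max_evalue: float = 1e-5,
                 min_similarity: float = 90.0) -> str:
    """Return "known" or "novel" for one predicted peptide.

    Short sequences rely on an exact (no-mismatch) pattern match;
    longer sequences rely on filtered BLASTP hits.
    Assumption: a sequence exactly at the cutoff is treated as short.
    """
    if length <= length_cutoff:
        return "known" if exact_match else "novel"
    for hit in blast_hits:
        if hit.evalue <= max_evalue and hit.similarity >= min_similarity:
            return "known"
    return "novel"
```

Keeping the thresholds as keyword arguments makes it easy to check how sensitive the novel/known split is to the chosen cutoffs.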

Peptide synthesis
All putative and novel peptides were selected based on the various prediction tools used previously [28]. The peptides were synthesized using solid-phase peptide synthesis methods at AnyGen Co. Ltd. (Gwangju, Korea). Then, each peptide was purified to >95% by high-performance liquid chromatography, and the purity was confirmed by mass spectrometry analysis. The peptides were dissolved in acidified distilled water (0.01% acetic acid) and stored at −20°C until used in subsequent experiments.

Antimicrobial activity assay
The radial diffusion assay was performed to test the antimicrobial activities of peptides, as described previously with slight modifications [34]. In brief, bacteria and yeast strains were grown to the mid-logarithmic phase in TSB at 37°C and then washed twice with 10 mM Tris-HCl (containing 5 mM glucose, pH 7.4). A total of 4 × 10⁶ CFU was added to 10 mL of an underlay agarose gel [0.03% (w/v) TSB, 1% (w/v) agarose (Sigma, USA), and 0.02% (v/v) Tween 20 (Sigma, USA) in 10 mM Tris-HCl]. The underlay gel was poured into a 100-mm INTEGRID™ Petri dish. After agarose solidification, 3-mm-diameter wells were punched and 5 μL of each peptide solution was added to each well. Buffer alone was used as a negative control. Plates were incubated at 37°C for 3 h to allow for diffusion of the peptides. The underlay gel was then covered with 10 mL of nutrient-rich agar overlay (6% TSB and 1% agarose in 10 mM Tris-HCl). The antimicrobial activity of a peptide was measured as the diameter of the cleared zone around each well after 12 h of incubation at 37°C. This experiment was repeated at least 3 times and the same results were obtained. In addition, antimicrobial activities of the peptides were also tested by broth microdilution assays against E. coli, S. aureus, and C. albicans. Briefly, microbes were grown overnight in Mueller-Hinton Broth (MHB) to the onset of the stationary phase with shaking at 200 rpm. The cultures were diluted in fresh MHB to a final concentration of 2 × 10⁴ CFU/mL. A stock solution of each peptide was prepared to a concentration of 640 μg/mL in 0.01% acetic acid, and was then serially diluted two-fold to reach a concentration of 10 μg/mL. After 90-μL aliquots of the microbial suspension were dispensed into each well of a 96-well polypropylene microtiter plate, 10 μL of the peptide solution was added. The antimicrobial activities of the peptides were assessed by measuring the visible turbidity in each well of the plate after 18 h of incubation at 37°C. Minimum inhibitory concentrations (MICs) are expressed as the lowest peptide concentration that caused complete growth inhibition.
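The dilution arithmetic in the broth microdilution protocol can be made explicit with a short sketch. It assumes that adding 10 μL of peptide solution to 90 μL of culture gives a 10-fold dilution in the well, and that the MIC is read as the lowest concentration above which no well is turbid; whether the reported MICs refer to in-well or stock concentrations is not stated, and the function names are hypothetical.

```python
from typing import Dict, List, Optional


def twofold_series(start: float, end: float) -> List[float]:
    """Two-fold serial dilution from `start` down to `end` (inclusive), in ug/mL."""
    concs = []
    c = float(start)
    while c >= end:
        concs.append(c)
        c /= 2
    return concs


def in_well_concentration(stock: float,
                          peptide_ul: float = 10.0,
                          culture_ul: float = 90.0) -> float:
    """Concentration after mixing the peptide aliquot with the culture aliquot."""
    return stock * peptide_ul / (peptide_ul + culture_ul)


def minimum_inhibitory_concentration(turbid: Dict[float, bool]) -> Optional[float]:
    """Lowest concentration with no turbidity, scanning down from the highest.

    Stops at the first turbid well; returns None if even the highest is turbid.
    """
    mic = None
    for conc in sorted(turbid, reverse=True):
        if turbid[conc]:
            break
        mic = conc
    return mic


stocks = twofold_series(640, 10)                    # seven stock concentrations
wells = [in_well_concentration(s) for s in stocks]  # in-well concentrations
```

With these assumptions the series runs from 640 to 10 μg/mL for the stocks and from 64 to 1 μg/mL in the wells.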

Hemolytic assay
This experiment was approved by the Institutional Animal Care and Use Committee (IACUC) of the National Academy of Agricultural Sciences (approval number: NAAS-1114). The hemolytic activity of the peptides was determined by monitoring the release of hemoglobin from rat erythrocytes at 540 nm. For the hemolytic assay, 20 μL of each peptide solution at a predetermined concentration was added to 180 μL of a 2.5% (v/v) suspension of rat erythrocytes in phosphate-buffered saline (PBS). Melittin (Sigma, USA), a hemolytic and α-helical peptide isolated from bee venom, was used as the positive control. This mixture was incubated for 30 min at 37°C, and 600 μL of PBS was then added to each tube. After 3 min of centrifugation at 10,000 ×g, the supernatant was removed, and the absorbance was measured at 540 nm. Evaluations were made from the results of at least three independent experiments, each carried out in triplicate.
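The text does not state how the absorbance readings were converted to a hemolysis percentage. A common normalization, shown here purely as an assumption, scales the sample absorbance between a buffer-only reading (0%) and a full-lysis reference (100%); the function name and both reference readings are hypothetical and not taken from the study.

```python
def percent_hemolysis(a_sample: float, a_buffer: float, a_full_lysis: float) -> float:
    """Hemolysis (%) from absorbance at 540 nm.

    Assumed normalization (not stated in the text):
        100 * (A_sample - A_buffer) / (A_full_lysis - A_buffer)
    """
    span = a_full_lysis - a_buffer
    if span <= 0:
        raise ValueError("full-lysis absorbance must exceed the buffer absorbance")
    return 100.0 * (a_sample - a_buffer) / span
```

A sample reading halfway between the two references corresponds to 50% hemolysis under this scheme.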

Results and Discussion
Sequencing and transcriptome assembly
The cDNA library prepared from cockroach samples was sequenced using the Illumina HiSeq™ 2000 sequencer. As a result of sequencing, 4,687,932,060 (52,088,134 reads) and 4,781,794,320 (53,131,048 reads) bases were obtained for E. coli-immunized and non-immunized cockroaches, respectively. We injected live E. coli into the hemocoel of the cockroaches for immunization, although mixtures of bacteria and fungi should have been employed for the possibility of full induction. A comparative study of differential gene expression is required in future studies to determine the effects of various elicitors. In the present study, we focused on the prediction and experimental validation of novel AMP candidates. None of the novel AMPs was identified among known AMPs that are induced by bacteria and fungi, since we excluded known AMPs after comparison with sequences from the UniProtKB database for the novel AMPs selection procedure. In addition, we employed a naïve sample as a non-immunized control to exclude the expression data of overlapping genes for calculating the maximum fold change of differential gene expression. Initially, the total reads were subjected to preprocessing, as described in the Materials and Methods, resulting in 4,302,302,163 (49,317,908 reads) and 4,380,901,481 (50,270,016 reads) bases, for an average of 91% coverage from raw sequences for the immunized and non-immunized samples, respectively (Table 1). Preprocessed sequences were taken for de novo transcriptome assembly by using CLC Assembly Cell v. 4.0. In total, 85,984 contigs were obtained from the assembly, ranging from 200 to 18,078 bp with an average of 620.8 bp (Fig 1A), which was considered as the draft reference transcriptome for P. americana.
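The read statistics quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the base and read counts given in the text; the dictionary layout and the function name `summarize` are illustrative choices.

```python
# (bases, reads) per sample, as reported in the text
raw = {
    "immunized": (4_687_932_060, 52_088_134),
    "non_immunized": (4_781_794_320, 53_131_048),
}
preprocessed = {
    "immunized": (4_302_302_163, 49_317_908),
    "non_immunized": (4_380_901_481, 50_270_016),
}


def summarize(raw, preprocessed):
    """Per-sample raw read length and the percentage of bases/reads kept."""
    summary = {}
    for sample, (raw_bases, raw_reads) in raw.items():
        kept_bases, kept_reads = preprocessed[sample]
        summary[sample] = {
            "raw_read_length": raw_bases / raw_reads,
            "bases_retained_pct": 100 * kept_bases / raw_bases,
            "reads_retained_pct": 100 * kept_reads / raw_reads,
        }
    return summary


stats = summarize(raw, preprocessed)
total_raw_bases = sum(bases for bases, _ in raw.values())
```

Dividing bases by reads gives 90 bases per raw read in both libraries, and the two raw totals sum to 9,469,726,380 bases, consistent with the roughly 9.5 Gb stated in the abstract.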

Functional annotation of unigenes
The standardized automated software suite Pendant-Pro (Biomax Informatics) was used to annotate the transcripts. Initially, the assembled transcripts were subjected to repeat masking with a human repeat library, resulting in 44,222,145 reads (85,608 contigs), and masked sequences were subjected to Pendant-Pro with default parameters to obtain the annotations. In total, 17,744 (20.7%) sequences were annotated from 13,726 UniProt protein sequences (Table 2) and the remaining sequences were unannotated (most sequences < 300 bp were not annotated well) (Fig 1A). More than 60% of the annotated sequences were homologous to proteins from mosquitoes (Aedes aegypti, Anopheles gambiae, Culex quinquefasciatus), flies (Drosophila melanogaster, Drosophila pseudoobscura), and mammals (human and mouse) (Fig 1B). Further, the annotated transcripts were grouped into GO subcategories, i.e., biological process (BP), molecular functions (MF), and cellular components (CC), from level-2 GOs. The GO terms cell; cell part; organelle (in CC); binding, catalytic, and transcription regulators (in MF); and cellular process, metabolic process, and pigmentation (in BP) were shown to be the top 3 clusters (Fig 2).

DGE profile
To analyze the gene expression profiles of P. americana from the transcriptome data, DGE analysis was performed, as described in the Materials and Methods. In total, 2,076 (2.4%) transcripts were found to be significantly up- and down-regulated with a 2-fold change. Among these transcripts, 848 (1.0%) were up-regulated and 1,228 (1.4%) were down-regulated in the immunized condition, which were plotted in a histogram based on the GO categories (S1 Fig).
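A 2-fold cutoff corresponds to an absolute log2 fold change of at least 1. The following sketch shows how such a filter could split transcripts into up- and down-regulated sets; the pseudocount, the input layout, and the function names are assumptions for illustration and do not reproduce the software used in the study.

```python
import math
from typing import Dict, List, Tuple


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2(treated / control); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


def split_by_fold_change(expression: Dict[str, Tuple[float, float]],
                         threshold: float = 1.0) -> Tuple[List[str], List[str]]:
    """Return (up, down) transcript IDs with |log2 FC| >= threshold.

    `expression` maps transcript ID -> (control value, immunized value).
    """
    up, down = [], []
    for transcript_id, (control, immunized) in expression.items():
        lfc = log2_fold_change(immunized, control)
        if lfc >= threshold:
            up.append(transcript_id)
        elif lfc <= -threshold:
            down.append(transcript_id)
    return up, down


# Toy values, not data from the study
example = {"t1": (10.0, 40.0), "t2": (40.0, 10.0), "t3": (10.0, 12.0)}
up, down = split_by_fold_change(example)
```

In the toy input, `t1` is called up-regulated, `t2` down-regulated, and `t3` falls below the cutoff.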

In silico analysis of allergens and AMPs from P. americana
Isolation of AMPs from insects has been one of the most effective and promising strategies in the development of antimicrobial drugs [18]. For the most part, AMPs have been predicted through computational rather than experimental methods. The primary goal of this study was to predict the AMPs from the transcripts of P. americana and validate these predictions experimentally. In total, 86 AMPs were predicted to be novel AMPs (Table 3 and S1 Table), 72 were identified as putative AMPs (S2 Table), as defined in the Materials and Methods, and 180 proteins were predicted as allergens (S3 Table). Both the novel and putative AMPs were identified as non-allergenic peptides and are listed in Table 3. Among the putative AMPs, 54 are known to function as antibacterials, 5 are antifungals, and 3 are antivirals (S1 Table). Three of these transcripts were annotated as being related to the immune response (ISGCock_Contig01_0792, ISGCock_Contig04_0023, and ISGCock_Contig08_4679), 20 transcripts were annotated as being involved in protein binding, and none of the novel AMPs was annotated. The allergenic proteins were grouped into GO subcategories (S2 Fig). Previously known allergen proteins were only predicted from 233 UniProt database sequences of P. americana, and 9 were validated [10]. In our predictions, 57 novel transcripts were predicted as allergens. These novel candidates should be useful for progress in anti-allergen development.

Experimental validation of putative and novel AMPs
Experimental validation is required to examine the accuracy of any putative and novel AMPs identified. The 86 peptides were sorted according to fold-change in expression, and 21 AMPs with potentially high activity were ultimately selected (Table 4). Twenty-five peptides were synthesized according to the results of the AMPA server (http://tcoffee.crg.cat/apps/ampa). We chose the active regions based on the AMPA stretch because of the efficiency and cost of peptide synthesis for the development of antimicrobial agents. We tested their antimicrobial activities against Gram-negative bacteria, Gram-positive bacteria, and yeast using a radial diffusion assay (Fig 3). We found antimicrobial activity in 11 synthetic peptides (ISGCock_Contig04_0915, ISGCock_Contig13_4610-1, ISGCock_Contig16_2060, ISGCock_Contig16_4974, ISGCock_Contig07_3736-1, ISGCock_Contig10_4736-2, ISGCock_Contig13_3006, ISGCock_Contig05_0593, ISGCock_Contig12_4176, ISGCock_Contig15_1337-1, and ISGCock_Contig15_1337-2), which increased in a dose-dependent manner. Remarkably, the peptides ISGCock_Contig04_0915, ISGCock_Contig13_4610-1, and ISGCock_Contig15_1337-2 showed stronger antimicrobial activity than melittin in E. coli. The antimicrobial activities of ISGCock_Contig13_4610-1, ISGCock_Contig10_4736-2, ISGCock_Contig05_0593, ISGCock_Contig12_4176, ISGCock_Contig15_1337-1, and ISGCock_Contig15_1337-2 were greater than that of melittin in S. aureus. Correspondingly, ISGCock_Contig04_0915, ISGCock_Contig13_4610-1, ISGCock_Contig10_4736-2, ISGCock_Contig05_0593, ISGCock_Contig12_4176, ISGCock_Contig15_1337-1, and ISGCock_Contig15_1337-2 showed strong antimicrobial activity against C. albicans. Thus, different effects were observed for different strains. ISGCock_Contig15_1337-2 showed the highest antimicrobial activity of the tested peptides in E. coli, ISGCock_Contig05_0593 showed the highest antibacterial effect in S. aureus, and Contig15_1337-1 showed the highest antifungal activity of the tested peptides with C. albicans. Although the antimicrobial activity of AMPs is generally related to the cell membrane components of the microbes known as PAMPs [35], these AMPs showed a broad range (200 μg/mL) of activity toward Gram-negative bacteria, Gram-positive bacteria, and yeast.
We performed additional antimicrobial testing to determine MIC values against E. coli, S. aureus, and C. albicans. Table 5 shows the antimicrobial activities of the selected peptides including melittin as a control peptide. The MICs of melittin for microbes were measured to be between 4 μg/mL and 8 μg/mL. The ISGCock_Contig16_2060, Contig16_4974, ISGCock_Contig10_4736-2, ISGCock_Contig05_0593, and ISGCock_Contig12_4176 peptides showed potent antibacterial activities in E. coli. Most of the peptides were relatively less potent against S. aureus except for ISGCock_Contig16_2060, ISGCock_Contig16_4974, and ISGCock_Contig05_0593 compared to their E. coli-cidal activities. The ISGCock_Contig16_2060, Contig16_4974, ISGCock_Contig05_0593, ISGCock_Contig12_4176, and ISGCock_Contig15_1337-1 peptides showed potent anti-Candida activities and the MIC values were equal to the anti-E. coli activities except for ISGCock_Contig15_1337-1. In contrast, the ISGCock_Contig04_0915, ISGCock_Contig13_4610-1, ISGCock_Contig07_3736-1, ISGCock_Contig13_3006, ISGCock_Contig15_1337-1, and ISGCock_Contig15_1337-2 peptides exhibited higher MICs for most strains, indicating that these peptides may be influenced by the MHB components. Further study is required to elucidate the mechanism and source of the observed antimicrobial activity. Overall, the ISGCock_Contig16_2060, Contig16_4974, ISGCock_Contig05_0593, and ISGCock_Contig12_4176 peptides were prime candidates for development of antimicrobial agents.
The hemolytic effects of the 11 selected synthetic peptides showing antimicrobial activity in the radial diffusion assay are shown in Fig 4A. Melittin lysed 99% of rat red blood cells at a concentration of 25 μg/mL, whereas no hemolytic activity was observed for the 11 synthetic peptides at this concentration and up to 50 μg/mL, although the ISGCock_Contig05_0593 and Contig16_4974 peptides showed relatively strong hemolytic activity at a high concentration (200 μg/mL). Nevertheless, the hemolytic activities of the peptides ISGCock_Contig13_4610-1, ISGCock_Contig12_4176, ISGCock_Contig15_1337-1, and ISGCock_Contig15_1337-2 were relatively low compared to that of melittin at a concentration of 100 μg/mL. Therefore, the ISGCock_Contig05_0593 and Contig16_4974 peptides are thought to be effective at doses less than 100 μg/mL, indicating their potential as therapeutic agents against a vast array of microbial infections (Fig 4A). In addition, we investigated the cell viabilities of normal human cell lines (keratinocytes and HUVECs) after treatment with the selected peptides for 24 h at the same concentrations as in the hemolysis assay. Most of the peptides did not decrease cell viabilities of the cell lines except for the ISGCock_Contig16_4974 and Contig05_0593 peptides together with the ISGCock_Contig16_2060 and Contig13_3006 peptides in HUVECs (Fig 4B). These two peptides (ISGCock_Contig16_4974 and Contig05_0593) showed hemolytic activity in the hemolysis assay and the data are consistent with the MTS assay results, except for the ISGCock_Contig16_2060 and Contig13_3006 peptides, which suggests that these peptides have a specific cytotoxic effect on eukaryotic cells. In contrast, normal human cells were more susceptible to melittin treatment even at the lowest concentration. Melittin has a strong and broad antimicrobial spectrum, but the peptide lacks selectivity in normal cells.
The purpose of this experimental study was to find novel peptides that have potent antimicrobial activities with little or no cytotoxicity. Thus, these data indicate that the selected peptides are useful for the development of novel antimicrobial agents.

Conclusions
Microbial resistance towards antibiotics threatens the effective prevention and treatment of a wide range of infections caused by bacteria, parasites, viruses, and fungi. In recent years, intensive studies have been undertaken towards the development of more effective antimicrobial drugs. AMPs are vital components of innate immunity that can rapidly respond to diverse microbial pathogens. Insects, as a rich source of AMPs, have attracted considerable research attention with respect to both understanding the insect's immune system and searching for new molecular models for anti-infective drug design [1,6,11].
Here, we have shown the effectiveness of a combination of in silico and in vitro approaches to identify the putative and novel AMPs in P. americana. We performed de novo transcriptome sequencing of E. coli-immunized and non-immunized P. americana and selected 86 AMPs by combining the transcriptome with the successive assembly strategies. We further validated the antimicrobial and hemolytic effects of 11 selected AMPs experimentally, demonstrating broad-range antimicrobial activity.
Reduction in sequencing costs and the availability of high-throughput data facilitated by NGS have provided essential genetic resources to help expand fundamental knowledge of the biology and evolutionary history of an organism. Collectively, the present findings show that the combination of in silico and in vitro approaches could narrow down the identification of potential AMPs, and recent advances in both fields could be used to validate the applications of these 11 candidate AMPs as a template for further development as effective antibiotic therapeutics. Furthermore, these transcriptome sequencing results provide a genetic resource that should facilitate further comprehensive studies on the American cockroach.