Metagenomic Characterisation of the Viral Community of Lough Neagh, the Largest Freshwater Lake in Ireland

  • Timofey Skvortsov,

    Affiliation School of Biological Sciences, The Queen’s University of Belfast, Belfast, Northern Ireland, United Kingdom

  • Colin de Leeuwe,

    Affiliation School of Biological Sciences, The Queen’s University of Belfast, Belfast, Northern Ireland, United Kingdom

  • John P. Quinn,

    Affiliation School of Biological Sciences, The Queen’s University of Belfast, Belfast, Northern Ireland, United Kingdom

  • John W. McGrath,

    Affiliation School of Biological Sciences, The Queen’s University of Belfast, Belfast, Northern Ireland, United Kingdom

  • Christopher C. R. Allen,

    Affiliation School of Biological Sciences, The Queen’s University of Belfast, Belfast, Northern Ireland, United Kingdom

  • Yvonne McElarney,

    Affiliation Agri-Food & Biosciences Institute, Belfast, Northern Ireland, United Kingdom

  • Catherine Watson,

    Affiliation Agri-Food & Biosciences Institute, Belfast, Northern Ireland, United Kingdom

  • Ksenia Arkhipova,

    Affiliation School of Biological Sciences, The Queen’s University of Belfast, Belfast, Northern Ireland, United Kingdom

  • Rob Lavigne,

    Affiliation Laboratory of Gene Technology, KU Leuven, Leuven, Belgium

  • Leonid A. Kulakov

    Affiliation School of Biological Sciences, The Queen’s University of Belfast, Belfast, Northern Ireland, United Kingdom



Lough Neagh is the largest and the most economically important lake in Ireland. It is also one of the most nutrient rich amongst the world’s major lakes. In this study, 16S rRNA analysis of total metagenomic DNA from the water column of Lough Neagh has revealed a high proportion of Cyanobacteria and low levels of Actinobacteria, Acidobacteria, Chloroflexi, and Firmicutes. The planktonic virome of Lough Neagh has been sequenced and 2,298,791 2×300 bp Illumina reads analysed. Comparison with previously characterised lakes demonstrates that the Lough Neagh viral community has the highest level of sequence diversity. Only about 15% of reads had homologs in the RefSeq database and tailed bacteriophages (Caudovirales) were identified as a major grouping. Within the Caudovirales, the Podoviridae and Siphoviridae were the two most dominant families (34.3% and 32.8% of the reads with sequence homology to the RefSeq database), while ssDNA bacteriophages constituted less than 1% of the virome. Putative cyanophages were found to be abundant. 66,450 viral contigs were assembled with the largest one being 58,805 bp; its existence, and that of another 34,467 bp contig, in the water column was confirmed. Analysis of the contigs confirmed the high abundance of cyanophages in the water column.


Lough Neagh is the largest freshwater lake in the British Isles. It is located in Northern Ireland about 30 km to the west of Belfast (54°37′06″N, 6°23′43″W) and has dimensions of 30 km by 15 km. The lake has a mean depth of just 9 m and a surface area of 392 km², and the relatively high local mean wind speeds (>4.5 m s⁻¹) ensure that the 3.5 km³ of water it contains is completely mixed; oxygen saturation levels rarely drop below 60%. Lough Neagh serves as a main source of potable water in Northern Ireland, providing more than 40% of the region’s supply. Among its other uses, the lake contains Europe’s largest eel fishery, provides sand for the construction industry and offers many tourism and leisure activities. Full details of the lake and of its catchment can be found in [1].

Lough Neagh also has a long history of cultural eutrophication; it receives discharges from several wastewater and sewage treatment plants and from diffuse agricultural sources across its catchment of 4,500 km² with a population of 390,000 [2]. This has caused a shift from mesotrophic conditions at the start of the 20th century to its present status as one of the world’s most hypertrophic lakes, a situation that threatens to irreversibly change its ecosystem. For example, algal species richness has decreased over the last century, with a progressive increase in the dominance of cyanobacteria, most recently of non-diazotrophic species [2]. Although the ecology of Lough Neagh has been studied extensively during the last several decades, little is known about its total bacterial populations [3], whilst the viral community of Lough Neagh has never been studied, even though this is likely to make a major contribution to nutrient cycling in the lake.

Bacteriophages represent the most numerous and important constituents of microbial communities and are likely to play an extremely important role in the cycling of nutrients [4, 5]. As a result, metagenomic analyses supported by next generation sequencing have been widely conducted in marine environments, but freshwater viromes have so far attracted much less attention. Among the first studies in this area was an investigation of viral communities in fish ponds [6], followed by the characterisation of RNA viromes from a freshwater lake [7] and the profiling of viral diversity in Lake Limnopolar (Byers Peninsula, Antarctica) [8]. Viral metagenomic studies have also been carried out on four freshwater ponds located in the Sahara Desert [9], on Feitsui freshwater reservoir in North Taiwan [10], and at two sites in the aquaculture facility of Kent SeaTech Corporation in California, USA [11]. The detailed study reported in [12] demonstrated the relatedness of viromes from two temperate but ecologically different French lakes, and their genetic distinctiveness from other aquatic communities. Among their findings was the demonstration of similarities in viromes from related environments (freshwater, marine, hypersaline), with the salinity level of the habitat having more impact on the viral community structure than its geographical location. Only one preliminary investigation of the composition of planktonic viral communities in a eutrophic freshwater environment has been carried out to date [13].

In the present study, we report a comprehensive characterisation of the viral and bacterial metagenomes of the water column of Lough Neagh, using Illumina high-throughput shotgun sequencing and 16S rRNA gene targeted 454 pyrosequencing, respectively. We present the identification of the major taxonomic groups and functional categories of the viral community, an analysis of sequences of bacterial origin found in the virome, and a comparison of these to the available datasets from other studies. This study provides a first insight into the structure of the bacterioplankton population and that of its phages in one of the most important European temperate eutrophic freshwater lakes.

Results and Discussion

Bacterial diversity

The Lough Neagh ecosystem has been extensively monitored for the last fifty years. Analysis of the monitoring records, of microbiological and microscopic evidence available in the literature, and of sequencing data led us to conclude that major changes in the structure and composition of the Lough Neagh bacterial community occur in spring (the transition to cyanobacterial dominance, see below). For this reason, a sample obtained in April was used for metagenomic analysis of the microbial communities of the lake. The values of chemical and environmental parameters at the time of sample collection (S1 Table) confirmed the typical hypertrophic status of Lough Neagh.

The study of bacterial community structure was based on the pyrosequencing of 16S rRNA gene amplicons, and the resulting dataset was analysed with QIIME [14] as described in the Experimental Procedures section. Amplicon sequencing generated 3,275 high-quality reads, of which 2,335 reads were clustered into operational taxonomic units (OTUs) with at least four reads per OTU; a total of 118 different OTUs were identified. 375 reads (16.1%) in 35 OTUs could not be assigned taxonomy by the RDP classifier of QIIME and were designated as “unclassified”. The representative sequences of each unclassified OTU were extracted from the dataset and manually examined by carrying out BLASTn [15, 16] searches against the nucleotide collection (nt) database. The inspection of the alignments generated by BLAST for these reads revealed their homology (e-value < 10⁻⁵) to sequences annotated as 16S rRNA genes of uncultured bacteria and to 18S rRNA gene sequences of various eukaryotic planktonic microorganisms (e.g., diatoms). The presence of eukaryotic small-subunit ribosomal RNA gene sequences in the amplicon dataset can be explained by non-specific amplification due to the similarity of certain 16S rRNA gene primer sequences to specific regions of 18S rRNA genes, a situation which has previously been observed in metagenomic studies [17, 18]. The unclassified sequences were excluded from further analyses, and the remaining 1,960 reads (83.9%), representing 83 OTUs, were assigned taxonomic classifications (to genus level wherever possible).

Bacteria of nine phyla were present in the Lough Neagh water column sample (Fig 1); of these, Cyanobacteria was the most abundant group, comprising 29.1% of the processed reads. While the initial taxonomic classification assigned 39.4% of amplicons to Cyanobacteria, the examination of the taxonomic breakdown of Cyanobacteria at different levels in the QIIME output revealed that about a third of these reads originated from the 16S rRNA genes of Stramenopiles (Heterokonta) and Chlorophyta chloroplasts, which were classified as Cyanobacteria by QIIME algorithms. The chloroplast-related sequences were removed from the subsequent taxonomic analyses of the bacterial community by filtering the OTU table with QIIME scripts and the statistics were updated to reflect that. The remaining 1675 reads were clustered into 74 OTUs, and diversity estimates calculated after rarefaction: Shannon index (H) = 4.679, Simpson index (D) = 0.918. Members of the phylum Proteobacteria accounted for 23.9% of all amplicons, followed by Planctomycetes (15.6%), Verrucomicrobia (13.6%), and Bacteroidetes (13.0%). Other bacterial phyla present in the Lough Neagh water column community (Actinobacteria, Acidobacteria, Chloroflexi, and Firmicutes) constituted less than 5% of the total. Bacterial community structure was clearly dominated by Cyanobacteria, while only 1.7% of 16S rRNA gene amplicons were affiliated with Actinobacteria. At the lowermost (genus) level, two cyanobacterial genera, Planktothrix and Pseudanabaena, accounted for 18.6% and 8.5% of all sequences, respectively. Verrucomicrobia were mainly represented by genera Candidatus Xiphinematobacter (6.1%) and Luteolibacter (3.3%). The majority of the reads assigned to Proteobacteria (18.1%) were from 16S rRNA gene amplicons of bacteria of Pelagibacteraceae (SAR11) family [19, 20], the freshwater members of which belong to the LD12 clade [21, 22]. 
As the RDP classifier was unable to classify the OTU to the genus level, we manually aligned the representative sequence of the OTU to the prototypical LD12 sequence (GenBank accession no. Z99997.1, data not shown) using BLASTn online [23] (bl2seq megablast algorithm with default parameters), which demonstrated 99.8% identity between the two sequences.
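The diversity estimates quoted above can be illustrated with a minimal sketch. It assumes the common definitions H = −Σ pᵢ log pᵢ and D = 1 − Σ pᵢ², which may differ in detail from the QIIME implementation used in the study. Because the reported H of 4.679 exceeds ln 74 ≈ 4.30, the maximum possible with natural logarithms for 74 OTUs, the sketch defaults to base-2 logarithms. The function names and the OTU counts are hypothetical and are not the Lough Neagh data.

```python
import math


def shannon_index(counts, base=2.0):
    """Shannon index H = -sum(p_i * log(p_i)) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def simpson_index(counts):
    """Simpson index expressed as 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical reads-per-OTU table, for illustration only.
example_counts = [40, 30, 20, 10]
print(round(shannon_index(example_counts), 3))
print(round(simpson_index(example_counts), 3))
```

In practice the counts would be rarefied to a common depth before the indices are computed, as was done in the study.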

Fig 1. Major bacterial groups found in Lough Neagh (phylum level).

Partial sequences of 16S rRNA genes were amplified and sequenced using 454 pyrosequencing. The sequences were clustered into OTUs at the 97% sequence similarity level and taxonomic annotation of OTUs was carried out using QIIME; the results obtained were used to generate the distribution of bacteria at the phylum level.

To highlight the major changes occurring in the bacterial community of the lake during the year, we performed a similar sequencing analysis of Lough Neagh water samples collected over a 12-month period. The analysis confirmed that Cyanobacteria was the most abundant group in the lake and that the proportion of Actinobacteria remained relatively low (Fig 2).

Fig 2. Seasonal changes in abundance of six major bacterial phyla in Lough Neagh over a 12-month period.

Partial sequences of 16S rRNA genes from the additional water samples collected on 1 April 2014, 23 June 2014, 2 September 2014, 10 November 2014, and 18 February 2015 were amplified and sequenced using 454 pyrosequencing. The sequences were clustered into OTUs at the 97% sequence similarity level and taxonomic annotation of OTUs was carried out using QIIME; the results obtained were used to generate the distribution of bacteria at the phylum level.

A comprehensive meta-analysis of freshwater bacterial compositions by Newton and colleagues [24] demonstrated the abundance of Actinobacteria species (over 25% of all 16S rRNA gene sequences on average). Such a predominance of Actinobacteria (over 35%) is characteristic of both of the most comprehensively studied lakes, Bourget and Pavin [25–27]. Notably, only 1.6% of Lough Neagh 16S rRNA gene amplicons showed similarity to this taxon. Using principal coordinates analysis, we compared bacterial communities from a range of freshwater environments that have publicly available 16S rRNA amplicon datasets and differ in their trophic status and geographic location (S1 File). The analysis demonstrated that Lough Neagh is clearly distinct from the other freshwater lakes analysed. Actinobacteria abundance in the bacterial community structures analysed ranged from 1.4% (Lake Michigan) to 76.1% (Ouagadougou reservoir), averaging 50.5% (95% confidence interval 38.4% to 62.6%). In contrast, the analysis of the bacterial composition of Lough Neagh at six different time points (Fig 2) demonstrated that Actinobacteria content ranged from 0.7% to 3.3% throughout the year, being on average just 2.2% (95% confidence interval 1.4% to 3.0%). It is known that Actinobacteria are less abundant in nutrient-rich environments due to their slower growth and decreased competitiveness [28]. It was also recently suggested that the abundance of Actinobacteria negatively correlates with that of Cyanobacteria and the increase of cyanobacterial numbers may reflect serious ecological damage to freshwater systems [29]. High abundance of organic matter, readily available inorganic nutrients (especially N and P), and increased temperatures lead to uncontrollable growth of Cyanobacteria. 
Cyanobacterial blooms have a number of detrimental effects on an aquatic ecosystem, the most prominent of them being an increase in water turbidity, release of cyanobacterial toxins, and oxygen depletion [30, 31]. All these factors negatively affect the biodiversity of the ecosystem, threatening to cause an irreversible alteration in community structure and composition. Therefore, the dominance of cyanobacteria is an important indicator of a deteriorating ecological situation in freshwater environments. Indeed, Cyanobacteria was the largest taxon in terms of 16S rRNA gene amplicon numbers found in the Lough Neagh metagenome in the present study (29.1%; Fig 1) and remained the dominant group of bacteria in all samples studied, except that of 1 April 2014 (Fig 2). A strong correlation between the levels of nitrogen pollution and predominance of Cyanobacteria (more specifically, Planktothrix) in Lough Neagh was previously demonstrated [2]. Our analysis of bacterial populations at six time points (2014–2015) corroborates the above conclusion.
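The mean and 95% confidence interval quoted above for Actinobacteria across the six time points can be illustrated as follows. This is a sketch only: it assumes a Student's t interval for the mean, which the text does not state was the method used. The per-sample percentages and the function name are hypothetical, and the critical values are hard-coded so that the example needs only the standard library.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, indexed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365}


def mean_ci95(values):
    """Return (mean, lower, upper) of a t-based 95% confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    half_width = T_CRIT_95[n - 1] * sem
    return mean, mean - half_width, mean + half_width


# Hypothetical Actinobacteria percentages at six time points (not the study data).
example_percentages = [0.7, 1.9, 2.4, 2.6, 2.3, 3.3]
mean, lower, upper = mean_ci95(example_percentages)
print(f"mean={mean:.2f}%, 95% CI {lower:.2f}% to {upper:.2f}%")
```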

Viral community

MetaVir analysis of unassembled reads.

A total of 2,295,055 reads were uploaded to the MetaVir server [32, 33] for taxonomic annotation and comparative analyses with other viromes. Rarefaction analysis was performed on the whole dataset with clustering of sequences at the 90% identity level, and demonstrated that, while the sequencing effort was substantial and sufficient for accurate taxonomic annotation of major groups of viruses, it was not exhaustive, as the rarefaction curve had not approached a plateau (S1A Fig). To further assess cluster richness, we conducted a comparative rarefaction analysis of subsamples from the Lough Neagh virome and several viral freshwater metagenomes. Comparison with the freshwater lakes Bourget and Pavin is shown in S1B Fig (sampling depth of 50,000 reads, clustering of sequences at the 90% identity level). All three rarefaction curves could be fitted to linear functions using GraphPad Prism (r² > 0.99); the comparison of their slopes demonstrated that all three curves were different (p < 0.0001), with Lough Neagh having a more diverse virome.
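The idea behind a rarefaction curve can be sketched in a few lines. The example assumes that each read has already been assigned a cluster identifier (for instance, by clustering at 90% identity), and counts the distinct clusters seen as reads are drawn in random order. It is an illustration of the principle, not the MetaVir procedure; the function name and the toy input are hypothetical.

```python
import random


def rarefaction_curve(cluster_ids, step, seed=0):
    """Return (depth, distinct clusters) pairs along one random ordering of reads."""
    order = list(cluster_ids)
    random.Random(seed).shuffle(order)  # fixed seed keeps the sketch reproducible
    seen = set()
    curve = []
    for depth, cluster in enumerate(order, start=1):
        seen.add(cluster)
        if depth % step == 0 or depth == len(order):
            curve.append((depth, len(seen)))
    return curve


# Toy input: one cluster label per read (hypothetical).
toy_reads = ["a", "a", "b", "b", "c", "d", "d", "e"]
for depth, clusters in rarefaction_curve(toy_reads, step=2):
    print(depth, clusters)
```

A curve that keeps rising at the full sampling depth, as reported for Lough Neagh, indicates that further sequencing would continue to uncover new clusters.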

Taxonomic annotation on Metavir was performed by comparing all reads from the Lough Neagh virome with the RefSeq complete viral genomes protein sequence database (2014-09-10 release) using BLASTx [16]. 14.6% (334,507 reads) of the virome sequences produced a database hit (threshold of 50 on the BLAST bit score, with no minimum alignment length). These reads were annotated on the basis of their similarity to known viruses, and the taxonomic composition of the virome was determined after normalisation with the Genome relative Abundance and Average Size (GAAS) tool [34] to account for differences in the genome lengths of viruses (Fig 3). Less than 0.5% of these reads had similarity to ssDNA viruses, and the majority of the remaining reads (97.0%) originated from dsDNA viruses, of which Caudovirales (tailed bacteriophages) accounted for 79.9% of reads. Unclassified dsDNA phage sequences comprised 15.8%, and unclassified dsDNA viruses 1.0% of reads. The majority of reads annotated as arising from Caudovirales had similarity to genomes of the Podoviridae family phages (34.3% of all reads), closely followed by Siphoviridae (32.8%), while Myoviridae was the least numerous group, with 10.3% of reads affiliated with this taxon. The predominant subfamilies/genera (accounting for more than 0.5% of metagenome) for Podoviridae were unclassified and unassigned Podoviridae (26.6% and 0.8%, respectively), Bppunalikevirus (2.3%), Autographivirinae (1.8%), P22likevirus (0.8%), Epsilon15likevirus (0.7%), and Luz24likevirus (0.5%). The majority of the reads assigned to Siphoviridae were from unclassified Siphoviridae (29.1%), followed by Lambdalikevirus (1.8%), Phic3unalikevirus (0.9%), and Yualikevirus (0.8%). In the case of Myoviridae, no subgroup with abundance of more than 0.5% (except unclassified Myoviridae; 8.4%) was identified. Fourteen individual phage sequences were most abundant in the virome, making up more than 1% each. 
Of these, seven can be linked to the Podoviridae, two to the Siphoviridae family, while five others correlated to unclassified dsDNA phages. Due to abundance corrections introduced by GAAS, the most abundant virotypes in terms of number of mapped reads were different from the most abundant ones selected based on GAAS-corrected values. The combined list of the most abundant phage sequences is given in Table 1. Of special notice is Pelagibacter phage HTVC010P [35], which made up 1.8% of the virome (GAAS-corrected value) with 4,223 reads mapped to its genome. Pelagiphages are possibly among the most numerous types of viruses on the planet [35], but little is known about their role in freshwater environments. One of the top 21 contigs in terms of the number of mapped reads assembled in this work (LNW4-c10) also had the TerL gene showing high similarity to the TerL of Pelagibacter phage HTVC010P (see below). Three other Pelagibacter phage sequences were identified in the Lough Neagh dataset, constituting 1.4% of the virome, with 5,527 reads mapped to their genomes. In agreement with the dominance of Cyanobacteria in the microbial community structure, 39,845 (9.00%) reads from the whole virome were annotated as originating from bacteriophages of Synechococcus and Prochlorococcus cyanobacteria as well as unclassified cyanophages.
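The effect of genome-length normalisation on abundance rankings can be shown with a simplified sketch. It divides the number of reads mapped to each reference by the genome length and rescales the results to sum to one. This is a stand-in for the general idea only; GAAS itself applies a more elaborate weighting, and the function name, phage names, and numbers below are hypothetical.

```python
def length_normalised_abundance(read_counts, genome_lengths):
    """Relative abundance per genome after dividing read counts by genome length."""
    weights = {name: read_counts[name] / genome_lengths[name] for name in read_counts}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical example: the larger genome attracts more reads but is less abundant.
read_counts = {"phage_large": 1000, "phage_small": 500}
genome_lengths = {"phage_large": 100_000, "phage_small": 25_000}  # bp
abundance = length_normalised_abundance(read_counts, genome_lengths)
for name, fraction in abundance.items():
    print(name, round(fraction, 3))
```

In this toy case the genome with fewer mapped reads ranks first after correction, which is the kind of reordering described above for the most abundant virotypes.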

Fig 3. Taxonomic composition of Lough Neagh virome.

Composition was computed at the MetaVir server from a BLAST comparison with the RefSeq complete viral genomes protein sequences database. The abundance of the major viral groups is shown, with the numbers of mapped sequences at the right ends of the corresponding bars.

MG-RAST analysis of unassembled reads.

After merging of paired-end reads, quality processing, and deduplication, the MG-RAST analysis pipeline [36] generated 2,601,470 reads. These reads were subjected to functional and taxonomic classification. MG-RAST utilises a number of different databases for functional annotation of reads, including four databases allowing for hierarchical functional annotation, namely KEGG Orthology (KO), COG, eggNOG, and SEED Subsystems [37]. The SEED Subsystems database is manually curated and is therefore considered to be more accurate [37, 38]; we share this view and chose it as the primary method of functional annotation. The unassembled reads processed by MG-RAST were compared to the Subsystems database using a maximum e-value of 10⁻⁵, a minimum identity of 60%, and a minimum alignment length of 15 (measured in aa for protein and bp for RNA databases). 125,852 reads were classified this way. The functional distribution of reads at the highest hierarchical level of MG-RAST Subsystems classification is presented in Fig 4A. 68.3% of all classified reads were identified as belonging to the functional category of “Phages, Prophages, Transposable elements, and Plasmids”. Phages and prophages were the largest part of this group (66.4% of all classified reads), while 1.4% of reads belonged to the GTA (Gene Transfer Agents). A small number of reads in the functional category of “Phages, Prophages, Transposable elements, and Plasmids” were assigned to the functional categories of “Pathogenicity islands” (0.5%) and “Transposable elements and integrons” (0.1%) (Fig 4C). It should be noted that in Fig 4C, in the category “Phages, Prophages”, the top subgroup is “r1t-like streptococcal phages” (26.7%). We used functional classification based on SEED Subsystems. One of these subsystems, named “r1t-like streptococcal phages”, contains several genes characteristic of streptococcal bacteriophages that are similar to phage r1t. 
The reads from our virome that had best BLAST hits to the genes in the category “r1t-like streptococcal phages” were classified as such, although they do not necessarily originate from streptococcal phages. The remaining 31.7% of reads were divided between various non-viral functional groups (Fig 4A). A detailed description of these groups is presented in Fig 4B. It is important to note that the pstS (high affinity phosphate transporter) gene was identified in 116 reads. The pstS gene has previously been detected as integrated into genomes of a number of bacteriophages in a study of marine viruses by Sullivan and colleagues [39, 40]. To assess the extent of horizontal gene transfer, we based the study of functional diversity of the virome on the analysis of individual reads, and not the assembled contigs. The presence of the pstS gene in our viral metagenome could arise from its being permanently integrated into a phage genome (specialised transducing phages) or from various transducing entities (generalised transducing phages or GTAs).
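The annotation thresholds described above (maximum e-value of 10⁻⁵, minimum identity of 60%, minimum alignment length of 15) amount to a simple filter over similarity hits. The sketch below applies those three cut-offs to a list of hit records. The `Hit` record, its field names, and the example hits are hypothetical and do not reflect the MG-RAST data model.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    read_id: str
    evalue: float
    identity: float  # percent identity of the alignment
    aln_length: int  # alignment length (aa for protein databases)


def passes_thresholds(hit, max_evalue=1e-5, min_identity=60.0, min_length=15):
    """True if the hit satisfies all three cut-offs quoted in the text."""
    return (
        hit.evalue <= max_evalue
        and hit.identity >= min_identity
        and hit.aln_length >= min_length
    )


# Hypothetical hits, for illustration only.
hits = [
    Hit("read_1", 1e-12, 85.0, 60),  # kept
    Hit("read_2", 1e-3, 90.0, 80),   # rejected: e-value too high
    Hit("read_3", 1e-8, 45.0, 70),   # rejected: identity too low
    Hit("read_4", 1e-6, 75.0, 10),   # rejected: alignment too short
]
kept = [hit.read_id for hit in hits if passes_thresholds(hit)]
print(kept)
```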

Fig 4. Functional analysis of Lough Neagh virome.

The analysis was carried out using SEED subsystems hierarchical functional annotation on the MG-RAST webserver. (A) Relative abundance of level one functional categories. (B) Distribution of minor functional categories. (C) Distribution of functional categories in the “Phages, Prophages, Transposable elements, Plasmids” group at levels 2 and 3.

This study has produced the largest virome sequencing coverage of a freshwater lake to date. Nevertheless, the rarefaction analysis conducted clearly demonstrates that this sequencing is not exhaustive (S1A Fig). Comparison with previously published viromes of the French lakes Pavin and Bourget (S1B Fig), sequenced with less depth [12], demonstrated that the Lough Neagh virome has a higher sequence diversity. The lower limit of viral richness for Lough Neagh was estimated according to [41]. The average length of the 2,295,055 reads uploaded to MetaVir was 276 bp, and the reads were clustered into approximately 650,000 clusters at 90% identity level, and into approximately 840,000 clusters at 98% identity level. Using 50,000 bp as an average bacteriophage genome size, and defining “a single viral species” as in [41] (as being a grouping of isolates at nucleotide identity levels of 90% to 95%), we estimate the lower limit of the number of different viruses as being between 3588 and 4637, using the formula N*L/G, where N is the number of clusters, L the average read length (bp), and G the average bacteriophage genome size (bp). The Lough Neagh virome was also compared to freshwater viromes available on MetaVir (S2 Fig). Depending on the algorithm used for the comparison (di-, tri-, or tetranucleotide bias comparison [42] or BLAST-based comparison [32]), the closest viral communities identified were the viromes of Lagoa Vermelha [MetaVir project ID 4000], Tilapia_Channel– 1105 [MetaVir project ID 33] [11], El Berbera [MetaVir project ID 395] [9], and Lake Bourget [MetaVir project ID 7] [12], respectively.
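The lower-bound richness estimate follows directly from the formula N*L/G and the values given above. The short sketch below reproduces the arithmetic; the function name is an illustrative assumption, while the inputs (650,000 and 840,000 clusters, 276 bp mean read length, 50,000 bp mean genome size) are taken from the text.

```python
def richness_lower_bound(n_clusters, mean_read_length_bp, mean_genome_size_bp):
    """Lower limit of the number of distinct viral genomes, N * L / G."""
    return n_clusters * mean_read_length_bp / mean_genome_size_bp


low = richness_lower_bound(650_000, 276, 50_000)   # clusters at 90% identity
high = richness_lower_bound(840_000, 276, 50_000)  # clusters at 98% identity
print(round(low), round(high))
```

The estimate is a lower bound because it treats the clustered reads as if they tiled complete genomes without gaps or overlaps.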

According to MetaVir analysis, 14% of all reads were classified as being of viral origin; the rest were not assigned. MG-RAST analysis of the same virome classified approximately 15% of the reads analysed. This means that over 80% of the sequences analysed lack any substantial homology to database entries (with an e-value smaller than 10⁻⁵). This is typical of the viral metagenomes analysed to date [41]. According to MG-RAST analysis, 10.9% of the reads were annotated as being of bacterial origin (72% of all reads after QC and post-processing). This apparent anomaly could be explained by the fact that sequences of GTAs, bacterial vesicles, free external DNA, malformed VLPs (with bacterial DNA), and transduced bacterial DNA would be included in this category. It should also be taken into account that the MG-RAST pipeline is heavily biased towards the annotation of sequences as being of bacterial origin. All precautions were taken in this work to minimise external bacterial DNA contamination; the VLP fraction was treated with an excess of DNase I as recommended [43] until disappearance of the 16S rRNA gene products (results not shown). Indeed, only 4 of 2,601,470 reads were classified as originating from 16S rRNA genes. These are likely to originate from generalised transducing phages or GTA particles.

When compared with the two published temperate freshwater viromes [12], the striking difference is the near absence of ssDNA viruses in the Lough Neagh metagenome (0.5%); the comparable values are 80% for Lake Pavin and 85% for Lake Bourget. The most likely explanation for this is the difference in the preparation of the metagenomic samples for sequencing. Multiple displacement amplification (MDA), which is known to be highly biased towards the amplification of single-stranded DNA molecules [44, 45], was not used in our work. In another viral metagenome project in which MDA was not employed, ssDNA viruses also constituted less than 1% of all raw reads [41]. It may be concluded that avoiding the amplification of viral metagenomic samples using MDA is desirable for a more accurate representation of viral communities.

Contig construction and MetaVir analysis.

A total of 66,450 contigs ranging from 301 to 58,805 bp were produced as described in the Experimental Procedures section. All contigs were uploaded to the MetaVir server for annotation and comparison with other publicly available viromes. There were 21 contigs larger than 30 kb, with the largest being 58.8 kb. The essential characteristics of these contigs are presented in Table 2. An in-depth analysis was conducted for the largest contigs (i.e., LNW4-c0 to LNW4-c20), as well as for those detected as the most abundant in Lough Neagh (identified by high sequence coverage). As can be seen from Table 2, putative cyanophages are highly represented in the Lough Neagh virome (contigs LNW4-c0, LNW4-c11, LNW4-c20).

Genetic maps for contigs LNW4-c0 and LNW4-c12 are shown in Fig 5. LNW4-c0 represents a putative Myoviridae (possibly T4-like) phage. 51 full and 1 partial ORFs were identified in this contig of 58,805 bp. On the basis of the analysis of orf35, identified as a terminase large subunit by BLASTp and hmmscan comparisons, this phage can be classified as being related to Prochlorococcus phage P-SSM7 (NC_015290.1) and Sinorhizobium phage phiM12 (KF381361). While it is impossible to unambiguously determine the taxonomic affiliation of the phage in question, the similarity of a number of other ORFs of the contig to genes of cyanophages favours the hypothesis of a cyanophage origin. The genome sizes of both related phages are more than 150 kb; therefore, it is likely that the LNW4-c0 contig represents a partial sequence of a phage genome from Lough Neagh. LNW4-c12 probably comes from a member of the Podoviridae family; this 34,467 bp circular contig contains 52 ORFs. It is likely that this contig represents a genome of a phage with either circular permutations or long direct terminal repeats. According to MetaVir BLASTp and independent BLASTx analyses, the closest homologs of the LNW4-c12 TerL gene are sequences of the terminase large subunit from Roseobacter phage RDJL Phi 1 (62,668 bp) and the terminase large subunit from the Burkholderia sp. TJI49 phage genome, respectively. Due to the high diversity of environmental bacteriophages and the limited number of viral genomes available in the reference databases, it is not possible to state whether or not LNW4-c12 is indeed a phage infecting bacteria of the genus Roseobacter or Burkholderia.

Fig 5. Maps of putative phage genomes identified in Lough Neagh.

Genome regions amplified using PCR and genome-specific primers are indicated with horizontal bars. Identified ORFs are shown by arrows. (A) Genome map of putative phage LNW4-c0. (B) Genome map of putative phage LNW4-c12.

To confirm that the identified contigs LNW4-c0 and LNW4-c12 corresponded to the genomic DNA molecules present in the sample analysed, three pairs of specific primers were designed for each of these two contigs to amplify segments 4–6 kbp long, and PCR reactions were performed using the same metagenomic DNA that had been used for Illumina sequencing. In all six cases, PCR products were obtained and Sanger sequencing analysis confirmed the presence of these contigs (the PCR amplified and confirmed regions are indicated in Fig 5).


Lough Neagh is the largest and the most important freshwater lake of the British Isles. Here, for the first time, a metagenomic analysis of the microbial community of the lake has been conducted with an emphasis on characterisation of the virome. As in the majority of previously characterised viromes, a large proportion (85%) of the reads did not have homologs in available databases. However, this work demonstrates that the microbial community of Lough Neagh is clearly different from those of major freshwater lakes previously analysed. The most important of these differences are: i) the abundance of Cyanobacteria (27%) and paucity of Actinobacteria; ii) the apparent abundance of putative cyanophages in the Lough Neagh virome; iii) the high diversity of the virome. The abundance of the Cyanobacteria group is most likely a result of intensive agricultural activity in the area leading to ecological damage to this freshwater system [29]. It is difficult to reliably assess the proportion of phages infecting Cyanobacteria in Lough Neagh due to the absence of universal genetic markers for this group of viruses (and for bacteriophages in general). However, we were able to identify a number of putative cyanophage genomes (Tables 1 and 2) abundant in the Lough Neagh ecosystem. The assembled contig of phage LNW4-c0 was confirmed in PCR experiments using the corresponding metagenomic DNA. Previous work on viral communities from marine environments provides valuable information about the role of cyanophages. For example, earlier studies by Paul and colleagues, who investigated bacteria-phage relationships in the marine environment, indicated that an environment inimical to bacterial growth supports lysogeny [46, 47]. Studies of the marine cyanobacterium Synechococcus indicated that phage S-PM2 infecting this species preferentially enters into a lysogenic state in phosphate (Pi)-depleted waters [48, 49].
A study of various phages infecting Cyanobacteria in the marine environment identified phage-encoded genes for alkaline phosphatase (phoA) and the periplasmic high-affinity phosphate-binding protein (pstS) [50]. Crucially, the transcriptional activity of these genes was shown to be activated in Pi-starved bacteria and controlled by the host’s Pi starvation response regulon [51]. It is likely that Pi levels play an important role in phage production in the marine environment and, importantly, that the corresponding phages could serve as early indicators of the phosphate status of the environment. While relatively little is known about freshwater cyanophages, the identified cyanophage-derived contigs (Table 1, Table 2, and contig annotations and phylogenetic trees available on the Metavir website in the project "Lough Neagh - 4pW contigs", Project id: 5053) suggest the ubiquity and importance of cyanophages in the Lough Neagh freshwater ecosystem, where they might play roles similar to those of marine phages. Notably, the pstS gene, which was shown to be integrated into the genome of some marine cyanophages, was found to be present in the Lough Neagh metagenome. This may indicate horizontal transfer of this gene by generalised transducing cyanophages or GTAs in a freshwater environment.

Experimental Procedures

All prevailing local, national, and international regulations and conventions, and normal scientific ethical practices have been respected. No specific approvals or permissions were required to collect and process water samples from Lough Neagh, as none of the work involved endangered or protected species and all of it was carried out outside privately owned or protected areas.

Primary water sample

Lough Neagh (54°37′06″N, 6°23′43″W) is the largest lake in the British Isles. Three 10 m integrated water column samples of 5 litres each were collected from Lough Neagh using a flexible hose at a site approximately 5 km north of Kinnego Marina on 28 April 2014 at 11:00 GMT. The samples were taken to the laboratory within 2 hours and placed on ice. The Secchi depth, and the temperature and pH of lake water at the surface and at 5 m and 10 m depths, were recorded on site, and several extra water samples were taken for chemical analysis (S1 Table).

Additional water samples

Five additional water column samples were collected from Lough Neagh at the same location as the primary sample on 1 April 2014, 23 June 2014, 2 September 2014, 10 November 2014, and 18 February 2015. These water samples were processed in the same manner as the primary sample and were used for the analysis of taxonomic composition of bacterial communities via 16S rRNA gene amplicon sequencing.

Primary sample processing and DNA extraction

The samples were processed within 24 h of collection. Total DNA was extracted from 500 ml of water using sterile 0.2 μm ME 24 ST Mixed Cellulose Ester Membrane filters (Whatman/GE Healthcare, UK) and PowerWater DNA Isolation kit (MO BIO, USA). To obtain a ‘virus-like particle’ (VLP) fraction, 5 litres of water were filtered through 0.22 μm Steripak GP-20 filter units (EMD Millipore, USA) and concentrated to 50 ml using an LV Centramate Lab Tangential Flow Filtration System with a 100 kDa Omega membrane suspended screen cassette (Pall, USA). To ensure removal of any remaining planktonic microorganisms, the preparation was further filtered through 0.22 μm Millex-GS syringe filter units (EMD Millipore, USA). The filtrate was concentrated into a final volume of 4 ml using an Amicon Ultra-15 Centrifugal Filter Unit with 100-kDa molecular mass cut-off (EMD Millipore, USA). The resulting VLP concentrate was incubated with 3,000 U of DNase I (Roche, USA) at 4°C for 24 h. PCR with universal 16S rRNA gene primers (63-F/1387-R) [52] was then carried out to confirm the removal of external bacterial DNA. DNase I treatment was continued until no 16S rRNA gene sequences could be detected in the sample. Viral DNA was isolated from the purified VLP concentrate by a formamide/CTAB extraction procedure [53, 54], purified with PowerClean Pro DNA Clean-Up Kit (MO BIO, USA) and quantified using a Quantus fluorometer (Promega, USA). The absence of bacterial contamination was monitored at all stages by epifluorescence microscopy of the SYBR Gold (Invitrogen, USA) stained samples as previously described [55].
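As a worked illustration of the two concentration steps described above, the following minimal Python sketch computes the nominal fold-concentration of the VLP fraction. It assumes that no volume is lost during the intermediate syringe filtration and says nothing about particle recovery; the function name is hypothetical.

```python
def fold_concentration(start_ml, end_ml):
    """Nominal fold-concentration when a volume is reduced from start_ml to end_ml."""
    if start_ml <= 0 or end_ml <= 0:
        raise ValueError("volumes must be positive")
    return start_ml / end_ml

# Volumes taken from the text: 5 litres -> 50 ml -> 4 ml.
tff_step = fold_concentration(5000, 50)        # tangential flow filtration
centrifugal_step = fold_concentration(50, 4)   # centrifugal filter unit
overall = tff_step * centrifugal_step          # nominal overall factor

print(f"TFF: {tff_step:g}x, centrifugal: {centrifugal_step:g}x, overall: {overall:g}x")
```

Under these assumptions the nominal overall factor is the ratio of the starting to the final volume (5,000 ml to 4 ml).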

Preparation of libraries and sequencing

A 16S rRNA gene amplicon library was constructed from total DNA of the primary sample. Partial bacterial 16S rRNA gene sequences were amplified from the total DNA sample by two-step PCR with primers 909-F/1492-R (1st step, 27 cycles) and 909-F B Lib L/1492-Tag 4 A Lib L (2nd step, 5 cycles) [56, 57]. The primers 909-F/1492-R used for the first step of amplification were evaluated in the study by Klindworth et al. [58] and demonstrated good coverage of the domain Bacteria (specifically, if no mismatches are allowed, 91.7% by primer 909-F and 73.4% by primer 1492-R). The resulting PCR amplicons were purified with a High Pure PCR Product Purification Kit (Roche, USA) and quantified using a Quantus fluorometer (Promega, USA). Amplicon sequencing was performed on a 454 GS Junior (Roche, USA) with Lib-L Shotgun chemistry at the University of Cambridge DNA Sequencing Facility.

Viral DNA of the primary sample was subjected to whole genome shotgun (WGS) sequencing at the University of Cambridge DNA Sequencing Facility. A Nextera DNA Sample Preparation kit (Illumina, USA) was used to generate the sequencing library directly from 50 ng of metagenomic viral DNA without preliminary amplification. A 1% PhiX v3 library spike-in was used as a quality control for cluster generation and sequencing. The resulting library was sequenced from both ends (2×300 bp) with the 600-cycle MiSeq Reagent Kit v3 on MiSeq (Illumina, USA). Sequencing adaptors were trimmed off the raw reads at the sequencing facility.

Primary sample bacterial community analysis

The raw reads obtained from sequencing of total water column 16S rRNA gene amplicons were processed using the QIIME pipeline v 1.8.0 [14], following standard protocols. Briefly, the reads were length- and quality-filtered and de-noised, yielding 3,275 sequences for downstream analyses (S1 Datasets, 28_April_2014). Operational taxonomic units (OTUs) were picked using the usearch clustering and quality-filtering method with default parameters. OTUs were clustered at the sequence similarity level of 97%, with a minimum cluster size of 4. The detection and discarding of chimeric sequences was performed by usearch, using both de novo and reference-based detection (ChimeraSlayer reference database, version microbiomeutil-r20110519). The most abundant sequences found in the OTUs were selected as representative sequences. Taxonomic assignment of OTUs was performed using the RDP classifier and Greengenes reference database v 13.8, with minimum confidence score of 0.5. Unclassified and chloroplast-related sequences were filtered out from the OTU table. Statistical analyses were performed using R and GraphPad Prism.
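The OTU table filtering described above can be sketched in plain Python. This is not the QIIME implementation: the record layout (keys `id`, `size`, `taxonomy`, `confidence`) is hypothetical, and treating an assignment below the confidence threshold as unclassified is a simplification made only for illustration.

```python
MIN_CLUSTER_SIZE = 4    # minimum number of reads per OTU (from the text)
MIN_CONFIDENCE = 0.5    # minimum classifier confidence (from the text)

def filter_otus(otus):
    """Drop small, unclassified, low-confidence and chloroplast-related OTUs.

    Each OTU is a dict with the hypothetical keys
    'id', 'size', 'taxonomy' and 'confidence'.
    """
    kept = []
    for otu in otus:
        taxonomy = otu["taxonomy"].lower()
        if otu["size"] < MIN_CLUSTER_SIZE:
            continue
        if otu["confidence"] < MIN_CONFIDENCE or taxonomy == "unclassified":
            continue
        if "chloroplast" in taxonomy:
            continue
        kept.append(otu)
    return kept

def relative_abundance(otus):
    """Return the fraction of reads per taxonomy label among the given OTUs."""
    total = sum(otu["size"] for otu in otus)
    shares = {}
    for otu in otus:
        shares[otu["taxonomy"]] = shares.get(otu["taxonomy"], 0) + otu["size"] / total
    return shares
```

A usage example with made-up counts (not data from this study):

```python
example = [
    {"id": "otu1", "size": 30, "taxonomy": "Cyanobacteria", "confidence": 0.9},
    {"id": "otu2", "size": 3, "taxonomy": "Firmicutes", "confidence": 0.8},
    {"id": "otu3", "size": 10, "taxonomy": "Cyanobacteria; Chloroplast", "confidence": 0.9},
    {"id": "otu4", "size": 10, "taxonomy": "Proteobacteria", "confidence": 0.7},
]
print(relative_abundance(filter_otus(example)))
```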

Primary sample virome processing and analysis

Illumina sequencing of viral DNA produced 2,298,791 2×300 bp reads. The reads obtained ranged from 35 to 300 bp in length, with an average length of 263 bp and a median length of 299 bp. The sequence data files have been submitted to the NCBI Sequence Read Archive (SRA) under the following accession numbers: SRP062094 (study) and SRR2147000 (sequencing run). Initial quality control was performed with FastQC and the NGS QC Toolkit [59], and reads were processed with BBMap v 33.54. Briefly, all reads with an average Q-score < 13 or containing Ns were discarded. The reads were then trimmed of adaptors and quality-trimmed (trimq = 15). Finally, paired-end reads having an overlap of at least 20 bp were merged, and all reads shorter than 30 bp were discarded.
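The discard criteria listed above (mean Q-score, ambiguous bases, minimum length) can be illustrated with a short, standard-library Python sketch. It is not the BBMap code used in the study: the function names are hypothetical, Phred+33 quality encoding is assumed, the "average Q-score" is taken as a simple arithmetic mean of per-base scores, and adaptor trimming, quality trimming and read merging are not reproduced.

```python
from statistics import mean

MIN_MEAN_Q = 13   # reads with a mean Q-score below this are discarded
MIN_LENGTH = 30   # reads shorter than this are discarded

def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_line]

def keep_read(sequence, quality_line):
    """Return True if a read passes the length, N and mean-quality filters."""
    if len(sequence) < MIN_LENGTH:
        return False
    if "N" in sequence.upper():
        return False
    return mean(phred_scores(quality_line)) >= MIN_MEAN_Q

def filter_fastq(lines):
    """Yield (header, sequence, quality) for passing reads from 4-line FASTQ records."""
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, sequence, _, quality = records[i:i + 4]
        if keep_read(sequence, quality):
            yield header, sequence, quality
```

In the study the filters were applied in sequence around the trimming and merging steps; the sketch collapses them into one pass purely to show the thresholds.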

The IDBA-UD sequence assembler v1.1.1 [60] was used to assemble the processed reads into contigs. Two modifications were made to the source code before compiling: in the source code file idba-1.1.1\src\basic\kmer.h the expression "static const uint32_t kNumUint64 = 4" was changed to "static const uint32_t kNumUint64 = 16"; in the source code file idba-1.1.1\src\sequence\short_sequence.h the expression "static const uint32_t kMaxShortSequence = 128" was changed to "static const uint32_t kMaxShortSequence = 32768". The assembly was performed using the following parameters: --mink 20 --maxk 250 --step 20 --num_threads 8. Reads were mapped to contigs with Bowtie2 [61] and mapping statistics were obtained using SAMtools [62] and BEDTools [63], while Artemis [64] and IGV [65] were used for mapping visualisation. The virome was analysed using two online pipelines: MetaVir and MG-RAST. The contigs and unassembled processed reads were uploaded to the MetaVir [33] server for taxonomic annotation and comparison with other publicly available viromes (project ID 4925, Lough Neagh virome; project ID 5053, Lough Neagh assembled contigs). Functional annotation of the virome was performed with MG-RAST [36]. Because MG-RAST implements its own quality-filtering and pre-processing pipeline, the original unprocessed reads were uploaded (MG-RAST ID 4585272.3).
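The reason for raising kNumUint64 can be illustrated with a rough capacity calculation. The sketch below assumes that k-mers are packed at 2 bits per nucleotide into 64-bit words; it gives an upper bound only, since the assembler may reserve some bits for bookkeeping, and the function names are hypothetical. The second change (kMaxShortSequence) presumably lifts a cap on read length, which the 300 bp reads used here would otherwise exceed.

```python
import math

BITS_PER_BASE = 2    # assumption: A, C, G, T packed into 2 bits each
BITS_PER_WORD = 64   # one uint64_t

def max_kmer_upper_bound(num_words):
    """Largest k that could fit into num_words 64-bit words (upper bound)."""
    return num_words * BITS_PER_WORD // BITS_PER_BASE

def min_words_for_k(k):
    """Smallest number of 64-bit words able to hold a k-mer of length k."""
    return math.ceil(k * BITS_PER_BASE / BITS_PER_WORD)

print(max_kmer_upper_bound(4))    # bound with the default kNumUint64 = 4
print(max_kmer_upper_bound(16))   # bound with the modified kNumUint64 = 16
print(min_words_for_k(250))       # words needed at least for maxk = 250
```

Under this assumption, four words cannot hold a 250-mer, whereas sixteen words leave ample headroom for the --maxk 250 setting.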

Supporting Information

S1 Datasets. 16S rRNA gene amplicons from the Lough Neagh water samples.


S1 Fig. Rarefaction analysis of Lough Neagh virome.

(A) A rarefaction curve of the total viral metagenome was obtained after high-throughput sequencing of the Lough Neagh sample. The rarefaction curve was constructed within MetaVir with clustering set at 90% identity; 2,295,055 reads were analysed. (B) Comparison of the rarefaction curves of the three freshwater viral metagenomes conducted using MetaVir; subsamples of 50,000 reads from each virome were used. Red, Lough Neagh; green, Lake Pavin; blue, Lake Bourget.


S2 Fig. Score matrices-based global comparisons of Lough Neagh virome to freshwater viromes at MetaVir website.

Results of oligonucleotide signatures comparison of full viromes and BLAST-based comparison of 50,000 sequences are shown. Hierarchical clustering and tree generation were done by R package pvclust. (A) Dinucleotide composition bias comparison. (B) Trinucleotide composition bias comparison. (C) Tetranucleotide composition bias comparison. (D) BLAST-based comparison.


S1 File. Comparison of bacterial communities of Lough Neagh and selected freshwater lakes.


S1 Table. Chemical and environmental parameters of the Lough Neagh water sample used for metagenomic analysis.



We are grateful to Ms Hannah Cromie for her help with the collection of the samples.

Author Contributions

Conceived and designed the experiments: TS JPQ JWM CCRA CW RL LAK. Performed the experiments: TS CL. Analyzed the data: TS CL JPQ JWM YM CW KA RL LAK. Contributed reagents/materials/analysis tools: YM CW. Wrote the paper: TS CL JPQ JWM CCRA YM CW KA RL LAK.


  1. Wood RB, Smith RV, editors. Lough Neagh: the ecology of a multipurpose water resource. Netherlands: Springer; 1993.
  2. Bunting L, Leavitt PR, Gibson CE, McGee EJ, Hall VA. Degradation of water quality in Lough Neagh, Northern Ireland, by diffuse nitrogen flux from a phosphorus-rich catchment. Limnology and Oceanography. 2007;52(1):354–69.
  3. Quinn JP. Heterotrophic micro-organisms in the water column and sediments of Lough Neagh. In: Wood RB, Smith RV, editors. Lough Neagh: the ecology of a multipurpose water resource. Netherlands: Springer; 1993. p. 369–79.
  4. Suttle CA. Marine viruses—major players in the global ecosystem. Nature reviews Microbiology. 2007;5(10):801–12. pmid:17853907.
  5. Weitz JS, Wilhelm SW. Ocean viruses and their effects on microbial communities and biogeochemical cycles. F1000 biology reports. 2012;4:17. pmid:22991582; PubMed Central PMCID: PMC3434959.
  6. Dinsdale EA, Edwards RA, Hall D, Angly F, Breitbart M, Brulc JM, et al. Functional metagenomic profiling of nine biomes. Nature. 2008;452(7187):629–32. pmid:18337718.
  7. Djikeng A, Kuzmickas R, Anderson NG, Spiro DJ. Metagenomic analysis of RNA viruses in a fresh water lake. PLOS one. 2009;4(9):e7264. pmid:19787045; PubMed Central PMCID: PMC2746286.
  8. Lopez-Bueno A, Tamames J, Velazquez D, Moya A, Quesada A, Alcami A. High diversity of the viral community from an Antarctic lake. Science. 2009;326(5954):858–61. pmid:19892985.
  9. Fancello L, Trape S, Robert C, Boyer M, Popgeorgiev N, Raoult D, et al. Viruses in the desert: a metagenomic survey of viral communities in four perennial ponds of the Mauritanian Sahara. The ISME journal. 2013;7(2):359–69. pmid:23038177; PubMed Central PMCID: PMC3554411.
  10. Tseng CH, Chiang PW, Shiah FK, Chen YL, Liou JR, Hsu TC, et al. Microbial and viral metagenomes of a subtropical freshwater reservoir subject to climatic disturbances. The ISME journal. 2013;7(12):2374–86. pmid:23842651; PubMed Central PMCID: PMC3834851.
  11. Rodriguez-Brito B, Li L, Wegley L, Furlan M, Angly F, Breitbart M, et al. Viral and microbial community dynamics in four aquatic environments. The ISME journal. 2010;4(6):739–51. pmid:20147985.
  12. Roux S, Enault F, Robin A, Ravet V, Personnic S, Theil S, et al. Assessing the diversity and specificity of two freshwater viral communities through metagenomics. PLOS one. 2012;7(3):e33641. pmid:22432038; PubMed Central PMCID: PMC3303852.
  13. Ge X, Wu Y, Wang M, Wang J, Wu L, Yang X, et al. Viral metagenomics analysis of planktonic viruses in East Lake, Wuhan, China. Virologica Sinica. 2013;28(5):280–90. pmid:24132758.
  14. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nature methods. 2010;7(5):335–6. pmid:20383131; PubMed Central PMCID: PMC3156573.
  15. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10. pmid:2231712.
  16. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC bioinformatics. 2009;10:421. pmid:20003500; PubMed Central PMCID: PMC2803857.
  17. Huys G, Vanhoutte T, Joossens M, Mahious AS, De Brandt E, Vermeire S, et al. Coamplification of eukaryotic DNA with 16S rRNA gene-based PCR primers: possible consequences for population fingerprinting of complex microbial communities. Current microbiology. 2008;56(6):553–7. pmid:18301945.
  18. Roh SW, Kim KH, Nam YD, Chang HW, Park EJ, Bae JW. Investigation of archaeal and bacterial diversity in fermented seafood using barcoded pyrosequencing. The ISME journal. 2010;4(1):1–16. pmid:19587773.
  19. Carlson CA, Morris R, Parsons R, Treusch AH, Giovannoni SJ, Vergin K. Seasonal dynamics of SAR11 populations in the euphotic and mesopelagic zones of the northwestern Sargasso Sea. The ISME journal. 2009;3(3):283–95. pmid:19052630.
  20. Morris RM, Rappe MS, Connon SA, Vergin KL, Siebold WA, Carlson CA, et al. SAR11 clade dominates ocean surface bacterioplankton communities. Nature. 2002;420(6917):806–10. pmid:12490947.
  21. Salcher MM, Pernthaler J, Posch T. Seasonal bloom dynamics and ecophysiology of the freshwater sister clade of SAR11 bacteria 'that rule the waves' (LD12). The ISME journal. 2011;5(8):1242–52. pmid:21412347; PubMed Central PMCID: PMC3146277.
  22. Zwart G, Hiorns WD, Methe BA, van Agterveld MP, Huismans R, Nold SC, et al. Nearly identical 16S rRNA sequences recovered from lakes in North America and Europe indicate the existence of clades of globally distributed freshwater bacteria. Systematic and applied microbiology. 1998;21(4):546–56. pmid:9924823.
  23. Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL. NCBI BLAST: a better web interface. Nucleic Acids Res. 2008;36(Web Server issue):W5–9. pmid:18440982; PubMed Central PMCID: PMC2447716.
  24. Newton RJ, Jones SE, Eiler A, McMahon KD, Bertilsson S. A guide to the natural history of freshwater lake bacteria. Microbiology and molecular biology reviews: MMBR. 2011;75(1):14–49. pmid:21372319; PubMed Central PMCID: PMC3063352.
  25. Boucher D, Jardillier L, Debroas D. Succession of bacterial community composition over two consecutive years in two aquatic systems: a natural lake and a lake-reservoir. FEMS microbiology ecology. 2006;55(1):79–97. pmid:16420617.
  26. Debroas D, Humbert JF, Enault F, Bronner G, Faubladier M, Cornillot E. Metagenomic approach studying the taxonomic and functional diversity of the bacterial community in a mesotrophic lake (Lac du Bourget—France). Environmental microbiology. 2009;11(9):2412–24. pmid:19558513.
  27. Ghai R, McMahon KD, Rodriguez-Valera F. Breaking a paradigm: cosmopolitan and abundant freshwater actinobacteria are low GC. Environmental microbiology reports. 2012;4(1):29–35. pmid:23757226.
  28. Haukka K, Kolmonen E, Hyder R, Hietala J, Vakkilainen K, Kairesalo T, et al. Effect of nutrient loading on bacterioplankton community composition in lake mesocosms. Microbial ecology. 2006;51(2):137–46. pmid:16435168.
  29. Ghai R, Mizuno CM, Picazo A, Camacho A, Rodriguez-Valera F. Key roles for freshwater Actinobacteria revealed by deep metagenomic sequencing. Molecular ecology. 2014;23(24):6073–90. pmid:25355242.
  30. Paerl HW, Paul VJ. Climate change: links to global expansion of harmful cyanobacteria. Water Res. 2012;46(5):1349–63. pmid:21893330.
  31. Havens KE. Cyanobacteria blooms: effects on aquatic ecosystems. Adv Exp Med Biol. 2008;619:733–47. pmid:18461790.
  32. Roux S, Faubladier M, Mahul A, Paulhe N, Bernard A, Debroas D, et al. Metavir: a web server dedicated to virome analysis. Bioinformatics. 2011;27(21):3074–5. pmid:21911332.
  33. Roux S, Tournayre J, Mahul A, Debroas D, Enault F. Metavir 2: new tools for viral metagenome comparison and assembled virome analysis. BMC bioinformatics. 2014;15:76. pmid:24646187; PubMed Central PMCID: PMC4002922.
  34. Angly FE, Willner D, Prieto-Davo A, Edwards RA, Schmieder R, Vega-Thurber R, et al. The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes. PLOS computational biology. 2009;5(12):e1000593. pmid:20011103; PubMed Central PMCID: PMC2781106.
  35. Zhao Y, Temperton B, Thrash JC, Schwalbach MS, Vergin KL, Landry ZC, et al. Abundant SAR11 viruses in the ocean. Nature. 2013;494(7437):357–60. pmid:23407494.
  36. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, et al. The metagenomics RAST server—a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC bioinformatics. 2008;9:386. pmid:18803844; PubMed Central PMCID: PMC2563014.
  37. Wilke A, Glass EM, Bischof J, Braithwaite D, DSouza M, Gerlach W, et al. MG-RAST Manual for version 3.3.6, revision 9. 2014.
  38. Mao X, Zhang Y, Xu Y. SEAS: a system for SEED-based pathway enrichment analysis. PLOS one. 2011;6(7):e22556. pmid:21799897; PubMed Central PMCID: PMC3142180.
  39. Sullivan MB, Coleman ML, Weigele P, Rohwer F, Chisholm SW. Three Prochlorococcus cyanophage genomes: signature features and ecological interpretations. PLOS biology. 2005;3(5):e144. pmid:15828858; PubMed Central PMCID: PMC1079782.
  40. Breitbart M. Marine Viruses: Truth or Dare. Annual Review of Marine Science. 2012;4(1):425–48.
  41. Adriaenssens EM, Van Zyl L, De Maayer P, Rubagotti E, Rybicki E, Tuffin M, et al. Metagenomic analysis of the viral community in Namib Desert hypoliths. Environmental microbiology. 2015;17(2):480–95. pmid:24912085.
  42. Willner D, Thurber RV, Rohwer F. Metagenomic signatures of 86 microbial and viral metagenomes. Environmental microbiology. 2009;11(7):1752–66. pmid:19302541.
  43. Kleiner M, Hooper LV, Duerkop BA. Evaluation of methods to purify virus-like particles for metagenomic sequencing of intestinal viromes. BMC genomics. 2015;16:7. pmid:25608871; PubMed Central PMCID: PMC4308010.
  44. Kim KH, Bae JW. Amplification methods bias metagenomic libraries of uncultured single-stranded and double-stranded DNA viruses. Applied and environmental microbiology. 2011;77(21):7663–8. pmid:21926223; PubMed Central PMCID: PMC3209148.
  45. Marine R, McCarren C, Vorrasane V, Nasko D, Crowgey E, Polson SW, et al. Caught in the middle with multiple displacement amplification: the myth of pooling for avoiding multiple displacement amplification bias in a metagenome. Microbiome. 2014;2(1):3. pmid:24475755; PubMed Central PMCID: PMC3937105.
  46. Paul JH. Prophages in marine bacteria: dangerous molecular time bombs or the key to survival in the seas? The ISME journal. 2008;2(6):579–89. pmid:18521076.
  47. Williamson SJ, Houchin LA, McDaniel L, Paul JH. Seasonal variation in lysogeny as depicted by prophage induction in Tampa Bay, Florida. Applied and environmental microbiology. 2002;68(9):4307–14. pmid:12200280; PubMed Central PMCID: PMC124089.
  48. Wilson WH, Carr NG, Mann NH. The effect of phosphate status on the kinetics of cyanophage infection in the oceanic cyanobacterium Synechococcus sp. WH7803. Journal of Phycology. 1996;32(4):506–16.
  49. Wilson WH, Turner S, Mann NH. Population Dynamics of Phytoplankton and Viruses in a Phosphate-limited Mesocosm and their Effect on DMSP and DMS Production. Estuarine, Coastal and Shelf Science. 1998;46(2):49–59.
  50. Sullivan MB, Huang KH, Ignacio-Espinoza JC, Berlin AM, Kelly L, Weigele PR, et al. Genomic analysis of oceanic cyanobacterial myoviruses compared with T4-like myoviruses from diverse hosts and environments. Environmental microbiology. 2010;12(11):3035–56. pmid:20662890; PubMed Central PMCID: PMC3037559.
  51. Zeidner G, Bielawski JP, Shmoish M, Scanlan DJ, Sabehi G, Beja O. Potential photosynthesis gene recombination between Prochlorococcus and Synechococcus via viral intermediates. Environmental microbiology. 2005;7(10):1505–13. pmid:16156724.
  52. Marchesi JR, Sato T, Weightman AJ, Martin TA, Fry JC, Hiom SJ, et al. Design and evaluation of useful bacterium-specific PCR primers that amplify genes coding for bacterial 16S rRNA. Applied and environmental microbiology. 1998;64(2):795–9. pmid:9464425; PubMed Central PMCID: PMC106123.
  53. Thurber RV. Methods in Viral Metagenomics. In: de Bruijn FJ, editor. Handbook of Molecular Microbial Ecology II: Metagenomics in different habitats: Wiley-Blackwell; 2011.
  54. Thurber RV, Haynes M, Breitbart M, Wegley L, Rohwer F. Laboratory procedures to generate viral metagenomes. Nature protocols. 2009;4(4):470–83. pmid:19300441.
  55. Chen F, Lu JR, Binder BJ, Liu YC, Hodson RE. Application of digital image analysis and flow cytometry to enumerate marine viruses stained with SYBR gold. Applied and environmental microbiology. 2001;67(2):539–45. pmid:11157214; PubMed Central PMCID: PMC92618.
  56. Berry D, Ben Mahfoudh K, Wagner M, Loy A. Barcoded primers used in multiplex amplicon pyrosequencing bias amplification. Applied and environmental microbiology. 2011;77(21):7846–9. pmid:21890669; PubMed Central PMCID: PMC3209180.
  57. Hamady M, Walker JJ, Harris JK, Gold NJ, Knight R. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nature methods. 2008;5(3):235–7. pmid:18264105; PubMed Central PMCID: PMC3439997.
  58. Klindworth A, Pruesse E, Schweer T, Peplies J, Quast C, Horn M, et al. Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies. Nucleic Acids Res. 2013;41(1):e1. pmid:22933715; PubMed Central PMCID: PMC3592464.
  59. Patel RK, Jain M. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLOS one. 2012;7(2):e30619. pmid:22312429; PubMed Central PMCID: PMC3270013.
  60. Peng Y, Leung HC, Yiu SM, Chin FY. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012;28(11):1420–8. pmid:22495754.
  61. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature methods. 2012;9(4):357–9. pmid:22388286.
  62. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. pmid:19505943.
  63. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2. pmid:20110278.
  64. Carver T, Harris SR, Berriman M, Parkhill J, McQuillan JA. Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data. Bioinformatics. 2012;28(4):464–9. pmid:22199388.
  65. Thorvaldsdóttir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics. 2012:bbs017.