Functional Gene Composition, Diversity and Redundancy in Microbial Stream Biofilm Communities

We surveyed the functional gene composition and diversity of microbial biofilm communities in 18 New Zealand streams affected by different types of catchment land use, using a comprehensive functional gene array, GeoChip 3.0. A total of 5,371 nutrient cycling and energy metabolism genes within 65 gene families were detected among all samples (342 to 2,666 genes per stream). Carbon cycling genes were most common, followed by nitrogen cycling genes, with smaller proportions of sulphur, phosphorus cycling and energy metabolism genes. Samples from urban and native forest streams had the most similar functional gene composition, while samples from exotic forest and rural streams exhibited the most variation. There were significant differences between nitrogen and sulphur cycling genes detected in native forest and urban samples compared to exotic forest and rural samples, attributed to contrasting proportions of nitrogen fixation, denitrification, and sulphur reduction genes. Most genes were detected only in one or a few samples, with only a small minority occurring in all samples. Nonetheless, 42 of 65 gene families occurred in every sample and overall proportions of gene families were similar among samples from contrasting streams. This suggests the existence of functional gene redundancy among different stream biofilm communities despite contrasting taxonomic composition.


Introduction
The role of streams as landscape drainage systems makes these ecosystems susceptible to impacts of land use changes such as deforestation, agriculture, and urbanisation [1][2][3]. While there is much interest in improving the health of degraded streams, our understanding of stream ecological processes is incomplete, limiting the effectiveness of management and restoration efforts. In particular, the links between microbial community composition, metabolic functions, and stream biogeochemistry are not well understood [4].
A majority of the diverse and abundant microorganisms found in streams occur in surface-associated biofilm communities, where they are thought to contribute significantly to in-stream ecological processes [5,6]. These include carbon (C) processing [7][8][9], nitrogen (N) and sulphur (S) cycling [10], and immobilization and transformation of aquatic contaminant molecules [11][12][13]. Catchment land use affects the organic material present in streams [14,15], and streams in urban and agricultural catchments typically receive elevated inputs of nutrients and inorganic pollutants [1,16]. Recent studies have revealed differences in the composition of stream benthic bacterial communities related to catchment urbanization [17,18], and evidence suggests such changes in microbial community composition may influence microbially mediated functions [8][18][19][20][21][22]. However, it remains unclear to what extent differences in microbial community composition in stream biofilms are accompanied by differences in microbial functional genes or associated in-stream biogeochemical processes.
The study of microbial functional genes in natural environments is challenging due to the great diversity and complexity of microbial communities and associated functional genes. However, recently developed molecular approaches are providing new insights into the role of microbial functional genes in natural systems [23]. For example, GeoChip 3.0 is a comprehensive functional gene microarray which includes 27,812 probes covering 56,990 gene sequences from 292 functional gene families involved in C, N, S and phosphorus (P) cycling, energy metabolism, antibiotic and metal resistance, organic contaminant degradation, and a phylogenetic marker gene [24].
In the present study, we surveyed the microbial functional gene composition of biofilm samples from a variety of streams affected by different land use types using GeoChip 3.0. Our objectives were to characterize the typical functional gene composition of microbial stream biofilm communities, to investigate whether there were any differences in functional gene composition among streams representing different types of catchment land use, and to explore links between microbial community structure and environmental variables. The present study focuses on the functional genes involved in nutrient (C, N, P and S) cycling and energy metabolism. We detected a broad range of microbial nutrient cycling and energy metabolism genes in stream biofilms, evidence of contrasting composition among streams affected by different types of catchment land use (depending on functional gene category), and the apparent existence of functional gene redundancy despite contrasting microbial composition.

Sample sites and sample collection
Biofilm samples were collected in November 2009 from 18 streams and rivers located throughout the greater Auckland region in New Zealand (Table 1). These streams are located in catchments representing the dominant types of land development in the region, with varying proportions of forest, agriculture, and urban land cover, and with in-stream water quality and ecological health ranging from excellent to poor, according to results of a long-term monitoring programme [25][26][27]. Prior investigations have found evidence of catchment development and water quality-related differences in microbial biofilm communities among these streams [28]. The locations of sampling sites were kindly provided by the Freshwater Monitoring group at Auckland Council. The sites and sampled material are not protected, and no special permits were required for sampling at any of the sites.
At each sampling site, the stream was divided into five consecutive reaches of approximately 5 m length. Five rocks were randomly selected and consecutively removed from within each reach, starting from the most downstream reach and proceeding in an upstream direction, for a total of 25 rocks per stream. After removal from the water, biofilm was immediately sampled from the upper surface of each rock by abrasion with a sterile Speci-sponge (Nasco, Fort Atkinson, WI, USA), which was then sealed in a sterile Whirl-Pak bag (Nasco) and chilled on ice, and the rock returned to the stream. In one stream (Oteha stream) no rocks were found, and in this case samples were instead collected from fallen tree branches.
In the laboratory, each Speci-sponge was transferred into a bag with about 40 ml sterile water, and subjected to repetitive compression in a Seward Stomacher laboratory blender (normal speed, 120 s) to release biofilm material. The released material was transferred into 50 mL centrifuge tubes and pelleted by centrifugation (3500 x g, 10 min). Pelleted biofilm samples

DNA preparation and GeoChip hybridization
DNA was extracted from each of the five combined biofilm samples from each stream (0.25 g) using the method of Zhou et al. [29] and resuspended in 100 μl ultrapure water (Invitrogen). DNA extracts from the five biofilm samples from each stream were pooled to give a single representative DNA sample for each stream. Pooled DNA extracts were then subjected to whole-genome amplification using a TempliPhi Kit (GE Healthcare, Piscataway, NJ, USA) and a modified buffer which included single-strand binding protein (267 ng μl⁻¹) and spermidine (0.1 mM) to improve amplification efficiency [30]. Reactions were incubated at 30°C for three hours then stopped by heating to 65°C for 10 minutes. Amplification products were subsequently labeled with the fluorescent dye Cy-5 using a random priming method, as described previously [31], before purification using a QIAquick purification kit (Qiagen, Valencia, CA, USA) and drying in a SpeedVac (ThermoSavant, Milford, MA, USA) at 45°C for 45 minutes. Samples were then resuspended in 120 μl hybridization buffer (50% formamide, 3 x saline-sodium citrate buffer, 10 μg herring sperm DNA (Promega, Madison, WI, USA) and 0.1% SDS), denatured at 95°C for 5 minutes, and then hybridized with a GeoChip 3.0 microarray on an HS4800 Pro hybridization station (TECAN US, Durham, NC, USA) according to the manufacturer's directions. The hybridized microarrays were scanned using a ScanArray Express Microarray Scanner (Perkin-Elmer, Boston, MA, USA), and the resulting images were processed using ImaGene 6.0 software (BioDiscovery, El Segundo, CA, USA) according to previously described procedures [32]. To assess the reproducibility of this microarray-based approach, biofilm samples from six streams (Cascade, Opanuku, Hoteo, Matakana, Lucas and Oakley) were subjected to triplicate GeoChip hybridizations, while all other samples were hybridized once.

Data analysis
GeoChip data comprised a list of functional genes detected in each stream, with the abundance of each gene indicated by the measured intensity of the probe hybridization signal. Shannon and Simpson diversity indices were initially calculated on raw gene abundance data. For comparisons of overall functional gene assemblages, total gene abundance data was standardized among samples. Alternatively, for comparisons of the relative proportions of functional gene families, the numbers of detected genes belonging to different gene families were summed and expressed as proportions of the total number of genes in each sample. Subsets of data belonging to different functional gene categories (carbon, nitrogen, phosphorus, and sulphur cycling, and energy metabolism) were standardized and analysed separately. For analysis of the taxonomic composition of samples, the number of gyrB phylogenetic marker genes belonging to different genera were summed and expressed as proportions of the total number of gyrB genes detected in each sample. The gyrB gene encodes the DNA gyrase β-subunit, and provides a higher level of taxonomic resolution than 16S ribosomal RNA gene sequences [24,33,34]. Similarities and differences between standardized and proportional gene assemblage data from different streams and catchment types were investigated using multivariate statistical analyses. Bray-Curtis similarity of gene assemblages was calculated between all samples. Differences between samples were investigated using clustering analysis, non-metric multidimensional scaling (MDS) and Analysis of Similarity (ANOSIM). Similarity Percentage (SIMPER) analysis was used to investigate the variables contributing to observed similarities and differences. Similarities between patterns of Bray-Curtis sample similarity based on different subsets of functional genes and based on taxonomic composition were investigated using RELATE (a Mantel-type comparison of similarity matrices). 
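The diversity and similarity measures described above can be illustrated with a minimal Python sketch. It is not the Primer 6 routine used in this study: the signal intensities are hypothetical, the Simpson index is assumed to take its 1 - Σp² form, and standardization is assumed to mean expressing each gene's signal as a percentage of the sample total.

```python
import math

def shannon_index(abundances):
    """Shannon index H' = -sum(p_i * ln p_i) over genes with non-zero signal."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)

def simpson_index(abundances):
    """Simpson index in its 1 - sum(p_i^2) form (assumed; other forms exist)."""
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)

def standardize(abundances):
    """Express each gene's signal as a percentage of the sample total (assumed)."""
    total = sum(abundances)
    return [100.0 * a / total for a in abundances]

def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity (%) = 100 * (1 - sum|x_i - y_i| / sum(x_i + y_i))."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return 100.0 * (1.0 - numerator / denominator)

# Hypothetical hybridization signal intensities for four genes in two streams
stream_a = [10.0, 10.0, 0.0, 20.0]
stream_b = [5.0, 5.0, 10.0, 0.0]

h_a = shannon_index(stream_a)
d_a = simpson_index(stream_a)
similarity = bray_curtis_similarity(standardize(stream_a), standardize(stream_b))
print(f"H' = {h_a:.3f}, Simpson = {d_a:.3f}, Bray-Curtis similarity = {similarity:.1f}%")
```

Standardizing before computing Bray-Curtis similarity means that two samples with the same relative gene composition score as identical even if one hybridized more strongly overall.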
All multivariate statistical analyses (MDS, ANOSIM, SIMPER, and RELATE) were carried out in Primer 6 (Primer-E Ltd., UK). Links between functional gene assemblages and environmental variables were investigated by regressing water quality and catchment land use measurements against functional gene diversity indices, and against primary and secondary axes of MDS plots based on functional gene assemblages in biofilm samples. Regression analyses were carried out in R 2.12.1 [35].

Reproducibility of GeoChip analysis of stream biofilm samples
To investigate the reproducibility of the GeoChip analysis method, pooled biofilm DNA samples from six streams were each subjected to triplicate GeoChip hybridizations. Triplicate results from three streams (Cascade, Opanuku, and Lucas) had very similar numbers and composition of genes (≥ 90% of detected genes shared between triplicate results; > 80% Bray-Curtis similarity between triplicate results). Triplicate results for each of the remaining three streams (Matakana, Hoteo, and Oakley) had similarly consistent gene assemblages, except that one triplicate result in each case had 20% to 40% fewer genes than the other two, suggesting reduced GeoChip hybridization effectiveness in these cases. Nevertheless, clustering analysis showed that triplicate hybridization results from each stream always grouped together, with the exception of one result from Oakley Creek biofilm (S1 Fig). ANOSIM analysis indicated that triplicate analysis results from each stream were always significantly more similar to each other than they were to results from different streams (Global R = 0.863, p = 0.001; all pairwise R values ≥ 0.889, p = 0.1). This suggests that the results of GeoChip analysis of stream biofilms are generally repeatable, providing confidence in the results of non-triplicate analyses.
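The pairwise p value of 0.1 follows from the design: two streams with three replicates each can be split into two groups of three in only 10 distinct ways, so an exact permutation test cannot return a p value below 1/10. The sketch below computes the ANOSIM R statistic and its exact permutation p value for two such groups. It is a simplified illustration rather than the Primer 6 implementation, and the dissimilarity values are hypothetical.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def anosim_r(dissim, labels):
    """R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = list(combinations(range(len(labels)), 2))
    ranks = average_ranks([dissim[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
    return (sum(between) / len(between) - sum(within) / len(within)) / (len(pairs) / 2)

def exact_two_group_test(dissim, labels):
    """Enumerate every relabelling of two groups and return (observed R, p)."""
    n = len(labels)
    observed = anosim_r(dissim, labels)
    group_size = labels.count(labels[0])
    permuted = []
    for chosen in combinations(range(n), group_size):
        relabelled = ["A" if i in chosen else "B" for i in range(n)]
        permuted.append(anosim_r(dissim, relabelled))
    # Each split is visited twice (once per group name), which leaves p unchanged
    p = sum(1 for r in permuted if r >= observed - 1e-12) / len(permuted)
    return observed, p

# Hypothetical Bray-Curtis dissimilarities: 10% within a stream, 60% between streams
labels = ["Cascade"] * 3 + ["Lucas"] * 3
dissim = [[0.0 if i == j else (10.0 if labels[i] == labels[j] else 60.0)
           for j in range(6)] for i in range(6)]

observed_r, p_value = exact_two_group_test(dissim, labels)
print(f"R = {observed_r:.3f}, p = {p_value:.2f}")
```

Even for perfectly separated replicates (R = 1), the smallest attainable p value in this design is 0.1, which is why the pairwise tests are better interpreted through their R values than their p values.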
Fewer than 400 genes were detected by triplicate analyses of Matakana Stream biofilm and by one of three analyses of Hoteo River biofilm. This represents less than half of the number of genes detected in any other samples, and these samples were excluded from further analysis. All other triplicate results were subsequently replaced with single results by averaging the probe hybridization data among triplicate results, or by calculating distances among centroids in Primer 6 (Primer-E Ltd., UK) for subsequent multivariate analyses.

Genes and gene families detected in stream biofilms
In total, 12,801 genes and 273 gene families were detected among all biofilm samples, corresponding to 46% of the probes and 93.5% of the gene families represented on the GeoChip 3.0 microarray (S4 Table). Of these, 5,371 genes (65 gene families) were associated with nutrient (C, N, P, and S) cycling and energy metabolism. A further 638 genes were gyrB phylogenetic markers for analysis of taxonomic composition. The remaining 6,792 genes (206 gene families) were associated with resistance to antibiotics and metals or remediation of organic contaminants. This paper focuses on the nutrient cycling, energy metabolism, and phylogenetic marker genes.
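The percentages and category totals quoted above can be checked with a few lines of arithmetic. All figures in this sketch are taken from the text; the only assumption is that each detected gene corresponds to one probe on the array.

```python
# Array contents (GeoChip 3.0) and detections reported in the text
probes_on_array = 27812
families_on_array = 292
genes_detected = 12801
families_detected = 273

# Detected genes by category
nutrient_energy_genes = 5371
gyrb_marker_genes = 638
resistance_remediation_genes = 6792

probe_pct = 100 * genes_detected / probes_on_array
family_pct = 100 * families_detected / families_on_array
category_total = nutrient_energy_genes + gyrb_marker_genes + resistance_remediation_genes

print(f"{probe_pct:.1f}% of probes, {family_pct:.1f}% of gene families")
print(f"category total = {category_total}")
```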
The number of nutrition and energy metabolism genes detected in biofilm samples varied widely, from 342 from rural Ngakoroa Stream to 2,666 from rural Kumeu Stream, with a mean of 940 genes detected per stream (sd = 570; Table 2). There was no evidence of differences in the numbers of genes detected in samples from streams with different predominant types of catchment land use. About half of the 5,371 nutrition and energy metabolism genes were each detected only in one sample, about half again were detected in two samples, and there were similar reductions between the numbers of genes shared between three, four, five or six samples. Between 46 and 81 genes were shared between each of seven to 17 samples. Genes that were detected only once accounted for 1.4% to 13% of the genes detected in most samples, with the remainder detected in two or more samples. However, biofilm samples from two streams (rural Kumeu Stream and urban Oakley Creek) had unusually high proportions of unique genes (35% and 58% respectively), which together accounted for one third of the total number of genes detected among all samples. There were 31 genes detected only in samples from native forest streams, compared with 190 genes only detected in exotic forest streams, 1,562 genes detected only in rural streams, and 1,222 genes detected only in urban streams. On average, any two samples shared 48% of their genes (sd = 19.2%). The number of nutrient cycling (C, N, P and S) and energy metabolism gene families detected showed much less variation than the number of genes detected, ranging from 49 in rural Ngakaroa Stream biofilm to 65 gene families in rural Kumeu Stream biofilm (Table 2). A majority of the detected gene families (42 of 65) were present in all samples. Land use did not appear to select for any particular gene family.
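The occupancy figures above (how many samples each gene was detected in, and what share of a sample's genes were unique to it) can be tallied as in the following sketch. The stream names and gene identifiers are hypothetical placeholders, not data from this study.

```python
from collections import Counter

# Hypothetical sets of detected gene identifiers per stream
detected = {
    "stream_1": {"g1", "g2", "g3", "g4"},
    "stream_2": {"g1", "g2", "g5"},
    "stream_3": {"g1", "g6", "g7", "g8", "g9"},
}

# Number of samples in which each gene was detected
occupancy = Counter(gene for genes in detected.values() for gene in genes)

# Number of genes detected in exactly k samples
occupancy_histogram = Counter(occupancy.values())

# Percentage of each sample's genes that were detected in no other sample
unique_pct = {
    stream: 100 * sum(1 for gene in genes if occupancy[gene] == 1) / len(genes)
    for stream, genes in detected.items()
}

for k in sorted(occupancy_histogram):
    print(f"{occupancy_histogram[k]} genes detected in {k} sample(s)")
for stream, pct in unique_pct.items():
    print(f"{stream}: {pct:.1f}% unique genes")
```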
The relative proportions of genes belonging to different overall nutrient cycling/energy metabolism categories were very similar between streams, with the exception of samples from the largest, rural waterway in the study, Hoteo River, and a stream with an almost entirely urban catchment, Oakley Creek (Table 2). C cycling genes constituted ≥ 64% of the nutrition/energy metabolism genes detected in these two samples, but 43%-48% of the genes detected in other samples. Conversely, the proportion of genes associated with N cycling was 17% in Hoteo River biofilm and 23% in Oakley Creek, compared to 28%-33% in all other samples. Both of these waterways also had less than half of the proportion of S cycling genes detected in most other samples, while Hoteo River biofilm also had the highest proportion of energy metabolism genes and Oakley Creek biofilm had a very high proportion of gyrB phylogenetic marker genes, most of which were not detected in any other samples. These proportional differences were consistent with differences of absolute gene abundance when samples with similar total gene abundance were compared.

Functional gene composition differences between streams and catchment types
The relative Bray-Curtis similarity of biofilm samples based on the 5,371 nutrition/energy metabolism genes ranged from less than 20% to almost 80%, with a mean of 41.7% (sd = 12.3%). The greatest similarity (75-80%) was observed between biofilm from two urban streams (Lucas and Omaru) and, surprisingly, between these two streams and a native forest-dominated stream (Cascade), as indicated by their close proximity on an MDS plot (Fig 1a). In comparison, there was 58.1% similarity between gene assemblages detected in biofilm from two native forest streams with high water quality (Cascade and Opanuku). The lowest similarity was generally observed between rural Hoteo River biofilm compared to other streams (16.1%-27.6% similarity), and between biofilm from highly urbanized Oakley Creek compared to other streams (23.3%-38.6% similarity), as shown by the position of these sites on the periphery of the MDS ordination (Fig 1a). There was only 30% similarity between biofilm from two streams with near-wholly urban catchments (Oakley and Pakuranga). Low levels of gene assemblage similarity were also observed between biofilm from rural streams (Ngakaroa and Kumeu), and between biofilm from an exotic forest stream (Riverhead) compared to many other sites. Samples from native forest streams tended to group with those from most urban streams according to MDS ordination, suggesting relatively similar functional gene assemblages in these streams despite contrasting environmental characteristics (Fig 1a). Samples from rural streams were most scattered, and samples from exotic forest streams were separated from most other samples, suggesting divergent functional gene assemblages in these streams. 
When biofilm samples were compared on the basis of relative proportions of 65 nutrition and energy metabolism gene families, Bray-Curtis similarity between most samples ranged from about 80% to over 90%, with the highest similarity again between Lucas, Omaru and Cascade samples, but markedly lower similarity between Hoteo and Oakley biofilm compared to other samples (62.1%-68.7%). Exclusion of these two samples as outliers resulted in a relatively clear separation of native forest and urban stream samples from exotic forest and rural stream samples (Fig 1b). ANOSIM analysis suggested there were significant differences between functional gene composition of samples from exotic forest and urban streams, between rural and urban streams, and between native forest and exotic forest streams (Table 3). There was no evidence of significant differences between native forest and urban stream samples, or between rural and exotic stream samples.
The primary (horizontal) axis of the MDS ordination shown in Fig 1b showed a strong positive correlation (0.77, p = 0.001) with total in-stream N (S2 Table). The secondary (vertical) axis of Fig 1b was

Carbon cycling genes
A total of 2,695 C cycling genes were detected among all biofilm samples, most of which were associated with autotrophic C fixation (22.4%) and various C degradation pathways (72.2%). The close grouping of most samples in an MDS ordination indicated that most samples had relatively similar proportions of C cycling gene families regardless of catchment type (Figs 2a and 3, S2 Fig). Samples from an urban stream and two rural waterways were widely scattered, however, suggesting that these streams had divergent proportions of carbon cycling gene families. SIMPER analysis indicated that variation in proportions of C degradation and autotrophic C fixation pathway genes contributed most to these differences. In particular, samples from both the urban stream (Oakley) and a rural river (Hoteo) had much lower proportions of glyoxylate pathway genes than biofilm from all other streams (Fig 3).
The rural Hoteo river also had the lowest proportion of C fixation genes but the highest proportions of phenol oxidase, arabinofuranosidase, and starch degradation genes. A large number of acetogenesis, methane metabolism, and C degradation genes (other than glyoxylate pathway and aromatics degradation genes) were only detected in biofilm from the urban stream (Oakley). Several C cycling gene families had contrasting average proportions in native forest stream samples compared to exotic forest stream samples. For example, the proportions of PCC carbon fixation genes, CDH genes for cellulose degradation, xylA genes for hemicellulose degradation, and lcc (phenol oxidase) genes for lignin degradation were higher in exotic forest stream samples than native forest stream samples (S2 Fig). Conversely, the proportion of acetylglucosaminidase genes (chitin degradation) was higher in native forest stream samples than in exotic forest stream samples.

Nitrogen cycling genes
There were 1,606 nitrogen cycle-related genes detected in stream biofilm samples, most of which were dinitrogen reductase (nifH) genes for nitrogen fixation (561 in total), and various denitrification genes (641 in total). ANOSIM analysis found clear evidence of differences between proportions of N cycling genes detected in samples from exotic forest and native forest catchments, exotic forest and urban catchments, rural and urban catchments, and native forest and rural catchments (Table 3). Similarly, MDS ordination of samples based on proportions of N cycling genes showed a distinct cluster of urban and native forest samples, and a looser group of exotic forest and most rural samples (Fig 2b). Samples from streams with the most divergent proportions of C cycling genes (Hoteo, Ngakaroa, and Oakley) had comparatively more similar N cycling genes. SIMPER analysis indicated that differing proportions of nifH and denitrification genes contributed the most to the differences between catchment types. The average proportion of nifH genes in native forest and urban stream samples was 46% and 37% respectively, compared with 33% in rural samples and 25% in exotic forest samples (Fig 4 and S3 Fig). Conversely, the average total proportion of five denitrification genes was 29% and 36% respectively in native forest samples and urban samples, and 39% in both exotic forest and rural samples. Of the denitrification genes, rural stream samples had the highest (9.7%) and native forest stream samples the lowest (4.7%) average proportions of nirS genes (Fig 4 and S3 Fig). Exotic forest stream samples had relatively more norB genes than native forest and urban streams, and the highest proportion of nosZ genes (7%). The proportion of ammonification gene ureC was highest in exotic forest stream samples (12.3%) and lowest in native forest samples (5.6%).
Exotic forest samples also had the highest average proportions of assimilatory (nirA) and dissimilatory (napA and nrfA) nitrogen reduction genes.

Gene family abbreviations: aclB, ATP citrate lyase; CODH, carbon monoxide dehydrogenase; PCC, propionyl-CoA/acetyl-CoA carboxylase; rubisco, ribulose-1,5-bisphosphate carboxylase/oxygenase; FTHFS, formyltetrahydrofolate synthetase; mcrA, methyl coenzyme M reductase; pmoA, particulate methane monooxygenase; mmoX, methane monooxygenase; aceA, isocitrate lyase; aceB, malate synthase; amyA, alpha-amylase; amyX, amylopullulanase; apu, amylopullulanase; cda, cyclomaltodextrin dextrin-hydrolase; gluco., glucoamylase; isop., isopullulanase; nplT, neopullulanase; pulA, pullulanase; b-ara., bacterial arabinofuranosidase; f-ara., fungal arabinofuranosidase; mann., mannanase; xylA, xylose isomerase; xylan., xylanase; CDH, cellobiose dehydrogenase; cello., cellobiase; endogl., endoglucanase; exogl., exoglucanase; pect., pectinase; assA, alkylsuccinate synthase; limEH, limonene epoxide hydrolase; lmo, limonene monooxygenase; vanA, vanillate monooxygenase; vdh, vanillin dehydrogenase; acetylgl., acetylglucosaminidase; endoch., endochitinase; exoch., exochitinase; glx, glyoxal oxidase; lip, lignin peroxidase; mnp, manganese peroxidase; lcc, phenol oxidase.

Sulphur, phosphorus and energy metabolism genes
There were 582 S cycling genes, belonging to four gene families, detected in biofilm samples. Most of these were genes for sulphite reductases (dsrA and dsrB), while sulphite oxidase (sox) and dissimilatory adenosine-5'-phosphosulphate reductase (aprA) genes occurred at lower frequencies. The proportions of S cycling genes were different in native forest and urban samples compared to exotic forest and rural samples (Table 3).
This was attributed mainly to contrasting proportions of sulphite reductase genes dsrA and dsrB, with the former accounting for 64% and 48% of S cycling genes in native forest and urban stream samples, compared to 40% in rural stream samples and just 9% in exotic forest stream samples. Conversely, dsrB genes accounted for 12% and 18% of S cycling genes respectively in native forest and urban stream samples, compared to 21% and 32% in rural and exotic forest stream samples (Fig 5 and S4 Fig). Exotic forest stream samples had a higher average proportion of dissimilatory adenosine-5'-phosphosulphate reductase (aprA) genes (34%) than other stream types (11-19%). Exotic forest streams also had the highest average proportion of sulphite oxidase (sox) genes (26%) and native forest stream samples had the lowest (12%).
A total of 247 P cycling genes (in three gene families) and 241 energy metabolism genes (in two gene families) were detected in biofilm samples. Most of the P cycling genes (about 71%) were for hydrolysis of inorganic polyphosphate (ppx), followed by genes for polyphosphate biosynthesis (ppk, 27%), and a very low number of phytase genes. On average 65% of the energy metabolism genes detected in biofilm samples were c-type cytochromes and 35% were hydrogenases. There was no evidence of differences in P cycling or energy metabolism gene composition between different types of stream (S4 Fig).

Taxonomic composition of biofilm assemblages
Most of the nutrient cycling/energy metabolism genes detected in biofilm samples were of bacterial origin (85%, 22 phyla and 39 classes), with smaller proportions of archaeal genes (4%, 2 phyla and 6 classes) and eukaryote genes (10%, 3 phyla and 13 classes, mainly fungi) (Table 4). Fewer taxa were present among the 638 gyrB phylogenetic marker genes (15 phyla and 27 classes of Bacteria, and 1 phylum and 4 classes of Archaea; Table 4). Genes from Proteobacteria and unclassified bacteria were most common among nutrient/energy genes, followed by Actinobacteria, Firmicutes, and Ascomycota (S3 Table). There was a higher proportion of Proteobacteria and Tenericutes among gyrB genes compared to nutrient/energy genes, and a lower proportion of unclassified Bacteria. Oakley Creek biofilm (urban) had the highest proportion of gyrB genes, including several taxa that were undetected in most other samples (such as Chlamydiae, Chlorobia, Chloroflexi, and several classes of Cyanobacteria and Archaea). As for nutrient cycling/energy metabolism genes, about half (388) of gyrB genes were detected in only one sample, while just 25 gyrB genes (from 18 different genera) were detected in ≥ 15 samples.
An MDS ordination based on the proportions of genera represented among gyrB genes (Fig 2c) was very similar to that based on all nutrient cycling/energy metabolism genes (Fig 1a). Comparisons of Bray-Curtis sample similarity matrices based on proportions of nutrition/energy gene families with the similarity matrix based on gyrB genera using RELATE indicated that there were statistically significant matches for most gene categories. The similarity matrix based on C cycling gene families was most similar to the gyrB-based similarity matrix (ρ = 0.655, p = 0.001), followed by N cycling gene families (ρ = 0.418, p = 0.004), S cycling gene families (ρ = 0.321, p = 0.01), and energy metabolism gene families (ρ = 0.413, p = 0.004), suggesting a link between taxonomic composition and functional gene composition within these categories. The match with a similarity matrix based on P cycling gene families was not significant.
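A Mantel-type comparison of this kind can be sketched as a Spearman rank correlation between the off-diagonal elements of two similarity matrices, with significance assessed by permuting the sample order of one matrix. The Python sketch below is a simplified illustration, not the Primer 6 RELATE routine; the two matrices, the number of permutations and the random seed are hypothetical choices.

```python
import math
import random
from itertools import combinations

def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def upper_triangle(matrix, order):
    """Off-diagonal similarities with samples taken in the given order."""
    return [matrix[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]

def relate(matrix_a, matrix_b, permutations=999, seed=0):
    """Return (rho, p) for a Mantel-type rank comparison of two matrices."""
    identity = list(range(len(matrix_a)))
    a = upper_triangle(matrix_a, identity)
    rho = spearman(a, upper_triangle(matrix_b, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if spearman(a, upper_triangle(matrix_b, order)) >= rho - 1e-12:
            hits += 1
    return rho, (hits + 1) / (permutations + 1)

# Hypothetical Bray-Curtis similarity matrices (%) for four streams
gyrb_similarity = [[100, 80, 40, 30], [80, 100, 50, 20], [40, 50, 100, 60], [30, 20, 60, 100]]
c_cycling_similarity = [[100, 90, 70, 65], [90, 100, 75, 60], [70, 75, 100, 80], [65, 60, 80, 100]]

rho, p_value = relate(gyrb_similarity, c_cycling_similarity)
print(f"rho = {rho:.3f}, p = {p_value:.3f}")
```

Because the comparison uses ranks, two matrices match perfectly whenever they order the sample pairs identically, even if the absolute similarity values differ.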

Discussion
A growing number of studies have considered the taxonomic composition of microbial communities in aquatic environments, enabled by advances in high-throughput molecular analysis techniques [28,[36][37][38]. The ecologically-relevant functional composition of microbial communities is less well understood, but improving knowledge of these aspects is important for increasing our understanding of the biogeochemical processes which maintain the biosphere. Our results demonstrate the presence of a diverse range of nutrient cycling and energy metabolism genes in stream biofilms, consistent with the possibility that these communities contribute significantly to key biogeochemical processes in streams.
This study included streams representing a wide variety of ecological states, from near-pristine waterways in forested catchments to highly modified and degraded waterways in highly urbanized catchments, with documented differences in nutrient and pollution levels and shading (Table 1 and S1 Table). In light of these environmental differences, the observed broad similarity of biofilm functional gene composition among different streams is perhaps surprising. Similar functional gene assemblages have been observed in GeoChip-based investigations of uranium-contaminated groundwater [39], healthy and diseased corals [40], and acid mine drainage [41]. This suggests that natural microbial communities may develop broadly similar functional gene composition and metabolic potential despite varied environmental conditions, and that environmental degradation may have only a limited impact upon microbial metabolic potential. The finding that most genes from specific taxa were detected only in one or a few samples suggests either a high degree of taxonomic divergence between streams or, more probably, that our analysis has sampled only a very small proportion of the total microbial functional gene diversity present in stream biofilms. Nonetheless, that most gene families were present in most samples suggests the existence of functional gene redundancy among stream biofilm communities. This may indicate that biofilm communities include a reservoir of microbes with the potential to restore or repair perturbed ecological processes which may arise in degraded streams. Microbial community composition can be influenced by environmental conditions [28,42], and changes in microbial community composition may affect ecological functions [21,22], such as C degradation [43] and denitrification [18]. However, other studies report no links between stream microbial community composition and the potential activity of a suite of C-, N- and P-related enzymes [44].
A fine-grained examination of our results revealed that although overall biofilm functional gene composition was similar among streams, differences were evident in the relative abundance and occurrence of specific functional genes, clearly indicating potential for enzymatic functions to vary in biofilms from different streams. Of course, the presence of functional genes does not necessarily mean that those genes are actively expressed or contributing to biogeochemical processes. To confirm whether this is the case, it may be necessary to collect parallel data on gene occurrence, gene expression and enzymatic activity in stream biofilms at different time points.
Factors previously identified as influencing stream benthic bacterial community composition include pH and conductivity [45], carbon and nutrients [46], and broad-scale land use [28]. This is consistent with our results, in which in-stream N concentration and pH, along with water temperature, shade and catchment land use, were correlated with the similarity of functional gene assemblages according to MDS ordination (Fig 1b and S2 Table). Our results suggest that the reduced N levels, increased shade and reduced water temperatures observed in exotic forest streams compared to urban streams contribute to the functional gene assemblage differences observed between these types of streams (Fig 1b). Functional gene assemblages in urban streams with moderate levels of N, shade and temperature (Lucas, Otara, Oteha) appear to have intermediate similarity to biofilm from exotic forest streams, while samples from the urban streams with the least shade, the highest temperatures and the lowest water quality (Omaru, Pakuranga and Puhinui) have more divergent functional gene assemblages. This suggests increasing levels of degradation may cause increasing divergence of biofilm functional gene composition from that observed in non-degraded stream ecosystems.
Three rural streams with partially forested catchments (Okura, Makarau, and Rangitopuni) also seem to have relatively similar biofilm functional gene assemblages to exotic forest streams, but biofilm samples from the native forest streams do not follow this pattern. In particular, both the functional gene composition and gyrB taxonomic composition of biofilm from Cascade Stream, the most pristine environment in this study, were similar to those of biofilm from urban streams. A previous investigation found that bacterial biofilm community composition in Cascade Stream was most similar to that in urban streams including considerably degraded Omaru Stream [42], which is consistent with the present study. Cascade Stream substrates have copper concentrations that are higher than in other forested and most rural streams and comparable to many degraded urban streams [47]. Copper and other metals accumulate in stream biofilms [12] and have significant effects on biofilm communities [12,48] and microbial gene composition [49]. This factor may therefore contribute to the high similarity observed between Cascade Stream and urban stream biofilms. The source of copper in Cascade Stream is unclear, but may be linked to the underlying volcanic geology of this catchment [50]. However, the functional gene composition of biofilm from urban Lucas Creek appears very similar to that in Cascade Stream biofilm, yet Lucas Creek has low levels of copper. It thus appears that biofilm bacterial communities and functional genes may be driven towards similar composition by (or despite) many contrasting environmental factors present in urban and native forest streams.
While differences between exotic forest and urban stream functional gene assemblages can be readily ascribed to obvious vegetation cover and built environment differences, the factors that distinguish exotic forest catchments from native forest catchments are more subtle. Clear-felling and replanting of pine forest may cause drastic disturbances such as reduced surface water retention, elevated sediment loads entering streams, loss of leaf litter inputs and increased sunlight exposure, potentially altering the balance of biogeochemical cycles within streams on a multi-decadal basis. A variety of differences have been observed between streams in catchments planted with pine forest compared with streams in native broadleaf forest catchments, including increased levels of sediment, turbidity, dissolved organic C, N, and woody debris in pine forest streams [51], and altered benthic invertebrate communities and in-stream decomposition dynamics [52]. Pine needles are tough and less easily retained in waterways, with protective biochemical compounds and high C:N ratios, thus representing a less labile and lower quality nutrient resource in streams than litter from typical broadleaf species [53,54]. Additionally, benthic microbial enzyme activity may vary between native forest and pine forest streams, presumably in response to altered dissolved organic C supply [8]. This suggests that differences in functional gene potential between pine forest and native forest streams are likely, and this is supported by our results.
Biofilm from rural streams showed the most variation in number and composition of functional genes detected (Table 2 and Fig 1). This may be related to the mix of land use activities represented within rural catchments. While pastoral agricultural activity has well-established and long-acting impacts upon rural waterways [2], rural catchments also contain varying proportions of horticulture, native forest and exotic forest, with minor proportions of urban development (Table 1). This variation in rural land use patterns is likely to result in different inputs to rural streams and varied responses in rural stream biofilm communities. For example, streams in rural catchments with the highest proportion of forest cover had functional gene assemblages that were most similar to those detected in exotic forest streams. Rural Makarau Stream has one of the lowest in-stream N levels in this study, whereas rural Ngakaroa Stream has minimal forest cover but the highest proportion of horticultural land use in its catchment, and the highest in-stream N concentration (S1 Table). These contrasts are reflected by apparent functional gene differences between these streams (Figs 1b and 2b).
GeoChip analysis provides novel information about complex microbial communities and their potential functions, and represents a useful tool for investigating the dynamics of the microorganisms which underpin natural biogeochemical processes and cycles. Evidently, the analysis of genes using GeoChip is limited to the probes present on the chip, which is in turn constrained by the (steadily increasing) availability of robust sequence data. The number and range of probes present on the GeoChip has progressively increased on successive versions [24,55,56]. It nevertheless seems likely that this study has only "scratched the surface" of microbial functional gene diversity, despite the extent of data provided by GeoChip analysis. This is consistent with our observation that most individual genes were detected only in one or a few samples.
The observed patterns of functional gene composition and relative gene abundance suggest the existence of functional gene redundancy in stream biofilms, as well as land-use-related differences in stream microbial biofilm functional gene composition. This research raises many questions for further investigation. In particular, it is important to determine the degree to which functional gene expression and associated ecological processes in stream biofilms match the patterns of functional gene composition observed in this study. Our results also suggest hypotheses for further investigation regarding the occurrence and activity of particular functional genes or groups in relation to environmental parameters and taxonomic composition. As the costs of high-throughput sequencing steadily reduce, GeoChip analysis may provide a basis for metagenomic investigations of microbial biogeochemical functional genes of interest. Such investigations have the potential to provide insights into the regulation of biogeochemical cycling within streams, helping to clarify the role of microbial biofilm communities in maintaining the health and function of stream ecosystems.