Bacterial community and arsenic functional genes diversity in arsenic contaminated soils from different geographic locations

To understand how soil microbial communities and arsenic (As) functional genes respond to soil arsenic (As) contamination, five soils contaminated with As at different levels were collected from diverse geographic locations, incubated for 54 days under flooded conditions, and examined by both MiSeq sequencing of 16S rRNA gene amplicons and functional gene microarray (GeoChip 4.0). The results showed that both bacterial community structure and As functional gene structure differed among geographical locations. The diversity of As functional genes correlated positively with the diversity of 16S rRNA genes (P< 0.05). Higher diversities of As functional genes and 16S rRNA genes were observed in the soils with higher available As. Soil pH, phosphate-extractable As, and amorphous Fe content were the most important factors in shaping the bacterial community structure and As transformation functional genes. Geographic location was also important in controlling both the bacterial community and As transformation functional potential. These findings provide insights into the variation of As transformation functional genes in soils contaminated with different levels of As at different geographic locations, and the impact of environmental As contamination on the soil bacterial community.


Introduction
Arsenic (As) is a naturally occurring metalloid element present in virtually all environmental media and is a well-known carcinogen even at low levels [1,2]. Anthropogenic activities, such as mining and smelting, use of As-containing agrochemicals, and contamination of some water supplies with As-laden groundwater have led to serious environmental pollution and public health problems in some parts of the world. The worst affected areas are in South and Collection and characterization of soil samples Five soils from diverse geographic locations were used in this study: two paddy soil samples from Bangladesh (Faridpur and Sonagaon, designated as B1, B2, respectively) that had been contaminated with As by irrigation of groundwater, two paddy soils collected from south-central China (Chenzhou and Qiyang, named as C1, C2, respectively), which had high concentrations of As due to nearby mining activities (C1) or geogenic weathering (C2), and one upland arable soil from the United Kingdom (Rothamsted, southeast England, referred to as UK), which had an elevated As concentration, probably due to a relatively high geogenic background. Soils were collected from the plow layer (0-20 cm). A single bulk sample of approximately 100 kg was taken from each site after the crop had been harvested and paddy water drained. For the incubation experiment, sub-samples were taken from the bulk of 100 kg soil, which had been thoroughly homogenized. In addition, three replicates were included for each soil. Fresh soils were collected from the fields and transported to our lab by a carrier. The soils were air-dried at ambient room temperature (~25˚C) and sieved through a 6-mm sieve for soil property analysis. Triplicate samples were created for incubation, and the details had been described by Zhao et al. [24], in which the relationships between As methylation in soil, soil properties, and the abundance and diversity of microbial arsM genes were discussed. The cropping system for the sites in Bangladesh and China was double rice crops per year, with flooding during the rice growing season. The cropping system for the UK soil was a rotation of wheat and oilseed rape without irrigation. Soil properties, including soil pH (using a soil-to-water ratio of 1:1), soil organic carbon (SOC, measured with the dichromate oxidization method), available phosphorus (by 0.5M NaHCO 3 ), soil amorphous Fe (by ammonium oxalate oxalic acid extraction), soil total As (by aqua-regia digestion) and soil phosphate-extractable As (by 0.05 M NH 4 H 2 PO 4 extraction), were determined as described previously [25] (Table A in S1 File). To induce reducing conditions in the soils, triplicate 500 g soil samples were placed in plastic pots and flooded with deionized water to maintain a 2-cm layer of standing water above the soil surface. The soils were incubated at room temperature (20-25˚C) for 30 days, and the concentrations of As species (arsenate, arsenite, monomethylarsonic acid (MMA), and dimethylarsinic acid (DMA)) in the soil pore water were monitored as described by Zhao et al. [24]. Twenty grams of soil from the center of each incubation pot were collected and stored at -80˚C for later DNA extraction.

DNA extraction
Triplicate 5 g soil samples were used for total soil DNA extraction with a freeze grinding, SDS cell lysis method as described previously [26]. The crude DNA was purified using a low melting agarose gel, followed by phenol-chloroform-butanol extraction. DNA quality was assessed based on the spectrometry absorbance ratios at 260/280 nm and 260/230 nm using a Nano-Drop ND-1000 Spectrophotometer (NanoDrop Technologies Inc., Wilmington, DE). DNA concentration was measured by Pico Green using a FLUOstar OPTIMA fluorescence plate reader (BMG LABTECH, Jena, Germany). DNA samples were diluted to 2 ng /μl and stored at -80˚C for further sequencing analysis.

GeoChip analysis
GeoChip 4.0, a microbial functional gene array, synthesized by NimbleGen (Madison, Wisconsin, USA) in their 12-plex format [23] was used to detect the As related functional genes within the bacterial communities of the five soils. The purified soil DNA (1 μg) was labeled with Cy-3 dye and then hybridized at 42˚C for 16 h on a Hybridization Station (MAUI, BioMicro Systems, Salt Lake City, UT) [23]. Low quality spots (signal intensity < 1000) were removed prior to statistical analysis as described previously [23]. Spots were scored as positive if the signal-to-noise ratio (SNR) was ! 2.0 and the coefficient of variation (CV) of the background was < 0.8. Genes that were detected in only one sample were also removed. The signal intensity of each spot was normalized by the mean intensity of the slide. The quantified microarray data were preprocessed using the microarray analysis pipeline on our website (http://ieg. ou.edu/microarray/) as described previously [27] and stored on our lab ftp server (ftp://129.15. 40.240:8187/yunfu_gu/GeoChip_data_gu.xlsx).

Amplification and MiSeq sequencing of 16S rRNA gene amplicons
The primer pair 515F (5'-GTGCCAGCMGCCGCGGTAA-3') and 806R (5'-GGACTACHVG GGT-WTCTAAT-3'), combined with an adapter sequence and barcode sequences for sample differentiation, were used to amplify the V4 region of the 16S rRNA gene [28]. The PCR reaction mixture consisted of 5 μl of 10 × AccuPrime PCR buffer II (INVITROGEN), 1 μl of each primer (10 μM), 5 μl template DNA and 0.2 μl AccuPrime High Fidelity Taq Polymerase, and 38.8 μl sterile water added to a final volume of 50 μl. Three replicates of amplification were performed for each sample. The PCR procedure was as follows: initial denaturation at 94˚C for 1 min, followed by 32 cycles of 94˚C for 20 s, 53˚C for 25 s, and 68˚C for 45 s, with final extension at 68˚C for 10 min. PCR products (3 μl) were examined by 1% agarose gel electrophoresis, and then 5 μl of triplicate reactions were combined and quantified with PicoGreen. About 200 ng PCR products from each sample were pooled and purified with a QIAquick PCR Purification Kit (QIAGEN), and then re-quantified with PicoGreen. Denaturation was performed by mixing 10 μl of combined PCR products (2 nM) and 10 μl 0.1 N NaOH. Denatured DNA was diluted to 6 pM and mixed with an equal volume of 6 pM PhiX library. Finally, the 600 μl mixture was loaded into the reagent cartridge and run on a MiSeq sequencer (Illumina) for 300 cycles.
After primers and spacers were trimmed, forward and reverse sequences were combined with Flash [29]. Low quality fragments were then removed and sequences shorter than 240 bp were also trimmed before they were subjected to Chimera detection and removal by U-Chime [30]. All the sequence data were stored in our lab website (http://ieg.ou.edu/4download). OTU classification was performed using UCLUST at a 97% similarity level. An RDP classifier pipeline (http://zhoulab5.rccc.ou.edu) was used for taxonomic assignment of each OTU. OTUs with only one sequence (singleton) were not included in downstream analysis.

Statistical analysis
The number of observed As functional genes and 16S rRNA gene species (Richness), Shannon index, and Evenness computed as the Hill's ratios between the Shannon indices and Richness were calculated to estimate within-sample diversity. The GeoChip hybridization intensity data were logarithmically transformed prior to statistical analysis. Data normality and variance homogeneity were analyzed using the Shapiro-Wilks test and the Levene's test, respectively. Pearson correlation analysis with a Tukey's post-hoc test was used to reveal the linkage between the alpha-diversity of the whole microbial community and that of the As functional genes. Principle coordinate analysis (PCoA) with Bray-Curtis distance was used to estimate the bacterial community composition and As functional gene structure heterogeneity by using the relative abundances of OTUs or normalized gene intensity. Mantel tests and PCoA with the envfit function methods were used to evaluate the linkages between soil microbial community composition, As functional gene structure and soil properties. Adonis, Analysis of similarities (ANOSIM) and multi-response permutation procedure (MRPP) were used to test for dissimilarities between any two of the five As contaminated soils based on the null hypothesis [31]. The contributions of soil properties and geographic locations to the variances observed within the microbial communities were assessed with variance partitioning analysis (VPA). Spatial variables measured as latitude-longitude coordinates were converted into projected coordinates and were represented by a cubictrend-surface polynomial to capture broad-scale spatial trends. All the above analyses were performed using functions in the VEGAN package (v.1.15-1) [32] in R v.3.0.2 [33].

Soil properties
The physicochemical properties that play important roles in mediating As transformation and microbial metabolism were analyzed in the five soils (Table A in S1 File) [24]. Both the soils collected from Bangladesh (B1 and B2) and the United Kingdom (UK) were silty clay loam, while the soils from China (C1 and C2) were clay. The soil sample with the highest pH was from Bangladesh (B1, pH 8.2), whereas the soil sample with the lowest pH was from the UK (pH 6.8). Total soil As and phosphate-extractable As ranged from 8.7 to 81.2 mg kg -1 and 0.5 to 7.9 mg kg -1 , respectively. C1 soil had the highest concentrations amounts of total As and phosphate extractable As, whereas B2 and UK soils had the lowest concentrations of total As and phosphate extractable As, respectively. The amorphous iron in the soils varied from 610 to 3, 835 mg kg -1 , with B1 and C2 soils having the lowest and highest concentration, respectively. The available phosphorus (AP) in the soil samples ranged from 5.9 to 15.2 mg kg -1 . Finally, soil organic carbon (SOC) concentrations were high in the two Chinese soils (22.3−46.6 g kg −1 ), but relatively low in the UK soil (9.6 g kg −1 ).

Composition of the soil microbial communities
After paired-end joining and quality trimming, 9,577 to 50,725 effective 16S rRNA gene sequences (average length of 253 bp) per sample were obtained from each replicate of soil (Table 1), resulting in a total of 396,027 sequences from all five soils. All samples were subjected to random re-sampling and rarefaction at 9,577 sequences. OTU analysis at a 97% similarity level resulted in 31,308 OTUs, of which 11,885 remained after singleton removal and were used in downstream analysis. PCoA of the 16S rRNA gene sequences was performed to determine the overall variation among bacterial communities ( Fig 1A). The results showed that bacterial community structures differed based on the geographical location. The PCoA plot explained 44.6% of the observed variation, with the first axis explaining 27.2% of the variations and separating the C1 and C2 soils from the B1, B2 and UK soils. The second axis explained 17.4% of the variation and separated the B1, B2 from UK soils. Results from the dissimilarity tests based on the Adonis, Anosim and MRPP algorithms indicated significant differences (P<0.01) between geographical locations as well (Table B in S1 File). 0.4 × 10 7 to 2.3 × 10 7 g −1 dry soils. Between 27 and 35 arsM gene targets were detected in the five soils. There are also 133 aoxB (aioB) gene probes in GeoChip 4.0, among which 74 (47 to 66 in each sample) were detected among the five soils. The arsM and aoxB gene targets were the most numerous in the soils from Bangladesh, whereas the UK soil had the lowest number of these two genes. Similar to bacterial communities, the PCoA results based on the GeoChip data showed that the five soils differed from each other (Fig 1B), indicating that the As functional communities varied based on geographical location, which was also verified by the dissimilarity tests using the Adonis, ANOSIM and MRPP algorithms ( Table B in S1 File). Both the numbers and structure of the As functional genes (ars, aoxB and arsM) in the paddy soils with high As concentrations were different from those in the non-paddy soils with low As concentrations. A total of 560 As functional genes were detected in the five soils. Among these, 357-517 were detected in paddy soil samples; whereas 311-342 were detected in the UK soil samples (Table C in S1 File). In total, 329 As functional genes including arsC (GI 144944757) derived from Geobacter bemidjiensis Bem, Sphingomonas sp. SKA58 (GI 94497023) and Acidovorax sp. JS42 (GI 120607444), and aoxB from Acidovorax sp. 75 (GI 162568505) were shared by all five soils. Moreover, 49 As functional genes were only observed in all of the paddy soils (B1, B2, C1 and C2) while 24 were detected in the Bangladeshi soils (B1 and B2) (Table C in S1 File). No unique As functional genes were detected in the UK soil (Table C in S1 File). Only two unique arsC genes including GI 168992628 derived from Lysinibacillus sphaericus C3-41 and GI 117610002 derived from Magnetococcus sp. MC-1 were detected in soil C1. In contrast, several As functional genes, such as arsC derived from Myxococcus xanthus DK 1622 (GI 108463091) and Burkholderia multivorans ATCC 17616 (GI 189348441), arsB derived from Sulfurovum sp. NBC37-1 (GI 152992971), arsA from Rhizobium leguminosarum bv. viciae 3841 (GI 116254241), arsM from Desulfotomaculum acetoxidans DSM 771 (GI 257779458), and aoxB from Chloroflexus sp. Y-400-fl (GI 222524598) were detected only in the Bangladeshi soils (Table C in S1 File). All types of the As functional genes detected in the UK soil with a low As content were also detected in at least one of the four paddy soils with high available As contents, suggesting that the UK soil contained a less-diverse subset of the As transformation genes. Finally, the Mantel test analyses showed that the structures of functional genes involved in arsenite oxidation (aoxB) and arsenite methylation (arsM) were positively and significantly correlated with those involved in arsenite efflux (arsA and arsB) and arsenate reduction (arsC), respectively (Table D in S1 File).

Soil microbial community diversity
To assess the internal (within soils) complexity of individual microbial populations, the mean values of OTU Richness (the observed OTUs number), Shannon-Wiener index (H) and Evenness were calculated based on the 16S rRNA gene data ( Table 1). The average Richness ranged from 1757 to 2966 across the five soils. The highest Richness was in soil B1 while the lowest was in the UK soil. Similar trends in Evenness and Shannon-Wiener index were also observed ( Table 1).
The As functional genes Richness (the observed gene species), Shannon-Wiener index (H) and Evenness were also evaluated (Table E in S1 File). These indices followed a trend similar as the 16S rRNA genes, although Evenness remained relatively constant across all samples. Pearson correlation analysis showed that the As functional gene Richness (the observed gene species), Shannon-Wiener index (H) and Evenness correlated positively with those of the 16S rRNA gene (P<0.05) (Fig A in S1 File). Fig 2 summarizes the relative bacterial community abundance at the phylum level for each soil. Proteobacteria was the dominant phylum for all samples, accounting for between 29% and 38% of detected OTUs. However, different soils had distinct subdominant phyla (phyla with greatest abundance other than Proteobacteria). For soil B1, Acidobacteria, Chloroflexi and Actinobacteria were the subdominant groups, comprising 24%, 10%, and 6.4%, respectively, of the sequences detected. Acidobacteria, Actinobacteria and Firmicutes were the subdominant groups for soil B2, comprising 25%, 9.7%, and 6.4%, respectively. The subdominant groups from soil C1 were Bacteroidetes, Acidobacteria and Firmicutes, comprising 13%, 9.8%, and 9.6%, respectively. In soil C2, Bacteroidetes, Firmicutes and Chloroflexi were the subdominant groups, comprising 15%, 13%, and 9.6%, respectively. For the UK soil, Bacteroidetes, Firmicutes and Planctomycetes were the subdominant groups, comprising 19%, 13%, and 6.7%, respectively. These phyla represented approximately 69% to 78% of bacteria detected within the five soils.

Changes in the bacterial community composition
The Alphaproteobacteria comprised 25 to 42% of the Proteobacteria, followed by Betaproteobacteria (27 to 37%), Deltaproteobacteria (23 to 27%), and Gammaproteobacteria (5 to14%) (Fig 3A). Within the Alphaproteobacteria, six taxa were identified. Rhizobiales was the dominant group (29 to 37%) for all 5 soils, followed by Sphingomonadales and Caulobacterales, representing 15 to 39% and 9 to 20% of each population, respectively. Alphaproteobacteria incertae sedis, Rhodobacterales and Rhodospirillales comprised 14 to 27% of the Alphaproteobacteria (Fig 3B). The relative abundance of Rhizobiales was the highest in UK soil with the lowest available As content, but was the lowest was in B1 soil with a high available As content. Table F in S1 File lists the orders of bacteria detected. A total of 76 orders were identified by the RDP classifier (Table F in S1 File). Of these, 53 (60 to 76% of the classified sequences) were shared by all soils, including Anaerolineales, Rhodocyclales, Sphingomonadales, Rhizobiales, Clostridiales, Burkholderiales, Myxococcales, Planctomycetales, Sphingobacteriales, and Anaerolineales. Two orders (Thermogemmatisporales, Herpetosiphonales) only appeared in soil B2 while others such as Erysipelotrichales, Methanomicrobiales, Chlamydiales, Rubrobacterales, and Desulfarculales, were only detected in the paddy soils (B1, B2, C1 and C2). At the family level, a total of 174 families were obtained (Table G in S1 File). One hundred and twelve families (occupying 96 to 98% of the classified sequences), including Anaerolineaceae, Chitinophagaceae, Planctomycetaceae, Sphingomonadaceae, Gemmatimonadaceae, Cystobacteraceae, Clostridiaceae 1, Geobacteraceae, Rhodocyclaceae, Hyphomicrobiaceae, Comamonadaceae, Oxalobacteraceae, Caulobacteraceae, were shared by all soils. There were 17 families that appeared in only one soil and accounted for less than 1% of the classified sequences. Among those families, 15 were observed in the paddy soils while two were in the UK soil.
The abundance of the 544 genera detected is summarized in Table H in S1 File. Among the assigned 544 genera, 224 were shared by all soils and accounted for 52 to 66% of the classified sequences. There were 68 rare genera that were observed in only one sample, which accounted for < 0.5% of the total classified genera sequences in each sample. Among the rare genera, 60 were only observed in the four paddy soils while 8 were only in the non-paddy soil. The 10 most abundant genera in each soil (a total of 27 genera for all 5 soils) were compared across the five soils (Fig 4). Eight genera were abundant (> 0.5%) in all five soils, including Gp6, Phenylobacterium, Acidovorax, Geobacter, Subdivision3 genera incertae sedis, Anaeromyxobacter, Sphingomonas, and Flavisolibacter.

Relationships between bacterial communities, As potential function and environmental parameters
All soil physical-chemical parameters correlated significantly with the soil bacterial communities at the taxonomic level based on the PCoA data (P < 0.05) (Table I in S1 File). Moreover, the Mantel test analyses showed that the soil bacterial taxonomic structure correlated significantly with soil pH (P = 0.039) and soil phosphate extractable As (P = 0.001) ( Table J in  https://doi.org/10.1371/journal.pone.0176696.g003 S1 File). For As functional genes, PCoA data showed that soil pH was the most significantly correlated variable with soil microbial communities at the functional gene level (Table I in S1 File). In addition, the Mantel test analyses further revealed that the bacterial As functional gene structure correlated significantly with soil pH (P = 0.006), soil phosphate extractable As (P = 0.008) and soil amorphous Fe (P = 0.022) (Table J in S1 File).
Variance partitioning analysis (VPA) was used to further evaluate the contribution of selected soil parameters and geographic location to the bacterial community composition and As functional gene structure (Fig 5). Based on the 16S rRNA gene MiSeq data, a total of 95% of the variation was explained by the soil parameters and geographic location (Fig 5a), which could independently explained 33.0% and 37.0% of the variation of bacterial communities, respectively. Interactions between these two components explained another 25% of the variation. Only 5% of the community variance could not be explained by these two components and their interactions. For GeoChip analyses, a total of 72% of the variation was explained by the soil parameters and geographic location (Fig 5b), which independently explained 16% and 13%, respectively, of the total variations observed. Interactions between these two factors (43.3%) seemed to have more of an effect than the individual factors. About 28% of the community variation could not be explained by these two environmental variables.

Discussion
It is well known that the soil type and texture significantly influences As mobility and toxicity. In regions with soils derived from the same parent material, soil texture is often the dominant soil feature affecting background levels of As [34]. Fine-textured clay soils tend to result in a lower As mobility compared to coarse textured sandy soils due to a higher sorptive capacity [35,36]. The anaerobic conditions of flooded paddy soils favor As mobilization because of the reductive dissolution of iron oxyhydroxides and the reduction of arsenate to arsenite. The Microbial communities and arsenic functional genes diversity paddy soils examined in this study had a higher pH, soil organic matter, soil total As and soil phosphate-extractable As than the upland soil from the UK. The paddy soils from Bangladesh are a silty clay loam while those from China were clay. The concentrations of soil organic carbon, soil total As and phosphate-extractable As were higher in the Bangladeshi soils than those in Chinese soils.
The higher diversity of the As related functional genes (e.g. arsC,arsB, arsA, arsM and aoxB) in the soils with higher As concentrations in the current study was not unexpected, because high levels of As may exert a strong selective pressure leading to increased diversity of Asrelated genes. Moreover, studies of microbial communities using similar GeoChip microarray or metagenomics indicated an overabundance of metal resistance genes in metal-rich environments [37,38]. Poirel et al. found strong positive correlations between changes in arsB gene abundance and As levels using real-time PCR [22]. Significant correlations were also observed in studying the response of bacterial communities to the chronic chromium and arsenic contamination in Pakistan using pyrosequencing and quantitative PCR [39]. In addition, in the soils studied, other soil properties such as pH represent additional stressors for microbial communities that may impact their diversity. Metagenomics studies also showed that microbialmediated As metabolic processes in paddy soils correlated significantly with pH [40]. Zhang et al. [41] revealed a strong correlation between arsC and arsM gene copies based on quantitative PCR, and suggested that similar compositions of microbial communities were involved in As(V) reduction and As(III) methylation in paddy soils. Previous studies revealed that the As (V) reduction, As(III) oxidation, and As(III) methylation processes co-exist and are closely related in microbes in paddy soils [41][42][43]. Similarly, significant correlations between the structures of the ars, aioB, and arsM genes were also observed in this study, suggesting that these different As transformation systems co-exist and are closely related in microbes in both paddy and upland soil.
The alpha-diversity of 16S rRNA genes were higher in the soils with a high level of phosphate-extractable As than in those with a low level of phosphate-extractable As. The higher diversity and abundance were observed at each taxonomic level examined. This was probably due to an evolutionary adaptation of some specific bacterial species to the As contamination stress [44]. Cai et al. (2009) reported that more bacterial species isolated from a long term high-level arsenic contamination site than from intermediate and low levels of arsenic contamination sites [18]. Turpeinen et al. (2004) also found that the diversity of arsenic-resistant bacteria in higher arsenic-, chromium-and copper-contaminated soil was higher than that in less contaminated soil [45]. The higher alpha-diversity of the soil microbial community in the more As contaminated soil suggests that many microbial species could cope with the long term As stress. Genus like Steroidobacter had higher relative abundance in the heavy metal contaminated soils [46]. Similarly, in our study, the relative abundance of the genus Steroidobacter in the soil contaminated with more As was also higher than that in the lower As containing soil (Fig 4). The presence of these bacteria suggests that they may play an important role in As contaminated due to their high adaptability to extreme environments [47], providing a stabilizing effect for the soil functions.
All the soils were dominated by Proteobacteria. Previous studies chromium and As contaminated studies showed that Proteobacteria are frequently present in metal contaminated sites and are capable of metal transformation. Odum proposed that r-selected organisms (rapidly reproducing) [48], such as Proteobacteria [49], could be stimulated after a stressor is applied to an ecosystem, which would explain their dominance. Within Proteobacteria, the alpha and beta subdivision were the predominant groups, which was in line with the study results by Han [50]. Bacteria belonging to the Alphaproteobacteria were identified as the major microbial communities in contaminated soils while the Betaproteobacteria would disappear in soils under high As stress. In our study, other highly abundant phyla included Acidobacteria and Bacteroidetes, members of which are broadly represented in arsenic contaminated soils and sediments and have similar ecological distributions [51][52][53]. Previous studies revealed that lineages of Acidobacteria and Bacteroidetes are significantly regulated by soil parameters such as pH and organic matter, respectively [51,53]. Other soil properties such as C/N, NO 3 and the availability of trace elements can also influence microbial community composition in arsenic contaminated soils [20,54]. Similarly, significant correlations between the abundance of these two phyla and soil pH were also observed in present study. Several bacterial lineages that play critical roles in soil As transformation were observed in the four paddy soils (e.g. B1, B2, C1 and C2) with relatively high level of available As. These included bacteria within the orders Erysipelotrichales, Methanomicrobiales, Chlamydiales, Rubrobacterales, and Desulfarculales, which are typically associated with the anaerobic conditions found in paddy soils and play important roles in As-methylation and volatilization [54]. In addition, bacteria from the families Rhodothermaceae, Methanobacteriaceae, Alicyclobacillaceae are critical to the natural arsenic transformation [55][56][57]. Several genera, such as Thiobacter, Cytophaga, Nitrosomonas, Desulfomonile, Defluviicoccus, Azotobacter, Desulfatirhabdium, and Anaeroarcus are widely distributed in arsenic contaminated soils. Thiobacter can gain energy from the oxidation of arsenic and are ubiquitous in arsenic-contaminated environments [58]. Cytophaga-Flavobacterium related groups capable of methylating As have been isolated from various arsenic-rich soils and sediments [59]. Desulfomonile tiedjei, a sulfatereducing bacteria, is also widely distributed in arsenic-rich acid mine drainage [60]. Taken together, these results indicate that the paddy soils with higher available As content harbored more arsenic transforming bacterial species than the upland soil with a low level of available As.
The 16S rRNA gene sequencing data were found to be generally in line with the functional gene in the present study. For example, the arsC gene (GI 144944757) retrieved from the G. bemidjiensis Bem was detected in all the five soils, whilst 16S rRNA gene sequencing analysis also revealed the presence of Geobacter. Geobacter species, members of the order Desulfuromonadales, are able to mediate high rates of arsenite release in various As-contaminated environments [61]. Moreover, several arsC genes derived from the members of genus Sphingomonas were detected in all the five soil, whilst the presence of Sphingomonas was confirmed or by 16S rRNA gene sequence data. Besides, the genera Acidovorax and related members, which are aerobic heterotrophs and often resistant to arsenic [62], were detected in all five soils based on the16S rRNA gene MiSeq sequencing and GeoChip analysis. Rhodobacter species are photosynthetic with the capability of As reduction [63]. Hassan et al (2015) detected the Rhodobacter-like aioA sequences in the arsenic-contaminated anaerobic groundwater in Bangladesh, but no Rhodobacter 16S rRNA gene sequences were observed [61]. In contrast, in our study, both Rhodobacter-like arsC and Rhodobacter 16S rRNA gene sequences were detected in all five soils. Interestingly, Cytophaga hutchinsonii, member of the genus Cytophaga, in which the arsM and arsC genes were both detected, were found only in the soils with high available As (e.g. B1, B2, C1, and C2). This result also matched the 16S rRNA gene sequencing data.
Understanding the environmental variables that affect the microbial community structure is a key goal in microbial ecology [64]. Soil geochemical variables like soil As, soil phosphorus, soil organic matter, soil nitrate concentration, etc. are vital factors in shaping microbial communities [16,24]. In our study, soil pH, soil available As, and soil amorphous Fe (free iron oxides) were found to be the most important variables. Several studies have found these and similar variables to be important as well. Sheik et al (2012) found that the structure, diversity and abundance of microbial communities were highly influenced by the presence and concentration of As [39]. Similar results have also been obtained for rhizosphere and non-rhizosphere communities of the As-hyperaccumulator Pteris vittata [19]. Similar results were also obtained by Hornstrom [65]. Previous studies had revealed that soil pH is a key factor in shaping soil microbial diversity and composition since the intracellular pH of most microorganisms is usually within 1 pH unit of neutrality, and any significant deviation in environmental pH value would impose stress on microbes [66]. Iron (hydr) oxides have been reported to be an important abiotic factor in shaping the microbial community in As contaminated soils because these oxides provide the most important sorption phase for As [67]. Some arsenite tolerant, anaerobic ferrous iron-oxidizing bacteria such as Geobacter could produce highly crystalline ferric iron minerals that were only slowly reduced by iron-reducing bacteria and thus stimulate the permanent immobilization of arsenic [62,68]. Zhang et al (2015) reported that the concentration of total Fe plays a role in shaping the microbial communities involved in respiratory As (V) reduction [44]. Considering that the ferric iron and arsenate are electron acceptors for a range of anaerobic heterotrophic microbes, the positive relations between iron concentration and As release may indicate that iron-and arsenic-cycling microorganisms are widely presented and co-exist in paddy soils in anaerobic conditions [61].
Geographic location was also important in controlling both the bacterial community structure and functional gene structure, although overall, pH appeared to be a stronger driver ( Fig  5). Compared to the bacterial community, soil geochemical variables and geographic location explained a lower percentage of the variation in the As functional gene structure. This could be due to the soil parameters such as soil pH and As concentration would apply a selective pressure on the microbial community, which could increase the similarity between spatially isolated communities. Similar to the previous studies, the PCoA results in this study were in accordance with the dissimilarity analysis results, showing the soils from different geographic locations formed separate clusters. This pattern suggests a significant impact of local abiotic environmental conditions on the composition and structures of microbial communities. It should be noted that the sampling strategy and the relatively small number of samples analyzed in this study may have limited our ability to fully separate the effects of environmental variables and geographic locations. More replicate samples should be collected for study of the bacterial community structures in As-contaminated soils across different geographic locations.

Conclusions
In summary, both functional gene array (GeoChip 4.0) and next generation sequencing of 16S rRNA genes were applied to analyze the soil microbial communities and As functional gene communities from diverse geographic locations and different levels of As contamination. The study revealed that the bacterial taxonomic composition and As functional gene composition were often related in As contaminated soils from diverse geographic locations. Soil properties (i.e. soil pH, soil available As and soil amorphous Fe) as well as graphical locations were found to be significantly factors in shaping the soil microbial community composition and As-related functional gene structure. This study provides valuable insight into the variation of the abundance and diversity of the As functional gene structure and the bacterial community composition in As contaminated soils from diverse geographic locations, and their relations to environmental variables.
Supporting information S1 File. Fig A. The correlation between alpha-diversity of microbial community and that of As functional genes. The alpha-diversity was calculated by richness. The Pearson correlation coefficient (r) and the significance level (P). Table A. Characteristics of the five geographically distributed soils contaminated with different arsenic. Table B. Statistical analysis of differences in the microbial community composition and structure among the five As contaminated soils based on GeoChip data and Illumina MiSeq sequencing data. Table C. As functional genes detected in each soil. Table D. Mantel test of the relationship between the structure of genes involved in As methylation (arsM), oxidation (aoxB) and arsenite efflux (arsA and arsB) and arsenate reduction (arsC). Table E. Diversity indices of the As functional genes of the five geographically distributed soils. Table F. Sequences of all orders in each soil. Table G. Sequences of all families in each soil. Table H. Sequences of all genera in each soil. Table I. Principle coordinate analysis (PCoA) of GeoChip and 16S rRNA gene sequencing data with environmental properties among five arsenic contaminated soils from different geographic locations. Table J