Microbial community structure and functional potential of lava-formed Gotjawal soils in Jeju, Korea

The Gotjawal areas of Jeju Island, Korea, are comprised of unmanaged forests growing on volcanic soils. They support unique assemblages of vascular plants from both northern and southern hemispheres, but are threatened by human disturbance. The health and ecosystem function of these assemblages likely depends in part on the diversity and community structure of soil microbial communities, about which little is known. To assess the diversity of Gotjawal soil microbial communities, twenty samples were collected in November 2010 from 4 representatives of Gotjawal forests. While soil properties and microbial communities measured by 16S rRNA gene sequence data were marginally distinct among sites by PERMANOVA (p = 0.017–0.191), GeoChip data showed significant differences among sites (p <0.006). Gene composition overall, and the composition of 3 functional gene categories had similar structures themselves and similar associations with environmental factors. Among these communities, phosphorous cycling genes exhibited the most distinct patterns. 16S rRNA gene sequence data resulted in a mean 777 operational taxonomic units (OTUs), which included the following major phyla: Proteobacteria (27.9%), Actinobacteria (17.7%), Verrucomicrobia (14.3%), Acidobacteria (9.6%), Planctomycetes (9.8%), Bacteroidetes (8.9%), and Chloroflexi (2.2%). Indicator species analysis (ISA) was used to determine the taxa with high indicator value, which represented the following: uncultured Chlamydiaceae, Caulobacter, uncultured Sinobacteraceae, Paenibacillus, Arenimonas, Clostridium sensu.stricto, uncultured Burkholderiales incertae sedis, and Nocardioides in Aewol (AW), Aquicella, uncultured Planctomycetia, and Aciditerrimonas in Gujwa-Seongsan (GS), uncultured Acidobacteria Gp1, and Hamadaea in Hankyeong-Andeok (HA), and Bosea, Haliea, and Telmatocola in Jocheon-Hamdeok (JH) Gotjawal. Collectively, these results demonstrated the uniqueness of microbial communities within each Gotjawal region, likely reflecting different patterns of soil, plant assemblages and microclimates.


Introduction
On Jeju Island, Korea, the term "Gotjawal" refers to a unique natural unmanaged forest that grows on widely distributed lava-based formations characterized by dense stands of trees and shrub undergrowth [1]. The Gotjawal provides numerous ecosystems services, including groundwater purification [2], that have supported human occupation historically. At present, some areas of the Gotjawal are protected, but about 50% of the original forests have been replaced by secondary growth due to losses from deforestation, agricultural land use, and urbanization [3]. Gotjawal soils are considered highly fertile due to their high organic matter and low mineral content, but relatively little is known about their microbiology [2,4]. Previous cultivation-based studies of these soils have identified several new genera and species [2,[4][5][6][7][8], but more comprehensive analyses are needed to understand patterns of microbial community structure and function, and interactions with Gotjawal plant assemblages.
Studies on existing volcanic systems have focused on microbial distribution and function in relation to lava flows, their ages and plant or ecosystem development (for review see [2]). The microbial diversity of volcanic deposits is poorly characterized, but several studies have shown that lava weathering partially regulates rates of nitrogen fixation in young Mauna Loa ecosystems (Hawai'i) [9], and is accompanied by changes in carbon monoxide oxidizer diversity during succession on recent volcanic deposits at Kilauea Volcano, Hawai'i [10].
We report here the structure and functional potential of Gotjawal microbial communities analyzed using GeoChip 4.0 and the Illumina MiSeq platform for 16S rRNA gene sequences. First, we set out to elucidate the diversity and community structure of microbes in Gotjawal soils, along with determining their functional potential. Second, we identified the major environmental factors that shape the microbial community structure in Gotjawal soils. The results suggest that Gotjawal forests with distinct biotic and physical characteristics support distinct microbial communities.

Sampling sites
Gotjawal forests are located along the east-west axis of Jeju Island, Korea. Most forests occur in inland areas at altitudes of 200 to 400 m. Gotjawal forests tend to form a zone between inhabited coastal areas and mountainous regions used for grazing livestock. Four major Gotjawal forests are spread across Jeju Island, separated by Mount Hallasan (1,950 m) in the middle of the island (Fig 1). These 4 forests have been preserved well (S1 and S2 Tables and S1 Fig).

Soil collection and DNA extraction
Soil was sampled from all sites in November 2010. No specific permissions were required for sampling the soils, and the activities did not involve endangered or protected species. Soil samples from each sampling location were collected from behind or between lava and tree roots as one sample with mixed 6 subsamples. Each two samples were collected at distances of 10 m to 30 m in the field. Samples were collected aseptically using ethanol-disinfected shovels and were placed in clean, sealable plastic bags. Sampled soils were sieved using 2-mm mesh sieves autoclaved on the day of sampling. The samples were first stored in a 4˚C cooler during transfer to a laboratory (Uljin, South Korea), then stored at 4˚C for a week prior to shipment overnight to Glomics Inc. (Norman, OK, USA). DNA was extracted from approximately 0.25 g Gotjawal soil using the Power Soil DNA Isolation Kit (MoBio Laboratories, Carlsbad, CA, USA) for GeoChip and using a FastDNA SPIN kit for soil (QBiogene Inc., Vista, CA) for 16S rRNA gene sequencing using Illumina MiSeq. Total microbial DNA was quantified with Quant-iT PicoGreen (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions.

Soil analyses
Soil physical-chemical properties were measured at the National Instrumentation Center for Environmental Management (NICEM, Seoul, South Korea) based on standard SSSA (Soil Science Society of America) protocols (Table 1). Variables analyzed included pH, EC electrical conductivity, OM (organic matter content), TOC (total organic carbon content), T-N (total nitrogen content), NH 4 + , NO 3 -, CEC (cation exchange capacity, Mg, Na, Ca, K), P 2 O 5 , clay, silt and sand.

GeoChip hybridization and data preprocessing
GeoChip hybridization and data pre-processing were carried out by Glomics Inc. using their data processing pipeline (http://ieg.ou.edu/microarray/). Briefly, community DNA from soil samples was randomly amplified then labeled with Cy3 fluorescent dye before hybridization with randomly placed Cy5-labeled common oligo reference standard (CORS) target and  Cy3-labeled alignment oligo (Roche NimbleGen, Madison, WI, USA). The hybridized array was scanned with a NimbleGen MS200 Microarray Scanner and quantified with NimbleScan software with an appropriate grid file. Data pre-processing was carried out using the Microarray Data Management (MGM) system on the Institute for Environmental Genomics (IEG) website (http://ieg.ou.edu/microarray). The probes were considered positive if the signal-tonoise ratio (SNR) was > 2.0 and the coefficient of variability of the background was < 0.8, Signal intensity for each probe were normalized by the mean signals from all spiked CORS probes

16S rRNA gene sequencing and sequence processing
Briefly, the V3-V4 region of bacterial 16S rRNA gene amplicons were generated using the forward primer (337F, 5' CCTACGGGNGGCWGCAG) and reverse primer (805R, 5'GACTAC HVGGGTATCTAATCC) [11]. Each sequenced sample was prepared according to the Illumina 16S Metagenomic Sequencing Library protocols. The quantification of DNA and the DNA quality was measured by PicoGreen (Invitrogen) and Nanodrop (Thermo Scientific). Input gDNA (10ng) was PCR amplified. The final purified product was then quantified using qPCR according to the qPCR Quantification Protocol Guide and qualified using the Agilent Tapestation D1000. These libraries were sequenced by multiplexed paired-end sequencing on the Illumina MiSeq (Illumina, San Diego, CA, USA) platform v3 technology (2 × 300 bp, paired-end) by Macrogen Inc. (Seoul, Korea). After contigs were formed from the paired-end sequences by FLASH (v.1.2.11) [12], additional sequence processing was done using CD-Hit-OTU [13]. Short sequences of less than 40% in library and sequencing errors (Phred score < 30%), homopolymer and ambiguous bases were removed. Chimeric sequences were identified and removed using (USEARCH) [14]. The remaining sequences were used to form Operational Taxonomic Units (OTU, cluster cutoff value, 97%). Singletons and doubletons were not used for downstream analyses. Each representative OTU sequence were compared against the RDP database (release 11) the taxonomy was assessed using UCLUST (v.1.2.22) [14]. Species diversity and evenness were assessed by calculating Shannon index, Simpson index, Chao1, and Good's coverage. In addition, the most characteristic species of the microbial community in four Gotjawal areas were identified through the indicator value using the indval function of labdsv package in the program R (www.r-project.org). The sequence data were deposited in the NCBI Sequence Read Archive (SRA) under the BioSample accession numbers SAMN06049757 to SAMN06049776.

Statistical analysis
Nutrient cycle genes (C, N, P, and S cycles) were selected from the GeoChip data for further analysis in addition to all functional genes. The OTU table from 16S rRNA gene sequences was used for establish the taxonomy of Gotjawal microbial communities. Non-metric multidimensional scaling (NMDS) was used for unconstrained ordination of microbial communities. The NMDS results were quantitatively evaluated with analysis of similarity (ANOSIM) and permutational multivariate analysis of variance (PERMANOVA). NMDS configurations were compared using the Procrustes test [15]. Soil characteristics were fitted using vector fitting [16]. Hierarchical cluster analysis (HCA) and K-means cluster analysis were performed to classify the samples from the 4 Gotjawal areas [17]. HCA was carried out with Pearson coefficient without centering for distance measurement and the complete linkage agglomerative cluster algorithm [18], because it has been proposed to be the most legitimate cluster algorithm for microarray data in hierarchical cluster analysis [19].
The data structure was investigated by comparing the results from the redundancy analysis (RDA) and canonical correspondence analysis (CCA) for the GeoChip dataset, 16S rRNA sequence dataset, and soil characteristics (16 variables). The results clearly supported a unimodal data structure; thus, CCA was used to link microbial communities and soil characteristics. The initial results were very noisy, possibly due to the complex structure of the high variability in soil characteristics. Thus, subsets of soil characteristics were chosen by an iterative procedure of CCA modeling, based on VIF (variance inflation factor) and testing the significance of environmental constraints beginning with the full model. Specifically, variables with the highest VIF were sequentially removed until all variables became non-redundant, while the significance of soil characteristic constraints was controlled as p < 0.05. To understand the effect of geographic distances among samples, Mantel test and multivariate correlogram analysis were performed. Mantel test and partial Mantel test [17] were also used for additional gradient analysis.
All of the analyses were performed in R v. 3.0.2, with packages vegan v. 2.2-1 and ecodist v. 1.2.9 [20].

Soil characteristics
Soil physical and chemical properties varied among the sites ( Table 1). The characteristics of GS sites were distinct from the others, whereas the JH and AW sites clustered together, overlapping minimally with GS and HA (S2A Fig). In contrast, an HA site (HASY1) differed distinctly on the non-metric multidimensional scaling (NMDS) ordination field. Similar patterns were observed in hierarchical cluster analysis (HCA), with all first-level clusters containing samples from the various sites (S2B Fig). Composite soil characteristics were marginally significantly classified for the 4 regions by analysis of similarity (Table 1). However, both Mantel test with geographic distances (r M = 0.473, p = 0.002) and Procrustes test with NMDS ordination with geographic locations (t = 0.554, p < 0.001) were highly significant, indicating the soil characteristics were spatially autocorrelated.

Microbial communities
The   Overall, the GS Gotjawal region microbial communities were well clustered consistently, and other Gotjawal regions contained fairly diverse soil microbial communities. All statistical analyses distinguished P cycling genes from other nutrient cycling genes in the microbial communities. The NMDS ordination configurations of all nutrient cycling genes and 3 other gene (N, P, S) were virtually identical, because the p values from Procrustes test exceeded 0.992. In comparison, P cycling genes were consistently less similar than the other gene categories. HCA results showed that the GSBY samples from GS were clustered with the AWSB samples from AW for P cycling genes (S5 Fig). For all other genes, all GS Gotjawal samples were clustered together with JHDK samples
The Mantel and partial Mantel test (Table 3) revealed a significant association between composited soil characteristics and distances, and bacterial communities defined by 16S rRNA genes. However microbial communities defined by functional genes did not yield significant results. These results indicated that bacterial communities were spatially autocorrelated while functional genes were not. The overall linkage of functional genes with soil characteristics was weak. P cycling genes were most significantly associated with both the soil characteristics and distance. The Procrustes test results with NMDS ordination overall agreed with Mantel test results.
Some soil measurements were redundant based on their collinearity, and thus were removed in building the constrained correspondence analysis (CCA) models. Through the  iterative process, total N, ammonia, nitrate, Na exchange capacity, percent clay and percent silt were selected for the CCA model with bacterial communities (Fig 3A). HA samples were correlated with soil texture (HACS, HAJJ, and HASY). AW and JH samples were correlated with soil N (AWSB, AWNKM, and JHKR), while Na exchange capacity was correlated with some JH and HA samples (JHDK and HADNR). Cation exchange capacity, pH, total N, ammonia, K exchange capacity, percent clay, and percent silt were correlated with the soil microbial communities of all nutrient cycling genes. HA and AW samples were correlated with either soil texture (HASY, HACS, and AWSB) or soil nutrients (HAJJ and AWNKM). Soil nutrients were important correlating factors of JHKR samples, while pH was important for GS samples. The overall patterns for general microbial communities and functional gene categories were generally similar (Fig 3B and S6 Fig). All of the CCA models were very significant with the selected constraining soil characteristics variables (p < 0.001). Again, the CCA model with P cycling genes was the most unique among nutrient cycling genes, because all HA, GSDG, and JHDK samples were explained best by soil texture. AWNKM and JHKR samples were better explained by percent clay and P 2 O 5 concentration.

P cycling genes
In terms of the P cycling gene community based on GeoChip analysis, a ppx (exopolyphosphatase) gene was primarily detected in the AWNKM, AWSB, HADNR, GSBY, and GSDG soil samples. In contrast, a gene encoding phytase (hydrolyzing phytate to release inorganic phosphate) was primarily detected in the HACS soil samples. The ppx gene was not detected the HAJJ soil samples, while the phytase gene was not detected the HAJJ and GSBY soil samples. A ppx gene was detected in the AWSB, HAJJ, GSBY, and GSDG soil samples. Low detection of a ppk (polyphosphate kinase) gene was observed in all Gotjawal soil samples. The phytase gene produced no signal in AWSB, whereas ppx and ppk genes produced strong signals in this sample. In HAJJ, ppx and phytase genes were detected very strongly. Phosphorus cycling genes significantly contributed to site-specific characteristics. Available phosphorus content was very high, but varied with Gotjawal region with values of 231, 152, 96, and 55 mg/Kg in AWNKM, HADNR, HAJJ, and JHKR, respectively. In comparison, iron and aluminum concentrations were 5640 mg/Kg and 3410 mg/Kg in AWNKM, 9790 and 6750 mg/Kg in HADNR, 18840 and 11750 mg/Kg in HAJJ, and 25540 and 21100 mg/Kg in JHKR (Table 1).

Discussion
Microbial community structure and functional potential in the Gotjawal soils was investigated using 16S rRNA gene sequencing and GeoChip hybridization. The results demonstrated that each of four Gotjawal regions contained distinctly different soil microbial communities as defined by 16S rRNA genes and nutrient cycle functional genes ( Table 2). The microbial communities based on functional gene category were similar in structure, and had similar associations with environmental factors. However, P cycling genes were the most distinctive gene category, possibly due to the different geological features and vegetation of the Gotjawal areas (S2 Table). However, further careful analysis is required to support this result.
JHKR, AWNKM, and HAJJ were closely located in NMDS ordination space (  Composite soil characteristics were consistent with the 16S rRNA gene data, since clustering of three locations was not observed (S2 Fig). Both JHKR and AWNKM have many hummock and hollows in two Gotjawal regions, secondary deciduous broad-leaved forests, and temperate climates. In comparison, HAJJ was characterized by lava flows transitioning from pahoehoe to aa, collapsed trenches, a collapsing of lava tubes, an evergreen broad-leaved forest, and a subtropical climate. JHKR, AWNKM, and HAJJ were humid with subsidence, along with extensive cattle grazing, which should have greater impact on the functional potential of their soil microbial communities.
ANOSIM and PERMANOVA were used to test whether several groups of communities are significantly different from each other by comparing between-group and within-group dissimilarity [21,22]. The Mantel tests are a non-parametric multivariate correlation analysis between two dissimilarity matrices. When one of the matrices is a distance matrix, it tests the overall spatial autocorrelation of the other matrix [23]. Soil characteristics were spatially autocorrelated (r M = 0.476, p = 0.001), but were not distinctive among the Gotjawal regions that they represented. In contrast, the microbial communities defined by functional genes were quite distinctive among the Gotjawal regions, but spatial autocorrelation was weak. The spatial distribution of the 4 Gotjawal regions is quite distinctive around Jeju Island; however, the distance matrix of the 20 sampling locations did not perfectly match the spatial distribution of total Gotjawal area. This discrepancy occurred because some inter-Gotjawal locations were as close to one another as other intra-Gotjawal locations.
16S rRNA gene sequences and functional gene hybridization results represent defined microbial communities based on taxonomic markers and functional potential, respectively. The direct comparison between them based on NMDS ordination (Procrustes test, t = 0.43, p = 0.07) and dissimilarity matrix (Mantel test, r M = 0.22, p = 0.02) indicate marginal similarity between them. Hierarchical clustering and CCA models also showed noticeable distinctiveness between them. All of these results together may indicate a degree of decoupling between structure and function [24,25], however there were some shared but non-identical responses. This pattern may mean different results per different response units between specific functional traits and entire functional gene profiles. Another noticeable feature is insignificant association between functional gene results with soil characteristics [26], while 16S rRNA gene results were well correlated with soil characteristics.
For the diversity measure, we added effective diversity along with conventional diversity index for the Shannon index (H'). The Shannon index is non-linear so quantitative comparison is not appropriate. Effective diversity is the expected richness to generate a specified diversity index overcoming the limitation of the Shannon index [27]. Although there was no significant difference among Gotjawal forests, functional genes tended to have lowest diversity in GS forest and 16S gene sequence data had lowest diversity in HA forest, again indicating a weak link between community structure and functional potential.
Out of the 4 nutrient cycling gene categories (C, N, P, S), P cycling genes were the most unique, with the community functional potential composition differing from the other 3 categories and better association with composite soil parameters. Available phosphorus content was very high, with proportions similar to those of iron and aluminum ( Table 1). Out of all the P cycling genes, phytase and ppx were strongly detected; the ppk gene produced a weak signal in all Gotjawal soils. Genes for both exopolyphosphatase (ppx) and polypshophate kinase (ppk) are on polyphosphate operon [28][29][30] and responsible for the utilization of polyphosphate. Phytase catalyzes the hydrolysis of phytate, which is abundant in plants [31] and often most active in fungi [32] which may be the case for the Gotjawal forest soil. Generally, inorganic phosphorus (P i ) comprise 35~70% in total soil P, and between 30% and 65% of total P is present as organic P (P o ) [33]. Phytic acid, an inositol phosphate, represents the dominant form of organic P in soil [34,35]. The phytate in Gotjawal forest soil could be degraded by phytase, also known as a myo-inositol phosphatase, with four distinctly different classes known [32]. Future assessments of myo-inositol phosphate and myo-inositol phosphatases will be used to develop more detailed models of P cycling and to understand differences among Gotjawal sites.
In conclusion, our study investigated the microbial community structure and functional potential in Gotjawal soils using 16S rRNA gene sequencing and GeoChip. We showed that each Gotjawal region contained noticeably different soil microbial communities, despite soil characteristics that were not clearly distinctive. The taxonomic level composition and diversity, however, of those microbial communities were not very distinct. The indicator species of the microbial community in four Gotjawal regions were identified differently as uncultured Chlamydiaceae, Caulobacter, uncultured Sinobacteraceae, Paenibacillus, Arenimonas, Clostridium sensu.stricto, and uncultured Burkholderiales incertae sedis in AW, Aquicella in GS, uncultured Acidobacteria Gp1, and Hamadaea in HA, and Bosea, and Haliea in JH. In addition, we showed that P cycling genes (phytase, ppx, and ppk) are critical in these regions, and should be the focus in future studies. Our results demonstrate that the Gotjawal area supports diverse microbial communities that, in turn, support diverse plant and wildlife species, thereby contributing to the uniqueness of this primarily unexplored ecosystem that requires conservation.