Quantification of diversity sampling bias resulting from rice root bacterial isolation on popular and nitrogen-free culture media using 16S amplicon barcoding

Culturing bacteria from plant material is well known to be conducive to strong bias compared to the actual diversity in the original samples. This bias is related to the bacterial cultivability, chemical composition of the media and culture conditions. Recovery bias is often observed but has never been quantified on different media using an amplicon barcoding approach whereby plant microbiota DNA extractions are compared to DNA extracted from serial dilutions of the same plant tissues grown on bacterial culture media. In this study, we: i) quantified the bacterial culturing diversity bias using 16S amplicon barcode sequencing by comparing a culture-dependent approach (CDA) focused on rice roots on four commonly used bacterial media (10% and 50% TSA, plant-based medium with rice flour, nitrogen free medium NGN and NFb) versus a culture-independent approach (CIA) assessed with DNA extracted directly from root and rhizosphere samples; ii) assessed enriched and missing taxa detected on the different media; iii) used biostatistics functional predictions to highlight metabolic profiles that could potentially be enriched in the CDA and CIA. A comparative analysis of the two approaches revealed that among the 22 phyla present in microbiota of the studied rice root samples, only five were present in the CDA (Proteobacteria, Firmicutes, Bacteroidetes, Actinobacteria, Verrucomicrobia). The Proteobacteria phylum was the most abundant in all CDA samples, showing high gamma-Proteobacteria enrichment. The diversity of the combined culture media represented about a third of the total microbiota diversity, and its genus diversity and frequency was documented. The functional prediction tool (PICRUSt2) detected nitrogenase enzyme enrichment in bacterial taxa sampled from nitrogen-free media, thus validating its predictive capacity. Further functional predictions also showed that the CDA mostly missed anaerobic, methylotrophic, methanotrophic and photosynthetic bacteria compared to the CIA, thereby generating valuable insight that could enable the design of ad-hoc culture media and conditions to increase the rice-associated microbiota cultivability.


Introduction
Plants interact continuously with a microbiota that plays an important role in their health, fitness and productivity. In the last 10 years, the low-cost accessibility of next generation sequencing (amplicon-based sequencing and metagenomics) to scientists has enabled extensive description of the diversity of this microbiota on many model and non-model plants (e.g. in Arabidopsis [1] and wheat [2]). For rice, its microbiome has been widely described in different countries and rice culture practices [3][4][5][6]. This wealth of data has now provided a good overview of the main bacterial and fungal taxa inhabiting underground plant tissues (roots and rhizosphere), as well as those in their above-ground parts (phyllosphere and endosphere). The diversity determined using amplicon-barcode approaches is mainly based on fragments of ribosomal taxonomic markers such as 16S and 18S rRNA genes, with taxonomic resolution often restricted to the genus level. To access and obtain more microbial diversity and structural representativeness, several studies have been carried out using a combination of markers at different resolution levels, ranging from general (16S V3-V4 or V4 for prokaryotes, 18S V4 for microeukaryotes) to more resolutive markers (gyrB or rpoB fragments for bacteria, ITS1/ITS2 for fungi) [7][8][9]. Bioinformatic analysis of amplicon barcode data has also involved several novel strategies, ranging from operational taxonomic unit (OTU) clustering at different identity percentages to more advanced clustering methods using swarming algorithms [10,11], in addition to methods inferring true amplicon sequence variants (ASV) [12].
Harnessing plant microbiota diversity with regard to plant nutrition or tolerance to pathogens, for instance, relies on the isolation and culturing of the taxonomic and/or functional diversity of the microbiota [13]. The capacity to culture and store such diversity allows us to design synthetic communities and test their various compositions on plant growth and health [14,15]. Otherwise, different culturomics approaches have been developed to capture the bacterial diversity of plant microbiota, including culture media supplementation with various compounds, simulated natural environments, diffusion chambers, soil substrate membrane systems, isolation chips, single cell microfluidics [16], or using limiting dilutions on plates combined with dual barcode processing [17]. Substantial improvements in diversity sampling have also been achieved by popular media supplementation with plant compounds or plantbased media, while microbiologists continue to develop alternative culture methods to highlight rare and unculturable plant-associated microorganisms [16].
Several functional prediction tools, such as PICRUSt2 [18] have recently been developed to predict functional enrichment in metagenomes and even 16S amplicon barcoding data. In theory, such tools could allow the identification of metabolic functions and ecological functions that are enriched in culture-independent compared to culture-dependent approaches in order to guide culture media design or highlight culturing conditions that could help capture them [19].
It is well recognized in the microbiology community that commonly used non-selective bacterial medium, such as Lubria broth (LB), R2A, nutrient agar (NA), tryptic soy agar (TSA), are conducive to strong bias in the sampled diversity recovered from plant tissues [20,21]. This bias has never been quantified or documented in terms of proportions using next generation sequencing (NGS) amplicon-based technologies, to the best of our knowledge. Other media, such as Norris-glucose nitrogen-free medium (NGN), nitrogen-free medium (NFb) [22], have been successfully designed to isolate dinitrogen-fixing bacteria, but proportions of recovery of the diversity of the dinitrogen-fixing community remain unclear.
In this study, we employed both culture-independent (CIA) and culture-dependent approaches (CDA) to analyse bacterial diversity in rice roots and rhizosphere soils. Specifically, we used 16S amplicon barcode sequencing to analyse DNA directly extracted from the plant samples (CIA), as well as from mass bacterial cultures of varying dilutions plated on different media (CDA), including a popular medium for isolating plant-associated bacteria (TSA at 10 and 50%), a plant-based medium (rice flour), and two nitrogen-free media (NFb, NGN). The object of this study was: i) to quantify the bias of bacterial diversity introduced by CDA compared to CIA; ii) to determine the proportions of enriched bacterial genera per medium; iii) to use functional prediction tools on amplicon data to identify specific metabolic functions or bacterial capacities present in the rice root microbiota that are missing from the CDA. Our hypothesis is that the culture-dependent approach (CDA) we used in this study, which involved high-throughput sequencing of DNA pooling from the culture media, will help overcome the issue of losing slow-growing bacteria and provide a more accurate assessment of the culturable bacterial diversity. The approach may increase the percentage recovery of bacteria and obtain a more comprehensive picture of the bacterial diversity that reflects the real diversity present in the plant samples.

Rice root sampling and processing
Oryza sativa ssp indica cv FKR64 plant roots were collected in a rice field near Bama village (western Burkina Faso, Kou Valley, 10.64384 N, -4.8302 E). This field was already assessed in a previous study and described by Barro et al. [6]. Rice sampling was authorized by a national agreement between the Burkina Faso government and farmers within the framework of a rice productivity improvement program involving INERA. Rice plants were sampled at the panicle initiation growth stage, with three sampling points chosen 10 m apart, where roots were collected from three plants (20 cm apart). Roots were hand-shaken to remove non-adherent soil. Ten roots per plant from the same sampling point were pooled to obtain three final samples in 50 mL Falcon tube containing 30 mL of sterile PBS buffer, and vortexed for 5 min to separate the rhizospheric soil from the roots. Roots were removed with sterile forceps and placed in new 50 mL Falcon tubes. From this treatment step, the rhizosphere (Rh) and roots (Ro) samples were manipulated separately (Fig 1). The rhizosphere soil in PBS was vortexed for 10 sec and then two samples of 1 mL of the rhizosphere suspension were taken after 15 sec and placed in two separate 2 mL Eppendorf tubes to be used in bacterial culture-dependent (CDA) or culture-independent approaches (CIA) for diversity estimation by a direct 16S amplicon barcoding approach. Similarly, washed roots were cut into 2 cm fragments, and then divided and placed in two 2 mL Eppendorf tubes for CDA and CIA assessment.

Bacterial culture isolation media
Four culture media with different carbon and nitrogen sources were used to maximize the isolated bacterial diversity. First, non-selective tryptic soy agar (TSA, Sigma) medium was used at 10% (TSA10) and 50% (TSA50) concentration. It contained digests of casein and soybean meal, NaCl and agar. In addition, two nitrogen-free media were used for the isolation of potential nitrogen fixers, semi-solid NFb [22] and Norris glucose nitrogen-free medium (NGN, M712, [23]). NFb was used as semi-solid medium, which allows the development and growth of free nitrogen-fixing bacteria, due to their growth at an optimal distance for micro-aerobic conditions favourable for nitrogen fixation [22]. Finally, we included a plant-based medium, rice flour (RF), which is commonly used for isolation of fungal rice pathogens [24]. The compositions of the above culture media were as follows: TSA 10% (g/L): 0.5 NaCl, 1.

Culture-dependent (CDA) and independent (CIA) approaches
For the CDA, roots (200 mg) and rhizosphere soil (200 mg) were transferred into PowerBead Tubes from the DNeasy PowerSoil kit (QIAGEN) where 1 mL of PBS buffer was added, and homogenized in a TissueLyser II (QIAGEN) for 2 min (Fig 1). Dilutions (10 −2 to 10 −5 ) were performed and 50 μL of each dilution were spread on solid culture media (TSA 10%, TSA 50%, NGN, RF). For NFb medium, 50 μL of the 10 −1 root and rhizosphere soil suspensions were inoculated in 20 mL tubes containing 10 mL of NFb semi-solid medium. Each dilution was inoculated (on plates or in tubes) with 4 replicates. After 2 to 5 days of incubation (depending on the culture medium) at 28˚C, plates were examined and dilutions selected for further processing (details in S1 Table). For selected dilutions, cultivable bacteria were recovered from petri plates by adding 1 mL of sterile distilled water, scraping and mixing bacterial colonies. Bacterial suspensions obtained from the same dilution plates were collected with a pipette and transferred to sterile 15 mL Falcon tubes. For the NFb medium, bacteria which had grown in a ring shape 0.2-0.3 cm below the surface of the medium were collected. Bacterial suspensions were stored at -20˚C until DNA extraction. The number of cultivable bacteria in the obtained suspensions was roughly estimated by measuring the optical density (OD) at 600 nm for all suspensions and adjusted to 10 6 (assuming that OD600 nm of 1 corresponds to 1x10 8 bacteria/mL). The volumes collected from the samples were centrifuged 10 min at 14,000 rpm, and the pellets obtained were used for DNA extraction.
For the culture-independent approach (CIA), pooled roots were homogenized in liquid nitrogen using a mortar and pestle, while the pooled rhizosphere samples were used directly for DNA extraction (Fig 1). A mass of 250 mg was used for DNA extraction from both sample types.

DNA extraction
Cultivable bacteria suspensions (� 10 6 cells) and ground roots and rhizospheres soil (250 mg) were transferred to PowerBead tubes (DNeasy PowerSoil, Qiagen) containing C1 buffer and homogenised in a TissueLyser II (Qiagen) at 240 rpm for 2 x 1 min. Extraction was then performed according to the protocol provided by the supplier.

Bioinformatics analysis of 16S amplicons
For this study, we performed all diversity analyses using an amplicon sequence variant (ASV) detection approach (DADA2 pipeline), but we also compared the diversity with an OTU clustering method (based on FROGs, [26]).
For ASV analysis, raw amplicon barcoding data were demultiplexed and processed using the Bioconductor Workflow for Microbiome Data analysis [27]. This workflow is based on DADA2 [12] that infers amplicon sequence variants (ASV) from raw sequence reads. Forward and reverse reads were trimmed at 20 bp, respectively, to remove primers and adapters, and then quality-truncated at 280 and 205 bp, respectively. The dada2 denoise-paired function with default parameters was used to correct sequencing errors and infer exact amplicon sequence variants (ASVs). Then forward and reverse corrected reads were merged with a minimum 20 bp overlap, and the removeBimeraDenovo function from DADA2 was used to remove chimeric sequences. Eighty-two percent of reads passed chimeric filtering. The numbers of reads filtered, merged and non-chimeric are indicated in S2 Table. A mean of 58.6% of reads passed all filters (denoising, merging, non-chimeric), with a minimum of 15,347 and a maximum of 31,134 reads in filtered libraries, yielding a total of 2,712 ASV.
ASV were then assigned at the taxonomic level using the DADA2 AssignTaxonomy function, with the Silva 16S reference database (silva_nr_v132_train_set) [28]. We subsequently filtered out plasts (especially mitochondria from root samples) to keep only ASVs assigned to the Bacteria or Archaea kingdoms. A last filtering was done to remove ASV with <10 read occurrence across all libraries. A dataset of 1,647 ASV was used for subsequent diversity analyses. A Neighbour-joining phylogenetic tree of the 1,647 ASV was constructed using MEGA11 [29] by first aligning ASV sequences with MUSCLE [30] and then building a Neighbour joining-tree based on a distance matrix corrected with the Kimura 2P method. Metadata and ASV tables and the phylogenetic tree were uploaded to the NAMCO server for downstream microbiota diversity analyses (https://exbio.wzw.tum.de/namco/, [31]. NAMCO is a microbiome explorer server based on a set of R packages, including Phyloseq for diversity analyses [32] and PICRUSt2 for functional predictions [18]. Alpha-diversity analyses (observed richness, Shannon and Simpson diversity, statistical test with pairwise post-hoc Dunn test) were performed with Phyloseq and tidyverse, ggpubr, rstatix, multcompView R packages and plotted with ggplot2. Beta-diversity (NMDS, PERMANOVA) was performed with Phyloseq and Vegan. PICRUSt2 functional predictions were performed to infer metabolic capacities from our 16S amplicon ASV. Functions were predicted in three classes: enzyme classification (EC), KEGG orthology (KO) and molecular pathways (PW). Data were normalised with relative abundance, and a Kruskal-Wallis test was performed across conditions (medium used for CDA and CIA) with the ALDEx2 package [33]. Circular phylogenetic tree annotations and mapping were obtained with iTOL [34]. Additional R scripts for the DADA2 pipeline, Phyloseq, and the production of figures are freely available on GitHub (https://github.com/lmoulin34/Article_ Moussa_culturingbias).
For the OTU clustering approach, the FROGs pipeline ( [26]; http://frogs.toulouse.inra.fr/) was used in the Galaxy environment. After demultiplexing and pre-processing, reads were clustered into OTU using the swarming method with default parameters (aggregation distance of 3), then chimeric sequences were removed and OTU were affiliated with taxonomic levels using the same Assign taxonomy tool as described above.

Quality filtering and diversity indices of 16S amplicon libraries (CIA versus CDA)
We first assessed the quantity and quality of reads produced for each amplicon library originating from direct rice root or rhizosphere genomic DNA extraction (CIA) or from DNA extracted from cultures (CDA) of the same samples grown on bacterial culture media. A range of 24,000 to 44,000 reads (mean 36,120) was obtained for all 16S amplicon libraries (S2 Table). Rarefaction curves (Fig 2A) showed sampled diversity saturation for each library, with a clear difference between the CIA reads (much higher in alpha diversity) compared to CDA.
After DADA2 pipeline processing, we obtained 2,712 amplicon sequence variants (ASV) that were assigned at the taxonomic level using the Silva database. One library (S36) was removed from the analysis (from CIA) as it showed only 3 ASV. For the remaining libraries, ASV were filtered with regards to their abundance (cumulated reads � 10 among all libraries) and mitochondria, chloroplast and eukaryote reads were removed (remaining ASV = 1647).
We first compared the diversity obtained from root (Ro) and rhizosphere (Rh) samples. There was no statistical difference in ASV alpha diversity (Shannon index) or beta diversity (PERMANOVA) between Ro and Rh samples (S1 Fig). These results could be explained by the fact that we did not surface disinfect or remove the rhizoplane from roots, so the rhizosphere (soil adhering to roots) and the root (rhizoplane + endosphere) from the same samples did not show significant differences. As the focus of this study was to compare the diversity obtained from a non-culturable versus a culturable approach on different media, we pooled Rh and Ro data from the same plant samples for all subsequent analyses.
The bacterial sequences obtained by the CIA method exhibit significantly higher alpha diversity than those obtained from the five CDA media (TSA10, TSA50, NGN, NFb, RF) (Shannon or Simpson index, Kruskal-Wallis test, p = 0,002; Fig 2B). The TSA, RF and nitrogen-free media alpha diversities were not statistically different (Fig 2B). The ASV richness sampled from each medium represented about 15% of the diversity of all ASV detected in both CIA and CDA (TSA10: 16%, TSA50: 14.9%, NFb: 17%, NGN: 17%), except for RF (11%) Rarefaction curves (A), alpha (B) and beta diversity (C) in 16S amplicon libraries. Alpha diversity (A) rarefaction curves were calculated on unfiltered data, while alpha diversity indices (B) were calculated using the Shannon and Simpson index. Letters above boxplots indicate statistical significance using a pairwise Wilcoxon test (adjusted with the Bonferroni method). The p-value of a PERMANOVA test on beta diversity is indicated at the bottom of (C). Coloured circles on the NDMS are ellipses of confidence (at 95%) for all media and CIA conditions. Abbreviations: CIA: culture-independent approach; media used in the CDA: RF: rice flour, NFB: nitrogen-free medium, NGN: Norris glucose N-free medium, TSA10 and TSA50: tryptone soya agar at 10 and 50%.
https://doi.org/10.1371/journal.pone.0279049.g002 which captured less diversity, while the CIA approach represented 67%. NMDS on the beta diversity analyses showed no overlap between ASV obtained from the different media (CDA) and the CIA (Fig 2C, PERMANOVA, R 2 (Linear fit) = 0,88, p = 0.001). A substantial overlap was observed for TSA10 and TSA50, which was expected since it was the same medium but used at two different concentrations.

Culturable sampled diversity: Comparison between ASV and OTU
We also analysed our amplicon barcoding reads using an OTU-clustering approach (FROGs pipeline, using the swarming method to merge reads into OTU). This approach produced 1,023 OTU after quality filtering (same as for the ASV analysis). We then assessed if the diversity obtained by OTU gave the same percentage diversity recovery compared to ASV. In Table 1, we present the number of ASV and OTU obtained from the culture-dependent approach (CDA) and from the culture-independent approach (CIA), as well as the number of classes, orders and families represented in each. The ASV analysis produced more richness (38% more) than the OTU analysis. This higher diversity was observed at different taxonomic levels: class (ASV:50; OTU:38), order (ASV: 124; OTU:67), and families (ASV:219; OTU:119). Given this result, we conducted all subsequent analyses with ASV-analysed data as it was better at capturing the diversity of our 16S amplicon libraries. In both analyses, the diversity shared between CDA and CIA was relatively low (7% for ASV, 22% for OTU). From the culturable approach, we thus recovered many bacterial taxa that were undetected in the amplicon sequencing performed on gDNA extracted from roots or the rhizosphere, yet only a small proportion of the root bacteria were able to grow on our culture media.

Comparison of bacterial taxonomic diversity between culture-independent (CIA) and culture-dependent (CDA) approaches
Taxonomic binning was performed at different taxonomic levels for the top 30 phyla and the top 25 classes, orders and genera (Fig 3A-3D). The phylum distribution showed a dominance of Proteobacteria, Bacteroidetes and Firmicutes in all libraries, with a clearly higher diversity of phyla in the CIA samples. We identified 22 bacterial phyla in the rice root sample microbiota, with only 5 present in the CDA (Proteobacteria, Firmicutes, Bacteroidetes, Actinobacteria, Verrucomicrobia). The proteobacteria phylum was the most abundant in all samples, with a greater proportion noted on the rice flour culture medium. At the class level, the difference in diversity was even more visible with Gammaproteobacteria, Alphaproteobacteria and Bacteroidia dominating in the CDA, while high class diversity was present in the CIA ( Fig  3B). At the order level, the CIA showed (as expected) high diversity, while the CDA data were dominated by Enterobacteriales, Betaproteobacteriales, Rhizobiales and Flavobacteriales. Finally, in the top 25 genera, differences among CDA libraries clearly appeared, with the exception of the Enterobacter genus which was enriched in all (although to a lesser extent for NFb) (Fig 3D). In the CIA, Devosia was the most represented genus. To better visualize the sampled diversity distribution, we built a phylogenetic tree of ASV (diversity labelled at the

PLOS ONE
class level) and mapped their distribution and abundance in the different conditions (coloured outer circles) (Fig 3E). This representation clearly highlights which taxa diversity is sampled and over-represented with the media used in the CDA (e.g. Gammaproteobacteria in blue or Firmicutes in pink), and which whole parts of bacterial diversity were missed compared to the CIA (e.g. Patescibacteria, Armatimonadetes, Deltaproteobacteria, Planctomycetes, Chloroflexi).

Statistical differential analyses between CIA and CDA at class and genus levels
We performed a Kruskal-Wallis test (α = 0.05, with the Bonferroni multiple test correction method) to identify classes of bacteria with significant differences among CDA and CIA conditions. The statistical test identified 45 classes of bacteria above the significance cut-off level (p<0,05), 37 of which were present only in the CIA (S3 Table), including in the top 10 most frequent class taxa: Ignavibacteria, Saccharimonadia, Fibrobacteria and Acidobacteria. Four classes were present in both the CIA and CDA: Alphaproteobacteria, Gammaproteobacteria, Bacteroidia and Actinobacteria, with Alphaproteobacteria and Gammaproteobacteria being the most represented in the CIA and CDA, respectively (also visible in Fig 3B and 3C). Then we performed differential analyses on the mean relative abundance of bacterial genera in each condition, using a Kruskal-Wallis test (α = 0.05). Table 2 shows the 50 most abundant genera in the CIA and their mean relative abundance in each media dataset (whole dataset is available in S4 Table). Among the 20 most frequent bacterial genera in the CIA, eleven were detected in the CDA. These were Devosia (8.25% of all genera), obtained on TSA10, TSA50 or NFB media; followed by Pseudoxanthomonas (3.62%), which was found in all media conditions except RF, then Stenotrophomonas (3.36%), Bacillus (2.29%), Pseudomonas (1.42%) and Allo/Neo/Para/Rhizobium (1.3%) found in all media; and finally Sphingopyxis (2.1%) detected in TSA50; Streptomyces (1.48%) in NGN and Pseudolabrys (1.47%) in NFb. We built Venn diagrams on shared and specific diversity at ASV (Fig 4A) and genus levels (Fig 4B and 4C). Among the 244 genera from the CIA, 173 (71%) were absent from the culturable approach, while 71 were shared (29%) and 70 others were CDA-specific (Fig 4B). We also compared the genus diversity sampled in each CDA medium, and we listed specific genera obtained for each media on the Venn diagram in Fig 4C. To document genera that were the most frequent in the culturable approach for each medium, a table of the 20 most statistically frequent genera (Kruskal-Wallis test, α = 0.05) obtained for each medium of the CDA is given in Table 3. In this top 20 most frequent genera, several appeared in all media: Enterobacter, Stenotrophomonas, Bacillus, Sphingobacterium, Klebsiella, Brevundimonas and Rhizobium, all of which are known to be fast-growers on rich media and reported to contain plant-inhabiting species. On nitrogen-free media, species known as nitrogen-fixing Plant Growth Promoting Rhizobacteria (PGPR) were sampled: Azospirillum, Para/Burkholderia, Bradyrhizobium, Sphingomonas, etc.

Prediction of enriched functions in CIA compared to culture-based approach
We performed a functional prediction analysis using PICRUSt2 to infer metabolic capacities from our 16S amplicon ASV. In order to assess the predictive ability of the PICRUSt2 algorithm on our dataset, we focused on the specific enzyme nitrogenase (EC. 1.18.6.1) prediction in CDA libraries that included medium with (TSA, RF) or without nitrogen (NGN, NFb) ( Fig  5A). As expected, we observed nitrogenase enrichment (p = 0.00492) in nitrogen-free NFb and NGN media, with NGN medium exhibiting much higher enrichment than NFb. The non- selective medium (TSA) and the plant-based medium (RF) did not enrich bacterial taxa with the nitrogenase function ( Fig 5A).
We also aimed to predict which functional pathways were specific to CIA compared to CDA in order to help design conditions to capture the yet unculturable diversity. We thus analysed the metabolic pathways (based on PW/Metacyc categories) predicted as being enriched in the CIA compared to CDA conditions, and represented the results in a dot-plot (Fig 5B). Among the detected Metacyc pathways enriched in CIA, several functions linked to specific ecological niche abilities were detected: anaerobic/fermentation metabolism, carbon dioxide fixation, bacterial photosynthesis, methanotrophy and methylotrophy. As our CDA culture conditions were aerobic and in the dark, this enrichment was logical and gave clues on the culture conditions that could ultimately capture more bacterial diversity. Enriched pathways in the TSA and RF media libraries (compared to others) could be linked to heterotrophy on rich media in aerobic conditions (sugar degradation, amino acid/lipid/nucleotide biosynthesis, vitamin biosynthesis). For the nitrogen-free media (compared to the others), several pathways were detected as phenolic compound/polyamine/amino acid degradation and sugar degradation. Nitrogen fixation does not appear as itself in Metacyc pathways, it is embedded in "nitrogen metabolism" together with "nitrification" and "denitrification" capacities among others, so that no pattern from nitrogen fixation ability is possible, apart from the analysis on E.C. for the nitrogenase enzyme (Fig 5A).

Discussion
In this study we used Illumina sequencing on 16S-amplicon barcodes (variable region V3-V4) to quantify the bacterial diversity bias that occur when culturing rice root-associated bacteria on a range of culture media compared to the real diversity. Our goal was to precisely document the bacterial taxa diversity that could be recovered from a set of culture media compared to the real diversity, the predicted and enriched/depleted functions that could be inferred from this diversity, while seeking how to design new culture conditions to capture it. We have used the term "real diversity" in reference to that captured by Illumina amplicon sequencing, although this approach can also be biased, since it is based on the amplification of a marker gene from a DNA matrix that could originate from dead bacteria. As rice is a non-perennial plant and we expected root and rhizosphere soil to be under high metabolic turnover, we hypothesized that DNA from dead bacteria in the culture-independent approach would not represent high diversity in our analysis. Yet this is a possibility and could represent bias from the unculturable approach when comparing culture and uncultured-based diversity analyses.
Several studies have already compared culturable and real rice microbiota diversity [35][36][37], but they often relied on comparisons between regular 16S Sanger sequencing on isolated bacteria compared to NGS sequences. Here we used the same sequencing methodology at high depth and were able to compare diversity levels without sequencing/analytical bias. We also Colors indicate frequencies of occurrence in the 16S amplicon data (highest in red, lowest in green, for each medium and CIA condition). Kruskal-Wallis test, α = 0.05.
used two different analytical methods to infer operational taxonomic units, i.e. ASV (based on exact sequence variant detection, [12] or OTU (based on clustering by swarming, [11]). As ASV analysis detected more diversity than OTU at different levels (even class and order levels, Table 1), we preferred to use ASV for all subsequent diversity and functional predictions. As exact sequence variants highly rely on algorithms for sequencing error detection and correction, we could not exclude that some of the obtained diversity was due to algorithm imperfections. However, given that the obtained higher diversity also concerned higher taxonomic levels, this artificial diversity issue is unlikely as it would involve a high number of mutations.

PLOS ONE
In this study, the diversity obtained from the CDA culture media (TSA10, TSA50, RF, NGN, NFb) was lower compared to the CIA. If we combine all diversity of the CDA, it represents 11.7% (ASV level), 29% (genus level), 22.4% (class level), 25.6% (order level) and 23.1% (family level) of the diversity of the CIA. As there are few comparable studies in the literature, it is hard to determine if our recovery rate was low or high since this is the first study to our knowledge to have assessed culturable recovery by amplicon barcoding and NGS sequencing. The review of Sarhan et al. [16] detailed recent advances in culturomics methodologies, and established a recovery rate of about 10% for conventional chemically-synthetic culture media, which is in the range that we obtained at the ASV level (although we obtained 23 to 29% at higher taxonomic levels). Samson et al. [36] claimed to have recovered up to 70% of bacterial genera (on 16S V5-V7 amplicon, at >97% similarity) from Oryza sativa indica and japonica rice microbiota, but they applied a 0.1% frequency cut-off. Applying the same cut-off on our dataset would indicate the detection of 121 genera in the CIA, with 36 (29%) of them present in the CDA.
From all media used in the CDA, we could recover a total of 142 bacterial genera, with each medium capturing 15 to 23 specific genera (Fig 4C). The only exception was the plant-based rice flour medium, which in our study captured low bacterial diversity compared to the other media, probably due to its low composition complexity. Plant-based media have been suggested to be a good alternative to popular bacterial chemical media for increasing the cultivability of plant-associated microbes [16], but the use of homogenised roots, leaves or exudates has been recommended to complement minimal or more complex media.
The recovery of specific ASV from the CDA that were not detected in the CIA was an unexpected finding in our study. This diversity represented 532 ASV, 1 class, 3 orders, 16 families and 70 genera ( Table 1, Fig 4A and 4B). This number of ASV in CDA may seem high (the total

PLOS ONE
number in both CIA and CDA was 1,647), but it only represented a quarter of the total ASV diversity at the class, order and family levels ( Table 1). We cannot exclude that a technical PCR bias could have increased the diversity from the CDA, since DNA polymerase errors may arise. Moreover, if there is low diversity in the DNA matrix, a diverse range of sequence variants could be produced, but these errors would only affect diversity at species or genus levels in the amplicon sequencing, not at higher taxonomic levels as in our results. One explanation for not detecting, in the CIA, the ASV diversity found in CDA could concern the sequencing depth, yet the rarefaction curves did reach a plateau but at much higher alpha diversity for the CIA compared to the CDA (Fig 2A). The mean sequencing depth obtained was 36,120. If differences between bacterial ASV frequencies exceed 10e 4 , then several genera may be undetected in the CIA approach, whereas they may be selected by specific culturable media. We set the read number filter at 10 (cumulated in all libraries), but we also looked at lower filtering (>2) and unfiltered ASV data (S5 Table). In the unfiltered data, we counted 102 specific bacterial genera for CDA, 243 for CIA, with 90 in common, while these numbers were 70, 173 and 71 in the filtered results (10 read filter, Table 1 and Fig 4B), i.e. similar proportions. Processing unfiltered data thus produced similar proportions of specific ASV for the CDA compared to CIA. Regardless of the filtering method, we detected one specific class in CDA (undetected in CIA), i.e. Erysipelotrichia, represented by one genus, i.e. Erysipelothrix, and 4 ASV recovered from TSA medium (at 10 and 50% concentration). A BLAST study of these ASV sequences revealed 100% sequence identity with the 16S rDNA of Erysipelothrix inopinata-a species whose type strain was isolated from sterile-filtered vegetable broth [38]. As our medium was sterilized by autoclaving, it is unlikely that these 4 ASV were contaminants. It should also be noted that we are not the first to have found that isolates from culturable approaches were undetected in culture-independent approaches [14,37]. It would be better to assess rice microbiota diversity by substantially increasing the sequencing depth in order to get a better image of the overall diversity. The frequency differences between ASV exceeding 10e 3 (at the genus level, S4 Table) that we found in our study means that they would not be detected in the CIA, while they would be in the culture-based approach. This was a crucial finding since several studies have underlined the role of rare species (also called satellite taxa) in plant-microbe interactions and more broadly in key ecosystem functions [13,39]. Increasing the representativeness of taxonomic diversity in databases should also be the focus of further scientific research since many ASV cannot be affiliated to taxonomic ranks due to missing descriptions of these taxa in taxonomic databases. We also tried to predict functions and metabolic pathways that would be enriched when using different types of media, and we also conducted statistical tests to highlight functions that were missing from our culturable approach. Metagenome-guided isolation and cultivation of microbes has been developed in recent years, but these approaches are based on metagenomic sequences and the reconstruction of genomes and metabolic pathways [19]. The massive sequencing effort focused on a highly diverse range of bacteria from different environments has led to the development of genome database and prediction tools that may be used with simple amplicon taxonomic markers [18]. We applied a prediction tool to our dataset to investigate the ecology and functional capacities of our detected bacteria. We found that many taxa with anaerobic metabolisms such as methanogenesis (methane production), methanotrophy (methane degradation) and methylotrophy (one-carbon reduction), or with photosynthetic capacities, were missing from CDA (Fig 5) compared to the CIA. It is well known that rice microbiota differ from microbiota of other crops since rice is often grown in flooded conditions, thereby creating an oxic-anoxic interface between the rhizosphere/root system and the bulk soil [4,5]. Our functional prediction approach thus underlined the presence of these probably strictly anaerobic bacteria adapted to anoxic conditions in the CIA, and their absence from the CDA. These predictions provided clues on the specific conditions and compositions of media required to capture these yet unculturable functional groups of bacteria. They could also serve to develop culturomics, a growing scientific field for microbiologists interested in synthetic microbiota and for biotechnological applications of plant-associated microorganisms.