Microbial community function and bacterial pathogen composition in pit latrines in peri-urban Malawi

Despite the widespread global reliance on pit latrines as improved sanitation systems, the decomposition of waste within pit latrines is poorly understood. One area needing elucidation is the characterization and function of microbial communities within pit latrines. To address this gap, we characterized the microbial communities of 55 lined pit latrines at three sampling layers from two communities in peri-urban Malawi. The microbial communities of the fecal sludge samples were analyzed for beta diversity, pathogen presence, and functional profiling. Household surveys were conducted and used to compare microbial community patterns to household characteristics and pit latrine use patterns. Compared to activated sludge, anaerobic digestion in municipal wastewater systems, and human gut microbiomes, pit latrines were found to contain unique microbial communities. While the microbial community composition as a whole did not vary by sampling depth, pathogen composition varied by sampling depth, location


Introduction
Pit latrines and septic tanks are the most common sanitation systems used in the world, in rural, peri-urban, and urban settings. In 2017, nearly 1.8 billion people around the world used onsite sanitation systems requiring fecal sludge management [1]. Pit latrines are the first step in one of the simplest fecal sludge disposal chains. When properly designed and managed (e.g., fecal sludge is treated and disposed of safely), pit latrines can contribute to meeting the aspirations of Sustainable Development Goal 6. These pits vary in volume and depth, may be lined or unlined, and are filled with a variable combination of excreta, anal cleansing material, flush water, and trash [2][3][4]. Once pits are full, the fecal sludge must be removed, transported, and treated/reused, unless there is space to safely cover and abandon a pit latrine in place and construct a new one [5]. However, the pit latrine itself is not simply a storage technology but also a biological reactor leading to degradation of fecal sludge [5][6][7][8]. These degradation processes affect filling rates (and thus the frequency of emptying and/or covering and abandonment) [9] as well as greenhouse gas emissions and their subsequent impacts on climate change [10].
One way of understanding these biological reactions is through analysis of the microbial communities in pit latrines. Molecular microbial community analysis (e.g., sequencing of 16S rRNA genes) is routinely used by researchers in engineered processes for waste treatment, such as activated sludge and anaerobic reactors [11][12][13]. However, molecular-based microbial studies of pit latrines are limited. Torondel et al. [8] examined pit latrines in Vietnam and Tanzania; Byrne et al. [6] reported the microbial communities in South African pour-flush toilets; Ijaz et al. [14] compared unlined pit latrine microbial communities to latrine fill-up rates in Tanzania; and de los Reyes et al. [15] compared the microbial communities in pour-flush toilets to those of a ventilated improved pit latrine (VIP) in South Africa. These first studies provided insights into the likely methanogenic and sulfate-reduction processes occurring in pit latrines, and into the relationships between environmental conditions and degradation processes.
In previous work, we examined the presence of pathogens at different depths of 55 pit latrines in Malawi [5]. That study included pit latrine sampling and "waste-based epidemiology" (WBE) analysis, that is, surveillance of fecal sludge that may provide insights into public health and community health risk. It used RNA and non-prokaryotic DNA assays targeting a pre-selected list of sanitation-related pathogenic bacteria, viruses, protozoa, and helminths. Assessing microbial communities in pit latrines using 16S rRNA gene sequencing complements this previous analysis, leading to a more in-depth understanding of degradation processes within pits (and therefore latrine fill-up rates) and to insights into community health and risk. In the current study, we analyze the microbial communities of pit latrines in Malawi to gain insights into WBE and various microbial parameters, including methanogenic pathways, potential metabolic functions, presence of pathogens, and correlations with household data and latrine characteristics. We also examine ways to make it easier to sample pit latrines in a study area, which may enable less labor-intensive WBE data collection and improve community health.

Study site
The study was conducted in the peri-urban areas of Mzuzu in northern Malawi and in urban Blantyre in southern Malawi. Mzuzu covers 146 km² and had a population of 221,272; Blantyre covers 240 km² and had a population of 800,264 in the 2018 census [16].

Study design
Quantitative and qualitative methods were used to study 30 below-ground, lined pit latrines in Mzuzu and 25 lined pits in Blantyre. The study evaluated the household health and sanitation situation quantitatively and qualitatively using an in-depth household interview and field observation checklist. Households were selected purposively to include a range of socioeconomic and pit latrine characteristics, and the pit latrines that were selected: 1) had a concrete or wood plank floor, 2) were lined pit latrines, to ensure the safety of personnel during emptying operations, 3) had a reasonably sized squat hole which the Gulper (pit emptying tool) [17] could access for sample collection, 4) had diluted or fairly diluted fecal sludge for easy pumping using a Gulper, and 5) had a fecal sludge content level that was not more than 1 m below the latrine slab floor. Direct observation of the conditions of the latrines was guided by a checklist. The quantitative methods involved sampling and analysis of the selected pit latrines (see below). Samples were collected in November and December of 2018, the beginning of the rainy season.

Collection of household data
In each household, a questionnaire was administered in person by a researcher from the University of Malawi, the Polytechnic (now the Malawi University of Business and Applied Sciences, MUBAS) or Mzuzu University. The questionnaire (S1 Fig) was used to collect basic sociodemographic and health data to assess sanitation access and health for the household. Where available, the height and weight of children in the household were measured and recorded. Detailed pit latrine characteristics are reported where available (S1 Table); however, physico-chemical characteristics, such as pH and chemical oxygen demand, were not measured. Household water sources recorded in the surveys included public faucets, faucets inside community compounds, faucets inside the households, and unprotected water sources.

Sample collection
Three fecal sludge samples were taken from each pit: 1) a surface sample immediately below the squat hole with a disposable hand scoop, 2) a maximum depth sample taken by inserting the Gulper to the bottom of the pit and collecting the first volume of sludge after the full (tube) volume of the Gulper was evacuated, and 3) a mid-point sample taken once approximately half of the pit had been emptied by the Gulper. Samples were collected in triplicate at each depth and placed into 15 mL centrifuge tubes. The Gulper was washed with shallow well water or piped municipal water between sampling of different pits. The samples were immediately stored in a cooler onsite until they could be placed in a 4°C refrigerator; subsequent laboratory processing was performed within 4 days after sample collection.

Laboratory sample processing
The refrigerated tubes were centrifuged at 385 × g for 5 minutes and the supernatant was discarded. A 0.75 mL volume of the pellet was then transferred to a 2 mL centrifuge tube and an equal volume (0.75 mL) of UNEX buffer (Microbiologics, St. Cloud, MN, USA) was added [18]. The mixture was vortexed for 2 seconds to ensure thorough mixing with the buffer. The samples were kept at 4°C prior to shipping to North Carolina State University (Raleigh, North Carolina, USA) at room temperature. Upon arrival in Raleigh, the samples were transferred to a -80°C freezer until DNA extraction was performed.

gDNA extraction, sequencing, and reads processing
To remove the UNEX buffer, 0.75 mL aliquots of each sample were rinsed with 1X PBS three times by vortexing and centrifugation at 10,000 × g for two minutes. After rinsing, microbial gDNA was extracted using the MP Biomedicals FastDNA SPIN Kit for Soil (MP Biomedicals, Inc., USA). Prior work reports the limit of detection for this commercial DNA extraction kit as 10 cells/mL in drinking water [19]. Modified PCR primers 341F and 806R [20,21], targeting the V4 variable region of the 16S rRNA gene, were used in a 30-35 cycle PCR using the HotStarTaq Plus Master Mix Kit (Qiagen, USA) under the following conditions: 95°C for 5 minutes, followed by 30-35 cycles of 95°C for 30 seconds, 53°C for 40 seconds, and 72°C for 1 minute, after which a final elongation step at 72°C for 10 minutes was performed. PCR products were checked in 2% agarose gel to determine the success of amplification and the relative intensity of bands. Samples were multiplexed using unique dual indices and pooled in equal proportions based on their molecular weight and DNA concentrations. Pooled samples were purified using calibrated Ampure XP (Beckman Coulter, Indianapolis, IN, USA) beads. The pooled and purified PCR product was then used to prepare an Illumina DNA library. Sequencing was performed at MR DNA (www.mrdnalab.com, Shallowater, TX, USA) on a MiSeq following the manufacturer's guidelines.
The raw sequences received from MR DNA were processed following the DADA2 [22] pipeline in R. Quality profiles of the forward and reverse reads were inspected and nucleotides below a quality score of ~15 were trimmed. The reads were filtered to allow a maximum of four expected errors. The average number of input reads per sample was 13,756, with an average of 11,721 reads after filtering, an average of 4,142 denoised and merged reads, and finally an average of 3,978 non-chimeric reads per sample. Taxonomy was assigned using the SILVA 16S v132 database to the genus or species level [23]. A phyloseq object was constructed and used for assessing Shannon alpha diversity, Bray-Curtis dissimilarity metrics, and community relative abundances.
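The read counts reported above imply how much of the input survived each processing stage. The short Python sketch below (the study itself used R; the function and variable names are illustrative assumptions) computes the retained fraction at each stage from the reported per-sample averages. Because these are ratios of averages, they only approximate the average per-sample retention.

```python
# Average reads per sample at each DADA2 stage, as reported in the text.
# The stage labels are illustrative names, not DADA2 output column names.
stages = [
    ("input", 13756),
    ("filtered", 11721),
    ("denoised_merged", 4142),
    ("non_chimeric", 3978),
]


def retention(stage_counts):
    """Return {stage: fraction of input reads retained at that stage}."""
    total = stage_counts[0][1]
    return {name: count / total for name, count in stage_counts}


fractions = retention(stages)
for name, frac in fractions.items():
    print(f"{name:>16}: {frac:.1%}")
```

On these averages, roughly 85% of reads pass filtering and about 29% remain after denoising, merging, and chimera removal.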
After sequences were processed and the phyloseq object was created, results were sorted by domain. Bacteria and archaea were assessed separately, and relative abundances were calculated pit-by-pit for each separate depth sampled, such that one sample represents a separate microbial community (relative abundances sum to 1). Archaea relative abundances were calculated at the genus level with a minimum relative abundance of 1%, while bacteria relative abundances were calculated at the order level with a minimum relative abundance of 1%.
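To make the quantities above concrete, the following is a minimal Python sketch (not the authors' R/phyloseq code) of per-sample relative abundance, the 1% minimum-abundance filter, the Shannon index, and Bray-Curtis dissimilarity. The taxon names and counts are hypothetical, and the use of the natural logarithm for the Shannon index is an assumption.

```python
import math


def relative_abundance(counts):
    """Convert {taxon: read count} for one sample into fractions summing to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


def filter_min_abundance(rel, threshold=0.01):
    """Keep taxa whose relative abundance is at least the threshold (1% here)."""
    return {taxon: p for taxon, p in rel.items() if p >= threshold}


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with nonzero abundance."""
    rel = relative_abundance(counts)
    return -sum(p * math.log(p) for p in rel.values() if p > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples given as {taxon: count}."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.values()) + sum(b.values())
    return numerator / denominator


# Hypothetical read counts for two depths of one pit.
surface = {"genus_A": 60, "genus_B": 30, "genus_C": 10}
bottom = {"genus_A": 20, "genus_B": 70, "genus_C": 10}

print(filter_min_abundance(relative_abundance(surface)))
print(round(shannon(surface), 3))
print(bray_curtis(surface, bottom))
```

A Bray-Curtis value of 0 means two samples have identical composition and 1 means they share no taxa, which is why overlapping ordination ellipses indicate similar communities.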

Shared genera among microbial communities
The unique genera from the samples were exported from the phyloseq object in R and compared to previously reported genera present in the core microbiome of activated sludge [12], anaerobically digested biosolids (from municipal sewage treatment) [11,13], and the core microbiome of the human gut [24,25]. Major members of the human gut microbiome were concatenated from two sets of previous human gut microbiome studies [24,25]. The first set of genera was sourced from [25], where genera were considered from genomes with cumulative depth of coverage of ≥10× across all samples and for which a ratio of non-synonymous to synonymous polymorphism rates could be calculated. The second set was from [24], where genera from phyla present in at least 50% of samples, with at least 1% median abundance, were considered. These microbiomes were selected for comparison as they are relevant to pit latrines: activated sludge and anaerobic digesters are human waste treatment systems, while the human gut microbiome is expected to contribute to the microbiome of pit latrines. The soil microbiome was not considered since soil microbiomes tend to have wide variability, and the studied pit latrines were lined (minimizing the infiltration of soil microbes into the pit latrines). Common genera were found for each potential overlapping community (e.g., anaerobic digester and pit latrines, etc.). A Venn diagram was plotted such that a group with more taxa was represented by a larger area.
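The comparison described above amounts to set operations on genus names. The Python sketch below is an illustration only (the study exported genera from R); the placeholder genus names and their memberships in each reference microbiome are hypothetical and do not reflect the actual reference lists in [11-13,24,25].

```python
def shared_and_unique(pit_genera, reference_genera):
    """Return (shared, unique) for a pit latrine genus set.

    shared maps each reference microbiome name to the genera it has in
    common with the pit latrines; unique holds the pit latrine genera
    found in none of the reference microbiomes.
    """
    shared = {name: pit_genera & ref for name, ref in reference_genera.items()}
    all_reference = set().union(*reference_genera.values())
    unique = pit_genera - all_reference
    return shared, unique


# Hypothetical genus sets for illustration.
pit = {"genus_1", "genus_2", "genus_3", "genus_4", "genus_5"}
references = {
    "activated_sludge": {"genus_9"},
    "anaerobic_digester": {"genus_1", "genus_2", "genus_8"},
    "human_gut": {"genus_2", "genus_7"},
}

shared, unique = shared_and_unique(pit, references)
for name, genera in shared.items():
    print(name, sorted(genera))
print("unique to pit latrines:", sorted(unique))
print(f"unique fraction: {len(unique) / len(pit):.0%}")
```

The same intersections give the region sizes needed for an area-proportional Venn diagram.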

Genera function determination
Using the MiDAS 4 database [26], metabolic functions were assigned to 43% (95 genera) of the genera found in the pit latrines. Each genus in the MiDAS database was assigned "no", "yes", or "variable" for each metabolic function listed in the database. For each metabolism for which a genus received a "yes", that genus was assigned a value of 1 for that metabolic function. The number of reads for each genus at each depth was summed. Then, using the metabolic function data for the genera, the number of reads capable of a metabolic function was summed and compared to the total number of reads for a specific depth to calculate the percentage of metabolic functions at each depth.
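The calculation above can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the genus names, function labels, and read counts are hypothetical, and the choice to use only reads from annotated genera as the denominator is an assumption (the text does not state whether unannotated reads were counted in the depth total).

```python
def functional_percentages(reads_by_genus, functions_by_genus):
    """Percentage of annotated reads at one depth capable of each function.

    reads_by_genus: {genus: read count at this depth}
    functions_by_genus: {genus: {function: "yes" | "no" | "variable"}}
    Only a "yes" call counts toward a function; "variable" and "no" do not.
    """
    annotated_total = sum(
        n for genus, n in reads_by_genus.items() if genus in functions_by_genus
    )
    if annotated_total == 0:
        return {}
    capable = {}
    for genus, reads in reads_by_genus.items():
        for function, call in functions_by_genus.get(genus, {}).items():
            capable.setdefault(function, 0)
            if call == "yes":
                capable[function] += reads
    return {f: 100.0 * n / annotated_total for f, n in capable.items()}


# Hypothetical data: genus_4 has no database entry and is excluded.
reads = {"genus_1": 60, "genus_2": 30, "genus_3": 10, "genus_4": 100}
functions = {
    "genus_1": {"fermentation": "yes", "methanogenesis": "no"},
    "genus_2": {"fermentation": "variable", "methanogenesis": "yes"},
    "genus_3": {"fermentation": "yes", "methanogenesis": "no"},
}

pct = functional_percentages(reads, functions)
print(pct)
```

Because a genus can carry several functions, the percentages across functions need not sum to 100.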

Identification of pathogens
Pathogens were assessed using the 16SPIP analysis pipeline, which contains a database of 346 16S rRNA clinical samples of bacterial pathogens of human health concern [27]. Reads that were at least 99% identical to the reference database were mapped to the species level. From this, a separate phyloseq object was created to separately investigate detected pathogens in the pit latrine samples.

Richness by sampling effort
To inform cost-effective and feasible community monitoring plans, we examined two dimensions that represent levels of sampling effort: 1) sampling only the surface level versus sampling all depths; and 2) the number of latrines sampled. First, we examined the distribution of taxa richness, defined as the number of unique genera present [28], in the surface-level sample when the sampling effort was one latrine by creating a boxplot of the richness across the fifty-five latrines. Next, we calculated the taxa richness when the sampling effort was equal to two latrines by randomly choosing two of the fifty-five latrines. This process was repeated 1,000 times (there are 55*54/2 = 1,485 possible ways to choose two latrines). We repeated this procedure, increasing the sampling effort in increments of one latrine, until the richness reached its maximum and stabilized. We then repeated this step when pooling surface, mid-point, and maximum depth samples for a given latrine. We also conducted the analysis stratified by site to determine whether different sites may require different sampling efforts.
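The resampling procedure above can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the authors' code: latrines are drawn without replacement within each draw, the random seed is arbitrary, and the toy latrine contents are hypothetical.

```python
import random
from math import comb
from statistics import median


def pooled_richness(latrines, chosen):
    """Number of unique genera across the chosen latrines."""
    genera = set()
    for latrine_id in chosen:
        genera |= latrines[latrine_id]
    return len(genera)


def richness_by_effort(latrines, max_effort, n_draws=1000, seed=0):
    """Return {effort: list of richness values} for efforts 1..max_effort.

    Effort 1 uses every latrine once (the distribution shown as a boxplot);
    larger efforts use n_draws random subsets of that many latrines.
    """
    rng = random.Random(seed)
    ids = list(latrines)
    results = {1: [len(latrines[i]) for i in ids]}
    for effort in range(2, max_effort + 1):
        results[effort] = [
            pooled_richness(latrines, rng.sample(ids, effort))
            for _ in range(n_draws)
        ]
    return results


# Number of distinct pairs among 55 latrines, as quoted in the text.
print(comb(55, 2))

# Hypothetical genus sets for four latrines.
toy = {
    "pit_1": {"g1", "g2"},
    "pit_2": {"g2", "g3"},
    "pit_3": {"g4"},
    "pit_4": {"g1", "g5"},
}
result = richness_by_effort(toy, max_effort=4, n_draws=200, seed=1)
for effort, values in result.items():
    print(effort, median(values), min(values), max(values))
```

To compare surface-only sampling with all-depth sampling, each latrine's genus set would be either its surface sample alone or the union of its three depth samples before calling the function.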

Statistical and multivariate analyses
Non-metric multidimensional scaling (NMDS) ordination was completed for each domain (Archaea and Bacteria) separately and together with Euclidean and Bray-Curtis dissimilarity metrics, using the previously described phyloseq object containing sequence data, metadata, and taxonomy. Samples were assessed based on the sampling depth and location, with 95% confidence level statistical ellipses drawn around each sampling layer and location separately. Similarly, principal coordinates analysis (PCoA) ordination was completed for the pathogen-specific sample set. Ordination analyses were completed using the R package "phyloseq" [29]. One sample (Mzuzu pit 5, surface layer sample) was initially assessed in ordinations but was later removed because it was an outlier.
After confirming that the data were non-normally distributed, Kruskal-Wallis testing was performed to test the relationship between per-sample alpha diversity values and measures of interest using the "stats" R package, with a p-value less than 0.05 considered statistically significant. Permutational multivariate analysis of variance (PERMANOVA) was used to test whether there were differences between groups using the "vegan" R package [30], with a p-value less than 0.05 considered statistically significant and the percent of variance explained represented by R². To correct for multiple comparisons, p-values were corrected using the false discovery rate [31,32]. Inputs to the PERMANOVA testing were ASV and metadata tables.
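As an illustration of the multiple-comparison correction, the Python sketch below implements the Benjamini-Hochberg step-up adjustment. The text does not name the specific false discovery rate procedure, so the choice of Benjamini-Hochberg is an assumption; it corresponds to `p.adjust(p, method = "BH")` in R's "stats" package. The example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four tests.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print([round(p, 4) for p in adjusted])
print([p < 0.05 for p in adjusted])
```

In this toy case, three raw p-values fall below 0.05 but only one adjusted value does, which shows why a variable can look significant before correction and not after.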

Ethics
This study received ethical approval and a Material Transfer Agreement from the Republic of Malawi National Health Sciences Research Committee (Protocol 17/09/1915). Participation was on the basis of informed, written consent by a household member over the age of 18.

Microbial community analyses and comparisons to similar biological systems
Bacteria populations were dominated by fermenters, such as Clostridium sensu stricto 1 (Clostridiales order), at each depth and location (Figs 1 and 2). Faecalibacterium, a sugar fermenter [33], was detected mostly in the surface layers of pit latrine samples. Mzuzu pits contained Succinivibrio, with an increased relative abundance at the surface layers, while Blantyre pits did not contain Succinivibrio at any depth. At both sampling locations, Proteiniphilum, an acetogen [34], was detected at every depth, with slightly higher relative abundance in surface layers. On the other hand, while Romboutsia, an acetogen and sugar and amino acid fermenter [26], was detected in both locations at each depth, a higher relative abundance was found at the maximum depth layers compared to the mid-point and surface layers.
The methanogenesis pathway(s) associated with each archaeal genus (from previous physiological studies) were overlaid onto the calculated relative abundance stacked bar plots (Fig 3). The dominant methanogenesis pathway is mostly pit dependent, but where the pathway shifts within a pit, it shifts from hydrogenotrophic methanogenesis to acetoclastic methanogenesis with increasing depth. Overall, there was a higher relative abundance of hydrogenotrophic archaea in the surface layer samples and an increasing relative abundance of acetoclastic archaea with depth. The pit latrines located in Mzuzu contained more methylotrophic and acetoclastic dominant methanogens than those in Blantyre (Fig 3). The Shannon alpha diversity of Archaea in Mzuzu pits was lower than that in Blantyre pits (S2 Fig). NMDS ordination analysis of Bacteria and Archaea (Fig 4) shows no clear clustering by sample depth, and the calculated 95% confidence ellipses overlap considerably. The surface, mid-point, and maximum depth layers are thus similar to each other when considering the whole microbial community. NMDS ordination by location (Fig 5) shows slight clustering by sampling location, with Mzuzu samples tending to cluster closer together and therefore being more similar to each other than Blantyre samples, regardless of depth. However, the 95% confidence ellipses are broadly overlapping.
Because there are very few microbial community analyses of pit latrines, we compared the pit latrine genera with those from relevant microbiomes (activated sludge, anaerobic digestion of municipal sewage sludge, and the human gut). The majority (55%) of the genera in Malawi pit latrines were unique compared to these related microbiomes. The highest number of shared genera was with municipal sewage sludge anaerobic digesters, and about half of human gut genera were found in pit latrines (S3 Fig). In this analysis, the activated sludge core microbiome shared no genera with the pit latrine microbiome, the human gut core microbiome, or the microbiome of anaerobic digestion of municipal sewage sludge. Of the 124 detected genera that were unique to the pit latrine samples, 28 (23%) were described in the MiDAS database. There was a high number of taxa associated with anaerobic digestion, including methanogens, fermenters, and metabolizers of sugars, short-chain fatty acids, and proteins/amino acids.
For functional profiling of the pit latrine microbial communities, genera that were not in the MiDAS 4 database were not included. As depth increased, so did the proportion of methanogens, protein/amino acid metabolizers, acetogens, short-chain fatty acid metabolizers, and chemoautotrophs/mixotrophs (Fig 6). On the other hand, with increased depth, the proportion of aerobic heterotrophs, sugar metabolizers, and fermenters decreased (Fig 6).

Pathogens and public health implications
Fifty-nine presumptive human pathogens were detected across the 55 pit latrines sampled (S4 Fig). Of these fifty-nine, six (Clostridium difficile, pathogenic Escherichia coli, Shigella boydii, Shigella dysenteriae, Shigella flexneri, Shigella sonnei) were also detected in a previous pathogen-specific study of these pit latrine samples using PCR analysis [5]. This previous study also targeted viruses, protozoa, and helminths in addition to bacteria, while the current study only targeted bacterial pathogens. More pit-to-pit variation of pathogen composition than within-pit variation was found (S4

Guidance for sampling pit latrines
Richness by sampling effort. There were 223 unique taxa (genera) across all latrines and sites. When sampling one to four pit latrines, the median number of taxa when sampling all the depths was notably higher, and the interquartile ranges smaller, compared to when only the surface layer was sampled (Fig 7). For both the surface layer and all layers, as the number of pit latrines sampled increased, the median number of taxa found also increased and the interquartile range decreased until the median richness reached the maximum richness at eight latrines with a very small interquartile range. At a sample size of eight pits, nearly all taxa were found regardless of depth. This finding suggests that sampling only eight pits at the surface layer is sufficient for estimating genus-level richness across depths. We also conducted a Kruskal-Wallis test of the effect of sampling layer on microbial community richness when pooled across all latrines, and sampling layer was not statistically significant (p-value = 0.91). However, these results may change depending on the season during which sampling occurs, and seasonal effects need further study.
Richness sample modeling at each location (Fig 8) showed that when eight pits were sampled, most taxa were found regardless of the sampling location. When pooled across all latrines, the effect of site on microbial community richness was not statistically significant (p-value = 0.82, Kruskal-Wallis test).

Correlations with household data
Whole microbial community. Only the sampling location (Blantyre or Mzuzu) had a statistically significant effect on Shannon alpha diversity within pit latrines (Kruskal-Wallis test, Table 1). PERMANOVA testing of Bray-Curtis beta diversity indicated that two variables, water source and whether or not the pit latrine had a slab, significantly impacted the beta diversity between pit latrines (Table 1). The water source explained 3% of the variation; whether the pit latrine had a concrete slab or not explained 1% of the variation. Overall, 88.5% of the differences between samples were unexplained by the household data collected. No single characteristic had a statistically significant impact on both the microbial diversity within pit latrines (alpha diversity) and that between pit latrines (beta diversity). When assessing only Archaea, for which only methanogens were detected, household population had a statistically significant impact on alpha diversity, but no household factors had a significant impact on beta diversity (Table 1).
Pathogens only. Using PERMANOVA to assess the relationship between beta diversity of only the detected pathogens and household factors, three factors had a statistically significant impact (Table 1). Location (Blantyre or Mzuzu), the household's water source, and sampling depth statistically significantly impacted the detected pathogen compositions of the tested pit latrines (Table 1). Sampling depth explained 4.1% of the variation, location explained 1.6% of the variation, and household water source explained 3.5% of the variation among samples. Overall, 79% of the differences observed between pathogen compositions of the pit latrine samples could not be explained by the household data available.

Microbial community analysis and comparisons to similar biological systems
Pit latrine contents are composed of a combination of fresh excreta, degraded excreta (fecal sludge), anal cleansing materials/water, and, potentially, trash and other unwanted materials   unique compared to other biological treatment systems, and not yet well understood. This finding contrasts with previous results where four dominant phyla found in untreated sewage were shared with phyla found in pit latrines [8]. However, our study assessed the microbial community composition at a finer taxonomic resolution (genus level) and did not find such an overlap between the core activated sludge microbial community [12] and the genera in our studied pit latrines. Given the wide variety of microbial metabolisms within phyla, microbial community overlap at the phylum level is not specific enough to draw conclusions about microbial community characteristics. Finer taxonomic resolution, such as at the genus, species, or ASV/OTU level, is more useful for these analyses. Since pit latrines are anaerobic below the immediate surface layer of fresh excreta, we expected the presence of anaerobic microorganisms and anaerobic functions [35]. While the taxa overlap between municipal sewage sludge anaerobic digesters and our studied pit latrines was lower than expected, the pit latrines still shared the most taxa with the anaerobic digester microbiome (S3 Fig), and all microbial metabolisms necessary for anaerobic digestion were found in the pit latrines (Fig 6), consistent with previous results [35]. Anaerobic digestion relies on a symbiotic microbial community capable of hydrolysis, acidogenesis, acetogenesis, and methanogenesis. Microorganisms that can carry out each of these functions were found in the pit latrine samples, with relative abundances varying by depth (Fig 6).
As depth increased, so too did the proportion of methanogens, protein/amino acid metabolizers, acetogens, short-chain fatty acid metabolizers, and chemoautotrophs/mixotrophs (Fig 6). Amino acid metabolism is necessary in anaerobic digestion to regulate ammonia concentrations and prevent inhibition, which we expect to be occurring at least to some extent in the pit latrines [35]. Short-chain fatty acids also must be metabolized during anaerobic digestion to either acetic acid or H₂ and CO₂. This relationship between metabolizer proportions and pit latrine depth is consistent with the anaerobic digestion pathway in that as the proportion of methanogens increases, so too do the proportions of amino acid metabolizers, short-chain fatty acid metabolizers, and acetogens. On the other hand, with increased depth, the proportion of aerobic heterotrophs, sugar metabolizers, and fermenters decreased (Fig 6). This finding is consistent with the expected decrease in oxygen concentration and carbohydrates with depth [36]. The populations present also suggest that sugar metabolism and fermentation occur mostly in the upper layers and decrease with depth. This layering of function is consistent with the expected age of the material; newer material is expected at the surface layer, and the substrate age increases with depth. These microbial data bolster the conceptual model of microbial processes within pit latrines put forth by Foxon and Buckley [7], which posits aerobic-to-anaerobic layering from the top to the bottom of the pit. The model also relies on the assumption that pit latrine contents are not mixed but stratify based on when material is deposited. This study provides, for the first time, functional microbiological analyses that support the conceptual model.
Methanogens, members of the Archaea, produce methane through various methanogenesis pathways: acetoclastic, hydrogenotrophic, and methylotrophic. By assessing the Archaea and Bacteria domains separately, we can deduce the potential dominant methanogenesis pathway(s) for each pit latrine (Fig 3). No methanotrophs were found, indicating that methane was not consumed through methanotrophy. Mostly, each pit latrine was dominated by a single pathway (e.g., methane production through acetate or through H₂/CO₂), which is consistent with previous work that found variations in physical and chemical properties between pit latrines [37]. The dominant archaeal populations show that if a shift in methanogenesis pathway dominance occurred within a pit, it shifted from hydrogenotrophic to acetoclastic with increasing depth. A study on municipal anaerobic digestion found negative correlations between acetoclastic methanogenic families and total ammonia, free ammonia, and volatile fatty acids [38]. Although it was not possible to obtain chemical data for these pit latrine samples, the population shift suggests a gradient of acetic acid concentrations that increases with depth and/or a gradient of ammonia that decreases with depth [38].

Pathogens and public health implications
A targeted pathogen-specific study on these pit latrine samples assessed the presence of 20 pathogens, defined a priori, of which 17 were found [5]. These assays captured pathogenic bacteria, viruses, protozoa, and helminths, while our 16S rRNA sequencing could only capture bacteria. Of the 17 pathogens detected previously, 16S rRNA sequencing indicated the presence of 6, and an additional 49 pathogens that were not included in the pathogen-specific assessments [5]. The results indicate that more specific detection approaches should include other pathogens; 16S rRNA sequencing does not require a specific target a priori and thus can reveal non-targeted pathogens, but it is less sensitive for pathogen detection than other approaches such as the PCR-based TAC platform [5]. Assessing pathogen microbial communities complements the targeted pathogen approach by investigating the relative abundance of pathogens and detecting non-targeted pathogens (S4 Fig). From PCoA ordination and PERMANOVA analysis, both location and sampling depth were statistically significant factors contributing to differences between pathogen compositions of the pit latrine samples (Table 1, S5 and S6 Figs). The effect of location and sampling depth was observed only among the pathogens and was not present in the whole microbial community assessment. On the other hand, water source was a statistically significant factor for the whole microbial communities as well as the pathogens (Table 1). This is contrary to the pathogen-specific study on these pit latrines, which concluded that household size, but not water source, sampling depth, or location, was a statistically significant factor in pathogen detection [5]. The previous study concluded that a one-person increase in household size was associated with an increased likelihood of pathogen detection [5], while our pathogen community relative abundance approach found no significant relationship with household population size.
At the whole-community level, fewer household factors had a statistically significant effect on beta diversity than when only pathogens were considered, using PERMANOVA testing. Further, household drinking water source was the only factor that significantly affected both whole-community and pathogen-only beta diversity, suggesting that the factors affecting only pathogens are separate from those affecting whole microbial community composition.

Guidance for public health sampling of pit latrines and household correlations
There were no statistically significant differences among microbial communities between sampling layers (Table 1). Previous studies have reported more variability of physical and chemical properties of pit latrines at the surface compared to deeper layers [39], a variability we did not observe in the microbial communities. Additionally, Ijaz et al. [14] found changes in microbial communities with depth, with more gut-associated taxa at layers closer to the surface and more environmental-associated taxa with pit latrine depth. These differences may be due to taxa resolution, as this study is the first, to our knowledge, to report genus-level microbial community characteristics of lined pit latrines. Our findings suggest that sampling of lined pit latrines can be done at any layer and will be representative of the entire pit latrine microbial community.
While sampling layer did not impact the microbial community composition, sampling location did have a statistically significant effect on the alpha diversity but not the beta diversity of the pit latrines (Table 1).This implies that there may be more within pit variation in one location than another.However, richness by sampling effort indicates that these differences are mitigated after eight pits have been sampled, regardless of sampling location (Fig 8).Further, location had a statistically insignificant impact on sample richness of the pit latrines (pvalue = 0.82).Despite apparent clustering by location (Fig 5), PERMANOVA analysis revealed statistically insignificant effects of location on beta diversity, suggesting that pit latrine microbial communities are similar across sampling locations.These results suggest that when looking at two cities the same country, Blantyre and Mzuzu, pit latrine sampling of microbial communities can be done in either location and results are applicable to both locations, if a minimum number of pits are sampled.
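
The richness-by-sampling-effort reasoning above can be illustrated with a short resampling sketch in plain Python. This is an assumption-laden illustration rather than the analysis code behind Figs 7 and 8: the function names, the number of random draws, and the toy taxa are hypothetical. For each effort level k, it repeatedly draws k pits without replacement, pools the taxa observed in their sampled layers, and averages the pooled richness.

```python
import random


def richness(samples):
    """Number of distinct taxa observed across a collection of samples."""
    seen = set()
    for taxa in samples:
        seen.update(taxa)
    return len(seen)


def richness_by_effort(pits, max_effort=8, draws=200, seed=0):
    """Mean pooled richness when k pits are drawn without replacement.

    pits: dict mapping a pit identifier to a list of taxa sets,
          one set per sampled layer (hypothetical input layout).
    Returns a dict {k: mean richness} for k = 1..max_effort.
    """
    rng = random.Random(seed)
    pit_ids = sorted(pits)
    curve = {}
    for k in range(1, min(max_effort, len(pit_ids)) + 1):
        total = 0
        for _ in range(draws):
            chosen = rng.sample(pit_ids, k)
            pooled = [layer for pid in chosen for layer in pits[pid]]
            total += richness(pooled)
        curve[k] = total / draws
    return curve


if __name__ == "__main__":
    # Toy data: three pits with one or two sampled layers each.
    toy_pits = {
        "pit1": [{"Clostridium", "Bacteroides"}, {"Bacteroides", "Methanosaeta"}],
        "pit2": [{"Methanosaeta", "Prevotella"}],
        "pit3": [{"Clostridium", "Escherichia"}],
    }
    print(richness_by_effort(toy_pits, max_effort=3))
```

Restricting each pit's list to its surface-layer set, or restricting `pits` to one city, would mimic the per-depth and per-location panels described in the figure captions.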
Water source and pit latrine type were the only household factors that statistically significantly impacted beta diversity of pit latrine samples (Table 1) and explained 3% and 1% of the variation between samples, respectively.Therefore, when sampling pit latrines, it is important to sample pit latrines representative of each water source and slab type.Further, no household or sampling characteristics significantly impacted both alpha and beta diversity of the microbial communities, indicating that factors influencing microbial community variation within pit latrines are different than those that influence microbial community variation between pit latrines.
This study extends wastewater-based epidemiology that has gained popularity as a tool to collect health surveillance data, particularly COVID-19 data in a community, to a "wastebased epidemiology" approach that includes sampling fecal sludge from shared latrines, tanker trucks, disposal sites representing wastes from many latrines, or representative sampling from multiple households [5].Sampling fecal sludge increases the coverage to communities where risks may be highest [40], but would be neglected by traditional wastewater-based epidemiology that requires centralized sewer systems.However, privacy concerns need to be carefully considered, since health indicator data can be collected at a less anonymized level.Thus, it is imperative to use best practices to eliminate or substantially reduce risks to violations of privacy.The advantages and disadvantages of "waste-based epidemiology" need to be further studied and careful consideration and development of best practices are needed to apply this approach in particular contexts.

Study limitations
There are some important limitations in this study. First, as with the pathogen-specific study sample collection methods, there may have been inadvertent mixing of the sludge when removing material from the pit latrines. Additionally, the Gulper collection device could not be sterilized between collecting the maximum depth and mid-point samples, and these layers were not discrete. However, to date there is no ideal method to collect fecal sludge from pit latrines at various depths without disturbing some of the sludge surrounding the collection device. Although there was not an opportunity to collect explicit time series data, sampling depth can be used as a surrogate for time. Further, it was not possible to collect detailed chemical properties of the samples. Also, DNA data can only detect the presence of organisms and do not indicate the activity or viability of the organisms of interest. rRNA gene detection is also susceptible to biases throughout the extraction and analysis workflow, and the microorganism lower limit of detection for these specific samples is unknown. Finally, assessing pathogens using 16S rRNA gene sequencing is limiting because some pathogens are not detectable with this method, it is not always possible to distinguish possible pathogens from closely related non-pathogenic microbes, and our 16S rRNA approach could only target bacteria and archaea. Therefore, a more targeted pathogen assessment [5] in combination with 16S rRNA gene analysis is suggested.

Fig 2. Relative abundance of Bacteria (order level) in each pit latrine sampled at three depths in Blantyre. https://doi.org/10.1371/journal.pwat.0000171.g002

Across all samples, the pathogen composition in Blantyre pit latrines was more similar than that in Mzuzu pit latrines (S5 Fig), like what was observed at the

Fig 3. Relative abundance by genera of Archaea communities in each pit latrine sampled at three depths in Blantyre (A) and Mzuzu (B). Blue colors indicate genera that primarily use hydrogenotrophic methanogenesis, red indicates Methanosaeta, the only genus that primarily uses acetoclastic methanogenesis, purple colors indicate genera that can use hydrogenotrophic or acetoclastic methanogenesis, and orange/yellow colors indicate genera that use methylotrophic methanogenesis. https://doi.org/10.1371/journal.pwat.0000171.g003

Fig 4. NMDS ordination for the bacteria and archaea domains using Bray-Curtis distance metrics. Statistical ellipses corresponding to depths are drawn with their respective colors with 95% confidence levels. Pit numbers are displayed next to each data point. https://doi.org/10.1371/journal.pwat.0000171.g004

Fig 5. NMDS ordination for the bacteria and archaea domains using Bray-Curtis distance metrics. Statistical ellipses corresponding to location are drawn with their respective colors with 95% confidence levels. Pit numbers are displayed next to each data point. https://doi.org/10.1371/journal.pwat.0000171.g005

Fig 7. Richness by sampling effort (1 to 8 latrines) by depth when resampling from 55 pits. Maximum richness (223 taxa) was achieved for nearly all resamplings of more than eight latrines, which are therefore not shown. The top panel shows results if only the surface layer sample from each pit latrine is used to estimate richness, while the bottom panel shows results if the samples from all three depths are considered. https://doi.org/10.1371/journal.pwat.0000171.g007

Fig 8. Richness by sampling effort (1 to 8 latrines) by location when resampling from 25 (Blantyre) or 30 (Mzuzu) pits. Maximum richness (223 taxa) was achieved for nearly all resamplings of more than eight latrines, which are therefore not shown. The top panel shows the results if only samples from Blantyre pit latrines were used to estimate richness, while the bottom panel shows the results if only samples from Mzuzu pit latrines were used. https://doi.org/10.1371/journal.pwat.0000171.g008

Table 1. Statistical comparisons between alpha diversity, beta diversity, and household survey factors compared to whole microbial community compositions, pathogens only, and Archaea only.
Columns: Whole microbial community, Archaea only, Pathogens only.
p-values adjusted for multiple comparisons are reported; statistically significant results are in bold (p < 0.05). https://doi.org/10.1371/journal.pwat.0000171.t001