Population infection estimation from wastewater surveillance for SARS-CoV-2 in Nagpur, India during the second pandemic wave

Wastewater-based epidemiology (WBE) has emerged as an effective environmental surveillance tool for predicting severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) disease outbreaks in high-income countries (HICs) with centralized sewage infrastructure. However, few studies have applied WBE alongside epidemic disease modelling to estimate the prevalence of SARS-CoV-2 in low-resource settings. This study aimed to explore the feasibility of collecting untreated wastewater samples from rural and urban catchment areas of Nagpur district, to detect and quantify SARS-CoV-2 using real-time qPCR, to compare geographic differences in viral loads, and to integrate the wastewater data into a modified Susceptible-Exposed-Infectious-Confirmed Positives-Recovered (SEIPR) model. Of the 983 wastewater samples analyzed for SARS-CoV-2 RNA, we detected significantly higher sample positivity rates, 43.7% (95% confidence interval (CI) 40.1, 47.4) and 30.4% (95% CI 24.66, 36.66), and higher viral loads for the urban compared with rural samples, respectively. The Basic reproductive number, R0, positively correlated with population density and negatively correlated with humidity, a proxy for rainfall and dilution of waste in the sewers. The SEIPR model estimated the rate of unreported coronavirus disease 2019 (COVID-19) cases at the start of the wave as 13.97 [95% CI (10.17, 17.0)] times that of confirmed cases, representing a material difference in cases and healthcare resource burden. Wastewater surveillance might prove to be a more reliable way to prepare for surges in COVID-19 cases during future waves for authorities.


Introduction
Wastewater-based epidemiology (WBE) has emerged as a valuable and cost-effective strategy for monitoring the prevalence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) within communities and predicting disease outbreaks [1,2].This approach capitalizes on the detection of SARS-CoV-2 RNA in wastewater samples and has been widely employed using samples obtained from wastewater treatment plants (WWTPs) in nations with centralized sewage networks [1,2].While initially applied in countries with centralized sewage networks, predominantly through wastewater treatment plant (WWTP) samples [3,4], the applicability of WBE has transcended geographical constraints, encompassing a diverse range of sources such as river water, airport wastewater, hospital effluents, marketplaces, and municipal drains [5][6][7][8].
This research endeavours to explore the feasibility of a cross-sectional wastewater-based sampling strategy aimed at detecting and quantifying SARS-CoV-2 viral loads in untreated wastewater within the Nagpur district, located in Maharashtra, Central India.Notably, the sampling done for this study coincides with the second wave of the COVID-19 pandemic in India in 2021, marked by an unprecedented surge in transmission and heightened disease impact.However, the comprehensive integration of WBE into disease surveillance systems, particularly in low-and middle-income countries (LMICs), is limited due to inadequate centralized sanitation facilities [3,4].One of the major reasons for this underutilization of WBE in LMICs, despite its huge potential, is that in such countries centralized sanitation facilities are often lacking [7].
In the course of carrying out this research, though there are several related works [9][10][11][12][13], one seminal study in the field of WBE that this research utilised was conducted by McMahon et al. [9].Their study investigates the use of wastewater samples to monitor community-level transmission of SARS-CoV-2, the virus responsible for COVID-19.The authors employ a Susceptible-Exposed-Infectious-Recovered (SEIR) model to estimate the number of infected individuals based on SARS-CoV-2 RNA concentrations detected in wastewater.Via their rigorous analysis, McMahon et al. [14] demonstrate the utility of the SEIR model in predicting infections by considering various parameters such as transmission rates and viral shedding dynamics.In addition, their work introduces a simplified equation that aids in estimating infections from wastewater data, enhancing the accessibility of the model's application.The study's use of Monte Carlo simulations further strengthens the accuracy of predictions, revealing a notable discrepancy between estimated infections and confirmed cases, thus highlighting the potential value of the SEIR model in informing public health strategies [14].
To this end, the integration of wastewater-based estimates complements traditional clinical testing and bolsters the accuracy of surveillance efforts, especially in resource-constrained settings where extensive clinical testing might be challenging [15].Thus, by integrating data from wastewater samples with demographic information and clinical data, a model is proposed which generates robust estimates of the number of COVID-19 infections within a given population.Crucially, this approach provides a comprehensive perspective on viral transmission dynamics, assisting public health officials in understanding the disease's impact on a broader scale.The imperative of this study is to develop wastewater-based surveillance systems in LMICs, particularly those with resource limitations and complex infrastructural challenges and underscores the necessity of adapting WBE to a broader global context [16,17].In these settings, the translation of wastewater surveillance data into effective public health tools requires the integration of mathematical models and simulations.
To address these challenges, the adaptation of mathematical models is crucial.The use of a modified version of SEIR modelling and Monte Carlo simulation (MC) in this study is motivated by the ability to effectively capture and analyze the transmission dynamics of infectious diseases, such as SARS-CoV-2.The SEIR model and MC simulation have established themselves as valuable tools in epidemiological research because of their ability to provide insights into the complex systems involved in infection transmission, population dynamics, and uncertainty analysis.The SEIR compartment model forms the foundation for understanding disease transmission dynamics [18][19][20][21][22].The SEIR model categorizes individuals into different compartments based on their disease status, encompassing susceptible, exposed, infectious, and recovered individuals.This model enables the estimation of disease prevalence over time, aiding in the interpretation of wastewater surveillance data and its linkage to community infection dynamics.MC simulations, on the other hand, are a robust computational technique used to account for uncertainties and variations in parameters.By generating multiple simulations with randomly sampled inputs, MC simulations enable the exploration of a range of possible outcomes.This is particularly valuable in epidemiological studies where factors such as contact rates, transmission probabilities, and intervention effects can vary or are uncertain.MC simulations provide a way to quantify the uncertainty associated with model predictions, helping researchers understand the potential variability in their results [14,[23][24][25][26][27].
This research initiative represents a pioneering effort in the Indian context, harnessing the SEIPR model and MC simulations to illuminate the transmission patterns of SARS-CoV-2 through wastewater.By addressing critical knowledge gaps within LMICs and regions confronting infrastructural limitations, this study contributes not only to scientific advancement but also furnishes actionable insights for policy formulation and disease mitigation.Amidst the complex landscape of the COVID-19 pandemic, this endeavour augments the global repository of knowledge, empowering communities and authorities alike to respond effectively to this ongoing public health challenge.
In this study, we explored the feasibility of conducting a cross-sectional wastewater-based sampling study for the detection, determination, and comparison of SARS-CoV-2 viral loads from untreated wastewater in urban and rural areas of Nagpur district, Maharashtra, Central India.We selected our sampling period during the second wave of COVID-19 in India in 2021.We next developed a modified version of the SEIR compartment mathematical model that has been frequently used to model COVID-19 dynamics in different populations [18,19,22], herein termed the "SEIPR model" to predict the number of infected individuals within specific Nagpur district partitioned zones and the total urban population under study.After predicting the number of infected individuals, the estimates were used to perform Monte-Carlo simulations to model the variations in the concentration of SARS-CoV-2 RNA in wastewater over time.These modelled changes were then compared to the actual measurements recorded to evaluate the accuracy of our SEIPR model.The urban incident COVID-19 cases were also used to calculate the basic reproduction number R 0 based on the SEIPR model.This data was correlated to air temperature, relative humidity (a loose proxy for rainfall as we did not have the precise precipitation data), and population density to enhance epidemiological understanding of environmental and human factors that may impact SARS-CoV-2 transmission dynamics in Central India.To the best of our understanding, this is the first Indian report that has employed the SEIPR model to measure the transmission patterns of SARS-CoV-2 through wastewater.This study could prove valuable for local authorities and government officials as it provides important insights to make well-informed policy decisions.

Wastewater sampling, SARS-CoV-2 detection, and quantification
Untreated (raw) wastewater samples were collected prospectively from the drainage systems in the Nagpur district of Maharashtra, India, during the second wave of the COVID-19 pandemic from January 31st to July 9th, 2021.Nagpur district is divided into 13 rural talukas and the Nagpur urban region, governed under Nagpur Municipal Corporation (NMC).The Nagpur urban region is further divided into ten municipality zones with each further divided into municipal wards.Individual grab samples were collected from sewers within each urban municipality zone as well as open drains/groundwater sources of rural talukas representing the complete Nagpur district, as illustrated in Fig 1 right panel (urban taluka) and left panel (rural talukas in relation to urban taluka).Each sample (1000 mL) was collected in sterile widemouth autoclaved plastic bottles sealed in plastic bags and transported under a cold chain at 4˚C within 18-24 hours.All sampling was conducted during the morning hours between 07:30 to midday using appropriate COVID-19 precautions.Samples were transported to Dr B. Lal Institute of Biotechnology, Jaipur, for pre-processing, RNA extraction and SARS-CoV-2 detection by RT-qPCR, as previously described [28].No specific permits were required for this study for field site access.We have only informed NMC regarding this study.Detailed sample processing methodology is presented in S1 Appendix.within the ten different municipality zones in urban Nagpur were obtained from the health department of the NMC.

Epidemiological modelling and estimation of infected individuals
We based our study of the transmission of SARS-CoV-2 infections on a deterministic ordinary differential equation (ODE) disease model in which the individuals in an entire population can present in five mutually exclusive compartments according to their disease status and other measures.These compartments are susceptible, exposed, infectious, confirmed positive and recovered, abbreviated as the SEIPR model, which is a modification of the SEIR model and that described by Acheampong et al. [20], where additional compartments were given to reflect the Ghanaian environment [20].We denote the proportion of susceptible individuals by S(t), the proportion of asymptomatic infected individuals by E(t), the proportion of symptomatic infectious individuals by I(t), the proportion of confirmed positive infectious individuals by P(t) and the proportion of recovered individuals by R(t).It must be noted here that individuals in the confirmed-positive class are carriers of the SARS-CoV-2 virus who have had clinical confirmation of this status.However, individuals in an infectious class show clear symptoms and have high infectivity but have not yet been clinically confirmed positive.Notably, as highlighted by Acheampong et al. [20], individuals classified within the infectious class I (t) represent an abstract concept that is often unmeasurable.This underscores the significance of introducing a compartment like the confirmed-positive class P(t), enabling comparison with the actual reported cases within the population.The SEIPR model was applied to study COVID-19 dynamics in ten zones within Nagpur's urban area.Each zone operates independently.Disease transmission is driven by a force of infection (λ), determined by the effective contact rate per day (β 1 ) and reductions in transmissibility for exposed (β 2 ) and confirmed positive (β 3 ) individuals.Disease-induced deaths are assumed to only occur within the infectious (I) and confirmed positive (P) compartments.The model describes how individuals transition between these compartments based on rates of entry and exit, such as exposure to infection (λ), testing (ω), recovery (ρ), and disease-induced death (d).Recovery of individuals (R) depends on recovery rates from the confirmed positive (P) and symptomatic infectious (I) compartments (ρP(t) and ργP(t), respectively).Additionally, no natural birth and death are considered, and their exclusion may be justified given the assumed short-term focus on COVID-19 dynamics and the neglect of population-level demographic changes, simplifying the model for this specific epidemiological context.These underlying assumptions guide the model's representation of COVID-19 transmission and progression in the Nagpur urban area.The transmission dynamics of the SARS-CoV-2 infections are described by the five nonlinear systems of ODEs shown in Eq (1): denotes the number of individuals in these compartments at time t, T denotes matrix transposition, denotes the parameter vector and f(�) denotes the nonlinear relationship describing the state variable (see S2 Appendix for detailed mathematical derivation of SEIPR model).The force of infection used in this model is λ = β 1 (β 2 E(t) + β 3 P(t) + I(t)), with β 1 denoting the effective contact rate per day, and β 2 and β 3 respectively accounts for the reduction in disease transmissibility of exposed and confirmed positive individuals.A value of epidemiological importance in infectious disease modelling is the basic reproductive number, which in this study is referred to as the number of secondary SARS-CoV-2 infections generated by a single active SARS-CoV-2 infected individual during the entire infectious period [29].It is given by the Eq (2): where i T = δ + γρ + d and p T = ρ + d.The effective reproductive number (R 0 ) is made up of contributions from secondary infections from the exposed class generated by asymptomatic individuals (first term), confirmed positive individuals' class (second term), and the infected (symptomatic) class (third term).S 0 is the proportion of the population that is initially susceptible.Other parameters in Eq (2) are defined as follows: � denotes the incubation period, σ denotes the progression rate of susceptible individuals to the confirmed positive class via testing per day, δ denotes the progression rate of infectious individuals to the confirmed positive class via testing per day, d denotes the disease-induced death rate per day, ω denotes the fraction of exposed individuals that transient to confirmed positive class, γ denotes the fraction of infectious individual that transient to recovery class and ρ denotes the recovery rate of confirmed positive individuals per day.In this study, the nonlinear least squares scheme is used to estimate the parameters involved in the calculation of R 0 .The model fitting was first carried out for each zone to obtain zone-specific parameter estimates and secondly for all zones put together as a single unit.Further details about model derivation and parameter estimation can be found in the Supplementary (see S2 Appendix for a full description of model parameters and variables).For this study, the number of SARS-CoV-2 infected individuals within urban Nagpur was estimated using the modelling approach proposed by McMahon et al. [14], which combines our disease model (SEIPR) to the viral concentration estimations [14].As already mentioned, there are ten zones within urban Nagpur and each zone is modelled independently.Based on McMahon et al. [14], using our SEIPR disease model, the number of newly detected infections on the jth day I n j is modelled as a Poisson process with rate parameter Nβ 1 [β 2 E(j) + β 3 P(j) + I(j)], which is expressed as Eq (3): where N is the total number of individuals that reside in the zone of the drainage systems.The viral load being introduced into the drainage system at time t is where V ij (t) is the number of copies of SARS-CoV-2 RNA entering the drainage systems via faeces of the ith individual of out the I n j who became infected on day j is modelled according to the Eq (5) for i ¼ 1; 2; . . .; I n j (infected individuals) and j = 1, 2, . .., J (days).In Eq (5), ϑ ij denotes the log 10 g of faeces per ith individual who gets infected on the jth day, modelled as a normal distribution with mean of 2.41 and standard deviation of 0.25 per data from lower-middle-income countries [15], ϕ ij denotes the log 10 maximum RNA copies per g being of faeces shed 5 days after being infected, modelled as a normal distribution with mean of 7.6 and standard deviation of 0.8 [14] and ψ ij denotes the log 10 RNA copies per g being of faeces shed 25 days after being infected, modelled as a normal distribution with mean of 3.5 and standard deviation of 0.4.To correlate the viral load being introduced into the drainage system to that being measured, McMahon et al. [14] proposed the Eq (6) called the downstream RNA copies measured, V(t, τ) to account for the time-dependent degradation in the drainage system, where τ is the time elapsed between waste excretion and arrival at the drainage systems modelled as a uniform distribution from τ = 1h to τ = 1.5h,V 0 (t) is the viral load introduced into the drainage system modelled by Eq (4), τ* is the temperature-dependent half-life modelled according to Eq (7) where T is the current temperature of the drainage system, modelled as a uniform distribution from T = 19˚C to T = 31˚C, t * 0 is the half-life (h) at an ambient temperature of T 0 , modelled as a normal distribution with a means of 3 h and 30 h respectively, with standard deviations of 0.7 and 1.5, Q 0 is the temperature-dependent rate of change, modelled as a normal distribution with a mean of 5.5 and standard deviation of 0.5.The choice of distributions and parameter ranges were informed by previous research as well as actual measurements or observations of SARS-CoV-2 in wastewater to inform their selection of parameter ranges for the Monte Carlo simulation.All the above information was used to simulate the viral load of infected individuals generated by our proposed disease model via 500 Monte Carlo simulations, since beyond this number of Monte-Carlo simulations, the value of the simulated RNA copies does not significantly change.Importantly, the number of Monte Carlo samples depends on various factors including the complexity of the model, which is the case here.
Finally, McMahon et al. [14] proposed a model for estimating the number of infected individuals in each day given the measured RNA copies quantified from samples collected from the drainage systems and is given by the Eq (8) where Q denotes the average flow rate at the drainage system in L per day, V denotes the virus copies per L, A is the rate of faeces production per person in g per day with A = 2 × 128 for developing countries [15], and B denotes the maximum rate at which the virus is shed in RNA copies for g of faeces per day with B = 10 7.6 × 128 [14].In this study, Q was calculated as a point estimate using the product of the at-home population in the catchment of each zone, and the observed average per capita wastewater rate, which we assumed to be either 120 or 135 L/person/day (based on the Ministry of Housing and Urban affairs suggested benchmark for urban water supply).

Statistical analyses
Due to the lack of COVID-19 incidence data for the rural areas in Nagpur, we explored catchment areas within urban Nagpur by zones to gain insight into the concentration of SARS-CoV-2 viral load in the collected wastewater samples.Based on the model parameter estimates, the distribution of the RNA copies per day existing in the drainage systems by zones was estimated, where we used the 2011 population census data as an estimate for each population zone.Of note, the use of the Monte Carlo simulation approach can help estimate uncertainties and account for variability in the data, which provides some indication of potential uncertainty and variability in prevalence estimates despite the limitations of using this census data, making the margin of error not a major problem.

Ethical approvals
The study was approved by the Faculty of Medicine and Health Sciences Research Ethics Committee at the University of Nottingham (REC No. 131-1120), and the institutional ethics committees of the Central India Institute of Medical Sciences, Nagpuadjust width.Lal Institute of Biotechnology, Jaipur.

SARS-CoV-2 detection in wastewater samples
A total of 983 wastewater samples were analysed, of which 743 (75.6%) were from the urban and 240 (24.4%) from the rural parts of Nagpur district.Overall, 43.7% (95% confidence interval 40.1, 47.4) of wastewater samples in the urban and 30.4% (95% confidence interval 24.66, 36.66), in rural areas tested positive for SARS-CoV-2 (p < 0.001); RT-PCR results revealed significantly higher SARS-CoV-2 viral copies per L in urban zones (p < 0.001).The median temperature of urban Nagpur was 29.0˚C, (IQR: 25.75-31.00)and was significantly lower than that of the rural areas (31˚C, (IQR: 29-33); p < 0.001).The median humidity was also significantly higher in urban (38%, IQR: 26-53) vs rural (32%, IQR: 22-50) Nagpur (p < 0.001) at the time of sampling (Table 1).Of the 10 sampled urban catchment zones, two zones (7 and 9) yielded no SARS-CoV-2 RNA detection but did record the highest humidity levels (Table 2).Only 3 zones experienced rainfall; zones 1 and 8, where rainfall was recorded 1 day prior to sample collection, and zone 7, where sample collection took place during heavy rainfall.It is likely that these rainfall events would also contribute to diluting the sewage prior to sampling.Moreover, rainfall events would also contribute to more rapid and effective flushing out within the sewers.In zone 9, wastewater sampling followed the conclusion of the main COVID-19 infection wave, and therefore, the cases of COVID-19 at the time of sampling were expected to be very low, as illustrated in S1A-S1E Fig ( S3 Appendix).The distributions of the continuous data and their normality plots by individual zones are shown in S3A-S3D Fig ( S3 Appendix).The respective significance p-values shown on the plots are all less than 0.05, indicating the data is not normally distributed.S1 Table (S3 Appendix) summarises the demographic characteristics of the catchment zones where wastewater samples were collected.The demographic and environmental characteristics by zones are presented in S2 and S3 Tables (S3 Appendix).

Estimation of infected individuals
We fitted our proposed SEIPR model to the reported confirmed COVID-19 positive cases and deaths in urban Nagpur via the nonlinear least squares method.Both plots show an increase in confirmed positive cases and deaths up to the first 50 days and then a decrease over the last 100 days.Thus, the SEIPR model predicts a decrease in the susceptible population as individuals become exposed, infected, confirmed positive, and then either recover or are confirmed dead.The remaining model fittings for the urban zones are presented in S5 Fig ( S3 Appendix).The corresponding model parameter estimates for the respective catchment zones and R 0 as calculated using clinical incident data only, are presented in Table 3.Each urban catchment zone exhibited different effective contact rates, β 1 , signifying different contact patterns.In addition, the basic reproduction number, R 0 is different for each catchment zone with the highest R 0 observed in zone 9 and lowest in zone 2. All the zones have an R 0 greater than 1 except for zone 2. All the zones, when combined as a single unit, gave an R 0 of 1.11.Linear regression analysis to investigate the variation in R 0 and β 1 between the zones revealed a statistically significant positive correlation between R 0 and population density [R 2 = 0.40, p-value = 0.05] whilst for effective contact rate (β 1 ) and R 0 , there was a negative correlation with humidity [R 2 = 0.49, p-value = 0.02].No significant relationship was seen between temperature and R 0 or Taking all zones combined, Fig 2(c) depicts the distribution of the RNA copies per day, similar to the dynamics observed by McMahon et al. [14].
There is a positive correlation between the concentration of SARS-CoV-2 RNA in the wastewater and the number of confirmed positive individuals as well as recovering individuals and shedding rates.There was a different association between the measured viral RNA concentration and the confirmed positive cases during the earlier stages (January and February 2021) of the wastewater sampling, with high wastewater viral concentrations but low numbers of confirmed positive individuals.Therefore, zone 1 and zone 2 were not considered for the viral RNA load SEIPR modelling (see S2   Appendix)) except for zone 7 and zone 9, where wastewater samples from these zones tested negative (Table 2).Furthermore, all plots depicting the entire Monte-Carlo simulations of the predicted number of active COVID-19 cases versus SARS-CoV-2 RNA mass rate are presented in S9 and S10 Figs ( S3 Appendix).
Table 4 presents the SARS-CoV-2 RNA wastewater concentrations in samples taken from all the catchment zones considered as a single unit between 1st March and 27th of May, 2021.Results of the other catchment zones are presented in the Supplementary.Each row corresponds to a specific date on which the wastewater samples were taken.The "RNA (copy per L)" column provides the concentration of SARS-CoV-2 genetic material in wastewater, providing insights into the prevalence of the virus in the population.The following columns, titled "Option 1" and "Option 2", present two separate scenarios based on different wastewater rates per capita (120 L/person/day for Alternative 1).and 135 L/person/day for 'Option 2).These scenarios are important for estimating the number of infected individuals using RNA concentrations as an indicator of viral activity.Calculated RNA levels are provided for each scenario, showing the rate of change in viral RNA levels per day.In addition, the "Estimated number of infected individuals" column quantifies the number of potential COVID-19 cases inferred from RNA levels, providing a way to assess community spread of the virus.
Direct comparison with clinically observed cases is presented in the column "Clinically observed COVID-19 positive cases", showing actual confirmed positive cases reported by clinical diagnoses.This actual data is used as a benchmark to evaluate the validity of the estimates obtained through wastewater analysis.Side-by-side estimating infected individuals with

Discussion
WBE has been used as a tool for surveillance of COVID-19 infections at the communitylevel and complements clinical-based surveillance and screening, which is limited by cost, turnaround time, and the bias associated with uncharacterized asymptomatic infections and their contribution to infection spread.WBE captures the totality of symptomatic, pre-symptomatic and asymptomatic carriers within a specific community [16,17] This study is the first to successfully pilot and assess WBE as a methodology for the detection and quantification of SARS-CoV-2 viral RNA in community sewers in Nagpur district of Central India during the second wave of the pandemic in 2021.Whilst several epidemiological models have been described and compared for transmission of SARS-CoV-2 [2,21,30], this study employed a new SEIPR model, which adds the extra compartment of "confirmed positive" to estimate the number of infected individuals and was further used to estimate the mass rate of RNA in the wastewater.We observed a low number of clinical cases early in the COVID-19 wave that was out of proportion to the observed high SARS-CoV-2 concentration in the wastewater.If we use our modelling results from later in the study and apply them to this earlier period, it reveals that the clinical surveillance data underestimated the level of COVID-19 transmission in the Nagpur district.The model predicts the unreported number of cases under the per capita wastewater rates of 120L/person/day and 135L/person/day to be 12.42 (95% CI 9.04, 15.15) and 13.97 (95% CI (10.17, 17.0) times higher than the reported number of cases, respectively.Hence, SARS-CoV-2 RNA detected in community wastewaters may have come from pre-symptomatic, symptomatic, or asymptomatic cases who did not selfreport to their local health monitoring unit due to fear of social stigma, isolation, or quarantine, or simply because they did not know they were infected [31,32].Under-reporting bias in the clinical incident data is also likely to have arisen due to the limitation of testing resources (analytical kits, personnel), coverage, and accessibility of testing sites [33].We observed that SARS-CoV-2 seemed to be suppressed in samples collected from catchment zones recording higher relative humidity, a loose proxy for rainfall.This was further substantiated by observing a statistically significant negative correlation between R 0 (effective reproductive number) and humidity, and β 1 (effective contact rate per day) and humidity, but not temperature.These results partly agree with those reported elsewhere in which temperature and humidity were inversely correlated with daily new cases and deaths of COVID-19 with several studies reporting that SARS-CoV-2 is sensitive to high temperatures and humidity [34,35].It is likely that rainfall events prior to or during the sampling phase may have contributed to the lack of detection of SARS-CoV-2 RNA due to the dilutional effect.The substantial variation in parameter values across the ten geographic zones, as detailed in Table 3, is a consequence of the inherent complexity and diversity of real-world conditions being modelled.These variations are influenced by factors such as population density, healthcare infrastructure, interventions, and social behaviours specific to each zone.While these differences may appear significant, they are expected in epidemiological modelling and reflect the diverse nature of disease spread in different settings.Rather than indicating issues with the model, these variations underscore the need for tailored, context-specific modelling to capture the nuanced dynamics within each zone accurately.This diversity in parameters enhances the model's ability to represent the unique characteristics of each zone.Moreover, the calculation of R 0 considers the complex interplay of these parameters, and the model offers valuable insights into the dynamics of COVID-19 within a geographically diverse urban area like Nagpur.The variation in R 0 estimates (0.98-1.66) between the different zones in Nagpur urban district may be due to additional factors such as variation in sociobehavioural habits (personal hygiene, wearing masks, handwashing, social distancing, vaccine uptake, social gatherings), sociodemographic, educational levels and dietary factors.Factors such as high levels of youth, income inequality, high population density and social media usage are associated with high R 0 and may be important influences shaping zonalwise variation in R 0 in Nagpur as reported across countries [36].Overall, these R 0 estimates for the second wave of COVID-19 in India are consistent with a baseline R 0 of 1.450 recorded for Maharashtra and 1.379 for India by Marimuthu et al. [37] but fall below earlier estimates calculated by Shil et al who reported R 0 in the range of 2-3 during the initial wave of infection for the majority of Indian districts (March-June 2020) [38].This depicts that the use of 2011 population census data as a proxy for the modelling process in this study was not out of place as the estimated R 0 in this study is consistent with what other studies have found.This feasibility study identified a unique set of challenges in the implementation of WBE in Central India which mirrors those observed in other LMIC settings such as Bangladesh [4].These include establishing a sampling plan and schedule that is representative of the different urban and rural catchment populations, underdeveloped sewage systems in rural areas necessitating onsite sanitation epidemiology/sampling; development and validation of standardized protocols for lab analysis; complex collaborative efforts from government agencies, public health units and academia and resource limitations (e.g., autosamplers not suitable for large rapid monitoring where passive sampling techniques are more easily implemented) [39].Supply chain issues for essential goods such as PPE and PCR diagnostic reagents, and logistical constraints such as inaccessibility and poor transport systems made it difficult to reach rural communities in remote areas.In recognition of these challenges, we acknowledge several study limitations.Although we did assess and compare the abundance of SARS-CoV-2 viral concentration in untreated wastewater samples between urban and rural areas, in line with other wastewater research studies in India [40,41] most of our sampling sites were from urban zones of Nagpur, introducing sampling bias.Due to the lack of COVID-19 clinical incident data for the rural areas sampled, we were not able to apply our SEIPR model to model infectious burden in rural Nagpur.We also had to base our model assumptions on historical rather than current census data which is not available from Nagpur district.Due to the limitation of resources and skilled personnel, we were not able to undertake 24-hour composite and longitudinal sampling which we recognize would have made our data more representative, to assess the impact of seasonality or to obtain detailed information on spatiotemporal trends.Moreover, we were unable to record physiochemical, hydrologic, and anthropogenic parameters of the wastewater samples which would have affected RNA concentrations, and consequently, SARS-CoV-2 RNA detection [31].Although we did not collect daily rainfall measurements and instead used relative humidity as a proxy for rainfall, the majority of the sampling period was conducted during periods of no rain.We acknowledge the use of air temperature as a surrogate for wastewater temperature in the absence of direct wastewater temperature data, particularly in open drainage systems.While this substitution is a common practice in environmental modelling due to data limitations, it's essential to recognize its potential limitations and the possible impact on the results.Wastewater temperature can be influenced by various factors beyond just air temperature in open drainage systems, such as ground temperature, flow rates, and interactions with other environmental factors.This assumption may introduce some level of uncertainty into the model, and future studies should aim to collect specific wastewater temperature data to improve the accuracy of the modelling.However, given the data constraints, the use of air temperature can provide a reasonable estimation of wastewater temperature and is a common approach in the field.We recognize that with any modelling efforts, it will be important to explore the sensitivity of the model to different assumptions in future research.Future studies should also adopt the use of rapid in-field testing of SARS-CoV-2 or any pathogenic target as opposed to bringing samples back to a central lab with appropriately trained personnel.This technology is already in proof-of-concept stages and could be easily operationalized ahead of future outbreaks or pandemics.

Conclusion
We have established a quantitative framework to estimate COVID-19 prevalence and predict SARS-CoV-2 transmission through integrating wastewater-based surveillance data into a SEIPR model.The constructed model may be used to provide accurate and robust estimates of future waves of the COVID-19 pandemic and could usefully be applied to study other infectious diseases or expanded to consider reinfected populations.Our findings showcase the translational value of utilizing WBE to study the health of a population for epidemiological inference and in informing public health actions, particularly where comprehensive individual testing is severely constrained by a shortage of resources and logistical challenges.However, to realize the true value of this tool in India and other LMICs, it will be important for governmental and other funding agencies to invest heavily in building laboratory capacity and sample collection teams.Such efforts should also help re-emphasize the criticality of clean water, sanitation, and waste management as potential control points in the fight against COVID-19 and future pandemics.
Demographics, and climatic factors including the presence of rainfall, air temperature, and relative humidity, along with GPS coordinates, were also recorded by field workers based at the Central Indian Institute of Medical Sciences (CIIMS) and assisted by the NMC.Daily laboratory-confirmed COVID-19 positive cases and deaths between 1st February and 30th July 2021

Fig 1 .
Fig 1. Map of Nagpur district (study area) showing sampling locations for wastewater study.Each dot represents a location of wastewater collection in Nagpur urban and rural talukas.The map was created using the ArcGIS 10.4 version from a GIS student.Source of map used "ESRI, Maxar, Earthstar, Geographics and the GIS user Community".https://doi.org/10.1371/journal.pone.0303529.g001 Fig 2(a) and 2(b) shows the representative model fit for the SEIPR model to data for all 10 Nagpur catchment zones combined as a single unit for the period of March to July 2021.
Fig in the S3 Appendix).Fig 2(d) depicts a zoomed-in plot of the predicted number of active COVID-19 cases versus SARS-CoV-2 RNA mass rate with individual Monte-Carlo simulations represented by grey points.The measured RNA mass rates and estimated number of infectious individuals based on Eq (8) are denoted by the coloured datapoints and fall within the 95% CI denoted by the red solid lines.In this particular

Fig 2 .
Fig 2. Considering a half-life of 30 h.Model fit to the proportion of the population.(a) (left) confirmed positive COVID-19 infections and (b) (right) confirmed deaths from COVID-19 infections for all zones as a single unit.(c) SEIPR model (1) prediction for the mass rate of SARS-CoV-2 RNA in wastewater over time via Monte-Carlo simulation represented by black points.(d) Zoomed-in plot of predicted number of active COVID-19 cases versus SARS-CoV-2 RNA mass rate with individual Monte-Carlo simulations represented by grey points, where 75% CI and 95% CI are denoted by the green and red solid lines, respectively.Coloured datapoints denote the measured RNA mass rates and estimated infectious individuals based on Eq (8) as presented in Table 4, respectively for an assumed average per capita wastewater rates of 120 L per person per day (red solid points) and 135 L per person per day (blue solid points) for all zones as a single unit.https://doi.org/10.1371/journal.pone.0303529.g002 Data on continuous variables are presented as median with interquartile ranges (IQR).Categorical variables are shown as counts and percentages in parentheses.The normality of data was assessed using the Shapiro-Wilk test.Student's t-test was used for comparing variables which were normally distributed.Mann-Whitney test was used when the normality assumption was violated.The Fisher's exact test and Proportion tests were applied to compare categorical variables.All p− values and confidence intervals (CIs) are two-sided and a p-value of < 0.05 is considered statistically significant.All modelling studies were performed using MATLAB, 2022 o and Rstudio 2022.07.1+554 software on macOS Monterey Version 12.5.1 MacBook Pro (13-inch, M2, 2022).

Table 2 . SARS-CoV-2 RT-PCR results detected per unit of time and detected viral load results of the wastewater samples with climatic and population census infor- mation for each Nagpur catchment zone.
a : 2011 census data; b : RT-PCR results for wastewater samples per unit of time for each zone, NA: not available.https://doi.org/10.1371/journal.pone.0303529.t002

Table 3 . Model parameter estimates and basic reproduction number (R 0 ) for each catchment zone of Nagpur district. Catchment Parameter β 1 (per day) β 2 (10 −4 ) β 3 (10 −4 ) � * (per day) σ (10 −4 per day) ω (10 −4 per day) δ (10 −4 per day) γ ρ (10 −4 per day) d (10 −4 per day) R 0
Computation of R 0 and all model parameters are based on clinical incidence data and not wastewater samples.Note: β 1 denotes the effective contact rate per day, β 2 and β 3 respectively account for the reduction in disease transmissibility of exposed and confirmed positive individuals.� denotes the incubation period, σ denotes the progression rate of susceptible individuals to confirmed positive class via testing per day, δ denotes the progression rate of infectious individuals to confirmed positive class via testing per day, d denotes the disease-induced death rate per day, ω denotes the fraction of exposed individuals that transient to confirmed positive class, γ denotes the fraction of infectious individual that transient to recovery class and ρ denotes the recovery rate of confirmed positive individuals per day.

positive cases RNA rate (10 14 copies per day) a Estimated number of infected individuals (10 14 ) RNA rate (10 14 copies per day) a Estimated number of infected individuals (10 14 )
aggregate SARS-CoV-2 RNA concentration if samples are taken from different locations measured on the day. *