District level correlates of COVID-19 pandemic in India during March-October 2020

Background COVID-19 is affecting the entire population of India. Understanding district level correlates of the COVID-19 infection ratio (IR) is essential for formulating policies and interventions. Objective The present study aims to investigate the district level variation in COVID-19 during March-October 2020. The present study also examines the association between India’s socioeconomic and demographic characteristics and the COVID-19 infection ratio at the district level. Data and methods We used publicly available crowdsourced district-level data on COVID-19 from March 14, 2020, to October 31, 2020. We identified hotspot and cold spot districts for COVID-19 cases and infection ratio. We also carried out two sets of regression analyses to highlight the district level demographic, socioeconomic, household infrastructure, and health-related correlates of the COVID-19 infection ratio. Results The results showed that on October 31, 2020, the IR in India was 42.85 per hundred thousand population, with the highest in Kerala (259.63) and the lowest in Bihar (6.58). About 80 percent of infected cases and 61 percent of deaths were observed in nine states (Delhi, Gujarat, West Bengal, Uttar Pradesh, Andhra Pradesh, Maharashtra, Karnataka, Tamil Nadu, and Telangana). Moran’s I showed a positive yet weak spatial clustering in the COVID-19 IR over neighboring districts. Our regression analysis demonstrated that the percent of the population aged 15–59, district population density, percent of the urban population, district-level testing ratio, and percent of stunted children were significantly and positively associated with the COVID-19 infection ratio. We also found that districts with a higher percentage of literacy have a lower infection ratio.
Conclusion The COVID-19 infection ratio was found to be higher in districts with a larger working-age population, higher population density, a larger urban population, a higher testing ratio, and a higher level of stunted children. The study findings provide crucial information for policy discourse, emphasizing the vulnerability of highly urbanized and densely populated areas.


Introduction
Coronavirus disease 2019 (COVID-19) is a respiratory disease caused by the SARS-CoV-2 virus, a member of a large family of viruses called coronaviruses. These viruses can infect people and some animals. The virus is thought to spread from person to person through droplets released when an infected person coughs, sneezes, or talks [1]. With more than 81,82,676 confirmed cases on October 31, 2020, India ranked second globally in the total number of COVID-19 infections [2]. The disease spread slowly in the initial three months (January to March 2020) after the first outbreak in Kerala in January 2020, possibly because of the early nationwide lockdown [3][4][5], widespread coverage of the pandemic in print, electronic, and social media [6], and targeted efforts by the union and state governments on quarantine facilities and travel protocols [7,8]. From April 2020, the number of confirmed COVID-19 cases increased rapidly in many districts, and from August 2020 India recorded over 50,000 cases every day. S1 Fig shows the biweekly trend of confirmed COVID-19 cases from March 14 to October 31, 2020. After peaking in September-October 2020, new cases declined steadily in India: the biweekly count fell from 12,44,430 (12-25 September 2020) to 5,97,281 (24 October-5 November 2020).
Interestingly, India has a relatively high recovery rate and the lowest fatality rate globally [9]. Despite India's advantage of having a young age structure with less susceptibility to COVID-19 related deaths [10], India may have to bear a higher burden of disease due to other demographic factors [11] such as the enormous population size, high population density, a higher percentage of people living in poverty, lower levels of per capita public health infrastructure, and a high prevalence of co-morbid conditions [12]. Research evidence suggests that the transmission risk in the second wave of COVID-19 was almost double that of the first peak [13]. Various factors are associated with the more devastating second wave, such as transmission dynamics, the effect of population density, the effect of the testing rate, and healthcare infrastructure [14].
Like any other health and demographic indicator, COVID-19 infection varies widely among the different states of the country [15,16]. However, the geographical pattern of the COVID-19 infection rate does not coincide with the patterns of demographic and health indicators such as the under-five mortality rate or nutritional status. COVID-19 has been spreading rapidly in urban areas, especially in states with megacities and densely populated urban slums, such as Delhi, Maharashtra, Tamil Nadu, and West Bengal. The sudden surge of return labour migration to the states of origin (due to the COVID-19 related national lockdown), the state-level health care system, adherence to physical distancing measures, and local government management are other potential community-level factors affecting geographical variations in the spread of COVID-19 in India. Some recent studies have computed composite indices to rank the districts in terms of their COVID-19 vulnerabilities using demographic information and infrastructure characteristics [17][18][19]. While such analyses help district-level planning and prioritization, they are based on the assumption that vulnerability will decrease as the districts' socioeconomic indicators improve. Such an inverse relationship may not hold in the context of COVID-19. For instance, a higher percentage of urban population may indicate a higher socioeconomic status of the district population in a non-COVID situation and may be linked with improved health outcomes, yet it may be positively correlated with the spread of COVID-19, which is more prevalent in cities and towns than in rural areas or hilly regions [20]. Therefore, it is imperative to uncover the empirical relationships between the districts' socioeconomic and household infrastructural characteristics and the COVID-19 infection ratio.
To the best of our knowledge, no such previous study has been conducted on COVID-19 in India. The aim of the present study is twofold: first, to investigate the district level variation in COVID-19 during March-October 2020 and, second, to investigate the district level socioeconomic and demographic correlates of the COVID-19 infection ratio in India. Identification of such correlates is crucial for framing health policy and appropriate interventions.

Data and methods
We used crowdsourced district-level data on COVID-19 available in the public domain from March 14, 2020, to October 31, 2020, accessed from the COVID-19 India dashboard. The time-series data on COVID-19 has been available on the COVID-19 India dashboard since March 14, 2020 [21]. The dashboard provides an application programming interface (API) to monitor COVID-19 cases at national, state, and district levels. The data compiled in this web portal is based on state bulletins and official handles. The details of the data are available on the website. The portal data is consistent with the data of the Ministry of Health and Family Welfare, Government of India (https://www.mohfw.gov.in/) [22].
For explanatory variables, we utilized data from the National Family Health Survey of India 2015-16 (NFHS-4), a cross-sectional survey of 601,599 households and 2.87 million individuals from all 29 states and seven union territories [23]. The survey collected data on various socioeconomic, demographic, health, and family planning indicators, as well as anthropometric and biomarker measures related to anemia, hypertension, and diabetes. The NFHS-4 is the most recent source of such biomarker-based data at the district level in India. We also used some socioeconomic and demographic variables from the Census of India [24].

Outcome variable
We have analyzed new, infected, recovered, and deceased cases at the national and state levels. The term "new cases" indicates the newly infected cases in the reference period; the term "confirmed cases" indicates the number of confirmed COVID-19 cases in the reference period. The recovered and deceased cases indicate the number of persons who recovered and died from COVID-19 in the reference period. Finally, the term "average infected cases" means the total number of confirmed cases after excluding recovered and deceased cases.
For all the 640 districts in the thirty-five states and eight union territories of India, we defined the outcome variable, COVID-19 Infection Ratio (IR), as the number of confirmed cases in a given district per 100,000 population. For the district-level population for the year 2020, we projected the district population using an exponential growth rate from the census 2001 and 2011.
The infection ratio was calculated as $IR_i = \frac{C_i}{P_i} \times 100{,}000$, where $C_i$ is the number of confirmed cases in the $i$-th district and $P_i$ is the total projected population of the $i$-th district on October 31, 2020.
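The projection and the infection ratio described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the example district figures, and the assumption that October 31, 2020 lies about 9.67 years after the 2011 census reference date are all ours.

```python
import math


def project_population(pop_2001, pop_2011, years_after_2011):
    """Project a district population forward with a constant exponential growth rate.

    The annual rate r is the one implied by the two census counts,
    which are assumed here to be exactly 10 years apart.
    """
    r = math.log(pop_2011 / pop_2001) / 10.0
    return pop_2011 * math.exp(r * years_after_2011)


def infection_ratio(confirmed_cases, population):
    """Confirmed cases per 100,000 population."""
    return confirmed_cases / population * 100_000


# Hypothetical district: the figures are illustrative, not taken from the study.
pop_2020 = project_population(1_000_000, 1_200_000, years_after_2011=9.67)
ir = infection_ratio(5_000, pop_2020)
print(f"Projected population: {pop_2020:,.0f}; IR: {ir:.2f} per 100,000")
```

Because the projected population enters the denominator, any error in the assumed growth rate carries over directly into the IR of fast-growing or shrinking districts.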

District level correlates
Based on previous literature [25][26][27][28], we considered a set of 23 variables at the district level, viz., i) Demographic variables: percentage of population aged 60 and above, percentage of population in the age group 15-59, percentage of marginal workers, population density; ii) Socioeconomic variables: percentage of the literate population, percentage of Scheduled Castes (SC) population, percentage of Scheduled Tribes (ST) population, percentage of Hindu population (as an indicator of religious composition at the district level), percentage of urban population, average number of persons that sleep in one room; iii) Household infrastructure variables: percentage of households with availability of soap, percentage of households with water and toilet facility within the premises; iv) Health-related variables: percentage of women with diabetes (glucose >140 mg), percentage of women (aged 18+) with cancer, percentage of women aged 18+ consuming tobacco, testing ratio per one hundred thousand population, under-five mortality rate, percentage of institutional births, percentage of full immunization among children aged 23-36 months, percentage of women aged 18 and above reporting anemia, and percentage of children with stunting and wasting. The district's health-related variables represent the district population's overall health status at the macro level.

Statistical analysis
We performed a bi-weekly trend analysis of COVID-19 cases in India. To examine the district level correlates of the outcome variable, we carried out a linear regression analysis at the district level. Four separate district-level regression models were fitted. Model 1 and Model 3 present the unadjusted effect of each independent variable, without controlling for the role of any other independent variable. Model 2 and Model 4 show the adjusted effects of the independent variables on the dependent variable. We tested for possible multicollinearity among the independent variables before fitting them in the regression models. We did all the analyses in the statistical package Stata 14.1.
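The difference between the unadjusted and adjusted models, and a common multicollinearity check, can be illustrated with a small sketch. The analysis in the paper was run in Stata; the pure-Python code below is only an illustration under our own assumptions: the helper names are hypothetical, the six-district data are invented, and the variance inflation factor (VIF) is one possible multicollinearity diagnostic, not necessarily the one the authors used.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(y, *predictors):
    """Ordinary least squares via the normal equations; returns [intercept, b1, b2, ...]."""
    X = [[1.0] + [p[i] for p in predictors] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    return solve(XtX, Xty)


def vif(target, *others):
    """Variance inflation factor of one predictor: 1 / (1 - R^2) from regressing it on the others."""
    beta = ols(target, *others)
    fitted = [beta[0] + sum(b * o[i] for b, o in zip(beta[1:], others))
              for i in range(len(target))]
    mean_t = sum(target) / len(target)
    ss_res = sum((t - f) ** 2 for t, f in zip(target, fitted))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return ss_tot / ss_res  # algebraically equal to 1 / (1 - R^2)


# Invented data for six districts: IR depends on both predictors by construction.
urban = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]   # percent urban population
density = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]       # population density (arbitrary units)
ir = [5 + 2 * u + 10 * d for u, d in zip(urban, density)]

unadjusted = ols(ir, urban)            # analogue of Models 1 and 3: one predictor at a time
adjusted = ols(ir, urban, density)     # analogue of Models 2 and 4: all predictors together
print("unadjusted urban slope:", round(unadjusted[1], 3))
print("adjusted urban slope:  ", round(adjusted[1], 3))
print("VIF of urban given density:", round(vif(urban, density), 2))
```

In this toy data the unadjusted slope for the urban share is larger than the adjusted one because it also absorbs the effect of the correlated density variable, which is the reason adjusted and unadjusted coefficients can differ in significance.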

Spatial analysis
For analysing the spatial distribution of the COVID-19 cases at the district level, we generated descriptive maps of the 640 districts in the software package QGIS. We later exported the shapefiles to the GeoDa software to perform spatial analysis. Using the first-order Queen's contiguity matrix as the weight, we estimated Moran's I and univariate Local Indicators of Spatial Association (LISA). Moran's I is a Pearson-type correlation coefficient of spatial autocorrelation, which measures the degree to which data points are similar or dissimilar to their spatial neighbours [29]. The LISA cluster map yields four types of geographical clustering of the variable of interest [30].
Here, "high-high" refers to regions with an above-average infection ratio that share boundaries with neighbouring areas with above-average infection ratio values. On the other hand, "low-high" indicates regions with a below-average value whose surrounding areas have an above-average infection ratio, and "high-low" indicates the reverse. Areas with a below-average infection ratio that share boundaries with neighbouring regions having below-average values of the same variable are referred to as "low-low". The "high-high" regions are also referred to as hot spots, whereas the "low-low" regions are referred to as cold spots [30,31].
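The global statistic and the four LISA quadrant labels can be sketched in a few lines. The study used GeoDa; the code below is only an illustration with a hypothetical four-district neighbour list and invented infection ratios, using row-standardised binary contiguity weights. Labels put the district's own value first and its neighbours' average second. It omits the permutation-based significance test that a full LISA cluster map relies on.

```python
def morans_i(values, neighbours):
    """Global Moran's I with row-standardised binary contiguity weights.

    `neighbours` maps each district index to the list of indices it borders.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross = 0.0   # sum over i, j of w_ij * z_i * z_j
    s0 = 0.0      # sum of all weights
    for i, nbrs in neighbours.items():
        w = 1.0 / len(nbrs)  # each neighbour of i gets an equal share of weight
        for j in nbrs:
            cross += w * z[i] * z[j]
            s0 += w
    return (n / s0) * (cross / sum(d * d for d in z))


def lisa_quadrants(values, neighbours):
    """Label each district by its own value and its neighbours' mean, relative to the overall mean."""
    mean = sum(values) / len(values)
    labels = {}
    for i, nbrs in neighbours.items():
        lag = sum(values[j] for j in nbrs) / len(nbrs)  # spatial lag: average of neighbours
        own = "high" if values[i] > mean else "low"
        around = "high" if lag > mean else "low"
        labels[i] = f"{own}-{around}"
    return labels


# Hypothetical example: four districts in a row (0-1-2-3), invented infection ratios.
ir = [10.0, 12.0, 2.0, 1.0]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

print("Moran's I:", round(morans_i(ir, neighbours), 3))
print(lisa_quadrants(ir, neighbours))
```

A positive Moran's I indicates that similar values tend to sit next to each other; a value close to zero, as reported for the district-level IR, indicates weak clustering.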

Results
India has been reporting new cases of the coronavirus (COVID-19) every day since March 14, 2020. India reported over 8.1 million confirmed cases as of October 31, 2020. Out of these, around 7.4 million patients recovered, while 1,22,154 died [32]. The table presents the national bi-weekly (14-day) pattern of new confirmed, infected, recovered, and deceased COVID-19 cases in India in the study period. In India, between the 1st and the 17th bi-weekly period, the average bi-weekly new confirmed cases rose 677 times (from 63 to 42,663), the average recovered cases increased 10,729 times (from 5 to 53,647), the average infected cases increased 449 times (from 63 to 28,256), and the average deceased cases increased 502 times (from 1 to 502). During the study period, COVID-19 cases peaked during 12-25 September 2020 and then started declining.
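The fold increases quoted above are simple ratios of the last to the first bi-weekly average. A short check, using only the figures given in the text (the dictionary keys and function name are ours), is:

```python
def fold_increase(first, last):
    """Ratio of the last bi-weekly average to the first one."""
    return last / first


# (first, last) bi-weekly averages as reported in the text.
bi_weekly_averages = {
    "new confirmed": (63, 42_663),
    "recovered": (5, 53_647),
    "infected": (63, 28_256),
    "deceased": (1, 502),
}

for label, (first, last) in bi_weekly_averages.items():
    print(f"{label}: {fold_increase(first, last):.1f}-fold")
```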
S2 Table presents national and state-wise confirmed, infected, recovered, and deceased cases for the entire study period (March 14, 2020, to October 31, 2020). It also presents

PLOS ONE
infection ratios at the national and state level. Six states, viz. Maharashtra, Andhra Pradesh, Tamil Nadu, Karnataka, Telangana, and Uttar Pradesh, accounted for more than half of the country's total cases. About 70 percent of new cases, 70 percent of recovered cases, 80 percent of infected cases, and 61 percent of COVID-19 deaths are from only nine states (Maharashtra, Delhi, Tamil Nadu, Karnataka, Andhra Pradesh, Telangana, West Bengal, Gujarat, and Uttar Pradesh). The IR in India is 42.85 per one hundred thousand people. The highest IR was observed in Kerala (259.93) and the lowest in Bihar (6.58). However, the states like Arunachal Pradesh (123.40 ... In July 2020, about 11 percent of the confirmed cases belonged to another seven urban districts (Hyderabad, Central Delhi, Ahmedabad, South East Delhi, Kolkata, West Delhi, and East Godavari) with at least 20,000 confirmed cases. However, by October 31, 2020, seven districts contributed 9.34 percent of total cases, with at least 90,000 confirmed cases in each district. About 90 percent of the districts (575 districts) have at least 1,000 confirmed cases each, and about 97 percent (625 districts) have at least 100 confirmed cases of COVID-19. About 99.5 percent of the districts reported at least one confirmed case of COVID-19 by October 31, 2020.
In Figs 2 and 3, panel B presents the district level infection ratio (IR), defined as the number of confirmed cases per 100,000 population, on July 31, 2020, and October 31, 2020, respectively. The results show that the worst-affected districts in India (Central Delhi, New Delhi, Mumbai, and Chennai) have an IR of at least 2,000 per 100,000 population. Another three districts, viz., South East Delhi, Kamrup Metropolitan, and North Delhi, had an IR ranging between 1,000 and 2,000 per one hundred thousand population at the end of July 2020. However, by the end of October, the same districts had experienced a manifold increase in IR (for instance, the IR in Kamrup Metropolitan increased from 1,035 to 77,044 per 100,000 population

Table 1 shows the descriptive statistics of the outcome variable (infection ratio) and exposure variables for the 640 districts. The average infection ratio is 108.40 per one hundred thousand population on July 31, 2020, and 586.54 per one hundred thousand population on October 31, 2020. On July 31, 2020, the IR was zero in two districts of Arunachal Pradesh (Dibang Valley and Kurung Kumey) and in another two districts from the Union Territories (Lakshadweep and Nicobar).
All district-level socioeconomic variables differ substantially among districts. For example, the mean percent of the old age population (60 and above) is 8.33, and it varies between 2.46 percent and 17.82 percent among the districts of India. The percent of marginal workers, a proxy indicator of district-level out-migration, ranges from 1.36 to 33.60 percent. Population density (number of persons per square kilometer) varies from 1 in Dibang Valley (Arunachal Pradesh) to 36,155 in North East Delhi.
We also observed massive variation in the districts' socioeconomic variables, viz., the percent Hindu, urban, and ST population each vary from 0 percent to 100 percent across India's districts. On average, 2.92 persons sleep in one room in the Indian districts. On average, 48.14 percent of the households have water facilities, and 61.41 percent have toilet facilities within the household premises. The availability of soap, an indicator of hygiene practice, ranges from 17.20 percent to 98.90 percent in India. There is a wide disparity in health-related variables too. While the testing ratio ranges from 0 to 11,471 persons per one hundred thousand, the percentage of full immunization among children ranges from 7.14 to 100. Tobacco consumption among women ranges from 0.8 to 88 percent.

The regression analysis of the infection ratio (presented in Table 2) displays both unadjusted and adjusted coefficients of the explanatory variables for two time periods, viz., March 14, 2020-July 31, 2020 and March 14, 2020-October 31, 2020. In the unadjusted models (1 & 3), among demographic variables, the percent of the population aged 15-59, the percent of marginal workers, and population density were significantly associated with IRs. Several socioeconomic variables, such as the literate population, ST population, urban population, and the average number of persons sleeping in a room, were significantly associated with the IR in unadjusted model 1. In unadjusted model 3, the percent of urban population is significantly associated with the COVID-19 IR, together with the percent of literate population and the average number of persons sleeping in a room. Also, districts with better household infrastructure facilities have a higher COVID-19 IR.
Districts with higher levels of household infrastructure facilities (such as soap or toilet facility) reported higher levels of IRs in model 3. Among the health-related variables, the percentage of women with diabetes (glucose > 140), testing ratio, institutional births, and percentage of children immunized were significantly and positively correlated with the COVID-19 IR in October in the unadjusted model. Districts with a high prevalence of diabetes have a 21.05 and 89.8690-fold higher chance of COVID-19 infection in July and October, respectively, than those with a low prevalence of diabetes. However, the percentage of women consuming tobacco, under-5 mortality, and children with stunting were negatively associated with the COVID-19 IR.
In the adjusted model (2 & 4), the association between IRs and most of the correlates becomes statistically insignificant. After controlling the roles of demographic, socio-economic,

Summary and discussion
In terms of the total number of confirmed cases, India ranked second after the US, reporting more than eight million COVID-19 cases as of October 31, 2020. The present study examined the district-level variation in COVID-19 cases from March 14, 2020, to October 31, 2020. The present study also aimed to identify socioeconomic and demographic correlates of the COVID-19 infection ratio at the district level. Because the states are at different stages of socio-economic development, the trajectory of COVID-19 and the related interventions cannot be uniform. Our results illustrate the differences in COVID-19 cases at the state and district levels, with a few critical findings.
First, the spread of COVID-19 increased over the study period. The average bi-weekly cases show that new, infected, recovered, and deceased cases grew nationally. About 80 percent of the infected patients and 61 percent of the deaths are concentrated in nine states (Delhi, Gujarat, West Bengal, Uttar Pradesh, Andhra Pradesh, Maharashtra, Karnataka, Tamil Nadu, and Telangana). On October 31, 2020, the IR in India was 42.85 per hundred thousand population. The IR ranged from a minimum of 6.58 in Bihar to a maximum of 259.63 in Kerala. Only two Union Territories (Lakshadweep and Daman & Diu) have zero IR. Metropolitan cities like New Delhi, Mumbai, Thane, Pune, Kamrup Metropolitan, South Goa, Chennai, Bengaluru, and Hyderabad were most affected by COVID-19. The study identifies the districts at higher risk of coronavirus infection in the southern, northern, and western states. The apparent concern is that these states also contribute significantly to the Indian economy [33]. Another important observation of this study is that districts bordering the six metropolitan cities were observed to be India's highest hot spots, possibly because they contribute the largest share of migrants and commuters to these megacities.
Spatial autocorrelation analysis of the COVID-19 infection ratio shows that a district's infection ratio is not highly correlated with that of the neighboring districts. We observed a few hotspots of COVID-19 in Maharashtra, Tamil Nadu, Delhi, and Jammu & Kashmir. In contrast, we identified cold spots in the central, north-western, and north-eastern regions of India.
Finally, our research reveals that the district's infection ratio of COVID-19 is correlated with most socioeconomic and health-related variables. However, after adjusting other variables' roles, we observed a statistically significant association only with a limited number of variables. After adjusting the role of socioeconomic and health-related factors, the COVID-19 infection ratio was found to be higher in districts with a higher working-age population, higher population density, a higher urban population, a higher testing ratio, and a higher level of stunted children.
The results obtained in the regression analysis are consistent with those from the geographical analysis of the COVID-19 infection ratio. Highly urbanized districts are the worst affected by COVID-19. Population density is also higher in urban areas than in rural areas. As the percentage of the urban population increases, unavoidable economic activities such as the supply of medicine, food transport, and distribution also increase, even during the national lockdown, which exposes more people to the pandemic. Previous studies also showed that higher population densities in congested slum areas and large towns accelerated COVID-19 infection and mortality rates [34][35][36]. The congestion, slum concentrations, inadequate housing, and poor sanitation in poor urban areas may explain such a high disease burden. A positive association between the COVID-19 IR and the testing ratio indicates underreporting of COVID-19 in districts where the testing ratio is low.
Studies based on individual data show that older people are more vulnerable to COVID-19 infections [37,38]. These studies also identified that pre-existing diabetes is positively associated with COVID-19 disease [37,38]. In our research, we did not find such associations, since our analysis is not an individual-level analysis. One of the major limitations of our study is that the analysis does not extend beyond October 31, 2020; however, it can be extended in the future. Yet, unlike many previous studies, we identify macro-level correlates of the COVID-19 infection ratio for the period March 14, 2020, to October 31, 2020.

Comparison between first and second waves of COVID-19 in India
At the onset of the COVID-19 pandemic, India imposed the world's strictest nationwide lockdown, beginning on March 25, 2020 [39]. However, as of April 10, 2021, India ranked third, after the USA and Brazil, in the number of identified cases [40]. Like several other parts of the world, India has been experiencing a massive surge of COVID-19 cases and deaths [41][42][43]. The second wave started in the middle of March 2021 and recorded the highest number of cases (144,829) on April 9, 2021 [42,43]. The most affected states were Maharashtra, Kerala, Karnataka, Andhra Pradesh, Tamil Nadu, Delhi, Uttar Pradesh, and West Bengal [39,41,42]. Moreover, several megacities with a high concentration of population and overcrowded with migrants registered a high transmission rate of the disease; Mumbai Urban and the NCT of Delhi are two such examples. Despite this high caseload, several national events, such as the farmers' protest at the New Delhi border in November 2020, elections in several states, and some religious gatherings, put social distancing norms at stake and made the situation highly conducive to the spread of the virus. Another reason for the high spread of the virus might be intra-urban mobility from slums to towns and cities. The household density of slum areas is one of the most important causes of the spread of infection. The international airports of the NCT of Delhi, Mumbai, Bengaluru, and Kolkata acted as gateways that allowed the dangerous mutating COVID-19 strain to spread across the main urban hubs of the country. Moreover, migrant labourers also carried the new strain to different corners of the country. We also found that previous studies on the second wave report a pattern of hotspot clusters similar to that in our study. The spread of the virus was concentrated in the Mumbai Urban-Pune-Nasik-Kolhapur region and the NCT of Delhi region (including Punjab and Haryana) [39][41][42][43][44].
Though Kerala and the districts surrounding Bengaluru Urban have a high concentration of confirmed cases, they did not report a high case fatality rate (CFR). These patterns give us some idea of how to combat the present situation and prepare for the predicted third wave.

Conclusion
The COVID-19 pandemic is expected to have a long-term impact on health, economy, and social processes globally, including India. Only a clear understanding of the disease's spatial distribution and its correlates will help to formulate policies and interventions. Therefore, the possible risk factors should be included in policy preparedness and implementation during the COVID-19 pandemic. Understanding risk factors of COVID-19 may also help to understand the future dynamics of COVID-19 or other such infectious diseases.
We found that the share of the working-age population, population density, urban residence, and testing rates are significantly correlated with the COVID-19 infection ratio (IR). Because population density is very high in urban areas and social distancing is difficult to maintain there, the role of government is crucial in combating the pandemic. Ensuring health and hygiene-related facilities (providing adequate clean water, adequate sanitation and sewerage facilities, cleaning the city, maintaining quarantine centers and public health care institutions, etc.) and improving the public distribution system to ensure a minimum food supply, especially among the urban poor and other deprived sub-groups, can help to control the spread of COVID-19 infection.
More tests are required to identify patients with asymptomatic conditions. India has a population of over 1.3 billion, but till October 31, 2020, only approximately 108.79 million tests (8.04 percent of the population) had been carried out. As of April 1, 2021, India has 552,566 active infection cases, while 11,434,301 patients have recovered and 162,468 have succumbed to COVID-19. According to the Ministry of Health and Family Welfare, a total of about 640.05 million doses of COVID-19 vaccine had been administered by August 31, 2021 [45]. At the same time, people's negligent behavior towards COVID-19 protocols (such as not following social distancing norms, not wearing a mask in public places, and coughing without covering the mouth) puts them at a higher risk. Finally, there is a need to improve infrastructure (hospitals, ventilators, PPE kits) and human resources (doctors, nurses, and frontline workers) in healthcare facilities.
Our analysis does have a few limitations. First, there is a possibility of under-reporting of positive and fatal cases due to a lack of testing or social stigma. Hence our data gives the most conservative estimates of the infection ratio. Second, for most cases, patient-level information (such as age, sex, and comorbidity) is unavailable. Therefore, we analyzed district-level determinants instead of individual-level determinants, and our results identify the major correlates only at the district level. Finally, we used the number of confirmed cases rather than the number of active cases to compute the infection ratio. The latter depends on the recovery rate and on the health services available in a region, so we used the number of confirmed cases as the primary indicator of the spread of the infection. Despite these limitations, the study's merit lies in bringing together the spatial-demographic vulnerabilities prevalent across the nation during the pandemic period.
Writing - review & editing: Nandita Saikia.