Epidemiology of Shigella infections and diarrhea in the first two years of life using culture-independent diagnostics in 8 low-resource settings

Culture-independent diagnostics have revealed a larger burden of Shigella among children in low-resource settings than previously recognized. We further characterized the epidemiology of Shigella in the first two years of life in a multisite birth cohort. We tested 41,405 diarrheal and monthly non-diarrheal stools from 1,715 children for Shigella by quantitative PCR. To assess risk factors, clinical factors related to age and culture positivity, and associations with inflammatory biomarkers, we used log-binomial regression with generalized estimating equations. The prevalence of Shigella varied from 4.9%-17.8% in non-diarrheal stools across sites, and the incidence of Shigella-attributable diarrhea was 31.8 cases (95% CI: 29.6, 34.2) per 100 child-years. The sensitivity of culture compared to qPCR was 6.6% and increased to 27.8% in Shigella-attributable dysentery. Shigella diarrhea episodes were more likely to be severe and less likely to be culture positive in younger children. Older age (RR: 1.75, 95% CI: 1.70, 1.81 per 6-month increase in age), unimproved sanitation (RR: 1.15, 95% CI: 1.03, 1.29), low maternal education (<10 years, RR: 1.14, 95% CI: 1.03, 1.26), initiating complementary foods before 3 months (RR: 1.10, 95% CI: 1.01, 1.20), and malnutrition (RR: 0.91, 95% CI: 0.88, 0.95 per unit increase in weight-for-age z-score) were risk factors for Shigella. There was a linear dose-response between Shigella quantity and myeloperoxidase concentrations. The burden of Shigella varied widely across sites, but uniformly increased through the second year of life and was associated with intestinal inflammation. Culture missed most clinically relevant cases of severe diarrhea and dysentery.

Introduction
Shigella is the second leading cause of diarrhea morbidity and mortality among children in low and middle-income countries, accounting for approximately 60,000 deaths in 2016 [1]. An invasive Gram-negative rod, Shigella has a low infectious inoculum, and both fecal-oral and direct person-to-person transmission can occur [2]. Shigella is strongly associated with dysentery; correspondingly, the WHO guidelines recommend treatment of all pediatric cases of dysentery with ciprofloxacin or azithromycin for presumed Shigella infection [3].
The recent use of quantitative PCR for Shigella detection revealed a more than five times higher burden of Shigella-attributable diarrhea among children in low-resource settings than previously recognized using culture-based diagnostics [4][5][6]. Importantly, the majority of Shigella burden was associated with watery diarrhea, not dysentery [5]. A recent meta-analysis showed that the proportion of Shigella infections that present with dysentery has been decreasing, and that Shigella infections overall had a stronger association with mortality than Shigella-associated cases of dysentery [7]. Furthermore, even in the absence of diarrheal symptoms, Shigella has been associated with impaired linear growth [6,8,9]. WHO treatment guidelines do not currently recommend treatment for the majority of Shigella infections that may be associated with adverse outcomes, such that there may be missed treatment opportunities. Increasing rates of fluoroquinolone and macrolide resistance have highlighted the need for novel interventions, and particularly increased the urgency of the development of a Shigella vaccine [10], which may offer a more sustainable solution.
Given our recently revised understanding of the magnitude of Shigella disease burden and in preparation for Shigella vaccine trials, a better understanding of the epidemiology of Shigella infections among children in low-resource settings is needed. We describe the burden, diagnostic and clinical characteristics, risk factors, and seasonality of Shigella in the first two years of life in 8 low-resource settings.

Methods
The MAL-ED study was conducted in eight sites: Dhaka (Bangladesh), Vellore (India), Bhaktapur (Nepal), Naushero Feroze (Pakistan), Venda (South Africa), Haydom (Tanzania), Fortaleza (Brazil), and Loreto (Peru), as previously described [11]. Briefly, between November 2009 and February 2012, children were recruited within 17 days of birth if maternal age was ≥ 16 years, their family intended to remain in the area for 6+ months, the child was from a singleton pregnancy, birthweight was ≥ 1500 g, the child was not diagnosed with severe disease, and their siblings were not in the study. Fieldworkers conducted active surveillance for child illnesses, antibiotic use, and feeding practices twice weekly until two years of age. Anthropometry was measured monthly. Diarrheal stool samples were collected during diarrhea, defined by maternal report of three or more loose stools in 24 hours or one stool with visible blood. Clinical characteristics, including blood observed in stool, were caregiver-reported. Severe diarrhea was defined using the CODA score, which has been previously validated against hospitalization [12,13] and is more appropriate for Shigella than the Vesikari score, which was validated for rotavirus. Non-diarrheal stool samples were collected monthly (at least 3 days from a diarrhea episode). Weight-for-age (WAZ) and length-for-age z-scores (LAZ) were calculated using 2006 WHO child growth standards [14]. Socioeconomic status (SES) was summarized using a construct of water, assets, maternal education, and income and was averaged over 4 biannual surveys [15].

Analysis of stool specimens
Total nucleic acid was extracted from stool specimens from children who completed 2 years of follow-up using the QIAamp Fast DNA Stool Mini Kit (Qiagen), as previously described [16]. The extrinsic controls, phocine herpesvirus and bacteriophage MS2, were used to monitor the efficiency of extraction and amplification. Quantitative PCR with custom-designed TaqMan Array Cards was used to detect 29 enteropathogens using the AgPath One Step real-time PCR kit (Thermo-Fisher), as described elsewhere [5,17]. Shigella was detected using the ipaH gene, and as in previous work [5,6], we interpret ipaH detections as diagnostic of Shigella even though both Shigella and enteroinvasive E. coli are detected using the ipaH target. Shigella species were detected using periplasmic protein, O-antigen, and type 3 restriction enzyme genes (S. flexneri) and a methylase gene (S. sonnei), as previously described [4]. Shigella infection was defined by ipaH qPCR cycle threshold (Cq) < 35, and quantity was defined by log10 copy numbers per gram of stool based on the Cq, as previously described [6]. Among ipaH positive stools, speciation assays were considered positive when Cq < 40 since the speciation assays target single copy genes and are therefore less sensitive than the ipaH assay. Shigella-attributable diarrhea episodes were identified using attributable fractions (AFe) to adjust for subclinical pathogen infections, as previously described [4,5]. We defined episodes as Shigella-attributable when the Shigella quantity-derived AFe was ≥ 0.5 (i.e. majority attribution). Shigella was also previously detected by culture using standard protocols across sites in all diarrheal stools and non-diarrheal stools collected monthly in the first year and quarterly in the second year [18].
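The case definitions above reduce to three thresholds, which the following minimal Python sketch encodes. The function names are hypothetical and are not taken from the study's analysis code. The helper relating AFe to an odds ratio assumes the standard attributable-fraction-among-the-exposed form, AFe = 1 - 1/OR, which the text does not state explicitly; the quantity-specific odds ratios themselves come from the models described in [4,5].

```python
from typing import Optional

IPAH_CQ_CUTOFF = 35.0        # ipaH detection threshold stated in the text
SPECIATION_CQ_CUTOFF = 40.0  # speciation assay threshold stated in the text
AFE_CUTOFF = 0.5             # majority-attribution threshold stated in the text


def is_shigella_positive(ipah_cq: Optional[float]) -> bool:
    """True when ipaH amplified with Cq below 35 (None means no amplification)."""
    return ipah_cq is not None and ipah_cq < IPAH_CQ_CUTOFF


def is_speciated(ipah_cq: Optional[float], species_cq: Optional[float]) -> bool:
    """Speciation is only interpreted among ipaH positive stools, at Cq below 40."""
    return (
        is_shigella_positive(ipah_cq)
        and species_cq is not None
        and species_cq < SPECIATION_CQ_CUTOFF
    )


def afe_from_odds_ratio(odds_ratio: float) -> float:
    """ASSUMPTION: AFe = 1 - 1/OR (attributable fraction among the exposed)."""
    return 1.0 - 1.0 / odds_ratio


def is_attributable(afe: float) -> bool:
    """Majority attribution: AFe of at least 0.5."""
    return afe >= AFE_CUTOFF
```

Under the assumed formula, the AFe ≥ 0.5 rule is equivalent to a quantity-specific odds ratio of at least 2.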

Statistical analysis
We analyzed all stool samples with valid qPCR results for Shigella (97.1%, n = 41,405 of 42,630 samples with sufficient stool collected). We estimated the incidence of Shigella-attributable diarrhea (AFe ≥ 0.5) using Poisson regression and reweighted estimates from the number of episodes tested to the total number of episodes identified by surveillance. We estimated the relative risks of an episode presenting with each clinical characteristic in the first versus second year of life, and for children's first versus subsequent Shigella diarrhea episodes, using log-binomial regression with generalized estimating equations (GEE) to account for correlated episodes within children, adjusting for site.
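The reweighting step can be illustrated with a short Python sketch. This is not the study's Poisson regression: it is a simplified crude-rate calculation with a log-scale Wald interval standing in for the model-based confidence interval, and the function name and all input numbers are hypothetical.

```python
import math
from typing import Tuple


def incidence_per_100_child_years(
    attributable_tested: int,
    episodes_tested: int,
    episodes_total: int,
    child_years: float,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Crude incidence per 100 child-years, reweighted for untested episodes.

    The weight scales attributable episodes among tested stools up to all
    episodes identified by surveillance. The interval is a simple log-scale
    Wald approximation (an assumption, not the paper's regression model).
    """
    weight = episodes_total / episodes_tested
    weighted_cases = attributable_tested * weight
    rate = 100.0 * weighted_cases / child_years
    se_log = 1.0 / math.sqrt(attributable_tested)
    return rate, rate * math.exp(-z * se_log), rate * math.exp(z * se_log)


# Hypothetical inputs: 50 attributable episodes among 400 tested of 500 total,
# observed over 200 child-years of follow-up.
rate, lower, upper = incidence_per_100_child_years(50, 400, 500, 200.0)
print(f"{rate:.2f} per 100 child-years ({lower:.2f}, {upper:.2f})")
```

The weight here is 500/400 = 1.25, so the 50 observed attributable episodes count as 62.5 episodes before dividing by follow-up time.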
We assessed diagnostic test characteristics of Shigella culture compared to qPCR as the gold standard among all stools, among attributable diarrheal stools, and among attributable dysenteric stools by site. We further estimated the associations between clinical characteristics of attributable episodes and culture positivity using log-binomial regression with GEE and in univariable (adjusting for site) and multivariable (adjusting for site and other clinical characteristics) models.
To identify risk factors for Shigella infection in both non-diarrheal and diarrheal stools, we included sociodemographics, household, and child-level variables as potential risk factors in univariable log-binomial regression models with GEE, adjusting only for site. We then estimated adjusted associations in a multivariable model with a subset of risk factors that were either statistically significant (p<0.05) or had a risk ratio with magnitude greater than 1.2 or less than 0.83 in the univariable models, overall and by site. For covariates measuring similar constructs (e.g. anthropometric measurements and recent antibiotic use), we included the variables with complete data and/or larger magnitudes of effect in the multivariable model.
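The screening rule for carrying a variable into the multivariable model can be written as a one-line predicate. The sketch below is illustrative only: the function name and the candidate values are hypothetical, and the separate step of choosing among covariates that measure similar constructs is not modeled.

```python
def keep_for_multivariable(risk_ratio: float, p_value: float) -> bool:
    """Screen a univariable result using the thresholds stated in the text.

    A variable is retained if it is statistically significant (p < 0.05) or
    if its risk ratio is above 1.2 or below 0.83.
    """
    return p_value < 0.05 or risk_ratio > 1.2 or risk_ratio < 0.83


# Hypothetical univariable results: (name, risk ratio, p-value).
candidates = [
    ("variable_a", 1.15, 0.01),  # retained: significant
    ("variable_b", 1.30, 0.20),  # retained: large magnitude
    ("variable_c", 0.80, 0.30),  # retained: large protective magnitude
    ("variable_d", 1.05, 0.40),  # dropped
]
selected = [name for name, rr, p in candidates if keep_for_multivariable(rr, p)]
print(selected)
```

The lower bound of 0.83 is approximately 1/1.2, so the magnitude criterion treats harmful and protective associations symmetrically on the ratio scale.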
We modeled the seasonality of Shigella detections in non-diarrheal stools at each site using predictions from logistic regression models with linear and quadratic terms for the week of the year (w), and the terms sin(2πw/52), cos(2πw/52), sin(4πw/52), and cos(4πw/52). We estimated the associations of historical monthly average temperature and rainfall from 1982-2012 for towns nearest each site [22] with Shigella using log-binomial regression with effects scaled to compare high versus low temperature and rainfall at each site, defined by the 90th and 10th percentiles of the site-specific distributions.
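The seasonal terms can be made concrete with a small Python sketch that builds the six covariates for a given week and converts a linear predictor into a predicted prevalence. The function names are hypothetical, and no fitted coefficients are shown because none are reported in the text.

```python
import math
from typing import List, Sequence


def seasonal_terms(week: int) -> List[float]:
    """Covariates for week-of-year w: w, w^2, and two harmonic pairs."""
    w = float(week)
    return [
        w,
        w ** 2,
        math.sin(2 * math.pi * w / 52),  # annual cycle
        math.cos(2 * math.pi * w / 52),
        math.sin(4 * math.pi * w / 52),  # semi-annual cycle
        math.cos(4 * math.pi * w / 52),
    ]


def predicted_prevalence(week: int, intercept: float, coefs: Sequence[float]) -> float:
    """Inverse-logit of the linear predictor for the given week."""
    eta = intercept + sum(b * x for b, x in zip(coefs, seasonal_terms(week)))
    return 1.0 / (1.0 + math.exp(-eta))
```

The annual pair can represent a single yearly peak, and adding the semi-annual pair allows two peaks per year, which is the pattern reported for Pakistan and Peru.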
Finally, we assessed the associations between concurrent measurements of biomarkers and Shigella infection and quantity, both in quartiles and continuously per log10 increase in copy numbers per gram of stool based on the Cq value, using linear regression models with GEE and adjusting for site, age, sex, and stool consistency.

Shigella burden
Among 1715 children with 2 years of follow-up, a total of 41,405 stool samples (6,751 diarrheal and 34,654 non-diarrheal samples) were tested for Shigella by qPCR (Table 1). The prevalence of Shigella was 11.5% (n = 4744) overall, and was higher in diarrheal stools than non-diarrheal stools (18.4% vs. 10.1%; p<0.0001). Shigella prevalence in non-diarrheal stools varied from 4.9%-17.8% across sites. Almost half (611/1407, 43.4%) of children with Shigella detected in one or more non-diarrheal stools never had a diarrhea episode in which Shigella was detected, and almost two-thirds (900/1407, 64.0%) never experienced a Shigella-attributable diarrhea episode. The quantity of Shigella detected was approximately half a log higher in diarrheal (6.13 log10 copy numbers per gram of stool) compared to non-diarrheal (5.56 log10 copy numbers) stools, and quantities were lower in Pakistan and South Africa compared to the other sites. The prevalence and quantity of Shigella infection increased with age (Table 1). Shigella was detected at quantities high enough to attribute the episode to Shigella in 11.2% (n = 755) of diarrheal stools. The overall incidence of Shigella-attributable diarrhea was 31.8 cases (95% CI: 29.6, 34.2) per 100 child-years. Tanzania had the third lowest incidence (10.8 cases per 100 child-years, 95% CI: 6.2, 18.5) of Shigella-attributable diarrhea despite having the highest prevalence in stools overall. Incidence of Shigella-attributable diarrhea was higher in older age groups (incidence rate ratio: 1.75, 95% CI: 1.70, 1.81 per 6-month increase in age). Incidence of Shigella-attributable dysentery was lower (4.85 cases per 100 child-years, 95% CI: 4.02, 5.84), but followed similar patterns by age and site. Incidence estimates by disease severity and diagnostic are reported in S1 Table.
By two years of age, 82.0% (n = 1407) of children had been infected with Shigella (Fig 1A), and 29.6% (n = 507) had at least one episode of Shigella-attributable diarrhea (Fig 1B). Median time to first infection was 14.4 (95% CI: 14.0, 15.0) months.
Because the Shigella speciation assays were less sensitive than that for ipaH, results were available for only 31.3% (n = 1245) of ipaH positive stools with speciation testing (n = 3980, 83.9% of ipaH positives). ipaH quantity was 6.4 log10 copy numbers per gram of stool in speciated detections compared to 5.4 log10 copy numbers in non-speciated detections. Among the speciated Shigella-attributable diarrheal stools (n = 258), 58.9% (n = 152) were S. flexneri and 43.8% (n = 113) were S. sonnei. The majority of Shigella-attributable diarrheal stools were S. flexneri in all sites except Brazil, Nepal, and South Africa. The ratio of S. flexneri to S. sonnei (71.9% vs. 31.7%) observed in all (diarrheal and non-diarrheal) stools was similar to that in diarrheal stools (S2 Table).

Clinical characteristics of Shigella diarrhea
Among Shigella-attributable diarrheal episodes, 28.3% (n = 214) were severe and 14.7% (n = 111) had bloody stools. Approximately a third (n = 235, 31.1%) were accompanied by fever, 18.5% (n = 140) by vomiting, 10.1% (n = 76) by dehydration, and 18.9% (n = 143) lasted 7 days or longer. The majority of episodes were antibiotic treated (456, 60.4%), and of these, the majority were treated with a macrolide, cephalosporin, or fluoroquinolone (n = 311/456, 68.2%). However, there was substantial site variability; 97.0% of all macrolide treatment was given in Bangladesh and Peru, and antibiotic treatment was rare in Brazil and South Africa. There were differences in the clinical presentation of shigellosis by age (Table 2). Episodes of Shigella-attributable diarrhea in the first year of life were more likely to be prolonged (aRR: 1.24, 95% CI: 0.88, 1.74), with vomiting (aRR: 1.72, 95% CI: 1.26, 2.35) and with more than 6 loose stools in 24 hours (aRR: 1.41, 95% CI: 1.06, 1.86) compared to episodes in the second year. Episodes in the first year were also more likely to be treated with cephalosporins and less likely to be treated with fluoroquinolones. Adjusting for age, a child's first episode of Shigella-attributable diarrhea was more likely to be accompanied by dehydration (aRR: 1.41, 95% CI: 0.84, 2.36) compared to subsequent episodes. First episodes were also slightly more likely to be prolonged and with vomiting and high frequency of stools, but less likely to be bloody (S3 Table).
Coinfections were detected in almost all Shigella-attributable diarrheal stools (S4 Table). However, a second etiology of diarrhea (i.e. a coinfecting pathogen was detected in a quantity high enough to be associated with diarrhea) was identified in only 38.5% of these episodes (n = 289). The coinfecting pathogen had a higher AFe than Shigella (i.e. potentially the primary etiology) in 12.3% of episodes (n = 92). Episodes with a viral co-etiology were less likely to be bloody (aRR: 0.60, 95% CI: 0.38, 0.96) and more likely to include vomiting (aRR: 1.79, 95% CI: 1.31, 2.46) compared to episodes with Shigella as the only etiology identified. Episodes with another bacterial co-etiology were also less likely to be bloody (aRR: 0.62, 95% CI: 0.35, 1.11; S5 Table).

Performance of Shigella culture
Of 30,678 stools tested by both qPCR and culture, 3,372 (11.0%) were positive by qPCR and 280 (0.9%) were positive by culture. Considering qPCR the gold standard, the overall sensitivity of culture was 6.6% (Table 3). Specificity was uniformly high, at more than 99% at all sites. In the subset of Shigella-attributable diarrheal stools (n = 736), which have higher Shigella quantity detected than all stools, sensitivity improved to 17.5%, and ranged from 0% in South Africa and Tanzania to 24.0% in Peru (Table 3). Sensitivity was even higher in dysentery episodes (27.8%). The sensitivity of culture among Shigella-attributable diarrheal stools without another attributable pathogen identified (20.6%, n = 94/457) was higher than that among Shigella-attributable diarrheal stools with another attributable pathogen identified (12.5%, n = 35/279). Culture positivity was strongly associated with age, such that attributable diarrhea stools among younger children were less likely to be culture positive (Table 4). Presence of blood had the strongest association with culture positivity (aRR: 1.84, 95% CI: 1.28, 2.65), but culture still missed more than 70% of dysentery cases. Diarrhea severity was not associated with culture positivity, and caregiver-reported fever was inversely associated with culture positivity. Recent macrolide treatment was also associated with reduced detection by culture (aRR: 0.57, 95% CI: 0.30, 1.08; Table 4).
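The sensitivity figures in this paragraph can be reproduced from the reported counts. The short Python sketch below treats qPCR as the reference standard, as in the text; the function and variable names are illustrative only.

```python
def sensitivity(culture_positive: int, qpcr_positive: int) -> float:
    """Proportion of reference-positive (qPCR) stools also positive by culture."""
    return culture_positive / qpcr_positive


# Counts reported above for Shigella-attributable diarrheal stools.
sole_etiology = sensitivity(94, 457)    # no other attributable pathogen
co_etiology = sensitivity(35, 279)      # another attributable pathogen present
pooled = sensitivity(94 + 35, 457 + 279)

print(f"sole etiology: {100 * sole_etiology:.1f}%")
print(f"co-etiology:   {100 * co_etiology:.1f}%")
print(f"pooled:        {100 * pooled:.1f}%")

# Share of dysentery episodes missed by culture, from the reported 27.8% sensitivity.
missed_dysentery = 1.0 - 0.278
print(f"missed dysentery: {100 * missed_dysentery:.1f}%")
```

The two strata sum to the 736 attributable stools tested by both methods, and the pooled value matches the 17.5% sensitivity reported for that subset.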

Risk factors for Shigella
In univariable analysis, unimproved sanitation, crowding (2+ people living in a single room), <10 years of maternal education, having 3+ live children, initiating complementary foods before 3 months, recent diarrhea, antibiotic use (particularly fluoroquinolone use), and lower anthropometric measurements prior to sample collection were also associated with higher risk of Shigella infection. [...] (per z-score increase in weight) were the strongest risk factors in multivariable analysis (Table 5).
There was some variability in associations by site; for example, crowding and recent diarrhea were strong risk factors in Brazil, low maternal education had the largest associations in Brazil, India, and South Africa, and recent fluoroquinolone use was only a risk factor in India and Pakistan (S1 Fig). The seasonality of Shigella infections differed by site (Fig 2). Peak prevalence was observed in May/June in Bangladesh, June in Nepal, July/August in India, November in South Africa, and February in Tanzania. Two peaks were observed in Pakistan and Peru, and there was little seasonality in Brazil. Among climatic factors that could explain these heterogeneities, temperature was more strongly associated with Shigella detection than rainfall, though temperature was collinear with rainfall at most sites (Fig 2, S6 Table). Higher temperature was most strongly associated with Shigella in Nepal (aRR: 2.71, 95% CI: 1.96, 3.75) and Tanzania (aRR: 1.92, 95% CI: 1.63, 2.27). Uniquely, higher rainfall was protective in Pakistan (aRR: 0.78, 95% CI: 0.61, 1.00; S6 Table).

Discussion
The burden of Shigella among children under two was heterogeneous across the eight sites, with the absolute burden of infection and illness higher in the South Asian sites and Peru. The burden of Shigella diarrhea relative to subclinical infections also varied. Shigella diarrhea episodes were accompanied by blood in only a minority of episodes, and episodes were generally more severe in the younger children. In the minority of Shigella diarrhea episodes that were also attributable to another pathogen, clinical phenotypes were often mixed; for example, episodes with viral co-etiologies predictably presented with more vomiting. An 11-fold higher detection of Shigella was observed with qPCR compared to culture, including a 3-fold increase for Shigella-attributable dysentery. While higher sensitivity of culture among more severe cases has been previously noted [23], culture still missed the majority of cases of Shigella diarrhea, severe diarrhea, and dysentery, and culture had the lowest sensitivity among young children, who are at highest risk for poor outcomes. These results highlight the need for more sensitive diagnostic tools.
The analyses of Shigella risk factors were consistent with prior work, which identified maternal education, exclusive breastfeeding, and larger WAZ as protective [24][25][26], and found similar trends with age [24,26,27]. Undernourished children were more likely to be infected, and interestingly, the seasonality of Shigella in Tanzania mirrored the seasonality of malnutrition, previously described [28]. The identification of unimproved sanitation as a risk factor, alongside seasonal patterns that correlate strongly with average temperature and rainfall, suggests environmental transmission pathways may be important. The seasonal patterns also support the potential implication of houseflies as a mechanical vector, as housefly population densities are seasonal and have been shown to correlate with Shigella [29]. Surprisingly, recent antibiotic exposure, including to macrolides and fluoroquinolones, which are recommended for the treatment of shigellosis [3], was not associated with reduced Shigella detection. Among several biomarkers that indirectly measure environmental enteric dysfunction (EED), MPO especially, but also AGP, was elevated during Shigella infections, with a dose-response with Shigella quantity. Several previous studies found that high levels of MPO were most predictive of linear growth decrements compared to other biomarkers, including in previous analyses of data from the Bangladesh [30] and Peru [31] MAL-ED sites, and in a birth cohort in Pakistan [32]. The associations between Shigella and MPO, and MPO and growth faltering, suggest a potential mechanism for the impact of Shigella on linear growth previously characterized [6].
This study was limited by the fact that stool samples were not collected and/or tested from all diarrhea episodes [5], such that we may have underestimated the incidence of Shigella diarrhea. Because a second etiology of diarrhea was frequently identified, we were unable to determine whether Shigella was the primary cause in a substantial subset of Shigella-attributable diarrhea episodes. Both Shigella and enteroinvasive E. coli can be detected using the ipaH gene. However, previous speciation [4] and metagenomic work [33] supports the interpretation of these detections as Shigella. In addition, the Shigella speciation assays were insensitive, such that species data were available for only a third of infections. Improvements to the speciation assays have been made since the MAL-ED study; validated real-time PCR assays that can differentiate >80% of ipaH positives regardless of culture positivity and identify a panel of S. flexneri serotypes (including 2a, 3a, and 6) are now available for future studies. Because deaths were rare in this community-based cohort, we could not assess the associations between Shigella and mortality, as in the GEMS study [34]. Finally, site-specific estimates were relatively imprecise given the low numbers of children at each site. Because burden varied substantially by site, the incidence estimates may not be generalizable to other low-resource settings.
The high burden of Shigella disease documented in MAL-ED highlights the potential utility of Shigella vaccines. Almost all children were exposed to Shigella by two years of age in most sites, which suggests a pathogen-specific population-based prevention strategy is warranted. Furthermore, because the incidence of Shigella diarrhea was higher in the second year of life, the vaccine could potentially be given later in infancy and prevent the majority of cases. However, younger children presented with more severe symptoms, suggesting protection early in infancy may be important. Continued monitoring of Shigella epidemiology is needed since incidence trends may change as macrolide antibiotics become more available globally. More than 15 Shigella vaccines are currently in development [10], with some rapidly advancing to evaluation in target populations. Because of the poor sensitivity of culture, use of molecular diagnostics to define outcomes in future vaccine efficacy studies could limit misclassification of the outcome and reduce the sample size required to estimate significant effects.