Abstract
Candida albicans, a pathogenic fungus implicated in early childhood caries (ECC), plays a crucial role in oral health. While colonization usually begins at birth, the extent of maternal involvement in transmitting the yeast to offspring, particularly across racial groups, remains unclear. Studies have shown elevated levels of C. albicans in both mothers and children, with genetically related fungal strains, suggesting maternal transmission. However, the racial component, notably higher levels in Black children, lacks thorough investigation of the underlying factors. Our research aimed to address this gap by investigating how maternal and demographic factors, such as socioeconomic status and oral health, affect C. albicans levels in infants across Black and non-Black populations. We employed a partial linear semiparametric mixed-effects model (PLSMM) with variable selection and race-based stratification. With it, we identified, from a large pool of candidate predictors, those whose effects differ by the infant's race. Through this stratified analysis, we aimed to discern the factors contributing significantly to C. albicans colonization while minimizing the impact of irrelevant or redundant variables. In this stratified analysis, exclusive breastfeeding and maternal marriage were significant predictors among non-Black infants, while maternal employment and post-delivery maternal C. albicans were significant among Black infants. Our findings highlighted race-specific associations between C. albicans levels in children and factors such as breastfeeding practices, marital status, maternal oral hygiene, and maternal C. albicans levels. Our study underscores the importance of race-specific considerations in understanding C. albicans colonization in infants, offering insights for tailored interventions and healthcare strategies, particularly for vulnerable populations.
Citation: Leon S, Alomeir N, Xiao J, Wu TT (2026) Maternal and demographic factors influencing oral Candida albicans in infants: A stratified analysis using a novel partial linear semiparametric mixed-effects model. PLoS One 21(2): e0340317. https://doi.org/10.1371/journal.pone.0340317
Editor: Geelsu Hwang, University of Pennsylvania, UNITED STATES OF AMERICA
Received: August 9, 2025; Accepted: December 18, 2025; Published: February 6, 2026
Copyright: © 2026 Leon et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data cannot be shared publicly as participants did not give consent for their data to be shared in this manner. Since the consent statement approved by the Institutional Review Board (IRB) of the University of Rochester, and signed by the participants, did not include the provision that data would be made publicly available, we do not have participant consent to share this data. Requests for anonymized data can be made to the Institutional Review Board (IRB) of the University of Rochester (https://www.rochester.edu/ohsp/irb-review/).
Funding: NIH/NIDCR R01DE031025 and NSF SCC 2238208.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Emerging research evidence suggests a cariogenic potential of oral Candida species [1–3]. Studies have found a greater prevalence of Candida albicans in children with early childhood caries (ECC) than in caries-free children [3]. Moreover, a positive association was observed between the carriage of C. albicans and cariogenic Streptococcus mutans in the saliva and plaque of ECC-affected children [4]. Koo et al. further revealed that the synergistic relationship between C. albicans and S. mutans is facilitated by glucosyltransferases (GTFs) secreted by S. mutans [5]. When sucrose is available, the enzyme GtfB binds to the surface of C. albicans cells, leading to the production of exopolysaccharides (EPS), predominantly insoluble α-glucan [6]. EPS plays a pivotal role in fostering the interaction between C. albicans and S. mutans. It thereby promotes the development of mixed-species biofilms that lead to more severe dental caries [5]. The synergistic association between C. albicans and S. mutans is linked to exacerbated dental caries in animal models [7] and may indicate a higher rate of caries among children [8].
Oral colonization by C. albicans can start as early as one week after birth [9]. The prevalence of oral Candida colonization is reported to be 45–65% in healthy children and 45% in neonates [10]. A prospective study aimed to determine the extent of maternal involvement in transmitting C. albicans to children [11]. It focused on the genetic relatedness of C. albicans strains within mother–child dyads and on the factors influencing early-life colonization [11]. A key finding was that 94% of mother–child pairs with oral C. albicans shared highly genetically related strains, indicating a strong maternal influence on the child's acquisition of this pathogen [11]. Another study reported a vertical transmission rate of 60% (3 of 5 mother–child pairs in which both mother and child carried C. albicans), again indicating a significant role of the mother in transmitting this yeast species to the offspring [12]. In addition, one month after birth, 60% of the infants had acquired C. albicans through vertical transmission, while the remaining 40% had acquired it through horizontal transmission [11].
Oral health conditions such as dental caries and periodontitis in adults exhibit a higher prevalence among Black populations in the United States than among White populations [13,14]. Based on the Oral Health Surveillance Report data from 2011 to 2016, the prevalence of ECC among children aged 2 to 5 years is higher in non-Hispanic Black children (28%) than in non-Hispanic White children (18%) [14]. This indicates that non-Hispanic Black children have a higher risk of experiencing dental caries in their early years than their non-Hispanic White peers [14]. These data underscore the existence of racial disparities in oral health, particularly in the prevalence of dental caries among young children in the United States. However, the factors underlying the racial disparity in oral C. albicans colonization and carriage in infants, and their association with ECC, remain unexplored.
In this study, we propose the use of a partial linear semiparametric mixed-effects model (PLSMM) [15] to analyze the concentrations of Candida albicans in infants. The PLSMM enhances both flexibility and interpretability while minimizing specification bias. Linear mixed-effects models (LMM) impose strict linearity assumptions. This approach, in contrast, draws from semiparametric mixed-effects models (SMM) and combines the flexibility of nonparametric models with the simplicity of linear regression. By integrating a high-dimensional linear component, the PLSMM addresses the dimensionality issues common in nonparametric models. Instead of traditional methods such as splines or kernels, the PLSMM employs a finite dictionary of basis functions, following the approach introduced by Leon and Wu (2025) [15]. This method automatically selects from a diverse range of basis functions, providing a sparse approximation of the nonparametric function. Previous approaches may compromise flexibility and precision; the PLSMM tackles dimensionality and flexibility simultaneously. A detailed implementation of the estimation procedure is provided in the R package plsmmLasso [16]. Leon and Wu (2025) [15] identified a higher risk of elevated C. albicans levels among Black children compared to non-Black children, as shown in Table 1A. The stratified analysis presented here seeks to deepen this understanding by investigating the factors underlying this racial disparity. Our objective is to explore potential protective or risk factors among Black and non-Black children. The stratified analysis aims to clarify how race-specific factors may interact with Candida albicans concentrations in infants. In doing so, it enhances our understanding of the complex relationship between race, environmental factors, and microbial colonization in early childhood.
This comprehensive approach not only sheds light on the determinants of Candida albicans colonization but also underscores the importance of considering race as a potential influencing factor in epidemiological studies focused on infant health outcomes.
The analyzed dataset consists of 160 mother–infant dyads and 86 covariates. Children were followed at 1, 2, 4, 6, 12, 18, and 24 months between November 22, 2017 and August 20, 2020, and each child has between 2 and 7 observations. The mother–infant dyads were obtained from a parent cohort study that examines the association between oral Candida and early childhood caries onset in children from birth to 2 years of age [17]. Pregnant women and their infants were recruited from patients visiting the University of Rochester Highland Family Medicine or Eastman Institute for Oral Health Perinatal Dental Clinic.
In short, the PLSMM framework enables variable selection, estimation of effect sizes with adjusted p-values, modeling of the nonlinear temporal relationship between Candida albicans concentration and race, and testing for differences in these functions over time. In this paper, we present a new formulation of PLSMM for stratified analyses.
2 Methods
The birth cohort study was approved by the University of Rochester Research Subject Review Board (Protocol #1248). Written informed consent was obtained from all participants, including authorization to access their medical records. For child participants, legal guardians reviewed and signed a consent form for study participation and medical record access. Recruitment occurred between November 2017 and August 2020 at two clinical sites: the University of Rochester Highland Family Medicine (HFM) and the Eastman Institute for Oral Health (EIOH) Perinatal Dental Clinic. The study was concluded in November 2022. To collect relevant data, participants completed self-reported questionnaires covering demographic, socioeconomic, oral health behavior, medical history, and medication use for both mothers and children. This information was verified against the participants’ electronic medical records. The research team had access to identifiable participant data throughout and following the data collection phase.
2.1 Partial Linear Semiparametric Mixed-Effects Model (PLSMM)
We apply the partial linear semiparametric mixed-effects model (PLSMM) proposed by Leon and Wu (2025) [15] to analyze the impact of race on Candida albicans (Ca) concentration. The model captures nonlinear time effects for each race group using distinct basis functions. It also performs variable selection to identify variables potentially linked to Ca concentration. The data were collected between November 22, 2017 and August 20, 2020.
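The PLSMM is defined as follows:

$$\mathbf{y}_i = \mathbf{X}_i\boldsymbol{\beta} + f_{g_i}(\mathbf{t}_i) + \mathbf{1}_{n_i}\alpha_i + \boldsymbol{\varepsilon}_i, \qquad i = 1, \dots, N,$$

The terms are defined as follows:

- $\mathbf{y}_i = (y_{i1}, \dots, y_{in_i})^\top$ is the vector of the $n_i$ C. albicans measurements of subject $i$.
- $\mathbf{t}_i = (t_{i1}, \dots, t_{in_i})^\top$ is the time (in days).
- $g_i$ is the grouping variable (0 for non-Black and 1 for Black).
- $\alpha_i$ is the random intercept for individual subjects, and $\mathbf{1}_{n_i}$ is a vector of $n_i$ ones.
- $\boldsymbol{\varepsilon}_i$ is the random noise.
- The regression coefficients $\boldsymbol{\beta}$, with associated design matrix $\mathbf{X}_i$, capture the effects of the covariates on $\mathbf{y}_i$.
- $f_{g_i}(\mathbf{t}_i) = (f_{g_i}(t_{i1}), \dots, f_{g_i}(t_{in_i}))^\top$ is the semiparametric part of the model, which captures the effect of time for each infant in each group. The function $f_g$ is unknown and needs to be estimated.

The random terms $\alpha_i$ and $\boldsymbol{\varepsilon}_i$ are assumed to have zero conditional means, $\mathbb{E}(\alpha_i \mid \mathbf{X}_i) = 0$ and $\mathbb{E}(\boldsymbol{\varepsilon}_i \mid \mathbf{X}_i) = \mathbf{0}$. They are also assumed to have finite variances $\operatorname{Var}(\alpha_i) = \sigma_\alpha^2$ and $\operatorname{Var}(\boldsymbol{\varepsilon}_i) = \sigma^2\mathbf{I}_{n_i}$.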
Remark 1 (Identifiability). To retain identifiability, we impose a mean-zero constraint requiring both nonlinear functions, $f_0$ and $f_1$, to have mean zero. Details about the constraint can be found in [15].
2.2 Estimation of PLSMM
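The nonlinear function $f_g$ is modeled as a sparse linear combination of the elements of a "dictionary" $\mathcal{D} = \{\psi_1, \dots, \psi_M\}$, which contains a wide range of functions. The approximation is then given by

$$f_g(t) \approx \sum_{m=1}^{M} \theta_{g,m}\,\psi_m(t),$$

where $\boldsymbol{\theta}_g = (\theta_{g,1}, \dots, \theta_{g,M})^\top$ is a sparse vector of coefficients estimated using the lasso. The dictionary can be a collection of basis functions from different bases (e.g., splines with fixed knots, Fourier basis, power functions). For ease of notation and without loss of generality, we assume the observations are sorted by group. The approximation at all observed time points can be written using the block matrix

$$\boldsymbol{\Psi} = \begin{pmatrix} \boldsymbol{\Psi}_0 & \mathbf{0} \\ \mathbf{0} & \boldsymbol{\Psi}_1 \end{pmatrix}, \qquad \boldsymbol{\theta} = \begin{pmatrix} \boldsymbol{\theta}_0 \\ \boldsymbol{\theta}_1 \end{pmatrix},$$

such that $f(\mathbf{t}) \approx \boldsymbol{\Psi}\boldsymbol{\theta}$. Here $\boldsymbol{\Psi}_0$ has rows $(\psi_1(t_{ij}), \dots, \psi_M(t_{ij}))$, with $t_{ij}$ the time point of the $j$th observation of subject $i$ in group 0. Similarly, $\boldsymbol{\Psi}_1$ is built from the time points $t_{ij}$ in group 1. Note that $\boldsymbol{\Psi}_0$ and $\boldsymbol{\Psi}_1$ need not be built from the same dictionary; different sets of basis functions can be used for each group. Thus, $\boldsymbol{\Psi}$ allows two distinct nonlinear functions with possibly very different shapes, one for each group. The first $M$ coefficients of $\boldsymbol{\theta}$ are associated with group 0, and coefficients $M+1$ to $2M$ are associated with group 1. The model can then be written as follows:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\Psi}\boldsymbol{\theta} + \mathbf{Z}\boldsymbol{\alpha} + \boldsymbol{\varepsilon}, \qquad (1)$$

Here $\mathbf{y}$, $\mathbf{X}$, and $\boldsymbol{\varepsilon}$ stack the subject-level quantities, and $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_N)^\top$. The matrix $\mathbf{Z} = \operatorname{diag}(\mathbf{1}_{n_1}, \dots, \mathbf{1}_{n_N})$ is the block-diagonal random-intercept design. The nonparametric component allows for a distinct nonlinear relationship between the longitudinal response and time in each group. This approach allows the shape of the functions to be determined by the data rather than assumed a priori. A rich dictionary provides flexibility, while sparsity in $\boldsymbol{\theta}$ selects only the most relevant basis functions from the dictionary.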
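To make the block structure concrete, the Python sketch below builds $\operatorname{diag}(\boldsymbol{\Psi}_0, \boldsymbol{\Psi}_1)$ for observations sorted by group. The dictionary is an illustrative assumption, as are the helper names `dictionary` and `block_dictionary`. It uses power functions plus sine/cosine pairs on a time axis rescaled by a hypothetical 730-day period. It is not the dictionary used in this study or in plsmmLasso, and it does not apply the mean-zero constraint of Remark 1.

```python
import numpy as np

def dictionary(t, n_poly=3, n_fourier=2, period=730.0):
    """Evaluate a small, hypothetical dictionary at times t (in days).

    Columns: power functions u, u^2, ..., u^n_poly of the rescaled time
    u = t / period, followed by sin/cos pairs with frequencies 1..n_fourier.
    """
    u = np.asarray(t, dtype=float) / period
    cols = [u ** p for p in range(1, n_poly + 1)]
    for k in range(1, n_fourier + 1):
        cols.append(np.sin(2 * np.pi * k * u))
        cols.append(np.cos(2 * np.pi * k * u))
    return np.column_stack(cols)

def block_dictionary(t, g):
    """Return blockdiag(Psi_0, Psi_1) for observations sorted by group g (0/1)."""
    t = np.asarray(t, dtype=float)
    g = np.asarray(g)
    if np.any(np.diff(g) < 0):
        raise ValueError("observations must be sorted by group (all 0s before 1s)")
    psi0 = dictionary(t[g == 0])  # rows for group 0 (non-Black)
    psi1 = dictionary(t[g == 1])  # rows for group 1 (Black)
    zeros01 = np.zeros((psi0.shape[0], psi1.shape[1]))
    zeros10 = np.zeros((psi1.shape[0], psi0.shape[1]))
    return np.block([[psi0, zeros01], [zeros10, psi1]])
```

Consider seven observations, three in group 0 and four in group 1, with M = 7 dictionary elements per group. For this input, `block_dictionary` returns a 7 × 14 matrix: the first seven columns multiply the group-0 coefficients and the last seven multiply the group-1 coefficients. Passing a different dictionary function for each group would give the two blocks different widths, which `np.block` handles as long as the zero blocks are sized accordingly.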
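To estimate the parametric and nonparametric parts of model (1), collectively denoted by $(\boldsymbol{\beta}, \boldsymbol{\theta})$, we adopt a penalized maximum likelihood framework.
The model parameters are estimated using a penalized EM algorithm. The algorithm maximizes a penalized version of the conditional expectation of the complete-data log-likelihood of $(\mathbf{y}, \boldsymbol{\alpha})$. Specifically, at iteration $k$ the algorithm maximizes

$$Q_{\lambda,\gamma}(\boldsymbol{\phi} \mid \boldsymbol{\phi}^{(k)}) = \mathbb{E}\left[\ell(\boldsymbol{\phi}; \mathbf{y}, \boldsymbol{\alpha}) \mid \mathbf{y}, \boldsymbol{\phi}^{(k)}\right] - \lambda\|\boldsymbol{\beta}\|_1 - \gamma\|\boldsymbol{\theta}\|_1 \qquad (2)$$

with respect to the parameter vector $\boldsymbol{\phi} = (\boldsymbol{\beta}^\top, \boldsymbol{\theta}^\top, \sigma_\alpha^2, \sigma^2)^\top$. Here $\lambda, \gamma \ge 0$ are tuning parameters and $\|\cdot\|_1$ is the $\ell_1$ norm. For Gaussian random terms, the complete-data log-likelihood is, up to an additive constant,

$$\ell(\boldsymbol{\phi}; \mathbf{y}, \boldsymbol{\alpha}) = -\sum_{i=1}^{N}\left[\frac{n_i}{2}\log\sigma^2 + \frac{\|\mathbf{y}_i - \mathbf{X}_i\boldsymbol{\beta} - \boldsymbol{\Psi}_i\boldsymbol{\theta} - \mathbf{1}_{n_i}\alpha_i\|_2^2}{2\sigma^2} + \frac{1}{2}\log\sigma_\alpha^2 + \frac{\alpha_i^2}{2\sigma_\alpha^2}\right],$$

where $\|\cdot\|_2$ is the Euclidean norm and $\boldsymbol{\Psi}_i$ denotes the rows of $\boldsymbol{\Psi}$ that correspond to subject $i$. The detailed mathematical derivation of the estimation procedure is provided in [15], where the algorithm was developed for a general variance–covariance matrix of the random effects. Here, we focus on the special case of a single random intercept with independent errors, for which $\operatorname{Var}(\mathbf{y}_i \mid \mathbf{X}_i) = \sigma_\alpha^2\mathbf{1}_{n_i}\mathbf{1}_{n_i}^\top + \sigma^2\mathbf{I}_{n_i}$. This special case leads to simpler expressions for the likelihood terms. Using the EM algorithm to maximize (2), we obtain the estimators of the model parameters, denoted $\hat{\boldsymbol{\phi}}$. The estimation procedure can be carried out using the R package plsmmLasso [16], where the function plsmm_lasso() is used for parameter estimation.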
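In the random-intercept special case with Gaussian random terms, the E-step has a closed form. Let $\mathbf{r}_i = \mathbf{y}_i - \mathbf{X}_i\boldsymbol{\beta} - \boldsymbol{\Psi}_i\boldsymbol{\theta}$ be the partial residuals. The posterior of $\alpha_i$ then has mean $\sigma_\alpha^2\mathbf{1}^\top\mathbf{r}_i/(\sigma^2 + n_i\sigma_\alpha^2)$ and variance $\sigma^2\sigma_\alpha^2/(\sigma^2 + n_i\sigma_\alpha^2)$. The sketch below computes these moments and the corresponding closed-form variance updates. It is a textbook EM step under those assumptions, not the plsmmLasso code. The helper names are hypothetical, and the penalized update of $(\boldsymbol{\beta}, \boldsymbol{\theta})$ is deliberately omitted.

```python
import numpy as np

def e_step(residuals, sigma2_alpha, sigma2):
    """Posterior mean and variance of each random intercept alpha_i.

    residuals: list of 1-D arrays r_i = y_i - X_i beta - Psi_i theta,
    one per subject, evaluated at the current parameter values.
    """
    means, variances = [], []
    for r in residuals:
        denom = sigma2 + len(r) * sigma2_alpha
        means.append(sigma2_alpha * np.sum(r) / denom)   # E[alpha_i | y_i]
        variances.append(sigma2 * sigma2_alpha / denom)  # Var[alpha_i | y_i]
    return np.array(means), np.array(variances)

def update_variances(residuals, means, variances):
    """M-step updates for sigma_alpha^2 and sigma^2 given the E-step moments."""
    sigma2_alpha = float(np.mean(means ** 2 + variances))  # mean of E[alpha_i^2]
    n_total = sum(len(r) for r in residuals)
    # E||r_i - 1 alpha_i||^2 = ||r_i - 1 m_i||^2 + n_i v_i
    expected_sse = sum(np.sum((r - m) ** 2) + len(r) * v
                       for r, m, v in zip(residuals, means, variances))
    return sigma2_alpha, float(expected_sse / n_total)
```

Alternating these two functions with a penalized update of the fixed-effect and dictionary coefficients gives one pass of an EM scheme of the kind described above.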
The model estimation depends on the values of the tuning parameters γ and λ. We select these parameters by performing a grid search and choosing the model that minimizes the Bayesian Information Criterion (BIC) [18]. The BIC allows the relative goodness of fit of different models to be compared. It is grounded in Bayesian statistics and balances the fit of the model to the data against model complexity by penalizing models with many parameters:

$$\mathrm{BIC} = -2\log\mathcal{L}(\hat{\boldsymbol{\phi}}) + k\log n,$$

where $\mathcal{L}(\hat{\boldsymbol{\phi}})$ is the likelihood of the model evaluated at the estimated parameters. In addition, $k$ is the number of non-zero elements in the estimated coefficients $(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{\theta}})$, and $n$ is the number of observations. Pseudocode outlining the complete estimation procedure is presented in [15], and the tuning can be performed using the tune_plsmm() function from the R package plsmmLasso [16].
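The BIC-based grid search can be sketched as follows. The fitted log-likelihoods and counts of non-zero coefficients are hypothetical placeholders, not values from this study. In practice, each pair (λ, γ) would be fitted with the estimation routine; tune_plsmm() performs the tuning in plsmmLasso. The function names here are illustrative.

```python
import math

def bic(loglik, k, n):
    """Bayesian Information Criterion: -2 log L + k log n."""
    return -2.0 * loglik + k * math.log(n)

def select_by_bic(fits, n):
    """Return the (lambda, gamma) key with the smallest BIC, and that BIC.

    fits maps (lambda, gamma) -> (loglik, k), where k is the number of
    non-zero estimated coefficients for that fit.
    """
    scores = {key: bic(loglik, k, n) for key, (loglik, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]

# Hypothetical fit summaries for a 3-point grid and n = 700 observations.
fits = {(0.1, 0.1): (-250.0, 20), (0.5, 0.1): (-260.0, 8), (1.0, 1.0): (-300.0, 2)}
best, score = select_by_bic(fits, n=700)
print(best, round(score, 2))
```

The middle pair wins here. Its log-likelihood is worse than that of the least penalized fit, but it uses 12 fewer non-zero coefficients, and with n = 700 each coefficient costs about 6.55 BIC units.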
The model can be interpreted similarly to a mixed-effects model, with some additional features. The coefficients $\boldsymbol{\beta}$ represent the effects of the covariates. The random intercepts $\alpha_i$ capture the correlation between repeated measurements, and $\boldsymbol{\Psi}\boldsymbol{\theta}$ allows the model to approximate the true underlying nonlinear functions of time. The model has two goals. The first is to estimate and test differences in the nonlinear time trends between groups. The second is to estimate the covariate effects $\boldsymbol{\beta}$ accurately while performing variable selection among these covariates. In the next two sections, we outline the procedures used to carry out inference under this model.
2.3 Post-selection inference
Post-selection inference is used to provide valid statistical inference that accounts for the uncertainty introduced in the variable selection step [19]. We use the debiasing procedure proposed in the paper by Leon and Wu (2025) [15].
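First, the data require transformation. Let $\mathbf{r}_i = \mathbf{y}_i - \boldsymbol{\Psi}_i\hat{\boldsymbol{\theta}}$ be the remainder of the responses with the estimated nonlinear component removed. The estimated covariance of the random components is $\hat{\mathbf{V}}_i = \hat\sigma_\alpha^2\mathbf{1}_{n_i}\mathbf{1}_{n_i}^\top + \hat\sigma^2\mathbf{I}_{n_i}$. Let $\tilde{\mathbf{y}}_i = \hat{\mathbf{V}}_i^{-1/2}\mathbf{r}_i$ and $\tilde{\mathbf{X}}_i = \hat{\mathbf{V}}_i^{-1/2}\mathbf{X}_i$ denote the transformed observations, stacked into $\tilde{\mathbf{y}}$ and $\tilde{\mathbf{X}}$. Then $\tilde{\mathbf{y}} \approx \tilde{\mathbf{X}}\boldsymbol{\beta} + \tilde{\boldsymbol{\varepsilon}}$, with approximately uncorrelated errors of unit variance. The debiased estimate of $\beta_j$ is defined by

$$\hat\beta_j^{\mathrm{d}} = \hat\beta_j + \frac{\mathbf{z}_j^\top(\tilde{\mathbf{y}} - \tilde{\mathbf{X}}\hat{\boldsymbol{\beta}})}{\mathbf{z}_j^\top\tilde{\mathbf{x}}_j},$$

where $\mathbf{z}_j$ is a correction score. It can be calculated using either an additional lasso regression or a quadratic optimization method; here we take the lasso approach on the transformed data. Define the correction score $\mathbf{z}_j = \tilde{\mathbf{x}}_j - \tilde{\mathbf{X}}_{-j}\hat{\boldsymbol{\gamma}}_j$. Here $\tilde{\mathbf{x}}_j$ is the $j$th column of the transformed matrix $\tilde{\mathbf{X}}$, and $\tilde{\mathbf{X}}_{-j}$ contains all columns of $\tilde{\mathbf{X}}$ except the $j$th one. The coefficient vector is

$$\hat{\boldsymbol{\gamma}}_j = \arg\min_{\boldsymbol{\gamma}}\left\{\frac{1}{2n}\|\tilde{\mathbf{x}}_j - \tilde{\mathbf{X}}_{-j}\boldsymbol{\gamma}\|_2^2 + \lambda_j\|\boldsymbol{\gamma}\|_1\right\},$$

with tuning parameter $\lambda_j$.

The two-sided confidence interval at confidence level $1-a$ for $\beta_j$ can be constructed as

$$\left[\hat\beta_j^{\mathrm{d}} - z_{1-a/2}\sqrt{\hat v_j},\; \hat\beta_j^{\mathrm{d}} + z_{1-a/2}\sqrt{\hat v_j}\right],$$

where $z_{1-a/2}$ is the $(1-a/2)$th quantile of a standard normal distribution. The quantity $\hat v_j$ estimates the variance of $\hat\beta_j^{\mathrm{d}}$ and can be obtained using the following empirical variance estimate:

$$\hat v_j = \frac{\sum_{i=1}^{N}\left(\mathbf{z}_{j,i}^\top\hat{\boldsymbol{\varepsilon}}_i\right)^2}{\left(\mathbf{z}_j^\top\tilde{\mathbf{x}}_j\right)^2},$$

Here $\mathbf{z}_{j,i}$ is the $i$th sub-vector of $\mathbf{z}_j$, and $\hat{\boldsymbol{\varepsilon}}_i$ is the $i$th sub-vector of $\hat{\boldsymbol{\varepsilon}} = \tilde{\mathbf{y}} - \tilde{\mathbf{X}}\hat{\boldsymbol{\beta}}$. Post-selection inference is necessary to correct the bias introduced by the variable selection procedure. The debiasing procedure first transforms the data to remove correlations induced by the random effects. This yields a version of the dataset, $(\tilde{\mathbf{y}}, \tilde{\mathbf{X}})$, that is suitable for debiasing. Using these transformed observations, the original coefficient estimates are then adjusted to account for the bias introduced by the selection procedure. This approach ensures that the resulting estimates more accurately reflect the true effects of the predictors. It also allows valid confidence intervals to be constructed for the fixed-effect coefficients. The debiasing procedure can be carried out using the debias_plsmm() function from the R package plsmmLasso [16].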
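The sketch below strings the steps above together for the random-intercept case. It assumes no nonlinear component and fixed, hand-picked tuning parameters. The steps are:

- **Whitening.** It uses the closed form $\hat{\mathbf{V}}_i^{-1/2} = \hat\sigma^{-1}(\mathbf{I} - c_i\mathbf{1}\mathbf{1}^\top/n_i)$ with $c_i = 1 - \hat\sigma/(\hat\sigma^2 + n_i\hat\sigma_\alpha^2)^{1/2}$.
- **Lasso.** A plain coordinate-descent lasso fits the coefficients.
- **Correction score.** The lasso-based correction score $\mathbf{z}_j$ is computed.
- **Variance.** The subject-clustered variance estimate gives the interval.

The function names are hypothetical. This is a simplified illustration, not the debias_plsmm() implementation.

```python
import numpy as np
from statistics import NormalDist

def whiten(r_i, X_i, s2_alpha, s2):
    """Apply V_i^{-1/2}, with V_i = s2_alpha * 11' + s2 * I, to r_i and X_i."""
    n_i = len(r_i)
    c = 1.0 - np.sqrt(s2 / (s2 + n_i * s2_alpha))

    def apply(a):
        a = np.asarray(a, dtype=float)
        # sigma^{-1} (I - c 11'/n_i) a: subtract c times the column means, rescale
        return (a - c * a.mean(axis=0)) / np.sqrt(s2)

    return apply(r_i), apply(X_i)

def lasso_cd(X, y, lam, n_iter=300):
    """Coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)  # y - X @ b with b = 0 (copied)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = X[:, j] @ resid / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (b[j] - new)  # keep resid = y - X @ b
            b[j] = new
    return b

def debias_coef(Xt, yt, beta_hat, j, subject, lam_j, level=0.95):
    """Debiased estimate of beta_j with a subject-clustered confidence interval."""
    xj = Xt[:, j]
    X_minus = np.delete(Xt, j, axis=1)
    z = xj - X_minus @ lasso_cd(X_minus, xj, lam_j)  # lasso correction score z_j
    resid = yt - Xt @ beta_hat
    denom = z @ xj
    b_d = beta_hat[j] + z @ resid / denom
    # Sum over subjects of (z_{j,i}' eps_i)^2, divided by (z_j' x_j)^2
    num = sum((z[subject == i] @ resid[subject == i]) ** 2
              for i in np.unique(subject))
    se = np.sqrt(num) / abs(denom)
    q = NormalDist().inv_cdf(0.5 + level / 2)
    return b_d, (b_d - q * se, b_d + q * se)
```

To use it, whiten each subject's residual vector and covariate block, stack them, fit `lasso_cd` for $\hat{\boldsymbol\beta}$, and call `debias_coef` for each coefficient of interest. The whitening step is what makes the plain lasso and the sandwich-type variance appropriate for correlated repeated measures.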
2.4 Testing of the nonparametric component
We first perform an overall test to compare the nonlinear functions between the two groups. The hypotheses are

$$H_0: f_0(t) = f_1(t) \ \text{for all } t \qquad \text{versus} \qquad H_1: f_0(t) \neq f_1(t) \ \text{for some } t.$$

To quantify the difference, we consider the L2 norm of the difference between the two nonlinear functions f0 and f1. This norm tends to be small when the null hypothesis H0 holds and, conversely, relatively large when H0 is false. We propose using the following test statistic:

$$T = \|f_0 - f_1\|_{L_2} = \left(\int_{\mathcal{T}}\{f_0(t) - f_1(t)\}^2\,dt\right)^{1/2},$$

where $\mathcal{T}$ denotes the observed time range. The observed test statistic is then calculated by replacing f0 and f1 with their estimates $\hat f_0$ and $\hat f_1$. L2 norm-based tests have previously been used in similar contexts, such as [20]. Intuitively, this test defines a metric that quantifies the overall difference between the two estimated functions. If the functions are similar, the metric is small, whereas larger values indicate greater differences.
The distribution of this L2 norm-based test statistic is generally unknown. In order to assess its statistical significance, we employ a bootstrapping method as described in the paper by Leon and Wu (2025) [15].
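A numerical version of the statistic on a time grid, together with a generic bootstrap p-value, can be sketched as below. Two choices here are assumptions: the trapezoidal rule and the add-one p-value convention. The function names are hypothetical. How the null bootstrap replicates are generated follows [15] and is not reproduced here.

```python
import numpy as np

def l2_distance(t_grid, f0_vals, f1_vals):
    """Trapezoidal approximation of ||f0 - f1||_{L2} over the grid t_grid."""
    t = np.asarray(t_grid, dtype=float)
    d2 = (np.asarray(f0_vals, dtype=float) - np.asarray(f1_vals, dtype=float)) ** 2
    integral = np.sum(0.5 * (d2[1:] + d2[:-1]) * np.diff(t))
    return float(np.sqrt(integral))

def bootstrap_pvalue(t_obs, t_boot):
    """Add-one bootstrap p-value: share of null replicates at least as large as t_obs."""
    t_boot = np.asarray(t_boot, dtype=float)
    return (1 + np.sum(t_boot >= t_obs)) / (len(t_boot) + 1)
```

Evaluating both estimated functions on the same fine grid keeps the integral approximation error small relative to the sampling variability of the estimates.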
Frequently, there is also interest in constructing joint confidence intervals to assess the difference between the functions at a specific time point, say $t^*$. Following the methodology of Leon and Wu (2025) [15], we construct bootstrapped joint confidence intervals at confidence level $1-a$ based on $B$ bootstrap samples, given by

$$\bar d(t^*) \pm q_{1-a}\,\hat s(t^*),$$

where $q_{1-a}$ is the $(1-a)$th quantile of $M_b$, $b = 1, \dots, B$, with

$$M_b = \max_{t}\frac{|\hat d_b(t) - \bar d(t)|}{\hat s(t)}.$$

Here $\hat d_b(t) = \hat f_0^{(b)}(t) - \hat f_1^{(b)}(t)$ and $\bar d(t) = B^{-1}\sum_{b=1}^{B}\hat d_b(t)$. The quantity $\hat s(t)$ is the bootstrap standard deviation of $\hat d_b(t)$, and $\hat f_g^{(b)}$ is the estimated function of group $g$ for bootstrap sample $b$.
Conceptually, $\bar d(t^*)$ represents the average estimated difference between the two functions across the bootstrap samples at time point $t^*$. The quantity $M_b$ is the maximum standardized deviation across all time points. It ensures that the resulting joint confidence intervals capture the uncertainty over the entire time range simultaneously. By taking the maximum, the intervals maintain the nominal coverage probability jointly over all time points, effectively accounting for multiple comparisons across time. Both tests can be conducted using the test_f() function from the R package plsmmLasso [16].
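Consider a matrix of bootstrap difference curves (estimated $f_0$ minus estimated $f_1$ for each bootstrap sample) evaluated on a common time grid. From it, the sup-type joint band described above can be computed as in the following sketch. Three details are assumptions rather than details taken from [15]: the grid, the sample standard deviation used for the pointwise scale, and NumPy's default quantile interpolation. The function name is hypothetical.

```python
import numpy as np

def joint_band(d_boot, level=0.95):
    """Simultaneous (sup-t) band for f0 - f1 from bootstrap difference curves.

    d_boot: array of shape (B, T); row b holds the estimated f0 minus the
    estimated f1 for bootstrap sample b, evaluated on a common grid of T times.
    Returns (lower, upper, M, q).
    """
    d_boot = np.asarray(d_boot, dtype=float)
    d_bar = d_boot.mean(axis=0)                      # average difference at each t
    s = d_boot.std(axis=0, ddof=1)                   # bootstrap standard deviation at each t
    M = np.max(np.abs(d_boot - d_bar) / s, axis=1)   # max standardized deviation per replicate
    q = np.quantile(M, level)                        # joint critical value
    return d_bar - q * s, d_bar + q * s, M, q
```

A time point would be flagged as showing a significant difference when the band at that point excludes 0. Because q comes from the maximum over the grid, it is larger than the pointwise normal quantile of 1.96.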
3 Results
3.1 Data description
The study sample consists predominantly of low-income patients, with 54% of the children identified as Black. The demographic comparison between Black and non-Black mothers reveals several key patterns, as detailed in Table 1.
Maternal age and rates of college education were similar across the two groups. However, Black mothers had higher employment rates and lower marriage rates than non-Black mothers. A greater proportion of non-Black mothers reported not brushing their teeth on a daily basis. Clinical oral health indicators were comparable between groups, including the proportions with plaque index scores and salivary Candida level 3. Post-delivery Candida infections occurred at similar rates, though oral antifungal treatment was less common among Black mothers. Furthermore, the proportion of male infants was higher in the Black group (54% vs 45%).
3.2 Stratified analysis
The overall trends of our model can be seen in Fig 1. These trends exhibit a similar pattern across groups: a rapid increase followed by a decrease, then a gradual rise and finally a slow decline. The main distinction lies in the magnitudes of these changes. Black children start at a slightly higher baseline value, and their increases are larger, consistently placing their trend above that of the non-Black group.
The effect of time can be seen in the second plot, where the trends mostly overlap, with the greatest difference observed at months 1 and 2. To facilitate hypothesis testing of the nonlinear functions, we conducted 10,000 bootstrap resamples, generating corresponding estimated trajectories for each group. From these resampled trajectories, joint confidence bands were constructed to test the difference between $\hat f_0$ and $\hat f_1$, as presented in the final plot of Fig 1. The joint confidence bands for the difference between the nonlinear functions show no significant difference at the observed time points: at every time point, the band covers 0. Additionally, an overall test comparing these nonlinear functions yields a p-value of 0.49, suggesting insufficient evidence to claim that the time trends differ significantly between the groups.
To conduct a stratified analysis, we create a design matrix that allows separate coefficients to be estimated for each subgroup, or "stratum", defined by the categorical variable race. The results of the stratified analysis, presented in Table 2, reveal important factors associated with the concentration of C. albicans. Four predictors reached statistical significance; the interpretation of the remaining six, which did not, is therefore descriptive. Infants of non-Black mothers who exclusively breastfeed or are married have significantly lower levels of C. albicans, indicating a protective effect against Candida colonization. Conversely, infants of Black mothers with Candida detected post-delivery show increased C. albicans concentrations, suggesting a risk factor; this association is not observed for non-Black children. This finding is notable because the proportions of non-Black and Black mothers with Candida detected post-delivery are similar; a test of this difference was not significant. One possible explanation for the discrepancy is vertical transmission. Such transmission may be more prevalent among Black mothers, leading to increased transmission of Candida albicans to their infants.
The estimates are the coefficients derived from our lasso procedure. The debiased coefficients are the values obtained after applying an adapted debiasing method, which allows p-values and confidence intervals to be computed.
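One way to build such a stratified design is to keep a race main effect and duplicate each covariate into a non-Black column and a Black column. Each copy is zeroed outside its stratum, so the lasso and debiasing steps yield a separate coefficient per stratum. The column layout, the function name, and the covariate names below (`breastfed`, `married`) are hypothetical illustrations, not the exact design used for Table 2.

```python
import numpy as np

def stratified_design(X, black, names):
    """Race main effect plus one copy of each covariate per stratum.

    X: (n, p) covariates; black: length-n 0/1 indicator (1 = Black infant).
    Each covariate appears twice, zeroed outside its stratum, so it receives
    a separate coefficient for non-Black and for Black infants.
    """
    X = np.asarray(X, dtype=float)
    g = np.asarray(black, dtype=float)[:, None]
    D = np.hstack([g, X * (1.0 - g), X * g])
    cols = (["race_black"]
            + [f"{name}:nonblack" for name in names]
            + [f"{name}:black" for name in names])
    return D, cols
```

With this layout, the `race_black` coefficient plays the role of the residual race effect discussed below. Each `:nonblack` or `:black` coefficient is the effect of that covariate within one stratum.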
There is no significant association between C. albicans concentration and having a Black mother with a college-level education or higher, although this factor could be protective. Several factors could pose risks although they are not statistically significant. These are Black mothers not brushing their teeth daily, receiving oral antifungal prescriptions within six months post-pregnancy, or having a saliva C. albicans level of 400 or higher. The effect of race alone is not statistically significant, although Black infants tend to have higher C. albicans concentrations (estimated coefficient 0.49).
Although some coefficients lack statistical significance individually, they were all selected by the lasso procedure, enhancing the model's predictive accuracy. This implies that while certain variables may not reach traditional significance thresholds on their own, their inclusion improves the model's ability to predict C. albicans concentration. It is not unexpected that race did not reach statistical significance in the stratified analysis, even though it was a strong predictor in the non-stratified model (Table 1A, p = 0.0002). The stratified analysis decomposes the racial effect into contributions from other predictors. This effectively accounts for potential confounding and mediation by socio-economic, environmental, and genetic factors. As a result, the direct effect of race is attenuated, suggesting that race itself is unlikely to be the primary driver of elevated C. albicans levels in Black infants. Instead, the observed racial disparities appear largely mediated or confounded by other variables included in the model. In this context, the race coefficient captures the residual effect of race after accounting for the other predictors in the model. It represents the part of the racial difference that the considered predictors leave unexplained.
4 Discussion
Relatively little is known about the relationship between race and ethnicity and oral candidiasis. Our study investigated maternal factors that affect the carriage level of infant oral C. albicans, a fungus linked to early childhood tooth decay. We paid particular attention to racial differences, since Black children show higher infection rates. Using advanced statistical modeling, we analyzed how maternal and demographic factors influence C. albicans carriage differently across Black and non-Black populations. The key findings revealed race-specific patterns in the factors that protect infants from higher oral C. albicans carriage. For non-Black infants, exclusive breastfeeding and maternal marriage were strongly protective factors. In contrast, for Black infants, maternal employment served as a protective factor, while having a mother with post-delivery Candida infection increased risk.
Our research found a notably higher prevalence of C. albicans in Black infants. This aligns with the observation by Jenks et al. that Black individuals are more susceptible to both superficial and invasive Candida infections [21]. The researchers noted that the exact causes of this increased susceptibility remain uncertain. They suggested it might arise from broader social determinants of health, inequalities in healthcare access, and various socioeconomic conditions rather than strictly genetic or biological factors [21]. Our study suggests that, for Black infants, maternal employment might protect against the transmission of C. albicans from mothers to infants. Possible reasons include better healthcare access and hygiene practices and less frequent mother–infant contact, which would minimize opportunities for vertical transmission. C. albicans detection among women in the postpartum period is consistent across racial groups. Nevertheless, its presence in Black mothers significantly raises the risk of transmission to infants through close mother–infant contact, such as breastfeeding. We previously analyzed the genetic relatedness of C. albicans isolated from both mothers and children and reported a high rate of sharing of genetically identical or highly related strains [11]. Previous studies emphasized the role of poor maternal oral hygiene in increasing C. albicans transmission from mother to child. However, we did not observe any racial differences in plaque index, an indicator of the effectiveness of oral hygiene. Additionally, horizontal transmission sources such as daycare also pose a risk for C. albicans acquisition in infants. This shows that both the transmission point and contact frequency are crucial in C. albicans transmission dynamics [11]. Our results highlight a significantly elevated postpartum Candida risk in infants born to Black mothers.
Disparities driven by social determinants are especially pronounced in communities where crowded living conditions, limited access to high-quality postpartum care, and other socioeconomic challenges are more prevalent. Infants in these communities are more likely to be exposed to infectious agents after delivery [22]. In addition, some mothers may continue to carry Candida after delivery, which can be transferred to the infant through vertical transmission by kissing, breastfeeding, and other means. A systematic review by Jang et al. found that pregnant women in their third trimester have higher salivary Candida carriage than non-pregnant women [23].
Among non-Black infants, breastfeeding showed a significant protective association against the carriage of C. albicans. Possible explanations include:

a) Breast milk contains a rich supply of antibodies (especially IgA), immune cells, and other factors that play a critical role in shaping the infant's immune system. These components can protect infants from infections and diseases by directly fighting pathogens and modulating the infant's immune responses [24].

b) Compared with some formula milk, breast milk carries a lower risk of excessive sugar exposure. Excessive sugar intake from early life can lead to a higher risk of obesity, diabetes, and other metabolic diseases, indirectly affecting the child's immune health [25].

Accordingly, the elevated risk of caries and C. albicans colonization in Black infants may be partly attributable to lower rates of breastfeeding initiation and duration among Black mothers. These rates are lower than those observed for mothers from other racial or ethnic groups [26]. Furthermore, married mothers may experience protective benefits from a combination of factors, including financial stability, social support, and legal protections associated with marriage. These benefits can contribute to better outcomes for children and the family as a whole [27]. In our study, infants of married non-Black mothers had significantly lower levels of C. albicans, indicating a protective effect against Candida colonization.
Black, Hispanic, and other minority women in the United States continue to have lower breastfeeding rates than White women. These disparities are influenced by various factors, including socioeconomic status, education level, cultural beliefs, and access to healthcare and breastfeeding support [26]. Black women have the lowest rates of breastfeeding initiation and continuation at 6 and 12 months of all racial/ethnic groups [26]. Consistent with these findings, we observed a lower prevalence of exclusive breastfeeding among Black infants. The small number of exclusively breastfed Black infants in our study may explain why exclusive breastfeeding was not statistically significant for this group. The protective benefits of exclusive breastfeeding are expected to apply across all racial groups. Another possible explanation for the lack of significance is therefore that vertical transmission may be more common among Black mothers. While exclusive breastfeeding provides numerous benefits, the close contact involved may inadvertently increase the risk of vertical transmission. This is consistent with the debiased coefficients for exclusive breastfeeding: -0.59 for Black mothers (not statistically significant) versus -1.06 for non-Black mothers (statistically significant).
Utilizing the PLSMM framework enables us to achieve several key objectives: (1) perform variable selection to identify potential associations with Candida albicans concentration in each stratum; (2) estimate the effect sizes and obtain adjusted p-values for the selected variables; (3) model the relationship between Candida albicans concentration and race over time; (4) model the nonlinear pattern of the temporal effects; and (5) test for differences in the nonlinear functions globally and at specific time points. In a PLSMM, the longitudinal response is modeled as a function of time, which is treated as a continuous variable. The fit of the time component is data-driven and places no restriction on its shape, so a nonlinear relationship between the longitudinal response and time is allowed while the predictors enter the model linearly. This is particularly advantageous for this application, as the function of time appears to be nonlinear and non-smooth. Alternatives to PLSMM generally assume a certain degree of smoothness in the underlying function, and if the true function is not smooth, their estimators may fail to capture the underlying structure accurately. PLSMM adds flexibility to the model specification and has demonstrated superiority over splines in scenarios where the nonparametric component involves both smooth and non-smooth functions [28].
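For readers less familiar with this model class, a generic partial linear mixed-effects specification can be written as follows. This is a sketch under stated assumptions, not the exact specification used in [15]: we assume a single subject-level random intercept $b_i$, Gaussian errors, and a time function $f$ approximated by a dictionary of basis functions $\phi_1,\dots,\phi_K$ that mixes smooth and non-smooth shapes, in the spirit of the dictionary/LASSO approach of [28]. The symbols $y_{ij}$, $\mathbf{x}_{ij}$, $\theta_k$, $\ell$, and the tuning parameters $\lambda_1,\lambda_2$ are our own illustrative notation, and within the stratified analysis the model would be fitted separately in each stratum.

```latex
% Illustrative notation (assumed): y_{ij} = response of infant i at visit j,
% t_{ij} = visit time, x_{ij} = maternal/demographic predictors (enter linearly),
% b_i = random intercept, \ell = marginal log-likelihood.
\begin{aligned}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + f(t_{ij}) + b_i + \varepsilon_{ij},
  \qquad b_i \sim \mathcal{N}(0,\sigma_b^2),\quad \varepsilon_{ij} \sim \mathcal{N}(0,\sigma^2),\\
f(t) &\approx \sum_{k=1}^{K} \theta_k\,\phi_k(t),\\
(\hat{\boldsymbol{\beta}},\hat{\boldsymbol{\theta}}) &= \arg\min_{\boldsymbol{\beta},\boldsymbol{\theta}}
  \Big\{ -\ell(\boldsymbol{\beta},\boldsymbol{\theta};\sigma_b^2,\sigma^2)
  + \lambda_1\lVert\boldsymbol{\beta}\rVert_1 + \lambda_2\lVert\boldsymbol{\theta}\rVert_1 \Big\}.
\end{aligned}
```

In this sketch, the $\ell_1$ penalty on $\boldsymbol{\beta}$ performs variable selection among the predictors (objective 1), while the penalty on $\boldsymbol{\theta}$ selects the dictionary elements that describe the temporal trend (objective 4). Because the dictionary may contain non-smooth elements, the fitted $f$ is not forced to be smooth, which is the property contrasted with spline-based alternatives above.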
Overall, the differences in C. albicans carriage between Black and non-Black infants appear to reflect complex transmission pathways within distinct social contexts rather than isolated factors. Interpreting them requires an integrated framework that treats social determinants as foundational to both vertical transmission (during pregnancy and delivery) and horizontal transmission (post-birth exposure). The race-specific predictors we identified are consistent with this model: exclusive breastfeeding and having a married mother were protective for non-Black infants, suggesting that these traditional structures operate effectively within their context. Conversely, among Black infants, maternal employment was protective, whereas post-delivery maternal C. albicans increased the infant’s risk. These patterns suggest that different social environments create distinct risk and protective mechanisms through both transmission pathways, calling for targeted interventions that address population-specific circumstances.
Given these findings, further research into the role of vertical and horizontal transmission is necessary. Such studies could provide deeper insights into how environmental factors, childcare practices, and community health behaviors contribute to the spread of C. albicans among infants and children. Understanding these dynamics is crucial for developing comprehensive prevention strategies that encompass both pathways of transmission.
Appendix A. Non-stratified analysis results
Column 1 shows the estimates from the penalized-EM algorithm, Column 2 lists the debiased estimates, and Columns 3–5 report the debiased p-values and the lower and upper bounds of the 95% confidence intervals.
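The debiased p-values and 95% confidence intervals reported alongside each debiased estimate can be related to that estimate through a normal (Wald) approximation. The sketch below assumes that approximation and a known standard error; the standard error of 0.40 is a hypothetical placeholder (only the point estimate −1.06, the non-Black exclusive breastfeeding coefficient quoted in the Discussion, comes from the text), the function names are our own, and the p-values in the tables may additionally be adjusted for multiplicity, which this sketch does not do.

```python
import math


def normal_two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1)
    return math.erfc(abs(z) / math.sqrt(2.0))


def wald_summary(estimate: float, se: float, z_crit: float = 1.959963984540054):
    """Wald z statistic, unadjusted two-sided p-value, and 95% CI for one coefficient.

    Assumes the debiased estimate is approximately normal with standard error `se`.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = estimate / se
    p = normal_two_sided_p(z)
    lower = estimate - z_crit * se  # 97.5% normal quantile times SE
    upper = estimate + z_crit * se
    return z, p, (lower, upper)


# Hypothetical standard error; only the point estimate (-1.06) appears in the text.
z, p, (lo, hi) = wald_summary(-1.06, 0.40)
print(f"z = {z:.2f}, p = {p:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

A confidence interval that excludes zero corresponds to a two-sided p-value below 0.05 under this approximation, which is why the two columns agree on significance for each variable.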
References
- 1. Bachtiar EW, Bachtiar BM. Relationship between Candida albicans and Streptococcus mutans in early childhood caries, evaluated by quantitative PCR. F1000Research. 2018;7.
- 2. Jean J, Goldberg S, Khare R, Bailey LC, Forrest CB, Hajishengallis E, et al. Retrospective analysis of Candida-related conditions in infancy and early childhood caries. Pediatr Dent. 2018;40(2):131–5. pmid:29663914
- 3. Xiao J, Huang X, Alkhers N, Alzamil H, Alzoubi S, Wu TT, et al. Candida albicans and early childhood caries: a systematic review and meta-analysis. Caries Res. 2018;52(1–2):102–12. pmid:29262404
- 4. Xiao J, Moon Y, Li L, Rustchenko E, Wakabayashi H, Zhao X, et al. Candida albicans Carriage in Children with Severe Early Childhood Caries (S-ECC) and Maternal Relatedness. PLoS One. 2016;11(10):e0164242. pmid:27741258
- 5. Koo H, Andes DR, Krysan DJ. Candida-streptococcal interactions in biofilm-associated oral diseases. PLoS Pathog. 2018;14(12):e1007342. pmid:30543717
- 6. Hwang G, Liu Y, Kim D, Li Y, Krysan DJ, Koo H. Candida albicans mannans mediate Streptococcus mutans exoenzyme GtfB binding to modulate cross-kingdom biofilm development in vivo. PLoS Pathog. 2017;13(6):e1006407. pmid:28617874
- 7. Falsetta ML, Klein MI, Colonne PM, Scott-Anne K, Gregoire S, Pai C-H, et al. Symbiotic relationship between Streptococcus mutans and Candida albicans synergizes virulence of plaque biofilms in vivo. Infect Immun. 2014;82(5):1968–81. pmid:24566629
- 8. Eidt G, Waltermann EDM, Hilgert JB, Arthur RA. Candida and dental caries in children, adolescents and adults: a systematic review and meta-analysis. Arch Oral Biol. 2020;119:104876. pmid:32905885
- 9. Alkhars N, Zeng Y, Alomeir N, Al Jallad N, Wu TT, Aboelmagd S, et al. Oral Candida Predicts Streptococcus mutans Emergence in Underserved US Infants. J Dent Res. 2022;101(1):54–62. pmid:34018817
- 10. Patil S, Rao RS, Majumdar B, Anil S. Clinical Appearance of Oral Candida Infection and Therapeutic Strategies. Front Microbiol. 2015;6:1391. pmid:26733948
- 11. Alkhars N, Al Jallad N, Wu TT, Xiao J. Multilocus sequence typing of Candida albicans oral isolates reveals high genetic relatedness of mother-child dyads in early life. PLoS One. 2024;19(1):e0290938. pmid:38232064
- 12. Azevedo MJ, Araujo R, Campos J, Campos C, Ferreira AF, Falcão-Pires I, et al. Vertical transmission and antifungal susceptibility profile of yeast isolates from the oral cavity, gut, and breastmilk of mother-child pairs in early life. Int J Mol Sci. 2023;24(2):1449. pmid:36674962
- 13. Eke PI, Borgnakke WS, Genco RJ. Recent epidemiologic trends in periodontitis in the USA. Periodontol 2000. 2020;82(1):257–67. pmid:31850640
- 14. Lin M, Griffin SO, Gooch BF, Espinoza L, Wei L, Li CH, et al. Oral health surveillance report: trends in dental caries, sealants, tooth retention, and edentulism, United States: 1999–2004 to 2011–2016. 2019.
- 15. Leon S, Wu TT. Comparison of longitudinal trajectories using a high-dimensional partial linear semiparametric mixed-effects model. J Am Stat Assoc. 2025. doi:10.1080/01621459.2024.2441523. pmid:41111609
- 16. Leon S, Wu TT. plsmmLasso: variable selection and inference for partial semiparametric linear mixed-effects model. 2024. https://CRAN.R-project.org/package=plsmmLasso
- 17. Alkhars N, Al Jallad N, Wu TT, Xiao J. Multilocus sequence typing of Candida albicans oral isolates reveals high genetic relatedness of mother-child dyads in early life. PLoS One. 2024;19(1):e0290938. pmid:38232064
- 18. Schwarz G. Estimating the dimension of a model. Ann Statist. 1978;6(2):461–4.
- 19. Berk R, Brown L, Buja A, Zhang K, Zhao L. Valid post-selection inference. Ann Statist. 2013;41(2):802–37.
- 20. Horváth L, Kokoszka P, Reeder R. Estimation of the mean of functional time series and a two-sample problem. Journal of the Royal Statistical Society Series B: Statistical Methodology. 2012;75(1):103–22.
- 21. Jenks JD, Aneke CI, Al-Obaidi MM, Egger M, Garcia L, Gaines T, et al. Race and ethnicity: risk factors for fungal infections? PLoS Pathog. 2023;19(1):e1011025. pmid:36602962
- 22. Njoku A, Evans M, Nimo-Sefah L, Bailey J. Listen to the whispers before they become screams: addressing black maternal morbidity and mortality in the United States. Healthcare (Basel). 2023;11(3):438. pmid:36767014
- 23. Jang H, Patoine A, Wu TT, Castillo DA, Xiao J. Oral microflora and pregnancy: a systematic review and meta-analysis. Sci Rep. 2021;11(1):16870. pmid:34413437
- 24. Lokossou GAG, Kouakanou L, Schumacher A, Zenclussen AC. Human breast milk: from food to active immune response with disease protection in infants and mothers. Front Immunol. 2022;13:849012. pmid:35450064
- 25. Cheshmeh S, Nachvak SM, Hojati N, Elahi N, Heidarzadeh-Esfahani N, Saber A. The effects of breastfeeding and formula feeding on the metabolic factors and the expression level of obesity and diabetes-predisposing genes in healthy infants. Physiol Rep. 2022;10(19):e15469. pmid:36200185
- 26. Jones KM, Power ML, Queenan JT, Schulkin J. Racial and ethnic disparities in breastfeeding. Breastfeed Med. 2015;10(4):186–96. pmid:25831234
- 27. McLanahan S, Beck AN. Parental relationships in fragile families. Fut Child. 2010;20(2):17.
- 28. Arribas-Gil A, Bertin K, Meza C, Rivoirard V. LASSO-type estimators for semiparametric nonlinear mixed-effects models estimation. Stat Comput. 2013;24(3):443–60.