National, Regional, and Global Trends in Infertility Prevalence Since 1990: A Systematic Analysis of 277 Health Surveys

Gretchen Stevens and colleagues use information from demographic reproductive health surveys to estimate the global, regional, and country levels, patterns, and trends in infertility between 1990 and 2010.

Text B. Methods to account for bias due to incomplete information on marriage status and contraceptive use Many household surveys ascertain current contraceptive use, but do not collect information on past contraceptive use over a defined exposure period. Likewise, data on time since first union are available more often than data on couple status during a defined exposure period. Failure to account for changing contraceptive use over time can lead to biases in secondary infertility estimates of more than 100% for some age groups, and assuming that exposure is continuous from the time of first union can lead to smaller biases, generally under 5% 1 . We used data from surveys that collected data on continuous contraceptive use and couple status to develop regression equations to predict unbiased estimates of infertility (taking into account continuous contraceptive use and union status) from biased estimates (calculated assuming that current contraceptive use reflects past contraceptive use, and that the respondant has been in a union continuously since her first union).
Two linear regressions were developed to estimate unbiased primary and secondary infertility from biased primary and secondary infertility estimates, respectively. This analysis pooled household surveys which collected continuous contraceptive use and union status, a total of 53 Demographic and Health Surveys (DHS; see Table B). For primary infertility, the dependent variable was the natural log of the unbiased estimate of infertility, and the independent variables were the natural log of the biased estimate, and a an indicator variable for women under age 30, among whom current contraception use as a proxy for past use leads to a larger bias 1 . For secondary infertility, the dependent variable was the natural log of the unbiased estimate of infertility prevalence; independent variables were the natural log of the biased estimate, age, square of age, and the prevalence of contraceptive use in the survey sample. Regressions were restricted to surveys in which 30 or more individuals were identified as infertile to avoid using uncertain data points to fit the regression. We calculated uncertainty in the predicted prevalences using the standard error predicted using the stdf option in Stata 10.1; this option provides a standard error for a single prediction or forecast 2 .
Predictive variables in ordinary least squares linear regression to predict primary infertility prevalence. Dependent variable is the natural log of the unbiased prevalence estimate. N = 75 and the adjusted R 2 of the regression is 0.928. Predictor Coefficient Constant -0.23 (-0.47, -0.01) Natural log of biased prevalence estimate (calculated using current contraceptive use as a proxy for past use, and time since first union as a proxy for time in a union) 0. 95 (0.89, 1.02) Indicator variable equal to 1 for ages < 30 and equal to 0 for ages ≥ 30 -0.042 (-0.080, -0.003) Predictive variables in ordinary least squares linear regression to predict secondary infertility prevalence. Dependent variable is the natural log of the unbiased prevalence estimate. N = 197 and the adjusted R 2 of the regression is 0.972. Predictor Coefficient Constant -2.8 (-3.4, -2.1) Natural log of biased prevalence estimate (calculated using current contraceptive use as a proxy for past use, and time since first union as a proxy for time in a union) 1.03 (0.97, 1.09) Age (years) 0.14 (0.10, 0.18) Square of age (years) -0.0017 (-0.0024, -0.0010) Prevalence of contraceptive use in the survey sample -0.54 (-0.73, -0.35) Text C. Bayesian hierarchical model We fit four Bayesian hierarchical logistic regressions to separately estimate the prevalence of and exposure to primary and secondary infertility for couples in each country-age group, indexed by the age of the female partner. We selected uninformative priors for all parameters fit in this model. For observation h, we define y h to be the number of subjects with infertility (as defined in the main text) out of a total effective sample size n h . Our model includes a 6-level hierarchy: • 1,653 observations 1 are denoted o and indexed by h.
• 277 studies are denoted st and indexed by i.
• 104 countries are denoted c and indexed by j.
• 21 subregions are denoted sb and indexed by k.
• 8 regions are denoted r and indexed by l.
• Global-level effects are denoted g.
Our logistic regression is as follows, with covariates X defined below. p h is the prevalence of infertility for a given observation h: We calculate y h based on the analyses described in Text A and Text B, in order to reflect uncertainty from each survey's complex design and the uncertainty from the adjustments we make to prevalence. In Text B we described our methods to estimate unbiased primary and secondary infertility from biased primary and secondary infertility. This gave us unbiased estimates and standard errors surrounding these estimates. To propagate these errors, we adjusted the variance for observations from bias-corrected studies in our Bayesian model as follows: rather than using the y h 's as the data in our likelihood function at each iteration of our sampler, we first multipled each y h by a draw from a log-normal distribution with mean 1 and standard deviation equal to the standard error calculated for the bias-correction.
The hierarchical a's have normally distributed priors: where the global, regional, subregional, and country standard deviations were given uniform priors U(0, 100). Study-specific random effects allow for additional variability in studies not explained by sampling uncertainty. The variance of the study-specific random effects depends on whether the study is national or subnational, so: The a o h error terms, one per observation, allow for an overdispersed binomial distribution in our data. The resulting model is called the Binomial-normal model 3 . Since this parameter is constrained by a common normal distribution with variance σ 2 o , the degree of overdispersion that is necessary is informed by the data, and shrinks to zero if the data are binomially distributed.
Year of study denoted t was included as a hierarchically modeled covariate. A global slope b g had prior N (0, 1000) and additionally separate region-specific slopes b r l [h] had normal priors with mean zero and a shared standard deviation with a uniform prior U(0, 100). Before fitting the model, year values were normalized to have mean 0 and standard deviation 1.
Additional covariates representing characteristics of the data at the observation, study, and country levels were included. The observation-specific characteristics differed by indicator modeled. For all models, the observation-specific covariates included indicators for each age category, denoted by indicator variables z 0 , z 1 , . . . , z 4 (corresponding to membership in the age groups 20-24, 25-29, 30-34, 35-39, 40-44). To model the exposure to primary infertility, we additionally included interactions between indicators for each age category and high income status, denoted by indicator variables v 0 , v 1 , . . . , v 4 (corresponding to membership in a high income region and in the age groups 20-24, 25-29, 30-34, 35-39, 40-44). The age coefficients shared a normal prior with mean 0 and uniform U(0, 20) standard deviation. The studylevel covariate is whether the study is from the World Fertility Survey series wfs i [h] (an indicator variable), with prior N (0, 1000). The country-level covariate is mean years of maternal education mat j [h] , with prior N (0, 1000). Mean years of education was normalized to have mean 0 and standard deviation 1.
For primary and secondary prevalence of infertility and exposure to secondary infertility, our covariates are: X h β = β 0 z 0 h + β 1 z 1 h + β 2 z 2 h + β 3 z 3 h + β 4 z 4 h + β 5 wfs i[h] + β 6 mat j [h] For exposure to primary infertility, our covariates are: Other than age (described above) the covariates were all assigned uninformative uniform priors.
To fit the parameters of our Bayesian hierarchical model to our dataset, we used the PyMC package in python 4 , which provides an implementation of Markov chain Monte Carlo (MCMC) sampling. We specified our model as described above. For each of the four indicators (primary or secondary, prevalence or exposure), we had the following sampling scheme: we ran four chains in parallel. After a burn-in of 5 million iterations, we ran each chain for an additional 5 million iterations, and then thinned our chains by a factor of 12,500, thus obtaining 400 posterior draws from each chain for a total of 1,600 draws for each model.
To assess mixing and convergence, we inspected graphical summaries, including plots of traces and autocorrelation plots, and also summary plots comparing parameter estimates from different chains for the same model. The sampling scheme described above was arrived at through tuning based on trace and autocorrelation plots, which showed us how well the MCMC sampler was exploring the parameter spacewe used a large number of iterations and thinned our chains because we had high autocorrelation initially. We also calculated the Gelman-Rubin R-hat statistic 5 , which allowed us to monitor convergence by ensuring that the parallel chains eventually converged to the same distribution for each parameter.
To predict primary and secondary infertility and exposure for each country-year-age unit, we include global, regional, and country effects, covariate effects, and age effects. Study effects (a st ) and error terms (a o h ) are not included when making predictions, and the World Fertility Survey covariate wfs i[h] is set to 0. For predictions we use the following equation, where the values of the parameters from the MCMC sampling at each iteration give a separate posterior prediction at that iteration:   Indonesia  1987 Demographic and Health Survey  2, 3  7777  4195  9860  Indonesia  1991 Demographic and Health Survey  1, 3  14682  7761  19133  Indonesia  1994 Demographic and Health Survey  1, 3  18359  9522  23834  Indonesia  1997 Demographic and Health Survey  1, 3  18442  8935  24249  Indonesia  2002 Demographic and Health Survey  1, 3  18565  8322  24464  Indonesia  2007 Demographic and Health  Notes: 1 Information on past contraceptive use and past couple status were available 2 Data on current use of contraception and couple status were available, but data on past use and status were incomplete 3 Survey excluded never-married women 4 Data on secondary infertility from China were not used 5 Survey is representative of a subnational area * Sample size refers to the proportion of women who were included in the calculations of primary or secondary infertility (i.e., the denominator in th eprevalence calculation) ** Survey size refers to the total number of women aged 15-44 surveyed.  Table C: Crossvalidation statistics with standard errors: 10-fold crossvalidation was performed with replacement, where each sample left out 20% of the countries in the dataset (crossvalidation is described in the main text).  1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999  1-Union is defined as marriage or cohabitation 2-Contraception use includes any method used to prevent birth 3-Wants a child, undecided, or declared infecund 1-Union is defined as marriage or cohabitation 2-Last child refers to the most recent birth 3-Contraception use includes any method used to prevent pregnancy 4-Wants a child, undecided, or declared infecund   Figure G: Trends in prevalence of secondary infertility by year, shown for each country with data on following pages. Infertility is indexed on the female partner; age-standardized prevalence among women aged 20-44 years is shown here. Legend for plots is given below.

Primary prevalence
Nationally representative (error bars represent standard errors) Nationally representative, data do not cover the entire 20-44 year age range Subnational Subnational, data do not cover the entire 20-44 year age range World Fertility Survey World Fertility Survey, data do not cover the entire 20-44 year age range Posterior means (model predictions) 95% credible interval Zimbabwe Figure H: Trends in exposure to primary infertility by year, shown for each country with data on following pages. Infertility is indexed on the female partner; age-standardized prevalence among women aged 20-44 years is shown here. Legend for plots is given below.
Nationally representative (error bars represent standard errors) Nationally representative, data do not cover the entire 20-44 year age range Subnational Subnational, data do not cover the entire 20-44 year age range World Fertility Survey World Fertility Survey, data do not cover the entire 20-44 year age range Posterior means (model predictions) 95% credible interval Figure I: Trends in exposure to secondary infertility by year, shown for each country with data on following pages. Infertility is indexed on the female partner; age-standardized prevalence among women aged 20-44 years is shown here. Legend for plots is given below.
Nationally representative (error bars represent standard errors) Nationally representative, data do not cover the entire 20-44 year age range Subnational Subnational, data do not cover the entire 20-44 year age range World Fertility Survey World Fertility Survey, data do not cover the entire 20-44 year age range Posterior means (model predictions) 95% credible interval Age (years) Prevalence of exposure to secondary infertility (%) Figure J: Age pattern of infertility prevalence and exposure, by development status and type of infertility. All countries were modeled to have the same age pattern for primary infertility, secondary infertility, and exposure to secondary infertility; the age pattern for exposure to primary infertility varied by development status. See Text C for details.  Figure  3 in the main text.) Infertility prevalence is indexed on the female partner; age-standardized prevalence among women aged 20-44 is shown here.  Figure 4 in the main text.) Infertility prevalence is indexed on the female partner; age-standardized prevalence among women aged 20-44 is shown here.  Data from DHS surveys are shown; surveys using the reproductive health calendar collected information on past contraceptive use and union statues, whereas surveys using the questionnaire identify exposed unions using current contraceptive use and union status, without correction for incomplete information. Dotted lines indicate equal prevalence using the two exposure periods.