A Validated Age-Related Normative Model for Male Total Testosterone Shows Increasing Variance but No Decline after Age 40 Years

The diagnosis of hypogonadism in human males includes identification of low serum testosterone levels, and hence there is an underlying assumption that normal ranges of testosterone for the healthy population are known for all ages. However, to our knowledge, no such reference model exists in the literature, and hence the availability of an applicable biochemical reference range would be helpful for the clinical assessment of hypogonadal men. In this study, using model selection and validation analysis of data identified and extracted from thirteen studies, we derive and validate a normative model of total testosterone across the lifespan in healthy men. We show that total testosterone peaks [mean (2.5–97.5 percentile)] at 15.4 (7.2–31.1) nmol/L at an average age of 19 years, and falls in the average case [mean (2.5–97.5 percentile)] to 13.0 (6.6–25.3) nmol/L by age 40 years, but we find no evidence for a further fall in mean total testosterone with increasing age through to old age. However, we do show that the variation in total testosterone levels increases with advancing age after age 40 years. This model provides the age-related reference ranges needed to support research and clinical decision making in males who have symptoms that may be due to hypogonadism.


Introduction
In the male, testosterone secretion from the Leydig cells in the testes has a central role in developing secondary sexual characteristics, supporting spermatogenesis and regulating libido [1]. Synthesis and secretion are under the stimulation of the gonadotrophin luteinizing hormone (LH) from the anterior pituitary gland and approximately 98% of circulating testosterone is bound to plasma proteins, with the remaining 2% circulating freely [2]. Whether healthy adult men maintain serum testosterone concentrations throughout life, and the implications of a postulated decline and thus its potential for therapy, have been widely debated but remain unclear [3,4].
A number of studies have reported decreasing testosterone levels in men with age. This includes a study involving both cross-sectional and longitudinal components [5], which reported low levels (<11.3 nmol/L) of total testosterone in up to 20% of men over 60, 30% over 70 and 50% over 80 years of age and suggested that further investigation of testosterone replacement in aged men, perhaps targeted to those with the lowest serum testosterone concentrations, was justified [5]. Longitudinal observation in the Massachusetts Male Aging Study also showed a decrease in total testosterone (TT) with increasing age [6], particularly when accompanied by increasing obesity. Male hypogonadism is a clinical condition resulting from testosterone deficiency as a result of abnormalities of testicular, hypothalamic or pituitary function. The diagnosis is based on clinical and biochemical findings and has been shown to be associated with impaired sexual function, impaired cognitive function [7,8], depression [9], abdominal obesity, low bone and muscle mass [10], diabetes, and prediabetic states (including insulin resistance, impaired glucose tolerance and the metabolic syndrome) that may lead to an increase in risk of cardiovascular disease [11][12][13]. Overall, cardiovascular mortality is increased in late-onset hypogonadism [14,15]. Testosterone replacement in young hypogonadal men results in significant improvement of libido and sexual function and is of clear benefit [16], but it remains to be established in older men that general health and other manifestations of the metabolic syndrome are improved [17][18][19]. Testosterone replacement therapy in older hypogonadal men is associated with an increased risk of cardiovascular events [20,21], and such concerns have led to a current reappraisal of the safety of testosterone replacement (http://www.fda.gov/drugs/drugsafety/ucm383904.htm).
As the diagnosis of hypogonadism includes identification of serum testosterone levels below the normal range for healthy males, there is an underlying assumption that normal ranges of testosterone for the healthy population are known for all ages. However, to our knowledge, no such reference model exists in the literature, and hence the identification of individuals as hypogonadal does not have a widely applicable biochemical basis. In this study we derive and validate a normative model of TT across the lifespan.

Data acquisition
Using an established methodology [22][23][24], studies containing TT measurements of healthy human males at a known age were identified by performing Medline and Embase searches, using the search terms Humans, Testosterone, Males and Reference values. The reference lists of selected studies were checked, and cited references were retrieved to identify further relevant studies. Manuscripts were included for analysis if they reported TT levels for healthy normal males with no known testicular or endocrinological disorders. Data from subjects with an identified chronic illness or on testosterone replacement therapy were excluded from the dataset, as were data from fetal blood and cord blood. These studies provided the basis for a dataset for TT that approximates the healthy human population from childhood to old age. Twenty-seven studies were identified that met the inclusion criteria; eleven studies were excluded because only descriptive statistics for study data were reported [25][26][27][28][29][30][31][32][33][34][35].
For the remaining sixteen studies, data were extracted from scatterplots using Web Plot Digitizer v2.6 (http://arohatgi.info/WebPlotDigitizer) to convert the datapoints into pairs of numerical values denoting age and TT (n = 10,458; age range 0-101 years). Two researchers (TWK & LQL) extracted data from each plot, and the results were compared (a) for inter-observer agreement and (b) for agreement with the published descriptive statistics. Inter-observer agreement limits were set at 99% for both age and TT levels; in the event that this limit was not reached due to observer or calibration error, the data were extracted again by both observers. Longitudinal data were recorded as cross-sectional values. Serum testosterone values were standardised to give units of testosterone in nmol/L using the standard multiplication factor 1 ng/dL = 0.0347 nmol/L.
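The unit standardisation described above is a single multiplication, and can be sketched as follows. This is a minimal illustration, not the authors' extraction pipeline: the function name and the example (age, TT) pairs are hypothetical, and only the factor 0.0347 comes from the text.

```python
# Conversion factor stated in the text: 1 ng/dL = 0.0347 nmol/L.
NG_DL_TO_NMOL_L = 0.0347


def ng_dl_to_nmol_l(value_ng_dl):
    """Convert a total testosterone value from ng/dL to nmol/L."""
    return value_ng_dl * NG_DL_TO_NMOL_L


# Hypothetical digitised datapoints as (age in years, TT in ng/dL).
digitised = [(25.0, 450.0), (61.5, 380.0)]

# Standardise every datapoint to (age in years, TT in nmol/L).
converted = [(age, ng_dl_to_nmol_l(tt)) for age, tt in digitised]

for age, tt in converted:
    print(f"age {age:5.1f} y: {tt:.2f} nmol/L")
```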
The current gold standard assay for male testosterone is mass spectroscopy-based LC-MS/MS [36], and several of our data sources measured testosterone using non-extraction platform immunoassays. A recent analysis of serum testosterone measurements (n = 3,174) comparing one such platform immunoassay demonstrated good accuracy at all concentrations found in eugonadal as well as hypogonadal men, when compared to mass spectroscopy assay values [37]. A further search of the biomedical literature was therefore performed to identify conversion formulae from other assay values to LC-MS/MS values [38][39][40], so that our modelling is concordant with current endocrinological best practice. Since the conversions used correlate strongly (85%-97%, Table 1) and have small Bland-Altman mean differences [41], their use introduces no significant bias into the combined data. For three of our identified studies [42][43][44], no such conversion formula was found (in-house assays were used and no direct comparison with LC-MS/MS values was made), and their data were therefore excluded. Data were also censored at age 3 years: extracted values below this age failed to match the descriptive statistics published with the chart, and therefore could not be used to model accurately the height or age of the expected peak in TT in early infancy.
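As a rough illustration of the Bland-Altman quantity mentioned above, the sketch below computes the mean difference between paired measurements from two assays, together with the conventional 95% limits of agreement (mean difference ± 1.96 standard deviations). The paired values are hypothetical and are not taken from any of the cited comparison studies; the function name is likewise an assumption.

```python
import statistics


def bland_altman(method_a, method_b):
    """Return (mean difference, lower limit, upper limit) for paired values.

    The limits of agreement are mean difference +/- 1.96 sample standard
    deviations of the paired differences.
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)
    return mean_diff, mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff


# Hypothetical paired TT values in nmol/L: converted immunoassay vs LC-MS/MS.
immunoassay = [10.0, 12.0, 15.0, 20.0]
lc_ms_ms = [9.5, 12.5, 14.0, 19.0]

bias, lower, upper = bland_altman(immunoassay, lc_ms_ms)
print(f"mean difference {bias:.2f} nmol/L, limits {lower:.2f} to {upper:.2f}")
```

A mean difference close to zero with narrow limits is what would be read as "small" in this context; how small is small enough is a clinical judgement that the sketch does not address.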
The final dataset (n = 10,097; age range 3-101 years) obtained from 13 studies (Table 1) represents a typical random sample from the healthy male population, and was used as the basis for normative model selection and verification. All data in this study were extracted from existing publications in the scientific and medical literature. Data relating to individual human subjects was not included and therefore specific ethical approval was not required.

Data analysis
Zero TT values at conception were added to the combined dataset, in order to force models through the only known level at any age; these values were not taken into account when calculating model errors and fit. Since variability increases with testosterone level, the data were log-adjusted (after adding one to each value so that zero testosterone on a chart represents zero testosterone level). We then fitted 330 mathematical models to the data using TableCurve-2D (Systat Software Inc., San Jose, California, USA), and ranked the results by coefficient of determination, r². Each model defines a generic type of curve and has parameters which, when instantiated, give a specific curve of that type. For each model we calculated values for the parameters that maximise the r² coefficient. The Levenberg-Marquardt non-linear curve-fitting algorithm was used throughout, with convergence to 9 significant figures after a maximum of 4,000 iterations, for models having up to 21 parameters. For each candidate model, the mean square error and r² were calculated after removing the artificial zero values at conception. In addition, LOESS regression [45] was used to investigate the possibility that the best predictive model may be an ensemble of locally linear or quadratic models, rather than a single model covering all age ranges. The best performing family of models was the rational polynomials. 5-fold cross validation was performed: the data were randomly split into 5 equally sized subsets. For each subset S, the other four subsets were used to train rational polynomials having 3-11 parameters, with subset S being held back as test data. The mean square error of the test data was calculated and compared to the mean square error of the training data for the same model. In other words, the estimated prediction error of a model when generalized to unseen data was compared to the training error of the model. A model was considered validated if (1) the residuals of the test data were approximately normally distributed, and (2) the tradeoff between high r² for the training data (denoting possible overfitting to the data) and low generalisation error for the unseen test data (denoting possible underfitting to the data) was optimal.
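The log-adjustment and 5-fold cross validation steps can be sketched in a few lines. This is an illustration under stated assumptions, not the TableCurve-2D procedure: base-10 logarithms are assumed (the text does not state the base), a least-squares straight line stands in for the rational polynomials, and the data are synthetic. All function names are hypothetical.

```python
import math
import random


def log_adjust(tt):
    """log10(TT + 1), so that zero testosterone maps to zero (base assumed)."""
    return math.log10(tt + 1.0)


def fit_line(xs, ys):
    """Least-squares straight line; a simple stand-in for a rational polynomial."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def mse(params, xs, ys):
    """Mean square error of the fitted line on the given points."""
    slope, intercept = params
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_fold_errors(xs, ys, k=5, seed=0):
    """Return one (training MSE, test MSE) pair per held-out fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k near-equal random subsets
    results = []
    for held_out in folds:
        test = set(held_out)
        train = [i for i in idx if i not in test]
        params = fit_line([xs[i] for i in train], [ys[i] for i in train])
        results.append((
            mse(params, [xs[i] for i in train], [ys[i] for i in train]),
            mse(params, [xs[i] for i in held_out], [ys[i] for i in held_out]),
        ))
    return results


# Synthetic data for illustration only (not the study dataset).
rng = random.Random(1)
ages = [rng.uniform(20.0, 90.0) for _ in range(200)]
tt_values = [max(0.0, rng.gauss(13.0, 4.0)) for _ in ages]

errors = k_fold_errors(ages, [log_adjust(v) for v in tt_values])
for train_mse, test_mse in errors:
    print(f"training MSE {train_mse:.4f}  test MSE {test_mse:.4f}")
```

Repeating this for models with different numbers of parameters and comparing the two error columns is what identifies the point at which test error stops improving while training error continues to fall.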

Results
The validated model is a rational polynomial of the form where TT is measured in nmol/L and x denotes age in years.
Model coefficients a–f are given in Table 2, and the relationship to the data is given in Figure 1. The model has coefficient of determination r² = 0.41, indicating that around 41% of the variation in serum TT throughout healthy male life is due to age alone, and that 59% of the variation is therefore due to other factors such as lifestyle, anthropometry and health status. The r² for the best-fitting LOESS model was 0.32, establishing the optimality of the single regression model in terms of goodness-of-fit. The residual plot for the validated model (Figure 2) shows a distribution close to the ideal Gaussian curve (r² = 0.99). Moreover, the proportions of residuals within one, two and three standard deviations (respectively 71%, 96% and 99%) are close to the expected values for data with a Gaussian distribution (respectively 68%, 95% and 99%). Figure 3 is an exemplar of the 5-fold validation process in which a model is chosen that neither overfits nor underfits the underlying dataset.
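The check on residuals described above amounts to counting how many fall within one, two and three standard deviations of their mean and comparing with the Gaussian expectation. A minimal sketch follows, using synthetic residuals rather than the study's; the function name and the residual scale of 0.15 are assumptions for illustration.

```python
import random
import statistics
from statistics import NormalDist


def fraction_within(residuals, n_sd):
    """Fraction of residuals lying within n_sd standard deviations of the mean."""
    mean = statistics.fmean(residuals)
    sd = statistics.pstdev(residuals)
    return sum(abs(r - mean) <= n_sd * sd for r in residuals) / len(residuals)


# Synthetic, normally distributed residuals (illustration only).
rng = random.Random(0)
residuals = [rng.gauss(0.0, 0.15) for _ in range(10000)]

standard_normal = NormalDist()
for n_sd in (1, 2, 3):
    observed = fraction_within(residuals, n_sd)
    expected = standard_normal.cdf(n_sd) - standard_normal.cdf(-n_sd)
    print(f"within {n_sd} SD: observed {observed:.3f}, expected {expected:.3f}")
```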
Our log-unadjusted normative model (Figure 4) provides average TT values for the entire age range, together with normative ranges in terms of standard deviations away from age-related mean levels. The same model is given in terms of centiles in Figure 5. Residual plots for each decade of age are supplied as supporting information (Figure S1a-h), as are the remaining cross-validation plots (Figure S2), and the TableCurve inputs and output for the validated model (Table S1). Mean and normative ranges for serum TT in healthy males are given for ages from 3 to 88 years in Table 3.
We show that TT peaks [mean (2.5-97.5 percentile)] at 15.4 (7.2-31.1) nmol/L at an average age of 19 years, and falls in the average case [mean (2.5-97.5 percentile)] to 13.0 (6.6-25.3) nmol/L by age 40 years, but we find no evidence for a further fall in mean TT with increasing age through to old age. However, we do show that the variation in TT levels increases with advancing age after age 40 years. Our analyses show that the 95% prediction limit increases from 18.7 nmol/L at age 40 years to 24.5 nmol/L at age 88 years. The model provides centile and/or standard score values for an individual when compared to the population as a whole.
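To make the idea of an individual's standard score concrete, the sketch below derives one from the age-40 reference values quoted above (mean 13.0, 2.5th to 97.5th percentile 6.6 to 25.3 nmol/L). It rests on assumptions that are not confirmed by the text: that the reference range is symmetric on the log(TT + 1) scale, that the quoted mean is the centre of that log-scale distribution, and that the range spans ±1.96 standard deviations. The function names and the example value of 10.0 nmol/L are hypothetical.

```python
import math
from statistics import NormalDist


def log_adjust(tt):
    """log10(TT + 1); the choice of base cancels out in the standard score."""
    return math.log10(tt + 1.0)


def standard_score(tt, mean_tt, lower_2_5, upper_97_5):
    """Approximate z-score of a TT value against an age-specific reference range.

    Assumes the 2.5th-97.5th percentile range spans +/- 1.96 SD on the
    log-adjusted scale.
    """
    sd_log = (log_adjust(upper_97_5) - log_adjust(lower_2_5)) / (2 * 1.96)
    return (log_adjust(tt) - log_adjust(mean_tt)) / sd_log


def centile(z):
    """Centile corresponding to a standard score under a Gaussian assumption."""
    return 100.0 * NormalDist().cdf(z)


# Hypothetical 40-year-old with TT of 10.0 nmol/L, against the age-40 values.
z = standard_score(10.0, mean_tt=13.0, lower_2_5=6.6, upper_97_5=25.3)
print(f"standard score {z:.2f}, centile {centile(z):.0f}")
```

For clinical use the tabulated values in Table 3 are the authoritative source; the sketch only shows how a centile follows from a mean and a range under the stated assumptions.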

Discussion
Using data-driven modelling and analysis, we have derived a normative model of total testosterone throughout the lifespan. We have shown that in the average healthy male testosterone is low in pre-puberty, rises from age 11 and peaks at age 19 at 15.4 (7.2-31.1) nmol/L [mean (2.5-97.5 percentile)]. Thereafter TT falls slightly to 13.0 (6.6-25.3) nmol/L by age 40 years. We find no evidence to support a progressive decline in testosterone in middle-aged and older men, sometimes termed the 'andropause', as TT does not fall significantly in the average man after the age of 40 years. Our analyses show that the 95% prediction limit increases from 18.7 nmol/L at age 40 years to 24.5 nmol/L at age 88 years. [...] onwards in 2,194 men (95% CI of 0.2% to 0.6%), whereas Yeap et al. [29] found that TT did not decline with advancing age in older men (aged 70-89, n = 3,645). Halmenschlager et al. (n = 428) [26] report both no decline in TT with advancing years and an increase in variance later in life. Our analysis of the combined data from 13 studies shows that TT levels do not decline after age 40 years in the average case.
We now compare our results with those reported in studies that were not used to form our dataset. These comparisons are necessarily qualitative since the data were in the form of descriptive statistics or could not be reliably converted to the LC-MS/MS assay and hence were excluded during our data acquisition process. For each study we supply the number of subjects involved, n, to aid comparison with the studies used as a basis for our dataset. Rohmann et al. [27] report a decline of 1.0% per year from age 35 (n = 1,351), in qualitative disagreement with our findings. However, they report geometric means as opposed to arithmetic means and do not report the sample size for each calculation, so a detailed comparison of their results with ours is not possible. Muller et al. [32], Mohr et al. [31] and Simon et al. [35] report small annual declines of 0.4%, 0.3% and 0.5% respectively (n = 400, 1,677, 1,408 respectively) and show no increase in variance, also in qualitative disagreement with our two key findings. Frost et al. [25], Boyce et al. [46] and Orwoll et al. [30] (n = 783, 266, 2,623 respectively) all report no decline in serum TT with advancing age but also no increase in variance. Rhoden et al. [34] report that not only does serum TT not fall with advancing age, but there is also an increase in variance across the lifespan from age 40 onwards (n = 1,071). Taken together, these studies provide partial qualitative external validation for our model, but do not completely resolve the issue of contradictory single-centre study outcomes.
Our model is derived from data from multiple sources of the measurement of TT in over 10,000 healthy males aged between 3 and 101 years. This is both a strength and weakness of the study. The strength is that modelling power is increased by the provision of large numbers of datapoints for a wide range of ages: it has been previously shown that models that include both prepubertal, pubertal and adult ages can be used to derive important insights for a restricted age range [47]. The weakness is the approximate heterogeneity of the values obtained from diverse sources, especially as assay conversion factors were used that have known high correlation but are nevertheless inexact. This includes studies that involve convenience samples (e.g. primary care and outpatient attenders) as well as those that involve population derived cohorts. Further limitations of our approach are that insufficient data were found to model accurately neonatal ages, and that we had to exclude potentially useful studies that used in-house assays which lack standardisation and harmonisation, and for which no conversion formula has been published [48]. Ten of the thirteen studies used as data sources (Table 1) excluded subjects taking medication that could affect the endocrine system, but three studies [5,6,49] (combined n = 2,371 of 10,097) do not have equivalently explicit inclusion criteria. We can therefore not rule out the possibility that a small number of subjects were on medication that increased their TT levels.
Our results suggest that the reported increase in the proportion of hypogonadal men with increasing age can be attributed to the increase in variance of testosterone levels with increasing age, as opposed to an age-related decline in testosterone levels for the population as a whole. In particular, our model provides a coherent explanation for the widely believed but incorrect assertion that the prevalence of male hypogonadism increases from 12% in men in their 50s to 49% in men in their 80s when hypogonadism is defined as a TT level lower than the 2.5 percentile [5,50], or lower than 6.4 nmol/L using the LC-MS/MS assay. These assertions are incorrect since it is not possible to have more than 2.5% of the population included in an age-related 2.5th centile. However, if the definition of hypogonadism is based on a TT level lower than a fixed value, then the prevalence of hypogonadism will indeed increase due to increased variance with advancing age. A common rationale for the increased prevalence of low serum testosterone levels is the assertion of an annual average case decline of 1% or more [6,50]. As shown above, the majority of cross-sectional studies either report no decline in the average case, or a moderate annual decline in testosterone levels. Again, the disparity between the common explanation given and the data in the literature is that a greater proportion of individuals have lower levels of testosterone with increasing age.
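The argument that a fixed threshold captures more of the population as variance grows, even when the mean does not move, can be illustrated numerically. The sketch below assumes log10(TT + 1) is normally distributed around the log-adjusted age-40 mean of 13.0 nmol/L, derives the age-40 spread from the quoted 6.6 to 25.3 nmol/L range, and then widens that spread by a hypothetical factor of 1.3 to stand in for an older age group. The widening factor and function name are illustrative assumptions, not values from the model.

```python
import math
from statistics import NormalDist


def fraction_below(threshold, mean_tt, sd_log):
    """Share of a population below a fixed TT threshold (nmol/L).

    Assumes log10(TT + 1) is normal, centred on log10(mean_tt + 1).
    """
    dist = NormalDist(mu=math.log10(mean_tt + 1.0), sigma=sd_log)
    return dist.cdf(math.log10(threshold + 1.0))


MEAN_TT = 13.0    # nmol/L, age-40 mean quoted in the text
THRESHOLD = 6.6   # nmol/L, fixed cut-off: the age-40 2.5th percentile

# Log-scale SD implied by the age-40 range (assumed to span +/- 1.96 SD).
sd_age_40 = (math.log10(25.3 + 1.0) - math.log10(6.6 + 1.0)) / (2 * 1.96)

# Hypothetical wider spread for an older group, with the same mean.
sd_wider = 1.3 * sd_age_40

narrow = fraction_below(THRESHOLD, MEAN_TT, sd_age_40)
wide = fraction_below(THRESHOLD, MEAN_TT, sd_wider)
print(f"below {THRESHOLD} nmol/L: {100 * narrow:.1f}% vs {100 * wide:.1f}%")
```

With an age-related centile as the cut-off, by contrast, the share below it is 2.5% at every age by construction, which is the point made in the paragraph above.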
There is disagreement on the indications for the use of hormonal therapy in men with apparently age-related low testosterone concentrations [51,52]. Our analysis of the combined data from several studies agrees to a certain extent with both sides of the controversy. We find no evidence that TT declines in the average case after the age of 40 years for ageing males. We do find that the prevalence of higher and lower testosterone levels increases with age, and hence that there is a larger number of men potentially at risk of androgen-related disorders. Our study shows that the increasing proportion of men commonly regarded as having abnormally low (or high) levels of TT can be accounted for by the increase in variance in testosterone levels with age. Factors that have been identified as determinants of lower testosterone include obesity in many studies, as well as the development of other co-morbidities [53][54][55]. Obesity was not an exclusion criterion for our data acquisition, and our age-related reference ranges can be used for quantitative evaluation of the relationship between high body-mass index and low testosterone. There is increasing concern over testosterone supplementation in men, both generally and more particularly in those with co-morbidities [20,21]. Whether all or subgroups of older hypogonadal men might benefit from replacement requires critical assessment, for which a robustly established normal range provides an important basis. This analysis also highlights the increasing proportion of men with high testosterone with increasing age. This intriguing finding is at odds with the notion of an age-related fall in the general case, and may be relevant to diseases more prevalent in this age group. For example, a recent study has shown an association in older men between high testosterone and increased all-cause mortality when compared to those with mid-range testosterone levels [56], although neither causality nor reverse causality is established.
In conclusion, this model provides the reference ranges needed to support research and clinical decision making in males who have symptoms that may be due to hypogonadism. In addition, our study suggests that, instead of a gradual decline in testosterone levels in men as they age, there is an increasing variation in testosterone levels in aged men, with a larger population of hypogonadal males who may benefit from testosterone therapy, but also more men with high serum total testosterone, which may also be disadvantageous [56].

Supporting Information
Figure S1
Figure S1a. Model residuals for ages 3 through 11 years. The residuals are the variations in log-adjusted observed values from the log-adjusted age-related mean value predicted by the model.
Figure S1b. Model residuals for ages 20 through 29 years. The residuals are the variations in log-adjusted observed values from the log-adjusted age-related mean value predicted by the model.
Figure S1c. Model residuals for ages 30 through 39 years. The residuals are the variations in log-adjusted observed values from the log-adjusted age-related mean value predicted by the model.
Figure S1d. Model residuals for ages 40 through 49 years. The residuals are the variations in log-adjusted observed values from the log-adjusted age-related mean value predicted by the model.
Figure S1e. Model residuals for ages 50 through 59 years. The residuals are the variations in log-adjusted observed values from the log-adjusted age-related mean value predicted by the model.
Figure S1f. Model residuals for ages 60 through 69 years. The residuals are the variations in log-adjusted observed values from the log-adjusted age-related mean value predicted by the model.
Figure S1g. Model residuals for ages 70 through 79 years. The residuals are the variations in log-adjusted observed values from the log-adjusted age-related mean value predicted by the model.
Figure S1h. Model residuals for ages 80 through 89 years. The residuals are the variations in log-adjusted observed values from the log-adjusted age-related mean value predicted by the model.
(DOCX)
Figure S2 Model validation. An exemplar of the 5-fold cross validation analysis is given as Figure 3 of the main text; this figure shows the remaining four cases. High test and training errors represent underfit (i.e. insufficient model parameters to accurately capture essential features of the dataset), and high test errors represent overfit (i.e. a model that will not generalise to accurately predict new data). An optimal number of model parameters is seven in all cases. (TIFF)