The Effect of Paternal Age on Offspring Intelligence and Personality when Controlling for Parental Trait Levels

Paternal age at conception has been found to predict the number of new genetic mutations. We examined the effect of father’s age at birth on offspring intelligence, head circumference and personality traits. Using the Minnesota Twin Family Study sample we tested paternal age effects while controlling for parents’ trait levels measured with the same precision as offspring’s. From evolutionary genetic considerations we predicted a negative effect of paternal age on offspring intelligence, but not on other traits. Controlling for parental intelligence (IQ) had the effect of turning an initially positive association non-significantly negative. We found paternal age effects on offspring IQ and Multidimensional Personality Questionnaire Absorption, but they were not robustly significant, nor replicable with additional covariates. No other noteworthy effects were found. Parents’ intelligence and personality correlated with their ages at twin birth, which may have obscured a small negative effect of advanced paternal age (<1% of variance explained) on intelligence. We discuss future avenues for studies of paternal age effects and suggest that stronger research designs are needed to rule out confounding factors involving birth order and the Flynn effect.


Introduction
The well-established genetic influences on psychological traits such as intelligence and personality traits have attracted the attention of a growing number of evolutionary psychologists. This is because selection continues to exert pressure on heritable traits, unless they are completely irrelevant for fitness. There is evidence that neither intelligence nor personality traits are currently completely neutral to selection, but are associated with fitness components like survival [1][2][3] and reproductive outcomes [4][5][6][7]. If we make the assumption that at least somewhat similar selection processes affected these traits with some consistency during the last few thousand years we must wonder why genetic differences in these traits persist [8]. They imply the existence of maintaining evolutionary mechanisms, since otherwise natural selection would drive variants to fixation and eliminate the differences [8,9].
Here we tested the hypothesis that harmful genetic mutations that occur anew each generation might contribute to the genetic variation particularly of intelligence, which would suggest that this genetic variation is maintained by a balance of mutation and counteracting selection. To test this hypothesis, we relied on paternal age at twin birth (henceforth simply "paternal age") as a proxy of new mutations and used a better-controlled design than previous studies. We first explain the evolutionary genetic reasoning behind the hypothesis that mutations contribute substantially to the genetic variation in intelligence. We then review the increasingly supportive evidence for paternal age as an indicator of new mutations, as well as the importance of using the right controls.

Evolutionary Explanations for Individual Differences
Because intelligence is regarded as an attractive trait in mates across cultures [10][11][12][13], it is plausible that higher intelligence was also preferred during recent human evolutionary history. Thus, high intelligence could be positively sexually selected, driving low intelligence to extinction (barring evolutionarily very novel impediments like effective birth control; [14,15]). There is also evidence for survival selection for intelligence in current times [3], though it is of course hard to infer how the relation between intelligence and survival has varied during evolutionary history. To explain why high intelligence has not gone to fixation, Penke et al. [9] argued that intelligence has a large number of relevant genetic loci and thus presents a large target for mutations. Mutational target size includes loci that are not polymorphic, but whose alteration would affect the trait. It thus includes a larger number of loci than those which are currently polymorphic and might be picked up by analyses of common single nucleotide polymorphisms (SNPs). As such a target, intelligence would be under mutation-selection balance, which occurs when purifying selection removes mutations deleterious to fitness from the gene pool, but cannot outpace the occurrence of new mutations. Thus, a number of mutations persist in the population and individuals have varying "mutation loads" [16]. The predicted effect of genetic perturbations depends on whether the trait is under stabilising or directional selection. Stabilising selection leads to a buffering against both deleterious and beneficial changes (robustness), whereas directional selection leads to higher evolvability, or responsiveness to perturbations [17]. Higher robustness would imply smaller effects of new mutations [18]. Brain size was previously held to be a good evolutionary proxy of intelligence [19].
The brain, as a bone-encased organ, however, may have been under predominantly stabilising selection due to anatomical and developmental constraints, while intelligence is still often thought to have been under directional selection [20]. Therefore, we predict that indicators of mutation load should be negatively related to intelligence, but not so much to brain size.
Penke et al. [9] contrasted personality traits with intelligence to show that they are not just distinct due to convention or methods, but a product of different selection pressures. They argued that personality variation does not fit the pattern evoked by mutation-selection balance and predicted only a medium mutational target size for personality variation. They suggested that this favoured balancing selection as the explanation for personality differences. Balancing selection is a class of mechanisms in which fitness effects of a trait variant differ by environment, be it spatial, social, temporal or genetic (i.e. epistasis and overdominance) [21]. Variation is thus maintained by selection of different trait levels in different environments.
Recent genome-wide complex trait analyses found that more of the genetic variation of intelligence [22][23][24][25] than of personality traits [26,27] is associated with small genetic relationships captured by common genetic variants that have high frequencies in the population and are thus unlikely to be novel harmful mutations. Despite this, some of the genetic variation remains unexplained in these traits. Furthermore, the weak signals from the common genetic markers in these studies might come from older mutations in linkage disequilibrium with the markers, or even from novel rare variants in weak linkage disequilibrium but with strong effects on the phenotype [28][29][30][31][32]. In fact, we know of more than 300 rare mutations that have major effects on intellectual ability [33][34][35]. Potential participants with intellectual disability are usually excluded from research on intelligence in the normal range. Still, substantial evidence points toward mutation-selection balance [36] as a reasonable explanation for much of the genetic influence on intelligence. However, these and other molecular genetic findings [36][37][38] cast doubt on balancing selection as the main explanation for genetic personality variance. Instead, a more differentiated view of different personality domains is spreading [26][39][40][41]. In the absence of a new convincing pattern implicating a specific mechanism in personality, balancing selection may nonetheless remain a viable explanation. Therefore we predict that indicators of mutation load are unrelated to personality traits.

Genetic Mutations and Paternal Age
Mutation-selection balance can occur because mutations are generally much more likely to harm the intricate system they affect than to add adaptive benefits to it [42], so the expected effect of new mutations is in the opposite direction of selection. But where do new mutations originate? To maintain mutation-selection balance, mutations need to be inherited, so they need to be germline, not somatic, mutations. Keightley's [42] estimates were in line with Kong et al. [43], who reported an average of 63.2 new mutations when comparing the sequenced whole genomes of parent-offspring trios. Keightley [42] also estimated that on average 2.2 of these new mutations per generation are deleterious (reducing fitness), which would be implausibly high if each mutation had to be eliminated through failure of the carrier to reproduce, but not if selection acts on relative fitness differences among individuals (quasi-truncation selection [44,45]).
Keightley [42] reviewed the available evidence and found that most mutations are paternal in origin, as had been suggested for a long time [46][47][48]. His finding was corroborated by Campbell et al. [49] and Kong et al. [43]. The latter reported 3.9 times higher mean single nucleotide mutations of paternal than maternal origin. Strikingly, the far-larger heterogeneity (ratio of variances = 8.8) in male mutation rates could almost entirely be accounted for by paternal age; Kong et al. [43] reported an estimated increase in paternal mutations with age of two per year. Crow [50] found single nucleotide mutations in which one nucleotide had been mis-transcribed into one of the other three to be far more common during male than female gametogenesis. Originally, the suspected reason for this was the far greater number of pre-meiotic cell divisions in sperm (35 + 23 × years after puberty) compared to oocytes (24), leading to an accumulation of errors with age. New data are consistent with this linear relationship, but there is also evidence for "selfish spermatogonial selection" (i.e. pre-meiotic selection for mutated cells) at a few loci [51][52][53][54][55]. Decay of transcription fidelity, proofreading error, or some combination of these pathways [50,56] may also be involved.
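The division counts quoted above lend themselves to a small worked example. The sketch below is illustrative only: the constants 35, 23 and 24 are taken from the text, whereas the puberty age of 15 years and the function name are assumptions introduced here for the sake of the arithmetic.

```python
# Illustrative arithmetic for the pre-meiotic cell division counts quoted in the text.
SPERM_BASE_DIVISIONS = 35       # divisions accumulated by puberty (from the text)
SPERM_DIVISIONS_PER_YEAR = 23   # further divisions per year after puberty (from the text)
OOCYTE_DIVISIONS = 24           # constant for oocytes (from the text)
ASSUMED_PUBERTY_AGE = 15        # assumption made for this sketch only


def sperm_divisions(paternal_age, puberty_age=ASSUMED_PUBERTY_AGE):
    """Approximate number of pre-meiotic divisions behind a sperm cell."""
    years_after_puberty = max(0, paternal_age - puberty_age)
    return SPERM_BASE_DIVISIONS + SPERM_DIVISIONS_PER_YEAR * years_after_puberty


for age in (20, 30, 40):
    divisions = sperm_divisions(age)
    ratio = divisions / OOCYTE_DIVISIONS
    print(f"paternal age {age}: {divisions} divisions, {ratio:.1f} times the oocyte count")
```

Under these assumptions, each additional decade of paternal age adds 230 divisions, which is why a linear increase in new mutations with paternal age is the natural expectation.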
Single nucleotide mutations appear to be the most common type [57], though they do not account for the most altered base pairs per birth [51]. Chromosomal aberrations, such as trisomy 16 and 21, affect the most base pairs but are unlikely to explain normal variation in the traits considered here. Unlike these aberrations, single nucleotide mutations do not occur more often with advancing maternal age [42,43,58]. Like single nucleotide mutations, new copy number variants (CNV; duplicated or deleted base pair sequences) also seem to have a paternal origin bias [51] and to be associated with increasing paternal age in mouse models [59]. Molecular genetic analyses in humans seem to yield different biases for different types of CNVs, with a paternal age bias having been found for CNVs with nonrecurrent breakpoints, but not for others [51,60].
Because epigenetic insults accumulate in somatic cells during a lifetime [61], there has been speculation that paternal age effects could potentially be explained through epimutations [62,63], though erasure of epigenetic information in the germline is thought to limit if not prevent their inheritance [61].
To summarize, since paternal age at conception is linearly related to the number of pre-meiotic cell divisions, it can be used as a proxy for likelihood of new germline mutations [43].

Paternal Age and Psychological Traits
Keller and Miller [64] and Uher [65] argued that severe mental illnesses that confer strong reproductive disadvantages should owe their continued existence to pleiotropic effects of rare recent mutations. Indeed, the increased likelihood of schizophrenia in offspring of older fathers is well documented [66] and has been noted since the 1950s [67] and more recently by Malaspina et al. [68]. Reichenberg et al. [69] reported similar observations for autism, as did Frans et al. [70] for bipolar affective disorder and Lopez-Castroman et al. [71] for intellectual disability. By contrast, effects seem to be trivial or zero for unipolar depression [72]. Paternal age associations with sporadic (nonfamilial) cases of Apert's syndrome, achondroplasia, progeria and other diseases have been found consistently [65]. For autism, schizophrenia and intellectual disability, paternal age effects have recently been corroborated by exome-sequencing studies [35][73][74][75], some of which also reported auxiliary analyses of the association of paternal age with IQ.
Searching for a stable phenotype associated with schizophrenia led Malaspina et al. [76] to examine the relation of paternal age with IQ in the general population. In a large (N = 44,175) sample of Israeli conscripts, they reported a shallow inverted U-shaped relation with IQ, which is a risk factor for schizophrenia [77]. The relation persisted even after controlling for maternal age, parental education and numerous other possible confounds. One subsequent, independent study replicated the finding in a large (N = 33,437) sample of children for several intelligence measures across three waves [78]. However, after controlling for maternal education, birth order, birth weight and family size in the same sample, Edwards and Roff [79] found many of the associations reduced to non-significance. They argued for the added controls, but Svensson, Abel, Dalman, and Magnusson [80] expressed concern that the correction for birth weight might remove a mediated effect [81,82] and make a real association look spurious. Still, the largest effect reduction resulted from controlling for maternal education. This choice can hardly be contested, because maternal education can be a proxy for heritable maternal intelligence and Edwards and Roff [79] showed that maternal education correlated negatively with father's age due to period effects on education in their cross-sectional sample. Svensson et al. [80] did not find a negative link between paternal age and scholastic achievement in adolescence either. Their sample (the largest so far; N = 155,875) comprised recent birth cohorts in Stockholm county, where delayed paternity is common. They controlled maternal and paternal education (not scholastic achievement), country of birth, parental mental health service use and graduation year (to control rising grades).
Auroux et al. [83], who had previously reported a negative association between advanced paternal age and military aptitude test scores [84], did not replicate that relation in newer data (N = 6,564) when controlling for parents' academic level, but instead found an association between lower paternal age and lower aptitude. Similarly, Whitley et al. [85] found lower IQs for children born to younger fathers after controlling for number of older siblings (N = 772). However, in the same sample they found no association between paternal age and reaction time, arguably a measure less influenced by cultural, social and educational background.
The same research groups who found negative links between paternal age and intelligence also reported associations in the same general population samples between paternal age and aspects of personality, namely poor social functioning [86] and externalising behaviour [87]. Lundström et al. [88] reported a U-shaped relation of paternal age with autistic-like normal variation in two Swedish twin samples, though Robinson, Munir, McCormick, Koenen, and Santangelo [89] could not replicate this finding in a smaller sample.
An explanation of the paternal age effects that relies on new mutations or epigenetics [63] mandates thorough control for alternative explanations. An obvious possibility is that parental personality [90] and intelligence [15] influence reproductive timing and therefore paternal age. Offspring's inherited personality and intelligence would then differ according to paternal age because of this unobserved common cause. So far, parental intelligence and personality as confounds have not been ruled out, because only proxy variables like education or socioeconomic status (SES) were available in the samples. Proxies for personality measures have not yet been controlled in any study, to our knowledge.
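The common-cause argument can be made concrete with the textbook formula for a standardized regression coefficient with one covariate. The sketch below is a minimal illustration, not an analysis of the study data: all three correlations are hypothetical values chosen for the example, and the function name is introduced here.

```python
def partial_beta(r_xy, r_xz, r_zy):
    """Standardized slope of y on x when z is also in the model (two-predictor OLS)."""
    return (r_xy - r_xz * r_zy) / (1.0 - r_xz ** 2)


# Hypothetical correlations, chosen only to illustrate the sign change.
r_age_child = 0.03     # paternal age with offspring IQ (zero-order)
r_age_parent = 0.15    # paternal age with parental IQ
r_parent_child = 0.39  # parental IQ with offspring IQ

adjusted = partial_beta(r_age_child, r_age_parent, r_parent_child)
print(f"zero-order r = {r_age_child:+.3f}, adjusted beta = {adjusted:+.3f}")
```

With these illustrative inputs, a weakly positive zero-order association becomes weakly negative once the parental trait is held constant, because the positive path through parental IQ is larger than the observed association.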
The effect of controlling for familial predisposition has been studied more when it comes to mental illnesses by comparing familial and sporadic cases. With continuously measured traits like intelligence and personality, we can hope to control for the parental contribution with greater precision.
Statistical controls for parental traits are still necessary when new mutations are directly quantified. However, in three recent clinical exome-sequencing studies, such controls were not possible and the reported associations with intelligence may thus have been biased: Iossifov et al. [91] and Sanders et al. (2012) [58] counted new SNPs by comparing parents' and children's exomes. They reported no links between new rare SNPs and intelligence. Sanders et al. (2011) [92], on the other hand, reported a negative association with CNVs. Generalizability may be limited here because the children had autism spectrum disorders. In an earlier study using SNP arrays [93], rare CNV burden was found to predict intelligence in a small clinical sample. This association was not replicated in two larger, nonclinical samples [94,95]. Intellectual disability, which is excluded from most studies of IQ in the normal range (but see [96]), has been linked to new CNVs on several occasions [35,60,97]. Rauch et al. [35] estimated new SNPs to explain up to 55% of cases of non-syndromic, sporadic intellectual disability in a small exome-sequencing study.

The Present Study
We addressed several of the limitations of prior studies in a large, population-based twin and family sample. To isolate the effect of new mutations from the expected, inherited trait level, we controlled for parental intelligence or parental personality traits when assessing the influence of paternal age on these traits in the offspring. We also controlled for birth order, which was correlated with paternal age, to account for the possibility of diminishing parental investment in later-born children [98,99]. We did not need to exploit the genetic similarity of twins for the purposes of our analysis. Instead we used one randomly chosen co-twin from each pair to replicate our results. Thus we always report two coefficients pertaining to twins. Samples with detailed parental and offspring trait measurements such as this one are valuable but rare, which offsets potential problems with generalizability to singletons [100,101].
On theoretical grounds and based on previous results, we predicted a small remaining negative paternal age association with offspring intelligence after applying these controls. We also looked for paternal age associations with offspring head circumference as a proxy for brain size [20][102][103][104]. Head circumference is highly heritable [105], but not highly correlated (about .10-.20 [19,103]) with intelligence. Because the anatomical and developmental constraints acting on head and brain size imply a buffering against mutation to be adaptive, we did not expect to find a paternal age association with head circumference.
For personality traits, on the other hand, we did not expect to replicate the association between paternal age and offspring externalising behaviour and social functioning reported by Saha et al. [87] and Weiser et al. [86] when using analogous personality traits and controlling for parental personality trait levels. In these studies proxy variables for the parental trait levels were not controlled. Therefore it cannot be ruled out that parental personality affected reproductive timing [90], which could have introduced a spurious association between paternal age and the children's personality. Absence of association after control would be consistent with the theoretical prediction that personality traits are mostly not under mutation-selection balance [9], though it would not provide direct evidence for the absence of mutation-selection balance.

Sample
The sample comprised 1,898 pairs of same-sex twins (52% female; 64% monozygotic) and their parents who participated in the intake assessment of the Minnesota Twin Family Study (MTFS), an ongoing population-based longitudinal study. State birth records provided the starting point for locating more than 90% of all Minnesotan same-sex twins born in the target periods spanning 1971 to 1994. Twins with birth defects and major disabilities were screened out of the sample. Less than 20% of the located families declined participation. Based on a brief survey which 80% of the decliners completed, it was possible to show that decliners were only slightly less educated (<0.3 years) and did not differ from participants with regard to self-reported mental health. At intake two thirds of the assessed twins were approximately 11 years old (born 1977-1994) and one third were approximately 17 years old (born 1972-1979). Like the population of Minnesota in the periods of their births, the twins predominantly (over 95%) had European ancestry. Iacono, Carlson, Taylor, Elkins, and McGue [106] and Iacono and McGue [107] described the recruitment process and the characteristics of the sample in more detail. The 11-year-old cohort was enriched for twins showing antisocial behaviours by recruiting pairs in which at least one showed symptoms of attention-deficit-hyperactivity disorder or conduct disorder [108]. About 11% of participants were recruited in this way; we refer to them as the "enrichment sample". Neither attention-deficit-hyperactivity disorder [109] nor conduct disorder [110] has been linked to paternal age.

Ethics Statement
The University of Minnesota's institutional review board approved the collection of the data used in this study. Twins gave written informed assent and parents gave written informed consent.

Measures
Twins' birth dates were available from state records. Their zygosity was assessed based on the consensus of several methods and serological analyses in case of disagreement. In the intake phone survey, the mother reported the father's birth date and education, the twins' birth weight, any birth complications and whether the twin birth had been full-term or by how many weeks it had been early or late. If the father had taken part in the intake assessment, we used his self-reported birthdate and education data instead. We considered using the mother's report on how many weeks the birth had been early or late to derive the paternal age at conception, but decided against it because the information on gestational age was often missing and the computed paternal age at conception correlated perfectly with paternal age at birth.
The 11-year-old cohort of twins was assessed at intake using an abbreviated Wechsler Intelligence Scale for Children-Revised (WISC-R). It comprised two verbal (Vocabulary and Information) and two performance (Block Design and Picture Arrangement) subtests, which had been selected to maximize the correlation (.94 [111]) with the full WISC-R. The 17-year-old cohort and the parents were assessed using the same subtests of the Wechsler Adult Intelligence Scale-Revised (WAIS-R). Altogether, only 1,531 families had complete IQ data, due mostly to missing paternal data (see Table 1 for ns for each family member as applicable for our analyses).
Both age cohorts completed the eleven primary scales of the Multidimensional Personality Questionnaire (MPQ) only after turning seventeen. Their parents completed the questionnaire at intake (n = 1,109 families with complete data for the superfactors, n = 1,170 for Absorption). The MPQ primary scales can be aggregated into three superfactors (Positive Affectivity, Negative Affectivity, and Constraint) plus an Absorption factor, which measures a person's proneness to experience imaginative and altered states. Positive Affectivity comprises the scales Well-being, Social Potency, Achievement and Social Closeness. Negative Affectivity contains the scales Aggression, Alienation, and Stress Reaction. Constraint consists of Control, Traditionalism, and Harm Avoidance [112]. A joint factor analysis by Church [113] of Tellegen's personality model with the popular Big Five model revealed no gaps in coverage of either instrument in comparison with the other.
Head circumference (n = 1,225 families) was measured during the intake assessment.

Statistical Analyses
We fitted structural equation models (SEMs) using Mplus version 7 [114] with a robust maximum likelihood estimator. Compared to standard multiple regressions, full information maximum likelihood (FIML [115]) allowed us to use all available data rather than, for example, only the 64% of families with complete data for the third intelligence model; incompleteness was due mostly to missing information on the covariates paternal intelligence and birth complications (see also Table 1). By using latent variables we were able to estimate comparable regression coefficients, indicating expected change in outcome in standard deviation units per decade of paternal age, across outcomes with different reliabilities.
We fitted two separate but analogous chains of models: one for intelligence and head circumference, and one for MPQ personality traits. In the intelligence models we let the residuals of the verbal subtests Vocabulary and Information and those of the performance subtests Picture Arrangement and Block Design correlate. We allowed subscale residuals to correlate between the twin-pairs to allow for similarity greater than expected on the basis of the latent factors. We also let the Absorption factor correlate with the superfactors Positive and Negative Affectivity. A simplified model for IQ can be seen in Figure 1.
In the first model, we included methodological controls, namely child's sex and age at testing, to decrease residual variance and increase predictive power, and zygosity to control for correlated prenatal factors. The main predictor was paternal age at birth in days. In the second model we added parental trait levels. From the second intelligence model on we also added parental years of education as auxiliary variables, to improve the FIML estimation of missing intelligence data for parents. To compare our methods with previous studies we estimated models controlling only for either parental education, or intelligence or both. In the third model we added the number of older non-co-twin siblings (i.e. birth order), birth weight and birth complications as further controls.
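The nesting of the three models can be summarised as successive covariate sets. The sketch below is only a bookkeeping aid under stated assumptions: the variable names are hypothetical labels introduced here, not the names used in the authors' Mplus scripts, and the auxiliary education variables are left out because they inform the missing-data estimation rather than enter as covariates.

```python
# Hypothetical labels for the predictors described in the text.
BASE = ["paternal_age", "child_sex", "age_at_testing", "zygosity"]
PARENTAL_TRAITS = ["mother_trait", "father_trait"]
BIRTH_CONTROLS = ["older_siblings", "birth_weight", "birth_complications"]


def covariate_sets():
    """Return the predictor list of each model; every model extends the previous one."""
    model_1 = list(BASE)
    model_2 = model_1 + PARENTAL_TRAITS
    model_3 = model_2 + BIRTH_CONTROLS
    return {"model_1": model_1, "model_2": model_2, "model_3": model_3}


models = covariate_sets()
for name, predictors in models.items():
    print(name, "->", ", ".join(predictors))
```

Laying the models out this way makes it easy to see which covariates are responsible when the paternal age coefficient changes from one model to the next.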
For all analyses we chose one twin from each pair at random and then replicated the result with the co-twin data; both resulting coefficients are reported for all central results. We also modelled quadratic trends emulating Malaspina et al.'s [76] analyses, and cubic trends for paternal age as suggested by Crow [44].
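The random assignment of twins to a primary and a replication set can be sketched as follows. This is a minimal illustration under assumptions, not the authors' procedure: the function name, the seed and the identifiers are hypothetical.

```python
import random


def split_twins(pairs, seed=0):
    """Randomly send one twin of each pair to the primary set, the co-twin to the replication set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    primary, replication = [], []
    for twin_a, twin_b in pairs:
        if rng.random() < 0.5:
            twin_a, twin_b = twin_b, twin_a
        primary.append(twin_a)
        replication.append(twin_b)
    return primary, replication


# Hypothetical identifiers for five twin pairs.
pairs = [(f"fam{i}_A", f"fam{i}_B") for i in range(1, 6)]
primary, replication = split_twins(pairs)
print(primary)
print(replication)
```

Because every family contributes exactly one twin to each set, observations within a set are not clustered by family, although the two sets are not independent of each other.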
Furthermore, we examined associations with the primary scales of MPQ personality using multiple regressions and verbal and performance intelligence using a SEM. We ran analyses with and without the enrichment sample as well as split by sex.
Complete, reproducible reports of the analyses have been made available online at http://openscienceframework.org/project/wLrZF/wiki/home.

Sensitivity Analysis
Including the enrichment sample made some results reach the conventional level of statistical significance but did not change the pattern of results in a noteworthy manner, so we opted for including it to reach higher power. We ran a sensitivity analysis with G*Power 3 [116] at a power of 95% and a Type I error probability of 5% to estimate the smallest effect size that we could detect, which serves as an upper bound on any effect we might have missed. Using the ns of complete cases, this analysis indicated that we would be able to find paternal age effects if they explained at least 0.85% of the variance of IQ, 1.06% of head circumference variance, and 1.30% of MPQ personality superfactor score variance. Sensitivity in the FIML analyses would be higher, though we did not perform the simulations necessary to give a precise estimate.
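To give these thresholds a more familiar scale, the proportions of variance explained can be converted to absolute correlations by taking the square root. The sketch below does only this conversion; it does not reproduce the G*Power computation, and treating the incremental variance as a squared (semi-partial) correlation is an interpretive assumption made here.

```python
import math

# Minimum detectable variance explained, as reported in the text.
SENSITIVITY = {
    "IQ": 0.0085,
    "head circumference": 0.0106,
    "MPQ superfactors": 0.0130,
}


def min_detectable_r(variance_explained):
    """Absolute correlation corresponding to a given proportion of variance explained."""
    return math.sqrt(variance_explained)


for outcome, r_squared in SENSITIVITY.items():
    r = min_detectable_r(r_squared)
    print(f"{outcome}: variance explained {r_squared:.2%}, |r| of about {r:.3f}")
```

On this reading, the analyses were sensitive to associations of roughly |r| = .09 to .11.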

Descriptive Statistics and Model Fit
The average paternal age at twin birth was 30.15 years (SD = 5.54, range = 15-53) and fathers were born between 1925 and 1977. Mothers were born about 2 years later on average and twins were born on average in 1982. Mothers reported birth complications for 51% of all twin births and an average birth weight of 2587 grams (SD = 563). Twins averaged slightly less than one older and one younger sibling, though few had both older and younger siblings. Parents averaged about 2 years of post-high school education and IQs very slightly above average; twin IQs were similar. Descriptive statistics for the other main variables can be found in Table 1. Parent-offspring correlations for IQ (rs = .39) and head circumference (rs = .24-.28) were moderate but somewhat lower for MPQ personality (rs = .10-.21). Correlations between mothers' and fathers' traits were similar (IQ: r = .34; MPQ: rs = .15-.21), except for head circumference, which was effectively zero (r = .04).
Model fit indices, namely root-mean-square error of approximation (RMSEA; both for baseline and full model) and standardized root-mean-square residual (SRMR), are reported in Table 2. Model fit according to χ² was always violated owing to the large sample. The measures of close fit for the intelligence models exceeded recommendations by Browne and Cudeck [117]. The MPQ models' fit can still be regarded as reasonable for a parsimonious model because we did not model cross-loadings that were not part of the theoretical factor structure.

Main Results
Overall, we found no robust evidence for paternal age associations with offspring intelligence, personality or head circumference (see Table 3 and Figure 2). The regression coefficient of paternal age on offspring intelligence turned from significantly positive to non-significantly negative after controlling for parental intelligence, because both predictor and outcome were positively associated with parental intelligence. We also found a positive relation between paternal age and Absorption that was marginally significant for one set of twins (p = 0.079) and conventionally significant (p = 0.038) for the other set of twins in the third model, but not without the enrichment sample. We found no statistically significant relation of paternal age with the MPQ superfactors or with head circumference.
We tested for confounding relations between paternal age and parental traits. In multiple regressions estimated in the SEMs we found positive regression coefficients for maternal (b = 0.17 [...] [−0.11, 0.00]; quadratic b = 0.07, p = .008, 95% CI [0.02, 0.13]), but it was in the opposite direction of what we had predicted. We tested our results' robustness to leaving out covariates and other modelling decisions such as using FIML instead of multiple imputation, or imposing measurement invariance according to Raykov et al. [118]. With the exception of the aforementioned covariates birth order and parental traits in the intelligence models, this did not lead to changes in the paternal age effect size estimates.

Discussion
We did not find support for our hypothesis that higher paternal age at offspring conception, as an indicator of more new, harmful mutations, would predict lower offspring intelligence. A small positive association between paternal age and offspring intelligence turned significantly negative after controlling for parental intelligence and education, but this finding was not robust to adding birth order as a covariate, leaving out the enrichment sample, or informally correcting for multiple testing. We found small positive relations between parental intelligence and both paternal and maternal ages, plausibly indicating delayed reproduction among higher-IQ parents. Unlike Rodgers et al. [15] and Neiss et al. [119], who reported that education mediated the relation between maternal intelligence and female age at first birth, we found that parental education did not account for a significant amount of variance in paternal age over and above parental intelligence. This might indicate that paternal and maternal ages at twin birth were not representative of maternal age at first birth (the most commonly used indicator of reproductive timing). We think it is unlikely that this discrepancy reflects deeper underlying differences with regard to reproductive planning in our twin sample, as twin births are not usually planned. Differential utilisation of assisted reproductive technologies (ART) would further complicate the picture. MTFS twins, however, were born at a time when ART were less common reasons for multiple births [120,121]. Unlike Malaspina et al. [76] we did not find a stronger effect on nonverbal than verbal intelligence. Confidence intervals for the two outcomes strongly overlapped, as they appear to have done in Malaspina et al.'s results as well. Unexpectedly, we found an association between paternal age and one MPQ scale, Absorption, which was marginally significant for one twin and conventionally significant for the co-twins.
To the extent this association might be real, we speculate that it might reflect the well-replicated association of paternal age with offspring schizophrenia, because Absorption has been found to correlate with clinically aberrant experience [122], hallucinations [123] and Psychoticism [124]. Although a potential link with new mutations, as indicated by the parental age association, could explain why Absorption has not been found to be elevated in mostly non-offspring kin of schizophrenia patients [125], we interpret this finding only cautiously, given that the association was not robustly significant.
For the MPQ superfactors, constraint, positive and negative affectivity, we did not find any significant relations between offspring traits and paternal age, either before or after controlling for parental personality. Offspring head circumference was not significantly related to paternal age either. These results provide indirect support for our hypothesis that genetic variance in neither personality traits nor head circumference is under mutation-selection balance [8,9,20].
One of the primary strengths of this study was our ability to control for parental trait levels measured with the same precision as offspring traits when using paternal age as an indicator for likelihood of new mutations. Even though our sample was smaller than those of preceding studies, we would have been able to detect some of the previously reported effect sizes had they been present (e.g. Malaspina et al.'s 2% incremental variance explained [76]), so these early reports may have overestimated the effect size. However, some reported effects would probably have been too small for us to detect (variance explained was rarely reported in previous studies), and we cannot have too much confidence in power estimates derived from previous studies that probably suffered from varying degrees of omitted variable bias. Most importantly, neither the relation with intelligence nor the relations with constraint, positive or negative affectivity were significant. Lack of constraint coupled with negative affectivity is similar to externalising behaviour [126], which Saha et al. [87] reported to be related to paternal age. Positive and negative affectivity are related to social functioning [127], which Weiser et al. [86] found to be associated with paternal age. Despite the smaller size of our sample, we were able to estimate the upper effect size boundary when controls for parental trait levels were in place, and can say with some confidence that true effects would not explain more than 1.3% of variance.
With samples in which less variation is accounted for by nongenetic components (e.g. shared-environment), we would expect a paternal age effect attributable to mutations to explain more variation and thus to be more easily detected. This could for example be the case in samples with older offspring [128,129]. However, higher heritability does not imply that it will be easy to detect individual causal genes.
A large effect of paternal age on intelligence would have been consistent with a detrimental burden of new mutations coming from older fathers and would have thus raised the question why selection has not led to early reproduction (or even "andropause", i.e. a complete cessation of male reproductive ability in late life) in men. It would also have indicated selective pressure for transcription accuracy. Very small effects of paternal age are consistent [51] with the hypothesis that new mutations affecting fitness are rare and have small effects on the population level (though their effects on single individuals might still be substantial).
A link between paternal age and a trait in which variation is maintained through mutation-selection balance should persist or even emerge only after controlling for parental trait levels. Parents' intelligence and personality may influence both their reproductive timing and their children's traits, thus constituting an unobserved common cause of both paternal age and offspring traits. If we assume that the mean time at which the parents had the twins was representative of their mean overall reproductive timing (we were unable to test this beyond showing that parental intelligence was unrelated to twins' number of older or younger siblings), parents with higher IQs delayed reproduction compared with those with lower IQs in our sample. This is the most likely reason that the association between paternal age and offspring intelligence turned from positive to negative when controlling for parental intelligence, an effect that was not apparent when controlling only for parental education, as has been done in previous studies. Because controlling for education led to regression coefficients whose confidence intervals overlapped with those of the first and second models, and because our results regarding the importance of education to reproductive timing differed from those of previous studies [15,119], we recommend controlling for both in samples in which fertility may be influenced by personality [4,130,131] and intelligence [15]. In particular, associations with reproductive timing have not yet been demonstrated in a sufficiently wide range of samples which differ with regard to family planning. At least in our sample we did not find any noteworthy changes in the regression coefficients of paternal age on personality when adding parental trait levels as covariates. Thus, it may be possible to assess effects of paternal age on personality in simple cross-sectional samples without having to account for the indirect path through the common cause parental personality.
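The sign change described above can be illustrated with a small simulation. The following is a minimal sketch, not the SEM analysis used in this study: the path coefficients (0.3, 0.5, -0.05), the sample size and all variable names are hypothetical values chosen for illustration, and all variables are treated as roughly standardised. Parental IQ raises both paternal age and offspring IQ, while paternal age itself has a small negative direct effect.

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def slope_unadjusted(age, iq):
    """OLS slope of offspring IQ on paternal age alone."""
    return cov(age, iq) / cov(age, age)

def slope_adjusted(age, iq, parent_iq):
    """OLS slope of offspring IQ on paternal age, controlling for parental IQ
    (closed-form solution of the two-predictor normal equations)."""
    s_aa, s_zz = cov(age, age), cov(parent_iq, parent_iq)
    s_az, s_ay, s_zy = cov(age, parent_iq), cov(age, iq), cov(parent_iq, iq)
    return (s_ay * s_zz - s_az * s_zy) / (s_aa * s_zz - s_az ** 2)

def simulate(n=50_000, seed=1):
    """Hypothetical data: parental IQ is a common cause of age and offspring IQ."""
    rng = random.Random(seed)
    parent_iq = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: higher-IQ parents reproduce later (path 0.3).
    age = [0.3 * z + rng.gauss(0, 1) for z in parent_iq]
    # Assumption: small negative direct effect of paternal age (path -0.05).
    iq = [0.5 * z - 0.05 * a + rng.gauss(0, 1) for z, a in zip(parent_iq, age)]
    return age, iq, parent_iq

age, iq, parent_iq = simulate()
b_raw = slope_unadjusted(age, iq)
b_adj = slope_adjusted(age, iq, parent_iq)
print(f"paternal age slope without parental IQ: {b_raw:+.3f}")
print(f"paternal age slope with parental IQ:    {b_adj:+.3f}")
```

In this toy setup the unadjusted slope is positive although the simulated direct effect is negative; the negative sign is recovered only once parental IQ enters the regression.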
An interaction between societal factors leading to delayed reproduction and IQ might explain why results differed across previous studies, even though they used similar controls and methods. In the case of Auroux et al. [83,84] a largely overlapping research group working with French military recruit data found a negative effect of increasing paternal age on IQ, but could not replicate it in more recent data. If the societal trend towards delayed reproduction in industrialised countries [132] were accelerated in people with higher IQs, parental IQ, as an unobserved common cause in previous studies, would have suppressed the path from paternal age to offspring IQ more in studies of more recent lower-fertility cohorts. Saha et al. [78] and Malaspina et al. [76], who reported a negative association between paternal age and offspring IQ, analysed samples from populations with high average fertility. Average fertility rates in the USA and Israel were 3.6 and 3.8, roughly double those in France, Sweden, the United Kingdom and Minnesota (1.6 to 2.2; national fertility rates at the time of data collection from [133]; Minnesota fertility rates from [134]) in the respective birth cohorts of the studies that did not report negative associations [80,83,85]. Speculatively, any bias resulting from an effect of paternal IQ on both reproductive timing and offspring traits may have differed between these higher- and lower-fertility populations. Thus, societal fertility trends might account for differences among the studies in these different populations.
Our sample size was smaller than those of most previous studies (a fourth of Auroux et al.'s [83], a hundredth of Svensson et al.'s [80]); therefore, our power to detect effects that explain less than 0.85% of variance was severely restricted. That our IQ tests were more established and comprehensive than the military aptitude tests and school grades used before can only partly compensate for this. Because we analysed a rather small and homogeneous sample, our considerations regarding societal trends have to remain speculative and generalizability of results might be restricted. We also cannot know for sure whether paternal age at twin birth was representative of average reproductive timing and whether the associations we report would be replicated for single births. There is evidence against consequential mean differences in the outcomes of interest [100,135]. The relation between advanced maternal age and dizygotic twinning [136] would not, on its own, jeopardise our conclusions, though replication in a singleton sample would of course strengthen our confidence in them. The systematic differences we found for families whose fathers did not participate in the intake assessments may have decreased our chance to find significant results.
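To make the relation between sample size and detectable effect size concrete, the sketch below uses a normal approximation for a single predictor added to a linear regression. It is not the sensitivity analysis reported here: it ignores the clustering of twins within families, latent variable modelling and missing data, and the sample sizes and the covariate R² of 0.30 are hypothetical values, as are the function names.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def min_detectable_delta_r2(n, r2_covariates=0.0, alpha=0.05, power=0.80):
    """Smallest incremental R^2 of one added predictor that a two-sided test
    detects with the requested power (normal approximation, assumed i.i.d. data)."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    # Cohen's f^2 = delta_R2 / (1 - R2_full); required f^2 scales with 1 / n.
    f2 = (z_crit + z_power) ** 2 / n
    # Solve f2 = d / (1 - r2_covariates - d) for d.
    return f2 * (1 - r2_covariates) / (1 + f2)

def approx_power(n, delta_r2, r2_covariates=0.0, alpha=0.05):
    """Approximate power to detect a given incremental R^2 at sample size n."""
    f2 = delta_r2 / (1 - r2_covariates - delta_r2)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf((n * f2) ** 0.5 - z_crit)

# Hypothetical sample sizes; none of them is taken from the study.
for n in (500, 1000, 2000, 10000):
    d = min_detectable_delta_r2(n, r2_covariates=0.30)
    print(f"N = {n:>6}: smallest Delta R^2 detectable with 80% power ~ {100 * d:.2f}%")
```

Under these simplifying assumptions the detectable share of variance shrinks roughly in proportion to 1/N, which is why samples many times larger can pick up effects that a smaller sample cannot.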
Our relative ability to detect any paternal age effects on MPQ personality as opposed to effects on intelligence may have been even lower than indicated by our sensitivity analyses, because we had less MPQ personality data, poorer model fit and less auxiliary information to estimate our models with missing data.
Because major disabilities and birth defects were thoroughly screened out of our sample, our conclusions are limited to intelligence variation in the normal range. Previous studies also conducted their analyses on either nonclinical or clinical samples, but not both. If paternal age were related to intellectual disability, but not intelligence in the normal range, effect sizes would also vary across studies according to the thoroughness of the screening procedure. The mean and variance of paternal age in our sample were similar to previous studies, but we cannot rule out that a larger number of older fathers would have boosted our explanatory power, especially if the effect were exponential.
We may also have omitted important confounding variables. Unlike previous researchers we decided against controlling for maternal age because this would have introduced high multicollinearity with paternal age (r = 0.80) and birth order (r = 0.29) and led to convergence failures. Findings of an offspring IQ increase with advancing maternal age largely relied on childrearing, maternal social background and parental psychological adjustment as mediators [137,138], for which we tried to account using parental IQs instead. Socioeconomic status was not controlled either, because we believed controls for intelligence and education to be sufficient. Positive effects of advancing maternal age, if not sufficiently controlled in our study, would have decreased our ability to identify a purported effect of advanced paternal age.
We hope future research on paternal age effects on intelligence will benefit from the debate about the effect of birth order on intelligence [98,99,139]. It seems possible to disentangle birth order and paternal age, because they generally have only moderate correlation across families. Some interpretations of the birth order variable (e.g. tutoring by siblings, or decreased paternal investment when multiple children are born in short intervals) would not be consistent with an effect of accumulated germline mutations, but e.g. decreased paternal investment in later-born siblings would be. Many of the challenges that emerged in birth order research apply to paternal age research as well. One example is the debate over whether birth order is also related to decreased intelligence within families [98]. If constant differences between families (e.g. parental intelligence) which are related to both their reproductive decision-making and mean offspring intelligence drive paternal age effects, they would be found between families, but not within. Such effects would not be indicative of new mutations and thus spurious in the context of our research question.
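The between- versus within-family distinction can be sketched with simulated sibships. This is an illustrative toy model under loudly stated assumptions, not an analysis of our data: it simulates non-twin siblings (co-twins share the same paternal age), assumes a family-constant factor such as parental intelligence that delays reproduction and raises offspring IQ, and assumes no causal effect of paternal age at all. All numeric values and names are hypothetical.

```python
import random

def slope(x, y):
    """OLS slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_families(n_families=5000, n_sibs=3, seed=7):
    """Each family has a constant factor u that affects both paternal age
    and offspring IQ; paternal age has no effect of its own."""
    rng = random.Random(seed)
    families = []
    for _ in range(n_families):
        u = rng.gauss(0, 1)                       # family-constant confounder
        age = 28 + 2 * u + rng.gauss(0, 3)        # paternal age at first birth
        ages, iqs = [], []
        for _ in range(n_sibs):
            ages.append(age)
            iqs.append(100 + 5 * u + rng.gauss(0, 12))
            age += rng.uniform(1.5, 4.0)          # spacing to the next sibling
        families.append((ages, iqs))
    return families

def between_within_slopes(families):
    """Between slope: family means. Within slope: deviations from family means."""
    mean_age, mean_iq, dev_age, dev_iq = [], [], [], []
    for ages, iqs in families:
        m_age, m_iq = sum(ages) / len(ages), sum(iqs) / len(iqs)
        mean_age.append(m_age)
        mean_iq.append(m_iq)
        dev_age.extend(a - m_age for a in ages)
        dev_iq.extend(i - m_iq for i in iqs)
    return slope(mean_age, mean_iq), slope(dev_age, dev_iq)

b_between, b_within = between_within_slopes(simulate_families())
print(f"between-family slope: {b_between:+.2f} IQ points per year")
print(f"within-family slope:  {b_within:+.2f} IQ points per year")
```

In this sketch the association appears only between families, as expected when a constant family characteristic rather than paternal age drives it. Note that within each simulated family paternal age and birth order are perfectly rank-ordered, so the within-family slope alone could not separate the two.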
Of course within-family findings are not beyond reproach either [140]. For example, families may decide to have more children after their economic situations improve, allowing them to provide better environments for their later-borns. Additionally, within-family research may suffer from limited variance in paternal age, because most women do not have children across their whole reproductive lifespan in industrialised countries [141] and because fathers can only have children across their whole reproductive lifespans if they find younger partners after their original partner has gone into menopause.
Paternal age and birth order have the same rank-order within many "traditional" families, making it very difficult to compare their predictive accuracy within families in all but the largest samples. To break up this confound, the variable birth order could be substituted by direct assessments of the constructs for which it is supposed to be a proxy: differential parental investment, sibling tutoring and so forth.
Similarly, whole-genome- and exome-sequencing studies of families, which allow for counting new mutations by comparing the genomes or exomes of parents and children, have to control for the various factors, especially parental trait levels, that might influence reproductive timing and thereby new mutation incidence. For example, Iossifov et al. [91] found that "likely gene-disrupting mutations" predicted autism, but were not related to intelligence in an exome-sequencing study of 343 families with children on the autism spectrum and their unaffected siblings. Inherited subsyndromal autism was an unlikely confound for the autism finding, because they employed a simplex sample (i.e. no relatives with autism spectrum disorders). However, parental intelligence was not controlled. The same concern applies to Sanders et al.'s [58,92] results, which implicated new copy-number but not single nucleotide variants in intelligence in a clinical sample from the same population, the Simons Simplex Collection. Most importantly, the exome constitutes only the coding 1% of the genome; plausibly more polymorphisms affecting complex, continuous traits may be found in the regulatory sequences of the genome [142]. The total contributions of new mutations to intelligence are only beginning to come into the reaches of current molecular genetic methods (especially the still expensive sequencing techniques, see [36,57,73,74]).
Another way paternal age studies can improve their estimates is by considering the insights from Flynn effect research (Flynn [143,144], reviewed by Mingroni [145]). The rise in intelligence test scores over time could mean that older parents in previous studies were also from earlier cohorts with lower test scores. Possibly, their offspring would have lower test scores as well. Malaspina et al. [76] dismissed the Flynn effect as a confound, because it had not been found to occur within families, nor to affect heritability estimates for intelligence. However, the Flynn effect has since been shown in brothers [146]. Johnson, Penke, and Spinath [147] reasoned that high heritability of a trait should not be construed as an argument against environmentally mediated secular increases: Gene-environment interactions may be revealed or hidden, depending on whether the necessary variability in the environment is present. Wicherts et al. [148] showed that measurement invariance of general intelligence was violated with respect to different cohorts, making it unlikely that the observed gains reflected latent "real" increases. Previous studies which used sum scores could not guard against bias resulting from changes in subtest scores rather than general intelligence by checking their results' robustness to imposing measurement invariance.
A paternal age effect could also mask a rise of test scores within families. Rodgers [149] had dismissed both the within-family Flynn and the birth order effect, arguing that neither was present in his data, even though the two effects might have cancelled each other out [145].
In fact, Sundet, Borren, and Tambs [146] have proposed changes in fertility patterns as one cause of the Flynn effect after finding that decreases in the prevalence of large families explain part of the increase in intelligence scores. They examined data on Norwegian conscripts, but they aggregated mean sibling IQ. Plausibly the actual explanatory variable is found elsewhere, at the individual level. A trend towards delayed reproduction in intelligent parents [15] and the general population [132], and thus an increase in new mutations, could be partly culpable for the reports of a slowing [150], stop [151] or even reversal [152,153] of the Flynn effect in Scandinavian countries. We might be able to explain the null effects of paternal age on intelligence in more recent analyses, in which parental intelligence was not controlled [80,83,85] by delayed reproduction among more intelligent parents, though our study raises the question whether any paternal age effect attributable to mutations exists and is substantial. Taking into account these known problems with measuring differences in intelligence over time could serve to improve future research into paternal age.
Additionally, research in more diverse populations is warranted because results from the 1000 Genomes Project Consortium [142] suggest that populations are substantially differentiated geographically with regard to low-frequency variants. The results also suggest differences in strength and efficacy of purifying selection across populations, which are highly relevant to paternal age research.
Future research on paternal age effects may benefit from the history of birth order research and employ controls for parental traits, within-family designs (eliminating between-family confounds) or pedigree analyses of paternal age effects across several generations (ruling out alternative environmental and epigenetic explanations and boosting explained variance [154]) depending on the availability of data.
Controlling for parental trait level, we were unable to show significant effects of paternal age, a proxy for new genetic mutations, on offspring IQ, head circumference, or personality traits. Parents' IQ and personality were correlated with their reproductive timing. This necessitates thorough control of parental trait levels in future studies on paternal age effects. Our sample size was insufficient to reveal very small effects, but our results can be understood as providing an upper boundary of any expected effect sizes. Reported effect sizes of paternal age on offspring personality and intelligence have been heterogeneous. So far no clear picture of the role of mutation-selection balance has emerged from these studies. More research in different populations and converging evidence may enable us to find out more about the evolutionary mechanisms that maintain genetic variance in traits like intelligence. If any paternal age effects on intelligence exist, they are probably very small. Narrowing down the precise effect size and ruling out the many possible confounds would be steps towards quantifying the contribution of de novo mutation-selection balance to intelligence and other individual differences. If other studies show paternal age effects on intelligence to be negligible but confirm the link between paternal age and de novo mutations, this prompts interesting research questions into the robustness of the highly polygenic intelligence trait.