Using machine learning to understand age and gender classification based on infant temperament

  • Maria A. Gartstein,

    Roles Conceptualization, Data curation, Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing

    gartstma@wsu.edu

    Affiliation Washington State University, Pullman, WA, United States of America

  • D. Erich Seamon,

    Roles Conceptualization, Formal analysis, Methodology, Software, Visualization, Writing – original draft, Writing – review & editing

    Affiliation University of Idaho, Moscow, ID, United States of America

  • Jennifer A. Mattera,

    Roles Conceptualization, Data curation, Methodology, Project administration, Writing – original draft, Writing – review & editing

    Affiliation Washington State University, Pullman, WA, United States of America

  • Michelle Bosquet Enlow,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Boston Children’s Hospital and Harvard Medical School, Boston, MA, United States of America

  • Rosalind J. Wright,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliations Department of Pediatrics, Kravis Children’s Hospital, New York, NY, United States of America, Icahn School of Medicine at Mount Sinai, New York, NY, United States of America

  • Koraly Perez-Edgar,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Pennsylvania State University, University Park, PA, United States of America

  • Kristin A. Buss,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing

    Affiliation Pennsylvania State University, University Park, PA, United States of America

  • Vanessa LoBue,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing

    Affiliation Rutgers University, New Brunswick, NJ, United States of America

  • Martha Ann Bell,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Virginia Tech, Blacksburg, VA, United States of America

  • Sherryl H. Goodman,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Emory University, Atlanta, GA, United States of America

  • Susan Spieker,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation University of Washington, Seattle, WA, United States of America

  • David J. Bridgett,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Northern Illinois University, DeKalb, IL, United States of America

  • Amy L. Salisbury,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Virginia Commonwealth University, Richmond, VA, United States of America

  • Megan R. Gunnar,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation University of Minnesota, Minneapolis, MN, United States of America

  • Shanna B. Mliner,

    Roles Data curation, Investigation, Project administration, Writing – original draft, Writing – review & editing

    Affiliation University of Minnesota, Minneapolis, MN, United States of America

  • Maria Muzik,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation University of Michigan, Ann Arbor, MI, United States of America

  • Cynthia A. Stifter,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing

    Affiliation Pennsylvania State University, University Park, PA, United States of America

  • Elizabeth M. Planalp,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation University of Wisconsin, Madison, WI, United States of America

  • Samuel A. Mehr,

    Roles Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Harvard University, Boston, MA, United States of America

  • Elizabeth S. Spelke,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Harvard University, Boston, MA, United States of America

  • Angela F. Lukowski,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation University of California, Irvine, CA, United States of America

  • Ashley M. Groh,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft

    Affiliation University of Missouri, Columbia, MO, United States of America

  • Diane M. Lickenbrock,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Western Kentucky University, Bowling Green, KY, United States of America

  • Rebecca Santelli,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation University of North Carolina, Chapel Hill, NC, United States of America

  • Tina Du Rocher Schudlich,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Western Washington University, Bellingham, WA, United States of America

  • Stephanie Anzman-Frasca,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation University of Buffalo, Buffalo, NY, United States of America

  • Catherine Thrasher,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation University of Virginia, Charlottesville, VA, United States of America

  • Anjolii Diaz,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Ball State University, Muncie, IN, United States of America

  • Carolyn Dayton,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Wayne State University, Detroit, MI, United States of America

  • Kameron J. Moding,

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Purdue University, West Lafayette, IN, United States of America

  •  [ ... ],
  • Evan M. Jordan

    Roles Data curation, Funding acquisition, Investigation, Methodology, Project administration, Resources, Writing – original draft, Writing – review & editing

    Affiliation Oklahoma State University, Stillwater, OK, United States of America


Abstract

Age and gender differences are prominent in the temperament literature, with the former particularly salient in infancy and the latter noted as early as the first year of life. This study represents a meta-analysis utilizing Infant Behavior Questionnaire-Revised (IBQ-R) data collected across multiple laboratories (N = 4,438) to overcome limitations of smaller samples in elucidating links among temperament, age, and gender in early childhood. Algorithmic modeling techniques were leveraged to discern the extent to which the 14 IBQ-R subscale scores accurately classified participating children as boys (n = 2,298) and girls (n = 2,093), and into three age groups: youngest (< 24 weeks; n = 1,102), mid-range (24 to 48 weeks; n = 2,557), and oldest (> 48 weeks; n = 779). Additionally, simultaneous classification into age and gender categories was performed, providing an opportunity to consider the extent to which gender differences in temperament are informed by infant age. Results indicated that overall age group classification was more accurate than child gender models, suggesting that age-related changes are more salient than gender differences in early childhood with respect to temperament attributes. However, gender-based classification was superior in the oldest age group, suggesting temperament differences between boys and girls are accentuated with development. Fear emerged as the subscale contributing most notably to accurate classifications overall. This study leads infancy research and meta-analytic investigations more broadly in a new direction as a methodological demonstration, and also provides optimal comparative data for the IBQ-R based on the largest and most representative dataset to date.

Introduction

Although a number of approaches have been developed for the purpose of measuring temperament in childhood, including a variety of observational procedures and physiological techniques, parent report continues to be most widely used overall [1]. This is due to a number of factors, prominent among these being ease of administration and scoring as well as accessibility. Parent report also provides descriptors of child temperament across time and situations, not just a “snapshot” of reactivity and/or regulation that can be gleaned from brief laboratory observations. Although multiple temperament theories or frameworks have been proposed, Rothbart’s psychobiological model is generally viewed as most widely accepted at this time [2]. This approach casts temperament as constitutionally based individual differences in reactivity and self-regulation, with constitutional referring to the relatively enduring biological make-up of the individual, influenced by heredity, maturation, and experience. Reactivity refers to the arousability of emotional, motor, and attentional responses, assessed by threshold, latency, intensity, time to peak intensity, and recovery time of reactions. Self-regulation embodies processes that can serve to modulate reactivity, such as soothability and inhibitory control [3].

Although temperament has often been delineated into three overarching factors of Negative Emotionality, Positive Affectivity/Surgency, and Regulatory Capacity/Orienting, more recent studies emphasize the narrowly defined component scales. This shift toward a fine-grained approach is a function of research demonstrating that individual scales belonging to the same overarching factor differentially predict important outcomes (e.g., behavior problems), present with growth trajectories discrepant from the overarching factors, and contribute to temperament profiles in a manner inconsistent with the overarching factor content (i.e., scales that load onto different factors contribute to the same profile, and vice versa: components of the same factor define different profiles/classes; [4–7]). The Infant Behavior Questionnaire-Revised (IBQ-R), designed to provide indicators of infant temperament, comprises 14 fine-grained scales: Activity Level, Smiling/Laughter, Approach, High Intensity Pleasure, Perceptual Sensitivity, Vocal Reactivity, Fear, Distress to Limitations, Sadness, Falling Reactivity, Duration of Orienting, Soothability, Cuddliness/Affiliation, and Low Intensity Pleasure, and is the focus of this investigation.

Development of temperament and age differences

Manifestations of temperament transform over development, with rapid change during infancy [8]. Positive emotionality (e.g., smiling), rarely expressed during the newborn period, is observed more reliably between ages two and three months, and increases in expression throughout the first year of life [8,9]. Levels of activity, approach, distress to limitations, and fear increase throughout the first year of life as well [10–14]. Anger reactions across infancy appear to follow a U-shaped trajectory [12,15]. The decrease in anger responses occurring between 2 and 6 months of age has been linked to greater flexibility in attention shifting [16]. In the second half of the first year, infants are likely to respond with anger when unable to grasp an attractive stimulus that has been placed out of reach, or when a caregiver has removed a forbidden object. Fear generally increases throughout the second half of the first year of life [10,12–14], with inhibition of approach toward novel and/or intense stimuli “coming online” [14,17].

The developmental course of attentional orienting has been described as U-shaped in the first year of life [18]. Carranza and colleagues [12], for example, noted decreases in Duration of Orienting between 6 and 9 months, followed by an increase between 9 and 12 months. Toward the end of the first year, skills associated with the development of the executive attention system may contribute to the flexibility of orienting reactions [19–21]. Infants also gain communication skills rapidly during the first year of life [22,23], and thus exhibit greater vocal reactivity over time.

With respect to age/developmental differences discerned via the IBQ-R, older infants obtain higher scores on Approach, Vocal Reactivity, High Intensity Pleasure, Activity Level, Perceptual Sensitivity, Distress to Limitations, and Fear, whereas younger infants’ scores are higher for Low Intensity Pleasure, Cuddliness/Affiliation, and Duration of Orienting [24,25]. More recent longitudinal investigations provided further evidence of increases in Fear across the first year of life [5,26], also noting increases in Distress to Limitations and Sadness, albeit not always linear in nature. Falling Reactivity was associated with a quadratic trajectory, with increases followed by declining values later in infancy. Increasing trajectories were noted for attributes associated with Positive Affectivity/Surgency, with trends toward greater Activity Level, Smiling and Laughter, High Intensity Pleasure, Approach, Perceptual Sensitivity, and Vocal Reactivity later in infancy. Growth modeling provided evidence of nonlinear changes in Duration of Orienting, Soothability, Cuddliness, and Low Intensity Pleasure, wherein initial growth in values was followed by decreases later in infancy [5]. These findings are largely consistent with prior research relying on different measurement approaches. Although the data examined in this study are cross-sectional in nature, earlier longitudinal evaluations are informative because their results speak to the importance of age in shaping temperament presentations and, conversely, to temperament features as predictors of infant age. It should be noted that no study to date has explored the latter, that is, used temperament dimensions to classify infants with respect to their age, likely due to sample size limitations and only recently available methodological advances in empirically based classification techniques.

Gender differences in temperament

Although a number of gender differences in temperament have been reported for older children and adults, fewer exist for children younger than one year of age [8,25,27,28]. Differences in infancy have been limited to activity level and fear/behavioral inhibition. Higher activity level and approach are evident in boys [29,30], with girls exhibiting greater hesitation in approaching novel objects [14,31]. Campbell and Eaton [29] applied meta-analytic procedures to summarize 46 studies addressing activity level in infancy, estimating the size of the gender difference at 0.2 standard deviations based on objective measures (parent-report measures estimated the difference to be smaller). Gender differences in approach-withdrawal have been reported for samples from different countries [30,32–34], with parents rating boys higher in approach. Martin et al. [31] reported a large and significant gender difference for distress to novelty, with 6-month-old girls receiving higher scores.

Gender differences also have been documented with the IBQ-R, as boys received higher scores on Activity and High Intensity Pleasure, and girls higher scores on Fear [24,25,35,36]. Infant gender also predicted intercept values of Fear trajectories, with girls demonstrating higher levels at 4 and 6 months [5,26]. Girls also started out at lower values (i.e., intercept estimates) for Activity Level, Approach, and High Intensity Pleasure. Similar to age/developmental differences research, gender-related temperament studies have only compared temperament for boys and girls, not considering gender classification based on temperament features. Importantly, age- and gender-based temperament distinctions have not been considered jointly, discerning whether age-related changes inform gender differences.

Present study

In this study, we leveraged IBQ-R data collected across multiple laboratories (N = 4,438) to further investigate age and gender differences in infancy, addressing yet unanswered questions. Specifically, algorithmic modeling techniques were used to discern the extent to which the 14 IBQ-R subscale scores (referred to as features) accurately classified participating children as boys (n = 2,298) or girls (n = 2,093; 47 children were missing gender data) and into three age groups: youngest (< 24 weeks; n = 1,102), mid-range (24 to 48 weeks; n = 2,557), and oldest (> 48 weeks; n = 779), because of previously noted gender-based variability [14,29–34] and significant developmental differences among these age groups (e.g., with respect to brain growth and maturation; [37,38]). This study addresses an important gap in research, being the first to consider temperament attributes as determinants of age and gender groupings, quantifying the extent to which early reactivity and regulation provide the features necessary for accurate prediction. Importantly, this work also allows for simultaneous classification of age and gender categories, providing an opportunity to consider the extent to which gender differences are informed by infant age, and to our knowledge, this is the first study to do so. That is, despite prior demonstrations of reliable age and gender differences in temperament, the two classifications have not been considered jointly, examining whether gender differences were age dependent in a single investigation. Moreover, this effort provides a new direction for infancy and temperament research, serving as a methodological demonstration of machine learning applications, not yet utilized in these areas of scientific inquiry.
This meta-analytic, data-driven effort is the first to rely on advanced machine learning techniques using temperament features to classify infants into age and gender groups, rather than compare temperament of children who vary in age and gender, considering these classifications simultaneously. This cross-laboratory effort also overcomes prior limitations associated with small samples that were not representative, producing results circumscribed in terms of generalizability.

Materials and methods

Measures

The Infant Behavior Questionnaire-Revised (IBQ-R; [24]). This parent-report measure of temperament was developed for infants between 3 and 12 months of age. The IBQ-R contains 191 items, which yield 14 scales: Activity Level, Smiling/Laughter, Approach, High Intensity Pleasure, Perceptual Sensitivity, Vocal Reactivity (loading onto Positive Affectivity/Surgency); Fear, Distress to Limitations, Sadness, Falling Reactivity (Negative Emotionality); Duration of Orienting, Soothability, Cuddliness/Affiliation, Low Intensity Pleasure (Regulatory Capacity/Orienting). Individual items are rated on a 7-point scale reflecting the frequency of occurrence of the behavior in the past week (two weeks for less frequent events, such as encounters with unfamiliar settings/adults). Reliability of the IBQ-R has been supported for mothers and fathers, as well as samples from different cultures, with Cronbach’s α values ranging from .77 to .96 [39–41]. Evidence supports the predictive and construct validity of IBQ-R scores [42–44]. Cronbach’s α values for the 14 subscales included in the current analysis, derived from 29 datasets, ranged from .74 to .89 (mean α = .82). These temperament features were used to classify children into gender and age categories via machine learning algorithms.
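
For readers less familiar with the reliability index cited above, the following is a minimal sketch of the standard Cronbach’s α formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). It is an illustration only: the function name `cronbach_alpha` and the toy ratings are hypothetical and are not drawn from the IBQ-R data or from the authors' scoring code.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of respondents, each a list of k item ratings."""
    k = len(ratings[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in ratings]) for j in range(k)]
    # Sample variance of each respondent's total (summed) score.
    total_variance = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: five caregivers answering three items on a 7-point scale.
toy_ratings = [
    [5, 6, 5],
    [2, 3, 3],
    [6, 6, 7],
    [3, 2, 4],
    [4, 5, 4],
]
alpha = cronbach_alpha(toy_ratings)
print(round(alpha, 3))
```

Values closer to 1 indicate that the items of a scale vary together, which is the sense in which the subscale values reported above are read as evidence of internal consistency.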

Procedure

Data sets (N = 29) were acquired by emailing researchers who requested the IBQ-R or published research using the instrument between 2006 and 2019. All of the researchers had received approval from their respective Human Research Protection Programs (HRPPs)/Institutional Review Boards (IRBs) prior to initiating data collection, and obtained written informed consent. The approving bodies were: Human Studies Committee at the Brigham and Women’s Hospital in Boston, MA and the Icahn School of Medicine at Mount Sinai in New York; IRB at Boston Children’s Hospital; Pennsylvania State University IRB; Rutgers-Newark IRB; Virginia Tech IRB; University of North Carolina at Greensboro IRB; Emory IRB; University of Washington IRB Committee D; Northern Illinois University IRB #1; Brown University HRPP/IRB; IRB of the University of Minnesota’s Human Research Protection Program; University of Michigan Health Sciences and Behavioral Sciences IRB; Health Sciences IRB at University of Wisconsin; Harvard University Committee on the Use of Human Subjects in Research; University of California, Irvine HRPP/IRB; University of Missouri IRB; IRB of Western Kentucky University; University of North Carolina at Chapel Hill IRB; Western Washington University IRB; University of Virginia IRB for the Social and Behavioral Sciences; Wayne State University IRB; and Colorado Multiple IRB. Contributors were asked to provide item level data from the IBQ-R as well as infant age, gender, and race. For all participants, the IBQ-R was completed by the infant’s mother. See Table 1 for a brief description of the samples.

Analytic strategy

Descriptive statistics across gender and age groups were computed first (Table 2). We then constructed a model framework allowing us to assess the utility of fine-grained temperament dimensions with respect to gender and age classifications. This framework resulted in a total of five (5) model types, which included: 1) gender: boys vs. girls; 2) age groups: youngest (< 24 weeks) vs. mid-range (24 to 48 weeks) vs. oldest (> 48 weeks) infants; and gender by age group analyses: 3) boys vs. girls in the youngest age group; 4) boys vs. girls in the mid-range age group; 5) boys vs. girls in the oldest age group. Classification of infant gender within age groups allows us to determine whether gender-based classification is more accurate for younger vs. older infants.
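
The five model types can be sketched as five labelled subsets of one participant table. The example below is a hypothetical illustration rather than the authors' code: the record keys (`age_weeks`, `gender`, `scores`), the helper names, and the toy records are all assumptions. Only the age cut-offs come from the text.

```python
def age_group(age_weeks):
    """Assign one of the three age groups used in the text (cut-offs in weeks)."""
    if age_weeks < 24:
        return "youngest"
    if age_weeks <= 48:
        return "mid-range"
    return "oldest"


def build_tasks(records):
    """Return the five classification tasks as {name: (rows, label_key)}."""
    rows = [dict(r, age_group=age_group(r["age_weeks"])) for r in records]
    # Infants with missing gender can still contribute to the age model.
    gendered = [r for r in rows if r.get("gender") in ("boy", "girl")]
    tasks = {
        "1_gender": (gendered, "gender"),
        "2_age_group": (rows, "age_group"),
    }
    for number, group in enumerate(("youngest", "mid-range", "oldest"), start=3):
        subset = [r for r in gendered if r["age_group"] == group]
        tasks[f"{number}_gender_in_{group}"] = (subset, "gender")
    return tasks


# Toy records; the 14 subscale scores would serve as the features.
records = [
    {"age_weeks": 12, "gender": "girl", "scores": [4.0] * 14},
    {"age_weeks": 30, "gender": "boy", "scores": [3.5] * 14},
    {"age_weeks": 52, "gender": None, "scores": [5.0] * 14},
    {"age_weeks": 60, "gender": "boy", "scores": [4.5] * 14},
]
tasks = build_tasks(records)
for name, (rows, label) in tasks.items():
    print(name, len(rows), label)
```

Keeping infants with missing gender in the age task mirrors the reported counts, where the three age groups sum to the full sample while the gender groups do not.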

Table 2. Descriptive statistics for the temperament subscales by gender and age group.

https://doi.org/10.1371/journal.pone.0266026.t002

Established machine learning techniques, methodologically rigorous and shown to provide reliable/reproducible results, were used in this study (e.g., [45,46]). Specifically, for all models, we used repeated 10-fold cross-validation partitioning with random assignment: a training dataset including 70% of the sample, and 30% reserved as a hold-out dataset (testing) to evaluate the predictive utility of the trained models. A total of 11 different algorithms were considered for each model type, including: (1) linear discriminant analysis; (2) generalized linear modeling; (3) support vector machines; (4) K-nearest neighbor; (5) naïve Bayes; (6) classification and regression trees; (7) C5.0 classification; (8) bootstrapped aggregated trees; (9) ensembled decision trees (Random Forest; [47,48]); (10) gradient boosting; and (11) multi-class adaptive boosting (AdaBoost). These algorithms were chosen based on their applicability and widespread use in the classification modeling literature [45,46], and to achieve the most robust and replicable results discernible across multiple modeling techniques. The aforementioned models were then compared to discern the most effective classification of infant gender and age with temperament features based on misclassification rates, Cohen’s kappa coefficients, and sensitivity and specificity via the area under the curve (AUC) from receiver operating characteristic (ROC) curves, considered as indicators of predictive accuracy.
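
One plausible reading of this partitioning scheme is sketched below: a random 70/30 hold-out split, followed by repeated 10-fold cross-validation inside the training portion. This is an assumption about how the two steps fit together, not a description of the authors' pipeline. The function names, the seed, and the number of repeats (3) are hypothetical, since the text does not report them.

```python
import random


def holdout_split(n, train_frac=0.7, seed=0):
    """Randomly assign n row indices to a training set and a hold-out test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_frac)
    return indices[:cut], indices[cut:]


def repeated_kfold(indices, k=10, repeats=3, seed=0):
    """Yield (repeat, fold, train_rows, validation_rows) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)  # a fresh random partition on every repeat
        folds = [shuffled[i::k] for i in range(k)]
        for f, validation in enumerate(folds):
            train = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield rep, f, train, validation


train_idx, test_idx = holdout_split(4438)
splits = list(repeated_kfold(train_idx, k=10, repeats=3))
print(len(train_idx), len(test_idx), len(splits))
```

Under this reading, each candidate algorithm would be tuned and compared on the cross-validation splits, and the hold-out rows would be touched only once for the final accuracy, kappa, and AUC estimates.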

Misclassification provides a simplistic posterior assessment of model classification based on contingency tables and is often used for initial classification and model accuracy evaluation. Accuracy indicators, reported herein, represent the complement of misclassification rates (i.e., 1 minus the misclassification rate). Cohen’s kappa coefficient assesses reliability of categorization; it incorporates chance agreement, is normalized, and can range from -1 to 1. Kappa values will typically be lower than overall accuracy indicators, as kappa represents a more conservative estimate given its assessment of accuracy compared to random assignment. The area under an ROC curve (area under the curve, or AUC) is a third metric used to evaluate the accuracy of binary classifiers, which encapsulates both Type I and Type II errors [49]. However, ROC-AUC is limited insofar as it does not take predicted probability values and goodness of fit of evaluated models into account. While all three indicators provide unique assessments of classification accuracy, overall misclassification rate (or, inversely, accuracy) is the most broadly used metric for classification evaluation [50]. For all of the model classification indices, higher values (i.e., closer to 1) can be considered superior, indicative of better performance.
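
The three indicators can be made concrete with a small worked example. The sketch below uses the textbook definitions (accuracy as the proportion of correct labels, kappa as observed agreement corrected for chance agreement, and AUC as the probability that a randomly chosen positive case is scored above a randomly chosen negative case). The function names and the toy labels and scores are hypothetical and unrelated to the study data.

```python
def accuracy(y_true, y_pred):
    """Proportion of correct classifications (1 minus the misclassification rate)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    # Chance agreement from the marginal label frequencies.
    p_expected = sum((y_true.count(l) / n) * (y_pred.count(l) / n) for l in labels)
    return (p_observed - p_expected) / (1 - p_expected)


def roc_auc(y_true, scores, positive):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example: eight infants, predicted labels and predicted probability of "girl".
y_true = ["girl", "girl", "boy", "boy", "boy", "girl", "boy", "girl"]
y_pred = ["girl", "boy", "boy", "boy", "girl", "girl", "boy", "girl"]
p_girl = [0.9, 0.4, 0.2, 0.3, 0.6, 0.8, 0.1, 0.7]

print(accuracy(y_true, y_pred))         # 0.75
print(cohen_kappa(y_true, y_pred))      # 0.5
print(roc_auc(y_true, p_girl, "girl"))  # 0.9375
```

In this toy case accuracy is 0.75 but kappa is only 0.5, because half of the agreement would be expected by chance with balanced classes, which illustrates why kappa is the more conservative of the two.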

Results

Overall, classification accuracy was superior for age relative to gender categories, based on misclassification rates (i.e., accuracy indicators), Kappa, and area under the curve (AUC) indicators (Table 3A).

Table 3A. Classification effectiveness indicators across machine learning algorithms: Gender and age-based classification with temperament features.

https://doi.org/10.1371/journal.pone.0266026.t003

Specifically, across all algorithmic models, age-based classification outperformed gender-based classification for all classification outcomes.

Gender classification was next performed within the three infant age groups (Table 3B), with classification effectiveness for gender generally superior in the oldest age group (> 48 weeks). That is, oldest age group classification models consistently outperformed others based on the AUC, and this was the case for the majority of classification algorithms with respect to accuracy and Kappa indicators. Next, we focused on the AUC, especially informative in capturing differences for gender classification models across age groups because of its longstanding widespread use for comparative purposes in the machine learning classification literature [51] and visualization capabilities (Figs 1–3). AUC gender classification indicators were superior for the oldest age group, yielding higher values across different algorithmic models, as illustrated in Fig 3.

Fig 1. Note: lda—Linear Discriminant Analysis; glm—Generalized Linear Modeling; svm—Support Vector Machines; knn—K-Nearest Neighbor; nb—Naïve Bayes; cart—Classification and Regression Trees; c50—C5.0 Classification; treebag—Bootstrapped Aggregated Trees; rf—Ensembled Decision Trees (Random Forest); gbm—Gradient Boosting Method; adabag—Multi-class Adaptive Boosting (AdaBoost).

https://doi.org/10.1371/journal.pone.0266026.g001

Fig 2. Note: lda—Linear Discriminant Analysis; glm—Generalized Linear Modeling; svm—Support Vector Machines; knn—K-Nearest Neighbor; nb—Naïve Bayes; cart—Classification and Regression Trees; c50—C5.0 Classification; treebag—Bootstrapped Aggregated Trees; rf—Ensembled Decision Trees (Random Forest); gbm—Gradient Boosting Method; adabag—Multi-class Adaptive Boosting (AdaBoost).

https://doi.org/10.1371/journal.pone.0266026.g002

Fig 3. Note: lda—Linear Discriminant Analysis; glm—Generalized Linear Modeling; svm—Support Vector Machines; knn—K-Nearest Neighbor; nb—Naïve Bayes; cart—Classification and Regression Trees; c50—C5.0 Classification; treebag—Bootstrapped Aggregated Trees; rf—Ensembled Decision Trees (Random Forest); gbm—Gradient Boosting Method; adabag—Multi-class Adaptive Boosting (AdaBoost).

https://doi.org/10.1371/journal.pone.0266026.g003

Table 3B. Classification effectiveness indicators across machine learning algorithms: Gender by age with temperament features.

https://doi.org/10.1371/journal.pone.0266026.t004

Discussion

We set out to leverage existing IBQ-R datasets from multiple laboratories (N = 4,438) to address an important gap in research by investigating age and gender classifications in early childhood, and overcoming limitations of the published studies such as small sample sizes that cannot be considered representative or provide widely generalizable results. Relying on algorithmic modeling techniques, 14 IBQ-R subscale scores served as features used to classify participating children as boys (n = 2,298) and girls (n = 2,093), and into three age groups: youngest (< 24 weeks; n = 1,102), mid-range (24 to 48 weeks; n = 2,557), and oldest (> 48 weeks; n = 779). Importantly, this approach allowed us to simultaneously classify infants into age and gender categories, providing an opportunity for the first time to consider the extent to which gender differences are informed by infant age. This study also makes an important contribution to the literature as a novel methodological demonstration. That is, the present application of machine learning algorithms provides a new direction for infancy and temperament research, as well as meta-analytic investigations more broadly.

Results based on accuracy indicators (the complement of misclassification rates), Cohen’s kappa coefficients, and AUC (incorporating sensitivity and specificity parameters) demonstrated that temperament features provided superior classification of age groups relative to gender, which is consistent with the existing literature insofar as age effects have generally been more robust (e.g., not dependent on methodology; [5,26,52]). As noted, gender differences in infancy have been largely limited to activity level and fear/behavioral inhibition, with higher activity level and approach reported for boys [29,30] and greater fear/behavioral inhibition for girls [14,25,31,35,36]. These gender differences are somewhat controversial due to a lack of consensus regarding their origin (i.e., biologically based or largely a function of socialization; [53]) and questions regarding the role of parental expectations. That is, parents could rate boys and girls differently not due to actual variability in behavior but as a function of their own culturally influenced ideas about what is typical behavior in boys vs. girls. This explanation cannot be ruled out completely, although existing research suggests that gender differences are not entirely dependent on methodology (i.e., have been identified via behavioral observations along with parent report; [33,52]).

Importantly, results for gender classification within age groups suggest that classification is most effective for the oldest age group, in line with the literature indicating that gender differences in temperament attributes become more pronounced with age [54]. Although a number of factors could be contributing to this pattern of results (accentuated gender differences in temperament with increasing age and, conversely, more accurate classification of gender from temperament features for the oldest participants), socialization is often described as critical among these. The primary mechanism invoked in such explanations involves the infants’ interactional history, and is consistent with literature that indicates mothers respond differently to their sons and daughters [55–59], presenting with different affordances as social interaction partners (e.g., [60]). Over time, such differences could result in divergent trajectories with respect to temperament due to differences in socialization goals/approaches for boys vs. girls. Specifically, parents may prioritize relationship orientation for daughters, but competence and autonomy for sons [61–63]. These and other socialization-related pathways may be responsible for the stronger temperament-based classification of boys and girls later in infancy observed herein.

At the same time, gender is viewed as a marker for a host of sex-linked distinctions in physiological processes. For example, prenatal exposure to high levels of androgen is predictive of later behavior problems, primarily of the externalizing type (e.g., ADHD; [64]), and used to explain early vulnerability observed in boys with respect to this set of problems [65]. Postpartum biological effects are also possible, for example via testosterone increases for boys in infancy, referred to as “mini-puberty,” peaking by the second month and returning to baseline at about 6 months [66]. Sex-linked differentiation in brain structures and functions occurs with maturation, resulting in greater discrepancies with age. For example, Goldstein et al. [67] reported that the amygdala tends to be larger in males and the hippocampus larger in females (see Hines [68] for a related review).

Follow-up analyses outlining feature importance for classification models were performed for the Ensembled Decision Trees (Random Forest) to further interpretation of the observed results. Random Forest methods provide an effective mechanism for feature selection and importance, using tree-based mechanisms to rank node classification via the mean decrease in Gini impurity, i.e., the probability that a random sample in a particular tree node would be mislabeled using the distribution of the node sample, averaged across all trees [69]. Figures provided in Supplemental Materials (S1–S3 Figs) demonstrate that while Fear was the most important feature in distinguishing boys and girls for the youngest and mid-range age groups, for the oldest infants, Low Intensity Pleasure was most influential. In fact, for the youngest infants (S3 Fig), all three distress-related scales (Fear, Distress to Limitations, Sadness) were of primary importance in classifying infants accurately by gender via the Random Forest algorithm. Positive emotionality and regulatory dimensions of temperament (e.g., Falling Reactivity, Approach) begin to take on greater importance for mid-range and oldest infants. Notably, certain temperament features detracted from model accuracy in classifying infants by gender (i.e., were associated with the lowest negative importance values), particularly Cuddliness, Vocal Reactivity, and Smiling and Laughter in the youngest age group and Smiling and Laughter, Perceptual Sensitivity, and Activity in the oldest age group. These results identify the temperament attributes that did not differentiate boys and girls effectively, and it is of interest that the list of these poorly differentiating features varied by age. When the most important features were considered for the models classifying age only and gender only, Fear again emerged as the critical dimension, which is in line with the extensive literature documenting the developmental progression as well as gender differences for this domain of temperament [2,13,14,26,54].
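The Gini-based importance described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the analyses: the node contents are hypothetical, the function names are introduced here for illustration, and production Random Forest libraries additionally weight each split by the share of the training sample reaching the node.

```python
from collections import Counter

def gini_impurity(labels):
    """Probability that a random case in the node is mislabeled when labels
    are assigned at random according to the node's own class distribution."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """Drop in impurity when a parent node is split into two child nodes,
    with each child weighted by its share of the parent's cases."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children

def mean_decrease_gini(decreases_per_tree):
    """Average, across trees, of the impurity decrease credited to one feature."""
    return sum(decreases_per_tree) / len(decreases_per_tree)

# Hypothetical node: eight infants split on a threshold of one feature (e.g., Fear).
parent = ["boy"] * 4 + ["girl"] * 4
left = ["boy", "boy", "boy", "girl"]
right = ["boy", "girl", "girl", "girl"]

print(gini_impurity(parent))               # 0.5
print(gini_decrease(parent, left, right))  # 0.125
```

A feature that repeatedly produces purer child nodes accumulates a larger average decrease and therefore ranks higher in the importance plots.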

This work is not without limitations, chief among these our reliance on a single method (i.e., parent report) in the assessment of infant temperament. Future studies should aggregate datasets providing different sources of information, including behavioral observations and physiological measures, such as cortisol reactivity, heart rate variability/respiratory sinus arrhythmia, and/or frontal alpha asymmetry ascertained via electroencephalogram (EEG) recordings. In addition, the outcomes examined in this study were limited to child gender and age. Future studies with older children should conduct classification analyses with additional dependent variables, particularly symptom and disorder classifications (e.g., clinical/subclinical/asymptomatic ADHD). It should be noted that we did not consider classification based on race/ethnicity because a far more limited literature suggests that these differences can be discerned on the basis of temperament; future research should examine related models as relevant studies accumulate. Finally, the present modeling approach could be extended and potentially improved by applying ensemble modeling approaches (i.e., using multiple algorithms simultaneously), as opposed to relying on singular modeling frameworks.
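One simple form of the ensemble approach mentioned above is hard (majority) voting, in which several independently fitted classifiers each predict a label and the most frequent label is retained for each case. The sketch below is a hypothetical illustration only: the three prediction lists stand in for the output of three different algorithms, and neither the names nor the values come from the present analyses.

```python
from collections import Counter

def majority_vote(predictions_by_model):
    """Combine label predictions from several models, case by case,
    keeping the label predicted by the largest number of models."""
    combined = []
    for case_votes in zip(*predictions_by_model):
        winner, _ = Counter(case_votes).most_common(1)[0]
        combined.append(winner)
    return combined

# Hypothetical age-group predictions from three different algorithms for four infants.
model_a = ["youngest", "mid-range", "oldest", "mid-range"]
model_b = ["youngest", "mid-range", "mid-range", "mid-range"]
model_c = ["mid-range", "mid-range", "oldest", "oldest"]

print(majority_vote([model_a, model_b, model_c]))
# ['youngest', 'mid-range', 'oldest', 'mid-range']
```

Other ensemble schemes (e.g., averaging predicted probabilities or stacking) would follow the same pattern of combining outputs from multiple algorithms; ties, which can occur with three or more classes, would need an explicit rule.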

This study underscores the importance of meta-analytic investigations and cross-laboratory collaborations, providing previously elusive answers to questions, such as those related to intersections of gender and age in temperament development, that have not been previously addressed. Because of the large cross-laboratory sample included herein, this study provides optimal comparative data for the IBQ-R (Table 2), which has emerged as a widely used infant temperament assessment tool. Importantly, the present investigation serves as a methodological illustration for application of machine learning techniques in infancy and temperament research, as well as developmental science more broadly. Given the propensity for differing algorithmic methods to have strengths and weaknesses that may bias predictive outcomes and classification accuracy, we selected 11 established algorithmic modeling and classification techniques to quantify the most robust outcomes, simultaneously demonstrating the viability of machine learning approaches in this area of scientific inquiry. Results of this study make an important contribution to developmental temperament research, demonstrating effective age group classification on the basis of fine-grained temperament features, and indicating more effective gender classification for the older age group, with multiple implications for future mechanistic research examining potential socialization and biological contributors.

Supporting information

S1 Fig. Note.

DL–distress to limitations; Sad–sadness; PS–perceptual sensitivity; App–approach; Fall–falling reactivity; DO–duration of orienting; HP–high intensity pleasure; LP–low intensity pleasure; Act–activity level; Sooth–soothability; SL–smiling and laughter; VR–vocal reactivity; Cud–cuddliness.

https://doi.org/10.1371/journal.pone.0266026.s001

(TIF)

S2 Fig. Note.

Fall–falling reactivity; HP–high intensity pleasure; LP–low intensity pleasure; Sad–sadness; VR–vocal reactivity; App–approach; DL–distress to limitations; SL–smiling and laughter; PS–perceptual sensitivity; DO–duration of orienting; Sooth–soothability; Cud–cuddliness; Act–activity level.

https://doi.org/10.1371/journal.pone.0266026.s002

(TIF)

S3 Fig. Note.

LP–low intensity pleasure; App–approach; VR–vocal reactivity; Fall–falling reactivity; Sad–sadness; DL–distress to limitations; Cud–cuddliness; DO–duration of orienting; Sooth–soothability; HP–high intensity pleasure; Act–activity level; PS–perceptual sensitivity; SL–smiling and laughter.

https://doi.org/10.1371/journal.pone.0266026.s003

(TIF)

References

  1. Gartstein MA, Bridgett DJ, Low C. Asking questions about temperament: Self- and other-report measures across the lifespan. In: Shiner M, Zentner RL, editors. Handbook of Temperament. New York, NY: The Guilford Press; 2012. p. 183–208.
  2. Gartstein MA, Putnam SP, Aaron E, Rothbart M. Temperament and personality. In: Maltzman S, editor. Oxford Handbook of Treatment Processes and Outcomes in Counseling Psychology. New York, NY: Oxford University Press; 2016. p. 11–41.
  3. Rothbart M, Derryberry D. Development of Individual Differences in Temperament. In: Lamb ME, Brown AL, editors. Advances in Developmental Psychology. Mahwah, New Jersey: Erlbaum; 1981.
  4. Gartstein MA, Prokasky A, Bell MA, Calkins S, Bridgett DJ, Braungart-Rieker J, et al. Latent profile and cluster analysis of infant temperament: Comparisons across person-centered approaches. Dev Psychol. 2017;53(10):1811–25. pmid:28758787
  5. Gartstein MA, Hancock GR. Temperamental growth in infancy: Demographic, maternal symptom, and stress contributions to overarching and fine-grained dimensions. Merrill Palmer Q. 2019;65(2):121–57.
  6. Lengua LJ. Growth in temperament and parenting as predictors of adjustment during children’s transition to adolescence. Vol. 42, Developmental Psychology. 2006. p. 819–32. pmid:16953689
  7. Oldehinkel AJ, Hartman CA, De Winter AF, Veenstra R, Ormel J. Temperament profiles associated with internalizing and externalizing problems in preadolescence. Dev Psychopathol. 2004;16(2). pmid:15487604
  8. Rothbart MK. Temperament in childhood: A framework. In: Kohnstamm GA, Bates JE, Rothbart MK, editors. Temperament in Childhood. New York, NY: Wiley; 1989. p. 59–73.
  9. Bridgett DJ, Laake LM, Gartstein MA, Dorn D. Development of infant positive emotionality: The contribution of maternal characteristics and effects on subsequent parenting. Infant Child Dev. 2013;22(4):362–82.
  10. Braungart-Rieker JM, Hill-Soderlund AL, Karrass J. Fear and anger reactivity trajectories from 4 to 16 months: The roles of temperament, regulation, and maternal sensitivity. Dev Psychol. 2010;46:791–804. pmid:20604602
  11. Buss AH, Plomin R. A Temperament Theory of Personality Development. New York: Wiley; 1975.
  12. Carranza JA, Pérez-López J, González Salinas MDC, Martínez-Fuentes MT. A longitudinal study of temperament in infancy: Stability and convergence of measures. Eur J Pers. 2000;14(1):21–37.
  13. Rothbart MK. Longitudinal observation of infant temperament. Dev Psychol. 1986;22(3):356–65.
  14. Rothbart MK. Temperament and the development of inhibited approach. Child Dev. 1988;59(5):1241–50. pmid:3168640
  15. Rothbart MK. Measurement of temperament in infancy. Child Dev. 1981;52:569–78.
  16. Johnson MH, Posner MI, Rothbart MK. Components of visual orienting in early infancy: Contingency learning, anticipatory looking, and disengaging. J Cogn Neurosci. 1991;3(4):335–44. pmid:23967813
  17. Rothbart MK. Emotional development: Changes in reactivity and self-regulation. In: Ekman P, Davidson RJ, editors. The Nature of Emotion: Fundamental Questions. New York, NY: Oxford University Press; 1994. p. 369–72.
  18. Ruff HA, Rothbart MK. Attention in early development: Themes and variations. Attention in Early Development: Themes and Variations. Oxford: Oxford University Press; 2010.
  19. Posner MI, Rothbart MK. Attentional mechanisms and conscious experience. In: Milner MR& AD, editor. The Neuropsychology of Consciousness. London: Academic Press; 1991. p. 91–112.
  20. Posner MI, Rothbart MK, Sheese BE, Voelker P. Control networks and neuromodulators of early development. Dev Psychol. 2012;48(3):827–35. pmid:21942663
  21. Rothbart MK, Sheese BE, Rueda MR, Posner MI. Developing mechanisms of self-regulation in early life. Emot Rev. 2011;3:207–13. pmid:21892360
  22. Reilly S, Eadie P, Bavin EL, Wake M, Prior M, Williams J, et al. Growth of infant communication between 8 and 12 months: A population study. J Paediatr Child Health. 2006;42(12):764–70. pmid:17096710
  23. Leve LD, Kim HK, Pears KC. Childhood temperament and family environment as predictors of internalizing and externalizing trajectories from ages 5 to 17. J Abnorm Child Psychol. 2005;33:505–20. pmid:16195947
  24. Gartstein MA, Rothbart MK. Studying infant temperament via the Revised Infant Behavior Questionnaire. Infant Behav Dev. 2003;26:64–86.
  25. Sechi C, Vismara L, Rollè L, Prino LE, Lucarelli L. First-time mothers’ and fathers’ developmental changes in the perception of their daughters’ and sons’ temperament: Its association With parents’ mental health. Front Psychol. 2020;11. pmid:32063872
  26. Gartstein MA, Hancock GR, Iverson SL. Positive affectivity and fear trajectories in infancy: Contributions of mother–child interaction factors. Child Dev. 2018;89(5):1519–34. pmid:28542794
  27. Bates J. Temperament in infancy. In: Osofsky J, editor. Handbook of Infant Development. New York, NY: Wiley; 1987. p. 1101–49.
  28. Bornstein MH, Putnick DL, Gartstein MA, Hahn CS, Auestad N, O’Connor DL. Infant temperament: Stability by age, gender, birth order, term status, and socioeconomic status. Child Dev. 2015;86(3):844–63. pmid:25865034
  29. Campbell DW, Eaton WO. Sex differences in the activity level of infants. Infant Child Dev. 1999;8(1):1–17.
  30. Maziade M. Infant temperament: SES and gender differences and reliability of measurement in a large Quebec sample. Merrill Palmer Q. 1984;30(2):213–26.
  31. Martin RP, Wisenbaker J, Baker J, Huttunen MO. Gender differences in temperament at six months and five years. Infant Behav Dev. 1997;20(3):339–47.
  32. Carey WB, McDevitt SC. Revision of the infant temperament questionnaire. Pediatrics. 1978;61(5):735–9. pmid:662513
  33. Gartstein MA, Carranza JA, González-Salinas C, Ato E, Galián MD, Erickson NL, et al. Cross-Cultural Comparisons of Infant Fear: A Multi-Method Study in Spain and the United States. J Cross Cult Psychol. 2016;47(9):1178–93.
  34. Hsu C, Soong W, Stigler JW, Hong C, Liang C. The temperamental characteristics of Chinese babies. Child Dev. 1981;52(4):1337–40. pmid:7318527
  35. Montirosso R, Cozzi P, Putnam SP, Gartstein MA, Borgatti R. Studying cross-cultural differences in temperament in the first year of life: United States and Italy. Int J Behav Dev. 2011;35:27–37.
  36. Gaias LM, Räikkönen K, Komsi N, Gartstein MA, Fisher PA, Putnam SP. Cross-cultural temperamental differences in infants, children, and adults in the United States of America and Finland. Scand J Psychol. 2012;53:119–28. pmid:22428997
  37. Gilmore JH, Shi F, Woolson SL, Knickmeyer RC, Short SJ, Lin W, et al. Longitudinal development of cortical and subcortical gray matter from birth to 2 years. Cereb Cortex. 2012;22(11):2478–85. pmid:22109543
  38. Knickmeyer RC, Gouttard S, Kang C, Evans D, Wilber K, Smith JK, et al. A structural MRI study of human brain development from birth to 2 years. J Neurosci. 2008;28(47):12176–82. pmid:19020011
  39. Gartstein MA, Slobodskaya HR, Kinsht IA. Cross-cultural differences in temperament in the first year of life: United States of America (US) and Russia. Int J Behav Dev. 2003;27(4):316–28.
  40. Gartstein MA, Knyazev GG, Slobodskaya HR. Cross-cultural differences in the structure of infant temperament: United States of America (U.S.) and Russia. Infant Behav Dev. 2005;28(1):54–61.
  41. Parade SH, Leerkes EM. The reliability and validity of the Infant Behavior Questionnaire-Revised. Infant Behav Dev. 2008;31:637–46. pmid:18804873
  42. Gartstein MA, Bateman AE. Early manifestations of childhood depression: Influences of infant temperament and parental depressive symptoms. Infant Child Dev. 2008;17(3):223–48.
  43. Gartstein MA, Bridgett DJ, Rothbart MK, Robertson C, Iddins E, Ramsay K, et al. A latent growth examination of fear development in infancy: Contributions of maternal depression and the risk for toddler anxiety. Dev Psychol. 2010;46(3). pmid:20053005
  44. Gartstein MA, Marmion J. Fear and positive affectivity in infancy: Convergence/discrepancy between parent-report and laboratory-based indicators. Infant Behav Dev. 2008;31(2):227–38. pmid:18082892
  45. Prasad AM, Iverson LR, Liaw A. Newer classification and regression tree techniques: Bagging and random forests for ecological prediction. Ecosystems. 2006;9(2):181–99.
  46. Kotsiantis SB, Zaharakis ID, Pintelas PE. Supervised machine learning: A review of classification techniques general issues of supervised learning algorithms. Inform. 2007;31:249–68.
  47. Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123–40.
  48. Ho TK. Random decision forest. In: Proceedings of the 3rd International Conference on Document Analysis and Recognition. Montreal; 1995. p. 278–82.
  49. Ben-David S, Blitzer J, Crammer K, Kulesza A, Pereira F, Vaughan JW. A theory of learning from different domains. Mach Learn. 2010;79(1–2):151–75.
  50. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36. pmid:7063747
  51. Bamber D. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J Math Psychol. 1975;12(4):387–415.
  52. Olino TM, Durbin CE, Klein DN, Hayden EP, Dyson MW. Gender differences in young children’s temperament traits: Comparisons across observational and parent-report methods. J Pers. 2013;81:119–29. pmid:22924826
  53. Ruble DN, Martin CL, Berenbaum SA. Gender development. In: Eisenberg N, Damon W, Lerner RM, editors. Handbook of Child Psychology: Social, Emotional, and Personality Development. New York, NY: Wiley; 2006. p. 858–932.
  54. Else-Quest NM, Hyde JS, Goldsmith HH, Van Hulle CA. Gender differences in temperament: A meta-analysis. Psychol Bull. 2006;33–72. pmid:16435957
  55. Golombok S, Fivush R. Gender development. New York, NY: Cambridge University Press; 2000.
  56. Lewis M. State as an infant-environment interaction: An analysis of mother-infant behavior as a function of SEX1. ETS Res Bull Ser. 1971;1971(1):i–56.
  57. Lovas GS. Gender and patterns of emotional availability in mother-toddler and father-toddler dyads. Infant Ment Health J. 2005;26(4):327–53. pmid:28682464
  58. Miller FG, Gluck JP, Wendler D. Debriefing and accountability in deceptive research. Debriefing Account Deceptive Res. 2012;18(3):235–51.
  59. Schoppe-Sullivan SJ, Diener ML, Mangelsdorf SC, Brown GL, McHale JL, Frosch CA. Attachment and sensitivity in family context: The roles of parent and infant gender. Infant Child Dev. 2006;15(4):367–85.
  60. Biringen Z, Robinson JAL, Emde RN. Maternal sensitivity in the second year: Gender‐based relations in the dyadic balance of control. Am J Orthopsychiatry. 1994;64(1):78–90. pmid:8147430
  61. Chodorow N. The Reproduction of Mothering: Psychoanalysis and the Sociology of Gender. The Reproduction of Mothering. New York: University of California Press; 1978.
  62. Mascaro JS, Rentscher KE, Hackett PD, Mehl MR, Rilling JK. Child gender influences paternal behavior, language, and brain function. Behav Neurosci. 2017;131(3):262–73. pmid:28541079
  63. Miller SA. Parents’ beliefs about their children’s cognitive abilities. Dev Psychol. 1986;22(2):259–85.
  64. Martel MM, Klump K, Nigg JT, Breedlove SM, Sisk CL. Potential hormonal mechanisms of Attention-Deficit/Hyperactivity Disorder and Major Depressive Disorder: A new perspective. Vol. 55, Hormones and Behavior. 2009. p. 465–79. pmid:19265696
  65. Crick NR, Zahn-Waxler C. The development of psychopathology in females and males: Current progress and future challenges. Dev Psychopathol. 2003;15(3):719–42. pmid:14582938
  66. Hines M, Constantinescu M, Spencer D. Early androgen exposure and human gender development. Vol. 6, Biology of Sex Differences. 2015. pmid:25745554
  67. Goldstein JM, Seidman LJ, Horton NJ, Makris N, Kennedy DN, Caviness VS, et al. Normal sexual dimorphism of the adult human brain assessed by in vivo magnetic resonance imaging. Cereb Cortex. 2001;11(6):490–7. pmid:11375910
  68. Hines M. Prenatal endocrine influences on sexual orientation and on sexually differentiated childhood behavior. Vol. 32, Frontiers in Neuroendocrinology. 2011. p. 170–82. pmid:21333673
  69. Nicodemus KK, Malley JD. Predictor correlation impacts machine learning algorithms: Implications for genomic studies. Bioinformatics. 2009;25(15):1884–90. pmid:19460890