Exploring the Factor Structure of Neurocognitive Measures in Older Individuals

Here we focus on factor analysis from a best-practices standpoint, investigating the factor structure of neuropsychological tests and using the results to illustrate how to choose a reasonable solution. The sample (n = 1051 individuals) was randomly divided into two groups: one for exploratory factor analysis (EFA) and principal component analysis (PCA), to investigate the number of factors underlying the neurocognitive variables; the second to test the “best fit” model via confirmatory factor analysis (CFA). For the exploratory step, three extraction methods (maximum likelihood, principal axis factoring and principal components) and two rotation methods (orthogonal and oblique) were used. This methodology allowed us to explore how different cognitive/psychological tests correlated with and discriminated between dimensions, indicating that, to capture latent structures in similar sample sizes and measures with approximately normally distributed data, reflective models with oblimin rotation may prove the most adequate.


Introduction
Factor analysis, which was first introduced to analyze data from large numbers of psychological tests [1], is an important technique that provides a means of data reduction to obtain an "orderly" simplification from a group of interrelated measures. It assumes that some variables of theoretical interest cannot be observed directly; rather, it is by exploring (modeling) relationships between the measurable/observable variables that information can be obtained on the underlying, smaller number of unobserved variables (also called latent variables or factors) [2]. There are two basic types of factor analysis: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA), with their applicability being based on the tests' intrinsic differences. EFA is at its core a "data-driven" (a posteriori) approach, while CFA is "theory-driven" (a priori) [3,4]. In broad terms, EFA and CFA constitute two discrete classes of factor analysis that aim to use a common factor model to represent relationships between observed variables with a minimum number of factors. EFA aims to explore data and to provide information regarding the number of factors that best fit the data, with each factor constituting a latent variable that underlies different variables. Although EFA is not primarily a hypothesis-testing tool, hypothesis tests of the resulting factor solution do exist (for example, the maximum likelihood method provides a chi-square goodness-of-fit test). In the presence of a theory, the use of CFA is preferable since it allows model fit to be quantified in multiple ways [5]. Altogether, the different methods of factor analysis first extract a set of factors from a data set, which are ordered according to the proportion of the variance of the original data they explain.
While a subset of factors is kept for further analysis, the others are considered either irrelevant or nonexistent (that is, they reflect measurement error or noise) [6]. Rotation of the factor axes (dimensions) identified in the initial extraction is then conducted to obtain simple, reliable and interpretable factors [7].
Still, despite the established need for and use of factor analysis methodologies, including in neurocognitive studies, the choice between different methods is often confusing and difficult for the researcher to whom statistical methodology is a "second language", making it hard to decide on the appropriate technique or even where to start. For instance, many investigations into the structure of individual differences theorize in terms of latent variables but rely heavily on PCA when analyzing the data. This is a debated practice because PCA is a data reduction technique, not a latent variable one, and it is a formative model rather than a reflective one (conceptualizing constructs as causally determined by the observations, not the other way around) [8].
This is of particular interest in cognitive ageing research, given that it is the researcher who has the responsibility of analyzing the cognitive data measured via neurocognitive/psychological test batteries. Often, test variables must be grouped as reflective of overall cognitive ability across cognitive domains. In particular, decisions must be made regarding the two broad cognitive dimensions that have emerged across studies as sensitive to age-related effects: memory and general executive function [9][10][11][12][13][14][15]. In fact, evidence on age-associated memory and executive cognitive changes is so well established that these dimensions might be considered the baseline against which other variables are analyzed [14]. In this context, and as recently reported, many studies lack transparency about methodological decisions, and often lack adequate reporting of the design, conduct and analysis of the experiments [16].
In this line, the goal here was to analyze the factor structure of neuropsychological measures with both exploratory and confirmatory methods, and to provide relevant support for methodological decisions.

Participants
Participants (n = 1051) were randomly selected from the Guimarães and Vizela local area health authority registries. The cohort was representative of the general Portuguese population with respect to age and gender. All participants resided in the community (community-dwellers); the majority were retired (n = 763, females 51.8%) and in the medium socio-economic stratum (61.6%, females 47.3%; Graffar measure [17]). Sample characteristics are presented in Table 1.
The study was conducted in accordance with the Declaration of Helsinki (59th Amendment) and was approved by the national ethical committee (Comissão Nacional de Protecção de Dados) and by the local ethics review boards (Hospital Escola Braga, Braga; Centro Hospitalar do Alto Ave, Guimarães; and Unidade Local de Saúde do Alto Minho, Viana-do-Castelo/Ponte-de-Lima). The study goals and the neurocognitive evaluation were explained to potential participants, and all volunteers provided written informed consent.

Neurocognitive assessment
Tests were selected to provide cognitive profiles (general cognitive status plus executive and memory functions). A team of trained psychologists conducted the cognitive/psychological evaluations, which comprised the following instruments: the Mini-Mental State Examination (MMSE) [18], the most widely used cognitive mental-status screening test, which assesses orientation, word recall, attention and calculation, language, and visual-construction abilities; the Digit Span test [19][20][21][22], used as a measure of short-term memory, working memory and attention; the Stroop test [23], to test the ability to resist interference and to assess cognitive flexibility and inhibitory control; the Selective Reminding Test (SRT) [24], to evaluate verbal learning and memory through the parameters long-term storage (LTS), consistent long-term retrieval (CLTR) and delayed recall; and the Controlled Oral Word Association Test (COWAT-FAS) [25], a measure of verbal fluency. All neurocognitive test scores were converted into z-scores to express all variables on the same scale.
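As a minimal sketch of the z-score conversion (the scores below are hypothetical, not the study data):

```python
import numpy as np

# Hypothetical raw scores for a single test across participants
raw = np.array([22.0, 27.0, 30.0, 25.0, 26.0])

# Standardize: subtract the sample mean and divide by the sample SD,
# so that every test is expressed on the same scale
z = (raw - raw.mean()) / raw.std(ddof=1)
```

After this transformation each variable has mean 0 and unit standard deviation, so loadings and cut-offs are comparable across tests.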

Exclusion criteria
Participants who met the established MMSE criteria for cognitive impairment were excluded from the sample (n = 3, 0.3%) [26]. Furthermore, following a very conservative approach, individuals with one (n = 31, 2.9%) or more (n = 336, 32.0%) missing values in the neuropsychological test battery were also excluded [fulfilling the requirements for an appropriate strategy according to Rubin (1976)]. The remaining participants (n = 684) were equally allocated at random into two groups: one for EFA and PCA, to investigate the number of factors underlying the neuropsychological variables (termed "EFA/PCA"); the second to test the "best fit" model via CFA (termed "CFA"). The sample sizes were appropriate for the described statistical procedures [27], and Stevens' [28] recommendations were met (5-20 participants per scale item). Replicability was addressed via internal replication (splitting a single data set into two samples by random assignment), corresponding to a calibration/validation sample strategy. Both samples remained representative of the initial study population for all the socio-demographic measures considered, except literacy rate (99.4% able to read and write in both groups).
Subsequently, participants were considered outliers if a |z-score| > 4 was obtained in any of the neurocognitive variables (n = 9 participants in each sample, representing 2.6% of each sample). A conservative approach was followed; to obtain variables with an approximately normal distribution, skewness and kurtosis values were also considered in this decision (|skewness| < 3 and |kurtosis| < 8), following Kline's [29] reference values. In EFA several factor extraction methods are used, but some controversy persists. Whereas some argue for severely restricted use of components analysis in favor of a true factor analysis method, others argue that there is almost no difference between them, or even that the former is preferable (see references for the different views in review, [30]). Briefly, as supported by Fabrigar and colleagues [31], maximum likelihood (ML) [32] and principal axis factoring (PAF) yield the best results. Specifically, if data are relatively normally distributed, ML is the best choice, while if the assumption of multivariate normality is "severely violated", principal factor methods are recommended. Despite significant controversy in the field over the equivalence between factor analysis and PCA, PCA remains a highly popular technique, and its use has been supported based on the similarity in results between the techniques and gains in information [31]. Two main types of rotation are used: orthogonal and oblique, assuming that the factors are uncorrelated or correlated, respectively. Most of the rationale for rotating factors comes from Thurstone [33] and Cattell [34], who argued for simple structure (a goal of rotation methodology if the aim is to measure a single construct) [35]. It is, therefore, important to note that if one allows for the possibility that an instrument measures multiple constructs, simple structure would not be the goal [36].
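The outlier and distribution screening described above (|z| > 4 exclusion, then skewness/kurtosis checks against Kline's reference values) can be sketched with simulated data; the thresholds are the ones stated in the text, everything else is illustrative:

```python
import numpy as np
from scipy import stats

# Simulated z-scored battery (300 participants x 10 variables), with
# one planted extreme value -- illustrative only, not the study data
rng = np.random.default_rng(0)
scores = rng.normal(size=(300, 10))
scores[0, 0] = 6.0

# Exclude any participant with |z| > 4 on any variable
keep = (np.abs(scores) <= 4).all(axis=1)
clean = scores[keep]

# Screen each remaining variable against Kline's reference values
ok_skew = np.abs(stats.skew(clean, axis=0)) < 3
ok_kurt = np.abs(stats.kurtosis(clean, axis=0)) < 8  # excess kurtosis
```

Variables failing either check would be candidates for transformation or exclusion before a normal-theory extraction method such as ML is applied.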
An orthogonal rotation is specified by a rotation matrix in which the rows stand for the original factors, the columns for the new (rotated) factors, and each entry gives the cosine of the angle between the original axis and the new one. Four rotation methods are listed by Gorsuch [37]. The most popular orthogonal rotation technique is varimax [38], in which a linear combination is sought such that the variance of the squared loadings is maximized. In oblique rotations the new axes can take any position in the factor space (both orthogonal and oblique rotations are performed in a "subspace" referred to as the factor space); however, the degree of correlation allowed among factors is normally small, because two highly correlated factors are better interpreted as a single factor. Gorsuch [37] lists 15 different oblique rotation methods, which are almost always interpreted by looking at the correlations between the rotated axes and the original variables, read as loadings. An oblique rotation yields two different matrices, both of which can (and should) be used for interpretation: the structure matrix, which holds the correlations between each variable and each factor (as with orthogonal rotations); and the pattern matrix (factor loadings), which holds the beta weights to reproduce variable scores from factor scores. Among oblique rotations, direct oblimin [37,39] is the most generally used method. This rotation allows the researcher to define the magnitude of the correlation, which can be particularly useful when there is theoretically based knowledge concerning the degree of correlation between factors.
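For illustration, a minimal numpy implementation of the varimax criterion (maximizing the variance of squared loadings via successive SVD-based rotations; a sketch of the standard algorithm, not the SPSS routine) might look like:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of an (n_variables x k_factors)
    loading matrix; returns the rotated loadings and rotation matrix."""
    L = np.asarray(loadings, dtype=float)
    n, k = L.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / n)
        )
        R = u @ vt  # always orthogonal
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ R, R

# Hypothetical 4-variable, 2-factor loading matrix
A = np.array([[0.7, 0.3],
              [0.6, 0.4],
              [0.2, 0.8],
              [0.1, 0.7]])
rotated, R = varimax(A)
```

Because R is orthogonal, the communalities (row sums of squared loadings) are unchanged by the rotation; only the distribution of loading across factors shifts toward simple structure.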
Given this information, for the present dataset the most appropriate extraction method was thought to be ML with oblimin rotation. However, to avoid a priori assumptions, three different dimension/component extraction methods were tested: ML, PAF and PCA. For each, two rotation methods were tested: orthogonal (varimax) and oblique (oblimin). In all methods, all neurocognitive measures were considered for the initial analysis. In the reflective models, the parameters SRT intrusions and COWAT F-A-S non-admissible were excluded due to very small initial communality values (.034 and .047, respectively) under both rotation methods. Furthermore, for the variable SRT intrusions, the measure of sampling adequacy based on the anti-image correlation was < .5 (.337); for all other measures the anti-image correlation values were > .5. In the formative model, under both rotation methods, the variable SRT intrusions was excluded due to an anti-image correlation < .5 (.337), followed by exclusion of the MMSE due to similar absolute loadings < .4 on two separate components.

Parallel analysis (PA)
Many criteria have been proposed to determine the number of factors to extract in EFA and PCA [40,41]. Among these, the parallel analysis (PA) method [42] is used to determine the threshold for significant components, variable loadings, and analytical statistics when decomposing a correlation matrix. Specifically, it is a modification of Cattell's scree diagram designed to ease the component indeterminacy problem [43], and it can be used to statistically verify the number of factors extracted in EFA and/or PCA. It requires that a set of random correlation matrices be generated using the same number of variables and participants as the experimental data used in the EFA and/or PCA procedure [41,43]. PA can use different methodologies to determine (or "confirm") the number of components to extract [44][45][46]. Some authors argue that the routine use of PA in multivariate ordination increases confidence in the results and reduces the subjective interpretation of supposedly objective methods, because it allows one to determine which variable loadings are significant for each component, thus parsimoniously simplifying structure and reducing the analysis of noise [6].
Here, the optimal number of components was determined based on the original eigenvalues (raw-data eigenvalues ≥ 1; Kaiser criterion) and subsequent comparison with the 99th percentile obtained using the O'Connor [47] procedure. To confirm the number of factors extracted in EFA and/or PCA, multiple PA (eigenvalue Monte Carlo simulation) strategies were used, including PC/PAF/common factor analysis with random normal data generation and raw-data permutation. The number of datasets considered was 2,000 for each strategy, at the 99th percentile. Other approaches could have been explored, such as the scree plot test proposed by Cattell [48] and Velicer's Minimum Average Partial (MAP) test [40]. In fact, Kaiser's eigenvalue-greater-than-one rule has deficiencies [49] and should be paired with a confirmatory approach, which was the strategy followed here by using PA. PA is one of the procedures with increasing consensus among statisticians as typically yielding the best-fitting solutions regarding the number of components/factors [47]. Furthermore, the Kaiser criterion and the scree plot criterion are similar in that both are "mechanical rules of thumb," while PA is statistically based.
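A minimal random-normal version of the eigenvalue Monte Carlo comparison (one of the strategies above; the O'Connor procedure also supports raw-data permutation) can be sketched as:

```python
import numpy as np

def parallel_analysis(data, n_sets=2000, percentile=99, seed=0):
    """Retain components whose observed correlation-matrix eigenvalue
    exceeds the given percentile of eigenvalues from same-sized
    random-normal data (random-normal PA sketch)."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sim = np.empty((n_sets, p))
    for i in range(n_sets):
        noise = rng.normal(size=(n, p))
        sim[i] = np.sort(np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False)))[::-1]
    threshold = np.percentile(sim, percentile, axis=0)
    return int((observed > threshold).sum()), observed, threshold

# Simulated 10-variable battery driven by two latent factors
# (illustrative data; sample size mimics one split-half group)
rng = np.random.default_rng(1)
factors = rng.normal(size=(342, 2))
weights = np.zeros((10, 2))
weights[:5, 0] = weights[5:, 1] = 0.8
battery = factors @ weights.T + 0.6 * rng.normal(size=(342, 10))
n_factors, _, _ = parallel_analysis(battery, n_sets=500)
```

With a clear two-factor structure, only the first two observed eigenvalues exceed the 99th-percentile random threshold, so PA retains two factors.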

Confirmatory factor analysis (CFA)
The most common estimation procedure in SEM is ML, which assumes multivariate normality (MVN) [50]. Failure to meet the MVN assumption can lead to an overestimation of the chi-square statistic and may undermine the assumptions inherent in several ancillary fit measures; still, a mild departure from MVN is acceptable (see full references on MVN assumptions in review, [50]). In CFA, once the model is estimated, model fit must be evaluated. Besides the chi-square goodness-of-fit test, other ancillary indices of global fit are used, including: the goodness-of-fit index (GFI) and adjusted goodness-of-fit index (AGFI), comparative fit index (CFI), Tucker-Lewis index (TLI), root-mean-square error of approximation (RMSEA), and standardized root-mean-square residual (SRMR).
CFA was applied using the ML estimation method. Fit statistics/indexes of the tested models without and with correlated errors were examined (Fit Model A and Fit Model B, respectively; the latter with correlations between the Stroop test measures). Correlations among errors were established according to modification indices (MI) higher than 11 (χ²(1) at the .999 level = 10.83). Although Arbuckle [51] suggests MI ≥ 4 (χ²(1) at the .95 level = 3.84), here MI ≥ 11 was used as a more conservative approach, balancing model saturation and goodness-of-fit measures. A rough way to gauge the size of an MI is to compare the chi-square difference relative to the overall chi-square value (i.e., by what percentage the chi-square will improve upon addition of the new parameter to the model). The MIs, should a variable be dropped or a path added, are estimated to derive the best indicators of latent variables prior to testing a structural model (part of the process together with factor loadings and unique variances). When modifications are made to the model after an initial analysis, or when multiple models are tested, different indexes should be used [52]. Fit Model A and B Cronbach α values were .802 and .897 for F1 and F2, respectively, representing good internal consistency; reliability coefficients of these magnitudes are within the range acceptable for psychological instruments [53,54]. Data imputation was performed using the Bayesian imputation method into a single output file (based on Model B).
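The chi-square quantiles behind these MI cut-offs can be reproduced directly (a quick check using scipy's chi-square quantile function):

```python
from scipy.stats import chi2

# Critical chi-square values with df = 1 behind the MI cut-offs in the text:
# Arbuckle's MI >= 4 corresponds to the .95 quantile, the stricter
# MI >= 11 corresponds to the .999 quantile
mi_arbuckle = chi2.ppf(0.95, df=1)   # about 3.84
mi_strict = chi2.ppf(0.999, df=1)    # about 10.83
```

Freeing a parameter with an MI above the chosen quantile is expected to reduce the model chi-square by at least that amount at the corresponding significance level.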

Construct reliability and validity
The reliability of the individual items (cognitive/psychological test variables) was assessed through regression weights (for all items λij ≥ 0.5, R² ≥ 0.25; following the Hair, Anderson [55] criteria), and the reliability of the construct through the composite reliability (CR) measure [56] (for both factors CRj ≥ 0.7, following the same criteria).
For construct validity, different strategies were used: factorial (CFA and correlation between constructs); convergent [average variance extracted (AVE)]; and discriminant [comparison of the shared variance (squared correlation) between the constructs against the average of their AVEs; AVEj ≥ squared correlation between factors, following Fornell and Larcker [56]].
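The composite reliability and AVE computations from standardized loadings can be sketched as follows (the loadings are hypothetical; the cut-offs are the criteria stated above):

```python
import numpy as np

def composite_reliability(lam):
    """Composite reliability from standardized loadings:
    (sum of loadings)^2 / [(sum of loadings)^2 + sum of error variances]."""
    lam = np.asarray(lam, dtype=float)
    errors = 1.0 - lam ** 2  # unique (error) variance per item
    return lam.sum() ** 2 / (lam.sum() ** 2 + errors.sum())

def average_variance_extracted(lam):
    """AVE: mean squared standardized loading."""
    lam = np.asarray(lam, dtype=float)
    return (lam ** 2).mean()

# Hypothetical standardized loadings for a three-item factor
lam = [0.80, 0.85, 0.90]
cr = composite_reliability(lam)        # criterion: CR  >= 0.7
ave = average_variance_extracted(lam)  # criterion: AVE >= 0.5
```

For the discriminant check, each factor's AVE would then be compared against the squared inter-factor correlation.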
Two multiple linear regressions were performed, considering the latent cognitive scores as dependent variables (after Bayesian imputation into a single output file and one completed dataset) and gender, age and school years (the latter two as quantitative) as predictors.

Statistical analysis
In sum, for the exploratory step (EFA/PCA), three extraction methods (ML, PAF and PC) and two rotation methods (orthogonal and oblique) were used. Parallel analysis (PA) was used to confirm the number of extracted factors (PC/PAF/common factor analysis with random normal data generation and raw-data permutation). The obtained model was confirmed using CFA (ML method; tests of significance and goodness-of-fit measures: chi-square, CFI, GFI, TLI, RMSEA and SRMR). Construct reliability (regression weights and the composite reliability measure, CR) and validity (factorial, convergent and discriminant) were determined. The SPSS package v20 (IBM SPSS Statistics) and the AMOS statistical package v21 [51] were used to conduct all statistical analyses.

Latent cognitive structure
Data analysis was structured as described in Fig 1, and descriptive statistics of the neurocognitive variables for the "EFA/PCA" and "CFA" groups are presented in Tables 2 and 3. In general, variables clustered in a similar manner across the three extraction methods (ML, PAF and PC) and the two rotation methodologies (oblique and orthogonal) (Table 4). Specifically, for ML, a final solution with two factors (F1 and F2) was obtained for both rotation methods. Sampling adequacy was met (KMO = .826) and there was a significant correlation between the variables (χ²(45) = 1341; p < 0.001). Cronbach α values were .797 and .885 for F1 and F2, respectively, representing good internal consistency. As expected, variables saturated differently in each of the ML rotation solutions. Regarding PAF, a final solution with two factors (F1 and F2) was also obtained with both rotation methods; sampling adequacy, Bartlett's sphericity and Cronbach α values were the same as those obtained with ML, and variables saturated differently in each of the PAF rotation solutions. Finally, in PCA a final solution with three components (F1, F2 and F3) was obtained with both rotation methods. Sampling adequacy was verified (KMO = .800) and a significant correlation between the variables was observed (χ²(45) = 1214; p < 0.001). Cronbach α values were .777 and .885 for F1 and F2, respectively, representing good internal consistency; the F3 Cronbach α value could not be calculated because that component comprised only one variable. As expected, variables saturated differently in each of the solutions. Regarding the oblimin rotation method, for both ML and PAF the observed correlation between factors 1 and 2 was .519; for PC, the correlation between components 1 and 2 was −.406, between components 1 and 3 was −.146, and between components 2 and 3 was .115.
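The internal-consistency figures above are Cronbach's α values; as a quick sketch of the computation (simulated items, not the study scores):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_participants, k_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Simulated four-item scale driven by a single latent trait (illustrative)
rng = np.random.default_rng(3)
trait = rng.normal(size=(500, 1))
items = trait + 0.5 * rng.normal(size=(500, 4))
alpha = cronbach_alpha(items)
```

With strongly intercorrelated items α approaches 1; with independent items it hovers near 0, which is what makes it a useful internal-consistency screen for factor groupings.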
Since the goal was to explore a latent cognitive structure, PC was excluded. Furthermore, because the two factors (dimensions) were expected to be correlated, the varimax solutions were also excluded. The ML and PAF results using the oblimin rotation methodology were similar. This follows the literature indicating that, although PAF is a better method for recovering weak factors and the ML estimator is asymptotically efficient, there is almost no evidence regarding which method should be preferred for different types of factor patterns and sample sizes [57].
Here, in accordance with the Kaiser criterion, the final PA solution yielded two factors in all methods, regardless of whether normally distributed random data generation or permutations of the raw data were used. Based on these results, the ML oblimin two-factor solution was considered the "best fit" EFA solution, particularly taking into consideration that it is a reflective model and is based on the same estimation method as the CFA methodology. Overall, this agrees with findings indicating that, although PAF is preferred for population solutions with few indicators per factor and for overextraction, ML is preferred in cases of unequal loadings within factors and for underextraction [57], which more closely resembles our data. Furthermore, the results are also in agreement with the literature indicating that in contexts such as these it is better to conduct oblique rotations [58]. In general, when it is not known whether the factors are related, it is safer to conduct an oblimin rotation because i) there is not necessarily a reason to assume that the factors are independent, and ii) this rotation offers the advantage of estimating the factors' correlations [58]. Based on the neurocognitive variable groupings, the obtained factors were termed "GENEXEC" (general and executive function; Factor 1, Table 4) and "MEM" (memory; Factor 2, Table 4).

Construct reliability and validity
Regarding the regression weights, for MEM all items had λi ≥ 0.5 [the smallest λ observed was .762 (R² = .581), for SRT delayed recall], while for GENEXEC two regression weights presented values λi < 0.5 [Stroop words/colors, λ = .467 (R² = .218); Digits forward, λ = .484 (R² = .234)]. Construct reliability was acceptable for the two factors (CR = .902 and .799 for MEM and GENEXEC, respectively) (Table 5). Regarding construct validity, for MEM the AVE = .755 and for GENEXEC the AVE = .368; that is, while the MEM factor presented acceptable scores (AVE ≥ 0.5), the GENEXEC factor did not fulfill the Hair, Anderson [55] criteria. This result is influenced by the items with low regression weights (Stroop words/colors and Digits forward); still, these items were kept in the analysis given that the other reliability and adjustment criteria were, overall, fulfilled. Both factors, MEM and GENEXEC, presented AVE values higher than the squared correlation between the factors (R² = 0.335), meeting the Fornell and Larcker [56] discriminant criterion (Table 5). It should be noted that when the average variance extracted is below 0.5, it is hard to distinguish variance due to the construct from variance due to measurement error.
Two multiple linear regressions were performed, considering the latent cognitive scores as dependent variables and gender, age and school years as predictors (Table 6). Considering that age and school years are established as important variables in cognitive ageing, this was considered a good means of results (model) validation. Both regressions were significant [for MEM: F(3,329) = 24, p < .001, adjusted R² = .172; for GENEXEC: F(3,329) = 48, p < .001, adjusted R² = .299], meaning that the three predictors explained 17.2% of the MEM latent score and 29.9% of the GENEXEC latent score. Age and school years were significant predictors in both regression models. Age was negatively related to the MEM (β = -0.283, p < .001) and GENEXEC (β = -0.260, p < .001) latent scores (controlling for gender and school years). School years was also significant in both regression models, although with a higher impact on the GENEXEC latent score (β = 0.383, p < .001) than on MEM (β = 0.254, p < .001). Gender was a significant predictor only for GENEXEC (β = 0.154, p < .001), with males scoring higher in this cognitive domain. Table 7 provides z-score descriptive statistics, correlations and variance/covariance matrices for the variables in the model. The correlation matrices for the variables in the various datasets, split by the exploratory and confirmatory samples, are reported in Table 8.
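The validation regressions can be sketched with simulated data whose effects run in the reported directions (all names and numbers below are illustrative, not the study estimates):

```python
import numpy as np

# Simulated predictors (hypothetical distributions)
rng = np.random.default_rng(5)
n = 333
age = rng.normal(70, 8, n)                    # years
school = rng.normal(4, 3, n)                  # years of schooling
gender = rng.integers(0, 2, n).astype(float)  # hypothetical coding: 1 = male

# Simulated latent cognitive score: negative age effect, positive
# education effect, small male advantage (directions as reported)
latent = -0.03 * age + 0.10 * school + 0.30 * gender + rng.normal(0, 1, n)

# Ordinary least squares with an intercept
X = np.column_stack([np.ones(n), gender, age, school])
beta, *_ = np.linalg.lstsq(X, latent, rcond=None)

residuals = latent - X @ beta
r_squared = 1 - residuals.var() / latent.var()
```

With enough signal, the fitted coefficients recover the simulated directions (negative for age, positive for schooling), mirroring the pattern reported for the study's latent scores.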

Discussion
Here, we investigated the factor structure of neuropsychological measures. All cognitive evaluations were conducted using validated neurocognitive/psychological instruments (selected to evaluate underlying latent cognitive constructs). Taking into consideration that the analysis and findings are based on the specific sample and measures (and may not necessarily generalize to other studies), in summary, among the explored EFA/PCA models, the EFA maximum likelihood strategy using oblique rotation proved the most adequate for the dataset, yielding a two-factor solution: "MEM" (memory) and "GENEXEC" (general and executive function). Parallel analysis (PA) confirmed the same number of extracted factors. The obtained model was then confirmed using CFA in a separate sample from the same original cohort, with the results indicating a satisfactory fit level for a correlated-errors model (Fit Model B). Largely, internal consistency and reliability coefficients were within an acceptable range. Altogether, the data analysis indicated that the obtained factors presented a moderate positive correlation with each other, which is not unexpected from a neurobiological/functional and/or structural point of view. For example, executive and memory functions both involve abilities such as verbal fluency, set shifting ("rule change"), suppression ("interference"), temporal order, source memory, frequency estimation, working memory, free recall and recognition, which can be attributed to frontal lobe activity [59,60]. However, it is also expected that the neurocognitive variables considered would load differently among the factors, following their inherent "design" purpose (indeed, this is why they were selected for the cognitive assessment).
Specifically, the variables MMSE, Digits forward, Digits backwards, Stroop words, Stroop colors, Stroop words/colors and COWAT FAS admissible saturated into the "GENEXEC" factor; while, "MEM" was composed of the variables SRT LTS, SRT CLTR and SRT delayed recall.

Executive function refers to a variety of higher cognitive processes that modulate and use information to produce behavior [19,21], including initiation or intention of action, planning, working memory, and attention. Still, the results remain interesting. For instance, the MMSE did not fall "clearly" into one of the two dimensions. As such, a confirmatory solution (CFA model) was tested in which the MMSE was considered a manifest variable of both cognitive dimensions; however, the regression weight was not significant (standardized regression weight: .105, p = .128; CFA model not shown; no significant differences between models based on the chi-square difference statistic, p = .134). Regarding the SRT tasks, it is further interesting that the parameter SRT delayed recall presented the lowest score in its factor grouping, which might be rooted in the SRT tasks' design characteristics (and, therefore, in the memory functions evoked by each). That is, while during the first five trials the participant has to find an efficient strategy to recall the presented words in a short time period ("short-term storage"), in the sixth trial the participant must recall items now in "long-term storage" (SRT delayed recall). The individual will be more efficient at remembering the words if, throughout the initial trials, he/she found a strategy based not only on memory but also on some type of internal structure and/or association between the words and/or their meanings, i.e., a "mnemonic" strategy that carries over to the delayed recall task.
Concerning the "GENEXEC" factor, some considerations are warranted regarding the Stroop variables and their associated errors, which follow similar well-established findings in the literature. It is postulated that older subjects have fewer resources available to initiate efficient inhibitory processes, either in divided attention tasks or when the task requires maintaining ("storing") a large amount of information in working memory [61]. In this line, two separable control processes are deemed necessary in the Stroop task(s): goal maintenance (the ability to maintain the appropriate task set, such as responding to "color" and ignoring "word" across trials); and response competition (the ease with which people can select between appropriate and inappropriate competing response tendencies) [62]. As such, a loss in goal maintenance can result in rapid-response errors in which the person simply responds to the word without any influence from the potentially competing color [62]. These aspects can justify the presence of the Stroop-related errors observed in the CFA model. The efficiency with which the brain filters relevant stimuli from irrelevant noise can be gauged by "interference" effects in the performance of classic selective attention/response conflict paradigms, such as the Stroop naming tasks [63]. These interference effects are considered to increase with age due to selective attention impairment, although some authors have suggested that this could also result from a general slowing of information processing, since the effect disappears when processing speed is controlled for statistically [64]. Regarding model modifications, a further consideration is warranted: the possibility that modifications fit chance characteristics of the sample data rather than aspects that generalize to other samples [65].
As such, the model with correlated errors, here the most appropriate, may not generalize to other datasets (and/or samples or populations). In fact, the correlated-error structure is not what emerged from the EFA; as such, Model Fit B is not a "pure" confirmatory model, but also contains an exploratory element (i.e., the correlation between the error terms).
As a measure of construct reliability and validity, multiple linear regressions were performed, with the latent cognitive scores as dependent variables and gender, age, and school years as predictors (age and education are particularly well-established correlates of cognitive performance in the literature; for example, [66]). Specifically, from the socio-demographic perspective, the relations found here between the overall cognitive factors and education (higher education, better performance), age (cognitive performance decreases with age), and gender (males performing better in general executive tasks) are consistent with the literature. Furthermore, another relevant aspect is that the human frontal lobes are particularly vulnerable to age-related deterioration, and functional decline in ageing has been related to functions that largely depend on frontal regions [59]. For instance, normal ageing has been associated with difficulties suppressing irrelevant but highly activated responses [66]. This can be seen as a result of a decline in inhibitory abilities, as well as in the cognitive resources available to properly perform mental tasks [67]. Still, it should be mentioned that age sensitivity is also attributed to functions independent of frontal integrity [59]. Regarding gender, males here presented better performance in the GENEXEC cognitive domain, and no significant association was detected between gender and MEM. In fact, these results reflect what remains a matter of debate in the literature: the relations between "gender and executive function" and "gender and memory" in ageing studies. For this reason, gender itself should not yet be considered an adequate variable to attest to any type of validity.
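The validity check just described can be sketched as an ordinary least-squares regression of a latent cognitive score on age, education, and gender. The sketch below uses synthetic data with hypothetical effect sizes and distributions (it is not the study's data); it only illustrates the design matrix and the expected signs of the coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic socio-demographic predictors (hypothetical distributions)
age = rng.uniform(50, 85, n)                   # years
education = rng.uniform(0, 18, n)              # school years
gender = rng.integers(0, 2, n).astype(float)   # 0 = female, 1 = male

# Hypothetical latent score: declines with age, improves with education,
# small male advantage, plus Gaussian noise
latent = -0.03 * age + 0.08 * education + 0.20 * gender + rng.normal(0, 0.5, n)

# OLS fit: latent ~ intercept + age + education + gender
X = np.column_stack([np.ones(n), age, education, gender])
beta, *_ = np.linalg.lstsq(X, latent, rcond=None)
# beta[1] should come out negative (age) and beta[2] positive (education),
# mirroring the directions reported in the text.
```

In the study itself, the dependent variables were the latent factor scores (MEM and GENEXEC) rather than a simulated score.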
Of note, the factor structure of neuropsychological measures was already explored in a previous paper by the group [68]. As such, good background knowledge of the properties of, and relationships between, the test measures and the factors (MEM and GENEXEC) was already available. Nonetheless, a few observations should be addressed: the previous analysis was based on a smaller cohort, and its overall aim (PCA) was to reduce information through a linear function while forcing the components to be uncorrelated (varimax). The present study covers different exploratory methods of aggregating variables (formative: PCA; reflective: ML and PAF) and, through CFA, confirms the structural model found in a separate sample. Hence, the aim was also to capture a latent structure based on the cognitive/psychological tests and to confirm and validate the obtained latent structure. Additionally, a second-order latent variable was also tested, since the .58 correlation obtained between MEM and GENEXEC possibly suggests a second-order latent dimension. However, because only two first-order latent dimensions were present and multicollinearity concerns should not be dismissed, this option was not further explored (data not shown); in any case, the suitably restricted second-order model is just a reparameterization of the correlated-factor model. Still, this possibly indicates that other cognitive domains could be explored. In fact, preliminary data by the group indicate that measures of higher processing and/or measures related to cognitive reserve (such as the Digit Symbol Substitution Test, DSST) may be of particular interest.
Finally, although EFA is a widely used technique, it has inherent problems, mainly at the level of replication; debates therefore remain over, for instance, the sample size, the extraction and rotation techniques to use, the number of factors to extract, and whether the results of an EFA can be used in a "confirmatory" fashion [69]. As such, great care should be taken in generalizing any results.
In summary, we have tried to exemplify here that, even taking into account the "vastness" of the field of factor analysis methodology and terminology, it is possible to follow a statistically "transparent" strategy to obtain the best possible model for the data at hand. Briefly, a reflective approach should be conducted to extract a latent construct, whereas a formative approach would be more appropriate for obtaining an index. Still, although the present results indicate that, to capture latent structures in neurocognitive data, reflective models with oblimin rotation might prove the most adequate methodology, care should be taken regarding the measurement needs of each dataset, sample, or population. As such, the results warrant further evaluation using a new dataset, and a "true" method comparison would benefit from a simulation study in which varying conditions are introduced.
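As a final illustration of the exploratory step, the factor-retention decision can be sketched in NumPy: simulate data with two reflective latent factors and count the eigenvalues of the correlation matrix that exceed 1 (the Kaiser criterion, one of several retention rules). The data and loading values below are synthetic assumptions, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Two uncorrelated latent factors, three indicators each (loading = .8),
# so each indicator has variance 0.64 + 0.36 = 1
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)
loading, noise_sd = 0.8, 0.6
X = np.column_stack(
    [loading * f1 + noise_sd * rng.normal(size=n) for _ in range(3)]
    + [loading * f2 + noise_sd * rng.normal(size=n) for _ in range(3)]
)

# Kaiser criterion: retain factors whose correlation-matrix eigenvalue > 1
eigenvalues = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
n_factors = int(np.sum(eigenvalues > 1))
# With this synthetic two-factor structure, two eigenvalues exceed 1
```

In practice, eigenvalue-based rules are usually complemented by scree plots, parallel analysis, and, as done in this study, the interpretability and confirmatory fit of the resulting solution.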