Working Memory, Reasoning, and Task Switching Training: Transfer Effects, Limitations, and Great Expectations?

Although some studies have shown that cognitive training can produce improvements in untrained cognitive domains (far transfer), many others fail to show these effects, especially when it comes to improving fluid intelligence. The current study was designed to overcome several limitations of previous training studies by incorporating training expectancy assessments, an active control group, and “Mind Frontiers,” a video game-based mobile program consisting of six adaptive, cognitively demanding training tasks that have been found to lead to increased scores in fluid intelligence (Gf) tests. We hypothesize that such integrated training may lead to broad improvements in cognitive abilities by targeting aspects of working memory, executive function, reasoning, and problem solving. Ninety participants completed 20 hour-and-a-half-long training sessions over four to five weeks; 45 played Mind Frontiers and 45 completed visual search and change detection tasks (active control). After training, the Mind Frontiers group improved in working memory n-back tests, a composite measure of perceptual speed, and a composite measure of reaction time in reasoning tests. No training-related improvements were found in reasoning accuracy or other working memory tests, nor in composite measures of episodic memory, selective attention, divided attention, and multi-tasking. Perceived self-improvement in the tested abilities did not differ between groups. A general expectancy difference in problem-solving was observed between groups, but this perceived benefit did not correlate with training-related improvement. In summary, although these findings provide modest evidence regarding the efficacy of an integrated cognitive training program, more research is needed to determine the utility of Mind Frontiers as a cognitive training tool.


Introduction
Cognitive training is not a new concept, despite the surge in "brain training" applications that capitalize on the marketability of programs informed by "neuroplasticity" research [1]. In any activity, prolonged experience or practice leads to proficiency in that specific process, or skilled behavior. More recently, there has been increased interest in developing training programs that lead to improvement in, or "transfer" to, a wider array of cognitive abilities or exercises beyond the specific task trained. In the psychology literature, this line of research is termed "cognitive training" [2][3][4] and is often associated with the goal of enhancing cognition or ameliorating the age-related decline of cognitive abilities such as working memory, reasoning, and fluid intelligence (Gf), abilities that have been shown to predict performance in academic and workplace settings [5][6][7]. Developmental researchers also employ computerized training programs in hopes of improving cognitive abilities in children [8][9][10][11][12][13], including those from disadvantaged backgrounds [14] and those with learning difficulties [15][16][17][18][19].
Improvements in reasoning/Gf have been found in several studies that employ working memory training [20,21], task switching training [22], and reasoning training [14,23], while improvements in working memory are primarily found in training studies that use working memory training tasks [9,17,20,24–28]. Although promising, several of these experiments, which were conducted on age groups ranging from children to older adults, suffer from methodological shortcomings such as small sample sizes, single tests of cognitive transfer, and the lack of a comparable active control group [29][30][31]. Training-related improvement from the dual n-back working memory paradigm, for example, has often not been replicated in other laboratories [32][33][34][35] (but see [36,37]). Recent meta-analyses and reviews differ in their conclusions on the benefit of working memory training and highlight the implications of the aforementioned methodological issues [38][39][40][41][42]. More broadly, computer-based training paradigms, from video games to laboratory-based regimens, yield improvement in the trained tasks but limited transfer to other related abilities, including those similar to the trained tasks [14,23,43–50]. Thus, although behavioral and neural changes can be observed from training, these changes have not been shown to translate consistently into meaningful improvements outside of the training paradigm.
Several studies employing a multiple-task training approach, often using more complex tasks or games, show promise in engendering transfer beyond the specific trained tasks [14,51–55] (but see [46,48,56,57]). To maximize training benefits in the current study, we employ working memory, reasoning, and task-switching training tasks similar to those previously mentioned, which have shown promise in enhancing working memory and reasoning/Gf, abilities that overlap substantially in the psychometric literature. We integrate six of these tasks into a mobile training platform called "Mind Frontiers," which modifies the surface features of the training tasks (i.e., their appearance) to unify them into a Wild West-themed game. All tasks were programmed to be adaptive in difficulty, and a scoring/reward system was added to the game to promote engagement for the duration of training, which consisted of 20 hour-and-a-half-long sessions, with each game played for approximately 12 minutes.
To better attribute any training-related improvements to the Mind Frontiers program, the current study employed an active control group that also involved interaction with a mobile device and multiple adaptive training games. For the active control group, we used visual and perceptual training tasks that have been shown to improve performance on the trained tasks themselves but not on working memory and reasoning/Gf tests. These included three variants of a visual search paradigm previously used as an active control task in a working memory training study [32] and three variants of a change detection task that was shown not to transfer to untrained tasks [58].
As expectancy effects are a significant issue in cognitive training studies, we used a questionnaire to assess perceived improvement and other biases that may contribute to a placebo training effect [29,30]. We also employed multiple transfer tests to allow analysis at the construct level and to better generalize findings to the level of cognitive abilities. We used a set of established measures from the Virginia Cognitive Aging Project Battery [59], which comprises tasks validated to assess key cognitive abilities including reasoning/Gf, episodic memory, and perceptual speed. In addition, we administered neurocognitive tests to ensure comprehensive assessment of the training effects, including multiple tests of working memory, selective attention, divided attention, and task switching.
Although improving reasoning/Gf is a main goal of the study, training with Mind Frontiers may also benefit related abilities, such as attentional control and perceptual or processing speed. As these abilities are often interrelated in the literature [59][60][61][62], we hypothesize that the Mind Frontiers group will also show improvements in the "lower-level abilities" of selective attention, divided attention, and perceptual speed, especially given the speeded, game-like implementation of the tasks. Furthermore, reasoning/Gf ability has been shown to be relatively stable in young adulthood [63][64][65], whereas other skills that are also recruited in reasoning/Gf games may be more malleable or sensitive to training.

Methods
Participants
Participants were recruited from the University of Illinois campus and Champaign-Urbana community through flyers and online postings advertising participation in a "cognitive training study." Pre-screening for demographic information (e.g., sex, education, English language proficiency) and game experience was administered using a survey completed over email. A few general game experience questions in the survey were embedded among other activity questions, including the Godin Leisure-Time Exercise Questionnaire [66]. More detailed information about game play experience, history, and habits was queried in a post-experiment survey. Upon passing pre-screening, participants completed a follow-up phone interview with an experimenter that assessed major medical conditions that may affect neurocognitive testing. Participants eligible for the study fulfilled at least the following major requirements: (1) between 18 and 30 years old, (2) 75% right-handedness according to the Edinburgh Handedness Inventory, (3) normal or corrected-to-normal vision and hearing, (4) no major medical or psychological conditions, (5) no non-removable metal in the body, and (6) no more than five hours per week of video game play in the last six months. All participants signed informed consent forms and completed experimental procedures approved by the University of Illinois Institutional Review Board. One hundred and two participants were recruited. Ninety participants completed the study and received compensation of $15/hour. Twelve individuals who dropped out or were disqualified from the study received $7.50/hour. Demographics are summarized in Table 1. More information about study procedures is available in S1 File.

Study Design
Participants completed three cognitive testing sessions and an MRI session before and after the training intervention. The MRI data are not presented in this paper. Assessments were completed in a fixed order. Participants were randomly assigned to the Mind Frontiers training group or the active control training group. They completed four to five training sessions per week for four to five weeks, for a total of 20 sessions; each session involved completing six cognitive training tasks (games) for approximately 12 minutes each. The task order was pseudo-randomized across sessions, with all subjects completing the same order in a given session. Following the training period, participants completed the same four testing sessions in reverse order. More details about the training protocol can be found in S1 File.

Training Protocol
All participants completed training on portable handheld devices. After the first, tenth and last training sessions, participants completed a training feedback questionnaire electronically.

Mind Frontiers
The Mind Frontiers group completed six adaptive training tasks (Table 2 and Fig 1) in each training session. All games were programmed by Aptima, Inc. using the Unity game engine and were administered on the Samsung Google Nexus 10 tablet. Table 2 provides a summary of each game and its source in the previous literature. These games were selected based on their known associations (psychometric properties, training-related improvements) with the following abilities: reasoning/Gf, working memory, visuospatial reasoning, inductive reasoning, and task switching.

Active Control
The active control group also completed six adaptive training tasks in each training session (Table 2 and Fig 2). These included three variants of a visual search task and three variants of a change detection task. The visual search paradigm was derived from Redick et al. [32] and has been shown to overlap little (i.e., to correlate weakly) with the working memory, reasoning, and task-switching abilities trained in Mind Frontiers [67,68]. The change detection paradigm was obtained from Gaspar et al. [58]. Like the Mind Frontiers group, the active control group completed the tasks on a portable device, the Asus Vivotab RT. The visual search tasks were programmed in E-prime 2.0 [69] and the change detection tasks were programmed in MATLAB (MathWorks™) using the Psychophysics Toolbox extensions [70,71].

Training Feedback Questionnaire
At the end of the first, tenth, and twentieth sessions, all participants were asked the following questions about each training game and were instructed to respond on a scale of 1-10: 1) How much did you enjoy/like each game? (1 = did not enjoy/like at all, 10 = enjoyed a lot), 2) How engaging was each game? (1 = least, 10 = greatest), 3) How demanding/effortful was each game? (1 = least, 10 = greatest), 4) How motivated were you to achieve the highest possible score on each game? (1 = least, 10 = greatest), and 5) How frustrating did you find the game? (1 = not at all frustrating, 10 = very frustrating).

Cognitive Assessment Protocol
Before and after the 20 training sessions, participants completed a battery of tests and questionnaires to assess baseline cognitive function and any changes that may have resulted from training.
The tests measured a variety of cognitive abilities, including reasoning/Gf, episodic memory, perceptual speed, working memory, and attention (Table 3). Participants also completed questionnaires regarding sleep, personality, fitness, and media usage. Following the final testing session, participants completed a post-experiment survey that assessed their feedback on the cognitive training games, the strategies employed during training, gaming experience, and expectations. Most of the transfer tasks have been used extensively in the cognitive psychology literature (Table 3), so only brief descriptions are provided. More details about each task can be found in S1 File.

Table 2. Training tasks (group, description, and source).
Mind Frontiers: Townspeople request items that belong to a certain category. There are five objects in each category, which correspond to stereotypical occupations of the "Wild West." Once the store is reached, the last item from each category must be selected. Difficulty level is manipulated by the number of requests and the number of categories.
Mind Frontiers: A 20-square grid is presented. Boxes of the grid light up in a random sequence. The sequence must be entered exactly. Difficulty is manipulated by the length of the sequence.
Mind Frontiers: Sentries lift their lanterns while saying a word of the phonetic alphabet. The current word spoken and lantern lifted is compared to the word spoken/lantern lifted n times previously. There may be an audio match, a visual match, an audio and visual match, or no match between the current sentry and the one who spoke n times ago. Difficulty is manipulated by how far back the comparison is (e.g., 1-back, 2-back, 3-back).
Safe Cracker (Mind Frontiers): Safe combinations are determined by completing the next item in a series. Series may be letter-, number-, or day/month-based, and all are governed by some pattern or rule that must be determined and applied to select the next item in the series. Difficulty is manipulated by the difficulty of the patterns and the number of problems to solve within the given time limit. Source: Inductive Reasoning [23].
Irrigator (reasoning; Mind Frontiers): Irrigation pipelines are built from a water source to wells using individual pieces of pipe. The pipe pieces available for building are randomly determined, highlighting the importance of planning and flexibly using the resources at hand. Difficulty is manipulated by the number of wells, the presence of obstructions, and the time limit.
Mind Frontiers: Items are presented that need to be sorted based on one of two binary criteria (for example, the item's category or the size of the image). The pattern in which to sort the items is presented at the beginning of the level. For instance, the pattern may be to alternate sorting by category and by size. Items are sorted by swiping them either to the right or to the left. Difficulty is manipulated by increasing the complexity of the sorting pattern. Source: Switching without external cues [22].
Visual Search (Active Control): A target is presented amidst distractors. The target must be identified, and the direction of the target indicated (right/left). Difficulty is manipulated by increasing the number and heterogeneity of distractors. Source: [32].
Change Detection (Active Control): A 3- or 5-item array of stimuli is displayed. After a brief static screen (interference), the array of stimuli is presented again with one stimulus changed. The item that changed must be selected. Difficulty is manipulated by the time available to observe the initial array. Source: [58].
doi:10.1371/journal.pone.0142169.t002

Reasoning, perceptual speed, episodic memory
Except for i-Position, the tests below were obtained from the Virginia Cognitive Aging Project Battery [59], and two different versions were used for pre- and post-testing, with the sequence counterbalanced across subjects.
Shipley Abstraction: Identify missing stimuli in a progressive sequence of letters, words, or numbers. Number of correctly answered items within five minutes is the primary measure.
Matrix Reasoning: Select the pattern that completes a missing space on a 3 x 3 grid. Number of correctly answered items is the primary measure. Reaction time on correct trials was also analyzed.
Paper Folding: Identify pattern of holes that results from a punch through folded paper. Number of correctly answered items is the primary measure. Reaction time on correct trials was also analyzed.
Spatial Relations: Identify the 3D object that would match a 2D pattern when folded. Number of correctly answered items is the primary measure. Reaction time on correct trials was also analyzed.
Form Boards: Choose shapes that will exactly fill a space. Number of correctly answered items is the primary measure.
Letter Sets: Determine which letter set is different from the other four. Number of correctly answered items is the primary measure. Reaction time on correct trials was also analyzed.
Digit Symbol Substitution: Write corresponding symbol for each digit using a coding table. The primary measure is number of correctly answered items within two minutes.
Pattern Comparison: Determine whether pairs of line patterns are the same or different. The primary measure is number of correctly answered items within 30 seconds, averaged across two sets of problems.
Letter Comparison: Determine whether pairs of letter strings are the same or different. The primary measure is number of correctly answered items within 30 seconds, averaged across two sets of problems.
Logical Memory: Listen to stories and recall them in detail. The primary measure is number of correctly recalled story details, summed across three story-tellings.
Paired Associates: Listen to word pairs and recall the second word in a pair. The primary measure is number of correctly recalled items.
i-Position: View an array of images on a computer screen and reproduce the positions of the images. Measures are proportion of swap errors (primary) and mean misplacement in pixels.

Working memory
Running Span: Recall the last n items presented in a letter list that ends unpredictably. The total number of items in perfectly recalled sets is the primary measure. We also analyzed the total number of items recalled in the correct serial order, regardless of whether the set was perfectly recalled.
Operation Span: Remember a sequence of letters while alternately performing arithmetic problems, then recall the sequence of letters. The total number of items in perfectly recalled sets is the primary measure. We also analyzed the total number of items recalled in the correct serial order, regardless of whether the set was perfectly recalled.
Symmetry Span: Remember a sequence of locations of squares within a matrix while alternately judging symmetry, then recall the order and locations of the sequence. The total number of items in perfectly recalled sets is the primary measure. We also analyzed the total number of items recalled in the correct serial order, regardless of whether the set was perfectly recalled.
Visual Short-Term Memory (VSTM): Detect a color change in an array of colored circles. Data were analyzed in terms of d-prime collapsed across set sizes (2, 4, 6, 8) and Cowan's k averaged across set sizes [105]. Each set size measure is reported in S2 File.
Single N-back: Determine whether the current letter presented matches the letter presented two or three items back. The primary measure, d-prime, was computed separately for the 2-back and 3-back conditions. Reaction times on correct trials were also analyzed.
Dual N-back (administered in the MRI scanner): Determine whether simultaneously presented auditory and visual stimuli match stimuli presented one, two, or three items ago. The primary measure, d-prime, was computed separately for the 2-back and 3-back conditions following procedures in [92]. Reaction times on correct trials were also analyzed.
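The span and signal-detection measures above can be illustrated with a minimal sketch. It assumes each span set is a (presented, recalled) pair scored by serial position, and that Cowan's k takes the standard single-probe form K = N(H − FA); the log-linear correction applied before the z-transform, the function names, and the example counts are all hypothetical and are not taken from the study's scoring scripts. The same `d_prime` helper would apply to target and non-target trials in the n-back tasks, although the dual n-back followed the procedures in [92], which may differ.

```python
from statistics import NormalDist, mean


def span_scores(trials):
    """Return (absolute, partial) span scores.

    trials: list of (presented, recalled) sequences, one per set.
    absolute: total items in sets recalled perfectly (the primary measure)
    partial: items recalled in their correct serial position, in any set
    """
    absolute = partial = 0
    for presented, recalled in trials:
        correct = sum(
            1 for i, item in enumerate(presented)
            if i < len(recalled) and recalled[i] == item
        )
        partial += correct
        if correct == len(presented) == len(recalled):
            absolute += len(presented)
    return absolute, partial


def corrected_rate(count, total):
    # Log-linear correction (an assumption, not stated in the text):
    # keeps rates strictly between 0 and 1 so the z-transform stays finite.
    return (count + 0.5) / (total + 1)


def d_prime(hits, n_signal, false_alarms, n_noise):
    """d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(corrected_rate(hits, n_signal)) - z(corrected_rate(false_alarms, n_noise))


def cowan_k(set_size, hit_rate, fa_rate):
    """Cowan's K = N * (H - FA), assuming single-probe change detection."""
    return set_size * (hit_rate - fa_rate)


# Hypothetical VSTM counts per set size:
# (hits, change trials, false alarms, no-change trials)
vstm = {2: (19, 20, 1, 20), 4: (17, 20, 3, 20), 6: (14, 20, 5, 20), 8: (12, 20, 7, 20)}

# d' collapsed across set sizes: pool the counts, then compute once
pooled = [sum(counts[i] for counts in vstm.values()) for i in range(4)]
vstm_d = d_prime(*pooled)

# Cowan's K averaged across set sizes (uncorrected rates)
vstm_k = mean(cowan_k(n, h / nc, fa / nn) for n, (h, nc, fa, nn) in vstm.items())

print(span_scores([("FKP", "FKP"), ("HJLR", "HLJR")]))  # (3, 5)
print(round(vstm_k, 3))  # 2.325
```

Pooling counts before computing d′ is one reading of "collapsed across set sizes"; averaging per-set-size d′ values would be the alternative.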

Divided attention, selective attention, multi-tasking
Trail Making: Connect numbered circles as quickly as possible by drawing a line between them in numerical order (Trails A), then connect numbered and lettered circles by drawing a line between them, alternating between numbers and letters in numerical/alphabetical order (Trails B). The difference between Trails B and Trails A completion times was the primary measure.
Attention Blink: Identify the white letter (target 1) in a sequence of rapidly presented black letters, and identify whether the white letter was followed by a black "X" (target 2). The attentional blink is calculated on trials where target 1 was accurately detected, as the difference in target 2 accuracy when detection is easiest (lag 8 after target 1) and when detection is most difficult (lag 2 after target 1).
Dodge: Avoid enemy missiles and destroy enemies by guiding the missiles into other enemies. Highest level reached within eight minutes of game play was analyzed.
Multi-source interference task (MSIT; administered in the MRI scanner): Identify the digit (1, 2, or 3) that differs from the other two in a three-digit number. The interference effect is derived by taking the difference between reaction times on incongruent and congruent trials. Only correct trials were analyzed.
Flanker: Indicate the direction (right or left) of the middle arrow, which was either flanked by two arrows on each side (incongruent with oppositely oriented arrows, or congruent with similarly oriented arrows) or two horizontal lines on each side (neutral trials, no arrow head). The flanker effect is derived by taking the difference between reaction times on incongruent and congruent trials. Only correct trials were analyzed.
Anti-Saccade: Identify a masked letter presented on the side opposite to, or the same side as, a cue. Accuracy on a block of anti-saccade trials is used as the primary measure.
Psychomotor Vigilance Task (PVT): Press a key as soon as zeros begin to count up. The mean of the slowest 20% of RTs (the slowest quintile) is used for analysis.
25 boxes (Number Search): Search for stimuli in a matrix and indicate the corresponding location on a blank matrix. The average score on levels with matrix rotation (levels 12-20) was analyzed.
Control tower: Search through arrays using different rules (primary task) while performing several distractor tasks. Performance on the primary task (average of symbol, letter, and number scores minus corresponding errors) was used as the main measure.
Task-Switch, Dual-Task paradigm (TSDT): Respond to simultaneously presented auditory and visual stimuli based on the cued task (auditory, visual, or both). Switch costs (the reaction time difference between switch and repeat trials, for single-task trials only) were analyzed separately for auditory and visual stimuli and averaged across both.
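Several of the attention measures above are difference scores. The sketch below shows how they could be computed from trial-level records; the tuple layouts, function names, rounding rule for the slowest quintile, and example data are hypothetical and only illustrate the definitions given in the text (the interference function covers both the Flanker and MSIT effects).

```python
from statistics import mean


def trails_difference(trails_a_sec, trails_b_sec):
    """Trail Making: Trails B completion time minus Trails A completion time."""
    return trails_b_sec - trails_a_sec


def attentional_blink(trials):
    """trials: list of (lag, t1_correct, t2_correct).

    T2 accuracy at lag 8 minus T2 accuracy at lag 2, using only trials
    on which target 1 was reported correctly.
    """
    def t2_accuracy(lag):
        t2 = [t2_ok for trial_lag, t1_ok, t2_ok in trials if trial_lag == lag and t1_ok]
        return sum(t2) / len(t2)
    return t2_accuracy(8) - t2_accuracy(2)


def interference_effect(trials):
    """Flanker or MSIT: mean correct RT (incongruent) minus mean correct RT (congruent).

    trials: list of (condition, correct, rt_ms); other conditions (e.g. neutral) are ignored.
    """
    def correct_rts(condition):
        return [rt for cond, ok, rt in trials if cond == condition and ok]
    return mean(correct_rts("incongruent")) - mean(correct_rts("congruent"))


def pvt_slowest_quintile(rts_ms):
    """Mean of the slowest 20% of reaction times."""
    k = max(1, len(rts_ms) // 5)  # rounding rule is an assumption
    return mean(sorted(rts_ms, reverse=True)[:k])


def switch_costs(trials):
    """trials: list of (modality, transition, n_tasks, rt_ms).

    Switch cost = mean RT on switch trials minus mean RT on repeat trials,
    single-task trials only (n_tasks == 1), per modality, then averaged.
    """
    def mean_rt(modality, transition):
        return mean(rt for m, t, n, rt in trials
                    if m == modality and t == transition and n == 1)
    costs = {m: mean_rt(m, "switch") - mean_rt(m, "repeat") for m in ("auditory", "visual")}
    costs["average"] = mean([costs["auditory"], costs["visual"]])
    return costs


# Hypothetical examples
ab = [(8, True, True)] * 3 + [(8, True, False)] + [(2, True, True)] \
    + [(2, True, False)] * 3 + [(2, False, True)]
flanker = [("incongruent", True, 520), ("incongruent", True, 540), ("incongruent", False, 900),
           ("congruent", True, 450), ("congruent", True, 470), ("neutral", True, 460)]
tsdt = [("auditory", "switch", 1, 700), ("auditory", "switch", 1, 720),
        ("auditory", "repeat", 1, 600), ("auditory", "repeat", 1, 620),
        ("visual", "switch", 1, 650), ("visual", "repeat", 1, 590),
        ("visual", "repeat", 1, 610), ("visual", "switch", 2, 900)]  # last: dual-task, excluded

print(trails_difference(25.0, 61.5))  # 36.5
print(attentional_blink(ab))  # 0.5
print(interference_effect(flanker))  # 70
print(pvt_slowest_quintile([250, 270, 240, 300, 410, 260, 280, 520, 255, 265]))  # 465
print(switch_costs(tsdt))  # {'auditory': 100, 'visual': 50, 'average': 75}
```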

Self-report instruments
Participants also completed questionnaires during the third session of pre-testing. These included the Big Five Inventory [106] and Grit Scale [107] to assess personality, the Karolinska Sleep Questionnaire [108] and Pittsburgh Sleep Quality Index [109] to gauge sleep quality, the Godin Leisure-Time Exercise Questionnaire [66] to estimate physical activity, several questions on height, weight, resting heart rate and physical activity to estimate cardiorespiratory fitness [110], and a Media Multitasking Index Questionnaire [111] to assess media usage. These questionnaires were also completed post-testing, but were not used for analyses. Analyses of whether these individual differences moderate training effects will be discussed in a separate publication.
Post-experiment questionnaire: Participants completed an online survey that assessed gaming experience prior to and during the study, as well as their experience in the study. They provided feedback about their enjoyment, effort, and difficulty in playing the training games. They also elaborated on strategies they developed while playing the games. Participants provided feedback on game experience, design, and ease of use, and offered their perspective on improvements to their daily life resulting from their participation in the study (perceived self-improvement questions), including: overall intelligence, short-term/working memory, long-term memory, sustained attention, divided attention, visuomotor coordination, perception/visual acuity, multi-tasking, problem-solving, reasoning, spatial visualization, academic performance, emotional regulation, and work/school productivity. The fourteen dimensions queried in the perceived self-improvement questions were also posed in terms of general expectancy or perceived potential benefit. Finally, the survey assessed prior knowledge of cognitive training literature.

Statistical Analyses
Training tasks: Practice effects: To examine improvement on the training tasks, we fit a linear mixed-effects model for each training task. In each of these models, the dependent variable was average level and the independent variable was session, coded as a linear contrast, with by-subject random intercepts and random slopes for session. The change detection task had two conditions (set sizes three and five), which we analyzed separately.
Training tasks: Composite scores: For each training task, we computed a gain score by taking the difference between average level on the last two sessions of training and average level on the first two sessions [12,112,113]. To obtain a measure of overall training gain, we standardized the gain score for each relevant task and averaged the resulting values.
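The composite training gain described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: levels are stored per task and subject as a list of session averages in order, gains are z-scored within each task using the sample standard deviation across the subjects passed in (whether groups were pooled for this step is not stated), and the task and subject names are hypothetical.

```python
from statistics import mean, stdev


def gain(levels):
    """Average level on the last two sessions minus average on the first two."""
    return mean(levels[-2:]) - mean(levels[:2])


def training_composite(levels_by_task):
    """levels_by_task: {task: {subject: [average level per session, in order]}}.

    Gain scores are z-scored within each task (across all subjects passed in),
    then averaged across tasks for each subject.
    """
    z_by_subject = {}
    for by_subject in levels_by_task.values():
        gains = {s: gain(levels) for s, levels in by_subject.items()}
        values = list(gains.values())
        m, sd = mean(values), stdev(values)
        for s, g in gains.items():
            z_by_subject.setdefault(s, []).append((g - m) / sd)
    return {s: mean(zs) for s, zs in z_by_subject.items()}


levels = {  # hypothetical data: 3 subjects, 2 tasks, 4 sessions (the study used 20)
    "task_A": {"s1": [1, 2, 3, 4], "s2": [1, 1, 2, 2], "s3": [2, 2, 2, 2]},
    "task_B": {"s1": [1, 1, 1, 1], "s2": [1, 2, 4, 5], "s3": [2, 3, 5, 6]},
}
composite = training_composite(levels)
print({s: round(v, 3) for s, v in composite.items()})  # {'s1': -0.077, 's2': 0.289, 's3': -0.211}
```

Standardizing before averaging keeps tasks with larger level ranges from dominating the composite.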
Training feedback: For each group, we averaged the training ratings across the six tasks and analyzed each dimension using a repeated-measures ANOVA with group as a between-subjects factor and training session as a within-subjects factor. We report results of the multivariate tests because not all analyses met the assumption of sphericity. We did not analyze the ratings for individual tasks but report their means in S2 File.
Transfer tests: Measures: Primary measures for each transfer test were determined using conventional analysis procedures (S1 File). When relevant, reaction times (RTs) were also analyzed as secondary measures. In the n-back paradigm, RTs typically show a pattern complementary to the accuracy effects [91,114], and each trial in both the n-back training and transfer tasks required a response within a short time interval. In addition, the two reasoning games in Mind Frontiers (Irrigator, Safe Cracker) emphasized speed, such that each level had to be completed within a limited period of time. Reasoning/Gf tests typically have a completion time limit, but speed is usually not stressed. As strategies developed over training may be reflected in post-test performance, we also analyzed RTs for each reasoning test to determine whether training had a unique or differential effect on this aspect of performance.
Transfer tests: Data quality and gain scores: If a participant scored more than three standard deviations from the mean on any measure (computed separately for pre- and post-test), their data were excluded from analysis of that test and its relevant composite score. This relatively liberal criterion was applied uniformly to the measures to ensure data quality. For the letter n-back and the VSTM only, however, this procedure identified three individuals with high d-prime values; these data points were not discarded. To reduce the influence of remaining extreme but usable values such as these, the data were then Winsorized: the mean and standard deviation were recomputed for the "cleaned" dataset (separately for pre- and post-test), and any value more than three standard deviations from the mean was replaced with the corresponding cut-off value (the value 3 SD above or 3 SD below the mean).
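The two-step screening described above (exclusion beyond 3 SD, then Winsorizing the remaining data at 3 SD of the recomputed mean) can be sketched for a single measure at a single time point. The function name and example values are hypothetical, the sample standard deviation is assumed, and the study's exception for high d-prime values in the letter n-back and VSTM is not modeled here.

```python
from statistics import mean, stdev


def clean_measure(values, cutoff=3.0):
    """Exclude values beyond `cutoff` SDs, then Winsorize the rest at `cutoff` SDs.

    Returns a list aligned with the input; excluded entries are None.
    Apply separately to pre-test and post-test scores.
    """
    # Step 1: exclusion relative to the full-sample mean and SD
    m, s = mean(values), stdev(values)
    kept = [v if abs(v - m) <= cutoff * s else None for v in values]

    # Step 2: recompute mean and SD on the cleaned data, then clip to the cut-offs
    remaining = [v for v in kept if v is not None]
    m2, s2 = mean(remaining), stdev(remaining)
    lo, hi = m2 - cutoff * s2, m2 + cutoff * s2
    return [None if v is None else min(max(v, lo), hi) for v in kept]


# Hypothetical scores for 20 subjects with one extreme value
values = [9.0, 10.0, 11.0] * 6 + [10.0, 100.0]
cleaned = clean_measure(values)
print(cleaned[-1])  # None
```

With small samples a single outlier cannot exceed 3 SD of the sample it inflates, which is one reason such criteria are usually applied to the full group rather than to subsets.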
For each measure to be analyzed at the construct level (more details in the next section), we computed a standardized gain score by taking the difference between post- and pre-test scores and dividing it by the standard deviation of the pre-test score (collapsed across groups). We also inspected gain score data quality using a more liberal criterion of four standard deviations from the mean gain score, and discarded two data points found in two subjects' PVT gain scores (extremely negative gain scores). The corresponding pre- and post-test PVT measures for these subjects were likewise excluded from the task-level analysis.
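The standardized gain score can be sketched in a few lines. The sketch assumes `pre` and `post` hold one score per subject from both groups pooled, uses the sample standard deviation, and folds in the sign reversal for reaction-time measures that the text applies before the mixed-model analysis; the function name and example scores are hypothetical.

```python
from statistics import stdev


def standardized_gains(pre, post, is_rt=False):
    """(post - pre) / SD(pre), with SD pooled across both training groups.

    For reaction-time measures the sign is flipped so that positive values
    mean faster performance after training.
    """
    sd_pre = stdev(pre)
    sign = -1.0 if is_rt else 1.0
    return [sign * (after - before) / sd_pre for before, after in zip(pre, post)]


# Hypothetical scores for three subjects (both groups pooled)
print(standardized_gains([10, 12, 14], [12, 12, 15]))  # [1.0, 0.0, 0.5]
print(standardized_gains([500, 520, 540], [480, 520, 550], is_rt=True))
```

Dividing by the pre-test SD rather than the SD of the gains keeps every measure on the scale of baseline individual differences.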
The total number of participants differed across tests due to missing or unusable data. More details regarding data quality procedures and exclusions are provided in S2 File. The raw aggregate data for each subject including outliers is provided in S3 File, together with the final data used for analyses.
Transfer effects: Linear mixed model analysis: Standardized gain scores from the transfer tests were then entered into linear mixed-effects (LME) models to analyze training-related improvements at the construct level [115,116]. A separate LME model was run for each set (i.e., construct) of gain scores, though not all tests were grouped into a construct. We grouped gain scores into eight constructs: working memory n-back (2-… Gain scores for reaction time were multiplied by negative one, such that positive scores indicate faster performance after training. Each model consisted of a fixed effect of training group and crossed random intercepts for subject and task [117]. These models were implemented with the "lme4" package [118]. Significance testing was performed using the standard normal distribution as well as the more conservative Kenward-Roger approximation for degrees of freedom, using the "pbkrtest" package [119] in the R statistical program (R Core Team, 2014).
Transfer effects: Composite-level analyses: To create composite scores for use in subsequent analyses, the standardized gain scores were averaged according to the aforementioned groupings. One subject's extremely high gain score (>4 SD) for Selective Attention was Winsorized.
With these composite gain scores, we used a multivariate ANOVA (MANOVA) to verify the training group effects and their consistency with the results of the linear mixed-effects analysis. Bayes factors were calculated using tools provided at http://pcl.missouri.edu/bf-two-sample [120].
Transfer effects: Task-level analyses: We also conducted analyses at the task level to investigate the specificity and consistency of the composite-level findings. Tests that were not integrated into a composite score or construct in the linear mixed-effects analysis were analyzed only at the task level. Only significant interactions at p < .05 are reported in the text. For brevity, we refer to significant group x time interactions as "transfer effects." Due primarily to technical issues in the recording of responses, only 24 subjects in the Mind Frontiers group and 29 subjects in the active control group had usable dual n-back data. For each measure, we also tested whether the groups differed at baseline and found no significant differences (S2 File).
Perceived improvement: Surveys with Likert-type single questions were analyzed using Mann-Whitney U tests. In S2 File, medians were used to summarize results as appropriate for ordinal data. Responses were coded as numbers prior to analysis (e.g., 1-7 for very strongly disagree to very strongly agree).
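For readers unfamiliar with the Mann-Whitney U test on ordinal Likert codes, a minimal sketch follows. It uses midranks for ties and the tie-corrected normal approximation without a continuity correction; that choice, the function names, and the example responses are assumptions for illustration, since the text does not state which variant or software was used. In practice, SciPy's `scipy.stats.mannwhitneyu` provides this test, though its p-values can differ slightly because of its continuity-correction and exact-method defaults.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for sample a, two-sided p) via the tie-corrected normal approximation."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = list(a) + list(b)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1))))
    z = (u1 - n1 * n2 / 2) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p


# Hypothetical 1-7 Likert codes for two groups
u, p = mann_whitney_u([5, 6, 7, 7], [3, 4, 4, 5])
print(u)  # 15.5
```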

Practice effects: Game performance across sessions
The main effect of session was robust (p < .0001) for all tasks in both groups (Fig 3). The analysis of the set size 5 condition of the change detection task excluded three subjects run at the beginning of the experiment, because an experimenter error produced extremely high average maximum duration values in several of these subjects' training sessions (S2 File). When these subjects were included in the set size 5 analysis, the shapes training effect remained significant (p = .03), the cars training effect was no longer significant (p = .15), and the letters training effect remained significant (p < .0001).

Training Feedback
First we tested whether training feedback differed between groups after the first training session. Only motivation (F(1,85) = 8.466, p = .005) and demand (F(1,85) = 8.858, p = .004) showed significant group effects, with higher overall motivation in the active control group and higher demand in the Mind Frontiers group.
We then examined whether these ratings changed over time and differed between groups. For enjoyment, there was no main effect of time, but there was a significant group by time interaction (F […]). Participants were not given an opportunity to rate the training tasks they did not complete; thus, the ratings may reflect relative differences among the six games played and not necessarily differences between training regimens. Mean ratings for each task are plotted in S2 File.

Transfer of training: Linear mixed model analysis and composite-level analysis
As shown in Table 4, the linear mixed model analysis revealed significant transfer effects in working memory n-back and reasoning/Gf reaction time, and a marginal effect in perceptual speed.
We verified these transfer effects using the composite gain scores, which are also used in the subsequent analyses. The MANOVA on the composite gain scores showed a significant training group effect, F(8,78) = 2.633, p = .013, ηp² = .213 (Fig 4). In perceptual speed, the Mind Frontiers group outperformed the active control group, correctly completing more items within each test's time limit, although this effect was weaker in the linear mixed model analysis. While no group x time interaction was observed in the accuracy (total correct) composite measure of reasoning/Gf, there was a significant group x time interaction in the reaction time composite measure for reasoning/Gf, with the Mind Frontiers group showing faster reaction times on correct trials at post-test than the active controls. Working memory span, episodic memory, selective attention, and divided attention did not show training-related effects; there were no improvements or decrements that differed significantly between groups.
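As an arithmetic check, partial eta squared can be recovered from an F statistic and its degrees of freedom as ηp² = F·df1 / (F·df1 + df2). Applied to the MANOVA result above, this reproduces the reported value (the function name is illustrative):

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return f * df_effect / (f * df_effect + df_error)


# MANOVA training group effect reported in the text: F(8,78) = 2.633
print(round(partial_eta_squared(2.633, 8, 78), 3))
```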

Correlation between baseline performance and transfer gain
To determine whether transfer effects observed in the Mind Frontiers group vary according to baseline cognitive ability, we correlated the composite reasoning/Gf score at baseline (pre-test) with transfer gain for the composite measures that showed a group effect. None of the correlations were significant. Because working memory ability may also predict individual differences in transfer, we correlated baseline working memory scores with transfer gain. There was no significant correlation between working memory n-back baseline score and transfer gain. Baseline working memory span score was correlated with transfer gain in perceptual speed (r(42) = .274, p = .036, one-tailed), but this correlation does not survive Bonferroni correction for multiple comparisons.
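To make the Bonferroni point concrete: the reported r(42) = .274 corresponds to t = r·√(df / (1 - r²)) ≈ 1.85, and the one-tailed p follows from the t distribution. The sketch below integrates the t density numerically with Simpson's rule, standing in for a library CDF. Because p ≈ .036, multiplying by any number of comparisons m ≥ 2 pushes the corrected p above .05. Function names are hypothetical:

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def one_tailed_p_from_r(r, df, steps=2000):
    """One-tailed p (in the direction of r) for a Pearson r, df = n - 2."""
    t = r * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area between 0 and |t| (steps must be even)
    h = abs(t) / steps
    area = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 - area


p = one_tailed_p_from_r(0.274, 42)
print(round(p, 3), "Bonferroni-corrected p for m = 2:", min(1.0, 2 * p))
```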

Correlation between training gain and transfer gain
Next we tested whether training-related improvements were related to the gains observed in the transfer tests. Table 5 reports the Pearson correlation coefficients and the confidence intervals from 2000 bootstrapped samples using the adjusted bootstrap percentile (BCa) method [121]. For the Mind Frontiers training group, overall training gain was significantly related only to working memory n-back transfer gain. There was no significant relationship between training gain and the perceptual speed or reasoning RT gain scores (Table 5 and S2 File). For the active control group, no significant relationship was observed, which is not surprising given that no transfer effects were observed for this group (Table 5). We also examined the relationship between transfer gain and training gain on each Mind Frontiers game, as averaging across training games may dilute task-specific effects. Consistent with the composite training gain results, working memory n-back gain was significantly related to training gains in Supply Run, Sentry Duty, and Safe Cracker. Moreover, gain in working memory span was significantly correlated with training gain in Riding Shotgun, which was based on a matrix span task. Given the number of correlations tested, however, these results do not remain significant at p < .05 after Bonferroni correction for multiple comparisons.
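A minimal sketch of a bias-corrected and accelerated (BCa) bootstrap interval for Pearson r. It follows Efron's construction and uses a jackknife estimate of the acceleration. This is an illustration only: the software behind reference [121] may compute the bias and acceleration terms somewhat differently, and the data below are simulated, not from this study. Function names are hypothetical:

```python
import random
import statistics


def pearson_r(x, y):
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bca_ci_r(x, y, n_boot=2000, level=0.95, seed=1):
    """Return (r, lower, upper) using a BCa bootstrap interval."""
    n = len(x)
    r_hat = pearson_r(x, y)
    rng = random.Random(seed)
    boots = []
    while len(boots) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(bx)) > 1 and len(set(by)) > 1:  # skip zero-variance resamples
            boots.append(pearson_r(bx, by))
    boots.sort()
    nd = statistics.NormalDist()
    # Bias correction z0: share of bootstrap estimates below the sample r
    prop = sum(b < r_hat for b in boots) / n_boot
    prop = min(max(prop, 1 / n_boot), 1 - 1 / n_boot)  # keep inv_cdf finite
    z0 = nd.inv_cdf(prop)
    # Acceleration from leave-one-out (jackknife) estimates
    jack = [pearson_r(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    jbar = statistics.fmean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted(q):
        zq = nd.inv_cdf(q)
        return nd.cdf(z0 + (z0 + zq) / (1 - a * (z0 + zq)))

    alpha = (1 - level) / 2
    lo_i = min(n_boot - 1, max(0, int(adjusted(alpha) * n_boot)))
    hi_i = min(n_boot - 1, max(0, int(adjusted(1 - alpha) * n_boot)))
    return r_hat, boots[lo_i], boots[hi_i]


# Simulated data: 30 subjects with a strong positive relationship
sim = random.Random(0)
xs = [float(i) for i in range(30)]
ys = [2 * v + sim.gauss(0, 5) for v in xs]
print(bca_ci_r(xs, ys))
```

Under the criterion used in this section, a correlation would be reported only if its interval excludes zero.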

Correlation between training feedback and transfer gain
To determine whether subjects' experience with and involvement in the games contributed to the transfer effects, we correlated the three composite scores that showed transfer effects with participants' ratings of the training games after the last training session. Reported below are correlations that are significant at p < .05 and whose bootstrapped confidence intervals (2000 samples) do not include zero. First, we averaged ratings across the six Mind Frontiers games; none of the resulting correlations were significant. Because the relationships may differ across training games, we also conducted analyses at the task level.
Most of these results were neither robust nor in the expected direction (greater gains with more positive ratings or experience), so we refrain from interpreting them here. Reasoning […]. None of these correlations, however, passes a Bonferroni-corrected threshold; overall, they indicate no effect of gaming experience on transfer.

Task-level analysis
Results for each test are summarized in Table 6 and briefly discussed below. We also tested pre-test scores and did not find significant group differences at baseline (S2 File).

Working memory (n-back tasks)
Compared to the active control group, the Mind Frontiers group improved significantly on three of the four accuracy measures in the dual and single n-back tests. This is not surprising, given that the Sentry Duty game in Mind Frontiers is based on the dual n-back task. Although the 2-back condition in the dual n-back did not reach significance, there was a trend toward higher scores in the Mind Frontiers group at post-test. Reaction time improvements were also observed in the single letter n-back task.

Working memory (span tasks and VSTM)
While there was evidence of near transfer to the n-back tasks in the Mind Frontiers group, no transfer effects were found in other common measures of working memory such as the Operation Span, Running Span, Symmetry Span, and VSTM tasks.

Reasoning/Gf
Although there were no other significant group x time interactions for reasoning/Gf total correct measures, the Mind Frontiers group had significantly faster reaction times for Letter Sets at post-test. In the other reasoning tasks, there was a trend toward faster RTs in the Mind Frontiers group, but these effects were not significant at the task level. It is important to note that RTs are not typically used as measures in reasoning/Gf tests. We chose to analyze RTs in this study because of the speeded nature of the reasoning training tasks included in Mind Frontiers.

Perceptual Speed
The composite gain analyses revealed a significant transfer effect for perceptual speed. Task-level analyses show that this was driven by a significant group x time interaction in Letter Comparison, with the Mind Frontiers group giving more correct responses within the time limit than the active control group. The interactions for Pattern Comparison and Digit-Symbol Substitution were not significant but showed the same trend toward improved performance in the Mind Frontiers group.

Episodic Memory
Logical Memory, Paired Associates and i-Position showed no significant transfer effects.

Selective Attention
Compared to active controls, the Mind Frontiers group had marginally improved accuracy in the Anti-saccade task. No transfer effects were found in the PVT, the Flanker Test, or the visual/number search game (25 Boxes). Although there was a significant group x time interaction in the MSIT RT congruency effect, there were no such effects in the pre-subtraction measures of incongruent and congruent RTs.

Multi-tasking and Task Switching
No training-related effects were found in either Control Tower or TSDT.

Perceived benefit effects
Mann-Whitney U tests on the perceived self-improvement questions did not reveal significant group differences (S2 File), suggesting that the transfer effects are unlikely to be driven by group differences in perceived improvement. However, the same set of questions phrased in terms of general expectancy or potential benefits (not necessarily applicable to oneself) revealed significant group differences (S2 File), with the active control group expecting better sustained attention (U = 741.0, p = .019) and perception (U = 751.0, p = .023), and the Mind Frontiers group expecting better multi-tasking (U = 789.50, p = .045), problem-solving (U = 632.50, p = .001), and reasoning (U = 717.0, p = .012) performance. After Bonferroni correction for the fourteen comparisons, however, only the problem-solving effect holds. There was no significant relationship between problem-solving expectancy and transfer gain. More details about training feedback and analyses can be found in S2 File.
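The Bonferroni step can be checked directly from the reported p values. With fourteen comparisons, the corrected threshold is .05 / 14 ≈ .0036, and only the problem-solving question meets it. Because the p values below are copied as reported (i.e., rounded), this is a check on the arithmetic rather than a reanalysis:

```python
# p values reported for the general-expectancy questions
reported = {
    "sustained attention": 0.019,
    "perception": 0.023,
    "multi-tasking": 0.045,
    "problem-solving": 0.001,
    "reasoning": 0.012,
}
N_COMPARISONS = 14  # number of comparisons stated in the text
alpha_corrected = 0.05 / N_COMPARISONS

survivors = [name for name, p in reported.items() if p < alpha_corrected]
print(round(alpha_corrected, 4), survivors)
```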

Discussion
Participants who played Mind Frontiers showed near transfer to the single and dual n-back tasks, which were similar in design to one of the trained working memory tasks, Sentry Duty. Training-related transfer effects were also observed for composite measures of perceptual speed and reasoning/Gf reaction times. These speed and reaction time findings support the hypothesis that varied and integrated cognitive training in Mind Frontiers can lead to improvements in "lower-level" abilities of perceptual speed and attention, which may reflect more efficient processing of stimuli that supports performance in more complex tasks. Although reasoning/Gf improvements were not found in the primary accuracy measures, the improvement in reasoning/Gf reaction times offers some promise regarding the plasticity of this higher-level ability. It is important to note, however, that no training-related effects were observed in five of the eight composite measures tested here, and that differential expectancies regarding the nature of training call for some caution in interpreting the results.

Transfer effects
Baseline cognitive performance as measured by reasoning/Gf had no effect on transfer gains in the Mind Frontiers group, which suggests that the training had a relatively uniform effect across participants. This is likely due to the adaptive nature of the training tasks, whose difficulty was adjusted relative to each participant's performance, decreasing the likelihood of performance plateaus. Other computer-based training studies have found that baseline ability measures either negatively or positively predict improvements [49,54,122], with results varying depending on the nature of the training tasks. The null effect of baseline reasoning/Gf in the current study may also reflect a lack of power or variability, though it is also possible that the heterogeneous and adaptive games employed here decreased the likelihood of floor or ceiling effects in overall training improvement.
Like Irrigator, Safe Cracker, and Pen 'Em Up in Mind Frontiers, the reasoning/Gf transfer tests required task execution within a very limited time frame. Although the perceptual speed tasks were not reasoning tasks per se, they also required completing as many items as possible within a time limit. Did experience with these time-limited games specifically drive the transfer gains observed in the speed and RT measures? Several training studies find that only those who improve on the training tasks ("responders") also show transfer on untrained tests (e.g., [12,113]). To test this, we correlated overall training gain with transfer gain as measured by the composite scores. Only the working memory n-back composite score was significantly and reliably related to improvement in Mind Frontiers, similar to previous findings in working memory training [34]. As expected, no significant relationship was observed for the active control group.
Apart from transfer to another working memory n-back test, Thompson and colleagues [34] did not find any transfer to untrained paradigms. In the current study, which involved a larger sample, we found some evidence for transfer in reasoning RT and speed measures. However, these gains were not related to training gains, which hints that the observed benefits may be due to a factor, or combination of factors, common to the Mind Frontiers tasks and not necessarily attributable to processes such as working memory, reasoning, or attentional control. It is plausible that, rather than developing these skills per se, the overarching time-limited nature of the tasks better prepared participants for the speed-intensive tests at post-test.
No training-related effects were observed in the working memory span tests, despite the inclusion of "Riding Shotgun," a Mind Frontiers game similar to a simple working memory span task (Symmetry Span) employed in a training study that found transfer to untrained span tests [28]. These discrepant results may arise from differences in training methodology: the Mind Frontiers group spent less time overall training on the span task (20 12-minute sessions) than participants in Harrison et al. [28], who performed only span tasks for 20 45-minute sessions. In addition, transfer effects may be very specific to the type of training received. Like the simple span training group in [28], the Mind Frontiers group did not improve in tests of complex working memory span (Operation Span, Symmetry Span). While the previous study found improvements in Running Span for both the simple and complex working memory training groups, its training and test stimuli were the same. The absence of a Running Span effect in the current study may be attributable to stimulus specificity: the Riding Shotgun game involved spatial locations, whereas the Running Span test involved letter stimuli. Unlike the current study, the previous experiment also incorporated performance-based bonus compensation, which may also have led to differences in motivation. Nonetheless, an examination of individual game performance and transfer gain revealed a modest but positive relationship between working memory span gain and training gain on Riding Shotgun. While this effect no longer holds after multiple comparison correction, it is consistent with the n-back finding of transfer gains in tests similar to the training tasks, as well as with other studies that find transfer to various working memory span tests after adaptive training on verbal and visuo-spatial span tasks [17,123].
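Assuming every session ran its full scheduled length, the difference in span-task exposure between the two studies is nearly fourfold:

```latex
\underbrace{20 \times 12\ \text{min}}_{\text{Mind Frontiers (Riding Shotgun)}} = 240\ \text{min} = 4\ \text{h}
\qquad\text{vs.}\qquad
\underbrace{20 \times 45\ \text{min}}_{\text{Harrison et al. [28]}} = 900\ \text{min} = 15\ \text{h},
\qquad \frac{15\ \text{h}}{4\ \text{h}} = 3.75
```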

Expectancy and placebo effects
The responses to the perceived improvement and expectancy questions were not consistent: there were no significant group differences in questions of perceived self-improvement, but slight group effects when the same questions were phrased in terms of general potential improvement. These findings, however, are not necessarily contradictory in terms of expectancy biases and may instead reveal that participants accurately assessed the properties of their training tasks. Nonetheless, it has been argued that this awareness can lead to subconscious expectations and thus placebo effects [29,30]. Although the transfer effects were not correlated with the improvement or expectancy ratings, we cannot conclusively rule out that the benefits observed in the Mind Frontiers group partly reflect placebo effects. The results obtained here also highlight the importance of wording in self-report assessments, as subtle changes in question framing may reveal different patterns of results.
Given these findings, a more careful examination of placebo effects is warranted. One approach involves comparison to a survey-based study where participants learn about specific interventions and evaluate intervention-related outcomes [124]. Another involves having participants specifically rate perceived improvement and expectancy for specific tasks [29], rather than general abilities as implemented in the current study.

Limitations and Future Directions
The improvement observed in reasoning/Gf is promising but modest; effects were found in reaction time and not in accuracy, which is the more established measure for estimating reasoning/Gf [125][126][127]. Future research should administer more sensitive accuracy measures that may better capture subtle changes in processing efficiency. The tests used in the current study were derived from previous studies demonstrating sensitivity to age-related differences or changes. The Matrix Reasoning test used, for example, is a modified and abbreviated computerized version of the more extensive 60-item Raven's Advanced Progressive Matrices [126]. Although easier to administer, these abridged tests may be less suitable for detecting subtle effects or changes [128,129], especially in relatively high-functioning young adults and in the presence of practice (test-retest) effects. Moreover, it is possible that longer and more intensive training, or a study with a larger sample, would yield more measurable gains in the higher-level abilities of reasoning/Gf and working memory.
Although there were no significant differences in engagement and frustration between groups, an important limitation of this study is the different training experience of the two groups. Group by time interactions were found in enjoyment and motivation, with increasing ratings for the Mind Frontiers group and decreasing ratings for the active control group. While the Mind Frontiers group received a "gamified" version of the tasks, with information on points accrued from gameplay and game levels attained, the active control participants completed less visually engaging laboratory tasks without explicit progress tracking. Unfortunately, very few training/transfer studies have collected such ratings so far. It is therefore impossible to know whether previous observations of transfer effects have been confounded by subjects' expectancies about benefits. A follow-up study should equate the active control group on these motivational aspects of training and usability, with comparable presentation and progress tracking of the control training tasks.
As this study involved relatively high-functioning young adults, future directions include investigating whether individual differences in physical fitness and personality moderate training and transfer benefits. Physical fitness has been shown to be strongly related to executive control abilities [130,131], while personality factors have been found to play a role in training improvement [34,128,132,133]. Moreover, brain volume in specific cortical and subcortical regions has been shown to predict training and transfer benefits from video game training [134,135]. Analyzing structural and functional brain profiles may provide further insight into why specific interventions are more successful for certain individuals, and help characterize the overlap between training tasks and tests that show training-related transfer.