Evaluation of a gamification and flipped-classroom program used in teacher training: Perception of learning and outcome

Recent years have witnessed the arrival of new methodological horizons in teacher training. Technological resources and mobile connections play a major role in these studies. At the same time, there is a focus on play to increase commitment and motivation. It is in this context that gamification and flipped-classroom strategies have arisen. This paper presents the results of a training program with future Primary Education teachers using gamification and flipped-classroom strategies and techniques. The aim was for teachers in training to acquire competences in proposing innovative teaching units. The learning achieved through the program was evaluated by collecting perceptions via a questionnaire and by applying an observation scale to the didactic units designed. The program was implemented in four classroom groups (N = 210) at the University of Murcia (Spain). Descriptive statistics are reported, together with mean comparison tests (Student's t and one-way ANOVA), a non-parametric test (Mann-Whitney U) and Pearson correlations between subscales. The data show a very positive assessment of the learning achieved and the strategies applied in the training program. The learning outcomes were satisfactory, although lower than perceived. Some differences between class groups and gender are discussed, and some weaknesses of the program are pointed out.


Gamification and the flipped-classroom in teacher training
Recent years have witnessed the arrival of new methodological horizons in teacher training [1]. International studies have stressed the need to renew teacher training programs to improve teaching-learning processes in compulsory education [2][3][4] and have emphasized mastery of normal classroom tasks [5,6]. Technological resources and mobile connections play a major role in these studies [7,8]. At the same time, there is a focus on play to increase commitment and motivation [9][10][11]. It is in this context that gamification and flipped-classroom strategies have arisen. Research in higher education has usually focused on one of these strategies. In this article we show the positive effects of linking both teaching methods. Gamification and the flipped-classroom have drawn the attention of teachers and academics in the last five years, but the theoretical and empirical framework for discussing the effectiveness of these methods is still underdeveloped, above all when both are used together as in this study [48], although the first evidence is already available for primary education [49]. There is also a lack of evidence and empirical evaluation of their use in non-entertainment contexts [50] in higher education, where they are used less and there are not many examples of these methods [45]. This is especially true of Social Studies Education, a field still closely tied to traditional teaching and learning methodologies. Studies can be found in economics and marketing [51], social work [52], business courses [47,53], educational technology programs [54], and employee, consumer or environmental experiences [55], but there is very little research on gamification and the flipped-classroom in History or Geography lessons [56], and none using both strategies at the same time in this field [57][58][59][60].
Empirical investigations need to gather evidence by designing Social Science education programs that use gamification and the flipped-classroom. It would be very interesting to know the opinion of future teachers [61] on using these methods together, or the perceptions of teachers in training about learning outcomes and self-motivation. This is especially important for educational research, as the students of today will be the teachers of tomorrow [62]. In the future, their opinions about strategies and methods will affect their curricular and pedagogical decisions during their professional development [63], which, in turn, can influence the future learning of their students [64].

Research question and aims
The objective of this paper was to analyze the effects of a training program based on the flipped-classroom and gamification on the learning perceptions and outcomes of teachers in training.
The specific objectives derived from this general objective are:
SO1: To analyze the opinion of future teachers on the strategies used in the training program by group, gender and flipped-classroom/gamification techniques.
SO2: To analyze the perception that future teachers have of the learning achieved in the training program by group and gender, and its relationship with the techniques assessment subscale.
SO3: To analyze the learning outcomes of future teachers in relation to their ability to make teaching proposals for Social Sciences in Primary Education, and the differences by group and perceived learning.

Participants
This study was reviewed and approved by an institutional review board (Research Ethics Commission of the University of Murcia) before it began. We obtained the written consent associated with the Ministry of Science, Innovation and Universities research project. After being informed about the research objectives, the participants signed an informed consent document (supplementary material).
The research sample comprised 210 trainee teachers (53 males, 25%; 157 females, 75%). Group 1 (bilingual) had the smallest proportion of males of the four groups (Table 1). All participants were studying the third year of the Primary Education Degree at the University of Murcia, Spain. The purpose of the degree is the initial training of teachers to practice in the Primary Education stage (6-12 years). The age range of the participants was 19 to 44 years (M = 20.94 and SD = 2.77). Almost 90% of the sample was between 19 and 22 years old (Table 1), which indicates that most students were at the age that corresponds to their academic year. The training program was carried out in four classroom groups, although all students belonged to the same year of a Degree that requires a high cut-off mark for admission (8.506 in the 2018/19 academic year). There are differences between Group 1 and the other groups (2, 3 and 4). Group 1 differed from the others because it is a bilingual group (a minimum of 15 subjects are taught in English). In addition, students in the bilingual group have to meet the following requirements: hold the nationality of a country whose official or co-official language is English or, alternatively, hold a certificate accrediting level B1 or higher in English and have completed a Bilingual Baccalaureate. Groups 2, 3 and 4 are more homogeneous both in the percentage of men and women and in the academic origin of the participants.
It is of interest to know whether the students of the bilingual group benefit more, with respect to the objectives proposed in this work, than their colleagues in the monolingual groups, and to check the uniformity of the implementation of the program based on gamification and the flipped-classroom in teacher training, and of the teaching and learning process, across the different groups.
Furthermore, the distribution of the sample across groups was quite homogeneous, ranging from 21% in Group 1 to 28.6% in Group 3 (Table 1). This table also presents the participants' demographics by group.

Research focus
A methodological approach based on program evaluation was chosen: design, implementation and evaluation of a training program [54]. For evaluation of the program, a quantitative approach was applied using two tools: a questionnaire with a Likert scale (1-5) to ascertain the perceptions of the participants about their learning; and an observation record of the teaching units designed by the teachers in training to ascertain the learning outcomes.

Design of the training program
The training program was run in four class groups on the subject Teaching Methodology for Social Sciences on the Primary Education Degree at the University of Murcia (Spain). To ensure fidelity in the implementation by the teaching team, a document was created with a protocol (supplementary material). This document is a checklist of activities that teachers should complete to ensure uniformity in the implementation of the program. The aim of the subject is that the students acquire competences in the design of innovative teaching proposals for social sciences in Primary Education. The strategies used in the training program were based on the flipped-classroom, as a teaching approach, and gamification, as a technique to encourage motivation. The subject was taught in the first semester of the academic year 2018/2019 (September-December), with two sessions of two hours a week. The teaching team produced a weekly video with the theoretical contents of the subject. For the flipped-classroom, the students had to watch the video at home. The activities inside the classroom were based on case studies, simulations, analysis of materials, cooperative work, etc.

PLOS ONE

This was combined with gamification techniques. To design our gamification techniques we followed the claims of Teixes [65] on the different mechanics and dynamics used. Specifically, in terms of mechanics, which are the elements that make progress in the game visible, we used points, specifically experience points. These are earned from the actions performed by users: in our case, the number of correct answers that each member obtained on the questions launched with the Socrative application. A classification was also introduced, that is, an element that visually ordered users according to the score achieved. As for the dynamics, we used a system of rewards, that is, a valuable element obtained after achieving an objective. In our case it was a bonus in the final grade of the subject based on the total points obtained at the end of the experience. Another dynamic was competition, created by comparing the results of all groups through the classification, thus seeking extra motivation. Finally, feedback was used: the information offered to the students at the end of each questionnaire so that they knew their degree of progress in the gamified system.
With this feedback, the need to achieve real learning of the contents of the subject was stressed to students, leaving the rewards in the background. In this way we tried to boost intrinsic over extrinsic motivation. At the beginning of each session the students answered questions about the theoretical videos through team competitions run on the Socrative platform, following the recommendations of studies such as [66]. At the end of the sessions, team competitions were held again on the contents dealt with throughout the session. The groups could obtain badges during the development of the proposal, and prizes related to the final grade were awarded at the end of the course to those who gained most badges.
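The points and classification mechanics described above can be illustrated with a minimal sketch. It assumes that each quiz yields a count of correct answers per team; the team names, the data layout and the function name `leaderboard` are hypothetical, and the bonus rule applied to the final grade is not specified in the text, so it is not modeled here.

```python
def leaderboard(quiz_results):
    """Accumulate experience points per team and rank the teams.

    quiz_results: list of dicts, one per quiz, mapping a (hypothetical)
    team name to its number of correct answers in that quiz.
    Returns a list of (team, total_points), highest total first;
    ties are broken alphabetically by team name.
    """
    totals = {}
    for quiz in quiz_results:
        for team, correct in quiz.items():
            totals[team] = totals.get(team, 0) + correct
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Two illustrative quizzes (opening and closing competition of one session).
session = [
    {"Team A": 3, "Team B": 5},
    {"Team A": 4, "Team B": 1},
]
ranking = leaderboard(session)
```

Showing the ranking after every quiz is what makes progress visible to the teams, which is the role the text assigns to the classification.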
Throughout the program the working groups had to design an innovative teaching unit for social sciences. At the end of the course, the groups were required to give an oral presentation with a simulation of one of its parts and to carry out the activities designed in situ. The unit was evaluated using an observation scale (supplementary material) in order to test the effectiveness in learning competencies related to the design of innovative proposals.

Tools used for data collection
The information on the effects of this training program was collected through two tools. First, there was an ad hoc questionnaire entitled "Evaluation of the gamification and flipped-classroom based training program", which used a closed Likert rating scale (1-5) and consisted of three thematic blocks (supplementary material). The first block addressed the perceptions of trainee teachers on how the program had affected their motivation. The second block of the questionnaire dealt with how satisfied they felt with the program. The third block focused on the perception of the learning received in the training program. For this purpose, a series of statements related to each of the objectives of the program regarding the proposal of innovative teaching units were drawn up. Participants were also asked to assess the role that, in their opinion, each of the strategies and techniques used had played in the effectiveness of their learning.
The design of the questionnaire took into consideration other studies on the effects of gamification programs on motivation, satisfaction and learning effectiveness [9,19,67,68]. The validation of the content was carried out by a panel of experts who judged the relevance and clarity of the items in the tool.
The second tool was an observation scale to evaluate the teaching units designed by the future teachers (supplementary material). It had a 1-5 rating scale and was built around four variables that were evaluated by the teaching team: suitability of the structure of the teaching unit; relevance of the training activities; methodological suitability; and correctness of the evaluation procedures and instruments. The learning outcomes corresponding to each of the ratings (1-5) were detailed. Some observation-scale models developed and implemented in this area of knowledge were taken into account [69][70][71]. Validation of the content was also carried out by a panel of experts who judged the relevance and clarity of the tool's variables and the proposed learning outcomes for each of the ratings on the scale.

Data analysis procedure
The data collected by the two tools were coded and analyzed separately with SPSS v. 22.0 for Mac. The reliability and construct validity of the perception of learning questionnaire were estimated prior to the data analysis. Reliability was analyzed with the internal consistency method based on Cronbach's alpha, which estimates the reliability of a measuring instrument composed of a set of Likert-type items expected to measure the same theoretical dimension (the same construct). This validation procedure has been used in other history education research [69]. The criterion established and used by various authors is that a Cronbach's alpha value between .70 and .90 indicates good internal consistency for a one-dimensional scale [72,73]. In the case of the questionnaire, satisfactory results were obtained both on the global scale and on each of the subscales used in this study. The degree of reliability of the global scale was also shown to be adequate using the Guttman split-half technique (Table 2).
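As an illustration of the internal consistency coefficient referred to above, the following sketch computes Cronbach's alpha from a respondents-by-items matrix using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The analysis in this study was run in SPSS; the function name and the toy responses below are assumptions made only for the example.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances (n - 1 denominator) for both the items and the
    total score, which is the usual convention.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*responses))                    # one tuple per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of five respondents to three Likert items (1-5).
toy_responses = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(toy_responses)  # roughly 0.93 for this toy data
```

Values of this kind are what Table 2 reports for the overall scale and its sub-scales.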
The construct validity and the viability of a subsequent factor analysis were also checked. For this purpose, the correlation matrix was analyzed, and Bartlett's sphericity test and a Principal Component Analysis (PCA) were carried out for each of the blocks of the questionnaire. The exploratory PCA explains the maximum percentage of variance observed in each item from a smaller number of components which summarize that information [74].
The analysis of the correlation matrix looked for variables that did not correlate well with any other, that is, with all correlation coefficients below .3; and variables that correlated too well with others, that is, variables with some correlation coefficient greater than .9. In the case of the study questionnaire, no variable with these characteristics was found.
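The screening rule just described can be sketched as follows. This is a minimal illustration that assumes absolute Pearson coefficients are compared against the .3 and .9 thresholds; the item names and scores are hypothetical, and constant items (zero variance) are not handled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_items(columns, low=0.3, high=0.9):
    """Flag items that correlate too weakly or too strongly with the others.

    columns: dict mapping item name -> list of scores.
    Returns (too_low, too_high): items whose largest |r| with any other item
    is below `low`, and items with at least one |r| above `high`.
    """
    too_low, too_high = [], []
    for name in columns:
        rs = [abs(pearson(columns[name], columns[other]))
              for other in columns if other != name]
        if max(rs) < low:
            too_low.append(name)
        if max(rs) > high:
            too_high.append(name)
    return too_low, too_high


# Hypothetical items: item_dup nearly duplicates item_a, item_iso is unrelated.
toy_items = {
    "item_a": [1, 2, 3, 4, 5],
    "item_dup": [1, 2, 3, 4, 6],
    "item_iso": [3, 5, 1, 5, 3],
}
weak, redundant = screen_items(toy_items)
```

In the questionnaire analyzed here both lists would be empty, since no item met either condition.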
In the three blocks a critical level (Sig.) of .000 was obtained in Bartlett's sphericity test. Applying the PCA to each of the blocks, we obtain a distribution in the first block of 3 dimensions, explaining 48.9% of the total variance, with a KMO of .848. In the second block we obtain 2 dimensions, explaining 46.3% of the variance, with a KMO of .828. In the third block we obtain 3 dimensions, explaining 55.01% of the variance, with a KMO of .884 (Table 3).

Table 2. Cronbach's alpha internal consistency coefficients and Guttman split-half for the scale "Evaluation of the gamification and flipped-classroom based training program" and the sub-scales used in the research.

Scales and sub-scales | Number of elements | Cronbach's alpha | Guttman's split-half
Overall scale "Evaluation of the gamification and flipped-classroom based training program" | 37 | .940 | .903
Sub-scale "perception of learning" | 8 | .876 |
Sub-scale "perception of motivation" | 13 | .821 |

https://doi.org/10.1371/journal.pone.0236083.t002

The results of these tests showed that the questionnaire has an adequate degree of reliability and validity. Descriptive statistical analyses were carried out (minimum, maximum, mean and standard deviation of each of the variables). In addition, mean comparison tests (Student's t and single-factor ANOVA) were applied for the sex and group variables, a non-parametric test (Mann-Whitney U) for the sex variable, and Pearson correlations between subscales.

Opinion of trainee teachers about the strategies and techniques used in the training program
Table 4 shows, by group, the means and standard deviations, as well as the minimum and maximum, of each of the variables referring to the perception of the strategies used in the study, together with the grouped variables.
The scores show a very positive evaluation of the strategies used. All items were rated higher than 4 out of 5, and a large part over 4.5. Overall, Group 1 rated the strategies used in the training program most positively while Group 2 gave the lowest rating. At between-group level, when each of the strategies and techniques is differentiated, Group 1 values all the items more positively, except "Practical activities in the whole group", for which the students of Group 3 award a higher mean score, and "Videos of the flipped-classroom", where Group 4 has a slightly higher mean. At the within-group level, the highest scores in Groups 1 and 2 are for the "Simulation" strategies, and in Groups 3 and 4 for the "Socrative Test" strategy.
A single-factor ANOVA was performed to check for statistically significant differences; the mean differences found between the four groups were not statistically significant. Table 5 presents separate descriptive statistics for the two strategies/techniques associated with the flipped-classroom (videos and activities in the large group class) and with gamification (Socrative test and scores/badges). It can be seen that Group 1 valued the gamification strategies (Socrative test and scores and badges) notably more positively than the rest of the groups. However, Group 1 does not value the techniques associated with the flipped-classroom as positively as Groups 3 and 4.
A single-factor ANOVA was performed to ascertain whether there were statistically significant differences in the assessment of the techniques associated with the flipped-classroom. The mean differences found between the four groups were not statistically significant. A single-factor ANOVA was also performed for the evaluation of the gamification techniques used. In this case, the results showed statistically significant differences for Group 1 with respect to Group 2 (Table 6). Group 1 reported a better assessment of the gamification strategies and techniques, with statistically significant differences with respect to Group 2 and minor differences with the rest of the groups.
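The single-factor ANOVA used for these between-group comparisons rests on the ratio of between-group to within-group mean squares. A minimal sketch of the F statistic is given below; the group scores are invented for the example, and the p-value (which requires the F distribution, as provided by statistical packages such as SPSS) is deliberately left out.

```python
def one_way_anova_f(groups):
    """F statistic of a single-factor ANOVA.

    groups: list of lists, one list of scores per group.
    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for three small groups.
f_stat, df_b, df_w = one_way_anova_f([[4, 5, 6], [5, 6, 7], [7, 8, 9]])
```

A large F relative to its degrees of freedom indicates that the group means differ more than the within-group spread would suggest; post hoc comparisons are then needed to locate which pairs of groups differ.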
The descriptive statistics by sex for the perception of strategies variable show that males (26.5%) and females (74.5%) scored very similarly: mean 4.34 (male) versus 4.45 (female).
The Student's t test for independent samples and the non-parametric Mann-Whitney U test were carried out to see if participants' perceptions of the strategies differed according to sex. In neither case were statistically significant differences found.
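For readers less familiar with the non-parametric test mentioned above, the sketch below computes the Mann-Whitney U statistics from pooled ranks, with tied values given their average rank. The scores are hypothetical and no p-value is computed; it is an illustration of the statistic, not a reproduction of the SPSS output.

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical mean ratings for two small groups of respondents.
u_first, u_second = mann_whitney_u([4.0, 4.5, 5.0], [3.0, 3.5, 4.5, 4.0])
```

Because it works on ranks rather than raw scores, the test does not assume normally distributed ratings, which is why it is commonly reported alongside Student's t for Likert-type data.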

Perception of acquired learning under the training program
Table 7 shows, by group, the means and standard deviations, as well as the minimum and maximum, of each of the variables referring to the perception of learning in the training program, together with the grouped items.
The scores obtained show a very positive evaluation of the learning acquired in the program. All the items obtained a valuation above 4 out of 5, and all but one exceeded 4.5 (Group 4 rated its learning 4.48). Overall, Group 1 rated its learning most positively and Group 2 the least positively. At between-group level, Group 1 values learning more positively in all variables. At within-group level, the perception is more positive in Groups 1 and 4 in "Activities and phases"; Group 2 students perceive better learning in "Structure of the Didactic Unit" and Group 3 in "Methodology".
In order to analyze whether the perceptions that future teachers have of the learning achieved in the training program based on flipped-classroom and gamification were statistically different between groups, a single-factor ANOVA test was carried out. The results showed statistically significant differences for Group 1 with respect to the rest of the groups: there is a 0.24 difference between Group 1 and the rest in learning perception, which is explained by a higher valuation of methodology and activities by the members of Group 1.

Table 5. Descriptive statistics by group membership for the variables referring to the evaluation of the strategies used.
In addition, the descriptive statistics for the perceptions of learning acquired during the training program according to participants' sex show that the mean for females is higher (4.7) than for males (4.43).
In order to analyze whether the differences are significant, both the Student's t test for independent samples and the non-parametric Mann-Whitney U test were applied. The results showed statistically significant differences between males and females in the overall perception score of their learning, which was higher in females (Mann-Whitney U = 5.541; p < 0.01; Student t = .000). In the light of the results, female participants perceived that they had learned more.
The learning perception subscale and the techniques/strategies assessment subscale show a significant correlation (r = .621; p < 0.001). As we can see (Table 8), the correlations of all items are significant and positive. However, the items of the learning perception subscale show higher correlations among themselves than the items of the techniques/strategies assessment subscale. Correlations between the items of the learning perception subscale range from .515 to .799, while those between the items of the techniques/strategies assessment subscale range from .174 to .718. With regard to the correlations between the items of both subscales, we observe that they are of weak to moderate magnitude, ranging from .251 (Socrative test and Activities) to .453 (UD Simulation and Structure).

Regressions-multivariate analysis with PLS
After analysing the descriptive statistics, the differences in the outcome variables by sociodemographic variables, and the correlation between subscales, a multivariate regression approach was applied using partial least squares (PLS) regression. The goal is to inspect which variables of the study are more predictive. PLS regression is a technique that reduces the predictors to a smaller set of uncorrelated components and performs a least squares regression on these components, instead of on the original data. Because PLS regression models the response variables in a multivariate way, the results can differ significantly from those calculated for the response variables individually.
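The idea of regressing on components rather than on the original predictors can be illustrated with a deliberately reduced sketch: a single-response PLS fit (often called PLS1) with just one component. This is a simplification chosen for illustration, not the model fitted in the study; the data are invented, and the variable importance in projection values reported in the paper are not computed here.

```python
from math import sqrt


def pls1_one_component(X, y):
    """Fit a one-component PLS regression of y on the columns of X.

    X: list of rows (one list of predictor values per observation).
    y: list of responses.
    Returns (coefficients, intercept) expressed on the original scale.
    Assumes at least one predictor covaries with y (non-zero weight vector).
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in predictor space with maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Component scores, then ordinary least squares of y on that single component.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coefficients = [wj * q for wj in w]
    intercept = y_mean - sum(c * m for c, m in zip(coefficients, col_means))
    return coefficients, intercept


# Two perfectly collinear hypothetical predictors: ordinary least squares has
# no unique solution here, while the PLS component shares the effect equally.
coefs, intercept = pls1_one_component([[1, 1], [2, 2], [3, 3], [4, 4]], [2, 4, 6, 8])
```

The toy case shows why the approach suits correlated questionnaire subscales: the component absorbs the shared variation instead of producing unstable individual coefficients.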
Table 9 shows the variable importance in the projection, the standard deviation, and the lower and upper limits for every explanatory variable used in our study.
As shown in Fig 1, the highest values for the projection of our regression model are provided by teamwork, the gamification strategies and the flipped-classroom strategies, respectively. The importance of these variables in explaining perceived learning can also be noted in Fig 2 on standardized rates. Teamwork is the most influential variable on perceived learning, followed by gamification strategies and flipped-classroom strategies (with similar values between the two strategies). With these results we can say that the socio-demographic variables of the study (sex, age and gender) are not strong predictors of the learning perceived by the students in this training program. By contrast, the students' assessment of group work, gamification and flipped-classroom strategies had a notable influence on perceived learning.

Results of the future teachers' learning with regard to the capacity to produce teaching proposals
Once the students' perceptions of their learning were known, we evaluated the learning outcomes of teachers in training in terms of their ability to propose didactic activities. To do so, we used an observation scale that allowed us to assess the structure of these activities, the methodology, the phases and the proposed evaluation. Table 10 shows, by group, the means and standard deviations, as well as the minimum and maximum, of each of the variables and of the grouped items referring to the learning results for the capacity to propose training activities for the teaching of the social sciences.
The scores show positive results for the learning acquired in the training program, although with a lower score than the self-perception of learning. Half of the items received a score over 4 (the highest score was 4.93 out of 5 for the valuation of the activities in Group 2). The rest of the items exceeded 3.5 out of 5 on average, except for the score received for the methodology in Group 2. Overall, Group 1 had the highest performance in the training program and Group 3 the lowest, although the difference was not very great. At between-group level, the averages of all the groups showed few differences. Nevertheless, the best results were obtained in "Structure of the Teaching Unit" in Group 1; in "Activities and phases" in Group 2; and in "Evaluation" and "Methodology" in Group 3. At the within-group level, the highest scores were from Group 1 in "Structure of the Teaching Unit", and Groups 2, 3 and 4 in "Activities and stages".
A single-factor ANOVA was run to test for significant differences, and the differences in means detected were not significant. Table 11 collects the descriptive statistics for the learning perception score and the performance on the Didactic Unit. The difference in sample size between the learning perception questionnaire and the Didactic Unit performance scores is due to the fact that the latter evaluation was conducted by group.
On a descriptive level, a comparison of the scores in Table 11 shows that the higher scores are obtained for the perception of the learning achieved (regarding the structure of the Didactic Unit, the evaluation methods of the Didactic Unit and the active teaching methods in social sciences). However, regarding the phases of a Didactic Unit's activities, the performance levels are higher than the perceived ones.
To analyse whether the differences observed at a descriptive level are statistically significant, a Student's t test for independent samples was used. Statistically significant differences were found between perceived and achieved learning, both for the evaluation tools of a Didactic Unit and for the learning of different active teaching methods in social sciences.
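The comparison above relies on Student's t for independent samples. As a minimal sketch, assuming the classical pooled-variance (equal variances) form of the test, the statistic can be computed as follows; the two samples are hypothetical and the p-value, which requires the t distribution, is not computed.

```python
from math import sqrt
from statistics import mean, variance


def student_t_independent(a, b):
    """Pooled-variance Student's t statistic for two independent samples.

    Returns (t, degrees_of_freedom).
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error, df


# Hypothetical scores: perceived learning versus scores given by evaluators.
t_stat, dof = student_t_independent([4, 5, 6], [1, 2, 3])
```

The sign of t indicates which sample has the higher mean; here a positive value would correspond to perception exceeding performance.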

Discussion and conclusions
The data show that the students had a very positive opinion of the strategies and techniques used in the training program. Increased participation, greater autonomy and the ability to tackle different learning styles [36-38, 59, 75], as well as better commitment towards learning [20,57,76,77], are some of the factors that would explain this. Although there were no statistically significant differences in terms of groups and gender overall, Group 1 did score gamification strategies higher.
As regards perception of learning, the results are again very positive. This perception of learning expressed by the students themselves is in line with other research on the use of gamification and the flipped-classroom [49][78][79][80]. In this case there were significant differences. Group 1 students and female participants perceived that they had learned more. There is a wealth of literature that has addressed differences in the perception of the use of technology and digital literacy according to gender [81,82]. In this training program, in which ICTs played an important role, it was women who showed a better perception of learning and a greater appreciation of the program. These results differ from a large part of the studies, which indicate notable differences in the use and usefulness of ICT [83]. We interpret the women's more positive opinions of the program and the techniques as being related more to their innovative potential than to the use of ICT. The supposed gender digital divide [84] must, therefore, be taken with extreme caution, and the results should be analyzed from different conceptions of what innovation means [85]. The differences for Group 1 are explained first by its special characteristics (bilingual group), and second by its greater valuation of techniques and strategies linked to gamification. These techniques are related to the increase in student motivation [9]. The data seem to indicate that this motivation was generally associated with a greater self-perception of learning. This was a group with a higher academic level than the rest of the groups, and it valued the motivation techniques more positively.
The analysis of the learning outcomes through the evaluation of the teaching units designed by the students shows satisfactory results. However, this score was lower than the self-perception of learning. The students believed that they had learned more than the final results showed after the teachers' evaluations. Group 1 had the best results, although the difference with the rest of the groups was not statistically significant. The structure of the teaching units and the activities were the items with the best outcome among the teachers in training. On the other hand, the methodological justification at the theoretical level did not have such satisfactory results. University-level training must begin to take into account that international studies and reports propose adopting these key trends in the short term, leading to a change in practices in educational contexts. In our study, students valued very positively the use of videos watched in advance through the flipped-classroom method, as in other studies [46]. This made it possible to change the traditional lesson and to introduce group work activities. The use of in-class time for these activities resulted in more positive feedback from students. Student learning, as determined by the teaching units designed, was affected positively [45]. The good evaluation of the program was also enhanced by the use of gamification techniques. If the use of the flipped-classroom was a methodological change in which students acquired a more active role, the use of gamification kept motivation high in daily work [20]. University education and initial teacher training must adapt to these challenges and the demands of today's society, taking into account the emerging trends that professionals will encounter in their immediate future [86,87].
Not all published experiences on gamification reflect these positive results [19,38]. Hence, from the conclusions of these experiences, we interpret that the satisfactory data obtained in our research are due to our students being informed of the working method from the outset [37] and to the method being accepted once the educational objectives associated with the program had been established. The learners involved themselves in the learning process by watching the videos and so arrived prepared in the classroom. In tandem with the dynamics of the flipped-classroom and gamification, we could also count on cooperative work to consolidate learning and internal motivation [9].
According to the data obtained, the teachers in training acquired specific competences in the proposal of training activities for teaching social sciences. However, their learning with regard to the theoretical conceptualization of these proposals was more superficial. This seems to be one of the weak points of the program, which placed more emphasis on technical and design skills. The program needs to be reviewed in order to provide future teachers with a stronger theoretical framework on which to base their training proposals. In addition, in order to corroborate the effects of this program, more in-depth evaluative research is needed with larger numbers of learners and in different contexts.