Educational improvement through machine learning: Strategic models for better PISA scores

Bilal Baris Alkan; Serafettin Kuzucuk; Şevki Yetkin Odabasi; Leyla Karakuş

doi:10.1371/journal.pone.0326121

Abstract

In this study, in addition to traditional variables such as economic wealth or the number of books read, on which many studies have already been conducted, variables that are thought to influence student achievement and better predict success are identified. The Random Forest algorithm was used to identify important variables based on the PISA 2018 data, covering all three domains of science, mathematics and reading. The study found that the main factors influencing the success of students in countries that perform well in the PISA exam are access to information technology, weekly hours of instruction in the subject, economic, social and cultural status, parents’ occupation, level of metacognition, awareness of PISA, sense of competition and attitudes towards reading. New prediction models based on these variables were proposed. The proposed models will give a significant advantage to policy makers who want to improve their country’s PISA score and implement appropriate education policies.

Citation: Alkan BB, Kuzucuk S, Odabasi ŞY, Karakuş L (2025) Educational improvement through machine learning: Strategic models for better PISA scores. PLoS One 20(7): e0326121. https://doi.org/10.1371/journal.pone.0326121

Editor: Tülin Otbiçer Acar, Parantez Education Research Consultancy and Publishing Services, TÜRKIYE

Received: February 18, 2025; Accepted: May 24, 2025; Published: July 2, 2025

Copyright: © 2025 Alkan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: PISA 2018 data are freely accessible from the OECD website: https://www.oecd.org/pisa/data/2018database/.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Today, various measurement and evaluation tools are used worldwide to analyze and compare both the quality of educational practices and student achievement [1]. The best known and most effective comparative assessment program among them is the Program for International Student Assessment (PISA) [2]. OECD countries benefit greatly from the results of this program when evaluating and improving their education systems. PISA, which is conducted every three years by the OECD, assesses the skills of 15-year-old students in reading, mathematics and science and provides a valuable basis for international comparisons [3]. In contrast to traditional measurement and assessment methods, PISA focuses on assessing students’ critical thinking and practical problem-solving skills. In this sense, it aims to measure not only students’ access to knowledge, but also how they process and apply it in real-world contexts. As the need to raise educational standards in a globalized world becomes ever greater, PISA has grown in importance [4]. The results of the program provide an objective basis for understanding the position and performance of a country’s education system in an international comparison and promote a climate of competition [5]. In addition, these data allow countries to assess structural differences, policy orientations and implementation outcomes in their education systems. PISA also facilitates the investigation of the causes of individual success or failure and provides important insights for the development of strategies to improve student performance [6]. In order to gain relevant insights, the PISA data have been used in a variety of studies for various analyses and comparisons. A close examination of the purpose and content of these studies reveals that most focus on comparing countries’ performance in science, reading and maths and attempt to identify the determinants of success or failure in order to inform educational policy makers [7–12].
Studies that aim to improve the quality of learning processes and provide meaningful insights have also gained attention [10,13–15]. However, much of this literature relies on a limited number of variables and rarely applies advanced or innovative analytical techniques, so it has limited explanatory power. In general, student achievement is explained by commonly accepted variables such as socioeconomic status or the number of books in the household, which are well known and often do not require robust empirical testing. This underscores the need for researchers and policy makers to explore additional factors that are as important as, or even more effective than, socioeconomic status and that contribute to student achievement.

In this context, the rich, multivariate and culturally diverse structure of the PISA data requires not only traditional statistical analysis, but also the application of artificial intelligence-based algorithms. While many studies limit themselves to regression or correlation analyses [16,17], they often overlook non-linear relationships, interactions between variables and contextual dependencies. In recent years, the increasing use of machine learning algorithms in educational research has shown that they have great potential to explain complex phenomena and make accurate predictions [18,19].

On the other hand, recent studies have shown that methods such as the Random Forest Algorithm (RFA) offer unique advantages in this area. In particular, its ability to detect nonlinear interactions and assess the relative importance of variables in large training datasets is remarkable [20–22]. While traditional approaches often ignore complex dependencies and interaction effects, RFA can uncover complex patterns that classical statistical models cannot detect. This property has made it a particularly valuable tool for comparative educational research. However, despite all this potential, the use of RFA in PISA-based studies is still limited, pointing to an important methodological gap that our study attempts to fill [23].

The main objective of this study is to identify the factors that influence student achievement in OECD countries with above-average performance in reading, maths and science, beyond traditional variables such as economic wealth or the number of books read, and to develop a predictive model based on these variables that can provide functional insights to policy makers in countries seeking to improve their PISA results. The following research questions guide the study:

With this in mind, to answer the research questions, the Random Forest Algorithm was applied to 24 categorical variables based on the PISA (2018) results in science, reading and mathematics and on student information for 37 OECD countries. RFA, an ensemble learning method consisting of multiple decision trees, has been shown to be effective both in classification tasks and in determining the importance of variables [24]. Especially in complex social systems such as education, where traditional statistical models may not be able to capture nonlinear and high-dimensional relationships, RFA offers significant advantages. Its ability to work with large data sets, tolerate missing values and deal with multicollinearity has led to its increasing use in the field of educational data mining [25]. In this study, RFA was used not only to create powerful predictive models, but also to provide decision makers with an evidence-based prioritization of influential variables. The analysis, conducted using the R software, identified six variables for reading, six for mathematics and eight for science as the most important factors influencing student performance. Access to information technology, weekly hours of instruction in the respective subject area, economic, social and cultural status, parents’ occupation and level of metacognition were found to be important variables common to all three areas. The remaining variables are specific to individual domains: weekly hours of instruction in a foreign language for the reading domain; a sense of competition for the math domain; and, for the science domain, awareness of PISA, expressed as the perception of its difficulty, and the emotional state related to the act of reading.
Consequently, the determination of these variables is very important, especially when it comes to helping teachers and education authorities in the countries concerned to make correct and effective decisions in policy-making processes. In this context, it is clear that the study has a supportive and guiding character.

To support this aim and ensure a coherent flow of research, the structure of the article has been designed to reflect a logical sequence from context to conclusion. The Literature Review section discusses the contribution of PISA data to educational policy, identifies the key variables influencing student achievement and explores the limitations of traditional methods of analysis, providing a sound theoretical and empirical basis for the study. In this context, the importance of machine learning-based approaches, in particular the Random Forest Algorithm (RFA), is emphasized and the reasons for its selection are explained. The “Materials and methods” section provides a detailed description of the PISA 2018 dataset used in the study, covering aspects such as the selection of the sample, the coding of the variables and the data pre-processing procedures. It also outlines the reasons for the choice of the RFA and explains how it was implemented in the analysis. The “Findings and results” section presents the variables that have the greatest impact on student achievement in reading, math, and science. The relative importance of these variables and their common or domain-specific effects are interpreted and illustrated using visual representations. Finally, the “Discussion and conclusions” section situates the findings within the wider literature and evaluates the methodological contributions of the study. Practical recommendations for policy makers are proposed, and directions for future research are suggested.

Background on PISA

The effects of recent technological developments can be clearly felt in educational research. With the help of current methods and techniques such as data mining and machine learning, a large number of studies have been conducted with the aim of evaluating the quality of the education system, accelerating the learning and teaching process, investigating the factors that influence the quality of learning activity or directly affect learning success, and contributing to policy development processes in this direction. Among these studies, those based on the PISA data are particularly noteworthy because their scope and impact are international. Table 1 lists the studies that have been conducted based on PISA data to identify the factors that affect students’ learning outcomes and to suggest how education policies should be designed in this regard.

Table 1. PISA-based academic studies.

https://doi.org/10.1371/journal.pone.0326121.t001

If we examine the studies by purpose and year, we find that most of the studies published in the 2010s aimed to demonstrate the contributions of PISA so that its results could be used as an international basis and target for guiding education policy. It becomes clear that the benefits of PISA, such as the improvement of a country’s education policies and activities and the possibility of international comparison, were taken for granted, and that the studies were conducted with the aim of making direct functional use of the PISA data and investigating the factors that influence student performance.

When the existing literature is examined in terms of the methodology employed, previous studies applying the Random Forest Algorithm (RFA) to PISA data [12,20–22] are seen to have generally focused on a single domain or used a limited set of variables. In contrast, the present study offers a methodological and practical contribution by simultaneously modelling student performance in three domains (reading, mathematics, and science), utilizing a broader variable set, and presenting the variable importance results in a comparative format. The choice of the Random Forest algorithm is not only aligned with the scope and structure of the study but also supported by the literature on its high accuracy and compatibility with categorical data. The RFA has outperformed logistic regression in analyses with categorical data [26]. In a comparative study with 115 binary data sets and 14 different algorithms, it achieved the highest accuracy of all methods tested [27]. It also provided the most successful results compared to CART, C5.0, Naïve Bayes, Linear Discriminant Analysis and K-Nearest Neighbour in similar comparative analyses [28]. In an analysis with PISA 2018 data, it provided higher accuracy than Hierarchical Linear Modeling [29]. It even outperformed Artificial Neural Networks and Support Vector Machines in predicting student satisfaction [30]. A review of 72 education-related studies conducted between 2015 and 2023 emphasized its strong fit with educational datasets and its ability to provide interpretable and meaningful predictions [11]. In light of these findings, the RFA was employed in the present study as a functional tool for identifying variable importance, constructing predictive models, and generating threshold values. Its methodological flexibility and analytical precision played a key role in achieving the core objectives of the study.

Materials and methods

This section provides detailed information on the research data and the methodological procedure. The study was carried out to identify the main variables affecting the successful performance of countries with a score above the OECD average in reading, mathematics, and science, and to create a prediction model, based on these variables, for other countries aiming to increase their PISA achievement.

Obtaining data

The dataset of the study consists of the PISA (2018) scores of 37 OECD countries in science, reading and mathematics, together with other variables containing information about the students who took part in PISA. An examination of the data set showed that there were no missing data. Subsequently, the non-categorical variables that provide information about the students were categorized. Table 2 gives the abbreviations and explanations of the 24 categorical variables obtained from the PISA exam and used in the study.

Table 2. Information on the variables whose effect on PISA scores was examined.

https://doi.org/10.1371/journal.pone.0326121.t002

Preparing data for analysis

The data set was subjected to comprehensive pre-processing before being analysed. Firstly, the continuous variables were converted into meaningful categories and made usable for the analysis. Distribution characteristics and applications in the literature were taken into account when categorising the variables. The variables that were considered necessary were coded and insignificant variables were removed. In addition, the names of the variables were standardised according to the abbreviations. Countries’ scores in science, reading and maths were assessed on the basis of the OECD average, which was taken to be 487 points in reading, 489 points in maths and 489 points in science. Countries that scored above these thresholds were categorised as high-performing countries, while countries that scored below were categorised as low-performing countries. Thus, the performance levels of the countries were reduced to two categories and a suitable structure for the classification algorithms was created.
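
The binarisation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout and the example scores are hypothetical, and a score exactly equal to the OECD average is treated as low-performing because the text only specifies "above" and "below".

```python
# OECD averages reported in the text for PISA 2018.
OECD_AVERAGE = {"reading": 487, "maths": 489, "science": 489}


def performance_class(domain: str, score: float) -> str:
    """Return 'high' if the score is above the OECD average, else 'low'.

    Assumption: a score equal to the average counts as 'low'.
    """
    return "high" if score > OECD_AVERAGE[domain] else "low"


# Hypothetical country scores, for illustration only (not PISA data).
example_scores = {"reading": 503.0, "maths": 489.0, "science": 471.5}
labels = {d: performance_class(d, s) for d, s in example_scores.items()}
print(labels)  # {'reading': 'high', 'maths': 'low', 'science': 'low'}
```

The resulting two-level label is the kind of class variable a classification algorithm can then be trained on.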

Random forest application

When analysing the data, the random forest algorithm was preferred in order to determine the importance of the variables and create a prediction model. This algorithm is a machine learning method that works on the basis of decision trees and shows high performance in both classification and regression problems [24]. Random Forest can handle high-dimensional data structures, minimize multicollinearity problems and objectively represent the ranking of variable importance [25].

The structure of the categorical data used in the study further emphasizes the advantages of the random forest algorithm. The algorithm in question was favoured not only in terms of predictive success, but also in terms of its ability to define complex interactions between variables. In this context, recent studies such as [20,22,23] emphasise that the RFA method is increasingly used in educational data mining.

In the analysis, the data set was imported into R (version 4.3) and the Random Forest algorithm was applied via the Boruta package. Boruta is a wrapper method based on Random Forest that calculates the importance of the variables using Z-scores [31]. With this package, the predictive power of all variables in the data is tested and only the significant variables are included in the model. During the analysis, a “shadow copy” was created for each variable and the Z-score of each original variable was compared with those of the shadow copies; only variables with unique information value were included in the model. This increased the accuracy of the model and its degree of explainability.
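
The shadow-copy idea can be illustrated with a deliberately simplified Python sketch. It is not the Boruta package and not the authors' code: absolute Pearson correlation stands in for the Random Forest Z-score importance, the hit counts are reported without the statistical test that Boruta applies to them, and all names and the toy data are hypothetical.

```python
import random


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def shadow_hits(features, target, n_runs=20, seed=0):
    """Count, per feature, the runs in which it beats the best shadow copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_runs):
        # A shadow is a shuffled copy: same values, link to the target broken.
        shadow_scores = []
        for values in features.values():
            shadow = list(values)
            rng.shuffle(shadow)
            shadow_scores.append(abs_correlation(shadow, target))
        best_shadow = max(shadow_scores)
        # A feature scores a hit only if it is strictly better than every shadow.
        for name, values in features.items():
            if abs_correlation(values, target) > best_shadow:
                hits[name] += 1
    return hits


# Toy data (hypothetical): 40 units with a binary high/low target.
target = [1] * 20 + [0] * 20
features = {
    "informative": list(target),  # perfectly aligned with the target
    "constant": [2] * 40,         # carries no information at all
}
hits = shadow_hits(features, target)
print(hits)
```

A variable that repeatedly beats the best shadow copy carries information beyond chance, while one that never does is a candidate for removal; the real method makes this decision from importance Z-scores and a formal test rather than from raw counts.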

Visualization of analysis results

In order to better understand the results of the analysis, the importance levels of the variables and their classification successes are shown graphically in Figs 1 and 2. The prominent variables for each domain (reading, mathematics and science) are visualized separately; this facilitates the comparison of the determining factors in different domains. These visualizations support both scientific interpretation and a more effective presentation of conclusions to policy makers.

Fig 1. Most Important Variables in Reading (a), Science (b) and Math (c) Sections.

https://doi.org/10.1371/journal.pone.0326121.g001

Fig 2. Measurement areas and important variables in PISA 2018.

https://doi.org/10.1371/journal.pone.0326121.g002

Findings and results

In this part of the study, findings and results of the data analyses conducted to answer the research questions are presented.

RQ1: The most important variables affecting students’ PISA success

In line with the first research objective of determining the most important variables affecting students’ PISA achievement, standardised PISA scores (science, mathematics, reading) were obtained for the OECD countries and a classification variable was created according to the success or failure of each country. Once the class variable had been determined, an analysis was carried out to establish which of the variables categorised in the previous stage, and thought to affect the achievement of the countries, had the most significant effect on achievement. The first findings of this analysis are presented in Fig 1 (a–c).

When the box plots, in which green, yellow and red boxes represent important, tentatively important and unimportant variables respectively, are examined in detail, it is seen that 9 variables are important in total. These are ICTRES (ICT resources), CPERFORLAN (number of [class periods] per week typically required to attend in foreign language), CPERWEEK (in a normal, full week at school, how many [class periods] are you required to attend in total?), METASPAM (Meta-cognition: assess credibility), ESCS (Index of economic, social and cultural status) and HISEI (Index highest parental occupational status) for reading; COMPETE (Competitiveness), HISEI, METASPAM, ESCS, ICTRES and CPERWEEK for mathematics; and ICTRES, CPERWEEK, PISADIFF (Perception of difficulty of the PISA test), METASPAM, COMPETE, JOYREAD (Joy/Like reading), ESCS and HISEI for science. Some of these variables have an effect on achievement in more than one field, albeit at different levels. The variables, together with their areas of influence and intersections according to the results of the analyses, are presented in Fig 2 in a more accessible form.

When Fig 2 is analysed in detail, it is seen that the ICTRES, CPERWEEK, ESCS, METASPAM and HISEI variables are among the most important variables for the science, mathematics and reading sections alike, despite their different degrees of importance. Another notable result is that the CPERFORLAN variable is important for the reading section, the PISADIFF and JOYREAD variables are important for the science section, and the COMPETE variable is important for both the maths and science sections. The average importance coefficients of these variables in each domain are given in Table 3.
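
The overlaps described above can be checked with plain set operations. The following Python sketch uses only the variable lists reported in the text for each domain; the Python variable names themselves are illustrative.

```python
# Important variables per domain, as listed in the text.
reading = {"ICTRES", "CPERFORLAN", "CPERWEEK", "METASPAM", "ESCS", "HISEI"}
maths = {"COMPETE", "HISEI", "METASPAM", "ESCS", "ICTRES", "CPERWEEK"}
science = {"ICTRES", "CPERWEEK", "PISADIFF", "METASPAM",
           "COMPETE", "JOYREAD", "ESCS", "HISEI"}

common = reading & maths & science          # shared by all three domains
all_important = reading | maths | science   # every distinct variable
reading_only = reading - maths - science
science_only = science - reading - maths
maths_and_science_only = (maths & science) - reading

print(sorted(common))                  # ['CPERWEEK', 'ESCS', 'HISEI', 'ICTRES', 'METASPAM']
print(len(all_important))              # 9
print(sorted(reading_only))            # ['CPERFORLAN']
print(sorted(science_only))            # ['JOYREAD', 'PISADIFF']
print(sorted(maths_and_science_only))  # ['COMPETE']
```

The union contains 9 distinct variables, which agrees with the total reported for Fig 1, and the intersection reproduces the five variables common to all domains.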

Table 3. Coefficient table of determined variables (reading, maths and science).

https://doi.org/10.1371/journal.pone.0326121.t003

When Table 3 is analysed in detail, 6 variables are determined as the most important variables for the Reading section. The variable with the highest average importance coefficient is CPERWEEK (7.266618) and the variable with the lowest importance coefficient is METASPAM (3.666450). When the coefficient estimates obtained from the average importance coefficients are analysed, the most important variable in the Reading section is CPERWEEK (.844). The variables with the lowest prediction coefficient among the most important variables are METASPAM (.124) and ICTRES (.124).

For the mathematics section, 6 variables were determined as the most important variables. It is seen that the variable with the highest average importance coefficient is COMPETE (7.060205) and the variable with the lowest average importance coefficient is CPERWEEK (4.434677). When the coefficient estimates obtained in this section are analysed, it is seen that the highest coefficient of estimation is COMPETE (.628) and the lowest coefficient of estimation is CPERWEEK (.073).

For the science section, 8 variables were determined as the most important variables. It is seen that the variable with the highest average importance coefficient is METASPAM (8.875383) and the variable with the lowest importance coefficient is COMPETE (3.629179). When the coefficient estimates obtained in this section are analysed, the variable with the highest coefficient of estimation is METASPAM (.550) and the variable with the lowest coefficient of estimation is COMPETE (.029).

RQ2: Proposal for PISA score prediction models

The equations obtained from the coefficient estimates obtained through the model are presented in Model 1 for Reading, Model 2 for Maths and Model 3 for Science.

When the total coefficient values obtained from the models are analysed, a country with a score greater than 3.219 on the Reading model is predicted to be successful. Likewise, a country with a score greater than 2.396 in Maths and greater than 2.732 in Science is predicted to be successful. Based on these results, countries can use the output of the models as an early warning system. Thus, the answer to the second research question was obtained.
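
A minimal Python sketch of how such cut-offs might be applied is given below. It assumes that each model is a weighted sum of category codes, which is an assumption about the form of Models 1 to 3 rather than something stated in this section; the function names and the category codes are hypothetical. Only the three reading coefficients quoted in the text are used, so the computed value is a partial score and should not itself be compared with the cut-off.

```python
# Cut-off scores reported in the text for each domain model.
SUCCESS_THRESHOLD = {"reading": 3.219, "maths": 2.396, "science": 2.732}


def model_score(coefficients, category_codes):
    """Weighted sum of category codes (assumed linear form of the models)."""
    return sum(coefficients[name] * category_codes[name] for name in coefficients)


def predicted_successful(domain, score):
    """True when a full model score exceeds the reported cut-off."""
    return score > SUCCESS_THRESHOLD[domain]


# Only three of the six reading coefficients are quoted in the text.
reading_coefficients = {"CPERWEEK": 0.844, "METASPAM": 0.124, "ICTRES": 0.124}
# Hypothetical category codes for one country (illustration only).
country_codes = {"CPERWEEK": 3, "METASPAM": 2, "ICTRES": 1}

partial_score = model_score(reading_coefficients, country_codes)
print(round(partial_score, 3))  # 2.904 (partial: three of six terms)

# Applying the cut-offs to hypothetical full model scores.
print(predicted_successful("reading", 3.5))  # True
print(predicted_successful("maths", 2.0))    # False
```

Used this way, the cut-offs act as the early warning signal described above: a full model score at or below the threshold flags a domain that may need attention.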

Discussion and conclusions

This study was conducted using data from PISA 2018 to identify the main factors influencing student performance in high-performing OECD countries and to develop a predictive model based on these variables. With the RFA, not only non-linear relationships but also the interactions and rank order of importance of the variables were analysed. In this respect, the RFA went beyond classical statistical methods and offered more contextual and explanatory analysis possibilities; it recognised patterns that were not visible with traditional models [22,23]. One of the main reasons why the RFA was preferred is that it allows the simultaneous modelling of a large number of independent variables in high-dimensional data sets. This algorithm offers a great advantage, especially in terms of calculating the relative importance of the variables and mitigating multicollinearity problems [25]. However, the RFA does not directly show causal relationships; it only ranks the predictive power of the variables. Therefore, caution is required when interpreting the results obtained. It is recommended to test possible cause-effect relationships with supporting analyses.

The results of the analysis show that the most important variables influencing students’ PISA success in all subjects are access to information technology, weekly hours of instruction in each subject, metacognitive awareness, socioeconomic and cultural status, and parents’ occupation. In addition, weekly instruction in a foreign language was found to be significant for reading; competition for maths and science; and perception of test difficulty (awareness of PISA) and students’ emotional engagement in reading for science.

When examining the variables that influence PISA success, the variable of access to information technology should be mentioned first, as it was identified as important in all three areas. Tools and applications based on information technologies have been found to provide students with quick access to relevant sources of information, thereby facilitating their engagement in active learning environments and accelerating their learning processes [11,32–34]. On the other hand, the level of awareness of the use of technological tools and the use of relevant tools for educational purposes are very important for students’ academic success. Among the variables that affect PISA success, economic, social and cultural status and parents’ occupation are usually mentioned together. Socioeconomic status, family cultural structure, parents’ occupation, and their attitudes toward education have been shown to directly influence students’ academic success in numerous studies [35–38]. As a study by [39] shows, Vietnam, despite being one of the poorest countries in the world, performed very well in maths and reading in the 2012, 2015 and 2018 PISA exams compared to many other countries with high economic power. In this regard, it can be said that the economic level or socio-cultural disadvantages of students can be partially overcome with a good coaching and mentoring system. On the other hand, [40] have clearly shown in their studies the contribution of parental support to students’ success. There are also various studies in the literature that support the idea that parental support is directly related to the parenting skills of the parents in question [41].

When examining the variable of weekly instructional hours, which is one of the most important variables influencing PISA results, it is generally found that in all three subjects the number of weekly instructional hours, i.e., the amount of time devoted to learning the subject in question, has a direct impact on students’ academic performance. In their study, [42] state that time management is a variable that should be considered as part of the implementation of a correct educational policy. The same applies to the variable of weekly lessons in a foreign language, which was identified as an important variable for the reading section. The amount and continuity of lessons play a crucial role in improving students’ grammar, vocabulary and reading strategies [43].

Among the main variables that influence PISA success, another variable that attracts attention is metacognition. Metacognition refers to a person’s ability to understand, control and regulate their own thought processes. The relationship between academic success and metacognition is related to students’ ability to manage learning processes more effectively. As [44] emphasise, students with metacognitive skills are more likely to be able to set learning goals, develop strategies, monitor their learning progress and make adjustments when necessary. As [45] suggests, the use of metacognitive processes increases students’ deep understanding, problem-solving skills and critical thinking. This helps them to be more successful in exams, projects and overall academic performance. Metacognition is often positively associated with academic success because it allows students to be more intentional about their learning processes. Metacognition can therefore have a positive impact on students’ academic success as it enables them to utilise learning strategies effectively.

The variable that has a direct impact on PISA success in maths is competition. Competition refers to the rivalry between individuals or groups to achieve certain goals. This competition is often an incentive for better performance, success or the achievement of a specific goal. Maths lessons, where what is right or wrong can be clearly defined, provide a more suitable basis for performance comparisons, which encourages competition. However, in subjects such as physical education, science and reading, competition is not so obvious, as the subject is often based on subjective judgement and collaboration [46]. In these subjects, a variety of skills are often emphasised and the focus is on collaboration and understanding rather than competition.

Two further variables have a direct impact on PISA success only in science: the emotional state related to the act of reading and the perception of the difficulty of the PISA test. The enjoyment of reading supports students’ ability to understand and evaluate scientific texts. Positive reading motivation promotes deeper learning by increasing students’ interest in scientific topics. On the other hand, the perception of the difficulty of the test, i.e., the perception of the quality of the content of the texts read, influences the students’ attitude and their success in the exams. A high perception of difficulty creates stress and anxiety in students. This can lead to poor performance in science classes. Therefore, it is important that students are helped to develop a positive attitude towards the exams. However, the effects of these two variables can vary from student to student. As suggested by [47], students’ individual characteristics, learning styles and motivation levels interact with these factors. It is important for teachers to understand how students respond to these factors and support their individual learning needs in order to increase learning success in science courses.

Strategic recommendations for practitioners

The results obtained provide concrete data that can be utilised to shape educational policy. In this context, improving digital accessibility should not be limited to the provision of hardware, but should also be supported by the development of digital pedagogical content for teachers and students. The integration of metacognitive skills into the curriculum, especially through self-directed learning strategies, will help students to manage their learning processes more consciously and effectively. It is important not to increase the number of lessons only quantitatively, but to focus this process on learning outcomes by enriching it with skilled content. The competitive environment should be constructively encouraged, especially in areas that require objective assessment, such as maths. However, supporting co-operative learning activities in subjects such as science and reading can lead to more effective outcomes. In addition, counselling services for economically disadvantaged students and support systems that involve the family should be strengthened, which will help to reduce inequalities in education [39].

Limitations and future research

This study has some methodological limitations. First, the analysis covers only 37 OECD member countries. The results can therefore be generalised to countries outside the OECD only to a limited extent, and any such generalisation should take contextual differences into account. Furthermore, the study was limited to data at student level; variables at teacher, school or system level were excluded from the model. This restricts the scope of the model and limits the impact analysis to a specific level. In future studies, multilevel statistical models, the combination of different data sources and mixed-methods designs supported by qualitative interviews will help to obtain results that are both generalisable and explanatory.
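To make the multilevel suggestion concrete, one common form of such a model is the two-level random-intercept specification sketched below. It is offered only as an illustration and is not a model fitted in this study; all symbols are assumptions introduced here: $y_{ij}$ is the score of student $i$ in school (or country) $j$, $x_{ij}$ is a student-level predictor, and $z_j$ is a predictor measured at the higher level.

```latex
\begin{align}
  y_{ij} &= \beta_0 + \beta_1 x_{ij} + \beta_2 z_j + u_j + \varepsilon_{ij}, \\
  u_j &\sim \mathcal{N}(0, \tau^2), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \\
  \rho &= \frac{\tau^2}{\tau^2 + \sigma^2}
\end{align}
```

Here $\rho$ is the intraclass correlation, i.e., the share of the residual variance that lies between higher-level units. A student-level-only analysis cannot separate $\tau^2$ from $\sigma^2$, which is the restriction described above.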

General evaluation

This study has made the multidimensional structures behind PISA success more visible by using the RFA’s powerful classification and its ability to determine the degree of importance of variables, providing a remarkable framework for evidence-based policy making in education. The flexibility that RFA offers in explaining complex interactions that are often ignored by traditional methods is a significant advantage, especially in multivariate systems such as education [20,22]. The insights gained provide directional information not only about the factors that influence student success, but also about which components of the education system to focus on.

Indeed, recent OECD reports [48,49] clearly emphasise the influence of students’ access to digital resources, levels of metacognitive awareness and time spent learning on long-term success. The results of this research support and strengthen these findings through a machine learning-based approach to analysis. RFA can therefore be regarded not only as a classification tool, but also as a strategic method that can contribute to the development of more equitable, student-centred and effective educational policies.

In future studies, the application of similar methods in different groups of countries, educational systems or disciplines will both prepare the ground for comparative analyses and allow stronger conclusions to be drawn about the generalisability of this model.
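As an illustration of how a random forest can rank variables by importance, the following minimal sketch grows a small forest of regression trees and measures permutation importance on a hold-out set. It is not the authors’ implementation and uses synthetic data rather than PISA data; the variable labels, coefficients, tree depth and forest size are all assumptions chosen for the example. It also simplifies Breiman’s formulation [24], which permutes out-of-bag samples, by permuting a separate test set instead.

```python
import random
from statistics import mean

random.seed(0)

# Synthetic stand-in data (NOT PISA data). The outcome depends strongly on
# the first variable, weakly on the second, and not at all on the third.
NAMES = ["ict_access", "learning_hours", "unrelated"]  # hypothetical labels

def make_rows(n):
    X = [[random.random() for _ in NAMES] for _ in range(n)]
    y = [4 * r[0] + 1 * r[1] + random.gauss(0, 0.1) for r in X]
    return X, y

def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)

def grow(X, y, depth, mtry):
    """Grow a regression tree, trying `mtry` random variables at each node."""
    if depth == 0 or len(y) < 10:
        return mean(y)
    best = None
    for f in random.sample(range(len(NAMES)), mtry):
        # Up to 10 observed values serve as candidate thresholds.
        for thr in random.sample([r[f] for r in X], min(10, len(X))):
            left = [i for i, r in enumerate(X) if r[f] <= thr]
            right = [i for i, r in enumerate(X) if r[f] > thr]
            if not left or not right:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:
        return mean(y)
    _, f, thr, left, right = best
    return (f, thr,
            grow([X[i] for i in left], [y[i] for i in left], depth - 1, mtry),
            grow([X[i] for i in right], [y[i] for i in right], depth - 1, mtry))

def predict_tree(node, row):
    # Internal nodes are tuples; leaves are plain numbers.
    while isinstance(node, tuple):
        f, thr, low, high = node
        node = low if row[f] <= thr else high
    return node

def fit_forest(X, y, n_trees=30, depth=3, mtry=2):
    forest = []
    for _ in range(n_trees):
        idx = [random.randrange(len(X)) for _ in X]  # bootstrap sample
        forest.append(grow([X[i] for i in idx], [y[i] for i in idx], depth, mtry))
    return forest

def mse(forest, X, y):
    preds = [mean(predict_tree(t, r) for t in forest) for r in X]
    return mean((p - t) ** 2 for p, t in zip(preds, y))

def permutation_importance(forest, X, y):
    """Increase in hold-out MSE after shuffling one variable at a time."""
    base = mse(forest, X, y)
    result = {}
    for f, name in enumerate(NAMES):
        col = [r[f] for r in X]
        random.shuffle(col)
        X_perm = [r[:f] + [v] + r[f + 1:] for r, v in zip(X, col)]
        result[name] = mse(forest, X_perm, y) - base
    return result

X_train, y_train = make_rows(200)
X_test, y_test = make_rows(100)
forest = fit_forest(X_train, y_train)
importance = permutation_importance(forest, X_test, y_test)
for name, gain in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {gain:.3f}")
```

A variable whose shuffling raises the prediction error substantially is one the forest relies on, while a value near zero indicates a variable that contributes little. Rankings of this kind describe predictive relevance within the fitted model and should not be read as causal effects.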

Supporting information

S1 Data. Dataset containing the variables influencing PISA scores.

https://doi.org/10.1371/journal.pone.0326121.s001

(XLSX)

References

1. Barrenechea I, Beech J, Rivas A. How can education systems improve? A systematic literature review. J Educ Change. 2022;24(3):479–99.
2. Ledger S, Thier M, Bailey L, Pitts C. OECD’s approach to measuring global competency: powerful voices shaping education. Teach Coll Rec. 2019;121(8):1–40.
3. Pons X. Fifteen years of research on PISA effects on education governance: a critical review. Eur J Educ. 2017;52(2):131–44.
4. Breakspear S. The policy impact of PISA: an exploration of the normative effects of international benchmarking in school system performance. OECD Educ Work Pap. 2012;71.
5. Niemann D, Martens K, Teltemann J. PISA and its consequences: shaping education policies through international comparisons. Eur J Educ. 2017;52(2):175–83.
6. Auld E, Xiaomin L, Morris P. Piloting PISA for development to success: an analysis of its findings, framework and recommendations. Compare. 2020;52(7):1145–69.
7. Fredriksson U, Holzer T, McCluskey-Cavin H, Taube K. Strengths and weaknesses in the Swedish and Swiss education systems: a comparative analysis based on PISA data. Eur Educ Res J. 2009;8(1):54–68.
8. Waldow F, Takayama K, Sung Y-K. Rethinking the pattern of external policy referencing: media discourses over the ‘Asian Tigers’’ PISA success in Australia, Germany and South Korea. Comp Educ. 2014;50(3):302–21.
9. Sälzer C, Roczen N. Assessing global competence in PISA 2018: challenges and approaches to capturing a complex construct. Int J Develop Educ Glob Learn. 2018;10(1).
10. dos Santos RA, Paulista CR, da Hora HRM. Education data mining on PISA 2015 best ranked countries: what makes the students go well. Tech Know Learn. 2021.
11. Kim M, Kim H. Profiles of students’ ICT use in high-performing countries in PISA 2018. Comput Sch. 2023;40(3):262–81.
12. Liu A, Wei Y, Xiu Q, Yao H, Liu J. How learning time allocation make sense on secondary school students’ academic performance: a Chinese evidence based on PISA 2018. Behav Sci (Basel). 2023;13(3):237. pmid:36975262
13. Agasisti T, Longobardi S. Equality of educational opportunities, schools’ characteristics and resilient students: an empirical study of EU-15 countries using OECD-PISA 2009 data. Soc Indic Res. 2016;134(3):917–53.
14. Lopez AACC, Gamazo A. Multilevel study about the explanatory variables of the results of Mexico in PISA 2015. Educ Policy Anal Arch. 2020;28.
15. Mazurek J, Fernández García C, Pérez Rico C. Inequality and students’ PISA 2018 performance: a cross-country study. CER. 2021;24(3):163–83.
16. Hanushek EA, Woessmann L. The economics of international differences in educational achievement. In: Hanushek EA, Machin S, Woessmann L, editors. Handbook of the economics of education. Amsterdam: Elsevier; 2011. p. 89–200.
17. Jerrim J. Why do East Asian children perform so well in PISA? An investigation of Western-born children of East Asian descent. Oxf Rev Educ. 2013;39(4):510–38.
18. Baker RS, Inventado PS. Educational data mining and learning analytics. In: Peña-Ayala A, editor. Learning analytics: fundaments, applications, and trends. Cham: Springer; 2014. p. 61–75.
19. Koedinger KR, D’Mello S, McLaughlin EA, Pardos ZA, Rosé CP. Data mining and education. Wiley Interdiscip Rev Cogn Sci. 2015;6(4):333–53. pmid:26263424
20. Tan B, Cutumisu M. Employing tree-based algorithms to predict students’ self-efficacy in PISA 2018. Proc Educ Data Min Conf; 2022. p. 73–9.
21. Hong S, Kim J. Random forest analysis of factors predicting science achievement groups. Int J Innov Sci Math. 2022;10(1):12–9.
22. He Q, von Davier M. Analyzing process data from problem-solving items with n-grams: insights from a computer-based large-scale assessment. Front Psychol. 2016;7:773.
23. Zhang M, Zhao Y, Liu Y. An interpretable machine learning model for predicting student performance in PISA. Comput Educ. 2020;156:103952.
24. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
25. Romero C, Ventura S. Educational data mining and learning analytics: an updated survey. WIREs Data Min Knowl. 2020;10(3):e1355.
26. Prakash S, Bhanu S, Bigul SD. Random forest and logistic regression algorithms: a comparison of their performance. AIP Conference Proceedings; 2023. 050013 p. https://doi.org/10.1063/5.0118420
27. Wainer J. Comparison of 14 different families of classification algorithms on 115 binary datasets. arXiv. 2016. arXiv:1606.00930.
28. Brohi S, Pillai TR, Kaur S, Kaur H, Sukumaran S, Asirvatham D. Accuracy comparison of machine learning algorithms for predictive analytics in higher education. In: Saeed F, Mohammed F, Gazem N, Busalim A, editors. Emerging technologies in computing. Cham: Springer; 2019. p. 223–35. https://doi.org/10.1007/978-3-030-23943-5_19
29. Weng W, Luo W. A comparative analysis of data mining methods and hierarchical linear modeling using PISA 2018 data. IJDMS. 2023;15(2/3):1–16.
30. Supriyadi D, Purwanto P, Warsito B. Comparison of random forest algorithm, support vector machine and neural network for classification of student satisfaction towards higher education services. AIP Conference Proceedings; 2022. 060003 p. https://doi.org/10.1063/5.0106201
31. Kursa MB, Rudnicki WR. Feature Selection with the Boruta package. J Stat Soft. 2010;36(11):1–13.
32. Kong S, Chan TW, Griffin P, Hoppe U, Huang R, Kinshuk, et al. Digital learning for developing Asian countries: achieving equity, quality, and efficiency in education. Educ Technol Res Dev. 2022;70(1):1–5.
33. Wang Y, Wang L. The impact of digital learning on student performance: evidence from PISA 2018. J Educ Comput Res. 2023;61(2):345–62.
34. Agasisti T, Avvisati F, Borgonovi F. The role of digital technologies in education: evidence from PISA 2018. OECD Educ Work Pap. 2023;(249):1–35.
35. Agasisti T, Avvisati F, Borgonovi F. Socio-economic status and student performance: evidence from PISA 2018. OECD Educ Work Pap. 2021;2021:1–29.
36. Eriksson K, Lindvall J, Svensson T. Family background and student achievement: evidence from PISA 2018. Scand J Educ Res. 2021;65(3):1–15.
37. Rolfe V. The impact of parental occupation on student achievement: a PISA 2018 analysis. Br J Sociol Educ. 2021;42(4):567–84.
38. Perry B. Cultural capital and educational outcomes: a study of PISA 2018 data. Int J Educ Res. 2022;112:101921.
39. Dang HAH, Glewwe P, Lee J. Vietnam’s exceptional performance on the PISA test: lessons for other developing countries. J Dev Econ. 2023;156:102839.
40. Fernández-Alonso R, Suárez-Álvarez J, Muñiz J. Parental involvement and academic achievement: a meta-analysis. Rev Educ. 2022;400(1):1–25.
41. Crede M, Roch SG, Kieszczynka UM. Class attendance in college: a meta-analytic review of the relationship of class attendance with grades and student characteristics. Rev Educ Res. 2015;85(2):272–95.
42. Liu Y, Li H, Zhang J. Time management and academic performance: evidence from PISA 2018. Eur J Educ. 2023;58(1):1–17.
43. Pitsia V. The role of foreign language instruction in reading achievement: insights from PISA 2018. Lang Learn J. 2022;50(2):123–35.
44. Gul R, Shehzad S. Metacognitive awareness and academic achievement among university students. J Educ Pract. 2012;3(3):1–7.
45. Abdelrahman R. The impact of metacognitive strategies on academic achievement: a meta-analysis. Int J Instr. 2020;13(2):1–16.
46. Lens W. Achievement motivation and future time perspective theory: the time to turn. Psicothema. 2005;17(2):225–31.
47. Hanus MD, Fox J. Assessing the effects of gamification in the classroom: a longitudinal study on intrinsic motivation, social comparison, satisfaction, effort, and academic performance. Comput Educ. 2015;80:152–61.
48. OECD. OECD digital education outlook 2023: towards an effective digital education ecosystem. OECD Publishing; 2023. https://doi.org/10.1787/c74f03de-en
49. OECD. Shaping digital education: enabling factors for quality, equity and efficiency. OECD Publishing; 2023. https://doi.org/10.1787/bac4dc9f-en
