Abstract
The employment of college students is an important issue that affects national development and social stability. In recent years, the growing number of graduates, rising employment pressure, and the epidemic have made the phenomenon of 'slow employment' increasingly prominent, making it an urgent problem to solve. Data mining and machine learning methods can be used to analyze and predict the employment prospects of graduates and to provide effective employment guidance and services for universities, governments, and graduates, offering a feasible way to alleviate the 'slow employment' problem. Therefore, this study proposes a feature selection prediction model (bGEBA-SVM) based on an improved bat algorithm and support vector machine, built on data from 1,694 college graduates of the class of 2022 in Zhejiang Province. To improve the search efficiency and accuracy of the optimal feature subset, this paper proposes an enhanced bat algorithm based on a Gaussian distribution-based strategy and an elimination strategy for optimizing the feature set. The training data are then input to the support vector machine for prediction. The proposed method is evaluated against peer methods and well-known machine learning models on the IEEE CEC2017 benchmark functions, public datasets, and a graduate employment prediction dataset. The experimental results show that bGEBA-SVM obtains higher prediction accuracy, reaching 93.86%. In addition, further education, student-leader experience, family situation, career planning, and employment structure are the characteristics most relevant to employment outcomes. In summary, bGEBA-SVM can be regarded as an employment prediction model with strong performance and high interpretability.
Citation: Wei Y, Rao X, Fu Y, Song L, Chen H, Li J (2023) Machine learning prediction model based on enhanced bat algorithm and support vector machine for slow employment prediction. PLoS ONE 18(11): e0294114. https://doi.org/10.1371/journal.pone.0294114
Editor: A. L. Mahfoodh, UNITEN: Universiti Tenaga Nasional, MALAYSIA
Received: May 2, 2023; Accepted: October 23, 2023; Published: November 9, 2023
Copyright: © 2023 Wei et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The data involved in this study are all public data, which can be downloaded through the link (https://github.com/Forproject1111/bGEBA-SVM).
Funding: This work was supported in part by the Philosophy and Social Science Foundation in Zhejiang Province of China under Grant 23GXSZ68YB, the Humanities and Social Sciences Program of the Ministry of Education under Grant 23JDSZ3236, the Scientific and Technological Innovation Project of Zhejiang University Students under Grant 2023R253002 and the Philosophy and Social Science Project of Wenzhou, China, under Grant 23BM038YB. The four projects were carried out by the primary author of this paper Yan Wei. These projects primarily focused on the research of innovation, entrepreneurship, and employment education, providing a solid research foundation and data support for the writing of this paper. Consequently, this facilitated the organization of ideas and propelled the process of data analysis.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Employment is the most basic component of people's livelihood, and the employment of college students is an important issue related to livelihood and the stable development of society. In recent years, China's economic development has entered a new normal, and both the number of graduates and employment pressure have increased. As a result, more college graduates do not intend to be employed immediately after completing their studies, and thus actively or passively join the 'slow employment' group. 'Slow employment' has become a prominent problem in the employment of college graduates, and the impact of the epidemic has made it even more pronounced, bringing negative effects on China's economic development, social stability, and talent cultivation in many respects.
Understanding the basic situation of the 'slow employment' group of college students, analyzing the causes of the 'slow employment' phenomenon, and identifying the key influencing factors provide a set of scientific and effective solutions to prevent 'slow employment' behavior and resolve the phenomenon. This is of great significance for promoting higher-quality and fuller employment of graduates. The employment landscape for college students and graduates has become more challenging due to the COVID-19 pandemic [1]. Shi [2] pointed out that the record-breaking number of graduates and the career decision-making difficulties caused by the economic recession have become obstacles to the active employment of college students. Researchers and college educators are increasingly concerned about graduate employment and the potential problems in the future career choices of current students. Li and Zhang [3] constructed a decision tree model based on big data to help college students address the problem of slow employment through scientific decision-making, precise guidance, accurate service, and time-sensitive assistance. Wang and Li [4] verified the mechanism by which employment values influence the willingness to choose slow employment through a survey of students at several universities in the Haidian and Changping districts of Beijing. The study found that long-term post-employment income and employment costs promote positive 'slow employment' choices, that employment anxiety only plays a mediating role, and that short-term income plays a negative role for 'slow employment'. This study contributes to a deeper understanding of college students' career choices and provides a reference for full employment.
At present, data mining technology has been applied in the fields of academic early warning [5], career planning [6], and teaching assessment [7]. Databases of student employment information also contain many valuable patterns. With the development of computer technology, machine learning methods [8] and hybrid models [9,10] have gradually come into public view and become important tools for solving prediction problems. These techniques have also received attention in the field of education. Rahman et al. [11] used data mining techniques for feature selection and predicted graduate employment using K-Nearest Neighbor, Naive Bayes, Decision Trees, Neural Networks, Logistic Regression, and Support Vector Machines, then analyzed the data with RapidMiner. Chen et al. [12] used a factor score approach to systematically analyze and assess the comprehensive employability of graduates, so as to scientifically guide students in their search for suitable careers. Bharambe et al. [13] used data mining techniques to assess the employability of students, applying a classification model to intelligently predict which types of companies' needs match the skills students have acquired. Zhao et al. [14] proposed a random forest algorithm to select features for the employee retention rate of 'double first-class' university graduates. Based on principal component analysis, they found that economic factors such as regional gross domestic product, wages, the average sales price of commercial properties, and the unemployment rate were the main factors affecting employment mobility in each province and city; a back propagation neural network was then used for prediction, achieving high accuracy and stability. Zhao et al. [15] proposed a model for predicting the employment situation of college graduates based on long short-term memory (LSTM) recurrent neural networks.
The model can effectively reflect the complex characteristics of university graduate employment data and the nonlinear dynamic interaction of influencing factors, and the variables that mainly affect the employment situation were selected for prediction. Compared with a traditional statistical method based on cluster analysis, the technique achieved higher prediction accuracy and reliability. Tu et al. [16] proposed a model for predicting the entrepreneurial intentions of graduate students based on a chaotic local search sine cosine algorithm, random forest, and support vector machine, and demonstrated the importance of factors such as major, gender, general student type, grade point average, and total credits in influencing entrepreneurial intentions. Gao et al. [17] proposed an intelligent prediction model of employment stability combining a multi-group slime mould algorithm with support vector machines, which showed better prediction results and demonstrated the association of current monthly salary, first-employment monthly salary, change of employment location, degree of first-employment major affiliation, and salary difference with students' employment stability.
To predict and assess the 'slow employment' phenomenon among college students, this work presents bGEBA-SVM, a wrapper feature selection approach based on an improved bat algorithm (GEBA) and support vector machine. First, to enhance the optimization capability of the feature subset search, a Gaussian distribution-based strategy and an elimination strategy were introduced into the bat algorithm to strengthen its global optimization ability and improve population quality.
In recent years, many excellent optimization methods have been proposed, including the colony predation algorithm (CPA) [18], Harris hawks optimization (HHO) [19], slime mould algorithm (SMA) [20,21], bat algorithm (BA) [22], firefly algorithm (FA) [23], sine cosine algorithm (SCA) [24], butterfly optimization algorithm (BOA) [25], ant colony optimization for the continuous domain (ACOR) [26], Runge Kutta optimizer (RUN) [27], rime optimization algorithm (RIME) [28], weighted mean of vectors (INFO) [29], hunger games search (HGS) [30], chaotic whale optimizer (CWOAII) [31], the hybrid of the sine cosine algorithm and differential evolution (SCADE) [32], chaotic bat algorithm (CBA) [33], a bat algorithm based on collaborative and dynamic opposition-based learning (CDLOBA) [34], and the hybrid bat algorithm (RCBA) [35]. To test the optimization performance of the proposed GEBA, it was compared with 10 advanced peers on 30 benchmark functions from IEEE CEC2017 [36]. The experimental results were then analyzed with Wilcoxon's Signed-Rank Test (WSRT) [35] and the Friedman test (FT) [37], verifying its excellent optimization performance. Second, the experimental results demonstrate the better interpretability and predictive performance of bGEBA-SVM by comparing it with five similar methods and four advanced prediction models on public datasets and the graduate employment prediction dataset of the class of 2022 college graduates in Zhejiang Province.
The remaining structure of the paper is set up as follows. Section 2 presents the graduate employment prediction dataset and the proposed bGEBA-SVM prediction method. Section 3 implements a benchmark function experiment based on IEEE CEC2017 to validate GEBA’s optimization capacity. Section 4 validates the prediction ability of bGEBA-SVM with the public dataset and the graduate employment prediction dataset. Section 5 discusses the suggested approach and the experimental results in further detail. Section 6 summarizes this study and proposes goals for future research based on the existing foundation.
2 Materials and methods
2.1 Graduate employment prediction dataset
In this study, 1,694 graduates of the class of 2022 from Zhejiang universities were selected for the study, and predictions were made based on 18 characteristics. The details of the Graduate Employment Prediction (GEP) dataset are shown in Table 1.
Since the above-mentioned data did not involve ethical issues, the review committee/ethics committee of Wenzhou Vocational College of Science and Technology granted an exemption from ethical review.
2.2 Bat algorithm
The bat algorithm simulates the echolocation behavior that bats use to find tiny insects while foraging. In the bat algorithm, each bat (search agent) flies at a random velocity vi at location xi (a solution of the problem), while the bats have different wavelengths, loudness Ai, and pulse emission rate ri. When a bat finds prey, its frequency, loudness, and pulse emission rate change to steer the selection of the best solution. The detailed procedure of the bat algorithm is as follows.
First, each bat generates an ultrasonic frequency fi at random, as shown in Eq (1).

fi = fmin + (fmax − fmin) × β (1)

where β is a random vector in [0, 1], and fmin and fmax are the minimum and maximum values of fi, set to 0 and 2, respectively.
Then, each bat updates its velocity vi^t according to its current velocity vi^(t−1) and the distance between its current position xi^(t−1) and the optimal position xbest, and then updates its current position xi^t according to vi^t. Eqs (2) and (3) are used to calculate vi^t and xi^t, respectively.

vi^t = vi^(t−1) + (xi^(t−1) − xbest) × fi (2)

xi^t = xi^(t−1) + vi^t (3)
When the global optimal solution is updated, each local solution xold in the current population is updated, as shown in Eq (4).

xnew = xold + ε × At (4)

where ε denotes a random number obeying a uniform distribution between −1 and 1, and At denotes the average loudness of all bats.
After the position is updated, the bat makes a greedy choice between the current position xold and the updated position xnew. When the bat position is updated, the loudness Ai and pulse emission rate ri are updated, as shown in Eqs (5) and (6).

Ai^(t+1) = α × Ai^t (5)

ri^(t+1) = ri^0 × [1 − exp(−γt)] (6)

where α and γ are constants set to 0.9, and the initial values of ri and Ai are set to 0.5.
The pseudo-code of the bat algorithm is shown in Algorithm 1.
Algorithm 1. Pseudocode for BA.
Initialize the algorithm parameters: fmin, fmax, α, γ, r, A
Initialize the population
Evaluate the fitness value of each search agent
FEs = FEs + N
While FEs ≤ MaxFEs
  For i = 1:N
    Update the frequency fi of xi by Eq (1)
    Update the velocity vi of xi by Eq (2)
    Update xi by Eq (3)
    Evaluate the fitness value fitnessi of xi
    If (rand < Ai) && (fitnessi < fitnessbest)
      Calculate the new position xnew by Eq (4)
      Evaluate the fitness value fitnessnew of xnew and apply greedy selection
      FEs = FEs + 1
      Update Ai by Eq (5)
      Update ri by Eq (6)
    End if
  End for
  FEs = FEs + N
  t = t + 1
End while
Return the best solution
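To make the update rules above concrete, the following self-contained Python sketch runs the core BA loop on the sphere function. The objective, bounds, population size, and iteration count are illustrative choices for this sketch, not the paper's experimental setup; parameter values follow the paper (f in [0, 2], α = γ = 0.9, initial loudness and pulse rate of 0.5).

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    # Illustrative objective: minimize the sum of squares.
    return float(np.sum(x ** 2))

N, dim, iters = 20, 5, 200
lb, ub = -10.0, 10.0
f_lo, f_hi = 0.0, 2.0
alpha, gamma = 0.9, 0.9

X = rng.uniform(lb, ub, (N, dim))   # bat positions
V = np.zeros((N, dim))              # bat velocities
A = np.full(N, 0.5)                 # loudness
r = np.full(N, 0.5)                 # pulse emission rate
r0 = r.copy()
fit = np.array([sphere(x) for x in X])
best = X[fit.argmin()].copy()
best_fit = float(fit.min())
init_best = best_fit

for t in range(1, iters + 1):
    for i in range(N):
        beta = rng.random(dim)
        f = f_lo + (f_hi - f_lo) * beta             # Eq (1)
        V[i] = V[i] + (X[i] - best) * f             # Eq (2)
        x_new = np.clip(X[i] + V[i], lb, ub)        # Eq (3)
        if rng.random() > r[i]:
            # Local random walk around the best solution, Eq (4)
            eps = rng.uniform(-1.0, 1.0, dim)
            x_new = np.clip(best + eps * A.mean(), lb, ub)
        f_new = sphere(x_new)
        if rng.random() < A[i] and f_new < fit[i]:  # greedy acceptance
            X[i], fit[i] = x_new, f_new
            A[i] *= alpha                           # Eq (5)
            r[i] = r0[i] * (1 - np.exp(-gamma * t)) # Eq (6)
        if f_new < best_fit:
            best, best_fit = x_new.copy(), f_new
```

Because the incumbent best is replaced only when a strictly better solution is found, `best_fit` can never exceed its initial value.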
2.3 Gaussian distribution-based strategy
To obtain better optimization results, the Gaussian distribution is used to modify the update rule of the original BA, enhancing the algorithm's ability to search for the global optimum in the search space. The Gaussian distribution-based strategy is calculated by Eq (7).

xnew = Gaussian(μi, σi) (7)

where Gaussian(μi, σi) denotes the Gaussian kernel function, which obeys a normal distribution, and μi and σi denote the mean and standard deviation, respectively, as shown in Eqs (8) and (9).
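The following Python sketch illustrates a Gaussian-distribution-based position update in the spirit of Eq (7). The exact definitions of μi and σi in Eqs (8) and (9) are paper-specific; the choices here, centering the sample between the agent and the best solution and scaling the standard deviation by their distance, are assumptions for illustration only.

```python
import random

def gaussian_update(x_i, x_best, rng=random):
    # Sample a new position component-wise from a Gaussian whose mean
    # and std-dev (assumed forms of Eqs 8-9) are derived from the agent
    # and the current best solution.
    x_new = []
    for a, b in zip(x_i, x_best):
        mu = (a + b) / 2.0       # assumed mean
        sigma = abs(a - b)       # assumed standard deviation
        x_new.append(rng.gauss(mu, sigma))
    return x_new
```

When an agent coincides with the best solution, the standard deviation collapses to zero and the update leaves it in place, so exploration naturally shrinks as the population converges.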
2.4 Elimination strategy
In this study, the elimination strategy is introduced into BA to improve the optimization capability of the algorithm. Replacing the poorer search agents in the population with the derived search agents based on the optimal solution xbest and the suboptimal solution xsub can improve the population quality, prevent the search agents from overexploiting near the poor solution, and thus improve the algorithm accuracy. The mathematical model of the elimination strategy is as follows.
First, the individual information of the optimal solution xbest and the suboptimal solution xsub is used to generate a new reference search agent x_ref, as shown in Eq (10).
Then, the worst 5% of individuals in the population are updated according to Eqs (11) and (12).
(11)
(12)
where k and Ra2 are random numbers between 0 and 1 that obey a uniform distribution.
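The elimination step can be sketched as follows: the worst 5% of agents are replaced by agents derived from the best and sub-best solutions. Eqs (10)–(12) are paper-specific and not reproduced in this extraction; the reference agent below is an assumed convex combination x_ref = k·x_best + (1 − k)·x_sub, used purely for illustration.

```python
import random

def eliminate_worst(pop, fits, x_best, x_sub, frac=0.05, rng=random):
    # Replace the worst `frac` fraction of agents (minimization: higher
    # fitness value = worse) with agents derived from the best and
    # sub-best solutions.
    n_replace = max(1, int(len(pop) * frac))
    order = sorted(range(len(pop)), key=lambda i: fits[i], reverse=True)
    for i in order[:n_replace]:
        k = rng.random()  # uniform in [0, 1], as in the paper
        pop[i] = [k * b + (1 - k) * s for b, s in zip(x_best, x_sub)]
    return pop
```

Replacing only the worst few agents keeps the rest of the population intact, so the strategy improves population quality without discarding useful diversity.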
2.5 Implementation of GEBA
To improve the optimization performance of the original BA, the Gaussian distribution-based strategy and the elimination strategy were introduced in this study. The Gaussian distribution-based strategy improves the convergence accuracy of the algorithm, so Eq (4) in the original BA was replaced by Eqs (7) to (9). In the early stages of optimization, a more diverse population can speed up convergence and increase the likelihood that the algorithm finds the best solution. In the later stages, however, high population diversity may cause the algorithm to waste resources computing inferior solutions, which is not conducive to obtaining high-quality solutions. Thus, as the pseudo-code of GEBA shows, after all search agents are updated, the elimination strategy replaces the inferior solutions in the population with new search agents in the region of the optimal solution, which promotes a balance between the algorithm's global exploration and local exploitation. The pseudo-code for GEBA is shown in Algorithm 2.
Algorithm 2. Pseudocode for GEBA.
Initialize the algorithm parameters: fmin, fmax, α, γ, r, A
Initialize the population
Evaluate the fitness value of each search agent
FEs = FEs + N
While FEs ≤ MaxFEs
  For i = 1:N
    Update the frequency fi of xi by Eq (1)
    Update the velocity vi of xi by Eq (2)
    Update xi by Eq (3)
    Evaluate the fitness value fitnessi of xi
    If (rand < Ai) && (fitnessi < fitnessbest)
      Update μi by Eq (8)
      Update σi by Eq (9)
      Calculate the new position xnew by Eq (7)
      Evaluate the fitness value fitnessnew of xnew and apply greedy selection
      FEs = FEs + 1
      Update Ai by Eq (5)
      Update ri by Eq (6)
    End if
  End for
  For i = 1:N
    Update xref according to xbest and xsub by Eq (10)
    Evaluate the fitness value fitnessi of xi and apply greedy selection
  End for
  FEs = FEs + 2×N
  t = t + 1
End while
Return the best solution
2.6 Proposed prediction model bGEBA-SVM
For feature selection, the optimized feature set can be treated as a discrete optimization problem, where a value of '1' indicates that a feature has been selected and '0' that it has not. GEBA is therefore discretized to create bGEBA, a variant that can operate in the discrete space. The S-shaped function was selected as the transformation function for GEBA in this study.
The j-th component xi,j of the i-th search agent is input into the S3 function of the S-shaped family. If a random number between 0 and 1 is smaller than the output value s, the component xbi,j of this search agent in the discrete space is set to 1; otherwise it is set to 0. Eqs (13) and (14) show the calculation.

s = S3(xi,j) = 1 / (1 + e^(−xi,j / 2)) (13)

xbi,j = 1 if rand < s, and 0 otherwise (14)
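The discretization step can be sketched in a few lines of Python. The S3 form used here, 1 / (1 + e^(−x/2)), follows the common S-shaped transfer-function family; it is an assumption that the paper's S3 matches this standard definition.

```python
import math
import random

def s3(x):
    # Assumed S3 transfer function: maps a continuous component to [0, 1].
    return 1.0 / (1.0 + math.exp(-x / 2.0))

def binarize(agent, rng=random.random):
    # A feature bit is set when a uniform random draw falls below S3(x).
    return [1 if rng() < s3(x) else 0 for x in agent]

random.seed(1)
bits = binarize([-6.0, 0.0, 6.0])
```

Large positive components are very likely to map to 1 and large negative components to 0, so the continuous search gradually commits to a concrete feature subset.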
Support vector machine (SVM) [38] has excellent generalization performance as well as good performance for nonlinear and nonconvex problems, so SVM was chosen as the classifier for the GEP dataset in this study. Furthermore, a wrapper feature selection method based on the combination of bGEBA and SVM is proposed, which is called bGEBA-SVM. The prediction process of bGEBA-SVM for the GEP dataset is shown in Fig 1 (The code is available at https://github.com/Forproject1111/bGEBA-SVM).
This approach uses bGEBA's strong search capability to optimize the feature subset. The feature subset is fed into the SVM for training, and the SVM's prediction performance serves as the evaluation criterion (fitness function) for that subset. The fitness value is computed with Eq (15). The process iterates until the feature subset that gives the classifier the best performance is obtained.
fitness = ω1 × (1 − Accuracy) + ω2 × (S / T) (15)

where S is the number of selected features, T is the total number of features, and ω1 and ω2 are two weight parameters that balance the impact of the model's prediction accuracy and the number of selected features on the evaluation of a feature subset. Since the prediction accuracy of the model is the focus of this study, ω1 and ω2 are set to 0.99 and 0.01, respectively.
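The wrapper fitness can be sketched as below, assuming the common form fitness = ω1 × (1 − accuracy) + ω2 × (selected / total), where smaller is better. In the paper the accuracy would come from the trained SVM; here it is passed in directly to keep the sketch self-contained.

```python
def feature_fitness(mask, acc, w1=0.99, w2=0.01):
    # mask: binary feature-selection vector; acc: classifier accuracy
    # on the subset. Smaller fitness is better.
    selected = sum(mask)
    total = len(mask)
    return w1 * (1.0 - acc) + w2 * (selected / total)

# A subset with equal accuracy but fewer features scores lower (better).
f_small = feature_fitness([1, 0, 0, 0], acc=0.90)
f_large = feature_fitness([1, 1, 1, 1], acc=0.90)
```

With ω1 = 0.99, accuracy dominates the score, and the small ω2 term only breaks ties in favor of more compact feature subsets.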
3 Experiments on benchmark functions
In the prediction method based on the wrapper feature selection, swarm intelligence optimization algorithms are used as a key part of them to optimize the subset of features trained by the input model. Therefore, the optimization capability of the swarm intelligence optimization algorithm is one of the important factors affecting the prediction results. To explore the optimization performance of GEBA, this section sets up the benchmark function experiments for validation.
3.1 Experiment setup
The algorithm's search behavior can be divided into global exploration and local exploitation. Global exploration is the algorithm's ability to search for optimal solutions in unknown regions of the search space, increasing the probability of avoiding local extremes; however, it also raises the probability of obtaining poor solutions, which does not improve the accuracy of the current optimum. Local exploitation is the ability to search further near the current solution and improve the quality of the optimal solution, but it may trap the algorithm in a local optimum. When global exploration and local exploitation are balanced, the optimization ability of the algorithm can be fully utilized and better results obtained. To verify the optimization performance of the algorithm more comprehensively, this section uses the IEEE CEC2017 benchmark functions [36]; their details are shown in Table 2. In addition, to ensure the fairness of the experimental results, the common parameters of the benchmark experiments were set uniformly; the common parameters and the experimental environment are shown in Table 3.
3.2 Ablation experiment
To verify the significance of the Gaussian distribution-based strategy and elimination strategy for the performance improvement of the algorithm, this subsection set up the ablation experiments of the optimization strategies. The two optimization strategies were introduced into BA separately, and the details of the ablation experiment comparison method are shown in Table 4. The comparison was carried out by ranking in both WSRT and FT nonparametric tests as well as convergence tests.
Table 5 shows the comparison results and rankings of the four methods. From the table, it can be seen that the average rankings of GEBA under WSRT and FT are 1.37 and 1.88, respectively, the best performance, followed by GBA. The rankings of EBA and BA differ between the two tests: EBA performs better in the WSRT ranking, while in the FT ranking the introduction of the elimination strategy alone reduces the optimization performance of the algorithm. However, the comparison between GEBA and GBA shows that the elimination strategy improves GBA more markedly. Fig 2 shows the convergence of the four methods; the convergence curve of GEBA lies at the bottom among all methods, and GEBA converges faster on all functions except F18.
In summary, the experiment in this subsection demonstrates that the combination of two optimization strategies outperforms the optimization effect of individual strategies and has more significant performance improvement for algorithm optimization.
3.3 Search history analysis
To explore how GEBA searches for optimal solutions to different optimization problems, this subsection examines the 1-dimensional search history, the 2-dimensional function top view, and the average fitness error.
Fig 3 shows the historical search trajectories of GEBA on different optimization problems. Fig 3(A) shows the corresponding 3D image of each function. Fig 3(B) shows that GEBA takes larger search steps in the search space in the early iterations; in the late iterations, as global exploration decreases, local search increases, making GEBA gradually converge to the optimal value in that dimension. In Fig 3(C), on the cross-section of the search space, the black dots indicate the historical locations of all search agents and the red dot indicates the location of the optimal solution. Most of the search agents' locations are clustered, and only a few are scattered across regions of the search space. For F1, F4, and F28, in addition to the historical positions clustered around the optimal solution, some positions cluster around suboptimal or local optima; nevertheless, GEBA ultimately escapes the local optima and obtains higher-quality solutions. Fig 3(D) shows the average fitness curve of the population; similar to the fitness convergence curve, it gradually converges to a smaller value as the number of iterations increases. Besides illustrating the gradual convergence of GEBA to the optimal solution, the average fitness curve also indicates that the differences among search agents become smaller in the later iterations, consistent with the optimization process.
In summary, in the optimization process, GEBA gradually converges to the optimal position based on the information provided by all search agents, the global exploration of the population gradually becomes local exploitation, and GEBA has a strong ability to jump out of the local optimal solution.
3.4 Stability experiment
To explore the stability of the algorithm in handling high-dimensional optimization problems, this subsection tested the optimization performance of GEBA and BA at 50 and 100 dimensions.
Table 6 shows the comparative results and rankings of the algorithms at high dimensions. GEBA outperforms BA on more than 19 functions in both dimensions, while BA performs better on only one function; on the remaining functions there is no significant difference, and the optimization performance of the two algorithms is approximately equal. The rankings of the two nonparametric tests, WSRT and FT, show that GEBA has better optimization capability and robustness on different optimization problems. In addition, Figs 4 and 5 show the convergence speed and convergence accuracy of GEBA, confirming that it performs well in high dimensions.
3.5 Comparative experiment on GEBA with advanced peers
To objectively evaluate the algorithm optimization performance, GEBA was compared with 10 advanced similar methods, and the parameters of the 10 comparison methods were set as shown in Table 7.
First, the performance of the 11 methods was assessed via the mean and standard deviation of 30 independent runs on each function; the results are shown in Table 8. The data show that the best-performing algorithm over the 30 benchmark functions is GEBA, with 16 best means and 7 smallest standard deviations. Although GEBA is less stable than FA, its optimization performance remains stable among all compared methods, and its overall performance is better. Second, the WSRT and FT statistics in Table 9 also show that GEBA ranks higher than the other methods. In the P-value comparisons between GEBA and the other methods, the number of P+ is at least 19 and the number of P− is at most 6, indicating that GEBA performs better and that these results are statistically significant. In the WSRT ranking, GEBA had the best average rank of 2.50, followed by FA; in the FT ranking, GEBA had the best average rank of 3.20, followed by RCBA. Third, the convergence curves of the 11 algorithms are shown in Fig 6; at roughly 20,000 to 50,000 evaluations, the GEBA curve reaches convergence with higher solution quality.
A comprehensive analysis shows that GEBA has stronger optimization performance, faster convergence, and higher convergence accuracy, and thus has the potential to be applied in many fields, such as image segmentation [39,40], medical diagnosis [41–43], engineering optimization [44], pulmonary hypertension diagnosis [45], forecasting COVID-19 [46], spatiotemporal modeling of cardiac electrodynamics [47], automated detection of gastrointestinal diseases [48], and classification of dermatological disease [49].
4 Experiments on the employment prediction
4.1 Experiment setup
This section sets up a series of experiments to verify the predictive performance of bGEBA-SVM. bGEBA-SVM was compared with five similar methods and four popular prediction models: bBA-SVM [22], bSCA-SVM [24], bGWO-SVM [50], bHHO-SVM [51], bSMA-SVM [21], Back Propagation Neural Network (BP) [52], Classification And Regression Tree (CART) [53], Random Forest (RandomF) [54], and Adaptive Boosting (AdaBoost) [55]. To ensure the fairness of the experiment, the experimental setup was unified: the population size (N) was set to 20, the dimension was determined by the dataset, and the maximum number of iterations (Max_Iter) was set to 100. In addition, to avoid chance results, 10 independent runs were performed, and the mean and standard deviation are reported.
In addition, to assess model performance more accurately, this study comprehensively evaluated predictive ability using four evaluation metrics: Accuracy, Sensitivity, Matthews correlation coefficient (MCC), and F-measure. Their details are shown in Table 10.
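For reference, the four metrics can be computed from a binary confusion matrix as in the following sketch (standard definitions; this is not code from the paper).

```python
import math

def metrics(tp, tn, fp, fn):
    # Accuracy, Sensitivity (recall), Matthews correlation coefficient,
    # and F-measure from binary confusion-matrix counts.
    acc = (tp + tn) / (tp + tn + fp + fn)
    sens = tp / (tp + fn)
    prec = tp / (tp + fp)
    f1 = 2 * prec * sens / (prec + sens)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return acc, sens, mcc, f1
```

Unlike accuracy alone, MCC accounts for all four cells of the confusion matrix, which is why the paper reports it alongside Accuracy, Sensitivity, and F-measure.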
4.2 Transformation function experiment
When optimization methods are discretized, the use of different transformation functions can cause differences in prediction results [56]. Therefore, to improve the prediction performance of the model for the GEP dataset, this subsection conducted experimental tests for the eight most common transformation functions [57] for S-shaped and V-shaped, and the experimental setup is shown in Table 11.
Fig 7 shows the four evaluation metrics for the prediction of the GEP dataset by the eight discretization methods. As seen in the figure, S3-bGEBA-SVM performs best among the eight variants, with Accuracy, Sensitivity, Matthews correlation coefficient, and F-measure reaching 93.86%, 88.65%, 0.8816, and 93.36%, respectively. This suggests that the S3 transformation function is the most appropriate for GEBA to optimize the feature space and produce better prediction results. Therefore, this study set the S3 function as the default discretization method for the subsequent experiments.
4.3 Comparative experiment on the public datasets
To verify the generalization ability of the proposed method, bGEBA-SVM was compared with other methods by six public datasets [58,59] for comparative experiments, and Table 12 shows the details of these datasets.
Tables 13 to 18 show the means and standard deviations of the four evaluation metrics for the prediction results of the 10 prediction methods on the public datasets. Observing the data in the tables, we can see that bGEBA-SVM achieves the best mean and standard deviation on all metrics across all datasets, except for the standard deviations of the four evaluation metrics on the Heart dataset. This indicates that bGEBA-SVM predicts the public datasets accurately and that its prediction performance is stable, demonstrating excellent generalization ability.
4.4 Comparative experiment on the employment prediction dataset
In this subsection, to achieve a more accurate prediction of graduate employment, bGEBA-SVM was used to predict the GEP dataset, and its superior prediction performance was verified by comparing it with other methods.
In the wrapper feature selection method, bGEBA was used as the optimization method for the feature subset, and the set of features that favors the model's prediction accuracy was fed into the classifier. The optimization performance of bGEBA is therefore a key factor affecting the prediction accuracy of the model. Fig 8 shows the convergence curves of bGEBA and its peers on the objective function used to evaluate feature subsets. Observing the figure, it can be seen that bGEBA can jump out of local optima where other methods stagnate, and thus obtains a higher-quality feature subset. Combining the optimization results of bGEBA on feature subsets with the prediction results above, it is clear that bGEBA has stronger global optimization ability and further improves the prediction accuracy of the model by feeding it high-quality training data. Table 19 and Fig 9 show the prediction results for the graduate employment dataset. It is easy to see that bGEBA-SVM achieves more accurate prediction of graduate employment than both the peer methods and the popular prediction models on all four performance evaluation metrics.
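A common form of the objective that such wrapper methods minimize combines classification error with subset size; this sketch assumes that standard weighting (the alpha value and the paper's exact formulation are assumptions, with alpha = 0.99 a conventional choice):

```python
# Sketch of a typical wrapper feature-selection objective: minimize a
# weighted sum of classifier error rate and relative subset size.
# error_rate is a stand-in for training the classifier (here, an SVM)
# on the masked features and measuring its validation error.
from typing import Callable, Sequence

def wrapper_fitness(mask: Sequence[int],
                    error_rate: Callable[[Sequence[int]], float],
                    alpha: float = 0.99) -> float:
    """Lower is better: trade off prediction error against subset size."""
    n_selected = sum(mask)
    if n_selected == 0:          # an empty subset cannot train a classifier
        return float("inf")
    size_ratio = n_selected / len(mask)
    return alpha * error_rate(mask) + (1 - alpha) * size_ratio
```

Because alpha is close to 1, accuracy dominates the objective, and the size term only breaks ties between equally accurate subsets in favor of the smaller one.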
4.5 Important features analysis
The proposed bGEBA-SVM method has strong interpretability, and to verify the role played by each feature in the prediction process, feature importance experiments were set up in this study.
Fig 10 shows the number of times each feature was selected across the 100 feature selection experiments. The experimental results reveal that the features A11, A4, A12, A10, A2, A13, A3, and A1 are selected more often and thus have a more significant impact on predicting graduates' employment outcomes.
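The tally behind a figure like this can be sketched as follows (the masks and feature names below are illustrative, not the paper's data):

```python
# Count how often each feature appears in the selected subsets across
# repeated feature-selection runs, then rank features by that count.
from collections import Counter

def selection_counts(masks, feature_names):
    """Tally, per feature, how many runs selected it (bit == 1)."""
    counts = Counter()
    for mask in masks:
        for name, bit in zip(feature_names, mask):
            counts[name] += bit
    return counts.most_common()
```

This frequency-based view is what gives the wrapper approach its interpretability: features that survive many independent stochastic searches are robustly informative, not artifacts of one run.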
5 Discussion
The results of the experiment show that there are significant correlations among the factors influencing college students' choice of 'slow employment', such as preparation for further education, student leadership experience, family situation, career planning, employment structure, family parenting style, student category, gender, and professional interest. Among them, career planning, academic achievement, self-concept, and family situation are negatively correlated with 'slow employment' intention: the clearer the career planning, the better the academic achievement, the clearer the self-concept, and the better the family economic situation, the lower the 'slow employment' intention of college students [60,61]. Three further aspects deserve consideration. First, within the 'slow employment' group, the number of students who actively choose 'slow employment' is increasing because they are pursuing further education, and more of them have experience as student leaders during their college years. Among those who actively choose 'slow employment', one point is worth noting: there are more science and technology majors than economics and management majors, while agriculture majors show a lower intention to actively choose 'slow employment' [61–63]. Second, family upbringing also influences students' 'slow employment' intention: the more democratic the family upbringing, the stronger the students' intention. Excessive family interference or a lack of family involvement can make career decision-making difficult, leading young people to make poor choices under these pressures [2]. Relatively democratic families, on the other hand, allow young people to be less influenced by external pressures and more diversified in their career choices, although this can also lead to missed opportunities. Family influence is therefore a factor worth noting in the case of 'slow employment'.
Third, students' 'slow employment' intention is influenced to a certain extent by their place of origin: students from rural areas show a higher 'slow employment' intention than those from urban areas. It can therefore be concluded that the 'slow employment' situation of college students should be analyzed scientifically and treated rationally.
This paper established a scientific 'slow employment' prediction and evaluation model: the improved bat algorithm serves as the feature-subset search method, and the screened key features are used to train the classification model. The proposed model can provide reasonable decision-support suggestions for universities addressing the 'slow employment' of college students. First, it can help colleges and universities classify and guide students' employment: predicting employment intentions, grasping the employment needs and difficulties of 'slow employment' students in time, and providing employment guidance and services tailored to different types of students. Second, it can help colleges and universities make precise policies, build an accurate picture of the student population, deliver job opportunities, provide targeted help, and implement personalized employment services. Third, it can help all parties work together to promote student employment. Colleges and universities can cooperate with families, society, and other parties to help graduates understand the employment situation, strengthen their abilities, master the pace of job hunting, intervene constructively in 'slow employment' behavior, and actively guide its positive transformation. In this way, 'slow employment' can be assessed, predicted, and prevented, promoting higher-quality and fuller employment of graduates.
6 Conclusion and future works
The wrapper feature selection approach presented in this study (bGEBA-SVM) was developed from an improved bat algorithm and a support vector machine. By incorporating a Gaussian distribution-based strategy and an Elimination strategy into the original bat algorithm, an effective and reliable optimization technique (GEBA) was obtained. The experimental findings on the IEEE CEC2017 benchmark showed that GEBA offers considerable optimization performance benefits over advanced comparable algorithms. GEBA was then discretized into a binary version (bGEBA) for feature selection; bGEBA provides effective training data for the support vector machine and achieves more accurate prediction on the graduate employment prediction dataset. Through experimental tests and analysis, we found that further education, student leadership experience, family situation, career planning, employment structure, family parenting style, student category, gender, professional interest, and other factors have a greater influence on graduates' positive or negative responses to the choice of "slow employment". Analyzing these factors facilitates a more accurate prediction of graduate employment problems and the development of effective measures.
Although the proposed model shows stable and excellent performance on the graduate "slow employment" prediction problem, its performance is still limited by the optimization efficiency and by the capability of the classification model. In future research, we plan to incorporate high-performance computing techniques such as distributed optimization to address the efficiency of feature-subset optimization, and to adopt machine learning models with stronger predictive performance and compatibility. Additionally, the optimization capabilities of the proposed GEBA may be applied to image segmentation [64] and engineering optimization challenges [65].
References
- 1. Donald WE, Ashleigh MJ, Baruch Y. The university-to-work transition: Responses of universities and organizations to the COVID-19 pandemic. Personnel Review. 2022;51(9):2201–21.
- 2. Shi H. The generation mechanism underlying the career decision-making difficulties faced by undergraduates in China during the COVID-19 pandemic: a qualitative study based on SCCT theory. Frontiers in Psychology. 2023;14:1154243. pmid:37377699
- 3. Li H, Zhang Y, editors. Research on employment prediction and fine guidance based on decision tree algorithm under the background of big data. Journal of Physics: Conference Series; 2020: IOP Publishing.
- 4. Wang T, Li S. Relationship between employment values and college students’ choice intention of slow employment: A moderated mediation model. Frontiers in Psychology. 2022;13:940556. pmid:36033039
- 5. Dai N. Analysis of data interaction process based on data mining and neural Network topology visualization. Computational Intelligence and Neuroscience. 2022;2022. pmid:35814595
- 6. Wang Y, Yang L, Wu J, Song Z, Shi L. Mining Campus Big Data: Prediction of Career Choice Using Interpretable Machine Learning Method. Mathematics. 2022;10(8):1289.
- 7. Yao Y. Design of English Teaching Postcompetency Evaluation System Based on Data Mining and IoT. Wireless Communications & Mobile Computing (Online). 2022;2022.
- 8. Banadkooki FB, Ehteram M, Ahmed AN, Fai CM, Afan HA, Ridwam WM, et al. Precipitation forecasting using multilayer neural network and support vector machine optimization based on flow regime algorithm taking into account uncertainties of soft computing models. Sustainability. 2019;11(23):6681.
- 9. Ghazvinian H, Mousavi S-F, Karami H, Farzin S, Ehteram M, Hossain MS, et al. Integrated support vector regression and an improved particle swarm optimization-based model for solar radiation prediction. PLoS One. 2019;14(5):e0217634. pmid:31150467
- 10. Ehteram M, Singh VP, Ferdowsi A, Mousavi SF, Farzin S, Karami H, et al. An improved model based on the support vector machine and cuckoo algorithm for simulating reference evapotranspiration. PloS one. 2019;14(5):e0217499. pmid:31150443
- 11. Rahman NAA, Tan KL, Lim CK. Supervised and unsupervised learning in data mining for employment prediction of fresh graduate students. Journal of Telecommunication, Electronic and Computer Engineering (JTEC). 2017;9(2–12):155–61.
- 12. Guofen C, editor. Control tracking model of the graduate quality based on neural network theory; 2014.
- 13. Bharambe Y, Mored N, Mulchandani M, Shankarmani R, Shinde SG, editors. Assessing employability of students using data mining techniques. 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI); 2017: IEEE.
- 14. Zhao Y, He F, Feng Y. Research on the Current Situation of Employment Mobility and Retention Rate Predictions of “Double First-Class” University Graduates Based on the Random Forest and BP Neural Network Models. Sustainability. 2022;14(14):8883.
- 15. Li X, Yang T. Forecast of the employment situation of college graduates based on the LSTM neural network. Computational Intelligence and Neuroscience. 2021;2021:1–11. pmid:34616445
- 16. Tu J, Lin A, Chen H, Li Y, Li C. Predict the entrepreneurial intention of fresh graduate students based on an adaptive support vector machine framework. Mathematical Problems in Engineering. 2019;2019.
- 17. Gao H, Liang G, Chen H. Multi-population enhanced slime mould algorithm and with application to postgraduate employment stability prediction. Electronics. 2022;11(2):209.
- 18. Tu J, Chen H, Wang M, Gandomi AH. The Colony Predation Algorithm. Journal of Bionic Engineering. 2021;18(3):674–710.
- 19. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H. Harris hawks optimization: Algorithm and applications. Future Generation Computer Systems-the International Journal of Escience. 2019;97:849–72.
- 20. Chen H, Li C, Mafarja M, Heidari AA, Chen Y, Cai Z. Slime mould algorithm: a comprehensive review of recent variants and applications. International Journal of Systems Science. 2022:1–32.
- 21. Li S, Chen H, Wang M, Heidari AA, Mirjalili S. Slime mould algorithm: A new method for stochastic optimization. Future Generation Computer Systems. 2020;111:300–23.
- 22. Yang XS, Hossein Gandomi A. Bat algorithm: a novel approach for global engineering optimization. Engineering computations. 2012;29(5):464–83.
- 23. Yang X-S, Slowik A. Firefly algorithm. Swarm intelligence algorithms: CRC Press; 2020. p. 163–74.
- 24. Mirjalili S. SCA: A Sine Cosine Algorithm for solving optimization problems. Knowledge-Based Systems. 2016;96:120–33.
- 25. Arora S, Singh S. Butterfly optimization algorithm: a novel approach for global optimization. Soft Computing. 2019;23:715–34.
- 26. Socha K, Dorigo M. Ant colony optimization for continuous domains. European journal of operational research. 2008;185(3):1155–73.
- 27. Ahmadianfar I, Asghar Heidari A, Gandomi AH, Chu X, Chen H. RUN Beyond the Metaphor: An Efficient Optimization Algorithm Based on Runge Kutta Method. Expert Systems with Applications. 2021:115079.
- 28. Su H, Zhao D, Asghar Heidari A, Liu L, Zhang X, Mafarja M, et al. RIME: A physics-based optimization. Neurocomputing. 2023.
- 29. Ahmadianfar I, Asghar Heidari A, Noshadian S, Chen H, Gandomi AH. INFO: An Efficient Optimization Algorithm based on Weighted Mean of Vectors. Expert Systems with Applications. 2022:116516.
- 30. Yang Y, Chen H, Heidari AA, Gandomi AH. Hunger games search: Visions, conception, implementation, deep analysis, perspectives, and towards performance shifts. Expert Systems with Applications. 2021;177:114864.
- 31. Yousri D, Allam D, Eteiba MB. Chaotic whale optimizer variants for parameters estimation of the chaotic behavior in Permanent Magnet Synchronous Motor. Applied Soft Computing. 2019;74:479–503.
- 32. Nenavath H, Jatoth RK. Hybridizing sine cosine algorithm with differential evolution for global optimization and object tracking. Applied Soft Computing. 2018;62:1019–43.
- 33. Adarsh B, Raghunathan T, Jayabarathi T, Yang X-S. Economic dispatch using chaotic bat algorithm. Energy. 2016;96:666–75.
- 34. Yong J, He F, Li H, Zhou W, editors. A novel bat algorithm based on collaborative and dynamic learning of opposite population. 2018 IEEE 22nd International Conference on Computer Supported Cooperative Work in Design (CSCWD); 2018: IEEE.
- 35. Liang H, Liu Y, Shen Y, Li F, Man Y. A hybrid bat algorithm for economic dispatch with random wind power. IEEE Transactions on Power Systems. 2018;33(5):5052–61.
- 36. Wu G, Mallipeddi R, Suganthan PN. Problem definitions and evaluation criteria for the CEC 2017 competition on constrained real-parameter optimization. National University of Defense Technology, Changsha, Hunan, PR China and Kyungpook National University, Daegu, South Korea and Nanyang Technological University, Singapore, Technical Report. 2017.
- 37. Derrac J, García S, Molina D, Herrera F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation. 2011;1(1):3–18.
- 38. Cortes C, Vapnik V. Support-vector networks. Machine learning. 1995;20:273–97.
- 39. Han Y, Chen W, Heidari AA, Chen H. Multi-verse Optimizer with Rosenbrock and Diffusion Mechanisms for Multilevel Threshold Image Segmentation from COVID-19 Chest X-Ray Images. Journal of Bionic Engineering. 2023;20(3):1198–262. pmid:36619872
- 40. Liu G, Ding Q, Luo H, Sha M, Li X, Ju M. Cx22: A new publicly available dataset for deep learning-based segmentation of cervical cytology images. Computers in Biology and Medicine. 2022;150:106194. pmid:37859287
- 41. Xia J, Zhang H, Li R, Wang Z, Cai Z, Gu Z, et al. Adaptive Barebones Salp Swarm Algorithm with Quasi-oppositional Learning for Medical Diagnosis Systems: A Comprehensive Analysis. Journal of Bionic Engineering. 2022:1–17.
- 42. Xia J, Zhang H, Li R, Chen H, Turabieh H, Mafarja M, et al. Generalized oppositional moth flame optimization with crossover strategy: an approach for medical diagnosis. Journal of Bionic Engineering. 2021;18(4):991–1010.
- 43. Hu L, Lin F, Li H, Tong C, Pan Z, Li J, et al. An intelligent prognostic system for analyzing patients with paraquat poisoning using arterial blood gas indexes. Journal of Pharmacological and Toxicological Methods. 2017;84:78–85. pmid:27884773
- 44. Zhang H, Liu T, Ye X, Heidari AA, Liang G, Chen H, et al. Differential evolution-assisted salp swarm algorithm with chaotic structure for real-world problems. Engineering with Computers. 2023;39(3):1735–69. pmid:35035007
- 45. Yu X, Qin W, Lin X, Shan Z, Huang L, Shao Q, et al. Synergizing the enhanced RIME with fuzzy K-nearest neighbor for diagnose of pulmonary hypertension. Computers in Biology and Medicine. 2023;165:107408. pmid:37672924
- 46. Xu L, Magar R, Barati Farimani A. Forecasting COVID-19 new cases using deep learning methods. Computers in Biology and Medicine. 2022;144:105342. pmid:35247764
- 47. Xie J, Yao B. Physics-constrained deep active learning for spatiotemporal modeling of cardiac electrodynamics. Computers in Biology and Medicine. 2022;146:105586. pmid:35751197
- 48. Su Q, Wang F, Chen D, Chen G, Li C, Wei L. Deep convolutional neural networks with ensemble learning and transfer learning for automated detection of gastrointestinal diseases. Computers in Biology and Medicine. 2022;150:106054. pmid:36244302
- 49. Zhou J, Wu Z, Jiang Z, Huang K, Guo K, Zhao S. Background selection schema on deep learning-based classification of dermatological disease. Computers in Biology and Medicine. 2022;149:105966. pmid:36029748
- 50. Mirjalili S, Mirjalili SM, Lewis A. Grey wolf optimizer. Advances in engineering software. 2014;69:46–61.
- 51. Heidari AA, Mirjalili S, Faris H, Aljarah I, Mafarja M, Chen H. Harris hawks optimization: Algorithm and applications. Future generation computer systems. 2019;97:849–72.
- 52. Hecht-Nielsen R. Theory of the backpropagation neural network. Neural networks for perception: Elsevier; 1992. p. 65–93.
- 53. Breiman L. Classification and regression trees: Routledge; 2017.
- 54. Breiman L. Random forests. Machine learning. 2001;45:5–32.
- 55. Freund Y, Schapire RE, editors. A desicion-theoretic generalization of on-line learning and an application to boosting. Computational Learning Theory: Second European Conference, EuroCOLT '95, Barcelona, Spain, March 13–15, 1995, Proceedings; 1995: Springer.
- 56. Mafarja M, Eleyan D, Abdullah S, Mirjalili S, editors. S-shaped vs. V-shaped transfer functions for ant lion optimization algorithm in feature selection problem. Proceedings of the international conference on future networks and distributed systems; 2017.
- 57. Mirjalili S, Lewis A. S-shaped versus V-shaped transfer functions for binary particle swarm optimization. Swarm and Evolutionary Computation. 2013;9:1–14.
- 58. Su H, Han Z, Fu Y, Zhao D, Yu F, Heidari AA, et al. Detection of pulmonary embolism severity using clinical characteristics, hematological indices, and machine learning techniques. Frontiers in Neuroinformatics. 2022;16. pmid:36590906
- 59. Dheeru D, Taniskidou EK. UCI machine learning repository. 2017.
- 60. Tomlinson M. Graduate employability: A review of conceptual and empirical themes. Higher education policy. 2012;25:407–31.
- 61. Huang X, Cao J, Zhao G, Long Z, Han G, Cai X. The employability and career development of finance and trade college graduates. Frontiers in Psychology. 2022;12:719336. pmid:35082712
- 62. Scurry T, Blenkinsopp J. Under‐employment among recent graduates: A review of the literature. Personnel Review. 2011;40(5):643–59.
- 63. Park S, Park SY. Career adaptability of South Korean engineering students: Personal and contextual influencing factors. European Journal of Training and Development. 2020;44(4/5):469–88.
- 64. Yang X, Wang R, Zhao D, Yu F, Heidari AA, Xu Z, et al. Multi-level threshold segmentation framework for breast cancer images using enhanced differential evolution. Biomedical Signal Processing and Control. 2023;80:104373.
- 65. Yang X, Wang R, Zhao D, Yu F, Huang C, Heidari AA, et al. An adaptive quadratic interpolation and rounding mechanism sine cosine algorithm with application to constrained engineering optimization problems. Expert Systems with Applications. 2023;213:119041.