Comparing subjective and objective evaluations of player performance in Australian Rules football

Player evaluation plays a fundamental role in the decision-making processes of professional sporting organisations. In the Australian Football League, both subjective and objective evaluations of player match performance are commonplace. This study aimed to identify the extent to which performance indicators can explain subjective ratings of player performance. A secondary aim was to compare subjective and objective ratings of player performance. Inside Football Player Ratings (IFPR) and Australian Football League Player Ratings were collected as subjective and objective evaluations of player performance, respectively, for each player during all 1026 matches throughout the 2013–2017 Australian Football League seasons. Nine common player performance indicators, player role classification, player age and match outcomes were also collected. Standardised linear mixed model and recursive partitioning and regression tree models were undertaken across the whole dataset, as well as separately for each of the seven player roles. The mixed model analysis produced a model associating the performance indicators with IFPR at a root mean square error of 0.98. Random effects accounting for differences between seasons and players ranged by 0.09 and 1.73 IFPR each across the five seasons and 1052 players, respectively. The recursive partitioning and regression tree model explained IFPR exactly in 35.8% of instances, and to within 1.0 IFPR point in 81.0% of instances. When analysed separately by player role, exact explanation varied from 25.2% to 41.7%, and within 1.0 IFPR point from 70.3% to 88.6%. Overall, kicks and handballs were most associated with the IFPR. This study highlights that a select few features account for a majority of the variance when explaining subjective ratings of player performance, and that these vary by player role. 
Australian Football League organisations should utilise both subjective and objective assessments of performance to gain a better understanding of the differences associated with subjective performance assessment.


Introduction
Player evaluation plays a fundamental role in the decision-making processes of professional sporting organisations, including player monitoring, team selection and player contracting. Despite frequent studies in the team sport notational analysis literature looking to encourage the use of objective performance rating systems [10,19,20], very few studies have looked specifically at identifying the mechanisms behind subjective evaluation of individual performance in team sports. Pappalardo and colleagues [8] analysed human evaluations of elite soccer performance using performance indicators and contextual information relating to each match performance. The authors illustrated that subjective ratings of performance were biased towards specific performance indicators, as well as contextual factors such as the outcome of a game, and the expected outcome of a game as estimated by bookmakers. Their findings indicated that, in order to improve overall performance evaluations, player analysis should balance objective performance measures with subjective values such as qualitative insights into skill qualities. These findings echo those in other fields, which have shown that humans are susceptible to many errors and biases in decision making, and have limits to the amount of information they can comprehend [21,22].
In AF, the majority of research on evaluating player performance has had a specific focus on assessing performance indicators in order to explain or predict playing performance [11,12,23-26]. Further, AF research has also examined other areas, such as the relationship between performance indicators and match outcome [2,27,28], playing position [29,30], and trends in game-play [31].
This study aimed to identify the extent to which performance indicators can explain subjective ratings of player performance in the AFL. A secondary aim was to compare subjective and objective ratings of player performance. The rationale for this study was to identify the relationship between subjective ratings of performance and the most basic, readily comprehensible performance indicators, in order to add to the existing understanding of the extent to which human decisions are related to measurable aspects of a player's performance. The methodologies are expressed as an exemplar of what could be implemented within professional AF organisations using their own specific subjective rating processes. An understanding of these insights could be beneficial in supporting organisational decisions relating to weekly team selection, player recruitment, player contracting and financial remuneration; each of which has ramifications for team outcomes.

Data
Two separate measures of player performance were collected for each player during 1026 matches played throughout the 2013-2017 AFL seasons. This included 22 matches played by each team during the regular season, as well as a total of nine matches played throughout the finals series each season. One match was abandoned prior to play during the 2015 season. Further, the eight drawn matches that occurred throughout the 2013-2017 seasons were removed from the analyses.
The Inside Football Player Ratings (IFPR) were obtained from http://www.aflplayerratings.com.au. The IFPR is a subjective measure of player performance, rated continuously from zero to ten, based on human interpretation of a player's performance ('Inside Football' is the commercial publication for these publicly available player ratings). The ratings for each match were completed by a single AFL accredited journalist who was covering the game for Inside Football (most of whom had 10+ years in the industry). The journalist covering the game was at the ground in the majority of instances, and ratings were provided immediately post-match. The AFL Player Ratings were acquired from Champion Data (also available from http://www.afl.com.au/stats). This is an objective measure of player performance, rated on an open-ended continuous scale, and based on the principle of field equity [14]. The rating process is derived from contextual information collected in real time by trained Champion Data staff (corrected post-game), and is determined by how much each player's actions increase or decrease their team's expected value of scoring [14]. The validity and reliability of the data provided by Champion Data are not publicly available. However, previous research conducted in AF has reported the validity of the performance indicators collected by Champion Data as high [32], and the reliability (as determined by an external assessment) as very high (ICC range: 0.947-1.000 for the included performance indicators) [2]. Nine player performance indicators were collected from http://www.afl.com.au/stats for each player and match included in the dataset. These indicators were selected because they are widely reported and available, and have been previously reported in the literature [2,11,28]. These performance indicators and their definitions are outlined in Table 1.
Player role classifications were collected for each player, based on Champion Data's classification for each player at the end of each respective AFL season. These classifications are defined in Table 2. Additionally, each player's age for the corresponding season (range: 18 to 40) and the match outcome for each match (win or loss; dummy coded as 1 and 0, respectively) were also collected. See S1 Dataset for all data collected on players.

Statistical analysis
Descriptive statistics (mean and standard deviation) were calculated for each of the two player rating measures, as well as for each respective player role. To determine the variation between the two rating systems, as well as each of the playing roles, the coefficient of variation was calculated for each. To determine the level of association between the two player rating systems and each of the features univariately (all performance indicators, as well as age and match outcome), correlational analyses were undertaken. This analysis was undertaken using the Hmisc package [33] in the R statistical computing software version 3.3.2 [34], and visualised using a correlogram.
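The study computed these quantities in R (via the Hmisc package [33]); the quantities themselves are elementary and can be sketched in plain Python. The function names and toy values below are illustrative only, not taken from the study's data:

```python
from math import sqrt

def coefficient_of_variation(values):
    """CV as a percentage: sample standard deviation divided by the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return 100.0 * sd / mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-match ratings for one player (not real data)
ifpr = [5.0, 6.5, 4.0, 7.0, 5.5]          # subjective, 0-10 scale
afl_rating = [9.0, 12.0, 8.0, 16.0, 10.0] # objective, open-ended scale

cv_ifpr = coefficient_of_variation(ifpr)
r_between_systems = pearson_r(ifpr, afl_rating)
```

A higher CV for one rating system indicates proportionally greater spread around its mean, which is what allows the two systems to be compared despite their different scales.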
A linear mixed model analysis was undertaken to determine the extent to which each of the features explained IFPR. This approach was used to control for the variability created by the repeated measures on each player. The analysis was undertaken using the lme4 package [35]. All factors (besides position) were standardised and centred (mean = 0) prior to the analysis to allow for comparison of the Beta coefficients. In the model, player and season were treated as separate random effects, whilst all other factors were treated as fixed effects. A recursive partitioning and regression tree model [36,37] was undertaken as a secondary method to determine the extent to which each of the features explained IFPR. This analysis was undertaken using the rpart package, which implements the CART (classification and regression trees) algorithm [38]. A minimum of 100 cases was required for a node to split, and the complexity parameter was set at 0.001 in order to maximise the number of outcome variables in the model. These measures were employed to avoid overfitting and to produce a more parsimonious model. Data were split such that the 2013-2016 seasons were used to train the model, which was subsequently tested on the 2017 season. Results of the model were displayed using a tree visualisation and a histogram outlining the model accuracy. Additionally, the recursive partitioning and regression tree analysis was conducted first on the whole dataset and then separately for each of the seven player roles.

Table 1. Definitions of the Australian rules football performance indicators used in this study.

Kick: Disposing of the football with any part of the leg below the knee.
Handball: Disposing of the football by hitting it with the clenched fist of one hand, while holding it with the other.
Mark: Catching or taking control of the football after it has been kicked by another player a distance of at least 15 metres, without it touching the ground or being touched by another player.
Tackle: Taking hold of an opposition player in possession of the ball, in order to impede his progress or to force him to dispose of the ball quickly.
Free For: An infringement in favour of the player, as called by the umpire.
Free Against: An infringement against the player, as called by the umpire.
Hitout: A tap by a ruckman after a ball up or bounce by the umpire.
Goals: The maximum possible score (6 points), achieved by kicking the ball between the two goalposts without it touching a post or any player.
Behinds: A score worth one point, achieved by the ball crossing between a goalpost and a behind post, by the ball hitting a goalpost, or by the ball being touched prior to passing between the goalposts.
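The tree model itself was built with rpart in R, but the accuracy measures reported on the held-out 2017 season (exact agreement, and agreement within 1.0 rating point) reduce to a simple calculation over predicted and actual ratings. A minimal Python sketch, using a hypothetical helper (`rating_agreement`) and made-up predictions rather than the study's output:

```python
def rating_agreement(predicted, actual, tolerance=1.0):
    """Fraction of predictions matching the actual IFPR exactly, and the
    fraction falling within +/- `tolerance` rating points of the actual IFPR."""
    n = len(actual)
    exact = sum(1 for p, a in zip(predicted, actual) if p == a) / n
    within = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance) / n
    return exact, within

# Hypothetical terminal-node predictions vs. actual IFPR for a held-out season
predicted = [5, 6, 4, 7, 5, 8, 3, 6, 5, 7]
actual    = [5, 7, 4, 9, 5, 8, 2, 6, 4, 5]

exact, within = rating_agreement(predicted, actual)
```

Because a regression tree assigns every case in a terminal node the same predicted rating, "exact" agreement here means the node's value coincides with the journalist's rating, which is why the within-1.0-point figure is the more forgiving (and larger) of the two.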
A comparison of the IFPR and AFL Player Ratings was created for two specific players as a practical decision support application. Specifically, the deviation of each player's season mean ratings was compared to the overall sample mean for each rating system. This application allowed for a descriptive analysis and visualisation of the difference in evaluation between the subjective and objective systems.
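Because the two systems sit on different scales, the comparison uses each system's own overall sample mean as the reference point. A minimal sketch in Python, using the overall means reported later in this study (5.25 for the IFPR, 9.65 for the AFL Player Ratings) with hypothetical seasonal ratings for one player; the helper name is illustrative, not from the study:

```python
def deviation_from_sample_mean(player_season_ratings, overall_mean):
    """Deviation of a player's seasonal mean rating from the overall
    sample mean, keeping each rating system on its own scale."""
    season_mean = sum(player_season_ratings) / len(player_season_ratings)
    return season_mean - overall_mean

# Hypothetical match ratings for one player in one season (not real data)
ifpr_matches = [6.0, 6.5, 7.0, 6.5]     # subjective, 0-10 scale
afl_matches = [12.0, 15.0, 14.0, 13.0]  # objective, open-ended scale

# Overall sample means reported in the study: 5.25 (IFPR), 9.65 (AFL Player Ratings)
ifpr_dev = deviation_from_sample_mean(ifpr_matches, 5.25)
afl_dev = deviation_from_sample_mean(afl_matches, 9.65)
```

A player whose deviation is positive under one system but near zero (or negative) under the other is exactly the kind of discrepancy this application is designed to surface.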

Results
Descriptive statistics of each player role for both the IFPR and the AFL Player Ratings measures are presented in Fig 1. The overall mean and standard deviation of each rating system were 5.25 ± 1.73 for the IFPR and 9.65 ± 5.58 for the AFL Player Ratings. The coefficient of variation for each system was 32.9% and 57.8%, respectively. The results of the Pearson's correlation analysis indicated a moderate association (r = 0.60) between the AFL Player Ratings and the IFPR. Further, the IFPR and marks both showed moderate associations with kicks (r = 0.64 and r = 0.53, respectively). All of the remaining associations were r < 0.50 and are outlined in Fig 2. Fig 3 outlines the distribution of AFL Player Ratings across the various levels of IFPR, indicating that as the IFPR increases, the mean AFL Player Rating increases and the distribution becomes more spread.
The results of the linear mixed model are outlined in Table 3. All features except for frees against, behinds and age contributed significantly to the model (p < 0.001), with kicks and handballs having the highest Beta coefficients of 0.844 and 0.646, respectively. The model produced a root mean square error of 0.98 in association with the IFPR. The random effect accounting for the difference between seasons ranged by 0.09 IFPR across the five seasons, indicating minimal variation. The random effect accounting for differences between players ranged by 1.73 IFPR across the 1052 players, indicating that the mixed model varied substantially in its ability to explain player performance for all players. The full recursive partitioning and regression tree model is presented in Fig 4. Despite having 38 terminal nodes, only the features relating to ball disposal (kicks and handballs), scoring (goals and behinds), match outcome and hitouts contribute to the model. The splitting of the nodes within each branch indicates that a greater total count of each performance indicator results in a higher rating of performance, except for behinds. None of the terminal nodes explain the outcome variables zero, nine or ten. The results of this model are outlined in Fig 5 and display that the IFPR could be explained exactly in 35.8% of instances, and within 1.0 IFPR point 81.0% of the time. Positive x-axis values indicate that the model-expected IFPR was higher than the actual IFPR; conversely, negative x-axis values indicate that the model-expected IFPR was lower than the actual IFPR. S1-S7 Figs outline the separate recursive partitioning and regression tree models based on each player role. As with the full model, none of the terminal nodes explain the outcome variables zero or ten; however, the models based on Key Forwards and Midfielders do explain the outcome variable nine.
Further, the model based on Key Defenders also excludes the outcome variables one and eight. Each of the separate models included six or more features, with kicks and handballs featuring heavily in all. Kicks was the root node in all models except those for Rucks and Key Forwards, where hitouts and goals were the root node, respectively. The most notable additional changes from the full model were that goals featured frequently in the models for Key and General Forwards; marks featured frequently for Key and General Defenders, as well as Key Forwards; tackles for General Defenders, Key Forwards and Midfielders; and hitouts for Ruckmen. The accuracy for explaining IFPR exactly in these separate models varied from 25.2% for Key Defenders to 41.7% for Midfielders. The accuracy within 1.0 IFPR point either side varied from 70.3% for Key Defenders to 88.6% for Midfielders. Fig 6 outlines the distribution of IFPR and AFL Player Ratings for winning and losing teams across the five seasons. The abovementioned random effects accounting for player differences provide an indication of the individual players who were most consistently under- and over-rated as estimated by the linear mixed model, after adjusting for the fixed effect factors. Two individuals were selected, with a comparison of the subjective and objective evaluations of their performance undertaken as an exemplar of the application. Specifically, in order to compare their evaluations between the two rating systems on different scales, the deviation of their seasonal mean rating from the overall sample mean was calculated for each system. Table 4 outlines these deviations for the two respective players. Additionally, Figs 7 and 8 outline how this could be visualised for ease of interpretability in an applied setting.

Discussion
This study aimed to identify the extent to which performance indicators can explain subjective ratings of player performance. A secondary aim was to compare subjective and objective evaluations of player performance. To achieve the primary aim, two separate models were fit identifying the relationship between our exemplar subjective rating system, the IFPR, and the selected performance indicators. To achieve the secondary aim, a descriptive analysis and visualisation was conducted to outline the potential discrepancies noted between subjective and objective evaluations of player performance. Together, these methodologies are expressed as an exemplar of what could be implemented within professional AF organisations using their own specific subjective rating processes.
Inspection of the coefficient of variation for each playing role, and the descriptive statistics outlined in Fig 1, indicates that the distribution of ratings in the subjective IFPR system is more variable between the player role classifications than in the objective AFL Player Ratings system. In addition, in both rating systems the mean values for midfielders are higher than those for all other player roles. This aligns with the aforementioned biases noted within both AF and the wider team sport literature [12,16,17].
Both the linear mixed model and the recursive partitioning and regression tree model provide an objective view of how subjective analyses of performance are explained. Each of the models reflects the results of the other, and both outline that, when explaining subjective assessment of performance, a small number of features account for a large majority of the variance. The changes seen in the recursive partitioning and regression tree model once analysed separately by position support the notion that specific indicators differ between playing roles, indicating that controlling for player role when explaining player performance subjectively is important, to account for the roles specific to each positional group [39]. Further, both models display a negative association between behinds and expected IFPR, thus indicating that behinds might be viewed as inefficient. This is not surprising: though behinds contribute to team scoring, they also result in a loss of possession. The agreement levels outlined in both models indicate that the features used cannot, alone, fully explain the IFPR process. This may be because the features used cannot fully capture aspects of technical performance, or potentially because the subjective assessors of performance consider more in-depth performance actions or other contextual information (i.e., strength of opponent, expected match outcome), or are influenced by their own individual biases. The recursive partitioning and regression tree model provides a visual representation of the performance indicators that subjective raters tend to associate with better or worse performances.
This is particularly visible by conceptualising the explanations of the highest and lowest IFPR values within each of the trees (i.e., the limbs stemming from the root node to the highest or lowest outcome variable of each recursive partitioning and regression tree). For the more frequently occurring IFPR outcome variables, we observe that performance ratings can be explained in various ways, by various combinations of associated performance indicators. However, despite each recursive partitioning and regression tree (full model and player role specific models) incorporating six or more of these features, performances associated with the highest or lowest IFPR values are explained by just kicks, handballs and one or two other features for all player roles, except rucks, which has three other features. This explanation of performance associated with the highest and lowest ratings aligns with previous research, whereby subjective evaluation of performance has been shown to rely on the presence of noticeable features that are specific to a player's role and are easily brought to mind [8,40]. For example, a specific instance of a positively associated noticeable feature in this study is goals for key forwards, whereby the model can explain the subjective rating of performance for players who kick four or more goals, irrespective of any other features.
Applications of these models have the potential to be beneficial in supporting the decision-making processes of professional AF organisations. Figs 7 and 8 provide specific examples of how the subjective and objective evaluations of player performance outlined in Table 4 can be compared and visualised. Specifically, Fig 7 indicates that the player is objectively rated more highly across all four seasons in comparison to the subjective rating system. Conversely, Fig 8 indicates that whilst the subjective rating system shows the individual's performance has progressed across his four seasons, the objective rating system indicates that performance has remained very similar. Without the ability to unequivocally identify the reasons for these inconsistencies, this highlights the importance of considering both subjective and objective measures when evaluating player performance.
In an applied setting, these findings advocate for performance evaluators and key decision makers (i.e., coaches, player scouts) to utilise both types of evaluations, and to be aware of their differences. Further, these key decision makers need to be aware of the various reasons which could account for these differences, as well as the tendencies of the subjective performance assessors. As an example, the objective measure may not fully capture and account for certain aspects of the game, such as off-ball defensive acts, which would be important to know when evaluating individual players who have a specific role to negate an opposition player. Alternately, the subjective assessor may be prone to certain biases, such as a personal bias, and may consistently under- or over-rate certain players.
Some limitations of this study should also be noted. Though the mixed model approach was able to account for repeated measures in the dataset, the recursive partitioning and regression tree model did not. Despite this limitation, as the results of the linear mixed model indicated minimal effects from the repeated measures variables, the recursive partitioning and regression tree model was retained due to its interpretability in applied settings and its ability to identify non-linear trends. Another limitation is that not all available performance indicators were used to construct the models. Future research could look to include these, as well as other factors such as anthropometric features, to further analyse subjective ratings of player performance in AF. Specifically, future research should target the subjective ratings of key decision makers within applied sporting organisations (i.e., coaches and scouts), to further understand the validity and reliability of their organisational decision-making processes.

Conclusions
The models developed in this study provide an explanation of subjective analyses of performance in AF. Specifically, they demonstrate that subjective perceptions of performance can be explained with reasonable accuracy by considering a small number of performance indicators specific to a player's role. Further, though objective data and player performance measures continue to develop in both AF and the wider team sport literature, the results of this study support the notion that overall player performance evaluations should consider both subjective and objective assessments in a complementary manner to accurately evaluate player performance.