Expected goals in football: Improving model performance and demonstrating value

doi:10.1371/journal.pone.0282295

Table 1.

League positions resulting in specific consequences for teams in each league.

Fig 1.

Kernel density estimate of the distance from which shots are taken, for shots that result in a goal and for those that result in a miss/save.
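As a rough illustration of how such a curve might be produced, the sketch below computes a shot's distance from pitch coordinates and builds a Gaussian kernel density estimate from a list of distances. The pitch convention (105 m x 68 m, goal centre at (105, 34)), the names `shot_distance` and `gaussian_kde`, and the bandwidth rule are all assumptions for illustration; the paper's own coordinate system and KDE settings are not stated here.

```python
import math
from statistics import stdev
from typing import Callable, Optional, Sequence

# Hypothetical pitch convention (not from the paper): a 105 m x 68 m pitch,
# attacking towards x = 105, with the centre of the goal mouth at (105, 34).
GOAL_X, GOAL_Y = 105.0, 34.0


def shot_distance(x: float, y: float) -> float:
    """Euclidean distance in metres from the shot location to the goal centre."""
    return math.hypot(GOAL_X - x, GOAL_Y - y)


def gaussian_kde(samples: Sequence[float],
                 bandwidth: Optional[float] = None) -> Callable[[float], float]:
    """Return a density estimate built from one Gaussian kernel per sample."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb: h = 1.06 * sigma * n^(-1/5)
        bandwidth = 1.06 * stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(t: float) -> float:
        # Sum of Gaussian bumps centred on each observed distance
        return norm * sum(math.exp(-0.5 * ((t - s) / bandwidth) ** 2)
                          for s in samples)

    return density
```

Fitting one estimate to the distances of goals and another to the distances of misses/saves, then evaluating both over a grid of distances, gives the kind of two-curve comparison described in Fig 1. `scipy.stats.gaussian_kde` is a ready-made alternative, although its default bandwidth rule differs slightly from the one assumed here.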

Fig 2.

Kernel density estimate of the angle from which shots are taken, for shots that result in a goal and for those that result in a miss/save.
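The caption does not define "angle". One common definition in expected goals work is the angle subtended by the two goalposts at the shot location, so that central shots close to goal see a wide angle. The sketch below assumes that definition and the same hypothetical 105 m x 68 m pitch with the goal centre at (105, 34); `shot_angle` is an illustrative name, not the paper's code.

```python
import math

# Hypothetical convention (not from the paper): goal centre at (105, 34).
GOAL_X, GOAL_Y = 105.0, 34.0
GOAL_WIDTH = 7.32  # distance between the posts in metres


def shot_angle(x: float, y: float) -> float:
    """Angle in radians subtended by the two goalposts at the shot location."""
    dx = GOAL_X - x  # distance to the goal line
    dy = y - GOAL_Y  # lateral offset from the goal centre
    # Angle between the vectors to each post: atan2(|cross product|, dot product)
    return math.atan2(GOAL_WIDTH * dx, dx * dx + dy * dy - (GOAL_WIDTH / 2) ** 2)
```

Using `atan2` rather than `atan` keeps the result correct for shots very close to the goal, where the subtended angle exceeds 90 degrees and the dot product becomes negative.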

Fig 3.

Heatmap of where shots that do not result in a goal are taken from.

Fig 4.

Heatmap of the location of successful shots.

Fig 5.

Kernel density estimate for the time a shot is taken (in seconds from the start of the match).

Fig 6.

Kernel density estimate of the player value.

Fig 7.

Kernel density estimate of the Elo rating.

Table 2.

Log loss test set scores for each league and model, before and after tuning (LR = logistic regression, RF = random forest, AB = AdaBoost, XGB = XGBoost).

The best score for each league is highlighted in bold.
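For reference, log loss (binary cross-entropy) for a set of shots is the mean of -[y log p + (1 - y) log(1 - p)], where y is 1 for a goal and p is the predicted expected goals value; lower is better. The function below is a minimal sketch of that formula; the clipping constant `eps` is an assumption added to avoid log(0). In practice `sklearn.metrics.log_loss` computes the same quantity.

```python
import math
from typing import Sequence


def log_loss(y_true: Sequence[int], p_goal: Sequence[float],
             eps: float = 1e-15) -> float:
    """Mean binary cross-entropy between outcomes (1 = goal) and predicted xG."""
    total = 0.0
    for y, p in zip(y_true, p_goal):
        p = min(max(p, eps), 1 - eps)  # clip probabilities to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)
```

Because log loss penalises confident wrong predictions heavily, a model that assigns, say, 0.9 xG to a shot that is saved is punished far more than one assigning 0.3.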

Table 3.

Summary of the results of our model compared to published models.

The ROC AUC for the optimal model in this research was computed on test data, and the model used players' FIFA ratings as a proxy for player ability.
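ROC AUC can be read as the probability that a randomly chosen goal receives a higher predicted xG than a randomly chosen non-goal, with ties counted as half. The sketch below computes it directly from that pairwise definition; `roc_auc` is an illustrative name, and the quadratic loop is only suitable for small samples (`sklearn.metrics.roc_auc_score` is the usual choice for real data).

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random goal outranks a random non-goal."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one goal and one non-goal")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

Unlike log loss, AUC depends only on how shots are ranked, not on whether the predicted probabilities are well calibrated.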

Fig 8.

Important features for the Premier League, ordered by importance.

In general, most of the models (with the exception of AdaBoost) performed relatively well on both training and test data; however, the MLP produced the best results on unseen data.

Fig 9.

Important features for the German Bundesliga using the optimal model (in this case a tuned XGBoost model).

We order the features by Gain: the improvement in predictive performance attained when the dataset is split on that variable.
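In XGBoost, the gain of a single split is computed from the summed gradients G and Hessians H of the loss on each side: 0.5 * [G_L^2/(H_L + λ) + G_R^2/(H_R + λ) - (G_L + G_R)^2/(H_L + H_R + λ)] - γ. The "gain" importance reported by `Booster.get_score(importance_type="gain")` is the average of this quantity over all splits that use the feature. The sketch below evaluates one split for log loss, where each shot's gradient is p - y and its Hessian is p(1 - p); the function name and the default `lam` and `gamma` values are assumptions for illustration, not the paper's settings.

```python
from typing import Sequence


def split_gain(y: Sequence[int], p: Sequence[float], go_left: Sequence[bool],
               lam: float = 1.0, gamma: float = 0.0) -> float:
    """Gain of one candidate split under log loss, following XGBoost's formula."""
    g = [pi - yi for yi, pi in zip(y, p)]  # first derivative of log loss
    h = [pi * (1 - pi) for pi in p]        # second derivative of log loss
    gl = sum(gi for gi, left in zip(g, go_left) if left)
    hl = sum(hi for hi, left in zip(h, go_left) if left)
    gr, hr = sum(g) - gl, sum(h) - hl

    def score(G: float, H: float) -> float:
        # Regularised structure score of a leaf with gradient sum G, Hessian sum H
        return G * G / (H + lam)

    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma
```

A split that separates goals from non-goals yields a large positive gain, while a split that sends every shot to the same side yields zero, which is why frequently useful, discriminating features rise to the top of the ranking.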

Table 4.

Test data results comparing the expected goals statistic with traditional metrics.

Fig 10.

Statistics from all leagues' data plotted against future average goals.

The differences between the metrics' ability to predict the future goal ratio are examined by plotting the best-fit line through each statistic's values (from all leagues combined) against average goals over the subsequent six matches, and calculating Pearson's r to determine the strength of the correlation.
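The two quantities in this comparison, a least-squares best-fit line and Pearson's r, can both be computed from the same centred sums. The sketch below is a minimal version; `fit_and_correlate` is an illustrative name, and x would hold one metric (for example a team's expected goals per match) while y holds the average goals over the next six matches. On Python 3.10+, `statistics.linear_regression` and `statistics.correlation` provide the same results.

```python
import math
from typing import Sequence, Tuple


def fit_and_correlate(x: Sequence[float],
                      y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, pearson_r) for the least-squares line y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))  # co-variation
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r
```

Running this once per metric and comparing the resulting r values ranks the metrics by how strongly they track future scoring, which is the comparison Fig 10 visualises.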