Ranking enables coaches, sporting authorities, and pundits to determine the relative performance of individual athletes and teams in comparison to their peers. While ranking is relatively straightforward in sports that employ traditional leagues, it is more difficult in sports where competition is fragmented (e.g. athletics and boxing), with not all competitors competing against each other. In such situations, complex points systems are often employed to rank athletes. However, these systems have the inherent weakness that they frequently rely on subjective assessments in order to gauge the calibre of the competitors involved. Here we show how two Internet derived algorithms, the PageRank (PR) and user preference (UP) algorithms, when utilised with a simple ‘who beat who’ matrix, can be used to accurately rank track athletes, avoiding the need for subjective assessment. We applied the PR and UP algorithms to the 2015 IAAF Diamond League men’s 100m competition and compared their performance with the Keener, Colley and Massey ranking algorithms. The top five places computed by the PR and UP algorithms, and the Diamond League ‘2016’ points system were all identical, with the Kendall’s tau distance between the PR standings and ‘2016’ points system standings being just 15, indicating that only 5.9% of pairs differed in their order between these two lists. By comparison, the UP and ‘2016’ standings displayed a weaker relationship, with a tau distance of 95, indicating that 37.5% of the pairs differed in their order. When compared with the standings produced using the Keener, Colley and Massey algorithms, the PR standings appeared to be closest to the Keener standings (tau distance = 67, 26.5% pair order disagreement), whereas the UP standings were more similar to the Colley and Massey standings, with the tau distances between these ranking lists being only 48 (19.0% pair order disagreement) and 59 (23.3% pair order disagreement) respectively.
In particular, the UP algorithm ranked ‘one-off’ victors more highly than the PR algorithm, suggesting that the UP algorithm captures alternative characteristics to the PR algorithm, which may be more suitable for predicting future performance in, say, knockout tournaments, rather than for use in competitions such as the Diamond League. As such, these Internet derived algorithms appear to have considerable potential for objectively assessing the relative performance of track athletes, without the need for complicated points equivalence tables. Importantly, because both algorithms utilise a ‘who beat who’ model, they automatically adjust for the strength of the competition, thus avoiding the need for subjective decision making.
Citation: Beggs CB, Shepherd SJ, Emmonds S, Jones B (2017) A novel application of PageRank and user preference algorithms for assessing the relative performance of track athletes in competition. PLoS ONE 12(6): e0178458. https://doi.org/10.1371/journal.pone.0178458
Editor: Wei-Xing Zhou, East China University of Science and Technology, CHINA
Received: December 9, 2016; Accepted: May 12, 2017; Published: June 2, 2017
Copyright: © 2017 Beggs et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The study data is presented in its entirety in the manuscript. It involves the results of ten sprint races.
Funding: The authors received no specific funding for this work.
Competing interests: Clive Beggs and Simon Shepherd have nothing to disclose. Stacey Emmonds holds a coaching position at Doncaster Belles women’s football team for which she receives financial support for research activities. Ben Jones has received financial compensation from Leeds Rugby for coaching and consultancy services, the Rugby Football Union for research, and holds a position with Rugby Football League for which he receives financial support for research activities. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
Ranking is an important task that enables coaches, applied scientists, sporting authorities, and pundits to determine the relative performance of individual athletes and teams in comparison to their peers or competitors. Within talent identification, ranking can also provide an objective assessment of the performance of young athletes relative to their peers, something that can be helpful when making important decisions about career progression [1, 2]. While ranking is a relatively straightforward task in sports that employ traditional leagues in which all the teams play each other (e.g. soccer and rugby), it is more difficult in sports that involve knock-out tournaments (e.g. tennis, international soccer) or are event based (e.g. athletics). With these, not all the athletes or teams play each other and as a result competition can become fragmented, making it difficult to assess performance relative to competitors. The situation is compounded by the fact that in many sports (e.g. athletics, tennis, golf) not all the athletes compete in every competition. Consequently, we can have the paradoxical situation where an athlete may appear to be performing well, having achieved several wins against low-ranking opposition, while a much better athlete, who has entered just a few competitions, is ranked far below them despite having only narrowly lost to opponents of the highest calibre. As such, this may lead to a ‘false positive’ identification (i.e. the identification of an athlete who is not as good as their ranking suggests) or a ‘false negative’ (i.e. failure to identify an athlete who may be better than their ranking suggests), both of which are undesirable. In situations where assessing the relative performance of competitors is difficult, developing a robust ranking system that accurately reflects the true performance of the respective athletes or teams represents a considerable mathematical challenge.
If a ranking system is too simplistic, then it will fail to capture the complexities of the system and will struggle to reflect the true performance of the athletes concerned, something that may cause it to become discredited. Aware of this, many sporting authorities employ complex points-based systems, which attempt to mirror the complexities associated with the competition structure. While these systems aim to be objective, they inevitably involve a degree of subjectivity when it comes to allocating the number of points to particular tournaments, with the result that the overall ranking process can be somewhat arbitrary. However, in recent years advances in computer science have yielded techniques, such as the Google PageRank (PR) algorithm, that have the potential to overcome this problem and make ranking a more objective process.
Accurate ranking of teams and athletes is something that poses a considerable challenge for many sporting authorities, and there is no clear consensus regarding the best approach that should be taken. In most sports, ranking is based on the accumulation of points awarded for performance during matches or tournaments. Leagues, such as those found in soccer and rugby, represent a classic example of this, with all the teams playing each other over the course of a season and the number of points accumulated indicating the respective standings. However, even within a league system such as this, the points system used can still vary between sports. For example: rugby league award two points for a win, one point for a draw and zero points for a loss; soccer award three points for a win, one point for a draw and zero points for a loss; while rugby union award two points for a win, one point for a draw and zero points for a loss, with the addition of a ‘bonus’ point system. Golf and tennis, although not league based, also employ a points system. However, unlike the relatively simple points system employed in the leagues associated with the aforementioned sports, golf and tennis employ complex models which calculate the number of points accumulated in a rolling period. In golf and tennis the points awarded for the various competitions will tend to vary depending on the prestige of the event and the perceived strength of the competitors involved. Prestigious international tournaments will naturally yield more points in comparison to smaller local events. In athletics the International Association of Athletics Federations (IAAF) also employs a complex points system, which takes into account the measured result of athletes and their placings, together with the prestige and quality of the event in which they are competing. The process is complex and relies on published tables (e.g. Ref: IAAF Scoring Tables of Athletics) to compute the respective points scored.
While these point based systems are generally able to differentiate high performers from those who are more mediocre, they tend to be over-complex, difficult to understand, and can be somewhat arbitrary in their points allocation. As such, they are a relatively ‘blunt instrument’ and are therefore of only limited value to coaches seeking to assess the true performance of both developing and high-performance athletes.
In recent years graph theory has yielded a number of algorithms that have proved remarkably successful in computer science. Perhaps the most pre-eminent of these is the PR algorithm used to power the Google search engine. This algorithm, first developed at Stanford University by Larry Page and Sergey Brin in 1996 [9, 10], enables Google’s search engine to measure the relative importance of every web page on the Internet. It does this by computing an adjacency matrix, which it then uses to determine the PR of each individual web page. The PR of any particular web page is calculated based on the quantity and PR quality of the incoming links to it. For any given page, the higher the PR of the incoming links, and the fewer outbound links associated with the page, the higher the PR given to the web page. Such is the robustness and power of the algorithm, that, for any given query, Google is able to rapidly rank web sites in order of importance, despite the huge complexity of the Internet (i.e. many millions of web pages). As such, the PR algorithm appears to have great potential as a tool for ranking athletes and sporting teams. In particular, because the algorithm can automatically evaluate relationships between athletes/teams irrespective of the quality of the tournament, it has the potential to remove much of the arbitrary decision-making currently associated with many ranking methodologies. However, despite its potential, surprisingly few studies have investigated the use of the PR algorithm in a sporting context. For example, Zack et al and Govan et al [12, 13] both used the PR algorithm to rank the performance of teams in the National Football League (NFL) in the USA, while Lazova and Basnarkov used it to rank international soccer teams using Fédération Internationale de Football Association (FIFA) World Cup data. Pena and Touchette and Brandt and Brefeld also used the PR algorithm, but did so to rank the performance of individual soccer players.
Others have adapted the PR algorithm in an attempt to utilize it for prediction purposes. However, these studies have focused almost exclusively on sports in which games consist of two teams playing each other, with the result that the potential for the PR algorithm in athletics has been overlooked. Unlike team sports where each game involves only two teams, athletic events involve many individuals competing at the same time, making assessment more challenging for some ranking algorithms. However, this is not a problem for the PR algorithm, which was specifically designed to assess complex networks involving numerous interactions. We therefore designed the study presented here, with the specific aim of using the PR algorithm to evaluate the relative performance of male 100m sprinters throughout the course of the 2015 IAAF Diamond League season. These races were selected for investigation because: (i) they represented a closed system with well-defined outcomes; and (ii) the competition exhibited considerable asymmetry, with some athletes (e.g. Mike Rogers, Nesta Carter) competing many times, while others (e.g. Marvin Bracy, Usain Bolt) ran only once. As such, the system posed considerable challenges from a ranking standpoint, making it an ideal context with which to evaluate the PR algorithm. In addition, we adapted an Internet retail user preference (UP) algorithm, which we also used to rank the athletes. UP algorithms have some attributes that are well suited to sporting events where many individuals compete at the same time, and so we were interested to know if they could be adapted to successfully rank track athletes.
In order to assess the performance of the Internet derived algorithms, we also used three well-known sports ranking systems, the Colley, Massey and Keener algorithms, for comparative purposes. These algorithms were originally developed for use in team sports where matches consist of two teams playing each other. We therefore had to adapt these algorithms so that they could be applied to track athletics.
The paper reports on a project that involved analysis of publicly available secondary data. Ethical approval for the project was granted by the Research Ethics Board of Leeds Beckett University.
Public domain data from the IAAF Diamond League website were collated for the ten Diamond League events (i.e. Doha, Eugene, Rome, Birmingham, New York, Paris, Lausanne, Monaco, London and Bruxelles) that included a male 100m race during the 2015 season. For each event, only the results of the A race were utilised, with the times and placings compiled into a single dataset (Table 1). The data were evaluated using firstly the PR algorithm, and then a UP algorithm, developed for Internet shopping, which we adapted to evaluate the track events. We also used adapted versions of the Colley, Massey and Keener algorithms to assess the data. All the algorithms were executed using bespoke ‘in-house’ programs written in Matlab (version R2016b; Mathworks, Natick, USA) and ‘R’ (version 3.3.3; open source statistical software).
B. PageRank algorithm
The PR algorithm was developed by Larry Page and Sergey Brin, the founders of the company Google, to rank web pages on the Internet. The algorithm utilises graph theory, in so much that it relies on a directed adjacency matrix, A, which describes the relationships between web pages. In the graph described by the adjacency matrix, the web pages are nodes and the links between the web pages are the directed edges. The number of links leaving from any given node is called the ‘out-degree’ of that node. These can be viewed as ‘votes’ cast by that web page in favour of other web pages. With the PR algorithm, the rank of any given web page is dependent on how many other pages ‘vote’ for it (i.e. are linked to it), and the rank of these other pages. A web page is considered important if it is pointed to by other web pages of importance.
With highly inter-connected webs such as the Internet, there is a high degree of recursiveness, with web pages often linked to each other in both directions, so that the rank of one page often depends on the rank of the other and vice versa. To solve this type of system, an iterative procedure is required to determine the rank of all the pages in the web. In this process the PR of a web page, P, after k+1 iterations is: $$r_{k+1}(P) = \sum_{Q \in B_P} \frac{r_k(Q)}{|Q|} \tag{1}$$ where r_k(P) is the rank of the web page, P, after k iterations; B_P is the set of all the web pages pointing to P; and |Q| is the out-degree of a web page Q. In order to complete the iterative procedure it is necessary to initialise the system by setting the initial ranks of P and Q to 1/n, where n is the number of web pages in the network.
In order to explain the matrix algebra required to solve the PR problem, we consider the small web shown in Fig 1. In the adjacency matrix, A, summarising the graph structure of this web, if an edge (i.e. link) exists from node Pi to node Pj, then 1 is inserted in the matrix, otherwise 0 is inserted. The resulting adjacency matrix, A, is:(2)
Matrix A is then divided by the number of out-degrees per page to produce the hyperlink matrix, H.(3)
Because the PR model requires a stochastic matrix in which all the row sums are equal to 1, it is necessary to adjust any rows containing all zeros (i.e. rows relating to nodes with no out-degrees). This is done by replacing any zero rows in H with 1/n to produce the stochastic matrix, S.(4)
Finally, in order to ensure that the graph is strongly inter-connected and irreducible, the stochastic matrix, S, is modified using Eq (5) to produce the Google matrix, G: $$G = \alpha S + (1 - \alpha) E \tag{5}$$ where α is a damping factor, usually set at 0.85; and E is an [n×n] matrix populated entirely with the value 1/n.(6)
The vector containing the final PR scores, q, is then computed using the power method shown in Eq (7): $$q = z_0 G^k \tag{7}$$ where q is a [1×n] vector containing the PR scores; z_0 is a [1×n] vector containing the initial estimated PR scores, generally populated entirely with the value 1/n; and k is the number of iterations necessary to reach convergence.
For the small web in Fig 1, after convergence the PR vector, q, is: (8)
From q it can be seen that node 3 has the highest PR, because it had no out-degrees, whereas node 1 was ranked lowest, reflecting the fact that it has 50% more links leaving the node than entering it.
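The steps described above (hyperlink matrix, stochastic matrix, Google matrix and power method) can be sketched in a few lines of code. The following is a minimal illustration in Python rather than the Matlab and ‘R’ programs used in the study; the function name `pagerank`, the convergence tolerance and the four-node example web are assumptions made purely for illustration, and the example web is not the web shown in Fig 1.

```python
def pagerank(adj, alpha=0.85, tol=1e-12, max_iter=1000):
    """Return PR scores for an adjacency matrix given as a list of rows.

    adj[i][j] > 0 means node i links to (i.e. 'votes' for) node j.
    """
    n = len(adj)
    # Hyperlink matrix H: divide each row by its out-degree. Rows with no
    # out-links are replaced by 1/n, which gives the stochastic matrix S.
    S = []
    for row in adj:
        out = sum(row)
        S.append([x / out for x in row] if out > 0 else [1.0 / n] * n)
    # Google matrix G = alpha*S + (1 - alpha)*E, every entry of E being 1/n.
    G = [[alpha * S[i][j] + (1.0 - alpha) / n for j in range(n)]
         for i in range(n)]
    # Power method: start from z0 = [1/n, ..., 1/n] and repeat q <- q G
    # until the change between successive iterates is below tol.
    q = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(q[i] * G[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, q)) < tol:
            return nxt
        q = nxt
    return q


# Hypothetical four-node web (not the web in Fig 1).
# The node with index 2 has no out-links; the node with index 3 has no in-links.
A = [
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
scores = pagerank(A)
```

In this hypothetical web the node with no out-links and three in-links receives the highest score, while the node that nobody links to receives the lowest, which mirrors the behaviour described for the small web above.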
C. Application of the PageRank algorithm
In order to apply the PR algorithm to the Diamond League event, we constructed a [33×33] adjacency matrix for all the competing athletes, which we updated after every race. For each race, any athlete who beat another received a ‘vote’ of 1 from the beaten athlete. So in a race comprising eight athletes, the winner would receive seven ‘votes’ from the other athletes, while the athletes who came second and third would receive six and five ‘votes’ respectively. This pattern would continue to the second to last athlete, who would receive one ‘vote’ from the last athlete, who would of course receive no ‘votes’. Where two athletes tied for a position, they were each considered to have voted for each other. After every race, the adjacency matrix was updated with the new ‘votes’ added to the existing ‘votes’ from the previous races. As such, the adjacency matrix maintained a precise updated summary of ‘who beat who’.
The updated adjacency matrices were analysed using an ‘in-house’ PR algorithm, which was used to generate vectors containing the PR scores, together with network graphs showing the connectivity between the athletes. In keeping with previous work, when computing the PR scores, we set the damping factor, α, to 0.85.
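The voting scheme described in this section can be illustrated with a short sketch. The Python code below is not the in-house program used in the study; the function name `who_beat_who`, the representation of a race as an ordered list of finishing groups, and the athletes and results shown are all hypothetical and chosen only to make the example self-contained.

```python
def who_beat_who(races, athletes):
    """Accumulate a 'who beat who' vote matrix over a list of races.

    races: list of races; each race is a list of finishing groups in
           finishing order, a group being a list of athletes who share
           a position (normally a single athlete).
    Returns V, where V[i][j] is the number of 'votes' athlete i has cast
    for athlete j, i.e. how often j finished ahead of (or tied with) i.
    """
    idx = {name: k for k, name in enumerate(athletes)}
    n = len(athletes)
    V = [[0] * n for _ in range(n)]
    for race in races:
        for pos, group in enumerate(race):
            # Every athlete in this group beat everyone in the later groups.
            for later in race[pos + 1:]:
                for winner in group:
                    for loser in later:
                        V[idx[loser]][idx[winner]] += 1
            # Tied athletes are treated as having voted for each other.
            for a in group:
                for b in group:
                    if a != b:
                        V[idx[a]][idx[b]] += 1
    return V


# Hypothetical athletes and results, for illustration only.
athletes = ["A", "B", "C", "D"]
races = [
    [["A"], ["B"], ["C"], ["D"]],  # race 1: A wins, D is last
    [["B"], ["A", "C"]],           # race 2: B wins, A and C tie for second
]
V = who_beat_who(races, athletes)
# Column sums give the total number of 'votes' received by each athlete.
votes_received = [sum(V[i][j] for i in range(len(athletes)))
                  for j in range(len(athletes))]
```

Because the matrix is simply accumulated race by race, calling the function on the first k races reproduces the state of the ‘who beat who’ summary after race k.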
D. User preference algorithm
Internet based retailers frequently rely on a ‘five star’ rating system to assess user preferences. This performs two functions; firstly it enables users to rate particular products; and secondly it allows similar products to be ranked so that retailers can compare their ‘performance’. This can be done by compiling a skew-symmetric weighted adjacency matrix from UP data, as described by Langville and Meyer. Once this matrix is defined, the product ratings can easily be assessed using: $$b = \frac{K e}{n} \tag{9}$$ where n is the number of products being assessed; b is a [n×1] vector containing the UP ranks for the various products; K is a [n×n] skew-symmetric adjacency matrix; and e is a [n×1] vector populated entirely with ones.
In order to explain the methodology associated with a typical user preference rating system, we consider the example below, in which six users (u1 … u6) rate four films (f1 … f4) to produce a UP matrix, U, in which the rows denote the users and the columns denote the films. From this we can see that user u5, for example, has given film f1 a rating of five stars (the highest rating score) and a rating of three stars to film f4.(10)
The graph associated with matrix, U, is shown in Fig 2. From this the paired [n×n] matrix K can be derived, with the numerical values of K representing the average of the score differences between the respective films and the signs representing the direction of these differences.
The nodes represent the films and the edges represent the user ratings, with edge scores representing the numerical difference between the user ratings for the two nodes.
Once matrix, K, is formulated then the rating vector, b, can be derived using Eq (9), as follows: (12)
From b it can be seen that film f2 is ranked first, followed by f1, f4, and lastly f3.
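The construction of K and the rating vector b can be sketched as follows. This Python example is illustrative only: the function name `up_ratings`, the use of `None` for an unrated item, and the three-user, three-item ratings matrix are assumptions and do not reproduce the matrix U of Eq (10). Each entry of K is taken as the mean score difference over the users who rated both items, in line with the description above.

```python
def up_ratings(U):
    """Rate items from a user-by-item star matrix U (None = not rated).

    K[i][j] is the mean of (rating_i - rating_j) over the users who rated
    both items, so K is skew-symmetric. The rating vector is b = K e / n.
    """
    n = len(U[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            diffs = [u[i] - u[j] for u in U
                     if u[i] is not None and u[j] is not None]
            if i != j and diffs:
                K[i][j] = sum(diffs) / len(diffs)
    # K e is the vector of row sums of K, so b is the row sums divided by n.
    b = [sum(row) / n for row in K]
    return K, b


# Hypothetical ratings (not the matrix of Eq (10)): 3 users, 3 items.
U = [
    [5, 3, None],
    [4, None, 1],
    [None, 2, 2],
]
K, b = up_ratings(U)
```

Here the first item is rated two stars above the second and three stars above the third by the users who rated both, so it receives the largest score in b; because K is skew-symmetric, the entries of b sum to zero.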
E. Application of the user preference algorithm
In order to apply the UP algorithm to the Diamond League event, we constructed a [10×33] UP matrix in which the races represented the individual ‘user evaluations’. Because the races contained varying numbers of contestants (i.e. some races comprised nine runners, while others had only seven), we decided to use a reverse scoring system, in which the athletes who came first, second and third received one, two and three ‘stars’ respectively, with the number of ‘stars’ incrementally increasing for the lower finishing places. In this way, the number of athletes competing in any given race had minimal impact on the overall rankings, because the winner would always receive one ‘star’, with the second athlete always receiving two ‘stars’, and so on. The rankings were derived for the individual athletes using the methodology described above. Because we used a reverse scoring system, the final ranking vector, b, was multiplied by –1 in order to make the ranking scores positive for the leading athletes.
F. Keener algorithm
In 1993, James Keener proposed a ranking method that utilized a non-negative matrix constructed using the results of games between competing teams. Although metrics such as win ratio can be used to construct Keener's matrix, X, it is generally more common to use the number of points or goals that team i scored against team j. Indeed, Keener suggested using a score ratio that satisfies Laplace’s rule of succession as follows: $$a_{ij} = \frac{S_{ij} + 1}{S_{ij} + S_{ji} + 2} \tag{13}$$ where a_ij is the value of the statistic produced when team i competes against team j; S_ij is the number of points team i scores against team j; and S_ji is the number of points team j scores against team i.
Since the Keener method utilises game scores to compute ratings, it can be susceptible to bias when teams accumulate high scores. In order to compensate for this Keener suggested applying the non-linear skewing function: $$h(x) = \frac{1}{2} + \frac{1}{2}\,\mathrm{sgn}\!\left(x - \frac{1}{2}\right)\sqrt{|2x - 1|} \tag{14}$$ to each computed value, a_ij, to generate a non-negative matrix that is irreducible provided that enough games have been played between the teams. By exploiting the Perron-Frobenius theorem and using Eq (15), Keener was able to compute the Perron vector (i.e. a unique vector derived from the eigenvector corresponding to the largest eigenvalue of the Keener matrix), which contains the positive ratings of all the teams involved in any given competition: $$X r_k = \lambda r_k \tag{15}$$ where X is the Keener matrix; r_k is the Perron rating vector; and λ is the proportionality constant.
In order to explain the methodology associated with the Keener algorithm, let us consider a hypothetical mini-league in which six soccer teams (Arsenal, Chelsea, Liverpool, Stoke, Swansea and Tottenham) compete. If we assume that each team has played three matches (results shown in Table 2), then the following ‘goals scored’ adjacency matrix, L, can be constructed in which the rows denote the goals scored by the respective teams and the columns denote goals conceded.(16)(17)
Finally, by solving Eq (15), the Perron rating vector, rk, can be computed, which indicates that the algorithm ranks the teams in the following order: Liverpool, Tottenham, Arsenal, Chelsea, Stoke and Swansea.(18)
G. Application of the Keener algorithm
In order to apply the Keener algorithm to the Diamond League event, we constructed a [33×33] adjacency matrix for all the competing athletes, which we populated with data from all ten races. In order to ensure consistency with the other ranking techniques used, we adopted a ‘who beat who’ strategy, in which each athlete who beat another athlete received a ‘vote’ of 1 from the defeated athlete, in a similar manner to the PR algorithm. In the event of a tie between two athletes, a ‘vote’ of 0.5 was awarded to each athlete. Eq (13) was then used to compute the aij statistic, which in this case represented the adjusted win ratio, with sij being the number of times athlete i beat athlete j, and sji being the number of occasions on which athlete j beat athlete i. For the Diamond League application, the win ratio was used rather than the score ratio, because we felt that this best reflected the ‘who beat who’ approach used in the PR algorithm.
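A win-ratio version of the Keener calculation can be sketched as below. This Python code is an illustration, not the in-house implementation: the function name `keener_ratings`, the fixed number of power iterations, the choice to leave entries at zero for pairs who never met, and the three-competitor win counts are all assumptions. The skewing function is the standard form proposed by Keener, and the Perron vector is approximated by power iteration and normalised to sum to one.

```python
import math


def keener_ratings(wins, iters=500):
    """Keener ratings from win counts; wins[i][j] = times i beat j."""
    n = len(wins)

    def h(x):
        # Keener's non-linear skewing function.
        sign = math.copysign(1.0, x - 0.5)
        return 0.5 + 0.5 * sign * math.sqrt(abs(2.0 * x - 1.0))

    X = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            met = wins[i][j] + wins[j][i]
            # Assumption: pairs who never met contribute nothing.
            if i != j and met > 0:
                # Adjusted win ratio (Laplace's rule of succession),
                # followed by the skewing function.
                a = (wins[i][j] + 1.0) / (met + 2.0)
                X[i][j] = h(a)
    # Power iteration towards the Perron vector of X.
    r = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(X[i][j] * r[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        r = [v / total for v in nxt]
    return r


# Hypothetical win counts: competitor 0 beat 1 twice and 2 once;
# competitor 1 beat 2 once.
wins = [
    [0, 2, 1],
    [0, 0, 1],
    [0, 0, 0],
]
r = keener_ratings(wins)
```

Power iteration only converges when the matrix is irreducible and aperiodic, which is the case here because every pair of competitors has met; with sparser data a dedicated eigenvalue routine would be the safer choice.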
H. Colley algorithm
Unlike the Keener method, which takes into account the outcome of individual matches between teams, the Colley algorithm utilises only the total number of wins, losses and games played to rank competing teams. It was developed by Wesley Colley in 2002 to rate teams in match-orientated sports and utilizes Laplace’s rule of succession to produce a [n×1] vector, v, (where n is the number of competing teams) using Eq (19): $$v_i = 1 + \frac{w_i - l_i}{2} \tag{19}$$ where v_i is the combined ‘score’ of the ith team; w_i is the number of wins of the ith team; and l_i is the number of losses of the ith team.
Colley’s method solves the linear system: $$C r_c = v \tag{20}$$ where r_c is the Colley rating vector, which defines the ranking of the teams; and C is the Colley coefficient matrix defined as: $$C_{ij} = \begin{cases} 2 + p_i & \text{if } i = j \\ -p_{ij} & \text{if } i \neq j \end{cases} \tag{21}$$ where p_i is the total number of times team i has played; and p_ij is the number of matches played between teams i and j.
If the Colley algorithm is applied to the hypothetical mini-soccer league outlined above (see Table 2), then the Colley matrix, C, is: (22) and the vector, v, is: (23)
By solving Eq (20), the Colley rating vector, rc, can be computed, which indicates that the Colley algorithm ranks the teams in the following order: Liverpool, Chelsea, Tottenham, Arsenal, Stoke and Swansea.(24)
I. Application of the Colley algorithm
In order to apply the Colley algorithm to the Diamond League event, we constructed a [33×33] adjacency matrix in the same manner as used with the Keener algorithm. We used this to calculate, for each athlete, the total number of contests won and lost against the other athletes, which we then used to compute the vector, v, using Eq (19). The Colley matrix was compiled as described above, with the number of ‘matches’ being the number of times the respective athletes ran against each other.
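The Colley calculation applied to pairwise ‘who beat who’ counts can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names `solve` and `colley_ratings` and the three-competitor win counts are hypothetical, and the small Gaussian elimination routine simply stands in for whatever linear solver is available (for example the backslash operator in Matlab).

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def colley_ratings(wins):
    """Colley ratings from win counts; wins[i][j] = times i beat j."""
    n = len(wins)
    C = [[0.0] * n for _ in range(n)]
    v = [0.0] * n
    for i in range(n):
        w = sum(wins[i])                          # total wins of i
        l = sum(wins[j][i] for j in range(n))     # total losses of i
        v[i] = 1.0 + (w - l) / 2.0
        for j in range(n):
            if i != j:
                C[i][j] = -(wins[i][j] + wins[j][i])  # -p_ij
        C[i][i] = 2.0 + w + l                         # 2 + p_i
    return solve(C, v)


# Hypothetical win counts: competitor 0 beat 1 twice and 2 once;
# competitor 1 beat 2 once.
wins = [
    [0, 2, 1],
    [0, 0, 1],
    [0, 0, 0],
]
rc = colley_ratings(wins)
```

A useful sanity check on any implementation is that Colley ratings always average 0.5, so for three competitors the ratings sum to 1.5.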
J. Massey algorithm
In 1997 Kenneth Massey proposed a ranking model which used a least squares approach to solve a system of linear equations expressing the relationship between team ratings and the margin of victory. Massey’s method involved constructing a [m×n] matrix, W, recording the outcomes of m matches between n teams. Matrix, W, is populated according to the following rules, where w_ki is an indicator variable for the outcome of the kth game for team T_i: $$w_{ki} = \begin{cases} 1 & \text{if team } T_i \text{ wins the } k\text{th game} \\ -1 & \text{if team } T_i \text{ loses the } k\text{th game} \\ 0 & \text{otherwise} \end{cases} \tag{25}$$
Massey used matrix, W, and a vector, y, containing the margins of victory, to solve the following normal equation, in which r_m is the vector of unknown ratings: $$W^T W r_m = W^T y \tag{26}$$
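Conveniently, Massey was able to simplify Eq (26) to: $$M r_m = d \tag{27}$$ where the Massey matrix, M, is: $$M_{ij} = \begin{cases} p_i & \text{if } i = j \\ -p_{ij} & \text{if } i \neq j \end{cases} \tag{28}$$ p_i is the total number of games played by team i; p_ij is the number of matches played between teams i and j; and d is the vector of cumulative points differentials: $$d = W^T y \tag{29}$$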
If Massey’s method is applied to the mini-soccer league example (see Table 2), then the Massey matrix, M, becomes: (30) and the vector of cumulative goal differentials, d, is: (31)
Because Eq (27) does not necessarily have a unique solution, Massey proposed a workaround solution [18, 20], which involved replacing the last row of the matrix, M, with a row of ones, and the last row of the vector, d, with a zero as follows, thus forcing the unique solution for rm shown in Eq (33).(32)(33)
From vector, rm, it can be seen that the Massey method ranks the teams in exactly the same order as the Colley algorithm, namely: Liverpool, Chelsea, Tottenham, Arsenal, Stoke and Swansea.
K. Application of the Massey algorithm
In order to apply the Massey algorithm to the Diamond League event, we constructed a [33×33] adjacency matrix for all the competing athletes, which we populated with data from all ten races. Since the Massey method was developed for competitions in which teams play paired matches, we adopted a strategy that mimicked the scoring in a soccer match. So in a race comprising eight athletes, the winner was deemed to have scored seven ‘goals’ more than the last athlete, who was deemed to have scored no ‘goals’. Likewise, the second and third athletes in the race were deemed to have scored six and five ‘goals’ respectively. This pattern continued until the second to last athlete, who scored only one ‘goal’ more than the last athlete. We used this matrix to calculate the cumulative points differential score for each athlete, which we compiled into vector, d. The Massey matrix was compiled as described above, with the number of ‘matches’ being the number of times the respective athletes ran against each other.
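One way of turning race results into a Massey system is sketched below. This Python example is an interpretation rather than the in-house program: it assumes that every pair of athletes in a race is treated as one ‘match’ whose margin is the difference in their ‘goals’ (i.e. the difference in finishing places), and the function names `solve` and `massey_ratings`, together with the athletes and results, are hypothetical. The last row of the system is replaced by a row of ones, as in Massey's workaround, so that the ratings sum to zero.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def massey_ratings(races, athletes):
    """Massey ratings; each race is a list of names in finishing order."""
    idx = {name: k for k, name in enumerate(athletes)}
    n = len(athletes)
    M = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for race in races:
        m = len(race)
        # Winner scores m - 1 'goals', last place scores none.
        goals = {name: m - 1 - pos for pos, name in enumerate(race)}
        for a in race:
            for b in race:
                if a != b:
                    i, j = idx[a], idx[b]
                    M[i][i] += 1.0                # p_i
                    M[i][j] -= 1.0                # -p_ij
                    d[i] += goals[a] - goals[b]   # cumulative differential
    # Massey's workaround: force the ratings to sum to zero.
    M[-1] = [1.0] * n
    d[-1] = 0.0
    return solve(M, d)


# Hypothetical athletes and results, for illustration only.
athletes = ["A", "B", "C"]
races = [["A", "B", "C"], ["B", "A"]]
rm = massey_ratings(races, athletes)
```

For these hypothetical results the cumulative differentials before the workaround are 2, 1 and –3, and solving the modified system gives ratings of 0.6, 0.4 and –1.0 for A, B and C respectively.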
L. Statistical analysis
The final rankings produced by the various algorithms were compared with each other and also with the standings computed using the official IAAF Diamond League points system. In the 2015 season, for all the races, except the final meeting in Bruxelles, four points were awarded for first place, two points for second place, and one for third place. In the final race in Bruxelles the points awarded were double those awarded for the other races. However, in 2016 the IAAF changed its points system to accommodate lesser positions. Under the ‘2016’ points system, ten points were awarded for a win, six for second, four for third, three for fourth, two for fifth, and one for sixth. As in the ‘2015’ points system, double points were awarded for the final race. Because the two IAAF points systems reflected different attributes, we used both methods to compute alternative final standings for the athletes, which we then compared with the standings produced using the various algorithms described above.
In order to compare the standings produced by the various ranking systems, we calculated the Kendall’s tau rank distances between the respective standings for the top 23 athletes (i.e. the number of athletes ranked by the Diamond League ‘2016’ points system), which we then normalized (Eq (34)) to compute the percentage of pairs that differed in order between the ranking lists. Kendall’s tau distance is a metric that counts the number of pair order disagreements between two ranking lists. It can be normalized to yield the fraction of discordant pairs as follows: $$\tau_{dist} = \frac{N_d}{N(N-1)/2} \tag{34}$$ where τ_dist is the normalized Kendall’s tau distance; N is the number of items in each ranking list; and N_d is the number of discordant pairs.
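The tau distance and its normalization can be illustrated with a short sketch. The Python function below is an assumption-laden illustration (the name `kendall_tau_distance`, the dictionary representation of a ranking, and the four-item example lists are hypothetical); pairs that are tied in either list are simply not counted as discordant, which is one of several possible conventions. The final line checks the arithmetic for 23 athletes, for which there are 23 × 22 / 2 = 253 pairs, so a tau distance of 15 corresponds to 15/253, or about 5.9% of pairs.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Return (Nd, Nd / (N(N-1)/2)) for two rankings of the same items.

    rank_a, rank_b: dicts mapping each item to its position (1 = best).
    """
    items = list(rank_a)
    nd = 0
    for x, y in combinations(items, 2):
        # A pair is discordant when the two lists order it in opposite ways.
        if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) < 0:
            nd += 1
    n = len(items)
    return nd, nd / (n * (n - 1) / 2)


# Hypothetical rankings that differ only in the order of A and B.
list_1 = {"A": 1, "B": 2, "C": 3, "D": 4}
list_2 = {"A": 2, "B": 1, "C": 3, "D": 4}
nd, fraction = kendall_tau_distance(list_1, list_2)

# Arithmetic check for N = 23: 253 pairs, so a distance of 15 is about 5.9%.
percent_for_15 = 100.0 * 15 / (23 * 22 / 2)
```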
In addition, Pearson correlation analysis was performed using the paired ranking scores computed by the various systems for the respective athletes. Statistical analysis of the data was performed using ‘in-house’ algorithms written in ‘R’ and Matlab. For all tests, p values <0.05 were deemed to be significant.
The standings computed by the PR algorithm for each competing athlete after every race are presented in Table 3. From this it can be seen that as the season progressed, the number of rank scores allocated steadily increased as more and more athletes became involved in the competition. This is illustrated in Fig 3, which shows the respective connectivity networks for the system after three, six and ten races. In these network graphs the graph edges (i.e. the straight lines) link competitors who raced against each other, with the direction of the arrow indicating who beat who. It can be seen from this that after three races (Fig 3A) the network is still relatively simple, with only 16 competitors involved in the competition. However, when all 33 athletes have become involved, after ten races, the network becomes much more complex and difficult to understand (Fig 3C).
The results in Table 3 reveal that, with just one exception, the PR algorithm consistently ranked Justin Gatlin as number one throughout the season, while Tyson Gay and Mike Rodgers broadly competed for second and third place. When athletes of higher quality, such as Marvin Bracy and Usain Bolt, entered the competition, they scored relatively highly, depressing the rank scores of less able competitors. Of the lower ranking athletes, Harry Adams, Guy-Elphege Anouman, Julian Forte, Joseph Morris, Justin Walker and Isiah Young only ran once and all finished last in the single race in which they were entered. As such, they all received the same low PR and were ranked in equal last place.
Table 4 presents the PR, UP, Keener, Colley and Massey standings for the athletes after ten races, together with fastest times achieved and the rankings calculated using the official IAAF Diamond League 2015 and 2016 points systems. This reveals that the PR standings bear a close resemblance to the rankings achieved using the Diamond League points system (particularly the 2016 system) and also, to a lesser extent, those produced by the Keener and UP algorithms. Indeed, the top five places for the PR, UP, Keener and 2016 points system standings were all identical. The Kendall’s tau distances and Pearson correlations between the various standings are presented in Table 5. From these it can be seen that the tau distance between the PR standings and ‘2016’ points system standings was 15, indicating that only 5.9% of pairs differed in their order between the two lists. The equivalent tau distances for the Keener and UP standings were 76 (30.0% pair order disagreement) and 95 (37.5% pair order disagreement) respectively, indicating that their agreement with the 2016 Diamond League points system was much less strong than that produced using the PR algorithm. The close relationship between PR scores and the ‘2016’ points is highlighted in the scatter plot shown in Fig 4. From this it can be seen that there is a very strong positive correlation (r = 0.984, p<0.001) between the two. By comparison, the ‘2016’ and UP standings displayed a less strong relationship (r = 0.845, p<0.001) (Fig 5), as did the ‘2016’ and Keener standings (r = 0.924, p<0.001). These differences are reflected in the tau distances between the PR and the UP and Keener standings, which were 87 (34.4% pair order disagreement) and 67 (26.5% pair order disagreement) respectively.
While the Colley and Massey algorithms yielded standings that were very similar to each other (tau distance = 19, 7.5% pair order disagreement), the standings produced by both these methods were markedly different from those produced by either the PR or Keener algorithms, or indeed the Diamond League points systems. For example, the Colley and Massey algorithms ranked Usain Bolt and Marvin Bracy much higher, and Mike Rodgers much lower, than any of the other methods. Notwithstanding this, some similarities were observed between the UP standings and the Colley and Massey standings, with the tau distances between these ranking lists being only 48 (19.0% pair order disagreement) and 59 (23.3% pair order disagreement) respectively, much smaller than the equivalent distances between the UP standings and the PR and Keener standings.
The results of our study are promising and suggest that graph based algorithms have considerable potential as tools for ranking the performance of track athletes, without the need for complex tables. Currently, the ‘All-athletics’ World Rankings [27, 28] rely on a complicated system in which athletes accumulate points as they compete in IAAF approved competitions. For any given competition the overall performance score of the athlete is the sum of a ‘result score’ (based on the time achieved) and a ‘placing score’ (generally based on the athlete’s placing in the final of the competition). The result score is awarded for the result (i.e. time) achieved according to the IAAF Scoring Tables of Athletics and is modified depending on factors such as wind speed and hand timing. By comparison, placing scores are awarded only to those who reach the final of a competition (or semi-final in the case of the Olympic Games and the IAAF World Outdoor Championships), with competitions allocated different amounts of points depending on the perceived quality of the event. For example, while winning a track event at an IAAF Diamond League meeting will earn a placing score of 170, winning the same event at an IAAF World Challenge meeting will only earn 100 points. As such, the IAAF system does not primarily assess ‘who beat who’, but rather relies more on the times achieved and a series of arbitrary classifications designed to reflect the quality of the competition. Consequently, there is the potential for subjectivity to enter the ranking system. By comparison, systems such as the PR and UP algorithms are more objective and much simpler, relying purely on a ‘who beat who’ matrix, no matter the level of competition, thus eliminating the need for complex tables. Furthermore, because the IAAF ranking system only allocates placing scores to the top athletes who reach the track finals, it does not reflect the race competitiveness (i.e. placings) of the lesser athletes, something that can be problematic for coaches wanting to assess the true performance of developing athletes. Thus the development of graph based algorithms capable of objectively ranking athletes, particularly developing athletes, may be of considerable benefit to all those involved in athletics.
In the present study we used the 2015 Diamond League 100m races because collectively they represented an asymmetric system of manageable size, which could easily be assessed. The Diamond League also had the added bonus of a points system, which we could use for comparison purposes. While the Diamond League points system reflects the idiosyncrasies of the competition, and therefore should not be considered definitive, it nonetheless proved a useful tool with which to assess the performance of the ranking algorithms. From Table 4, it is clear that the PR algorithm yielded very similar results to those achieved using the ‘2016’ Diamond League points system. This was particularly true for the top ranking athletes, most of whom appeared in identical or similar positions in the respective standings for both methods. However, greater variability was observed between the two sets of standings for the lower ranked competitors. This is probably due to the fact that the Diamond League points system allocates placing points irrespective of the quality of the competitors involved. So in a weak race, a lesser athlete might score more highly than would be the case if the competition were stronger, with the result that the standings may not accurately reflect the true performance of that athlete. Also, because the Diamond League points system is integer based, it tends to lose definition when the point scores are low, with the result that points scored in a weak race can make a considerable difference to the rank position amongst the weaker athletes. In comparison, because the PR system has greater definition and accurately reflects ‘who beat who’, it is more able to capture true race performance and distinguish between athletes in the lower reaches of the standings.
In our PR model an athlete who beats another athlete can be considered as receiving a ‘vote’ from the beaten athlete. Consequently, the winner of a race involving eight athletes will receive seven votes. However, in the PR model these votes are not all equal, because beaten athletes with a higher PR score will make a greater contribution to the PR score of the race winner, than those from beaten athletes exhibiting a low PR score. As such, the PR algorithm tends to favour individuals who race more often and perform reasonably well (e.g. Mike Rodgers, who raced many times and was placed second or third in most races), over outstanding athletes, such as Usain Bolt, who participate sparingly. However, the same would generally be true for a points based system. As such, both approaches tend to favour athletes who perform consistently well, over those who put in outstanding ‘one-off’ performances. So although the PR algorithm and the points system utilise very different methodologies, in the context of the Diamond League, they appear to achieve very similar results.
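The voting idea described above can be sketched as a small PageRank power iteration in Python. This is a minimal sketch under stated assumptions, not the implementation used in the study: the damping factor of 0.85, the weighting of repeated defeats by their count, the even redistribution of score from unbeaten athletes, and the function names `build_votes` and `pagerank` are all assumptions made for illustration.

```python
def build_votes(races):
    """votes[loser][winner] = number of times winner finished ahead of loser."""
    votes = {}
    for finish_order in races:
        for i, winner in enumerate(finish_order):
            votes.setdefault(winner, {})
            for loser in finish_order[i + 1:]:
                votes.setdefault(loser, {})
                votes[loser][winner] = votes[loser].get(winner, 0) + 1
    return votes


def pagerank(votes, damping=0.85, iterations=200):
    """Power iteration in which each beaten athlete passes score to those who beat them."""
    athletes = sorted(votes)
    n = len(athletes)
    score = {a: 1.0 / n for a in athletes}
    for _ in range(iterations):
        new = {a: (1.0 - damping) / n for a in athletes}
        for loser in athletes:
            out_total = sum(votes[loser].values())
            if out_total == 0:
                # An unbeaten athlete casts no votes: spread their score evenly.
                for a in athletes:
                    new[a] += damping * score[loser] / n
            else:
                for winner, count in votes[loser].items():
                    new[winner] += damping * score[loser] * count / out_total
        score = new
    return score


# Hypothetical results: each list is one race, first to last place.
races = [["A", "B", "C"], ["A", "C", "D"]]
scores = pagerank(build_votes(races))
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy example athlete A, who won both races, receives votes from every other athlete and ranks first, while D, who beat nobody, receives no votes and ranks last.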
When compared with the other ranking systems, the PR algorithm produced results that were most similar to those of the Keener algorithm, with a 26.5% pair order disagreement between the two ranking lists. This suggests that the Keener algorithm, as executed here, produces standings that capture some of the attributes displayed in the PR standings, and indicates that it may also have potential as a tool for assessing track athletes. Having said this, it is noticeable that the Keener algorithm ranked Marvin Bracy relatively lowly, despite this athlete winning the race in which he competed. This appears to be due to the fact that only seven athletes competed in Bracy’s race: the Keener algorithm ranked Usain Bolt six places above Bracy, primarily because Bolt won a race in which nine athletes competed. As such, this sensitivity to field size represents a serious limitation of the Keener method, which may impede its application to track athletics.
The points based scoring system used by the Diamond League reflects the fact that it is a league, in which consistency is rewarded over ‘one-off’ victory. So we have the paradoxical situation where Kim Collins, who entered six races and whose best position was fourth, scored more points (under the 2016 system) than Usain Bolt (multiple Olympic champion) who finished first in the only race in which he ran. Although the PR algorithm also placed Kim Collins above Usain Bolt, the UP algorithm did not, placing Usain Bolt 6th and Kim Collins 16th, suggesting that the two algorithms detected different nuances. So while the UP, PR and 2016 points systems all agreed about the first five standings, below this the results for the UP algorithm diverged from those produced by the other two approaches. The primary reason for this is that the UP algorithm tends to average the scores achieved so that ‘one-off’ victors will tend to be ranked higher than more mediocre athletes who have competed many times. As such, the UP algorithm appears to share some of the characteristics of the Colley and Massey algorithms, which both ranked ‘one-off’ victors highly. Indeed, it is noticeable (Table 5) that there was relatively little pair order disagreement between the ranking lists produced by the Colley, Massey and UP algorithms. Consequently, these algorithms may be better suited to predicting future performance, say in knockout tournaments, rather than assessing performance in competitions such as the Diamond League. The differences between the PR and UP algorithms are starkly highlighted in Figs 4 and 5. Not only did the PR algorithm rank the athletes in a similar order to the Diamond League ‘2016’ points system, it did so using broadly similar intervals to the Diamond League method, as demonstrated by the general linearity of the scatter plot in Fig 4. 
By comparison, the scatter plot in Fig 5 displays marked non-linearity between the results of the UP algorithm and the Diamond League system.
Linear algebra based sports ranking models such as the Colley matrix method [13, 19] and the Massey least squares system [13, 20] have been used for many years to rate teams in the Bowl Championship Series in American college football. While these algorithms are well suited to team sports where each match consists of two teams playing each other, they are more difficult to apply in sports such as track athletics, where eight athletes may compete against each other in a single race, and where athletes compete in different numbers of races. Attempts have been made to address this latter point by introducing the concept of a ‘super-user’ into the competition, where unequal numbers of games have been played, with promising results. However, this does not address the former point, and so when applying the Colley and Massey algorithms to the Diamond League, we had to adapt them so that the athletes mimicked ‘teams’ playing each other. While the standings produced by both these algorithms tended to rank strong ‘one-off’ athletes such as Usain Bolt and Marvin Bracy very highly, overall they did not reflect the structure and ethos of the Diamond League. By comparison, the PR algorithm appears much better suited to track athletics tournaments such as the Diamond League. Prior to the present study, the PR algorithm had been used to rate the performance of teams in the NFL [11–13], international soccer teams, and individual soccer players [15, 16], but not track athletes. As such, there was no precedent regarding the construction of the adjacency and UP matrices used in the study. Consequently, one of the major challenges associated with the study was to develop a suitable methodology for ensuring that these matrices accurately reflected the characteristics of the competition. After due consideration, for the PR algorithm, we decided to construct an adjacency matrix in which the more successful athletes received ‘votes’ from the defeated athletes whom they beat.
So for a race containing eight athletes, the winner would receive seven votes, while the last athlete would receive none. This meant that at the end of the ten race series, there were a few athletes who were placed equal bottom of the standings by the PR algorithm simply because they competed only once and came last in their respective races. With regard to the UP algorithm, one major challenge was how best to cope with the varying number of athletes in each race. If, as is normal for Internet retailing, we had adopted a ‘five star’ system, then only the first five athletes in each race would have been ranked. Conversely, if we had awarded ‘stars’ to every athlete that competed, then the winner of a race containing nine competitors would receive two more ‘stars’ than an athlete who won a race involving just seven individuals. We therefore decided on the reverse ‘star’ strategy described in the methodological section above, because this best preserved scoring parity amongst the leading athletes. This strategy did however have a downside, in that an athlete who finished last in a race involving just seven athletes would receive more ‘stars’ than one involved in a race with eight athletes. As such, this led to a few anomalies amongst the lower ranked athletes for the UP algorithm. For example, the UP algorithm ranked Julian Forte relatively high (twenty-fifth) compared to other athletes, despite the fact that he finished last in his respective race, simply because that race contained just seven athletes.
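The trade-off in the reverse ‘star’ strategy can be illustrated with a short Python sketch. The details here are assumptions made for illustration: the winner of every race is given a fixed maximum of nine ‘stars’ (the largest field size mentioned in the text), each subsequent place receives one fewer, and the function name `reverse_stars` is hypothetical. The exact scheme is the one set out in the methodological section.

```python
def reverse_stars(finish_order, max_stars=9):
    """Assign stars counting down from a fixed maximum for the winner.

    finish_order: athletes listed from first to last place.
    max_stars: assumed star count for a race winner (hypothetical value).
    """
    return {athlete: max_stars - place
            for place, athlete in enumerate(finish_order)}


# Hypothetical fields of seven and eight athletes.
seven = reverse_stars(["A", "B", "C", "D", "E", "F", "G"])
eight = reverse_stars(["H", "I", "J", "K", "L", "M", "N", "O"])
```

Under these assumptions both winners receive nine stars, so parity is preserved at the top, but the last-placed athlete in the seven-runner race receives three stars against two for the last-placed athlete in the eight-runner race, which is the kind of anomaly described above for the lower ranked athletes.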
The purpose of the study presented here was not to develop the perfect ranking algorithm, but rather to explore the potential for applying Internet related algorithms in a sporting context where competition is fragmented. In this respect, both the PR and UP algorithms appear to have potential. Because they are both essentially ‘who beat who’ models that automatically rank the contestants, they minimise any subjectivity that would otherwise be involved. Consequently, they have the potential to objectively rank individuals and teams in sports where competition between competitors can be sparse, such as in boxing, tennis and athletics, or in international soccer and rugby. In these sports, lower ranked competitors frequently compete at a more regional level, making comparisons between individuals and teams difficult. In athletics, the ability to assign a ‘fine tuned’ score to lesser ranked athletes based solely on who beat who, rather than an arbitrary points system, may be something of particular interest to coaches seeking to assess the true performance of developing athletes, and also to sporting authorities seeking to prioritise funds. Such athletes often fail to reach the finals of competitions and therefore, under the IAAF system, are not awarded a placing score, making it difficult for coaches to assess their true performance.
While in the present study we used race position to construct the respective adjacency and UP matrices, we are aware that the difference in race time between athletes was also an option. However, in this study we decided against this because of potential differences between race tracks and variations in weather conditions, which might have compromised our results. Consequently, both the PR and UP models are essentially ‘who beat who’ methods that take no account of race times. While this approach appears well suited to the ethos of the Diamond League, we are conscious that our models could be improved by incorporating race times and plan to investigate this in the future. We are also aware that we have ignored other ranking systems that might prove useful with regard to track athletes, the foremost of which is perhaps Elo’s system, which has been used for many years to rate chess players, and has latterly been applied to the NFL, soccer and even the study of animal behaviour. Elo’s system uses the difference in the ratings between two players to predict the outcome of any given contest, but has an interesting self-correcting mechanism, which may have applicability to track athletics. After every contest, the winning player takes points from the loser, with the amount of rating points transferred governed by the difference between the prior ratings of the two players. For any given contest, if the high-ranked player wins, then only a few points are taken from the low-ranked player. However, if an upset occurs and the lower ranked player wins, then many rating points are transferred from the high-ranked loser. Consequently, Elo’s system is self-correcting: players whose rating is too low will, in the long run, do better than their initial rating might predict, with the result that they gain points until their rating reflects their true performance level.
As such, Elo’s system shares some similarities with the PR algorithm, insomuch that they both consider prior rating positions when calculating outcome ranks. However, unlike the PR algorithm, which updates the whole directed graph of the system after every contest, Elo’s method restricts itself to updating the ratings of just the players involved in any given match.
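The self-correcting transfer of points in Elo’s system can be sketched for a single two-player contest as follows. This uses the standard logistic form of the Elo expected score; the K-factor of 32, the scale of 400 and the function name `elo_update` are conventional choices assumed here for illustration, and how best to extend the update to a multi-athlete race is left open.

```python
def elo_update(winner_rating, loser_rating, k=32.0):
    """Return the updated (winner, loser) ratings after one contest.

    The points transferred shrink as the winner's prior advantage grows.
    """
    # Probability that the eventual winner was expected to win.
    expected_win = 1.0 / (1.0 + 10 ** ((loser_rating - winner_rating) / 400.0))
    transfer = k * (1.0 - expected_win)
    return winner_rating + transfer, loser_rating - transfer


# Favourite (1600) beats underdog (1400): few points change hands.
fav_new, _ = elo_update(1600.0, 1400.0)
favourite_gain = fav_new - 1600.0

# Upset: the underdog (1400) beats the favourite (1600): many points change hands.
dog_new, _ = elo_update(1400.0, 1600.0)
upset_gain = dog_new - 1400.0
```

Because only the two ratings passed to the function change, the sketch also shows the contrast drawn above with the PR algorithm, which recomputes scores over the whole graph.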
In conclusion, we have shown that the PR and UP algorithms, when utilised with a simple ‘who beat who’ matrix, can be used to accurately rank track athletes. With specific reference to the IAAF Diamond League men’s 100m events, the PR model produces very similar results to those achieved using the Diamond League ‘2016’ points scoring system, but with more definition between the athletes, particularly in the lower ranks. By comparison, the UP algorithm captures other characteristics and may be more suitable for predicting future performance, say in knockout tournaments. As such, the algorithms appear to have considerable potential for objectively assessing the relative performance of track athletes, without the need for complicated points equivalence tables. Importantly, because the algorithms utilise a ‘who beat who’ model, they automatically adjust for the strength of the competition, thus avoiding the need for subjective decision making.
- Conceptualization: CB SS BJ.
- Data curation: CB.
- Formal analysis: CB SS.
- Investigation: CB.
- Methodology: CB BJ.
- Software: CB.
- Writing – original draft: CB BJ SE.
- Writing – review & editing: CB BJ SE SS.
- 1. Till K, Cobley S, Morley D, O'Hara J, Chapman C, Cooke C. The influence of age, playing position, anthropometry and fitness on career attainment outcomes in rugby league. Journal of sports sciences. 2015;34(13):1240–5. pmid:26512761
- 2. Till K, Jones BL, Cobley S, Morley D, O'Hara J, Chapman C, et al. Identifying Talent in Youth Sport: A Novel Methodology Using Higher-Dimensional Analysis. PLoS One. 2016;11(5):e0155047. pmid:27224653.
- 3. Spiriev B, Spiriev A. IAAF scoring tables of athletics: International Association of Athletics Federations; 2014.
- 4. RFL. Rugby Football League: Section B3: League competition rules. 2016 [cited 1st December 2016]. http://media.therfl.co.uk/docs/Section%20B3%20-%20League%20Competiton%20Rules_Final%20PDF.pdf.
- 5. Robinson J. Understanding the Premier League. 2016 [cited 1st December 2016]. http://worldsoccer.about.com/od/soccer101/a/101_Prem.htm.
- 6. Premiership_Rugby. League Rules. 2016 [cited 1st December 2016]. http://rd.premiershiprugby.com/matchcentre/tables/index.php#u7BeJIr0kp3sZAey.97.
- 7. OWG. Official world golf ranking: How the ranking system works. 2016 [cited 30th November 2016]. http://www.owgr.com/about.
- 8. ATP. ATP world rankings: What is the ranking structure and formula for 2016? 2016 [cited 1st December 2016]. http://www.atpworldtour.com/en/rankings/rankings-faq.
- 9. Brin S, Page L. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems. 1998;30:107–17.
- 10. Brin S, Page L. Reprint of: The anatomy of a large-scale hypertextual web search engine. Computer Networks. 2012;56(18):3825–33.
- 11. Zack L, Lamb R, Ball S. An application of Google's PageRank to NFL rankings. Involve, a Journal of Mathematics. 2012;5(4):463–71.
- 12. Govan AY, Meyer CD, editors. Ranking national football league teams using google's pagerank. AA Markov Anniversary Meeting; 2006; Charleston: Boson Books.
- 13. Govan AY. Ranking theory with application to popular sports. Raleigh: North Carolina State University; 2008.
- 14. Lazova V, Basnarkov L. PageRank Approach to Ranking National Football Teams. arXiv preprint arXiv:1503.01331. 2015.
- 15. Pena JL, Touchette H, editors. A network theory analysis of football strategies. Sports Physics: Euromech Physics of Sports Conference 2012; Palaiseau: Proc. 2012, Editions de l'Ecole Polytechnique.
- 16. Brandt M, Brefeld U. Graph-based Approaches for Analyzing Team Interaction on the Example of Soccer. Machine Learning and Data Mining for Sports Analytics; 11th September; Porto; 2015.
- 17. Balreira EC, Miceli BK, Tegtmeyer T. An Oracle method to predict NFL games. Journal of Quantitative Analysis in Sports. 2014;10(2):183–96.
- 18. Langville AN, Meyer CD. Who's #1?: the science of rating and ranking. Princeton: Princeton University Press; 2012.
- 19. Colley WN. Colley's bias free college football ranking method: The Colley matrix explained. Princeton University, Princeton. 2002.
- 20. Massey K. Statistical models applied to the rating of sports teams. Bluefield College. 1997.
- 21. Keener JP. The Perron-Frobenius theorem and the ranking of football teams. SIAM review. 1993;35(1):80–93.
- 22. 2015 Diamond League results [Internet]. 2015 [cited 4th September 2016]. https://www.diamondleague.com/lists-results/archive/2015/.
- 23. Bressan M, Peserico E. Choose the damping, choose the ranking? Journal of Discrete Algorithms. 2010;8(2):199–213.
- 24. Meyer CD. Chapter 8: Perron-Frobenius theory of nonnegative matrices. Matrix analysis and applied linear algebra. Philadelphia: Society for Industrial and Applied Mathematics; 2000.
- 25. How it works: IAAF Diamond League 2015 media guide [Internet]. IAAF. 2015 [cited 4th September 2016]. https://www.diamondleague.com/fileadmin/IDL_Default/files/documents/2015/2015_IDL_media_guide.pdf.
- 26. The Diamond Race Rules and Points System [Internet]. IAAF. 2016 [cited 4th September 2016]. https://www.diamondleague.com/rules/.
- 27. Rules of the All-Athletics World Rankings—2016 [Internet]. All-athletics.com. 2016 [cited 4th September 2016]. http://www.all-athletics.com/en-us/rules-all-athletics-world-rankings-2016.
- 28. Rules of the All-Athletics World Rankings—2016—Overall Rankings [Internet]. All-athletics.com. 2016 [cited 4th September 2016]. http://www.all-athletics.com/en-us/rules-all-athletics-world-rankings-2016-overall-rankings.
- 29. Chartier TP, Harris J, Hutson KR, Langville AN, Martin D, Wessell CD. Reducing the Effects of Unequal Number of Games on Rankings. IMAGE-The Bulletin of the International Linear Algebra Society. 2014;52(1).
- 30. Paine N. NFL Elo Ratings Are Back! 2015 [cited 10th March 2017]. https://fivethirtyeight.com/datalab/nfl-elo-ratings-are-back/.
- 31. World football Elo ratings [Internet]. 1997 [cited 10th March 2017]. http://www.eloratings.net/.
- 32. Albers PCH, de Vries H. Elo-rating as a tool in the sequential estimation of dominance strengths. Animal Behaviour. 2001;61:489–95.