
Conceived and designed the experiments: FR. Performed the experiments: FR. Analyzed the data: FR. Contributed reagents/materials/analysis tools: FR. Wrote the manuscript: FR.

The author has declared that no competing interests exist.

We considered all matches played by professional tennis players between 1968 and 2010 and, on the basis of this data set, constructed a directed and weighted network of contacts. The resulting graph showed complex features, typical of many real networked systems studied in the literature. We developed a diffusion algorithm and applied it to the tennis contact network in order to rank professional players.

Social systems generally display complex features

In recent years, the analysis of social systems has become an important topic of interdisciplinary research and, as such, is no longer of interest to social scientists only. The availability of a huge amount of digital data, describing the activity of humans and the way in which they interact, has made the analysis of large-scale systems possible. This new trend of research does not focus on the behavior of single agents, but mainly on the macroscopic and statistical properties of the whole population, with the aim of discovering regularities and universal rules. In this sense, professional sports also represent optimal sources of data. Soccer

In this paper we continue in this direction of research and present a novel example of a real system, taken from the world of professional sports, suitable for network representation. We consider the list of all tennis matches played by professional players during the last 43 years (1968–2010). Matches are considered as basic contacts between the actors in the network, and weighted connections are drawn on the basis of the number of matches between the same two opponents. We first provide evidence of the complexity of the network of contacts between tennis players. We then develop a ranking algorithm similar to PageRank and quantify the importance of tennis players with the so-called “prestige score”. The results presented here indicate once more that ranking techniques based on networks outperform traditional methods. The prestige score is in fact more accurate and has higher predictive power than the well-established ranking schemes adopted in professional tennis. More importantly, our ranking method does not require the introduction of external criteria for the assessment of the quality of players and tournaments. Their importance is self-determined by the various competitive processes described by the intricate network of contacts. Our algorithm does nothing more than take this information into account.

Data were collected from the web site of the Association of Tennis Professionals (ATP,

In panel a, we report the total number of tournaments (top panel) and players (bottom panel) as a function of time. In panel b, we plot the fraction of players having played (black circles), won (red squares) and lost (blue diamonds) a certain number of matches. The black dashed line corresponds to the best power-law fit with exponent consistent with the value

We represent the data set as a network of contacts between tennis players. This is a very natural representation of the system since a single match can be viewed as an elementary contact between two opponents. Each time the player

In panel a, we draw the subgraph of the contact network restricted only to those players who have been number one in the ATP ranking. Intensities and widths are proportional to the logarithm of the weight carried by each directed edge. In panel b, we report a schematic view of the matches played during a single tournament, while in panel c we draw the network derived from it.
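The construction of the contact network from a list of matches can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the `(winner, loser)` input format and the toy tournament are hypothetical, and the orientation of the edges (from the loser to the winner, with a weight counting how many times that loser lost to that winner) is stated here as an assumption.

```python
from collections import defaultdict


def build_contact_network(matches):
    """Aggregate (winner, loser) pairs into a weighted directed graph.

    Assumption: each edge points from the loser to the winner, and its
    weight counts how many times that loser lost to that winner.
    """
    weights = defaultdict(int)
    for winner, loser in matches:
        weights[(loser, winner)] += 1
    return dict(weights)


def strengths(weights):
    """Return (in-strength, out-strength) per player.

    With the orientation assumed above, the in-strength is the number of
    matches won and the out-strength is the number of matches lost.
    """
    s_in = defaultdict(int)
    s_out = defaultdict(int)
    for (loser, winner), w in weights.items():
        s_out[loser] += w
        s_in[winner] += w
    return dict(s_in), dict(s_out)


# Hypothetical single-elimination tournament with four players:
# two semifinals followed by a final.
matches = [("A", "B"), ("C", "D"), ("A", "C")]
network = build_contact_network(matches)
wins, losses = strengths(network)
```

In this toy tournament the champion `A` has two incoming edges and no outgoing edge, which mirrors the tree-like structure of a single tournament.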

The network representation can be used for ranking players. In our interpretation, each player in the network carries a unit of “tennis prestige” and we imagine that prestige flows in the graph along its weighted connections. The process can be mathematically solved by determining the solution of the system of equations

In general topologies, analytical solutions of Eqs. (1) are hard to find. The stationary values of the scores
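As an illustration of how stationary scores of this kind can be obtained numerically, the sketch below iterates a generic PageRank-style diffusion on a weighted directed graph until the scores stop changing. It is a sketch under stated assumptions, not necessarily identical to Eqs. (1): edges are assumed to point from loser to winner, players with no outgoing edges redistribute their score uniformly, and the default `q = 0.15` is only the conventional PageRank choice, used as a placeholder rather than a value taken from the text. The function name and the toy graph are hypothetical.

```python
def prestige_scores(weights, q=0.15, tol=1e-12, max_iter=10_000):
    """Power iteration for a PageRank-style score on a weighted digraph.

    weights: dict mapping (loser, winner) -> number of matches (assumed
             edge orientation: from the loser to the winner).
    q:       uniform redistribution ("damping") probability; placeholder.
    """
    players = sorted({p for edge in weights for p in edge})
    n = len(players)

    # Out-strength: total weight leaving each node (matches lost).
    out_strength = {p: 0.0 for p in players}
    for (loser, _winner), w in weights.items():
        out_strength[loser] += w

    scores = {p: 1.0 / n for p in players}  # uniform starting point
    for _ in range(max_iter):
        # Score held by nodes without outgoing edges is shared uniformly.
        dangling = sum(scores[p] for p in players if out_strength[p] == 0)
        new = {p: q / n + (1 - q) * dangling / n for p in players}
        # Each node passes its score along its outgoing edges,
        # proportionally to the edge weights.
        for (loser, winner), w in weights.items():
            new[winner] += (1 - q) * scores[loser] * w / out_strength[loser]
        change = sum(abs(new[p] - scores[p]) for p in players)
        scores = new
        if change < tol:
            break
    return scores


# Hypothetical four-player tournament: A beats B, C beats D, A beats C.
toy_weights = {("B", "A"): 1, ("D", "C"): 1, ("C", "A"): 1}
toy_scores = prestige_scores(toy_weights)
ranking = sorted(toy_scores, key=toy_scores.get, reverse=True)
```

The scores remain normalized to one at every step, so they can be read as the fraction of total prestige held by each player. In the toy tournament the winner of the final ends up first and the losing finalist second, while the two semifinal losers tie.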

In the simplest case in which the graph is obtained by aggregating matches of a single tournament only, we can analytically determine the solutions of Eqs. (1). In a single tournament, matches are hierarchically organized in a binary rooted tree and the topology of the resulting contact network is very simple [see

In Eq. (4),

In the former calculations, we have used the well known identity

It is worth noting that for

In

Prestige score

We set

In

In panel a, we present a scatter plot of the prestige rank

Rank | Player | Country | Hand | Start | End |

1 | Jimmy Connors | United States | L | 1970 | 1996 |

2 | Ivan Lendl | United States | R | 1978 | 1994 |

3 | John McEnroe | United States | L | 1976 | 1994 |

4 | Guillermo Vilas | Argentina | L | 1969 | 1992 |

5 | Andre Agassi | United States | R | 1986 | 2006 |

6 | Stefan Edberg | Sweden | R | 1982 | 1996 |

7 | Roger Federer | Switzerland | R | 1998 | 2010 |

8 | Pete Sampras | United States | R | 1988 | 2002 |

9 | Ilie Năstase | Romania | R | 1968 | 1985 |

10 | Björn Borg | Sweden | R | 1971 | 1993 |

11 | Boris Becker | Germany | R | 1983 | 1999 |

12 | Arthur Ashe | United States | R | 1968 | 1979 |

13 | Brian Gottfried | United States | R | 1970 | 1984 |

14 | Stan Smith | United States | R | 1968 | 1985 |

15 | Manuel Orantes | Spain | L | 1968 | 1984 |

16 | Michael Chang | United States | R | 1987 | 2003 |

17 | Roscoe Tanner | United States | L | 1969 | 1985 |

18 | Eddie Dibbs | United States | R | 1971 | 1984 |

19 | Harold Solomon | United States | R | 1971 | 1991 |

20 | Tom Okker | Netherlands | R | 1968 | 1981 |

21 | Mats Wilander | Sweden | R | 1980 | 1996 |

22 | Goran Ivanišević | Croatia | L | 1988 | 2004 |

23 | Vitas Gerulaitis | United States | R | 1971 | 1986 |

24 | Rafael Nadal | Spain | L | 2002 | 2010 |

25 | Raúl Ramirez | Mexico | R | 1970 | 1983 |

26 | John Newcombe | Australia | R | 1968 | 1981 |

27 | Ken Rosewall | Australia | R | 1968 | 1980 |

28 | Yevgeny Kafelnikov | Russian Federation | R | 1992 | 2003 |

29 | Andy Roddick | United States | R | 2000 | 2010 |

30 | Thomas Müster | Austria | L | 1984 | 1999 |

In general, players who are still active are penalized with respect to those who have ended their careers. The prestige score is in fact strongly correlated with the number of victories [see panel a of

Year | Prestige | ATP year-end | ITF |

1968 | Rod Laver | - | - |

1969 | Rod Laver | - | - |

1970 | Rod Laver | - | - |

1971 | Ken Rosewall | - | - |

1972 | Ilie Năstase | - | - |

1973 | Tom Okker | Ilie Năstase | - |

1974 | Björn Borg | Jimmy Connors | - |

1975 | Arthur Ashe | Jimmy Connors | - |

1976 | Jimmy Connors | Jimmy Connors | - |

1977 | Guillermo Vilas | Jimmy Connors | - |

1978 | Björn Borg | Jimmy Connors | Björn Borg |

1979 | Björn Borg | Björn Borg | Björn Borg |

1980 | John McEnroe | Björn Borg | Björn Borg |

1981 | Ivan Lendl | John McEnroe | John McEnroe |

1982 | Ivan Lendl | John McEnroe | Jimmy Connors |

1983 | Ivan Lendl | John McEnroe | John McEnroe |

1984 | Ivan Lendl | John McEnroe | John McEnroe |

1985 | Ivan Lendl | Ivan Lendl | Ivan Lendl |

1986 | Ivan Lendl | Ivan Lendl | Ivan Lendl |

1987 | Stefan Edberg | Ivan Lendl | Ivan Lendl |

1988 | Mats Wilander | Mats Wilander | Mats Wilander |

1989 | Ivan Lendl | Ivan Lendl | Boris Becker |

1990 | Stefan Edberg | Stefan Edberg | Ivan Lendl |

1991 | Stefan Edberg | Stefan Edberg | Stefan Edberg |

1992 | Pete Sampras | Jim Courier | Jim Courier |

1993 | Pete Sampras | Pete Sampras | Pete Sampras |

1994 | Pete Sampras | Pete Sampras | Pete Sampras |

1995 | Pete Sampras | Pete Sampras | Pete Sampras |

1996 | Goran Ivanišević | Pete Sampras | Pete Sampras |

1997 | Patrick Rafter | Pete Sampras | Pete Sampras |

1998 | Marcelo Ríos | Pete Sampras | Pete Sampras |

1999 | Andre Agassi | Andre Agassi | Andre Agassi |

2000 | Marat Safin | Gustavo Kuerten | Gustavo Kuerten |

2001 | Lleyton Hewitt | Lleyton Hewitt | Lleyton Hewitt |

2002 | Lleyton Hewitt | Lleyton Hewitt | Lleyton Hewitt |

2003 | Roger Federer | Andy Roddick | Andy Roddick |

2004 | Roger Federer | Roger Federer | Roger Federer |

2005 | Roger Federer | Roger Federer | Roger Federer |

2006 | Roger Federer | Roger Federer | Roger Federer |

2007 | Rafael Nadal | Roger Federer | Roger Federer |

2008 | Rafael Nadal | Rafael Nadal | Rafael Nadal |

2009 | Novak Djoković | Roger Federer | Roger Federer |

2010 | Rafael Nadal | Rafael Nadal | Rafael Nadal |

We also perform a different kind of analysis by constructing networks of contacts for decades and for specific types of playing surfaces. According to our score, the best players per decade are (

Tools and techniques of complex networks have wide applicability since many real systems can be naturally described as graphs. For instance, rankings based on diffusion are very effective since the whole information encoded by the network topology can be used in place of simple local properties or pre-determined and arbitrary criteria. Diffusion algorithms, like the one for calculating the PageRank score

Here we have reported another emblematic example of a real social system suitable for network representation: the graph of contacts (i.e., matches) between professional tennis players. This network shows complex topological features, and as such the whole system cannot be understood by decomposing the graph and studying each component in isolation. In particular, the correct assessment of players' performances requires the simultaneous consideration of the whole network of interactions. We have therefore introduced a new score, called “prestige score”, based on a diffusion process occurring on the entire network of contacts between tennis players. According to our ranking technique, the relevance of players is not related to the number of victories only but mostly to the quality of these victories. In this sense, it could be more important to beat a great player than to win many matches against less relevant opponents. The results of the analysis have revealed that our technique is effective in finding the best players in the history of tennis. The biases mentioned in the case of citation networks are not present in the tennis contact graph. Players do not need to be classified since everybody has the opportunity to participate in every tournament. Additionally, there is no temporal dependence because matches are played between opponents who are still active, and the flow does not necessarily go from young players towards older ones. In general, players who are still active are penalized with respect to those who have already ended their careers only because of incomplete information (i.e., they have not yet played all the matches of their careers) and not because of an intrinsic bias of the system. Our ranking technique is furthermore effective because it does not require any external criteria of judgment. As a term of comparison, the actual ATP ranking is based on the number of points collected by players during the season. Each tournament has an

In conclusion, we would like to stress that the aim of our method is not to replace other ranking techniques, optimized and almost perfected in the course of many years. Prestige rank represents only a novel method with a different spirit and may be used to corroborate the accuracy of other well established ranking techniques.


We thank the Association of Tennis Professionals for making publicly available the data set of all tennis matches played during the last 43 years. Helpful discussions with Patrick McMullen are gratefully acknowledged as well.