The quantitative measure and statistical distribution of fame

Fame and celebrity play an ever-increasing role in our culture. However, despite the cultural and economic importance of fame and its gradations, there exists no consensus method for quantifying the fame of an individual, or of comparing that of two individuals. We argue that, even if fame is difficult to measure with precision, one may develop useful metrics for fame that correlate well with intuition and that remain reasonably stable over time. Using datasets of recently deceased individuals who were highly renowned, we have evaluated several internet-based methods for quantifying fame. We find that some widely-used internet-derived metrics, such as search engine results, correlate poorly with human subject judgments of fame. However other metrics exist that agree well with human judgments and appear to offer workable, easily accessible measures of fame. Using such a metric we perform a preliminary investigation of the statistical distribution of fame, which has some of the power law character seen in other natural and social phenomena such as landslides and market crashes. In order to demonstrate how such findings can generate quantitative insight into celebrity culture, we assess some folk ideas regarding the frequency distribution and apparent clustering of celebrity deaths.


Introduction
The phenomena of fame and celebrity are increasingly important in our culture. With the rapid expansion of electronic media, fame plays a growing role in commerce, media, and public affairs, as well as in legal and academic spheres [1]. Social media have boosted the visibility of celebrities of all kinds, allowing individuals to acquire or lose fame overnight [2]. Celebrity endorsements offer value to businesses, political campaigns and cultural organizations. Fame affects the economic value of names and trademarks [3], and it aids professional advancement in a variety of fields.
Researchers have explored some aspects of fame, such as the psychological motives of the would-be famous [4] and sex bias in assessments of fame [5]. Several studies have also attempted to correlate the fame of well known individuals with measures of their professional achievement [6] [7] [8] [9] [10] [11], which is measured by different tools in different fields. Remarkably, however, although fame clearly exists in degrees, there is no consensus on its quantitative measure: researchers who have attempted to quantify fame have relied on a variety of ad hoc measures that have not themselves been evaluated or calibrated.
One common such measure has been search engine results. Following Schulman's proposal [12] that an individual's fame is revealed by the number of web pages returned from an internet search for his/her name, many researchers have used Google hits to quantify fame. Google hits (denoted GH, the number of web pages returned in a Google search for an individual's name) has been used to quantify the fame of WWI flying aces [6] and chess masters [9] as well as physicists [10]. Similarly, some researchers have used Wikipedia data, including page views and other measures of Wikipedia presence, to quantify the fame of athletes [8] and historical figures [11] and to predict movie box office success [13]. Other social media tools have also been employed to measure attributes related to fame; one study used the number of Twitter followers [7] to gauge the social media visibility of a sampling of scientists.
There are alternatives to using internet tools or social media to measure fame. Psychological researchers have studied how panels of human subjects judge the fame of well-known individuals [5] [14]. A recent "culturomics" study tracked the rise and fall of individuals' fame on historical time scales by measuring the frequency of their mention in a large database of digitized texts [2]. Some journalists regard the length of an individual's obituary, as well as its advance preparation, as indices of fame [15].
Nevertheless internet tools such as GH are convenient to use. Unfortunately, researchers have generally not attempted to validate these tools by testing whether they give results consistent with other measures or intuitive indicators of fame. It is remarkable that no metric has been tested for measuring a phenomenon that is unequally distributed and yet has demonstrable utility and economic value across fields. Furthermore, studies of fame have often failed to define it separately from related concepts such as celebrity, professional accomplishment, and media profile. This lack of precision hinders the quantitative study of fame and its accurate valuation. It also prevents serious assessment of common claims about celebrity and fame: The often-made assertions that an unusual number of famous individuals died within a given year [16], or that famous deaths occur in clusters [17], or that famous musicians die young [18], cannot be assessed unless fame can be quantified. This article aims to demonstrate that, starting from a clear definition of fame, internet-derived metrics of fame can be found that correlate with quantitative human judgments of fame, and that using these metrics one may gain insight into some of the statistical properties of fame.
For clarity we begin by defining an individual's fame as his/her degree of renown, or state of being well known, to a population. We do not assume that the fame of an individual correlates with accomplishment as judged by that population, or that the metrics of accomplishment favored by that population are also metrics of fame.
In contrast we define celebrity as the close media attention that is provided to the most famous individuals; thus a celebrity is one whose ordinary activities receive media attention. Fame and celebrity correlate, but they are not the same. These definitions accord with those of Drake and Miah [1], who described a celebrity as a mediated public persona.
By our definition the fame of an individual can be measured as a snapshot taken from the perspective of a given population at a particular time. Metrics that gauge renown among different populations at different times may not agree completely. We do not attempt to measure the fame that individuals may have enjoyed in the past, or the peak fame that they achieved. Rather our approach is to study a diverse group of renowned individuals at one common time point in their career (the year following their death) and to quantify their fame at that time point. We do this by measuring their renown among a group of survey subjects. The survey data provide a baseline, quantitative fame score that we then compare against some plausible internet or social media metrics of fame. In this way we identify metrics of fame that can be easily employed on a larger scale to evaluate the renown of many individuals. We then use one such metric to investigate some statistical properties of fame and demonstrate how these statistics can provide insight into some folk ideas regarding the frequency of famous deaths.

Data sources
We investigated the fame of deceased individuals only. This is in part because we intended to use the findings to test folk claims regarding the frequency of celebrity deaths. However we also sought to minimize concerns related to name ownership and consent among the individuals whose fame we were evaluating. We also limited our study to those who had died within the year or so prior to this investigation, so as to avoid having to devise corrections for possible changes in the fame of individuals after their death.
We generated three lists (denoted NBC, Wiki, and NYT) of renowned individuals who died in 2016 or 2017. No individual appeared in more than one list. The NBC list consisted of 126 highly renowned individuals whose deaths occurred during the full year 2016 and received mention in NBC Online [19]. The Wiki list consisted of 78 names drawn at random from 642 individuals who (as of March 2017) were named on Wikipedia.org as having died during the month of January 2017 [20]. The NYT list consisted of two parts, totaling 147 individuals who were named in the New York Times online obituaries [21] as having died during two months in 2017: one part (NYT 1) is 75 individuals who died in February 2017, while the other part (NYT 2) is 72 individuals who died in June 2017.

Survey metric (p ratings)
In order to generate an intuitive and quantitative scale for fame, against which we could compare various other possible metrics for fame, we first used a survey, based on pairwise comparisons, to rank a list of twenty famous individuals according to their renown. The individuals named on the survey are listed in Table 1. They were selected on the basis of (1) having died during 2016, with their obituaries widely reported in news media, and therefore being plausibly described as famous; (2) spanning a sufficient range in renown that statistically significant differences in their rankings could emerge from the data analysis; (3) being known in fields for which the survey subjects likely possessed relevant general knowledge. Fifty undergraduate students at the University of Florida were recruited as subjects to complete the survey. Therefore the list in Table 1 is a sampling of major political and historical figures, top American athletes, stars of popular films and music, authors of books often read by students, and similar figures. The list excludes individuals associated with more specialized interests, such as cabinet secretaries, academics, playwrights, foreign athletes, classical musicians, and so forth.
Each survey subject was presented with a list of fifty different pairs of names, drawn from the twenty names in Table 1. Each pair of names could be presented in either order (A : B or B : A). The survey subject was asked to indicate a preference within each pair by identifying the name about which he/she had greater knowledge. The subject could also select a "no preference" option if he/she felt equally knowledgeable about both names. The fifty pairings on each survey form were computer-selected at random from the 380 possible pairs that can be generated from twenty names. Each subject received a unique, randomized version of the survey. The list of names was limited to twenty so that each of the possible name pairs could be presented to multiple survey subjects, without requiring a survey of excessive length. Thus, with twenty names and fifty subjects, each being offered fifty comparisons, the survey offered each pair of names to approximately 13 subjects. If instead 40 names had been tested, then 1560 name pairings would be possible and either the number of subjects or the length of the survey would have had to increase fourfold to achieve the same coverage.
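The pair counts quoted above can be checked with a short calculation. The following Python sketch is only an illustration (the function name `coverage` is hypothetical). It assumes that the figure of 380 counts ordered pairs (A : B and B : A separately), while the coverage of roughly 13 subjects per pair refers to the 190 unordered pairs.

```python
from math import comb

def coverage(n_names, n_subjects, pairs_per_subject):
    """Return (ordered pairs, unordered pairs, mean presentations per unordered pair)."""
    ordered = n_names * (n_names - 1)        # A:B and B:A counted separately
    unordered = comb(n_names, 2)             # A:B and B:A counted once
    presentations = n_subjects * pairs_per_subject
    return ordered, unordered, presentations / unordered

# Survey as described: 20 names, 50 subjects, 50 pairs per subject
print(coverage(20, 50, 50))
# Hypothetical 40-name survey with the same number of subjects and pairs
print(coverage(40, 50, 50))
```

With 40 names and an unchanged survey size, the mean number of presentations per unordered pair falls by about a factor of four, which is the trade-off described in the text.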
Of the fifty name pairs offered to each subject, subjects responded with an average (and median) of 34 preferences and 16 "no preference" responses. 86% of subjects indicated a preference in at least half of the fifty pairs they were offered. Consequently, of the 2500 name pairs (50 subjects × 50 name pairs) offered to all subjects, 1679 elicited preferences and 821 elicited a "no preference" response from the subject. The preference data are provided in S1 Spreadsheet.
The "no preference" response could indicate that the subject was equally familiar with both names (two very famous names), or that the subject was equally unfamiliar with both names (two less famous names). Regardless of its cause, a "no preference" response does not facilitate the ranking of those two particular names by renown. As the purpose of the survey was to differentiate the individuals by renown, the "no preference" responses were omitted from the subsequent data analysis. The effect of survey sample size, including these omitted "no preference" responses, on the robustness of the obtained ranking was tested through (1) a bootstrap error analysis, discussed below, and (2) a log likelihood test, discussed in Survey Results.
We used a Bradley-Terry model [22] to convert the preference data to a quantitative measure of fame, assigning a rating p_i to each individual i (i = 1...20) in Table 1. In the Bradley-Terry model, the strength scores or ratings p_i and p_j determine the probability that individual i defeats individual j in a single pairwise comparison:

$$P(i \text{ defeats } j) = \frac{p_i}{p_i + p_j} \tag{1}$$

with

$$\sum_{i=1}^{20} p_i = 1 \tag{2}$$

A maximum likelihood estimate for the twenty p_i was extracted from the survey data by an iterative procedure [23] that rapidly converges to produce the optimal p_i; that is, it finds the p_i for which the dataset is most probable. In addition, to test how robustly our particular dataset determined those p_i, we performed ~2000 bootstrap random samplings of the maximum likelihood estimation. The bootstrap method yields an estimate for confidence levels in model parameters, reflective of the size and the internal self-consistency of the dataset. The uncertainties δp in the reported p_i are the 1σ deviations obtained from the bootstrap test.
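A minimal Python sketch of this kind of analysis is given below. It is an illustration only, not the code used in this study: it assumes a standard fixed-point (minorization-maximization style) update for the Bradley-Terry ratings, which may differ in detail from the procedure of [23], and it assumes that the bootstrap resamples individual preferences with replacement. The function names, the synthetic ratings `true_p`, and the number of resamples are all hypothetical.

```python
import random

def fit_bradley_terry(prefs, n, iters=1000, tol=1e-12):
    """Maximum likelihood ratings p (summing to 1) from (winner, loser) index pairs."""
    wins = [0] * n
    met = [[0] * n for _ in range(n)]        # met[i][j]: number of comparisons of i with j
    for winner, loser in prefs:
        wins[winner] += 1
        met[winner][loser] += 1
        met[loser][winner] += 1
    p = [1.0 / n] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            denom = sum(met[i][j] / (p[i] + p[j]) for j in range(n) if met[i][j])
            new.append(wins[i] / denom if denom else p[i])
        total = sum(new)
        new = [v / total for v in new]       # enforce the normalization sum(p) = 1
        shift = max(abs(a - b) for a, b in zip(new, p))
        p = new
        if shift < tol:
            break
    return p

def bootstrap_sd(prefs, n, n_boot=200, seed=0):
    """Standard deviation of each rating over refits to resampled preference lists."""
    rng = random.Random(seed)
    fits = [fit_bradley_terry([rng.choice(prefs) for _ in prefs], n) for _ in range(n_boot)]
    means = [sum(f[i] for f in fits) / n_boot for i in range(n)]
    return [(sum((f[i] - means[i]) ** 2 for f in fits) / (n_boot - 1)) ** 0.5
            for i in range(n)]

# Synthetic demonstration with hypothetical ratings (not the survey values)
rng = random.Random(1)
true_p = [0.5, 0.3, 0.15, 0.05]
prefs = []
for _ in range(2000):
    i, j = rng.sample(range(4), 2)
    i_wins = rng.random() < true_p[i] / (true_p[i] + true_p[j])
    prefs.append((i, j) if i_wins else (j, i))

p_hat = fit_bradley_terry(prefs, 4)
dp = bootstrap_sd(prefs, 4)
for est, err in zip(p_hat, dp):
    print(f"{est:.3f} +/- {err:.3f}")
```

Because "no preference" responses carry no winner, they simply do not appear in the `prefs` list, which mirrors their omission from the analysis described above.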

Internet-based metrics of fame
We then sought to test how other plausible metrics for fame correlate with the rankings obtained from the survey. Some possible metrics for fame are problematic as they are not universally applicable or cannot readily be measured or estimated for non-celebrities or for living individuals, or they are weighted toward people in certain professions, or they are controlled by gatekeepers. These include an individual's wealth, the length of his/her obituary or Who's Who entry, numbers of Twitter followers, etc. Instead we sought to evaluate metrics that were (1) available for a wide range of individuals of diverse profession and varying fame, (2) reflective of the opinion of a large population or audience, rather than the judgment of curators or gatekeepers, (3) regularly updated, and (4) readily accessible through the internet. Based on these criteria, we selected the following plausible, internet-derived metrics of fame (S2 Spreadsheet) and evaluated them for the individuals on the NYT, NBC and Wiki lists:
• GH: the total current Google hits returned for the individual;
• GN: the total current Google news items citing that individual;
• WE: the total edits to date of the individual's Wikipedia page;
• WV: the total Wikipedia page views to date.
Most of the internet metrics for the names on the NYT, NBC and Wiki lists were assessed on March 8, 2017. The data for the NYT 2 list was assessed on July 12, 2017, and the Wikipedia page views (WV) were recorded on June 29, 2017.
Total current Google hits (GH) was obtained by searching an individual's name in Google and counting the number of links returned. Total current Google news items (GN) was obtained by searching an individual by name and profession in Google News and counting the number of links returned. For GN searches where the individual could be identified with more than one profession, the search that returned the most links was used. Total current Wikipedia page edits (WE) were obtained from an individual's Wikipedia page through the "History" feature.
To evaluate the temporal stability of these metrics we also retrieved time series data: WE_t is the month-to-month time series of Wikipedia page edits, obtained from the Revision History Statistics of the Wikipedia page; GS_t is the history of monthly Google searches, obtained from Google Trends; WV_t is the daily history of Wikipedia page views, obtained using Wikipedia PageViews Analysis.

Power law analysis
We used a maximum likelihood method to assess whether metric WE exhibits a power law distribution in the three lists studied [24]. If x is a discrete random variable whose probability distribution p(x) for x ≥ x_min is a power law (Eq 4), then the value of α for a dataset x_i (i = 1...n) is estimated by maximizing the logarithmic likelihood of the data

$$\ln L(\alpha) = n \ln C(\alpha, x_{\min}) - \alpha \sum_{i=1}^{n} \ln x_i \tag{3}$$

where C is the normalization constant of Eq 4; for a discrete variable, C = 1/ζ(α, x_min), with ζ the Hurwitz zeta function [24]. As the data x_i will not obey the power law below x_min, an estimate for x_min is also needed. For each name list we generated these estimates by minimizing the Kolmogorov-Smirnov distance between the cumulative distribution function (CDF) of the WE data and that of a perfect power law [24].
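The fitting procedure can be sketched in Python as follows. This is a hedged illustration rather than the code used in this study: the Hurwitz zeta function is approximated by a truncated sum with an Euler-Maclaurin tail correction, α is located by a ternary search on the concave log-likelihood, and the minimum tail size `min_tail`, the search bounds, and the synthetic data generator are all assumptions introduced for the example.

```python
import math
import random

def hurwitz_zeta(alpha, q, terms=200):
    """Approximate the sum over k >= 0 of (q + k)^(-alpha), for alpha > 1."""
    head = sum((q + k) ** -alpha for k in range(terms))
    a = q + terms
    # Euler-Maclaurin estimate of the remaining tail of the sum
    return head + a ** (1 - alpha) / (alpha - 1) + 0.5 * a ** -alpha

def fit_alpha(xs, xmin, lo=1.01, hi=8.0, iters=50):
    """MLE of alpha for the tail x >= xmin, by ternary search on the log-likelihood."""
    tail = [x for x in xs if x >= xmin]
    n, slog = len(tail), sum(math.log(x) for x in tail)

    def loglik(alpha):
        return -n * math.log(hurwitz_zeta(alpha, xmin)) - alpha * slog

    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if loglik(m1) < loglik(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def ks_distance(alpha, xs, xmin):
    """Largest gap between empirical and model values of P(X >= x) over the tail."""
    tail = sorted(x for x in xs if x >= xmin)
    n, z0, d = len(tail), hurwitz_zeta(alpha, xmin), 0.0
    for i, x in enumerate(tail):
        if i and x == tail[i - 1]:
            continue                          # evaluate each distinct value once
        empirical = (n - i) / n               # fraction of the tail with value >= x
        model = hurwitz_zeta(alpha, x) / z0
        d = max(d, abs(empirical - model))
    return d

def choose_xmin(xs, candidates, min_tail=10):
    """Return (KS distance, xmin, alpha) for the candidate cutoff with the smallest KS distance."""
    best = None
    for xmin in candidates:
        if sum(x >= xmin for x in xs) < min_tail:
            continue
        alpha = fit_alpha(xs, xmin)
        d = ks_distance(alpha, xs, xmin)
        if best is None or d < best[0]:
            best = (d, xmin, alpha)
    return best

# Synthetic data: approximate discrete power law above 10, plus unrelated small values
rng = random.Random(0)
power = [int(9.5 * (1 - rng.random()) ** (-1 / 1.5) + 0.5) for _ in range(1000)]
data = power + [rng.randint(1, 9) for _ in range(300)]
alpha_at_10 = fit_alpha(data, 10)
print(alpha_at_10, choose_xmin(data, range(2, 31)))
```

The synthetic tail is generated with exponent 2.5, so `alpha_at_10` should fall close to that value; the cutoff selected by `choose_xmin` depends on the sample and is not guaranteed to equal 10.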

Data sharing
Datasets are provided as supplemental information in S1 and S2 Spreadsheets.

Use of human subjects
Undergraduate students completed the fame (p_i) survey under protocol IRB201700835, which was approved as exempt by the University of Florida Institutional Review Board (Behavior/Nonmedical, IRB-02). Volunteer subjects were recruited in mid-June 2017, from public areas of the University of Florida campus. Each subject read an informed consent document and provided oral consent for participation. The written consent requirement was waived owing to the minimal risk and the fact that no sensitive or identifying information was collected from the subjects.

Survey results
We used a Bradley-Terry model [22] to extract from the survey data a quantitative score of renown or fame for each of the individuals in Table 1. These scores are the p values that are shown in Table 1 and Fig 1. The p of each individual is a measure of his/her degree of renown, as derived from the set of pairwise comparisons or preferences reported by all the survey subjects. The maximum likelihood method described in Methods identifies the unique, self-consistent set of p values for which the entire survey dataset of 1679 subject preferences is most probable, based on Eq 1. The obtained p range over almost two orders of magnitude, from a maximum of 0.18 ± 0.03 to a minimum of 0.0029 ± 0.0009, indicating that the fame of the different individuals spans almost two decades, at least from the perspective of the subject population.
To assess whether (a) the number of survey subjects and (b) the number of preferences reported by those subjects were both sufficiently large, we examined the robustness of the p values in Table 1 using two different statistical tests. First, as described in Methods, we applied a bootstrap random sampling to evaluate the confidence intervals in the p values, given our dataset. The bootstrap method is a model-free approach that takes account of the size of the dataset as well as any lack of knowledge of the true or theoretical distribution of the model parameters. As shown in Fig 1, the uncertainties δp determined from the bootstrap correspond to relative uncertainties δp/p of 10-30%. As the less famous names in Table 1 more frequently drew a "no preference" response from survey subjects, such names occur less often in the dataset; accordingly the bootstrap analysis finds a larger relative uncertainty δp/p for these names.
The relative uncertainty increases about two-fold from the best known (δp/p ≈ 16% for p ≈ 0.2) to the least known (δp/p ≈ 30% for p ≈ 3 × 10^−3) individuals. Nevertheless these relative uncertainties are still substantially smaller than most of the name-to-name differences in p values. This analysis shows that the survey dataset contains sufficient, self-consistent preference data to establish a robust ranking.
As a second statistical test of our survey sample and the model obtained from it, we also compared the relative likelihood of our findings (Fig 1) to that of a null model for the same dataset. If for example the survey subjects are too few or are incompetent to rank the names usefully, then the relevant null model is one where the survey dataset contains too little information to support a significant ranking. In this null model all the names in Table 1 have equal p values, and either subject preference is equally likely for any name pair [22]. Comparing the likelihood L_model of our dataset under our model to its likelihood L_null under the null model we find a very high log likelihood ratio log(L_model/L_null) = 372. That is, our p values provide a 10^161-fold better explanation of our dataset than does the null model. This is illustrated by Fig 1, where roughly 78% of the 1679 preferences obtained from the survey have likelihood greater than 0.5, meaning that they are more likely than not, given our p values and Eq 1. The high likelihood of the dataset, given the p values, demonstrates that the survey generated statistically significant, self-consistent information about the relative fame of the 20 individuals.
Table 1 shows the fame measures p and δp, WE, GN, and GH for the twenty individuals who died in 2016, identified by name and dates of birth (DOB) and death (DOD). It also shows dWE/dt, the average WE added per month from the creation of the page through June 2017, and dWV/dt, the average WV per day from July 1 2015 to June 29 2017.
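The size of the log likelihood ratio quoted above can be unpacked with a short calculation. The Python sketch below is illustrative only. It assumes that the logarithm is natural (which is consistent with the quoted factor of 10^161) and that the null model assigns probability 1/2 to each of the 1679 retained preferences; the variable names are hypothetical.

```python
import math

LOG_RATIO = 372.0        # ln(L_model / L_null), as quoted in the text
N_PREFS = 1679           # preferences retained in the analysis

log10_ratio = LOG_RATIO / math.log(10)          # power of 10 in the likelihood ratio
ln_null = N_PREFS * math.log(0.5)               # null model: each preference has probability 1/2
ln_model = LOG_RATIO + ln_null                  # implied log-likelihood under the fitted model
mean_likelihood = math.exp(ln_model / N_PREFS)  # geometric mean probability per preference

print(f"likelihood ratio ~ 10^{log10_ratio:.1f}")
print(f"geometric mean likelihood per preference ~ {mean_likelihood:.2f}")
```

Under these assumptions the fitted model assigns a geometric mean probability of a little over 0.6 to each observed preference, compared with 0.5 under the null model.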

Testing correlation of fame metrics with p
Therefore, using the individuals in Table 1 as a test population, we evaluated several plausible internet-derived metrics of fame by testing their correlation with the p values. Like the p values, the metrics GN, GH, WE and WV all range over several orders of magnitude. Our data indicate that Google hits GH and Wikipedia page views WV are less reliable metrics of fame. WV has a moderate correlation with p, giving R = 0.52. GH has a weak correlation with p (R ≈ 0.2), leading to a log-log slope 0.33 ± 0.33 that is consistent with zero. Although GH has been regarded as an obvious metric of fame, the expansion of the internet may have made it less useful for distinguishing non-celebrities: A very high GH (~10^6-10^7) does correlate with celebrity status, but many common names have GH ~ 10^5-10^6 or higher, so that GH does not distinguish greater from lesser fame. Other flaws in GH have also been noted [25].
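For readers who wish to reproduce this type of comparison, the Python sketch below computes a log-log correlation coefficient and a least-squares slope with its standard error. It is an illustration under stated assumptions: the function name `loglog_fit` and the data values are hypothetical (they are not taken from Table 1), and regressing the logarithm of the metric on the logarithm of p is an assumed convention.

```python
import math

def loglog_fit(x, y):
    """Pearson R, least-squares slope, and slope standard error for log10(y) vs log10(x)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    rss = syy - slope * sxy                     # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)         # standard error of the slope
    return r, slope, se

# Hypothetical ratings and metric values, for illustration only
p_demo = [0.18, 0.09, 0.05, 0.02, 0.01, 0.003]
metric_demo = [9000, 7000, 2500, 3000, 900, 700]
r, slope, se = loglog_fit(p_demo, metric_demo)
print(f"R = {r:.2f}, slope = {slope:.2f} +/- {se:.2f}")
```

For a simple linear fit of n points, the ratio t = slope/se and the correlation coefficient are linked by t = R√(n − 2)/√(1 − R²), so a slope that is comparable to its own uncertainty necessarily corresponds to a small R.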
Overall we find that WE and to a lesser extent GN correlate sufficiently well with p values and with each other that they may serve as useful quantitative measures of fame. However, we regard GN and WE only as metrics of the current fame of individuals, measured at a particular instant. Although it is likely that some individuals were more famous in the past than at their death, we do not attempt to construct a model to estimate their fame at its peak or to correct for any decline. In addition these internet-based metrics are probably not useful for comparing the fame of individuals who died at different times in the past. Clearly an individual who died prior to the Wikipedia launch in 2001 is less likely to acquire WE than is a living person of otherwise comparable renown. Therefore in what follows we make no attempt to compare the fame of the recently deceased to that of individuals who died in earlier years.

Time dependence
As an alternative to cumulative quantities such as total Wikipedia page edits we also evaluated some continuously varying indicators of fame, such as the monthly Wikipedia page edits WE_t. Fig 3 shows WE_t, WV_t and GS_t (Wikipedia page edits, Wikipedia page views, and Google searches/Google Trends, respectively) for four individuals in Table 1. Although WE_t is noisy its behavior is generally stable and similar for all four individuals, with a dynamic range of about 10-100. By contrast WV_t is subject to abrupt spikes associated with news events. For some individuals, especially ID 15 and ID 02 in Table 1, GS_t and WV_t show strong weekly or annual periodicity, presumably due to regular cycles of student academic assignments. The instability of GS_t and WV_t argues against the use of short-term snapshots of GS or WV as quantitative metrics of fame. Fig 3 also shows the accumulation of total WE and WV over a 1.5-year interval, for the individuals in Table 1. While an individual's WV may jump discontinuously when the individual's death is reported, WE generally changes more slowly and its relative ordering is largely stable over time. These data suggest that while both WV and WE inevitably increase over time, a rank ordering of individuals by WE changes slowly, as required for a useful quantitative measure of fame.
Fig 3 (caption fragment). For the individuals in Table 1, the figures at right show (D) cumulative WV and (E) cumulative WE. The legends identify the individuals by their ID in Table 1.
As WE and GN both correlate reasonably well with p, we expect them to correlate with each other. The scatterplot of Fig 4 shows that GN and WE correlate similarly in all three datasets (279 individuals). The data fall roughly along a curve that is more nearly quadratic (GN ∝ WE^2) than linear, so that while GN spans more than six decades, WE spans about 3-4 decades. One possible interpretation of the nonlinear relationship is that WE, which is subject to the practical aspects of web editing, is unlikely to be smaller than about 10, and therefore has a floor value even though GN does not. Another possibility is that GN is more sensitive to celebrity (as defined herein) whereas WE is a better measure of fame, so that GN emphasizes more famous individuals at the expense of the less famous. Fig 4C illustrates the relation between GN and WE for individuals of different professions. This and other analyses we conducted show no evidence that the correlation between WE and GN depends significantly on profession.

The probability distribution for fame
The survey-generated p_i provide an intuitive and quantitative metric of fame. However the printed survey is impractical for evaluating the fame of larger numbers of individuals. Therefore we use WE, which appears to be a satisfactory alternative metric of fame, to investigate the statistical distribution of fame. Fig 5A shows histograms of WE for individuals in the NYT, NBC and Wiki datasets. In each case the WE distribution is broad, spanning at least two decades, with a tail extending to very large WE. The tails raise the question of whether fame, like many other quantities in the social and natural sciences, obeys a power law distribution. Phenomena such as forest fires, earthquakes, and the sizes of cities, which lack an intrinsic size scale [26], often obey a power law: The probability that an event has magnitude x is given by

$$p(x) = C x^{-\alpha} \tag{4}$$

for x > x_min. Here x_min is a cutoff, C is a normalization constant, and we have taken x as a discrete variable (like WE). We tested our data for power law behavior by finding the power law model that best fit the cumulative distribution function (CDF) of each dataset. (The CDF is the function F(x) that gives the probability that any one measurement X exceeds x.) For each dataset we assumed that the WE data obey Eq 4 above a cutoff value of WE (Methods). As shown in Table 2, all three datasets give comparable values, α ≈ 1.9-2.6, although with very different cutoff (x_min) values, indicative of the different selectivity of the three data sources. Fig 5 illustrates the agreement by showing the cumulative distribution function (CDF) for WE and that of the corresponding, maximum-likelihood power law. Many apparent power laws are only approximate and do not withstand close statistical scrutiny [24]. Although the size of the NYT and NBC datasets is insufficient to establish whether the power law is superior to other models for the distribution, these particular datasets do appear to show good agreement with the power law model.
Fig 5 (caption fragment). For each dataset, the optimal value of the power law exponent and cutoff (α and x_min respectively in Eq 4) were evaluated as described in Methods [24], with results shown in Table 2. The dashed lines show the CDF for true power law distributions that have the same α and x_min as the data.
We note that our α values are consistent with the α = 1.9-2.1 that was estimated by a different method in a study of the fame of WWI flying aces [6]. While the CDF gives the probability that any particular event exceeds a certain magnitude, it is often more useful to know the absolute frequency of events of a certain magnitude. For example in the case of earthquakes it is helpful to know how many events exceeding a given threshold occur each year. A cumulative frequency plot shows the frequency f(x) of events that have magnitude greater than or equal to x. For many natural power law phenomena the cumulative frequency plot obeys the empirical Gutenberg-Richter law [26], a relation with parameters a, b and ν. Here a and b set the overall scale of the frequency and define a threshold sensitivity in the dataset (similar to x_min above), while ν reflects the power law distribution of the underlying events. Fig 6 shows the cumulative frequency plots for all three datasets, using WE as a fame measure and scaling the event numbers in the data up to equivalent annual rates. The parameters a and b, shown in Table 2, are highly variable as the different media sources apply different selectivity criteria in reporting obituaries. However as WE increases, all three datasets tend toward Gutenberg-Richter behavior with ν ≈ 1.5-1.7 (by least squares fit).
It is interesting that all three curves indicate that ~30-100 individuals with fame WE ≥ 10^3 die each year. Such information can yield quantitative insight into questions that arise perennially about the frequency of famous deaths [16] [17] [18]. For example, the so-called celebrity rule of three is a folk belief that deaths of celebrities occur in clusters, especially groups of three, spread over a few days [17]. A frequently mentioned example is the three days of June 23-25, 2009, during which the television personality Ed McMahon, the musician Michael Jackson, and the actress Farrah Fawcett all died.
The probability of such coincidences can be estimated from the data in Fig 6 by defining a threshold for fame, referring to the Gutenberg-Richter plot, and then applying the familiar birthday problem in statistics: If the deaths of N individuals are randomly distributed throughout the year, then if N ≥ 23 the probability exceeds 50% that at least two deaths will occur on the same day. If the threshold for fame is WE ≥ 1000, the expected 30-100 deaths per year ensures that two famous deaths will occasionally coincide.
The common statement of the celebrity rule of three does not require the deaths to occur on precisely the same day. If they may be separated by Δn days, then as shown in Fig 6 a 50% probability of two occurring in coincidence requires only N ≥ 14 (for Δn = 1) or N ≥ 11 (Δn = 2) deaths per year. Among individuals of rather high fame, WE ≥ 2000-3000, at least one such coincidence appears likely each year. A three-person coincidence has 50% probability when N ≥ 88 (Δn = 0) or N ≥ 35 (Δn = 2): If Δn ≤ 2 is considered a coincidence, then at least one cluster of three deaths with WE > ~1000 seems likely to occur in most years. Therefore, although Fig 6 summarizes only one year of deaths, the data are sufficient to demonstrate that the apparent clustering of famous deaths is not an entirely false perception; rather the clustering is a statistical consequence of the rather large number of famous deaths that occur each year.
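The two-person thresholds quoted above can be reproduced with a short calculation. The Python sketch below is an illustration only and is not necessarily how the values in this article were obtained. It assumes dates that are uniformly distributed over a 365-day year treated as circular, and it uses a known closed-form product for the probability that no two of N dates fall within Δn days of each other; the function names are hypothetical. Three-person coincidences are not covered by this sketch.

```python
def p_no_pair_within(n, dn, days=365):
    """Probability that no two of n uniform random dates lie within dn days of each other."""
    if n * (dn + 1) > days:
        return 0.0                      # too many dates to keep them all separated
    prob = 1.0
    for i in range(1, n):
        prob *= (days - n * dn - i) / days
    return prob

def min_n_for_pair(dn, target=0.5):
    """Smallest n for which a pair within dn days has probability of at least target."""
    n = 2
    while 1.0 - p_no_pair_within(n, dn) < target:
        n += 1
    return n

for dn in (0, 1, 2):
    n = min_n_for_pair(dn)
    print(dn, n, round(1.0 - p_no_pair_within(n, dn), 3))
```

For Δn = 0 the product reduces to the classic birthday problem, and the three thresholds returned are N = 23, 14 and 11, in agreement with the values given in the text.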

Conclusion
The application of statistical ideas has led productively to greater understanding of many aspects of human social dynamics, such as the evolution of opinion, cultural and linguistic behaviors [27]. Our results demonstrate that quantitative measures can plausibly be applied to fame, providing insight into this important cultural phenomenon and allowing detailed statistical investigation. We note however that, as fame has economic value, it would be preferable to measure fame using tools that (unlike WE) are difficult to manipulate. For example, instead of using paper surveys to score fame, one could implement a larger scale, social-media based, electronic version of the pairwise comparison method. This would greatly expand the survey base, allowing more accurate evaluation of the fame of greater numbers of individuals, and especially less renowned individuals. Future investigators may wish to explore such approaches.

Supporting information
S1 Spreadsheet. Survey responses. The spreadsheet contains all individual responses to the fame survey. The first page contains the list of twenty deceased individuals of high renown, each identified by a number ID. Each of the 50 survey subjects was presented with 50 pairs of names from this list and asked to select the more familiar name in each pair. The second page contains the response data. The first column gives the subject number. The second and third columns indicate the ID of the winner (more familiar) and loser (less familiar) in each pairwise comparison of names. Because "No preference" responses have been removed, fewer than fifty responses are recorded for some subjects.