Russians are the fastest 100-km ultra-marathoners in the world

Objectives A recent study investigating the top 10 100-km ultra-marathoners by nationality showed that Japanese runners were the fastest worldwide. Restricting the analysis to top athletes may introduce a selection bias. The aim of this study was therefore to investigate where the fastest 100-km ultra-marathoners originate from by considering all finishers in 100-km ultra-marathons since 1959.
Methods We analysed data from 150,710 athletes who finished a 100-km ultra-marathon between 1959 and 2016. To obtain precise estimates and stable density plots, we selected only nationalities with 900 or more finishes, resulting in 24 nationalities. Histograms and density plots were produced to study the distribution of race time. Crude mean, standard deviation, median, interquartile range (IQR), mode, skewness and excess kurtosis of race time were computed for each nationality. A linear regression analysis adjusted for sex, age and year was performed to compare race times between the nationalities. Histograms, density plots and scatter plots showed that some races seemed to have a time limit of 14 hours. Finishes of more than 14 hours were therefore removed from the complete dataset (truncated dataset), and the same descriptive plots and analyses as for the complete dataset were repeated. In addition to the linear regression, a truncated regression was performed with the truncated dataset to allow conclusions for the whole sample. To study a potential difference between races at home and races abroad, an interaction term between race site (home/abroad) and nationality was included in the model.
Results Most of the finishes were achieved by runners from Japan, Germany, Switzerland, France, Italy and USA, with more than 260,000 (85%) finishes. Runners from Russia and Hungary were the fastest and runners from Hong Kong and China were the slowest finishers.
Conclusion In contrast to existing findings investigating the top 10 by nationality, this analysis showed that ultra-marathoners from Russia, not Japan, were the fastest 100-km ultra-marathoners worldwide when considering all races held since 1959.


Introduction
http://statistik.d-u-v.org/index.php. By using the link http://statistik.d-u-v.org/ anyone can access the publicly available database. We used http://statistik.d-u-v.org/geteventlist.php and, in the English version of the website, entered the term 'all' in 'Year', the term '100 km' in 'Distance' and the term 'All' in 'Country'. Clicking on 'Go' lists all 100-km ultra-marathons held worldwide. This search returns more than 4,500 races; all race results were downloaded manually by one of the investigators.
The original dataset contains the following variables: name, age at race, year of race, sex of finisher, nationality and country, and speed in km per hour. We converted running speed to time in hours by dividing 100 km by speed (km/h). To identify unique finishers, we computed date of birth as year of race minus age at race. After cleaning the variable name, we treated finishes as belonging to the same finisher if name, date of birth, sex and nationality were identical. Finishes with missing values in age, year, sex or time were removed. Finishes outside the following ranges were also removed: date of birth between 1890 and 2000, age between 15 and 100, year between 1950 and 2016, and a length of exactly 3 characters for nationality and country. After removing finishes with missing data and outliers, we selected only finishes of nationalities with 900 or more finishes. This dataset has 24 nationalities and we named it the complete dataset. To study the distribution of time, we produced histograms and density plots with a Gaussian kernel for each of the selected nationalities. For comparison with the empirical distribution, we also plotted for each nationality the normal distribution defined by its crude mean and standard deviation. Furthermore, we computed crude mean, standard deviation, median, interquartile range (IQR), mode, skewness and excess kurtosis of time for each nationality. Excess kurtosis, or excess for short, is defined as kurtosis minus 3. An excess of 0 means a Gaussian-like kurtosis (mesokurtic), a positive excess indicates a more slender curve (leptokurtic), and a negative excess indicates a broader curve (platykurtic).
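The conversion and the shape statistics described above can be illustrated with a short Python sketch. It is not the authors' R code: the function names are hypothetical, and the moment-based (population) estimators of skewness and excess kurtosis are an assumption, since the text does not state which estimator was used.

```python
def speed_to_time(speed_kmh, distance_km=100.0):
    """Race time in hours from average speed in km/h."""
    return distance_km / speed_kmh


def birth_year(race_year, age_at_race):
    """Approximate date of birth as year of race minus age at race."""
    return race_year - age_at_race


def skewness_and_excess(times):
    """Moment-based skewness and excess kurtosis (kurtosis minus 3).

    Assumption: population moments (division by n) are used.
    """
    n = len(times)
    mean = sum(times) / n
    m2 = sum((t - mean) ** 2 for t in times) / n  # variance
    m3 = sum((t - mean) ** 3 for t in times) / n
    m4 = sum((t - mean) ** 4 for t in times) / n
    skew = m3 / m2 ** 1.5
    excess = m4 / m2 ** 2 - 3.0  # 0 for a Gaussian-like curve
    return skew, excess
```

For example, a finisher with an average speed of 10 km/h has a race time of 10 h, and a symmetric sample has a skewness of 0.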
Because the distribution of time differed considerably between nationalities, we decided to cluster the nationalities according to their time density. The range of time (h) was segmented into 0 to 7, 7 to 8, 8 to 9, ..., 22 to 23, 23 to 24 and ≥ 24. For each of these 19 segments we computed the area under the density curve. With these areas, we performed an agglomerative hierarchical clustering using group average linkage to identify groups with a similar distribution of time.
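The segmentation and clustering step can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the share of finishes per segment stands in for the area under the kernel density curve, Euclidean distance between profiles is assumed (the distance measure is not stated in the text), and the function names are hypothetical.

```python
import math
from itertools import combinations


def segment_shares(times):
    """Share of finishes in the 19 segments 0-7, 7-8, ..., 23-24, >= 24 h."""
    counts = [0] * 19
    for t in times:
        if t < 7:
            idx = 0
        elif t >= 24:
            idx = 18
        else:
            idx = int(t) - 6  # 7-8 h -> 1, ..., 23-24 h -> 17
        counts[idx] += 1
    return [c / len(times) for c in counts]


def average_linkage(profiles, n_groups):
    """Agglomerative clustering with group average linkage.

    profiles: dict mapping nationality -> list of segment shares.
    Assumption: Euclidean distance between profiles.
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_groups:
        # Merge the two clusters with the smallest average distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Calling `average_linkage(profiles, 5)` on the 24 nationality profiles would return five groups; whether they match the groups reported here depends on the density estimate and distance measure actually used.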
To compare race time between the nationalities, we performed a linear regression analysis adjusting for sex, age and year (model 1). We included quadratic terms for age and year as well as interaction terms between sex and age, sex and age squared, sex and year, sex and year squared, and sex and nationality. This model (1) is based on visual inspection of scatterplots of time against year and time against age for each nationality (Figs 1 and 2). We added the following fitted curves to each panel: a B-spline (solid) of age or year, respectively, and a quadratic term (dashed). Since both curves overlap over most of the range, using a quadratic term seems admissible. The variables age and year were centered at their medians, with a median year of 2009 and a median age of 44. The reference level of sex was male and the reference level of nationality was Australia (AUS).
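Model (1) as described in words can be written out as follows. This is a sketch reconstructed only from the verbal description; the coefficient symbols and the dummy coding (sex = 0 for male, nat_k = 1 for nationality k with AUS as reference, age and year centered at 44 and 2009) are notational assumptions, not the authors' notation.

```latex
\begin{aligned}
\text{time} ={}& \beta_0 + \beta_1\,\text{sex}
  + \beta_2\,\text{age} + \beta_3\,\text{age}^2
  + \beta_4\,\text{year} + \beta_5\,\text{year}^2
  + \sum_{k} \gamma_k\,\text{nat}_k \\
&+ \text{sex}\times\Bigl(\delta_1\,\text{age} + \delta_2\,\text{age}^2
  + \delta_3\,\text{year} + \delta_4\,\text{year}^2
  + \sum_{k} \eta_k\,\text{nat}_k\Bigr) + \varepsilon
\end{aligned}
```

Under this coding, all terms involving sex, age and year vanish for a male finisher aged 44 in 2009, so the estimated time of such a finisher of nationality k reduces to \beta_0 + \gamma_k (and to \beta_0 for Australia).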
Histograms, density plots (Fig 3) and scatterplots (Figs 1 and 2) show that some races seem to have a time limit: finishers who did not finish within a certain time limit were discarded. To account for this, we defined a time limit of 14 hours based on the plots and histograms of Japan, Korea and Taiwan. From the complete dataset we removed the finishes of more than 14 hours (Fig 4), called the result the truncated dataset, and produced the same descriptive plots and analyses as for the complete dataset (Figs 5 and 6). In addition to the linear regression, we performed a truncated regression with the truncated dataset to allow conclusions for the whole sample. The estimates from the regressions were used to compute the times of a reference finisher: male, of median age and in the median year. These times and the resulting ranks were compared between the nationalities.
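The truncation at 14 hours and the idea behind a truncated regression can be sketched in Python. The sketch assumes normally distributed errors truncated from above at the time limit, which is the usual formulation of truncated regression fitted by maximum likelihood; that this matches the internals of the R function used here is an assumption, and the function names are hypothetical.

```python
import math

TIME_LIMIT_H = 14.0  # time limit chosen from the plots of Japan, Korea and Taiwan


def truncate(times, limit=TIME_LIMIT_H):
    """Keep only finishes of at most `limit` hours (the truncated dataset)."""
    return [t for t in times if t <= limit]


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_loglik(y, mu, sigma, limit=TIME_LIMIT_H):
    """Log-likelihood of one finish time y under a normal model with mean mu
    and standard deviation sigma, truncated from above at `limit`.

    The normal density is rescaled by the probability mass below the limit,
    which is what lets the fit refer to the untruncated population.
    """
    z = (y - mu) / sigma
    log_pdf = -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z
    return log_pdf - math.log(normal_cdf((limit - mu) / sigma))
```

In a full truncated regression, mu would be the linear predictor of the model and the sum of these terms over all finishes would be maximised over the coefficients and sigma.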
Furthermore, to study whether there is a difference between races at home and races abroad, we added to model (1) an interaction term of race site (home/abroad) with nationality and sex (model 2). Since 64.6% of the finishers had only one race and 16.6% had two races, we performed the regression analysis without including the cluster effect of finishers.
All data processing and analysis were performed with the statistical software R [10]. Truncated regression was performed with function truncreg from package truncreg.

Results
Between 1959 and 2016, a total of 363,924 athletes finished a 100-km ultra-marathon. The variable with the highest number of missing values is date of birth. There were no missing values in the variable sex. Table 1 summarizes the exclusion criteria. Only nationalities with at least 900 finishes were considered to allow precise estimates and robust histograms. To analyse which countries have the most missing data in the variable date of birth, nationalities with at least 1,000 finishes and at least 10% of missing data are listed in Table 2  48.1%, 41.5%, respectively. Finally, a total of 150,710 finishers originating from 24 countries with a total of 307,871 finishes could be considered for data analysis. Table 3 presents the number of finishes by origin of the athletes. Most of the finishes were achieved by runners from Japan, Germany, Switzerland, France, Italy and USA, with more than 260,000 (85%) finishes.
A total of 20 nationalities performed more than 50% of their races in their home country, with runners from Japan, Switzerland, Italy and Korea at the top, whereas runners from Germany, Great Britain, Belgium and Austria performed less than 50% of their races at home (Fig 7). Runners from Finland, Germany, Switzerland, Italy, Netherlands, Hungary, Belgium, Austria, France and Russia have an average number of finishes per finisher of more than 2 (Fig 8).
A total of 64.6% of the finishers completed only one 100-km ultra-marathon (Table 4). On average, the athletes were 43.7±11.1 years old (Table 5). A total of 88% of the finishers are men and 12% are women (Table 6).
From the agglomerative hierarchical clustering, 5 groups can be retrieved:
• Group 1 with China and Hong Kong, which show a widely spread distribution of time.
• Group 2 with Russia, which has the lowest mode and a high excess (3.2).
• Group 3 with Korea, Japan and Taiwan, which have a very high excess (7.1, 5.8 and 16.8). All have a very steep right edge of the curve at 18, 14 and 14 hours, respectively.
• Group 4 with Czech Republic, Spain, Great Britain, Australia, Switzerland, Italy and USA, with low excess and low skewness.
• Group 5 with Finland, Denmark, Netherlands, Belgium, Hungary, Poland, Sweden, France, Canada, Austria and Germany, which have a higher skewness (≥1) compared to group 4 with skewness <1 (Fig 4).
Table 7 compares the number of finishes before and after truncation of the dataset for each nationality. Hong Kong and China have more than 90% truncated observations; Australia, Czech Republic, Spain, Switzerland and USA have between 50% and 63% truncated observations; and all others have less than 50%. Table 8 presents the mean, SD, median, interquartile range, mode, skewness and excess of time for each nationality for the complete dataset, and Table 9 for the truncated dataset.
Estimates, standard errors and p-values from models (1) and (2) based on the complete and the truncated dataset are given in Tables 10-14. These estimates were used to compute times for the reference finisher (sex = male, year of race = 2009 and age = 44), which are presented in Tables 15 and 16. To visualize changes in ranks between the three methods, the estimates were ordered by descending time and the estimates of each nationality were connected with lines. Russia remained the fastest and China and Hong Kong the slowest, whereas all other nationalities changed their rank. Japan changed from rank 5 to rank 18. Hungary changed from third to second and Finland from second to fourth. The time for Russia changed from 9.4 h (95%-CI: 9.1-9.6) to 9.0 h (95%-CI: 8.9-9.1), for Hungary from 10.7 h (95%-CI: 10.4-11.0) to 9.9 h (95%-CI: 9.7-10.0), for Japan from 11.1 h (95%-CI: 10.9-11.3) to 11.4 h (95%-CI: 11.3-11.5) and for China from 19.1 h (95%-CI: 18.9-19.3) to 11.9 h (95%-CI: 11.6-12.1).
Only four nationalities change ranks when the ranks from the linear regression are compared with the ranks from the truncated regression, both based on the truncated dataset (Fig 9E). Fig 10(A) and 10(B) show the changes of ranks between running at home and abroad based on model (2), computed with the complete and the truncated dataset, respectively. Both show many changes in rank position. Again, Russia remains at place 1 at home and abroad, with 10.0 h and 8.2 h, respectively, with the complete dataset and 9.7 h and 7.8 h with the truncated dataset. Japan changed from rank 10 (11.1 h, at home) to rank 20 (14.9 h, abroad) when the complete dataset was used and from rank 18 (11.4 h) to rank 3 (9.2 h) when the truncated dataset was used (Table 16). Table 17 shows the mean time of the top 10, 100 and 1000 finishers. Only the fastest finish of each finisher was considered to define the top. Japan is first in the top 10 and top 100 and second in the top 1000, whereas Russia is second in the top 10 and top 100 and ninth in the top 1000. Table 18 shows the mean time of the fastest finishes. Russia is first in the top 10 and top 100, with 5 finishers accounting for the 10 finishes and 37 finishers for the 100 finishes. Japan is second in the top 10 and top 100, with 7 finishers accounting for the 10 finishes and 50 finishers for the 100 finishes.
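The top-N comparison by finishers, where only the fastest finish of each finisher counts, can be sketched in Python. The record layout and the function name are hypothetical and serve only to illustrate the rule.

```python
from collections import defaultdict


def top_n_mean(finishes, n):
    """Mean time of the n fastest finishers per nationality.

    finishes: iterable of (nationality, finisher_id, time_h) tuples.
    Only the fastest finish of each finisher is considered; if a
    nationality has fewer than n finishers, all of them are averaged.
    """
    best = defaultdict(dict)
    for nationality, finisher_id, time_h in finishes:
        previous = best[nationality].get(finisher_id)
        if previous is None or time_h < previous:
            best[nationality][finisher_id] = time_h

    result = {}
    for nationality, per_finisher in best.items():
        fastest = sorted(per_finisher.values())[:n]
        result[nationality] = sum(fastest) / len(fastest)
    return result
```

Dropping the per-finisher step and sorting all finishes directly would give the ranking by fastest finishes instead, in which a few runners with many fast races can dominate the top.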

Discussion
The aim of this study was to investigate the nationality of the fastest 100-km ultra-marathoners competing between 1959 and 2016, with the hypothesis that the fastest runners would originate from Japan, as has been found for the top 10 runners worldwide competing between 1998 and 2011. However, in contrast to previous findings, athletes from Russia, not athletes from Japan, achieved the fastest race times when all athletes were considered by nationality.
A first potential explanation could be the share of finishes in the country of origin. For example, Russians have ~37% of their finishes outside their country of origin but Japanese less than 2%. Most probably, only the fastest Russian runners compete outside Russia in the fastest races (e.g. World Championships) or on the fastest courses (e.g. completely flat courses, track races) worldwide. In contrast, Japanese runners competed preferably in races held in Japan, where the courses are most probably not fast (i.e. hilly rather than flat courses). The present study is, however, not the first to show that Russian athletes are the fastest in an ultra-endurance sport. Recently, an analysis of the 'Engadin Ski Marathon' showed that Russians were the fastest cross-country skiers [11], which, combined with the findings of the present study, may indicate a general trend of excellence of Russians in ultra-endurance sports.
The two strongest factors which seem to influence the speed of a population are the attitude towards participation and the rules concerning time limits. Firstly, the density plots and histograms suggest that there are countries where very slow participants were also allowed to compete in races and were also considered in the ranking. Extreme examples of this are athletes from China and Hong Kong, but also from Czech Republic, Great Britain, Spain, and Australia. Athletes from other countries like Denmark, Finland, Sweden, and Russia have a high skewness and excess, which means that the bulk of finishers is concentrated within a narrow range of times. This may be due to attitudes within society (e.g. popularity of sports, sports promotion policy) or to the socio-economic background of the individuals, so that only fast competitors participate. It has also been suggested that a successful finish in this sport depends on the motivation to train intensively [7]. Secondly, the density plots of athletes from Japan, Taiwan and Korea show a very steep slope on the right side of the curve. This may be due to time limits set by the organizers. Such time limits may be applied not only in Japan or Taiwan but, less frequently, also in other countries. This could be the main source of bias which would explain why Japanese runners were the fastest [1]. To counteract this bias, we truncated the dataset and considered only finishes of 14 hours or less. However, this can introduce a bias similar to that of using only the top 10 finishers if conclusions are drawn for the whole population. We therefore have to consider that using the complete dataset introduces bias due to race rules, whereas using the truncated dataset introduces selection bias.
We also used truncated regression to allow conclusions for the whole running population, but it seems that too many observations were truncated. This changed the shape of the original distribution so completely that conclusions can no longer be drawn for the complete dataset, only for the truncated dataset. This is why the linear regression and the truncated regression of the truncated dataset give similar results. Nonetheless, the assumption that there is a time limit in Japan may be supported by the fact that the race time at home is 11.1 h and abroad is 14.9 h using model (2) and the complete dataset. This is an increase of 3.8 hours. For athletes from Russia, the times are 10.0 h and 8.2 h, respectively, a decrease of 1.7 hours. Assuming that only good and the best ultra-marathoners make the effort to go abroad, the mean time should decrease, which is not the case for Japanese runners. Nevertheless, both analyses, with the complete and with the truncated dataset, show that Russian runners were the fastest and athletes from China and Hong Kong were the slowest. All other nationalities change their rankings, reflecting the distribution of the running time.
The fastest 100-km ultra-marathoners worldwide. PLOS ONE | https://doi.org/10.1371/journal.pone.0199701 | July 11, 2018
If we look at the top 10, 100 and 1000 of the fastest finishers, Japanese ultra-marathoners take the first place and the second place, respectively. The rank drops as more participants are included in the dataset. It seems that there are some very fast Japanese runners, but as the number of participants grows, the mean time increases more than in other nationalities. A look at the top 10 and top 100 of the finishes shows that Russian ultra-marathoners take the first places. This is due to five runners with 10 finishes and 37 runners with 100 finishes. In this case, it seems that Russian ultra-marathoners take high ranks because a few runners achieved some very fast finishes. This could be a limitation of the linear regression if finishers are not considered as a random variable in a multilevel regression. We also performed a linear regression with finisher as a random variable and obtained similar results to the linear regression with the complete and the truncated dataset (data not shown).
The role of nationality in 100-km ultra-marathon race times highlighted in the present study is in agreement with previous research that identified sports as the most powerful form of national performance [12]. An attempt to use sport to build a sense of national identity has been reported [13]. Either biological or cultural heredity has been identified as a factor associated with the dominance of a nationality in a sport [14]. For instance, certain genes have been identified to relate to endurance performance [15]. In addition, an explanation of the differences in participation among nationalities might be the differences in their attitudes towards physical activity [16]. Participation in running might be influenced by economic and cultural factors, e.g. those without a migration background are more likely to participate in running than those with a migration background [17]. Another aspect affecting sport performance is doping, for which no accurate rates exist due to its undisclosed practice; however, its prevalence has been estimated as 14-39% in adult elite athletes and has been shown to vary by performance level and nationality [18].
Since it has been shown that the role of nationality might vary by distance in endurance and ultra-endurance running [9], the findings of the present study should not be generalized to other distances. On the other hand, a strength of the study is its methodological approach to the research question: first, we used one of the largest samples of 100-km ultra-marathoners ever studied; second, we accounted for a probable time limit by using a truncated dataset; third, we compared adjusted times of a reference finisher to compare ranks; and fourth, we provide a classification of time distributions which helps to interpret the results.
We found that ultra-marathoners from Russia were the fastest in this specific ultra-marathon distance. Unfortunately, this kind of analysis is not able to explain the reason for this dominance. The role of ethnicity in running is, however, well-known for other running distances. Running best performances are dominated by a few groups of athletes, including runners with West African ancestry for the sprint distances and East African runners for the long distances [9,19]. East African runners from Kenya and Ethiopia have dominated running distances up to the marathon for decades [9,20-22], while for other running distances such as the sprint distances, runners from Jamaica dominate [23]. For elite Kenyan runners, it is well-known that they come from a distinctive environmental background in terms of geographical distribution, ethnicity and running to school [20]. Interestingly, the same findings were reported for elite Ethiopian runners, who also have a distinct environmental background in terms of geographical distribution and ethnicity, and who also often run to school [21]. For both Kenyan and Ethiopian runners, environmental and social factors are thus important for their athletic success. These aspects are, however, of importance not only for East African distance runners but also for athletes from Jamaica. It has been shown that a higher proportion of middle distance athletes and of both jump and throw athletes walked to school and travelled greater distances to school [23].
Although different anthropometric, physiological, biomechanical and training characteristics are of importance for the East African running dominance [22,24,25], a strong psychological motivation to succeed athletically for the purpose of economic and social advancement is known [26]. Elite Kenyan runners become competitive athletes for economic reasons. Typically, Kenyan athletes see athletics as a means of making money to help their families, parents and siblings [20,27].