Time window to constrain the corner value of the global seismic-moment distribution

It is well accepted that, at the global scale, the Gutenberg-Richter (GR) law describing the distribution of earthquake magnitude or seismic moment has to be modified at the tail to properly account for the most extreme events. It is debated, though, how much additional time of earthquake recording will be necessary to properly constrain this tail. Using the global CMT catalog, we study how three modifications of the GR law that incorporate a corner-value parameter are compatible with the size of the largest observed earthquake in a given time window. Current data lead to a rather large range of parameter values (e.g., corner magnitude from 8.6 to 10.2 for the so-called tapered GR distribution). Updating this estimation in the future will strongly depend on the maximum magnitude observed, but, under reasonable assumptions, the range will be substantially reduced by the end of this century, contrary to claims in previous literature.


Introduction
The statistics of earthquake occurrence, in particular of the most extreme events, must be a fundamental input for the assessment of seismic hazard [1]. The cornerstone model for describing the earthquake-size distribution is the Gutenberg-Richter (GR) law [2,3]. The original version of the GR law states that earthquake magnitudes follow an exponential distribution; since this is a perfectly "well-behaved" distribution, with all statistical moments (such as the mean and the standard deviation) being finite, the problem of earthquake sizes would seem a rather trivial one.
However, a physical interpretation of the meaning of the GR law needs a proper understanding of magnitude. In fact, magnitude presents several difficulties as a measure of earthquake size [4], and a true physical quantity is given instead by the seismic moment [5,6]. Due to the logarithmic dependence of magnitude on seismic moment, the GR law for the latter transforms into a power-law distribution, i.e.,

f(x) ∝ 1/x^(1+β), for x ≥ a,    (1)

where x is the seismic moment, f(x) the seismic-moment probability density, a a lower cutoff below which the power law does not hold (presumably because of the incompleteness of the considered catalog for small earthquakes), and 1 + β the power-law exponent, which takes values close to 1.6 or 1.7 (the symbol ∝ denotes proportionality). It turns out that the solution to the physical interpretation of the GR law comes at a price: the power-law distribution, when 1 + β is smaller than 2 (which is indeed the case), is not "well behaved", in the sense that the mean value of the seismic moment becomes infinite. The reason is that, for power-law distributed seismic moments, events in the tail of the distribution, despite having very small probability, bring an enormous contribution to the seismic-moment release [7], and the seismic-moment sample mean does not converge, no matter how large the number of data, due to the inapplicability of the law of large numbers to power-law distributions [8] such as that in Eq (1). In consequence, as the GR law is unphysical when extended to the whole range of earthquake sizes, the tail of the distribution of seismic moment must deviate from the GR power-law shape [9].
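As a quick numerical illustration of this point (not taken from the article), the following Python sketch draws synthetic seismic moments from a pure power law with an assumed exponent 1 + β ≈ 1.65 and shows that the running sample mean keeps drifting upwards instead of converging.

import numpy as np

# Minimal illustration (assumed parameter values, not results of the article):
# for a pure power law with exponent 1 + beta < 2, the sample mean of the
# seismic moment does not converge, however many events are drawn.
rng = np.random.default_rng(0)
beta = 0.65                 # assumed exponent, close to the 0.6-0.7 range quoted in the text
a = 5.31e17                 # lower cutoff in N*m (magnitude 5.75)

u = 1.0 - rng.random(10_000_000)          # uniform in (0, 1]
x = a * u ** (-1.0 / beta)                # inverse-transform sampling of F(x) = 1 - (a/x)^beta

for n in (10**3, 10**5, 10**7):
    print(f"running mean after {n:>8d} events: {x[:n].mean():.3e} N*m")
# The printed means keep growing with n because the theoretical mean is infinite.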
Due to the scarcity of data, the problem has to be approached at a global scale, or at least for a large subset of the global data (for instance, for subduction zones as a whole [9]). This approach has been followed by a number of authors [3,9-17]. Essentially, a new parameter M_c > 0 is introduced, providing a scale for the seismic moment of the largest ("non-GR") earthquakes, in such a way that for x ≪ M_c the GR law can be considered to hold, but for x ≳ M_c the distribution clearly departs from this law, decaying faster than the GR power law. The values of M_c are more easily read in terms of the corresponding (moment) magnitude m_c [5,6], through the formula m_c = 2(log10 M_c − 9.1)/3, where the seismic moment is measured in N·m. As m_c is sometimes referred to as the "corner magnitude", so M_c would be the "corner seismic moment" [9], independently of the specific probabilistic model (in practice, we will use M_c for formulas and m_c for reporting numeric values, and both will be referred to as "corner parameters" or "corner values").
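For reference, the magnitude-moment conversion quoted above is trivial to code; the following helper (illustrative only, using the completeness threshold introduced later) is reused in the sketches below.

import numpy as np

def moment_to_magnitude(M):
    """Moment magnitude from seismic moment M in N*m, m = 2(log10 M - 9.1)/3."""
    return 2.0 * (np.log10(M) - 9.1) / 3.0

def magnitude_to_moment(m):
    """Inverse relation: seismic moment in N*m from moment magnitude m."""
    return 10.0 ** (1.5 * m + 9.1)

print(moment_to_magnitude(5.31e17))        # ~5.75, the completeness threshold used below
print(f"{magnitude_to_moment(9.1):.2e}")   # seismic moment of a magnitude-9.1 event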
In this article we aim to further clarify to what extent the available observations can constrain M_c or m_c, and how many more earthquakes (and hence how many more years of recording) would likely be necessary to yield reasonably precise estimates. Before proposing a rigorous statistical way to tackle these issues, we first need to assess a previously proposed approach [16].

Probabilistic models
We define the probabilistic models in terms of the cumulative distribution function, F(x), which gives the probability that the random variable (seismic moment) is equal to or smaller than a value x. This description is totally equivalent to the one in terms of the probability density, as both functions are related through f(x) = dF(x)/dx and F(x) = ∫_a^x f(x') dx' (at some point we will also use the complementary cumulative distribution function, S(x) = 1 − F(x)). The distributions of our interest are:

1. the truncated power-law (TPL) distribution [16],
F_tpl(x) = [1 − (a/x)^β] / [1 − (a/M_c)^β], for a ≤ x ≤ M_c;    (2)

2. the tapered (Tap) GR law [14,16,18], also called the Kagan distribution [19],
F_tap(x) = 1 − (a/x)^β exp[(a − x)/M_c], for x ≥ a;    (3)

3. the truncated gamma (TrG) distribution [12,20],
F_trg(x) = 1 − Γ(−β, x/M_c) / Γ(−β, a/M_c), for x ≥ a,    (4)

with Γ(γ, z) = ∫_z^∞ x^(γ−1) e^(−x) dx the upper incomplete gamma function, defined when γ < 0 only for z > 0. All three F(x) are zero for x < a, and F_tpl(x) = 1 for x ≥ M_c. The parameter β has to be greater than zero, except in the TrG model, where it has no restriction. Of course, M_c > 0 and a > 0.
The three distributions are graphically depicted in S1-S3 Figs of the supporting information. Note that for the TPL distribution M_c is a truncation parameter, whereas for the Tap and TrG it is a scale parameter (it sets the scale of F(x) in the x-axis) [12,20]. Namely, f_tpl(x) goes abruptly (discontinuously) to zero at x = M_c, whereas for the other two distributions this point sets the scale at which the power law transforms smoothly into an exponential decay. So, the physical meaning of M_c in the TPL is quite different from that in the other two models. Note also that the TPL is truncated both from below and from above (but the adjective refers to the truncation from above, x ≤ M_c), whereas the TrG and Tap are truncated only from below (x ≥ a). Summarizing, all the considered distributions have two free parameters, β and M_c (or β and m_c), with the value of a fixed by the completeness of the earthquake catalog. In all cases, the limit M_c → ∞ yields the usual power-law (PL) distribution [20], F_pl(x) = 1 − (a/x)^β for x ≥ a, which is equivalent to Eq (1). Other works have considered different distributions, such as the Gumbel in Ref. [21], for which the power-law limit is not so clear.
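A minimal sketch of the three model CDFs, written directly from Eqs (2)-(4) as reconstructed above, may help fix ideas; the parameter values are purely illustrative, and the upper incomplete gamma function with negative first argument is taken from mpmath, since SciPy's implementation requires a positive argument.

import numpy as np
from mpmath import gammainc

def F_tpl(x, a, Mc, beta):
    """Truncated power law, Eq (2): truncated at a from below and at Mc from above."""
    F = (1.0 - (a / x) ** beta) / (1.0 - (a / Mc) ** beta)
    return min(max(F, 0.0), 1.0)

def F_tap(x, a, Mc, beta):
    """Tapered GR (Kagan) distribution, Eq (3): power law with an exponential taper at Mc."""
    return 1.0 - (a / x) ** beta * np.exp((a - x) / Mc)

def F_trg(x, a, Mc, beta):
    """Truncated gamma, Eq (4); Gamma(-beta, z) is the upper incomplete gamma, z > 0."""
    return 1.0 - float(gammainc(-beta, x / Mc) / gammainc(-beta, a / Mc))

# Illustrative values only: completeness threshold, corner magnitude 9.0, beta = 0.67.
a, Mc, beta = 5.31e17, 10 ** (1.5 * 9.0 + 9.1), 0.67
x = 1e19
print(F_tpl(x, a, Mc, beta), F_tap(x, a, Mc, beta), F_trg(x, a, Mc, beta))
print(1.0 - (a / x) ** beta)   # plain power law F_pl(x): all three stay close to it for x << Mc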

State of the art
Several authors have addressed the constraining of the value of M_c and related issues. In particular, Ref. [16] studied the TPL and the Tap distributions (called there GR and MGR, respectively). It was claimed that, for global seismicity with magnitude above 5.75 (i.e., seismic-moment lower cutoff a = 5.31 × 10^17 N·m), an enormous amount of data would be necessary in order to obtain reliable estimates of M_c or m_c (200,000 years are mentioned for the Tap distribution with m_c ≈ 8.5). Reasonable values proposed previously by other authors (for instance m_c ≈ 9 in Ref. [14] for the Tap distribution) were discarded.
The analysis was based on a single statistic: the maximum seismic moment Y of the N earthquakes with magnitude above 5.75 contained in the catalog, that is, Y = max{x_1, x_2, ..., x_N}. Elementary probability theory allows one to obtain the probability distribution of this maximum when the N observations are independent [16,22] (independence is the maximum-entropy outcome when no constraint is imposed on the dependence between the observations [23]). Namely, the cumulative distribution function of the maximum is given by

F_max(y) = Prob[Y ≤ y] = [F(y)]^N,    (5)

where F(y) can be given by any of the distributions in Eqs (2)-(4), depending on the underlying statistical model. This approach constitutes an "extreme" limit of the classical block-maxima procedure used in extreme-value theory, considering just one single block [24].

Given the empirical value of the maximum, y_emp = max{x_1, x_2, ..., x_N}, and a modeling probability distribution F(x), Zöller [16] correctly argued that, if the data come indeed from F(x), then F_max(y_emp) = Prob[Y ≤ y_emp] should not be too close to 1. The reason is that proximity to 1 would mean that the empirical value y_emp is too large in relation to the values of Y that one can expect from the model distribution F(x) and the number of earthquakes observed. Subsequently, this author introduced an ad-hoc distinction between what he called "not well-sampled" distributions, characterized by F_max(y_emp) = Prob[Y ≤ y_emp] large (close to 1), and "well-sampled" distributions, characterized by F_max(y_emp) small. The latter can be equivalently characterized by a large value of the complementary cumulative distribution at y_emp, that is, S_max(y_emp) = 1 − F_max(y_emp) = Prob[Y > y_emp] large (close to one). In practice [16], "well-sampled" distributions were required to fulfill

S_max(y_emp) = Prob[Y > y_emp] ≥ 0.99.    (6)

We will explain below that this criterion cannot be sustained from a statistical point of view, and will introduce instead a robust criterion.

Analyzing global data from the centroid moment tensor (CMT) catalog [25,26], from January 1, 1977 to June 30, 2012 (including shallow, intermediate and deep events, N = 7,585 for x ≥ a), Zöller [16] found that the maximum magnitude corresponds to the 2011 Tohoku earthquake, with magnitude 9.1 (note that the 2004 Sumatra earthquake had a combined multiple-source moment magnitude of 9.3, but only 9.0 with the standard CMT determination [27]). In our work, we will analyze the same dataset, for the sake of comparison. Then, this author [16] evaluated the performance of the TPL and the Tap distributions for different fixed values of the parameter M_c. The considered values correspond to m_c = 8.5, 9, 9.5, ..., 12, in addition to m_c = 9.2; the parameter β, in contrast, was estimated by maximum likelihood for each fixed M_c. For the TPL model, a value of m_c = 9.2 resulted in Prob[Y > y_emp] = 0.55 [16], whereas m_c = 9.5 and m_c = 10 led to Prob[Y > y_emp] very close to one, and even closer-to-one values were obtained for m_c ≥ 10.5. Following the "well-sampledness" criterion, the value m_c = 9.2 was discarded for the TPL model, despite having the maximum likelihood among all the parameter values considered, and values with m_c ≥ 10.5, with much smaller likelihood, were preferred. However, no preference was shown between m_c = 10.5 and any higher value (for instance m_c = 12), and all those models were considered equally likely. For the Tap model, the results and the conclusions [16] were similar to those for the TPL model, and in this way the value m_c = 9 (proposed in Ref. [14]) was rejected despite yielding the maximum likelihood.
The calculation of the number of data required to perform a reliable estimation of the parameter M_c (or m_c) was obtained by imposing a minimum number of events N_m such that the distribution becomes "well-sampled" [16], in the sense of Eq (6). So, introducing Eq (5) into Eq (6),

[F(y_emp)]^N_m ≤ 0.01.    (7)

Note that, no matter the value of F(y_emp), if this is strictly smaller than 1, for sufficiently large N_m we will have [F(y_emp)]^N_m < 0.01, and the condition will be fulfilled by any model, with any parameter value, if enough data are gathered (except for truncated models with F(y_emp) = 1). Imposing that the previous condition becomes an equality, one gets

N_m = ln 0.01 / ln F(y_emp).    (8)

We will argue below that this Eq (8), used (but not made explicit) in previous research [16], does not hold for the problem under consideration.
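The criterion of Eqs (7)-(8) is straightforward to evaluate; the short sketch below (with purely illustrative values of F(y_emp), not the fitted ones of Ref. [16]) only serves to make its behaviour explicit.

import numpy as np

def N_min_wellsampled(F_yemp, threshold=0.01):
    """Eq (8): smallest N_m such that [F(y_emp)]^N_m equals the threshold 0.01."""
    return np.log(threshold) / np.log(F_yemp)

# Illustrative values of F(y_emp) only (not fitted values):
for F in (0.99, 0.999, 0.9999):
    print(f"F(y_emp) = {F}: N_m ~ {N_min_wellsampled(F):,.0f} events")
# A larger Mc lowers F(y_emp) and therefore lowers N_m: the counterintuitive
# behaviour of this criterion discussed below and illustrated in Fig 2.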
In this way, for the TPL model with m_c = 9.2, accepting the value Prob[Y > y_emp] = 0.55, the approach just outlined [16] [Eq (8) here] yields that N_m has to be higher than 45,000 (corresponding to 212 years of earthquake recording, with about 214 earthquakes with x ≥ a per year). For the Tap model with m_c = 8.5, for which Prob[Y > y_emp] = 0.0007, one obtains that more than 200,000 years would be needed (from N_m ≈ 50 × 10^6, roughly). Note the counterintuitive result that this approach leads to: the larger the corner seismic moment M_c, the fewer data are required for its estimation, as contained in Eq (8) (due to the decrease of F(y_emp) with m_c) and illustrated for the TPL model in Fig 2.

Proper testing using the maximum seismic moment
First, let us show that the previously used "well-sampledness" criterion [16], reproduced here in Eq (6), is not appropriate. If the distribution F(x) is a good model for the empirical data, what one expects is that both Prob[Y ≤ y_emp] and Prob[Y > y_emp] are not too close to 1, let us say, below 1 − (1 − r)α and 1 − rα, respectively, at significance level α (with r = 1/2 in the usual symmetric case and α = 0.05 or 0.01). As both probabilities add up to one, the conditions can be written as

rα ≤ Prob[Y ≤ y_emp] ≤ 1 − (1 − r)α,    (9)

or, equivalently, as (1 − r)α ≤ Prob[Y > y_emp] ≤ 1 − rα; i.e., the random variable Y takes not-too-extreme values with probability 1 − α (e.g. 0.95 or 0.99). Note the profound difference between these conditions and the "well-sampledness" criterion [16], Eq (6) here. Note also that, following this "new" criterion, the previous numerical results for the truncated power-law distribution [16] seem to indicate (in contrast to the conclusions there) that all tested values of m_c should be rejected at the 0.05 significance level (as Ref. [16] reports Prob[Y > y_emp] > 0.975), except m_c = 9.2 (the value of Prob[Y > y_emp] for m_c = 9.5 displayed in Fig 3 of Ref. [16] seems to be slightly above 0.975 and should be rejected as well, at least in the symmetric case r = 1/2). For the Tap distribution, the only values of m_c that should not be clearly rejected from the numerical results of Ref. [16] (again in contrast with the conclusions of that reference) are m_c = 9 and m_c = 9.2 (for the rest of the m_c values Ref. [16] reports Prob[Y > y_emp] above 0.975 or below 0.025). But the numerical results of Ref. [16] are not in full correspondence with the ones we obtain here.

Regarding the number of earthquakes required to constrain the corner parameters (M_c or m_c), what is implicit behind Eq (8) is that a "not well-sampled" distribution (with Prob[Y > y_emp] close to zero) is "not well-sampled" just because of "bad luck", that is, the largest earthquake had y_emp much larger than expected from both the model F(x) and the actual value of N. This bad luck is what leads to the rejection of the null hypothesis in usual statistical testing (and corresponds to the significance level, see Eq (9)). But, in the argument of Ref. [16], gathering more data would eventually lead to the accommodation of the theoretical distribution of the maximum to the empirical value y_emp, regardless of the model. Thus, in that assumption y_emp is considered quenched, i.e., it does not grow despite the fact that the number of data increases. This is hard to justify.
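In code, the criterion of Eq (9) amounts to a standard two-sided check on the position of y_emp within the distribution of the maximum; the sketch below uses invented values of F(y_emp), purely to show the mechanics.

def max_cdf(F_yemp, N):
    """Eq (5): cumulative distribution of the maximum of N independent events, at y_emp."""
    return F_yemp ** N

def compatible(F_yemp, N, alpha=0.05, r=0.5):
    """Eq (9): accept the parameter values if r*alpha <= Prob[Y <= y_emp] <= 1 - (1-r)*alpha."""
    p = max_cdf(F_yemp, N)
    return r * alpha <= p <= 1.0 - (1.0 - r) * alpha

# Invented values of F(y_emp), for illustration only:
print(compatible(F_yemp=0.99975, N=7585))    # Prob[Y <= y_emp] ~ 0.15 -> not rejected
print(compatible(F_yemp=0.999999, N=7585))   # Prob[Y <= y_emp] ~ 0.99 -> rejected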

Proper constraining of the corner seismic moment: TPL case
In this section we derive a proper statistical way to evaluate the number N of earthquakes necessary to constrain the estimated value of M_c or m_c for the TPL distribution. Our approach uses the distribution of the estimator of these quantities (M_c and m_c) to calculate their statistical uncertainty as a function of N, and looks for the value of N that reduces the uncertainty down to a desired range. This will necessarily depend on the true values of the parameters, which are unknown, and is also based on the assumption that the sample is representative of the whole population (otherwise, no inference is possible).
For this purpose, let us focus on the truncated power-law model, which has the peculiar property that the random variable Y (the maximum seismic moment of the N earthquakes) constitutes the maximum-likelihood estimator, M̂_c, of the truncation parameter M_c; that is, Y = M̂_c for the TPL (or, equivalently, m̂_c for the magnitude). Then, inverting F_max(y_p) = p, with y_p defining the 100p-th percentile of the distribution of the maximum seismic moment (i.e., the distribution of M̂_c), one can get the probability of any interval for M̂_c. The limiting points of these intervals are, from Eqs (5) and (2),

y_p = a / [1 − p^(1/N) (1 − (a/M_c)^β)]^(1/β).    (10)

Fig 3 shows that the ideal situation happens when the distribution of the maximum-likelihood estimator is very narrow, and then m̂_c ≃ m_c, leading to the automatic recovery of the true value (a value very close to it, but below it, in fact). When N is equal to the empirical value (considering the case previously studied in the literature [16], up to mid 2012), this happens for m_c < 8.5. One could refer to this case as "sampled enough" (in sharp contrast with previous terminology [16]). On the contrary, when the upper limit of the interval, m_(p+0.95), departs clearly from the true value of m_c, we may talk of undersampling (there is no hint of the real maximum m_c after the N observations, again in contrast with previous research [16]). This is the case for m_c > 10.5 (for N = 7,585), for which the intervals do not include the true value of m_c (for instance, for m_c = 12 the interval of the maximum goes from 9 to 11, roughly, see Fig 3). But note that this kind of undersampling would still allow ruling out the values of the parameters of the undersampled distributions, if the empirical value of the maximum were outside the resulting interval (nevertheless, this is not the case for the actual value, see below). In the intermediate case (8.5 < m_c < 10.5 for the period under consideration), the intervals are wide but they reach the true value.
We can use the previous argument to find the value of N that leads to narrow 95%-probability intervals for the estimation of M_c or m_c in the TPL model. Using Eq (10), the width of the magnitude intervals, Δ = m_(p+0.95) − m_p, is obtained as

Δ_tpl = (2/(3β)) log10 { [1 − p^(1/N) (1 − (a/M_c)^β)] / [1 − (p+0.95)^(1/N) (1 − (a/M_c)^β)] }.    (11)

Isolating N as a function of Δ_tpl for given values of M_c and β yields the desired result. Notice that, in contrast to Ref. [16] [Eq (6)], our approach does not need any empirical information (except the value of β). Going back to Fig 2, it also includes the number of events necessary to obtain intervals of a fixed width, after numerical inversion of Eq (11), as a function of M_c. The results are clearly different from the previous ones [16], as shown in the figure. Fig 2 is particularly useful for testing a specific value of m_c. If the real value of m_c were 9.5 (the largest earthquake in the historical record [28], but not contained in the CMT catalog), a 95%-probability interval with width Δ = 0.4 (from 9.1 to 9.5, roughly) would be obtained after about N = 14,000 events (corresponding to 65 years, reached in 2042). If one wants instead a width of Δ = 0.2 (yielding an interval from 9.3 to 9.5), the necessary N is 36,400, to be reached around the year 2147 (assuming that the TPL were the right model, that there is no dependence between the magnitudes, and that the long-term global earthquake rate and β were constant).
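A sketch of this numerical inversion is given below; with the assumed values a = 5.31 × 10^17 N·m, β = 0.67 and a true corner magnitude of 9.5, it should land close to the N ≈ 14,000 and N ≈ 36,400 quoted above, although the exact numbers depend on these assumptions.

import numpy as np
from scipy.optimize import brentq

a = 5.31e17                        # completeness threshold (magnitude 5.75), in N*m

def y_p(p, N, Mc, beta):
    """Eq (10): 100p-th percentile of the maximum of N events, TPL model."""
    return a / (1.0 - p ** (1.0 / N) * (1.0 - (a / Mc) ** beta)) ** (1.0 / beta)

def width(N, Mc, beta, p=0.025):
    """Eq (11): magnitude width of the 95%-probability interval [m_p, m_(p+0.95)]."""
    return (2.0 / 3.0) * np.log10(y_p(p + 0.95, N, Mc, beta) / y_p(p, N, Mc, beta))

def events_needed(delta, Mc, beta, p=0.025):
    """Numerically invert Eq (11): N that gives a 95%-probability interval of width delta."""
    return brentq(lambda N: width(N, Mc, beta, p) - delta, 100.0, 1e9)

Mc = 10 ** (1.5 * 9.5 + 9.1)       # hypothetical true corner magnitude 9.5
beta = 0.67                        # assumed exponent
for delta in (0.4, 0.2):
    N = events_needed(delta, Mc, beta)
    print(f"width {delta}: about {N:,.0f} events (~{N / 213.7:.0f} years)")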
It is important to realize that, in all the cases shown in the figure, the top value of the interval coincides with the real value. Although the probability that the estimated value lies between m_(p+0.95) and m_c is 0.05 − p, the two values are very close, i.e., m_(p+0.95) ≃ m_c; this is due to the extreme sharpness of the density of the observed maximum close to m_c (as in Fig 1, for instance, where the vertical axis is logarithmic). So, the value of N provided in the figure guarantees no undersampling. Note also that a 95%-probability interval is a much stricter requirement than an interval corresponding to one standard deviation.
We have just calculated the number of earthquakes required to estimate M_c with a given uncertainty, for different hypothetical values of the true M_c. This does not make use of the empirical value y_emp obtained in 35.5 years of recording. A different issue, then, is whether y_emp discards possible values of M_c or not. Fig 3 shows (in addition to the intervals of the maximum magnitude obtained from Eq (10)) the empirical value obtained for the period 1977-2012.5. If the observed maximum magnitude (9.1 in the global CMT catalog) is inside the interval, there is no reason to reject the parameters of the model (with a 95% confidence); on the contrary, if the empirical value is outside, we should reject the parameters.
The figure shows how, for the TPL model, no value of m_c ≥ 9.1 can be rejected, i.e., any value of m_c between 9.1 and ∞ is compatible with the empirical result, and therefore the data do not allow us to determine an upper bound for m_c, although values of m_c above 10 are close to rejection (with a 95% confidence; if we decreased the confidence or increased the number of data, an upper bound would appear). Indeed, considering the most recent data at the time of writing, up to the end of 2017 (during which no other earthquake of magnitude larger than 9.1 has taken place), the range of compatible values of m_c turns out to be 9.1-10.8, as reported in Table 1.
As an illustration, we also analyze what a hypothetical y_emp corresponding to a maximum magnitude of 9.1 in a 71-year period (from 1977 to 2047, say) would imply. Table 1 shows that this would constrain m_c to lie between 9.1 and 9.5, for 95%-probability intervals, but if the maximum in the same period were 9.3, the allowed range would be between 9.3 and 10.3. In contrast, a maximum empirical value of 9.5 (or higher) in that period would yield m_c unbounded from above again. Needless to say, we need to wait about 30 years to choose between these three answers.
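A sketch of how such ranges can be obtained for the TPL model is shown below: for each candidate corner magnitude on a grid, the 95%-probability interval of the maximum is computed from Eq (10), and the candidate is kept if the observed maximum falls inside it. The value of β, the grid, and the symmetric choice r = 1/2 are assumptions, so the end points will only approximately match Table 1.

import numpy as np

a, beta = 5.31e17, 0.67            # assumed completeness threshold and exponent

def y_p(p, N, Mc):
    """Eq (10): percentile of the maximum of N events, TPL model."""
    return a / (1.0 - p ** (1.0 / N) * (1.0 - (a / Mc) ** beta)) ** (1.0 / beta)

def compatible_mc(m_obs, N, grid=np.arange(8.0, 13.0, 0.01), p=0.025):
    """Corner magnitudes whose 95% interval of the maximum contains the observed maximum."""
    y_obs = 10 ** (1.5 * m_obs + 9.1)
    keep = []
    for mc in grid:
        Mc = 10 ** (1.5 * mc + 9.1)
        if Mc < y_obs:                                   # the TPL cannot exceed Mc
            continue
        if y_p(p, N, Mc) <= y_obs <= y_p(p + 0.95, N, Mc):
            keep.append(mc)
    return (min(keep), max(keep)) if keep else None

# Maximum 9.1 in 1977-2012.5 (N = 7,585): the upper end is limited only by the finite
# grid, i.e., m_c is in practice unbounded from above, as stated in the text.
print(compatible_mc(9.1, N=7585))
# Hypothetical maximum of 9.3 over 71 years (~213.7 events/year), in the spirit of Table 1:
print(compatible_mc(9.3, N=round(71 * 213.7)))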

Proper constraining of the corner seismic moment: Tap and TrG cases
Note that, although the maximum empirical value of the seismic moment is the maximum-likelihood estimator of M_c only for the TPL distribution (out of the three considered models), we can still use the previous procedure to constrain the value of M_c for any distribution, with the resulting values of M_c not related to maximum-likelihood estimation, in general.
Thus, for the Tap distribution, the percentiles of the maximum seismic moment turn out to be, using Eqs (3) and (5),

y_p = β M_c W( (a/(β M_c)) e^(a/(β M_c)) (1 − p^(1/N))^(−1/β) ),

with W the Lambert W function [29], fulfilling z = W(z e^z). And for the truncated gamma we get, using Eq (4),

y_p = M_c Γ_2^(−1)( −β, (1 − p^(1/N)) Γ(−β, a/M_c) ),

with Γ_2^(−1) the inverse, with respect to its second argument, of the incomplete gamma function. In the same way as for the TPL, the empirical value y_emp leads to an unbounded range of the values of m_c compatible with y_emp for the original value of N (7,585). These ranges go from 8.65 to ∞ for the Tap distribution and from 8.8 to ∞ for the TrG, with β = 0.67. However, when one extends the analysis up to 2017, the ranges become bounded, although large, see Table 1.
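The tapered-model percentile written above can be checked with an off-the-shelf Lambert W implementation; the following sketch (illustrative parameter values, principal branch of W assumed) compares it with a brute-force inversion of [F_tap(y)]^N = p.

import numpy as np
from scipy.special import lambertw
from scipy.optimize import brentq

a, beta = 5.31e17, 0.67            # assumed values

def F_tap(x, Mc):
    """CDF of the tapered model, Eq (3)."""
    return 1.0 - (a / x) ** beta * np.exp((a - x) / Mc)

def y_p_tap(p, N, Mc):
    """Percentile of the maximum for the tapered model, via the Lambert W function."""
    q = 1.0 - p ** (1.0 / N)
    arg = (a / (beta * Mc)) * np.exp(a / (beta * Mc)) * q ** (-1.0 / beta)
    return beta * Mc * lambertw(arg).real

# Cross-check against direct numerical inversion of [F_tap(y)]^N = p:
Mc, N, p = 10 ** (1.5 * 9.0 + 9.1), 7585, 0.975
y_lambert = y_p_tap(p, N, Mc)
y_numeric = brentq(lambda y: F_tap(y, Mc) ** N - p, a, 1e25)
print(f"{y_lambert:.4e}  {y_numeric:.4e}")   # the two values should agree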
Table 1 also explores the values of these ranges in the future, depending on the hypothetical value of the maximum magnitude observed. We see that, in general, the ranges provided by the Tap distribution are somewhat wider than those provided by the TPL, whereas the TrG yields rather larger ranges. This means that the number of data necessary to constrain the value of m_c is larger for the TrG than for the other two distributions. The table also allows us to rule out the scenario that there will be no earthquake larger than magnitude 9.1 before 2097 for a TPL distribution, as this scenario would render implausible events larger than magnitude 9.3, contrary to what was observed in the magnitude-9.5 1960 Chile event (although the CMT catalog would probably underestimate the seismic moment of such an event [27]).

Discussion
Before concluding, we briefly explore the implications of our results for the assessment of seismic hazard. Considering as an illustration the case of the tapered model, we have seen (Table 1) how the CMT data, up to 2017, are compatible with a range of values of the corner magnitude, from m_cmin = 8.6 to m_cmax = 10.2 (with a 95% confidence). Therefore, the resulting seismic-moment distribution (or, in the same way, the resulting magnitude distribution) will be a mixture (or combination) of the different S_tap(x|M_c) (now we use the complementary cumulative distribution function and make explicit in the notation the dependence on the corner seismic moment M_c), with M_c ranging from M_cmin to M_cmax. Thus,

S_mix(x) = ∫_(M_cmin)^(M_cmax) S_tap(x|M_c) ρ(M_c) dM_c,    (12)

where the resulting distribution S_mix(x) is no longer a Tap distribution but a mixture of Tap's with different M_c. The term ρ(M_c) gives weight to the different values of M_c. The same equation holds for any other probabilistic model (such as the TPL and the TrG). One could assume a uniform distribution of corner magnitudes (all values equally likely from m_cmin to m_cmax). Interestingly, for the corner seismic-moment distribution, this leads to the Jeffreys prior of a scale parameter, ρ(M_c) ∝ 1/M_c. Under this choice, the integral in Eq (12) can be easily evaluated by the Monte Carlo method. For the Tap model, the probability of an earthquake of magnitude 9.1 or larger (among all earthquakes with magnitude larger than 5.75) turns out to be S_mix(x) = 2.6 × 10^(−4), corresponding to about 1 such event in 20 years. In comparison with the CMT catalog itself (1 such event in 35.5 years), this probability seems somewhat large. Even higher values of the magnitude or other models (TPL or TrG) also seem to lead to an overestimation of these probabilities. Naturally, this is the core problem in the statistics of extreme events: one has very few extreme events against which to contrast the estimations. As the result is highly sensitive to the choice of the distribution ρ(M_c), this is a topic that deserves further study.
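A minimal Monte Carlo sketch of Eq (12) for the tapered model is given below; β = 0.67 and the rate of 213.7 events per year are assumptions taken from the rest of the text, and with them the result should come out close to the 2.6 × 10^(−4) quoted above.

import numpy as np

rng = np.random.default_rng(1)
a, beta = 5.31e17, 0.67            # assumed completeness threshold and exponent
R_a = 213.7                        # events with magnitude >= 5.75 per year

def S_tap(x, Mc):
    """Complementary CDF of the tapered model, S_tap(x | Mc)."""
    return (a / x) ** beta * np.exp((a - x) / Mc)

# Jeffreys prior rho(Mc) ~ 1/Mc on [Mc_min, Mc_max]: Mc is log-uniform, i.e.,
# the corner magnitude is uniform between m_cmin and m_cmax.
m_cmin, m_cmax = 8.6, 10.2
mc = rng.uniform(m_cmin, m_cmax, 100_000)
Mc = 10 ** (1.5 * mc + 9.1)

x = 10 ** (1.5 * 9.1 + 9.1)        # seismic moment of a magnitude-9.1 event
S_mix = S_tap(x, Mc).mean()        # Monte Carlo estimate of Eq (12)
print(f"S_mix = {S_mix:.2e}, i.e. one event of magnitude >= 9.1 every "
      f"{1.0 / (R_a * S_mix):.0f} years or so")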
Our results can also have applications for time-dependent hazard [18]. If we know when the last earthquake of a given seismic moment x or higher happened (a time t ago), we can obtain the probability of recurrence in a given time period Δ from the present as

P_(x,t,Δ) = [S_w(t|x) − S_w(t+Δ|x)] / S_w(t|x),    (13)

where the subindex w denotes that the distribution refers to the waiting time (not to the seismic moment). For a Poisson process, S_w is exponential with rate λ_x, and then we recover P_(x,t,Δ) = 1 − e^(−λ_x Δ) ≃ λ_x Δ = R_a S(x) Δ, which turns out to be independent of t and becomes essentially the same formula used above for time-independent hazard, with R_a = 213.7 year^(−1) (we have assumed Δ ≪ λ_x^(−1)). In order to obtain genuinely time-dependent hazard one needs to go beyond Poisson occurrence. At a global scale it has been pointed out that the gamma distribution can describe well earthquake waiting times [30,31]; nevertheless, for the sake of simplicity, we are going to illustrate the calculation with the Weibull distribution, which can give similar fits [32]. In this way, using the Weibull survival function S_w(t|x) = exp[−(t/c_x)^γ] in Eq (13), we can write

P_(x,t,Δ) = 1 − exp[ (t/c_x)^γ − ((t+Δ)/c_x)^γ ],

with γ and c_x the shape and scale parameters of the Weibull distribution, respectively (the latter depending on x). The Poisson case is included in the particular limit γ = 1.
The scale parameter of the waiting-time distribution can be directly related to the seismic-moment distribution: on the one hand, the number of events per unit time (with seismic moment above x) is R_a S(x); on the other hand, this number is also given by 1/⟨t(x)⟩, where ⟨t(x)⟩ is the mean waiting time for events above x. In the particular case of the Weibull distribution, this is given by ⟨t(x)⟩ = c_x g(γ), with g(γ) = Γ(1 + γ^(−1)). Thus,

c_x = 1 / [ g(γ) R_a S(x) ],

which, substituted into Eq (13), allows the calculation of the probability P_(x,t,Δ). In the case Δ ≪ t this can be simplified to

P_(x,t,Δ) ≃ 1 − exp[ −γ Δ (g(γ) R_a S(x))^γ t^(γ−1) ].

In the context of this article, the seismic-moment distribution S(x) could be substituted by the mixture over different values of M_c given by Eq (12). Nevertheless, the calculation of these probabilities needs the accurate fitting of the waiting-time distributions S_w(t|x) (i.e., the fitting of γ and c_x in the case of the Weibull distribution). This is left for future work.
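The whole chain, from S(x) to the Weibull recurrence probability of Eq (13), fits in a few lines; the sketch below uses assumed values of S(x) and of the shape parameter γ (which, as stated above, would have to be fitted), with γ = 1 recovering the Poisson benchmark.

import numpy as np
from math import gamma as Gamma

R_a = 213.7                        # events with magnitude >= 5.75 per year

def recurrence_prob(S_x, t, Delta, shape=1.0):
    """Eq (13) with Weibull waiting times: probability of an event above x within
    Delta years, given that the last one happened t years ago (shape = 1 is Poisson)."""
    g = Gamma(1.0 + 1.0 / shape)
    c_x = 1.0 / (g * R_a * S_x)                  # scale parameter from <t(x)> = 1/(R_a S(x))
    S_w = lambda u: np.exp(-(u / c_x) ** shape)  # Weibull survival function
    return (S_w(t) - S_w(t + Delta)) / S_w(t)

# Assumed, purely illustrative inputs (S_x and the shape are not fitted values):
S_x = 2.6e-4                       # e.g. the mixture probability of magnitude >= 9.1 above
print(recurrence_prob(S_x, t=7.0, Delta=1.0, shape=1.0))   # Poisson benchmark
print(recurrence_prob(S_x, t=7.0, Delta=1.0, shape=0.7))   # clustered case, shape < 1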

Conclusions
Summarizing the main results of the article, we have reconsidered to what extent the available earthquake record can constrain the parameter that characterizes the tail of the global seismic-moment distribution: a corner seismic moment (M_c, or its corresponding moment magnitude m_c), for three different distributions (truncated power law, tapered GR, and truncated gamma). We have corrected some of the drawbacks of previous literature regarding the number of events necessary for such a purpose.
The key point in our approach is to obtain the percentiles of the distribution of the maximum seismic moment of N earthquakes, and to derive from there probability intervals that can be compared with the maximum seismic moment observed, y_emp. If y_emp is inside the interval, there is no reason to reject the considered value of the corner parameter. Although currently (up to the end of 2017) the range of values of m_c is rather wide, 80 years from now these ranges are expected to decrease substantially, although depending crucially on the maximum value to be observed. For instance, if this were 9.3, the tapered model would lead to m_c ≃ 9.1 ± 0.3 (95% confidence), and the truncated gamma model to 9.35 ± 0.45 (see Table 1 for more hypothetical examples). From here we conclude that the much larger periods of time estimated earlier are not justified. In addition, for the same reasons elaborated in this article, the standard errors of corner parameters that we [20] calculated previously for almost 37 years of shallow global seismicity using asymptotic likelihood theory do not provide a convenient description of the range of uncertainty in those parameters.
Supporting information

S1 Fig. ccdf S(x) and pdf f(x)