## Abstract

Starting from an extensive database, pooling 9 years of data from the top three insurance brokers in Italy, and containing 38125 reported claims due to alleged cases of medical malpractice, we use an inhomogeneous Poisson process to model the number of medical malpractice claims in Italy. The intensity of the process is allowed to vary over time, and it depends on a set of covariates, like the size of the hospital, the medical department and the complexity of the medical operations performed. We choose the combination medical department by hospital as the unit of analysis. Together with the number of claims, we also model the associated amounts paid by insurance companies, using a two-stage regression model. In particular, we use logistic regression for the probability that a claim is closed with a zero payment, whereas, conditionally on the fact that an amount is strictly positive, we make use of lognormal regression to model it as a function of several covariates. The model produces estimates and forecasts that are relevant to both insurance companies and hospitals, for quality assurance, service improvement and cost reduction.

**Citation:** Bonetti M, Cirillo P, Musile Tanzi P, Trinchero E (2016) An Analysis of the Number of Medical Malpractice Claims and Their Amounts. PLoS ONE 11(4): e0153362. https://doi.org/10.1371/journal.pone.0153362

**Editor:** Chiara Lazzeri, Azienda Ospedaliero-Universitaria Careggi, ITALY

**Received:** November 15, 2015; **Accepted:** March 29, 2016; **Published:** April 14, 2016

**Copyright:** © 2016 Bonetti et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

**Data Availability:** All relevant data are within the paper and its Supporting Information files.

**Funding:** PC acknowledges the support of his Marie Curie CIG PCIG13-GA-2013-618794 under the Seventh Framework Programme (http://ec.europa.eu/research/mariecurieactions/funded-projects/how-to-manage/cig/index_en.htm). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Competing interests:** The authors have declared that no competing interests exist.

## Introduction

The subject of clinical risk management and patient safety is one of the main critical points in the supply of health services. Managing disputes or litigation—and the resulting impact on health care expenditure—is a priority both at the institutional and at the organizational level (see, e.g., [1, 2] and [3]).

Over the last few years, population growth and aging, rising expectations about levels of health, and increasingly easy access to information have changed patients’ demands on health services and increased the number of medical malpractice claims. In this paper, we focus our attention on the Italian case, where the growing financial restrictions placed on the Italian National Health Service and the more pressing need for insurance companies to cover specific risks in the health sector are leading to changes in basic risk management practices and to the development of new local strategies. These trends, however, do not always appear to be based on a solid decision-making process and seem, in a few cases, to be driven by short-term considerations [4].

The Italian National Health Service, after several reforms, combines common central guidelines and decentralization of health policy responsibilities to the intermediate level of government. As well stated in [5], “the Central government has exclusive power to set system-wide rules and the health services that must be guaranteed throughout the country. Regions have responsibility for the organization and administration of publicly financed healthcare. Italian Regions differ widely in terms of demography, economic development (and fiscal capacity), health care infrastructures and health expenditures. (…) In the health sector Regions developed different organizational and funding models and now there are many relatively different regional health systems.”

Medical malpractice involves patient damage, injury or death attributed to negligent behavior by a medical practitioner or other health care professional [6]. Patients (or their families) who believe they have been victims of medical malpractice often file claims against health care providers. This possibility has a potentially strong impact in terms of costs and reimbursements, and it leads doctors, other health care professionals and health care organizations to underwrite liability insurance policies in order to offset their risks.

Modeling claims due to alleged medical malpractice thus becomes very important from a legal, regulatory, and insurance point of view. A better understanding of such a phenomenon can have positive effects for hospitals and clinics in terms of quality assurance, service improvement, and cost reduction. At the same time, such understanding is essential for insurance companies to be able to reliably price their policies, in order to implement a more efficient risk management approach to losses, as required by new international regulations like Solvency II (see for example the discussion in [7]).

Notwithstanding the importance of the topic, the related statistical and actuarial literature is not extensive, as most contributions deal with the legal aspects and the impact on the medical profession (see [2], and references therein). This is probably due to the lack of publicly available data, as well as to the novelty of the phenomenon in many countries like Italy—the source of our data [8].

Some specific modeling contributions are discussed in [9–13], and [14]. In particular, the modeling approaches of [10] and [11] on US data have been a source of inspiration for some of the methods that we implement below.

Here we describe what is, to the best of our knowledge, the first published large analysis of the medical malpractice phenomenon in Italy, involving statistical models both for the number of claims and for their associated monetary amounts.

The main findings of our analyses, whose details are given in the rest of the paper, can be summarized as follows:

- The inhomogeneous Poisson process is able to model the number of medical malpractice claims accurately. Its predicting power has been successfully back-tested.
- In Italy, the yearly number of claims due to alleged medical malpractice has increased (linearly) over time in recent years. This is true for all the types of claims we have analyzed: injury, injury at birth, death, monetary damage and other. The regions of Toscana, Liguria and Lazio show the highest growth. Lombardia is the only region experiencing no particular trend in the number of claims.
- The number of claims (for all possible types of causes, apart from monetary damage) is positively and significantly dependent on both the size of the hospitals and the complexity of the medical operations, as represented by the Case Mix Index (CMI) of the health care organization.
- Importantly, a clear relationship between the number of claims and the type of medical departments involved in the analysis does not emerge.
- Regarding the monetary amounts (corrected for inflation) that insurance companies have to pay in case of a successful claim, we observe an increase for claims related to the death of the patient, a stationary behavior for claims due to injuries at birth and monetary damages, and a slight decrease for non-birth injuries.
- Differently from what we obtain for the number of claims, the type of medical department does have a significant effect on the monetary amounts. For example, Orthopedics and Obstetrics generate, on average, higher disbursement costs for hospitals and insurance companies.

In Section “The Data”, we describe the Italian medical malpractice claims data set that we have used for the analysis. In “Methods”, we summarize the statistical methodology that we have implemented to model the numbers of claims and the associated payout amounts. In “Results” we discuss the main findings, including point estimates, back-testing results, and forecasts. To avoid dozens of tables, we do not include all the estimates and forecasts produced as part of the research, but they are available from the authors upon request. We close in the “Discussion” section with some summarizing comments and possible extensions of our work.

Three Appendices contain the statistical details and the complete descriptions of the models we have fitted.

## The Data

In this section we describe the data that we have used in our analyses and provide some basic descriptive information. The results of the in-depth analyses will be presented in the “Results” section.

As far as we know, the data set that we have used to study the problem of alleged medical malpractice represents the largest Italian data set of this type in the scientific literature. It has been obtained by pooling the data of three of the major international insurance brokers in Italy: AON, Marsh, and Willis Italy.

The observation window ranges from January 1st 2004 to December 31st 2012.

The data set contains a total of 38125 reported claims due to alleged cases of medical malpractice. These observations arise from 15 Italian regions (out of a total of 20). From North to South: Valle D’Aosta, Veneto, Lombardia, Trentino-Alto Adige and Friuli-Venezia Giulia, Emilia-Romagna, Liguria, Toscana, Marche, Umbria, Lazio, Campania, Calabria, Puglia and Sicilia. Trentino-Alto Adige and Friuli-Venezia Giulia are two independent regions, but they are pooled together using the common classification *Nordest* (Northeast). The Italian regions that are not represented in our sample are: Piemonte, Abruzzo, Molise, Basilicata and Sardegna.

It is important to stress that regions, in Italy, refer only to a historical administrative partitioning of the territory, and, in this study, they were *not* constructed on the basis of the presence of any health care disparity.

The data set roughly contains 52% of all the hospitalizations in public hospitals in the available regions, with respect to 2012 data, that is 3,152,611 out of a total of 6,087,039. The best covered regions are Nordest and Lombardia, with a coverage of 100% and 83%, while the worst covered ones are Marche and Veneto, with 8% and 18% [15].

Regarding the representativeness of the sample, it is important to stress that the data have not been sampled randomly. Our observations only come from those hospitals that have underwritten an insurance contract with one of the three brokers, which introduces a selection bias.

For each claim the following information is available: Region, Hospital Code, Medical Department, Date of the Reporting of the Claim, Alleged Cause of the Claim. The claims can be due to injury (INJ), death (DEA), injury at birth (BIR), monetary damage to people and things (DAM) like a theft or a broken mobile, or to other causes (OTH). The need to disaggregate injuries at birth from the other injuries is due to the tremendous impact this type of events has, both from a personal and an insurance point of view. This disaggregation was suggested in one of the many discussions we had with practitioners and insurance brokers, when cleaning the data.

In addition, for each hospital, the total number of hospitalizations in 2012 is known, as well as the Case Mix Index (CMI). The CMI represents the complexity of a hospital’s patient mix (see [9, 16], and [17]). As such, we have used it as a measure of the average complexity of the procedures performed within each hospital.

We have classified the medical departments to which claims refer as follows: Anesthesia (AN), Surgery (SU, all specializations apart from orthopedic surgery and emergency surgery), General Medicine (ME), Orthopedics (OR), Obstetrics and Gynecology (GY), Not Classifiable (NC), Health Support Services (HS, i.e. histology, laboratory, etc.), Emergency (ED), Other departments (OT), and Missing Information (NA). The NC category refers to the whole hospital: claims for the “NC department” are those claims that cannot be associated with any specific department within the given hospital/clinic. An example would be “falling from the stairs while hospitalized.” Note that this is not the same as “Other departments,” which indicates a separate group of known departments for which only a small number of claims was recorded, thus suggesting the need for aggregation so as not to lose statistical significance. NC is also different from NA: while the former refers to the whole hospital, the latter simply indicates missing information (it could be surgery or anything else, but we do not know).

For many of the claims, the status of the claim (open or closed) is also known, as detailed in Subsection “Amounts” below. For closed claims, the payoff amount, i.e. the payment settled by (or imposed on) the insurance company for that claim, is also available.

### Number of claims

Table 1 contains the number of claims by department and alleged cause of claim. From this table one can extract some interesting information. For example, it appears that most claims related to monetary damages are connected to the whole hospital (Not Classifiable, NC in our acronyms), where a mobile phone can be easily lost or stolen in the common areas, while injuries seem to be very often linked to surgery and orthopedics departments, probably because of the more invasive treatments.

Anesthesia departments generate the smallest number of claims in the data set, probably because anesthesia is always coupled with some type of surgery, and the activities of this type of department are more visible to the patients. As expected, injuries or deaths at birth only concern the departments of obstetrics and gynecology.

Fig 1 shows the yearly number of reported claims, for all types of alleged causes, in the period 2004–2012. An overall increase in the number of reported claims is observed during the period 2004–2011, while we notice a drop in the number of reported claims in 2012. As a matter of fact, at least one of the insurance companies was still collecting and organizing the data for the last months of 2012, so those observations are not in our data set. To avoid the consequences of this recording delay, we have decided to restrict our attention to the 2004–2011 time window.

Fig 1. Yearly number of claims for all regions pooled together (2004–2012).

### Amounts

For the analysis of the payoff amounts, for which the recording delay is not as relevant, we have used a selection of the 38125 observations in the 2004–2012 time window, split among the different alleged causes of claim as shown in Table 2.

In particular, claims had status equal to Open (16971), Closed (14058), Without Further Action (WFA, 4574), or Unknown (2522). We have analyzed the amounts associated with claims having WFA status (4574), or Closed status and a non-missing amount (11285 out of 14058). We have set any missing amounts associated with WFA claims equal to zero. Claims with zero monetary amount but with Open status were removed from the analysis, since these are not true zeros: the claims were still open. All 2522 claims with Unknown status also had a missing amount, and were removed as well. All in all, a total of 15859 claims with monetary amount was therefore available for the analysis, as shown in Table 2. Table 3 shows the distribution of the claims used for the analysis of the amounts, by region and by type of department.
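The selection rules just described can be sketched in a few lines of Python. This is only an illustration: the function name `amount_for_analysis` and the encoding of a missing amount as `None` are assumptions, not part of the original analysis.

```python
def amount_for_analysis(status, amount):
    """Return the amount to analyze, or None if the claim is excluded.

    `status` is one of "Open", "Closed", "WFA", "Unknown";
    `amount` is a float, or None when the amount is missing (assumed encoding).
    """
    if status == "WFA":
        # Missing amounts of WFA claims are set to zero.
        return 0.0 if amount is None else amount
    if status == "Closed" and amount is not None:
        return amount
    # Open and Unknown claims, and Closed claims without an amount, are dropped.
    return None

# Counts reported in the text.
closed_with_amount = 11285
wfa = 4574
claims_analyzed = closed_with_amount + wfa  # matches the 15859 claims in Table 2
```
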

All amounts have been adjusted for inflation using the Consumer Price Index (CPI) compiled by the Italian Institute of Statistics [18]. All amounts were converted into Jan 31, 2012 euro levels by using a yearly (geometric) average CPI inflation rate of 2.15%.
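As a rough sketch of this adjustment, one may compound each amount forward at the yearly rate. The function name `to_jan2012_euros`, the compounding over fractional years, and the 365.25-day year are assumptions made for illustration; the text only states the rate and the reference date.

```python
from datetime import date

YEARLY_RATE = 0.0215               # yearly (geometric) average rate from the text
REFERENCE_DATE = date(2012, 1, 31)

def to_jan2012_euros(amount, payment_date, rate=YEARLY_RATE):
    """Express `amount` (paid on `payment_date`) in Jan 31, 2012 euros.

    Assumption: geometric compounding over fractional years (365.25 days/year).
    """
    years = (REFERENCE_DATE - payment_date).days / 365.25
    return amount * (1.0 + rate) ** years

# A hypothetical payment of 10,000 euros made four years before the reference date.
adjusted = to_jan2012_euros(10_000.0, date(2008, 1, 31))
```
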

The median payment was equal to 984 euros, the average to 26,220 euros, and the observed maximum to 5,387,470 euros. Table 4 shows the maximum monetary amounts observed within each combination of department by type of claim.

A preliminary analysis of the claim amounts, all together and by type of claim, suggested a marginal lognormal model for the non-zero payments. As an example, Fig 2 shows the histogram of log-transformed non-zero payments for claims related to injuries. In Appendix 1 we describe additional analyses that further support the use of the lognormal distribution in our analyses of the amounts.

Fig 2. Histogram of log-transformed non-zero payments related to injuries.

For the open claims, information about the amounts reserved by the insurance companies was sometimes also available, and we did indeed repeat all the analyses using that information as well. For brevity, here we do not include those additional analyses, but they are available upon request.

## Methods

In this section we summarize the modeling approach that we have followed. The technical details of this approach are described in Appendix 2.

### Modeling the number of claims

For modeling the numbers of claims we have used an inhomogeneous Poisson process, choosing the combination medical department by hospital as the unit of analysis. This means that all claims are gathered according to such combinations; in other words, each medical department by hospital unit is treated as a separate generator of claims. This is different from what Cooil [10] and Gibbons et al. [11] did in their works, where the unit of analysis was the single physician.

For each unit of analysis *i*, *i* = 1, …, *m*, with *m* the number of units, we modeled the number of claims by an inhomogeneous Poisson process whose time-varying intensity function is linearly dependent on a set of covariates (including time itself).

In the analysis we used the following covariates:

- *x*_{i,1}: the CMI of the hospital that the unit of analysis *i* belongs to. This is used as a measure of the complexity of the medical services offered by the hospital;
- *x*_{i,2}: the total number of hospitalizations (HOS) in 2012 for the hospital the unit of analysis *i* belongs to. This quantity represents a proxy for size. Given the lack of more precise information, we have assumed that the size of each hospital in 2012 could also describe its size for the previous years (according to the health care experts we consulted, this appears to be a reasonable assumption on a short time scale);
- *x*_{i,j}: for *j* = 3, …, 10, a set of 8 dichotomous variables used to identify the different types of medical departments:
    - *x*_{i,3}: Department of Anesthesia, AN
    - *x*_{i,4}: Dept. of Surgery (all specializations except orthopedic surgery and emergency surgery), SU
    - *x*_{i,5}: Dept. of General Medicine, ME
    - *x*_{i,6}: Dept. of Orthopedics, OR
    - *x*_{i,7}: Dept. of Obstetrics and Gynecology, GY
    - *x*_{i,8}: Not Classifiable, NC
    - *x*_{i,9}: Health Support Services (e.g. labs), HS
    - *x*_{i,10}: Emergency Department, ED.

The residual departments, OT and NA, are absorbed into the intercept *x*_{i,0} (INT), in order to avoid collinearity. For the number of claims, we are indeed interested in identifying departmental effects for major departments only.
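The coding of the covariates can be illustrated with a minimal sketch. The function name `covariate_vector` and the numerical values are hypothetical; the only facts taken from the text are the order of the covariates and the use of OT and NA as the baseline group captured by the intercept.

```python
# Department codes with their own dummy variable, in the order x_{i,3}, ..., x_{i,10}.
DUMMY_DEPARTMENTS = ["AN", "SU", "ME", "OR", "GY", "NC", "HS", "ED"]
# Departments absorbed into the intercept (baseline group).
BASELINE_DEPARTMENTS = {"OT", "NA"}

def covariate_vector(cmi, hospitalizations, department):
    """Return [x_0, x_1, ..., x_10] for one department-by-hospital unit."""
    if department not in DUMMY_DEPARTMENTS and department not in BASELINE_DEPARTMENTS:
        raise ValueError(f"unknown department code: {department}")
    dummies = [1.0 if department == d else 0.0 for d in DUMMY_DEPARTMENTS]
    # x_0 is the intercept, x_1 the CMI, x_2 the 2012 hospitalizations.
    return [1.0, float(cmi), float(hospitalizations)] + dummies

# Hypothetical units: an orthopedics department and a baseline (OT) department.
x_orthopedics = covariate_vector(cmi=1.1, hospitalizations=25_000, department="OR")
x_other = covariate_vector(cmi=1.1, hospitalizations=25_000, department="OT")
```

For a baseline unit all eight dummies are zero, so its departmental effect is carried by the intercept alone.
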

For each reported claim, the date of the event and the date of reporting are available. After consultation with the brokers who provided the data we have decided to work with the latter only, i.e. with the time when a claim first appears in the database (clearly the date of an event appears in the database only after reporting has occurred). This decision was based on the fact that the reporting date is what really matters for insurance-related considerations. One should in fact expect some delay in the reporting of claims. Comparing the date of the reported claim with the date of the event generating it, we found that the overall average delay is equal to 1.69 years. Claims due to monetary damages are typically reported 72 days after the event, while claims due to injuries at birth are reported on average after 742 days.

Consistent with this choice, no adjustment has been performed for departments that were added to or removed from the set of claim-generating units over the years. As a consequence, such changes are reflected in the brokers’ databases as changes in the intensity of the reporting process.

The model was initially estimated at the national level. However, the Italian National Health Service allows the different regions to have diverse regimes of health governance, provided that a minimum level of service quality is guaranteed. As a consequence it seems more reasonable to estimate the model separately for each of the available regions, rather than just using one single model with intercept modifiers for the distinct regions.

As mentioned above, claims were grouped into five macro-sets (types of claims): claims due to injuries (injuries, INJ), claims due to injuries at birth (Birth, BIR), claims due to death (Death, DEA), claims due to monetary damages to people and things (Damages, DAM), and claims falling into other categories (Other, OTH).

We have estimated a total of 23 models: out of the 75 possible combinations of region by type of claim, one for each combination for which a sufficient number of observations was available (including the case pooling together all observations, without regional differences, which we call “ALL”).

The estimation of the models was performed using maximum likelihood. It is worth pointing out that the model for the claims due to injuries and deaths at birth is different, since these claims can only arise from departments of obstetrics and gynecology.

### Modeling the amounts

#### The two-component model.

The inflation-adjusted liquidation cost/payment (*C*) has been modeled separately for the zero and the non-zero amounts, using a two-step regression approach:

- A logistic regression model for the probability that a claim is closed with a zero payment.
- Conditionally on an amount being strictly positive, a lognormal regression model for the amount *C*.

Both regression models have been developed to assess the statistical significance of the different regions, of the medical departments and of time (allowing for a possible quadratic effect of time on the two outcomes as well). It is worth underlining that the two (distinct) model selection processes will in general produce *different* sets of significant covariates. As a consequence, some care must be used to properly keep track of this fact in the later production of forecasts for the costs. In Appendix 2.2 we provide more details, and we also explain how to obtain the prediction intervals for the conditional expected values of (positive) costs, as well as for the *overall* mean costs.
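To make the way the two components combine concrete, the following sketch computes the overall mean cost as the probability of a positive payment times the lognormal conditional mean, exp(mu + sigma^2 / 2). The function names and the numerical values of the linear predictors are hypothetical; the actual fitted models are those described in the Appendices.

```python
import math

def prob_zero_payment(linear_predictor):
    """Logistic model: probability that a claim closes with a zero payment."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def mean_positive_cost(log_mean, log_var):
    """Lognormal model: E[C | C > 0] = exp(mu + sigma^2 / 2)."""
    return math.exp(log_mean + 0.5 * log_var)

def overall_mean_cost(linear_predictor, log_mean, log_var):
    """E[C] = P(C > 0) * E[C | C > 0]."""
    p_positive = 1.0 - prob_zero_payment(linear_predictor)
    return p_positive * mean_positive_cost(log_mean, log_var)

# Hypothetical values, for illustration only.
eta = -0.5              # logistic linear predictor for P(C = 0)
mu, sigma2 = 9.0, 2.5   # mean and variance of log(C) given C > 0
expected_cost = overall_mean_cost(eta, mu, sigma2)
```

Because the two linear predictors may use different covariates, each must be evaluated with its own selected set before the two pieces are multiplied.
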

#### Expected costs and tail amounts.

Under the assumption that the expected values of the costs do not depend on their number, the expected value of the overall amount for a given time interval can be estimated as the product of the expected number of events and the expected amount for each event. Hence such average total amount can be easily computed from the models for the number of claims and the associated amounts (more details in Appendix 2.2).
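A minimal numerical illustration of this product rule follows. The expected number of claims is a made-up figure, while the mean payment is the observed average quoted in the data description; the function name is an assumption.

```python
def expected_total_amount(expected_claims, expected_cost_per_claim):
    """E[total] = E[N] * E[C], valid when the mean cost does not depend on N."""
    return expected_claims * expected_cost_per_claim

# Hypothetical expected number of claims; 26,220 euros is the observed
# average payment reported in the data description.
total = expected_total_amount(expected_claims=150.0, expected_cost_per_claim=26_220.0)
```
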

One could also study the distribution of the total (regional or national) amounts, and in particular the quantiles of such distributions (the well-known Value at Risk—*VaR*—approach in risk management). This study would require an extensive simulation study from the joint distribution of the number of events and their amounts, and for completeness it should also take into account the sampling variability of the estimated parameters of the models. Such an approach would however still produce strongly model-dependent total amounts. As a matter of fact, the goodness of fit of the models, for the largest total amounts, would probably be very hard to assess, and the exercise could lead to a dangerous over-interpretation of the evidence contained in the data.

On the other hand, some information on such high-amount claims is indeed desirable. In Appendix 2.3 we describe how one can study the probabilities that some of the predicted total numbers of events lie in the extreme tail of the amount distribution, and we provide details on how to estimate their average value, the “Expected Shortfall”.
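For readers unfamiliar with these two risk measures, the sketch below shows how VaR and Expected Shortfall are read off a sample of simulated total amounts. It is not the procedure of Appendix 2.3: it uses a plain Poisson count, ignores parameter uncertainty, and all parameter values and function names are hypothetical (only the share of zero payments, about 38%, echoes the injury data).

```python
import math
import random

def sample_poisson(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_totals(rng, n_sims, claim_mean, p_zero, log_mean, log_sd):
    """Simulate total amounts: Poisson claim counts, two-part claim costs."""
    totals = []
    for _ in range(n_sims):
        total = 0.0
        for _ in range(sample_poisson(rng, claim_mean)):
            if rng.random() >= p_zero:  # claim closes with a positive payment
                total += rng.lognormvariate(log_mean, log_sd)
        totals.append(total)
    return totals

def var_and_expected_shortfall(totals, level=0.95):
    """Empirical VaR (order statistic at `level`) and mean of totals at or above it."""
    ordered = sorted(totals)
    index = min(int(math.ceil(level * len(ordered))) - 1, len(ordered) - 1)
    var = ordered[index]
    tail = [t for t in ordered if t >= var]
    return var, sum(tail) / len(tail)

rng = random.Random(12345)  # fixed seed so that the run is reproducible
totals = simulate_totals(rng, n_sims=5000, claim_mean=20.0,
                         p_zero=0.38, log_mean=9.0, log_sd=1.5)
var95, es95 = var_and_expected_shortfall(totals, level=0.95)
```

By construction the Expected Shortfall is never smaller than the VaR at the same level, which is why it is the more conservative summary of tail risk.
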

## Results

In this section we describe the main results of the analyses. The remaining results are available upon request.

### Number of claims

Starting from the original 38125 claims, restricting our attention to the period 2004–2011 and imposing the condition of having a value for all covariates of interest, we have analyzed 36981 observations.

Claims have been grouped into five macro-classes: INJ, DEA, DAM, OTH and BIR. We will discuss the first four in the next paragraph, and the BIR data in Subsection “Number of claims for injuries at birth”. This separation is due to our modeling choices, as explained in Appendix 2.1.

Remember that, in what follows, when using the dummy variables for the different departments, the intercept contains both OT and NA (defining the residual OT/NA group).

#### Number of non-birth-related claims.

Within each class we have estimated models for all the data pooled together (i.e. without regional distinctions), and models for each region for which a sufficient number of observations was available. The model parameters were estimated on all 2004–2011 data. Model selection has then been performed using Akaike’s information criterion (AIC), as is common in these cases [10]. Tables 5, 6, 7, 8 and 9 contain some examples of the results for injuries (all regions, Lombardia and Toscana), deaths (Lombardia), and monetary damages (Liguria).
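As a reminder of how AIC-based selection works, the sketch below compares two candidate sets of fitted Poisson means using AIC = 2k − 2 log L. The counts and fitted means are invented for illustration and are not estimates from the paper; in particular the "trend" means are not maximum likelihood fits.

```python
import math

def poisson_log_likelihood(counts, means):
    """Log-likelihood of independent Poisson counts with the given means."""
    return sum(k * math.log(m) - m - math.lgamma(k + 1)
               for k, m in zip(counts, means))

def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 log L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical yearly counts for one unit, and fitted means from two candidates.
observed = [12, 15, 14, 18, 21, 20, 24, 26]
fitted_constant = [sum(observed) / len(observed)] * len(observed)  # 1 parameter
fitted_trend = [12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0]    # 2 parameters

aic_constant = aic(poisson_log_likelihood(observed, fitted_constant), n_params=1)
aic_trend = aic(poisson_log_likelihood(observed, fitted_trend), n_params=2)
preferred = "trend" if aic_trend < aic_constant else "constant"
```

The penalty term 2k is what keeps a covariate out of the selected model unless it improves the likelihood enough to pay for the extra parameter.
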

Estimates of the parameters of the Poisson model as per Appendix 2.1, number of departments of a given type that generated each alleged type of claim (N.Dep), observed frequencies of claims for the different types of departments (Obs.F), expected frequencies according to the model (Exp.F), and predicted claims for 2012 (P2012) and 2013 (P2013), together with their standard deviations (in brackets).

A first consideration from Tables 5 to 9 is that the inhomogeneous Poisson process correctly replicates the observed numbers of claims. Indeed, the maximum difference between observed and fitted numbers of claims, among all models, is an overestimation by 4 units.

Each table also contains predictions for years 2012 and 2013, on the basis of the models estimated up to the end of 2011. It will be interesting to verify them with actual data, should they become available to us.

As for the estimates of the parameters of the model, it is worth noticing that most of them are significant at the 5% level. For example, in Table 5, where we consider the claims due to injuries in all regions pooled together, all parameters are significantly different from zero apart from the one related to NC, the Not Classifiable category. Thus, when analyzing all claims for injuries without any regional distinction, the NC “departments” show no particular difference with respect to the baseline. In other words, after model selection, NC “departments” are included within the new OT/NA/NC group.

As expected, the size of the hospitals (in terms of patients in 2012) and the complexity of the operations (as expressed by the CMI) have, on average, a positive influence on the expected number of claims, especially for claims due to injuries and deaths. For claims due to monetary damages, conversely, no clear relation with the CMI emerges; this is in line with the nature of these claims, which are not really related to the complexity of hospital operations. The size of the hospital, however, still has a positive effect: the larger the number of patients, the larger, on average, the number of small economic losses.

As for the dummy variables representing the departments, it is not possible to identify a common pattern. This is quite surprising, since one would for example expect surgery departments to be riskier than the average.

We should note that the parameter *δ* is always strictly larger than 0, most of the time larger than 1 as well, but smaller than 2. In our model (Eq (6) in the Appendix), this means that an underlying linear trend is enough to model the average increase in the number of claims over time (see also Fig 1). An increase in the number of claims is present in all regions, with the only exception of Lombardia, where no significant trend is observed (in Table 6, for instance, for claims due to injuries in Lombardia, *δ* can be safely constrained to 1).
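The role of *δ* can be illustrated under a simplifying assumption. Eq (6) is not reproduced in this section, so the sketch below assumes, purely for illustration, a cumulative intensity proportional to t^δ; the function name, the scale and the values of *δ* are hypothetical. Under this assumption *δ* = 1 gives a constant expected number of claims per year, while 1 < *δ* < 2 gives yearly counts that increase over time.

```python
def expected_yearly_claims(scale, delta, n_years):
    """Expected claims in each year (t-1, t].

    Assumption (not taken from the paper): cumulative intensity scale * t**delta.
    """
    return [scale * (t ** delta - (t - 1) ** delta) for t in range(1, n_years + 1)]

# delta = 1: no trend; delta = 1.5: expected yearly counts grow over time.
no_trend = expected_yearly_claims(scale=100.0, delta=1.0, n_years=8)
increasing = expected_yearly_claims(scale=100.0, delta=1.5, n_years=8)
```
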

In order to assess the predictive power of the model we have performed some back-testing experiments. In particular, we have estimated the model parameters using data until December 31st 2010, and have used the estimates to predict the number of claims in 2011. Predictions were then compared to the observed numbers of claims in 2011 for the different alleged claim causes, department types, and regions.

The results were quite satisfactory. For example, Table 10 shows the comparison for the numbers of claims due to injuries (INJ) using data from the Lombardia region. The worst prediction in the table is obtained for the department of general medicine (ME): the actual number of claims is 122 while the model predicts 145 claims, with an error of 18.8%. The best prediction is given for gynecology and obstetrics, where the error is just 1%. In general, the most problematic units are the departments of general medicine (ME) and the Not Classifiable (NC) ones. The maximum error is equal to 19.7% for the claims due to injuries, in the general medicine departments in Toscana. The prediction error across all cases is around 12%.
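The error figures quoted above can be checked with a one-line computation, assuming that the error is measured relative to the observed count (an assumption that reproduces the 18.8% figure up to rounding). The function name is hypothetical.

```python
def relative_error_percent(observed, predicted):
    """Absolute prediction error as a percentage of the observed count."""
    return 100.0 * abs(predicted - observed) / observed

# General medicine (ME), injuries, Lombardia: 122 observed vs 145 predicted in 2011.
error_me_lombardia = relative_error_percent(observed=122, predicted=145)
```
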

Observed (historical) claims against claims predicted for 2011.

#### Number of claims for injuries at birth.

The model for the number of claims due to injuries and deaths at birth is different from the one given in Eq (6) in the Appendix. In particular, we no longer need the covariates *x*_{i,3}, ⋯, *x*_{i,10}, given that all claims belong to the same department: Obstetrics and Gynecology. The data set contains 717 claims due to injuries and deaths at birth (see Table 1). These claims mainly come from Lombardia, Emilia-Romagna, Liguria, Toscana, Lazio and Calabria. For the other regions the number of observations is not sufficient to estimate the model reliably.

Table 11 contains the estimates of the parameters of the model, the predicted claims in 2012 and 2013 and their standard deviations, for all the claims pooled together (ALL), and for the different regions for which the model is estimable. The number of hospitalizations appears to be the most important covariate, while CMI is significant only for the pooled data and for the Lazio region. As usual, model selection has been performed using AIC.

The asterisk indicates significance at 5% level, the star ⋆ indicates that *δ* is also significantly different from 1 at the 5% level. In brackets, we provide the standard deviations of the predicted claims in 2012 (P2012) and 2013 (P2013).

We have also back-tested this model, and the quality of results is comparable to what we have seen for non-birth-related events.

### Amounts

A large quantity of results is obtained when looking at the amounts associated with all types of claims. Here we show how the model-produced information should be interpreted and used, by only focusing on the results obtained for the amounts associated with injuries, in our opinion the most interesting ones.

Here, the departments OT and NA are not pooled together, because it may be relevant to isolate the amounts related to non-major departments (OT), from those for which no information was available (NA).

#### Additional descriptive statistics and model forecasts.

The cost analyses for injuries are based on a large number of claims (11134), shown by region and department in Table 12. A total of 38.1% of such claims had an associated amount equal to zero.

For injuries, the model selection procedure for the probability that cost is equal to zero has identified statistically significant effects for several regions, medical departments, and for calendar time (quadratic effect). For the conditional (on its being positive) model for cost, the model selection process identified significant effects for the Sicilia and Veneto regions. Detailed results, including all parameter estimates, are reported in Appendix 3. Note that from a health management point of view it would be interesting to further investigate these regional differences. Despite being both part of the Italian Health System, Veneto and Sicilia have two very different sanitary management systems, in accordance with the Italian law, which provides regions with a high level of independence.

Tables 13 and 14 contain descriptive statistics for the injury claims, for each of the regions and departments as identified by the models. In particular the tables show: the total number of claims used for the analysis (*n*); for positive amounts, their observed conditional mean and median (*C-Mean* and *C-Median*) and the conditional mean and variance of their natural logarithm (*C-LogMean* and *C-LogVar*); and the overall (i.e. unconditional) observed mean and median (*Mean* and *Median*).

Thanks to our modeling, one may compute estimates for all relevant model-based quantities for any specific time point, as long as it is not too far from the time window of data collection. Table 15 provides a detailed legend of the information that is presented in Tables 16 and 17, where forecasts for June 30 2013 are provided (remember that our data stop on December 31st 2012, therefore June 30 2013 is a future date).

Forecasts refer to 30 June 2013.

Let us focus our attention on Table 16 and, in particular, on the Liguria region. The departments of Anesthesia and Orthopedics are the ones with the highest probability of non-zero amounts, that is to say those departments that generate the largest number of positive disbursements for insurance companies and hospitals. The departments showing the highest median amounts are Orthopedics and Obstetrics. These departments are also the ones associated with the highest expected costs (about 24k euros), the highest 90% Value-at-Risk (the amount with respect to which only 10% of all paid amounts are larger, i.e. the 90% quantile) and, as a consequence, the highest 90% expected shortfall, that is to say the expected paid amount, when considering the top 10% of all disbursements.

Similar considerations can be made for all the regions in the data set, and it is interesting to see how, in every region, Orthopedics and Obstetrics appear to be the most expensive departments in terms of disbursement, every time a medical malpractice claim is made. The NC category (the whole hospital), on the contrary, is on average associated with the smallest amounts. This is easy to explain: the NC category typically refers to events happening in the common areas of the hospital, which are usually associated with monetary damages and minor injuries.

It is also possible to plot the model-based quantities of interest with respect to time, in order to study their trends for different covariate values. Such plots are useful to obtain an exploratory overall impression of the absolute impact of the baseline covariates and time on the cost associated with the claims.

While the object of such detailed examinations is not among the goals of this article, we do show in Figs 3–5 three examples of such model-based curves. Fig 3 shows the estimated probability that cost is equal to zero versus time from January 1st, 2004. Figs 4 and 5 show, again against time, the estimated median cost and the 95*th* quantile of the cost distribution, also taking into consideration the zero amounts. The different curves on the three plots correspond to the different combinations of baseline covariate values (regions by departments). In Fig 3 a consistent behavior is identifiable for all regions by departments: the estimated probability that cost is equal to zero tends to slightly increase during the first 30 months and then decreases. For Figs 4 and 5, on the contrary, no unique trend is observable and further analyses are needed.

Estimated probability that Cost is equal to zero vs. time, for all baseline covariate values.

Estimated unconditional mean Cost vs. time, for all baseline covariate values.

Estimated unconditional median Cost vs. time, for all baseline covariate values.

#### Expected and tail amounts.

We finally provide some examples to show how to derive expected and tail amounts.

For 2013, a total of 218 injury-type claims have been forecast for the orthopedic departments of the Lombardia region. The corresponding average cost of each such event is equal to 24,976 euros. Multiplying this average amount by 218 gives an estimated overall cost for such claims of 5,444,768 euros. It should be noted that the 95% confidence interval for the claim-specific expected cost, i.e. (17,438; 35,510), is far from narrow, and that the overall cost forecast also has its own sampling variability. From the part of the model that describes the probability that the amounts are equal to zero, one may easily produce a forecast for the proportion of claims (out of the 218) that will have a strictly positive amount. For 2013 this proportion is equal to 0.95, and the 95% confidence interval is (0.94, 0.97). As a consequence, a total of 207 injury-type claims with non-zero associated amounts are expected, and the 95% confidence interval is derived as (205, 211).
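
The arithmetic of this example can be retraced with the short sketch below. All inputs are the figures quoted in the paragraph above; the variable names are ours, and the range computed for the total cost is a naive one that simply rescales the per-claim interval, ignoring the sampling variability of the forecast number of claims.

```python
# Figures quoted in the text (orthopedics, Lombardia, injuries, 2013).
n_claims = 218                   # forecast number of claims
mean_cost = 24_976               # expected cost per claim, in euros
ci_cost = (17_438, 35_510)       # 95% interval for the per-claim expected cost
p_positive = 0.95                # forecast share of claims with a positive amount
ci_p_positive = (0.94, 0.97)     # 95% interval for that share

# Expected overall cost: expected number of claims times expected cost per claim.
total_cost = n_claims * mean_cost

# Naive range for the total (assumption: the claim count is treated as fixed).
total_cost_range = tuple(n_claims * c for c in ci_cost)

# Expected number of claims with a strictly positive amount, and its interval.
n_positive = round(n_claims * p_positive)
n_positive_range = tuple(round(n_claims * p) for p in ci_p_positive)

print(total_cost, total_cost_range)
print(n_positive, n_positive_range)
```

A fuller treatment would also propagate the uncertainty in the forecast of 218 claims, which this sketch deliberately leaves out.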

Focusing on the extreme amounts and on the number of such claims, let us now consider the case of injuries in anesthesia departments of the Toscana region. A total of 34 claims have been forecast for 2013, and the June 30 2013 forecast for the 90*th* quantile of the amount distribution is equal to 8,989. This forecast already takes into account the zero amounts, which are estimated to occur with probability equal to 1 − 0.91 = 0.09. The binomial formula in Appendix 2.3 allows us to easily obtain the probability that at least 8 of the 34 claims have associated amounts greater than or equal to 8,989 as being less than or equal to 0.017. Note that in this example *np*(1 − *p*) = 3.06, so that it would not be appropriate to use the normal approximation for the previous computations. A similar procedure can easily be employed for the number of claims that may yield even more extreme amounts; it is in fact sufficient to use larger quantiles of the amount distribution.
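
The tail probability quoted above can be checked with a direct evaluation of the binomial upper tail, in the spirit of the formula in Appendix 2.3. The sketch below is illustrative only: the function name `binomial_upper_tail` is ours, and the tail area is taken to be exactly 0.10, as implied by the use of the 90th quantile.

```python
from math import comb

def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """P(V >= k) for V ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

n, alpha = 34, 0.10   # forecast claims and tail area of the 90th quantile
prob = binomial_upper_tail(n, 8, alpha)
variance = n * alpha * (1 - alpha)

print(f"P(V >= 8) = {prob:.3f}")
print(f"n * alpha * (1 - alpha) = {variance:.2f}")
```

The exact sum is preferred here because the variance term is small, which is the reason given in the text for avoiding the normal approximation.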

Finally, for the same departments and for the same year, the expected amount for claims that have an amount greater than the 90*th* quantile (8,989 euros) is estimated as being equal to 26,681. Such number is quite large since it refers to amounts that are in the top 10% tail of the distribution. As we have pointed out above, such an amount should be treated with caution as it is based on our parametric (lognormal and logistic) assumptions.

## Discussion

The problem of medical malpractice risk assessment is becoming increasingly important for the Italian Health System, because of its implications in terms of public expenditure and hospital management. Indeed, unlike in the past, an increasing number of Italian patients are following the North American trend of filing lawsuits against hospitals and doctors [8]. Relatedly, there has recently been much discussion in the country about an advertising campaign on TV and in newspapers. The campaign suggested the possibility for patients to be reimbursed for cases of medical malpractice. Notably, the campaign was promoted by some associations of lawyers, and it has caused a strong negative reaction from physicians in the country [19, 20].

In this article we have analyzed the number and the payoff amounts of medical malpractice claims in Italy, in the period 2004–2012, using a large database pooling the observations of three major international brokers. We believe this work will provide a useful contribution to the quantitative study of the phenomenon of medical malpractice, not only in Italy, but also in other countries.

Despite the richness of our data set, we stress once again that it is not advisable to extend any forecast based on our data to the whole country. As already observed, our data were not randomly sampled, as our observations only come from those hospitals that have underwritten an insurance contract with one of the three brokers providing the database. This necessarily introduces a selection bias, which undermines representativeness.

Our analysis seems to suggest an increase in the number of reported claims over time for most Italian regions (the only exception being Lombardia), although it will be interesting to observe whether this trend continues in the future. The performance of the inhomogeneous Poisson process has been checked in-sample and via back-testing, where the prediction error across all cases is around 12%, and it has proved to be very satisfactory.

As far as the payoff amounts (for the settled claims) are concerned, we have registered an average increase for claims due to death, a stationary behavior for claims due to injuries at birth and monetary damages, and a slight decrease for injuries.

We should point out that the expected values estimated for the costs in the different subcategories prove to be somewhat unstable, with wide prediction intervals. Nevertheless, these forecasts do provide useful indications, e.g. for the trend of costs over time. Clearly, the forecasts of the cost distribution’s quantiles are sensitive to the parametric model chosen (log-normal), as are the expected values predicted for the tails of the distributions for the various amounts. These are in fact dependent on the hypotheses made for the tails of the distributions. Once again, extreme caution should be used when interpreting such cost quantiles.

However, despite all caveats, we do think that our modeling has achieved its goal of describing and forecasting the phenomenon of medical malpractice in Italy. Should a complete, more up-to-date data set be made available, the methodology could be effectively employed to produce estimates for future periods of time.

Given our results about costs, which are not as high as one might expect, the decision taken by some Italian regions to consider the partial retention of the clinical risk is understandable. Naturally, this decision implies the necessity of acquiring properly skilled personnel, with the competence to deal with the process of accepting, assessing and (should this be required) settling claims for damage. Such personnel should also be qualified to define the right policies for earmarking reserves in the public budget. Further, any decision to mitigate the clinical risk using insurance options should not be undertaken without first making a historical analysis of the claims experienced by each region, hospital and department. However, to avoid the trap of historical bias, these decisions should also be oriented towards covering risks with the lowest frequency and the greatest financial impact, the so-called black swans in everyday language. If said risks were not adequately covered by setting aside considerable budget reserves, the result would be a series of unforeseeable, and thus unmanageable, losses [14].

To conclude, some relevant points for discussion and future work include the possibility of implementing the methods on a continuous-time scale, so that a timely monitoring of the phenomenon could be performed. It could also be relevant to develop a related alarm system [21], as a way of monitoring the phenomenon [22].

As time progresses, further checks on the accuracy of the models’ forecasts may then be performed, by matching our prediction with the newly observed data made available by continuous monitoring.

Also, while such information was not available to us, a possible enrichment of the analyses could include the variation of the number of patients and of CMI over time, within each “department by hospital” event-generating unit.

Finally, the lack of regional homogeneity observed in this analysis could serve as a starting point for a more general discussion on the interpretation of these differences. If more data become available, it would be interesting to study the impact of the different regional Health Systems on medical malpractice claims in Italy.

## Appendix 1—On the lognormal assumption for the distribution of the amounts

In this appendix we report on some additional analyses that further support the use of the lognormal distribution in our analyses of the amounts.

A moment-ratio plot, such as the one in Fig 6, involving the sample coefficient of variation (CV) and the skewness, indicates that claims (pooled all together) can be modeled with a lognormal-like distribution. Introduced by [23], and further developed in [24] and [25], moment-ratio plots represent a simple way of visualizing and discriminating among distributions. Some distributions may be represented as a set of points, some others as curves or areas. For more details on the interpretation of moment-ratio plots we also refer to [26].

Discriminant moment-ratio plot for the non-zero payments, all claims pooled together. The large dot represents the pair “CV and Skewness” and it falls in the so-called lognormal region.

Lognormality is also supported by the study of the mean excess function of claims, a tool commonly used in extreme value statistics. In particular, let *X* be a random variable with distribution *F* and right endpoint *x*_{F} (i.e. $x_F = \sup\{x \in \mathbb{R} : F(x) < 1\}$). The function

$$e(u) = E[X - u \mid X > u] = \frac{\int_u^{\infty} (t - u)\, dF(t)}{\int_u^{\infty} dF(t)}, \qquad 0 < u < x_F, \tag{1}$$

is called mean excess function of *X* (ME). The empirical ME of a sample *X*_{1}, *X*_{2},…, *X*_{n} is easily computed as

$$e_n(u) = \frac{\sum_{i=1}^{n} (X_i - u)\, \mathbb{1}_{\{X_i > u\}}}{\sum_{i=1}^{n} \mathbb{1}_{\{X_i > u\}}}, \tag{2}$$

that is the sum of the exceedances over the threshold *u* divided by the number of such data points. Interestingly, the ME is a way of characterizing distributions within the class of continuous distributions [27]. For example, the Pareto distribution (and its generalizations) is the only distribution characterized by the so-called van der Wijk’s law [28], that is to say by a mean excess function linearly increasing in the threshold *u*.

In case of lognormally distributed random variables, with log *X* normally distributed with mean *μ* and variance *σ*^{2}, we have

$$e(u) = \frac{\sigma^{2} u}{\log u - \mu}\,(1 + o(1)), \qquad u \to \infty, \tag{3}$$

and the mean excess function has a behavior very similar to the sample plot computed on our data and shown in Fig 7. That graph is known as meplot, and it is obtained by plotting the pairs {(*X*_{i: n}, *e*_{n}(*X*_{i: n})) : *i* = 1, …, *n*}, where *X*_{i: n} is the *i*−th order statistic. For a complete treatment about mean excess functions and meplots we refer to [29].
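
As an illustration of how the pairs of a meplot may be computed, the following sketch implements the empirical mean excess function as described in words above (sum of the exceedances over the threshold divided by their number). The function names and the toy sample are ours and purely hypothetical; they are not the claims data.

```python
def mean_excess(sample, u):
    """Empirical mean excess: average of (x - u) over the observations x > u."""
    exceedances = [x - u for x in sample if x > u]
    if not exceedances:
        return float("nan")  # no observation exceeds the threshold
    return sum(exceedances) / len(exceedances)

def meplot_points(sample):
    """Pairs (X_(i), e_n(X_(i))) evaluated at the order statistics.

    The largest observation is skipped, since nothing exceeds it.
    """
    ordered = sorted(sample)
    return [(x, mean_excess(ordered, x)) for x in ordered[:-1]]

# Hypothetical toy sample, for illustration only.
toy_sample = [1.0, 2.0, 3.0, 4.0, 10.0]
for threshold, value in meplot_points(toy_sample):
    print(threshold, round(value, 3))
```

Plotting the second coordinate against the first gives the meplot; a roughly linear increasing pattern would point towards Pareto-type tails, whereas a concave pattern is what one expects under lognormality.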

Mean excess function plot for the non-zero payments. Concavity is a symptom of lognormally distributed data.

To further exclude other heavy-tailed models (such as the Generalized Pareto Regression [30]), we studied the finiteness of the first four moments for the non-zero payments. The use of a Maximum to Sum plot, as the one in Fig 8, shows that at least the first four moments of the distribution of claim amounts are finite, indicating the absence of very heavy tails. This plot relies on the fact that, for a sequence *X*_{1}, *X*_{2}, …, *X*_{n} of nonnegative i.i.d. random variables, if for *p* = 1, 2, 3…, *E*[*X*^{p}] < ∞, then $R_n(p) = M_n(p)/S_n(p) \to 0$ almost surely as *n* → ∞, where $S_n(p) = \sum_{i=1}^{n} X_i^{p}$ and $M_n(p) = \max(X_1^{p}, \dots, X_n^{p})$. This follows from the law of large numbers, as shown for example in [29]. In conclusion, in our case the existence of the first four moments suggests that Paretianity can safely be ruled out.
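
A Maximum to Sum plot is simple to compute: for each sample size one tracks the ratio between the running maximum and the running sum of the *p*-th powers. The sketch below is illustrative only; the function name `max_to_sum` is ours, and the data are simulated from a lognormal distribution with arbitrary parameters rather than taken from the claims database.

```python
import random

def max_to_sum(sample, p):
    """Running ratios max(X_i^p) / sum(X_i^p) for n = 1, ..., len(sample)."""
    ratios = []
    running_max, running_sum = 0.0, 0.0
    for x in sample:
        xp = x ** p
        running_max = max(running_max, xp)
        running_sum += xp
        ratios.append(running_max / running_sum)
    return ratios

# Simulated positive data (hypothetical parameters, for illustration only).
random.seed(0)
simulated = [random.lognormvariate(0.0, 1.0) for _ in range(20_000)]

for p in (1, 2, 3, 4):
    final_ratio = max_to_sum(simulated, p)[-1]
    print(f"p = {p}: ratio after {len(simulated)} observations = {final_ratio:.4f}")
```

Plotting each sequence of ratios against *n* gives one panel per value of *p*; a ratio that fails to settle near zero would suggest that the corresponding moment is not finite.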

Maximum to Sum plot for the non-zero claims, first four moments (*p* = 1, …, 4). The convergence towards zero, in all four subplots, suggests that the corresponding moments are finite.

## Appendix 2—Technical details of models

### A2.1—On models for the number of events

For each unit of analysis *i*, *i* = 1, …, *m*, with *m* the number of units, we modeled the number of claims by an inhomogeneous Poisson process whose time-varying intensity function is log-linearly dependent on a set of covariates (including time itself).

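For every *i* = 1, …, *m*, we let λ_{i}(*t*) be the intensity of a Poisson process at time *t*, while Λ_{i}(*t*) is the corresponding cumulative intensity, such that

$$\Lambda_i(t) = \int_0^{t} \lambda_i(s)\, ds.$$

We then assume the following functional form for the intensity function

$$\lambda_i(t) = \delta\, t^{\delta - 1} \exp\left(\boldsymbol{\gamma}^{\top} \mathbf{x}_i\right), \tag{4}$$

so that

$$\Lambda_i(t) = t^{\delta} \exp\left(\boldsymbol{\gamma}^{\top} \mathbf{x}_i\right), \tag{5}$$

with

$$\boldsymbol{\gamma}^{\top} \mathbf{x}_i = \sum_{h=0}^{k} \gamma_h x_{i,h},$$

where *x*_{i,0}, *x*_{i,1}, … , *x*_{i, k} are the covariates of the model, with *x*_{i,0} being the intercept. The parameters *γ*_{0}, *γ*_{1}, … , *γ*_{k} are therefore the coefficients of the covariates to be estimated.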
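
To make the role of the intensity concrete, the sketch below computes the expected number of claims of a unit over a time window as the difference of the cumulative intensity at the two endpoints. It assumes the Weibull-type form implied by the baseline term exp(*γ*_{0})*δt*^{δ−1} discussed in this appendix, i.e. an intensity equal to *δt*^{δ−1} times the exponential of the linear combination of the covariates. Function names and parameter values are hypothetical and chosen only for illustration.

```python
import math

def linear_predictor(gamma, x):
    """Sum of gamma_h * x_h over all covariates (x[0] = 1.0 is the intercept)."""
    return sum(g * xi for g, xi in zip(gamma, x))

def intensity(t, delta, gamma, x):
    """Assumed intensity: delta * t**(delta - 1) * exp(gamma . x)."""
    return delta * t ** (delta - 1) * math.exp(linear_predictor(gamma, x))

def cumulative_intensity(t, delta, gamma, x):
    """Integral of the intensity over (0, t]: t**delta * exp(gamma . x)."""
    return t ** delta * math.exp(linear_predictor(gamma, x))

def expected_claims(t_start, t_end, delta, gamma, x):
    """Expected number of claims reported in the window (t_start, t_end]."""
    return (cumulative_intensity(t_end, delta, gamma, x)
            - cumulative_intensity(t_start, delta, gamma, x))

# Hypothetical values: delta, coefficients (intercept, one covariate), covariates.
delta, gamma, x = 1.1, (0.5, 0.3), (1.0, 2.0)

# With time in years from January 1st 2004, calendar year 2013 is the window (9, 10].
print(expected_claims(9.0, 10.0, delta, gamma, x))
```

With *δ* = 1 the intensity is constant in time and the process reduces to a homogeneous Poisson process; values of *δ* above or below 1 correspond to an increasing or decreasing trend.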

The parameter *δ* in Eq (4), which modifies the time trend of the Poisson intensity, is coherent with the Weibull hypothesis for the baseline intensity of the process [31, 32], i.e. the part of the intensity function that does not depend on the covariates. Rewriting Eq (4) as

$$\lambda_i(t) = \exp(\gamma_0)\, \delta\, t^{\delta - 1} \exp\left(\sum_{h=1}^{k} \gamma_h x_{i,h}\right) \tag{6}$$

shows that the intensity in Eq (4) can be factorized as

$$\lambda_i(t) = \lambda_{i,0}(t)\, \lambda_{i,x},$$

where the term λ_{i,0}(*t*) = exp (*γ*_{0}) *δt*^{δ−1} does not depend on the covariates (notice *x*_{i,0} = 1), while $\lambda_{i,x} = \exp\left(\sum_{h=1}^{k} \gamma_h x_{i,h}\right)$ is the covariate-dependent part of the intensity.

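The estimation of the models was performed using maximum likelihood. Given Eq (4), the log-likelihood of each model can be written as

$$\ell(\delta, \gamma_0, \dots, \gamma_k) = n \log \delta + (\delta - 1) \sum_{i=1}^{m} \sum_{j=1}^{n_i} \log t_{i,j} + \sum_{i=1}^{m} n_i \sum_{h=0}^{k} \gamma_h x_{i,h} - T^{\delta} \sum_{i=1}^{m} \exp\left(\sum_{h=0}^{k} \gamma_h x_{i,h}\right), \tag{7}$$

where *n*_{i} is the number of claims for unit *i*, $n = \sum_{i=1}^{m} n_i$ is the total number of claims in the data set (*m* being the total number of units), *t*_{i, j} is the time in which claim *j* of unit *i* was reported, and *T* is the time length in years of the observation window, once we assume that January 1st 2004 is equal to the origin, i.e. time zero.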
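
A minimal sketch of this log-likelihood is given below. It relies on the standard result that, for a Poisson process observed on (0, *T*], the log-likelihood is the sum of the log-intensities at the event times minus the cumulative intensity at *T*, and it assumes the Weibull-type intensity described in this appendix. The function name, the data layout and the toy data are ours and hypothetical.

```python
import math

def log_likelihood(delta, gamma, units, T):
    """Log-likelihood of the inhomogeneous Poisson model on (0, T].

    units: list of (x, times) pairs, where x is the covariate vector of a unit
    (x[0] = 1.0 is the intercept) and times are its claim times in years.
    """
    ll = 0.0
    for x, times in units:
        eta = sum(g * xi for g, xi in zip(gamma, x))
        # Sum of log-intensities at the observed claim times.
        ll += sum(math.log(delta) + (delta - 1) * math.log(t) + eta for t in times)
        # Minus the cumulative intensity over the whole observation window.
        ll -= T ** delta * math.exp(eta)
    return ll

# Hypothetical toy data: two units, an intercept and one covariate, T = 9 years.
toy_units = [
    ((1.0, 0.0), [0.7, 2.5, 6.1]),
    ((1.0, 1.0), [1.2, 3.3, 4.8, 8.0, 8.6]),
]
print(log_likelihood(1.0, (-1.0, 0.5), toy_units, T=9.0))
```

In practice this function would be passed (with its sign changed) to a numerical optimizer to obtain the maximum likelihood estimates; that step is not shown here.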

As we have mentioned, the model for the claims due to injuries and deaths at birth is different, since these claims can only arise from departments of obstetrics and gynecology. Hence the log-likelihood Eq (7) for such claims does not include the covariates *x*_{i,3}, ⋯ , *x*_{i,10}, so that Eq (6) reduces to

$$\lambda_i(t) = \exp(\gamma_0)\, \delta\, t^{\delta - 1} \exp\left(\gamma_1 x_{i,1} + \gamma_2 x_{i,2}\right).$$

### A2.2—On models for the amounts

It is well known that if *Y* = log(*C*) follows a normal distribution with mean *μ* and variance *σ*^{2}, then the expected value of *C* = exp(*Y*) is equal to $\exp(\mu + \sigma^{2}/2)$. Also, because of monotonicity of the exponential function, the medians of *Y* and *C* are equal to *μ* and exp(*μ*), respectively.

We define the p-*th* quantile *y*_{p} of the log-cost by *P*(*Y* ≤ *y*_{p}) = *p*. The corresponding cost quantile is therefore *q*_{p} = exp(*y*_{p}) = exp(*μ* + *z*_{p} *σ*), where *z*_{p} is the p-*th* quantile of a standard Gaussian distribution.

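If we set *p*_{0} = *P*(*C* = 0), it is not difficult to show that the overall p-*th* quantile of *C*, taking into account the point mass probability at zero, is *q*_{p} = exp(*μ* + *z*_{1 − ϵ} *σ*), with *ϵ* = min(1, (1 − *p*)/(1 − *p*_{0})). Indeed, for *p* > *p*_{0} the condition *P*(*C* ≤ *q*_{p}) = *p*_{0} + (1 − *p*_{0})*P*(*C* ≤ *q*_{p}|*C* > 0) = *p* requires the positive amounts to exceed *q*_{p} with probability *ϵ* = (1 − *p*)/(1 − *p*_{0}), whereas for *p* ≤ *p*_{0} we have *ϵ* = 1 and the quantile is equal to zero.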
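
The overall quantile can be computed directly with the Python standard library, as in the sketch below. Here *ϵ* is interpreted as the upper-tail mass of the positive amounts, so that the Gaussian quantile is evaluated at level 1 − *ϵ*; this is what makes the overall quantile equal to zero whenever *p* ≤ *p*_{0}. The function name and the parameter values are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def overall_quantile(p, p0, mu, sigma):
    """p-th quantile of the cost C, for 0 < p < 1.

    C is zero with probability p0 and, when positive, lognormal with
    log-mean mu and log-standard deviation sigma.
    """
    eps = min(1.0, (1.0 - p) / (1.0 - p0))  # upper-tail mass among positive costs
    if eps >= 1.0:
        return 0.0  # the requested quantile falls inside the point mass at zero
    z = NormalDist().inv_cdf(1.0 - eps)
    return math.exp(mu + z * sigma)

# Hypothetical parameter values, for illustration only.
p0, mu, sigma = 0.09, 7.0, 1.6
print(overall_quantile(0.90, p0, mu, sigma))
print(overall_quantile(0.05, p0, mu, sigma))  # below p0, so the quantile is zero
```

A quick consistency check is to plug the resulting quantile back into the mixed distribution function, *p*_{0} + (1 − *p*_{0})*P*(*C* ≤ *q*|*C* > 0), which should return *p*.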

Both regression models have been developed to assess the statistical significance of the different regions, of the medical departments and of time (allowing for a possible quadratic effect of time on the two outcomes as well). It is worth underlining that the two (distinct) model selection processes will in general produce *different* sets of significant covariates, which we call **V**_{1} and **V**_{2}, for the logistic regression component and for the lognormal regression component of the overall model, respectively. As a consequence, some care must be used to properly keep track of this fact in the later production of forecasts for the costs.

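For the different combinations of covariates, let *β*_{1} and *β*_{2} be the parameter vectors of the two components of the model. With $\hat{\beta}_1$ and $\hat{\beta}_2$ we indicate their estimates.

Given $\hat{\beta}_1$ and **V**_{1} = **v**_{1,0}, one can easily obtain an estimate of the probability that the amount corresponding to a claim is equal to zero as

$$\hat{P}(C = 0; \mathbf{v}_{1,0}) = \frac{\exp\left(\hat{\beta}_1^{\top} \mathbf{v}_{1,0}\right)}{1 + \exp\left(\hat{\beta}_1^{\top} \mathbf{v}_{1,0}\right)}.$$

Prediction intervals for such probability can also be readily obtained from the logistic regression analysis.

The estimated expected value of cost, for a value of the covariate vector **V**_{2} = **v**_{2,0}, is

$$\hat{E}(C \mid C > 0; \mathbf{v}_{2,0}) = \exp\left(\hat{\beta}_2^{\top} \mathbf{v}_{2,0} + \hat{\sigma}^{2}/2\right),$$

where $\hat{\sigma}^{2}$ is the variance estimate obtained from fitting the lognormal model. The estimation of expected values is notably difficult for the lognormal model, because of a problem of bias, and we recommend using predicted quantiles instead.

The estimated overall expected cost for **V**_{1} = **v**_{1,0} and **V**_{2} = **v**_{2,0} is

$$\hat{E}(C; \mathbf{v}_{1,0}, \mathbf{v}_{2,0}) = \left[1 - \hat{P}(C = 0; \mathbf{v}_{1,0})\right] \hat{E}(C \mid C > 0; \mathbf{v}_{2,0}). \tag{8}$$

For ease of notation, in what follows, we drop **v**_{1,0} and **v**_{2,0}, so that, for example, $\hat{E}(C; \mathbf{v}_{1,0}, \mathbf{v}_{2,0})$ becomes $\hat{E}(C)$.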
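
The way the two components combine into an overall expected cost can be sketched as follows. The code assumes a logit link for the probability of a zero amount and a lognormal model for the positive amounts, as described in the Abstract; function names, coefficient values and covariate vectors are hypothetical and serve only as an illustration.

```python
import math

def prob_zero(beta1, v1):
    """Logistic-regression estimate of P(C = 0) at covariate value v1."""
    eta = sum(b * v for b, v in zip(beta1, v1))
    return 1.0 / (1.0 + math.exp(-eta))

def conditional_mean(beta2, v2, sigma2):
    """Lognormal estimate of E(C | C > 0) at covariate value v2."""
    mu = sum(b * v for b, v in zip(beta2, v2))
    return math.exp(mu + sigma2 / 2.0)

def overall_mean(beta1, v1, beta2, v2, sigma2):
    """Overall expected cost: P(C > 0) times E(C | C > 0)."""
    return (1.0 - prob_zero(beta1, v1)) * conditional_mean(beta2, v2, sigma2)

# Hypothetical estimates and covariate values (intercept plus one dummy each).
beta1_hat, v1_0 = (-2.0, 0.4), (1.0, 1.0)
beta2_hat, v2_0 = (7.0, 0.3), (1.0, 0.0)
sigma2_hat = 2.5

print(prob_zero(beta1_hat, v1_0))
print(overall_mean(beta1_hat, v1_0, beta2_hat, v2_0, sigma2_hat))
```

Note that the two covariate vectors need not coincide, since the two model selection processes may retain different sets of covariates.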

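Clearly, it is possible to construct approximate *α* level prediction intervals for the conditional expected values *E*(*C*|*C* > 0). By keeping the variance estimate fixed, we get $\left(\exp(\hat{\mu}_L + \hat{\sigma}^{2}/2),\ \exp(\hat{\mu}_U + \hat{\sigma}^{2}/2)\right)$, where $(\hat{\mu}_L, \hat{\mu}_U)$ is the *α* level prediction interval for the mean of the log-cost. It is relevant to note that fixing the variance generally underestimates the sampling variability of the predictions, even if, on the other side, it simplifies computations.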

Prediction intervals for the *overall* mean costs can then be obtained by exploiting the independence between the sampling distributions of $\hat{\beta}_1$ and $\hat{\beta}_2$. By using $\alpha = \sqrt{0.95} \approx 0.975$ as the confidence level for the prediction intervals for the two terms in Eq (8), it is possible to show that 0.95 is a lower bound for the (approximate) confidence level for the prediction interval of *E*(*C*; **v**_{1,0}, **v**_{2,0}), constructed as $(\hat{P}_L \hat{E}_L,\ \hat{P}_U \hat{E}_U)$, where $\hat{P}_L$ and $\hat{P}_U$ are the lower and upper extremes of the *α* prediction interval for *P*(*C* > 0) = 1 − *P*(*C* = 0), as obtained from the logistic regression model, and $\hat{E}_L$ and $\hat{E}_U$ are the corresponding extremes of the interval for *E*(*C*|*C* > 0).
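
The reason why 0.95 is a lower bound can be sketched in a few lines. The notation below is ours: *A* and *B* denote the events that the two component intervals cover their targets, each at level *α*, and all quantities involved are nonnegative.

```latex
\begin{aligned}
A &= \{\hat{P}_L \le P(C > 0) \le \hat{P}_U\}, \qquad
B = \{\hat{E}_L \le E(C \mid C > 0) \le \hat{E}_U\}, \\
P(A \cap B) &= P(A)\,P(B) \approx \alpha^{2} = 0.95
  \quad \text{when } \alpha = \sqrt{0.95} \approx 0.975, \\
A \cap B &\subseteq \{\hat{P}_L \hat{E}_L \le P(C > 0)\,E(C \mid C > 0) \le \hat{P}_U \hat{E}_U\}.
\end{aligned}
```

The second line uses the independence of the two sampling distributions, and the third holds because products of nonnegative bounds preserve the ordering. Since the product interval may also cover the overall mean when one of the two component intervals fails, its coverage is at least (approximately) 0.95.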

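Similarly, an approximate *α* level prediction interval for the *p*−th quantile *q*_{p} is obtained as $(\hat{q}_{p,L},\ \hat{q}_{p,U})$, where $\hat{q}_{p,L} = \exp(\hat{\mu}_L + z_p \hat{\sigma})$ and $\hat{q}_{p,U} = \exp(\hat{\mu}_U + z_p \hat{\sigma})$, with $(\hat{\mu}_L, \hat{\mu}_U)$ the *α* level prediction interval for the mean of the log-cost and the estimate $\hat{\sigma}$ kept fixed.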

Last, let *N* be the number of events of a given kind in a given time interval, and *C*_{1}, … , *C*_{N} their associated (*i.i.d.*) amounts. If one assumes that the expected values of the *C*_{i} do not depend on *N*, then the expected value of the overall amount for the time interval is

$$E\left(\sum_{i=1}^{N} C_i\right) = E(N)\, E(C_1),$$

i.e. the product of the expected number of events and the expected amount for each event.

### A2.3—On tail costs

In this appendix we describe the predicted numbers of events (*N*), the probabilities that some of them lie in the extreme tail of the amount distribution, and the estimated Expected Shortfall (*ES*), or the average cost among costs greater than the (1 − *α*) − level quantile of the cost distribution.

Indeed, the cost quantiles are available, and the conditional distribution of the number *V* of extreme claims out of the *N* = *n* is a Binomial(*n*, *α*), where *α* is the tail area corresponding to the quantile amount *q*_{1 − α} of interest, as estimated from the analysis of the amounts. For example, the probability of observing *k* or more claims out of the *n* whose associated amounts are greater than or equal to *q*_{1 − α} is equal to

$$P(V \ge k \mid N = n) = \sum_{j=k}^{n} \binom{n}{j} \alpha^{j} (1 - \alpha)^{n - j},$$

where *α* = *P*(*C* ≥ *q*_{1 − α}) = *P*(*C* ≥ *q*_{1 − α}|*C* > 0)*P*(*C* > 0), and where all quantities are estimated from data. Note that this procedure is similar to the back-testing approach that is sometimes used in the verification of Value-at-Risk in risk management [7]. For large values of *n* (and as long as *nα*(1 − *α*) > 7, say) one may use the normal approximation to the binomial distribution and use the expression

$$P(V \ge k \mid N = n) \approx 1 - \Phi\left(\frac{k - n\alpha}{\sqrt{n\alpha(1 - \alpha)}}\right),$$

with Φ the standard normal cdf.

Let us now turn to the estimated expected amounts (or Expected Shortfall, *ES*, in risk management terminology), conditionally on the amount being positive (which happens with some probability 1 − *p*_{0}), and in particular in the *α*-probability upper tail of the distribution. It is easy to check that

$$ES = E(C \mid C \ge q_{1-\alpha}) = \exp\left(\mu + \sigma^{2}/2\right) \frac{1 - \Phi\left(\dfrac{\log q_{1-\alpha} - \mu - \sigma^{2}}{\sigma}\right)}{1 - \Phi\left(\dfrac{\log q_{1-\alpha} - \mu}{\sigma}\right)},$$

where again we plug-in all the estimates to obtain such conditional expected values consistently. In our analyses we provide the quantity based on (*q*_{0.90}) as an example (hence choosing *α* = 0.10).
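
For a lognormal model of the positive amounts, the expected amount above a given positive threshold has a closed form in terms of the standard normal cdf, and it can be evaluated as in the sketch below. The function name `lognormal_es` and the parameter values are hypothetical and used only for illustration; they are not the estimates of the fitted models.

```python
import math
from statistics import NormalDist

def lognormal_es(q, mu, sigma):
    """E(C | C >= q) for q > 0, when log(C) given C > 0 is Normal(mu, sigma^2)."""
    std_normal = NormalDist()
    d = (math.log(q) - mu) / sigma
    upper_partial = 1.0 - std_normal.cdf(d - sigma)  # tail weight of the mean
    tail_prob = 1.0 - std_normal.cdf(d)              # P(C >= q | C > 0)
    return math.exp(mu + sigma ** 2 / 2.0) * upper_partial / tail_prob

# Hypothetical parameters: standard lognormal, threshold at its 90th quantile.
mu, sigma = 0.0, 1.0
q90 = math.exp(mu + NormalDist().inv_cdf(0.90) * sigma)
print(q90, lognormal_es(q90, mu, sigma))
```

By construction the expected shortfall always exceeds the threshold, and it tends to the unconditional lognormal mean as the threshold goes to zero. As stressed in the main text, such tail quantities depend heavily on the lognormal assumption.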

## Appendix 3—Final models for amounts

In this appendix we report the details of the final models selected for the amounts associated with injuries.

Details of the final model selected for the probability that cost is equal to zero are shown in Table 18. That model has identified statistically significant effects for several regions, medical departments, and for calendar time (quadratic effect). For the model for (positive) cost, the model selection process identified significant effects for the Sicilia and Veneto regions, as well as for all medical departments and for time. The estimated parameter values are shown in Table 19.

## Supporting Information

### S1 File. S1_File.zip.

The data for the largest region in the data set (Lombardia/Lombardy) are freely available for download. For each claim, we provide: code of the hospital (Hospital), department (Department), time of the event (Time), year (Year), type of claim (Event), CMI (CMI), number of patients for the hospital in 2012 (Patients 2012), reserved amounts (Reserves) and final amounts (Final Amount). The names of the Brokers have been anonymized: 1BR23 indicates Hospital 23 of Broker 1BR. Time has been rescaled, so that Day 1 corresponds to January 1 2004.

https://doi.org/10.1371/journal.pone.0153362.s001

(ZIP)

## Acknowledgments

The authors wish to thank the Research Division C. Demattè of SDA Bocconi School of Management, and gratefully acknowledge the valuable and continued support of the three brokers AON, Marsh and Willis Italy. The authors had access to all of the data in the study and take responsibility for the accuracy of the data analysis. The authors also wish to thank two Referees and an Academic Editor for their thoughtful comments and suggestions on an earlier version of the manuscript.

PC also acknowledges the support of the Marie Curie Career Integration Grant “Multivariate Shocks” (PCIG13-GA-2013-618794) from the European Union.

## Author Contributions

Conceived and designed the experiments: MB PC PMT ET. Performed the experiments: MB PC PMT ET. Analyzed the data: MB PC. Wrote the paper: MB PC PMT ET. Data cleaning: MB PC. Introductory qualitative study: PMT ET.

## References

- 1. De Feijter JM, de Grave WM, Muijtjens AM, Scherpbier AJJA, Koopmans RP. A Comprehensive Overview of Medical Error in Hospitals Using Incident-Reporting Systems, Patient Complaints and Chart Review of Inpatient Deaths. PLoS ONE 2012;7:2.
- 2. Rapp GC. Doctors, Duties, Death and Data: A Critical Review of the Empirical Literature on Medical Malpractice and Tort Reform. Northern Illinois University Law Review 2006;26:439–468.
- 3. Jena AB, Seabury S, Lakdawalla D, Chandra A. Malpractice Risk According to Physician Specialty. New England Journal of Medicine 2011;365:629–636.
- 4. Brusoni M, Trinchero E, Marazzi L, Partenza I. Gestione, ritenzione e assicurazione del risk: alla ricerca di una prospettiva integrata. L’aziendalizzazione della sanità in Italia. Rapporto Oasi Egea 2012. Italian.
- 5. Tediosi F, Gabriele S, Longo F. Governing decentralization in health care under tough budget constraint: What can we learn from the Italian experience? Health Policy 2009;90(2–3):303–312.
- 6. Ritchey FJ. Medical Malpractice. The Wiley Blackwell Encyclopedia of Health, Illness, Behavior, and Society. New York: Wiley 2014;1387–1394.
- 7. Hull JC. Risk Management and Financial Institutions. 4th ed. New York: Wiley; 2015.
- 8. Traina F. Medical Malpractice: The Experience in Italy. Clinical Orthopaedics and Related Research 2009;467:434–442. pmid:18985423
- 9. Baker JJ. Medicare payment system for hospital inpatients: diagnosis related groups. Journal of Health Care Finance 2002;28:1–13. pmid:12079147
- 10. Cooil B. Using Medical Malpractice Data to Predict the Frequency of Claims: A Study of Poisson Process Models with Random Effects. Journal of the American Statistical Association 1991;86:285–295.
- 11. Gibbons RD, Hedeker D, Charles S, Frisch P. A random-effects Probit model for predicting medical malpractice claims. Journal of the American Statistical Association 1994;89:760–767.
- 12. Nye BF, Hofflander AE. Experience Rating in Medical Professional Liability Insurance. Journal of Risk and Insurance 1988;55:150–157.
- 13. Randelli M, Maradei L, Markopoulos N, Castagna A. Incidence of Risk in Shoulder Surgery. La Chirurgia degli Organi di Movimento 2008;91:125–131. pmid:18320386
- 14. Sloan FA, Mergenhagen PM, Burfield WB, Bovbjerg RR, Hassan M. Medical Malpractice Experience of Physicians: Predictable or Haphazard? Journal of the American Medical Association 1989;262:3291–3297.
- 15. Italian Ministry of Health. Rapporto annuale sull’attività di ricovero ospedaliero—Dati SDO 2012. Website: http://www.salute.gov.it/portale/temi/p2_6.jsp?lingua=italiano&id=1237&area=ricoveriOspedalieri&menu=vuoto.
- 16. Fattore G, Torbica A. Inpatient reimbursement system in Italy: how do tariffs relate to costs? Health Care Management Science 2006;9:251–258. pmid:17016931
- 17. Fetter RB, Shin Y, Freeman JL, Averill RF, Thompson JD. Case mix definition by diagnosis related groups. Medical Care 1980;18:1–53.
- 18. Istituto Italiano di Statistica. Consumer Price Index. Website: www.istat.it/it/prezzi.
- 19. http://tinyurl.com/malpractice1 (in Italian).
- 20. http://tinyurl.com/malpractice2 (in Italian).
- 21. Lindgren G. Model process in non-linear prediction, with application to detection and alarm. Annals of Probability 1980;8:775–792.
- 22. Cirillo P, Hüsler J, Muliere P. Alarm Systems and Catastrophes from a Diverse Point of View. Methodology and Computing in Applied Probability 2013;15:821–839.
- 23. Craig CC. A new exposition and chart for the Pearson system of frequency curves. Annals of Mathematical Statistics 1936;7:16–28.
- 24. Johnson NL, Kotz S. Continuous Univariate Distributions 1–2. New York: Wiley; 1970.
- 25. Vargo E, Pasupathy R, Leemis LM. Moment-Ratio Diagrams for Univariate Distributions. Journal of Quality Technology 2010;42:276–286.
- 26. Cirillo P. Are your data really Pareto distributed? Physica A: Statistical Mechanics and its Applications 2013;392:5947–5962.
- 27. Kleiber C, Kotz S. Statistical Size Distribution in Economics and Actuarial Sciences. New York: Wiley; 2003.
- 28. van der Wijk J. Inkomens- en Vermogensverdeling. Publication of the Nederlandsch Economisch Instituut 1939; 26. Dutch.
- 29. Embrechts P, Klüppelberg C, Mikosch T. Modelling Extremal Events. Berlin: Springer; 2003.
- 30. Couturier DL, Victoria-Feser MP. Zero-inflated truncated generalized Pareto distribution for the analysis of radio audience data. The Annals of Applied Statistics 2010;4:1824–1846.
- 31. Lawless JF. Regression methods for Poisson process data. Journal of the American Statistical Association 1987;82:808–815.
- 32. Lee L, Lee K. Some results on inference from the Weibull process. Technometrics 1978; 20:41–45.