A window-DEA based efficiency evaluation of the public hospital sector in Greece during the 5-year economic crisis

The main objective of this study was to apply the non-parametric method of Data Envelopment Analysis (DEA) to measure the efficiency of Greek NHS hospitals between 2009–2013. Hospitals were divided into four separate groups with common characteristics which allowed comparisons to be carried out in the context of increased homogeneity. The window-DEA method was chosen since it leads to increased discrimination on the results especially when applied to small samples and it enables year-by-year comparisons of the results. Three inputs -hospital beds, physicians and other health professionals- and three outputs—hospitalized cases, surgeries and outpatient visits- were chosen as production variables in an input-oriented 2-year window DEA model for the assessment of technical and scale efficiency as well as for the identification of returns to scale. The Malmquist productivity index together with its components (i.e. pure technical efficiency change, scale efficiency change and technological scale) were also calculated in order to analyze the sources of productivity change between the first and last year of the study period. In the context of window analysis, the study identified the individual efficiency trends together with “all-windows” best and worst performers and revealed that a high level of technical and scale efficiency was maintained over the entire 5-year period. Similarly, the relevant findings of Malmquist productivity index analysis showed that both scale and pure technical efficiency were improved in 2013 whilst technological change was found to be in favor of the two groups with the largest hospitals.


Introduction
The effects of the 2007-08 financial crisis were strongly felt in Greece in the years that followed. In 2013, the economy entered the sixth year of recession, resulting in a substantial GDP decline. The main impact of the economic crisis was on the unemployment rate which rose eighteen percentage points from 9.6% (485,000 persons) in 2009 to 27.5% (1,330,000) in 2013 [1]. The main share of jobless workers was from the private sector which amounted to 769,000 lost jobs in the years 2008-2012 compared to 89,000 in the public sector. At  In the same period, an HIV epidemic outbreak was also recorded among heroin addicts owed mainly to budget cuts that led to the cancellation of many preventive programs (exchange of needles etc.) [12][13]. However, it should be noted that although drug use increased between the years 2007-2011, the per capita alcohol consumption in the same period decreased by 2.3% [13].
As far as mortality is concerned, the relevant data published by ELSTAT [1] showed that the number of deaths increased from 108,316 in 2009 to 111,794 in 2013. It is notable that the 116,670 deaths that occurred in 2012 was the largest recorded number since 1949 [16] whilst 2,000 of them (almost one third of the additional number of deaths) could be linked to austerity. However, it should be noted that the total number of dead and injured in road accidents has followed a continuously downward trend (20,097 in 2009 vs 16,054 in 2013) [1]. The life expectancy in the same period continued an upward trend which had been recorded in the pre-crisis years. Specifically, according to ELSTAT [1], the relevant index increased from 77.7 years in 2009 to 78.3 years in 2013 for men and from 82.8 to 83.4 years for women, amounting to average annual increments of 0.27 and 0.20 years respectively.
In 2013, the GDP declined by almost one quarter compared to 2007 whilst in the same period, health care spending followed a parallel trajectory. It is noteworthy that in the years from 2005 to 2009, where GDP increased by 19.2%, the corresponding total health spending rose by 41% [17]. The positive sign of the slope was reversed after the onset of the crisis. In the years between 2009-2012, the GDP declined by 16.4% which was accompanied by a 23.1% downturn in the total health care expenditure and 24.5% reduction in the public health spending (from 23.0 to 17.7 and from 16.1 to 12.0 billion € respectively) [18].
In terms of GDP percentage, total health care expenditure was increasing until it reached 9.9% of GDP in 2009 where it started declining to reach 9.1% of GDP in 2012. In the same period the GDP percentage of public health care expenditure was reduced from 7% in 2009 to 6.2% in 2012. Within this period and in order to fulfill loan conditions which require that public health care expenditures must not exceed 6% of GDP, The Ministry of Health had to go through a wide range of reforms aiming, among others, at the following goals: to achieve greater efficiency in all NHS services and, at the same time, to achieve a radical cut in expenses.
Specifically, the reform efforts were focused on two targets. The first was to merge neighboring public hospitals, as well as hospital clinics and departments, and the second to modernize the financing mechanism used by the unified, state-owned health insurer (EOPYY), by introducing a Greek version of Diagnostic Related Groups (DRGs [19]). Regarding the first reform, with the exception of 2-3 small hospitals which were integrated into larger units, hospital mergers were never implemented in Greece due to extensive political and social pressure by respective interest groups. As for the intended plan for EOPYY to finance public hospitals via the DRG system (known as KEN-DRGs in Greece), this is pending on the unlikely event that EOPYY can solve its inherent deficit problems and find sufficient resources to fulfill its intended purpose. Hence, public hospitals continue to be funded by The Ministry of Health through the state budget, which covers salaries and all current expenses. Apart from closed budgets, two horizontal measures were adopted by the Ministry to reduce public health care expenditures. One measure was an average 35% cutback in salaries since 2009, and the other an "until further notice" suspension of all public hiring. In light of these developments, the aim of this study was to investigate longitudinal efficiency of Greek public hospitals during the crisis period.
hospitals are geographically partitioned in seven administrative regional units. Each regional unit serves as the management link -at an intermediate level-between individual hospitals and the Ministry of Health. These hospitals constitute the backbone of the Greek NHS as they provide the bulk of secondary, tertiary as well as primary health care services to the majority of the population. Table 1 reports a summary of operational characteristics of Greek NHS hospitals. As it can be seen, there was a substantial increase in the demand for health care services from public hospitals as evidenced by the relevant increments in the total numbers of hospitalized cases (32.9%) and surgeries (9.2%) during the period 2009-2013. The increased demand for health care services in the public sector was due to a shift of patients from the private sector, mainly of lower income. In the 2009-2010 period, a decrease had been recorded in the total number of admissions in private hospitals by 25-30%, owed mainly to the inability of lower income groups to afford the relative high costs [6]. In addition, increased demand for dental and obstetric services in public hospitals was also recorded, two areas that had been traditionally covered by private providers/hospitals up until then [20].
At the same time, there was a substantial gradual reduction in the average length of stay (ALoS) from 4.75 days in 2009 to 3.86 days in 2013 (i.e. -18.7%). The increased number of admissions combined with the reduced mean length of stay accounts for an overall reduction of 8% in the total number of inpatient days for the same five-year period. It is also interesting to note that there was an 8.5% reduction in the number of laboratory tests between 2009 and 2013 after an intermediate peak of 5.1% which occurred in 2011. It should also be taken into account that since the onset of the crisis, certain nongovernment organizations such as Médecins du Monde and Medecins Sans Frontieres, which had been providing health care services mainly to immigrants, started to cover additional groups of the population including the poor, the unemployed and the uninsured [20].
According to ELSTAT [1], the percentage of persons with self-declared unmet needs for medical examination or treatment (due to several reasons including financial barriers, long waiting times and traveling distances, lack of time etc.) increased from 4.1% in 2009 to 11.2% in 2013. Taking into account that all citizens could visit almost at no charge either general practitioners or outpatient clinics in hospitals, Kentikelenis et al. [12] concluded that such reductions (during the first two years of the crisis) reflect most probably supply-side problems: hospital budgets were cut by 40%, many clinics were understaffed due to the suspension of public hiring, shortages in medical supplies were encountered rather frequently whilst patients in many cases had to bribe medical staff in order to be given priority especially in overloaded Table 1. Greek NHS data.   2009  2010  2011  2012  2013   Beds  33,693  34,917  34,577  35,020  34,092   Physicians  22,084  22,056  20,074  20,824  18,853   Other staff  64,135  68,429  62,609  61,742  57,934   Inpatient  hospitals with large queues. It has also been reported that although physician visits for issues related to chronic diseases have been largely met, this was accomplished with an increase in out-of-pocket expenditures and cuts in family budgets [21]. It is worth noting that the percentage of such illegal payments for health care services in Greece has been estimated to be more than 20% of the total private expenditure [6] whilst the out-of-pocket payments in Greece was one of the highest among OECD countries [2].
As it can be seen in Table 1, there was a tremendous cut in hospital expenses in the period 2009-2013. Specifically, total expenses in 2013 were cut by almost 52% since 2009. A closer look at the evolution of expenses over time reveals that the sharpest decline occurred between 2011 and 2012. In particular, almost 35 out of the 52 percentage points of the total cut took place in the transition from 2011 to 2012.
In the same period, as shown by Table 1, there was also a substantial reduction in the number of medical and other staff amounting to almost -10% and -15% respectively, reflecting the imposed suspension of recruitments in the public sector.

Efficiency assessment using Data Envelopment Analysis
Improving efficiency has become an increasingly important target for hospital managers. One of the most commonly used tools for efficiency measurement is Data Envelopment Analysis (DEA) [22]. DEA is applied to a set of homogenous units -the so-called decision making units, DMUs-and seeks to maximize each unit's efficiency as it is defined through the ratio of weighted sum of outputs over weighted sum of inputs. The concept behind the method is that it allows each unit to weigh production inputs and outputs in a way so as to achieve the maximum possible efficiency compared to the other units in the sample. Put differently, each unit is allowed, in essence, to consider its own production practice as best by weighting inputs and outputs in the most preferable way [23].
In mathematical terms, the afore-mentioned objective in conjunction with certain assumptions about the producible production points (i.e. efficient input-output combinations permitted by the technology) leads to the formulation of a fractional programming problem. The solution of an equivalent linear program identifies a set of units that are deemed as efficient and all other units are deemed as inefficient. Determination of fully efficient units (also known as best practice units) enables the construction of a piece-wise linear frontier, the so-called "best practice frontier" which isolates potentially efficient units (all points on the frontier) from inefficient ones (all productively attainable points surrounded -enveloped-by the frontier). All units residing on the frontier can be thought of as either the ones that produce a certain level of output using the lowest allowed amount of inputs or the ones than produce the highest attainable level of output using a certain amount of inputs. Thus, all these technically efficient units on the frontier are assigned an efficiency score of 1 (100%) whereas technically inefficient ones are assigned a positive score less than 1 (less than 100%). The percentage score of an inefficient unit can be derived in one of two alternative ways referred to as input-orientated and output-orientated efficiency scores. In input-orientation the score represents the maximum allowed equiproportionate (i.e. radial) reduction of its inputs that is still capable of producing the same level of output. Accordingly, in output-orientation the score reflects the maximum equiproportionate (radial) expansion of its outputs that can be produced using the same level of inputs. In geometric terms this can be thought of as the maximum allowed contraction or required expansion of the unit's input/output position ray until the unit has reached the efficient frontier.
It is evident that DEA is a non-parametric method which is based solely on the observed input-output combinations of the units in the sample without any assumptions concerning the form of the production function. The term Data Envelopment Analysis (DEA) was introduced by Charnes et al. [24] in their paper in 1978 which was based on the previous relevant research of Farell (1957) [25]. The method has been used in numerous sectors, including healthcare in which its first applications date back to the1980s [26][27]. An advantage of the method is its ability to handle multiple inputs as well as multiple outputs, without the requirement of a common denominator of reference.
Formally, given a set of n DMUs (DMU r , r = 1,2, . . .p, . . .n) each one consuming m inputs (x 1r , x 2r , . . ., x mr ) to produce s outputs (y 1r , y 2r , . . ., y sr ), the input oriented efficiency y Ã p of unit p, is given by the solution of the following linear programming problem: j¼1 u jp y jp À w p ; p 2 f1; 2; . . . ; ng: Where ε is a non-Archimedean infinitesimal value for forestalling weights u jp , v ip to be zeroed. The formulation stated above refers to the BCC model [28] which assumes variable returns to scale. Omitting w p in the above formulation the linear program is converted to what is known as CCR model [24] which assumes constant returns to scale. When the BCC model is applied, the sign of w p that comes out from the solution of the linear program identifies the nature of returns to scale.

Window-DEA analysis
A drawback of DEA is that efficiency measures are defined relative to the best practice frontier of the sample under examination and consequently DMUs deemed as efficient are efficient only in relation to others in the particular sample [29]. Therefore, it is not meaningful in general to compare the scores between two different samples as all calculations are based on different best practice frontiers whose differences are not known. Consequently, even the efficiency comparison of the same set of units in two different time periods is questionable. In order to overcome this subtle point of DEA that hampers the comparison of efficiency scores over time (i.e. efficiency changes), one could move from the so-called contemporaneous perspective, where a unique frontier is derived for each time period, to the so-called intertemporal perspective where a single common frontier which spans the whole period is defined [30]. The basic idea within this latter framework is to regard each unit as if it were a different unit in each of the reporting periods. Thus, the performance of a unit in a particular period is compared with its own performance in other periods as well as with the performance of other units. Although, within this latter perspective, a year-to-year comparison can be carried out, one has to bear in mind that this approach implicitly assumes that there are no substantial technical changes over the entire time period. (i.e. the technological frontier is fixed). This assumption, however, cannot always be considered valid, especially when long time periods are analyzed, since production conditions may have substantially altered between distant years.
A compromise between contemporaneous and inter-temporal analyses is the so-called window analysis where DEA is applied successively on overlapping time periods of constant width (called a window). Once the window width has been specified all observations within it are viewed and examined in an inter-temporal manner referred to as locally inter-temporal analysis [31]. The method was initially proposed by Charnes et al. [32] in order to measure efficiency in cross sectional and time varying data. Furthermore, when window-DEA is applied, the number of observations taken into account is multiplied essentially by a factor equal to window's width, which is quite useful when dealing with small sample sizes as it increases the discrimination capability of the method [33]. Therefore, two factors should be reconciled when choosing window width. The window should be wide enough to incorporate the minimum number of DMUs for the required discrimination but it should also be narrow enough to ensure that technological change within it is negligible and therefore it will not allow misleading or unfair comparisons between DMUs belonging to distant apart time periods [34].

The Malmquist productivity index
The assessment of productivity change over time together with its decomposition into efficiency changes and technology changes can be carried out using the so-called Malmquist index which was first introduced by Caves et al. [45] based on an idea of Malmquist [46]. Following Fare et al. [47] the input-based adjacent Malmquist index between time periods t and t+1 is given by Where d t CRS ðx t ; y t Þ and d tþ1 CRS ðx t ; y t Þ represent the distance functions [48] of the production bundle (x t , y t ) from the CRS technology frontiers in periods t and t+1 respectively whilst d t CRS ðx tþ1 ; y tþ1 Þ and d tþ1 CRS ðx tþ1 ; y tþ1 Þ represent the corresponding distance functions for the production bundle (x t+1 , y t+1 ), i.e.
for τ = {t, t +1} and φ = {t, t +1}and are equal to the reciprocal of the Farell measures of technical efficiency.
The MPI index defined above can equivalently be written as The first term is equal to the Farell technical efficiency measure at period t divided by the Farell technical efficiency measure at period t+1 and therefore reflects the efficiency change component in productivity change. In other words, the term indicates whether the hospital has moved closer to the CRS-frontier (i.e. catching-up to the frontier). The second term is equal to the geometric mean of the shifts in the CRS technology observed at the production bundles (x t , y t ) and x t+1 , y t+1 respectively and therefore reflects the technological change component in productivity change (i.e. shift in the frontier).
The component of technical efficiency change can be further decomposed into pure technical efficiency change and scale efficiency change. Assuming the variable returns to scale technologies T t VRS and T tþ1 VRS for the two-time periods τ = {t, t+1}, i.e.
and the corresponding distance functions the MPI can equivalently be written as Thus, the Malmquist productivity index can be thought of as the product of three terms representing the changes attributed to pure technical efficiency, scale efficiency and technology. Taking the reciprocal of the indices calculated above [47], values greater than unity are meant to indicate progress whilst values smaller than unity indicate regress. It should be noted however that technological change can be calculated in a different way using the so-called base period Malmquist index introduced by Berg et al. [49].
A review of the literature in the health sector in Greece shows that the method has been employed for the assessment of productivity change in hospitals [50][51], hospital clinics [52][53] and dialysis facilities [54].

Bias correction with bootstrapping
One of the disadvantages of DEA is that statistical inference is difficult to be applied due to the implied assumption of the method that the whole distance of a DMU from the efficient frontier reflects solely its inefficiency. In reality, this distance reflects inefficiency as well as sampling variability and noise because input and output data are normally subject to errors. Furthermore, given the assumption that noise does not exist, the estimated empirical technology can only be a subset of the true but unknown technology, and therefore DEA scores will be upwards biased [55]. In order to overcome this shortcoming of biased DEA scores due to sampling variability Simar and Wilson [56] proposed a methodology, which is based on bootstrap techniques [57] and allows determining the statistical properties of DEA estimators.
In the present study, their bootstrapping approach [58] was applied in order to calculate the bias-corrected Malmquist indices together with estimations of their confidence intervals.
Thus, the method allows for examination whether increases or decreases in productivity are significant in a statistical sense. In other words, the method allows for the conclusion of whether a result indicates real progress/regress or is a coincidence due to sampling variation.

Sample
The study examined the efficiency of a sample of 107 Greek NHS hospitals over the five-year period, 2009-2013. Twenty-four out of the 131 hospitals of the Greek NHS were excluded due to the idiosyncratic nature of health care services they provide [59][60][61]. More specifically, nine psychiatric, four anticancer/tumor, two dermatological, one maternity, one ophthalmological, one special diseases, one thoracic diseases, one pathological and four pediatric hospitals were excluded from the analysis in order to increase homogeneity of the remaining units. Going one step further towards increasing homogeneity and to permit their comparability, these 107 hospitals were additionally classified into four groups according to their size and the mixture/ range of services they provide [59,[62][63]. Thus, under this classification the four groups A, B, C, D were assigned with N A = 21, N B = 33, N C = 30 and N D = 23 members respectively with corresponding sizes spanning within the ranges of <85, 85-190, 190-400, >400 beds. At this point, it should be noted that three hospitals could not be classified as belonging always to the same group throughout the whole five-year period of the study due to reductions/augmentations in their bed capacity. In order to maintain exactly the same members within each group, those three hospitals were considered to belong to the group into which they were assigned most of the time (i.e. 3 or 4 out of the 5 years).
All relevant data were collected from sources of Ministry of Health and Welfare. The data were given in (annual) summarized form and therefore no medical records or any other information concerning patients/caregivers/staff were used in the study. The study was also approved by the Ethical Committee of the Hellenic Open University

Model specification
In DEA, the set of variables (inputs and outputs) that need to be included in the model should meet the following criteria: inputs should cover the full range of resources used; outputs should capture all activity levels and performance measures; furthermore, both, input and output variables, should constitute a set of factors common to all units under evaluation [33].
Thus, in accordance with published research [60,[64][65][66][67] the actual variables selected for this study are among some of the most commonly used inputs and outputs affecting hospital efficiency. On the input side, labor and capital inputs were aggregated as follows: labor inputs were measured in terms of absolute numbers of staff (in full time equivalents-FTE) classified separately as physicians and other hospital employees. The separation was considered necessary since the former are fundamentally different as they enjoy the primary "decision rights" for patient care. Physicians are the most dominant and influential components in the entire production process, with little or no interference from management and it has been reported [68] that their decisions, directly or indirectly, may eventually account for as much as 80-90% of the total health care expenditure in any system. Capital input, as in most relevant studies [60,[64][65][66][67], was proxied by the number of hospital beds assuming that invested capital per bed is similar throughout the hospitals in the sample. This assumption, in the present study, is expected to be valid considering the high degree of homogeneity of the four groups in the whole sample.
As regards outputs, patients' health gain is the ultimate measure of output against which hospital activity should be assessed. Since practical difficulties limit this outcomes approach [69], output is usually measured as an array of health care services that supposedly improve patients' health. Thus, in this study the multiplicity of hospital services was aggregated into three main outputs: inpatient cases, surgeries and outpatient visits. As far as inpatient services are concerned the number of cases was chosen rather than the (alternative choice of) number of inpatient days in order to avoid distortions imposed by variations in average length of stay due to higher/lower occupancy rates [43]. Furthermore, surgeries were counted separately as they constitute a fundamentally different part of inpatient services (compared to those of general medical) which usually account for a substantial volume of the total inpatient work. Thus, the total number of surgeries was included as a separate output variable in order to capture the workload of this special inpatient component. Finally, the number of outpatient visits was included as the third output variable representing the volume of outpatient services of the hospitals.
All DEA models in this study were formulated as input-oriented because of the public character of the hospitals which implies that hospital managers cannot seek for output increment but for reduced input usage instead [70].
Finally, a 2-year window width was chosen and therefore four overlapping windows were analyzed over the 5-year study period. The narrow window width was decided since it provides the minimum common ground that allows the year-to-year efficiency comparisons without the possible distortion that would have been imposed if a wider one had been chosen. At the same time, the 2-year width is considered sufficiently large -as far as the discrimination of the method when applied on small groups is concerned-since, it doubles the number of DMUs in each window. It is evident that the choice of a wider window could not be justified -especially within the years of the financial crisis of the country that are characterized by substantial changes in the health care sector affecting, among others, the applied technology. Thus, a wider window would most likely have led to unfair or non-realistic comparisons among hospitals in distant apart years.

Empirical results
Descriptive statistics for input and output variables for the four groups of hospitals over the four 2-year windows are presented in Table 2.
Average year-specific and window-specific VRS efficiency scores are presented in Table 3 which has been constructed in the following way [71]. Columns represent years and rows represent windows. Thus, the intersection of a row with a column represents a value in the context of a year within a specific window. It is clear that a hospital can have different efficiency scores for the same year in the context of different windows. More specifically, for a 2-year width window, each year will participate twice in two adjacent windows (excluding the first and last years of the study period which participate only once). Consequently, when reading the table vertically (following the so-called column view) one can see possible efficiency alterations for a hospital in the same year measured against the efficient frontiers of the two different windows it participates. Thus, any efficiency difference, in essence, reflects the impact on the efficient frontier due to changing half of the units (by adding a later year and removing the earliest one from the window) and therefore vertical fluctuation provides an indication for the stability of efficiency results for each year across the two different windows it participates. According to Cooper et al. [23], a hospital that is efficient in one year regardless of the window is said to be stable in its efficiency rating.
On the other hand, when reading the table horizontally one can see how the efficiency of a hospital changes from one year to another within the same window. Thus, these row views, in essence, make it possible to determine efficiency trends (i.e. whether a hospital exhibits    improving, steady or deteriorating efficiency in the second year of a window against to the corresponding one in the first year). As noted in [72] the observation of "stability" and "trend" in window analysis reflects simultaneously both the absolute performance of a hospital over time and the relative performance of that hospital in comparison to the others in the sample. Average values of row and column values, for each section, are presented in the bottom row (labeled "common year") and in the rightmost column (labeled "entire window") respectively. The results in Table 3, for average year-specific and window-specific efficiency scores suggest that there is considerable room for improvement. More specifically, the lowest and highest In addition, the VRS efficiency scores for groups A and B show the highest and lowest spread respectively, as evidenced by their corresponding standard deviations (and minimum values not presented in the table). As it can also be seen from the Table 3, the window-specific efficiency scores appear to have an increasing trend until window 2011-12 after which they are reversed and start to deteriorate.
Efficiency trends within windows, on individual hospital basis, are shown in Table 4 which has been constructed in the following way: There are four columns per group corresponding to the four windows. Three symbols "%", "&" and "$", are used to denote improvement, deterioration or steadiness of a hospital's efficiency respectively, when going from the first to the second year within the same window. In addition, the symbol "+" is used in conjunction with these three symbols to denote a fully efficient unit in the implied year on the left or right hand side of the arrow. It is clear that only the three combinations "%+", "+&", and "+$+" may arise. The first one, "%+", will be used when efficiency improvement starts from a non-efficient unit and leads to a fully efficient unit. Symmetrically, the second one, "+&", will be used when efficiency deterioration starts from a fully efficient unit and leads to a non-efficient unit. Finally, the third one, "+$+", will be used to denote the        transition from a fully efficient unit to an also fully efficient unit. An extra summary column to the right of the four window-specific columns (labeled "inter-temporal trend") is used in order to designate a constant inter-temporal (all-windows) behavior of a unit whenever there is one. Thus, if the same trend symbol appears for a hospital in all four windows the corresponding symbol (%, &, or $) is placed in this summary column. In addition, the extra symbol " p " is placed in this column whenever a hospital manages to be fully efficient in at least one year within every window. Therefore, it is evident that the combined presence of the " p " together with the symbol $ (i.e. $ p ) indicates a fully efficient hospital throughout all years within all windows. Similarly, the presence of symbol " p " either sole or accompanying the symbols % and &, indicates an (inter-temporal) almost fully efficient unit which we shall call a "semi-efficient" unit. Finally, at the bottom of the table, a summary of column totals shows the number of occurrences of each trend symbol. It is worth noting that the total number of occurrences of the "+" symbol on the left (or right) side of a trend arrow coincides with the total number of fully efficient hospitals in the year implied on the corresponding side of the arrow. For example, as can be seen in summary section of group B in Table 4, the symbol "+" appears 14 times on the right of trend arrows (9 x "%+" and 5 x "+$+") in window (column) 2010-11. There are, therefore, 14 VRS fully efficient hospitals (within window 2010-11) in the year which is implied on the right of the particular trend arrow which, in this case, is 2011. Similarly, there are 7 (5 x "+$+" and 2 x "+&") fully efficient hospitals in year 2010 within window 2010-11. Furthermore, it is evident that 5 hospitals (5 x "+$+") managed to be fully efficient in both years within window 2010-11. Some interesting points emerge from a careful examination of Table 4. In terms of interwindow efficient units, as can be seen from the contents of the summary column for each group, the following apply: There are nine semi-efficient and two inter-efficient hospitals in group A. In group B, there is only one inter-efficient hospital and three semi-efficient ones. This is also the group with the lowest number of inter-temporal efficient units (i.e. semi-efficient plus inter-efficient). Group C has five semi-efficient hospitals and two inter-efficient ones while in group D there are seven semi-efficient and two inter-efficient units. In terms of inter-window trend (i.e. constantly upward/downward units) there are only three hospitals that appear to be trending always upward. These are hospitals B23, C15 and D23. It should be noticed also that the two smaller groups (A and D) are the ones with the highest number of inter-window efficient units. Table 3 summarizes also the results of average year-specific and window-specific scale efficiency. As can be seen, year-specific scale scores are relatively high and lie within the ranges  Fig. 1 depict graphically the average scale efficiency trend form year-to-year. It is evident that there is a downward dominating tendency for all groups in window 2009-10 (all groups deteriorate slightly, with the exception of group B that shows an inappreciable increment). In the following window, 2010-11, all groups appear to have a sharp upward trend which is followed by a more or less downward trend in window 2011-2012. More precisely one can see that groups B and D in window 2011-12 show a relatively sharp deterioration whilst groups A and C seem to remain almost constant.
Finally, in the last window 2012-13, although there is no clear common trend for all groups it is evident that there is an upward tendency since, with the exception of group B, all other groups show a clear improvement in their average scale efficiency scores when going from year 2012 to 2013. The sixth section of Table 3 presents the number of scale efficient hospitals (CRS). As can be seen the maximum number of scale efficient hospitals is achieved (on average) in the two instances of year 2011 for all groups (with the slight exception of group A). Finally, as far as scale inefficiency is concerned, one should be able to distinguish whether it is attributed to increasing or decreasing returns to scale. The last section in Table 3 summarizes the detailed findings concerning the nature of scale returns for each hospital. The contents of this section show that, with the exception of group A, for which relevant findings are almost equally divided, there is a clear dominance of hospitals operating at increasing returns to scale (IRS). Table 5 presents the (geometric) average change of the estimations of the input-based Malmquist productivity index together with its decomposition into the components of efficiency change "Eff" (i.e. catching-up to the frontier) and technical change "Tech" (i.e. shift in the frontier). The index of the efficiency change component "Eff" is further decomposed into its two constituting indices of pure technical efficiency change "Pure.eff" and scale efficiency change "Scale". The results are presented along with their 95% confidence intervals that were calculated applying the bias-corrected and accelerated (BCα) bootstrap methodology introduced by Efron and Tibshirani [57]. Since Malmquist indices are given in geometric rather than arithmetic means, the methodology was applied in the way described by Atkinson and Wilson [73] where the bootstrap procedure is initially used to calculate the confidence intervals for the arithmetic mean of the log(index) values and subsequently take their exponential values in order to derive the confidence intervals for the geometric mean.
As can be seen from the empirical results of the Table 5, all groups of hospitals achieved statistically significant (at the 5% level) increases in productivity between 2009 and 2013 with the exception of group B that is shown to exhibit a (non-statistically significant) slight decrease. It is interesting to note that the larger hospital set (group D) is shown to exhibit the greater (statistically significant) productivity change by 26.8%. However, when the total productivity change is decomposed into efficiency and technical change, it appears clearly that the growth in productivity is primarily attributed (20.2%) to a progressive shift of the efficient frontier and secondarily (5.5%) to efficiency improvement. The results indicate that the dominance of the technology versus efficiency change is reversed for the other groups of hospitals. Specifically, in groups A and B, the contribution of efficiency change in productivity is positive by 11.8% whilst the corresponding contribution of the technology change component is negative by 2.1% and 11.1% respectively. Similarly, the statistically significant productivity improvement by 7.1% in group C is mainly due to efficiency improvement (4.7%) rather than to technology progress (2.4%). Summing up, the results in Table 5 reveal that all indices have a positive contribution (>1) the only exception being the technology change index in groups A and B where it appears to participate negatively. However, only in group B does a negative contribution lead to a marginal (non-significant) productivity decline by 0.7%.
On an individual basis, the values of the Malmquist productivity indices for groups A, B, C and D are depicted in the four diagrams depicted in Figs 1-4. The hospitals have been arranged clock-wisely in ascending order according to their MPI values (blue line) whilst the first occurrence of an MPI value >1 is denoted by the yellow dots. The MPI line is surrounded by two spirals that correspond to the bootstrap calculated for upper and lower values for the 95% confidence interval. Furthermore, the geometric MPI mean for each group is depicted as a yellow circle. From the diagrams, it is easily deducted that the number of hospitals that experienced overall productivity progress (MPI>1) for the groups A, B, C and D are 14/21

Discussion
The main objective of this study was to apply DEA to measure the efficiency of the Greek NHS hospitals during the period 2009-2013. The hospitals were categorized and allocated into four separate groups with common characteristics in order to increase their homogeneity. The DEA-Windows method was chosen for the assessment of their efficiency since (a) it leads to increased discrimination on the results especially when applied on small samples and (b) provides a means for the year-to-year comparison of the results. At the same time, the number of DMUs is doubled since, in essence, every actual hospital participates twice in each window and therefore the discrimination of the method is improved.
The findings showed that all four groups of hospitals were operating at relatively high levels of technical and scale efficiency over the whole period. In general, mean pure technical efficiency (VRS) for all groups was found to span approximately a range of values from 85.5% up to almost 94%. Mean scale efficiency was also found to lie within the range 85.9%-94.6%.
The analysis also concluded that there are only seven hospitals that can be considered allwindows best performers. Two of them belong to group A (A6, A15), one (B28) is a member of group B and two (C12, C26) are members of group C and finally two (D5, D13) belong to group D. These seven hospitals are the ones that managed to maintain their full pure technical efficiency in all eight instances of their evaluation against the four different window frontiers. This finding, therefore, suggests that hospital managers and policy makers should pay attention and examine closely and thoroughly their applied practices and technology profile since they can serve as benchmarks for the others. In addition, apart from this small number of the seven all-window best performers, the DEA-window analysis identified also a respectable number of all-window very good performers which we called "semi-efficient". These are the hospitals that managed to remain fully efficient at least once in every window. It is worth noting that when their results were scrutinized on an individual hospital basis it was found that almost all of them managed to maintain a very high VRS efficiency score in the years that they were not fully efficient. On the other end of the spectrum, the analysis also reported 33 hospitals as being consistently inefficient over all windows. These are 33 hospitals that never managed to achieve a pure technical efficiency score of 100% in any of the years and they are distributed as 6, 14, 7 and 6 in the four groups respectively. Although not all of them can be considered as all-windows worst performers the worst ones are indeed among them. From a managerial point of view, it is clear that this group of hospitals needs also to be scrutinized closely in order to identify peculiarities and possible sources of inefficiency. Hospitals A2, B14, C2 and D9 are the ones with the lowest average efficiency scores (65.8%, 73.3%, 57.7%, 62.9% respectively) over all windows. Among the factors that may explain this finding is the fact that all output/input ratios of these hospitals are among the lowest of their groups throughout all years. Furthermore, hospital A2 is shown to have the lowest occupancy rate (29%) whilst hospitals C2 and D9 exhibit almost a twofold ALoS compared to the average values of their groups. Hospital A2 is located on a rather small island and therefore the low occupancy rate is expected due to the low population. On the other hand, hospitals C2 and D9 are big hospitals located in Athens and are among those covering the more difficult and complicated cases that require longer stays.
As far as efficiency trend is concerned, the finding that emerged from window analysis is the following. Although the results do not reveal the existence of a clear and consistent intertemporal trend in any of the four groups there is, however, an indication that year 2011 constitutes a turning point in the whole 5-year period since the upward efficiency trends for all groups in window 2010-11 are turning downwards in window 2011-12. A similar finding is documented in another study [61] according to which the middle-sized hospitals (100-400 beds) of the Greek NHS exhibit efficiency improvement in the period 2009-2011.
Empirical results also showed that the hospitals under study were operating at respectable levels of scale efficiency from 85.9% up to 94.6%. Detailed results, summarized in Table 3, show that a small number of hospitals were operating at optimal size (characterized by constant returns to scale), though many others were operating close to their optimal size. Furthermore, by examining individual scale efficiency findings, it was concluded that the hospitals under study exhibit a mix of decreasing and increasing returns to scale at current levels of output. Interestingly, the pattern of scale inefficiency indicates that most of the hospitals are operating in an area of increasing returns to scale implying that they could benefit from increasing the scale of their operations. At this point two things need also to be taken into account. First, the size of each hospital should be interpreted relatively in the context of the group it belongs to and therefore it should not be seen as an absolute "big" or "small" number. Second, scale inefficiency for many hospitals account for as little as 5% or even less. This implies that the potential for their improvement through resizing is rather limited. These findings suggest that hospital managers and policy makers should primarily focus on addressing the technical inefficiency issues before examining ways for a possible restructure of their scale of operations. Besides, it should not be overlooked that technical improvement is more controllable and can be addressed in the short-term without requiring the prior change of scale.
Finally, our findings showed that technical and scale efficiencies of all groups were improved at the end of the 5-year period although these were difficult years of the financial crisis. At this point, it is interesting to examine these results against the ordinary indicators used by hospital managers and policy makers. As can be seen in Table 1 for instance, the indicators of average length of stay and occupancy rate have been improved (-18.8% and +6.8% respectively). The same applies to the indicators for medical and other staff resources per inpatient case which were increased by 55% and 47% respectively. Going one step further, we examined all nine possible indicators that can be formed by the ratios of the three outputs when combined with the three inputs (i.e. the six variables that were used in DEA model). Simple comparisons of year's 2013 values, against those of 2009, showed that eight out of nine indicators significantly improved, the only exception being the (meaningless) ratio of outpatient visits per bed, which was reduced by 10%. Thus, it can be argued that the findings of the study are in line with the whole "macro-picture" depicted in Table 1. The fact that the efficiency improvement between 2009 and 2013, as measured by DEA, is not of the extent implied by the large increment of ordinary indicators can be mainly attributed to its inherent characteristic of relativity in comparisons. The method is capable to perform only relative measurements of efficiency which as such are valid only inside the "borders" of the particular sample. For instance, the improved performance that would be expected by a plain +5% increment throughout all units' outputs, while keeping all their inputs constant, would not be captured by DEA. On the other hand, the overall augmented DEA scores in year 2013, compared with the ones five years before, indicate that eventually some inefficient hospitals in 2009 managed to operate closer to 2013's efficient frontier defined by their counterpart efficient ones regardless of the relative possible shift and final distance between the two borders which cannot be accounted for by the method. It is evident therefore that DEA findings are complementary to the ones that can (and should) be derived by other means of performance evaluation.
In this context, the productivity of the hospitals between the first (2009) and last (2013) years of the study was also assessed by means of the Malmquist productivity index so that the changes in technology (reflected by shifts in the technology frontier) can be captured as well. Thus, the overall productivity change was subsequently decomposed into technological and efficiency changes whilst the latter was further analyzed into scale and pure technical efficiency changes. The relevant findings were in line with the ones from window analysis as all efficiency related indices were found to be greater than unity meaning that both scale and pure technical efficiency were improved in 2013. As far as the technological change is concerned, the findings revealed progress for the groups of larger hospitals (B and C) and regress for the smaller (groups A and B). The group D with the largest hospitals (>400 beds) experienced the highest technological progress (+20.2%) whilst the group B with the medium to small hospitals (85-190 beds) exhibited the highest regress (-11.1%). On the other hand, the comparisons of efficiency changes showed a twofold improvement for the medium-to-small sized hospitals (groups A+B) compared to the medium-to-large ones (groups C+D). It is therefore deducted that productivity change in larger hospitals is mainly related to changes in technology, as opposed to the smaller ones for which changes in efficiency appears to have higher contribution. It could be argued that the finding is compatible with the fact that larger hospitals are usually equipped with the latest medical technological innovations and personnel and therefore their productivity growth is heavily dependent on them. A similar finding is also reported by Gannon [74] who examined the productivity growth and efficiency in the production of hospital care in Ireland from 1995 to 1998.
Summing up, it can be concluded that there has undoubtedly been a significant performance improvement due to the large increments of outputs and the corresponding simultaneous large decrements in inputs. At the same time, there was also a respectable efficiency improvement meaning that technically the "center of efficiency mass" of the whole system, viewed as a unit, was elevated positively.
The present study has some limitations which should be taken into account. First, patient severity which may influence utilization and therefore measured efficiency has not been incorporated directly in the study. The lack of relevant data did not allow adjusting outputs for variability in case mix across hospitals. This means that efficiency results may be biased since all outputs are considered equivalent across all facilities. Although it is clear that not all of them are equivalent, it can be considered that this limitation has been adequately remedied as all "special" hospitals were excluded from the study whilst the rest of them were further allocated into four fairly homogenous samples in terms of their size and mixture of services they provide (same clinics/specialties). Consequently, the four groups can be considered as consisting of hospitals with similar case mix. Second, it is worth noting that although the technical aspect of efficiency variations has been covered adequately, the equally important efficiency component that relates to quality variations has been left out in this study due to lack of appropriate and reliable data. It is clear that the establishment of a mechanism for collecting more detailed data in a centralized and systematic way will resolve the afore mentioned limitations and undoubtedly improve the completeness and quality of future efficiency studies.

Conclusion
The fact that health care expenditures represent a substantial proportion of total public expenditure places an enormous pressure to cut costs. Towards this objective in the last five years, the Ministry of Health has an ongoing program of reforms in the public hospital sector. On the other hand, cost containment in the health care sector should be carried out without compromising the quantity and quality of services produced. This is even more pertinent in Greece, given the current economic crisis. One way to compensate for reduced budgets in order to maintain or even improve the level of health care services offered is through identification and elimination of possible sources of inefficiency. It is evident, for instance, that once inefficiency has been eliminated, the saved resources could be devoted to cover other areas of the health care system (improved quality care to patients, innovative technology, staff training). Hence, the establishment of specific and thoroughly researched criteria for measuring efficiency is extremely important. In this context, the present study constitutes an inter-temporal efficiency map for the technical performance of hospital activity in the context of the Greek NHS throughout the five-year period 2009-2013. The empirical findings can be used as a useful guide for both hospital managers and policy makers in their process to improve the health care system, in favor of the true beneficiaries, which are the patients and society in general.