Additive integer-valued data envelopment analysis with missing data: A multi-criteria evaluation approach

Traditional data envelopment analysis (DEA) models assume that all input and output data are available. However, missing data is a common problem in data analysis. Although several scholars have developed techniques to conduct DEA with missing data, these techniques have some disadvantages. A multi-criteria evaluation approach is proposed to measure the efficiency of decision making units (DMUs) with missing data. In this approach, analysts first estimate the upper and lower bounds of DMUs’ efficiency using the proposed I-addIDEA-U models (interval additive integer-valued DEA models with undesirable outputs), which can handle integer-valued variables and undesirable outputs. Then, DMUs’ “relative” efficiency is evaluated using the proposed “Halo + Hot deck” DEA method (if there is no correlation between variables) or regression DEA techniques (if there is a correlation between variables). Finally, the multi-index comprehensive evaluation method is applied to determine which scenario (the lower bound of efficiency, the “relative” efficiency, or the upper bound of efficiency) should be selected. A case study shows that the proposed multi-criteria evaluation approach is more effective than traditional approaches such as the mean imputation DEA method, the deletion DEA method, and the dummy entries DEA method.


Introduction
Traditional data envelopment analysis (DEA) models assume that all input and output data are available [1,2]. If the data related to some vital variables of decision making units (DMUs) are missing, traditional DEA models cannot be applied to measure the performance of these DMUs [3,4]. However, missing data is a common problem in data analysis [5].
To deal with the problem of missing data, many methods have been proposed, e.g., deletion, imputation, and multiple imputation [6,7]. (1) The deletion methods (deleting all variables with missing data or all units with missing data) are easy to implement, but they may lead to biased estimates [8]. (2) The imputation methods mainly include mean imputation, Hot deck imputation, and regression imputation [9,10]. In mean imputation, missing data are replaced by the mean of the available data. It is simple, but it reduces the variability in the dataset [11]. In the Hot deck imputation method, missing data are replaced with the available values from a "similar" unit. Hot deck imputation is an effective method and has been widely used in practice [12,13]. Regression imputation is also widely used: missing data are replaced with values obtained from regression techniques, e.g., linear regression, logistic regression, polynomial regression, Probit regression, and Tobit regression [14,15]. (3) Multiple imputation is another attractive method, which has been regarded as more accurate and less biased [16]. In multiple imputation, missing data are imputed based on the distributions and variability of other data elements in the sample [17]. (4) There are also other methods for dealing with missing data, e.g., maximum likelihood [18,19], Bayesian methods [20,21], and expectation maximization [22,23].
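To make the contrast between mean imputation and Hot deck imputation concrete, the following Python sketch fills a single variable stored as a list, with `None` marking missing entries. The nearest-neighbour donor rule based on one auxiliary variable is only one simple way of choosing a "similar" unit and is an assumption of this sketch; the function names and data are hypothetical.

```python
from statistics import mean


def mean_impute(values):
    """Replace every missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def hot_deck_impute(values, auxiliary):
    """Replace every missing entry with the observed value of a donor unit.

    The donor is the observed unit whose auxiliary value is closest to that of
    the unit with the missing entry (a simple nearest-neighbour hot deck).
    """
    donors = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            donor = min(donors, key=lambda d: abs(auxiliary[d] - auxiliary[i]))
            filled[i] = values[donor]
    return filled


# Hypothetical data: the second unit's value is missing.
values = [4.0, None, 10.0, 6.0]
auxiliary = [1.0, 2.9, 3.0, 1.5]
print(mean_impute(values))                 # the mean of 4, 10 and 6 fills the gap
print(hot_deck_impute(values, auxiliary))  # the donor is the third unit
```

In this example, mean imputation fills the gap with about 6.67, pulling it toward the centre of the observed data, whereas the Hot deck copies 10.0 from the unit whose auxiliary value is closest.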
Several scholars have researched DEA with missing data in different ways. O'Neal et al. applied the deletion method and proposed DEA models (the deletion DEA) to measure DEA efficiency, but this approach is problematic because deleting DMUs may change the other DMUs' relative efficiency [24]. Kuosmanen used dummy entries (zero for output variables and sufficiently large numbers for input variables) to reduce the effects of DMUs with missing data on the relative efficiency of the other DMUs [25]. Gardijan and Lukač applied the dummy entries method and proposed DEA models (the dummy entries DEA) to measure the efficiency of the food and drink industry [26]. The interval DEA approach is another widely used method, in which missing data are replaced with a lower bound and an upper bound so that the lower and upper bounds of efficiency can be evaluated [27][28][29]. Kao and Liu developed a fuzzy DEA approach that allows analysts to use the available data to evaluate membership functions of fuzzy efficiency [30]. The fuzzy DEA approach is similar to the interval DEA approach; the difference is that the fuzzy DEA approach is based on fuzzy set theory, while the interval DEA approach uses deterministic techniques [31][32][33][34][35][36]. Zha et al. developed a Halo DEA approach (the Halo effect is a psychological term) to impute missing data [37]. Chen et al. presented a multiple linear regression analysis DEA approach (regression DEA) [38].
However, the above-mentioned techniques have several disadvantages. First, they use simple imputation methods or deletion methods to handle missing data, which may lead to erroneous results. Second, while they modify basic radial DEA models to measure the efficiency of DMUs with missing data, they cannot deal with integer-valued variables or undesirable outputs. If decision makers simply round the DEA solutions to the nearest integers, the results may be wrong [39][40][41][42]. Integer-valued DEA models have attracted researchers because, in many cases, inputs and outputs can only take integer values. Lozano and Villa [39], Du et al. [40], Ajirlo et al. [41], Kordrostami et al. [42], Ren et al. [43], and other scholars have applied integer-valued DEA models to many fields, e.g., universities, the Olympic Games, and pallet rental companies. Measuring the efficiency of DMUs with undesirable outputs is another active topic in DEA research. There are several approaches to handling undesirable outputs, e.g., the weak disposability assumption [44], the directional distance function [45,46], linear or non-linear monotonic decreasing transformations [47,48], treating undesirable outputs as inputs [49], and applying the SBM (slacks-based measure) approach to propose additive DEA models [50].
Another disadvantage of radial DEA models is that they have weaker discriminatory ability than non-radial DEA models [51,52]. Radial DEA models can only proportionally reduce inputs or increase outputs, while non-radial DEA models, e.g., the additive DEA [53], the enhanced Russell measure [54], and the slacks-based measure [55], do not need to make the assumption of proportional changes [56].
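For reference, the basic additive DEA model, in its standard textbook form with variable returns to scale, evaluates $\mathrm{DMU}_k$ by maximizing the total slack. It is written here with this paper's input and desirable-output symbols only as a sketch of the non-radial idea; it is not model (1) or model (2), which additionally handle integer-valued targets, undesirable outputs, and unit invariance.

$$
\begin{aligned}
\max_{\lambda,\, s^-,\, s^+}\quad & \sum_{j=1}^{r} s_j^- + \sum_{p=1}^{m} s_p^+ \\
\text{s.t.}\quad & \sum_{i=1}^{q} \lambda_i x_{ji} + s_j^- = x_{jk}, && j = 1, \ldots, r, \\
& \sum_{i=1}^{q} \lambda_i y_{pi} - s_p^+ = y_{pk}, && p = 1, \ldots, m, \\
& \sum_{i=1}^{q} \lambda_i = 1, \\
& \lambda_i \ge 0,\quad s_j^- \ge 0,\quad s_p^+ \ge 0 .
\end{aligned}
$$

$\mathrm{DMU}_k$ is additive-efficient exactly when all optimal slacks are zero. Because each slack can take its own value, inputs and outputs are not forced to change in the same proportion, unlike in a radial model; and because the slacks are summed in their original units, this basic objective is not unit-invariant.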
In this study, to handle missing data in DEA, a multi-criteria evaluation approach is proposed based on Hot deck imputation, regression imputation, the Halo effect, interval DEA, integer-valued DEA, additive DEA, DEA with undesirable outputs, and multi-index comprehensive evaluation. The main advantages of this approach are as follows. (1) The approach not only estimates the upper and lower bounds of DMUs' efficiency but also evaluates the "relative" efficiency of these DMUs based on the "Halo + Hot deck" DEA method (if there is no correlation between variables) or regression DEA techniques (if there is a correlation between variables). The evaluation therefore considers several scenarios rather than a single imputed value, which avoids the shortcoming of simple imputation methods mentioned above. (2) A multi-index comprehensive evaluation system, which involves many important factors related to the variables with missing data, is established to determine which scenario (the lower bound of efficiency, the "relative" efficiency, or the upper bound of efficiency) should be selected. The multi-index comprehensive evaluation method is intended to make the resulting efficiency more reliable. (3) Interval additive integer-valued DEA models with undesirable outputs are proposed. These models can be used to handle integer-valued variables and undesirable outputs.
The rest of this paper is structured as follows. The multi-criteria evaluation approach (including the interval additive integer-valued DEA models with undesirable outputs and the "Halo + Hot deck" imputation method) is presented in Section 2. In Section 3, the proposed approach is applied to the pallet rental industry, and its effectiveness is examined by analyzing error rates. Conclusions and the contributions of this paper are presented in Section 4.

Methodology
Assume that $Q$ represents a group of DMUs. Each $\mathrm{DMU}_i$ ($\mathrm{DMU}_i \in Q$, $i = 1, 2, \ldots, q$) consumes $r$ inputs $x_{ji}$ ($j = 1, 2, \ldots, r$) to produce $m$ desirable outputs $y_{pi}$ ($p = 1, 2, \ldots, m$) and $t$ undesirable outputs $z_{hi}$ ($h = 1, 2, \ldots, t$). Further assume that the data related to some DMUs' important variables are missing. The multi-criteria evaluation approach for measuring the performance of $\mathrm{DMU}_k$ ($\mathrm{DMU}_k \in Q$) is shown in Fig 1, and the corresponding algorithm is as follows.
First, analysts should replace missing data with their values under the best condition of $\mathrm{DMU}_k$, the DMU under evaluation ($\mathrm{DMU}_k \in Q$) (see Subsection 2.1, model 1). Second, analysts should replace missing data with their values under $\mathrm{DMU}_k$'s worst condition (see Subsection 2.1, model 2). Then, analysts can apply the interval additive integer-valued DEA models with undesirable outputs proposed in Subsection 2.1 to calculate the upper bound of $\mathrm{DMU}_k$'s efficiency ($\theta_k^U$) under its best condition and the lower bound ($\theta_k^L$) under its worst condition. Therefore, $\mathrm{DMU}_k$'s interval efficiency $[\theta_k^L, \theta_k^U]$ can be evaluated.
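The bound substitution in this first step can be sketched in Python. This is a minimal illustration, assuming one variable stored as a list with `None` marking $\mathrm{DMU}_k$'s missing entry, and assuming, as in the case study, that only $\mathrm{DMU}_k$ has a missing value for that variable; the treatment of other DMUs' missing data belongs to models (1) and (2) and is not reproduced. The function name and the `kind` labels are hypothetical.

```python
def fill_under_condition(column, k, kind, condition):
    """Return a copy of one variable's column with DMU k's missing entry filled.

    column:    values for all DMUs, with None marking DMU k's missing entry
    k:         index of the DMU under evaluation
    kind:      "input", "desirable" or "undesirable" (hypothetical labels)
    condition: "best" (upper bound, model 1) or "worst" (lower bound, model 2)
    """
    if kind not in ("input", "desirable", "undesirable"):
        raise ValueError("unknown variable kind")
    if column[k] is not None:
        raise ValueError("DMU k has no missing entry in this column")
    if any(v is None for i, v in enumerate(column) if i != k):
        raise ValueError("this sketch assumes only DMU k has a missing entry")
    # Bounds come from the precise data of the other DMUs.
    precise = [v for v in column if v is not None]
    low, high = min(precise), max(precise)
    # Smaller is better for inputs and undesirable outputs;
    # larger is better for desirable outputs.
    smaller_is_better = kind in ("input", "undesirable")
    if condition == "best":
        value = low if smaller_is_better else high
    elif condition == "worst":
        value = high if smaller_is_better else low
    else:
        raise ValueError("condition must be 'best' or 'worst'")
    filled = list(column)
    filled[k] = value
    return filled


# Hypothetical annual pallet loss rates (%), with DMU index 1 missing.
loss = [3.0, None, 12.0, 2.0, 5.0]
print(fill_under_condition(loss, 1, "undesirable", "best"))   # gap filled with 2.0
print(fill_under_condition(loss, 1, "undesirable", "worst"))  # gap filled with 12.0
```

Because $\mathrm{DMU}_k$'s own value is missing, the minimum and maximum are taken over the other DMUs only, so the bounds can differ from one scenario to another; this is consistent with the case study, where the bounds for DMU 5's loss rate (2 and 12) differ from those of the other DMUs (1 and 12).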
Analysts should then study the relationship between the variables with missing data and the other variables. There are several methods for correlation analysis, e.g., the scatter diagram method, Pearson's correlation coefficient, Spearman's rank correlation coefficient, and the least squares method [57]. If there is a correlation between variables, analysts should replace missing data with the values obtained from the regression imputation method and apply DEA to calculate $\mathrm{DMU}_k$'s "relative" efficiency ($\theta_k^R$). Otherwise, analysts should replace missing data with the values obtained from the "Halo + Hot deck" imputation method and apply DEA ("Halo + Hot deck" DEA) to calculate $\mathrm{DMU}_k$'s "relative" efficiency ($\theta_k^H$). The "Halo + Hot deck" imputation method is presented in Subsection 2.2. Because the regression imputation method is well understood, it is not explained in detail here. As mentioned in Section 1, there are many regression techniques, so analysts should select an appropriate one based on a detailed analysis of the variables, e.g., the type of variables and the shape of the regression line. The "relative" efficiency always lies within the interval efficiency: $\theta_k^L \le \theta_k^R \le \theta_k^U$ or $\theta_k^L \le \theta_k^H \le \theta_k^U$ (see Subsection 2.2).
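The branching in this step can be sketched as follows. The sketch computes Pearson's and Spearman's coefficients in plain Python and chooses between regression imputation and "Halo + Hot deck" imputation with the R-square rule used in the case study (an R-square above 0.8 is taken as a relationship); for a simple least-squares line with an intercept, R-square equals the squared Pearson coefficient. The function names and the default threshold are illustrative assumptions, not part of the method's formal definition.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for idx in order[start:end + 1]:
            result[idx] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


def choose_imputation(x, y, r2_threshold=0.8):
    """Pick the imputation branch from the R-square of a least-squares line.

    For simple linear regression with an intercept, R-square = r ** 2.
    """
    r = pearson(x, y)
    return "regression" if r * r > r2_threshold else "halo_hot_deck"


# Hypothetical observed pairs (another variable x, variable with gaps y).
x = [1.0, 2.0, 3.0, 4.0]
print(choose_imputation(x, [2.0, 4.0, 6.0, 8.0]))    # regression (r = 1)
print(choose_imputation(x, [1.0, -1.0, -1.0, 1.0]))  # halo_hot_deck (r = 0)
```

Spearman's coefficient is included as a complementary check for monotonic but non-linear relationships, mirroring the text's use of both coefficients.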
Step 3: Establishing a multi-index comprehensive evaluation system to finally determine DMU k 's efficiency.
Analysts should establish a multi-index comprehensive evaluation system. The indicators should be related to the variables with missing data, and the evaluation method can be qualitative or quantitative. An example is given in Section 3. Decision makers can rank all DMUs once the efficiency of every DMU has been determined.

Interval additive integer-valued DEA models with undesirable outputs
Assume that some of the inputs and desirable outputs can only take integer values. Following Du et al. [40] and Ren et al. [43], $J^{NI}$ and $J^{I}$ denote the subsets of real-valued and integer-valued inputs, respectively, while $P^{NI}$ and $P^{I}$ denote the subsets of real-valued and integer-valued desirable outputs, respectively. Hence, $x_{ji} \in J^{NI}$ ($j = 1, 2, \ldots, g$) and $x_{ji} \in J^{I}$ ($j = g+1, g+2, \ldots, r$) denote $\mathrm{DMU}_i$'s real-valued and integer-valued inputs, respectively, while $y_{pi} \in P^{NI}$ ($p = 1, 2, \ldots, o$) and $y_{pi} \in P^{I}$ ($p = o+1, o+2, \ldots, m$) denote $\mathrm{DMU}_i$'s real-valued and integer-valued desirable outputs, respectively.
Model (1) and model (2), which are interval additive integer-valued DEA models with undesirable outputs, are developed to measure the upper and lower bounds of $\mathrm{DMU}_k$'s interval efficiency, respectively. Additive DEA models are used because they are non-radial DEA models that can identify all inefficiencies [53]. To deal with undesirable outputs, the SBM approach is applied and additive DEA models are proposed [50].
To calculate the upper bound of $\mathrm{DMU}_k$'s efficiency (model 1), analysts should replace missing data with their values under $\mathrm{DMU}_k$'s best condition (as stated above). That is, $\mathrm{DMU}_k$'s missing data related to inputs, desirable outputs, and undesirable outputs are replaced with $x_{jk}^L = \min(\text{all } x_{ji} \text{ with precise data})$, $y_{pk}^U = \max(\text{all } y_{pi} \text{ with precise data})$, and $z_{hk}^L = \min(\text{all } z_{hi} \text{ with precise data})$, respectively. If there are also some DMUs ($\mathrm{DMU}_i \in Q$, $i \ne k$) with missing data besides $\mathrm{DMU}_k$, analysts should also replace their missing data related to inputs, desirable outputs, and undesirable outputs with …

In model (1), $\lambda_i$ indicates the weight for $\mathrm{DMU}_i$; $s_j^-$, $s_j^{I-}$, $s_p^+$, $s_p^{I+}$, and $s_h^-$ represent the slack variables for real-valued inputs, integer-valued inputs, real-valued desirable outputs, integer-valued desirable outputs, and undesirable outputs, respectively; $\tilde{x}_{jk}$ ($j = g+1, g+2, \ldots, r$) and $\tilde{y}_{pk}$ ($p = o+1, o+2, \ldots, m$) are the targets for integer-valued inputs and integer-valued desirable outputs, respectively. The superscripts "U" and "L" indicate the upper-bound and lower-bound values of the related variables, respectively.
To calculate the lower bound of $\mathrm{DMU}_k$'s efficiency (model 2), analysts should replace missing data with their values under $\mathrm{DMU}_k$'s worst condition (as stated above). That is, $\mathrm{DMU}_k$'s missing data related to inputs, desirable outputs, and undesirable outputs are replaced with $x_{jk}^U = \max(\text{all } x_{ji} \text{ with precise data})$, $y_{pk}^L = \min(\text{all } y_{pi} \text{ with precise data})$, and $z_{hk}^U = \max(\text{all } z_{hi} \text{ with precise data})$, respectively. If there are also some DMUs ($\mathrm{DMU}_i \in Q$, $i \ne k$) with missing data besides $\mathrm{DMU}_k$, as discussed above, analysts should also replace their missing data related to inputs, desirable outputs, and undesirable outputs with …

The mathematical notations used in model (2) are the same as those used in model (1). Unlike traditional additive DEA models, both model (1) and model (2) are unit-invariant [58]. Because model (1) and model (2) do not directly provide efficiency scores, Eqs (3) and (4) are proposed to calculate the upper and lower bounds of $\mathrm{DMU}_k$'s efficiency, respectively.
in which $\{\lambda_i^*, s_j^{-*}, s_j^{I-*}, s_p^{+*}, s_p^{I+*}, s_h^{-*}, \tilde{x}_{jk}^*, \tilde{y}_{pk}^*\}$ is the optimal solution of model (1). It always holds that $0 \le \theta_k^U \le 1$. $\theta_k^U = 1$ implies that $\mathrm{DMU}_k$ is additive-efficient under its best condition, because $\theta_k^U$ equals 1 if and only if all slack variables are equal to 0. The greater the value of $\theta_k^U$, the better the performance of $\mathrm{DMU}_k$.
in which $\{\lambda_i^*, s_j^{-*}, s_j^{I-*}, s_p^{+*}, s_p^{I+*}, s_h^{-*}, \tilde{x}_{jk}^*, \tilde{y}_{pk}^*\}$ is the optimal solution of model (2). Likewise, $0 \le \theta_k^L \le 1$, and $\theta_k^L = 1$ implies that $\mathrm{DMU}_k$ is additive-efficient under its worst condition. The greater the value of $\theta_k^L$, the better the performance of $\mathrm{DMU}_k$.

"Halo + Hot deck" imputation method
2.2.1 Halo effect.
The Halo effect is a psychological term proposed by Thorndike in 1920 [59]. It means that an individual's positive impression of a company (person, product, brand, and so on) in one area positively affects how he/she thinks of the company in other areas [60]. This idea can be applied to evaluate DMUs' relative efficiency. If $\mathrm{DMU}_k$'s relative efficiency ($\theta_k^N$) is higher than that of other DMUs when the variables with missing data are not taken into account (i.e., when the variables with missing data are deleted while measuring the performance of DMUs), it can be assumed that this DMU's relative efficiency ($\theta_k^*$) would also be higher when the variables with missing data are taken into account. Model (1) and Eq (3) (or model 2 and Eq 4) can be applied to calculate $\theta_k^N$ by deleting all the symbols related to the variables with missing data. However, the Halo effect may lead to bias. To overcome this shortcoming, we propose the multi-criteria evaluation approach (see Fig 1).

2.2.2 "Halo + Hot deck".
According to the Hot deck imputation method, as mentioned in Section 1, missing data should be replaced with the observed values from a "similar" unit. Therefore, based on the ideas of the Halo effect and Hot deck imputation, the missing data related to $\mathrm{DMU}_k$ can be replaced with the values of DMUs whose efficiency is similar to $\theta_k^N$. The "Halo + Hot deck" imputation method is as follows.
Based on the relative efficiency of all DMUs computed without the variables with missing data, two "similar" DMUs can be found: the nearest DMU whose relative efficiency is lower than $\mathrm{DMU}_k$'s and the nearest DMU whose relative efficiency is higher. The missing data of $\mathrm{DMU}_k$ can then be replaced with the average of the two "similar" DMUs' corresponding values. The missing data are not replaced with the values of the single "closest" DMU because this may lead to larger errors. Note that several DMUs may have the same efficiency score as $\mathrm{DMU}_k$; in that case, analysts can simply replace $\mathrm{DMU}_k$'s missing data with the average of these DMUs' corresponding values.
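The imputation rule just described can be sketched in Python. This is a minimal illustration under stated assumptions: `theta_n` holds every DMU's efficiency computed without the variable that has missing data, `values` holds that variable with `None` for $\mathrm{DMU}_k$, and efficiencies are compared with a small tolerance. Averaging tied DMUs at a neighbouring efficiency level first, and the fallback for a DMU ranked first or last (averaging the two nearest DMUs on the only available side), are hypothetical choices of this sketch, not rules stated in the text.

```python
from math import isclose
from statistics import mean


def _same(a, b):
    # Efficiency scores come from LP solutions, so compare with a tolerance.
    return isclose(a, b, abs_tol=1e-9)


def halo_hot_deck(theta_n, values, k):
    """Impute DMU k's missing value (values[k] is None) from "similar" DMUs."""
    others = [i for i, v in enumerate(values) if i != k and v is not None]
    target = theta_n[k]

    # DMUs with the same efficiency as DMU k: average their values.
    same = [i for i in others if _same(theta_n[i], target)]
    if same:
        return mean(values[i] for i in same)

    lower = [i for i in others if theta_n[i] < target]
    higher = [i for i in others if theta_n[i] > target]
    if lower and higher:
        # Nearest efficiency level below and above DMU k; DMUs tied at a
        # level are averaged first (an assumption of this sketch).
        lo_level = max(theta_n[i] for i in lower)
        hi_level = min(theta_n[i] for i in higher)
        lo_val = mean(values[i] for i in lower if theta_n[i] == lo_level)
        hi_val = mean(values[i] for i in higher if theta_n[i] == hi_level)
        return (lo_val + hi_val) / 2

    # Hypothetical fallback when DMU k is ranked first or last: average the
    # two nearest DMUs on the only available side.
    side = lower or higher
    nearest = sorted(side, key=lambda i: abs(theta_n[i] - target))[:2]
    return mean(values[i] for i in nearest)


theta_n = [1.0, 0.8, 0.6, 0.4, 0.9]     # hypothetical efficiencies without z
loss = [2.0, 4.0, None, 8.0, 3.0]       # hypothetical loss rates, index 2 missing
print(halo_hot_deck(theta_n, loss, 2))  # (8.0 + 4.0) / 2 = 6.0
```

Case-specific adjustments such as those reported in Subsection 3.2 (for example, bringing in an additional DMU when the two neighbours report identical values) are not modelled here.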

2.2.3 Measuring the "relative" efficiency. Model (1) and Eq (3) (or model 2 and Eq 4) can be applied to calculate the "relative" efficiency $\theta_k^H$ based on the "Halo + Hot deck" imputation method (this combination of the "Halo + Hot deck" imputation method and the DEA approach is called "Halo + Hot deck" DEA), but analysts should set $x_{jk}^L = x_{jk}^H$ as well as … There must be $\theta_k^L \le \theta_k^H \le \theta_k^U$. … The superscript R indicates that the values of the variables with missing data are obtained from regression imputation methods.

Numerical illustrations
This section applies the proposed approach to analyze the efficiency of pallet rental companies. Quantitative research on the pallet rental industry is limited because data related to this industry are not publicly available [61,62]. Therefore, an approach is needed to evaluate the performance of pallet rental companies when some important data are missing, and such research is important to the pallet rental industry. This industry also involves undesirable outputs, e.g., pallet loss, and some of its inputs are integers, which the proposed approach is able to handle. Each company uses two integer-valued inputs (employees $x_{1i}$ and pallets $x_{2i}$) to produce one real-valued desirable output (annual revenue $y_{1i}$) and one real-valued undesirable output (annual pallet loss rate $z_{1i}$); the data for these companies in 2018 are shown in Table 1 [43,62]. The data on $x_{1i}$, $x_{2i}$, and $y_{1i}$ (revenue in million U.S. dollars) are obtained from the official websites of these companies and other relevant websites, and the values of $z_{1i}$ (unit: percent) are estimated by managers in these companies. Model (1) and Eq (3) (or model 2 and Eq 4) can be applied to evaluate the efficiency of these companies using these precise data, and the resulting efficiency ($\theta_i^P$) is precise. The results are also shown in Table 1. Note that analysts should set …

Data
To apply the proposed multi-criteria evaluation approach to this case, it is assumed that the data on some DMUs' annual pallet loss rates are missing ($z_{1i}^M$). Note that $z_{1i}^M$ represents missing data, while $z_{1i}$ indicates precise data. Twelve scenarios ($l = 1, 2, \ldots, 12$) are considered. Scenario $l$ indicates that the value of DMU $l$'s annual pallet loss rate is missing. For example, Scenario 4 means that the value of DMU 4's annual pallet loss rate is missing. Then, the proposed approach can be applied to measure the efficiency of all companies ($\theta_i^*$).
The effectiveness of the proposed approach can be estimated by the error rate $\varepsilon$, calculated as $\varepsilon = \ldots$. The lower the value of $\varepsilon$, the better the approach performs.

Measuring the efficiency of pallet rental companies using the proposed multi-criteria evaluation approach
In this subsection, the proposed approach is applied to measure the efficiency of the twelve companies.
3.2.1 Interval efficiency. As stated in Section 2, analysts should first measure $\mathrm{DMU}_k$'s interval efficiency. The lower and upper bounds of DMU 5's annual pallet loss rate are 2 and 12, respectively, while the lower and upper bounds of the other DMUs' annual pallet loss rates are 1 and 12, respectively. Tables 2 and 3 show the interval efficiency resulting from the proposed interval additive integer-valued DEA models with undesirable outputs (model 1 with Eq 3, and model 2 with Eq 4).
In Tables 2 and 3, the sub-scenarios l-U, l-H, and l-L represent the efficiency of these companies under $\mathrm{DMU}_k$'s best condition, "Halo + Hot deck" condition, and worst condition, respectively. Therefore, the efficiency of $\mathrm{DMU}_k$ in the three sub-scenarios is $\theta_k^U$, $\theta_k^H$, and $\theta_k^L$, respectively. Note that the "Halo + Hot deck" DEA efficiency of $\mathrm{DMU}_k$ ($\theta_k^H$) is also shown in Tables 2 and 3 for the sake of clarity. In the tables, separate markings distinguish the efficiency of the $\mathrm{DMU}_k$ under evaluation from the efficiency of any $\mathrm{DMU}_i$ ($i \ne k$) that changes with different values of $\mathrm{DMU}_k$'s missing data. DMU 2, DMU 10, DMU 11, and DMU 12 are fully efficient because their efficiency scores equal 1 in all scenarios. All DMUs are efficient under their own best condition. The value of DMU 2's annual pallet loss rate does not affect the ranking of these companies, so analysts do not need to further evaluate the "relative" efficiency of these DMUs in Scenario 2. The values of some DMUs' annual pallet loss rates (i.e., DMU 2, DMU 5, DMU 6, …) … If the R-square is greater than 0.8, there is considered to be a relationship between variables. In the case study, all the R-squares are less than 0.8, so there is no relationship between the annual pallet loss rate and the other variables. It is worth noting that the outliers (the values of DMU 2) have been removed from the diagrams, and the results also show that there is no relationship between the annual pallet loss rate and the other variables. Pearson's correlation coefficient and Spearman's rank correlation coefficient were also applied to analyze the relationship between variables, and the results are the same. Therefore, missing data should be replaced with the values obtained from the proposed "Halo + Hot deck" imputation method.
Model (1) and Eq (3) are modified (deleting all symbols related to undesirable outputs) and applied to measure the relative efficiency ($\theta_k^N$) of all DMUs without considering the variable with missing data (the annual pallet loss rate). Table 4 shows the results. Table 5 shows the values of $z_{1i}^H$ obtained from the "Halo + Hot deck" imputation method; $z_{1i}^M$ is replaced with $z_{1i}^H$. The annual pallet loss rates of DMU 3's "similar" DMUs, i.e., DMU 6 and DMU 8, are the same, so another DMU (DMU 4) also needs to be employed. Four DMUs rank No. 1, so DMU 5's "similar" DMUs include five DMUs, i.e., DMU 2, DMU 7, DMU 10, DMU 11, and DMU 12. DMU 9 ranks 12th, so its missing data should be replaced with the average of the annual pallet loss rates of DMU 1 and DMU 8. Model (1) and Eq (3) are used to measure the "relative" efficiency $\theta_k^H$ of all companies, and the results are shown in Tables 2 and 3.
3.2.3 Establishing a multi-index comprehensive evaluation system to finally determine the efficiency of these pallet rental companies. The annual pallet loss rate can be affected by many factors. Experts who have researched the pallet rental industry for more than three years in the United States, the United Kingdom, and China were consulted. They proposed the following multi-index comprehensive evaluation system to determine the values of the annual pallet loss rate (as shown in Table 6). "Experience" indicates how long a company has operated. The longer a company has operated, the better its performance in reducing the annual pallet loss rate is expected to be. For example, if a company has operated for over 50 years, it can be regarded as the most experienced in reducing the pallet loss rate, so its score in the indicator "Experience" is 10.
"Information management technology" indicates the extent to which a pallet rental company uses MIS (basic management information system), barcodes, RFID (radio-frequency identification), PTS (pallet tracking system), and other techniques. If a company has applied all these techniques to control pallets, its score in the indicator "Information management technology" is 10, which means that the company has applied the most advanced information management technologies to reduce its pallet loss rate. "Team" represents a company's investments in human resources for reducing the pallet loss rate. "Non-professional team" means that the company has invested in human resources but does not have a professional team dedicated to reducing the annual pallet loss rate, so its score in the indicator "Team" is 5. "Process improvement" indicates the level of a company's control over its business processes. If a company uses Six Sigma (6σ), i.e., the highest level, as standard practice, its score in the indicator "Process improvement" is 10.
The group of experts was asked to score each pallet rental company based on the multi-index comprehensive evaluation system. The results are shown in Table 7. Note that these experts did not know these companies' precise annual pallet loss rates. If the score of a company is below 20 (below 50% of the total score), the company is placed under the worst condition (DMU 6, DMU 7, DMU 8, DMU 9, DMU 10, and DMU 12). If the score is between 20 and 32 (50%-80% of the total score), the company is placed under the "Halo + Hot deck" condition (DMU 3, DMU 4, and DMU 11). If the score is greater than 32 (over 80% of the total score), the company is placed under the best condition (DMU 1, DMU 2, and DMU 5). For instance, if the value of DMU 1's annual pallet loss rate is missing, its efficiency should be $\theta_1^U$, and the efficiency of the other DMUs should take the values in sub-scenario 1-U (the first row of Tables 2 and 3). Finally, analysts can rank these pallet rental companies based on the efficiency obtained from the proposed multi-criteria evaluation approach.
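The final selection rule can be sketched in a few lines of Python. This assumes a total score of 40, which is implied by 20 and 32 being 50% and 80% of the total; treating exactly 20 and exactly 32 as part of the middle band is an assumption, since the text only says "between 20 and 32". The function names are hypothetical.

```python
def condition_from_score(score):
    """Map an expert score (out of an assumed 40) to a scenario.

    Below 20 (< 50%): worst condition; 20 to 32 (50%-80%): "Halo + Hot deck"
    condition; above 32 (> 80%): best condition.
    """
    if score < 20:
        return "worst"
    if score <= 32:
        return "halo_hot_deck"
    return "best"


def final_efficiency(score, theta_l, theta_h, theta_u):
    """Select theta^L, theta^H or theta^U according to the expert score."""
    chosen = {"worst": theta_l, "halo_hot_deck": theta_h, "best": theta_u}
    return chosen[condition_from_score(score)]


# Hypothetical score and efficiency values for one DMU.
print(condition_from_score(26))               # halo_hot_deck
print(final_efficiency(35, 0.42, 0.71, 1.0))  # 1.0
```

Once every DMU's efficiency has been selected this way, ranking the companies is an ordinary sort on the chosen scores.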

Analysis
In order to examine the validity of the proposed multi-criteria evaluation approach, the results obtained from the proposed approach are compared with those obtained from other methods. Based on the proposed interval additive integer-valued DEA models with undesirable outputs, the deletion method (the deletion DEA), the dummy entries method (the dummy entries DEA), and the mean imputation method (the mean imputation DEA) are applied to measure the twelve pallet rental companies' efficiency in each scenario. The efficiency of DMUs obtained from the deletion DEA method (deleting the variable "annual pallet loss rate") is shown in Table 4. According to the dummy entries method, analysts should use sufficiently large numbers for the pallet loss rates of DMUs with missing data because the undesirable output is to be minimized. Therefore, the efficiency obtained from the dummy entries DEA method is $\theta_k^L$ (under $\mathrm{DMU}_k$'s worst condition), which is shown in Tables 2 and 3. The efficiency of DMUs obtained from the mean imputation DEA method is shown in Tables 8 and 9.
Then, the error rates of the four methods, i.e., the multi-criteria evaluation approach (MEA), the mean imputation DEA method (MIM), the deletion DEA method (DM), and the dummy entries DEA method (DEM), can be calculated using the formula given in Subsection 3.1. The results are shown in Table 10. The average error rate of the proposed multi-criteria evaluation approach is the lowest (0.0460), while that of the mean imputation DEA method is the highest (0.4405). The average error rates of the deletion DEA method and the dummy entries DEA method are 0.2551 and 0.3504, respectively. Therefore, in this case study, the proposed multi-criteria evaluation approach (the interval additive integer-valued DEA models with undesirable outputs, the "Halo + Hot deck" imputation method, and the multi-index comprehensive evaluation method) outperforms the other methods, and it can help analysts measure the efficiency of DMUs with missing data.
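The deletion and dummy-entries baselines compared above amount to data transformations applied before the DEA model is solved. A minimal Python sketch, assuming each DMU is a dict of variable values with `None` for missing data; the function names and the default `big` value are hypothetical, and the DEA solve itself is not shown.

```python
def deletion_baseline(rows, name):
    """Deletion DEA baseline: drop the variable with missing data for every DMU."""
    return [{key: v for key, v in row.items() if key != name} for row in rows]


def dummy_entries_baseline(rows, name, kind, big=1e6):
    """Dummy-entries baseline: a missing desirable output becomes 0, while a
    missing input or undesirable output becomes a large number (`big` is a
    hypothetical default)."""
    fill = 0.0 if kind == "desirable" else big
    return [{**row, name: fill if row[name] is None else row[name]} for row in rows]


# Hypothetical rows: x1 = employees, x2 = pallets, y1 = revenue, z1 = loss rate (%).
rows = [
    {"x1": 120, "x2": 50000, "y1": 35.0, "z1": None},
    {"x1": 80, "x2": 30000, "y1": 20.0, "z1": 4.0},
]
print(deletion_baseline(rows, "z1"))
print(dummy_entries_baseline(rows, "z1", "undesirable"))
```

With a large number in place of a missing undesirable output, the evaluated DMU is penalized as heavily as possible, which is why the text treats the dummy-entries result as the lower bound $\theta_k^L$.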

Conclusions
DEA, especially non-radial DEA, is a useful nonparametric technique for measuring efficiency. DEA is a "data-oriented" method, so analysts need to collect enough data. However, missing data is a common problem in data analysis. Therefore, it is necessary to develop effective methods for conducting DEA with missing data. The contributions of this paper are as follows. (1) Interval additive integer-valued DEA models with undesirable outputs are proposed, which enable analysts to handle integer-valued variables and undesirable outputs when measuring efficiency. (2) The "Halo + Hot deck" imputation method is presented to deal with missing data; it is simple and easy to apply. (3) A multi-criteria evaluation approach is proposed to measure the efficiency of DMUs with missing data based on the interval additive integer-valued DEA models with undesirable outputs, the "Halo + Hot deck" imputation method, and the multi-index comprehensive evaluation method. The proposed approach is applied to the pallet rental industry, and the case study shows that it is more effective than traditional approaches such as the mean imputation DEA method, the deletion DEA method, and the dummy entries DEA method.
However, the paper still has some limitations. (1) There are other methods for dealing with missing data, and multiple imputation has been regarded as a more accurate and less biased method. It is worth combining multiple imputation with DEA to evaluate the efficiency of DMUs with missing data. (2) In the case study, only two inputs, one desirable output, and one undesirable output were selected because very little public data about the pallet rental industry is available. In the future, more data should be collected and the performance of pallet rental companies should be measured in more detail.