A novel two-phase robust portfolio selection and optimization approach under uncertainty: A case study of Tehran stock exchange

Portfolio construction is one of the most critical problems in financial markets. In this paper, a new two-phase robust portfolio selection and optimization approach is proposed. It deals with data uncertainty, increases the robustness of the investment process against uncertainty, decreases computational complexity, and provides a comprehensive assessment of stocks from different financial aspects and criteria. In the first phase of this approach, the efficiency of all candidate stocks is measured using a robust data envelopment analysis (RDEA) method. In the second phase, the amount invested in each qualified stock is determined by applying robust mean-semi variance-liquidity (RMSVL) and robust mean-absolute deviation-liquidity (RMADL) models. Finally, the proposed approach is implemented in a real case study of the Tehran stock exchange (TSE), and a sensitivity analysis of all robust models of this study is conducted. The results show that the proposed approach is effective for portfolio selection and optimization in the presence of uncertain data.


Introduction
Portfolio selection and optimization are two of the main branches of study in investment management. Extensive research has been conducted on the portfolio selection problem from different viewpoints [1][2][3]. The most important work in this area is that of Markowitz [4], who introduced the concept of diversification into the portfolio selection problem. In Markowitz's original model [4], the portfolio selection problem is formulated with only two criteria, namely risk and return. However, deciding to purchase a stock and selecting a portfolio of stocks can be more difficult, since many attributes must be considered simultaneously, such as the rate of return, the rate of liquidity, systematic risk, non-systematic risk, and financial ratios. Decision-makers (DMs) and investors can use multi-criteria decision making (MCDM) approaches to consider more than two criteria when selecting stocks [5].
Data envelopment analysis (DEA) is one of the popular and powerful MCDM approaches for this purpose. DEA estimates the relative efficiency of decision-making units (DMUs) with multiple inputs and multiple outputs [6][7][8]. In portfolio construction, DEA can measure the efficiency of stocks in order to identify good stocks and filter out bad ones. It should be noted that in classic DEA models, each DMU may choose the set of weights that shows it in the most favorable light compared with the other DMUs. Because of this flexibility in choosing weights, the measured efficiency of stocks is optimistic. Therefore, to adopt a conservative approach and resolve this issue, after the undesirable stocks are filtered out and the most desirable stocks are identified, the qualified stocks must be reevaluated in another phase to determine the amount invested in each stock.
Another point that should be considered in the proposed approach for portfolio construction is the uncertain nature of the parameters [9][10][11][12]. In the real world we face uncertain data, and embedded uncertainty is one of the most important features of financial markets. Moreover, one of the most important assumptions in DEA is that the measured data are certain. However, a small bias or deviation in the data can cause significant differences in the results, and in the worst case it can lead to infeasible solutions. Especially when the efficiencies of units are close, it is essential to develop procedures and models for ranking the stocks, and consequently for deciding the weights of the stocks in the portfolio, that can be employed under uncertainty. Robust optimization (RO) is one of the popular methodologies for dealing with uncertainty [13][14][15].
The goal of the current study is to propose a robust two-phase approach for the portfolio construction problem using data envelopment analysis and robust optimization. In the first phase, DEA models are used to evaluate and measure the efficiency of all stocks available for investment. At the end of this phase, only the stocks that pass the investor's filter qualify as candidates for investment in the second phase. In the second phase, mean-semi variance-liquidity (MSVL) and mean-absolute deviation-liquidity (MADL) models are used to decide the amount invested in each qualified stock, and the portfolio is then constructed. In each phase, uncertainty is handled with a robust optimization method. Finally, the proposed approach is implemented in a real case study of the Tehran stock exchange (TSE).
The main advantages of the proposed approach can be summarized as follows: (1) it can be applied in the presence of uncertain data; (2) the first phase reduces the computational complexity of portfolio optimization by satisfying the cardinality constraint in advance; (3) the two-phase structure, together with the treatment of uncertainty, increases the conservatism of the investment process; and (4) all candidate stocks are comprehensively assessed from different financial aspects and criteria by employing an MCDM approach.
The rest of this paper is organized as follows. The literature and research gaps are reviewed in Section 2. The nomenclature and background of the paper, including the classic portfolio models, basic DEA models, and main robust optimization approaches, are explained in Section 3. The two-phase approach for the portfolio construction problem is presented in Section 4. In Section 5, the proposed approach is implemented for a real case study of the Tehran stock exchange. All of the proposed models are examined through sensitivity analysis in Section 6. Finally, the conclusions of this study and some directions for future research are provided in Section 7.

Literature review
In this section, the literature review for robust DEA as well as robust portfolio selection and optimization will be introduced. Moreover, the literature gaps and characteristics of this study will be highlighted.

Robust data envelopment analysis
Sadjadi & Omrani [16] pioneered the robust data envelopment analysis (RDEA) model, considering uncertainty in output parameters to measure the performance of Iranian electricity distribution companies. Over the last decade, the RDEA approach has been applied increasingly to different real-world problems and case studies. A more detailed classification of the most important RDEA studies is given in Table 1 according to three characteristics: DEA model, uncertainty set, and application. The characteristics of our work are presented in the last row of Table 1.

Robust portfolio selection and optimization
Several practical models and studies address the robust portfolio selection and optimization (RPSO) problem. Ben-Tal et al. [51] first introduced a robust model for multi-stage portfolio (asset allocation) problems. Owing to the applicability and effectiveness of robust optimization in investment problems, many researchers have proposed and applied RPSO models in recent years [52][53][54]. A more detailed classification of the most important studies of robust portfolio selection and optimization is given in Table 2 according to three characteristics: investment model, uncertainty set, and research feature. The characteristics of our work are shown in the last row of Table 2.
As can be seen in the last rows of Tables 1 and 2, a new RPSO approach is proposed in this paper. This approach consists of two phases: the first applies robust data envelopment analysis models to qualify efficient stocks, and the second applies robust portfolio optimization models to construct an optimal portfolio.

Nomenclature of the paper
The indices, parameters, and decision variables are described as follows:

Classic portfolio models and risk measures
The first method for portfolio selection was proposed by Markowitz [4]. The mean-variance (MV) model for the portfolio selection problem is given as Model (1). As shown in Model (1), variance is used as the risk measure of the portfolio. Note that variance, as a risk measure for portfolio selection, penalizes returns both above and below the expected return. Markowitz [91] suggested semi variance (SV) as a downside risk measure that quantifies only the deviations of returns below the expected return. The semi variance risk measure is defined as Eq (2). To solve the mean-variance model, DMs need the covariance matrix, which is difficult to estimate from real-world data; with the mean-semi variance (MSV) model, the covariance matrix is not required, although the joint distribution of the stocks' returns is needed.
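Model (1) and Eq (2) are not reproduced in this text. For orientation, a standard textbook statement of the mean-variance model and of the sample semi variance is sketched below. The symbols ($w_j$ for the weight of stock $j$, $\bar{r}_j$ for its expected return, $\sigma_{ij}$ for covariances, $\rho$ for the required return, $r_{jt}$ for the return of stock $j$ in period $t$, and $T$ for the number of periods) are assumptions, and the paper's own formulations may differ in detail.

```latex
% Mean-variance model (standard form; notation assumed)
\begin{aligned}
\min_{w}\quad & \sum_{i=1}^{n}\sum_{j=1}^{n} \sigma_{ij}\, w_i w_j \\
\text{s.t.}\quad & \sum_{j=1}^{n} \bar{r}_j w_j \ge \rho, \qquad
                   \sum_{j=1}^{n} w_j = 1, \qquad
                   w_j \ge 0,\ j = 1,\dots,n
\end{aligned}

% Sample semi variance of the portfolio return R_t = \sum_j r_{jt} w_j around its mean
\mathrm{SV}(w) = \frac{1}{T}\sum_{t=1}^{T}
  \Big[\min\Big(0,\ \sum_{j=1}^{n} \big(r_{jt} - \bar{r}_j\big) w_j\Big)\Big]^{2}
```

Only the shortfall inside $\min(0,\cdot)$ contributes to SV, which is what makes it a downside measure, whereas the variance in the MV objective counts deviations in both directions.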
Since Markowitz's original model [4] is a quadratic programming (QP) model that is difficult to solve for large data sets, Konno & Yamazaki [92] proposed absolute deviation (AD) instead of variance as the portfolio risk measure. The mean-absolute deviation (MAD) model is a linear programming (LP) model and therefore reduces computational time. Absolute deviation is defined as Eq (3). This risk measure quantifies the deviation from the expected return, and the MAD model does not require the covariance matrix.
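The following minimal Python sketch computes both the downside semi variance and the mean absolute deviation of a fixed portfolio directly from a table of historical period returns, which shows why neither measure needs a covariance matrix. It assumes the sample (1/T) forms with the portfolio's own sample mean as the reference point; the function names and the return data are hypothetical.

```python
from statistics import fmean


def portfolio_path(returns, weights):
    """Portfolio return in each period: R_t = sum_j r_jt * w_j."""
    return [sum(r * w for r, w in zip(period, weights)) for period in returns]


def semi_variance(returns, weights):
    """Average squared shortfall of R_t below the sample mean of R_t."""
    path = portfolio_path(returns, weights)
    mu = fmean(path)
    return fmean([min(0.0, r - mu) ** 2 for r in path])


def mean_absolute_deviation(returns, weights):
    """Average absolute deviation of R_t from the sample mean of R_t."""
    path = portfolio_path(returns, weights)
    mu = fmean(path)
    return fmean([abs(r - mu) for r in path])


# Hypothetical data: 4 periods (rows) x 2 stocks (columns), equal weights
returns = [[0.02, 0.01], [-0.01, 0.00], [0.03, 0.02], [0.00, 0.01]]
weights = [0.5, 0.5]
print(semi_variance(returns, weights), mean_absolute_deviation(returns, weights))
```

Because the absolute value can be linearized with one auxiliary variable per period, the MAD objective leads to the LP formulation of Konno & Yamazaki [92].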

Data envelopment analysis
Data envelopment analysis was first proposed by Charnes et al. [93], building on Farrell's [94] idea. It is a non-parametric technique for evaluating the performance of homogeneous decision-making units and ranking them. Charnes et al. [93] proposed the first DEA model, which is based on the constant returns to scale (CRS) assumption and is called the CCR model. Banker et al. [95] then extended the CCR model to the variable returns to scale (VRS) assumption; this extension is called the BCC model. Charnes et al. [96] proposed a DEA model that simultaneously considers input minimization and output maximization, called the Additive (ADD) model. It is worth noting that the CCR and BCC models are radial models, whereas the Additive model is non-radial.
Since the CCR, BCC, and Additive models are basic and popular DEA models, this research applies the input-oriented CCR (CCR-IO), output-oriented CCR (CCR-OO), input-oriented BCC (BCC-IO), output-oriented BCC (BCC-OO), Additive with constant returns to scale (ADD-CRS), and Additive with variable returns to scale (ADD-VRS) models. The multiplier forms of the CCR-IO, CCR-OO, BCC-IO, BCC-OO, ADD-CRS, and ADD-VRS models are given in Models (4) to (9), respectively.
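Models (4) to (9) themselves are not reproduced here. As a reference point, the standard multiplier forms of the two input-oriented models are sketched below, assuming DMU $0$ is the unit under evaluation, $x_{ij}$ ($i = 1,\dots,m$) and $y_{rj}$ ($r = 1,\dots,s$) are the inputs and outputs of DMU $j$ ($j = 1,\dots,n$), and $v_i$, $u_r$ are the input and output weights. The paper's own models may, for example, add a non-Archimedean lower bound $\varepsilon$ on the weights.

```latex
% CCR-IO, multiplier form (standard; notation assumed)
\begin{aligned}
\max\quad & \sum_{r=1}^{s} u_r y_{r0} \\
\text{s.t.}\quad & \sum_{i=1}^{m} v_i x_{i0} = 1, \\
& \sum_{r=1}^{s} u_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} \le 0, \quad j = 1,\dots,n, \\
& u_r \ge 0,\ v_i \ge 0 .
\end{aligned}

% BCC-IO, multiplier form: CCR-IO plus a free variable u_0
\begin{aligned}
\max\quad & \sum_{r=1}^{s} u_r y_{r0} - u_0 \\
\text{s.t.}\quad & \sum_{i=1}^{m} v_i x_{i0} = 1, \\
& \sum_{r=1}^{s} u_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} - u_0 \le 0, \quad j = 1,\dots,n, \\
& u_r \ge 0,\ v_i \ge 0,\ u_0 \text{ free}.
\end{aligned}
```

The free variable $u_0$ is what relaxes constant returns to scale to variable returns to scale. The output-oriented forms are obtained analogously by minimizing the weighted inputs subject to $\sum_r u_r y_{r0} = 1$.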

Robust optimization
In real cases, the inputs and outputs of DEA models are generally tainted by uncertainty [97][98][99][100][101][102][103][104][105]. The imprecision of the input parameters increases when access to reliable historical data is limited. In this situation, it is important to protect the robustness of the solution obtained from the DEA model; otherwise, the efficiency scores and ranking of the DMUs concerned may become unreliable, and significant costs may consequently be imposed on different stakeholders. Robust optimization methods can be employed to prevent such undesirable outcomes [106]. A solution to a DEA model is said to be robust if it remains feasible for almost all possible values of the uncertain parameters and the corresponding ranking varies minimally over all possible values of the imprecise parameters. Here, a hard worst-case robust optimization approach is applied to cope with uncertain parameters in the DEA model [107]. This approach does not require substantial historical data and can therefore be applied to almost all real-life DEA problems. In addition, it guarantees the feasibility of the DEA solution for all possible values of the uncertain parameters in the assumed convex uncertainty set. Soyster [108], Ben-Tal & Nemirovski [109], and Bertsimas & Sim [110] presented the main and most popular robust optimization approaches for convex uncertainty sets.
To handle uncertain data with robust optimization, consider a particular constraint $a$ of a nominal model, and let $\Lambda_a$ denote the set of coefficients in constraint $a$ that are subject to uncertainty. Each entry $\tilde{a}_{ab}$, $b \in \Lambda_a$, is modeled as a symmetric and bounded random variable that takes values in $[a_{ab} - \hat{a}_{ab},\ a_{ab} + \hat{a}_{ab}]$. The center of this interval, $a_{ab}$, is the nominal value, and $\hat{a}_{ab}$ is the perturbation of the uncertain parameter $\tilde{a}_{ab}$, $b \in \Lambda_a$. The robust counterparts of constraint $a$ under the Soyster [108], Ben-Tal & Nemirovski [109], and Bertsimas & Sim [110] approaches are given as Eqs (10) to (12), respectively. It is worth mentioning that the robust optimization approach of Soyster [108] is too conservative. Ben-Tal & Nemirovski [109] proposed an approach whose conservatism can be adjusted by the parameter Ω, but its robust counterpart is a nonlinear programming (NLP) model, which can be problematic in real-world problems. Bertsimas and Sim's [110] approach can flexibly adjust the level of conservatism of the robust solutions through the parameter Γ, and its robust counterpart is a linear programming (LP) model [111][112][113][114]. Because of this flexibility and the linearity of its robust counterpart, the Bertsimas and Sim [110] approach is used in this paper to deal with uncertainty in all models. Note that the RO approaches of Soyster [108], Ben-Tal & Nemirovski [109], and Bertsimas & Sim [110] are based on "box", "box & ellipsoidal", and "box & polyhedral" uncertainty sets, respectively.
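Eqs (10) to (12) are not reproduced in this text. The sketch below states the three robust counterparts in the form in which they are usually written (following the presentation in Bertsimas & Sim [110]) for a constraint $\sum_b \tilde{a}_{ab} x_b \le c_a$. The right-hand side $c_a$ and the auxiliary variables $y$, $z$, and $p$ are assumed names, and the paper's own equations may use different notation.

```latex
% Soyster: box uncertainty, every uncertain coefficient at its worst bound
\sum_{b} a_{ab} x_b + \sum_{b \in \Lambda_a} \hat{a}_{ab} y_b \le c_a,
\qquad -y_b \le x_b \le y_b

% Ben-Tal & Nemirovski: box & ellipsoidal uncertainty, conservatism set by \Omega_a
\sum_{b} a_{ab} x_b + \sum_{b \in \Lambda_a} \hat{a}_{ab} y_{ab}
  + \Omega_a \sqrt{\textstyle\sum_{b \in \Lambda_a} \hat{a}_{ab}^{2} z_{ab}^{2}} \le c_a,
\qquad -y_{ab} \le x_b - z_{ab} \le y_{ab}

% Bertsimas & Sim: box & polyhedral uncertainty, conservatism set by \Gamma_a
\sum_{b} a_{ab} x_b + z_a \Gamma_a + \sum_{b \in \Lambda_a} p_{ab} \le c_a,
\qquad z_a + p_{ab} \ge \hat{a}_{ab} y_b,
\qquad -y_b \le x_b \le y_b,
\qquad p_{ab},\ y_b,\ z_a \ge 0
```

The square-root term makes the second counterpart nonlinear, while the third stays linear. In the third form, $\Gamma_a = |\Lambda_a|$ recovers the Soyster counterpart and $\Gamma_a = 0$ recovers the nominal constraint.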

The proposed robust approach for portfolio selection and optimization problem
In this section, the robust approach for the portfolio construction problem in financial markets is presented. The approach consists of two phases, and the steps of each phase are explained in detail in the following subsections. Fig 1 presents a schematic summary of all steps of the two-phase robust portfolio construction approach of this paper.

Phase I: Portfolio selection
In this phase, which consists of six steps, the performance of all stocks available to investors is evaluated and measured. At the end of this phase, only the stocks that pass the investor's filter qualify as candidates for investment in the second phase.
Step 1.1. Choose a Data Envelopment Analysis (DEA) model. In the first step of Phase I, the DEA models used to evaluate the stocks are chosen. In this paper, the CCR-IO, CCR-OO, BCC-IO, BCC-OO, ADD-CRS, and ADD-VRS models are selected. All DEA models used in this study are presented in Subsection 3.3.
Step 1.2. Choose financial criteria for evaluating the stocks. In the second step of Phase I, financial criteria for evaluating the stocks are chosen from different perspectives, including return, risk, profitability, liquidity, leverage, valuation, and growth. Based on the literature review, expert opinion, and the Delphi method, the inputs and outputs of the DEA models are as shown in Table 3.

PLOS ONE

Step 1.3. Choose a robust optimization approach. In the third step of Phase I, considering the strengths and weaknesses of the Soyster [108], Ben-Tal & Nemirovski [109], and Bertsimas & Sim [110] robust approaches, the Bertsimas & Sim [110] (B&S) robust approach is selected for dealing with uncertain parameters in the DEA models. The formulation of the robust counterpart in the B&S robust approach is presented in Subsection 3.4.
Step 1.4. Propose the Robust Data Envelopment Analysis (RDEA) models. In the fourth step of Phase I, the robust data envelopment analysis models are proposed; this is the most important step of the first phase. To consider the uncertainty of the input and output parameters in the DEA models with Bertsimas & Sim's [110] robust approach, all constraints must first be converted into less-than-or-equal constraints. The following paragraphs discuss, for the CCR-IO, CCR-OO, BCC-IO, and BCC-OO models in turn, how the equality constraint is converted.
The compact form (CF) of the CCR-IO model is given as Model (13). If the constraint $v x_0 = 1$ is relaxed to $v x_0 \le 1$, the optimal solution does not change.
The compact form of the CCR-OO model is given as Model (15). If the constraint $u y_0 = 1$ is relaxed to $u y_0 \ge 1$ (equivalently, $-u y_0 \le -1$), the optimal solution does not change.
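A short scaling argument shows why such relaxations leave the optimum unchanged. It is a sketch under the assumption that the multiplier forms are the standard ones (row weight vectors $u \ge 0$, $v \ge 0$; data matrices $X$ and $Y$ whose columns are the DMUs' inputs $x_j$ and outputs $y_j$), not a reproduction of the paper's own propositions.

```latex
% CCR-IO with v x_0 = 1 relaxed to v x_0 \le 1 (objective: max u y_0)
\begin{aligned}
&\text{Let } (u^*, v^*) \text{ be optimal with } u^* y_0 > 0 \text{ and } s := v^* x_0 < 1 .\\
&\text{The constraint for } j = 0 \text{ gives } 0 < u^* y_0 \le v^* x_0 = s, \text{ so } s > 0 .\\
&(\bar u, \bar v) := (u^*/s,\ v^*/s) \text{ is feasible: } \bar v x_0 = 1,\quad
  \bar u Y - \bar v X = (u^* Y - v^* X)/s \le 0,\quad \bar u, \bar v \ge 0 .\\
&\text{But } \bar u y_0 = u^* y_0 / s > u^* y_0, \text{ contradicting optimality; hence } v^* x_0 = 1 .
\end{aligned}

% CCR-OO with u y_0 = 1 relaxed to u y_0 \ge 1 (objective: min v x_0)
\begin{aligned}
&\text{If } s := u^* y_0 > 1, \text{ then } v^* x_0 \ge u^* y_0 = s > 0, \text{ and } (u^*/s,\ v^*/s)
  \text{ is feasible with objective } v^* x_0 / s < v^* x_0 .\\
&\text{Hence } u^* y_0 = 1 \text{ at every optimum.}
\end{aligned}
```

The same scaling carries over to the BCC models, with the free variable scaled by the same factor, provided the optimal efficiency is positive.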
Proposition 3. The optimal solution of Model (17) is equal to that of Model (18). Proof. Assume that the optimal solution of Model (18) … Also, in the objective function …
The compact form of the BCC-OO model is given as Model (19). If the constraint $u y_0 = 1$ is relaxed to $u y_0 \ge 1$ (equivalently, $-u y_0 \le -1$), the optimal solution does not change.
Step 1.5. Run the RDEA models for the desired Γ and Δ. In the fifth step of Phase I, the robust DEA models are run with conservatism level Γ and perturbation Δ to measure the performance of all stocks, and all stocks are then ranked. For constraint $i$ to be violated with probability at most $\delta_i$, it is sufficient to choose $\Gamma_i$ at least equal to Eq (27):
$$\Gamma_i = 1 + \Phi^{-1}(1 - \delta_i)\sqrt{n} \qquad (27)$$
where $\Phi$ is the cumulative distribution function of the standard Gaussian variable and $n$ is the number of uncertain parameters in constraint $i$.
Step 1.6. Select the top stocks from the first phase. In the sixth step of Phase I, given the cardinality constraint $\sum_j \tau_j = k$ for portfolio selection in the second phase, the top $k$ stocks that qualify to pass from the first phase to the second phase are selected. To take a conservative perspective in selecting the best stocks, the top $k$ stocks are selected based on each stock's average rank over all RDEA models, namely the RCCR-IO, RCCR-OO, RBCC-IO, RBCC-OO, RADD-CRS, and RADD-VRS models.
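Assuming Eq (27) takes the usual Bertsimas & Sim form $\Gamma_i = 1 + \Phi^{-1}(1 - \delta_i)\sqrt{n}$, which is consistent with the Γ values used later in the case study, the budget can be computed with the Python standard library as sketched below. The function name is hypothetical, and capping $\Gamma_i$ at $n$ reflects that no more than $n$ coefficients can deviate.

```python
from math import sqrt
from statistics import NormalDist


def budget_of_uncertainty(n_uncertain: int, delta: float) -> float:
    """Gamma that keeps the Bertsimas & Sim violation bound of one constraint
    at or below delta (assumed form: 1 + Phi^{-1}(1 - delta) * sqrt(n)),
    capped at n, the number of uncertain coefficients."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    gamma = 1.0 + NormalDist().inv_cdf(1.0 - delta) * sqrt(n_uncertain)
    return min(gamma, float(n_uncertain))


# 90% confidence (delta = 0.10) for the constraint sizes used in the case study
for n in (4, 5, 9, 10):
    print(n, round(budget_of_uncertainty(n, 0.10), 4))
```

For δ = 0.10 this gives roughly 3.563, 3.866, 4.845, and 5.053 for n = 4, 5, 9, and 10, which agree up to rounding at the second decimal with the values 3.56, 3.86, 4.84, and 5.05 used in the case study.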
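The selection rule of Step 1.6 can be sketched as follows. Each RDEA model is assumed to supply an integer rank per stock (1 = most efficient). The stock labels and ranks below are hypothetical, only two models are shown instead of six, and ties in average rank are broken alphabetically, which is an arbitrary choice not taken from the paper.

```python
from __future__ import annotations

from statistics import fmean


def select_top_k(ranks_by_model: dict[str, dict[str, int]], k: int) -> list[str]:
    """Return the k stocks with the lowest (best) average rank over all models."""
    models = list(ranks_by_model.values())
    stocks = set(models[0])
    if any(set(m) != stocks for m in models):
        raise ValueError("every model must rank the same set of stocks")
    average = {s: fmean(m[s] for m in models) for s in stocks}
    # Sort by average rank, then by label so that ties are resolved deterministically
    return sorted(stocks, key=lambda s: (average[s], s))[:k]


# Hypothetical ranks from two robust DEA models (1 = most efficient)
ranks = {
    "RCCR-IO": {"S1": 1, "S2": 3, "S3": 2, "S4": 4},
    "RBCC-IO": {"S1": 2, "S2": 1, "S3": 3, "S4": 4},
}
print(select_top_k(ranks, 2))
```

Averaging ranks rather than efficiency scores keeps models with different score scales (radial versus additive) on an equal footing.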

Phase II: Portfolio optimization
This phase consists of five steps in which the amount to be invested in each qualified stock is decided and the portfolio is finally created. In other words, in this phase the DM decides the portfolio weights of the stocks qualified in the first phase.
Step 2.1. Propose the portfolio optimization (PO) models for the qualified stocks. In the first step of Phase II, two portfolio optimization models that consider risk, return, and liquidity are proposed; the risk measure is semi variance in the first model and absolute deviation in the second. To account for return and liquidity, two constraints are added to each model to ensure that the investor's desired minimum expected return and desired minimum expected liquidity are achieved. In addition, to cover financial market constraints, the cardinality constraint and purchasing limitations are considered. The mean-semi variance-liquidity (MSVL) model and the mean-absolute deviation-liquidity (MADL) model are proposed as Models (29) and (30), respectively. It is worth noting that in the MSVL and MADL models, the cardinality constraint $\sum_j \tau_j = k$ for portfolio selection is satisfied by the first phase.
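Since Models (29) and (30) are not reproduced in this text, a plausible scenario-based statement of the two models is sketched below for orientation. All symbols are assumptions: $w_j$ is the weight of qualified stock $j$ ($j = 1,\dots,k$), $r_{jt}$ its return in period $t$ with mean $\bar r_j$, $\bar l_j$ its mean liquidity, $\rho$ and $\lambda$ the investor's minimum expected return and liquidity, and $\underline{w}_j$, $\overline{w}_j$ the purchasing limits. Because Phase I already fixes which $k$ stocks enter, no binary cardinality variables appear.

```latex
% MSVL (sketch): s_t is the shortfall of the portfolio return below its mean
\begin{aligned}
\min\quad & \frac{1}{T}\sum_{t=1}^{T} s_t^{2} \\
\text{s.t.}\quad & s_t \ge \sum_{j=1}^{k} (\bar r_j - r_{jt})\, w_j,\quad s_t \ge 0, \quad t = 1,\dots,T, \\
& \sum_{j=1}^{k} \bar r_j w_j \ge \rho, \qquad \sum_{j=1}^{k} \bar l_j w_j \ge \lambda, \\
& \sum_{j=1}^{k} w_j = 1, \qquad \underline{w}_j \le w_j \le \overline{w}_j,\quad j = 1,\dots,k .
\end{aligned}

% MADL (sketch): only the objective and the d_t constraints differ from MSVL
\begin{aligned}
\min\quad & \frac{1}{T}\sum_{t=1}^{T} d_t \\
\text{s.t.}\quad & d_t \ge \sum_{j=1}^{k} (r_{jt} - \bar r_j)\, w_j,\quad
  d_t \ge -\sum_{j=1}^{k} (r_{jt} - \bar r_j)\, w_j, \quad t = 1,\dots,T .
\end{aligned}
```

At the optimum, $s_t = \max\{0, \sum_j (\bar r_j - r_{jt}) w_j\}$ and $d_t = |\sum_j (r_{jt} - \bar r_j) w_j|$, so the objectives equal the semi variance and the mean absolute deviation; the first model is a convex QP and the second an LP.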
Step 2.2. Choose a robust optimization approach. In the second step of Phase II, the Bertsimas & Sim [110] robust approach is selected for dealing with uncertain data and parameters in the MSVL and MADL models. The formulation of the robust counterpart in the B&S robust approach is presented in Subsection 3.4.
Step 2.3. Propose the robust Portfolio Optimization (RPO) models. In the third step of Phase II, which is the most important step of the second phase, the robust portfolio optimization models are proposed. Based on the B&S robust approach, the RMSVL and RMADL models are proposed as Models (30) and (31).
Step 2.4. Run the RPO models for the desired Γ and Δ. In the fourth step of Phase II, the robust portfolio optimization models are run with the desired conservatism level Γ and perturbation Δ to decide the weights of the qualified stocks obtained from the first phase. As in the fifth step of Phase I, for constraint $i$ to be violated with probability at most $\delta_i$, it is sufficient to choose $\Gamma_i$ at least equal to Eq (27).
Step 2.5. Construct the portfolio with the weights of the RPO models. In the fifth step of Phase II, the investor's desired portfolio is constructed from the weights of the top $k$ stocks in the RMSVL and RMADL models. By varying the investor's desired minimum expected return and desired minimum expected liquidity, the efficient frontier is traced.
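To illustrate how the B&S approach enters the robust portfolio models, the sketch below applies the Bertsimas & Sim counterpart to a single constraint, the minimum-expected-return constraint $\sum_j \tilde r_j w_j \ge \rho$, assuming each $\tilde r_j$ lies in $[\bar r_j - \hat r_j,\ \bar r_j + \hat r_j]$ with $\hat r_j = \Delta\,|\bar r_j|$. The proportional form of $\hat r_j$ and the auxiliary variables $z$ and $p_j$ are assumptions for illustration, not the paper's stated formulation.

```latex
\begin{aligned}
& \sum_{j=1}^{k} \bar r_j w_j - \Gamma z - \sum_{j=1}^{k} p_j \ge \rho, \\
& z + p_j \ge \hat r_j w_j, \quad j = 1,\dots,k, \\
& z \ge 0,\quad p_j \ge 0,\quad j = 1,\dots,k .
\end{aligned}
```

Because the weights are nonnegative, $|w_j| = w_j$ and the auxiliary bounds $-y_j \le w_j \le y_j$ of the general counterpart are unnecessary. The minimum-liquidity constraint can be treated in the same way, with liquidity perturbations in place of $\hat r_j$.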

Case study and numerical results
In this section, the proposed approach is applied to the portfolio construction problem in a real-world case study from the Tehran stock exchange (TSE). With a history of nearly half a century, TSE is one of the most attractive financial markets in the Middle East. The pharmaceutical industry, comprising 27 stocks, is selected, and financial data are extracted from March 2013 to March 2014. A summary of the real-world data from the pharmaceutical industry of the TSE used in this research is given in Table 4. After the data are collected, the robust CCR-IO, robust CCR-OO, robust BCC-IO, robust BCC-OO, robust ADD-CRS, and robust ADD-VRS models are run. For a desired confidence level of 90% in satisfying the constraints of the robust data envelopment analysis models, the level of conservatism Γ is set, based on Eq (27), to 3.56, 3.86, and 4.84 for constraints with 4, 5, and 9 uncertain parameters, respectively. The perturbation Δ is set to 0.05. The results of all RDEA models, presented in Models (21) to (26), are given in Table 5.

Since k is set to 10 in the cardinality constraint of the RMSVL and RMADL models, the ten stocks with the best average rank in Table 6 are selected. The set of stocks selected by the RDEA models is therefore PDRO, DLGM, THSH, TMVD, DKSR, DARO, DJBR, KIMI, ROZD, and AMIN. To run the RMSVL and RMADL models, monthly data on the return and liquidity of the selected stocks are extracted from TSE for the 12 months from March 2013 to March 2014. The real return and liquidity data of the selected stocks for these 12 periods are presented in Tables 7 and 8, respectively.
After the stocks are selected in the first phase, the robust mean-semi variance-liquidity (RMSVL) and robust mean-absolute deviation-liquidity (RMADL) models are run in the second phase. For a desired confidence level of 90% in satisfying the constraints of the RMSVL and RMADL models, the level of conservatism Γ is set, based on Eq (27), to 5.05 for a constraint with 10 uncertain parameters. The perturbation Δ is set to 0.05, the expected liquidity of the portfolio is fixed at 10.50, and the expected return of the portfolio is increased step by step. The results of the RMSVL and RMADL models, presented in Models (30) and (31), for the different expected portfolio returns are given in Tables 9 and 10. As the results show, the risk of the portfolio increases as its expected return increases. The efficient frontiers of RMSVL and RMADL are presented in Figs 2 and 3, respectively.

Sensitivity analysis
In this section, a sensitivity analysis of all the robust models is carried out for different values of Γ and Δ. The sensitivity analyses of the RCCR-IO, RCCR-OO, RBCC-IO, RBCC-OO, RADD-CRS, RADD-VRS, RMSVL, and RMADL models are presented in Tables 11-18, respectively, and the trends of their results are shown in Figs 4-11, respectively. As Tables 11-18 and Figs 4-11 show, the objective function worsens as the budget of robustness Γ increases from 0% to 100% of the uncertain parameters. Likewise, as the perturbation Δ increases from 0.01 to 0.1, the objective function becomes worse than that of the nominal problem. In both the robust MSVL and robust MADL models, the expected return and the expected liquidity of the portfolio are set to 0.013 and 14.50, respectively.
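The monotone deterioration reported above can be reproduced on a toy constraint. The sketch below evaluates the worst-case expected return of a fixed portfolio under the Bertsimas & Sim protection term, sweeping Γ as a share of the number of uncertain returns and Δ over the tested range. The returns, weights, and the proportional perturbation $\hat r_j = \Delta\,|\bar r_j|$ are hypothetical assumptions, and the portfolio is only evaluated, not re-optimized.

```python
import math


def protection(deviations: list[float], gamma: float) -> float:
    """Bertsimas & Sim protection term for one constraint.

    deviations[j] = hat_a_j * |x_j|. The adversary moves the floor(gamma)
    largest terms fully and the next largest by the fraction gamma - floor(gamma).
    """
    if not 0.0 <= gamma <= len(deviations):
        raise ValueError("gamma must lie in [0, number of uncertain coefficients]")
    ordered = sorted(deviations, reverse=True)
    full = math.floor(gamma)
    beta = sum(ordered[:full])
    if full < len(ordered):
        beta += (gamma - full) * ordered[full]
    return beta


def worst_case_return(mean_returns, weights, gamma, delta):
    """Nominal expected return minus the protection term (hat_r_j = delta * |r_j|)."""
    nominal = sum(r * w for r, w in zip(mean_returns, weights))
    deviations = [delta * abs(r) * abs(w) for r, w in zip(mean_returns, weights)]
    return nominal - protection(deviations, gamma)


# Hypothetical three-stock portfolio
mean_returns = [0.02, 0.015, 0.01]
weights = [0.5, 0.3, 0.2]
n = len(mean_returns)
for delta in (0.01, 0.05, 0.10):
    row = [worst_case_return(mean_returns, weights, share * n, delta)
           for share in (0.0, 0.5, 1.0)]
    print(delta, [round(v, 6) for v in row])
```

Each additional unit of Γ lets one more coefficient move to its bound, and a larger Δ widens every interval, so the worst case can only get worse along both axes.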

At the end of this section, the performance of the portfolios obtained from the RMSVL and RMADL models is analyzed using five popular measures: excess mean return (EMR), downside deviation (DD), Sharpe ratio (SHR), information ratio (IR), and Sortino ratio (SOR). These measures are briefly described as follows:
EMR: Describes the portfolio's reward over the market index, i.e., the difference between the portfolio return and the market index return. EMR is calculated by Eq (32), where $R_P$ and $R_I$ denote the portfolio return and the market index return, respectively. Higher values of EMR are desirable.
DD: Describes the underperformance of the portfolio relative to the market index; it is calculated by Eq (33).
SHR: Describes the portfolio's excess return over the risk-free rate per unit of total risk; it is calculated by Eq (34), where $E(R_P)$, $R_f$, and $\sigma(R_P)$ denote the average portfolio return, the risk-free rate of return, and the standard deviation of the portfolio return, respectively. Higher values of SHR are desirable.
IR: Describes the risk-adjusted return of a financial asset or portfolio relative to a certain benchmark; it is calculated by Eq (35). Higher values of IR are desirable.
SOR: Describes the return per unit of downside risk; it is calculated by Eq (36). Higher values of SOR are desirable.
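Eqs (32) to (36) are not reproduced here, so the sketch below uses common textbook definitions, which are assumptions: EMR as the mean of $R_P - R_I$; DD as the root-mean-square shortfall of $R_P$ below $R_I$; SHR as $(E(R_P) - R_f)/\sigma(R_P)$; IR as the mean excess return over the index divided by the standard deviation of that excess (the tracking error); and SOR as $E(R_P) - R_f$ divided by the downside deviation below $R_f$. Population (1/T) statistics are used throughout, and the return series are hypothetical.

```python
from math import sqrt
from statistics import fmean, pstdev


def downside_deviation(returns, targets):
    """Root-mean-square shortfall of each return below its per-period target."""
    return sqrt(fmean([min(0.0, r - t) ** 2 for r, t in zip(returns, targets)]))


def performance(rp, ri, rf):
    """EMR, DD, SHR, IR and SOR for portfolio returns rp, index returns ri,
    and a per-period risk-free rate rf (textbook definitions, assumed)."""
    excess = [p - i for p, i in zip(rp, ri)]
    mean_rp = fmean(rp)
    return {
        "EMR": fmean(excess),
        "DD": downside_deviation(rp, ri),
        "SHR": (mean_rp - rf) / pstdev(rp),
        "IR": fmean(excess) / pstdev(excess),
        "SOR": (mean_rp - rf) / downside_deviation(rp, [rf] * len(rp)),
    }


# Hypothetical monthly returns of a portfolio and of the market index
rp = [0.02, -0.01, 0.03, 0.01]
ri = [0.01, 0.00, 0.02, -0.01]
print(performance(rp, ri, rf=0.005))
```

The risk-free rate has to be expressed per period to match the return series, so an annual rate would need to be converted first. SOR divides by zero if no return falls below $R_f$, a case a fuller implementation would need to handle.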
Applying Eqs (32) to (36), all performance measures are calculated for the RMSVL and RMADL models, with the risk-free rate of return taken as 0.10. The results for EMR, DD, SHR, IR, and SOR are presented in Table 19.

The results show that both proposed models, RMSVL and RMADL, are effective in constructing an optimal portfolio. In other words, the proposed approach achieves a desirable return compared with the risk-free rate of return. The performance of the RMSVL model is marginally better than that of the RMADL model under all five measures.

Conclusions and future research directions
In this study, a novel approach for the portfolio construction problem is proposed in order to deal with data uncertainty, increase the conservatism of the investment process, decrease computational complexity, and comprehensively assess stocks. Accordingly, this study presents six RDEA models, based on the most widely cited and popular classic data envelopment analysis models, in the first phase, and two robust portfolio optimization models, robust mean-semi variance-liquidity and robust mean-absolute deviation-liquidity, in the second phase. It is worth mentioning that uncertainty is considered on all data in both phases, including the input and output data of the DEA models and the financial parameters of the MSVL and MADL models, through a robust optimization approach. Finally, a real-life case study from the Tehran stock exchange is implemented to demonstrate the applicability of the proposed two-phase robust portfolio selection and optimization approach and to show its efficacy and effectiveness. In addition, a sensitivity analysis of all robust models of this study is presented. The results show that the proposed approach is effective for portfolio construction under uncertainty. The two-phase approach also reduces the computational complexity of handling the cardinality constraint in portfolio optimization models; in other words, it does not need any meta-heuristic algorithm to solve the portfolio optimization model with the cardinality constraint. Finally, the main contributions of this study can be summarized as follows:
• The paper introduces a novel two-phase portfolio selection and optimization approach.
• Six RDEA models are proposed to measure stock performance under uncertainty.
• Two robust portfolio optimization models with different risk measures are presented.
• A sensitivity analysis of all eight robust models in this study is presented.
• The proposed approach is implemented in a real-life case study of Tehran stock exchange.