Experimental Evaluation of Suitability of Selected Multi-Criteria Decision-Making Methods for Large-Scale Agent-Based Simulations

Multi-criteria decision-making (MCDM) can be formally implemented by various methods. This study compares the suitability of four selected MCDM methods, namely WPM, TOPSIS, VIKOR, and PROMETHEE, for future applications in agent-based computational economic (ACE) models of larger scale (i.e., over 10 000 agents in one geographical region). These four MCDM methods were selected according to their appropriateness for computational processing in ACE applications. Tests of the selected methods were conducted on four hardware configurations. For each method, 100 tests were performed, which represented one testing iteration. With four testing iterations conducted on each hardware setting, and with all configurations tested separately with the -server parameter activated and deactivated, 12800 data points were collected and analyzed in total. An illustrative decision-making scenario was used that allows the mutual comparison of all of the selected decision-making methods. Our test results suggest that although all methods are convenient and can be used in practice, the VIKOR method achieved the best results in the tests and thus can be recommended as the most suitable for simulations of large-scale agent-based models.


Introduction
In the realm of socio-economic systems, problem solving is mostly related to the necessity for individual agents to make the right decisions. In most cases, the operational level of decision-making can be handled by standard optimization algorithms, as these problems mostly require only the optimized solution, and decision-making is generally specialized and not overwhelmingly complex. Nevertheless, decision-making on both tactical and strategic levels is generally dependent on a large number of factors and is characterized by the pursuit of parallel goals, and the decision-making component is expected to adapt to changing conditions because economic environments are typically dynamic and complex in nature. Therefore, these environments are modelled with the help of various techniques or approaches. The traditional analytical approach to modelling of economic systems has several weak points [1]. Therefore, modelling based on systems dynamics principles [2] or agent-based techniques has recently emerged as a viable alternative. Tied to the latter, a new discipline termed Agent-based Computational Economics (ACE) has been established. In this field of study, multi-agent economic systems comprise multiple intelligent agents that interact to solve problems that might be beyond the capabilities of a single agent or system [3]. Hence, individual agents represent economic entities of various types, and the performance of the whole system is the result of a large number of their mutual interactions. ACE models are constructed through a bottom-up approach, i.e., there is no centralized control or decision-making. However, this approach causes one of the main issues of agent-based modelling, especially when large-scale models are developed and used for simulations. As the square law of computation indicates [4], as the number of agents increases, the computational complexity rises disproportionately.
This limitation forces users to apply methods or techniques that are as effective as possible. In the case of decision-making agents, it encourages the exploration of multi-criteria decision-making as the main control mechanism on both the individual and collective level of multi-agent systems. Unfortunately, very few published studies deal explicitly with the application of multi-criteria decision-making (MCDM) methods in multi-agent models. However, there are studies connecting decision-making at the general level and agent-based modelling. For instance, while Yu and Xu [5] deal with graph-based multi-agent decision making, Nguyen et al. [6] investigate decision-making agents in rice pest risk management. The suitability of particular MCDM methods for large-scale agent-based simulations thus represents a significant research gap, and the associated experiments can offer novel results with a high level of added value.
This study follows the seminal work of Triantaphyllou [7], who states that the fundamental research question is "which is the best method for a given problem?" Similarly, Peng et al. [8] state that "it is a challenging task to decide, which MCDM method(s) are suitable for a problem". In the case of this study, the problem is a decision-making process performed by agents in large-scale agent-based systems, which represents the main research construct of this study. Thus, rather than developing a new MCDM method suitable for decision-making in multi-agent systems or adjusting an existing one, it is more suitable to focus on several existing formal methods that might be applied in this specific decision-making environment. This study is based on two main assumptions, whose validity and relevance were verified by a review of research papers published in three scientific databases: Scopus, ScienceDirect, and Web of Knowledge. First, no study comparing the computational effectiveness of selected multi-criteria decision-making methods in large-scale agent-based systems has been conducted yet. Second, agent-based models are mostly developed and simulated on standard personal computers, laptops, or notebooks. Although there is intensive research focused on the usage of distributed environments, high performance (HPC) and high throughput (HTC) compute platforms, grids, or clouds ([9] or [10]), we consider it specific and mostly associated with particular research conducted in the information technology realm. It can be assumed that agent-based models are mostly built and simulated on commodity compute resources and that large-scale models are mostly developed and simulated with the help of similar configurations as well. Although the aforementioned review of studies revealed that the absolute majority of models are presented without a description of the hardware configurations that were used for modelling and simulations, there are two main reasons for this assumption.
First, AnyLogic, as one of the leading agent-based platform providers, states system requirements on its web pages. Even though AnyLogic represents a quite sophisticated set of tools that focus on various modelling paradigms, its requirements are in line with the equipment of standard low-cost computers or laptops (dated to 2015/2016). Second, the existing literature supports this assumption as well. For instance, an international group of researchers states that "recent advances in computing hardware such as the wide availability of multi-core CPUs and increased main memory capacities make it possible to investigate population-scale phenomena using commodity compute resources." [11]. Based on these two assumptions, the main research question is formulated as follows: Which of the selected MCDM methods is the most suitable (least time demanding) for application in large-scale agent-based simulations using commodity compute resources?
The main objective of this manuscript is to compare the computational effectiveness of the WPM, TOPSIS, VIKOR and PROMETHEE methods during the simulation of a specific decision-making situation in a large-scale agent-based economic system. In this context, suitability is understood as the time demands associated with the runtimes required for execution of the method. The main working hypothesis states that among these four methods, one surpasses the others in terms of suitability as defined above. The subsequent parts of the paper are structured as follows. After the introductory section, the materials and methods are presented. More specifically, the agent-based computational economy used as a platform for the experiments is introduced and the specific experimental software and hardware settings are described. Further, a description of the theoretical basis of the decision-making methods applied in this study is provided. The third section presents the results and discusses them in a broader context. Subsequently, study limitations and further research paths are outlined. The final section presents the conclusions.

Materials and Methods
At the beginning of the research, several platforms for development of multi-agent systems were available for the implementation of the model. In order to find the most suitable platform, the following criteria were defined, according to basic requirements:
• Geographical Information Systems (GIS) data support,
• Win OS compatibility,
• orientation on economic models (preferably),
• capacity to handle large-scale models, i.e. models with several thousands of agents or more,
• deliberative (or variable) decision-making support,
• object-oriented structure, preferably Java-based,
• various simulation approaches,
• built-in optimization support for experiments.
Based upon these requirements, several platforms were taken into consideration.
1. NetLogo (https://ccl.northwestern.edu/netlogo/): the NetLogo platform is a tool for development of agent-based models, available free of charge from the Northwestern Center for Connected Learning and Computer-Based Modeling. NetLogo uses a simplified representation of the environment (patches organized into a grid) and agents (turtles). While it is in general possible to use GIS data and employ a large number of agents in a model at the same time, the platform does not offer sufficiently complex decision-making capabilities of agents for the intended purposes and extent of the model. Although implementation is relatively easy, NetLogo can be perceived as slightly obsolete today. While it is unsuitable for our needs, many successful models have been implemented in NetLogo, and the developer provides an extensive library of implemented models for the user's convenience.
2. JADE: the JADE platform fully complies with the FIPA specifications and offers distribution of models over several machines, if needed. However, for large-scale models, a communication bottleneck exists, since all requests are handled by a single facilitator entity at a certain point (which is an integral part of the JADE communication protocol). Although the environment works efficiently with a smaller number of agents, for a higher number of agents (100k and more) there are severe limitations in model performance. JADE also lacks native support for GIS data processing.
3. EKI One/MASA platforms (http://masa-group.biz/): the commercial EKI One platform, originally developed by Artificial Technology GmbH, had integrated Unity support, a Sandbox module for prototyping, and used the Lua programming language for scripting. The first version of the model was partially implemented in this environment, but since the developer company was bought by a competitor (MASA), the platform's support and further development were terminated. A triplet of platforms was later introduced by the MASA group, namely SWORD (specialized in military logistics and simulations), LIFE (character design and behavior), and SYNERGY (development of DSS for emergency management).
Being out of scope for our research, this branch of platforms is no longer used for implementation.
4. AnyLogic (http://www.anylogic.com/): AnyLogic is a general-purpose agent-oriented commercial platform which supports three of the most common modeling paradigms today: (i) agent-based modeling, (ii) discrete event, and (iii) system dynamics simulations. Similar to NetLogo in its provision of a library of examples, and offering excellent technical support for users, AnyLogic allows implementation of models of various complexity and scale. Supply chain and logistics, manufacturing and production, transportation and warehousing, social processes, and strategic planning and management represent the intended application areas for this platform. Moreover, a variety of basic functionalities is already well prepared in the AnyLogic libraries, such as pathfinding, GIS support, distribution functions, etc., which makes development of models faster and easier. Altogether, AnyLogic was found to be the most suitable modern platform available, fitting well the needs determined by this study's requirements.
The list of agent-based modeling platforms is obviously not complete, for there is a plethora of platforms and toolkits (ADK, Brahms, GAMA, MASON etc.) available on the market. The selection presented in this manuscript covers only platforms closely related to this particular research. The list actually represents only the tip of the iceberg, since multi-agent modeling has become very popular recently and there are literally dozens of platforms available. A more thorough survey of agent-based modeling software in general is provided by Nicolai & Madey [12]. Less general and more domain-oriented is the recent paper of Hmida, which focuses on implementing various MAS architectures in supply chains [13]. Wang's paper [14] also offers an interesting perspective; it deals with smart factories and a strategic initiative called "Industrie 4.0" (the German equivalent of Industry 4.0 in English), which was proposed and adopted by the German government as part of the "High-Tech Strategy 2020 Action Plan" [15]. Other studies utilizing the agent approach focus mainly on various sub-topics which may also be considered relevant, e.g. a more technical view with emphasis on the application of engineering approaches in developing simulation models [16], management of SMEs' supply chains via MAS outsourcing [17], or MAS supply networks in Industry 4.0 [18].

Simulation platform
The experiment addresses the computational effectiveness of individual multi-criteria decision-making methods without strict relation to other aspects of the economic model. As the model's functionality, usability, validity, and robustness have already been demonstrated, only a brief description is provided in this section. Additional details associated with the platform can be found in [1].
The economic system was developed over several months in the AnyLogic software package, v. 7.1.2 x64, with the intention to create a model of real economic systems for use as a suitable platform for various experiments. The main purpose of the model is to represent/emulate an economy at the macro-level, with attention focused especially on the formation of supply chains, consumption, and market interactions. The model implements several types of agents representing:
• consumers (C-agents who consume goods and services and generate a workforce),
• production units (M-agents who harvest and mine the basic production inputs, and industrial F-agents who transform inputs and semi-products to semi-products and products),
• services providers (S-agents who offer various services),
• energy providers (E-agents who generate energy for all other agents),
• transportation units (T-agents who ensure transport services).
Furthermore, there is a meta-agent available in the form of a colony (COL), which provides public services at the macro-level, such as social and health care services or security, collects taxes, and implements public policies. The model requirements and their progressive refinement led to the implementation of decision-making performed at the individual, group and collective levels. To respect the basic prerequisites of ACE model development [19], the government level (COL) does not play the role of a centralized control entity but creates general constraints or support according to selected policies or priorities (e.g., tax levels, tax relief, support of green technologies, subsidies for selected investments).
The most relevant features of the model are as follows:
• The model consists of a large number of agents (one geographical region is typically represented by more than 10 000 agents).
• The model has complete numerical and symbolic representation.
• From the macro-economic perspective, the variety of possible actions of given economic entities in the model can be perceived as limited.
• Optimization problems at the operational level are handled by standard techniques and algorithms associated with operations management.
Although there are three levels of decision-making in the system, and each level is specific with regard to the input data used and the variety of solutions considered, the mechanics of decision-making remains basically the same, and thus the same decision-making principle is intended for application at all three levels (with respective variations of input data, decision criteria, variants etc.). The main design principle of the model is grounded in the repeated use of standardized modules of identical (or very similar) construction, which makes consequent testing, verification, and overall implementation significantly easier.

Experimental set-up
The experimental testing of the selected methods is associated with the decision-making of companies (Factory-agents/Mining-agents/Store-agents/Energy-agents/Transport-agents) that need to select the most appropriate candidate for a working position, i.e. to evaluate the set of agents (Consumer-agents) available in the economic model and offered through the labor market, and to select the best one according to the specialization and work-experience requirements specific to the respective job. A similar approach is used in the model, e.g., for establishing the optimal consumption of goods and services (i.e. consumer basket related decisions), for establishing business relationships between agents, such as finding the best supplier, and in other decision-making situations of an economic nature.
In practice, the process is implemented with the help of the abstract class DecisionMethod. This class contains attributes of a decision-making table (alternatives, criteria, criteria weights and values). However, information about maximization or minimization is not included in this class because not all methods need it for their calculations. Furthermore, elements related to the results of multi-criteria decision-making are included. Whereas result represents the best compromise alternative, listOfResults provides the complete list of alternatives based on the results obtained. Next, the class comprises methods for the access and setting of particular attributes. Naturally, this class represents the parent class of further classes associated with particular decision-making methods. These specific classes also implement the interface IDecision for interaction of the user with the simulation environment.
More specifically, all specific classes associated with particular decision-making methods implement given algorithms (see section Calculations below). In the case of the Wpm class, the constructor requires alternatives, criteria, criteria weights and information about minimization or maximization. Moreover, parameters in the form of a list are required due to the possibility of relatively high numbers of both alternatives and criteria. In addition to the Wpm class, the Topsis class addresses normalization. Ideal normalization is selected for the experiment. The selection of the best and the worst evaluations of single criteria form the normalized weighted matrix, which is used for the determination of ideal and negative solutions. The constructor of the Vikor class requires identical parameters to the Topsis class. In addition to all previous classes, the Promethee class constructor requires preference and indifference values in the form of a two-dimensional array. While the array's first column comprises indifference borders of particular criteria, the second column contains preference borders.
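The class layout described above can be illustrated with a minimal sketch. It is written in Python purely for illustration, whereas the original implementation is Java code inside AnyLogic; only the names DecisionMethod, result and listOfResults (here list_of_results) come from the text. The scores and decide methods and the toy subclass are hypothetical, and the IDecision interface is not modelled.

```python
from abc import ABC, abstractmethod


class DecisionMethod(ABC):
    """Hypothetical Python analogue of the abstract DecisionMethod class."""

    def __init__(self, alternatives, criteria, weights, values):
        # values[a][c] is the evaluation of alternative a under criterion c.
        # Min/max information is deliberately absent, as in the description.
        if len(values) != len(alternatives):
            raise ValueError("one row of values per alternative is required")
        if any(len(row) != len(criteria) for row in values):
            raise ValueError("each row needs one value per criterion")
        if len(weights) != len(criteria):
            raise ValueError("one weight per criterion is required")
        self.alternatives = list(alternatives)
        self.criteria = list(criteria)
        self.weights = list(weights)
        self.values = [list(row) for row in values]
        self.result = None          # best compromise alternative
        self.list_of_results = []   # all alternatives, best first

    @abstractmethod
    def scores(self):
        """Return one score per alternative; higher means better (assumed convention)."""

    def decide(self):
        # Sort alternatives by descending score and store both outputs.
        ranked = sorted(zip(self.scores(), self.alternatives),
                        key=lambda pair: pair[0], reverse=True)
        self.list_of_results = [name for _, name in ranked]
        self.result = self.list_of_results[0]
        return self.result


class ToyWeightedSum(DecisionMethod):
    """Toy subclass used only to exercise the base class; not one of the tested methods."""

    def scores(self):
        return [sum(w * v for w, v in zip(self.weights, row))
                for row in self.values]
```

A concrete method class would then only supply its own scoring logic and any extra constructor parameters (for example the minimization/maximization flags or the PROMETHEE thresholds), which mirrors the reuse of standardized modules emphasized in the model design.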
The user interface and available settings modes were created. Furthermore, a menu for the determination of a specific decision-making method and the number of conducted experimental runs was developed.
As already stated in the introductory section of this paper, a description of the hardware settings used in published studies is very often missing. Therefore, the experimental testing was performed on four hardware settings (HWs) in order to outline the influence of hardware on simulation results. The virtual environment was the only application running on the HWs during the experiment. Details of the HWs are presented in Table 1.
For each method, 100 tests were conducted, which represented one testing iteration. Four testing iterations were conducted on each HW setting in order to find out whether there are any internal effects influencing the acquired results. Furthermore, the "-server" parameter of the Java Virtual Machine (JVM) had to be taken into consideration. The -server parameter causes aggressive code optimization and is a standard setting in JVM benchmarks. The influence of the -server parameter on the tested code therefore had to be verified, because just-in-time compilation adds very large variance to execution time [21]. Thus, tests were conducted on the individual HWs also with the -server parameter applied. Based on recommendations focused on testing in the Java environment published by Georges et al. [21], the first 100 data points associated with the initialization process ("warm-up" period) were removed from the analysis. Altogether, 12800 data points were collected and subsequently analyzed (these data are available in the Supplemental file S1 Dataset). The length of one test was set to 2592000 seconds of modelled time, during which 432000 decision-making moments were performed. This period of 30 days (1 month of the model runtime) was selected because it provides a sufficient volume of data for subsequent analysis. On average, a decision is made every 6 seconds of virtual (model) time. The analyzed real-time runtimes were measured in seconds.
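The figures reported above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the total of 12800 data points is the product of methods, tests per iteration, iterations, hardware settings, and the two -server modes; this is one reading consistent with the reported total, not a breakdown stated in the text. All constant names are illustrative.

```python
# Assumed factorization of the reported number of data points.
METHODS = 4               # WPM, TOPSIS, VIKOR, PROMETHEE
TESTS_PER_ITERATION = 100
ITERATIONS = 4
HARDWARE_SETTINGS = 4
SERVER_MODES = 2          # JVM run with and without -server

data_points = (METHODS * TESTS_PER_ITERATION * ITERATIONS
               * HARDWARE_SETTINGS * SERVER_MODES)

# Length of one test in modelled time and number of decisions made in it.
MODELLED_SECONDS = 2_592_000
DECISIONS = 432_000

modelled_days = MODELLED_SECONDS / 86_400            # 86 400 seconds per day
seconds_per_decision = MODELLED_SECONDS / DECISIONS

print(data_points, modelled_days, seconds_per_decision)  # 12800 30.0 6.0
```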

Evaluated multi-criteria decision-making methods
The literature review reveals a lack of research papers or studies covering the computational effectiveness of MCDM methods in general. Relevant studies mostly compare selected MCDM methods based on their application in various specialized domains, such as housing affordability [22], decision support in financial services [23], maintenance delivery in the engineering industry [24], bio-energy systems [25], fuzzy logic [26], development of food products [27], or students' career preference models [28]. When the methods are mutually compared, computational effectiveness is not included as a criterion. For instance, Thor et al. [29] focus on consistency of methods as the main criterion. Peniwati [30] offers a quite complex comparison based on 16 criteria used for evaluation of 16 group decision-making methods (outranking, goal programming, MAUT, or AHP included). These criteria range from technical to psychophysical or social ones. However, computational effectiveness is not included. There are also relevant papers published in the economic modeling domain. For example, the study presented by Tan, Lee and Goh [31], which focuses on empirical evaluation of MCDM techniques in business-to-business (B2B) collaboration in supply chains, and Rezaei's paper dealing with MCDM application in reverse logistics [32] are worth noting. The evaluation of computational effectiveness is regrettably omitted in both. Furthermore, there seems to be a strong tendency to apply MCDM in the form of hybrid solutions (e.g. [33,34], [35], [36], etc.), which makes mutual comparison of MCDM methods even more difficult, if not outright impossible. Other papers focus on comparison of application areas, e.g. [37], but this approach leads towards a listing of case studies rather than their explicit comparison.
The lack of literature and resources on computational effectiveness might be partially related to the scarcity of MCDM applications in large-scale simulations and models, where effectiveness and computational requirements are very important. In smaller models, other aspects of decision making, such as precision and quality of decision making, robustness of the DM method, or problems related to the description of the task environment, usually play more important roles.
From a large number of possibilities, four multi-criteria decision-making (MCDM) methods were implemented and tested, namely WPM, TOPSIS, VIKOR, and PROMETHEE. These methods were selected based on their suitability for use in the model. Although decision-making support is mostly aimed at individuals or groups of individuals as the ultimate target users, it is generally perceived as desirable to minimize the participation of the human element in the decision-making process and the respective data preparation due to human-related issues such as subjectivity, emotions, or biases [38]. Methods requiring interaction with the user/operator, or methods that would be difficult to process in the model, e.g., those containing pairwise comparison, evaluation scales, or the autonomous construction of fitness functions, were omitted from the testing. Examples of omitted methods are AHP, ANP, MAUT/UTA, MACBETH, ELECTRE, CBA, COMET, AIRM, GRA, NATA, PAPRIKA and others. The WSM method, as described in the following section, was omitted because of difficulties related to the handling of different units of measure and problems with multidimensional decision-making situations.
Weighted Sum Model and Weighted Product Model. Because there is a generally prevalent approach to start with the simplest alternative, the Weighted Sum Model (WSM) was considered as the first option for the application. WSM is a straightforward decision-making method, usually applied to one-dimensional problems. Under the assumption of m alternatives and n criteria, the best option is selected by application of the following expression (according to Fishburn [39]):

$$A^{*}_{WSM} = \max_{i} \sum_{j=1}^{n} a_{ij} w_j, \quad i = 1, \ldots, m,$$

where $A^{*}_{WSM}$ denotes the best alternative score, $a_{ij}$ denotes the value of the i-th alternative with consideration of the j-th criterion, and $w_j$ denotes the weight of the j-th criterion. However, Triantaphyllou [40] states that "...in single-dimensional cases, where all units are the same, WSM can be used without difficulty. Difficulty with this method emerges when it is applied to multi-dimensional MCDM problems. Then, in combining different dimensions, and consequently different units, the additive utility assumption is violated." For this reason, the Weighted Product Model (WPM) was used instead. WPM is similar to WSM but uses multiplication instead of addition (Bridgman [41] and Miller and Starr [42]). We may suppose that a given problem is defined on m alternatives and n decision criteria. Moreover, we may assume all the criteria to be benefit criteria (i.e. the higher the values, the better). Then, if a decision-maker wants to compare two alternatives $A_K$ and $A_L$ (where $m \geq K, L \geq 1$), the following ratio needs to be calculated:

$$R(A_K / A_L) = \prod_{j=1}^{n} \left( \frac{a_{Kj}}{a_{Lj}} \right)^{w_j},$$

where n is the number of criteria, $a_{ij}$ is the actual value of the i-th alternative in terms of the j-th criterion, and $w_j$ is the weight of importance of the j-th criterion.
If the term $R(A_K / A_L)$ is greater than or equal to one, then alternative $A_K$ is more desirable than $A_L$.
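The pairwise WPM comparison can be sketched as follows. This is a minimal illustration under the benefit-criteria assumption stated above, not the Wpm class used in the experiment; the function names and the numbers in the usage note are hypothetical. All values must be strictly positive.

```python
import math


def wpm_ratio(a_k, a_l, weights):
    """R(A_K / A_L): product over criteria of (a_Kj / a_Lj) ** w_j."""
    return math.prod((x / y) ** w for x, y, w in zip(a_k, a_l, weights))


def wpm_scores(values, weights):
    """Per-alternative product of a_ij ** w_j.

    The ratio of two such scores equals wpm_ratio for the same pair,
    so ranking by score is equivalent to the pairwise comparison.
    """
    return [math.prod(v ** w for v, w in zip(row, weights)) for row in values]


def wpm_best(values, weights):
    """Index of the alternative with the highest WPM score."""
    scores = wpm_scores(values, weights)
    return max(range(len(values)), key=scores.__getitem__)
```

For instance, with illustrative weights (0.20, 0.15, 0.40, 0.25) and alternatives (25, 20, 15, 30) and (10, 30, 20, 30), the ratio of the first to the second is approximately 1.007, so the first alternative is preferred. Computing one score per alternative needs m products instead of m(m-1)/2 pairwise ratios, which matters when the number of alternatives is large.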
The best alternative is better than or at least equal to all other alternatives [40]. The structure of the method eliminates any units of measure, allowing its use in both single- and multi-dimensional problems.

Technique for Order Preference by Similarity to Ideal Solution (TOPSIS). The TOPSIS method was developed by Hwang and Yoon in 1981 [43]. This method is based on the concept that the chosen alternative should have the shortest Euclidean distance from the ideal solution and the farthest Euclidean distance from the negative ideal solution. The ideal solution is a hypothetical solution for which all attribute values correspond to the maximum attribute values in the database comprising the satisfying solutions; the negative ideal solution is the hypothetical solution for which all attribute values correspond to the minimum attribute values in the database. TOPSIS thus gives a solution that is not only the closest to the hypothetically best but is also the farthest from the hypothetically worst [44]. The TOPSIS procedure consists of the following steps (all adopted from [44]):

1. Calculate the normalized decision matrix. The evaluation matrix consists of m alternatives and n criteria, with the value of the i-th criterion for the j-th alternative given as $f_{ij}$; we therefore have a matrix $(f_{ij})$. The normalized value $r_{ij}$ is calculated as

$$r_{ij} = \frac{f_{ij}}{\sqrt{\sum_{k=1}^{m} f_{ik}^{2}}}, \quad i = 1, \ldots, n; \; j = 1, \ldots, m.$$

2. Calculate the weighted normalized decision matrix. The weighted normalized value $v_{ij}$ is calculated as

$$v_{ij} = w_i r_{ij},$$

where $w_i$ is the weight of the i-th attribute or criterion, and $\sum_{i=1}^{n} w_i = 1$.

3. Determine the ideal and negative-ideal solutions

$$A^{*} = \{v_1^{*}, \ldots, v_n^{*}\} = \{(\max_j v_{ij} \mid i \in I'), (\min_j v_{ij} \mid i \in I'')\},$$

$$A^{-} = \{v_1^{-}, \ldots, v_n^{-}\} = \{(\min_j v_{ij} \mid i \in I'), (\max_j v_{ij} \mid i \in I'')\},$$

where I' is associated with benefit criteria, and I" is associated with cost criteria.

4. Calculate the separation measures using the n-dimensional Euclidean distance. The separation of each alternative from the ideal solution is given as

$$D_j^{*} = \sqrt{\sum_{i=1}^{n} (v_{ij} - v_i^{*})^{2}}, \quad j = 1, \ldots, m.$$

Similarly, the separation from the negative ideal solution is given as

$$D_j^{-} = \sqrt{\sum_{i=1}^{n} (v_{ij} - v_i^{-})^{2}}, \quad j = 1, \ldots, m.$$

5. Calculate the relative closeness to the ideal solution. The relative closeness of the alternative $a_j$ with respect to $A^{*}$ is defined as

$$C_j^{*} = \frac{D_j^{-}}{D_j^{*} + D_j^{-}}, \quad j = 1, \ldots, m.$$

6. Rank the preference order.
As is apparent from the first and second steps, the TOPSIS method uses a normalization procedure, which is necessary for the comparison of the different units of measure tied to individual criteria. There are several usable normalization methods, e.g., distributive or ideal normalization. Weighted normalized scores are used for comparing alternatives and determining the ideal and negative-ideal solutions.
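The six TOPSIS steps can be sketched compactly. The sketch below uses vector (distributive) normalization in the first step; the experiment reportedly used ideal normalization, which would change only that step. The function name, the row-per-alternative layout, and the boolean benefit flags are assumptions made for illustration.

```python
import math


def topsis(values, weights, benefit):
    """Return the relative closeness C_j* of each alternative.

    values[j][i] holds f_ij, the value of criterion i for alternative j.
    benefit[i] is True for benefit criteria (set I') and False for cost
    criteria (set I''). Weights are assumed to sum to one.
    Not handled: all-zero criterion columns and identical alternatives,
    both of which would cause a division by zero.
    """
    n = len(weights)
    # Step 1: vector normalization of each criterion column.
    norms = [math.sqrt(sum(row[i] ** 2 for row in values)) for i in range(n)]
    # Step 2: weighted normalized values v_ij = w_i * r_ij.
    v = [[weights[i] * row[i] / norms[i] for i in range(n)] for row in values]
    # Step 3: ideal and negative-ideal solutions, criterion by criterion.
    columns = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(columns, benefit)]
    negative = [min(c) if b else max(c) for c, b in zip(columns, benefit)]
    # Steps 4 and 5: Euclidean separations and relative closeness.
    closeness = []
    for row in v:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, negative)
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness


def rank_descending(scores):
    """Step 6: indices of alternatives ordered from best to worst."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

An alternative that is best on every criterion coincides with the ideal solution and therefore receives a closeness of 1, while one that is worst on every criterion receives 0.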
Vise Kriterijumska Optimizacija I Kompromisno Resenje (VIKOR). According to Chang [45], the compromise-ranking method VIKOR is a usable technique for multi-criteria analysis. The method was developed for application in complex systems. The VIKOR method is grounded in ranking and selecting from alternatives with conflicting criteria [46]. Based on the assumption that each alternative is assessed according to multiple criterion functions, the compromise ranking is conducted by comparing the measure of closeness to the ideal alternative [47], [48], [49]. There are five main steps that need to be performed in the compromise-ranking algorithm of the traditional VIKOR (procedure adopted from [45]):

1. The various alternatives are denoted as $x_1, x_2, \ldots, x_m$. For an alternative $x_j$, the merit of the i-th aspect is denoted by $f_{ij}$, i.e., $f_{ij}$ is the value of the i-th criterion function for the alternative $x_j$. Additionally, m is the number of alternatives, and n is the number of criteria.

2. Determine the maximum $f_i^{*}$ and minimum $f_i^{-}$ values of all criterion functions, i = 1, ..., n:

$$f_i^{*} = \max_j f_{ij}, \quad f_i^{-} = \min_j f_{ij}.$$

3. Compute the values $S_j$ and $R_j$, j = 1, ..., m:

$$S_j = \sum_{i=1}^{n} w_i \frac{f_i^{*} - f_{ij}}{f_i^{*} - f_i^{-}}, \quad R_j = \max_i \left[ w_i \frac{f_i^{*} - f_{ij}}{f_i^{*} - f_i^{-}} \right],$$

where $S_j$ denotes the utility measure for the alternative $x_j$, $R_j$ denotes the regret measure for the alternative $x_j$, and $w_i$ denotes the weight of the i-th criterion, which represents the relative importance of the criterion.

4. Compute the values $Q_j$, j = 1, ..., m:

$$Q_j = v \frac{S_j - S^{*}}{S^{-} - S^{*}} + (1 - v) \frac{R_j - R^{*}}{R^{-} - R^{*}},$$

with $S^{*} = \min_j S_j$, $S^{-} = \max_j S_j$, $R^{*} = \min_j R_j$, and $R^{-} = \max_j R_j$, where v is the weight for the strategy of maximum group utility, and 1-v is the weight of the individual regret, according to Kackar [50] and Opricovic [49].

5. Rank the alternatives by $Q_j$. The lower the value of $Q_j$, the better the alternative [45].
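The compromise-ranking steps can be sketched as below. The sketch follows the common formulation of the traditional VIKOR with $S_j$, $R_j$ and $Q_j$; the handling of cost criteria through a boolean flag, the default v = 0.5, and the function name are assumptions made for illustration rather than details taken from the study.

```python
def vikor(values, weights, benefit, v=0.5):
    """Return (S, R, Q) lists; a lower Q_j indicates a better compromise.

    values[j][i] holds f_ij, the value of criterion i for alternative j.
    benefit[i] is True when larger values of criterion i are better
    (assumed extension; for cost criteria best and worst are swapped).
    Not handled: criteria with identical values for all alternatives and
    cases where all S_j or all R_j coincide (division by zero).
    """
    n = len(weights)
    columns = list(zip(*values))
    # Step 2: best f_i* and worst f_i- value of each criterion.
    best = [max(c) if b else min(c) for c, b in zip(columns, benefit)]
    worst = [min(c) if b else max(c) for c, b in zip(columns, benefit)]
    # Step 3: utility measure S_j and regret measure R_j.
    S, R = [], []
    for row in values:
        terms = [weights[i] * (best[i] - row[i]) / (best[i] - worst[i])
                 for i in range(n)]
        S.append(sum(terms))
        R.append(max(terms))
    # Step 4: compromise index Q_j.
    s_min, s_max = min(S), max(S)
    r_min, r_max = min(R), max(R)
    Q = [v * (s - s_min) / (s_max - s_min)
         + (1 - v) * (r - r_min) / (r_max - r_min)
         for s, r in zip(S, R)]
    return S, R, Q


def rank_ascending(q):
    """Step 5: indices of alternatives ordered from best (lowest Q) to worst."""
    return sorted(range(len(q)), key=lambda j: q[j])
```

Because each alternative needs only one pass over its criteria, with no square roots or pairwise comparisons between alternatives, the per-decision cost of this formulation grows linearly with the number of alternatives.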
Chang [45], in the first step of the sequence, already expects normalized values of the $(f_{ij})$ decision matrix. Yazdani and Payam [51] and Tong et al. [52] mention the normalization of values in the first step. However, they work with vector normalization, corresponding to the distributive normalization of the TOPSIS method. This method is in contrast to Opricovic and Tzeng [49], who use linear normalization (below), which eliminates the influence of units of measure and is included in the first step of the procedure:

$$d_{ij} = \frac{f_i^{*} - f_{ij}}{f_i^{*} - f_i^{-}}.$$

Preference Ranking Organization Method for Enrichment Evaluation (PROMETHEE).
One of the options for evaluating a decision-making problem is the method called Preference Ranking Organization Method for Enrichment Evaluation (PROMETHEE). Currently, this method has been extended to encompass six ranking formats: PROMETHEE-I (partial ranking), PROMETHEE-II (complete ranking), PROMETHEE-III (ranking based on intervals), PROMETHEE-IV (continuous case), PROMETHEE-V (net flows and integer linear programming), and PROMETHEE-VI (representation of human brain) (see [53]).
In this experiment, PROMETHEE-II is applied; thus, the ranking of the alternatives is based on positive and negative flows. According to Ishizaka and Nemery [20], there are four possible outcomes when two alternatives are compared by their flows: 1. alternative A is better than alternative B if its total positive and negative flows are simultaneously better; 2. alternative A has a worse evaluation than alternative B if its total positive and negative flows are both worse; 3. alternatives A and B are incomparable when one of the alternatives has a better positive score but a worse negative score, or vice versa (i.e., a worse positive and a better negative score); and 4. the two alternatives are equal in the case of matching positive and negative total flows.
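The flows and the four outcomes can be illustrated with a short sketch. It is not the implementation from S1 File. It assumes benefit criteria, weights that sum to one, and a linear preference function with an indifference threshold q and a preference threshold p (with p > q) for each criterion; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def linear_preference(d: float, q: float, p: float) -> float:
    """Preference degree in [0, 1] for a difference d (assumes p > q)."""
    if d <= q:
        return 0.0  # difference is negligible
    if d >= p:
        return 1.0  # strict preference
    return (d - q) / (p - q)


def promethee_flows(matrix: Sequence[Sequence[float]],
                    weights: Sequence[float],
                    q: Sequence[float],
                    p: Sequence[float]) -> Tuple[List[float], List[float], List[float]]:
    """Return (positive flows, negative flows, net flows) for all alternatives."""
    m, n = len(matrix), len(weights)

    def pi(a: int, b: int) -> float:
        # Aggregated preference of alternative a over alternative b.
        return sum(weights[i] * linear_preference(matrix[a][i] - matrix[b][i], q[i], p[i])
                   for i in range(n))

    plus = [sum(pi(a, b) for b in range(m) if b != a) / (m - 1) for a in range(m)]
    minus = [sum(pi(b, a) for b in range(m) if b != a) / (m - 1) for a in range(m)]
    net = [pl - mi for pl, mi in zip(plus, minus)]  # PROMETHEE-II ranks by net flow
    return plus, minus, net


def partial_outcome(plus: Sequence[float], minus: Sequence[float], a: int, b: int) -> str:
    """One of the four outcomes for the pair (a, b), based on both flows."""
    d_plus = plus[a] - plus[b]      # a higher positive flow is better
    d_minus = minus[b] - minus[a]   # a lower negative flow is better
    if d_plus == 0 and d_minus == 0:
        return "equal"
    if d_plus >= 0 and d_minus >= 0:
        return "better"
    if d_plus <= 0 and d_minus <= 0:
        return "worse"
    return "incomparable"


plus, minus, net = promethee_flows([[10, 5], [8, 9], [6, 6]],
                                   weights=[0.5, 0.5], q=[1, 1], p=[3, 3])
print(net, partial_outcome(plus, minus, 1, 0))
```

For this toy matrix the net flows are 0.125, 0.5 and -0.625, so the second alternative is ranked first. Net flows always sum to zero, which gives a quick sanity check for an implementation.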
PROMETHEE-II uses a ranking system based only on total net flows. This results in a complete ranking of the alternatives; unlike PROMETHEE-I, it does not admit a situation in which two alternatives are incomparable [20]. The method works with indifference and preference thresholds. While the indifference threshold for a given criterion represents the largest deviation that is considered negligible in the comparison of two actions, the preference threshold for a given criterion corresponds to the smallest deviation that a decision-maker considers definitely important when comparing two actions. A thorough formal description of the first two variants of the PROMETHEE methods, namely PROMETHEE-I and PROMETHEE-II, is provided by Mateo [53]. Due to the extent of the formal description of this method, it is left to the reader to examine the details in the provided reference. However, an implementation of the method can be found in the Supplemental file (S1 File).

Table 2 presents the mean runtimes associated with particular methods and test iterations conducted on all four HWs. Although the median can generally be considered a more appropriate indicator when the dataset may contain outliers, such outliers are mostly associated with the initialization process in each iteration. Since this limitation was removed based on the methodology provided by Georges et al. [21] (see section 2.2), the means are used for evaluation. Furthermore, all statistical methods applied during the verification phase are based on means. A graphical representation of the acquired results is depicted in Fig 1.
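The reasoning about means and medians can be made concrete with a small sketch. The runtimes below are hypothetical values in milliseconds, and discarding the first measurement is only an assumed stand-in for warm-up removal; it is not claimed to reproduce the procedure of Georges et al. [21].

```python
from statistics import mean, median
from typing import Dict, Sequence


def summarize(runtimes: Sequence[float], warmup: int = 1) -> Dict[str, float]:
    """Compare mean and median with and without the warm-up measurements."""
    steady = runtimes[warmup:]  # drop the initialization-affected runs
    return {
        "mean_all": mean(runtimes),
        "median_all": median(runtimes),
        "mean_steady": mean(steady),
    }


# Hypothetical data: the first run includes initialization overhead.
print(summarize([250, 12, 11, 13, 12]))
```

With the initialization run included, the mean (59.6) is far from the median (12); once that run is discarded, the mean of the remaining runs (12) agrees with the median, which is why the mean becomes an acceptable summary after the outliers are removed.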

[Panels: Mean runtimes (without -server parameter); Mean runtimes (with -server parameter)]

A cursory examination of the results indicates that there may be differences in mean runtimes among the methods. While the mean runtimes achieved by the Vikor and Topsis methods seem promising, the runtimes associated with the Wpm and Promethee methods look inappropriate for decision-making in agent-based models. However, since some test iterations show significant variance in runtimes while others do not, this needs to be verified. Therefore, additional statistical verification is conducted.

To find out whether the obtained results are influenced by certain factors, the main analyzed question is formulated as "What is the impact of Test, HW setting (only Setting hereinafter) and Method on Runtime?". The tested hypothesis is formulated as follows: "the means of runtimes are equal across groups defined by Test, Setting and Method". The hypothesis is tested using the GLM Repeated Measures model, with additional post-hoc tests, in the SPSS v23 statistical package. First, a mixed-model ANOVA with Test and Setting as within-subject factors and Method as the between-subject factor was conducted. Each factor has four levels (for Test: Test1, Test2, Test3, and Test4).
The multivariate test revealed significant interactions Setting × Method and Setting × Test, Wilks' Lambda = 0.003, F(9, 394) = 1042.712, p < 0.001, and Wilks' Lambda = 0.638, F(9, 394) = 24.512, p < 0.001, respectively. These interactions indicate that there were significant differences in methods across different HW settings, and in tests across different HW settings. They could be explained by the different hardware configuration of each computer, differences in the computational effectiveness of the methods, and the independence of each test. The same results are acquired from the Tests of Within-Subjects Effects. The interaction Setting × Test × Method is at the borderline of significance. Since a three-way interaction is statistically difficult to interpret, it is excluded from the analysis. The only insignificant interaction, Test × Method, Wilks' Lambda = 0.977, F(9, 394) = 1.030, p = 0.413, confirms the consistency of the tests, because the results of the tests do not differ within single methods (see Table 3). The profile plot in Fig 2 indicates that the interaction is connected with the Promethee and Wpm methods. Vikor and Topsis methods seem to have significant interaction with particular tests. Therefore, further examinations are conducted.
The Test of Between-Subjects Effects reveals that the influence of Method is significant, and thus the methods differ from each other, F(3, 400) = 3345.173; p < 0.05; Partial Eta Squared = 0.962 (see Table 4). The post-hoc tests show that Vikor is significantly faster than the other methods. Consequently, pairwise comparisons of single methods are conducted. The results are presented in Table 5. This analysis confirms the previously indicated similarity of the results acquired by the Promethee and Wpm methods. All other pairs of methods, when mutually compared, differ from each other.
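The reported effect size can be cross-checked directly from the F statistic and its degrees of freedom, using the standard identity partial eta squared = (F · df_effect) / (F · df_effect + df_error). The helper below is only an illustrative sketch with a hypothetical function name.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F(3, 400) = 3345.173 as reported for the between-subject factor Method.
print(round(partial_eta_squared(3345.173, 3, 400), 3))  # 0.962
```

The computed value matches the reported Partial Eta Squared of 0.962, i.e., the factor Method accounts for roughly 96% of the variance not explained by other effects.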
Together with the aforementioned analysis, several post-hoc tests are conducted. Since Levene's test does not confirm the null hypothesis that the error variance of the dependent variable is equal across groups, the commonly used Tukey HSD test cannot be applied. Therefore, the Games-Howell test is used instead. This test confirms the results acquired from the previous analysis (see Table 6).

As already stated, the same analysis is performed with the dataset obtained during experiments with the -server parameter applied. Three out of four indicators (namely Pillai's Trace, Wilks' Lambda and Hotelling's Trace) again indicate an insignificant interaction between Test and Method in the multivariate tests. This is also confirmed by the Tests of Within-Subjects Effects. The Test of Between-Subjects Effects again reveals that the influence of Method is significant when the -server parameter is used, and thus the methods differ from each other, F(3, 400) = 8504.825; p < 0.001; Partial Eta Squared = 0.985. In contrast to the previous analysis, the pairwise comparisons of methods reveal that all four methods mutually differ from each other. This is also confirmed by the corresponding profile plot depicted in Fig 3. Similarly to the previous analysis, Levene's test indicates a violation of the assumption of homogeneity of variances. The Games-Howell post-hoc test confirms the difference with p < 0.05 for every combination of methods.
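The reason the Games-Howell test tolerates unequal variances is visible in its pairwise statistic, which uses each group's own variance and Welch-Satterthwaite degrees of freedom instead of a pooled variance. The sketch below computes only the statistic and the degrees of freedom for one pair of groups; obtaining a p-value additionally requires the studentized range distribution, which is omitted here. The sample values are hypothetical runtimes, not data from the experiment, and the function name is an assumption.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def games_howell_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (q, df) for one pairwise Games-Howell comparison.

    q is the studentized-range statistic; df is the Welch-Satterthwaite
    approximation, so no equal-variance assumption is needed.
    """
    na, nb = len(a), len(b)
    se2_a = variance(a) / na  # squared standard error of group a
    se2_b = variance(b) / nb  # squared standard error of group b
    q = abs(mean(a) - mean(b)) / math.sqrt((se2_a + se2_b) / 2)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return q, df


# Hypothetical runtimes of two methods with different spreads and sample sizes.
q, df = games_howell_statistic([10, 12, 14], [20, 21, 22, 23, 24])
print(q, df)
```

The degrees of freedom always fall between the smaller group size minus one and the pooled value n_a + n_b - 2; the more the variances differ, the further df drops below the pooled value.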
The next part of the statistical analysis focuses on the influence of the -server parameter on the acquired results. Pairwise comparisons of all methods for both states of the -server parameter are conducted. The results are presented in Table 7. Profile plots confirm that the Vikor method represents a significantly faster method for decision-making agents in agent-based simulations across all Settings and Tests. Fig 4 reveals that the -server parameter influences mean runtimes with significant effect. The evaluation of suitability is summarized in Fig 5, in which all available data points were analyzed. It is apparent that the Vikor method surpasses all other methods in terms of mean runtimes at all Settings. Although the order of method suitability may change when experiments are conducted on different Settings, this variation is mostly related to the Promethee and Wpm methods. The tested hypothesis of equal mean runtimes thus cannot be accepted.
To verify that the results also hold for different population sizes, we conducted additional experiments with 10^3, 10^4, 10^5 and 10^6 decision-making agents (see Table 8 for the means obtained for the various model sizes). The order of methods in terms of suitability evaluated by mean runtimes remains unchanged.

The acquired results can be compared to other similar studies that focus on the comparison of various MCDM methods in different settings. For instance, in addition to the studies mentioned earlier in section Calculations, Anojkumar et al. [54] provide a thorough comparative analysis of MCDM methods for industrial applications, namely FAHP-TOPSIS, FAHP-VIKOR, FAHP-ELECTRE, and FAHP-PROMETHEE. The authors conclude that "application of VIKOR method provides valuable assistance for material selection decision-making . . . The MCDM techniques are producing significant results and also a bridge for material selection problem". Related research was also conducted by Mulliner et al. [55], using the WPM, WSM, AHP, TOPSIS, and COPRAS methods for comparable housing affordability decision-making problems. Mulliner and his team conclude that "the 'best' (and second best) alternative obtained by all examined methods was equal but the overall ranking of all alternatives varied between methods." Because the comparative analysis also demonstrates that "none of the MCDM methods are to be considered perfect", the authors recommend the application of more than one method wherever possible.
While complex comparative analyses are quite scarce, there are many applications using various combinations of MCDM methods, such as [56], [57], [58], [35] and [44]. A review of the literature suggests that the current trend in cutting-edge research lies in using hybrid solutions that combine two or more MCDM methods. For such applications, the results of our testing may prove useful, especially if the computational requirements are carefully considered.

Despite the acquired results, all methods can be used in practice. Statistically significant differences do not have to be significant in practice. The selection of the most appropriate MCDM method from among WPM, PROMETHEE, TOPSIS and VIKOR is therefore, to a large extent, a matter of preference according to the given problem. However, in large-scale model applications, where a large number of participating subjects (in this case agents) are involved, only the more important decisions should be made using MCDM methods. Although simpler methods of decision-making may be less precise, the computational load associated with more sophisticated methods could in many cases outweigh any possible gain. Therefore, the strategic level of decision-making can be considered the most suitable for MCDM application.

Limitations and Further Research Directions
The conducted experiment has some limitations that need to be explicitly stated. First, the research question is closely tied to two assumptions whose validity is based on the review conducted by the authors. Nevertheless, it would be valuable to conduct extensive quantitative research focused on the usage of platforms (software) and simulation environments (hardware) in agent-based modelling to support the creation of more relevant and representative studies. Second, according to Hobbs et al. [59], a good experiment should satisfy the following conditions: (a) compare methods that are widely used, represent divergent philosophies of decision making, or are claimed to represent important methodological improvements, (b) address the question of appropriateness, ease of use and validity, (c) be controlled, use large samples and be replicable, (d) compare methods across a variety of problems, and (e) involve realistic problems. Our simulation experiment satisfies all conditions except the fourth one, (d). Thus, various models can be used in future research to confirm the acquired results or further specify their validity. Third, there is an important aspect of distributed computation (using supercomputers, computational grids, high-performance networks) that is often related to large-scale models. Diverse issues related to such an implementation of the virtual environment can have a significant impact on the effectiveness of decision-making algorithms. However, there are many related problems that need to be solved: for example, some of the input data used in decision-making may need to be shared, and a sub-optimal implementation may create communication bottlenecks resulting in slower data exchange. Moreover, this research direction is quite topical, with many unsolved issues [60], and thus goes beyond the research question investigated in this manuscript. Nevertheless, it represents an interesting direction for further testing and future work.
Finally, although unlikely, the results might be solely tied to the specific virtual environment used in the experiment. The comparison of similar experiments in different platforms and verification of the results would be very helpful and valuable.
This study has several implications. For example, the difference between small-scale models and large-scale models from the perspective of application in practice is significantly reduced. This result contributes to the development of agent-based modelling and simulation in the same way as the increase of computational power or advances in computer graphics. Based on the outcomes of this study, prospective modelers and designers can make a better-informed choice of decision-making mechanisms for their models.

Conclusions
Multi-criteria decision-making is formally realized with the help of various methods or algorithms. These methods might be successfully implemented in a specific subset of economic systems, i.e., agent-based computational economics in general, and large-scale (agent-oriented) models in particular. This study conducted several experiments on four different hardware configurations and compared four MCDM methods: WPM, TOPSIS, VIKOR, and PROMETHEE. The working hypothesis, which stated that one of the selected methods exceeds the rest in terms of computational effectiveness, is confirmed, as the acquired data demonstrated significant differences in the suitability of the selected methods. Therefore, VIKOR can be recommended as the most suitable method for implementation in practice because it achieves the best results. Nevertheless, further testing under different conditions would be useful to validate this result more robustly.