Multi-criteria decision support for planning and evaluation of performance of viral marketing campaigns in social networks

The current marketing landscape, apart from conventional approaches, consists of campaigns designed especially for launching information diffusion processes within online networks. Associated research is focused on information propagation models, campaign initialization strategies and factors affecting campaign dynamics. In terms of algorithms and performance evaluation, the final coverage represented by the fraction of activated nodes within a target network is usually used. It is not necessarily consistent with the real marketing campaigns using various characteristics and parameters related to coverage, costs, behavioral patterns and time factors for overall evaluation. This paper presents assumptions for a decision support system for multi-criteria campaign planning and evaluation with inputs from agent-based simulations. The results, which are delivered from a simulation model based on synthetic networks in a form of decision scenarios, are verified within a real network. Last, but not least, the study proposes a multi-objective campaign evaluation framework with several campaign evaluation metrics integrated. The results showed that the recommendations generated with the use of synthetic networks applied to real networks delivered results according to the decision makers’ expectation in terms of the used evaluation criteria. Apart from practical applications, the proposed multi-objective approach creates new evaluation possibilities for theoretical studies focused on information spreading processes within complex networks.


Introduction
The evolution of social networking platforms has led to a crucial need to understand how millions of online users behave, including their online and real life behaviors, patterns and predispositions [1]. Apart from studying social relations and online activity, information spreading processes are among the phenomenas with high attention from both researchers and practitioners. In a number of cases, as a result of information spreading, viral marketing seems to produce better results than traditional advertising campaigns [2]. There is an increase in the a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 number of online marketers using this opportunity to place even greater efforts in the engagement of potential consumers to benefit from their services and products by propagating information. Due to an increased trustworthiness of communications within a social network that has ties that are particularly strong, recommendations that are socially oriented have a greater impact on the targeted consumers than traditional commercial messages [3]. The research that is related to diffusion of marketing content takes into consideration the factors that lead to campaigns that are successful [4] [5], factors affecting usersÔ Ç Ö participation during information spreading [6], the initial seed sets that are selected for the initialization of the campaign [7] [8], as well as using epidemic models to analyze diffusion processes [9]. Other studies emphasize the role of different centrality measures used for the selection of initial influencers [10], the impact of homophily for successful selection of the initial network nodes [11], the role of the content and network structures [12], user motivation to forward content [5], the role of emotions [13] [14] and other factors [15]. Apart from static networks and single layer structures, multilayer networks [16] and the spreading of information in temporal networks have been studied in the more recent research [17].
Many earlier studies were focused on theoretical and empirical approaches increasing the number of reached customers within a network [7] [18]. While it is an important metric of the campaign success, several other factors should also be taken into account [19]. Apart from coverage, they include the campaign's costs and duration, number of initial seeds and their selection strategies [20] [21].
This study proposes and examines the framework for a multi-objective evaluation of information spreading processes. The presented framework can be used for strategic planning of information spreading processes in order to help selecting the appropriate strategy for selection of the initial nodes within the network and adjusting the number of activated nodes in the seeding process. While viral marketing processes can be based on increasing the motivation of content forwarding, the evaluation of the potential of available approaches creates another areas of applications of the multi-objective methods. The main contribution of the presented study is the framework for multi-objective selection of methods influencing campaign dynamics and coverage with the use of several evaluation criteria. In practical terms, an evaluation model was created with the use of the PROMETHEE II method and agent-based simulations were performed with sensitivity analysis used.
The remainder of this paper provides the methodological background and the conceptual framework in Section 2. This is followed by the example planning process in Section 3, data evaluation and searching for alternatives in Section 4 with conclusions in Section 5.
The methodological background on network structures evolved simultaneously, yet separately, on various disciplines [25]. In this paper, network G is defined as a set of nodes (vertices) V(G) interconnected with the set of edges E(G), which can be represented with the following mathematical notation: G(V, E). A path in graph G is a set of edges {{v 1 , v 2 }, {v 2 , v 3 }, . . ., {v n−1 , v n }}, where the end of the {v i , v i+1 } edge is the beginning of the {v i+1 , v i+2 } edge for every i = 0, . . ., n − 2, and where every node and edge are unique. The length of a path is the number of edges it comprises of. The distance d(i, j) from node i to node j is the length of the shortest path from i to j.
The current research done in the field can be identified as taking different directions. The dedicated solutions like linear threshold [22] and independent cascades model [26], as well as epidemic research models are implemented to model the way information is spread [9]. A large number of studies relate to the initiation of the information distribution processes and network node selection [7]. The most of the seeding strategies use network centrality measures for obtaining the nodes' ranking and initiating the seeding process, assuming an increased potential to distribute information resulting from the top nodes, having a vital role in the network structures. One of the most fundamental characteristics of a graph's node is its degree, i.e. the number of edges incident to this node, denoted deg(v), where v 2 V. In case of information spreading, the higher the node's degree, the more nodes the information can be potentially propagated to. The degree distribution P(k) of a network represents the fraction of the nodes in the network which have degree equal to k. Other measures based on closeness, betweenness or eigenvector centralities are used as well. The closeness of a node is a measure of its centrality in a network, calculated as the sum of the lengths of the shortest paths between the node and all other nodes in the network: The smaller the closeness value, the more central position in the network the node has, thus allowing to reach every other node in fewer steps. For every pair of vertices (v i , v j ) in graph G, there exists at least one shortest path between v i and v j with the number of edges on the path minimized. The betweenness of a vertex v k is the ratio of the number of such shortest paths that pass through v k to all such shortest paths. The higher the betweenness value of a node, the more nodes can be accessed through that node. Eventually, the eigenvector centrality is a measure of influence of a node in a network. Each node in a network obtains a relative score based on a concept that connections from the highscored nodes contribute to the node's score more than connections from the low-scoring ones. Therefore, a high eigenvector centrality value means that a node has more influence on the other nodes in the network.
These approaches tend to be used despite the fact that they require computational resources that are limited and that they fail in the delivery of an optimal seed set. Better results are obtained from solutions that are more sophisticated, like greedy-based selection, along with extensions it may have, however, the computational costs are substantial and it is not easily implemented on networks that are very complex [26].
Structural measures are used to improve optimization, so that nodes in the same network segments are not selected to allow a better seed allocation. These types of solutions are based more on better use of processes of natural diffusion and use sequential seeding [27], avoid nodes from within the same communities with intra connections that are close by using target communities [28], use dynamic rankings with sequential seeding [29] and use mechanisms for voting that have lower weights once activated nodes have been detected [30]. Apart from basic centrality measures, the central nodes in networks can be detected using a k-shell based approach [31]. Alternative approaches use bio-inspired algorithms to select the initial set of nodes [32].
The majority of earlier approaches are based on networks that are static, while the more recent studies account for networks that are dynamic. They have temporal characteristics and more reality-based specifics, as opposed to static snapshots [17]. The other research paths took base on the multi-layer networks and processes of information spreading that are intra layer [16]. There have been attempts to use other knowledge about the on-going processes to obtain better results by using adaptive seeding, even though the majority of solutions are geared to the seed set initiated processes [33]. Other approaches account for numerous campaigns that are on-going, as well as how they interrelate [34].
Unfortunately, the knowledge about the social network in which the campaign is going to be performed is often limited to some basic characteristics. Moreover, the number of nodes and edges in a real social network is often immense, which makes advanced simulations infeasible. The research takes into account simulations within synthetic networks to investigate phenomena within different network structures. While collecting information about real networks is difficult, synthetic networks based on theoretical models can be used. Moreover, the structure of synthetic networks can be adjusted during the generation process, thus allowing the researchers to perform a more profound analysis of the processes in complex networks. Simulation studies often use networks based on the free-scale model proposed by Barabasi-Albert (BA) [35], small world model proposed by Watts-Strogatz (WS) [36] and random graph model introduced by Erdos-Renyi (ER) [37]. For example [38] used WS, ER and BA synthetic networks for modeling interacting processes, [12] analyzed the role of structures of ER, BA, and WS networks on campaign performance, [39] proposed a framework to analyze multiple spreading processes and verified it with the use of BA networks, [40] used WS networks for cooperative epidemic modeling.
The characteristics BA and WS theoretical models are close to real systems. The Barabasi-Albert network model was first created in 1999, as a result of a study of the at-the-time structure of the WWW. The construction of BA networks is based on two complementary mechanisms: network growth and the mechanism of preferential attachment. The BA model is similar to several natural and human-made systems, such as the Internet, WWW, citation networks or social networks, to name just a few, where several selected nodes (hubs) have unusually high degree compared to the remaining nodes of the network. Fig 1a presents an example of a Barabasi-Albert network and the chart on Fig 1d depicts the degree distribution of a sample BA model. The Erdos-Renyi network model was first described in 1959 and is constructed on the assumptions that at first the number N of nodes is defined and, subsequently, from all ð N 2 Þ pairs of nodes, random E pairs are selected between which the edges are created. A sample ER model and degree distribution of a sample ER model is presented on Fig 1b and 1e respectively. The ER model offers a simple and powerful model with many applications, but might be inappropriate for modeling some real-life phenomena, due to the fact it does not generate local clustering of nodes. Therefore, in 1998 the Watts-Strogatz model was created to addresses this issue. The WS model accounts for clustering, but keeps the short average path lengths from the ER model.

MCDA foundations of the proposed approach
In case of a viral marketing campaign in a social network, the ordering party might be interested not only in maximizing the coverage of the campaign, but also in affecting its dynamics, as well as keeping the campaign cost within a reasonable budget. All these aspects need to be considered before launching the campaign. Therefore, planning a viral marketing campaign in social network is a multi-criteria problem, which can be presented as (2) [41]: where A is a finite set of possible campaign strategies {a 1 , a 2 , . . ., a n }, whereas {c 1 (�), c 2 (�), . . ., c k (�)} is a set of evaluation criteria. Some of the criteria might be maximized and others minimized. The criterial performance of each strategy regarding each criterion can be expressed in a form of a performance table. Intuitively, it is expected from the decision maker (DM) to identify the strategy that optimizes all criteria. However, usually there exists no alternative that optimizes all criteria simultaneously. Let us consider an example viral marketing campaign, for which multiple alternative strategies were prepared. The strategies are characterized by three criteria: seeding fraction, propagation probability and the potential coverage that can be obtained. The coverage is a very important criterion, however, generally a strategy that obtains 100% coverage is not always chosen, as it would require infecting a massive number of initial seeds in the network or providing multiple incentives to increase the propagation probability in the network. On the other hand, if a strategy with minimal seeding fraction and minimal propagation probability is chosen, it cannot be expected to cover the complete network. Therefore, a compromise solution between the strategies should be chosen. Multi-criteria decision support for planning and evaluation of performance of viral marketing campaigns It is important to note that the solution to a multi-criteria problem depends not only on the criterial performance of each alternative, but also on the campaign ordering party itself. There is no absolute best strategy for all campaigns and the best compromise strategy depends on the preferences of the DM.
Three natural dominance relations can be associated to a decision problem of the multi-criteria nature presented in (2): indifference, preference and incomparability. Let us consider two alternatives a and b. If for every criterion c i a is as good as b, then the two strategies are indifferent (aIb). If for every criterion c j a is as good or equal to b and there exists at least a single criterion c k for which a is better than b, then a is preferred to b (aPb).
Finally, if there is a criterion c m for which a is better than b, but there also exists a criterion c n for which b is better than a, then the two strategies are incomparable (aRb). Strategies which are best at each criterion are rare and, therefore, usually most strategies are incomparable without additional information from the campaign ordering party. This information can include inter alia the weights expressing the relative importance of each criterion or preferences associated to each pairwise comparison of strategies when each criterion is considered on its own [41].
Multiple multi-criteria decision analysis methods have been invented in order to reduce the number of incomparabilities (R) in the decision graph between the considered viral marketing campaign strategies. The MCDA attempts to handle this task can be generally divided into two approaches, the so-called American and European MCDA schools. The former is based on aggregating all the decision-making problem criteria into a single criterion-a utility function. Such approach has the benefit of providing the possibility to produce a complete ranking of strategies with a precise score given to each one. However, such approach largely transforms the structure of the decision problem. On the other hand, Roy [42,43] proposed to construct outranking relations by enriching the dominance relation between the alternatives where possible. In such an approach (European MCDA school), not all incomparabilities are eliminated, however, a reliable selection of the best alternative is possible.
Presently, the literature review allows to observe a number of approaches (MCDA methods) based on the above American and European approaches [44]. Discussions about the up-todate MCDA methods can be found inter alia in [45,46]. The AHP, TOPSIS and Electre methods are often indicated as popular and widely used in the problems of evaluation and ranking creation [47,48]. The selection of the aggregation technique (the utilized MCDA method) may influence the quality of the obtained modeling results [49][50][51][52] and requires justification in the context of the modeling aspects adapted in the paper [53,54].
When analyzing the characteristics of the data and the environment of the constructed MCDA model, it should be noted that the input data of the model has a quantitative character and is expressed on the cardinal scale. It was decided that the weights of the individual criteria should be taken into account and that they should be expressed using explicitly specified numerical values [55]. Thus, the result is expected to be expressed on a quantitative scale [53]. The modeling process also assumed the natural imprecision of the preference information, which in practice, in MCDA methods, takes the form of complex preferential functions (e.g. pseudo-criteria) [42]. Additionally, the construction and usage (decision problematic) of the model, should result in a ranking of variants (strategies) [43]. The obtained ranking is expected to provide a complete order of the strategies [56]. None of the popular MCDA methods (AHP, TOPSIS, Electre) meet all of the indicated requirements at once. Based on a set of 56 MCDA methods discussed in [45] and the MCDA taxonomy contained there, it is easy to show that these assumptions are fully implemented only by the PROMETHEE II method. Therefore, it was decided to use this method in the next steps of the data analysis.
PROMETHEE is a family of MCDA methods that use pairwise comparison and outranking flows to produce a ranking of best decision variants [57]. The weights expressing the relative importance of each criterion need to be specified by the decision maker. This is a complicated process based on the DM's priorities and perceptions. The actual values of the criteria weights can be freely selected by the campaign ordering party. Fortunately, multi-criteria decision analysis provides tools such as sensitivity and robustness analyses, which allow to verify the effects the chosen values have on the resulting rankings and to sequentially adjust the weights.
When the PROMETHEE methods are used for viral marketing campaign strategy selection, all strategies are compared pairwise. The preference for one alternative over another is studied under each criterion. For a small difference d in evaluations of the two compared strategies, a small preference P would be assigned to the better one, or no preference at all, if the difference is negligible. On the other hand, the larger the difference d between alternatives, the higher the preference P. The preference P between strategies a and b under criterion c j is expressed as real numbers P j (a, b) and is in the range between 0 and 1. The actual preference value assigned depends on the preference function in the DM's brain. The authors of the PROMETHEE methods propose six preference functions to express the preference function of the DM: usual criterion, U-shape criterion, V-shape criterion, level criterion, V-shape with indifference criterion, Gaussian criterion (see Fig 2 and Table 1) [41,43,58]. The three variables presented in Table 1, i.e. p, q and s require an explanation. The p preference threshold is the smallest difference between two alternatives that would result in a full preference of one alternative over the other. The q indifference threshold is the largest difference between two alternatives that the DM considers negligible. Parameter s denotes the inflection point of the Gaussian preference function and should be selected as a value between q and p. It was decided in the process of preference modeling to use two of the six preference functions: V-shape and Gaussian. Their choice was dictated by the possibility of including the natural imprecision of the preference information of the decision maker into the modeling process [41,43]. The form of the V-shape function, being the most complex structure of the preference function, is based on the concepts of strong and weak preferences, as well as indifference [43]. It is directly related to the concept of a pseudo-criterion, i.e. criterial function with the thresholds of indifference and weak and strong preference. This, in contrast to pre-criterion, quasicriterion and real criterion, allows to effectively model the areas of information uncertainty of the decision-maker and, consequently, also to examine the robustness of the model in a wider scope, instead of simply generating a ranking [53]. Complementary, for comparative purposes, the Gaussian form was used as the second preferential function. In this research, contrary to the classic tasks of the MCDA methods, where only a small number of variants is being ordered [58], the obtained sample is fairly complex. It was assumed that the distribution of preferences is based on the Gauss function and, thus, reflects the normal distribution. Such representation allows to build, based on the given sample, a softer form of the preferential function (when compared to the linear form of the V-shape function). It is worth noting that this form of the preference function is based exclusively on the concepts of weak preference and indifference.
For each viral marketing campaign strategy, an aggregated preference index can be computed with the formula (3): where w j denotes the weight assigned to the C j criterion. π(a, b) * 0 implies a weak global preference, whereas π(a, b) * 1 implies a strong global preference of a over b.
The obtained indices are then used to calculate the positive and negative outranking flows with (4) and (5) [41]: The ϕ + (a) value indicates the strength of alternative a, i.e. how well it is outranking other alternatives. On the other hand, the ϕ − (a) value represents how the alternative a is outranked by other alternatives, thus showing its weakness. The PROMETHEE I method uses the ϕ + and  ϕ − values to produce a partial ranking of the alternatives [41]. The usage of the PROMEHTEE II method, in turn, would allow to obtain the complete ranking of the campaign strategies based on the net outranking flow (6):

a) usual b) U-shape c) V-shape
If the criterion c j is given the weight of 100% while the rest of the criteria is given the weight of 0%, a single criterion net flow for each strategy a is obtained: ϕ j (a). When all single criterion net flows for all k criteria and n strategies are known, then all strategies can be represented as points in a k-dimensional space. Since the decision problems usually consist of more than two criteria, the n points from the k-dimensional space need to be projected to a plane.
The family of PROMETHEE methods comes with the GAIA visual modeling. A GAIA plane is a plane on which the alternatives and the criteria unit vectors are projected, for which as much information as possible is preserved after projection. The quantity of information that was preserved by the projection is denoted as δ. The GAIA plane can successfully support the decision problem analysis if δ >= 50%. The GAIA plane allows the analyst to learn the information about the criteria and alternatives in a decision problem [41]: • the length of the criterion vector on the plane represents how discriminating the criterion is.
The longer the vector, the more effect the criterion has on the final decision; • the vectors of criteria similar in terms of preference are pointed in similar directions; • the vectors of criteria conflicting in terms of preference are pointed in opposite directions; • the vectors of criteria not related to each other in terms of preference are pointed orthogonally; • alternatives with similar performance are grouped closely together; • alternatives supported by a particular criterion are located in the direction pointed by this criterion's vector.
An example GAIA plane for a sample viral marketing strategy campaign selection problem with four possible strategies and three criteria (c 1 -maximization of coverage, c 2 -minimization of the number of iterations, c 3 -minimization of the seeding fraction) is presented on Fig  3. The analysis of the example allows to observe that, since the c 1 and c 3 vectors are pointing in opposite directions, the preference for maximizing the coverage is in conflict with the preference for minimizing the seeding fraction. On the other hand, the c 2 's orthogonal direction compared to c 1 and c 3 implies that the preference for minimization the number of iterations is not related to the preference for maximizing the coverage or minimizing the seeding fraction. The lengths of the criteria's vectors suggest that the maximization of coverage (c 1 ) is most discriminating in this decision problem. The alternatives a 1 and a 4 are grouped together at the location pointed by the vector c 1 , therefore, it can be assumed that they both are similar to each other and mostly supported by the coverage maximization criterion. On the other hand, they are located in a direction opposite to the direction of the c 3 vector, which implies that they fail to use a small seeding fraction.

Conceptual framework and evaluation criteria
The selection of the best strategy for a viral marketing campaign in social networks is a complex decision-making problem based on multiple criteria. Moreover, running simulations on a real network is most often time consuming and sometimes impossible. Therefore, in the presented approach (see Fig 4) the authors propose to run the planning process on a synthetic model, which has similar properties to the target real network, yet allows to perform multiple simulations resulting in data for the performance table as an input to the strategy evaluation process.
In order to obtain such a synthetic model resembling the target real network, the authors propose to generate a set of BA, ER and WS networks with various parameters and number of nodes equal to 10%, 20%, . . . 100% of the real network. Subsequently, KullbackÔ Ç ôLeibler divergence [59] can be used to learn which of the generated networks is closest to the real one. Criteria affecting the complexity of simulations, such as number of nodes and edges, can also be considered. Additional criteria can also be added to the decision process, depending on the analyst's needs. When the performance table is created for all synthetic networks and all criteria, the most preferred one should be selected based on the gathered data with the use of MCDA methods.
The next element of the proposed framework is the decision model structuring process, during which the decision criteria for evaluating the possible campaign strategies are chosen. In the authors' approach, the criteria can be divided into two groups. The first group contains the input criteria for constructing a strategy-Par1, Par2, . . ., Parm. The second group contains the strategy performance evaluation criteria Eff1, Eff2, . . ., Effn, whose values are based on achieved effects and can be obtained by simulating each strategy. Nevertheless, the proposed framework assumes the decision-maker's freedom in selection of the decision criteria and in grouping them into clusters, depending on the campaign ordering party requirements.
After the criteria have been selected and the decision model has been structured, the chosen synthetic network should be used to perform a complete set of simulations required to obtain the performance table containing the evaluations of each strategy. For simulations independent cascades model (IC) [26] was implemented. Information spreading process is initiated by a set of nodes activated by seeding. Spreading is based on propagation probability PP(a, b) that node a activates node b in the step t + 1 under condition that node a was activated at time t by other node or was selected as a seed [60]. The main reason for selecting this model was a relatively small number of seeds needed to induce diffusion what can be important for small networks. In linear threshold model (LT) small number of activated nodes would have no effect [26].
Subsequently, the obtained performance table is used to perform MCDA analysis of the possible strategies with the use of PROMETHEE II. The analysis includes the following aspects: • generation of a complete ranking of the viral marketing campaign strategies, based on various preference functions; • usage of the GAIA plane to verify how each criterion affects the strategy selection; • performing sensitivity analysis to verify the stability intervals of the rankings of the leading campaign strategies.
It should be noted that during the analysis, the preference modeling step is repeated multiple times. The initial preference weights of the criteria can be subsequently modified to verify the robustness of the obtained strategy selection problem solution. Eventually, the analyst provides the recommendation which strategy, i.e. campaign parameters, should be used to run the campaign on the target real network. Last, but not least, the authors' proposed framework can also be used to monitor the results of the executed campaign, as well as to perform a multi-criteria evaluation of the campaign strategies on a real network.
The conceptual framework proposes a model in a generalized form with Par1,Par2,. . ., Parm criteria related to campaign settings treated as input variables and Eff1,Eff2,. . .,Effn evaluation metrics related to campaigns results understood as output variables. The generalized model can be parametrized for campaigns with different mechanics, used strategies and goals. Like it was discussed in [19], criteria can vary across different campaigns, sectors, strategies and available resources. Therefore, a set of multi-criteria decisions needs to be made at the planning stage of the viral marketing campaign, based on the campaign objectives and available budget. It leads us to framework verification with the use of parameters mapping initial viral campaigns parameters and settings into the parameters of the simulation model. For the model validation we propose a set of criteria discussed in earlier studies and the specifics of the used simulation with the independent cascade model and the agent based approach presented in Table 2.

The number of initial customers and seeding fraction (Par1).
The process of information spreading in viral marketing campaign in social networks is initialized with seeding the advertising content to a group of people (initial set of nodes). The fraction of nodes that are selected from the network for seeding can be adjusted according to the campaign objectives and it is affecting the dynamics and coverage of the process. The earlier research usually uses fixed ranges of seeding percentage as a parameter [7]. The activation of the initial seeds is recognized as the main cost of viral marketing campaigns [61]. The cost can be growing for highly influential nodes, while they attract high attention from marketers and the users from their direct and indirect connections. Other research focuses on minimization of the seed set to reduce the initial costs with probabilistic coverage guarantee [62]. The cost-effectiveness can decrease when more nodes are added to the seeding [63]. If too many users are targeted, an overexposure effect takes place [64]. While the activation of large fraction of network nodes as the seed set can result in high number of nodes reached within the network, it requires high activation costs. The goal can be to use the smallest possible seed set delivering satisfactory results [65] [62]. To include the above factors, we define criterion Par1 representing the fraction of nodes used as the initial seeds denoted as seeding fraction (SF) within the simulation model.

Spreading the content and propagation probability (Par2).
In order to motivate the network members to pass the information further, some financial investments need to be made. As a result, the propagation probability is directly related to the campaign costs. From the practical point of view, the propagation probability can be increased with coupons and other incentives [66]. Authors discuss the role of incentives for increasing the camping dynamics and the costs of incentives is related to the degree of the target nodes [67]. The proposed approach minimizes the cost while guarantees the number of reached users. One of the strategies is enforcing the propagation dynamics without the use of additional seeds and users with high centrality measures which are expensive to reach [61]. Activation of early adopters and increasing their propagation probabilities may require higher incentives [68]. Multi-scale incentives can be used for users from different target groups to further boost the diffusion rate [69]. The top influential nodes, such as a popular user, may require more incentives to be recruited as a Seed [70]. To generalize the aforementioned factors, we use Par2 as the main result of the increased motivation and propagation probability (PP) during the simulations.

Selection of initial customers and nodes ranking method (Par3).
The nodes for seeding are selected based on their ranks computed from various centrality measures, such as degree, betweenness, closeness or eigenvector centrality. Each centrality measure requires some level of effort, indirectly related to a third kind of cost. Intuitively, if the seeding fraction is high and the network members are motivated to increase their propagation probability, the process of information diffusion should execute dynamically and achieve high network coverage. However, the budget for the campaign can be limited. Moreover, the aim of the campaign might not be to achieve high coverage very fast, but to keep the campaign slowly crawling for a longer amount of time. Computational cost of choosing seeds was analyzed earlier in relation to greedy algorithm [71]. Another study discusses computational costs and propose upperbound estimation based algorithm to accelerate the computing speed [72]. Authors with the same approximation ratio like greedy algorithm [26]. The authors of [73] emphasize that the earlier approaches use impractical assumptions that any seed user can be acquired with the same cost and the same is the benefit obtained when influencing each user. Study [73] proposes cost-aware targeted viral marketing focused mainly on of selecting a node. Costs may represent the degree of difficulty with which people accept specific information [74]. From the perspective of network analysis, the centrality measures like page rank can represent costs because they are usually proportional to social influence. They can be used for mapping the corresponding cost values to all users in a given social network. Positive correlation between degree centrality and the success of viral marketing is observed [7]. To include above factors within the model, we assume different costs for different rankings methods. Nodes selection costs are represented by parameter Par3 within the model. For example they are lower for degree and higher for betweenness computations during simulations.

Campaign duration and number of simulation steps (Eff4).
While Par1-Par3 are related to the model inputs and key factors affecting campaign performance like number of initial customers, budges, incentives and other forms of customer motivation, the proposed approach assumes monitoring of the campaign effects and assigning to them the preferences of the decision maker. The campaign cognitive goals can be based on reach, awareness and knowledge, behavioral goals are represented by number of actions and rate at which creatives are transferred [19]. From the perspective of the decision maker, the time when the assumed number of messages is received can be crucial. One of the goals can be minimizing the time in which assumed coverage is achieved [75]. Other authors emphasize the velocity and the speed of transmission, persistence and mental barriers [20]. Another study minimizes the complete influence time with cost represented by a fuzzy variable [21]. The role of time was emphasized in terms of campaigns with limited time (eg. political campaigns) [76]. In the proposed model, the duration of the campaign is represented by evaluation criteria Eff4 and (in simulations) as the number of simulation steps until the process is finished.

Campaign coverage and the total number of activated nodes (Eff5).
Another measured result is related to the network coverage and is represented by criterion five (Eff5). It is the most common effect taken into consideration. Most of research focuses on maximizing reach and number of infected nodes and is treated as the main goal of a campaign [18]. The ability to reach a large number of customers with limited advertising budget is the key feature of the viral marketing [2]. From the perspective of algorithms, total coverage is the key evaluation factor used for influence maximization problem and seed selection algorithms evaluation [26]. It is represented by the number of activated nodes within the network used during simulations.

Planning a viral marketing campaign with the use of synthetic networks
The empirical study has been divided into two subsections-planning and evaluation. In the former, a substantially smaller synthetic network was chosen to facilitate the planning of a viral marketing campaign. In the latter, an evaluation of marketing strategies in a real network [77] was presented.

Synthetic network selection
In the empirical research, the authors used the proposed framework to plan a viral marketing campaign for a real network [77]. The real network is based on 7610 nodes, 15751 edges with average values of main metrics: total degree D = 4.14, closeness C = 0.0004, PageRank PR = 0.0001, EigenVector EV = 0.003, clustering coefficient CC = 0.49 and betweenness B = 13478.93. The degree distribution of the network is presented on the chart on Fig 5a. In order to approximate the real network, a set of 150 synthetic networks was generated. This set was built by combining three network models (BA, ER and WS) with the following parameters: percentage of nodes of the real network-10%, 20%, . . ., 100%, i.e. 761, 1522, . . ., 7610 nodes; out-degree parameters with values 1,2,3,4 and 5 for the BA networks; number of edges in graph equal to 100%, 200%, 300%, 400% and 500% for the ER networks and neighborhood within which the vertices of the lattice will be connected with values 1,2,3,4,5 with rewiring probability 0.5 for the WS networks.
Kullback-Leibler divergence (KLD) [59] was used to evaluate the similarity of the synthetic networks to the real network. The results are visually presented on  network, regardless of the size of the network or parameters selected. On the other hand, in case of the BA networks the similarity to the real network depends on the number of nodes and parameters chosen. The more nodes and edges, the closer its degree distribution is to the degree distribution of the real network. The lowest KLD value was observed for the BA network with 7610 nodes and 38035 edges. However, selection of such a network provides little computational benefit compared to the original real network. Therefore, the actual network for the campaign planning was selected with the use of MCDA analysis of all the 150 potential synthetic networks based on the following criteria: K1-number of nodes, K2-number of edges and K3-KLD value; with the preference of the minimum values for all criteria and equal weights of all the criteria. As the result of the analysis, the BA network containing 10% nodes of the original network (761) and 3034 edges with network metrics degree D = 7.97, closeness C = 0.3211, PageRank PR = 0.0013, EigenVector EV = 0.086, clustering coefficient CC = 0.040 and betweenness B = 812.03 was selected. The degree distribution of the selected synthetic network is presented on the chart on Fig 5b.

Overview of the simulations in the synthetic network
In order to ascertain repeatability of the results regardless of the simulated parameters, ten simulation scenarios were generated, in which for each node a random value from the range of < 0, 1 > was assigned. This value was later used in the simulations to decide if the particular node passes the information through (the drawn value was smaller than the simulated propagation probability) or if the propagation stops (the drawn value was higher than the simulated propagation probability).
During the simulation stage, a total of 400 sets of parameters was tested, as a Cartesian product of the following parameter values: Each simulation was repeated for all 10 scenarios, thus resulting in 4000 simulation runs. After each simulation run, the iteration of the last infection, as well as the achieved coverage was registered. Their averaged values were then saved as the empirically measured performance values of the Eff4 and Eff5 criteria.

PROMETHEE II analysis
After the simulations finished, the output from subsection 3.2 was used to create performance tables for the PROMETHEE II analysis. Initially, a V-shape preference function was used to model the comparison preferences, with the indifference threshold q = 0 (no uncertainty taken into consideration) and preference threshold p equal to the standard deviation value for each criterion Par1-Eff5. The preference direction for the cost criteria Par1-Par3 were minimized and for the dynamics and coverage criteria Eff4-Eff5 maximized. Initially, every criterion was assumed to be equally important and, therefore, the weights of all criteria were set to 1 (see Table 3a).
The first 10 strategies from the ranking obtained with the PROMETHEE II method are presented in Table 4 and on Fig 7a. It can be observed, that all strategies from the list were based on the fastest (and therefore cheapest) degree measure. The leading alternatives A9 and A13, having a difference between their ϕ values equal to only 0.003 are very similar. Both use the smallest possible seeding factor of 0.01, whereas the propagation probability is equal to 0.2 and 0.3 for A9 and A13 respectively. As a result of the strategy A9, the campaign took averagely 10.4 iterations and covered 58.37% of the network, whereas for the strategy A13, the campaign was more dynamic (averagely 7.9 iterations) and covered more network (averagely 81.13%). The A17 strategy, ranked third, also uses the seeding factor of 0.01, but the propagation probability was increased to 0.4. It can be observed, that while the process averagely took 7 iterations, the coverage increased intensely to the level of 91.83%. The A5 strategy, ranked 4, uses the computationally cheapest parameters-seeding factor and average propagation probability set to 0.01 and the degree measure used. While the propagation process averagely lasted long, i.e. 9.7 iterations, the obtained coverage was merely 16.89% on average. It can be observed, that the alternatives A17 and A57, ranked 3 and 7 respectively, obtained the same coverage, with equal propagation probability and very similar dynamics of the process. However, in case of A17 the seeding fraction was equal to 0.01 and in case of A57 it was twice as much, i.e. 0.02, for which fact the latter was penalized in the overall ranking. The observation of Fig 7a allows to observe that for the best 10 strategies, the coverage grew along with the propagation probability, but that caused shortening of the process due to its high dynamics.

GAIA analysis
The basic PROMETHEE II analysis was followed by the GAIA analysis, which allows to study the relations between criteria, as well as shows which criteria support which strategies. A set of GAIA planes for the PROMETHEE decision problem specified in subsection 3.3 is presented on Fig 8. Fig 8a represents   It can be noticed from Fig 8 that the lengths of all criteria vectors is similar, which confirms their similar importance in the evaluation. If one of the vectors was significantly longer, it would mean that the related criterion is more discriminating. The layout of the vectors' directions allow to note that the cost criterion of the average propagation probability (Par2) is in strong conflict with the average coverage criterion (Eff5) in terms of preference. This confirms an intuitive thesis that the preference for reducing the cost of motivating the members of the social network for passing the seeded information further is in conflict with the preference for maximizing the achieved network coverage. On the other hand, all the cost criteria Par1-Par3 are perpendicular to each other, which means they are generally not related in terms of preference. Last, but not least, the vector representing criterion Eff4 (average last infection iteration) is slightly angled in the direction of both the the vectors representing criterion Par1 (seeding fraction cost) and Par2 (average propagation probability). This means, that the preference for increasing the duration of the campaign may result in selecting similar strategies to the ones in case of the preference for minimizing the costs related to the seeding fraction or the average propagation probability. One of the advantages of the PROMETHEE methods is their ability to aggregate data into groups and clusters. Fig 8c provides the GAIA analysis for a scenario where the three cost criteria Par1-Par3 were aggregated into a single Cost group, Eff4 into Dynamics group and Eff5 into Coverage group. The δ for Fig 8c is very high (90.9%), proving this projection to be very reliable. The analysis of this figure allows to confirm the strong conflict between the Coverage maximization group preference with the Cost minimization preference, however, there is no clear relation between the preferences for the Dynamics and Coverage maximization groups.

Sensitivity analysis
In the scenario analyzed in subsection 3.3, the weights of all criteria were equally set to 1. However, with such a dense grid of alternatives as presented on Fig 8a, it is easy to anticipate that if

Fig 7. Visual representation of the 10 best strategies in PROMETHEE II rankings for the synthetic (a-c) and real (d-f) networks, based on V-shape preference function with no indifference threshold (a,d), V-shape preference function with indifference threshold (b,e) and Gaussian preference function (c-f).
https://doi.org/10.1371/journal.pone.0209372.g007 these preference weights were to change, the ranking of the best strategies would change. For this very reason, MCDA methods provide a tool called sensitivity analysis, which allows to verify the stability of the ranking and to learn how the positions of the alternatives would change if the change in preferences would occur. The results of the performed sensitivity analysis are presented in Tables 5 and 6 for grouped and individual criteria respectively.
In case of the grouped criteria (Table 5), the initial weights were 60% for the Cost group and 20% each for the Dynamics and Coverage groups. A very wide stability interval, equal to 80.10% can be observed for the Dynamics group, which means that the weight of this criterion can be largely increased and the leading alternative would not change its position. On the other hand, if the weight of the Dynamics group dropped by as little as 0.10%, a change in the ranking leader would occur. Much narrower stability interval is observed for the Cost and  Coverage criteria groups, equal to 15.33% and 9.92% respectively. There is possibility to increase the Cost's weight or decrease the Coverage's weight slightly without the change of the ranking leader. If the individual criteria Par1-Eff5 are taken into consideration, wide stability intervals can be observed for the first rank, and, therefore, the ranking stability was also performed for ranks 2 and 3. No changes in the weight of Par1 can cause a change in the rank of the first and second alternative. Only if the Par1 weight drops to below 15.24%, the strategy on rank 3 would be replaced. Similarly, the stability interval for the first three ranks of the Par3 criterion is equal to 95.01%. In case of the Eff4 and Eff5 criteria, the stability intervals for the first rank are the same as in Table 5. For the two leading ranks, the stability interval drops from 80.10% and 9.92% to 11.27% and 3.85% for Eff4 and Eff5 respectively. To sum up, the sensitivity analysis allows to notice that the criteria Par2, Eff4 and Eff5 are most discriminating to the final rankings of the viral advertising strategies for the selected synthetic network.

Uncertainty analysis
The strategies' evaluations in subsections 3.3 and 3.5 were based on certain data, i.e. on situations that the analyst was always able to specify their preference of one strategy over another regarding to individual criteria. However, such differentiation might not always be possible, especially if the difference between the criteria evaluation values are negligibly small. Therefore, in the subsequent section of the analysis, an uncertainty analysis was performed with the use of the PROMETHEE II method. The evaluation model from subsection 3.3 was modified to use the V-shape preference function with indifference area. Therefore, apart from the p threshold, which remained unchanged, the values of the q indifference threshold were set to 50% of the standard deviation value for each criterion Par1-Eff5.
The results of the PROMETHEE II analysis with uncertainty taken into account are presented in Table 4b. The analysis of the results allow to observe that strategy A13 overranked the previously leading strategy A9, whereas strategy A17, previously ranked 3, obtained position 4, while position 3 was taken by the strategy 57, previously ranked 5. Strategy A49 advanced from rank 6 to rank 5, whereas the rank of strategy A5 was reduced from 4 to 6. The rank of the strategies A57 and A89 was reduced by one, i.e. from 7 and 8 to 8 and 9 respectively. Two strategies A93 and A133, previously outside the set of the top ten strategies, advanced to ranks 7 and 10 respectively, when uncertainty was taken into consideration. The 10 leading strategies are presented on the chart on Fig 7b.

Gaussian preference function
In the final step of the PROMETHEE II analysis, the sharp V-shaped preference function was replaced with a much softer Gaussian preference function, with its s parameter equal to the mean value of each criterion Par1-Eff5. The obtained ranking varies substantially from the ones obtained in subsections 3.3 and 3.6. The first visible difference is that only four of the strategies from the original ranking remained in the top ten positions of the new ranking: A9, A13, A5 and A49 on positions 1, 7, 8 and 5 respectively (see Table 4c and Fig 7c). The remaining ranks were were distributed between strategies based on different centrality measures: closeness (A11, A51), eigenvector centrality (A12, A52, A16) and betweenness (A10). The sensitivity analysis of the newly obtained ranking is presented in Table 7. The analysis of the table allows to observe that Par1 is the least and Eff5 is the most discriminating criterion to the final ranking.

Overview of the simulations in the real network
In the second stage of the empirical research, the proposed framework was used to evaluate the viral marketing campaign strategies within a real network [77]. Similarly to the study in subsection 3.2, ten simulation scenarios were generated for the network, in order to ascertain the repeatability of the results regardless of the simulated parameters, as well as 400 various sets of parameters for the criteria Par1-Par3 as in subsection 3.2 were tested in the simulations. In case of the real network, the values of the Par3 criterion were as follows: degree [0.0200], betweenness [3.6900], closeness [2.1200], eigenvector centrality [0.030]. The output of the simulations was used to obtain the average performance values for Eff4 and Eff5 criteria for each of the 400 sets of parameters.

PROMETHEE II analysis
The results of simulations from subsection 4.1 were used to build the performance table for the PROMETHEE II analysis. As in section 3.3, V-shape preference function was used for modeling the comparison preferences, with the indifference threshold q = 0 and the preference threshold p equal to the standard deviation value for each criterion Par1-Eff5. The direction of the preference functions were as in subsection 3.3, to allow comparison of the results on the synthetic network and the real network (see Table 8a). The ten strategies which ranked highest are presented in Table 9a and Fig 7d. The analysis of Table 9a allows to observe that similarly to the strategies obtained on the synthetic network, all 10 best strategies are based on the cheapest ranking measure, i.e. degree. The leading strategy A17 is based on a low seeding fraction (0.1), and mediocre propagation probability (0.4), which leads to very long process (16.2 iterations), but mediocre coverage (43.74%). The strategies A13 and A21 obtained a very close ϕ values, 0.4694 and 0.4645 respectively. The analysis of 7 shows that whilst having the same seeding fraction (0.1), they differ in the propagation probability and obtained duration and coverage. The higher-ranked strategy A13 uses lower propagation probability (0.3 compared to 0.5) and lasts longer (averagely 15.3 iterations compared to 13.8 iterations), but results in much lower coverage (30.92% compared to 53.73%). A further analysis of Fig 7d allows to observe that for the 10 leading strategies, increasing the propagation probability results in the increase of the coverage and in the reduction of the count of the propagation process iterations.
It can be observed that six of the strategies, i.e. A17, A13, A21, A57, A9 and A53 also occurred on the top 10 positions of the ranking obtained from the synthetic network in subsection 3.3. A further comparison of the results allows to note that the both rankings of strategies are highly correlated, with the Pearson correlation coefficient equal to 0.7589. The high correlation can be visually confirmed on the chart on Fig 9. The chart shows the positions of each strategy obtained in the ranking based on the synthetic network (x axis) and on the real network (y axis). The closer the strategy is plotted to the diagonal line on the chart, the more similar was the rank of the strategy in each ranking. The analysis of the figure clearly shows a similarity of the evaluations of strategies in both rankings.

GAIA analysis
The basic PROMETHEE II analysis was followed by the GAIA analysis. A set of GAIA planes for the PROMETHEE decision problem specified in subsection 4.2 is presented on The analysis of Fig 10b shows that again the preference for the Par2 criterion (average propagation probability) is in strong conflict with the preference for the Eff5 criterion (average coverage). On the other hand, the vectors for criteria Par1, Par3 and Eff4 are pointing similar directions which indicates similarity in preference of these three criteria. In contrast to what was observed in subsection 3.4, the vectors of Par1 and Par3 criteria (seeding fraction and ranking generation measure) are no longer perpendicular to each other. Instead, they point the same direction, thus demonstrating a similarity in preference of reducing the cost of both these criteria. This can be caused by a considerably higher value of the standard deviation for the values of the criterion Par3 for the real network compared to the synthetic network. Moreover, it is significant to keep in mind that the action of generating the rankings of network nodes before seeding is strongly related to the action of seeding limited fraction of the best nodes from such obtained ranking. When the criteria Par1-Eff5 are aggregated into three groups again, i.e. Cost, Dynamics and Coverage (see Fig 10c with δ = 94.0%), a very similar relation of the groups to the one obtained in subsection 3.4 can be observed (compare with Fig 8c). There is a strong conflict between the Cost minimization and Coverage maximization criteria, but the Coverage and Dynamics maximization criteria are not highly related in terms of preference. This time, however, a very minute similarity between the preference for the Dynamics maximization and Cost minimization criteria can be observed.

Sensitivity analysis
Similar to subsection 3.5, a sensitivity analysis was performed also for the real network. The results of the performed analysis are presented in Tables 10 and 11 for grouped and individual criteria respectively. In case of the grouped criteria (Table 10), the initial weights were 60% for the Cost group and 20% each for the Dynamics and Coverage groups. The widest stability interval, equal to 55.99% can be observed for the Dynamics group, which means that the weight of this criterion can be decreased by 13% or increased by 42.99% and the leading alternative would not change its position. A narrower stability interval can be observed for the Cost criteria group, equal to 32.80%. The weight of this group can be increased by 9.97% or decreased by 22.83% without the change of the leader strategy. For the Coverage criteria groups, the stability interval is equal to 14.99% and its weight can be reduced to 12.86% or increased to 27.85% without a change of the leader strategy. When compared to the results obtained for the strategies based on the synthetic network (see Table 5), a 0.9292 correlation coefficient is obtained for the intervals. However, the Dynamics cluster interval is narrower, yet the Cost and Coverage cluster intervals are wider.
If the individual criteria Par1-Eff5 are taken into consideration, wide stability intervals can be observed for the first rank, and, therefore, the ranking stability was also performed for ranks 2 and 3. No changes in the weight of Par1 can cause a change in the rank of the first strategy, however, if the Par1 weight drops to below 11.58%, the strategy on rank 2 would be replaced. Similarly, the stability interval for the first three ranks of the Par3 criterion is equal to 91.01%. In case of the Eff4 and Eff5 criteria, the stability intervals for the first rank are the  same as in Table 10. If two leading ranks were considered instead of a single leading rank, the stability interval would drop from 55.99% and 14.99% to 49.18% and 20.74% for Eff4 and Eff5 respectively. Again, as in subsection 3.5, the sensitivity analysis allows to notice that the criteria Par2, Eff4 and Eff5 are most discriminating to the final rankings of the viral advertising strategies. When the results for the real network are compared with the results for the synthetic network, correlation indexes of 0.9217, 0.4928 and 0.3546 are obtained respectively for the stability intervals for up to 1, up to 2 and up to 3 leading strategies.

Uncertainty analysis
An uncertainty analysis similar to the one in subsection 3.6 was also performed for the real network. Again, the q indifference threshold for all criteria was set to 50% of their standard deviation values (see Table 8b). The results of the uncertainty analysis are presented in Table 9b. The analysis of the results allow to observe that even when the uncertainty is taken into account, strategy A17 remains the leading one for the real network. The order of the subsequent strategies changes from A13, A21 and A57 to A21, A56 and A13 on ranks 2 to 4. Strategies A9, A61 and A25 remained unchanged on positions 5 to 7. Strategy A53 was degraded from rank 8 to rank 9, whereas the rank 8 was given to the previously 10th strategy A97. When the uncertainty was taken into consideration, strategy A20, previously outside of the set of the top ten strategies, obtained rank 10. The ten leading strategies are presented on the chart on

Gaussian preference function
In the final step of the analysis, the sharp V-shaped preference function was replaced with the Gaussian preference function, with its s parameter equal to the mean value of each criterion Par1-Eff5. In contrast to the rankings from subsections 4.2 and 4.5, the obtained ranking contains mostly strategies based on the eigenvector centrality measure. The only three strategies based on the degree measure are A17, A13 and A21 (ranks 4, 8 and 10 respectively). The analysis of Fig 7f shows that the highest-appraised strategy A12 is based on a small value of seeding fraction (0.01 for Par1) and long duration (21 iterations in Eff4), however the obtained coverage is very small (14.87% for Eff5).
The results of the sensitivity analysis for the ranking is presented in Table 12. It can be observed, that the Par3 criterion is the least discriminating one when the Gaussian preference function is used and its weight can be vastly modified (up to 86.12%) without considerable changes in ranking. However, if the weight of the Eff5 criterion grew by 5.17%, the leading strategy in the ranking would change.

Conclusions
Recently, marketers have put more and more efforts to perceive the positive user experience within online platforms. While intrusive marketing techniques resulted in increased interest in software focused on blocking advertising content marketers, a new demand to focus on searching for more sustainable solutions appeared. The overall number of contacted customers can be less important than the real interest in products and a proper specification of the campaign intensity. In the area of viral marketing and information spreading processes, the highest attention was put on increasing campaign coverage with the use of seeding methods and techniques increasing the propagation probability like incentives and other ways to motivate customers to spread the content. The approach presented in this paper shows a framework based on multi-criteria decision support, targeted on planning and evaluation of marketing campaigns with different preferences and criteria taken into account. The results showed how multi-criteria evaluation of results can affect strategies, campaign parameters and allocated budgets. The presented approach makes it possible to perform an evaluation of different scenarios within simulated environment before the campaign within a real environment begins. The empirical study showed that the characteristics of information spreading processes within the network sample selected according to network measures' distributions are similar to those observed within a real network. Various scenarios can be tested without interaction with real environments.
During the empirical study, an example viral marketing campaign was planned for an actual real network. Based on the real network parameters, a corresponding synthetic model was selected. Preference modeling and a profound multi-objective decision analysis were performed, which resulted in the selection of the best strategy in the context of the previously modeled preferences.
The research has identified possible areas of improvement and future works. First of all, the decision support system utilized in the presented framework was based on a set of five criteria. This set can be expanded to provide more precise evaluations. Secondly, future works include a more detailed evaluation of the relations between the processes within real networks and theoretical structures generated with different parameters. Another direction can be a sampling of real networks and performing simulations on samples of real networks instead of theoretical network models.
Supporting information S1 File. Microsoft Excel spreadsheet containing partial results used during the research. (XLSX)