Manipulating concept spread using concept relationships

The propagation of concepts in a population of agents is a form of influence spread, which can be modelled as a cascade from a set of initially activated individuals. The study of such influence cascades, in particular the identification of influential individuals, has a wide range of applications including epidemic control, viral marketing and the study of social norms. In real-world environments there may be many concepts spreading and interacting. These interactions can affect the spread of a given concept, either boosting it and allowing it to spread further, or inhibiting it and limiting its capability to spread. Previous work does not consider how the interactions between concepts affect concept spread. Taking concept interactions into consideration allows for indirect concept manipulation, meaning that we can affect concepts we are not able to directly control. In this paper, we consider the problem of indirect concept manipulation, and propose heuristics for indirectly boosting or inhibiting concept spread in environments where concepts interact. We define a framework that allows for the interactions between any number of concepts to be represented, and present a heuristic that aims to identify important influence paths for a given target concept in order to manipulate its spread. We compare the performance of this heuristic, called maximum probable gain, against established heuristics for manipulating influence spread.


Introduction
In many environments it is possible for strategies, behaviours, knowledge or infections to spread within a population. The nature of propagation is determined by the interactions between individuals. Populations of autonomous entities are complex systems, meaning that the net effects of propagation are hard to predict or influence, despite being due to individual behaviour. Such propagation is a form of influence spread, which can be modelled as a cascade from a set of initially activated individuals [1].
Insight gained from understanding how to manipulate cascades in abstract populations has many applications, such as informing epidemic control, viral marketing, and understanding convention emergence in multi-agent systems. For example, characterising the spread of disease aids in identifying at-risk groups, enabling containment efforts to be focused to avoid wider spread. Understanding  through identifying influential individuals. Other applications include the adoption of behaviours. It is known that behaviours, such as drinking and smoking, are often adopted by those who socialise with individuals that already partake in such behaviours. Furthermore, it has been shown that close acquaintances giving up smoking can encourage an individual to quit [2]. It may therefore be possible to identify individuals who, if they quit smoking, would result in the largest number of others quitting.
Modelling such a decision process can also help an individual recognise when they are being manipulated. For example, consider a competitive business such as finance, where success depends largely on predicting others' behaviour. If a business can identify influential individuals and how their behaviours spread through a network, it may be able to recognise when such an individual is taking an action as a strategy to influence others to act similarly. Identifying this allows the business to avoid the manipulation tactic and make more informed decisions. The key enabler in all these examples is being able to identify the set of individuals who can help spread a behaviour, idea or product, or who can restrict future spreading (e.g. through their vaccination).
Several models have been developed to characterise influence spread [1,3], along with techniques to maximise spread [4]. These models represent a population as a network, with individuals corresponding to the nodes, and edges representing the existence of an influential relationship from one individual to another. Opinions, behaviours, knowledge or diseases can be represented as concepts that spread along the edges of a network, with a chance to infect the nodes they encounter. Existing models of influence spread typically assume that only a single concept exists, or that concepts block the spread of other concepts, preventing an individual from activating multiple concepts [5,6]. In real-world environments, individuals can have multiple concepts active simultaneously and, through concept interaction, a concept's ability to spread can be affected. As such, in this paper we remove the assumption that concepts are blocking, allowing for more complex concept interactions.
Previous work on the interactions between spreading concepts has focused on epidemics and how interactions between diseases may affect the epidemic threshold of a network [7]. However, there has been little consideration of using concept interactions to manipulate the spread of a given target concept in situations where we cannot directly control the spread of that target concept. Consider, for example, a disease spreading through a network. While it may not be possible to affect the disease directly, its spread can be inhibited through the spread of knowledge and the use of vaccination, both of which can also be modelled as concepts within a network. The interactions between such concepts may be positive or negative, aiding or hindering spread, respectively.
In previous work, we developed a heuristic that aims to maximise the spread of a chosen concept in an environment where concepts interact. This involved selecting seed nodes to avoid other, inhibiting, concepts while increasing the chance to encounter boosting concepts [8]. However, that work, along with most previous work on influence spread, assumed that we can select seed nodes for the target concept we wish to maximise. There has been little consideration of scenarios where this is not the case. When we cannot control the target concept that we aim to manipulate, we must instead indirectly affect its spread through a secondary controllable concept. It is this problem of indirect concept manipulation that is the focus of this paper.
Concept interactions facilitate the manipulation of a concept that cannot be directly controlled. Consider opinions spreading within a social network. It is extremely difficult to guarantee an individual's adoption of a chosen opinion; instead, we may consider exposing an individual to particular information or news stories that make them more, or less, likely to adopt that opinion themselves. This may be of use in encouraging loyalty to a given brand or promoting particular social habits within a group. If, for example, we wish to spread the opinion that smoking is detrimental to health, we may choose to expose certain individuals to information on the health risks of smoking, making them more likely to agree when a friend shares a negative opinion on smoking. The use of a secondary concept to boost the spread of a given target concept is known as the indirect influence maximisation problem.
Alternatively, we may wish to indirectly inhibit the spread of a detrimental concept such as a disease or rumour. A potential approach to minimising the spread of a target concept is to expose a selected group of individuals to a secondary inhibiting concept. This is known as the indirect influence limitation problem [9]. Previous investigations into minimising influence have focused on finding nodes present on a high number of shortest paths [10], or nodes that connect communities [11]. In environments where concepts are blocking, selecting such nodes to block concept spread prevents the undesirable concept from utilising the most influential network paths. However, when concepts are not blocking but instead merely lower the chance of spreading, such methods are less effective. If a node is on many shortest paths but is not near the start of the target concept cascade, it is unlikely to encounter the target concept, and so its selection is unlikely to help limit the concept's spread. As such, both a node's expected gain for the target concept and the likelihood that the node activates the target concept must be considered when evaluating its suitability for influence limitation.
In this paper, we investigate the use of a secondary concept to affect, positively or negatively, the spread of a target concept. We present a framework for representing concept interaction, and discuss the differences between indirect concept spread manipulation and the more widely studied influence maximisation problem. With those differences in mind, we propose the maximum probable gain (MPG) heuristic and evaluate its effectiveness in both the indirect influence maximisation and indirect influence limitation problems. A subset of our experiments for the influence limitation problem, namely the performance of MPG and other heuristics in small-world, scale-free and real-world networks for the ICM has been previously published in [9].
cascades are blocking, meaning that nodes activated by one cascade cannot be activated by another. Additionally, most formulations assume that an environment contains only two concepts, when in reality there could be many more, all of which interact.
If concepts are not blocking, then how they interact must be modelled. Sanz et al. developed a multi-layer network model in which each concept spreads on a different layer, but nodes can activate more than one concept at a time [7]. When multiple concepts are active on the same node, they can boost or inhibit each other's ability to spread. The idea of boosting and inhibiting the spread of a concept has been explored by others [9], including the effects this can have on concept interactions. Newman and Ferrario consider a model where an epidemic can only spread through nodes that have been infected by a previous epidemic [22]. In the work of Myers and Leskovec, the concepts an agent chooses to adopt are affected by all the concepts they have previously been exposed to (regardless of adoption) [23]. Such models typically focus on concepts that are similar to each other, such as diseases, but other factors, such as the information available on that disease or the availability of vaccinations, can also affect spread [10].
Related, but semantically different, concepts can also interact. Wang et al. demonstrated, with multi-layer networks, the relationship between a disease and information about that disease. As the disease propagates, it increases the spread of information about it, which in turn allows people to make informed decisions and decrease the spreading rate of the original disease [24]. Multi-layer networks can model multiple concepts by restricting each concept to its own layer, allowing for the modelling of transmission methods distinct to each concept [25]. For example, a concept spread by hand-to-hand contact will spread differently from one that utilises word-of-mouth. Multi-layer networks have also been utilised to facilitate the analysis of the effectiveness of introducing an immunising concept to a network that contains a spreading disease [26].
An individual's opinions can affect the concepts they adopt or spread. These opinions can be represented through network and node attributes, as in the adaptation of the LTM proposed by Kaur and He [27]. Each edge is associated with two separate influence strengths, representing a positive and a negative opinion, respectively. A node has both a positive and a negative threshold, and will activate the opinion whose corresponding threshold is exceeded first. Nodes with high positive influence can be selected to block the negative opinion from spreading further. Stitch et al. present a method of representing opinions by assigning an attitude score to individual nodes [28]. Nodes with a high attitude score are more likely to spread negative word-of-mouth, even if they have been mostly exposed to positive opinions, and vice versa for nodes with low attitude scores. Both of these models again assume that concepts are blocking.
A further factor that has been shown to impact concept spread is the general behaviour of a population. Perra et al. present a model that changes the infectiousness of an epidemic over time to reflect behaviour [29]. Other approaches include modifying network connections to simulate fear [30], introducing an additional edge weight to represent social connections [31], and the use of game theory to model strategies that can change as the epidemic spreads [32]. The idea of responsiveness was introduced by Kiss et al., with responsive individuals more likely to avoid infection or be treated more quickly, reducing the spread of a disease [33].

Utilising concept interaction
There has been little consideration of indirectly maximising the spread of a concept, with the majority of previous work that does consider concept interactions focusing on influence limitation. Fan et al. propose using nodes that connect one community within a network to another, known as bridge ends, to block an undesirable concept from reaching large communities within the network [11]. Li et al. select nodes for immunisation that will protect the bridge ends themselves, further distancing an undesired concept from other communities within the network [34]. Other methods have also been proposed, such as removing central nodes to partition the network, making it more difficult for a disease to spread [35]. While protecting bridge ends can limit the spread of a concept, the approach becomes less effective when concept, and therefore path, blocking cannot be guaranteed. The betweenness of nodes can also be utilised as an indicator of a node's potential to limit a concept's spread, but it is computationally expensive to calculate [10].
Kotnis and Kuri propose a solution where individuals can be trained, at a cost based on their degree and the quality of training, to be better at deciding whether information is a rumour [36]. For a given budget, having more low quality trained individuals yields better results than having fewer individuals with higher quality training. While Kotnis and Kuri assume a single cascade, they discuss the possibility of other messages affecting the spread of a rumour.
A related problem, selecting a group of nodes and improving their ability to spread a target concept, is discussed by Liontis and Pitoura [37]. They present the MoBoo heuristic, which selects the nodes whose probability of spreading a target concept will be boosted, by evaluating the possible gain from selecting each node.
To estimate the gain of a node, MoBoo first finds the likely paths that the target concept will travel through. For each node with the target concept active, MoBoo constructs a tree of paths that start from that node. With these trees in place, MoBoo then constructs λ independent paths, where typically λ = 2 in order to maintain a reasonable runtime. For each node, v, MoBoo selects the top λ paths where the only nodes shared are the end node, v, and possibly the starting node. These paths are referred to as $\mu_i$, where i is the rank of the path. Paths are ranked by their propagation probability, and so MoBoo selects the λ paths that are most likely to activate v.
Using the propagation probability of these paths, the activation probability of a node, ap(v), can be calculated. For λ = 2:

$$ap(v) = P(\mu_1(v)) + P(\mu_2(v)) - P(\mu_1(v))\,P(\mu_2(v))$$

where $P(\mu_i(v))$ is the probability of v being activated through independent path i. These activation probabilities are used to calculate the gain for each node. The gain for a node, v, assumes that v has the boosted probability to spread the target concept to its neighbours and is calculated as: where Out(v) is the set of nodes that are the children of v in any independent path μ, $p_{v,u}$ is the probability of the concept spreading from node v to u, and $p'_{v,u} = p_{v,u} + b$, with b being the improvement to spreading probability gained by a node when it is selected to become a boosting node.
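For intuition, the activation probability from independent paths can be computed directly: because the paths share no intermediate nodes, v stays inactive only if every path fails. The sketch below generalises the λ = 2 case to any number of paths; the function names and the representation of a path as a list of edge probabilities are our own illustrative assumptions, not MoBoo's implementation.

```python
from math import prod

def path_probability(edge_probs):
    """Propagation probability of one path: the product of its edge probabilities."""
    return prod(edge_probs)

def activation_probability(paths):
    """Probability that v is activated through at least one of its independent paths.

    Assumes the paths share only the end node v (and possibly an already-active
    start node), so each path succeeds or fails independently of the others.
    """
    p_none = 1.0
    for edge_probs in paths:
        p_none *= 1.0 - path_probability(edge_probs)
    return 1.0 - p_none

# Two hypothetical paths to v: mu_1 with edges 0.5 and 0.4, mu_2 with a single edge 0.1.
print(activation_probability([[0.5, 0.4], [0.1]]))  # approximately 0.28
```

For two paths this expands to exactly $P(\mu_1) + P(\mu_2) - P(\mu_1)P(\mu_2)$.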
In each round, MoBoo selects the node, v, with the highest gain, g(v), and updates the required paths. This is repeated until the desired number of nodes has been selected. Full details of the derivation of this approach can be found in [37]. The approach is based on the Prefix excluding Maximum Influence in-Arborescence (PMIA) algorithm for influence spread, which considers the nodes likely to activate a concept, and the expected activations gained by the influence of those nodes [38].
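The round-by-round selection can be sketched as a generic greedy loop. Here `gain(v, selected)` is a hypothetical callback standing in for g(v); re-evaluating it each round plays the role of updating the paths after each selection. This is a minimal sketch of the greedy structure only, not MoBoo's gain computation.

```python
from typing import Callable, Hashable, Iterable, List, Set

def greedy_select(candidates: Iterable[Hashable], k: int,
                  gain: Callable[[Hashable, Set[Hashable]], float]) -> List[Hashable]:
    """Select up to k nodes, each round taking the candidate with the highest gain.

    The gain is recomputed against the nodes chosen so far, so earlier selections
    can lower (or raise) the value of the remaining candidates.
    """
    remaining = set(candidates)
    selected: List[Hashable] = []
    while remaining and len(selected) < k:
        chosen = set(selected)
        best = max(remaining, key=lambda v: gain(v, chosen))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy gains: once "a" is boosted, boosting "b" (which overlaps with "a") adds little.
base = {"a": 3.0, "b": 2.5, "c": 1.0}

def toy_gain(v, selected):
    return base[v] * (0.3 if v == "b" and "a" in selected else 1.0)

print(greedy_select(base, 2, toy_gain))  # ['a', 'c']
```

Although "b" has the second-highest standalone gain, the re-evaluation after selecting "a" makes "c" the better second choice.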
The MoBoo algorithm focuses on boosting concepts, but the potential influence of a node may also be of use when considering an inhibiting concept. Targeting high influence paths with a boosting concept should provide a larger performance increase for the target concept than targeting a low influence path. Similarly, limiting the potential of a high influence path may prove more damaging to a target concept than targeting a low influence path. Budak et al. propose the highest infectees heuristic that, for environments with 'good' and 'bad' cascades, gives results comparable to greedy hill climbing [39]. This heuristic assumes knowledge of the seed set for the 'bad' cascade, and simulates a large number of cascades using that seed set. Nodes are ranked by the number of simulations in which they became infected with the 'bad' cascade, and are selected as seeds for the 'good' cascade in descending order. While the aim of this work is to limit the spread of the 'bad' cascade, it does so by maximising the spread of a 'good' cascade. This approach may therefore also be applicable to both boosting and inhibiting a target concept through a secondary concept, since seeding the nodes most likely to activate the target concept increases the number of concept interactions.
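The highest infectees heuristic is straightforward to sketch: simulate many cascades from the known 'bad' seed set and rank nodes by how often they become infected. The sketch below assumes the ICM with a single uniform edge probability `p`, a graph stored as a dict of out-neighbour lists, and that nodes already in the 'bad' seed set are not eligible as 'good' seeds; these choices, and all names, are illustrative assumptions rather than details taken from [39].

```python
import random
from collections import Counter

def simulate_icm(graph, seeds, p, rng):
    """One ICM cascade: each newly active node gets one attempt per out-neighbour."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for v in frontier:
            for u in graph.get(v, ()):
                if u not in active and rng.random() < p:
                    active.add(u)
                    next_frontier.append(u)
        frontier = next_frontier
    return active

def highest_infectees(graph, bad_seeds, k, p=0.1, runs=1000, seed=0):
    """Return the k non-seed nodes most frequently infected by the 'bad' cascade."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        counts.update(simulate_icm(graph, bad_seeds, p, rng) - set(bad_seeds))
    return [v for v, _ in counts.most_common(k)]

# On a chain 0 -> 1 -> 2 -> 3, node 1 is infected far more often than nodes further away.
chain = {0: [1], 1: [2], 2: [3]}
print(highest_infectees(chain, [0], 2, p=0.5))
```

Using a seeded `random.Random` keeps the ranking reproducible between runs.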
Rather than selecting nodes to block a concept's spread, we may also consider modifying the edges in a network. Li et al. proposed reducing the edge weight to limit the rate of transmission [40]. An edge's weight is expressed as a function of the degree of its two end points, and the rate of transmission between two connected nodes is proportional to the weight of the edge between them. The use of inflammation immunisation, which reduces edge weights by a chosen factor, is shown to be effective in this model and may translate well to real-world scenarios. Notably, the reduction of edge weights lowers the transmission weight but does not compromise network efficiency. Conversely, the bond percolation approach suggested by Kimura et al. curbs the spread of an infection but, by removing links, damages the network structure and in turn, the ability of the network to transmit other concepts [41].
Overall, we see that most previous work involving multiple cascades assumes that concepts block. This leads to a focus on competitive influence spread, where we directly choose the seed nodes for the concept we wish to maximise and the aim is to maximise its spread over that of another cascade. Through more complex concept interactions, we can move beyond competitive influence and consider indirect influence manipulation. The work of Liontis and Pitoura [37] is one of the few approaches to consider indirect influence maximisation, although in their model the ability to boost a concept's spread cannot itself spread.

Real-world experiments
The studies discussed above all rely on empirical results from simulations. This is due to the difficulty of performing real-world analysis on the spread of influence. In particular, there is not always a precise definition of what constitutes an individual being 'influenced'. In studies involving the social media site Twitter, retweeting and using hashtags have been used to signify that a person has been influenced, but this is not necessarily a reliable method [42]. If we cannot confidently state that an individual has been influenced, tracking a cascade is difficult.
Furthermore, obtaining full knowledge of a social network is difficult. Presently, the majority of influence spread studies assume that we have knowledge of the full network and do not account for any possible hidden factors. The requirement of full knowledge for such studies makes translation to real-world scenarios difficult.
However, this does not mean that real-world studies have not been performed. Recent work has considered spreading information in a local network of homeless youth who frequent particular shelters [43,44]. This work focuses on problems arising from uncertainty, but demonstrates that influence spread studies can be performed in the real world, with some tangible effect, and that there is value in empirical simulations since the results can subsequently be applied in real-world environments.

Indirect concept spread
Individuals, or nodes, in real-world networks are typically capable of activating many concepts at once, representing a wide range of ideas, opinions and attributes. Furthermore, concepts may be related to each other and have the ability to affect each other's spread. As a result, the concepts active on a node may increase or decrease the likelihood that the node will activate other concepts in the future.
Previous work on limiting the spread of a concept typically assumes that concepts are blocking [11,27,28,34]. However, Sanz et al. presented a model for the interactions of two diseases spreading through a network, where an individual infected with one disease is more likely to be infected with the other [7]. This demonstrates the value of considering concept interaction, and highlights that such interactions could be used to indirectly affect a target concept. We adopt the view of Sanz et al., in assuming that concepts are not blocking.
Utilising the interactions between concepts allows for the manipulation of concepts that may not be directly controlled. For example, we could encourage the spread of an opinion through supporting information, or slow the spread of a disease with targeted vaccination. At the start of an influence cascade, some nodes will have concepts active on them, which can then spread to other nodes. The initial set of nodes that activate a concept is known as the seed set for that concept. When attempting to manipulate the spread of a concept indirectly, we consider two different types of concept, the target concept and the controllable concept:

Target Concept:
A concept for which we cannot directly select seed nodes within the network, but whose spread we wish to manipulate.

Controllable Concept:
A concept for which we can directly place seed nodes within the network, which interacts with the target concept and affects its spread.
Indirectly controlling the spread of a concept can be divided into two different problems. The indirect influence maximisation problem aims to identify a seed set of size k for a boosting controllable concept that will maximise the spread of a target concept. Conversely, the indirect influence limitation problem aims to identify a seed set of size k for an inhibiting controllable concept that will minimise the spread of a given target concept. In both cases, the problem becomes how best to place seed nodes for a secondary concept to maximise our impact on the spread of the target concept, either positively or negatively.
When the objective is to leverage a controllable concept to indirectly manipulate a different concept, we must reconsider how we value nodes. In the traditional influence maximisation problem, nodes are valued on their ability to facilitate the spread of the target concept to the highest number of neighbours possible. For indirect concept manipulation, we must consider whether a selected node will help facilitate concept interactions, and the extent to which it will impact the spread of the target concept. With indirect concept manipulation, the final spread of the controllable concept is unimportant, since the concern is maximising the controllable concept's effect on the target concept. This does not mean, however, that maximising the spread of the controllable concept is not a viable approach. If the controllable concept is spread widely throughout the network, then it will be more likely to interact with the target concept and thus impact its ability to spread.
An alternative simple strategy is to select all the nodes that currently have the target concept active. Selecting nodes that are actively spreading the target concept will naturally result in guaranteed interactions between the target and controllable concepts, contributing to the goal of indirect influence manipulation. However, we may not have the budget to target every node that is currently spreading the target concept, especially if the two concepts are not introduced into the network at the same time and so the target concept has already spread. Conversely, we may be able to select more seeds for the controllable concept than there are nodes currently spreading the target concept. Therefore, we require a method to identify which nodes with the target concept active are the most valuable to select, and a method to evaluate the value of nodes without the target concept active.
Instead, we may consider the paths a concept is most likely to travel. Analysing the likely paths a concept will travel from a given node, and evaluating the activations expected as a result of travelling those paths, allows nodes to be ranked in terms of their contribution to the spread of a chosen concept. Nodes that have a higher number of expected activations are naturally more valuable to the target concept and, through the use of concept interaction, the number of expected activations can be manipulated. Attempting to evaluate the full set of paths from every node within a network is prohibitively computationally expensive. However, the search space can be restricted to consider only paths above a given probability of being taken, allowing for the tractable estimation of the expected activations of a node.
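One way to make the restricted search concrete is sketched below. It approximates each node's activation probability by its single most probable path from the source, in the spirit of the maximum influence arborescence underlying PMIA [38], and prunes any path whose probability falls below a threshold `theta`. The estimator, the parameter name `theta` and the graph format (`graph[v]` maps each out-neighbour to its edge probability) are illustrative assumptions, not the estimator used by MPG.

```python
import heapq

def expected_activations(graph, source, theta=0.01):
    """Estimate the activations expected from `source` under the ICM.

    Each reachable node contributes the probability of its most probable path
    from `source`; paths with probability below `theta` are never expanded.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # negate probabilities to use heapq as a max-heap
    while heap:
        neg_prob, v = heapq.heappop(heap)
        prob = -neg_prob
        if prob < best.get(v, 0.0):
            continue  # stale entry: a more probable path to v was already found
        for u, p_vu in graph.get(v, {}).items():
            cand = prob * p_vu  # path probabilities can only shrink as paths grow
            if cand >= theta and cand > best.get(u, 0.0):
                best[u] = cand
                heapq.heappush(heap, (-cand, u))
    # The source is already active, so it is not counted as a gained activation.
    return sum(prob for node, prob in best.items() if node != source)

graph = {"s": {"a": 0.5, "b": 0.2}, "a": {"b": 0.5, "c": 0.1}, "b": {}}
print(expected_activations(graph, "s"))             # approximately 0.8
print(expected_activations(graph, "s", theta=0.1))  # approximately 0.75 (path to c pruned)
```

Raising `theta` trades accuracy for speed: fewer paths are expanded, at the cost of ignoring unlikely activations.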
In the ICM, each successive round of a cascade typically infects fewer nodes than the previous round due to the relatively low probability of infection. Therefore, a concept performs the majority of its spreading in the early time steps of a cascade and so will be more affected by concept interactions in the early stages than at a later time. Furthermore, the higher the expected gain of a node, the greater its impact on the spread of the target concept. Naturally, if the target concept becomes active on a node with high expected gain, the concept can spread further and be more likely to encounter other nodes with a high expected gain. If we wish to boost the spread of a target concept, then targeting nodes with a high expected gain increases their impact even further. If the goal is to inhibit a target concept's spread, targeting these nodes will result in a higher number of lost activations than targeting nodes with a lower expected gain. As such, for both the indirect influence maximisation problem and the indirect influence limitation problem, targeting nodes with a high expected gain could prove effective.

Concept interaction framework
To model complex concept interactions, we propose a framework based on the model presented by Sanz et al. [7]. In this paper, we focus on the ICM and LTM, due to their widespread use in previous work [1,3,4,14]. In the ICM, newly activated nodes have a single opportunity to spread a concept to each of their neighbours, with a probability, p, of success [3]. Once a node activates a concept, it remains activated permanently, but cannot attempt to spread the concept after its initial opportunity. In the LTM, every node has a threshold that incoming influence must exceed before the node activates a concept and begins exerting influence on its neighbours [1]. Again, once a node is activated it remains activated permanently but, unlike in the ICM, it continually exerts influence on its neighbours. This means that a node's neighbours may activate a concept over several time steps, until their collective influence causes the node to activate the concept. In both models, influence travels from one node to another through interactions. We call the node that the concept or influence is travelling from the infector node, and the node that it is travelling to the receiver node. For example, in the ICM a newly activated node is the infector and may spread a concept to each of its neighbours, which are receivers. In this framework, the concepts already active on a receiver affect the probability of an additional concept being adopted by that receiver. Furthermore, the concepts active on an infector affect the probability of any concept spreading from it to its neighbours. Although we define propagation and influence strength in terms of the ICM and LTM, our concept interaction framework is generally applicable and can represent a range of spreading dynamics. Furthermore, while we define the interactions between only two concepts, we briefly discuss how this framework can be extended to an arbitrary number of interacting concepts.
Table 1 summarises the notation used to describe the framework.

Environment
We represent an environment by a set of nodes situated in a network. Each node, v, has a set of incoming neighbours, $N^i_v$, and outgoing neighbours, $N^o_v$, where nodes in $N^i_v$ can influence v, and v can influence nodes in $N^o_v$. These sets are not necessarily disjoint and may be equivalent in some environments, allowing both directed and undirected graphs to be represented.
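A minimal sketch of this environment stores the two neighbour sets per node. The class name `Environment` and the method `add_edge` are illustrative, not taken from the paper; an undirected edge is recorded in both directions, which makes $N^i_v$ and $N^o_v$ equal for nodes joined only by undirected edges.

```python
from collections import defaultdict

class Environment:
    """Network of nodes with incoming (N^i_v) and outgoing (N^o_v) neighbour sets."""

    def __init__(self):
        self.incoming = defaultdict(set)  # incoming[v]: nodes that can influence v
        self.outgoing = defaultdict(set)  # outgoing[v]: nodes that v can influence

    def add_edge(self, v, u, directed=True):
        """Record that v can influence u, and also that u can influence v if undirected."""
        self.outgoing[v].add(u)
        self.incoming[u].add(v)
        if not directed:
            self.outgoing[u].add(v)
            self.incoming[v].add(u)

env = Environment()
env.add_edge("a", "b")                  # a influences b only
env.add_edge("b", "c", directed=False)  # b and c influence each other
print(sorted(env.incoming["b"]), sorted(env.outgoing["b"]))  # ['a', 'c'] ['c']
```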

Influence Spread
A concept spreads through a network during interactions between nodes. During these interactions, a concept can spread from the infector node to the receiver node depending on the strength of the influence being exerted by the infector onto the receiver. $I_{v,u}(c)$ represents the influence strength exerted by node v onto node u with respect to concept c. The value of $I_{v,u}(c)$ is defined based on the influence spread model that is used.
The ICM emulates word-of-mouth propagation, with concepts being spread from node to node based on single interactions. As discussed previously, a concept successfully spreads from an infector to a receiver with a given probability p, and each infector-receiver pair can interact only once per concept. We define $I_{v,u}(c) = p$ for any pair of nodes such that $v \in N^i_u$, where p is the chance of infection. In each time step, each node that activated a concept in the previous step can attempt to spread that concept to each of its neighbours.
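The time-step structure of the ICM can be sketched as follows, recording which nodes activate at each step so that one can see how much of the spread happens early in the cascade. The graph format (a dict of out-neighbour lists, i.e. $N^o_v$), the function name `icm_rounds` and the uniform probability `p` are illustrative assumptions.

```python
import random

def icm_rounds(outgoing, seeds, p, seed=0):
    """Run a single-concept ICM cascade and return the nodes activated at each step.

    In each step, every node activated in the previous step makes exactly one
    attempt, succeeding with probability I_{v,u}(c) = p, to activate each
    inactive out-neighbour.
    """
    rng = random.Random(seed)
    active = set(seeds)
    rounds = [set(seeds)]
    while rounds[-1]:
        new = set()
        for v in rounds[-1]:
            for u in outgoing.get(v, ()):
                if u not in active and u not in new and rng.random() < p:
                    new.add(u)
        active |= new
        rounds.append(new)
    return rounds[:-1]  # drop the final, empty step

outgoing = {0: [1, 2], 1: [3], 2: [3]}
print([sorted(r) for r in icm_rounds(outgoing, [0], p=1.0)])  # [[0], [1, 2], [3]]
```

With p < 1, summing `len(r)` over the returned steps gives the final spread, and comparing the step sizes shows how quickly a cascade dies out.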
The LTM emulates the idea of group influence, with nodes activating a concept based on the number of their neighbours that have already activated that concept. Each node, n, has a threshold, $T_n$, that represents the total incoming influence required for a node to activate a concept. A pair of nodes is connected by a directed edge, with a given weight, $w_{v,u}$, that represents the strength of influence from the source node v to the destination node u. In the original, single concept, LTM, a node activates the concept once the sum of incoming weights, $w_{v,u}$, from neighbours with the concept active exceeds its own threshold.
When there are multiple concepts, a node $n$ has an individual threshold, $T^{c}_{n}$, for each concept $c$. Similarly, edges have individual weights, $w^{c}_{v,u}$, for each concept, leading to an environment that is equivalent to a multi-layer network. Each concept is then activated on a node in a similar way to the original LTM: when the total incoming influence for a concept $c$ exceeds a node's threshold for $c$, that node activates $c$. For a given concept $c$, we define

$$I_{v,u}(c) = \min\left(\frac{w^{c}_{v,u}}{T^{c}_{u}}, 1\right)$$

We divide the weight of the edge between the two nodes by the receiver node's threshold, so that influence strength represents the proportion of the receiver node's threshold that is matched by the edge weight. Accounting for the threshold of the receiver allows for a relative measure of influence strength, which can be better compared to other relationships within the network. We limit $I_{v,u}(c)$ to a maximum value of 1 since that signifies a guaranteed activation.

Table 1. Notation used to denote the concept interaction framework.

Parameter | Description
$N^{i}_{v}$ | The set of incoming neighbours for node $v$.
$N^{o}_{v}$ | The set of outgoing neighbours for node $v$.
$I_{v,u}(c)$ | The strength of influence exerted by node $v$ on node $u$ relating to concept $c$.
$CI_{v,u}(c)$ | The contextual influence exerted by node $v$ on node $u$ relating to concept $c$. This is calculated by scaling $I_{v,u}(c)$ appropriately.

https://doi.org/10.1371/journal.pone.0199845.t001
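The definition of $I_{v,u}(c)$ in the multi-concept LTM amounts to one division and a cap. The sketch below assumes one weight layer and one threshold table per concept, stored as nested dictionaries, and strictly positive thresholds; `ltm_influence` is a hypothetical name.

```python
def ltm_influence(weights, thresholds, v, u, c):
    """I_{v,u}(c) in the multi-concept LTM.

    weights: dict mapping concept c to {(v, u): w^c_{v,u}} (one layer per concept).
    thresholds: dict mapping concept c to {node: T^c_node}; assumed > 0.
    The edge weight is expressed as a proportion of the receiver's threshold
    and capped at 1, which corresponds to a guaranteed activation.
    """
    w = weights[c].get((v, u), 0.0)  # a missing edge exerts no influence
    return min(w / thresholds[c][u], 1.0)
```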

Concept interaction
When nodes interact, the spread of a concept can be affected by other concepts active on both the infector and the receiver. In this model, this effect is represented by scaling the influence strength that an infector $v$ can exert on a receiver $u$ regarding concept $c$. The resulting value is referred to as the contextual influence and is represented by $CI_{v,u}(c)$. The definition of $CI_{v,u}(c)$ depends on the use case, in particular on the number of concepts that can interact within the model. For our case, where we have only a target concept, $t$, and a secondary controllable concept, $s$, $CI_{v,u}(t)$ is obtained by scaling $I_{v,u}(t)$ according to whether $s$ is active on the infector and on the receiver, where $r \in [0, \infty)$ is a feature of the environment that represents the extent to which concept $s$ affects concept $t$. If $r < 1$, then $s$ decreases the ability of $t$ to spread, and the relationship is said to be inhibiting. If $r > 1$, the relationship is said to be boosting, as $s$ increases the ability of $t$ to spread. If $r = 1$, then $s$ has no impact on the ability of $t$ to spread. As discussed previously, the concepts active on an infector can affect the concepts that spread from it, and the concepts active on a receiver affect the concepts the receiver will adopt. As such, we consider the context of both nodes when evaluating the strength of the contextual influence. Note that we limit the contextual influence value to a maximum of 1 as, in both the ICM and LTM, that means the receiver is guaranteed to activate the target concept. The possible effects of concept interaction are illustrated in Fig 1, where we see a boosting concept increase the number of neighbours activated and an inhibiting concept decrease the number of neighbours activated.

The concept interaction model is designed to be extensible and allow for any number of concepts to spread within a network and interact. If there are more than two concepts that can interact, each relationship must be defined.
In particular, this requires defining how each concept affects the spread of others when active on the infector and when active on the receiver. Each of these situations can be defined using a value for $r$. When a concept $c$ spreads, $I_{v,u}(c)$ is then scaled by the summation of the effects of each concept active on the infector and the summation of the effects of each concept active on the receiver. While in this work we do not fully utilise the wider functionality of this model, the code used to run the simulations has been designed to allow for the most general case and is obtainable from: https://github.com/JamesArchbold/ConceptInteraction.
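The equation for $CI_{v,u}(t)$ itself is not reproduced in this text, so the following sketch only illustrates the idea under a stated assumption: $I_{v,u}(t)$ is multiplied by $r$ once for each of the infector and the receiver that has $s$ active, and the result is capped at 1. The function name and the multiplicative form are hypothetical and may differ from the authors' implementation.

```python
def contextual_influence(base, r, s_on_infector, s_on_receiver):
    """Hypothetical CI_{v,u}(t) for one target concept t and one controllable concept s.

    base: the influence strength I_{v,u}(t).
    r: relationship strength in [0, inf); r < 1 inhibits, r > 1 boosts, r = 1 is neutral.
    ASSUMPTION: I_{v,u}(t) is multiplied by r once for each endpoint (infector v,
    receiver u) that has s active; the exact form used in the paper may differ.
    """
    if r < 0:
        raise ValueError("r must lie in [0, inf)")
    factor = 1.0
    if s_on_infector:
        factor *= r
    if s_on_receiver:
        factor *= r
    # Cap at 1: a value of 1 already guarantees activation in both the ICM and LTM.
    return min(base * factor, 1.0)
```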

Indirect concept manipulation
The influence maximisation problem has been well studied, leading to a variety of heuristics that aim to approximate an optimal solution. However, there may be situations where we do not have direct control of the concept we wish to manipulate, as discussed in Section 3. Concept interactions provide an opportunity for us to indirectly manipulate the spread of another concept, allowing us to tackle a wider range of problems when compared to assuming blocking concepts.
In this section, we propose the maximum probable gain (MPG) heuristic to address the indirect influence maximisation and indirect influence limitation problems in environments with concept interaction. Algorithm 1 defines the MPG heuristic, and in this section we describe its operation in detail. A summary of the notation used in our description can be found in Table 2. Previous techniques that have been developed for environments where concepts block are less effective when dealing with concept interaction, as it becomes impossible to guarantee that a concept can be prevented from spreading. As such, new techniques are needed for indirect influence manipulation, leading to the development of MPG.
We wish to select locally influential nodes that can spread the controllable concept and affect the spread of the target concept. This is similar to the notion of betweenness [10,45], since nodes with a high betweenness are likely to be encountered by the target concept and to reach a higher number of nodes. Determining betweenness is computationally expensive, since it requires the calculation of a large number of shortest paths. Therefore, we need an alternative method to estimate the influence of a node. Nodes that are likely both to activate the target concept and to provide a high number of expected activations are likely to be locally influential, and MPG therefore aims to select such nodes.
Algorithm 1: Greedy algorithm using the MPG heuristic
input: Set of nodes in the network, $V$; set of nodes that will attempt to spread concept $t$ in the next time step, $S_t$; probability threshold, $\theta$; number of seeds to select, $s$
output: Seed set for controllable concept $c$, $S_c$
for $v \in V$ do
    # Initialise the activation probability (ap) and expected gain (E) for node v.

Table 2. Notation used in the description of MPG.

Parameter | Description
$S_t$ | The set of nodes that can spread the target concept in the next time step.
$MIP(v, u)$ | The most influential path from $v$ to $u$ for $t$, i.e. the path with the highest amount of influence originating at $v$ and ending at $u$.
$\theta$ | The minimum influence value for a $MIP(v, u)$ to contribute to either $ap(u)$ or $E(v)$.
$ap(v)$ | The activation probability, a measure of the likelihood that $v$ will activate the target concept.
$E(v)$ | The expected number of activations that $v$ will provide if $v$ activates the target concept.
$WE(v)$ | The weighted expected gain of $v$, namely $E(v)$ weighted by $ap(v)$.

We assume that MPG has full knowledge of the network, including which nodes have the target concept active. A similar assumption was made by the authors of MoBoo [37], as some level of knowledge of the target concept is needed to be able to indirectly manipulate it. While not always realistic, there are cases where we may know that an individual has a concept active and be unable to directly interact with that concept. For instance, consider a disease such as bird flu. When it is first introduced into a population, there is no vaccine, but information about the disease and its symptoms can help to limit its spread through a population. Alternatively, we can often learn of someone's political affiliation, but it is difficult to directly encourage others to adopt that affiliation. Instead, a person may be influenced to read news stories that cause a change in opinion. In both cases we can see how, while we may not be able to control the target concept, we may be able to learn which individuals have it active and so manipulate its spread.

Activation probability
If an edge exists in a network from node v to node u, then v can exert influence on u in relation to a concept c. This influence, I v,u (c), is used in an influence spread model to determine whether u will activate concept c based on v's influence. We define the set S t as the set of nodes that are able to exert influence with regard to concept t in the next time step. For the ICM, this is the set of nodes that activated t in the previous time step or, if we are at the start of a cascade, the initial seed set. For the LTM, this is the set of nodes that currently have the target concept active, since nodes in the LTM always exert influence, to model the gradual effect of peer pressure.
For each node $v \in S_t$, the influence reaching node $u$ from $v$, for concept $t$, is dependent on the most influential path from $v$ to $u$, $MIP_t(v, u)$. For a given path $P = (v = p_1, p_2, \ldots, p_k = u)$, the strength of the influence reaching $u$ from $v$ is defined as:

$$I_P = \prod_{j=1}^{k-1} CI_{p_j, p_{j+1}}(t)$$

If a node in the path has activated the controllable concept, the influence it receives and exerts in relation to the target concept will be modified according to the concept relationship between the controllable and target concept. The use of $CI_{v,u}(t)$ in the definition of $I_P$ allows for that relationship to be accounted for. $MIP(v, u)$ can then be defined as:

$$MIP(v, u) = \arg\max_{P \in AP(v, u)} I_P$$

where $AP(v, u)$ is the set of all paths that start with $v$ and end in $u$. This means the influence received by $u$ from $v$, denoted as $IR(v, u)$, is:

$$IR(v, u) = \begin{cases} I_{MIP(v,u)} & \text{if } I_{MIP(v,u)} \geq \theta \\ 0 & \text{otherwise} \end{cases}$$

The activation probability of a node, $u$, can now be defined as the sum of the influence received from all nodes with the target concept active that can spread it, that is, the nodes in $S_t$:

$$ap(u) = \sum_{v \in S_t} IR(v, u)$$

Fig 2 illustrates the calculation of the activation probability of $u$, represented in the figure as the green node. The blue nodes have the target concept active, and those within the red area can send influence to $u$ equal to or greater than θ and so contribute to $ap(u)$. The nodes outside the exploration range, which send influence to $u$ less than θ, do not contribute to $ap(u)$. Note that in this case, the controllable concept has yet to be introduced, and so each edge has the same probability of propagating the target concept.
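The quantities $IR(v, u)$ and $ap(u)$ can be computed without enumerating every path in $AP(v, u)$. Since every $CI$ value lies in $[0, 1]$, path influence can only shrink as a path grows, so a best-first search in the style of Dijkstra's algorithm, which always expands the node with the highest path influence, finds $MIP(v, u)$ and can stop extending a path once its value falls below θ. This search strategy is our choice for the sketch, not necessarily the one used by the authors; the dictionary layout and function names are hypothetical.

```python
import heapq


def influence_received(ci, source, theta):
    """IR(source, u) for every u whose most influential path from source has I_P >= theta.

    ci: dict mapping v to {u: CI_{v,u}(t)} with values in [0, 1].
    Because every edge value is at most 1, a path's product never grows as it
    gets longer, so expanding the highest-valued node first finds each MIP.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]  # max-heap on path influence via negated values
    done = set()
    while heap:
        neg_val, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        for u, w in ci.get(v, {}).items():
            val = -neg_val * w
            # Prune paths below theta; keep only improvements.
            if val >= theta and val > best.get(u, 0.0):
                best[u] = val
                heapq.heappush(heap, (-val, u))
    del best[source]  # a node does not receive influence from itself
    return best


def activation_probability(ci, spreaders, theta):
    """ap(u): sum of IR(v, u) over every v in S_t (`spreaders`)."""
    ap = {}
    for v in spreaders:
        for u, val in influence_received(ci, v, theta).items():
            ap[u] = ap.get(u, 0.0) + val
    return ap
```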

Expected gain
With the activation probability calculated, we now consider a node's expected activations for the target concept. This is simply calculated as the sum of the influence that a node, $u$, exerts on all other nodes within the network:

$$E(u) = \sum_{w \in V \setminus \{u\}} IR(u, w)$$

Due to the definition of $IR(u, w)$, $E(u)$ can be calculated efficiently, as only a small local area of the network needs to be explored. This is illustrated in Fig 3, where nodes outside of the red area receive an amount of influence less than θ and so do not contribute to the expected gain of the blue node.
To identify influential nodes that are likely to be encountered, the expected gain of a node is weighted by the probability that it will activate the target concept. This weighted expected gain, $WE(u)$, represents the expected value of $u$ to the spread of the target concept and is defined as:

$$WE(u) = ap(u) \cdot E(u)$$

A high $WE(u)$ value indicates that a node is both likely to activate the target concept and likely to provide a high expected gain for the target concept. As such, we select the node with the highest weighted expected gain, $WE(u)$, as a seed node for the controllable concept, and we update the value of $WE(v)$ for any node $v$ that will be affected by the presence of the controllable concept at node $u$. The selection process then repeats, selecting the node with the highest $WE(u)$ value and recalculating $WE(v)$ for affected nodes, until the seed set for the controllable concept reaches the desired size.

Fig 2. Blue nodes have the target concept active, and the red border signifies a propagation probability of θ = 0.001. A concept spreading from a node within that border will reach the green node with a probability higher than θ; a concept spreading from outside the border will reach the green node with a probability lower than θ and so does not contribute to the green node's activation probability, ap(green). https://doi.org/10.1371/journal.pone.0199845.g002
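The greedy selection loop can be sketched as follows for the ICM setting. This is a hypothetical illustration rather than the authors' implementation: it assumes that $CI_{v,u}(t)$ multiplies $I_{v,u}(t)$ by $r$ for each edge endpoint that already holds the controllable concept (capped at 1), and for simplicity it recomputes $ap$, $E$ and $WE$ for every node after each selection instead of updating only the affected nodes. Paths are explored with a best-first search, which is our choice for the sketch.

```python
import heapq


def influence_received(ci, source, theta):
    """IR(source, u) for every u whose most influential path from source has I_P >= theta."""
    best = {source: 1.0}
    heap = [(-1.0, source)]
    done = set()
    while heap:
        neg_val, v = heapq.heappop(heap)
        if v in done:
            continue
        done.add(v)
        for u, w in ci.get(v, {}).items():
            val = -neg_val * w
            if val >= theta and val > best.get(u, 0.0):
                best[u] = val
                heapq.heappush(heap, (-val, u))
    del best[source]
    return best


def mpg_seeds(edges, base, r, spreaders, theta, k):
    """Greedily pick k seeds for the controllable concept by highest WE(u) = ap(u) * E(u).

    edges: dict mapping each node to its list of outgoing neighbours.
    base: I_{v,u}(t) for every edge (ICM setting); spreaders: the set S_t.
    ASSUMPTIONS: multiplicative CI with cap 1; full recomputation each round.
    """
    nodes = set(edges) | {u for nbrs in edges.values() for u in nbrs}
    seeds = []
    for _ in range(k):
        chosen = set(seeds)
        # Contextual influence given the seeds selected so far.
        ci = {
            v: {
                u: min(base * (r if v in chosen else 1.0) * (r if u in chosen else 1.0), 1.0)
                for u in nbrs
            }
            for v, nbrs in edges.items()
        }
        # ap(u): influence received from every node in S_t.
        ap = {}
        for v in spreaders:
            for u, val in influence_received(ci, v, theta).items():
                ap[u] = ap.get(u, 0.0) + val
        candidates = sorted(nodes - chosen)
        if not candidates:
            break

        def weighted_gain(u):
            expected = sum(influence_received(ci, u, theta).values())  # E(u)
            return ap.get(u, 0.0) * expected  # WE(u)

        seeds.append(max(candidates, key=weighted_gain))
    return seeds
```

In a small graph where `a` spreads the target concept and `b` fans out to three further nodes, `b` combines a high activation probability with a high expected gain and is selected first.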

Experimental set-up
We evaluate the effectiveness of our proposed heuristic using the experimental parameters given in Table 3. For both the indirect influence maximisation and indirect influence limitation problems, we compare the performance of MPG against heuristics used to maximise the spread of the secondary concept and the MoBoo heuristic for indirect influence maximisation.

Fig 3. Nodes inside the red area have a probability higher than θ to be reached by a concept spreading from the blue node and so contribute to its expected gain. Nodes outside the red area have a probability lower than θ to be reached and so do not contribute to the expected gain of the blue node. https://doi.org/10.1371/journal.pone.0199845.g003

As previously stated, the code used to run the simulations is obtainable from: https://github.com/JamesArchbold/ConceptInteraction. In this evaluation, for the ICM we assume θ = 0.001, meaning that a path must have an influence strength above 0.001 to be included in the calculation of ap(v) and E(v) values, limiting our considerations to a local area around each node with the target concept active. The ICM is equivalent to the SIR model where the probability of deactivation is 1; however, the majority of influence spread studies use terms associated with the ICM, and so we use ICM terminology in this paper.
When evaluating MPG in the LTM, θ instead represents a maximum path length. Due to the variance in influence strength present in the LTM, the amount of the network we explore can vary drastically. Since a typical cascade does the majority of its spreading in the first few time steps, we only consider paths of length 4 or less, meaning that from each node we explore its 3-hop neighbourhood.
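For the LTM variant, exploration is bounded by path length rather than by an influence value. The sketch below keeps, for each hop count $h$, the best path influence over walks of exactly $h$ edges and takes the maximum over all $h$ up to `max_hops`; because every edge value is at most 1, revisiting a node never helps, so this equals the best simple path within the bound. Following the text's reading of length-4 paths as a 3-hop neighbourhood, `max_hops` defaults to 3; the function name is hypothetical.

```python
def bounded_influence_received(ci, source, max_hops=3):
    """Best path influence from source to every node within max_hops edges.

    ci: dict mapping v to {u: CI_{v,u}(t)} with values in [0, 1].
    `frontier` holds the best product over walks of exactly h edges; taking
    the maximum over h <= max_hops gives the best path within the bound.
    """
    best = {}
    frontier = {source: 1.0}
    for _ in range(max_hops):
        nxt = {}
        for v, val in frontier.items():
            for u, w in ci.get(v, {}).items():
                cand = val * w
                if cand > nxt.get(u, 0.0):
                    nxt[u] = cand
        for u, val in nxt.items():
            if u != source and val > best.get(u, 0.0):
                best[u] = val
        frontier = nxt
    return best
```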
For the ICM, we assume that the probability of a concept spreading from an infector to a receiver is initially p = 0.1. This value is then modified by the interactions between the different concepts active on the infector and receiver. For the LTM, nodes have a threshold selected from a Gaussian distribution with a mean of 0.8 and standard deviation of 0.05. This means that the average node is difficult to activate, which mirrors the low chance of infection present in the ICM.
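The model parameters above can be captured in a few lines. This sketch assumes that LTM thresholds are drawn independently for every node and every concept; the constant and function names are hypothetical.

```python
import random

ICM_P = 0.1                   # initial ICM propagation probability p
LTM_MEAN, LTM_SD = 0.8, 0.05  # Gaussian parameters for LTM thresholds


def sample_thresholds(nodes, concepts, rng):
    """Draw an LTM threshold T^c_n for every node n and concept c."""
    return {c: {n: rng.gauss(LTM_MEAN, LTM_SD) for n in nodes} for c in concepts}
```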
We acknowledge that both of these models are simplistic, and that the results we obtain are limited by the restrictions of the models. In this paper, we focus on demonstrating the feasibility of indirect concept manipulation and so we use two of the most widely used models in the study of influence spread. In the future, we intend to perform investigations with other, more dynamic models such as SIS and the SIR model with a wider range of deactivation probabilities.
We compare the performance of MPG against other heuristics for the indirect influence maximisation and the indirect influence limitation problems. In particular, we compare its performance against degree discount [4] and MoBoo [37]. These heuristics have been shown to perform well in related settings and so provide comparators for evaluating the performance of MPG. In addition, we compare against simple heuristics, such as highest degree and random selection, to provide a baseline.
The target concept has a randomly selected seed set in all our experiments. We use the following heuristics to select a seed set for the controllable concept:
• Random: randomly chosen nodes are selected.
• Highest Degree: nodes with the highest degree are selected.
• Single Discount: nodes with the highest degree are selected, where edges to nodes already in the seed set are not counted.
• Degree Discount: nodes with the highest 1-hop expected gain are selected.
• MoBoo: nodes with the most probable paths of activation are selected.
• MPG: nodes likely to activate the target concept, with high expected gain, are selected.
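Two of the baselines above have simple update rules. The sketch below shows single discount and degree discount for an undirected graph with a uniform propagation probability $p$, using the discounted degree $d_v - 2t_v - (d_v - t_v)t_v p$ from the degree discount heuristic [4], where $t_v$ is the number of already-selected neighbours of $v$. Ties are broken by node order; the adjacency format and function names are assumptions.

```python
def single_discount(adj, k):
    """Highest degree first, ignoring edges to already-selected seeds.

    adj: undirected graph as a dict mapping each node to a set of neighbours.
    """
    score = {v: len(nbrs) for v, nbrs in adj.items()}
    seeds, chosen = [], set()
    for _ in range(min(k, len(adj))):
        u = max(sorted(v for v in adj if v not in chosen), key=lambda v: score[v])
        seeds.append(u)
        chosen.add(u)
        for v in adj[u]:
            score[v] -= 1  # an edge to a seed no longer counts
    return seeds


def degree_discount(adj, k, p):
    """Degree discount with uniform propagation probability p."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    score = dict(degree)
    t = {v: 0 for v in adj}  # number of already-selected neighbours
    seeds, chosen = [], set()
    for _ in range(min(k, len(adj))):
        u = max(sorted(v for v in adj if v not in chosen), key=lambda v: score[v])
        seeds.append(u)
        chosen.add(u)
        for v in adj[u]:
            if v not in chosen:
                t[v] += 1
                score[v] = degree[v] - 2 * t[v] - (degree[v] - t[v]) * t[v] * p
    return seeds
```

On a star with centre 0 plus a separate edge 5-6, both heuristics pick 0 and then 5, whereas highest degree would pick a leaf of the star as its second seed.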
Note that we do not use the highest infectees heuristic proposed by Budak et al. [39], as simulating cascades is often an expensive approach to estimating the effectiveness of seed sets. Furthermore, that heuristic assumes blocking concepts, unlike MoBoo, which can be interpreted as activating a non-spreading concept on the nodes it selects.
The primary target concept spreads for a given number of time steps, referred to as the burn-in time, before we introduce the controllable concept and select its seeds. We evaluate the heuristics using a variety of burn-in times in order to explore the impact of burn-in time on their effectiveness. We focus on short burn-in times so that the controllable concept is not introduced after the target concept has stopped spreading, at which point we would be unable to manipulate it.
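The burn-in protocol can be illustrated with a small two-concept ICM simulation. In this hypothetical sketch, the target concept spreads alone for `burn_in` steps, after which a seed selection function receives $S_t$ and returns the seeds for the controllable concept. It assumes that the controllable concept spreads with the unmodified probability $p$, and that the target concept's probability is multiplied by $r$ for each endpoint with the controllable concept active, capped at 1; neither assumption is taken from the authors' code.

```python
import random


def simulate_with_burn_in(graph, target_seeds, select_seeds, burn_in, p, r, seed=0):
    """Two-concept ICM cascade in which s is seeded after `burn_in` steps.

    graph: dict mapping each node to its outgoing neighbours.
    select_seeds: callable receiving S_t (nodes about to spread t) and
        returning the seed set for the controllable concept s.
    Returns the final set of nodes with t active.
    """
    rng = random.Random(seed)
    t_active, t_frontier = set(target_seeds), set(target_seeds)
    s_active, s_frontier = set(), set()
    step = 0
    # If t stops spreading before the burn-in ends, s is never introduced.
    while t_frontier or s_frontier:
        if step == burn_in:
            new_seeds = set(select_seeds(set(t_frontier))) - s_active
            s_active |= new_seeds
            s_frontier |= new_seeds
        new_t, new_s = set(), set()
        for v in t_frontier:
            for u in graph.get(v, []):
                if u in t_active or u in new_t:
                    continue
                prob = p * (r if v in s_active else 1.0) * (r if u in s_active else 1.0)
                if rng.random() < min(prob, 1.0):
                    new_t.add(u)
        for v in s_frontier:
            for u in graph.get(v, []):
                if u not in s_active and u not in new_s and rng.random() < p:
                    new_s.add(u)
        t_active |= new_t
        s_active |= new_s
        t_frontier, s_frontier = new_t, new_s
        step += 1
    return t_active
```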
We ran a small number of initial cascades, in which one concept spreads from a randomly selected seed set of 250 nodes, using either the ICM or LTM as described above. These tests used both scale-free and small-world networks of 50000 or 100000 nodes. As shown in Fig 4, the number of infections increases each round in the early time steps, but by time step 4 at the latest, the number of new infections begins to decline rapidly. Note this happens for both influence spread models, demonstrating why we wish to intervene early and so focus on early burn-in times. Furthermore, we test a range of boosting and inhibiting relationship strengths. This allows us to determine whether different strategies may be more viable at different levels of relationship strength.
Our evaluation of these heuristics requires a variety of networks. Real-world networks can be characterised by a number of topological features [46,47]. Two characteristics that are often considered are the small-world and scale-free properties, both of which are often present in real-world networks [48,49]. As such, synthetic small-world and scale-free networks are both used to evaluate the performance of the heuristics, with a range of sizes as detailed in Table 3. The results from these networks can help to inform how MPG may perform in real-world situations.
The small-world networks used in this paper have a clustering exponent of either 0.25 or 0.75. These are generated using the Kleinberg small-world generator provided in the JUNG graph framework (http://jung.sourceforge.net/) [50]. The scale-free networks are constructed through the use of the Barabási-Albert generator provided in JUNG [51]. This generator begins with an initial set of unconnected nodes and introduces a new node at each evolution step. This node then gains a number of edges, with connections chosen randomly through preferential attachment. We begin with 10 unconnected nodes and add either 4 or 8 edges in each step. For each combination of network size, type and network characteristics, we generate 100 networks.
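For readers without JUNG, a preferential-attachment generator following the construction described above (10 initial unconnected nodes, then one new node with $m$ edges per step) can be sketched in Python. Because the initial nodes have degree 0, this sketch assumes attachment probabilities proportional to degree + 1; JUNG's Barabási-Albert generator may use a different weighting, so this is an approximation rather than a reimplementation, and the function name is hypothetical.

```python
import random


def barabasi_albert_edges(n, m, initial=10, seed=None):
    """Undirected preferential-attachment graph as a list of (new, old) edges.

    Starts from `initial` unconnected nodes; each new node attaches to m
    distinct existing nodes, chosen with probability proportional to
    (degree + 1). The +1 is an assumption that lets degree-0 nodes be chosen.
    """
    if not 0 < m <= initial:
        raise ValueError("need 0 < m <= initial")
    rng = random.Random(seed)
    degree = [0] * initial
    edges = []
    for new in range(initial, n):
        weights = [d + 1 for d in degree]
        targets = set()
        while len(targets) < m:  # resample until m distinct targets
            targets.add(rng.choices(range(len(degree)), weights=weights)[0])
        for old in targets:
            edges.append((new, old))
            degree[old] += 1
        degree.append(m)
    return edges
```

The selection loop is quadratic in the number of nodes, which is fine for illustration but too slow for the 100000-node networks used here.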
We also run tests on selected real-world networks, taken from the Stanford SNAP project networks (http://snap.stanford.edu/data), namely DBLP, CA-CondMat and soc-Epinions1. Real-world networks are not strictly small-world or scale-free and instead can display properties of both types of network [52,53]. Details of the characteristics of the real-world network samples we utilise are provided in Table 4.
For networks of 10000 nodes or fewer, we perform simulations for each combination of r value, burn-in time, controllable concept heuristic and seed set size, using seed set sizes under 100. For networks of 25000 nodes or more, we use seed set sizes of 100 or above. Since we generate 100 networks for each network configuration, this means that for each controllable concept heuristic we have 100 results for each combination of r value, seed set size, burn-in time, network type and network size. For the real-world networks we run 100 simulations for each combination of parameters, using seed set sizes of 100 or above. We also performed two-tailed t-tests to compare the performances of the various heuristics. For each environment, we performed the test on each possible pairing of heuristics.
With these experiments we aim to evaluate the feasibility of indirect influence manipulation in a social network. The main limitation of this work is that, while we use synthetic networks and models that have been widely used, we focus on simulation rather than real-world validation. Ideally, after an initial period of testing, a real-world study would be performed. As discussed in Section 2.4, the few real-world studies that exist are small in scope and focus on the problem of uncertainty. Instead, we would wish to use a large network for which topological information is available, such as an online social network. We would then select individuals to generate content that relates to another, trending, topic but without directly linking to it. From there we would track the individuals' neighbours that begin to spread information on the trending topic. However, such a study is difficult to perform, as many social networks are, understandably, protective of their data. Furthermore, there is the issue of how to incentivise the individuals we choose to generate content. For now, we focus on empirically validating a method of indirect influence manipulation, since real-world studies are expensive and time consuming to perform and potentially raise privacy concerns. Our view is that demonstrating the effectiveness of a heuristic through simulation is an important first step that should precede any real-world application.

Results for indirect concept maximisation
We evaluate the performance of MPG for both the ICM and LTM for the indirect influence maximisation problem, across three types of network, namely small-world, scale-free and real-world. In this problem, the controllable concept has a boosting relationship with the target concept and we aim to maximise the spread of the target concept.

Small-world networks
We begin our assessment of the performance of the different heuristics with small-world networks. We first consider the ICM and then the LTM, discussing the impact of the different parameters explored in this study.

The Independent Cascade Model.
When considering small-world networks in the ICM, we see that MPG performs well when the burn-in time is 0, consistently outperforming all other heuristics in increasing the number of target concept activations, as shown in Table 5. MoBoo is the only other heuristic that performs close to MPG, with the other degree-based heuristics all performing poorly. This is to be expected, since MoBoo is the only other heuristic that accounts for possible concept interactions and was explicitly designed for concept spread maximisation. Table 5 summarises the performance of the heuristics when the concept relationship is strong. We note that, when comparing various graph types, the size of the graph and its clustering exponent have no significant effect on the performance of any heuristic. The size of the seed set does affect performance, as is to be expected: the more initial seed nodes there are, the further we expect concepts to spread. Importantly, the rank order of the heuristics' performance does not change, as each heuristic increases the number of activations by a similar magnitude.
Two factors that do notably affect the performance of MPG and MoBoo are the length of the burn-in time and the strength of the relationship between the target and controllable concept. Fig 5 highlights the difference in performance between MPG and MoBoo for a variety of burn-in times and relationship strengths. As the relationship strength increases, both heuristics improve their performance: since they aim to maximise the number of interactions between the concepts, a stronger relationship naturally makes them more effective. With a strong concept relationship, any interaction between the target and controllable concept greatly increases the probability of the target concept spreading, which can in turn facilitate further interactions. Fig 5 also demonstrates the effect of the burn-in time on the performance of the heuristics. Both heuristics demonstrate decreased performance with higher burn-in times, as is expected based on Fig 4. A typical cascade gains most of its activations in the first few time steps, so a higher burn-in time means that fewer activations occur after the controllable concept is introduced, and there is less chance of interactions taking place. With no interactions, we cannot indirectly affect the spread of a concept. We also note that MPG is more affected by burn-in time than MoBoo. When the burn-in time is 0, we see a significant difference between the two heuristics (p < 0.05) in favour of MPG. However, at a burn-in time of 2, there is minimal difference between the two heuristics, and with a burn-in of 5 MoBoo begins to outperform MPG. At weaker relationship strengths, we can see that the difference is minimal and not statistically significant (p > 0.2). However, once the r value exceeds 1.4, MoBoo significantly outperforms MPG when the burn-in time is 5 (p < 0.05).
While both heuristics calculate the gain of selecting a given node, the method utilised by MoBoo seems to mitigate the impact of fewer activations occurring in later time steps.

The Linear Threshold Model.
When concepts spread using the LTM, we see a similar performance in small-world graphs as we do when concepts use the ICM. As Table 6 highlights, MPG continues to outperform the other heuristics when there is no burn-in time in small-world networks. A major difference is the magnitude of activations. When comparing similar network and seed set sizes, we see a larger number of activations for all heuristics in the LTM when compared to the ICM. Nodes in the LTM continuously exert influence once they activate a concept and so, unlike in the ICM, a node that activates the controllable concept after activating the target concept can still have its influence on the target concept affected. This means there will be more concept interactions, leading to the boosting relationship being more effective. Note that this is true for small-world networks due to the low variability of node degree, meaning that most nodes will have a degree close to the average. In the LTM, nodes typically require multiple neighbours to be active before they activate a concept and so these additional interactions contribute effectively to the number of activations. In networks with a higher number of low degree nodes, we expect to see fewer activations in the LTM. We also see that the relative performance of MoBoo is affected. In the ICM we observed that MoBoo performed similarly to MPG, but in the LTM there is a much larger difference between the two. Another significant difference between the ICM and the LTM is the effect of burn-in time. As Fig 6 demonstrates, a longer burn-in time actually slightly improves the performance of MPG. If we compare the difference in performance between a burn-in time of 0 and 2, we see a noticeable increase in the performance of MPG, and a smaller increase between a burn-in time of 2 and 5. At higher burn-in times, in the LTM, there will be more actively spreading nodes, unlike in the ICM where there are fewer.
For example, consider introducing the controllable concept at time step 0, with a seed set size of 100 for both concepts. This means that there are 100 nodes that can spread the target concept when the controllable concept is introduced. If instead we introduce the controllable concept at time step 2, the target concept will have spread and there may be 200 nodes actively spreading the target concept. In the LTM, once a node activates a concept, it is always capable of spreading that concept, and so introducing the controllable concept at time step 2 in this scenario is equivalent to introducing it at time step 0 but with a seed set size of 200 for the target concept. We have previously seen that a higher number of nodes with the target concept active increases the performance of MPG, and this remains true here.

Scale-free networks
Scale-free networks differ from small-world networks in the way that nodes are connected to each other. In scale-free networks the degree distribution follows a power law, meaning there are a small number of nodes with a very high degree while the majority of nodes have a very low degree. In this section we explore how this characteristic affects indirect influence maximisation, across both the ICM and LTM.

The Independent Cascade Model.
In scale-free networks, for the ICM there is little difference between the degree-based heuristics and MPG in terms of performance, as shown in Table 7, although MPG does consistently outperform MoBoo. Across the different network types and sizes, the degree-based heuristics consistently perform best, followed by MPG and then MoBoo. The power law degree distribution present in scale-free networks clearly lessens the advantage of the exploration performed by MPG and MoBoo.
Unlike in small-world networks, we see that the seed set size does not significantly impact the performance of the different heuristics. This suggests that increasing the seed set size does not increase the area of the network that the seed set can reach, resulting in selecting new seeds whose reach largely overlaps with previously selected seeds. However, both network size and the number of edges added per node during creation affect performance. We can see that the performance of all heuristics is increased by approximately 85% when the number of edges is doubled, and similarly when the size of the network is doubled. Due to how the scale-free networks are constructed, increasing either of these parameters will result in the hub nodes of the network having higher degrees, increasing the chance of concept interaction, resulting in more activations.
If we observe the performance of MPG and degree discount across different burn-in times and relationship strengths, as illustrated in Fig 7, we see that the burn-in time affects performance as is the case in small-world networks, although the impact is not as pronounced in scale-free networks. Furthermore, we see that degree discount increases its performance as the relationship strength increases. For both the ICM and LTM in small-world networks, the degree-based heuristics did not noticeably increase their performance as the relationship strength increased.

The Linear Threshold Model.
For the LTM in scale-free networks, we observe very similar results as for the ICM. As seen in Table 8, the degree-based heuristics all perform at a similar level and perform best, followed by MPG and then MoBoo. There are also fewer activations across all environments in the LTM than in the ICM. The power law degree distribution of scale-free networks means that the majority of nodes have a low degree, which can make the activation of a concept difficult in the LTM. As expected, based on the results of the LTM in small-world networks, we see MPG increases its performance when the burn-in time is non-zero. As Fig 8 shows, while MPG does increase its performance, it never outperforms degree discount. We note that, unlike MPG, the performance of degree discount decreases as the burn-in time increases.

Real-world networks
Studying influence spread in the real world is extremely difficult, due to the issues of defining what constitutes an individual being influenced, and of mapping the network. Therefore, we utilise simulations based on samples of real-world networks, as discussed in Section 6.

The Independent Cascade Model.
In real-world topologies, for the ICM we observe a similar pattern of performance as seen in the synthetic scale-free networks. Degree-based heuristics perform the best, with no statistically significant difference between them, followed by MPG, with MoBoo being outperformed by all other heuristics. From Table 9, we can see that both MPG and MoBoo are typically outperformed by the degree-based heuristics, despite their focus on utilising concept relationships. The difference in performance between MPG and the degree-based heuristics is smaller in both the CA-CondMat and the soc-Epinions1 networks. Despite DBLP having over 4 times as many nodes as soc-Epinions1, it only has slightly more than twice the edges. Comparing DBLP to CA-CondMat, we see that DBLP has nearly 14 times the number of nodes and approximately 11 times the number of edges. These factors suggest that DBLP is a sparser network than soc-Epinions1 or CA-CondMat. MPG targets nodes close to the currently active nodes, which are likely to have a low degree in DBLP, decreasing the impact of the boosting relationship. In this case, attempting to maximise the spread of the controllable concept performs better, as we increase the chances of the target concept activating nodes in a path that leads to the few high degree nodes of DBLP.
Furthermore, from Table 9, we see the typical pattern of heuristic performance decreasing as the burn-in time increases. We also note that when the burn-in time is 5, MPG begins to perform better than the degree-based heuristics. However, this difference is not statistically significant, and introducing the controllable concept much later than time step 5 is unlikely to see further improvement, as the infection rate begins to rapidly drop.

The Linear Threshold Model.
Considering the LTM in real-world networks, we see results similar to those for the ICM. For both CA-CondMat and soc-Epinions1, we see in Table 10 that the degree-based heuristics perform best, with no single one of them consistently outperforming the others. In CA-CondMat, the difference between the best and worst performing heuristics is small, but the gap becomes larger in soc-Epinions1. If we consider the DBLP network, we see that MoBoo is most often the best performing heuristic. With a smaller seed set, MoBoo consistently outperforms all other heuristics, even as the burn-in time increases. However, with a larger seed set, the difference in performance lessens. This, combined with the general decrease in performance as the burn-in time increases, leads to MoBoo being outperformed by the degree-based heuristics. MPG performs significantly worse than the other heuristics in the DBLP network in the LTM. As noted before, in the LTM a node will often require multiple active neighbours to become active. In a sparse network such as DBLP, activations are therefore difficult to achieve, and we never observe more than 9.1% of the network being activated.
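Why a node in the LTM often needs several active neighbours can be seen from the activation rule itself. The sketch below is a minimal synchronous LTM, assuming uniform edge weights 1/deg(v) and fixed per-node thresholds; the paper's weights and threshold distribution may differ.

```python
from typing import Dict, Iterable, List, Set


def run_ltm(adj: Dict[int, List[int]], seeds: Iterable[int],
            thresholds: Dict[int, float], max_steps: int = 100) -> Set[int]:
    """Synchronous Linear Threshold Model with uniform weights 1/deg(v) (an assumption)."""
    active = set(seeds)
    for _ in range(max_steps):
        newly = set()
        for v, nbrs in adj.items():
            if v in active or not nbrs:
                continue
            # Each active neighbour contributes weight 1/deg(v).
            influence = sum(1.0 for u in nbrs if u in active) / len(nbrs)
            if influence >= thresholds[v]:
                newly.add(v)
        if not newly:
            break
        active |= newly
    return active


# A hub (node 0) with four leaves; every threshold is 0.5.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
th = dict.fromkeys(star, 0.5)
print(run_ltm(star, {1}, th))     # one active leaf exerts only 0.25 on the hub
print(run_ltm(star, {1, 2}, th))  # two active leaves reach the hub's threshold
```

A single active leaf cannot activate the hub, but two can, after which the hub activates the remaining leaves. In a sparse network, nodes rarely have enough active neighbours at once, which is the effect described for DBLP.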

Summary
MPG performed well in small-world networks, in both the ICM and the LTM, and was only significantly outperformed by MoBoo in the ICM when the burn-in time was 5 time steps. In scale-free networks, the degree-based heuristics were the most effective, although the difference in performance was often small. The characteristics of the network are clearly important, as highlighted by the experiments with real-world networks. The real-world networks had vastly different node counts, edge counts, diameters and densities. These factors all affect the best approach for indirect concept maximisation, with no heuristic consistently performing best.
The burn-in time and the increased efficiency of information spreading in scale-free networks are the most significant factors affecting the performance of MPG when indirectly boosting the spread of a target concept. Within the LTM, the impact of burn-in time on heuristic performance is diminished compared to the ICM.

Results for indirect concept limitation
As with indirect concept maximisation, we evaluate the performance of MPG across a variety of network types. While the indirect influence limitation and indirect influence maximisation problems are similar, their opposite aims may change the characteristics that impact the performance of the various heuristics. We note that, except for MoBoo, all heuristics have previously been evaluated for the influence limitation problem using the ICM across small-world, scale-free and real-world networks in [9]. Most notably, the results for influence limitation in the ICM for small-world networks were the main focus of this previous work, while the heuristic evaluations in scale-free and real-world networks were much less detailed than presented here.

Small-world networks
We divide our discussion across the two influence spread models, as before. This allows us to highlight how the individual characteristics of those models affect indirect influence limitation.

The Independent Cascade Model.
In the context of the ICM, MPG significantly (p < 0.05) outperforms the other heuristics when there is no burn-in time, as shown in Table 11. This result is consistent across seed set sizes and network clustering coefficients. As in the indirect influence maximisation problem, we see minimal difference between the degree-based heuristics. We also note that, as before, network size and clustering coefficient have little effect on the performance of the different heuristics. Furthermore, we observe that when the focus is influence limitation, the MoBoo heuristic performs poorly. This is unexpected, as both MoBoo and MPG use probable paths to calculate the expected gain of a node. However, in all cases, MoBoo performs worse than MPG for indirect influence limitation. As Fig 9 demonstrates, the performance of MPG becomes significantly worse as the burn-in time increases. The performance drop from a burn-in time of 2 to 5 is much smaller than that from 0 to 2, which is to be expected: the majority of a concept's spreading occurs in the first 2 time steps, so the ability of the controllable concept to significantly limit the spread of the target concept is diminished. At the highest burn-in time, there is no statistically significant difference between any of the heuristics (p > 0.5), and even the strength of the relationship between concepts has little effect. This is in contrast to MPG with a burn-in time of 0, where we see that as the relationship strength decreases (becoming more inhibiting), the performance of MPG noticeably increases.

Table 9. Average infections for the target concept for real-world networks in the ICM and an r value of 2, with standard deviation in brackets, and the best performing heuristic in bold.

The Linear Threshold Model.
Table 12 shows that MPG maintains its superior performance in small-world networks in the LTM. Overall, the number of target concept activations decreases compared to the same parameters in the ICM. This is intuitive, since the LTM is threshold based: a node with the controllable concept active will have all incoming influence decreased, and will exert decreased influence. This makes the node more difficult to activate, and it will contribute less to the activation of other nodes. Furthermore, the controllable concept can also spread through the local area, making it difficult to activate any node in the neighbourhood.
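The mechanism described above can be sketched as a weighted-influence calculation. This is a hypothetical formulation: we assume uniform base weights 1/deg(v) and model the relationship as a single multiplier (called `factor` here, below 1 for inhibition) applied once to outgoing influence from a node holding the controllable concept and once to incoming influence at such a node. The paper's relationship parameter r is defined elsewhere and may not map directly onto this factor.

```python
from typing import Dict, List, Set


def target_influence(adj: Dict[int, List[int]], v: int, target_active: Set[int],
                     control_active: Set[int], factor: float) -> float:
    """Total target-concept influence on node v under an assumed inhibiting relationship."""
    base = 1.0 / len(adj[v])
    total = 0.0
    for u in adj[v]:
        if u not in target_active:
            continue
        w = base
        if u in control_active:
            w *= factor  # sender holds the controllable concept: weaker outgoing influence
        if v in control_active:
            w *= factor  # receiver holds the controllable concept: weaker incoming influence
        total += w
    return total


star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(target_influence(star, 0, {1, 2}, set(), 0.5))  # no inhibition
print(target_influence(star, 0, {1, 2}, {0}, 0.5))    # the hub itself is inhibited
print(target_influence(star, 0, {1, 2}, {1}, 0.5))    # one sender is inhibited
```

With a threshold of 0.5, the hub would activate only in the uninhibited case, illustrating how the controllable concept raises the effective barrier to activation in its neighbourhood.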
Unlike in the problem of indirect influence maximisation, an increased burn-in time does not improve the performance of MPG in the LTM. As Fig 10 shows, increasing the burn-in time has a similar impact as in the ICM. The primary difference is that the strength of the relationship is less important in the LTM, as it is only at the weakest inhibiting relationship that we see a change in the performance of MPG.

Scale-free networks
In the indirect influence maximisation problem, we saw that the existence of hub nodes resulted in the heuristics that considered relationships, namely MPG and MoBoo, becoming less effective. We now consider their effectiveness for indirect influence limitation.
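The hub nodes referred to here arise from preferential attachment. The sketch below assumes a Barabási–Albert style generator, in which each new node attaches m edges to existing nodes with probability proportional to their degree. This matches the description of a fixed number of edges added per node, but the paper's generator (which also varies clustering) may differ. It returns the degree sequence so that the heavy tail can be inspected.

```python
import random
from typing import List


def barabasi_albert_degrees(n: int, m: int, seed: int = 0) -> List[int]:
    """Degree sequence of a preferential-attachment graph (Barabasi-Albert style).

    Starts from m isolated nodes; the first new node links to all of them, and
    every later node adds m edges to distinct existing nodes chosen with
    probability proportional to their degree.
    """
    rng = random.Random(seed)
    degree = [0] * n
    repeated: List[int] = []  # node v appears once per edge endpoint at v
    targets = list(range(m))
    for source in range(m, n):
        for t in targets:
            degree[source] += 1
            degree[t] += 1
        repeated.extend(targets)
        repeated.extend([source] * m)
        chosen = set()
        while len(chosen) < m:  # sample m distinct, degree-weighted targets
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return degree


degrees = barabasi_albert_degrees(25_000, 4)
print("max degree:", max(degrees))
print("median degree:", sorted(degrees)[len(degrees) // 2])
```

The median degree stays close to m while the maximum is typically in the hundreds, which is the kind of hub that lets a concept reach much of the network in a few steps.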

The Independent Cascade Model.
Observing scale-free networks in the ICM, we see that all non-random heuristics perform at a similar level when the burn-in time is 0, as shown in Table 13, aside from MoBoo, which consistently performs the worst by a significant margin. Unlike for indirect influence maximisation, we see that seed set size, network size and clustering exponent all affect the performance of the heuristics. For indirect influence limitation, increasing the seed set size of the target concept naturally results in more activations, and so the performance of the influence limitation heuristics worsens. Similarly, increasing the size of the network or the number of edges added during creation aids the spread of the target concept and makes its spread harder to limit. The increased efficiency with which information can spread in scale-free networks naturally makes inhibition less effective. MPG focuses on finding nodes with a high number of expected activations that are likely to be activated by the target concept, typically selecting nodes of high degree or the neighbours of those nodes. Due to the degree distribution of scale-free networks, there are nodes with a degree exceeding 500 in a 25,000-node network. If the target concept is activated on one of these high-degree nodes, it can then efficiently spread to a large proportion of the network, the majority of which will not have the controllable concept active. This means that the impact of the controllable concept becomes negligible.

Table 11. Average infections for the target concept for small-world networks in the ICM, with no burn-in time and an r value of 0, with standard deviation in brackets, and the best performing heuristic in bold.

In terms of the effect of the burn-in time, Fig 11 shows that, at a burn-in time of 0, degree discount and MPG perform at a similar level. Then, as the burn-in time increases, degree discount begins to outperform MPG. This is to be expected, given MPG's noted sensitivity to the burn-in time.

The Linear Threshold Model.
Comparing influence limitation in the LTM, shown in Table 14, to the ICM, we see that for scale-free networks there are uniformly more activations in the LTM. Again, the heuristics all perform similarly, with MoBoo being the only outlier. Furthermore, MPG does not perform as well in networks with 4 edges added as it did in the ICM. As in the ICM, all of the network characteristics can affect the general performance of the heuristics, but not drastically enough to make one perform significantly better than the rest.
Considering the impact of the burn-in time on performance in the LTM, Fig 12 shows that, unlike in the ICM, increasing the burn-in time decreases the difference in performance. In all cases, the degree discount heuristic performs better, but the difference noticeably decreases with each increase in burn-in time. The decreased significance of relationship strength in the LTM is also evident.

Real-world networks
We saw that the properties of the real-world networks had a significant impact on the performance of the various heuristics when studying the indirect influence maximisation problem. We now consider how these properties affect the heuristics in the indirect influence limitation problem.

The Independent Cascade Model.
When considering the ICM, we see that no single heuristic consistently performs the best. Table 15 shows that, most often, degree discount is the best performing heuristic, but can be outperformed by MPG and single discount. Overall, the performance of all the heuristics is inconsistent.
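Degree discount and single discount are the degree-based heuristics proposed by Chen, Wang and Yang for influence maximisation in the ICM. For reference, a minimal sketch of both follows, assuming an undirected adjacency dict and a uniform propagation probability `p`; the exact implementation and parameters used in these experiments are not reproduced here. Degree discount scores an unselected node v as d_v − 2t_v − (d_v − t_v)·t_v·p, where t_v is the number of already-selected neighbours of v.

```python
from typing import Dict, List


def single_discount(adj: Dict[int, List[int]], k: int) -> List[int]:
    """Pick k seeds by degree, discounting a node by 1 per already-selected neighbour."""
    score = {v: len(nbrs) for v, nbrs in adj.items()}
    selected = []
    for _ in range(k):
        u = max(score, key=score.get)  # highest remaining discounted degree
        selected.append(u)
        del score[u]
        for v in adj[u]:
            if v in score:
                score[v] -= 1
    return selected


def degree_discount(adj: Dict[int, List[int]], k: int, p: float) -> List[int]:
    """Degree discount: dd_v = d_v - 2 t_v - (d_v - t_v) t_v p."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    score = dict(degree)
    t = dict.fromkeys(adj, 0)  # selected neighbours per node
    selected = []
    for _ in range(k):
        u = max(score, key=score.get)
        selected.append(u)
        del score[u]
        for v in adj[u]:
            if v in score:
                t[v] += 1
                score[v] = degree[v] - 2 * t[v] - (degree[v] - t[v]) * t[v] * p
    return selected


# A star (hub 0, leaves 1-5) plus a separate edge 6-7.
graph = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0], 6: [7], 7: [6]}
print(degree_discount(graph, 2, 0.1))
print(single_discount(graph, 2))
```

After the hub is chosen, its leaves are discounted below the isolated edge, so both heuristics spread their second seed elsewhere rather than picking a neighbour of the hub.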
In the DBLP network, we see that MPG performs significantly better than the other heuristics, when the burn-in time is 0. The sparser nature of this network, as previously discussed, means that it is difficult for concepts to spread, which makes the local targeting of MPG more effective at inhibition. This advantage is soon lost, and as the burn-in time increases MPG's relative performance decreases, as we have seen throughout this analysis. It becomes comparable to the degree-based heuristics at a burn-in time of 2, and is noticeably outperformed by degree discount at higher burn-in times.
In the CA-CondMat network, MPG performs best for small seed sets with a burn-in time of 0, but then performs noticeably worse than the degree-based heuristics at larger burn-in times. In particular, the gap between the degree-based heuristics and MPG is greatest when the burn-in time is 2. At a burn-in time of 5, it is likely that the target cascade has performed the majority of its spreading, so no heuristic can make a significant difference to the final number of infections. At a burn-in time of 2, however, MPG's strategy of targeting local areas is clearly ineffective compared to spreading the controllable concept as widely as possible. Furthermore, we see very similar behaviour in the soc-Epinions1 network, which, as we have already discussed, has similar properties to CA-CondMat.

The Linear Threshold Model.
In the LTM, we see similar inconsistency across the real-world networks as we observed in the ICM. The degree-based heuristics often perform the best but, as seen in Table 16, no single heuristic is clearly superior.
We also observe that, in the DBLP network, the performance of MPG is less affected by burn-in time in the LTM. Compared to the ICM, in the LTM MPG maintains superior performance in the majority of environments, even as the burn-in time increases. However, in the DBLP network all heuristics have significantly higher standard deviations than in the other networks. As such, it is difficult to say confidently that MPG is the best choice for indirect influence limitation in the DBLP network, especially at higher burn-in times.
For both CA-CondMat and soc-Epinions1, we see the same behaviour as in the ICM: at a burn-in time of 5, all heuristics perform similarly, and MPG performs comparatively worst when the burn-in time is 2.
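Significance is reported here at p < 0.05, but this section does not state which test was used. One standard-library option for comparing the average infections of two heuristics across simulation runs is a two-sided permutation test on the difference in means, sketched below; the run values are illustrative placeholders, not results from these experiments. High standard deviations, such as those observed for DBLP, widen the permutation distribution, so modest differences in means are less likely to reach significance.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test(a: Sequence[float], b: Sequence[float],
                     n_permutations: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test for a difference in means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly relabel runs between the two heuristics
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)


# Illustrative per-run infection counts (placeholders, not data from the paper).
runs_a = [120, 131, 118, 125, 140, 122, 128, 135, 119, 127]
runs_b = [150, 162, 149, 158, 171, 153, 160, 166, 151, 157]
print(permutation_test(runs_a, runs_b, n_permutations=2_000))
```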

Summary
Overall, we see that MPG performs best in small-world networks, with no burn-in time, for both influence spread models. In these networks, MPG consistently and significantly outperforms the degree-based heuristics. With a high burn-in time, all heuristics performed at a similar level, due to the inhibiting concept being introduced as the target concept cascade is ending. In scale-free networks, we see no heuristic performing significantly better than others. Similarly, in real-world topologies we see that none of the heuristics consistently maintains superior performance when compared to the others.

Computational Performance
We present a brief discussion on the runtime performance of the three most effective heuristics, namely degree discount, MPG and MoBoo. These three heuristics require different levels of information and perform varying levels of calculation, and it is important to evaluate how the runtime of each heuristic scales as the size of the network grows. The results of our tests can be seen in Table 17. Degree discount has the most consistent execution time, which increases slowly as the size of the network increases. Of the three heuristics, we see that degree discount is the only one not to have its execution time increase significantly when used in scale-free networks, remaining comparable to its execution time in small-world networks.
In small-world networks, MoBoo's execution time increases at a similar rate to degree discount, but from higher initial values. In scale-free networks, MoBoo's execution time increases at a higher rate as both the number of nodes and the number of edges increase. Furthermore, in small-world networks MoBoo consistently has the worst execution time, and in scale-free networks it increases at the highest rate, although it is generally faster than MPG.
MPG is significantly affected by the type of network it is applied to. In small-world networks, MPG often executes fastest and scales with the slowest rate. Since MPG best manipulates the spread of a concept within small-world networks, this means that it runs very quickly in the best scenario for its use. In scale-free networks, MPG scales poorly, especially in the more connected networks with 8 edges added for each node. The aim of MPG is to limit exploration to a local area to prevent the need to search the entire network multiple times. However, in scale-free networks, the presence of hub nodes means that many nodes contain the majority of the network within their 2-hop neighbourhood. As we can see, the execution time of MPG suffers accordingly. The high execution time, and low effectiveness, means that we should avoid the use of MPG in scale-free networks, especially when compared to degree discount. Degree discount has been shown to be at least as effective as MPG in scale-free networks, with a fraction of the execution time.
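To illustrate why hubs inflate the cost of a local search, the sketch below counts the nodes within two hops of a start node using a depth-limited breadth-first search. It does not implement MPG itself, only the size of the region a 2-hop local exploration would have to examine. A ring lattice (a small-world generator with rewiring probability 0, used here as an assumed stand-in) is compared with a star, the extreme case of a hub.

```python
from typing import Dict, List, Set


def k_hop_neighbourhood(adj: Dict[int, List[int]], source: int, k: int = 2) -> Set[int]:
    """Nodes within k hops of source (breadth-first search truncated at depth k)."""
    seen = {source}
    frontier = [source]
    for _ in range(k):
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return seen


n = 100
# Ring lattice: each node linked to its two nearest neighbours on each side (no hubs).
ring = {i: [(i + d) % n for d in (-2, -1, 1, 2)] for i in range(n)}
# Star: node 0 is a hub linked to every other node.
star = {0: list(range(1, n)), **{i: [0] for i in range(1, n)}}

print(len(k_hop_neighbourhood(ring, 0)))  # 9
print(len(k_hop_neighbourhood(star, 1)))  # 100
```

In the lattice, the 2-hop region stays small regardless of network size, whereas a single hub places the entire network within two hops of every leaf.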

Conclusion
In this paper, we have defined the indirect influence maximisation and indirect influence limitation problems, in which a controllable concept is used to indirectly affect the spread of a target concept. We proposed the MPG heuristic as a method for addressing both indirect influence maximisation and indirect influence limitation, and evaluated its effectiveness against previously used influence maximisation heuristics. For both problems, we evaluated MPG in a variety of small-world, scale-free and real-world networks for both the ICM and LTM.
In both the indirect influence maximisation and indirect influence limitation problems, we found that in the ICM, MPG performs effectively in small-world networks with a burn-in time of 0. In scale-free networks, and in any environment with a high burn-in time, the difference in performance between the heuristics begins to disappear. MoBoo was the only heuristic examined that could consistently outperform MPG, and only in small-world networks with a high burn-in time when attempting to boost the target concept. However, unlike MPG, MoBoo could not adapt to the problem of influence limitation. Overall, MPG is the best performing heuristic in small-world networks, while influence manipulation in scale-free networks appears to be much less effective. In some cases, MPG performs better in the LTM, due to the decreased significance of burn-in time compared to the ICM.
In particular, MPG is most suited to limiting the spread of a concept when the controllable concept can be introduced at the same time as, or very soon after, the target concept. For example, if a network is being observed and an undesirable concept, such as an epidemic or competing product, begins to spread in one area, we may focus on a nearby local area that we expect to be exposed to this concept. MPG can be used in this instance to introduce a controllable concept to that local area at a time close to when we expect the undesirable concept to reach it. In the real world it is typically impossible to introduce a new concept at exactly the time an undesirable concept begins to spread, but it is possible to predict areas that may be exposed to a spreading concept. If we can introduce our controllable concept at that time, then MPG provides an effective method to limit the spread of the undesirable concept to that local area. In a large network, this can be applied to several areas simultaneously, allowing a wider area of the network to be protected. Ideally, the local network we choose will exhibit small-world properties, since hub nodes lower the effectiveness of indirect concept manipulation in general. When considering the real-world networks, we saw that differences in their parameters result in different performance for each heuristic, though none was consistently superior. This further highlights the impact of network parameters, which makes developing a general indirect influence manipulation heuristic difficult.

Table 14. Average infections for the target concept in the LTM for scale-free networks, with no burn-in time, a seed set of 500 and an r value of 0, with standard deviation in brackets, and the best performing heuristic in bold.

In the future, we will investigate the use of MPG to manipulate influence spread in other spreading models. For example, we will consider the SIS model and a more general version of the SIR model [12,13]. These provide variations in how a concept may spread through a network, and the ability to manipulate a concept's spread in these models will allow for investigation of problems focused on tackling epidemics and disease spread. In addition, we wish to investigate the performance of MPG in more complex network environments, including multi-layer networks representing the different social networks an individual may belong to.

Table 15. Average infections for the target concept for real-world networks in the ICM and an r value of 0, with standard deviation in brackets, and the best performing heuristic in bold.
Project administration: Nathan Griffiths.