Staged Models for Interdisciplinary Research

Modellers of complex biological or social systems are often faced with an invidious choice: to use simple models with few mechanisms that can be fully analysed, or to construct complicated models that include all the features which are thought relevant. The former ensures rigour, the latter relevance. We discuss a method that combines these two approaches, beginning with a complex model and then modelling the complicated model with simpler models. The resulting “chain” of models ensures some rigour and relevance. We illustrate this process on a complex model of voting intentions, constructing a reduced model which agrees well with the predictions of the full model. Experiments with variations of the simpler model yield additional insights which are hidden by the complexity of the full model. This approach facilitated collaboration between social scientists and physicists—the complex model was specified based on the social science literature, and the simpler model constrained to agree (in core aspects) with the complicated model.

Using a network with either high degree (blue) or high rewiring probability (red) regains the bistability observed in the reduced model. Both simulations use model M2. The high-degree network has degree = 65 and rewiring probability = 0.15. The high-rewiring network has degree = 15 and rewiring probability = 0.45.
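The two network settings in this caption can be illustrated with a small-world construction. The following is a minimal sketch, assuming a Watts-Strogatz-style ring lattice in which each edge is rewired with the stated probability; the source does not say which generator was used, and the function name `small_world` and the population size of 500 are hypothetical.

```python
import random

def small_world(n, degree, rewire_prob, seed=None):
    """Ring lattice rewired edge by edge (Watts-Strogatz style).

    Assumption of this sketch: an odd `degree` is rounded down to the
    nearest even number, because each node links to degree // 2
    neighbours on either side of the ring.
    """
    rng = random.Random(seed)
    half = degree // 2
    neighbours = {i: set() for i in range(n)}
    # Build the regular ring lattice.
    for i in range(n):
        for k in range(1, half + 1):
            j = (i + k) % n
            neighbours[i].add(j)
            neighbours[j].add(i)
    # Rewire each lattice edge (i, i + k) with probability rewire_prob.
    for i in range(n):
        for k in range(1, half + 1):
            j = (i + k) % n
            if j not in neighbours[i] or rng.random() >= rewire_prob:
                continue
            free = [c for c in range(n) if c != i and c not in neighbours[i]]
            if free:
                new = rng.choice(free)
                neighbours[i].remove(j)
                neighbours[j].remove(i)
                neighbours[i].add(new)
                neighbours[new].add(i)
    return neighbours

# The two settings from the caption, on a hypothetical population of 500 agents.
high_degree = small_world(500, 65, 0.15, seed=1)
high_rewiring = small_world(500, 15, 0.45, seed=1)
```

Rewiring moves one end of an edge, so the total number of links, and hence the mean degree, is unchanged; only the amount of long-range mixing differs between the two settings.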
interested individuals are more connected to less interested individuals, while a lower level of interest is achieved if highly interested individuals are more connected with other highly interested individuals. This is so because the influence of highly interested individuals is 'wasted' if they talk to one another, while it has a larger effect if it is concentrated on less interested individuals. When the immigrants are different to the incumbent population, it will boost turnout if they tend to be connected to the incumbent population; if immigrants have higher interest they will be able to pass this interest to the rest of the population; if immigrants have lower interest they will have the chance to increase it via contacts with the rest of the population.

Runtime comparison

To assess the difference in computational demands of the full and the reduced model, we compared the (real) time needed to run the models on a standard desktop computer.

(h(i)), post-18 education (e(i)); integer variables: (political) interest level (l(i)), minimum interest level (m(i)), age (in years, a(i)), number of remembered (political) conversations (c(i)).

The main parameters of the model are:

Influence rate K, scales the number of (political) conversations per year.

The same procedure initialises immigrants into the model, using the subset of the BHPS corresponding to survey responses from immigrants. This procedure sets the civic duty, turnout, habit, post-18 education, interest level and minimum interest level, with some of these characteristics being inferred using proxies for the required information. Agents initially have no remembered conversations, and an age drawn from a uniform

The following processes happen in a loop until the required timepoint is reached. All rates are given in Table 1.
The agent has the chance to initiate three conversations, with probabilities p_c(l(i), v(i)), each with a random other agent.

Agents (with l(i) > 0) receiving a conversation (from an agent with civic duty) acquire civic duty with probability p_acd(e(i), v(i)).

Else, if c(i) > T h l then set l(i) = m(i) + 1.

Updating civic duty: Agents lose civic duty with probability p_lcd(a(i), e(i)), dependent on their age and education.

Forgetting conversations: Agents forget conversations that happened more than one year ago with probability p_f(l(i)) per conversation, dependent on the agent's interest level.

Birth/death: Each agent dies with probability p_d(a(i)), dependent on their age, and is replaced by a new agent by the 'birth' process (described in the Initialisation procedures).

Immigration/emigration: Each agent emigrates with probability p_e = 0.015 and is replaced by a new agent by the 'immigration' process ( factors) with probability p_c(a(i)), dependent on their age.

Agents gain habit if they vote in 3 consecutive elections.

Agents lose habit if they do not vote in 2 consecutive elections.
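The two habit rules can be made concrete with a short example. This is a minimal sketch, assuming each agent keeps an ordered record of whether it voted in past elections; the class `Voter` and the function `record_election` are hypothetical names, not taken from the model code.

```python
from dataclasses import dataclass, field

@dataclass
class Voter:
    habit: bool = False
    # True = voted, False = abstained; most recent election last.
    history: list = field(default_factory=list)

def record_election(agent, voted):
    """Store the latest election outcome and apply the two habit rules."""
    agent.history.append(voted)
    last_three = agent.history[-3:]
    last_two = agent.history[-2:]
    if len(last_three) == 3 and all(last_three):
        agent.habit = True    # voted in 3 consecutive elections
    elif len(last_two) == 2 and not any(last_two):
        agent.habit = False   # abstained in 2 consecutive elections
    return agent.habit
```

The asymmetry (three votes to gain, two abstentions to lose) means a single missed election never removes an established habit.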

Here ⌊x⌋ denotes the integer part of x, that is, the largest integer less than or equal to x.

Here we give more detail about the full model (M1). This description follows the "ODD" protocol [2]. The full code, a complete description of the details of the model and a sensitivity analysis can be found at [3].

This is intended as a consistent, detailed and dynamic description, in the form of an agent-based simulation, of the available evidence concerning the question of why people bother to vote. This integrates a variety of kinds and qualities of evidence, from source data and statistics to more qualitative evidence in the form of interviews. The model is being developed following a KIDS rather than a KISS methodology, that is, it aims to be guided more by the available evidence than by simplicity [4].

Each patch represents a household or other location (place of work, school, or kind of activity). The circles or triangles on patches are the agents of a household. The links between them represent social networks of different types. Agents have a number of attributes, including: their ethnicity (shape), their political leaning (colour), and their age (size). Other agent attributes include class, level of political interest, and which activities they belong to. They also have a (partial) memory of past events, including who they voted for and the political discussions they have had.

The model is based around a 2D grid of locations, each of which may be a household, place of work, school, activity (two kinds) or empty. Households consist of a number of agents which each represent a single person. Agents are born, age, partner, have children and die as the simulation progresses. Agents have a large number of characteristics, including: a memory of past events, a party affiliation (or none), a set of family relationships (children, partner, and/or parents) and social connections with other agents. It is over the network of social relationships that influence occurs in the form of events that represent communication about political or civic matters. The agents are influenced over time via these communications. When an election occurs,

The simulation is initialised at the start. Then the simulation proceeds in discrete time steps, one step usually representing one month. In each time step the following stages are carried out.

For each of these stages agents are fired in a random order (newly randomised for each time step and each process). In most of these processes the update for each agent has no immediate effect on any other agent, so these agent processes are effectively in parallel. Similarly most of these stages could be done in any order with very little impact on the outcome, the exception being the sub-stages of voting (item 14 above).

To fill in some of the cognitive and contextual "glue", evidence from many different sources has been included to motivate the assumptions and mechanisms of the model.
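The random firing order used within each stage of a time step can be illustrated as follows. This is a minimal sketch of random-order activation, assuming each stage is a function applied to one agent at a time; `run_step` and its arguments are hypothetical names, not the model's own scheduler.

```python
import random

def run_step(agents, stages, rng):
    """Run one time step: every stage visits every agent once.

    The visiting order is reshuffled for each stage, so no agent is
    systematically first or last.
    """
    for stage in stages:
        order = list(agents)   # copy, so the caller's list is untouched
        rng.shuffle(order)     # fresh random order for this stage
        for agent in order:
            stage(agent)

# Usage with two placeholder stages that only log their calls.
log = []
run_step(["a", "b", "c"],
         [lambda a: log.append(("age", a)), lambda a: log.append(("talk", a))],
         random.Random(0))
```

Stages still run in a fixed sequence; only the order of agents within a stage is randomised, which matters for the few processes where one agent's update immediately affects another.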

Thus it is difficult to identify discrete "sub-models" in this. However, a post-hoc analysis of the structure that emerged suggests the following could be considered as sub-models:

confounded by factors such as: having a very young baby, having just moved, having just been made unemployed or being too ill to vote, (5) finally those going to vote may "drag" others to come with them and vote, especially partners or family.

9. Voting statistics are then recorded, with agents remembering where and when they voted, with the election result being decided by the majority vote within the model (although an option is that it could be imposed from outside).
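Deciding the election result from the votes cast within the model can be sketched as a simple tally. This is a minimal illustration, assuming that "majority vote" means the party with the most votes wins and that ties are broken alphabetically; both are assumptions of this sketch, and `election_result` is a hypothetical name.

```python
from collections import Counter

def election_result(ballots, electorate_size):
    """Tally party labels cast by voting agents and report turnout.

    Assumptions of this sketch: the party with the most votes wins
    (plurality), and ties are broken alphabetically.
    """
    tally = Counter(ballots)
    turnout = len(ballots) / electorate_size if electorate_size else 0.0
    if not tally:
        return None, turnout, tally
    top = max(tally.values())
    winner = min(party for party, votes in tally.items() if votes == top)
    return winner, turnout, tally
```

An externally imposed result, the option mentioned above, would simply bypass the tally and supply the winner directly.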

The above are not the full details but a summary of their main features. Generally micro-causation in the model happens down the order above (from first to later), but there are some weaker and slower feedbacks that occur back up, for example the outcome of an election affects agents' perceptions of the experience of voting (whether voting resulted in the party they wanted); the characteristics of agents (including party leaning) may affect which activity they join, their friends and who they choose as a

Clearly in such a complicated model it is not possible to make an easy and clean distinction between results that emerge and those that are programmed into the model. Indeed, the model was designed with a view to integrating available evidence rather than producing or demonstrating emergent effects (or being predictable). However, not all outcomes from the model are straightforwardly forced by the settings and programmed micro-processes; examples include the following.

• Although the underlying demographic model is fairly predictable in its unfolding, which partnerships are formed affects which new households with children are created (that do not result from people moving into the district from outside), so the developing social network affects the demographics a little.

• The patterning of households within the 2D space has certain self-organising features. Households have a tendency to move to districts where surrounding households will have some similar agents to themselves, resulting in some weak clustering. The positioning of schools also has an effect as children will go to the nearest school, and links may be formed between parents of children at the same school.

• Agents will tend to choose to participate in (voluntary) activities whose other members are (on average) most similar to themselves, so that these activities tend to act to promote clustering of similar individuals, regardless of location.

• Depending on the network structure, clusters of agents will tend to reinforce patterns of interest/lack of interest in politics. This may reinforce or act against tendencies that might already exist within households of different kinds within the simulations (which will for the reasons above tend to cluster together in terms of location and activity membership etc.).
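The homophily-driven choice of activities in the list above can be sketched in a few lines. This is a minimal illustration, assuming a dissimilarity score built from the four characteristics the model description names (age, ethnicity, class and political leaning); the scoring formula, the 20-year age scale and the names `Person`, `dissimilarity` and `choose_activity` are all hypothetical, since the source does not give the actual formula.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    age: int
    ethnicity: str
    social_class: str
    leaning: str

def dissimilarity(a, b):
    """Hypothetical measure: capped, scaled age gap plus one point per
    differing categorical characteristic."""
    score = min(abs(a.age - b.age) / 20.0, 1.0)
    score += a.ethnicity != b.ethnicity
    score += a.social_class != b.social_class
    score += a.leaning != b.leaning
    return score

def choose_activity(person, activities):
    """Pick the activity instance whose members are, on average, least
    dissimilar to `person`. Each activity is a list of Person objects."""
    def mean_dissimilarity(members):
        return sum(dissimilarity(person, m) for m in members) / len(members)
    candidates = [members for members in activities if members]  # skip empty
    return min(candidates, key=mean_dissimilarity)
```

Because members are themselves chosen this way, each joining decision makes an activity slightly more homogeneous, which is the clustering effect described in the third bullet.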

The initialisation of the model (see below) has a complicated but predictable effect on the model, in that the kinds of household the model is seeded with will affect the tendencies that follow. Thus, in the data set from which these are selected (at random), those from "invisible minorities" (Irish etc.) tend to be more politically involved and have a higher sense of civic duty than the native majority population, so if the model is seeded with more households of this kind one will find a higher level of turnout.

The impact of many of the parameters is straightforward, for example: increasing

"re-mixing" is done to achieve the user-defined proportion of majority population as well as to ensure that out-of-UK immigration is selected from those recorded as immigrants in the BHPS sample. Thus the mix of initial households in each run of the simulation will be somewhat different, but on the whole the balance of household characteristics will be representative for simulations with larger populations, albeit with some stochastic variation.

Collectives

Some of the agent characteristics do influence how the agents make links and move. Which locations a household moves to is influenced by a bias towards moving next to households with similar characteristics; which instance of a kind of activity 1/2 is joined will be the one whose existing members have (on average) the most similar characteristics to the joining agent; which person they make links with via an activity will be biased by a similar homophily formula. Thus over time agents will tend to have more links with those similar to themselves. However, due to the presence of much stochasticity in the model this does not produce pronounced segregation, but rather a "softer" bias in terms of social links. The characteristics that are involved are: age, ethnicity, class and political leaning. At the moment there is a single dissimilarity measure used between two agents regardless of the context (in future versions this will be changed so that there are different measures for different circumstances, for example a weaker one at work than for choosing which instance of an activity to join).

Political parties are not currently represented, except implicitly in terms of the

The grid is initialised in the following manner:

households (e.g. which child in a household belongs to which parent) have to be inferred from the data as this is not always unambiguous. Some initial agent characteristics are set using proxies from the data, e.g. civic duty is set for agents who are recorded as being a member of certain kinds of organisation

• Links to household members and some random neighbours are made

• To give the households an initial network the procedure to develop other network links is done 10 times for each household.

• Appropriate activities are joined depending on those in the BHPS data.

Thus the exact composition of the grid varies in each run but is drawn from the same sample, so in a sufficiently large initial set of households (determined by the size of the grid and how much is left empty) one gets a similar mixture each time.

Various other things are initialised including: shapes and colours for main display, election dates, and party labels.

There are two sets of data that are used in the model:

• A sample of the 1992 wave of the BHPS data as described above. This file cannot be distributed due to UK Data Archive restrictions, but it will soon be available on their site. In its stead we are distributing the model with synthetic data which does not relate to any real individuals but has some of the same characteristics as the original file [3].

It is important to understand that this is not a simulation with free parameters that are conditioned on some "in-sample" data. It does have a lot of parameters, but these are set (or could be set) from empirical data. The model is then run "as is" and can be compared with available data to see how and where it matches and where it does not. Thus (unlike many models) it is not an attempt to 'fit' any data, but rather is a computational description to enable the 'detangling' and critique of various explanations of observed social behaviour.

Some of the principal parameters that have real referents (that is, in principle they could be determined from empirical data), include the following:

households seeking to move near empty spaces, a high value to avoid empty spaces)

• election-mobilisation-rate: the percentage of its supporters who are not intending to vote that a party tries to get to vote

• start-mobilisation: when party mobilisation starts

• end-mobilisation: when party mobilisation stops

The following allow the turning on and off of various processes or structures and thus allow the comparison of the simulation behaviour with and without them.

Some of the other parameters can be used to implicitly switch processes on and off:

• influence rate: setting this to zero switches off all political conversation (apart from mobilisation conversations)

• prob-contacted: setting this to zero switches off mobilisation during elections

• major-election-period and minor-election-period: setting these to zero switches off elections

• immigration-rate and int-immigration-rate: setting these to zero switches off any incomers to the model (warning: may critically affect longer-term population levels)

• emmigration-rate: setting this to zero switches off any emigration from the model (warning: may critically affect longer-term population levels)

• birth-mult: setting this to zero switches off any births (warning: may critically affect longer-term population levels)

• death-mult: setting this to zero switches off any deaths (warning: may critically affect longer-term population levels)

• prob-partner: setting this to zero switches off any partnering after initialisation (warning: may critically affect longer-term population levels)

• separate-prob: setting this to zero switches off any separation of partners (warning: may critically affect longer-term population levels)

• forget-mult: setting this to zero switches off any forgetting of conversations etc. by agents (warning: will cause the model to slow down as agents accumulate huge lists of memories)

• move-prob-mult: setting this to zero switches off any moving within the model

The following affect the initialisation of the simulation.

• density: the initial density of households in the spaces left for them after schools etc. have been allocated

• majority-prop: the proportion of the initial population from the majority group

• init-move-prob: how many times households are moved in the initialisation (this produces a slightly more realistic starting point for the model with weak clustering)

The following control how the simulation run occurs and what data is output.

• start-date: year simulation starts

• end-date: year simulation finishes

• ticks-per-year: how many simulation ticks are in each year. Probabilities throughout the simulation are adjusted so that roughly the same will happen with different settings of this, so as to enable fast debugging runs with 1 tick per year before slower ones with 12. However, there will be subtle differences in model behaviour for different settings of this.
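One way in which probabilities could be adjusted for different ticks-per-year settings is shown below. This is a minimal sketch, assuming that an annual event probability is converted so that the chance of the event occurring at least once over a year stays the same; the source does not state the rule the model actually uses, and `per_tick_probability` is a hypothetical name.

```python
def per_tick_probability(annual_prob, ticks_per_year):
    """Convert a yearly probability into an equivalent per-tick probability.

    Assumption of this sketch: ticks are independent, so the chance of
    the event NOT happening in a year is (1 - p_tick) ** ticks_per_year.
    """
    return 1.0 - (1.0 - annual_prob) ** (1.0 / ticks_per_year)

# Illustrative value only: a 1.5% yearly probability at 12 ticks per year.
monthly = per_tick_probability(0.015, 12)
```

Even with such a conversion, runs with 1 and 12 ticks per year need not agree exactly, because events within a year can interact (for example, a conversation in one month changing behaviour in the next), which is consistent with the subtle differences noted above.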