Sheep Movement Networks and the Transmission of Infectious Diseases

Background and Methodology
Various approaches have been used to investigate how properties of farm contact networks affect the transmission of infectious diseases. The potential for transmission of an infection through a contact network can be evaluated in terms of the basic reproduction number, R 0. The magnitude of R 0 is related to the mean contact rate of a host, in this case a farm, and is further influenced by heterogeneities in contact rates of individual hosts. The latter can be evaluated as the second order moments of the contact matrix (variances in contact rates, and co-variance between contacts to and from individual hosts). Here we calculate these quantities for the farms in a country-wide livestock network: >15,000 Scottish sheep farms in each of 4 years from July 2003 to June 2007. The analysis is relevant to endemic and chronic infections with prolonged periods of infectivity of affected animals, and uses different weightings of contacts to address disease scenarios of low, intermediate and high animal-level prevalence.

Principal Findings and Conclusions
Analysis of networks of Scottish farms via sheep movements from July 2003 to June 2007 suggests that heterogeneities in movement patterns (variances and covariances of rates of movement on and off the farms) make a substantial contribution to the potential for the transmission of infectious diseases, quantified as R 0, within the farm population. A small percentage of farms (<20%) contribute the bulk of the transmission potential (>80%) and these farms could be efficiently targeted by interventions aimed at reducing spread of diseases via animal movement.


Introduction
Understanding the structure of contact networks is important for predicting and controlling the spread of infectious diseases [1][2][3][4]. One important route of transmission of infectious diseases of farm animals is the movement of livestock between farms [5]. In Britain, comprehensive, computerized movement-record keeping systems have been in place for cattle since 1998 and for sheep since 2002. The movement-record data have been used in studies of the epidemiology of a variety of diseases, for example, foot-and-mouth disease in cattle and sheep [6], bovine tuberculosis in cattle [7,8], and scrapie in sheep [9,10]. But the records of movements of British livestock between farms also provide a rare example of large and fully documented contact networks. In parallel with the disease-specific studies there have been a number of studies of the generic properties of livestock movement networks relating to the spread of infectious disease. These have taken two approaches to characterising the movement networks.
The first approach, adopted from generic network methodologies, is based on assessing the connectedness of a farm network through calculating the size of its giant connected component [11][12][13][14]. For a directed network (where the link between two nodes may be in one direction or the other or both) the giant strongly connected component (GSCC) is the largest subset of members of the network mutually reachable through a directed path; each pair of members in the GSCC is connected in both directions [11]. The giant weakly connected component (GWCC) is the largest subset of the network linked by any contact, regardless of direction [11]. Therefore the GSCC and GWCC provide lower and upper bounds, respectively, to maximum epidemic size. The giant out-component (GOC) is the subset of the network reachable from the GSCC by a directed path [11]; therefore the GOC includes the GSCC itself and all the farms which can be reached from the GSCC by following movements in their recorded direction. An increase in the size of the GSCC of the British cattle farm network was reported after new regulations governing the movement of cattle in the UK were introduced between 2001 and 2003 [15]. This result implies that the potential scale of infectious disease epidemics in British cattle may have subsequently increased rather than decreased.
A second approach is based on evaluating the potential for transmission of an infection that may spread through the contact network in terms of the basic reproduction number, R 0. In this context R 0 is a measure of the expected average number of secondary cases generated from a single primary case introduced into a naïve population [16]. The relationship between R 0 and the giant connected components of the network was discussed by Kao et al. [17]. An important distinction is that R 0 is a function of the rates of contact of members of the network whereas the giant components are static measures of the network's connectedness [18].
For any contact network, contributions to R 0 can be partitioned into a first order moment (relating to the mean contact rate of a member) and second order moments (relating to the variances and co-variances in contact rates of individual members) [19]. Earlier work on livestock movements and other networks (e.g. human sexual contacts) has focused on the contribution of the variance in contact rates and, for networks with bi-directional links, the covariance between contact rates in either direction [2,14,17,20]. Using these measures and a sample of the cattle movement network in Scotland, Woolhouse et al. [20] concluded that the cattle network was consistent with the '20-80' rule, which states that 20% of the population contribute at least 80% of the magnitude of R 0 [2]. Interventions targeted at these farms could therefore be particularly effective in reducing the size of epidemics or the level of endemic infection.
Here, we analyse the entire contact network of Scottish farms via movements of sheep during 4 years from July 2003 to June 2007. Given knowledge of the complete network for each year, we calculate the sizes of the GWCC, the GSCC and the giant out-component, and the relative magnitude of the basic reproduction number. We partition the latter in terms of the contributions of the first and second order moments of the network. These calculations allow us to identify which features of the farm network structure, and which individual farms, contribute the most to the potential for spread of infections through the network, and how these have changed from 2003 to 2007. We do not focus here on specific infections. However, because we consider a one-year time span and do not attempt to capture the early dynamics of disease outbreaks, our results are most directly relevant to endemic and chronic infections, with prolonged periods of infectivity of affected animals. By weighting the contacts between farms differently, we address diseases with three distinct scenarios of animal-level prevalence (high prevalence, low prevalence, and intermediate prevalence).

Descriptive network statistics and network's connectedness
Descriptive statistics for the Scottish sheep farm network for the 4 years from July 2003 to June 2007 are given in Table 1. In summary, each year the number of farms in the network, N, was greater than 15,000, with approximately 70,000 uni-directional connections between the farms. Over 100,000 sheep batches were moved within the network per year, totalling more than 2,000,000 sheep. Approximately half of the farms that recorded moving sheep within Scotland each year were part of the GSCC, two-thirds were part of the giant out-component, and over 98% were part of the GWCC of the year's network (Table 1). The size of the GWCC confirmed that the farm network was highly interconnected; the size of the giant out-component showed that a long-lasting infection introduced into this farm population within a year could reach nearly 70% of the farms via the movements of sheep.
The mean number of farm contacts per year was within the range 4.3 to 4.7 for the 4 years studied (Table 2A). The distributions of numbers of in-contacts (farms sheep were brought from) and out-contacts (farms sheep were sent to) made by individual farms in one year were highly over-dispersed (Figure 1A), with only a small fraction of the farms making large numbers of contacts. The variances in the numbers of in-contacts were much greater than those of out-contacts (Table 2A). The linear correlations between the numbers of annual in-contacts and out-contacts of the farms, r binbout, were positive but weak over the 4 years studied (Pearson correlation coefficient +0.07 to +0.11, all p<0.001) (Table 2A and Figure 2A).
The mean number of batches of sheep received by (or sent from) a farm in a year was in the range 6.7 to 7.8 over the 4 years (Table 2B). The variances in the numbers of batches received by a farm were much greater than those in the numbers of batches moved off (Table 2B and Figure 1B). The linear correlations between the numbers of batches moved on and off the farms in a year were slightly lower (Pearson correlation coefficient +0.04 to +0.07, all p<0.001) (Table 2B and Figure 2B) than the correlations between the numbers of annual in-contacts and out-contacts.
The mean number of sheep received by (or sent from) a farm in a year was in the range 137 to 144 (Table 2C). The variances in the numbers of sheep received by a farm were much greater than those in the numbers of sheep moved off (Table 2C and Figure 1C). The linear correlations between the numbers of sheep moved on and off the farms in a year were higher (Pearson correlation coefficient +0.18 to +0.36, all p<0.001) (Table 2C and Figure 2C) than the correlations between the numbers of batches moved on and off or between the numbers of annual in-contacts and out-contacts.

Impact of network's moments on the magnitude of R 0
To make the analyses relevant to diseases with different animal-level prevalence on affected farms, the directed contact rate from farm j to farm i in a particular year, a ij, was defined in three ways: 1) present or absent (unweighted), 2) weighted by the number of batches of sheep moved (batch-weighted), and 3) weighted by the total number of sheep moved (animal-weighted). Model 1 is most appropriate for a highly transmissible infection with high animal-level prevalence, which would be likely to be transmitted via any sheep movement from farm j to farm i. Model 3 is most appropriate for a rare infection with low animal-level prevalence, for which the probability of transmission could be considered to depend linearly on the number of sheep moved from farm j to farm i. Model 2 is an intermediate scenario, here represented by the farm contact weighted by the numbers of batches of sheep moved. For all three models, the contribution of heterogeneities in contact rates of the farms (the second order moments of the farm contact matrix) to R 0 was quantified as the ratio of the quantity calculated in Expression [3] in Methods to the mean farm contact rate.
Using unweighted a ij values (Model 1) the net contribution of second order moments of the contact network was to increase (from that contributed by the first order moment alone) the magnitude of R 0 by up to a factor of 2 (Table 2A, Column 7). This contribution varied only slightly throughout the years of the study. Using a ij values weighted by numbers of batches moved between farms (Model 2) the net contribution of the second order moments of the contact network was to increase the value of R 0 by a factor of more than 3 in year 1 but was lower, 2.20 to 2.36, in years 2 to 4 (Table 2B, Column 7). Using a ij values weighted by numbers of sheep moved between farms (Model 3) the net contribution of the second order moments was to increase the value of R 0 by a factor of more than 6 in year 1 but was also lower, 4.89 to 5.48, in years 2 to 4 (Table 2C, Column 7).
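The factor by which the second order moments scale R 0 can be related to the correlation and standard deviations reported in Table 2. The short derivation below is a sketch under two assumptions that are not stated in this form in the text: that the relative magnitude of R 0 is the mean of the products of in- and out-contact rates divided by the mean contact rate, and that means, standard deviations and the correlation are population (divide-by-N) quantities. The symbols \bar{b}, \sigma_{in}, \sigma_{out} and r are shorthand introduced here, and the numbers in the last line are purely hypothetical, chosen only to show the order of magnitude; they are not taken from the tables.

```latex
% Mean of the products, written via the covariance identity E[XY] = E[X]E[Y] + Cov(X,Y)
\frac{1}{N}\sum_{i=1}^{N} b_{in}(i)\,b_{out}(i)
  = \bar{b}^{2} + r\,\sigma_{in}\,\sigma_{out}

% Relative R_0 with heterogeneity, divided by relative R_0 without it (\bar{b})
\frac{\dfrac{1}{N\bar{b}}\sum_{i} b_{in}(i)\,b_{out}(i)}{\bar{b}}
  = 1 + \frac{r\,\sigma_{in}\,\sigma_{out}}{\bar{b}^{2}}

% Hypothetical illustration: \bar{b}=4.5,\ \sigma_{in}=30,\ \sigma_{out}=7,\ r=0.1
1 + \frac{0.1 \times 30 \times 7}{4.5^{2}} = 1 + \frac{21}{20.25} \approx 2.04
```

Under these assumptions a weak positive correlation can still roughly double the relative magnitude of R 0 when the standard deviations are large compared with the mean, while a correlation of zero would leave the factor at 1 however large the variances.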

Effectiveness of interventions targeted at the top contributors to the magnitude of R 0
The '20-80' rule reflects that, in many situations, the potential for transmission of infection can be reduced by at least 80% by targeting just 20% of the members of a population [2]. In the Scottish sheep farm network, removing the contributions of the 20% of farms contributing the most resulted in at least a 90% reduction in the magnitude of R 0 in any of the 4 years studied, regardless of how contacts were weighted (Figure 3 and Table 3, Column 2).
The magnitude of R 0 was reduced by at least 80% in a given year for animal-weighted contacts if the contributions of 6.8% to 8.1% of farms were removed. These fractions were smaller for unweighted contacts, from 1.1% to 2.1% of farms, and for batch-weighted contacts, from 2.2% to 4.3% of farms.
In practice, farm contact information from the preceding year may be more readily available than real-time data. In the Scottish sheep network the identities of the top 20% of farms contributing the most to the magnitude of R 0 changed from year to year. For animal-weighted contacts approximately 70% of the farms among the top 20% of contributors to the transmission potential in a given year also appeared in this fraction the following year. This fraction was similar for batch-weighted contacts, but was 65% or less for unweighted contacts. When the contacts of the 20% of farms contributing the most to the magnitude of R 0 in the preceding year were removed in the current year, the resulting reductions in the value of R 0 were consistently smaller, and also more variable, compared with targeting the current year's top contributors (Table 3, Columns 3 and 4 versus Column 2).

Discussion
Although there have been numerous studies of contact networks as they relate to transmission of infectious diseases, very few of these investigate complete networks, and those that do have generally dealt with small populations [19]. Livestock movement databases allow analyses of large and complete networks, here, covering the entire population of Scottish sheep farms. Another feature of the majority of studies of contact networks is that they consider bi-directional contacts. Again, livestock movement databases are unusual due to explicit designation of uni-directional contacts where movement of livestock from farm j to farm i is associated with risk of disease transmission only in that direction [20]. This paper therefore provides information on the structure of contact networks and its relationship to the potential for spread of infectious diseases not readily available from studies of other populations.
The size of the giant weakly connected component of the network and the relatively small number of movements from farms outside Scotland confirm that Scottish sheep farms can be regarded as a single population connected by sheep movements for the purposes of these analyses. The size of this giant component relative to the size of the network confirms that the Scottish sheep industry is inter-connected, in contrast, for example, to the commercial pig industries where animal movements are largely constrained within sub-networks [21,22]. Notably, this large connected component emerges even though the contact matrix itself is very sparse (with approximately 0.03% non-zero entries in a year), reflecting that, on average, each farm moves sheep to or receives sheep from fewer than five other Scottish farms in a given year.
We can then use calculation of the basic reproduction number, R 0, as a method to characterise the properties of the network of contacts between Scottish farms via movements of sheep and how these properties relate to the spread of infectious diseases within that population of farms. For a number of reasons, however, these calculations do not represent formal estimates of R 0 for any specific infectious disease. First, as indicated in Expressions [1]-[3] in Methods, we generate relative, not absolute, measures of the magnitude of R 0. Nor are the different contact formulations (unweighted, batch-weighted and individual animal-weighted farm contact) directly comparable amongst themselves (each being most relevant to certain disease scenarios, as discussed above). Secondly, we aggregate all movements over a one-year interval to provide a measure of relative contact rates for the farms. This does not account for temporal heterogeneities within the year, in particular marked seasonality in Scottish sheep movements; these can significantly affect R 0 [23] and could influence the results reported here if temporal variations in contact rates were poorly correlated across the farms. Finally, although movement of livestock is an important route for the spread of many endemic livestock infections, it is typically not the only route; other routes of transmission of infections between farms may be relevant for specific applications, e.g. wildlife [24][25][26][27], insect vectors [28], fomites and visitors or over-the-fence contact [29,30].
The size of the Scottish sheep farm network, the sizes of its giant weakly and strongly connected components and of its giant out-component, and the mean contact rates of a farm were broadly consistent across the 4 years (Table 1). However, there were differences in the contributions of the network's second order moments to the relative magnitude of R 0 throughout the years using animal-weighted or batch-weighted contacts. Previous studies of contact networks have reported increases in the value of R 0 associated with heterogeneities in contact rates between individuals [19]. Here we find that the size of such effects varies according to how the contacts are weighted.
However, the impact of second order moments on the magnitude of R 0 is far less than might be anticipated from the very high variances in farm contact rates [2]. The explanation is that there is only a weak correlation between the movements on and movements off individual farms (Table 2). Nonetheless, because these correlations are positive (if negative, the effect would be to reduce R 0, see Expression [2]) and the variances of contact rates are so high, the net effects are still substantial for all of the disease scenarios considered.
Given the importance of heterogeneities in farm contact rates in determining the magnitude of R 0, it is apparent that targeting interventions at farms contributing the most to R 0 is likely to be efficient. Interventions (e.g. pre-movement testing or, for some diseases, preventive vaccination) may reduce or eliminate the risk of disease transmission via livestock movement to or from individual farms. In practice, a 100% reduction in the susceptibility or infectiousness of individual farms is unlikely to be feasible.
Notably, information on contacts of farms in the preceding year was consistently slightly less valuable for identifying the 20% of farms to target in the current year (Table 3). This result presumably reflected some year-to-year variation in individual farms' contact patterns (Table 2). As to the processes underlying such variation, characterising the farms repeatedly or intermittently appearing in the 20% contributing the most to the potential for transmission of infections each year may provide further insights. We note that the contact patterns of farms can also be altered by changes to the legal restrictions on livestock movements.
The key conclusions arising from this work are as follows. First, second order properties of a contact matrix (i.e. those not quantifiable from knowledge of the mean contact rates alone) can have a substantial impact on the magnitude of R 0, see also Anderson and May [3], Woolhouse et al. [2] and others. Here we quantify the impact that heterogeneities in contact rates of farms have on the potential for transmission of infections of livestock in a farm population. Second, the way in which contacts are weighted or defined makes a very substantial difference to quantification of R 0 and its components. When and how contacts should be weighted is relatively straightforward for livestock movements, perhaps less so for other kinds of 'contact' between individuals in a population. Third, contact matrices may vary through time not only in terms of contact rates of individual members of the population but also in terms of higher order properties, as has been reported previously for the UK cattle movement network [15] and observed here for the Scottish sheep movement network. The wider applicability of these conclusions depends on how representative the livestock farm networks are of contact networks in general, but we conjecture that similar issues will arise in many other contexts.

Sheep movement data
The SAMS records were processed using the Python programming language and then in SAS 9.1.3 software for Windows (SAS Institute Inc., Cary, NC, USA). Up-to-date lists of sheep markets, show-grounds, abattoirs and other industry units registered in Scotland were collated with help from the Livestock Traceability Policy Branch, Animal Health and Welfare Division, the Scottish Government and from the Animal Health agency in Scotland. The data were processed, including definitions of types of holdings and movements, as previously described [31]. In short, and pertinent to these analyses, the vast majority (>99%) of the SAMS entries for sheep from 2003 to 2007 were logical movement records, and the number of sheep movements not reported to SAMS during this period was believed to be low. The June/July dividing date precedes the major annual movement of sheep in the autumn. Seasonality in sheep movement patterns is not considered further in these analyses.
A farm was included in a year's analysis if it either sent or received sheep from another Scottish farm directly or via a Scottish livestock market during that year (movements to and from designated show-grounds and to slaughter were excluded).
During the period of study, the sheep identification and traceability regulations in Scotland did not require specification of individual animals in the movement documents (the Sheep and Goats Movement Interim Measures Scotland Order 2002 and Amendments; the Sheep and Goat Identification and Traceability Scotland Regulations 2006 and Amendments). Therefore the length of stay of an individual sheep on a given farm could not be determined. The legally required standstill period was 13 days, i.e. no sheep should have been moved off the farm earlier than 13 days after a sheep on-movement unless to slaughter, although certain categories of movements were exempt from the standstill. Sheep housed on mixed livestock farms were also subject to standstill after an on-movement of cattle (13 days), pigs (20 days) or goats (13 days).
The focus of these analyses was the network of Scottish farms via movements of sheep. For this purpose the network was treated as closed and movements outside Scotland were ignored. In practice, cross-border movements onto Scottish farms, primarily from England and Wales, did occur, but at low rates (<2% of movements onto Scottish farms during the study period). Movements off Scottish farms to locations outside Scotland were much more frequent, but are not relevant here.
Within Scotland, the majority of sheep movements between the farms (>80% in each of the 4 years analysed) occurred via Scottish livestock markets. Since we considered a relatively long time period (full year) and diseases with prolonged periods of infectivity of affected animals, we assumed the potential for disease transmission during brief stays at markets to be negligible compared to that on the farms (noting that this assumption would not hold for acute infections which are transmitted over short time scales). Therefore, we treated any indirect movement from farm j to farm i via a market as equivalent to a direct movement from farm j to farm i.

Giant network components
Connectedness of the farm network in each of the 4 years was evaluated by calculating the giant strongly connected component (GSCC), the giant weakly connected component (GWCC) and the giant out-component (GOC) of the network. The GSCC and GWCC were calculated with Tarjan's algorithm [32] implemented in C++. The GOC was calculated by choosing a farm from the GSCC and performing a depth-first search excluding cycles to identify every farm reachable from the chosen farm by a directed path; this was implemented in C++. For a given year, the GSCC encompassed all farms mutually reachable by directed paths; the GOC encompassed the GSCC plus all farms reachable from the farms in the GSCC by a directed path ('sinks'); and the GWCC encompassed the GSCC plus all farms connected to the farms in the GSCC by contacts in either direction (both 'sources' and 'sinks').
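The three components can be illustrated on a toy network with the Python sketch below. It is not the C++ implementation used in the study and does not use Tarjan's algorithm: for brevity it finds each strongly connected component as the intersection of the forward-reachable and backward-reachable sets of a seed farm, which gives the same components but scales poorly to large networks. The function names and the farm labels A to G are hypothetical.

```python
from collections import defaultdict


def reachable(adj, start):
    """Return every node reachable from start along edges in adj (iterative depth-first search)."""
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def largest_component(nodes, component_of):
    """Partition nodes into components via component_of(seed) and return the largest one."""
    best, unassigned = set(), set(nodes)
    while unassigned:
        seed = next(iter(unassigned))
        comp = component_of(seed)
        unassigned -= comp
        if len(comp) > len(best):
            best = comp
    return best


def giant_components(edges):
    """edges: iterable of (source_farm, destination_farm) pairs. Returns (GSCC, GOC, GWCC)."""
    out_adj, in_adj, und_adj = defaultdict(set), defaultdict(set), defaultdict(set)
    nodes = set()
    for j, i in edges:
        out_adj[j].add(i)      # movement from j to i
        in_adj[i].add(j)       # same edge, reversed
        und_adj[j].add(i)      # direction ignored for the weak component
        und_adj[i].add(j)
        nodes.update((j, i))
    # Strong component of a seed: farms it can reach that can also reach it.
    gscc = largest_component(
        nodes, lambda s: reachable(out_adj, s) & reachable(in_adj, s))
    # Weak component of a seed: farms linked to it when direction is ignored.
    gwcc = largest_component(nodes, lambda s: reachable(und_adj, s))
    # Out-component: everything reachable from any one farm in the GSCC.
    goc = reachable(out_adj, next(iter(gscc)))
    return gscc, goc, gwcc


# Toy network: A, B and C form a cycle, D is a sink, E is a source, F-G is separate.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("E", "A"), ("F", "G")]
gscc, goc, gwcc = giant_components(edges)
```

In the toy network the source farm E belongs to the GWCC but not to the GOC, and the sink farm D belongs to both, matching the 'sources' and 'sinks' distinction drawn above.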

Definitions of contact between farms
Let a ij be the directed contact rate from farm j to farm i in a particular year. We assign values to a ij in one of three ways: 1) contact scored as 0 (no movement of sheep from farm j to farm i) or 1 (any movement of sheep from farm j to farm i); 2) as (1) but weighted by the number of batches of sheep moved from farm j to farm i (noting that this is equivalent to the frequency of contact from j to i); and 3) as (1) but weighted by the number of sheep moved from farm j to farm i. Model 1 is most appropriate for an infection with high animal-level prevalence on affected farms (i.e. likely to be transmitted via any movement of sheep between farms). Model 3 is most appropriate for a rare infection with low animal-level prevalence (so the probability of transmission can be considered to depend linearly on the number of sheep moved between farms). Model 2 is intermediate between 1 and 3.
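The three definitions of a ij can be sketched in Python as follows. The record layout (one tuple of source farm, destination farm and number of sheep per batch moved) and the function name are assumptions made for illustration; they do not describe the structure of the SAMS records. Each matrix is stored sparsely as a dictionary keyed by (i, j), so that the entry for key (i, j) is the contact from farm j to farm i.

```python
from collections import defaultdict


def contact_matrices(movements):
    """Build sparse contact matrices from per-batch movement records.

    movements: iterable of (from_farm, to_farm, n_sheep), one tuple per batch (assumed layout).
    Returns three dicts keyed by (i, j) = (to_farm, from_farm).
    """
    unweighted = {}
    batch_weighted = defaultdict(int)
    animal_weighted = defaultdict(int)
    for j, i, n_sheep in movements:
        key = (i, j)                      # a_ij: contact from farm j to farm i
        unweighted[key] = 1               # Model 1: any movement scores 1
        batch_weighted[key] += 1          # Model 2: number of batches moved
        animal_weighted[key] += n_sheep   # Model 3: number of sheep moved
    return unweighted, dict(batch_weighted), dict(animal_weighted)


# Hypothetical records: two batches from F1 to F2, one from F3 to F2, one from F2 to F1.
movements = [("F1", "F2", 30), ("F1", "F2", 20), ("F3", "F2", 5), ("F2", "F1", 10)]
a_unweighted, a_batch, a_animal = contact_matrices(movements)
```

Pairs of farms with no movement simply have no key, which corresponds to a ij = 0 in all three models.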
Calculating contributions of network's first and second order moments to the magnitude of R 0
For all three disease models discussed above, the in-contact rate for farm i is $b_i^{in} = \sum_j a_{ij}$, and the out-contact rate is $b_i^{out} = \sum_j a_{ji}$. R 0 is related to the mean contact rate. In a closed network $\sum_i b_i^{in} = \sum_i b_i^{out}$, so the mean in-contact and out-contact rates are equal. Denoting this common mean as $\bar{b}$, if we were to assume that there was no variation in individual contact rates then:

$$R_0 \propto \bar{b} \qquad [1]$$

More generally, R 0 is further influenced by the second order moments of the contact matrix. We denote the standard deviation of in-contact rates as $\sigma(b_{in})$, the standard deviation of out-contact rates as $\sigma(b_{out})$, and the Pearson product-moment correlation coefficient between in-contact rates and out-contact rates as $r_{b_{in} b_{out}}$. As previously shown [20], R 0 (ignoring higher order properties of the network) is a function of these terms as follows:

$$R_0 \propto \bar{b} + \frac{r_{b_{in} b_{out}}\,\sigma(b_{in})\,\sigma(b_{out})}{\bar{b}} \qquad [2]$$

Therefore, non-zero variances of $b_{in}$ and $b_{out}$ can increase R 0 if $b_{in}$ and $b_{out}$ are positively correlated. Expression [2] can be written in terms of the product of $b_{in}$ and $b_{out}$; denoting the number of farms in the network as N, this is:

$$R_0 \propto \frac{1}{N\,\bar{b}} \sum_{i=1}^{N} b_{in}(i)\, b_{out}(i) \qquad [3]$$

where $b_{in}(i)$ and $b_{out}(i)$ refer to the in- and out-contact rate, respectively, for farm i. The contribution of second order moments of the farm contact matrix to R 0 was evaluated as the ratio of the quantity calculated in Expression [3] to the quantity calculated in Expression [1].
Quantities [1] and [3] were calculated for the contact matrices where contacts between farms were weighted according to each of the three scenarios of the animal-level prevalence of the disease.
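A minimal Python sketch of these calculations is given below. It assumes that quantity [1] is the mean contact rate, that quantity [3] is the mean over farms of the product of in- and out-contact rates divided by the mean contact rate, and that the equivalent correlation form uses population (divide-by-N) moments; these forms are inferred from the definitions in this section rather than copied from the study's code. The function name and the three-farm network are hypothetical.

```python
def r0_moments(a, farms):
    """Relative R0 quantities from a sparse contact matrix.

    a: dict mapping (i, j) to the contact rate from farm j to farm i.
    farms: list of all farm identifiers in the (closed) network.
    Returns (mean contact rate, correlation form, product form, product form / mean).
    """
    b_in = {f: 0.0 for f in farms}
    b_out = {f: 0.0 for f in farms}
    for (i, j), weight in a.items():
        b_in[i] += weight    # row sum: contacts onto farm i
        b_out[j] += weight   # column sum: contacts off farm j
    n = len(farms)
    mean_b = sum(b_in.values()) / n           # equals the mean of b_out in a closed network
    # Product form: mean of b_in * b_out, divided by the mean contact rate.
    product_form = sum(b_in[f] * b_out[f] for f in farms) / (n * mean_b)
    # Correlation form: r * sd_in * sd_out is the population covariance.
    cov = sum((b_in[f] - mean_b) * (b_out[f] - mean_b) for f in farms) / n
    correlation_form = mean_b + cov / mean_b
    return mean_b, correlation_form, product_form, product_form / mean_b


# Hypothetical network: A sends to B and C, B sends to A and C, C sends nothing.
a = {("B", "A"): 1, ("C", "A"): 1, ("A", "B"): 1, ("C", "B"): 1}
mean_b, correlation_form, product_form, ratio = r0_moments(a, ["A", "B", "C"])
```

In this toy network the farm receiving the most (C) sends nothing, so in- and out-contact rates are negatively correlated and the ratio falls below 1, the opposite of the positive correlations reported for the sheep network.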
Calculating contributions of individual farms to the magnitude of R 0
For each of the 4 years and three disease scenarios, we assumed in turn that the contacts of a farm were non-infectious or absent (setting b in b out = 0 for the farm) and re-calculated the contribution of the first and second order moments of the network to the magnitude of R 0. The resulting change in this quantity measured the individual contribution of the farm to the magnitude of R 0 and allowed the farms to be ranked in order of their contribution.
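The ranking and the resulting targeting curve can be sketched in Python as below. The sketch assumes that a farm's contribution is proportional to the product of its in- and out-contact rates and that the normalising mean contact rate is held fixed when a farm's product is set to zero; the study's re-calculation may differ in detail. The function names and the five-farm example are hypothetical.

```python
def targeting_curve(b_in, b_out):
    """Cumulative relative reduction in R0 as farms are removed in order of contribution.

    b_in, b_out: dicts mapping farm to in- and out-contact rate (same keys).
    Returns a list whose k-th entry (1-based) is the reduction after removing the top k farms.
    Assumes at least one farm has a non-zero product b_in * b_out.
    """
    products = sorted((b_in[f] * b_out[f] for f in b_in), reverse=True)
    total = sum(products)
    curve, removed = [], 0.0
    for p in products:
        removed += p                  # setting b_in * b_out = 0 for this farm
        curve.append(removed / total)
    return curve


def fraction_needed(curve, target=0.8):
    """Smallest fraction of farms whose removal reduces relative R0 by at least target."""
    for k, reduction in enumerate(curve, start=1):
        if reduction >= target:
            return k / len(curve)
    return 1.0


# Hypothetical contact rates for five farms.
b_in = {"A": 10, "B": 2, "C": 1, "D": 1, "E": 0}
b_out = {"A": 8, "B": 5, "C": 1, "D": 0, "E": 3}
curve = targeting_curve(b_in, b_out)
share = fraction_needed(curve, target=0.8)
```

Farms D and E contribute nothing under this assumption because each has a zero in- or out-contact rate, which illustrates why farms that both receive and send many sheep dominate the ranking.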