The introduction of dengue follows transportation infrastructure changes in the state of Acre, Brazil: A network-based analysis

Human mobility, the presence and passive transportation of the Aedes aegypti mosquito, and environmental characteristics are among the factors that contribute to the successful spread and establishment of dengue. To understand this process, we assess data from national and municipal dengue databases on population and demographics, the transportation network, human mobility, and Ae. aegypti monitoring for the Brazilian state of Acre, from the first recorded dengue case in 2000 to 2015. During this period, several changes in Acre's transport infrastructure and urbanization were initiated. To reconstruct the process of dengue introduction in Acre, we propose an analytic framework that combines concepts from the malaria literature, namely vulnerability and receptivity, which inform risk assessments in dengue-free regions, with network theory concepts for disease invasion and propagation. We calculate the probability of dengue importation to Acre from other Brazilian states, the evolution of dengue spread between Acrean municipalities, and dengue establishment in the state. Our findings suggest that landscape changes associated with human mobility have created favorable conditions for the establishment of dengue virus transmission in Acre. The revitalization of its major roads, as well as increased air accessibility to and within the state, have increased dengue vulnerability. Unplanned urbanization and population growth, as observed in Acre during the study period, create ideal conditions for Ae. aegypti establishment and make mosquito control more difficult, thereby increasing local receptivity.

The airline data provided by the Agência Nacional de Aviação Civil (National Agency for Civil Aviation, ANAC, http://www.anac.gov.br) limit the information on the flow between airport pairs to the number of passengers on direct flights (with or without stops) and the total number of passengers making a connection at each airport, by last boarding airport. Information about these connecting passengers does not include their final destination, that is, which flight they boarded at the connecting airport. Similarly, data on direct flights do not state how many passengers took a previous connecting flight before boarding. Therefore, for connecting passengers the data do not provide the final destination, while for passengers on direct flights the data do not reveal each passenger's de facto airport of origin.

Such limitations make it particularly challenging to obtain a proper origin-destination matrix between Brazilian airports. Given the continental scale of the Brazilian territory, many routes in the national network involve connecting flights, and direct flights between distant regions are scarce. The major connection hubs are located in the metropolitan areas of São Paulo, Rio de Janeiro and Brasília, which combine densely populated areas with, perhaps more importantly, a central position along the North-South axis of the Brazilian territory.

Given the absence of detailed information on connecting flights, we propose a method to estimate the number of passengers flying from one state with final destination in another, assuming that each passenger's route includes at most one connection. That is, we assume that the number of passengers who take more than one connection to reach their destination state is significantly lower than the number on direct flights or with a single connection. To estimate the number of passengers between two airports with a connecting flight in between, we combine the information on passengers on direct flights with the fraction of connecting passengers at each airport.

To illustrate the proposed method, we present the general formula and exemplify its use on a simple network (Fig 1).

Fig 1. Example of a small directed weighted network with the information structure provided in the Brazilian airline database. Each node represents an airport, and the edges represent passengers flying between them. Weights represent the number of passengers on direct flights from airport $i$ to $j$ ($W_{ij}$) and passengers from airport $i$ taking a connecting flight at $j$ ($C_{ij}$).
De facto origin-destination estimate

Let us denote by $W_{ij}$ the number of passengers on a direct flight from $i$ to $j$, and by $C_{ij}$ those flying from $i$ to make a connection at $j$. While the former end their trip at $j$, the latter board another flight at $j$ before finishing their trip. Both quantities are provided for each pair of airports in the Brazilian territory, as described in the previous section. Our goal is to obtain an estimate of the de facto origin-destination matrix for the Brazilian flight network from this dataset. That is, we want to estimate the actual number of passengers who start their trip at a given airport (origin) and end it at another (destination), for every pair of airports, which we denote by $\Omega_{ij}$.

For ease of notation, we use the dot symbol, "·", to indicate a sum over an index when doing so does not compromise readability. For instance, $W_{i\cdot} = \sum_j W_{ij}$ represents the total number of passengers boarding direct flights at $i$, while $W_{\cdot j} = \sum_i W_{ij}$ denotes the total number of passengers arriving at $j$ on direct flights, that is, with final destination at $j$.
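As a minimal sketch of the dot notation, assuming the direct-flight counts have been arranged as a square NumPy array `W` with `W[i, j]` the passengers flying directly from airport i to j, the two marginals are simply row and column sums. The edge pattern follows the toy network of Fig 1 as described in the text (nodes 1 to 4 stored at indices 0 to 3), but the passenger counts are made up for illustration.

```python
import numpy as np

# Hypothetical direct-flight counts W[i, j] (passengers from airport i to j).
# Edges follow the Fig 1 toy network: 1->3, 1->4, 2->1, 2->3, 3->4,
# with nodes 1..4 stored at indices 0..3. The counts are illustrative only.
W = np.array([
    [0,  0, 120,  80],
    [50, 0,  80,   0],
    [0,  0,   0, 100],
    [0,  0,   0,   0],
])

W_out = W.sum(axis=1)  # W_{i.}: passengers boarding direct flights at i
W_in = W.sum(axis=0)   # W_{.j}: passengers arriving at j on direct flights

print("W_i. =", W_out)
print("W_.j =", W_in)
```

Both marginals add up to the same grand total, since every direct-flight passenger is counted once at departure and once at arrival.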

To estimate the actual flow of passengers from origin $i$ to destination $j$, we make a few key assumptions:

Proposition 1 implies that, for a given origin-destination pair, the number of passengers traveling on a direct flight or with at most one connection is much greater than the number with more than one connection. Although strong, this assumption is necessary to limit the number of possible flight combinations between each pair of airports. In the absence of detailed information on the typical number of connections, assuming at most one makes the problem mathematically tractable.

Let us define $W^*_{ij}$ and $W^*_{ikj}$ as the number of passengers whose origin is airport $i$ and destination is $j$, traveling on a direct flight or with a connection at $k$, respectively. The difference between $W^*_{ij}$ and $W_{ij}$ is that the latter is the number of passengers on a direct flight from airport $i$ to airport $j$ taken from the original data, which, as described, carry no information on each passenger's travel origin, while the former is the estimated number of passengers whose travel origin is airport $i$ and whose final destination is airport $j$ on a direct flight. That is, $W^*_{ij}$ is our estimate of the de facto origin-destination matrix based on direct flights alone. The quantity $W^*_{ikj}$ is the estimated de facto origin-destination tensor for passengers with travel origin at $i$ and final destination at $j$, with a connecting flight at $k$. Finally, assuming that the number of travelers with more than one connection is negligible, $\Omega_{ij}$, the total number of passengers whose travel starts at $i$ and ends at $j$ regardless of the path taken, can be approximated by

$$\Omega_{ij} \approx W^*_{ij} + \sum_{k} W^*_{ikj}. \tag{1}$$

To estimate the number of passengers from $i$ to $j$ with a connecting flight at $k$, $W^*_{ikj}$, let us first define the lower-case $c_{ik}$ as the ratio between the passengers from $i$ making a connection at $k$ and all passengers boarding direct flights at $k$:

$$c_{ik} = \frac{C_{ik}}{W_{k\cdot}}. \tag{2}$$

From this definition, $c_{ik}$ is an estimate of the probability that a passenger boarding a direct flight at $k$ is, in fact, a connecting passenger originally from $i$. With Proposition 2 we assume that passengers arriving at a given airport $k$ for a connecting flight are distributed among the available direct flights from $k$ following a multinomial distribution, with each destination's probability proportional to the number of passengers on the corresponding direct flight.
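Proposition 2 can be checked numerically with a small Monte Carlo simulation. In this sketch the helper name `simulate_connections` and the counts are hypothetical: `C_ik` connecting passengers at airport k are repeatedly allocated to the direct flights leaving k with probabilities $W_{kj}/W_{k\cdot}$, using NumPy's `Generator.multinomial`. The average allocation should approach $C_{ik} W_{kj} / W_{k\cdot}$, the expected value on which the estimator relies.

```python
import numpy as np


def simulate_connections(C_ik, W_k_out, n_sims=10_000, seed=0):
    """Monte Carlo sketch of Proposition 2 (hypothetical helper).

    C_ik    -- number of passengers from i connecting at airport k
    W_k_out -- passengers on each direct flight leaving k, i.e. W_kj for every j
    Returns the mean simulated allocation and the analytic expectation.
    """
    W_k_out = np.asarray(W_k_out, dtype=float)
    p = W_k_out / W_k_out.sum()  # destination probabilities W_kj / W_k.
    rng = np.random.default_rng(seed)
    draws = rng.multinomial(C_ik, p, size=n_sims)  # shape (n_sims, len(p))
    return draws.mean(axis=0), C_ik * p


# Illustrative numbers: 40 passengers connect at an airport whose direct
# flights carry 120 and 80 passengers (W_k. = 200).
simulated, expected = simulate_connections(40, [120, 80])
print("simulated mean:", simulated)
print("expected      :", expected)
```

Because each draw allocates every connecting passenger to exactly one departing flight, the simulated means always add up to `C_ik`; only their split across destinations is random.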
Therefore, from the total number of passengers from $i$ making a connection at $k$, $C_{ik}$, and the total number of passengers on a direct flight from $k$ to $j$, $W_{kj}$, the expected number of passengers who came from $i$ to make a connection at $k$ with final destination at $j$, $W^*_{ikj}$, is given by

$$W^*_{ikj} = C_{ik}\,\frac{W_{kj}}{W_{k\cdot}} = c_{ik}\,W_{kj}. \tag{3}$$

Note that this construction also allows us to estimate the fraction of passengers on the direct flight from $k$ to $j$ who are connecting passengers from other airports. To estimate the number of passengers who are from $k$ itself, $W^*_{kj}$, we must account for all connecting passengers on that flight, $W^*_{\cdot kj}$:

$$W^*_{\cdot kj} = \sum_i W^*_{ikj} = W_{kj}\,\frac{C_{\cdot k}}{W_{k\cdot}}. \tag{4}$$

Therefore, the expected number of passengers with origin $k$ and destination $j$ on a direct flight is given by

$$W^*_{kj} = W_{kj} - W^*_{\cdot kj} = W_{kj}\left(1 - \frac{C_{\cdot k}}{W_{k\cdot}}\right). \tag{5}$$

Finally, combining Eqs. 3 and 5 with Eq. 1, the expected number of passengers in the origin-destination pair $i$ to $j$, regardless of route, is

$$\Omega_{ij} = W_{ij}\left(1 - \frac{C_{\cdot i}}{W_{i\cdot}}\right) + \sum_k C_{ik}\,\frac{W_{kj}}{W_{k\cdot}}. \tag{6}$$

To aggregate this information by state, we sum over the corresponding airports to obtain the estimate of the total number of passengers flying from state $I$ to state $J$:

$$\Omega_{IJ} = \sum_{i \in I}\sum_{j \in J} \Omega_{ij}. \tag{7}$$

Since the data have a monthly temporal resolution, to obtain the average daily flow in month $m$, denoted $\pi_{IJ,m}$, we divide the estimated flow in that month, $\Omega_{IJ,m}$, by the number of days in $m$. In this fashion, we preserve any seasonal effect that might be present at the monthly level, which would otherwise be washed out if flight data were aggregated yearly.
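The estimator vectorizes naturally: the connection ratio $c_{ik}$ is a column-wise division, the connecting flows $\sum_k c_{ik} W_{kj}$ are a matrix product, and the state aggregation is a sandwich with an airport-to-state indicator matrix. The following is a minimal NumPy sketch, not the authors' code, assuming one month of ANAC counts has been arranged as square arrays `W` (direct flights, `W[i, j]`) and `C` (connections, `C[i, k]`) over a common airport index, and that a list gives each airport's state. The function names, the state labels and the toy counts are hypothetical, and airports with no outgoing direct flights are assigned $c_{ik} = 0$ by assumption to avoid division by zero.

```python
import calendar

import numpy as np


def od_estimate(W, C):
    """De facto origin-destination estimate Omega[i, j] (sketch).

    W[i, j] -- passengers on direct flights from airport i to j
    C[i, k] -- passengers from airport i making a connection at k
    Assumes at most one connection per trip and a proportional
    allocation of connecting passengers among departing flights.
    """
    W = np.asarray(W, dtype=float)
    C = np.asarray(C, dtype=float)
    W_out = W.sum(axis=1)  # W_{k.}: passengers boarding direct flights at k

    # c_ik = C_ik / W_{k.}: column k of C is divided by W_out[k].
    # Airports with no departures get c_ik = 0 (assumed convention).
    with np.errstate(divide="ignore", invalid="ignore"):
        c = np.where(W_out > 0, C / W_out, 0.0)

    # Connecting passengers: sum_k W*_ikj = sum_k c_ik W_kj = (c @ W)[i, j]
    connecting = c @ W
    # Local passengers on direct flights: W*_ij = W_ij (1 - C_{.i} / W_{i.})
    local_share = 1.0 - c.sum(axis=0)
    direct = W * local_share[:, np.newaxis]
    return direct + connecting


def aggregate_by_state(omega, state_of):
    """Omega_IJ: sum of Omega_ij over airports i in state I and j in state J."""
    states = sorted(set(state_of))
    # A[i, s] = 1 if airport i belongs to state s
    A = np.array([[float(s == st) for st in states] for s in state_of])
    return states, A.T @ omega @ A


def daily_average(omega_month, year, month):
    """pi_{IJ,m}: monthly flow divided by the number of days in month m."""
    n_days = calendar.monthrange(year, month)[1]
    return omega_month / n_days


# Toy network of Fig 1 with hypothetical counts (nodes 1..4 -> indices 0..3)
W = np.zeros((4, 4))
W[0, 2], W[0, 3], W[1, 0], W[1, 2], W[2, 3] = 120, 80, 50, 80, 100
C = np.zeros((4, 4))
C[1, 0], C[1, 2], C[0, 2] = 40, 30, 20  # C_21, C_23, C_13

omega = od_estimate(W, C)
states, omega_state = aggregate_by_state(omega, ["X", "Y", "X", "Y"])
pi = daily_average(omega_state, 2015, 2)  # February 2015 has 28 days
```

A useful sanity check on any input is that the column sums of `omega` equal those of `W`: the estimator only reassigns passengers arriving at each airport to their likely origins, it never creates or removes them.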

We remind the reader that the assumptions made to obtain these estimates carry some limitations. For instance, for any pair of remote airports, the assumption that the number of passengers taking more than one connection is significantly smaller than the number taking at most one might not hold, simply because few flight paths are available. Since the flow between remote airports represents a small fraction of the interstate airport flow, we believe that the error introduced by this simplification does not justify the mathematical complexity of allowing two or more connections in our calculations. The proportional distribution of connecting passengers among direct flights might also introduce error. In particular, this assumption favors hub airports as both origin and destination. This could be addressed, for instance, by weighting down the number of connecting passengers $W^*_{ikj}$ when the direct flow $W_{ij}$ is high. Nonetheless, this would only be an issue if $C_{ik}$ is relatively high compared to $C_{\cdot k}$. Since hubs are characterized by a relatively high share of direct flights, especially to other hubs, the number of passengers from a hub $i$ making a connection at other airports is low compared to the number of passengers on direct flights from that hub.

To exemplify the proposed approximation, we use the toy network illustrated in Fig 1. In that network, since there are no flights to node 2, all passengers on direct flights from that node originate at that airport. From Eq. 5, this means that $W^*_{21} = W_{21}$ and $W^*_{23} = W_{23}$. Since connecting passengers from node 2 connect at nodes 1 and 3, and there is no direct flight from 3 to 1, the only possible route from node 2 with final destination at 1 is the direct flight. Therefore, the flow from node 2 to node 1 is simply

$$2 \to 1:\quad \Omega_{21} = W^*_{21} = W_{21}.$$

Estimating the flow of passengers with final destination at node 3, for origins at nodes 1 and 2, is more involved. On the one hand, since there are no flights bound to node 2, all passengers boarding at that node are necessarily from there, giving $W^*_{23} = W_{23}$. On the other hand, passengers on the flight from 1 to 3 include not only the local population of 1 but also connecting passengers arriving at 1 on flights from node 2, given by $C_{21}$. Therefore, the passengers on the direct flight from node 1 to node 3, $W_{13}$, originate partly at 1 and partly at 2. This leads to the following estimates of the origin-destination flow to node 3:

$$1 \to 3:\quad \Omega_{13} = W^*_{13} = W_{13}\left(1 - \frac{C_{21}}{W_{13} + W_{14}}\right),$$
$$2 \to 3:\quad \Omega_{23} = W^*_{23} + W^*_{213} = W_{23} + W_{13}\,\frac{C_{21}}{W_{13} + W_{14}}.$$
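The toy-network expressions for destinations 1 and 3 can be evaluated with concrete numbers. The counts below are hypothetical (Fig 1 provides no values); the sketch computes the closed-form estimates and confirms that they only redistribute, between origins 1 and 2, the passengers arriving at node 3 on direct flights.

```python
# Hypothetical passenger counts for the Fig 1 toy network (not from the data)
W21, W23, W13, W14 = 50, 80, 120, 80  # direct flights 2->1, 2->3, 1->3, 1->4
C21 = 40                              # passengers from 2 connecting at 1

W1_out = W13 + W14   # all passengers boarding direct flights at node 1
c21 = C21 / W1_out   # share of node-1 departures connecting from node 2

omega_21 = W21                 # node 2 has no inbound flights
omega_13 = W13 * (1 - c21)     # local passengers on flight 1 -> 3
omega_23 = W23 + W13 * c21     # direct flight plus connection at node 1

# Arrivals at node 3 are only reallocated between origins 1 and 2
assert abs((omega_13 + omega_23) - (W13 + W23)) < 1e-9
print(omega_21, omega_13, omega_23)
```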
Taking node 4 as the final destination, from node 1 we have passengers both on the direct flight and with a connection at 3. From node 2, the only possibility is via connecting flights, either at 1 or at 3. From node 3, passengers reach 4 only through the direct flight. While the direct flight from node 1 to 4 can carry passengers from nodes 1 and 2, the direct flight from node 3 to 4 can carry passengers originally from nodes 1, 2 and 3. Combining all this information, we end up with the following set of equations for each node of origin:

$$1 \to 4:\quad \Omega_{14} = W_{14}\left(1 - \frac{C_{21}}{W_{13} + W_{14}}\right) + W_{34}\,\frac{C_{13}}{W_{34}} = W_{14}\left(1 - \frac{C_{21}}{W_{13} + W_{14}}\right) + C_{13},$$
$$2 \to 4:\quad \Omega_{24} = W_{14}\,\frac{C_{21}}{W_{13} + W_{14}} + C_{23},$$
$$3 \to 4:\quad \Omega_{34} = W_{34}\left(1 - \frac{C_{13} + C_{23}}{W_{34}}\right).$$
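With hypothetical counts (Fig 1 provides no values), the estimates for destination 4 can be evaluated in unsimplified form, each connecting term written as the connecting passengers times the share of the connecting airport's departures bound for node 4. Their sum recovers the $W_{14} + W_{34}$ passengers arriving at node 4 on direct flights.

```python
# Hypothetical passenger counts for the Fig 1 toy network (not from the data)
W13, W14, W34 = 120, 80, 100  # direct flights 1->3, 1->4, 3->4
C21, C23, C13 = 40, 30, 20    # connections: 2 at node 1, 2 at node 3, 1 at node 3

W1_out = W13 + W14  # departures from node 1
W3_out = W34        # departures from node 3 (only flight 3 -> 4)

omega_14 = W14 * (1 - C21 / W1_out) + W34 * C13 / W3_out
omega_24 = W14 * C21 / W1_out + W34 * C23 / W3_out
omega_34 = W34 * (1 - (C13 + C23) / W3_out)

# The three origins share exactly the passengers arriving at node 4
total_in = W14 + W34
assert abs((omega_14 + omega_24 + omega_34) - total_in) < 1e-9
print(omega_14, omega_24, omega_34)
```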