Large-scale simulation of traffic flow using Markov model

Modeling and simulating movement of vehicles in established transportation infrastructures, especially in large urban road networks is an important task. It helps in understanding and handling traffic problems, optimizing traffic regulations and adapting the traffic management in real time for unexpected disaster events. A mathematically rigorous stochastic model that can be used for traffic analysis was proposed earlier by other researchers which is based on an interplay between graph and Markov chain theories. This model provides a transition probability matrix which describes the traffic’s dynamic with its unique stationary distribution of the vehicles on the road network. In this paper, a new parametrization is presented for this model by introducing the concept of two-dimensional stationary distribution which can handle the traffic’s dynamic together with the vehicles’ distribution. In addition, the weighted least squares estimation method is applied for estimating this new parameter matrix using trajectory data. In a case study, we apply our method on the Taxi Trajectory Prediction dataset and road network data from the OpenStreetMap project, both available publicly. To test our approach, we have implemented the proposed model in software. We have run simulations in medium and large scales and both the model and estimation procedure, based on artificial and real datasets, have been proved satisfactory and superior to the frequency based maximum likelihood method. In a real application, we have unfolded a stationary distribution on the map graph of Porto, based on the dataset. The approach described here combines techniques which, when used together to analyze traffic on large road networks, has not previously been reported.


Introduction
In the past decade, research and development of Smart City applications have become an active topic.These applications provide services to inhabitants which can make everyday life easier [1].In addition, these services contain solutions such as intelligent city planning, crowd-sourcing, as well as crisis and disaster management [2].These applications will also both generate and make us of big data analytics which will arise from the wide availability of cloud computing.In addition, complex sensor systems are being implemented, called the Internet of Things (IoT) that could aid the operation of these applications.By the year 2050, 70% of Earth's population is expected to live in urban areas [3].City infrastructures will face new challenges from many factors; one such factor is urban traffic.
In the recent years, the research and development of smart city applications focusing on urban transportation and traffic have become a vivid topic.Since more and more people live in urban areas, a solution for the problem of dense traffic is highly demanding.Moreover, in the past few years, many developments have occurred in the automobile industry that fundamentally changed the way we think about personal mobility.Autonomous or driverless cars are being introduced by several tech giants and car companies, some of them are on the market already.From driving-aid system (e.g.pedestrian observers, lane support systems), which have become ubiquitous by the end of the 2010s, we can envision the widespread use of driverless cars in the 2020's.In addition, pure electric cars have been on the market for a few years and will certainly continue to expand their market share.
In a previous paper [4], we introduced a smart city application that can support the operation of self driving and electric cars in urban environments and may provide a common research platform for investigating the connection between autonomous cars and smart cities.Our main question was: what can a city controlled IT solution offer to these cars to allow them to be operated more efficiently?The application was a prototype and some of its features were not defined or developed satisfactorily.One such feature was the traffic simulation algorithms included in the application.In a previous paper [5], we raised the question whether we can develop such an algorithm to control the simulated cars that can hold the real distribution of cars.An additional question is how can we fit this algorithm by estimating its parameters based on real data, e.g., which has the form of trajectories.Our results presented in this paper are an answer to these questions.
In [6], see also [7] and [8], a stochastic model is proposed which can handle the traffic in an urban network by using a mathematically rigorous method.This model is based on discrete time Markov chain on the road graph which plays the role of the state space.In the traffic interpretation, the transition probability matrix describes the dynamic of the traffic while its unique stationary distribution corresponds to the traffic equilibrium (or steady) state on the road network.In this steady-state, the distribution of vehicles remains invariant locally in time under the transport dynamic.Thus, this stationary distribution of the Markov chain can be interpreted as the momentary true distribution of the vehicles on the road network.
Our contributions in this paper are the following.Based on [6] we introduce the concepts of a Markov random walk, which describes the motion of an individual vehicle, and Markov traffic, which describes the entire traffic on the road network, respectively.We determine the stationary distribution of the Markov traffic as a multinomial distribution, see formula (5).We present how the ergodic theory of finite Markov chains implies that a complex traffic event can be approximated well by the stationary distribution of a Markov chain on the road network; see Theorem 2. This theorem also yields the theoretical ground of our simulation algorithm.We reparametrize the model by introducing the concept of two-dimensional stationary distribution which possesses equi-distributed marginals that are the unique stationary distribution of the transition probability matrix, respectively.To estimate this parameter matrix the weighted least squares (WLS) estimation as a kind of composite (quasi-) likelihood methods is applied, see [9].In Theorem 3, we show that the WLS estimator of the two-dimensional stationary distribution can be expressed explicitly.Moreover, this estimation method provides a computationally effective technique in a large scale since the MapReduce paradigm can be easily applied for it.Finally, we present how a city-controlled IT solution can be developed which is able to simulate the traffic on a road network that fits to the real data.
Note that the joint application of Markov chains and large graphs to analyze the behavior of complex systems is well known in several fields, e.g., distributed systems [10], geophysics [11] and biology [12].
This paper is structured as follows: the remainder of this chapter provides a short introduction to our software system and shows a brief outlook for other simulation tools and approaches.In section 2, we describe our proposed traffic simulation model in detail.In section 3, we discuss our findings, and in section 4, we conclude the paper.The Appendix provides the proofs of theoretical results and a toy example to demonstrate the proposed estimation method.

Previous and related works
This research follows our development of a traffic simulation platform initiative called rObOCar World Championship (or OOCWC for short) [4], [5], [13], [14], [15].The OOCWC is a multiagent-oriented environment for creating urban traffic simulations.The goal is to offer a research and educational platform to investigate urban traffic control algorithms and the relationship between smart cities and self-driving cars.In our vision, a software system that is controlled by a city administration can support these cars well, because a city possesses all the necessary information (congestions, accidents, detours, etc.) for effective route planning.The traffic simulations are performed by one of its components called Robocar City Emulator (RCE).The first rapid prototype of RCE was called Justine, which is an open source project released under the GNU GPL v3 and can be downloaded from GitHub. 1 It has 3 main components, the RCE itself, and two visualization tools, the rcwin and the rclog.This prototype uses the OpenStreetMap (OSM) database and processes it with the Osmium Library.The result of this processing is a routing map graph and a Boost Graph Library graph.The routing map graph is then placed into a shared memory segment.This prototype was also called "Police Edition", because it could simulate police pursuit (macroscopic, with multiple police agents).In addition, it was somewhat capable of simulating traffic, but only for a technical demonstration, because every traffic unit moved randomly.In the following, we only focus on the traffic simulation algorithm of the RCE.The traffic simulation model of the RCE is based on the Nagel-Schreckenberg (NaSch) model [16], because it uses a cell based approach, as well.Moreover, it can be considered as a standalone multi-agent system.The simulation takes place on a rectangular part of the OSM.We slice all edges for parts 3 meters long.Based on NaSch terminology, these parts are called cells.Therefore, the cell length is equal to 3 meters.In contrast with the NaSch model, where a cell may contain many cars, edges can contain a given number of cars only (calculated as the edge length divided by the length of the cell, or part, i.e. 3 meters), since in our simulation, one cell can contain only one car.In the original implementation, the simulation algorithm moves the cars by random walk.So, when a car arrives to a graph vertex (i.e.intersection), it selects the next edge (i.e. next street) according to uniform distribution.For a detailed description of the operation of RCE, see paper [4].
A simulation can be initialized based on measured data or some prescribed distribution, e.g., uniform one, on the road network.During the simulation, we can observe how the distribution of the cars change.One of the reasonable in sections 2.1 and 2.2 we outline the basic concepts in the fields of graph theory and finite Markov chains, respectively.Then, in 2.3, we describe the proposed model called "Markov traffic" in detail.As a case study, we use a publicly available trajectory dataset, namely, the Taxi Trajectory Prediction dataset4 ; in section 2.4, we outline the key points of how we selected and processed the trajectory data.In section 2.5, we describe how we process OSM data and build a traffic graph from this data.Section 2.6 is devoted to the statistical inference for Markov traffic.

Basic concepts and notation
In this section, we outline the concepts of graph theory that are necessary for modeling traffic flow.A standard textbook on graph theory is [48].
Road network, line digraph and degree distributions.Let G = (V, E) be a directed graph (digraph) where V and E denote the set of vertices or nodes and the set of directed edges or arcs or arrows of the graph, respectively.In the sequel, vertices are denoted by u, v, w, edges are denoted by e, f, g.For a directed edge e = (v, w) ∈ E we also use the notation v → w.We suppose that G is a simple digraph in the sense that it does not contain multiple arrows, i.e., two or more edges that connect the same two vertices in the same direction.The digraph G, called road network in this paper, represents the road system of a city.More precisely, we have the following definition, see [49], , where V is a set of nodes representing the terminal points of road segments, and E is a set of directed edges denoting road segments.
A road segment e = (v, w) ∈ E is a directed edge in the road network graphs, with two terminal points v and w.The vehicle flow on this edge is from v to w.
then it is called a loop which connects the vertex v to itself.If there would be a loop in a road network then there would be a physical road segment v → v (v ∈ V ) in the road network.From a practical point of view, we lock out this possibility because in this case a vehicle can move in an infinite cycle.Thus, we suppose that E ∩S = ∅.Later however, when we define the stochastic traffic on a road network, we allow loops in order that we ensure that a vehicle may remain at the same node or edge of the road network after a time step.For a digraph G = (V, E) another digraph can be associated by the following way.Let the set V of vertices of this new digraph be the set of directed edges E of G and let the set E of its directed edges consist of the ordered pair (e, f ) where e, f ∈ E such that there exist u, v, w ∈ V that e = (u, v) and f = (v, w), i.e., u → v → w is a path (or dipath) in G of length 2. This associated digraph is called a directed line graph, see Section 4.5 in [48], shortly line digraph, or line road network (network line graph, see [50]), and it is denoted by L(G) = (V , E ).The elements of E can be described by triplets (u, v, w), where u, v, w ∈ V, (u, v), (v, w) ∈ E, and for a directed edge in L(G) we use the notation (u, v) → (v, w) too.
The basic difference between the digraph and line digraph views of a road network is that the former assigns the vehicles moving in a city to the vertices while the latter to the edges.One can refer the former as first-order (primal) network while the latter as second-order (dual) network, see [51] ( [52]).These two kinds of graphs are both useful because in a road network, certain measurements are associated with the crossings (vertices), and certain measurements are associated with the road segments (directed edges).When we are concerned with comparing measurements associated with crossings, then we will be concerned with the adjacency relationships of crossings, and so with the road network.However, when we are concerned with measurements associated with road segments we will be concerned with the adjacency relationships of road segments, and so our analyses will involve the line road network.
The degree distributions of the digraphs G and L(G), respectively, are given in the following way.For all i = 0, 1, 2, . . .
Then, the pairs (i, n + i ), i = 0, 1, 2, . . ., form the frequency histogram for the outdegree distribution of G.The indegree frequency histogram is defined similarly as (i, On the other hand, for all i = 0, 1, 2, . . ., define m + i := Then, the pairs (i, m + i ), i = 0, 1, 2, . . ., form the frequency histogram for the outdegree distribution of L(G).Note that n + i = |G + i | for all i.Similarly, the pairs (i, m − i ), i = 0, 1, 2, . . ., form the frequency histogram for the indegree distribution of L(G) where m − i := ) One can easily see that the supports of the two indegree (outdegree) histograms are the same.For the Porto example (described later in this paper), the above mentioned degree distributions can be seen in Fig. 2.These histograms corroborates the fact that the Porto's road network is a sparse graph since there is no node with higher in-and outdegree than 6 and the ratio of the number of edges and the number of nodes is less than 2, see Fig. 7.
Clearly, if v is reachable from u then there is a path from u to v. A digraph G is said to be strongly connected (diconnected) if every vertex is reachable from every other vertex.
We can define vectors (functions) and matrices (operators or kernels) on V in the following way.Let α : V → R be a real function on V and let F(V, R) denote the set of real functions on V .We shall also use the notations We shall also use the notations T (u, v) = t uv , u, v ∈ V and T = (t uv ) u,v∈V .Then T is called matrix, operator or kernel on V and induces a linear operator on F(V, R) in the following way: for each α ∈ F(V, R), T (α) ∈ F(V, R) is defined as T (α) u := v∈V t uv α v , u ∈ V .For the sake of simplicity, we write T (α) = T α as a matrix-vector product.If the support of T (the set Let A = (a uv ) u,v∈V denote the adjacency matrix of the digraph G, i.e., A is a matrix on V and a uv = 1 if and only if (u, v) ∈ E and 0 otherwise.Clearly, the support of A is E, i.e., A is a G-subordinated matrix in strong sense (a vv = 0 for all v ∈ V ).Note that A is not necessarily symmetric.The number of directed walks from vertex u to vertex v of length k is the entry on u-th row and v-th column of the matrix A k .For example, in Fig. 1, the number of directed walks of length 4 from vertex 2 to vertex 4 is 4, see Appendix A. One can easily check that G is strongly connected if and only if there is a positive integer k such that the matrix Clearly, the line digraph of a strongly connected digraph is also strongly connected.Namely, if e = (u, v) ∈ V (= E) and f = (w, z) ∈ V (= E) are arbitrary such that e = f , then, since G is strongly connected, there exists a walk (or a path) of length in , there exists a walk (or a path) of length in L(G) between the vertices e, f ∈ V .If u → v → u for a pair u, v ∈ V then we have (u, v) → (v, u) → (u, v) in the line digraph, i.e., vehicles can turn back at vertex u into v.Sometimes the traffic regulations do not allow this kind of reversal, i.e., the edge set E in L(G) must not contain some triplet (u, v, u) while some of these triplets are needed that L(G) be strongly connected.By deleting all of the unnecessary triplet (u, v, u), u, v ∈ V , such that the remaining line digraph be still strongly connected we get the minimal strongly connected line digraph of G.This line digraph is denoted by ML(G).For example, the vertices of ML(G) for G defined in Fig. 1 are given in Table 1.
Recall that a cycle Here (C) = is called the length of C. A digraph G is said to be aperiodic if the greatest common divisor of the lengths of its cycles is one.Formally, the period of G is defined as per(G) := gcd{ > 0 : ∃C ⊂ V cycle such that (C) = }.Then, G is called aperiodic if per(G) = 1.Clearly, if a digraph G is aperiodic then its line digraph L(G) is also aperiodic.This statement follows from the following fact: if ) is a cycle in L(G).Thus, if > 0 and there exists a cycle C ⊂ V such that (C) = then there exists a cycle C ⊂ V such that (C ) = .
It is well known that the adjacency matrix A of an aperiodic, strongly connected graph G is primitive, i.e., irreducible and has only one eigenvalue of maximum modulus.Primitivity is equivalent to the following quasi-positivity: there exists k ∈ N such that the matrix A k > 0, see Section 8.5 in [53].
Open road networks and their closures.In general, there are vehicles which leave or enter the city.To model these two possibilities V is augmented by a new ideal vertex 0 which denotes the world outside of the city.This approach is similar to that is applied for public transport in [54].Let V := V ∪ {0}.Then, additional directed edges which contains vertex 0 are also added to E. In this case, (v, 0) denotes that the vehicles can leave the city at vertex v, and (0, v) denotes that new vehicles can enter the city at vertex v, where v ∈ V .Let E denote the augmentation of E by directed edges for getting into and out of the city.Note that, for E, it is not allowed to contain the loop (0, 0).The augmentation of G is denoted by G = (V , E) and it is called the closure of a road network G.For e = (v, w) ∈ E we also use the notation v → w.In what follows, we suppose that there exist u, v ∈ V such that u → 0 and 0 → v.
In the sequel, each definition (strong connectedness, periodicity, line digraph) given for G can be extended for G in a natural way.Note that in the augmented line digraph L(G) = (V , E ) the elements of the edge set E can be described by triplet (u, v, w), where u, v, w ∈ V and if v = 0 then u, w = 0 and if u or w is 0 then v = 0 because triplets (0, 0, v), (v, 0, 0) and (0, 0, 0) are excluded from E .One can easily see that if G is strongly connected then its closure G is also strongly connected.Moreover, the strongly connected components of G, if there exist more than 1, can be connected through the ideal vertex 0, resulting in a strongly connected G. Thus, in this case, the augmented line digraph will also be strongly connected.Clearly, if G is aperiodic then G is aperiodic too.
In the rest of this paper, it is assumed that the road network is closed, i.e., the vehicles can not get into and out of the road system of the city augmented with the ideal vertex 0.

Probability distributions and Markov kernels on road networks
In this section, we summarize some basic concepts and results of the theory of finite Markov chains with their interpretations and consequences for traffic flow modeling.Some textbooks on this field are [55] and [56].
We can define two kinds of probability distributions on a road network G considering the set V or E as the state space, respectively.In fact, we can handle both cases altogether by defining probability distributions on the set of vertices of the digraph G or the line digraph L(G), accordingly.However, we should note that the Markov kernels defined on the line digraph must be defined with particular attention.
A probability distribution (p.d.) on V is the vector π := (π v ) v∈V where π v ≥ 0 for all v ∈ V and v∈V π v = 1.That is a p.d. π is a normalized V → R + function.We can think of π v as the proportion of the number of vehicles which drive through the crossing v with respect to the whole number of vehicles in the city at a fixed time period.A Markov kernel or transition probability matrix on V is defined as a real kernel P := (p uv ) u,v∈V such that p uv ≥ 0 for all u, v ∈ V and v∈V p uv = 1 for all u ∈ V , i.e., p u := (p uv ) v∈V is a p.d. on V for all u ∈ V .The quantity p uv ∈ [0, 1] is called the transition probability from vertex u to vertex v.The kernel P is said to be G-subordinated if p uv > 0 for a pair u, v ∈ V implies (u, v) ∈ E or u = v, i.e., P as a matrix on V is G-subordinated in the weak sense.It is well known, see [57], that for a Markov kernel P on V , an associated graph G P = (V, E P ) can be introduced in the following way: for a pair u, v ∈ V (where the case u = v is also allowed) (u, v) ∈ E P if and only if p uv > 0. Thus, P is G-subordinated if and only if E P ⊆ E ∪ S, i.e., G P is the subgraph of the graph G extended with its diagonal S. In this case, we also use the notation P = (p e ) e∈E∪S .In other words, a G-subordinated Markov kernel P is a stochastic matrix on V with support E ∪ S.Then, the sum condition for Markov kernel P can be rewritten as: A p.d. π on V is a stationary distribution (s.d.) of the kernel P if u∈V π u p uv = π v for all v ∈ V .For a Gsubordinated Markov kernel P this formula, the so-called global balance equation, can be expressed as: Fig. 3 presents a Markov kernel with its s.d. on the road network in Fig. 1.
Since the vehicles are moving along the road segments of the road network G it is natural to choose the set E for the role of the state space.In this case, we can define probability distributions on the set of vertices again, however, we have to consider the line digraph L(G) (or ML(G)).Formally, a probability distribution (p.d.) on L(G) is the vector π := (π e ) e∈E where π e ≥ 0 for all e ∈ E and e∈E π e = 1.Similarly to p.d.'s on V , the p.d. π can also be considered as a normalized E → R + function or a normalized V → R + function.If we want to emphasize the vertices of the original road network G instead of the edges then the notation π e = π uv is also used where e = (u, v) ∈ E. We can think of π e as the proportion of the number of vehicles at the road segment e with respect to the whole number of vehicles in the city at a fixed time point.Note that G endowed with π is a weighted digraph which is often called a network in itself as well.
A transition probability matrix (or Markov kernel) on E, i.e., on the line digraph L(G), can be defined as a real kernel P := (p ef ) e,f ∈E such that p ef ≥ 0 for all e, f ∈ E and f ∈E p ef = 1 for all e ∈ E. A p.d. π on E is a s.d. of the kernel P if e∈E π e p ef = π f for all f ∈ E. Since G represents a road system we may suppose that if e = f then p ef > 0 implies that (e, f ) ∈ E , i.e., there exist u, v, w ∈ V such that e = (u, v) and f = (v, w), and hence, u → v → w is a walk of length 2. In other words, we may suppose that the Markov kernel P is L(G)subordinated.Similarly to Markov kernels on V , P also induces an associated graph G P = (V , E P ).Thus, P is L(G)-subordinated if and only if E P ⊆ E ∪ S where S denotes here the diagonal {(e, e) | e ∈ E} in L(G).In this case, we use the notation p ef = p uvw as well.In fact, p uvw denotes the probability that a vehicle on the road segment (u, v) will go further to the road segment (v, w) in the next time point.Moreover, in the case of e = f = (u, v), let p ee = p uv be the probability that a vehicle remains on the same road segment in the next time point which can be non-zero as well.Thus, since P is a Markov kernel, we have that, for all u → v, w:v→w and the global balance equation is given as: for all v → w.An example for the Markov kernel P on the minimal line digraph ML(G) of the road network G in Fig. 1 is shown in Table 1.Fig. 4 shows the unique stationary distribution π of the Markov kernel P .
Probability distributions and Markov kernels on open road networks.Probability distributions and Markov kernels on the closure G of an open road network G can be defined similarly by considering the set V or E as the state space, respectively.A p.d. on V is the vector π := (π v ) v∈V where π v ≥ 0 for all v ∈ V and v∈V π v = 1.Note that π 0 denotes the proportion of the number of vehicles which drive in or out of the city's roads at a time point.A Markov kernel Note that G-subordination implies that p 00 = 0, i.e., the vehicles cannot move from 0 to 0, thus they either enter to the road network or leave the road network.The equations ( 1) and (2) remain hold too.Equation (1) can be rewritten as w∈V :v→w The global balance equation (2) for the s.d.can be rewritten as u∈V :u→v We can define Markov kernels on the line digraph L(G) of the augmented road network G, and thus on the augmented edge set E similarly to the case of L(G).A Markov kernel P on L(G) is called L(G)-subordinated if p ef > 0 for e, f ∈ E implies (e, f ) ∈ E or e = f .Note that (e, f ) ∈ E implies that e = (u, v) and f = (v, w) where u, v, w ∈ V excluding the triplets (0, 0, v), (v, 0, 0) and (0, 0, 0).We shall also use the notation (u, v) and f = (v, w) and p uv = p ee if e = (u, v).However, three additional conditions should be added.The first one is that p u0u = 0 for all u ∈ V such that u → 0 → u.This means that if a vehicle is on the edge (u, 0), i.e., it leaves the city at vertex u then it cannot be on the edge (0, u) at the next time point, i.e., it cannot enter at vertex u in the road network again, immediately.The second one is that p 0v0 = 0 for all v ∈ V such that 0 → v → 0, i.e., vehicles can enter and leave the city at node v.This means that if a vehicle enters the city then it cannot leave the city at the next time.Finally, the third one is that p u0 = p 0v = 0 for all u, v ∈ V such that u → 0 and 0 → v.That is a vehicle cannot remain on the road network at the edge (u, 0) after two consecutive time points and if a vehicle enters into the road network at the edge (0, v) (or at the vertex v) the first time then it does not remain on this edge after the next time point and it goes further immediately in the road network.Under these conditions the equations ( 3) and (4) remain hold.Equation (3) can be rewritten as: w∈V :v→w Equation ( 4) can be rewritten as: u∈V :u→v u∈V \{w}:u→0 The stationary distribution in all cases, i.e., for Markov kernels on road networks, line road networks and their closures, can be derived by solving the above appropriate linear equations numerically.Since the state space (the road network) is finite there exists at least one stationary distribution.However, in some cases, the stationary distribution is not uniquely defined by these equations.
Uniqueness of stationary distribution.We show that there is a direct connection between the existence of s.d. of the Markov kernels P and P and the strongly connected property of the physical road network G if the Markov and graph structures are compatible with each other.The Markov kernel P on V is called G-compatible if, for any u, v ∈ V such that u = v, p uv > 0 if and only if (u, v) ∈ E. Note that the G-compatibility implies the weak G-subordination for a Markov kernel P , however the converse is not true.Similarly, the Markov kernel Since (e, f ) ∈ E if and only if there exist u, v, w ∈ V such that e = (u, v) and f = (v, w) we can define the G-compatibility of a Markov kernel P as, for any e, f ∈ E such that e = f , p ef > 0 if and only if there exist u, v, w ∈ V such that e = (u, v) and f = (v, w).
Clearly, if P is G-compatible then the strong connectivity of G implies that the associated graph G P to the Markov kernel P is also strongly connected.In this case, the Markov kernel (the transition matrix) P is called irreducible.Thus, by Theorem 1 in [57], see also Theorem 3.1 and 3.3 in Chapter 3 of [56] the following theorem holds.
Theorem 1.If a road network G is strongly connected then there is a unique stationary distribution π (π ) to any G-compatible Markov kernel P (P ).Moreover, this distribution satisfies The main consequence of this theorem is that, in case of any physical road network augmented by the ideal vertex 0, all of the Markov kernels defined on the road network that has positive transition probability on all roads have unique stationary distribution.Thus, it is reasonable to suppose that a real traffic which follows a Markovian dynamic has a local unique stationary distribution in a short time period that can be explored by observing the traffic.

Markov random walk and Markov traffic on road networks
Let (Ω, A, P) be a probability space.Then a V -valued (E-valued) random variable (r.v.) is a X : Ω → V (Y : for all e ∈ E).In this case, X (Y ) is a random function on the set V of vertices (on the set E of edges).For example, X (Y ) can be the random position of a vehicle on the road network G, where the position refers to the actual vertex (edge) which the vehicle belongs to.Then, P(X −1 (v)) = P(X = v) (P(Y −1 (e)) = P(Y = e)) denotes the probability that a vehicle is at the vertex v ∈ V (at the edge e ∈ E).Clearly, by π X (v) := P(X = v), v ∈ V , a r.v.X induces a p.d. π X on V .Similarly, by π Y (e) := P(Y = e), e ∈ E, a r.v.Y induces a p.d. π Y on E.
A sequence {X t } t∈Z+ of V -valued r.v.'s is a Markov chain on the state space V if the Markov property holds: The main concepts of this paper are the Markov random walk and the Markov traffic defined in the following way.
Definition 2. Let the road network G be strongly connected and let P be a G-compatible Markov kernel on V with unique s.d.π.Moreover, let {X t } t∈Z+ be a Markov chain on V such that π X0 = π and X t |X t−1 ∼ P for all t ∈ N.
Then, {X t } t∈Z+ is called Markov random walk on the road network G with Markov kernel P .
The set of k (k ∈ N) mutually independent Markov random walks on G with Markov kernel P is called Markov traffic of size k and it is denoted by the quadruple (G, P, π, k).
Similarly, {Y t } t∈Z+ is a Markov random walk on the line road network if it is a Markov chain on the state space E such that π Y0 = π and Y t |Y t−1 ∼ P for all t ∈ N.
Note that if P is the uniform Markov kernel on G then we obtain the standard random walk of the graph theory, see the survey [58].
A Markov random walk is an individual Markov traffic with k = 1 in the sense that it describes the movement of a random vehicle which follows the stochastic rules defined by the Markov kernel.For a pair u, v ∈ V the notation u ⇒ v will mean that (u, v) ∈ E ∪ S, i.e., either u → v or u = v.If X 1 , X 2 are random functions on V then X 1 ⇒ X 2 means that the two-dimensional distribution of (X 1 , X 2 ) is concentrated on E ∪ S. Clearly, for any Markov random walk {X t } t∈Z+ we have X t ⇒ X t+1 ⇒ . . .⇒ X t+n for all t and n ∈ N. One can also call {X t } t∈Z+ as a first-order random walk on the road network where a vehicle moves from vertex u to vertex v with probability p uv .On the other hand, {Y t } t∈Z+ can be referred as a second-order random walk where the vehicles move from edge to edge, i.e., we have to consider where the vehicle came from, the vertex visited before the current vertex.The second-order random walk has also been considered in graph analysis, see [51].
The state space of a first-order Markov traffic can be described by the function space F(V, N 0 ) where on F k as a multinomial distribution with parameters k and π, see Chapter 35 in [59], given by the formula for all f ∈ F k .By this formula the probabilities of complex events of the traffic can be computed.Then, we define the concept of Markov kernel on We demonstrate that every Markov kernel P induces a natural Markov kernel on F k .The matrix K = (k uv ) u,v∈V is called transport matrix from traffic configuration f to g on the road network

and
u∈V k uv = g v for all v ∈ V .In fact, K has row and column marginals f and g, respectively and, heuristically, K defines a way for transporting the vehicles from configuration f into g on the road network.An example for transport matrix can be seen in Fig 5 .For a pair f , g ∈ F(V, N 0 ) let M(f , g) denote the set of all transport matrix from f to g. Define the Markov kernel R on F k in the following way where f , g ∈ F k .Then, R maps a p.d. into the p.d. R on the state space F k in the following way: for all g ∈ F k .To check that R is a Markov kernel indeed we note that, by the multinomial theorem, Moreover, one can easily see similarly to (8), by the multinomial theorem, that if π is a s.d. of the Markov kernel P , then the p.d. defined by ( 5) is the s.d. of the induced Markov kernel R defined by (6).Namely, we have the global balance equation for all g ∈ F k .(For the proof see Appendix B.) Note that the concepts of traffic configuration and induced Markov kernel on them can be extended to the case of second-order Markov traffic by using the state space F(E, N 0 ).
Ergodicity of Markov traffic.Let π 0 be a p.d. on V .Using π 0 as an initial distribution let us define the n-th absolute p.d. π n by the recursion π n = π n−1 P , n ∈ N. Clearly, π n = π 0 P n , where the product of two Markov kernels P and Q on V is defined as (P Q) uw := v∈V p uv q vw , u, w ∈ V .If G is strongly connected and for the initial distribution π 0 = π, where π is the unique stationary solution for P , see Theorem 1, then π n = π P n = π for all n ∈ N. Thus, in this case, π n → π as n → ∞.However, in general, the sequence (π n ) n∈N does not converge to the stationary distribution for all initial distribution π 0 even if the stationary distribution is unique.However, if we consider the average of the n-th absolute p.d.'s in time, we have the convergence to the unique stationary distribution.The Markov kernel P is called ergodic if it satisfies this property.
The following result on G-compatible Markov kernel is based on Theorem 4.1 in Chapter 3 of [56].If a road network G is strongly connected then any G-compatible Markov kernel P is ergodic and the average Markov kernel A n converges, i.e., A n := (n + 1) −1 (I + P + . . .
as n → ∞, where π is the unique stationary distribution of P .Moreover, the limiting probabilities of the time averages of the absolute p.d.'s satisfy as n → ∞ for all initial p.d. π 0 .
In applications, along absolute p.d.'s, we may also be interested in some functionals of these distributions, e.g., the number of vehicles in a region of the road network.Define the functional Instead of time averages, in order to achieve the convergence of n-th absolute p.d.'s we need the additional assumption of aperodicity for G, see Theorem 2.1 in Chapter 4 of [56].If G is an aperiodic, strongly connected road network and P is a G-compatible Markov kernel on it, then the sequence of Markov kernels P n , n ∈ N, converges to the limiting Markov kernel Π.Moreover, the limit of the sequence of n-th absolute p.d. π n is the unique stationary distribution π to the Markov kernel P which is independent of the initial p.d. π 0 .
For any functional F we also have that F (π n ) converges to F (π) as n → ∞ on an aperiodic, strongly connected road network.In [56] stronger convergence concepts, e.g., the convergence in variation, are also investigated.
Note that similar results hold for any G-compatible Markov kernel P on V = E.
The above results on Markov kernels on V (or E) can be extended to Markov kernels on the state space of traffic configurations.Let 0 be an initial p.d. on F k and let us define the nth absolute p.d. n on F k by the recursion n := R n−1 , n ∈ N, where R is a Markov kernel on F k induced by a G-compatible Markov kernel P on G, see formula (6).One can prove that the irreducibility and aperiodicity of P imply the same properties for R, respectively.
Our main result on ergodicity of Markov traffic is the following theorem.Note that the nth power of R is defined recursively as R n := R(R n−1 ), n = 2, 3, . .., by formula (7).
Theorem 2. Let G be a strongly connected and aperiodic road network and P be a G-compatible Markov kernel.Then, there is a unique stationary distribution to the Markov traffic described by the Markov kernel R on F k induced by P which has the form (5).
Moreover, the Markov traffic is ergodic in the sense that we have as n → ∞ for all f , g ∈ F k and, for all initial p.d.
as n → ∞ for all f ∈ F k .Furthermore, the p.d. π on G can be unfolded by the limit of state space averages in time as as n → ∞ for all v ∈ V .
Note that formula (11) follows from the well-kown fact that the expectation vector of a multivariate distribution with parameters k and π is equal to kπ, see formula (35.6) in [59].
These convergence results guarantee that the unique stationary distribution of a G-compatible Markov kernel can be approximated and thus explored by long run behavior of absolute p.d.'s on the traffic configurations of the road network.
Two-dimensional stationary distribution.Using two-dimensional distributions a Markov traffic can be reparametrized in the following way.Introduce the twodimensional distribution Q = (q uv ) on V × V as q uv := π u p uv , u, v ∈ V .Then, Q is a two-dimensional stationary distribution on G in the following sense: u,v∈V q uv = 1 (i.e., Q is a normalized matrix on V ); and (iii) v∈V q uv = v∈V q vu for all u ∈ V (i.e., Q has equidistributed marginals).

A two-dimensional stationary distribution
Property (iii) states that the two (row-wise and columnwise) marginal distributions of a two-dimensional stationary distribution on G coincide with each other.Clearly, for a Markov traffic, Q defined above is a positive twodimensional distribution on G. Q can also be considered as a p.d. on the state space E ∪ S, i.e., if we extend the set V of vertices of L(G) as V = E ∪ S, on the line digraph.Thus, Q can be interpreted as the distribution of the vehicles on the edges of the road network, i.e., on the line digraph, see formula (11) in [6].The distribution Q can also be visualized on the edges, see, Fig. 6 for the toy example and Fig. 11 in case of the Porto example discussed later.However, the converse of this statement is not true because there is p.d. on the line digraph which does not satisfy (iii).If {X t } t∈Z+ is a Markov random walk then the two dimensional distribution of any consecutive pair (X t , X t+1 ), t ∈ Z + , corresponds with Q.
Denote by Q the set of two-dimensional stationary distributions on G.One can easily see that Q is closed with respect to the affine combination.Namely, if Conversely, for a positive Q ∈ Q, let us define Then, P = (p uv ) defines a G-compatible Markov kernel with stationary distribution π on G. Thus, a Markov traffic defined by the quadruple (G, P, π, k) can be introduced by an equivalent way through the triplet (G, Q, k).However, it will turn out later that, from a statistical point of view, the parameter matrix Q can be estimated in a computationally more efficient way than the pair of transition matrix P and its stationary distribution π.
With the help of two-dimensional stationary distribution, we can assign a p.d. to any Markov traffic on the space of traffic configurations which are defined as counting functions on the edges of the road network.Namely, define the traffic configuration h = (h uv ) u⇒v as a function in F(E ∪ S, N 0 ).Then, h uv denotes the number of vehicles on the edge (u, v) where u, v ∈ V such that u ⇒ v. Similarly to (5), the two-dimensional distribution σ on the set of traffic configurations h with size k (k ∈ N), i.e., u⇒v h uv = k, induced by a p.d. π on G can be expressed as a multinomial distribution with parameter k and Q as Finally, one can easily see that the concept of twodimensional stationary distribution can be extended for the second-order Markov traffic as well.

Trajectories from public datasets
For our experiments, we needed a dataset of real-life traffic trajectory data.In our terminology, a trajectory is a sequence of data that provides information about the path of a vehicle moving from a start to an end point, associating geographic coordinates with timestamps.We required a dataset that satisfies the following criteria: 1. Contains complete trajectories, i.e., the availability of only the start and end points is not sufficient, intermediate trajectory points must also be available.
2. The trajectory points must be sampled at a high enough frequency, so that the distance between consecutive points should not be too large, (e.g., an average distance of the order of 10 meters is acceptable, but an average distance of the order of 100 meters is definitely not.) Figure 6: The two-dimensional stationary distribution (on edges) with its equidistributed marginals (on vertices) on the road network in Fig. 1 for the Markov kernel in Fig. 3.One can easily check that the sums of probabilities written on the edges in and out each vertex are equal, respectively.

Trajectories should cover a relatively small geographic
area, e.g., a city or a district.
5. Vehicles should not follow a fixed route, e.g., public transport bus trajectories are not suitable.
6. Publicly available for research purposes.
These requirements were satisfied by the Taxi Trajectory Prediction (TTP) dataset from Kaggle.The dataset covers a period of one year from July 1, 2013 to June 30, 2014.It is split into a training and a test set, the former contains 1,710,670 trajectories, and the latter contains 320.The trajectories were collected in the city of Porto, Portugal, with a sampling rate of 15 seconds.First, we created a subset of the dataset, filtered to coordinates between W8.6518, W8.5771, N41.1129, N41.1756, see Fig. 7.The data samples' features that were not relevant to the research, such as origin of call, identifiers for individual taxi or customers, and type of day (i.e.workday, weekend, holiday) were omitted.The processed format included the time of departure, both as a timestamp and as distinct date attributes, the length of the trajectory, and the points of the trajectory, represented as a list of GPS coordinates.Some data samples contained incomplete trajectories, these were discarded.Because of the properties of the proposed simulation model, the data was filtered to include only those samples that had a time of departure between 8-9 am.As a result, 82,345 trajectories remained.Although the length of trajectories had a wide range (the longest has 2,324 sample points), long trips were rare.Fig. 8a shows the distribution of the length of trajectories.Most routes were around a length of 41 sample points, and routes with over 150 points were less than 1% of the dataset, see Fig. 8b.The distribution of the trajectory points (all, difference of start and end points,

Building graphs from OpenStreetMap data
OpenStreetMap (OSM) is a community project to build a free map of the world to which anyone can contribute.Data is available under the Open Data Commons Open Database License (ODbL).The representation and storing of map data is based on a simple but powerful model, that uses only three modeling primitives, namely, nodes, ways, and relations5 : 1.A node represents a geographical entity with GPS coordinates.2. A way is an ordered list of at least two nodes.3. A relation is an ordered list of nodes, ways, and/or relations.All of these modeling elements can have associated key-value pairs called tags that describe and refine the meaning of the element to We started our processing by building a graph from the OSM map of Porto, with the same bounding box as the filtered dataset.Specific nodes of the OSM file become the nodes in the graph.Because we only need those nodes that can be reached via vehicles, we had to filter the OSM file and collect only specific types of way nodes.In the OSM file, a way is a sequence of OSM nodes, so naturally, the nodes of ways become nodes in the graph.For every node we store the node's OSM ID, and its coordinates.We also insert an edge into the graph between every nodes in way.The weight of an edge is given by the squared distance between the nodes, which we calculate from the OSM file's data.We used pyosmium library for processing the OSM files and the NetworkX Python library for building the graph.
After building the graph process the list of trajectories.Because the trajectories are given in GPS coordinates, we first have to translate those coordinates into OSM node IDs.For every coordinate in a trajectory, we search for the closest way node's coordinates in the built graph, so the result nodes have the same domain as the built graph's nodes.Obviously, the original trajectories made up of GPS coordinates does not have the same scaling as the OSM map.The coordinates in the trajectory are sampled in regular, but larger time intervals than the OSM, so they are not aligned.In order to match a trajectory to a way in our graph, we had to perform an interpolation on the result list of node IDs, so we ran a Dijkstra's shortest path algorithm on our graph between every node IDs for every trajectory.Because the OSM database contains errors, it can happen that in real life a route exists between two given places, but in the OSM database, there are no existing routes between those nodes that are representing the given places.In this case, we cut the faulty trajectories into pieces.The result of this process is an aperiodic strongly connected road network augmented by the ideal vertex 0.

Statistical inference for Markov traffic using mobile sensors
The statistical analysis of a traffic systems described by the Markov traffic model means the estimation of the quadruple (G, P, π, k) or the triplet (G, Q, k) using observed data.(In this subsection only the first-order Markov traffic is considered.)To estimate G we have to explore the road system under study by identifying the set V of vertices and the set E of directed road segments.Fortunately, this exploring has already been done by a few organizations, see, e.g., the Google Maps and the Open-StreetMap.However, we should note that, in case of GPSbased trajectory data, we have to fit the data to the applied map system which is not an evident task at all, see section 2.5.In the present paper, we propose a method for estimating the two-dimensional stationary distribution Q immediately instead of the pair (P, π) of a transition matrix and its stationary distribution using mobile sensor data which may be gathered by vehicles, passengers etc.In this case, we have trajectories data which consists of the sequences of consecutive vertices, like in the TTP dataset.By (12), the estimators for P and π can be easily derived from an estimator of Q.Finally, it is supposed that the size k of the traffic is known.
Suppose that, for a Markov traffic, we observed a random sample of trajectories {X i }, i = 1, . . ., k, of size k defined by X i 1 ⇒ X i 2 ⇒ . . .⇒ X i ni , i = 1, . . ., k, where n i denotes the length of the i-th trajectory.Let n := n 1 + . . .+ n k be the total sample size.Define the total two-dimensional consecutive empirical frequencies as: u, v ∈ V , where the trajectory-wise two-dimensional consecutive empirical frequencies, i = 1, . . ., k, are defined as Plainly, n i uv denotes the number of consecutive (u, v) (u, v ∈ V ) pairs in the i-th trajectory.One can see that since {X i } is a proper Markov random walk we have n i uv = 0 for all (u, v) / ∈ E ∪ S. Thus, the support of the two-dimensional frequency matrices N := (n uv ) u,v∈V , N i := (n i uv ) u,v∈V , i = 1, . . ., k, is a subset of E ∪ S, i.e., they are weakly G-subordinated matrices.Clearly, N = k i=1 N i and we have u,v:u⇒v where n − k is the corrected sample size.Introduce v ∈ V , i.e., s v denotes the number of trajectories which start at vertex v and e v denotes the number of trajectories which terminate at vertex v, respectively.Denote the one-dimensional marginal frequencies of N by n v+ := u∈V n vu and n +v := u∈V n uv , v ∈ V .We obtain that for all v ∈ V , where n v denotes the number of vertex v in all trajectories.Finally, Define the vectors s and e on V as s := (s v ) v∈V and e := (e v ) v∈V , respectively.Then, ( 16) implies that 1 (e − s) = 0, i.e., the vectors e − s and 1 are orthogonal.The traditional maximum likelihood (ML) estimator P ML of the transition matrix P is given by the maximization of the conditional loglikelihood log L = u⇒v n uv log p uv in parameters p uv , u, v ∈ V such that u ⇒ v, under the constraints p u+ = 1 for all u ∈ V .A solution of this constrained optimization problem is p ML uv = n uv /n u+ for all u, v ∈ V if n u+ > 0 and p ML uv = δ uv if n u+ = 0 where δ denotes the Kronecker delta.The maximum likelihood estimator π ML of the stationary distribution π is derived by the solution of the global balance equation π = π P ML in π.Thus, the maximum likelihood estimator Q ML = ( q ML uv ) of Q is given by q ML uv = π ML u p ML uv , u, v ∈ V .In the sequel, a direct method is proposed for estimating the two-dimensional stationary distribution Q.
A naîve estimator for the two-dimensional stationary distribution Q based on the two-dimensional consecutive empirical frequency matrix N is Q naîve := (n − k) −1 N .Clearly, Q naîve , as a non-negative matrix on V , satisfies the properties (i) and (ii) of Definition 3.However, the problem with this naîve estimator is that its row and column marginals are not necessarily equal, i.e., in general, it does not satisfy the asumption (iii) of Definition 3. Hence, we have to introduce a new estimator Q which belongs to Q and is optimal in some sense.
The optimality of the proposed estimator is defined by means of the least squares distance between matrices over G. Let A = (a uv ) u,v∈V and B = (b uv ) u,v∈V such that a uv = b uv = 0 for all u, v ∈ V where u v, i.e., let A and B be weakly G-subordinated matrices.The distance between A and B is defined as

In fact, • G is the Frobenius norm of the matrices of dimension |V | × |V | which vanish on the entries outside of E ∪ S.
To formulate our error or objective function for estimating the two-dimensional stationary distribution it is convenient to weaken the assumptions of Definition 3 by leaving the normalizing assumption (ii).In the sequel, let M = (m uv ) denote a non-negative parameter matrix on G which satisfies assumptions (i) and (iii) of Definition 3, i.e., M is weakly G-subordinated and v∈V m uv = v∈V m vu for all u ∈ V .Then, one can easily derive a two-dimensional stationary distribution Q from M by its normalization defining as Based on k number of trajectories, using the Frobenius norm, the optimality criterion is defined as the weighted sum of squared errors (SSE): where M is a non-negative parameter matrix satisfying assumptions (i) and (iii) of Definition 3, w = (w i ) i=1,...,k are non-negative unknown weights, i.e., k i=1 w i = 1, and N := (N i ) i=1,...,k denotes the data, where N i is the twodimensional consecutive empirical frequency matrix for the ith trajectory, see (13).The statistical inference for a Markov traffic means the minimization of the objective function SSE in its parameters M and w deriving the weighted least squares (WLS) estimators M WLS and w WLS .Then, the WLS estimator of Q is defined as Q WLS := n −1 eff M WLS where n eff := (1 M WLS 1) is the so-called effective sample size.Here, Q can be interpreted as the estimated two-dimensional stationary distribution which describes the individual Markov traffic in time.On the other hand, n eff gives the equivalent sample size related to the independent case which may be thought of as the information content of the observed data.Note that n eff is not necessarily an integer and is different from n and n − k.Finally, w i gives the importance of the ith trajectory in the sample.One can see that longer trajectory implies higher weight.
To formulate our result on WLS estimation of Markov traffic we need some basic facts on the spectral theory of directed graphs, see [60] for details.The symmetric unnormalized graph Laplacian matrix L of a digraph G is defined as where A denotes the adjacency matrix of G and D := diag{d + + d − }.Note that for the road graph G, since there is no loop, we have The main theorem of this paper is the following.
Theorem 3.There is a unique pair ( M WLS , w WLS ) which minimizes the weighted sum of squared errors SSE defined in (17).These WLS estimators are derived as where λ ∈ F(V, R) is called Lagrange vector and defined as a unique solution to the linear equation Lλ = s − e and • denotes the entrywise (Hadamard) product of matrices.
Based on the previous theorem, by (14), the effective sample size is given as i.e., n eff depends only on the graph structure of the road network, which is independent of the data, the traffic direction vector s − e, and the corrected sample size.However, it does not depend on the data which are inside the trajectories.
The WLS estimators proposed above can be considered as a kind of composite (or quasi-) likelihood estimators for Markov chains, see [9].The composite likelihood method is widely applied in complex statistical models when the full ML method can be difficult to apply or may not be robust enough.In our method, the objective function is based on pairwise marginal distributions, however, instead of formula (2) in [9], the quasi-likelihood function is a square function, the logarithm of the normal probability density with heteroscedastic variance which depends on the length of trajectories.The latter will be more clear by introducing the mean squared error (MSE) as where n i eff := w i n eff denotes the effective sample size of the ith trajectory, i = 1, . . ., k.The parameters of the objective function MSE are the effective sample sizes {n i eff } and the two-dimensional stationary distribution Q.The heuristic explanation of the need to use weights in formulas (17) and ( 19) is the following.By the Central Limit Theorem, for large n i , the trajectory-wise two-dimensional consecutive empirical frequency matrix N i can be approximated as where ξ i is a normally distributed random matrix on V for all i = 1, . . ., k, which is a heteroscedastic equation between the observed N i and the parameter matrix Q.Hence, , where ξ i 2 G , i = 1, . . ., k, are independent identically distributed r.v.'s.Thus, we have to normalize the trajectorywise squared errors proportionally to their lengths, respectively, in order to get balanced error terms.
The estimation theory of finite Markov chains goes back for a long time, see [61].In the traditional ML approach the estimators of the transition and stationary probabilities are derived by corresponding relative frequencies, respectively.However, these estimators have a few problems which imply that they can be applied with limited success for estimating the Markov traffic on a road network.Firstly, they are based on only one long trajectory (or realization).However, in a real traffic dataset there is a large number of relatively short trajectories, i.e., the set {n i , i = 1, . . ., k} are bounded, where k is large or tends to infinity.In our example, for the TTP dataset, the number of trajectories is above 80K with the mean length 40 and maximum length 2K, see Table 2. Secondly, they are asymptotic estimators in the sense that, for finite sample size, the estimated stationary distribution does not satisfy the global balance equation given by the estimated transition probability matrix.The global balance equation holds only asymptotically, i.e., when the sample size tends to infinity.In fact, the inaccuracy in the global balance equation is not too large, however, this little bias can cause significant discrepancy from the "true" stationary distribution in the simulation.Thirdly, the trajectories are biased during a short time period in the sense that they are starting from some parts of the road network and ending at other parts.For example, in the morning period the vehicles are moving from the residential districts to the business districts of the city and they are moving back in the afternoon period.In other words, the traffic has a definite direction on the road network.To demonstrate this behavior in the case of TTP dataset, Fig. 9c shows the distribution of the elements of the traffic direction vector s − e while Fig. 9b shows their spatial distribution.Neither distributions are concentrated around the zero.The known improvements of the ML estimators, e.g., by using the bootstrap, see [62], do not solve these problems.However, the WLS estimator of the two-dimensional stationary distribution proposed in this paper is able to handle all of these problems.The estimator Q WLS is taking account of more than one trajectory with their length.It determines uniquely both the transition probability matrix and its stationary distribution by (12) which satisfy the global balance equation obviously.Finally, by taking account of the traffic direction vector in the estimator, it can correct the bias due to the unbalanced sampling of trajectories on the road network.
The fundamental statement of Theorem 3, as one of the main result of this paper, is that the estimator Q WLS (or M WLS ) consists of two parts: the first part is the naîve estimator for the distribution of the consecutive pairs in trajectories based on the empirical frequencies, while the second part is a correction term ensuring that Q WLS (or M WLS ) has equidistributed marginals.The second part also depends on two components.The first one is the Laplacian matrix of the road graph which depends only on the graph structure of the road network and independent from the trajectory data.The second one is the traffic direction vector which depends only on the trajectory data.Note that all sufficient statistics, namely the total two-dimensional consecutive empirical frequencies and starting and ending empirical frequencies, can be computed by counting, which is numerically very effective and can be executed even for big data.
The following propositions and remarks are useful in computing the Lagrange vector λ in a numerically effective manner.The first proposition sums up the basic properties of the symmetric unnormalized graph Laplacian matrix.Note that 1)-4) are based on Proposition 1 in [60].
Proposition 1.The matrix L satisfies the following properties: 1.For all α ∈ F(V, R) we have 2. L is symmetric and positive semi-definite.
3. The smallest eigenvalue of L is 0, the corresponding eigenvector is the constant one vector 1.
5. If G is strongly connected then τ 2 > 0, i.e., the multiplicity of the eigenvalue τ 1 = 0 is 1, and and the inverse can be expressed as L −1 S = |V | j=2 τ −1 j α j α j .By Proposition 1 the Lagrange vector can be expressed by the eigenvalues and the eigenvectors of L as This formula can be applied for approximating λ by leaving the too large eigenvalues which have negligible inverse in the above finite sum in a large road network.Thus, if |V | is too large then it is enough to store the first few significant eigenvalues and eigenvectors.Remark that these eigenvalues and eigenvectors, being independent from the data, can be computed and stored in advance for a simulation program.
If a sharp bound is needed for the eigenvalues then the application of symmetric normalized Laplacian matrix is more useful.The symmetric normalized adjacency matrix A and the symmetric normalized graph Laplacian matrix L of a digraph G are defined as For a road graph, we have l vv = 1 for all v ∈ V .One can see that The next proposition is based on Proposition 3 in [60], the statements 5)-7) are new as far as we know.
Proposition 2. The matrices A and L satisfy the following properties: 1.For all α ∈ F(V, R) we have 2. L and A are symmetric and L is positive semi-definite.
3. The smallest eigenvalue of L is 0, the corresponding eigenvector is the vector D 1/2 1.
where the infinite series is absolute convergent.Moreover, the convergence is exponentially fast since Instead of computing the inverse of the Laplacian matrices, (e.g., by determining their eigenvalues and eigenvectors, similarly to the Google's PageRank algorithm, see Chapter 15 in [63]), a linear recursion could be computationally more efficient in large-scale problems.For each α ∈ S, L −1 S α can be obtained as the limit of the iteration where β n ∈ S for all n ∈ N.
where λ n ∈ S for all n ∈ N.

Results
To evaluate the performance of the proposed WLS estimation method by comparing it to the traditional ML one discussed in section 2.6 a simple simulation study was conducted at different sample sizes for small and medium road network.In the simulations, in order to mimic the real traffic, we tried to keep the length of trajectories low and the number of trajectories high compared to the size of the road network, similarly to the Porto example.The absolute bias of an investigated estimator Q for the twodimensional stationary distribution Q as a parameter is defined by Q − Q G .The empirical absolute bias and its standard error (SE) correspond to the mean and standard deviation of absolute biases in 100 replications, respectively.All simulations were carried out in Python using the PyDTMC library developed for analysing discrete time Markov chains 8 .The codes and datasets of our simulation are available upon request.
Table 3 displays the simulation results for the small road network in Fig. 1 using the Markov kernel of Fig. 3, see the toy example in Appendix A. The simulation parameters were k = 100, 200, 500, and 1000 number of trajectories with n = 3, 5, and 10 length.The absolute bias and its standard error do not depend on the length n and they are decreasing as k is increasing for both estimation methods.The latter is an expected result.Moreover, while for relatively small k the performance of the WLS and ML methods are similar, in the case of relatively large number of trajectories the ML estimator outperforms the WLS one a little bit.This phenomenon could be due to the asymptotic optimality of the ML estimator because the parameter k is enough large (1000) compared to the size of the road graph (5).
In the second simulation scenario, a strongly connected subgraph, which contains 1000 vertices, of Porto's road network was chosen (exported from the OSM, as well, GPS coordinates W8.6137, W8.5991, N411573, N41.1437).The entries of Markov kernel were generated randomly.The  k n ML WLS 1000 3 0.166 (0.0559) 0.025 (0.0007) 1000 5 0.184 (0.1214) 0.025 (0.0007) 1000 10 0.169 (0.0938) 0.025 (0.0008) 3000 3 0.064 (0.1725) 0.023 (0.0005) 3000 5 0.070 (0.1665) 0.023 (0.0005) 3000 10 0.063 (0.1705) 0.023 (0.0005) 5000 3 0.016 (0.0150) 0.023 (0.0004) 5000 5 0.014 (0.0055) 0.023 (0.0004) 5000 10 0.014 (0.0126) 0.023 (0.0003) simulation parameters were k = 1000, 3000, and 5000 with n = 3, 5, and 10.In this scenario, the absolute bias and its standard error are also independent of the length n.However, there are significant differences between the performances of the two estimation methods (ML and WLS) related to the parameter k.On the one hand, the absolute bias of ML estimator is decreasing as k is increasing while it is constant for WLS estimator.On the other hand, the WLS estimator is better than the ML one in case of k = 1000 but worse in case of k = 5000.Since the former parameter setting is closer to the real traffic, this simulation corroborates the superiority of WLS method based on two-dimensional stationary distribution against the traditional maximum likelihood.Finally, in this scenario, the scale of the SE's indicates that the WLS estimator is more stable than the ML one.
We have also implemented the model in the OOCWC system.First, we have filtered the TTP dataset as described in section 2.4.Then, we have created the Markov kernel from the filtered dataset following the method described in section 2.6 (see also Appendix A for a toy example).Regarding the RCE, we performed several modifications.First, we extended the operation of the RCE to be able to handle kernel files.This kernel file can be loaded to the RCE software, so all nodes of the simulation graph will have the corresponding transition probability vector from the Markov kernel file.For this, we had to extend the shared memory segment of the RCE.We should note, however, that not all nodes can be found in the Markov kernel, because it can happen that the dataset does not completely cover the whole map, i.e., not all nodes are part of a trajectory.In this case, we set uniform distribution for the corresponding node.Finally, we had to modify the basic operation of the simulation algorithm.In the original implementation, the cars are moving on the map quite randomly.Now, a car selects the next node based on the transition probability vector of the current node.For this, we use the pseudo-random number generation engine from the Boost Random library that is based on the method presented in paper [64].Let's consider an example.We are at the graph vertex (or intersection) of OSM node ID 1110673569 (with GPS coordinates 41.1752185, -8.6231927).The total transitions of this node (i.e. the total trajectories that cross this intersection) in the dataset is 1,649.The transitions to the neighbor nodes are shown in Table 5. Please note that the actual transition probability (TP) is not the same as the ratio of the transitions to the neighbor node and the total transitions of the node which is called frequency (or ML) based transition probabilities.The actual transition probability comes from the Markov kernel of the whole graph.The two kinds of transition probabilities are also compared in Table 5 where the WLS based transition probabilities have been derived by our method.One can already see in this simple example that the difference between the two methods could be huge.This small example can be observed in Fig. 10, as well.
The initialization phase of the simulation adds traffic units to the map.Each unit is placed to an OSM node, i.e., on a vertex of the simulation graph.There exist two ways to do this, one is following a prescribed distribution (e.g.uniform), the other is following measured data.In our test case, we initialized simulations with fictional measured data.We put units only to the streets Rua de Antero de Quental, Rua da Constituição and Rua da Boavista (25.6%, 51.4%, 23% of the cars, respectively), i.e., the simulation starts from the traffic configuration which is concentrated on three nodes of the road graph.In addition, we can set the number of the simulation units.We run simulations with k = 5, 000, 10, 000, 20, 000, 30, 000 and 50, 000 units.The simulation starts when all simulation units are added to the map.Fig. 13 shows the change of the distribution of cars during the simulation.
The RCE produces a logfile that contains the position of every simulation unit in every simulation step.From this file, we create a new file that contains the number of cars by streets in every minute, so we can observe the change of distribution of the cars.In addition, we calculated the stationary distribution of cars for streets in the city of Porto, see Fig. 11.This latter one tells us, what is the probability that a car is on a given street.It is worth noting the similarity between this figure and Fig. 9a.The ticker line on Fig. 11 corresponds to increasingly hot color on Fig. 9a.
To obtain a quantitative measure that describes the "goodness" of our simulation algorithm, we applied the Pearson's chi-squared test.We expect, by Theorem 2, that during the simulation, independently from the initial distribution, within a certain time period, the distribution of the cars become close to the previously calculated stationary distribution.Fig. 12 shows the test results.We can observe that in the first few minutes the test statistic is significantly high, meaning that the distribution of the cars is still far from the steady-state.However, after a time period that depends on the number of cars, the test statistic becomes low, meaning that the distribution became steady.One can observe that it takes more time to reach the steady-state with more traffic units, which is reasonable.Another notable trend is the case of 5,000 cars, where the line is elevating after reaching the steady-state.This can be caused by the low number of cars.The number of individual streets (named or unnamed, e.g.motorway junctions) is 2,194.5,000 cars are simply not enough to reach and hold a steady-state in this type of simulation.
Finally, we should note some implementation details and possible drawbacks of this model that may have an impact on the model's overall performance.
In small and medium graphs, the proposed algorithm and its implementation performs as it is expected.But in the case of our Porto example, where the graph has 33,961 nodes and 53,126 edges and the TP matrix is very sparse, numerical problems may occur.One problem can occur when we calculate the Q WLS estimator and then TP matrix using the method described in section 2.6.For matrices with this size (34,000 x 34,000), we cannot solve the linear equation of the Lagrange vector in Theorem 3 always numerically, thus we could only use the least square solution for a numerically stable calculation.In some cases, this causes impossible numbers to present in the TP matrix, e.g. for a node, the TP vector is [1.17489, -0.174894], which is obviously impossible.It is interesting to note that the sum of these "malfunctional" TP vectors are 1 all the time, and mostly occurs if the node has a low number of transitions (less than 20).In such cases, we use the frequency based TP.Another numerical problem can occur when we calculate the stationary distribution π, namely, negative values may present in the results.We need to handle this problem when we calculate the Pearson's chi-   squared test.We chose to shift every value of π until we get a sum of 1 for π.Some minor issues can occur with the map database and the differences between the Porto dataset and the OSM data.In some cases, we could not calculate a route between two consecutive trajectory points using OSM data.This can happen because of the imperfection of the OSM data or false GPS measurement.We handle this case by splitting the trajectory into pieces.
Another minor issue arises in the calculation of the Pearson's chi-squared test.Since the OSM Porto map and the trajectory dataset do not cover each other perfectly, we only know the stationary distribution π for a subgraph of the whole map.During the simulation the units can traverse the whole map graph, so, it can happen that a traffic unit reaches an edge which is not part of the subgraph where we know the stationary distribution π.During the calculation of the Pearson's chi-squared test, we consider only those cars that are present on the road network, where the stationary distribution π is known.

Conclusions
In this paper, we have described our traffic simulation model that is called "Markov traffic" based on tools from graph theory and Markov modelling.The aim was to provide a simulation method that is able to keep the distribution of the cars on the map in a steady-state on a large scale road network.We have proven that, under general assumptions, the stationary distribution is unique for any Markov transition mechanism on a wide class of road networks.An explicit formula has also been derived for the stationary distribution.
We have shown that the stationary distribution, with the related transition mechanism, can be explored from observed data based on sample trajectories.We have provided a statistical method and proved its optimality with which we can create the Markov kernel necessary to obtain a Markov traffic on a given road graph.Using this kernel, we can initiate traffic simulations that provide a stationary distribution of the cars on the map.
To provide an example for creating this kernel file, we have used a publicly available dataset, namely the Taxi Trajectory Prediction dataset.Our simulation uses Open-StreetMap, a free map database.
To test our theories, we have implemented the proposed model in our simulation program (RCE).We have run simulations and it has been proved to provide a stationary distribution on the map graph of Porto, Portugal.The whole project (including the RCE) is available for download. 9Some simulation video is available at the YouTube channel of the first author. 10 Future work will focus on the possible applications of our simulation approach, e.g., modelling the pollution or energy consumption in a city due to multi-modal traffic with gasoline, diesel, electric and plug-in hybrid vehicles.{1, 2, 3, 4, 5} and E := {(1, 2), (2, 1), (2,3), (2,4), (3,4) The entries of this matrix are the number of directed walks of length 4 between the pairs of vertices.One can see that the in-and outdegree of vertices are given as: T .
Define the Markov kernel P on the road network G as: where S contains the eigenvalues in its diagonal and O is the orthonormal matrix of eigenvectors in its columns.The multiplicity of the smallest eigenvalue 0 is 1 which shows that the road network is strongly connected.The inverse of L on the subspace S which is the generalized inverse or Moore-Penrose inverse of L, can be derived as

Appendix B. Proofs
Proof of formula (9).For all g ∈ F k we have by formulas ( 5) and ( 6) and the multinomial theorem that Clearly, if •, • G denotes the inner product induced by the norm • G ,

Figure 1 :
Figure 1: A simple road network.

Fig. 1
Fig. 1 presents a simple example for road network.Let S denote the diagonal set of V , i.e., S :={(v, v)|v ∈ V }.If v → v, i.e., (v, v) ∈ E, for a v ∈ V ,then it is called a loop which connects the vertex v to itself.If there would be a loop in a road network then there would be a physical , w)}, i.e., v − and v + are the sets of arrows in and out the node v, respectively.Note that deg − (v) = |v − | and deg + (v) = |v + | is the indegree and outdegree of v, respectively, where | • | denotes the cardinality of a set.

Figure 2 :
Figure 2: The degree distribution histograms of the Porto map traffic graph.

Figure 3 :
Figure 3: A Markov kernel (on edges) with its stationary distribution (on vertices) on the road network in Fig. 1.
called traffic configuration or counting function and f v measures the number of vehicles at vertex v ∈ V .Let |f | denote the size of the traffic configuration f defined by |f | := v∈V f v .The size of a traffic configuration counts the number of vehicles on the road network at a time.Let F k (k ∈ N) denote the subset of traffic configurations of size k

Figure 7 :
Figure 7: The map of the observed area.The graph created from the OSM data has 33,961 nodes, 53,126 edges, and covers a total of 857.26 km of road.The size of the area is about 43.68 km 2 .(Image source: openstreetmap.org) .
Histogram of trajectory lengths.The rightmost bar represents trajectories longer than 12,500 meters.The average trajectory length is 3,628.93meters.Histogram of number of sample points per trajectories.The rightmost bar represents trajectories with more than 200 sample points.On the average, a trajectory consists of 40 sample points and takes 10 minutes.

Figure 8 :
Figure 8: Histograms of lenght of trajectories and sample points.

Figure 9 :
Figure 9: Distribution of trajectory points of the filtered dataset.

Figure 10 :
Figure 10: A visual explanation of transitions of intersection 1110673569.TP means transition probability, yellow arrows indicate directions, red dots indicate nodes.(Source of the map: openstreetmap.org,annotated by the authors.)

Figure 11 :
Figure 11: The two-dimensional stationary distribution of cars in Porto based on the TTP dataset.

.
is weakly G-subordinated.Fig.3displays the Markov kernel P denoting the transition probabilities on the edges and its stationary distribution π denoting on the vertices.Note that π = (1/7, 2/7, 1/7, 2/7, 1/7) .The symmetric unnormalized graph Laplacian matrix L and the symmetric normalized graph Laplacian matrix L of the road network G are given as:The eigenvalues and eigenvectors of the symmetric unnormalized graph Laplacian matrix L are given as:

π
eigenvalues of L are in [0, 2] and the smallest one is 0 with multiplicity 1.In this example, the speed of convergence in Proposition 2 is κ = max{0.23,0.73} = 0.73.Let the following trajectories be observed in the road network G: total sample size is n = 3350, the number of trajectories is k = 1000 and the two-dimensional consecutive frequency matrix N is given by Lagrange multiplicators are given as: λ = −116.66−16.66 116.66 0 16.66 .Thus, the correction matrix R is given by Since d − = d + we have n eff = (n − k) = 2350, andN + R =The stationary distribution is given as π WLS = 0.149 0.362 0.142 0.213 0.135 and one can check that π WLS is indeed the stationary distribution of the estimated Markov kernel: Note that the estimated Markov kernel using the standard maximum likelihood estimator is given as ML = 0.224 0.398 0.1 0.174 0.104 .

1 i 2 GFigure 13 :
Figure 13: The change of the distribution of cars during the simulation (5,000 (a-c), 10,000 (d-f ), 20,000 (g-i) and 50,000 (j-l) cars, initial state, after 30 mins, after 60 mins, respectively).The thickness of the street is proportionate with the number of cars on the street.
Figure4: The stationary distribution of the Markov kernel in Table1.

Table 1 :
An example for a Markov kernel on the minimal line digraph of the road network in Fig.1.

Table 3 :
Simulation results, absolute bias and SE (inside parenthesis), for the Markov kernel in Fig.3on the road network in Fig.1.(k -number, n -length of trajectories)

Table 4 :
Simulation results, absolute bias and SE (inside parenthesis), for a part of Porto's map with 1000 vertices.(k -number, nlength of trajectories)