Abstract
Accurate traffic volume prediction is essential for managing congestion, improving road safety, mitigating environmental impacts, and supporting long-term transportation planning. The traditional four-step travel demand model (FSM) is a well-established framework, but it relies on static survey data, substantial calibration effort, and simplified behavioural assumptions that may not adequately capture complex travel patterns. In contrast, data-driven models are capable of learning nonlinear relationships from large datasets, yet they are often designed for short-term forecasting and typically do not target the long-term, segment-level volume estimation tasks required for strategic planning. This study proposes Mukara, a deep learning framework that directly approximates the mapping from external socioeconomic and network features to observed traffic volumes on highway trunk road segments. The model is trained on eight years of data from England and Wales and incorporates population, employment, land use, road network characteristics, and points of interest as inputs. Mukara achieves a mean GEH of 50.74, a mean absolute error of 8,989 vehicles per day, and an R2 of 0.583 under random cross-validation, outperforming baseline models and existing studies under comparable settings. Under a more stringent region-based spatial cross-validation scheme, performance remains robust, demonstrating strong spatial transferability. Ablation experiments further demonstrate the robustness of the proposed architecture and reveal the relative importance of different input feature groups for prediction.
Citation: Li Y, Chen S, Jin Y (2026) Mukara: A deep learning alternative to the four-step travel demand model with a case study on interurban highway traffic prediction in the UK. PLoS One 21(4): e0345576. https://doi.org/10.1371/journal.pone.0345576
Editor: Gianluca Genovese, University of Salerno: Universita degli Studi di Salerno, ITALY
Received: April 29, 2025; Accepted: March 6, 2026; Published: April 16, 2026
Copyright: © 2026 Li et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All python scripts and files are available from the GitHub repository https://github.com/yueli901/mukara.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Traffic prediction plays a pivotal role in addressing critical challenges such as reducing congestion, mitigating carbon emissions and pollution, and improving road safety and public health [1–5]. In the United Kingdom, road transport is the predominant mode of travel, accounting for 86% of all passenger kilometres in 2022 [6]. This trend is consistent with many OECD countries, where road transport similarly dominates passenger travel [7]. Simultaneously, vehicle ownership is rapidly increasing in the Global South, with projections indicating that by 2030, 56% of the world’s vehicles will be owned by non-OECD countries, compared to 24% in 2002 [8]. A robust traffic prediction system can help travellers plan routes effectively, assist traffic operators in informed decision-making, and enhance overall traffic management efficiency [9].
Despite advancements in traffic prediction, research has predominantly focused on urban traffic, leaving interurban traffic networks relatively underexplored [10]. Interurban settings, however, present a unique opportunity for testing novel traffic prediction models due to the abundance of high-quality data and the relatively lower complexity of traffic patterns compared to urban areas. Urban traffic is often influenced by localised factors such as pedestrian activity, public transit systems, and highly variable demand patterns, making it noisier and more challenging to model. In contrast, interurban traffic data typically exhibits more stable and predictable patterns, making it ideal for evaluating the feasibility of innovative approaches like Mukara.
As shown in Table 1, traffic prediction has traditionally relied on two main approaches: the four-step travel demand model (FSM) and deep learning-based models. The FSM has long served as a foundational framework in transportation planning [11]. It decomposes travel behaviour into trip generation and attraction, trip distribution, mode choice, and traffic assignment. A concise overview of the FSM structure and its sequential components is provided in Appendix S1 (S1 File). In principle, the FSM provides a clear mapping from zonal socioeconomic inputs to an origin-destination (OD) matrix and then to link-level flows. Its modular structure and behavioural grounding have made it a cornerstone of planning practice for decades. In practice, however, FSM implementations often rely on restrictive functional forms and oversimplified assumptions. The sequential structure of the FSM also treats its steps as largely independent, even though destination choice, mode choice, and route choice are closely interrelated in reality [12]. As a result, FSM-based workflows typically require repeated calibration of the OD matrix and assignment parameters, which is labour-intensive and increasingly difficult at large spatial scales or under novel planning scenarios [13]. A case study in Istanbul found substantial discrepancies between the daily traffic volumes predicted by the FSM and those actually observed [14].
On the other hand, deep learning models have emerged as powerful alternatives due to their ability to model complex spatial-temporal dependencies in traffic data [10,15]. Architectures such as Convolutional Neural Networks (CNNs) [16], Recurrent Neural Networks (RNNs) [17], Long Short-Term Memory networks (LSTMs) [18], and Gated Recurrent Units (GRUs) [19] have significantly improved the modelling of temporal trends, while recent advancements like Transformers [20], Graph Neural Networks (GNNs) [21], and Graph Attention Networks (GATs) [22] further enable spatial reasoning within graph-structured networks. State-of-the-art models such as ST-ResNet [23], DCRNN [24], ConvLSTM [25], PFNet [26], and STGAT [27] incorporate these techniques to produce highly accurate short-term predictions. However, most of these models treat traffic prediction as a time-series forecasting task based on historical sensor readings. As such, their outputs often reflect patterns seen in the past rather than insights into the determinants of traffic dynamics. This is illustrated in recent work such as ST-MetaNet [28,29], where the predictions strongly mirror input trends. While some models incorporate external features such as points of interest (POIs), road types, and event data [23,30], these are typically used as auxiliary inputs rather than as primary determinants.
Motivated by these limitations, recent studies have explored machine learning models as alternatives for individual steps of the FSM. Deep learning–based gravity models have been proposed as predictive alternatives to traditional trip distribution formulations, directly estimating OD flows from zonal attributes [31]. Other work has focused on zone-to-zone travel demand forecasting using data-driven models [32], mode choice prediction using machine learning classifiers [33], or approximating traffic assignment and equilibrium behaviour using graph neural networks [34]. These studies demonstrate that machine learning can improve flexibility and predictive performance for specific components of the demand modelling pipeline. However, most existing approaches focus on approximating individual components of the FSM in isolation, are often difficult to interpret, and rarely produce link-level traffic volumes in a fully integrated and scalable manner.
These observations highlight complementary strengths and limitations across modelling paradigms. FSM offers interpretability and theoretical structure but can be limited in flexibility, scalability, and calibration efficiency. Data-driven models offer the potential to improve prediction accuracy by uncovering non-linear relationships and leveraging diverse data sources, but they are often not designed for long-term planning applications and rarely target full segment-level demand estimation. There remains a need for a modelling framework that preserves the strategic planning orientation of the FSM while leveraging modern machine learning to directly estimate segment-level traffic volumes.
In this study, we propose Mukara, a deep learning framework that approximates the aggregate relationship between external demand-related and network features and observed highway traffic volumes. The objectives are threefold: (1) to develop a data-driven model that directly estimates national-scale highway traffic volumes from socioeconomic characteristics and road network structure; (2) to evaluate its predictive performance and spatial generalisability by benchmarking it against baseline models; and (3) to examine the contribution of different feature groups to prediction accuracy through ablation analysis. We name our model “Mukara”, derived from the Japanese term meaning “from nothing”, reflecting its objective of predicting traffic without relying on historical sensor readings.
Materials and methods
Overview
We now introduce the workflow of this study. First, we construct a highway trunk road network, where the graph structure encodes connectivity information relevant to trip distribution across OD pairs. Each road segment is enriched with attributes such as driving distance and driving duration, which correspond to the generalised travel cost inputs typically used in the assignment step of FSM. Next, we integrate rasterised geographic datasets including population density, employment statistics, land use areas, the number of various types of POIs, and aggregated measures of local road infrastructure. These inputs serve to approximate key elements of trip generation and mode choice, capturing the spatial and socioeconomic determinants of travel demand that are traditionally modelled using regression and discrete choice models within FSM [11]. They have also been shown to be fundamental determinants of travel demand across the broader transport and urban economics literature, consistently shaping trip frequencies, spatial interaction patterns, and network flows [35–38]. Ground truth traffic volume data is aligned with corresponding road segments to enable supervised learning.
Mukara is trained using sensor-labelled road segments in the training set and evaluated on spatially distinct test segments to assess spatial generalisability. By structuring inputs around the core components of FSM and learning end-to-end mappings to observed traffic volumes, Mukara provides a data-driven predictive approximation of the overall demand-to-flow process without explicitly modelling intermediate behavioural stages. Fig 1 illustrates the complete methodological pipeline. Table 2 is a summary of all symbols used in this study. Table 3 provides a summary of all data sources, including the source agency, time coverage, spatial resolution, and intended usage within the study.
Data
Highway network.
To study interurban traffic dynamics, a highway network graph covering the entire region of England was constructed based on the National Highways Strategic Road Network [39]. This graph serves as the backbone for information propagation in Mukara. The network consists of 181 nodes ($\mathcal{V}$), representing highway trunk road junctions, typically located near major cities, and 498 edges ($\mathcal{E}$), corresponding to 249 highway trunk road segments in both directions. Each edge is assigned a sensor from the National Highway Traffic Information System (TRIS) [40], resulting in a total of 498 sensors. Nodes and edges were selected to capture traffic flow along trunk roads connecting major cities and towns, while avoiding bypasses and highway exits. Details of the sensor selection procedure and the representativeness evaluation of the selected sensors are included in Appendix S2-S3 (S1 File).
Features on highway trunk road segments in both directions were collected using the Google Routes Application Programming Interface (API). The extracted edge features include driving duration, driving distance, straight-line distance, average driving speed, and detour factor. Together, these edge features, $F$, are structured into a tensor with dimensions $N_e \times d_e$, where $N_e$ is the total number of edges and $d_e$ is the number of features. All edge features are normalised before being input into the model. Fig 2 provides a visual summary of the distributions of these edge features.
The five edge features are driving distance, driving duration, straight-line distance, average driving speed, and detour factor. These metrics were normalised for input into the model.
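For illustration, the derived edge features and their normalisation might be computed as follows. This is a hypothetical helper (not part of the released scripts), assuming the detour factor is driving distance divided by straight-line distance and that features are z-score normalised:

```python
import numpy as np

def build_edge_features(driving_km, duration_h, straight_km):
    """Assemble the five edge features described above and z-score
    normalise each column. A sketch only; the actual feature pipeline
    in the Mukara repository may differ."""
    driving_km = np.asarray(driving_km, dtype=float)
    duration_h = np.asarray(duration_h, dtype=float)
    straight_km = np.asarray(straight_km, dtype=float)
    speed = driving_km / duration_h    # average driving speed (km/h)
    detour = driving_km / straight_km  # detour factor (>= 1 in practice)
    F = np.stack([driving_km, duration_h, straight_km, speed, detour], axis=1)
    # z-score normalisation per feature column (epsilon avoids divide-by-zero)
    return (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-8)
```

The resulting array has one row per edge and one column per feature, matching the $N_e \times d_e$ layout described above.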
Population and employment.
Population data was sourced from the Population Estimates – Small Area dataset [41], provided by the Office for National Statistics (ONS) through the National Online Manpower Information System (NOMIS) service. This dataset provides annual population estimates for England and Wales at the Lower Layer Super Output Area (LSOA) level, stratified by age group and sex. Employment data was obtained from the Business Register and Employment Survey (BRES) [42], also provided by ONS through NOMIS. This dataset includes employment counts, covering full-time, part-time, and self-employed workers across all industries within England and Wales.
For both datasets, data from the years 2015–2022 were selected. The LSOA-based data was rasterised into a 1 km x 1 km grid using LSOA boundaries provided by ONS [43,44]. This process resulted in a grid tensor $P$ with dimensions $T \times H \times W \times C_{pe}$, where $T$ is the number of years, $H$ and $W$ are the height and width of the grid, and $C_{pe}$ represents the number of grid channels for population and employment.
To account for differences in travel behaviour across demographic groups, separate channels were created for population and employment strata. The input data were stratified accordingly, and Table 4 summarises the resulting input channels. In each training task, all channels or a subset of these channels were selected to analyse the model’s performance under different levels of stratification of the input data. Fig 3 visualises the aggregated population and employment data in 2022 as heat maps.
Higher intensity indicates areas with larger population and employment density, based on LSOA-level data rasterised to 1 km x 1 km grid.
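The LSOA-to-grid rasterisation can be sketched as follows, assuming the zone-cell overlap fractions have already been computed by intersecting LSOA boundaries with the 1 km grid (in practice this would be done with a GIS library such as geopandas); `rasterise_zones` is a hypothetical helper, not the repository's code:

```python
import numpy as np

def rasterise_zones(zone_values, overlaps, grid_shape):
    """Spread zone totals (e.g. LSOA population) onto a raster grid.
    `overlaps[z]` is a list of (row, col, fraction) triples giving the
    fraction of zone z's area falling in each grid cell."""
    grid = np.zeros(grid_shape)
    for z, total in enumerate(zone_values):
        for r, c, frac in overlaps[z]:
            grid[r, c] += total * frac  # area-weighted allocation
    return grid
```

For example, a zone of 1,000 people split 70/30 across two cells contributes 700 and 300 to those cells while preserving the zone total.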
Land use, road network, and POI.
Land use, road network, and POI data were sourced from OpenStreetMap (OSM) and downloaded via Geofabrik’s free download server [45]. The data were extracted from a historical snapshot of the England subregions and Wales .osm.pbf files with the timestamp 2023-01-01. This static snapshot was chosen for all years from 2015 to 2022 to ensure consistency and completeness, as land use and POI data are relatively stable over time. To assess robustness to OSM data vintage, we conducted an additional sensitivity analysis using an alternative historical OSM snapshot (timestamp: 2019-01-01), selected to be temporally closer to the early portion of the study period. Results of this analysis are reported in Appendix S4 (S1 File).
The tags used to extract the data are summarised in Table 5. These include land use classifications (e.g., residential, industrial), road network hierarchies from high-level motorways to low-level residential roads, and a diverse set of POI categories such as transport, food, health, education, and retail facilities. The selection of these tags was based on the Deep Gravity model [31] and their availability in the OSM database.
For each grid cell, the following metrics were calculated: total area of each land use type, total length of roads for each hierarchical level, and total number of POIs for each category. These metrics were then aggregated into a grid tensor, $S$, with dimensions $H \times W \times C_s$, where $H$ and $W$ represent the height and width of the grid, and $C_s$ is the number of grid channels corresponding to the land use, road network, and POI categories. Fig 4 visualises these features as heat maps. To construct the final grid features $X$, the tensor $S$ was broadcast along the time axis, resulting in a tensor of dimensions $T \times H \times W \times C_s$, which was then concatenated with the population and employment tensor $P$ along the channel dimension:

$X = \mathrm{concat}(P, S) \in \mathbb{R}^{T \times H \times W \times C}, \qquad C = C_{pe} + C_s.$
The features were aggregated to a 1 km x 1 km grid.
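The broadcast-and-concatenate construction of the final grid tensor can be illustrated in numpy with toy dimensions (all variable names here are illustrative):

```python
import numpy as np

# Toy dimensions; the real grid covering England and Wales is far larger.
T, H, W = 8, 4, 5          # years, grid height, grid width
C_pe, C_s = 2, 3           # population/employment vs static OSM channels

P = np.random.rand(T, H, W, C_pe)   # yearly population/employment grids
S = np.random.rand(H, W, C_s)       # static land-use / road / POI grid

S_t = np.broadcast_to(S, (T, H, W, C_s))  # repeat the static snapshot over time
X = np.concatenate([P, S_t], axis=-1)     # final grid: (T, H, W, C_pe + C_s)
```

Because the OSM snapshot is static, the last `C_s` channels of `X` are identical across all years, while the population and employment channels vary year by year.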
Traffic volume.
The ground truth traffic volume data were sourced from the Traffic Information System (TRIS), managed by National Highways [47]. TRIS provides comprehensive data on traffic speed and volume, collected in 15-minute intervals using loop sensors. Across England, 19,364 sensors are integrated into this network, which has been operational since 2014. The data used in this study were accessed and downloaded through the API provided by TRIS [40].
To align the traffic volume data with the input features, we used 8 years of traffic volume records from 1 January 2015 to 31 December 2022, for each of the 498 sensors included in the established highway network. The mean weekday daily traffic volume was calculated for each year and each sensor to generate the ground truth tensor $Y$, with dimensions $T \times N_e$. Weekends and bank holidays are excluded to focus on regular traffic patterns. A sensitivity analysis of including weekends and holidays is provided in Appendix S5 (S1 File). This mean-based aggregation serves to smooth out daily variability and reduce noise caused by anomalous or irregular traffic days. The resulting average reflects the typical structural demand for highway usage under normal conditions, aligning with the planning-level nature of our study. Fig 5 shows a histogram of the average traffic volumes across the 498 sensors over the 8-year period, illustrating the variability in traffic levels. Fig 6 provides a spatial visualisation of these volumes, averaged over both directions.
The histogram shows traffic volumes across the 498 sensors, calculated over 8 years (2015–2022).
The figure shows traffic levels across the 498 sensors, averaged over 8 years (2015–2022) and aggregated for both directions of the same highway segment.
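The mean weekday aggregation can be sketched as follows for a single sensor; `mean_weekday_volume` is a hypothetical helper operating on daily totals rather than the raw 15-minute TRIS records:

```python
from datetime import date, timedelta

def mean_weekday_volume(daily, holidays=frozenset()):
    """Average daily volumes over weekdays only, excluding bank holidays.
    `daily` maps a date to that day's total volume for one sensor."""
    vols = [v for d, v in daily.items()
            if d.weekday() < 5 and d not in holidays]  # Mon-Fri, no holidays
    return sum(vols) / len(vols)

# one week: Mon-Fri at 10,000 vehicles/day, Sat/Sun at 4,000
week = {date(2022, 1, 3) + timedelta(days=i): (10000 if i < 5 else 4000)
        for i in range(7)}
```

Here `mean_weekday_volume(week)` returns 10,000, since the two weekend days are excluded from the average.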
Mukara model
The goal of the Mukara model is to predict weekday daily traffic volumes for all highway segments in a given year, using only external features. The model leverages three primary inputs: the graph structure of the highway network, edge-level characteristics, and grid-based contextual features for the corresponding year. The prediction task for year $t$ is formally defined as:

$\hat{Y}_t = \mathrm{Mukara}\left(\mathcal{G}, F, X_t\right)$

where $\hat{Y}_t$ denotes the set of predicted traffic volumes for all edges in year $t$; $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ represents the highway network graph, where $\mathcal{V}$ is the set of nodes and $\mathcal{E}$ is the set of directed edges; $F = \{f_e : e \in \mathcal{E}\}$ is the set of raw feature vectors for each edge $e$; and $X_t$ is the rasterised grid-based input tensor for year $t$. For notational simplicity, we omit the time subscript $t$ in the remainder of this section where the context is clear.
To efficiently pass information from inputs to outputs, the Mukara model is composed of two main building blocks: a CNN block for processing spatial grid features, and a GAT block for capturing topological and relational information from the network. An overview of these blocks is provided in Figs 7 and 8.
The block processes grid features ($X_t$) by extracting regions of interest (ROIs) around each node, applying convolutional and pooling layers, and generating node embeddings ($h_v^{(0)}$) for subsequent graph attention processing.
Initial node embeddings ($h_v^{(0)}$) are refined through multiple GAT layers, incorporating edge embeddings ($z_e$). The final embeddings are concatenated and passed through an MLP to predict traffic volumes for edges. Solid and dashed lines represent training and test edges, respectively.
The CNN block is responsible for processing the grid features to generate node-specific embeddings by extracting and encoding information from a fixed-size Region of Interest (ROI) centred around each node. CNNs are particularly suitable for this task, as they not only flatten the grid into a usable vector but also extract rich spatial patterns (e.g., density gradients, clustering effects) that are often lost in simple aggregations. This enables a more nuanced representation of local environmental context. For each node $v \in \mathcal{V}$, its geographic coordinates are used to locate the corresponding centre pixel in $X_t$. A square ROI of fixed size that is aligned with the pixel grid and centred at this location is extracted to represent the spatial context around the node. If the ROI extends beyond the boundaries of the grid, zero-padding is applied to maintain consistent input dimensions. The ROI size, defined in pixel units, corresponds to a real-world spatial area (e.g., 25 km × 25 km) and is treated as a tunable hyperparameter. The extracted ROIs are also three-dimensional tensors, with the first two dimensions representing the spatial extent (height and width), and the third dimension representing the number of feature channels in $X_t$.
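The ROI extraction with zero-padding can be sketched as follows (a minimal illustration, not the repository implementation):

```python
import numpy as np

def extract_roi(grid, row, col, size):
    """Cut a size x size window centred at (row, col) from an (H, W, C)
    grid, zero-padding wherever the window falls outside the grid."""
    h, w, c = grid.shape
    half = size // 2
    roi = np.zeros((size, size, c), dtype=grid.dtype)
    # clip the source window to the grid bounds
    r0, r1 = max(0, row - half), min(h, row + half + 1)
    c0, c1 = max(0, col - half), min(w, col + half + 1)
    # offsets into the destination window (non-zero only near the border)
    rd, cd = r0 - (row - half), c0 - (col - half)
    roi[rd:rd + (r1 - r0), cd:cd + (c1 - c0)] = grid[r0:r1, c0:c1]
    return roi
```

A node near the grid boundary thus receives a window of the same fixed size, with the out-of-bounds portion filled with zeros, as described above.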
Each ROI is passed through a series of convolutional layers with depth $L_{\mathrm{CNN}}$, followed by Rectified Linear Unit (ReLU) activations and max-pooling operations. The final convolutional output is flattened into a one-dimensional vector, producing the initial node embedding $h_v^{(0)}$. The entire CNN transformation can be expressed as:

$h_v^{(0)} = \mathrm{Flatten}\left(\left(\mathrm{Conv}_{L_{\mathrm{CNN}}} \circ \cdots \circ \mathrm{Conv}_1\right)\left(\mathrm{ROI}_v\right)\right)$

where $h_v^{(0)}$ is the initial embedding of node $v$ produced by the CNN block; $L_{\mathrm{CNN}}$ is the number of convolutional layers; $\circ$ denotes function composition; and $\mathrm{ROI}_v$ is the region of interest centred at node $v$ extracted from the grid input $X_t$.
This initial node embedding captures the spatial and socioeconomic context surrounding each node, which is related to the trip generation process, and serves as the starting point for graph-based reasoning.
Following the CNN block, the GAT block refines the node embeddings by integrating information from neighbouring nodes and the attributes of the connecting edges. Each edge embedding is generated by applying a Multi-Layer Perceptron (MLP) to the raw edge feature vector $f_e$:

$z_e = \mathrm{MLP}_{\mathrm{edge}}\left(f_e\right)$

where $f_e$ is the raw feature vector associated with edge $e$, and $\mathrm{MLP}_{\mathrm{edge}}$ denotes the shared edge embedding layer applied across all edges. The output of this layer is the edge embedding $z_e$.
The GAT block operates over multiple layers, each performing an attention-based message passing step. At each layer $l$, the node embeddings are updated by attending to their neighbours and the features of the connecting edges, based on the graph structure:

$h_v^{(l)} = \mathrm{GAT}^{(l)}\left(h_v^{(l-1)}, \left\{\left(h_u^{(l-1)}, z_{uv}\right) : u \in \mathcal{N}(v)\right\}; \mathcal{G}\right)$

where $h_v^{(l)}$ is the embedding of node $v$ at layer $l$; $h_v^{(l-1)}$ is the embedding from the previous layer; $z_{uv}$ is the embedding of the edge connecting $v$ and its neighbour $u$; and $\mathcal{G}$ is the input graph. The function $\mathrm{GAT}^{(l)}$ represents the graph attention operation of the $l$-th GAT layer.
The attention mechanism computes unnormalised scores that quantify the importance of node $u$ to node $v$, based on their respective embeddings and the edge connecting them:

$s_{vu} = \mathrm{LeakyReLU}\left(a^{\top}\left[W h_v^{(l-1)} \,\Vert\, W h_u^{(l-1)} \,\Vert\, W_e z_{uv}\right]\right)$

where $W$ and $W_e$ are learnable linear projection matrices applied to node and edge embeddings, respectively; $a$ is a learnable attention vector; and $\Vert$ denotes vector concatenation. To explicitly incorporate edge information, the edge embedding $z_{uv}$ is included in the attention mechanism. This allows the model to modulate attention weights not only based on node content but also on attributes of the edge (e.g., distance, speed, or detour factor), which are highly relevant for traffic flow. As a result, the edge-aware attention improves the ability of the model to capture meaningful spatial interactions in the highway network. For clarity, we omit the attention head index $k$ in the notation for $W$ and $a$, although in practice, separate sets of attention parameters are learned for each of the $K$ attention heads at each layer.
Within each layer, the attention coefficients $\alpha_{vu}$ are computed by normalising the attention scores $s_{vu}$ across the neighbouring nodes $u \in \mathcal{N}(v)$ of node $v$:

$\alpha_{vu} = \dfrac{\exp\left(s_{vu}\right)}{\sum_{u' \in \mathcal{N}(v)} \exp\left(s_{vu'}\right)}$
Using these attention coefficients, each node updates its embedding for attention head $k$ by aggregating the transformed embeddings of its neighbours:

$h_v^{(l,k)} = \sigma\left(\sum_{u \in \mathcal{N}(v)} \alpha_{vu}^{(k)} W^{(k)} h_u^{(l-1)}\right)$

where $\sigma$ denotes a non-linear activation function such as ReLU, and $k$ indexes different attention heads.
The outputs from all $K$ attention heads are concatenated and passed through an MLP to produce the updated embedding for node $v$ at layer $l$:

$h_v^{(l)} = \mathrm{MLP}\left(\big\Vert_{k=1}^{K}\, h_v^{(l,k)}\right)$
The GAT block sequentially updates node embeddings by calculating attention scores, normalising these scores, and aggregating neighbour information across $L_{\mathrm{GAT}}$ layers. By the end of the process, the node embeddings encapsulate both local context and wider network information.
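The edge-aware attention update described above can be illustrated with a single-head numpy sketch (the actual model uses DGL's batched GAT layers; all names here are illustrative):

```python
import numpy as np

def edge_aware_attention(h, z, neighbours, W, We, a):
    """One single-head, edge-aware GAT update: score each neighbour from
    the concatenated projected node and edge embeddings, softmax over the
    neighbourhood, then aggregate. Didactic sketch, not the model code."""
    h_new = np.zeros((h.shape[0], W.shape[0]))
    for v, nbrs in neighbours.items():
        scores = []
        for u, e in nbrs:
            cat = np.concatenate([W @ h[v], W @ h[u], We @ z[e]])
            s = a @ cat                            # unnormalised score
            scores.append(np.maximum(0.2 * s, s))  # LeakyReLU
        alpha = np.exp(scores) / np.sum(np.exp(scores))  # softmax over neighbours
        for (u, e), w_vu in zip(nbrs, alpha):
            h_new[v] += w_vu * (W @ h[u])          # weighted aggregation
    return np.maximum(h_new, 0.0)                  # ReLU activation
```

With a single neighbour the softmax weight is 1, so the node simply receives its neighbour's projected embedding; with several neighbours, edges with higher scores (driven in part by the edge embedding) contribute more.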
Finally, traffic volume predictions for each edge $e$ are obtained by concatenating the final embeddings of the origin node ($h_{o(e)}^{(L)}$), the destination node ($h_{d(e)}^{(L)}$), and the edge embedding ($z_e$), and passing the result through a prediction MLP:

$\hat{y}_e = \mathrm{MLP}_{\mathrm{pred}}\left(\left[h_{o(e)}^{(L)} \,\Vert\, h_{d(e)}^{(L)} \,\Vert\, z_e\right]\right)$
This structured design enables Mukara to effectively capture spatial, relational, and feature-based dependencies, leading to accurate predictions of traffic volumes across the highway network.
Model training and experimental settings
Loss function and evaluation metrics.
The Mukara model is trained using the mean of the Geoffrey E. Havers (GEH) statistic [48]—hereafter referred to as MGEH—a metric widely used to evaluate the goodness-of-fit of traffic models. The GEH statistic accounts for both the absolute difference and the percentage difference between the modelled and observed flows, making it particularly suitable for traffic volume prediction tasks. Unlike the commonly used Mean Squared Error (MSE), GEH emphasises proportionality, allowing errors to be evaluated relative to the magnitude of the observed volumes. A recent study has shown that the GEH loss function is consistent and outperforms Mean Absolute Error (MAE) and MSE in most cases [49].
The GEH statistic for an individual edge $e$ is defined as:

$\mathrm{GEH}_e = \sqrt{\dfrac{2\left(\hat{y}_e - y_e\right)^2}{\hat{y}_e + y_e}}$

where $y_e$ is the observed traffic volume and $\hat{y}_e$ is the predicted traffic volume for edge $e$. The time subscript $t$ in this equation and the equations for MAE and MSE is omitted for simplicity.
For evaluation, in addition to MGEH, we report the Mean Absolute Error (MAE) and the coefficient of determination $R^2$. MAE, MSE, and $R^2$ are defined as follows:

$\mathrm{MAE} = \dfrac{1}{N_e}\sum_{e}\left|\hat{y}_e - y_e\right|, \qquad \mathrm{MSE} = \dfrac{1}{N_e}\sum_{e}\left(\hat{y}_e - y_e\right)^2, \qquad R^2 = 1 - \dfrac{\sum_{e}\left(\hat{y}_e - y_e\right)^2}{\sum_{e}\left(y_e - \bar{y}\right)^2}$

where $\bar{y}$ denotes the mean observed traffic volume across all evaluated edges and $N_e$ is the number of evaluated edges.
The MGEH loss function used for training is the mean of the GEH statistics across all edges:

$\mathrm{MGEH} = \dfrac{1}{N_e}\sum_{e} \mathrm{GEH}_e$
To assess the robustness of this choice of objective function, we conducted a loss-function sensitivity analysis (Appendix S6, S1 File), in which Mukara was re-trained using alternative objectives (MSE, MAE, and Huber loss) under the same spatially blocked cross-validation protocol. The results indicate that aggregate predictive performance remains broadly consistent across loss specifications.
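The GEH statistic and the MGEH objective are straightforward to implement; the following sketch matches the definitions above:

```python
import numpy as np

def geh(pred, obs):
    """Per-edge GEH statistic: sqrt(2 (M - C)^2 / (M + C)),
    with M the modelled and C the observed flow."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return np.sqrt(2.0 * (pred - obs) ** 2 / (pred + obs))

def mgeh(pred, obs):
    """Mean GEH across all evaluated edges (the training objective)."""
    return float(geh(pred, obs).mean())
```

A perfect prediction yields MGEH = 0, and because the denominator scales with flow magnitude, the same absolute error is penalised less on busier segments, which is the proportionality property noted above.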
Training algorithm.
Model training and evaluation were conducted using both random cross-validation (CV) and spatial CV. For CV, a five-fold scheme was adopted. In each iteration, one fold was held out as the test set, while the remaining four folds were used for model training. This process was repeated five times so that each subset served as the test set once. For spatial CV, the 498 highway trunk-road segments were grouped according to the nine official regions of England. A nine-fold spatial CV procedure was then implemented, in which all segments within one region were held out as the test set in each fold, while the remaining eight regions constituted the training set. Ground truths for segments in the test region were used exclusively for evaluation and were not accessed during model training. This evaluation design is consistent with the study’s objective of assessing the feasibility of predicting traffic volumes for geographically unobserved highway segments, thereby providing a stringent test of spatial transferability.
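The region-based spatial CV split can be sketched as follows; `spatial_cv_folds` is a hypothetical helper that holds out all segments of one region per fold:

```python
def spatial_cv_folds(segment_regions):
    """Yield (train_idx, test_idx) pairs in which each fold holds out
    every segment belonging to one region, mirroring the nine-region
    spatial CV described above. `segment_regions` lists one region
    label per road segment."""
    for region in sorted(set(segment_regions)):
        test = [i for i, r in enumerate(segment_regions) if r == region]
        train = [i for i, r in enumerate(segment_regions) if r != region]
        yield train, test

# toy example with three regions instead of nine
regions = ["NW", "NW", "SE", "SE", "London"]
folds = list(spatial_cv_folds(regions))
```

Each fold's train and test sets are disjoint by construction, so the held-out region's ground truths are never seen during training.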
As shown in Algorithm 1, the training process involves iteratively selecting one year of grid features from the training data, performing a forward pass to make predictions, calculating the loss to measure prediction errors, and conducting a backward pass to compute gradients. The parameters are updated after each batch, and this cycle is repeated for a predefined number of epochs. While using an entire year of training samples as a single batch is computationally demanding, it ensures that the model learns from all sensors simultaneously. Experiments showed that this approach achieves lower loss compared to splitting the samples into smaller batches.
Algorithm 1 Training algorithm for the Mukara model
Input: Highway network graph $\mathcal{G}$, edge features $F$, grid features $X$.
Output: Predicted traffic volumes $\hat{Y}$.
1: Split sensors into five folds, and select one fold as the test set.
2: Initialise model parameters $\theta$.
3: for epoch = 1 to $N_{\mathrm{epochs}}$ do
4:  for each year $t$ in training data do
5:   Extract grid features $X_t$ from $X$.
6:   Extract ground truth traffic volumes $Y_t$ from $Y$ for all sensors.
7:   Perform a forward pass through the Mukara model: $\hat{Y}_t = \mathrm{Mukara}(\mathcal{G}, F, X_t)$
8:   Compute the training loss over the training sensors: $\mathcal{L} = \mathrm{MGEH}(\hat{Y}_t, Y_t)$
9:   Compute gradients: $\nabla_{\theta} \mathcal{L}$.
10:   Update parameters using gradient descent with learning rate $\eta$: $\theta \leftarrow \theta - \eta \nabla_{\theta} \mathcal{L}$
11:  end for
12: end for
13: Return: Trained Mukara model and predicted traffic volumes $\hat{Y}$.
Experimental settings.
For the default grid features, we use aggregated population, aggregated employment, all land use, road network, and POI features, resulting in a total of $C$ channels for the grid tensor. The default model hyperparameters are as follows: The ROI size is set to 25, corresponding to 25 km, which covers typical spatial extents of small to medium-sized UK cities and aligns with observed urban activity ranges such as commuting distances and economic catchment areas. This choice ensures that the model captures sufficient spatial context without introducing excessive noise from distant, unrelated regions. The CNN block consists of $L_{\mathrm{CNN}} = 3$ layers with channel sizes of 16, 32, and 64 for each layer. The kernel size is set to 3, strides are set to 1, and max pooling is applied with a pool size of 2 and strides of 2, effectively reducing spatial dimensions while preserving relevant feature patterns. The output dense layer of the CNN block, which also serves as the node embedding size in the GAT block, is set to 16 to balance representational capacity and computational efficiency. The GAT block is composed of $L_{\mathrm{GAT}} = 5$ layers, each employing 3 attention heads to capture diverse relational patterns among neighbouring nodes and edges. All MLPs used in the model have a hidden size of 16 with ReLU as the activation function and an output size of 16. Each batch corresponds to one year of data; therefore, there are 8 batches in one epoch.
Training is performed using the Adam optimiser with a learning rate of 0.001 and gradient clipping at 5 to ensure stability. The experiments were conducted on a system equipped with an Intel i7 CPU, 16 GB of RAM, and a single NVIDIA RTX 4060 Ti GPU. The software environment included Windows 11 as the operating system, Python 3.9.18, TensorFlow 2.10.1, and Deep Graph Library (DGL) version 1.1.2 with CUDA 11.8 support.
Following best practices in empirical forecasting and applied machine learning [50–53], we introduce three commensurate benchmark models evaluated under identical CV protocols and performance metrics: (1) Ridge regression (L2 regularised linear regression) using the same segment-level feature set as Mukara; (2) a gravity-interaction baseline, a classical distance-decay formulation based on aggregated population and employment “masses” linked to observed traffic volumes via log-linear regression; (3) Random forest regressor, a non-linear ensemble model trained using the identical feature set. All baseline models are trained and evaluated under both random CV and spatial CV. Hyperparameters are tuned strictly within training folds to prevent leakage. Full methodological details for these baselines are provided in Appendix S7 (S1 File).
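As an illustration of the gravity-interaction baseline, a log-linear distance-decay model can be fitted by ordinary least squares as follows (a sketch under simplified assumptions; the exact baseline specification used in this study is described in Appendix S7, and the helper names are hypothetical):

```python
import numpy as np

def fit_gravity(mass_o, mass_d, dist, volume):
    """Fit log(volume) = b0 + b1*log(M_o) + b2*log(M_d) + b3*log(dist)
    by least squares -- the classical distance-decay gravity form, with
    origin/destination "masses" such as population or employment."""
    X = np.column_stack([np.ones(len(dist)), np.log(mass_o),
                         np.log(mass_d), np.log(dist)])
    beta, *_ = np.linalg.lstsq(X, np.log(volume), rcond=None)
    return beta

def predict_gravity(beta, mass_o, mass_d, dist):
    """Predicted volumes from fitted gravity coefficients."""
    X = np.column_stack([np.ones(len(dist)), np.log(mass_o),
                         np.log(mass_d), np.log(dist)])
    return np.exp(X @ beta)
```

On data generated exactly from this functional form, the least-squares fit recovers the coefficients; on real segment data it serves only as a simple interpretable benchmark.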
Results
Ablation study and tuning
In the first experiment, we conducted a grid search to identify optimal settings for the Mukara model. As shown in Table 6, the tuned hyperparameters included the number of channels in each CNN layer, the ROI size, the depth of the GAT block $L_{\mathrm{GAT}}$, the number of attention heads, and the dimensions of the node embeddings. Each model was trained for a maximum of 50 epochs, and the lowest MGEH and MAE losses were recorded.
The learning curve for the default model is shown in Fig 9. The curve demonstrates that the model learns effectively, with the lowest loss occurring around the 27th epoch. After this point, the model begins to overfit, as indicated by a gradual increase in test loss. Other models also tend to reach their best performance around this point, suggesting that 50 epochs are sufficient for the learning task. An early stopping mechanism was also examined, with a patience of 10 epochs and a learning rate decay schedule starting from 0.01 and decaying by a factor of 10 down to 0.00001. The best performance achieved under this new setting matches the peak performance without the mechanism.
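The early-stopping rule examined above (stop once the validation loss has not improved for 10 consecutive epochs) can be written as a small helper; this is a generic sketch rather than the project's actual training loop.

```python
def early_stop_epoch(losses, patience=10):
    """Return (stop_epoch, best_epoch): the first epoch whose loss has
    not improved on the running best for `patience` consecutive epochs,
    or the last epoch if that never happens."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(losses) - 1, best_epoch
```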
The results of the hyperparameter tuning are presented in Fig 10. The optimal settings were found to be CNN channels of [16, 32, 64], an ROI size of 21 km × 21 km, 4 attention heads, a GAT depth (LGAT) of 5, and a node embedding size of 16. Based on these findings, the default model was updated to include 4 attention heads while retaining the other hyperparameter settings.
Several observations can be drawn from these experiments. First, the simplest model, which relies solely on edge features for prediction and does not use grid features or node embeddings, results in high loss. This finding emphasises that geographic and contextual information captured in the node embeddings is essential, as edge embeddings alone are insufficient for accurate predictions. A slightly more complex model that incorporates OD node embeddings in addition to edge features, but excludes GAT layers (LGAT = 0), also yields high loss. This demonstrates that adding only origin and destination embeddings to the edge representation is not sufficient for effective predictions. Models with a GAT depth of 5 or 6 achieved the lowest losses, suggesting that incorporating information from nodes up to 5 or 6 degrees away significantly enhances the model’s predictive capability. However, increasing the depth beyond this point led to overfitting.
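The link between GAT depth and predictive capability follows from the receptive field of message passing: after k layers, each node has aggregated information from nodes up to k hops away. The toy breadth-first search below illustrates this growth on a 7-node chain graph (an assumption for illustration, not the UK road network).

```python
from collections import deque

def k_hop_neighbourhood(adj, start, k):
    """Nodes reachable from `start` within k hops (including start),
    mimicking the receptive field of k rounds of message passing."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nb in adj.get(node, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen

# Toy chain graph 0-1-2-3-4-5-6
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 7] for i in range(7)}
```

On this chain, two message-passing rounds from node 0 only cover nodes 0–2, whereas five rounds cover the whole graph, mirroring how deeper GAT stacks widen the context each edge prediction can draw on.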
The inclusion of multiple attention heads also improved performance, highlighting the benefit of passing multiple channels of information through the network. This effect is analogous to increasing the number of feature maps in CNNs, enhancing the model’s ability to capture diverse patterns and relationships.
Finally, the dimensions of the CNN channels and node embeddings were most effective when balanced. Channels and embeddings that were too small resulted in underfitting, as the model failed to capture sufficient information. Conversely, excessively large dimensions led to overfitting, where the model struggled to generalise due to capturing irrelevant or noisy features.
Performance evaluation
We evaluated the Mukara model using the optimal configuration identified through hyperparameter tuning. For each cross-validation scheme, the model was retrained within the training folds and evaluated exclusively on held-out data. The configuration achieving the lowest MGEH within the validation procedure was retained. The trained models were then used to generate traffic volume estimates for all eight years across the 498 highway trunk-road sensors. The corresponding results are presented in Figs 11 and 12.
(Left) Scatter plot comparing predicted traffic volumes with ground truth values for all sensor-year points, with GEH boundaries for reference. (Upper right) Histogram of mean GEH for each sensor, averaged over 8 years. (Lower right) Bar plots of MGEH and MAE for sensors grouped by traffic volume quartiles. Results are for the first fold of the cross-validation. Metrics shown are mean and standard deviation across folds.
Positive values (red) indicate overestimation, while negative values (blue) indicate underestimation. The maps reveal localised errors, particularly around areas such as Manchester, but no clear geographical trends overall.
Table 7 summarises comparative performance across models and validation schemes. Under random 5-fold CV, Mukara achieves a mean test MGEH of 50.74 (1.51), a test MAE of 8,989 (236) vehicles per day, and an R2 of 0.583 (0.027). These results substantially outperform all baseline models. The gravity model yields an MGEH of 84.23 (2.47) and an MAE of 14,836 (419), while ridge regression and random forest reduce errors further but remain clearly inferior to Mukara. The global mean predictor performs worst, as expected.
Under spatial CV, overall performance decreases modestly for all models, reflecting the more stringent evaluation setting. Mukara attains a mean MGEH of 57.63 (3.42), an MAE of 9,955 (612), and an R2 of 0.521 (0.072). Importantly, the relative performance ranking remains unchanged, and no systematic degradation is observed across regions. Slightly higher errors are observed in folds corresponding to London and the South West, which likely reflect distinct traffic regimes (extremely high-volume urban segments and lower-volume rural segments, respectively). The consistent advantage of Mukara under spatial CV demonstrates that the model generalises to geographically unseen regions. The performance of ridge regression and random forest under spatial CV is comparable to that of the Mukara variant with GAT depth of 0, indicating that models relying solely on local link-level and endpoint features achieve similar predictive capacity. The additional gains observed in the full Mukara configuration therefore arise from multi-hop message passing and structural context propagation across the road network. This confirms that incorporating non-local relational information provides measurable benefits over purely local models.
Fig 11 illustrates detailed prediction performance. The left panel presents a scatter plot comparing predicted traffic volumes with ground truth values across all sensor–year observations, with GEH reference thresholds overlaid. The upper-right panel shows the distribution of mean GEH values for each sensor averaged over eight years. The lower-right panel reports MGEH and MAE grouped by traffic volume quartiles. Consistent with Table 7, errors are larger for sensors with extremely low or extremely high traffic volumes, suggesting that extreme traffic regimes remain more challenging to model than medium-range volumes.
GEH reference ranges are commonly used as diagnostic guidelines rather than formal acceptance criteria. Because the GEH statistic scales with flow magnitude, higher reference ranges are typically applied when evaluating daily mean traffic volumes compared to hourly counts. Following established practice in large-scale assignment validation [48,54], values below 16 are treated as indicative of close agreement and values between 16 and 32 as reflecting moderate deviation for daily volumes. In the test sets under random CV, 18% of sensors achieve a MGEH below 16, and 49% fall below 32. The empirical distribution of MGEH across sensors further indicates a right-skewed pattern, with the 25th, 50th (median), and 75th percentiles equal to 23.88, 32.10, and 72.51, respectively.
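For reference, the standard GEH statistic for a modelled flow M and an observed count C is GEH = sqrt(2(M − C)² / (M + C)); a direct implementation:

```python
import math

def geh(modelled, counted):
    """GEH statistic: sqrt(2 * (M - C)^2 / (M + C)).
    Scales with flow magnitude, unlike a simple percentage error."""
    return math.sqrt(2 * (modelled - counted) ** 2 / (modelled + counted))
```

Because the statistic scales with flow magnitude, the same 20% error yields a larger GEH at daily-volume scales (e.g. 12,000 vs 10,000) than at hourly scales (1,200 vs 1,000), which is why higher reference ranges are applied when evaluating daily mean volumes.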
Fig 12 presents spatial error maps showing signed MGEH values averaged over eight years. Positive values indicate overestimation and negative values indicate underestimation, with separate panels for northbound and southbound traffic. No strong large-scale geographic bias is observed. Errors appear localised rather than regionally systematic, further supporting the model’s spatial robustness.
In addition, we provide detailed hierarchical aggregation results in Appendix S8 (S1 File), including region-level and national-level observed versus predicted totals under both random and spatial cross-validation. These supplementary tables report absolute and percentage deviations for each region, offering a complementary planning-scale evaluation of aggregation coherence beyond edge-level metrics.
Feature importance
In this section, we explore the relative importance of various input features in the Mukara model. First, we analyse how different levels of stratification in population and employment affect the model’s performance. As detailed in the population and employment subsection, level 1 stratification includes 7 channels for population (2 for sex and 5 for age) and 21 channels for employment (3 for work type and 18 for sector). Level 2 stratification expands to 10 channels for population and 54 channels for employment.
The results are illustrated in Fig 13. When population is the sole grid feature, increasing the level of stratification does not significantly reduce the loss. However, for employment, the introduction of stratified channels leads to a marked decrease in loss, particularly for level 2 stratification. Furthermore, when both population and employment are included, the model achieves its lowest loss values with higher stratification levels, surpassing the performance of either feature alone. This indicates that stratification allows the model to capture nuanced patterns in the grid features and leverage interactions between demographic and employment strata, such as age, sex, part-time/full-time employment, and sectors.
Increased stratification improves model performance for employment and combined features.
Next, we conduct a feature ablation study to evaluate the importance of each feature set. The full model, which uses all features, serves as the baseline. Six additional models are tested, each omitting one of the following features: population, employment, land use, road network, POI, and edge features. The percentage change in MGEH and MAE loss values is calculated relative to the baseline, revealing the importance of each feature. Fig 14 presents the radar plots summarising these changes across sensors grouped by overall performance and traffic volume tertiles (low, medium, and high levels).
The analysis is presented for overall performance and traffic volume tertiles (low, medium, high). Negative changes indicate a reduction in loss, suggesting possible overfitting or redundancy.
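Assuming the radar-plot values are computed in the usual way, the percentage change for each ablated model is simply the relative change in loss against the full-model baseline:

```python
def ablation_change(baseline_loss, ablated_loss):
    """Percentage change in loss when a feature group is removed.
    Positive values mean the feature helped; negative values suggest
    redundancy or overfitting, as discussed in the text."""
    return 100.0 * (ablated_loss - baseline_loss) / baseline_loss
```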
The results show that the removal of any feature generally increases the loss, highlighting their contribution to the model. Notably, land use emerges as the most critical feature, with its removal leading to the largest loss increase across all tertiles. Interestingly, removing employment results in a slight decrease in loss, suggesting possible redundancy or correlation with other features. For sensors with low and medium traffic volumes, employment, land use, POI, and edge features are particularly important, whereas high-traffic sensors exhibit less sensitivity to these features. In fact, for high-volume sensors, the loss reduction upon feature removal suggests potential overfitting or misleading patterns in the training data that fail to generalise to the test set.
These findings underscore the importance of carefully selecting and incorporating features in the Mukara model, as well as the need to account for variations in their relevance across different traffic volume levels. The results also highlight the value of stratifying features to improve the model’s ability to capture complex interactions in the data.
Discussion
This study proposes a methodological shift in traffic volume prediction by modelling weekday daily highway traffic volumes using an end-to-end deep learning framework that relies exclusively on external socioeconomic and spatial inputs obtainable from official statistics and OSM, without using historical traffic series as model inputs. Using the UK strategic road network as a case study, Mukara achieves a mean test MAE of 8,989 vehicles per day against an average daily traffic volume of 33,734.9 vehicles (relative error 26.6%) and a mean test R2 of 0.583 under random 5-fold CV. Under a more stringent nine-fold spatially blocked CV scheme based on England’s official regions, performance remains stable, with an MAE of 9,955 vehicles per day and an R2 of 0.521. Mukara outperforms all baseline models in both CV settings. The modest reduction in accuracy under geographic hold-out suggests that the model generalises effectively to spatially unseen regions. In comparison, a traditional FSM evaluation on an Istanbul case study reported a best-case %RMSE of approximately 100.92% [14]. Related work also reports lower or comparable performance under different settings and data sources: Das and Tsapakis [55] reported a mean R2 of 0.36 when predicting annual average daily traffic on low-volume roads using census and survey data, while Ganji et al. [56] achieved R2 = 0.58 using aerial imagery for urban roads. Narayanan et al. [57] reported higher R2 values in a metropolitan case study, but relied on synthetic traffic data rather than real-world observations, which typically contain greater noise and heterogeneity.
Importantly, Mukara consistently outperforms all commensurate baseline models evaluated under identical data splits and metrics. Under random CV, the gravity-interaction baseline achieves an R2 of 0.342, ridge regression 0.463, and random forest 0.504, all substantially below Mukara’s 0.583. Under spatial cross-validation, performance gaps widen further: the gravity model attains an R2 of 0.201, ridge regression 0.302, and random forest 0.361, compared to Mukara’s 0.521. Similar trends are observed for MAE and MGEH. The baseline models exhibit larger variance and stronger degradation under spatial blocking, indicating greater sensitivity to geographic distribution shifts. These results demonstrate that models relying solely on local link-level and endpoint features—or simple distance-decay formulations—have limited capacity to generalise across regions. By contrast, Mukara’s graph attention architecture with multiple depths captures structural context beyond immediate nodes, enabling information propagation across the network and yielding measurable performance gains under both random and strictly spatial evaluation settings.
From an applied-econometrics perspective [50,51,53], Mukara is framed explicitly as a predictive demand-approximation tool rather than a structural causal estimator. All performance claims are restricted to out-of-sample predictive accuracy under defensible validation protocols, and improvements are demonstrated relative to transparent, commensurate baseline models evaluated under identical spatially blocked cross-validation splits. This benchmarking strategy aligns with forecasting practice as discussed in Barkan et al. [52], where gains must be shown against simple and interpretable baselines while ensuring coherence across aggregation levels. In this study, we therefore evaluate performance at the edge level (primary target), examine regional aggregation under spatial cross-validation, and verify that improvements at the segment level translate into consistent aggregate patterns.
The framework has several practical implications. By substituting hand-specified, rule-based calculations with a data-driven predictive mapping, Mukara captures nonlinear interactions between external determinants and observed traffic volumes within a unified end-to-end modelling framework. At the same time, its input structure is aligned with the FSM tradition, which supports planning use cases where external scenarios (e.g., changes in population, employment, or land use) are available but historical traffic measurements may be sparse or unavailable. Although not tested outside the UK in this study, the design supports prediction on road segments with no prior flow observations, provided that comparable external features and network representations can be constructed. This property is relevant for data-sparse contexts. More generally, models that rely purely on historical data can struggle to anticipate network changes driven by new infrastructure, as illustrated by the Shenzhen–Zhongshan Link: while it alleviated congestion on the Humen and Nansha Bridges, it was associated with severe congestion within Shenzhen due to increased inflows to the city network [58].
Several limitations should be acknowledged. First, while Mukara is conceptually aligned with FSM logic, we did not include external models for direct comparison under identical data and assumptions. No existing deep learning method directly matches the present task setting—predicting highway-level daily volumes using external drivers in an FSM-like input format without historical traffic series—and implementing a traditional FSM on the UK network would require OD estimation and extensive calibration beyond the raw inputs used here. Without such tuning, FSM implementations can perform poorly in practice and would not provide a meaningful benchmark under the same assumptions. For this reason, we focus benchmarking on both statistical baselines and internal neural ablation variants evaluated under identical data splits and metrics, and on comparisons with published results that use different data, study designs, and evaluation settings.
Second, the study is predictive rather than causal. It does not attempt to identify exogenous effects of population, employment, land use, or network characteristics, and it does not resolve econometric identification concerns such as simultaneity, omitted variables, or reverse causality [59–61]. These variables co-evolve with transport systems over long time horizons, and Mukara should be interpreted as predicting realised traffic volumes conditional on observed spatial and socioeconomic configurations. Relatedly, the use of a static OSM snapshot for a multi-year traffic panel introduces temporal misalignment and potential measurement error. Although major road hierarchies and national-scale land-use structures in the UK evolve relatively slowly, this choice remains a pragmatic trade-off, and time-varying land-use and infrastructure datasets would be preferable where available.
Third, several modelling and data choices constrain performance and generalisability. Errors are higher for sensors with very low or very high volumes, which may reflect measurement noise, capacity constraints, junction effects, and the absence of time-varying operational drivers (e.g., incidents, weather, roadworks) that can be influential for extremes. In addition, sensor selection was designed to represent interurban trunk-road conditions with reliable coverage, including prioritising sensors near segment midpoints to reduce local access effects. While this reduces noise, any selection strategy may affect representativeness, and expanding coverage by matching more sensors to segments may improve training signal. Although the effective sample includes repeated edge–year observations, the number of distinct monitored segments remains a constraint when learning generalisable network-wide representations.
Finally, while Mukara supports spatial transfer to unmonitored links and regions within the studied network, external validation in a different geographic region or institutional context was not conducted. Transfer to regions with different modal structures, land-use patterns, road hierarchies, or data quality therefore remains an open empirical question in this paper.
These limitations motivate several directions for future research. A first practical step is to expand training coverage by automatically matching all available sensors to road segments rather than manually selecting a single representative sensor per segment, thereby increasing the number of targets and capturing within-segment variability. A second direction is to diagnose which parts of the FSM the model approximates well and where the main bottlenecks lie. For example, Deep Gravity has shown that OD distribution can be predicted using external determinants [31], suggesting that remaining challenges may be more related to modal split and assignment-like behaviour. Controlled experiments using synthetic or semi-synthetic data could support systematic evaluation of deep learning alternatives for individual FSM steps, or combinations of steps, under known ground truth. A third direction is architectural refinement. CNNs provide a convenient mechanism for aggregating gridded spatial context, but alternative spatial encoders (e.g., Vision Transformers) may better represent multi-scale or long-range spatial structure. Likewise, while GNNs are effective for information propagation, assignment-like behaviour may benefit from architectures that allow more interpretable interactions between embeddings, potentially integrating explicit impedance representations with structured attention or routing-inspired modules.
A broader development agenda includes (a) computational efficiency and suitability for real-time deployment, (b) extension to urban contexts, (c) long-term forecasting, (d) finer temporal resolutions such as daily or hourly prediction, (e) integration of uncertainty estimation through probabilistic modelling, and (f) transferability to entirely new regions with minimal adaptation. The current implementation is computationally lightweight, with memory usage peaking at around 11 GB. Training takes approximately 2 minutes per epoch (or about 15 seconds per yearly time step), and inference over the full network can be completed within about 5 seconds under the current setup. This runtime profile supports practical deployment and becomes increasingly relevant if the model is extended to higher temporal resolutions.
Extending Mukara to urban contexts is theoretically feasible but more challenging due to dense networks, frequent intersections, and stronger interactions with public transport systems. Addressing these complexities may require additional datasets such as public transport supply, signal timing, and richer operational information. For forecasting, the current annual-resolution model can generate scenario-based projections if future external drivers (e.g., population and employment forecasts) are available, but extrapolation far beyond the training range should be interpreted cautiously. Refining the temporal resolution to daily or hourly predictions would require modelling temporal dynamics (seasonal, weekly, diurnal cycles) and incorporating dynamic drivers such as weather, holidays, special events, and road maintenance logs. Adding static modal context (e.g., public transport availability, schedules, and costs) may further improve realism in multimodal settings.
We also tested an uncertainty-aware extension using heteroscedastic regression with a Gaussian negative log-likelihood loss to estimate both mean and variance. This variant produced weaker point prediction performance and less stable convergence, with minimum MAE increasing from roughly 9,000 to around 12,000. These results were therefore not included in the main evaluation. Nonetheless, uncertainty estimation remains important for planning and risk-aware applications, and alternative approaches such as Bayesian neural networks or ensemble methods may provide better-calibrated predictive intervals while preserving point accuracy.
Regarding transferability, Mukara can be applied to road segments with no historical traffic observations because it relies on external determinants rather than lagged traffic states. The attention-based message passing supports generalisation across network structures when comparable features are available. In practice, a lightweight fine-tuning procedure using a small amount of local data may help capture regional differences while retaining the advantages of minimal data requirements. However, full transfer to regions with distinct cultural, infrastructural, or institutional contexts remains to be evaluated.
Finally, issues of welfare analysis, economic efficiency, and market failure identification, while important, fall outside the scope of this study. Mukara is not intended to evaluate optimality or efficiency of observed traffic patterns, but to predict realised traffic volumes conditional on existing spatial and socioeconomic configurations. Extending the framework toward welfare-aware or policy-evaluative applications would be a valuable direction for future research.
Conclusion
This study proposes Mukara, an end-to-end deep learning framework for predicting weekday daily traffic volumes on highway trunk road segments using only external socioeconomic, land-use, and network-related features. Using the UK trunk-road network as a case study, Mukara achieved a mean test MGEH of 50.74 and a mean test R2 of 0.583. These results are comparable to, and in some cases outperform, existing studies conducted under different settings, while addressing a more restrictive prediction task that excludes historical traffic observations. Ablation experiments showed that accurate prediction depends on the joint modelling of spatial context and network structure, with land-use features playing a particularly important role.
Mukara is intended as a predictive, planning-oriented framework. Within this scope, the results demonstrate that an integrated, representation-learning approach can approximate key elements of the demand-to-flow relationship without explicitly modelling individual steps of the four-step framework or requiring extensive calibration.
Future research could extend this framework by expanding sensor-to-segment matching to increase training coverage, incorporating time-varying spatial and network attributes, and improving the treatment of very low and very high traffic volumes. Additional work is also needed to evaluate transferability across different geographic and institutional contexts and to assess the performance of the approach under alternative temporal resolutions or modelling assumptions.
Supporting information
S1 File. Supplementary appendix (PDF).
Contains Appendix S1–S8. Appendix S1 summarises the four-step travel demand model (FSM). Appendix S2 describes the sensor selection procedure, and Appendix S3 evaluates the representativeness of selected sensors. Appendix S4–S6 present robustness and sensitivity analyses, including the OSM vintage robustness test, weekend and holiday inclusion sensitivity, and GEH loss sensitivity analysis. Appendix S7 details the specification of baseline models, and Appendix S8 reports hierarchical aggregation consistency and planning-scale evaluation results.
https://doi.org/10.1371/journal.pone.0345576.s001
Acknowledgments
This research was supported by the Cambridge Commonwealth, European and International Trust. Additional support was provided by the Martin Centre for Architectural and Urban Studies. The authors gratefully acknowledge the Office for National Statistics, OpenStreetMap contributors, and National Highways for providing the data used in this study. The authors also thank colleagues and external experts for valuable discussions and feedback. The views expressed are those of the authors and do not necessarily reflect those of the supporting institutions. All code used in this study is publicly available at https://github.com/yueli901/mukara. All data used in this study are obtained from publicly accessible sources. UK Office for National Statistics data are licensed under the Open Government Licence, and OpenStreetMap data are licensed under the Open Database License (ODbL).
References
- 1. Wang J, Li H, Wang Y, Yang H. A novel assessment and forecasting system for traffic accident economic loss caused by air pollution. Environ Sci Pollut Res Int. 2021;28(35):49042–62. pmid:33928504
- 2. Zhao Z, Zhou D, Wang W, Dai J, Yang R, Hu Q. Research progress on road traffic accident prediction based on big data methods. In: Wang W, Guo H, Jiang X, Shi J, Sun D, editors. Smart transportation and green mobility safety. Singapore: Springer Nature Singapore. 2024. 121–43.
- 3. Mystakidis A, Koukaras P, Tjortjis C. Advances in Traffic Congestion Prediction: An Overview of Emerging Techniques and Methods. Smart Cities. 2025;8(1):25.
- 4. Afrin T, Yodo N. A Survey of Road Traffic Congestion Measures towards a Sustainable and Resilient Transportation System. Sustainability. 2020;12(11):4660.
- 5. Chen S, Li Y, Jin Y. Social and environmental disparities in mental health benefits from active transport in the UK: a causal machine learning analysis. Transportation Research Part A: Policy and Practice. 2026;204:104809.
- 6. UK Department for Transport. Transport Statistics Great Britain 2022: Domestic Travel. 2023. https://www.gov.uk/government/statistics/transport-statistics-great-britain-2023/transport-statistics-great-britain-2022-domestic-travel
- 7. Tikoudis I, Papu Carrone A, Mba Mebiame R, Lamhauge N, Hassett K, Bystrom O. Household transport choices: New empirical evidence and policy implications for sustainable behaviour. Paris: OECD Publishing. 2024.
- 8. Dargay J, Gately D, Sommer M. Vehicle Ownership and Income Growth, Worldwide: 1960-2030. The Energy Journal. 2007;28(4):143–70.
- 9. Boukerche A, Tao Y, Sun P. Artificial intelligence-based vehicular traffic flow prediction methods for supporting intelligent transportation systems. Computer Networks. 2020;182:107484.
- 10. Medina-Salgado B, Sánchez-DelaCruz E, Pozos-Parra P, Sierra JE. Urban traffic flow prediction techniques: A review. Sustainable Computing: Informatics and Systems. 2022;35:100739.
- 11. Ortúzar JdD, Willumsen LG. Modelling Transport. 5th ed. Chichester, UK: Wiley. 2011. https://www.wiley.com/en-us/Modelling+Transport%2C+5th+Edition-p-9781119242813
- 12. Mladenovic M, Trifunovic A. The Shortcomings of the Conventional Four Step Travel Demand Forecasting Process. Journal of Road and Traffic Engineering. 2014.
- 13. Najmi A, Rashidi TH, Vaughan J, Miller EJ. Calibration of large-scale transport planning models: a structured approach. Transportation. 2019;47(4):1867–905.
- 14. Gachanja JN. Towards integrated land use and transport modelling: evaluating accuracy of the four step transport model - the case of Istanbul, Turkey. 2010. http://essay.utwente.nl/90760/
- 15. Jiang W, Luo J. Graph neural network for traffic forecasting: A survey. Expert Systems with Applications. 2022;207:117921.
- 16. Lecun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86(11):2278–324.
- 17. Hochreiter S, Schmidhuber J. Long Short-Term Memory. Neural Computation. 1997;9(8):1735–80.
- 18. Graves A, Schmidhuber J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Netw. 2005;18(5–6):602–10. pmid:16112549
- 19. Cho K, van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. 2014.
- 20. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN. Attention is All You Need. In: Advances in Neural Information Processing Systems, 2017. https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
- 21. Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. The graph neural network model. IEEE Trans Neural Netw. 2009;20(1):61–80. pmid:19068426
- 22. Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph Attention Networks. In: International Conference on Learning Representations, 2018.
- 23. Zhang J, Zheng Y, Qi D. Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction. AAAI. 2017;31(1).
- 24. Li Y, Yu R, Shahabi C, Liu Y. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In: International Conference on Learning Representations, 2018.
- 25. Yao H, Wu F, Ke J, Tang X, Jia Y, Lu S, et al. Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction. AAAI. 2018;32(1).
- 26. Wang C, Zuo K, Zhang S, Lei H, Hu P, Shen Z, et al. PFNet: Large-Scale Traffic Forecasting With Progressive Spatio-Temporal Fusion. IEEE Trans Intell Transport Syst. 2023;24(12):14580–97.
- 27. Kong X, Xing W, Wei X, Bao P, Zhang J, Lu W. STGAT: Spatial-Temporal Graph Attention Networks for Traffic Flow Forecasting. IEEE Access. 2020;8:134363–72.
- 28. Pan Z, Liang Y, Wang W, Yu Y, Zheng Y, Zhang J. Urban Traffic Prediction from Spatio-Temporal Data Using Deep Meta Learning. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019. 1720–30. https://doi.org/10.1145/3292500.3330884
- 29. Pan Z, Zhang W, Liang Y, Zhang W, Yu Y, Zhang J, et al. Spatio-Temporal Meta Learning for Urban Traffic Prediction. IEEE Trans Knowl Data Eng. 2022;34(3):1462–76.
- 30. Wang S, Lv Y, Peng Y, Piao X, Zhang Y. Metro Traffic Flow Prediction via Knowledge Graph and Spatiotemporal Graph Neural Network. Journal of Advanced Transportation. 2022;2022:1–13.
- 31. Simini F, Barlacchi G, Luca M, Pappalardo L. A deep gravity model for mobility flows generation. Nature Communications. 2021;12(1):6576.
- 32. Zhang X, Zhao X. Machine learning approach for spatial modeling of ridesourcing demand. Journal of Transport Geography. 2022;100:103310.
- 33. Kalantari HA, Sabouri S, Brewer S, Ewing R, Tian G. Machine learning in mode choice prediction as part of MPOs’ regional travel demand models: is it time for change? Sustainability. 2025;17(8).
- 34. Hu X, Xie C. Use of graph attention networks for traffic assignment in a large number of network scenarios. Transportation Research Part C: Emerging Technologies. 2025;171:104997.
- 35. Cervero R, Kockelman K. Travel demand and the 3Ds: Density, diversity, and design. Transportation Research Part D: Transport and Environment. 1997;2(3):199–219.
- 36. Duranton G, Turner MA. The Fundamental Law of Road Congestion: Evidence from US Cities. American Economic Review. 2011;101(6):2616–52.
- 37. Wilson AG. Entropy in Urban and Regional Modelling. 1st ed. Routledge; 1970.
- 38. Wegener M. Overview of land use transport models. In: Hensher DA, Button KJ, editors. Handbook of Transport Geography and Spatial Systems. Elsevier; 2004.
- 39. National Highways. National Highways Strategic Road Network. National Highways. 2023. https://nationalhighways.co.uk/our-roads/roads-we-manage/
- 40. National Highways. Traffic Information System (TRIS). 2024. https://webtris.highwaysengland.co.uk/api/swagger/ui/index
- 41. Office for National Statistics. Population Estimates - Small Area (2021 Based); 2011-2022. https://www.nomisweb.co.uk/sources/pest
- 42. Office for National Statistics. Business Register and Employment Survey (BRES). 2022. https://www.nomisweb.co.uk/sources/bres
- 43. Office for National Statistics. Lower Layer Super Output Areas (December 2011) Boundaries EW BFC (V3). 2011. https://dservices1.arcgis.com/ESMARspQHYMw9BZ9/arcgis/services/Lower_layer_Super_Output_Areas_Dec_2011_Boundaries_Full_Clipped_BFC_EW_V3/WFSServer?service=wfs&request=getcapabilities
- 44. Office for National Statistics. Lower Layer Super Output Areas (December 2021) Boundaries EW BFC (V10). 2024. https://services1.arcgis.com/ESMARspQHYMw9BZ9/arcgis/rest/services/Lower_layer_Super_Output_Areas_December_2021_Boundaries_EW_BFC_V10/FeatureServer
- 45. Geofabrik GmbH, OpenStreetMap Contributors. OpenStreetMap Data Extracts: United Kingdom. 2022. https://download.geofabrik.de/europe/united-kingdom.html
- 46. Office for National Statistics. Countries (December 2023) Boundaries UK BFC. 2024. https://geoportal.statistics.gov.uk/datasets/ons::countries-december-2023-boundaries-uk-bfc-2
- 47. National Highways. National Highways: Managing England’s Strategic Road Network. 2024. https://nationalhighways.co.uk/
- 48. Feldman O. The GEH Measure and Quality of the Highway Assignment Models. London, UK: Association for European Transport and Contributors; 2012. https://aetransport.org/public/downloads/V7AGa/5664-5218a2370407f.pdf
- 49. Esugo M, Haas O, Lu Q. Hybrid deep-learning approach with Geoffrey E. Havers-based loss function and evaluation metric for multilocation traffic-flow forecasting. Transportation Research Record. 2023;0(0):03611981241274645.
- 50. Varian HR. Big Data: New Tricks for Econometrics. Journal of Economic Perspectives. 2014;28(2):3–28.
- 51. Mullainathan S, Spiess J. Machine Learning: An Applied Econometric Approach. Journal of Economic Perspectives. 2017;31(2):87–106.
- 52. Barkan O, Benchimol J, Caspi I, Cohen E, Hammer A, Koenigstein N. Forecasting CPI inflation components with Hierarchical Recurrent Neural Networks. International Journal of Forecasting. 2023;39(3):1145–62.
- 53. Bajari P, Nekipelov D, Ryan SP, Yang M. Machine Learning Methods for Demand Estimation. American Economic Review. 2015;105(5):481–5.
- 54. Friedrich M, Pestel E, Schiller C, Simon R. Scalable GEH: A Quality Measure for Comparing Observed and Modeled Single Values in a Travel Demand Model Validation. Transportation Research Record: Journal of the Transportation Research Board. 2019;2673(4):722–32.
- 55. Das S, Tsapakis I. Interpretable machine learning approach in estimating traffic volume on low-volume roadways. International Journal of Transportation Science and Technology. 2020;9(1):76–88.
- 56. Ganji A, Zhang M, Hatzopoulou M. Traffic volume prediction using aerial imagery and sparse data from road counts. Transportation Research Part C: Emerging Technologies. 2022;141:103739.
- 57. Narayanan S, Makarov N, Antoniou C. Graph neural networks as strategic transport modelling alternative - A proof of concept for a surrogate. IET Intelligent Transport Systems. 2024;18(11):2059–77.
- 58. South China Morning Post. Long traffic jams greet Hong Kong tourists trying out new Shenzhen-Zhongshan link. South China Morning Post. 2024.
- 59. Athey S, Imbens GW. The state of applied econometrics: causality and policy evaluation. Journal of Economic Perspectives. 2017;31(2):3–32.
- 60. Athey S. The Impact of Machine Learning on Economics. In: Agrawal A, Gans J, Goldfarb A, editors. The Economics of Artificial Intelligence: An Agenda. University of Chicago Press; 2018. p. 507–47.
- 61. Anselin L. Thirty years of spatial econometrics. Papers in Regional Science. 2010;89(1):3–26.