Abstract
Urban environments, characterized by their complex, multi-layered networks encompassing physical, social, economic, and environmental dimensions, present significant challenges for sustainable urbanization. These challenges, ranging from traffic congestion and pollution to social inequality, call for advanced technological interventions. Technological innovations in big data, artificial intelligence, urban computing, and digital twins have laid the groundwork for sophisticated city simulation. However, a gap persists between these technological capabilities and their practical implementation in addressing urban issues, because they often fall short of capturing the complex and subtle human behaviour in urban space. Recent advances in large language model (LLM) agents show emergent abilities of human-like behaviour simulation, presenting important opportunities for characterizing human behaviour in urban studies. This paper provides a comprehensive review of the recent literature on the technological development of urban computing, digital twins, LLM agents and beyond, as well as interdisciplinary studies on complex urban systems and agent-based modeling. Moreover, we conceptualize a novel Urban Generative Intelligence (UGI) platform that grounds LLM agents in a simulated urban environment. The UGI platform allows LLM agents to operate within a textual urban environment emulated by a city simulator and to interact through a natural language interface, offering an open platform for diverse intelligent and embodied urban tasks. Such a platform unleashes the power of LLM agents for complex urban system simulation, providing a novel approach to understanding and managing urban complexity.
Citation: Xu F, Zhang J, Gao C, Liu P, Feng J, Li Y (2026) Towards a foundational platform for generative agents in simulated city environment. PLOS Complex Syst 3(3): e0000093. https://doi.org/10.1371/journal.pcsy.0000093
Editor: Hocine Cherifi, Université de Bourgogne, FRANCE
Published: March 13, 2026
Copyright: © 2026 Xu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: The authors received no specific funding for this work.
Competing interests: The authors declare no competing interests.
1. Introduction
Cities are complex systems with dynamic and multi-layered networks encompassing physical elements (buildings, roads, infrastructure), social structures (population distribution, organizations, culture), economic activities (industry, services, commerce), and environmental factors (natural resources, ecosystems, climate change) [13,18,20]. This intricate interplay creates uncertainty and dynamism, reflecting the complex interactions between human activities and the urban environment [20,151]. Each individual, community, and organization within these systems is interconnected, influencing the city’s overall character and functionality [97,150]. The primary challenge in urban complex systems is balancing economic growth, social welfare, and environmental sustainability amid rapid urbanization. Cities face critical challenges including traffic congestion, environmental pollution, resource scarcity, and infrastructure strain, all exacerbated by rapid urbanization [182]. In addition, social inequality and housing issues further impact residents’ quality of life [178,179]. The growing threats of climate change, such as extreme weather and rising sea levels, add to these challenges, highlighting the urgent need for solutions. Addressing these prominent urban issues is essential to ensure sustainable, equitable urban development and maintain the vitality of cities in a rapidly evolving global context [2,4].
To address the above problems, recent technological evolution began with urban data intelligence, which provided rich and varied urban information [3,175]. The complexity of this data necessitated the development of artificial intelligence (AI) for effective description, prediction and management [5,184]. This synergy led to urban computing [182], which applies AI to big data for urban problem-solving. Building on this, digital twins [14] and city simulations [172,173] emerged, creating virtual models of cities using real-time data and AI analysis. These technologies represent a progression from data collection (big data), to data analysis (AI), to application (urban computing), and finally to advanced simulation and modeling (digital twins and city simulators). Each stage builds upon the last, offering an increasingly sophisticated tool-chain for tackling urban complexities. However, despite these advancements, the ability of these technologies to systematically address the complexity of urban issues is limited by a gap in comprehending and modeling complex human behaviour. This shortfall implies that while these technologies are invaluable tools, they cannot yet address the intricate and systemic challenges cities face, necessitating further advancements towards more human-like agents that can simulate the complex behaviour of urban residents.
The advent of artificial general intelligence, particularly large language models (LLMs), as a form of human-like intelligence presents a unique opportunity for addressing urban challenges [28,52,144]. LLMs demonstrate emergent intelligence capabilities, mimicking human cognitive processes to analyze and reason over vast datasets. The human-like intelligence of LLMs can significantly contribute to the intelligence of urban systems by providing deep and context-aware insights. They can identify patterns, sentiments, and trends within urban discourse, offering a more nuanced understanding of the complex social and economic dynamics at play in urban environments, which further supports smart decision-making in urban planning and policy formulation. Thus, the integration of LLM agents into urban systems holds great promise for finding comprehensive solutions to the multifaceted challenges in cities, pushing urban technology to the next stage of urban intelligence.
In this paper, we provide a comprehensive review of the recent literature on the technological development of urban computing, digital twins, LLM agents and beyond, as well as interdisciplinary studies on complex urban systems and agent-based modeling. Building on these recent studies, we conceptualize Urban Generative Intelligence (UGI), a foundational platform that fosters the application and evolution of generative intelligence in urban space. The UGI platform is built on top of an open digital infrastructure that consists of the UrbanKG knowledge graph [93] and a city simulator engine [173]. This infrastructure is capable of simulating realistic urban interactions based on multi-source urban data and providing embodied feedback to intelligent agents, setting it apart from existing sandboxes [109] or virtual environments [44]. We propose to use a standard language interface to expose access to this digital infrastructure, facilitating the easy plugin of language models and the development of generative and embodied agents. Using an LLM as the generative intelligence core, we propose a general framework for creating embodied agents for various urban tasks, such as planning transportation systems, assisting policy-making, and simulating urban life and socioeconomic interactions [53]. Moreover, this general framework can be easily adapted to build customized agents, fostering the emergence of diverse intelligent agents supporting various aspects of urban intelligence. Through the realistic embodied feedback provided by the digital infrastructure and the interactions with other agents, the LLM-empowered agents are able to learn from the environment, develop their own understanding, and further evolve their intelligence to deal with complicated urban tasks. These components collectively form a foundational platform to facilitate the emergence and advancement of generative intelligence in urban space.
As shown in Fig 1, we summarize the main content and core contributions of this paper. The main contributions of the present work can be summarized in the following points: (1) We propose a conceptual framework for a foundational UGI platform, which couples LLM agents with an urban knowledge graph and a city simulator via a natural language interface. This framework enables language-native, tool-grounded LLM agency in urban tasks. (2) We extensively review case studies of LLM agents for social, economic and physical mobility behaviour simulation, which can integrate domain knowledge and toolkits, moving from generic role-play to theory-aligned decisions. (3) We connect the UGI design with scalable execution and evaluation protocols, enabling empirical assessment against real data and stylized facts. (4) We delineate current, emerging, and remote capabilities and discuss assumptions, risks, and ethical guardrails to guide responsible adoption.
2. Related Work
2.1. Urban computing
Urban computing aims to tackle urban issues like traffic congestion, energy consumption, and air pollution with the help of computer science algorithms and urban big data. In the early years, research focused on spatiotemporal data analytics and management [86,164,183,186,187] and its applications in urban planning [149,152,163,185], transportation [95,141,164] and environment [41,115]. With carefully designed data processing and fusion techniques in different fields, these works achieved remarkable results. In the past 10 years, with the application of deep learning methods, urban computing has made significant progress on various prediction problems in urban space, e.g., traffic prediction [61,82,136] and individual mobility prediction [50,159,174], which are widely used in transportation, epidemic modeling and environmental studies. STResNet [174] was the first to introduce convolutional neural networks into the crowd flow prediction problem to better model the spatial correlation between regions. DeepMove [50] uses a recurrent neural network and an attention mechanism to capture the periodic patterns of individual mobility. ASTGCN [61] applies a graph neural network to the traffic network to capture the spatio-temporal correlations between road segments.
While these data analytics and prediction methods from urban computing have helped us better understand and predict urban spaces from different aspects, they are limited in tackling many real-life issues that require counterfactual inference and decision-making capabilities. Thus, recent works have further extended the concept of urban intelligence into new fields like behavior simulation [51,165,167] and decision-making [33,142,143,184]. MoveSim [51] simulates human movement behaviors and is applied in epidemic modeling. Additionally, SAND [167] develops a knowledge-driven framework to simulate human activities with Maslow’s hierarchy of needs. IntelliLight [143] is the first to utilize deep reinforcement learning for the intelligent traffic light control problem. Zheng et al. [184] propose an artificial intelligence urban-planning model to generate spatial plans for urban communities using graph neural networks and deep reinforcement learning. Furthermore, researchers are also exploring how to extract urban knowledge from urban big data and utilize it to address urban issues. Knowledge-graph-based approaches have become one of the mainstream methods [93,138,190].
However, despite the above advancements, these methods often rely on oversimplified assumptions about specific real-life problems and fail to address the complex issues of urban systems, which are characterized by high variance and require common knowledge in practice. Recently, the rapid development of large language models (LLMs) with remarkable human-like cognitive abilities and common-sense knowledge has provided new opportunities to address these issues. The integration of LLMs and LLM agents into the urban computing field will enable more real-world applicable methods and frameworks for pattern discovery, urban dynamics prediction and decision-making in urban space from a systematic perspective.
2.2. Digital Twin City
Digital Twin City (DTC) refers to the emulation of a city in a digital environment, enabling real-time sensing, analysis, and optimization of urban systems through the use of data-driven models [189]. DTC stands as a significant trend in smart city research, finding applications in urban planning, traffic management, environmental protection and disaster response. Recent years have witnessed substantial progress in various information technologies, significantly accelerating the development pace of DTC. On one hand, advancements in sensor devices have facilitated the collection of vast amounts of data from multiple sources, including aerospace satellites [157], aircraft and drones [104], smartphones and mobile terminals [30], smart wearable devices [122], industrial and household monitoring equipment [74], and wireless sensing devices [116]. A recent focus has been on crowd-sensing methods that leverage distributed smart devices within a crowd [32,60]. These methods integrate and analyze the digital footprints left by large-scale crowds to establish reliable and semantic-rich representations of group behavior across spatial domains. On the other hand, the analysis and optimization of urban data heavily rely on machine learning algorithms. The complex spatio-temporal data of DTC present challenges for forecasting and decision-making, and the advent of AI [56] has significantly improved data processing efficiency. Advancements like differentiable decision trees [128] or knowledge graphs [22,162] can yield more informed and contextually sound outcomes by exploiting an inherent comprehension of domain intricacies. Deep reinforcement learning provides a direct path to model spatio-temporal data through trial-and-error learning with agents [166] for urban decision-making tasks including traffic signal control [100] or navigation [90].
It is important to acknowledge that applying digital twin city technology to urban problems still faces several challenges. Firstly, there is a strong need to enhance the capability to process multi-source urban data in order to improve credibility and realism. Secondly, the processing and computation of large-scale data pose challenges for real-time updates and evolution. Thirdly, processing complex instructions and providing human-friendly outputs, which makes DTC more accessible to policymakers and urban planners, is crucial for scenario testing and decision-making processes. Our proposed LLM-empowered foundational city platform aims to address these problems and to upgrade DTC into a practical tool for urban problem solving.
2.3. LLM agents
Large language models [180], such as ChatGPT [108], LLaMA [134], Alpaca [132], and GLM [169], are recent advances in artificial intelligence that learn from large corpora and show emergent abilities in understanding and generating language. Since language is the most basic tool for humans interacting with the world [62], this human-like language ability endows large language models with high-level capacities, including reasoning [125] and decision-making [34]. Therefore, large language models are considered a promising approach towards artificial general intelligence [29].
To achieve urban generative intelligence, existing large language models must be endowed with essential abilities for urban scenarios. It is worth mentioning that there are some recent attempts to build city-related large language models. Deng et al. [40] proposed to fine-tune Llama-7B with the Geoscience Academic Knowledge Graph and relevant research papers in the geoscience field. The fine-tuned model obtained the ability to understand these professional concepts and support basic QA tasks. Similar solutions that fine-tune large language models on geo-related corpora include GeoLM [88] and GeoLLM [101]. Zhang et al. [176] utilize a large language model to help process user queries in traffic-related tools. GeoGPT [177] took advantage of the tool-use ability of large language models, treating the model as a bridge connecting practitioners with GIS software. Despite these efforts, these models are still limited to understanding some city-related concepts, without fully considering what abilities a real human has in urban scenarios, i.e., urban generative intelligence, which limits their application. In this paper, we pay attention to the construction of a foundation model, a simulation environment, and embodied agents for the urban context.
Beyond their astonishing performance in various natural language processing tasks, large language models can also be used as agents [140,147] that act and behave like real humans, serving as virtual agents for personalized purposes. That is, the agent can be a digital twin in various scenarios, such as representing a real human when interacting with other humans in social networks within Metaverse applications. One of the most critical challenges is to extend the language ability to more dimensions, from environment perception to action execution [170]. On the one hand, the environment can be represented as textual descriptions [109], which can be naturally perceived by large language models; on the other hand, recent advances build multi-modal ability through alignment among different modalities [68], such as text and vision. As for action execution, it is widely acknowledged that agents can leverage various tools, which extends the action space from language to real-world actions and supports various interactions between the large language model agent and the environment. That is, tool use endows large language models with the ability of embodied perception, reasoning, and action, which is also known as an embodied agent [43,105,137,171]. Specifically, these works approach embodied tasks in real-world environments, such as navigating and controlling robots, and take advantage of the reasoning and decision-making abilities of large language models. Moreover, to ensure the agents can take embodied actions, these works connect the textual output with a tool or another action-execution module.
However, these works only consider simple environments such as a room or a virtual game environment, and the tasks are relatively simple. Unlike these works, in this paper we address the far more complex problem of building foundational embodied agents in the city environment, where agents can exhibit almost all kinds of embodied behaviors of real humans.
2.4. Metaverse
The Metaverse epitomizes a collective virtual shared space, formed from the fusion of physical and digital realities, typically accessed through immersive technologies such as virtual reality (VR) [81] and augmented reality (AR) [107]. This digital universe offers an array of interactive experiences, social interactions, and economic activities, presenting various research questions and avenues for advancement. At its philosophical core, the Metaverse is entwined with the reality-virtuality continuum [35]. On the reality aspect, it draws inspiration from the concept of the digital twin, meticulously replicating the physical world within the virtual sphere. This comprehensive replication captures physical objects, interactions, and dynamics, integrating reality into the digital landscape [48]. Conversely, the virtuality facet of the Metaverse revolves around generating entities within the digital realm [23]. These virtual creations, born from human imagination and innovation, surpass physical limitations and showcase human ingenuity in the virtual domain. Recent strides in Artificial Intelligence Generated Content (AIGC) notably advance this field [96,114].
Therefore, it comes as no surprise that research in this field is currently focused on two primary directions, one of which involves progress related to devices closely associated with the physical world. In contemporary times, Metaverse applications vary based on execution devices, such as tabletops, projectors, hand-held touchscreen devices, and headsets [10,58,103]. These devices play a pivotal role in creating seamless, immersive experiences. Recent studies have delved into innovative solutions aiming to prevent information overload [42], alleviate cognitive load [75], explore eye-tracking technologies [118], synchronize visual-motor responses [76], or leverage natural finger positioning [154]. These efforts aim to create more accessible and intuitive entry points into the physical world. In the virtual realm, advancements in Artificial Intelligence (AI) have revolutionized artistic creation. With the success of diffusion models [119] and applications like Midjourney, demonstrating high-quality, real-life image generation capabilities, creators within the Metaverse can now generate diverse types, contents, and styles of artworks through simple textual prompts. Similarly, generative works in film [46], audio [72], or poetry [91] have produced impressive content, ultimately providing users with experiences derived from reality but elevated beyond it. This revolution extends further to architectural designs [69], landscape design [11], and urban planning [184], offering exciting possibilities to create a metaverse city.
Looking ahead, the Metaverse is poised for rapid evolution and expansion, particularly in encompassing entire cities and creating an equitably accessible City Metaverse. Future developments may involve enhanced interoperability between diverse virtual environments, integration of AI for personalized experiences, advancements in haptic feedback systems [39], and exploration of blockchain [77] for decentralized virtual economies. Our proposed foundational platform aims to build the future city metaverse with embodied agent to achieve urban generative intelligence.
2.5. Complex urban system
Cities have long been viewed as complex systems of interconnected humans, things and space [13]. Extensive research attention has been devoted to revealing universal patterns in various statistics of complex urban systems, such as city size and morphological shape [12]. Specifically, previous works find that cities exhibit typical fractal morphology [15], which contradicts common practice in urban planning but is a signature feature of complex systems. Besides, a large body of literature is devoted to revealing the scaling laws in urban space [21]. These studies find universal patterns of super-linear growth of economy, innovation and crime, and sub-linear growth of infrastructure investment, as city size increases. Previous works build theoretical frameworks to explain these scaling laws from the interaction mechanisms of urban agents [19], which are rooted in the increased interaction frequency in the compact urban space of large cities [124]. In recent years, increasing research attention has been drawn to the complex challenges faced by modern cities, ranging from climate change to economic inequality, especially as detailed data on the social fabric become increasingly available [33,161]. Researchers are dedicated to understanding the complex and concerning phenomena of experienced segregation [106], widening socioeconomic gaps [25], and the prevalence of slums [26].
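Urban scaling laws of this kind are commonly written as a power law Y = Y0 · N^β, where N is city population and Y is an urban indicator; β > 1 corresponds to super-linear growth and β < 1 to sub-linear growth. As a minimal sketch, assuming this standard power-law form, the exponent can be estimated by a least-squares fit in log-log space. The numbers below are synthetic and purely illustrative (they are not data from the cited studies), and the function names are hypothetical.

```python
import math

def fit_scaling_exponent(populations, indicator):
    """Least-squares fit of log(Y) = log(Y0) + beta * log(N); returns (Y0, beta)."""
    xs = [math.log(n) for n in populations]
    ys = [math.log(y) for y in indicator]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx
    y0 = math.exp(y_mean - beta * x_mean)
    return y0, beta

def classify(beta, tol=0.05):
    """Label the scaling regime; `tol` is an arbitrary illustrative tolerance."""
    if beta > 1 + tol:
        return "super-linear"
    if beta < 1 - tol:
        return "sub-linear"
    return "linear"

# Synthetic cities: an economy-like indicator and an infrastructure-like one.
populations = [1e4, 1e5, 1e6, 1e7]
economy = [2.0 * n ** 1.15 for n in populations]
infrastructure = [5.0 * n ** 0.85 for n in populations]

for name, series in [("economy", economy), ("infrastructure", infrastructure)]:
    _, beta = fit_scaling_exponent(populations, series)
    print(f"{name}: beta = {beta:.2f} ({classify(beta)})")
```

With real data, the fitted exponent depends on how city boundaries and indicators are defined, so the classification should be read together with confidence intervals rather than as a point estimate.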
Motivated by these empirical studies, extensive previous works aim to design agent-based models that explain complex urban systems from a bottom-up perspective. For example, researchers propose to use a diffusion-limited aggregation model to reproduce fractal urban morphology as a process of physical particles [16]. A later study uses a correlated percolation model on a spatial lattice to explain urban morphology [99]. A recent study finds that the scaling laws of urban growth and fractal urban morphology can naturally emerge from an agent-based model of human mobility [148]. In terms of complex urban challenges, agent-based models have been leveraged to reproduce and explain the ubiquitous segregation in urban space [123]. Researchers have also developed bottom-up models to explain the emergence of varying levels of socioeconomic inequality as the provision of universal basic urban services changes [25].
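To make the bottom-up idea concrete, the following is a minimal sketch of a Schelling-style relocation model, used here only as a familiar illustration of how segregation can emerge from simple local rules; it is not claimed to be the exact model of the cited works. The grid size, group labels, satisfaction threshold and all function names are assumptions chosen for the example.

```python
import random

def make_grid(n, rng, empty_share=0.2):
    """Random n x n grid holding groups "A" and "B"; None marks an empty cell."""
    def draw():
        u = rng.random()
        if u < empty_share:
            return None
        return "A" if u < (1 + empty_share) / 2 else "B"
    return [[draw() for _ in range(n)] for _ in range(n)]

def similarity(grid, r, c):
    """Share of occupied neighbouring cells (wrap-around) in the same group."""
    n = len(grid)
    around = [grid[(r + dr) % n][(c + dc) % n]
              for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    occupied = [v for v in around if v is not None]
    if not occupied:
        return 1.0
    return sum(v == grid[r][c] for v in occupied) / len(occupied)

def mean_similarity(grid):
    """Average same-group neighbour share over all agents."""
    n = len(grid)
    cells = [(r, c) for r in range(n) for c in range(n) if grid[r][c] is not None]
    if not cells:
        return 1.0
    return sum(similarity(grid, r, c) for r, c in cells) / len(cells)

def step(grid, threshold, rng):
    """Move each unhappy agent to a random empty cell; return how many moved."""
    n = len(grid)
    unhappy = [(r, c) for r in range(n) for c in range(n)
               if grid[r][c] is not None and similarity(grid, r, c) < threshold]
    empty = [(r, c) for r in range(n) for c in range(n) if grid[r][c] is None]
    if not empty:
        return 0
    rng.shuffle(unhappy)
    for r, c in unhappy:
        i = rng.randrange(len(empty))
        er, ec = empty[i]
        grid[er][ec], grid[r][c] = grid[r][c], None
        empty[i] = (r, c)  # the vacated cell becomes available
    return len(unhappy)

rng = random.Random(0)
grid = make_grid(20, rng)
before = mean_similarity(grid)
for _ in range(30):
    if step(grid, threshold=0.5, rng=rng) == 0:
        break
after = mean_similarity(grid)
print(f"mean same-group neighbour share: {before:.2f} -> {after:.2f}")
```

Each agent follows only a local rule (relocate if fewer than half of its neighbours share its group), while the quantity of interest, the city-wide neighbour similarity, is measured at the aggregate level.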
However, previous studies mainly use agents guided by simple rules. The recent advance of large language models provides a unique opportunity to design generative agents with much more sophisticated intelligence. Such agents have been shown to be feasible in simulating a virtual village [109] and a company [66]. By leveraging the human-like intelligence and biases encoded in large language models, these generative agents have the potential to explain more complex social phenomena. For example, recent studies show that large language model-driven agents exhibit similar content bias in transmission chains [1], and can reproduce typical social processes like the wisdom of crowds [6] and social conformity [188]. However, these agents are all simulated in simple, virtual environments. It is still unclear whether they are sophisticated enough to interact with complex urban environments. Hence, it is an important research direction to expose language model-driven agents to the rich information collected from urban systems or generated by realistic simulators.
2.6. Agent-based modeling
Agent-based modeling is an important and powerful approach to modeling complex systems such as cities, and to understanding, analyzing, explaining, and even predicting their dynamics [17]. Generally speaking, simulation can be divided into macrosimulation and microsimulation. Macrosimulation simulates the system at an aggregate level, focusing on trends and behaviors within a system or a population rather than on individual-level characteristics. Specifically, macrosimulation may deploy several equations to describe how the critical variables in the system affect each other. However, it is quite challenging to formulate such equations since real-world systems are always very complex. This motivates microsimulation, also known as agent-based simulation, which instead focuses on individual entities within a system.
In general, agent-based simulation aims to model the behavior of individual components or agents to understand their interactions and how they collectively contribute to the overall system. For example, the famous Cellular Automata [146] is comprised of discrete cells, each following a set of rules based on their neighboring cells. The simulations based on setting rules for each individual can often showcase emergent behaviors, where complex patterns arise from simple interactions between individual components. Since it is general, agent-based simulation is extensively used in various fields, such as biology [9], ecology [102], sociology [98], etc., to model systems where individual entities influence collective behavior.
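As a concrete instance of such neighbour-based rules, the sketch below implements one update step of Conway's Game of Life, a well-known cellular automaton chosen here for illustration (it is not necessarily the specific automaton discussed in the cited work). A cell is born with exactly three live neighbours and survives with two or three; the function name is hypothetical.

```python
def life_step(live):
    """One synchronous Game of Life update on an unbounded grid.

    `live` is a set of (row, col) pairs for the cells that are alive.
    """
    counts = {}
    for r, c in live:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0):
                    key = (r + dr, c + dc)
                    counts[key] = counts.get(key, 0) + 1
    # Birth with exactly 3 live neighbours; survival with 2 or 3.
    return {cell for cell, k in counts.items()
            if k == 3 or (k == 2 and cell in live)}

# A horizontal "blinker" flips to vertical and back again.
blinker = {(1, 0), (1, 1), (1, 2)}
print(sorted(life_step(blinker)))             # [(0, 1), (1, 1), (2, 1)]
print(life_step(life_step(blinker)) == blinker)  # True
```

The oscillation is not written anywhere in the rule itself; it arises from repeatedly applying the same local rule to every cell, which is the sense in which the behaviour is emergent.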
Early attempts at agent-based simulation [27,133] used simple rules or formulas to guide how each individual behaves when faced with environmental change. This is easy to implement but makes it hard to capture complex individual behaviors accurately. Later, with the development of neural networks, individual decision factors that simple rules cannot capture well were modeled with neural networks [47,55]. Furthermore, recent works tend to deploy reinforcement learning-based agents [181], in which each individual’s goal in the simulation is to maximize a reward.
However, these agents are limited since they are not autonomous and require human-defined goals or rules, which motivates large language model-driven agents. In this paper, we present the UGI system, one of the main goals of which is to deploy large language model-based agents to simulate the complex dynamics of the city and further support various applications such as decision-making. Large language model agents are the most recent solution for agent-based modeling and simulation of the complex city system.
3. Architecture of urban generative intelligence platform
Here, we present the overall architecture of our proposed Urban Generative Intelligence (UGI) platform (see Fig 2). The key idea is to assemble the powerful city simulator, urban knowledge graphs and various data streams as an open digital infrastructure. More importantly, the infrastructure will provide a standard language interface that enables the easy plugin of large language models and generative agents. It allows the generative intelligent models to conveniently access the computation power and factual knowledge in digital infrastructure, test strategies in various simulated scenarios and learn to evolve based on the feedback. Consequently, these empowered generative models will facilitate various downstream urban applications. The key components of the presented architecture are elaborated as follows:
Open Digital Infrastructure: This component aims to provide a backbone system that integrates the data science resources and computational tools designed for urban problems. Specifically, it accesses various data streams in urban systems to collect massive spatial-temporal data on empirical urban activities, covering the aspects of spatial layout (e.g., points-of-interest and areas-of-interest), infrastructure distribution (e.g., road network and subway network), human behaviour (e.g., individual movements and collective mobility flows) and urban dynamics (e.g., traffic jams). These rich datasets are fed into a powerful city simulator, Mirage [173], which can simulate the complex interactions between humans, things and space in an efficient and extensible manner. On top of the empirical observational data, this module can simulate various hypothetical scenarios efficiently, providing diverse environments to host intelligent agents. In addition, the UrbanKG module [93] fuses various data streams and extracts factual knowledge, such as spatial relations like “border by” and semantic relations like “category of”. UrbanKG provides functions for the construction and storage of factual knowledge, together with basic operations and algorithms on it, which facilitates easy access by generative intelligence models.
Language Interface: We design a standardized language interface to fully release the power of open digital infrastructure. City simulator, UrbanKG and diverse data sources used to be difficult to access. They often require customized algorithms to configure city simulator, retrieve factual knowledge from UrbanKG and integrate various data sources. Such obstacles limit their application in downstream tasks, and make their power inaccessible to advanced AI models. In our architecture, we design a user-friendly language interface to fully unleash the potential of the open digital infrastructure. It uses predefined natural language protocol to allow large language models and generative agents to conveniently leverage the computation power of city simulator and access factual knowledge from UrbanKG. The standardized language interface reduces the barrier of developing language model-driven agents on top of the open digital infrastructure, which hopefully will foster the proliferation of generative urban agents.
Generative Intelligence: On top of the language interface, we propose to use a foundation model pre-trained for urban problems, CityGPT [49]. Specifically, CityGPT is a pre-trained large language model that encodes local urban knowledge via the language interface. It effectively leverages the reasoning capability and common sense of large language models, and is greatly reinforced for specialized local urban problems. Empowered by this city foundation model, we will design a series of generative agents in the dimensions of urban mobility, economy, community and society. These agents are not only capable of high-quality decision making in various scenarios, but can also enable realistic agent-based simulations. Such agents combined with CityGPT will release the power of generative intelligence to solve various important urban problems, such as urban planning, climate adaptation, and inequality reduction.
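The relationship between the three components can be illustrated with a toy example. The sketch below is a minimal, hypothetical stand-in and not the UGI implementation: `TinyUrbanKG` stores facts as (head, relation, tail) triples using the “border by” and “category of” relations mentioned above (the direction of each relation is an assumption), `StubSimulator` only tracks agent positions, and `LanguageInterface` maps a few fixed text commands onto the two backends. All class names, command phrasings and place names are invented for illustration.

```python
import re

class TinyUrbanKG:
    """Toy triple store; assumes facts are (head, relation, tail) tuples."""
    def __init__(self):
        self.triples = set()

    def add(self, head, relation, tail):
        self.triples.add((head, relation, tail))

    def query(self, head=None, relation=None, tail=None):
        return sorted(t for t in self.triples
                      if (head is None or t[0] == head)
                      and (relation is None or t[1] == relation)
                      and (tail is None or t[2] == tail))

class StubSimulator:
    """Stand-in for a city simulator; it only records agent positions."""
    def __init__(self):
        self.positions = {}

    def move(self, agent_id, place):
        self.positions[agent_id] = place
        return f"{agent_id} is now at {place}."

class LanguageInterface:
    """Routes a small, fixed set of text commands to the KG or the simulator."""
    def __init__(self, kg, sim):
        self.kg, self.sim = kg, sim
        self.routes = [
            (re.compile(r"^what borders (.+)\?$", re.I), self._borders),
            (re.compile(r"^what is the category of (.+)\?$", re.I), self._category),
            (re.compile(r"^move (\S+) to (.+)$", re.I), self.sim.move),
        ]

    def handle(self, text):
        for pattern, handler in self.routes:
            match = pattern.match(text.strip())
            if match:
                return handler(*match.groups())
        return "Unsupported request."

    def _borders(self, place):
        tails = [t for _, _, t in self.kg.query(head=place, relation="border by")]
        return f"{place} is bordered by: {', '.join(tails)}." if tails else f"No border facts for {place}."

    def _category(self, place):
        tails = [t for _, _, t in self.kg.query(head=place, relation="category of")]
        return f"{place} belongs to: {', '.join(tails)}." if tails else f"No category facts for {place}."

kg = TinyUrbanKG()
kg.add("District A", "border by", "District B")
kg.add("Cafe X", "category of", "restaurant")
interface = LanguageInterface(kg, StubSimulator())
for request in ["What borders District A?", "What is the category of Cafe X?",
                "move agent_1 to District B", "sing a song"]:
    print(request, "->", interface.handle(request))
```

The design point is that a language model only ever produces and consumes text: the interface hides how facts are stored and how the simulator is configured, and unsupported requests fail with an explicit message instead of a silent guess.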
4. Generative agents
4.1. General framework for generative city agents
Here, we present a general framework for embodied generative agents in urban space (see Fig 3). This framework leverages a generative foundation model as its intelligence core, and it is built upon the open digital infrastructure of the city simulator and the UrbanKG knowledge graph. Following the embodied cognition hypothesis [145], this framework allows generative agents to harness the realistic embodied feedback provided by the digital infrastructure and evolve their intelligence in a simulated urban environment. Specifically, the autonomous agents under this framework have the Mental States components of memory, persona and preference. The memory component stores the history of past behaviour and interactions, the persona component assigns the agents a specific profile to leverage the role-play capability of the language model, and the preference component allows personalizing the agents with a high-level language description. These LLM agents use a city foundation model (such as CityGPT [49] or UrbanGPT [89]) as their generative intelligence core, which can comprehensively model the internal Mental States and external Interactions to generate appropriate behaviours. The proposed framework aims to provide a unified conceptual blueprint for most generative city agents, while providing enough flexibility for customization in various applications.
One key element of generative city agents is the Interaction components, including the perceive module that senses the simulated urban environment; act module that registers behaviours or status changes in city simulator; and communicate module to exchange information with other agents.
These components will allow generative city agents to ground their reasoning and action on reliable City Simulators and Urban Knowledge Graphs. On one hand, LLMs excel at understanding multi-modal context, enabling rich conversations and simulating human-like decision-making, but they also exhibit certain unreliability when performing more complex or specialized tasks. Their behavior can be inconsistent and error-prone when faced with highly specific urban scenarios. On the other hand, City Simulators and Urban Knowledge Graphs offer efficiency, accuracy, and simplicity in modeling urban environments and dynamics. However, these tools lack the flexibility and versatility offered by LLMs. The Interaction components enable generative city agents to be grounded on city simulators and UrbanKGs through a natural language interface such as the Model Context Protocol (MCP) [67], allowing them to invoke simulators as tools. The natural language interface improves the efficiency and accuracy of LLM agents. This integration facilitates the construction of a more robust and scalable urban simulation platform to support complex urban tasks. Through this synergy, LLMs can leverage specialized simulators for data-driven decision-making, while simulators benefit from the flexibility offered by LLMs, taking a significant step toward creating a foundational platform for generative urban intelligence.
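The division between Mental States (memory, persona, preference) and Interaction components (perceive, act, communicate) can be sketched as follows. This is a minimal illustration, assuming a stub environment in place of the city simulator and a plain Python function in place of the city foundation model; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MentalStates:
    persona: str                      # profile used for role play
    preference: str                   # high-level language description
    memory: List[str] = field(default_factory=list)  # past behaviour and interactions


class StubEnvironment:
    """Stand-in for the city simulator; a real backend is assumed, not shown."""

    def __init__(self) -> None:
        self.positions: Dict[str, str] = {}
        self.inbox: Dict[str, List[str]] = {}

    def observe(self, agent_id: str) -> str:
        return f"{agent_id} is at {self.positions.get(agent_id, 'home')}"

    def apply(self, agent_id: str, place: str) -> None:
        self.positions[agent_id] = place

    def deliver(self, to_id: str, message: str) -> None:
        self.inbox.setdefault(to_id, []).append(message)


class CityAgent:
    def __init__(self, agent_id: str, states: MentalStates, env: StubEnvironment,
                 core: Callable[[MentalStates, str], str]) -> None:
        self.agent_id, self.states, self.env, self.core = agent_id, states, env, core

    def perceive(self) -> str:
        """Sense the environment and record the observation in memory."""
        observation = self.env.observe(self.agent_id)
        self.states.memory.append(observation)
        return observation

    def act(self) -> str:
        """Ask the reasoning core for a destination and register it in the environment."""
        place = self.core(self.states, self.perceive())
        self.env.apply(self.agent_id, place)
        self.states.memory.append(f"moved to {place}")
        return place

    def communicate(self, to_id: str, message: str) -> None:
        """Send a message to another agent through the environment."""
        self.env.deliver(to_id, f"{self.agent_id}: {message}")


def rule_core(states: MentalStates, observation: str) -> str:
    # Placeholder for the foundation model: a single hard-coded rule.
    return "office" if "office worker" in states.persona else "park"
```

Passing the reasoning core in as a callable keeps the agent skeleton independent of any particular model, so the rule-based placeholder could be swapped for a model-backed function without changing the Interaction methods.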
To better illustrate our framework, we provide several concrete agent designs as below, focusing on two major categories of urban problems, i.e., simulating urban phenomena and informing complex decision making.
4.2. Simulation Agent: Generating individual and collective behaviour
Complex urban phenomena are driven by the spatiotemporal agglomeration of micro activities in physical, economic and social domains. Understanding the underlying micro mechanisms and the emergence process plays an important role in modeling and managing urban systems, necessitating the simulation of complex urban phenomena with micro autonomous agents. Here, we present three design examples of embodied agents under the proposed general framework, which are customized for the simulation of the basic urban activities in physical, economic and social domains, respectively.
Physical mobility: This agent aims to simulate individual activities and movements within urban environments, with the objective of creating trajectories that mirror real-life patterns. In addition to generating logical individual mobility behavior, we also want to reproduce the statistical distribution of collective movements by simulating a population of these agents, such as reproducing the daily number of commuters between two locations. Traditional simulation models often use simplified rules to guide agents’ mobility behaviour, but they lack depth in understanding the rich semantics of urban mobility, such as the function of a specific place and the characteristics of diverse demographic profiles. The generative intelligence in the city foundation model offers a promising alternative. It excels in common sense reasoning and has deep knowledge of the local environment. These features equip simulation agents with an accurate prior of social norms and human behaviour patterns, contributing to more plausible simulation outcomes. Besides, the flexibility of language models, particularly through their prompt-based mechanism, enables more logical and realistic reasoning in behavior simulation.
One recent work found that the Theory of Planned Behaviour [7] can be leveraged for simulating human mobility behaviour and proposed an agentic workflow called Chain-of-Planned-Behaviour (CoPB) [126]. Specifically, CoPB imitates the human cognitive process to generate the factors of attitude, subjective norm, and perceived behaviour control that govern the intention of human mobility. The designed agent simulates human mobility behaviour as a step-by-step reasoning process that comprehensively considers all three factors. The attitude factor generates the personal preference for each mobility intention, while the perceived behaviour control factor evaluates its feasibility given the dynamic context. The subjective norm factor generates a routine for each agent based on its profile, such as the work schedule, serving as the anchor points of its daily life. The reasoning core is prompted to avoid violating these anchor points during generation. Such agent designs allow the simulation of realistic, personalized and coherent urban mobility behaviours. Besides, the ability to reason about human mobility intention can also be beneficial for predicting future mobility behaviour [87]. These agents unleash the reasoning power of LLMs for the most essential micro urban activities in the physical domain, reflecting the interactions between urban dwellers and their access to various urban resources.
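The step-by-step use of the three factors can be illustrated with a small sketch. It is not the CoPB implementation from [126]: here each factor is a simple rule-based stand-in for what would be an LLM-generated output, and the profile fields, context fields and fallback intention are hypothetical.

```python
def subjective_norm(profile):
    """Routine anchor points derived from the profile: (start_hour, end_hour, intention)."""
    start, end = profile["work_hours"]
    return [(start, end, "work")]


def attitude(profile, candidates):
    """Personal preference score for each candidate intention."""
    return {c: profile["likes"].get(c, 0.0) for c in candidates}


def perceived_control(intention, context):
    """Feasibility of an intention given the dynamic context."""
    return intention in context["open_now"]


def next_intention(profile, hour, context, candidates):
    # Step 1: anchor points from the routine must not be violated.
    for start, end, anchored in subjective_norm(profile):
        if start <= hour < end:
            return anchored
    # Step 2: score the remaining options by preference.
    scores = attitude(profile, candidates)
    # Step 3: keep only the options that are feasible right now.
    feasible = [c for c in candidates if perceived_control(c, context)]
    return max(feasible, key=scores.get) if feasible else "stay home"
```

With a profile that prefers exercise but a context in which only dining and shopping venues are open, the sketch selects dining outside working hours, showing how feasibility can override the highest preference.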
Economy activity: For the economy, agent-based modeling and simulation is a promising solution for understanding and predicting the dynamics of economic systems [47]. Specifically, when predicting economic indicators such as GDP and the unemployment rate, traditional methods, such as econometrics [129], cannot handle some complex real-world scenarios. For example, for one of the most famous econometric methods, Dynamic Stochastic General Equilibrium (DSGE) [63], sometimes there is no feasible solution for the equilibrium. In contrast, agent-based simulation can construct multiple heterogeneous agents to describe each user in the ecosystem and then define what kinds of economic behaviors the agents can have. The major objective of agent-based simulation in the economy is to observe emerging phenomena from the perspectives of both macroeconomics and behavioral economics, which can be regarded as an environment to support theory validation and decision-making. To construct the economic agents, one recent work [84] adopted the well-acknowledged simulation mechanism with four components: labor, consumption, financial markets, and government taxation, covering the primary components of existing macroeconomic simulations. The agent is deployed to simulate the two most critical decisions that a real human makes in daily life: going to work (earning money) and consumption (spending money). The government agents decide the tax policy, and the bank agents adjust interest rates based on market inflation or deflation. From the macroscopic perspective, the system can observe the dynamics of the overall labor and consumption markets.
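To make the work and consumption decisions concrete, the following toy sketch runs one simulation period with a flat tax that is redistributed equally. It is a deliberately simplified assumption for illustration: the fixed boolean work decision and consumption share stand in for decisions that an LLM agent would make, the financial market is omitted, and none of the names or rules come from the cited work.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Household:
    wage: float
    savings: float = 0.0
    wants_to_work: bool = True     # stand-in for the agent's labor decision
    consume_share: float = 0.5     # stand-in for the agent's consumption decision


def simulate_month(households: List[Household], tax_rate: float, price: float) -> Dict[str, float]:
    """One period: work, pay a flat tax, receive an equal transfer, then consume."""
    tax_pool = 0.0
    employed = 0
    for h in households:
        if h.wants_to_work:
            employed += 1
            tax = h.wage * tax_rate
            tax_pool += tax
            h.savings += h.wage - tax
    # Government redistributes the collected tax equally across households.
    transfer = tax_pool / len(households)
    goods_demand = 0.0
    for h in households:
        h.savings += transfer
        spending = h.savings * h.consume_share
        h.savings -= spending
        goods_demand += spending / price
    return {
        "unemployment_rate": 1 - employed / len(households),
        "goods_demand": goods_demand,
    }
```

Macroscopic indicators such as the unemployment rate and total goods demand are computed only by aggregating individual decisions, which is the sense in which such simulations observe market dynamics from the bottom up.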
Social interaction: Human social behaviors can also be simulated with large language model-empowered agents. Specifically, the social agents require human-like abilities in social behaviors, i.e., interacting with other individuals in the city system. For social activities, there are both online and offline social networks, i.e., communication can occur either in online social networks or simply through chatting in a room. The social agent simulation mainly focuses on how information propagates on the social network and its further impact on individuals. First, LLM-driven agents can shape their social awareness, i.e., distinguishing friends from other individuals and distinguishing different social-tie strengths. The agents can further make their own daily schedules autonomously in the city environment, which leads to social activities and yields interactions between different agents, including chatting, cooperation, or even conflicts. Last, the online social network, which is not restricted by physical space, also provides an environment for social activities. The agents can post new content or propagate the content of other users. Besides the behavior itself, the internal characteristics of the agent, including emotion and attitude, are also contained in the memory and mechanism of the large language model-based social agents. Overall, the simulation can be evaluated from both individual-level and population-level perspectives. Regarding individual-level simulation, the aim is to generate social behaviors, attitudes, and emotions by leveraging user characteristics and the informational context within social networks. In the social simulation system S3 [54], built on the UGI framework, the social agents can accurately simulate the propagation process of information, attitude, and emotion on two representative events about nuclear energy and gender discrimination.
To summarize, the UGI system provides a good platform to support understanding and simulating social behaviors, including emergent social phenomena.
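The role of social-tie strength in information propagation can be illustrated with a deterministic toy model: a message spreads from seed agents along ties whose strength reaches a threshold. This threshold rule is an assumption chosen for simplicity; in an LLM-driven simulation the forwarding decision would be generated by each agent, and the agent names below are hypothetical.

```python
from collections import deque


def propagate(ties, seeds, min_strength):
    """Breadth-first spread over ties of the form {agent: {neighbor: strength}}."""
    informed = set(seeds)
    queue = deque(seeds)
    while queue:
        sender = queue.popleft()
        for neighbor, strength in ties.get(sender, {}).items():
            # Only sufficiently strong ties carry the message, and each agent is informed once.
            if strength >= min_strength and neighbor not in informed:
                informed.add(neighbor)
                queue.append(neighbor)
    return informed


# Hypothetical network: alice and bob are close friends, carol is an acquaintance.
ties = {
    "alice": {"bob": 0.9, "carol": 0.2},
    "bob": {"dave": 0.7},
    "carol": {"erin": 0.9},
}
```

Lowering the threshold lets the message cross the weak alice-carol tie and reach erin, which is the kind of population-level effect that can be compared against observed diffusion data.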
4.3. Decision Making Agent: Task solving and personal assistance
We present the design cases of decision-making agents in the following two scenarios.
Location recommendation: Location recommendation is a new kind of infrastructure in the era of information overload. That is, there are too many points of interest (locations) in the city environment, and individuals living there find it hard to determine where to visit to meet their demands. Furthermore, each individual may have their own preferences and interests, which motivates the construction of personalized assistants based on large language model-empowered agents. We propose to build LLM-driven agents for location recommendation based on the LLM’s strong ability to understand both user preferences and decision-making. Such agents can extract critical information from the profile, attributes, and other basic information for a given user. In other words, the LLM agent can be a personal assistant with essential information about the user. Besides, the agent will be good at planning and scheduling based on the city environment’s feedback. Specifically, it is always challenging for a human to directly query or search for locations to visit, since the searching or filtering process is faced with abundant data and information. To address this, the agent can organize the output of traditional search and recommendation engines and even adjust the engine if the results cannot meet the requirements. Last, the agent can communicate well with the user, understand the user’s new and instant feedback, and provide textual explanations for recommendation results based on the user’s historical behaviors, personal demands, or spatial-temporal context. Under the UGI framework, the LLM-based agents will have three major abilities: 1) understanding mixed and complex user intents, 2) detecting user profiles and interests based on historical data and then adjusting recommendations, and 3) identifying improper user demands given the real-world city environment.
Schedule planning: Effective schedule planning is crucial for efficient daily activity management. Traditional methods, often relying on shortest path algorithms [45], provide time-saving solutions but lack personalization and flexibility. They fail to account for user-specific preferences and cannot dynamically adapt to evolving or abstract requirements, potentially leading to suboptimal scheduling and conflicts. To overcome these limitations, we propose to design an LLM-driven agent to help users make high quality decisions for nuanced schedule planning. This agent leverages CityGPT capabilities, integrating common-sense knowledge to contextualize tasks and offering logical, user-centric planning solutions. Its natural language interface ensures a seamless, intuitive human-computer interaction, significantly enhancing the user experience.
The proposed agent comprises several key components. Upon receiving the user’s schedule input, the agent uses the comprehension and reasoning skills of its generative intelligence core to formulate an optimal schedule. It respects the fixed commitments specified by users while accommodating preferences and time constraints. The memory module continuously integrates new information, facilitating dynamic adjustments. Through its perceive and communicate interfaces, the agent evaluates the feasibility of plans, considering travel time and proximity. It finalizes the schedule with the user preference inferred from the persona module, ensuring logical coherence and user satisfaction. This approach represents a significant advancement in personalized schedule planning, harnessing the foundation model’s deep understanding and reasoning for more tailored and efficient daily organization. The design of the schedule planning agent represents the basic decision-making capability in urban daily life, and it can also serve as a useful personal assistant that continuously learns user preferences and evolves with the urban environment.
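The feasibility evaluation based on travel time can be sketched as a simple check over consecutive events. This is only an illustration: the event tuple layout, the travel-time table and the place names are hypothetical, and in the proposed agent the travel times would be obtained from the simulator through the perceive interface.

```python
def check_schedule(events, travel_minutes):
    """Report pairs of consecutive events that cannot be chained in time.

    events: list of (name, place, start_min, end_min), sorted by start time.
    travel_minutes: {(from_place, to_place): minutes}.
    Returns a list of (earlier_event, later_event, minutes_short).
    """
    conflicts = []
    for (name1, place1, _, end1), (name2, place2, start2, _) in zip(events, events[1:]):
        # No travel is needed when both events are at the same place.
        needed = 0 if place1 == place2 else travel_minutes[(place1, place2)]
        if end1 + needed > start2:
            conflicts.append((name1, name2, end1 + needed - start2))
    return conflicts


# Hypothetical day plan, with times given in minutes after midnight.
plan = [
    ("meeting", "office", 540, 600),
    ("lunch", "cafe", 610, 660),
    ("workout", "gym", 700, 760),
]
travel = {("office", "cafe"): 15, ("cafe", "gym"): 20}
```

Here the meeting ends at 600 and the cafe is 15 minutes away, so lunch at 610 is flagged as 5 minutes short; the agent could then propose moving lunch or choosing a closer venue.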
5. Enabled urban applications
In this section, we use several typical examples to discuss how our proposed UGI foundation platform makes it possible to deal with complicated urban tasks and issues in four important urban systems: transportation, business, economy, and society.
5.1. Transportation system
Travel surveys have long been a cornerstone in transportation research, providing indispensable insights into travel behaviors and patterns. These surveys inform urban planning, infrastructure development, and transportation policy, aiding in the creation of more efficient and user-centric transport systems. However, traditional travel surveys, such as household travel surveys and on-board transit surveys, come with significant challenges. They are often expensive and time-consuming to conduct, involving face-to-face interviews, manual data collection, and extensive processing [131]. Additionally, the data collected may not adequately capture rapid shifts in travel behavior due to its infrequent nature [117]. Recent advancements in technology have led to the exploration of data science research in human mobility [59]. Researchers leverage the increasingly available mobility data to identify the universal rules in human mobility [130], and design rule-based generator of urban mobility behaviour [71]. This shift is significant as it promises to overcome the limitations of traditional surveys, offering real-time data collection and analysis capabilities.
However, classic rule-based mobility generators, such as TimeGeo [71], rely on simplified statistical rules to simulate individual movements between a few frequent locations such as home, work, and other. They lack an in-depth understanding of mobility intent and user profiles, and hence the travel behaviour they generate is neither realistic nor diverse. LLM-based generative agents bring about important opportunities. Language models not only possess robust commonsense comprehension, but can also perform high-quality reasoning based on contextual information. For example, our recent study shows that LLM agents can navigate in a complex urban environment based on a goal description and street view images [170]. In this paper, we describe a generative agent for physical mobility behaviour, which can generate realistic and intention-aware travel behaviour. Such a generative model offers an important opportunity for a high-quality and efficient alternative to travel surveys. In addition, the Chain-of-Planned-Behaviour (CoPB) workflow [126] integrates the powerful reasoning capabilities of LLMs with the Theory of Planned Behavior (TPB) to substantially improve the inference of next-step movement intentions. By incorporating key cognitive constructs from TPB (attitudes, subjective norms, and perceived behavioral control), CoPB enables LLMs to generate more accurate behavioral predictions. Compared with conventional LLM-based generation methods, CoPB markedly reduces the error rate in travel intention inference. Furthermore, when combined with a gravity model that maps intentions to physical movements in real space, it produces travel trajectories that align more closely with observed geographical patterns.
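As an illustration of how a gravity model can map an intention to a concrete destination, the sketch below assigns each candidate place a probability proportional to its attractiveness divided by a power of its distance from the current location. The source does not specify the exact gravity form used with CoPB, so this standard form, the distance exponent and the candidate data are assumptions.

```python
import math


def gravity_probabilities(origin, candidates, beta=2.0):
    """Destination probabilities proportional to mass / distance**beta.

    origin: (x, y) of the current location.
    candidates: {name: (x, y, mass)}, where mass is an attractiveness score.
    """
    weights = {}
    for name, (x, y, mass) in candidates.items():
        # Guard against a zero distance when a candidate coincides with the origin.
        distance = max(math.dist(origin, (x, y)), 1e-9)
        weights[name] = mass / distance ** beta
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical restaurants matching a "dining" intention.
restaurants = {"near_small": (1.0, 0.0, 1.0), "far_large": (2.0, 0.0, 4.0)}
```

With the default exponent of 2, a venue four times as attractive but twice as far away receives the same probability as the nearer one, so both restaurants above are chosen with probability 0.5.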
5.2. Business Intelligence
Business site selection plays a key role in the interdisciplinary areas of urban planning, economic growth, and social development. Traditional site selection methods, often reliant on expert consultants and manual surveys, are resource-intensive and time-consuming [24,78,111]. In contrast, research interests have shifted towards a data-driven paradigm, employing machine learning models fed with diverse urban data to evaluate potential sites [73,85,92,153]. These models, however, often lack comprehensive feature representation and logical reasoning in their analysis [160]. Recent advancements have introduced knowledge graphs in business site selection, integrating multifaceted data into a graph structure for enhanced knowledge representation without complex feature engineering [70,94]. Despite their potential, knowledge graphs face challenges in assimilating varied urban data, refining knowledge for different factors, and ensuring interpretability in decision-making. LLMs have emerged as a promising tool, capable of automating text-related tasks with extensive domain knowledge, advanced language generation abilities, and efficient data processing [29,144]. Their application in business site selection offers capabilities like comprehensive information retrieval and real-time decision support. However, LLMs often struggle with accurately recalling facts in knowledge-based content generation [156]. Here, we propose to address these gaps with an integrated intelligent site selection model. It combines the structured knowledge of knowledge graphs with the common-sense reasoning powers of LLMs, which is particularly enhanced for urban problems in CityGPT. Utilizing algorithms for reasoning on knowledge graphs, this model is designed to deliver precise site selection results with enhanced decision-making quality, clear interpretations, and improved efficiency and breadth.
Therefore, we can leverage city foundation model to unleash the power in urban knowledge graph and various empirical data to transform business site selection.
5.3. Urban economy system
Agent-based modeling and simulation are of great importance for the research of the economy due to the limitations of other approaches. Early empirical statistical models, such as the Phelps Model [110] highlighted in the pioneering works of Hendry [64], delved into data-driven analyses of macroeconomic phenomena. These models unraveled relationships among pivotal variables. Kydland and Prescott [80] crafted a computational model geared toward predicting policy outcomes. Later, the advent of Dynamic Stochastic General Equilibrium (DSGE) models [38] aimed to encapsulate the dynamics of diverse economic variables like output, inflation, consumption, and investment, while accommodating the inherent uncertainty and randomness within economic processes. However, as pointed out by Farmer [47], these models operate under the assumption of a perfect world, which motivates agent-based modeling and simulation for the economy. That is, the LLM-based economic simulation based on our system can be an environment to deploy various relevant applications.
This simulation offers an ideal platform to scrutinize and emulate complex macroeconomic behaviors. By leveraging the interplay of diverse agents and institutions, the model can elucidate emergent behaviors, market dynamics, and the ripple effects of economic decisions. Understanding these behaviors is crucial for forecasting economic trends and devising resilient strategies in response to varying scenarios. Through this simulation framework, the intricate landscape of macroeconomic activities can be explored comprehensively. From trade dynamics and investment patterns to consumption trends and labor market behaviors, the model provides a simulated environment to examine and evaluate diverse economic activities. This simulation serves as an invaluable tool for policymakers to test and assess the efficacy of different policy interventions in a controlled environment. By simulating policy scenarios and their potential impacts on various economic indicators, policymakers can fine-tune strategies, evaluate trade-offs, and anticipate unintended consequences before implementation. This proactive approach to policymaking helps in devising robust, adaptive policies conducive to sustainable economic growth.
EconAgent [83] serves as a representative example. It leverages LLMs to conduct experiments within a constructed macroeconomic simulation environment that encompasses labor markets, consumer markets, financial markets, and government taxation. Within this framework, many classic economic phenomena naturally emerge, including consumer market inflation, labor market unemployment, and income dynamics under taxation and redistribution mechanisms. Compared with traditional rule-based or neural network–based agents, EconAgent produces more realistic macroeconomic behaviors. Experimental results further demonstrate its capacity to reproduce patterns consistent with established economic laws and to exhibit human-like decision-making, such as sensitivity to unemployment, wage fluctuations, and market shortages.
In essence, our system’s LLM-based economic simulation platform offers a versatile and robust framework for investigating, understanding, and shaping macroeconomic behavior, activities, and policy outcomes.
5.4. Urban society
Understanding our society is the core of social sciences, for which the proposal and validation of theory highly relies on social experiments. Due to the high cost of real-world social experiments, the simulation is a very promising approach. There are two key perspectives in social simulation, as outlined by Gilbert [57]: 1) the dynamic interaction among individuals, and 2) the status evolving of the population. By simulating social activities, both researchers and practitioners gain the ability to forecast the future progression of individual behaviors and the overall status of populations. Moreover, these simulations provide experimental arenas where interventions can be implemented and their effects observed.
Numerous studies have empirically validated the simulation of urban societies in real-world contexts, including OpenCity [155], AgentSociety [112], YuLan-OneSim [139], and OASIS [158]. OpenCity leverages six global cities as case studies to assess the realism of LLM agents in simulating urban mobility and daily activities. AgentSociety conducts four types of social experiments: political polarization, the spread of inflammatory messages, universal basic income (UBI), and external shocks such as hurricanes. These experiments not only replicate well-documented social phenomena but also align closely with findings from real-world surveys, interviews, and intervention studies, thereby demonstrating the realism and interpretability of LLM-powered social simulators. YuLan-OneSim offers a library of 50 default scenarios spanning eight domains, including economics, sociology, political science, and psychology. Its evaluation encompasses both the validation of established social theories (e.g., the theory of planned behavior and social capital theory) and the fitting of simulated outcomes to real-world data, confirming the system’s ability to generate trends consistent with empirical observations. OASIS models various social phenomena on platforms such as X and Reddit, including information diffusion, group polarization, and herd effects. Experiments show that its results reliably reproduce actual societal dynamics, such as patterns of information spread and the differentiation of public opinion.
The applications supported by our system and agents can be summarized as follows. The simulation system, along with the LLM-driven agents, enables a deep dive into individual behaviors within social contexts. By emulating these behaviors, researchers and practitioners can forecast and comprehend how individual actions are driven by internal mechanisms and external contexts or factors. Beyond individual behaviors, the system facilitates the prediction of broader population dynamics. It offers insights into how collective behaviors, trends, and group interactions evolve over time, aiding in anticipating societal shifts and trends. The simulation serves as an experimental ground for testing interventions in simulated social environments. Researchers and practitioners can implement and study the impact of various interventions, policies, or changes within these controlled settings, providing crucial insights into potential real-world outcomes. Thus, policymakers can develop and evaluate policies in a simulated societal landscape. By testing proposed policies virtually, they can assess their potential effects and fine-tune strategies before real-world deployment. In real-world scenarios, emergency-related data is always sparse, which makes risk prevention challenging. By exploring different potential outcomes based on varying parameters, the government can prepare strategies to mitigate risks and adapt to changing circumstances, supported by the simulation system and LLM-driven agents in a simulated society.
6. Implementation, ethics, and governance of urban generative intelligence
6.1. Practical implementation strategies and challenges
The implementation of an Urban Generative Intelligence platform requires a systematic approach to integrating diverse data sources, simulation environments, and large language models. We identify three critical aspects of the implementation process:
- Data Access and Standardization: The foundation of UGI lies in the seamless integration of the Urban Knowledge Graph, which must encompass data from multiple domains, such as transportation, energy, health, and demographics. To ensure interoperability, the use of standardized ontologies and data exchange protocols is crucial. We propose a process that begins by harmonizing raw data into the UrbanKG schema, followed by the linking of this data to simulation-ready datasets through automated preprocessing.
- Simulation Environment Configuration: Dynamic, multimodal representations of urban systems can be constructed using advanced scenario configuration tools, such as Mirage Environment. By aligning UrbanKG entities with simulation objects and processes, Mirage enables flexible experimentation under various urban conditions, including transportation disruptions, disaster responses, and policy interventions.
- LLM Interface Deployment: The large language model (LLM) serves as an intermediary between the CityGPT agent, the UrbanKG, and the simulator, utilizing a natural language interface such as the MCP protocol. Through structured prompts and API-based tool usage, the LLM can query data, generate hypotheses, and perform simulation operations. This modular architecture reduces system complexity while enhancing scalability and user accessibility.
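The first step in the list above, harmonizing raw data into a common schema, can be sketched as a per-source field mapping that emits uniform (head, relation, tail) triples. The source names, field names and relation names below are hypothetical; a real deployment would take them from the UrbanKG ontology, which is not reproduced here.

```python
# Hypothetical per-source field mappings onto a shared set of target fields.
FIELD_MAPS = {
    "transit_feed": {"id": "stop_id", "name": "stop_name", "region": "district"},
    "poi_dump": {"id": "poi", "name": "title", "region": "area_code"},
}


def harmonize(source, record):
    """Convert one raw record into schema-aligned triples."""
    if source not in FIELD_MAPS:
        raise KeyError(f"no field mapping registered for source {source!r}")
    fmap = FIELD_MAPS[source]
    # Prefix the identifier with its source to avoid collisions across datasets.
    entity = f"{source}:{record[fmap['id']]}"
    return [
        (entity, "has_name", record[fmap["name"]]),
        (entity, "located_in", record[fmap["region"]]),
    ]
```

Because every source is reduced to the same triple form, downstream preprocessing that links the knowledge graph to simulation-ready datasets only needs to handle one format.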
Despite these opportunities, several technical challenges persist. Urban simulation and decision support systems often require real-time processing and feedback for timely responses. However, UGI systems involve extensive computational tasks, necessitating the processing of large volumes of data and the execution of complex simulation operations, all of which demand significant computing power. For LLMs, this computational requirement can grow exponentially. Additionally, the substantial computing load can lead to high latency. Furthermore, the data used in UGI systems is often heterogeneous, coming from various sources and formats, including structured data (e.g., traffic flow, energy consumption), semi-structured data (e.g., social media and news), and unstructured data (e.g., images and videos). This diversity in data types presents significant challenges. Effectively integrating, transforming, and utilizing this heterogeneous data remains a key obstacle to the implementation of smart city simulations and decision support systems.
6.2. Ethics, Privacy, and Social Impacts
The deployment of Urban Generative Intelligence platforms presents several ethical and social challenges. Urban data often includes sensitive information, such as travel trajectories, health indicators, and socioeconomic attributes. Without adequate protections, this data could expose individuals or vulnerable groups to surveillance or misuse. To mitigate these risks while preserving analytical utility, we recommend the use of techniques such as differential privacy, federated learning, and robust access control mechanisms. Furthermore, agents powered by LLMs are vulnerable to algorithmic bias, which can lead to unjust outcomes when simulating urban scenarios. For example, generative agents might inadvertently reinforce stereotypes or misrepresent the preferences of marginalized groups. To address this, we propose the integration of fairness-aware learning objectives, multi-stakeholder evaluation processes, and continuous bias audits into urban governance workflows. The legitimacy of AI-enabled urban governance hinges on citizen trust and active participation. Thus, transparent communication, explainable agent decisions, and participatory modeling workshops involving community stakeholders are essential for fostering social acceptance and ensuring equitable outcomes. In the digital age, urban resilience depends on the adaptability of information systems and the inclusiveness of governance frameworks [3,4]. These findings resonate with our design philosophy for a UGI platform that considers generative agents as participatory tools rather than autonomous decision-makers to enhance human-centered urban planning [5].
Social experiments have shown that LLM-simulated outcomes are strongly correlated with actual results (r = 0.85), even exceeding human performance [65]. Leveraging the cognitive constructs of attitude, subjective norm, and perceived behavioral control in the TPB, CoPB significantly enhances the ability of LLMs to reason about next-step movement intentions [126]. Specifically, CoPB reduces the error rate of mobility intentions from 57.8% to 19.4%. While synthetic urban data generated by LLM agents is highly scalable, its epistemic value, specifically how well it mirrors real-world human behavior, remains a subject of debate [135]. Agents may exhibit behaviors that appear reasonable but are ineffective, particularly in complex sociocultural contexts. Without proper validation or calibration using empirical data, synthetic outputs risk reinforcing simulation-driven misconceptions rather than contributing to evidence-based policy. We advocate for the cautious use of synthetic data and call for further research and attention in the future.
6.3. Policy Framework and Governance Model
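Of the privacy techniques mentioned above, differential privacy is the simplest to illustrate. The sketch below applies the Laplace mechanism to a count query, such as the number of trips between two districts; the function name and parameters are hypothetical, and the Laplace noise is drawn as the difference of two exponential samples using only the Python standard library.

```python
import random


def laplace_count(true_count, epsilon, sensitivity=1.0, rng=None):
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    # The difference of two i.i.d. exponential samples with this rate
    # follows a Laplace distribution with scale sensitivity / epsilon.
    rate = epsilon / sensitivity
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise
```

A smaller epsilon gives stronger privacy but noisier counts, so the value would need to be chosen against the accuracy required by the downstream simulation.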
For Urban Generative Intelligence to be effective, it requires a supportive institutional and policy environment. Currently, urban governance is fragmented across various institutions, making cross-sector coordination challenging. To address this, we recommend aligning urban governance frameworks with existing smart city strategies and national AI governance policies to ensure compliance while fostering innovation. Drawing inspiration from regulatory sandboxes in sectors like fintech and healthcare, we propose the creation of policy sandboxes for urban AI. These sandboxes would provide a controlled environment for testing new UGI applications, with limited scope and regulatory oversight. This approach allows regulators to observe potential risks in practice while encouraging experimentation and innovation. Urban intelligence should not be viewed as a unilateral responsibility of the government; rather, it must be developed collaboratively by government agencies, private enterprises, research institutions, and civil society. We advocate for a polycentric governance model, where responsibilities are decentralized and adaptable, ensuring both accountability and inclusiveness in the deployment of urban smart systems.
7. Discussion
7.1. Dive into complicated urban issues
Urban environments are dynamic and multifaceted, and they are increasingly confronted with a myriad of complex issues stemming from their intricate networks of physical, social, economic, and environmental factors. As mentioned in the Introduction, rapid urbanization exacerbates challenges like traffic congestion, environmental pollution, resource scarcity, and infrastructure strain, alongside socio-economic issues like social inequality and housing crises. Addressing these issues is crucial for sustainable, equitable urban development and for maintaining the vitality of cities in a global context.
To navigate these complexities, our proposed foundation platform of Urban Generative Intelligence (UGI) can foster emergent and sophisticated urban solutions. By leveraging multi-source urban data and LLM-empowered embodied agents, and by creating a realistic urban environment for interaction that goes beyond traditional sandboxes or virtual simulations, UGI enables deep, context-aware insights and a nuanced understanding of urban dynamics. Moreover, UGI allows intelligent solutions to emerge that address complex urban issues by combining cognitive capabilities similar to human intelligence with the power of computational intelligence.
While UGI holds promise in tackling urban complexities, several challenges necessitate further exploration. Bridging the gap between advanced technological capabilities and practical, real-world urban applications remains a crucial hurdle. This includes adapting UGI to rapidly evolving urban dynamics and policy landscapes. Moreover, there is a pressing need to develop advanced embodied agents for a more nuanced, systematic understanding of urban complexities, integrating the diverse social, economic, and environmental aspects of urban life. Additionally, adapting these solutions to the escalating challenges of rapid urbanization and climate change is vital for ensuring sustainable and resilient urban development. Addressing these problems is critical for the successful implementation and evolution of UGI, making it a truly transformative tool for urban problem-solving.
7.2. Scale up to large cities
Recent advancements in LLMs have opened new frontiers in simulating complex urban systems. Studies reveal that LLM agents, when personalized with diverse roles such as executives, engineers, and designers, can synergistically solve complex tasks like software development, making significant strides in designing, coding, testing, and documentation processes [113]. Scaling up these simulations by introducing more varied personas has been shown to be beneficial across various domains [192], which is particularly relevant to simulating large urban systems.
However, simulating large-scale societies of LLM agents that reflect the complex constraints of urban environments faces substantial computational challenges. Research efforts are geared towards optimizing the memory footprint and operational efficiency of these models [8,127]. Techniques like model compression through knowledge distillation and quantization have been proposed [79,191]. Specifically, in urban simulations, batch prompting has emerged as a crucial technique, enhancing efficiency by simulating multiple agents concurrently and showing up to a 5x improvement in inference time and cost [36]. AgentTorch [37] leverages simple LLM archetypes shared across a population to scale up the simulation of simple behaviors. Moreover, the MetaGPT framework, initially applied in virtual software companies, presents a promising approach for efficient multi-agent collaboration in urban simulations [66]. Its shared message pool and subscription mechanism offer significant reductions in resource consumption. Despite these advancements, simulating expansive urban societies with LLM agents remains a formidable challenge, limiting the full potential of these simulations. Successfully simulating large-scale urban environments with LLM agents could not only enhance performance in specific tasks but also mimic emergent properties of human societies, offering insights into complex urban dynamics [31]. Thus, achieving full-process acceleration in LLM agent simulations remains a critical, yet unresolved, task in urban science.
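The batch prompting idea mentioned above can be sketched as follows: several agents' queries are packed into one indexed prompt, a single model call is made per batch, and the indexed answers are split back out. This is an illustrative sketch rather than the implementation from [36]; the `Q[i]`/`A[i]` format, all function names, and the `fake_llm` stand-in (which replaces a real model call) are assumptions.

```python
import re
from typing import Callable, Dict, List


def build_batch_prompt(prompts: List[str]) -> str:
    """Pack several agent queries into one prompt with indexed slots."""
    lines = ["Answer each query on its own line as 'A[i]: <answer>'."]
    lines += [f"Q[{i}]: {p}" for i, p in enumerate(prompts, start=1)]
    return "\n".join(lines)


def parse_batch_response(text: str, n: int) -> List[str]:
    """Recover n indexed answers; missing slots become empty strings."""
    found: Dict[int, str] = {}
    for m in re.finditer(r"^A\[(\d+)\]:[ \t]*(.*)$", text, flags=re.MULTILINE):
        found[int(m.group(1))] = m.group(2).strip()
    return [found.get(i, "") for i in range(1, n + 1)]


def run_batched(
    prompts: List[str], call_llm: Callable[[str], str], batch_size: int = 4
) -> List[str]:
    """Issue one model call per batch instead of one call per agent."""
    answers: List[str] = []
    for start in range(0, len(prompts), batch_size):
        chunk = prompts[start:start + batch_size]
        reply = call_llm(build_batch_prompt(chunk))
        answers.extend(parse_batch_response(reply, len(chunk)))
    return answers


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model: echoes each query in upper case."""
    out = []
    for m in re.finditer(r"^Q\[(\d+)\]:[ \t]*(.*)$", prompt, flags=re.MULTILINE):
        out.append(f"A[{m.group(1)}]: {m.group(2).upper()}")
    return "\n".join(out)


# Five hypothetical agent queries answered with three calls instead of five.
agent_queries = ["go home", "take bus", "walk", "stay", "drive"]
results = run_batched(agent_queries, fake_llm, batch_size=2)
```

Larger batches reduce the number of calls but lengthen each prompt, so a real model may skip or mislabel slots. The parser therefore returns an empty string for any missing answer rather than shifting later answers out of position.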
7.3. Openness of the environment
As a foundational platform that integrates advanced technologies such as big data, simulation, and LLMs, UGI’s capabilities are not limited to providing realistic urban environments. With UGI’s open capabilities, users can transform environments based on real cities at will, or even create a new city. Specifically, users can adjust AOI, POI, and other data to change the urban spatial structure and land use types, and thereby the spatial distribution of urban functions. Based on the new urban spatial structure and functional distribution, users can use existing algorithms [120,121,166,168] to generate urban human activities under the new conditions. Users can also create human activities using their own algorithms or data sources. In addition, the city’s road network, infrastructure networks, and even its image data are all open, and users may make any modifications to their own copy. Through this openness, we hope that UGI will not only be used to build LLM-based agents in the city, but will also provide a full range of intelligence for the planning, design, and governance of future cities and promote multidisciplinary paradigm innovation in the urban field.
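To make the idea of editing a private copy of the city concrete, the sketch below rezones one AOI and replaces its POIs while leaving the base city untouched. The dictionary schema, identifiers, and function names are hypothetical and do not reflect the actual UGI data format.

```python
import copy
from collections import Counter
from typing import Dict, Iterable

# Hypothetical base city: AOIs carry a land use type, POIs point to an AOI.
base_city = {
    "aois": {
        "aoi_1": {"land_use": "residential"},
        "aoi_2": {"land_use": "industrial"},
    },
    "pois": [
        {"name": "school_a", "category": "education", "aoi": "aoi_1"},
        {"name": "plant_b", "category": "factory", "aoi": "aoi_2"},
    ],
}


def rezone(city: dict, aoi_id: str, new_land_use: str,
           new_pois: Iterable[Dict[str, str]] = ()) -> dict:
    """Return an edited copy of the city; the original is left unchanged."""
    edited = copy.deepcopy(city)
    edited["aois"][aoi_id]["land_use"] = new_land_use
    # Drop the POIs that belonged to the old land use, then add the new ones.
    edited["pois"] = [p for p in edited["pois"] if p["aoi"] != aoi_id]
    edited["pois"].extend({**p, "aoi": aoi_id} for p in new_pois)
    return edited


def land_use_mix(city: dict) -> Counter:
    """Count AOIs per land use type as a crude functional distribution."""
    return Counter(a["land_use"] for a in city["aois"].values())


edited_city = rezone(
    base_city, "aoi_2", "commercial",
    new_pois=[{"name": "mall_c", "category": "shopping"}],
)
```

The edited copy could then be passed to an activity generation algorithm of the user's choice, while `base_city` remains available as the reference scenario for comparison.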
7.4. Developer community
As a topic that integrates the latest achievements in big data, urban simulation, LLMs, and other fields, the development of UGI requires collaboration among researchers and developers from multiple fields. This calls for a multidisciplinary, collaborative UGI developer community. In this community, big data researchers can share their data sets, data processing methods, and data generation methods to provide high-quality data streams for UGI. People interested in urban simulation can add new functions to the open infrastructure, improve its computing performance, and design more reasonable interfaces. Large language model researchers can provide insights for the training of CityGPT. Researchers in urban-related fields, such as urban planning, traffic management, and economics, can build their own agents to solve domain-specific problems through programming or natural language interfaces. Such a highly interdisciplinary community will inspire many interesting ideas and research questions, help solve urban problems, and support smart and sustainable urban development.
7.5. Future outlook
The UGI platform serves as a foundational step in integrating LLMs with urban simulation environments. Its long-term vision is closely linked to the advancing capabilities of generative AI and digital twin technologies. We outline a three-phase roadmap that distinguishes current achievements, near-term potential, and long-term aspirations.
At present, UGI facilitates the deployment of LLM agents capable of basic conversational interaction and rule-based reasoning within simulated urban environments. These agents can access and interpret structured data from urban simulators like Mirage and perform simple query tasks using our urban knowledge graph, UrbanKG. The system currently supports use cases such as location recommendations, mobility behavior simulation, and policy scenario exploration, though these applications are limited by contextual depth and short-term memory.
In the near future, we anticipate substantial advancements in both agent intelligence and environmental fidelity. As AI evolves, agents will become more capable, enhancing UGI’s simulation abilities. Additionally, micro-agent behaviors will be more closely integrated with macro-city dynamics, allowing for more accurate feedback loops and the modeling of emergent behaviors. UGI is steadily progressing toward dynamic, adaptive urban ecosystems that can respond in real time to human inputs and environmental changes.
Looking further ahead, we envision UGI autonomously developing intelligent agents that can continuously learn and adapt within urban environments. These systems will evolve into high-resolution, real-time digital replicas of entire cities, simulating not just physical infrastructure but also social, economic, and ecological systems. These digital twins will serve as platforms for policy experimentation, disaster response training, and long-term urban forecasting. They will support comprehensive urban governance, enabling cross-sector optimization and resilient planning under uncertainty.
8. Conclusion
In conclusion, we propose an Urban Generative Intelligence (UGI) platform for modeling complex urban systems, bridging the gap between cutting-edge technological capabilities and practical urban system applications. By integrating LLMs with urban data and digital twins, UGI provides a nuanced, dynamic platform for the development and deployment of embodied urban agents with human-level intelligence. This foundational platform not only advances the field of urban science but also sets a new paradigm of generative intelligence in urban space. UGI’s comprehensive approach to modeling complex urban systems heralds a new era of intelligent, sustainable, and resilient urban development, paving the way for future cities that are more adaptive and responsive to the evolving needs of their inhabitants.
References
- 1. Acerbi A, Stubbersfield JM. Large language models show human-like content biases in transmission chain experiments. Proc Natl Acad Sci U S A. 2023;120(44):e2313790120. pmid:37883432
- 2. Acuto M, Parnell S, Seto KC. Building a global urban science. Nat Sustain. 2018;1(1):2–4.
- 3. Agboola OP. The Role of Artificial Intelligence in Enhancing Design Innovation and Sustainability. Smart Des Pol. 2024;1(1):6–14.
- 4. Agboola OP, Bashir FM, Dodo YA, Mohamed MAS, Alsadun ISR. The influence of information and communication technology (ict) on stakeholders’ involvement and smart urban sustainability. Environmental Advances. 2023;13:100431.
- 5. Agboola OP, Tunay M. Urban resilience in the digital age: The influence of Information-Communication Technology for sustainability. Journal of Cleaner Production. 2023;428:139304.
- 6. Aher VA, Arriaga RI, Kalai AT. Using large language models to simulate multiple humans and replicate human subject studies. In: International Conference on Machine Learning, 2023. 337–71.
- 7. Ajzen I. The theory of planned behavior. Organizational Behavior and Human Decision Processes. 1991;50(2):179–211.
- 8. Aminabadi RY, Rajbhandari S, Awan AA, Li C, Li D, Zheng E, et al. DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale. In: SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, 2022. 1–15. https://doi.org/10.1109/sc41404.2022.00051
- 9. An G, Mi Q, Dutta-Moscato J, Vodovotz Y. Agent-based models in translational systems biology. Wiley Interdiscip Rev Syst Biol Med. 2009;1(2):159–71. pmid:20835989
- 10. Apple. Apple vision. 2022.
- 11. Ardhianto P, Santosa YP, Moniaga C, Utami MP, Dewi C, Christanto HJ, et al. Generative Deep Learning for Visual Animation in Landscapes Design. Scientific Programming. 2023;2023:1–12.
- 12. Batty M. The size, scale, and shape of cities. Science. 2008;319(5864):769–71. pmid:18258906
- 13. Batty M. The new science of cities. MIT Press. 2013.
- 14. Batty M. Digital twins. 2018.
- 15. Batty M, Longley PA. Fractal cities: a geometry of form and function. Academic Press. 1994.
- 16. Batty M, Longley P, Fotheringham S. Urban Growth and Form: Scaling, Fractal Geometry, and Diffusion-Limited Aggregation. Environ Plan A. 1989;21(11):1447–72.
- 17. Baudrillard J. Simulacra and simulation. University of Michigan Press. 1994.
- 18. Berkowitz AL, Nilon CH, Hollweg KS. Understanding urban ecosystems: a new frontier for science and education. Springer Science & Business Media. 2003.
- 19. Bettencourt LMA. The origins of scaling in cities. Science. 2013;340(6139):1438–41. pmid:23788793
- 20. Bettencourt LMA. Introduction to urban science: evidence and theory of cities as complex systems. 2021.
- 21. Bettencourt LMA, Lobo J, Helbing D, Kühnert C, West GB. Growth, innovation, scaling, and the pace of life in cities. Proceedings of the National Academy of Sciences. 2007;104(17):7301–6.
- 22. Bi W, Cheng X, Xu B, Sun X, Xu L, Shen H. Bridged-gnn: Knowledge bridge learning for effective knowledge transfer. 2023. https://arxiv.org/abs/2308.09499
- 23. Binkley T. The Vitality of Digital Creation. The Journal of Aesthetics and Art Criticism. 1997;55(2):107–16.
- 24. Breheny MJ. Practical methods of retail location analysis: a review. Store choice, store location and market analysis. 1988. 39–86.
- 25. Brelsford C, Lobo J, Hand J, Bettencourt LMA. Heterogeneity and scale of sustainable development in cities. Proc Natl Acad Sci U S A. 2017;114(34):8963–8. pmid:28461489
- 26. Brelsford C, Martin T, Hand J, Bettencourt LMA. Toward cities without slums: Topology and the spatial evolution of neighborhoods. Sci Adv. 2018;4(8):eaar4644. pmid:30167459
- 27. Brock WA, Hommes CH. Heterogeneous beliefs and routes to chaos in a simple asset pricing model. Journal of Economic Dynamics and Control. 1998;22(8–9):1235–74.
- 28. Brown T, Mann B, Ryder N, Subbiah M, Kaplan JD, Dhariwal P, et al. Language models are few-shot learners. Advances in neural information processing systems. 2020;33:1877–901.
- 29. Bubeck S, Chandrasekaran V, Eldan R, Gehrke J, Horvitz E, Kamar E, et al. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint. 2023.
- 30. Calabrese F, Ferrari L, Blondel VD. Urban Sensing Using Mobile Phone Network Data: A Survey of Research. ACM Comput Surv. 2014;47(2):1–20.
- 31. Caldarelli G, Arcaute E, Barthelemy M, Batty M, Gershenson C, Helbing D, et al. The role of complexity for digital twins of cities. Nature Computational Science. 2023;:1–8.
- 32. Capponi A, Fiandrino C, Kantarci B, Foschini L, Kliazovich D, Bouvry P. A Survey on Mobile Crowdsensing Systems: Challenges, Solutions, and Opportunities. IEEE Commun Surv Tutorials. 2019;21(3):2419–65.
- 33. Chen L, Xu F, Han Z, Tang K, Hui P, Evans J, et al. Strategic COVID-19 vaccine distribution can simultaneously elevate social utility and equity. Nat Hum Behav. 2022;6(11):1503–14. pmid:36008683
- 34. Chen L, Xu F, Li N, Han Z, Wang M, Li Y, et al. Large Language Model-driven Meta-structure Discovery in Heterogeneous Information Network. In: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024. 307–18. https://doi.org/10.1145/3637528.3671965
- 35. Chen M. The philosophy of the metaverse. Ethics Inf Technol. 2023;25(3).
- 36. Cheng Z, Kasai J, Yu T. Batch prompting: efficient inference with large language model apis. arXiv preprint. 2023.
- 37. Chopra A, Kumar S, Giray-Kuru N, Raskar R, Quera Bofarull A. On the limits of agency in agent-based models. arXiv preprint arXiv:2409.10568. 2024.
- 38. Christiano LJ, Eichenbaum M, Evans CL. Nominal Rigidities and the Dynamic Effects of a Shock to Monetary Policy. Journal of Political Economy. 2005;113(1):1–45.
- 39. Cui D, Mousas C. Evaluating the Sense of Embodiment through Out-of-Body Experience and Tactile Feedback. In: Proceedings of the 18th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry, 2022. 1–7. https://doi.org/10.1145/3574131.3574456
- 40. Deng C, Zhang T, He Z, Chen Q, Shi Y, Zhou L, et al. Learning a foundation language model for geoscience knowledge understanding and utilization. arXiv preprint. 2023.
- 41. Devarakonda S, Sevusu P, Liu H, Liu R, Iftode L, Nath B. Real-time air quality monitoring through mobile sensing in metropolitan areas. In: Proceedings of the 2nd ACM SIGKDD International Workshop on Urban Computing, 2013. 1–8. https://doi.org/10.1145/2505821.2505834
- 42. Dincelli E, Yayla A. Immersive virtual reality in the age of the Metaverse: A hybrid-narrative review based on the technology affordance perspective. The Journal of Strategic Information Systems. 2022;31(2):101717.
- 43. Driess D, Xia F, Sajjadi MSM, Lynch C, Chowdhery A, Ichter B, et al. PaLM-E: An embodied multimodal language model. arXiv preprint. 2023.
- 44. Duncan SC. Minecraft, beyond construction and survival. 2011.
- 45. Eppstein D. Finding the k Shortest Paths. SIAM J Comput. 1998;28(2):652–73.
- 46. Esser P, Chiu J, Atighehchian P, Granskog J, Germanidis A. Structure and Content-Guided Video Synthesis with Diffusion Models. In: 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023. 7312–22. https://doi.org/10.1109/iccv51070.2023.00675
- 47. Farmer JD, Foley D. The economy needs agent-based modelling. Nature. 2009;460(7256):685–6. pmid:19661896
- 48. Tao F, Zhang C, Qi Q, Zhang H. Digital twin maturity model. Computer Integrated Manufacturing Systems. 2022;28(5):1–20.
- 49. Feng J, Du Y, Liu T, Guo S, Lin Y, Li Y. Citygpt: Empowering urban spatial cognition of large language models. arXiv preprint. 2024.
- 50. Feng J, Li Y, Zhang C, Sun F, Meng F, Guo A, et al. Deepmove: Predicting human mobility with attentional recurrent networks. In: Proceedings of the 2018 World Wide Web Conference, 2018. 1459–68.
- 51. Feng J, Yang Z, Xu F, Yu H, Wang M, Li Y. Learning to Simulate Human Mobility. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020. 3426–33. https://doi.org/10.1145/3394486.3412862
- 52. Fjelland R. Why general artificial intelligence will not be realized. Humanit Soc Sci Commun. 2020;7(1).
- 53. Gao C, Lan X, Li N, Ding J, Yuan Y, Zhou Z, et al. Large language models empowered agent-based modeling and simulation: A survey and perspectives. arXiv preprint. 2023.
- 54. Gao C, Lan X, Lu Z, Mao J, Piao J, Wang H, et al. S3: Social-network simulation system with large language model empowered agents. arXiv preprint. 2023.
- 55. Geanakoplos J. The Leverage Cycle. NBER Macroeconomics Annual. 2010;24(1):1–66.
- 56. Ghaderi A, Sanandaji MBM, Ghaderi F. Deep forecast: Deep learning-based spatio-temporal forecasting. arXiv preprint. 2017.
- 57. Gilbert N, Troitzsch K. Simulation for the social scientist. McGraw-Hill Education (UK). 2005.
- 58. Goh ES, Sunar MS, Ismail AW. 3d object manipulation techniques in handheld mobile augmented reality interface: A review. IEEE Access. 2019;7:40581–601.
- 59. Gonzalez MC, Hidalgo CA, Barabasi A-L. Understanding individual human mobility patterns. Nature. 2008;453(7196):779–82.
- 60. Guo B, Liu Y, Liu S, Yu Z, Zhou X. CrowdHMT: Crowd Intelligence With the Deep Fusion of Human, Machine, and IoT. IEEE Internet Things J. 2022;9(24):24822–42.
- 61. Guo S, Lin Y, Feng N, Song C, Wan H. Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting. AAAI. 2019;33(01):922–9.
- 62. Hauser MD, Chomsky N, Fitch WT. The faculty of language: what is it, who has it, and how did it evolve?. Science. 2002;298(5598):1569–79.
- 63. Hayashi F. Econometrics. Princeton University Press. 2011.
- 64. Hendry DF, Richard J-F. On the formulation of empirical models in dynamic econometrics. Journal of Econometrics. 1982;20(1):3–33.
- 65. Hewitt L, Ashokkumar A, Ghezae I, Willer R. Predicting results of social science experiments using large language models. In: Preprint, 2024.
- 66. Hong S, Zhuge M, Chen J, Zheng X, Cheng Y, Zhang C, et al. Metagpt: Meta programming for multi-agent collaborative framework. arXiv preprint. 2023.
- 67. Hou X, Zhao Y, Wang S, Wang H. Model context protocol (mcp): Landscape, security threats, and future research directions. arXiv preprint arXiv:2503.23278. 2025.
- 68. Huang H, Zheng O, Wang D, Yin J, Wang Z, Ding S, et al. ChatGPT for shaping the future of dentistry: the potential of multi-modal large language model. Int J Oral Sci. 2023;15(1):29. pmid:37507396
- 69. Huang W, Zheng H. Architectural Drawings Recognition and Generation through Machine Learning. In: ACADIA proceedings, 2018. 156–65. https://doi.org/10.52842/conf.acadia.2018.156
- 70. Ji S, Pan S, Cambria E, Marttinen P, Yu PS. A Survey on Knowledge Graphs: Representation, Acquisition, and Applications. IEEE Trans Neural Netw Learn Syst. 2022;33(2):494–514. pmid:33900922
- 71. Jiang S, Yang Y, Gupta S, Veneziano D, Athavale S, González MC. The TimeGeo modeling framework for urban mobility without travel surveys. Proc Natl Acad Sci U S A. 2016;113(37):E5370-8. pmid:27573826
- 72. Jin C, Wu F, Wang J, Liu J, Guan Z, Han Z. Metamgc: a music generation framework for concerts in metaverse. EURASIP Journal on Audio, Speech, and Music Processing. 2022;2022(1):31.
- 73. Karamshuk D, Noulas A, Scellato S, Nicosia V, Mascolo C. Geo-spotting: mining online location-based services for optimal retail store placement. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 2013. 793–801.
- 74. Kim H, Choi H, Kang H, An J, Yeom S, Hong T. A systematic review of the smart energy conservation system: From smart homes to sustainable smart cities. Renewable and Sustainable Energy Reviews. 2021;140:110755.
- 75. Kim J, Jeong Y, Stengel M, Aksit K, Albert R, Boudaoud B, et al. Foveated ar: dynamically-foveated augmented reality display. ACM Transactions on Graphics. 2019;38(4):99:1-99:14.
- 76. Kondo R, Sugimoto M, Minamizawa K, Hoshi T, Inami M, Kitazaki M. Illusory body ownership of an invisible body interpolated between virtual hands and feet via visual-motor synchronicity. Sci Rep. 2018;8(1):7541. pmid:29765152
- 77. Kugler L. Non-fungible tokens and the future of art. Commun ACM. 2021;64(9):19–20.
- 78. Kumar V, Karande K. The Effect of Retail Store Environment on Retailer Performance. Journal of Business Research. 2000;49(2):167–81.
- 79. Kwon W, Li Z, Zhuang S, Sheng Y, Zheng L, Yu CH, et al. Efficient Memory Management for Large Language Model Serving with PagedAttention. In: Proceedings of the 29th Symposium on Operating Systems Principles, 2023. 611–26. https://doi.org/10.1145/3600006.3613165
- 80. Kydland FE, Prescott EC. Time to Build and Aggregate Fluctuations. Econometrica. 1982;50(6):1345.
- 81. Lee L-H, Braud T, Zhou P, Wang L, Xu D, Lin Z, et al. All one needs to know about metaverse: A complete survey on technological singularity, virtual ecosystem, and research agenda. arXiv preprint. 2021.
- 82. Li F, Feng J, Yan H, Jin G, Yang F, Sun F, et al. Dynamic Graph Convolutional Recurrent Network for Traffic Prediction: Benchmark and Solution. ACM Trans Knowl Discov Data. 2023;17(1):1–21.
- 83. Li N, Gao C, Li M, Li Y, Liao Q. EconAgent: Large Language Model-Empowered Agents for Simulating Macroeconomic Activities. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024. 15523–36. https://doi.org/10.18653/v1/2024.acl-long.829
- 84. Li N, Gao C, Li Y, Liao Q. Large language model-empowered agents for simulating macroeconomic activities. arXiv preprint. 2023.
- 85. Li N, Guo B, Liu Y, Jing Y, Ouyang Y, Yu Z. Commercial Site Recommendation Based on Neural Collaborative Filtering. In: Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers, 2018. 138–41. https://doi.org/10.1145/3267305.3267592
- 86. Li Q, Zheng Y, Xie X, Chen Y, Liu W, Ma W-Y. Mining user similarity based on location history. In: Proceedings of the 16th ACM SIGSPATIAL international conference on Advances in geographic information systems, 2008. 1–10. https://doi.org/10.1145/1463434.1463477
- 87. Li S, Feng J, Chi J, Hu X, Zhao X, Xu F. Limp: Large language model enhanced intent-aware mobility prediction. 2024. https://arxiv.org/abs/2408.12832
- 88. Li Z, Zhou W, Chiang Y-Y, Chen M. Geolm: Empowering language models for geospatially grounded language understanding. arXiv preprint. 2023. https://arxiv.org/abs/2310.14478
- 89. Li Z, Xia L, Tang J, Xu Y, Shi L, Xia L, et al. UrbanGPT: Spatio-Temporal Large Language Models. In: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024. 5351–62. https://doi.org/10.1145/3637528.3671578
- 90. Liu L, Dugas D, Cesari G, Siegwart R, Dube R. Robot Navigation in Crowded Environments Using Deep Reinforcement Learning. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020. 5671–7. https://doi.org/10.1109/iros45743.2020.9341540
- 91. Liu Q, Chang C, Shen H, Cheng S, Li X, Zheng R. Research on artificial intelligence generated audio. In: Sixth International Conference on Computer Information Science and Application Technology (CISAT 2023), 2023. 1206–12.
- 92. Liu Y, Yao L, Guo B, Li N, Zhang J, Chen J, et al. DeepStore: An Interaction-Aware Wide&Deep Model for Store Site Recommendation With Attentional Spatial Embeddings. IEEE Internet Things J. 2019;6(4):7319–33.
- 93. Liu Y, Ding J, Fu Y, Li Y. UrbanKG: An Urban Knowledge Graph System. ACM Trans Intell Syst Technol. 2023;14(4):1–25.
- 94. Liu Y, Ding J, Li Y. Knowledge-driven site selection via urban knowledge graph. arXiv preprint. 2021.
- 95. Lou Y, Zhang C, Zheng Y, Xie X, Wang W, Huang Y. Map-matching for low-sampling-rate GPS trajectories. In: Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2009. 352–61. https://doi.org/10.1145/1653771.1653820
- 96. Lv Z. Generative artificial intelligence in the metaverse era. Cognitive Robotics. 2023;3:208–17.
- 97. Lyon L, Driskell R. The community in urban society. Waveland Press. 2011.
- 98. Macy MW, Willer R. From Factors to Actors: Computational Sociology and Agent-Based Modeling. Annu Rev Sociol. 2002;28(1):143–66.
- 99. Makse HA, Havlin S, Stanley HE. Modelling urban growth patterns. Nature. 1995;377(6550):608–12.
- 100. Mannion P, Duggan J, Howley E. An Experimental Review of Reinforcement Learning Algorithms for Adaptive Traffic Signal Control. Autonomic Road Transport Support Systems. Springer International Publishing. 2016. p. 47–66. https://doi.org/10.1007/978-3-319-25808-9_4
- 101. Manvi R, Khanna S, Mai G, Burke M, Lobell D, Ermon S. Geollm: Extracting geospatial knowledge from large language models. arXiv preprint. 2023.
- 102. McLane AJ, Semeniuk C, McDermid GJ, Marceau DJ. The role of agent-based models in wildlife ecology and management. Ecological Modelling. 2011;222(8):1544–56.
- 103. Meta. Meta quest. 2022.
- 104. Mohd Noor N, Abdullah A, Hashim M. Remote sensing UAV/drones and its applications for urban areas: a review. IOP Conf Ser: Earth Environ Sci. 2018;169:012003.
- 105. Mu Y, Zhang Q, Hu M, Wang W, Ding M, Jin J, et al. Embodiedgpt: Vision-language pre-training via embodied chain of thought. arXiv preprint. 2023.
- 106. Nilforoshan H, Looi W, Pierson E, Villanueva B, Fishman N, Chen Y, et al. Human mobility networks reveal increased segregation in large cities. Nature. 2023;624(7992):586–92. pmid:38030732
- 107. Ning H, Wang H, Lin Y, Wang W, Dhelim S, Farha F, et al. A Survey on the Metaverse: The State-of-the-Art, Technologies, Applications, and Challenges. IEEE Internet Things J. 2023;10(16):14671–88.
- 108. OpenAI. Introducing chatgpt. https://openai.com/blog/chatgpt 2022. 2023 January 10.
- 109. Park JS, O’Brien J, Cai CJ, Morris MR, Liang P, Bernstein MS. Generative agents: Interactive simulacra of human behavior. In: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023. 1–22.
- 110. Phelps ES. Phillips Curves, Expectations of Inflation and Optimal Unemployment over Time. Economica. 1967;34(135):254.
- 111. Phelps NA, Wood AM. The business of location: site selection consultants and the mobilisation of knowledge in the location decision. Journal of Economic Geography. 2017;18(5):1023–44.
- 112. Piao J, Yan Y, Zhang J, Li N, Yan J, Lan X, et al. Agentsociety: Large-scale simulation of llm-driven generative agents advances understanding of human behaviors and society. 2025. https://arxiv.org/abs/2502.08691
- 113. Qian C, Liu W, Liu H, Chen N, Dang Y, Li J, et al. ChatDev: Communicative Agents for Software Development. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024. 15174–86. https://doi.org/10.18653/v1/2024.acl-long.810
- 114. Qin HX, Hui P. Empowering the Metaverse with Generative AI: Survey and Future Directions. In: 2023 IEEE 43rd International Conference on Distributed Computing Systems Workshops (ICDCSW), 2023. 85–90. https://doi.org/10.1109/icdcsw60045.2023.00022
- 115. Rana RK, Chou CT, Kanhere SS, Bulusu N, Hu W. Earphone: an end-to-end participatory urban noise mapping system. In: Proceedings of the 9th ACM/IEEE international conference on information processing in sensor networks, 2010. 105–16.
- 116. Rashid B, Rehmani MU. Applications of wireless sensor networks for urban areas: A survey. Journal of Network and Computer Applications. 2016;60:192–219.
- 117. Richardson AJ, Ampt ES, Meyburg AH. Survey methods for transport planning. Melbourne: Eucalyptus Press. 1995.
- 118. Robert F. Analysing and Understanding Embodied Interactions in Virtual Reality Systems. In: Proceedings of the 2023 ACM International Conference on Interactive Media Experiences, 2023. 386–9. https://doi.org/10.1145/3573381.3597234
- 119. Rombach R, Blattmann A, Lorenz D, Esser P, Ommer B. High-Resolution Image Synthesis with Latent Diffusion Models. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 10674–85. https://doi.org/10.1109/cvpr52688.2022.01042
- 120. Rong C, Ding J, Li Y. An interdisciplinary survey on origin-destination flows modeling: theory and techniques. arXiv preprint. 2023.
- 121. Rong C, Ding J, Liu Z, Li Y. Complexity-aware large scale origin-destination network generation via diffusion model. arXiv preprint. 2023.
- 122. Salamone F, Masullo M, Sibilio S. Wearable Devices for Environmental Monitoring in the Built Environment: A Systematic Review. Sensors (Basel). 2021;21(14):4727. pmid:34300467
- 123. Schelling TC. Micromotives and macrobehavior. WW Norton & Company. 2006.
- 124. Schläpfer M, Bettencourt LMA, Grauwin S, Raschke M, Claxton R, Smoreda Z, et al. The scaling of human interactions with city size. J R Soc Interface. 2014;11(98):20130789. pmid:24990287
- 125. Shang Y, Li Y, Xu F, Li Y. Defint: A default-interventionist framework for efficient reasoning with hybrid large language models. arXiv preprint. 2024.
- 126. Shao C, Xu F, Fan B, Ding J, Yuan Y, Wang M, et al. Beyond imitation: Generating human mobility from context-aware reasoning with large language models. arXiv preprint. 2024.
- 127. Sheng Y, Zheng L, Yuan B, Li Z, Ryabinin M, Chen B, et al. FlexGen: High-throughput generative inference of large language models with a single GPU. In: International Conference on Machine Learning, 2023. 31094–116.
- 128. Silva A, Gombolay M, Killian T, Jimenez I, Son S-H. Optimization methods for interpretable differentiable decision trees applied to reinforcement learning. In: International conference on artificial intelligence and statistics, 2020. 1855–65.
- 129. Smets F, Wouters R. An Estimated Dynamic Stochastic General Equilibrium Model of the Euro Area. Journal of the European Economic Association. 2003;1(5):1123–75.
- 130. Song C, Koren T, Wang P, Barabási A-L. Modelling the scaling properties of human mobility. Nature Physics. 2010;6(10):818–23.
- 131. Stopher P. Collecting, managing, and assessing data using sample surveys. Cambridge University Press. 2012.
- 132. Taori R, Gulrajani I, Zhang T, Dubois Y, Li X, Guestrin C, et al. Stanford Alpaca: An instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca. 2023.
- 133. Tesfatsion L, Judd KL. Handbook of computational economics: agent-based computational economics. Elsevier. 2006.
- 134. Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint. 2023.
- 135. Wang A, Morgenstern J, Dickerson JP. Large language models that replace human participants can harmfully misportray and flatten identity groups. Nat Mach Intell. 2025;7(3):400–11.
- 136. Wang D, Zhang J, Cao W, Li J, Zheng Y. When Will You Arrive? Estimating Travel Time Based on Deep Neural Networks. AAAI. 2018;32(1).
- 137. Wang G, Xie Y, Jiang Y, Mandlekar A, Xiao C, Zhu Y, et al. Voyager: An open-ended embodied agent with large language models. arXiv preprint. 2023.
- 138. Wang H, Yu Q, Liu Y, Jin D, Li Y. Spatio-Temporal Urban Knowledge Graph Enabled Mobility Prediction. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2021;5(4):1–24.
- 139. Wang L, Gao H, Bo X, Chen X, Wen J-R. Yulan-onesim: Towards the next generation of social simulator with large language models. arXiv preprint. 2025.
- 140. Wang L, Ma C, Feng X, Zhang Z, Yang H, Zhang J, et al. A survey on large language model based autonomous agents. 2023.
- 141. Wang Y, Zheng Y, Xue Y. Travel time estimation of a path using sparse trajectories. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, 2014. 25–34. https://doi.org/10.1145/2623330.2623656
- 142. Wei H, Xu N, Zhang H, Zheng G, Zang X, Chen C, et al. CoLight: Learning network-level cooperation for traffic signal control. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019. 1913–22.
- 143. Wei H, Zheng G, Yao H, Li Z. IntelliLight: A reinforcement learning approach for intelligent traffic light control. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018. 2496–505.
- 144. Wei J, Tay Y, Bommasani R, Raffel C, Zoph B, Borgeaud S, et al. Emergent abilities of large language models. arXiv preprint. 2022.
- 145. Wilson M. Six views of embodied cognition. Psychon Bull Rev. 2002;9(4):625–36. pmid:12613670
- 146. Wolfram S. Statistical mechanics of cellular automata. Rev Mod Phys. 1983;55(3):601–44.
- 147. Xi Z, Chen W, Guo X, He W, Ding Y, Hong B, et al. The rise and potential of large language model based agents: A survey. arXiv preprint. 2023.
- 148. Xu F, Li Y, Jin D, Lu J, Song C. Emergence of urban growth patterns from human mobility behavior. Nat Comput Sci. 2021;1(12):791–800. pmid:38217178
- 149. Xu F, Li Y, Wang H, Zhang P, Jin D. Understanding Mobile Traffic Patterns of Large Scale Cellular Towers in Urban Environment. IEEE/ACM Trans Networking. 2017;25(2):1147–61.
- 150. Xu F, Li Y, Xu S. Attentional Multi-graph Convolutional Network for Regional Economy Prediction with Open Migration Data. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020. 2225–33. https://doi.org/10.1145/3394486.3403273
- 151. Xu F, Lin Z, Xia T, Guo D, Li Y. Sume: Semantic-enhanced urban mobility network embedding for user demographic inference. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies. 2020;4(3):1–25.
- 152. Xu F, Zhang P, Li Y. Context-aware real-time population estimation for metropolis. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, 2016. 1064–75. https://doi.org/10.1145/2971648.2971673
- 153. Xu M, Wang T, Wu Z, Zhou J, Li J, Wu H. Demand driven store site selection via multiple spatial-temporal data. In: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2016. 1–10. https://doi.org/10.1145/2996913.2996996
- 154. Yamada T, Tanaka T, Sagawa Y. One-Handed Character Input Method Without Screen Cover for Smart Glasses that Does not Require Visual Confirmation of Fingertip Position. Lecture Notes in Computer Science. Springer Nature Switzerland. 2023. 603–14. https://doi.org/10.1007/978-3-031-35596-7_39
- 155. Yan Y, Zeng Q, Zheng Z, Yuan J, Feng J, Zhang J, et al. OpenCity: A scalable platform to simulate urban activities with massive LLM agents. arXiv preprint. 2024.
- 156. Yang L, Chen H, Li Z, Ding X, Wu X. ChatGPT is not enough: Enhancing large language models with knowledge graphs for fact-aware language modeling. arXiv preprint. 2023.
- 157. Yang XX. Urban remote sensing: monitoring, synthesis and modeling in the urban environment. John Wiley & Sons. 2021.
- 158. Yang Z, Zhang Z, Zheng Z, Jiang Y, Gan Z, Wang Z, et al. Oasis: Open agent social interaction simulations with one million agents. arXiv preprint. 2024.
- 159. Yao H, Wu F, Ke J, Tang X, Jia Y, Lu S, et al. Deep Multi-View Spatial-Temporal Network for Taxi Demand Prediction. AAAI. 2018;32(1).
- 160. Yap JYL, Ho CC, Ting C-Y. Analytic hierarchy process (AHP) for business site selection. In: AIP Conference Proceedings, 2018. 020151. https://doi.org/10.1063/1.5055553
- 161. Youn H, Bettencourt LMA, Lobo J, Strumsky D, Samaniego H, West GB. Scaling and universality in urban economic diversification. J R Soc Interface. 2016;13(114):20150937. pmid:26790997
- 162. Yuan B, Deng Z, Geng N, Chen Y, Hu H. Practice Summary: Cainiao Optimizes the Fulfillment Routes of Parcels. INFORMS Journal on Applied Analytics. 2023;53(6):446–50.
- 163. Yuan J, Zheng Y, Xie X. Discovering regions of different functions in a city using human mobility and POIs. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 2012. 186–94. https://doi.org/10.1145/2339530.2339561
- 164. Yuan J, Zheng Y, Zhang C, Xie W, Xie X, Sun G, et al. T-drive: driving directions based on taxi trajectories. In: Proceedings of the 18th SIGSPATIAL International conference on advances in geographic information systems, 2010. 99–108.
- 165. Yuan Y, Ding J, Wang H, Jin D, Li Y. Activity Trajectory Generation via Modeling Spatiotemporal Dynamics. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022. 4752–62. https://doi.org/10.1145/3534678.3542671
- 166. Yuan Y, Ding J, Wang H, Jin D, Li Y. Activity Trajectory Generation via Modeling Spatiotemporal Dynamics. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022. 4752–62. https://doi.org/10.1145/3534678.3542671
- 167. Yuan Y, Wang H, Ding J, Jin D, Li Y. Learning to simulate daily activities via modeling dynamic human needs. arXiv preprint. 2023.
- 168. Yuan Y, Wang H, Ding J, Jin D, Li Y. Learning to Simulate Daily Activities via Modeling Dynamic Human Needs. In: Proceedings of the ACM Web Conference 2023, 2023. 906–16. https://doi.org/10.1145/3543507.3583276
- 169. Zeng A, Liu X, Du Z, Wang Z, Lai H, Ding M, et al. GLM-130B: An open bilingual pre-trained model. 2023.
- 170. Zeng Q, Yang Q, Dong S, Du H, Zheng L, Xu F, et al. Perceive, reflect, and plan: Designing LLM agent for goal-directed city navigation without instructions. arXiv preprint. 2024.
- 171. Zhang H, Du W, Shan J, Zhou Q, Du Y, Tenenbaum JB, et al. Building cooperative embodied agents modularly with large language models. arXiv preprint. 2023.
- 172. Zhang J, Ao W, Jin D, Liu L, Li Y. A city-level high-performance spatio-temporal mobility simulation system. 2023.
- 173. Zhang J, Jin D, Li Y. Mirage: an efficient and extensible city simulation framework. In: Proceedings of the 30th International Conference on Advances in Geographic Information Systems, 2022. 1–4.
- 174. Zhang J, Zheng Y, Qi D. Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction. AAAI. 2017;31(1).
- 175. Zhang M, Fu H, Li Y, Chen S. Understanding Urban Dynamics From Massive Mobile Traffic Data. IEEE Trans Big Data. 2019;5(2):266–78.
- 176. Zhang S, Fu D, Zhang Z, Yu B, Cai P. TrafficGPT: Viewing, processing and interacting with traffic foundation models. arXiv preprint. 2023.
- 177. Zhang Y, Wei C, Wu S, He Z, Yu W. GeoGPT: Understanding and processing geospatial tasks through an autonomous GPT. arXiv preprint. 2023.
- 178. Zhang Y, Xu F, Chen L, Yuan Y, Evans J, Bettencourt L, et al. Counterfactual mobility network embedding reveals prevalent accessibility gaps in U.S. cities. Humanit Soc Sci Commun. 2024;11(1).
- 179. Zhang Y, Xu F, Xia T, Li Y. Quantifying the Causal Effect of Individual Mobility on Health Status in Urban Space. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2021;5(4):1–30.
- 180. Zhao WX, Zhou K, Li J, Tang T, Wang X, Hou Y, et al. A survey of large language models. arXiv preprint. 2023.
- 181. Zheng S, Trott A, Srinivasa S, Parkes DC, Socher R. The AI Economist: Taxation policy design via two-level deep multiagent reinforcement learning. Sci Adv. 2022;8(18):eabk2607. pmid:35507657
- 182. Zheng Y, Capra L, Wolfson O, Yang H. Urban computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology. 2014;5(3):1–55.
- 183. Zheng Y, Li Q, Chen Y, Xie X, Ma W-Y. Understanding mobility based on GPS data. In: Proceedings of the 10th international conference on Ubiquitous computing, 2008. 312–21. https://doi.org/10.1145/1409635.1409677
- 184. Zheng Y, Lin Y, Zhao L, Wu T, Jin D, Li Y. Spatial planning of urban communities via deep reinforcement learning. Nat Comput Sci. 2023;3(9):748–62. pmid:38177774
- 185. Zheng Y, Liu Y, Yuan J, Xie X. Urban computing with taxicabs. In: Proceedings of the 13th international conference on Ubiquitous computing, 2011. 89–98. https://doi.org/10.1145/2030112.2030126
- 186. Zheng Y, Xie X, Ma W-Y. Geolife: A collaborative social networking service among user, location and trajectory. IEEE Data Eng Bull. 2010;33(2):32–9.
- 187. Zheng Y, Zhang L, Xie X, Ma W-Y. Mining interesting locations and travel sequences from GPS trajectories. In: Proceedings of the 18th international conference on World wide web, 2009. 791–800. https://doi.org/10.1145/1526709.1526816
- 188. Zhou X, Zhu H, Mathur L, Zhang R, Yu H, Qi Z, et al. Sotopia: Interactive evaluation for social intelligence in language agents. arXiv preprint. 2023.
- 189. Zhou Y, Liu C. The logic and innovation of building digital twin city in Xiong’an New Area. Urban Development Studies. 2018;25(10):60–7.
- 190. Zhou Z, Liu Y, Ding J, Jin D, Li Y. Hierarchical Knowledge Graph Learning Enabled Socioeconomic Indicator Prediction in Location-Based Social Network. In: Proceedings of the ACM Web Conference 2023, 2023. 122–32. https://doi.org/10.1145/3543507.3583239
- 191. Zhu X, Li J, Liu Y, Ma C, Wang W. A survey on model compression for large language models. arXiv preprint. 2023.
- 192. Zhuge M, Liu H, Faccio F, Ashley DR, Csordás R, Gopalakrishnan A, et al. Mindstorms in natural language-based societies of mind. arXiv preprint. 2023.