Spatial characteristics, similarity in business scope, and expansion network of listed companies in China’s food manufacturing industry

  • Enkang Li,

    Roles Conceptualization, Formal analysis, Funding acquisition, Software, Validation, Writing – original draft, Writing – review & editing

    Affiliation School of Architectural Engineering, Jinling Institute of Technology, Nanjing, China

  • Yingyi Ma,

    Roles Conceptualization, Project administration, Supervision, Visualization

    mayingyi@jit.edu.cn

    Affiliation School of Architectural Engineering, Jinling Institute of Technology, Nanjing, China

  • Bo Hu,

    Roles Data curation, Investigation, Resources, Visualization

    Affiliation School of Architectural Engineering, Jinling Institute of Technology, Nanjing, China

  • Wen Zhong,

    Roles Data curation, Methodology

    Affiliation School of Architectural Engineering, Jinling Institute of Technology, Nanjing, China

  • Ruoyan Zhang

    Roles Validation

    Affiliation School of Architectural Engineering, Jinling Institute of Technology, Nanjing, China

Abstract

Using a multi-source dataset of 777 listed companies in the food manufacturing industry (LCFMIs), this study examines their spatial and relational organization in China. Key findings are: 1) The geographical distribution of LCFMIs closely aligns with China’s macro-economic landscape, with more developed regions hosting a greater concentration and larger scale of firms. 2) While similarity in business scope (SBS) is generally low, a significant distance-decay effect is noted, particularly within the first distance band (“0 ~ xx km”); the resulting SBS network demonstrates small-world properties and distinct clustering patterns based on both geography and industrial sub-categories, highlighting the interplay between spatial and cognitive proximity. 3) The inter-city recruitment network, which reflects the spatial expansion of corporate influence, is predominantly shaped by transportation accessibility between cities. This research provides new empirical insights into the geographical organization of manufacturing and the formation of city networks, offering practical implications for regional industrial policy and infrastructure planning.

Introduction

The food manufacturing industry is crucial for the national economy [1]. China has a large population and significant food demand. In recent years, its food manufacturing industry has grown rapidly, becoming a key sector in national economic construction and social development. In 2023, the number of food manufacturing enterprises above a designated size in China reached 10,075, with total assets of RMB 2,163.9 billion [2]. Among them, listed companies in the food manufacturing industry (LCFMIs) operate at a larger scale because they can access financing in the capital market, and they therefore exert a greater impact on the daily lives of urban and rural residents in China. As a key manufacturing sector, the food manufacturing industry is comparable to other manufacturing sectors in its impact on regional economic patterns. For example, certain enterprises tend to expand from economically developed cities, such as Beijing, to neighboring cities, thereby promoting the coordinated development of neighboring regions. Compared to sectors such as pharmaceuticals, nanotechnology, and computers, the food manufacturing industry does not impose strict requirements on workforce qualifications and can absorb large numbers of people with low and middle levels of education, significantly increasing the incomes of urban and rural residents.

Despite the significance of LCFMIs, research on them remains limited, and several key questions remain unanswered: Where are these enterprises distributed in China? What characterizes their spatial layout? To what extent do the business scopes of different enterprises overlap? What forms of spatial expansion do enterprises exhibit as they develop? Addressing these questions is essential for understanding the unique characteristics of China’s LCFMIs and informing macro-level policy planning. This study therefore aims to bridge this gap in the literature through a comprehensive analysis of these questions.

Theoretical framework

The core argument in this study centers on the Similarity of Business Scopes (SBS) among LCFMIs and the city network formed by cross-city recruitment of LCFMIs. Therefore, we developed a theoretical framework for understanding the spatial organization and cross-local connections of LCFMIs based on the theoretical traditions of economic geography and the analytical paradigm of city network theory. Specifically, we used the multi-dimensional proximity theory as the hub of the core argument to analyze the geographical spatial relationships among enterprises; at the same time, we applied the spatial interaction theory and gravity model to explore the city network formed based on the cross-city recruitment of LCFMIs and the influencing factors of network formation.

Our interest in the SBS among LCFMIs stems from long-standing concerns in economic geography about “agglomeration” and “distance.” Earlier studies mostly focused on geographical proximity and the factors influencing it, apart from the economic effects it would produce [3,4]. However, an increasing number of recent studies have begun to shift their attention to other dimensions of proximity, such as cognitive proximity, social proximity, and the driving mechanisms and policy effects behind this proximity. The SBS proposed in this study is an effective proxy variable for cognitive proximity, which can reflect the degree of proximity of enterprises in terms of knowledge base, technical capabilities, and market positioning. At the same time, a comprehensive analysis of this proximity and classic geographical proximity (i.e., geographical distance) has been conducted to explore (a) the relationship between SBS and geographical distance and (b) whether this relationship exhibits nonlinear characteristics as the geographical scale changes.

Enterprises aiming to avoid excessive competition from high SBS prefer to operate from different cities as their bases and not to cluster their operations in one city. However, this does not mean that they will easily abandon or ignore cities other than the headquarters location. They often establish branches and recruit employees in cities outside the headquarters location to expand their market influence as much as possible. This process strengthens the connections between the city where the headquarters is located and other cities. In other words, the recruitment behavior from the city where the headquarters is located to non-headquarters locations strengthens the spatial interaction between cities, making this an interesting and important aspect of urban network research over the years. In trying to explain this spatial interaction, the gravity model has been used successfully in many studies as a classic and suitable analytical tool.

However, before conducting the analysis, we provide a comprehensive review of the overall situation of LCFMIs (such as the spatial pattern of LCFMIs and its relationship with the regional economy, etc.) to establish the background and foundation for the subsequent research. Therefore, following the logic of the above theoretical framework, we developed the following research framework (Fig 1). Our intention was to cover two core issues: (1) to discuss SBS and SBS networks, and the relationship between SBS and distance (theoretically supported by the proximity theory); and (2) to study the city network constructed based on the recruitment relationship and to analyze and explain the network using the gravity model (theoretically supported by the spatial interaction theory).

Literature review

Spatial layout of manufacturing enterprises and the theory of multi-dimensional proximity

The location selection of enterprises is a classic issue in the field of economic geography. For manufacturing enterprises, it necessitates a comprehensive consideration of the costs associated with spatial layout—including those related to labor and raw materials—as well as potential market returns. Generally, manufacturing enterprises tend to concentrate in regions with more developed transportation and higher capital intensity [5]. However, the priorities for location selection differ depending on firm size. Large enterprises value supervision, whereas Small and Medium-size Enterprises (SMEs) prioritize cost [6]. The behavioral characteristics of large enterprises are also obvious in multinational companies [7], as there are often certain risks in the selection and change of locations [8]. However, agglomeration is desirable. If parent and subsidiary companies are both located in a relatively developed metropolitan area, the production efficiency and economic benefits of the subsidiary company will be higher than those of subsidiaries far away from the parent company [9].

In China, state-owned and private enterprises in the manufacturing industry are related to each other in a spatial layout: upstream state-owned enterprises are in locations with higher enterprise concentration, and the entry degree of new private enterprises is significantly lower [10]. The spatial distribution of the manufacturing industry is affected by land policy [11], industrial structure, and the logistics level [12]. Several studies point to the core indicator, distance [13], which has been extensively discussed by many scholars in economic geography over the past hundred years. In addition, the agglomeration and dispersion of the manufacturing industry are closely related to producer services [14] and environmental issues such as carbon emissions in the region [15].

In fact, the location selection of manufacturing enterprises and the resulting spatial patterns are shaped by a combination of multidimensional factors, including distance and institutions. These investigations are closely related to the core concept of the theory of multi-dimensional proximity. This theory posits that the spatial layout of enterprises and the various relationships formed between them are influenced not only by geographical distance but also by other factors, including cognitive, organizational, and social dimensions. Moreover, the nature of this influence often varies depending on the type of enterprise. This leads to our point of interest: Does this pattern hold true for China’s food manufacturing industry?

Network constructed based on “relationships”

Network analysis is an important analytical concept. City networks are increasingly studied in economics and geography. Inter-city connections can be categorized into networks formed by physical flows and those formed by relational constructions. Physical networks include the flow of vehicles, people, and goods between cities, whereas relational networks have no specific physical objects moving in space.

Relational networks include both cooperative and competitive relationships. Cooperative relationships, such as the city innovation network, can be constructed from patent [16–18] and thesis cooperation [19]. Competitive relationships, such as the city competition network [20], can be expressed by competition between cities for certain trade markets [21]. These relationships often have no apparent direction; however, some relational networks, such as investment networks [22,23] and search engine-based relational networks [24], are directional. Geographical distance further plays an important role in the formation of such relational city networks [25], with the distance decay effect widely observed and validated [26,27].

Descriptive analysis is fundamental in city network research. Social Network Analysis (SNA) is widely used for the specific calculation and description of network attributes [28–31], often supported by tools such as Gephi, UCINET, and Python. In econometric analysis or mechanism explanation, gravity models are commonly applied [32–35], while Quadratic Assignment Procedure (QAP) regression analysis is used to explore network formation mechanisms [36].

Spatial interaction and the application of the gravity model

Spatial interaction is a pervasive phenomenon in economic geography. The interacting agents can be micro-geographic units such as firms, or macro-geographic units such as cities and nations.

Spatial heterogeneity serves as a fundamental driver of spatial interaction [37]. Such heterogeneity can induce the transfer of various elements—including population, goods, technology, and capital—from one location to another, analogous to water flowing from a higher to a lower elevation. The process of transfer itself constitutes the manifestation of interaction, implying a crucial logic: the interaction possesses directionality [38]. Certainly, factors driving interaction between cities or regions extend beyond mere spatial differences. Improved transport accessibility also plays a significant role in intensifying inter-regional interactions. For instance, the introduction of high-speed rail has been shown to facilitate the flow of production factors and expand the influence and radiating capacity of core cities [39]. Furthermore, interaction can also manifest as mutual dependence and coordination [40]. This spatial dependency is particularly evident between adjacent areas [41]. In such contexts, the application of spatial autocorrelation becomes a key analytical tool employed extensively in the literature to conduct effective empirical analysis [42].

However, merely describing an interaction is insufficient. We need to quantify how different factors influence this relationship. Undoubtedly, the gravity model stands as one of the most classic tools for analyzing such issues. Generally, many scholars adapt or modify the standard gravity model based on the specific characteristics of their research question to ensure that it more accurately reflects real-world contexts [43]. Selecting an appropriate estimation method is crucial for realizing the model’s full analytical value. A classical approach involves taking logarithms for linearization [44]. However, this method may not be useful in handling situations where the dependent variable contains zeros. Consequently, other important estimation strategies, including Poisson Pseudo-Maximum Likelihood (PPML) [45] and Negative Binomial Quasi-Generalised Pseudo-Maximum Likelihood (NB QGPML) [46], have become preferred alternatives in many studies.

Summary

Therefore, the spatial layout of enterprises exhibits a propensity for agglomeration or dispersion. Different choices are related to the similarities and differences between firms. Distance is undoubtedly a crucial factor; together with other elements involved in the theory of multidimensional proximity, such as institutions and culture, it shapes the geographical distribution patterns of enterprises and indirectly influences the relationships between cities. This type of relationship can generally be regarded as spatial interaction, which can serve as a theoretical framework for expressing the expansion of urban influence. In-depth analysis and discussion of this phenomenon require the support of robust analytical tools such as network analysis and gravity model construction.

Materials and methods

Data

LCFMI.

The enterprise data for the 777 LCFMIs were collected from Tianyancha (https://www.tianyancha.com/). They cover companies listed as A-shares, Hong Kong stocks, and US stocks, as well as on the New Third Board and the New Fourth Board. All selected companies were still active as of December 19, 2024. The coordinates of the LCFMIs’ registered addresses were obtained from Baidu’s coordinate-picking system (https://api.map.baidu.com/lbsapi/getpoint/index.html).

Based on the raw enterprise data, we extracted and structured recruitment-related information according to the following criteria: first, the records cover recruitment announcements published by enterprises in different cities; second, the announcements did not state the number of people to be recruited for a given position; and third, the enterprises had disclosed the relevant information on the Tianyancha platform. The data were processed as follows:

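Suppose that enterprise $e$, an LCFMI registered in city $i$, issues $a_{e,ij}^{t}$ recruitment announcements in city $j$ in year $t$. The connection strength between city $i$ and city $j$ in year $t$ can then be defined as $W_{ij}^{t}$. The formula is as follows:

$$W_{ij}^{t} = \sum_{e \in E_i} a_{e,ij}^{t}$$

where $E_i$ is the set of LCFMIs registered in city $i$.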

Thus, a city network can be obtained based on the job posts. We can name such a network the LCFMI city network.
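The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' program: the record layout `(firm_id, hq_city, post_city, year)` is a hypothetical schema, and counting each announcement as one unit of connection strength while skipping same-city postings are assumptions made for the example.

```python
from collections import defaultdict

def build_city_network(records):
    """Aggregate recruitment records into directed, weighted city links per year.

    Each record is (firm_id, hq_city, post_city, year). This tuple layout is a
    hypothetical schema, not the Tianyancha export format.
    """
    weights = defaultdict(int)
    for firm_id, hq_city, post_city, year in records:
        if hq_city == post_city:
            continue  # assumption: only cross-city postings create a link
        # assumption: every announcement adds one unit of connection strength
        weights[(year, hq_city, post_city)] += 1
    return dict(weights)

# Made-up records for illustration only.
records = [
    ("A", "Changsha", "Yiyang", 2024),
    ("A", "Changsha", "Yiyang", 2024),
    ("B", "Hohhot", "Shanghai", 2024),
    ("B", "Hohhot", "Hohhot", 2024),
    ("C", "Changsha", "Yiyang", 2016),
]
network = build_city_network(records)
```

Keying the result by `(year, origin, destination)` keeps the link direction (from the city of registration to the city of recruitment) and lets yearly networks be compared directly.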

Economic data and inter-city transport accessibility data.

The urban gross domestic product (GDP) data were obtained from the China City Statistical Yearbook. Driving times between cities were collected by the research team from Baidu Map on May 17, 2021 (Monday).

Method

SBS measurement.

The LCFMI data obtained from Tianyancha include the business scope of each enterprise. Following Manning et al. [47], we employed a comprehensive method based on Term Frequency-Inverse Document Frequency (TF-IDF) to calculate the similarity of business scopes among different LCFMIs. The details are as follows:

  1. TF-IDF Feature Extraction

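TF-IDF feature extraction quantifies the importance of words in specific documents, balances the intra-document frequency and inter-document distribution of words, and generates numerical feature vectors for text. The relevant formulas are as follows:

$$\mathrm{TF}(t_i, d_j) = \frac{n_{i,j}}{\sum_{k=1}^{m_j} n_{k,j}}$$

$$\mathrm{IDF}(t_i) = \log \frac{1 + N}{1 + \mathrm{df}(t_i)}$$

$$\mathrm{TFIDF}(t_i, d_j) = \mathrm{TF}(t_i, d_j) \times \mathrm{IDF}(t_i)$$

where $d_j$ denotes the $j$-th document; $t_i$ denotes the $i$-th valid word; $n_{i,j}$ is the number of occurrences of word $t_i$ in document $d_j$; $m_j$ is the total number of valid words in document $d_j$; $\sum_{k=1}^{m_j} n_{k,j}$ is the total number of occurrences of all valid words in document $d_j$; $N$ is the total number of documents in the document set; $\mathrm{df}(t_i)$ is the document frequency of word $t_i$ (the number of documents containing the word); log is the natural logarithm (to avoid excessive values); adding 1 to the numerator and denominator is a smoothing process to prevent the denominator from being 0 or the IDF value from being negative.

Based on the above information, we can construct the TF-IDF vector. The TF-IDF vector of document $d_j$ is:

$$\mathbf{v}_j = \left( \mathrm{TFIDF}(t_1, d_j), \mathrm{TFIDF}(t_2, d_j), \dots, \mathrm{TFIDF}(t_V, d_j) \right)$$

where $V$ is the total number of valid words in all documents (after stopword filtering and word pruning).

We then calculate the cosine of the angle between two TF-IDF vectors to quantify the semantic similarity between documents; the result lies in [0,1] (the larger the value, the higher the similarity). Let the TF-IDF vector of document $d_a$ be $\mathbf{v}_a$, and the TF-IDF vector of document $d_b$ be $\mathbf{v}_b$, where $\mathbf{v}_a = (v_{a,1}, \dots, v_{a,V})$, $\mathbf{v}_b = (v_{b,1}, \dots, v_{b,V})$. The cosine similarity between them is:

$$\cos\theta = \frac{\mathbf{v}_a \cdot \mathbf{v}_b}{\lVert \mathbf{v}_a \rVert \, \lVert \mathbf{v}_b \rVert} = \frac{\sum_{i=1}^{V} v_{a,i} v_{b,i}}{\sqrt{\sum_{i=1}^{V} v_{a,i}^{2}} \, \sqrt{\sum_{i=1}^{V} v_{b,i}^{2}}}$$

where $\mathbf{v}_a \cdot \mathbf{v}_b$ is the dot product of vectors $\mathbf{v}_a$ and $\mathbf{v}_b$; $\lVert \mathbf{v}_a \rVert$ and $\lVert \mathbf{v}_b \rVert$ are the L2 norms of vectors $\mathbf{v}_a$ and $\mathbf{v}_b$, respectively; $\theta$ is the angle between the two vectors; $V$ is the total number of valid words in all documents (consistent with the dimension of the TF-IDF vectors). To facilitate the analysis, the value of cosine similarity is multiplied by 100, resulting in the SBS referred to in this study. The more detailed parameters involved in the calculation and their meanings can be found in S1 Appendix.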
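The SBS calculation can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: it assumes the business-scope texts have already been segmented into tokens and stopword-filtered, takes TF as the relative frequency of a word within a document, and uses the smoothed IDF `log((1 + N) / (1 + df))`; the function names and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build TF-IDF vectors from token lists (already segmented and filtered)."""
    n_docs = len(docs)
    vocab = sorted({tok for doc in docs for tok in doc})
    # Document frequency: number of documents containing each word.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Assumed smoothing: add 1 to numerator and denominator.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        vectors.append(
            [counts[tok] / total * idf[tok] if total else 0.0 for tok in vocab]
        )
    return vectors

def sbs(u, v):
    """Cosine similarity of two TF-IDF vectors, scaled to the 0-100 range."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 100.0 * dot / norm if norm else 0.0

# Toy business scopes (made up for illustration).
docs = [
    ["dairy", "milk", "sales"],
    ["dairy", "yogurt", "sales"],
    ["candy", "chocolate", "sales"],
]
vectors = tfidf_vectors(docs)
sbs_01 = sbs(vectors[0], vectors[1])
sbs_02 = sbs(vectors[0], vectors[2])
```

In this toy set the word "sales" occurs in every document, so its smoothed IDF is zero and it contributes nothing to the similarity; only the shared word "dairy" links the first two scopes.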

Social network analysis.

1. Weighted degree

The weighted degree is the sum of the weights of all connections associated with a node. The formula is as follows:

$$WD_i = \sum_{j} w_{ij}$$

where $WD_i$ is the weighted degree of node $i$ and $w_{ij}$ is the weight of the connection between nodes $i$ and $j$.

2. Community detection parameters

(1) Resolution parameter ($\gamma$)

Resolution is a core tuning parameter in modularity-based community detection that regulates the granularity of community division. It adjusts the algorithm’s preference for merging small communities or splitting large ones by modifying the penalty term in modularity calculation.

(2) Random seed

Random Seed (RS) is an initialization parameter for stochastic community detection algorithms. To ensure result robustness, multiple random seeds were tested to verify the stability of community structure.

3. Modularity ($Q$)

Modularity is the primary metric for evaluating the quality of community division, quantifying the difference between the actual edge density within communities and the expected edge density in a random network with an identical node degree distribution. The formula for $Q$ is:

$$Q = \frac{1}{2m} \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ A_{ij} - \gamma \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$

where $n$ is the total number of nodes; $m$ is the total number of edges in the network (edges represent SBS between enterprise pairs); $A_{ij}$ is the adjacency matrix element; $\gamma$ is the resolution parameter; $k_i$ is the degree of node $i$ (the sum of SBS between enterprise $i$ and all other enterprises); $\delta(c_i, c_j)$ is the indicator function (1 if enterprises $i$ and $j$ belong to the same community, 0 otherwise); and $c_i$ is the community label of enterprise $i$. $Q > 0$ indicates that intra-community edge density is higher than that of a random network, confirming significant community structure.
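As a hedged illustration of the two network measures above, the sketch below computes the weighted degree of a node and the resolution-adjusted modularity of a given partition. It assumes a symmetric weight matrix with a zero diagonal and takes the community labels as input; it does not perform Louvain or Leiden detection itself, and the toy graph is made up.

```python
def weighted_degree(adj, i):
    """Sum of the weights of all connections attached to node i."""
    return sum(adj[i])

def modularity(adj, labels, gamma=1.0):
    """Resolution-adjusted modularity of a partition.

    adj is a symmetric weight matrix (list of lists) and labels[i] is the
    community of node i. Both inputs are assumed to be supplied by the caller.
    """
    n = len(adj)
    two_m = sum(sum(row) for row in adj)  # equals 2m for a symmetric matrix
    if two_m == 0:
        return 0.0
    k = [weighted_degree(adj, i) for i in range(n)]
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += adj[i][j] - gamma * k[i] * k[j] / two_m
    return q / two_m

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1.0

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])
q_single = modularity(adj, [0, 0, 0, 0, 0, 0])
q_high_res = modularity(adj, [0, 0, 0, 1, 1, 1], gamma=1.2)
```

Raising `gamma` increases the penalty term, so the same partition scores lower; this mirrors the general pattern in which higher resolution favors finer communities at the cost of reduced modularity.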

4. Homogeneity index

Homogeneity Index is a metric measuring the consistency of node attributes (administrative regions/industry subcategories) within each community, reflecting the aggregation degree of enterprises by specific attributes (range: 0 = completely heterogeneous to 1 = completely homogeneous).

Formula of single-community homogeneity (for the $k$-th community) is:

Formula of global homogeneity (weighted average across all communities) is:

where $K$ is the total number of detected communities; $n_k$ is the number of enterprises in the $k$-th community; $C$ is the number of attribute categories (31 administrative regions/provinces; 7 industry subcategories); and $n_{k,c}$ is the number of enterprises in the $k$-th community belonging to the $c$-th attribute category.

5. Chi-Square Test

Chi-Square Test is a statistical test for validating the external validity of community structure, examining whether there is a significant association between detected community labels and enterprise attributes.

First, we formulate two mutually exclusive hypotheses:

  1. H0: Community labels and target attributes are mutually independent (no association);
  2. H1: Community labels and target attributes are significantly associated.

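The test statistic is:

$$\chi^2 = \sum_{k=1}^{K} \sum_{c=1}^{C} \frac{(O_{kc} - E_{kc})^2}{E_{kc}}$$

where $O_{kc}$ is the observed frequency (the number of enterprises in the $k$-th community and the $c$-th attribute category); $E_{kc}$ is the expected frequency under H0 ($E_{kc} = n_k m_c / N$, where $n_k$ is the number of enterprises in the $k$-th community, $m_c$ is the total number of enterprises in the $c$-th attribute category, and $N$ is the total number of enterprises); and $K$ and $C$ are the numbers of communities and attribute categories, as in the “Homogeneity Index” section. If p < 0.05, H0 is rejected, indicating a significant association between communities and attributes.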
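The test can be sketched as follows for a community-by-attribute contingency table. This is an illustrative implementation under the usual Pearson chi-square definition, with expected counts computed from row and column totals; the tables are made up, and the p-value step is left out because it needs the chi-square survival function from a statistics library.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom.

    table[k][c] is the number of firms in community k that fall into
    attribute category c (hypothetical input layout).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for k, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[k] * col_totals[c] / n  # count under H0
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

# Made-up tables: the first shows strong association, the second none.
associated = [[30, 10], [10, 30]]
independent = [[10, 20], [20, 40]]
stat_a, dof_a = chi_square(associated)
stat_i, dof_i = chi_square(independent)
```

For one degree of freedom the 5% critical value is about 3.84, so the first table would lead to rejecting H0 while the second would not.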

Gravity model.

The gravity model has been widely used in economic-geographical research. The basic form is as follows:

$$F_{ij} = G \frac{M_i M_j}{D_{ij}^{b}}$$

Generally speaking, $F_{ij}$ represents the strength of the connection between cities or regions $i$ and $j$ (such as the strength of a relationship or the flow of people); $M_i$ and $M_j$ are the “masses” of the cities or regions, which are generally expressed by GDP; $D_{ij}$ mostly represents the spatial distance or traffic accessibility (e.g., the longer the driving time between the two places, the larger the value and the lower the level of traffic accessibility); $G$ is a constant; and $b$ is the distance-decay exponent.

Compared with the Ordinary Least Squares (OLS) method, which imposes strict assumptions on the distribution of error terms, the PPML estimation method is more broadly applicable: it naturally fits non-negative integer dependent variables, remains consistent when the dependent variable is a continuous non-negative value and overdispersion is present, and effectively handles the statistical interference caused by zero-valued observations. It is therefore widely used in the empirical estimation of gravity models. In this study, we adopt this method, for which the formula is as follows:

$$E(y_{ij} \mid \mathbf{x}_{ij}) = \exp(\mathbf{x}_{ij}^{\top} \boldsymbol{\beta})$$

where $y_{ij}$ represents the dependent variable (such as the strength of the connection between cities $i$ and $j$); $\mathbf{x}_{ij}$ represents the explanatory variable vector (such as GDP, distance, etc.); $\boldsymbol{\beta}$ represents the vector of parameters to be estimated; and $E(\cdot \mid \cdot)$ represents the conditional expectation.

The log-likelihood function of PPML is:

$$\ln L(\boldsymbol{\beta}) = \sum_{i,j} \left[ y_{ij} \, \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} - \exp(\mathbf{x}_{ij}^{\top} \boldsymbol{\beta}) - \ln(y_{ij}!) \right]$$

In this study, we used Python to write calculation programs for model diagnosis and regression estimation.
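To make the estimation step concrete, the following is a minimal pure-Python sketch of PPML fitted by Newton-Raphson on the Poisson score. It is not the authors' program: the function names, the design matrix layout (intercept, log of the GDP product, log of driving time), and the noise-free toy data are all assumptions chosen so that the known coefficients can be recovered.

```python
import math

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ppml(X, y, max_iter=100, tol=1e-10):
    """Newton-Raphson for E[y|x] = exp(x'beta); column 0 of X is the intercept."""
    p = len(X[0])
    # Start from the constant-only fit so exp() stays well behaved.
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        mu = [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]
        # Score X'(y - mu) and information matrix X' diag(mu) X.
        score = [sum((yi - mi) * row[j] for yi, mi, row in zip(y, mu, X))
                 for j in range(p)]
        info = [[sum(mi * row[j] * row[k] for mi, row in zip(mu, X))
                 for k in range(p)] for j in range(p)]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical city pairs: [1, ln(GDP_i * GDP_j), ln(driving hours)].
X = [[1.0, 1.0, 0.5], [1.0, 2.0, 1.0], [1.0, 3.0, 0.2],
     [1.0, 4.0, 1.5], [1.0, 1.5, 2.0], [1.0, 2.5, 0.7]]
true_beta = [0.5, 0.8, -1.2]
# Noise-free flows generated from the assumed coefficients.
y = [math.exp(sum(b * v for b, v in zip(true_beta, row))) for row in X]
beta_hat = fit_ppml(X, y)
```

Because the score depends on `y - mu` rather than on `log(y)`, city pairs with zero recruitment links can stay in the sample. In practice a statistics package would normally be used instead of a hand-written solver, together with robust standard errors.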

Results

Spatio-temporal characteristics of the distribution of LCFMIs

Overall pattern of LCFMIs in China.

LCFMIs were mainly distributed in the eastern and southern coastal areas of China, as well as in provincial capitals and their surrounding areas in the central and western parts of the country. Specifically, 34.49% of the LCFMIs were located in the eastern provinces (Beijing, Tianjin, Hebei, Liaoning, Shanghai, Jiangsu, Zhejiang, Fujian, Shandong, Guangdong, and Hainan), of which 49 had registered capital exceeding RMB 100 million—accounting for 42.61% of the national total. Moreover, 28.19% of the LCFMIs were located in central China (Shanxi, Henan, Anhui, Hubei, Hunan, and Jiangxi), of which 27 had registered capital exceeding RMB 100 million, comprising 23.48% of the national total.

The density and degree of agglomeration of the LCFMIs varied across provinces. For example, the LCFMIs with high registered capital in Shandong were relatively dispersed, whereas those in Sichuan, Hunan, and other provinces were relatively concentrated. In Guangdong, the LCFMIs with high registered capital were mainly distributed in the Pearl River Delta region. These patterns reflect the spatial economic strategies and industrial priorities of the respective provinces. For example, Sichuan and Hunan showed prominent characteristics of a “strong provincial capital” strategy, in which most investment and market resources are concentrated in the provincial capital, leaving other cities in the province relatively underdeveloped in terms of their economic, investment, and financing levels.

In addition, the number of LCFMIs in some provinces, such as Qinghai and Xizang, was small, and the amount of registered capital was low.

Distribution characteristics of the registered capital of LCFMIs by province.

Fig 2 displays the variation in registered capital across provinces. The results showed a significant difference in the average registered capital of LCFMIs in different provinces, with that of Beijing and Shanghai being significantly higher than the others. Furthermore, the registered capital of LCFMIs within a province varied greatly in some cases and was relatively small in others. For example, the coefficient of variation for the registered capital of LCFMIs in Guangdong Province was 3.82, while that in Qinghai Province was 0.46.

Fig 2. Coefficient of variation and mean value of LCFMI registered capital.

https://doi.org/10.1371/journal.pone.0351835.g002
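For reference, the coefficient of variation reported for each province is the standard deviation of registered capital divided by its mean. The sketch below assumes the population standard deviation; the source does not state whether the population or sample form was used, and the input values are made up.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (assumed definition)."""
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean

# Made-up registered-capital figures for two hypothetical provinces.
cv_uniform = coefficient_of_variation([10.0, 10.0, 10.0])
cv_spread = coefficient_of_variation([1.0, 3.0])
```

A value well above 1, such as the 3.82 reported for Guangdong, means the spread of registered capital is several times the provincial mean, which is typical when a few very large firms sit alongside many small ones.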

Overlaying the average registered capital of the LCFMIs with the average GDP per capita (Fig 3) of each province from 1981 to 2023, we found that the more developed the economy, the larger the scale of the LCFMI.

Fig 3. Relationship between provincial GDP per capita and LCFMI registered capital.

https://doi.org/10.1371/journal.pone.0351835.g003

SBS and network analysis

Overview of SBS.

Pairwise comparisons of company business scopes yielded a mean SBS of 7.23 and a standard deviation of 8. Regarding distributional characteristics (Fig 4), 60.12% of pairs fell within the 0–6 SBS range, indicating that the vast majority of company pairs differed substantially in their scope of operations. In other words, the LCFMIs covered a large number of distinct business segments.

Fig 4. Segmentation and accumulation of SBS for different interval ranges.

https://doi.org/10.1371/journal.pone.0351835.g004

Network of SBS.

We systematically examined the distinct community detection results of the SBS network under different algorithm selections and parameter configurations. As presented in Table 1, with the resolution parameter increasing from 1.0 to 1.1 and 1.2, the modularity value (Q) of the SBS network decreased from 0.13 to 0.09 and 0.07, while the number of communities increased from 3 to 6–9 and 9–11, respectively (across different algorithm and random seed combinations). This finding aligns with the general principle of network community detection, wherein higher resolution parameters tend to split cohesive communities into finer subgroups at the cost of reduced overall modularity.

Table 1. Properties of the SBS network under Different Parameter Settings.

https://doi.org/10.1371/journal.pone.0351835.t001

The consistent trend of results generated by the Louvain and Leiden algorithms indicates robust community identification for this network, confirming that the detected community structure is an inherent attribute of the network rather than an artifact of algorithmic bias. Meanwhile, only minor fluctuations in community metrics were observed across different random seeds, further verifying the stability of the identified community structure.

To further explore the associations between the community division of the SBS network and firms’ administrative region affiliations as well as industry subcategories, this study calculated homogeneity indices to quantify the attribute consistency of nodes within each community and employed chi-square tests to assess the external validity of the community detection results. The resolution parameter of 1.0 was selected as the benchmark for in-depth analysis, as it represents the classical default setting for the Louvain algorithm and yielded a relatively high modularity value of 0.13. Under this resolution, both modularity values and community counts remained stable across different algorithms (Louvain and Leiden) and random seeds (50, 100, 150, 200). Additionally, the impacts of algorithm and random seed selections on community division were found to be negligible, supporting the prioritization of results derived from the Louvain algorithm with a random seed of 50 and resolution of 1.0 for external validity assessment.

The results revealed that the homogeneity index for administrative regions reached 0.54, indicating a moderate level of geographical agglomeration within the identified communities. In contrast, the homogeneity index for industry subcategories was relatively low at 0.27. Specifically, for administrative region distribution, firms located in Anhui, Fujian, and Guangdong provinces were significantly overrepresented in Community 0 compared to Communities 1 and 2, demonstrating a clear geographical clustering pattern. For industry subcategory distribution, all three communities were dominated by firms categorized under “Other food manufacturing” (208 firms in Community 0, 125 in Community 1, and 45 in Community 2). The high proportion of this category across all communities diluted the agglomeration effect of other specific industry subcategories. Notably, the “Other food manufacturing” category typically encompasses firms with broad business scopes, which are relatively difficult to define with precise industry boundaries.

Relationship between SBS and distance

When discussing the distance decay effect and analogous issues, it is common practice to examine conditions across different distance intervals (bandwidths). In this study, we selected 40 bandwidths, from 5 km to 200 km in 5 km increments (5 km, 10 km, 15 km, …, 195 km, 200 km); distance here refers to the straight-line distance between companies. For each bandwidth, we calculated the correlation coefficients between SBS and distance across distinct distance intervals, along with confidence intervals and significance levels under varying scenarios. To ensure the rigor of statistical inference, the Benjamini-Hochberg (FDR-BH) method was applied for multiple testing correction. Additionally, kernel smoothing analysis and correlogram analysis were conducted to corroborate the results derived from artificially defined bandwidths.
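The band-wise procedure can be sketched as follows: compute the Pearson correlation between distance and SBS for the pairs that fall in the first band, then adjust the resulting p-values across bands with the Benjamini-Hochberg step-up rule. This is an illustrative sketch with made-up numbers; the function names are hypothetical, and the p-values are taken as given rather than derived from the correlations.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def first_band_r(pairs, bandwidth_km):
    """Correlation between distance and SBS within the '0 ~ xx km' band."""
    band = [(d, s) for d, s in pairs if d <= bandwidth_km]
    return pearson_r([d for d, _ in band], [s for _, s in band])

def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up (distance in km, SBS) pairs and raw p-values for illustration.
pairs = [(10.0, 30.0), (20.0, 20.0), (30.0, 10.0), (300.0, 50.0)]
r_near = first_band_r(pairs, 55.0)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

The running minimum enforces that adjusted p-values never decrease as the raw p-values increase, which is what distinguishes the BH step-up rule from a simple rescaling by `m / rank`.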

We found that in most cases—regardless of whether the bandwidth was small or large—the corrected p-values corresponding to the correlation coefficients between SBS and distance within most distance groups generally exceeded 0.01 (Fig 5). However, when the bandwidth was 55 km or larger, the significance of the correlation coefficient between SBS and distance became highly pronounced in the first segment of the distance grouping (i.e., the range of “0~xx km”). This phenomenon indicates that the negative correlation between SBS and distance exhibits a distinct “near-range effect.” The small absolute value of the correlation coefficient in the near range suggests that while SBS and distance are significantly negatively correlated, the strength of this negative correlation is relatively weak.

Fig 5. The correlation coefficients and corresponding p-values between SBS and distance for the “0 – xx km” group under different bandwidths.

https://doi.org/10.1371/journal.pone.0351835.g005

Our kernel smoothing analysis and correlogram analysis consistently support the above conclusions. As illustrated in Fig 6, within the 0–1000 km distance range, the blue solid line in the kernel smoothing plot shows minimal fluctuation, accompanied by narrow confidence intervals, indicating a weak but stable local correlation in the near-range region. In the correlogram analysis, the overall fluctuating downward trend demonstrates a degree of distance decay characteristics—weak yet consistently present.

Formation and evolution of city networks in firm expansion

City network characteristics.

With the continuous development of their strengths, many LCFMIs are seeking to expand, and this movement is spatially manifested in the development of production and operational activities in cities other than the place of incorporation. This study used recruitment data to construct a city network reflecting such expansion.

As shown in Fig 7, from 2016 to 2024, the LCFMI city network grew by 35.92% in terms of average weighting. The structure of the LCFMI city network in 2024 was more complex, with the emergence of sub-clusters such as “Changsha-Yiyang,” “Hohhot-Shanghai-Beijing,” and “Hefei-Chengdu.” These findings indicate that the development of LCFMIs further tightened inter-city relationships.

Fig 7. LCFMI city network: a) 2016 and b) 2024.

https://doi.org/10.1371/journal.pone.0351835.g007

Node size represents the weighted degree and node color indicates the modularity class; the darker and thicker the green line, the higher the weight of the edge.

Factors influencing the LCFMI city network.

Following Larson et al. [48], Kaya Aydın et al. [49], and Larch et al. [50], this study adopted the Poisson pseudo-maximum likelihood (PPML) method when constructing a gravity model to examine which social and economic factors are related to the LCFMI city network.

The dependent variable is the connection strength of the LCFMI city network. Because the LCFMI city network is directed, the GDP of the origin city is denoted city1gdp and the GDP of the destination city is denoted city2gdp. The driving time between the origin and destination cities, dr, represents the traffic accessibility between the two cities. Whether the two cities belong to the same province (sp) and the straight-line distance between them (d) may also affect inter-city connections; hence, both are included in the model. The variable sp is coded 1 if the two cities belong to the same province and 0 otherwise.

We estimated four specifications of the gravity model, ran diagnostics on each, and compared the results. The four specifications are: no fixed effects, origin-city (city1) fixed effects, destination-city (city2) fixed effects, and two-way (bidirectional) fixed effects.
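
For orientation, a PPML gravity specification with the five regressors can be sketched as follows. The symbols $w_{ij}$ (connection strength from city $i$ to city $j$), $\alpha_i$, and $\gamma_j$ (origin and destination fixed effects) are notation assumed here for illustration, and the sketch assumes the regressors enter in levels, which the text does not state.

```latex
% Baseline (no fixed effects): conditional mean of the link weight
E\left[ w_{ij} \mid \cdot \right]
  = \exp\left( \beta_0
      + \beta_1\,\mathrm{city1gdp}_i
      + \beta_2\,\mathrm{city2gdp}_j
      + \beta_3\,\mathrm{dr}_{ij}
      + \beta_4\,\mathrm{sp}_{ij}
      + \beta_5\,\mathrm{d}_{ij} \right)

% Fixed-effect variants add \alpha_i (origin city), \gamma_j (destination city),
% or both, inside the exponential.
```

Because the conditional mean is exponential, a negative coefficient on dr means that longer driving times are associated with proportionally weaker expected connections.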

Before diagnosing each model, we performed a multicollinearity test and found that the variance inflation factor (VIF) of every variable was far below 10 (city1gdp: 1.0153, city2gdp: 1.0117, dr: 1.1178, sp: 1.2309, d: 1.3422). In the diagnostic results (Table 2), Model 4 shows no heteroskedasticity issue, and its McFadden R² is the highest, reaching 0.4174 and exceeding those of the other three models. Model 4 also has the lowest AIC and BIC values and the highest log-likelihood among the four models. Therefore, based on the diagnostic results, Model 4 is the preferred specification.
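
The fit statistics compared above follow standard definitions, sketched below. The source does not state how they were computed, so this is an assumption, and the input values are hypothetical rather than taken from Table 2.

```python
import math


def fit_metrics(log_lik, log_lik_null, n_params, n_obs):
    """McFadden pseudo R-squared, AIC, and BIC from model log-likelihoods."""
    # McFadden R2 compares the fitted model with an intercept-only model
    mcfadden_r2 = 1.0 - log_lik / log_lik_null
    aic = 2 * n_params - 2 * log_lik
    bic = n_params * math.log(n_obs) - 2 * log_lik
    return mcfadden_r2, aic, bic


# Hypothetical inputs (illustrative only)
r2, aic, bic = fit_metrics(log_lik=-600.0, log_lik_null=-1000.0,
                           n_params=6, n_obs=500)
print(round(r2, 4), round(aic, 1), round(bic, 1))
```

A higher log-likelihood raises McFadden R² and lowers both AIC and BIC, which is why the three criteria tend to agree when the models have similar numbers of parameters.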

Table 2. Diagnostic results of the four models.

https://doi.org/10.1371/journal.pone.0351835.t002

However, from the perspective of the regression results, Model 4 still has shortcomings. Although its pseudo R² is the highest, only one of the five variables, dr, is significant (p < 0.01), and its effect on the dependent variable is negative (coefficient −0.0006). In fact, dr is significant with a negative coefficient in all four models. Therefore, for the LCFMI city network, a shorter driving time is more conducive to strengthening network connections. Another issue worth discussing is the sign of the coefficient of d. Table 3 shows that the coefficients of d in Model 1 and Model 3 are both positive (0.0001 and 0.0002) and significant at p < 0.01. This indicates that, to a certain extent, a larger straight-line distance between cities is associated with stronger connections within the LCFMI city network. A plausible reason is that as information technology develops and transportation conditions improve, enterprises find it easier to establish personnel networks across cities. Moreover, recruiting and carrying out production and business operations in a less developed city away from the headquarters can also help reduce costs and increase efficiency to a certain extent.

Table 3. Regression Results of the Four Models.

https://doi.org/10.1371/journal.pone.0351835.t003

Discussion

Since China’s reform and opening up of its economy, the food manufacturing industry has advanced considerably. Based on their regional industry characteristics and inherent business strengths, various LCFMIs have established industry patterns with varying business scopes. This has resulted in significant market diversification, offering increasingly rich food consumption choices for urban and rural residents in China. Such diversification indicates significant segmentation within the industry and underscores the necessity of exploring the internal and external factors influencing a firm’s business scope in future research. This observation raises at least two critical questions. First, does SBS among LCFMIs reflect a pattern where latecomer firms learn from or imitate the forerunners? In other words, at the micro level, could it be that after observing Firm A’s substantial economic gains from certain food products, Firm B decides to locate closer to (to facilitate information sharing and infrastructure access) or farther away from (due to competition concerns) Firm A while simultaneously engaging in similar business activities? Second, do similar patterns exist among non-listed companies?

SBS exhibits an overall distance-decay pattern, which aligns with the spatial regularity of economic-geographic activities reported by Rao et al. [51] and Han et al. [52]. This indicates that urban economic space follows its own internal regularities, which emerge gradually from the combined action and strategic interaction of multiple factors, including diverse market actors and the institutions and mechanisms of the market economy. This raises a question worthy of further discussion: Do similar phenomena exist across different manufacturing sectors? If so, how might the relationship between business scope similarity and distance in those sectors differ from the pattern observed in this study?

Because of their higher levels of informatization and financialization, listed companies, including those in the food manufacturing industry, have a greater potential for cross-city recruitment and expansion compared to small- and medium-sized non-listed enterprises. Such cross-city connections are governed by various factors, including inter-city traffic accessibility and urban economic scales. Therefore, it is essential to enhance and optimize traffic accessibility between cities. However, this aspect of research needs further development. More economic data should be gathered to enhance the model and improve its explanatory power in future research. Constructing urban networks based on inter-firm relationships is not limited to recruitment data. In fact, other critical linkages—such as supply chains between upstream and downstream enterprises, as well as inter-firm patent collaboration and technology transfer—can also serve as a foundational basis for capturing relationships between cities. This represents another promising avenue for future research.

As demand from Chinese urban and rural residents for high-quality food consumer goods increases, rational planning and guidance of the layout of food manufacturing enterprises in metropolitan areas, urban agglomerations, and other spaces will help determine whether competition among enterprises in production and operation can be kept within a reasonable range. From a policymaker’s perspective, various industrial policies must align with regional development to prevent congestion [53]. Specifically, the policy implications of this study are threefold. First, when planning food manufacturing clusters, policies should leverage the distance-decay pattern of business scope similarity. This means avoiding the introduction of firms with highly homogeneous businesses at very close spatial scales and instead fostering agglomerations based on industrial chain complementarity to optimize the competitive landscape. Second, given that transportation accessibility significantly influences the inter-city flow of factors, improving connectivity between core cities should be a priority for infrastructure investment. This enhances regional economic networks and facilitates knowledge spillovers. Third, considering the spatial hierarchy observed in the cognitive geography of firms, differentiated policy measures are necessary. Support for high-value-added activities should be focused around innovative cores, while specialized operations that utilize local resources can be encouraged in peripheral areas. This approach fosters regional synergy and complementary development.

This study has several limitations that should be acknowledged, which also point to directions for future work. First, our analysis focuses exclusively on listed companies. While listed firms are influential actors with significant impacts on regional economic landscapes—justifying their selection for this study—this focus necessarily excludes the vast population of small- and medium-sized, non-listed enterprises. Consequently, our findings regarding business scope patterns and inter-city networks may not be fully generalizable to the entire food manufacturing sector. Second, our dataset includes 777 LCFMIs that were active as of December 19, 2024. Although this date is 12 days before the end of the year, the proportion of firms likely to undergo major changes (such as de-listing) within this short window is minimal. Therefore, we consider the impact of this slight temporal discrepancy on our core conclusions to be negligible. Third, our modeling process involves certain simplifying assumptions. For instance, the selection of variables in the gravity model, though informed by theory, could be expanded. Future research could incorporate a wider array of factors influencing inter-city corporate linkages, such as detailed supply-chain data. Exploring these avenues would enhance the explanatory power and nuance of the model.

Conclusion

This study draws on data covering the location, registered capital, business scope, and recruitment announcements of 777 LCFMIs in mainland China and employs text analysis, city network analysis, and a gravity model. It examines the distribution pattern of LCFMIs, the SBS network and its relationship with distance, the city network constructed from recruitment data, and the factors influencing this network. The main conclusions are as follows:

First, the distribution of LCFMIs aligns with the macroeconomic landscape of mainland China. More developed provinces host a larger number of companies with greater registered capital.

Second, the SBS values are generally low, reflecting the differentiated business strategies and models among LCFMIs. The SBS network constructed from pairwise similarities exhibits certain small-world characteristics, along with tendencies of spatial convergence and industrial subcategory convergence. This means that firms classified within the same community have a relatively high probability of being located in the same province, neighboring provinces, or belonging to the same industrial subcategory. A significant, albeit overall weak, negative correlation is observed between SBS and distance. This correlation is particularly notable within the “0 ~ xx km” segment.

Third, the city network constructed from corporate recruitment data reflects the spatial expansion of corporate influence and reach. The results from the gravity model analysis indicate that transportation accessibility plays a crucial role, suggesting that LCFMIs are sensitive to transport convenience when recruiting across cities.

Supporting information

S1 Appendix

The relevant parameters involved in calculating SBS, their meanings, and the calculation methods.

(1) Random Seed

A random seed fixes the initial state of the random number generator to ensure result reproducibility. Let the random number sequence be $R = (r_1, r_2, \dots, r_n)$, where $r_k \sim U(0, 1)$ (each $r_k$ follows a uniform distribution over the interval [0,1]). By setting the seed $s$, the sequences generated in multiple runs are identical:

$$R_s^{(1)} \equiv R_s^{(2)} \equiv \cdots \equiv R_s^{(t)}$$

where $R_s^{(t)}$ denotes the random number sequence generated with seed $s$ in the $t$-th run; $\equiv$ means “is equivalent to”; and $r_k$ is the $k$-th random number in the sequence (no conflict with symbols in other modules).
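
The reproducibility property can be checked with a minimal sketch using Python's standard `random` module. The helper name `uniform_sequence` is hypothetical, and the source does not say which generator the study used; the seeds 50 and 100 are two of the values mentioned in the main text.

```python
import random


def uniform_sequence(seed, n):
    """Draw n values from U(0, 1) using a generator initialised with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


run_1 = uniform_sequence(50, 5)
run_2 = uniform_sequence(50, 5)
run_other = uniform_sequence(100, 5)

print(run_1 == run_2)      # same seed, same sequence
print(run_1 == run_other)  # a different seed gives a different sequence
```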

(2) Chinese Word Segmentation

We split continuous Chinese text into semantically independent word units, laying the foundation for feature extraction. Let the original text be $T_i = c_1 c_2 \cdots c_m$ ($T_i$ is the original business scope text corresponding to document $d_i$, and $c_j$ is the $j$-th Chinese character in the text). The goal of word segmentation is to find the optimal word sequence that satisfies

$$W_i = \arg\max_{(w_1, \dots, w_K)} \prod_{k=1}^{K} P(w_k) \quad \text{s.t.} \quad \bigcup_{k=1}^{K} w_k = T_i, \quad w_k \cap w_l = \varnothing \ (k \neq l)$$

where $W_i$ is the original word set of document $d_i$ after segmentation; $w_k$ is the $k$-th word in the set; $\bigcup_{k} w_k$ denotes the union of all words (must cover the complete text $T_i$); $w_k \cap w_l = \varnothing$ means no overlap between any two words; $P(w_k)$ is the occurrence probability of word $w_k$ in the dictionary; and $\arg\max$ means finding the word sequence that maximizes the subsequent product.
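
The maximum-probability criterion can be illustrated with a small dynamic-programming sketch. The source does not name the segmentation tool, so this is not the authors' implementation; the dictionary `toy_prob` and its probabilities are hypothetical, and `max_len` is an assumed cap on word length.

```python
import math


def segment(text, word_prob, max_len=4):
    """Split text into dictionary words maximising the product of word probabilities."""
    n = len(text)
    # best[k] = (best log-probability of text[:k], start index of its last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in word_prob and best[start][0] > -math.inf:
                score = best[start][0] + math.log(word_prob[word])
                if score > best[end][0]:
                    best[end] = (score, start)
    if best[n][0] == -math.inf:
        raise ValueError("text cannot be covered by the dictionary")
    # Trace back the chosen word boundaries from the end of the text
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Hypothetical toy dictionary with made-up probabilities
toy_prob = {"食品": 0.04, "生产": 0.03, "食": 0.001, "品": 0.001,
            "生": 0.001, "产": 0.001}
print(segment("食品生产", toy_prob))
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long texts while selecting the same segmentation.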

(3) Stopword Filtering

Stopword filtering removes words with no semantic contribution, retains valid features, and reduces subsequent computational complexity. Let the original word set of document $d_i$ after segmentation be $W_i$, and the global stopword set be $S = S_{\text{custom}} \cup S_{\text{short}}$, where $S_{\text{custom}}$ is the custom industry stopword list (e.g., words with no differentiation such as “公司” [company], “服务” [service]) and $S_{\text{short}} = \{ w : |w| < 2 \}$ is the set of words with length less than 2 ($|w|$ denotes the character length of word $w$).

The valid word set of document $d_i$ after filtering is:

$$V_i = W_i \setminus S = \{ w \in W_i : w \notin S \}$$

The global valid word set (union of valid words from all documents) is:

$$V = \bigcup_{i=1}^{N} V_i, \quad |V| = M$$

where $V_i$ is the valid word set of document $d_i$; $V$ is the total valid word set of all documents; $M$ is the total number of valid words (consistent with the dimension of TF-IDF vectors); and $N$ is the total number of documents in the document set (consistent with $N$ in the IDF calculation).
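
A minimal sketch of this filtering step follows. The two custom stopwords are the examples given in the text; the segmented documents and the function name `filter_stopwords` are hypothetical.

```python
def filter_stopwords(words, custom_stopwords):
    """Drop custom stopwords and words shorter than two characters."""
    return [w for w in words if w not in custom_stopwords and len(w) >= 2]


custom = {"公司", "服务"}
# Hypothetical segmented business-scope texts
docs = [["食品", "生产", "公司", "的"], ["饮料", "销售", "服务"]]

valid = [filter_stopwords(d, custom) for d in docs]
vocabulary = set().union(*valid)  # global valid word set

print(valid)
print(len(vocabulary))
```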

(4) Document Frequency (DF)

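Let the document set be $D = \{ d_1, d_2, \dots, d_N \}$, and the global valid word set be $V = \{ t_1, t_2, \dots, t_M \}$. The document frequency of word $t_j$ is:

$$DF(t_j) = \sum_{i=1}^{N} \mathbb{1}(t_j \in V_i)$$

where $\mathbb{1}(\cdot)$ is the indicator function (if $t_j \in V_i$ holds, i.e., word $t_j$ belongs to the valid word set of document $d_i$, then $\mathbb{1}(\cdot) = 1$, otherwise $\mathbb{1}(\cdot) = 0$); $N$ is the total number of documents; and $M$ is the total number of valid words.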
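
Document frequency can be computed with a short sketch such as the following; the three documents and the function name `document_frequency` are hypothetical.

```python
from collections import Counter


def document_frequency(valid_word_lists):
    """Count, for each word, how many documents contain it at least once."""
    df = Counter()
    for words in valid_word_lists:
        df.update(set(words))  # set() so a word counts once per document
    return df


# Hypothetical valid word lists for three documents
docs = [["食品", "生产", "食品"], ["食品", "销售"], ["饮料", "销售"]]
df = document_frequency(docs)
print(df["食品"], df["销售"], df["饮料"])
```

Converting each document to a set implements the indicator function: repeated occurrences within one document contribute only once.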

(5) N-gram Feature Extraction

We extract unigram (1-gram) and bigram (2-gram) features to capture word semantics and collocation relationships, enriching the dimension of text features.

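Let the valid word sequence of document $d_i$ be $(w_{i,1}, w_{i,2}, \dots, w_{i,n_i})$ ($n_i$ is the number of valid words in document $d_i$). Then:

  1. 1-gram feature set (unigrams): $G_i^{(1)} = \{ w_{i,k} : 1 \le k \le n_i \}$
  2. 2-gram feature set (bigrams): $G_i^{(2)} = \{ w_{i,k}\, w_{i,k+1} : 1 \le k \le n_i - 1 \}$
  3. Final feature set of document $d_i$: $F_i = G_i^{(1)} \cup G_i^{(2)}$

The global final feature set (union of features from all documents) is:

$$F = \bigcup_{i=1}^{N} F_i = \{ f_1, f_2, \dots, f_M \}$$

where $G_i^{(1)}$ and $G_i^{(2)}$ are the 1-gram and 2-gram feature sets of document $d_i$, respectively; $F_i$ is the final feature set of document $d_i$; $F$ is the global final feature set; $f_j$ is the $j$-th feature in the global feature set (corresponding to the word dimension in the TF-IDF vector; $M$ is the total number of features, consistent with the previous total number of valid words); and $N$ is the total number of documents.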
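
The unigram and bigram extraction can be sketched as follows. Joining the two words of a bigram with a space is a convention assumed here for illustration; the source does not specify how bigrams are represented, and the example word lists are hypothetical.

```python
def ngram_features(words):
    """Return the union of unigram and bigram features for one document."""
    unigrams = set(words)
    # Adjacent word pairs; a one-word document yields no bigrams
    bigrams = {f"{a} {b}" for a, b in zip(words, words[1:])}
    return unigrams | bigrams


# Hypothetical valid word sequences
docs = [["食品", "生产", "销售"], ["饮料", "生产"]]
per_doc = [ngram_features(d) for d in docs]
global_features = set().union(*per_doc)

print(sorted(per_doc[0]))
print(len(global_features))
```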

(6) L2 Normalization

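We normalize the TF-IDF vector to eliminate the impact of document length differences on similarity calculation and improve result comparability. Let the TF-IDF vector of document $d_i$ be $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \dots, x_{i,M})$, where $x_{i,j}$ is the TF-IDF weight of feature $f_j$ in document $d_i$ ($f_j$ is the $j$-th feature in the global feature set $F$). The vector after L2 normalization is:

$$\hat{\mathbf{x}}_i = \frac{\mathbf{x}_i}{\lVert \mathbf{x}_i \rVert_2}, \quad \lVert \mathbf{x}_i \rVert_2 = \sqrt{\sum_{j=1}^{M} x_{i,j}^2}$$

where $\lVert \mathbf{x}_i \rVert_2$ is the L2 norm (Euclidean length) of vector $\mathbf{x}_i$; $M$ is the total number of global features (consistent with the dimension of TF-IDF vectors and the total number of valid words); and $\hat{x}_{i,j} = x_{i,j} / \lVert \mathbf{x}_i \rVert_2$ is the $j$-th element of the normalized vector.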
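
A minimal sketch of L2 normalization is given below. It also shows the dot product of two normalized vectors, which equals their cosine similarity; treating that quantity as the similarity measure is an assumption, since this appendix section does not state the final similarity formula. The input vectors are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)  # an all-zero vector is returned unchanged
    return [x / norm for x in vector]


def cosine_similarity(u, v):
    """Dot product of the two L2-normalised vectors."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


# Hypothetical TF-IDF vectors
unit = l2_normalize([3.0, 4.0])
print(unit)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

After normalization, a long document and a short document with the same relative feature weights map to the same unit vector, so document length no longer affects the comparison.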

Acknowledgments

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. We are extremely grateful to Mrs. Yang Xing for her assistance in writing this manuscript.

References

  1. 1. Psomas E, Deliou C. Lean manufacturing practices and industry 4.0 technologies in food manufacturing companies: the Greek case. IJLSS. 2024;15(4):763–86.
  2. 2. National Bureau of Statistics of China. China Statistical Yearbook 2024. Beijing: China Statistics Press; 2024.
  3. 3. Hanson GH. Market potential, increasing returns and geographic concentration. J Int Econ. 2005;67(1):1–24.
  4. 4. Bosker M, Brakman S, Garretsen H, Schramm M. Adding geography to the new economic geography: bridging the gap between theory and empirics. J Econ Geogr. 2010;10(6):793–823.
  5. 5. An Y, Kang Y, Lee S. A study on the impact of soft location factors in the relocation of service and manufacturing firms. Int J Urban Sci. 2014;18(3):327–39.
  6. 6. Wang K-J, Lestari YD, Tran VNB. Location Selection of High-tech Manufacturing Firms by a Fuzzy Analytic Network Process: A Case Study of Taiwan High-tech Industry. Int J Fuzzy Syst. 2017;19(5):1560–84.
  7. 7. Nuruzzaman N. Revisiting strategic motives, location choices, and impacts of international research and development. Newark: Rutgers The State University of New Jersey; 2020.
  8. 8. Bauman TJ. Reshoring: manufacturing location decision strategies. D.B.A. Thesis, Capella University. 2020. Available from: https://webofscience.clarivate.cn/wos/alldb/full-record/PQDT:67971698
  9. 9. Lee J, Lee J-Y. The Study on the Regional Productivity Differences depending on the Location of the Manufacturing Subsidiary and the Proximity of the Parent Company. J Korean Assoc Reg Geogr. 2017;23(3):529–42.
  10. 10. Zhao Z, Zheng L. The effect of the spatial distribution of state-owned enterprises on the location of private-owned enterprise births. J Reg Sci. 2024;64(1):136–75.
  11. 11. Zheng D, Shi M. Industrial land policy, firm heterogeneity and firm location choice: Evidence from China. Land Use Policy. 2018;76:58–67.
  12. 12. Duan J, Zhao Z, Xu Y, You X, Yang F, Chen G. Spatial Distribution Characteristics and Driving Factors of Little Giant Enterprises in China’s Megacity Clusters Based on Random Forest and MGWR. Land. 2024;13(7):1105. https://doi.org/10.3390/land13071105
  13. 13. Tolerra MJ, Worku H. Spatial distribution vis-a-vis agglomeration of industry in Oromia special zone surrounding Finfinnee, Ethiopia. Int J Contemp Econ Adm Sci. 2024;14(1):1–13.
  14. 14. Gong W, Zhang Z, Gao F, Li R, Ma X. Comparison of the spatial distribution and location selection of producer services and manufacturing in Lanzhou from the perspective of relevance. Geogr Res. 2021;40(11):3154–72.
  15. 15. Liu Y, Wu Y, Zhu X. Industrial clusters and carbon emission reduction: evidence from China. Ann Reg Sci. 2024;73(2):557–97.
  16. 16. Zinilli A, Gao Y, Scherngell T. Structural dynamics of inter-city innovation networks in China: A perspective from TERGM. Netw Spat Econ. 2024;24(3):707–41.
  17. 17. Zhao Y, Lyu J, Huesig S. The impact of innovative city cooperation network on city’s innovation efficiency: Evidence from China. J Knowl Econ. 2023.
  18. 18. Zhao D, Tang C, Huang X. Innovation network, knowledge spillover, and urban green economic performance. Front Phys. 2025;13.
  19. 19. Cao Z, Derudder B, Dai L, Peng Z. An analysis of the evolution of Chinese cities in global scientific collaboration networks. ZFW – Adv Econ Geogr. 2023;67(1):5–19.
  20. 20. Zhang W, Qian Y. Unpacking intercity competitive relations in the global corporate spatial organization of manufacturing. G Netw Trans Aff. 2024;24(3).
  21. 21. Li E, Ma Y, Wang Y, Chen Y, Niu B. Competition among cities for export trade brings diversification: The experience of China’s urban export trade development. PLoS One. 2022;17(9):e0271239. pmid:36107934
  22. 22. Zhang Z, Tang Z. Exploring the imprint of the institutional context on the urban network in China: Comparative analyses between corporate-based networks with different ownership structures. Netw J Transn Aff. 2024;24(4).
  23. 23. Yang Y, Lu J, Caset F, Derudder B. Agglomeration externalities or network externalities? Explaining productivity in Chinese urban regions. Spat Econ Anal. 2024.
  24. 24. Xu Y, Kang C, Jiao W, Jia Y. Discovering structure and influencing factors of Chinese city directed network (CCDN) from web search engine data. Appl Geogr. 2025;177.
  25. 25. Zhang Z, Yan Z, Meng X. The Effects of Intergovernmental Networks on Intercity Collaborative Innovation in China. Asia Pac Policy Stud. 2025;12(2).
  26. 26. Zhang W, Qian Y, Tang J, Liu X. Exploring cooperative and competitive relations in a Chinese intercity innovation network. Appl Geogr. 2025;175:103508.
  27. 27. Liu L, Sun Y, Li W. Characteristics and prediction of urban interaction networks from the perspective of traffic flow and text information flow. Environ Plan B Urban Analyt City Sci. 2025;52(2):430–56.
  28. 28. Man S, Yang Y, Zeng T, Wang M. Cross-border urban networks based on manufacturing global value chain: A study of listed companies in Western China. Chin Geogr Sci. 2023;33(6):1033–52.
  29. 29. Zhao L, Yang L, Chang X. Spatial network characteristics and economic effects of element flow in the Lanxi urban agglomeration. PLoS One. 2024;19(5):e0296496. pmid:38701104
  30. 30. Zhang X, Huang X, Shi J, Zheng Y, Wang J. Connections and Spatial Network Structure of the Tourism Economy in Beijing–Tianjin–Hebei: A Social Network Perspective. Land. 2024;13(10):1691.
  31. 31. Zhang L, Zuo X, Wu Z, Chen C, Pan Z, Hu X. The Spatial Structure and Driving Mechanisms of Multi-Source Networks in the Chengdu–Chongqing Economic Circle of China. IJGI. 2023;12(10):411.
  32. 32. Zheng Y, Xiao J, Tang J. Research on urban agglomeration spatial network structure in the middle reaches of the Yangtze River based on real-time traffic accessibility scenario analysis. Transp Lett. 2025;17(2):369–83.
  33. 33. Zhang X, Ma W, Sheng S. Understanding the structure and determinants of economic linkage network: The case of three major city clusters in Yangtze River economic belt. Front Environ Sci. 2023.
  34. 34. Ren X, Xiong R, Ni T. Spatial network characteristics of carbon balance in urban agglomerations - a case study in Beijing-Tianjin-Hebei city agglomeration. Appl Geogr. 2024;169.
  35. 35. Dong S, Ren G, Xue Y, Liu K. Urban green innovation’s spatial association networks in China and their mechanisms. Sust Cities Soc. 2023;93:104536.
  36. 36. Yang X, Li H, Zhang J, Niu S, Miao M. Urban economic resilience within the Yangtze River Delta urban agglomeration: Exploring spatially correlated network and spatial heterogeneity. Sust Cities Soc. 2024;103:105270.
  37. 37. Gao X, Zhu J, Liu J. Density, Division and Distance: Understanding China’s Urban Land-Use Change from an Economic Geography Perspective. Appl Spat Anal. 2024;17(2):439–69.
  38. 38. Tu Y, Chen Z, Wang C, Yu B, Liu B. Quantitative Analysis of Urban Polycentric Interaction Using Nighttime Light Data: A Case Study of Shanghai, China. IEEE J Sel Top Appl Earth Observ Remote Sens. 2022;15:1114–22. https://doi.org/10.1109/jstars.2021.3137167
  39. 39. Wang X, Liu J, Zhang W. Impact of High-Speed Rail on Spatial Structure in Prefecture-Level Cities: Evidence from the Central Plains Urban Agglomeration, China. Sustainability. 2022;14(23):16312.
  40. 40. Zhang C, Zhao L, Song X, Zhang Q, Zhang X. Spatial-temporal coupling characteristics and interaction effects of economic resilience and people’s livelihoods and well-being: An analysis of 78 cities in the Yellow River Basin. Sust Cities Soc. 2024;112:105638.
  41. 41. Vagnini C, Vieira LC, Longo M, Mura M. Regional drivers of industrial decarbonisation: a spatial econometric analysis of 238 EU regions between 2008 and 2020. Reg Stud. 2025;59(1).
  42. 42. Yang LXZ, Qian WJ. Strategic interactions of Chinese cities in improving total factor energy efficiency: a two-regime spatial econometric analysis. Environ Dev Sustain. 2024.
  43. 43. Li J-F, Liu W-W, Yu X-S, Sun H-X, Wang D-F. Research on the Spatial-Time Evolution and Influence Mechanism of Logistics Network Structure in Zhengzhou Metropolitan Area. IEEE Access. 2023;11:41596–608.
  44. 44. Lee CG, Woo H, Yang JS. Two kinds of gravitational forces in transport: An analysis using the gravity model. Int J Mod Phys C. 2024;35(06).
  45. 45. Silva JS, Tenreyro S. The log of gravity. Rev Econ Stat. 2006;641–58.
  46. 46. Bosquet C, Boulhol H. Applying the GLM variance assumption to overcome the scale-dependence of the negative binomial QGPML estimator. Economet Rev. 2014;33(7):772–84.
  47. 47. Manning CD. An introduction to information retrieval. Cambridge: Cambridge University Press; 2009.
  48. 48. Larson J, Baker J, Latta G, Ohrel S, Wade C. Modeling International Trade of Forest Products: Application of PPML to a Gravity Model of Trade. For Prod J. 2018;68(3):303–16. pmid:32280136
  49. 49. Kaya Aydın G, Aydın U, Ülengin B. A Comparison of Forecasting Performance of PPML and OLS estimators: The Gravity Model in the Air Cargo Market. Ekoist. 2023;0(39):112–28.
  50. 50. Larch M, Wanner J, Yotov YV, Zylkin T. Currency Unions and Trade: A PPML Re‐assessment with High‐dimensional Fixed Effects. Oxf Bull Econ Stat. 2018;81(3):487–510.
  51. 51. Rao X, Dai M. A review of the spatial characteristics of R&D industry and its role in the economy. J Low Carbon Econ. 2016;5(4):49–58.
  52. 52. Han Y, Wei D, Zhang F, Wang X. Industrial structure distance and Chinese overseas mergers and acquisitions: A test of the new structural development theory. Stat Res. 2024;41(11):20–35.
  53. 53. Shoufu Y, Dan M, Zuiyi S, Lin W, Li D. The impact of artificial intelligence industry agglomeration on economic complexity. Econ Res-Ekon Istraz. 2023;36(1):1420–48.