Tiled vector data model for the geographical features of symbolized maps

Electronic maps (E-maps) provide people with convenience in real-world space. Although web map services can display maps on screens, a more important function is their ability to access geographical features. An E-map that is based on raster tiles is inferior to vector tiles in terms of interactive ability because vector maps provide a convenient and effective method to access and manipulate web map features. However, the critical issue regarding rendering tiled vector maps is that geographical features that are rendered in the form of map symbols via vector tiles may cause visual discontinuities, such as graphic conflicts and losses of data around the borders of tiles, which likely represent the main obstacles to exploring vector map tiles on the web. This paper proposes a tiled vector data model for geographical features in symbolized maps that considers the relationships among geographical features, symbol representations and map renderings. This model presents a method to tailor geographical features in terms of map symbols and ‘addition’ (join) operations on the following two levels: geographical features and map features. Thus, these maps can resolve the visual discontinuity problem based on the proposed model without weakening the interactivity of vector maps. The proposed model is validated by two map data sets, and the results demonstrate that the rendered (symbolized) web maps present smooth visual continuity.


Introduction
Web electronic maps (E-maps) have been extensively exploited and widely applied with the rapid development of technologies such as wireless communication, computer technology and geographic space information technology [1]. Many successful applications of E-maps have been implemented in daily life [2]. E-maps can be used as information inquiry services by providing users with specific information, such as POIs (points of interest) and tourist information [3,4]. E-maps present bus routes and real-time traffic information and provide data that support vehicle navigation [5,6]. E-maps are also useful for teaching geography [7] and urban planning [8]. The emergence of social media (such as Twitter, Facebook, and the Sina microblog) has inspired a considerable amount of research that has used check-in data and [...] map [48,50]. Therefore, a critical issue for tiled vector E-maps is to join neighboring features with their symbol representation to maintain visual continuity and avoid visual conflicts.
To eliminate visual breakage, this study proposes a tiled vector data model for geographical features. The model defines the additivity of map features and geographical features, partitions vector geographical features, and symbolizes the partitioned features so that, once joined, they match graphically without conflicts or losses.
This paper is organized as follows: Section 2 describes the symbol representation method. Section 3 introduces the tiled vector data model, verifies the feasibility of the data model, and then proposes different data organization methods for geographical features based on the tiled vector data model. Section 4 reports an experiment that uses the proposed model and discusses the results of the experiment. Finally, Section 5 presents the conclusions and future work.

Symbol representation
Point symbols, linear symbols and area symbols are the three basic symbol representation methods for vector data in cartography and geographic information systems [48]. The symbol representation for point features renders the specified point symbol at a location point. The symbol representation for area features can be summarized as the rendered filling of different graphic cells based on the scan line algorithm or its improved variants. Compared to the symbol representation for point features and area features, the symbol representation for linear features is much more complicated. A linear feature is generally symbolized by repeatedly drawing the corresponding symbol along its path. Fig 1(a) shows the symbol of a railway and the map feature of a railway geographical feature (representing the linear feature), and Fig 1(b) shows the symbol of grassland and the map feature of a grassland geographical feature (representing the area feature).
Any symbol $q$ in the symbol library $Q$ is an ordered set that consists of geometric graphics (such as polylines, polygons and ellipses) with properties that include color and line width. The circumscribing polygon of the symbol can be expressed as a rectangle; the width of the rectangle is referred to as the width of the symbol, and the length of the rectangle is referred to as the length of the symbol. $\lambda$ is a symbol parameter: when the feature is a linear feature, $\lambda$ denotes the length $l$ of the linear symbol, and when the feature is an area feature, $\lambda$ denotes the width $w$ and the length $l$ of the symbol [51]. The symbol [...]
Geographical feature space is the abstract expression of spatial entities and geographical phenomena in the real world [52]. Geographical feature space can be converted to map feature space by symbol representation. This conversion process can be expressed as follows [51]:
$$M = S(G, Q) \tag{2}$$
where $M$ represents the map feature space, $G$ represents the geographical feature space, $Q$ represents the symbol library (set) that corresponds to the geographical feature space $G$, and $S$ represents the common function for symbolizing the geographical feature space. The conversion process for an individual map feature in the map feature space can be expressed as follows:
$$m = S(g, q), \quad g \in G,\ q \in Q \tag{3}$$
In Eq (3), the connection between geographical features and map features is fixed for a given scale because the symbol is fixed. However, a feature may have different symbols at different scales.

Graphic conflicts in symbolization and operators for tiled vector data model
Graphic conflicts and losses may occur along the borders of tiles when symbolizing geographical features on each tile and connecting tiles according to their respective coordinates. Examples of the three basic types of problems that arise are shown in Fig 2. As shown in Fig 2(a), when the points at the borders of the tiles are symbolized, the components of the map features inside the corresponding tile are easily rendered. Nonetheless, we cannot omit the components of map features that are outside these tiles because such deletions would create additional issues, such as a loss of certain components and the non-conformity of map features along the tile borders. Fig 2(b) shows an example of symbolizing a linear feature along the borders of these tiles, and the problems are marked. The key to resolving these problems is to design or construct an operator for map features with 'additivity', which means that two separate symbolizations can yield the same graphic presentation as a single symbolization. Several basic concepts and signs are introduced here to illustrate the above problems and clarify the principle of the proposed model. The "addition" sign $\oplus$ denotes joining two features, $\oplus_M$ denotes joining two map features and $\oplus_G$ denotes joining two geographical features.
Definition 1. The premise of the "addition" operator for geographical features: The operation $g_1 \oplus_G g_2$ is tenable if EITHER (1) both $g_1$ and $g_2$ are linear features, $g_1$ and $g_2$ are connected at their ends, or $g_1 = g_2$; OR (2) both $g_1$ and $g_2$ are area features and $g_1$ intersects, touches or contains $g_2$.
Examples for Definition 1 are shown in Fig 3, where (a), (d) and (e) satisfy the premise of the "addition" operator for geographical features and (b), (c) and (f) do not.
Definition 2. The "accumulation" operator for geographical features: The "accumulation" operator for geographical features is written $\bigwedge$, and $\bigwedge_{i=1}^{n} g_i$ can be expressed as $\bigwedge_{i=1}^{n} g_i = g_1 \oplus_G g_2 \oplus_G \dots \oplus_G g_n$ ($g_i \in G$, $n \ge 1$, and $g_1, g_2, \dots, g_n$ satisfy Definition 1).
The difference between the "addition" sign for geographical features and the "accumulation" operator in this study is similar to the difference between the plus sign "+" and the sum operator "∑" in mathematics. To avoid cognitive conflicts with common mathematical operators, we use "$\oplus_G$" and "$\bigwedge$" in place of "+" and "∑", respectively. The "accumulation" operator is used in this article to shorten long expressions. As shown in Fig 4, $q$ is a symbol, and the linear feature A-B is divided into 5 components ($g_1$, $g_2$, $g_3$, $g_4$ and $\Delta g$) by this symbol. Thus, the linear feature can be written as $g_1 \oplus_G g_2 \oplus_G g_3 \oplus_G g_4 \oplus_G \Delta g$. Writing every geographical feature in this manuscript in this expanded form would be tedious, so we use the "accumulation" operator to simplify the expression to $\bigwedge_{i=1}^{4} g_i \oplus_G \Delta g$.
Definition 3. The premise of the "addition" operator for map features: The geographical features $g_1$ and $g_2$ have the same symbol $q$ and a corresponding symbol parameter $\lambda$. By using this symbol, we can obtain the map features $m_1 = S(g_1, q)$ and $m_2 = S(g_2, q)$. The formula $m_1 \oplus_M m_2 = m_3$ holds true when (1) feature $g_1$ can be divided into $n$ components ($g^{\lambda}_1, g^{\lambda}_2, \dots, g^{\lambda}_n$, where $n$ is an integer) by the symbol parameter $\lambda$ and (2) $g_1$ and $g_2$ satisfy Definition 1. In this formula, $m_3$ can be expressed as $m_3 = S(g_1 \oplus_G g_2, q)$.
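The decomposition of Fig 4 and condition (1) of Definition 3 can be illustrated with a minimal sketch. It assumes a linear feature can be reduced to its length measured along the path; the function names `decompose` and `satisfies_definition_3` are hypothetical and are not part of the paper.

```python
from fractions import Fraction


def decompose(length, lam):
    """Split a linear feature of the given length into whole symbol cells.

    Returns (n, remainder): n complete cells of length lam plus the leftover
    piece (the Δg of Fig 4). Fractions keep the arithmetic exact.
    """
    length, lam = Fraction(length), Fraction(lam)
    if lam <= 0 or length < 0:
        raise ValueError("lam must be positive and length non-negative")
    n = length // lam              # number of complete symbol cells
    remainder = length - n * lam   # length of the trailing partial cell
    return int(n), remainder


def satisfies_definition_3(length_g1, lam):
    """Condition (1) of Definition 3: g1 is a whole number of symbol cells."""
    _, remainder = decompose(length_g1, lam)
    return remainder == 0


n, dg = decompose(47, 10)
print(n, dg)                            # 4 complete cells, remainder 7
print(satisfies_definition_3(47, 10))   # False: a partial cell is left over
print(satisfies_definition_3(40, 10))   # True: exactly 4 cells
```

Condition (2) of Definition 3 (the geometric premise of Definition 1) is not checked here, since it depends on the actual geometry rather than on lengths.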
If the features satisfy Definition 3, the map features can match well. Some examples are shown in Fig 5 to explain Definition 3. In Fig 5, both $g_1$ and $g_2$ are railway features, and $q$ is the railway symbol. In Fig 5(a), $S(g_1, q)$ and $S(g_2, q)$ do not satisfy the premise of the "addition" operator for map features because feature $g_1$ cannot be divided into $n$ components ($n$ is an integer) by the length of the symbol $q$; consequently, $S(g_1, q) \oplus_M S(g_2, q)$ is not equal to $S(g_1 \oplus_G g_2, q)$. In Fig 5(b), feature $g_1$ can be divided into $n$ components ($n$ is an integer) by the length of the symbol $q$, and $S(g_1, q) \oplus_M S(g_2, q)$ is equal to $S(g_1 \oplus_G g_2, q)$.
Definition 4. The invariance of addition by itself: [...]
The railway symbol is chosen as an example to illustrate, based on Definitions 1-4, why the traditional vector data model is inapplicable to vector map tiles; the same reasoning applies to other linear symbols. For example, Fig 6 shows two tiles called $T_1$ and $T_2$; a railway feature $g$ crosses the tiles and is clipped by the tiles into $g_1$ and $g_2$. Thus, we can obtain the following formula:
$$g = g_1 \oplus_G g_2 \tag{4}$$
The map feature that corresponds to $g$ can be expressed as follows:
$$m = S(g, q) \tag{5}$$
where $q$ is the railway symbol and $m$ denotes the map feature of $g$. Similarly, the map features of $g_1$ and $g_2$ can be represented by the following formulas:
$$m_1 = S(g_1, q) \tag{6}$$
$$m_2 = S(g_2, q) \tag{7}$$
If $m_1 \oplus_M m_2$ is equal to $m$, then the map features match well at the borders of the tiles. As shown in Fig 6, we introduce $\lambda$ to represent the length of the symbol $q$, and then feature $g$ can be divided into $n$ components ($g^{\lambda}_1, g^{\lambda}_2, \dots, g^{\lambda}_n$, where $n$ is an integer) by the corresponding symbol parameter $\lambda$. We assume that feature $g$ intersects the tiles at component $k$ of the feature and that the tiles divide $g^{\lambda}_k$ into $\Delta g_1$ and $\Delta g_2$, which are located in $T_1$ and $T_2$, respectively.
Feature $g$ can then be rewritten as follows:
$$g = \bigwedge_{i=1}^{k-1} g^{\lambda}_i \oplus_G \Delta g_1 \oplus_G \Delta g_2 \oplus_G \bigwedge_{i=k+1}^{n} g^{\lambda}_i \tag{8}$$
The expression of feature $g_1$ then becomes:
$$g_1 = \bigwedge_{i=1}^{k-1} g^{\lambda}_i \oplus_G \Delta g_1 \tag{9}$$
The expression of feature $g_2$ then becomes:
$$g_2 = \Delta g_2 \oplus_G \bigwedge_{i=k+1}^{n} g^{\lambda}_i \tag{10}$$
In Formula (9), $\Delta g_1$ is not equal to $g^{\lambda}$, as shown in [...]
Both linear features (railway) and other types of features encounter this problem. Therefore, the key to implementing the vector map tile technique is to propose a tiled vector data model for geographical features so that tiles can compose an entire map with an accurate and complete cartographic representation. Using the proposed data model improves the cartographic representation of features along the borders of tiles. The rationality of the data model is also verified in this section.
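The border conflict described above can be reproduced with a one-dimensional sketch. It assumes each clipped piece is symbolized independently from its own first vertex, with symbol cells of length λ laid end to end; positions are measured along the whole feature. The helper names `cell_starts` and `joined_matches_whole` are illustrative only.

```python
from fractions import Fraction


def cell_starts(length, lam, origin=0):
    """Positions (along the whole feature) where a symbol cell begins when a
    piece of the given length, starting at `origin`, is symbolized alone."""
    length, lam, origin = Fraction(length), Fraction(lam), Fraction(origin)
    starts, s = [], Fraction(0)
    while s < length:
        starts.append(origin + s)
        s += lam
    return starts


def joined_matches_whole(length, lam, cut):
    """True when symbolizing [0, cut] and [cut, length] separately places the
    symbol cells exactly where a single pass over [0, length] would."""
    whole = cell_starts(length, lam)
    pieces = cell_starts(cut, lam) + cell_starts(
        Fraction(length) - Fraction(cut), lam, origin=cut
    )
    return pieces == whole


# A 50-unit railway with a 10-unit symbol, clipped by a tile border.
print(joined_matches_whole(50, 10, cut=20))  # True: cut on a cell boundary
print(joined_matches_whole(50, 10, cut=23))  # False: cells restart at 23
```

With the cut at 23, the second piece restarts its pattern at 23, 33 and 43 instead of 30 and 40, which is the mismatch that Formula (9) expresses as $\Delta g_1 \neq g^{\lambda}$.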

Tiled vector data models for point, linear and area features
We designed a novel tiled vector data model for all types of features to match map features well along the borders of tiles.
Tiled vector data model for linear features. A tiled vector data model for linear features is proposed based on Definition 3 as follows:
$$g = \begin{cases} \bigwedge_{i=1}^{n} g^{\lambda}_i \oplus_G \Delta g \oplus_G \sigma, & \text{the end point of the linear feature is the intersection point} \\ \sigma \oplus_G \Delta g \oplus_G \bigwedge_{i=1}^{n} g^{\lambda}_i, & \text{the start point of the linear feature is the intersection point} \\ \sigma_1 \oplus_G \Delta g_1 \oplus_G \bigwedge_{i=1}^{n} g^{\lambda}_i \oplus_G \Delta g_2 \oplus_G \sigma_2, & \text{the start point and end point of the linear feature are the intersection points} \end{cases} \tag{11}$$
In Formula (11), $\sigma$ is an expansion feature and $\Delta g \oplus_G \sigma = g^{\lambda}$. Fig 8 shows examples for Formula (11). In Fig 8(a), the linear feature A-B intersects the tile at point P. The end point of the linear feature A-P is the intersection point; therefore, the data model for feature A-P is case 1 in Formula (11). In Fig 8(b), the data model for feature P-B is case 2 in Formula (11). In Fig 8(c), the data model for feature $P_1$-$P_2$ is case 3 in Formula (11).
Based on the proposed tiled vector data model for linear features in Formula (11), the feature $g_1$ in Fig 6 satisfies case 1 and thus can be changed as follows:
$$g_1 = \bigwedge_{i=1}^{k-1} g^{\lambda}_i \oplus_G \Delta g_1 \oplus_G \Delta g_2 \tag{12}$$
The feature $g_2$ in Fig 6 satisfies case 2 and can be rewritten as follows:
$$g_2 = \Delta g_1 \oplus_G \Delta g_2 \oplus_G \bigwedge_{i=k+1}^{n} g^{\lambda}_i \tag{13}$$
According to Formula (12) and Formula (13), the expansion features of $g_1$ and $g_2$ are $\Delta g_2$ and $\Delta g_1$, respectively, and $m_1 \oplus_M m_2$ can be rewritten as follows:
$$m_1 \oplus_M m_2 = S\Big(\bigwedge_{i=1}^{k-1} g^{\lambda}_i \oplus_G \Delta g_1 \oplus_G \Delta g_2,\ q\Big) \oplus_M S\Big(\Delta g_1 \oplus_G \Delta g_2 \oplus_G \bigwedge_{i=k+1}^{n} g^{\lambda}_i,\ q\Big) \tag{14}$$
Because $\Delta g_1 \oplus_G \Delta g_2 = g^{\lambda}_k$, Formula (14) can be simplified as follows based on Definition 3 and Definition 4:
$$m_1 \oplus_M m_2 = S\Big(\bigwedge_{i=1}^{k} g^{\lambda}_i,\ q\Big) \oplus_M S\Big(\bigwedge_{i=k}^{n} g^{\lambda}_i,\ q\Big) = S\Big(\bigwedge_{i=1}^{n} g^{\lambda}_i,\ q\Big) = S(g, q) = m \tag{15}$$
In Formula (15), $m_1 \oplus_M m_2$ is equal to $m$, which means that using the data model for linear features in Formula (11) can solve the problems of graphic conflicts and losses at the borders of tiles. This relationship also shows that expanding a feature that is clipped by a tile to a complete symbol resolves these problems.
Tiled vector data model for point features. Following the approach used for linear features and considering the simplicity of point features, the data model for point features is given by Formula (16), which distinguishes three cases. In the data model, $\sigma$ is an expansion feature and $\sigma = g$. According to Formula (16), case 1 means that if the tile contains a point feature, this point feature is added to the feature set of the tile. In case 2, when the tile does not contain a point feature but the map feature of the point intersects the tile, we also add the point feature to the feature set of the tile. In case 3, when the map feature of a point does not intersect the tile, the model skips the point feature.
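Returning to the linear-feature model of Formula (11), the expansion step can be sketched as follows. The sketch makes several assumptions that are not in the paper: the feature is a single straight segment rather than a polyline, the tile is an axis-aligned rectangle `(xmin, ymin, xmax, ymax)`, clipping uses a standard Liang-Barsky parametric test (the algorithm the paper names in its data organization section), and the stored piece is described by its arc-length range. The function names are hypothetical.

```python
import math


def liang_barsky(p1, p2, xmin, ymin, xmax, ymax):
    """Clip segment p1-p2 to an axis-aligned tile.

    Returns (t0, t1) with 0 <= t0 <= t1 <= 1, the parametric range inside
    the tile, or None when the segment misses the tile.
    """
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x1 - xmin), (dx, xmax - x1),
                 (-dy, y1 - ymin), (dy, ymax - y1)):
        if p == 0:
            if q < 0:
                return None          # parallel to this edge and outside it
            continue
        t = q / p
        if p < 0:                    # segment is entering across this edge
            if t > t1:
                return None
            t0 = max(t0, t)
        else:                        # segment is leaving across this edge
            if t < t0:
                return None
            t1 = min(t1, t)
    return t0, t1


def expand_to_symbol_cells(p1, p2, tile, lam):
    """Arc-length range to store in the tile: the clipped range widened
    outward to whole symbol cells (Δg ⊕ σ = one full cell of length lam)."""
    clipped = liang_barsky(p1, p2, *tile)
    if clipped is None:
        return None
    length = math.dist(p1, p2)
    a, b = clipped[0] * length, clipped[1] * length
    eps = 1e-9                       # tolerance for floating-point rounding
    start = math.floor(a / lam + eps) * lam
    end = min(math.ceil(b / lam - eps) * lam, length)
    return start, end


# A 50-unit line crossing a tile border at x = 23, symbol length 10.
left_tile = (-100, -5, 23, 5)
right_tile = (23, -5, 100, 5)
print(expand_to_symbol_cells((0, 0), (50, 0), left_tile, 10))
print(expand_to_symbol_cells((0, 0), (50, 0), right_tile, 10))
```

In this example both tiles end up storing the complete cell that straddles the border (arc length 20 to 30), which mirrors Formulas (12) and (13), where each clipped piece is extended by the other tile's fragment.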
Tiled vector data model for area features. Research on the rendering of area symbols can be summarized as the filling of different graphic cells [48]. Graphic cells have two patterns: color filling and symbol filling. When the filling pattern is color filling, the map features of an area feature can match well along the borders of tiles. However, when the filling pattern is symbol filling, as shown in Fig 2(c), obvious graphic conflicts between area map features occur within neighboring tiles. Generally, a line is considered a one-dimensional object and an area is considered a two-dimensional object. Area features are therefore more complicated than linear features, and the tiled vector data model for linear features cannot adequately handle them. We now introduce several basic concepts and signs prior to introducing the tiled vector data model for area features. As shown in Fig 10, the upper-left corner of the circumscribing rectangle of area feature $g$ (polygon A-B-C-D-E) is vertex P, so vertex P is called the symbolization starting point of feature $g$.
Definition 6. The premise of the "addition" operator for area map features: For area features $g_1$ and $g_2$, the symbolization starting points of $g_1$ and $g_2$ are located at $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$, respectively. These features have the same symbol $q$, and the corresponding symbol parameter $\lambda$ contains the width $w$ and the length $l$ of the symbol. We can obtain the area map features $m_1 = S(g_1, q)$ and $m_2 = S(g_2, q)$. The formula $m_1 \oplus_M m_2 = m_3$ holds true, and $m_1$ and $m_2$ can therefore be properly matched, when the following criteria are met: (1) $|x_1 - x_2| = k \times w$ ($k$ is an integer), (2) $|y_1 - y_2| = n \times l$ ($n$ is an integer), and (3) $g_1$ and $g_2$ satisfy Definition 1. In this formula, $m_3$ can be expressed as $m_3 = S(g_1 \oplus_G g_2, q)$.
Some examples are shown in Figs 11 and 12 to explain Definition 6. As shown in Fig 11, the area features are $g_1$ (polygon A-B-C-D) and $g_2$ (polygon G-E-F).
The symbol is $q$, and the symbol parameter is $\lambda(w, l)$. The map features of $g_1$ and $g_2$ are $m_1 = S(g_1, q)$ and $m_2 = S(g_2, q)$, respectively. The circumscribing rectangles of $g_1$ and $g_2$ can be divided by the corresponding symbol into the dashed boxes shown in Fig 11. The areas $g_1$ and $g_2$ do not satisfy (1) $|x_1 - x_2| = k \times w$ ($k$ is an integer) and (2) $|y_1 - y_2| = n \times l$ ($n$ is an integer). Therefore, as shown in Fig 11, $S(g_1, q) \oplus_M S(g_2, q)$ is not equal to $S(g_1 \oplus_G g_2, q)$. In Fig 12, the areas $g_1$ and $g_2$ satisfy Definition 6, and $S(g_1, q) \oplus_M S(g_2, q)$ is equal to $S(g_1 \oplus_G g_2, q)$. Based on Definition 5 and Definition 6, the tiled vector data model for area features is as follows:
$$g = \begin{cases} \Delta g, & \text{the fill pattern is color filling} \\ \Delta g \oplus_G \sigma, & \text{the fill pattern is symbol filling} \end{cases} \tag{17}$$
In Formula (17), $\Delta g$ is the result of the area feature that is clipped by tiles and $\sigma$ is an expansion feature of $\Delta g$ that allows $g$ to satisfy the data model. The details of the expansion feature $\sigma$ are illustrated in Section "Data organization method for area features".
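Criteria (1) and (2) of Definition 6 amount to a divisibility test on the offset between the two symbolization starting points. A minimal sketch follows, assuming coordinates are given as integers or decimal strings so that exact rational arithmetic applies; the function name `area_features_joinable` is hypothetical, and criterion (3) (the geometric premise of Definition 1) is deliberately left out.

```python
from fractions import Fraction


def area_features_joinable(p1, p2, w, l):
    """Criteria (1) and (2) of Definition 6: the starting points p1 and p2
    differ by whole multiples of the symbol width w (in x) and of the
    symbol length l (in y)."""
    (x1, y1), (x2, y2) = p1, p2
    dx = abs(Fraction(x1) - Fraction(x2))
    dy = abs(Fraction(y1) - Fraction(y2))
    return dx % Fraction(w) == 0 and dy % Fraction(l) == 0


# An 8 x 8 fill symbol: offsets of (24, 16) align, offsets of (20, 16) do not.
print(area_features_joinable((0, 0), (24, 16), 8, 8))   # True
print(area_features_joinable((0, 0), (20, 16), 8, 8))   # False
```

When the test fails, Formula (17) calls for the expansion feature $\sigma$, which moves the starting point of the clipped piece onto the symbol grid.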

Data organization based on the tiled vector data model
We now propose data organization methods for vector features based on the "Tiled vector data model for point, linear and area features" section. The results can avoid graphic conflicts and losses when they are symbolized. Moreover, the results can ensure the correctness and integrity of the geographical features. These features can be applied to user interactions and spatial analyses.

Data organization method for point features.
Based on the data model for point features, the data organization method for point features consists of the following steps: 1. Obtain the circumscribing polygon of the map feature. If the circumscribing polygon does not intersect the target tile, skip the point. An example of point feature data organization is shown in Fig 13. In the traditional data model for point features, only tile 1 contains Point A, which may cause graphic losses, as illustrated in Fig 2. Based on the proposed tiled vector data model for point features, tile 1 satisfies case 1, and tiles 2-4 satisfy case 2. Therefore, we added Point A to the feature set of all of the tiles (tiles 1-4).
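The assignment of Point A to tiles 1-4 can be sketched as a bounding-box test. The sketch assumes square tiles on a regular grid whose origin is at (0, 0), a symbol whose circumscribing rectangle is centred on the point, and tiles indexed as (row, column) with rows following the y axis; none of these conventions is specified in the paper, and the name `tiles_for_point` is hypothetical.

```python
import math


def tiles_for_point(x, y, sym_w, sym_h, tile_size):
    """Return the (row, col) index of every tile met by the circumscribing
    rectangle of the point symbol (cases 1 and 2 of the point model).
    Tiles that the rectangle does not reach are skipped (case 3)."""
    half_w, half_h = sym_w / 2, sym_h / 2
    col_min = math.floor((x - half_w) / tile_size)
    col_max = math.floor((x + half_w) / tile_size)
    row_min = math.floor((y - half_h) / tile_size)
    row_max = math.floor((y + half_h) / tile_size)
    return [(r, c)
            for r in range(row_min, row_max + 1)
            for c in range(col_min, col_max + 1)]


# A 6 x 6 symbol near the shared corner of four 100-unit tiles.
print(tiles_for_point(99, 99, 6, 6, 100))   # four tiles, as in Fig 13
print(tiles_for_point(50, 50, 6, 6, 100))   # only the containing tile
```

A rectangle that merely touches a tile border is counted as intersecting here; a stricter test would be a reasonable alternative.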
Data organization method for linear features. The preparation for processing linear features mainly includes the following steps.
1. Obtain the corresponding symbol for the linear feature. Then, divide the linear feature into segments based on the length of the symbol. We fully exploit the rectangular characteristic of the tile: the Liang-Barsky algorithm [53] is appropriate for organizing lines. The Liang-Barsky algorithm is a parametric line-clipping algorithm [...]
2. Obtain the intersection points of the linear feature $P_1 - P_2$ and the tile, called $V_1$ and $V_2$, respectively. In the traditional model, the saved feature in the tile is $V_1 - V_2$. However, this feature may cause graphic conflicts and losses along the borders of tiles.
[...] (Formula (11)), the model of saved features is $g = \sigma_1 \oplus_G \Delta g_1 \oplus_G \bigwedge_{i=1}^{n} g^{\lambda}_i \oplus_G \Delta g_2 \oplus_G \sigma_2$. In this case, $\Delta g_1 \oplus_G \bigwedge_{i=1}^{n} g^{\lambda}_i \oplus_G \Delta g_2$ is the linear feature $V_1 - V_2$, and we must add the expansions $\sigma_1$ and $\sigma_2$ to $V_1 - V_2$. Finally, $A - V_1$ and $V_2 - B$ are $\sigma_1$ and $\sigma_2$, and the saved feature in the tile is [...]
Another case that should be considered is a linear feature that does not intersect the tile while its map feature does intersect the tile. This case may result in the loss of the map feature. Concrete details are shown in Fig 15. A method to manage such situations is as follows:
1. Divide the linear feature $P_1 - P_2$ into sections according to the length of the symbol. Uniform nodes are the hollow nodes in Fig 15.
2. According to the width of the feature symbol, move the feature toward the tile and obtain a new feature, such as $P'_1 - P'_2$ in Fig 15.
3. Find the intersection points of $P'_1 - P'_2$ and the tile, called $V'_1$ and $V'_2$, respectively, and then drop perpendiculars from $V'_1$ and $V'_2$ onto $P_1 P_2$. The corresponding foot points are called $V_1$ and $V_2$, which can be regarded as the intersections of the feature and the tile.
[...] (Formula (11)). Finally, the saved feature in the tile [...]
Data organization method for area features. In this section, the grassland symbol is chosen as an example to illustrate data organization for area features. As shown in Fig 16, two tiles are marked $T_1$ and $T_2$. An area feature $g$ (polygon A-B-C-D-E) intersects the tiles and is clipped by the tiles into $g_1$ (polygon F-B-C-G) and $g_2$ (polygon A-F-G-D-E). The symbol for grassland is $q$, and the symbol parameter is $\lambda(w, l)$. The map features of $g$, $g_1$ and $g_2$ are $m = S(g, q)$, $m_1 = S(g_1, q)$ and $m_2 = S(g_2, q)$, respectively. The symbolization starting points of $g$, $g_1$ and $g_2$ are located at $P(x, y)$, $P_2(x_2, y_2)$ and $P_3(x_3, y_3)$, respectively. The circumscribing rectangle can be divided by the corresponding symbol into the dashed boxes in Fig 16. The areas $g_1$ and $g_2$ do not satisfy Definition 6; therefore, $m_1 \oplus_M m_2$ is not equal to $m$ and the map features cannot be matched along the borders of the tiles. Based on the tiled vector data model for area features in Section "Tiled vector data model for area features", the key to this data organization is constructing the corresponding expansion features $\sigma_1$ and $\sigma_2$ for $g_1$ and $g_2$. However, for the feature $g_1$, $x = x_1$ and $y - y_1 \neq n \times l$ ($n$ is an integer). Therefore, we must find a point $P'_1(x'_1, y'_1)$ from $g_2$ so that $P'_1$ satisfies the following formula: [...] The expansion feature for $g_1$ is constructed by $P'_1$. In this case, the vertex A can satisfy the formula and minimize the area of the expansion feature. Therefore, the expansion feature $\sigma_1$ is the polygon A-F-G. Finally, we add the polygon A-F-B-C-G to the feature set of tile $T_1$. Similarly, we must find a point $P'_2(x'_2, y'_2)$ for the feature $g_2$ from $g_1$ so that $P'_2$ satisfies the following formula: [...] The vertex H can satisfy the formula, and the expansion feature $\sigma_2$ is the polygon F-H-G. Finally, we add the polygon A-F-H-G-D-E to the feature set of tile $T_2$.
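The search for an aligned starting point can be approximated with a small sketch. This is a simplification under stated assumptions, not the paper's procedure: the paper selects an existing vertex of the neighboring piece that minimizes the area of the expansion feature, whereas the sketch below simply snaps the starting point of the clipped piece to the nearest symbol-grid position up and to the left. It assumes screen-style coordinates (y grows downward, so the upper-left corner has the smallest x and y), and `aligned_start` is a hypothetical name.

```python
import math


def aligned_start(feature_start, piece_start, w, l):
    """Snap the starting point of a clipped piece onto the symbol grid
    anchored at the starting point of the whole feature.

    The result is offset from feature_start by k*w in x and n*l in y
    (k, n integers), so criteria (1) and (2) of Definition 6 hold, and it
    never lies to the right of or below piece_start.
    """
    (x, y), (x1, y1) = feature_start, piece_start
    k = math.floor((x1 - x) / w)     # whole symbol widths left of the piece
    n = math.floor((y1 - y) / l)     # whole symbol lengths above the piece
    return (x + k * w, y + n * l)


# Whole feature starts at (0, 0); the clipped piece starts 13 units lower.
print(aligned_start((0, 0), (0, 13), 8, 8))    # (0, 8): one full cell down
print(aligned_start((0, 0), (17, 0), 8, 8))    # (16, 0): two full cells across
```

The region between the snapped point and the original starting point of the piece plays the role of the expansion feature $\sigma$ in Formula (17).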

Experiments
The proposed tiled vector data model is illustrated experimentally by using the data organization methods in Section "Data organization based on the tiled vector data model".

Experiment setting and implementation
The experimental platform of this article is a personal computer (PC) with a 2.3 GHz Intel i5-4670T central processing unit (CPU) with 4 cores and 8.00 GB of random access memory (RAM), running the Microsoft Windows 7 Ultimate x64 operating system. All the experiments were implemented in Desktop as our team is experienced in representing symbols with this software. The tile generation program that implements the proposed tiled vector data model was written using Visual C#.NET and ArcEngine of ArcGIS. ArcEngine was used for accessing spatial data and as a tool for geometric calculation. The map explorer program was implemented using Visual C#.NET and a graphics rendering library (graphics device interface plus (GDI+)). The experimental data were obtained from a 1:10,000 scale electronic map that includes various types of geographical features, such as "main street", "subway", "railway", "building", "canal", "vegetation", "school" and "hospital". As the objective of the experiment was to test the proposed tiled vector data model, none of the features underwent map generalization. All of the features were saved as shapefiles (*.shp). Then, we used the data model proposed in Section "Tiled vector data model for point, linear and area features" and organized the data using the methods outlined in Section "Data organization based on the tiled vector data model". The organization results for each tile can be saved in any format for decoding (we saved the vector tiles as shapefiles in this article). Naming rules were based on the rows and columns of the corresponding tiles. Thus, the name of the tile in row 5, column 4 was "5_4". The advantage of this convention is that it is convenient for querying and allows the target tiles to be found effectively and accurately.
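The naming rule can be sketched in a few lines. The helper names below (`tile_name`, `parse_tile_name`, `tile_names_in_range`) are hypothetical and only illustrate how a "row_column" name supports direct lookup of the tiles that cover a view; the paper's own implementation is written in C# and is not shown.

```python
def tile_name(row, col):
    """Build the tile name from its row and column, e.g. (5, 4) -> '5_4'."""
    return f"{row}_{col}"


def parse_tile_name(name):
    """Recover (row, col) from a name produced by tile_name."""
    row, col = name.split("_")
    return int(row), int(col)


def tile_names_in_range(row_min, row_max, col_min, col_max):
    """Names of all tiles covering an inclusive block of rows and columns."""
    return [tile_name(r, c)
            for r in range(row_min, row_max + 1)
            for c in range(col_min, col_max + 1)]


print(tile_name(5, 4))                    # 5_4
print(parse_tile_name("5_4"))             # (5, 4)
print(tile_names_in_range(5, 6, 4, 5))    # ['5_4', '5_5', '6_4', '6_5']
```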
The map explorer consisted of 16 tiles. The distribution of the tiles on the map explorer is shown in Fig 17. All the functions of the map explorer, such as initializing the symbol library, symbolizing the features and viewing the map, were implemented in this experiment. Importantly, the map explorer had to initialize the symbol library before the tiles were loaded [54]. Then, tiles could be loaded into the map explorer and the map browser could render the map features in each tile as E-maps. [...] Table 1. The experiments assessed different sizes of data, and each experiment was performed 10 times to obtain the average time cost and reduce the randomness of the experiments. From Table 1, we can see that generating vector tiles costs more than generating raster tiles, because generating raster tiles only requires clipping images whereas generating vector tiles requires many geometric computations. However, compared with raster tiles, vector tiles allow users to interact with the system and perform spatial analysis. Generating vector tiles based on the traditional model was more efficient than generating them based on the proposed model. This result is due to the structure of the proposed model, which is more complicated than the traditional model and requires additional geometric computations. Nevertheless, the main focus of this study is to eliminate graphic conflicts and losses, and such elimination is the unique advantage of the proposed data model compared with the traditional data model. Furthermore, the vector tiles can be pre-generated on the server side. Therefore, this loss in tile generation efficiency is acceptable. [...] Table 2 [...] tile types (raster tiles, vector tiles based on the traditional model and vector tiles based on the proposed model). The time of rendering raster tiles can be ignored because raster tiles are images.
Due to the complex structure of the proposed model, rendering vector tiles based on this model required slightly more time than rendering tiles based on the traditional model, but this increase in time cost had no effect on map browsing performance.
As this article focuses on proposing a tiled vector data model for geographical features, optimized methods of generating tiles and rendering tiles are beyond the scope of this paper. The running times in Tables 1 and 2 are only reference values.

Discussion and conclusions
E-maps are widely applied in many fields. Most current applications of E-maps are implemented based on raster tiles or vector tiles that are generated based on the traditional data model. This article summarizes the previous literature and explains why the traditional data model is inappropriate for vector tiles. It then proposes a novel tiled vector data model for geographical features. The proposed data model is superior to traditional methods in eliminating visual discontinuities and conflicts at the tile borders. In addition, we verify that the proposed model is theoretically correct and feasible through formula derivation and several experiments. The conclusions can be summarized as follows:
[...] model can be used as a reference to solve similar problems for E-maps on mobile and web clients.
3. Although the efficiency of generating and rendering tiles is lower for the proposed model than for the traditional model, the efficiency is acceptable considering the proposed model's complicated structure. Moreover, the cost of generating tiles can be offset by pre-generating all of the vector tiles before rendering them. For rendering tiles, the time costs for the proposed model and the traditional model are almost equal, although the proposed model has a unique advantage. The efficiency comparisons reveal that the proposed model solves the conflict problem while sacrificing very little time, which is significant for applications of vector tile technology.
This work has good potential to enlarge the application scope of E-maps but still suffers from several limitations, which could be addressed in future research. First, the proposed model is only valid for maps with static symbols. The design of map symbols is open, and certain symbols that are cognitively effective for communication, such as dynamic symbols, may be realized by special procedures; such symbols are especially useful for E-maps. Most dynamic symbols are dynamic or Flash images, and their symbol parameters (the length or the width of symbols) cannot be obtained directly. In future work, we will create an extension for the library to support dynamic symbols so that we can obtain the symbol parameter easily. Second, the proposed data model is more complicated than the traditional model, and the data size of vector tiles based on the proposed model is slightly larger. A balance between data transmission and cartographical rendering is critical for implementing the model. We will use "Protocolbuffer Binary Format" (PBF) as the encoding format for vector tiles rather than GeoJSON in future work. PBF is a highly compact format, and files encoded in PBF are much smaller than those encoded in GeoJSON. Using this technology will improve the transmission performance. Finally, real-time interactivity and spatial analysis have not been considered in this article. The geographical features are clipped into several vector fragments and stored in different vector tiles. These vector fragments cannot be used directly for interaction and spatial analysis, which are two merits of vector data and depend on reconstructing the vector fragments and storing entire geographical features. The key to reconstruction is to seek out fragments and obtain the vertex series according to the chained [...]