Using multivariate statistical methods to assess the urban smartness on the example of selected European cities

The growing importance of mature smart cities is currently observed worldwide. The vast majority of smart city models focus on hard domains such as communication and technology infrastructure. Scientists emphasize the need to take into account social capital and the knowledge of residents. Smart cities invest in enhanced openness and transparency of data. Mature smart cities provide real-time evidence and information to citizens, businesses and visitors. Smart cities are characterized by bottom-up management and civic governance. The paper aims to assess the urban smartness of selected European cities based on the ISO 37120 standard. Several research methods, including Multidimensional Statistical Analysis (MSA), were applied. Using the statistical analysis of European smart cities that have implemented the ISO 37120 standard, the author tried to fill gaps in knowledge and to evaluate the maturity of smart cities. The results of the research have shown that the smart city concept is a viable strategy which contributes to urban sustainability. The author also found that, whereas urban sustainability frameworks contain a large number of indicators measuring environmental sustainability, smart city frameworks lack environmental indicators while highlighting social and economic aspects.


Introduction
Cities play an extensive role in the sustainable development of the world. Cities constitute centers of innovation, entrepreneurship and creativity. According to the United Nations, 56% of the world's population resides in cities, and forecasts indicate an increase to 69% in 2050. In fact, the world's 750 largest cities generate 57% of global GDP [1]. On the other hand, cities face many challenges, including overpopulation, environmental pollution and social segregation. Cities emit over 70% of global greenhouse gases and consume 80% of the world's energy. The European Union assumes that cities will reduce greenhouse gas emissions by 60% by 2050 [2]. Contemporary urban development is increasingly focused on ICT and sustainability in the so-called smartization process. The author's smart city definition refers to Fernandez-Anez's definition [3]. A smart city is a system that achieves sustainable development and a high quality of life using ICT infrastructure.
[...] and only 15.6% of areas have draft plans (7.2% for Poland). Additionally, planning coverage is characterized by great diversity: for instance, only 16.1% of Lodz is covered by plans, compared with 65.4% of Gdansk. In the cities, 40.9% of planned local plans have been in preparation for more than three years, which indicates a long process of developing planning documents [23,24]. Marsal-Llacuna [25] suggests that smartness means contributing to sustainable development and resilience. Smartness in the smart city is achieved when the three pillars of sustainability (environmental, economic and social) are safeguarded while urban resilience is improved by making use of ICT infrastructure. Smartness in the smart city equals urban smartness, which is a combination of three components: sustainability, urban resilience and ICT infrastructure (Fig 1). The smart city value chain by Dameri is the basis of urban smartness [21].
The smart city value chain yields: (i) sustainability (carbon neutral, clean air and water); (ii) quality of life (safe, diverse, leisure, convenience); (iii) smart growth (knowledge, innovation, employment, investments). Furthermore, Trindade et al. analysed scientific studies focusing on both environmental sustainability and smart city concepts to understand the relationship between the two [26].
In 2014, the International Organization for Standardization published the ISO 37120 standard, which helps to measure and compare urban performance in terms of urban services and quality of life. It is a tool for uniform reporting on the state of a city's development in 17 thematic groups, such as education, energy, environment, finance, fire and emergency response, and governance [27,28]. Moreover, Fox [29] introduces the Global City Indicator Ontology, which addresses the problem of how city indicators and their supporting data are to be published on the Semantic Web. Arroyo-Caňada and Gil-Lafuente [30] suggest that there are significant differences between western and eastern European cities. Western European cities, particularly those in the Nordic countries, are the best positioned to attract creative IT designers. The researchers explored fuzzy subsets composed of 29 factors related to the economy, people, governance, mobility, environment and quality of life. The study covers 71 European cities and uses hierarchical cluster analysis. Similarly, Akande et al. [31] note that Nordic cities and cities in Western Europe achieve better scores than cities in Eastern Europe. Berlin and the Nordic capital cities lead the ranking, while Sofia and Bucharest obtained the lowest rank scores. Furthermore, Maltese et al. [32] investigated the relation between smartness and the energy dimension, covering renewable energy, energy consumption and energy policy. The study covers 103 Italian NUTS3 province capitals and uses cluster analysis. The researchers identified four clusters, labelled competitive cities (e.g. Roma, Milano), specializing cities (e.g. Palermo, Catania), attractive cities (e.g. Bologna, Verona) and liveable cities (e.g. Rimini, Como). Papa et al. [33] examined 13 Italian metropolitan cities between 2006 and 2014 using principal component analysis.
The researchers suggest that northern cities perform better than southern cities in reducing private transport and increasing the share of sustainable modes of transport such as public transportation, cycling and car sharing. In addition, Alonso et al. [34] carried out a mobility and environmental evaluation of 62 Spanish cities. The researchers claim that the best-scoring cities are Valencia, Madrid, Barcelona and Sevilla. Jolliffe and Cadima described some variants of principal component analysis and their applications [35].

Materials and methods
The research procedure consists of several successive stages: (I) data work (selection of urban sustainability indicators and of European cities with the ISO 37120 standard from the WCCD database; computation of basic statistics; standardization of variables); (II) clustering (estimation of the number of principal factors based on the Kaiser criterion; determination of the eigenvalues of the correlation matrix; calculation of the eigenvectors of the correlation matrix; selection of the rotation; identification of the factor loadings after equamax rotation; calculation of factor scores for objects; drawing the configuration of variables in the two-factor space; drawing the configuration of objects in the two-factor space; determination of the number of clusters from the agglomeration chart; grouping of cities by cluster analysis using Ward agglomeration with the Euclidean distance, read from the tree diagram; characterization of each cluster based on the k-means analysis, read from the graph of variable means); (III) analysis of results and formulation of recommendations. The most important stages of the research procedure are visualized in Fig 2. The selection of urban smartness indicators is a major challenge because the issue is approached in so many different ways in the scientific literature, international strategic documents and reports of various organizations. Indicators from the ISO 37120 standard were used in this study. The choice of indicators (structure or intensity) was motivated by suggestions from Transforming our World: The 2030 Agenda for Sustainable Development [36]. The availability of statistical data at the city level was the second criterion. The empirical material for this study was based on currently available statistical data listed by the World Council on City Data between 2014 and 2017. There are 100 indicators of urban service and quality of life (46 basic and 54 additional) for 54 cities.
The objects of the study were selected from the list of cities with the ISO 37120 standard (www.open.dataforcities.org). The following analysis includes only European cities. Table 2 presents a general overview of the analyzed cities.
The selection of analyzed cities was carried out using three criteria: (i) spatial coverage concerns Europe; (ii) possession of an ISO 37120 certificate at the platinum level; (iii) all mandatory indicators identified. Fig 3 shows the location of the analyzed cities.
The selection of diagnostic variables to assess sustainable urban development included several stages. First, it was checked whether the variables fulfill the formal criteria of measurability, completeness and comparability. As regards statistical premises, variables with a coefficient of variation below 10% were eliminated. A further reduction addressed excessive correlation, assessed with the matrix of Pearson correlation coefficients. This allowed the identification of diagnostic features that were excessively correlated and should therefore be removed from further research. Finally, the following indicators were selected for analysis: the share of the city's unemployment, the ratio of primary education students to teachers, the concentration of fine particulate matter (PM2.5), the number of firefighters per 100,000 population, the total collected municipal solid waste per capita, and the green area per 100,000 population.
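The two statistical filters described above can be illustrated with a minimal Python sketch. It is not the author's STATISTICA workflow: the function names, the toy data and, in particular, the correlation cut-off of 0.9 are assumptions made for illustration only, since the text gives only the 10% threshold for the coefficient of variation.

```python
import math

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return 100.0 * math.sqrt(variance) / abs(mean)

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def select_variables(data, min_cv=10.0, max_abs_r=0.9):
    """Keep variables with CV >= min_cv that are not excessively correlated
    with an already selected variable. max_abs_r is an assumed cut-off."""
    varied = [name for name, vals in data.items()
              if coefficient_of_variation(vals) >= min_cv]
    selected = []
    for name in varied:
        if all(abs(pearson(data[name], data[other])) <= max_abs_r
               for other in selected):
            selected.append(name)
    return selected

# Hypothetical toy indicators (not WCCD data).
toy = {
    "a": [10.0, 10.2, 9.9, 10.1],  # nearly constant, CV below 10%
    "b": [1.0, 2.0, 3.0, 4.0],
    "c": [2.1, 3.9, 6.2, 8.0],     # almost proportional to "b"
    "d": [5.0, 1.0, 4.0, 2.0],
}
selected = select_variables(toy)
print(selected)
```

In this toy example the nearly constant variable and the variable that duplicates another one are dropped, which mirrors the two-step reduction described in the paragraph.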
The evaluation of sustainable development is very complex due to the wide range of factors involved. Assessment of urban sustainability requires the determination of a set of indicators characterizing key aspects in three dimensions, economic (X1), social (X2, X4) and environmental (X3, X5, X6), as well as an indication of their importance for sustainable development. The research was carried out using STATISTICA version 13.1 and EXCEL. Table 3 presents the list of analyzed cities with their sustainability characteristics.
Factor analysis was developed by C. Spearman in 1904 [37]. It is a popular multivariate method used for data reduction purposes. The basic idea is to represent a set of variables by a smaller number of factors. The variables used in factor analysis should be linearly related to each other. This can be checked by looking at scatterplots of pairs of variables.
Thus, the factor analysis model can be expressed using the following formula:

$$X_i = a_{i1} F_1 + a_{i2} F_2 + \dots + a_{im} F_m + e_i$$

where: $X_1, X_2, \dots, X_p$ are the p variables, each variable i being written as a linear combination of m factors $F_1, F_2, \dots, F_m$, with $m < p$; $a_{i1}, a_{i2}, \dots, a_{im}$ are the factor loadings for variable i; $e_i$ is the part of variable $X_i$ that cannot be explained by the factors.

Cluster analysis, derived from classification modeling, was developed by R.C. Tryon in 1939 [38]. It is a popular method of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. Cluster analysis belongs to data mining and machine learning. Grouping is strictly conditioned by the data source and the expected form of the results. Cluster analysis algorithms are divided into two basic categories: hierarchical and non-hierarchical methods. Agglomerative procedures create a similarity matrix of the classified objects and then, in successive steps, combine the most similar objects into clusters. K-means methods consist in pre-dividing the set into a predetermined number of classes. The most popular distance is the Euclidean metric, which can be calculated using the following formula:

$$d(p, q) = d(q, p) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$

where: $p = (p_1, p_2, \dots, p_n)$ and $q = (q_1, q_2, \dots, q_n)$ are two points in Euclidean n-space, and d is the distance from point p to point q.
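As a small illustration of the Euclidean metric described above, the following Python sketch computes the distance between two points and confirms its symmetry. The function name and the sample coordinates are hypothetical and do not come from the study's data.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points of the same dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

# Two hypothetical cities described by three standardized indicators.
city_a = (0.0, 0.0, 0.0)
city_b = (1.0, 2.0, 2.0)

d_ab = euclidean(city_a, city_b)
d_ba = euclidean(city_b, city_a)
print(d_ab)  # 3.0, since sqrt(1 + 4 + 4) = 3
```

Because the squared differences do not depend on the order of the points, d(p, q) and d(q, p) coincide, as the formula states.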

Results
The research began with computing the basic statistics for the urban indicators: a measure of position (arithmetic mean) and measures of variability and shape (standard deviation, coefficient of variation, skewness, kurtosis). The most diverse indicator is the green area, while the least diverse is the ratio of primary education students to teachers. Table 4 presents general statistics for each indicator. Afterwards, the indicators were standardized.
The next stage was to determine the eigenvalues of the correlation matrix (Table 5). The eigenvalue of a given factor measures the variance in all the variables which is accounted for by that factor. The ratio of eigenvalues is the ratio of the explanatory importance of the factors with respect to the variables. If a factor has a low eigenvalue, it contributes little to the explanation of the variance in the variables and may be ignored as redundant compared with more important factors. The eigenvalue thus reflects the significance of a factor in explaining the information in the input variables (its percentage share in the variability of the data set). The number of factors was determined by retaining the factors with eigenvalues greater than 1 (the Kaiser criterion). The decision on the number of factors can also be made on the basis of the scree criterion. The higher the correlation coefficient of a variable with a factor, the higher the significance of that variable for the given factor.
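The standardization step mentioned above is not spelled out in the text. A common choice, assumed here for illustration, is the z-score, in which the mean is subtracted from each value and the result is divided by the sample standard deviation. The sketch below uses hypothetical values and a hypothetical function name.

```python
import math

def standardize(values):
    """Z-score standardization: (value - mean) / sample standard deviation.
    Assumes the z-score is the standardization intended in the text."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

# Hypothetical indicator values for three cities.
z = standardize([2.0, 4.0, 6.0])
print(z)  # [-1.0, 0.0, 1.0]
```

After this transformation each indicator has a mean of 0 and a standard deviation of 1, so indicators measured in different units (percent, hectares, tonnes) become comparable.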
There are several ways to conduct factor analysis, for instance unweighted least squares, generalized least squares and maximum likelihood. The interpretability of factors is improved through rotation. There are many different types of rotation, but they all try to make each factor highly responsive to a small subset of items. Rotation works by changing the absolute values of the variables whilst keeping their differential values constant. There are two major categories of rotation, orthogonal and oblique. The next step of the investigation was to determine the values of the factor loadings after equamax rotation (Table 6). Each of the measures is linearly related to each of the factors. The strength of this relationship is contained in the respective factor loading produced by the rotation. This loading is interpreted as a standardized regression coefficient obtained by regressing the measure on the factors.
Subsequently, the projection of each observation on each of the factors was calculated. The factor scores give the location of each observation in the space of the common factors. Table 7 presents the factor scores for objects.
Note: italic font denotes a minimum value and bold font a maximum value. (X1) share of the city's unemployment; (X2) ratio of primary education students to teachers; (X3) concentration of fine particulate matter (PM2.5); (X4) number of firefighters per 100,000 population; (X5) total collected municipal solid waste per capita; (X6) green area per 100,000 population.
Note: author's elaboration based on WCCD ISO 37120 data.
The next step of the study involved the identification of outliers based on the configuration of objects in the two-factor space. Fig 5 presents the location of the cities in the two-factor space.
The next step involved cluster analysis. Grouping was carried out using agglomeration and k-means methods. In the agglomeration analysis, the Ward method was selected, with the Euclidean distance used to compare cities. The agglomeration graph presents the binding (linkage) distance against the successive binding steps. The clusters of objects were identified in the dendrogram (icicle chart) (Fig 6). The groups of objects were characterized by k-means cluster analysis. The graph of the variables' average values in the individual clusters contains information about the best and the worst cluster of cities (Fig 7). Table 8 presents the assessment of the level of sustainable and smart development based on the average values of the indicators for the individual city clusters.
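The agglomeration step described above can be sketched in a few lines of Python. This is a simplified illustration of Ward's minimum-variance criterion with the squared Euclidean distance between cluster centroids, not a reproduction of the STATISTICA procedure. The function name and the toy factor scores are hypothetical.

```python
def ward_clusters(points, k):
    """Greedy agglomeration until k clusters remain, using Ward's criterion:
    at each step merge the pair of clusters whose fusion gives the smallest
    increase in the total within-cluster sum of squares."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ia, ca), (ib, cb) = clusters[a], clusters[b]
                na, nb = len(ia), len(ib)
                sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * sq_dist
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ia, ca), (ib, cb) = clusters[a], clusters[b]
        na, nb = len(ia), len(ib)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        merged = (ia + ib, centroid)
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append(merged)
    return sorted(sorted(members) for members, _ in clusters)

# Hypothetical factor scores of five cities in a two-factor space.
toy_scores = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (10.0, 0.0)]
groups = ward_clusters(toy_scores, 3)
print(groups)  # [[0, 1], [2, 3], [4]]
```

The isolated fifth point ends up in a cluster of its own, which is analogous to the way an outlying city can form a single-element cluster in a dendrogram.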

Discussion
The considerations presented in the manuscript show that the smart city concept is implemented through sustainability in its economic, social and environmental aspects. The application of factor analysis revealed relationships between the indicators characterizing sustainability in the selected European cities. The evaluation of the implementation of sustainability using cluster methods made it possible to identify similar cities.
In the case analyzed, the impact of the input indicators on the sustainability of cities was described through the first two factors (Table 5). Together, the two factors account for 77.13% of the variability of the input variables. The first factor carries 42.57% of the information contained in the input variables. The second factor explains 34.56% of the variability of the input data. The first factor is positively correlated (Table 6) with green area (X6) and negatively correlated with total collected municipal solid waste (X5) and the city's unemployment (X1). These correlations indicate that high values of the variables X5 and X1 are associated with a low value of the number of firefighters (X4) and, accordingly, that as the latter increases the former decrease. The second factor is positively correlated with the ratio of primary education students to teachers (X2) and with green area (X6), and negatively correlated with the number of firefighters (X4).
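The arithmetic behind the reported shares can be checked with a short Python sketch. It assumes that the percentages refer to eigenvalues of the correlation matrix of the six standardized indicators, in which case the eigenvalues sum to 6 and each eigenvalue equals its variance share multiplied by 6. The implied eigenvalues below are derived under that assumption and are not taken from Table 5.

```python
# Variance shares of the two retained factors, as reported in the text (%).
shares = [42.57, 34.56]

# Assumption: six standardized indicators, so the eigenvalues sum to 6.
n_variables = 6

cumulative = round(sum(shares), 2)
implied_eigenvalues = [round(s / 100 * n_variables, 4) for s in shares]

# Kaiser criterion: retain factors whose eigenvalue exceeds 1.
retained = [ev for ev in implied_eigenvalues if ev > 1.0]

print(cumulative)           # 77.13
print(implied_eigenvalues)  # [2.5542, 2.0736]
print(len(retained))        # 2
```

Under this assumption both implied eigenvalues are well above 1, which is consistent with retaining two factors by the Kaiser criterion.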
In the analyzed example, most of the information contained in the input variables is carried by the factors (Fig 4). A strong correlation (two variables lying next to each other) occurs between the city's unemployment (X1) and the collected municipal solid waste (X5). There is no correlation between the fine particulate matter concentration (X3) and green area (X6), or between the collected municipal solid waste (X5) and the ratio of primary education students to teachers (X2). A negative correlation occurs between the number of firefighters (X4) and the ratio of primary education students to teachers (X2).
The graph of the objects' configuration in the two-factor space shows the position of the cities (Fig 5). Aalter (cluster 3) is an outlying city because of its high level of socio-economic development. Koprivnica (cluster 1) and Porto (cluster 4) are distinct from the other cities. The results of grouping the cities by cluster analysis (Fig 6) and by factor analysis (Fig 5) are identical.
On the basis of the agglomeration graph, the level identifying the number of clusters, at the 13th step, corresponds to a binding distance of 6. Four clusters of cities were identified on the dendrogram (Fig 6). The urban indicators were characterized using the k-means analysis. The graph of the variables' average values in the particular clusters presents information about the best and the worst group of cities. The fourth cluster is the weakest set of cities, whereas the third cluster is the best.
The clusters' characteristics were prepared on the basis of the k-means analysis (Fig 7). Cluster 2 is the most numerous group. It consists of eight cities (Amsterdam, Eindhoven, Heerlen, London, Rotterdam, Sintra, The Hague, Zwolle). The green area is relatively high (625.05 hectares/100,000), and the average values of the remaining indicators are at a medium level.
Cluster 3 is the least numerous group, containing only the isolated Aalter. Its green area is the highest (4,465.8 hectares/100,000), whereas unemployment (3.3%), the number of firefighters (9.9 units/100,000) and the amount of collected municipal solid waste (0.13 t/capita) are the lowest.
Cluster 4 consists of three cities (Barcelona, Porto, Valencia). The value of unemployment (18.8%), the ratio of primary education students to teachers (19.1)  100,000) are the highest. The ratio of primary education students to teachers (12.5) is the lowest. Table 9 shows the assessment of urban smartness.
In the scientific literature, factor analysis is rarely used to study the diversity of sustainability indicators in cities. Salvati [39] applied this method to identify factors shaping land consumption in 155 European cities. The Northern European and United Kingdom cities had the lowest level of land consumption. Furthermore, Yan et al. [40] assessed the performance of urban sustainability in Chinese cities based on natural resource input (water, energy, land) and human welfare (safety, health, basic material for a good life, freedom of choice, freedom of action) using data envelopment analysis. An interesting investigation was conducted by Gonzalez-Garcia [41]. Spanish cities were evaluated on the basis of the ratio of people at risk of poverty and social exclusion, the unemployment rate, the crime rate, educational places, education level and net disposable income, as well as an environmental endpoint.

Conclusions
This paper indicates the possibility of using a specific methodology to study urban smartness. The smart city is rapidly becoming a key success factor for the contemporary urban world. The importance of the smart city concept has increased along with the development of the globalization process. Given these facts, the aim of the paper was to assess the urban smartness of selected European cities. The presentation of unique quantitative research related to this topic can be regarded as evidence of the originality of the manuscript. The analysis of data for selected European cities brought diversified results, which made it possible to answer the research questions. This study contributes to the knowledge base in several ways. Firstly, although this research adopts a single-continent approach, analyzing smart cities in Europe gives the opportunity to compare the results with other continents. Secondly, given the growing role of the smart city concept, it is expected that many decision makers will have to take this trend into account if they wish to help achieve sustainability in urban development. The results of this study can offer guidance for city managers willing to obtain benefits from the implementation of the smart city concept. This study had several limitations, the most important of which was the analysis of only a few variables and selected cities.
In this study, cluster analysis and factor analysis were used to identify significant variables related to the sustainability of selected European cities that have implemented the ISO 37120 standard. Factor analysis allows the input set of correlated features to be replaced with a small number of uncorrelated factors which are linear combinations of the variables. The two extracted factors explain nearly 77% of the variability of the input data. The first factor accounts for 43% of the variability of the input data. The second factor explains almost 35% of the variability of the data. It can be concluded that factor analysis is useful in reducing the dimensionality of the variables describing the problem under consideration. The first factor mainly reflects green area (X6), while the second factor mainly reflects the ratio of primary education students to teachers (X2). A strong correlation occurs between the city's unemployment (X1) and the collected municipal solid waste (X5). Aalter, Koprivnica and Porto are distinct from the other cities. The cluster analysis made it possible to identify and characterize four groups of similar cities. The fourth cluster (Barcelona, Porto, Valencia) is the weakest set of cities, whereas the third cluster (only Aalter) is the best. The results of grouping the cities by cluster analysis and by factor analysis are identical. The research shows that the analysed cities differ greatly in their level of sustainability.
This ranking is intended to attract attention and induce competition between cities. City managers can see objectively the extent to which their cities are perceived as smart and sustainable, and can also identify the areas in which to improve their sustainability. The proposed research procedure can be used to assess the sustainability level of other European and non-European cities.
Aalter (cluster 3) has the highest level of urban smartness because of the environmental and economic pillars of sustainability. In cluster 1, the main pillar of sustainability is the social one. Cluster 4 shows low urban smartness because of the environmental and economic pillars of sustainability.
The paper fills a gap in the literature by reviewing issues related to urban smartness on the basis of extensive literature. Another advantage of the article is the application of cluster and factor analysis methods to assess city indicators in the area of urban smartness. However, the research carried out in this manuscript does not fully cover this extensive research topic. Further research could be conducted through direct interviews with city managers and stakeholders in order to understand their attitude towards the development of smart city projects. Interesting research on the future of urban smartness should include the following issues: (i) operationalization of urban smartness measurement; (ii) determinants of the level of urban smartness depending on the size and type of urban units; (iii) conceptualization of the urban smartness model; (iv) using e-Planning tools to build a smart city strategy.