
Automated mapping of land cover in Google Earth Engine platform using multispectral Sentinel-2 and MODIS image products

  • Xia Pan ,

    Contributed equally to this work with: Xia Pan, Zhenyi Wang

    Roles Conceptualization, Methodology, Writing – original draft

    Affiliation College of Resources and Environmental Economics, Inner Mongolia Industrial Development Research Base, Inner Mongolia University of Finance and Economics, Hohhot, Inner Mongolia, China

  • Zhenyi Wang ,

    Contributed equally to this work with: Xia Pan, Zhenyi Wang

    Roles Software, Validation

    Affiliation College of Statistics and Mathematics, Inner Mongolia University of Finance and Economics, Hohhot, Inner Mongolia, China

  • Gary Feng,

    Roles Supervision, Writing – review & editing

    Affiliation Crop Science Research Laboratory, USDA Agricultural Research Service, Starkville, Mississippi State, United States of America

  • Shan Wang,

    Roles Data curation, Formal analysis, Writing – original draft, Writing – review & editing

    Affiliation Department of Discipline Construction, Inner Mongolia University of Finance and Economics, Hohhot, Inner Mongolia, China

  • Sathishkumar Samiappan

    Roles Supervision, Writing – review & editing

    sathish@gri.msstate.edu

    Affiliation Geosystems Research Institute, Mississippi State University, Starkville, Mississippi State, United States of America

Abstract

Land cover mapping often relies on supervised classification, which can suffer from insufficient sample size and sample confusion. This study assessed the accuracy of a fast and reliable method for automatically labeling and collecting training samples. Using self-programmed scripts on the Google Earth Engine (GEE) cloud-based platform, a large and reliable training dataset of multispectral Sentinel-2 imagery was extracted automatically across the study area from the existing MODIS land cover product. To ensure high-quality training class labels, homogeneous 20 m Sentinel-2 pixels within each 500 m MODIS pixel were selected, and the minority of heterogeneous 20 m pixels were removed based on calculations of the spectral centroid and Euclidean distance. Further, quality control and a spatial filter were applied to all land cover classes to generate a reliable and representative training dataset, which was subsequently used to train the Classification and Regression Tree (CART), Random Forest (RF), and Support Vector Machine (SVM) classifiers. The results show that the main land cover types in the study area, as distinguished by the three classifiers, were Evergreen Broadleaf Forests, Mixed Forests, Woody Savannas, and Croplands. In the training and validation samples, the number of correctly classified pixels under the CART classifier, which is less computationally intensive, was greater than under the RF and SVM classifiers. Moreover, the user’s and producer’s accuracies, overall accuracy, and kappa coefficient of the CART classifier were the best, indicating that the CART classifier is more suitable for this automatic workflow for land cover mapping. The proposed method can automatically generate a large number of reliable and accurate training samples in a timely manner, which is promising for future land cover mapping over large-scale regions.

1. Introduction

Over the past few decades, satellite remote sensing has undergone dramatic changes in revisit cycle, data quality, area coverage, and spatial resolution [1]. Extensive studies have demonstrated that remote sensing imagery is well suited to monitoring land cover information over large areas [2], and satellite remote sensing has been widely used as an effective and efficient means to monitor land cover patterns at large geographic extents. Many researchers have used LiDAR-derived information, including digital surface models (DSM), digital elevation models (DEM), point density, and spatial statistics calculated from LiDAR data, to distinguish land cover types. More recently, the Sentinel-2 satellites were launched in 2015 and 2017 with even finer spatial resolution and shorter revisit times. The growth of data at fine spatial and temporal resolutions increases the potential benefit of algorithms that incorporate evidence from large numbers of satellite images into useful maps for monitoring landscape changes.

Classification is one of the most vital phases for extracting land cover information, and classification algorithms learned from training samples should generalize to the whole image [3]. Machine learning classification algorithms fall into two categories: unsupervised and supervised [4,5]. The former groups the pixels of an image into classes by analyzing the similarity of their attributes, without the analyst’s contribution [6]. However, unsupervised classification requires an experienced operator’s manual input to improve the accuracy of the result [5,7]. By contrast, supervised classification is generally considered superior to unsupervised classification in terms of operating procedure and result accuracy [8,9].

Supervised classification requires sufficient, representative samples, which are commonly selected and labeled by visual inspection or field survey [10,11]. The process of collecting representative samples is extremely time-consuming and labor-intensive. Although some researchers have tried to label only the most uncertain samples by optimizing machine learning methods in semi-supervised classification to streamline sample selection, the accuracy of the classification results is low and still needs to be improved by manually selecting and labeling samples [12–15]. To the best of our knowledge, index-based calculation is one of the most commonly used methods for automatically extracting samples; however, the classification results are typically coarse, and far fewer land cover types are identified than exist in nature [16]. Therefore, developing an automatic sample-extraction method and establishing a large-scale, detailed sample dataset has great practical significance. In land cover information extraction, traditional algorithms face several problems despite the improved resolutions and variety of remote sensing satellites, such as the inability to be applied to multispectral and hyperspectral satellite imagery, weak generalization ability, and the difficulty of automating the construction of a training database. To solve these problems, machine learning algorithms need further improvement. Farahmand et al. [17] evaluated the capability of various nonlinear regression models based on optical Sentinel-2 remote sensing images to estimate soil salinity; their results confirmed that nonlinear regression models are superior to linear regression models for soil salinity estimation. Taghizadeh-Mehrjardi et al. used machine learning algorithms to predict soil particle size fractions and found that ant colony optimization (ACO) had higher accuracy [18].
Xu et al. (2021) used Random Forest and ResNet classifiers to map eight metropolitan areas, comparing training areas drawn by different or consistent interpreters and data-splitting strategies that allow or reduce spatial autocorrelation. They found large discrepancies among results built from crowdsourced training areas digitized by different experts; improving the consistency of labels can lead to substantial improvements in local climate zone classification accuracy [19]. Although there has been increasing attention toward land cover classification, most research has focused on investigating classifiers and methods, whereas limited attention has been paid to the preparation of training data. Training areas are polygons manually digitized by experts from very-high-resolution imagery to represent land cover classes, which are then used as reference geometries for sampling data from satellite imagery for model training and evaluation [19]. The process of identifying training areas is time-consuming and requires experts with knowledge specific to the land cover scheme and cities of interest. High-quality training areas are the basis for generating training data for land cover mapping. To reduce the manual effort of training area collection, this study draws training labels from the MCD12Q1 V6 product, which provides global land cover types at yearly intervals (2001-present) and is available in GEE.

The GEE cloud-based platform is an emergent technology that offers automatic parallel processing, fast computation, and effective self-programmed analysis of remote sensing images, helping scientists develop algorithms with less effort than before [20–23]. For instance, Gorelick et al. [20] showed that GEE is a planetary-scale cloud-sharing platform that provides Google’s massive computational capabilities for a variety of high-impact societal issues, including land cover classification, water management, drought monitoring, food security, and environmental protection. Furthermore, according to Hansen et al. [24], it took only 100 hours to process 654,178 Landsat-7 images (about 707 terabytes) and produce a global forest distribution map on the GEE platform. This was reported as a valuable achievement because the process would have taken a million hours to complete without GEE [25–27].

To accomplish automatic classification of land cover types, the present study used samples of multispectral Sentinel-2 images that were extracted and collected automatically from the Annual International Geosphere-Biosphere Programme (IGBP) classification scheme of the MODIS Version 6 Land Cover Type 1 Product (MCD12Q1 V6 LC_Type 1). Through this scheme, a large-scale detailed sample dataset was established, and 70% of the samples selected from MCD12Q1 were used to train three state-of-the-art supervised classification algorithms, RF, CART, and SVM, which are the internal machine learning algorithms of GEE. Additionally, spectral indices from multispectral Sentinel-2 Level-2A data that respond to different land cover types (NDVI, NDWI, and NDSI) were used as classification features. All data used in this study were downloaded and processed through the GEE JavaScript API (Application Programming Interface). This efficient, open-access image analysis workflow provides a fast and reliable method to remotely map land cover types in the study area. The following are the unique contributions of this research:

  1. A sample dataset with detailed classification categories of multispectral Sentinel-2 imagery is set up by automatically labeling samples from the IGBP classification scheme of MCD12Q1 V6 LC_Type 1.
  2. The differences in land cover types in the automatically labeled Sentinel-2 database under the three most commonly used supervised classification algorithms are presented, and the classifier most suitable for the above automatic workflow is proposed.
  3. The suitability of the GEE cloud platform for the proposed method is examined; multispectral Sentinel-2 and MODIS images can be easily and quickly processed on this platform without much attention to low-level preprocessing work.

2. Study area and datasets

2.1 Study area

The study area, in Guangdong province (109°45′-117°20′E, 20°09′-25°31′N), includes the cities of Shaoguan, Heyuan, Huizhou, Shenzhen, Dongguan, Guangzhou, Qingcheng, Zhaoqing, Foshan, and Zhongshan and covers about 96,205.45 km2 (Fig 1). The altitude ranges from 26 m to 1636 m above sea level. The average annual rainfall is 1300 mm to 2500 mm, and the average air temperature is 22.3 °C. The main land cover types in Guangdong Province include forest, cropland, urban and built-up areas, and water bodies. The chief reasons for choosing this study area are that the acquired Sentinel-2 imagery has little cloud coverage and that the vegetation types are complex and mixed. Consequently, the varied and visible land cover makes this area a good location for testing the ability of detailed, automated classification to sharpen land cover types that may be misidentified in a coarse classification.

Fig 1. The study area with Guangdong province.

https://doi.org/10.1371/journal.pone.0312585.g001

2.2 MODIS land cover product

The MCD12Q1 V6 product provides global land cover types at yearly intervals (2001-present) derived from different classification schemes and is available in GEE. Supervised classification of MODIS Terra and Aqua reflectance data was used, with additional post-processing that incorporates prior knowledge and ancillary information to further refine specific land cover classes. LC_Type1 (17 classes) of MCD12Q1, based on the International Geosphere-Biosphere Programme (IGBP) scheme, was used to provide training class labels for the Sentinel-based land cover classification. Detailed information is available at “https://lpdaac.usgs.gov/products/mcd12q1v006/”. The values, class names, abbreviations, and descriptions are summarized in Table 1. The clipped image and programming in GEE are shown in Fig 2 (right).

Table 1. Value, class names, abbreviations, and descriptions of the IGBP classification scheme of MCD12Q1 V6 LC_Type 1.

https://doi.org/10.1371/journal.pone.0312585.t001

Fig 2. Sentinel-2 and MCD12Q1 for GEE cloud platform.

https://doi.org/10.1371/journal.pone.0312585.g002

2.3 Sentinel-2 multispectral product

The Sentinel-2 Level-2A product (from 1 January 2019 to 31 December 2019), part of a wide-swath, high-resolution imaging mission supporting Copernicus Land Monitoring studies, was obtained from the GEE cloud-based platform. The Sentinel-2 mission comprises two satellites, Sentinel-2A and Sentinel-2B. Each carries a Multispectral Instrument (MSI) at a flight altitude of 786 km with a ground swath width of 290 km, incorporating 13 spectral bands: visible and near-infrared (NIR) at 10 m spatial resolution; red edge and short-wave infrared (SWIR) at 20 m spatial resolution; and three atmospheric bands at 60 m spatial resolution (Table 2). The quality band at 60 m spatial resolution (QA60) was used to mask out clouds. The mosaic image and programming in GEE are shown in Fig 2 (left). Please see the details in “https://sentinel.esa.int/documents/247904/685211/Sentinel-2_User_Handbook”.

Table 2. Details of the Sentinel-2 Multispectral Instrument.

https://doi.org/10.1371/journal.pone.0312585.t002

Scene classification and atmospheric correction were applied to produce the Sentinel-2 Level-2A product. Atmospheric correction applied to Top-of-Atmosphere (TOA) Level-1C orthoimage products removes the effects of the atmosphere on the TOA reflectance values of the original remote sensing images; this process was accomplished using a set of look-up tables generated by libRadtran, and the resulting reflectance is a unitless ratio between the radiation reflected and the incident solar radiation on a given surface. Further, the scene classification algorithm generates a classification map consisting of four different classes for clouds (including cirrus), together with six classes for shadows, cloud shadows, vegetation, soils/deserts, water, and snow. The main output of the Sentinel-2 Level-2A product is an orthoimage Bottom-of-Atmosphere (BOA) corrected reflectance product. Granules with more than 20% cloud cover in the Sentinel-2 Level-2A product were filtered out to retain less cloudy scenes. Finally, Sentinel-2 Level-2A images with minimal cloud/haze from bands 1-8, 11, and 12, at 10, 20, and 60 m spatial resolutions, were used. The Sentinel-2 Level-2A output images used in the present study were resampled to 20 m to generate an equal spatial resolution for all spectral bands.
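As an illustration of the cloud-masking step, the sketch below applies a QA60-style bitmask in Python with NumPy, assuming the standard Sentinel-2 convention that bit 10 flags opaque clouds and bit 11 flags cirrus; the pixel values are invented for demonstration, not taken from the study.

```python
import numpy as np

def mask_clouds_qa60(reflectance, qa60):
    """Set pixels flagged as opaque cloud (bit 10) or cirrus (bit 11) to NaN."""
    cloud = (qa60 & (1 << 10)) != 0    # opaque clouds
    cirrus = (qa60 & (1 << 11)) != 0   # cirrus clouds
    clear = ~(cloud | cirrus)
    return np.where(clear, reflectance, np.nan)

# Illustrative QA60 flags and surface reflectance values
qa = np.array([0, 1 << 10, 1 << 11, 0], dtype=np.uint16)
refl = np.array([0.12, 0.55, 0.40, 0.21])
masked = mask_clouds_qa60(refl, qa)
print(masked)  # cloudy pixels become nan
```

In GEE itself this corresponds to masking with `updateMask` on the QA60 band; the NumPy version is only meant to make the bit logic explicit.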

3. Method

All the processes were performed on the GEE cloud platform. First, a large and reliable training dataset of multispectral Sentinel-2 imagery was extracted systematically across the study area from the 17 land cover classes of the IGBP classification scheme of the MCD12Q1 V6 LC_Type 1 product (Section 3.1). Then, quality control and a spatial filter were applied to all land cover classes to generate a reliable and representative training dataset (Section 3.2). Further, spectral reflectance indices were added to the feature inputs used for land cover classification and mapping (Section 3.3); the reliable training datasets were applied to train the SVM, CART, and RF classifiers, and a confidence map was produced (Section 3.4). Finally, classification accuracy based on the confusion matrix was evaluated using new independent stratified validation samples (Section 3.5).

3.1 Automatic collection of training samples from MCD12Q1 land cover product

Following the earlier approach proposed by Zhang and Roy [28], the MCD12Q1 V6 and Sentinel-2 Level-2A images were resampled to the same 20 m spatial resolution and transformed to the same map projection, Geographic Lat/Lon (EPSG: 4326). Then, the multispectral Sentinel-2 images were classified to generate a new land cover map based on the MODIS IGBP land cover classification scheme. In this process, to reduce the spectral variability of the 20 m pixels within the 500 m pixels caused by heterogeneous pixels, criteria were designed following Xie et al. [29,30], whereby all the 500 m MODIS pixels were homogeneous and all the 20 m Sentinel-2 pixels within each 500 m MODIS pixel were homogeneous. These criteria helped select only high-quality training class labels in which there was high confidence. The formulas used are described below:

ρc = median(ρ1, ρ2, …, ρn)    (1)

Δi = ‖ρi − ρc‖    (2)

where ρi is a vector of the reflectance values of sample i in the Sentinel-2 Level-2A image; ρc is the spectral centroid, taken as the per-band median of the sample set in the spectral dimension; and Δi is the Euclidean distance from sample i to the spectral centroid of the samples.

Then, we sorted the calculated Euclidean distances from small to large and retained the top 50% of the samples. Nearly 15% of the heterogeneous 20 m pixels, which were inconsistent with the 500 m MCD12Q1 pixels, were removed by this refinement process. Finally, a large and refined training dataset of multispectral Sentinel-2 imagery was extracted systematically and automatically across the study area from the 17 land cover classes of the IGBP classification scheme of MCD12Q1 V6 LC_Type 1. The spatial distribution of samples and the programming in GEE are shown in Fig 3. A flowchart of the main steps of the proposed method is shown in Fig 4.
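The centroid-and-distance refinement described above can be sketched in Python with NumPy. The two-band spectra below are invented for illustration: the per-band median gives the spectral centroid (Eq. 1), the Euclidean distance of each sample to it follows Eq. 2, and the closest 50% of samples are retained.

```python
import numpy as np

def refine_samples(spectra, keep_frac=0.5):
    centroid = np.median(spectra, axis=0)              # spectral centroid (Eq. 1)
    dist = np.linalg.norm(spectra - centroid, axis=1)  # Euclidean distance (Eq. 2)
    n_keep = int(len(spectra) * keep_frac)
    order = np.argsort(dist)                           # sort small to large
    return spectra[order[:n_keep]]                     # keep closest samples

# Four illustrative two-band samples; the third is a heterogeneous outlier
spectra = np.array([[0.10, 0.20], [0.11, 0.21], [0.90, 0.80], [0.12, 0.19]])
kept = refine_samples(spectra)
print(kept)  # the outlier [0.90, 0.80] is removed
```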

Fig 3. Automatic collection of training samples from MCD12Q1 (The red and green dots are training and verification samples respectively).

https://doi.org/10.1371/journal.pone.0312585.g003

Fig 4. The main steps of the proposed method used in this study.

https://doi.org/10.1371/journal.pone.0312585.g004

MCD12Q1 products were used to automatically select accurate training samples; the selected samples were then used to extract Sentinel-2 spectral features, which were used to train the classifiers. This process aimed to automatically generate land cover maps at 20 m using Sentinel data. The preliminary filtering criteria applied to the MCD12Q1 products are described below. These criteria were designed to help select only high-quality training class labels in which confidence was high:

  (i) the MCD12Q1 pixels had the same value in the 8 surrounding pixels;
  (ii) the 500 m MODIS pixels were homogeneous;
  (iii) the 20 m Sentinel pixels within the 500 m MODIS pixel were homogeneous.

Rule (i) helped to reduce 500 m pixel edge effects in situations where there were possible changes in the underlying land cover. Rule (ii) was introduced because the homogeneous pixels had a higher classification accuracy. Rule (iii) helped to reduce the spectral variation within the 20 m pixels caused by heterogeneous pixels.
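Rule (i) amounts to a 3 x 3 same-value test on the MCD12Q1 class grid. The sketch below implements that test in Python with NumPy on an invented 4 x 4 grid of class codes; only interior pixels whose eight neighbours share their class survive.

```python
import numpy as np

def homogeneous_mask(lc):
    """True where a pixel and its 8 neighbours carry the same class (rule i)."""
    h, w = lc.shape
    mask = np.zeros((h, w), dtype=bool)
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = lc[r - 1:r + 2, c - 1:c + 2]   # pixel plus 8 neighbours
            mask[r, c] = bool(np.all(window == lc[r, c]))
    return mask

# Illustrative class grid: class 1 region bordering a class 2 region
lc = np.array([[1, 1, 1, 2],
               [1, 1, 1, 2],
               [1, 1, 1, 2],
               [1, 1, 2, 2]])
mask = homogeneous_mask(lc)
print(mask.sum())  # only one interior pixel passes the test
```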

3.2 Quality control and spatial filter of land cover pixels

The quality of the MCD12Q1 500 m land cover pixels was first considered to ensure the generation of a reliable and representative training dataset. Therefore, only MCD12Q1 pixels with classification confidence (Land Cover_Type_1_Assessment) > 50% and a quality assessment (Land Cover_Type_QC) of “good quality” were selected [28]. Following the earlier approach proposed by Paula et al. [31], a spatial filter was also applied to all land cover classes to reduce the edge effects and geolocation errors caused by spatial differences between the 500 m MCD12Q1 and 20 m Sentinel-2 data, retaining only pixels whose surrounding eight MCD12Q1 500 m pixels shared the same land cover class. The minority of heterogeneous 20 m pixels that were inconsistent with the land cover types of the 500 m MCD12Q1 pixels were filtered out by this refinement process (Fig 4).
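The quality screen reduces to a pair of boolean conditions on the MCD12Q1 ancillary layers. The sketch below illustrates it with NumPy; the confidence values are invented, and the encoding of the QC flag (0 meaning "good quality") is an assumption made for illustration.

```python
import numpy as np

# Illustrative MCD12Q1 ancillary layers for four pixels
confidence = np.array([80, 40, 60, 55])   # LC_Type1 assessment (%)
qc = np.array([0, 0, 1, 0])               # QC flag; 0 assumed "good quality"

# Keep pixels with confidence above 50% AND a good-quality flag
keep = (confidence > 50) & (qc == 0)
print(keep)  # [ True False False  True]
```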

3.3 Spectral indices calculations in Sentinel-2 image data

Three spectral reflectance indices were added to the feature inputs used for further land cover classification.

Normalized Difference Vegetation Index (NDVI): NDVI is a commonly implemented index to assess vegetative growth (or biomass), drought, and agricultural production, based on surface information from multispectral measurements. NDVI is defined as (3):

NDVI = (NIR − Red) / (NIR + Red)    (3)

Normalized Difference Water Index (NDWI): NDWI uses the low reflectivity of water in the infrared band and high reflectivity in the green band to enhance the detection of water [32]. NDWI can effectively extract the water content of the vegetation canopy, and can also respond to the vegetation canopy under water stress in a timely manner, which is of great significance for drought monitoring, and is defined as (4):

NDWI = (Green − NIR) / (Green + NIR)    (4)

Normalized Difference Snow Index (NDSI): NDSI uses the high reflectivity in the visible and near-infrared bands and low reflectivity in the short-wave infrared band, which is the theoretical basis of remote sensing snow mapping, and is defined as (5):

NDSI = (Green − SWIR) / (Green + SWIR)    (5)

where NIR = near-infrared band (Band 8); Red = red band (Band 4); Green = green band (Band 3); SWIR = short-wave infrared band (Band 11).
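All three indices share the same normalized-difference form, so they can be computed with one helper. The sketch below uses invented reflectance values for the four bands; it is a plain-Python illustration of Eqs. (3)-(5), not the GEE implementation.

```python
def nd(a, b):
    """Normalized difference: (a - b) / (a + b)."""
    return (a - b) / (a + b)

# Illustrative surface reflectances for bands 8, 4, 3, 11
nir, red, green, swir = 0.45, 0.08, 0.10, 0.20

ndvi = nd(nir, red)      # Eq. (3): high for dense vegetation
ndwi = nd(green, nir)    # Eq. (4): high for open water
ndsi = nd(green, swir)   # Eq. (5): high for snow cover

print(round(ndvi, 3), round(ndwi, 3), round(ndsi, 3))  # 0.698 -0.636 -0.333
```

A vegetated pixel like this one scores high on NDVI and low on NDWI and NDSI, which is what lets the three indices separate vegetation, water, and snow as classification features.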

3.4 Supervised classification algorithms

In this study, 70% of the samples selected from MCD12Q1 were used to train the SVM, RF, and CART classifiers, which are internal machine learning algorithms of the GEE cloud platform, and the remaining 30% of the samples were used for testing. Please refer to the official developer’s guide for details of the three supervised classification algorithms used in this work (https://developers.google.com/earth-engine/guides/classification).
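A random 70/30 partition of the sample indices can be sketched as follows; the sample count and seed are illustrative, and in GEE the equivalent is typically done with a random column and a threshold filter.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for reproducibility
n_samples = 1000                 # illustrative sample count

idx = rng.permutation(n_samples)
split = int(0.7 * n_samples)
train_idx, test_idx = idx[:split], idx[split:]   # 70% train, 30% test
print(len(train_idx), len(test_idx))  # 700 300
```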

Typically, SVM classifiers analyze linearly separable cases. For linearly inseparable cases, the samples in the low-dimensional input space are converted into a high-dimensional feature space using a nonlinear mapping so that they become linearly separable, enabling samples with nonlinear characteristics to be analyzed linearly [3,33]. Moreover, the SVM classifier is based on structural risk minimization theory, constructing an optimal hyperplane in the feature space so that the classifier is globally optimized and the expectation over the entire sample space meets an upper bound with a certain probability [10,31]. The kernel trick is used to avoid defining the mapping function explicitly [34]. In the present study, a linear kernel was used and the regularization parameter (C) was set to 102.

The RF algorithm generates multiple decision trees by creating random features, combining Breiman’s idea of “bagging” with random selection of features [35]. Three parameters need to be specified: the number of trees, the minimum number of terminal nodes, and the number of features [36]. Previous studies have indicated that accuracy becomes more stable when the number of trees exceeds 120 [37,38], so the number of trees was set to 300 in this study. The other two parameters took the default values (the minimum number of terminal nodes was 1, and the number of features was the square root of the total number of features).

The CART classifier uses a binary recursive partitioning method, with the Gini index as the criterion for selecting test variables and split thresholds, and finally generates a classification decision tree based on a binary tree. The core idea of CART is to apply the attribute characteristics of the samples as test variables during training to generate a multi-level, multi-node binary tree; the classification process stops when there is no further split [31]. The complexity of the model is determined by the maximum depth of the tree: a greater depth may yield higher accuracy but also increases the risk of overfitting. Qian et al. [39] found that overall accuracy becomes relatively stable when the maximum depth is set to 5-8, so the depth parameter in this study was set to 8.

3.5 Accuracy assessment

Accuracy was assessed using new independent stratified validation samples, which were visually interpreted from the high-resolution images of the GEE cloud platform. The “Bing Maps” web service and System for Terrestrial Ecosystem Parameterization sites were also used, as well as field photos from the Global Field Photo Library (http://www.eomf.ou.edu/photos/FieldPhoto/) [29]. In summary, a total of 5321 training samples and 3701 validation samples were extracted from the above datasets.

Traditional metrics, including user’s accuracy, producer’s accuracy, and overall accuracy, were used to quantitatively assess classification accuracy. Although overall accuracy summarizes classification performance well, it counts only the pixels correctly classified along the diagonal of the confusion matrix. Therefore, in addition to the diagonal pixel statistics, the accuracy assessment also included the Kappa coefficient, which accounts for the omitted and misclassified pixels off the diagonal. Kappa here ranges between 0 and 1: values of 0.80 to 1.00 imply very good agreement, 0.60 to 0.80 good agreement, 0.40 to 0.60 moderate agreement, 0.20 to 0.40 fair agreement, and less than 0.20 poor agreement. Accordingly, the user’s accuracy, producer’s accuracy, overall accuracy, and Kappa coefficient based on the confusion matrices were computed to comprehensively and quantitatively assess the proportions of correctly classified, omitted, and misclassified pixels.
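The four metrics can all be derived from one confusion matrix. The sketch below computes them with NumPy for an invented two-class matrix, with rows as reference classes and columns as mapped classes (the convention assumed here).

```python
import numpy as np

def accuracy_metrics(cm):
    """Overall, user's, producer's accuracy and Kappa from a confusion matrix.

    cm[i, j]: pixels of reference class i mapped to class j.
    """
    total = cm.sum()
    overall = np.trace(cm) / total               # diagonal pixels only
    producers = np.diag(cm) / cm.sum(axis=1)     # 1 - omission error, per row
    users = np.diag(cm) / cm.sum(axis=0)         # 1 - commission error, per column
    # Chance agreement from the row and column marginals
    chance = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / total ** 2
    kappa = (overall - chance) / (1 - chance)    # uses off-diagonal structure
    return overall, users, producers, kappa

cm = np.array([[50, 5],    # illustrative counts
               [10, 35]])
overall, users, producers, kappa = accuracy_metrics(cm)
print(round(float(overall), 3), round(float(kappa), 3))  # 0.85 0.694
```

Note how the two summary numbers diverge: overall accuracy is 0.85, but Kappa drops to about 0.69 once chance agreement from the class marginals is discounted, which is why both are reported.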

All the accuracy indices are presented with a 95% confidence interval. Finally, a land cover map was generated for study areas using the proposed method in this study and the classification accuracy of each map was computed using the corresponding validation datasets.

4. Results

4.1 The area distribution of different classifiers and land cover types

Fig 5 shows the area distribution of the different land cover types under MCD12Q1 and the classifiers. The areas of EBF, MF, and PW under the RF classifier were less than those of MCD12Q1, whereas the areas of WS, CS, UBL, and CVM were greater (Fig 5A). The area distribution of land cover types under the CART classifier was similar to that under RF: the areas of MF and PW were still less than those of MCD12Q1, and the areas of CS, UBL, and CVM were greater (Fig 5B). The area differences between the SVM classifier and MCD12Q1 were relatively large. The area of EBF under the SVM classifier was greater than that of MCD12Q1, while the areas of MF and CVM were far less (Fig 5C). Therefore, among the three classifiers, the areas of the different land cover types classified under CART were closest to MCD12Q1.

Fig 5. The area distribution of 17 different land cover types under MCD12Q1 and classifiers.

https://doi.org/10.1371/journal.pone.0312585.g005

Fig 6 shows the spatial distribution of the different land cover types under MCD12Q1 and the classifiers. The land cover types in the northern part of the study area under MCD12Q1 were mainly MF and WS, with small areas of CS and CVM. Large areas of EBF were mainly distributed in the eastern, western, and central parts of the study area. The southern part of the study area near the sea was dominated by UBL, surrounded by large areas of CS and CVM, and there were scattered PW in the southwest (Fig 6A). The overall spatial distribution of land cover types under the CART classifier was similar to MCD12Q1; however, CART showed less EBF area and a more scattered spatial distribution of UBL. The land cover types in the northern part of the study area under the CART classifier were still dominated by MF, WS, and scattered CS and CVM. Compared with MCD12Q1, and to a lesser degree the RF classifier, there were no obvious areas of PW under the CART classifier (Fig 6B and 6C). There was significantly more UBL near the southern part under the RF classifier than under MCD12Q1. Moreover, the spatial distribution of WS under the RF classifier was quite different from MCD12Q1, with no obvious large-scale WS in the northern part. Although the area of EBF under the RF classifier was significantly greater than that of MCD12Q1, it was still mainly distributed in the eastern, western, and central parts (Fig 6C). Compared with MCD12Q1, there was no obvious UBL under the SVM classifier in the southern part, and the spatial distribution of WS was also significantly different; furthermore, there was no obvious PW under the SVM classifier (Fig 6D).

Fig 6. The spatial distribution of 17 different land cover types under MCD12Q1 and classifiers.

https://doi.org/10.1371/journal.pone.0312585.g006

4.2 The pixel percentage of classifiers under each land cover type

Fig 7 presents the pixel percentage of each classifier relative to the total pixels for the 17 land cover types. Under the MCD12Q1 classification system, the pixel percentage of EBF reached 32.51%, the highest value for this system. Next, the pixel percentages of WS, MF, and CS were 22.62%, 16.71%, and 12.33%, respectively. The pixel percentages of CVM and UBL were lower, at only 7.17% and 4.67%, and those of the other land cover types were lower still, totaling only 3.17% (Fig 7A). Under the RF classifier, the pixel percentage of WS, at 31.87%, was obviously higher than those of the other land cover types, whereas the pixel percentage of EBF was lower than under the MCD12Q1 system. Additionally, the pixel percentages of EBF and CS were similar, at 19.71% and 18.75%, respectively. The pixel percentages of UBL, MF, and CVM were lower, at 9.56%, 9.17%, and 8.39%, respectively. Moreover, the pixel percentages of CS, UBL, and CVM were visibly greater than those of MCD12Q1 (Fig 7B).

Fig 7. Pixel percentage of each classifier to the total under 17 different land cover types.

https://doi.org/10.1371/journal.pone.0312585.g007

Under the CART classifier, the pixel percentages of EBF and MF differed considerably from MCD12Q1: the pixel percentage of EBF was higher and that of MF was lower. The pixel percentages of WS and CVM were 20.50% and 6.36%, respectively, close to the values under MCD12Q1. In addition, the pixel percentages of CS and UBL were greater than those of MCD12Q1, reaching 16.48% and 7.80%, respectively (Fig 7C). The pixel percentages of the different land cover types under the SVM classifier differed greatly from MCD12Q1. The pixel percentage of EBF was 58.41%, considerably more than under MCD12Q1, while those of WS and CS were 19.71% and 15.93%. The pixel percentages of the other land cover types were lower and differed from MCD12Q1 (Fig 7D). In short, the main land cover types in the study area were EBF, MF, WS, and CS under the four classification schemes, and the pixel percentages of the different land cover types were most similar between CART and MCD12Q1.

4.3 The accuracy assessment based on the confusion matrix

Fig 8 shows the confusion matrices of the training and validation samples: panels A-C correspond to the training samples and panels D-F to the validation samples. The total number of pixels for each land cover type in the study area was set at 500 for the training samples and 320 for the validation samples. Because some minority land cover types (such as Evergreen Broadleaf Forests, Deciduous Broadleaf Forests, Mixed Forests, and Barren) cover only small areas, their total pixel counts failed to reach the set values.

Fig 8. Confusion matrices of the training samples (A-C) and the validation samples (D-F).

https://doi.org/10.1371/journal.pone.0312585.g008

The numbers of correctly classified pixels for the Permanent Wetlands, Croplands, Urban and Built-up Lands, Cropland/Natural Vegetation Mosaics, and Permanent Snow and Ice under the RF classifier were all greater than 480. The Savannas, Closed Shrublands, Evergreen Needleleaf Forests, and Deciduous Needleleaf Forests were second only to these, each with more than 400 correctly classified pixels. The total pixel numbers of the other land cover types, such as the Evergreen Broadleaf Forests, Deciduous Broadleaf Forests, Woody Savannas, Grasslands, Open Shrublands, Mixed Forests, and Barren, were far less than 500 (Fig 8A). Misclassification under the RF classifier was evident: for example, 15 pixels of the Permanent Wetlands were misclassified as Permanent Snow and Ice, 17 pixels of the Grasslands were misclassified as Cropland/Natural Vegetation Mosaics, and 20 and 22 pixels of the Closed Shrublands were misclassified as Croplands and Evergreen Broadleaf Forests, respectively (Fig 8A). Compared to the RF classifier, more land cover types under the CART classifier had nearly 500 correctly classified pixels. The numbers of correctly classified pixels for the Evergreen Needleleaf Forests, Deciduous Needleleaf Forests, Closed Shrublands, and Savannas under the CART classifier were greater than under the RF. Although the total pixel numbers of the Deciduous Broadleaf Forests, Mixed Forests, and Barren under the CART classifier were only 7, 3, and 9, respectively, none of these pixels were misclassified. Additionally, the total pixel numbers of the Open Shrublands, Woody Savannas, Grasslands, and Water Bodies were far below the set value of 500 pixels, and the numbers of misclassified pixels under the CART classifier were fewer than under the RF (Fig 8B).
The number of correctly classified pixels under the SVM classifier was less than under the CART and similar to the RF classifier; only the Barren class had no misclassified pixels under the SVM (Fig 8C). Further, in the training samples, the user's and producer's accuracies of the Savannas, Permanent Wetlands, Croplands, Urban and Built-up Lands, Cropland/Natural Vegetation Mosaics, Permanent Snow and Ice, and Water Bodies under the RF and SVM classifiers were similar and nearly 100%. Under the CART classifier, however, the user's and producer's accuracies of every land cover type were almost identical and very close to 100%, indicating that the CART accuracies in the training samples exceeded those of the RF and SVM classifiers.

In the validation samples, the number of correctly classified pixels for the Evergreen Needleleaf Forests under the RF classifier was nearly the set value of 320 pixels, higher than for the other land cover types, which all had fewer than 300 correctly classified pixels. The total pixel numbers of the Deciduous Broadleaf Forests, Mixed Forests, and Barren equaled their numbers of correctly classified pixels, indicating no misclassified pixels. Some land cover types under the RF classifier showed obvious misclassification; for some, the misclassified pixels even outnumbered the correctly classified ones. For example, the Permanent Snow and Ice had 136 correctly classified pixels but 184 misclassified, among which 55 and 51 pixels were misclassified as Savannas and Urban and Built-up Lands, respectively; the Deciduous Needleleaf Forests had 171 misclassified pixels, more than the 149 correctly classified, among which 76 and 58 pixels were misclassified as Closed Shrublands and Savannas, respectively (Fig 8D). The number of correctly classified pixels for the Evergreen Needleleaf Forests under the CART classifier was still the highest. Although misclassification was unavoidable under the CART classifier, the number of correctly classified pixels substantially exceeded the number of misclassified pixels for every land cover type except the Permanent Snow and Ice, for which the CART classifier misclassified 215 pixels as Cropland/Natural Vegetation Mosaics, Urban and Built-up Lands, Savannas, Woody Savannas, Deciduous Needleleaf Forests, and Water Bodies (Fig 8E).

The numbers of misclassified pixels under the SVM classifier exceeded those under the RF and CART classifiers for classes such as the Croplands, Permanent Wetlands, Savannas, and Closed Shrublands. The number of correctly classified pixels for the Evergreen Needleleaf Forests was equivalent under the SVM and RF classifiers and was the highest of all classes (Fig 8F). Further, in the validation samples, the user's and producer's accuracies of the Deciduous Needleleaf Forests, Croplands, Urban and Built-up Lands, Cropland/Natural Vegetation Mosaics, and Permanent Snow and Ice under the RF classifier were low, ranging from 35.6% to 68.1% for user's accuracy and from 43.5% to 71% for producer's accuracy. The producer's accuracies of the Evergreen Broadleaf Forests, Deciduous Broadleaf Forests, and Mixed Forests were only 23.8%, 17.2%, and 10%, respectively. The user's and producer's accuracies of most land cover types under the CART classifier were visibly greater than under the RF and SVM classifiers. Except for the producer's accuracies of the Deciduous Broadleaf Forests, Mixed Forests, Barren, and Savannas, and the user's accuracy of the Permanent Snow and Ice, the user's and producer's accuracies of the other land cover types were the same and nearly 100%. In short, the user's and producer's accuracies of the CART in both the training and validation samples were significantly better than those of the RF and SVM classifiers.
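For reference, the user's and producer's accuracies quoted above follow directly from a confusion matrix. A minimal sketch with an illustrative 3-class matrix (not the study's 17-class data), taking rows as reference labels and columns as mapped labels:

```python
import numpy as np

# Illustrative 3-class confusion matrix: rows = reference (true) classes,
# columns = mapped (predicted) classes.
cm = np.array([[480, 15, 5],
               [10, 470, 20],
               [5, 25, 450]], dtype=float)

# Producer's accuracy: fraction of reference pixels of a class mapped correctly.
producers = np.diag(cm) / cm.sum(axis=1)
# User's accuracy: fraction of pixels mapped to a class that truly belong to it.
users = np.diag(cm) / cm.sum(axis=0)
```

With the matrix transposed (rows mapped, columns reference), the two definitions swap axes; the orientation must be fixed before reading off either accuracy.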

The overall accuracy and kappa coefficient summarize the overall classification performance. Table 3 presents the accuracy assessment of the training and validation samples. In the training samples, the overall accuracies of the RF and SVM classifiers were similar, at 93.65% and 93.14%, respectively, whereas the overall accuracy of the CART classifier was the greatest, reaching 99.15%. Likewise, the kappa coefficient of the CART classifier exceeded those of the RF and SVM classifiers. In the validation samples, the overall accuracy and kappa coefficient of the SVM classifier were the lowest of the three classifiers, while those of the CART classifier were visibly greater than for the RF and SVM, indicating that the CART classifier was the most suitable for this automatic land cover classification and mapping workflow.
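Both statistics in Table 3 can likewise be computed from a confusion matrix; a short sketch using the same kind of illustrative 3-class matrix (not the study's data):

```python
import numpy as np

# Rows: reference classes; columns: mapped classes (illustrative values).
cm = np.array([[480, 15, 5],
               [10, 470, 20],
               [5, 25, 450]], dtype=float)

n = cm.sum()
overall_accuracy = np.trace(cm) / n  # observed agreement p_o
# Chance agreement p_e from the row and column marginals.
p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2
kappa = (overall_accuracy - p_e) / (1.0 - p_e)
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside the raw overall accuracy.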

Table 3. Accuracy assessment of training and validation samples.

https://doi.org/10.1371/journal.pone.0312585.t003

5. Discussion

Requirements for pre-processing large data volumes and collecting representative training samples are the most common difficulties encountered in supervised classification. Because the land cover types over a large study area are complex, mixed, and scattered (e.g., Evergreen Needleleaf Forests versus Evergreen Broadleaf Forests, Deciduous Needleleaf Forests versus Deciduous Broadleaf Forests, Closed Shrublands versus Open Shrublands), misclassification among land cover types generally lowers the classification accuracy of remote sensing images. The present study documents a fast and reliable methodology to automatically classify large volumes of Sentinel-2 data using high-quality training samples derived from the MODIS land cover product. Its advantages are: (i) land cover is classified in an automated, labor-saving manner, without spending excessive time and attention collecting and refining accurate training data by visual interpretation; (ii) it generates a 20 m Sentinel-2 land cover product with the same classification legend as the 500 m MODIS land cover product; and (iii) it can potentially support global land cover mapping with reliable classification accuracy.
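The homogeneity screening behind the automatic training-sample extraction can be sketched as follows: for the 20 m Sentinel-2 pixels inside one 500 m MODIS pixel, compute the spectral centroid, measure each pixel's Euclidean distance to it, and drop the most distant minority. This is a minimal sketch; the `keep_fraction` threshold below is an illustrative assumption, not the paper's exact rule:

```python
import numpy as np

def filter_homogeneous(pixels, keep_fraction=0.9):
    """Keep the spectrally homogeneous 20 m pixels within one 500 m MODIS pixel.

    pixels: (n_pixels, n_bands) reflectance array.
    keep_fraction: fraction of pixels retained (an illustrative assumption,
    not the paper's exact threshold).
    """
    centroid = pixels.mean(axis=0)                    # spectral centroid
    dist = np.linalg.norm(pixels - centroid, axis=1)  # Euclidean distance per pixel
    cutoff = np.quantile(dist, keep_fraction)
    return pixels[dist <= cutoff]                     # drop the heterogeneous minority
```

In the paper's workflow the retained pixels would inherit the MODIS class label of the enclosing 500 m pixel and join the training set.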

In this paper, the CART classifier had an overall accuracy and kappa coefficient of 86.38% and 0.86, respectively, visibly greater than the values for the RF and SVM classifiers. The user's and producer's accuracies of the Urban and Built-up Lands, Croplands, and Cropland/Natural Vegetation Mosaics in the training dataset were higher, probably because the spatial distribution of these land cover types in the study area was relatively concentrated. Further, misclassification under the RF classifier was evident: for example, 15 pixels of the Permanent Wetlands were misclassified as Permanent Snow and Ice, 17 pixels of the Grasslands were misclassified as Cropland/Natural Vegetation Mosaics, and 20 and 22 pixels of the Closed Shrublands were misclassified as Croplands and Evergreen Broadleaf Forests, respectively (Fig 8A). A typical "salt-and-pepper" effect arose from the similar spectral characteristics of the Permanent Wetlands and Permanent Snow and Ice, the Grasslands and Cropland/Natural Vegetation Mosaics, and other such pairs, because these land cover types were finely subdivided [40,41]. Similarly, Zhang and Roy [28] assessed the agreement between 30 m Landsat classifications and training data derived from the MODIS land cover product by bootstrapping the RF implementation. Their locally adaptive RF classification achieved an overall accuracy of 95.44% and a kappa coefficient of 0.9443, a higher overall agreement than the single RF with 93.13% overall accuracy and 0.9195 kappa. The accuracy achieved with the Sentinel-2 multispectral product in this study differs markedly from that of the Landsat data used by Zhang and Roy [28]: the overall accuracy and kappa of the RF classifier with the Sentinel-2 product were only 74.36% and 0.73, respectively, lower than with Landsat, which shows the influence of the remote sensing data source on classification accuracy.

The performance of classifiers clearly differs across land cover types [42]. In this study, the user's and producer's accuracies of the Cropland/Natural Vegetation Mosaics under the RF classifier were only 56.56% and 67.79%, respectively (Fig 8D), whereas under the CART classifier they were 85.63% and 81.79%, respectively (Fig 8E). The user's and producer's accuracies of each land cover type under the RF, CART, and SVM classifiers were all different. Zhao et al. [43] compared the performance of different classifiers for land cover classification and reported that the SVM classified rapidly but required detailed feature parameters, while the RF was fast and achieved the highest accuracy but had relatively poor stability. In our study, the overall accuracy and kappa coefficient of the SVM classifier were only 66.87% and 0.65, respectively, the lowest among the three classifiers. However, Fragou et al. [44] used the SVM classifier to classify a natural landscape in a Mediterranean environment from Landsat Thematic Mapper images and found an overall accuracy of around 90%. Evidently, the performance of a given classifier is influenced by the specific study area, the type of remote sensing images, the classification algorithm, the quantity and quality of the training samples, and the dataset selected for accuracy assessment [45,46].

Further, ensemble methods can provide useful information such as the Gini index to the end-user. The Gini index measures how strongly each attribute variable influences the overall accuracy [47]. Using the Gini index, we identified that bands B11 and B12 have a substantial effect on overall model accuracy. Thus, with our proposed method, classification performance can be increased by including additional indices [48]; for example, the Normalized Difference Vegetation Index (NDVI) [49], the Normalized Difference Salinity Index (NDSI) [50], and the Normalized Difference Water Index (NDWI) [51] could help achieve better performance.
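The suggested indices are simple normalized band ratios, so adding them as extra classifier features is straightforward. A minimal sketch, assuming the usual Sentinel-2 band roles (B8 = NIR, B4 = Red, B3 = Green) and made-up reflectance values for one vegetated pixel:

```python
import numpy as np

def normalized_difference(a, b):
    """Generic normalized difference index: (a - b) / (a + b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b)

# Made-up surface reflectances for one vegetated pixel (illustrative only).
nir, red, green = 0.45, 0.08, 0.10

ndvi = normalized_difference(nir, red)    # NDVI: high for vegetation
# McFeeters-style NDWI uses Green and NIR; Gao's variant [32] uses NIR and SWIR.
ndwi = normalized_difference(green, nir)  # negative here, consistent with non-water
```

The same function applied band-wise to whole image arrays yields per-pixel index layers that can be stacked with the spectral bands as classifier inputs.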

6. Conclusions

Based on self-programming in the GEE cloud-based platform, a large and reliable training dataset for multispectral Sentinel-2 imagery was extracted automatically across the study area from the existing MODIS land cover product. To enhance confidence in high-quality training class labels, homogeneous 20 m Sentinel-2 pixels within each 500 m MODIS pixel were selected and the minority of heterogeneous 20 m pixels were removed based on calculations of the spectral centroid and Euclidean distance. Further, quality control and a spatial filter were applied to all land cover classes to generate a reliable and representative training dataset that was subsequently used to train the Classification and Regression Tree (CART), Random Forest (RF), and Support Vector Machine (SVM) classifiers. This approach generated a new Sentinel-2 land cover map for each classifier with the same legend as the MODIS product. The CART classifier appeared to be the most suitable for this automatic workflow, as its overall accuracy of 86.38% and kappa coefficient of 0.86 were greater than the corresponding values for the RF and SVM classifiers. The proposed method can automatically generate a large number of reliable and accurate training samples in a timely manner, which is promising for future land cover mapping over large regions.

Moreover, the main land cover types in the study area as distinguished by the three classifiers were Evergreen Broadleaf Forests, Mixed Forests, Woody Savannas, and Croplands. The main land cover types in the northern part were Mixed Forests and Woody Savannas, with lesser areas classified as Croplands and Cropland/Natural Vegetation Mosaics. Areas classified as Evergreen Broadleaf Forests were mainly distributed in the eastern, western, and central parts of the study area. The southern part near the sea was dominated by Urban and Built-up Lands, surrounded by a large area of Croplands and Cropland/Natural Vegetation Mosaics. Furthermore, scattered Permanent Wetlands occurred in the southwest.

Finally, in the training and validation samples, the total pixel numbers (pixel size = 20 m × 20 m) of the Evergreen Broadleaf Forests, Deciduous Broadleaf Forests, Mixed Forests, and Barren were the smallest (fewer than 17 pixels). The total pixel numbers of the Woody Savannas and Water Bodies were roughly half the set values of 500 and 320 pixels, while those of the Open Shrublands and Grasslands were greater than 100 but fewer than 150 pixels. The CART classifier, without being computationally intensive, correctly classified more pixels than the RF and SVM classifiers. Moreover, the user's and producer's accuracies of the CART were significantly better than those of the RF and SVM classifiers, and its overall accuracy and kappa coefficient were the best. Furthermore, compared with the RF and SVM classifiers, the pixel percentages and the overall spatial distribution of the land cover types under the CART classifier were closest to the MCD12Q1, indicating that the CART classifier was the most suitable for this automatic land cover mapping workflow.

References

  1. Pires de Lima R, Marfurt K. Convolutional neural network for remote-sensing scene classification: transfer learning analysis. Remote Sens. 2020;12(1):86.
  2. Li W, Gong P. Continuous monitoring of coastline dynamics in western Florida with a 30-year time series of Landsat imagery. Remote Sens Environ. 2016;179:196–209.
  3. Huang X, Weng C, Lu Q, Feng T, Zhang L. Automatic labelling and selection of training samples for high-resolution remote sensing image classification over urban areas. Remote Sens. 2015;7(12):16024–44.
  4. Persello C, Bruzzone L. Active learning for domain adaptation in the supervised classification of remote sensing images. IEEE Trans Geosci Remote Sens. 2012;50(11):4468–83.
  5. Capolupo A, Monterisi C, Tarantino E. Landsat images classification algorithm (LICA) to automatically extract land cover information in Google Earth Engine environment. Remote Sens. 2020;12(7):1201.
  6. Liu X, Hu G, Chen Y, Li X, Xu X, Li S, et al. High-resolution multi-temporal mapping of global urban land using Landsat images based on the Google Earth Engine platform. Remote Sens Environ. 2018;209:227–39.
  7. Mitra P, Uma Shankar B, Pal SK. Segmentation of multispectral remote sensing images using active support vector machines. Pattern Recognit Lett. 2004;25(9):1067–74.
  8. Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng. 2010;22(10):1345–59.
  9. Hansen MC, Loveland TR. A review of large area monitoring of land cover change using Landsat data. Remote Sens Environ. 2012;122:66–74.
  10. Tuia D, Volpi M, Copa L, Kanevski M, Munoz-Mari J. A survey of active learning algorithms for supervised remote sensing image classification. IEEE J Sel Top Signal Process. 2011;5(3):606–17.
  11. Huang X, Zhang L. Morphological building/shadow index for building extraction from high-resolution imagery over urban areas. IEEE J Sel Top Appl Earth Obs Remote Sens. 2012;5(1):161–72.
  12. Demir B, Persello C, Bruzzone L. Batch-mode active-learning methods for the interactive classification of remote sensing images. IEEE Trans Geosci Remote Sens. 2011;49(3):1014–31.
  13. Di W, Crawford MM. View generation for multiview maximum disagreement based active learning for hyperspectral image classification. IEEE Trans Geosci Remote Sens. 2012;50(5):1942–54.
  14. Patra S, Bruzzone L. A novel SOM-SVM-based active learning technique for remote sensing image classification. IEEE Trans Geosci Remote Sens. 2014;52(11):6899–910.
  15. Persello C, Boularias A, Dalponte M, Gobakken T, Naesset E, Scholkopf B. Cost-sensitive active learning with lookahead: optimizing field surveys for remote sensing data classification. IEEE Trans Geosci Remote Sens. 2014;52(10):6652–64.
  16. Goward SN, Markham B, Dye DG, Dulaney W, Yang J. Normalized difference vegetation index measurements from the Advanced Very High Resolution Radiometer. Remote Sens Environ. 1991;35(2–3):257–77.
  17. Farahmand N, Sadeghi V, Farahmand S. Estimating soil salinity in the dried lake bed of Urmia Lake using optical Sentinel-2b images and multivariate linear regression models. J Indian Soc Remote Sens. 2020;48:675–87.
  18. Taghizadeh-Mehrjardi R, Toomanian N, Khavaninzadeh AR, Jafari A, Triantafilis J. Predicting and mapping of soil particle-size fractions with adaptive neuro-fuzzy inference and ant colony optimization in central Iran. European J Soil Sci. 2016;67(6):707–25.
  19. Xu C, Hystad P, Chen R, Van Den Hoek J, Hutchinson RA, Hankey S, et al. Application of training data affects success in broad-scale local climate zone mapping. Int J Appl Earth Obs Geoinf. 2021;103:102482.
  20. Gorelick N, Hancher M, Dixon M, Ilyushchenko S, Thau D, Moore R. Google Earth Engine: planetary-scale geospatial analysis for everyone. Remote Sens Environ. 2017;202:18–27.
  21. Hird J, DeLancey E, McDermid G, Kariyeva J. Google Earth Engine, open-access satellite data, and machine learning in support of large-area probabilistic wetland mapping. Remote Sens. 2017;9(12):1315.
  22. Kumar L, Mutanga O. Google Earth Engine applications since inception: usage, trends, and potential. Remote Sens. 2018;10(10):1509.
  23. Amani M, Mahdavi S, Afshar M, Brisco B, Huang W, Mohammad Javad Mirzadeh S, et al. Canadian wetland inventory using Google Earth Engine: the first map and preliminary results. Remote Sens. 2019;11(7):842.
  24. Hansen MC, Potapov PV, Moore R, Hancher M, Turubanova SA, Tyukavina A, et al. High-resolution global maps of 21st-century forest cover change. Science. 2013;342(6160):850–3. pmid:24233722
  25. Radoux J, Lamarche C, Van Bogaert E, Bontemps S, Brockmann C, Defourny P. Automated training sample extraction for global land cover mapping. Remote Sens. 2014;6(5):3965–87.
  26. Amani M, Ghorbanian A, Ahmadi SA, Kakooei M, Moghimi A, Mirmazloumi SM, et al. Google Earth Engine cloud computing platform for remote sensing big data applications: a comprehensive review. IEEE J Sel Top Appl Earth Obs Remote Sens. 2020;13:5326–50.
  27. Raiyani K, Gonçalves T, Rato L, Salgueiro P, Marques da Silva JR. Sentinel-2 image scene classification: a comparison between Sen2Cor and a machine learning approach. Remote Sens. 2021;13(2):300.
  28. Zhang HK, Roy DP. Using the 500 m MODIS land cover product to derive a consistent continental scale 30 m Landsat land cover classification. Remote Sens Environ. 2017;197:15–34.
  29. Xie S, Liu L, Zhang X, Yang J, Chen X, Gao Y. Automatic land-cover mapping using Landsat time-series data based on Google Earth Engine. Remote Sens. 2019;11(24):3023.
  30. Paula DB, Rene RC, Gerardo LS, Leonardo AH, Ricardo ML, Nicolás AM, et al. A land cover map of Latin America and the Caribbean in the framework of the SERENA project. Remote Sens Environ. 2013;132:13–31.
  31. Lu L, Tao Y, Di L. Object-based plastic-mulched landcover extraction using integrated Sentinel-1 and Sentinel-2 data. Remote Sens. 2018;10(11):1820.
  32. Gao B. NDWI—a normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens Environ. 1996;58(3):257–66.
  33. Jain AK, Duin PW, Mao J. Statistical pattern recognition: a review. IEEE Trans Pattern Anal Machine Intell. 2000;22(1):4–37.
  34. Mountrakis G, Im J, Ogole C. Support vector machines in remote sensing: a review. ISPRS J Photogramm Remote Sens. 2011;66(3):247–59.
  35. Singhal G, Bansod B, Mathew L, Goswami J, Choudhury BU, Raju PLN. Chlorophyll estimation using multi-spectral unmanned aerial system based on machine learning techniques. Remote Sens Appl. 2019;15:100235.
  36. Loozen Y, Rebel KT, de Jong SM, Lu M, Ollinger SV, Wassen MJ, et al. Mapping canopy nitrogen in European forests using remote sensing and environmental variables with the random forests method. Remote Sens Environ. 2020;247:111933.
  37. Hu L, Li W, Xu B. Monitoring mangrove forest change in China from 1990 to 2015 using Landsat-derived spectral-temporal variability metrics. Int J Appl Earth Obs Geoinf. 2018;73:88–98.
  38. Hu L, Xu N, Liang J, Li Z, Chen L, Zhao F. Advancing the mapping of mangrove forests at national-scale using Sentinel-1 and Sentinel-2 time-series data with Google Earth Engine: a case study in China. Remote Sens. 2020;12(19):3120.
  39. Qian Y, Zhou W, Yan J, Li W, Han L. Comparing machine learning classifiers for object-based land cover classification using very high resolution imagery. Remote Sens. 2014;7(1):153–68.
  40. Gashaw T, Tulu T, Argaw M, Worqlul AW, Tolessa T, Kindu M. Estimating the impacts of land use/land cover changes on ecosystem service values: the case of the Andassa watershed in the Upper Blue Nile basin of Ethiopia. Ecosyst Serv. 2018;31:219–28.
  41. Liu C, Li W, Zhu G, Zhou H, Yan H, Xue P. Land use/land cover changes and their driving factors in the northeastern Tibetan Plateau based on geographical detectors and Google Earth Engine: a case study in Gannan prefecture. Remote Sens. 2020;12(19):3139.
  42. Mao W, Lu D, Hou L, Liu X, Yue W. Comparison of machine-learning methods for urban land-use mapping in Hangzhou city, China. Remote Sens. 2020;12(17):2817.
  43. Zhao D, Gu H, Jia Y. Comparison of machine learning method in object-based image classification. Sci Surv Map. 2016;41:181–6.
  44. Fragou S, Kalogeropoulos K, Stathopoulos N, Louka P, Srivastava PK, Karpouzas S, et al. Quantifying land cover changes in a Mediterranean environment using Landsat TM and support vector machines. Forests. 2020;11(7):750.
  45. Ahmadi K, Kalantar B, Saeidi V, Harandi EKG, Janizadeh S, Ueda N. Comparison of machine learning methods for mapping the stand characteristics of temperate forests using multi-spectral Sentinel-2 data. Remote Sens. 2020;12(18):3019.
  46. Huang X, Zhang L. A multidirectional and multiscale morphological index for automatic building extraction from multispectral GeoEye-1 imagery. Photogramm Eng Remote Sens. 2011;77(7):721–32.
  47. Belgiu M, Drăguţ L. Random forest in remote sensing: a review of applications and future directions. ISPRS J Photogramm Remote Sens. 2016;114:24–31.
  48. Henrich V, Götze E, Jung A, Sandow C, Thürkow D, Gläßer C. Development of an online indices database: motivation, concept and implementation. In: Proceedings of the 6th EARSeL Imaging Spectroscopy SIG Workshop; 2009 Mar 16–18; Tel Aviv, Israel. p. 16–18.
  49. Crist EP, Cicone RC. A physically-based transformation of Thematic Mapper data—the TM Tasseled Cap. IEEE Trans Geosci Remote Sens. 1984;GE-22(3):256–63.
  50. Richardson AD, Duigan SP, Berlyn GP. An evaluation of noninvasive methods to estimate foliar chlorophyll content. New Phytol. 2002;153(1):185–94.
  51. Alkhaier F. Soil salinity detection using satellite remote sensing. 2003.