M-band discrete wavelet transform-based multi-view and dual input deep learning algorithm for identifying thermokarst lakes in the Qinghai–Tibetan Plateau

  • Andrew R. Li ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing

    andrew.li.application@gmail.com

    Affiliation University of California, Los Angeles, CA, United States of America

  • Jiahe Liu,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Validation, Writing – original draft, Writing – review & editing

    Affiliation Department of Mathematics, Western Connecticut State University, Danbury, CT, United States of America

  • Olivia Liu,

    Roles Conceptualization, Investigation, Methodology, Writing – original draft, Writing – review & editing

    Affiliation Department of Mathematics, Western Connecticut State University, Danbury, CT, United States of America

  • Xiaodi Wang

    Roles Conceptualization, Methodology, Supervision, Writing – review & editing

    Affiliation Department of Mathematics, Western Connecticut State University, Danbury, CT, United States of America

Abstract

In the context of permafrost thaw, thermokarst lakes act as pivotal indicators and are significant point sources of CH₄, both at present and in the foreseeable future [1]. Small thermokarst lakes have been identified as the most prolific CH₄ producers. Nonetheless, identifying thermokarst lakes, especially the smaller ones, has been challenging, as it often requires field data collection. In this research, we propose a novel M-band discrete wavelet transform (MDWT)–based multi-view and dual-input deep learning (DL) framework using convolutional neural networks (CNN) to automate the classification and precise prediction of thermokarst lakes in the Qinghai–Tibetan Plateau (QTP). By applying MDWT to the raw imagery of over 500 Sentinel-2 satellite lake images, we were able to establish a 3-way tensor dataset which breaks each image into multi-views of different frequency components. Moreover, we integrated non-image feature data pertaining to various climatic parameters. This data processing technique enhances the feature set and boosts validation accuracy by a margin of up to 17%. Given that our pre-processing approach also removes the need for manual polygon delineation, our technique boasts enhanced robustness and scalability, mitigating the necessity for field data collection.

Introduction

Background

One of the relatively recent phenomena that have been exacerbated by global warming is the accelerated thawing of permafrost triggered by rising air and ground temperatures [2]. Permafrost has historically served as the largest terrestrial carbon sink on Earth. It covers approximately 24% of the northern land surface. This permafrost holds around 1,600 billion tonnes of carbon, twice the amount present in the current atmosphere [3, 4]. Although the permafrost under most of the Qinghai–Tibetan Plateau (QTP) region is relatively thin, it still plays a crucial role in cryospheric processes [5]. More importantly, the QTP is extremely sensitive to climate change [6]. Temperatures at elevations above 4,000 m have warmed up to 75% faster than those at elevations below 2,000 m [7]. Current climate models suggest that a considerable portion of permafrost will disappear in the coming decades [8]. The projections are supported by the effects of recent spikes in global temperatures, which have caused huge areas of permafrost to thaw. This thawing has created vast thermokarst terrains and released large amounts of greenhouse gases (GHGs) into the atmosphere. Therefore, increases in thermokarst features are an early indicator of increased permafrost loss and elevated GHG emissions [9, 10]. Historical data support this, showing that 33–87% of the increase in atmospheric CH₄ in northern latitudes came from bubbling in new thermokarst lakes, which contributed to the warming climate of past deglaciation periods [11].

Thermokarst is the most common way that permafrost thaws quickly. It leads to several features, including thermo-erosional gullies and retrogressive thaw slumps. It also causes active layer detachment slides, thermokarst lakes, and basins where thaw lakes have drained [12]. Thermokarst terrain appears in northern latitudes where permafrost has collapsed, leading to relatively high levels of diffusion and ebullition of GHGs, particularly CH₄ [13–15]. Diffusion and ebullition are two pathways of GHG release from the water into the atmosphere. Ebullition refers to fluxes of bubbles or gas pockets from sediments and water columns [16]. Of the different types of thermokarst terrain, thermokarst lakes contribute the most to emissions, with expected emissions of Tg of per year, or roughly 17–26% of the total emissions from all northern lakes. By the year 2300, the total projected permafrost carbon feedback could reach up to 0.39°C [17].

Seasonal fluctuations of emissions have been observed. Certain lakes, including floating ice lakes, contain CH₄ trapped in ice bubbles during the winter months, causing peak seasonal emissions in the warmer months [18]. Despite these observations, long-term emissions from thermokarst lakes remain highly uncertain, a bottleneck that prevents the scientific community from fully discerning the global budget [19]. To help address these uncertainties, our research aims to distinguish thermokarst lakes from non-thermokarst lakes.

There exist major distinctions between non-thermokarst permafrost lakes and thermokarst lakes. First, thermokarst lakes must form above rapidly thawing ice-rich permafrost, providing an impermeable foundation that prevents immediate water drainage [20]. Currently, thermokarst lakes cover approximately 7% of total permafrost-affected land [21]. Most thermokarst lakes in the QTP region, in particular, form on top of continuous permafrost [22]. A meaningful feature of permafrost is the active layer thickness (ALT), which is a layer of soil that freezes and thaws annually. The thermokarst process is catalyzed by the deepening of the active layer. Different disruptions, including temperature warming, can trigger this process, causing subsequent subsidence or erosion. The extent of a resulting depression may depend on the elevation and shape of the terrain. The expansion of thermokarst lakes is accelerated by heat conduction, which leads to thawing of water bodies below and around the thermokarst region [1]. Taliks, layers of unfrozen ground, begin to form under these lakes. They facilitate one way of rapid thawing beneath the lakes, causing vertical expansion [23]. The development of non-thermokarst lakes occurs independently of this process.

Another distinctive feature of thermokarst lakes is their water content. Much of the water that forms thermokarst lakes can be attributed to the melted ice within the thawed permafrost. Organic matter from melted permafrost is decomposed by microbes [20]. This matter is highly reactive and releases more nutrients than soil, creating a positive feedback loop that encourages additional microbial activity, which generates more GHG emissions [24, 25]. This decomposition of organic matter depends on oxygen availability: a high availability of oxygen leads to the production of CO₂ gas, while low availability leads to the production of CH₄ gas [26, 27].

As water drains, the thermokarst process decelerates, and vegetation growth amplifies [20]. Identifying thermokarst lakes serves as a prerequisite for predicting drainage events that shape the surface landscape, catalyzing large-scale changes in topography and carbon cycling [2]. Further, they can introduce significant quantities of sediment and nutrients to nearby water bodies, altering the ecosystems of the lakes and rivers [8]. For example, past LiDAR satellite imagery has exposed linear corrugations, deep thermo-erosional gullies, and drainage channels [28]. Thermokarst lakes can also disrupt the regular pattern of soil temperature fluctuations. They have the ability to raise sediment temperatures by up to 10°C above the mean annual air temperature [29].

Related works

Modern machine learning (ML) algorithms have found applications in almost every field. The most significant advantage of deep learning (DL) methods, in particular, is that they can automatically learn features [30]. Traditional supervised models require users to input a number of variables that they believe would produce an optimal model. Unsupervised models remove this expertise and knowledge barrier [2]. CNNs are the most popular ML method used for remote sensing image classification, as they are effective at spatial feature extraction [31].

Previous research has investigated the classification of thermokarst landforms. A plethora of semiautomatic methods to predict land cover have incorporated ML techniques into their model frameworks, including k-means clustering, random forest regression, maximum likelihood classifiers, and DL models such as temporal convolutional neural network (TempCNN) [2, 6, 22, 23, 32]. Nonetheless, such models often incorporate a manual element during either the post-processing or the training phases. A majority of image-centric classification models primarily utilize RGB channels along with select infrared light channels. The two indices based on the infrared channels most frequently used to map surface water bodies are the Normalized Difference Water Index (NDWI) and the Modified Normalized Difference Water Index (MNDWI) [6, 33, 34]. The NDWI is a cost- and time-efficient method of extracting land surface water information [35]. However, solely basing classification off the NDWI can risk underclassifying shallow lakes, which do not reflect as much green light or absorb as much near-infrared light as deeper lakes [36, 37].

Enhanced pre-processing techniques have not been extensively employed in the classification of thermokarst landforms. The pre-processing of satellite images is intended to improve identification quality and minimize redundant information. Some studies have omitted additional data pre-processing altogether [23, 32, 38]. The few studies that both used Sentinel data and performed data pre-processing relied only on the Sentinel Application Platform (SNAP) [22, 39]. In one example, spectral transformation using principal component analysis (PCA) was applied to separate water bodies from other surface features and reduce redundant information at a 10-meter spatial resolution [22]. In another case, the raw digital number values were converted into Top-of-Atmosphere (TOA) reflectance through radiometric calibration; the TOA-corrected images, represented by spectral bands (red, blue, green, near-infrared, and short-wave infrared) at 30-meter resolution, were used in the analysis [39]. These studies do not take full advantage of the potential of pre-processing. To our knowledge, SNAP only covers visualization, resampling, and mosaic operations [40]. More advanced pre-processing techniques, such as the discrete wavelet transform (DWT), can reveal hidden information stored in satellite images and enhance resolution to improve model accuracy [41].

In recent years, DL has been increasingly applied to permafrost research, encompassing the prediction of thermokarst landslide susceptibility [32], the delineation of retrogressive thaw slumps [42], and the cartography of lake ice in Canada [43]. We identified only two studies that utilized convolutional neural networks (CNNs) for the classification of thermokarst lakes [2]. The first [23] uses a TempCNN to automatically map floating lake ice and explore the temporal dynamics of ice changes. However, because the authors chose to employ only the temporal dimension, the classification model remained one-dimensional [43]. The second study investigates the applicability of DL to classifying retrogressive thaw slumps (RTS). Other studies employed DeepLab for the purpose of semantic segmentation [2]. For our research objectives, semantic segmentation would not be ideal, as differences between individual pixels of thermokarst and non-thermokarst lakes may not be distinctive or noticeable. Classification would be much more difficult, as feature extraction would be significantly limited.

Although some model frameworks have used advanced ML models to extract exact boundaries of thermokarst lakes, their treatment of smaller thermokarst lakes is poorly resolved. Many studies were unable to include lakes with surface areas less than in their evaluation, leaving these smaller water bodies severely underdocumented [44]. Nevertheless, within the Arctic Circle, small thermokarst lakes have been identified as the most prolific CH₄ producers [45]. Based on these data, one can infer that there is not a strong correlation between surface area and GHG emission activity. Overlooking smaller lakes might result in significant underestimation of both water body surface area and emissions.

Objectives of our study

Identifying and analyzing thermokarst lakes is crucial. It helps us understand their impact on the long-term release of CO₂ and CH₄. These lakes also affect landforms, vegetation, and peat changes. By understanding these effects, we can enhance our tools for predicting and managing climate change.

Our model based on wavelet decomposed data paired with a convolutional neural network (CNN) serves as a huge improvement over previous attempts to integrate ML into thermokarst identification. Not only is our model fully automated to allow for ease of scaling up, but it is also one of the first to integrate over fifteen non-image feature datasets including air and ground temperature, precipitation, and snow depth. However, because a portion of the model is a black box, it is difficult to outline exactly which features and channels were analyzed, as the weights and biases of the CNN nodes can only reveal preliminary information about their influences.

The objectives of our study were: (1) to assess the capability of dual input CNN in classifying thermokarst lakes from high-resolution satellite imagery and non-image feature data; (2) to improve the model accuracy by integrating M-band DWTs that decompose each of our raw data images into different sub-images with different frequencies to form a new corresponding 3-way tensor dataset; and (3) to classify smaller thermokarst lakes previously excluded from similar studies.

We created a new 3-way tensor dataset based on raw image data for more than 500 Sentinel-2 satellite lake images and decomposed those images using state-of-the-art M-band DWTs. We also incorporated non-image feature data for various climate variables. These methods significantly improved our model’s performance.

An illustration of our model framework is shown in Fig 1.

The first step is described in the “Lake selection and distribution” section. We removed glacial lakes from the Altay and Sayan region due to their distance from the study area and selected only lakes with surface area between and . The second to fourth steps are described in the “Image data” section. In the second step, images of the lakes selected in step 1 are downloaded from Sentinel-2 with bounding boxes. In the third step, we randomly selected sample lakes from step 1 and removed lakes with cloudy satellite images by visual inspection. In the fourth step, we performed a discrete wavelet transform on the images selected in step 3. Steps 5 through 7 are described in the “Model framework” section. In step 5, the image and non-image data are normalized to have a mean of 0 and a standard deviation of 1. In step 6, the image data are passed through the convolutional section of our model. In step 7, the image data are flattened and combined with the non-image data, and both are passed through the fully connected section of our model, finally yielding a classification of thermokarst or non-thermokarst lake.
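Steps 5 and 7 can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `standardize` and `fuse` and the sample values are hypothetical, and the text does not say whether the normalization statistics were computed on the full dataset or on the training split only.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Rescale one feature to mean 0 and standard deviation 1 (step 5)."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def fuse(image_features, climate_features):
    """Flatten a 2D feature map and append the non-image features (step 7)."""
    flat = [px for row in image_features for px in row]
    return flat + list(climate_features)

# Hypothetical 2 m temperatures (K) for four lakes.
temps = [271.0, 275.5, 280.0, 268.5]
z = standardize(temps)

# Hypothetical 2 x 2 feature map joined with one climate value.
fused = fuse([[1.0, 2.0], [3.0, 4.0]], [0.5])
```

In a real pipeline the same rescaling would be applied per channel for the image tensor and per variable for the climate data before the two inputs reach the network.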

Study area

In this research, we studied the entire QTP region bounded by longitudes 67–104° E and latitudes 27–46° N. Nicknamed the Roof of the World and the Third Pole, the QTP is the highest and largest plateau on Earth with an average elevation of 4,400 m and an area of approximately . Precipitation in the QTP is usually the heaviest between June and September, and the mean annual air temperature is 6.0°C [32, 46]. These conditions form a typical alpine vegetation ecosystem. Recent vegetation maps suggest that alpine environments can be classified into alpine steppes, alpine swamp meadows, alpine meadows, and alpine deserts [46]. Snow cover in the QTP functions as the main source of fresh water in western China and all major Asian rivers [47]. These major rivers, including the Yangtze River, the Yellow River, and the Brahmaputra River, provide water to about 1.4 billion people downstream [48]. Therefore, the formation and disappearance of thermokarst lakes in the QTP can significantly affect the water balances in surrounding regions. The QTP also happens to be the largest high-altitude and low-latitude permafrost zone, where up to 53% of exposed land is underlain by permafrost [48]. The permafrost in this region is particularly susceptible to global warming because of its relatively high ground temperatures, with averages above -3.0°C [32].

Data and methods

Lake selection and distribution

We began our methodology workflow by constructing our own lake image datasets. We determined our ground truth labels using a preexisting 2020 thermokarst lake inventory from [22] and a preexisting 2018 glacial lake inventory from [49]. The glacial lake inventory was selected to avoid overlap of lakes. Furthermore, glacial lakes are of similar size as thermokarst lakes, which means that they may be more difficult to distinguish using other methods [22]. These inventories provided coordinates for 114,420 thermokarst lakes and 30,121 glacial lakes respectively. Although the glacial lake inventory was taken in 2018, a glacial lake cannot transform into a thermokarst lake. Thus, we can reasonably assume that the locations of glacial and non-thermokarst lakes in 2018 remain representative of glacial and non-thermokarst lakes in 2020, provided that the lake did not drain or disappear during those two years.

We removed glacial lakes from the Altay and Sayan region from our dataset due to excessive distance from the study area (Fig 2A). There were 4,677 lakes in this area, or around 16% of the entire inventory (Fig 2B).

Fig 2. Distribution of thermokarst and glacial lakes.

The geographic area shown is the QTP, with degrees latitude and longitude marked on the sides of these figures. (A) Distribution of glacial lakes. Lakes in the Altay and Sayan region are in brown, and lakes not in that region are in blue. The lakes in the Altay and Sayan region are clearly outside the QTP and are far from the other glacial lakes. There are also lakes in the High Asia region outside the QTP included in this inventory. (B) Distribution of thermokarst (red) and glacial (blue) lakes that meet our location criteria. (C) Distribution of thermokarst and glacial lakes that meet our location and size criteria. (D) Distribution of sample thermokarst and glacial lakes that were used to train and evaluate our model. The sample lakes we selected have a similar spatial distribution as the population of lakes meeting our size criteria, which also has a similar spatial distribution as the population of all lakes in the inventories. Made with Natural Earth (https://www.naturalearthdata.com), whose data is in the public domain (https://www.naturalearthdata.com/about/terms-of-use/).

https://doi.org/10.1371/journal.pclm.0000283.g002

We then filtered our data to only select lakes with surface areas between and . These steps reduced our datasets to 1,030 thermokarst lakes and 1,015 glacial lakes (Fig 2C). From there, we extracted around a quarter of the available lakes in each of the two datasets (Fig 2D).

Image data

As both lake inventories contain coordinates of the centers of every lake, we were able to extract satellite data for a certain bounding box around each lake, centered at each lake’s centroid.

Given our lakes’ areas of 0.2 to , we sourced our image data for each lake from a bounding box of dimensions 1.44 km × 1.44 km, an area slightly more than 2 km². By visual inspection, we saw that this bounding box was large enough to fully encompass our lakes without including excessive unnecessary data. However, due to the variability in lake dimensions, some dataset entries captured multiple lakes within their bounding boxes. Given that the spatial resolution of our imagery was 10 m × 10 m, the extracted images for each lake comprised 144 × 144 pixels.
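The bounding-box arithmetic can be checked with a short sketch. It assumes a flat-Earth approximation of about 111.32 km per degree of latitude, and the centroid coordinates are hypothetical; the function names are illustrative and are not taken from the study's code.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude

def bounding_box(lat, lon, side_km=1.44):
    """Return (west, south, east, north) in degrees for a square box
    of side `side_km` centered on a lake centroid."""
    half = side_km / 2.0
    dlat = half / KM_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude.
    dlon = half / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)

def pixels_per_side(side_km=1.44, resolution_m=10.0):
    """Number of pixels along one side of the box at a given resolution."""
    return round(side_km * 1000.0 / resolution_m)

# Hypothetical lake centroid on the plateau.
west, south, east, north = bounding_box(34.5, 92.0)
side_px = pixels_per_side()   # 1.44 km at 10 m per pixel
box_area_km2 = 1.44 ** 2      # a little over 2 square kilometers
```

A side of 144 pixels is divisible by 2, 3, and 4, which matters later when 2-, 3-, and 4-band wavelets are applied to the same images.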

Our analysis employed six channels: red, green, blue, near-infrared (NIR), short-wave infrared 1 (SWIR 1), and short-wave infrared 2 (SWIR 2) from the Level-2A (Bottom of Atmosphere reflectance) product, sourced from high-resolution Sentinel-2 Multi-Spectral Instrument (MSI) satellite imagery. This dataset was procured via the Google Earth Engine API, specifically leveraging the COPERNICUS/S2_SR product.

Along with these 6 channels, we added 3 spectral indices: the NDWI, the NDVI, and the Brightness Index (BI). These indices were calculated using the following formulas [22]:

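\begin{align}
\mathrm{NDWI} &= \frac{\rho_{G} - \rho_{NIR}}{\rho_{G} + \rho_{NIR}} \tag{1} \\
\mathrm{NDVI} &= \frac{\rho_{NIR} - \rho_{R}}{\rho_{NIR} + \rho_{R}} \tag{2} \\
\mathrm{BI} &= \sqrt{\frac{(a_{R}\,\rho_{R})^{2} + (a_{G}\,\rho_{G})^{2}}{2}} \tag{3}
\end{align}

Here, $\rho_{R}$ is the reflectance of the red band, $\rho_{G}$ is that of the green band, and $\rho_{NIR}$ is that of the near-infrared band. Additionally, $a_{R}$ and $a_{G}$ are coefficients for the red and green bands, respectively. They have been set to 1 for the purposes of this study, as recommended in the Sentinel Application Platform.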
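As a rough illustration of how the three indices behave, the sketch below assumes the standard definitions NDWI = (green − NIR)/(green + NIR), NDVI = (NIR − red)/(NIR + red), and the SNAP-style BI = sqrt(((a_R·red)² + (a_G·green)²)/2) with both coefficients equal to 1. The reflectance values are hypothetical and are not drawn from the study's data.

```python
import math

def ndwi(green, nir):
    """Normalized Difference Water Index: positive over open water."""
    return (green - nir) / (green + nir)

def ndvi(red, nir):
    """Normalized Difference Vegetation Index: positive over vegetation."""
    return (nir - red) / (nir + red)

def brightness_index(red, green, red_factor=1.0, green_factor=1.0):
    """Brightness Index with band coefficients (both 1 in this sketch)."""
    return math.sqrt(((red_factor * red) ** 2 + (green_factor * green) ** 2) / 2.0)

# Hypothetical surface reflectances for two pixel types.
water = {"red": 0.03, "green": 0.05, "nir": 0.01}
meadow = {"red": 0.05, "green": 0.08, "nir": 0.40}

water_ndwi = ndwi(water["green"], water["nir"])
meadow_ndwi = ndwi(meadow["green"], meadow["nir"])
meadow_ndvi = ndvi(meadow["red"], meadow["nir"])
water_ndvi = ndvi(water["red"], water["nir"])
soil_bi = brightness_index(0.3, 0.4)
```

With these inputs the water pixel has a positive NDWI and a negative NDVI, while the meadow pixel shows the opposite signs, which is why the two indices complement each other as extra input channels.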

The three aforementioned indices are instrumental in emphasizing the attributes of water, vegetation, and soil, respectively. Notably, the NDWI and NDVI are conducive to delineating surface water bodies [50].

The NDWI is used to delineate and enhance water features while suppressing soil and vegetation features. It relies on the fact that water bodies reflect relatively strongly in the green channel and absorb most radiation in the NIR channel [33].

The NDVI reflects the density of chlorophyll pigments found in plant matter and vegetation [6]. It can thus be used to measure vegetation development [51].

The BI captures the characteristics of the soil. It is related to the brightness of the soil and, therefore, is influenced by factors such as soil moisture and organic matter on the soil surface [52].

Because clouds obscure some satellite images, we chose unobscured images from each lake’s image collection spanning August and September 2020. To minimize bias and secure a random, representative sample, the lake databases were randomized prior to manual selection. Satellite images from the designated time frame were then reviewed, and those in which the lake was unobstructed by clouds were selected. When no suitable imagery was available for a lake, the lake was omitted from the dataset. This approach might introduce a bias favoring less cloudy regions. In total, 47 thermokarst lakes and 45 glacial lakes were excluded, a number equal to approximately 18.3% of the 503 lakes retained.

However, we believe our sample is still representative because the most important variables have similar distributions in our sample and in the population; we discuss this in detail in the section “Representativeness of our sample”.

In total, we selected 252 thermokarst lakes and 251 glacial lakes, which comprised both our training and validation datasets.

There is a diversity of lakes, and it is not necessarily easy to visually determine which lakes are of which type.

M-band DWT.

For image signal processing, DWTs are downsampling procedures that reveal hidden information in an original image by separating its high-frequency and low-frequency components. They transform each image or channel into different frequency sub-images or channels, which are determined by a filter bank with M filters (M ≥ 2). A filter bank includes one low-pass, or scaling, filter and M − 1 high-pass, or wavelet, filters. A filter bank has N vanishing moments, or N-regularity, which indicates the smoothness of the corresponding DWT. The most significant properties of an M-band DWT depend only on the low-pass, or scaling, filter. A two-step procedure is typically used to construct a DWT filter bank. The first step is to construct the scaling filter, and the second is to choose the wavelet filters from the given scaling filter [53–55].
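To make the notions of scaling filter, wavelet filter, and vanishing moments concrete, the sketch below checks them for the well-known 4-tap Daubechies filter, a 2-band bank with two vanishing moments. This filter is used only because its coefficients have a simple closed form; it is not one of the filter banks used in this study, and the helper `moment` is a hypothetical name.

```python
import math

s3 = math.sqrt(3.0)
norm = 4.0 * math.sqrt(2.0)

# Low-pass (scaling) filter of the 4-tap Daubechies wavelet.
scaling = [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]

# High-pass (wavelet) filter chosen from the scaling filter by the
# alternating-flip rule g[k] = (-1)^k * h[L - 1 - k].
L = len(scaling)
wavelet = [((-1) ** k) * scaling[L - 1 - k] for k in range(L)]

def moment(filt, order):
    """Discrete moment sum_k k^order * filt[k]."""
    return sum((k ** order) * c for k, c in enumerate(filt))

low_pass_sum = sum(scaling)                 # sqrt(2) for a 2-band bank
energy = sum(c * c for c in scaling)        # 1 for an orthonormal filter
cross = sum(h * g for h, g in zip(scaling, wavelet))  # 0: filters orthogonal
m0, m1, m2 = (moment(wavelet, p) for p in range(3))
```

The zeroth and first moments of the wavelet filter vanish while the second does not, which is what two vanishing moments (2-regularity) means in this 2-band case.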

The following is an example of a 4-band 2-regular DWT matrix constructed from the corresponding 4-band DWT filter bank with four filters from [55]:

Moreover, a notable benefit of employing M-band DWT is the elimination of the need to choose the band number (M) and filter length (fl) as the wavelet-based approach is equipped with different M, and low- and high-frequency components as a whole. This enables the proposed approach to demonstrate its efficacy without requiring predefined parameters or making assumptions about the probability distribution of the population from the data. In particular, varying the values of M and fl can provide valuable insights into the images we analyze.

To pre-process our data, we used the Daubechies-6 and Daubechies-8 wavelets, along with a 3-band 2-regular wavelet, a 4-band 2-regular wavelet, and a 4-band 4-regular wavelet. The filter banks of the 3- and 4-band wavelets can be found in S1 Appendix.

Because the dimensions of a wavelet transform matrix must be divisible by the band number of the wavelet, we chose an image resolution divisible by 12, since we use 2-, 3-, and 4-band wavelets. Because our image data have multiple channels, we applied DWTs to decompose each channel separately. As an illustration, Fig 3A is a pixel image of a Tibetan yak. Using the 3-band 2-regular wavelet in S1 Appendix, this image is decomposed into approximation and detail components. Each component is a pixel image. The detail components are shown in Fig 3B, with the approximation component in the top left corner. Because we applied a 3-band DWT to the image, the image is decomposed into 9 sub-images (components or channels). The pixel values of the detail components are very small compared to those of the approximation component.

Fig 3. Image of a Tibetan yak before and after 2D DWT.

(A) Original image of a Tibetan yak. (B) The full result of wavelet decomposition using the 3-band 2-regular wavelet with scaled color. Original image by Alexandr Frolov from Wikimedia Commons (https://commons.wikimedia.org/wiki/File:Sarlyk_Yak2.jpg)

https://doi.org/10.1371/journal.pclm.0000283.g003

The RGB values of the detail components have been multiplied by 64 to make them visible. It can be seen that the detail components directly to the right of or below the approximation component show horizontal or vertical detail, respectively. The other detail components are various diagonal detail components.

An M-band wavelet can thus be used to turn an n × n matrix into an (n/M) × (n/M) × M² 3-way tensor after separating the blocks of the resulting block matrix. Thus, for every feature of our input image data, the DWT creates M² wavelet components.
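The block-to-tensor step can be sketched as follows. Since the study's filter banks are given only in S1 Appendix, this example substitutes the orthonormal DCT-II matrix as a stand-in M-band bank whose filters have length M; it reproduces the shapes and the approximation/detail split, not the authors' wavelets. All function names are hypothetical.

```python
import math

def dct_filter_bank(M):
    """Stand-in orthonormal M-band bank (DCT-II rows); row 0 is low-pass."""
    bank = []
    for k in range(M):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / M)
        bank.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * M))
                     for j in range(M)])
    return bank

def transform_matrix(bank, n):
    """n x n matrix with rows grouped by band; each filter acts on
    non-overlapping length-M blocks, i.e. downsampling by M."""
    M, size = len(bank), n // len(bank)
    W = [[0.0] * n for _ in range(n)]
    for band, filt in enumerate(bank):
        for b in range(size):
            for j, coeff in enumerate(filt):
                W[band * size + b][b * M + j] = coeff
    return W

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def mband_dwt2(image, bank):
    """Return an (n/M) x (n/M) x M^2 tensor of sub-images."""
    n, M = len(image), len(bank)
    if n % M != 0 or any(len(row) != n for row in image):
        raise ValueError("image must be square with side divisible by M")
    W = transform_matrix(bank, n)
    Wt = [list(col) for col in zip(*W)]
    Y = matmul(matmul(W, image), Wt)  # block matrix of M x M sub-images
    size = n // M
    # Component c = p * M + q is the block in block-row p, block-column q.
    return [[[Y[(c // M) * size + i][(c % M) * size + j] for c in range(M * M)]
             for j in range(size)] for i in range(size)]

n = 12  # divisible by 2, 3, and 4
flat_image = [[5.0] * n for _ in range(n)]
tensor = mband_dwt2(flat_image, dct_filter_bank(3))
```

For a perfectly flat image all the energy lands in component 0 (the approximation) and the eight detail components are numerically zero, mirroring the observation that detail values are small relative to the approximation.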

Non-image climate data

We integrated the monthly-averaged ERA5-Land dataset, sourced from the Copernicus Climate Data Store, to supplement our image inputs [56]. This dataset derives its values from the hourly averages of the ERA5-Land dataset, aggregated on a monthly basis.

The spatial resolution of this dataset is 0.1° × 0.1°, translating to approximately 9 km. Given the relatively coarse resolution of the ERA5-Land dataset, each lake was assigned a single data point of variable values, corresponding to the pixel in which the lake’s center resides. Because monthly meteorological data fluctuate between August and September 2020, the final input values assigned to each lake are the mean values over the two-month period.
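The assignment of one climate record per lake can be sketched as a nearest-grid-point lookup followed by a two-month average. The 0.1° grid spacing is an assumption based on the published ERA5-Land grid, and the variable names and values are hypothetical.

```python
from statistics import fmean

def nearest_grid_point(lat, lon, cell_deg=0.1):
    """Snap a lake centroid to the nearest node of a regular lat/lon grid."""
    def snap(x):
        return round(round(x / cell_deg) * cell_deg, 6)
    return snap(lat), snap(lon)

def two_month_mean(august, september):
    """Average each variable over the August and September monthly values."""
    return {name: fmean([august[name], september[name]]) for name in august}

# Hypothetical lake centroid and monthly values at its grid point.
grid_lat, grid_lon = nearest_grid_point(34.56, 92.04)
aug = {"t2m": 280.0, "snow_depth": 0.02}
sep = {"t2m": 276.0, "snow_depth": 0.04}
lake_features = two_month_mean(aug, sep)
```

Because many small lakes fall inside the same grid cell, nearby lakes receive identical climate vectors under this scheme.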

Table 1 lists out each variable we used and a description of each variable.

No single variable alone can clearly categorize lakes. However, if a lake shows unusual values in one or more variables, it can help indicate its likely classification. This is done by looking at the region where the lake is located. As a result, these variables can provide extra insight for the model.

These variables can be classified into four categories: temperature, water movement and phase change, snow and ice properties, and leaf area index. Temperature for thermokarst lakes is in general higher than temperature for glacial lakes as the growth of thermokarst lakes is triggered by heat conduction, which causes the freezing water below and around these lakes to melt [1]. In particular, lake bottom temperature may be higher for thermokarst lakes due to microbes decomposing organic matter at lake bottoms [13, 20]. Similarly, differences in temperature between thermokarst and glacial lakes can in turn drive differences in water phase change such as evaporation and the amount of ice and snow. For example, glacial regions may have more snow and ice than thermokarst regions. Leaf area index may differ because the thermokarst formation process results in increased vegetation growth [20].

Parameters like lake bottom temperature, snowmelt, and soil temperature inherently bear significance in distinguishing between lake classes, primarily due to the potential variance in their distributions across different lake categories. We verify this intuition in the subsection “Exploration of the significance of feature variables” of the discussion section by confirming that these three variables are the ones our model uses the most.

Model framework

S3 Fig shows the flowchart of our classification model combining image data input and non-image feature data into a CNN.

Given a completed model framework and data pre-processing methodology, we created seven different models to test how our framework would respond to seven different types of input data. Using sklearn’s train_test_split, we divided our dataset into training and validation datasets with a 70–30 ratio: 70% of our data was used to train our model, while the remaining 30% was used to test our model and validate its performance. Note that a random state parameter was used to ensure that the data were divided with identical seeds for each model (i.e., each model’s testing and training datasets contained identical data points).
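The effect of a fixed random state can be illustrated without any dependencies. The sketch below is a standard-library stand-in for the idea, not scikit-learn's implementation; with scikit-learn the analogous call would pass `test_size=0.3` and a fixed `random_state` to `train_test_split`. The seed value 42 is hypothetical.

```python
import random

def split_indices(n_samples, test_fraction=0.3, seed=42):
    """Shuffle sample indices with a fixed seed and cut a 70-30 split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # private generator, reproducible
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]

# 252 thermokarst + 251 glacial lakes = 503 samples.
train_idx, test_idx = split_indices(503)

# Re-running with the same seed yields the identical partition, so every
# model variant is trained and evaluated on the same lakes.
train_again, test_again = split_indices(503)
```

Splitting by index rather than by array makes it easy to apply one partition to the image tensors, the climate features, and the labels at once.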

We developed seven models to be trained that corresponded to seven different modifications of our datasets. First, we had a true control where one model was trained only on the non-decomposed image dataset. Next, we had a dual input control. This model was trained on both the non-decomposed image dataset and the non-image feature dataset. Then, we tested five experimental models. Each model explored the effects of a different DWT with unique filter banks on our data. These included Daubechies-6 (db3), Daubechies-8 (db4), a 3-band 2-regular wavelet (wv32), a 4-band 2-regular wavelet (wv42), and a 4-band 4-regular wavelet (wv44). Some DWTs created special spatial tensor shapes. This required us to adjust certain parameters to better meet our learning goals. The model parameters in question include the input dimension, kernel size of each convolutional layer, and number of strides in each average pooling layer. They can be found in Table 2, which summarizes the specific alterations made to each model.

Table 2. A summary of the seven types of input data we trained our model on.

https://doi.org/10.1371/journal.pclm.0000283.t002

Representativeness of our sample

We performed independent Welch’s t-tests to determine whether our samples were representative of our populations. The null hypotheses are that the samples’ and the populations’ values have equal expected values; the alternative hypotheses are that they have distinct expected values.

In Table 3, lake ice surface temperature and lake ice total depth had no p-values because their standard deviations are zero, suggesting that none of the thermokarst lakes had ice on their surfaces. For both the population dataset and sample dataset of thermokarst lakes, none of the lakes had corresponding lake ice temperatures above the freezing point of 273 K or 0°C, so the standard deviation was 0, resulting in undefined p-values.
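The reason a zero standard deviation leaves the p-value undefined can be seen directly in Welch's statistic, whose denominator is the combined standard error of the two groups. The following is a minimal standard-library sketch; the function name welch_t and the toy inputs are assumptions, and it stops at the t statistic and the Welch-Satterthwaite degrees of freedom rather than computing a p-value.

```python
import math
from statistics import mean, variance


def welch_t(sample, population):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Returns None when both groups have zero variance: the standard error
    in the denominator is then zero, so no p-value can be defined.
    """
    n1, n2 = len(sample), len(population)
    se1 = variance(sample) / n1       # squared standard error, group 1
    se2 = variance(population) / n2   # squared standard error, group 2
    if se1 + se2 == 0:
        return None
    t = (mean(sample) - mean(population)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Toy values only: every lake has zero ice depth, so the test is undefined.
print(welch_t([0.0] * 5, [0.0] * 8))

# Toy values with some spread give a usable statistic.
print(welch_t([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0, 6.0]))
```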

Table 3. p-values for thermokarst lakes. Surface runoff, total precipitation, snow albedo, 2 meter temperature, soil temperature level 1, and high vegetation leaf area index all produced statistically significant p-values. We explore their distributions and explain these results later in this section.

https://doi.org/10.1371/journal.pclm.0000283.t003

In Table 4, however, lake ice surface temperature and lake ice total depth had definable p-values. This difference is understandable: the glacial lake ice depth distributions show that approximately 9% of glacial lakes in both the sample and population had nonzero lake ice depth values and, consequently, lake ice temperatures below 273 K. Specifically, 93 glacial lakes in the population and 26 glacial lakes in the sample dataset had nonzero lake ice depth values. Similarly, 93 glacial lakes in the population dataset and 24 glacial lakes in the sample dataset had lake ice temperatures below the freezing point of 273 K or 0°C.

Table 4. p-values for glacial lakes. None of these values are significant, indicating that our sample is representative of our population.

https://doi.org/10.1371/journal.pclm.0000283.t004

S1 Fig shows that, for all of the features that failed the t-test, the sample distributions are very similar, if not nearly identical, to the corresponding population distributions. The most notable difference in the distributions that were not identical is a greater number of upper extreme values in each feature’s population distribution. For some features, such as surface runoff and total precipitation, these extremes heavily increased the maximum values in their respective distributions. This observation indicates that none of the distributions were normal or close to normal, providing an additional reason why an ML model is preferable to a traditional regression-based model.

While we did not input the latitude, longitude, lake area, and elevation variable values into our model, we performed a two-tailed independent t-test on the population and sample means of these variables for both glacial and thermokarst lakes to further assess the representativeness of our sample dataset. The p-values for thermokarst lakes are shown in Table 5. The p-values for glacial lakes are shown in Table 6. While the p-values for elevation and latitude are significant for thermokarst lakes, plotting the histograms of the distributions reveals that the distributions are in fact also very similar (S2 Fig). A potential explanation as to why the data produced such small p-values is that neither distribution fits the normal curve. Both distributions are left skewed.

Table 5. p-values for thermokarst lakes. The p-values for elevation and latitude are significant.

https://doi.org/10.1371/journal.pclm.0000283.t005

Table 6. p-values for glacial lakes. None of the p-values are significant.

https://doi.org/10.1371/journal.pclm.0000283.t006

Results

Upon training our model, we evaluated its classification efficacy using our validation dataset, documenting the confusion matrix and receiver operating characteristic (ROC) curve for each model iteration. Table 7 summarizes our findings for the seven models. Because glacial lakes had a label of 0 and thermokarst lakes had a label of 1, the confusion matrices should be read as (starting from the upper left corner and moving clockwise) true negatives, false positives, true positives, false negatives. Our validation dataset contains 76 glacial lakes and 75 thermokarst lakes, for a total of 151 lakes.

Table 7. A table summary of five metrics of evaluation calculated based on each model’s testing performance. The highest values in each category are in bold.

https://doi.org/10.1371/journal.pclm.0000283.t007

The area under an ROC curve (AUC) quantifies a model’s proficiency in differentiating between two classes. Typically, ROC curves that approximate a right angle in the top-left quadrant signify superior performance. Among our models, the 4-band 2-regular (wv42) DWT performed best, as evidenced by the largest area under the ROC curve (Fig 4A). Unsurprisingly, our true control model, which used only non–wavelet-decomposed satellite image data, performed the worst (Fig 4C), scoring the lowest in every metric we measured except recall. The dual input control model produced a noticeably better confusion matrix, but its ROC curve showed very little improvement over the true control, suggesting that despite having an additional 17 non-image features, the dual input control’s ability to distinguish between thermokarst and glacial lakes did not improve significantly.
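One way to read the AUC is as the probability that a randomly chosen thermokarst lake (label 1) receives a higher model score than a randomly chosen glacial lake (label 0). The sketch below computes it that way, by counting correctly ordered pairs; this pairwise form gives the same value as integrating the ROC curve. The function name roc_auc and the four toy scores are illustrative assumptions, not outputs of any model in this study.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy example: 0 = glacial, 1 = thermokarst; scores are made up.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(toy_labels, toy_scores))  # 3 of the 4 pairs are ordered correctly
```

Because the AUC depends only on how the scores are ordered, it can stay nearly unchanged while the confusion matrix at a fixed decision threshold changes noticeably.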

Fig 4. ROC curves and confusion matrices for 4-band 2-regular model vs. true control.

(A) The ROC curve of the selected model using the 4-band 2-regular (wv42) DWT. (B) The confusion matrix of the selected model using the 4-band 2-regular (wv42) DWT. (C) The ROC curve of the true control model with the input of image-only non–wavelet-decomposed satellite data. (D) The confusion matrix of the true control model with the input of image-only non–wavelet-decomposed satellite data.

https://doi.org/10.1371/journal.pclm.0000283.g004

The confusion matrix of our best-performing model (Fig 4B), using the wv42 wavelet, shows a very low number of false positives (4) and a moderate number of false negatives (12). In comparison, in the confusion matrix of the true control model (Fig 4D), there are significantly more false positives (31) than false negatives (11). Thus, the wv42 model’s main improvement over the true control model lies in eliminating false positives.
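The summary metrics follow directly from these counts. The sketch below rebuilds each confusion matrix from the class sizes of the validation set (76 glacial lakes labelled 0 and 75 thermokarst lakes labelled 1) together with the false positive and false negative counts quoted above, then derives accuracy, precision, recall, and F1. The function name summarize is hypothetical; the matrix is laid out so that reading clockwise from the upper left gives true negatives, false positives, true positives, false negatives.

```python
def summarize(n_negative, n_positive, false_pos, false_neg):
    """Rebuild a binary confusion matrix and derive standard metrics."""
    tn = n_negative - false_pos   # glacial lakes classified correctly
    tp = n_positive - false_neg   # thermokarst lakes classified correctly
    matrix = [[tn, false_pos],
              [false_neg, tp]]
    accuracy = (tp + tn) / (n_negative + n_positive)
    precision = tp / (tp + false_pos)
    recall = tp / (tp + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return matrix, accuracy, precision, recall, f1


for name, fp, fn in [("wv42", 4, 12), ("true control", 31, 11)]:
    matrix, accuracy, precision, recall, f1 = summarize(76, 75, fp, fn)
    print(name, matrix,
          round(accuracy, 4), round(precision, 4),
          round(recall, 4), round(f1, 4))
```

Seen this way, the drop from 31 to 4 false positives accounts for almost all of the accuracy gap between the two models, while recall is nearly the same for both.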

In the confusion matrix of the dual input control model using non–wavelet decomposed data (panel B of S4 Fig), the number of false negatives is low (8), while the number of false positives is much higher (26). In the confusion matrix of the model using the Daubechies-6 (db3) DWT (panel D of S4 Fig), the numbers of false positives and false negatives are very similar. In the confusion matrix of the model using the Daubechies-8 (db4) DWT (panel F of S4 Fig), there is a low number of false negatives. The number of false positives is moderately high, but lower than in the dual input control model. The confusion matrix of the model using the 3-band 2-regular (wv32) DWT (panel B of S5 Fig) is the same as the confusion matrix for the Daubechies-6 DWT. In the confusion matrix of the model using the 4-band 4-regular (wv44) DWT (panel D of S5 Fig), the numbers of false positives and false negatives are very similar and neither low nor high.

The use of DWTs greatly improved the classification capabilities of the remaining five models. By splitting each channel in our image data into approximation and detail channels, we effectively generated more features and uncovered potentially hidden information that the models could use to distinguish between thermokarst and glacial lakes. The 4-band 2-regular decomposition had the greatest positive impact on performance, producing an accuracy of almost 90% and an area under the ROC curve of 0.94, demonstrating that the model was able to effectively distinguish between the two lake types.

The superior performance of the 4-band 2-regular DWT can be attributed to two primary factors.

First, the 4-band DWT divides each image into many detail channels, which provides a large number of features for training the model. Having more bands generally led to better results, as shown by the improvement in model performance as the number of bands increases. Second, the 2-regular type of DWT creates detail channels that are less fine than those made by the 4-band 4-regular DWT. This difference is crucial because the images we are working with are not very high resolution ( pixels). The detailed texture of each channel, marked by large changes in pixel values, would have been overly smoothed out by a 4-regular DWT. This smoothing would have lowered the quality of the resulting channels.

Our model, trained on the 4-band 2-regular DWT decomposed spatial data and normalized non-image feature data, produced a very high precision of 0.9403 but a lower recall of 0.8400. This indicates that the model produced far fewer false positives than false negatives.

To find out which parts of the data decomposed by the 4-band 2-regular DWT were most important for our model, we first looked at the weights of the first convolutional layer of our model. They form an array with the dimensions . We then split the array along axis 2 to isolate the weights of each layer and calculated the average weight value of each layer. We found no significant pattern that would suggest that the model tended to assign more weight to a specific decomposed set of channels. However, we found that the nine layers with the lowest average weights corresponded to the nine approximation channels (channels 0 to 8). To further test this theory, we introduced noise into our testing dataset by randomly shuffling the approximation channels, rendering them essentially ineffective for the model’s classification process. We then evaluated our model on this perturbed dataset. Our model saw a decrease in accuracy, but its area under the ROC curve only decreased by 0.02, suggesting that while the noise might have disturbed the model’s ability to classify certain lakes, it generally had very little impact on the model’s ability to distinguish between thermokarst and glacial lakes. These experiments suggested that the model trained on 4-band 2-regular DWT decomposed data relied more heavily on detail channels than approximation channels when classifying between thermokarst and glacial lakes. This result is consistent with our observation that training our model on 4-band 2-regular DWT decomposed data, the less smooth data, yielded better results than the model trained on 4-band 4-regular DWT decomposed data.
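The perturbation test can be sketched as a permutation of selected channels. The text does not specify exactly how the approximation channels were shuffled, so the version below is an assumption: it permutes each chosen channel across lakes, in the style of permutation importance, which breaks the link between that channel and the label while leaving all other channels untouched. The function name shuffle_channels, the seed, and the toy dataset are all hypothetical.

```python
import random


def shuffle_channels(dataset, channel_ids, seed=0):
    """Permute the given channels across samples, leaving the rest in place.

    dataset is a list of samples; each sample is a list of channels.
    Returns a new dataset and does not modify the input.
    """
    rng = random.Random(seed)
    shuffled = [list(sample) for sample in dataset]  # copy each sample's channel list
    for c in channel_ids:
        column = [sample[c] for sample in dataset]   # channel c of every sample
        rng.shuffle(column)
        for sample, value in zip(shuffled, column):
            sample[c] = value
    return shuffled


# Toy dataset: 6 samples with 12 channels each; each "channel" is just a
# (sample, channel) tag so that its movement is easy to follow.
toy = [[(i, c) for c in range(12)] for i in range(6)]
approx_ids = range(9)  # channels 0 to 8, as in the text
perturbed = shuffle_channels(toy, approx_ids)
print(perturbed[0][:3], perturbed[0][9:])
```

Evaluating the trained model on the perturbed test set and comparing the metrics with the unperturbed run then indicates how much the model relies on the shuffled channels.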

Discussion

We designed an automatic dual input classification model based on DL to identify thermokarst lakes in the QTP region from high-resolution satellite imagery and additional non-image feature data. We explored the application of M-band DWTs to pre-process our image data and produce expanded features. Satellite images from between August and September 2020 were obtained from Sentinel-2 datasets. We based our ground truth labels on two separate inventories, one of thermokarst lakes and one of glacial lakes, from which we extracted 252 thermokarst lakes and 251 glacial lakes, respectively. We obtained climate feature data from the ERA5-Land Monthly Averaged dataset. To ensure that our sample of thermokarst and glacial lakes was representative of all of the lakes that met our area criteria, we performed multiple independent two-tailed t-tests on the population and sample means of our non-image climate feature data and additional features. A CNN was then trained and tested using our normalized and DWT-processed data to output a classification of either thermokarst or non-thermokarst.

In our effort to develop a high-accuracy classification model of thermokarst lakes via a CNN framework, we found empirical support for two primary hypotheses: that a) enriching image data through wavelet transform decomposition into 3-way tensor formats and b) integrating image data with non-image climate information would each improve classification accuracy. The inclusion of non-image data and 2D DWTs significantly improved the model’s accuracy, with an increase of up to 17.21%.

While assessing the model’s performance, one key question was whether the improvement was mainly due to adding non-image features or to the DWT-based pre-processing. Both factors contributed to enhancing the model’s accuracy relative to the baseline, but the wavelet decomposition was the more influential: it increased accuracy by up to 11.9% and raised the F1 score by as much as 0.09. In contrast, comparing the dual input control with the true control shows that the addition of non-image climate data had a smaller effect, improving accuracy by up to 5.29% and the F1 score by as much as 0.045.

Comparative analysis

Our study is designed to provide an automatic tool that classifies water bodies observed in remote sensing images into thermokarst lakes and non-thermokarst lakes. To measure the performance of our classification tool, we used the existing thermokarst lake inventory [22] and glacial lake inventory [49] as the ground truth. The accuracy (the ratio of the number of correct predictions to the total number of predictions) of our best model is 89%, and the precision (the number of true positives divided by the total number of positive predictions) of our best model is 94%.

[22], which utilized both DL methodology and manual polygon delineation, achieved an accuracy of 91%. However, the performance measure in their analysis is the accuracy of the extracted lake boundaries, so the results are not directly comparable.

The previous studies utilizing both DL methodology and field research to identify thermokarst landforms focused on smaller landforms such as thermo-erosion gullies [2] and retrogressive thaw slumps (RTSs) [42].

[2] used a CNN to map small, irregular thermokarst landforms such as thermo-erosion gullies from high-resolution remote sensing images and achieved a precision of 59%. [42] utilized DeepLabv3+ to automatically delineate retrogressive thaw slumps (RTSs) and achieved a precision of 86.3%. Based purely on precision, our analysis outperformed these previous studies. However, we should note that the thermokarst landforms studied in these previous works are relatively small compared to the ones in our study, which increased the difficulty of their analyses.

Exploration of the significance of feature variables

The following section explores the 3 climate feature variables with the greatest average absolute weights in the first dense layer of our 4-band 2-regular DWT model. We compare and analyze the distributions of lake bottom temperature, snowmelt, and soil temperature level 1 of thermokarst lakes and glacial lakes.
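Ranking input features by the average absolute weight attached to them in a dense layer can be sketched as follows. This assumes the weight matrix is laid out with one row per input feature and one column per dense unit; the function name rank_features, the feature names, and the toy weights are all made up for illustration and are not values from our model.

```python
def rank_features(weights, names, top=3):
    """Return the `top` feature names with the largest mean absolute weight.

    weights[i][j] is the weight from input feature i to dense unit j.
    """
    scores = {
        name: sum(abs(w) for w in row) / len(row)
        for name, row in zip(names, weights)
    }
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Toy weight matrix: 4 hypothetical features feeding 3 dense units.
toy_names = ["feature_a", "feature_b", "feature_c", "feature_d"]
toy_weights = [
    [0.50, -0.40, 0.30],   # mean |w| = 0.40
    [0.05, 0.02, -0.03],   # mean |w| ~ 0.03
    [-0.90, 0.10, 0.20],   # mean |w| = 0.40 (ties keep input order)
    [0.25, -0.25, 0.25],   # mean |w| = 0.25
]
print(rank_features(toy_weights, toy_names))
```

Taking absolute values matters here: a feature with large positive and negative weights would otherwise average out to near zero despite having a strong influence on the layer's output. The ranking is only a rough indicator, since later layers can amplify or suppress any input.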

The mean lake bottom temperature of thermokarst lakes is higher than that of glacial lakes, and some thermokarst lake bottoms are somewhat warmer (282–284 K) than those of most glacial lakes. This is consistent with microbes decomposing thawed organic matter under the anoxic conditions at the bottom of these lakes, a process that is correlated with higher temperatures and additional permafrost thaw in a positive feedback loop [13, 20] (Fig 5A).

Fig 5. Significance of feature variables: lake bottom temperature, snowmelt, and level 1 soil temperature.

(A) Lake bottom temperature for thermokarst and glacial lakes. (B) Snowmelt for thermokarst and glacial lakes. (C) Level 1 soil temperature for thermokarst and glacial lakes.

https://doi.org/10.1371/journal.pclm.0000283.g005

While thermokarst lakes’ snowmelt values are all very close to zero, glacial lakes sometimes have higher snowmelt values (Fig 5B), reflecting the fact that there is in general more snow in glacial regions than in thermokarst regions. This is partially driven by the higher temperature of thermokarst lakes compared with glacial lakes, as the expansion of thermokarst lakes is triggered by heat conduction [1].

Glacial lakes more often have lower soil temperatures, including below freezing point (273.15 K). This could be due to glacial lakes in glacial regions being colder (Fig 5C).

Applications

The findings from our study indicate that our model holds substantial potential for facilitating the creation of accurate annual inventories of thermokarst lakes without the dependency on field data. Our research covers a wide area, making our model useful beyond just regional or local studies. With data sources from around the world, both image and non-image, we believe our model works well in areas with a significant presence of thermokarst lakes. This approach could lead to a marked reduction in the costs associated with monitoring thermokarst lakes, obviating the necessity for field equipment and dedicated research facilities.

Furthermore, an integrated application of these annual thermokarst lake inventories could serve to refine both global climate models and GHG ledgers. Accurate identification and mapping of thermokarst landforms are foundational to deciphering their specific GHG emissions. Thermokarst lakes, being the most pronounced of these landforms, play a pivotal role. Through precise localization of these lakes, we can gain insights into their annual contributions to and emissions, which in turn can provide a more nuanced understanding for simulating Earth’s climate dynamics.

Future work

With the simultaneous progression of ML techniques and the quality of satellite and remote sensing imagery, the future possibilities for DL to further improve the classification, automatic mapping, and forecasting of thermokarst landforms exceed those covered in this research.

First, to improve the time efficiency of our data collection process, a script could be developed to automatically select, from the extracted satellite image options, the image with the least cloud cover and cloud shadow. We would then be able to construct a relatively accurate thermokarst lake inventory for any time period with sufficient high-resolution satellite image data. We were unable to accomplish this step because the cloud coverage area property of the satellite image dataset was unreliable due to the scale of the data. We also attempted to combine multiple images of the same location by taking the median channel values at each pixel. However, the results remained inconsistent. Potential ways to overcome this problem include improvements in the quality and accuracy of satellite image parameters or innovative ways to work around cloud cover via deeper exploration of how band values correlate with the percentage of cloud cover.
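For reference, a per-pixel median composite of the kind mentioned above can be sketched in a few lines. This is a generic illustration under stated assumptions, not the code used in this study: the function name median_composite is hypothetical, the images are assumed to be co-registered and equal in size, and the toy pixel values are made up. As noted, this approach gave inconsistent results on our data.

```python
from statistics import median


def median_composite(images):
    """Per-pixel, per-channel median over a stack of co-registered images.

    images[k][r][c] is a tuple of channel values for pixel (r, c) of image k.
    """
    rows, cols = len(images[0]), len(images[0][0])
    n_channels = len(images[0][0][0])
    return [
        [
            tuple(median(img[r][c][ch] for img in images)
                  for ch in range(n_channels))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Three toy 1 x 2 images with two channels; the second pixel of the
# second image is very bright, as a cloud would be.
img_1 = [[(10, 20), (30, 40)]]
img_2 = [[(12, 22), (250, 250)]]
img_3 = [[(11, 21), (32, 41)]]
composite = median_composite([img_1, img_2, img_3])
print(composite)
```

The median discards a bright outlier only when clouds cover a given pixel in fewer than half of the stacked images, which is one reason a composite can remain unreliable in persistently cloudy areas.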

Another option for expanding our training dataset would be to include land as a possible classification. By doing so, it would be possible to automatically detect new or previously-undiscovered thermokarst lakes by running the model on images from randomly-chosen locations.

As introduced in the Related Works, a few other studies have applied time series CNNs (TempCNN) to mapping thermokarst landforms [43]. Future research could explore a potential ensemble or hybrid model that combines our model with that of TempCNN so that it would be plausible to input time-series data. This addition could also provide more information and expand on research related to investigating the dynamics and seasonal changes of thermokarst lakes.

In this research, we were able to classify and include relatively smaller lakes, but there are still water bodies with smaller surface areas that we were unable to include due to limitations in our data resolution. Because the input images had to remain constant in size, including smaller lakes would sacrifice resolution, as the pixel representation of the lake would be insignificant compared to the land cover in the bounding box.

Building on our work classifying lakes, we plan to predict how thermokarst lakes will change, including their surface area and drainage patterns. These predictions will come from analyzing both image and non-image data over time. However, we face a challenge with the detail level of our climate data. The best dataset we can use only offers a 9 km resolution. This broad approach might oversimplify the detailed features of thermokarst landscapes. It could mix them with features of nearby areas, possibly confusing them with other water bodies. This might lead to missing important details.

Conclusion

In this research, we explore the degradation of permafrost in the QTP region through the classification of thermokarst lakes and develop a DWT-based dual input DL model with a CNN to automatically classify and accurately predict thermokarst lakes with area between 0.2 and , a range of lakes previously excluded from many assessments due to issues in satellite data. Our model is the first neural network–based thermokarst lake classification model that incorporates M-band DWTs to decompose raw spatial data into different frequency component sub-images to form a corresponding 3-way tensor dataset. This special treatment of our data adds additional features and improves validation accuracy by up to 17%. Our model can be upscaled and used to build future inventories of thermokarst lakes without having to collect field data.

Acknowledgments

First, we thank Western Connecticut State University for providing its facilities in which we conducted our research.

We are grateful for Vincent Li’s mentorship on this project. He provided valuable insight into the intricacies of the model framework, assisted us with our questions on the neural network, and regularly reviewed our progress.

We are also grateful for Tyler Wooldridge’s mentorship in furthering our understanding of the possible model structures and extent of ML, helping us steer our project in the right direction.

We appreciate the advice and ML expertise of Hieu Nguyen, who helped answer our inquiries and challenges we faced whilst designing our classification model.

Supporting information

S1 Appendix. Filter banks for the 3- and 4-band wavelet transforms we used.

https://doi.org/10.1371/journal.pclm.0000283.s001

(PDF)

S2 Appendix. Code.

https://doi.org/10.1371/journal.pclm.0000283.s002

The code is not presented in this paper. It can also be found in the following GitHub repository: https://github.com/jliu2006/pingo.

S1 Fig. Distributions of the non-image climate feature values that failed the t-test () for thermokarst lakes.

The histograms above in each subfigure summarize the distribution for our sample of 252 thermokarst lakes, while the histograms below in each subfigure represent the distribution for our population of 1,030 thermokarst lakes that met our area criteria. A: Surface runoff. B: Total precipitation. C: Snow albedo. D: 2 m temperature. E: Soil temperature level 1. F: Leaf area index, high vegetation.

https://doi.org/10.1371/journal.pclm.0000283.s003

(TIF)

S2 Fig. Elevation and latitude of the sample and population of thermokarst lakes.

(A) Elevation distribution. (B) Latitude distribution. The distributions are both very similar.

https://doi.org/10.1371/journal.pclm.0000283.s004

(TIF)

S3 Fig. Flowchart of our classification model.

https://doi.org/10.1371/journal.pclm.0000283.s005

(TIF)

S4 Fig. ROC curves and confusion matrices for non-selected models using no wavelets or Daubechies wavelets.

(A) The ROC curve of the dual input control model using non–wavelet decomposed data. (B) The confusion matrix of the dual input control model using non–wavelet decomposed data. (C) The ROC curve of the model using the Daubechies-6 (db3) DWT. (D) The confusion matrix of the model using the Daubechies-6 (db3) DWT. (E) The ROC curve of the model using the Daubechies-8 (db4) DWT. (F) The confusion matrix of the model using the Daubechies-8 (db4) DWT.

https://doi.org/10.1371/journal.pclm.0000283.s006

(TIF)

S5 Fig. The ROC curves and confusion matrices for non-selected models using non-Daubechies wavelets.

(A) The ROC curve of the model using the 3-band 2-regular (wv32) DWT. (B) The confusion matrix of the model using the 3-band 2-regular (wv32) DWT. (C) The ROC curve of the model using the 4-band 4-regular (wv44) DWT. (D) The confusion matrix of the model using the 4-band 4-regular (wv44) DWT.

https://doi.org/10.1371/journal.pclm.0000283.s007

(TIF)

References

  1. Zandt MH, Liebner S, Welte CU. Roles of Thermokarst Lakes in a Warming World. Trends Microbiol. 2020;28(9):769–79. pmid:32362540
  2. Huang L, Liu L, Jiang L, Zhang T. Automatic Mapping of Thermokarst Landforms from Remote Sensing Images Using Deep Learning: A Case Study in the Northeastern Tibetan Plateau. Remote Sensing. 2018;10(12):2067.
  3. Grosse G, Jones B, Arp C. 8.21—Thermokarst lakes, drainage, and drained basins. In: Shroder JF, editor. Treatise on Geomorphology. San Diego: Academic Press; 2013. p. 325–53. Available from: https://www.sciencedirect.com/science/article/pii/B9780123747396002165
  4. Jin H, Ma Q. Impacts of Permafrost Degradation on Carbon Stocks and Emissions under a Warming Climate: A Review. Atmosphere. 2021;12(11):1425.
  5. Gao T, Zhang Y, Kang S, Abbott BW, Wang X, Zhang T, et al. Accelerating permafrost collapse on the eastern Tibetan Plateau. Environ Res Lett. 2021;16(5):054023.
  6. Șerban R-D, Jin H, Șerban M, Luo D, Wang Q, Jin X, et al. Mapping thermokarst lakes and ponds across permafrost landscapes in the Headwater Area of Yellow River on northeastern Qinghai-Tibet Plateau. International Journal of Remote Sensing. 2020;41(18):7042–67.
  7. Remington R. Climate Change Affecting High-Altitude Regions at Faster Rate, Rutgers Study Finds; 2015. Available from: https://www.rutgers.edu/news/climate-change-affecting-high-altitude-regions-faster-rate-rutgers-study
  8. Bowden WB. Climate Change in the Arctic – Permafrost, Thermokarst, and Why They Matter to the Non‐Arctic World. Geography Compass. 2010;4(10):1553–66.
  9. Schuur EAG, McGuire AD, Schädel C, Grosse G, Harden JW, Hayes DJ, et al. Climate change and the permafrost carbon feedback. Nature. 2015;520(7546):171–9. pmid:25855454
  10. Knoblauch C, Beer C, Liebner S, Grigoriev MN, Pfeiffer E-M. Methane production as key to the greenhouse gas budget of thawing permafrost. Nature Clim Change. 2018;8(4):309–12.
  11. Walter KM, Edwards ME, Grosse G, Zimov SA, Chapin FS. Thermokarst lakes as a source of atmospheric CH4 during the last deglaciation. Science. 2007;318(5850):633–6. pmid:17962561
  12. Wei Z, Du Z, Wang L, Zhong W, Lin J, Xu Q, et al. Sedimentary organic carbon storage of thermokarst lakes and ponds across Tibetan permafrost region. Sci Total Environ. 2022;831:154761. pmid:35339557
  13. Zimov SA, Voropaev YV, Semiletov IP, Davidov SP, Prosiannikov SF, Chapin FS III, et al. North Siberian Lakes: A Methane Source Fueled by Pleistocene Carbon. Science. 1997;277(5327):800–2.
  14. Sepulveda-Jauregui A, Walter Anthony KM, Martinez-Cruz K, Greene S, Thalasso F. Methane and carbon dioxide emissions from 40 lakes along a north–south latitudinal transect in Alaska. Biogeosciences. 2015;12(11):3197–223.
  15. Matveev A, Laurion I, Vincent WF. Methane and carbon dioxide emissions from thermokarst lakes on mineral soils. Arctic Science. 2018;4(4):584–604.
  16. Zamanpour MK, Kaliappan RS, Rockne KJ. Gas ebullition from petroleum hydrocarbons in aquatic sediments: A review. J Environ Manage. 2020;271:110997. pmid:32778285
  17. Schneider von Deimling T, Grosse G, Strauss J, Schirrmeister L, Morgenstern A, Schaphoff S, et al. Observation-based modelling of permafrost carbon fluxes with accounting for deep carbon deposits and thermokarst activity. Biogeosciences. 2015;12(11):3469–88.
  18. Walter KM, Smith LC, Chapin FS. Methane bubbling from northern lakes: present and future contributions to the global methane budget. Philos Trans A Math Phys Eng Sci. 2007;365(1856):1657–76. pmid:17513268
  19. Sieczko AK, Duc NT, Schenk J, Pajala G, Rudberg D, Sawakuchi HO, et al. Diel variability of methane emissions from lakes. Proc Natl Acad Sci U S A. 2020;117(35):21488–94. pmid:32817550
  20. Anthony KMW, Zimov SA, Grosse G, Jones MC, Anthony PM, Chapin FS 3rd, et al. A shift of thermokarst lakes from carbon sources to sinks during the Holocene epoch. Nature. 2014;511(7510):452–6. pmid:25043014
  21. Serikova S, Pokrovsky OS, Laudon H, Krickov IV, Lim AG, Manasypov RM, et al. High carbon emissions from thermokarst lakes of Western Siberia. Nat Commun. 2019;10(1):1552. pmid:30948722
  22. Wei Z, Du Z, Wang L, Lin J, Feng Y, Xu Q, et al. Sentinel‐Based Inventory of Thermokarst Lakes and Ponds Across Permafrost Landscapes on the Qinghai‐Tibet Plateau. Earth and Space Science. 2021;8(11).
  23. Shaposhnikova M, Duguay CR, Roy-Léveillée P. Bedfast and Floating Ice Dynamics of Thermokarst Lakes Using a Temporal Deep Learning Mapping Approach: Case Study of the Old Crow Flats, Yukon, Canada. Copernicus GmbH. 2022. https://doi.org/10.5194/egusphere-2022-388
  24. Walter Anthony K, Daanen R, Anthony P, Schneider von Deimling T, Ping C-L, Chanton JP, et al. Methane emissions proportional to permafrost carbon thawed in Arctic lakes since the 1950s. Nature Geosci. 2016;9(9):679–82.
  25. Walter Anthony K, Schneider von Deimling T, Nitze I, Frolking S, Emond A, Daanen R, et al. 21st-century modeled permafrost carbon emissions accelerated by abrupt thaw beneath lakes. Nat Commun. 2018;9(1):3262. pmid:30111815
  26. Deshpande BN, MacIntyre S, Matveev A, Vincent WF. Oxygen dynamics in permafrost thaw lakes: Anaerobic bioreactors in the Canadian subarctic. Limnology & Oceanography. 2015;60(5):1656–70.
  27. Deshpande BN, Maps F, Matveev A, Vincent WF. Oxygen depletion in subarctic peatland thaw lakes. Arctic Science. 2017;3(2):406–28.
  28. Edwards M, Grosse G, Jones BM, McDowell P. The evolution of a thermokarst-lake landscape: Late Quaternary permafrost degradation and stabilization in interior Alaska. Sedimentary Geology. 2016;340:3–14.
  29. Hughes-Allen L, Bouchard F, Laurion I, Séjourné A, Marlin C, Hatté C. Seasonal patterns in greenhouse gas emissions from thermokarst lakes in Central Yakutia (Eastern Siberia). Limnol Oceanogr. 2021;66:S98–116.
  30. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44. pmid:26017442
  31. Ma L, Liu Y, Zhang X, Ye Y, Yin G, Johnson BA. Deep learning in remote sensing applications: A meta-analysis and review. ISPRS Journal of Photogrammetry and Remote Sensing. 2019;152:166–77.
  32. Yin G, Luo J, Niu F, Lin Z, Liu M. Machine learning-based thermokarst landslide susceptibility modeling across the permafrost region on the Qinghai-Tibet Plateau. Landslides. 2021;18(7):2639–49.
  33. McFeeters SK. The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. International Journal of Remote Sensing. 1996;17(7):1425–32.
  34. Xu H. Modification of normalised difference water index (NDWI) to enhance open water features in remotely sensed imagery. International Journal of Remote Sensing. 2006;27(14):3025–33.
  35. Du Z, Li W, Zhou D, Tian L, Ling F, Wang H, et al. Analysis of Landsat-8 OLI imagery for land surface water mapping. Remote Sensing Letters. 2014;5(7):672–81.
  36. Szabó L, Deák B, Bíró T, Dyke GJ, Szabó S. NDVI as a Proxy for Estimating Sedimentation and Vegetation Spread in Artificial Lakes—Monitoring of Spatial and Temporal Changes by Using Satellite Images Overarching Three Decades. Remote Sensing. 2020;12(9):1468.
  37. Zhao Y, Shen Q, Wang Q, Yang F, Wang S, Li J, et al. Recognition of Water Colour Anomaly by Using Hue Angle and Sentinel 2 Image. Remote Sensing. 2020;12(4):716.
  38. Chen H, Liqiang T, Zhaocheng G, Jienan T, Hua W, Peng H. A dynamics trend analysis method of thermokarst lakes based on the machine learning algorithm. In: 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS. IEEE. 2021. 6484–7.
  39. Șerban R, Jin H, Șerban M, Luo D. Shrinking thermokarst lakes and ponds on the northeastern Qinghai‐Tibet plateau over the past three decades. Permafrost & Periglacial. 2021;32(4):601–17.
  40. Gunasekaran A. Tutorial in SNAP desktop for sentinel 2 data pre processing applications—GNU GPL environment. 2020.
  41. Rathod M, Khanapuri J. Performance evaluation of transform domain methods for satellite image resolution enhancement. In: Vasudevan H, Deshmukh A, Ray K, editors. Proceedings of International Conference on Wireless Communication. Springer Singapore. 2018. 227–36.
  42. Huang L, Luo J, Lin Z, Niu F, Liu L. Using deep learning to map retrogressive thaw slumps in the Beiluhe region (Tibetan Plateau) from CubeSat images. Remote Sensing of Environment. 2020;237:111534.
  43. Shaposhnikova M. Temporal deep learning approach to bedfast and floating thermokarst lake ice mapping using SAR imagery: Old Crow Flats, Yukon, Canada. University of Waterloo. 2021. http://hdl.handle.net/10012/17414
  44. Chen X, Mu C, Jia L, Li Z, Fan C, Mu M, et al. High-resolution dataset of thermokarst lakes on the Qinghai-Tibetan Plateau. Copernicus GmbH. 2021. https://doi.org/10.5194/essd-2020-378
  45. 45. Polishchuk VYu, Polishchuk YM. The System of Geo-Simulation Modeling of Thermokarst Lakes Fields Based on the Log-Normal Distribution of Their Sizes. In: Proceedings of the 8th Scientific Conference on Information Technologies for Intelligent Decision Making Support (ITIDS 2020). Atlantis Press. 2020. https://doi.org/10.2991/aisr.k.201029.037
  46. 46. Wang X, Wu C, Peng D, Gonsamo A, Liu Z. Snow cover phenology affects alpine vegetation growth dynamics on the Tibetan Plateau: Satellite observed evidence, impacts of different biomes, and climate drivers. Agricultural and Forest Meteorology. 2018;256–257:61–74.
  47. 47. Luo J, Niu F, Lin Z, Liu M, Yin G. Thermokarst lake changes between 1969 and 2010 in the Beilu River Basin, Qinghai–Tibet Plateau, China. Science Bulletin. 2015;60(5):556–64.
  48. 48. Cao B, Zhang T, Wu Q, Sheng Y, Zhao L, Zou D. Permafrost zonation index map and statistics over the Qinghai-Tibet plateau based on field evidence. Permafr Perigl Process. 2019;30(3):178–94.
  49. 49. Wang X, Guo X, Yang C, Liu Q, Wei J, Zhang Y, et al. Glacial lake inventory of high-mountain Asia in 1990 and 2018 derived from Landsat images. Earth Syst Sci Data. 2020;12(3):2169–82.
  50. 50. Qiao B, Zhu L, Yang R. Temporal-spatial differences in lake water storage changes and their links to climate change throughout the Tibetan Plateau. Remote Sensing of Environment. 2019;222:232–43.
  51. 51. Dorigo W, de Jeu R, Chung D, Parinussa R, Liu Y, Wagner W, et al. Evaluating global trends (1988–2010) in harmonized multi‐satellite surface soil moisture. Geophysical Research Letters. 2012;39(18).
  52. 52. Marques MJ, Alvarez A, Carral P, Sastre B, Bienes R. The use of remote sensing to detect the consequences of erosion in gypsiferous soils. International Soil and Water Conservation Research. 2020;8(4):383–92.
  53. 53. Daubechies I. Ten Lectures on Wavelets. Society for Industrial and Applied Mathematics. 1992. https://doi.org/10.1137/1.9781611970104
  54. 54. Fromzel V, Ter-Gabrielyan N, Dubinskii M. Acousto-optically Q-switched, resonantly pumped, Er:YVO4 laser. Opt Express. 2013;21(13):15253–8. pmid:23842311
  55. 55. Lin T, Xu S, Shi Q, Hao P. An algebraic construction of orthonormal M-band wavelets with perfect reconstruction. Applied Mathematics and Computation. 2006;172(2):717–30.
  56. 56. MuñozSabater J. ERA5-land monthly averaged data from 1981 to present. 2019.