Thermal remote sensing over heterogeneous urban and suburban landscapes using sensor-driven super-resolution

Thermal remote sensing is an important tool for monitoring regional climate and environment, including urban heat islands. However, it suffers from a relatively low spatial resolution compared with optical remote sensing. To improve the spatial resolution, various "data-driven" image processing techniques (pan-sharpening, kernel-driven methods, and machine learning) have been developed in recent decades. Such empirical super-resolution methods create visually appealing thermal images; however, they may sacrifice radiometric consistency because they are not necessarily sensitive to specific sensor features. In this paper, we evaluated a "sensor-driven" super-resolution approach that explicitly considers the sensor blurring process, to ensure radiometric consistency with the original thermal image during high-resolution thermal image retrieval. The sensor-driven algorithm was applied to a cloud-free Moderate Resolution Imaging Spectroradiometer (MODIS) scene of a heterogeneous urban and suburban landscape that included built-up areas, low forested mountains, a lake, croplands, and river channels. Validation against the reference high-resolution thermal image obtained by the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) shows that the sensor-driven algorithm can downscale the MODIS image to 250-m resolution while maintaining high statistical consistency with the original MODIS and ASTER images. Parts of our algorithm, such as the radiometric offset correction based on the Mahalanobis distance, may be integrated with other existing approaches in the future.


Introduction
Measurement of terrestrial thermal emissions allows us to estimate the land surface temperature and the emissivity of surface materials. Thermal remote sensing takes advantage of such features to effectively monitor volcanic disasters [1], wildfires [2], crop fields [3], mineral composition [4], regional climate [5] and urban heat islands [6,7]. In comparison to observation using in situ photographs [8] or unmanned aerial vehicles [9], satellite-based observation has advantages in spatial coverage, frequency, and regularity. One of the major issues in thermal remote sensing is the coarse spatial resolution of the thermal images [4]. In comparison to optical sensors that observe solar reflection from the Earth's surface, thermal sensors that observe thermal emissions from the surface collect electromagnetic waves with lower signal strength, resulting in lower spatial resolution. For example, the spatial resolution of the optical data provided by the Moderate Resolution Imaging Spectroradiometer (MODIS) is 250 m or 500 m, whereas that of thermal data is 1 km. A similar situation arises for other moderate resolution sensors: the resolution of optical data provided by the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) is 15 m or 30 m, but the thermal data resolution is 90 m.
Another cause of degraded spatial resolution is the general trade-off, within a single sensor, between spatial and temporal resolution and between spatial and spectral resolution. Due to technical limitations (especially in data downlinks), frequent and/or spectrally fine-resolution observation sacrifices spatial detail, and vice versa. Missing spatial detail in thermal images is particularly critical when monitoring heterogeneous landscapes, such as urban and suburban areas.
To enhance the spatial resolution of thermal images, a wide variety of techniques, called "disaggregation," "downscaling," and "super-resolution," have been developed in recent decades [10]. These can be roughly divided into multi-sensor-based and single-sensor-based approaches. The multi-sensor-based approach, also called spatiotemporal image fusion [11], mainly focuses on mitigating the trade-off between spatial and temporal resolution. In this approach, thermal images with high spatial (but low temporal) resolution are estimated from simultaneously (or quasi-simultaneously) acquired thermal images with low spatial (but high temporal) resolution, based on an empirical relationship between them. Various algorithms, such as the spatial and temporal adaptive reflectance fusion model [12] and similar or improved models (e.g., [13][14][15][16]), are used to describe the relationship. These algorithms are powerful tools for environmental monitoring with high spatiotemporal resolution, and are widely applied with match-up pairs between MODIS and ASTER [17], MODIS and Landsat [15], and polar orbiting satellites and geostationary satellites [14]. However, given the nature of spatiotemporal image fusion, the success or failure of this approach depends on the selection of the matched pairs used to describe the relationship.
In contrast, the single-sensor-based approach relies on a relationship between the thermal image and images in other spectral domains (usually optical) acquired by the same instrument to enhance the spatial resolution of the thermal image. This approach can be applied to a single sensor that observes the thermal and another spectral domain simultaneously from the same platform, even in the absence of a counterpart satellite offering frequent simultaneous overpasses of the region of interest; such a counterpart is rarely available for satellites with irregular orbits and for deep space exploration. Pan-sharpening methods via intensity-hue-saturation transformation or principal component analysis have traditionally been used [18,19], and kernel-driven methods [20][21][22] and machine-learning approaches (e.g., [23][24][25]) have recently become popular. These efforts have created visually appealing thermal images with higher spatial resolution than the originals. However, such "data-driven" approaches do not necessarily take physical processes into account, including sensor-specific features and radiometric consistency.
In contrast, there are a few "sensor-driven" approaches that explicitly consider sensor features, and target radiometric consistency in the super-resolution results. Hughes and Ramsey [4] introduced a sensor-driven super-resolution approach originally developed by Tonooka [26], which creates both quantitatively accurate and qualitatively acceptable results for their exploration of Mars using the Thermal Emission Imaging System (THEMIS) onboard the Mars Odyssey [27]. This simple approach uses the Mahalanobis distance to estimate each high-resolution pixel value from neighboring, spectrally similar low-resolution pixels.
Beneficial characteristics of this approach include consideration of the point spread function (PSF) for the sensor of interest, and radiometric correction weighted by the Mahalanobis distance after the tentative super-resolution retrieval. Such an approach that gives attention to sensor physics also seems to be in line with the recent trend of physically informed machine learning [28], and worth revisiting to achieve single-sensor-based super-resolution rather than using the empirical, data-driven approaches [18][19][20][21][22][23][24][25]. However, sufficient validation and evaluation of the super-resolution results obtained using the sensor-driven algorithm over a heterogeneous Earth surface have not been conducted. In addition, as the original algorithm was proposed more than 10 years ago [26], there seems to be room for refinement. Although it was implemented with ASTER data over urban and suburban areas, quantitative accuracy assessments using independent validation data have not been provided yet.
Therefore, this work aims to investigate the potential applicability of the sensor-driven approach over a heterogeneous landscape, and to improve its primitive algorithm. A complex terrain including urban, suburban, forest, lake, and river areas was selected as the study site for this purpose. Similar to previous thermal super-resolution research (e.g., [17,22]), Terra/MODIS was used as the sensor of interest. The relatively wide swath of MODIS is suited to covering large areas and capturing various land cover types in comparison with other moderate-resolution instruments (e.g., Landsat) that are also often used for super-resolution algorithm development. The other advantage of Terra/MODIS is the existence of a counterpart higher-resolution sensor (ASTER), which provides validation data. Because the two sensors are onboard the same satellite platform and observe simultaneously, comparison between them minimizes differences in atmospheric and/or surface conditions [29]. Because both MODIS and ASTER data are freely available, readers can easily reproduce our results. The radiometric calibration uncertainty (sensor requirement) of the MODIS thermal bands used for surface temperature measurement (i.e., bands 31 and 32) is ±0.5% in radiance [30]. That of ASTER is ±1 K or better in brightness temperature over the range 270-340 K (i.e., ±0.3%) [31]. In-flight validation of the thermal bands of MODIS and ASTER has also been reported by Hook et al. [32]. This work provides the first quantitative accuracy assessment of sensor-driven super-resolution with MODIS using independent validation data (ASTER).

Materials and methods
We first describe the original algorithm developed by Tonooka in 2005 [26] in the "Original algorithm" section, and then describe our proposed refinement in the "Proposed refinement" section. Descriptions of the study site and data are provided in the "Study site and data processing" section.

Original algorithm
The original algorithm for the sensor-driven approach was proposed by Tonooka [26]. It is a single-sensor-based approach, and thus makes full use of high-resolution optical information to achieve super-resolution of the low-resolution thermal pixels. It relies on "the empirical fact that, if two nearby surfaces are covered by a similar material under a similar situation, their radiance spectra will be similar in the wide wavelength region" [26]. Therefore, application of the algorithm is not limited to correlating thermal and optical images. As long as the abovementioned assumption is reasonable, the algorithm is applicable (and has actually been applied) even between the visible and near-infrared bands and the shortwave infrared bands.
For a general description, let us denote a pixel value of a higher-resolution image in band k (= 1, 2, . . ., n) as f high,k , and that of the counterpart lower-resolution image in band k' (k' = 1, 2, . . ., m) as g low,k' . Given appropriate inter-band coregistration and a suitable sensor design, we assume that one lower-resolution pixel corresponds to an integer number of higher-resolution pixels. The overall super-resolution procedure is as follows:
Step 1) Search for homogeneous pixels at each lower-resolution pixel scale.
Step 2) Degrade f high,k images to the same resolution of g low,k' images considering the PSF (denoted as f low,k hereafter).
Step 3) Make a typical spectral pattern (i.e., correspondence between f low,k and g low,k' ) by clustering the homogeneous pixels within the entire study region.
Step 4) Calculate the Mahalanobis distance (d nei ) between f high,k at the pixel of interest and f low, k at neighboring homogeneous pixels within a moving window.
Step 5) Calculate the Mahalanobis distance (d lib ) between f high,k at the pixel of interest and the typical spectral pattern extracted in step 3.
Step 6) Compare all Mahalanobis distances calculated in steps 4 and 5, and assign g low,k' at the minimum d nei or d lib as the super-resolved pixel value (g high,k' ).
Step 7) Repeat steps 4-6 for all high-resolution pixel locations.
Step 8) Add an offset so that the degraded g high,k' is consistent with the original g low,k' for each low-resolution pixel (i.e., perform radiometric correction). The offset value is determined for each high-resolution pixel from the Mahalanobis distance and PSF.

PLOS ONE
The high-resolution images are degraded to the same resolution as the low-resolution images in step 2. For each high-resolution pixel location, the B data (the lower-resolution spectral domain, g) is positioned by referring to the relationship between the A (the higher-resolution domain, f) and B spectral information, either at a neighboring homogeneous pixel (step 4) or in a typical spectral pattern (step 5). By repeating this assignment (step 6) for all high-resolution pixel locations (step 7), images having B-band information but A-band spatial resolution are created (i.e., super-resolution). The final result is output after post-processing (step 8). A more detailed explanation of each step is provided in the following section.
Theoretically, this process can be applied to any two datasets that have different spatial resolutions, as long as they have some statistical relationship. In the case of MODIS, there are terrestrial bands with three different spatial resolutions (i.e., 250 m for bands 1 and 2, 500 m for bands 3-7, and 1 km for the thermal bands), leading to some arbitrariness in how these bands are combined for super-resolution. In the original algorithm [26], bands 3-7 (500-m resolution) were first super-resolved to 250 m by referring to the highest-resolution bands (bands 1 and 2), and then the thermal bands (1-km resolution) were super-resolved to 250 m by referring to bands 1 and 2 and the previously obtained super-resolved bands 3-7.
The input-output flow of this "two-times super-resolution" method is shown in Fig 2. In the flowchart (Fig 1), the original bands 1 and 2 (250-m resolution) correspond to f high,k , and the degraded bands 1 and 2 (500-m resolution) correspond to f low,k ; these are indicated by the two red arrows input to the first super-resolution step in Fig 2. The original bands 3-7 (500-m resolution) correspond to g low,k' , shown as the blue arrow input to the first super-resolution step. The super-resolved bands 3-7 (250-m resolution) are then input to the second super-resolution step together with the original bands 1 and 2 (f high,k in the second super-resolution step), the degraded bands (1-km resolution; f low,k ), and the original thermal bands (g low,k' ). The final output is the thermal images (bands 31 and 32) with a resolution of 250 m.
Note that band 5 of Terra/MODIS suffers from stripe noise [33], and we decided not to use it for further processing.

For the second super-resolution process, the Mahalanobis distances from the highest-resolution bands and from the previously super-resolved bands (bands 3-7 in our case) are calculated separately. The total Mahalanobis distance is evaluated by

\[
d = \sqrt{\frac{w}{n_1}\, d_1^{2} + \frac{1-w}{n_2}\, d_2^{2}} \tag{1}
\]

where d 1 is the Mahalanobis distance (either d nei or d lib ) for bands 1 and 2 in our case, d 2 is that for bands 3-7, n 1 (= 2) and n 2 (= 4) are the corresponding numbers of bands, and w is a weighting factor, which was assumed to be 0.7 [26].
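As a concrete illustration, the weighted combination of the two per-band-group distances can be written in a few lines. The exact normalization (dividing each squared distance by its number of bands) is our reconstruction of Eq 1 from the surrounding definitions, not a verbatim transcription, so treat `combined_distance` as a sketch under that assumption.

```python
import numpy as np

def combined_distance(d1, d2, n1=2, n2=4, w=0.7):
    """Total Mahalanobis distance combining d1 (bands 1-2) and d2 (bands 3-7).

    Each squared distance is normalized by its band count so that the two
    groups contribute on a comparable per-band scale; w weights group 1.
    """
    return float(np.sqrt(w * d1**2 / n1 + (1 - w) * d2**2 / n2))
```

With w = 0.7, the highest-resolution bands dominate the comparison even though they contribute fewer spectral dimensions.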

Proposed refinement
To make the algorithm more straightforward and to create better radiometrically corrected results, we propose two modifications in this paper, regarding (1) the order of the multiple super-resolution retrievals and (2) regularization of the offset adjustment. For each super-resolution process, refinement (1) concerns the input-output correspondence and the degraded image input, whereas refinement (2) concerns the post-processing (both are indicated by a star symbol in the flowchart in Fig 1). Regarding the first modification: in the original algorithm, the second-highest-resolution images are first super-resolved to the highest resolution and then used in the second super-resolution process. In this case, the first super-resolution process relies only on the two highest-resolution bands (bands 1 and 2), which is likely to cause substantial uncertainty in the first super-resolution retrieval. This uncertainty probably propagates to the second super-resolution retrieval, making it difficult to perform reliable analysis with the super-resolution results. In addition, the original algorithm evaluates the Mahalanobis distances from the highest-resolution bands and from the super-resolved second-highest-resolution bands separately (Eq 1). This makes the algorithm complex and requires the somewhat arbitrary hyperparameter w.
To avoid this complexity, we applied the procedure in the inverse direction: first, thermal bands were super-resolved to 500 m with the help of bands 1-7, the result of which was further super-resolved to 250 m with the help of bands 1 and 2 (Fig 3). The MODIS bands 1 and 2 were degraded to 500 m and 1 km, and bands 3-7 were degraded to 1 km in the first super-resolution retrieval. In other words, bands 1-7 (500-m resolution) were f high,k , bands 1-7 (1-km resolution) were f low,k , and bands 31, 32 were g low,k' , which were all input to the first super-resolution step. These were used together for calculation of the Mahalanobis distance, and thus Eq 1 and the arbitrary parameter w were no longer needed. The procedure enables the first super-resolution retrieval to make full use of all the optical bands, which may also improve the second super-resolution retrieval and yield a more reliable final result.
For this modification, a detailed description of each step of the algorithm is provided below.
Step 1) Within each low-resolution pixel, the standard deviation of f high,k is calculated. A low-resolution pixel is flagged as homogeneous when, for all bands k, the standard deviation within the pixel does not exceed the standard deviation over the entire study area. In the first super-resolution process, k ranges from band 1 to 7 with 500-m resolution (i.e., n = 7), whereas in the second super-resolution process, k ranges from band 1 to 2 with 250-m resolution (i.e., n = 2).
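Step 1 can be sketched as follows; the function name `flag_homogeneous`, the (bands, rows, cols) array layout, and the integer resolution ratio are illustrative assumptions rather than the authors' implementation. A pixel is homogeneous when its within-pixel variability is low relative to the scene-wide variability, for every band.

```python
import numpy as np

def flag_homogeneous(f_high, ratio):
    """Flag homogeneous low-resolution pixels.

    f_high: (bands, H, W) high-resolution image stack.
    ratio:  integer number of high-res pixels per low-res pixel side.
    Returns a (H//ratio, W//ratio) boolean mask: True where, for every band,
    the within-pixel standard deviation does not exceed the band's
    standard deviation over the whole study area.
    """
    n, H, W = f_high.shape
    h, w = H // ratio, W // ratio
    blocks = f_high[:, :h * ratio, :w * ratio].reshape(n, h, ratio, w, ratio)
    local_sd = blocks.std(axis=(2, 4))             # per-band, per low-res pixel
    global_sd = f_high.reshape(n, -1).std(axis=1)  # per-band, whole scene
    return np.all(local_sd <= global_sd[:, None, None], axis=0)
```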
Step 2) Within each low-resolution pixel, f high,k is degraded using the PSF as a weighting function that describes signal blurring in the low-resolution sensor observation:

\[
f_{\mathrm{low},k} = \sum_{i,j} \mathrm{PSF}(i,j)\, f_{\mathrm{high},k}(i,j)
\]

where i and j denote high-resolution pixel locations within a low-resolution pixel, and the PSF weights are normalized to sum to unity. The mathematical expression for the PSF is provided in the "Study site and data processing" section.
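The PSF-weighted degradation of Step 2 amounts to a weighted block average; this minimal sketch (function `degrade`, single band, PSF confined to one low-resolution pixel) illustrates the operation.

```python
import numpy as np

def degrade(f_high, psf):
    """PSF-weighted degradation of one band (Step 2).

    f_high: (H, W) high-resolution band.
    psf:    (nu, nu) weights over the high-res pixels of one low-res pixel;
            normalized internally to sum to 1.
    Returns the (H//nu, W//nu) low-resolution band.
    """
    nu = psf.shape[0]
    H, W = f_high.shape
    h, w = H // nu, W // nu
    blocks = f_high[:h * nu, :w * nu].reshape(h, nu, w, nu)
    weights = psf / psf.sum()
    # weighted sum over the (i, j) high-res locations inside each block
    return np.einsum('hiwj,ij->hw', blocks, weights)
```

With a uniform PSF this reduces to plain block averaging; a peaked PSF weights the central high-resolution pixels more strongly.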
Step 3) For all homogeneous pixels, K-means++ clustering is conducted first with f low,k , and then with g low,k' . The number of clusters is set to nine based on the land cover types of the study site (see the "Study site and data processing" section). The clusters for f low,k and g low,k' compose hierarchical trees. For each node of the tree, the samples of f low,k and g low,k' are averaged and stored as a database, creating typical spectral patterns over the entire study region.
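A flat (non-hierarchical) simplification of Step 3 can be sketched as below: cluster the f spectra of homogeneous pixels with k-means++ seeding, then store the per-cluster mean f and g spectra as the "typical spectral pattern" library. The function name, the use of `scipy.cluster.vq.kmeans2`, and the omission of the hierarchical tree are our assumptions.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def build_spectral_library(f_low, g_low, n_clusters=9):
    """Cluster homogeneous-pixel spectra and build a spectral library.

    f_low: (N, n) degraded high-res-domain spectra of homogeneous pixels.
    g_low: (N, m) corresponding low-res-domain spectra.
    Returns per-cluster mean f and g spectra, plus the cluster labels.
    """
    np.random.seed(0)  # deterministic k-means++ seeding for this sketch
    _, labels = kmeans2(f_low.astype(float), n_clusters, minit='++')
    f_lib = np.stack([f_low[labels == c].mean(axis=0) for c in range(n_clusters)])
    g_lib = np.stack([g_low[labels == c].mean(axis=0) for c in range(n_clusters)])
    return f_lib, g_lib, labels
```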
Step 4) Homogeneous pixels are picked up within ±10 low-resolution pixels (i.e., a moving window) of the high-resolution pixel of interest. The Mahalanobis distance is calculated by

\[
d_{\mathrm{nei}} = \sqrt{\left(\mathbf{f}_{\mathrm{high}} - \mathbf{f}_{\mathrm{low}}\right)^{\mathsf{T}} V_f^{-1} \left(\mathbf{f}_{\mathrm{high}} - \mathbf{f}_{\mathrm{low}}\right)}
\]

where f high = (f high,1 , f high,2 , . . ., f high,n ) T is a vector of pixel values at the pixel of interest, f low = (f low,1 , f low,2 , . . ., f low,n ) T is a vector of pixel values at a homogeneous pixel, and V f is the variance-covariance matrix of f low over all homogeneous pixels of the study site. The homogeneous pixel with the minimum d nei (i.e., the spectrum most similar to that of the pixel of interest) is a candidate for g high,k' .
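The Mahalanobis distance of Steps 4 and 5 can be computed directly from its definition; this small helper is a generic sketch (solving the linear system rather than explicitly inverting V for numerical stability).

```python
import numpy as np

def mahalanobis(x, y, V):
    """Mahalanobis distance between spectra x and y.

    V is the variance-covariance matrix estimated from the homogeneous-pixel
    spectra of the study site.
    """
    diff = np.asarray(x, float) - np.asarray(y, float)
    return float(np.sqrt(diff @ np.linalg.solve(V, diff)))
```

With V equal to the identity matrix this reduces to the Euclidean distance; a diagonal V rescales each band by its variance, which is what makes the distance comparable across bands with different radiometric ranges.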
Step 5) Similarly, the Mahalanobis distance to the typical spectral pattern is calculated by

\[
d_{\mathrm{lib}} = \sqrt{\left(\mathbf{f}_{\mathrm{high}} - \mathbf{f}_{\mathrm{lib}}\right)^{\mathsf{T}} V_f^{-1} \left(\mathbf{f}_{\mathrm{high}} - \mathbf{f}_{\mathrm{lib}}\right)}
\]

where f lib is a column vector of the average pixel values (band k = 1, 2, . . ., n) of each cluster. The minimum d lib is also a candidate for estimating g high,k' .
Step 6) The minimum d nei and the minimum d lib are compared. When d nei ≤ d lib , g low,k' at the homogeneous pixel with the minimum d nei is placed into the high-resolution pixel as g high,k' . When d nei > d lib , the algorithm searches for the spectrum in the g domain at the node where d lib was a minimum:

\[
d_{\mathrm{lib},g} = \sqrt{\left(\mathbf{g}_{\mathrm{low}} - \mathbf{g}_{\mathrm{lib}}\right)^{\mathsf{T}} V_g^{-1} \left(\mathbf{g}_{\mathrm{low}} - \mathbf{g}_{\mathrm{lib}}\right)}
\]

where g low = (g low,1 , g low,2 , . . ., g low,m ) T is a vector of pixel values at the pixel of interest, g lib is a column vector of average pixel values of each g cluster at the node with the minimum d lib , and V g is the variance-covariance matrix of g low over all homogeneous pixels of the study site. The average g low,k' at the node with the minimum d lib,g is placed as g high,k' .
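The selection logic of Step 6 can be sketched as follows. This simplification searches a flat list of library g spectra rather than the g clusters under one tree node, and all names (`select_candidate`, the argument layout) are illustrative assumptions.

```python
import numpy as np

def select_candidate(d_nei_min, g_nei, d_lib_min, g_low_pix, g_lib, Vg):
    """Choose the super-resolved g spectrum for one high-resolution pixel.

    If the best neighbour is at least as close as the best library node
    (d_nei <= d_lib), adopt the neighbour's g spectrum. Otherwise pick the
    library g spectrum with the smallest Mahalanobis distance to the
    observed low-resolution spectrum g_low at the pixel of interest.
    """
    if d_nei_min <= d_lib_min:
        return g_nei
    Vinv = np.linalg.inv(Vg)
    d = [np.sqrt((g_low_pix - g) @ Vinv @ (g_low_pix - g)) for g in g_lib]
    return g_lib[int(np.argmin(d))]
```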
Step 7) Steps 4-6 are repeated for all high-resolution pixels to create a high-resolution g image with g high,k' . At the same time, the Mahalanobis distance corresponding to the adopted g high,k' is stored for each pixel as a "distance map."

Step 8) The retrieved g high,k' should be radiometrically consistent with g low,k' when degraded again within a low-resolution pixel. To this end, an offset value is added to g high,k' . Instead of adding a uniform offset over each low-resolution pixel, full use is made of the Mahalanobis distance, so that larger offset corrections are applied to the less reliable pixels of g high,k' (i.e., pixels with less spectral similarity). The offset meeting this concept is

\[
g'_{\mathrm{high},k'}(i,j) = g_{\mathrm{high},k'}(i,j) + \alpha_{k'}\, d(i,j)
\]

where d is the Mahalanobis distance from the distance map, g' high,k' is the corrected result, and α k' is a modification scale defined by

\[
\alpha_{k'} = \frac{g_{\mathrm{low},k'} - \sum_{i,j} \mathrm{PSF}(i,j)\, g_{\mathrm{high},k'}(i,j)}{\sum_{i,j} \mathrm{PSF}(i,j)\, d(i,j)}
\]

Our second modification of the original algorithm regards this offset correction. The abovementioned offset correction, with the Mahalanobis distance as a weighting function, is theoretically reasonable; however, a very large Mahalanobis distance at a few pixels may result in overcorrection and implausible pixel values. To mitigate overcorrection while still weighting by the Mahalanobis distance, we introduced a regularization term into the distance map:

\[
d_{\mathrm{reg}} = d_{\mathrm{norm}} + \lambda
\]

where d norm is the distance normalized so that its summation over the entire study region equals 1, d reg is the regularized distance (which replaces d in the offset correction above), and λ is a tunable positive real number applied over the entire study region. A large λ makes the correction uniform within a low-resolution pixel, whereas a small λ makes it diverse (λ→0 is equivalent to the original algorithm).
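The offset correction of Step 8, with the proposed regularization, can be sketched for a single low-resolution pixel. The scale α is chosen so that PSF-degrading the corrected values exactly reproduces the original low-resolution value; the function name and the assumption that the distance map is already normalized are ours.

```python
import numpy as np

def offset_correction(g_high, g_low_val, psf, d, lam=0.002):
    """Radiometric offset correction within one low-resolution pixel (Step 8).

    g_high:    (nu, nu) retrieved high-res values inside the low-res pixel.
    g_low_val: the original low-res pixel value.
    psf:       (nu, nu) PSF weights summing to 1.
    d:         (nu, nu) Mahalanobis-distance map (assumed normalized so that
               it sums to 1 over the study region).
    lam:       regularization; large -> near-uniform correction,
               lam -> 0 reproduces the original (unregularized) algorithm.
    """
    d_reg = d + lam  # regularized distance, d_norm + lambda
    alpha = (g_low_val - np.sum(psf * g_high)) / np.sum(psf * d_reg)
    return g_high + alpha * d_reg
```

By construction, re-degrading the corrected block with the same PSF returns g_low exactly, which is the radiometric-consistency property the algorithm targets.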
We compared the results from (1) the original algorithm, (2) the inverse-direction super-resolution algorithm without distance regularization (i.e., the first modification), and (3) the inverse-direction super-resolution algorithm with distance regularization (i.e., the first and the second modification). For simplicity, hereafter we call them Algorithm 1, Algorithm 2, and Algorithm 3, respectively. This comparison will clarify how our algorithm refinement improves the super-resolution results.
To summarize, the benefit of the sensor-driven algorithm over other existing approaches is explicit consideration of the PSF, and radiometric correction weighted by the Mahalanobis distance. The sensor-driven algorithm (with our improvement) may be useful for thermal super-resolution research in the context of physical consistency.

Study site and data processing
The study site is centered around Tsukuba City, Ibaraki, Japan (36.049N-36.459N, 139.856E-140.353E; Fig 4). The region includes urban and suburban areas of Tsukuba and several neighboring cities; Mount Tsukuba, which is covered by a mixed needleleaf and broadleaf forest; and a part of Lake Kasumigaura, the second-largest inland waterbody in Japan. Rice paddy fields and croplands are distributed along several narrow river channels. According to the land cover map provided by the Japan Aerospace Exploration Agency (JAXA) [34], there are also a few grassland areas. The spatial resolution of the land cover map is 250 m. The overall accuracy and kappa coefficient have been reported as 78.0% and 0.745, respectively [34].
We searched for a cloud-free scene acquired by MODIS and ASTER simultaneously, and the scene of 24 September 2001 was selected for use. Level 1B calibrated radiances (MOD02QKM for bands 1 and 2, MOD02HKM for bands 3-7, and MOD021KM for bands 31 and 32) were downloaded via the Level-1 and Atmosphere Archive and Distribution System from the Land Processes Distributed Active Archive Center website [35]. To treat images with equally spaced meter scales, all images were projected onto the UTM zone 54 projection with a WGS84 ellipsoid by nearest-neighbor resampling. For simplicity, super-resolution processing was conducted with images in the radiance scale (W/m2/str/μm), including the thermal bands. If necessary, thermal radiance can be translated into brightness temperature T b (K) by Planck's law:

\[
T_b = \frac{hc}{\lambda k}\left[\ln\!\left(1 + \frac{2hc^{2}}{\lambda^{5} B_{\lambda}}\right)\right]^{-1}
\]

where h = 6.626 × 10^-34 J s is the Planck constant, c = 2.998 × 10^8 m/s is the speed of light, k = 1.380 × 10^-23 J/K is the Boltzmann constant, λ is the wavelength (m), and B λ is the radiance (W/m2/str/m) at that wavelength. For reference, ASTER Level-3A radiance data acquired on the same day were downloaded via the METI AIST satellite Data Archive System website [36]. These data were also projected onto the UTM zone 54 projection with a WGS84 ellipsoid. The band correspondence between ASTER and MODIS is summarized in Table 1.
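The radiance-to-brightness-temperature conversion is a direct inversion of Planck's law; a minimal sketch (SI units throughout: wavelength in m, radiance in W/m2/str/m) follows.

```python
import numpy as np

H = 6.626e-34   # Planck constant (J s)
C = 2.998e8     # speed of light (m/s)
K = 1.380e-23   # Boltzmann constant (J/K)

def planck_radiance(T, lam):
    """Spectral radiance B_lambda (W/m^2/str/m) of a blackbody at T (K),
    at wavelength lam (m)."""
    return 2 * H * C**2 / (lam**5 * (np.exp(H * C / (lam * K * T)) - 1.0))

def brightness_temperature(B, lam):
    """Invert Planck's law: brightness temperature T_b (K) for radiance B
    (W/m^2/str/m) at wavelength lam (m)."""
    return H * C / (lam * K * np.log1p(2 * H * C**2 / (lam**5 * B)))
```

The two functions are exact inverses of each other, which makes a round-trip check at a typical thermal-infrared wavelength (e.g., 11 μm) a convenient sanity test.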
In the research performed by Tonooka [26], image coregistration between bands was not implemented because the author assumed that the inter-telescope registration accuracy of the data used (ASTER) was sufficient for the algorithm. However, data-driven coregistration is desirable before integrating multiple images (e.g., [12]). Therefore, we implemented image coregistration using the phase-only correlation (POC) approach [37]. Specifically, the reference images (i.e., ASTER) were first coregistered between bands via POC. Then each MODIS band was coregistered against the corresponding ASTER band (Table 1) via POC. This eliminated uncertainty caused by inconsistent MODIS inter-band registration during the super-resolution process, and by MODIS-ASTER inter-sensor misregistration during validation.
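The core of POC can be sketched in a few lines: the normalized cross-power spectrum of two images yields a sharp correlation peak at their translational offset. This integer-shift sketch omits the subpixel refinement and windowing that a production POC implementation [37] would add; the function name is ours.

```python
import numpy as np

def phase_only_correlation(a, b):
    """Estimate the integer circular shift between images a and b via POC.

    Returns (dy, dx): the location of the correlation peak, with offsets
    larger than half the image size wrapped to negative shifts.
    """
    Fa, Fb = np.fft.fft2(a), np.fft.fft2(b)
    cross = Fa * np.conj(Fb)
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep only the phase
    corr = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    Hh, Ww = corr.shape
    if dy > Hh // 2:
        dy -= Hh
    if dx > Ww // 2:
        dx -= Ww
    return int(dy), int(dx)
```

Because only the phase is retained, the peak stays sharp even when the two images differ in contrast, which is why POC is robust for inter-band and inter-sensor registration.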
The MODIS PSF used to simulate spatial degradation of the higher-resolution signal can be modeled by the convolution of a triangular function along the across-track direction and a Gaussian function [38,39]. The former represents the detector response [40], and the latter represents optical blurring [38]. The PSF was treated as a weighting function for spatial degradation within a square low-resolution pixel that contains ν × ν high-resolution pixels (ν is the number of pixels along a column or row). The triangular function can be expressed as follows, by considering a geometric transformation of the coordinates within a low-resolution pixel:

\[
T(i,j) = \max\!\left(0,\; 1 - \frac{|u(i,j)|}{\nu/2}\right), \qquad
u(i,j) = \frac{i + a\,j}{\sqrt{1+a^{2}}}, \qquad
v(i,j) = \frac{a\,i - j}{\sqrt{1+a^{2}}}
\]

where i and j are the high-resolution pixel coordinates in a low-resolution pixel; u(i,j) and v(i,j) are the corresponding coordinates in the cross-track and along-track directions, taking the center of the image as the origin; and a is the inclination of the along-track direction measured on the i-j coordinates, which was set to 5.357 by checking the geolocation information in the MODIS data (MOD03 [35]). The Gaussian function was

\[
G(i,j) = \exp\!\left(-\frac{u(i,j)^{2} + v(i,j)^{2}}{2\sigma^{2}}\right)
\]

where σ determines the standard deviation of the Gaussian function, which was set to 0.2 by referring to [38,39].
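A simplified construction of such a PSF kernel is sketched below. Note the simplifications and assumptions: the triangular response and the Gaussian are multiplied rather than convolved, the coordinate rotation and grid normalization are our choices, and `modis_psf` is not the authors' code.

```python
import numpy as np

def modis_psf(nu=4, a=5.357, sigma=0.2):
    """Sketch of a PSF on a nu x nu high-res grid inside one low-res pixel.

    A triangular detector response along the cross-track direction is
    combined (here: multiplied, as a simplification of the convolution)
    with a Gaussian optical blur. Coordinates are rotated so that u is
    cross-track; a is the along-track inclination on the pixel grid.
    """
    # grid centered on the low-res pixel, in units of low-res pixels
    c = (np.arange(nu) - (nu - 1) / 2.0) / nu
    jj, ii = np.meshgrid(c, c)
    theta = np.arctan(a)
    u = ii * np.cos(theta) + jj * np.sin(theta)    # cross-track coordinate
    v = -ii * np.sin(theta) + jj * np.cos(theta)   # along-track coordinate
    tri = np.clip(1.0 - np.abs(u) / 0.5, 0.0, None)  # triangular response
    gauss = np.exp(-(u**2 + v**2) / (2.0 * sigma**2))  # optical blurring
    psf = tri * gauss
    return psf / psf.sum()  # normalize weights to sum to 1
```

The normalized kernel can be passed directly to a PSF-weighted degradation routine as the weighting function of Step 2.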
Examples of each PSF are shown in Fig 5. Via the super-resolution algorithm, 250-m MODIS thermal images (bands 31 and 32) were retrieved and validated against the corresponding ASTER band 14. To this end, the ASTER image was degraded to 250-m resolution using the MODIS PSF. The correlation coefficient (CC) and root mean squared error (RMSE) between the MODIS and ASTER images were calculated for the three algorithms (the original algorithm, and the proposed algorithm with and without distance regularization) to investigate the effect of our refinement. The relative dimensionless global error (ERGAS) index and the peak signal-to-noise ratio (PSNR) [41] were also checked to analyze the accuracy of the spectral and spatial reconstruction, respectively. Since the thermal reference data (ASTER) are quantized with 12 bits [42], the maximum value is 4095 (equivalent to a radiance of 21.39 W/m2/str/μm), which was used for calculating the PSNR. The spatial patterns (i.e., images) and basic statistics of the radiance were also compared among the reference, retrieved, and original data.
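The four accuracy indices can be computed as follows for a single band. The single-band ERGAS simplification, the scale-ratio convention (250 m / 1 km = 1/4), and the function name are our assumptions; conventions for ERGAS vary in the literature.

```python
import numpy as np

def accuracy_metrics(ref, est, peak=21.39, scale_ratio=4.0):
    """CC, RMSE, PSNR, and (single-band) ERGAS between reference and estimate.

    peak:        maximum representable radiance (12-bit ASTER: DN 4095).
    scale_ratio: low-res / high-res pixel size (1 km / 250 m = 4).
    """
    ref = np.asarray(ref, float).ravel()
    est = np.asarray(est, float).ravel()
    cc = np.corrcoef(ref, est)[0, 1]
    rmse = np.sqrt(np.mean((ref - est) ** 2))
    psnr = 20.0 * np.log10(peak / rmse)
    # general ERGAS averages (RMSE_k / mean_k)^2 over bands; one band here
    ergas = 100.0 / scale_ratio * np.sqrt((rmse / ref.mean()) ** 2)
    return cc, rmse, psnr, ergas
```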

Results
The regularization parameter λ was determined to be 0.002, based on two rounds of tuning, to give the first super-resolution process the best performance (i.e., the lowest RMSE and the highest CC; Fig 6). On this basis, the validation against the ASTER images for Algorithms 1, 2, and 3 is shown in Table 2. Because the relative spectral responses differ even between the corresponding bands of MODIS and ASTER, a systematic bias in radiance is natural. Apart from this inevitable bias, the best performance is achieved by our proposed Algorithm 3, which produced the highest CC and PSNR and the lowest RMSE and ERGAS.
More importantly, our proposed Algorithm 3 shows the best statistical consistency with the original MODIS thermal data, and as a result, also with the ASTER data (as clearly seen in Table 3). The original Algorithm 1 creates both physically impossible negative radiance and implausibly high radiance. Only the average values were acceptable because of the offset correction. Our proposed algorithm without regularization (Algorithm 2) shows better results than Algorithm 1, without any physically impossible values. However, with the appropriate regularization (Algorithm 3), the statistical consistency with the original MODIS and ASTER images increased further, not only for the average values, but also for the minimum and maximum values. Interestingly, the standard deviation of the retrieved result with Algorithm 3 is more consistent with that of the reference data (ASTER) than that of the original MODIS data.

Fig 7 shows the stepwise enhancement of the spatial resolution of the MODIS thermal images by Algorithm 3. Although the image contains some noisy patterns that are probably errors from our algorithm, textural details are certainly retrieved, especially around the boundaries of major land features such as Lake Kasumigaura and Mount Tsukuba. Compared with the land cover types (Fig 4), urban and built-up regions tend to show a brightness temperature higher than that of neighboring areas. Low brightness temperatures over water surfaces and forest areas are reasonable given the abundant evapotranspiration and aerodynamic features. Within the forest area, the northeastern part is hotter than the rest, which is probably due to the difference in altitude.
A broadly similar spatial pattern can also be retrieved using Algorithms 1 and 2 (Fig 8). However, careful comparison shows that Algorithm 2 tends to generate slightly noisier patterns and lower contrast than Algorithm 3, and that Algorithm 1 generates a few pixels with a negative brightness temperature (shown as small red points), which also confirms the results in Table 3.

Discussion
We revisited the sensor-driven approach for thermal image super-resolution and investigated its applicability to a complex landscape with urban and suburban regions. The sensor-driven algorithm [26] with our modifications refined the statistical consistency of the retrieved MODIS images (250-m resolution) with the original MODIS images and with the reference ASTER images (Tables 2 and 3). Refinement of the algorithm structure (from Algorithm 1 to 2) improved the accuracy of the super-resolution process (Table 2): in Algorithm 1, the first super-resolution process relies only on bands 1 and 2, which is likely to cause substantial uncertainty in the first super-resolution retrieval. This uncertainty probably propagates to the second super-resolution retrieval, enhancing the uncertainty of the whole super-resolution process. In addition, the somewhat arbitrary hyperparameter w is likely to make the original algorithm too complex for best-tuned results to be obtained. Algorithm 2 addressed these issues, and was further improved by the introduction of a regularization term for the Mahalanobis distance (i.e., Algorithm 3).

The standard deviation of the retrieved MODIS images produced by our algorithm (Algorithm 3) is more consistent with the ASTER images than with the original MODIS image at 1-km resolution. This suggests that contrasting features (i.e., spatial details) missed in the coarser original MODIS image are captured by the super-resolution process. The retrieved thermal images captured the specific features of the different land covers well. In particular, urban areas (Fig 4) tend to show high brightness temperatures, suggesting an urban heat island phenomenon [43]. Statistics for each type of land cover showed that the mean thermal radiance (or brightness temperature) in urban regions is the highest among the categories (Table 4), quantitatively confirming the urban heat island phenomenon. The thermal radiance and super-resolution accuracy were similar between the paddy and crop categories distributed around suburban regions. This is understandable because water in rice paddy fields should be drained in preparation for harvest in this season (September), creating thermal properties similar to those of crop fields before and after harvest.
For further discussion of the thermal structure in urban and suburban areas, the land surface temperature [44], rather than the radiance or brightness temperature, may be more suitable, although it requires additional work to estimate the thermal emissivity precisely over heterogeneous artificial materials [45,46]. The highest accuracy (PSNR and ERGAS) was observed for the urban category, suggesting that this algorithm is suitable for super-resolution in urban landscapes. Poor accuracy was obtained for the water and forest categories. For the water category, the weak correspondence between thermal properties and optical spectra probably makes an accurate super-resolution process difficult. The poor accuracy for the forest category can be largely attributed to the temperature differences at different altitudes around Mount Tsukuba. Adding altitude data (i.e., a digital elevation model) when calculating the Mahalanobis distance and clustering homogeneous pixels may improve the results for such mountainous regions. The improvement in the accuracy indices achieved by our algorithm refinement was statistically confirmed for all land-cover categories (Table 5). The Wilcoxon signed-rank test over the categories (n = 7) indicated that Algorithm 3 showed a smaller ERGAS and a greater PSNR than Algorithm 2 with statistical significance (p < 0.01), and that Algorithm 2 showed a smaller ERGAS and a greater PSNR than Algorithm 1 with statistical significance (p < 0.01). Therefore, for all categories, Algorithm 2 was better than Algorithm 1, and Algorithm 3 was better than both Algorithms 1 and 2.
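As a concrete illustration of the accuracy assessment described above, the sketch below computes PSNR and ERGAS for a super-resolved band and applies the paired Wilcoxon signed-rank test across land-cover categories. The metric definitions follow their standard forms; the per-category ERGAS values here are hypothetical placeholders, not values from Table 5.

```python
import numpy as np
from scipy.stats import wilcoxon

def psnr(reference, estimate, data_range=None):
    """Peak signal-to-noise ratio in dB (higher is better)."""
    mse = np.mean((reference - estimate) ** 2)
    if data_range is None:
        data_range = reference.max() - reference.min()
    return 10.0 * np.log10(data_range ** 2 / mse)

def ergas(reference, estimate, ratio):
    """ERGAS (relative dimensionless global error; lower is better).
    `ratio` is the coarse-to-fine resolution ratio, e.g. 1000 m / 250 m = 4.
    Inputs are (bands, pixels) arrays."""
    rmse = np.sqrt(np.mean((reference - estimate) ** 2, axis=1))
    means = np.mean(reference, axis=1)
    return 100.0 / ratio * np.sqrt(np.mean((rmse / means) ** 2))

# Hypothetical per-category ERGAS values for two algorithm variants
# (n = 7 land-cover categories, as in the Wilcoxon test above).
ergas_alg2 = np.array([2.1, 1.8, 2.9, 3.4, 2.2, 2.0, 2.6])
ergas_alg3 = np.array([1.7, 1.5, 2.5, 3.1, 1.9, 1.8, 2.2])
stat, p = wilcoxon(ergas_alg2, ergas_alg3)
```

With only seven paired samples, the exact Wilcoxon test has limited resolution, so a one-sided alternative may be needed to reach small p-values.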
Artificial coloring and the metal composition of the surface would also influence the estimation of radiance or brightness temperature using our super-resolution algorithm, especially when creating the typical spectral pattern, which relies on the spectral link between the optical and thermal domains. In practice, the impact of uncertainty in the typical spectral pattern on the super-resolution is limited, because most of the thermal radiance g_high,k' is retrieved from neighboring homogeneous pixels rather than from the typical spectral pattern extracted by clustering (Fig 9G and 9H). Therefore, as long as a spectrally similar target with 500-m spatial homogeneity in the second super-resolution image can be found around the pixel of interest, the resulting image is likely to be reliable. This condition may sometimes be too strict for heterogeneous landscapes including urban and suburban areas, and thus the retrieval carries uncertainty in such regions. In fact, the Mahalanobis distance for the retrieval is small (i.e., high reliability in the retrieval) over the relatively homogeneous forest region and the water body away from the lake shore (Fig 9D and 9F), but not in the urban and suburban areas or at the boundaries between land covers (e.g., the shore of Kasumigaura Lake) with less homogeneity (Fig 9A-9C). However, even in such cases, the offset correction with regularization at least ensures statistical consistency in the final retrieved value g'_high,k'. Super-resolution with higher-resolution data (e.g., ASTER) may further mitigate the uncertainty arising from such spatial heterogeneity.
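The Mahalanobis distance used above to rate retrieval reliability can be sketched as follows. The exact regularization used in our algorithm is not reproduced here; this illustration uses a simple Tikhonov-style shrinkage of the covariance (cov + λI) so that the inverse stays stable for small or nearly collinear sets of homogeneous pixels. The cluster data and the value of λ are hypothetical.

```python
import numpy as np

def regularized_mahalanobis(x, samples, lam=1e-3):
    """Mahalanobis distance from spectrum `x` to the distribution of
    `samples` (n_pixels x n_bands), with the covariance regularized as
    cov + lam * I before inversion. `lam` is a hypothetical
    hyperparameter, not a value from the paper."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    cov_reg = cov + lam * np.eye(cov.shape[0])
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.solve(cov_reg, diff)))

# Example: distance of candidate pixel spectra to a cluster of
# spectrally homogeneous neighbors (2 optical bands + 1 thermal band).
rng = np.random.default_rng(0)
cluster = rng.normal(loc=[0.3, 0.4, 9.5], scale=0.02, size=(50, 3))
d_near = regularized_mahalanobis(np.array([0.31, 0.41, 9.52]), cluster)
d_far = regularized_mahalanobis(np.array([0.10, 0.60, 11.0]), cluster)
```

A small distance (as for `d_near`) corresponds to the high-reliability retrievals over homogeneous forest and open water; a large distance (as for `d_far`) flags pixels near land-cover boundaries where the offset correction matters most.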
There is still room to improve our algorithm in terms of the visual quality of the retrieved image. Textural details, such as narrow river channels and mixed landscapes of crop, forest, urban, and suburban areas (Fig 4), are hidden behind the noisy patterns generated by the algorithm (Fig 7). Owing to the inherent structure of the twofold super-resolution retrieval, the noise generated in the first super-resolution image inevitably affects the second super-resolution image. In fact, a distributed spatial pattern of large Mahalanobis distances (i.e., less reliable retrieval) is observed in the distance map for the second super-resolution image (Fig 9E), which can be attributed to the noise generated by the first super-resolution retrieval. Data-driven approaches, including traditional pan-sharpening [18,19], kernel-driven methods [20-22], and machine learning [23-25], may have an advantage from the viewpoint of visual quality. Therefore, comparison of the sensor-driven approach with such data-driven approaches and/or their integrated use will be important future work. In particular, the offset correction, which is an important part of our algorithm, may be added to other data-driven approaches to improve their statistical consistency while keeping each algorithm straightforward.

Table 4. Statistics for super-resolved thermal data and accuracy indices for Algorithm 3 for each land-cover type. The land-cover type was determined using previously reported data [34]; unclassified and snow/ice categories were excluded. Note that the four forest categories (see Fig 4) were merged into a single category. Radiance (W/m²/sr/μm) with standard deviation, T_b (brightness temperature; K) with standard deviation, PSNR, ERGAS, and N (number of sample pixels) are listed.
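The offset correction that could be grafted onto data-driven methods can be sketched as follows. The sketch assumes a uniform (box) point spread function for aggregation, which is a simplification of a real sensor PSF, and omits the Mahalanobis-distance regularization discussed earlier; the function names and toy values are hypothetical.

```python
import numpy as np

def aggregate(fine, ratio):
    """Simulate the coarse sensor with a uniform (box) PSF: average each
    ratio x ratio block of the fine image. A real PSF would extend
    beyond one coarse pixel; a box filter keeps the sketch simple."""
    h, w = fine.shape
    return fine.reshape(h // ratio, ratio, w // ratio, ratio).mean(axis=(1, 3))

def offset_correction(fine_estimate, coarse_original, ratio):
    """Add a per-coarse-pixel offset so that re-aggregating the
    super-resolved image exactly reproduces the original coarse
    radiances (radiometric consistency)."""
    offset = coarse_original - aggregate(fine_estimate, ratio)
    return fine_estimate + np.kron(offset, np.ones((ratio, ratio)))

# Toy example: a 4x4 "super-resolved" field corrected against a 2x2
# original coarse image (ratio = 2; for MODIS 1 km -> 250 m it is 4).
fine = np.array([[9.0, 9.2, 8.0, 8.1],
                 [9.1, 9.3, 8.2, 8.3],
                 [7.5, 7.6, 9.9, 9.8],
                 [7.7, 7.8, 9.7, 9.6]])
coarse = np.array([[9.0, 8.0],
                   [7.8, 9.9]])
corrected = offset_correction(fine, coarse, 2)
```

By construction, averaging `corrected` back to the coarse grid reproduces `coarse` exactly, which is the statistical consistency property the offset correction guarantees regardless of how the fine-scale estimate was produced.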

Conclusion
To improve the spatial resolution of thermal satellite images, we revisited a sensor-driven super-resolution algorithm and investigated its applicability to a complex landscape with urban and suburban regions. The algorithm explicitly accounts for the sensor blurring effect using a point spread function and ensures radiometric consistency with the original thermal image during high-resolution thermal image retrieval, neither of which is generally taken into consideration in existing approaches such as machine learning and kernel-driven methods. We also modified the original sensor-driven algorithm to enhance the statistical consistency of the super-resolution results, by making the algorithm structure more straightforward and by introducing a regularization term in the calculation of the Mahalanobis distance.
The original sensor-driven algorithm (Algorithm 1) and two refined versions (Algorithms 2 and 3) were applied to a cloud-free MODIS scene to enhance the thermal (1 km) resolution to the optical (250 m) resolution, and were validated against the corresponding high-resolution thermal image (ASTER). The validation result showed that the refined sensor-driven algorithm can downscale the MODIS image to 250-m resolution, while maintaining a high statistical consistency with the original MODIS and ASTER images. Part of our algorithm, such as radiometric offset correction based on the Mahalanobis distance, may be integrated with other existing approaches in the future.