Multi-scale Xception based depthwise separable convolution for single image super-resolution

The main target of single image super-resolution (SISR) is to recover a high-quality, high-resolution image from a degraded, low-resolution version. Recently, deep learning-based approaches have achieved significant performance on image super-resolution tasks. However, existing approaches fail to fully exploit the feature information of low-resolution images and do not recover hierarchical features for the final reconstruction. In this work, we propose a new architecture inspired by the ResNet and Xception networks, which enables a significant drop in the number of network parameters and improves the processing speed of obtaining the SR results. We compared our proposed algorithm with existing state-of-the-art algorithms and confirmed its ability to construct HR images with fine, rich, and sharp texture details as well as edges. The experimental results validate that our proposed approach has robust performance compared to other popular techniques in terms of accuracy, speed, and visual quality.


Introduction
Single image super-resolution (SISR) aims to recover a high-resolution (HR) output image from a degraded, low-resolution (LR) input image generated by a low-cost imaging system under limited environmental conditions. Recently, SISR has become a very active research area in image processing and computer vision, and it is extensively applied in tasks such as object detection [1,2], image segmentation [3,4] and image classification [5,6].
Better performance and higher accuracy of SISR are desired in many imaging areas, especially medical imaging [7][8][9], face detection and recognition [10,11], high-definition television (HDTV) [12], video surveillance [13], satellite imaging [14] and autonomous driving technology [15,16], where rich detail information is greatly needed. However, image SR is a highly challenging ill-posed inverse problem, and a number of SISR approaches have recently been proposed to address it. The main contributions of this work are as follows:
• The Rectified Linear Unit (ReLU) was replaced with the Parametric Rectified Linear Unit (PReLU) to activate features that would otherwise be "dead" due to zero gradients.
• We introduced a new Xception block, which can detect different image feature information for rebuilding the HR image.
The remainder of this paper is structured as follows. Section 2 presents related work on image SR approaches. Sections 3 and 4 explain our proposed method and its experimental results. Section 5 concludes the paper.

Related work
The target of SISR is to construct a visually pleasing HR output image. The first concrete deep learning-based approach to the SISR problem was suggested by Dong et al. [40], known as the Super-Resolution Convolutional Neural Network (SRCNN) [40], and presented significant improvements over all previous SR methods. The SRCNN [40] model used three convolution layers to predict the HR image. Wang et al. [41] introduced a sparse-prior deep convolutional neural network for image SR, named the Sparse Coding Network (SCN) [41]. The performance of SCN [41] is better than that of SRCNN [40]. The major drawback of SCN [41] is its high computational complexity, which hinders its application in real-time processing scenarios.
Dong et al. [42] proposed an improved and faster version of the SRCNN [40] architecture to accelerate super-resolution image reconstruction, known as the Fast Super-Resolution Convolutional Neural Network (FSRCNN) [42]. FSRCNN [42] has a modest network architecture that depends on four CNN layers and one deconvolution layer for upsampling, and it uses the original input LR images without interpolation. FSRCNN [42] has lower computational complexity and better performance than SRCNN [40], but has limited network capacity.
A very deep SR network (VDSR) [32] was proposed by Kim et al. [32], inspired by the Visual Geometry Group network (VGG-net) implemented on ImageNet for classification [5]. The VDSR [32] network reported a significant performance improvement over the SRCNN [40] network using 20 trainable CNN layers. To ease the training complexity of a deeper model, they used global residual learning with a fast convergence rate. However, the VDSR [32] architecture does not use the actual pixel values but an interpolated, upscaled version of the image, which leads to more memory consumption and heavy computational cost. Kim et al. [33] proposed a Deeply Recursive Convolutional Network for image super-resolution (DRCN) [33], which reuses the same convolution layers multiple times. The key advantage of DRCN [33] is that the number of training parameters stays fixed regardless of the number of recursions; its main deficiency is a slow training process. The authors also used skip connections in a recursive manner to optimize model performance. Mao et al. [43] extended the concept of residual-type architectures and proposed the Residual Encoder-Decoder Network (RED) [43]. The RED [43] model used residual learning with symmetric convolution operations, is trained with 30 layers, and achieves the best performance among these methods. Such studies replicate the concept of "the deeper the better".
Lai et al. [44] proposed a different network architecture for image SR, known as the deep Laplacian Pyramid Super-Resolution Network (LapSRN) [44], to generate the HR image. The LapSRN [44] architecture is built on several pyramid levels, each followed by a deconvolution layer for upsampling, but the scaling factor is restricted to fixed integers, which limits the flexibility of the model. Zhang et al. [45] suggested denoising convolutional neural networks (DnCNNs) to accelerate the development of very deep network architectures. DnCNN [45] follows the same architecture as SRCNN [40] and stacks CNN layers with batch normalization (BN) followed by the ReLU activation function. Although the model provides favorable results, it is computationally expensive due to the batch normalization layers. Zhao et al. [46] proposed a more flexible scaling approach to super-resolve the input LR image, named the gradual upsampling network (GUN) [46]. For upsampling, the GUN [46] architecture used the bicubic interpolation technique.
Tai et al. [47] introduced the deep recursive residual network (DRRN) [47] with 52 CNN layers, providing a stable training process for a deeper network with a parallel architecture. Ledig et al. [34] employed deep residual connections with 16 blocks using skip connections to recover the upscaled version of the image. Lim et al. [48] developed a deep SR architecture that increases training efficiency by eliminating the BN layers; their method won the NTIRE2017 SR challenge [49]. Meanwhile, Tai et al. [50] suggested the deepest model, known as the persistent memory network for image restoration (MemNet) [50], in which multiple memory blocks are stacked to obtain persistent memory. Yamanaka et al. [51] presented a combined architecture of skip-connection layers and parallelized CNN layers for SISR, using mainly two networks: the first extracts features at different levels and the second reconstructs the image. This model is shallower than VDSR [32].
Han et al. [52] proposed the Dual-State Recurrent Network (DSRN) [52], which exchanges information between the LR and HR states: at each step, the signal information is updated and then transmitted to the HR state. Li et al. [53] used an adaptive feature detection process to fuse features at different scales, named the multi-scale residual network [53]. This approach uses complete hierarchical feature information to reconstruct an accurate super-resolved image. Ahn et al. [54] proposed scale-specific upsampling modules with multiple shortcut connections to learn residuals in the LR feature space and to handle multi-scale information along appropriate pathways. Zhang et al. [55] concatenated the low-resolution image with its degradation map in an architecture named the super-resolution network for multiple degradations (SRMD) [55].
Wang et al. [56] introduced a dilated CNN to enlarge the receptive field without increasing the kernel size; the relative size of the receptive field grows even in shallow architectures. The dilated convolutional network for SR (DCNSR) [56] uses 12 layers to extract contextual information efficiently. In [57], the authors proposed the End-to-End Image SR via Deep and Shallow (EEDS) [57] CNN architecture, replacing bicubic interpolation upsampling with a transposed-convolution upsampling layer; the HR image is obtained from the deep and shallow branches simultaneously. Yang et al. [58] suggested a deep recurrent fusion network (DRFN) [58] for image super-resolution, which uses transposed convolution layers for large scale factors. Su et al. [59] proposed a novel structure consisting of several sub-networks that reconstruct the HR image progressively; in each sub-network, the LR feature map is used as input, and the transposed-convolution output is fused with residuals to obtain a finer result. Wang et al. [60] solved the single image SR problem using the Heaviside function with iterative refinement, applying binary classification of images to reconstruct the HR image.
Hung et al. [61] proposed a super-sampling network (SSNet) [61] architecture that uses depthwise separable convolution for image SR; in this architecture, the number of parameters as well as the number of operations can be significantly reduced by the depthwise separable convolution technique. Barzegar et al. [62] introduced a small architecture, DetailNet, to avoid the training problems of deeper models: the LR image information can first be enhanced by any approach and then passed through the main architecture to boost perceptual quality. Hsu et al. [63], inspired by the capsule neural network, extracted richer potential feature information for image SR; the authors designed two networks, the Capsule Image Restoration Neural Network and the Capsule Attention and Reconstruction Neural Network (CARNN) [63]. The CARNN [63] network generates super-resolution feature information efficiently. Liu et al. [64] proposed a hierarchical convolutional neural network (HCNN) [64] architecture for SR that learns feature information at different stages through a three-step hierarchical process: an edge extraction branch, an edge reinforcement branch, and an SR image reconstruction branch. Muhammad et al. [65] proposed multi-scale inception-based super-resolution using a deep learning approach (MSISRD) [65]; the authors used asymmetric convolution operations to enhance the computational efficiency of the model and an inception block to reconstruct multi-scale feature information.
Tian et al. [66] addressed training instability and proposed a new architecture known as the coarse-to-fine CNN for SISR (CFSRCNN). The network consists of feature extraction, enhancement, construction, and refinement blocks to learn a robust super-resolution model. The stacked feature extraction blocks learn short- as well as long-path features, and the learned features are finally fused, extending the effect of the shallow layers into the deeper network to enhance feature representation.
Qiu et al. [67] proposed the multiple improved residual network (MIRN) for image SR, in which a deep residual network with skip connections at different levels resolves the lack of correlation between adjacent CNN layers; stochastic gradient descent (SGD) is used to train the MIRN architecture. Lan et al. [68] proposed a new dense, lightweight architecture known as the fast and lightweight network for SISR, which addresses feature extraction and feature correlation learning.
Deep CNN-based image SR architectures use an excessive number of CNN layers and parameters, and usually incur high computational cost and memory consumption when training an SR model. To resolve these problems, Tian et al. [69] proposed the lightweight enhanced super-resolution CNN (LESRCNN), which uses three successive blocks: an information extraction and enhancement block, and a reconstruction block with an information refinement block.

Proposed method
In this section, we discuss our proposed network architecture for image SR, based on ResNet and Xception blocks, in detail. Like existing SISR methods, our proposed method is divided into five stages, namely feature extraction, shrinking, upsampling, expanding, and multi-scale reconstruction, as shown in Fig 1.

Feature extraction
This part is similar to previous methods but differs in its input. The majority of previous deep convolutional SISR approaches extract feature information from a bicubic-interpolated, upsampled version of the LR image. It is important to note that bicubic interpolation damages vital information of the LR image and introduces new noise into the model [57,70]. In contrast, our proposed model extracts feature information directly from the LR image without any interpolation.
Our initial feature extraction stage consists of one convolution layer and two ResNet blocks with skip connections, followed by the Parametric Rectified Linear Unit (PReLU) [71] activation function. This stage extracts the low-, middle-, and high-level feature information simultaneously. Inspired by VDSR [32], we have used one convolution layer with 64 filters of size 3 × 3, accompanied by the PReLU [71]. Mathematically, the convolution layer can be expressed as:

F_l = W_l ∗ F_{l−1} + b_l,

where W_l are the weights of the filters and b_l are the biases of the l-th layer, respectively. The output feature map is denoted by F_l, and "∗" represents the convolution operation. W_l comprises n_l × f_l × f_l parameters, where f_l indicates the filter size and n_l the number of filters. The CNN layer and the ResNet blocks use the same 3 × 3 × c kernel size, which generates c feature maps, where c = 64 channels.
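As a minimal illustration of this layer, the convolution F_l = W_l ∗ F_{l−1} + b_l can be sketched in NumPy for a single filter (n_l = 1); the averaging filter and the input values below are illustrative only, not the learned weights of the model:

```python
import numpy as np

def conv2d(F_prev, W, b):
    """Valid 2D convolution of one feature map with one filter: F_l = W * F_{l-1} + b."""
    f = W.shape[0]
    H, Wd = F_prev.shape
    out = np.zeros((H - f + 1, Wd - f + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Sum of the elementwise product over the f x f window, plus the bias.
            out[y, x] = np.sum(F_prev[y:y + f, x:x + f] * W) + b
    return out

F_prev = np.arange(25, dtype=float).reshape(5, 5)   # toy 5 x 5 input feature map
W = np.ones((3, 3)) / 9.0                           # illustrative 3 x 3 averaging filter
F_l = conv2d(F_prev, W, b=0.0)
print(F_l.shape)                                    # (3, 3)
```

With a 3 × 3 filter and no padding, the 5 × 5 map shrinks to 3 × 3; in the actual network, 64 such filters would produce a 64-channel feature map.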

PReLU.
Earlier approaches, such as SRCNN [40] and VDSR [32], used convolution layers or blocks followed by the Rectified Linear Unit (ReLU). These models give a fair response, but the results are still not satisfactory, because in most cases ReLU has a zero gradient for negative inputs. In the proposed model, we instead use the Parametric Rectified Linear Unit (PReLU) [71], which not only resolves this problem but also converges relatively faster during training. Mathematically, the PReLU [71] activation function can be expressed as:

f(x_i) = max(0, x_i) + a_i · min(0, x_i),

where x_i is the input to the activation function in the i-th layer and a_i is the learnable coefficient for the negative part of PReLU; with a_i fixed at zero, PReLU reduces to ReLU. The main purpose of PReLU is to avoid the "dead features" produced by zero gradients in the ReLU activation function. The resultant output feature maps using the PReLU activation function can be written as:

F_l = f(W_l ∗ F_{l−1} + b_l),

where F_l denotes the resultant output feature map, and W_l and b_l are the weights and biases of the l-th layer.
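The PReLU definition above can be sketched in a few lines of NumPy; the slope a = 0.25 below is only an illustrative value (in the network, a_i is learned per layer during training):

```python
import numpy as np

def prelu(x, a=0.25):
    """PReLU: f(x) = x for x > 0, a * x otherwise.

    a = 0 recovers ReLU; a learnable gives PReLU, a fixed gives Leaky ReLU.
    """
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(prelu(x))   # values: [-0.5, -0.125, 0.0, 0.5, 2.0]
```

Unlike ReLU, the negative inputs keep a small nonzero gradient (a), so those units continue to receive learning signal instead of going "dead".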

Feature extraction blocks.
Stacking layers side by side increases the network depth but reduces the transmission of information to the final layers [72]. As a result, the vanishing gradient problem arises and the computational cost of the model increases. He et al. [73] proposed ResNet blocks to resolve these problems. ResNet blocks are now extensively used in deep learning-based SISR methods to reconstruct the HR image. Furthermore, the deeper ResNet architecture has superior performance

PLOS ONE
and is effectively used in the field of image SR [34,48]. In our proposed method, we use different residual skip connections, which speed up training convergence and reduce the complexity of the model. In Fig 2, we show comparison diagrams of the original residual skip connection [74], SRResNet [34], and our proposed ResNet block.
The architecture of the ResNet block, as expressed in Fig 2(a), uses a direct path and a skip connection to transmit feature information, and the summed result is followed by the ReLU activation function. The SRResNet [34] block, as indicated in Fig 2(b), removes the ReLU activation function after the addition and provides a simple, clear path from one block to another. Fig 2(c) shows our proposed ResNet block, which eliminates the Batch Normalization (BN) [74] layers to improve GPU memory efficiency and enhance the computational efficiency of the model. Furthermore, we replace the regular convolution operation with depthwise separable convolution followed by pointwise convolution, and the ReLU activation function with PReLU. PReLU is used to avoid the vanishing gradient problem, reduce training complexity, and enhance the efficiency of the block. For middle- and high-level feature extraction, we apply two ResNet blocks, each consisting of two 3 × 3 depthwise separable convolution kernels with 64 filters followed by PReLU nonlinearity.
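The parameter savings from swapping regular convolutions for depthwise separable ones inside a block can be checked with simple weight-count arithmetic (biases omitted for brevity; the 64-channel, 3 × 3 configuration matches the block described above):

```python
def conv_params(k, c_in, c_out):
    """Weights in a regular k x k convolution layer (biases omitted)."""
    return k * k * c_in * c_out

def dsconv_params(k, c_in, c_out):
    """Depthwise (one k x k filter per input channel) + 1 x 1 pointwise weights."""
    return k * k * c_in + c_in * c_out

# One residual block = two 3 x 3 convolution layers on 64-channel feature maps.
regular  = 2 * conv_params(3, 64, 64)     # 73,728 weights
proposed = 2 * dsconv_params(3, 64, 64)   # 9,344 weights
print(regular, proposed, 1 - proposed / regular)   # roughly 87% fewer weights
```

The same spatial extent and channel width are preserved, so the block's receptive field is unchanged while its weight count drops by almost an order of magnitude.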

Shrinking layer
If too many features are fed directly into the transposed convolution layer, both the computational cost and the size of the model increase. We therefore employ one CNN layer as a shrinking layer before the deconvolution layer. This arrangement is also found in recent convolutional neural network architectures for computer vision; the methods proposed in [57,65,75] use a shrinking layer to increase the computational efficiency of the model.
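The benefit of shrinking before the deconvolution can be seen from weight counts alone. The sketch below assumes, for illustration only, a 16 × 16 deconvolution kernel on 64-channel features and a hypothetical shrunk width of 16 channels (the paper does not state the exact shrunk width):

```python
def conv_params(k, c_in, c_out):
    """Weights in a k x k (de)convolution layer (biases omitted)."""
    return k * k * c_in * c_out

# Deconvolution applied directly to the 64-channel feature maps ...
direct = conv_params(16, 64, 64)                 # 1,048,576 weights
# ... versus shrinking to 16 channels with a 1 x 1 layer, then deconvolving.
shrink_c = 16                                    # hypothetical shrunk channel width
shrunk = conv_params(1, 64, shrink_c) + conv_params(16, shrink_c, 64)
print(direct, shrunk)                            # shrinking cuts the cost by roughly 75%
```

Because the deconvolution kernel is large (16 × 16), its cost scales with the product of the channel counts, so even a modest channel reduction before it pays off substantially.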

Deconvolution layer
Researchers have suggested in [40,57,76] that upscaling the LR image before the initial layer increases the computational cost and damages critical information, because the processing speed directly depends on the resolution of the image. Furthermore, upscaling before the initial layer does not provide additional information; instead, it introduces jagged ringing artifacts in the SR image. To handle these problems, we generate the high-resolution image directly from the actual low-resolution feature domain. For this purpose, we apply a deconvolution layer as an upscaling operation before the Xception block. The deconvolution kernel size is 16 × 16, with a stride equal to the enlargement factor.
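The output size of a transposed convolution follows the standard relation out = (in − 1) · stride − 2 · pad + kernel. A sketch with the 16 × 16 kernel mentioned above, a stride of 8 (the enlargement factor), and an assumed padding of 4 (the paper does not state the padding) shows how an exact 8× enlargement falls out:

```python
def deconv_out(n_in, kernel=16, stride=8, pad=4):
    """Output length of a 1-D transposed convolution (per spatial axis)."""
    return (n_in - 1) * stride - 2 * pad + kernel

print(deconv_out(32))   # 256, i.e. an exact 8x enlargement of a 32-pixel axis
```

With kernel = 2 · stride and pad = stride / 2, the formula reduces to out = stride · in, which is why the kernel size is tied to the enlargement factor.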

Expanding layer
The expanding layer performs the inverse operation of the shrinking layer so that the HR image can be produced more accurately. If the HR image were reconstructed directly from the shrunken LR features, the final reconstruction quality would be poor. Therefore, after the deconvolution layer, we apply the expanding layer to recover the original feature information smoothly.

Depthwise separable convolution.
Depthwise separable convolution was originally proposed by Sifre [77] and applied to image classification. It factorizes a regular convolution into a depthwise convolution followed by a pointwise convolution: a single filter is applied per input channel, and the channel outputs are then combined linearly. The factorized convolution thus replaces one layer with two, one for spatial filtering and one for combining channels. Consequently, depthwise separable convolution substantially reduces both the number of parameters and the size of the model. A regular convolution takes an input feature map I of height h, width w, and c_in input channels, and applies a kernel of size K × K × c_in × c_out to the h × w × c_in input, where c_out is the number of output channels. Depthwise separable convolution splits this into two operations. Mathematically, the depthwise convolution can be written as:

G(x, y, n) = Σ_{i,j} K(i, j, n) · I(x + i, y + j, n),    (5)

where K represents the depthwise kernel of size K × K × c_in; the n-th filter in the kernel K is applied to the n-th channel of the input feature map I to produce the output feature map G. To combine the resulting features, we apply the pointwise convolution:

O(x, y, l) = Σ_j G(x, y, j) · P(j, l),    (6)

where the pointwise kernel P has size 1 × 1 × c_in × c_out.
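Equations (5) and (6) translate directly into NumPy; the sketch below uses random weights and small toy sizes (8 × 8 input, c_in = 4, c_out = 6) purely to make the two-stage factorization concrete:

```python
import numpy as np

def depthwise(I, K):
    """Eq. (5): one K x K filter per input channel, no cross-channel mixing."""
    h, w, c_in = I.shape
    k = K.shape[0]
    G = np.zeros((h - k + 1, w - k + 1, c_in))
    for n in range(c_in):
        for y in range(G.shape[0]):
            for x in range(G.shape[1]):
                G[y, x, n] = np.sum(I[y:y + k, x:x + k, n] * K[:, :, n])
    return G

def pointwise(G, P):
    """Eq. (6): a 1 x 1 convolution is a per-pixel matrix product over channels."""
    return G @ P

I = np.random.rand(8, 8, 4)   # h x w x c_in input feature map
K = np.random.rand(3, 3, 4)   # one 3 x 3 filter per input channel
P = np.random.rand(4, 6)      # c_in x c_out pointwise mixing weights
O = pointwise(depthwise(I, K), P)
print(O.shape)                # (6, 6, 6)
```

The depthwise stage filters each channel spatially in isolation; only the cheap 1 × 1 pointwise stage mixes channels, which is exactly where the parameter savings come from.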

Xception block.
In the final phase, we employ a multi-scale Xception block, i.e., a multi-scale "Extreme version of Inception" block adopted from GoogLeNet [78], with the modified depthwise separable convolution that outperforms Inception-v3 [79]. The multi-scale Xception block is used to choose appropriate kernel sizes, since kernel size plays a pivotal role in model design, training, and multi-scale reconstruction. A larger kernel is more suitable when feature information is distributed globally, whereas a smaller kernel is better when feature information is distributed locally. The Xception architecture exploits this idea and applies depthwise separable convolutions with kernels of various sizes. Fig 3(a) shows a single-scale, plain architecture of regular convolutions, in which several convolution layers are stacked along a single straight path. Such architectures are used by well-known image super-resolution methods like SRCNN [40] and FSRCNN [42]; they are easy to implement, but deeper variants consume more memory as the network depth grows. Fig 3(b) uses an inception block of regular convolutions to extract multi-scale feature information efficiently. Fig 3(c) shows our proposed block of multi-scale depthwise separable convolutions. The proposed Xception block consists of depthwise separable convolutions with different kernel sizes, namely 3 × 3, 5 × 5, and 7 × 7, each followed by a pointwise convolution with the PReLU activation function, to reconstruct the SR image.
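The multi-branch structure of the block can be sketched as follows: each branch applies a 'same'-padded depthwise convolution at one scale, then a pointwise convolution and PReLU, and the branch outputs are concatenated along the channel axis. The random weights, toy input size, and branch width of 16 channels are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def ds_branch(I, k, c_out, a=0.25):
    """One branch: 'same'-padded k x k depthwise conv, 1 x 1 pointwise conv, PReLU."""
    h, w, c_in = I.shape
    p = k // 2
    Ip = np.pad(I, ((p, p), (p, p), (0, 0)))    # zero-pad so output keeps h x w
    K = np.random.rand(k, k, c_in)              # illustrative random depthwise weights
    G = np.zeros_like(I)
    for n in range(c_in):
        for y in range(h):
            for x in range(w):
                G[y, x, n] = np.sum(Ip[y:y + k, x:x + k, n] * K[:, :, n])
    P = np.random.rand(c_in, c_out)             # illustrative pointwise weights
    Z = G @ P                                   # pointwise channel mixing
    return np.where(Z > 0, Z, a * Z)            # PReLU nonlinearity

I = np.random.rand(8, 8, 4)
branches = [ds_branch(I, k, c_out=16) for k in (3, 5, 7)]   # the three scales
F = np.concatenate(branches, axis=-1)                       # multi-scale fusion
print(F.shape)                                              # (8, 8, 48)
```

Because every branch preserves the spatial size, concatenation is well defined, and the fused output carries features captured at all three receptive-field scales.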

Experimental results
In this section, we first discuss the selection of training and testing datasets and the hyper-parameters. The training and testing datasets were downloaded from the Kaggle website [80]. We then evaluate the quantitative and qualitative performance in terms of PSNR/SSIM [81] and perceptual visual quality on five publicly available test datasets. Finally, we compare the computational cost and processing speed of our proposed model in terms of PSNR versus running time and network size (number of parameters, in K).

Training datasets
Various image datasets of different sizes are available for training single image super-resolution models. The dataset of Yang et al. [23] and the Berkeley Segmentation Dataset (BSD300) [82] are commonly used, since well-known SR methods such as VDSR [32], DRCN [33] and LapSRN [44] train on them. To enlarge the training dataset, data augmentation was applied in the form of rotation and flipping. All experimental evaluations were performed on the original images; for data manipulation, we used Python 3.7.9 and the deep learning library Keras 2.1.5 with the TensorFlow back-end, as well as PyTorch 1.6.0. Various loss functions are available to evaluate model performance; deep learning-based CNN SR architectures mostly use the mean squared error (MSE), so we adopt the same loss in our proposed method. Mathematically, the loss function is calculated as:

L(θ) = (1/m) Σ_{i=1}^{m} ||F(Y_i; θ) − X_i||²,

where F(Y_i; θ) is the recovered output image, X_i is the high-quality HR image, Y_i is the corresponding low-quality image, and m is the mini-batch size during training.
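The MSE loss above is a one-liner in NumPy; the all-ones targets and all-zeros predictions below are dummy data used only to make the per-batch normalization visible:

```python
import numpy as np

def mse_loss(preds, targets):
    """L(theta) = (1/m) * sum_i || F(Y_i; theta) - X_i ||^2 over a mini-batch."""
    m = preds.shape[0]
    return np.sum((preds - targets) ** 2) / m

X = np.ones((16, 32, 32))    # mini-batch of m = 16 HR patches (dummy targets)
F = np.zeros((16, 32, 32))   # hypothetical network outputs (dummy predictions)
print(mse_loss(F, X))        # 1024.0, since each 32 x 32 patch contributes 32 * 32
```

Note that this formulation averages over the batch but sums over pixels; dividing additionally by the patch size would give the per-pixel MSE, which only rescales the gradients.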
In the training phase, we used the adaptive moment estimation (Adam) optimizer [83] with an initial learning rate of 0.0004 and a mini-batch size of 16. Training takes 200 epochs for the model to converge properly. We trained our model on an NVIDIA GeForce RTX2070 GPU with a 2.6 GHz Ci7-9750H CPU and 16 GB RAM under the Windows 10 operating system.

Testing datasets
We assessed the performance of the proposed network architecture on five standard datasets. The Set5 [84] dataset comprises five images of different sizes, such as 228 × 228 and 512 × 512 pixels. Set14 [85] consists of fourteen images of different sizes. The BSD100 [82] test dataset contains 100 different natural-scene images. Urban100 [86] is a challenging test dataset containing detailed information across different frequency bands.
The Manga109 [87] test dataset consists of comic-style images with fine structures.

Implementation details
Our proposed approach was trained and tested under the Windows 10 operating system with an NVIDIA GeForce RTX2070 GPU and 16 GB RAM. We trained our model at scale enhancement factors of 2×, 4×, and 8× in the Keras 2.1.5, PyTorch 1.6.0 and MATLAB 2018a frameworks.

Comparison with other state-of-the-art-methods
We compare the performance of our MXDSIR SR method against the ground-truth HR images, the baseline method (bicubic interpolation), and twelve other state-of-the-art methods, namely A+ [88], RFL [89], SelfExSR [86], SCN [41], SRCNN [40], FSRCNN [42], VDSR [32], DRCN [33], LapSRN [44], DRRN [47], MemNet [50], and MSISRD [65], using both objective PSNR/SSIM [81] and subjective measures. The quantitative evaluation on the five benchmark datasets is summarized in Table 1. We can observe from Table 1 that our model achieves the best quantitative results in terms of PSNR/SSIM at enlargement factors 2× and 8×. The average PSNR improvement at scale factor 2× ranges from 0.13 dB to 4.12 dB. We also used SSIM as a second quality metric to evaluate the performance of our proposed model; the average SSIM improvement at scale factor 2× ranges from 0.001 to 0.05. At enlargement factor 4×, our model achieves the second-best performance compared to the other existing methods; DRRN [47] and MSISRD [65] are the most comparable, but these models incur higher computational complexity as they have more parameters. Our method also improves on the challenging enlargement factor 8×. From the reconstructed images, it is clearly observed that the baseline bicubic method cannot recover any extra detail but introduces new noise into the image and produces blurrier results, especially at enlargement factors 4× and 8×. Deep learning-based super-resolution approaches such as SRCNN [40], FSRCNN [42] and VDSR [32] can, in some cases, produce fair reconstruction details from the original LR input image, but still yield blurry contours because their models are designed in a linear fashion (layers stacked side by side).
In the case of LapSRN [44] and the family of deeper models, the results are fair but miss some edges and lines, because these deeper models rely only on a single-scale kernel. Compared with existing deeper models for image SR, our model achieves noticeable improvement in perceptual quality, thanks to the multi-scale kernels used in the Xception block. The improvement is especially noticeable in Fig 6: the "080" image from Urban100 contains an excessive amount of artifacts, yet our method produces sharper boundaries and richer textures with fewer artifacts. Similar behavior is also observed in the images of Figs 7 and 8, respectively.

In summary, our proposed method achieves better quality, as measured by PSNR, SSIM, and visual image comparison, than the other methods. As shown in the following sections, our proposed architecture also provides a favorable trade-off between computational cost and visual quality.

Performance comparison in terms of the kernel size
The size as well as the type of the convolution kernel plays a key role in the model size and computational cost. In Fig 9, we compare two different convolution kernels, a regular convolution kernel and a depthwise separable convolution kernel, with the same number of feature maps (64). Our proposed depthwise separable convolution kernel is more computationally efficient than the regular convolution kernel.

Comparison in terms of the number of the model parameters
We present the complexity of the model in terms of the number of parameters versus PSNR [81], as shown in Fig 10. By using depthwise separable convolution layers, our proposed model has fewer parameters than other publicly available methods. Our MXDSIR method has about 66% fewer parameters than VDSR [32], 87%

Quantitative comparison in terms of run time versus PSNR
In this part, as shown in Fig 11, we evaluate our method in terms of running (execution) time versus PSNR [81]. For the execution-time measurements, we used the publicly available code provided by the authors to evaluate the state-of-the-art methods on a 2.6 GHz Ci7-9750H CPU with 16 GB RAM. The comparison of execution time and performance on the Set5 [84] dataset for 8× SR reveals that our method is 0.16 dB higher in PSNR [81] than LapSRN [44] and approximately 10 times faster.

Conclusion
In this paper, we have presented a fast and computationally efficient Xception-based residual CNN architecture for image SR, which extracts feature information both locally and globally from the input LR image to generate the HR output image. The proposed network uses two ResNet blocks and three Xception blocks, adapted from ResNet and GoogLeNet, to recover diverse features during the extraction and reconstruction stages. The proposed technique achieves fast convergence and low computational cost by replacing the interpolation technique with a learned transposed convolution layer, and the regular convolution operation with depthwise separable convolution. Furthermore, our network architecture is relatively simple and well suited to image and computer vision tasks. Extensive experiments on different image datasets show that it not only delivers satisfactory quantitative SR performance but also offers favorable complexity and visually pleasing quality compared to existing state-of-the-art SR methods.