Image Quality Assessment Based on Inter-Patch and Intra-Patch Similarity

In this paper, we propose a full-reference (FR) image quality assessment (IQA) scheme, which evaluates image fidelity from two aspects: the inter-patch similarity and the intra-patch similarity. The scheme is performed in a patch-wise fashion so that a quality map can be obtained. On one hand, we investigate the disparity between one image patch and its adjacent ones. This disparity is visually described by an inter-patch feature, where the hybrid effect of luminance masking and contrast masking is taken into account. The inter-patch similarity is further measured by modifying the normalized correlation coefficient (NCC). On the other hand, we also attach importance to the impact of image contents within one patch on the IQA problem. For the intra-patch feature, we consider image curvature as an important complement of image gradient. According to local image contents, the intra-patch similarity is measured by adaptively comparing image curvature and gradient. Besides, a nonlinear integration of the inter-patch and intra-patch similarity is presented to obtain an overall score of image quality. The experiments conducted on six publicly available image databases show that our scheme achieves better performance in comparison with several state-of-the-art schemes.


Introduction
IQA is of great importance in numerous image-based applications and systems, such as image acquisition, transmission, processing, and display. In some image-based systems, IQA schemes can not only be employed to evaluate the system performance but also be embedded in the system optimization, e.g., image restoration in [1]. Although subjective evaluation is the most reliable way of IQA, it is laborious, expensive, and non-embeddable. Therefore, many researchers have paid tremendous attention to objective IQA, whose goal is to design computational models to predict perceptual visual quality automatically and accurately [2][3]. In this paper, we focus on the problem of FR IQA schemes, where a source image is available. This problem is also known as the evaluation of image fidelity or perceived similarity.
The most conventional FR IQA schemes are mean squared error (MSE) and its variations, including signal-to-noise ratio (SNR) and peak SNR (PSNR). These metrics are popular due to their simple mathematical expressions. Nevertheless, they correlate poorly with perceived quality ratings [4], particularly when distortions are content-dependent [5]. Their main problem lies in ignoring features of the human visual system (HVS).
Since the HVS is the ultimate receiver of the majority of visual signals, it seems promising to design computational models and systems that simulate functional components of the HVS [6]. The IQA schemes developed from this viewpoint are known as bottom-top or psychophysics-inspired schemes. However, a complete understanding of the HVS, especially of its higher-level processes, is not yet available due to its complexity [7], although front-end knowledge about the HVS has been extensively studied. Therefore, some researchers prefer to treat the HVS as a black box and build systems with an input-output relation similar to that of the HVS in the sense of IQA. The IQA schemes inspired by this viewpoint are called top-bottom or overarching-based schemes. Of course, the boundary of the above classification is not sharp [2]. The top-bottom models may take psychophysical features of the HVS into account, and vice versa.
Vision researches discover that several octave spacing radial frequency channels exist in the visual pathway [8]. Accordingly, in many IQA schemes [9][10][11][12][13], both the reference and distorted images are decomposed into multiple channels. Fourier decomposition is used in [9] to measure the visual fidelity of encoded images. In [10], discrete cosine transform is employed to provide the maximum visual quality for the minimum bitrate in the design of quantization matrix. The visual SNR (VSNR) metric [11] estimates the distortion of contrast on each wavelet sub-band. In [12], discrete wavelet transform is also adopted before measuring detail loss and additive impairment. In the scheme named as most apparent distortion (MAD) [13], either Fourier transformation or log-Gabor filtering is performed to decompose images according to distortion levels. Some high-level mechanisms of the HVS can also be included in these schemes, e.g., the use of visual property of global precedence in [11] and [13]. A main benefit of channel decomposition is that it is convenient to incorporate the contrast sensitivity function (CSF), which relates the visual sensitivity and spatial frequency of visual stimuli. Nevertheless, the peak of the CSF shifts as viewing distance varies. Therefore, the use of spatial frequency properties of the HVS for IQA tasks is discouraged in [14] because precise knowledge of viewing condition is likely to be unavailable in many practical applications. Furthermore, the computational cost of the schemes with channel decomposition is generally high. Besides the CSF, as the well-known features of the HVS, luminance masking and contrast masking are also widely used in perceptual tasks. For example, a pixel-domain model for just noticeable difference (JND) is introduced in [15] by considering masking effects in texture and edge regions, respectively.
Based on the observation that the HVS is highly adapted to exact structural information, structural similarity (SSIM) index is introduced in [16], where the structural comparison is defined as the correlation coefficient between corresponding patches extracted from the reference and distorted images. This index is attractive due to its mathematical simplicity and effectiveness in predicting subjective rating. Subsequently, some IQA schemes are proposed by extending SSIM, such as multi-scale SSIM [17], complex wavelet SSIM [18], content partitioned SSIM [19], and information content weighting SSIM [20]. In [21], natural scene statistics and mutual information are explored to design the criterion of visual information fidelity (VIF), which has been demonstrated to be equivalent to SSIM [22]. Afterwards, in a number of IQA schemes, various local features are extracted within image patches to measure the distortions. In this paper, we call these features as intra-patch features. A scheme based on the singular value decomposition (SVD) is proposed in [23], where singular values are employed as features. The SVD-based scheme is further improved in [24] by using singular vectors instead of singular values. In [25], sparse feature vectors calculated from independent component analysis serve as the visual responses to image patches. Recently, it has been found that, as an intra-patch feature, image gradient is very effective for IQA tasks [26][27][28][29][30][31][32][33]. In [26] and [27], a modified SSIM is performed on gradient magnitude and edge direction histogram, respectively. Under a similar motivation, the geometric directional distortion model proposed in [28] and [29] is also based on image gradient. In [30], gradient magnitude as well as phase congruency is used to form an objective IQA metric, known as the feature similarity (FSIM) index. In [31], average gradient and edge intensity serve as important parameters to evaluate the quality of remote sensing image. 
Image gradient can also be combined with some psychophysical features of the HVS, including the CSF, perception nonlinearity [32], visibility threshold, masking effects [33], etc. Although these schemes have advanced the development of objective IQA, they ignore image curvature information. However, psychophysical studies on scene contour suggest that curvature plays a central role in human perception [34]. In addition, the features are generally extracted and compared within corresponding patches of the reference and distorted images. In other words, the visual disparity between a center patch and its spatial neighbors is not investigated adequately. In this work, the feature that describes this disparity is named the inter-patch feature.
In this paper, we propose an FR IQA scheme that compares visual similarity in both the inter-patch and intra-patch ways. The main contributions of this paper are threefold. 1) We investigate the impact of the inter-patch feature on the perception of image quality. To this end, a feature vector that describes the disparity between a center patch and its spatial neighbors is introduced. The effects of luminance masking and contrast masking are jointly considered in the design of the inter-patch feature. Moreover, we modify the NCC to make the similarity measurement more reasonable in terms of visual quality. 2) To better measure the perceived similarity within patches, i.e., the intra-patch similarity, we propose to adaptively use image curvature and gradient comparisons according to local image contents. Specifically, we first partition the image domain into two non-overlapping regions. The partition is based on whether the comparison of image curvature is meaningful. For one region, only the gradient comparison is performed. For the other region, we compare both the gradient and the curvature. 3) An integration strategy to obtain an overall score of image quality is further presented. If the image quality is relatively low, we assign a higher weight to the inter-patch similarity. Otherwise, the intra-patch similarity has a higher weight. The analytical solution to the proposed integration is a nonlinear combination of the inter-patch and intra-patch similarities.

Motivations
In this section, we provide some observations and thoughts on visual perception and cognition. These observations and thoughts enlighten our work described in this paper.
As mentioned in the previous section, it has been verified that image gradient is an effective intra-patch feature for IQA tasks [26][27][28][29][30][31][32][33]. Besides the change in image intensities, the detection of curvature is also a primary goal of visual analysis [35][36]. Image curvature is expected to convey different information from image gradient, since the same gradient does not mean the same curvature, and vice versa. Essentially, image gradient signifies the first-order information of image derivatives whereas curvature represents the second-order derivatives. The researches on vision have shown that curved lines can be discriminated from straight ones within a spacing that is much smaller than the receptive field of ganglion cells and the physical distance of adjacent cones [37]. This phenomenon demonstrates that the HVS is very sensitive to curvature when human beings perceive scene details and detailed shapes [38]. Furthermore, image curvature has been confirmed to play a part in visual recognition. Objects that are sketched by retaining the points with maximum curvature and joining them with straight lines can still be recognized [39].
Moreover, image curvature or second-order derivatives have been successfully adopted in many image processing systems, e.g., the recent work in [40][41][42]. Image curvature is used in [40] to distinguish edges and ridges for image interpolation. In our previous work [41], first-order and second-order derivatives are unified in the framework of multi-surface fitting for image super-resolution. In [42], curvature is used to accurately locate the eye center in facial images.
Therefore, in this paper, we employ image curvature as a complement of image gradient to measure the intra-patch similarity for the FR IQA task.
In addition to the intra-patch similarity, we further take advantage of the inter-patch similarity, which is the similarity index of inter-patch features. There exists an intriguing clinical phenomenon known as visual agnosia [43], which refers to the impairment in recognition of visually presented objects. There are two types of visual agnosia, i.e., apperceptive agnosia and associative agnosia. Neither of them can be attributed to the defects in vision, intelligence, or memory. The patients that suffer from apperceptive agnosia can well distinguish the local features of the visual information from the retina. However, they are unable to correctly perceive adjacent features, and thus meaningful objects cannot be correctly formed [44]. This phenomenon implies that the inter-patch feature is crucial in visual cognition.
Recent research on image recognition and pattern analysis also demonstrates the importance of the inter-patch feature in visual cognition [45][46]. A dominant neighborhood structure is utilized for texture classification in [45], where the inter-patch feature is calculated as a vector of Euclidean distances in luminance space. In [46], the feature of adjacent patches is estimated by ridge regression to extract structural information for the problem of face recognition.
Furthermore, human beings perceive a scene by non-uniform sampling, and the region around the central focus point is perceived accurately without ocular movement [47]. The fovea in the inner retinal surface is responsible for this high-accuracy perception and covers the central visual angle of two degrees [48]. To give an example, we suppose a viewing condition similar to that of the LIVE database [49]: the viewing distance is 2-2.5 screen heights, and images are displayed at a resolution of 1024 × 768 pixels to fill the screen height. The visual angle $V$, size of visual field $S$, and viewing distance $D$ are related by [50]
$$V = 2\arctan\left(\frac{S}{2D}\right). \tag{1}$$
Accordingly, when the visual attention is fixed at a given point, the field of sharp central vision is a region with a diameter of approximately 54-67 pixels (a larger viewing distance results in a greater diameter). However, the size of image patches, from which local features are extracted, generally ranges from 5×5 to 13×13 pixels. In this work, the patches are of size 9×9. That is, in common viewing conditions, the image region that can be perceived simultaneously and accurately is much larger than the patch size. Therefore, it is reasonable and necessary to investigate the inter-patch distortions.
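The 54-67 pixel range can be checked numerically. The following minimal sketch assumes the usual visual-angle relation $V = 2\arctan(S/(2D))$ with the viewing distance expressed in pixels (screen height of 768 pixels times 2 or 2.5); the function name and defaults are illustrative only.

```python
import math

def foveal_diameter_px(view_dist_screens, screen_height_px=768, angle_deg=2.0):
    """Size S (in pixels) subtending angle_deg, from V = 2*atan(S / (2*D))."""
    d_px = view_dist_screens * screen_height_px  # viewing distance D in pixels
    return 2.0 * d_px * math.tan(math.radians(angle_deg) / 2.0)

near = foveal_diameter_px(2.0)  # viewing distance of 2 screen heights
far = foveal_diameter_px(2.5)   # viewing distance of 2.5 screen heights
print(round(near), round(far))
```

Both values are several times larger than the 9×9 patch size, which is the point of the argument above.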
Motivated by the above observations, we take the inter-patch similarity into account for IQA tasks. It is worth noting that the inter-patch similarity is completely different from the multi-scale strategy, which changes the image dimension to simulate varied viewing conditions. Actually, the proposed scheme can be embedded into a multi-scale framework by assigning relative importance to different scales [17], although we mainly discuss single-scale schemes.

Methods
Similar to many existing schemes, we first split images into patches so that a graphical map with distortion measures at each pixel position can be obtained. In the simplified expression, each image patch can be represented by its center pixel. Then, we measure the similarity of the inter-patch and intra-patch features, respectively. Finally, we adaptively integrate the results of the two portions into one single score. The overall framework of the proposed IQA scheme is illustrated in Fig. 1, where some blocks will be introduced in detail in the following.

Similarity Index of Inter-Patch Feature
To appropriately measure the inter-patch similarity, we first need to describe the inter-patch feature in the reference and distorted images, respectively.
For the patch $\mathbf{x}_i$ (denoted in lexicographic order) centered at the $i$-th pixel $x_i$ in the reference image $I_r$, its neighbors are the patches centered at the pixels lying on a diamond of radius $R$ under the Manhattan distance. For different radii $R$, the spatial relationship between a patch and its neighbors is illustrated in Fig. 2, where only the center pixels of patches are highlighted. The default value of $R$ is 6 in this work, and its impact will be presented at the end of Results and Discussions. It is easy to find that the number of neighbors $N$ is proportional to the radius $R$:
$$N = \eta R, \tag{2}$$
where $\eta$ is the proportional coefficient and equals 4 in the case of Fig. 2. The inter-patch feature is represented by a vector of size $N \times 1$, and each element of the vector is the visual disparity between the current patch and one of its neighbors. For the patch $\mathbf{x}_i$ from the reference image $I_r$, we denote the feature vector as $\mathbf{v}_{ri}$. In the perception of visual signals, luminance masking and contrast masking are two important features of the HVS. The former declares that the disparity in very bright areas is less likely to be visible, and the latter states that the reduction of visibility increases with the strength of the contrast masker [2]. Accordingly, the $j$-th ($1 \le j \le N$) element of the vector $\mathbf{v}_{ri}$ is calculated by Equation 3, where $\|\cdot\|_2$ denotes the $\ell_2$-norm, $\mathrm{sgn}(\cdot)$ is the signum function, $M$ is the number of pixels in each patch, the subscript $r$ stands for reference image, $\mu_{ri}$ is the mean intensity of the pixels in the patch $\mathbf{x}_i$, $\sigma_{ri}$ is the standard deviation of the pixel intensities in $\mathbf{x}_i$, the patch $\mathbf{x}_{ij}$ is the $j$-th neighbor of $\mathbf{x}_i$, $\mu_{rij}$ is the mean intensity of the patch $\mathbf{x}_{ij}$, and the constant $C_1$ is used to avoid instability when the denominator is very small. In this work, $C_1$ is set to $M \cdot (K_1 \cdot L)^2$, where $K_1 = 0.01$ is a small constant and $L$ is the dynamic range of the pixel intensities. For 8-bit grayscale images, $L$ equals 255.
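The neighbor layout can be illustrated with a short sketch that enumerates the center-pixel offsets lying at Manhattan distance exactly $R$ from the current center, which yields $4R$ neighbors. The function name is hypothetical, and border handling is left out.

```python
def diamond_offsets(radius):
    """Offsets (dy, dx) whose Manhattan distance from the origin equals radius."""
    offsets = []
    for dy in range(-radius, radius + 1):
        dx = radius - abs(dy)
        offsets.append((dy, dx))
        if dx != 0:  # the two tips of the diamond have a single pixel per row
            offsets.append((dy, -dx))
    return offsets

neighbors = diamond_offsets(6)  # default radius used in this work
print(len(neighbors))
```

For the default radius of 6 this gives 24 neighbors, so each inter-patch feature vector has 24 elements.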
By using the signum function $\mathrm{sgn}(\cdot)$, $v_{ri}(j)$ is positive if the patch $\mathbf{x}_i$ is on average brighter than its neighbor $\mathbf{x}_{ij}$. Otherwise, $v_{ri}(j)$ is negative. For the patch $\mathbf{y}_i$ centered at the $i$-th pixel $y_i$ in the distorted image $I_d$, we can also represent the inter-patch feature of $\mathbf{y}_i$ as the vector $\mathbf{v}_{di}$. In the same way as Equation 3, the $k$-th ($1 \le k \le N$) element of the vector $\mathbf{v}_{di}$ is calculated by Equation 4, where the subscript $d$ stands for distorted image, $\mu_{di}$ and $\sigma_{di}$ are the mean and standard deviation of the pixel intensities in $\mathbf{y}_i$, the patch $\mathbf{y}_{ik}$ is the $k$-th neighbor of $\mathbf{y}_i$, and $\mu_{dik}$ is the mean intensity of $\mathbf{y}_{ik}$. The numerator of Equation 3 and Equation 4 is the total disparity of the pixel intensities in terms of squared error. In the denominator, the square of the mean intensity is adopted to simulate the effect of luminance masking, and the variance is included to reflect the influence of contrast masking. In addition, when the two masking effects exist simultaneously, the stronger one plays the dominant role. Hence, the maximum operator $\max(\cdot, \cdot)$ is further employed to choose the dominant masker.
Fig. 2. The center pixel of the current patch is indicated by the "black" square, and the center pixels of its neighbors are denoted by the "grey" squares.
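The description above can be turned into a small sketch. The exact bodies of Equation 3 and Equation 4 are not available in this text, so the formula used below is an assumption pieced together from the verbal description: a signed sum of squared errors in the numerator, and $M \cdot \max(\mu^2, \sigma^2) + C_1$ in the denominator. The function name and the flat-list patch representation are illustrative.

```python
def inter_patch_feature(center, neighbors, dyn_range=255.0, k1=0.01):
    """Signed, masked disparity between a patch and each of its neighbors.

    center: flat list of the M intensities of the current patch.
    neighbors: list of flat lists, one per neighboring patch.
    ASSUMED form: sgn(mu - mu_nb) * ||x - x_nb||^2 / (M*max(mu^2, var) + C1).
    """
    m = len(center)
    c1 = m * (k1 * dyn_range) ** 2                 # C1 = M * (K1 * L)^2
    mu = sum(center) / m
    var = sum((p - mu) ** 2 for p in center) / m
    denom = m * max(mu ** 2, var) + c1             # dominant masker (assumed placement)
    feature = []
    for nb in neighbors:
        mu_nb = sum(nb) / m
        sq_err = sum((a - b) ** 2 for a, b in zip(center, nb))
        sign = (mu > mu_nb) - (mu < mu_nb)         # signum of the mean difference
        feature.append(sign * sq_err / denom)
    return feature
```

With this form, a patch that is brighter than its neighbor yields a positive element and a darker one yields a negative element, matching the sign convention stated above.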
It is straightforward to employ the NCC to measure the similarity of two vectors. For the vectors $\mathbf{v}_{ri}$ and $\mathbf{v}_{di}$, the NCC is given by Equation 5, where $\langle\cdot,\cdot\rangle$ is the inner product operator, and $C_2$ is a small constant, e.g., $10^{-3}$, for stability. The range of the NCC is $[-1, 1]$. To make the quality score range from 0 to 1, we can linearly map the NCC by Equation 6. However, for IQA tasks, a problem exists in the NCC. From Equation 5 or Equation 6, we can find that the correlation coefficient is rather high if either of the vectors is close to the zero vector. This mathematical phenomenon does not match human visual perception. Unfortunately, the phenomenon appears frequently, especially when the distorted image is severely blurred. In severely blurred regions, the disparities between adjacent patches are small. As a consequence, each element of $\mathbf{v}_{di}$ approximates zero, and the correlation coefficient between this $\mathbf{v}_{di}$ and any $\mathbf{v}_{ri}$ is high. Obviously, the distortion is underestimated, since the perceptual quality of severely blurred images is generally poor. Therefore, the inter-patch similarity index is estimated by modifying Equation 6 into Equation 7, where $S_{inter}(\mathbf{x}_i, \mathbf{y}_i)$ is the inter-patch similarity for the patches $\mathbf{x}_i$ and $\mathbf{y}_i$. For the modification in Equation 7, we have the following two observations. Firstly, the inter-patch features are supposed to be similar, i.e., $\mathbf{v}_{ri} \approx \mathbf{v}_{di}$, if the distorted image region has a high quality. In this case, the mapped NCC can well estimate the similarity of $\mathbf{v}_{ri}$ and $\mathbf{v}_{di}$, and $S_{inter}$ and the mapped NCC are almost the same. More specifically, the range of $S_{inter}$ is still between 0 and 1, and $S_{inter}$ reaches 1 if and only if $\mathbf{v}_{ri}$ equals $\mathbf{v}_{di}$. Secondly, when the inter-patch information is severely lost, $\mathbf{v}_{di}$ can be regarded as the zero vector. The mapped NCC would underestimate the distortion level, i.e., overestimate the similarity. In this case, $S_{inter}$ is smaller than the mapped NCC, and thus more reasonable.
The flowchart of measuring the inter-patch similarity is illustrated in Fig. 3.
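The zero-vector problem of the NCC can be reproduced with a few lines. In the sketch below, `ncc` and `mapped_ncc` follow the verbal description of Equation 5 and Equation 6, with $C_2$ assumed to appear in both numerator and denominator. The function `s_inter_sketch` is only one hypothetical form that satisfies the properties stated for Equation 7 (range from 0 to 1, close to the mapped NCC when the vectors are similar, small when one vector vanishes); it is not claimed to be the formula used in this work.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ncc(a, b, c2=1e-3):
    """Normalized correlation coefficient with a stabilizing constant (assumed form)."""
    return (dot(a, b) + c2) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + c2)

def mapped_ncc(a, b, c2=1e-3):
    """Linear map of the NCC from [-1, 1] to [0, 1]."""
    return (ncc(a, b, c2) + 1.0) / 2.0

def s_inter_sketch(a, b, c2=1e-3):
    """HYPOTHETICAL modified similarity: penalizes a vanishing feature vector."""
    na2, nb2 = dot(a, a), dot(b, b)
    return (dot(a, b) + math.sqrt(na2 * nb2) + c2) / (na2 + nb2 + c2)

v_ref = [0.5, -0.3, 0.2]
v_blur = [0.0, 0.0, 0.0]  # severe blur: all inter-patch disparities vanish
print(mapped_ncc(v_ref, v_blur), s_inter_sketch(v_ref, v_blur))
```

The mapped NCC reports a perfect match against the zero vector, whereas the hypothetical modified index stays close to zero, which is the behavior the modification is meant to achieve.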

Similarity Index of Intra-Patch Features
The tests in [51] show that image content has a significant effect on the perceived image quality. In order to distinguish image patches with different contents, a content-partitioned IQA scheme is introduced in [19], where pairs of image patches are classified into different types according to local gradients. Different types of patches are assigned different weights. Essentially, it is merely a strategy for error pooling. To measure the intra-patch similarity, we also partition the images based on the local image contents. Instead of allocating varied weights, we compare different intra-patch features for different types of patches.
In this sub-section, image patches are first classified into two types based on the local gradient and curvature. The image gradient can be estimated by convolving the image with a gradient operator. Various gradient operators are available. Among them, we select the Scharr operator, denoted by $F$, as the convolution mask by following the suggestion in [30]. Specifically, we calculate the gradients of the reference image $I_r$ through
$$H_r = F * I_r, \qquad V_r = F^{T} * I_r, \tag{8}$$
where the symbol $*$ denotes the convolution operation, and $H_r$ and $V_r$ are the image gradients along the horizontal and vertical directions, respectively. The gradient magnitude $G_r$ is obtained by
$$G_r = \sqrt{H_r^2 + V_r^2}. \tag{9}$$
In terms of isophotes, the image curvature $K_r$ is defined as in [40] (Equation 10), where $H_{Hr}$, $V_{Hr}$, and $V_{Vr}$ are the second-order derivatives of $I_r$. $H_{Hr}$ and $V_{Hr}$ are obtained by convolving $H_r$ with $F$ and $F^T$, respectively. Similarly, $V_{Vr}$ is calculated as $F^T * V_r$. In Equation 9 and Equation 10, all the operations are performed in a pixel-wise fashion. For the distorted image $I_d$, we calculate the gradient magnitude $G_d$ and curvature $K_d$ in the same way.
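The derivative chain above can be sketched in plain Python. Several details are assumptions rather than statements of this work: the 1/16 normalization of the Scharr mask, the replicated image borders, the small `eps` guard, and the sign convention of the textbook isophote-curvature formula, which stands in for Equation 10.

```python
import math

# Scharr mask F (1/16 normalization assumed) and its transpose
F = [[3 / 16, 0.0, -3 / 16],
     [10 / 16, 0.0, -10 / 16],
     [3 / 16, 0.0, -3 / 16]]
F_T = [list(row) for row in zip(*F)]

def convolve(img, kernel):
    """Same-size 2-D convolution with a 3x3 kernel and replicated borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    yy = min(max(y + 1 - i, 0), h - 1)
                    xx = min(max(x + 1 - j, 0), w - 1)
                    acc += kernel[i][j] * img[yy][xx]
            out[y][x] = acc
    return out

def gradient_and_curvature(img, eps=1e-12):
    """Return per-pixel gradient magnitude G and isophote curvature K."""
    h_r = convolve(img, F)      # horizontal gradient
    v_r = convolve(img, F_T)    # vertical gradient
    h_hr = convolve(h_r, F)     # second-order derivatives
    v_hr = convolve(h_r, F_T)
    v_vr = convolve(v_r, F_T)
    rows, cols = len(img), len(img[0])
    g = [[0.0] * cols for _ in range(rows)]
    k = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            hx, vy = h_r[y][x], v_r[y][x]
            mag2 = hx * hx + vy * vy
            g[y][x] = math.sqrt(mag2)
            num = vy * vy * h_hr[y][x] - 2.0 * hx * vy * v_hr[y][x] + hx * hx * v_vr[y][x]
            k[y][x] = -num / (mag2 ** 1.5 + eps)  # assumed sign convention
    return g, k

# A horizontal intensity ramp has straight isophotes, so its curvature is zero.
ramp = [[float(x) for x in range(7)] for _ in range(7)]
g_map, k_map = gradient_and_curvature(ramp)
```

The division by the cubed gradient magnitude also shows why the curvature estimate becomes unstable where the gradient is small.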
The main principle of the classification for pairs of image patches is whether the similarity comparison on curvature can be legitimately incorporated. We believe that the following two conditions are necessary for the meaningful comparison on image curvature.
• The gradients from both the reference and distorted images should be visible. The definition of curvature in Equation 10 implies that the estimation of curvature is unstable when the gradient is small. Therefore, the comparison of image curvature is unreliable if either gradient is invisible. To determine the visibility, we make use of the JND model in [15].
• The curvature from either the reference or the distorted image should be smaller than 1. Since the reciprocal of curvature is the radius, a curvature larger than 1 means that the radius is smaller than 1. In the context of digital images, a radius smaller than 1 indicates that we are dealing with a "single dot" isophote. Obviously, comparing the curvatures of a pair of single dots makes little sense.
We label the image regions that meet the above conditions as "Type I", and the remaining regions are labeled as "Type II". Two examples of the image partition are illustrated in Fig. 4.
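The two conditions can be written as a per-pixel decision rule. In this sketch the JND thresholds of [15] are treated as given inputs (`jnd_ref` and `jnd_dis` are hypothetical parameter names), and the curvature test is applied to absolute values, which is an assumption.

```python
def patch_type(g_ref, g_dis, k_ref, k_dis, jnd_ref, jnd_dis):
    """Label a pair of corresponding pixels as "Type I" or "Type II".

    g_*: gradient magnitudes, k_*: isophote curvatures,
    jnd_*: visibility thresholds supplied by a JND model (inputs here).
    """
    both_visible = g_ref > jnd_ref and g_dis > jnd_dis
    # At least one curvature below 1 (absolute value used as an assumption).
    not_single_dots = min(abs(k_ref), abs(k_dis)) < 1.0
    return "Type I" if (both_visible and not_single_dots) else "Type II"

print(patch_type(5.0, 4.0, 0.3, 2.0, 1.0, 1.0))
```

Curvature is compared only for "Type I" pairs; all other pairs fall back to the gradient comparison.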
For the pair of patches $\{\mathbf{x}_i, \mathbf{y}_i\}$ from the "Type II" region, we measure the intra-patch similarity only by comparing the gradient magnitudes, as given by Equation 11 (which applies if $x_i$ and $y_i \in$ Type II), where $S_{intra}(\mathbf{x}_i, \mathbf{y}_i)$ is the intra-patch similarity for the patches $\mathbf{x}_i$ and $\mathbf{y}_i$, $x_i$ and $y_i$ represent the center pixels of $\mathbf{x}_i$ and $\mathbf{y}_i$, and the constant $C_3$ is used for stability. In this work, $C_3$ is set to $(K_3 \cdot L)^2$, where $K_3 = 0.05$ is a small constant. For a pair of patches from the "Type I" region, although at least one of the compared curvatures is smaller than 1, the other one might still be larger than 1. As mentioned above, a curvature larger than 1 means a "single dot" isophote. Since all the "single dot" isophotes in digital images have similar visual patterns, we should not distinguish among curvatures that are larger than 1. Thus, we modify the isophote curvature by Equation 12, where $K_{mr}$ and $K_{md}$ are the adjusted curvatures for the reference and distorted images, respectively. For the pair of patches $\{\mathbf{x}_i, \mathbf{y}_i\}$ from the "Type I" region, we measure the intra-patch similarity using the comparisons on both the gradient magnitude and the modified curvature, as given by Equation 13 (which applies if $x_i$ and $y_i \in$ Type I), where $C_4 = 10^{-4}$ is a stability constant that avoids a nearly zero denominator, and $\alpha_1$ and $\alpha_2$ are the parameters that adjust the relative importance of gradient and curvature. In this work, we simply set $\alpha_1 = \alpha_2 = 0.5$ to give them identical importance. By observing the forms of Equation 11 and Equation 13, we can readily unify them in Equation 14, where $\xi$ is the combination parameter that depends on the patch classification. From Equation 14, we can find that the similarity index of intra-patch features is essentially an adaptive comparison of image gradient and curvature.
Fig. 4. In Fig. 4C, the "white" part is the "Type I" region, and the "black" part is the "Type II" region.
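A rough sketch of the adaptive comparison is given below. The bodies of Equations 11 to 14 are not available in this text, so every formula here is an assumption: the ratio $(2ab + C)/(a^2 + b^2 + C)$ is the similarity form common in gradient-based IQA, the curvature is clipped to a magnitude of 1, and the two comparisons are merged by a weighted sum with $\xi = 1$ for "Type II" and $\xi = \alpha_1 = 0.5$ for "Type I". All function names are hypothetical.

```python
def ratio_similarity(a, b, c):
    """Common similarity form (2ab + c) / (a^2 + b^2 + c); ASSUMED for Eq. 11."""
    return (2.0 * a * b + c) / (a * a + b * b + c)

def clip_curvature(k):
    """ASSUMED adjustment: curvatures above 1 are treated alike (magnitude only)."""
    return min(abs(k), 1.0)

def s_intra_sketch(g_ref, g_dis, k_ref, k_dis, is_type_one,
                   dyn_range=255.0, k3=0.05, c4=1e-4, alpha1=0.5):
    c3 = (k3 * dyn_range) ** 2                    # C3 = (K3 * L)^2
    s_grad = ratio_similarity(g_ref, g_dis, c3)
    if not is_type_one:
        return s_grad                             # xi = 1: gradient comparison only
    s_curv = ratio_similarity(clip_curvature(k_ref), clip_curvature(k_dis), c4)
    xi = alpha1                                   # ASSUMED weighted-sum combination
    return xi * s_grad + (1.0 - xi) * s_curv
```

Setting `is_type_one` to `False` reduces the index to the pure gradient comparison, which is the behavior described for $\xi$ fixed to 1.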

Overall Objective Score
In the above sub-sections, we have proposed two similarity indices, denoted as $S_{inter}$ and $S_{intra}$. The former depicts the inter-patch similarity, and the latter focuses on the intra-patch similarity. We further need to integrate the two portions into an overall measurement. As mentioned in Motivations, inter-patch distortions cause difficulties in visual cognition. Generally, images with this kind of distortion are of low quality. Meanwhile, intra-patch distortions may be caused merely by the disparity of image details. Therefore, we prefer to assign a relatively higher weight to the inter-patch similarity index $S_{inter}$ if the image quality is relatively poor. Conversely, if the image patch is of good quality, the intra-patch similarity index $S_{intra}$ should have a relatively higher weight. Hence, we set the weight of $S_{intra}$ in the integration to be proportional to the integrated index $S_I$. Specifically, for the given patches $\mathbf{x}_i$ and $\mathbf{y}_i$, the integrated index $S_I(\mathbf{x}_i, \mathbf{y}_i)$ is related to $S_{inter}$ and $S_{intra}$ by Equation 15, where $S_I$, $S_{inter}$, and $S_{intra}$ are the abbreviations of $S_I(\mathbf{x}_i, \mathbf{y}_i)$, $S_{inter}(\mathbf{x}_i, \mathbf{y}_i)$, and $S_{intra}(\mathbf{x}_i, \mathbf{y}_i)$, respectively, and $\gamma$ ($0 < \gamma \le 1$) is the proportional coefficient that relates the weight to $S_I$. In our experiments, the default value of $\gamma$ is 0.8. The impact of this coefficient will be discussed at the end of Results and Discussions. Under the conditions that $S_{inter}$, $S_{intra}$, and $\gamma$ range from 0 to 1, it is easy to prove that $S_I$ calculated by Equation 16 still ranges from 0 to 1. Fig. 6 shows two examples of the $S_{inter}$, $S_{intra}$, and $S_I$ maps. From Fig. 6, we can observe that the responses of $S_{inter}$ and $S_{intra}$ differ greatly, e.g., $S_{intra}$ is sensitive to the distortions on the details of sea waves while $S_{inter}$ is not. The inter-patch similarity and the intra-patch similarity are thus complementary. As expected, the integrated similarity index $S_I$ is more comprehensive.
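The integration rule can be worked out explicitly under one assumption, namely that the two weights sum to one. Writing $S_I = w S_{intra} + (1 - w) S_{inter}$ with $w = \gamma S_I$ and solving for $S_I$ gives $S_I = S_{inter} / (1 + \gamma (S_{inter} - S_{intra}))$, which is a nonlinear combination of the two indices. The sketch below implements this closed form; it is a derivation from the verbal description and may differ in detail from Equation 15 and Equation 16.

```python
def integrate(s_inter, s_intra, gamma=0.8):
    """Closed form of S_I = w*S_intra + (1 - w)*S_inter with w = gamma*S_I.

    ASSUMPTION: the weights of S_intra and S_inter sum to one.
    """
    denom = 1.0 + gamma * (s_inter - s_intra)
    if denom <= 0.0:  # only reachable for gamma = 1, s_inter = 0, s_intra = 1
        return 0.0
    return s_inter / denom

low = integrate(0.2, 0.9)    # poor quality: weight of S_intra = gamma * S_I is small
high = integrate(0.9, 0.95)  # good quality: weight of S_intra is large
print(low, high)
```

In the low-quality example the weight of $S_{intra}$ is about 0.36, so $S_{inter}$ dominates, while in the high-quality example it is 0.75, in line with the weighting preference stated above.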
To get a single score $q(I_r, I_d)$, we average the integrated index over all pairs of patches:
$$q(I_r, I_d) = \frac{1}{N_p}\sum_{i=1}^{N_p} S_I(\mathbf{x}_i, \mathbf{y}_i), \tag{17}$$
where $N_p$ is the number of patch pairs in the reference and distorted images.

Results and Discussions
In this section, we will provide the experimental results of the proposed scheme. Here our scheme is merely performed on the luminance component of images in a single-scale fashion.
For color images, we transform them to grayscale before quality assessment. To determine the proper scale, we follow the suggestion of Wang [52], i.e., down-sampling the original images by a factor of $E$ given by Equation 18, where $\mathrm{round}(\cdot)$ returns the nearest integer to its argument, and $N_h$ is the number of pixels in the image height or width.
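As an illustration, the sketch below assumes that Equation 18 follows the empirical rule commonly cited alongside the SSIM reference implementation, $E = \max(1, \mathrm{round}(N_h / 256))$. The function name and the `base` parameter are hypothetical.

```python
import math

def downsample_factor(n_h, base=256):
    """Down-sampling factor E = max(1, round(n_h / base)) (ASSUMED rule).

    Rounds half up explicitly, because Python's built-in round() rounds
    halves to the nearest even integer.
    """
    return max(1, int(math.floor(n_h / base + 0.5)))

print(downsample_factor(512), downsample_factor(768))
```

Under this rule an image whose height (or width) is 512 pixels is reduced by a factor of 2 before the similarity indices are computed, and images smaller than about 384 pixels are left at their original scale.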

Databases and Performance Measures
Our experiments are mainly conducted on six subject-rated and publicly available databases, i.e., TID2008 [53], CSIQ [54], LIVE [49], IVC [55], MICT [56], and A57 [57]. Some important information about these databases, including the numbers of reference and distorted images, is listed in Table 1. In these databases, subjective ratings of all the distorted images are provided to serve as the ground truth in the performance comparison. The subjective scores are given in the form of either mean opinion score (MOS) or differential mean opinion score (DMOS). To analyse the performance in a common space and remove the nonlinearity introduced in the process of subjective rating, we apply a regression analysis to nonlinearly relate the objective scores, i.e., the IQA scheme outputs, to the subjective scores. Following the Video Quality Experts Group tests and validation method [58], we utilize the five-parameter logistic function given by Equation 19, where $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$ are the parameters to be fitted by minimizing the MSE between the mapped values $p$ and the subjective scores. The performance comparisons of the IQA schemes are based on four widely used criteria: the Spearman rank order correlation coefficient (SROCC), the Kendall rank order correlation coefficient (KROCC), the Pearson linear correlation coefficient (PLCC), and the root mean squared error (RMSE). The first two criteria evaluate prediction monotonicity, and the other two measure prediction accuracy [58]. Since SROCC and KROCC depend only on the rank of the data, we can calculate them using the IQA scheme outputs $q$ and the subjective scores.
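The mapping and the rank-based criterion can be sketched as follows. The logistic form used here, $p = \beta_1 \left(\tfrac{1}{2} - \tfrac{1}{1 + \exp(\beta_2 (q - \beta_3))}\right) + \beta_4 q + \beta_5$, is the five-parameter function commonly used in IQA evaluations and is assumed to be the one meant by Equation 19. The parameter fitting itself is omitted, and the data values are made up for illustration.

```python
import math

def logistic5(q, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping from objective score q to mapped value p."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (q - b3)))) + b4 * q + b5

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def srocc(a, b):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

q_scores = [0.10, 0.40, 0.50, 0.90]   # illustrative objective scores
mos = [1.0, 2.0, 3.0, 5.0]            # illustrative subjective scores
mapped = [logistic5(q, 2.0, 3.0, 0.5, 0.1, 1.0) for q in q_scores]
print(srocc(q_scores, mos), srocc(mapped, mos))
```

Because the mapping is monotonic for these parameter values, the SROCC is the same before and after the regression, which is why the rank-based criteria can be computed without it.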

Overall Performance Comparison
The scatter plots of the proposed IQA scheme are shown in Fig. 7, where the abscissa denotes the objective scores q and the ordinate is the MOS or DMOS. In Fig. 7, each cross-shaped point represents a pair of reference and distorted images.
In Table 2, the proposed scheme is compared with some classical or state-of-the-art IQA schemes, including PSNR, SSIM [16], VIF [21], VSNR [11], FSIM [30], gradient similarity (GSIM) [33], and sparse feature fidelity (SFF) [25]. The two best results across the eight schemes are highlighted in boldface. From Table 2, we can see that the proposed scheme performs consistently well on the six databases. On the TID2008 and MICT databases, our scheme has a clear advantage over the compared schemes. On the CSIQ and IVC databases, our scheme still has the best performance, although its advantages over SFF on CSIQ and over GSIM on IVC are slight. On the LIVE database, the performance of the proposed scheme is comparable with that of FSIM and SFF. On the A57 database, VSNR has the best performance. However, its performance is relatively poor on the other databases. Moreover, the averaged results on the six databases are provided in Table 3, where the best results are highlighted in boldface. The average values are calculated in two cases. In the first case, the results are directly averaged regardless of the database size. In the second case, we assign the databases different weights that are proportional to the number of distorted images in each database (see Table 1 for the specific numbers). It is worth noting that the ranges of RMSE values are not the same on the six databases. Consequently, we normalize each RMSE value before averaging. From Table 3, we can conclude that the average performance of the proposed scheme is the best.
To evaluate the statistical significance of the performance difference between the proposed metric and its competitors, we conduct an F-test on the prediction residuals between the mapped objective scores $p$ and the subjective ratings. Here, the residuals are assumed to be Gaussian. Let $F$ denote the ratio between two residual variances, of which the larger one is set as the numerator. The judgment threshold is denoted as F-critical, whose value is calculated based on the number of residuals and a given confidence level. The values of F-critical with 95% confidence are listed in Table 1 for each database. If $F$ is larger than F-critical, the difference between the two metrics is believed to be significant at the specified confidence level. In Table 4, we compare the proposed metric with other metrics in terms of statistical significance. In each entry of Table 4, the symbol "1", "−", or "0" means that the proposed metric is statistically (with 95% confidence) better, indistinguishable, or worse than the corresponding metric, respectively. Table 4 shows that the proposed scheme statistically outperforms most of the compared schemes.
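The variance-ratio test can be sketched as follows. The threshold F-critical is taken as an input, since its computation needs the F-distribution; the function names and the toy data are illustrative, and both residual variances are assumed to be non-zero.

```python
def residual_variance(mapped_scores, subjective_scores):
    """Unbiased variance of the prediction residuals p - subjective score."""
    res = [p - s for p, s in zip(mapped_scores, subjective_scores)]
    mean = sum(res) / len(res)
    return sum((r - mean) ** 2 for r in res) / (len(res) - 1)

def f_ratio(var_a, var_b):
    """Ratio of two residual variances with the larger one as numerator."""
    return max(var_a, var_b) / min(var_a, var_b)

def significantly_different(pred_a, pred_b, subjective_scores, f_critical):
    """True if the two metrics differ significantly at the level behind f_critical."""
    f = f_ratio(residual_variance(pred_a, subjective_scores),
                residual_variance(pred_b, subjective_scores))
    return f > f_critical

subjective = [1.0, 2.0, 3.0, 4.0]   # toy subjective ratings
metric_a = [1.1, 1.9, 3.1, 3.9]     # toy mapped scores of metric A
metric_b = [1.5, 1.5, 3.5, 3.5]     # toy mapped scores of metric B
ratio = f_ratio(residual_variance(metric_a, subjective),
                residual_variance(metric_b, subjective))
```

Which metric is the better one is then decided by which residual variance is smaller.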
Besides the average and statistical performance, we further evaluate the efficiency of the proposed scheme. The average time consumed to assess an image in the IVC database on a machine with a 1.6 GHz Intel core is recorded in Table 5. All the codes are implemented and executed in the Matlab environment. As shown in Table 5, the efficiency of our scheme is moderate.

Performance on Individual Distortion Types
In this experiment, we investigate the performance of IQA schemes on different types of image distortion. The databases TID2008, CSIQ, and LIVE were used for this test because they contain the most commonly encountered distortion types. The SROCC values are listed in Table 6, where AWGN is the abbreviation of additive white Gaussian noise. Here we choose SROCC because it is suitable for measuring a small number of data points and is independent of the nonlinear regression [25]. The two best results are shown in boldface. From Table 6, we can see that the proposed scheme performs very well on most distortion types. For 12 out of 17 distortion types in TID2008, 5 out of 6 types in CSIQ, and 3 out of 5 types in LIVE, our scheme has the best or the second best performance.
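The independence of SROCC from the nonlinear regression follows from the fact that it depends only on ranks, so any strictly increasing mapping of the objective scores leaves it unchanged. A minimal sketch is given below; the helper names and the toy scores are hypothetical, and ties are handled with average ranks.

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def srocc(objective, subjective):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(objective), average_ranks(subjective))


# Hypothetical objective scores and subjective ratings, for illustration only.
objective = [0.2, 0.5, 0.7, 0.9]
subjective = [10, 30, 20, 40]
print(srocc(objective, subjective))
# A strictly increasing mapping (here, cubing) does not change the ranks.
print(srocc([v ** 3 for v in objective], subjective))
```

Both calls print the same value, since the cubic mapping preserves the ordering of the objective scores.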

Performance of Each Component
In this sub-section, we discuss the inter-patch similarity index S inter and the intra-patch similarity index S intra. For simplicity, we conducted the test only on TID2008, which is the largest database. The results of each component are given in Table 7, where S intra (ξ = 1) indicates that the parameter ξ in Equation 14 is consistently set to 1 for both Type I and Type II. By consistently setting ξ to 1, we perform only gradient comparison for the intra-patch similarity index. Moreover, the performance of SSIM, VSNR, VIF, and GSIM is also listed in Table 7. Based on the results in Table 7, we make the following observations: First, the performance of the integrated index S I is superior to that of S inter and S intra. This indicates the effectiveness of the integration strategy defined in Equation 16 and the complementarity between S inter and S intra.
Secondly, the inter-patch similarity index S inter achieves promising results without considering any local features within patches. Specifically, S inter outperforms VSNR, VIF, and SSIM, which are widely accepted IQA schemes. This demonstrates that the inter-patch information is beneficial to the prediction of image quality.
Thirdly, S intra achieves better results than S intra (ξ = 1) and GSIM, which are based only on image gradient. Therefore, the incorporation of image curvature is necessary and beneficial.

Impact of Parameter Values
The impact of the parameters on the performance of our scheme is examined in this sub-section. We discuss two important parameters, R and γ, where R is the radius that determines the neighbors (mentioned before Equation 2) and γ is the proportional coefficient in the integration of S inter and S intra (described in Equation 16). This experiment was conducted on the six databases, and the weighted-average SROCC results were recorded. Fig. 8A plots the averaged SROCC values as a function of R, which ranges from 1 to 10 with a step of 1. The value of γ is fixed at 0.8 when R varies. As expected, the performance for small R is relatively poor, because the inter-patch feature degrades into a feature within patches when the value of R is small. As can be observed, the performance improves as R increases. However, the improvement is not significant when R is larger than 6. Furthermore, there is a tiny decline in the performance when R reaches 10. In addition, according to Equation 2, the number of neighbors grows four times as fast as R. This means that the memory usage and running time increase quickly as the radius is enlarged. Hence, it is reasonable to set the default value of R to 6. Fig. 8B plots the averaged SROCC values as a function of γ, which ranges from 0.1 to 1 with a step of 0.1. The value of R is fixed at 6 when γ varies. Based on Fig. 8B, we set the default value of γ to 0.8. It is worth noting that a larger γ does not always mean that the weight of S intra is larger than that of S inter, since the weight is the product of γ and S I. In particular, when the image region is of poor quality, S inter still has a great effect on S I even if γ equals 1.

Conclusions
In this paper, an FR IQA scheme based on the inter-patch and intra-patch similarity is introduced. The two similarity indices complement each other. One component measures the similarity of the inter-patch feature, which describes the disparity between a center patch and its spatial neighbors. The similarity is measured by the modified NCC to avoid the underestimation of distortions. The other component measures the similarity of the intra-patch feature. Image patches are classified based on whether the comparison of curvature is reasonable. For one type of patch pair, we measure the intra-patch similarity using both curvature and gradient comparison. For the other type, only gradient similarity is included. Moreover, an integration strategy is introduced to obtain the overall score. A relatively higher weight is assigned to the first component if the image quality is low. Extensive experiments on six publicly available databases show that the proposed scheme is more consistent with subjective evaluations than the compared schemes.
Nevertheless, in this paper, we only measure the similarity on grayscale versions of images. Therefore, the way to take advantage of color information in the proposed scheme needs to be further investigated.