Full-Reference Image Quality Assessment with Linear Combination of Genetically Selected Quality Measures

  • Mariusz Oszust

Abstract

Information carried by an image can be distorted by the processing steps introduced by different electronic means of storage and communication. Therefore, it is important to develop algorithms which can automatically assess the quality of an image in a way that is consistent with human evaluation. In this paper, an approach to image quality assessment (IQA) is proposed in which the quality of a given image is evaluated jointly by several IQA approaches. At first, in order to obtain such joint models, an optimisation problem of IQA measures aggregation is defined, where a weighted sum of their outputs, i.e., objective scores, is used as the aggregation operator. Then, the weight of each measure is considered as a decision variable in a problem of minimisation of root mean square error between obtained objective scores and subjective scores. Subjective scores reflect ground-truth and involve evaluation of images by human observers. The optimisation problem is solved using a genetic algorithm, which also selects suitable measures used in aggregation. The obtained multimeasures are evaluated on the four largest widely used image benchmarks and compared against state-of-the-art full-reference IQA approaches. The comparison reveals that the proposed approach outperforms the competing measures.

Introduction

Visual information is often subject to many processing steps, e.g., acquisition, enhancement, compression, or transmission. After processing, some information carried by the content of the image can be distorted. Therefore, its quality should be evaluated from a human perception point of view. There are three categories of image quality assessment (IQA) measures (metrics or models), depending on the availability of a pristine, i.e., distortion-free, image: (1) full-reference, (2) no-reference, and (3) reduced-reference models. In this paper, the full-reference approach is considered, in which a reference image is provided for each distorted image in a benchmark dataset.

Application of peak signal-to-noise ratio (PSNR) is one of the simplest approaches to IQA. However, the output of PSNR is not well correlated with human evaluation; therefore, this technique often serves as a baseline model for comparison. In [1], Damera-Venkata et al. presented noise quality measure (NQM) in which a distorted image is modelled using a linear frequency distortion and an additive noise injection. Wang et al. [2] introduced universal image quality index (UQI). UQI evaluates quality of an image using loss of correlation, luminance distortion, and contrast distortion. A further extension of UQI, structural similarity (SSIM), was proposed by Wang et al. [3]. A multi-scale SSIM, MSSIM, was presented in [4]. Wang and Li in [5] proposed information content weighted SSIM (IW-SSIM) as an extension of MSSIM. In that work, local information was measured using statistical models of natural scenes. Statistical properties of the natural environment are also utilised in the visual information fidelity (VIF) [6] measure and the information fidelity criterion (IFC) [7]. In [8], Riesz-transform based feature similarity (RFSIM) was proposed. The measure is computed by comparing Riesz-transform features at key locations between the distorted image and its reference image. Authors of feature similarity index (FSIM) [9] developed an approach which uses phase congruency and image gradient magnitude as low-level local features. FSIMc is a version of FSIM developed for processing colour images. In [10], spectral residual based similarity (SR-SIM) using a visual saliency map was proposed. Visual saliency is also used to calculate a local quality map of the distorted image in visual saliency-induced index (VSI) [11]. The gradient similarity (GSM) measure [12] estimates image quality taking into consideration structure and contrast changes, as well as luminance distortions. In [13], image structural degradation was considered and determined using local binary patterns. In SURF-SIM [14], multiscale differences between features detected and described by the Speeded-Up Robust Features (SURF) approach are combined with a pooling strategy. An IQA measure that evaluates images taking into account inter-patch and intra-patch similarities was described in [15]. In that work, the authors used a modified normalised correlation coefficient and image curvature.

Development of full-reference IQA measures can also involve different fusion strategies. For example, Liu and Yang [16] combined SNR, SSIM, VIF, and VSNR using canonical correlation analysis. The most apparent distortion algorithm (MAD) [17] adopts two strategies for IQA. In that approach, local luminance and contrast masking are used to evaluate high-quality images, while changes in the local statistics of spatial-frequency components are used for low-quality images. Three IQA metrics, MS-SIM, VIF and R-SVD, were non-linearly combined by Okarma in [18, 19]. A non-linear fusion of IQA measures was also investigated in [20]. In [21], up to seven IQA models were combined using a regularised regression. Peng and Li in [22] presented an approach based on a conditional Bayesian mixture of experts model. In that paper, a support vector machine classifier was used for prediction of the type of distortion, and then SSIM, VSNR, and VIF were fused with k-nearest-neighbour regression. In their other paper, [23], the authors presented an adaptive combination of IQA measures with an edge-quality measure based on preservation of edge direction. In [24], a combination of local and global distortion measures was considered using saliency maps, gradient and contrast information.

Many complex fusion approaches have recently been introduced. The main contribution of this paper is therefore to show that a solution based on a simple linear combination, with weights found by a genetic algorithm, yields a well-performing fusion of IQA measures. Apart from a comparison of different approaches performed in accordance with a widely accepted protocol, the paper provides some insights into the selection of IQA techniques which are likely to be fused. In this paper, a decision fusion of 16 full-reference IQA measures is defined as an optimisation problem of finding weights in a weighted sum of their outputs. A genetic algorithm finds the solution that minimises the root mean square error (RMSE) of prediction. The number of used measures and the parameters of the regression model for fitting objective scores to subjective scores prior to RMSE calculation are also found by the algorithm. Finally, the proposed approach is evaluated on the four largest IQA image benchmarks and compared with the state-of-the-art approaches.

The rest of this paper is organised as follows. In the section Methods, a formulation of the optimisation problem and the development of the proposed approach are presented. Experimental results with related discussions are covered in the section Results and Discussion. Finally, the last section concludes the paper.

Methods

Since digital processing can alter the appearance of an image, which may lead to different opinions on its quality, many IQA algorithms have been proposed for automatic assessment [25]. In order to compare IQA approaches, specific image databases have been proposed. They contain reference images, their corresponding distorted images, and ground-truth information obtained from human observers. Information on the perceived quality is reported as mean opinion scores (MOS values) or differential mean opinion scores (DMOS values).

The desired IQA metric should produce objective scores which are consistent with human ratings (subjective scores). In this work, it is assumed that a joint metric can provide better results, in terms of prediction quality, than any single metric that contributes to the multimeasure.

Let Q be the output of an aggregated decision of n IQA measures, where $n \le N$. It can be expressed as:

$$Q = A(Q_1, Q_2, \ldots, Q_n), \tag{1}$$

where A is an aggregation operator and $Q_i$ is the output of the ith measure. The operator often has the form of a weighted sum [26–28]; therefore, Q can be expressed as follows:

$$Q = \sum_{i=1}^{n} x_i Q_i, \tag{2}$$

where $x = [x_1, x_2, \ldots, x_n]$ denotes a vector of weights, $x_i \in \mathbb{R}$. The vector x contains decision variables in an optimisation problem of finding an effective fusion of IQA measures. Since many fusions can be proposed, a given x should be evaluated. For this purpose, one of the indices typically used to evaluate IQA measures can be applied. In order to measure consistency of the output of the examined IQA model with human assessment, the following indices of prediction accuracy, monotonicity, and consistency are often considered [29, 30]: Spearman Rank order Correlation Coefficient (SRCC), Kendall Rank order Correlation Coefficient (KRCC), Pearson linear Correlation Coefficient (PCC), and Root Mean Square Error (RMSE). Evaluation indices are calculated after a nonlinear mapping between a vector of objective scores, Q, and MOS or differential MOS (DMOS), S, using the following mapping function for the nonlinear regression [30]:

$$Q_p = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp\left(\beta_2 (Q - \beta_3)\right)} \right) + \beta_4 Q + \beta_5, \tag{3}$$

where $\beta = [\beta_1, \beta_2, \ldots, \beta_5]$ are parameters of the regression model [29], and $Q_p$ is a mapped equivalent of Q. SRCC is calculated as follows:

$$\mathrm{SRCC} = 1 - \frac{6 \sum_{i=1}^{m} d_i^2}{m (m^2 - 1)}, \tag{4}$$

where $d_i$ is the difference between the ranks of the ith image in Q and S, and m is the total number of images. KRCC, in turn, uses the number of concordant pairs in the dataset, $m_c$, and the number of discordant pairs in the dataset, $m_d$, as shown in Eq (5):

$$\mathrm{KRCC} = \frac{m_c - m_d}{0.5\, m (m - 1)}. \tag{5}$$

PCC is defined as:

$$\mathrm{PCC} = \frac{\bar{Q}_p^{T} \bar{S}}{\sqrt{\bar{Q}_p^{T} \bar{Q}_p \, \bar{S}^{T} \bar{S}}}, \tag{6}$$

where $\bar{Q}_p$ and $\bar{S}$ denote mean-removed vectors. RMSE is given by Eq (7):

$$\mathrm{RMSE} = \sqrt{\frac{(Q_p - S)^{T} (Q_p - S)}{m}}. \tag{7}$$
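The mapping of Eq (3) and the four indices of Eqs (4)–(7) can be sketched in a few lines. The following is a minimal illustration in plain Python, not the Matlab code used in the paper; the function names and the average-rank handling of ties are assumptions made for this example.

```python
import math

def logistic_map(q, beta):
    """Five-parameter mapping of Eq (3); beta = [b1, b2, b3, b4, b5]."""
    b1, b2, b3, b4, b5 = beta
    return [b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (v - b3)))) + b4 * v + b5
            for v in q]

def ranks(values):
    """1-based ranks; tied values share the mean of their positions (assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return r

def srcc(q, s):
    """Eq (4): based on squared rank differences."""
    m = len(q)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(q), ranks(s)))
    return 1.0 - 6.0 * d2 / (m * (m * m - 1))

def krcc(q, s):
    """Eq (5): concordant minus discordant pairs over all pairs."""
    m = len(q)
    mc = md = 0
    for i in range(m):
        for j in range(i + 1, m):
            sign = (q[i] - q[j]) * (s[i] - s[j])
            if sign > 0:
                mc += 1
            elif sign < 0:
                md += 1
    return (mc - md) / (0.5 * m * (m - 1))

def pcc(qp, s):
    """Eq (6): correlation of mean-removed vectors."""
    mq, ms = sum(qp) / len(qp), sum(s) / len(s)
    num = sum((a - mq) * (b - ms) for a, b in zip(qp, s))
    den = math.sqrt(sum((a - mq) ** 2 for a in qp) * sum((b - ms) ** 2 for b in s))
    return num / den

def rmse(qp, s):
    """Eq (7): root mean square error between mapped and subjective scores."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(qp, s)) / len(s))
```

SRCC and KRCC depend only on rank order, whereas PCC and RMSE are computed on the mapped scores $Q_p$, which is why the choice of β matters for the latter two.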

Higher SRCC, KRCC, and PCC values are considered better, whereas lower RMSE values are better.

One of these performance indices could be used as an objective function in the considered optimisation problem. Preliminary experiments revealed that maximisation of SRCC or KRCC may lead to a fusion with unacceptably high RMSE values. On the other hand, RMSE requires determination of β. Finally, RMSE was used as the objective function in the considered problem (Eq (8)), and the β components were considered as decision variables in addition to the weights of the fused IQA measures:

$$\min_{x,\,\beta} \; \mathrm{RMSE}(Q_p, S). \tag{8}$$
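As an illustration of the objective in Eq (8), the sketch below evaluates one candidate solution. It assumes that the decision vector is laid out as the n weights followed by the five β components and that `scores` holds one row of measure outputs per image; both conventions, and the function names, are hypothetical.

```python
import math

def fuse(scores, x):
    """Weighted sum of Eq (2): one fused score per image (row of `scores`)."""
    return [sum(w * v for w, v in zip(x, row)) for row in scores]

def objective(decision, scores, s):
    """RMSE of Eq (7) for a decision vector [x_1..x_n, b1..b5] (assumed layout)."""
    n = len(scores[0])
    x = decision[:n]
    b1, b2, b3, b4, b5 = decision[n:]
    q = fuse(scores, x)
    # Nonlinear mapping of Eq (3) applied to the fused scores.
    qp = [b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (v - b3)))) + b4 * v + b5
          for v in q]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(qp, s)) / len(s))
```

With β = [0, 1, 0, 1, 0] the mapping reduces to the identity, which is a convenient sanity check for the weighted-sum part alone.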

A linear combination may produce negative weights, which can be unintuitive in terms of the contribution of the IQA measures that take part in the aggregation. Therefore, different combination types were considered: a convex combination, in which the weights are non-negative and sum to one; an affine combination, in which only the sum condition is preserved; and a conical combination, in which the weights are non-negative. Preliminary results confirmed that the proposed approach provides the best performance without constraining the weights.
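The difference between the combination types can be made concrete with a small helper that classifies a weight vector. This is only an illustration of the definitions; the function name and the tolerance are assumptions.

```python
def combination_type(x, tol=1e-9):
    """Name the most restrictive combination type that the weights satisfy."""
    nonneg = all(w >= -tol for w in x)          # no negative weights
    sums_to_one = abs(sum(x) - 1.0) <= tol      # weights sum to one
    if nonneg and sums_to_one:
        return "convex"
    if sums_to_one:
        return "affine"
    if nonneg:
        return "conical"
    return "linear"                             # unconstrained weights
```

For example, the weights [1.5, -0.5] sum to one but include a negative value, so they form an affine rather than a convex combination.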

In this paper, an optimisation-based fusion was performed using N = 16 IQA measures with publicly available source code. The following techniques were used: VSI [11], FSIM [9], FSIMc [9], GSM [12], IFC [7], IW-SSIM [5], MAD [17], MSSIM [4], NQM [1], PSNR [29], RFSIM [8], SR-SIM [10], SSIM [3], VIF [6], IFS [31], and SFF [32]. In the proposed approach, the vector of decision variables, x, is obtained in a data-driven fashion. Four IQA fusion measures are introduced in this paper, one for each of the four largest widely used IQA image benchmarks. For this purpose, 20% of the reference images from the given dataset along with their distorted counterparts were used for training. In the literature, more images were sometimes utilised in order to tune parameters of developed methods, e.g., 30% [9, 11] or 80% [13], or parameters were generated for each image dataset separately [13, 21–23]. Some approaches used images from all datasets for this purpose [15]. In order to show dataset-independent results, each fusion measure developed in this paper was evaluated on all datasets.

Finally, the vector x for a given dataset d was obtained in the following steps: (1) Selection of 20% of the reference images from the dataset and their distorted equivalents; (2) Evaluation of the images using N = 16 full-reference IQA measures; (3) Selection of $n \le N$ IQA measures and finding the weights of the linear combination of their objective scores together with β. Objective scores of the used measures were, if needed, scaled to the 0-1 range.
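Steps (1) and the final scaling can be sketched as follows. The split is made over reference images so that all distorted versions of a chosen reference end up in the training subset. The random sampling, the seed, and the use of min-max scaling are assumptions, since the paper does not state how the subset was drawn or how scores were scaled.

```python
import random

def split_by_reference(image_refs, fraction=0.2, seed=0):
    """image_refs[i] is the reference-image id of distorted image i.

    Returns indices of distorted images whose reference was selected
    (training subset) and indices of the remaining images.
    """
    refs = sorted(set(image_refs))
    k = max(1, round(fraction * len(refs)))
    chosen = set(random.Random(seed).sample(refs, k))
    train = [i for i, r in enumerate(image_refs) if r in chosen]
    rest = [i for i, r in enumerate(image_refs) if r not in chosen]
    return train, rest

def min_max_scale(values):
    """Scale the scores of one measure to the 0-1 range (assumed min-max rule)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]
```

Splitting by reference image rather than by distorted image avoids having the same image content in both subsets.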

The optimisation problem was solved using a genetic algorithm (GA) [28, 33], since the number of possible solutions grows exponentially with the number of used IQA metrics. The GA uses a population of individuals, where each individual represents a single solution. Then, from generation to generation, better solutions emerge after the selection, crossover and mutation operators are applied. The GA was run for 200 generations, with a population of 100 individuals, an elite count equal to 0.05 of the population size, and a crossover fraction of 0.8. Scattered crossover, Gaussian mutation and stochastic uniform selection rules were used [33]. All presented calculations were performed using Matlab software (version 7.14) with GA Toolbox [34]. After 100 runs, the best solution was selected. An individual in the proposed solution is represented by a real-valued vector whose dimensions refer to the weights of IQA measures, x, and the β values. Parameters of the GA were determined experimentally by observing convergence of the objective function over the generations.
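The GA configuration described above can be illustrated with a compact real-valued GA in plain Python. This is a sketch under stated assumptions and not a reproduction of the Matlab toolbox: the rank-based scaling in the selection step, the mutation spread `sigma`, the initial range of [-1, 1], and all function names are choices made for this example, and the selection of measures is not modelled.

```python
import random

def stochastic_uniform(pop, fitness, count, rng):
    """Stochastic uniform selection on rank-scaled expectations (minimisation)."""
    order = sorted(range(len(pop)), key=lambda i: fitness[i])
    weights = [0.0] * len(pop)
    for rank, i in enumerate(order, start=1):
        weights[i] = 1.0 / rank ** 0.5            # assumed rank scaling
    step = sum(weights) / count
    pointer, cum, i, chosen = rng.uniform(0, step), weights[0], 0, []
    for _ in range(count):
        while cum < pointer and i < len(pop) - 1:  # walk to the pointed segment
            i += 1
            cum += weights[i]
        chosen.append(pop[i])
        pointer += step                            # equally spaced pointers
    return chosen

def run_ga(objective, dim, generations=200, pop_size=100, elite_frac=0.05,
           crossover_frac=0.8, sigma=0.1, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    n_elite = max(1, round(elite_frac * pop_size))
    n_cross = round(crossover_frac * (pop_size - n_elite))
    n_mut = pop_size - n_elite - n_cross
    for _ in range(generations):
        fitness = [objective(ind) for ind in pop]
        ranked = [ind for _, ind in sorted(zip(fitness, pop), key=lambda t: t[0])]
        parents = stochastic_uniform(pop, fitness, 2 * n_cross + n_mut, rng)
        rng.shuffle(parents)
        children = []
        for k in range(n_cross):                   # scattered crossover
            a, b = parents[2 * k], parents[2 * k + 1]
            children.append([u if rng.random() < 0.5 else v for u, v in zip(a, b)])
        for p in parents[2 * n_cross:]:            # Gaussian mutation
            children.append([u + rng.gauss(0, sigma) for u in p])
        pop = ranked[:n_elite] + children          # elites survive unchanged
    fitness = [objective(ind) for ind in pop]
    best = min(range(pop_size), key=lambda i: fitness[i])
    return pop[best], fitness[best]

def sphere(v):
    """Toy objective used only to exercise the sketch."""
    return sum(u * u for u in v)
```

In the setting of this paper, `objective` would be the RMSE of Eq (8) evaluated on the training images and `dim` would be the number of weights plus five β components; `run_ga(sphere, 3, generations=30, pop_size=40)` is enough to check that the loop runs.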

Fig 1 presents a flowchart showing how the introduced fusion measure is obtained and how it is then used for image quality assessment.

Fig 1. Flowchart of the proposed approach.

In an offline training process, the proposed approach is obtained using some of the images from a benchmark dataset. Images are assessed by full-reference IQA measures. Then, a genetic algorithm selects IQA measures and assigns weights to them. The obtained weights for the linear combination of selected measures are used in image quality assessment tasks.

https://doi.org/10.1371/journal.pone.0158333.g001

In experiments, the following four image benchmarks were used: TID2013 [35], TID2008 [36], CSIQ [17], and LIVE [3]. The number of reference images, distortions, and subjects for each dataset are shown in Table 1. Each database contains reference images, their corresponding distorted images and subjective scores.

Finally, four IQA measures, namely Linearly Combined Similarity Measures (LCSIMs), were obtained: (9) (10) (11) (12)

Their corresponding β components are as follows:

Results and Discussion

This section presents experimental evaluation of the proposed approach in comparison with state-of-the-art techniques, as well as discussion on influence of the aggregated IQA measures and β on resulting fusion models.

Comparative evaluation

For evaluation, the four largest image benchmarks (TID2013, TID2008, CSIQ, and LIVE) and four performance indices (SRCC, KRCC, PCC, RMSE) were used.

Table 2 presents evaluation results for the best ten models and LCSIMs. The top two models for each criterion are shown in boldface. The table also contains direct and weighted averages of obtained values. For the weighted average, the number of images in the database is used as its weight. Overall results for RMSE do not take into account LIVE dataset due to range difference.

Table 2. Performance comparison of the resulting fusion measures with IQA models that were used in optimisation.

https://doi.org/10.1371/journal.pone.0158333.t002
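The direct and weighted averages reported in Table 2 follow simple formulas, sketched below. The image counts in the usage note are the commonly reported sizes of the benchmarks and are stated here as an assumption, and the function names are hypothetical.

```python
def direct_average(values):
    """Plain mean of an index over the benchmarks."""
    return sum(values) / len(values)

def weighted_average(values, n_images):
    """Mean of an index weighted by the number of images in each benchmark."""
    return sum(v * n for v, n in zip(values, n_images)) / sum(n_images)
```

With the assumed sizes of 3000 (TID2013), 1700 (TID2008), 866 (CSIQ) and 779 (LIVE) distorted images, TID2013 carries close to half of the total weight, which is why the weighted results favour measures that perform well on it.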

The obtained results show that LCSIM3 clearly outperformed other measures, since it yielded the best results on LIVE and CSIQ. It was also the second best measure on the TID2008 dataset, after LCSIM2. LCSIM1 outperformed other measures on TID2013. Overall results are biased towards techniques that performed well on TID2013, which is the largest benchmark, i.e., LCSIM1, VSI, and IFS. Among results obtained by measures that took part in the LCSIM1 fusion, VSI and MAD are worth noticing. Such good performance of the LCSIM family should be confirmed using statistical significance tests. In order to evaluate statistical significance of the obtained IQA models, hypothesis tests based on the prediction residuals of each measure after non-linear mapping were conducted using a left-tailed F-test [17]. In the test, a smaller residual variance denoted the better prediction. Table 3 presents results of these tests, where a symbol “1”, “0” or “-1” denotes that the IQA fusion measure in the row is statistically better with a confidence greater than 95%, indistinguishable, or worse than the IQA measure in the column.
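The variance-ratio comparison behind the “1”, “0” and “-1” symbols can be sketched as follows. The sketch assumes that both residual vectors have the same length and that the critical value `f_crit` for the chosen confidence level and degrees of freedom is supplied by the caller, for example from an F-distribution table; it does not compute p-values, and the function name is hypothetical.

```python
import statistics

def compare_residuals(res_row, res_col, f_crit):
    """Return 1, 0 or -1 for the row measure against the column measure.

    res_row, res_col: prediction residuals (mapped objective score minus
    subjective score) of the two measures on the same images.
    f_crit: upper critical value of the F distribution (assumed given).
    """
    var_row = statistics.variance(res_row)
    var_col = statistics.variance(res_col)
    if var_col / var_row > f_crit:   # row residuals significantly smaller
        return 1
    if var_row / var_col > f_crit:   # row residuals significantly larger
        return -1
    return 0                         # statistically indistinguishable
```

Both variances are assumed to be non-zero; a measure with identical residuals on every image would need separate handling.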

Significance tests confirm the good performance of the developed family of multimeasures. LCSIM3 was significantly better than other measures on the TID2008, LIVE and CSIQ databases. Its results on TID2013 were also good. However, since it was developed using scores from a dataset which does not contain many of the distortions that are present in the TID2013 benchmark, its objective scores were less correlated with subjective scores in this case than the scores of VSI, FSIMc, or IFS. Consequently, the LCSIM that was obtained on TID2013 (LCSIM1) performed worse than other measures on the LIVE benchmark.

Fig 2 presents the scatter plots for LCSIM3 and the two best performing IQA models for each benchmark. It can be seen that compared models for databases other than TID2013 yielded less accurate quality predictions for large DMOS values and small MOS values (i.e., in presence of severe distortions) than LCSIM3. Fig 3, in turn, contains absolute values of the difference between subjective scores and objective scores for the five best IQA measures after nonlinear fitting (Eq (3)). Here, the values were obtained for 50 images from the most popular LIVE dataset. The figure shows how scores obtained by IQA measures differ from the expected scores; smaller values are considered better. It can be seen that the introduced fusion measure, LCSIM3, returned scores which are visibly closer to subjective scores obtained in tests with human subjects. This is also confirmed by RMSE values reported for this dataset.

Fig 2. Scatter plots of subjective opinion scores against scores obtained by the two best IQA measures and LCSIM3 on used datasets.

Different types of distortions are represented by different colours; the set of colours is coherent within a dataset. Curves fitted with logistic functions are also shown.

https://doi.org/10.1371/journal.pone.0158333.g002

Fig 3. Absolute values of the difference between subjective scores and nonlinearly fitted objective scores for 50 exemplary images from LIVE dataset.

For each image, a smaller value denotes objective assessment which is closer to human evaluation.

https://doi.org/10.1371/journal.pone.0158333.g003

The proposed family of multimeasures aggregates different IQA measures. Therefore, it is worth examining their time and memory consumption. The processing time and memory requirements have been determined for all aggregated IQA measures assessing an exemplary image from the TID2013 dataset. The results are shown in Table 4. It can be seen that MAD and VIF are the most demanding techniques. Taking into account that processing time requirements for image quality assessment algorithms are less demanding than for video quality assessment techniques, the timings obtained on an ordinary 2200 MHz CPU seem to be acceptable. LCSIMs aggregate several IQA measures; therefore, their running time will be longer in the case of sequential execution of the used measures, or close to the execution time of the MAD measure in the case of a more memory-consuming parallel implementation.

Table 4. Time and memory costs of IQA measures used in the optimisation.

https://doi.org/10.1371/journal.pone.0158333.t004

It would be desirable to compare the proposed multimeasures with other related fusion IQA measures. Table 5 contains such a comparative evaluation based on SRCC values. SRCC was used as a basis for comparison since many papers do not report other performance indices. The two best results for a given benchmark dataset are written in boldface. Some results were not reported in the referred works; therefore, they are denoted by “-”. IQA measures which were developed using images from the benchmark in the column are excluded from the comparison. Moreover, overall results were calculated excluding TID2013 since some measures have not been evaluated on it. Furthermore, in order to provide a fair comparison, overall results exclude works in which authors obtained a separate IQA measure for each benchmark without providing cross-database evaluation, e.g., [18, 19, 21–23], or [37]. Results for approaches that are not dataset independent are written in italics.

Table 5. Comparison of the approach with other fusion IQA measures based on SRCC values.

https://doi.org/10.1371/journal.pone.0158333.t005

Evaluation results show that LCSIM3 and LCSIM2 outperformed other approaches which use fusion of IQA measures. Among other measures, DOG-SSIM and ESIM provided good results on the TID2013 benchmark, and the approach developed by Barri et al. turned out to be the second best technique on the CSIQ dataset. The outstanding performance of LCSIM3 and LCSIM2 is also confirmed by overall results. Here, they are followed by ESIM, LCSIM4, LCSIM1, and DOG-SSIM. Most of these models were trained on TID2008, except LCSIM3, which was trained on images from CSIQ. This happened since all three most popular datasets share the same types of distortions.

Influence of parameters and IQA measures on fusion

The already presented results confirm good performance of obtained IQA fusion measures in comparison with state-of-the-art fusion and single IQA measures. However, it would be desirable to answer why some measures took part in the fusion more often than others. A contribution of aggregated models also requires some attention, since the linear combination can produce unintuitive negative weights.

At first, in order to show the contribution of a given measure, SRCC values between objective and subjective scores were obtained for each distortion type. This may explain why some measures were involved in a fusion, and also show how well the developed LCSIMs perform in comparison with the IQA measures that were used in optimisation, from the distortion type point of view. Table 6 contains SRCC values of the best ten IQA models and LCSIMs obtained on benchmark datasets. The two best IQA measures for each distortion type are written in boldface.

Results for distortion types reveal that VSI, FSIMc, GSM, VIF, IFS, and SFF are among the best single IQA models. They were also often a part of fusion models, as can be seen in Eqs (9)–(12). Here, the LCSIM family was better than or close to the best IQA models and showed outstanding performance on the CSIQ dataset. In order to investigate further why some measures were fused together, SRCC values between IQA models on the CSIQ dataset were obtained. They are shown in Table 7. This time the correlation sign was preserved, since it may suggest why some measures have negative weights in fusions. Negative correlations can also be seen in Fig 2. Similar pairwise relations between IQA models were noticed on other datasets. It can be seen that some measures are less correlated with each other while preserving good correlation with subjective scores. VIF is the measure least correlated with MSSIM and MAD, and all three measures perform well on the CSIQ dataset. SRCC values for these measures are written in boldface in Table 7. IQA measures in the pairs MAD-VIF and MSSIM-VIF are complementary and thus likely to be fused together.

These findings were confirmed in an experiment in which a predefined number of IQA measures, $k \le N$, could take part in the fusion. Such reduced fusion models are helpful to determine the contribution of each fused measure. In the experiment, k varied from 2 to 5. In order to estimate the influence of the IQA measure on the results obtained by the fusion model, the percentage decrease of RMSE without the measure was calculated. Table 8 contains such reduced LCSIMs for the CSIQ dataset, their RMSE values, and contributions. The table also contains LCSIM3, since it was developed on images from CSIQ.

Table 8. Results of the experiment with predefined number of aggregated IQA models on CSIQ dataset.

https://doi.org/10.1371/journal.pone.0158333.t008

Results shown in Table 8 confirm that the IQA measures that achieve good performance on the CSIQ dataset and are less correlated with each other are likely to be aggregated. In the obtained fusion measures, weights do not reflect well the contribution of the selected IQA measures, as can be seen in the case of three (k = 3) fused models, where VIF and MAD with lower weights contributed more than SFF. The sign of the weight depends on the correlation of the measure with subjective scores (MOS or DMOS), but it can also be used as compensation, making the resulting vector of objective scores closer to the vector of subjective scores, since the optimisation utilises the RMSE between them for finding better aggregated models.

It is worth noticing that RMSE results obtained by all measures developed in experiments with the predefined number of IQA measures are better than results of state-of-the-art approaches on this dataset (see Table 8).

MAD, VIF, and MSSIM contributed the most to LCSIM measures obtained on CSIQ dataset. This can also be observed for the remaining LCSIM measures, where the best contributing three IQA single models are as follows: MAD (19.76%), IFS (16.90%), and PSNR (16.71%) to LCSIM1, VIF (15.87%), MAD (8.31%), and SSIM (4.49%) to LCSIM2, VIF (38.54%), MAD (33.87%), and GSM (4.22%) to LCSIM4.

The β used in calculation of RMSE (and PCC) also influenced the results. In order to show its influence, each β component, β = [β1, β2, …, β5], determined in optimisation for a given LCSIM was changed in the range 0.1 to 20 with the step 0.1, while other components remained unchanged. Table 9 presents minimum, maximum, mean and standard deviation of RMSE values for each component calculated on benchmark datasets. It can be seen that β4 has the largest influence on LCSIM1, β2 on LCSIM2, β3 on LCSIM3, and all components are similarly important to LCSIM4.
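The sensitivity experiment described above can be sketched as a sweep over one β component at a time. The range and step follow the text; the use of the sample standard deviation, the overflow guard, and the function names are assumptions of this example.

```python
import math
import statistics

def mapped(q, beta):
    """Nonlinear mapping of Eq (3) with a guard against exp overflow."""
    b1, b2, b3, b4, b5 = beta
    out = []
    for v in q:
        try:
            logistic = 1.0 / (1.0 + math.exp(b2 * (v - b3)))
        except OverflowError:
            logistic = 0.0          # exp term is effectively infinite
        out.append(b1 * (0.5 - logistic) + b4 * v + b5)
    return out

def rmse(qp, s):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(qp, s)) / len(s))

def sweep_component(q, s, beta, index, start=0.1, stop=20.0, step=0.1):
    """Vary beta[index] over [start, stop]; other components stay fixed."""
    count = int(round((stop - start) / step)) + 1
    errors = []
    for k in range(count):
        trial = list(beta)
        trial[index] = start + k * step
        errors.append(rmse(mapped(q, trial), s))
    return {"min": min(errors), "max": max(errors),
            "mean": statistics.mean(errors), "std": statistics.stdev(errors)}
```

Calling `sweep_component` for index 0 to 4 on the fused scores of a given LCSIM yields one row of minimum, maximum, mean and standard deviation per β component, in the manner of Table 9.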

Conclusions

In this paper, a multimeasure resulting from a fusion of full-reference IQA measures is presented. The fusion was formulated as an optimisation problem that was solved using a genetic algorithm, which was also responsible for the selection of appropriate IQA measures. Evaluation of the proposed approach on the four largest widely used image benchmarks reveals that the LCSIM family of measures performs better than the compared state-of-the-art IQA models in terms of prediction quality reflected by SRCC, KRCC, PCC, and RMSE. The contribution of the aggregated IQA measures was also investigated in the paper.

Further extension of the approach could involve using other IQA measures for fusion. For this purpose, Matlab source code that allows running the optimisation with any newly developed measure, given its objective scores for the used image benchmarks, and evaluating the results is available for download at http://marosz.kia.prz.edu.pl/LCSIM.html. Another direction of future research would be to develop a fusion measure oriented towards a given type of distortion, or a measure which aggregates full-reference IQA measures with a small memory footprint and short computation time.

Author Contributions

Conceived and designed the experiments: MO. Performed the experiments: MO. Analyzed the data: MO. Contributed reagents/materials/analysis tools: MO. Wrote the paper: MO.

References

  1. Damera-Venkata N, Kite TD, Geisler WS, Evans BL, Bovik AC. Image quality assessment based on a degradation model. IEEE T Image Process. 2000 Apr;9(4):636–650.
  2. Wang Z, Bovik AC. A universal image quality index. IEEE Signal Proc Let. 2002 Mar;9(3):81–84.
  3. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: From error visibility to structural similarity. IEEE T Image Process. 2004 Apr;13(4):600–612.
  4. Wang Z, Simoncelli EP, Bovik AC. Multi-scale structural similarity for image quality assessment. In: Proc. IEEE Int. Conf. on Signals, Systems, and Computers, (ASILOMAR); 2003. p. 1398–1402.
  5. Wang Z, Li Q. Information content weighting for perceptual image quality assessment. IEEE T Image Process. 2011 May;20(5):1185–1198.
  6. Sheikh HR, Bovik AC. Image information and visual quality. IEEE T Image Process. 2006 Feb;15(2):430–444.
  7. Sheikh HR, Bovik AC, de Veciana G. An information fidelity criterion for image quality assessment using natural scene statistics. IEEE T Image Process. 2005 Dec;14(12):2117–2128.
  8. Zhang L, Zhang L, Mou X. RFSIM: A feature based image quality assessment metric using Riesz transforms. In: Proc. IEEE Int. Conf. on Image Processing (ICIP). IEEE; 2010. https://doi.org/10.1109/icip.2010.5649275
  9. Zhang L, Zhang L, Mou X, Zhang D. FSIM: A feature similarity index for image quality assessment. IEEE T Image Process. 2011 Aug;20(8):2378–2386.
  10. Zhang L, Li H. SR-SIM: A fast and high performance IQA index based on spectral residual. In: Proc. IEEE Int. Conf. on Image Processing (ICIP). IEEE; 2012. https://doi.org/10.1109/icip.2012.6467149
  11. Zhang L, Shen Y, Li H. VSI: A visual saliency-induced index for perceptual image quality assessment. IEEE T Image Process. 2014 Oct;23(10):4270–4281.
  12. Liu A, Lin W, Narwaria M. Image quality assessment based on gradient similarity. IEEE T Image Process. 2012 Apr;21(4):1500–1512.
  13. Wu J, Lin W, Shi G. Image quality assessment with degradation on spatial structure. IEEE Signal Proc Let. 2014 Apr;21(4):437–440.
  14. Wang F, Sun X, Guo Z, Huang Y, Fu K. An object-distortion based image quality similarity. IEEE Signal Proc Let. 2015 Oct;22(10):1534–1537.
  15. Zhou F, Lu Z, Wang C, Sun W, Xia ST, Liao Q. Image quality assessment based on inter-patch and intra-patch similarity. PLoS ONE. 2015 Mar;10(3):e0116312. pmid:25793282
  16. Liu M, Yang X. A new image quality approach based on decision fusion. In: Ma J, Yin Y, Yu J, Zhou S, editors. Proc. Int. Conf. on Fuzzy Systems and Knowledge Discovery (FSKD). IEEE; 2009. p. 10–14. https://doi.org/10.1109/FSKD.2008.469
  17. Larson EC, Chandler DM. Most apparent distortion: Full-reference image quality assessment and the role of strategy. J Electron Imaging. 2010;19(1):011006.
  18. Okarma K. Combined full-reference image quality metric linearly correlated with subjective assessment. In: Artificial Intelligence and Soft Computing. Springer; 2010. p. 539–546. https://doi.org/10.1007/978-3-642-13208-7_67
  19. Okarma K. Extended Hybrid Image Similarity - combined full-reference image quality metric linearly correlated with subjective scores. Elektronika ir Elektrotechnika. 2013;19(10):129–132.
  20. Oszust M. Decision fusion for image quality assessment using an optimization approach. IEEE Signal Proc Let. 2016 Jan;23(1):65–69.
  21. Lahouhou A, Viennet E, Beghdadi A. Selecting low-level features for image quality assessment by statistical methods. CIT. 2010;18(2).
  22. Peng P, Li ZN. A mixture of experts approach to multi-strategy image quality assessment. In: Campilho A, Kamel M, editors. Image Analysis and Recognition. vol. 7324 of Lecture Notes in Computer Science. Springer Berlin-Heidelberg; 2012. p. 123–130. https://doi.org/10.1007/978-3-642-31295-3_15
  23. Peng P, Li ZN. Regularization of the structural similarity index based on preservation of edge direction. In: Systems, Man, and Cybernetics (SMC), 2012 IEEE International Conference on; 2012. p. 2127–2132. https://doi.org/10.1109/ICSMC.2012.6378054
  24. Saha A, Wu QMJ. Full-reference image quality assessment by combining global and local distortion measures. CoRR. 2014;abs/1412.5488. Available from: http://arxiv.org/abs/1412.5488 (Accessed 09.01.2016).
  25. Chandler DM. Seven challenges in image quality assessment: Past, present, and future research. ISRN Signal Processing. 2013;2013:53.
  26. Ghosh J. Multiclassifier systems: Back to the future. In: Multiple classifier systems. Springer; 2002. p. 1–15. https://doi.org/10.1007/3-540-45428-4_1
  27. Kittler J, Hatef M, Duin RP, Matas J. On combining classifiers. IEEE Trans Pattern Anal Mach Intell. 1998;20(3):226–239.
  28. Kuncheva L, Jain LC. Designing classifier fusion systems by genetic algorithms. IEEE T Evolut Comput. 2000;4(4):327–336.
  29. Sheikh HR, Sabir MF, Bovik AC. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE T Image Process. 2006 Nov;15(11):3440–3451.
  30. Group VQE. Final report from the video quality experts group on the validation of objective models of video quality assessment, Phase II (FR_TV2). 2003. Available from: https://www.itu.int/md/T01-SG09-C-0060 (Accessed 09.01.2016).
  31. Chang HW, Zhang QW, Wu QQ, Gan Y. Perceptual image quality assessment by independent feature detector. Neurocomputing. 2015;151, part 3:1142–1152.
  32. Chang HW, Yang H, Gan Y, Wang MH. Sparse feature fidelity for perceptual image quality assessment. IEEE T Image Process. 2013 Oct;22(10):4007–4018.
  33. Goldberg DE. Genetic algorithms in search, optimization and machine learning. 1st ed. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.; 1989.
  34. MATLAB version 7.14 (R2012a) Natick, MA, USA: The MathWorks, Inc.; 2012.
  35. Ponomarenko N, Jin L, Ieremeiev O, Lukin V, Egiazarian K, Astola J, et al. Image database TID2013: Peculiarities results and perspectives. Signal Process-Image. 2015 Jan;30:57–77.
  36. Ponomarenko N, Lukin V, Zelensky A, Egiazarian K, Carli M, Battisti F. TID2008 - A database for evaluation of full-reference visual quality assessment metrics. Advances of Modern Radioelectronics. 2009;10:30–45.
  37. Liu TJ, Lin W, Kuo CCJ. Image quality assessment using multi-method fusion. IEEE T Image Process. 2013 May;22(5):1793–1807.
  38. Li S, Zhang F, Ma L, Ngan KN. Image quality assessment by separately evaluating detail losses and additive impairments. IEEE T Multimedia. 2011 Oct;13(5):935–949.
  39. Jin L, Egiazarian K, Kuo CCJ. Perceptual image quality assessment using block-based multi-metric fusion (BMMF). In: Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP); 2012. p. 1145–1148. https://doi.org/10.1109/ICASSP.2012.6288089
  40. Wu J, Lin W, Shi G, Liu A. Perceptual quality metric with internal generative mechanism. IEEE T Image Process. 2013 Jan;22(1):43–54.
  41. Barri A, Dooms A, Jansen B, Schelkens P. A locally adaptive system for the fusion of objective quality measures. IEEE T Image Process. 2014 Jun;23(6):2446–2458.
  42. Pei SC, Chen LH. Image quality assessment using human visual DOG model fused with random forest. IEEE T Image Process. 2015 Nov;24(11):3282–3292.