Xi'an Technological University
Subject: Computer Science, Software Engineering
eISSN: 2470-8038
Minjuan Gao ^{*} / Hongshe Dang ^{*} / Xuande Zhang ^{*}
Keywords : Image Quality Assessment, Human Visual System, Upper Threshold, Truncating Gradient
Citation Information : International Journal of Advanced Network, Monitoring and Controls. Volume 5, Issue 4, Pages 27-33, DOI: https://doi.org/10.21307/ijanmc-2020-034
License : (CC-BY-NC-ND 4.0)
Published Online: 11-January-2021
Objective image quality assessment (IQA) aims to develop computational models to predict the perceptual image quality consistent with subjective evaluations. As image information is presented by the change in intensity values in the spatial domain, the gradient, as a basic tool for measuring the change, is widely used in IQA models. However, does the change measured by the gradient actually correspond to the change perceived by the human visual system (HVS)? To explore this issue, in this paper, we analyze how the ability of the HVS to perceive changes is affected by the upper threshold, and we propose an IQA index based on an adaptively truncating gradient. Specifically, the upper threshold at each pixel in an image is adaptively determined according to the image content, and the adaptively truncating gradient is obtained by retaining the part of the gradient magnitude that is less than the upper threshold and truncating the part that is greater than the upper threshold. Then, the distorted image quality is calculated by comparing the similarity of the adaptively truncating gradient between a reference image and the distorted image. Experimental results on six benchmark databases demonstrate that the proposed index correlates well with human evaluations.
Image quality assessment deals with the quantitative evaluation of the quality of images and can be widely used in image acquisition, compression, storage, transmission and other image processing systems. Generally, human beings are the ultimate receivers of images. Subjective evaluation by humans is a reliable IQA method, but it is cumbersome and difficult to apply in real-world scenarios. An objective IQA method aims to design mathematical models to automatically measure the image quality in a way that is consistent with human evaluations. According to the availability of ground-truth images, objective IQA indices fall into three categories: full-reference (FR), reduced-reference (RR) and no-reference (NR) models [1]. In this paper, the discussion is focused on FR models.
At present, there are two popular techniques for constructing FR models: knowledge-based and learning-based techniques. Learning-based methods, typically built on deep learning, learn the evaluation model in an end-to-end manner, but the resulting "black-box" model is hard to interpret. Furthermore, this approach requires a large number of training samples, and the cost of obtaining high-quality and convincing samples is relatively high. Currently, the commonly used method for obtaining samples is still data augmentation. In this work, we emphasize the knowledge-based approach, which uses knowledge about the HVS to heuristically construct IQA models. Investigating these models reveals that the gradient feature is widely employed. Regarding the relationship between the gradient feature and the IQA task, the gradient has at least the following two characteristics. 1. The information contained in natural images is presented by changes in intensity value or color in the spatial domain. In the extreme cases, neither the constant image (no variation at all) nor the pure noise image (variation in all directions) conveys any information. Thus, features that measure change are widely used in IQA, with the gradient as the basic tool for measuring change. 2. The judgment of the image quality level in IQA is different from the classic discrimination task. The features for discrimination tasks, such as face recognition and fingerprint recognition, should be robust to image distortion, while the features for IQA should be sensitive to image distortion. The gradient feature is sensitive to both image distortion and image content, and it is therefore not robust.
Representative FR models using the gradient feature include the feature similarity index (FSIM) [2], gradient magnitude similarity deviation index (GMSD) [3], superpixel-based similarity index (SPSIM) [4] and directional anisotropic structure metric (DASM) [5]. In the FSIM and GMSD, the image gradient magnitude is employed as the fundamental feature. SPSIM is computed on the basis of three features: superpixel luminance, superpixel chrominance and pixel gradient. The DASM is obtained by incorporating the gradient magnitude, anisotropy and local directivity features. Objective IQA models are designed by simulating the behaviors of the HVS, which integrates perception, understanding and assessment functions; that is, humans evaluate the image quality in the HVS perception space. Therefore, the features for IQA should be the subjective quantity perceived by the HVS. The gradient is often directly used in IQA models as an effective feature to measure change; however, does the change measured by the gradient actually correspond to that perceived by the HVS? In fact, the change measured by the gradient is an objective quantity (objective physical stimulus), while that perceived by the HVS is a subjective quantity (subjective response). Thus, how can one map the objective quantity to the subjective quantity? This mapping function is nonlinear, and it is difficult to accurately describe its form. Empirically, the ability of the human perception system to sense changes has an upper threshold: once the objective change exceeds this threshold, the subjective change increases only insignificantly. This holds, for example, for the perceived saltiness of a salt solution, the perceived outside temperature, and the perceived weight of a carried object.
In this paper, we discuss how the upper threshold affects the ability of the HVS to perceive changes, and we employ the adaptively truncating gradient to measure the change perceived by the HVS. We propose an IQA index based on the adaptively truncating gradient. Specifically, the upper threshold at each pixel in the image is adaptively determined according to the image content, and the adaptively truncating gradient is obtained by retaining the part of the gradient magnitude that is less than the upper threshold and truncating the part that is greater than the upper threshold. Experimental results on public databases show that the proposed index correlates well with the subjective judgments.
The image information is presented by the change in the intensity values in the spatial domain, and this change may be destroyed by degradation of the image quality. The gradient feature can effectively measure the change and is widely used in IQA algorithms. The image gradient can be obtained by convolving the image with a gradient operator, such as the Sobel, Roberts, Scharr or Prewitt operator. Usually, different gradient operators yield different performance in an IQA model. This problem was discussed in [2,6], where the experimental results showed that the Scharr operator obtains a slightly better performance than the others. Here, we adopt a 3×3 Scharr operator whose templates along the horizontal (H) and vertical (V) directions take the following form:
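A reconstruction of Eq. (1), assuming the commonly used 1/16-normalized 3×3 Scharr templates (the normalization and the sign convention are assumptions, since the source text does not state them):

$$h_{H}=\frac{1}{16}\begin{bmatrix}3 & 0 & -3\\ 10 & 0 & -10\\ 3 & 0 & -3\end{bmatrix},\qquad h_{V}=\frac{1}{16}\begin{bmatrix}3 & 10 & 3\\ 0 & 0 & 0\\ -3 & -10 & -3\end{bmatrix} \tag{1}$$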
Denote r = [r_{1}, ⋯, r_{i}, ⋯, r_{N}] for a reference image and d = [d_{1}, ⋯, d_{i}, ⋯, d_{N}] for a distorted image, where i is the pixel index, and N is the total number of pixels. The image gradients in the horizontal and vertical directions can be obtained by convolution of the image with h_{H} and h_{V}, and the gradient magnitude is computed from their root mean square. The gradient magnitudes of r and d at each pixel i, denoted as G(r, i) and G(d, i), are calculated as follows:
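A reconstruction of Eq. (2), assuming the usual form in which the gradient magnitude is the square root of the sum of the squared horizontal and vertical responses:

$$G(r,i)=\sqrt{\left(r\otimes h_{H}\right)^{2}(i)+\left(r\otimes h_{V}\right)^{2}(i)},\qquad G(d,i)=\sqrt{\left(d\otimes h_{H}\right)^{2}(i)+\left(d\otimes h_{V}\right)^{2}(i)} \tag{2}$$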
where the symbol ⊗ denotes the convolution operation.
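The gradient computation can be illustrated with a minimal NumPy sketch. It is not the authors' MATLAB implementation: the 1/16 normalization of the Scharr templates, the edge-replicating border, the square-root-of-sum-of-squares magnitude, and the function names `convolve3x3` and `gradient_magnitude` are all assumptions made for illustration.

```python
import numpy as np

# Scharr templates; the 1/16 normalization is an assumption.
H_H = np.array([[3, 0, -3],
                [10, 0, -10],
                [3, 0, -3]], dtype=float) / 16.0
H_V = H_H.T


def convolve3x3(image, kernel):
    """'Same'-size 2-D convolution with a 3x3 kernel (edge-replicated border)."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    padded = np.pad(img, 1, mode="edge")
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out = np.zeros_like(img)
    for a in range(3):
        for b in range(3):
            out += flipped[a, b] * padded[a:a + rows, b:b + cols]
    return out


def gradient_magnitude(image):
    """Gradient magnitude G(., i) at every pixel of a grayscale image."""
    g_h = convolve3x3(image, H_H)
    g_v = convolve3x3(image, H_V)
    return np.sqrt(g_h ** 2 + g_v ** 2)
```

With this normalization, a vertical step edge of height 16 produces a magnitude of 16 in the two columns adjacent to the edge and 0 in flat regions.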
The image gradient only reflects the objective changes in images. Since human evaluation of image quality is carried out in the HVS perception space, the image features extracted for IQA models should reflect the subjective changes perceived by the HVS. We consider that the ability of the HVS to perceive changes is subject to an upper threshold. When the objective change exceeds the upper threshold, the subjective change does not noticeably increase. In this study, we define the adaptively truncating gradient to measure the subjective change sensed by the HVS.
Denote T as the upper threshold. We define a truncating function trunc(·). Any given variable x is retained when it is less than T and truncated to T when it is greater than T. The specific expression is as follows:
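A reconstruction of Eq. (3) from the description above:

$$\mathrm{trunc}(x)=\begin{cases}x, & x<T\\ T, & x\ge T\end{cases} \tag{3}$$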
The truncating gradients of r and d at each pixel i are denoted as G_{T}(r, i) and G_{T}(d, i), and the upper threshold at this point is denoted as T(i). Using formula (3), the calculation of G_{T}(r, i) is as follows:
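A reconstruction of Eq. (4), obtained by applying the truncating function with the pixel-wise threshold T(i):

$$G_{T}(r,i)=\begin{cases}G(r,i), & G(r,i)<T(i)\\ T(i), & G(r,i)\ge T(i)\end{cases} \tag{4}$$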
In Eq. (4), if the value of G(r, i) is greater than T(i), then G(r, i) is truncated, and the truncating gradient G_{T}(r, i) is set to T(i). That is, the part of the gradient magnitude that is greater than the upper threshold is masked. Otherwise, G(r, i) is not masked, and the truncating gradient G_{T}(r, i) is set equal to G(r, i). That is, the part of the gradient magnitude that is less than the upper threshold can be perceived by the HVS.
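The truncation described here amounts to an element-wise minimum between the gradient magnitude map and the threshold map. The following is a minimal sketch; the function name `truncating_gradient` and the sample values are hypothetical.

```python
import numpy as np


def truncating_gradient(grad, threshold):
    """Keep gradient magnitudes below the threshold; clip the rest to it.

    `grad` and `threshold` may be arrays of the same shape (pixel-wise
    threshold T(i)) or `threshold` may be a scalar.
    """
    grad = np.asarray(grad, dtype=float)
    threshold = np.asarray(threshold, dtype=float)
    return np.minimum(grad, threshold)


# Hypothetical magnitudes with a constant threshold of 85.
example = truncating_gradient([10.0, 90.0, 250.0], 85.0)
```

In this example the first value is retained, while the second and third are clipped to 85.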
Similarly, using formula (3), G_{T}(d, i) is calculated as follows:
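A reconstruction of Eq. (5), mirroring the expression for the reference image:

$$G_{T}(d,i)=\begin{cases}G(d,i), & G(d,i)<T(i)\\ T(i), & G(d,i)\ge T(i)\end{cases} \tag{5}$$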
Clearly, the selection of the upper threshold T(i) is critical when calculating the truncating gradients G_{T}(r, i) and G_{T}(d, i) in Eqs. (4) and (5). According to Weber’s law, the ratio of the stimulus change that causes a just noticeable difference (JND) to the original stimulus intensity is a constant. In psychology, the HVS has the property of light adaptation, and the perception of luminance obeys Weber’s law [7]: the just noticeable luminance increment over the background is related to the background luminance.
Inspired by this observation, and as a counterpart to Weber’s law, we consider that the upper threshold for truncating the significantly perceptible stimulus change is also related to the original stimulus intensity. Because different pixels in the image have different gray values, the original stimulus intensity also differs from pixel to pixel. Here, we adaptively determine the upper threshold according to the background luminance of different areas of the image.
The adaptive upper threshold is defined as follows:
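A reconstruction of Eq. (6), inferred from the parameter analysis in the experiments, where T_{0} = 3 gives a threshold range of approximately [0, 255/3] for 8-bit images:

$$T(i)=\frac{I(i)}{T_{0}} \tag{6}$$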
where T_{0} is an adjustable threshold parameter (the details of selecting T_{0} are presented in Section III-A), and I(i) is the larger of the luminance values of r and d at pixel i.
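A reconstruction of Eq. (7) from this description:

$$I(i)=\max\left(\overline{r}\left(i\right),\ \overline{d}\left(i\right)\right) \tag{7}$$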
In formula (7), the luminance values $$\overline{r}\left(i\right)$$ and $$\overline{d}\left(i\right)$$ at pixel i of r and d are estimated by formulas (8) and (9). For the reference image r, denote the square neighborhood centered at pixel i with radius t as $${\mathbf{\Omega}}_{i}^{r}$$, and let the intensity value of any pixel in the neighborhood be r_{i,j}, $$j\in {\mathbf{\Omega}}_{i}^{r}$$. Similarly, for the distorted image d, denote the square neighborhood centered at pixel i with radius t as $${\mathbf{\Omega}}_{i}^{d}$$, and let the intensity value of any pixel in the neighborhood be d_{i,j}, $$j\in {\mathbf{\Omega}}_{i}^{d}$$.
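A reconstruction of Eqs. (8) and (9), assuming the luminance is estimated as the plain average of the intensities over the neighborhood:

$$\overline{r}\left(i\right)=\frac{1}{m}\sum_{j\in {\mathbf{\Omega}}_{i}^{r}} r_{i,j} \tag{8}$$

$$\overline{d}\left(i\right)=\frac{1}{m}\sum_{j\in {\mathbf{\Omega}}_{i}^{d}} d_{i,j} \tag{9}$$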
where m = (2t + 1)^{2} is the number of pixels in the neighborhood.
Based on Eq. (6), the value of the upper threshold at each pixel in an image can be adaptively determined according to the image content. Then, the adaptively truncating gradient is obtained by formulas (4) and (5). Figure 1 shows the gradient map and the adaptively truncating gradient map corresponding to the reference image and the distorted image. It can be seen that the maximum amplitude of the gradient map is approximately 250, while the maximum amplitude of the adaptively truncating gradient is approximately 70.
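The adaptive threshold can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: the luminance is taken as a box-filter mean over a (2t + 1) × (2t + 1) window, the border is handled by edge replication (the text does not specify border handling), the threshold is taken as the larger local mean divided by T_{0}, and the function names `local_mean` and `adaptive_upper_threshold` are hypothetical.

```python
import numpy as np


def local_mean(image, t):
    """Mean intensity over a (2t+1) x (2t+1) window centred at each pixel.

    Uses an integral image; the border is edge-replicated (an assumption).
    """
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    k = 2 * t + 1
    padded = np.pad(img, t, mode="edge")
    # Integral image with a leading row and column of zeros.
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    window_sum = (integral[k:k + rows, k:k + cols]
                  - integral[:rows, k:k + cols]
                  - integral[k:k + rows, :cols]
                  + integral[:rows, :cols])
    return window_sum / (k * k)


def adaptive_upper_threshold(ref, dist, t=51, t0=3.0):
    """Pixel-wise upper threshold: larger background luminance divided by t0."""
    background = np.maximum(local_mean(ref, t), local_mean(dist, t))
    return background / t0
```

The adaptively truncating gradient is then the element-wise minimum of a gradient magnitude map and this threshold map, for example `np.minimum(G, adaptive_upper_threshold(ref, dist))` for a magnitude map `G` of the same shape.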
With the adaptively truncating gradient defined, the local quality of the distorted image is predicted by the similarity between the adaptively truncating gradients of r and d, which is defined as follows:
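A reconstruction of Eq. (10), assuming the similarity form commonly used in gradient-based FR indices, which is consistent with the stated range and the role of the constant C:

$$S(i)=\frac{2\,G_{T}(r,i)\,G_{T}(d,i)+C}{G_{T}(r,i)^{2}+G_{T}(d,i)^{2}+C} \tag{10}$$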
where the parameter C is introduced to prevent the denominator from becoming zero and to provide numerical stability. The range of S(i) is from 0 to 1. S(i) approaches 0 when G_{T}(r, i) and G_{T}(d, i) are quite different, and it reaches the maximal value 1 when G_{T}(r, i) is equal to G_{T}(d, i).
The overall quality score of the distorted image is predicted from the local quality S(i) and is calculated as follows:
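A possible form of Eq. (11), assuming simple average pooling over all N pixels (the pooling strategy is an assumption; the source text only states that the score is derived from S(i) and that higher is better):

$$\mathrm{Score}=\frac{1}{N}\sum_{i=1}^{N}S(i) \tag{11}$$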
A higher score indicates better image quality.
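The local similarity and the pooling step can be sketched as below. The similarity form, the mean pooling, and the function names `truncated_gradient_similarity` and `quality_score` are assumptions made for illustration; only the default C = 1600 comes from the parameter settings reported in the experiments.

```python
import numpy as np


def truncated_gradient_similarity(gt_ref, gt_dist, c=1600.0):
    """Local similarity map S(i) between two truncating-gradient maps."""
    gt_ref = np.asarray(gt_ref, dtype=float)
    gt_dist = np.asarray(gt_dist, dtype=float)
    return (2.0 * gt_ref * gt_dist + c) / (gt_ref ** 2 + gt_dist ** 2 + c)


def quality_score(gt_ref, gt_dist, c=1600.0):
    """Overall score by average pooling of S(i) (pooling is an assumption)."""
    return float(np.mean(truncated_gradient_similarity(gt_ref, gt_dist, c)))
```

For identical maps the score is exactly 1. For a pixel with truncating gradients 0 and 40 and C = 1600, the local similarity is 1600 / 3200 = 0.5.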
All the experiments in this study were implemented in MATLAB R2016b and executed on a Lenovo Ideapad700 laptop with an Intel Core i5-6300HQ 2.3-GHz CPU and 4 GB RAM. Several well-known FR metrics were compared with the proposed method, including PSNR, SSIM [1], FSIM [2], GMSD [3], DASM [5], IFC [8], VIF [9], MS-SSIM [10], and SSRM [11]. To evaluate these metrics broadly, six public databases were employed for the experiments: TID2013 [12], TID2008 [13], CSIQ [14], LIVE [15], IVC [16] and A57 [17]. The TID2008 database consists of 25 reference images and a total of 1700 distorted images, generated by applying 17 different types of distortion at four different levels to each reference image. The TID2013 database is an expanded version of TID2008 and contains 3000 distorted images with 24 distortion types. The LIVE database includes 29 reference images and 779 distorted images with five distortion types. The CSIQ database contains 30 original images and 866 distorted images degraded by six types of distortion. The IVC database consists of 10 reference images and 185 distorted images. The A57 database includes 3 reference images and 54 distorted images. Note that for the color images in these databases, only the luminance component is evaluated.
Four commonly used performance criteria are employed to evaluate the competing IQA metrics. The Spearman rank order correlation coefficient (SROCC) and Kendall rank order correlation coefficient (KROCC) are adopted for measuring the prediction monotonicity of an objective IQA metric. To compute the other two criteria, the Pearson linear correlation coefficient (PLCC) and the root mean squared error (RMSE), we need to apply a regression analysis. The PLCC measures the consistency between the objective scores after nonlinear regression and the subjective mean opinion scores (MOS). The RMSE measures the relative distance between the objective scores after nonlinear regression and the MOS. For the nonlinear regression, we used the following mapping function:
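A plausible form of Eq. (12), assuming the five-parameter logistic function that is widely used for this regression step in the FR-IQA literature (the exact form is an assumption, since the source text gives only the number of parameters):

$$Q_{P}=\beta_{1}\left(\frac{1}{2}-\frac{1}{1+\exp\left(\beta_{2}\left(Q-\beta_{3}\right)\right)}\right)+\beta_{4}Q+\beta_{5} \tag{12}$$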
where Q and Q_{P} are the original objective scores of an IQA metric and the objective scores after regression, respectively, and β_{i}, i = 1, 2, ⋯, 5, are the parameters to be fitted. Higher SROCC, KROCC and PLCC values and lower RMSE values indicate a better performance of an IQA metric.
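The evaluation criteria can be sketched in NumPy as follows. This is a simplified illustration, not the evaluation code used in the experiments: the logistic form is the assumed five-parameter mapping, the rank computation in `srocc` assumes no tied values, the fitting of the β parameters is omitted, and all function names are hypothetical.

```python
import numpy as np


def logistic_map(q, beta):
    """Five-parameter logistic mapping of objective scores (assumed form)."""
    b1, b2, b3, b4, b5 = beta
    q = np.asarray(q, dtype=float)
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5


def plcc(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])


def rmse(x, y):
    """Root mean squared error between two score vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))


def srocc(x, y):
    """Spearman rank correlation; this simple ranking assumes no ties."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return plcc(rank_x, rank_y)
```

In practice, the β parameters would be fitted to the MOS values with a nonlinear least-squares routine before PLCC and RMSE are computed, whereas SROCC is computed on the raw objective scores because it depends only on their ordering.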
For the proposed metric, three parameters need to be set to obtain the final quality score: T_{0}, t and C. Using the first 8 reference images and the corresponding 544 distorted images in the TID2008 database as the subset for parameter selection, we choose the parameters that yield the highest SROCC. The result is T_{0} = 3, t = 51 and C = 1600.
To further analyze the effect of the threshold parameter T_{0}, more experiments were carried out. Figure 2 shows the SROCC performance with different T_{0} values on six databases. On most databases, the SROCC is best when T_{0} is 3. This result indicates that the range of the upper threshold T is approximately [0, 255/3], that is, [0, 85], for an 8-bit grayscale image according to formula (6). If the change in image intensity is above 255/3, then it will be masked in visual perception.
Table I lists the SROCC, KROCC, PLCC and RMSE results of ten metrics on six databases, and the two best results of each row are highlighted in bold. Overall, the methods that employ the gradient feature, such as FSIM, GMSD, DASM and the proposed metric, perform well across all the databases. This partly demonstrates the validity of considering the degradation of gray-level changes in quality evaluation. Furthermore, the proposed metric performs well, outperforming SSIM and SSRM and competing with FSIM and GMSD.
Among the six databases, TID2013 has the highest number of distortion types. Table II lists the SROCC results of ten metrics for each individual distortion type of the TID2013 database. The proposed algorithm performs well across a variety of distortion types. In particular, the proposed algorithm is outstanding for the JPEG, JP2K and JPEG-trans-error distortion types, which are sensitive to variations.
In this paper, we discuss the problem of whether the change measured by the gradient corresponds to the change perceived by the HVS. Considering that the ability of the HVS to perceive changes is affected by the upper threshold, we defined the adaptively truncating gradient and proposed a novel IQA index. Numerical experimental results showed that this index performs well on multiple databases. Nevertheless, given the complexity of this problem, more studies need to be conducted to address it. In future research, we expect to use machine learning methods to further understand this issue.
Figure 1.
The gradient map and the adaptively truncating gradient map corresponding to the reference image and the distorted image. (a) The reference image. (b) The distorted image. (c) and (d) are the gradient maps of (a) and (b), respectively. (e) and (f) are the adaptively truncating gradient maps of (a) and (b), respectively.