Reducing inverse quantization numbers in intra frame for video transcoding architectures

In this study, a complexity-quality analysis of transcoding architectures is presented, and an architecture that reduces the number of inverse quantization operations is proposed. Conventional transcoding schemes neglect the relationship between the previous and current quantizer step sizes. In contrast, the proposed transcoding architecture depends on the modulus of the ratio of the current and previous quantization parameters. By analyzing the quantized regions of the previous and current quantization parameters, we identify the regions in which the first inverse quantization can be skipped, which reduces the computing complexity. Computer simulations verify the merits of the proposed scheme over conventional transcoding approaches in terms of computing complexity and objective quality (e.g., the peak signal-to-noise ratio).


Introduction
Transcoding is very important in multimedia applications. Transmission over the internet is a convenient way to share videos, but because internet bandwidth is limited, delivering video bitstreams raises a bit-rate conversion problem, which is itself a transcoding problem. Generally, transcoding can be interpreted as the operation of converting a video from one format into another [1]. For example, an original video is encoded in MPEG-2 format at 5.3 Mb/s with a temporal rate of 30 f/s and an input resolution of 720×480. The video is then transcoded to MPEG-4 format at 128 kb/s with a temporal rate of 10 f/s and an output resolution of 352×240 [2]. However, transcoding is not only a format-conversion operation; it also makes it possible to share popular audio-visual content with other people through the internet or satellite media, so that information can propagate widely.
There are many transcoding applications, including bit-rate reduction, spatial resolution reduction, temporal resolution reduction (frame skipping), and error-resilience transcoding [3]. The most straightforward method is pixel-domain transcoding [4], which is a direct cascade of a decoder and an encoder. That is, the incoming bitstreams are first decoded into the pixel domain, and the decoded video frames are then re-encoded at the bit-rate that the client demands. The drawbacks of this scheme are high computing complexity and a large memory cost. To reduce the complexity, Youn et al. [5] proposed an information-reuse method in which the motion vectors decoded from the input bitstreams are reused to reduce the computing complexity of the transcoder. A distributed video transcoding scheme that exploits the dependency among a group of pictures by preparing video blocks of variable size was proposed to reduce the bit-rate and transcoding time for fast delivery of video to end users [6]. Van et al. [7] developed several schemes to reduce the computation of closed-loop transrating for high-efficiency video coding, in which a high bit-rate input bitstream is decoded and the recovered sequence is then re-encoded at a lower bit-rate. A fast transcoding algorithm that makes full use of prior knowledge of the influence of video brightness on transcoding modes was proposed in [8]. It uses the information available from previously decoded MBs and the YUV difference to decide which modes can be skipped with little loss in rate-distortion performance. Jokhio et al. [9] presented prediction-based dynamic resource allocation and deallocation algorithms for a dynamically scalable cluster of video transcoding servers. A Hadoop-based distributed video transcoding method that transcodes various video codec formats into the MPEG-4 video format was proposed in [10]; improvements in quality and speed were achieved by adopting the open-source Xuggler Java library for transcoding.
Most early research used re-quantization methods in the transcoder to reduce the complexity [11], [12]. However, this kind of method often degrades performance because of the high reduction ratio that re-quantization requires. Therefore, the frame-skipping technique was introduced [13], [14]. This technique can significantly reduce the bit-rate to match internet bandwidth demands, but its drawback is the additional computing complexity of reconstructing the skipped frames. Using discrete cosine transform coefficients and prediction modes, Lin [15] proposed a transcoding method based on largest-coding-unit reduction and early termination. To improve high-efficiency video coding, Wan [16] developed a transcoder with boosted bit-rate by exploiting the cascaded pixel-domain architecture. Kim [17] employed a quadtree framework with different downscaled resolutions to boost the high-efficiency video coding transcoder.
In detail, Fig 1 describes a common encoder-transcoder-decoder architecture, and Table 1 lists the nomenclature. To transcode an encoded bitstream in the original format, $E_1$ in Fig 1, into a bitstream in a new format, the first step is to decode $E_1$ into $D_1$. Then, the first inverse quantization ($IQ_1$) transforms $D_1$ into $D_1^{iq}$. Next, the inverse discrete cosine transform (IDCT) converts $D_1^{iq}$ in the frequency domain into the residual $R_1^n$ in the spatial domain, and the motion-compensated prediction from the motion compensator (MC) is added to form the compensated frame $I_1^n$. On the encoder side, the motion-compensated prediction from the MC is subtracted from $I_1^n$, and after the DCT and the second quantization ($Q_2$), the bitstream is encoded and delivered to the decoder.
Because the transcoder architecture of Fig 1 is computationally complex, Vetro et al. [3] also proposed a bit-rate reduction method that reduces the bit-rate and maintains the quality of the original frames. However, this method reduces the computing complexity by discarding some high-frequency data, and the tradeoff is degraded performance. Vetro et al. also proposed another scheme, named spatial resolution reduction. This scheme down-samples four macroblocks (MBs) to one MB, so the associated motion vectors have to be mapped, that is, the horizontal and vertical resolutions are both reduced by a factor of two. In this case, each motion vector is mapped from a 16×16 MB in the original resolution to an 8×8 block in the reduced-resolution MB, with appropriate scaling by two. Although the down-conversion scheme reduces the number of motion vectors, it needs to calculate the new motion vectors. Most importantly, this scheme causes worse distortion because of the error between the new motion vectors and the original motion vectors.

Table 1. Nomenclature.
$I_1^{n-1}$: the (n-1)-th intra-frame data at the decoder end of the transcoder
$P_1^n$: predictive data at the decoder end of the transcoder
$R_2^n$: residual data at the encoder end of the transcoder
$P_2^n$: predictive data at the encoder end of the transcoder
$I_2^n$: the n-th intra-frame data at the encoder end of the transcoder
$I_2^{n-1}$: the (n-1)-th intra-frame data at the encoder end of the transcoder
$E_2^n$: quantization error at the encoder end of the transcoder
In this paper, a complexity-quality analysis of transcoding architectures is presented, and an architecture that reduces the number of inverse quantization operations is proposed. Conventional transcoding schemes neglect the relation between the first and second quantizer step sizes. In contrast, the proposed transcoding architecture depends on the modulus of the ratio of the second and first quantization step sizes. By analyzing the quantized regions of the first and second quantization, we identify the regions in which the first inverse quantization can be skipped, which reduces the computing complexity. Computer simulations verify the merits of the proposed scheme over conventional approaches in terms of computing complexity and objective quality.
The rest of this paper is organized as follows. In Sec. 2, the conventional transcoder architecture is first introduced, and then the novel modified transcoder architecture is proposed. In Sec. 3, simulation results are provided that demonstrate the effectiveness of the algorithm in comparison with the conventional transcoder scheme in terms of computing complexity reduction. Finally, conclusions are presented in Sec. 4.

Modified transcoder architecture
In this section, we propose a new architecture that is selected according to the modulus obtained when the quantizer step size at the transcoder is divided by the quantizer step size at the encoder. We design several different transcoding processes for the different cases of this modulus, so that transcoding spends the least computing complexity while maintaining the same performance. The computing complexity reduction is analyzed objectively by PSNR and subjectively by visual quality in Sec. 3.

Conventional transcoder architecture
The predicted frame $P_1^n$ is obtained from the (n-1)-th picture $I_1^{n-1}$ by displacing each spatial position $x$ by the motion compensation vector $V^n$. Hence, $P_1^n$ can be expressed as
$$P_1^n(x) = I_1^{n-1}(x + V^n) \tag{1}$$
The decoded frame $I_1^n(x)$ is obtained by adding the residual frame $R_1^n$, which is the output of the inverse discrete cosine transform (IDCT), to the predicted frame $P_1^n$, that is,
$$I_1^n(x) = R_1^n(x) + P_1^n(x) \tag{2}$$
Substituting (1) into (2), (2) can be rewritten as
$$I_1^n(x) = R_1^n(x) + I_1^{n-1}(x + V^n) \tag{3}$$
From Fig 2, the residual frame $R_2^n$ is the decoded picture $I_1^n(x)$ minus the predicted frame $P_2^n$, so $R_2^n$ can be represented as
$$R_2^n(x) = I_1^n(x) - P_2^n(x) \tag{4}$$
Furthermore, after DCT, $Q_2$, $IQ_2$, and IDCT, the residual frame $R_2^n$ carries a quantization error $E_2^n$. Hence, the frame $I_2^n$ can be denoted as
$$I_2^n(x) = R_2^n(x) + E_2^n(x) + P_2^n(x)$$
Taking (4) into account, we can rewrite $I_2^n$ as
$$I_2^n(x) = I_1^n(x) + E_2^n(x) \tag{5}$$
Similarly, the predicted frame $P_2^n$ is obtained from the (n-1)-th picture $I_2^{n-1}$ by displacing each spatial position by the motion compensation vector $V^n$. Therefore, $P_2^n$ can be expressed as
$$P_2^n(x) = I_2^{n-1}(x + V^n) \tag{6}$$
Substituting (5) into (6), (6) can be rewritten as
$$P_2^n(x) = I_1^{n-1}(x + V^n) + E_2^{n-1}(x + V^n) \tag{7}$$
Substituting (7) into (4), the relation between the residual picture $R_2^n$ and the decoded frame $I_1^n(x)$ is
$$R_2^n(x) = I_1^n(x) - I_1^{n-1}(x + V^n) - E_2^{n-1}(x + V^n) \tag{8}$$
To obtain the relation between the residual frame $R_2^n$ and the residual frame $R_1^n$, we substitute (3) into (8), which yields
$$R_2^n(x) = R_1^n(x) - E_2^{n-1}(x + V^n) \tag{9}$$
Hence, Fig 2 can be simplified to Fig 3. Because the DCT and IDCT are both linear operations, the result in Fig 3 does not change whether the addition or the (I)DCT is performed first. Therefore, the IDCT operator can be moved behind the adder ADD1 in Fig 3. By the same linearity property, the IDCT block can be moved from the left end to the right end of Position X, where the IDCT and DCT cancel each other.
By the linearity of the IDCT, the IDCT block below Position X can likewise be moved to the left of the adder ADD2.
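The linearity argument used to move the IDCT across the adders can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes a 1-D orthonormal DCT-II (video codecs use a 2-D block transform), and the function names `dct`, `idct`, and `max_abs_diff` as well as the sample vectors are illustrative assumptions.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
        out.append(scale * acc)
    return out


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        acc = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            acc += scale * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(acc)
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two lists."""
    return max(abs(p - q) for p, q in zip(a, b))


# Hypothetical residual and prediction samples (one row of a block).
residual = [12.0, -3.0, 5.0, 0.0, 7.0, -8.0, 2.0, 1.0]
prediction = [100.0, 102.0, 98.0, 97.0, 101.0, 99.0, 100.0, 103.0]

# Path 1: IDCT first, then add in the pixel domain.
pixel_sum = [r + p for r, p in zip(idct(dct(residual)), prediction)]

# Path 2: add in the DCT domain, then a single IDCT after the adder.
dct_sum = [a + b for a, b in zip(dct(residual), dct(prediction))]
moved = idct(dct_sum)

# The two paths agree up to floating-point rounding.
print(max_abs_diff(pixel_sum, moved))
```

Because both paths agree, a transform block can be placed on either side of an adder, which is the property the simplification from Fig 2 to Fig 3 relies on.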
Because the incoming data from $IQ_1$ are frequency-domain coefficients, the IDCT is unnecessary if the coefficients in the closed loop are also frequency-domain coefficients. To simplify the (I)DCT blocks, DCT-domain transcoding was introduced [18]. If motion compensation (MC) in the spatial domain is converted to DCT-MC in the frequency domain [19], then the DCT and IDCT can be removed and the transcoder can be simplified as in Fig 4, which is named the simplified DCT-domain transcoder (SDDT) [19], [4]. Although the SDDT architecture reduces the number of (I)DCT operations, it increases the computing complexity through the conversion of MC to DCT-MC. Another drawback of the SDDT architecture is that it can only be employed when the encoder and decoder have the same spatial/temporal resolution. In addition, the output video and input video need to use the [...] [20] in Fig 5 was introduced. However, although the CDDT extends the range of use, it increases the complexity of the DCT-MC and frame store blocks. In the following, we propose a novel method to reduce the complexity while maintaining the PSNR.

The Auto-selective transcoder architecture
In this section, we propose a modified transcoding architecture with an auto-selective capability that reduces the computational complexity of video transcoding. We employ the modulus of the ratio of the first and second quantizer step sizes to select among different schemes. That is, the transcoding architecture relies on the modulus obtained when the quantizer step size at the transcoder is divided by the quantizer step size at the encoder, i.e., $\mathrm{mod}(Q_2/Q_1)$.
Whatever the group-of-pictures (GOP) structure of the input bitstream is, the I-pictures are the elements that consume the most memory. The P-pictures and B-pictures need only store the motion vectors (MVs) estimated by the motion estimator (ME) in the encoder. Therefore, we focus below on reducing the computing complexity for I-pictures. When the input pictures are intra pictures (I-pictures), no motion estimation is needed, so Fig 4 can be simplified to Fig 6. Because P-pictures and B-pictures are composed of motion compensation vectors and a residual frame, they need the architecture of Fig 4. Whatever the incoming bitstream is, the re-quantization problem has to be considered, because if the value of $\mathrm{mod}(Q_2/Q_1)$ is not an integer when the second quantization is performed, an error is introduced. This error can be explained using Fig 7. In Fig 7, point A is first quantized to $Q_1^A$, denoted $Q_1(A) = Q_1^A$. Then $Q_1^A$ is quantized a second time to $Q_2^A$, yielding the point $\hat{A}$, as stated in (10). Similarly, point B is first quantized to $Q_1^B$. It is worth mentioning that if point A is first quantized and then directly second quantized, the result is (10). However, if point A is first quantized, then inverse quantized, and then second quantized, the result differs from (10) and is denoted $\hat{B}$. Clearly, this introduces a so-called quantization error. In this paper, we classify the different schemes by the value of $\mathrm{mod}(Q_2/Q_1)$. When $Q_1 = 7$ and $Q_2 = 8$, the shaded regions in Fig 8 are the parts with quantization error. The shaded regions are far smaller than the unshaded ones; in other words, the quantization error regions are far smaller than the regions where direct cascaded quantization applies.
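The re-quantization error for $Q_1 = 7$ and $Q_2 = 8$ can be illustrated with a small enumeration. This is a sketch under stated assumptions rather than the quantizer of the paper: it assumes a uniform floor quantizer on non-negative integers, bin-midpoint reconstruction for $IQ_1$, and it compares the cascaded result with quantizing the original sample directly at $Q_2$. The helper names `quantize`, `cascade`, and `mismatches` are hypothetical.

```python
def quantize(value, step):
    """Uniform quantizer: index of the bin [k*step, (k+1)*step) holding value."""
    return value // step


def cascade(value, q1, q2):
    """Q1 -> IQ1 (bin-midpoint reconstruction) -> Q2, kept in integer arithmetic."""
    level1 = quantize(value, q1)
    # Twice the midpoint level1*q1 + q1/2, to avoid fractions.
    twice_recon = 2 * level1 * q1 + q1
    return twice_recon // (2 * q2)


def mismatches(q1, q2, limit):
    """Inputs in [0, limit) whose cascaded level differs from direct Q2 quantization."""
    return [x for x in range(limit) if cascade(x, q1, q2) != quantize(x, q2)]


Q1, Q2 = 7, 8
bad = mismatches(Q1, Q2, Q1 * Q2)            # one full period, 0..55
affected_bins = sorted({x // Q1 for x in bad})

print("inputs with re-quantization error:", bad)
print("Q1 bins containing an error:", affected_bins)
```

Under these assumptions, 12 of the 56 integer inputs are re-quantized to a different level, and all of them lie in the six interior $Q_1$ bins, so the error-prone region is small compared with the error-free one.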
Hence, the transcoder can select automatically whether to perform inverse quantization: only the coefficients that fall in the shaded regions are inverse quantized, while those in the unshaded regions go directly through cascaded quantization. This reduces the computing complexity compared with inverse quantizing ($IQ_1$) and then second quantizing ($Q_2$) every pixel of every block.
Theoretically, if we knew the original pixel values and the input point A lay between 7 and 8 in Fig 8, then the first quantization of A would give the value $Q_1^2$. Without the inverse quantization $IQ_1$, directly applying the second quantization $Q_2$ would yield $Q_2^2$, whereas applying $IQ_1$ first and then $Q_2$ would give $Q_2^1$. Unfortunately, the input received at the transcoder consists of the quantized values of $Q_1$, not the original frame pixel values, so the second quantization cannot be performed on the original pixels. If we want to second quantize directly without inverse quantization, Fig 8 shows that only the ranges 0~7 and 49~56 can be quantized directly by $Q_2$. The other levels, $Q_1^2$ to $Q_1^7$, all need inverse quantization. Although some pixels still need inverse quantization, we can save 2/8 of the inverse quantization computation in Fig 8. If we set $Q_1 = m$, we can summarize a general expression as follows: [...] Table 1 describes the different quantization modes of the switch SW in Fig 9. When SW is at position A, the data are not inverse quantized. When SW is at position B, the data go through $IQ_1$. The switch is implemented in software rather than in hardware, so it incurs no hardware cost. Thus, the proposed method can both reduce the computing complexity and maintain good performance.
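One possible software reading of the switch SW is sketched below. It is an illustration only, since the general expression and the switch table are not reproduced here: it assumes non-negative quantization levels, a floor quantizer, and bin-midpoint reconstruction for $IQ_1$, and it treats a $Q_1$ level as position A when its whole bin lies inside a single $Q_2$ bin. The names `switch_position`, `build_table`, and `transcode` are hypothetical.

```python
def switch_position(level, q1, q2):
    """'A' if the whole Q1 bin of this level lies inside one Q2 bin, else 'B'."""
    low = level * q1
    high = (level + 1) * q1 - 1          # last integer of the Q1 bin
    return "A" if low // q2 == high // q2 else "B"


def build_table(q1, q2, num_levels):
    """Precompute, per Q1 level, the switch position and (for 'A') the Q2 level."""
    table = []
    for level in range(num_levels):
        position = switch_position(level, q1, q2)
        direct = (level * q1) // q2 if position == "A" else None
        table.append((position, direct))
    return table


def transcode(levels, q1, q2, table):
    """Map Q1 levels to Q2 levels; return the result and how many skipped IQ1."""
    out, skipped = [], 0
    for level in levels:
        position, direct = table[level]
        if position == "A":
            out.append(direct)            # table lookup, no IQ1
            skipped += 1
        else:
            twice_recon = 2 * level * q1 + q1   # IQ1 midpoint, doubled
            out.append(twice_recon // (2 * q2))
    return out, skipped


Q1, Q2 = 7, 8
table = build_table(Q1, Q2, 8)
positions = [position for position, _ in table]
result, skipped = transcode([0, 1, 4, 7, 7, 0], Q1, Q2, table)

print(positions)
print(result, skipped)
```

With $Q_1 = 7$ and $Q_2 = 8$, only the first and last of the eight levels are assigned to position A under this rule, which corresponds to the 2/8 saving mentioned in the text. The table is built once per $(Q_1, Q_2)$ pair, so the per-coefficient cost on the A path is a lookup.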

Experimental results
In this section, the superiority of the proposed scheme in terms of visual quality and peak signal-to-noise ratio (PSNR) is verified by computer simulation. For comparison, the 352×288 CIF and 3840×2160 4K ultra-HD test sequences Foreman, Susie, Mobile & Calendar, Cactus, and Flower Garden are adopted as simulation sequences. The experiments are performed on a Pentium-IV 1.6 GHz PC, and several experiments are carried out in MPEG-2. In fact, the proposed method can be implemented in any coding standard, because every transcoding architecture needs to decode and encode I-pictures. Table 3 shows that the PSNR of the proposed method is about 0.12~0.42 dB higher than that of the CPDT approach and about 0.05~0.28 dB higher than that of the CPDT+FDVS [14] scheme. In Fig 10, the PSNR of the proposed method is about 0.1~0.3 dB lower than that of the direct encoding approach but higher than that of cascaded quantization transcoding for the Flower Garden sequence. Moreover, the complexity of I-picture transcoding in the proposed scheme is reduced by about 20% while good visual performance is maintained. Additionally, Fig 11 shows that the proposed system has better objective performance than the other methods. Finally, 4K ultra-HD video clips are tested in Table 4.
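For reference, the PSNR figures quoted in this section follow the usual definition based on the mean squared error. The sketch below assumes 8-bit samples (peak value 255) stored as flat lists; the function name `psnr` and the sample values are illustrative and are not taken from the experiments.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf                   # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


# Hypothetical pixel rows: an original and its transcoded version.
original = [100, 100, 100, 100]
transcoded = [101, 99, 101, 99]           # every sample off by one, so MSE = 1

print(round(psnr(original, transcoded), 2))
```

A difference of a few tenths of a dB, as reported above, therefore corresponds to a small change in mean squared error between the compared methods.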

Conclusions
This study developed a novel modified transcoding architecture with an auto-selective capability, which reduces the computational complexity of video transcoding. Experimental results show that the proposed method yields better visual quality and PSNR performance than other approaches.