Reducing inverse quantization numbers in intra frame for video transcoding architectures

Abstract

This study proposes a transcoding architecture, supported by a complexity-quality analysis, that reduces the number of inverse quantization operations. Conventional transcoding schemes neglect the relationship between the previous and current quantizer step sizes. In contrast, the proposed architecture depends on the modulus of the ratio of the current and previous quantization parameters. By analyzing the quantized regions of the previous and current quantization parameters, we identify the part for which the first inverse quantization can be omitted, which reduces the computing complexity. Computer simulations verify the merits of the proposed scheme over conventional transcoding approaches in terms of computing complexity and objective quality (e.g., the peak signal-to-noise ratio).

Introduction

Transcoding is important in multimedia applications. Sharing video with others over the internet is convenient, but because internet bandwidth is limited, delivering video bitstreams raises a bit-rate conversion problem, which is in turn a transcoding problem. Generally, transcoding can be interpreted as the operation of converting a video from one format into another [1]. For example, an original video encoded in MPEG-2 format at 5.3 Mb/s, with a temporal rate of 30 f/s and a resolution of 720×480, may be transcoded to MPEG-4 format at 128 kb/s, with a temporal rate of 10 f/s and a resolution of 352×240 [2]. However, transcoding is more than format conversion: it also allows popular audio-visual content to be shared with other people through the internet or satellite media, so that information can propagate widely.

There are many transcoding applications, including bit-rate reduction, spatial resolution reduction, temporal resolution reduction (frame skipping), and error-resilience transcoding [3]. The most straightforward method is pixel-domain transcoding [4], which directly cascades a decoder and an encoder. That is, the incoming bitstream is first decoded into the pixel domain, and the decoded video frames are then re-encoded at the bit-rate that the client demands. The drawbacks of this scheme are high computing complexity and high memory cost. To reduce the complexity, Youn et al. [5] proposed an information-reuse method in which the motion vectors decoded from the input bitstream are reused to reduce the computing complexity of the transcoder. A distributed video transcoding scheme that exploits the dependency among a group of pictures by preparing video blocks of variable size was proposed to reduce the bitrate and transcoding time for fast delivery of video to end users [6]. Van et al. [7] developed several schemes to reduce the computation of closed-loop transcoding for high efficiency video coding, in which a high bit-rate input bitstream is decoded and the recovered sequence is then re-encoded at a lower bit rate. A fast transcoding algorithm that makes full use of prior knowledge of the influence of video brightness on transcoding modes was proposed in [8]. It uses the information available from previously decoded MBs and the YUV difference to decide which modes can be skipped with little loss in rate-distortion performance. Jokhio et al. [9] presented prediction-based dynamic resource allocation and deallocation algorithms for a dynamically scalable cluster of video transcoding servers. A Hadoop-based distributed video transcoding method that transcodes various video codec formats into the MPEG-4 video format was proposed in [10]; improvements in quality and speed were achieved by adopting the open-source Xuggler Java library for transcoding.

Early research mostly applied re-quantization at the transcoder to reduce complexity [11], [12]. However, this kind of method often degrades performance when a high reduction ratio is required. Therefore, the frame-skipping technique was introduced [13], [14]. This technique can significantly reduce the bit-rate to match internet bandwidth demands, but its drawback is the additional computing complexity of reconstructing the skipped frames. Using discrete cosine transform coefficients and prediction modes, Lin et al. [15] proposed a transcoding method based on reducing the largest coding unit and early termination. To improve high efficiency video coding, Wang et al. [16] developed a bit-rate transcoder that exploits the cascaded pixel-domain architecture. Kim et al. [17] employed a quadtree framework with different downscaled resolutions to accelerate the high efficiency video coding transcoder.

Fig 1 describes a common encoder-transcoder-decoder architecture, and Table 1 is a nomenclature list of abbreviations. To transcode an encoded bitstream in the original format (E1 in Fig 1) to a bitstream in a new format, the first step is to decode E1 to D1. D1 is then inverse quantized by the first inverse quantizer (IQ1). Next, the inverse discrete cosine transform (IDCT) converts the frequency-domain data to the spatial domain, and the motion compensation vector from the motion compensator (MC) is added to form the compensated data. For re-encoding, the motion compensation vector from the MC is subtracted, and after the DCT and the second quantization (Q2), the bitstream is encoded and delivered to the decoder.

Because the transcoder architecture of Fig 1 is computationally complex, Vetro et al. [3] proposed a bit-rate reduction method that reduces the bit-rate while aiming to maintain the performance of the original frames. However, this method reduces computing complexity by discarding some high-frequency data, and the tradeoff is degraded performance. Vetro et al. also proposed another scheme, named spatial resolution reduction. This scheme down-samples four macroblocks (MBs) to one MB, and the associated motion vectors have to be mapped accordingly, that is, with a reduction factor of two in both the horizontal and vertical resolution. In this case, each motion vector is mapped from a 16×16 MB in the original resolution to an 8×8 block in the reduced-resolution MB, with appropriate scaling by two. Although this down-conversion scheme reduces the number of motion vectors, it requires new motion vectors to be calculated. Most importantly, the scheme causes worse distortion because of the error between the new and the original motion vectors.

In this paper, we propose a transcoding architecture, supported by a complexity-quality analysis, that reduces the number of inverse quantization operations. Conventional transcoding schemes neglect the relation between the first and second quantizer step sizes. In contrast, the proposed architecture depends on the modulus of the ratio of the second and first quantization step sizes. By analyzing the quantized regions of the first and second quantization, we identify the part for which the first inverse quantization can be omitted, which reduces the computing complexity. Computer simulations verify the merits of the proposed scheme over conventional approaches in terms of computing complexity and objective quality.

This paper is organized as follows. In Sec. 2, the conventional transcoder architecture is introduced and the modified transcoder architecture is then proposed. In Sec. 3, simulation results are provided that demonstrate the effectiveness of the algorithm, in comparison with the conventional transcoder schemes, in terms of computing complexity reduction. Finally, conclusions are presented in Sec. 4.

Modified transcoder architecture

In this section, we propose a new architecture that depends on the modulus obtained when the quantizer step size at the transcoder is divided by the quantizer step size at the encoder. We design several transcoding processes for the different cases of this modulus, so that transcoding incurs the least computing complexity while maintaining the same performance. The complexity reduction is evaluated objectively by PSNR and subjectively by visual inspection in Sec. 3.

Conventional transcoder architecture

Fig 2 describes a pixel-domain transcoding architecture, named cascaded pixel-domain transcoder (CPDT) [18].

Because the predicted frame is the composition in which the sum of the spatial position vector of the (n-1)-th original picture and the spatial position vector of the motion compensation vector Vn. Hence, can be indicated as (1)

The decoded frame can be yielded by the residual frame which was inverse discrete cosine transform (IDCT) adding to the predicted frame , that is, (2)

Substituting (1) into (2), (2) can be rewritten as (3)

From Fig 2, we can see that the residual frame is the one which the decoded picture subtract the predicted frame . So can be represented as (4)

Furthermore, after DCT, Q2, IQ2, and IDCT, the residual frame must introduce a quantized error . Hence, the frame can be denoted as

Take (4) into account, we could rewrite as (5)

Besides, the predicted frame is the composition in which the sum of the spatial position vector of the (n-1)-th original picture and the spatial position vector of the motion compensation vector Vn. Therefore, can be indicated as (6)

Substituting (5) into (6), (6) can be rewritten as (7)

Substituting (7) into (4), the relation between the residual picture and the decoded frame is following, (8)

To get the correlation between the residual frame and the residual frame , we substituted (3) into (8) and yielded (9)

Hence, we simplified Fig 2 to Fig 3. Because the DCT and IDCT are both linear operations, the result in Fig 3 does not change whether the addition is performed before or after the (I)DCT. Therefore, we can move the IDCT operator behind adder ADD1 in Fig 3.
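The linearity argument can be checked numerically. The following is a minimal sketch, assuming an orthonormal 8×8 two-dimensional DCT-II built directly from its definition; the helper names `dct_matrix`, `dct2` and `idct2` are illustrative and do not come from the paper.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n (assumed transform)."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)        # DC row uses a different scale factor
    return c

def dct2(block: np.ndarray) -> np.ndarray:
    """Separable 2-D DCT of a square block."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def idct2(coeffs: np.ndarray) -> np.ndarray:
    """Inverse of dct2; the orthonormal basis makes the inverse a transpose."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c

rng = np.random.default_rng(0)
residual = rng.integers(-128, 128, size=(8, 8)).astype(float)
prediction = rng.integers(0, 256, size=(8, 8)).astype(float)

# Adding before the transform equals adding after the transform.
add_then_transform = dct2(residual + prediction)
transform_then_add = dct2(residual) + dct2(prediction)
print(np.allclose(add_then_transform, transform_then_add))
```

Because the sum commutes with the transform in this way, an adder and an (I)DCT block can be swapped in the block diagram without changing the output.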

Fig 3. Simplified architecture using the correlation between and .

https://doi.org/10.1371/journal.pone.0215131.g003

As mentioned above, by the linearity of the DCT and IDCT, we can move the IDCT block from the left of Position X to its right, so that the IDCT and DCT cancel each other.

By the linearity of the IDCT, we then move the IDCT block below Position X to the left of adder ADD2.

Because the incoming data from IQ1 are frequency-domain coefficients, the IDCT is unnecessary if the coefficients in the closed loop are also frequency-domain coefficients. To simplify the (I)DCT blocks, DCT-domain transcoding was introduced [18]. If the motion compensation (MC) in the spatial domain is converted to DCT-MC in the frequency domain [19], then the DCT and IDCT can be removed, and the transcoder can be simplified as in Fig 4, which is named the simplified DCT-domain transcoder (SDDT) [4], [19]. Although the SDDT architecture reduces the number of (I)DCT operations, converting MC to DCT-MC adds computing complexity. Another drawback of the SDDT architecture is that it can only be employed when the encoder and decoder have the same spatial and temporal resolution. In addition, the output video and input video need to use the same motion vectors and encoding modes. Thus, the cascaded DCT-domain transcoder (CDDT) [20] in Fig 5 was introduced. Although the CDDT relaxes these restrictions, it increases the complexity of the DCT-MC and frame store blocks. In the following, we propose a novel method that reduces the complexity while maintaining the PSNR.

The Auto-selective transcoder architecture

In this section, we propose a modified transcoding architecture with an auto-selective capability that reduces the computational complexity of video transcoding. We use the modulus of the ratio of the first and second quantizer step sizes to select among different schemes. That is, the transcoding architecture relies on the modulus obtained when the quantizer step size at the transcoder is divided by the quantizer step size at the encoder, i.e., mod(Q2/Q1).

Whatever the group-of-pictures (GOP) structure of the input bitstream is, the I-pictures account for most of the memory. The P-pictures and B-pictures need only store the motion vectors (MVs) estimated by the motion estimator (ME) in the encoder. We therefore focus below on reducing the computing complexity for I-pictures. Because intra pictures (I-pictures) require no motion estimation, Fig 4 can be simplified to Fig 6.

P-pictures and B-pictures are composed of motion-compensation vectors and a residual frame, so they need to consider the architecture of Fig 4. Whatever the incoming bitstream is, the re-quantization problem has to be considered, because if the value of mod(Q2/Q1) is not an integer when the second quantization is performed, a performance error results. This error can be seen from Fig 7. In Fig 7, point A is first quantized to and denoted . The is then second quantized to and yield the point which is stated (10) Similarly, point B is first quantized to , and is expressed as . Then is second quantized to and yield the point which is indicated .

It is worth mentioning that if point A is first quantized and then directly second quantized, the result is (10). However, if point A is first quantized, then inverse first quantized, and then second quantized, the result differs from (10) and can be indicated as (11) This difference is the so-called quantization error. In this paper, we classify different schemes using the values of mod(Q2/Q1). When Q1 = 7 and Q2 = 8, the shaded regions in Fig 8 are the regions where the quantization error occurs; outside them, the two paths agree, so the directly cascadable regions are clearly distinguished from the error regions. This suggests making the inverse quantization auto-selective: only the data in the designated regions of Fig 8 are inverse quantized, and the remaining regions are cascade quantized directly. This reduces the computing complexity compared with a scheme in which every block coefficient undergoes the inverse quantization IQ1 followed by the second quantization Q2.
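The re-quantization error can be illustrated numerically. The following is a minimal sketch, assuming a uniform round-to-nearest scalar quantizer with reconstruction at level × step size; it compares quantizing an original value once with Q2 against the cascade Q1, IQ1, Q2. The function names are hypothetical, and the MPEG-2 quantizer with its weighting matrices is not modelled.

```python
import math

def quantize(value: float, step: int) -> int:
    """Uniform quantizer, rounding half up to the nearest level (assumed model)."""
    return math.floor(value / step + 0.5)

def dequantize(level: int, step: int) -> int:
    """Inverse quantizer: reconstruct at level * step."""
    return level * step

def direct_q2(value: float, q2: int) -> int:
    """Level obtained when the original value is quantized once with Q2."""
    return quantize(value, q2)

def cascaded_q2(value: float, q1: int, q2: int) -> int:
    """Level obtained after Q1, then IQ1, then Q2 (the transcoding path)."""
    return quantize(dequantize(quantize(value, q1), q1), q2)

def mismatches(q1: int, q2: int, upper: int) -> list:
    """Integer inputs in [0, upper) whose Q2 level changes due to the cascade."""
    return [x for x in range(upper)
            if direct_q2(x, q2) != cascaded_q2(x, q1, q2)]

if __name__ == "__main__":
    q1, q2 = 7, 8
    bad = mismatches(q1, q2, q1 * q2)
    print(f"{len(bad)} of {q1 * q2} integer inputs change level: {bad}")
```

Under this assumed quantizer, an input of 11 is mapped to level 1 by Q2 alone but to level 2 by the cascade, whereas an input of 3 gives the same level on both paths.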

Theoretically, if the original value of an input point A lying between 7 and 8 in Fig 8 is known, then first quantizing point A gives the value of . Second quantizing with Q2 directly, without the inverse quantization IQ1, then yields the value of . If instead point A is inverse quantized by IQ1 and then second quantized by Q2, it gets the value of . Unfortunately, the input received at the transcoder consists of Q1-quantized values, not the original frame pixel values, so the second quantization cannot be performed on the original pixels. If we skip the inverse quantization and second quantize directly, Fig 8 shows that only the regions 0~7 and 49~56 can be directly second quantized by Q2; all other regions require inverse quantization. Even so, the inverse quantization workload in Fig 8 is reduced by 2/8.
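The count of directly quantizable regions for Q1 = 7 and Q2 = 8 can be reproduced with a short enumeration. This is a sketch under an assumed model, not the authors' derivation: each Q1 region is taken as the half-open integer interval [k·Q1, (k+1)·Q1), and a region is treated as directly quantizable when it lies entirely inside a single Q2 region. The names `safe_q1_bins` and `skip_fraction` are hypothetical.

```python
from math import gcd
from typing import List

def safe_q1_bins(q1: int, q2: int) -> List[int]:
    """Q1 levels, within one common period, whose region fits in one Q2 region."""
    period = q1 * q2 // gcd(q1, q2)        # least common multiple of the steps
    safe = []
    for level in range(period // q1):
        low = level * q1                   # first integer value in the Q1 region
        high = low + q1 - 1                # last integer value in the Q1 region
        if low // q2 == high // q2:        # both ends fall in the same Q2 region
            safe.append(level)
    return safe

def skip_fraction(q1: int, q2: int) -> float:
    """Fraction of Q1 regions per period that need no inverse quantization."""
    period = q1 * q2 // gcd(q1, q2)
    return len(safe_q1_bins(q1, q2)) / (period // q1)

if __name__ == "__main__":
    print(safe_q1_bins(7, 8))    # levels 0 and 7, i.e. regions 0~7 and 49~56
    print(skip_fraction(7, 8))   # 0.25
```

For Q1 = 7 and Q2 = 8 the enumeration returns the two regions 0~7 and 49~56 named in the text, and hence the 2/8 saving. For other step-size pairs this simple containment test need not agree with the counts listed in the Appendix, which follow the authors' own region analysis.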

If we set Q1 = m, we can summarize a general expression as follows, (12)

Table 1. Different quantization modes of switch SW of the proposed transcoder in I-pictures.

https://doi.org/10.1371/journal.pone.0215131.t001

According to the energy compaction property of the DCT, most of the DCT-transformed and quantized coefficients of the input pictures, apart from the DC value, are small. Therefore, we apply the proposed method to the AC values, while the DC values still undergo the first inverse quantization IQ1 followed by the second quantization Q2. Thus, Fig 6 is modified to Fig 9. Table 1 describes the quantization modes of switch SW in Fig 9. When SW is at position A, the data do not undergo inverse quantization; when SW is at position B, the data undergo IQ1. The switch is implemented in software rather than in hardware, so it incurs no hardware cost. The proposed method can therefore reduce computing complexity while maintaining good performance.
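The software switch can be sketched as a per-coefficient routing decision. This is an illustrative sketch only: the rule used for AC coefficients (a Q1 region that fits inside a single Q2 region goes to position A) is an assumed stand-in for the modes of Table 1, which are not reproduced in the text, and the names `bin_is_safe` and `switch_position` are hypothetical.

```python
from typing import List

def bin_is_safe(level: int, q1: int, q2: int) -> bool:
    """True if the Q1 region of this level lies inside a single Q2 region."""
    low = abs(level) * q1          # sign is ignored; regions are symmetric
    high = low + q1 - 1
    return low // q2 == high // q2

def switch_position(index: int, level: int, q1: int, q2: int) -> str:
    """Return 'A' (bypass IQ1) or 'B' (perform IQ1) for one coefficient.

    index 0 is taken to be the DC coefficient, which always uses IQ1 and Q2.
    """
    if index == 0:
        return "B"
    return "A" if bin_is_safe(level, q1, q2) else "B"

def route_block(levels: List[int], q1: int, q2: int) -> List[str]:
    """Switch position for every coefficient of a block in scan order."""
    return [switch_position(i, lv, q1, q2) for i, lv in enumerate(levels)]

if __name__ == "__main__":
    # Hypothetical quantized levels: a large DC value, mostly small AC values.
    levels = [12, 0, 1, 0, 7, 0, 0, 3]
    print(route_block(levels, q1=7, q2=8))
```

Since most AC levels of a typical block are zero, most coefficients take position A in this sketch, which is where the saving in inverse quantization operations would come from.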

Experimental results

In this section, the superiority of the proposed scheme, in terms of visual quality and peak signal-to-noise ratio (PSNR), is verified using computer simulation. For comparison, 352×288 CIF and 3840×2160 4K ultra-HD test sequences, viz., Foreman, Susie, Mobile & Calendar, Cactus and Flower Garden, are adopted as simulation sequences. The experiments are performed on a Pentium-IV 1.6 GHz PC using MPEG-2. The proposed method can, however, be implemented in any coding standard, because every transcoding architecture needs to decode and re-encode I-pictures. From Table 2, for the Foreman sequence in the IPPP… case, the proposed method is faster than CPDT by about 21.3 fps, SDDT by about 5 fps, and CDDT by about 14.2 fps. In the IBBP… case, it is faster than CPDT by about 14.2 fps, SDDT by about 4.2 fps, and CDDT by about 11.6 fps. For the Mobile & Calendar sequence in the IPPP… case, the proposed method is faster than CPDT by about 21.5 fps, SDDT by about 5.1 fps, and CDDT by about 12.4 fps; in the IBBP… case, it is faster than CPDT by about 15.1 fps, SDDT by about 4.3 fps, and CDDT by about 11.1 fps. Table 3 shows that the proposed method has a better PSNR than the CPDT approach by about 0.12~0.42 dB and than the CPDT+FDVS [14] scheme by about 0.05~0.28 dB. In Fig 10, for the Flower Garden sequence, the PSNR of the proposed method is about 0.1–0.3 dB lower than that of the direct encoding approach but better than that of cascaded quantization transcoding. The complexity of I-picture transcoding in the proposed scheme is reduced by about 20%, while good visual performance is maintained. Additionally, Fig 11 shows that the proposed system has better objective performance than the other methods. Results for 4K ultra-HD video clips are given in Table 4.
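For reference, the PSNR figures quoted above follow the standard definition PSNR = 10·log10(peak² / MSE). The following is a minimal sketch of that computation, assuming 8-bit frames (peak value 255) held as NumPy arrays; the function name `psnr` and the sample frames are illustrative and are not the authors' evaluation code.

```python
import numpy as np

def psnr(reference, test, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two frames of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)        # mean squared error over all pixels
    if mse == 0:
        return float("inf")                # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))

if __name__ == "__main__":
    # Hypothetical CIF-sized luminance frames that differ by one grey level.
    original = np.zeros((288, 352), dtype=np.uint8)
    transcoded = np.ones((288, 352), dtype=np.uint8)
    print(f"{psnr(original, transcoded):.2f} dB")
```

A difference of a few tenths of a dB, as reported in Table 3 and Fig 10, corresponds to a change of only a few percent in mean squared error.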

Fig 10. Intra-frame transcoding of Flower Garden encoded with Q1 = 16 and transcoded with different Q2.

https://doi.org/10.1371/journal.pone.0215131.g010

Table 2. The comparison of the proposed MQSS scheme using SDDT with CPDT, SDDT, CDDT, in terms of fps, for the Foreman, Mobile & Calendar and Susie CIF sequences which are encoded at QP = 7, and then transcoded at QP = 15.

https://doi.org/10.1371/journal.pone.0215131.t002

Table 3. Average PSNR comparison using MPEG-2 as a front encoder, for the Susie, Mobile & Calendar and Foreman sequences which are encoded at QP = 7, and then transcoded at QP = 15.

https://doi.org/10.1371/journal.pone.0215131.t003

Table 4. The comparison of the proposed MQSS scheme using SDDT with CPDT, SDDT, CDDT, in terms of fps, for the Foreman, Mobile & Calendar, Susie and Cactus 4K ultra-HD sequences which are encoded at QP = 7, and then transcoded at QP = 15.

https://doi.org/10.1371/journal.pone.0215131.t004

This study developed a modified transcoding architecture with an auto-selective capability, which reduces the computational complexity of video transcoding. Experimental results show that the proposed method yields better visual quality and PSNR performance than other approaches.

Conclusions

In this paper, we have proposed a modified transcoding architecture with an auto-selective capability that reduces the computational complexity of video transcoding. Experimental results show that our method obtains good visual quality and PSNR performance in comparison with other approaches.

Appendix

A Proof That Proposed Video Transcoding Architectures

All re-quantization possibilities are summarized as follows:

Case 1: Q1 = Q2 = 7

That is, , we can directly quantize by Q2 and need not to perform inverse first quantization IQ1. Therefore we can reduce 100% computing complexity which needs to inverse quantize and can be expressed as (A.1)

Case 2: Q1 = 7 and Q2 = 8

When , two quantized regions and can be directly quantized by Q2, and , and need to perform inverse quantization. So we can reduce at least 2/8 (25%) computing complexity which needs to inverse first quantize IQ1 and be indicated as (A.2)

Case 3: Q1 = 7 and Q2 = 9

When , two quantized regions and can be directly quantized by Q2, and , and need to perform inverse quantization. So we can reduce (22.2%) computing complexity which needs to inverse first quantize IQ1 and be stated (A.3)

Case 4: Q1 = 7 and Q2 = 10

In the same way as (13), when , we can reduce (20%) computing complexity which needs to inverse first quantize IQ1 and be stated (A.4)

Case 5: Q1 = 7 and Q2 = 11

When , we can reduce (18.2%) computing complexity which needs to inverse first quantize IQ1 and be stated (A.5)

Case 6: Q1 = 7 and Q2 = 12

When , we can reduce (16.7%) computing complexity which needs to inverse first quantize IQ1 and be stated (A.6)

Case 7: Q1 = 7 and Q2 = 13

When , we can reduce (15.4%) computing complexity which needs to inverse first quantize IQ1 and be stated (A.7)
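The percentages quoted in Cases 2 to 7 are numerically consistent with two directly quantizable regions out of Q2 regions, that is, a saving of 2/Q2. This reading is an inference from the quoted figures and from the 2/8 value given for Fig 8, not a formula stated in the text. A small sketch, with the hypothetical helper `reduction_percent`, reproduces the arithmetic:

```python
def reduction_percent(q2: int, safe_regions: int = 2) -> float:
    """Percentage of inverse quantizations saved, assuming safe_regions / q2."""
    return round(100.0 * safe_regions / q2, 1)

if __name__ == "__main__":
    for q2 in range(8, 14):                # Cases 2 to 7 with Q1 = 7
        print(f"Q1 = 7, Q2 = {q2}: {reduction_percent(q2)}%")
```

The saving shrinks as Q2 grows relative to Q1, from 25% at Q2 = 8 to 15.4% at Q2 = 13.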

References

  1. Xin J, Lin CW, Sun MT. Digital video transcoding. Proceedings of the IEEE. 2005; 93(1); pp. 84–97.
  2. Chang SF, Vetro A. Video adaptation: concepts, technologies, and open issues. Proceedings of the IEEE. 2005; 93(1); pp. 148–158.
  3. Vetro A, Christopoulos C, Sun H. Video transcoding architectures and techniques: An overview. IEEE Signal Processing Magazine. 2003; 20(2); pp. 18–29.
  4. Keesman G, Hellinghuizen R, Hoekema F, Heideman G. Transcoding of MPEG bitstreams. Signal Processing: Image Communication. 1996; 8(6); pp. 481–500.
  5. Youn J, Sun MT, Lin CW. Motion estimation for high performance transcoding. IEEE Trans. Consumer Electronics. 1998; 44(3); pp. 649–658.
  6. Zakerinasab MR, Wang M. Dependency-Aware Distributed Video Transcoding in the Cloud. IEEE 40th Conference on Local Computer Networks. Florida. 2015; pp. 245–252.
  7. Van LP, Praeter JD, Wallendael GV, Leuven SV, Cock JD, Walle RVD. Efficient Bit Rate Transcoding for High Efficiency Video Coding. IEEE Trans. Multimedia. 2016; 18(3).
  8. Shen K, Wang Z, Han Z. Fast video enhancement transcoding. IEEE International Conference on Image Processing. 2016; pp. 2177–2188.
  9. Jokhio F, Ashraf A, Lafond S, Porres I, Lilius J. Prediction-Based Dynamic Resource Allocation for Video Transcoding in Cloud Computing. 21st Euromicro International Conference on Parallel, Distributed and Network-Based Processing. 2013; pp. 254–261.
  10. Kim M, Cui Y, Han S, Lee H. Towards Efficient Design and Implementation of a Hadoop-based Distributed Video Transcoding System in Cloud Computing Environment. International Journal of Multimedia and Ubiquitous Engineering. 2013; 8(2); pp. 213–224.
  11. Seo KD, Lee SH, Koh JS, Kim JK. Rate control algorithm for fast bit-rate conversion transcoding. IEEE Trans. Consumer Electronics. 2000; 46(4); pp. 1128–1136.
  12. Sostawa B, Dannemann T, Speidel J. DSP-based transcoding of digital video signals with MPEG-2 format. IEEE Trans. Consumer Electronics. 2000; 46(2); pp. 358–362.
  13. Seo KD, Heo SC, Kwon SK, Kim JK. Dynamic Bit-Rate Reduction Based on Requantization and Frame-Skipping for MPEG-1 to MPEG-4 Transcoder. IEICE Trans. Fundamentals. 2004; E87-A(4); pp. 903–911.
  14. Fung KT, Chan YL, Siu WC. New Architecture for Dynamic Frame-Skipping Transcoder. IEEE Transactions on Image Processing. 2002; 11(8); pp. 886–900. pmid:18244683
  15. Lin CS, Yang WJ, Su CW. FITD: Fast Intra Transcoding from H.264/AVC to high efficiency video coding based on DCT coefficients and prediction modes. Journal of Visual Communication and Image Representation. 2016; 38; pp. 130–140.
  16. Wang J, Li L, Zhi G, Zhang Z, Zhang H. Efficient algorithms for HEVC bitrate transcoding. Multimedia Tools and Applications. 2017; 76(24); pp. 26581–26601.
  17. Kim M, Sung M, Kim M, Woo W. Exploiting Pseudo-Quadtree Structure for Accelerating HEVC Spatial Resolution Downscaling Transcoder. IEEE Transactions on Multimedia. 2018; 20(9); pp. 2262–2275.
  18. Sun H, Kwok W, Zdepski JW. Architectures for MPEG Compressed Bitstream Scaling. IEEE Transactions on Circuits and Systems for Video Technology. 1996; 6(2); pp. 191–199.
  19. Assunção PAA, Ghanbari M. A Frequency-Domain Video Transcoder for Dynamic Bit-Rate Reduction of MPEG-2 Bit Streams. IEEE Transactions on Circuits and Systems for Video Technology. 1998; 8(6); pp. 953–967.
  20. Zhu W, Yang K, Beacken M. CIF-to-QCIF video bitstream down-conversion in the DCT-domain. Bell Labs Tech. J. 1998; 3(3); pp. 21–29.