# Fast Angular Mode Decision Hardware Implementation of HEVC Intra Prediction

Jooyong Choi, Seungyong Park and Kwangki Ryoo

Dept. of Information & Communication Eng, Hanbat National University
125 Dongseodaero, Yuseong-gu, Daejeon 305-719, Republic of Korea
cjy8069@hanbat.ac.kr, srrr.kr@gmail.com, kkryoo@gmail.com

#### Abstract

In this paper, we propose a hardware implementation of fast angular mode decision for HEVC intra prediction. HEVC intra prediction predicts the current block from spatially neighbouring samples of the picture being encoded rather than from temporally different pictures. The newly developed HEVC standard supports block sizes from 64×64 down to 4×4 and a total of 35 intra prediction modes. Although HEVC outperforms H.264/AVC because it uses more prediction modes, selecting the optimal prediction mode requires a large amount of computation and a long processing time. The HEVC encoder hardware proposed in this paper employs an algorithm that selects the intra prediction mode efficiently, reducing both computation time and hardware area. The algorithm determines one angular mode for the horizontal direction and one for the vertical direction. The angular mode decision hardware processes all block sizes from 64×64 to 4×4 in parallel, which reduces hardware area and significantly shortens computation time, a known problem of existing hardware. The proposed architecture was designed in Verilog HDL and synthesized with Synopsys Design Compiler in a 65 nm technology. The synthesized design has a gate count of 14.9K and a maximum operating frequency of 2 GHz.
Keywords: HEVC, Intra prediction, Hardware design, Angular mode

ISSN: 2005-4254 IJSIP Copyright © 2017 SERSC Australia

## 1. Introduction

Recently, with the spread of video devices that support high-resolution video, users' interest in and demand for high-resolution video have increased. It has therefore become necessary to develop a new video compression standard that supports high-resolution content such as UHD-class images. HEVC (High Efficiency Video Coding) is a new-generation video compression standard developed by the JCT-VC, which was formed in January 2010 by ISO/IEC MPEG and ITU-T VCEG. HEVC was established as an international standard in April 2013. HEVC offers more than twice the coding efficiency of H.264/AVC, but at the cost of high complexity [1-2].

HEVC intra prediction generates a prediction that is as similar as possible to the current block by predicting PUs ranging from 64×64 to 4×4, using a total of 35 prediction modes: 1 DC mode, 1 planar mode and 33 angular modes. Coding performance is improved by selecting the optimal mode among these 35. The planar mode generates predicted pixel values from the values and positions of the reference pixels, the DC mode uses the average value of the reference pixels, and the 33 angular modes project the reference pixels along 33 directions; each prediction is then compared with the original pixels. However, processing all 35 modes requires high computational complexity and a long processing time [3]. To reduce both, this paper selects the prediction direction from the differences and positions of the original pixel data, choosing one horizontal and one vertical mode among the 33 angular modes, and implements this selection algorithm in hardware.

## 2. Intra Prediction in HEVC

HEVC intra prediction predicts the current block from the reconstructed samples surrounding it, which removes spatial redundancy. As shown in Figure 1, intra prediction proceeds through reference sample preparation, prediction sample generation and prediction mode coding. Reference sample preparation consists of reference sample padding and reference sample filtering.

**Figure 1. Intra Prediction Process**

Intra prediction uses the surrounding reconstructed reference samples. Depending on the mode, the reference samples are padded and filtered before the prediction is formed. Padding substitutes values for reference samples that are unavailable, and filtering is either strong or weak. The residual between the original and predicted blocks is then transformed and quantized, the block is reconstructed, and the optimal mode is selected according to its rate-distortion cost. The earlier standard H.264/AVC supports a total of 9 prediction modes for 4×4 to 16×16 prediction blocks, whereas HEVC supports 35 prediction modes for 4×4 to 64×64 prediction blocks; HEVC therefore achieves higher intra prediction performance than H.264/AVC [4-6]. However, selecting the optimal mode requires computing the rate-distortion cost of every prediction mode, which entails a large amount of computation and a long computation time [7]. Table 1 shows the prediction modes and their names used in intra sample prediction.

**Table 1.
Intra Prediction Modes and Names**

| Intra prediction mode | Associated name |
|-----------------------|-----------------|
| 0 | Planar mode |
| 1 | DC mode |
| 2-34 | Angular mode |

Intra prediction mode 0 is the planar mode, which uses the values and positions of the reference pixels; mode 1 is the DC mode, which uses the average value of the reference pixels; and modes 2 to 34 are the angular modes, which exploit the directionality of the reference pixels [8]. Figure 2 shows the directionality of the angular modes.

Figure 2. Directionality of the Angular Mode

The angular modes (modes 2 to 34) generate prediction samples along a specific direction. Figure 2 shows the 33 angular modes together with intraPredAngle, one of the parameters used to generate the directional prediction samples. intraPredAngle takes the values listed in Table 2, and both the encoder and the decoder store this table. An exact angular prediction also requires the invAngle parameter, which is used for modes with a negative intraPredAngle to project reference samples from the side reference onto the main reference array; its values are summarized in Table 3. Together, these two parameters are used to generate the intra prediction samples.

Table 2. intraPredAngle According to Intra Prediction Modes

| intraPredMode  | 0  | 1  | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11 |
|----------------|----|----|-----|-----|-----|-----|-----|-----|-----|-----|-----|----|
| intraPredAngle | -  | -  | 32  | 26  | 21  | 17  | 13  | 9   | 5   | 2   | 0   | -2 |
| intraPredMode  | 12 | 13 | 14  | 15  | 16  | 17  | 18  | 19  | 20  | 21  | 22  | 23 |
| intraPredAngle | -5 | -9 | -13 | -17 | -21 | -26 | -32 | -26 | -21 | -17 | -13 | -9 |
| intraPredMode  | 24 | 25 | 26  | 27  | 28  | 29  | 30  | 31  | 32  | 33  | 34  |    |
| intraPredAngle | -5 | -2 | 0   | 2   | 5   | 9   | 13  | 17  | 21  | 26  | 32  |    |

Table 3.
invAngle According to Intra Prediction Modes

| intraPredMode | 11    | 12    | 13   | 14   | 15   | 16    | 17    | 18   |
|---------------|-------|-------|------|------|------|-------|-------|------|
| invAngle      | -4096 | -1638 | -910 | -630 | -482 | -390  | -315  | -256 |
| intraPredMode | 19    | 20    | 21   | 22   | 23   | 24    | 25    | 26   |
| invAngle      | -315  | -390  | -482 | -630 | -910 | -1638 | -4096 | -    |

Two further parameters, iIdx and iFact, are needed for intra prediction. iIdx is the integer part of the displacement along the main reference array ref and selects which reference samples are used to generate a prediction sample predSamples[x][y]. iFact is the fractional part, an integer between 0 and 31: according to intraPredAngle, the distance between two neighbouring reference samples is divided into 32 steps, and iFact weights the interpolation between them. Equation (1) gives iIdx and iFact for the vertical modes, and Equation (2) then generates the prediction sample. Equations (3) and (4) give the corresponding parameters and prediction samples for the horizontal modes.

$$\mathit{iIdx} = \big((y+1)\cdot \mathit{intraPredAngle}\big) \gg 5, \qquad \mathit{iFact} = \big((y+1)\cdot \mathit{intraPredAngle}\big)\,\&\,31 \tag{1}$$

$$\mathit{predSamples}[x][y] = \big((32-\mathit{iFact})\cdot \mathit{ref}[x+\mathit{iIdx}+1] + \mathit{iFact}\cdot \mathit{ref}[x+\mathit{iIdx}+2] + 16\big) \gg 5 \tag{2}$$

$$\mathit{iIdx} = \big((x+1)\cdot \mathit{intraPredAngle}\big) \gg 5, \qquad \mathit{iFact} = \big((x+1)\cdot \mathit{intraPredAngle}\big)\,\&\,31 \tag{3}$$

$$\mathit{predSamples}[x][y] = \big((32-\mathit{iFact})\cdot \mathit{ref}[y+\mathit{iIdx}+1] + \mathit{iFact}\cdot \mathit{ref}[y+\mathit{iIdx}+2] + 16\big) \gg 5 \tag{4}$$

## 3. Fast Angular Mode Decision Algorithm

As described in Section 2, HEVC achieves higher intra prediction performance than H.264/AVC by supporting 35 prediction modes for 4×4 to 64×64 prediction blocks, compared with 9 modes for 4×4 to 16×16 blocks in H.264/AVC.
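Each angular mode that an exhaustive search evaluates repeats the per-sample interpolation of Equations (1) and (2) over the whole block. The minimal Python sketch below illustrates that work for an N×N block and a vertical mode with non-negative intraPredAngle (modes 26 to 34). The function name `predict_vertical_angular` is hypothetical, and the `ref` layout (ref[0] as the top-left neighbour, ref[1..2N] as the above and above-right neighbours) is an assumption that follows the HEVC specification's convention; negative angles, which additionally need the invAngle projection of Table 3, are deliberately not covered.

```python
def predict_vertical_angular(ref, n, angle):
    """Sketch of Equations (1)-(2) for vertical modes with intraPredAngle >= 0.

    ref[0] is assumed to be the top-left neighbour and ref[1..2n] the above and
    above-right neighbours. Returns the prediction as a list of rows, pred[y][x].
    """
    if angle < 0:
        # Modes 18-25 would first need the invAngle extension of ref.
        raise ValueError("negative angles need the invAngle extension")
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        i_idx = ((y + 1) * angle) >> 5   # integer displacement, Eq. (1)
        i_fact = ((y + 1) * angle) & 31  # fractional weight in 1/32 units, Eq. (1)
        for x in range(n):
            a = ref[x + i_idx + 1]
            if i_fact == 0:
                # Falls exactly on a reference sample; Eq. (2) would give the
                # same value, but this avoids reading past the end of ref.
                pred[y][x] = a
            else:
                b = ref[x + i_idx + 2]
                pred[y][x] = ((32 - i_fact) * a + i_fact * b + 16) >> 5  # Eq. (2)
    return pred


ref = [100, 10, 20, 30, 40, 50, 60, 70, 80]  # hypothetical neighbours for n = 4
for row in predict_vertical_angular(ref, 4, 26):  # mode 33 (intraPredAngle 26)
    print(row)
```

The horizontal modes of Equations (3) and (4) follow the same pattern with the roles of x and y exchanged and the left neighbours used as ref.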
However, selecting the optimal prediction mode requires computing the rate-distortion cost of every prediction mode, which entails a large amount of computation and a long computation time. To reduce the computational complexity and computation time of conventional intra mode decision, this paper selects the direction from the differences and positions of the original pixel data and implements in hardware an algorithm that efficiently selects one of the 33 angular modes. The conventional intra mode decision algorithm evaluates the DC mode, the planar mode and all 33 angular modes for every PU from 64×64 to 4×4, which leads to high computational complexity and a long processing time. The proposed algorithm instead finds, from the pixel differences in the horizontal and vertical directions, the positions where the pixel values change most, predicts the directionality from these positions, and selects the angular mode accordingly, which reduces computation time.

Figure 3. Directional Estimation of the Vertical

The fast angular mode decision algorithm predicts the direction from the differences and positions in the original pixel data and selects a single angular mode. Figure 3 illustrates the calculation for the vertical lines of a 5×5 pixel block. First, the 5×5 block is divided into vertical lines, and the differences between the original pixels in each line are calculated to find the position with the largest difference. The directionality is then predicted from the differences between these maximum-difference positions across the lines. The red pointers mark the pixel position with the maximum difference in each line. Table 4 compares the performance of the HM-16.9 reference software and the proposed algorithm for efficient angular mode decision.
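Returning to the direction estimate of Figure 3, the per-line search can be sketched in Python as follows. This is a hypothetical software model, not the authors' implementation. It assumes that the differences are absolute differences between vertically adjacent pixels within each column, that the direction is summarised as the average change in the maximum-difference position from one column to the next, and that this shift (in pixels per line) is mapped to the nearest intraPredAngle value of Table 2. The paper does not specify the final mapping to a mode, so `nearest_angle` is purely illustrative.

```python
# Hypothetical model of the Figure 3 search; not the authors' code.
ANGLE_MAGNITUDES = [0, 2, 5, 9, 13, 17, 21, 26, 32]  # |intraPredAngle| values in Table 2


def max_diff_position(line):
    """Return (largest |line[i+1] - line[i]|, i) for one line of pixels.

    Ties keep the earliest position, as a comparator that updates only on a
    strictly larger difference would.
    """
    best_val, best_pos = -1, 0
    for i in range(len(line) - 1):
        diff = abs(line[i + 1] - line[i])
        if diff > best_val:
            best_val, best_pos = diff, i
    return best_val, best_pos


def estimate_shift(block):
    """Average change of the max-difference position between adjacent vertical lines.

    block is a square list of rows (size >= 2); each column is one vertical line.
    Returns (shift in pixels per line, list of per-column positions).
    """
    n = len(block)
    columns = [[block[r][c] for r in range(n)] for c in range(n)]
    positions = [max_diff_position(col)[1] for col in columns]
    steps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(steps) / len(steps), positions


def nearest_angle(shift):
    """Illustrative mapping of a shift (pixels per line) to a signed intraPredAngle."""
    target = shift * 32  # intraPredAngle is expressed in 1/32-pixel units
    candidates = ANGLE_MAGNITUDES + [-a for a in ANGLE_MAGNITUDES[1:]]
    return min(candidates, key=lambda a: abs(a - target))


# A 5x5 block whose edge drops by one row every two columns.
block = [[200 if r >= c // 2 + 1 else 50 for c in range(5)] for r in range(5)]
shift, positions = estimate_shift(block)
print(positions, shift, nearest_angle(shift))
```

For this block the maximum-difference positions are [0, 0, 1, 1, 2], giving an average shift of 0.5 pixels per line, which the illustrative mapping rounds to an intraPredAngle of 17. Only one running maximum and one position per line are kept, which mirrors the single Value/Position register pair described for the hardware.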
On average, compared with HM-16.9, the proposed algorithm increases BD-PSNR by 0.035 dB, decreases BD-rate by 0.623%, and reduces encoding time by 6.905%, with a maximum reduction of 11.389% for Class D.

Table 4. Comparison of HM-16.9 Standard Software and Proposed Algorithms

| Class   | Resolution | BDPSNR (dB) | BDBitrate (%) | ∆Bitrate (%) | ∆PSNR (%) | ∆TS (%) |
|---------|------------|-------------|---------------|--------------|-----------|---------|
| 4K      | 3840×2160  | 0.016       | -0.613        | -0.663       | 0.007     | 7.181   |
| Class A | 2560×1600  | 0.031       | -0.518        | -0.366       | 0.023     | 6.714   |
| Class B | 1920×1080  | 0.027       | -0.642        | -0.543       | 0.013     | 6.112   |
| Class C | 832×480    | 0.039       | -0.620        | -0.469       | 0.023     | 6.043   |
| Class D | 416×240    | 0.042       | -0.672        | -0.678       | 0.012     | 11.389  |
| Class E | 1280×720   | 0.046       | -0.848        | -0.726       | 0.019     | 5.829   |
| Class F | 832×480    | 0.047       | -0.449        | -0.391       | 0.016     | 5.064   |
| Average | -          | 0.035       | -0.623        | -0.548       | 0.016     | 6.905   |

## 4. Proposed Angular Mode Decision Hardware Architecture

Figure 4. Proposed Hardware Architecture

The proposed intra prediction hardware is divided into a memory part and a mode decision part. The memory part consists of separate memories for the horizontal and vertical directions, both of which store the original pixels. The original pixels read from memory are the inputs to the mode decision part, which determines the mode from the differences and indices of the original pixels. Mode decision is performed in parallel for all block sizes from 4×4 to 64×64. Providing a separate memory for each direction allows efficient memory management. Figure 4 shows a block diagram of the proposed intra prediction angular mode decision hardware architecture.
The proposed hardware starts by receiving original pixel data from the memories in which the original pixels are stored for the horizontal and vertical directions. W\_mem\_ctrl is the memory for the horizontal direction and H\_mem\_ctrl is the memory for the vertical direction. The values read from each memory are the inputs to the W\_ctrl and H\_ctrl modules, which select the angular mode for the horizontal and vertical directions, respectively. Using the original pixel data from their memories, W\_ctrl and H\_ctrl determine the angular modes for PU sizes from 4×4 to 64×64. The outputs w\_mode4x4 to w\_mode64x64 carry the angular modes for the horizontal direction, and h\_mode4x4 to h\_mode64x64 carry the angular modes for the vertical direction.

**Figure 5. Original Pixel Operation Processing**

Figure 5 shows how the original pixel data are processed. Original pixel data are read from the memory for each direction and stored in a register. The difference between the previously stored pixel and the newly arrived pixel is computed, and the difference value and its position are stored in the Value and Position registers. To minimize register usage, only a single Value register and a single Position register are used, together with a comparator: each newly computed difference is compared with the stored value, and the larger one is kept. When the calculation is complete, the registers hold the largest difference value and its position. This simple operation and the minimal register usage reduce the hardware area. Figure 6 shows the parallel processing of the original pixel data.
The proposed design processes all block sizes from $4 \times 4$ to $64 \times 64$ in parallel. When the original pixel data have been processed up to a $64 \times 4$ region (marked by the red dot), the modes of one line of $4 \times 4$ blocks are output. When processing reaches a $64 \times 8$ region, the modes of the $4 \times 4$ and $8 \times 8$ blocks are determined in parallel. Continuing in this manner, when the last original pixel data are input, the modes of all block sizes from $4 \times 4$ to $64 \times 64$ have been determined in parallel.

Figure 6. Proposed Hardware Architecture

Table 5 lists the number of modes for each block size; the horizontal and vertical directions output the same number of modes. For example, for $4 \times 4$ blocks, one line of a $64 \times 4$ strip yields 16 modes, and once the whole $64 \times 64$ block has been processed, the total number of $4 \times 4$ modes is 256. Likewise, for $8 \times 8$ blocks, one line of a $64 \times 8$ strip yields 8 modes, and a total of 64 modes are output when the last pixel data are processed. The numbers for the remaining block sizes are shown in the table below.

Table 5. Number of Modes by Block Size

| Block size | Number of modes in one line | Total |
|------------|-----------------------------|-------|
| 4x4        | 16                          | 256   |
| 8x8        | 8                           | 64    |
| 16x16      | 4                           | 16    |
| 32x32      | 2                           | 4     |
| 64x64      | 1                           | 1     |

The overall operation of the proposed hardware architecture is shown in Figure 6. It follows the efficient angular mode decision algorithm described above: the angular mode is determined by predicting the direction through a simple calculation.
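The counts in Table 5 follow directly from the block geometry: a 64-pixel-wide strip of height N holds 64/N blocks of size N×N, and a 64×64 block holds (64/N)² of them. The small Python model below reproduces Table 5 and, assuming the data are streamed one 64-pixel line at a time (as in the 64×4 and 64×8 steps described above), lists which block sizes finish after a given line. The function names are hypothetical and the model ignores cycle-level timing.

```python
CTB_SIZE = 64                     # largest block processed (64x64)
BLOCK_SIZES = [4, 8, 16, 32, 64]


def modes_in_one_line(n):
    """Modes output when one 64 x n strip of n x n blocks completes."""
    return CTB_SIZE // n


def total_modes(n):
    """Modes output for n x n blocks after the whole 64x64 block is processed."""
    return (CTB_SIZE // n) ** 2


def sizes_completed_after_line(line_index):
    """Hypothetical schedule: {block size: modes output} once lines 0..line_index are in.

    A strip of n x n blocks is complete when the number of lines read so far
    is a multiple of n, so several block sizes can finish on the same line.
    """
    done = line_index + 1
    return {n: modes_in_one_line(n) for n in BLOCK_SIZES if done % n == 0}


for n in BLOCK_SIZES:
    print(f"{n}x{n}: {modes_in_one_line(n)} per line, {total_modes(n)} total")
```

Running it prints the same per-line and total counts as Table 5, and `sizes_completed_after_line(7)` returns `{4: 16, 8: 8}`, matching the description of the 4×4 and 8×8 modes being determined together at the 64×8 step.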
Whereas the existing intra mode decision algorithm evaluates all 35 intra prediction modes, the proposed angular mode decision hardware predicts the direction through a simple calculation, which allows fast mode decision and reduces the hardware area. For efficient memory management, separate memories are provided for the horizontal and vertical directions, each storing the original pixels for its direction. Processing then proceeds in parallel for block sizes from $4 \times 4$ to $64 \times 64$, and when the last original pixel data have been processed, the modes for all block sizes have been decided.

Figure 6. Proposed Hardware Architecture Operation

## 5. Implementation Result

In this paper, we apply to hardware an efficient angular mode decision algorithm that performs well compared with the standard software. Instead of the existing intra prediction algorithm, the hardware determines the angular mode by predicting the direction through a simple operation, so only minimal arithmetic units are needed; as a result, the hardware area is minimized. The proposed hardware architecture was designed in Verilog HDL and synthesized with Synopsys Design Compiler in a 65 nm process. Table 6 shows the synthesis result and compares it with other designs. The gate count is 14.9K and the maximum operating frequency is 2 GHz. At 4,100 cycles per 64×64 block and 2 GHz, the design processes about 4.9 × 10^5 blocks per second; a 7680×4320 frame contains 8,100 such blocks, which corresponds to roughly 60 frames per second. The design is compared with Lu [9], which has the best gate count and maximum operating frequency among the compared designs: the gate count is reduced by 75% (from 59.5K to 14.9K) and the maximum operating frequency increases from 622 MHz to 2 GHz.

Table 6.
Comparison of Synthesis Results of the Proposed Intra Prediction Hardware

|                         | Proposed   | Lu [9]     | Zhu [10]      |
|-------------------------|------------|------------|---------------|
| Technology (nm)         | 65         | 65         | 90            |
| Gate count (NAND gates) | 14.9K      | 59.5K      | 214.1K        |
| Blocks supported        | All blocks | All blocks | Without 64x64 |
| Cycles/64x64            | 4,100      | 2,539      | 15,908        |
| Max. frequency (MHz)    | 2,000      | 622        | 357           |
| Throughput              | 8K@60fps   | 8K@30fps   | 1K@44fps      |

## 6. Conclusion

In this paper, we described the hardware design of intra prediction mode decision for an HEVC encoder. The proposed angular mode decision hardware combines an algorithm that selects the angular mode efficiently, an efficient memory structure that stores the original pixel data separately for the horizontal and vertical directions, and parallel processing of the modes for every block obtained by dividing a $64 \times 64$ block; together these minimize hardware area and computation time. The hardware was designed in Verilog HDL and verified with the Modelsim SE-64 10.1c simulator. Its results were compared with those of the mode decision algorithm extracted from the HEVC reference software HM-16.9, confirming correct operation. The design was synthesized with Synopsys Design Compiler, supported by IDEC. The gate count is 14.9K and the maximum operating frequency is 2 GHz. Compared with Lu [9], the best-performing existing design, the area is reduced by 75% and the maximum operating frequency is increased from 622 MHz to 2 GHz.
## Acknowledgments

This paper is a revised and expanded version of a paper entitled "Hardware Design of Intra Prediction Angular Mode Decision for HEVC Encoder" presented at The Third International Mega-Conference on Green and Smart Technology (GST 2016), Jeju Island, Korea, December 22, 2016. This research was also supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the Global IT Talent support program (IITP-2016-R0134-16-1019) and the Human Resource Development Project for Brain scouting program (IITP-2016-R2418-16-0007) supervised by the IITP (Institute for Information and Communication Technology Promotion).

#### References

- [1] Y. Zhang, S. Kwong, G. Zhang, Z. Pan, H. Yuan, and G. Jiang, "Low Complexity HEVC INTRA Coding for High-Quality Mobile Video Communication," IEEE Trans. Ind. Informat., vol. 11, no. 6, (2015) pp. 1492-1504.
- [2] Y. Seo, B. Kim, and D. Kim, "Analysis of Intra Prediction for Digital Watermarking based on HEVC," Journal of the Korea Institute of Information and Communication Engineering, vol. 19, no. 5, (2015) pp. 1189-1198.
- [3] H. Jung and K. Ryoo, "An Intra Prediction Hardware Architecture Design for Computational Complexity Reduction of HEVC Decoder," Journal of the Korea Institute of Information and Communication Engineering, vol. 17, no. 5, (2013) pp. 1203-1212.
- [4] H. Zhang and Z. Ma, "Fast Intra Mode Decision for High Efficiency Video Coding (HEVC)," IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 4, (2014) pp. 660-668.
- [5] A. Vetro, T. Wiegand, and G. Sullivan, "Overview of the Stereo and Multiview Video Coding Extensions of the H.264/MPEG-4 AVC Standard," Proc. IEEE, Special Issue on 3D Media and Displays, vol. 99, no. 4, (2011) pp. 626-642.
- [6] G. Sullivan, J. Ohm, and W. Han, "Overview of the High Efficiency Video Coding (HEVC) Standard," IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, (2012) pp. 1649-1668.
- [7] N. Hu and E. Yang, "Fast Mode Selection for HEVC Intra-Frame Coding With Entropy Coding Refinement Based on a Transparent Composite Model," IEEE Trans. Circuits Syst. Video Technol., vol. 25, no. 9, (2015) pp. 1521-1532.
- [8] Y. Zhang, S. Kwong, G. Zhang, Z. Pan, H. Yuan, and G. Jiang, "Low Complexity HEVC INTRA Coding for High-Quality Mobile Video Communication," IEEE Trans. Ind. Informat., vol. 11, no. 6, (2015) pp. 1492-1504.
- [9] Y. Lu, W. Cheng, L. Huang, X. Zeng, and Y. Fan, "A Flexible HEVC Intra Mode Decision Hardware for 8kx4k Real Time Encoder," 2015 IEEE 11th International Conference on ASIC, (2015) pp. 1-4, doi: 10.1109/ASICON.2015.7517190.
- [10] J. Zhu, Z. Liu, D. Wang, Q. Han, and Y. Song, "HDTV1080p HEVC Intra Encoder with Source Texture Based CU/PU Mode Pre-Decision," 19th Asia and South Pacific Design Automation Conference (ASP-DAC), (2014) pp. 367-372.

## **Authors**

**Jooyong Choi** received a BS degree in Information and Communication Engineering from Hanbat National University, South Korea, in 2015. He is currently pursuing an MEng degree in Information and Communication Engineering at Hanbat National University, South Korea. His research interests include SoC Design and Verification Platforms, Image Signal Processing and Multimedia Codec Design.

**Seungyong Park** received a BS degree in Information and Communication Engineering from Hanbat National University, South Korea, in 2010 and an MEng degree in Information and Communication Engineering from Hanbat National University, South Korea, in 2012. He is currently pursuing a PhD degree in Information and Communication Engineering at Hanbat National University, South Korea. His research interests include SoC Design and Verification Platforms, Image Signal Processing and Multimedia Codec Design.

**Kwangki Ryoo** received BS, MS and PhD degrees in Electronic Engineering from Hanyang University, South Korea, in 1986, 1988 and 2000, respectively.
From 1991 to 1994, he was an Assistant Professor at the Korea Military Academy in South Korea. From 2000 to 2002, he worked as a Senior Researcher at Electronics and Telecommunication Research Institute, South Korea. From 2010 to 2011, he was a Visiting Professor at University of Texas at Dallas. Since 2003, he has been a Professor at Hanbat National University, South Korea. His research interests include Engineering Education, SoC Design and Verification, Image Signal Processing and Multimedia Codec Design.