# FPGA Implementation of MQ Coder in JPEG 2000 Standard - A Review

### S.D. Jayavathi and A. Shenbagavalli

Department of Electronics & Communication Engineering, National Engineering College, Kovilpatti, Tamil Nadu, India

Copyright © 2016 ISSR Journals. This is an open access article distributed under the *Creative Commons Attribution License*, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**ABSTRACT:** JPEG 2000 is one of the popular image compression standards. The vital part of this standard is Embedded Block Coding with Optimized Truncation (EBCOT). This block consumes the major part of the processing time of the compression operation. The EBCOT block consists of two components, the bit-plane coder and the MQ coder. Field Programmable Gate Arrays (FPGAs) provide a reprogrammable hardware technology that can be exploited to obtain a reconfigurable system. Current MQ coder architectures seek to provide high throughput and minimum execution time while decreasing size and power consumption. In this study, various techniques used to implement the MQ coder block in FPGA are compared.

**KEYWORDS:** MQ coder, EBCOT, JPEG 2000 Standard, FPGA.

## 1 INTRODUCTION

Image compression has gained great attention as the demand for images, video sequences and computer animation has increased at a very high rate. Many image compression techniques have evolved to perform the compression process. Lossy compression has gained increasing popularity since the release of the JPEG standard. The same concept is used, with minor modifications, by the JPEG 2000 standard, which was developed by the JPEG team [1], [2] & [3]. In addition to excellent compression performance, the JPEG 2000 standard provides a rich set of features that are not available in existing standards, such as error resilience, manipulation of images in the compressed domain, region-of-interest coding, acceptable performance even at very low bit rates, and rate control [4] & [5].

Fig. 1 shows a high-level block diagram of the JPEG 2000 compression scheme. In JPEG 2000, the input image in RGB color space is transformed to YUV space. The YUV image is then de-correlated by applying the Discrete Wavelet Transform (DWT) [1].



Fig. 1. Block diagram of JPEG 2000 Standard

High precision of the input data degrades compression in the lossless domain. In lossy compression, by contrast, quantization effectively reduces the precision of the data: the transformed coefficients from the DWT are quantized, the quantized sub-bands are divided into a number of smaller code-blocks, and the code-blocks are processed by EBCOT. The core part of JPEG2000 is its coding engine, Embedded Block Coding with Optimized Truncation (EBCOT). The EBCOT algorithm is divided into two layers called Tier-1 and Tier-2. Tier-1 is responsible for source modeling and entropy coding, while Tier-2 produces the output stream [6].

EBCOT Tier-1 consists of a bit-plane coding block and a binary arithmetic coding (MQ coder) block. Fig. 2 shows the block diagram of EBCOT Tier-1. The code-block is split into bit-planes and each bit-plane is processed by bit-plane coding (BPC). In bit-plane coding, the most significant bits (MSBs) of all the coefficients in the code-block are processed first, then the next most significant bits, and so on. Each bit-plane is further decomposed into three passes, known as the significance propagation (SP) pass, the magnitude refinement (MR) pass, and the clean up (CU) pass; a bit may belong to only one of the three passes. The BPC generates intermediate data in the form of a context (Cx) and a binary decision (D) value for each bit position. Entropy coding is performed by the MQ coder, a variant of binary arithmetic coding. The context generated by the bit-plane coder drives the probability models of the MQ coder. The value of the bit, or symbol, (D) and its *context* (Cx) together form a context-data (CxD) pair.
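The MSB-first bit-plane order described above can be sketched in a few lines of Python. This is a minimal illustration, not the EBCOT bit-plane coder itself: the function name `bit_planes`, the flattened 2×2 block and its 3-bit magnitudes are assumptions made for the example, and sign handling, stripe scanning, the three coding passes and context formation are all omitted.

```python
def bit_planes(magnitudes, num_planes):
    """Split non-negative coefficient magnitudes into bit-planes, MSB plane first."""
    planes = []
    for p in range(num_planes - 1, -1, -1):  # most significant plane first
        planes.append([(m >> p) & 1 for m in magnitudes])
    return planes


# Hypothetical 2x2 code-block, flattened, with 3-bit magnitudes (illustrative values).
block = [5, 2, 7, 0]
planes = bit_planes(block, 3)
for idx, plane in enumerate(planes):
    print(f"plane {2 - idx}: {plane}")
# plane 2: [1, 0, 1, 0]
# plane 1: [0, 1, 1, 0]
# plane 0: [1, 0, 1, 0]
```

In an actual Tier-1 coder each bit in a plane would additionally be assigned to one of the SP, MR or CU passes and paired with a context to form a CxD pair.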

The context information generated by EBCOT selects an estimated probability value from a lookup table, and the MQ coder uses this probability value to adjust the coding interval and produce the compressed code. Modifying the initialization of the probability estimate ( $Q_e$ ) according to its repetitive MSB binary values reduces its width, and thereby the size of the Look-Up Table (LUT). The smaller LUT in turn reduces memory and power consumption. The EBCOT algorithm consumes the majority of the processing time in JPEG2000.
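The way a context selects a probability estimate and the coder then adjusts its interval can be sketched as follows. This is a simplified illustration under stated assumptions: the 4-entry table `PET` and its state transitions are hypothetical (the table in the standard is larger), the names `Context` and `encode_decision` are invented for the example, and byte output is omitted, so the code register `c` simply grows as a Python integer. The update follows the usual MPS/LPS interval subdivision with conditional exchange and renormalization.

```python
# Toy probability estimation table (hypothetical, for illustration only):
# (Qe, next index after MPS, next index after LPS, switch-MPS flag)
PET = [
    (0x5601, 1, 0, 1),
    (0x3401, 2, 0, 0),
    (0x1801, 3, 1, 0),
    (0x0AC1, 3, 2, 0),
]


class Context:
    """Per-context state: table index and current more-probable symbol (MPS)."""

    def __init__(self):
        self.index = 0
        self.mps = 0


def encode_decision(a, c, ctx, d):
    """Update interval register a and code register c for decision d.

    Returns (a, c, shifts), where shifts is the number of renormalization shifts.
    """
    qe, nmps, nlps, switch = PET[ctx.index]
    a -= qe
    if d == ctx.mps:
        if a & 0x8000:          # interval still large enough: no renormalization
            return a, c + qe, 0
        if a < qe:              # conditional exchange
            a = qe
        else:
            c += qe
        ctx.index = nmps
    else:
        if a < qe:              # conditional exchange
            c += qe
        else:
            a = qe
        if switch:
            ctx.mps ^= 1
        ctx.index = nlps
    shifts = 0
    while not (a & 0x8000):     # renormalize until bit 15 of a is set
        a <<= 1
        c <<= 1
        shifts += 1
    return a, c, shifts


ctx = Context()
a, c, shifts = encode_decision(0x8000, 0, ctx, 0)
print(hex(a), hex(c), shifts)   # 0xac02 0x0 1
```

Hardware MQ coders implement the same decision logic, but the renormalization loop and the byte output are where most of the architectural effort reviewed in this paper is spent.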



Fig. 2. Block diagram of EBCOT Tier-I

Field Programmable Gate Arrays (FPGAs) offer a reprogrammable hardware technology that can be used to obtain a reconfigurable system and to implement complex applications at very low power consumption. FPGAs are preferred because of the high efficiency provided by their architectural flexibility (on-chip memory, parallelism, *etc.*), their reconfigurability and their superb performance in the development of algorithms for highly demanding tasks.

The rest of the paper is organized as follows. Section 2 reviews work related to MQ coder implementation on FPGA devices, and Section 3 compares the architectures. Finally, Section 4 concludes the review of FPGA implementation of the MQ coder in the JPEG 2000 standard.

## 2 RELATED WORK

Kishore Andra et al. [7] proposed a system-level hardware architecture for the JPEG 2000 core algorithm. Its important components are the wavelet, bit-plane and arithmetic coders and the memory interfacing between the coders. It is implemented in VHDL; the estimated area of the hardware architecture is 3 mm square and the frequency of operation is 200 MHz.

Since throughput is the bottleneck of the arithmetic encoder in the JPEG 2000 standard, Yu-Wei Chang et al. [8] proposed an Arithmetic Encoder (AE) that encodes two symbols per cycle, a competent approach to improving the throughput for high-resolution images. This architecture exploits high-level parallelism to shorten the critical path for two-symbol encoding. It achieves 180 Msymbols/s in 0.35 $\mu m$ CMOS technology with a gate count of 7.7 K.

In [9], Liu Kai et al. presented a new architecture with a bit plane-parallel coder for the Embedded Block Coding with Optimized Truncation (EBCOT) entropy encoder used in JPEG2000, which processes all bit planes concurrently. In this architecture, the coding information of each bit plane is obtained simultaneously and processed in parallel. Compared with other architectures, it wastes no clock cycles on a single point and offers high parallelism. The experimental results reveal that the processing time is reduced by about 86% relative to the bit plane sequential scheme. The prototype chip is designed using a Field Programmable Gate Array (FPGA), and simulation results show that it can process 512x512 gray-scale images at more than 30 frames per second at 52 MHz. The bit plane-parallel context modeling method improves the block coding efficiency significantly. Moreover, only one low-cost arithmetic encoder is required instead of three.
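To put the reported frame rate in perspective, the clock-cycle budget per sample implied by these figures can be estimated with simple arithmetic. The sketch below assumes a single-component (gray-scale) image and exactly 30 frames per second; the helper name `cycles_per_sample` is hypothetical.

```python
def cycles_per_sample(width, height, fps, clock_hz):
    """Average number of clock cycles available per image sample."""
    samples_per_second = width * height * fps
    return clock_hz / samples_per_second


# 512x512 gray-scale frames at 30 frames per second on a 52 MHz clock, as reported in [9].
budget = cycles_per_sample(512, 512, 30, 52e6)
print(round(budget, 2))  # 6.61
```

Under these assumptions the encoder has on average fewer than seven clock cycles to handle all bit planes of one sample, which illustrates why the bit planes are processed concurrently rather than sequentially.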

In [10], K. Varma et al. proposed a new fast Split Arithmetic Encoder (SAE) block for the EBCOT Tier-1 design. The proposed block uses concurrency to attain enhanced throughput while maintaining coding efficiency. The SAE process is evaluated by two methods: clock cycle estimation and FPGA hardware implementation. Both show high throughput, and the hardware implementation exhibits the highest speedup.

Michael Dyer et al. [11] developed novel techniques capable of absorbing the high symbol rate of high-performance bit-plane coders, and provided five flexible design choices with novel architectures for realizing the MQ coder for JPEG2000. The hypothesis testing method proved a better solution than existing single- and two-CxD coders for a combined EBCOT system using a single clock domain; it meets the throughput requirements of such a system without a large increase in hardware cost. In terms of raw throughput, the brute force coder with modified byte emission provides the highest CxD rate, at a reduced hardware cost compared to the original brute force technique.

Yijun Li et al. [12] proposed a three-level parallel, high-speed, power-efficient architecture for EBCOT Tier-1. The architecture is divided into bit-plane coding (BC), arithmetic encoding (AE), and a first-in first-out (FIFO) buffer that connects BC with AE and balances their different throughputs. To improve the system throughput, parallelism is adopted at three levels in BC: 1) among bit planes; 2) among coding bits; and 3) among the three pass scans. The AE is implemented with four pipeline stages. To achieve power efficiency, several techniques are used: simple control logic is added to reduce computation in BC; memory access is reduced because AE is fed with fixed values instead of reading from the FIFO; simple control logic is added to reduce computation in AE; and a forwarding technique combined with clock gating reduces switching activity in the last two pipeline stages. The proposed architecture can encode one code block with size in only around (0.35~0.46) clock cycles. Experimental results show that the power reduction techniques keep the same system throughput and improve power consumption by about 27% compared with the architecture without these techniques.

In another implementation, Yi-Zhen Zhang et al. [13] proposed two improved methods, referred to as data-pairs ordering (DPO) and the flexible MQ (FMQ) coder. They solve the configuration problem between the arithmetic coding module and the parallel context modeling module, and take full advantage of the bit plane parallel encoding technique to significantly improve the coding speed and efficiency of the EBCOT encoder. The parallel EBCOT encoder design is tested on an Altera Field Programmable Gate Array (FPGA) platform. The simulation results reveal that it can encode 54 million samples at a 55-MHz working frequency on average. The proposed design reduces execution time by 24% compared with the conventional parallel architecture for the bit plane coder.

Kai Liu et al. [14] developed a novel high-throughput architecture for an MQ arithmetic coder that processes two symbols in parallel. Its main characteristics are eight process elements for prediction of the probability interval A, the combination of the computation units for the code register C with the Byte out & Flush procedure, and the use of a dedicated probability estimation table to reduce the internal memory. Synthesis results show that the throughput of the architecture can reach 96.60M context symbols per second with 1509 bits of internal memory, which compares favorably with other architectures and is suitable for chip implementation.

David Joseph Lucking [15] proposed a new FPGA MQ decoder design as an improvement over previous software and FPGA implementations. The proposed design reduces the resources required by the MQ decoder by 35% and increases the clock speed by 5%. The average number of clock cycles per code block is also decreased by 7%. With the decrease in resources and clock cycles and the increase in clock speed, the design achieves a 42% higher throughput.

Nandini Ramesh Kumar et al. [16] proposed a field-programmable gate array (FPGA) based enhanced architecture of the arithmetic coder, which processes two symbols per clock cycle, whereas the conventional architecture processes only one. The bit plane coder, whose output is the input to the arithmetic coder, generates more than two context-decision pairs per clock cycle. The two-symbol architecture is therefore proposed to overcome the slow processing speed of the arithmetic coder: it doubles the throughput and can also operate at frequencies above 100 MHz. This architecture achieves a throughput of 210 Msymbols/sec with a critical path of 9.457 ns. In [25], an efficient hardware implementation of arithmetic coding is proposed that uses parallel processing and pipelining for the intermediate blocks, providing a two-symbol coding engine that is efficient in terms of memory, performance and hardware. The architecture is implemented in the Verilog hardware description language and synthesized for an Altera field programmable gate array. The only memory unit used in this design is a 256-bit FIFO (first in first out) that stores the CxD pairs at the input, which is negligible compared to existing arithmetic coding hardware designs. The simulation and synthesis results reveal that the proposed architecture operates at more than 100 MHz and achieves a throughput of 212 Msymbols/sec, which is double the throughput of the conventional one-symbol implementation and at least a 50% throughput increase over existing two-symbol architectures.

Minsoo Rhu et al. [17] proposed a novel pipelined binary arithmetic coder (BAC) architecture that can encode input symbols at a much higher rate than conventional BAC architectures. The proposed architecture reduces the critical path delay and achieves a throughput of 400 Msymbols/s. The critical path delay synthesized with 0.18 µm CMOS technology is 2.42 ns, which is almost half of the delay of conventional BAC architectures.
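The relation between the reported critical path delays and throughputs can be checked with simple arithmetic. The sketch below treats the critical path delay as the minimum clock period and assumes the coder consumes a fixed number of symbols on every cycle with no stalls, so the results are idealized upper bounds rather than measured figures; the helper names are hypothetical.

```python
def max_clock_mhz(critical_path_ns):
    """Upper bound on clock frequency (MHz) if the critical path sets the clock period."""
    return 1000.0 / critical_path_ns


def throughput_msymbols(critical_path_ns, symbols_per_cycle):
    """Idealized throughput in Msymbols/s, assuming no pipeline stalls."""
    return max_clock_mhz(critical_path_ns) * symbols_per_cycle


# Critical path of 2.42 ns reported in [17], assuming one symbol per cycle.
print(round(max_clock_mhz(2.42), 2))            # 413.22
# Critical path of 9.457 ns reported in [16], two symbols per cycle.
print(round(throughput_msymbols(9.457, 2), 2))  # 211.48
```

Both idealized values are close to the reported throughputs of 400 Msymbols/s and 210 Msymbols/sec, which suggests that these designs sustain nearly their full symbol rate per clock cycle.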

Kishor Sarawadekar et al. [18] investigated the rate of concurrent context generation to increase throughput and devised a technique named compact context coding. As a consequence, high throughput is attained and the hardware requirement is cut down. The renormalization and byte out stages are operated concurrently to improve the performance of the MQ coder. The entire EBCOT encoder design is implemented on a field programmable gate array platform. The implementation results reveal that the throughput of the proposed architecture is 163.59 MSamples/s. The bit plane coder (BPC) architecture alone, however, operates at 315.06 MHz, which is 2.86 times faster than the fastest BPC design available so far. In addition, it is able to encode digital cinema size (2048 × 1080) at 42 f/s. In [19] they studied the rate of byte emission in an image and the number of shifts performed. On average, one and two shifts occur 75.03% and 22.72% of the time, respectively, and two bytes are emitted concurrently about 5.5% of the time. Based on these facts, a new MQ coder architecture is proposed that is capable of consuming one symbol per clock cycle. The throughput of this coder is improved by operating the renormalization and byte out stages concurrently. Synchronous shifters are used instead of hard shifters to reduce the hardware cost. The proposed architecture is tested on a Stratix FPGA and is able to operate at 145.9 MHz. The memory requirement of the proposed architecture is reduced by a minimum of 66%.
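The shift statistics from [19] concern renormalization, in which the 16-bit interval register is shifted left until its top bit is set. The sketch below is an illustration, not the architecture of [19]: it contrasts a bit-serial shift loop with a direct computation of the shift count, which is what allows a renormalization to complete in a single step. The function names are hypothetical and the register width of 16 bits is an assumption of the example.

```python
def renorm_shifts_iterative(a):
    """Count left shifts until bit 15 of a non-zero 16-bit value is set (one shift per step)."""
    if not 0 < a < 0x10000:
        raise ValueError("a must be a non-zero 16-bit value")
    shifts = 0
    while not (a & 0x8000):
        a <<= 1
        shifts += 1
    return shifts


def renorm_shifts_direct(a):
    """Same count computed at once from the position of the leading one."""
    if not 0 < a < 0x10000:
        raise ValueError("a must be a non-zero 16-bit value")
    return 16 - a.bit_length()


print(renorm_shifts_direct(0x5601))  # 1
print(renorm_shifts_direct(0x1801))  # 3
```

Since one and two shifts together account for 97.75% of the cases reported in [19], a design can favor these short shifts and handle longer ones as the rare case.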

Omkar C Kulkarni et al. [20] proposed a high-speed and area-efficient MQ decoder architecture implemented on a Virtex-2 FPGA. The implementation results reveal that the architecture operates at 142 MHz with very low hardware cost; the estimated frame rate at this frequency is 20.41 frames per second (FPS). On Virtex-5 the design operates at 222.8 MHz and the estimated frame rate is 32.02 FPS. Hardware overhead is reduced to a great extent because a shift register based renormalization unit, which operates at high speed, is used.

Mahesh Krishnappa [23] devised a parallel coding pass architecture that offers a speedup factor of up to 4.6 compared to a serial hardware architecture. FPGA resource utilization showed that BPC is computationally more intensive than BAC. Synthesis results showed that the BPC and BAC hardware designs can work at maximum clock frequencies of 128.758 MHz and 112.927 MHz, respectively. Hence hardware implementation of BPC and BAC shows great promise in reducing compression time compared to software implementation.

M. Ahmadvand et al. [24] proposed a new simplified pipelined architecture for the JPEG2000 MQ-Coder. The proposed approach results in a 20% decrease in hardware requirements and a 10% increase in clock frequency. Post-synthesis simulations indicate that the proposed architecture is able to compress 4CIF video (704×576 pixels) at a rate of 30 frames per second, making it a good approach for high-resolution real-time video coding or high-speed compression of high-resolution images.

Jie Guo et al. [26] presented an efficient VLSI architecture of a JPEG2000 encoder. The proposed architecture functionally consists of three important parts: the Discrete Wavelet Transform (DWT); the block encoder, i.e., Embedded Block Coding with Optimized Truncation (EBCOT), which comprises the bit plane coder, MQ coder and rate-distortion (RD) truncation; and the memory management unit (MMU). For the block encoder, the bit plane parallel architecture and an efficient MQ coder scheme are adopted to improve parallelism and hardware utility. Experimental results show that the proposed architecture attains a throughput of 120M samples per second.

David J. Lucking et al. [21] & [27] presented a flexible FPGA implementation of the JPEG2000 binary arithmetic decoder. The proposed decoder decreases the amount of resources used on the FPGA, allowing 19% more entropy block decoders to fit on chip and consequently improving the throughput by 21% beyond previous designs.

Taoufik Saidani et al. [28] presented and implemented a hardware architecture for high-speed parallel bit-plane coding (BPC) in the EBCOT module of JPEG 2000. Experimental results demonstrate that their design outperforms well-known techniques with respect to processing time, reaching a 70% reduction compared to bit plane sequential processing.

Layla Horrigue et al. proposed a high-speed and area-efficient MQ decoder architecture implemented on different FPGA platforms in [29], and presented a novel high-throughput MQ decoder architecture in [30]. This architecture has been implemented in VHDL and synthesized using Xilinx and Altera design platforms. The design operates at 439.5 MHz on Virtex-6, where the estimated frame rate is 63.24 frames per second (FPS). On a Stratix III device, the design operates at 214.4 MHz with very low hardware cost. Hardware overhead is minimized to a great extent because the structure of the probability estimation table (PET) is replaced by a small PET ROM, and this dedicated probability estimation table reduces the internal memory.
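How the width of each table field drives the size of a PET ROM can be illustrated with a short calculation. The figures below are assumptions made for the example and are not taken from [29] or [30]: a table of 47 entries, each storing a 16-bit probability estimate, two 6-bit next-state indices and a 1-bit switch flag, and a hypothetical narrower encoding that trims three bits from the probability estimate.

```python
def rom_bits(entries, field_widths):
    """Total ROM size in bits for a table with the given per-entry field widths."""
    return entries * sum(field_widths.values())


# Assumed field widths (illustrative, not from the reviewed papers).
full = {"qe": 16, "next_mps": 6, "next_lps": 6, "switch": 1}
narrow = dict(full, qe=13)  # hypothetical reduced-width probability estimate

full_bits = rom_bits(47, full)
narrow_bits = rom_bits(47, narrow)
print(full_bits, narrow_bits)                          # 1363 1222
print(round(100 * (1 - narrow_bits / full_bits), 1))   # 10.3
```

Under these assumptions the whole table fits in under 1.4 kbit, and every bit removed from an entry saves 47 bits of ROM, which is why reducing field widths is a natural target for memory reduction.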

## 3 DISCUSSIONS

The MQ coder still plays a vital role in the EBCOT of the JPEG2000 standard and remains an important bottleneck for real-time applications. Many researchers have proposed different hardware architectures for the MQ coder in order to meet real-time requirements. Table 1 compares MQ coder implementations in JPEG 2000 by author/year, title of the paper, architecture, merits, and results (critical path delay, throughput, gate count, power consumption, processing time and maximum clock frequency) together with demerits. Most researchers use parallel architectures to obtain improved throughput. Resource utilization has been reduced by lowering the memory requirement through various methods. There is still room to reduce resource utilization by reducing the memory requirement of the PET ROM table, which stores the probability estimation values. Further, the power consumed by the various blocks can also be reduced.

| S.No | Author/Year                 | Title of the paper                                                                        | Architecture                                                                                                                 | Merits                                                                             | Results/Demerits                                                                                                                                                                                          |
|------|-----------------------------|-------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1    | <u>,</u>                    | High Performance Two-<br>Symbol Arithmetic Encoder<br>in JPEG 2000                        | Arithmetic encoder<br>2 stage pipeline                                                                                       | High level parallelism to<br>shorter the critical path<br>for two-symbol encoding. | Critical path 11ns<br>Throughput 2 symbol/cycle<br>achieves 180 Msymbolssec                                                                                                                               |
| 2    | Liu Kai, et al.,<br>(2006)  | A High-Performance VLSI<br>Architecture Of EBCOT<br>Block Coding In JPEG2000              | EBCOT bit plane-parallel scheme                                                                                              | High parallelism, and no waste clock cycles                                        | Reduces the processing time about<br>86%                                                                                                                                                                  |
| 3    | K. Varma, et<br>al., (2006) | A Fast JPEG 2000 EBCOT<br>Tier-1 Architecture that<br>Preserves Coding<br>Efficiency      | Split Arithmetic<br>Encoder (SAE) process.<br>parallelism of the<br>arithmetic encoding<br>tasks.                            | Concurrency to obtain<br>improved throughput                                       | 28% in total clock cycle savings is<br>realized.<br>Demerits: AE modules are slower<br>than the CSA module; SAE<br>architecture requires more<br>hardware, multiple AE modules and<br>extra FIFO buffers. |
| 4    |                             | Concurrency techniques for<br>arithmetic coding in<br>JPEG2000                            | MQ coder hypothesis<br>testing, brute force<br>method.                                                                       | Improve MQ coder<br>throughput and the byte<br>out processing<br>mechanism.        | High throughput<br>Demerits: Lower clock rates due to<br>the increased logic paths.<br>Brute force coder suffers from a<br>great reduction in operating<br>frequency,                                     |
| 5    | Yijun Li, et al.,<br>(2006) | A Three-Level Parallel High-<br>Speed Low-Power<br>Architecture for EBCOT of<br>JPEG 2000 | EBCOT tier-1, bit-plane<br>coding (BC), arithmetic<br>encoding AE),<br>parallelism in BC. simple<br>control logics are added | Reduce computation in BC; power consumption and area.                              | 27% improvement in the power<br>consumption, 6.6K gate counts for<br>data path and control unit and 49<br>29 bits memory                                                                                  |

## Table 1. Comparison of MQ coder Implementation in JPEG 2000 standard

| 6  | Yi-Zhen<br>Zhang et al.,<br>(2007)            | Performance analysis &<br>architecture design<br>forJPEG2000                                   | EBCOT data-pairs<br>ordering (DPO) and<br>flexible MQ (FMQ)<br>coder.                                                                             | Improves the coding speed and efficiency.                                                                                                                    | Reduced execution time by 24%.<br>Demerits: Delay may occur in stage<br>1 of FMQ. Some delays can be<br>smoothed by DPO, but others<br>cannot be. |
|----|-----------------------------------------------|------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
| 7  | Kai Liu a,n et<br>al.,(2010)                  | A high performance MQ<br>encoder architecture in<br>JPEG2000                                   | MQ Encoder pipelining                                                                                                                             | Process two symbols in<br>parallel. High throughput.<br>Simple structure of the<br>probability estimation<br>table(PETROM).                                  | Clock 220.10 MHZ<br>Throughput 440.20MHZ reach<br>96.60MSPSat its maximum speed                                                                   |
| 8  | Nandini<br>Ramesh<br>Kumar, et al.,<br>(2010) | An FPGA based fast two<br>symbol processing<br>architecture for JPEG 2000<br>arithmetic coding | MQ Coder<br>Two symbol<br>architecture<br>Pipeline 4stage in VHDL                                                                                 |                                                                                                                                                              | Critical path 9.457ns<br>Throughput 210 msymbol/sec<br>Max frequency 106.2MHZ                                                                     |
| 9  | Minsoo Rhu<br>et al., (2010)                  | Optimization of arithmetic<br>coding for JPEG2000                                              | Arithmetic encoder<br>Trace Pipeling<br>Renormalizing look<br>ahead scheme                                                                        | the critical path delay.<br>Increase the throughput.<br>Reduce the number of                                                                                 | Critical path2.42ns<br>Clk Frequency 413MhZ<br>FOM 46.93<br>Demerit: Gate count is increased<br>only by 5%.due to parallel<br>processing          |
| 10 |                                               | An Efficient Pass-Parallel<br>Architecture for<br>Embedded Block Coder in<br>JPEG 2000         | EBCOT<br>compact context coding                                                                                                                   | High throughput is<br>attained and hardware<br>requirement<br>is also cut down.                                                                              | Throughput is 163.59 MSamples/s<br>Demerits: Design requires more<br>memory bits.                                                                 |
| 11 | Kishor<br>Sarawadekar<br>et al., (2012)       | VLSI design of memory-<br>efficient,high-speed base<br>line MQ coder for JPEG<br>2000          | MQ coder operating<br>the renormalization and<br>byteout stages<br>concurrently.<br>synchronous shifters<br>are used instead of hard<br>shifters. | Throughput is improved.<br>Hardware cost is reduced.<br>Memory requirement is<br>reduced                                                                     | Operating at 145.9MHz.<br>Memory increased to 66%                                                                                                 |
| 12 | Omkar C<br>Kulkarni, et<br>al., (2011)        | VLSI Implementation of MQ<br>Decoder in JPEG2000                                               | MQ Decoder                                                                                                                                        | High operating frequency.                                                                                                                                    | Maximum operating speed is 222.8<br>MHz and estimated frame rate is<br>32.02 FPS.<br>Demerit: Requires one extra cycle<br>to decode each symbol.  |
| 13 |                                               | Low GPU occupancy<br>Approach to Fast Arithmetic<br>coding in JPEG 2000                        | MQ coder<br>improved enhanced<br>renormalization                                                                                                  | -                                                                                                                                                            | -                                                                                                                                                 |
| 14 | M.<br>Ahmadvand,<br>et al., (2012)            | A New Pipelined<br>Architecture for JPEG2000<br>MQ-Coder                                       | MQ Coder<br>Five pipeline stage                                                                                                                   | the number of clock<br>cycles for an encoding<br>process is higher critical<br>path has been reduced                                                         | Clock 208.1 MHZ<br>Gate count 20% lower than the next<br>fastest design.                                                                          |
| 15 | Nandini<br>Ramesh<br>Kumar et al.,<br>(2012)  | Two-Symbol FPGA<br>Architecture for<br>Fast Arithmetic Encoding in<br>JPEG 2000                | Arithmetic encoder<br>Pipelinig and parallel<br>processing                                                                                        | Providing a higher<br>throughput and an<br>operating frequency of<br>100 MHz. The coding<br>efficiency is not affected<br>and the memory is kept<br>minimum. | Memory unit is negligible<br>Operating frequency 100MHz<br>Throughput 212 Msymbols/sec                                                            |
| 16 | Jie Guo, et al.,<br>(2013)                    | Efficient VLSI Architecture<br>of JPEG2000 Encoder                                             | JPEG 2000 Encoder RD<br>truncation                                                                                                                | Used to gain higher<br>computational accuracy<br>under lower hardware<br>overhead constraints.<br>Reduce processing time.                                    | Throughput of 120M Samples per second.                                                                                                            |
| 17 | David J.<br>Lucking, et<br>al., (2013) | FPGA implementation of<br>the JPEG2000 binary<br>arithmetic (MQ) decoder | MQ Decoder<br>Eliminates the barrel<br>shifter and instead<br>performs the<br>renormalization recursively. | Higher clock<br>speed. Minimizes the<br>amount of logic | Reduces the resources required by 37%, increases the clock speed by 12% and increases the throughput by 21% |
| 18 | Saidani, et al.,<br>(2013) | An efficient hardware<br>implementation of<br>parallel EBCOT algorithm<br>for JPEG 2000 | EBCOT bit<br>plane-parallel scheme | Fewer clock<br>cycles and faster operation. The number of LEs is<br>reduced due to the<br>reduced data width<br>of the (CX,D) pairs. | 70% reduction in processing time.<br>Maximum operating speed of this<br>design is 186 MHz |
| 19 | Horrigue, et<br>al., (2014) | A High Performance MQ<br>Decoder Architecture in<br>JPEG2000 | MQ Decoder | High-speed architecture | Maximum operating speed is<br>439.58 MHz and estimated frame<br>rate is 63.23 FPS. |
| 20 | Horrigue, et<br>al., (2014) | An efficient hardware<br>implementation of MQ<br>decoder of the JPEG2000 | MQ Decoder based<br>on reduced probability<br>estimation block and<br>faster MQ decoding. | Use of a dedicated<br>probability estimation<br>table decreases the<br>internal memory. Uses a<br>small area to get a high<br>speed. | The maximum operating speed is<br>439 MHz. Memory requirement is<br>reduced by 37.1% |

## 4 CONCLUSION

This paper reviews the different hardware architectures used in the FPGA implementation of the MQ coder of EBCOT in the JPEG 2000 standard. The performance of these architectures is analyzed in terms of resource utilization, maximum clock frequency, execution time and throughput. It is found that most of the architectures use pipelining, parallel processing, or both to process two symbols at a time. Minimum area is one of the design goals of VLSI technology. Hence, reducing the area by minimizing the memory requirement of the probability estimation table, without affecting the performance of the MQ coder, is also identified as a useful approach. In future research, efficient MQ coder architectures with reduced chip area and power consumption can be designed by minimizing the memory requirement.
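As a rough illustration of why two-symbol processing matters for the throughput figures compared above, the following sketch computes an idealized throughput from clock frequency and symbols consumed per cycle. It assumes a stall-free pipeline, which real designs do not achieve; the function names and the 100 MHz example values are hypothetical and are not taken from any of the reviewed papers.

```python
def ideal_throughput_msymbols(clock_mhz: float, symbols_per_cycle: float) -> float:
    """Idealized throughput in Msymbols/s, assuming the coder never stalls."""
    if clock_mhz <= 0 or symbols_per_cycle <= 0:
        raise ValueError("clock frequency and symbols per cycle must be positive")
    return clock_mhz * symbols_per_cycle


def ideal_speedup(base_clock_mhz: float, base_symbols_per_cycle: float,
                  new_clock_mhz: float, new_symbols_per_cycle: float) -> float:
    """Ratio of idealized throughputs between two designs."""
    base = ideal_throughput_msymbols(base_clock_mhz, base_symbols_per_cycle)
    new = ideal_throughput_msymbols(new_clock_mhz, new_symbols_per_cycle)
    return new / base


if __name__ == "__main__":
    # Hypothetical designs at the same clock: one symbol vs. two symbols per cycle.
    single = ideal_throughput_msymbols(100.0, 1)
    dual = ideal_throughput_msymbols(100.0, 2)
    print(f"single-symbol: {single} Msymbols/s, two-symbol: {dual} Msymbols/s")
    print(f"ideal speedup: {ideal_speedup(100.0, 1, 100.0, 2)}x")
```

In practice, a two-symbol design often runs at a lower clock frequency than a single-symbol design because of its longer critical path, so the measured gain is usually smaller than this idealized ratio.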

## REFERENCES

- [1] Tinku Acharya & Ping-Sing Tsai, JPEG2000 Standard for Image Compression: Concepts, Algorithms and VLSI Architectures, John Wiley & Sons, Inc., Publication, 2005.
- [2] ISO/IEC JTC 1/SC 29/WG 1, (ITU-T SG8) Coding of Still Pictures, JBIG (Joint Bilevel Image Experts Group), JBIG Committee, 16 July 1999.
- [3] ISO/IEC JTC1/SC29/WG1 (ITU-T SG 16) The JPEG-2000 Still Image Compression Standard, document JPEG 2000 Part 1 020719 (final publication draft) (2002)
- [4] Taubman, D.S., Marcellin, M.W, "JPEG2000 image compression fundamentals, standards, and practice" (2002)
- [5] Rabbani, M., Joshi, R. "An overview of the JPEG 2000 still image compression standard". Signal Process Image Commun 17(1), 3–48 (2002)
- [6] Gaetano Impoco, "JPEG2000: A Short Tutorial".
- [7] Kishore Andra, Chaitali Chakraborti, and Tinku Acharya, "High performance JPEG 2000 Architecture", IEEE conf. 2002.
- [8] Yu-Wei Chang, Hung-Chi Fang, and Liang-Gee Chen, "High Performance Two-Symbol Arithmetic Encoder in JPEG 2000", *IEEE conference proceedings*,2004.
- [9] Liu Kai, Wu Chengke, Li Yunsong, "A High-Performance VLSI Architecture Of EBCOT Block Coding In JPEG2000", Journal Of Electronics (China), Vol.23 No.1 January 2006
- [10] K. Varma, A. E. Bell, H. B. Damecharla, J. E. Carletta, "A Fast JPEG2000 EBCOT Tier-1 Architecture That Preserves Coding Efficiency", *IEEE conference proceedings*, 2006.
- [11] M. Dyer, D. Taubman, S. Nooshabadi, and A. Kumar Gupta, "Concurrency techniques for arithmetic coding in JPEG2000", IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, no. 6, pp. 1203–1213, Jun. 2006.
- [12] Yijun Li and Magdy Bayoumi, "A Three-Level Parallel High-Speed Low-Power Architecture for EBCOT of JPEG 2000", *IEEE Transactions on Circuits and Systems for Video Technology*, vol. 16, no. 9, pp. 1153-1163, September 2006.
- [13] Yi-Zhen Zhang, Chao Xu, Wen-Tao Wang, and Liang-Bin Chen, "Performance Analysis and Architecture Design for Parallel EBCOT Encoder of JPEG2000", *IEEE Transactions on Circuits and Systems for Video Technology*, vol. 17, no. 10, pp. 1336-1347, October 2007.
- [14] K. Liu, Y. Zhou, Y. Song Li, J.F. Ma, "A high performance MQ encoder architecture in JPEG2000", INTEGRATION, the VLSI journal 43 (3) (2010) 305–317.
- [15] David J. Lucking, Eric J. Balster, Kerry L. Hill, Frank A. Scarpino, "FPGA implementation of the JPEG2000 MQ decoder", Master's thesis, University of Dayton, May 2010.

- [16] N. Nandini Ramesh Kumar, Wei Xiang, Yafeng Wang, "An FPGA-Based Fast Two-Symbol Processing Architecture For JPEG 2000 Arithmetic Coding", *IEEE conf.* 2010.
- [17] M. Rhu and I.-C. Park, "Optimization of arithmetic coding for JPEG2000," *IEEE Transactions on Circuits and Systems for Video Technology*, vol. 20, pp. 446-451, 2010.
- [18] K. Sarawadekar and S. Banerjee, "An Efficient Pass-Parallel Architecture for Embedded Block Coder in JPEG 2000," *IEEE Transactions on Circuits and Systems for Video Technology*, vol. 21, pp. 825-836, 2011.
- [19] K. Sarawadekar and S. Banerjee, "VLSI design of memory-efficient, high-speed baseline MQ coder for JPEG 2000", VLSI journal INTEGRATION, vol. 45, pp. 1-8, 2012.
- [20] Omkar C. Kulkarni, Kishor Sarawadekar, Swapna Banerjee, "VLSI Implementation of MQ Decoder in JPEG2000", in *Proceeding of the 2011 IEEE Students' Technology Symposium*, pp. 193–197, January 2011.
- [21] David J. Lucking, Eric J. Balster, Kerry L. Hill, Frank A. Scarpino, "FPGA implementation of the JPEG2000 binary arithmetic MQ decoder", J. Real Time Image Process. Syst. (July) (2011). Springer.
- [22] Jiri Matela, Martin Srom, and Petr Holub, "Low GPU Occupancy Approach to Fast Arithmetic Coding in JPEG2000", Conference paper, January 2011. at: http://www.researchgate.net/publication/221274103
- [23] Mahesh Krishnappa, "Parallel architectural design space exploration for real-time image compression", Master's thesis, University of Stuttgart, May 2011.
- [24] M. Ahmadvand and A. Ezhdehakosh, "A new pipelined architecture for JPEG2000 MQ-coder," in Proceedings of the World Congress on Engineering and Computer Science, 2012, pp. 24-26.
- [25] Nandini Ramesh Kumar, Wei Xiang, Yafeng Wang, "Two-symbol FPGA architecture for fast arithmetic encoding in JPEG 2000", J. Signal Process. Syst. 69 (2012) 213–224. Springer.
- [26] Jie Guo, Yunsong Li, Kai Liu, Jie Lei, Chengke Wu, "Efficient VLSI Architecture of JPEG2000 Encoder", 6th International Congress on Image and Signal Processing (CISP 2013)
- [27] D. J. Lucking, E. J. Balster, K. L. Hill, and F. A. Scarpino, "FPGA implementation of the JPEG2000 binary arithmetic (MQ) decoder," *Journal of real-time image processing*, vol. 8, pp. 411-419, 2013.
- [28] T. Saidani, M. Atri, L. Khriji, and R. Tourki, "An efficient hardware implementation of parallel EBCOT algorithm for JPEG 2000", *Journal of Real-Time Image Processing*, pp. 1-12, 2013.
- [29] L. Horrigue, T. Saidani, R. Ghodhbane, and M. Atri, "A high performance MQ decoder architecture in JPEG2000," in *World Congress on Computer Applications and Information Systems* (WCCAIS), 2014, pp. 1-5.
- [30] L. Horrigue, T. Saidani, R. Ghodhbani, J. Dubois, J. Miteran, and M. Atri, "An efficient hardware implementation of MQ decoder of the JPEG2000", *Microprocessors and Microsystems*, vol. 38, pp. 659-668, 2014.