# Design of FFT Processor using Modified Modulo $\mathbf{2 n}^{\text {n }}$ +1 Adder 

Fathima Nishah $P^{1}$ and Ruksana Maitheen ${ }^{2}$<br>${ }^{1}$ Applied Electronics, Ilahia college of engineering and Technology, Ernakulam, Kerala, India<br>${ }^{2}$ Electronics and Communication, Ilahia college of engineering and Technology, Ernakulam, Kerala, India

Copyright © 2014 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT: In this paper we present two different architectures for modulo $2^{n}+1$ adder and by using this an efficient FFT computation is performed. One of the architecture is based on a sparse carry computation unit in which only some of the carries are computed. In this an inverted circular idempotency property of the parallel prefix carry operator is used and its efficiency is increased by a new prefix operator. The resulting adders will be having less area and power. The second architecture is derived by modifying modulo $2^{n}-1$ adders with minor hardware overhead. By using this adder we can implement FFT processor with improved performance.

Keywords: Parallel prefix carry computation, Modulo addition, Diminished-1 addition, inverted circular idempotency, IEAC adder.

## 1 INTRODUCTION

Very Large Scale Integration (VLSI) has made a dramatic impact on the growth of integrated circuit technology. The positive improvements have resulted in significant performance/cost advantages in VLSI systems. As we know, to human decimal numbers are easy to implement for performing arithmetic operations. Binary adders are one of the most essential logic elements in a digital system. Therefore, binary addition is essential and any improvement in binary addition can result in improved performance of the system. The major problem for binary addition is the carry chain. As the width of the input operand increases, the length of the carry chain increases. In this paper two architectures for modulo addition is designed and is verified using Xilinx. The main goal is to improve the performance of the system in terms of area, speed, power etc. Using this a modified FFT processor is also designed using the above modified modulo adders.

The concept of the modulo $2^{n}+1$ adder is based on an inverted end around carry(IEAC) $n$ - bit adder which is an adder that accepts two $n$-bit operands and provides a sum increased by one compared to their integer sum if their integer addition does not result in a carry output. Since the carry output depends on the carry input, a direct connection between input and output forms a combinational loop which leads to an unwanted race condition .To avoid this Zimmermann [2],[3] proposed IEAC adders that make use of a parallel-prefix carry computation unit along with an extra prefix level that handles the inverted end-around carry.

In [4] it is explained that the recirculation of the inverted end around carry can be performed within the existing prefix levels, that is, in parallel with the carries' computation. In this way, the need of the extra prefix level is canceled and parallelprefix IEAC adders are derived that can operate fast with a logic depth of $\log _{2} \mathrm{~N}$ prefix levels. Since this requires more area than [2], [3] a double parallel-prefix computation tree is required in several levels of the carry computation unit. Select-prefix and circular carry select IEAC adders proposed in [5], [6] has less area but only less operating speed.

A fast Fourier transform (FFT) is an algorithm to compute the discrete Fourier transform (DFT) and it's inverse. Fourier analysis converts time (or space) to frequency and vice versa, FFT rapidly computes such transformations by factorizing the DFT matrix. As a result, fast Fourier transforms are widely used for many applications in engineering, science, and mathematics. Here an efficient FFT algorithm is also implemented.

## 2 Parallel-Prefix Adders

Generally parallel-prefix $n$-bit adder considered as a three stage circuit. They are pre-processing-stage, carry-computation-unit and post-processing-stage. Suppose that $A=A_{n-1} . A_{n-2} \ldots \ldots A_{0}$ and $B=B_{n-1} \cdot B_{n-2} \quad \ldots \ldots . . B_{0}$ represent the two numbers to be added and $S=S_{n-1} S_{n-2} \ldots S_{0}$ denotes their sum. The preprocessing stage computes the carry-generate bits $G_{i}$, the carry-propagate bits Pi , and the half-sum bits $\mathrm{H}_{\mathrm{i}}$, for every $\mathrm{i} ; 0<=\mathrm{i}<=\mathrm{n}-1$, according to

$$
G_{i}=A_{i} \cdot B_{i}: P_{i}=A_{i}+B_{i}: H_{i}=A_{i} \text { xor } B_{i}
$$

Where . , + , and xor denote logical AND, OR, and exclusive OR, respectively. The second stage of the adder called the carry computation unit, computes the carry signals $C_{i}$, for $0<=i<=n-1$ using the carry generate and carry propagate bits $G_{i}$ and $P_{i}$. The third stage computes the sum bits according to


Fig. 1. Parallel prefix addition basics

### 2.1 Pre-Processing Stage

The preprocessing stage computes the carry-generate bits $\mathrm{G}_{\mathrm{i}}$, the carry-propagate bits Pi , and the half-sum bits $\mathrm{H}_{\mathrm{i}}$, for every $\mathrm{i} ; 0<=\mathrm{i}<=\mathrm{n}-1$, according to

$$
\mathrm{G}_{\mathrm{i}}=\mathrm{A}_{\mathrm{i}} \cdot \mathrm{~B}_{\mathrm{i}}: \mathrm{P}_{\mathrm{i}}=\mathrm{A}_{\mathrm{i}}+\mathrm{B}_{\mathrm{i}}: \mathrm{H}_{\mathrm{i}}=\mathrm{A}_{\mathrm{i}} \text { xor } \mathrm{B}_{\mathrm{i}}
$$



Fig.2. Pre processing stage

### 2.2 Carry Computation Unit

The second stage of the adder called the carry computation unit, computes the carry signals $\mathrm{C}_{\mathrm{i}}$, for $0<=\mathrm{i}<=\mathrm{n}-1$ using the carry generate and carry propagate bits $\mathrm{G}_{\mathrm{i}}$ and $\mathrm{P}_{\mathrm{i}}$. The third stage computes the sum bits according to

$$
\mathrm{S}_{\mathrm{i}}=\mathrm{H}_{\mathrm{i}} \operatorname{xor} \mathrm{C}_{\mathrm{i}-1}
$$

Carry computation is done by using an operator called parallel prefix operator i.e, 'dot' operator, which associates pairs of generate and propagate signals and is defined as

$$
(G, P) o\left(G^{\prime}, P^{\prime}\right)=\left(G+P . G^{\prime}, P . P^{\prime}\right) .
$$

In a series of associations of consecutive generate/propagate pairs $(G, P)$, the notation ( $\left.G_{k: J} ; P_{k: J}\right)$, with $k>J$, is used to denote the group generate/propagate term produced out of bits $k ; k-1 ; \ldots$; .


Fig.3. Carry computation unit

### 2.3 Post Processing Stage

The third stage computes the sum bits according to

$$
\mathrm{S}_{\mathrm{i}}=\mathrm{H}_{\mathrm{i}} \operatorname{xor} \mathrm{C}_{\mathrm{i}-1}
$$



Fig.4. Post processing stage
Based on these concepts three types of parallel prefix adders are designed.


Fig.5. Examples of 8-bit parallel-prefix adders. (a) Kogge-Stone [7], (b) Ladner-Fischer [8] and (c) Knowles [9] family of adders

## 3 Modulo $\mathbf{2}^{\mathrm{N}} \pm 1$ Adders

### 3.1 Modulo $2^{\text {N }} \mathbf{- 1}$ Adders

The modulo $2^{n}-1$ addition is defined as

$$
\begin{align*}
&(A+B) \bmod \left(2^{n}-1\right)=\{(A+B),(A+B)<2^{n} \\
&(A+B+1) \bmod \left(2^{n}\right),  \tag{1}\\
&(A+B) \geq 2^{n}
\end{align*}
$$

A modulo $2^{n}-1$ adder can be implemented using an integer adder that increments also its sum when the carry output is one. (when $A+B \geq 2^{n}$ ). The conditional increment can be implemented by an additional carry increment stage as shown in fig. 6 . In this case, one extra level of ' $\bullet$ ' cells driven by the carry output of the adder, is required. When $A+B=2^{n}+1$, the adder may produce an all 1 s output vector, in place of zero. In most applications, this is the second representation for zero.


Fig .6. Parallel prefix modulo $2^{8}-1$ adder
The implementation of a modulo $2^{n}-1$ adder requires the connection of the carry output $C_{n-1}=G_{n-1: 0}$ of an integer adder to its carry-input port. The carries of the modulo $2^{n}-1$ adder is given by $C_{i}^{-}=G_{i: 0}+P_{i: 0}$. $\mathrm{C}_{\mathrm{in}}$. Therefore, connecting the carry output to the carry input leads to $\mathrm{C}_{\mathrm{i}}^{-}=\mathrm{G}_{\mathrm{i}: 0}+\mathrm{P}_{\mathrm{i}: 0} . \mathrm{G}_{\mathrm{n}-1: 0}$. This relation can be simplified to

$$
\begin{equation*}
\mathrm{C}_{\mathrm{i}}^{-}=\mathrm{G}_{\mathrm{i}: 0}+\mathrm{P}_{\mathrm{i}: 0} \cdot \mathrm{G}_{\mathrm{n}-1: i+1} \tag{2}
\end{equation*}
$$

The simpler equation can be equivalently expressed using the ${ }^{\circ}$ operator as follows

$$
\begin{equation*}
C_{i}^{-}=\left(G_{i}, P_{i}\right) \circ \ldots\left(G_{0}, P_{0}\right) \circ\left(G_{n-1}, P_{n-1}\right) \circ \ldots \circ\left(G_{i+1}, P_{i+1}\right) \tag{3}
\end{equation*}
$$

The above equation (3) that computes the modulo $2^{n}-1$ carries has a cyclic form and the number of generate and propagate pairs $\left(\mathrm{G}_{\mathrm{i}}, \mathrm{P}_{\mathrm{i}}\right)$ associated for each carry is n . This means that the parallel-prefix carry computation unit of a modulo $2^{n}-1$ adder has significantly increased area complexity than that of a corresponding integer adder. In terms of delay, the carries C - can be computed in $\log _{2} \mathrm{n}$ levels using regular parallel-prefix structures using end around technique. The final sum bits
$\mathrm{S}_{\mathrm{i}}^{-}$are equal to Hi xor $\mathrm{C}_{\mathrm{i}-1}{ }^{-}$.

### 3.2 Modulo $2^{\text {N }}+1$ Adders

- First method


Fig .7. Parallel prefix modulo $2^{8}+1$ adders
First method is obtained by giving slight modification to modulo $2^{n}-1$ adders. Similar to modulo $2^{n}-1$ case, the carry $\mathrm{C}_{\mathrm{i}}{ }^{+}$at the $\mathrm{i}^{\text {th }}$ bit position of an IEAC adder, when feeding the carry input $\mathrm{C}_{\text {in }}=\mathrm{C}_{-1}{ }^{+}$with the inverted carry out $\overline{\mathrm{Cn}-1}=\overline{\mathrm{Gn}-1: 0}$ can be computed as

$$
\mathrm{C}_{\mathrm{i}}^{+}=\mathrm{G}_{\mathrm{i}: 0}+\mathrm{P}_{\mathrm{i}: 0} . \overline{\mathrm{Gn}-1: l+1}
$$

Which is expressed as

$$
C_{i}^{+}=\left(G_{i}, P_{i}\right) \circ \ldots\left(G_{0}, P_{0}\right) \circ \overline{(G n-1, P n-1) o \ldots o\left(G l+1, P_{l}+1\right)}
$$

where $(\overline{g, p})$ is equal to $\overline{(g}, p)$, and the final sum bits are equal to $\mathrm{H}_{\mathrm{i}}$ xor $\mathrm{C}_{\mathrm{i}-1}{ }^{+}$.

- Second method

In this section, we focus on the design of diminished-1 modulo adders with a sparse parallel-prefix carry computation stage [1] that can use the same carry-select blocks as the sparse integer adders. The sum of a diminished-1 modulo adder is derived according to the following cases:

1. When none of the input operands is zero their number parts $A^{*}$ and $B^{*}$ are added modulo $2^{n}+1$.
2. When one of the two inputs is zero the result is equal to the nonzero operand.
3. When both operands are zero, the result is zero.

So a true modulo addition in a diminished-1 adder is needed only in case 1 , while in the other cases the sum is known in advance. The result is given as

$$
\begin{aligned}
S^{+}=\left(A^{*}+B^{*}\right) \bmod \left(2^{n}+1\right)= & \left\{\left(A^{*}+B^{*}+1\right) \bmod 2^{n}, A^{*}+B^{*}<2^{n}\right. \\
& \left(A^{*}+B^{*}\right) \bmod 2^{n},
\end{aligned} A^{*}+B^{*} \geq 2^{n} .
$$

Based on [1] different architectures for modulo $2^{16}+1$ adders are designed .According to the inverted circular idempotency property

$$
\begin{align*}
\left(\mathrm{G}_{\mathrm{i}: 0}, \mathrm{P}_{\mathrm{i}: 0}\right) & \circ \overline{(\mathrm{Gn}-1: l+1, \mathrm{Pn}-1: l+1) \mathrm{o}(\mathrm{G} l: 0, \mathrm{P} l: 0)} \\
& =\left(\mathrm{G}_{\mathrm{i}: 0}, \mathrm{P}_{\mathrm{i}: 0}\right) \circ \overline{(\mathrm{Gn}-1: l+1, \mathrm{Pn}-1: l+1)} \tag{4}
\end{align*}
$$

With $\overline{(G, P)})=(\bar{G}, P)$.
Figure (8) shows sparse-4 modulo $2^{16}+1$ adders (a)using doubled up operators and (b) using the sparse approach which is enabled by the inverted circular idem potency property. In first case the computation can be performed within $\log _{2} \mathrm{~N} \operatorname{logic}$ levels. Here some prefix operators are doubled up, since two carry computations need to be performed in parallel; one on normal propagate and generate signals, while the other on their complements. Although the sparse version of the parallelprefix adders has a lot of regularity and the area-overhead problem there is still a lot of space for improvement.


Fig.8. Modulo $2^{16}+1$ diminished-1 adders (a) existing and (b) using a sparse carry computation unit
To avoid this problem a new prefix operator called gray operator is introduced. Gray operator accepts five inputs and produces four outputs. Three of the inputs of a gray operator residing at prefix level $\mathrm{j}-1$, namely, $\mathrm{G}^{\mathrm{j}-1} \mathrm{v}, \mathrm{P}^{\mathrm{j}-1} \mathrm{v}$ and $\mathrm{T}^{\mathrm{j}-1}{ }_{v}$ form the operator's vertical input bus, the rest two $\mathrm{G}^{j-1}{ }_{L}$ and $\mathrm{P}^{j-1}{ }_{L}$ form its lateral input bus. The lateral bus signals are driven inverted to the operator. The gray operator produces three signals for its vertical successor of level $\mathrm{j}^{( } \mathrm{G}^{\mathrm{j}}{ }_{v} ; \mathrm{P}^{\mathrm{j}}{ }_{v}$ and $\left.\mathrm{T}^{\mathrm{j}}{ }_{v}\right)$ and one ( $c_{j}$ ) for its lateral successor. Compared to the 'o' prefix operator, the gray operator requires one extra gate but no extra logic levels.

Based on [1], by using gray operator,

1. Doubled up operators that associate inverted signals can be removed,
2. We can replace the top operator of every column excluding the leftmost that accepts a feedback signal with a gray operator, where $T_{V}$ input is tied to zero
3. Replace every vertical successor of a gray operator in the previous step with a gray one.


Fig.9. Proposed [1] sparse-4 modulo $2^{16}+1$ diminished- 1 adder.


Fig.10. Design of a 4-bit Carry Select Block


Fig.11. Design of a gray operator
By using this proposed sparse-4 modulo Diminished-1 adder an efficient radix-2 FFT algorithm is also implemented. A fast Fourier transform (FFT) is an algorithm to compute the discrete Fourier transform (DFT) and it's inverse. Fourier analysis converts time (or space) to frequency and vice versa; an FFT rapidly computes such transformations by factorizing the DFT matrix into a product of sparse (mostly zero) factors.


Let us consider the computation of the $N=2^{v}$ point DFT. We split the $N$-point data sequence into two $N / 2$-point data sequences $f_{1}(n)$ and $f_{2}(n)$, corresponding to the even-numbered and odd-numbered samples of $x(n)$, that is,

$$
\begin{gathered}
F 1(n)=x(2 n) \\
F 2(n)=x(2 n+1), \quad n=0,1, \ldots . N / 2-1
\end{gathered}
$$

Now the N-point DFT can be expressed in terms of the DFT's of the decimated sequences as follows

$$
\begin{aligned}
X(k) & =\sum_{n=0}^{N-1} x(n) W_{N}^{k n}, \quad k=0,1, \ldots, N-1 \\
& =\sum_{n \text { even }} x(n) W_{N}^{k n}+\sum_{n \text { odd }} x(n) W_{N}^{k n} \\
& =\sum_{m=0}^{(N / 2)-1} x(2 m) W_{N}^{2 m k}+\sum_{m=0}^{(N / 2)-1} x(2 m+1) W_{N}^{k(2 m+1)}
\end{aligned}
$$

Hence the sequence $X(k)$ will be obtained. Here we are considering the 8 -point dft using the modified modulo $2^{n}+1$ adder. The Inputs are given in parallel. The advantage of using this is power and area reduction. But the hardware requirement is high. So in order to reduce this pipelining concept is introduced in which the critical path is reduced by placing delay elements between the registers. It also helps to increase the speed.

## 4 Result Analysis

The simulation is performed using XILINX in verilog HDL. The figure below shows the experimental results after the simulation.

Table 1. Experimental Results for Parallel Prefix Adders

|  | Koggestone <br> adder | Ladner-fischer <br> adder | Knowles <br> adder |
| :--- | :---: | :--- | :--- |
| Delay(ns) | 2.951 | 2.396 | 13.572 |
| Memory <br> Usage(kb) | 316132 | 317092 | 199496 |
| Power(mw) | 0.067 | 0.066 | 0.52 |

Table 2. Experimental Results for modulo $2^{16}+1$ adders

|  | Existing modulo $2^{16}+1$ <br> adder | Modulo $2^{16}+1$ adder using <br> sparse carry computation | Proposed sparse-4 modulo <br> $2^{16}+1$ diminished- -1 adder |
| :--- | :--- | :--- | :--- |
| Delay(ns) | 4.387 | 3.868 | 3.301 |
| Memory Usage(kb) | 318372 | 234916 | 144816 |
| Power(mw) | 0.068 | 0.066 | 0.042 |

### 4.1 RTL SChematic Of 8-Point FFT Processor

Here 8 inputs are given and correspondingly there will be 8 outputs also. Corresponding RTL schematic is given below.

(a)

(b)

Fig.13. RTL schematic


Fig .14. output waveform

## 5 Conclusion

In this paper, two modified power efficient modulo $2^{n}+1$ adders are presented. A novel architecture has been proposed that uses the inverted circular idempotency property of the parallel-prefix carry operator in modulo $2^{n}+1$ addition and by introducing a new prefix operator that eliminates the need for a double computation tree in the earlier fastest proposals. The experimental results indicate that the proposed architecture heavily outperforms the earlier solutions. Also an efficient 8point FFT is designed using the modified modulo adders which has performance advantages in terms of power and area.

## Acknowledgment

I have taken efforts in this project. However, it would not have been possible without the kind support and help of many individuals and organizations. I would like to extend my sincere thanks to all of them.

I am highly indebted to Mrs. Ruksana Maitheen for her guidance and constant supervision as well as for providing necessary information regarding the project \& also for their support in completing the project.

I would like to express my gratitude towards my parents \& project coordinator Mrs. Lekshmi M S (Ilahia college of engineering and Technology) for their kind co-operation and encouragement which help me in completion of this project.

My thanks and appreciations also go to my colleague in developing the project and people who have willingly helped me out with their abilities.

## References

[1] Haridimos T. Vergos and Giorgos Dimitrakopoulos "On Modulo $2^{n}+1$ Adder Design", IEEE transactions on computers, vol. 61, no. 2, February 2012
[2] R. Zimmermann, "Binary Adder Architectures for Cell-Based VLSI and Their Synthesis," PhD dissertation, Swiss Fed. Inst. Of Technology, 1997.
[3] R. Zimmerman, "Efficient VLSI Implementation of Modulo $2^{n} \pm 1$ Addition and Multiplication," Proc. 14th IEEE Symp. Computer Arithmetic, pp. 158-167, Apr. 1999.
[4] H.T. Vergos, C. Efstathiou, and D. Nikolos, "Diminished-One Modulo $2^{n}+1$ Adder Design," IEEE Trans. Computers, vol. 51, no. 12, pp. 1389-1399, Dec. 2002.
[5] C. Efstathiou, H.T. Vergos, and D. Nikolos, "Modulo $2^{n} \pm 1$ Adder Design Using Select Prefix Blocks," IEEE Trans. Computers, vol. 52, no. 11, pp. 1399-1406, Nov. 2003.
[6] S.-H. Lin and M.-H. Sheu, "VLSI Design of Diminished-One Modulo $2^{n}+1$ Adder Using Circular Carry Selection," IEEE Trans. Circuits and Systems II, vol. 55, no. 9, pp. 897-901, Sept. 2008.
[7] P.M. Kogge and H.S. Stone, "A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations," IEEE Trans. Computers, vol. C-22, no. 8, pp. 786-792, Aug. 1973.
[8] R.E. Ladner and M.J. Fischer, "Parallel Prefix Computation," J. ACM, vol. 27, no. 4, pp. 831-838, 1980.
[9] S. Knowles, "A Family of Adders," Proc. 14th IEEE Symp. Computer Arithmetic, pp. 30-34, 1999.
[10] G. DimtraKopulous, D.G. Nikolos, D. Nikolos, H.T. Vergos, and C. Efstathiou,' New Architectures for Modulo $2^{n}-1$ Adders," proc. IEEE Int'1 Conf. Electronics, Circuits, and Systems, 2008.

