## Clock Gating Based Energy Efficient and Thermal Aware Design For Vedic Equation Solver on 28nm and 40nm FPGA

Bishwajeet Pandey<sup>1</sup>, Sujeet Pandey<sup>2</sup>, Shivani Sharma<sup>3</sup>, Kartik Kalia<sup>4</sup>, Khyati Nanda<sup>5</sup> and D M Akbar Hussain<sup>6</sup>

<sup>1-5</sup>Center of Excellence of Green Computing, Gyancity Research Lab, Jammu, India

<sup>6</sup>Department of Energy Technology, Aalborg University, Denmark

gyancity@gyancity.com, welcomesujeet@gmail.com, shalu16sharma@gmail.com, kartikkalia4@gmail.com, khyati729@gmail.com, akh@et.aau.dk

#### Abstract

In this paper, we integrate clock gating into the design of energy efficient equation solver circuits based on Vedic mathematics. Clock gating is one of the best energy efficient techniques. The Sutra 'SunyamSamyasamuccaye' says that if the sum of the numerators and the sum of the denominators are the same, then we can equate that sum to zero and find the value of the unknown variable. In order to test the portability of our design, we operate it at the respective frequencies of different mobile architectures; for example, the operating frequency of the iPhone 6 is 2100MHz. For thermal analysis of our energy efficient design, we have taken four temperatures: the maximum temperatures recorded at Furnace Creek Ranch (329.85K) and Mohenjo-Daro (326.65K), the median temperature of Delhi (313.15K), and standard normal temperature (294.15K). The saving in clock power dissipation is 96.15% for 1400MHz, 94.59% for 1.2GHz, 93.75% for 2100MHz, 94.23% for 1700MHz, 94.54% for 1800MHz, and 94.02% for 2.2GHz when we use a gated clock instead of an ungated one on 40nm FPGA at 329.85K. Power consumption on 28nm FPGA is less than on 40nm FPGA.

Keywords: Arithmetic Circuits, Vedic Mathematics, FPGA, Energy Efficient, VLSI

#### 1. Introduction

For thermal analysis of our energy efficient design, we have taken four temperatures from reference [1]. Furnace Creek Ranch in North America recorded the highest temperature in the world, 56.7°C [1]. Approximately 53.5°C is the maximum temperature recorded in Mohenjo-Daro, situated in Sind, Pakistan [1]. The median temperature of Delhi is 40°C and standard normal temperature is 21°C [1]. In order to test the portability of our design, we operate it at the respective frequencies of different mobile architectures, as shown in Table 1; *e.g.*, the operating frequency of the iPhone 6 is 2100MHz.

Table 1. Set of Frequencies Taken in Consideration

| Frequency | Mobile set          |
|-----------|---------------------|
| 1400MHz   | Nokia Lumia 710     |
| 1.2GHz    | Samsung Galaxy Core |
| 2100MHz   | iPhone6             |
| 1700MHz   | HTC/T               |
| 1800MHz   | Micromax X091       |
| 2.2GHz    | Sony Xperia Z1      |
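The Kelvin values used in the rest of the paper follow from the Celsius readings quoted from [1] by adding 273.15. A minimal Python sketch of that conversion is given here; the dictionary keys and the function name are illustrative and are not part of the original design flow.

```python
# Ambient temperatures quoted in the text, in degrees Celsius.
AMBIENT_CELSIUS = {
    "Furnace Creek Ranch": 56.7,
    "Mohenjo-Daro": 53.5,
    "Delhi (median)": 40.0,
    "Standard normal": 21.0,
}


def celsius_to_kelvin(t_c: float) -> float:
    """Convert Celsius to Kelvin, rounded to two decimals as in the paper."""
    return round(t_c + 273.15, 2)


# Kelvin value for each location, keyed by the same (illustrative) labels.
AMBIENT_KELVIN = {place: celsius_to_kelvin(t) for place, t in AMBIENT_CELSIUS.items()}

if __name__ == "__main__":
    for place, t_k in AMBIENT_KELVIN.items():
        print(f"{place}: {t_k} K")
```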

Latch free clock gating, latch based clock gating, and flip-flop based clock gating are three different clock gating techniques [2]. Our design is based on 28nm FPGA and 40nm FPGA, and the code has been tested on Kintex-7 and Virtex-6 FPGA as shown in Table 2. The clock gating technique has been used to design an energy efficient sequential circuit [3], an 8-bit ALU [4], a 64-bit ALU [5], a global reset ALU [6], and the ITC'99-b01 benchmark circuit [7]. In [8], the architecture of a multiplier based on Vedic mathematics is discussed. In this paper, we integrate clock gating [2-7] into another Vedic formula based factorization circuit. The main motive of this work is to produce an energy efficient design [9-11].

ISSN: 2005-4297 IJCA Copyright © 2016 SERSC

Table 2. Different Parameters in Kintex-7 and Virtex-6 FPGA

| Parameter                   | Kintex-7 | Virtex-6 |
|-----------------------------|----------|----------|
| IO pins                     | 676      | 360      |
| LUT Elements                | 101400   | 74496    |
| Flip Flops                  | 202800   | 148912   |
| DSPs                        | 600      | -        |
| Available IOBs              | 400      | 240      |
| Block RAM                   | 325      | 312      |
| GTXE2 Transceivers          | 8        | 8        |
| MMCMs                       | 8        | 8        |
| Operating Temperature (Min) | 0 °C     | 0 °C     |
| Operating Temperature (Ref) | 85 °C    | 85 °C    |
| Operating Temperature (Max) | 85 °C    | 85 °C    |
| Temperature Grade           | C        | C        |

The Sutra 'SunyamSamyasamuccaye' says that if the sum of the numerators and the sum of the denominators are the same, then we can equate that sum to zero and find the value of the unknown variable. For example, consider:

$$\frac{3x+4}{3x+5} = \frac{3x+5}{3x+4} \tag{1}$$

In this Vedic formula, we add the numerators, $N_1 + N_2 = 3x+4+3x+5 = 6x+9$, and also add the denominators, $D_1 + D_2 = 3x+5+3x+4 = 6x+9 = N_1 + N_2$. From Sunyam Samyasamuccaye, we get $6x+9 = 0$, *i.e.* $x = -3/2$, which is the solution of equation (1).
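Why equating the common sum to zero yields a root can be seen by cross-multiplying. The short derivation below is a sketch added for clarity; the symbol $S$ for the common sum is notation introduced here and does not appear in the original text.

$$\begin{aligned} \frac{N_1}{D_1} = \frac{N_2}{D_2} &\;\Rightarrow\; N_1 D_2 = N_2 D_1, \qquad S = N_1 + N_2 = D_1 + D_2 \\ &\;\Rightarrow\; N_1 (S - D_1) = (S - N_1)\, D_1 \\ &\;\Rightarrow\; S\,(N_1 - D_1) = 0 . \end{aligned}$$

For equation (1), $N_1 - D_1 = (3x+4) - (3x+5) = -1 \neq 0$, so the only possibility is $S = 6x + 9 = 0$, which gives $x = -3/2$.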

Table 3. List of Coefficients

| D11 | D12 | D21 | D22 | N11 | N12 | N21 | N22 |
|-----|-----|-----|-----|-----|-----|-----|-----|
| 3   | 5   | 3   | 4   | 3   | 4   | 3   | 5   |

D11 is the first coefficient of the first denominator and D12 is the second coefficient of the first denominator, as shown in Table 3 and Figure 1.



Figure 1. Top Level Schematic of Vedic Equation Solver

N11 is the first coefficient of the first numerator and N12 is the second coefficient of the first numerator, as shown in Table 3 and Figure 1. Similarly, N21 is the first coefficient of the second numerator and N22 is the second coefficient of the second numerator.
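As a software reference for the coefficient naming above, the following Python sketch applies the sutra to the eight coefficients of Table 3. It is only an illustrative model, not the authors' HDL: the function name `solve_sunyam`, the use of `Fraction`, and the choice to return `None` when the sutra does not apply are all assumptions made here.

```python
from fractions import Fraction


def solve_sunyam(n11, n12, d11, d12, n21, n22, d21, d22):
    """Solve (n11*x + n12)/(d11*x + d12) = (n21*x + n22)/(d21*x + d22)
    when the sum of numerators equals the sum of denominators.

    Returns x as a Fraction, or None if the sutra does not apply.
    """
    # Sum of numerators and sum of denominators as (x coefficient, constant).
    num_sum = (n11 + n21, n12 + n22)
    den_sum = (d11 + d21, d12 + d22)
    if num_sum != den_sum or num_sum[0] == 0:
        return None
    # Equate the common sum a*x + b to zero.
    a, b = num_sum
    x = Fraction(-b, a)
    # Reject a root that makes either denominator vanish.
    if d11 * x + d12 == 0 or d21 * x + d22 == 0:
        return None
    return x


if __name__ == "__main__":
    # Coefficients taken from Table 3.
    x = solve_sunyam(n11=3, n12=4, d11=3, d12=5, n21=3, n22=5, d21=3, d22=4)
    print(x)  # -3/2
```

Exact rational arithmetic is used so that the result can be compared directly with the hand calculation for equation (1).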



Figure 2. RTL Schematic of Vedic Equation Solver

The RTL schematic is stored as a native generic register (NGR) file. Figure 2 shows the RTL schematic of the Vedic Equation Solver, implemented with an adder, a subtractor, a divider, an AND gate, and a D flip-flop.
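The role of the enable input in a gated design can be illustrated with an idealized, cycle-level Python model. It assumes that the gate in Figure 2 combines the clock with `en` so that the flip-flop only sees a clock edge when `en` is high; the paper does not spell out the exact gating structure, and the class and attribute names below are hypothetical. Glitches and timing are ignored.

```python
class GatedDFlipFlop:
    """Cycle-level model of a D flip-flop whose clock passes through an AND gate."""

    def __init__(self):
        self.q = 0            # stored output
        self.clock_edges = 0  # rising edges that actually reach the flip-flop

    def tick(self, d: int, en: int) -> int:
        """Apply one rising edge of the source clock and return the output."""
        clk = 1               # the source clock is high at its rising edge
        gated_clk = clk & en  # AND gate: the edge is passed on only if en is 1
        if gated_clk:
            self.q = d
            self.clock_edges += 1
        return self.q


if __name__ == "__main__":
    ff = GatedDFlipFlop()
    data = [1, 0, 0, 1, 1, 0]
    enable = [1, 0, 1, 0, 1, 0]
    outputs = [ff.tick(d, en) for d, en in zip(data, enable)]
    print(outputs, ff.clock_edges)
```

In this model the flip-flop is clocked on only three of the six source clock edges, which is the kind of switching activity that clock gating is meant to remove.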

## 2. Power Analysis

### 2.1. Power Analysis at 329.85 Kelvin Ambient Temperature

Table 4. Effect of Clock Gating and 28nm FPGA on Power Dissipation

| Frequency | Clock Gated Total Power | Ungated Total Power |
|-----------|-------------------------|---------------------|
| 1400MHz   | 0.469                   | 0.479               |
| 1.2GHz    | 0.415                   | 0.423               |
| 2100MHz   | 0.664                   | 0.679               |
| 1700MHz   | 0.551                   | 0.563               |
| 1800MHz   | 0.580                   | 0.593               |
| 2.2GHz    | 0.690                   | 0.706               |

There is a reduction of 2.08% for 1400MHz, 1.89% for 1.2GHz, 2.2% for 2100MHz, 2.1% for 1700MHz, 2.19% for 1800MHz, and 2.26% for 2.2GHz in total power dissipation when we use a gated clock instead of an ungated one on 28nm FPGA at 329.85K ambient temperature.
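The percentages above can be recomputed from Table 4 as (ungated − gated) / ungated. The Python sketch below does this; the names `TABLE_4` and `percent_reduction` are illustrative. It rounds to two decimals, so the last digit may differ slightly from values in the text that appear to be truncated.

```python
# Table 4: (clock gated total power, ungated total power) on 28nm FPGA at 329.85K.
TABLE_4 = {
    "1400MHz": (0.469, 0.479),
    "1.2GHz": (0.415, 0.423),
    "2100MHz": (0.664, 0.679),
    "1700MHz": (0.551, 0.563),
    "1800MHz": (0.580, 0.593),
    "2.2GHz": (0.690, 0.706),
}


def percent_reduction(gated: float, ungated: float) -> float:
    """Relative saving of the gated design with respect to the ungated one, in percent."""
    return 100.0 * (ungated - gated) / ungated


if __name__ == "__main__":
    for freq, (gated, ungated) in TABLE_4.items():
        print(f"{freq}: {percent_reduction(gated, ungated):.2f}%")
```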



Figure 3. Power Analysis at 329.85K on 28nm FPGA

Table 5. Effect of Clock Gating and 40nm FPGA on Power Dissipation

| Frequency | Ungated Total | Ungated Clock | Gated Total | Gated Clock |
|-----------|---------------|---------------|-------------|-------------|
| 1400MHz   | 2.424         | 0.052         | 2.132       | 0.002       |
| 1.2GHz    | 2.009         | 0.037         | 1.971       | 0.002       |
| 2100MHz   | 2.775         | 0.064         | 2.709       | 0.004       |
| 1700MHz   | 2.428         | 0.052         | 2.374       | 0.003       |
| 1800MHz   | 2.518         | 0.055         | 2.461       | 0.003       |
| 2.2GHz    | 2.856         | 0.067         | 2.786       | 0.004       |

There is a reduction of 12.04% for 1400MHz, 1.89% for 1.2GHz, 2.37% for 2100MHz, 2.22% for 1700MHz, 2.26% for 1800MHz, and 2.45% for 2.2GHz in total power dissipation when we use a gated clock instead of an ungated one on 40nm FPGA at 329.85K ambient temperature.



Figure 4. Total Power Analysis at 329.85K (56.7°C) on 40nm FPGA

There is a reduction of 96.15% for 1400MHz, 94.59% for 1.2GHz, 93.75% for 2100MHz, 94.23% for 1700MHz, 94.54% for 1800MHz, and 94.02% for 2.2GHz in clock power dissipation when we use a gated clock instead of an ungated one on 40nm FPGA at 329.85K.
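In Table 5 the clock component is only about 2% of the ungated total at every frequency, which helps explain why a clock power saving above 93% corresponds to a much smaller reduction in total power. The following Python sketch computes both quantities from Table 5; the helper names are illustrative.

```python
# Table 5 (40nm FPGA, 329.85K):
# (ungated total, ungated clock, gated total, gated clock).
TABLE_5 = {
    "1400MHz": (2.424, 0.052, 2.132, 0.002),
    "1.2GHz": (2.009, 0.037, 1.971, 0.002),
    "2100MHz": (2.775, 0.064, 2.709, 0.004),
    "1700MHz": (2.428, 0.052, 2.374, 0.003),
    "1800MHz": (2.518, 0.055, 2.461, 0.003),
    "2.2GHz": (2.856, 0.067, 2.786, 0.004),
}


def clock_saving_percent(ungated_clock: float, gated_clock: float) -> float:
    """Reduction in clock power due to gating, in percent."""
    return 100.0 * (ungated_clock - gated_clock) / ungated_clock


def clock_share_percent(ungated_total: float, ungated_clock: float) -> float:
    """Share of the ungated total power that is clock power, in percent."""
    return 100.0 * ungated_clock / ungated_total


if __name__ == "__main__":
    for freq, (u_total, u_clock, _g_total, g_clock) in TABLE_5.items():
        saving = clock_saving_percent(u_clock, g_clock)
        share = clock_share_percent(u_total, u_clock)
        print(f"{freq}: clock saving {saving:.2f}%, clock share of total {share:.2f}%")
```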



Figure 5. Clock Power Analysis at 329.85K on 40nm FPGA

### 2.2. Power Analysis at 326.65K Ambient Temperature

Table 6. Effect of Clock Gating and 28nm FPGA on Power Dissipation

| Frequency | Clock Gated Total Power    | Ungated Total Power |
|-----------|----------------------------|---------------------|
| 1400MHz   | 0.446                      | 0.456               |
| 1.2GHz    | 0.392                      | 0.401               |
| 2100MHz   | 0.641                      | 0.656               |
| 1700MHz   | 0.528                      | 0.540               |
| 1800MHz   | 0.557                      | 0.570               |
| 2.2GHz    | 0.667                      | 0.682               |

There is a reduction of 2.19% for 1400MHz, 2.24% for 1.2GHz, 2.28% for 2100MHz, 2.22% for 1700MHz, 2.28% for 1800MHz, and 2.19% for 2.2GHz in total power dissipation when we use a gated clock instead of an ungated one on 28nm FPGA at 326.65K.



Figure 6. Power Analysis at 326.65K on 28nm FPGA

Table 7. Effect of Clock Gating and 40nm FPGA on Power Dissipation

| Frequency | Ungated Total | Ungated Clock | Gated Total | Gated Clock |
|-----------|---------------|---------------|-------------|-------------|
| 1400MHz   | 2.399         | 0.052         | 2.103       | 0.002       |
| 1.2GHz    | 1.981         | 0.037         | 1.943       | 0.002       |
| 2100MHz   | 2.746         | 0.064         | 2.679       | 0.004       |
| 1700MHz   | 2.399         | 0.052         | 2.345       | 0.003       |
| 1800MHz   | 2.489         | 0.055         | 2.432       | 0.003       |
| 2.2GHz    | 2.826         | 0.067         | 2.757       | 0.004       |

There is a reduction of 12.33% for 1400MHz, 1.91% for 1.2GHz, 2.43% for 2100MHz, 2.25% for 1700MHz, 2.29% for 1800MHz, and 2.44% for 2.2GHz in total power dissipation when we use a gated clock instead of an ungated one on 40nm FPGA at 326.65K.



Figure 7. Total Power Analysis at 326.65K on 40nm FPGA

There is a reduction of 96.15% for 1400MHz, 94.59% for 1.2GHz, 93.75% for 2100MHz, 94.23% for 1700MHz, 94.54% for 1800MHz, and 94.02% for 2.2GHz in clock power dissipation when we use a gated clock instead of an ungated one on 40nm FPGA at 326.65K.



Figure 8. Clock Power Analysis at 326.65K (53.5°C) on 40nm FPGA

### 2.3. Power Analysis at 313.15K Ambient Temperature

Table 8. Effect of Clock Gating and 28nm FPGA on Power Dissipation

| Frequency | Clock Gated Total Power | Ungated Total Power |
|-----------|-------------------------|---------------------|
| 1400MHz   | 0.378                   | 0.388               |
| 1.2GHz    | 0.325                   | 0.333               |
| 2100MHz   | 0.572                   | 0.587               |
| 1700MHz   | 0.460                   | 0.472               |
| 1800MHz   | 0.489                   | 0.501               |
| 2.2GHz    | 0.598                   | 0.613               |

There is a reduction of 2.57% for 1400MHz, 2.40% for 1.2GHz, 2.55% for 2100MHz, 2.54% for 1700MHz, 2.39% for 1800MHz, and 2.44% for 2.2GHz in total power dissipation when we use a gated clock instead of an ungated one on 28nm FPGA at 313.15K.



Figure 9. Power Analysis at 313.15K on 28nm FPGA

Table 9. Effect of Clock Gating and 40nm FPGA on Power Dissipation

| Frequency | Ungated Total | Ungated Clock        | Gated Total | Gated Clock |
|-----------|---------------|----------------------|-------------|-------------|
| 1400MHz   | 2.289         | 0.052                | 1.994       | 0.002       |
| 1.2GHz    | 1.872         | 0.037                | 1.835       | 0.002       |
| 2100MHz   | 2.634         | 0.064                | 2.568       | 0.004       |
| 1700MHz   | 2.289         | 0.052                | 2.235       | 0.003       |
| 1800MHz   | 2.379         | 0.055                | 2.322       | 0.003       |
| 2.2GHz    | 2.715         | 0.067                | 2.645       | 0.004       |

There is a reduction of 12.88% for 1400MHz, 1.97% for 1.2GHz, 2.50% for 2100MHz, 2.35% for 1700MHz, 2.39% for 1800MHz, and 2.57% for 2.2GHz in total power dissipation when we use a gated clock instead of an ungated one on 40nm FPGA at 313.15K.



Figure 10. Total Power Analysis at 313.15K on 40nm FPGA

There is a reduction of 96.15% for 1400MHz, 94.59% for 1.2GHz, 93.75% for 2100MHz, 94.23% for 1700MHz, 94.54% for 1800MHz, and 94.02% for 2.2GHz in clock power dissipation when we use a gated clock instead of an ungated one on 40nm FPGA at 313.15K.



Figure 11. Clock Power Analysis at 313.15K on 40nm FPGA

### 2.4. Power Analysis at 294.15K Ambient Temperature

Table 10. Effect of Clock Gating and 28nm FPGA on Power Dissipation

| Frequency | Clock Gated Total Power    | Ungated Total Power |
|-----------|----------------------------|---------------------|
| 1400MHz   | 0.331                      | 0.341               |
| 1.2GHz    | 0.277                      | 0.286               |
| 2100MHz   | 0.524                      | 0.538               |
| 1700MHz   | 0.412                      | 0.424               |
| 1800MHz   | 0.441                      | 0.454               |
| 2.2GHz    | 0.550                      | 0.565               |

There is a reduction of 2.93% for 1400MHz, 3.14% for 1.2GHz, 2.60% for 2100MHz, 2.83% for 1700MHz, 2.86% for 1800MHz, and 2.65% for 2.2GHz in total power dissipation when we use a gated clock instead of an ungated one on 28nm FPGA at 294.15K.



Figure 12. Power Analysis at 294.15K on 28nm FPGA

Table 11. Effect of Clock Gating and 40nm FPGA on Power Dissipation

| Frequency | Ungated Total | Ungated Clock | Gated Total | Gated Clock |
|-----------|---------------|---------------|-------------|-------------|
| 1400MHz   | 2.165         | 0.052         | 1.871       | 0.002       |
| 1.2GHz    | 1.750         | 0.037         | 1.712       | 0.002       |
| 2100MHz   | 2.509         | 0.064         | 2.443       | 0.004       |
| 1700MHz   | 2.165         | 0.052         | 2.111       | 0.003       |
| 1800MHz   | 2.254         | 0.055         | 2.197       | 0.003       |
| 2.2GHz    | 2.589         | 0.067         | 2.520       | 0.004       |

There is a reduction of 13.57% for 1400MHz, 2.17% for 1.2GHz, 2.63% for 2100MHz, 2.49% for 1700MHz, 2.52% for 1800MHz, and 2.66% for 2.2GHz in total power dissipation when we use a gated clock instead of an ungated one on 40nm FPGA at 294.15K.



Figure 13. Total Power Analysis at 294.15K on 40nm FPGA

There is a reduction of 96.15% for 1400MHz, 94.59% for 1.2GHz, 93.75% for 2100MHz, 94.23% for 1700MHz, 94.54% for 1800MHz, and 94.02% for 2.2GHz in clock power dissipation when we use a gated clock instead of an ungated one on 40nm FPGA at 294.15K.



Figure 14. Clock Power Analysis at 294.15K on 40nm FPGA

### 2.5. Power Analysis For Different Frequencies and Different Temperature on 28nm FPGA

Table 12. Variation in Power Dissipation with Frequency & Thermal Scaling

| Frequency | 329.85K | 326.65K | 313.15K | 294.15K |
|-----------|---------|---------|---------|---------|
| 1400MHz   | 0.469   | 0.446   | 0.378   | 0.331   |
| 1.2GHz    | 0.415   | 0.392   | 0.325   | 0.277   |
| 2100MHz   | 0.664   | 0.641   | 0.572   | 0.524   |
| 1700MHz   | 0.551   | 0.528   | 0.460   | 0.412   |
| 1800MHz   | 0.580   | 0.557   | 0.489   | 0.441   |
| 2.2GHz    | 0.690   | 0.667   | 0.598   | 0.550   |

From Table 12 and Figure 15, we can state that the maximum power consumption is at 2.2GHz and the minimum power consumption is at 1.2GHz. In terms of temperature, maximum power is consumed at 329.85K and minimum power is consumed at 294.15K.
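A further observation that can be read off Table 12 is that the rise in power between the coolest and the hottest ambient temperature is nearly the same at every frequency. The Python sketch below computes that difference per row; the names are illustrative and the values are in whatever unit the table reports.

```python
# Table 12: clock gated total power on 28nm FPGA, one column per ambient temperature.
TEMPERATURES_K = (329.85, 326.65, 313.15, 294.15)
TABLE_12 = {
    "1400MHz": (0.469, 0.446, 0.378, 0.331),
    "1.2GHz": (0.415, 0.392, 0.325, 0.277),
    "2100MHz": (0.664, 0.641, 0.572, 0.524),
    "1700MHz": (0.551, 0.528, 0.460, 0.412),
    "1800MHz": (0.580, 0.557, 0.489, 0.441),
    "2.2GHz": (0.690, 0.667, 0.598, 0.550),
}


def thermal_increase(row) -> float:
    """Power at the hottest ambient (first column) minus power at the coolest (last)."""
    return round(row[0] - row[-1], 3)


# Increase from 294.15K to 329.85K for each frequency.
INCREASES = {freq: thermal_increase(row) for freq, row in TABLE_12.items()}

if __name__ == "__main__":
    for freq, delta in INCREASES.items():
        print(f"{freq}: +{delta} from {TEMPERATURES_K[-1]}K to {TEMPERATURES_K[0]}K")
```

The differences stay within a few thousandths of each other across the six frequencies, which suggests that the temperature-related part of the power in this data set is largely independent of clock rate.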



Figure 15. Power Dissipation with Frequency & Temperature Variation

### 2.6. Power Analysis For Different Frequencies & Different Temperature on 40nm FPGA

Table 13. Variation in Power Dissipation with Frequency & Thermal Scaling

| Frequency | 329.85K | 326.65K | 313.15K | 294.15K |
|-----------|---------|---------|---------|---------|
| 1400MHz   | 0.479   | 0.456   | 0.388   | 0.341   |
| 1.2GHz    | 0.423   | 0.401   | 0.333   | 0.286   |
| 2100MHz   | 0.679   | 0.656   | 0.587   | 0.538   |
| 1700MHz   | 0.563   | 0.540   | 0.472   | 0.424   |
| 1800MHz   | 0.593   | 0.570   | 0.501   | 0.454   |
| 2.2GHz    | 0.706   | 0.682   | 0.613   | 0.565   |

From Table 13 and Figure 16, we can state that the maximum power consumption is at 2.2GHz and the minimum power consumption is at 1.2GHz. In terms of temperature, maximum power is consumed at 329.85K (56.7°C) and minimum power is consumed at 294.15K (21°C).



Figure 16. Power Dissipation with Frequency & Temperature Variation

### 2.7. Comparison between Kintex-7 and Virtex-6 FPGA

Table 14. Total Power Analysis for 40nm and 28nm FPGA at 329.85K

| Frequency | 28nm  | 40nm  |
|-----------|-------|-------|
| 1400MHz   | 0.469 | 0.479 |
| 1.2GHz    | 0.415 | 0.423 |
| 2100MHz   | 0.664 | 0.679 |
| 1700MHz   | 0.551 | 0.563 |
| 1800MHz   | 0.580 | 0.593 |
| 2.2GHz    | 0.690 | 0.706 |



Figure 17. Total Power for Different Frequency and FPGA

Table 15. Total Power Analysis for 40nm and 28nm FPGA at 294.15K

| Frequency | 28nm  | 40nm  |
|-----------|-------|-------|
| 1400MHz   | 0.331 | 0.341 |
| 1.2GHz    | 0.277 | 0.286 |
| 2100MHz   | 0.524 | 0.538 |
| 1700MHz   | 0.412 | 0.424 |
| 1800MHz   | 0.441 | 0.454 |
| 2.2GHz    | 0.550 | 0.565 |



Figure 18. Total Power for Different Frequency and FPGA at 294.15K

From Tables 14-15 and Figures 17-18, we can state that the power consumption is higher in the case of 40nm (Virtex-6) and lower in the case of 28nm (Kintex-7).

#### 3. Conclusion

This design is low power and energy efficient. The code has been implemented in Xilinx ISE Design Suite 14.2, and the results were tested on a 28nm FPGA platform, Kintex-7, and on a 40nm FPGA platform, Virtex-6. We have designed a device using a Vedic mathematics technique, known as SunyamSamyaSamuccaye, which is used to find the factors of a given equation. This device consists of 10 inputs and 1 output. Eight inputs are the coefficients of the numerators and the denominators, and the other two inputs are clock and en (enable). Enable is used for the gated analysis. In this paper, we have done power analysis on two FPGA families, Kintex-7 and Virtex-6. The design is tested by varying the frequency at different temperatures. The chosen temperatures are 56.7, 53.5, 40, and 21 degrees Celsius. The technique used is clock gating, and a comparison has been made between gated and ungated power. From all the analysis, we can conclude for both 40nm and 28nm that the maximum power consumption is at 2.2GHz and the minimum power consumption is at 1.2GHz. In terms of temperature, maximum power is consumed at 56.7 degrees Celsius and minimum power is consumed at 21 degrees Celsius. We can also state that the power consumption is higher in the case of 40nm (Virtex-6) and lower in the case of 28nm (Kintex-7).

## 4. Future Scope

The future scope of this clock gating based energy efficient design for factorization using Sunyam Samya Samuccaye on 28nm and 40nm FPGA is that we can also implement the design on 22nm or 18nm FPGA. We can also use different FPGA families such as automotive Artix7, automotive Coolrunner2, automotive Spartan, automotive Spartan-3A DSP, automotive Spartan 3A, automotive Spartan 3E, automotive Spartan6, Spartan3, and Spartan3E. Here, we are using clock gating techniques. We can redesign this circuit with other energy efficient techniques such as capacitance scaling, thermal scaling, frequency scaling, impedance matching with different logic families, and mapping.

#### References

- [1] T. Kumar, T. Das, B. Pandey and D. M. A. Hussain, "IO Standard Based Thermal/Energy Efficient Green Communication For Wi-Fi Protected Access on FPGA", 6th International Congress on Ultra-Modern Telecommunications and Control Systems and Workshops, St. Petersburg, Russia, (2014).
- [2] B. Pandey and M. Pattanaik, "Clock Gating Aware Low Power ALU Design and Implementation on FPGA", International Journal of Future Computer and Communication (IJFCC), vol. 2, no. 5, (2013), pp. 461-465, ISSN: 2010-3751.
- [3] M. P. Dev, D. Baghel, B. Pandey, M. Pattanaik and A. Shukla, "Clock Gated Low Power Sequential Circuit Design", IEEE Conf. on Information and Communication Technologies (ICT), (2013).
- [4] B. Pandey, J. Yadav, N. Rajoria and M. Pattanaik, "Clock Gating Based Energy Efficient ALU Design and Implementation on FPGA", IEEE International Conference on Energy Efficient Technologies for Sustainability (ICEETs), (2013), pp. 93-97.
- [5] T. Kumar, B. Pandey, T. Das and S. M. M. Islam, "64 Bit Green ALU Design Using Clock Gating Technique on Ultra Scale FPGA", IEEE International Conference on Green Computing, Communication and Conservation of Energy (ICGCE), (2013).
- [6] B. Pandey, J. Yadav, J. Kumar and R. Kumar, "Clock Gating Aware Low Power Global Reset ALU and Implemented On 28nm FPGA", IEEE Intl Conf on Computational Intelligence and Communication Networks (CICN), Mathura, (2013).
- [7] V. P. Singh, V. S. Chaurasia, B. Pandey and J. Yadav, "Power Reduction of ITC'99-b01 Benchmark Circuit Using Clock Gating Techniques", IEEE International Conference on Computational Intelligence and Communication Networks (CICN), Mathura, (2013).
- [8] K. Goswami and B. Pandey, "LVCMOS Based Thermal Aware Energy Efficient Vedic Multiplier Design on FPGA", IEEE 6th International Conference on Computational Intelligence and Communication Networks (CICN), Udaipur, (2014).
- [9] T. Kumar, "CTHS Based Energy Efficient Thermal Aware Image ALU Design on FPGA", Springer Wireless Personal Communications, An International Journal, ISSN: 0929-6212 (print), ISSN: 1572-834X (electronic), SCI Indexed, vol. 83, no. 1, (2015).
- [10] S. H. A. Musavi, B. S. Chowdhry, T. Kumar, B. Pandey and W. Kumar, "IoTs Enable Active Contour Modeling Based Energy Efficient and Thermal Aware Object Tracking on FPGA", Springer Wireless Personal Communications, vol. 85, no. 2, (2015), pp. 529-543, ISSN: 1572-834X.
- [11] T. Kumar, B. Pandey, T. Das and B. S. Chowdhry, "Mobile DDR IO Standard Based High Performance Energy Efficient Portable ALU Design on FPGA", Springer Wireless Personal Communications, An International Journal, vol. 76, no. 3, (2014), pp. 569-578.

International Journal of Control and Automation Vol. 9, No. 7 (2016)