# The Design of FPGA-based Digital Image Processing System and Research on Algorithms

Rui Lu<sup>1</sup>, Xiaohui Liu<sup>1</sup>, Xiaodan Wang<sup>2</sup> \*, Jin Pan<sup>1</sup>, Kuangyi Sun<sup>1</sup> and Hellen Waynes<sup>3</sup>

<sup>1</sup>National Computer Network Emergency Response Technical Team/Coordination Center of China <sup>2</sup>Yonyou Network Co. Ltd <sup>3</sup>State University of New York at Cortland, NY, USA (Corresponding Author: Xiaodan Wang)

### Abstract

Image preprocessing systems are developing towards high speed, high resolution, high integration and high reliability, and they are widely applied in both military and commercial fields. In the military field, object detection based on image processing technology has become an important research topic because of its interference immunity, non-contact operation and concealment. In the commercial field, image processing is widely used in machine vision and industrial detection systems. In general, three kinds of systems are used to implement digital image processing algorithms, and their main chips are the ASIC (Application Specific Integrated Circuit), the DSP (Digital Signal Processor) and the FPGA (Field Programmable Gate Array), respectively. In this paper, we design an FPGA-based image processing system. The system samples the data stream from the camera, digitizes it, and converts the digital signals to analog signals. The results show that an FPGA-based image processing system is suitable for image preprocessing.

Keywords: image processing, FPGA, fast median filtering, square window

# **1. Introduction**

The study of image acquisition and image processing has long been an active research field that involves signal processing, pattern recognition and artificial intelligence. The technology is mainly used in automotive electronics, consumer electronics, security monitoring, national defense, 3D projection and other fields. The increasing popularity of digital image processing technology is inseparable from the continual improvement of processing systems. In an image processing system, the key technology is real-time image acquisition and processing, and the speed and quality of image acquisition directly affect the system.

An image processing system contains two parts: image acquisition and image processing. The image acquisition section contains a video image processor, an image cache and a control interface circuit. Its main purpose is to convert the analog video signals obtained from a real-time vision sensor into digital image signals, and then to transfer these digital image signals to a computer for the subsequent image processing section. The image processing part can be a computer, a special image processing device, or a combination of the two. A special image processor is mainly built from an ASIC (Application Specific Integrated Circuit), a DSP (Digital Signal Processor) or an FPGA (Field Programmable Gate Array), all of which can execute real-time image processing algorithms effectively.

In recent years, the development of microelectronic technology and large-scale integrated circuit manufacturing technology, especially FPGAs, has provided new ideas and methods for improving the performance of image processing systems. Because low-level image preprocessing involves a large amount of data and demands fast processing, FPGA-based image processing systems are well suited to the image preprocessing field.

With the development of multimedia technology in recent years, the demand for video information has grown, and image acquisition and processing are becoming more and more important. Relying on computer and communication technology, image processing systems show five development trends.

(1) With the development of hardware, the performance of the image processing system will become higher, and the price will be gradually reduced.

(2) The function of the image processing system will be integrated in a portable electronic device, instead of PC or any other accessory equipment.

(3) Due to the popularity of the network, the image processing system will combine with the network in order to achieve remote image acquisition and transmission.

(4) With integrated software development for image processing systems, developing new image processing algorithms becomes easier and more efficient.

(5) In order to meet different requirements, the developers may use DSP, ASIC, or FPGA for dedicated image processing system [3-4].

An FPGA can implement hardware circuits with different functions according to different demands. Pipelining and parallel processing can be used in the system design to make the processing algorithms more efficient. The development cycle of an FPGA is shorter than that of an ASIC, an FPGA is easy to maintain and expand, and it has great advantages in real-time image processing. FPGAs are applied to many image preprocessing algorithms, such as image scaling, image rotation, image compression, edge detection, median filtering and histogram equalization.

In this paper, we introduce the basic principle and structure of FPGAs and analyze the principles of image processing algorithms. In our research, we propose fast median filtering and edge detection algorithms based on an FPGA image processing system.

### 2. Design Introduction of Field Programmable Gate Array (FPGA)

With the continuous development of electronic technology, digital integrated circuits are widely used in various fields. They have evolved from the early small- and medium-scale integrated circuits to VLSI circuits and ASICs. Although an ASIC has a low cost, its design cycle is long and its risk is high. Thus, increasing attention is paid to the FPGA, which is flexible. The FPGA is a further development on the basis of PAL and GAL devices. It not only overcomes the disadvantages of custom circuits, but also overcomes the limited gate count of the original programmable devices [5].

An FPGA is highly flexible and practical: the function of the chip can be changed without limit by changing its internal hardware logic. FPGA chips have by now reached a high level in integration, capacity and speed [6].

#### 2.1. Development of FPGA

Programmable logic devices (PLDs) have been in use since the 1970s, at first mainly to solve various kinds of storage problems. Later they were applied to several kinds of application logic [7].

(1) Early PLDs, such as ROM, EPROM and EEPROM, were mainly used to solve several kinds of storage problems. Because of their limited structure, they could only implement simple digital logic functions.

(2) PLDs with a more complex structure, such as PAL (Programmable Array Logic), GAL (Generic Array Logic) and PLA (Programmable Logic Array), are flexible to design with, although they can only implement small-scale circuits.

(3) In the 1980s, the CPLD (Complex Programmable Logic Device) and the FPGA (Field Programmable Gate Array) appeared. Both have flexible architectures and logic units (LUs), high integration levels, and so on. An FPGA supports large-scale circuit designs and flexible programming; its design and development cycle is short, and its advanced development tools are inexpensive.

At present, FPGA chips have several functions [8]:

(1) Support D/A and A/D conversion, and provide differential interfaces that run at more than 50 MHz.

(2) Use an on-chip phase-locked loop structure to reduce signal distortion while supporting clock multiplexing for high-speed clocks.

(3) Further simplify the logic and I/O function modules while providing more routing resources.

(4) Provide both distributed RAM and block RAM to meet the demands for various RAM sizes.

(5) Simplified logic function blocks (LFBs) contain fast carry logic and provide multiplier circuits for DSP applications.

(6) The local routing of the LFBs and the general routing allow the network delay to be predicted precisely.

In addition, as the line width of the chip decreases, the working voltage of the chip is reduced accordingly.

### 2.2. The Fundamental Principle and Characteristics of FPGAs

**2.2.1. The Structure and Principle of FPGAs:** FPGA structures can be divided into two types. One is the product-term-based PLD structure, which is manufactured with EEPROM or Flash technology and can work as soon as it is powered on, without other chips. The other is the structure based on the Look-Up Table (LUT), which is the one introduced and used in this paper.

At present, most FPGA chips use LUT logic units with 4 inputs, so each LUT can be regarded as a  $16 \times 1$  RAM with a 4-bit address line. When users describe a logic circuit in an HDL or as a schematic diagram, the CPLD/FPGA development software automatically computes all possible outcomes of the logic circuit and writes the results into the RAM. In this way, every logic operation on the input signals is equivalent to presenting an address to the LUT, which then outputs the content stored at that address.
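The LUT principle can be illustrated with a minimal software sketch. It is not vendor tooling: the names `build_lut`, `lut_read` and `example_fn` are hypothetical, and the example logic function is an arbitrary choice made only for illustration. The sketch precomputes all 16 outcomes of a 4-input function, which is the role the development software plays, and then evaluates the function by addressing the table.

```python
def build_lut(logic_fn):
    """Precompute the 16x1 truth table (the LUT 'RAM') of a 4-input function."""
    table = []
    for address in range(16):
        # Unpack the 4-bit address into the four input signals (a is the MSB).
        a = (address >> 3) & 1
        b = (address >> 2) & 1
        c = (address >> 1) & 1
        d = address & 1
        table.append(logic_fn(a, b, c, d) & 1)
    return table


def lut_read(table, a, b, c, d):
    """Evaluate the logic by using the four inputs as a RAM address."""
    address = (a << 3) | (b << 2) | (c << 1) | d
    return table[address]


def example_fn(a, b, c, d):
    """Hypothetical logic function: (a AND b) XOR (c OR d)."""
    return (a & b) ^ (c | d)


lut = build_lut(example_fn)
print(lut_read(lut, 1, 1, 0, 0))  # same value as example_fn(1, 1, 0, 0)
```

Once the table is filled, no logic is evaluated at run time; every input combination is simply a memory read.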



Figure 1. Image Processing Structure based on Image Frame Storing



Figure 2. Image Processing Structure based on Computer Memory

**2.2.2. The Fundamental Features of an FPGA:** An FPGA has all the advantages of an ASIC and, in addition, the following features [9]:

(1) An FPGA can be repeatedly programmed, edited and used. With the peripheral circuit left unchanged, different on-chip logic designs achieve different circuit functions. In certain areas, ASICs cannot keep up with the needs of technological development, so only FPGAs can be used to develop new systems.

(2) Low investment. FPGA chips are tested at the factory and their design is flexible, so mistakes are easy to find. A lot of potential cost can thus be saved.

(3) Growing scale. With the rapid development of electronic technology, and especially the recent progress in VLSI, a single chip can contain millions of transistors, and the same holds for FPGAs. The larger the chip, the more powerful its functions.

(4) Good efficiency and security. FPGA technology can protect the safety of the system and the intellectual property rights of the designers.

(5) Intelligent development tools and powerful functions. FPGA software is easy to use, so designers can concentrate on the circuit design and bring products to market as soon as possible.

(6) New FPGAs embed CPU or DSP cores, so they can be used as the hardware platform of an SOPC and support hardware-software co-design.

In short, the FPGA chip is the best choice for improving the integration level and reliability of a system. Figure 3 shows the relative costs of FPGAs, ASICs and structured ASICs.



Figure 3. The Relative Costs of FPGAs, ASICs and Structured ASICs

### 2.3. The Fundamental Principle and Characteristics of FPGA

An FPGA is made up of three kinds of programmable circuits and an SRAM [10,11]. The three programmable circuits are the configurable logic block (CLB), the input/output block (IOB) and the interconnect resources (IR).



Figure 4. The Basic Architecture of an FPGA

Figure 5 shows the types of programmable logic array. Left: PLA, in which both the input AND and output OR sections are programmable. Centre: PAL, with a programmable AND section and a fixed OR section. Right: PROM, with a fixed AND section and a programmable output OR section. Bottom: interpretation of the programmable sections.



Figure 5. Types of Programmable Logic Array

In Figure 6, we point out several important functional units (FUs) in an FPGA.





Figure 6. Important Functional Units in an FPGA

#### 2.4. Steps to Implementing a Design on FPGAs

The tools for implementing algorithms and systems on FPGAs need to work at a fairly high level. Implementing a working design requires several steps, which are illustrated in Figure 7 [12]. First, the design is coded in a Hardware Description Language (HDL) that captures the essential aspects of the design. Most development environments enable the hardware description to be 'compiled' and simulated at the logical level to make sure that it behaves in the intended manner.

*Synthesis* takes the logical representation describing the hardware and converts it to a device- or gate-level net-list representing the actual hardware that is to be constructed on an FPGA. Synthesis constraints control aspects of the synthesis process, for instance whether to optimise for speed or area, or whether to automatically extract and optimise finite state machines. Gate-level functional simulation verifies that the design has been synthesised correctly, and provides more information on the specific low-level characteristics of the circuits than the higher-level simulation.

The next step is to map the net-list onto the FPGA, and there are two phases to this process. *Mapping* determines how the logic maps onto the specific components available on the FPGA. *Place and route* associates these mapped components with particular logic blocks on the FPGA and determines the routing required to connect the logic blocks, memories and I/Os.

The final stage is to generate the configuration file required to programme the FPGA. During development, the configuration file can be loaded onto the target system to verify that the design works as intended and interacts correctly with the other components within the system.

As we can see from Figure 7, implementing a design on an FPGA is quite different from implementing a design in software. It requires a hardware mindset, even during the initial design. Nevertheless, image processing is usually considered to be a software development topic. Successful implementation of algorithms on an FPGA therefore requires a mix of both hardware and software skills.



Figure 7. Steps to Implementing a Design of an FPGA

### **3.** Algorithms of Digital Image Processing

This paper focuses on the design of an FPGA-based image processing system and on the study of its algorithms, so image processing algorithms are the key content. A deep understanding and analysis of these algorithms is important for designing the system.

In this section, we first introduce the concept of a square window, and then analyze median filtering, fast median filtering and the rank order filter. In particular, we propose a fast median filtering algorithm on the basis of the traditional median filtering algorithm.

### 3.1. Square Window

In digital image processing, many algorithms are based on a sliding window, and the sliding window operation is the basis of image filtering and morphological operations. The operation uses the pixels inside a window to compute the output of the algorithm. For instance, a sliding square window can compute the average value of all pixels in the neighborhood, which is the principle of the mean filter.



Figure 8. 3×3 Square Window

The image processing algorithms in this paper are built on the square window. In general, filter windows are sliding windows of odd size, such as  $3\times3$ ,  $5\times5$ ,  $7\times7$ , etc. We chose a  $3\times3$  square window because it is sufficient for processing a gray image of size 256×256. The larger the square window, the higher the cost.
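As an illustration of the sliding square window, the following minimal sketch applies a $3\times3$ mean filter to a small gray image stored as a list of rows. The function name `mean_filter_3x3`, the integer averaging and the choice to copy border pixels unchanged are assumptions made for this example, not details taken from the system described in this paper.

```python
def mean_filter_3x3(image):
    """Slide a 3x3 window over the image and average the 9 pixels in it.

    Border pixels have no complete neighborhood and are copied unchanged
    (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sum(window) // 9  # integer average of the window
    return out


image = [
    [10, 10, 10, 10],
    [10, 100, 10, 10],
    [10, 10, 10, 10],
]
smoothed = mean_filter_3x3(image)
print(smoothed[1])  # [10, 20, 20, 10]
```

The single bright pixel is spread over its neighbors rather than removed, which is the behavior that motivates the nonlinear filters discussed next.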

### **3.2. Median Filtering**

The median filter [13,14] is a nonlinear image processing method. Compared with linear filters, it can remove impulse noise while preserving the edges of the target image. It is a neighborhood operation similar to convolution: the pixels in the neighborhood are ordered by gray level, and the median value is output as the pixel value. Median filtering can be defined as follows,

$$g(x, y) = \mathrm{median}\{f(x-i, y-j)\}, \quad (i, j) \in W \tag{1}$$

where f(x-i, y-j) is the gray value of an input pixel, g(x, y) is the gray value of the output pixel, and W is the template window. W can be a line, a square, a cross, a circle, a diamond, etc. For convenience, we chose a  $3 \times 3$  square window to implement median filtering on an FPGA.



Figure 9. Framework of Median Filtering Algorithm (get the 3×3 square window; the median value replaces the middle position pixel value)

In Figure 9, the window slides pixel by pixel along the line direction of the image data. At each position, all pixels in the window are ordered by gray value. The median value is taken as the output value and replaces the value of the pixel at the center of the window.

Although median filtering is conceptually simple, its computational load is very large, and a real-time image processing system must process the data at high speed. With FPGA hardware, the median filtering processing took us only one hour to complete.
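The traditional median filter defined in Equation (1) can be sketched in software as follows. This is a plain reference version that fully sorts the 9 pixels of every window; the function name `median_filter_3x3` and the handling of border pixels (copied unchanged) are assumptions of the sketch.

```python
def median_filter_3x3(image):
    """Replace each interior pixel with the median of its 3x3 neighborhood."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            # Order the 9 neighborhood pixels by gray level.
            window = sorted(image[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # the 5th of 9 sorted values is the median
    return out


noisy = [
    [10, 12, 11, 13],
    [11, 255, 12, 10],
    [12, 11, 13, 12],
]
filtered = median_filter_3x3(noisy)
print(filtered[1])  # [11, 12, 12, 10]
```

The impulse value 255 is removed completely, but a full sort is performed for every output pixel, which is the cost that makes the traditional algorithm heavy for real-time use.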

### 3.3. Fast Median Filtering

In the traditional median filtering algorithm, sorting is time consuming, which is unfavorable for real-time processing. We therefore propose an algorithm for sorting the gray values of the pixels within the window, namely fast median filtering. This method processes image data faster than the traditional median filtering algorithm.

**3.3.1. Introduction of Fast Median Filtering Algorithm:** First, sort the pixel gray values  $X_{i,j}$  ( $i = 1 \sim W$ ,  $j = 1 \sim W$ , where *W* is the window width, an odd number) in descending order within each column, which gives  $X'_{i,j}$  with  $X'_{i-1,j} \ge X'_{i,j}$ . Next, sort  $X'_{i,j}$  in descending order within each row, which gives  $X''_{i,j}$  with  $X''_{i,j-1} \ge X''_{i,j}$ . Finally, output the median of the pixel values on the diagonal line, denoted as  $M' = \mathrm{Med}\{X''_{(W+1-k),k}: k = 1 \sim W\}$ . The process is shown in Figure 10.



Figure 10. Flow Chart of Fast Median Filtering Algorithm
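The steps of the fast median filtering algorithm can be sketched for the $3\times3$ case as follows. The sketch assumes that the first sort acts on the columns and the second on the rows of the window, and that the diagonal is the one running from the bottom-left to the top-right element; the helper names `sort3_desc` and `fast_median_3x3` are hypothetical. It is a software model for checking the idea, not the VHDL design.

```python
def sort3_desc(a, b, c):
    """Sort three values from high to low with three compare-exchange steps."""
    if a < b:
        a, b = b, a
    if b < c:
        b, c = c, b
    if a < b:
        a, b = b, a
    return a, b, c


def fast_median_3x3(window):
    """Median of a 3x3 window (list of 3 rows) via column, row and diagonal sorts."""
    # Step 1: sort each column high to low; cols[j][i] is the i-th largest of column j.
    cols = [sort3_desc(window[0][j], window[1][j], window[2][j]) for j in range(3)]
    # Step 2: sort each row of the column-sorted window high to low.
    rows = [sort3_desc(cols[0][i], cols[1][i], cols[2][i]) for i in range(3)]
    # Step 3: median of the diagonal: max of the minima, median of the medians,
    # and min of the maxima.
    diagonal = sort3_desc(rows[2][0], rows[1][1], rows[0][2])
    return diagonal[1]


# The nine samples used later in the paper, arranged as three columns.
window = [
    [24, 42, 121],
    [138, 200, 132],
    [98, 34, 124],
]
result = fast_median_3x3(window)
print(result)  # 121
```

The sketch uses seven three-input sorts in total (three for the columns, three for the rows and one for the diagonal) instead of a full sort of nine values.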

### 3.4. Rank Order Filter

The rank order filter [15] is a nonlinear filter that is widely applied in digital image processing; it can smooth images and remove noise. A rank order filter can implement a filter of arbitrary rank, such as the minimum filter, the maximum filter, etc.

The principle of the rank order filter is similar to that of the median filtering algorithm. Take a  $3\times3$  window as an example: the 9 pixel values in the window are sorted, and the output pixel value is selected according to the required rank in the sorted sequence and replaces the value of the middle position pixel.
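A minimal sketch of a $3\times3$ rank order filter is given below. The function name `rank_order_filter_3x3` and the rank convention (0 selects the minimum, 4 the median and 8 the maximum of the ascending sorted window) are assumptions of this example; border pixels are copied unchanged.

```python
def rank_order_filter_3x3(image, rank):
    """Replace each interior pixel with the value of the given rank in its window.

    rank = 0 gives a minimum filter, rank = 4 a median filter and
    rank = 8 a maximum filter (convention assumed for this sketch).
    """
    if not 0 <= rank <= 8:
        raise ValueError("rank must be between 0 and 8 for a 3x3 window")
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            window = sorted(image[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[rank]
    return out


image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
print(rank_order_filter_3x3(image, 0)[1][1])  # 1 (minimum filter)
print(rank_order_filter_3x3(image, 4)[1][1])  # 5 (median filter)
print(rank_order_filter_3x3(image, 8)[1][1])  # 9 (maximum filter)
```

Only the selected index changes between the filter types, which is why the same sorting hardware can serve all of them.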



### 4. Algorithms Implementation and Simulation

The algorithm proposed in this paper differs from traditional sorting algorithms: it is an improved algorithm based on the bubble sort method that exploits the advantages of FPGA hardware. The results show that median filtering is implemented with a low amount of computation.

The fast median filtering algorithm uses a pipelined parallel method, so that one median filtering operation requires 19 comparisons. After 9 clocks, the system produces one result per clock. The 9 data values output from the window are divided into 3 groups ( $C_1$ ,  $C_2$ ,  $C_3$ ), which are compared in parallel; the minimum value of the group *max* is then obtained.

Figure 12 is the schematic diagram of fast median filtering. There are 7 comparators, each with 3 inputs and 3 outputs, and the latency is 3 clock cycles. They are based on the bubble sort method and output their values from high to low.



Figure 12. Schematic Diagram of Fast Median Filtering

The algorithm proposed in this paper was described in VHDL, and the architecture was implemented in MAX+PLUS II of Altera. Figure 13 shows the median filtering operation on 8-bit data with a window width of 9 samples. For instance, if the 9 input samples are 24, 138, 98, 42, 200, 34, 121, 132 and 124, the staggered inputs from the input shift register arrays are shifted into each bit plane. In this example, the final result is the decimal value 121. In every clock cycle the window shifts right by one column, so 24, 138 and 98 are replaced by 150, 43 and 122, respectively. The same operations are performed continuously until all the image samples have been entered.



Figure 13. Description of a Median Filtering Operation on 8-bit data with Window Width of Nine Samples
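The numbers in this example can be checked with a few lines of software. The sketch below only reproduces the arithmetic of the example with the standard library; it does not model the bit-plane hardware, and the variable names are chosen for illustration.

```python
from statistics import median

# The nine 8-bit samples of the example, column by column.
window = [24, 138, 98, 42, 200, 34, 121, 132, 124]
first_median = median(window)
print(first_median)  # 121

# One clock later the window shifts right by one column: the oldest column
# (the first three samples) leaves and the new column (150, 43, 122) enters.
window = window[3:] + [150, 43, 122]
second_median = median(window)
print(second_median)  # 122
```

Sorted, the first window is 24, 34, 42, 98, 121, 124, 132, 138, 200, whose fifth value is 121, in agreement with the result stated above.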

Figure 14 shows the simulation result of the median operation on an arbitrary sequence with nine samples. The first median value is generated with a latency of 11 clock cycles after the first column of samples is entered, whereas the subsequent values appear in each clock cycle.



Figure 14. Simulation Result of Median Operation on an Arbitrary Input Sequence with 9 Samples

The performance of the filtering algorithm proposed in this paper is compared with that of some other filters in Table 1. The proposed filtering algorithm clearly requires fewer logic cells and achieves a higher operating frequency than the traditional filters. In our implementation, the filter is composed of 462 LEs and operates at approximately 37.6 MHz.

Table 1. Comparison Results of Performance between Proposed Algorithm and other Filters

| Hardware Name | Reference | In/Out Pins | Device | Operating Frequency | LEs Used |
|---------------|-----------|-------------|--------|---------------------|----------|
| Median filter based on a 'bubble sort' method | [16] | 9/72 | EPF10K20TC144-3 | 31.34 MHz | 972 |
| Median filter using 1-bit odd/even transposition network in MD block | [17] | 25/8 | EPF10K10LC84-3 | 34.60 MHz | 521 |
| Median filter using adder & LUT in MD block | [18] | 25/8 | EPF10K20TC144-3 | 20.57 MHz | 835 |
| Median filter using adder in MD block | [19] | 25/8 | EPF10K10LC84-3 | 36.23 MHz | 497 |
| Median filter using an optimized 9-bit sorting network in MD block | Proposed in this paper | | EPF10K10LC84-3 | 37.59 MHz | 462 |
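To relate the operating frequency to frame processing time, the following back-of-the-envelope sketch estimates how long a fully pipelined filter would need for one 256×256 gray image. It is an estimate under stated assumptions, not a measured result: it assumes one output per clock after an initial latency of 11 cycles, counts only the interior pixels that have a complete $3\times3$ window, and ignores line-buffer filling, blanking intervals and I/O. The function name `pipelined_time_seconds` is hypothetical.

```python
def pipelined_time_seconds(num_outputs, latency_cycles, clock_hz):
    """Time for a pipeline that emits one result per clock after an initial latency."""
    total_cycles = latency_cycles + (num_outputs - 1)
    return total_cycles / clock_hz


# Interior pixels of a 256x256 image that have a full 3x3 window (assumption).
outputs = (256 - 2) * (256 - 2)
frame_time = pipelined_time_seconds(outputs, latency_cycles=11, clock_hz=37.59e6)
print(f"{frame_time * 1e3:.2f} ms")  # about 1.72 ms
```

Under these assumptions the initial latency is negligible and the frame time is dominated by the number of output pixels divided by the clock frequency.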

# **5.** Conclusion

We proposed a fast median filtering algorithm that can be implemented in hardware. The algorithm outperforms traditional median filtering algorithms in terms of system performance, and it costs less. It can therefore meet the requirements of real-time image processing well. We also introduced some other algorithms and their principles. The results show that the FPGA-based methods proposed in this paper offer high speed at low cost.

#### References

- [1] S. Jenyu, L. Yilin and H. Weicheng, "Applying FPGA to implement real-time infrared tracking of moving objects", Journal of Selected Areas in Microelectronics, no. 11, (2012), pp. 12-18.
- [2] S. Hack, "The Role of FPGA's in Reprogrammable Systems", Proceedings of the IEEE, vol. 4, (1998), pp. 615-638.
- [3] S. Gao and L. Xia, "Technology of FPGA and Applications", Beijing: Posts & Telecom Press, (2001), pp. 32-37.
- [4] X. Zhang and T. Cao, "Principle and Applications of DSP", Beijing: Electronic Industry Press, (2000), pp. 21-34.
- [5] Z. Xu and G. Xu, "The development and application of FPGA", Beijing: Posts & Telecom Press, (2002).
- [6] J. Wu and C. Wang, "The design of Altera FPGA (Advanced)", Beijing: Posts & Telecom Press, (2005).
- [7] S. Pan and J. Huang, "Practical tutorial Technology of EDA", Beijing: Science Press, (2002).
- [8] Altera, Cyclone Device Handbook Data Sheet, Http://www.altera.com.
- [9] W. Wolf, "Modern VLSI Design: System-on-chip Design (Third Edition)", Beijing: Science Press, (2003).
- [10] Z. Zhu and M. Weng, "Design and Applications of FPGA", Xi'an: Xi'an Electronic Science & Technology University Press, (2002).
- [11] L. Kessal, N. Abel and D. Dermigny, "Real-time image processing with dynamically reconfigurable architecture", Real-Time Imaging, vol. 9, no. 5, (2003), pp. 297-313.
- [12] D. G. Bailey, "Design for Embedded Image Processing on FPGAs (First edition)", IEEE, (2011).
- [13] R. C. Gonzalez and R. E. Woods, "Digital Image Processing", Second Edition. Beijing, (2003).
- [14] G. V. L. Bates and S. Nooshabadi, "FPGA Implementation of a Median Filter", IEEE, TENCON-Speech and Image Technologies for Computing and Telecommunications, (1997), pp. 437-440.
- [15] G. R. Arce and R. F. Foster, "Detail-preserving ranked-order based filters for image processing", IEEE Transactions on ASSP, vol. 37, no. 1, (1989), pp. 32-98.
- [16] K. Oflazar, "Design and implementation of a single chip 1-D median filter", IEEE Trans. Acoustic Speech and Signal Processing, ASSP-31, (1983), pp. 1164-1168.
- [17] L. W. Chang and J. H. Lin, "A bit-level systolic array for median filter", IEEE Trans. on Signal Processing, vol. 40, iss. 8, (1992), pp. 2079-2083.
- [18] K. Benkrid, D. Crookes and A. Benkrid, "Design and implementation of a novel algorithm for general purpose median filtering on FPGAs", IEEE International Symposium on Circuits and Systems, vol. 4, (2002), pp. IV 425- IV 428.
- [19] B. K. Kar and D. K. Pradhan, "A new algorithm for order statistic and sorting", IEEE Trans. on Signal Processing, vol. 41, (1993), pp. 2688-2694.