# On Optimum Designs of Universal Switch Blocks

Hongbing Fan<sup>1</sup>, Jiping Liu<sup>2\*</sup>, Yu-Liang Wu<sup>3\*\*</sup>, and Chak-Chung Cheung<sup>3</sup>

- <sup>1</sup> University of Victoria, Victoria, BC. Canada V8W 3P6 hfan@csr.uvic.ca
- <sup>2</sup> The University of Lethbridge, Lethbridge, AB. Canada T1K 3M4 liu@cs.uleth.ca
- <sup>3</sup> The Chinese University of Hong Kong, Shatin, N.T., Hong Kong ylw, cccheung@cse.cuhk.edu.hk

Abstract. This paper presents a breakthrough decomposition theorem on routing topology and its applications in the design of universal switch blocks. A switch block with k sides and W terminals on each side is said to be universal (a (k, W)-USB) if it is routable for every set of 2-pin nets with channel density at most W. The optimum USB design problem is to design a (k, W)-USB with the minimum number of switches for every pair (k, W). The problem originated from designing better k-sided switch modules for 2D-FPGAs, such as Xilinx XC4000-type FPGAs, where $k \le 4$. Interest in generic USBs comes from their potential use in multi-dimensional and some non-conventional 2D-FPGA architectures, and as individual switch components. The optimum (k, W)-USB problem was solved previously for even W, but left open for odd W. Our new decomposition theorem states that when W $(> \frac{k+3-i}{3})$ is odd, a (k, W) routing requirement ((k, W)-RR) can be decomposed into one $(k,\frac{k+3-i}{3})$-RR and $\frac{3W-k+i-3}{6}$ (k,2)-RRs, where $1 \leq i \leq 6$ and $k \equiv i \pmod 6$. By this theorem and the previously established reduction design scheme, the USB design problem is reduced to its minimum kernel designs, which enables us to design the best approximated (k, W)-USBs for all odd W. We also ran extensive routing experiments using VPR, currently the best known FPGA router, and the MCNC circuits with the conventional disjoint switch blocks and two kinds of universal switch blocks. The experimental results show that both kinds of USBs consistently improve the entire-chip routability by over 6% compared with the conventional switch blocks.

Keywords. FPGA architecture, routing, universal switch block

## 1 Introduction

Switch blocks are critical reconfigurable components in Field Programmable Gate Arrays (FPGAs); they have great effects on the area and time efficiency and routability of FPGA chips. Many kinds of switch blocks have been studied, designed and used in various kinds of FPGA architectures [2, 3]. We here consider the (k, W) switch block ((k, W)-SB for short), in which terminals are grouped into k sides, each side has W terminals, and non-direct configurable switches connect pairs of terminals from different sides. The (4, W)-SBs are typical switch modules used in two dimensional FPGA architectures such as Xilinx XC4000-type FPGAs [3, 4, 10, 12, 13].

<sup>\*</sup> This research was partially supported by the Natural Sciences and Engineering Research Council of Canada.

<sup>\*\*</sup> Research partially supported by a Hong Kong Government RGC Earmarked Grant, Ref. No. CUHK4236/01E, and Direct Grant CUHK2050244.

M. Glesner, P. Zipf, and M. Renovell (Eds.): FPL 2002, LNCS 2438, pp. 142-151, 2002.

<sup>©</sup> Springer-Verlag Berlin Heidelberg 2002

Fig. 1. Examples of (k, 3)-SBs, (k, 3)-RRs and the corresponding detailed routings.

Routability and area efficiency are the two foremost issues in switch block design, but high routability and high area efficiency are conflicting goals. It is easy to see that an FPGA with complete switch blocks, namely with a switch between every pair of terminals from different sides, has the highest routability for a given channel density. But it also has the lowest area efficiency, and it is impractical when the channel density is high.

To balance the two goals, Rose and Brown [10] introduced an important concept called the *flexibility*, denoted by  $F_s$ , which is the maximum number of switches in a switch block from a terminal to others. They investigated the effects of flexibility on the global routability, and observed that (4, W)-SBs with  $F_s = 3$  result in a sufficiently high global routability, which is an acceptable tradeoff between global routability and area efficiency. However, there are various designs with the same flexibility. This raises interest in designing switch blocks with a high routing capacity, a small flexibility and the minimum number of switches.

To achieve high routing capacity, Chang et al. [4] proposed the significant concept of universal switch modules. A (k, W)-SB is said to be universal (a (k, W)-USB) if it is routable for every set of nets satisfying the routing constraint, i.e., the number of nets on each side is at most W. The nets used in the definition of universal switch blocks are 2-pin nets. Fan et al. [5] generalized the concept of USB to hyper-universal switch blocks (HUSB) by allowing multi-pin nets. The main contribution of [5] is the discovery of the decomposition property and the reduction design scheme for switch block designs.

The first optimum (4, W)-USB was given in [4], called a symmetric switch module. It has 6W switches with  $F_s = 3$ . The symmetric universal switch modules were generalized to generic symmetric (k, W)-SBs in [11]. However, the generalized symmetric switch blocks are not universal for odd  $W (\geq 3)$  when  $k \geq 7$ . This was proved in [7]. Consequently, the optimum USB design problem for  $k \geq 7$  and odd W is still open. This paper continues the investigation of the unsolved part of the optimum USB design problem.

To avoid ambiguity, we next specify the terms net, detailed routing, and routing requirement (called global routing in [5]) with respect to a (k, W)-SB. By a *net* we mean an indication of two sides of the switch block in which two terminals should be connected by a switch. A *detailed routing of a net* is an exact assignment of a switch whose two terminals are in the sides indicated by the net. A (k, W)-routing requirement ((k, W)-RR for short) is a set of nets such that the number of nets that connect each side is no more than W. A *detailed routing of a* (k, W)-RR *in a* (k, W)-SB is an assignment of switches in the switch block such that each net in the routing requirement corresponds to a switch, and the switches corresponding to different nets are not incident. For example, Fig. 1(a), (b), (c) depict a (6,3)-SB, a (6,3)-RR, and a detailed routing of (b) in (a). Thus a (k, W)-SB is universal if it has a detailed routing for every (k, W)-RR. The optimum USB design problem can be described as: for any given pair of k and W, design a (k, W)-USB with the minimum number of switches.
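The definition of a (k, W)-RR can be checked mechanically. The following is a minimal sketch under our own assumptions: the function name `is_routing_requirement` and the representation of a net as a pair of side labels are illustrative choices, not notation from the cited works, and the example requirement is hypothetical.

```python
from collections import Counter

def is_routing_requirement(nets, k, W):
    """Return True if `nets` (pairs of side labels in 1..k) is a (k, W)-RR."""
    load = Counter()
    for a, b in nets:
        # A 2-pin net must join two different sides of the switch block.
        if a == b or not (1 <= a <= k and 1 <= b <= k):
            return False
        load[a] += 1
        load[b] += 1
    # No side may be used by more than W nets.
    return all(load[s] <= W for s in range(1, k + 1))

# A hypothetical requirement on a 6-sided block with 3 terminals per side.
example = [(1, 2), (1, 2), (1, 4), (2, 5), (3, 6), (3, 6), (4, 5)]
print(is_routing_requirement(example, k=6, W=3))             # True
print(is_routing_requirement(example + [(1, 3)], k=6, W=3))  # False: side 1 is used 4 times
```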

The difficulty of the optimum USB design problem lies in the verification of a design, that is, in proving that a (k, W)-SB has a detailed routing for every (k, W)-RR. The verification involves two subproblems: (a) to generate all testing routing requirements, and (b) to find a detailed routing algorithm.

In the case of W being even, the two problems were solved precisely by a strong decomposition property of (k, 2m)-RRs: a (k, 2m)-RR can be decomposed into m (k, 2)-RRs. Thus a union of m (k, 2)-USBs forms a (k, 2m)-USB, and the design job is reduced to computing all (k, 2)-RRs and designing a (k, 2)-USB. All (k, 2)-RRs and optimum (k, 2)-USBs are given in [11, 7], and the union of m optimum (k, 2)-USBs forms an optimum (k, 2m)-USB. For a (k, 2m)-USB formed by the union of m (k, 2)-USBs, a detailed routing of a (k, 2m)-RR can easily be done by first decomposing the (k, 2m)-RR into m (k, 2)-RRs and then accommodating them in the m (k, 2)-USBs. This outlines the so-called reduction design scheme.
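To make the even-density decomposition concrete, the sketch below splits a (k, 2m)-RR into m (k, 2)-RRs. It uses a standard graph-theoretic route (pad to a 2m-regular multigraph, orient it along closed trails, then peel off m perfect matchings of the resulting bipartite graph); this is an assumption on our part and not necessarily the procedure used in [5]. The function name `decompose_even` and the padding with virtual edges are hypothetical.

```python
from collections import Counter

def decompose_even(nets, k, m):
    """Split a (k, 2m)-RR (list of side pairs) into m lists, each a (k, 2)-RR."""
    load = Counter(s for net in nets for s in net)
    assert all(load[s] <= 2 * m for s in range(1, k + 1)), "not a (k, 2m)-RR"
    # Pad with virtual edges (loops allowed) until every side has degree 2m.
    stubs = [s for s in range(1, k + 1) for _ in range(2 * m - load[s])]
    edges = list(nets) + list(zip(stubs[0::2], stubs[1::2]))

    # Orient every edge along closed trails: each side gets out-degree m and in-degree m.
    incident = {s: [] for s in range(1, k + 1)}
    for e, (a, b) in enumerate(edges):
        incident[a].append(e)
        if a != b:
            incident[b].append(e)
    used = [False] * len(edges)
    out = {s: [] for s in range(1, k + 1)}   # tail -> list of (head, edge index)
    for start in range(1, k + 1):
        v = start
        while True:
            while incident[v] and used[incident[v][-1]]:
                incident[v].pop()
            if not incident[v]:
                break   # all degrees are even, so the trail has returned to `start`
            e = incident[v].pop()
            used[e] = True
            a, b = edges[e]
            w = b if v == a else a
            out[v].append((w, e))
            v = w

    # Peel off m perfect matchings (tails -> heads) with augmenting paths.
    parts = []
    for _ in range(m):
        chosen = {}   # head -> (tail, edge index)

        def augment(t, seen):
            for h, e in out[t]:
                if h in seen:
                    continue
                seen.add(h)
                if h not in chosen or augment(chosen[h][0], seen):
                    chosen[h] = (t, e)
                    return True
            return False

        for t in range(1, k + 1):
            if not augment(t, set()):
                raise ValueError("no perfect matching found")
        for h, (t, e) in chosen.items():
            out[t].remove((h, e))
        # Keep only the real nets; virtual padding edges are dropped.
        parts.append([edges[e] for _, e in chosen.values() if e < len(nets)])
    return parts

rr = [(1, 2), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (3, 4)]  # a (4, 4)-RR
for part in decompose_even(rr, k=4, m=2):
    print(part)   # each printed part uses every side at most twice
```

Each part can then be routed in its own copy of a (k, 2)-USB, which is the reduction design scheme described above.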

For odd W, it is known that there is a minimum integer  $f_2(k)$  such that a (k, W)-RR can be decomposed into a  $(k, f_2(k))$ -RR and some (k, 2)-RRs [6];  $f_2(k)$  is the maximum value w such that there is a non-decomposable (k, w)-RR. The value of  $f_2(k)$  is important in the generation of all (k, W)-RRs and (k, W)-USB design as well.

In Section 2, we will introduce a breakthrough result on  $f_2(k)$ :  $f_2(k) = \frac{k+3-i}{3}$  where k and i satisfy  $k \geq 7$  and  $1 \leq i \leq 6$  and  $k \equiv i \pmod 6$ . This result gives the best decomposition theorem, which makes it possible to generate all (k, 2m+1)-RRs, and to design the best approximated (k, 2m+1)-USBs using the reduction design scheme.

The routability of 2D-FPGA with USBs was tested in [4]. We further perform the experimental justification using VPR [1] with the commonly used disjoint switch block, the symmetric USB [4] and an alternative USB. The setting and results are presented in Section 4.

## 2 The Extreme Decomposition Theorems

We use the combinatorial and graph models to represent routing requirements, switch blocks and detailed routings as in [5]. For convenience, we briefly describe the modeling as follows.

We label the sides of a (k, W)-SB by  $1, 2, \ldots, k$ , respectively; then a 2-pin net can be represented as a size-2 subset of  $\{1, 2, \ldots, k\}$ . For example, a net that connects two terminals on sides 1 and 2 can be represented by  $\{1, 2\}$ . A (k, W)-RR is a collection (multiset) of size-2 subsets (also called nets) of  $\{1, 2, \ldots, k\}$ , such that each  $i \in \{1, 2, \ldots, k\}$  is contained in no more than W subsets in the collection. A (k, W)-SB can be modeled as a graph: represent the jth terminal on side i by a vertex  $v_{i,j}$  and a switch connecting  $v_{i,j}$  and  $v_{i',j'}$  by an edge  $v_{i,j}v_{i',j'}$ ; then a (k, W)-SB corresponds to a k-partite graph G with vertex partition  $(V_1, \ldots, V_k)$ , where  $V_i = \{v_{i,j} \mid j = 1, \ldots, W\}$ ,  $i = 1, \ldots, k$ . We also call such a graph a (k, W)-SB. Two (k, W)-SBs are isomorphic if there is an isomorphism which preserves the vertex partitions. A detailed routing of a net  $\{i, j\}$  can be represented by an edge connecting a vertex in part  $V_i$  and a vertex in part  $V_j$ . A detailed routing of a (k, W)-RR in a (k, W)-SB corresponds to a subgraph consisting of independent edges.
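The graph model lends itself to a direct, if exponential, search for a detailed routing. The sketch below is illustrative only: terminals are encoded as `(side, track)` tuples, and the helper names `complete_sb`, `disjoint_sb` and `find_detailed_routing` are our own. The "disjoint" block is assumed here to connect track t on one side only to track t on the other sides.

```python
from itertools import combinations

def complete_sb(k, W):
    """All switches between terminals on different sides; a terminal is (side, track)."""
    terminals = [(s, t) for s in range(1, k + 1) for t in range(1, W + 1)]
    return {frozenset((u, v)) for u, v in combinations(terminals, 2) if u[0] != v[0]}

def disjoint_sb(k, W):
    """Assumed disjoint pattern: track t connects only to track t on the other sides."""
    return {frozenset(((a, t), (b, t)))
            for a, b in combinations(range(1, k + 1), 2) for t in range(1, W + 1)}

def find_detailed_routing(switches, nets):
    """Backtracking search: one switch per net, chosen switches pairwise independent."""
    def place(i, busy, chosen):
        if i == len(nets):
            return chosen
        a, b = nets[i]
        for sw in switches:
            u, v = tuple(sw)
            # The switch must join the two sides of the net and use free terminals.
            if {u[0], v[0]} == {a, b} and u not in busy and v not in busy:
                result = place(i + 1, busy | {u, v}, chosen + [sw])
                if result is not None:
                    return result
        return None
    return place(0, frozenset(), [])

triangle = [(1, 2), (2, 3), (1, 3)]          # a (3, 2)-RR
print(find_detailed_routing(complete_sb(3, 2), triangle) is not None)  # True
print(find_detailed_routing(disjoint_sb(3, 2), triangle))              # None
```

In the disjoint pattern every net occupies one whole track on two of the three sides, so two nets can never share a track and the three nets cannot fit on two tracks; the complete block routes the same requirement easily.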

The verification of USBs can be simplified by using formalized routing requirements. First, add singletons (nets of size one) to a (k, W)-RR so that each element appears W times; the result is called a balanced routing requirement ((k, W)-BRR), or a k-way BRR (k-BRR) with density W. Second, pair up the non-equal singletons until no two different singletons are left; such a BRR is called a primitive BRR (PBRR). It can be seen that a (k, W)-SB is universal if and only if it has a detailed routing for every (k, W)-PBRR. We therefore need to compute all (k, W)-PBRRs.
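The two normalization steps can be sketched as follows. This is a minimal illustration, assuming nets are tuples and singletons are 1-tuples; the name `to_pbrr` and the rule of always pairing the two sides with the largest remaining deficit are our own choices (any pairing order that leaves at most one kind of singleton satisfies the definition).

```python
from collections import Counter

def to_pbrr(nets, k, W):
    """Extend a (k, W)-RR to a primitive balanced RR; singletons are 1-tuples."""
    load = Counter(s for net in nets for s in net)
    deficit = {s: W - load[s] for s in range(1, k + 1)}
    assert all(d >= 0 for d in deficit.values()), "not a (k, W)-RR"
    result = list(nets)
    while True:
        # Sides that would still need singletons to reach density W.
        open_sides = sorted((s for s in deficit if deficit[s] > 0),
                            key=lambda s: -deficit[s])
        if len(open_sides) < 2:
            break
        a, b = sorted(open_sides[:2])
        result.append((a, b))        # two different singletons become one net
        deficit[a] -= 1
        deficit[b] -= 1
    for s in open_sides:
        result.extend([(s,)] * deficit[s])   # leftover singletons, all on one side
    return result

print(to_pbrr([(1, 2)], k=3, W=3))
```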

The decomposition property of PBRRs provides an efficient way to compute all (k, W)-PBRRs. Let R be a (k, d)-PBRR and R' be a subset of R. If R' is a (k, d')-PBRR with d' < d, then we say R' is a sub-routing requirement of R. A PBRR is said to be minimal (a PMBRR for short), or non-decomposable, if it contains no sub-routing requirement. A (k, W)-PBRR can be decomposed into k-PMBRRs, so that, if all k-PMBRRs are known, we can use them to construct all (k, W)-PBRRs. The following is the fundamental decomposition theorem.

**Theorem 2.1.** [6] For any given integer k, the number of k-PMBRRs is finite and every (k, W)-PBRR can be decomposed into k-PMBRRs with densities at most  $f_2(k)$ , where  $f_2(k)$  equals the maximum density of all k-PMBRRs.

The function  $f_2(k)$  is important in the computation of the complete list of k-PMBRRs. If we know the value of  $f_2(k)$ , we can at least enumerate all k-PBRRs with densities no more than  $f_2(k)$  and check each of them to see whether it is a k-PMBRR.
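For very small instances, the minimality check mentioned above can be done by brute force over sub-multisets. The sketch below is an assumption-laden illustration (exponential time, nets as tuples, singletons as 1-tuples, hypothetical names `density` and `is_minimal`), not an efficient enumeration method.

```python
from itertools import combinations

def density(nets, k):
    """Common number of occurrences of every side, or None if the sides differ."""
    counts = [sum(net.count(s) for net in nets) for s in range(1, k + 1)]
    return counts[0] if len(set(counts)) == 1 else None

def is_minimal(pbrr, k):
    """True if the balanced requirement has no balanced proper sub-requirement."""
    assert density(pbrr, k) is not None, "input must be balanced"
    for size in range(1, len(pbrr)):
        for idx in combinations(range(len(pbrr)), size):
            # A nonempty proper balanced subset necessarily has smaller density.
            if density([pbrr[i] for i in idx], k) is not None:
                return False
    return True

print(is_minimal([(1, 2), (2, 3), (1, 3)], k=3))   # True: the triangle has density 2
print(is_minimal([(1, 2), (1, 2)], k=2))           # False: splits into two density-1 parts
```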

It was known that  $f_2(k) = 1$  for k = 1, 2,  $f_2(k) = 2$  for  $3 \le k \le 6$  [6], and  $f_2(k) = 3$  for k = 7, 8, and the complete lists of k-PMBRRs were known for  $k \leq 8$ . It was conjectured that  $f_2(k) = \frac{k+3-i}{3}$ . This conjecture was recently proved using graph theory. Let R be a (k, W)-PBRR. Then R corresponds to a W-regular 2-graph with vertex set  $\{1,\ldots,k\}$  and edge set R. Note that 2-graphs allow edges with one vertex. An r-factor of a graph is a spanning r-regular subgraph of the graph. So, in terms of graph theory, a (k, W)-PMBRR corresponds to a W-regular 2-graph without proper regular factors. The following result has been proved; the details are omitted.

**Theorem 2.2.** A (2r+1)-regular graph G has no proper regular factor if and only if G has a 2-factor free block which is incident to at least (2r+1) cut edges.

The theorem is significant for two reasons: (a) it gives a characterization of (k, W)-PMBRRs, which makes it possible to generate all k-PMBRRs efficiently, and (b) it leads to the following result on  $f_2(k)$ .

**Theorem 2.3.** Let  $k \geq 7$  be an integer. Then  $f_2(k) = \frac{k+3-i}{3}$ , where  $1 \leq i \leq 6$ and  $k \equiv i \pmod{6}$ .

*Proof.*  $f_2(k) \geq \frac{k+3-i}{3}$  follows from the examples in [7]. Now we show  $f_2(k) \leq \frac{k+3-i}{3}$ . Let D be the 2-graph of a  $(k, f_2(k))$ -PMBRR. If D contains singletons (edges of size one), we transform D into an  $f_2(k)$ -regular graph R as follows.

Let x be the vertex of D such that  $\{x\}$  is a singleton of D, and let p be the multiplicity of  $\{x\}$  in D. If  $p = f_2(k)$ , then x is an isolated vertex, and we delete x from D. If p = 2m for some m, then we add new vertices y, z together with m copies of xy, m copies of xz, and  $f_2(k)-m$  copies of yz. Otherwise  $p=2m+1< f_2(k)$ , and we add new vertices y, z, w together with 2m+1 copies of xy,  $\frac{f_2(k)-2m-1}{2}$  copies of each of yz and yw, and  $\frac{f_2(k)+2m+1}{2}$  copies of zw, so that y, z and w all have degree  $f_2(k)$ . Let R be the graph obtained by the above construction; then R is minimal, for otherwise D would not be minimal. Note that  $|R| \leq k+3$ .

By Theorem 2.2, R has a 2-factor free block C which is incident to at least  $f_2(k)$  cut edges. Each such cut edge joins C to a component of  $R - C$  with at least 3 vertices, because R is an  $f_2(k)$ -regular graph and  $f_2(k) \geq 3$  is odd. It follows that  $3f_2(k) + |C| \le |R|$  and

$$f_2(k) \le \frac{|R| - |C|}{3} \le \frac{k+3-1}{3} = \frac{k+2}{3}.$$

Let k = 6r + i, where  $r \ge 1$  and  $1 \le i \le 6$ . Then we have  $f_2(k) \le \frac{k+2}{3} = \frac{6r + i + 2}{3} = 2r + 1 + \frac{i-1}{3}$ . Since  $\frac{i-1}{3} < 2$  and  $f_2(k)$  is odd,  $f_2(k) \le 2r + 1 = \frac{k+3-i}{3}$ .

As an immediate consequence of Theorem 2.3, we have the following new extreme decomposition theorem of (k, W)-PBRRs.

**Theorem 2.4.** Let  $k \geq 7$  and  $1 \leq i \leq 6$  with  $i \equiv k \pmod{6}$ , and let W be odd. Then the following statements hold:

(i) If  $W > \frac{k+3-i}{3}$ , then every (k,W)-PBRR can be decomposed into a  $(k,\frac{k+3-i}{3})$ -PBRR and  $\frac{3W-k-3+i}{6}$  (k,2)-PBRRs.

(ii) There are (k, W)-PMBRRs for every  $W \leq \frac{k+3-i}{3}$ .

By the above decomposition theorem, when W is odd and  $W > \frac{k+3-i}{3}$ , the disjoint union of one  $(k,\frac{k+3-i}{3})$ -USB and  $\frac{3W-k-3+i}{6}$  (k,2)-USBs forms a (k,W)-USB; when W is odd and  $W \leq \frac{k+3-i}{3}$ , no (k,W)-USB is the disjoint union of smaller USBs. Therefore, by the reduction scheme, for any fixed k, we need to design the basic (k,r)-USBs for  $r=1,2,3,5,\ldots,\frac{k+3-i}{3}$ . Once these basic USBs have been designed, we can combine m (k,2)-USBs to obtain a (k,2m)-USB, and combine one  $(k,\frac{k+3-i}{3})$ -USB and  $\frac{6m-k+i}{6}$  (k,2)-USBs to obtain a (k,2m+1)-USB with  $2m+1 > \frac{k+3-i}{3}$ . The SBs obtained in this way are scalable, and detailed routing can be done efficiently.
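The bookkeeping of the reduction scheme is simple enough to tabulate. The following sketch evaluates the formula of Theorem 2.3 and lists the densities of the basic USBs that are combined for a given (k, W); the function names `f2` and `usb_building_blocks` are hypothetical, and the code covers only the range $k \geq 7$ for which the theorem is stated.

```python
def f2(k):
    """Value of f_2(k) from Theorem 2.3 (stated for k >= 7)."""
    if k < 7:
        raise ValueError("Theorem 2.3 is stated for k >= 7")
    i = (k - 1) % 6 + 1          # 1 <= i <= 6 and i congruent to k modulo 6
    return (k + 3 - i) // 3

def usb_building_blocks(k, W):
    """Densities of the basic USBs whose disjoint union forms a (k, W)-USB."""
    if W % 2 == 0:
        return [2] * (W // 2)            # even W: W/2 copies of a (k, 2)-USB
    core = f2(k)
    if W <= core:
        return [W]                       # a basic USB that must be designed directly
    return [core] + [2] * ((W - core) // 2)

print([f2(k) for k in range(7, 20)])     # [3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 5, 5, 7]
print(usb_building_blocks(7, 9))         # [3, 2, 2, 2]
print(usb_building_blocks(13, 9))        # [5, 2, 2]
print(usb_building_blocks(13, 3))        # [3]
```

The number of (k,2) blocks returned for odd W equals $\frac{3W-k-3+i}{6}$, since $(W - \frac{k+3-i}{3})/2 = \frac{3W-k-3+i}{6}$.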

## 3 The Design Scheme for Basic USBs

The USB design problem has been reduced to designing the basic (k,r)-USBs for  $r=1,2,3,5,\ldots,\frac{k+3-i}{3}$ . Designing the basic USBs is the kernel of the problem, and it is a genuinely difficult task except for r=1,2. The optimum (k,1)-USB and (k,2)-USB were designed in [11, 7]. But for  $k\geq 7$  and odd r  $(3\leq r\leq \frac{k+3-i}{3})$ , no optimum (k,r)-USB is known yet. However, we can design approximated basic (k,r)-USBs by the following inductive design scheme.

Let U(k, 1) and U(k, 2) be an optimum (k, 1)-USB and an optimum (k, 2)-USB, respectively. Construct a (k, 3)-USB U(k, 3) by first making a copy of U(k, 1) and a copy of U(k, 2), and then adding some switches between them such that the resulting switch block is routable for all (k, 3)-PMBRRs. A (k, 5)-USB U(k, 5) can then be constructed by combining a copy of U(k, 3) and a copy of U(k, 2) and adding some switches such that it is routable for all (k, 5)-PMBRRs. Continue this construction until  $U(k, \frac{k+3-i}{3})$  is constructed.

Note that in the universality verification of U(k,r), we only check detailed routings for (k,r)-PMBRRs, not for all (k,r)-PBRRs. This is because the decomposable (k,r)-PBRRs are routable in the union of U(k,r-2) and U(k,2).

Next we illustrate this method in detail for k=7. Since  $f_2(7)=3$ , we need only construct a (7,3)-USB. Denote by U(7,1)+U(7,2) the disjoint union of U(7,1) and U(7,2). We next consider adding the minimum number of edges between U(7,1) and U(7,2) (called cross edges) so that the resulting graph  $\bar{U}(7,3)$  is routable for every (7,3)-PMBRR. By Theorem 2.2, a (7,3)-PMBRR must be isomorphic to the 2-graph shown in Fig. 2(a). Let R be a (7,3)-PMBRR. Then, for R to be routable in  $\bar{U}(7,3)$ , at least one cross edge must be used in the detailed routing. We consider a detailed routing of R which uses exactly one of the cross edges. Suppose we use one cross edge to route  $\{i_1,i_2\}$  and  $i_1$  corresponds to a vertex v in U(7,1). Then we must use three independent edges in  $U(7,1)-\{v\}$  to implement three independent pairs in  $R-\{i_1,i_2\}$ . Therefore, we should select  $\{i_1,i_2\}$  in R such that  $R-\{i_1\}$  contains three disjoint pairs. It is easy to see that such an  $\{i_1,i_2\}$  must be an edge in a triangle of R.

A smallest (in terms of number of edges) graph on seven vertices which always contains a triangle edge of any (7,3)-PMBRR is given in Fig. 2(b). We call it a connection pattern. The labels of the vertices and the orientation of edges in the pattern are arbitrary. A directed edge (i,j) in the pattern corresponds to a cross edge joining the *i*-th side of U(7,1) and the *j*-th side of U(7,2) (the joining of the cross edge is not unique; it depends only on the labels of the sides). With this pattern, we obtain a (7,3)-USB  $\bar{U}(7,3)$  as shown in Fig. 2(c).

**Fig. 2.** A (7,3)-USB.

We next consider the (7,2h+1)-USB obtained by combining  $\bar{U}(7,3)$  and U(7,2)s. Let  $\bar{U}(7,2h+1) = \bar{U}(7,3) + \sum_{i=1}^{h-1} U(7,2)$ .

Note that the number of switches in an optimum (k, W)-USB, denoted by  $e_2(k, W)$ , is bounded below by  $\binom{k}{2}W$  because there must be at least W switches between any two sides of a (k, W)-USB.

**Theorem 3.1.** Let  $W (\geq 3)$  be an odd integer, then  $\bar{U}(7,W)$  is a USB with approximation ratio  $\frac{|E(\bar{U}(7,W))|}{e_2(7,W)} \leq 1 + \frac{6}{21W}$  and flexibility  $F_s = 7$ .

*Proof.* By our construction,  $\bar{U}(7,3)$  is a (7,3)-USB. By Theorem 2.4, every (7,W)-PBRR can be decomposed into a (7,3)-PBRR and  $\frac{W-3}{2}$  (7,2)-PBRRs. The (7,3)-PBRR has a detailed routing in  $\bar{U}(7,3)$ , and each (7,2)-PBRR has a detailed routing in one of the  $\frac{W-3}{2}$  copies of U(7,2). Therefore,  $\bar{U}(7,W)$  is universal. Moreover,  $|E(\bar{U}(7,W))| = {7 \choose 2}W + 6 = 21W + 6$  and  $e_2(7,W) \geq {7 \choose 2}W = 21W$ . Therefore,  $\frac{|E(\bar{U}(7,W))|}{e_2(7,W)} \leq 1 + \frac{6}{21W}$ .

By the above theorem, the ratio is close to 1 when W is large; hence  $\bar{U}(7,W)$  is nearly optimal for large W. Since  $f_2(k)=3$  for k=8,9,10,11,12,  $\bar{U}(k,W)$  can be constructed similarly for these values of k.

Finally we provide an alternative design of a k-sided switch block which is routable for every (k,W)-RR. Let  $\underline{U}(k,W) = \sum_{i=1}^{\lceil \frac{W}{2} \rceil} U(k,2)$ .

**Theorem 3.2.** If W is even, then  $\underline{U}(k,W)$  is an optimum (k,W)-USB. If W is odd, then  $\underline{U}(k,W)$  is a (k,W+1)-SB routable for every (k,W)-RR, and has approximation ratio  $\frac{|E(\underline{U}(k,W))|}{e_2(k,W)} \le 1 + \frac{1}{W}$  and flexibility  $F_s = k-1$ .

*Proof.* This is clearly true when W is even. For odd W, we need to show that any (k, W)-RR R is routable in  $\underline{U}(k, W)$ . Let  $R_1$  be any (k, 1)-RR and let  $R' = R \cup R_1$ . Then R' is a (k, W+1)-RR. Since W+1 is even, R' can be decomposed into  $\frac{W+1}{2} = \lceil \frac{W}{2} \rceil$  (k, 2)-RRs. Each of these (k, 2)-RRs can be detail-routed in one U(k, 2); therefore, R' is routable in  $\underline{U}(k, W)$ . Simply removing the detailed routing for  $R_1$  yields a detailed routing of R in  $\underline{U}(k, W)$ . Therefore,  $\underline{U}(k, W)$  is routable for every (k, W)-RR.

If W is even,  $|E(\underline{U}(k,W))| = {k \choose 2}W$ ; otherwise  $|E(\underline{U}(k,W))| = {k \choose 2}(W+1)$ . Since  $e_2(k,W) \geq {k \choose 2}W$ , it follows that  $\frac{|E(\underline{U}(k,W))|}{e_2(k,W)} \le 1 + \frac{1}{W}$ .
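The switch counts behind the two approximation bounds can be compared numerically. The sketch below is an illustration with hypothetical function names; it divides by the lower bound ${k \choose 2}W$ rather than by the unknown $e_2(k,W)$, so the printed values are upper bounds on the true ratios. It also assumes, as the counts above imply, that an optimum (k,2)-USB has $2{k \choose 2}$ switches.

```python
from fractions import Fraction
from math import comb

def lower_bound(k, W):
    """At least W switches are needed between every pair of sides."""
    return comb(k, 2) * W

def switches_ubar7(W):
    """Switch count of the (7, W)-USB built from one (7,3)-USB with 6 cross edges, odd W >= 3."""
    return comb(7, 2) * W + 6

def switches_alt(k, W):
    """Switch count of the union of ceil(W/2) optimum (k, 2)-USBs."""
    return comb(k, 2) * 2 * ((W + 1) // 2)

for W in (3, 5, 9, 15):
    r1 = Fraction(switches_ubar7(W), lower_bound(7, W))   # equals 1 + 6/(21W)
    r2 = Fraction(switches_alt(7, W), lower_bound(7, W))  # equals 1 + 1/W for odd W
    print(W, r1, r2)
```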

## 4 Experimental Results

As we have seen, a USB is defined to have the highest local routing capacity. There is no theoretical proof that USBs lead to high global routability; however, we can test the routing behavior by experiments. Universality is preserved under isomorphism, but two isomorphic USB designs may have different layouts. The routing network in an FPGA is determined by the settings of its switch blocks. If all switch blocks use the same layout, then we have a disjoint grid routing network. However, no analytical model for global routing is well established. Lemieux and Lewis [8] proposed an analytical framework for overall routings. We use the probabilistic model [3] and experiments to justify the entire-chip routability of USBs with different layouts.

We adopt the well-known FPGA router VPR [1] for our experiment. The logic block structure for our VPR runs is set to contain one 4-input LUT and one flip-flop. The input or output pin of the logic block is able to connect to any track in the adjacent channels ( $F_c = W$ ). Inside the switch box, each input wire segment can connect to three other output wire segments of other channels ( $F_s = 3$ ).

The results in [4] showed a notable improvement in global routability of the symmetric USB over the disjoint type and the antisymmetric type in [10]. Those experiments were done using the modified CGE router [10] and CGE benchmark circuits. In our experiment, we test 21 large benchmark circuits with Disjoint switch blocks, symmetric USBs and alternative USBs of different channel widths.

Fig. 3(a), Fig. 3(b) and Fig. 3(c) show the actual connections of the Disjoint switch block, the symmetric USB and the alternative USB of channel width 8. The alternative (4,8)-USB is a union of 4 (4,2)-USBs, which is isomorphic to the symmetric (4,8)-USB but has a different layout.

Table 1 shows the number of tracks required to route some larger MCNC benchmark circuits [14] on FPGAs with each of the three SBs. Overall, the FPGAs with the symmetric USB and with our proposed USB both use about 6% fewer tracks than those with Disjoint SBs. There is no big difference between the symmetric USBs and the proposed ones; this indicates that the global routability depends largely on the topological structures of the switch blocks rather than their layouts.

**Table 1.** Channel widths required for different benchmark circuits ( $F_c = W$ ,  $F_s = 3$ ).

|              | Disjoint | Symmetric USB | Alternative USB |
|--------------|----------|---------------|-----------------|
| alu4         | 10       | 10 (-0%)      | 10 (-0%)        |
| apex2        | 12       | 11 (-8.3%)    | 11 (-8.3%)      |
| apex4        | 13       | 12 (-7.7%)    | 13 (-0%)        |
| bigkey       | 7        | 7 (-0%)       | 6 (-14.3%)      |
| clma         | 13       | 11 (-15.4%)   | 12 (-7.7%)      |
| des          | 8        | 7 (-12.5%)    | 7 (-12.5%)      |
| diffeq       | 8        | 7 (-12.5%)    | 7 (-12.5%)      |
| dsip         | 7        | 7 (-0%)       | 7 (-0%)         |
| elliptic     | 11       | 10 (-9.1%)    | 10 (-9.1%)      |
| ex1010       | 11       | 10 (-9.1%)    | 10 (-9.1%)      |
| ex5p         | 14       | 13 (-7.1%)    | 13 (-7.1%)      |
| frisc        | 13       | 12 (-7.7%)    | 12 (-7.7%)      |
| misex3       | 11       | 11 (-0%)      | 11 (-0%)        |
| pdc          | 17       | 16 (-5.9%)    | 16 (-5.9%)      |
| s298         | 8        | 7 (-12.5%)    | 7 (-12.5%)      |
| s38417       | 8        | 7 (-12.5%)    | 8 (-0%)         |
| s38584.1     | 8        | 8 (-0%)       | 8 (-0%)         |
| seq          | 12       | 11 (-8.3%)    | 11 (-8.3%)      |
| spla         | 14       | 14 (-0%)      | 13 (-7.1%)      |
| tseng        | 7        | 6 (-14.3%)    | 6 (-14.3%)      |
| e64          | 8        | 8 (-0%)       | 8 (-0%)         |
| Total        | 220      | 205 (-6.8%)   | 206 (-6.3%)     |
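The totals in the last row can be recomputed from the per-circuit entries. The short script below is only a convenience sketch: the channel widths are copied from Table 1, and the reductions are computed relative to the Disjoint total.

```python
# Channel widths copied from Table 1: (Disjoint, Symmetric USB, Alternative USB).
widths = {
    "alu4": (10, 10, 10), "apex2": (12, 11, 11), "apex4": (13, 12, 13),
    "bigkey": (7, 7, 6), "clma": (13, 11, 12), "des": (8, 7, 7),
    "diffeq": (8, 7, 7), "dsip": (7, 7, 7), "elliptic": (11, 10, 10),
    "ex1010": (11, 10, 10), "ex5p": (14, 13, 13), "frisc": (13, 12, 12),
    "misex3": (11, 11, 11), "pdc": (17, 16, 16), "s298": (8, 7, 7),
    "s38417": (8, 7, 8), "s38584.1": (8, 8, 8), "seq": (12, 11, 11),
    "spla": (14, 14, 13), "tseng": (7, 6, 6), "e64": (8, 8, 8),
}

# Sum each column, then express the savings relative to the Disjoint column.
totals = [sum(column) for column in zip(*widths.values())]
for name, total in zip(("Disjoint", "Symmetric USB", "Alternative USB"), totals):
    saving = 100 * (totals[0] - total) / totals[0]
    print(f"{name}: {total} tracks, {saving:.1f}% fewer than Disjoint")
```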

## 5 Conclusions

We have addressed the open USB design problem for odd densities. We have provided an extreme decomposition theorem, reduced the USB design problem to the basic USB design problem, and outlined an inductive design scheme for designing the basic USBs. We have shown two types of switch blocks that are routable for all (k, W)-RRs. The first are (k, W)-USBs with a better approximation ratio but higher flexibility. The second uses one more track on each side, but has the minimum flexibility k-1. Our extensive experimental results further indicate that, under the same hardware cost, USBs improve global routability by over 6% in entire-chip routing of 2-D FPGAs.

**Fig. 3.** Structures of switch blocks.

## References

- [1] V. Betz and J. Rose. "A New Packing, Placement and Routing Tool for FPGA Research". Seventh International Workshop on Field-Programmable Logic and Applications (available for download from http://www.eecg.toronto.edu/~jayar/software.html), pages 213–222, 1997.
- [2] V. Betz, J. Rose, and A. Marquardt. Architecture and CAD for Deep-Submicron FPGAs. Kluwer Academic Publishers, Boston MA, 1999.
- [3] S. Brown, R. J. Francis, J. Rose, and Z. G. Vranesic. Field-Programmable Gate Arrays. Kluwer Academic Publishers, Boston MA, 1992.
- [4] Y. W. Chang, D. F. Wong, and C. K. Wong. "Universal switch modules for FPGA design". ACM Trans. on Design Automation of Electronic Systems, 1(1):80–101, January 1996.
- [5] H. Fan, J. Liu, and Y. L. Wu. "General Models for Optimum Arbitrary-Dimension FPGA Switch Box Designs". Proc. IEEE International Conference on Computer-Aided Design (ICCAD), Nov. 2000.
- [6] H. Fan, J. Liu, and Y. L. Wu. "Combinatorial routing analysis and design of universal switch blocks". Proceedings of the ASP-DAC 2001, 2001.
- [7] H. B. Fan, Y. L. Wu, and Y. W. Chang. "Comment on General Universal Switch Blocks". IEEE Trans. on Computers, 51(1):93–95, Jan. 2002.
- [8] G. G. Lemieux and D. M. Lewis. "Analytical Framework for Switch Block Design". 12th International Conference on Field Programmable Logic and Application, 2002.
- [9] E. C. Milner. "Basic wqo- and bqo-theory". Graphs and Order (Banff, 1984), NATO Adv. Sci. Inst. Ser. C: Math. Phys. Sci. 147, Reidel, Dordrecht-Boston, pages 487–502, 1985.
- [10] J. Rose and S. Brown. "Flexibility of interconnection structures for field-programmable gate arrays". IEEE J. Solid-State Circuits, 26(3):277–282, 1991.
- [11] M. Shyu, G. M. Wu, Y. D. Chang, and Y. W. Chang. "Generic Universal Switch Blocks". IEEE Trans. on Computers, pages 348–359, April 2000.
- [12] Y. L. Wu and M. Marek-Sadowska. "Routing for array type FPGAs". IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, 16(5):506–518, May 1997.
- [13] Y. L. Wu, S. Tsukiyama, and M. Marek-Sadowska. "Graph based analysis of 2-D FPGA routing". IEEE Trans. on Computer-Aided Design, 15(1):33–44, 1996.
- [14] S. Yang. "Logic Synthesis and Optimization Benchmarks, Version 3.0". Tech. Report, Microelectronics Center of North Carolina, 1991.