# **On Optimal Irregular Switch Box Designs**

Hongbing Fan<sup>1</sup>, Yu-Liang Wu<sup>2\*</sup>, Chak-Chung Cheung<sup>3</sup>, and Jiping Liu<sup>1</sup>

<sup>1</sup> The University of Lethbridge, Lethbridge, AB. Canada T1K 3M4 {fan, liu}@cs.uleth.ca

<sup>2</sup> The Chinese University of Hong Kong, Shatin, N.T., Hong Kong ylw@cse.cuhk.edu.hk

<sup>3</sup> Department of Computing, Imperial College London, United Kingdom rcheung@doc.ic.ac.uk

Abstract. In this paper, we develop a unified theory for analyzing optimal switch box design problems, particularly for the unsolved irregular cases, where different pin counts are allowed on different sides. The results drawn from our formulation, which is based on systems of linear Diophantine equations, turn out to be general. We prove that the divide-and-conquer (reduction) design methodology can also be applied to the irregular cases. Namely, an optimal arbitrarily large irregular or regular switch box can be obtained by combining small prime switch boxes, which greatly reduces the design complexity. We revise the known VPR router for our experiments and show that the design optimality of switch boxes does pay off.

Keywords. Configurable computing, on-chip network, FPGA, switch box

## 1 Introduction

A switch box (SB) consists of terminals (pins) and programmable switches, with each switch connecting two pins on different sides. A switch box is *regular* if all sides have the same number of pins; otherwise it is *irregular*. As the optimality of a switch box design has a crucial impact on the silicon cost and performance of FPGAs, this problem has been investigated extensively in recent years; see, for example, [3,4,5,7,12,13]. Chang et al. [5] started the study of the so-called optimal Universal Switch Block (USB) structure, defined as a switch box able to accommodate any 2-pin net routing requirement with the least number of switches. In [9,7,8], the so-called Hyper-Universal Switch Box (HUSB) was investigated to cover the general case of multi-pin routing. Although it has been shown that the optimal switch box design problem can be solved by divide-and-conquer (reduction) approaches [7,8], only regular switch boxes have been analyzed so far.

<sup>\*</sup> Research partially supported by a Hong Kong Government RGC Earmarked Grant, Ref. No. CUHK4236/01E, and Direct Grant CUHK2050244

J. Becker, M. Platzner, S. Vernalde (Eds.): FPL 2004, LNCS 3203, pp. 189–199, 2004.

<sup>©</sup> Springer-Verlag Berlin Heidelberg 2004



Fig. 1. Examples of irregular switch boxes.

Despite the surprising result suggesting that square switch boxes might be the best in terms of area and delay [10], Fig. 1 gives some scenarios where irregular switch boxes are efficient and desirable. Besides the classic crossbar structures with unequal numbers of input and output pins [11], recent technological advances have stirred interest in developing more general irregular switch boxes, which, for example, allow more customization flexibility for embedded FPGA cores in SoC designs. In [1], directional bias and non-uniform FPGA architectures were addressed experimentally. Directional bias refers to a difference in the number of tracks between horizontal and vertical channels, while non-uniformity refers to channel width variation between different channels of the same direction. In [10], rectangular switch blocks formed by a union of several aligned regular switch boxes [14] were studied. Irregular switch boxes can also be used in hierarchical FPGA architectures and in circuit-switching based reconfigurable on-chip networks whose I/O port densities differ from side to side. In all these examples, irregular switch boxes provide extra flexibility in designing on-chip networks with non-uniform channel densities.

As in the regular switch box design problem, the goal is to design optimal irregular switch boxes satisfying two specifications: 1) a shape specification, which includes the number of sides (dimension) and the number of terminals on each side (channel density), and 2) a routability specification, which is characterized by the set of routable routing cases.

We use  $(r_1, \ldots, r_k)$ -SB to denote a k-sided switch box with channel density  $r_i$ on side *i* for  $i = 1, \ldots, k$ . We are interested in designing a generic class of switch boxes determined by a *channel density ratio vector* **d** and a *residual vector* **c**, i.e., a group of  $(w\mathbf{d} + \mathbf{c})$ -SBs with all integer scales  $w \ge 1$ . In particular, when  $\mathbf{d} = (1, \ldots, 1)$  and  $\mathbf{c} = (0, \ldots, 0)$ , a  $(w\mathbf{d} + \mathbf{c})$ -SB is a regular switch box of k sides with w terminals on each side. We will show that a solution scheme for the generic switch box design problem can be used to design a specific irregular switch box.

In this paper, we first formulate routing requirements as nonnegative integer solutions of a system of linear Diophantine equations (SLDE), then apply the theory of SLDEs to find decompositions of routing requirements. Accordingly, a reduction design scheme for irregular switch box design is obtained, which generalizes the design scheme for regular switch boxes. In other words, an arbitrarily large irregular switch box can also be obtained by combining some small prime switch boxes. The VPR [2] router is used to compare the routability of different irregular switch boxes with a fixed channel density ratio. The large MCNC benchmark circuits are used in the experiments.

This paper is organized as follows. Terminology and switch box design problem are given in Section 2. Section 3 formulates a new modelling of routing requirements and develops decomposition theory of routing requirements by applying the theory of system of linear Diophantine equations. In Section 4, the generalized reduction design scheme for irregular switch boxes of arbitrary shapes is introduced. Two design examples for illustration and experimental results are presented in Sections 5 and 6, respectively. Conclusions are drawn in Section 7.

## 2 The Switch Box Design Problem

We model a switch box as a graph as in [7]. For an  $(r_1, \ldots, r_k)$ -SB, we denote the *j*-th terminal on side *i* by  $v_{i,j}$  for  $i = 1, \ldots, k, j = 1, \ldots, r_i$ . If there is a switch joining terminals  $v_{i,j}$  and  $v_{i',j'}$ , then we denote the switch by an edge  $v_{i,j}v_{i',j'}$ . Thus, an  $(r_1, \ldots, r_k)$ -SB corresponds to a *k*-partite simple graph with vertex partition  $(V_1, \ldots, V_k)$ , where  $V_i = \{v_{i,j} | j = 1, \ldots, r_i\}, i = 1, \ldots, k$ .

The disjoint union of two k-sided switch boxes  $G_1$  and  $G_2$  is a k-sided switch box with the *i*-th side being the union of the *i*-th sides of both  $G_1$  and  $G_2$ together with all switches of  $G_1$  and  $G_2$ , denoted by  $G_1 + G_2$ . The disjoint union of h copies of  $G_1$  is denoted by  $hG_1$ . As depicted in Fig.2, the (4, 3, 4, 3)-SB (c) is a disjoint union of (2, 1, 2, 1)-SB (a) and (2, 2, 2, 2)-SB (b).
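The disjoint union can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SwitchBox` class, the helper `make_box`, and the particular switches placed in the two small boxes are hypothetical (the switches of Fig. 2 are not reproduced here); only the shapes (2, 1, 2, 1), (2, 2, 2, 2) and (4, 3, 4, 3) come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchBox:
    """A k-sided switch box: sides[i] lists the terminals of side i+1."""
    sides: list
    switches: set = field(default_factory=set)  # each switch is a frozenset of two terminals

    def shape(self):
        return tuple(len(side) for side in self.sides)


def make_box(shape, switch_pairs):
    """Build a box whose terminals are labelled (i, j), mirroring v_{i,j}."""
    sides = [[(i, j) for j in range(1, r + 1)] for i, r in enumerate(shape, start=1)]
    return SwitchBox(sides, {frozenset(pair) for pair in switch_pairs})


def disjoint_union(g1, g2):
    """Return G1 + G2; terminals are tagged 0 or 1 so the two copies stay distinct."""
    if len(g1.sides) != len(g2.sides):
        raise ValueError("both switch boxes must have the same number of sides")
    sides = [[(0, t) for t in s1] + [(1, t) for t in s2]
             for s1, s2 in zip(g1.sides, g2.sides)]
    switches = ({frozenset((0, t) for t in e) for e in g1.switches}
                | {frozenset((1, t) for t in e) for e in g2.switches})
    return SwitchBox(sides, switches)


# Placeholder switches, chosen only so that the example has something to merge.
box_a = make_box((2, 1, 2, 1), [((1, 1), (2, 1)), ((3, 2), (4, 1))])
box_b = make_box((2, 2, 2, 2), [((1, 1), (3, 1)), ((2, 2), (4, 2))])
box_c = disjoint_union(box_a, box_b)
print(box_c.shape(), len(box_c.switches))
```

The union adds the side sizes componentwise and keeps every switch of both operands, so no switch ever joins a terminal of one operand to a terminal of the other.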



Fig. 2. An example of the disjoint union of two switch boxes.

A (signal) net for a k-sided switch box is a connection request on some terminals of the switch box. In our switch box design problems, a net only specifies the sides where its terminals are located; a router takes care of the exact terminal assignments as well as the switch connection assignments [5,7,8,13]. A net is said to be an m-pin net if it specifies m different sides; an m-pin net which specifies sides $i_1, \ldots, i_m$ is expressed as $\{i_1, \ldots, i_m\}$, which is a subset of $\{1, \ldots, k\}$. For example, a 3-pin net connecting three terminals on sides 1, 2 and 3 is represented by $\{1, 2, 3\}$. Sometimes only certain types of nets are considered in the switch box design; this set of types consists of some subsets of $\{1, \ldots, k\}$ and is called a *net pattern set* (over $\{1, \ldots, k\}$), denoted by $\mathcal{P}$. A net N in $\mathcal{P}$ is referred to as a $\mathcal{P}$-net. A net of size 1 (a singleton) does not need a switch in routing, but it is convenient when considering mathematical properties. Therefore, we always assume that any net pattern set $\mathcal{P}$ contains all singletons.

For example, the net pattern set $\mathcal{P} = \mathcal{P}_2 = \{N | N \subset \{1, \ldots, k\}, |N| \leq 2\}$ is used in the study of universal switch boxes [5], while $\mathcal{P} = \mathcal{P}_k = \{N | N \subset \{1, \ldots, k\}\}$ is used for hyperuniversal switch box designs [7].

A routing requirement (RR) for a switch box is a group of nets that need to be connected simultaneously through the switch box. Formally, a $\mathcal{P}$-net $(r_1, \ldots, r_k)$-RR is a collection of $\mathcal{P}$-nets $[N_1, \ldots, N_r]$ such that $N_j \in \mathcal{P}$ for $j = 1, \ldots, r$, and the number of $N_j$'s that specify side *i* is equal to $r_i$, i.e., $|\{j|i \in N_j\}| = r_i$ for $i = 1, \ldots, k$.

A feasible routing of a routing requirement in a switch box is an ON/OFF assignment of the switches such that all the nets of the routing requirement are connected (realized) simultaneously. A realization of a net is modelled as a tree with one vertex in each side specified by the net. Formally, it is defined as follows. Let G be an $(r_1, \ldots, r_k)$-SB with sides $V_i = \{v_{i,j} | j = 1, \ldots, r_i\}, i = 1, \ldots, k$. An $(r_1, \ldots, r_k)$-RR $R = [N_1, \ldots, N_m]$ is said to be routable in G if G contains m vertex-disjoint subtrees $L_1, \ldots, L_m$ such that for each $i = 1, \ldots, m$, $L_i$ has exactly one vertex in each of the sides specified by $N_i$, i.e., $|V(L_i) \cap V_j| = 1$ for each $j \in N_i$. We call $\{L_1, \ldots, L_m\}$ a feasible routing of R in G, and $L_i$ a feasible routing of $N_i$ in G. We note that if $N_i$ is a singleton, then its feasible routing consists only of a terminal, with no switch used. Therefore adding singletons to (or removing them from) a routing requirement does not change its routability.
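The definition above can be turned into a small checker. The sketch below is an assumption-laden illustration rather than a router: it only verifies a routing that is handed to it, and the names `is_tree`, `is_feasible_routing`, `side_of` and the three-terminal toy box are hypothetical.

```python
def is_tree(vertices, edges):
    """True if the graph (vertices, edges) is connected and acyclic."""
    if len(edges) != len(vertices) - 1:
        return False
    adjacency = {v: set() for v in vertices}
    for edge in edges:
        u, v = tuple(edge)
        if u not in adjacency or v not in adjacency:
            return False
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == set(vertices)


def is_feasible_routing(side_of, switches, nets, trees):
    """Check vertex-disjoint subtrees L_1..L_m against nets N_1..N_m.

    side_of maps a terminal to its side, switches is a set of frozensets,
    and each tree is a pair (vertex set, edge set).
    """
    if len(nets) != len(trees):
        return False
    used = set()
    for net, (vertices, edges) in zip(nets, trees):
        vertices = set(vertices)
        if vertices & used:                 # subtrees must be vertex disjoint
            return False
        used |= vertices
        if not set(edges) <= switches:      # only existing switches may be turned ON
            return False
        if not is_tree(vertices, edges):
            return False
        for side in net:                    # exactly one terminal on each specified side
            if sum(1 for v in vertices if side_of[v] == side) != 1:
                return False
    return True


# Toy (1, 1, 1)-SB with terminals a, b, c on sides 1, 2, 3 and switches ab, bc.
side_of = {"a": 1, "b": 2, "c": 3}
switches = {frozenset("ab"), frozenset("bc")}
routing_ok = is_feasible_routing(
    side_of, switches, [{1, 2, 3}],
    [({"a", "b", "c"}, {frozenset("ab"), frozenset("bc")})])
routing_bad = is_feasible_routing(
    side_of, switches, [{1, 2, 3}],
    [({"a", "b", "c"}, {frozenset("ab"), frozenset("ac")})])
print(routing_ok, routing_bad)
```

The second call fails because it uses a switch between a and c that the toy box does not contain. A singleton net passes with a one-vertex tree and no edges, matching the remark that singletons use no switch.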

Fig.3(a) shows a (4, 4, 4, 4)-SB, where each side has four terminals which are assigned unique track IDs (1 to 4). Fig.3(b) shows a (4, 4, 4, 4)-RR, which has seven nets:  $N_1 = \{1, 2\}, N_2 = \{1, 2, 4\}, N_3 = \{1, 4\}, N_4 = \{2, 3, 4\},$  $N_5 = \{1, 3\}, N_6 = \{2, 3\}, N_7 = \{3, 4\}$ . Net  $N_2$  is a 3-pin net, which requires two switches to connect its three terminals in sides 1, 2, and 4. Fig.3(c) shows a feasible routing for the routing requirement.
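As a quick check of the example, the side densities of the seven nets can be counted directly. The helper name `side_densities` is an illustrative assumption; the nets are the ones listed above.

```python
def side_densities(nets, k):
    """Number of nets that specify each side 1..k."""
    return tuple(sum(1 for net in nets if side in net) for side in range(1, k + 1))


nets = [{1, 2}, {1, 2, 4}, {1, 4}, {2, 3, 4}, {1, 3}, {2, 3}, {3, 4}]
densities = side_densities(nets, 4)
pin_counts = [len(net) for net in nets]
print(densities)
print(pin_counts)
```

Every side is specified by exactly four nets, so the collection is indeed a (4, 4, 4, 4)-RR, and the pin counts show that $N_2$ and $N_4$ are the two 3-pin nets.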



Fig. 3. An example of switch box, routing requirement and feasible routing.

An $(r_1, \ldots, r_k)$-SB *G* is said to be $\mathcal{P}$-universal if every $\mathcal{P}$-net $(r_1, \ldots, r_k)$-RR is routable in *G*, and an *optimal* $\mathcal{P}$-universal switch box is one with the least number of switches. The notion of $\mathcal{P}$-universality unifies the universal and hyperuniversal notions discussed in [5,7]: a $\mathcal{P}_2$-universal switch box is simply a universal switch box, while a $\mathcal{P}_k$-universal switch box is a hyperuniversal one.

**Generic switch box design problem:** Given k-dimensional nonnegative integer vectors  $\mathbf{d}$  and  $\mathbf{c}$  and a net pattern set  $\mathcal{P}$ , design an optimal  $\mathcal{P}$ -universal  $(w\mathbf{d} + \mathbf{c})$ -SB for every  $w \geq 1$ .

Our ultimate goal is to derive a general method to solve the generic switch box design problem. A solution scheme for a generic switch box design problem can be used to design a specific  $(r_1, \ldots, r_k)$ -SB. For a given vector  $(r_1, \ldots, r_k)$ , we can select proper  $\mathbf{d}, \mathbf{c}$  and  $w_0$  such that  $(w_0\mathbf{d} + \mathbf{c}) = (r_1, \ldots, r_k)$ , then a  $(w_0\mathbf{d} + \mathbf{c})$ -SB is an  $(r_1, \ldots, r_k)$ -SB.

## 3 Decomposition Theorems

Our design technique for generic switch boxes is based on the decomposition properties of routing requirements. We prove the general decomposition theorems by employing the routing requirement vectors and the theory of system of linear Diophantine equations.

Routing requirement vectors were first used to represent (w, w, w, w)-RRs in [5]. We modify the definition to fit our modelling of routing requirements as follows. For a 2-pin net (w, w, w, w)-RR R, let $n_{i,j}$ denote the number of nets $\{i, j\}$ in R, and let $n_i$ denote the number of singletons $\{i\}$ in R. We call the vector $(n_1, n_2, n_3, n_4, n_{1,2}, n_{1,3}, n_{1,4}, n_{2,3}, n_{2,4}, n_{3,4})$ the 2-pin net routing requirement vector of R. Clearly, a nonnegative integer vector is a routing requirement vector if and only if it satisfies the following system of equations.

$$\begin{cases} n_{1,2} + n_{1,3} + n_{1,4} + n_1 = w \\ n_{1,2} + n_{2,3} + n_{2,4} + n_2 = w \\ n_{1,3} + n_{2,3} + n_{3,4} + n_3 = w \\ n_{1,4} + n_{2,4} + n_{3,4} + n_4 = w \end{cases} \tag{1}$$

In general, for a given net pattern set $\mathcal{P} = \{S_1, \ldots, S_t\}$, a $\mathcal{P}$-net $(r_1, \ldots, r_k)$-RR $R = [N_1, \ldots, N_m]$ can be expressed by a vector $X = (x_1, \ldots, x_t)$, where $x_i$ is the number of nets in R equal to $S_i$, i.e., $x_i = |\{j|N_j = S_i\}|$. Such a vector is called a $\mathcal{P}$-net $(r_1, \ldots, r_k)$-RRV. A vector $X = (x_1, \ldots, x_t)$ is a $\mathcal{P}$-net $(r_1, \ldots, r_k)$-RRV if and only if it is a nonnegative integer solution of

$$AX^T = (r_1, \dots, r_k)^T, \tag{2}$$

where  $A = (a_{i,j})_{k \times t}$  is the incidence (characterization) matrix of  $\mathcal{P}$ . I.e.,  $a_{i,j} = 1$  if  $i \in S_j$ ; otherwise  $a_{i,j} = 0$ . Therefore, we can compute all routing requirements by finding all nonnegative integer solutions of equation (2).
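The incidence matrix and the routing requirement vector are easy to compute mechanically. The following sketch is illustrative only: the function names and the sample routing requirement are assumptions, and the pattern ordering (singletons first, then larger nets in lexicographic order) is chosen to match the ordering of the vector used for equation (1).

```python
from itertools import combinations


def net_pattern_set(k, max_pins):
    """All nonempty subsets of {1..k} with at most max_pins sides, smallest first."""
    return [frozenset(combo)
            for size in range(1, max_pins + 1)
            for combo in combinations(range(1, k + 1), size)]


def incidence_matrix(patterns, k):
    """a[i][j] = 1 if side i+1 belongs to pattern S_{j+1}, else 0."""
    return [[1 if side in s else 0 for s in patterns] for side in range(1, k + 1)]


def rrv(nets, patterns):
    """x_i = number of nets in the routing requirement equal to S_i."""
    return [sum(1 for net in nets if frozenset(net) == s) for s in patterns]


def densities(matrix, x):
    """Left-hand side of equation (2): the product A X^T."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) for row in matrix)


# P_2 over four sides reproduces the coefficient pattern of equation (1).
p2 = net_pattern_set(4, 2)
a2 = incidence_matrix(p2, 4)

# A hypothetical 2-pin net routing requirement, including singletons.
sample_rr = [{1, 2}, {1, 3}, {2, 4}, {3, 4}, {1}, {2}, {3}, {4}]
x = rrv(sample_rr, p2)
print(x)
print(densities(a2, x))

# P_3 over three sides, the hyperuniversal case with k = 3.
a3 = incidence_matrix(net_pattern_set(3, 3), 3)
print(a3)
```

Here the sample requirement is a (3, 3, 3, 3)-RR, so its vector solves equation (2) with right-hand side (3, 3, 3, 3).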

In mathematics, a linear system $AX^T = \mathbf{b}^T$ is called a system of linear Diophantine equations (SLDE) if the entries of A and $\mathbf{b}$ are integers and only nonnegative integer solutions are considered. If $\mathbf{b} = \mathbf{0}$, the system is homogeneous. SLDEs have been studied extensively. Let $X = (x_1, \ldots, x_t)$ and $X' = (x'_1, \ldots, x'_t)$ be two nonnegative integer solutions of an SLDE. Define $X \preceq X'$ if $x_i \leq x'_i$ for all $i = 1, \ldots, t$. A solution X of an SLDE is said to be a minimal solution if there is no other solution X'' satisfying $X'' \preceq X$, where for a homogeneous system only nonzero solutions are considered. It is known that the set of all minimal solutions is finite, and that any nonnegative integer solution of a homogeneous SLDE is a nonnegative integer linear combination of its minimal solutions (the set of which is called the *Hilbert basis*). We use $\mathcal{B}[S]$ to denote the set of all minimal solutions of an SLDE S. There are several known algorithms for computing the set of minimal solutions of an SLDE. Interested readers can consult Contejean and Devie [6].

Given nonnegative integer vectors $\mathbf{d} = (d_1, \ldots, d_k)$ and $\mathbf{c} = (c_1, \ldots, c_k)$, a $\mathcal{P}$-net $(w\mathbf{d}+\mathbf{c})$-RRV corresponds to a vector (X, w) that is a nonnegative integer solution of $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T$. There is a vector $(X', w') \in \mathcal{B}[(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T]$ such that $(X', w') \preceq (X, w)$. The difference (X, w) - (X', w') is a nonnegative integer solution of $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{0}^T$; thus it is a nonnegative integer linear combination of minimal solutions of that homogeneous system. Therefore, $(X, w) = (X', w') + \sum_{i=1}^m a_i(X_i, w_i)$, where the $(X_i, w_i)$ are the minimal solutions of $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{0}^T$ and the $a_i$ are nonnegative integers. In summary, we have the following theorem.

**Theorem 3.1 (The first decomposition theorem).** Let $\mathbf{d}$ and $\mathbf{c}$ be two k-dimensional nonnegative integer vectors and $\mathcal{P}$ be a net pattern set. Then any $\mathcal{P}$-net $(w_0\mathbf{d} + \mathbf{c})$-RRV can be expressed as a vector in $\mathcal{B}[(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T]$ plus a nonnegative integer linear combination of vectors in $\mathcal{B}[(A, -\mathbf{d}^T)(X, w)^T = \mathbf{0}^T]$, where A is the incidence matrix of $\mathcal{P}$.

**Theorem 3.2 (The second decomposition theorem).** Let $\mathbf{d}$ and $\mathbf{c}$ be two k-dimensional nonnegative integer vectors and $\mathcal{P}$ be a net pattern set. Then there exist an integer p > 0 and a finite set of nonnegative integers D satisfying the following property: for any $w \ge 1$, there is a $q_w \in D$ such that every $(w\mathbf{d} + \mathbf{c})$-RRV can be represented as a sum of one $(q_w\mathbf{d} + \mathbf{c})$-RRV and $\frac{w-q_w}{p}$ $(p\mathbf{d})$-RRVs. Consequently, if $U_0$ is a $\mathcal{P}$-universal $(p\mathbf{d})$-SB and $U_w$ is a $\mathcal{P}$-universal $(q_w\mathbf{d} + \mathbf{c})$-SB, then $U_w + \frac{w-q_w}{p}U_0$ is a $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB.

*Proof.* Due to page limitations, the proof is not included in this paper and is available upon request.  $\Box$ 

## 4 Generalized Reduction Design Scheme

The decomposition theorems described in the last section establish the following reduction design scheme for generic switch boxes with simple structure and reduced number of switches.

#### **Reduction Design Scheme for Generic Switch Boxes**

Given two k-dimensional nonnegative integer vectors  $\mathbf{d}$  and  $\mathbf{c}$  and a net pattern set  $\mathcal{P}$  with an incidence matrix A:

- **I.** Compute $\mathcal{B}[(A, -\mathbf{d}^T)(X, w)^T = \mathbf{0}^T]$ and $\mathcal{B}[(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T]$ using a Hilbert basis algorithm. Suppose $\mathcal{B}[(A, -\mathbf{d}^T)(X, w)^T = \mathbf{0}^T] = \{(X_1, w_1), \dots, (X_m, w_m)\}$ and $\mathcal{B}[(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T] = \{(X'_1, w'_1), \dots, (X'_l, w'_l)\}$.
- **II.** Use $S = \{w_1, \ldots, w_m\}$ and $S' = \{w'_1, \ldots, w'_l\}$ to compute an integer p and a set D satisfying the conditions of Theorem 3.2. Here p is bounded by the least common multiple of $w_1, \ldots, w_m$, but p could be much smaller, and $D \subset \{0, 1, \ldots, mp - m + \max\{w'_1, \ldots, w'_l\}\}$.
- **III.** Design a $\mathcal{P}$-universal $(p\mathbf{d})$-SB $U_0$ and set up a feasible routing table recording a feasible routing for every $(p\mathbf{d})$-RR in $U_0$. For each $r \in D$, design a $\mathcal{P}$-universal $(r\mathbf{d} + \mathbf{c})$-SB $U_r$ and set up the corresponding feasible routing table. We call $U_0$ and $U_r$ $(r \in D)$ prime switch boxes.
- **IV.** For any $w \ge 1$, construct a $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB as follows: if $w \in D$, then use the prime $(w\mathbf{d} + \mathbf{c})$-SB $U_w$; otherwise choose the minimum q such that $w - qp \in D$. The disjoint union of one $U_{w-qp}$ and q copies of $U_0$, i.e., $U_{w-qp} + qU_0$, is a $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB. We call it a compound switch box.

**Remark:** We note that if we only want to construct a $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB for a specific w, we only need to construct a $\mathcal{P}$-universal $(p\mathbf{d})$-SB $U_0$ and a $\mathcal{P}$-universal $(q_w\mathbf{d} + \mathbf{c})$-SB $U_{q_w}$. Then $U_{q_w} + \frac{w-q_w}{p}U_0$ is a $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB.
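Step IV amounts to a small search for the residual scale. The sketch below assumes p and D are already known; the function names are hypothetical, and the sample values p = 2, D = {1, 2}, d = (1, 1, 1), c = (0, 1, 2) are the ones reported for the first example of Section 5.

```python
def compound_plan(w, p, D):
    """Return (r, q) with r = w - q*p in D and q minimal, as in step IV."""
    if w in D:
        return w, 0
    q = 1
    while w - q * p >= 0:
        if w - q * p in D:
            return w - q * p, q
        q += 1
    raise ValueError(f"no q with w - q*p in D for w = {w}")


def compound_shape(w, d, c, p, D):
    """Shape of U_r + q U_0, which should equal w*d + c."""
    r, q = compound_plan(w, p, D)
    prime = tuple(r * di + ci for di, ci in zip(d, c))   # shape of U_r
    base = tuple(p * di for di in d)                     # shape of U_0
    shape = tuple(a + q * b for a, b in zip(prime, base))
    return r, q, shape


d, c, p, D = (1, 1, 1), (0, 1, 2), 2, {1, 2}
for w in (3, 4, 5):
    print(w, compound_shape(w, d, c, p, D))
```

For w = 4 the plan is $U_2 + U_0$ and for w = 5 it is $U_1 + 2U_0$; in both cases the side counts of the parts add up to $w\mathbf{d} + \mathbf{c}$.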

The reduction design scheme reduces the generic switch box design problem to its prime switch box design problems. Although there is still no known efficient method for designing optimal prime switch boxes, the degree of difficulty is greatly reduced by the much smaller sizes of the prime switch boxes. Moreover, as a complete switch box has a switch joining every pair of terminals from different sides, it is $\mathcal{P}$-universal for any $\mathcal{P}$. Therefore, if we simply let $U_0$ be the complete $(p\mathbf{d})$-SB $K_{p\mathbf{d}}$ and $U_r$ be the complete $(r\mathbf{d} + \mathbf{c})$-SB $K_{(r\mathbf{d}+\mathbf{c})}$, then $K_{(q_w\mathbf{d}+\mathbf{c})} + \frac{w-q_w}{p}K_{p\mathbf{d}}$ is a $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB with O(w) switches. Furthermore, the decomposition of a routing requirement can be done in polynomial time, and finding a feasible routing in a prime switch box can be done in constant time by looking up the routing table created for that prime switch box. Therefore, there is a polynomial time algorithm for finding a feasible routing in the compound switch box.

**Theorem 4.1.** For any given vectors $\mathbf{d}$, $\mathbf{c}$ and net pattern set $\mathcal{P}$, there is a $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB with O(w) switches for every $w \ge 1$, and an algorithm which finds a feasible routing for any $(w\mathbf{d} + \mathbf{c})$-RR in the switch box in time polynomial in w.
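The linear switch count can be illustrated numerically for compounds built from complete prime boxes. This is a sketch with hypothetical function names; the values d = (1, 1, 1), c = (0, 1, 2), p = 2 and the residual scales $q_w \in \{1, 2\}$ are taken from the first example of Section 5, and complete boxes are used only as an upper bound, not as optimal designs.

```python
from itertools import combinations


def complete_switch_count(shape):
    """Switches of a complete switch box: one per pair of terminals on different sides."""
    return sum(a * b for a, b in combinations(shape, 2))


def compound_complete_count(w, d, c, p, q_w):
    """Switch count of K_(q_w d + c) + ((w - q_w) / p) K_(p d)."""
    if (w - q_w) % p != 0:
        raise ValueError("w - q_w must be a multiple of p")
    copies = (w - q_w) // p
    prime = tuple(q_w * di + ci for di, ci in zip(d, c))
    base = tuple(p * di for di in d)
    return complete_switch_count(prime) + copies * complete_switch_count(base)


d, c, p = (1, 1, 1), (0, 1, 2), 2
for w in range(1, 9):
    q_w = 1 if w % 2 == 1 else 2
    shape = tuple(w * di + ci for di, ci in zip(d, c))
    print(w, compound_complete_count(w, d, c, p, q_w), complete_switch_count(shape))
```

Within each parity class the compound count grows by a constant 12 switches (one more copy of $K_{(2,2,2)}$) each time w increases by 2, whereas the complete $(w\mathbf{d} + \mathbf{c})$-SB grows quadratically.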

## 5 Two Examples of Irregular Switch Box Designs

In this section, we show how to design an optimal (4, 5, 6)-HUSB and an optimal (5, 6, 7)-HUSB using the reduction design scheme. The strategy consists of first choosing $\mathbf{d} = (1, 1, 1)$ and $\mathbf{c} = (0, 1, 2)$, then designing the generic (w, w + 1, w + 2)-HUSBs. The target switch boxes are the cases w = 4 and w = 5.

I. The net pattern set for 3-sided hyper-universal switch boxes is $\{\{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}$. The incidence matrix of the net pattern set is

$$A = \begin{pmatrix} 1 & 0 & 0 & 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{pmatrix}.$$

Fig. 4. Optimal (3, 4, 5)-HUSB and (5, 6, 7)-HUSB.

Fig. 5. Rectangular switch boxes.

By computing the set of minimal solutions of  $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{0}^T$ , we obtain

(1, 1, 1, 0, 0, 0, 0, 1), (0, 0, 0, 0, 0, 0, 1, 1), (1, 0, 0, 0, 0, 1, 0, 1), (0, 1, 0, 0, 1, 0, 0, 1), (0, 0, 0, 1, 1, 1, 0, 2), (0, 0, 1, 1, 0, 0, 0, 1)

By computing the set of minimal solutions of  $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T$  we obtain

(0, 1, 2, 0, 0, 0, 0, 0), (0, 0, 1, 0, 0, 1, 0, 0), (0, 0, 0, 0, 1, 2, 0, 1).
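For a system this small, the two sets of minimal solutions can be cross-checked by exhaustive search. The sketch below is not a Hilbert basis algorithm such as the one in [6]: it simply enumerates all vectors (X, w) with entries between 0 and an assumed bound of 3 and keeps those that dominate no other solution. Minimal solutions with larger entries, if any existed, would be missed by such a search. The function names are hypothetical.

```python
from itertools import product

# Incidence matrix A and vectors d, c of the (w, w+1, w+2)-HUSB example.
A = [
    [1, 0, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
]
d = (1, 1, 1)
c = (0, 1, 2)


def solutions(rhs, bound):
    """Nonzero vectors (X, w) with entries in 0..bound satisfying A X - w d = rhs."""
    found = []
    for v in product(range(bound + 1), repeat=len(A[0]) + 1):
        x, w = v[:-1], v[-1]
        if any(v) and all(
            sum(a * xi for a, xi in zip(row, x)) - w * di == ri
            for row, di, ri in zip(A, d, rhs)
        ):
            found.append(v)
    return found


def minimal(sols):
    """Solutions that do not dominate any other listed solution componentwise."""
    return {
        s for s in sols
        if not any(t != s and all(ti <= si for ti, si in zip(t, s)) for t in sols)
    }


hom_basis = minimal(solutions((0, 0, 0), bound=3))
inhom_basis = minimal(solutions(c, bound=3))
print(sorted(hom_basis))
print(sorted(inhom_basis))
```

Within the searched box, the enumeration returns the same six vectors for the homogeneous system and the same three vectors for the system with right-hand side $\mathbf{c}^T$ as listed above.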

II. Compute p and D of Theorem 3.2. We have p = 2 and $D = \{1, 2\}$. That is, any solution (X, w) of $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T$ with $w \ge 1$ can be expressed as $(X, w) = (X', w') + \sum_{i=1}^{(w-w')/2} (X_i, 2)$, where (X', w') is a solution of $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T$, the $(X_i, 2)$ are solutions of $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{0}^T$, and w' = 1 or 2 according to the parity of w.

III. Design an optimal $(2\mathbf{d})$-HUSB $U_0$, $(\mathbf{d} + \mathbf{c})$-HUSB $U_1$ and $(2\mathbf{d} + \mathbf{c})$-HUSB $U_2$; see Fig. 4(a), (b), (c).

IV. An optimal $(w\mathbf{d} + \mathbf{c})$-SB can be obtained by combining (w - w')/2 copies of $U_0$ and one $U_1$ or $U_2$, depending on the parity of w. In particular, $U_2 + U_0$ is an optimal (4, 5, 6)-HUSB, and $U_1 + 2U_0$ is an optimal (5, 6, 7)-HUSB. See Fig. 4(d) and (e).

The second example is the design of generic rectangular universal switch boxes with channel density ratio vector $\mathbf{d} = (1, 2, 1, 2)$ and residual vector $\mathbf{c} = (0, 0, 0, 0)$. Following the design scheme, we obtain p = 2 and $D = \{1, 2\}$. Since $\mathbf{c} = \mathbf{0}$, we only need to design two prime switch boxes, a (2, 4, 2, 4)-USB $U_0$ and a (1, 2, 1, 2)-USB $U_1$. Fig. 5(a) and (b) show the optimal designs of the prime switch boxes, which can be used to construct optimal (w, 2w, w, 2w)-USBs for all $w \ge 3$.

## 6 Experimental Results

In the experiment, we focus on a simple question: how much does routability in entire-chip routing differ between FPGAs adopting optimal irregular switch boxes and FPGAs adopting other, random but basically reasonable, irregular switch boxes?

**Table 1.** Comparison of VPR experimental results on the channel density w required for routing, between disjoint-like (w, 2w, w, 2w)-SBs and our optimal (w, 2w, w, 2w)-USBs.

| Circuit  | Disjoint-like | Optimal Design | Circuit  | Disjoint-like | Optimal Design |
|----------|---------------|----------------|----------|---------------|----------------|
| alu4     | 7             | 7              | ex5p     | 11            | 10             |
| apex2    | 8             | 8              | frisc    | 10            | 9              |
| apex4    | 10            | 9              | misex3   | 9             | 8              |
| bigkey   | 5             | 5              | s298     | 6             | 6              |
| clma     | 9             | 9              | s38417   | 6             | 5              |
| des      | 6             | 5              | s38584.1 | 6             | 6              |
| diffeq   | 6             | 6              | seq      | 9             | 8              |
| dsip     | 5             | 5              | spla     | 10            | 10             |
| elliptic | 10            | 9              | tseng    | 5             | 5              |
| ex1010   | 8             | 7              | e64      | 6             | 6              |
| Total    |               |                |          | 152           | 143 (-6.3%)    |
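The totals in Table 1 can be re-derived from the per-circuit entries. The snippet below is a plain arithmetic check; the dictionary names are illustrative, and the values are copied from the table.

```python
disjoint_like = {
    "alu4": 7, "apex2": 8, "apex4": 10, "bigkey": 5, "clma": 9,
    "des": 6, "diffeq": 6, "dsip": 5, "elliptic": 10, "ex1010": 8,
    "ex5p": 11, "frisc": 10, "misex3": 9, "s298": 6, "s38417": 6,
    "s38584.1": 6, "seq": 9, "spla": 10, "tseng": 5, "e64": 6,
}
optimal = {
    "alu4": 7, "apex2": 8, "apex4": 9, "bigkey": 5, "clma": 9,
    "des": 5, "diffeq": 6, "dsip": 5, "elliptic": 9, "ex1010": 7,
    "ex5p": 10, "frisc": 9, "misex3": 8, "s298": 6, "s38417": 5,
    "s38584.1": 6, "seq": 8, "spla": 10, "tseng": 5, "e64": 6,
}

total_disjoint = sum(disjoint_like.values())
total_optimal = sum(optimal.values())
saved = total_disjoint - total_optimal
improved = [name for name in optimal if optimal[name] < disjoint_like[name]]

print(total_disjoint, total_optimal, saved, len(improved))
print(round(100 * saved / total_optimal, 1), round(100 * saved / total_disjoint, 1))
```

The totals are 152 and 143, a difference of 9 tracks spread over 9 circuits that each improve by one track. The reported 6.3% corresponds to 9/143; measured against the disjoint-like total, the reduction is 9/152, about 5.9%.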

Direct experimental comparisons with other previous works are basically not available, since the result given in [1] was global routing only, and the switch density used in [10] is quite different from ours.

Here we give the experimental test for our (w, 2w, w, 2w)-USB designs. We revise the well considered, effective, and fair FPGA router VPR [2] and run large MCNC benchmark circuits for our routing experiments. The logic block structure for our VPR runs is set to consist of one 4-input LUT and one flip-flop. The input or output pin of the logic block is able to connect to any track in the adjacent channels, i.e.  $F_c = w$  (or 2w for wide sides). A reasonable switch design with the same switch count, which is an extension of the known disjoint-like (Fig. 5(c)) switch structure, is adopted for comparison.

Fig. 5(d) illustrates our proposed optimal S-box structure and its corresponding routing result. As shown in Table 1, the switch box design optimality does matter. FPGAs adopting the optimal switch box design can save 6% switch resources according to this experiment.

## 7 Conclusions

We presented a divide-and-conquer method for designing a wide range of irregular switch boxes. That is, an arbitrarily large optimal irregular switch box can be constructed by a simple disjoint union of some smaller prime switch boxes. To achieve this, we expressed a routing requirement as an integer vector satisfying a System of Linear Diophantine Equations (SLDE). By applying the theory of SLDE, we solved the generating problem of routing requirements and proved a general decomposition theorem, which established our reduction design scheme: first design a few prime switch boxes, then use them to build others. As a direct consequence, a switch box designed in this way has a linear number of switches and a linear time detailed routing algorithm.

Fig. 6. Routing result of e64 by using Optimal S-Box, w=6 on (w, 2w, w, 2w)-USB

## References

1. V. Betz and J. Rose. Directional bias and non-uniformity in FPGA global routing architectures. In *Proceedings of the IEEE/ACM International Conference on Computer-Aided Design*, pages 652–659, Washington, Nov. 10–14 1996. IEEE Computer Society Press.
2. V. Betz and J. Rose. A new packing, placement and routing tool for FPGA research. In *Seventh International Workshop on Field-Programmable Logic and Applications*, pages 213–222, 1997.
3. V. Betz, J. Rose, and A. Marquardt. *Architecture and CAD for Deep-Submicron FPGAs*. Kluwer-Academic Publisher, Boston MA, 1999.
4. S. Brown, R. Francis, J. Rose, and Z. Vranesic. *Field Programmable Gate Arrays*. Kluwer-Academic Publisher, Boston MA, 1992.
5. Y.-W. Chang, D. F. Wong, and C. K. Wong. Universal switch modules for FPGA design. *ACM Transactions on Design Automation of Electronic Systems*, 1(1):80–101, Jan. 1996.
6. E. Contejean and H. Devie. An efficient incremental algorithm for solving systems of linear diophantine equations. *Inform. and Comput.*, 113(1):143–172, 1994.
7. H. Fan, J. Liu, and Y. L. Wu. General models and a reduction design technique for FPGA switch box designs. *IEEE Transactions on Computers*, 52(1):21–30, Jan. 2003.
8. H. Fan, J. Liu, Y. L. Wu, and C. C. Cheung. On optimum switch box designs for 2-D FPGAs. In *Proceedings of the 2001 Design Automation Conference (DAC-01)*, pages 203–208, New York, June 18–22 2001. ACM Press.
9. H. Fan, J. Liu, Y. L. Wu, and C. K. Wong. Reduction design for generic universal switch blocks. *ACM Transactions on Design Automation of Electronic Systems*, 7(4):526–546, Oct. 2002.
10. P. Hallschmid and S. Wilton. Detailed routing architectures for embedded programmable logic IP cores. In *ACM/SIGDA International Symposium on Field-Programmable Gate Arrays*, pages 69–74, Monterey, CA, Feb. 2001.
11. S. Nakamura and G. M. Masson. Lower bounds on crosspoints in concentrators. *IEEE Transactions on Computers*, 31:1173–1179, 1982.
12. J. Rose and S. Brown. Flexibility of interconnection structures for field-programmable gate arrays. *IEEE Journal of Solid State Circuits*, 26(3):277–282, Mar. 1991.
13. M. Shyu, G. M. Wu, Y. D. Chang, and Y. W. Chang. Generic universal switch blocks. *IEEE Trans. on Computers*, pages 348–359, April 2000.
14. S. J. Wilton. *Architecture and Algorithms for Field-Programmable Gate Arrays with Embedded Memory*. PhD thesis, University of Toronto, 1997.