# The Exact Channel Density and Compound Design for Generic Universal Switch Blocks

HONGBING FAN, Wilfrid Laurier University
JIPING LIU, University of Lethbridge
YU-LIANG WU, The Chinese University of Hong Kong
and CHAK-CHUNG CHEUNG, Imperial College London

A switch block with k sides and W terminals on each side is said to be universal (a (k,W)-USB) if it is routable for every set of 2-pin nets of channel density at most W. The generic optimum universal switch block design problem is to design a (k,W)-USB with the minimum number of switches for every pair (k,W). This problem was first proposed and solved for k=4 in Chang et al. [1996], and then solved for even W or for $k\leq 6$ in Shyu et al. [2000] and Fan et al. [2002b]. No optimum (k,W)-USB is known for $k\geq 7$ and odd $W\geq 3$. However, it is already known that when W is a large odd number, a near-optimum (k,W)-USB can be obtained as a disjoint union of $(W-f_2(k))/2$ copies of the optimum (k,2)-USB and a noncompound $(k,f_2(k))$-USB, where the value of $f_2(k)$ is unknown for $k\geq 8$. In this article, we show that $f_2(k)=\frac{k+3-i}{3}$, where $1\leq i\leq 6$ and $i\equiv k\pmod 6$, and present an explicit design for the noncompound $(k,f_2(k))$-USB. Combining these two results, we obtain exact designs of (k,W)-USBs for all $k\geq 7$ and odd $W\geq 3$. The new (k,W)-USB designs also yield an efficient detailed routing algorithm.

Categories and Subject Descriptors: B.6.3 [Logic Design]: Design Aids—Switching theory; B.7.2 [Integrated Circuits]: Design Aids—Placement and routing

General Terms: Theory, Design

Additional Key Words and Phrases: FPGA architecture, routing algorithm, universal switch block

Dr. J. Liu passed away recently. This research was partially supported by the NSERC of Canada for H. Fan, and by a Hong Kong Government RGC Earmarked Grant Ref. No. 417106 and Direct Grant CUHK2050320/2050351 for Y.-L. Wu.

Authors' addresses: H. Fan, Department of Physics and Computer Science, Wilfrid Laurier University, Waterloo, ON, Canada N2L 3C5; email: hfan@wlu.ca; J. Liu, Department of Mathematics and Computer Science, University of Lethbridge, Lethbridge, AB, Canada T1K 3M4; Y.-L. Wu, Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong; email: ylw@cse.cuhk.edu.hk; and C.-C. Cheung, Department of Computing, Imperial College London, London, UK; email: rcheung@doc.ic.ac.uk.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from the Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@acm.org.

© 2007 ACM 1084-4309/2007/04-ART19 \$5.00 DOI 10.1145/1230800.1230811 http://doi.acm.org/10.1145/1230800.1230811

*ACM Transactions on Design Automation of Electronic Systems, Vol. 12, No. 2, Article 19, Publication date: April 2007.*

**ACM Reference Format:**
Fan, H., Liu, J., Wu, Y.-L., and Cheung, C.-C. 2007. The exact channel density and compound design for generic universal switch blocks. ACM Trans. Des. Autom. Electron. Syst. 12, 2, Article 19 (April 2007), 12 pages. DOI = 10.1145/1230800.1230811 http://doi.acm.org/10.1145/1230800.1230811

## 1. INTRODUCTION

Switch blocks (also called switch boxes) are critical reconfigurable components in field programmable gate arrays (FPGAs); they strongly affect the area, timing efficiency, and routability of FPGA chips. Many kinds of switch blocks have been designed and used in various FPGA architectures [Betz et al. 1999; Brown et al. 1992]. We consider a generic (k,W) switch block ((k,W)-SB for short), in which the terminals are grouped into k sides with W terminals on each side, and configurable switches connect pairs of terminals on different sides. Generic switch blocks have been investigated in Fan et al. [2002a] and Shyu et al. [2000], and the case k=4 was studied earlier for island-style FPGAs in Brown et al. [1992], Chang et al. [1996], Pan et al. [1998], Rose and Brown [1991], and Wu et al. [1996].

Routability and area efficiency are the two foremost issues in switch block design, and they are conflicting goals. An FPGA with complete switch blocks (having a switch between every pair of terminals on different sides) clearly has the highest routability, but also the lowest area efficiency, and it is impractical when the channel density is high. To balance these two goals, Rose and Brown [1991] introduced the concept of flexibility, denoted by $F_s$, which is the maximum number of switches incident to a terminal in a switch block. They investigated the effect of flexibility on entire-chip (global) routability and observed that (4,W)-SBs with $F_s = 3$ achieve sufficiently high global routability, an acceptable tradeoff between global routability and area efficiency. However, many different designs share the same flexibility, which motivates the design of optimal switch blocks with high routing capacity, small flexibility, and the minimum number of switches.
A (k,W)-SB is said to be universal (a (k,W)-USB) if it is routable for every set of 2-pin nets satisfying the routing constraint, that is, every set in which the number of nets on each side is at most W. The generic optimum USB design problem can be stated as follows.

*USB Design Problem.* For any given pair of positive integers k and W, design an optimum (k,W)-USB, that is, a (k,W)-USB with the minimum number of switches.

Chang et al. [1996] first proposed the concept of universal switch blocks and gave the first optimum (4,W)-USB, called the symmetric switch module, which has 6W switches and $F_s = 3$. The concept of a universal switch block, as well as the symmetric switch module, was generalized to $k \geq 5$ in Shyu et al. [2000]. It was rigorously proved in Fan et al. [2002a, 2002b] that the symmetric (k,W)-SB is an optimum (k,W)-USB when W = 1, when W is even, or when $k \leq 6$, and that the symmetric (k,W)-SB is not universal when $k \geq 7$ and $W (\geq 3)$ is odd. It was also proved that when W is a large odd integer, there exists a near-optimum (k,W)-USB that is a disjoint union of a nondecomposable $(k,f_2(k))$-USB and $(W-f_2(k))/2$ copies of the optimum (k,2)-USB, where $f_2(k)$ is the maximum channel density of a nondecomposable k-way routing requirement. However, the exact value of $f_2(k)$ was unknown, and there was no explicit method for designing an efficient noncompound $(k,f_2(k))$-USB. In this article, we solve both problems: we give the exact value of $f_2(k)$ and provide a simple design for the $(k,f_2(k))$-USB.

Fig. 1. Examples of (k,3)-SBs, (k,3)-RRs, and the corresponding detailed routings.

For completeness and to avoid ambiguity, we specify some terminology used in this article, although it can also be found in Fan et al. [2002a]. A net (i.e., a 2-pin net) specifies two sides of a switch block whose terminals are to be connected by a switch.
A detailed routing of a net is an assignment of a switch whose two ends lie on the sides indicated by the net. A (k,W)-routing requirement ((k,W)-RR for short) is a set of nets such that the number of nets that connect to each side is at most W. A detailed routing of a (k,W)-RR in a (k,W)-SB is an assignment of switches in the switch block such that each net in the routing requirement corresponds to a switch, and the switches corresponding to different nets share no terminal. For example, Figures 1(a), (b), and (c) depict a (6,3)-SB, a (6,3)-RR, and a detailed routing of that (6,3)-RR, respectively. Thus a (k,W)-SB is universal if it has a detailed routing for every (k,W)-RR.

The decomposition property of (k,W)-RRs plays an important role in the design of (k,W)-USBs. It was first given in Chang et al. [1996] for k=4 and then applied to general $k\geq 5$ in Shyu et al. [2000]; the decomposition theorem for general k was formally proved in Fan et al. [2002a]. For even W, the decomposition property can be stated as follows.

THEOREM 1.1. Any (k,2m)-RR can be decomposed into m (k,2)-RRs. A disjoint union of m copies of the optimum (k,2)-USB forms an optimum (k,2m)-USB.

For odd W, it was shown in Fan et al. [2002a] that any (k,W)-RR with $W\geq f_2(k)$ can be decomposed into a $(k,f_2(k))$-RR and $\frac{W-f_2(k)}{2}$ (k,2)-RRs, where $f_2(k)$ is the maximum integer w such that there is a nondecomposable (k,w)-RR. This implies that when W is odd and $W\leq f_2(k)$, a (k,W)-USB cannot be compound, that is, it must be connected. When W is odd and $W > f_2(k)$, a (k,W)-USB can be obtained by combining $\frac{W - f_2(k)}{2}$ copies of the optimum (k,2)-USB and one noncompound $(k,f_2(k))$-USB. It remained to determine the exact value of $f_2(k)$ and to design an efficient noncompound (k,W)-USB for odd W with $3 \le W \le f_2(k)$.
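The definition of a detailed routing above can be checked mechanically. The Python sketch below is illustrative only: the function name `is_detailed_routing`, the encoding of a terminal $v_{i,j}$ as the pair `(i, j)`, and the encoding of a switch as a pair of terminals are our own assumptions, not part of the original formulation.

```python
from collections import Counter

def is_detailed_routing(switches, nets, assignment):
    """Check that `assignment` is a detailed routing of `nets` in a switch block.

    switches:   set of switches, each a pair of terminals ((i, j), (i2, j2)),
                where (i, j) stands for terminal v_{i,j} (track j on side i).
    nets:       list of 2-pin nets, each a pair of sides (p, q).
    assignment: list of switches; assignment[n] is the switch used by nets[n].
    """
    if len(nets) != len(assignment):
        return False
    uses = Counter()  # how many assigned switches touch each terminal
    for (p, q), (t1, t2) in zip(nets, assignment):
        # the switch must exist in the block (in either orientation) ...
        if (t1, t2) not in switches and (t2, t1) not in switches:
            return False
        # ... and its two ends must lie on the sides named by the net
        if {t1[0], t2[0]} != {p, q}:
            return False
        uses[t1] += 1
        uses[t2] += 1
    # switches assigned to different nets must not share a terminal
    return all(c == 1 for c in uses.values())
```

For example, in U(3,1) (three sides, one track each, every pair of terminals joined), the single net {1,2} is routed by the switch between `(1, 1)` and `(2, 1)`, whereas routing {1,2} and {1,3} through the same terminal `(1, 1)` is rejected.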
In Section 2, we show that $f_2(k) = \frac{k+3-i}{3}$, where $i \equiv k \pmod{6}$ and $1 \le i \le 6$, by applying a factor theorem developed recently in graph theory. In Section 3, we present a new (k,W)-USB design for odd W with $3 \le W \le f_2(k)$ that uses far fewer switches than the complete (k,W)-USB but still retains high routing capacity. Moreover, we provide an efficient routing algorithm for the new USBs, whereas it is unknown whether an efficient detailed routing algorithm exists for an arbitrary given switch block. To see the impact of the local routing capacity of switch blocks on global routability, we performed experiments using VPR [Betz and Rose 1997] on widely used benchmark circuits with disjoint switch blocks, symmetric universal switch blocks [Chang et al. 1996], Wilton's switch blocks [Wilton 1997], alternative universal switch blocks [Fan et al. 2002a], our new USBs, and complete switch blocks. The experimental results are presented in Section 4, and conclusions are given in Section 5.

## 2. DECOMPOSITION THEOREM

The graph models for 2-pin routing requirements, switch blocks, and detailed routings were given in Fan et al. [2000, 2001]; we briefly describe them as follows. We label the sides of a (k,W)-SB by $1, 2, \ldots, k$, so that a 2-pin net can be represented as a size-two subset of $\{1, 2, \ldots, k\}$. For example, a net that connects two terminals on sides 1 and 2 is represented by $\{1, 2\}$. A (k,W)-RR is a collection (multiset) of size-two subsets (also called nets) of $\{1, 2, \ldots, k\}$ such that each $i \in \{1, 2, \ldots, k\}$ is contained in at most W subsets in the collection. A (k,W)-SB can be modeled as a graph: denote the jth terminal on side i by a vertex $v_{i,j}$, and a switch connecting $v_{i,j}$ and $v_{i',j'}$ by an edge $v_{i,j}v_{i',j'}$.
Thus, a (k,W)-SB corresponds to a k-partite graph G with vertex partition $(V_1, \ldots, V_k)$, where $V_i = \{v_{i,j} \mid j = 1, \ldots, W\}$, $i = 1, \ldots, k$. We also call such a graph a (k,W)-SB. A detailed routing of a net $\{i, j\}$ can be represented by an edge connecting a vertex in part $V_i$ and a vertex in part $V_j$, and a detailed routing of a (k,W)-RR in a switch block corresponds to a subgraph consisting of independent edges.

The verification of universal switch blocks can be simplified by using formalized routing requirements. First, when we add singletons (nets of size one)<sup>1</sup> to a (k,W)-RR so that each element appears exactly W times, the resulting (k,W)-RR is called a *balanced routing requirement* ((k,W)-BRR), or a k-way BRR (k-BRR) with density W. Second, we pair up singletons on different sides until no two different singletons are left; such a BRR is called a primitive BRR (PBRR). We note that a (k,r)-PBRR is an r-regular hypergraph on k vertices in which each edge has size one or two; it is a graph if all edges have size two. For convenience, we call such hypergraphs 2-graphs. Clearly, a (k,W)-SB is universal if and only if it has a detailed routing for every (k,W)-PBRR. A (k,W)-BRR is minimal, denoted by (k,W)-MBRR, if it does not contain a (k,W')-BRR with W' < W. Thus,

$$f_2(k) = \max\{W \mid \text{there exists a } (k, W)\text{-MBRR}\}.$$

In terms of graph theory, a (k,W)-MBRR is a factor-free W-regular 2-graph on k vertices, and $f_2(k)$ is the maximum degree over all nondecomposable (i.e., without a proper factor) regular 2-graphs on k vertices.

<sup>1</sup>A net of size one has no use in practice, but it is very convenient in our analysis because it lets us model a routing requirement as a regular hypergraph instead of a graph, so that the theory of regular graphs can be applied.
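The balancing and pairing steps that turn a (k,W)-RR into a (k,W)-PBRR can be sketched in Python as follows. This is a minimal sketch under our own assumptions: the name `to_pbrr` is hypothetical, nets are tuples of side numbers, and pairing the two sides with the largest remaining singleton deficits is simply one convenient choice; the text only requires that singletons on different sides be paired until at most one side still carries singletons.

```python
import heapq
from collections import Counter

def to_pbrr(k, W, nets):
    """Extend a (k, W)-RR to a (k, W)-PBRR (sides are numbered 1..k).

    Singletons are added so that every side occurs exactly W times; singletons on
    two different sides are then paired into a 2-pin net until only one side has
    singletons left.  Returns a list of nets as tuples of length 1 or 2.
    """
    deg = Counter()
    for p, q in nets:
        if p == q or not (1 <= p <= k and 1 <= q <= k):
            raise ValueError(f"invalid net {(p, q)}")
        deg[p] += 1
        deg[q] += 1
    if any(d > W for d in deg.values()):
        raise ValueError("not a (k, W)-RR: some side has more than W nets")
    # Min-heap on deg - W (i.e. a max-heap on the singleton deficit W - deg).
    heap = [(deg[s] - W, s) for s in range(1, k + 1) if deg[s] < W]
    heapq.heapify(heap)
    pbrr = [tuple(sorted(net)) for net in nets]
    while len(heap) >= 2:
        da, a = heapq.heappop(heap)   # side with the largest deficit
        db, b = heapq.heappop(heap)   # side with the next largest deficit
        pbrr.append((min(a, b), max(a, b)))  # pair one singleton from each side
        if da + 1 < 0:
            heapq.heappush(heap, (da + 1, a))
        if db + 1 < 0:
            heapq.heappush(heap, (db + 1, b))
    if heap:                          # leftover singletons, all on one side
        d, s = heap[0]
        pbrr.extend([(s,)] * (-d))
    return pbrr
```

For instance, `to_pbrr(3, 2, [(1, 2)])` adds the nets (1, 3) and (2, 3), giving a 2-regular graph on three vertices; with no nets and W = 3 on three sides, the result is 3-regular with a single leftover singleton.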
The following theorem from graph theory plays an important role in the determination of $f_2(k)$: a (k,W)-BRR R (which is a 2-graph in general) can be transformed into a W-regular graph G such that R is not minimal if and only if G has a proper regular factor. Because W is odd and every regular graph of even degree is 2-factorable, this is equivalent to G having a 2-factor.

THEOREM 2.1 [Fan et al. 2006]. A (2r+1)-regular graph G has no proper regular factor if and only if G has a 2-factor-free block that is incident to at least 2r+1 cut edges.

Theorem 2.1 leads to the determination of the function $f_2(k)$.

THEOREM 2.2. Let $k \geq 7$ be an integer. Then $f_2(k) = \frac{k+3-i}{3}$, where $1 \leq i \leq 6$ and $i \equiv k \pmod{6}$.

PROOF. By Fan et al. [2002a], we need only show that $f_2(k) \leq \frac{k+3-i}{3}$, where $1 \leq i \leq 6$ and $i \equiv k \pmod{6}$. Since a nondecomposable $f_2(k)$-regular 2-graph on k vertices plus an extra vertex contained in $f_2(k)$ singletons gives a nondecomposable $f_2(k)$-regular 2-graph on k+1 vertices, we have $3 \leq f_2(k) \leq f_2(k+1)$ when $k \geq 7$. Moreover, for $k \geq 7$, $f_2(k)$ must be odd. Let G be a nondecomposable $f_2(k)$-regular 2-graph on k vertices. We may assume that G has at most one vertex incident with singletons. We construct a nondecomposable $f_2(k)$-regular graph G' as follows. If G has no singleton, let G' = G. Otherwise, let x be the vertex incident with the singleton $\{x\}$, and let p be the multiplicity of $\{x\}$ in E(G). Then $1 \le p \le f_2(k)$, and we construct G' according to the following cases.

*Case 1.* $p = f_2(k)$. Then x is an isolated vertex. Let G' = G - x.

*Case 2.* p = 2m. We remove the p copies of $\{x\}$, add new vertices y and z, and add m copies of xy, m copies of xz, and $f_2(k) - m$ copies of yz. Let G' be the resulting graph.

*Case 3.* $p=2m+1 < f_2(k)$.
We remove the p copies of $\{x\}$, add new vertices y, z, and w, and add 2m+1 copies of the edge xy, $\frac{f_2(k)-2m-1}{2}$ copies each of yz and yw, and $\frac{f_2(k)+2m+1}{2}$ copies of zw. Let G' be the resulting graph.

It is readily seen that in each of the preceding cases, the resulting graph G' is $f_2(k)$-regular. Since G is nondecomposable, G' is nondecomposable, and G' has at most k+3 vertices. By Theorem 2.1, G' has a 2-factor-free component C which is incident with at least $f_2(k)$ cut edges. Each of these cut edges joins C with a component of G with at least three vertices. Then we have $3f_2(k) + |V(C)| \le |V(G')|$, and hence

$$f_2(k) \le \frac{|V(G')| - |V(C)|}{3} \le \frac{k+3-1}{3} = \frac{k+2}{3}. \tag{1}$$

Let k=6r+i, where $r\geq 1$, $1\leq i\leq 6$, and $i\equiv k\pmod 6$. By Eq. (1), $f_2(k)\leq \frac{k+2}{3}=\frac{6r+i+2}{3}=2r+1+\frac{i-1}{3}$. Since $\lfloor \frac{i-1}{3}\rfloor\leq 1$, we have $f_2(k)\leq 2r+2$, and since $f_2(k)$ is odd, it follows that $f_2(k)\leq 2r+1=\frac{k+3-i}{3}$. $\square$

As an immediate consequence of Theorem 2.2, we have the following exact decomposition theorem for (k,W)-BRRs with odd W.

THEOREM 2.3. Let $k \geq 7$ and $1 \leq i \leq 6$ with $i \equiv k \pmod{6}$, and let W be odd. Then the following statements hold.

1. There exists a (k,W)-MBRR if $1 \le W \le \frac{k+3-i}{3}$.
2. If $W \ge \frac{k+3-i}{3}$, then every (k,W)-BRR can be decomposed into a $(k,\frac{k+3-i}{3})$-BRR and $\frac{3W-k-3+i}{6}$ (k,2)-BRRs.

## 3. NEW (k, W)-USBs

By the previous decomposition theorem, when W is odd and $W > \frac{k+3-i}{3}$, the disjoint union of one $(k,\frac{k+3-i}{3})$-USB and $\frac{3W-k-3+i}{6}$ copies of the (k,2)-USB gives a (k,W)-USB. When W is odd and $W \leq \frac{k+3-i}{3}$, no (k,W)-USB is a disjoint union of smaller universal switch blocks. Therefore, for any fixed k, we need only design prime (k,r)-USBs for $r=3,5,\ldots,\frac{k+3-i}{3}$.
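The closed form of Theorem 2.2 and the counts in Theorem 2.3 are easy to evaluate. The Python sketch below uses hypothetical helper names (`residue`, `f2`, `decomposition`, `prime_densities`) that do not appear in the original; it computes i, the value $f_2(k)$, the number of (k,2)-BRRs in the decomposition of Theorem 2.3(2), and the odd densities r for which a prime (k,r)-USB is needed.

```python
def residue(k: int) -> int:
    """The i of Theorem 2.2: 1 <= i <= 6 and i = k (mod 6)."""
    return (k - 1) % 6 + 1

def f2(k: int) -> int:
    """f_2(k) = (k + 3 - i) / 3 for k >= 7 (Theorem 2.2)."""
    if k < 7:
        raise ValueError("Theorem 2.2 covers k >= 7 only")
    # k + 3 - i = 6r + 3 is always divisible by 3, so // is exact here
    return (k + 3 - residue(k)) // 3

def decomposition(k: int, W: int):
    """For odd W >= f_2(k), return (f_2(k), number of (k,2)-BRRs) per Theorem 2.3(2)."""
    f = f2(k)
    if W % 2 == 0 or W < f:
        raise ValueError("Theorem 2.3(2) needs odd W >= f_2(k)")
    n2 = (3 * W - k - 3 + residue(k)) // 6   # equals (W - f) // 2
    return f, n2

def prime_densities(k: int):
    """Odd r = 3, 5, ..., f_2(k) for which a prime (k, r)-USB must be designed."""
    return list(range(3, f2(k) + 1, 2))
```

For example, $f_2(k)=3$ for $7\le k\le 12$ and $f_2(13)=5$, so a (13,11)-BRR splits into one (13,5)-BRR and three (13,2)-BRRs, and for k = 19 prime USBs are needed for r = 3, 5, 7.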
For a fixed W, we need only design a (k,W)-USB if $W < \frac{k+3-i}{3}$, and a $(k, \frac{k+3-i}{3})$-USB if $W \geq \frac{k+3-i}{3}$.

Let U(k,1) be the complete graph on k vertices. Clearly, U(k,1) is routable for all (k,1)-RRs, and it is known that U(k,1) is an optimum (k,1)-USB with $\frac{k(k-1)}{2}$ switches. Let U(k,2) be the k-partite graph with vertex partition $(V_1, \ldots, V_k)$, $V_i = \{v_{i,1}, v_{i,2}\}$, $i = 1, \ldots, k$, and edge set

$$\bigcup_{1 \le i < j \le k} \{v_{i,p} v_{j,p+(j-i)-1} \mid p = 1, 2\},$$

where the second subscript index is taken to be 1 when it is odd and 2 when it is even. U(k,2) has k(k-1) switches, and it is known that U(k,2) is an optimum (k,2)-USB for every $k\geq 2$ [Fan et al. 2002a].

For $k\geq 7$ and odd r with $3\leq r\leq \frac{k+3-i}{3}$, no optimum (k,r)-USB is known. The complete (k,r)-USB $K_{k,r}$ was used as the prime (k,r)-USB in Fan et al. [2002a], which results in a (k,W)-USB with O(W) switches when k is fixed. In the following, we design a new (k,r)-USB that uses far fewer switches than the complete (k,r)-USB. Let $U_{k,r}$ be the k-partite graph with vertex set $V = V_1 \cup V_2 \cup \cdots \cup V_k$, where $V_i = \{v_{i,j} \mid j = 1, 2, \ldots, r\}$, and edge set

$$\bigcup_{1 \le i < i' \le k} \{v_{i,j} v_{i',j'} \mid 1 \le j, j' \le r,\ |j - j'| \le 1\}.$$

THEOREM 3.1. $U_{k,r}$ is a (k,r)-USB for any pair (k,r) with k>1 and r>1.

PROOF. $U_{k,r}$ is universal when r is even because it contains a disjoint union of r/2 copies of $K_{k,2}$. Therefore we assume that r is odd. Let R be any (k,r)-BRR; we show that R is routable in $U_{k,r}$. Let R' be the (k,r+1)-RR obtained from R by adding the singletons $\{1\},\ldots,\{k\}$. Since r+1 is even, by Theorem 1.1, R' can be decomposed into a disjoint union of m (k,2)-RRs, $R'=R'_1\cup R'_2\cup\cdots\cup R'_m$, where r+1=2m. By removing the k singletons $\{1\},\ldots,\{k\}$ from $R'_1,\ldots,R'_m$, we obtain $R_1,\ldots,R_m$, respectively.
Then R is a disjoint union of $R_1,\ldots,R_m$. Next we show that the elements of R can be ordered as $e_1,e_2,\ldots,e_s$ with the following property: for each $1\leq h\leq s-1$, if $G_h=(\{1,\ldots,k\},\{e_1,\ldots,e_h\})$ and $e_{h+1}=\{p,q\}$, then $|d_{G_h}(p)-d_{G_h}(q)|\leq 1$, where $d_{G_h}(p)$ denotes the degree of p in $G_h$. We prove this by showing, by induction on n, that $R_1\cup R_2\cup\cdots\cup R_n$ has the property. Clearly, any ordering of the elements of $R_1$ satisfies the property. Assume that the elements of $R_1 \cup R_2 \cup \cdots \cup R_{n-1}$ have an ordering $e_1, e_2, \ldots, e_t$ satisfying the property. We show that the elements of $R_n$ can be appended to $e_1, e_2, \ldots, e_t$ so that the resulting sequence has the property. Note that $G_t = (\{1, \ldots, k\}, R_1 \cup R_2 \cup \cdots \cup R_{n-1})$, so the degrees of the vertices of $G_t$ are either 2(n-1) or 2(n-1)-1. The degrees of the vertices of the 2-graph $(\{1,\ldots,k\},R_n)$ are either 1 or 2, because all vertices of the 2-graph $(\{1,\ldots,k\},R'_n)$ have degree 2 and $R_n$ is obtained from $R'_n$ by removing the singletons $\{1\}, \ldots, \{k\}$. For each component of $R_n$, we order the edges as follows. First, we list all edges whose two end-vertices both have degree 2(n-1)-1 in $G_t$. Second, for each maximal path of edges $l_1, l_2, \ldots, l_p$ whose two ends have different degrees in $G_t$, we order the edges as $l_1, l_3, \ldots, l_2, l_4, \ldots$. Third, we list the remaining edges arbitrarily. Finally, we list the singletons in $R_n$, if there are any. By construction, the resulting ordering of the edges of $(\{1,\ldots,k\},R_1\cup R_2\cup\cdots\cup R_n)$ satisfies the property. By induction, the elements of $R_1 \cup R_2 \cup \cdots \cup R_m$ can be ordered to satisfy the property. Now we can obtain a detailed routing of R in $U_{k,r}$ by routing the nets of R in the order $e_1, e_2, \ldots, e_s$, each using the first available switch. Indeed, when $e_{h+1}=\{p,q\}$ is routed, tracks $1,\ldots,d_{G_h}(p)$ on side p and tracks $1,\ldots,d_{G_h}(q)$ on side q are already in use; since $|d_{G_h}(p)-d_{G_h}(q)|\leq 1$, the next free terminals $v_{p,d_{G_h}(p)+1}$ and $v_{q,d_{G_h}(q)+1}$ are joined by a switch of $U_{k,r}$.
Therefore, $U_{k,r}$ is universal. $\square$

We note that the number of switches in $U_{k,r}$ is $\frac{k(k-1)}{2}(3r-2)$, while the number of switches in the complete $K_{k,r}$ is $\frac{k(k-1)}{2}r^2$. It is known [Shyu et al. 2000] that an optimum (k,r)-USB has at least $\frac{k(k-1)}{2}r$ switches. Therefore, the number of switches in $U_{k,r}$ is less than three times the number of switches in an optimum (k,r)-USB. A further advantage is that detailed routing in $U_{k,r}$ can be done efficiently: as shown in the proof of Theorem 3.1, we can order the nets in a routing requirement and then route them using the first available switches.

Next we use $U_{k,r}$ as a prime (k,r)-USB to construct large compound (k,W)-USBs. For odd $W \geq 3$ and $1 \le i \le 6$ with $i \equiv k \pmod 6$, define

$$U(k,W) = \begin{cases} U_{k,W}, & W \leq \frac{k+3-i}{3}, \\ U_{k,\frac{k+3-i}{3}} + \frac{W - \frac{k+3-i}{3}}{2}\, U(k,2), & W > \frac{k+3-i}{3}, \end{cases}$$

where + denotes disjoint union and the coefficient is the number of copies.

THEOREM 3.2. U(k,W) is a (k,W)-USB. The ratio of the number of switches in U(k,W) to the number of switches in an optimum (k,W)-USB is at most $1 + \frac{k+1-i}{W}$, which goes to 1 as W goes to $\infty$.

PROOF. Any (k,W)-PBRR R can be decomposed into a $(k,\frac{k+3-i}{3})$-PBRR and $\frac{W-\frac{k+3-i}{3}}{2}$ (k,2)-PBRRs; any $(k,\frac{k+3-i}{3})$-PBRR is detail-routable in $U_{k,\frac{k+3-i}{3}}$; and any (k,2)-PBRR is detail-routable in a (k,2)-USB. Hence R is routable in U(k,W), and U(k,W) is a (k,W)-USB. Since the number of switches in an optimum (k,W)-USB is at least $\frac{k(k-1)}{2}W$, we have

$$\frac{\text{number of switches in } U(k,W)}{\text{number of switches in an optimum } (k,W)\text{-USB}} \leq \frac{\frac{k(k-1)}{2}\left[W+3\left(\frac{k+3-i}{3}\right)-2\right]}{\frac{k(k-1)}{2}W} = 1 + \frac{k+1-i}{W}.$$

Clearly, this ratio approaches 1 when W is large.
$\square$

*The Routing Algorithm for U(k,W) with $W \ge \frac{k+3-i}{3}$.*

Input: A (k,W)-RR R.

**Step 1** Add singletons to R to obtain a (k,W)-BRR, still denoted by R. Let d = W and $\mathcal{F}_2 = \emptyset$.

**Step 2** Repeat the following steps until $d \leq \frac{k+3-i}{3}$.

**2.1** Apply the algorithm in Lovász and Plummer [1986] to find a 2-factor F of R, and set $\mathcal{F}_2 = \mathcal{F}_2 \cup \{F\}$.

**2.2** Set R = R - F and d = d - 2.

**Step 3** Detailed routing:

**3.1** If $R \neq \emptyset$, route R in $U_{k,\frac{k+3-i}{3}}$ according to the method used in the proof of Theorem 3.1 to obtain a detailed routing dR of R in U(k,W).

**3.2** Repeat until $\mathcal{F}_2 = \emptyset$: for $F \in \mathcal{F}_2$, route F in an unused U(k,2) in U(k,W), obtaining a detailed routing dF in U(k,W), and set $\mathcal{F}_2 = \mathcal{F}_2 - \{F\}$.

**Step 4** $dR \cup \bigcup_{F} dF$, where F ranges over all 2-factors found in Step 2, is a detailed routing of the input R in U(k,W).

The correctness of this algorithm follows from Theorems 2.3 and 3.2. The running time of the algorithm is polynomially bounded in terms of k and W, because a 2-factor of a graph can be found in polynomial time.

## 4. EXPERIMENTAL RESULTS FOR (4, W)-SBs

From a practical point of view, it is more important to evaluate the quality of a switch block design in terms of entire-chip routability. Since there is no known theoretical model for the entire-chip routing of FPGAs, most justification has been done through extensive routing experiments based on the original idea of Rose and Brown [1991]. Lemieux and Lewis [2003] proposed an analytical framework for entire-chip routing that uses the probabilistic model [Brown et al. 1992] and experiments to justify the entire-chip routability of FPGA routing structures. We adopt the well-known FPGA router VPR [Betz and Rose 1997] in our experiments.
The purpose of our experiments is to compare the performance of different types of switch blocks in terms of channel usage under the same placement and routing algorithm. We use the disjoint switch block and the complete switch block as references for the maximum and the minimum channel usage, respectively. The logic block for our VPR runs contains one four-input LUT and one flip-flop. All input and output pins of the logic block can connect to any track in their adjacent channels ($F_c = W$). We conducted experiments on 21 large benchmark circuits with 4-sided disjoint switch blocks (a disjoint union of U(4,1)s) [Brown et al. 1992], 4-sided symmetric universal switch blocks [Chang et al. 1996], Wilton's switch blocks [Wilton 1997], alternative universal switch blocks (AUSBs) [Fan et al. 2002a], the $U_{4,W}$, and the complete (4,W)-SB. The first four types of (4,W)-SB have different connection topologies but the same number of switches, 6W. The alternative (4,W)-USB is isomorphic to the symmetric (4,W)-USB but has a different connection style. The $U_{4,W}$ has 18W-12 switches, and the complete (4,W)-SB has $6W^2$ switches.

Table I shows our experimental results. Figure 2 shows a final layout with the alternative (4,W)-USB for the benchmark circuit e64, which confirms that the routing completes correctly. We observe that, compared with disjoint SBs, the three architectures with $F_s=3$ (symmetric USB, Wilton, and AUSB) use approximately 6% fewer tracks in the runs with 35 router iterations and about 3–6% fewer tracks in the runs with 100 router iterations. The $U_{4,W}$-based architecture ($F_s=9$) uses 17.51% fewer tracks with 35 router iterations and 14.14% fewer tracks with 100 router iterations. By contrast, the complete switch block uses 22.12% fewer tracks with 35 router iterations and 18.18% fewer tracks with 100 router iterations, which are the best results among all designs tested.
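The Reduction rows of Table I are percentage changes of each column total relative to the Disjoint total. The Python sketch below recomputes them from the totals reported in Table I; the dictionary names and the shorthand key `U4W` for $U_{4,W}$ are our own. For the 100-iteration $U_{4,W}$ column it gives $-28/198 \approx -14.14\%$.

```python
# Column totals from Table I (sum of channel widths over the 21 benchmarks).
TOTALS_35 = {"Disjoint": 217, "USB": 203, "Wilton": 202,
             "AUSB": 205, "U4W": 179, "Complete": 169}
TOTALS_100 = {"Disjoint": 198, "USB": 187, "Wilton": 192,
              "AUSB": 190, "U4W": 170, "Complete": 162}

def reductions(totals, baseline="Disjoint"):
    """Percentage change of each total relative to `baseline`, rounded to 2 decimals."""
    base = totals[baseline]
    return {name: round(100.0 * (t - base) / base, 2)
            for name, t in totals.items() if name != baseline}

for label, totals in (("35 iterations", TOTALS_35), ("100 iterations", TOTALS_100)):
    print(label, reductions(totals))
```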
These results provide evidence that USB-based FPGA designs can lead to better entire-chip routing results owing to their optimal local routability. The results for $U_{4,W}$ and the complete switch blocks show that increasing the routing capacity (the number of switches) of switch blocks can significantly improve global routability.

Table I. Channel Widths Required (by VPR router) for Different Benchmark Circuits and Switch Block Designs

*VPR with 35 iterations*

| Circuit | Disjoint | USB | Wilton | AUSB | $U_{4,W}$ | Complete |
|---|---|---|---|---|---|---|
| alu4 | 10 | 10 | 10 | 10 | 9 | 8 |
| apex2 | 12 | 11 | 11 | 11 | 10 | 9 |
| apex4 | 12 | 12 | 12 | 12 | 10 | 9 |
| bigkey | 7 | 6 | 7 | 6 | 6 | 6 |
| clma | 12 | 11 | 11 | 12 | 10 | 10 |
| des | 8 | 7 | 7 | 7 | 6 | 6 |
| diffeq | 8 | 7 | 7 | 8 | 6 | 6 |
| dsip | 7 | 7 | 7 | 7 | 6 | 6 |
| elliptic | 10 | 10 | 10 | 10 | 8 | 8 |
| ex1010 | 11 | 10 | 10 | 10 | 9 | 8 |
| ex5p | 14 | 14 | 13 | 14 | 12 | 11 |
| frisc | 13 | 12 | 11 | 12 | 10 | 10 |
| misex3 | 12 | 10 | 10 | 10 | 9 | 9 |
| pdc | 17 | 16 | 16 | 16 | 14 | 13 |
| s298 | 7 | 7 | 7 | 7 | 6 | 6 |
| s38417 | 8 | 7 | 7 | 7 | 7 | 6 |
| s38584.1 | 9 | 8 | 8 | 8 | 7 | 7 |
| seq | 12 | 11 | 11 | 11 | 10 | 9 |
| spla | 13 | 13 | 13 | 13 | 11 | 10 |
| tseng | 7 | 6 | 6 | 6 | 6 | 5 |
| e64 | 8 | 8 | 8 | 8 | 7 | 7 |
| Total | 217 | 203 | 202 | 205 | 179 | 169 |
| Reduction | | -6.45% | -6.91% | -5.53% | -17.51% | -22.12% |

*VPR with 100 iterations*

| Circuit | Disjoint | USB | Wilton | AUSB | $U_{4,W}$ | Complete |
|---|---|---|---|---|---|---|
| alu4 | 9 | 9 | 9 | 9 | 8 | 8 |
| apex2 | 11 | 10 | 10 | 10 | 9 | 9 |
| apex4 | 11 | 11 | 11 | 11 | 10 | 9 |
| bigkey | 6 | 6 | 6 | 6 | 6 | 5 |
| clma | 11 | 11 | 11 | 10 | 10 | 9 |
| des | 7 | 6 | 7 | 6 | 6 | 6 |
| diffeq | 7 | 7 | 7 | 7 | 6 | 6 |
| dsip | 7 | 6 | 7 | 6 | 6 | 6 |
| elliptic | 10 | 9 | 9 | 10 | 8 | 8 |
| ex1010 | 10 | 9 | 10 | 9 | 8 | 8 |
| ex5p | 12 | 12 | 13 | 12 | 11 | 10 |
| frisc | 12 | 11 | 11 | 11 | 10 | 9 |
| misex3 | 10 | 10 | 10 | 10 | 9 | 8 |
| pdc | 16 | 15 | 15 | 16 | 13 | 12 |
| s298 | 7 | 6 | 7 | 6 | 6 | 6 |
| s38417 | 7 | 7 | 7 | 7 | 6 | 6 |
| s38584.1 | 8 | 7 | 7 | 8 | 7 | 7 |
| seq | 11 | 10 | 10 | 10 | 9 | 9 |
| spla | 12 | 12 | 12 | 12 | 10 | 10 |
| tseng | 6 | 6 | 6 | 6 | 5 | 5 |
| e64 | 8 | 7 | 7 | 8 | 7 | 6 |
| Total | 198 | 187 | 192 | 190 | 170 | 162 |
| Reduction | | -5.5% | -3.0% | -4.0% | -14.14% | -18.18% |

Fig. 2. Routing result of e64 by using alternative USB S-box, W = 8.

## 5. CONCLUSIONS

It was previously known that the best compound (k,W)-USB for $k\geq 7$ and large odd W can be constructed as a disjoint union of a noncompound $(k,f_2(k))$-USB and $(W-f_2(k))/2$ copies of the optimum (k,2)-USB. In this article, we determined the exact value of $f_2(k)$, namely $f_2(k)=\frac{k+3-i}{3}$ where $1\leq i\leq 6$ and $i\equiv k\pmod 6$, and gave an efficient noncompound (k,r)-USB design $U_{k,r}$ for odd $r\leq f_2(k)$. Consequently, we obtained an explicit near-optimum (k,W)-USB for all $k\geq 7$ and odd $W\geq 3$, which essentially solves the generic USB design problem.

Our noncompound (k,r)-USB design $U_{k,r}$ has $\frac{k(k-1)}{2}(3r-2)$ switches. Although this exceeds the lower bound of $\frac{k(k-1)}{2}r$ switches for an optimum (k,r)-USB, $U_{k,r}$ has better routing properties: detailed routing can be done by first ordering the nets and then using the first available switch for each net.
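The following Python sketch illustrates the "first available switch" routing in $U_{k,r}$. It is a sketch under stated assumptions, not the authors' implementation: the names `build_U` and `route_first_available` are hypothetical, terminals are encoded as `(side, track)` pairs, and the nets are assumed to be already ordered with the degree-balance property from the proof of Theorem 3.1 (the ordering step itself is not implemented). Under that assumption, each 2-pin net takes the lowest free track on each of its two sides, and a singleton simply leaves one terminal idle.

```python
from itertools import combinations

def build_U(k, r):
    """Switch set of U_{k,r}: terminal (i, j) is track j on side i; a switch joins
    (i, j) and (i2, j2) for sides i < i2 whenever |j - j2| <= 1."""
    switches = set()
    for i, i2 in combinations(range(1, k + 1), 2):
        for j in range(1, r + 1):
            for j2 in range(max(1, j - 1), min(r, j + 1) + 1):
                switches.add(((i, j), (i2, j2)))
    return switches

def route_first_available(k, r, ordered_nets):
    """Route the nets of a (k, r)-BRR in the given order, taking on each side the
    lowest unused track.  Assumes the order has the degree-balance property;
    raises ValueError if it does not or if a side runs out of tracks."""
    switches = build_U(k, r)
    used = [0] * (k + 1)        # used[s] = number of tracks already taken on side s
    routing = []
    for net in ordered_nets:
        if len(net) == 1:       # a singleton just leaves one terminal idle
            (p,) = net
            used[p] += 1
            if used[p] > r:
                raise ValueError(f"side {p} exceeds {r} tracks")
            routing.append(((p, used[p]),))
            continue
        p, q = sorted(net)
        a, b = (p, used[p] + 1), (q, used[q] + 1)
        # |used[p] - used[q]| <= 1 guarantees that the switch (a, b) exists
        if a[1] > r or b[1] > r or (a, b) not in switches:
            raise ValueError(f"no free switch for net {net}")
        used[p] += 1
        used[q] += 1
        routing.append((a, b))
    return routing
```

As a consistency check, `len(build_U(4, W))` equals 18W-12, the switch count given for $U_{4,W}$ in Section 4. Routing the ordered (3,3)-BRR `[(1, 2), (2, 3), (1, 3), (1, 2), (3,)]` succeeds, while the order `[(1, 2), (1, 2), (1, 3)]` violates the degree-balance property at its third net and is rejected.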
We expect that this new type of (k,W)-SB design and routing scheme will find applications in the design of customized on-chip networks and customized FPGAs when high routing capacity is required. The decomposition Theorems 1.1 and 2.3 show the difference between even-density and odd-density routing requirements. Somewhat counterintuitively, the odd-density cases turn out to be much more complicated. This suggests that, in practice, designs with an odd number of tracks should be avoided.

## ACKNOWLEDGMENT

The authors wish to thank the anonymous reviewers for their constructive comments and suggestions.

## REFERENCES

- Betz, V. and Rose, J. 1997. A new packing, placement and routing tool for FPGA research. In *Proceedings of the 7th International Workshop on Field-Programmable Logic and Applications*. 213–222. http://www.eecg.toronto.edu/~jayar/software.html.
- Betz, V., Rose, J., and Marquardt, A. 1999. *Architecture and CAD for Deep-Submicron FPGAs*. Kluwer Academic, Boston, MA.
- Brown, S., Francis, R. J., Rose, J., and Vranesic, Z. G. 1992. *Field-Programmable Gate Arrays*. Kluwer Academic, Boston, MA.
- Chang, Y. W., Wong, D. F., and Wong, C. K. 1996. Universal switch models for FPGA. *ACM Trans. Des. Autom. Electron. Syst.* 1, 1 (Jan.), 80–101.
- Fan, H., Liu, G., and Liu, J. 2006. Minimal regular 2-graphs and applications. *Sci. China Series A Math.* 49, 158–172.
- Fan, H., Liu, J., and Wu, Y. L. 2000. General models for optimum arbitrary-dimension FPGA switch box designs. In *Proceedings of the IEEE International Conference on Computer-Aided Design (ICCAD)* (Nov.). 93–98.
- Fan, H., Liu, J., Wu, Y. L., and Cheung, C. C. 2001. On optimum switch box designs for 2-D FPGAs. In *Proceedings of the IEEE/ACM Design Automation Conference (DAC)* (Jun.). 203–208.
- Fan, H., Liu, J., Wu, Y. L., and Wong, C. K. 2002a. Reduction design for generic universal switch blocks. *ACM Trans. Des. Autom. Electron. Syst.* 7, 4 (Dec.), 526–546.
- Fan, H., Wu, Y. L., and Chang, Y. W. 2002b. Comment on general universal switch blocks. *IEEE Trans. Comput.* 51, 1 (Jan.), 93–95.
- Lemieux, G. and Lewis, D. 2003. *Design of Interconnection Networks for Programmable Logic*. Kluwer Academic, Boston, MA.
- Lovász, L. and Plummer, M. D. 1986. *Matching Theory*. Elsevier Science, New York.
- Pan, J. F., Wu, Y. L., Yan, G., and Wong, C. K. 1998. On the optimal four-way switch box routing structures of FPGA greedy routing architectures. *Integration VLSI J.* 25, 137–159.
- Rose, J. and Brown, S. 1991. Flexibility of interconnection structures for field-programmable gate arrays. *IEEE J. Solid-State Circ.* 26, 3, 277–282.
- Shyu, M., Wu, G. M., Chang, Y. D., and Chang, Y. W. 2000. Generic universal switch blocks. *IEEE Trans. Comput.* (Apr.), 348–359.
- Wilton, S. J. E. 1997. Architecture and algorithms for field-programmable gate arrays with embedded memory. Ph.D. thesis, University of Toronto.
- Wu, Y. L., Tsukiyama, S., and Marek-Sadowska, M. 1996. Graph based analysis of 2-D FPGA routing. *IEEE Trans. Comput.-Aided Des.* 15, 1, 33–44.

Received October 2003; revised March 2006; accepted September 2006