# Decomposition Design Theory and Methodology for Arbitrary-Shaped Switch Boxes

Hongbing Fan, Member, IEEE, Yu-Liang Wu, Member, IEEE, Ray Chak-Chung Cheung, Student Member, IEEE, and Jiping Liu

**Abstract**—We consider the optimal design problem for arbitrary-shaped switch box, $(r_1, \ldots, r_k)$-SB, in which $r_i$ terminals are located on side i for $i = 1, \dots, k$ and programmable switches are joining pairs of terminals from different sides. Previous investigations on switch box designs mainly focused on regular switch boxes in which all sides have the same number of terminals. By allowing different numbers of terminals on different sides, irregular switch boxes are more general and flexible for applications such as customized FPGAs and reconfigurable interconnection networks. The optimal switch box design problem is to design a switch box satisfying the given shape and routing capacity specifications with the minimum number of switches. We present a decomposition design method for a wide range of irregular switch boxes. The main idea of our method is to model a routing requirement as a nonnegative integer vector satisfying a system of linear equations and then derive a decomposition theory of routing requirements based on the theory of systems of linear Diophantine equations. The decomposition theory makes it possible to construct a large irregular switch box by combining small switch boxes of fixed sizes. Specifically, we can design a family of hyperuniversal (universal) $(w\mathbf{d} + \mathbf{c})$-SBs with $\Theta(w)$ switches, where $\mathbf{d}$ and $\mathbf{c}$ are constant vectors and w is a scalar. We illustrate the design method by designing a class of optimal hyperuniversal irregular 3-sided switch boxes and a class of optimal rectangular universal switch boxes. Experimental results on the rectangular universal switch boxes with the VPR router show that the optimal design of irregular switch boxes does pay off.

Index Terms—FPGA, reconfigurable interconnection network, switch box, switch block, universal, hyperuniversal.

• H. Fan is with the Department of Physics and Computer Science, Wilfrid Laurier University, Waterloo, ON Canada N2L 3C5. E-mail: hfan@wlu.ca.
• Y.-L. Wu is with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, NT Hong Kong. E-mail: ylw@cse.cuhk.edu.hk.
• R.C.C. Cheung is with the Department of Computing, Imperial College London, UK SW7 2AZ. E-mail: cccheung@ieee.org.
• J. Liu was with the Department of Mathematics and Computer Science, The University of Lethbridge, Lethbridge, AB Canada T1K 3M4.

Manuscript received 26 Aug. 2004; revised 24 May 2005; accepted 25 Aug. 2005; published online 22 Feb. 2006. For information on obtaining reprints of this article, please send e-mail to: tc@computer.org, and reference IEEECS Log Number TC-0276-0804.

## 1 INTRODUCTION

SWITCH boxes, also called switch blocks, are fundamental components in reconfigurable interconnection networks such as Field Programmable Gate Arrays (FPGAs). In general, a switch box consists of terminals (ports or pins) and prefabricated programmable switches that connect these terminals. A switch box is used to make physical connections for a given routing requirement by configuring its programmable switches. Usually, the terminals of a switch box are located on several sides and each switch joins a pair of terminals on different sides. A switch box is regular if all of its sides have the same number of terminals. A k-sided switch box with W terminals on each side is denoted by (k, W)-SB.

Extensive investigations on regular switch boxes have been carried out in recent years due to their usage in FPGA routing networks. Rose and Brown [13] pioneered the investigation on switch boxes, in particular, the (4, W)-SBs, which are the key switch components in island-style two-dimensional FPGA architectures [3], [4]. They observed that the flexibility, $F_s$, of a switch box, i.e., the maximum number of switches connecting a terminal, is an important factor for FPGA architecture design and showed experimentally that $F_s = 3$ or 4 is a good trade-off between the complexity of a switch box and the routability of the FPGAs. Chang et al. [5] proposed the concept of Universal Switch Block (USB). A switch box is said to be universal if it is able to accommodate any routing requirement consisting of 2-pin nets, where a 2-pin net is a request of connecting two terminals. They designed the first optimal (4, W)-USB, which has 6W switches and flexibility 3, and showed a significant improvement in routability over the disjoint switch boxes used in Xilinx 4000 series FPGAs. The generic universal (k, W)-SBs for $k \ge 5$ were further studied in [10], [8], [14]. Fan et al. [9], [11], [8] generalized the concept of universal switch boxes to Hyper-Universal Switch Boxes (HUSB) for multipin net routing cases. They designed (4, W)-HUSBs with at most 6.34W switches and showed a significant improvement on routability over (4, W)-USBs. The main results in [9], [11], [8] include a decomposition theory for routing requirements and a reduction design scheme that breaks the problem into a problem of designing a few small switch boxes and constructs a large switch box by combining the small ones.

Irregular switch boxes (different sides may have a different number of terminals), on the other hand, provide extra flexibility in designing reconfigurable interconnection networks. We write $(r_1, \ldots, r_k)$-SB for a k-sided switch box with $r_i$ terminals on side i for $i = 1, \ldots, k$. Fig. 1 illustrates four irregular switch boxes, where the black solid points on the edges of polygons represent terminals and a dashed line joining two points indicates a switch.

Fig. 1. Examples of irregular switch boxes. (a) (3, 2, 2)-SB. (b) (3, 2, 3, 2)-SB. (c) (2, 2, 3, 2, 3)-SB. (d) (2, 3, 2, 3, 2, 3)-SB.

Recent technological advances have stirred interest in developing more general irregular switch boxes, which, for example, can allow more flexibility in the design of customized FPGA cores in SoCs. Betz and Rose [1] studied the directional bias and nonuniform FPGA architectures experimentally. Hallschmid and Wilton [12] further investigated embedded FPGA cores with unequal vertical and horizontal channel densities in which rectangular switch boxes (i.e., $(r_1, r_2, r_1, r_2)$-SBs) are used. Dehon [7] used both regular and irregular 3-sided switch boxes in a tree of meshes (fat tree) interconnection network, as shown in Fig. 2a, where $d_1:d_2:d_3$ gives the ratios of channel densities of the 3-sided switch boxes. Fig. 2b depicts a scenario where irregular switch boxes might be desirable in a reconfigurable SoC. In the design of rearrangeable Polygonal Switching Network (PSN) on N terminals [15], a regular $(\sqrt{N}, \sqrt{N})$-USB is used at the center. However, when $\sqrt{N}$ is not an integer, an irregular universal switch box should be used.

Fig. 2. (a) Tree of meshes (fat tree) with 3-sided regular/irregular switch boxes. (b) Irregular switch boxes in a reconfigurable SoC.

Even though many results have been achieved for regular switch box designs, there are very few results on the design of irregular ones. The purpose of this paper is to present a general decomposition design method for arbitrary-shaped switch boxes, including both regular and irregular switch boxes. Our main contributions consist of a new algebraic model for routing requirements, a new decomposition theory for routing requirements, and a new reduction design scheme for switch boxes of arbitrary shape. As a result, the design of large irregular switch boxes can be decomposed into the designs of small switch boxes. In other words, we can design a few small prime switch boxes and use them to build large compound switch boxes.
Moreover, the switch boxes designed in this way have a linear number of switches and the routing can be done efficiently. These results are generalizations of the previous work on regular switch boxes [5], [9], [11]. We will also provide some conditions for obtaining optimal designs under this scheme. ## 2 TERMINOLOGY AND PRELIMINARIES Similarly to the regular switch box design problem, the design of an arbitrary-shaped switch box is to meet 1) a given channel density specification, i.e., the number of sides and the number of terminals on each side, 2) a given routing capacity specification, i.e., the type of routing requirements to be routed in the switch box, and 3) the minimization of the number of switches. Meanwhile, the switch boxes designed must not be too complicated to fabricate and must be easily routed. A channel density specification can be simply described by a vector $(r_1, \ldots, r_k)$ , referred to as a *channel density vector*, for a k-sided switch box with $r_i$ terminals on side i for $i=1,\ldots,k$ . To simplify the design problem, we are interested in designing a generic class of switch boxes with channel density vector $(w\mathbf{d}+\mathbf{c})$ , where $\mathbf{d}$ and $\mathbf{c}$ are constant nonnegative k-dimensional integer row vectors called the density ratio vector and residual vector, respectively. The corresponding switch boxes are the class of generic $(w\mathbf{d}+\mathbf{c})$ -SBs. For example, the 1:1:2 switch boxes used in the mesh tree structure (Fig. 2a) are generic irregular switch boxes with density ratio vector $\mathbf{d}=(1,1,2)$ and residual vector $\mathbf{c}=(0,0,0)$ . In particular, regular (k,w)-SBs are generic $(w\mathbf{d}+\mathbf{c})$ -SBs with $\mathbf{d}=(1,\ldots,1)$ and $\mathbf{c}=(0,\ldots,0)$ . The routing capacity of a switch box can be characterized by a set of routable routing requirements. 
If, in an application, only a few types of connections can occur, then there is no need to design a switch box capable of realizing all kinds of connections. Under this consideration, we will use a set of nets that are to be routed, called the *net pattern set* $\mathcal{P}$ , to describe a routing capacity specification. A switch box is said to be $\mathcal{P}$ -universal if it is routable for all (legitimate) routing requirements consisting of nets from $\mathcal{P}$ . In this way, we can unify all previous routing capacity specifications. For instance, a universal switch box is just $\mathcal{P}_2$ -universal, where $\mathcal{P}_2 = \{N: N \subset \{1, \dots, k\}, |N| \leq 2\}$ , while a hyperuniversal switch box is $\mathcal{P}_k$ -universal, where $\mathcal{P}_k = \{N: N \subset \{1, \dots, k\}\}$ . There are three challenging subproblems in the switch box design. The first problem is how to compute the set of all legitimate routing requirements subject to a given channel density specification $(r_1, \ldots, r_k)$ and a net pattern set $\mathcal{P} = \{S_1, \dots, S_t\}$ . This problem was not fully solved, even for regular switch box designs [11]. In this paper, we will present a general method to solve this problem. We represent a routing requirement as a nonnegative integer vector $X = (x_1, \dots, x_t)$ satisfying the system of linear equations $AX^T = (r_1 \dots, r_k)^T$ , where A is a matrix determined by $\mathcal{P}$ , and $(r_1, \dots, r_k)^T$ denotes the transpose of $(r_1, \ldots, r_k)$ . With this modeling, we are able to compute all routing requirements by solving the system, where only nonnegative integer solutions are considered. It is known that there are a finite number of "minimal" solutions to such a system of equations, which can be computed by some existing algorithms. 
It is also known that any solution to $AX^T = (r_1, \dots, r_k)^T$ can be expressed as a sum of a minimal solution of $AX^T = (r_1, \dots, r_k)^T$ and a nonnegative integer linear combination of minimal solutions of $AX^{T} = \mathbf{0}$ . In other words, a routing requirement can be decomposed into a few minimal routing requirements which depend on k and $\mathcal{P}$ only. The second problem is how to construct an optimal (with the minimum number of switches) $\mathcal{P}$ -universal (arbitrary-shaped) switch box. This is a difficult problem. This problem was only previously solved for a few special cases, e.g., (k, W)-USBs for even W [5], [10]. Using the decomposition property of routing requirements, we show that a large $\mathcal{P}$ -universal switch box can be obtained by combining a finite number of fixed small $\mathcal{P}$ -universal switch boxes (called prime switch boxes). A switch box obtained in this way is called a *compound* switch box and it can be optimal under certain conditions. Thus, we are able to reduce the design of large switch boxes to the design of much smaller ones, hence reducing the design complexity. For fixed integer vectors $\mathbf{d}$ and $\mathbf{c}$ , our compound hyperuniversal $(w\mathbf{d}+\mathbf{c})\text{-}\mathrm{SB}$ can have $\Theta(w)$ switches, which is of the same complexity as an optimal hyperuniversal $(w\mathbf{d}+c)\text{-}\mathrm{SB}$ . Compared to the trivial design of a complete $(w\mathbf{d}+\mathbf{c})\text{-}\mathrm{SB}$ that has $\Theta(w^2)$ switches, our approach is clearly more useful. The third problem is to find a feasible routing in the designed switch box for any given routing requirement. This problem has not been solved for general switch boxes. But, for our compound switch boxes, a linear time routing algorithm can be derived. 
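
To make the $\Theta(w^2)$ figure for the trivial complete design concrete, the following minimal sketch counts the switches of a complete switch box, assuming (as in the abstract) that a complete switch box joins every pair of terminals lying on different sides. The helper names `generic_density` and `complete_switch_count` are illustrative and do not come from the paper.

```python
from itertools import combinations

def generic_density(w, d, c):
    """Channel density vector w*d + c for a density ratio vector d and residual vector c."""
    return tuple(w * di + ci for di, ci in zip(d, c))

def complete_switch_count(density):
    """Number of switches when every pair of terminals on different sides is joined.

    Assumption: one switch per unordered pair of terminals on distinct sides,
    i.e., the sum of r_i * r_j over all pairs of sides i < j.
    """
    return sum(ri * rj for ri, rj in combinations(density, 2))

# The (w, w+1, w+2) family corresponds to d = (1, 1, 1), c = (0, 1, 2).
for w in (4, 8, 16):
    shape = generic_density(w, (1, 1, 1), (0, 1, 2))
    print(w, shape, complete_switch_count(shape))
```

For this family the count is $3w^2 + 6w + 2$, so doubling w roughly quadruples the number of switches, whereas a compound design grows only linearly in w.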
We will illustrate our decomposition design method by designing optimal hyperuniversal (4,5,6)-SB and (5,6,7)-SB through the design of optimal generic hyperuniversal (w,w+1,w+2)-SBs. We further design optimal rectangular universal (w,2w,w,2w)-SBs and test their routability with large MCNC benchmark circuits using the extended VPR [2] router. The experimental results demonstrate that the optimal switch boxes have better performance. The rest of this paper is organized as follows: Section 3 describes the graph modeling for arbitrary-shaped switch boxes, an algebraic modeling for routing requirements, as well as the general switch box design problem. In Section 4, we show how to compute routing requirements by solving the corresponding linear equation systems and prove two decomposition theorems which form the bases for our switch box design technique. In Section 5, the generalized reduction design scheme for switch boxes of arbitrary shape is introduced. Section 6 gives the design of a class of rectangular switch boxes with experimental results on routability by the VPR. Section 7 concludes the paper. ## 3 Modeling for Arbitrary-Shaped Switch Box Design In this section, we formally model a routing requirement as a nonnegative integer vector satisfying a system of linear equations and a switch box as a graph with terminals as vertices and switches as edges and a feasible routing for a routing requirement as a spanning forest of the switch box graph. The graph modeling of switch boxes was introduced in [11] and the vector representation for a routing requirement is a generalization of the routing requirement vector (RRV) used in [5], [14]. ## 3.1 Graph Modeling of Switch Boxes As in [11], we view a k-sided switch box as a k-partite graph. For an $(r_1,\ldots,r_k)$ -SB, denote the jth terminal on side i by a vertex $v_{i,j},\ i=1,\ldots,k, j=1,\ldots,r_i$ , and a switch joining terminals $v_{i,j}$ and $v_{i',j'}$ by an edge $v_{i,j}v_{i',j'}$ . 
Let $V_i=\{v_{i,j}:j=1,\ldots,r_i\}$, $i=1,\ldots,k$; then $(V_1,\ldots,V_k)$ forms a partition of the vertex set. A complete $(r_1,\ldots,r_k)$-SB, denoted by $K_{(r_1,\ldots,r_k)}$, corresponds to the complete k-partite graph with vertex set $V_1\cup\cdots\cup V_k$ and edge set $\{v_{i,j}v_{s,t}: i,s=1,\ldots,k;\ j=1,\ldots,r_i;\ t=1,\ldots,r_s;\ i\neq s\}$.

The *disjoint union* of two k-sided switch boxes $G_1$ and $G_2$ with disjoint sets of terminals, denoted by $G_1 + G_2$, is defined to be a k-sided switch box with the ith side being the union of the ith side of $G_1$ and the ith side of $G_2$ for $i = 1, \dots, k$, and all switches of $G_1$ and $G_2$. The disjoint union of h copies of $G_1$ (the terminals of each copy are considered to be different) is denoted by $hG_1$. As depicted in Fig. 3, the (4,5,6)-SB $U_2 + U_0$ in (d) is the disjoint union of the (2,3,4)-SB $U_2$ in (c) and the (2,2,2)-SB $U_0$ in (a). The (5,6,7)-SB $U_1 + 2U_0$ in (e) is the disjoint union of the (1,2,3)-SB $U_1$ in (b) and two copies of $U_0$.

Fig. 3. Demonstration for arbitrary-shaped switch boxes, disjoint unions, routing requirements, and feasible routings. (a) (2, 2, 2)-SB. (b) (1, 2, 3)-SB. (c) (2, 3, 4)-SB. (d) (4, 5, 6)-SB. (e) (5, 6, 7)-SB. (f) (4, 5, 6)-RR. (g) Feasible routing R in $U_2 + U_0$. (h) Feasible routing of R1 in $U_2$. (i) Feasible routing of R2 in $U_0$.

## 3.2 Algebraic Modeling of Routing Requirements

A switch box is used to route signals from different channels specified by routing requirements consisting of a set of nets. A net specifies a group of sides which a signal can reach, and a routing of the net determines the exact terminals and switches (set to ON) that carry the signal. Therefore, a net, or a t-pin net, for a k-sided switch box, which specifies t different sides $i_1, \ldots, i_t$ from which t terminals must be connected, can be expressed as $\{i_1,\ldots,i_t\}$, which is a subset of $\{1,\ldots,k\}$.
Thus, a net pattern set $\mathcal{P}$ can be expressed as a set of net types, i.e., subsets of $\{1,\ldots,k\}$, that are allowed to occur in a routing requirement. A routing requirement can be expressed as a multiset of subsets from a net pattern set because two different nets have the same subset representation if they specify the same group of sides. Formally, a routing requirement can be defined as follows:

**Definition 1.** Let $(r_1, \ldots, r_k)$ be a channel density vector and $\mathcal{P}$ be a net pattern set, where $\mathcal{P}$ is a set of subsets of $\{1, \ldots, k\}$. A routing requirement for an $(r_1, \ldots, r_k)$-SB with nets coming from $\mathcal{P}$, or simply a $\mathcal{P}$-net $(r_1,\ldots,r_k)$-RR, is a multiset $\{N_1,\ldots,N_m\}$ such that

1. $N_j \in \mathcal{P}$ for $j = 1, \ldots, m$ and
2. $|\{j : i \in N_j, j = 1, \ldots, m\}| \le r_i$, $i = 1, \ldots, k$.

Condition 1 indicates that every net in a routing requirement is from $\mathcal{P}$, i.e., a $\mathcal{P}$-net, and Condition 2 means that a routing requirement must be legitimate, i.e., it is subject to the channel density constraint: The number of nets specifying a side cannot be bigger than the channel density of the side. Without loss of generality, we may assume that $\mathcal{P}$ contains the singletons $\{i\}$ for $i = 1, \ldots, k$ and that equalities hold in Condition 2, since unused terminals on side i can be accounted for by adding singleton nets $\{i\}$. That is:

2'. $|\{j: i \in N_j, j = 1, \dots, m\}| = r_i$, $i = 1, \dots, k$.

In order to compute all $\mathcal{P}$-net $(r_1, \ldots, r_k)$-RRs for the purpose of testing a switch box design, we introduce the following algebraic modeling for routing requirements:

**Definition 2.** Let $\mathcal{P} = \{S_1, \dots, S_t\}$ be a net pattern set. For any $\mathcal{P}$-net $(r_1,\ldots,r_k)$-RR, $R=\{N_1,\ldots,N_m\}$, let $x_i$ be the number of occurrences of $S_i$ in R for $i = 1, \ldots, t$. We call the vector $(x_1, \ldots, x_t)$ the $\mathcal{P}$-net routing requirement vector of R, and abbreviate it as $\mathcal{P}$-net $(r_1, \ldots, r_k)$-RRV.
Equivalently, a nonnegative integer vector X = $(x_1,\ldots,x_t)$ is a $\mathcal{P}$ -net $(r_1,\ldots,r_k)$ -RRV iff it satisfies $$AX^{T} = (r_1, \dots, r_k)^{T}, \tag{1}$$ where A is the incidence matrix of $\mathcal{P}$ , i.e., $A = [a_{i,j}]_{k \times t}$ and for $i = 1, \ldots, k, j = 1, \ldots, t, \ a_{i,j} = 1 \text{ iff } i \in S_j.$ With the above modeling, we can generate all $\mathcal{P}$ -net $(r_1, \ldots, r_k)$ -RRs by calculating all nonnegative integer solutions of (1). For example, given net pattern set $$\mathcal{P} = \{\{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\},\$$ the incidence matrix of $\mathcal{P}$ is $$A = \begin{pmatrix} 1 & 0 & 0 & 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{pmatrix}.$$ Fig. 3f depicts a $\mathcal{P}$ -net (4,5,6)-RR R. The subset representation of R is $$R = \{\{1,2\},\{1,3\},\{1,3\},\{2,3\},\{2,3\},\{2,3\},\{1,2,3\}\}.$$ The vector representation of R is X = (0, 0, 0, 1, 2, 3, 1), which satisfies $AX^T = (4, 5, 6)^T$ . ## 3.3 Feasible Routing A feasible routing of a routing requirement in a switch box is an ON/OFF assignment of the switches such that every net in the routing requirement is realized by a group of assigned terminals and ON-switches connecting the terminals and the realizations of different nets are not connected. We extend the modeling of feasible routing in [11] to arbitrary-shaped switch boxes as follows: **Definition 3.** Let G be an $(r_1, \ldots, r_k)$ -SB with sides $V_i = \{v_{i,j} : j = 1, \ldots, r_i\}$ , $i = 1, \ldots, k$ and let $R = \{N_1, \ldots, N_m\}$ be an $(r_1, \ldots, r_k)$ -RR. We say that R is routable in G if G contains m vertex disjoint subtrees $T_1, \ldots, T_m$ , i.e., a forest, such that, for each $i = 1, \ldots, m$ and $j \in N_i$ , $|V(T_i) \cap V_j| = 1$ . We call $\{T_1, \ldots, T_m\}$ a feasible routing for R and $T_i$ a feasible routing for net $N_i$ . 
An $(r_1, \ldots, r_k)$ -SB G is said to be $\mathcal{P}$ -universal if every $\mathcal{P}$ -net $(r_1, \ldots, r_k)$ -RR is routable in G. A $\mathcal{P}$ -universal $(r_1, \ldots, r_k)$ -SB is optimal if it has the minimum number of switches over all $\mathcal{P}$ -universal $(r_1, \ldots, r_k)$ -SBs. In particular, a $\mathcal{P}_2$ -universal $(r_1, \ldots, r_k)$ -SB is called a universal $(r_1, \ldots, r_k)$ -SB (or simply $(r_1, \ldots, r_k)$ -USB). A $\mathcal{P}_k$ -universal $(r_1, \ldots, r_k)$ -SB is called a hyperuniversal $(r_1, \ldots, r_k)$ -SB (or simply $(r_1, \ldots, r_k)$ -HUSB). We note that, in a feasible routing of a routing requirement, a net can be assigned to use any available terminals on the sides specified by the net and the connection is made by turning on the switches corresponding to the subtree. We also note that a singleton net in a routing requirement only uses a terminal, it does not use any switch, so that adding or removing singletons does not affect the routability of a routing requirement. Fig. 3g shows a feasible routing of the routing requirement given in Fig. 3f in the (4,5,6)-SB $U_2+U_0$ . ## 3.4 Switch Box Design Problems As we mentioned in the introduction, the main switch box design problem is to design an optimal switch box satisfying given specifications on channel density and routing capacity. The problem can be formally described as follows: General switch box design problem. Given a channel density vector $(r_1, ..., r_k)$ and a net pattern set $\mathcal{P}$ , design a $\mathcal{P}$ -universal $(r_1, ..., r_k)$ -SB with the least number of switches. **Switch box design problem.** Given a density ratio vector $\mathbf{d}$ , a residual vector $\mathbf{c}$ and a net pattern set $\mathcal{P}$ , design a $\mathcal{P}$ -universal $(w\mathbf{d} + \mathbf{c})$ -SB with the least number of switches for every w > 1. 
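
As an illustration of the algebraic model of Section 3.2, the following minimal sketch builds the incidence matrix of the 3-sided net pattern set used in the example and checks the routing requirement of Fig. 3f against (1). The function names are illustrative, and the ordering of the patterns (by size, then lexicographically) is an assumption chosen to match the order in which $\mathcal{P}$ is listed in the text.

```python
from itertools import combinations

def net_pattern_set(k, max_pins=None):
    """Nonempty subsets of {1..k} with at most max_pins sides, by size then lexicographic order."""
    max_pins = k if max_pins is None else max_pins
    return [frozenset(s) for t in range(1, max_pins + 1)
            for s in combinations(range(1, k + 1), t)]

def incidence_matrix(patterns, k):
    """A = [a_ij] with a_ij = 1 iff side i belongs to pattern S_j."""
    return [[1 if i in s else 0 for s in patterns] for i in range(1, k + 1)]

def rrv_from_nets(nets, patterns):
    """x_j = number of occurrences of pattern S_j in the multiset of nets."""
    return tuple(sum(1 for n in nets if frozenset(n) == s) for s in patterns)

def density_of(A, x):
    """Compute A X^T, i.e., how many nets touch each side."""
    return tuple(sum(a * v for a, v in zip(row, x)) for row in A)

P = net_pattern_set(3)            # hyperuniversal pattern set for k = 3
A = incidence_matrix(P, 3)
R = [{1, 2}, {1, 3}, {1, 3}, {2, 3}, {2, 3}, {2, 3}, {1, 2, 3}]  # Fig. 3f
X = rrv_from_nets(R, P)
print(X, density_of(A, X))
```

Calling `net_pattern_set(k, max_pins=2)` gives the pattern set $\mathcal{P}_2$ of the universal case under the same ordering assumption.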
Since a design scheme for $(w\mathbf{d} + \mathbf{c})$ -SB can be used to solve an $(r_1, \dots, r_k)$ -SB design problem by choosing proper $\mathbf{d}$ , $\mathbf{c}$ , and $w_0$ such that $w_0\mathbf{d} + \mathbf{c} = (r_1, \dots, r_k)$ , we will focus on the $(w\mathbf{d} + \mathbf{c})$ -SB design problem. ## 4 ROUTING REQUIREMENT VECTOR GENERATION AND DECOMPOSITION THEOREMS To design $\mathcal{P}$ -universal $(w\mathbf{d} + \mathbf{c})$ -SBs, we need first to compute all $\mathcal{P}$ -net $(w\mathbf{d} + \mathbf{c})$ -RRs for the purpose of testing a design. This computation can be done by solving for all nonnegative integer solutions X satisfying the following system of linear equations for all $w \geq 1$ : $$AX^{T} = (w\mathbf{d} + \mathbf{c})^{T}. (2)$$ In this section, we present a systematic method to compute the routing requirement vectors using the Hilbert basis and prove the main decomposition theorems which form the basis of our reduction design scheme. ## 4.1 Generating RRVs by the Hilbert Basis In the field of algebra, a system of linear equations $AX^T = b^T$ is referred to as a *system of linear Diophantine equations* (SLDE) if the entries of A and b are integers and only nonnegative integer solutions are considered. A system $AX^T = \mathbf{b}$ is *homogeneous* if $\mathbf{b} = \mathbf{0}$ . We only consider systems of linear Diophantine equations in this paper. Let $X=(x_1,\ldots,x_t)$ and $X'=(x_1',\ldots,x_t')$ be two nonnegative integer solutions to an SLDE. We define $X' \leq X$ if $x_i' \leq x_i$ for all $i=1,\ldots,t$ and call X a minimal solution if there is no other solution X' such that $X' \leq X$ . It is known that the set of all minimal solutions of an SLDE is finite. The set of minimal solutions to a homogeneous SLDE is referred to as the Hilbert basis of the SLDE. 
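
The definitions above can be tried out on a toy system. The sketch below is a brute-force enumeration, not one of the Hilbert basis algorithms from the literature: it lists all nonnegative integer solutions inside a box $[0, B]^n$ and keeps the minimal ones. Two assumptions are made here: the zero vector is excluded in the homogeneous case (otherwise it would be the only minimal solution), and the bound `bound` is supplied by the caller, so the result is complete only if every minimal solution fits inside the box. The toy matrix is purely illustrative.

```python
from itertools import product

def minimal_solutions(M, b, bound):
    """Minimal nonnegative integer solutions of M X^T = b^T with all entries <= bound.

    Minimality is exact inside the box: any solution below a solution in the
    box also lies in the box, so it is found by the enumeration.
    """
    n = len(M[0])
    sols = [x for x in product(range(bound + 1), repeat=n)
            if all(sum(m * v for m, v in zip(row, x)) == bi
                   for row, bi in zip(M, b))]
    if not any(b):
        # Homogeneous case: drop the trivial zero solution (assumption).
        sols = [x for x in sols if any(x)]

    def leq(u, v):
        return all(a <= c for a, c in zip(u, v))

    return sorted(x for x in sols
                  if not any(y != x and leq(y, x) for y in sols))

# Toy SLDE in four unknowns (x1, x2, x3, w):
#   x1 + x3 - w = b1,   x2 + x3 - 2w = b2
M = [[1, 0, 1, -1],
     [0, 1, 1, -2]]
print(minimal_solutions(M, (0, 0), 3))  # Hilbert basis of the homogeneous system
print(minimal_solutions(M, (1, 0), 3))  # minimal solutions for b = (1, 0)
```

The enumeration is exponential in the number of unknowns and is meant only to make the definitions tangible on small cases.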
We see that a solution X to $AX^T = \mathbf{b}^T$ can always be expressed as a sum of a minimal solution to $AX^T = \mathbf{b}^T$ and a nonnegative integer linear combination of the Hilbert basis of $AX^T = \mathbf{0}$ . In fact, the statement is obviously true if X is minimal. Otherwise, there is a minimal solution X'such that $X' \leq X$ , then X = X' + Y, where Y = X - X' is a solution to $AX^T = \mathbf{0}$ . If Y is a minimal solution of $AX^T = \mathbf{0}$ , then the statement is true. Otherwise, there is a minimal solution Y' such that $Y' \leq Y$ . Then, Y = Y' + (Y - Y') and Y - Y' is a solution to $AX^T = \mathbf{0}$ . Continuing this process, we conclude that Y is a sum of minimal solutions of $AX^T =$ $\mathbf{0}$ (repetition is allowed) so that Y can be expressed as a nonnegative integer linear combination of the Hilbert basis of $AX^T = \mathbf{0}$ . Therefore, we can generate all solutions of $AX^T = \mathbf{b}^T$ from the set of minimal solutions of $AX^T = \mathbf{b}^T$ and the Hilbert basis of $AX^T = \mathbf{0}$ . A Hilbert basis algorithm can also be used to compute the set of all minimal solutions to nonhomogeneous system $AX^T = \mathbf{b}^T$ . This can be done in three steps: First, compute the Hilbert basis of $(A, -\mathbf{b}^T)(X, y)^T = \mathbf{0}$ , then select those solutions with component of y equal to one, and, finally, remove the y component from the selected solutions. The computation for the Hilbert basis has been studied extensively in recent years. 
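
The subtraction argument at the start of this discussion translates directly into a greedy procedure. The sketch below assumes the minimal solutions and the Hilbert basis are already known; the two lists used in the example were worked out by hand for a toy system ($x_1 + x_3 - w = 1$, $x_2 + x_3 - 2w = 0$) and are illustrative inputs, not results from the paper.

```python
def leq(u, v):
    """Componentwise order u <= v."""
    return all(a <= b for a, b in zip(u, v))

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def decompose(x, minimal, hilbert):
    """Write x as x' + sum of coeffs[h] * h, with x' minimal and h in the Hilbert basis.

    Assumes x really is a solution and that both lists are complete;
    otherwise next() raises StopIteration.
    """
    start = next(m for m in minimal if leq(m, x))
    rest = sub(x, start)               # a solution of the homogeneous system
    coeffs = {h: 0 for h in hilbert}
    while any(rest):
        h = next(h for h in hilbert if leq(h, rest))
        rest = sub(rest, h)            # still a nonnegative homogeneous solution
        coeffs[h] += 1
    return start, coeffs

# Hand-computed for the toy system above (assumed inputs).
MINIMAL = [(1, 0, 0, 0), (0, 0, 2, 1)]
HILBERT = [(0, 1, 1, 1), (1, 2, 0, 1)]

print(decompose((1, 2, 2, 2), MINIMAL, HILBERT))
print(decompose((0, 1, 3, 2), MINIMAL, HILBERT))
```

The decomposition is in general not unique; the procedure returns whichever one results from scanning the lists in order.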
We can apply an existing Hilbert basis algorithm, e.g., Contejean and Devie [6], to the computation of routing requirement vectors as follows: Let X be a $\mathcal{P}$ -net $(w\mathbf{d}+\mathbf{c})$ -RRV, then (X,w) satisfies (2) or, equivalently, $$(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T.$$ (3) Therefore, (X, w) can be expressed as the sum of a minimal solution of (3) and a nonnegative integer linear combination of the Hilbert basis of the following homogeneous equation system: $$(A, -\mathbf{d}^T)(X, w)^T = \mathbf{0}^T. \tag{4}$$ That is, $$(X, w) = (X', w') + \sum_{i=1}^{m} a_i(X_i, w_i),$$ (5) where (X',w') is a minimal solution of (3) and $(X_i,w_i), i=1,\ldots,m$ are all minimal solutions of (4) and $a_i$ s are nonnegative integers. Relation (5) implies that $w=w'+a_1w_1+\cdots+a_mw_m$ , and $X=X'+a_1X_1+\cdots+a_mX_m$ . In summary, we have the following theorem: **Theorem 1.** Let $\mathbf{d}$ and $\mathbf{c}$ be k-dimensional nonnegative integer vectors and A be the incidence matrix of a net pattern set $\mathcal{P}$ . Then, a nonnegative integer vector X is a $\mathcal{P}$ -net $(w\mathbf{d} + \mathbf{c})$ -RRV iff (X, w) can be expressed as the sum of a minimal solution of $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T$ and a nonnegative integer linear combination of the minimal solutions of $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{0}^T$ . Therefore, we can obtain all $\mathcal{P}$ -net ( $w\mathbf{d} + \mathbf{c}$ )-RRVs (RRs) by computing the set of minimal solutions of (3) and (4). ## 4.2 Decomposition Theorems In this section, we prove the main decomposition theorems that power our switch box design technique. The following lemma was developed in [11]: **Lemma 2.** Let $w_1, \ldots, w_m$ be positive integers and p be the least common multiple of $w_1, \ldots, w_m$ . 
If $a_1w_1 + \cdots + a_mw_m \ge mp - m + 1$ , where $a_1, \ldots, a_m$ are nonnegative integers, then there are integers $y_1, \ldots, y_m$ such that $0 \le y_i \le a_i, i = 1, \ldots, m$ and $y_1w_1 + \cdots + y_mw_m = p$ . We first extend Lemma 2 to a set of nonnegative integer vectors. **Lemma 3.** Let $S = \{w_1, \dots, w_m\}$ be a set of positive integers. Then, there exist positive integers p and q with the following property: For any given integer $w \ge 0$ , there is an integer $p_w$ with $0 \le p_w \le q$ such that if a nonnegative integer vector $(a_1, \dots, a_m)$ satisfies $a_1w_1 + \dots + a_mw_m = w$ , then there exist $\frac{w-p_w}{p} + 1$ nonnegative integer vectors $(y_{i,1}, \dots, y_{i,m}), i = 1, \dots, \frac{w-p_w}{p} + 1$ satisfying $$y_{i,1}w_1 + \dots + y_{i,m}w_m = \begin{cases} p, & i = 1, \dots, \frac{w - p_w}{p}, \\ p_w, & i = \frac{w - p_w}{p} + 1, \end{cases}$$ (6) and $$\sum_{i=1}^{\frac{w-p_w}{p}+1} (y_{i,1}, \dots, y_{i,m}) = (a_1, \dots, a_m).$$ (7) **Proof.** Let p be the least common multiple of $w_1, \ldots, w_m$ and let q = mp - m. We show by induction on w that p and q satisfy the property of the lemma. When w=0, then $(a_1,\ldots,a_m)$ must be $(0,\ldots,0)$ and $p_w=0$ and $(y_{1,1},\ldots,y_{1,m})=(0,\ldots,0)$ satisfy the conditions. We continue to prove the truth of the lemma for w by assuming the truth for all values less than w. If $w \le q = mp - m$ , choose $p_w = w$ . Then, $\frac{w - p_w}{p} = 0$ . For any nonnegative vector $(a_1, \ldots, a_m)$ with $a_1w_1 + \cdots + a_mw_m = w$ , we choose $$(y_{1,1},\ldots,y_{1,m})=(a_1,\ldots,a_m).$$ Then, the statement holds. Otherwise, we have w>q=mp-m, let w'=w-p. Since w'< w, by the induction hypothesis, there exists a $p_{w'}$ with $0 \le p_{w'} \le q$ satisfying the statements with respect to w'. Let $p_w=p_{w'}$ . We show that $p_w$ satisfies the requirements with respect to w. Let $(a_1,\ldots,a_m)$ be a nonnegative integer vector satisfying $a_1w_1+\cdots+a_mw_m=w$ . 
Since w>q=mp-m, we have $w\geq mp-m+1$ . By Lemma 2, there exists a vector $(y_{1,1},\ldots,y_{1,m})$ such that $(0,\ldots,0)\preceq (y_{1,1},\ldots,y_{1,m})\preceq (a_1,\ldots,a_m)$ and $\sum_{j=1}^m y_{1,j}w_j=p$ . Now, consider $(a_1-y_{1,1},\ldots,a_m-y_{1,m})$ . Since $(a_1-y_{1,1})w_1+\ldots+(a_m-y_{1,m})w_m=w-p=w'< w$ , by the induction hypothesis, there exist vectors $(y_{i,1},\ldots,y_{i,m}), i=2,\ldots,\frac{w'-p_{w'}}{p}+2=\frac{w-p_{w}}{p}+1$ satisfying (6) and (7) with respect to $(a_1-y_{1,1},\ldots,a_m-y_{1,m})$ . Therefore, $(y_{i,1},\ldots,y_{i,m}), i=1,\ldots,\frac{w-p_{w}}{p}+1$ satisfy (6) and (7) with respect to $(a_1,\ldots,a_m)$ . **Theorem 4 (Decomposition Theorem).** Let $\mathbf{d}$ and $\mathbf{c}$ be k-dimensional nonnegative integer vectors and $\mathcal{P}$ be a net pattern set. Then, there exist positive integers p and q with the following property: For any given $w \geq 0$ , there is an integer $p_w$ with $0 \leq p_w \leq q$ such that every $\mathcal{P}$ -net $(w\mathbf{d} + \mathbf{c})$ -RRV can be expressed as the sum of one $(p_w\mathbf{d} + \mathbf{c})$ -RRV and $\frac{w-p_w}{p}$ $(p\mathbf{d})$ -RRVs. **Proof.** Let $\mathcal{B}_0 = \{(X_1, w_1), \dots, (X_m, w_m)\}$ be the Hilbert basis of $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{0}^T$ and, if $\mathbf{c} \neq \mathbf{0}$ , let $\mathcal{B} = \{(X_1', w_1'), \dots, (X_l', w_l')\}$ be the set of all minimal solutions to $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T$ ; otherwise, let $\mathcal{B} = \emptyset$ . Let $S = \{w_1, \dots, w_m\}$ . Then, there exist positive integers p' and q' satisfying the statements of Lemma 3. Let p = p' and q = q' + q'', where $q'' = \max\{w_1', \dots, w_l'\}$ . We show that p and q have the property of the theorem. Let X be a $\mathcal{P}$ -net $(w\mathbf{d} + \mathbf{c})$ -RRV. Then, (X, w) is a nonnegative integer solution to $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T$ . 
If $0 \le w \le q$, then let $p_w = w$; otherwise, let $p_w = w - np$, where n is the least integer such that $w - np \le q$. We show that $p_w$ and q satisfy the conditions. By Theorem 1, (X, w) can be expressed as $(X,w)=(X',w')+a_1(X_1,w_1)+\cdots+a_m(X_m,w_m)$, where $(X',w')\in\mathcal{B}$ (with $(X',w')=\mathbf{0}$ when $\mathbf{c}=\mathbf{0}$) and $a_1,\ldots,a_m$ are nonnegative integers. Then, $w=w'+a_1w_1+\cdots+a_mw_m$. For $w-w'$, by Lemma 3 and the choice of p (= p') and q', there are an integer $p_{w-w'}$ with $0\leq p_{w-w'}\leq q'$ and nonnegative integer vectors $(y_{i,1},\ldots,y_{i,m})$, $i=1,\ldots,\frac{w-w'-p_{w-w'}}{p}+1$, satisfying (6) and (7), namely,

$$\sum_{j=1}^{m} y_{i,j} w_j = \begin{cases} p, & i = 1, \dots, \frac{w - w' - p_{w - w'}}{p}, \\ p_{w - w'}, & i = \frac{w - w' - p_{w - w'}}{p} + 1 \end{cases}$$

and

$$\sum_{i=1}^{\frac{w-w'-p_{w-w'}}{p}+1} (y_{i,1},\ldots,y_{i,m}) = (a_1,\ldots,a_m).$$

Therefore,

$$a_j = \sum_{i=1}^{\frac{w-w'-p_{w-w'}}{p}+1} y_{i,j}, \quad j=1,\dots,m.$$

Since $0 \le w - \frac{w-w'-p_{w-w'}}{p} \times p = w' + p_{w-w'} \le q$ (as $0 \le w' \le q''$ and $0 \le p_{w-w'} \le q'$) and n is the least integer such that $0 \le w - np \le q$, we have $n \le \frac{w-w'-p_{w-w'}}{p}$ and, hence,

$$\begin{split} (X, w) &= (X', w') + \sum_{j=1}^{m} a_j(X_j, w_j) \\ &= (X', w') + \sum_{j=1}^{m} \sum_{i=1}^{\frac{w-w'-p_{w-w'}}{p}+1} y_{i,j}(X_j, w_j) \\ &= (X', w') + \sum_{i=1}^{\frac{w-w'-p_{w-w'}}{p}+1} \sum_{j=1}^{m} y_{i,j}(X_j, w_j) \\ &= (X', w') + \sum_{i=n+1}^{\frac{w-w'-p_{w-w'}}{p}+1} \sum_{j=1}^{m} y_{i,j}(X_j, w_j) + \sum_{i=1}^{n} \sum_{j=1}^{m} y_{i,j}(X_j, w_j). \end{split}$$

Then,

$$(X',w') + \sum_{i=n+1}^{\frac{w-w'-p_{w-w'}}{p}+1} \sum_{j=1}^{m} y_{i,j}(X_j,w_j)$$

is a solution to $(A, -\mathbf{d}^T)(X, p_w)^T = \mathbf{c}^T$, which corresponds to a $(p_w \mathbf{d} + \mathbf{c})$-RRV, as $(A, -\mathbf{d}^T)(X', w')^T = \mathbf{c}^T$ and $(A, -\mathbf{d}^T)(X_j, w_j)^T = \mathbf{0}^T$ for $j = 1, \ldots, m$, and

$$w' + \sum_{i=n+1}^{\frac{w-w'-p_{w-w'}}{p}+1} \sum_{j=1}^{m} y_{i,j} w_j = w' + p\left(\frac{w-w'-p_{w-w'}}{p} - n\right) + p_{w-w'} = w - np = p_w.$$

And, for each $i=1,\ldots,n=\frac{w-p_w}{p}$, $\sum_{j=1}^m y_{i,j}(X_j,w_j)$ is a solution to $(A,-\mathbf{d}^T)(X,p)^T=\mathbf{0}^T$. Therefore, $\sum_{j=1}^m y_{i,j}X_j$ is a $(p\mathbf{d})$-RRV for $i=1,\ldots,\frac{w-p_w}{p}$. This completes the proof. $\square$

**Theorem 5 (Switch Box Construction Theorem).** Let $\mathbf{d}$ and $\mathbf{c}$ be k-dimensional nonnegative integer vectors, $\mathcal{P}$ be a net pattern set, and w be any positive integer. Let $p_w$ and p be positive integers satisfying the statements of Theorem 4. Suppose that $U_0$ is a $\mathcal{P}$-universal $(p\mathbf{d})$-SB and $U_{p_w}$ is a $\mathcal{P}$-universal $(p_w \mathbf{d} + \mathbf{c})$-SB. Then, $U_{p_w} + \frac{w - p_w}{p} U_0$ is a $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB.

**Proof.** By Theorem 4, every $\mathcal{P}$-net $(w\mathbf{d} + \mathbf{c})$-RR can be decomposed into one $\mathcal{P}$-net $(p_w \mathbf{d} + \mathbf{c})$-RR and $\frac{w - p_w}{p}$ $\mathcal{P}$-net $(p\mathbf{d})$-RRs. Since $U_0$ and $U_{p_w}$ are $\mathcal{P}$-universal, each $(p\mathbf{d})$-RR is routable in one copy of $U_0$ and the $(p_w \mathbf{d} + \mathbf{c})$-RR is routable in $U_{p_w}$. Therefore, $U_{p_w} + \frac{w - p_w}{p} U_0$ is $\mathcal{P}$-universal. $\square$

For given vectors $\mathbf{d}$, $\mathbf{c}$, and a net pattern set $\mathcal{P}$, by Theorems 4 and 5, we can design a $\mathcal{P}$-universal $(p\mathbf{d})$-SB and a $\mathcal{P}$-universal $(r\mathbf{d} + \mathbf{c})$-SB for every $1 \le r \le q$ and then use them to build other $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SBs by the disjoint union operation for any natural number w. This is the main idea of our reduction design technique.

## 5 REDUCTION DESIGN SCHEME

In this section, we present a general reduction design scheme for generic arbitrary-shaped switch boxes, followed by an example and optimality analysis.
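Once p and q are known, the construction of Theorem 5 comes down to a small piece of arithmetic: pick the least n with $w - np \le q$ and set $p_w = w - np$. The following Python sketch illustrates this choice. The function name `compound_plan` is hypothetical and not from the paper, and the values p = 2, q = 2 used in the example are the ones derived for the (w, w+1, w+2)-HUSBs in Section 5.3.

```python
def compound_plan(w, p, q):
    """Return (p_w, n) such that w = p_w + n*p, with n the least
    nonnegative integer for which p_w = w - n*p <= q.

    The compound switch box is then U_{p_w} + n*U_0.
    """
    if w < 1 or p < 1 or q < 1:
        raise ValueError("w, p and q must be positive integers")
    if w <= q:
        return w, 0            # a prime switch box U_w suffices
    n = -(-(w - q) // p)       # ceiling of (w - q) / p
    return w - n * p, n


# Example with p = 2, q = 2 (values from Section 5.3).
for w in (2, 3, 4, 5):
    p_w, n = compound_plan(w, p=2, q=2)
    print(f"w={w}: U_{p_w} + {n}*U_0")
```

For w = 4 and w = 5 this yields $U_2 + U_0$ and $U_1 + 2U_0$, which agrees with the compound switch boxes named in Section 5.3.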
## 5.1 Reduction Design Scheme for Generic Switch Boxes

Given k-dimensional nonnegative integer vectors $\mathbf{d}$ and $\mathbf{c}$ and a net pattern set $\mathcal{P}$, proceed as follows.

- I. Determine the Hilbert basis and the set of minimal solutions. Compute the Hilbert basis $\mathcal{B}_0$ of $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{0}^T$ using an existing algorithm such as the one in [6] to obtain $\mathcal{B}_0 = \{(X_1, w_1), \dots, (X_m, w_m)\}$. If $\mathbf{c} \neq \mathbf{0}$, compute the set $\mathcal{B}$ of all minimal solutions of $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T$ by computing the Hilbert basis of $(A, -\mathbf{d}^T, -\mathbf{c}^T)(X, w, y)^T = \mathbf{0}^T$, selecting those solutions with y = 1, and removing the y component. Suppose $$\mathcal{B} = \{(X'_1, w'_1), \dots, (X'_l, w'_l)\}.$$ If $\mathbf{c} = \mathbf{0}$, let $\mathcal{B} = \emptyset$.
- II. Determine integers p and q satisfying the statements of Theorem 4 using $S = \{w_1, \dots, w_m\}$ and $S' = \{w'_1, \dots, w'_l\}$.
- III. Design a $\mathcal{P}$-universal $(p\mathbf{d})$-SB $U_0$ and a $\mathcal{P}$-universal $(r\mathbf{d} + \mathbf{c})$-SB $U_r$ for each $r = 1, \dots, q$. We call $U_0$ and $U_r$ prime switch boxes. Set up a feasible routing table $RT(U_0)$ recording a feasible routing for each $(p\mathbf{d})$-RR in $U_0$, and a routing table $RT(U_r)$ for every $U_r$, $1 \leq r \leq q$.
- IV. For any $w \ge 1$, construct a $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB using the prime switch boxes $U_0$ and $\{U_r: r=1,\ldots,q\}$ produced in III as follows: If $w \leq q$, use the prime $(w\mathbf{d} + \mathbf{c})$-SB $U_w$; otherwise, choose the minimum n such that $w - np \le q$. Then, the disjoint union of $U_{w-np}$ and n copies of $U_0$, $U_{w-np} + nU_0$, is a $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB.

### A routing algorithm

Let $U = U_{p_w} + nU_0$ be a compound $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB, where $p_w = w - np$, and R be a $\mathcal{P}$-net $(w\mathbf{d} + \mathbf{c})$-RR.
Then, the following procedure finds a feasible routing for R in U:

**Step 1.** Transform R into a $(w\mathbf{d} + \mathbf{c})$-RRV (X, w) satisfying $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T.$

**Step 2.** If $\mathbf{c} \neq \mathbf{0}$, find $(X', w') \in \mathcal{B}$ such that $(X', w') \preceq (X, w)$; otherwise, let $(X', w') = \mathbf{0}$.

**Step 3.** If $(X, w) \neq (X', w')$, decompose $Y = (X, w) - (X', w')$ into a nonnegative integer linear combination of $\mathcal{B}_0$ by the following procedure:

#### Decomposition algorithm

$$\begin{array}{l} \textbf{Input: } Y \text{ and } \mathcal{B}_0 = \{(X_1, w_1), \dots, (X_m, w_m)\} \\ \textbf{for } j \text{ from } 1 \text{ to } m \textbf{ do} \\ \quad a_j \leftarrow 0 \\ \quad \textbf{while } (X_j, w_j) \preceq Y \textbf{ do} \\ \qquad a_j \leftarrow a_j + 1,\ Y \leftarrow Y - (X_j, w_j) \\ \quad \textbf{end while} \\ \textbf{end for} \\ \textbf{Output: } a_1, a_2, \dots, a_m \end{array}$$

Then, we have $$(X, w) = (X', w') + \sum_{j=1}^{m} a_j(X_j, w_j).$$

**Step 4.** Transform the above expression into the following form by the procedure given in the proof of Lemma 3:

$$(X, w) = (X', w') + \sum_{i=n+1}^{\frac{w-w'-p_{w-w'}}{p}+1} \sum_{j=1}^{m} y_{i,j}(X_j, w_j) + \sum_{i=1}^{n} \sum_{j=1}^{m} y_{i,j}(X_j, w_j),$$ (8)

where $(y_{i,1}, \ldots, y_{i,m})$, $i = 1, \ldots, \frac{w-w'-p_{w-w'}}{p} + 1$, satisfy (6) and (7) with respect to $w - w'$.

**Step 5.** Using the feasible routing table $RT(U_{p_w})$ of $U_{p_w}$, find a feasible routing of $$(X',w') + \sum_{i=n+1}^{\frac{w-w'-p_{w-w'}}{p}+1} \sum_{j=1}^{m} y_{i,j}(X_j,w_j)$$ in $U_{p_w}$. For each $i=1,\ldots,n$, find a feasible routing of $\sum_{j=1}^m y_{i,j}(X_j,w_j)$ in a copy of $U_0$ from the routing table $RT(U_0)$ of $U_0$.
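Steps 1 to 4 can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the data are those of the 3-sided example in Section 5.3 ($\mathbf{d} = (1,1,1)$, $\mathbf{c} = (0,1,2)$, so $\mathbf{c} \neq \mathbf{0}$), the third basis vector is written out in full as (1,0,0,0,0,1,0,1), and the sub-vectors of weight p are found by exhaustive search rather than by the inductive argument of Lemma 3. The routing-table lookup of Step 5 is omitted.

```python
from collections import Counter
from itertools import product

# Net patterns of a 3-sided hyperuniversal switch box, in column order of A.
PATTERNS = [(1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
# Hilbert basis B0 and minimal solutions B for d = (1,1,1), c = (0,1,2);
# the last component of every vector is w.
B0 = [(1, 1, 1, 0, 0, 0, 0, 1), (0, 0, 0, 0, 0, 0, 1, 1),
      (1, 0, 0, 0, 0, 1, 0, 1), (0, 1, 0, 0, 1, 0, 0, 1),
      (0, 0, 0, 1, 1, 1, 0, 2), (0, 0, 1, 1, 0, 0, 0, 1)]
B = [(0, 1, 2, 0, 0, 0, 0, 0), (0, 0, 1, 0, 0, 1, 0, 0),
     (0, 0, 0, 0, 1, 2, 0, 1)]

def rr_to_rrv(nets, w):
    """Step 1: count the nets of each pattern and append w."""
    counts = Counter(tuple(sorted(net)) for net in nets)
    return tuple(counts[pat] for pat in PATTERNS) + (w,)

def leq(u, v):
    """Componentwise order u <= v."""
    return all(a <= b for a, b in zip(u, v))

def combine(u, v, k=1):
    """Return u + k*v componentwise."""
    return tuple(a + k * b for a, b in zip(u, v))

def greedy_decompose(y, basis):
    """Step 3: subtract each basis vector as often as it fits."""
    coeffs = []
    for b in basis:
        a = 0
        while leq(b, y):
            y, a = combine(y, b, -1), a + 1
        coeffs.append(a)
    if any(y):
        raise ValueError("greedy pass left a nonzero remainder")
    return coeffs

def take_group(a, weights, p):
    """Find y with 0 <= y <= a and sum(y_j * w_j) == p (brute force)."""
    for y in product(*(range(c + 1) for c in a)):
        if sum(yj * wj for yj, wj in zip(y, weights)) == p:
            return y
    raise ValueError("no sub-vector of weight p found")

def split_for_routing(xw, p, n):
    """Steps 2-4: return the part for U_{p_w} and the n parts for U_0."""
    base = next((b for b in B if leq(b, xw)), None)        # Step 2
    if base is None:
        raise ValueError("no minimal solution lies below (X, w)")
    a = greedy_decompose(combine(xw, base, -1), B0)        # Step 3
    weights = [b[-1] for b in B0]
    groups = []
    for _ in range(n):                                     # Step 4
        y = take_group(a, weights, p)
        a = [ai - yi for ai, yi in zip(a, y)]
        part = (0,) * len(xw)
        for yj, b in zip(y, B0):
            part = combine(part, b, yj)
        groups.append(part)
    rest = base
    for aj, b in zip(a, B0):
        rest = combine(rest, b, aj)
    return rest, groups

# A (4,5,6)-RR with one {1,2} net, two {1,3} nets, three {2,3} nets
# and one {1,2,3} net, routed in U_2 + U_0 (p = 2, n = 1).
R = [{1, 2}, {1, 3}, {1, 3}, {2, 3}, {2, 3}, {2, 3}, {1, 2, 3}]
xw = rr_to_rrv(R, 4)
rest, groups = split_for_routing(xw, p=2, n=1)
print(xw, rest, groups)
```

For this input the part assigned to $U_2$ is (0,0,0,0,1,2,1,2) and the part assigned to $U_0$ is (0,0,0,1,1,1,0,2). The exhaustive search in `take_group` is only reasonable because m and the coefficients are small here; it raises an error rather than guessing when no group of weight p exists.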
The correctness of the above decomposition algorithm follows from two facts: 1) each intermediate Y (= $Y - (X_j, w_j)$) is still a nonnegative integer solution to $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{0}^T$ because the initial $Y = (X, w) - (X', w')$ is one and $(X_j, w_j) \preceq Y$; 2) the outputs $a_1, a_2, \ldots, a_m$ satisfy that, for each h with $2 \leq h \leq m$, $Y = (X, w) - (X', w') - \sum_{j=1}^{h-1} a_j(X_j, w_j)$ does not contain any of $(X_j, w_j)$ for $j = 1, \ldots, h-1$ because $a_h$ is the maximum nonnegative integer such that $$a_h(X_h, w_h) \preceq (X, w) - (X', w') - \sum_{j=1}^{h-1} a_j(X_j, w_j).$$

The correctness of the above routing algorithm is guaranteed by the decomposition theorem and the fact that $U_0$ and $U_{p_w}$ are $\mathcal{P}$-universal.

The above feasible routing algorithm is an exact algorithm with running time O(w). Step 1 takes linear time and Step 2 takes constant time. For Step 3, since k is fixed, both $\mathcal{B}$ and $\mathcal{B}_0$ are finite sets of fixed size, so the while loop runs at most a linear number of times and Step 3 takes linear time. Step 4 also takes linear time because there is a fixed number of possible combinations and each of them takes linear time to check. Since it takes constant time to find a feasible routing in a routing table, Step 5 takes linear time.

We note that the value of q in Step II could be smaller than the choice in the proof of Theorem 4. For instance, when $\mathbf{d} = (1,1,1,1)$ and $\mathbf{c} = (0,0,0,0)$ and hyperuniversal (w,w,w,w)-SBs are considered, it was proven in [11] that p=6 and q=7 meet the requirements.

If we only want to construct a $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB for a fixed number w, we just need to construct a $\mathcal{P}$-universal $(p\mathbf{d})$-SB $U_0$ and a $\mathcal{P}$-universal $(p_w\mathbf{d} + \mathbf{c})$-SB $U_{p_w}$.
Then, $U_{p_w} + \frac{w-p_w}{p}U_0$ is a $\mathcal{P}$ -universal $(w\mathbf{d} + \mathbf{c})$ -SB. ## 5.2 Conditions for Design Optimality Next, we investigate when the design scheme will produce optimal switch boxes. A *full net pattern set* is a net pattern set $\mathcal P$ such that any pair $\{i,j\}$ is contained in a net of $\mathcal P$ for $1 \leq i,j \leq k$ . Note that both $\mathcal P_2$ and $\mathcal P_k$ are full net pattern sets. If $\mathcal P$ is a full net pattern set, then any $\mathcal P$ -universal $(r_1,\ldots,r_k)$ -SB needs at least $\min\{r_i,r_j\}$ switches to route $\min\{r_i,r_j\}$ nets between sides i and j, $1 \leq i < j \leq k$ , so that it needs at least $lb(r_1,\ldots,r_k) = \sum_{1 \leq i < j \leq k} \min\{r_i,r_j\}$ switches. This observation is helpful for us to obtain some sufficient conditions under which our design scheme can produce optimal switch boxes. **Theorem 6.** Let $\mathcal{P}$ be a full net pattern set and let $\mathbf{d} = (d_1, \ldots, d_k)$ and $\mathbf{c} = (c_1, \ldots, c_k)$ be k-dimensional nonnegative integer vectors such that $d_i \leq d_j$ implies $c_i \leq c_j$ for $1 \leq i < j \leq k$ . If an optimal $\mathcal{P}$ -universal $(p\mathbf{d})$ -SB has $lb(p\mathbf{d}) = \sum_{1 \leq i < j \leq k} p \min\{d_i, d_j\}$ switches, and an optimal $\mathcal{P}$ -universal $(p_w\mathbf{d} + \mathbf{c})$ -SB has $lb(p_w\mathbf{d} + \mathbf{c}) = \sum_{1 \leq i < j \leq k} \min\{p_wd_i + c_i, p_wd_j + c_j\}$ switches, then the compound $\mathcal{P}$ -universal $(w\mathbf{d} + \mathbf{c})$ -SB $U_{p_w} + \frac{w - p_w}{p}U_0$ is optimal. **Proof.** Let $U_0$ be an optimal $\mathcal{P}$ -universal $(p\mathbf{d})$ -SB and $U_{p_w}$ be an optimal $\mathcal{P}$ -universal $(p_w\mathbf{d}+\mathbf{c})$ -SB. Then, the numbers of switches in $U_0$ and $U_{p_w}$ are $\sum_{1\leq i< j\leq k} p\min\{d_i,d_j\}$ and $\sum_{1\leq i< j\leq k} \min\{p_wd_i+c_i,p_wd_j+c_j\}$ , respectively. 
The number of switches in $U_{p_w}+\frac{w-p_w}{p}U_0$ is equal to $$\begin{split} & \sum_{1 \leq i < j \leq k} \min\{p_w d_i + c_i, p_w d_j + c_j\} \\ & + \frac{w - p_w}{p} \sum_{1 \leq i < j \leq k} p \min\{d_i, d_j\} \\ &= \sum_{1 \leq i < j \leq k} (\min\{p_w d_i + c_i, p_w d_j + c_j\} \\ & + \min\{(w - p_w) d_i, (w - p_w) d_j\}) \\ &= \sum_{1 \leq i < j \leq k} \min\{w d_i + c_i, w d_j + c_j\} \\ &= lb(w \mathbf{d} + \mathbf{c}). \end{split}$$ Therefore, $U_{p_w} + \frac{w - p_w}{p} U_0$ is an optimal $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB. $\square$

The above theorem says that, under certain conditions, combining optimal prime switch boxes by the reduction design scheme yields optimal switch boxes. It is still unknown whether the design scheme produces optimal switch boxes when the conditions are not met. No counterexample has been found, but there are optimal $(r_1, \ldots, r_k)$-HUSBs with more switches than $lb(r_1, \ldots, r_k)$. For instance, it was shown in [9] that an optimal (4, 4, 4, 4)-HUSB has 25 switches, rather than 24 (= lb(4, 4, 4, 4)) switches.

We point out that the prime $\mathcal{P}$-universal $(p\mathbf{d})$-SB $U_0$ plays the more important role, as $U_0$ is the major building block. If $U_0$ has $lb(p\mathbf{d})$ switches, then the compound $\mathcal{P}$-universal $(w\mathbf{d}+\mathbf{c})$-SB $U_{p_w} + \frac{w-p_w}{p}U_0$ is nearly optimal when w is large. If $U_0$ has $\alpha \, lb(p\mathbf{d})$ switches $(\alpha>1)$, then the compound $\mathcal{P}$-universal $(w\mathbf{d}+\mathbf{c})$-SB $U_{p_w} + \frac{w-p_w}{p}U_0$ has at most $\alpha$ times the number of switches in an optimal $\mathcal{P}$-universal $(w\mathbf{d}+\mathbf{c})$-SB. Therefore, designing good prime switch boxes results in a good compound switch box whose number of switches is within a constant factor of the optimum. From our observation, this constant should be very small and approach one when w is large.
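The switch count in the proof of Theorem 6 is easy to check numerically. The following Python sketch, with hypothetical function names, evaluates the lower bound $lb(r_1,\ldots,r_k) = \sum_{i<j} \min\{r_i, r_j\}$ and compares the size of a compound switch box built from prime boxes that meet the bound against $lb(w\mathbf{d}+\mathbf{c})$. The shapes used are those of Sections 5.3 and 6, and the sketch assumes that the prime switch boxes have exactly $lb$ switches.

```python
from itertools import combinations

def lb(r):
    """Lower bound on switches for a full net pattern set:
    the sum of min(r_i, r_j) over all pairs of sides i < j."""
    return sum(min(a, b) for a, b in combinations(r, 2))

def shape(w, d, c):
    """Terminal counts of a (w*d + c)-SB."""
    return tuple(w * di + ci for di, ci in zip(d, c))

def compound_switch_count(w, p, p_w, d, c):
    """Switches in U_{p_w} + ((w - p_w)/p) U_0, assuming both prime
    switch boxes have exactly lb(...) switches."""
    n, rem = divmod(w - p_w, p)
    if rem:
        raise ValueError("w - p_w must be a multiple of p")
    zero = (0,) * len(d)
    return lb(shape(p_w, d, c)) + n * lb(shape(p, d, zero))

# 3-sided example: d = (1,1,1), c = (0,1,2), p = 2.
print(compound_switch_count(4, 2, 2, (1, 1, 1), (0, 1, 2)), lb((4, 5, 6)))
print(compound_switch_count(5, 2, 1, (1, 1, 1), (0, 1, 2)), lb((5, 6, 7)))
# Rectangular example: d = (1,2,1,2), c = 0, p = 2.
print(compound_switch_count(3, 2, 1, (1, 2, 1, 2), (0, 0, 0, 0)), lb((3, 6, 3, 6)))
```

In each of the three cases the two printed numbers coincide, as the chain of equalities in the proof predicts for these shapes.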
In the worst case, since a complete switch box is $\mathcal{P}$-universal for any $\mathcal{P}$, we can simply let $U_0$ be a complete $(p\mathbf{d})$-SB $K_{(p\mathbf{d})}$ and let $U_{p_w}$ be the complete $(p_w\mathbf{d}+\mathbf{c})$-SB $K_{(p_w\mathbf{d}+\mathbf{c})}$. Then $K_{(p_w\mathbf{d}+\mathbf{c})}+\frac{w-p_w}{p}K_{(p\mathbf{d})}$ is a $\mathcal{P}$-universal $(w\mathbf{d}+\mathbf{c})$-SB that contains $\Theta(w)$ switches, because the two complete switch boxes have sizes bounded by constants that do not depend on w. Therefore, we have the following theorem:

**Theorem 7.** Let $\mathbf{d}$ and $\mathbf{c}$ be k-dimensional nonnegative integer vectors and $\mathcal{P}$ be any net pattern set. Then, there is a compound $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SB with $\Theta(w)$ switches and a $\Theta(w)$ time routing algorithm. In particular, the number of switches in an optimal $(w\mathbf{d} + \mathbf{c})$-HUSB is $\Theta(w)$.

## 5.3 Illustration of Optimal Design Examples

We apply the reduction design scheme to design an optimal (4,5,6)-HUSB and an optimal (5,6,7)-HUSB. Our strategy is to design generic optimal (w,w+1,w+2)-HUSBs and then derive a (4,5,6)-HUSB and a (5,6,7)-HUSB by letting w=4 and w=5. To design (w, w+1, w+2)-HUSBs, we let $\mathbf{d}=(1,1,1)$ and $\mathbf{c}=(0,1,2)$. Since we consider 3-sided hyperuniversal switch boxes, the net pattern set is $$\{\{1\},\{2\},\{3\},\{1,2\},\{1,3\},\{2,3\},\{1,2,3\}\}.$$ We proceed with the reduction design scheme as follows:

I.
The corresponding incidence matrix of the net pattern set is $$A = \begin{pmatrix} 1 & 0 & 0 & 1 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \end{pmatrix}.$$ Computing the Hilbert basis of the following SLDE: $$\begin{pmatrix} 1 & 0 & 0 & 1 & 1 & 0 & 1 & -1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 & -1 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},$$ we obtain $$\begin{split} \mathcal{B}_0 = \{ & (1,1,1,0,0,0,0,1), (0,0,0,0,0,0,1,1), \\ & (1,0,0,0,0,1,0,1), (0,1,0,0,1,0,0,1), \\ & (0,0,0,1,1,1,0,2), (0,0,1,1,0,0,0,1)\}. \end{split}$$ Furthermore, computing the set of minimal solutions of $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{c}^T$, we obtain $$\mathcal{B} = \{(0, 1, 2, 0, 0, 0, 0, 0), (0, 0, 1, 0, 0, 1, 0, 0), (0, 0, 0, 0, 1, 2, 0, 1)\}.$$

II. We have $S = \{1, 2\}$ and $S' = \{0, 1\}$. Then, p = 2 and q = 2 satisfy the statements of Theorem 4.

III. We need to design three prime switch boxes: a 2(1,1,1)-HUSB $U_0$, a (1,2,3)-HUSB $U_1$, and a (2,3,4)-HUSB $U_2$. Fig. 3a, Fig. 3b, and Fig. 3c show the designs of the three prime hyperuniversal switch boxes. Routing tables are omitted. Since the 2(1,1,1)-HUSB $U_0$ has 6 (= lb(2,2,2)) switches, $U_0$ is an optimal (2,2,2)-HUSB. Similarly, the (1,2,3)-HUSB $U_1$ has lb(1,2,3)=4 switches and the (2,3,4)-HUSB $U_2$ has lb(2,3,4)=7 switches, so both are optimal.

IV. The compound (w, w+1, w+2)-HUSBs for $w \geq 3$ are $U_1 + \frac{w-1}{2}U_0$ when w is odd and $U_2 + \frac{w-2}{2}U_0$ when w is even. By Theorem 6, all compound (w, w+1, w+2)-HUSBs constructed above are optimal. In particular, when w=4 or 5, we obtain an optimal (4,5,6)-HUSB $U_2 + U_0$ and an optimal (5,6,7)-HUSB $U_1 + 2U_0$. See Fig. 3d and Fig. 3e.

Next, we illustrate the routing process by showing the steps of finding a feasible routing for the (4,5,6)-RR R of Fig. 3f in $U_2 + U_0$. The corresponding (4,5,6)-RRV is X = (0,0,0,1,2,3,1) and w = 4.
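As a quick sanity check on this vector and on the solution sets of step I, the defining equation $AX^T - w\mathbf{d}^T = \mathbf{c}^T$ can be evaluated directly. The Python sketch below does so. The function names are hypothetical, and the basis is entered with every vector written as eight components (seven pattern counts followed by w), which is an assumption where the printed list is incomplete.

```python
# Net patterns in the column order of the incidence matrix A.
PATTERNS = [{1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]

def side_loads(x, k=3):
    """Return A X^T: the number of terminals used on each side 1..k."""
    return tuple(
        sum(xi for xi, pat in zip(x, PATTERNS) if side in pat)
        for side in range(1, k + 1)
    )

def is_rrv(x, w, d, c):
    """True if (X, w) satisfies A X^T - w d^T = c^T."""
    return side_loads(x, len(d)) == tuple(w * di + ci for di, ci in zip(d, c))

d, c, zero = (1, 1, 1), (0, 1, 2), (0, 0, 0)
B0 = [(1, 1, 1, 0, 0, 0, 0, 1), (0, 0, 0, 0, 0, 0, 1, 1),
      (1, 0, 0, 0, 0, 1, 0, 1), (0, 1, 0, 0, 1, 0, 0, 1),
      (0, 0, 0, 1, 1, 1, 0, 2), (0, 0, 1, 1, 0, 0, 0, 1)]
B = [(0, 1, 2, 0, 0, 0, 0, 0), (0, 0, 1, 0, 0, 1, 0, 0),
     (0, 0, 0, 0, 1, 2, 0, 1)]

print(side_loads((0, 0, 0, 1, 2, 3, 1)))                   # terminals per side
print(is_rrv((0, 0, 0, 1, 2, 3, 1), 4, d, c))              # the (4,5,6)-RRV
print(all(is_rrv(v[:-1], v[-1], d, zero) for v in B0))     # homogeneous system
print(all(is_rrv(v[:-1], v[-1], d, c) for v in B))         # inhomogeneous system
```

The first line printed is the side load (4, 5, 6), and the three checks that follow all evaluate to true for the vectors as entered here.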
We have $$\begin{split} (X,w) &= (0,0,0,1,2,3,1,4) \\ &= (0,0,0,0,1,2,0,1) + (0,0,0,0,0,0,1,1) \\ &+ (0,0,0,1,1,1,0,2). \end{split}$$ The first vector on the right-hand side is a minimal solution to $AX^T-(1,1,1)^Tw=(0,1,2)^T$; the second and third vectors are minimal solutions to $AX^T-(1,1,1)^Tw=\mathbf{0}^T$. By combining the first and the second vectors, we have (X,w)=(0,0,0,0,1,2,1,2)+(0,0,0,1,1,1,0,2). It follows that X=(0,0,0,0,1,2,1)+(0,0,0,1,1,1,0), where the vector (0,0,0,0,1,2,1) corresponds to the (2,3,4)-RR $R_1=\{\{1,3\},\{2,3\},\{2,3\},\{1,2,3\}\}$, which can be routed in $U_2$, and (0,0,0,1,1,1,0) corresponds to the (2,2,2)-RR $R_2=\{\{1,2\},\{1,3\},\{2,3\}\}$, which can be routed in $U_0$. Therefore, R is routable in $U_2+U_0$. Fig. 3h shows a feasible routing of $R_1$ in $U_2$ and Fig. 3i shows a feasible routing of $R_2$ in $U_0$. Fig. 3g gives a feasible routing of R in $U_2+U_0$, which is the disjoint union of Fig. 3h and Fig. 3i.

## 6 RECTANGULAR SWITCH BOXES FOR CUSTOMIZED FPGAS

In this section, we first present an optimal design for rectangular (w,2w,w,2w)-USBs and then show the experimental results on routability in a customized FPGA architecture by VPR.

Fig. 4. Prime and two compound optimal (w, 2w, w, 2w)-USBs. (a) Prime (w, 2w, w, 2w)-USBs. (b) (3, 6, 3, 6)-USB $U_1 + U_0$. (c) (4, 8, 4, 8)-USB $2U_0$.

Let $\mathbf{d} = (1, 2, 1, 2)$ and $\mathbf{c} = (0, 0, 0, 0)$. The problem is to design optimal $w\mathbf{d}$-USBs. The net pattern set is $\{\{1\}, \{2\}, \{3\}, \{4\}, \{1, 2\}, \{1, 3\}, \{1, 4\}, \{2, 3\}, \{2, 4\}, \{3, 4\}\}$. We proceed with the reduction design scheme as follows:

I.
The incidence matrix of the net pattern set is $$A = \begin{pmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix}.$$ Computing the Hilbert basis of $(A, -\mathbf{d}^T)(X, w)^T = \mathbf{0}^T$, we obtain $$\begin{split} \mathcal{B}_0 = \{ & (0,0,0,2,1,0,0,1,0,0,1), (0,0,0,0,1,0,0,0,1,1,1), \\ & (1,0,0,1,0,0,0,1,1,0,1), (1,1,0,0,0,0,0,0,1,1,1), \\ & (0,2,0,2,0,1,0,0,0,0,1), (0,0,0,0,0,0,1,1,1,0,1), \\ & (0,1,0,1,0,1,0,0,1,0,1), (0,2,0,0,0,0,1,0,0,1,1), \\ & (1,1,0,2,0,0,0,1,0,0,1), (0,0,1,1,1,0,0,0,1,0,1), \\ & (1,2,0,1,0,0,0,0,0,1,1), (0,1,1,0,0,0,1,0,1,0,1), \\ & (1,1,1,1,0,0,0,0,1,0,1), (0,1,1,2,1,0,0,0,0,0,1), \\ & (1,2,1,2,0,0,0,0,0,0,1), (2,0,0,0,0,0,0,1,3,1,2), \\ & (1,0,1,0,0,0,0,0,2,0,1), (0,1,0,1,1,0,0,0,0,1,1), \\ & (0,0,2,0,1,0,1,0,3,0,2), (0,0,0,0,0,1,0,0,2,0,1), \\ & (0,2,1,1,0,0,1,0,0,0,1), (0,1,0,1,0,0,1,1,0,0,1) \}. \end{split}$$ Since $\mathbf{c} = \mathbf{0}$, $$\mathcal{B} = \emptyset.$$

II. From the above solutions, we see that the w-values are 1 and 2. That implies $S = \{1, 2\}$ and $S' = \emptyset$. Therefore, we have p = 2, q = 2.

III. The prime universal switch boxes are a (2,4,2,4)-USB $U_0$ (serving both as the $2\mathbf{d}$-USB and as the $(r\mathbf{d}+\mathbf{c})$-USB with r=2) and a (1,2,1,2)-USB $U_1$ (the $(r\mathbf{d}+\mathbf{c})$-USB with r=1). Fig. 4a shows a (2,4,2,4)-USB $U_0$ and a (1,2,1,2)-USB $U_1$. It is easy to see that both are optimal with $lb(r_1,r_2,r_3,r_4)$ switches. The routing tables are omitted.

IV. For $w \ge 3$, let $B = U_1 + \frac{w-1}{2}U_0$ when w is odd and $B = \frac{w}{2}U_0$ when w is even. By Theorem 6, B is an optimal (w, 2w, w, 2w)-USB. $U_1 + U_0$ and $2U_0$ are shown in Fig. 4b and Fig. 4c, respectively.

Next, we present the experimental results with VPR. In this experimental work, we compare the entire-chip routability of an FPGA adopting optimal irregular switch boxes with that of an FPGA using other random but basically reasonable irregular switch boxes.
Direct experimental comparisons with other previous work are not available since the result given in [1] concerns global routing only and the switch density used in [12] is quite different from ours. We compare the optimal (w, 2w, w, 2w)-USBs with disjoint-like (w, 2w, w, 2w)-SBs, where a disjoint-like (w, 2w, w, 2w)-SB is defined by the switch set: $$\begin{split} &\{v_{1,j}v_{3,j}:j=1,\ldots,w\} \cup \{v_{2,j}v_{4,j}:j=1,\ldots,2w\} \\ &\cup \{v_{1,j}v_{2,j}:j=1,\ldots,w\} \cup \{v_{4,j}v_{1,j}:j=1,\ldots,w\} \\ &\cup \{v_{2,w+j}v_{3,w-j+1}:j=1,\ldots,w\} \\ &\cup \{v_{4,w+j}v_{3,w-j+1}:j=1,\ldots,w\}. \end{split}$$

We revise the well-considered, efficient, and fair FPGA router VPR [2] and run large MCNC benchmark circuits for our routing experiments. The logic block structure for our VPR runs is set to consist of one 4-input LUT and one flip-flop. The input or output pin of the logic block is able to connect to any track in the adjacent channels, i.e., $F_c = w$ (or 2w for wide sides). Fig. 5a shows the structure of the disjoint-like switch box with w=4 and Fig. 5b illustrates our proposed optimal switch box structure. As shown in Table 1, FPGAs adopting the optimal (w, 2w, w, 2w)-USBs can save about 6 percent of channel resources in this experiment.

Fig. 5. Structures of S-boxes (w=4). (a) Disjoint-like. (b) Optimal design.

TABLE 1. Comparison of VPR Experimental Results on Channel Density w between Disjoint-Like (w,2w,w,2w)-SBs and Our Optimal (w,2w,w,2w)-USBs

| Circuit | Disjoint-like | Optimal Design | Circuit | Disjoint-like | Optimal Design |
|----------|---------------|----------------|----------|---------------|----------------|
| alu4 | 7 | 7 | ex5p | 11 | 10 |
| apex2 | 8 | 8 | frisc | 10 | 9 |
| apex4 | 10 | 9 | misex3 | 9 | 8 |
| bigkey | 5 | 5 | s298 | 6 | 6 |
| clma | 9 | 9 | s38417 | 6 | 5 |
| des | 6 | 5 | s38584.1 | 6 | 6 |
| diffeq | 6 | 6 | seq | 9 | 8 |
| dsip | 5 | 5 | spla | 10 | 10 |
| elliptic | 10 | 9 | tseng | 5 | 5 |
| ex1010 | 8 | 7 | e64 | 6 | 6 |
| Total | | | | 152 | 143 (-6.3%) |

## 7 CONCLUSIONS

The major contribution of this paper is the extension of the reduction design technique for regular switch boxes to the most general, arbitrary-shaped switch boxes. By introducing the concepts of a net pattern set $\mathcal{P}$ and $\mathcal{P}$-universal, we are able to unify the known universal and hyperuniversal switch boxes and also to provide the most general platform for designing any switch boxes (arbitrary-shaped) with any special type of routing requirements. To achieve this, we first modeled a routing requirement as a nonnegative integer solution of a system of linear equations. Then, a decomposition theory was established based on the theory of systems of linear Diophantine equations. The decomposition theory enabled us to develop a reduction design scheme for arbitrary-shaped switch boxes. That is, for any fixed nonnegative integer vectors $\mathbf{d}$ and $\mathbf{c}$, the design of $\mathcal{P}$-universal $(w\mathbf{d} + \mathbf{c})$-SBs is reduced to the design of a finite number (depending only on k and $\mathcal{P}$) of small prime switch boxes such that a $\mathcal{P}$-universal $(w\mathbf{d}+\mathbf{c})$-SB can be obtained by a disjoint union of two types of the prime switch boxes with a linear number of switches. As any $(r_1,\ldots,r_k)$-SB can be expressed as a $(w\mathbf{d}+\mathbf{c})$-SB by a proper selection of the vectors $\mathbf{d}$ and $\mathbf{c}$, a powerful design scheme for any optimal arbitrary-shaped $\mathcal{P}$-universal switch box is developed.
In addition, a switch box designed by using this scheme can be routed efficiently and can be easily laid out. We hope this general theory and scheme will have some impact, theoretically and practically, on the design of switch boxes and interconnection networks in the future.

### **ACKNOWLEDGMENTS**

The authors are indebted to the referees for many valuable suggestions. This research was partially supported by the Natural Sciences and Engineering Research Council of Canada, and a Hong Kong Government RGC Earmarked Grant.

## REFERENCES

- [1] V. Betz and J. Rose, "Directional Bias and Non-Uniformity in FPGA Global Routing Architectures," Proc. IEEE/ACM Int'l Conf. Computer-Aided Design, pp. 652-659, 1996.
- [2] V. Betz and J. Rose, "A New Packing, Placement and Routing Tool for FPGA Research," Proc. Seventh Int'l Workshop Field-Programmable Logic and Applications, pp. 213-222, 1997.
- [3] V. Betz, J. Rose, and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs. Boston: Kluwer Academic, 1999.
- [4] S. Brown, R. Francis, J. Rose, and Z. Vranesic, Field Programmable Gate Arrays. Boston: Kluwer Academic, 1992.
- [5] Y.-W. Chang, D.F. Wong, and C.K. Wong, "Universal Switch Modules for FPGA Design," ACM Trans. Design Automation of Electronic Systems, vol. 1, no. 1, pp. 80-101, Jan. 1996.
- [6] E. Contejean and H. Devie, "An Efficient Incremental Algorithm for Solving Systems of Linear Diophantine Equations," *Information and Computation*, vol. 113, no. 1, pp. 143-172, 1994.
- [7] A. DeHon, "Balancing Interconnect and Computation in a Reconfigurable Computing Array," Proc. ACM/SIGDA Seventh Int'l Symp. Field Programmable Gate Arrays, pp. 69-78, 1999.
- [8] H. Fan, J. Liu, Y. Wu, and C.C. Cheung, "On Optimum Switch Box Designs for 2-D FPGAs," Proc. Design Automation Conf. (DAC-01), pp. 203-208, 2001.
- [9] H. Fan, J. Liu, Y. Wu, and C.C. Cheung, "On Optimal Hyper Universal and Rearrangeable Switch Box Designs," *IEEE Trans. Computer-Aided Design*, vol. 22, no. 12, pp. 1637-1649, Dec. 2003.
- [10] H. Fan, J. Liu, Y. Wu, and C. Wong, "Reduction Design for Generic Universal Switch Blocks," ACM Trans. Design Automation of Electronic Systems, vol. 7, no. 4, pp. 526-546, Oct. 2002.
- [11] H. Fan, J. Liu, and Y.L. Wu, "General Models and a Reduction Design Technique for FPGA Switch Box Designs," IEEE Trans. Computers, vol. 52, no. 1, pp. 21-30, Jan. 2003.
- [12] P. Hallschmid and S. Wilton, "Detailed Routing Architectures for Embedded Programmable Logic IP Cores," Proc. ACM/SIGDA Int'l Symp. Field-Programmable Gate Arrays, pp. 69-74, Feb. 2001.
- [13] J. Rose and S. Brown, "Flexibility of Interconnection Structures for Field-Programmable Gate Arrays," *IEEE J. Solid State Circuits*, vol. 26, no. 3, pp. 277-282, Mar. 1991.
- [14] M. Shyu, G.M. Wu, Y.D. Chang, and Y.W. Chang, "Generic Universal Switch Blocks," *IEEE Trans. Computers*, vol. 49, no. 4, pp. 348-359, Apr. 2000.
- [15] M. Yen, S. Chen, and S. Lan, "A Three-Stage One-Sided Rearrangeable Polygonal Switching Network," *IEEE Trans. Computers*, vol. 50, no. 11, pp. 1291-1294, Nov. 2001.

Hongbing Fan received the PhD degree in computer science from the University of Victoria in 2003 and joined Wilfrid Laurier University in 2004. Previously, he received the BS degree in mathematics and the PhD degree in operational research and control theory from Shandong University in 1982 and 1990, respectively. He has worked at Shandong University, the Chinese University of Hong Kong, and the University of Lethbridge. His current research interests are interconnection networks, algorithms for CAD of VLSI, network on-chip-based System-on-Chip applications, combinatorial algorithms, and complexity. He is a member of the IEEE.
Yu-Liang (David) Wu received the BS degree and MS degree in computer science from Florida International University of Miami in 1983 and 1984, respectively. He received the PhD degree in electrical and computer engineering from the University of California at Santa Barbara in 1994. He worked at Internet Systems Corp., AT&T Bell Labs, Amdahl Corporation, and Cadence Design Systems Inc. before he joined the Chinese University of Hong Kong in January 1996. His current research interests mostly relate to optimization of logic and physical design automation of VLSI circuits and FPGA related CAD tool designs and architectural analysis/optimization. He is a member of the IEEE.

Ray Chak-Chung Cheung received the BEng degree and MPhil degree in computer engineering and computer science and engineering from The Chinese University of Hong Kong (CUHK) in 1999 and 2001, respectively. In 2001, he worked as a system administrator in the Center of Large-Scale Computation (CLC) of Cluster Technology, Hong Kong. From January 2002 to December 2003, he was an instructor in the Department of Computer Science and Engineering, CUHK. He is now a PhD candidate in the Custom Computing Group, Department of Computing, Imperial College London. His current research interests are computer arithmetic hardware designs and design exploration of System-on-Chip (SoC) designs and embedded systems. He is a student member of the IEEE and the ACM and is the newsletter and Web editor of the SIGDA UK chapter.

Jiping Liu received the BS degree and MS degree in mathematics from Shandong University in 1982 and 1986, respectively, and the PhD degree in combinatorics and graph theory from Simon Fraser University in 1992. He joined the University of Lethbridge in 1995. His research interests were in various optimum design problems in VLSI, graph algorithms and complexities, special classes of graphs, graph decompositions, partitions, and factorizations and Hamiltonian related properties of certain graphs.
He recently died as the result of an automobile accident.