# On Optimal Hyperuniversal and Rearrangeable Switch Box Designs

Hongbing Fan, Jiping Liu, Yu-Liang Wu, Member, IEEE, and Chak-Chung Cheung, Student Member, IEEE

*Abstract*: This paper explores theories on designing optimal multipoint interconnection structures and proposes a simple switch box design scheme that can be directly applied to field programmable gate arrays (FPGAs), switch box designs, and communication switching network designs. We present a new hyperuniversal switch box design with four sides and W terminals on each side, which is routable for every multipin net routing requirement. This new design is proved to be optimum for W = 1, ..., 5 and close to optimum for W ≥ 6, using at most $6.\overline{3}W$ switches. We also give a formal analysis and extensive benchmark experiments comparing the routability of today's best-known FPGA switch boxes, namely disjoint switch blocks (Xilinx XC4000 type), Wilton's switch blocks, universal switch blocks, and our hyperuniversal switch boxes. We apply the design scheme to rearrangeable switching network designs targeting applications that connect multiple terminals (e.g., teleconferencing). Simply by using a k-sided hyperuniversal switch block with a $W \times W$ crossbar attached to each side, one can build a three-stage one-sided polygonal switching network capable of realizing every multipoint connection requirement on kW terminals. Moreover, owing to the fine-grained decomposition property of our design scheme, the new switch box designs are highly scalable and simple in physical layout and routing algorithm implementation.

*Index Terms*: Field programmable gate arrays (FPGA), hyper-rearrangeable, hyperuniversal, routings, switch box, switching network.

#### I. INTRODUCTION

SWITCH BOXES, also called switch modules, switch blocks, or switching networks in other literature, are basic components of reconfigurable interconnection networks.
A switch box basically consists of terminals (ports) and prefabricated programmable switches for connecting these terminals. The function of a switch box is to implement a given routing requirement using its programmable switches. In field programmable gate arrays (FPGAs), switch boxes are key components, which determine the routability and the area efficiency of an FPGA chip [6], [11]. In a circuit-switching-based communication network, such as a traditional telephone network [1], [2], switch boxes are used to set up physical connections between communicating parties. A switch box with k sides and W terminals on each side, denoted by (k, W)-SB, consists of bidirectional programmable switches connecting terminals on different sides. (4, W)-SBs are the key switch modules in island-style two-dimensional (2-D) FPGA architectures [5], [6], [9], [15], [17]. Fig. 1 shows such an FPGA architecture using (4, 4)-SBs.

Fig. 1. Switch boxes in 2-D FPGA.

Manuscript received October 10, 2002; revised May 28, 2003. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada and in part by the Hong Kong Government RGC earmarked under Grant CUHK4236/01E, Direct Grant CUHK2050244, and Direct Grant NSFC/RGC02 2900304. This paper was recommended by Associate Editor M. Sarrafzadeh. H. Fan is with the Department of Computer Science, University of Victoria, Victoria, BC V8W 3P6, Canada (e-mail: hfan@cs.uvic.ca). J. Liu is with the Department of Mathematics and Computer Science, University of Lethbridge, Lethbridge, AB T1K 3M4, Canada (e-mail: liu@cs.uleth.ca). Y.-L. Wu and C.-C. Cheung are with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong (e-mail: ylw@cse.cuhk.edu.hk; cccheung@cse.cuhk.edu.hk). Digital Object Identifier 10.1109/TCAD.2003.819430

Routability and area efficiency are two important issues in switch box design.
Routability of a switch box is its capability to realize all kinds of routing requirements, while area efficiency can be measured by the number of switches employed. These two goals conflict. A *complete* (k, W)-SB (i.e., one having a switch between every pair of terminals on different sides) has the highest routability. However, it has the lowest area efficiency and a high fabrication cost, and it is impractical to lay out when k and W are large. One design goal is therefore an optimum switch box: one that is routable for all given routing requirements and uses a minimum number of switches, without causing much layout complication. To address the tradeoff between chip-level routability and area efficiency, Rose and Brown [5] introduced a useful measure called *flexibility*, denoted by $F_s$, which is the maximum number of switches connected to a terminal in the switch box. They investigated the relationship between flexibility and routability and observed that (4, W)-SBs with $F_s = 3$ or $4$ yield a good tradeoff between the number of switches and routability.

Fig. 2. Routing requirements and feasible routings for various (4, 3)-SBs.

However, as there are many different designs for switch boxes with the same flexibility, it is important to analyze the routability differences among them and to find the optimum designs. Fig. 2(a)–(d) shows four representative FPGA switch box structures. The Disjoint (4, W)-SB [Fig. 2(a)], a disjoint union of W complete (4, 1)-SBs, is used in Xilinx XC4000-type FPGAs. Fig. 2(b) gives the so-called Wilton's (4, 3)-SB [26], which is a nondecomposable switch box. It has been shown experimentally that nondecomposable switch box designs may cause some layout complications. Chang et al. [9] proposed a decomposable design, called universal switch modules, and better routability was achieved both theoretically and experimentally.
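The switch count and the flexibility $F_s$ can be computed directly once a switch box is modeled as a graph with terminals as vertices and switches as edges. The Python sketch below is a minimal illustration; the function names and the (side, track) terminal encoding are our own assumptions, not taken from [5] or [9]. It builds the Disjoint (k, W)-SB as W disjoint copies of the complete (k, 1)-SB and reports its switch count and $F_s$.

```python
from collections import Counter
from itertools import combinations

def disjoint_switch_box(k, w):
    """Disjoint (k, W)-SB: W disjoint copies of the complete (k, 1)-SB.

    Terminals are (side, track) pairs with sides 1..k and tracks 1..W
    (an encoding assumed for this sketch); each switch is a frozenset of two terminals.
    """
    switches = set()
    for track in range(1, w + 1):
        # Complete (k, 1)-SB on this track: one switch per pair of sides.
        for a, b in combinations(range(1, k + 1), 2):
            switches.add(frozenset({(a, track), (b, track)}))
    return switches

def flexibility(switches):
    """F_s: the maximum number of switches incident to any single terminal."""
    degree = Counter(t for switch in switches for t in switch)
    return max(degree.values())

sb = disjoint_switch_box(4, 3)
print(len(sb), flexibility(sb))  # 18 3
```

For k = 4, each track contributes $\binom{4}{2} = 6$ switches, giving 6W in total (18 for W = 3, the same count as every (4, 3)-SB in Fig. 2) with $F_s = 3$. The same two functions apply unchanged to any other switch box given as an edge set.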
A (k, W)-SB is said to be *universal* [or a (k, W)-USB] if it is routable for every 2-pin net routing requirement satisfying the routing constraint, i.e., the number of nets incident to each side is at most W. In [9], the so-called symmetric (4, W)-USBs, denoted by $M_{4,W}$, were proposed and proved to be universal. It was also proved that $M_{4,W}$ is an optimal (4, W)-USB with 6W switches. Fig. 2(c) shows a (4, 3)-USB that is isomorphic to $M_{4,3}$. To remove the 2-pin net limitation, Fan *et al.* [19]–[21] generalized the notion of universality to *hyperuniversality* by allowing multipin nets. A (k, W)-SB is said to be hyperuniversal if it is routable for every set of multipin nets satisfying the routing constraint. A hyperuniversal (k, W)-SB is denoted by (k, W)-HUSB. Fig. 2(d) presents our new (4, 3)-HUSB design. Fig. 2(e)–(l) shows feasible routings of some routing requirements in the corresponding switch boxes. It is interesting to note that all four (4, 3)-SBs shown in Fig. 2 contain eighteen switches, but their routing capacities are not the same. In this paper, we formally prove their unequal routabilities. We show in Section II-E that the (4, W)-USB has higher routability than the Disjoint (4, W)-SB, that the routability of Wilton's (4, W)-SB is not comparable with that of either the Disjoint (4, W)-SB or the (4, W)-USB, and that the (4, W)-HUSB has the highest routability, i.e., better routability than any of the other three (4, W)-SBs. These results clearly suggest that, besides the number of fabricated switches, the connection topology of a switch box plays an important role in determining its routability. Balancing the objectives of routability and simplicity of design and fabrication, a systematic reduction design scheme for general (k, W)-HUSBs was proposed in [19]–[21]. In this design scheme, for any given k, we need only design (k, r)-HUSBs for a few values of r, called prime k-HUSBs.
Then we use the prime k-HUSBs to build all other (k, W)-HUSBs by the disjoint-union operation. This scheme guarantees hyperuniversality for any W while maintaining good scalability and a small number of switches. As a result, the complicated HUSB design problem is reduced to designing a few prime k-HUSBs, each of which is small. This constructive design scheme not only provides a set of well-structured and scalable HUSBs, but also makes the implementation of the routing algorithm and the chip layout easy [20]. In the case of k = 4, the prime 4-HUSBs are the (4, r)-HUSBs for r = 1, 2, 3, 4, 5, 6, 7; that is, there are seven prime 4-HUSBs. In [19]–[21], a set of prime 4-HUSBs was given, and a class of (4, W)-HUSBs with 6.67W switches composed of these prime 4-HUSBs was constructed. In this paper, we further show that, although the number of possible routing cases increases dramatically from 2-pin nets to multipin nets, it is possible to build (4, W)-HUSBs using only a few more switches than (4, W)-USBs [23]. For example, an optimum (4, 4)-USB has 24 switches, while an optimum (4, 4)-HUSB is shown to have 25 switches. By definition, a (k, W)-HUSB must be a (k, W)-USB, but the converse is not true. Since 6W, the number of switches used in an optimum (4, W)-USB, is not sufficient for general (4, W)-HUSBs (e.g., an optimum (4, 4)-HUSB requires $6 \times 4 + 1 = 25$ switches), 6W can only be considered a trivial *loose lower bound* for optimum (4, W)-HUSBs. In this paper, we present a new set of prime 4-HUSBs, which is optimum for $W = 1, \ldots, 5$ and has 6W + 2 switches for W = 6 and 6W + 1 switches for W = 7. Using this new set of prime 4-HUSBs to build larger (4, W)-HUSBs, the switch count of the new design is improved to at most $6.\overline{3}W$, which is quite close to the loose bound (6W).
Moreover, for the practical range of W values, only a few more switches are needed to make the currently known nonhyperuniversal switch boxes hyperuniversal by using our design scheme. We give a complete proof of the hyperuniversality of the new prime 4-HUSB designs, followed by extensive FPGA routing experiments that demonstrate their routability improvement, even when an entire chip routing is exercised. To make this complicated formal proof manageable, we use the decomposition theory developed in previous works [19]–[21] and some new simplification techniques. For a fair experimental comparison, we run VPR [27] on benchmarks for Disjoint (4, W)-SBs, Wilton's (4, W)-SBs, (4, W)-USBs, and our new (4, W)-HUSBs (with switches trimmed down to the same number for fair comparison). The improvement for the entire chip routing is also demonstrated.

Fig. 3. Three-stage one-sided rearrangeable switching network (k = 4, W = 3) capable of realizing all possible multipoint interconnection requirements, and a realization for the connection requirement $\{\{1,4,5\},\{2,6,7,11,12\},\{3,8\},\{9,10\}\}$.

Dynamic (reconfigurable) switching networks have been widely used in many applications, including multiprocessor parallel processing, telecommunications, etc. To reduce the number of switches, a multistage structure is needed at the cost of more switching delay. A two-sided (input, output) switching network is rearrangeable if it can realize any permutation between input and output terminals; however, terminals on the same side may not be connectable. To support any set of simultaneous point-to-point (2-point) connections among all terminals, a three-stage one-sided rearrangeable polygonal switching network (PSN), which makes use of the 2-point universal connectivity of USBs, has recently been proposed [18].
Similarly, we can design a hyperuniversal rearrangeable switch box (HRSB) that allows simultaneous multipoint connections. Using a (k, W)-HUSB as the central component and a $W \times W$ crossbar attached to each side, we build a three-stage one-sided (k, W)-HRSB capable of realizing any multipoint connection requirement on the kW terminals. In Fig. 3, we show a (4, 3)-HRSB and the realization of a multipoint connection requirement $\{\{1,4,5\},\{2,6,7,11,12\},\{3,8\},\{9,10\}\}$. If we attach crossbars to h of the k sides, we obtain a so-called (h, k, W)-HRSB, which can be used to build improved greedy routing architectures (GRAs) [10]. Besides the guaranteed hyperuniversality, the simplicity and decomposable construction of our proposed design scheme are of equal significance. In our design scheme, a (4, W)-HUSB with large W is built from fine-grained prime 4-HUSBs, which makes the physical layout and routing algorithm design as simple as those of a Disjoint (4, W)-SB and a (4, W)-USB.

Fig. 4. Examples of a track-free routing requirement and a completed routing.

The rest of the paper is organized as follows. In Section II, we first formally define track-free routing requirements and hyperuniversal switch boxes. Then, we briefly describe the reduction design method and present the new prime 4-HUSBs. In Section III, we address track-fixed routing requirements and present our designs for (4, W)-HRSBs and their application to improved GRAs. We present our experiments in Section IV and give conclusions in Section V. The formal proof for the prime 4-HUSBs is presented in the Appendix; feasible routing tables for the prime 4-HUSBs can be derived directly from the proof.

#### II. HYPERUNIVERSAL SWITCH BOXES

A net is a set of terminals (pins) that need to be interconnected. A *routing requirement* around a switch box is a set of nets, which is also termed a *global routing* in some literature.
A *feasible routing* (or detailed routing) for a routing requirement is a realization of all of the nets in the routing requirement. A proper mathematical model of routing requirements is important for solving optimal switch box design problems. In this paper, we consider two kinds of routing requirements: *track-free* and *track-fixed*. In this section, we investigate track-free routing requirements and the associated hyperuniversal switch box design problem, and present our new set of prime 4-HUSB designs for (4, W)-HUSBs.

#### A. Track-Free Routing Requirements

In a (track-free) routing requirement for a (k, W)-SB, only the sides of the terminals are specified, while the actual tracks used in a feasible routing are decided by the router. Fig. 4(a) shows a (4, 4)-SB, where each of the four sides has four terminals (tracks), each assigned a unique track ID (1–4). A net is called a t-pin net if it requests connecting t terminals on t different sides. For instance, the net $N_2$ in Fig. 4(b) is a 3-pin net; it requires connecting three terminals on sides 1, 2, and 4. In a track-free routing requirement, it is up to the router to decide which terminals are actually assigned. Fig. 4(c) gives a feasible routing for the routing requirement with seven track-free nets shown in Fig. 4(b). In the following text, unless stated otherwise, track-free is the default condition for routing requirements, and the word track-free is omitted for brevity.

In general, a (track-free) routing requirement for a (k, W)-SB is a set of nets satisfying the channel-density constraint, i.e., the number of nets incident to every side is no more than W. Our first step in switch box design is to model a track-free routing requirement as a collection of subsets of $\{1, 2, \dots, k\}$ [19], [21]. We label the k sides of the (k, W)-SB by $1, 2, \ldots, k$, respectively.
A t-pin net, which requests connecting t terminals on t different sides labeled $i_1, i_2, \ldots, i_t$, is represented by $\{i_1, i_2, \dots, i_t\}$. Thus, a routing requirement for a (k, W)-SB is a collection of subsets of $\{1, \ldots, k\}$ such that each $i \in \{1, 2, \ldots, k\}$ appears in at most W nets. A one-pin net (a net of size 1) corresponds to a singleton $\{i\}$, which does not need any switch for realization. For simplicity, we add singletons to a routing requirement so that every $i \in \{1, 2, \dots, k\}$ appears in exactly W nets of the resulting routing requirement.

A (k, W)-SB can be viewed as a graph G whose vertices are the terminals and whose edges are the switches. The stage of a switch box is the maximum, over all pairs of terminals on different sides, of the number of edges in a shortest path joining them. If we denote the jth terminal on side i by $v_{i,j}$ and let $V_i = \{v_{i,j} \mid j = 1, \ldots, W\}$, $i = 1, \ldots, k$, then a one-stage (k, W)-SB corresponds to a k-partite graph with parts $V_1, \ldots, V_k$ and an edge $v_{i,j}v_{i',j'}$ whenever there is a switch connecting $v_{i,j}$ and $v_{i',j'}$. Next, we formally define a routing requirement and a feasible routing of a routing requirement in a switch box.

Definition 1: A collection $\{N_i \mid i = 1, \ldots, l\}$ of subsets of $\{1, 2, \ldots, k\}$ is said to be a k-way routing requirement of density d, written as (k, d)-RR, if each $i \in \{1, 2, \ldots, k\}$ appears in exactly d subsets of the collection. A (k, d)-RR is said to be a primitive (k, d)-RR, written as (k, d)-PRR, if it does not contain two singletons $\{x\}$ and $\{y\}$ such that $x \neq y$. A (k, d)-RR R is said to be a subrouting requirement of a (k, d')-RR R' if R is a subcollection of R'; it is proper if, in addition, d < d'. A (k, d)-RR is said to be a minimal (k, d)-RR, written (k, d)-MRR, if it does not contain a proper subrouting requirement.
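Definition 1 is purely combinatorial, so it can be checked mechanically. The Python sketch below is a minimal illustration under our own assumptions, not the decomposition procedure of [19]–[21]: a routing requirement is a list of Python sets of side labels (a list, because identical nets may repeat), and all function names are hypothetical. It pads a requirement with singletons as described above, tests density and primitivity, and splits a (k, d)-RR into minimal sub-RRs by exhaustive search, which is exponential and only meant for small examples.

```python
from collections import Counter
from itertools import combinations

def side_counts(nets, k):
    """Number of nets incident to each side 1..k."""
    counts = Counter(side for net in nets for side in net)
    return [counts[side] for side in range(1, k + 1)]

def density(nets, k):
    """Return d if the nets form a (k, d)-RR (every side in exactly d nets), else None."""
    values = set(side_counts(nets, k))
    return values.pop() if len(values) == 1 else None

def pad_with_singletons(nets, k, w):
    """Add singletons {i} until every side i appears in exactly W nets."""
    padded = list(nets)
    for side, count in zip(range(1, k + 1), side_counts(nets, k)):
        if count > w:
            raise ValueError(f"side {side} violates the density constraint W={w}")
        padded.extend({side} for _ in range(w - count))
    return padded

def is_primitive(nets):
    """Primitive: no two singletons {x}, {y} with x != y."""
    singleton_sides = {next(iter(net)) for net in nets if len(net) == 1}
    return len(singleton_sides) <= 1

def decompose(nets, k):
    """Split a (k, d)-RR into minimal sub-RRs by exhaustive search (small inputs only).

    Each round removes the smallest subcollection with a uniform density d >= 1;
    being smallest, it contains no proper subrouting requirement. The remainder
    is again an RR of lower density, so the loop terminates.
    """
    rest = list(nets)
    parts = []
    while rest:
        chosen = None
        for size in range(1, len(rest) + 1):
            chosen = next((idx for idx in combinations(range(len(rest)), size)
                           if (density([rest[i] for i in idx], k) or 0) >= 1), None)
            if chosen is not None:
                break
        if chosen is None:
            raise ValueError("input is not a (k, d)-RR")
        parts.append([rest[i] for i in chosen])
        rest = [net for i, net in enumerate(rest) if i not in chosen]
    return parts

prr = [{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 3}, {3, 4}, {1, 2}, {4}, {4}]
print(density(prr, 4), is_primitive(prr))           # 4 True
print([density(p, 4) for p in decompose(prr, 4)])   # [1, 1, 2]
```

On this (4, 4)-PRR, the search returns $\{\{1,2\},\{3,4\}\}$, $\{\{1,4\},\{2,3\}\}$, and $\{\{1,3\},\{2,3\},\{1,2\},\{4\},\{4\}\}$, i.e., three 4-MPRRs of densities 1, 1, and 2.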
We write k-RR (k-PRR, k-MRR, k-MPRR) for a k-way (primitive, minimal, minimal primitive) routing requirement. For example, $\{\{1,2\},\{1,3\},\{1,4\},\{2,3\},\{2,3\},\{3,4\},\{1\},\{2\},\{4\},\{4\}\}$ is a (4, 4)-RR. It can be converted to a (4, 4)-PRR by replacing $\{1\},\{2\}$ with $\{1,2\}$, giving $\{\{1,2\},\{1,3\},\{1,4\},\{2,3\},\{2,3\},\{3,4\},\{1,2\},\{4\},\{4\}\}$. This can then be decomposed into three 4-MPRRs: $\{\{1,2\},\{3,4\}\}$, $\{\{1,4\},\{2,3\}\}$, and $\{\{1,2\},\{2,3\},\{1,3\},\{4\},\{4\}\}$.

Definition 2: A feasible routing of a (k, W)-RR $R = \{N_i \mid i = 1, \ldots, l\}$ in a (k, W)-SB G is a set $\{T(N_i) \mid i = 1, \ldots, l\}$ of mutually vertex-disjoint subgraphs of G satisfying: 1) $T(N_i)$ is a tree with $|N_i|$ vertices, and 2) $|V_j \cap V(T(N_i))| = 1$ if $j \in N_i$, for $i = 1, \ldots, l$. Here, $N_i$ is called a net, or an $|N_i|$-pin net (or a multipin net if $|N_i| \geq 3$), and $T(N_i)$ is called a routing of $N_i$ in G. We say that G is routable for R if R has a feasible routing in G. By the above definitions, the (4, 4)-RR shown in Fig. 4(b) is $\{N_1, \ldots, N_7\} = \{\{1, 2\}, \{1, 2, 4\}, \{1, 3\}, \{1, 4\}, \{2, 3, 4\}, \{2, 3\}, \{3, 4\}\}$. Fig. 4(a) shows a one-stage (4, 4)-SB, and a corresponding feasible routing in this switch box is shown in Fig. 4(c).

#### B. Hyperuniversal Switch Box and Design Method

A (k, W)-SB is said to be *hyperuniversal* if it has a feasible routing for every (k, W)-RR. The HUSB design problem is, for a fixed k, to design an optimum (k, W)-HUSB for every $W \ge 1$, where an optimum design is one with the minimum number of switches among all (k, W)-HUSBs. The model of universal switch blocks was first proposed in [9] and was extensively studied in [12] for generalized designs. The problem is further investigated in [22] and [24]. A (k, W)-SB is *universal* (or a (k, W)-USB) if it is routable for every track-free 2-pin net (k, W)-RR.
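Definition 2, on which the notions of hyperuniversal and universal switch boxes rest, can also be checked mechanically for a one-stage switch box. The Python sketch below is illustrative only. The function names and the encoding are assumptions of this sketch: terminals are (side, track) pairs, switches are two-element frozensets, and the routing of each net is given as its vertex set plus the switches it uses. The sketch tests conditions 1) and 2) and the mutual vertex-disjointness of the trees.

```python
def _is_connected(vertices, edges):
    """Depth-first search over the given edges, restricted to `vertices`."""
    adjacency = {v: set() for v in vertices}
    for edge in edges:
        u, v = tuple(edge)
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == vertices

def is_feasible_routing(switches, nets, routing):
    """Check Definition 2 for a one-stage switch box.

    switches: set of frozenset({u, v}); terminals are (side, track) pairs.
    nets:     list of sets of sides (the N_i).
    routing:  list of (vertices, edges) pairs, one per net (the T(N_i)).
    """
    if len(nets) != len(routing):
        return False
    used = set()
    for net, (vertices, edges) in zip(nets, routing):
        # Condition 2): exactly one terminal on each side of the net, none elsewhere.
        if sorted(side for side, _ in vertices) != sorted(net):
            return False
        # The subgraphs T(N_i) must be mutually vertex-disjoint.
        if used & vertices:
            return False
        used |= vertices
        # Condition 1): |N_i| - 1 existing switches inside the vertex set, connected, hence a tree.
        if len(edges) != len(net) - 1:
            return False
        if not all(edge in switches and edge <= vertices for edge in edges):
            return False
        if not _is_connected(vertices, edges):
            return False
    return True
```

Checking hyperuniversality itself would require running such a test (together with a router) over every (k, W)-RR, which is why the reduction design method described next is needed.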
The difference between HUSBs and USBs is clear from the definitions: the former are routable for multipin nets, while the latter are only guaranteed to be routable for 2-pin nets. Thus, a (k, W)-HUSB must be a (k, W)-USB, but the converse is not true in general. In [19]–[21], a decomposition theory and a reduction design technique were proposed for designing (k, W)-HUSBs. The decomposition theory states that, for a fixed k, the number of minimal k-RRs is finite, and a (k, W)-RR can always be decomposed into a union of minimal k-RRs. As a result, a (k, W)-HUSB can be constructed from a finite number of small k-HUSBs. In other words, we can design a small number (depending only on k) of prime k-HUSBs and then use them to build any k-sided HUSB by disjoint union. For k = 4, there are seven prime 4-HUSBs, namely the (4, r)-HUSBs for $r = 1, \ldots, 7$.

#### C. New Prime HUSB Designs

Fig. 5 shows our new set of prime 4-sided HUSBs.

*Theorem 1:* The following statements are true for $H_i$.

- 1) $H_i$ is an optimum (4, *i*)-HUSB, for i = 1, 2, 3, 4, 5.
- 2) $H_6$ is hyperuniversal with 38 switches; it exceeds the optimum by at most two switches.
- 3) $H_7$ is hyperuniversal with 43 switches; it exceeds the optimum by at most one switch.

The proof of this theorem is technical and lengthy; therefore, the complete proof is given in the Appendix. Among the proposed prime HUSBs, $H_3$ is a particularly *perfect* example in terms of its hyperuniversal property: it has exactly the same number of switches as the Disjoint (4, 3)-SB, the symmetric (4, 3)-USB $M_{4,3}$ [9], and Wilton's (4, 3)-SB [26], yet none of these designs except $H_3$ is hyperuniversal.

Fig. 5. Prime 4-HUSB designs, which are constructed from fine-grained $H_1$ to $H_3$ and are highly scalable.

Compared to the nonhyperuniversal designs, at most one or two extra switches are used in a prime 4-HUSB. $H_4$ is a disjoint union of two $H_2$s plus one extra switch.
It has 25 switches, only one more than the optimum (4, 4)-USB. Since $H_4$ is an optimum (4, 4)-HUSB, this shows that 6W switches do not suffice for (4, W)-HUSBs in general and that a USB is not, in general, an HUSB. Similarly, $H_5$ is a union of one $H_2$ and one $H_3$, and $H_6$ is a union of two $H_3$s plus two bridge switches. $H_6$ has 38 switches, only two more than the loose lower bound of 6W (36). $H_7$ is a union of one $H_3$ and one $H_4$; it has 43 switches, only one more than the corresponding (4, 7)-USB (42). It is quite hard to prove formally whether $H_6$ and $H_7$ are optimum HUSBs. However, because their switch counts are so close to the lower bound, we conjecture that both $H_6$ and $H_7$ are optimum.

#### D. General (4, W)-HUSBs

By the reduction design method [19]–[21], using $H_i$, $i = 1, \ldots, 7$, as prime building blocks, we construct general (4, W)-HUSBs G(W) as follows:

$$G(W) = \begin{cases} H_1, & \text{if } W = 1\\ hH_6, & \text{if } W = 6h\\ (h-1)H_6 + H_7, & \text{if } W = 6h+1\\ hH_6 + H_2, & \text{if } W = 6h+2\\ hH_6 + H_3, & \text{if } W = 6h+3\\ hH_6 + H_4, & \text{if } W = 6h+4\\ hH_6 + H_5, & \text{if } W = 6h+5. \end{cases}$$

where $hH_6 + H_i$ means a disjoint union of h copies of $H_6$ and one $H_i$.

TABLE I
ROUTABLE AND NON-ROUTABLE CASES. PLEASE REFER TO TABLE III FOR THE DEFINITION OF $GR^k_{i,j}$

| Routable? | Disjoint (4,3)-SB | Wilton's (4,3)-SB | Universal (4,3)-SB | HUSB (4,3)-SB |
|---|---|---|---|---|
| $2GR_{1,2}^1 + GR_{2,2}^1$ | YES | NO | YES | YES |
| $GR_{1,3}^2 + GR_{1,2}^1$ | NO | YES | YES | YES |
| $GR_{1,1}^{3}$ | NO | YES | NO | YES |
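The case analysis defining G(W) is easy to mechanize. The Python sketch below (hypothetical names) builds the multiset of prime blocks used for a given W and totals their switches. The per-block counts come from Theorem 1 and the unions described above ($H_3$: 18, $H_4 = 2H_2 + 1$: 25, $H_6$: 38, $H_7$: 43); $H_2 = 12$ and $H_5 = H_2 + H_3 = 30$ are derived from those relations, and $H_1 = 6$ is the complete (4, 1)-SB, which meets the 6W lower bound. The sketch can be used to spot-check the closed form for |E(G(W))| stated in the text.

```python
from fractions import Fraction

# Switch counts of the prime 4-HUSBs H_1..H_7 (derived as explained in the lead-in).
PRIME_SWITCHES = {1: 6, 2: 12, 3: 18, 4: 25, 5: 30, 6: 38, 7: 43}

def compose(w):
    """Prime blocks of G(W) as {i: number of copies of H_i}, following the case table."""
    if w < 1:
        raise ValueError("W must be at least 1")
    if w == 1:
        return {1: 1}
    h, r = divmod(w, 6)
    if r == 0:
        return {6: h}
    if r == 1:                      # W = 6h + 1 with h >= 1
        return {6: h - 1, 7: 1}
    return {6: h, r: 1}             # W = 6h + r, r in {2, 3, 4, 5}

def switch_count(w):
    """|E(G(W))|: G(W) is a disjoint union, so switch counts simply add."""
    return sum(PRIME_SWITCHES[i] * copies for i, copies in compose(w).items())

def closed_form(w):
    """The piecewise formula for |E(G(W))| (valid for W >= 2)."""
    offset = {0: 0, 1: Fraction(-4, 3), 2: Fraction(-2, 3),
              3: -1, 4: Fraction(-1, 3), 5: Fraction(-5, 3)}
    return Fraction(19, 3) * w + offset[w % 6]

print([switch_count(w) for w in range(1, 9)])  # [6, 12, 18, 25, 30, 38, 43, 50]
print(compose(13), switch_count(13))            # {6: 1, 7: 1} 81
```

For every W from 2 to 60, the block totals agree with the closed form and lie between 6W and $6.\overline{3}W$, in line with Theorem 2. The case W = 1 uses $H_1$ and is therefore outside the $W \equiv 1 \pmod 6$ row of the formula.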
The number of switches of G(W) is

$$|E(G(W))| = \begin{cases} \frac{19}{3}W, & \text{if } W \equiv 0 \pmod{6} \\ \frac{19}{3}W - \frac{4}{3}, & \text{if } W \equiv 1 \pmod{6},\ W \ge 7 \\ \frac{19}{3}W - \frac{2}{3}, & \text{if } W \equiv 2 \pmod{6} \\ \frac{19}{3}W - 1, & \text{if } W \equiv 3 \pmod{6} \\ \frac{19}{3}W - \frac{1}{3}, & \text{if } W \equiv 4 \pmod{6} \\ \frac{19}{3}W - \frac{5}{3}, & \text{if } W \equiv 5 \pmod{6}, \end{cases}$$

where the case W = 1 is excluded from the second row because G(1) = $H_1$. To summarize, we have the following theorem.

Theorem 2: G(W) is hyperuniversal, with a number of switches between 6W and $6.\overline{3}W$.

#### E. Routing Analysis of Various Switch Boxes

We compare the routabilities of Disjoint (4, W)-SBs, (4, W)-USBs, Wilton's (4, W)-SBs, and (4, W)-HUSBs. Table I shows the exceptional (4, 3)-PRRs for the four different switch boxes. In general, a (4, W)-RR that is routable in the Disjoint (4, W)-SB must be decomposable into W (4, 1)-PRRs; thus, it is also routable in the (4, W)-USB. A minimal nondecomposable (4, 3)-PRR is not routable in the (4, W)-USB. We note from Table I that Wilton's (4, W)-SB cannot route some PRRs that are routable in Disjoint SBs, but it can route some (4, W)-PRRs that are routable in neither the Disjoint (4, W)-SB nor the (4, W)-USB. Let RR(G) denote the set of (4, W)-RRs that are routable in switch box G. We have the following general relations when $W \geq 3$:

$$RR(\text{Disjoint } (4,W)\text{-SB}) \subset RR((4,W)\text{-USB}) \subset RR((4,W)\text{-HUSB}),$$

$$RR(\text{Wilton's } (4,W)\text{-SB}) \subset RR((4,W)\text{-HUSB}).$$

However, $RR(\text{Wilton's } (4,W)\text{-SB})$ has no inclusion relationship with $RR(\text{Disjoint } (4,W)\text{-SB})$ or $RR((4,W)\text{-USB})$. This implies that the HUSB has the highest routing capacity. Another advantage of our HUSB design is that a large (4, W)-HUSB is a disjoint union of smaller HUSBs. This makes our design scalable and easy to implement, though the HUSB design requires a few extra switches.

#### III. HYPERREARRANGEABLE SWITCH BOXES

In this section, we investigate track-fixed routing requirements and the associated switch box design problems. We use the (4, W)-HUSBs presented in the last section to build switch boxes for the targeted applications.

Fig. 6. Example of track-fixed routing requirement and its feasible routing.

#### A. Track-Fixed Routing Requirements

The routing requirements discussed so far are track-free, which means there is no prescribed terminal (track) assignment for routers to follow. In contrast, a nontrack-free (track-fixed) routing requirement specifies both the sides and the terminals for certain nets. Fig. 6(a) shows a track-fixed routing requirement, where the numbers at the net ends indicate the terminal (track) IDs preassigned for the routings. Fig. 6(c) shows a feasible routing on the switch box shown in Fig. 6(b). Track-fixed routing requirements were used in the design of h-side track-fixed (predetermined) switch boxes for GRAs [13], where tracks on h given sides are fixed in the routing requirements. Track-fixed routing requirements can also be used to model interconnect requirements in communication networks. Here, we assume that the routing requirements are valid, i.e., channel-density constraints are met and there are no conflicting track assignments. Similarly, the h-sided-track-fixed (k, W)-SB design problem [10] is the problem of designing a (k, W)-SB that is routable for all (k, W)-routing requirements that are track-fixed on h given sides and track-free on the remaining sides.

#### B. Designs of Hyperrearrangeable Switch Boxes

Dynamic (reconfigurable) switching networks have been widely used in many applications, including multiprocessor parallel processing, telecommunications, etc. [18]. To reduce the number of switches, a multistage structure is needed at the cost of more switching delay.
A two-sided switching network is rearrangeable if it can realize any permutation between the terminals of the two sides [1], [2], [18]. In [18], a three-stage one-sided rearrangeable polygonal switching network PSN(n, m, s) is proposed, which can route any all-track-fixed 2-pin net routing requirement. The PSN(n, m, s) consists of an (s, m)-USB as the second stage and s crossbars CB(n, m) of size $n \times m$, one attached to each side, forming the first and third stages. Compared to a (one-stage) s-sided-track-fixed (s, m)-SB, a PSN(n, m, s) uses fewer switches at the cost of more switching delay. However, because all of these designs target point-to-point connection models, they are not suited to multipoint connection requirements (e.g., teleconference applications). Using the results on HUSBs developed above, we can design an HRSB that allows simultaneous multipoint connections.

Fig. 7. (h, 4, 4)-HRSB designs for $h = 0, \dots, 4$.

The basic idea of our design for an h-sided-track-fixed (h, k, W)-HRSB is to use a (k, W)-HUSB as the central component and attach h crossbars CB(W, W) to the specified h sides. Fig. 7 shows the structure of the design. Let R be an h-sided-track-fixed (k, W)-routing requirement. We can find a feasible routing of R in two steps.

- 1) Let R' be the track-free (k, W)-RR induced by R, i.e., obtained by changing all track-number specifications to "don't care." Since the central component is a (k, W)-HUSB, R' has a feasible routing in the (k, W)-HUSB.
- 2) For each track-fixed side, permute the terminals through the CB(W, W) so that the input terminal tracks meet the track-fixed specification.

Hence, a three-stage switch box constructed in this way is routable for all routing requirements that are track-fixed on the h specified sides. Based on Theorem 1 and the complete permutation capability of CB(W, W), we obtain Theorem 3 below.
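Step 2 relies only on CB(W, W) realizing any permutation. The Python sketch below illustrates it for a single track-fixed side. It assumes that step 1 has already produced, for every net touching that side, the HUSB terminal (track) the net uses, and that every net on a track-fixed side has a prescribed track; the function name and the dictionary encoding are hypothetical, and the HUSB routing of step 1 itself is not shown. Tracks without a prescribed net are paired in increasing order to complete the permutation.

```python
def crossbar_setting(w, internal, external):
    """Permutation {external track: internal track} for one CB(W, W).

    internal: {net: track used by the net on this side of the central HUSB} (step 1)
    external: {net: track prescribed for the net on this side} (track-fixed spec)
    Assumes every net in `internal` also appears in `external` and vice versa.
    """
    if len(set(external.values())) != len(external):
        raise ValueError("conflicting track assignments in the requirement")
    if len(set(internal.values())) != len(internal):
        raise ValueError("step-1 routing is not vertex-disjoint on this side")
    perm = {ext: internal[net] for net, ext in external.items()}
    # Tracks carrying no net are paired in increasing order to complete the permutation.
    taken = set(perm.values())
    free_ext = [t for t in range(1, w + 1) if t not in perm]
    free_int = [t for t in range(1, w + 1) if t not in taken]
    perm.update(zip(free_ext, free_int))
    return perm

setting = crossbar_setting(4, internal={"a": 2, "b": 4, "c": 1},
                           external={"a": 1, "b": 3, "c": 4})
print(dict(sorted(setting.items())))  # {1: 2, 2: 3, 3: 4, 4: 1}
```

Because any bijection is realizable in CB(W, W), the only constraints are the two checked at the top: prescribed tracks must not collide, and the step-1 routing must use distinct HUSB terminals.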
Theorem 3: The proposed (h, 4, W)-HRSBs are routable for any h-side-track-fixed (4, W)-routing requirement.

In particular, using a (k, W)-HUSB as the central component with a $W \times W$ crossbar attached to each side, we can build an efficient three-stage one-sided (k, k, W)-HRSB capable of realizing any all-track-fixed multipoint connection requirement on the $k \times W$ terminals. Fig. 8(a) and (c) show a (4, 4)-track-fixed routing requirement and its routing in a three-stage (4, 4, 4)-HRSB, where the central component is an $H_4$ and the peripherals are crossbars CB(4, 4).

Fig. 8. Example of a (4, 4, 4)-HRSB and its realization for a track-fixed routing requirement of 7 nets.

#### C. Designs of Multistage Switch Boxes for Improved GRAs

In [10], Wu *et al.* investigated the problem of designing h-side-track-fixed (k, W)-SBs, where terminals on certain h sides have been preassigned. These h-side-track-fixed switch boxes were originally developed for a kind of hypothetical FPGA structure called GRAs, which possess a unique property: a local (around an SB) detailed routing can be greedily extended into an entire chip routing [10], [13], [17]. Since realizing the entire chip routing for a given global routing is not known to be solvable in polynomial time [16], this GRA routing property is useful. Fig. 9 shows the H-tree GRA and the snake-like GRA. In this scheme, a routing process starts from a prespecified switch box and follows a specified order (e.g., either spiral or snake-like [13]). Upon completion of the last local routing, and without changing any routing done previously, an entire chip routing is completed. Consequently, a routing problem for the entire chip can be greedily decomposed into a sequence of localized optimum h-side-track-fixed routing problems, for which the optimum h-side-track-fixed switch boxes are designed.
This raised the h-side-track-fixed k-sided switch box design problems ($0 \le h \le k$). References [10] and [13] solved the cases for $k \leq 4$. However, since such switch boxes require a large number of switches, GRAs do not seem practical for today's FPGA applications. Nonetheless, if such h-side-track-fixed switch boxes are allowed to be implemented in multiple stages, the number of switches can be reduced. For example, a (4, 4, W)-HRSB under the new design scheme can be implemented with $4W^2 + 6.\overline{3}W$ switches, compared to the $6W^2$ switches required by a single-stage design. The following table compares the switch counts and flexibilities of the h-side-track-fixed switch boxes built in three stages ((h, 4, W)-HRSB) and in a single stage ((h, 4, W)-SB, from [10]).

| h | (h, 4, W)-HRSB switches | $F_s$ | (h, 4, W)-SB switches [10] | $F_s$ |
|---|---|---|---|---|
| 0 | $6.\overline{3}W$ | 4 | NA | NA |
| 1 | $W^2 + 6.\overline{3}W$ | W+3 | $1.5W^2 + 3W$ | W+3 |
| 2 | $2W^2 + 6.\overline{3}W$ | W+3 | $3.5W^2 + 2W$ | 3W |
| 3 | $3W^2 + 6.\overline{3}W$ | W+4 | $5W^2 + W$ | 3W |
| 4 | $4W^2 + 6.\overline{3}W$ | W+4 | $6W^2$ | 3W |

As most applications use 4-way switch boxes, our multistage h-side-track-fixed (h, 4, W)-HRSBs provide a family of switch boxes that solve the h-side-track-fixed routing problems with fewer switches once W is not too small (by the counts in the table, W ≥ 7 suffices for every h). Moreover, as our design scheme is constructive, we only need to use $H_6$s and at most one $H_i$ for $i = 2, 3, 4, 5, 7$. In addition, an efficient routing algorithm can be designed for the switch boxes produced by this scheme [25].

Fig. 9. Two GRAs, which can greedily extend a locally optimum routing to an optimum entire-chip routing following the extension sequence shown (panels: routing for the H-tree architecture; routing for the spiral routing architecture).

#### IV. EXPERIMENTAL RESULTS

It is always debatable whether a switch box that is locally optimal will also be globally optimal when a routing requirement involving the entire chip is considered. Formal routability analysis covering the entire chip has rarely been done, partly because of its intractable complexity and partly because of variable factors: there are very long nets and very short nets, and their distribution can be application dependent. A quick way to obtain some reference points is to resort to benchmark experiments, although the results may still be router dependent.

Besides the theoretical analysis, to obtain some experimental comparisons, we adopt the currently best-known FPGA router, VPR [27], which is available on the Web, for our experiments. The logic block for our VPR runs consists of one four-input LUT and one flip-flop. The input or output pins of the logic block can connect to any track in the adjacent channels, i.e., $F_c = W$. Inside a switch box, each input wire segment can connect to three output wire segments of other channels, i.e., $F_s = 3$. To have a fair comparison with the well-known Disjoint structure (and partly because of the $F_s = 3$ limitation of the VPR router), we deliberately eliminate the "additional" bridge switches of our HUSBs, so that our H'USBs have 6W switches, the same as Disjoint S-boxes. Fig. 10 shows the structure of a hyperuniversal S-box and a routing result of our experiments.

Fig. 10. Routing result of e64 using the H'USB S-box, W = 7.

In Table II, we compare the number of tracks required to route some of the larger Microelectronics Center of North Carolina (MCNC) benchmark circuits [28] using Disjoint (4, W)-SBs, Wilton's (4, W)-SBs, (4, W)-USBs, and our (4, W)-H'USBs.
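For reference, the Disjoint (Xilinx XC4000-type) switch box used as the baseline can be written down as a graph in which track t on each side connects only to track t on the other three sides. The sketch below is our own illustration, not VPR code; it builds this edge list for a (4, W)-SB and checks the two properties the comparison relies on, 6W switches and $F_s = 3$. Labeling terminals as (side, track) with sides 1 to 4 and tracks 0 to W - 1 is an assumption, as are the function names.

```python
from collections import Counter
from itertools import combinations


def disjoint_switch_box(w: int) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Edges (switches) of a Disjoint (4, W)-SB.

    Terminal (s, t) is track t on side s. Track t of every side is joined to
    track t of each other side, so each track index forms a K4 on the 4 sides.
    """
    edges = []
    for t in range(w):
        for s1, s2 in combinations(range(1, 5), 2):  # the 6 pairs of sides
            edges.append(((s1, t), (s2, t)))
    return edges


def flexibility(edges) -> Counter:
    """Number of switches incident to each terminal (its F_s)."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    w = 7
    e = disjoint_switch_box(w)
    print(len(e), set(flexibility(e).values()))  # 42 {3}
```

The graph splits into W independent components, one per track index, which is why a net entering a Disjoint box on track t can only leave on track t; this is the "decomposable" structure that the H'USBs trade for better routability at the same 6W switch count.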
It is observed that, except for the most decomposable Disjoint SBs, the other three switch boxes achieve similarly improved results: about 10% fewer tracks with 35 router iterations and about 5% fewer tracks with 100 router iterations. The differences among these three are probably not statistically significant, since the router design still influences the results. (Moreover, since VPR uses simulated-annealing-based placement, its results can vary, and ours may differ slightly from other reported VPR results.) It appears that improving the routability of local switch boxes can also help entire-chip routing, and that balancing layout simplicity and design scalability (where Disjoint (4, W)-SBs may be best) against routability is an engineering issue worth careful study.

#### V. CONCLUSION

From the combinatorial analysis above, we obtain an important result: any multipin routing requirement can be decomposed into minimal subrouting requirements for any given number of sides. Therefore, the complicated optimum hyperuniversal switch box design problem can be treated constructively. We found that for some values of W there exist 4-sided HUSBs with only 6W switches (the number used in Disjoint (4, W)-SBs, Wilton's (4, W)-SBs, and (4, W)-USBs); however, only the (4, W)-HUSBs can route all routing requirements. It is encouraging that only a few more switches are needed to make today's nonhyperuniversal switch boxes hyperuniversal for the practical range of W values. Nonetheless, as observed in the construction of some optimum (4, W)-HUSBs, general optimum (4, W)-HUSBs hardly seem to possess the strict regularity or high scalability of 4-sided Disjoint or Universal switch boxes.
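The switch counts discussed in this paper can be tabulated with a few lines of exact arithmetic. The sketch below assumes the nominal $6.\overline{3}W = 19W/3$ switches for the central (4, W)-HUSB (for a particular W the actual construction from $H_6$ blocks may differ slightly), adds $hW^2$ for the h attached $W \times W$ crossbars of an (h, 4, W)-HRSB, and compares the result against the single-stage counts from [10] listed in the comparison table of the subsection on multistage switch boxes for GRAs, and against the 6W lower bound for a (4, W)-HUSB (six pairs of sides, each needing at least W switches). All names are illustrative.

```python
from fractions import Fraction

# Nominal switches per track of the central (4, W)-HUSB: 6.333... = 19/3 (assumed exact here).
HUSB_PER_TRACK = Fraction(19, 3)

# Single-stage (h, 4, W)-SB switch counts from [10], as listed in the comparison table.
SINGLE_STAGE = {
    1: lambda w: Fraction(3, 2) * w * w + 3 * w,
    2: lambda w: Fraction(7, 2) * w * w + 2 * w,
    3: lambda w: 5 * w * w + w,
    4: lambda w: 6 * w * w,
}


def hrsb_switches(h: int, w: int) -> Fraction:
    """Three-stage (h, 4, W)-HRSB: h crossbars of W*W switches plus the central HUSB."""
    return h * w * w + HUSB_PER_TRACK * w


def husb_lower_bound(w: int) -> int:
    """Each of the C(4, 2) = 6 pairs of sides needs at least W switches."""
    return 6 * w


if __name__ == "__main__":
    for w in (4, 8, 16):
        rows = [(h, float(hrsb_switches(h, w)), float(SINGLE_STAGE[h](w))) for h in range(1, 5)]
        print(w, rows)
    # For h = 4 and N = 4W terminals: 4W^2 + 19W/3 = N^2/4 + (19/12)N.
    n = 4 * 6
    assert hrsb_switches(4, 6) == Fraction(n * n, 4) + Fraction(19, 12) * n
```

Under these nominal counts the three-stage design needs fewer switches than the single-stage one for every h once W ≥ 7, while for small W the single-stage box can be smaller (for h = 4 and W = 3, 55 versus 54 switches). Writing N = 4W for h = 4 gives $4W^2 + 19W/3 = N^2/4 + (19/12)N \approx N^2/4 + 1.58N$.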
To maintain high scalability for layout simplicity while still achieving excellent optimality, in this paper we present a new class of (4, W)-HUSBs, which improves the previous designs in [19]–[21] by reducing the number of switches from 6.67W to $6.\overline{3}W$. We proved that this new design is optimum for $W = 1, \ldots, 5$ and near optimum for $W \geq 6$.

Hyperrearrangeable switching networks for multiple-terminal connections are useful for many of today's practical applications (e.g., teleconferencing). By simply using a k-sided HUSB as the central component and attaching a $W \times W$ crossbar to each side, we build a three-stage one-sided polygonal switching network that can realize simultaneous connections for any partition of the kW terminals.

As with many other problems, tough open questions remain. Although we can show that, for a four-sided switch module, the number of switches used by this HRSB is $4W^2 + 6.\overline{3}W = (N^2/4) + (19/12)N \approx (N^2/4) + 1.58N$, where $N = 4W$ is the total number of terminals, the optimum number of switches for a given number of terminals is still under investigation. We will also explore applications of the techniques developed in this paper to other similar problems.

#### APPENDIX: PROOF OF THEOREM 1

To make the proof easy to verify, we redraw the $H_i$s so that a feasible routing can be checked easily on the diagram. Fig. 11 shows the new drawing of the $H_i$s.

#### A. Transformation, Simplification, and Decomposition

If a (4, r)-RR R is not primitive, then we can combine the unequal nets of size 1 (single-terminal nets on different sides) into nets of size 2 to obtain a (4, r)-PRR R'. Any feasible routing of R' induces a feasible routing of R by simply deleting the edges of the trees that represent those nets of size 2 in R' obtained by combining unequal nets of size 1 in R. Therefore, to verify that a (4, r)-SB is hyperuniversal, we only need to show that it is routable for all (4, r)-PRRs.
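The transformation above, combining unequal nets of size 1 into nets of size 2, can be sketched as follows. A routing requirement is modeled as a list of nets, each a set of side labels from {1, 2, 3, 4}; this representation, the greedy pairing order, and our reading of "primitive" (no two size-1 nets remain on different sides) are assumptions made for illustration. Size-1 nets on the same side are never combined, and the per-side densities are unchanged by the transformation.

```python
from collections import Counter


def density(nets, side):
    """Number of nets that need a terminal on the given side."""
    return sum(side in net for net in nets)


def make_primitive(nets):
    """Combine size-1 nets on different sides into size-2 nets.

    `nets` is a list of sets of side labels. Returns (primitive_nets, pairs),
    where `pairs` lists the size-2 nets created by combining; a routing of the
    result gives one for the input after deleting the edges used by `pairs`.
    """
    others = [frozenset(n) for n in nets if len(n) != 1]
    singles = Counter(next(iter(n)) for n in nets if len(n) == 1)
    pairs = []
    while sum(1 for c in singles.values() if c > 0) >= 2:
        # Pair the two sides that currently hold the most size-1 nets.
        (a, _), (b, _) = singles.most_common(2)
        singles[a] -= 1
        singles[b] -= 1
        pairs.append(frozenset({a, b}))
    leftovers = [frozenset({s}) for s, c in singles.items() for _ in range(c)]
    return others + pairs + leftovers, pairs


if __name__ == "__main__":
    req = [{1}, {1}, {2}, {3}, {2, 4}]
    prr, created = make_primitive(req)
    print(prr, created)
```

Any pairing that stops only when all remaining size-1 nets sit on one side yields a primitive requirement; pairing the two fullest sides first is just one convenient choice.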
We generate all (4, r)-PRRs as disjoint unions of minimal 4-PRRs. This method generates all (4, r)-PRRs without missing any case, but it may repeat some, because the decomposition of a (4, r)-PRR into minimal 4-PRRs is not unique in general. Table III gives all minimal 4-PRRs [19]–[21], where $GR_{j,k}^i$ denotes the kth PRR of type j of density i.

It has been proven that $H_1$ and $H_2$ are optimum HUSBs [19]–[21]. Since every (4, 5)-PRR can be decomposed into a union of 4-MPRRs of channel densities 1, 2, and 3, a (4, 5)-PRR can be regrouped into one (4, 2)-PRR and one (4, 3)-PRR. Hence, $H_5 = H_2 + H_3$ is hyperuniversal provided that $H_2$ and $H_3$ are both hyperuniversal. Similarly, since a (4, 7)-PRR can always be decomposed into one (4, 3)-PRR and one (4, 4)-PRR, $H_7 = H_3 + H_4$ is hyperuniversal provided that $H_3$ and $H_4$ are both hyperuniversal. Therefore, we only need to prove that $H_3$, $H_4$, and $H_6$ are hyperuniversal.

Fig. 11. Alternative representation of prime 4-sided HUSBs.

TABLE II
Channel Widths Required (by VPR Router) for Different Benchmark Circuits, $F_c = W$, $F_s = 3$ (the number in parentheses is the number of VPR router iterations)

| Circuit | Disjoint (35) | USB (35) | Wilton's (35) | H'USB (35) | Disjoint (100) | USB (100) | Wilton's (100) | H'USB (100) |
|---|---|---|---|---|---|---|---|---|
| alu4 | 10 | 10 | 10 | 10 | 10 | 10 | 9 | 9 |
| apex2 | 12 | 11 | 11 | 11 | 11 | 11 | 10 | 10 |
| apex4 | 13 | 12 | 12 | 12 | 12 | 11 | 11 | 11 |
| bigkey | 7 | 7 | 6 | 7 | 6 | 6 | 6 | 6 |
| clma | 13 | 11 | 11 | 11 | 11 | 11 | 11 | 11 |
| des | 8 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| diffeq | 8 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| dsip | 7 | 7 | 7 | 7 | 6 | 6 | 6 | 6 |
| elliptic | 11 | 10 | 10 | 10 | 10 | 9 | 9 | 9 |
| ex1010 | 11 | 10 | 10 | 10 | 10 | 9 | 10 | 10 |
| ex5p | 14 | 13 | 13 | 13 | 13 | 12 | 12 | 12 |
| frisc | 13 | 12 | 12 | 12 | 12 | 11 | 11 | 11 |
| misex3 | 11 | 11 | 10 | 11 | 10 | 10 | 10 | 10 |
| pdc | 17 | 16 | 16 | 17 | 16 | 15 | 16 | 15 |
| s298 | 8 | 7 | 7 | 7 | 7 | 6 | 7 | 6 |
| s38417 | 8 | 7 | 7 | 8 | 7 | 7 | 7 | 7 |
| s38584.1 | 8 | 8 | 8 | 8 | 7 | 7 | 7 | 7 |
| seq | 12 | 11 | 11 | 11 | 11 | 11 | 11 | 10 |
| spla | 14 | 14 | 13 | 13 | 13 | 12 | 12 | 12 |
| tseng | 7 | 6 | 6 | 7 | 6 | 6 | 6 | 6 |
| e64 | 8 | 8 | 8 | 8 | 8 | 7 | 7 | 7 |
| Total | 230 | 205 | 202 | 207 | 200 | 191 | 192 | 189 |
| Reduction | | -10.09% | -12.17% | -10% | | -4.5% | -4.0% | -5.5% |

#### B. $H_3$ is an Optimal (4, 3)-HUSB

It is sufficient to show that $H_3$ has a feasible routing for every (4, 3)-PRR obtained by combining the 4-MPRRs in Table III. Let R be such a (4, 3)-PRR. The side permutation $\sigma = (1,4)(2,3)$ clearly induces an automorphism of $H_3$ (reflection about the central vertical line). Hence, if R is routable in $H_3$, then the (4, 3)-PRR $\sigma(R)$ is also routable in $H_3$. Therefore, we only need to consider (4, 3)-PRRs that are not equivalent under $\sigma$.

Case 1) R is one of the $GR_{i,j}^3$. Since $GR_{2,1}^3$ and $GR_{2,2}^3$ (respectively, $GR_{2,3}^3$ and $GR_{2,4}^3$) are equivalent under $\sigma$, it is sufficient to consider $GR_{1,1}^3$, $GR_{2,1}^3$, and $GR_{2,3}^3$. Their feasible routings are given in Fig. 12, in which the numbers are the side labels of the associated vertices.

Case 2) R consists of three (4, 1)-MPRRs.

**Subcase 2.1.** R does not contain a $GR_{2,3}^1$. In this case, we consider three subgraphs $B_1$, $B_2$, and $B_3$ of $H_3$: $B_1$ consists of the lower level, $B_2$ consists of the left half of the upper two levels, and $B_3$ consists of the right half of the upper two levels [see Fig. 13(i)]. Each subgraph can route any one of $\{GR_{i,j}^1 : (i,j) \neq (2,3)\}$. Therefore, R has a feasible routing in $H_3$.

**Subcase 2.2.** $R = 3GR_{2,3}^1$. A feasible routing is given in Fig. 13(a).
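The symmetry argument can be checked mechanically on the density-1 entries of Table III. The sketch below applies the side permutation $\sigma = (1,4)(2,3)$ to each $GR_{i,j}^1$, groups the entries into $\sigma$-orbits, and confirms that each entry uses every side exactly once. Names such as `GR1`, `apply_sigma`, and `sigma_orbits` are our own.

```python
# Density-1 minimal 4-PRRs from Table III; each net is a set of side labels.
GR1 = {
    "GR1_11": [{1, 2, 3, 4}],
    "GR1_21": [{1, 2}, {3, 4}],
    "GR1_22": [{1, 3}, {2, 4}],
    "GR1_23": [{1, 4}, {2, 3}],
    "GR1_31": [{1}, {2, 3, 4}],
    "GR1_32": [{2}, {1, 3, 4}],
    "GR1_33": [{3}, {1, 2, 4}],
    "GR1_34": [{4}, {1, 2, 3}],
}

SIGMA = {1: 4, 4: 1, 2: 3, 3: 2}  # sigma = (1,4)(2,3)


def canon(req):
    """Order-independent form of a routing requirement (a multiset of nets)."""
    return tuple(sorted(tuple(sorted(net)) for net in req))


def apply_sigma(req, perm=SIGMA):
    """Relabel every side of every net by the permutation."""
    return [{perm[s] for s in net} for net in req]


def densities(req):
    """Number of nets using each side 1..4."""
    return [sum(s in net for net in req) for s in (1, 2, 3, 4)]


def sigma_orbits(reqs):
    """Group named requirements into orbits under the (involutive) permutation."""
    by_form = {canon(r): name for name, r in reqs.items()}
    orbits, seen = [], set()
    for name, req in reqs.items():
        if name in seen:
            continue
        image = by_form[canon(apply_sigma(req))]
        orbit = sorted({name, image})
        seen.update(orbit)
        orbits.append(orbit)
    return orbits


if __name__ == "__main__":
    print(sigma_orbits(GR1))
```

It shows, for example, that $\sigma$ fixes $GR_{2,3}^1$ and swaps $GR_{3,1}^1$ with $GR_{3,4}^1$, so a routing is needed for only one entry of each orbit.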
**Subcase 2.3.** R contains exactly one $GR_{2,3}^1$. If R contains a P that is one of $GR_{2,3}^1 + GR_{2,2}^1$, $GR_{2,3}^1 + GR_{3,2}^1$, $GR_{2,3}^1 + GR_{2,1}^1$, and $GR_{2,3}^1 + GR_{1,1}^1$, then we can route P as shown in Fig. 13(b)–(e). The unused part of $H_3$ can route any one of $\{GR_{1,1}^1, GR_{2,1}^1, GR_{2,2}^1, GR_{3,1}^1, GR_{3,2}^1\}$. This proves that R is routable in $H_3$. Now let R contain $GR_{2,3}^1 + GR_{3,1}^1$. We may assume that $GR_{2,2}^1$ is not in R. Then, we route $GR_{2,3}^1 + GR_{3,1}^1$ as shown in Fig. 13(f). The unused part of $H_3$ can route any $GR_{i,j}^1$ except $GR_{2,3}^1$.

**Subcase 2.4.** R contains two $GR_{2,3}^1$s. If we route $2GR_{2,3}^1$ as shown in Fig. 13(g), then the unused top level can route any of $GR_{1,1}^1$, $GR_{2,1}^1$, $GR_{3,1}^1$, and $GR_{3,2}^1$. For $R = 2GR_{2,3}^1 + GR_{2,2}^1$, a feasible routing is given in Fig. 13(h).

TABLE III
ALL MINIMAL PRIMITIVE 4-WAY ROUTING REQUIREMENTS

$$
\begin{aligned}
&GR_{1,1}^1 = \{\{1,2,3,4\}\}, \quad GR_{2,1}^1 = \{\{1,2\},\{3,4\}\}, \quad GR_{2,2}^1 = \{\{1,3\},\{2,4\}\}, \quad GR_{2,3}^1 = \{\{1,4\},\{2,3\}\}, \\
&GR_{3,1}^1 = \{\{1\},\{2,3,4\}\}, \quad GR_{3,2}^1 = \{\{2\},\{1,3,4\}\}, \quad GR_{3,3}^1 = \{\{3\},\{1,2,4\}\}, \quad GR_{3,4}^1 = \{\{4\},\{1,2,3\}\}, \\
&GR_{1,1}^2 = \{\{1,2,3\},\{2,3,4\},\{1,4\}\}, \quad GR_{1,3}^2 = \{\{1,2,4\},\{2,3,4\},\{1,3\}\}, \\
&GR_{1,5}^2 = \{\{1,2,3\},\{1,3,4\},\{2,4\}\}, \quad GR_{1,6}^2 = \{\{1,2,4\},\{1,3,4\},\{2,3\}\}, \\
&GR_{2,1}^2 = \{\{1,2,4\},\{1,3\},\{2\},\{3,4\}\}, \quad GR_{2,3}^2 = \{\{2,3,4\},\{1,4\},\{2\},\{1,3\}\}, \\
&GR_{2,4}^2 = \{\{1,2,3\},\{1,4\},\{3\},\{2,4\}\}, \quad GR_{2,6}^2 = \{\{2,3,4\},\{1,2\},\{4\},\{1,3\}\}, \\
&GR_{2,9}^2 = \{\{1,2,4\},\{1,3\},\{4\},\{2,3\}\}, \quad GR_{3,1}^2 = \{\{1,2\},\{1,4\},\{2,4\},\{3\},\{3\}\}, \quad \ldots
\end{aligned}
$$

[The remaining entries of Table III are illegible in the source.]

Case 3) R contains a $GR_{i,j}^2$. It is sufficient to consider the following $GR_{i,j}^2$s, which are not equivalent under $\sigma$:

$$\begin{aligned} & GR_{1,1}^2, GR_{1,2}^2, GR_{1,3}^2, GR_{1,6}^2, GR_{2,1}^2, GR_{2,2}^2, \\ & GR_{2,3}^2, GR_{2,7}^2, GR_{2,8}^2, GR_{2,9}^2, GR_{3,1}^2, GR_{3,2}^2 \end{aligned}$$

If R does not contain $GR_{2,3}^1$, then we route R in $H_3$ as in Fig. 14, in which each diagram shows a feasible routing of one of the above $GR_{i,j}^2$s in the top two levels, and the lower level is used to route any $GR_{i,j}^1$ except $GR_{2,3}^1$. If R contains a $GR_{2,3}^1$, then we route R as in Fig. 15. We conclude that $H_3$ is routable for all (4, 3)-PRRs. Hence, $H_3$ is a (4, 3)-HUSB.
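The routing checks in Figs. 12–15 are done by hand on the diagrams. For very small graphs, the same question (does a requirement have a feasible routing?) can be answered by brute force: assign each vertex to at most one net, require every net to own a terminal on each of its sides, and require each net's vertex set to induce a connected subgraph, which then contains a tree for that net. The sketch below does this for a toy switch box with one terminal per side, the cycle 1, 2, 4, 3, 1 that reappears as the unused part of $H_4$ in the proof for $H_4$. The graph encoding, the function names, and allowing a net's tree to pass through otherwise unused terminals are our assumptions; the search is exponential and only meant for a handful of vertices.

```python
from itertools import product


def is_connected(vertices, adj):
    """True if `vertices` induce a connected subgraph (empty set counts as connected)."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    stack, seen = [start], {start}
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in vertices and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def feasible_routing(side_of, edges, nets):
    """Brute-force search for vertex-disjoint trees realizing `nets`.

    side_of: dict vertex -> side label; edges: vertex pairs (switches);
    nets: list of sets of side labels. Returns a dict vertex -> net index
    (None = unused), or None if no feasible routing exists.
    """
    adj = {v: set() for v in side_of}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    verts = list(side_of)
    choices = [None] + list(range(len(nets)))
    for assign in product(choices, repeat=len(verts)):
        ok = True
        for i, net in enumerate(nets):
            group = [v for v, a in zip(verts, assign) if a == i]
            # The net needs a terminal on each of its sides and a connected tree.
            if not net <= {side_of[v] for v in group} or not is_connected(group, adj):
                ok = False
                break
        if ok:
            return dict(zip(verts, assign))
    return None


GR1 = {
    "GR1_11": [{1, 2, 3, 4}], "GR1_21": [{1, 2}, {3, 4}],
    "GR1_22": [{1, 3}, {2, 4}], "GR1_23": [{1, 4}, {2, 3}],
    "GR1_31": [{1}, {2, 3, 4}], "GR1_32": [{2}, {1, 3, 4}],
    "GR1_33": [{3}, {1, 2, 4}], "GR1_34": [{4}, {1, 2, 3}],
}

# Toy switch box: one terminal per side (vertex s lies on side s), cycle 1-2-4-3-1.
CYCLE_SIDES = {1: 1, 2: 2, 3: 3, 4: 4}
CYCLE_EDGES = [(1, 2), (2, 4), (4, 3), (3, 1)]

if __name__ == "__main__":
    for name, req in GR1.items():
        print(name, feasible_routing(CYCLE_SIDES, CYCLE_EDGES, req) is not None)
```

On this cycle every density-1 requirement except $GR_{2,3}^1$ is routable, which matches the claim made for the unused part of $H_4$.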
$H_3$ has 18 switches, which equals the lower bound of $6 \times 3$ on the number of switches of a (4, 3)-HUSB. Therefore, $H_3$ is an optimum (4, 3)-HUSB.

Fig. 12. Feasible routings of $GR_{i,j}^3$ in $H_3$.

Fig. 13. Feasible routings of three $GR_{i,j}^1$s in $H_3$.

Fig. 14. Feasible routings of $GR_{i,j}^2 + GR_{h,k}^1$ in $H_3$.

Fig. 15. Feasible routings of $GR_{i,j}^2 + GR_{2,3}^1$ in $H_3$.

#### C. $H_4$ is an Optimum (4, 4)-HUSB

$H_4$ has 25 switches. We first show that $H_4$ is hyperuniversal, and then show that there is no (4, 4)-HUSB with 24 switches.

Let R be any (4, 4)-PRR that is a union of minimal 4-PRRs. Since $H_4$ is a union of two copies of $H_2$ plus one extra switch, R is routable in $H_4$ when it is a union of two (4, 2)-PRRs. Therefore, we only need to consider the case in which R is a union of a $GR_{i,j}^1$ and a $GR_{h,k}^3$. If R does not contain $GR_{2,3}^1$, we first route the $GR_{h,k}^3$ (any of the five) as shown in Fig. 16(a). The unused part of $H_4$ is a cycle 1, 2, 4, 3, 1, which can route any $GR_{i,j}^1$ except $GR_{2,3}^1$. If R contains a $GR_{2,3}^1$, then a feasible routing of R in $H_4$ is given in Fig. 16(b). This proves that $H_4$ is hyperuniversal.

Next, we show that no (4, 4)-SB with 24 switches is an HUSB. Suppose, on the contrary, that $G = ((V_1, V_2, V_3, V_4), E)$ is a (4, 4)-HUSB with 24 switches. In an HUSB, every pair of sides must induce a (2, 4)-HUSB, which has at least 4 switches; since there are six pairs of sides and only 24 switches, every pair of sides is joined by exactly 4 switches forming a perfect matching. Moreover, every three sides of G induce an optimal (3, 4)-HUSB, which is either a cycle of length 12 or a union of two cycles of length 6 [19]–[21]. If G is not connected, then by the above arguments it must be a union of two (4, 2)-SBs. But such a switch box is not routable for a $GR_{i,j}^3 + GR_{k,h}^1$. Therefore, G must be connected. The idea of the proof is to enumerate every possible graph G with the above properties and obtain a contradiction by finding a nonroutable (4, 4)-PRR for G.
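The enumeration steps of this proof can be reproduced by a short search. Following the forbidden-edge remark in the proof, we assume (our reading of Fig. 17(a), which the text does not spell out) that $G_1$ joins $v_{1,i}$ to $v_{2,i}$ and $v_{2,i}$ to $v_{3,i}$ for i = 1, ..., 4. Under that assumption, the sketch counts the matchings M between $V_1$ and $V_3$ for which G[1,2,3] is a single 12-cycle, and finds six, in agreement with the six choices of M listed in the proof. A second helper lists the perfect matchings between $V_1$ and $V_4$ that avoid a given set of forbidden edges, such as the list given for Fig. 17(d). Vertices are encoded as (side, index) pairs and all function names are illustrative.

```python
from collections import defaultdict
from itertools import permutations

W = 4


def is_single_cycle(edges):
    """True if the edges form one cycle through all of their endpoints."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    if any(len(nbrs) != 2 for nbrs in adj.values()):
        return False
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)


def v1_v3_matchings(w=W):
    """Matchings v_{1,i} -> v_{3,M(i)} that make sides 1-3 induce a single cycle.

    Assumes G_1 contains the matchings v_{1,i}v_{2,i} and v_{2,i}v_{3,i}.
    """
    base = [((1, i), (2, i)) for i in range(1, w + 1)]
    base += [((2, i), (3, i)) for i in range(1, w + 1)]
    found = []
    for perm in permutations(range(1, w + 1)):
        m = {i: perm[i - 1] for i in range(1, w + 1)}
        edges = base + [((1, i), (3, j)) for i, j in m.items()]
        if is_single_cycle(edges):
            found.append(m)
    return found


def matchings_avoiding(forbidden, w=W):
    """Perfect matchings v_{1,i} -> v_{4,j} that use no forbidden (i, j) pair."""
    return [
        {i: perm[i - 1] for i in range(1, w + 1)}
        for perm in permutations(range(1, w + 1))
        if all((i, perm[i - 1]) not in forbidden for i in range(1, w + 1))
    ]


if __name__ == "__main__":
    print(len(v1_v3_matchings()))  # 6
    # Forbidden V1-V4 edges listed for Fig. 17(d), as (i, j) for v_{1,i}v_{4,j}.
    fig17d = {(1, 2), (1, 3), (2, 3), (2, 4), (3, 2), (3, 4), (4, 1)}
    for m in matchings_avoiding(fig17d):
        print(m)
```

The count of six is no accident: with identity matchings between sides 1–2 and 2–3, G[1,2,3] is one 12-cycle exactly when M, viewed as a permutation of {1, 2, 3, 4}, is a single 4-cycle, and there are 3! = 6 of those.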
Fig. 16. Feasible routings of (4, 4)-PRRs in $H_4$ (panels include $GR_{2,3}^1 + GR_{1,1}^3$, $GR_{2,3}^1 + GR_{2,1}^3$, $GR_{2,3}^1 + GR_{2,2}^3$, $GR_{2,3}^1 + GR_{2,3}^3$, and $GR_{2,3}^1 + GR_{2,4}^3$).

We present only the graphs G in which every three sides induce a cycle of length 12; the cases in which some three sides of G induce two cycles of length 6 can be proved similarly. Suppose there is a 4-matching between each pair $V_i$, $V_j$ with $i \neq j$, and the subgraph of G induced on each set $V_i \cup V_j \cup V_m$ (with i, j, m pairwise distinct) is a cycle. Clearly (by relabeling if necessary), G contains the graph $G_1$ shown in Fig. 17(a). Starting from $G_1$, we have to select a matching M between $V_1$ and $V_3$ so that the subgraph G[1,2,3] induced on $V_1 \cup V_2 \cup V_3$ is a cycle. The edges $v_{1,1}v_{3,1}$, $v_{1,2}v_{3,2}$, $v_{1,3}v_{3,3}$, and $v_{1,4}v_{3,4}$ are forbidden; otherwise, the subgraph induced on sides 1, 2, and 3 would contain cycles of length 3. There are only six choices of M for which G[1,2,3] is a cycle:

- $M = \{v_{1,1}v_{3,2}, v_{1,2}v_{3,3}, v_{1,3}v_{3,4}, v_{1,4}v_{3,1}\}$, which gives the graph shown in Fig. 17(b);
- $M = \{v_{1,1}v_{3,2}, v_{1,2}v_{3,4}, v_{1,3}v_{3,1}, v_{1,4}v_{3,3}\}$, which gives the graph shown in Fig. 17(c);
- $M = \{v_{1,1}v_{3,3}, v_{1,2}v_{3,4}, v_{1,3}v_{3,2}, v_{1,4}v_{3,1}\}$, which gives the graph shown in Fig. 17(d);
- $M = \{v_{1,1}v_{3,3}, v_{1,2}v_{3,1}, v_{1,3}v_{3,4}, v_{1,4}v_{3,2}\}$, which gives the graph shown in Fig. 17(e);
- $M = \{v_{1,1}v_{3,4}, v_{1,2}v_{3,1}, v_{1,3}v_{3,2}, v_{1,4}v_{3,3}\}$, which gives the graph shown in Fig. 17(f);
- $M = \{v_{1,1}v_{3,4}, v_{1,2}v_{3,3}, v_{1,3}v_{3,1}, v_{1,4}v_{3,2}\}$, which gives the graph shown in Fig. 17(g).

For each of these choices [the graphs in Fig.
17(b)–(g)], we need to select a matching between $V_1$ and $V_4$ to obtain G so that the subgraphs of G induced on $V_1 \cup V_2 \cup V_4$ and on $V_1 \cup V_3 \cup V_4$ are Hamiltonian cycles. Taking the case of Fig. 17(d) as an example, the forbidden edges are $v_{1,1}v_{4,2}$, $v_{1,1}v_{4,3}$, $v_{1,2}v_{4,3}$, $v_{1,2}v_{4,4}$, $v_{1,3}v_{4,2}$, $v_{1,3}v_{4,4}$, and $v_{1,4}v_{4,1}$. Therefore, the matchings between $V_1$ and $V_4$ that make the induced subgraphs of G on $V_1 \cup V_2 \cup V_4$ and on $V_1 \cup V_3 \cup V_4$ Hamiltonian cycles are $\{v_{1,1}v_{4,1}, v_{1,2}v_{4,2}, v_{1,3}v_{4,3}, v_{1,4}v_{4,4}\}$, $\{v_{1,1}v_{4,4}, v_{1,2}v_{4,1}, v_{1,3}v_{4,3}, v_{1,4}v_{4,2}\}$, and $\{v_{1,1}v_{4,4}, v_{1,2}v_{4,3}\}$. The corresponding three graphs are listed to the right of Fig. 17(d), and each is labeled by a 4-PRR that has no feasible routing in that switch box. In total, there are 21 graphs G, each labeled by a 4-PRR that has no feasible routing in G. This proves that no switch box with 24 switches is an HUSB, and thus $H_4$ is an optimum (4, 4)-HUSB.

Fig. 17. Nonroutable (4, 3)-PRRs in all (4, 4)-SBs of 24 switches.

#### D. $H_6$ is Hyperuniversal

To prove that $H_6$ is hyperuniversal, we need to show that $H_6$ contains a feasible routing for every (4, 6)-PRR. Let R be a (4, 6)-PRR; then R can be decomposed into either two (4, 3)-PRRs or three minimal (4, 2)-PRRs. In the first case, R is clearly routable in $H_6$, since $H_6$ contains two disjoint $H_3$s. In the second case, we show that R is also routable in $H_6$. Let $R = R_1 + R_2 + R_3$, where each $R_i$ is a minimal (4, 2)-PRR from Table III. Let K(i, i+1) be the subgraph of $H_6$ consisting of levels i and i+1. Then K(1,2), K(3,4), and K(5,6) are three disjoint subgraphs of $H_6$.

Fig. 18. Feasible routings of three $GR_{i,j}^2$s in $H_6$.

From Fig.
14, we observe that K(1,2) and K(5,6) can route any minimal (4, 2)-PRR from Table III. Next, we show that K(3,4) can route any $GR_{i,j}^2$. Again, the permutation $\sigma = (1,4)(2,3)$ is an automorphism of $H_6$, so it is sufficient to check the $GR_{i,j}^2$s that are not equivalent under $\sigma$. Fig. 18 lists the feasible routings of these $GR_{i,j}^2$s in K(3,4). Therefore, $R_1$, $R_2$, and $R_3$ are routable in K(1,2), K(3,4), and K(5,6), respectively. Hence, R is routable in $H_6$. This completes the proof of Theorem 1.

### REFERENCES

- [1] V. E. Benes, "On rearrangeable three-stage connecting networks," *Bell Syst. Tech. J.*, vol. 41, no. 5, pp. 1481–1492, 1962.
- [2] C. Mitchell and P. Wild, "On a class of rearrangeable networks," *IEEE Trans. Commun.*, vol. 37, pp. 52–56, Jan. 1989.
- [3] Altera Corp., The Maximalist Handbook, 1990.
- [4] Xilinx Inc., The Programmable Logic Data Book, 1994.
- [5] J. Rose and S. Brown, "Flexibility of interconnection structure for field-programmable gate arrays," *IEEE J. Solid-State Circuits*, vol. 26, pp. 277–282, Mar. 1991.
- [6] S. Brown, J. Rose, and Z. G. Vranesic, A Detailed Router for Field-Programmable Gate Arrays. Norwell, MA: Kluwer, 1992.
- [7] M. J. Alexander and G. Robins, "New performance FPGA routing algorithms," in *Proc. ACM/IEEE Design Automation Conf.*, 1995, pp. 562–567.
- [8] Y. S. Lee and A. C. H. Wu, "A performance and routability driven router for FPGA's considering path delays," in *Proc. ACM/IEEE Design Automation Conf.*, 1995, pp. 557–561.
- [9] Y. W. Chang, D. F. Wong, and C. K. Wong, "Universal switch modules for FPGA design," *ACM Trans. Design Automation Electron. Syst.*, vol. 1, no. 1, pp. 80–101, 1996.
- [10] J. F. Pan, Y. L. Wu, C. K. Wong, and G. Yan, "On the optimal four-way switch box routing structures of FPGA greedy routing architectures," *Integration, The VLSI J.*, vol.
25, pp. 137–159, 1998.
- [11] V. Betz, J. Rose, and A. Marquardt, Architecture and CAD for Deep-Submicron FPGAs. Norwell, MA: Kluwer, 1999.
- [12] M. Shyu, Y. D. Chang, G. M. Wu, and Y. W. Chang, "Generic universal switch blocks," *IEEE Trans. Comput.*, vol. 49, pp. 348–359, Apr. 2000.
- [13] Y. L. Wu and D. Chang, "On NP-completeness of 2-D FPGA routing architectures and a novel solution," in *Proc. Int. Conf. Computer-Aided-Design*, 1994, pp. 362–366.
- [14] Y. L. Wu, D. Chang, M. Marek-Sadowska, and S. Tsukiyama, "Not necessarily more switches more routability," in *Proc. Asia South Pacific Design Automation Conf.*, 1997, pp. 579–584.
- [15] Y. L. Wu and M. Marek-Sadowska, "Routing for array type FPGAs," *IEEE Trans. Computer-Aided Design*, vol. 16, pp. 506–518, May 1997.
- [16] Y. L. Wu, S. Tsukiyama, and M. Marek-Sadowska, "On computational complexity of a detailed routing problem in two-dimensional FPGAs," in *Proc. 4th Great Lakes Symp. VLSI*, Mar. 1994, pp. 70–75.
- [17] Y. L. Wu, S. Tsukiyama, and M. Marek-Sadowska, "Graph-based analysis of 2-D FPGA routing," *IEEE Trans. Computer-Aided Design*, vol. 15, pp. 33–44, Jan. 1996.
- [18] M. H. Yen, S. J. Chen, and S. H. Lan, "A three-stage one-sided rearrangeable polygonal switching network," *IEEE Trans. Comput.*, vol. 50, pp. 1291–1294, Nov. 2001.
- [19] H. Fan, J. P. Liu, and Y. L. Wu, "General models for optimum arbitrary-dimension FPGA switch box designs," in *Proc. IEEE Int. Conf. Computer-Aided Design*, San Jose, CA, Nov. 2000, pp. 93–98.
- [20] H. Fan, J. Liu, and Y. Wu, "General models for switch box design of FPGA architecture," *IEEE Trans. Comput.*, vol. 52, pp. 21–30, Jan. 2003.
- [21] H. Fan, J. Liu, Y. L. Wu, and C. K. Wong, "Reduction design for generic universal switch blocks," *ACM Trans. Design Automation Electron. Syst.*, vol. 7, no. 4, pp. 526–546, 2002.
- [22] H. Fan, Y. L. Wu, and Y. W. Chang, "Comment on 'Generic universal switch blocks'," *IEEE Trans. Comput.*, vol. 51, pp. 93–95, Jan. 2002.
- [23] H. Fan, J. Liu, Y. L. Wu, and C. C. Cheung, "On optimum switch box designs for 2-D FPGAs," in *Proc. IEEE/ACM Design Automation Conf.*, Las Vegas, NV, June 2001, pp. 203–208.
- [24] H. Fan, J. Liu, Y. L. Wu, and C. C. Cheung, "On optimum designs of universal switch blocks," in *Proc. IEEE Int. Conf. Field Programm. Logic Applicat.*, Montpellier, France, 2002, pp. 142–151.
- [25] J. Liu, H. Fan, D. Porto, and Y. L. Wu, "An efficient exact router for hyper-universal switching box," *IEICE Trans. Fundamentals Electron., Commun., Comput. Sci.*, vol. E86-A, no. 6, pp. 1430–1436, 2003.
- [26] S. J. E. Wilton, "Architecture and algorithms for field-programmable gate arrays with embedded memory," Ph.D. dissertation, Dept. of Elect. Comput. Eng., Univ. Toronto, Toronto, Canada, 1997.
- [27] V. Betz and J. Rose, "A new packing, placement and routing tool for FPGA research," in *Proc. 7th Int. Workshop Field-Programm. Logic Applicat.*, 1997, pp. 213–222.
- [28] S. Yang, "Logic Synthesis and Optimization Benchmarks, Version 3.0," Microelectronics Centre of North Carolina, 1991.

**Jiping Liu** received the B.S. and M.S. degrees in mathematics from Shandong University, Shandong, China, in 1982 and 1986, respectively, and the Ph.D. degree in combinatorics and graph theory from Simon Fraser University, Burnaby, BC, Canada, in 1992. He was a Lecturer at Shandong University from 1982 to 1987. He held various positions as a Postdoctoral Fellow and Research Associate before joining the University of Lethbridge, Lethbridge, AB, Canada, in 1995, where he is now an Associate Professor of mathematics and computer science. His research interests include optimum design problems in VLSI, algorithms and complexity, and graph theory, including graph decompositions, factorizations, colorings, graph domination, and Hamiltonian properties of certain graphs.

**Yu-Liang Wu** (M'96) received the B.S. and M.S. degrees in computer science from the Florida International University, Miami, in 1983 and 1984, respectively.
He received the Ph.D. degree in electrical and computer engineering from the University of California, Santa Barbara, in 1994. In 1985, he was with Internet Systems Corporation as a System Programmer of network communication protocols (DARPA TCP/IP, Telnet). From 1986 to 1988, he was with AT&T Bell Labs, working on the development of several telephone distributed operation systems (RMAS, MFOS). From 1988 to 1989, he was with the Amdahl Corporation, working on tester software designs. In December 1994, he joined Cadence Design Systems Inc., where he worked as a Senior MTS in research and development of the silicon synthesis product (PBS), aimed at bridging the gap between logical- and physical-level optimizations for deep-submicron chip designs. He joined The Chinese University of Hong Kong, Shatin, Hong Kong, in January 1996. His current research interests mainly relate to optimization of logic and physical design automation of VLSI circuits and FPGA-related CAD tool designs and architectural analysis and optimization.

**Hongbing Fan** received the B.S. degree in mathematics from Shandong University, Shandong, China, in 1982, the Ph.D. degree in operational research and control theory from the Institute of Mathematics, Shandong University, in 1990, and the Ph.D. degree in computer science from the University of Victoria, Victoria, BC, Canada, in 2003. He joined the Mathematics Department, Shandong University, in 1990 and served as an Associate Professor, later working with the Department of Computer Science and Engineering as a Research Associate in 1998. His research interests are in combinatorial algorithms and complexity, graph theory, and various topics in VLSI design and operations research.

**Chak-Chung Cheung** (SM'01) received the B.Eng. degree in computer engineering and the M.Phil. degree in computer science and engineering from The Chinese University of Hong Kong (CUHK), Shatin, Hong Kong, in 1999 and 2001, respectively.
In 2001, he was a System Administrator with the Center of Large-Scale Computation, Cluster Technology, Hong Kong. He is now an Instructor in the Department of Computer Science and Engineering, CUHK, and is responsible for the courses in Internet and web programming technologies. His current research interests are optimization of logic and physical design automation of VLSI ASIC/FPGA designs and high-level synthesis of reconfigurable computing.