The size of a single-hop cross-bar fabric is still limited by the technology, and the fabrics available on the market do not exceed terabit capacity. A multihop fabric such as a Clos network provides higher capacity by using smaller switching elements (SEs). When the traffic load is balanced over the switches in the middle stage, all the traffic gets through the fabric as long as the switch outputs are not overloaded. However, the delay that packets experience through the Clos switch depends on the granularity of the flows that are balanced. We examine the maximum fabric utilization under which a tolerable delay is provided for various load balancing algorithms, and derive a general formula for this utilization in terms of the number of flows that are balanced. We show that algorithms which balance flows with sufficiently coarse granularity provide both high fabric utilization and delay guarantees to the most sensitive applications. Since no admission control has to be performed within the switch, fast traffic-pattern changes can be accommodated in the proposed scalable architecture.

Index Terms: Delay guarantees, Internet routers, non-blocking, packet switches, performance analysis, scalability.
The Clos circuit switch was proposed by Clos in 1953 at Bell Labs. Fig. 1 shows the connections between switching elements (SEs) in a symmetric three-stage Clos switch. The interconnection rule is: the xth SE in a switching stage is connected to the xth input of each SE in the next stage. Here, all connections have the same bandwidth. It has been shown that a circuit can be established through the Clos switching fabric without rearranging existing circuits as long as the number of SEs in the second stage is at least twice the number of inputs of an SE in the first stage minus 1. It has also been shown that a circuit can be established through the Clos switching fabric as long as the number of SEs in the second stage is no less than the number of inputs of an SE in the first stage. In the latter case, the number of required SEs and their total capacity are smaller because the existing circuits can be rearranged. While the complexity of the switching fabric hardware is reduced, the complexity of the algorithm for circuit setup is increased. In both cases, the non-blocking property of the Clos architecture has been proven assuming specific algorithms for circuit setup. Various implications of Clos's findings have been examined in the literature.

The Clos switching fabric can be used for increasing the capacity of packet switches as well. The interconnection of SEs would be the same as in the circuit switch case. However, these SEs should be reconfigured in each cell time slot based on the outputs of outstanding cells. Here, packets are split into cells of a fixed duration, which is typically 50 ns (64 bytes at 10 Gb/s). Algorithms for circuit setup in Clos circuit switches cannot be readily applied in Clos packet switches. First, all SEs should be synchronized on a cell-by-cell basis. Second, an implementation of the algorithm that rearranges connections on a cell-by-cell basis in the SEs of a rearrangeable non-blocking Clos switch would be prohibitively complex.
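As an illustration, the two conditions on the number of second-stage SEs and the cell duration quoted above can be checked with a short sketch. The symbol names `n`, `r`, and `m`, the class `ClosFabric`, and the helper `cell_duration_ns` are assumptions introduced here for illustration and are not notation taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosFabric:
    """Symmetric three-stage Clos fabric (hypothetical notation).

    n: number of inputs of each first-stage SE
    r: number of SEs in the first (and third) stage
    m: number of SEs in the second stage
    """
    n: int
    r: int
    m: int

    @property
    def ports(self) -> int:
        # Total number of fabric inputs.
        return self.n * self.r

    @property
    def se_count(self) -> int:
        # r first-stage SEs, m second-stage SEs, r third-stage SEs.
        return 2 * self.r + self.m

    def nonblocking_without_rearrangement(self) -> bool:
        # Second stage has at least twice the SE inputs minus 1.
        return self.m >= 2 * self.n - 1

    def nonblocking_with_rearrangement(self) -> bool:
        # Second stage has no fewer SEs than a first-stage SE has inputs.
        return self.m >= self.n


def cell_duration_ns(cell_bytes: int, line_rate_gbps: float) -> float:
    # Bits divided by Gb/s gives nanoseconds.
    return cell_bytes * 8 / line_rate_gbps


if __name__ == "__main__":
    small = ClosFabric(n=8, r=8, m=8)    # enough only if circuits may be rearranged
    large = ClosFabric(n=8, r=8, m=15)   # enough without rearrangement
    for fabric in (small, large):
        print(fabric,
              fabric.se_count,
              fabric.nonblocking_with_rearrangement(),
              fabric.nonblocking_without_rearrangement())
    print(cell_duration_ns(64, 10.0))
```

For 64-byte cells at 10 Gb/s the helper gives 51.2 ns, which the text rounds to 50 ns. In this example with 8 inputs per first-stage SE, the fabric that needs no rearrangement uses 15 second-stage SEs instead of 8, which is the hardware cost that the rearrangeable design avoids.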
Consequently, the Clos fabric with the larger hardware, i.e., the one that is non-blocking without rearrangement, is needed for a non-blocking packet switch. A scheduling algorithm that would provide non-blocking in a Clos packet switch would require higher processing complexity than its counterpart designed for a cross-bar switch. A few heuristics have been proposed to configure SEs in Clos packet switches without assessment of their blocking nature.

On the other hand, it has been recognized that a Clos packet switch in which the traffic load is balanced across the SEs is non-blocking, i.e., with sufficiently large buffers it passes all the traffic if the outputs are not overloaded. Such an architecture has been described in earlier work. There is buffering in each stage of the architecture, and the SEs in the preceding stages balance packets over the SEs in the succeeding stages. Turner showed that the architecture is non-blocking if the traffic of each end-to-end session is balanced over the SEs in a Benes packet switch. We prove in a similar way that a three-stage Clos packet switch based on load balancing is also non-blocking. We focus on the three-stage architecture because it incurs a lower delay than the recursive Benes architecture with its larger number of stages. The advantages of Clos packet switches with load balancing are manifold. First, their implementation is simple: there is no need for high-capacity shared buffers or cross-bars, there is no need for the cell-by-cell