Re: "Layer 5 Switch" project @ Watson

Ben Black
Sat, 14 Aug 1999 15:10:09 -0700

Foundry has been shipping this for several months.

On Sat, Aug 14, 1999 at 06:09:54AM -0800, Rohit Khare wrote:
> >Until recently the word switching was synonymous with forwarding
> >frames based on link layer addresses. Of late,
> >the definition of switching has been extended to include routing
> >packets based on layer 3 and layer 4 information.
> >Layer 3 switches, also known as IP switches, use IP addresses for
> >network path selection and forwarding. They
> >are fairly commonplace today and are being used as replacements for
> >traditional routers. In addition to layer 2 and
> >3 information, a layer 4 switch examines the contents of the
> >transport layer header, such as TCP and UDP port
> >numbers, to determine how to route connections. Layer 4 switches are
> >slowly making their way into the marketplace
> >and are primarily used as load balancing connection routers for
> >server clusters. Moving one level higher in the
> >protocol stack, we can define a layer 5 switch that uses session
> >level information, such as Uniform Resource Locators
> >(URL) [3], in addition to layer 2-3-4 information to route traffic
> >in the network. In this paper, we share our experience
> >in designing and building a layer 5 switch, which we call L5.
> >Although the L5 system can potentially be used
> >anywhere in the network, we mainly focus on its usefulness as a
> >front-end to a server cluster. We explore the value of a
> >content aware session router in a cluster of Web servers and Web caches.
> >As a session aware load balancer for a
> >Web cache cluster, the L5 system effectively partitions the URL
> >space among the cluster nodes,
> Getting closer to what I'd term instead an "application-layer router"
> -- rather than a processor complex, I'd emphasize storage: large
> current queues of available messages to be handed off, with
> lookup-by-message-id as one of the core high-speed functions. That is
> to say, an ALR is an endpoint in itself, not just a load balancer
> along the way. That said, I'm very curious to learn more about their
> project. Notes on their white paper follow this page clipping:
> Rohit
> =================================================
> Title: L5: A self-learning layer 5 switch
> Objective: The objective of this project is to use session
> level information in addition to layer 2-3-4 information to route
> traffic in the network. Routing traffic using session level
> information is not a new idea. In fact, application level proxies,
> which are functionally equivalent to the L5 switch, have been around
> for years. The feature that distinguishes the L5 switch from an
> application layer proxy is its superior data handling capability. It
> combines the functionalities of an application layer proxy and the
> data handling capabilities of a switch into a single system.
> Members: George Apostolopoulos, Vinod Peris, Prashant Pradhan and Debanjan Saha.
> Reports: L5: A Self-learning Layer 5 Switch
> Presentations: L5: A Self-learning Layer 5 Switch
> Description: The L5 system consists of a switch core to which a
> number of custom built intelligent port controllers are attached. In
> addition, it is equipped with a processor complex. Layer 5 functions,
> such as the parsing of HTTP protocol messages and URL based routing,
> are performed by the processor. The job of the port controllers is to
> identify the packets that require layer 5 processing and forward them
> to the processor. In our design, we make sure that only packets that
> need to be handled by the processor are forwarded to it. The rest of
> the packets are processed by the port controllers. In most common
> scenarios only a very small fraction of the packets are processed by
> the CPU. As a result we can achieve very high speeds while delivering
> useful layer 5 functionality.
> The L5 switch can potentially be used anywhere in the network.
> However, it is most useful as a front-end to a server cluster. In
> this configuration, the L5 switch intercepts the TCP connection setup
> request from the client and responds by establishing a connection to
> the client. It acts as a proxy for the server cluster reading in as
> much layer 5 information as is needed to make a routing decision.
> Depending on the specific layer 5 protocol involved, it parses the
> layer 5 protocol messages and determines where to route the session
> based on the corresponding layer 5 routing database. After the
> routing decision is made, it sets up a second connection to the
> appropriate server node. Finally, the two TCP connections are
> spliced. After splicing, all packet processing is handled by the port
> controllers leading to very efficient data handling through the
> switch. We are currently exploring the use of the L5 system as a
> session aware load balancer for Web server and Web cache clusters.
> Unlike content blind load balancers, the L5 switch takes into account
> session level information, such as URLs when routing a connection to
> a cluster node. Consequently, it makes it possible to partition the
> URL space among the server nodes thus improving the performance of
> the server cluster. We are also investigating the use of the L5
> switch for content based service differentiation and filtering.
> Content based service differentiation is useful in large e-commerce
> sites to differentiate serious buyers from casual browsers. Content
> based filtering is a useful feature in ISP and corporate access
> gateways.
> ========================================
> >We show that the L5 system can greatly improve the overall
> >throughput of a secure Web server
> >cluster by dispatching SSL connections based on session information.
> I suppose that's at the risk of trusting the L5 Router with all the
> secrets that are so "difficult" to share amongst the cluster nodes.
> [-- as it turns out, this guess was absolutely wrong. Since they
> don't try to share secrets (allow restart to any cluster node),
> there are no additional trust management problems. --]
> >The search engine implements longest prefix match in hardware.
> It's an off-the-shelf switch core, but that still sounds cool :-)
> >In an application layer proxy, the processor remains on the data
> >path and copies data between the
> >two connections. In the L5 system, the processor gets out of the
> >data path at an opportune moment by splicing the
> >two TCP connections.
> Looks like we *do* differ in terminology. Of course, the difference,
> as Joseph Reagle struggled with, is a matter of agency... does the
> splicer bear any of a proxy's responsibilities? Arguably not...
> >The temporary classifiers are timed out after configurable periods
> >of inactivity. The search engine is
> >equipped with hardware mechanisms to identify inactive
> >classification entries and automatically add them to a list
> >of inactive classifiers. A low priority task processes this list and
> >takes appropriate action.
> The splicing occurs with some nifty tricks that reduce the problem to
> merely renumbering the server sequence number on the way out. Much
> like the TIME_WAIT problem, though, this seems an ad hoc way to expire
> the splice when application-layer information is handy to compute
> when a connection is absolutely finished.
> They also note that seq#s appear *within* SACK packets, so those need
> some munging too. Otherwise, it's fairly low-overhead.
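
The renumbering described above can be sketched roughly as follows. This is
an illustration only, not the L5 implementation: the class name, method
names, and the idea of keeping a single per-connection offset are all
assumptions. It presumes the switch answered the client with its own initial
sequence number and later learned the real server's, so server-to-client
sequence numbers are shifted by a constant, and client-to-server ACKs and
SACK blocks are shifted back.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


class Splice:
    """Hypothetical per-connection splice state (names are illustrative)."""

    def __init__(self, switch_isn, server_isn):
        # switch_isn: initial sequence number the switch used toward the client
        # server_isn: initial sequence number chosen by the real server
        self.delta = (switch_isn - server_isn) % MOD

    def server_to_client_seq(self, seq):
        # Rewrite the server's sequence number into the space the client saw.
        return (seq + self.delta) % MOD

    def client_to_server_ack(self, ack):
        # The client's ACKs refer to the rewritten space; map them back.
        return (ack - self.delta) % MOD

    def client_to_server_sack(self, blocks):
        # SACK blocks carry (left edge, right edge) sequence numbers as well,
        # so each edge needs the same reverse shift as the ACK field.
        return [((left - self.delta) % MOD, (right - self.delta) % MOD)
                for left, right in blocks]


splice = Splice(switch_isn=1000, server_isn=4000000000)
print(splice.server_to_client_seq(4000000001))
print(splice.client_to_server_ack(1001))
```

Because the shift is a constant, a real device would also have to patch the
TCP checksum, which this sketch leaves out.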
> >Our results show that the L5 system is capable of
> >handling over 7000 layer 5 sessions per second. Assuming an average
> >transfer size of 15 KB per session, this
> >should be able to sustain the throughput of a Gigabit link.
> >--15 KB is the average transfer size for the SPECweb96 [16] benchmark
> on a 233 MHz PowerPC 603e.
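
As a quick check of the Gigabit claim, the arithmetic can be worked out as
below. The function name is made up for illustration, and two assumptions
are baked in: 1 KB is taken as 1024 bytes, and only payload is counted (no
TCP/IP or HTTP header overhead).

```python
def offered_load_mbps(sessions_per_second, kb_per_session):
    """Payload rate in Mb/s for a given session rate and transfer size."""
    bytes_per_session = kb_per_session * 1024  # assumption: 1 KB = 1024 bytes
    return sessions_per_second * bytes_per_session * 8 / 1e6


# Figures quoted from the paper: 7000 sessions/s at 15 KB each.
print(offered_load_mbps(7000, 15))
```

Under those assumptions the payload comes to roughly 860 Mb/s, which is in
the neighborhood of a Gigabit link once headers are added.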
> >We are implementing a more
> >ambitious scheme where the L5 system learns the mapping using an URL
> >Resolution Protocol (URP). In many
> >ways URP is similar to the Address Resolution Protocol (ARP) [12] in
> >the sense that it uses a simple set of request-response
> >messages to resolve the URL to server mapping.
> >Whenever the L5 system sees a new request it multicasts an URP query
> >to the all-server-nodes multicast group.
> >Each server node runs an URP agent that joins this multicast group.
> >On receipt of a URP query the agent checks to
> >see if the requested URL is hosted on this server. If the URL is
> >present, the agent sends a unicast URP response back
> I love the analogy to ARP! -- message-id's are truly the ethernet
> hardware addresses, mapped to URIs and IP#s respectively on a local
> {link, server}.
> They offer some structuring, as well. The URI is seen as a series of
> prefixes (host, path components, etc), and the URP reply is a bitmask
> indicating the longest common prefix that server hosts. So a specific
> request can tag an entry for a range of future requests. (How the
> switch decides "least loaded" to choose between several is another
> question entirely).
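
A minimal sketch of how such a learned prefix table might behave, under
stated assumptions: the names PrefixTable, learn, and lookup are invented
here, and the URP reply is simplified from a bitmask to a count of how many
leading prefixes (host, host/dir, ...) the server claims to host. The paper
does this matching in hardware; a dictionary walk stands in for that.

```python
def url_prefixes(host, path):
    """Prefixes from shortest to longest: host, host/a, host/a/b, ..."""
    prefixes = [host]
    for part in path.split("/"):
        if part:
            prefixes.append(prefixes[-1] + "/" + part)
    return prefixes


class PrefixTable:
    """Hypothetical cache of URL prefix -> server nodes, filled from replies."""

    def __init__(self):
        self.entries = {}  # prefix string -> set of server names

    def learn(self, host, path, hosted_count, server):
        # hosted_count stands in for the reply bitmask: the server hosts
        # the first hosted_count prefixes of the queried URL.
        prefixes = url_prefixes(host, path)
        if 0 < hosted_count <= len(prefixes):
            key = prefixes[hosted_count - 1]
            self.entries.setdefault(key, set()).add(server)

    def lookup(self, host, path):
        # Longest prefix match: try the most specific prefix first.
        for prefix in reversed(url_prefixes(host, path)):
            if prefix in self.entries:
                return prefix, sorted(self.entries[prefix])
        return None, []  # no entry: the switch would need to send a query


table = PrefixTable()
table.learn("example.com", "/img/logo.gif", 2, "node-a")
print(table.lookup("example.com", "/img/banner.gif"))
```

One reply for /img/logo.gif thus covers later requests under /img/, which is
the "range of future requests" effect. Choosing among several servers for
the same prefix is left open here, as it is in the notes above.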
> >The L5 system intercepts the client Hello message and extracts the
> >session ID. If the session ID is zero, that
> >means a new session has to be established. Server affinity does not
> >dictate the session routing decision in this case.
> >Instead, load balancing among the cluster nodes is used as the
> >guiding criterion. If the session ID is non-zero, the
> >SSL protocol processor searches its database of session ID to
> >cluster node mappings to determine which server the
> >connection should be routed to. The L5 system builds up the session
> >ID to server node mapping by intercepting the
> >server Hello messages and extracting the session ID set by the server.
> Life is also better if you can direct SSL traffic to the same cluster
> node again and again -- you can use the restart instead of fresh
> handshaking. BUT, since you can't read the message traffic, you can't
> use the url-director for consistency. So they just use the SSL ID,
> just as they used TCP server seq#s earlier.
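
The session-ID dispatch quoted above might look something like this in
outline. It is a sketch, not the paper's code: the class and method names
are hypothetical, an empty byte string stands in for a "zero" session ID,
load is approximated by a simple count of connections handed out, and
falling back to load balancing for an unknown non-zero ID is an assumption
the quoted text does not cover.

```python
class SslSessionRouter:
    """Hypothetical session-ID to cluster-node dispatcher."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.load = {node: 0 for node in self.nodes}  # connections assigned
        self.session_to_node = {}  # session ID bytes -> node name

    def route_client_hello(self, session_id):
        # A non-empty ID seen before goes back to the node that issued it,
        # so the abbreviated handshake (session restart) can be used.
        node = self.session_to_node.get(session_id) if session_id else None
        if node is None:
            # New session (or unknown ID, by assumption): pick least loaded.
            node = min(self.nodes, key=lambda n: self.load[n])
        self.load[node] += 1
        return node

    def observe_server_hello(self, node, session_id):
        # Learn the mapping from the ID the server assigned.
        if session_id:
            self.session_to_node[session_id] = node


router = SslSessionRouter(["node-a", "node-b"])
first = router.route_client_hello(b"")
router.observe_server_hello(first, b"\x01\x02")
print(router.route_client_hello(b"\x01\x02") == first)
```

The table never needs the session keys themselves, only the cleartext ID
from the Hello messages, which fits the observation that no secrets are
shared with the switch.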
> >Preliminary measurements indicate that the CPU overhead for routing
> >SSL sessions based on session
> >ID is very close to that of routing HTTP sessions using URLs. We
> >estimate that the L5 system will be able to handle
> >about 7000 HTTPS requests per second.
> I am a little less confident of their hopes for future work:
> >The work presented in this paper can be extended in many ways. We
> >are currently exploring the use of the L5
> >system for content based service differentiation. Content based
> >service differentiation is particularly useful in large
> >e-commerce sites to differentiate serious buyers from casual
> >browsers. Content based service differentiation can also
> >be used to provide service differentiation based on user profiles.
> >Web servers often set cookies to identify users and
> >track session information. The L5 system can make use of the cookies
> >in the HTTP requests to determine the level
> >of service required by a given connection. We are also investigating
> >the usefulness of the L5 system as a content
> >based filter at ISP and corporate access gateways.
> At some point, "routing" won't be about just looking up destinations
> from a fast hash tree keyed by session, URL, or cookie, and you'll be
> back at application-layer proxies for true "content-based" policy
> routing, like Content-Type, QoS, or content selection. But this is
> still smarter than LocalDirector or other L3 hacks to front-end
> server clusters...
> Rohit Khare