Monday, January 2, 2012

10 Gigabit Ethernet is Ready For Your Cluster



Say "cluster" and try to keep your mind from conjuring images of massive, government-funded scientific applications or herds of caffeine-fueled grad students. Pretty tough. But in fact, the vast majority of high performance computing (HPC) clusters are nowhere near large enough to qualify as massive, are used in commercial environments, and run on Gigabit Ethernet interconnects. Even within the TOP500® Supercomputer Sites, the number of clusters running Gigabit Ethernet is more than double the number running InfiniBand. Certainly, higher speed and lower latency would be nice for any installation. But the performance requirements of most applications just don't merit the high cost and labor-intensive maintenance of InfiniBand.

What most Gigabit Ethernet HPC sites could really use is an upgrade to 10 Gigabit Ethernet (10GE)-if it could be done cost-effectively and reliably. Until now, that idea would generate hesitation and skepticism among knowledgeable decision-makers. But with Gigabit Ethernet already entrenched in the HPC market and providing a slew of advantages, only a few obstacles have prevented the widespread growth of 10GE. Those obstacles are quickly evaporating. With recent technology advances, pricing improvements, and proven vendors entering the market, the choice of 10GE for HPC clusters has become quite attractive.

Understanding 10GE
Understanding the environment for 10GE merits a little history. Although Ethernet has been around for three decades, the technology remains viable because it has evolved over time to meet changing industry requirements. Widespread Ethernet adoption began when the IEEE established the 10 Mbps Ethernet standard in 1983. That standard evolved to Fast Ethernet (100 Mbps), Gigabit Ethernet (1000 Mbps), and 10 Gigabit Ethernet, with 40 and 100 Gigabit standards coming soon. In fact, discussions have started about Terabit Ethernet-a million Mbps-a speed that was hard to imagine just a few years ago.

Despite this evolution, the basic Ethernet frame format and principles of operation have remained virtually unchanged. As a result, networks of mixed speeds (10/100/1000 Mbps) operate uniformly without the need for expensive or complex gateways. When Ethernet was first deployed it could easily be confused with true plumbing-it ran over coaxial cable so thick and rigid that it required special tools even to bend it. As Ethernet evolved it absorbed advancements in cabling and optics, changed from shared to switched media, introduced the concept of virtualization via VLANs, and incorporated Jumbo Frames and many other improvements. Today Ethernet continues to evolve with sweeping changes such as support for block-level storage (Fibre Channel over Ethernet).

Ratified in 2002 as IEEE 802.3ae, today's 10GE supports 10 Gigabits per second transmission over distances up to 80 km. In almost every respect, 10GE is fully compatible with previous versions of Ethernet. It uses the same frame format, Media Access Control (MAC) protocol, and frame size, and network managers can use familiar management tools and operational procedures.

Ethernet Advantages for HPC
The fact that more than half of the TOP500 Supercomputer Sites and almost all smaller clusters run Ethernet is no surprise when you look at the benefits this technology offers:
o High Comfort Level: As a widely-used standard, Ethernet is a known environment for IT executives, network administrators, server vendors, and managed service providers around the world. They have the tools to manage it and the knowledge to maintain it. Broad vendor support is also a plus-almost all vendors support Ethernet.

o Best Practices: High availability, failover, management, security, backup networks, and other best practices are well-established in Ethernet and their implementation is widely understood. This is another example of the wide acceptance and vendor support for Ethernet. (Good luck finding an InfiniBand firewall, for example!)

o Single Infrastructure: Ethernet gives HPC administrators the advantage of a single infrastructure that supports the four major connectivity requirements: user access, server management, storage connectivity, and cluster interconnect. A single infrastructure is easier to manage and less expensive to purchase, power, and maintain than using a separate technology for storage or for the processor interconnect.

o Lower Power Requirements: Power is one of the biggest expenses facing data center managers today. New environmental mandates combined with rising energy costs and demand are forcing administrators to focus on Green initiatives. Ethernet is an efficient option for power and cooling, especially when used in designs that reduce power consumption.

o Lower Cost: With new servers shipping 10G ports on the motherboard and 10G switch ports now priced at a few hundred dollars, 10GE has a compelling price/performance advantage over niche technologies such as InfiniBand.

o Growth Path: Higher-speed Ethernet will capitalize on the large installed base of Gigabit Ethernet. New 40GE and 100GE products will become available soon, and will be supported by many silicon and equipment vendors.

For those applications that could benefit from higher speeds, 10GE offers even more benefits.
o More Efficient Power Utilization: 10GE requires less power per gigabit than Gigabit Ethernet, so you get ten times the bandwidth without ten times the power.

o Practical Performance: 10GE can obviously move data 10 times faster than Gigabit Ethernet, but thanks to the new generation of 10GE NICs it can also cut latency between servers to roughly one-eighth of what Gigabit Ethernet delivers.

This bandwidth and latency gain translates into higher application performance than you might imagine. In a molecular dynamics benchmark (VASP running on a 64-core cluster), the application ran more than six times faster than on Gigabit Ethernet and was nearly identical to InfiniBand DDR. In a mechanical simulation benchmark (PAM-CRASH running on a 64-core cluster), 10GE completed tasks in about 70 percent less time than Gigabit Ethernet and matched InfiniBand DDR. Similar results have been observed on common HPC cluster applications such as FLUENT and RADIOSS, and additional tests continue to show the same pattern.

These benchmarks are impressive. Vendors love talking about microseconds and gigabits per second. But the real advantage in commercial applications is the increase in user productivity, and that's measured by the clock on the wall. If computations finish in 70 percent less time, users can turn around more than three times as many jobs in the same working day.

The advantages of 10GE have many cluster architects practically salivating at the prospect of upgrading, and experts have been predicting rapid growth in the 10GE cluster market for years. That hasn't happened-yet.

Obstacles Eradicated
Until recently, 10GE was stuck in the starting gate because of a few-but arguably significant-problems involving pricing, stability, and standards. Those problems have now been overcome, and 10GE has taken off. Here's what happened.

o Network Interface Cards (NICs): Some early adopters of 10GE were discouraged by problems with the NICs, starting with the price. Until recently, the only NICs available for 10GE applications cost hundreds of dollars each, and many users prefer to use two of them per server. Now server vendors are starting to add an Ethernet chip to the motherboard-known as LAN-on-Motherboard (LOM)-instead of using a separate board. This advance drops the incremental cost to a small fraction of a standalone card's price and removes the NIC price obstacle from 10GE. Standalone NIC prices have also fallen sharply and will continue to drop as LOM technology lets NIC vendors reach the high volumes they need to keep costs down.

Another NIC-related obstacle was the questionable reliability of some of the offerings. A few of these created a bad initial impression of 10GE, with immature software drivers that were prone to underperforming or even crashing. The industry has now grown past those problems, and strong players such as Chelsio, Intel and Broadcom are providing stable, reliable products.

o Switch Prices: Like NICs, initial 10GE switch prices inhibited early adoption of the technology. The original 10GE switches cost thousands of dollars per port-more than the price of a server. Now list prices for 10GE switches are down to a few hundred dollars per port, and street prices are even lower. And that pricing is available for embedded blade switches as well as top-of-rack products.

o Switch Scaling: A market inhibitor for large clusters was how to hook switches together to create a nonblocking fabric. Most clusters are small enough that this is not an issue. For larger clusters, Clos topologies for scaling Ethernet switches provide a solution and are becoming established in the market; the sizing sketch below shows how quickly a simple two-tier design scales.
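
For a sense of that scaling, here is a minimal Python sketch (an illustration, not something from the original article) that sizes a non-blocking two-tier leaf/spine fabric. It assumes identical switches, with each leaf splitting its ports evenly between servers and spine uplinks; the function name and example port counts are assumptions made for this sketch.

    # Size a non-blocking two-tier leaf/spine (folded Clos) fabric built from
    # identical switches. Illustrative assumption: each leaf uses half of its
    # ports for servers and half for uplinks, one uplink per spine switch.
    def nonblocking_clos_capacity(ports_per_switch: int) -> dict:
        down = ports_per_switch // 2      # leaf ports facing servers
        up = ports_per_switch - down      # leaf ports facing the spine
        spines = up                       # one spine switch per leaf uplink
        leaves = ports_per_switch         # each spine port feeds one leaf
        return {
            "leaf_switches": leaves,
            "spine_switches": spines,
            "server_ports": leaves * down,
        }

    if __name__ == "__main__":
        for ports in (24, 48, 64):
            print(ports, nonblocking_clos_capacity(ports))

With hypothetical 48-port 10GE switches, for example, this works out to 48 leaves, 24 spines, and 1,152 non-blocking server ports-well beyond the size of most HPC clusters.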

o PHY Confusion: Rapid evolution of the fiber optic transceiver standards was a show-stopper for customers. The standards defining the plug-in transceiver changed quickly from XENPAK to X2 to XFP to SFP+, with each step bringing smaller size and lower cost. But because each type of transceiver has a different size and shape, a given switch or NIC is compatible with only one option. Using multiple types of optics would increase data center complexity and add costs such as stockpiling additional spares. With visions of Blu-ray versus HD DVD, VHS versus Betamax, and MS-DOS versus CP/M, users were unwilling to bet on a survivor and shunned the technology as they waited to see which way the market would move.

Eventually, the evolution culminated in SFP+. This technology is specified by the ANSI T11 Group for 8.5- and 10-Gbps Fibre Channel, as well as 10GE. The SFP+ module is small enough that 48 of them fit in a single rack-unit switch, just like the RJ-45 connectors used in previous Ethernet generations. It also houses less circuitry, thereby reducing the power and cost per port. SFP+ has been a boon to the 10GE industry, allowing switch vendors to pack more ports into smaller form factors and lowering system costs through better integration of IC functions at the host card level. As a result, fewer sparks are flying in the format wars, and the industry is converging rapidly on SFP+.

o Cabling: Many users have been holding out for 10GBase-T because it uses a common RJ-45 connector and promises what the market is waiting for: simple, inexpensive 10GE. But the physics are different at 10 Gigabits. With current technology, the chips are expensive and power hungry, and they require new cabling (Cat 6A or Cat 7). 10GBase-T components also add 2.6 microseconds of latency across each cable-exactly what you don't want in a cluster interconnect (a rough comparison of that penalty follows below). And while we wait for 10GBase-T, less expensive and less power-hungry technologies are maturing. 10GBASE-CX4 offers reliability and low latency, and is a proven solution that has become a mainstay technology for 10GE.

Making the wait easier are the new SFP+ Copper (Twinax) Direct Attach cables: thin, passive cables with SFP+ ends. With support for distances up to 10 meters, they are ideal for wiring inside a rack or between servers and switches in close proximity. Priced well below optical cabling, and with an outlook for much lower pricing still, Twinax provides a simpler and less expensive alternative to optical cables. With advances such as these, clarity is overcoming confusion in the market. Between SFP+ Direct Attach cables for short distances, familiar optical transceivers for longer runs, and 10GBASE-CX4 for the lowest latency, there are good choices today for wiring clusters.
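
To put those latency numbers in perspective, here is a back-of-the-envelope Python sketch of how per-link PHY latency adds up over a multi-hop path. The 2.6 microsecond figure for 10GBase-T comes from the discussion above; the figure for SFP+ direct-attach copper is an assumed sub-microsecond ballpark used only for illustration.

    # PHY latency accumulated across the cable segments of a server-to-server
    # path. The 10GBase-T number comes from the text above; the SFP+ DAC
    # number is an assumed ballpark, not a measurement from this article.
    PHY_LATENCY_US = {
        "10GBase-T": 2.6,   # per link, cited above
        "SFP+ DAC": 0.3,    # assumed, for illustration only
    }

    def path_latency_us(phy: str, links: int) -> float:
        # Total PHY-induced latency for a path crossing `links` cables.
        return PHY_LATENCY_US[phy] * links

    if __name__ == "__main__":
        links = 4  # server -> leaf -> spine -> leaf -> server
        for phy in PHY_LATENCY_US:
            print(f"{phy}: {path_latency_us(phy, links):.1f} us over {links} links")

Over a four-link path, the 10GBase-T penalty alone exceeds ten microseconds, which is why low-latency cluster designs favor direct-attach copper, CX4, or optics.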

When the Cluster Gets Larger
Until this point we've talked about how the barriers to 10GE adoption have been overcome for the many HPC clusters that use Gigabit Ethernet. Now let's look at the possibility of bringing the benefits of 10GE to much larger clusters with more demanding requirements. Those implementations require an interconnect that provides sufficient application performance, and a system environment that can handle the hardware challenges of densely packed processors, such as heat dissipation and cost-effective power use.
Examining the performance question reveals that some HPC applications that are loosely coupled or don't have an excessive demand for low latency can run perfectly well over 10GE. Many TCP/IP-based applications fall into this category, and many more can be supported by adapters that offload TCP/IP processing. In fact, some TCP/IP applications actually run faster and with lower latency over 10GE than over InfiniBand.

For more performance-hungry and latency-sensitive applications, the performance potential of 10GE is comparable to current developments in InfiniBand technology. InfiniBand vendors are starting to ship 40 Gig InfiniBand (QDR), but let's look at what that really delivers. Since all InfiniBand uses 8b/10b encoding, take 20 percent off the advertised bandwidth right away-40 Gig InfiniBand is really 32 Gig, and 20 Gig InfiniBand is really only capable of 16 Gig speeds. But the real limitation is the PCIe bus inside the server-typically capable of only 13 Gigs for most servers shipped in 2008. Newer servers may use "PCIe Gen 2" to get to 26 Gigs, but soon we will begin to see 40 Gigabit Ethernet NICs on faster internal buses, and then the volumes will increase and the prices will drop. We've seen this movie before-niche technologies are overtaken by the momentum and mass vendor adoption of Ethernet.
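
The arithmetic behind that comparison is simple enough to sketch. The following Python snippet is an illustration built from the figures quoted above, treating the PCIe numbers as approximate usable rates rather than exact bus specifications: remove the 8b/10b encoding overhead, then cap the result at what the host bus can move.

    # Effective adapter throughput: 8b/10b encoding carries 8 data bits per
    # 10 signaled bits, and the adapter can never deliver more than the host
    # PCIe slot allows. PCIe figures are the approximate usable rates quoted
    # in the text, not exact bus specifications.
    def effective_rate_gbps(signaling_gbps: float, pcie_limit_gbps: float) -> float:
        data_rate = signaling_gbps * 8 / 10
        return min(data_rate, pcie_limit_gbps)

    if __name__ == "__main__":
        print(effective_rate_gbps(40, 13))           # QDR IB, 2008-era PCIe: 13.0
        print(effective_rate_gbps(40, 26))           # QDR IB, PCIe Gen 2:    26.0
        print(effective_rate_gbps(20, float("inf"))) # DDR IB, encoding only: 16.0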

In addition, just as Fast Ethernet switches have Gigabit uplinks, and Gigabit switches have 10 GE uplinks, it won't be long before 10 Gigabit switches have 40 and 100 Gigabit links to upstream switches and routers. And you won't need a complex and performance-limiting gateway to connect to resources across the LAN or the wide area network. At some point, 10, 40, and 100 Gigabit Ethernet will be the right choice for even the largest clusters.

What's Important: Application Performance
One Reuters Market Data System (RMDS) benchmark (stacresearch.com) comparing InfiniBand with a BLADE Network Technologies 10GE solution showed that 10GE outperformed InfiniBand, delivering significantly higher updates per second and 31 percent lower latency (see Figure 1 and Figure 2). These numbers demonstrate the practical benefits of 10GE far more conclusively than micro-benchmarks of individual components.

Practical Considerations
Switches can come in many sizes and shapes, and new, more efficient form factors are emerging. Blade servers can be used to create an efficient and powerful solution suitable for clusters of any size, with the switching and first level of interconnection entirely within the blade server chassis. Connecting server blades internally at either 1 or 10 Gigabits greatly reduces cabling requirements and generates corresponding improvements in reliability, cost, and power. Since blade servers appeared on the scene a few years ago, they have been used to create some of the world's biggest clusters. Blade servers are also frequently used to create compact departmental clusters, often dedicated to performing a single critical application.

One solution designed specifically to support the power and cooling requirements for large clusters is the IBM® System x(TM) iDataPlex(TM). This new system design is based on industry-standard components that support open source software such as Linux®. IBM developed this system to extend its proven modular and cluster systems product portfolio for the HPC and Web 2.0 community.

The system is designed specifically for power-dense computing applications where cooling is critical. An iDataPlex rack has the same footprint as a standard rack, but has much higher cooling efficiency because of its reduced fan air depth. An optional liquid cooled wall on the back of the system eliminates the need for special air conditioning. 10GE switches from BLADE Network Technologies match the iDataPlex specialized airflow, which in turn matches data centers' hot and cold aisles and creates an integrated solution that can support very large clusters.

Blade servers and scale-out solutions like iDataPlex are just two of the emerging trends in data center switching that can make cluster architectures more efficient.

A Clear Path
The last hurdles to 10GE for HPC have been cleared:
o NIC technology is stable and prices are continuing to drop while latency and throughput continue to improve, thanks to improved silicon and LAN-on-Motherboard (LOM) technology.

o 10GE switches are now cost-effective, with list prices down to a few hundred dollars per port.

o The combination of SFP+ Direct Attach cabling, SFP+ optics, and 10GBASE-CX4 provides a practical and cost-effective wiring solution.

o New platforms are being introduced with power efficiency and cooling advances that can meet demanding HPC requirements, even for large clusters.

o New benchmarks are proving that 10GE can provide real business benefits in faster job execution, while maintaining the ease-of-use of Ethernet.

o Blade server technology can support 10GE while meeting the demanding physical requirements of large clusters.

With Gigabit Ethernet the de-facto standard for all but the largest cluster applications and the last hurdles to 10GE for HPC cleared, it's time to re-create the image of the HPC network: standards-based components, widely-available expertise, compatibility, high reliability, and cost-effective technology.

