CFdesign for clusters

11 September 2009

Armed with a brand new solver technology, CFdesign can now use clusters to accelerate solve times. Greg Corke reports on how this move to High Performance Computing (HPC) will change the game for desktop CFD

Over the past ten years CFD (Computational Fluid Dynamics) has changed from being a niche application for specialists to one that is also easily accessible to non-expert designers and engineers. These desktop CFD applications work hand in hand with CAD software and enable CFD to be used to help drive the actual design process, rather than just being used as a verification and validation tool.

Blue Ridge Numerics, a developer of one of the leading desktop CFD applications, calls this upfront CFD. Its CFdesign software is Windows-based and works alongside all the mainstream 3D CAD systems, including Inventor, SolidWorks, Solid Edge, Catia, Pro/Engineer, SpaceClaim, NX, and CoCreate Modelling.

For a more elegant cluster, Cray’s new CX1-LC deskside is an all-in-one supercomputer

The software is used in equal measure for electronics cooling and mechanical flow, and is put to work on all manner of fluid dynamics projects including valves, manifolds, ducts, and diffusers, simulating a variety of fluids including water, air, and petrol.

By bringing CFD into the hands of the non-expert user, CFdesign started out being used on relatively simple problems, but this has changed over time, as Derrek Cooper, product manager for CFdesign, explains. “A few years ago people were so excited just to be able to run flow in a little valve out of their CAD system, but now people want to do a lot more,” he says. “The models are bigger and they want to run more of them. Now people are doing huge ventilation systems with multiple floors and multiple rooms.”

With this ever-increasing complexity of problems, last year it became clear that more performance was required to reduce the solve time of its CFD simulations. However, because CFdesign had always been a desktop Windows application, its solver code was multithreaded and designed to run only on a single workstation. And while it would work with multi-core workstations, the limitations of the then-current Intel Front Side Bus (FSB) architecture (Core 2 Duo) meant that users would only see a 1.3 or 1.4 times performance boost with a quad-core chip over a single-core CPU.

Blue Ridge Numerics’ solution was to re-write its solver code to work with High Performance Computing (HPC) clusters, which are essentially a collection of computers joined together with high-speed interconnects.

Clusters are commonly used with more traditional CFD packages intended for dedicated analysts – such as STAR-CD from CD-adapco or Ansys CFX – as the datasets are much bigger. Here, complex aerospace and automotive problems of biblical proportions are typical, with simulations often taking many hours, if not days.

Code breaking

Re-writing CFD solver code to work on a cluster is not a trivial task. It means the solver needs to be changed from a multithreaded structure (designed to work with multiple CPU cores in a single machine) into an MPI (Message Passing Interface) structure (designed to work with multiple machines in a distributed environment).

Derrek Cooper explains the differences between the two and the limitations of multi-threaded code in relation to CFdesign. “With multithreaded code, one solver (a CFD calc) tries to spread its information over multiple CPU cores. Eventually it will saturate and you won’t get any additional speedup,” he says. “With the MPI approach you get multiple CFD calcs all running independently of each other and the interface collects the information and knows how to spread it intelligently over multiple machines. In this case the speed up is substantial.”
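The distinction Cooper draws can be illustrated with a toy domain-decomposition sketch. This is a conceptual illustration only, not Blue Ridge’s actual solver: a 1D field is split into per-node partitions, each partition is smoothed independently, and only the boundary (“halo”) values pass between neighbouring partitions each iteration.

```python
# Conceptual sketch of MPI-style domain decomposition (not CFdesign's real
# solver): the mesh is split into partitions, each "node" sweeps its own
# cells independently, and only thin boundary (halo) values are exchanged
# between partitions every iteration.

def partition(cells, num_nodes):
    """Split the list of cell values into near-equal contiguous chunks."""
    size, rem = divmod(len(cells), num_nodes)
    chunks, start = [], 0
    for rank in range(num_nodes):
        end = start + size + (1 if rank < rem else 0)
        chunks.append(cells[start:end])
        start = end
    return chunks

def jacobi_step(chunk, left_halo, right_halo):
    """One local smoothing sweep; halos stand in for neighbour-node data."""
    padded = [left_halo] + chunk + [right_halo]
    return [(padded[i - 1] + padded[i + 1]) / 2.0
            for i in range(1, len(padded) - 1)]

def solve(cells, num_nodes, iterations):
    """Run a fixed number of synchronous sweeps across all partitions."""
    chunks = partition(cells, num_nodes)
    for _ in range(iterations):
        new_chunks = []
        for rank, chunk in enumerate(chunks):
            # "Message passing": read boundary values from neighbour ranks.
            left = chunks[rank - 1][-1] if rank > 0 else chunk[0]
            right = chunks[rank + 1][0] if rank < num_nodes - 1 else chunk[-1]
            new_chunks.append(jacobi_step(chunk, left, right))
        chunks = new_chunks
    return [v for chunk in chunks for v in chunk]
```

A real cluster solver would exchange the halos with MPI calls over the interconnect rather than reading a neighbouring list, but the structure is the same: the bulk of each node’s work is independent, which is why the approach scales across machines where a single shared-memory bus saturates.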

With CFdesign and all its supporting CAD applications running on the Windows platform, developing a solution for a Linux cluster was not seen as the most suitable option. Instead, Windows HPC Server 2008, Microsoft’s second-generation HPC cluster technology, was preferred.
For Blue Ridge Numerics this was easier than developing and supporting a whole new set of Linux technologies. It also meant that users would feel more comfortable working in an environment that was familiar to them.

A mini CFD cluster

The next challenge for Blue Ridge was to make the hardware affordable and easy enough for non-expert users to put together. “We recognised very quickly that people weren’t going to spend $50,000 on computers just to get some speed up in CFdesign,” explains Derrek Cooper. “So what we focused on primarily was leveraging the workstation and expanding that into a cluster environment.”

Dell workstation cluster

A cluster built from standard Dell workstations, InfiniBand cards and dedicated cabling

Blue Ridge chose to develop its HPC solution for a mini cluster of two, three or four workstations (nodes) hooked together with fast interconnects. It initially tested Core 2 Duo-based Dell and HP workstations running Windows HPC Server 2008 side-by-side, and with certain models experienced performance increases of up to 4.0 times with two nodes and 5.5 times with four.

Each machine was fitted with high-speed InfiniBand PCI cards and connected with an InfiniBand cable. With complex problems holding huge amounts of data, the speed of this interconnect is vital to the overall performance of the cluster.
“You could hook up the workstations with Gigabit Ethernet, but the performance will drop off substantially,” stresses Derrek. “It might save you a few hundred dollars by not buying InfiniBand, but you might as well not bother [making a cluster] because the amount of information that needs to pass between the machines is tremendous.”
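Cooper’s point about interconnect speed is easy to quantify with a rough back-of-envelope calculation. The payload size below is purely hypothetical, and the link rates are nominal (1 Gbit/s for Gigabit Ethernet, and roughly 16 Gbit/s of usable data for the 4x DDR InfiniBand of the period), but the gap illustrates why a slow interconnect can erase the benefit of clustering:

```python
# Back-of-envelope transfer times for per-iteration boundary data between
# nodes. The bandwidth figures are nominal link rates, not measurements,
# and the 200 MB payload is purely illustrative.

GIGABIT_ETHERNET_GBPS = 1.0    # nominal Gigabit Ethernet data rate
INFINIBAND_4X_DDR_GBPS = 16.0  # assumed 4x DDR InfiniBand usable data rate

def transfer_seconds(megabytes, gigabits_per_second):
    """Time to move a payload at a given nominal link rate (decimal MB)."""
    bits = megabytes * 8e6
    return bits / (gigabits_per_second * 1e9)

payload_mb = 200  # hypothetical data exchanged between nodes per iteration
t_eth = transfer_seconds(payload_mb, GIGABIT_ETHERNET_GBPS)
t_ib = transfer_seconds(payload_mb, INFINIBAND_4X_DDR_GBPS)
print(f"Gigabit Ethernet: {t_eth:.2f} s, InfiniBand: {t_ib:.2f} s per iteration")
```

Multiplied over hundreds of solver iterations, a sixteen-fold difference in communication time quickly dominates the run, which is why Blue Ridge insists on InfiniBand.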

Blue Ridge Numerics wanted its solution to be simple enough for non-expert users to build a mini cluster themselves, and in these credit crunch times this may prove an attractive option. However, ready-built systems are also available from the likes of Dell and specialist hardware manufacturers.
If trailing cables infuriate you, mini clusters are also available in a single compact chassis with all the nodes and interconnects stored internally. While these solutions cost more, Cray’s new CX1-LC deskside supercomputer, for example, looks to be a very elegant solution.

The Nehalem advantage

At the tail end of last year, in the middle of Blue Ridge’s development cycle, Intel unveiled a brand new CPU architecture code-named Nehalem. This has now been brought to market with the Core i7 and Xeon 5500 series processors. At the heart of this new architecture is a change in the way the chip accesses memory: instead of communicating with memory over the Front Side Bus (FSB), Nehalem has an integrated memory controller, so the CPU reads data directly from system RAM. This has had a huge impact on performance for CFdesign, which supports Nehalem (Core i7) in the new version, CFdesign 2010, due to be released this month.

When running multithreaded code on the older-generation Core 2 Duo workstations, performance peaked at two cores on a single workstation (node), with a 1.3 or 1.4 times boost over a single core. With Nehalem (Core i7), though, because the CPU accesses memory directly, it is able to run MPI code on a single machine very efficiently. So efficiently, in fact, that according to Blue Ridge, with four cores in a single machine (node) the performance boost is in the order of 2.3 times, and with eight cores it is in the order of 3.0 times. The downside is that when you move to a cluster-based solution, the additional benefits are much smaller (see Figure 3).
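Taken at face value, the figures Blue Ridge quotes imply falling parallel efficiency as cores are added, which is a quick way to see why the extra benefit of moving to a cluster shrinks. A small calculation using the article’s own numbers:

```python
# Parallel efficiency implied by the single-machine Nehalem speedups quoted
# above. Efficiency is speedup divided by core count; the speedups are the
# article's reported figures, not new benchmarks.

def efficiency(speedup, cores):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores

reported = {4: 2.3, 8: 3.0}  # cores -> speedup over a single core
for cores, speedup in reported.items():
    print(f"{cores} cores: {speedup:.1f}x speedup, "
          f"{efficiency(speedup, cores):.0%} parallel efficiency")
```

Efficiency roughly halving as the core count doubles suggests the solver is already running into communication and memory limits on one box, so extra nodes add proportionally less.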


Graph showing relative performance of CFdesign’s new HPC solver under different hardware and node/core combinations

Using CFD as an integral part of the product development process is the foundation on which CFdesign is built. But having to wait for results to come back can seriously hinder the efficiency of this practice.

By introducing an HPC solution, Blue Ridge is not only providing a way of slashing solve times, but enabling users to do many more iterations to help come to a better solution.
The Windows-based setup of the system is likely to appeal to smaller companies that may not have in-house Linux expertise. Meanwhile, the introduction of Nehalem (Core i7), with its excellent MPI performance in a single workstation, means that a cluster may not even need to be built. Some users will always need clusters to reduce solve times to their absolute minimum, but with eight-core CPUs coming soon, single workstations will become even more powerful.

Unfortunately, being able to take advantage of such processing power inside a single workstation is not free. Blue Ridge considers anything over four cores to be an HPC system, regardless of where those cores are located, and levies an additional cost. While such fees are often a contentious issue in the CFD community, the price Blue Ridge sets in the forthcoming CFdesign 2010 release is likely to be a small one to pay for the potential to transform the role CFD plays in the product development process.
