Using InfiniBand on Azure Kubernetes Service (AKS) for HPC Applications

Courtesy FLSmidth
In recent years, interest has grown in extending cloud computing to HPC applications. HPC applications have different properties than enterprise applications and hence different infrastructure requirements. One of the most challenging issues in moving HPC to the cloud is the network infrastructure, as the network is often the scalability bottleneck for these applications.
Rather than having multiple applications running on a single server, in HPC a single application often runs simultaneously on many servers that constantly exchange messages through MPI.
For a couple of years, UberCloud has been adopting managed Kubernetes as the container orchestrator for engineering applications. Kubernetes provides a standardized interface that goes beyond raw container management, and managed Kubernetes solutions such as AKS additionally provide simple interfaces for allocating the cloud infrastructure along with Kubernetes. To improve the situation for HPC, UberCloud collaborates closely with the cloud vendors. Big thanks to Azure and Google for appreciating our feedback!
On Azure, the InfiniBand network provides the best networking option for HPC engineering workloads.
With latency down to 2 microseconds and throughput up to 200 Gb/s, it outperforms any other network option on Azure.
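To get a feel for what these two numbers mean for MPI messages, here is a minimal back-of-the-envelope sketch in Python. It assumes a simple model in which the transfer time is a fixed latency plus the message size divided by the bandwidth. The function name and the message sizes are made up for illustration, and real MPI performance also depends on protocol overhead, topology, and the software stack.

```python
def transfer_time_s(message_bytes, latency_s=2e-6, bandwidth_bits_per_s=200e9):
    """Rough one-way transfer time: fixed latency plus time on the wire.

    Defaults are the best-case figures quoted above (2 microseconds,
    200 Gb/s); this is an idealized model, not a measurement.
    """
    return latency_s + (message_bytes * 8) / bandwidth_bits_per_s


if __name__ == "__main__":
    # Illustrative message sizes: a few doubles, a halo exchange, a large buffer.
    for size in (8, 64 * 1024, 16 * 1024 * 1024):
        t = transfer_time_s(size)
        print(f"{size:>10} bytes: {t * 1e6:10.2f} microseconds")
```

Under this model, small messages are dominated almost entirely by latency, which is why low latency matters so much for tightly coupled solvers that exchange many small messages per time step.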
UberCloud is helping our customers exploit this network on solution stacks such as Azure CycleCloud and VM-based Kubernetes solutions using SUSE’s RKE. Until now, however, InfiniBand on AKS was limited to 3 nodes because of a missing node pool setting. Even though machine sizes have grown (up to 120 cores per machine), in many setups this is simply not enough. Hence we are very pleased with the new Azure setting that allows AKS node pools to be allocated in a single placement group. That way even large node pools can reliably communicate through InfiniBand.
Getting InfiniBand inside the HPC workload on AKS requires the following:
  • Use host networking in order to access the IB device on the underlying host.
  • Have the InfiniBand driver installed on the host and the device configured. We use our own DaemonSet for this task, but official Kubernetes operators are also available.
  • Have a compatible software stack installed in the container that can exploit the InfiniBand device.
  • Configure passwordless SSH inside the containers and make the host names of the nodes available, as most HPC software requires both. Note that because host networking is used, the sshd port must differ from the ports used by host services (such as the host's own SSH).
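The host networking and SSH requirements above can be sketched as a small Python script that builds a pod manifest and an MPI machinefile. This is only an illustration, not our actual deployment: the pod name, image, node names, and port 2222 are hypothetical placeholders, the `host:slots` machinefile format is the one we assume for Intel MPI, and driver installation and device access are left out entirely.

```python
import json

# Hypothetical port for sshd inside the container. With host networking the
# container shares the node's network namespace, so it must not collide with
# the host's own sshd (usually port 22).
SSHD_PORT = 2222


def hpc_pod_manifest(name, node_name, image, sshd_port=SSHD_PORT):
    """Build a minimal pod manifest that uses host networking."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Share the node's network namespace to reach the IB device.
            "hostNetwork": True,
            # Keep cluster DNS resolution working with host networking.
            "dnsPolicy": "ClusterFirstWithHostNet",
            # Pin the pod to one node of the node pool.
            "nodeName": node_name,
            "containers": [
                {
                    "name": "solver",
                    "image": image,
                    # Run sshd in the foreground on the non-default port.
                    "command": ["/usr/sbin/sshd", "-D", "-p", str(sshd_port)],
                }
            ],
        },
    }


def machinefile(hosts, slots_per_host):
    """Render one 'host:slots' line per node for the MPI launcher."""
    return "\n".join(f"{host}:{slots_per_host}" for host in hosts) + "\n"


if __name__ == "__main__":
    # Placeholder node names; real ones come from the AKS node pool.
    nodes = ["hpc-node-0", "hpc-node-1"]
    for index, node in enumerate(nodes):
        pod = hpc_pod_manifest(f"solver-{index}", node, "example.invalid/hpc:latest")
        print(json.dumps(pod, indent=2))
    print(machinefile(nodes, slots_per_host=120))
```

The JSON output can be passed to `kubectl apply -f -`. The MPI launcher additionally has to be told to use the non-default SSH port, for example through the SSH client configuration inside the container.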

Courtesy RTE


At UberCloud we are successfully running the upcoming AKS node pool feature with Intel MPI, Comsol, and Ansys applications. More application and customer deployments are to come.
With single placement groups, Kubernetes-managed containers can give us VM-level performance, allowing us to run the most demanding applications on managed Kubernetes. Other notable features we used in this setup are Azure Teleport (to reduce deployment time) and Azure Files (as shared storage).
Many thanks to everyone on the Azure team who collaborated on this crucial new feature, especially Justin Davis, Dennis Zielke, and Kai Neuffer! Hats off!
Thanks to my UberCloud colleague Ozden Akinci who co-authored this post with me.

