The Role of Resource Managers in a Cloud First World

Written by Burak Yenier | Dec 10, 2017 3:48:42 AM

We were all accustomed to on-premise clusters managed through our favorite resource manager: Univa Grid Engine, IBM LSF, Altair PBS Pro or another. But since the Cloud came along, our world has been changing.

For a few years, we ignored the Cloud. It was a good fit for enterprise workloads like CRM, ERP and office automation, “but NOT for HPC!” we exclaimed in unison. That didn’t last long.

The Cloud operators busted down our doors. At the Supercomputing 2017 Conference in Denver, it wasn’t only Amazon AWS and Microsoft Azure. Google Compute, Oracle Cloud, Penguin on Demand, IBM SoftLayer and many others were there, too. The Cloud is here to stay, and it will continue to change HPC.

If you’ve been sitting on the sidelines and not exploring the Cloud, you are not alone, and this is not your fault. You’ve been told the Cloud is complex, expensive and insecure, and that you need a whole new software stack to manage it. On the giant Supercomputing 2017 exhibition floor, I asked my colleagues: "Does the Cloud mean that we have to learn how to do HPC all over again?"

The good news is that it isn’t as scary as you may think. Providers have addressed the cost and security concerns over the years. Yes, you need to do your budget math, and yes, you need to decide how to secure your cloud. But the templates and tools to do so are now in place.

You will not need to learn a new way of working, because most popular resource managers are now Cloud aware. Univa Grid Engine, IBM LSF and Altair PBS Pro all tie Cloud resources to on-premise resources to manage the Hybrid environment.

Univa Resource Manager

I attended a Univa lunch-and-learn in a quiet hotel conference room near the Supercomputing 2017 show floor. I learned that Univa® UniCloud® is aimed at organizations facing growing volumes of workloads. UniCloud dynamically adjusts Cloud usage according to rules you define: it monitors the workloads queuing up in your on-premise Univa Grid Engine® resource manager, then sends eligible workloads to your Cloud provider, such as Microsoft Azure.
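To make "rules you define" a little more concrete, here is a minimal sketch of what a cloud-bursting rule could look like. It is not UniCloud's configuration or API; the Job and BurstRule fields (a minimum queue depth, a maximum acceptable wait, a Cloud core budget and a per-job eligibility flag) are hypothetical parameters chosen for illustration. The sketch picks eligible, long-waiting jobs from the on-premise queue and sends them to the Cloud until the core budget is used up.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A queued job as a resource manager might see it (hypothetical fields)."""
    job_id: str
    cores: int
    wait_minutes: float
    cloud_eligible: bool  # e.g. no data or licenses that must stay on-premise


@dataclass
class BurstRule:
    """A user-defined bursting rule (hypothetical parameters, not UniCloud's)."""
    min_pending_jobs: int    # only burst when the queue is at least this deep
    max_wait_minutes: float  # burst jobs that have waited longer than this
    max_cloud_cores: int     # cap on Cloud cores in use at once (budget control)


def select_jobs_to_burst(queue: list[Job], rule: BurstRule,
                         cloud_cores_in_use: int = 0) -> list[Job]:
    """Return the jobs to send to the Cloud, respecting the core budget."""
    if len(queue) < rule.min_pending_jobs:
        return []  # queue is short enough for on-premise resources to handle
    candidates = [j for j in queue
                  if j.cloud_eligible and j.wait_minutes > rule.max_wait_minutes]
    # Serve the longest-waiting jobs first.
    candidates.sort(key=lambda j: j.wait_minutes, reverse=True)
    selected = []
    budget = rule.max_cloud_cores - cloud_cores_in_use
    for job in candidates:
        if job.cores <= budget:
            selected.append(job)
            budget -= job.cores
    return selected


queue = [Job("a", 16, 45, True), Job("b", 64, 90, True),
         Job("c", 8, 120, False), Job("d", 32, 5, True)]
rule = BurstRule(min_pending_jobs=3, max_wait_minutes=30, max_cloud_cores=96)
print([j.job_id for j in select_jobs_to_burst(queue, rule)])
```

With these made-up numbers, job "c" stays on-premise because it is not eligible, job "d" has not waited long enough, and jobs "b" and "a" fit within the 96-core budget. A production rule would also weigh cost, data transfer and instance start-up time.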

I noticed that this feature set is becoming quite common. IBM, Altair and others have built similar features. Plus, the list of Cloud providers that resource managers support keeps getting longer.

Why is Cloud support in resource managers important for HPC users?

Because resource managers tie HPC workloads together. They are how we abstract away the complexity of HPC workloads.

Every engineer uses applications with a variety of compute requirements. Some applications require distributed memory via MPI, while others are single-threaded. Workloads also differ in their hardware requirements. For example, some workloads run faster with the aid of a co-processor.

Managing hybrid infrastructures and assigning the most appropriate resources to each workload is what resource managers are very good at. A resource manager is also good at tying workloads together to form a workflow.

The Cloud is here. Popular resource managers support the Cloud, making it possible to mix on-premise and Cloud compute resources into a Hybrid infrastructure. Now it's your turn.
