More and more manufacturers are moving high-performance computing (HPC) workloads to the cloud. They start with an analysis of the benefits versus the challenges of using cloud resources and a total cost of ownership (TCO) analysis, perform a Proof of Concept (PoC), set up their (hybrid) production environment, and then move a growing share of their simulation workloads to the cloud. The TCO analysis starts with the company’s current in-house (on-premise) computing environment: all of its costs are summed up and the total is compared to the cloud TCO.
In a recent whitepaper, we presented guidelines on how to compute the total cost of the in-house computing environment and compare it to the equivalent cloud solution. We illustrated this process with concrete calculations that we recently carried out for a manufacturing customer use case. Finally, we analyzed a solution for manufacturers who are interested in moving (part of) their computing workload to the cloud, and performed a detailed cost/benefit analysis for different scenarios: in-house versus cloud versus a hybrid in-house/cloud solution, for current and future user requirements. Here, we summarize the major results.
Total Cost of Ownership of an On-Premise Compute Cluster
To quote Gigaom, “Total Cost of Ownership, or TCO, is a formula that assesses direct and indirect costs and benefits related to the purchase of any IT component. The goal is a final figure that will reflect the true purchase price, all things considered”.
In an internal 2018 report on Total Cost of Ownership of an HPC system over its lifetime, one of our partners presented the following results of their market analysis (sum = 100%):
With a few assumptions, we obtained the following results based on information from the references listed below. We’ll look at a real use case from one of our manufacturing customers with 12,000 employees and $3 billion in annual revenue:
Existing on-premise cluster | Actual Cloud HPC environment |
Cluster size: 512 cores | Size: 1000-core cluster |
Usage: ~42k core hours/week (49% utilization) | Usage: ~70k core hours/week |
Users: 15 engineers | Users: 15 engineers |
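The 49% utilization figure for the on-premise cluster follows from its size and weekly usage. A minimal sketch, assuming the cluster is available 24 hours a day, 7 days a week (the function and constant names are illustrative, not part of the UberCloud TCO Calculator):

```python
HOURS_PER_WEEK = 24 * 7  # assumption: the cluster is available around the clock


def utilization(cores: int, core_hours_used_per_week: float) -> float:
    """Return the fraction of available core hours actually used in a week."""
    available_core_hours = cores * HOURS_PER_WEEK
    return core_hours_used_per_week / available_core_hours


# Figures from the customer use case above.
on_prem_utilization = utilization(cores=512, core_hours_used_per_week=42_000)
print(f"On-premise utilization: {on_prem_utilization:.0%}")
```

With 512 cores, 86,016 core hours are available per week, so ~42k used core hours correspond to roughly 49%.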
Based on this information and a detailed cost analysis, we derived the following results for their 512-core on-premise computing cluster (we skip the detailed calculations here, which we performed with our internal UberCloud TCO Calculator):
On-Premise 512-core cluster | year 1 | year 2 | year 3 |
Hardware investment | € 400.000 | | |
Power consumption | € 65.000 | € 65.000 | € 65.000 |
HW Licenses and Support* | € 30.000 | € 30.000 | € 30.000 |
3-year TCO | € 685.000 | | |
* Setup and maintenance of CAE tools and engineering workflows not included
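The 3-year total in the table is the one-time hardware investment plus three years of recurring costs. A minimal sketch of that sum, using only the figures from the table (the constant and function names are illustrative):

```python
# All amounts in EUR, taken from the on-premise table above.
HARDWARE_INVESTMENT = 400_000           # one-time, year 1
POWER_PER_YEAR = 65_000                 # recurring
HW_LICENSES_SUPPORT_PER_YEAR = 30_000   # recurring


def on_prem_tco(years: int) -> int:
    """One-time hardware investment plus recurring costs over the given years."""
    recurring_per_year = POWER_PER_YEAR + HW_LICENSES_SUPPORT_PER_YEAR
    return HARDWARE_INVESTMENT + years * recurring_per_year


tco_3_years = on_prem_tco(years=3)
print(f"3-year on-premise TCO: {tco_3_years} EUR")
```

As in the table, setup and maintenance of CAE tools and engineering workflows are not part of this sum.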
The analysis for a dynamic, on-demand compute cluster in the cloud is more straightforward, because all important cost factors are already included in the cloud provider’s per-core-hour cost. For our manufacturing customer, summing up the onboarding effort, the Azure consumption for an average CPU usage of 70k core hours/week, and the annual subscription to the UberCloud Engineering Simulation Platform for 15 users results in 227.000 Euro per year, or 681.000 Euro TCO over three years, versus 685.000 Euro for the on-premise cluster. In this case, with both scenarios using the latest technologies, HPC cloud and in-house computing resources result in a similar Total Cost of Ownership.
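The comparison can be reproduced with a few lines of arithmetic. The sketch below also derives a rough cost per used core hour for both scenarios; this goes beyond the original calculation and assumes constant usage over 52 weeks per year. The two figures are not strictly like-for-like, because the cloud total includes the platform subscription and onboarding, while the on-premise total excludes CAE tool setup and maintenance. All names are illustrative.

```python
# Amounts in EUR, taken from the text above.
CLOUD_COST_PER_YEAR = 227_000   # onboarding + Azure consumption + subscription
ON_PREM_TCO_3_YEARS = 685_000
YEARS = 3
WEEKS_PER_YEAR = 52             # assumption: constant usage all year


def cost_per_core_hour(tco: float, core_hours_per_week: float) -> float:
    """TCO divided by the total core hours used over the whole period."""
    total_core_hours = core_hours_per_week * WEEKS_PER_YEAR * YEARS
    return tco / total_core_hours


cloud_tco_3_years = CLOUD_COST_PER_YEAR * YEARS

on_prem_rate = cost_per_core_hour(ON_PREM_TCO_3_YEARS, 42_000)
cloud_rate = cost_per_core_hour(cloud_tco_3_years, 70_000)

print(f"Cloud 3-year TCO:   {cloud_tco_3_years} EUR")
print(f"On-premise rate:    {on_prem_rate:.3f} EUR per core hour")
print(f"Cloud rate:         {cloud_rate:.3f} EUR per core hour")
```

Under these assumptions, the similar 3-year totals cover ~70k core hours/week in the cloud compared with ~42k core hours/week on premise, so the cost per used core hour comes out lower in the cloud scenario.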
Return on Investment
TCO is not the only factor influencing the decision between buying a traditional in-house cluster and subscribing to an automated, self-service, on-demand HPC cloud simulation platform. One should also ask: what benefits for engineers, for management, and for the company itself do I get for this total cost? The benefits of using HPC cloud can far exceed the investment. One example, admittedly an extreme one, comes from one of our largest customers, a Global 50 leader in Consumer Electronics that runs electromagnetic field simulations with CST Studio. As a Senior Antenna Engineer there put it: “I am 30X more productive since we started to work with UberCloud. Before UberCloud I used a desktop because that’s what I could fit under my desk and manage on my own.”
Let’s try to turn this productivity benefit into numbers. For the sake of simplicity, consider just a 2X productivity increase from running simulations in the cloud, that is, being able to run 2X more simulations, and ignore for the moment other benefits like ‘faster time to market’, ‘better product quality’, ‘higher flexibility’, ‘opex instead of capex’, etc. To achieve the same 2X productivity increase in a comparable on-premise situation, you would have to set up a second simulation environment: hiring another engineer and providing office space, HPC hardware, electrical power, software, support, etc., easily resulting in $200K in annual personnel and related costs alone. That is what you would save by setting up a cloud simulation platform for your existing engineer instead. And 2X is a very conservative number; many of our customers achieve a productivity gain of 10X and more by using more hardware for more simulations and/or higher accuracy.
The productivity boost is not the only business benefit of moving your simulations to the cloud. The following table shows the major benefits of cloud compared to on-premise computing for our existing customer use case:
On premise HPC | Cloud HPC |
Single site, 512 core HPC system | Global HPC, scales with business needs |
Limited HPC access for engineers | HPC access for all engineering sites |
2 GPUs for compute | 8+ GPUs (V100) for compute |
3-5 years hardware renewal cycles | Seamless update to new HW and SW |
High latency remote visualization | Low latency accelerated cloud viz |
Limited local storage | Global data storage incl. backup |
Local HPC maintenance | UberCloud / Azure global HPC support |
 | Optimized CAE license storage |
 | Minimal costs to add users and use-cases |
 | Foundation for corporate digital solution |
 | Highly agile, flexible environment that reacts instantly to requirement changes |
Additional details, comments, and conclusions can be found in our recent TCO whitepaper.
UberCloud’s TCO Service for Manufacturers
Based on our TCO calculations, we have developed a TCO calculation service for enterprises that are moving engineering simulation workloads to HPC cloud and want to perform a detailed cost/benefit analysis of in-house versus cloud computing resources.
This cost/benefit analysis starts by looking at the manufacturer’s existing resources and their total cost, and comparing this to an equivalent compute environment in the cloud. The next step is to look at the manufacturer’s future requirements, such as upcoming computing demands and infrastructure, a growing number of engineers and their tasks, applications, software licenses, usage requirements in 1, 2, and 3 years, etc.
Based on these requirements, UberCloud will provide a detailed cost/benefit analysis for the current and the following years, covering the different scenarios: in-house computing resources, an equivalent cloud hosting solution, cloud bursting on demand, and the most cost-effective combination of these three. Please ask UberCloud for more details at https://www.TheUberCloud.com/help/.
References
[1] David S. Linthicum: Cloud computing’s elusive TCO (total cost of ownership), Gigaom 2014, https://gigaom.com/2014/05/09/cloud-computings-elusive-tco-total-cost-of-ownership/
[2] Nicole Hemsoth: The Cloud Versus HPC Cluster Cost Conundrum, June 2015, http://www.nextplatform.com/2015/06/03/the-hpc-cloud-versus-cluster-cost-conundrum/
[3] Wolfgang Gentzsch: How Cost Efficient is HPC in the Cloud? A Cost Model for In-House Versus In-Cloud High Performance Computing, 2015, http://www.theubercloud.com/cost/
[4] On premise versus cloud computing infographics: http://www.databax.co.uk/blog/the-best-cloud-computing-infographics-and-images-ever#.Vt06X5MrIUE