Cloud Computing

Cloud computing can be extremely useful in particular situations, but like many new technologies, it often gets over-adopted and deployed in situations where it's either inappropriate or downright detrimental.

The classic case for cloud computing (which is still a good application of the cloud today) is the start-up company. In the simplest case, a new company needs a web site to advertise its business. Purchasing a server for a simple web site can cost several thousand dollars, plus ongoing costs for network bandwidth and for support staff to maintain the server. In contrast, a virtual machine on Amazon Web Services (AWS) can be brought up with no capital investment and extremely low operating costs (in fact, at Amazon's lowest-tier offering, your virtual machine can be free for the first year). Although you still need staff to support the software on the machine, all hardware and network maintenance is provided by Amazon. Furthermore, this is a pay-as-you-go service, so when you shut your machine down, you stop paying for its compute time.

A similar case can be made for established businesses that need single-purpose servers outside their business domain, or servers with different reliability or bandwidth requirements. For example, if you run a web server company, you probably want to host your own web site. If you run a manufacturing business, however, your experience is probably more with the computers and related hardware that run your factory floor than with web servers. You can let a web company worry about the hardware maintenance of your web server and have your IT staff focus on the manufacturing core of your business. The cloud companies typically have better bandwidth than most individual businesses, so they're a good option for customer-facing services where response time is critical. Hosting those services externally also frees up bandwidth on the company network for normal operations.

Beyond small-business use cases, larger organizations sometimes use the cloud for burst capacity. If you run an HPC environment that only occasionally requires many more nodes than you have on hand, it might be more cost-effective to extend your cluster with nodes from a cloud provider than to buy more capacity that will sit idle most of the time.
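The burst-capacity trade-off comes down to a break-even calculation. The Python sketch below is a simplification: it assumes straight-line amortization of the purchase price, ignores data transfer and staff differences, and uses made-up prices rather than any vendor's quotes. It estimates how many hours per year an extra node must be busy before owning it beats renting an equivalent cloud node.

```python
"""Break-even sketch: buy extra nodes vs. rent cloud nodes for bursts.

All prices are hypothetical placeholders, not real quotes from any vendor.
"""

HOURS_PER_YEAR = 8760


def owned_cost_per_year(purchase_price, lifetime_years, yearly_operating_cost):
    """Yearly cost of an owned node: straight-line amortization plus operations.

    An owned node costs this much whether it is busy or idle.
    """
    return purchase_price / lifetime_years + yearly_operating_cost


def cloud_cost_per_year(hourly_rate, hours_used):
    """Yearly cost of a rented node: you pay only for the hours you use."""
    return hourly_rate * hours_used


def break_even_hours(purchase_price, lifetime_years, yearly_operating_cost, hourly_rate):
    """Hours of use per year above which owning becomes cheaper than renting."""
    owned = owned_cost_per_year(purchase_price, lifetime_years, yearly_operating_cost)
    return owned / hourly_rate


if __name__ == "__main__":
    # Hypothetical inputs: a $6,000 node kept for 4 years, $500/year for power,
    # cooling, and support, versus a comparable cloud node at $2.00/hour.
    hours = break_even_hours(6000, 4, 500, 2.00)
    print(f"Break-even: {hours:.0f} hours/year ({hours / HOURS_PER_YEAR:.0%} of the year)")
```

With these made-up inputs, owning only pays off if the extra node would be busy more than about 1,000 hours a year, roughly 11% of the time; below that, renting nodes for the bursts is cheaper.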

You may also be able to use cloud solutions to test new hardware. Amazon currently offers instances with GPUs, and while GPUs are nearly ubiquitous in HPC clusters these days, you may be one of the "late adopters" who needs to justify the purchase to management. The cloud is a reasonable place to get some baseline benchmark numbers. Beyond that, more exotic hardware, such as FPGAs, is now available on Amazon as well. While Amazon remains the clear leader in the cloud space, smaller cloud providers have started to differentiate themselves by offering more specialized and customized solutions.

Another advantage of cloud computing is "data safety," in quotes because it's ultimately up to you to ensure your data is secure. Nevertheless, the cloud providers are very good at making sure the data they store is available, accessible, and reasonably secure, and they've probably invested in much better hardware than you have. The cloud providers typically have people dedicated to data storage, and those people are most likely better at the job than your poor, overworked system administrator who is also responsible for your cluster, infrastructure, training, applications, desktops, etc.

The dark side of the cloud

With all that said, cloud computing isn't the magical unicorn depicted in the marketing glitz of the large cloud providers. The expenses associated with computing in the cloud can sometimes be surprising. You're also likely sharing a resource with someone else. Particularly with HPC, it's often not clear how to get a dedicated machine with dedicated network bandwidth (this seems to be a moving target on Amazon in particular, so read the documentation).

In terms of costs, the per-hour charges for cloud computing seem surprisingly low. Again, if you only need the resources periodically, the cloud is still a good option. However, if you regularly have 40,000 cores running at 100% utilization, the cloud quickly becomes expensive, because you pay the hourly rate on every core for every hour of the month. Additionally, you're paying not just for the compute cores but also ingress/egress charges to move your data to and from the cloud. You'll also be charged for storage to hold that data for as long as the storage is allocated to your account, whether your data is there or not. Are you going to offer your data on the Internet, or even to your local users over a secure connection so it can be browsed online? Expect to pay network bandwidth charges. All of these charges add up quickly.
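To see how these charges add up, here is a minimal Python sketch of a monthly bill. The `Rates` fields, the 730-hour billing month, and every price in it are hypothetical placeholders, not any provider's actual pricing. The sketch simply multiplies usage by rate for each line item the paragraph mentions: compute, allocated storage, and data transfer.

```python
"""Sketch of a monthly cloud bill: compute, storage, and data transfer.

Every rate used below is a hypothetical placeholder; substitute your
provider's published prices. Real bills carry more line items than this.
"""
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation: 8,760 hours / 12 months


@dataclass
class Rates:
    core_hour: float         # $ per core-hour of compute
    storage_gb_month: float  # $ per GB-month of *allocated* storage
    egress_gb: float         # $ per GB moved out of the cloud
    ingress_gb: float = 0.0  # $ per GB moved in (provider-dependent)


def monthly_bill(rates, cores, utilization, allocated_gb, egress_gb, ingress_gb=0.0):
    """Return the monthly line items and their total, in dollars."""
    items = {
        # Compute is billed only for the hours the cores are running.
        "compute": rates.core_hour * cores * utilization * HOURS_PER_MONTH,
        # Storage is billed on what is allocated, whether or not it holds data.
        "storage": rates.storage_gb_month * allocated_gb,
        # Data transfer is billed per GB moved.
        "egress": rates.egress_gb * egress_gb,
        "ingress": rates.ingress_gb * ingress_gb,
    }
    items["total"] = sum(items.values())
    return items


if __name__ == "__main__":
    rates = Rates(core_hour=0.01, storage_gb_month=0.02, egress_gb=0.05)
    # 40,000 cores used occasionally (5% of the month) vs. around the clock.
    for label, utilization in [("occasional", 0.05), ("steady", 1.0)]:
        bill = monthly_bill(rates, cores=40_000, utilization=utilization,
                            allocated_gb=100_000, egress_gb=20_000)
        print(label, {name: round(cost, 2) for name, cost in bill.items()})
```

Running it illustrates two points from the text: the compute line scales directly with utilization, so the steady 40,000-core case costs twenty times as much in compute as the occasional one, while the storage line is identical in both cases because it is billed on allocation rather than use.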

When cloud computing first came into vogue several years ago, many companies jumped onto the bandwagon without much thought about the costs, some even proudly declaring themselves "cloud first" companies, meaning all IT projects would target the cloud instead of local resources. While cloud computing continues to grow as a business, over the past few years many companies have started to engage in "cloud repatriation," moving workloads from the cloud back to their own facilities.

The reasons for cloud repatriation are varied, but most seem to center on companies not getting the full benefits they were promised. Costs are lower in some instances, but not all. You still need an IT staff, because the cloud providers don't do the work for you; they only supply the resources to do it remotely. While the uptime of the cloud companies is excellent, accidents do happen, and huge tracts of resources sometimes go offline. They probably have better uptime than your local resources, but if you have local resources and your Internet connection goes down, local users can still work. Not so with remote resources.
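The uptime point can be made concrete with the standard serial-availability model: a service reachable only through several components is up only when all of them are, so their availabilities multiply. The Python sketch below assumes independent failures and uses made-up availability figures for a cloud service, your Internet link, and an on-site server; none of them are measurements.

```python
"""Availability of chained dependencies (a simple serial model).

Assumes failures are independent; all availability figures are hypothetical.
"""
HOURS_PER_YEAR = 8760


def serial_availability(*components):
    """A service that needs every component up is available only when all are."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


def downtime_hours(availability):
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    cloud_service = 0.9999  # hypothetical availability of the cloud service itself
    wan_link = 0.995        # hypothetical availability of your Internet connection
    local_server = 0.999    # hypothetical availability of an on-site server

    # Remote users depend on both the cloud service and the Internet link;
    # local users of the on-site server depend only on that server.
    remote = serial_availability(cloud_service, wan_link)
    print(f"cloud via Internet: {remote:.4%} (~{downtime_hours(remote):.0f} h/yr down)")
    print(f"local server:       {local_server:.4%} (~{downtime_hours(local_server):.0f} h/yr down)")
```

With these hypothetical numbers, users reaching the cloud over the Internet link see several times more downtime per year than users of the local server, even though the cloud service on its own is the more reliable of the two.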

One of the biggest reasons for cloud repatriation is probably still poorly planned and poorly managed migration. Operating in a cloud environment is not the same as operating in your local data center. There's a lot of training for users and operators, a lot of application re-engineering, and a lot of planning to do before you can smoothly and successfully migrate any major or critical application to a cloud platform. Security is also still an issue. While the cloud platforms themselves are remarkably secure, your application and data security is still your responsibility, and security in a cloud environment is a different beast than security on-site. Even if you make the transition successfully, you may find that the actual costs are still prohibitive.

Should you do cloud? Maybe. But start small, start slowly, get an idea of what makes sense to move to a cloud platform and what doesn't, and learn the real costs involved. In the end, as the not-so-old saying goes, "There is no cloud, it's just someone else's computer."