Introduction
You hear and read a lot about multicloud online. Some praise it as the solution to all problems, heralding it as the future of cloud computing, while others vilify it and pray you stay away from this approach. The problem is that definitions of multicloud vary depending on who is talking about it.
The most common definitions describe using multiple Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) providers seamlessly for redundancy. These definitions don’t include Software as a Service (SaaS) like G Suite or GitHub, and the hype usually revolves around avoiding lock-in, being cloud agnostic, and making yourself more resilient.
Other definitions cast a wider net, saying that using any combination of SaaS, IaaS, and PaaS from different providers constitutes multicloud. In this sense, you may argue that the vast majority of companies are already doing it. We agree with this definition, but feel it lacks nuance because it doesn’t explicitly describe the benefits of multicloud or how to do it well.
This article isn’t about convincing you to avoid multicloud, because we believe it is inevitable. Instead, we aim to define multicloud in a way that helps anyone understand how to do it well and safely.
What is multicloud?
Multicloud is the strategic use of various public and private cloud providers to help optimize costs, performance, and usability, while increasing the scope of features and regions available. Strategic use means picking IaaS, PaaS, or SaaS offerings that can be integrated seamlessly, and avoiding the creation of one or more single points of failure. It also means abstracting complexity away from the engineers using it, by providing a common interface that’s intuitive, easy to use, and consistent.
Multicloud the wrong way
We deliberately keep the motivations below outside of our definition of multicloud, because using them as a justification generally yields overcomplicated setups that have no meaningful advantage over a single cloud setup.
Multicloud as a means of redundancy and resilience
Some companies venture into a multicloud strategy because they feel that having multiple cloud providers will result in more uptime and disaster recovery.
The reality is that you don’t need multiple clouds for redundancy and a robust disaster recovery strategy. All the major players, such as GCP, AWS, DigitalOcean, Cloudflare, and Azure, are extremely robust. Downtimes per zone are rare, downtimes per region are very rare, and global (multiregion) downtimes are extremely rare. Even when they do happen, it is in your provider’s best interest to act fast and fix them much quicker than you could.
Any company worth its salt has realistic Service Level Objectives (SLOs), so it can manage its customers’ expectations with reasonable Service Level Agreements (SLAs). A 100% SLO is not an achievable goal: each extra nine of availability costs disproportionately more, and 100% is simply impossible to reach. For perspective, a 99.9% SLO allows roughly 43 minutes of downtime per month, while 99.99% allows only about four.
Having unrealistic SLOs also impairs your ability to innovate, because your aim will always be to play it safe; this can lead to more downtime than if you had set sensible SLOs. For example, imagine not updating a system in order to avoid downtime, and eventually becoming the target of an attack through an unpatched vulnerability.
While you should still take every measure available to mitigate downtime, you don’t need multiple clouds in order to achieve this. Instead you can leverage the multiregional and redundancy capabilities offered by your cloud provider.
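To make this concrete, here is a minimal sketch of what leaning on a single provider’s redundancy features can look like for a containerized workload: spreading replicas across availability zones so that losing one zone doesn’t take the service down. The workload name and image below are hypothetical placeholders.

```yaml
# Hypothetical Kubernetes Deployment spreading replicas across availability
# zones within a single provider and region, instead of across clouds.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  replicas: 6
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # standard zone label set by cloud providers
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-api
      containers:
        - name: my-api
          image: registry.example.com/my-api:2.3.1
```

The same idea extends to regional clusters, multi-region databases, and cross-region replication, all of which your provider offers without bringing a second cloud into the picture.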
Opting for a multicloud strategy as a way to achieve redundancy and increase resilience means your teams need a deep understanding of every cloud you use. This can be very taxing on all the engineers involved, or force you to hire additional talent.
Downtime is most often caused by human error. So if you have multiple clouds, the chances of making a mistake and the time to resolution are both likely to increase. For example, imagine having an incident and first needing to figure out which cloud is affected.
The bottom line is, downtime is much more likely to be your fault than your cloud provider’s. Even if adding more clouds gives you 0.01% more uptime at a much higher cost, there is no guarantee that you’ll actually achieve it.
If, after considering all these factors, you have done your math and research and concluded that multicloud for redundancy is good for your organization, we recommend you carefully consider the following points.
Centralizing services that become a single point of failure
One of the biggest downsides of multicloud is that you then need to contend with multiple control planes, observability panes, APIs, Terraform providers, etc.
To overcome this, some companies naturally look for solutions that help them take the reins of all their clouds under one control plane, which ironically makes them more vulnerable to downtime.
We believe there is one wrong way and one right way to centralize your cloud control:
- The wrong way: senselessly centralize services that become a single point of failure. This includes services that, if hacked, would provide access to all your clouds and services.
- The right way: centralize observability and tools that improve the user experience as a whole.
A centralized service spanning all clouds is fine; it may even be necessary to keep things simple and your sanity in check. However, much care must be taken during implementation to ensure maximum security and resilience, and to avoid centralizing services that end up as needless single points of failure.
For example, one company I consulted for decided to force the use of Azure DevOps on all teams across all clouds to deploy infrastructure. As you can imagine, this resulted in every private key for each cloud being stored in the same place. Another company insisted on having one container registry for all clouds. In doing so they thought they could centralize security scanning, even though all major providers offer this by default and their clouds operated mostly independently.
In summary: If one of the reasons you’re considering multicloud is redundancy, but then insist on centralizing it in a way that compounds your single points of failure, you're actually achieving the opposite at a much higher cost.
If you want to centralize anything across all clouds, it’s important to ensure you don’t create a single point of failure. We recommend observability tools, such as centralized logging or a Security Information and Event Management (SIEM) system, as suitable things to centralize. Standardizing on complex tools and languages with a steep learning curve, such as Terraform for all your IaC, also makes sense. Just avoid turning them into a single point of failure.
If you opt to deploy seamlessly across several clouds for redundancy, compliance, or geolocation, tools like Anthos can make your life easier. Anthos not only enables you to manage all your Kubernetes clusters under a single control plane, it also brings GKE’s ease of use to places where running Kubernetes is more convoluted, such as AWS. Humanitec can also help here, as we’ll discuss later in this article.
To recap: we advise against adopting a convoluted multicloud strategy simply because tools exist that may help simplify it. Great as these tools are, the reality is that using many clouds will always be more cumbersome and complex than using just one.
Doing it for security
Picking a multicloud strategy on the basis that it’s more secure is misguided. All three major providers (GCP, AWS, Azure) are extremely secure and resilient, so security generally comes down to your configuration and how secure your workloads are. Very rarely will the issue be with the cloud provider itself.
In fact, I can think of three scenarios where a multicloud strategy can be less secure than using a single cloud provider:
- Traffic between cloud providers travels over the open internet. With modern encryption this is not a problem even if the data is intercepted, but it becomes one if someone manages to get hold of your keys. Whitelisting access to your cloud provider by IP is not always simple either, as those IPs change in the case of buckets and other services. You can get around this with direct connections and other means, but it’s always a hassle.
- Critical services are badly centralized and left vulnerable to attackers or downtime (see above).
- Keeping everything within a secure perimeter becomes harder. If you use GCP alone, for example, it is straightforward to keep everything under a security perimeter with their BeyondCorp offering, whereas doing the same across multiple clouds and hybrid setups becomes more challenging (although not impossible).
If security is a major multicloud driver, consider instead segmenting your infrastructure into different organizations, accounts, VPCs, and so on within your existing cloud provider. If you go down this path, multicloud becomes unnecessary.
To gain any additional security from a hybrid or multicloud approach, you must control all your data by keeping some of it on-prem. This also applies if you operate in a country with no rule of law, or where you cannot be sure the government won’t access your data.
Doing it just to avoid vendor lock-in (AKA to be cloud agnostic)
There is no such thing as a lock-in free, cloud agnostic setup. Being completely cloud agnostic would require an incredible amount of effort, which may cancel out any advantage of doing it in the first place.
That’s not to say you should always opt for proprietary solutions. For example, using Kubernetes is a better bet than going with ECS, because it is open source and supported in all clouds. It also means that once your workloads run in containers, moving them around to different platforms is far easier.
And remember, it’s not always the cloud provider that creates the problem. The choices you make when designing applications and infrastructure can lock you in too, and that becomes your issue, not the cloud provider’s.
One way to overcome this is, when designing applications (even monolithic ones), to containerize your workloads following the twelve-factor app methodology. This, together with picking products compatible with open source databases, gives them greater portability should you ever need to move.
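As a small, hypothetical illustration of the twelve-factor principle that matters most for portability (store config in the environment), the container image stays identical everywhere, and anything environment- or provider-specific is injected at deploy time. The names below are placeholders, not a prescribed convention.

```yaml
# Fragment of a Kubernetes pod spec: the image is provider-agnostic, and
# environment- or provider-specific details arrive as environment variables.
containers:
  - name: billing-service
    image: registry.example.com/billing-service:1.0.0
    env:
      - name: DATABASE_URL            # could point at RDS, Cloud SQL, or an on-prem Postgres
        valueFrom:
          secretKeyRef:
            name: billing-db          # Secret created per environment by your IaC or platform
            key: url
      - name: LOG_LEVEL
        valueFrom:
          configMapKeyRef:
            name: billing-config      # non-secret config, also swapped per environment
            key: log_level
```

Because nothing cloud-specific is baked into the image, the same container can run on EKS, GKE, AKS, or an on-prem cluster.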
It’s worth noting that some cloud providers have worked to become more vendor agnostic than others. GCP, for example, aims to open source as much as possible and has originated projects such as Kubernetes and TensorFlow, whereas Amazon offers comparatively little in the way of open source. That’s not to say it’s impossible to avoid the dreaded lock-in with AWS, because they run managed versions of these tools in their cloud too: instead of a very AWS-specific product like ECS, you can opt for EKS.
All that being said, there are always things that can lock you into your cloud, whether it be your databases, your interaction with IAM, your Zero Trust security model, or your load balancers. Having multiple clouds is unlikely to solve this issue and may make it worse in some cases. Ultimately, it is down to you to design your applications to be as agnostic as possible.
Doing it for the hype
Before embarking on your multicloud journey, we highly recommend doing your due diligence and researching the topic ad nauseam. If you base your company’s engineering on hype (see all the points above), you could end up following an industry narrative that doesn’t apply to your current reality, or that’s largely impractical. Each business is different, with unique needs, and what works for others may not work for yours. It’s worth taking your time to determine whether multicloud is right for you. To keep things simple, here are some key considerations:
- Ensure you understand your company’s needs and infrastructure, including the way your engineers work now. Talk to as many people in your organization as possible to set realistic goals.
- Define what multicloud means to you based on what you learn; this will determine the scope of your effort and set expectations for engineering.
- Determine your current cloud costs and see whether they can be reduced through better practices and engineering, instead of using multiple clouds or going hybrid.
- Read the pros and cons of doing multicloud (including this article :-))
- Explore the services available in all clouds and assess their ease of use, functionality, and price, to determine if they are a better option for you.
- Research options to manage centrally without adding bottlenecks or single points of failure.
- Keep an open mind and make a well-informed decision.
Even if you have no plans to expand your multicloud footprint, we still suggest following this process. At the very least it’s a great way to explore all the options available to your company, and potential opportunities for improvement.
Key multicloud benefits and drivers
Let’s next cover the reasons why and how multicloud might be a good choice for you. These are the reasons we have used as the foundation for our definition of multicloud.
Cost, compatibility, functionality and usability
What if you could stick to one cloud as your main provider, while strategically cherry picking the best offerings from others based on usability, compatibility, functionality, and price?
For example, it is common to use Cloudflare products in combination with GCP or AWS. Its tools are generally cheaper and more resilient when it comes to thwarting DDoS attacks and serving globally distributed content.
Cloudflare also recently released its own object storage buckets (R2), which have free egress and are compatible with the AWS S3 API. If you’re paying exorbitant egress fees with AWS to serve data to your customers, moving that data to Cloudflare can seem like a no brainer, especially since the move doesn’t require a gigantic amount of engineering work.
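Because R2 exposes an S3-compatible API, switching often comes down to pointing your existing S3 client or SDK at a different endpoint and set of credentials. The config below is only a hypothetical sketch; the exact keys depend on the S3 client or library you use, and the bucket name and placeholder values are made up.

```yaml
# Hypothetical application config for an S3-compatible object store.
# Moving from AWS S3 to Cloudflare R2 mostly means changing the endpoint and credentials.
object_storage:
  bucket: customer-assets
  region: auto                                              # R2 expects "auto"; AWS uses e.g. "eu-west-1"
  endpoint: https://<account-id>.r2.cloudflarestorage.com   # previously: https://s3.eu-west-1.amazonaws.com
  access_key_id: ${R2_ACCESS_KEY_ID}                        # R2 API tokens instead of AWS IAM keys
  secret_access_key: ${R2_SECRET_ACCESS_KEY}
```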
You should also aim to leverage SaaS offerings as much as possible, since using tools like GitHub or GitLab SaaS is a lot less hassle than hosting them yourself. Similarly, choosing tools like G Suite for your email and document collaboration makes sense as a way to reduce the cognitive load in your organization. As a bonus, they are also very easy to integrate with your main cloud provider. Strategically picking PaaS and IaaS offerings from other clouds doesn’t remove the focus from your main provider, since you are only leveraging small offerings as they suit your needs. It also means you can take small forays into other vendors without being overwhelmed by complexity.
The main idea here is to be smart about your decisions and do your research. If tools are relatively easy to use and integrate easily with your current cloud, it makes sense to follow this approach. Otherwise, simply stick to whatever your cloud already offers.
Compliance and geolocation
Perhaps your cloud provider does not comply with government regulations for the type of workload you need. Or you need to geolocate data or servers to a place that’s out of reach for your current provider. In such cases, a multicloud approach may be more beneficial to you. Most big cloud providers offer services all over the world and hold the relevant compliance certifications, so this is not a problem for most companies.
Inheriting additional clouds
If you inherit a department, or your company acquires another company that has all of its workloads in a different cloud provider, you may want to keep those workloads where they are and come up with a strategy to manage both. In this case you’ll need to assess what would be best for you.
If there is no difficulty moving workloads in either direction, it’s probably wiser to consolidate both clouds into one, especially if there isn’t enough in-house expertise to cover both providers. Otherwise, it may be best to let both companies operate independently, centralizing only observability and whatever compliance requires.
When to use on-prem: control, latency and cost
Considering all the rage about the cloud and the fact that most companies are moving to it, talking about on-prem may feel like a discussion of ancient history. However, many companies are actually moving at least some of their workloads back to their own managed infrastructure.
Hey/Basecamp is one such business, having recently decided to leave the cloud due to the extremely high costs they were incurring with AWS. And they are not alone; Apple, Spotify, Dropbox, Netflix, and others have also decided to move at least some of their workloads on-prem. Generally speaking, the main reasons to move workloads on-prem are the following:
- Costs: on-prem is dramatically cheaper, and this seems to be the main driving force behind moves away from the cloud. For example, Hey was spending over half a million dollars a year on RDS alone and more than three million dollars overall with AWS, even with heavily optimized usage. For that amount you can get a lot more for your money with on-prem infrastructure, even if you hire a company to run it for you.
- Performance: certain types of workloads, such as those with low latency requirements, may perform better.
- Flexibility: on-prem is as flexible as it gets, because you control the entire stack.
- Compliance: in some cases there may be a requirement to keep complete control of your data.
On-prem is not going away any time soon, and if you have requirements like the above, you may want to consider it in addition to your cloud workloads, especially if your growth is predictable and you don’t see huge differences between peak and normal service hours.
Find the right abstractions for your engineering teams
Using multiple clouds means a wider engineering surface to cover. This requires additional authentication tokens, additional APIs, additional configuration consoles, and more.
Your platform engineering team will need to find ways to not only manage these consistently without introducing multiple single points of failure, but to also abstract as much of this complexity away from developers as possible.
This is where a Platform Orchestrator like Humanitec can help. As we’ll discuss later, Humanitec enables you to define ways to provision resources. This means developers only need to specify what they require, without concerning themselves with where that resource lives. In this way, Humanitec can work as an integral part of your internal developer platform.
How can Humanitec help multicloud setups?
Now for our shameless sales pitch. You knew it was coming; after all, we have a tool to sell so we can keep on feeding our dogs and cats.
Here at Humanitec we focus a lot on agnostic setups and abstracting away pain and complexity from developers. So if you opt for this multicloud approach, we’ve got you covered.
Let’s say your applications are containerized and you’ve decided to run multiple Kubernetes clusters in several clouds. Humanitec can help simplify your workload deployments and let you manage them from a single control plane instead of several.
We do this by enabling workload-centric deployments, meaning your developers can deploy Workloads and their resources to all environments using a single Workload Specification.
Your platform engineers can enable this by creating Resource Definitions. These determine where your workloads will be deployed, where and how your resources will be provisioned, and to which destination cluster and cloud.
With Humanitec’s dynamic configuration management approach, developers don’t need to know or care where your workloads and resources are deployed, since platform engineers can abstract this from them.
Platform engineers on the other hand have more control over how infrastructure and deployments are standardized and provisioned.
For example, a platform engineer can define a Postgres database resource in AWS, GCP, Azure, and so on. The right one is automatically allocated to the workload based on the environment the workload is deployed to; all the developer needs to specify is that their workload needs a Postgres database. Other resource types, like DNS and storage, are supported too.
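To make that concrete, here is a minimal, illustrative sketch of the developer-facing side, written in the style of a Score file (Score is an open source workload specification format that Humanitec supports). The workload name, image, and variable names are hypothetical, and the exact syntax may differ from what your setup uses.

```yaml
# Illustrative Score-style workload spec (names are hypothetical; exact syntax may vary).
apiVersion: score.dev/v1b1
metadata:
  name: orders-service
containers:
  orders-service:
    image: registry.example.com/orders-service:1.4.2
    variables:
      # Filled in at deploy time from whichever Postgres instance the matching
      # Resource Definition provisions for the target environment and cloud.
      DB_HOST: ${resources.db.host}
      DB_NAME: ${resources.db.name}
      DB_USER: ${resources.db.username}
      DB_PASSWORD: ${resources.db.password}
resources:
  db:
    type: postgres   # the developer only declares that a Postgres database is needed
```

On the platform side, a Resource Definition matching on criteria such as environment type decides whether that db resolves to, say, an RDS instance for production on AWS or a Cloud SQL instance for development clusters on GCP.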
This can help in situations where you have a development cluster in one cloud and a production cluster on-prem or with another cloud provider. With Humanitec you can also assign different permissions to users, which means that if one account gets compromised, only that user’s workloads are affected.
If Humanitec went down, your workloads would be unaffected too. You would only lose the capacity to deploy new workloads until service is restored, making it rather resilient.
Conclusion
Chances are you’re already using multicloud, so it makes little sense to disdain it. Instead, we suggest adopting an effective and achievable strategy that covers how to implement it in a way that best suits your organization.
It’s worth noting that the definition we provided for multicloud is just a reference based on sensible practices that would work for most companies. However, it's best to define what multicloud means to you and your organization based on your needs and engineering capacity, and then plan your strategy accordingly.
Focusing on your needs and engineering capacity is also essential to avoid getting bogged down by any hype you may read online. And remember, it’s much easier to increase complexity as you grow than to reduce it later, so plan accordingly.
If you struggle with your multicloud setup and you’d like to dig deeper, feel free to reach out to me or hop on a call with one of our sales engineers.