Disclaimer: This article is crafted to inform and educate readers, incorporating a blend of factual insights and subjective viewpoints on the discussed topic. It is important to note that the content of this blog article is not intended to give strategic guidance or to universally criticize any form of organizational IT-strategy.

What Exactly is Multi-Cloud, and Why is it Relevant?

In short, “multi-cloud” refers to a strategy in which an organization deliberately uses multiple public cloud providers such as AWS, Azure, GCP, or others. Sunit Parekh gives a more sophisticated definition in the official Thoughtworks Technology Podcast. The opposing strategy is for a company to intentionally consume services from a single cloud provider only.

The decision to go multi-cloud is strategic and will affect the IT organization. With a multi-cloud approach, organizations are likely to allocate resources to areas that are not directly visible to end users; however, there can be some benefits along the way.

In this post, we will outline some of the motivations for choosing a multi-cloud approach, then provide a list of principles on how to get there. We will also look at a sample architecture and conclude by weighing the potential benefits against the burdens of adopting a multi-cloud strategy.

Why Multi-Cloud?

So why would you prefer to keep a zoo of providers instead of just one? Betting on a single one may seem straightforward, as the big hyperscalers (AWS, Azure, GCP) offer a wide ecosystem of solutions, including so-called “higher-level services”, sometimes referred to as XaaS (Everything as a Service). Configuring these services and plugging them together is not just easy and fast; it also takes away the heavy burden of infrastructure lifting (networking, OS patching, hardware, data center handling, etc.), with the provider taking care of it as part of the service. At least that is the marketing claim, and, in my judgment, it comes quite close to reality.

The higher-level services work well within the closed ecosystem. Outside of it, however, their degree of compatibility with other technologies is low. And where they are compatible, it is usually for one reason: to feed the cloud provider’s business model, which relies on customers consuming more and more services over time. In one of its latest Code Reports on YouTube, Fireship goes as far as to say that using cloud providers can be a trap. They even compare consuming cloud services to an addictive drug that a company finds hard to get away from. A deliberate exaggeration, but the risk of vendor lock-in does seem to motivate business decision makers to reconsider their cloud consumption.

Other considerations in the decision-making process for or against multi-cloud include reduced or increased complexity, the skills required in teams, overall price, the risk of losing control, the number and variety of available services, and data sovereignty.

So back to the “why”: the motivation is more business-driven than purely technology-driven; nevertheless, it has implications for how IT organizations use cloud technology.

How to get there?

So let’s assume you are convinced and want to follow the multi-cloud approach. How would you get there?

A north-star way of thinking can help: in general, IT leaders want to encourage their teams to use technologies that work on any cloud provider. In short: fewer of the highly integrated, highly customized, provider-specific services, and more of the technologies and services that are based on industry-wide, generally accepted standards, usually backed by software foundations (e.g. the Cloud Native Computing Foundation) and/or developed in the open source community.

Here is an incomplete list of principles that I think could help organizations get there:

1️⃣ Overarching Tools

Use overarching tools that enable you to provision and maintain cloud resources from different providers at the same time! In the area of infrastructure as code, these are Terraform and Pulumi. The corresponding provider-specific services that tend to be avoided are AWS CDK, AWS CloudFormation, Azure Bicep, and ARM templates.
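To make this principle concrete, here is a minimal, hypothetical Python sketch of the underlying idea: one logical resource definition rendered for several providers. The `Bucket` dataclass and the HCL-style output are illustrative assumptions, not real Terraform or Pulumi API calls.

```python
from dataclasses import dataclass

# Hypothetical, provider-agnostic description of a storage bucket.
@dataclass
class Bucket:
    name: str
    region: str

def to_hcl(bucket: Bucket, provider: str) -> str:
    """Render the same logical resource for different providers as an
    illustrative Terraform-style (HCL) snippet."""
    resource_types = {
        "aws": "aws_s3_bucket",
        "azure": "azurerm_storage_container",
        "gcp": "google_storage_bucket",
    }
    rtype = resource_types[provider]
    return (
        f'resource "{rtype}" "{bucket.name}" {{\n'
        f'  # region/location handling differs per provider\n'
        f'  location = "{bucket.region}"\n'
        f'}}'
    )

bucket = Bucket(name="artifacts", region="eu-central-1")
for provider in ("aws", "azure", "gcp"):
    print(to_hcl(bucket, provider))
```

This is exactly the abstraction that overarching IaC tools give you for real: one workflow and one state model across providers, instead of one toolchain per cloud.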

2️⃣ Overarching Solutions for Central Functions

Use overarching special-purpose solutions for central company functions such as end-to-end logging and cloud cost management! The Grafana/Prometheus stack, to name one for logging and monitoring, can span multiple cloud providers. The services that tend to be avoided in this case are AWS CloudWatch, Azure Monitor, etc.
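A short sketch of why such a stack can span providers: Prometheus scrapes a simple, provider-neutral text exposition format, so a workload on any cloud can expose metrics the same way. The metric and label names below are hypothetical; a `cloud` label lets a single Grafana dashboard aggregate across providers.

```python
def prometheus_exposition(metrics: dict, labels: dict) -> str:
    """Render metrics in the Prometheus text exposition format:
    metric_name{label="value",...} value"""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return "\n".join(
        f"{name}{{{label_str}}} {value}"
        for name, value in sorted(metrics.items())
    )

# The same workload metric, as it might be scraped from two clouds:
print(prometheus_exposition({"http_requests_total": 1042},
                            {"cloud": "aws", "service": "checkout"}))
print(prometheus_exposition({"http_requests_total": 987},
                            {"cloud": "azure", "service": "checkout"}))
```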

3️⃣ Containers and Kubernetes

Technologies that are non-proprietary (usually open source) at their core, such as Kubernetes and containerization with Docker or Podman, can help. The services that tend to be avoided in this case are AWS Lambda, AWS ECS, Azure Functions, Azure Container Instances, Azure Service Fabric, and other provider-specific compute services.
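The portability argument can be illustrated with a minimal Kubernetes Deployment manifest: the same declarative spec applies unchanged to EKS, AKS, or GKE. It is built here as a Python dict for brevity; the service name and image are placeholders.

```python
import json

def deployment_manifest(name: str, image: str, replicas: int = 2) -> dict:
    """Build a minimal Kubernetes apps/v1 Deployment manifest.
    Nothing in it is specific to any cloud provider's managed
    Kubernetes offering."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }

manifest = deployment_manifest("checkout", "registry.example.com/checkout:1.0")
print(json.dumps(manifest, indent=2))
```

The provider-specific parts (node pools, load balancer classes, storage classes) stay at the edges, while the workload definition itself remains portable.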

4️⃣ Standardized Protocols

The fourth principle refers to the communication protocols between services. For example, the OpenID Connect (OIDC) protocol was developed by the OpenID Foundation, a non-profit international standardization organization. Another standard protocol is HTTPS: it is the backbone of the internet and therefore independent of any proprietary, provider-specific solution.
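A small sketch of what this standardization buys you: an OIDC ID token is a JWT (RFC 7519), so its payload can be decoded the same way regardless of which identity provider issued it. The token below is a hand-built, unsigned dummy; real tokens must of course have their signatures verified before being trusted.

```python
import base64
import json

def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT (no signature verification!).
    The header.payload.signature layout is standardized, so this works
    for tokens from any compliant identity provider."""
    payload_b64 = token.split(".")[1]
    # base64url requires padding to a multiple of 4
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))

def _b64url(data: dict) -> str:
    raw = base64.urlsafe_b64encode(json.dumps(data).encode())
    return raw.rstrip(b"=").decode()

# Hand-built dummy token (issuer URL is a made-up example):
header = _b64url({"alg": "none"})
payload = _b64url({"sub": "alice", "iss": "https://idp.example.com"})
token = f"{header}.{payload}."

print(decode_jwt_payload(token))
```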

5️⃣ Endpoints

This one builds a bit on top of the previous one. Use endpoints that are “open”. For example, AWS PrivateLink only lets you connect private endpoints from other AWS accounts; the same goes for Azure Private Link, as both are tailored to their respective ecosystems. A VPN, on the other hand, can offer similar functionality while working independently of any single ecosystem.

What does it look like? A sample solution architecture:

The image below shows what a high-level sample solution architecture could look like:

The workloads run in two “clouds”, in both cases on the respective cloud provider’s Kubernetes service. The idea is that, thanks to this standardized technology layer, workloads can be migrated with manageable effort should it ever be needed. This approach is sometimes also called “cloud agnostic”. One thing to mention in this scenario are the so-called “egress costs”, which in some cases can add up to a large portion of your cloud bill. More details about egress costs, including the latest talk of the town on the topic, can be found in this Reddit thread.
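To get a feeling for the order of magnitude, here is a back-of-the-envelope calculation. The $0.09/GB rate is an assumed ballpark for on-demand internet egress, not any provider’s actual price list; real pricing is tiered and provider-specific.

```python
def monthly_egress_cost(gb_per_month: float, price_per_gb_usd: float) -> float:
    """Back-of-the-envelope egress cost estimate (flat rate, no tiers)."""
    return gb_per_month * price_per_gb_usd

# Assumption: 10 TB/month of traffic between the two clouds at ~$0.09/GB.
traffic_gb = 10_000
print(f"${monthly_egress_cost(traffic_gb, 0.09):,.2f} per month")
```

At these assumed numbers, cross-cloud replication alone would cost around $900 per month, which is why the egress line item deserves attention in any multi-cloud design.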

What’s the point?

Is it even possible to achieve cloud agnosticism to a satisfying extent? Indeed, it is a balancing act. Reflecting on the start of the whole movement to the cloud, we can recall the initial motivation behind transitioning away from the traditional data center strategy: the aim was not just to escape the burdens of infrastructure management but to leverage the benefits of specialization and cost efficiency by outsourcing non-core tasks to highly specialized vendors. This is akin to a car manufacturer opting not to raise its own cattle for leather seats but instead focusing on what it does best.

Thus, the decision lies in navigating between two poles: complete reliance on provider-specific managed services on the one side, and building and maintaining everything from the ground up on the other. The key is finding the balance that serves your organizational goals and technological aspirations.

For example, one approach to achieving the right balance could be to set a target timeframe for crucial applications to migrate away from a specific cloud provider and establish this as an objective for the responsible application team. This way, the team is accountable for designing the application architecture to meet the timeframe objective in case of a potential migration, while still making technological decisions independently.

If you are interested in the technical details and tangible implementation insights, just check out the source code on GitHub and start building your multi-cloud tech stack! It gives an impression of what the multi-cloud strategy can look like in practice.

References:

Further Reading: