Platform Orchestration is a comparatively new concept. At Humanitec, we’ve been all in on it for years and have called it by its name since early 2022. Since then, Thoughtworks has put it on the radar, and even Microsoft is calling it out in its platform documentation. Here’s a longer summary, but let’s quickly revisit what Platform Orchestration actually is.
What is Platform Orchestration?
Platform Orchestration means a Platform Orchestrator reads an abstract description, provided by the application developers, of a workload and the resources it depends on.
Based on the context (“I’m deploying this app to staging”), the Orchestrator picks the right recipe and creates or updates the app and infrastructure configurations.
It does this dynamically with every deployment. If the recipe changes, all impacted resources get updated. If the workload declares a new dependency (for instance, a new database of type Postgres), it will automatically be created across all environments.
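To make the matching step concrete, here is a minimal sketch of what “pick the right recipe based on context” could look like. All names (`Recipe`, `match_recipe`, `resolve`, the instance sizes) are illustrative assumptions, not any specific product’s API:

```python
# A hedged sketch of context-based recipe matching in a Platform Orchestrator.
# Names and config fields are illustrative, not any vendor's actual schema.

from dataclasses import dataclass

@dataclass
class Recipe:
    resource_type: str   # e.g. "postgres"
    environment: str     # e.g. "staging", "production", or "*" for any
    config: dict         # the concrete infrastructure configuration

# Recipes are defined once by the platform team, per resource type and context.
RECIPES = [
    Recipe("postgres", "staging", {"instance": "db.t3.micro", "multi_az": False}),
    Recipe("postgres", "production", {"instance": "db.r6g.large", "multi_az": True}),
    Recipe("dns", "*", {"zone": "internal.example.com"}),
]

def match_recipe(resource_type: str, environment: str) -> Recipe:
    """Pick the most specific recipe for a resource type in a given context."""
    candidates = [r for r in RECIPES
                  if r.resource_type == resource_type
                  and r.environment in (environment, "*")]
    if not candidates:
        raise LookupError(f"no recipe for {resource_type} in {environment}")
    # Prefer an exact environment match over the "*" wildcard.
    return sorted(candidates, key=lambda r: r.environment == "*")[0]

def resolve(dependencies: list[str], environment: str) -> dict:
    """Resolve every declared dependency to a concrete config for this deploy."""
    return {dep: match_recipe(dep, environment).config for dep in dependencies}

# The same abstract dependency list yields different concrete configs per context:
print(resolve(["postgres", "dns"], "staging"))
print(resolve(["postgres", "dns"], "production"))
```

Because resolution runs on every deployment, changing a recipe or adding a dependency propagates automatically; nobody hand-edits per-workload configs.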
The implications for the enterprise IT organization are significant, and at this point few realize that:
- All of a sudden, custom resource configurations for every single resource are a thing of the past. A fleet of workloads can share the same resource definition. This shrinks the configuration surface area by roughly 95%, and thus requires only a fraction of the Ops FTEs of “old” setups.
- Zero load on the developer, which translates into speed (4x the deployment frequency). “You build it, you run it” is now an actual reality, with extreme implications for the internal political balance.
- Near-complete freedom in infrastructure choice. It simply doesn’t matter where you run. Which raises the question: what does this mean for the Infrastructure as a Service (IaaS) landscape?
This isn’t a theoretical concept. It has long been in use at enterprise scale. The impact is already being realized. The train has left the station.
You cannot remove complexity; you can only shift it somewhere else
Let’s try to understand where today’s complexity is coming from. Why do we still need massive operations teams? The simple answer is “because cloud native is complex”. But if we really zoom in, this complexity is primarily a function of the total number of configuration options available on the IaaS layer.
If you configure your databases individually, workload by workload, and AWS provides millions of permutations of options for doing so, the complexity is simply immense. Look into basically any enterprise team today and you’ll find hundreds of slightly different ways to configure the same resource.
And with immense complexity comes immense cost. If we stack-rank by impact, starting from the smallest:
- Time lost with initial config (Ops team, tickets, dev cognitive load)
- Security costs of governing and patching insecure configs
- Increased cloud costs through inefficient configs
- Maintenance costs
This model has clear flaws, so why are we following it? Probably several reasons:
- Emotional: we believe that only by configuring from scratch can we meet the requirements our service needs to run.
- Historical: we’re lazy; we’ve always done it this way.
- Narrative: neither vendors nor SIs have a significant incentive to change this approach.
How many ways are there to configure RDS in staging?
That’s a key question. We would probably agree that it’s not “in the hundreds”. I would claim it’s at most a handful. And this brings us to the question: should we not enforce that there are only five different ways to configure RDS in a given environment? Where’s the added value of not doing this?
This is exactly what Platform Orchestration promises. Even at this level we reduce complexity by orders of magnitude: from “all possible options” on the IaaS layer to 4-5 options on the “recipe” or “resource definition” layer.
To eliminate drift entirely, the right recipe has to be enforced with every single deployment.
For this to work, we need to be context-aware, because the definition of a resource, even an enforced one, will differ by context. In simpler terms: the RDS configs for staging will differ from the prod configs for the same workload.
This brings us to the necessity of introducing an additional abstraction layer, the “workload specification”. If developers don’t need to worry about how resources materialize per environment, they can just describe the workload and its dependencies in abstract terms.
“My Workload requires a database of type Postgres, file storage, and a DNS” is already enough. Here is the final picture we get:
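To illustrate how little a developer has to state, here is a hedged sketch of such an abstract workload specification, modeled as plain data. The field names are illustrative assumptions and do not follow any particular spec format:

```python
# A hedged sketch of an abstract workload specification; field names are
# illustrative, not any specific specification format. Note what is absent:
# no instance sizes, no engine versions, no regions, no connection strings.
workload_spec = {
    "name": "orders-service",                 # hypothetical workload name
    "container": {"image": "registry.example.com/orders:1.4.2"},
    "resources": {
        "db": {"type": "postgres"},           # "a database of type Postgres"
        "files": {"type": "file-storage"},    # "file storage"
        "dns": {"type": "dns"},               # "a DNS"
    },
}

# Everything about *how* these resources materialize is left to the
# Orchestrator, which matches each abstract type to the recipe for the
# target environment at deployment time.
print(sorted(workload_spec["resources"]))
```

The design point is that the spec stays identical across staging and production; only the recipes it resolves against differ.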
The implications are already extreme. The wild west of cloud native is over. The time of clean-up, standardization, and consolidation has long begun. This change will be mostly positive: efficiency gains, shorter waiting times, increased innovation, and simply faster time to market.
This does change the profile of operations teams. They will have to shift from reactive ticket Ops to proactive platform engineering. I cannot stress enough how important it is to embrace this new paradigm. If your job is reviewing repetitive Terraform files, prepare for change.
Another heavy implication will be on the IaaS layer, whose differentiation is significantly decreasing. The proprietary serverless layer and its lock-in promise have essentially failed. With Platform Orchestration, the only remaining thresholds to moving clouds are the database layer and the cost of data transfer. The need to refactor applications and retrain staff is almost entirely eliminated. This builds up to the fascinating question: which hyperscaler will benefit? The answer will probably develop along the following lines:
- Most efficient operating model
- Most efficient GTM model
- Best developer experience
- Who owns the #1 Platform Orchestration system
So, sit back, and get some popcorn. The next phase of the cloud wars has just begun.