TL;DR: What is an Internal Developer Platform?
An Internal Developer Platform (IDP) is the sum of all the tech and tools that a platform engineering team binds together to pave golden paths for developers. IDPs lower cognitive load across the engineering organization and enable developer self-service, without abstracting away context from developers or making the underlying tech inaccessible. Well-designed IDPs follow a Platform as a Product approach, where a platform team builds, maintains and continuously improves the IDP, following product management principles and best practices.
End-to-end DevOps platforms, developer portals, service catalogs, Heroku-like PaaS solutions and tools that cover single parts of the software delivery lifecycle or infrastructure provisioning (e.g. Environment as a Service) are not Internal Developer Platforms. IDPs are built out of different tools and technologies (e.g. open-source, proprietary, self-developed) and might include things like developer portals, but they don’t come as an out-of-the-box solution. An IDP covers the following foundations:
- Infrastructure orchestration
- Application configuration
- Deployment management
- Environment management
When well designed and built, an IDP is a compelling product that lets platform teams “make the right thing [to do] the intuitive thing” (Gartner). It increases developer productivity and velocity.
IDPs and the evolution of DevOps
Why do teams build Internal Developer Platforms, and why are more and more organizations looking at the broader trend of platform engineering? It all started back in 2006, when Werner Vogels, CTO of Amazon, while announcing AWS, famously yelled “You build it, you run it”.
A new era of DevOps was ushered into existence. Engineering organizations of all sizes could finally tear down the proverbial fence between developers and operations and shift ops left to developers. While that sounded great in theory, the practice looked quite different as time went by. Alongside cloud adoption and DevOps, other trends started taking hold and accelerating: the move to microservice architectures, application containerization and the popularization of orchestrators like Kubernetes, Infrastructure as Code (IaC), etc.
These innovations brought an unparalleled amount of progress to the software development industry and kick-started the cloud native era. However, they also came with an explosion in the tooling landscape.
Developers suddenly needed to have an end-to-end understanding of increasingly complex cloud native toolchains just to do simple tasks such as updating an environment variable. They now had to be familiar with 10+ tools simply to debug a basic database provisioning. The promise of DevOps, of shifting ops left to developers and letting them deploy and operate their applications, quickly led to increased levels of cognitive load, heavily impacting developer experience and overall productivity.
While most engineering organizations were still trying to catch up to this new DevOps methodology, top-performing orgs had already identified extra cognitive load as a potentially deadly threat to the velocity of their teams. The likes of Google and Airbnb understood that expecting all developers to master their increasingly complex toolchains was not only unrealistic but simply bad practice. After all, the history of software engineering has been one of progressively building useful abstractions.
That’s why leading tech companies started creating platform engineering teams that worked as product teams with one major focus: building Internal Developer Platforms for their own customers, the developers. Jason Warner, CTO at GitHub, explains how building an IDP rapidly became an existential matter for their hyperscaling organization. Netflix famously did the same, Google built Borg, Airbnb built its own platform on top of Kubernetes, and we recently spoke with Courtney Kissler, who built similar IDPs at Nike, Starbucks and Nordstrom.
It became increasingly clear that engineering organizations that invested in building Internal Developer Platforms (IDPs) showed better performance on all DORA metrics (lead time, deployment frequency, change failure rate, MTTR) than the vast majority of orgs that got stuck somewhere along the way of this DevOps transformation.
Puppet’s State of DevOps Report from 2021 clearly shows the correlation between top performing teams and their platform capabilities.
Humanitec’s DevOps Benchmarking similarly shows how the vast majority of the more than 1800 engineering orgs surveyed got stuck halfway through their DevOps journey.
The key difference between top performers and the rest of the industry is their ability to design and build Internal Developer Platforms. In particular, the best teams treat their IDPs as products.
Platform as a Product (and the team behind)
Already back in 2017, Thoughtworks Tech Radar outlined the impact that having dedicated platform engineering product teams can have:
The adoption of cloud and DevOps, while increasing the productivity of teams who can now move more quickly with reduced dependency on centralized operations teams and infrastructure, also has constrained teams who lack the skills to self-manage a full application and operations stack. Some organizations have tackled this challenge by creating platform engineering product teams. These teams operate an internal platform which enables delivery teams to self-service deploy and operate systems with reduced lead time and stack complexity. The emphasis here is on API-driven self-service and supporting tools, with delivery teams still responsible for supporting what they deploy onto the platform. Organizations that consider establishing such a platform team should be very cautious not to accidentally create a separate DevOps team, nor should they simply relabel their existing hosting and operations structure as a platform.
In hindsight, it's actually surprising that this approach didn't get more attention earlier on, as Manuel Pais noted in his recent talk at PlatformCon. But what does Platform as a Product mean in practice? It means that in order to build a well-designed IDP you should start from thorough user research. Platform teams need to build a tight feedback loop with the rest of the engineering organization to identify issues both on a team and on a company-wide level. They should also avoid common pitfalls and fallacies of platform engineering practices. But let’s focus on the key problems platform teams have to tackle.
Pain points an IDP solves
While the gravity of the issues and combination thereof will vary from org to org, below we clustered the major patterns and pain points we have seen teams of all sizes experience (and address by building an Internal Developer Platform).
Org-level pain points:
- A general lack of standardization, leading to configuration drift and a massive amount of static scripts mushrooming across teams.
- No unified developer experience across the organization, making it hard to switch from one team or feature/product to another.
- Broad performance issues (low deployment frequency, long lead times, high change failure rate, long MTTR).
Operations’ pain points:
- Having to do repetitive, manual work over and over again.
- Constantly playing catch-up with ticket ops, effectively becoming a help desk (and bottleneck) for devs who are not enabled to self-serve what they need to run their apps.
- An overall setup that becomes very hard to maintain and scale.
Developers’ pain points:
- Lack of developer self-service, constantly having to wait on Ops to do simple tasks (e.g. provision a database or spin up an environment).
- Expected to understand complex toolchains end-to-end if they don’t want to wait on Ops. This leads to extra cognitive load and derails them from coding. It often also results in shadow Ops.
- Poor developer experience (lack of documentation, context switching, but also lack of context when underlying tech and tools get hidden from them).
To address this range of challenges, there’s no one-size-fits-all solution that can be implemented, especially for organizations of a certain size (20+ developers) and complex brownfield enterprise setups.
According to Gartner’s Software Engineering Leader’s Guide to Improving Developer Experience (full report behind paywall), “platforms don’t enforce a specific toolset or approach – it is about making it easy for developers to build and deliver software while not abstracting away useful and differentiated capabilities of the underlying core services.”
A fast scaling startup building its new infrastructure from scratch (greenfield) can easily adopt the latest technology and is likely a completely different case than large enterprise teams with a lot of legacy tooling (brownfield).
This means that solutions claiming to provide end-to-end workflows for the entire application delivery lifecycle, although there’s no shortage of them in the market, are unlikely to address your specific set of pain points in a satisfying and scalable way.
What we have seen over and over in the market is that top-performing platform engineering teams build their IDPs on their own, finding the right combination of platform tooling that works for them. That, however, doesn’t mean that you need to start from zero: over the last few months we built an overview of the platform engineering tooling landscape that can hopefully help you get going.
While CI, registry, messaging, database & storage, security, logging, DNS, IaC and cloud providers should be pretty self-explanatory, let’s have a closer look at the remaining parts:
- Service catalogs, developer portals or platform UIs, e.g. Backstage: tools from this category are not an IDP, but they can play a very useful role in your IDP setup. As Gartner’s Software Engineering Leader’s Guide to Improving Developer Experience puts it: “Internal developer portals serve as the interface through which developers can discover and access internal developer platform capabilities.”
- Platform Orchestrator: this is a new category that enables dynamic configuration management. A Platform Orchestrator is the centerpiece of every dynamic IDP.
- Kubernetes control planes: these are abstraction layers on top of Kubernetes that reduce the complexity developers are exposed to. Be aware that everything beyond Kubernetes is not covered.
- Infrastructure control planes: these are abstraction layers on top of the IaC setup to reduce the complexity developers are exposed to. Be aware that everything beyond IaC is not covered.
How your IDP will eventually look will depend on the technologies you already have in place, the ones you want to get rid of, the ones you want to keep, the size of your org, the preferred workflows of your dev teams, external factors like regulations, and so on. So it’s only natural that different companies will take very different approaches in how they build their IDPs.
A common antipattern we have seen in the industry is orgs taking a UI-based approach. This usually fails, as providing a click-ops experience to developers tends to make the underlying infrastructure and delivery setup feel too abstracted away. Engineers feel they lack context and either actively block the platform rollout or let adoption passively fall flat. What this makes painfully obvious is that developer portals are not an IDP.
Gartner clarifies that “Internal developer portals serve as the interface through which developers can discover and access Internal Developer Platform capabilities”. They are, as mentioned, only an interface to an IDP. But you still need to build IDP functionalities under the hood.
An alternative path is applying a GitOps methodology, like Palantir or nesto did. This is better than the UI approach, as it tends to fit more naturally into the code-based workflows of developers. However, it often leads to huge complexity down the line due to the exponential growth of static config files and unstructured manifests, which paralyzes Ops teams as you grow. And you are back at square one with the pain points described above.
A Platform Orchestrator provides a more flexible and resilient option. It works with any combination of interfaces, such as an API, a CLI or a UI like a developer portal, e.g. Spotify’s Backstage (you can dive deeper into how a platform orchestrator like Humanitec complements a portal like Backstage here). By enabling dynamic configuration management, an IDP built with a Platform Orchestrator naturally avoids the explosion of static scripts typical of traditional GitOps setups.
The current state of platforming
To recap, if you don’t build your platform, it will build itself. Or put differently, devs don’t want to do Ops. If you don’t offer developers a well designed IDP:
- Experienced developers will end up helping less experienced colleagues all the time (i.e. doing shadow Ops), which reduces developer productivity.
- Ops, DevOps or SRE teams will glue together some tools to reduce toil, but not in a thought-through way, which will backfire down the line.
- Self-scripted workflows will pop up across the organization, making the overall setup hard to maintain and scale with.
We have looked at the platform tooling landscape and at the different approaches teams take to build IDPs. Most IDPs we see today in the market do a great job at enabling developers to deploy an updated image from one stage to another, as long as the infrastructure of the app doesn’t change.
This is a great first step and such setups work very well in 80% of delivery use cases. But they remain static, meaning there are static configuration files manually scripted against a set of static environments and infrastructure, as shown below.
While these static IDPs cover most simple use cases concerning the simple update of an image, they tend to break or cause overhead once teams want to do any of the following:
- Roll back
- Change configs
- Add services and dependencies
- Add or change infrastructure
- Spin up environments
- Onboard new developers to their delivery setup or work with external ones
- Audit the delivery flows
When faced with such tasks, developers have the choice between trying to do these things themselves (which derails them from coding, leads to shadow ops, etc.) or asking Ops/DevOps/SRE teams for help (which creates waiting times and bottlenecks on the Ops side).
We often see the majority of tooling from the previous section built into static platforms to solve single pain points (e.g. environment as a service) or into self-scripted workflows that tend to lead to shadow ops or increased dependency on operations teams. The problem is that most of these tools don’t address the core issue in most delivery setups: a static way of managing both application and infrastructure configurations.
Platform Orchestrators enable dynamic configuration management: the next generation of IDPs
Platform Orchestrators are the latest answer in the space to the challenge of static configuration management. A Platform Orchestrator is the centerpiece of a modern, dynamic IDP and is what enables dynamic configuration management.
It lets engineering organizations of all sizes enforce a Declarative Application Model, where developers can specify environment-agnostic configurations for their workloads (e.g. env variables, dependencies, etc.), valid across all environments and stages of the delivery pipeline.
The Platform Orchestrator matches the environment-agnostic configurations to the workload and infrastructure profiles that the platform team establishes across the entire organization and dynamically creates new manifests and config files at deployment time.
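To make the idea concrete, here is a minimal sketch of how one environment-agnostic workload description could be resolved into environment-specific manifests at deployment time. Everything in it is an assumption made for illustration: the `Workload` class, the `INFRA_PROFILES` and `WORKLOAD_PROFILES` tables, the `${resource.output}` placeholder syntax and the host names are hypothetical and do not reflect the API of any particular Platform Orchestrator.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model; names are illustrative, not a product API.

@dataclass
class Workload:
    """Environment-agnostic description, written once by the developer."""
    name: str
    image: str
    env: dict = field(default_factory=dict)        # values may hold ${resource.output} placeholders
    resources: dict = field(default_factory=dict)  # resource id -> resource type

# Owned by the platform team: (resource type, environment type) -> concrete values.
INFRA_PROFILES = {
    ("postgres", "development"): {"host": "pg.dev.internal", "port": "5432"},
    ("postgres", "production"): {"host": "pg.prod.internal", "port": "5432"},
}

# Owned by the platform team: baseline settings per environment type.
WORKLOAD_PROFILES = {
    "development": {"replicas": 1},
    "production": {"replicas": 3},
}

def render_manifest(workload, env_type):
    """Create an environment-specific manifest at deployment time."""
    # Match every requested resource to the profile for this environment type.
    resolved = {
        res_id: INFRA_PROFILES[(res_type, env_type)]
        for res_id, res_type in workload.resources.items()
    }
    # Substitute placeholders in the environment variables.
    env = {}
    for key, value in workload.env.items():
        for res_id, outputs in resolved.items():
            for out_name, out_value in outputs.items():
                value = value.replace("${%s.%s}" % (res_id, out_name), out_value)
        env[key] = value
    manifest = dict(WORKLOAD_PROFILES[env_type])
    manifest.update({"name": workload.name, "image": workload.image, "env": env})
    return manifest

api = Workload(
    name="api",
    image="registry.example.com/api:1.4.2",
    env={"DB_HOST": "${db.host}", "DB_PORT": "${db.port}"},
    resources={"db": "postgres"},
)
dev_manifest = render_manifest(api, "development")
prod_manifest = render_manifest(api, "production")
```

The developer-facing `api` object never mentions an environment; only the platform-owned tables do. That separation is the point of the sketch: the same specification yields different manifests depending on where it is deployed.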
This approach to configuration management solves the issues we have seen emerging in static setups and opens up a whole new set of capabilities for both Ops and dev teams:
Standardization by design
Using dynamic configuration management, we not only differentiate the environment-agnostic from the environment-specific elements of configuration, we also share workload and infrastructure profiles across multiple workloads/apps or teams. This limits the variance between configurations significantly. Individual contributors focus on the abstract workload specification, of which there is only one per workload and which is the same across all environments. Platform teams can govern workload profiles, infrastructure profiles and resource matching. This way of working leads to standardization by design of all configuration components. Even security reviews become faster, as you only need to do them once to be able to get a new resource from a pre-vetted template.
Reduced maintenance overhead
Similarly, by introducing a standardized way of creating configuration you get rid of the randomness of manual “change by change” configurations. This significantly reduces the overhead of maintaining and documenting existing setups, something you will be grateful for as the application lifetime increases.
Reduced change failure rate by eliminating config drift
Which workload connects to which resource is now defined in one place per app: the resource matching. The abstract workload specification remains exactly the same across all environments. This makes it really hard to have your workload running in prod connect to a test DB (although arguably not impossible).
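A minimal sketch of what such a central resource matching could look like, assuming a simple rule list in which an exact environment-type match beats a catch-all fallback. The rule format, the target names and the `match_resource` function are hypothetical and only illustrate the principle.

```python
# Hypothetical matching rules, kept in one place and owned by the platform team.
# A rule with env_type None acts as a fallback for environments without an exact match.
MATCHING_RULES = [
    {"type": "postgres", "env_type": "production", "target": "prod-db-cluster"},
    {"type": "postgres", "env_type": "staging", "target": "staging-db-cluster"},
    {"type": "postgres", "env_type": None, "target": "shared-test-db"},
]

def match_resource(resource_type, env_type):
    """Return the target for a resource request; an exact match beats the fallback."""
    fallback = None
    for rule in MATCHING_RULES:
        if rule["type"] != resource_type:
            continue
        if rule["env_type"] == env_type:
            return rule["target"]
        if rule["env_type"] is None:
            fallback = rule["target"]
    if fallback is None:
        raise LookupError(f"no matching rule for {resource_type!r} in {env_type!r}")
    return fallback

prod_target = match_resource("postgres", "production")
dev_target = match_resource("postgres", "development")
```

Because the workload only ever asks for "a postgres", the decision about which database it gets lives solely in these rules. The sketch also shows why a wrong connection is hard but not impossible: if someone deleted the production rule, the fallback would silently apply, so a real setup would likely want stricter handling for production.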
Abstract without abstracting
Rather than having to deal with and dissect every single file that composes the application, developers can choose to stay “high-level” on the abstract workload specification. At the same time, they can dive into the level of the workload profile and infrastructure profile any time. This allows them to move fast without losing any context.
Reduced cognitive load for the developer
The approach of letting developers handle the full depth of configurations from image to resource has led to significantly slower delivery and shadow ops. The recent DevOps benchmarking report paints a good picture of that. Dynamic configuration management gives devs full flexibility with minimal load. Even the config break between local and the cloud can be removed by resolving the workload configuration against something like Docker Compose dynamically.
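As a rough sketch of that last point, the same environment-agnostic workload description could be resolved into a Docker Compose-style structure for local development. The workload format, the `${resource.output}` placeholder syntax, the `LOCAL_IMAGES` table and the `to_compose` function are assumptions made for this example; only the resulting `services`, `image`, `environment` and `depends_on` keys follow the Compose file format.

```python
import re

# Assumed local stand-in images per resource type (illustrative).
LOCAL_IMAGES = {"postgres": "postgres:16"}

def to_compose(workload):
    """Resolve an environment-agnostic workload into a Compose-style dict for local runs."""
    def local_value(match):
        res_id, output = match.group(1), match.group(2)
        # On Compose's default network a service is reachable by its service name,
        # so a resource's host resolves to the resource id. Other outputs stay as-is.
        return res_id if output == "host" else match.group(0)

    services = {
        workload["name"]: {
            "image": workload["image"],
            "environment": {
                key: re.sub(r"\$\{(\w+)\.(\w+)\}", local_value, value)
                for key, value in workload["env"].items()
            },
            "depends_on": sorted(workload["resources"]),
        }
    }
    # Every declared resource becomes a local container.
    for res_id, res_type in workload["resources"].items():
        services[res_id] = {"image": LOCAL_IMAGES[res_type]}
    return {"services": services}

workload = {
    "name": "api",
    "image": "registry.example.com/api:1.4.2",
    "env": {"DB_HOST": "${db.host}", "LOG_LEVEL": "debug"},
    "resources": {"db": "postgres"},
}
compose = to_compose(workload)
```

Serializing `compose` to YAML would give a file a developer can start locally, while the cloud environments keep resolving the very same workload description against managed infrastructure.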
More self-service for developers, without more responsibility
In a dynamic model, adding an S3 bucket to your architecture is literally as simple as describing the new resource and adding a parameterized environment variable. As long as S3 buckets have infrastructure profiles and are matched in your resource matching to your environment type, they can be created and immediately wired up. This eliminates the need for putting tickets into Jira that some poor operations team has to bash through while developers wait.
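The developer's side of that change can be sketched in a few lines. The spec format, the `${resource.output}` placeholder syntax, the `PROVISIONABLE` table and the naming scheme in `provision` are all hypothetical; a real orchestrator would create the bucket through the cloud provider rather than just compute a name.

```python
import re

# Resource types the platform team has published a profile for, per environment type (assumed).
PROVISIONABLE = {
    "development": {"postgres", "s3"},
    "production": {"postgres", "s3"},
}

def provision(app, res_id, res_type, env_type):
    """Stand-in for real provisioning: only returns the outputs a resource would expose."""
    if res_type not in PROVISIONABLE.get(env_type, set()):
        raise PermissionError(f"{res_type!r} is not self-service in {env_type!r}")
    return {"name": f"{app}-{res_id}-{env_type}"}

def resolve_env(spec, env_type):
    """Provision the declared resources and fill in parameterized environment variables."""
    outputs = {
        res_id: provision(spec["name"], res_id, res_type, env_type)
        for res_id, res_type in spec["resources"].items()
    }

    def lookup(match):
        return outputs[match.group(1)][match.group(2)]

    return {
        key: re.sub(r"\$\{(\w+)\.(\w+)\}", lookup, value)
        for key, value in spec["env"].items()
    }

spec = {"name": "api", "env": {"LOG_LEVEL": "info"}, "resources": {}}

# The developer's whole change: one resource entry and one parameterized variable.
spec["resources"]["uploads"] = "s3"
spec["env"]["UPLOADS_BUCKET"] = "${uploads.name}"

resolved = resolve_env(spec, "development")
```

No ticket is involved: the request succeeds because the platform team has already made `s3` available for that environment type, and it would fail with a clear error if they had not.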
New way of working and new features
There is a wealth of functionality that dynamic configuration management enables that was simply not possible before, like taking the state of any environment and launching it as a new environment with the exact same resource components, or getting an end-to-end audit log of everything that was ever deployed, by whom and where, for easy debugging.
Platform engineering and Internal Developer Platforms are revolutionizing the way engineering organizations of all sizes are designing their delivery setups and developer workflows. As Gartner has shown, however, platform engineering is still a nascent discipline, and many of its principles and best practices still need to be defined.
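Both capabilities fall out naturally once every deployment goes through one central component. The following sketch assumes an in-memory store and hypothetical `deploy` and `clone_environment` functions; it tracks only image versions, whereas a real platform would also record resources and configuration.

```python
from datetime import datetime, timezone

# In-memory stand-ins for the platform's state (illustrative only).
environments = {}  # environment name -> {workload name: image}
audit_log = []     # one entry per deployment, in order

def deploy(env, workload, image, user):
    """Record the new state and who caused it."""
    environments.setdefault(env, {})[workload] = image
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "env": env,
        "workload": workload,
        "image": image,
        "by": user,
    })

def clone_environment(source, target, user):
    """Launch a new environment from the current state of an existing one."""
    for workload, image in environments[source].items():
        deploy(target, workload, image, user)

deploy("staging", "api", "registry.example.com/api:1.4.2", "alice")
deploy("staging", "worker", "registry.example.com/worker:0.9.0", "alice")
clone_environment("staging", "review-42", "bob")

deployed_by_bob = [entry for entry in audit_log if entry["by"] == "bob"]
```

Because cloning reuses `deploy`, the copied environment shows up in the audit log like any other deployment, which is what makes the log end to end.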
Both the platform tooling and Internal Developer Platforms represent this well, with new tools popping up every year and IDP design quickly adjusting to the evolving needs of teams. Dynamic IDPs and Platform Orchestrators are the latest frontier of platform engineering, enabling developer self-service and dynamic configuration management (and true DevOps). I look forward to seeing where this space will go next.