Hey there,
Let’s do a slightly more technical newsletter this time. I’ve been working closely with the fine folks at Red Hat, and we’ve created a new reference architecture integrated with all their great products. You can read the full whitepaper here. What’s cool about this is that it follows a number of paradigms:
- Golden paths should be E2E automated and should require zero tickets and zero manual work.
- The platform should manage resources across its entire life cycle and promote a high degree of standardization.
- The frontend is used for presentation, the backend for logic handling.
- GitOps first. 100% code in and 100% code out - all following industry standards.
- Sensible abstraction with Score for developers
- High degree of standardization for infra and operations teams with resource definitions
- Secure by design - no push traffic into the network from the delivery plane
This is what it looks like:
I’ll explain how the components interplay below but if you already want to dive deeper, here’s how:
- Look at the code of this reference architecture on GitHub and try it yourself with this tutorial.
- Read the whitepaper on the interplay of all components.
But while we’re at it, let’s actually follow the flow of a user request through this architecture to understand how all of this fits together. I’ll pick a typical application developer request: “I need a Redis for my existing workload”. I love this example because it sounds trivial but requires an awful lot of complex logic handling in the backend (you need to generate new workload configs, create a correctly configured Redis per environment, pull the credentials, inject the secrets, run policy checks and sign-offs, put everything together, and ship). So, let’s see how the user would go about this and what would happen in the architecture.
The user would likely want to stay “in code” and “in the editor” because they are there already, and developers often don’t appreciate having to switch to a separate interface. So they would open the Score file (an abstract description of the relationship between the service and its dependent resources) and add exactly two lines of code:
cache:
  type: redis
Here’s how the Score file looks afterwards:
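For reference, here’s a minimal sketch of what such a Score file could look like after the change. The workload name, variables, and output names are illustrative assumptions on my part; the actual file lives in the reference repo:

```yaml
apiVersion: score.dev/v1b1
metadata:
  name: my-workload              # illustrative name
containers:
  my-container:
    image: .                     # placeholder; typically set/overridden at deploy time
    variables:
      # output names depend on how the resource definition exposes them
      REDIS_HOST: ${resources.cache.host}
      REDIS_PORT: ${resources.cache.port}
resources:
  cache:                         # <- the two new lines live here
    type: redis
```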
All that’s left to do is commit this change, and here’s what happens next:
- GitHub Actions will run and forward the changes to the Platform Orchestrator (a minimal sketch of such a workflow follows after this list).
- The Orchestrator will read the Score file, build a diff of the changes, and analyze the metadata to understand what the target of the deployment is (let’s say we’re deploying to an environment of type staging).
- The information resource type = redis and context = staging is sufficient to identify the correct resource definition. The resource definition is set by the platform or infra and ops team and defines how a Redis in staging should be configured. Below this list you’ll find a sketch of how such a resource definition could look if we used a Terraform module to create and update the state. It’s probably worth mentioning here that you don’t need to use Terraform; it can also be Crossplane, Pulumi, a direct API call, etc.
- Now that the Orchestrator knows how to create/update all resources, it will dissect how those resources fit together and whether they depend on each other. Maybe it needs a role or service account first, etc. It will create an acyclic resource graph (which is why Platform Orchestrators are called graph-based backends) and then update/create all resources.
- Next, it will regenerate the workload configurations.
- If configured, the system can run a policy check to let a third party confirm that no policies are violated and, in the case of a prod deployment, maybe even require a human sign-off.
- If all of this is successful, it will store the workload configs, infra configs, and the acyclic resource graph in a Target State repository. This contains, as the name suggests, the target state the resource plane will be in once the deployment is executed.
- As the repo is updated, the change is detected by ArgoCD, which listens to changes on this repo. It will pull the changes into an OpenShift cluster and continue the execution inside the network.
- ArgoCD will hand over to the Humanitec Operator in-cluster. The Operator will read the resource graph and start updating/creating the resources in the right sequence. It will then collect the credentials, inject them as secrets at runtime into the container, and deploy everything (a generic sketch of what this looks like in the cluster is below as well).
- Finally, the Orchestrator might push a message to Microsoft Teams with the success or error notification, move the ticket in Jira, and update the portal.
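To make the first step concrete, here is a minimal CI sketch. The trigger, names, and the exact CLI invocation and flags are assumptions on my part, not the repo’s actual workflow; use the tutorial as the source of truth:

```yaml
# Minimal sketch (assumed): hand the updated Score file to the
# Platform Orchestrator on every push to main.
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical step -- submit the Score file via the Humanitec CLI
      # (assumes humctl is installed on the runner); app/env names and
      # flags are illustrative only.
      - name: Deploy Score file
        run: |
          humctl score deploy \
            --file score.yaml \
            --app my-app \
            --env staging
        env:
          HUMANITEC_TOKEN: ${{ secrets.HUMANITEC_TOKEN }}
          HUMANITEC_ORG: ${{ secrets.HUMANITEC_ORG }}
```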
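And for the resource definition mentioned in the third step: a sketch of what it could look like when backed by a Terraform module. Field names approximate the Orchestrator’s definition schema and the module URL is made up; the whitepaper and repo have the exact version:

```yaml
# Illustrative resource definition: "whenever a workload asks for type=redis
# in an environment of type staging, provision it with this Terraform module."
apiVersion: entity.humanitec.io/v1b1
kind: Definition
metadata:
  id: redis-staging
entity:
  name: redis-staging
  type: redis                        # matches the `type: redis` in the Score file
  driver_type: humanitec/terraform   # delegate provisioning to a TF module
  driver_inputs:
    values:
      source:
        url: https://github.com/example-org/terraform-redis.git  # hypothetical module repo
        path: modules/redis
      variables:
        environment: staging
  criteria:
    - env_type: staging              # only applies to environments of type staging
```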
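Finally, on the secret injection in the second-to-last step: conceptually, what ends up in the cluster is a workload that references its Redis credentials from a Secret at runtime rather than carrying them in plain config. A generic Kubernetes sketch (not literally what the Operator generates; all names are made up):

```yaml
# Generic illustration: credentials come from a Secret at runtime and are
# never committed to a repo or baked into the image.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-workload
  template:
    metadata:
      labels:
        app: my-workload
    spec:
      containers:
        - name: my-container
          image: registry.example.com/my-workload:latest   # hypothetical image
          env:
            - name: REDIS_HOST
              valueFrom:
                secretKeyRef:
                  name: cache-credentials                  # Secret created by the delivery machinery
                  key: host
            - name: REDIS_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: cache-credentials
                  key: password
```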
And here’s the connection to the frontend, which is the display layer for all of this. In this example, we are using Red Hat’s new Developer Hub. It consumes the API of the Platform Orchestrator as the central source of truth and is thus always kept up to date. The relationship of all components in all environments is neatly documented and cataloged.

This was a ton - I hope you could follow. Again, give it a spin, download the whitepaper, and dive in. And if you want us to analyze how this fits into your setup, book a call with one of our Platform Architects. We’re also happy to put you in touch with Red Hat Architects trained on this.
Cheers,
Kaspar