Through the nature of my job I see a lot of DevOps setups, in fact hundreds a year. Very often when I talk to folks I find they feel uneasy about what feedback to listen to, what to ignore, what to change and what to keep the same. In this article I want to briefly show that this (natural) feeling stems from too many voices shouting at you with too little context. I believe a comparatively small number of data points can provide this context and, in turn, define a state of maturity.
Maturity is a matter of context
Your DevOps setup is mature. Alright. Measured against what benchmark? The concept of maturity implies that you can measure it on a single scale, but that is not true. Just because you are running on Kubernetes doesn’t mean you are mature. In fact, if you are running on Kubernetes and you have only one static service (bending this example slightly), you have an extremely immature DevOps setup, because Netlify would have done the job perfectly. Then again, if you are 250 developers working on one application and it’s a monolith, this setup is not mature either. DevOps maturity depends on the specifics of your setup.
Two of the most dangerous things I observe are: a) people want to feel “more mature”, which leads them to adopt a technology that is clearly overkill just to see progress. Leaving Heroku to migrate to Kubernetes too early is a perfect example. And b) people ask for advice from others who either lack the right experience or provide the wrong input. Both have fatal consequences.
What key attributes you need to evaluate DevOps maturity
We don’t need to reinvent the wheel here. There are abundant resources and books on these topics, such as Accelerate. Inherent to processes is the value of key performance metrics. These include lead time, deployment frequency, MTTR (mean time to recovery/restore), and change fail percentage. The metrics each team decides to track depend on the teams themselves, their specific pain points, and the organization's broader goals. Both processes and goals are flexible, malleable, and continuously changing as an organization's progress and priorities shift over time. Shared goals, strategies, and responsibilities are critical, and even more critical is shared ownership of DevOps itself.
Starting from these core metrics, I have tried to develop a model that triangulates this at least to a certain extent and provides the key points of information I like to look at when doing DevOps assessments. These key points of information should underpin the workings of anyone who provides you with any kind of effective DevOps assessment. I found the score to be surprisingly accurate:
1. Number of developers
That sounds trivial, but it’s not. It’s the single biggest input factor needed to determine your DevOps maturity. One tech might be great for a team of 30 but horrible for a team of 250. More people change the organizational structure and increase complexity and specialization.
2. Architecture type
Microservices? Monolith? Loosely coupled? Looking at this attribute without considering the size of your development team is useless. While I would personally avoid microservices wherever possible, they are almost a must if you are working in large distributed teams on a single application. They also have a massive impact on your tooling. As Jason Warner, GitHub's CTO, told me recently: "When you split into microservices that’s like giving birth to your first child. Nothing will ever be the same."
3. Containerization
The degree doesn’t matter so much. What matters is whether you’re containerized at all. Independent of team size (unless you are a three-person squad), containerization is a must if you want to get to a resource-efficient delivery model. Containers yes/no is a good indication of whether a team has its shit together.
4. The way you store and version your App configs
Again, dependent on team size. This grows in importance if you are getting yourself into orchestration engines such as Kubernetes. While a smaller team can perfectly well apply changes one by one, a clear change-log of config changes becomes necessary as you mature. It is a key risk mitigation: it gives you a backup if a key team member leaves, and it doubles as part of your internal docs. It also helps ensure your disaster recovery is working.
5. How you store and version your infrastructure configs
This relates to App configs, but is even more important in the context of disaster recovery.
6. What orchestrator you use
First of all, the fact that you are using one at all is information. If you don’t require the sophistication and flexibility of in-cluster jobs and complex routing, are you sure you need Kubernetes? Wouldn’t Heroku do the job? So this alone, in combination with the previous answers, can give you an indication of maturity.
7. On what infrastructure composition you run
A 250-person team in a non-regulated industry, and you are multi-cloud with parts still on-prem? There are reasons that might make this necessary (such as doing business in China or very specific GDPR-related reasons), but usually one should avoid this setup wherever possible.
8. The degree of developer self-service
Developer self-service is underappreciated as an indicator, albeit one that depends on team size. A small team with a “you build it, you run it” approach? Good. Huge team and every second application developer is drowning in delivery work? Not good at all. It means the platform team isn’t doing a great job of standardizing workflows and dealing with edge cases.
9. The division of tasks between ops and devs
Ops shouldn’t be ticket ops. They should set baseline charts, design workflows, and set clear boundaries. Application Developers should ONLY code. A rule like “Senior Devs are the only ones that can actually deploy to production and they have to write all the configs themselves” isn’t doing anyone a favour here.
10. Deployment frequency
This is generally applicable to all setups. The higher the better.
11. Lead time
Same here. The shorter the better.
12. Mean time to respond to outages in production
Same.
13. Percentage of failed deploys to production that result in roll-back
This offers a strong indication for the usual reasons around test quality and discipline in the engineering organization.
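As a rough illustration of points 10 to 13, here is a minimal Python sketch that derives the four delivery metrics from a list of deployment records. The `Deploy` record, its field names, and the `delivery_metrics` function are hypothetical and only serve this example. It also assumes that time to restore is measured from the moment of the faulty deploy, which is a simplification: real incident data would usually come from your monitoring or incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deploy:
    """One production deployment (hypothetical record layout)."""
    committed_at: datetime                    # when the change was committed
    deployed_at: datetime                     # when it reached production
    rolled_back: bool = False                 # did the deploy end in a roll-back?
    restored_at: Optional[datetime] = None    # set only if the deploy caused an outage


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def delivery_metrics(deploys: list, window_days: int) -> dict:
    """Compute the four metrics over an observation window of `window_days`."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deploy and a positive window")

    lead_times = [hours(d.deployed_at - d.committed_at) for d in deploys]
    # Assumption: restore time is counted from the faulty deploy itself.
    restore_times = [
        hours(d.restored_at - d.deployed_at)
        for d in deploys
        if d.restored_at is not None
    ]
    rollbacks = sum(1 for d in deploys if d.rolled_back)

    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time_hours": median(lead_times),
        "mean_time_to_restore_hours": mean(restore_times) if restore_times else None,
        "rollback_percentage": 100 * rollbacks / len(deploys),
    }
```

The median is used for lead time so that a single long-running change does not dominate the number; whether median or mean fits better depends on how your team wants to read the metric.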
How mature is your DevOps setup really?
We’ve put all of these questions into one assessment tool and I’d encourage you to try it out. We’ll calculate your score and send it to you. Yes, this will be a linear score, but it will take the dependencies mentioned above into consideration.
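To make the idea of a linear score that still respects dependencies more concrete, here is a small Python sketch. It is not the actual assessment tool: the team-size buckets, the `POINTS` table, and every number in it are made up for illustration. The only thing it tries to show is that the same answer can earn different points depending on how many developers you have.

```python
def team_bucket(developers: int) -> str:
    """Hypothetical team-size buckets; the thresholds are assumptions."""
    if developers <= 10:
        return "small"
    if developers <= 100:
        return "medium"
    return "large"


MAX_POINTS = 3

# Hypothetical points per (attribute, answer), looked up by team-size bucket.
POINTS = {
    ("architecture", "monolith"):      {"small": 3, "medium": 2, "large": 0},
    ("architecture", "microservices"): {"small": 0, "medium": 2, "large": 3},
    ("orchestrator", "paas"):          {"small": 3, "medium": 2, "large": 0},
    ("orchestrator", "kubernetes"):    {"small": 0, "medium": 2, "large": 3},
    ("containerized", "yes"):          {"small": 3, "medium": 3, "large": 3},
    ("containerized", "no"):           {"small": 1, "medium": 0, "large": 0},
}


def maturity_score(developers: int, answers: dict) -> float:
    """Return a 0-100 score: a plain sum, but with context-dependent points."""
    if not answers:
        raise ValueError("no answers given")
    bucket = team_bucket(developers)
    total = sum(POINTS[(attr, answer)][bucket] for attr, answer in answers.items())
    return 100 * total / (MAX_POINTS * len(answers))
```

With these made-up numbers, a five-person team on Kubernetes with microservices scores far lower than a 250-person team giving the same answers, which mirrors the Heroku versus Kubernetes argument made earlier.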
If I find your score interesting, I might reach out afterwards to give you some ideas of what you might want to change. I also might not do that. And no, I will not give you more crappy smart-ass advice.
Other than that, I’d be interested in your take on this. Are there any key points missing from the list above that would make the assessor understand your case significantly better?
E-mail me your points to kaspar@humanitec.com and I’ll send back a six-pack of good old German beer in return.
Wherever you live in the world. Not kidding.