DevOps Automation: Find the Right Balance
The goal of DevOps is to increase the efficiency and collaboration of teams.
According to Puppet’s 2021 State of DevOps Report, “Automating repetitive tasks may not be sufficient for DevOps, but it is absolutely necessary.”
The report shows that “highly evolved firms are far more likely to have implemented extensive and pervasive automation, but being good at automation does not make you good at DevOps.”
While 90% of top-performing teams report that they have automated most repetitive tasks, 62% of organizations stuck in mid-evolution also report high levels of automation. This raises the question of what purpose your automation should serve, because automation is not an end in itself.
Automation helps teams gain visibility, streamline processes, and get near-continuous feedback. But how much automation is too much, where do you start, and where do you stop? Automation isn’t really automation if your teams are still waiting for other teams to complete tasks before they can finish their own work.
An ideal aim for any team switching to DevOps is to create a working environment where developers can “self-serve” all they need without being dependent on an Ops team and without repetitive tasks.
Developer self-service means that provisioning infrastructure (for example, cloud resources such as databases), creating new feature environments, or configuring apps doesn’t depend on “ticket-ops” or require fiddling with brittle custom build scripts. It implies that repetitive tasks are automated and that a platform, Ops, or DevOps team creates a great internal developer experience, so developers can focus on more complex and creative problems.
Planning
As with any major change or overhaul, stop, take a step back, and take your time. There’s no need to do everything at once. Start small and automate one service or system at a time to avoid overwhelming those involved. Good planning and drawing on experience matter more than throwing more people at a problem.
As always, xkcd has not one but two comics that capture the problem of determining the right level of automation perfectly.
The first is “Is It Worth the Time?”, a helpful chart that compares how much time you save by making a routine task more efficient with how much time you can invest in doing so before you lose time overall (across five years).
Developers are often tempted to think that codifying something is always an improvement, but that isn’t necessarily true. Sometimes doing a task manually remains the best use of time.
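The chart’s arithmetic is simple enough to sketch. The Python function below (its name is hypothetical) computes the most time you could spend automating a task before it costs more than it saves, assuming the comic’s five-year horizon and, like the comic, ignoring the ongoing cost of maintaining the automation.

```python
def max_worthwhile_hours(seconds_saved_per_run: float,
                         runs_per_day: float,
                         horizon_days: int = 5 * 365) -> float:
    """Hours you could spend automating a task before it costs more time
    than it saves over the horizon (five years, as in the comic)."""
    total_seconds_saved = seconds_saved_per_run * runs_per_day * horizon_days
    return total_seconds_saved / 3600


# Shaving 5 seconds off a task you do 5 times a day
print(f"{max_worthwhile_hours(5, 5):.1f} hours")  # prints "12.7 hours"
```

If automating that task is likely to take more than about half a working day and a half, the manual approach wins over five years.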
The second, a fairly well-known comic, digs deeper into why this might be the case by highlighting how much time an automation task can really end up consuming.
https://imgs.xkcd.com/comics/automation.png
Many developers have found themselves stuck in an hours-long debugging session, wondering how they got there. When the task being automated was only meant to save a few seconds a day, it becomes clear that automation has probably been taken too far.
We often make assumptions about how people work, so before setting out on the path to DevOps, take stock of your applications, data stores, development and testing processes, configuration management, environment provisioning, and more. This helps you understand in more detail what your teams do and how they do it, and unearths any surprises and unknowns.
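Running the same arithmetic in reverse gives the payback period: how long an automation must run before it repays the time spent building and debugging it. This is a rough sketch with made-up numbers, assuming the savings are steady and the automation needs no further maintenance.

```python
def payback_days(hours_invested: float, seconds_saved_per_day: float) -> float:
    """Days of use before an automation repays the time spent building it."""
    if seconds_saved_per_day <= 0:
        return float("inf")  # it never pays back
    return hours_invested * 3600 / seconds_saved_per_day


# Hypothetical: 6 hours of building and debugging to save 10 seconds a day
days = payback_days(6, 10)
print(f"{days:.0f} days (about {days / 365:.1f} years)")  # prints "2160 days (about 5.9 years)"
```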
It’s a great idea to read about the experiences and steps taken by large companies and get inspiration from them, but don’t feel you have to replicate them completely or follow in their footsteps.
Don’t automate existing inefficiencies
Automating inefficiencies doesn’t solve them, and you may in fact exacerbate them. The aim of automation is to make purposeful improvements to relevant processes. If your existing processes contain mundane tasks, question whether there’s any reason to spend automation effort on them, or whether you should replace them with a more appropriate task instead.
Solve your biggest blockers first
It’s easy to become overwhelmed by choice when making a major switch, especially in a busy and thriving ecosystem such as DevOps. A good way to start prioritizing is to look back at what you learned during planning, identify the biggest time wasters for your team, and find ways to automate those.
Do they spend too much time on manual tests? Automate as much of your testing as possible.
Do they spend too much time creating and provisioning production environments? Find a provisioning tool to help.
Are too many processes dependent on custom scripts or hardcoded variables? Invest time in switching to provisioning tools, infrastructure as code, and dynamic variable injection.
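Dynamic variable injection can be as simple as reading settings from the environment at startup instead of hardcoding them. The Python sketch below assumes hypothetical variable names (`DATABASE_URL`, `LOG_LEVEL`, `REPLICAS`); your platform or CI tool decides how those values are actually injected.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AppConfig:
    database_url: str
    log_level: str
    replicas: int


def load_config(env: Optional[Mapping] = None) -> AppConfig:
    """Build the app configuration from environment variables instead of hardcoded values."""
    env = os.environ if env is None else env
    if "DATABASE_URL" not in env:
        # Fail fast with a clear message rather than falling back to a hardcoded host
        raise RuntimeError("DATABASE_URL must be set")
    return AppConfig(
        database_url=env["DATABASE_URL"],
        log_level=env.get("LOG_LEVEL", "INFO"),  # optional, with a safe default
        replicas=int(env.get("REPLICAS", "1")),
    )


# The same code runs unchanged in every environment; only the injected values differ
cfg = load_config({"DATABASE_URL": "postgres://db.example.internal:5432/app", "REPLICAS": "3"})
print(cfg.replicas, cfg.log_level)  # prints "3 INFO"
```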
Build around standards
Many components in the DevOps ecosystem have well-established standards that are widely used, respected, and production-hardened. There is rarely a need to start from scratch, and looking at what you can replace with standards is a great place to start. This lets you build on best practices, tooling, shared knowledge, and an entire ecosystem.
Building on standards also makes it easier to hire experienced staff who can get stuck into solving complex problems instead of spending their onboarding learning custom tooling.
Equally, building around standards gives you flexibility in the way you build. You can switch providers or between tooling that uses the same standards (for example, container runtimes) when one offers you better value or features.
Create fundamental building blocks
Once you have identified your biggest issues and blockers, it can still be hard to know where to start automating. The essential building blocks to consider are the following.
Version control
Not every codebase, especially one with a long legacy, is managed with version control. Managing code with a system such as SVN, Git, or CVS is not complex in itself, but getting a team used to it involves a learning curve.
However, version control brings the ability to track changes, roll them back, and trigger other automation processes based on branches or different versions of code. If a change introduces a bug, reverting it in version control can trigger all the processes needed to test, provision, and roll back the change across all environments.
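Triggering automation from version control usually comes down to rules that map branches to pipeline steps. The sketch below assumes a hypothetical branch-naming convention (`main`, `release/*`, `feature/*`); real CI/CD tools express such rules in their own configuration formats rather than in Python.

```python
from typing import List


def actions_for_push(branch: str) -> List[str]:
    """Map a pushed branch to the automated steps it triggers (hypothetical policy)."""
    if branch == "main":
        # A revert merged to main goes through exactly the same steps as any other change
        return ["test", "build", "deploy:staging"]
    if branch.startswith("release/"):
        return ["test", "build", "deploy:production"]
    if branch.startswith("feature/"):
        name = branch[len("feature/"):]
        return ["test", "build", f"deploy:preview-{name}"]
    return ["test"]  # any other branch only gets tested


print(actions_for_push("feature/login"))  # ['test', 'build', 'deploy:preview-login']
```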
CI/CD
Short for continuous integration and continuous delivery or continuous deployment, CI/CD covers a collection of tools and services that help you automate essential tasks, including testing, building, and deploying code to environments. They are the glue that binds together many of the other DevOps components.
There are many CI/CD options to evaluate, but CI/CD is essential to understand and experiment with before you automate further.
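At its core, a CI/CD pipeline runs a sequence of stages and stops as soon as one fails, so a broken change never reaches deployment. This minimal Python sketch models that behavior with placeholder stages; it is not how any particular CI/CD product is implemented, and real pipelines add parallelism, caching, artifacts, and retries.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a function that returns True on success
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure; return the stages that passed."""
    passed: List[str] = []
    for name, step in stages:
        ok = step()
        print(f"{name}: {'passed' if ok else 'FAILED'}")
        if not ok:
            break  # later stages (e.g. deploy) never run on a broken build
        passed.append(name)
    return passed


# Placeholder stages standing in for real test, build, and deploy jobs
stages = [
    ("test", lambda: True),
    ("build", lambda: False),
    ("deploy", lambda: True),
]
print(run_pipeline(stages))  # the deploy stage is skipped because build failed
```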
Build upon building blocks
With the fundamentals in place, the next step is to start automating the tasks traditionally done manually.
Testing
Manual software testing is probably one of the biggest time sinks in the development process. It’s possibly also one of the larger areas to automate, especially if you want to automate it well and reliably. But once you have confidence in your tests, you clear a major hurdle to DevOps automation, because you can trust code changes as you push them through other automated processes.
Tests can include the following (and there are many more kinds):
- Unit tests
- Integration tests
- Acceptance tests
- Load tests
- Chaos engineering tests
There is a wide variety of well-established tools and frameworks to help you create the tests you need.
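As an example of the simplest kind, a unit test checks one small piece of code in isolation. The sketch below uses Python’s built-in `unittest` module against a hypothetical `apply_discount` function; a CI pipeline would run tests like these automatically on every change.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: apply a percentage discount to a price."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    def test_applies_percentage(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(19.99, 0), 19.99)

    def test_rejects_out_of_range_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)


if __name__ == "__main__":
    # Run the suite without calling sys.exit, so this also works inside other scripts
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

In a pipeline, the same tests would more typically be discovered and run with `python -m unittest`.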
Automate infrastructure
One aspect of automation that is often forgotten or left until later is automating the creation of infrastructure and environments themselves. With “infrastructure as code” (IaC), plenty of tooling lets you define infrastructure declaratively, which means you can manage it in version control and trigger automated processes in the same way as code. The tooling is fairly well established, but teams are often reluctant to adopt it fully and instead stick to trusted, long-lived environments. This works for a while, but it can slow you down when developers have to wait for deployment or testing access to shared environments, and worse when an environment breaks or needs updating.
Automating environment creation and making it dynamic removes these blockers. It also makes it easier to scale and iterate on infrastructure, because environments become just another resource you can create and remove as needed.
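Declarative IaC tools work by comparing the state you declare with the state that actually exists and computing the changes needed to close the gap, which is roughly what a step like `terraform plan` shows you. The Python sketch below illustrates only that comparison, with hypothetical resource names; real tools also track dependencies, stored state, and provider-specific details.

```python
from typing import Dict, List, Tuple

State = Dict[str, dict]  # resource name -> its configuration


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the actions needed to move the actual state to the desired state."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))  # no longer declared, so remove it
    return actions


desired = {"orders-db": {"engine": "postgres", "size": "small"},
           "preview-env": {"replicas": 1}}
actual = {"orders-db": {"engine": "postgres", "size": "micro"},
          "old-preview-env": {"replicas": 1}}
print(plan(desired, actual))
# [('update', 'orders-db'), ('create', 'preview-env'), ('delete', 'old-preview-env')]
```

Because the desired state lives in version control, running the same comparison again after it has been applied yields no actions.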
While you automate environments, don’t leave behind the services they depend on, such as persistent data stores and DNS servers; there are tools and services to help you automate their provisioning too.
You might want to consider something like an internal developer platform (IDP) to tie together IaC tools, these ancillary services, and any other processes. You can still put a central Ops or platform team in charge of defining guardrails and policies with a clear role-based access control (RBAC) model for services or applications, but their job shifts from responding to requests to maintaining the platform.
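An RBAC model boils down to mapping roles to the actions they may perform and denying everything else. The roles and permission strings below are hypothetical, chosen only to illustrate how a platform team might let developers deploy to non-production environments without granting production access.

```python
from typing import Dict, Set

# Hypothetical roles and permissions defined by the platform team
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "developer": {"deploy:dev", "deploy:staging", "provision:database"},
    "platform-admin": {"deploy:dev", "deploy:staging", "deploy:production",
                       "provision:database", "edit:policies"},
}


def is_allowed(role: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions are rejected."""
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("developer", "provision:database"))  # True
print(is_allowed("developer", "deploy:production"))   # False
```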
Developers work autonomously with the IDP through a UI, CLI, or API, according to their role and permissions. At deployment time, the IDP pulls in the code and generates new manifests that are executed against clusters and infrastructure. Developers specify what they require (database, Ingress, DNS, file storage, etc.), and the IDP makes sure the requested infrastructure is provisioned by open source drivers and serves it back to them.
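To make the IDP flow more concrete, here is a heavily simplified sketch of the translation step: a developer’s request is turned into a Kubernetes `Deployment` manifest plus a list of provisioning jobs, each routed to a driver. The request format, the resource type names, and the driver names are all hypothetical and do not reflect any specific IDP’s API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WorkloadRequest:
    """What a developer asks for: an app, its image, a target environment, and resources."""
    app: str
    image: str
    env: str
    resources: Dict[str, str] = field(default_factory=dict)  # resource name -> resource type


# Hypothetical registry mapping resource types to the drivers that provision them
DRIVERS = {
    "postgres": "postgres-driver",
    "dns": "dns-driver",
    "ingress": "ingress-driver",
    "file-storage": "storage-driver",
}


def render(request: WorkloadRequest) -> List[dict]:
    """Turn a request into a Deployment manifest plus one provisioning job per resource."""
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": request.app, "namespace": request.env},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": request.app}},
            "template": {
                "metadata": {"labels": {"app": request.app}},
                "spec": {"containers": [{"name": request.app, "image": request.image}]},
            },
        },
    }
    jobs = []
    for name, kind in request.resources.items():
        if kind not in DRIVERS:
            raise ValueError(f"no driver registered for resource type {kind!r}")
        jobs.append({"resource": name, "type": kind, "driver": DRIVERS[kind], "env": request.env})
    return [deployment] + jobs


request = WorkloadRequest(
    app="checkout",
    image="registry.example.com/checkout:1.4.2",
    env="dev",
    resources={"checkout-db": "postgres", "checkout-dns": "dns"},
)
for item in render(request):
    print(item.get("kind") or item["driver"])
# Deployment
# postgres-driver
# dns-driver
```

The developer only describes what the app needs; which driver fulfills each request is decided centrally by the platform team.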
Logging and monitoring
Understanding how well your application performs is essential, and good log management and monitoring help you do this. Modern applications can generate a lot of data, and processing and understanding it all manually is slow and error-prone.
There are a lot of tools available to help you store, aggregate, analyze, and receive alerts on how your applications are performing.
Automating logging and monitoring helps you close the DevOps loop, because you can see the effects of code changes and trace them back to specific deployments. Instead of spending time hunting for anomalies, you can focus on finding the source of a problem and solving it.
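Tracing an anomaly back to a deployment can be as simple as finding the most recent deployment before the anomaly started. This Python sketch assumes error-rate samples and deployment timestamps are already collected (the values below are illustrative) and uses a fixed threshold; real monitoring tools offer far more sophisticated alerting.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple


def find_anomalies(samples: List[Tuple[datetime, float]], threshold: float) -> List[datetime]:
    """Return the timestamps whose error rate exceeds a fixed threshold."""
    return [ts for ts, error_rate in samples if error_rate > threshold]


def deployment_before(ts: datetime, deployments: List[Tuple[datetime, str]]) -> Optional[str]:
    """Return the version deployed most recently at or before ts (deployments sorted by time)."""
    times = [deployed_at for deployed_at, _ in deployments]
    i = bisect_right(times, ts)
    return deployments[i - 1][1] if i else None


# Illustrative data: (time, version) and (time, error rate in %)
deployments = [(datetime(2021, 10, 4, 9, 0), "v1.8.0"),
               (datetime(2021, 10, 4, 14, 30), "v1.9.0")]
samples = [(datetime(2021, 10, 4, 12, 0), 0.4),
           (datetime(2021, 10, 4, 15, 0), 4.2),
           (datetime(2021, 10, 4, 15, 5), 5.1)]

for ts in find_anomalies(samples, threshold=2.0):
    print(ts.strftime("%H:%M"), "->", deployment_before(ts, deployments))
# 15:00 -> v1.9.0
# 15:05 -> v1.9.0
```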
Measure and iterate
Automating repetitive tasks is almost always faster than doing them by hand (in the long run, once setup and configuration are done), but there’s always room for improvement. As you make changes, measure and evaluate them, and periodically check whether an automated process can be sped up.
Is one CI service or container image faster at building your codebase than another? After testing and evaluating it, switch.
Does a different testing framework or a restructured test suite run your tests faster and more reliably? Then consider a change.
Every second you save multiplies across time and people, and is time freed up for other activities.
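Measuring before switching can be as simple as timing each option a few times and comparing medians, and a quick calculation shows how small savings add up across a team. The sketch below uses sleeps as stand-ins for two build setups, assumes 220 working days a year, and uses hypothetical numbers; in practice the timed function might call `subprocess.run` on your real build command.

```python
import statistics
import time
from typing import Callable


def median_runtime(fn: Callable[[], object], repeats: int = 5) -> float:
    """Time fn several times and return the median duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)  # the median is less sensitive to outliers than the mean


def team_hours_saved_per_year(seconds_saved_per_run: float,
                              runs_per_person_per_day: float,
                              people: int,
                              working_days: int = 220) -> float:
    """How a per-run saving multiplies across runs, people, and working days."""
    return seconds_saved_per_run * runs_per_person_per_day * people * working_days / 3600


# Stand-ins for two build setups being compared
old = median_runtime(lambda: time.sleep(0.02))
new = median_runtime(lambda: time.sleep(0.01))
print(f"old: {old:.3f}s, new: {new:.3f}s")

# Hypothetical: 30 s saved per build, 10 builds per developer per day, 20 developers
print(f"{team_hours_saved_per_year(30, 10, 20):.0f} hours saved per year")  # prints "367 hours saved per year"
```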
Set the correct level of automation
If you’re just setting out on the journey of automating DevOps practices, the risk of over-automating may seem a long way off. But it’s all too easy to start automating something by playing with one tool, then think of a clever way to connect it to another automated process, and so on. This can rapidly lead to a complex chain of tooling taking over, with team members losing sight and control of what is going on. How does someone start or stop a process? How brittle are the connections between processes if a previous step produces an unexpected outcome?
Your outcomes and desired workflows should come first. Find the correct tools to support those, not the other way around.
That said, if your planning process identifies practices that need fixing and you’re not sure where to start improving them, an opinionated tool can be a good source of inspiration. Just be sure not to let it dictate everything you do.