Every organization aims for success - great products and services, committed and happy team members, effective processes and practices, great feedback from repeat and new customers and markets, and plenty of profit.
DevOps is about extending these aims to focus on the practices and processes between the development, operations, and business roles of an organization. An organization full of siloed teams may be struggling with deployment delays, missed deadlines, unrealistic expectations, and the challenges of diagnosing and solving problems with the software. Fortunately, a DevOps approach provides a means to understand and integrate the most effective practices in software development.
DevOps distinguishes between low, medium and high performers
Accelerate, arguably the bible of DevOps, reports on the largest DevOps research project to date, investigating how the most innovative organizations are leading the way in using DevOps principles and practices. The authors measure software delivery performance, and what drives it, using rigorous statistical methods. The book offers new insights into what enables both software delivery performance and organizational performance, as represented by profitability, productivity, and market share. The quantitative analysis provides actionable insights for any company.
The researchers determined that just four key metrics differentiate low, medium, and high performers: lead time, deployment frequency, mean time to restore (MTTR), and change fail percentage. While there are plenty of opportunities to expand these into further metrics and actionable aims, they provide a neat and elegant means to measure software delivery performance (and, more broadly, organizational success) in real time, with plenty of scope for progress.
Pivotal to each metric is continuous improvement. More frequent deployment cuts bloated, time-wasting practices and increases efficiency; a faster mean time to restore increases customer satisfaction; and automation and monitoring lessen the frequency of failures, leading to faster deployment. Let's explore each of these metrics:
Lead time
The lead time is the time it takes to implement, test, and deliver code. To measure lead time, teams need a clear definition of when work begins and ends, such as the time between a commit being made and the resulting code reaching production. The aim is to speed up deployment through automation, for example by optimizing how testing is integrated into the pipeline, to shorten the overall time to deployment. This gives a clear metric for whether delivery is getting faster, one that can be understood by the team and any external customers.
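As a rough illustration, lead time can be computed from pairs of commit and production timestamps. The sketch below assumes such pairs are already available (for example, exported from a version control system and a deployment log); the function name and the sample data are hypothetical, and the median is just one reasonable way to summarize the values.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_times(changes):
    """Return the lead time of each change as a timedelta.

    Each change is a (commit_time, production_time) pair.
    """
    return [deployed - committed for committed, deployed in changes]


# Hypothetical sample data: three changes and when they reached production.
changes = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 15, 0)),    # 6 hours
    (datetime(2024, 1, 9, 10, 0), datetime(2024, 1, 10, 10, 0)),  # 24 hours
    (datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 10, 16, 0)), # 2 hours
]

# The median is less sensitive to a single slow change than the mean.
median_lead_time = median(lead_times(changes))
print(median_lead_time)  # 6:00:00
```

Tracking this value week over week shows whether changes are reaching production faster.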
Deployment frequency
Deployment frequency is the number of deployments over a period of time. It can be measured in a range of ways, such as via an automated deployment pipeline, API calls, or manual scripts. The metric is about technical performance rather than delivery frequency for its own sake, and increasing deployments also goes hand in hand with reducing failed deployments and related failures, which is part of a bigger picture of customer satisfaction. Deployment frequency is about little and often: a team should typically be deploying several times each day rather than infrequently.
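A minimal sketch of counting deployments per day, assuming a list of deployment timestamps has already been collected from whatever pipeline or script performs the deployments. The function names and the sample log are hypothetical.

```python
from collections import Counter
from datetime import date, datetime


def deployments_per_day(deploy_times):
    """Count deployments for each calendar day that had at least one."""
    return Counter(t.date() for t in deploy_times)


def average_deployments_per_day(deploy_times, start, end):
    """Average deployments per day over the inclusive range start..end."""
    days = (end - start).days + 1
    in_range = [t for t in deploy_times if start <= t.date() <= end]
    return len(in_range) / days


# Hypothetical deployment log: five deployments across two days.
deploy_times = [
    datetime(2024, 1, 8, 9, 30),
    datetime(2024, 1, 8, 13, 0),
    datetime(2024, 1, 8, 17, 45),
    datetime(2024, 1, 9, 11, 15),
    datetime(2024, 1, 9, 16, 0),
]

per_day = deployments_per_day(deploy_times)
average = average_deployments_per_day(
    deploy_times, date(2024, 1, 8), date(2024, 1, 9)
)
print(per_day[date(2024, 1, 8)], average)  # 3 2.5
```

Dividing by every day in the range, including days with no deployments, keeps quiet days from inflating the average.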
Mean time to restore (MTTR)
Put simply, the mean time to restore (MTTR) is the average time it takes to return to service after a production failure, for example the time needed to recover from a feature-breaking commit or a broken database. It measures the time to restore a service, from when an incident is reported to when it is resolved. MTTR is not the time it takes to fix a build. Instead, it measures the responsiveness of a DevOps team to customer support issues as well as their capability to resolve and deploy solutions.
It may not be necessary to apply MTTR measurements to all customer support. Instead, support issues should be prioritized by level of importance so that the data is not distorted by a loop of low-level recurring problems that are more annoying than problematic.
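The calculation described above can be sketched as follows, assuming each incident is recorded with a reported time, a resolved time, and a priority label. The record layout, the priority names, and the sample incidents are hypothetical; low-priority incidents are filtered out so they do not distort the result.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents, priorities=("high",)):
    """Average time from report to resolution for incidents of the given priorities.

    Each incident is a (reported, resolved, priority) tuple.
    Returns None when no incident matches.
    """
    durations = [
        resolved - reported
        for reported, resolved, priority in incidents
        if priority in priorities
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Hypothetical incident log.
incidents = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 10, 0), "high"),   # 1 hour
    (datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 17, 0), "high"),  # 3 hours
    (datetime(2024, 1, 10, 8, 0), datetime(2024, 1, 10, 18, 0), "low"),  # excluded
]

mttr = mean_time_to_restore(incidents)
print(mttr)  # 2:00:00
```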
Change fail percentage
The change fail percentage is the share of changes that fail, out of all changes made. It requires a shared understanding of what makes a successful change and what makes a failed one. Failure encompasses any change that results in service degradation, impairment, or outage and that requires a fix such as a hotfix, a rollback, a fix-forward, or a patch. In this respect, failure could include outages as well as severe performance degradation or other unplanned failures.
As a measurement, the change fail percentage enables DevOps teams to track their progress. The expectation in a high-functioning team is that the change fail percentage should decrease over time as the team develops its experience and efficacy. For example, if two out of three deployments fail each week, the change fail percentage would be about 67%, with the goal of reducing it to 33%. A high failure rate may indicate problems in the DevOps process and result in downtime that costs a company revenue. That said, it's not about never failing. Failure often leads to new insights and fixes.
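The arithmetic in the example above, two failed deployments out of three, can be checked with a small sketch. The function name is hypothetical, and the calculation assumes failures are counted against all changes rather than against successful changes only.

```python
def change_fail_percentage(failed_changes, total_changes):
    """Return failed changes as a percentage of all changes."""
    if total_changes == 0:
        raise ValueError("total_changes must be greater than zero")
    return 100 * failed_changes / total_changes


current = change_fail_percentage(2, 3)  # two of three deployments failed
target = change_fail_percentage(1, 3)   # goal: only one of three fails
print(round(current, 1), round(target, 1))  # 66.7 33.3
```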
DevOps as an opportunity for progress
DevOps aims to generate business value through the continuous delivery of products and services. DevOps metrics are a practical way to measure the effectiveness (and improvement) of DevOps across development, deployment, and production within an organization. However, while key metrics are used to identify high-performing DevOps teams, they should also be considered within broader organizational goals such as profit, number of customers, product quality, and customer service.