I got the following question (or something similar at least) twice this month: “Should you tie a developer’s bonus to her performance?” I know some of you have to deal with similar questions, so I wanted to share my thoughts on this, and give you some logical ammunition for why I think it sucks.
Wikipedia defines the “butterfly effect” as sensitive dependence on initial conditions, in which a small change in one state of a deterministic nonlinear system can result in large differences in a later state. In a system as interdependent as the world we live in, the wing of a butterfly in Brazil can cause a thunderstorm in Thailand. It is impossible to connect the dots between these events; no model we currently know of can even get close to a prediction. At the same time, we know about the compounding effect that the most seemingly insignificant events can cause. In theory we should be able to model this, but in practice we can’t. What we can clearly observe is that the further back you try to deduce, the more effort and time you have to put in, and the higher the variance of potential explanations. In other words: the time you need to figure out interdependencies in a system increases non-linearly as the complexity of the system itself grows.
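To make “sensitive dependence on initial conditions” concrete, here’s a minimal sketch in Python (my own illustration, not anything from weather modeling) using the logistic map, a textbook chaotic system. Two starting states that differ only in the ninth decimal place end up completely uncorrelated within about fifty iterations:

```python
# Sensitive dependence on initial conditions, illustrated with the
# logistic map x -> r * x * (1 - x), which is chaotic for r = 4.

def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate the logistic map from x0 and return the full trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_trajectory(0.200000000)  # initial state
b = logistic_trajectory(0.200000001)  # same state, nudged by 1e-9

for step in (0, 10, 25, 50):
    diff = abs(a[step] - b[step])
    print(f"step {step:2d}: {a[step]:.6f} vs {b[step]:.6f}  (diff {diff:.6f})")
```

The tiny initial gap roughly doubles on every iteration, so by step 50 the two trajectories have nothing to do with each other. Running the map forward is trivial; recovering which 1e-9 nudge produced a given later state is hopeless, which is exactly the “backward deduction” problem above.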
Performance measurement within engineering teams is similar to this phenomenon: pinpointing exactly what an individual’s impact is and translating it into clear KPIs is almost impossible. The following are just three of dozens of examples:
- Code quality is in no way correlated with total output; in some cases, the correlation even runs the other way. Using “lines of code” in any given timeframe as a KPI makes no sense.
- It’s not only hard to predict how an engineering team is supposed to achieve a target; the target itself is constantly moving. The idea of using story points suggests somebody on the team can forecast the complexity of the work. This rarely works.
- What success looks like is relative and depends on the situation. Maximizing frontend test coverage might be a good idea if you have less than 50% coverage today. It might not be a great idea if you’re above 70% already.
What it all comes down to: the relative effort needed to derive, measure, and update the right KPIs depends on the interdependency and instability of the system you’re trying to model. It’s very simple for a sales team (revenue), slightly harder for marketing, and, I would argue, impossible for engineering.
I couldn’t name a single example of a well-functioning engineering team that pays individual bonuses against KPIs. So what to do? This is where the beauty of the human brain comes in. We are able to factor in a tremendous number of data points, observations, and experiences and derive the implicit thresholds we commonly call gut feeling. If 360-degree feedback from colleagues and peers signals a “yes” to performance-based compensation components, the likelihood of that answer outperforming every model you could possibly build is extremely high.
Let’s appreciate the butterfly effect. For me personally, it’s the last refuge of belief in free will, hidden away in the overwhelming complexity of interdependencies. What you cannot measure preserves the beauty of belief.
Kaspar
PS: I’m working on analyzing data from over 1,500 DevOps setups we surveyed in the last few months. I’ll be presenting my learnings here in a couple of weeks. And the next day we’ll show how Flink built their IDP with Humanitec. I’m excited about both. Join me.