Bringing Kubernetes into your infrastructure stack and team workflow too early can introduce unnecessary complications. The productivity of your team can decrease as they struggle with the increased complexity.
Kubernetes (K8s) offers you a lot of features and tooling “out of the box” (rolling upgrades, robust failover, etc.). You define a lot of these features in a standard declarative file-based way, which enables you to apply them across local, cloud and hybrid cloud setups, and make them reproducible. This declarative configuration is the same everywhere, which means that experienced users can take their knowledge with them from task to task and job to job instead of having to learn new configuration formats each time.
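To make the declarative approach concrete, here is a minimal sketch in Python that builds a Deployment manifest as plain data and serializes it to JSON, a format kubectl accepts alongside YAML. The application name, image URL, replica count, and rolling-update limits are hypothetical values chosen for illustration, not recommendations.

```python
import json


def deployment_manifest(name, image, replicas=3):
    """Build a Deployment manifest as plain data.

    The name, image, and replica count are hypothetical example values.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            # Rolling upgrades are declared, not scripted: the cluster
            # replaces pods gradually within these limits.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 1, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


manifest = deployment_manifest("web", "registry.example.com/web:1.0.0")
# sort_keys gives a stable rendering, which keeps later diffs readable.
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

Because the manifest is only data, the same description can be applied to a local, cloud, or hybrid cluster without modification.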
However, K8s also adds yet another layer of organizational overhead and complexity to your infrastructure that you need to keep up to date, maintain, and debug. It initially makes the development process harder, as developers sometimes need to learn and maintain an entire stack on their machines just to make code changes, something they might not have needed to do before.
Good developer experience for your team should seek to reduce the “cognitive load” on developers, not add to it. According to the book Team Topologies by Manuel Pais and Matthew Skelton, planning for the rollout of new technologies or approaches often assumes that team members have near full-time availability and zero existing cognitive load. As this is typically not true, major changes can reduce the quality of existing work, or affect how new tooling is implemented and how the team feels about it.
If you are already using containers successfully, you’re a small team, or you’re only maintaining one application instance, then you may not need to switch to Kubernetes at all. You should at least consider whether you have a specific business or technical need for Kubernetes before you commit. Start to consider it seriously only once you have invested time in experimenting with Kubernetes and understanding what it can bring to your team. Even then, embark on a phased rollout to see if the benefits you were hoping for actually materialize before you migrate fully to Kubernetes.
Build around Kubernetes
As you further build around your K8s-based infrastructure, it’s important to start treating your configuration in a similar way to your application code. At a minimum, this means being able to track changes to configuration. This allows you to identify errors caused by configuration changes and then roll back problematic changes.
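As a rough illustration of tracking configuration changes, the sketch below compares two versions of a configuration and reports exactly what changed. In practice this job usually falls to a version control system; the configuration keys and values here are hypothetical.

```python
import difflib
import json


def render(config):
    """Serialize a config deterministically so diffs show only real changes."""
    return json.dumps(config, indent=2, sort_keys=True).splitlines()


def config_diff(old, new):
    """Return unified-diff lines between two versions of a config."""
    return list(
        difflib.unified_diff(
            render(old),
            render(new),
            fromfile="previous",
            tofile="proposed",
            lineterm="",
        )
    )


# Hypothetical example: a change that quietly drops the replica count.
previous = {"image": "registry.example.com/web:1.0.0", "replicas": 3}
proposed = {"image": "registry.example.com/web:1.0.0", "replicas": 1}

changes = config_diff(previous, proposed)
print("\n".join(changes))
```

If the proposed version causes problems, the previous version is still on record, and rolling back means reapplying it.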
Larger companies like GitHub have built Internal Developer Platforms around Kubernetes clusters. These allow teams and team members to get their code running, without needing a detailed understanding of how Kubernetes operates. But if you’re not careful you can counteract and supplant the standards that come from using Kubernetes in the first place.
The aim of Internal Developer Platforms is to free ops teams from spending their time on the routine work of supporting development teams. Instead, they should become an Internal Developer Platform team that creates and maintains tools that help developers self-serve things like deploying their code or manipulating cluster resources. When making this transition, it’s important to keep this core requirement (self-service) in mind. While it’s OK to put in place boundaries and “guard rails,” developers should in general be able to do more than they were able to do before the platform was introduced.
If you get the balance wrong, then you run the risk of actually making the situation worse: team members are unable to complete all the tasks they need to do their job, which leads to frustration, relearning, and wasted time. It may also be that changing your technical approach highlights other organizational issues between teams and their responsibilities.
Unless your business is Kubernetes, you should think of it as an implementation detail: (most of) your development teams should be working on your apps, not maintaining ops tooling.
If you decide to build your own Internal Developer Platform, there will naturally be a period of time where certain team members spend more time building the platform than your application. Look to scale this down gradually and as appropriate. You can draw inspiration from what others have done before, for example, Airbnb or Zalando. Another approach is to use providers such as Humanitec to build your own Internal Developer Platform.
Changing your infrastructure and the way your team uses and interacts with it is a good opportunity to also consider restructuring teams, or changing their responsibilities and mindset. While switching tooling doesn’t fix all “human” issues, it gives you the chance to make a range of related changes at the same time.
As your team grows, you may need to bolster it with people who have more experience working with K8s. A rapidly scaling company (say, one that has just received Series B funding) might want to hire dozens or hundreds of developers over the next couple of quarters. As K8s is still a relatively new technology, not all developers have used it in previous roles, and it can be hard to know how to judge the skill level of potential hires. There are now certification programs, but many great developers don’t spend their time acquiring certificates to prove their expertise. A big problem with certification is that it doesn’t always translate smoothly into the “real world.” And how do you judge veteran programmers? Could someone who has spent 15 years understanding how to develop on top of Java Virtual Machines apply that knowledge to something like K8s quickly?
If you need to upskill an existing workforce or new hires, consider setting developer teams self-paced training programs, or bringing in external trainers for group or one-to-one programs, at least to get your team started on improving their skills. If you manage to find and hire developers with solid K8s knowledge and experience, then they can assist with that training, or offer to mentor other team members.
Another approach is to prioritize simplicity and abstraction in your own Internal Developer Platform, so that the platform fills those knowledge gaps by providing a simpler way of accomplishing common tasks.
Building the right team balance
As you start to roll out Kubernetes, it’s crucial to ensure that adoption is not driven by a single key person. If they are pulled elsewhere or leave with no one else taking their place, you can end up with an out-of-date setup that becomes a liability. Despite the name, the same applies to a managed K8s service: most providers only keep the levels of the stack they are responsible for up to date, so you still need to maintain containers and other dependencies.
When you are a small team, most of the team are generalists and can complete common Kubernetes tasks such as adding new services and updating and installing Helm charts. Maybe there are one or two specialists with more interest and enthusiasm than the rest. Smaller teams move quickly as there are few barriers, permissions, or strict workflows in place. As a team grows, it starts to be necessary to introduce restrictions and limitations on what team members can do with running applications and clusters. More hands on the controls means more chances for accidents to happen such as suddenly breaking production instances for customers. When a company has passed through this period and has funds for bigger teams, they can hire individual specialists to handle specific application and infrastructure components.
The period in between, as you grow from a small team to an enterprise-sized one, is where most of the challenges lie. The tradeoffs in how to configure and secure K8s to best balance development speed and safety are difficult to make. If you introduce too many blocks to developer productivity too soon, people find ways around them so that they can still get what they need done. This can lead to security and stability issues, and contribute to tensions between your teams.
It is worth considering a level of separation and isolation around cluster access on a team-by-team basis. It may be that a team consists of one person, or that people are on multiple teams. The principle stays the same: if your team built a feature, its members can create, change, and remove the K8s objects that relate to it, but not the objects for any other team’s features.
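One way to sketch this kind of isolation is with Kubernetes RBAC: a Role that allows changes only inside a team’s namespace, and a RoleBinding that grants it to the team’s group. The example below builds both as plain data in Python. The one-namespace-per-team layout, the "team-" prefix, the group name, and the list of resources are all assumptions made for illustration; how groups map to real users depends on how your cluster authenticates them.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def team_access(team):
    """Build a namespaced Role and RoleBinding for one team.

    Assumes one namespace per team, named "team-<team>" (hypothetical).
    """
    namespace = f"team-{team}"
    role = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": "feature-editor", "namespace": namespace},
        "rules": [
            {
                # "" is the core API group (services, configmaps);
                # "apps" covers deployments.
                "apiGroups": ["", "apps"],
                "resources": ["deployments", "services", "configmaps"],
                "verbs": ["get", "list", "watch", "create",
                          "update", "patch", "delete"],
            }
        ],
    }
    binding = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "feature-editor-binding", "namespace": namespace},
        # The group name is an assumption; it must match what your
        # authentication setup reports for the team's members.
        "subjects": [{"kind": "Group", "name": team, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": "feature-editor",
                    "apiGroup": RBAC_GROUP},
    }
    return [role, binding]


role, binding = team_access("payments")
print(json.dumps([role, binding], indent=2))
```

Because a Role is scoped to a single namespace, members of the hypothetical payments group get no permissions over objects in another team’s namespace unless a separate binding grants them.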
Giving developers the ability to self-serve can also help build more responsibility in development teams. If they have to maintain the services they create (more than before), they may take more time to ensure they run more efficiently, or exercise more caution in diving into the latest and greatest tools and frameworks.
To relieve some operational overhead, you can consider a managed Kubernetes service, direct from a cloud provider, or provider agnostic. If you are already using cloud services from one or more vendors, then this is an easier transition. But there are some technical and compliance edge cases that might prevent you from using a managed service.