The Top 5 challenges when running CI workloads on Kubernetes

Some people think running CI workloads in Kubernetes is all rainbows and unicorns! "Let’s just sprinkle some magic K8s dust on it and everything will be golden." 🌈🦄✨

There’s a lot of talk about the benefits of Kubernetes, and that’s because there are a lot of benefits. But you should be ready to tackle the complexity that comes with all the benefits because there are some real challenges to running CI on Kubernetes.

Here’s a list of the top 5 challenges that we ran into whilst building CI/CD workflows on Kubernetes.

1 – Impedance mismatch

You might think the ephemeral execution models of CI and Kubernetes would align nicely. But the two don’t run at quite the same speed.

There are likely to be cost savings from scaling up and down, but you've got to be careful: you don't want to evict a pod that's doing work. The consumption model of a CI build is extremely spiky, and it doesn't quite match how you save costs and allocate resources with Kubernetes. Parallelism is wonderful, but now you need a copy of your repo on every pod, or a persistent volume claim (PVC) shared among them. The complexity begins to spiral.
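To make the eviction and shared-checkout points concrete, here is a minimal sketch in Python that builds a Kubernetes Job manifest as a plain dict. It assumes you run the cluster autoscaler (the `safe-to-evict` annotation only asks that autoscaler not to scale down the node, and does not protect against every kind of eviction) and that the PVC's storage class supports the access mode you need for sharing across pods. The function name, image, claim name, and mount path are all hypothetical.

```python
import json


def build_ci_job(name: str, image: str, command: list[str], workspace_claim: str) -> dict:
    """Return a Kubernetes Job manifest (as a plain dict) for one CI step."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not retry failed builds automatically; let the CI tool decide.
            "backoffLimit": 0,
            "template": {
                "metadata": {
                    "annotations": {
                        # Ask the cluster autoscaler not to remove the node
                        # while this pod is still running a build.
                        "cluster-autoscaler.kubernetes.io/safe-to-evict": "false",
                    }
                },
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "build",
                            "image": image,
                            "command": command,
                            # Mount the shared checkout instead of cloning per pod.
                            "volumeMounts": [
                                {"name": "workspace", "mountPath": "/workspace"}
                            ],
                        }
                    ],
                    "volumes": [
                        {
                            "name": "workspace",
                            "persistentVolumeClaim": {"claimName": workspace_claim},
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    job = build_ci_job("unit-tests", "node:20", ["npm", "test"], "repo-checkout")
    print(json.dumps(job, indent=2))
```

Even in this small example you can see the spiral starting: one annotation for the autoscaler, one volume for the checkout, and you haven't run a single test yet.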

2 – Building containers gets weird

Building containers in Kubernetes gets weird. Your build is already running in a container on a container runtime, so how do you give it a build interface? Docker-in-Docker (DinD) is ugly. There are other tools, but not all of them use the standard Dockerfile interface, and which do you choose: Bazel, Jib, Buildpacks, BuildKit, or Kaniko? And how do they fit with your development workflows and with what the rest of your team is familiar with?

3 – Locking down the build

We all agree that security is job zero, so you’ll need robust network security. You also want inter-pod communication, which is easy when everything is open and can talk freely. As soon as you begin restricting this, things get complicated. Host access is super convenient, but it’s also a privilege escalation path you don’t want to enable.
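As an illustration of what "restricting this" can look like, here is a rough Python sketch that builds a NetworkPolicy manifest as a dict. It lets build agent pods accept traffic only from other agent pods in the same namespace. The `app` label, the policy name, and the function name are assumptions for the example, and the policy only has an effect if your cluster's network plugin enforces NetworkPolicy.

```python
def build_agent_network_policy(namespace: str, agent_label: str = "ci-agent") -> dict:
    """Return a NetworkPolicy allowing agent-to-agent ingress only."""
    # Hypothetical label that all build agent pods are assumed to carry.
    selector = {"matchLabels": {"app": agent_label}}
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"{agent_label}-ingress", "namespace": namespace},
        "spec": {
            # The policy applies to the agent pods...
            "podSelector": selector,
            "policyTypes": ["Ingress"],
            # ...and only admits traffic from pods with the same label
            # in the same namespace. Everything else is dropped.
            "ingress": [{"from": [{"podSelector": selector}]}],
        },
    }


if __name__ == "__main__":
    print(build_agent_network_policy("ci"))
```

The moment a build needs to reach a registry, a database, or a test service, you're adding more rules, which is where the complication comes from.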

Then there’s role-based access control (RBAC). Kubernetes expects the namespace to be the primary scoping mechanism, but you’ve got credentials you want to share among build agents, and credentials you need in order to deploy to different places, so that scoping gets challenging.

4 – Caching is hard

You’ve likely heard the joke, and it’s not funny because caching is hard! CI in Kubernetes is ephemeral by nature, and with pods constantly spinning up and down you can’t rely on a consistent local copy of anything. And because you don’t want to download node modules for every single job, you’re faced with a significant challenge.

Bazel remote caching is best-in-breed in this ecosystem, but it can be complex, especially for teams who haven’t worked with it before. And if you’re building a monorepo, GitLab and GitHub network charges can become a problem if you clone the repository in every job step.
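One common way to avoid re-downloading dependencies in every job is to key a shared cache on the contents of the lockfile, so that a job reuses a stored `node_modules` archive until the lockfile changes. The sketch below shows only the key derivation, in Python. The function name and key format are hypothetical, and where you store the archive (object storage, a PVC, a cache service) is left open.

```python
import hashlib


def cache_key(lockfile_contents: bytes, prefix: str = "node-modules") -> str:
    """Derive a cache key that changes only when the lockfile changes."""
    digest = hashlib.sha256(lockfile_contents).hexdigest()
    # A shortened digest keeps the key readable; this is a choice made
    # for the example, not a requirement.
    return f"{prefix}-{digest[:16]}"


if __name__ == "__main__":
    # In a real job you would read package-lock.json from the checkout.
    print(cache_key(b'{"lockfileVersion": 3}'))
```

Every ephemeral pod that computes the same key can pull the same archive. The hard part the key doesn't solve is everything around it: where the archive lives, who can write to it, and when it gets cleaned up.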

5 – Tool layering

A workflow is not a native concept in Kubernetes. If you need to layer tools on top of other tools to model workflows and dependencies the way you’d expect a CI tool to, you’re facing what we’re calling the tool layering challenge.
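To show the kind of logic a workflow layer has to provide on top of Kubernetes, here is a small Python sketch that turns step dependencies into batches that can run in parallel. It uses the standard library's `graphlib` (Python 3.9+). The step names and the `parallel_batches` helper are made up for the example, and a real workflow tool does far more than this (retries, artifacts, conditionals).

```python
from graphlib import TopologicalSorter


def parallel_batches(steps: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into batches; each batch depends only on earlier ones.

    `steps` maps a step name to the set of steps it depends on.
    """
    sorter = TopologicalSorter(steps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies loop
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()      # steps whose dependencies are done
        batches.append(sorted(ready))   # sorted only for stable output
        sorter.done(*ready)             # mark the whole batch as finished
    return batches


if __name__ == "__main__":
    pipeline = {
        "lint": set(),
        "build": set(),
        "test": {"build"},
        "deploy": {"lint", "test"},
    }
    print(parallel_batches(pipeline))
```

Each batch would then become one or more Kubernetes Jobs, and something still has to watch those Jobs and decide when the next batch can start. That "something" is the extra layer.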

Perhaps you want to use the Istio service mesh and you’re using Vault for secret and token management. In that case, you’ve now got isolation and other capability problems, because these two things don't play nicely together.

Why do we keep doing it to ourselves?

[Image: "Why tho" cat meme, captioned "Kubernetes in CI – Why tho?"]

All this complaining, all these problems, so why do we keep doing it to ourselves? We keep going to Kubernetes for CI because it’s great! These are the challenges we face when we're building these systems, but really effective build ecosystems and developer experiences are built on top of Kubernetes.
