
Local privilege escalation: Fixing security issues in the agent

Chapter 1: When life was good

Suggested soundtrack: Sunshine, Lollipops And Rainbows - Lesley Gore

Once upon a time, there were no computers, and life was good. "File systems" might have meant pieces of paper and manila folders organised into big metal filing cabinets. "Docker" sounded like someone who works with boats. Saying "symlink" and "TOCTOU" out loud would have been met with strange looks.

Fast forward to a few years ago. We had an Elastic CI Stack, which ran agents on AWS EC2 instances. Docker was a thing that meant containers, a technology for faking up a computer inside another computer (the EC2 instances themselves are another kind of fake computer, but I digress). Customers ran jobs in Docker containers, but the files they got back out of the container frequently had a different owner (typically root), and user namespacing in Docker wasn't good enough, or something. This wasn't always a problem, but the next time the agent ran a job, there would be leftover files from the previous job that couldn't be cleaned up, and they would interfere.

"Oh, I know!" someone probably exclaimed. "What we'll do is, we'll have a small Bash script that fixes the file permissions in the agent environment hook. That way it can clean up leftover files," they probably said.

    #!/bin/bash
    chown -R buildkite-agent:buildkite-agent /var/lib/buildkite-agent/builds

Someone else chimed in. "But how will that work? The agent runs as user buildkite-agent, but the files are owned by root. You can't change file ownership without either being the owner, or being root, which in this case is the same thing, but is a case people in the future might wonder about so I'm saying it now," they probably replied.

"Hmm. Oh, I got it. We let buildkite-agent run the script with sudo. That's computer-speak for Simon Says, so it has to do it. Here's a sudoers configuration."

buildkite-agent ALL=NOPASSWD: /usr/bin/fix-buildkite-agent-builds-permissions

Chapter 2: The best laid plans

Suggested soundtrack: Journey to the real world – Tame Impala

Some time passes.

"Hey, our builds are really slow. It looks like the environment hook is spending a lot of time changing file permissions."

"That's weird... oh no actually it's not weird. The agent has run a lot of different pipelines, so there's lots of different leftover directories underneath the builds directory. It's trying to fix them all."

A brief pause.

"Oh, I got it. We'll pass in some arguments to the permissions fixer script to specify which directory to fix. That way it only fixes what it needs to."

    #!/bin/bash

    AGENT_DIR="$1"    # => "my-agent-1"
    ORG_DIR="$2"      # => "my-org"
    PIPELINE_DIR="$3" # => "my-pipeline"

    BUILDS_PATH="/var/lib/buildkite-agent/builds"

    # And now we can reconstruct the full agent builds path:
    PIPELINE_PATH="${BUILDS_PATH}/${AGENT_DIR}/${ORG_DIR}/${PIPELINE_DIR}"
    # => "/var/lib/buildkite-agent/builds/my-agent-1/my-org/my-pipeline"

    # If it doesn't exist, then we won't do anything.
    if [[ -e "${PIPELINE_PATH}" ]]; then
      /bin/chown -R buildkite-agent:buildkite-agent "${PIPELINE_PATH}"
    fi

"Oh. That's good. I spot one problem though. What if someone tries to escape the builds directory? This was a common problem with things like old web servers, where you could ask for ../../../../../etc/passwd and be given the contents of /etc/passwd, even though the web server supposedly only served from /var/www."

"Ah, gotcha... so we have to block dots like . and .., and probably blank items, and also slashes / as well..."

    # Make sure it doesn't contain any slashes by substituting slashes with nothing
    # and making sure it doesn't change
    function exit_if_contains_slashes() {
      if [[ "${1//\//}" != "${1}" ]]; then
        exit 1
      fi
    }

    function exit_if_contains_traversal() {
      if [[ "${1}" == "." || "${1}" == ".." ]]; then
        exit 2
      fi
    }

    function exit_if_blank() {
      if [[ -z "${1}" ]]; then
        exit 3
      fi
    }

    # Check them for slashes
    exit_if_contains_slashes "${AGENT_DIR}"
    exit_if_contains_slashes "${ORG_DIR}"
    exit_if_contains_slashes "${PIPELINE_DIR}"

    # Check them for traversals
    exit_if_contains_traversal "${AGENT_DIR}"
    exit_if_contains_traversal "${ORG_DIR}"
    exit_if_contains_traversal "${PIPELINE_DIR}"

    # Check them for blank values
    exit_if_blank "${AGENT_DIR}"
    exit_if_blank "${ORG_DIR}"
    exit_if_blank "${PIPELINE_DIR}"

    # If we make it here, we're safe to go!

(Spoilers: we weren't safe to go.)
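For readers who haven't met path traversal before, here is a minimal Python sketch of the web server problem and of the three checks above. It is not the script itself: `WEB_ROOT`, `naive_resolve` and `is_safe_segment` are names made up for this illustration, and the checks are only a loose translation of the Bash functions.

```python
import posixpath

WEB_ROOT = "/var/www"  # hypothetical document root, as in the anecdote above


def naive_resolve(requested):
    """Join the request onto the root and normalise it, with no containment check."""
    return posixpath.normpath(posixpath.join(WEB_ROOT, requested))


def is_safe_segment(segment):
    """Loose translation of the Bash checks: not blank, not '.' or '..', no slashes."""
    return segment not in ("", ".", "..") and "/" not in segment


# The classic traversal: enough "../" hops walk straight out of /var/www.
print(naive_resolve("../../../../../etc/passwd"))  # /etc/passwd

# The per-segment checks reject anything that could do that on its own.
print([is_safe_segment(s) for s in ("my-org", "..", "a/b", "")])  # [True, False, False, False]
```

What checks like these cannot see is what each name actually points at on disk, which is where the next chapter picks up.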

Chapter 3: Shenanigans ensue

Suggested soundtrack: Sabre Dance / Rattle – Khachaturian, Berliner Philharmoniker

Years pass. Many, many agents run many, many jobs, and many of them run the script. Eventually, a security researcher reports a local privilege escalation vulnerability.

"Hi, sorry to interrupt. You forgot about symlinks."

"Huh? What about symlinks? Surely we're protected from symlink problems because by default, chown -R doesn't traverse symlinks, right?"

"Yes that's true, but that doesn't apply to the path given to chown in its argument. chown, like most other tools, has to resolve the path it is given before it can do anything."

"Your point?"

"Well, if I can control what a job does, then I can make a job that replaces ORG_DIR (for instance) with a symlink to /usr/bin. Then I can call the script - with sudo, right, because jobs that the agent runs also run as buildkite-agent, so I can use sudo on it too, rules is rules. The path segments that the script is given look totally innocent, but chown will resolve the path to /usr/bin/something. So I can change ownership of things in /usr/bin, not things in /var/lib/buildkite-agent/builds."
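A harmless way to see this for yourself is to rebuild the situation inside a throwaway temporary directory. The Python sketch below is an illustration only: `outside` stands in for `/usr/bin`, nothing is chowned, and every directory name is made up.

```python
import os
import tempfile


def demo_symlink_escape():
    """Build a fake builds directory whose org directory is a symlink pointing elsewhere."""
    root = os.path.realpath(tempfile.mkdtemp())
    builds = os.path.join(root, "builds")
    outside = os.path.join(root, "outside")  # stand-in for /usr/bin
    os.makedirs(os.path.join(builds, "my-agent-1"))
    os.makedirs(os.path.join(outside, "my-pipeline"))

    # What the malicious job does: ORG_DIR becomes a symlink out of the builds directory.
    os.symlink(outside, os.path.join(builds, "my-agent-1", "my-org"))

    # What the script is later given: three perfectly ordinary-looking names.
    segments = ("my-agent-1", "my-org", "my-pipeline")
    looks_innocent = all(s not in ("", ".", "..") and "/" not in s for s in segments)

    pipeline_path = os.path.join(builds, *segments)
    resolved = os.path.realpath(pipeline_path)  # where path resolution actually ends up
    return looks_innocent, builds, outside, resolved


looks_innocent, builds, outside, resolved = demo_symlink_escape()
print(looks_innocent)                                     # True
print(resolved.startswith(builds + os.sep))               # False
print(resolved == os.path.join(outside, "my-pipeline"))   # True
```

The names pass every string check, yet the path lands outside the builds directory, because the checks look at the text of the path and not at the file system behind it.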

"Ohhhhh. Gotcha. That's bad. Let's check for symlinks then..."

The engineer responsible for fixing the problem then went on a bit of a tangent.

"Let's see, readlink and realpath let us resolve a path fully, so if realpath and the path we think we're operating on are different, then something's up and we should bail. Is readlink or realpath better? realpath -e? Or -f? That would be nice, since that prevents symlinks in the whole path. Oh wait, we're running tests in an Alpine container, so those flags don't exist. Ohhhh, but what if someone tries to override realpath through the PATH environment variable? Hmm... no, we're good, we don't let sudo propagate the environment, and even if we did, PATH is forbidden. We're good."

After some furious coding, they merge a PR. "I think this will do."

    # If it doesn't exist, then we won't do anything.
    if [[ ! -e "${PIPELINE_PATH}" ]]; then
      exit 0
    fi

    # Check for symlink shenanigans.
    if [[ "$(realpath "${PIPELINE_PATH}")" != "${PIPELINE_PATH}" ]]; then
      exit 4
    fi

    # It should be a directory.
    if [[ ! -d "${PIPELINE_PATH}" ]]; then
      exit 5
    fi

    # If we make it here, we're safe to go!
    /bin/chown -R buildkite-agent:buildkite-agent "${PIPELINE_PATH}"

Chapter 4: Tick Tock Too

Suggested soundtrack: Anti Hero – Taylor Swift

The security researcher, looking on, bemused, spoke again.

"That's definitely harder to exploit, but... there's still a problem. It's TOCTOU."

The engineer had heard of TOCTOU before. "I...oh. Dammit, you're right."

They turned to face the reader, and explained.

"We check that it's not a symlink, then pass it to chown. But an attacker could sneak in between checking and chowning and change the intermediate path to a symlink."
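The gap is easier to see when the steps are laid out one after another. The Python sketch below simulates the race sequentially in a temporary directory (a real attacker would loop in a second process to hit the window). `passes_symlink_check` is a hypothetical stand-in for the `realpath` comparison in the patched script, and nothing is actually chowned.

```python
import os
import tempfile


def passes_symlink_check(path):
    """Stand-in for the script's check: the fully resolved path must equal the path."""
    return os.path.realpath(path) == path


def demo_toctou():
    root = os.path.realpath(tempfile.mkdtemp())
    org_dir = os.path.join(root, "builds", "my-agent-1", "my-org")
    pipeline_path = os.path.join(org_dir, "my-pipeline")
    outside = os.path.join(root, "outside")  # stand-in for /usr/bin
    os.makedirs(pipeline_path)
    os.makedirs(os.path.join(outside, "my-pipeline"))

    # Time of check: everything is an honest directory, so the check passes.
    checked_ok = passes_symlink_check(pipeline_path)

    # The window: the attacker swaps the org directory for a symlink.
    os.rename(org_dir, org_dir + ".real")
    os.symlink(outside, org_dir)

    # Time of use: the same string now resolves somewhere else entirely.
    used = os.path.realpath(pipeline_path)
    return checked_ok, used, os.path.join(outside, "my-pipeline")


checked_ok, used, attacker_target = demo_toctou()
print(checked_ok)               # True
print(used == attacker_target)  # True
```

The check and the use both receive the same path string, but a string is re-resolved every time it is used, so the answer to "where does this point?" can change in between.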

They paused again.

"But that means... it's practically impossible to use a Bash script to solve this problem. chown doesn't have a flag for preventing its argument from containing symlinks. It has -h, but that changes what it does when given a symlink. It has -P, which is the default for recursive mode and prevents traversing symlinks that it finds. It has -H and -L and those are definitely not what we want! But nothing to prevent intermediate symlinks in the path it is given."

The engineer became visibly agitated, breathing in deeply, but just as a stream of blood-curdling profanity and invocation of eldritch horrors was about to escape their mouth, they calmed again.

"It's kinda like…untrusted input shouldn't be allowed anywhere near privileged scripts, or something."

Chapter 5: Jail for Mother! Jail for One Thousand Years!

Suggested soundtrack: Shawshank Redemption Theme

"Let's... uh, let's make a jail? A chroot jail is where you change the apparent root of the file system. This prevents accessing anything outside the jail. Seems nifty."


"Oh no, that's difficult. We'd have to copy the chown tool (and any libraries it depends on) into the jail in order to use it. And possibly the script, and also Bash in order to run the script, and its dependencies... but then... we'd have to prevent the script from being used on itself in the same way... and also prevent chown from operating on itself or Bash, or ... argghh."

Chapter 6: Containers

Suggested soundtrack: Dream Within a Dream – from the Inception soundtrack

"Containers got us into this mess in the first place. Surely containers can get us out of it!"

(Containers might be able to get us out of this mess security-wise, but among other reasons, performance would probably suck.)

Chapter 7: A million tiny jails

Suggested soundtrack: No Surprises – Radiohead

"Surely someone has solved this problem. Okay, basics. We at least need to be able to open a directory and at the same time prevent its path from containing any symlinks. Do syscalls even exist for that?"

A small amount of research later...

"One does! Praise be unto Linus! It's called openat2. It has the flags. All we have to do is pass it a file descriptor for a directory and a subpath. And then when we want to change file ownership, we can do a similar sort of thing with fchownat."

"Wait, how does that fix anything?"

"I think I see it. We can open the builds directory, /var/lib/buildkite-agent/builds, which we trust is not alterable by the job. Then we can open the subpath given by the arguments using openat2, which ensures that it is a subpath of the builds directory while simultaneously preventing any symlinks, guaranteed by the Linux kernel. Once we have it open, it's a file descriptor that refers to a specific inode on disk. It doesn't matter what an attacker does to the path in the meantime, because the path is already resolved. It's like a tiny per-open-call chroot jail."
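Python's standard library doesn't expose `openat2`, so the sketch below swaps in an older, more portable approximation: walk the path one component at a time with `openat` (spelled `os.open(..., dir_fd=...)` in Python) and `O_NOFOLLOW`. It is not the same call, and it is not necessarily how the real fix is written; `open_beneath` is a made-up helper name. It does show the two properties that matter here: symlinked components are refused, and the resulting file descriptor keeps pointing at the same directory whatever happens to the path afterwards.

```python
import os
import tempfile

DIR_FLAGS = os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW


def open_beneath(root_fd, segments):
    """Open root/seg1/seg2/... relative to root_fd, refusing any symlinked component."""
    fd = os.dup(root_fd)
    try:
        for name in segments:
            if name in ("", ".", "..") or "/" in name:
                raise ValueError(f"unsafe path segment: {name!r}")
            # openat(fd, name, ...): raises OSError if name is a symlink,
            # is not a directory, or does not exist.
            next_fd = os.open(name, DIR_FLAGS, dir_fd=fd)
            os.close(fd)
            fd = next_fd
        return fd
    except BaseException:
        os.close(fd)
        raise


def demo_open_beneath():
    root = os.path.realpath(tempfile.mkdtemp())
    builds = os.path.join(root, "builds")
    org_dir = os.path.join(builds, "my-agent-1", "my-org")
    pipeline_path = os.path.join(org_dir, "my-pipeline")
    outside = os.path.join(root, "outside")  # stand-in for /usr/bin
    os.makedirs(pipeline_path)
    os.makedirs(os.path.join(outside, "my-pipeline"))
    os.symlink(outside, os.path.join(builds, "my-agent-1", "evil-org"))

    builds_fd = os.open(builds, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # A path that runs through a symlink is refused outright.
        try:
            os.close(open_beneath(builds_fd, ["my-agent-1", "evil-org", "my-pipeline"]))
            symlink_refused = False
        except OSError:
            symlink_refused = True

        # An honest path opens fine, and the fd survives a later swap of the path.
        fd = open_beneath(builds_fd, ["my-agent-1", "my-org", "my-pipeline"])
        try:
            inode_at_open = os.fstat(fd).st_ino
            os.rename(org_dir, org_dir + ".real")
            os.symlink(outside, org_dir)
            fd_unchanged = os.fstat(fd).st_ino == inode_at_open
            path_now_elsewhere = os.stat(pipeline_path).st_ino != inode_at_open
        finally:
            os.close(fd)
    finally:
        os.close(builds_fd)
    return symlink_refused, fd_unchanged, path_now_elsewhere


print(demo_open_beneath())  # (True, True, True)
```

With `openat2`, the whole walk collapses into a single call using resolve flags such as `RESOLVE_NO_SYMLINKS` and `RESOLVE_BENEATH`, and the kernel does the refusing instead of a loop in user space.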

"All we have to do is…implement our own recursive chown using this technique."
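To give a feel for the shape of that, here is a small Python sketch of a recursive chown that only ever works through directory file descriptors. Treat it as an illustration under assumptions rather than the agent's actual fix: `chown_tree` is a made-up name, it expects a directory fd that was already opened safely, and it returns the relative names it touched purely so the behaviour can be inspected.

```python
import os
import stat
import tempfile


def chown_tree(dir_fd, uid, gid, prefix=""):
    """Chown the directory behind dir_fd and everything beneath it, never following symlinks."""
    os.fchown(dir_fd, uid, gid)
    touched = [prefix or "."]
    for name in os.listdir(dir_fd):
        st = os.stat(name, dir_fd=dir_fd, follow_symlinks=False)
        if stat.S_ISDIR(st.st_mode):
            # O_NOFOLLOW: if the entry was swapped for a symlink since the stat, open fails.
            child_fd = os.open(
                name, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW, dir_fd=dir_fd
            )
            try:
                touched += chown_tree(child_fd, uid, gid, prefix + name + "/")
            finally:
                os.close(child_fd)
        else:
            # fchownat with AT_SYMLINK_NOFOLLOW: a symlink is chowned itself, not its target.
            os.chown(name, uid, gid, dir_fd=dir_fd, follow_symlinks=False)
            touched.append(prefix + name)
    return touched


def demo_chown_tree():
    root = os.path.realpath(tempfile.mkdtemp())
    top = os.path.join(root, "my-pipeline")
    outside = os.path.join(root, "outside")  # must stay untouched
    os.makedirs(os.path.join(top, "sub"))
    os.makedirs(outside)
    for path in (
        os.path.join(top, "a.txt"),
        os.path.join(top, "sub", "b.txt"),
        os.path.join(outside, "secret.txt"),
    ):
        with open(path, "w") as f:
            f.write("x")
    os.symlink(outside, os.path.join(top, "link"))

    top_fd = os.open(top, os.O_RDONLY | os.O_DIRECTORY | os.O_NOFOLLOW)
    try:
        # Chowning to our own uid/gid needs no privileges, so the demo runs without root.
        return sorted(chown_tree(top_fd, os.geteuid(), os.getegid()))
    finally:
        os.close(top_fd)


print(demo_chown_tree())  # ['.', 'a.txt', 'link', 'sub/', 'sub/b.txt']
```

The symlink `link` shows up in the list as an entry in its own right, while `secret.txt` behind it never does. A real tool would run as root, pass the uid and gid of buildkite-agent, and obtain the starting fd with a symlink-refusing open of the kind described above.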