PagerDuty - Buildkite Case Study

PagerDuty

Software - 233 Buildkite users

PagerDuty is the leader in digital operations management with more than 13,000 customers worldwide who rely on its platform to keep their own digital services running. Its platform helps clients identify issues and opportunities in real time and bring together the right people to fix problems faster and prevent them in the future. Headquartered in San Francisco, PagerDuty has a distributed engineering team operating in Toronto, Atlanta, London, Sydney, and Seattle in addition to its home base.

As the world rushes online and to remote work, there has been a 47% increase in the number of daily incidents. However, PagerDuty customers have been able to resolve those incidents 20% faster. PagerDuty’s use of AI and ML to address issues in minutes and seconds, not hours, is what has attracted 60 of the Fortune 100, and companies like GE, Cisco, Genentech, Electronic Arts, Netflix, Shopify, Zoom, DoorDash, lululemon and more as customers.

99% of production builds pass through Buildkite
531 average builds per day

“The ability to bootstrap new services quickly is key for us. With Buildkite and a small amount of automation we’ve built ourselves, teams are able to go from a blank repo to a service running in production in a few minutes. This allows us to go from an idea to an MVP much quicker.”
Tristan Bates
Senior Site Reliability Engineer

When CI/CD Becomes A ‘Choose Your Own Adventure’ Game

As one can imagine, the ability to deploy, test, and ship code faster is essential for a company that describes itself as the “central nervous system” of an organization’s IT operations. However, the CI/CD solutions used by PagerDuty weren’t able to keep up with the testing and deployment needs of engineering and other teams.

“CI/CD was a choose-your-own-adventure for our feature delivery teams,” recalled Tristan Bates, Senior Site Reliability Engineer at PagerDuty. “Deployments were either a mix of self-hosted GoCD or manual deployments using shell scripts and Makefiles.”

Neither option was ideal for PagerDuty. “GoCD was difficult to maintain and required constant collaboration between feature delivery teams and the SRE team whenever a change to a deploy pipeline was required, a new service was created, or an engineer switched teams,” said Bates.

He added, “Custom scripts for deployment meant there was no standardization and it was difficult to tell when, how, or why something was deployed.”

This inconsistent process prompted the search for a replacement. After reviewing more than 50 vendors, PagerDuty narrowed it down to three finalists: Jenkins with CloudBees support, an updated version of GoCD, and Buildkite.

They arrived at their final decision by allowing service delivery teams to build pipelines for all three tools and test them against one another.

“We set up environments for all of these so that teams could test deploying to staging, deploying to production,” said Bates. “We created a decision matrix of all the different features from all the tools and our requirements, and scored each of them. It was a very analytically driven selection.”
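
For readers unfamiliar with the technique, a weighted decision matrix of the kind Bates describes can be sketched in a few lines of Python. This is only an illustration: the criteria names, weights, scores, and placeholder tool names below are all hypothetical and are not PagerDuty's actual evaluation data.

```python
# Minimal sketch of a weighted decision matrix.
# All criteria, weights, and scores are hypothetical illustrations,
# not figures from PagerDuty's evaluation.

# Importance of each requirement (higher = more important).
WEIGHTS = {"hybrid_agents": 3, "secrets_in_house": 3, "self_service": 2}

# Scores from 1 (poor) to 5 (excellent) for three placeholder tools.
SCORES = {
    "tool_a": {"hybrid_agents": 2, "secrets_in_house": 4, "self_service": 3},
    "tool_b": {"hybrid_agents": 3, "secrets_in_house": 3, "self_service": 2},
    "tool_c": {"hybrid_agents": 5, "secrets_in_house": 5, "self_service": 4},
}


def weighted_total(scores, weights):
    """Sum each criterion's score multiplied by its weight."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


def rank_tools(all_scores, weights):
    """Return (tool, total) pairs sorted from highest to lowest total."""
    totals = {tool: weighted_total(s, weights) for tool, s in all_scores.items()}
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for tool, total in rank_tools(SCORES, WEIGHTS):
        print(f"{tool}: {total}")
```

Weighting the requirements before scoring keeps the comparison tied to what matters most to the teams, rather than to whichever tool has the longest feature list.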

Buildkite’s Solution for Faster Shipping

In the end, Buildkite was the solution most closely aligned with PagerDuty’s key requirements:

  • Hybrid - PagerDuty needed a hybrid solution where the control plane was cloud managed, but agents could run locally.
  • Secrets management - The team needed to keep all of their secrets within their own infrastructure.
  • Self-service and ease of use - PagerDuty’s multiple engineering teams operate under a full-service ownership model. Common tooling and services help them reduce their cognitive load.

“The hassle of managing the control plane is totally out of our hands,” Bates said of Buildkite. “We can set up our own secrets management, access to internal schedulers, and AWS and don’t have to have those secrets out on the cloud at large.”

PagerDuty has since moved all deployment pipelines to the platform. This includes the work of any team that touches code within the company, not just engineering.

“Ninety-nine percent of everything that makes it into production passes through Buildkite for deployment,” said Bates. “We also use it a lot for running tests and other jobs as well.”

Freeing Up Time for SRE and Feature Teams

PagerDuty has multiple SRE teams; the one responsible for Buildkite is charged with enabling the rest of the engineering teams to deliver reliable and scalable services efficiently.

“The ability to bootstrap new services quickly is key for us so we can focus on delivering features, bug fixes, and improvements instead of repeating the same common setup and configuration steps for every service,” Bates said. “With Buildkite and a small amount of automation we’ve built ourselves, teams are able to go from a blank repo to a service running in production in a few minutes. This allows us to go from an idea to an MVP much quicker.”

Recalling PagerDuty’s previous CI/CD process, Bates said, “It used to take a day to get a pipeline up and running because you had to learn this archaic XML format, and set up credentials, and perform these rituals and other manual steps. It was just really difficult to get code deployed to any environment.”

But now, with Buildkite, “it’s trivial,” Bates said. “You click a button and it generates a pipeline, and it does all this dynamic magic. Teams just don’t think about it anymore.”


