
How Rippling reduced CI/CD costs by 50% with AWS Spot Instances


We're always excited when teams find innovative ways to optimize their CI/CD pipelines using Buildkite. One success story that stands out is how Rippling, a leading workforce management platform, combined Buildkite with AWS Spot Instances to reduce infrastructure costs while positioning themselves for greater scale.

In this post, we'll explore how Rippling approached this challenge. You'll learn about the obstacles they faced and the clever strategies they implemented to get the cost savings of Spot Instances without compromising on performance or reliability.

Whether you're just starting with Spot Instances or looking to improve your usage, Rippling's experience offers valuable insights.

The challenge: Scaling CI/CD for rapid growth

As a fast-growing company with engineers pushing significant changes each day, Rippling faced a daunting challenge. Their CI/CD pipelines had to balance:

  • Speed of delivery
  • Quality
  • Cost

With a million-dollar infrastructure bill, other companies might have hit the panic button and gone into cost-cutting overdrive. But not Rippling—they were determined to maintain a good developer experience and slay all three dragons at once.

Organization snapshot:

  • Engineers: >650
  • Tests: 60,000
  • CI/CD load: Hundreds of PRs a day

Each build used extensive parallelization, with multiple nodes and processes per node, resulting in a staggering demand for compute resources. During peak periods, Rippling could have over 1,200 large virtual machines running simultaneously, equivalent to 50,000 CPUs and 100,000 GB of memory.

The solution: Embracing Spot Instances with Buildkite

AWS Spot Instances are spare compute capacity available in the AWS cloud at steep discounts compared to regular On-Demand Instance pricing. These instances are sold on a Spot Instance market, where AWS dynamically adjusts prices based on the supply and demand for spare EC2 capacity.

By using a flexible CI/CD tool like Buildkite Pipelines, Rippling has the freedom to bring their own compute resources and optimize performance exactly as they want. In this case, that meant running their CI/CD build infrastructure on Spot Instances.

So what's the catch? AWS can interrupt Spot Instances with two minutes' notice when it needs the capacity back. However, this interruption risk is offset by low prices, making Spot Instances an attractive option for workloads that can tolerate interruptions, such as CI/CD pipelines.
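To make the two-minute notice concrete: EC2 exposes a pending interruption through the instance metadata service at the path `/latest/meta-data/spot/instance-action`, which returns a small JSON document containing an `action` and a `time`. The sketch below only parses such a document and works out how much time is left. The helper name `seconds_until_interruption` and the sample payload are illustrative assumptions, not anything taken from Rippling's setup.

```python
import json
from datetime import datetime, timezone

def seconds_until_interruption(notice_json: str, now: datetime) -> float:
    """Return the seconds between `now` and the interruption time in the notice."""
    notice = json.loads(notice_json)
    # The notice carries a UTC timestamp such as "2017-09-18T08:22:00Z".
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()

# Illustrative payload; a real one would be fetched from the metadata service.
sample_notice = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)

remaining = seconds_until_interruption(sample_notice, now)
print(remaining)  # 120.0
```

A build agent that polls for this notice could use the remaining time to stop accepting new jobs and let the current one be rescheduled elsewhere.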

When requesting Spot Instances, you specify the instance type, the desired number of instances, and the maximum price you're willing to pay per hour. If the current Spot price for the instance type is at or below your maximum price, your request is fulfilled.

AWS continuously monitors the prices and terminates Spot Instances when the demand for capacity increases or the Spot price exceeds your maximum price. This dynamic pricing model allows AWS to optimize capacity utilization while you benefit from significant cost savings compared to On-Demand Instance pricing.

The advantages of using Spot Instances for CI/CD pipelines include:

  • Cost savings: The primary benefit is the potential for substantial cost savings, typically ranging from 50% to 90% compared to On-Demand Instance prices.
  • Scalability: Spot Instances let organizations rapidly scale their CI/CD pipelines by using spare compute capacity as needed, without up-front commitments or long-term contracts.
  • Parallel execution: CI/CD workloads often involve parallel test execution, and Spot Instances are an ideal fit as they can be launched in bulk and terminated individually as needed.

However, there are also some potential drawbacks to consider:

  • Interruption risk: Spot Instances can be interrupted with just two minutes' notice, potentially leading to incomplete builds or test runs and requiring robust fault tolerance mechanisms.
  • Availability constraints: Spot Instance availability can vary based on region, instance type, and Availability Zone, potentially limiting the ability to scale during periods of high demand.
  • Complexity: Using Spot Instances requires careful monitoring, configuration, and management of instance lifecycle events, which adds complexity to the CI/CD infrastructure.

Rippling navigated these challenges and capitalized on the substantial cost savings Spot Instances offer through strategic planning, iterative optimization, and monitoring.

AWS Spot Instance alternatives

While this post focuses on AWS Spot Instances, each major cloud provider has a similar offering, such as Spot VMs on Google Cloud and Azure Spot Virtual Machines.

The how: Moving to AWS Spot Instances

Rippling's journey to using Spot Instances was iterative, with plenty of creative problem-solving along the way.

Here's a quick rundown of how they approached this challenge:

1. Estimating potential savings: Rippling started by analyzing their AWS bill and found that compute costs (EC2 instances) accounted for 90% of their total cloud costs. Estimating that Spot Instances could cut compute costs by up to 75%, they projected an overall cost reduction of approximately 50%.

2. The naive approach: Initially, Rippling lowered the OnDemandPercentage in their existing architecture using Buildkite's Elastic CI Stack for AWS, gradually increasing the Spot Instance usage. However, they encountered issues with outages during peak periods and a high level of Spot interruptions, negatively impacting build times.

3. Mitigating Spot outages: To address the outages, Rippling developed services to detect them and dynamically switch their CI pipeline to use On-Demand Instances when necessary. While this approach improved reliability, it presented challenges in managing the switch between queues, persisting the configuration, and allowing retries.

4. Making the pipeline Spot-friendly: Rippling reconsidered their approach and focused on extending their Spot availability pool by accepting a broader range of instance types. They also optimized their pipeline to handle Spot interruptions more gracefully, preventing unnecessary test retries and improving fault tolerance.

5. Using price-optimized instances: Finally, Rippling used AWS's lowest-price strategy for Spot Instances, ensuring they always obtained the most cost-effective instances while accommodating potential interruptions. They could support this allocation strategy by forking and modifying Buildkite's open-source templates.
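Steps 3 and 4 above boil down to two small decisions: which queue a job should target, and whether a failed job deserves a retry. The following is a hypothetical illustration, not Rippling's implementation. The queue names `ci-spot` and `ci-on-demand`, the `spot_healthy` flag, and both function names are assumptions made for the example. The one grounded detail is that Buildkite reports an exit status of -1 when an agent is lost, which is what a job sees when its Spot Instance is reclaimed.

```python
# Buildkite's exit status for a job whose agent was lost mid-run.
AGENT_LOST_EXIT_STATUS = -1

def choose_queue(spot_healthy: bool) -> str:
    """Step 3: fall back to On-Demand capacity during a Spot outage."""
    # Both queue names are hypothetical placeholders.
    return "ci-spot" if spot_healthy else "ci-on-demand"

def should_retry(exit_status: int, attempts_so_far: int, limit: int = 2) -> bool:
    """Step 4: retry only jobs killed by an interruption, not real test failures."""
    return exit_status == AGENT_LOST_EXIT_STATUS and attempts_so_far < limit

print(choose_queue(spot_healthy=False))                 # ci-on-demand
print(should_retry(exit_status=-1, attempts_so_far=0))  # True
print(should_retry(exit_status=1, attempts_so_far=0))   # False
```

In practice, a Buildkite pipeline would usually express the retry rule declaratively through a step's `retry.automatic` settings rather than in custom code. The function only spells out the logic.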

So there you have it—Rippling's multi-step masterclass on migrating to Spot Instances. For more details on each step, see Rippling's blog or register to watch our webinar with them.

Your journey may not look exactly the same, but hopefully their innovative approach and lessons can help you fast-track your own migration.

The results: Massive cost savings and future-proof scaling

All that hard work paid off for Rippling in both cost savings and developer experience:

  • 🏆 60% savings on EC2 compute costs.
  • 💰 50% savings on their total cloud costs.
  • ⚡️ Fast builds making developers happy.
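A quick back-of-the-envelope check ties these numbers to the estimate from step 1. The sketch assumes that savings apply only to the compute portion of the bill (90% of total costs, per Rippling's analysis) and that everything else stays flat. The function name `total_savings` is ours, and the model is deliberately simplistic.

```python
def total_savings(compute_share: float, compute_savings: float) -> float:
    """Fraction of the whole bill saved when only compute costs shrink."""
    return compute_share * compute_savings

# Upper bound from the original estimate: up to 75% off compute.
ceiling = total_savings(compute_share=0.90, compute_savings=0.75)

# What the reported 60% EC2 savings implies under the same assumption.
implied = total_savings(compute_share=0.90, compute_savings=0.60)

print(f"{ceiling:.1%}")  # 67.5%
print(f"{implied:.1%}")  # 54.0%
```

Under these assumptions, 60% savings on compute implies roughly 54% off the total bill, which is in the same ballpark as the reported 50%. The 75% best case would have topped out around 67.5%, so projecting approximately 50% up front was a conservative target.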

A couple of key factors really made the difference. First, investing early in observability and monitoring to track infrastructure costs related to compute and Spot Instance pricing. Then, implementing mechanisms to gracefully handle interruptions and continuously optimize for the best Spot pricing.

But it doesn't stop there. As Rippling keeps growing and more engineers are added to the mix, their codebases will expand even faster. A bigger codebase means a bigger test suite that takes longer to run on CI, leading to higher costs.

By implementing these Spot Instance optimizations, Rippling has future-proofed themselves and ensured even greater savings as their operation continues to scale up.

So, while the numbers are already pretty mind-blowing, Rippling's real win is setting themselves up to keep compounding those cost-saving wins for years to come. This is a great example of how a little innovative thinking can turn a potential challenge into a business advantage.

Conclusion

Rippling's story is a prime example of how Buildkite's flexibility opens up impactful cost-saving and scalability opportunities—especially when you get creative with innovative technologies like AWS Spot Instances. By taking a thoughtful and iterative approach, Rippling shows how to reap the benefits of Spot Instances without sacrificing performance or reliability.

At Buildkite, we're stoked to play a part in success stories like these. Whether you're already riding the Spot Instance wave or just starting to explore the possibilities, stories like Rippling's show how a little ingenuity and the right platform can take your CI/CD game to a whole new level. The future is bright for teams willing to get inventive!
