Oliver Koo is a Senior Mobile Engineer at Pinterest. At UnblockConf '21, he shared the story of his team’s evolution from on-premises physical hardware to a modern cloud-hosted architecture on AWS EC2 Mac. He took us on a whirlwind tour of Pinterest’s iOS CI infrastructure and showed how it combines AWS services, including RDS and CloudWatch, with Packer, Bazel, Terraform, and Buildkite, all working together.
- 450M+ Global Monthly Active Users
- 300B Pins saved
- 2500 Employees
"We are supporting a lot of engineers and a lot of builds per day!"Oliver Koo
The catalyst for change
There is pain that comes with maintaining physical machines on premises, especially for teams maintaining iOS and mobile CI/CD infrastructure. Procuring hardware and spending countless hours manually configuring and upgrading machines one by one is “keep the lights on” work that takes people away from more impactful efforts.
For Pinterest, maintaining these systems became increasingly cumbersome as the workload grew and the engineering team scaled. The physical boxes needed frequent rebooting, which caused service disruptions and made the CI service slow and unreliable for software engineers who just wanted to ship their software updates frequently. As a result, output was limited and, ultimately, quality suffered for end users.
“Maintaining our systems felt really cumbersome, we wanted the team to thrive, and to be able to focus on the things that they actually wanted to be doing.” - Oliver Koo
To improve both the developer experience for software engineers and the quality of the app, Pinterest decided to modernize its iOS CI/CD infrastructure and move to AWS EC2 Mac instances.
A new mobile CI architecture
Pinterest’s new and far more flexible mobile CI architecture was only made possible by this migration to cloud services.
Each queue is attached to its own autoscaling group, and each group can scale to meet demand (that is, queue size). Each autoscaling group is paired with a specific version of a launch template. The launch template specifies which VPC and subnet to launch the cluster’s instances into and which Amazon Machine Image (AMI) to use (AWS’s AMI for EC2 Mac provides the team with all the information needed to launch an EC2 Mac instance). Being able to select a different AMI in the launch template lets the team easily roll out or roll back deployments on their CI clusters.
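To make the relationship between queues, launch template versions, and AMIs more concrete, here is a minimal sketch in plain Python. It is a toy model rather than Pinterest’s Terraform or any AWS API: the class names, queue name, AMI IDs, and subnet ID are all hypothetical, and the only idea it illustrates is that a rollback amounts to pointing the group back at an earlier template version.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class LaunchTemplateVersion:
    """One immutable launch template version (all values are hypothetical)."""
    version: int
    ami_id: str
    subnet_id: str


@dataclass
class BuildCluster:
    """A build queue paired with an autoscaling group and a template version."""
    queue: str
    versions: Dict[int, LaunchTemplateVersion] = field(default_factory=dict)
    active_version: Optional[int] = None

    def publish(self, ami_id: str, subnet_id: str) -> int:
        """Add a template version that points at a new AMI and activate it."""
        number = max(self.versions, default=0) + 1
        self.versions[number] = LaunchTemplateVersion(number, ami_id, subnet_id)
        self.active_version = number
        return number

    def roll_back(self, to_version: int) -> None:
        """Re-activate an earlier version; new instances will use its AMI."""
        if to_version not in self.versions:
            raise KeyError(f"unknown launch template version {to_version}")
        self.active_version = to_version

    def ami_for_new_instances(self) -> str:
        """Return the AMI that newly launched instances would boot from."""
        if self.active_version is None:
            raise RuntimeError("no launch template version published yet")
        return self.versions[self.active_version].ami_id


# Roll out a new image, then roll back to the previous one.
ios = BuildCluster(queue="ios-build")
ios.publish(ami_id="ami-old", subnet_id="subnet-1")
ios.publish(ami_id="ami-new", subnet_id="subnet-1")
ios.roll_back(1)
```

Because older versions are kept rather than overwritten, rolling back never requires rebuilding an image.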
Pinterest’s EC2 Mac AMI can be broken down into two layers: a base layer and a customized layer. The base layer includes macOS (provided by Amazon’s macOS AMI) along with Xcode. Sitting atop the base layer is the customized layer, where tools and libraries are installed, and secrets and agents are configured. Together, the two layers make up the machine image. Keeping them separate allows minor tooling updates and config changes to be made without having to reinstall the OS or, more importantly, Xcode (a notoriously lengthy process).
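The benefit of the two-layer split can be sketched as a small decision function: given what changed, which layers need rebuilding? This is an illustration under assumptions, not the team’s actual Packer pipeline. The change categories and the `layers_to_rebuild` helper are hypothetical names chosen to mirror the description above.

```python
from enum import Enum
from typing import Iterable, List


class Layer(Enum):
    BASE = "base"          # macOS plus Xcode
    CUSTOM = "customized"  # tools, libraries, secrets, agent config


# Which layer owns each kind of change (category names are illustrative).
CHANGE_OWNERS = {
    "macos": Layer.BASE,
    "xcode": Layer.BASE,
    "tools": Layer.CUSTOM,
    "libraries": Layer.CUSTOM,
    "secrets": Layer.CUSTOM,
    "agent_config": Layer.CUSTOM,
}


def layers_to_rebuild(changes: Iterable[str]) -> List[Layer]:
    """Return the layers that must be rebuilt, in build order."""
    touched = {CHANGE_OWNERS[change] for change in changes}
    if Layer.BASE in touched:
        # The customized layer sits on top of the base, so both are rebuilt.
        return [Layer.BASE, Layer.CUSTOM]
    if Layer.CUSTOM in touched:
        # Only the cheap top layer is rebuilt; macOS and Xcode are reused.
        return [Layer.CUSTOM]
    return []
```

A tooling bump such as `layers_to_rebuild(["tools"])` only touches the customized layer, which is what keeps everyday updates from triggering an Xcode reinstall.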
Adaptable CI infrastructure that scales with demand
The payoff has been significant. Besides being able to manage and architect their infrastructure as code, the team can now scale build clusters to meet build demand using the Buildkite Agent Scaler.
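As a rough illustration of scaling to queue size, the sketch below computes a desired instance count from the number of jobs in a queue. It is a simplified stand-in and not the Buildkite Agent Scaler’s actual algorithm: the function name, its parameters, and the assumption of a fixed number of agents per instance are all hypothetical.

```python
import math


def desired_instances(
    waiting_jobs: int,
    running_jobs: int,
    agents_per_instance: int,
    min_size: int,
    max_size: int,
) -> int:
    """Return how many instances a queue's autoscaling group should run.

    Enough instances are requested to cover every waiting and running job,
    then the result is clamped to the group's configured size limits.
    """
    if agents_per_instance < 1:
        raise ValueError("agents_per_instance must be at least 1")
    needed = math.ceil((waiting_jobs + running_jobs) / agents_per_instance)
    return max(min_size, min(max_size, needed))
```

With one agent per instance, seven waiting jobs and two running jobs would call for nine instances, provided that fits within the group’s limits. An idle queue falls back to `min_size`.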
Build machine upgrades are faster and more reliable with Amazon Machine Images (AMIs). Machine specs are defined in the team’s AMI, which is created by a dedicated AMI creation pipeline (utilising Packer and Bazel) that packages the AMI and then automatically publishes it to the AWS AMI registry.
Customizing access privileges per cluster with AWS Identity and Access Management (IAM) sets granular access boundaries and considerably limits the scope of potential security vulnerabilities. Each cluster and build queue has access only to what it needs.
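One way to picture per-cluster access boundaries is a policy document generated for each cluster that only allows reading that cluster’s own secrets. The sketch below builds such a document as a Python dictionary using the standard IAM policy JSON layout. The `ci/<cluster>/` secret naming scheme, the region, the account ID, and the cluster name are assumptions made for illustration; the talk does not describe Pinterest’s actual policies.

```python
import json


def cluster_secrets_policy(region: str, account_id: str, cluster: str) -> dict:
    """Build an IAM policy that lets one cluster read only its own secrets.

    Assumes (hypothetically) that secrets are named "ci/<cluster>/<name>".
    """
    resource = (
        f"arn:aws:secretsmanager:{region}:{account_id}:secret:ci/{cluster}/*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadOwnClusterSecrets",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": resource,
            }
        ],
    }


# Placeholder region, account ID, and cluster name.
policy = cluster_secrets_policy("us-east-1", "123456789012", "ios-build")
policy_json = json.dumps(policy, indent=2)
```

Since the resource pattern is scoped to a single cluster’s prefix, a compromised build machine in one queue could not read another queue’s credentials.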
Moving to AWS for EC2 Mac also meant the team was able to leverage AWS services to make life much easier:
- Secrets Manager to securely store, retrieve and rotate credentials as needed.
- Elastic Block Store (EBS) to dynamically adjust instance storage per cluster.
- Relational Database Service (RDS) to easily create databases and backups to store build metrics.
- CloudWatch along with the Buildkite Agent Metrics Lambda to store build-related metrics.
When comparing the pipelines now run by EC2 Mac Autoscaling Clusters to those run by on-premises machines, the results are impressive:
- 18.4% improvement in speed
- 80.5% fewer CI-related build failures
- 43% reduction in upgrade times
It’s easier to install and upgrade libraries and new versions of Xcode. In the end, that means happier engineers and end users: a win-win.
Check out Oliver’s talk for a first-hand look at the Pinterest Mobile team’s infrastructure architecture, a practical demo of how they’re utilising Packer and Bazel in their AMI registry, and a walkthrough of how they’re using Terraform to manage their AWS EC2 cluster autoscaling groups.