Terraform setup for the Elastic CI Stack for GCP
This guide helps you to get started with the Elastic CI Stack for GCP using Terraform.
Elastic CI Stack for GCP allows you to launch a private, autoscaling Buildkite Agent cluster in your own GCP project.
Before you start
Before deploying the Elastic CI Stack for GCP, review the prerequisites, required skills, and billable services to ensure you have the necessary tools, knowledge, and budget planning in place.
Prerequisites
- Terraform version >= 1.0
- Buildkite Account
- GCP Account with a project
- gcloud CLI configured
Required and recommended skills
Deploying the Elastic CI Stack for GCP does not require familiarity with the underlying GCP services. However, to run builds, some familiarity with the following GCP services is recommended:
- Google Compute Engine (to select a machine_type appropriate for your workload)
- Google Cloud Storage (for storing build artifacts)
- Secret Manager (for storing the Buildkite Agent token securely)
Elastic CI Stack for GCP provides defaults and pre-configurations suited for most use cases without the need for additional customization. Still, you'll benefit from familiarity with VPCs, Cloud NAT, and firewall rules for custom instance networking.
For post-deployment diagnostic purposes, deeper familiarity with Compute Engine is recommended to be able to access the instances launched to execute Buildkite jobs over SSH or Identity-Aware Proxy.
Billable services
The Elastic CI Stack for GCP template deploys several billable GCP services that do not require upfront payment and operate on a pay-as-you-go principle, with the bill proportional to usage.
| Service name | Purpose | Required |
|---|---|---|
| Compute Engine | Deployment of VM instances | ☑️ |
| Persistent Disk | Root disk storage of VM instances | ☑️ |
| Cloud Functions | Publishing queue metrics for autoscaling | ☑️ |
| Secret Manager | Storing the Buildkite agent token (recommended) | ☑️ |
| Cloud Logging | Logs for instances and Cloud Function | ☑️ |
| Cloud Monitoring | Metrics for autoscaling | ☑️ |
| Cloud NAT | Outbound internet access for instances | ☑️ |
| Cloud Storage | Build artifacts storage (if enabled) | ❌ |
Buildkite services are billed according to your plan.
What's on each machine?
- Debian 12 (Bookworm)
- The Buildkite Agent
- Git
- Docker (when using a custom Packer image)
- Docker Compose v2 (when using a custom Packer image)
- Docker Buildx (when using a custom Packer image)
- gcloud CLI - useful for performing any ops-related tasks
- jq - useful for manipulating JSON responses from CLI tools
For more details on what versions are installed, see the corresponding Packer templates.
The Buildkite agent runs as user buildkite-agent.
Supported builds
This stack is designed to run your builds in a shared-nothing pattern, similar to the 12-factor app principles:
- Each project should encapsulate its dependencies through Docker and Docker Compose.
- Build pipeline steps should assume no state on the machine (and instead rely on the build meta-data, build artifacts, or Cloud Storage).
- Secrets are configured using environment variables exposed using Secret Manager.
By following these conventions, you get a scalable, repeatable, and source-controlled CI environment that any team within your organization can use.
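As an illustration of the secrets convention above, a Buildkite agent environment hook can export a Secret Manager secret as an environment variable before each job. This is a hypothetical sketch: the hook path, the `my-deploy-key` secret name, and the `GCP_PROJECT_ID` variable are all placeholders, not part of the stack's defaults.

```shell
#!/usr/bin/env bash
# Hypothetical agent "environment" hook
# (e.g. /etc/buildkite-agent/hooks/environment).
# Fetches a placeholder secret and exposes it to build steps.
set -euo pipefail

export DEPLOY_KEY="$(gcloud secrets versions access latest \
  --secret=my-deploy-key \
  --project="${GCP_PROJECT_ID}")"
```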
Custom images
Custom images help teams ensure that their agents have all required tools and configurations before instance launch. This prevents instances from reverting to the base image state when agents restart, which would lose any manual changes made at runtime.
Requirements
To use the provided Packer templates, you will need the following installed on your system:
- Docker
- Make
- gcloud CLI
The following GCP IAM permissions are required for building custom images using the provided Packer templates:
{
  "title": "Packer Image Builder",
  "description": "Permissions required to build VM images with Packer",
  "includedPermissions": [
    "compute.disks.create",
    "compute.disks.delete",
    "compute.disks.get",
    "compute.disks.use",
    "compute.images.create",
    "compute.images.delete",
    "compute.images.get",
    "compute.images.useReadOnly",
    "compute.instances.create",
    "compute.instances.delete",
    "compute.instances.get",
    "compute.instances.setMetadata",
    "compute.instances.setServiceAccount",
    "compute.machineTypes.get",
    "compute.networks.get",
    "compute.subnetworks.use",
    "compute.subnetworks.useExternalIp",
    "compute.zones.get",
    "iam.serviceAccounts.actAs"
  ]
}
It is also recommended that you have a base knowledge of:
- Packer
- HashiCorp Configuration Language (HCL)
- Bash scripting
Creating an image
To create a custom image with Docker support (recommended for production):
cd packer
./build --project-id your-gcp-project-id
This builds a Debian 12-based image with:
- Pre-installed Buildkite Agent
- Docker Engine with Compose v2 and Buildx
- Multi-architecture build support
- Automated Docker garbage collection
- Disk space monitoring and self-protection
- Centralized logging with Ops Agent
For more details, see packer/README.md.
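Once the build completes, you can confirm the image was published. This sketch assumes the image is published under a buildkite-ci-stack image family; adjust the filter if your template uses a different family name.

```shell
# List images in the assumed "buildkite-ci-stack" family,
# newest first, to confirm the Packer build succeeded
gcloud compute images list \
  --project=your-gcp-project-id \
  --filter="family=buildkite-ci-stack" \
  --sort-by=~creationTimestamp
```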
Deploying the stack
This section walks through the deployment process step by step, from obtaining your agent token to initializing and applying your Terraform configuration.
Step 1: Get your Buildkite agent token
Go to the Agents page in the Buildkite Pipelines web interface and click Reveal Agent Token.
The agent token is used to register agents with your Buildkite organization.
Step 2: Store the token in Secret Manager (recommended)
For production deployments, store the token in Secret Manager:
echo -n "your-agent-token" | gcloud secrets create buildkite-agent-token \
--data-file=- \
--project=your-project-id
# Verify the secret was created
gcloud secrets describe buildkite-agent-token --project=your-project-id
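The agent instances also need permission to read this secret. The module may configure this for you; if you manage IAM yourself, a binding like the following grants read access. `AGENT_SA_EMAIL` is a placeholder for the agent service account email (available later as the module's agent_service_account_email output).

```shell
# Grant the agent service account read access to the token secret.
# AGENT_SA_EMAIL is a placeholder; substitute the real email.
gcloud secrets add-iam-policy-binding buildkite-agent-token \
  --project=your-project-id \
  --member="serviceAccount:AGENT_SA_EMAIL" \
  --role="roles/secretmanager.secretAccessor"
```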
Step 3: Create your Terraform configuration
Create a new directory for your Terraform configuration:
mkdir buildkite-gcp-stack
cd buildkite-gcp-stack
Create a main.tf file:
terraform {
required_version = ">= 1.0"
required_providers {
google = {
source = "hashicorp/google"
version = ">= 4.0, < 8.0"
}
}
}
provider "google" {
project = var.project_id
region = var.region
}
module "buildkite_stack" {
source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"
# Required
project_id = var.project_id
buildkite_organization_slug = var.buildkite_organization_slug
buildkite_agent_token_secret = "projects/${var.project_id}/secrets/buildkite-agent-token/versions/latest"
# Stack configuration
stack_name = "buildkite"
buildkite_queue = "default"
region = var.region
# Scaling configuration
min_size = 0
max_size = 10
# Instance configuration
machine_type = "e2-standard-4"
}
Create a variables.tf file:
variable "project_id" {
description = "GCP project ID"
type = string
}
variable "region" {
description = "GCP region"
type = string
default = "us-central1"
}
variable "buildkite_organization_slug" {
description = "Buildkite organization slug"
type = string
}
Create a terraform.tfvars file:
project_id = "your-gcp-project-id"
region = "us-central1"
buildkite_organization_slug = "your-org-slug"
Create an outputs.tf file (optional):
output "network_name" {
description = "Name of the VPC network"
value = module.buildkite_stack.network_name
}
output "instance_group_name" {
description = "Name of the managed instance group"
value = module.buildkite_stack.instance_group_manager_name
}
output "agent_service_account_email" {
description = "Email of the agent service account"
value = module.buildkite_stack.agent_service_account_email
}
Step 4: Initialize and deploy
- Authenticate with GCP:
gcloud auth application-default login
- Initialize Terraform:
terraform init
- Review the planned changes:
terraform plan
- Deploy the stack:
terraform apply
- Type yes when prompted to confirm the deployment.
The module will create:
- VPC network with Cloud NAT
- IAM service accounts with appropriate permissions
- Managed instance group with Buildkite agents
- Cloud Function for autoscaling metrics
- Health checks and autoscaling based on queue depth
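After the apply completes, you can sanity-check the deployment from the same directory. The instance-group filter below assumes the default stack_name of "buildkite"; adjust it if you changed that value.

```shell
# Show the Terraform outputs defined in outputs.tf
terraform output

# Confirm the managed instance group exists
# (assumes stack_name = "buildkite")
gcloud compute instance-groups managed list \
  --project=your-gcp-project-id \
  --filter="name~buildkite"
```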
Advanced configuration
This section covers some of the configurations you might want to use for a deeper customization of your stack.
Using a custom VM image
If you built a custom Packer image with Docker support:
module "buildkite_stack" {
source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"
# ... other configuration ...
# Use custom image family
image = "buildkite-ci-stack"
}
Multiple queues
To create multiple agent pools with different configurations, deploy multiple stacks with different queue names:
# Production stack
module "buildkite_stack_production" {
source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"
stack_name = "buildkite-production"
buildkite_queue = "production"
machine_type = "e2-standard-4"
max_size = 20
# ... other configuration ...
}
# Build stack for larger builds
module "buildkite_stack_builds" {
source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"
stack_name = "buildkite-builds"
buildkite_queue = "builds"
machine_type = "n1-standard-8"
max_size = 10
# ... other configuration ...
}
Enabling Cloud Storage access
If your builds need to upload/download artifacts to Cloud Storage:
module "buildkite_stack" {
source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"
# ... other configuration ...
enable_storage_access = true
}
Using IAP for secure SSH access
Enable Identity-Aware Proxy for secure SSH access without external IPs:
module "buildkite_stack" {
source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"
# ... other configuration ...
enable_iap_access = true
}
Then connect to instances:
gcloud compute ssh INSTANCE_NAME \
--zone ZONE \
--tunnel-through-iap \
--project PROJECT_ID
Restricting SSH access
Restrict SSH access to specific IP ranges:
module "buildkite_stack" {
source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"
# ... other configuration ...
enable_ssh_access = true
ssh_source_ranges = ["203.0.113.0/24"] # Your office IP range
}
Adding resource labels
Add labels for cost tracking and organization:
module "buildkite_stack" {
source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"
# ... other configuration ...
labels = {
team = "platform"
environment = "production"
cost-center = "engineering"
}
}
Updating the stack
To update your stack configuration:
- Modify your Terraform configuration files
- Review the changes:
terraform plan
- Apply the changes:
terraform apply
Terraform will automatically perform rolling updates to minimize disruption:
- New instances will be created with the updated configuration
- Old instances will be drained and terminated
- The process of updating the stack respects the max_surge and max_unavailable settings
Destroying the stack
To tear down the entire stack, use:
terraform destroy
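Note that resources created outside Terraform, such as the agent token secret from Step 2, are not removed by terraform destroy. If you no longer need the secret, delete it separately:

```shell
# Remove the agent token secret created manually in Step 2
gcloud secrets delete buildkite-agent-token --project=your-project-id
```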
Additional information
To gain a better understanding of how Elastic CI Stack for GCP works and how to use it most effectively and securely, check out the following resources: