Terraform setup for the Elastic CI Stack for GCP

This guide helps you get started with the Elastic CI Stack for GCP using Terraform.

Elastic CI Stack for GCP allows you to launch a private, autoscaling Buildkite Agent cluster in your own GCP project.

Before you start

Before deploying the Elastic CI Stack for GCP, review the prerequisites, required skills, and billable services to ensure you have the necessary tools, knowledge, and budget planning in place.

Prerequisites

Billable services

The Elastic CI Stack for GCP template deploys several billable GCP services. None require upfront payment; each is billed pay-as-you-go, in proportion to usage.

| Service name | Purpose | Required |
| --- | --- | --- |
| Compute Engine | Deployment of VM instances | ☑️ |
| Persistent Disk | Root disk storage of VM instances | ☑️ |
| Cloud Functions | Publishing queue metrics for autoscaling | ☑️ |
| Secret Manager | Storing the Buildkite agent token (recommended) | ☑️ |
| Cloud Logging | Logs for instances and Cloud Function | ☑️ |
| Cloud Monitoring | Metrics for autoscaling | ☑️ |
| Cloud NAT | Outbound internet access for instances | ☑️ |
| Cloud Storage | Build artifacts storage (if enabled) | |

Buildkite services are billed according to your plan.

What's on each machine?

For more details on what versions are installed, see the corresponding Packer templates.

The Buildkite agent runs as user buildkite-agent.

Supported builds

This stack is designed to run your builds in a shared-nothing pattern, similar to the twelve-factor app principles:

  • Each project should encapsulate its dependencies through Docker and Docker Compose.
  • Build pipeline steps should assume no state on the machine (and instead rely on the build meta-data, build artifacts, or Cloud Storage).
  • Secrets are configured as environment variables exposed through Secret Manager.

By following these conventions, you get a scalable, repeatable, and source-controlled CI environment that any team within your organization can use.
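As a sketch of the stateless convention above, a step can pass data to later steps with the agent's built-in meta-data and artifact commands instead of relying on local disk state (the key name and file paths here are illustrative; these commands only work inside a running Buildkite job):

```shell
# Persist a value for later steps instead of writing it to the local disk
buildkite-agent meta-data set "release-version" "1.2.3"

# Upload build outputs as artifacts rather than leaving them on the instance
buildkite-agent artifact upload "build/output.tar.gz"

# In a later step (possibly on a different instance), read them back
buildkite-agent meta-data get "release-version"
buildkite-agent artifact download "build/output.tar.gz" .
```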

Custom images

Custom images let teams bake all required tools and configuration into the image before instances launch. Without a custom image, agents that restart revert to the base image state, losing any manual changes made at runtime.

Requirements

To use the provided Packer templates, you need the following installed on your system:

  • Docker
  • Make
  • gcloud CLI

The following GCP IAM permissions are required for building custom images using the provided Packer templates:

{
  "title": "Packer Image Builder",
  "description": "Permissions required to build VM images with Packer",
  "includedPermissions": [
    "compute.disks.create",
    "compute.disks.delete",
    "compute.disks.get",
    "compute.disks.use",
    "compute.images.create",
    "compute.images.delete",
    "compute.images.get",
    "compute.images.useReadOnly",
    "compute.instances.create",
    "compute.instances.delete",
    "compute.instances.get",
    "compute.instances.setMetadata",
    "compute.instances.setServiceAccount",
    "compute.machineTypes.get",
    "compute.networks.get",
    "compute.subnetworks.use",
    "compute.subnetworks.useExternalIp",
    "compute.zones.get",
    "iam.serviceAccounts.actAs"
  ]
}
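Assuming the JSON definition above is saved as packer-role.json, you could create the custom role and grant it with gcloud along these lines (the role ID, project ID, and service account are placeholders):

```shell
# Create the custom role from the JSON definition (hypothetical role ID)
gcloud iam roles create packerImageBuilder \
  --project your-gcp-project-id \
  --file packer-role.json

# Grant the role to the account that runs Packer (hypothetical service account)
gcloud projects add-iam-policy-binding your-gcp-project-id \
  --member "serviceAccount:packer@your-gcp-project-id.iam.gserviceaccount.com" \
  --role "projects/your-gcp-project-id/roles/packerImageBuilder"
```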

It is also recommended that you have basic knowledge of:

Creating an image

To create a custom image with Docker support (recommended for production):

cd packer
./build --project-id your-gcp-project-id

This builds a Debian 12-based image with:

  • Pre-installed Buildkite Agent
  • Docker Engine with Compose v2 and Buildx
  • Multi-architecture build support
  • Automated Docker garbage collection
  • Disk space monitoring and self-protection
  • Centralized logging with Ops Agent

For more details, see packer/README.md.
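Once the build finishes, you can verify the image is available (this assumes the image is published to the buildkite-ci-stack image family used later in this guide):

```shell
# Show the most recent image in the family
gcloud compute images describe-from-family buildkite-ci-stack \
  --project your-gcp-project-id
```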

Deploying the stack

This section walks through the deployment process step by step, from obtaining your agent token to initializing and applying your Terraform configuration.

Step 1: Get your Buildkite agent token

Go to the Agents page in the Buildkite Pipelines web interface and click Reveal Agent Token.

The agent token is used to register agents with your Buildkite organization.
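Step 2: Store the agent token in Secret Manager

The Terraform configuration in the next step reads the token from a Secret Manager secret named buildkite-agent-token. A sketch of creating that secret with gcloud (replace the project ID and token placeholder with your own values):

```shell
# Create the secret (one-time)
gcloud secrets create buildkite-agent-token \
  --replication-policy automatic \
  --project your-gcp-project-id

# Add the token value as the latest version
echo -n "YOUR_AGENT_TOKEN" | gcloud secrets versions add buildkite-agent-token \
  --data-file - \
  --project your-gcp-project-id
```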

Step 3: Create your Terraform configuration

Create a new directory for your Terraform configuration:

mkdir buildkite-gcp-stack
cd buildkite-gcp-stack

Create a main.tf file:

terraform {
  required_version = ">= 1.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.0, < 8.0"
    }
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}

module "buildkite_stack" {
  source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"

  # Required
  project_id                   = var.project_id
  buildkite_organization_slug  = var.buildkite_organization_slug
  buildkite_agent_token_secret = "projects/${var.project_id}/secrets/buildkite-agent-token/versions/latest"

  # Stack configuration
  stack_name      = "buildkite"
  buildkite_queue = "default"
  region          = var.region

  # Scaling configuration
  min_size = 0
  max_size = 10

  # Instance configuration
  machine_type = "e2-standard-4"
}

Create a variables.tf file:

variable "project_id" {
  description = "GCP project ID"
  type        = string
}

variable "region" {
  description = "GCP region"
  type        = string
  default     = "us-central1"
}

variable "buildkite_organization_slug" {
  description = "Buildkite organization slug"
  type        = string
}

Create a terraform.tfvars file:

project_id                  = "your-gcp-project-id"
region                      = "us-central1"
buildkite_organization_slug = "your-org-slug"

Create an outputs.tf file (optional):

output "network_name" {
  description = "Name of the VPC network"
  value       = module.buildkite_stack.network_name
}

output "instance_group_name" {
  description = "Name of the managed instance group"
  value       = module.buildkite_stack.instance_group_manager_name
}

output "agent_service_account_email" {
  description = "Email of the agent service account"
  value       = module.buildkite_stack.agent_service_account_email
}

Step 4: Initialize and deploy

  1. Authenticate with GCP:

     gcloud auth application-default login

  2. Initialize Terraform:

     terraform init

  3. Review the planned changes:

     terraform plan

  4. Deploy the stack:

     terraform apply

  5. Type yes when prompted to confirm the deployment.

The module will create:

  • VPC network with Cloud NAT
  • IAM service accounts with appropriate permissions
  • Managed instance group with Buildkite agents
  • Cloud Function for autoscaling metrics
  • Health checks and autoscaling based on queue depth
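After the apply completes, you can check that instances are coming up by reading the instance group name from the outputs defined earlier (a sketch, assuming a regional instance group in us-central1):

```shell
# Read the instance group name from the Terraform outputs
GROUP="$(terraform output -raw instance_group_name)"

# List the instances in the managed instance group
gcloud compute instance-groups managed list-instances "$GROUP" \
  --region us-central1 \
  --project your-gcp-project-id
```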

Advanced configuration

This section covers some of the configurations you might want to use for a deeper customization of your stack.

Using a custom VM image

If you built a custom Packer image with Docker support:

module "buildkite_stack" {
  source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"

  # ... other configuration ...

  # Use custom image family
  image = "buildkite-ci-stack"
}

Configuring agent tags

Target specific agents in your pipeline steps using tags:

module "buildkite_stack" {
  source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"

  # ... other configuration ...

  buildkite_agent_tags = "docker=true,os=linux,environment=production"
}

Then in your pipeline.yml, set the following:

steps:
  - command: echo "hello from production"
    agents:
      queue: "default"
      environment: "production"

For more information, see Buildkite Agent job queues.

Multiple queues

To create multiple agent pools with different configurations, deploy multiple stacks with different queue names:

# Production stack
module "buildkite_stack_production" {
  source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"

  stack_name      = "buildkite-production"
  buildkite_queue = "production"
  machine_type    = "e2-standard-4"
  max_size        = 20

  # ... other configuration ...
}

# Build stack for larger builds
module "buildkite_stack_builds" {
  source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"

  stack_name      = "buildkite-builds"
  buildkite_queue = "builds"
  machine_type    = "n1-standard-8"
  max_size        = 10

  # ... other configuration ...
}

Enabling Cloud Storage access

If your builds need to upload/download artifacts to Cloud Storage:

module "buildkite_stack" {
  source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"

  # ... other configuration ...

  enable_storage_access = true
}
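With storage access enabled, build steps can write to a bucket directly, or the agent can store build artifacts in Cloud Storage by pointing BUILDKITE_ARTIFACT_UPLOAD_DESTINATION at a gs:// path. A sketch (the bucket name is illustrative):

```shell
# Route artifact uploads to a GCS bucket (illustrative bucket name)
export BUILDKITE_ARTIFACT_UPLOAD_DESTINATION="gs://my-buildkite-artifacts/$BUILDKITE_JOB_ID"
buildkite-agent artifact upload "build/*"

# Or copy files directly with the gcloud storage CLI
gcloud storage cp build/output.tar.gz gs://my-buildkite-artifacts/
```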

Using IAP for secure SSH access

Enable Identity-Aware Proxy for secure SSH access without external IPs:

module "buildkite_stack" {
  source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"

  # ... other configuration ...

  enable_iap_access = true
}

Then connect to instances:

gcloud compute ssh INSTANCE_NAME \
  --zone ZONE \
  --tunnel-through-iap \
  --project PROJECT_ID

Restricting SSH access

Restrict SSH access to specific IP ranges:

module "buildkite_stack" {
  source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"

  # ... other configuration ...

  enable_ssh_access  = true
  ssh_source_ranges  = ["203.0.113.0/24"]  # Your office IP range
}

Adding resource labels

Add labels for cost tracking and organization:

module "buildkite_stack" {
  source = "github.com/buildkite/terraform-buildkite-elastic-ci-stack-for-gcp"

  # ... other configuration ...

  labels = {
    team        = "platform"
    environment = "production"
    cost-center = "engineering"
  }
}

Updating the stack

To update your stack configuration:

  1. Modify your Terraform configuration files.
  2. Review the changes:

     terraform plan

  3. Apply the changes:

     terraform apply

Terraform will automatically perform rolling updates to minimize disruption:

  • New instances will be created with the updated configuration
  • Old instances will be drained and terminated
  • The update process respects the max_surge and max_unavailable settings

Destroying the stack

To tear down the entire stack, use:

terraform destroy

Additional information

To gain a better understanding of how Elastic CI Stack for GCP works and how to use it most effectively and securely, check out the following resources: