Controller configuration

This page covers the available commands for:

  • agent-stack-k8s [flags]
  • agent-stack-k8s [command]

All references to "controller" on this page refer to the Agent Stack for Kubernetes controller.

Available commands

Command      Description
completion   Generate the autocompletion script for the specified shell
help         Help about any command
lint         A tool for linting Buildkite pipelines
version      Prints the version

Use agent-stack-k8s [command] --help for more information about a command.

Flags

Each flag is listed with its value type (if applicable), its description, and its default (if any).

--agent-token-secret

  Type: string

The name of the Buildkite agent token secret.

Default: buildkite-agent-token

--allow-pod-spec-patch-unsafe-command-modification

Permits podSpecPatch to modify the command or args fields of stack-provided containers. See the warning in the agent-stack-k8s README before enabling this option.

--cluster-uuid

  Type: string

The UUID of the Buildkite cluster. The agent token must be for the Buildkite cluster.

-f, --config

  Type: string

The config file path.

--debug

Enables debug logging. When set, this overrides --log-level.

--default-image-check-pull-policy

  Type: string

Sets a default PullPolicy for image-check init containers, used if an image pull policy is not set for the corresponding container in a podSpec or podSpecPatch.

--default-image-pull-policy

  Type: string

Sets the default image pull policy for containers that do not specify one, and for non-init containers created by the stack itself.

--default-termination-grace-period-seconds

  Type: integer

The maximum number of seconds a pod will run after being told to terminate, if not otherwise set by a podSpec.

Default: 60

--empty-job-grace-period

  Type: duration

The duration after starting a Kubernetes job that the controller will wait before considering failing the job due to a missing pod (for example, when the podSpec specifies a missing service account).

Default: 30s

--enable-queue-pause

  Type: bool

Allows the controller to pause job processing when the queue is paused in Buildkite.
This flag is only available in version 0.24.0 and later of the controller.

Default: false

-h, --help

Displays help for agent-stack-k8s.

--http-timeout

  Type: duration

Timeout for HTTP requests to the Buildkite agent API.

Default: 60s

--image

  Type: string

The container image to use for the Buildkite agent. Defaults to a version of ghcr.io/buildkite/agent matching the agent-stack-k8s release.

--image-check-container-cpu-limit

  Type: string

Configures the CPU resource limit for all imagecheck-* init containers.

Default: 200m

--image-check-container-memory-limit

  Type: string

Configures the memory resource limit for all imagecheck-* init containers.

Default: 128Mi

--image-pull-backoff-grace-period

  Type: duration

The duration after starting a pod that the controller will wait before considering cancelling a job due to ImagePullBackOff (for example, when the podSpec specifies container images that cannot be pulled).

Default: 30s

--job-active-deadline-seconds

  Type: integer

The maximum number of seconds a Kubernetes job is allowed to run before terminating all pods and failing.

Default: 21600

--job-cancel-checker-poll-interval

  Type: duration

Controls the interval between job state queries while a pod is still Pending.

Default: 5s

--job-creation-concurrency

  Type: integer

The number of concurrent goroutines for converting Buildkite jobs into Kubernetes jobs.

Default: 25

--job-prefix

  Type: string

The prefix to use when creating Kubernetes job names.

Default: buildkite-

--job-ttl

  Type: duration

The time to retain Kubernetes jobs after completion.

Default: 10m0s

--k8s-client-rate-limiter-burst

  Type: integer

The burst value of the K8s client rate limiter.

Default: 20

--k8s-client-rate-limiter-qps

  Type: integer

The QPS value of the K8s client rate limiter.

Default: 10

--log-format

  Type: string

Sets the log format. One of logfmt (plain or colored text) or json.

Default: logfmt

--log-http-payloads

Logs full HTTP request and response payloads. Only active when log level is debug. This may log sensitive information including tokens and secrets.

--log-level

  Type: string

Sets the log level. One of debug, info, warn, or error. Overridden by --debug if set.

Default: info

--max-in-flight

  Type: integer

The maximum number of jobs in flight, where a value of 0 means no maximum.

Default: 25

--namespace

  Type: string

The Kubernetes namespace to create resources in.

Default: default

--no-color

Disables colored log output (ANSI escape codes). Colors are disabled automatically when the output is not a terminal.

--org

  Type: string

The Buildkite organization name to watch.

--pagination-depth-limit

  Type: integer

Sets the maximum number of pages to fetch when retrieving Buildkite jobs to be scheduled. Increasing this value increases both the number of requests made to the Buildkite API and the number of jobs to be scheduled on the Kubernetes cluster.

Default: 2

--pagination-page-size

  Type: integer

Sets the maximum number of jobs per page when retrieving Buildkite jobs to be scheduled.

Default: 1000

--poll-interval

  Type: duration

The time to wait between polling for new jobs (minimum 1s). Note that increasing this causes jobs to be slower to start.

Default: 1s

--profiler-address

  Type: string

The bind address to expose the pprof profiler (for example, localhost:6060).

--prohibit-kubernetes-plugin

Causes the controller to prohibit the Kubernetes plugin specified within jobs (pipeline YAML). Enabling this causes jobs with a Kubernetes plugin to fail, preventing the pipeline YAML from having any influence over the podSpec.

--prometheus-port

  Type: uint16

The bind port to expose Prometheus /metrics. Specifying 0 disables this feature.

--query-reset-interval

  Type: duration

Controls the interval between pagination cursor resets. Increasing this value will increase the number of jobs to be scheduled but also delay picking up any jobs that were missed from the start of the query.

Default: 10s

--queue

  Type: string

The Buildkite queue to poll for jobs. If set, overrides the queue tag.

--skip-image-check-containers

Disables and skips all imagecheck-* init containers.

--tags

  Type: strings

A comma-separated list of agent tags. The "queue" tag must be unique (for example, "queue=kubernetes,os=linux").

Default: [queue=kubernetes]

--work-queue-limit

  Type: integer

Sets the maximum number of jobs the controller will hold in the work queue.

Default: 1000000
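
The flags above can also be supplied through the config file passed with -f/--config instead of on the command line. The following is a minimal sketch, assuming the config file is YAML and that each key is the flag name without its leading dashes. The file name, organization slug, and cluster UUID are placeholders rather than real values.

```yaml
# config.yaml (hypothetical file name), used as: agent-stack-k8s --config config.yaml
# Assumption: keys mirror the flag names listed above, minus the leading "--".
org: my-buildkite-org                                   # placeholder organization slug
cluster-uuid: "00000000-0000-0000-0000-000000000000"    # placeholder cluster UUID
agent-token-secret: buildkite-agent-token               # same as the documented default
namespace: buildkite                                    # create resources here instead of "default"
tags:
  - queue=kubernetes                                    # the queue tag must be unique
  - os=linux
max-in-flight: 50                                       # 0 would mean no maximum
poll-interval: 5s                                       # longer intervals make jobs slower to start
job-ttl: 10m                                            # retain finished Kubernetes jobs for 10 minutes
log-level: info
log-format: json
```

Any flag left out of the file keeps the default listed in its entry above.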

Kubernetes node selection

The Buildkite Agent Stack for Kubernetes controller can be scheduled onto particular Kubernetes nodes using the Kubernetes PodSpec nodeSelector field.

The nodeSelector field can be defined in the controller's configuration:

# values.yml
...
nodeSelector:
  teamowner: "services"
config:
...
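
nodeSelector is a map of node labels, and Kubernetes schedules the pod only onto nodes that carry every label listed. The sketch below combines the well-known kubernetes.io/os and kubernetes.io/arch labels with the custom teamowner label from the example above. It assumes your nodes have already been labeled with teamowner; if no node matches all three labels, the controller pod stays Pending.

```yaml
# values.yml
nodeSelector:
  kubernetes.io/os: "linux"      # well-known label set by the kubelet
  kubernetes.io/arch: "amd64"    # well-known label set by the kubelet
  teamowner: "services"          # custom label; must already exist on the target nodes
```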

Additional environment variables for the controller container

If the Buildkite Agent Stack for Kubernetes controller container requires extra environment variables in order to correctly operate inside your Kubernetes cluster, they can be added to your values YAML file and applied during a deployment with Helm.

The controllerEnv field can be used to define extra Kubernetes EnvVar environment variables that will apply to the Buildkite Agent Stack for Kubernetes controller container:

# values.yml
...
controllerEnv:
  - name: KUBERNETES_SERVICE_HOST
    value: "10.10.10.10"
  - name: KUBERNETES_SERVICE_PORT
    value: "8443"
config:
...
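
Because the entries are Kubernetes EnvVar objects, a value can in principle be drawn from a Secret with valueFrom rather than written inline. The sketch below assumes the chart passes each controllerEnv entry through to the container spec unchanged. The Secret name proxy-settings, its key, and the proxy-related values are hypothetical.

```yaml
# values.yml
controllerEnv:
  - name: HTTPS_PROXY
    valueFrom:
      secretKeyRef:
        name: proxy-settings             # hypothetical Secret in the controller's namespace
        key: https-proxy                 # hypothetical key within that Secret
  - name: NO_PROXY
    value: "10.0.0.0/8,.cluster.local"   # example value only
```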

Custom annotations for the controller

If you need to add custom annotations to the Agent Stack for Kubernetes controller pod, these annotations can be defined in your values YAML file and applied during a deployment with Helm. Note that the controller pod will also have the annotations checksum/config and checksum/secrets to track changes to the configuration and secrets.

The annotations field can be used to define custom annotations that will be applied to the Buildkite Agent Stack for Kubernetes controller pod:

# values.yml
...
annotations:
  kubernetes.io/description: "Agent Stack K8s Controller"
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
config:
...
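
The prometheus.io/port annotation is only useful if the controller actually serves metrics on that port. The sketch below pairs the annotation with the --prometheus-port flag. It assumes that controller flags are set under the config key of the values file, using the flag name without leading dashes, and that your Prometheus setup honors the conventional prometheus.io/* annotations.

```yaml
# values.yml
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"     # must match prometheus-port below
config:
  prometheus-port: 8080          # assumed key form of --prometheus-port; 0 disables metrics
```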

Cleaning up old Buildkite Pipelines jobs

If you are using Kubernetes v1.23 or earlier, you may sometimes find that old jobs are still present in your Kubernetes cluster and are not being cleaned up automatically. These jobs may consume unnecessary space and potentially disrupt deployments.

If you notice old Buildkite Pipelines jobs still present in your Kubernetes cluster, you can use the clean-up-job.yaml script (with usage instructions provided at the top of the file), located in the Agent Stack for Kubernetes repository, to clean up your old Buildkite jobs.