Pause and resume an agent

You can pause an agent to prevent jobs from the cluster's pipelines from being dispatched to that agent. This is similar to pausing and resuming queues, but applies to an individual agent rather than to a whole queue.

Pausing an agent is a useful alternative to stopping an agent, especially when resources are tied to the lifetime of the agent, such as a cloud instance configured to terminate when the agent exits. By pausing an agent, you can investigate problems in its environment more easily, without the worry of jobs being dispatched to it. Pausing is also useful when performing maintenance on an agent's environment, where idleness would be preferred, especially for maintenance operations that would affect the reliability or speed of jobs if they ran at the same time. Some examples of maintenance operations that could benefit from pausing an agent include:

  • pruning Docker caches
  • emptying temporary directories
  • updating code mirrors
  • installing software updates
  • compacting or vacuuming databases

Pause timeouts

A paused agent continues to consume resources even while it is not running any jobs. Since it could be undesirable to do this indefinitely, each pause has a timeout specified in minutes. The default timeout is 5 minutes.

With Buildkite Agent v3.93 and later, a paused ephemeral agent also remains running after it would normally exit. An ephemeral agent is an agent started with any one of these flags:

  • --acquire-job
  • --disconnect-after-job
  • --disconnect-after-idle-timeout

Pausing an ephemeral agent is useful for preventing ephemeral resources such as EC2 instances or Kubernetes pods from being automatically removed. This allows manually inspecting and diagnosing a failing agent's environment. An ephemeral agent that is paused but otherwise idle will exit once it is resumed.

Paused agents and scaling

The Agent Scaler component of Elastic CI Stack for AWS considers paused agents to be available for jobs, even though they are not. As a result, the stack will not scale up extra instances to maintain capacity merely because an agent becomes paused.

To pause an agent:

  1. Select Agents in the global navigation to access the Clusters page.
  2. Select the cluster with the agent to pause.
  3. On the Queues page, select the queue with the agent to pause.
  4. On the queue's details page, select the agent to pause.
  5. On the agent's details page, select Pause Agent.
  6. Enter a timeout (in minutes) and an optional note, then select Yes, pause this agent.

    Note: Use this note to explain why you're pausing the agent. The note will be displayed on the agent's details page.

Jobs already started by an agent that becomes paused will continue to run. New jobs that target the agent's queue will be dispatched to other agents in the queue, or wait.

To resume an agent:

  1. Select Agents in the global navigation to access the Clusters page.
  2. Select the cluster with the agent to resume.
  3. On the Queues page, select the queue with the agent to resume.
  4. On the queue's details page, select the agent to resume.
  5. On the agent's details page, select Yes, resume this agent.

    Jobs will resume being dispatched to the agent as usual, including any jobs waiting to run.

Using the REST API

To pause an agent (clustered or unclustered) using the REST API, run the following example curl command:

curl -H "Authorization: Bearer ${TOKEN}" \
  -X PUT "https://api.buildkite.com/v2/organizations/{org.slug}/agents/{id}/pause" \
  -H "Content-Type: application/json" \
  -d '{
    "note": "A short note explaining why this agent is being paused",
    "timeout_in_minutes": 60
  }'

where:

  • $TOKEN is an API access token scoped to the relevant Organization and REST API Scopes that your request needs access to in Buildkite.
  • {org.slug} can be obtained:

    • From the end of your Buildkite URL, after accessing Pipelines in the global navigation of your organization in Buildkite.
    • By running the List organizations REST API query to obtain this value from slug in the response. For example:

      curl -H "Authorization: Bearer $TOKEN" \
        -X GET "https://api.buildkite.com/v2/organizations"
      

To resume an agent using the REST API, run the following example curl command:

curl -H "Authorization: Bearer ${TOKEN}" \
  -X PUT "https://api.buildkite.com/v2/organizations/{org.slug}/agents/{id}/resume" \
  -H "Content-Type: application/json" \
  -d '{}'
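
The pause and resume requests above can be combined into a small wrapper for maintenance tasks. The following is a minimal sketch rather than an official Buildkite tool: the function names (agent_api, pause_agent, resume_agent, with_paused_agent) and the ORG_SLUG and AGENT_ID environment variables are assumptions made for illustration, while the endpoints and request bodies are the ones shown in the curl examples above. Since jobs already started by the agent keep running after it is paused, this sketch does not wait for them to finish.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the ORG_SLUG / AGENT_ID variables are
# hypothetical. The endpoints and JSON bodies match the curl examples above.

API_BASE="https://api.buildkite.com/v2"

# Send a PUT request to the pause or resume endpoint of one agent.
# $1 = action ("pause" or "resume"), $2 = JSON request body.
agent_api() {
  curl --fail --silent --show-error \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    -X PUT "${API_BASE}/organizations/${ORG_SLUG}/agents/${AGENT_ID}/$1" \
    -d "$2"
}

# $1 = timeout in minutes, $2 = note.
# The note is inserted as-is, so it must not contain double quotes or backslashes.
pause_agent() {
  agent_api pause "{\"note\": \"$2\", \"timeout_in_minutes\": $1}"
}

resume_agent() {
  agent_api resume '{}'
}

# Pause the agent, run the given command, then resume the agent whether or
# not the command succeeded. Returns the exit status of the command.
# $1 = timeout in minutes, $2 = note, remaining arguments = command to run.
with_paused_agent() {
  local timeout="$1" note="$2" status=0
  shift 2
  pause_agent "$timeout" "$note" || return 1
  "$@" || status=$?
  resume_agent
  return "$status"
}
```

For example, with TOKEN, ORG_SLUG and AGENT_ID set, with_paused_agent 30 "Pruning Docker caches" docker system prune --force would pause the agent for up to 30 minutes, prune the Docker caches, and then resume the agent. Choose a timeout longer than the expected maintenance time, since the pause otherwise ends before the work does.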

Using the GraphQL API

To pause an agent (clustered or unclustered) using the GraphQL API, run the following example mutation:

mutation {
  agentPause(
    input: {
      id: "The GraphQL ID for the agent to pause"
      note: "Note explaining why the agent is being paused"
      timeoutInMinutes: 60
    }
  ) {
    agent {
      uuid
      paused
      pausedAt
      pausedBy { uuid }
      pausedNote
    }
  }
}

where the GraphQL ID for an agent can be found from an agents GraphQL query:

query {
  organization(slug: "Your_org_slug") {
    agents(first: 10) {
      edges {
        node {
          id
        }
      }
    }
  }
}

To resume an agent using the GraphQL API, run the following example mutation:

mutation {
  agentResume(
    input: {
      id: "The GraphQL ID for the agent to resume"
    }
  ) {
    agent {
      uuid
      paused
    }
  }
}
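
The queries and mutations above can also be sent over HTTP with curl. The following is a hedged sketch of one way to do that: it assumes the Buildkite GraphQL endpoint is https://graphql.buildkite.com/v1 and that $TOKEN is an API access token with GraphQL access enabled. The helper names graphql_payload and graphql_request are hypothetical. The sketch wraps a GraphQL document in the JSON body {"query": "..."} and escapes the double quotes that mutation arguments contain.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical and the endpoint is an assumption.

GRAPHQL_URL="https://graphql.buildkite.com/v1"

# Wrap a GraphQL document in a JSON body of the form {"query": "..."}.
# Newlines, carriage returns and tabs become spaces, then backslashes and
# double quotes are escaped so the document is a valid JSON string.
# Documents containing "#" comments are not supported, because collapsing
# newlines would extend a comment to the end of the document.
graphql_payload() {
  escaped=$(printf '%s' "$1" \
    | tr '\n\r\t' '   ' \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"query": "%s"}' "$escaped"
}

# POST a GraphQL document ($1) to the endpoint and print the JSON response.
graphql_request() {
  curl --fail --silent --show-error \
    -H "Authorization: Bearer ${TOKEN}" \
    -H "Content-Type: application/json" \
    -X POST "$GRAPHQL_URL" \
    -d "$(graphql_payload "$1")"
}

# Example call (makes a network request, so it is left commented out):
# graphql_request 'mutation { agentResume(input: { id: "The GraphQL ID for the agent to resume" }) { agent { uuid paused } } }'
```

Building the body with a JSON-aware tool would be more robust than escaping with sed. The sed approach is used here only to keep the sketch free of extra dependencies.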