VPC design for the Elastic CI Stack for AWS

Agent orchestration deployments on AWS require a virtual private cloud (VPC) network.

Your VPC needs to provide routable access to the buildkite.com service so that buildkite-agent processes can connect, and retrieve the jobs assigned to them. The options are:

  • a public subnet, with a route table that has a default route pointing to an internet gateway
  • a private subnet, with a route table that has a default route pointing to an NAT device

Auxiliary services used by the agent or your jobs such as S3, ECR, or SSM, can be routed over the public internet, or though a VPC Endpoint.

The AWS VPC quick start provides a template for deploying a 2, 3, or 4 Availability Zone VPC with parameters for whether to create public and private subnets. Once deployed, these subnets can then be provided as parameters to the agent orchestration templates such as the Elastic CI Stack for AWS.

Use your organization's threat model to guide the selection of a solution that balances operational complexity against acceptable risk for your workload.

Public subnets only

The most basic VPC subnet design involves using only public subnets whose route table's default route points to an internet gateway. Under this design your EC2 instances or ECS tasks are provided with a public IPv4 address in order to access the internet directly. You can use security groups to limit traffic and block inbound network connections to your instances.

Using private subnets for added security

For an added layer of defence against unwanted inbound connectivity, you can place your instances in a private subnet. A private subnet provides the greatest level of control when seeking to restrict the inbound and outbound network connections of your agent instances.

A private subnet's route table does not grant direct routable access to or from the internet. Instead, a private subnet's default route is pointed to a NAT instance or a NAT gateway. A NAT device lives in the public subnet, and rewrites the private source IP address of any outbound connections to its own public IP address. NAT devices statefully limit response traffic to known outbound network connections, similar to a security group.

To diagnose agent instance performance and behaviors, it is common to remotely access an interactive prompt. There are a number of options available for remote access to instances in a private subnet, described in the following sections.

AWS Systems Manager Session Manager

Installing the AWS SSM Agent allows you to initiate sessions on private instances without requiring publicly routable SSH, or adding a VPN gateway to your VPC.

Session Manager provides secure and auditable instance management without the need to open inbound ports, maintain bastion hosts, or manage SSH keys. Session Manager also allows you to comply with corporate policies that require controlled access to instances, strict security practices, and fully auditable logs with instance access details, while still providing end users with one-click cross-platform access to your managed instances.

See the AWS Systems Manager Session Manager documentation for more details.

Bastion instance

The bastion or jump host pattern, involves deploying an instance to a public subnet with a publicly routable IP address and a security group that allows external inbound SSH connections. An additional security group is used to restrict SSH access to the private subnet agent instances to the bastion instances.

This limits the public surface area of your VPC, but still requires exposing an unmanaged instance to public traffic. Public facing instances should be patched and updated regularly.

The Linux Bastion Hosts on AWS Quick Start provides a example of this pattern.

VPN

A client VPN can be used to provide hosts outside your VPC with access to your otherwise non-internet routable private subnets. AWS Client VPN provides a managed client-based VPN.

You can control which resources can be accessed by your Client VPN Endpoint using Client VPN authorization.

A VPN can also be combined with bastion instances to provide additional defence in depth, if appropriate or required for your use case.

S3 VPC endpoint

A gateway VPC endpoint can be used to route traffic directly to regional AWS services. Gateway VPC endpoints can also be used to control access to services, for example restricting which S3 buckets your VPC resources are allowed to access.

The AWS VPC Quick Start creates and configures a gateway VPC endpoint for AWS S3. The private subnet route tables are configured forward the endpoint's IP-prefix list to the endpoint, instead of the NAT gateway. In-region S3 access from the private subnets will routed directly over the VPC endpoint, and bypass the NAT gateway. By default, the VPC endpoint has a permissive "Full Access" policy. Should you wish to customize this, or the security group that the endpoint belongs to, create a fork of the CloudFormation template.