🧪
CI
Public
CI Testing of the vLLM Repo
Queue Paused
[torch.compile] Hide KV cache behind torch.compile boundary (#11677)
Passed in 1h 53m and blocked
bootstrap

Documentation Build
Async Engine, Inputs, Utils, Worker Test
Run Python-only Installation Test
Python-only Installation Test
Basic Correctness Test
Chunked Prefill Test
Run Core Test
Core Test
Entrypoints Test
Run Distributed Tests (4 GPUs)
Distributed Tests (4 GPUs)
Metrics, Tracing Test
Regression Test
Engine Test
V1 Test
Run Examples Test
Examples Test
Prefix Caching Test
Run Samplers Test
Samplers Test
Run LogitsProcessor Test
LogitsProcessor Test
Run Speculative decoding tests
Speculative decoding tests
Run LoRA Test %N
PyTorch Fullgraph Smoke Test
PyTorch Fullgraph Test
Run Tensorizer Test
Tensorizer Test
Run Benchmarks
Benchmarks
Run Quantization Test
Quantization Test
Run LM Eval Small Models
LM Eval Small Models
Encoder Decoder tests
OpenAI-Compatible Tool Use
Basic Models Test
Language Models Test (Standard)
Run Language Models Test (Extended)
Language Models Test (Extended)
Multi-Modal Models Test (Standard)
Run Multi-Modal Models Test (Extended) 1
Multi-Modal Models Test (Extended) 1
Run Multi-Modal Models Test (Extended) 2
Multi-Modal Models Test (Extended) 2
Run Custom Models Test
Custom Models Test
Run Distributed Comm Ops Test
Distributed Comm Ops Test
Run 2 Node Tests (4 GPUs in total)
2 Node Tests (4 GPUs in total)
Distributed Tests (2 GPUs)
Run Plugin Tests (2 GPUs)
Plugin Tests (2 GPUs)
Multi-step Tests (4 GPUs)
Run Pipeline Parallelism Test
Pipeline Parallelism Test
Run LoRA TP Test (Distributed)
LoRA TP Test (Distributed)
Weight Loading Multiple GPU Test
Run Weight Loading Multiple GPU Test - Large Models
Weight Loading Multiple GPU Test - Large Models
Run Distributed Tests (A100)
Distributed Tests (A100)
Run LM Eval Large Models
LM Eval Large Models
Neuron Test
Intel CPU Test
Intel HPU Test
Intel GPU Test
IBM Power (ppc64le) CPU Test
GH200 Test
TPU Test
bootstrap
if [[ -n "" ]]; then VLLM_CI_BRANCH= curl -sSL "https://raw.githubusercontent.com/vllm-project/buildkite-ci//scripts/ci_aws_bootstrap.sh" | bash && exit 0; fi && curl -sSL "https://raw.githubusercontent.com/vllm-project/buildkite-ci/main/scripts/ci_aws_bootstrap.sh" | bash
Waited 44s
Ran in 13s
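The bootstrap one-liner above either fetches the bootstrap script from an overridden `VLLM_CI_BRANCH` or falls back to `main`; the branch value is elided in the captured page, so it is left empty here too. A minimal sketch of that branch-selection logic, using an illustrative helper `pick_bootstrap_url` that is not part of the real CI scripts:

```shell
# Illustrative helper (not in the CI scripts): choose which copy of
# ci_aws_bootstrap.sh to fetch, mirroring the bootstrap step's if/else.
pick_bootstrap_url() {
  local branch="$1"
  if [ -n "$branch" ]; then
    # A branch override points at that branch's copy of the script.
    echo "https://raw.githubusercontent.com/vllm-project/buildkite-ci/${branch}/scripts/ci_aws_bootstrap.sh"
  else
    # Otherwise fall back to the script on main.
    echo "https://raw.githubusercontent.com/vllm-project/buildkite-ci/main/scripts/ci_aws_bootstrap.sh"
  fi
}

pick_bootstrap_url ""
pick_bootstrap_url "some-branch"
```

The real step pipes the fetched script straight into `bash`, which is why the empty-branch case in the page shows a `//scripts/` double slash in the URL.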

Waited 42s
Ran in 30m 9s
Python-only Installation Test
Core Test
Distributed Tests (4 GPUs)
Examples Test
Samplers Test
LogitsProcessor Test
Speculative decoding tests
LoRA Test 1/4
LoRA Test 2/4
LoRA Test 3/4
LoRA Test 4/4
Tensorizer Test
Benchmarks
Quantization Test
LM Eval Small Models
Language Models Test (Extended)
Multi-Modal Models Test (Extended) 1
Multi-Modal Models Test (Extended) 2
Custom Models Test
Distributed Comm Ops Test
2 Node Tests (4 GPUs in total)
./.buildkite/run-multi-node-test.sh /vllm-workspace/tests 2 2 public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:cf5f000d218fbcbc4bf404de8ed9d9607a128c3b "VLLM_TEST_SAME_HOST=0 torchrun --nnodes 2 --nproc-per-node=2 --rdzv_backend=c10d --rdzv_endpoint=192.168.10.10 distributed/test_same_node.py | grep 'Same node test passed' && VLLM_MULTI_NODE=1 pytest -v -s distributed/test_multi_node_assignment.py && VLLM_MULTI_NODE=1 pytest -v -s distributed/test_pipeline_parallel.py" "VLLM_TEST_SAME_HOST=0 torchrun --nnodes 2 --nproc-per-node=2 --rdzv_backend=c10d --rdzv_endpoint=192.168.10.10 distributed/test_same_node.py | grep 'Same node test passed'"
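In the two-node step above, both nodes run the same `torchrun` invocation and agree on a c10d rendezvous endpoint, so 2 nodes x 2 processes form a single 4-GPU job. A minimal sketch that just assembles such a command line (the helper `build_torchrun_cmd` is illustrative, not part of the CI scripts):

```shell
# Illustrative helper (not in the CI scripts): build the torchrun command
# both nodes of a multi-node job would run, given the shared rendezvous
# endpoint. Every node must pass identical --nnodes/--rdzv_endpoint values.
build_torchrun_cmd() {
  local nnodes="$1" nproc="$2" endpoint="$3" script="$4"
  echo "torchrun --nnodes ${nnodes} --nproc-per-node=${nproc}" \
       "--rdzv_backend=c10d --rdzv_endpoint=${endpoint} ${script}"
}

build_torchrun_cmd 2 2 192.168.10.10 distributed/test_same_node.py
```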
Plugin Tests (2 GPUs)
Pipeline Parallelism Test
LoRA TP Test (Distributed)
Weight Loading Multiple GPU Test - Large Models
Distributed Tests (A100)
LM Eval Large Models
AMD:
build image
docker build --build-arg max_jobs=16 --tag rocm/vllm-ci:cf5f000d218fbcbc4bf404de8ed9d9607a128c3b -f Dockerfile.rocm --progress plain . && docker push rocm/vllm-ci:cf5f000d218fbcbc4bf404de8ed9d9607a128c3b

Waited 8s
Ran in 12m 51s
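The image-build step above tags the ROCm image with the exact commit SHA under test, so every CI run maps to one reproducible image. A minimal sketch of that tagging convention (`make_image_tag` is an illustrative helper, not part of the CI scripts):

```shell
# Illustrative helper (not in the CI scripts): derive the CI image tag
# from the repository name and the commit SHA being tested.
make_image_tag() {
  local repo="$1" sha="$2"
  echo "${repo}:${sha}"
}

TAG=$(make_image_tag "rocm/vllm-ci" "cf5f000d218fbcbc4bf404de8ed9d9607a128c3b")
echo "$TAG"
# The real step then runs, roughly:
#   docker build --build-arg max_jobs=16 --tag "$TAG" -f Dockerfile.rocm . && docker push "$TAG"
```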
AMD: Core Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s core"
Waited 3s
Ran in 18m 17s
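Every AMD job above shares the same preamble: probe the GPU with `rocm-smi` but never let a probe failure kill the job, then enable debug logging before running the suite. A minimal sketch of that pattern, wrapped in an illustrative helper `amd_preamble` (not part of the CI scripts):

```shell
# Illustrative helper (not in the CI scripts): the common setup that every
# AMD CI command string above begins with.
amd_preamble() {
  # Report GPUs if rocm-smi exists; '|| true' swallows a missing tool or a
  # failed probe so the job itself keeps running.
  (command rocm-smi || true)
  export VLLM_LOGGING_LEVEL=DEBUG
  export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1
}

amd_preamble
echo "$VLLM_LOGGING_LEVEL"
```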
AMD: Entrypoints Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s entrypoints/llm --ignore=entrypoints/llm/test_lazy_outlines.py --ignore=entrypoints/llm/test_generate.py --ignore=entrypoints/llm/test_generate_multiple_loras.py --ignore=entrypoints/llm/test_guided_generate.py && pytest -v -s entrypoints/llm/test_lazy_outlines.py && pytest -v -s entrypoints/llm/test_generate.py && pytest -v -s entrypoints/llm/test_generate_multiple_loras.py && pytest -v -s entrypoints/llm/test_guided_generate.py && pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_oot_registration.py && pytest -v -s entrypoints/test_chat_utils.py && pytest -v -s entrypoints/offline_mode"
Waited 7s
Ran in 3m 19s
AMD: Regression Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pip install modelscope && pytest -v -s test_regression.py"
Waited 3m 37s
Ran in 5m 34s
AMD: Engine Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s engine test_sequence.py test_config.py test_logger.py && pytest -v -s tokenization"
Waited 3m 58s
Ran in 18m 55s
AMD: Prefix Caching Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s prefix_caching"
Waited 4m 19s
Ran in 16m 35s
AMD: LogitsProcessor Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s test_logits_processor.py && pytest -v -s model_executor/test_guided_processors.py"
Waited 4m 33s
Ran in 4m 20s
AMD: LoRA Test %N
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s lora --shard-id=$BUILDKITE_PARALLEL_JOB --num-shards=$BUILDKITE_PARALLEL_JOB_COUNT --ignore=lora/test_long_context.py --ignore=lora/test_chatglm3_tp.py --ignore=lora/test_llama_tp.py --ignore=lora/test_minicpmv_tp.py"
Waited 4m 56s
Ran in 16m 46s
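The `%N` jobs above shard one pytest suite across parallel Buildkite jobs: Buildkite exposes `BUILDKITE_PARALLEL_JOB` (a 0-based shard id) and `BUILDKITE_PARALLEL_JOB_COUNT`, and the `--shard-id`/`--num-shards` flags assign each collected test round-robin to a shard. A minimal sketch of that assignment rule, using an illustrative helper `shard_of` rather than the plugin itself:

```shell
# Illustrative helper (not the pytest sharding plugin): test i runs on
# shard (i mod num_shards), so work spreads evenly round-robin.
shard_of() {
  local test_index="$1" num_shards="$2"
  echo $(( test_index % num_shards ))
}

# With 4 shards, tests 0..7 land on shards 0 1 2 3 0 1 2 3:
for i in 0 1 2 3 4 5 6 7; do
  printf '%s ' "$(shard_of "$i" 4)"
done
echo
```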
AMD: Kernels Test %N
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s kernels --shard-id=$BUILDKITE_PARALLEL_JOB --num-shards=$BUILDKITE_PARALLEL_JOB_COUNT"
Waited 5m 39s
Ran in 3m 15s
AMD: Tensorizer Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; apt-get update && apt-get install -y curl libsodium23 && export VLLM_WORKER_MULTIPROC_METHOD=spawn && pytest -v -s tensorizer_loader"
Waited 9m 4s
Ran in 11m 11s
AMD: Benchmarks
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/.buildkite ; bash run-benchmarks.sh"
Waited 9m 4s
Ran in 7m 36s
AMD: OpenAI-Compatible Tool Use
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s tool_use"
Waited 9m 21s
Ran in 17m 37s
TPU Test
if [[ -f ".buildkite/run-tpu-test.sh" ]]; then bash .buildkite/run-tpu-test.sh; fi && yes | docker system prune -a
Waited 8s
Ran in 25m 17s
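The TPU step above uses a guard-then-cleanup shape: run the test script only if it exists in the checkout, then always reclaim disk with a Docker prune. A minimal sketch of the guard half, demonstrated on a temp file instead of the real script (`run_if_present` is an illustrative helper, not part of the CI scripts):

```shell
# Illustrative helper (not in the CI scripts): only act when the target
# script actually exists, mirroring the [[ -f ... ]] guard above.
run_if_present() {
  local script="$1"
  if [[ -f "$script" ]]; then
    echo "running $script"
  else
    echo "skipping (no $script)"
  fi
}

tmpdir=$(mktemp -d)
touch "$tmpdir/run-tpu-test.sh"
run_if_present "$tmpdir/run-tpu-test.sh"
run_if_present "$tmpdir/missing.sh"
rm -rf "$tmpdir"
```

The `&& yes | docker system prune -a` tail in the real step runs after a successful (or skipped) test and auto-confirms the prune prompt, keeping the TPU host's disk from filling with stale images.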
Total Job Run Time: 16h 7m