🧪

CI

Public

CI Testing of the vLLM Repo

Queue Paused

tpu_queue

Kevin Luu

[torch.compile] Hide KV cache behind torch.compile boundary (#11677)

Passed in 1h 53m and blocked
bootstrap
:docker: build image
Documentation Build
Async Engine, Inputs, Utils, Worker Test
Run Python-only Installation Test
Python-only Installation Test
Basic Correctness Test
Chunked Prefill Test
Run Core Test
Core Test
Entrypoints Test
Run Distributed Tests (4 GPUs)
Distributed Tests (4 GPUs)
Metrics, Tracing Test
Regression Test
Engine Test
V1 Test
Run Examples Test
Examples Test
Prefix Caching Test
Run Samplers Test
Samplers Test
Run LogitsProcessor Test
LogitsProcessor Test
Run Speculative decoding tests
Speculative decoding tests
Run LoRA Test %N
PyTorch Fullgraph Smoke Test
PyTorch Fullgraph Test
Run Tensorizer Test
Tensorizer Test
Run Benchmarks
Benchmarks
Run Quantization Test
Quantization Test
Run LM Eval Small Models
LM Eval Small Models
Encoder Decoder tests
OpenAI-Compatible Tool Use
Basic Models Test
Language Models Test (Standard)
Run Language Models Test (Extended)
Language Models Test (Extended)
Multi-Modal Models Test (Standard)
Run Multi-Modal Models Test (Extended) 1
Multi-Modal Models Test (Extended) 1
Run Multi-Modal Models Test (Extended) 2
Multi-Modal Models Test (Extended) 2
Run Custom Models Test
Custom Models Test
Run Distributed Comm Ops Test
Distributed Comm Ops Test
Run 2 Node Tests (4 GPUs in total)
2 Node Tests (4 GPUs in total)
Distributed Tests (2 GPUs)
Run Plugin Tests (2 GPUs)
Plugin Tests (2 GPUs)
Multi-step Tests (4 GPUs)
Run Pipeline Parallelism Test
Pipeline Parallelism Test
Run LoRA TP Test (Distributed)
LoRA TP Test (Distributed)
Weight Loading Multiple GPU Test
Run Weight Loading Multiple GPU Test - Large Models
Weight Loading Multiple GPU Test - Large Models
Run Distributed Tests (A100)
Distributed Tests (A100)
Run LM Eval Large Models
LM Eval Large Models
Neuron Test
Intel CPU Test
Intel HPU Test
Intel GPU Test
IBM Power(ppc64le) CPU Test
GH200 Test
TPU Test
bootstrap
if [[ -n "" ]]; then VLLM_CI_BRANCH= curl -sSL "https://raw.githubusercontent.com/vllm-project/buildkite-ci//scripts/ci_aws_bootstrap.sh" | bash && exit 0; fi && curl -sSL "https://raw.githubusercontent.com/vllm-project/buildkite-ci/main/scripts/ci_aws_bootstrap.sh" | bash
Waited 44s
·
Ran in 13s
:docker: build image
aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7 && if [[ -z $(docker manifest inspect public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:cf5f000d218fbcbc4bf404de8ed9d9607a128c3b) ]]; then echo "Image not found, proceeding with build..."; else echo "Image found"; exit 0; fi && docker build --build-arg max_jobs=16 --build-arg buildkite_commit=cf5f000d218fbcbc4bf404de8ed9d9607a128c3b --build-arg USE_SCCACHE=1 --tag public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:cf5f000d218fbcbc4bf404de8ed9d9607a128c3b --target test --progress plain . && docker push public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:cf5f000d218fbcbc4bf404de8ed9d9607a128c3b
Waited 42s
·
Ran in 30m 9s
Documentation Build
Waited 41s
·
Ran in 3m 14s
Async Engine, Inputs, Utils, Worker Test
Waited 58s
·
Ran in 53m 53s
Python-only Installation Test
Basic Correctness Test
Waited 59s
·
Ran in 16m 47s
Chunked Prefill Test
Waited 59s
·
Ran in 17m 6s
Core Test
Entrypoints Test
Waited 1m 0s
·
Ran in 1h 21m
Distributed Tests (4 GPUs)
Metrics, Tracing Test
Waited 1m 8s
·
Ran in 12m 37s
Regression Test
Waited 1m 0s
·
Ran in 4m 55s
Engine Test
Waited 1m 0s
·
Ran in 15m 42s
V1 Test
Waited 1m 1s
·
Ran in 8m 6s
Examples Test
Prefix Caching Test
Waited 1m 1s
·
Ran in 10m 55s
Samplers Test
LogitsProcessor Test
Speculative decoding tests
1/4
LoRA Test 1
2/4
LoRA Test 2
3/4
LoRA Test 3
4/4
LoRA Test 4
PyTorch Fullgraph Smoke Test
Waited 1m 2s
·
Ran in 12m 6s
PyTorch Fullgraph Test
Waited 1m 3s
·
Ran in 19m 47s
1/4
Kernels Test 1
Waited 1m 3s
·
Ran in 49m 4s
2/4
Kernels Test 2
Waited 1m 3s
·
Ran in 51m 50s
3/4
Kernels Test 3
Waited 1m 6s
·
Ran in 50m 50s
4/4
Kernels Test 4
Waited 1m 7s
·
Ran in 58m 34s
Tensorizer Test
Benchmarks
Quantization Test
LM Eval Small Models
Encoder Decoder tests
Waited 1m 10s
·
Ran in 7m 54s
OpenAI-Compatible Tool Use
Waited 1m 10s
·
Ran in 36m 30s
Basic Models Test
Waited 1m 11s
·
Ran in 21m 29s
Language Models Test (Standard)
Waited 1m 17s
·
Ran in 30m 7s
Language Models Test (Extended)
Multi-Modal Models Test (Standard)
Waited 1m 20s
·
Ran in 33m 1s
Multi-Modal Models Test (Extended) 1
Multi-Modal Models Test (Extended) 2
Custom Models Test
Distributed Comm Ops Test
2 Node Tests (4 GPUs in total)
./.buildkite/run-multi-node-test.sh /vllm-workspace/tests 2 2 public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:cf5f000d218fbcbc4bf404de8ed9d9607a128c3b "VLLM_TEST_SAME_HOST=0 torchrun --nnodes 2 --nproc-per-node=2 --rdzv_backend=c10d --rdzv_endpoint=192.168.10.10 distributed/test_same_node.py | grep 'Same node test passed' && VLLM_MULTI_NODE=1 pytest -v -s distributed/test_multi_node_assignment.py && VLLM_MULTI_NODE=1 pytest -v -s distributed/test_pipeline_parallel.py" "VLLM_TEST_SAME_HOST=0 torchrun --nnodes 2 --nproc-per-node=2 --rdzv_backend=c10d --rdzv_endpoint=192.168.10.10 distributed/test_same_node.py | grep 'Same node test passed'"
Distributed Tests (2 GPUs)
Waited 1m 16s
·
Ran in 51m 21s
Plugin Tests (2 GPUs)
Multi-step Tests (4 GPUs)
Waited 1m 19s
·
Ran in 26m 1s
Pipeline Parallelism Test
LoRA TP Test (Distributed)
Weight Loading Multiple GPU Test
Waited 1m 29s
·
Ran in 51m 31s
Weight Loading Multiple GPU Test - Large Models
Distributed Tests (A100)
LM Eval Large Models
AMD: :docker: build image
docker build --build-arg max_jobs=16 --tag rocm/vllm-ci:cf5f000d218fbcbc4bf404de8ed9d9607a128c3b -f Dockerfile.rocm --progress plain . && docker push rocm/vllm-ci:cf5f000d218fbcbc4bf404de8ed9d9607a128c3b
Waited 8s
·
Ran in 12m 51s
AMD: Core Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s core"
Waited 3s
·
Ran in 18m 17s
AMD: Entrypoints Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s entrypoints/llm --ignore=entrypoints/llm/test_lazy_outlines.py --ignore=entrypoints/llm/test_generate.py --ignore=entrypoints/llm/test_generate_multiple_loras.py --ignore=entrypoints/llm/test_guided_generate.py && pytest -v -s entrypoints/llm/test_lazy_outlines.py && pytest -v -s entrypoints/llm/test_generate.py && pytest -v -s entrypoints/llm/test_generate_multiple_loras.py && pytest -v -s entrypoints/llm/test_guided_generate.py && pytest -v -s entrypoints/openai --ignore=entrypoints/openai/test_oot_registration.py && pytest -v -s entrypoints/test_chat_utils.py && pytest -v -s entrypoints/offline_mode"
Waited 7s
·
Ran in 3m 19s
AMD: Regression Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pip install modelscope && pytest -v -s test_regression.py"
Waited 3m 37s
·
Ran in 5m 34s
AMD: Engine Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s engine test_sequence.py test_config.py test_logger.py && pytest -v -s tokenization"
Waited 3m 58s
·
Ran in 18m 55s
AMD: Prefix Caching Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s prefix_caching"
Waited 4m 19s
·
Ran in 16m 35s
AMD: LogitsProcessor Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s test_logits_processor.py && pytest -v -s model_executor/test_guided_processors.py"
Waited 4m 33s
·
Ran in 4m 20s
AMD: LoRA Test %N
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s lora --shard-id=$BUILDKITE_PARALLEL_JOB --num-shards=$BUILDKITE_PARALLEL_JOB_COUNT --ignore=lora/test_long_context.py --ignore=lora/test_chatglm3_tp.py --ignore=lora/test_llama_tp.py --ignore=lora/test_minicpmv_tp.py"
Waited 4m 56s
·
Ran in 16m 46s
AMD: Kernels Test %N
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s kernels --shard-id=$BUILDKITE_PARALLEL_JOB --num-shards=$BUILDKITE_PARALLEL_JOB_COUNT"
Waited 5m 39s
·
Ran in 3m 15s
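The LoRA and Kernels jobs above pass `--shard-id=$BUILDKITE_PARALLEL_JOB --num-shards=$BUILDKITE_PARALLEL_JOB_COUNT` to pytest so that each parallel job runs only a slice of the suite. The sketch below illustrates one common way such a split can work: hash each test ID and keep it when the hash modulo the shard count equals the shard index. `select_shard` is a hypothetical helper, the zero-based shard index is an assumption, and this is not necessarily how the pytest plugin used in this pipeline assigns tests.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the vLLM repo): reads test IDs from stdin
# and prints those that belong to shard $1 (assumed zero-based) out of $2 shards.
select_shard() {
  local shard_id=$1 num_shards=$2 test_id hash
  while IFS= read -r test_id; do
    # cksum prints "<crc> <byte count>"; keep only the CRC as the hash value.
    hash=$(printf '%s' "$test_id" | cksum | cut -d' ' -f1)
    if (( hash % num_shards == shard_id )); then
      printf '%s\n' "$test_id"
    fi
  done
}

# Usage: list the test IDs that shard 0 of 4 would run.
printf 'kernels/test_a.py\nkernels/test_b.py\nkernels/test_c.py\n' | select_shard 0 4
```

Because the assignment depends only on the test ID and the shard count, every test lands in exactly one shard, and the parallel jobs need no coordination with each other.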
AMD: Tensorizer Test
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; apt-get update && apt-get install -y curl libsodium23 && export VLLM_WORKER_MULTIPROC_METHOD=spawn && pytest -v -s tensorizer_loader"
Waited 9m 4s
·
Ran in 11m 11s
AMD: Benchmarks
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/.buildkite ; bash run-benchmarks.sh"
Waited 9m 4s
·
Ran in 7m 36s
AMD: OpenAI-Compatible Tool Use
bash .buildkite/run-amd-test.sh "(command rocm-smi || true) && export VLLM_LOGGING_LEVEL=DEBUG && export VLLM_ALLOW_DEPRECATED_BEAM_SEARCH=1 && cd /vllm-workspace/tests ; pytest -v -s tool_use"
Waited 9m 21s
·
Ran in 17m 37s
Neuron Test
bash .buildkite/run-neuron-test.sh
Waited 5s
·
Ran in 3m 6s
Intel CPU Test
bash .buildkite/run-cpu-test.sh
Waited 50m 45s
·
Ran in 28m 29s
Intel HPU Test
bash .buildkite/run-hpu-test.sh
Waited 9s
·
Ran in 1m 2s
Intel GPU Test
bash .buildkite/run-xpu-test.sh
Waited 2s
·
Ran in 1m 46s
IBM Power(ppc64le) CPU Test
bash .buildkite/run-cpu-test-ppc64le.sh
Waited 11s
·
Ran in 4m 24s
GH200 Test
bash .buildkite/run-gh200-test.sh
Waited 10s
·
Ran in 12m 10s
TPU Test
if [[ -f ".buildkite/run-tpu-test.sh" ]]; then bash .buildkite/run-tpu-test.sh; fi && yes | docker system prune -a
Waited 8s
·
Ran in 25m 17s
Total Job Run Time: 16h 7m
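The per-job "Ran in" durations in this listing can be totalled with a small helper. This is a minimal sketch: `to_seconds` and `sum_ran_in` are hypothetical names, and the parser assumes durations use only the `h`, `m`, and `s` units seen above. The listing may not show every job individually, so the sum need not match the reported total.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a duration such as "1h 21m" or "30m 9s" to seconds.
to_seconds() {
  local total=0 part
  for part in $1; do            # unquoted on purpose: split on whitespace
    case "$part" in
      *h) total=$(( total + 10#${part%h} * 3600 )) ;;
      *m) total=$(( total + 10#${part%m} * 60 )) ;;
      *s) total=$(( total + 10#${part%s} )) ;;
    esac
  done
  echo "$total"
}

# Hypothetical helper: read a job listing on stdin and print the sum, in seconds,
# of every line that starts with "Ran in ". Other lines are ignored.
sum_ran_in() {
  local sum=0 line
  while IFS= read -r line; do
    case "$line" in
      "Ran in "*) sum=$(( sum + $(to_seconds "${line#Ran in }") )) ;;
    esac
  done
  echo "$sum"
}

# Usage: total the durations of the first two jobs in the listing.
printf 'Ran in 13s\nRan in 30m 9s\n' | sum_ran_in
```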