🐎

Bump CUDA version to 12.4 and update the sglang version

Passed in 6h 35m and blocked

Description

This file contains the download link for the benchmarking results.

Please download the visualization scripts from the post.

Results reproduction

  • Find the Docker image we use in the benchmarking pipeline.
  • Deploy the Docker image, and inside the container:
    • Download nightly-benchmarks.zip.
    • In the same folder, run the following commands:
export HF_TOKEN=<your HF token>
apt update
apt install -y git
unzip nightly-benchmarks.zip
VLLM_SOURCE_CODE_LOC=./ bash .buildkite/nightly-benchmarks/scripts/run-nightly-benchmarks.sh

The results will be written to ./benchmarks/results.
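
Since the full run takes hours, it may be worth checking the inputs before starting. The following is a minimal sketch and not part of the benchmarking pipeline: the helper name `check_prereqs` is hypothetical, and it assumes only what the steps above state, namely that `HF_TOKEN` is exported and that `nightly-benchmarks.zip` is in the current folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the pipeline): verify the inputs that the
# reproduction commands depend on before launching the long benchmark run.
check_prereqs() {
  # Path to the archive; defaults to the file name used in the steps above.
  local zip_file="${1:-nightly-benchmarks.zip}"

  # The reproduction commands export HF_TOKEN, so treat an empty value as an error.
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set" >&2
    return 1
  fi

  # The archive must exist before `unzip` can extract the benchmark scripts.
  if [ ! -f "$zip_file" ]; then
    echo "$zip_file not found" >&2
    return 1
  fi

  echo "prerequisites look fine"
}

check_prereqs || echo "fix the issues above before running the benchmark script" >&2
```

The function returns a non-zero status on the first missing input, so it can also guard the benchmark command, for example with `check_prereqs && unzip nightly-benchmarks.zip`.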

bootstrap
curl -sSL https://raw.githubusercontent.com/vllm-project/buildkite-ci/main/scripts/kickoff-benchmark.sh | bash
Waited 34s
Ran in 12s
Kuntai Du unblocked 🚀 Ready for comparing vllm against alternatives? This will take 4 hours.
A100 vllm latest main
Waited 8s
Ran in 1h 8m
A100 sglang benchmark
Waited 50m 46s
Ran in 1h 9m
A100 lmdeploy benchmark
Waited 2h 2m
Ran in 1h 7m
A100 trt llama-8B
Waited 3h 9m
Ran in 31m 36s
A100 trt llama-70B
Waited 3h 41m
Ran in 56m 58s
Collect the results
Waited 47m 36s
Ran in 20s
Wait for container to be ready
A100
Total Job Run Time: 4h 54m
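
As a rough sanity check, the per-step run times listed above can be added up and compared with the reported total. This is only a sketch: the helper `to_seconds` is hypothetical, and the input values are copied by hand from the step list, where the displayed times appear to be rounded, so the sum is expected to land close to the reported 4h 54m and not match it exactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a duration such as "1h 8m" or "31m 36s" to seconds.
to_seconds() {
  local total=0 part
  # Intentionally unquoted so the duration splits into its "1h", "8m", "36s" parts.
  for part in $1; do
    case "$part" in
      *h) total=$(( total + ${part%h} * 3600 )) ;;
      *m) total=$(( total + ${part%m} * 60 )) ;;
      *s) total=$(( total + ${part%s} )) ;;
    esac
  done
  echo "$total"
}

# "Ran in" values copied from the step list: bootstrap, vllm, sglang,
# lmdeploy, trt llama-8B, trt llama-70B, and result collection.
run_times=("12s" "1h 8m" "1h 9m" "1h 7m" "31m 36s" "56m 58s" "20s")

sum=0
for t in "${run_times[@]}"; do
  sum=$(( sum + $(to_seconds "$t") ))
done

# Prints: Total: 4h 53m 6s
printf 'Total: %dh %dm %ds\n' $(( sum / 3600 )) $(( sum % 3600 / 60 )) $(( sum % 60 ))
```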