vLLM 🐎 Performance Benchmark (Public) — Builds
#11879 · Bugfix for PixtralHF models without spatial_merge_size (#16513) · Michael Goin · main · 87b836ba7 · 43m · Created yesterday at 11:32 PM
#11878 · [Bugfix] clean up duplicated code (#16485) · Isotr0py · main · 56c76c2e0 · 55m · Created yesterday at 11:19 PM
#11877 · Update openai_compatible_server.md (#16507) · Christian Sears · main · c09632a66 · 1h · Created yesterday at 10:55 PM
#11876 · [Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 (#16488) · Yong Hoon Shin · main · a3bf8d4a2 · 1h · Created yesterday at 10:26 PM
#11875 · [Frontend] Added chat templates for LLaMa4 pythonic tool calling (#16463) · Ye (Charlotte) Qi · main · 16eda8c43 · 1h · Created yesterday at 10:26 PM
#11874 · Improve configs - `LoadConfig` (#16422) · Harry Mellor · main · cd77382ac · 3h · Created yesterday at 8:27 PM
#11873 · [Bugfix] handle alignment of encoder_seq_lens in mllama.py (#14784) · Travis Johnson · main · 71b9cde01 · 4h · Created yesterday at 7:59 PM
#11872 · [Doc] Document InternVL3 support (#16495) · Michael Goin · main · 5285589f3 · 4h · Created yesterday at 7:41 PM
#11871 · [Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel (#16366) · Michael Goin · main · f41647ee6 · 6h · Created yesterday at 5:54 PM
#11870 · [TPU][V1] Make `--disable_chunked_mm_input` mandatory for serving MM models (#16483) · Nicolò Lucchesi · main · 4d022cbc7 · 7h · Created yesterday at 5:06 PM
#11869 · Fix erroneous "model doesn't support compile" warning (#16486) · Tyler Michael Smith · main · 70de35a88 · 7h · Created yesterday at 4:24 PM
#11868 · [Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779) · Tomasz Zielinski · main · 34b2cf3b3 · 9h · Created yesterday at 2:38 PM
#11867 · more amd tweaks · Lucas Wilkinson · neuralmagic:lwilkinson/no-pad-fa3 · 38d11aedd · 9h · Created yesterday at 2:37 PM
#11866 · amd fixes · Lucas Wilkinson · neuralmagic:lwilkinson/no-pad-fa3 · 32f0abe8e · 7m · Created yesterday at 2:29 PM
#11865 · [Bugfix] Fix bugs of running Quark quantized models (#16236) · Michael Goin · main · 9e90c9f73 · 9h · Created yesterday at 2:18 PM
#11864 · [Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173) · Michael Goin · main · e9528f6dc · 11h · Created yesterday at 12:50 PM
#11863 · Don't install triton on `ppc64le` platform (#16470) · Harry Mellor · main · 51baa9c33 · 14h · Created yesterday at 10:11 AM
#11862 · [Misc] update api_client example (#16459) · Reid · main · 35e076b3a · 14h · Created yesterday at 10:05 AM
#11861 · [Misc] Raise error for V1 not supporting Long LoRA. (#16415) · Jee Jee Li · main · a26f59ccb · 15h · Created yesterday at 8:51 AM
#11860 · Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447) · Michael Goin · main · aa3b3d76e · 16h · Created yesterday at 8:09 AM