vLLM
🐎 Performance Benchmark (Public)
Builds — branch: main
Fix erroneous "model doesn't support compile" warning (#16486)
Build #11869 · 16h · Tyler Michael Smith · main @ 70de35a88 · Created yesterday at 4:24 PM
[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU (#12779)
Build #11868 · 17h · Tomasz Zielinski · main @ 34b2cf3b3 · Created yesterday at 2:38 PM
[Bugfix] Fix bugs of running Quark quantized models (#16236)
Build #11865 · 18h · Michael Goin · main @ 9e90c9f73 · Created yesterday at 2:18 PM
[Kernel] support merge_attn_states CUDA kernel, 3x speedup (#16173)
Build #11864 · 19h · Michael Goin · main @ e9528f6dc · Created yesterday at 12:50 PM
Don't install triton on `ppc64le` platform (#16470)
Build #11863 · 22h · Harry Mellor · main @ 51baa9c33 · Created yesterday at 10:11 AM
[Misc] update api_client example (#16459)
Build #11862 · 22h · Reid · main @ 35e076b3a · Created yesterday at 10:05 AM
[Misc] Raise error for V1 not supporting Long LoRA. (#16415)
Build #11861 · 23h · Jee Jee Li · main @ a26f59ccb · Created yesterday at 8:51 AM
Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True (#16447)
Build #11860 · 1d · Michael Goin · main @ aa3b3d76e · Created yesterday at 8:09 AM
[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner (#15990)
Build #11859 · 1d · Jee Jee Li · main @ f7030df3b · Created yesterday at 7:32 AM
Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" (#16453)
Build #11858 · 1d · DefTruth · main @ 905e91e9a · Created yesterday at 6:44 AM
[Bugfix] Don't set an upper bound on repetition penalty (#16403)
Build #11857 · 1d · Alex Brooks · main @ f8f9c0ba6 · Created yesterday at 6:19 AM
[CPU][Bugfix] Fix CPU docker issues (#16454)
Build #11856 · 1d · Li, Jiang · main @ dda811021 · Created yesterday at 6:19 AM
[Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test (#16424)
Build #11855 · 1d · Isotr0py · main @ 93195146e · Created yesterday at 4:57 AM
Update supported_hardware.md for TPU INT8 (#16437)
Build #11854 · 1d · Michael Goin · main @ ed3759954 · Created yesterday at 4:28 AM
[Llama4] Enable attention temperature tuning by default for long context (>32k) (#16439)
Build #11853 · 1d · Yong Hoon Shin · main @ 99ef59cf7 · Created yesterday at 4:26 AM
update benchmark_serving_structured_output to include auto backend (#16438)
Build #11852 · 1d · Chenyaaang · main @ d544d141e · Created yesterday at 4:25 AM
check input length of sonnet samples (#16423)
Build #11851 · 1d · Jee Jee Li · main @ 3e397a948 · Created yesterday at 2:15 AM
Fix range_ratio Bug in RandomDataset (#16126)
Build #11850 · 1d · Roger Wang · main @ 268c32507 · Created Thursday at 10:31 PM
[TPU][V1] Disable per-request seed/Generator (#16172)
Build #11849 · 1d · Nicolò Lucchesi · main @ 3cc9af88f · Created Thursday at 9:05 PM
[Bugfix] Fix output token length check logic (#16419)
Build #11848 · 1d · Roger Wang · main @ 7cd0bd721 · Created Thursday at 8:16 PM