It’s only the standard for people who self-host their LLMs and don’t have $500k to throw at hardware for GLM-5.1 or similar models.
I have Qwen3.6:27B on my local hardware and it’s way better than I expected. I’m excited for the rest of the 3.6 line as it comes out, if they can keep up that quality.
This story is also a nothing burger. Generally, yes, Nvidia will suffer once China’s stack catches up (likely soon). By then, whatever bubble we’re in will have normalized one way or another.
In terms of actually deploying this model, the hardware matters less than you’d think: vLLM has backends for most accelerators (NVIDIA, AMD, Intel, TPUs) and even CPUs with wide SIMD support.
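For anyone who hasn’t tried it, a minimal sketch of what deployment looks like with vLLM’s OpenAI-compatible server (the model name and flags here are illustrative, not a specific recommendation; swap in whatever model you’re running):

```shell
# Launch an OpenAI-compatible endpoint for a local model with vLLM.
# Model ID is illustrative; use any HF model your hardware can hold.
vllm serve Qwen/Qwen2.5-32B-Instruct \
    --dtype auto \
    --max-model-len 8192

# Any OpenAI-style client can then talk to http://localhost:8000/v1
```

Same command regardless of which backend vLLM picked for your hardware, which is the point.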
More competition will make everyone happy except Nvidia shareholders.
Gemma4:26B is also worth trying. I find it runs much faster on my hardware.
Edit: Qwen3.6:35B might be the sweet spot. It’s bigger than the 27B but actually lighter to run. TIL the 27B is not a MoE model; it’s dense. The 35B is a MoE model with only 3B active params, so far fewer weights are touched per token.
So far, I think Qwen3.6:35B might be giving me better results than Gemma4:26B. It’s a bit slower than Gemma4:26B, but definitely faster than the dense Qwen3.6:27B.