@Franconian_Nomad

Franconian_Nomad@feddit.org · 6 days ago

Sure! How much experience do you have with LLMs?

Franconian_Nomad@feddit.org · edit-2 6 days ago

I have a Radeon RX 7800 XT.

Qwen 3.5-9b is blazingly fast on it. However, while it’s impressive for its size, it has its limitations. Complex tasks with several steps are too much for it.

So now I run the 3.6-35B model with llama.cpp. It’s too big for my VRAM, so I had to split it: everything that doesn’t fit on the graphics card runs in normal system RAM. That slows everything down, but with the right flags I get a bit over 20 tokens/s.
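
To make the split a bit more concrete, here is a rough Python sketch of the arithmetic. It is only an illustration under stated assumptions, not my exact setup: the model size, layer count, reserve and file name are made-up placeholders, and it pretends all layers are the same size. `llama-server` and `--n-gpu-layers` are the llama.cpp binary and option for offloading layers to the GPU; the 16 GiB figure matches the RX 7800 XT. MoE models can also be split in other ways (for example by keeping expert weights in system RAM), which this sketch does not cover.

```python
def layers_that_fit(model_gib, n_layers, vram_gib, reserve_gib=2.0):
    """Estimate how many layers fit in VRAM.

    Assumes every layer is the same size (a simplification) and keeps
    reserve_gib free for the context cache and the desktop (a guess).
    """
    per_layer = model_gib / n_layers
    usable = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, int(usable // per_layer))


def build_command(model_path, n_gpu_layers):
    """Build a llama-server command line that offloads n_gpu_layers to the GPU."""
    return ["llama-server", "-m", model_path, "--n-gpu-layers", str(n_gpu_layers)]


# Hypothetical numbers: a 20 GiB model file with 40 layers on a 16 GiB card.
n = layers_that_fit(model_gib=20.0, n_layers=40, vram_gib=16.0)
print(n)  # 28
print(" ".join(build_command("model.gguf", n)))
```

In practice I would treat the result as a starting point and then nudge the layer count up or down while watching VRAM usage and tokens/s.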

If you have problems with speed and you’re using Ollama, I would replace it with something faster like llama.cpp.

Franconian_Nomad@feddit.org · 7 days ago

I recommend Qwen3.6, either the 27B dense or the 35B MoE model. Both outstanding for local models.

Franconian_Nomad@feddit.org · 7 days ago

They can also make you smarter if you use them right. The key is to use local models and not give the techbros any money.