According to screenshots posted on X by tipster @wxnod, Corsair has integrated memory chips manufactured by Chinese DRAM maker ChangXin Memory Technologies (CXMT) into its next-generation memory...
Same. I’m running Qwen3.6-35B-A3B-FP8 (Qwen3.6-35B-A3B-UD-IQ4_XS.gguf) via the turboquant fork of llama.cpp with a few tweaked memory settings, and I get about 40 tokens/second. None of it required special insight on my part; I just followed the instructions from a YouTube video and asked Claude to help me through the installation.
AI has no economic moat. There’s nothing stopping anyone from running LLMs locally.
I just switched my setup from LM Studio to llama.cpp with the new Qwen 3.6 27B MTP model, and I am getting 80-112 tokens/second (about 90 on average), which is just shocking to me. I am on a 4090 with a 64k context window. I hardly use cloud AI anymore, since I rarely need more than 64k as long as my first prompt is written like a design document. Follow-up prompts don't work as well, so I often just figure out where my initial prompt went wrong, adjust it, and try again in a fresh session. It's much faster this way too. It has worked out really well for me: I am getting just as much done locally for free as I did when I was paying hundreds of dollars a month for cloud AI. I am still shocked and grateful it turned out this way.
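For a rough sense of why context length and memory settings matter on a 24 GB card like the 4090, here is a minimal sketch of the standard KV-cache size estimate for a model with conventional attention: keys and values are stored for every layer and every token in the context. The layer count, KV-head count, and head dimension below are hypothetical placeholders, not the actual Qwen 3.6 configuration; models that use hybrid or linear-attention layers need less cache than this formula suggests.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem):
    """Estimate KV-cache size for standard attention.

    Two tensors (K and V) per layer, each of shape n_ctx x n_kv_heads x head_dim.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem


# Hypothetical model dimensions, for illustration only (NOT the real Qwen 3.6 config).
N_LAYERS = 48
N_KV_HEADS = 4
HEAD_DIM = 128
N_CTX = 64 * 1024  # a 64k-token context window

if __name__ == "__main__":
    # 2 bytes per element for a 16-bit cache, 1 byte as a rough stand-in for an 8-bit cache.
    for label, bpe in [("16-bit cache", 2), ("~8-bit cache", 1)]:
        gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, bpe) / 2**30
        print(f"{label}: {gib:.1f} GiB")
```

With these placeholder numbers the cache alone takes about 6 GiB at 16 bits per element, which is why quantizing the cache or shrinking the context is a common way to fit a long context alongside the model weights.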
What do you run it on?
https://www.amazon.com/dp/B0BV8H8HVD with Linux Mint installed.