Qwen3.6-35B-A3B Q4 via llama.cpp, Run Locally on CPU and RAM Only at 17 t/s - Detailed Analysis
The llama.cpp server running with TurboQuant, serving Qwen3.6-35B-A3B with a 128k context. Many users have reported thinking, reasoning, and tool-calling issues with this model, but if ... Stack MTP and ngram-mod together in mainline ... MTP is Multi-Token Prediction.