NJannasch.Dev

Snippets

Short, practical code references. Copy-paste ready.

OpenCode, llama.cpp, Qwen, Local-First

OpenCode with Local llama.cpp (Qwen 3.6)

Connect OpenCode to a local llama.cpp server running Qwen 3.6 MTP. Zero API costs, 90K context, local-first AI coding.

llama.cpp, Gemma, MTP, RTX 5060 Ti

Gemma 4 MTP Server (ik_llama.cpp)

Run Gemma 4 26B-A4B with MTP speculative decoding using ik_llama.cpp. Uses a separate drafter model; 133 tokens/s (t/s) on an NVIDIA RTX 5060 Ti 16 GB.

llama.cpp, Qwen, MTP, RTX 5060 Ti

Qwen 3.6 MTP Server (llama.cpp)

Run Qwen 3.6 35B-A3B with MTP speculative decoding on llama.cpp. 144 t/s on an NVIDIA RTX 5060 Ti 16 GB.

llama.cpp, Gemma, RTX 5060 Ti

Gemma 4 256K Context Server (llama.cpp)

llama-server config for Gemma 4 26B-A4B MoE with full 256K context on an NVIDIA RTX 5060 Ti 16 GB. The key: do NOT use --swa-full.

llama.cpp, Qwen, TurboQuant, RTX 5060 Ti

Qwen 3.6 with TurboQuant: 400K Context on 16 GB

llama-server config for Qwen 3.6 35B-A3B with the TurboQuant turbo3 KV cache. 400K context window on an NVIDIA RTX 5060 Ti 16 GB.

llama.cpp, CUDA, Build, RTX 5060 Ti

Build llama.cpp with CUDA

CMake build commands for llama.cpp with CUDA GPU acceleration. Works for mainline, ik_llama.cpp, and TurboQuant forks.