OpenCode with Local llama.cpp (Qwen 3.6)

Point OpenCode at a local llama.cpp server. Works with any OpenAI-compatible endpoint.

Create opencode.json in your project root:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "llamacpp": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama-server",
      "options": {
        "baseURL": "http://<your-server-ip>:11433/v1"
      },
      "models": {
        "home-qwen": {
          "name": "Home Qwen",
          "limit": {
            "context": 90000,
            "output": 90000
          }
        }
      }
    }
  },
  "mcp": {
  }
}

Key points:

baseURL points to your llama-server’s /v1 endpoint (OpenAI-compatible API)
Context limit set to 90K to stay within the MTP + TurboQuant sweet spot on 16 GB VRAM
The mcp block is where you’d add MCP servers (analytics, GitHub, etc.)
Model name is arbitrary — OpenCode sends requests to whatever model the server is running