NJannasch.Dev
llama.cpp · CUDA · Build · RTX 5060 Ti

Build llama.cpp with CUDA

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp

cmake -B build \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES=90 \
  -DBUILD_SHARED_LIBS=OFF

cmake --build build -j$(nproc) --target llama-server

Set -DCMAKE_CUDA_ARCHITECTURES to match your GPU:

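  • 120 — Blackwell (RTX 50 series, compute capability 12.0). Needs CUDA Toolkit 12.8 or newer. The 90 in the command above is Hopper's value (H100); a 90 build still runs on RTX 50 cards because the driver JIT-compiles the embedded PTX, but it is not a native Blackwell build.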
  • 89 — Ada Lovelace (RTX 40 series)
  • 86 — Ampere (RTX 30 series)
  • 75 — Turing (RTX 20 series)
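
If you are unsure which value applies to your card, the driver can usually report it. The following is a minimal sketch, assuming your nvidia-smi supports the compute_cap query field (older drivers may not). The helper names cc_to_arch and detect_arch are hypothetical and only serve this example:

```shell
# Convert a compute capability such as "8.9" into the CMake form "89".
cc_to_arch() {
  printf '%s\n' "$1" | tr -d '. '
}

# Print the CMake architecture value for the first GPU.
# Returns non-zero if nvidia-smi is missing or reports nothing.
detect_arch() {
  cc=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
  [ -n "$cc" ] && cc_to_arch "$cc"
}

detect_arch || echo "nvidia-smi unavailable; look up your GPU's compute capability manually"
```

The printed number can be passed directly as -DCMAKE_CUDA_ARCHITECTURES=<value>. On a machine with several different GPUs this sketch only looks at the first one.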

The same commands work for forks — just change the git clone URL:

  • ik_llama.cpp: https://github.com/ikawrakow/ik_llama.cpp.git
  • TurboQuant: https://github.com/TheTom/llama-cpp-turboquant.git (use branch feature/turboquant-kv-cache)
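
For a fork that lives on a non-default branch, git clone --branch checks that branch out directly. The sketch below wraps the clone and build steps from this post in one function. build_fork is a hypothetical helper name, and the default architecture of 90 is only an assumption carried over from the command above; pass your own value as the third argument:

```shell
# build_fork URL [BRANCH] [ARCH]: clone a llama.cpp fork and build llama-server.
# Hypothetical helper; ARCH defaults to 90 to match the command shown earlier.
build_fork() {
  url=$1
  branch=${2:-}
  arch=${3:-90}
  dir=$(basename "$url" .git)   # e.g. llama-cpp-turboquant

  if [ -n "$branch" ]; then
    git clone --branch "$branch" "$url" "$dir" || return 1
  else
    git clone "$url" "$dir" || return 1
  fi

  # Build in a subshell so the caller's working directory is left unchanged.
  (
    cd "$dir" || exit 1
    cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES="$arch" -DBUILD_SHARED_LIBS=OFF &&
    cmake --build build -j"$(nproc)" --target llama-server
  )
}

# Example (not run here):
#   build_fork https://github.com/TheTom/llama-cpp-turboquant.git feature/turboquant-kv-cache
#   build_fork https://github.com/ikawrakow/ik_llama.cpp.git
```

Whether a given fork accepts exactly the same CMake options depends on how far it has diverged from upstream, so treat a configure error as a hint to check that fork's README.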