llama.cpp · CUDA · Build · RTX 5060 Ti
Build llama.cpp with CUDA
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build \
  -DGGML_CUDA=ON \
  -DCMAKE_CUDA_ARCHITECTURES=120 \
  -DBUILD_SHARED_LIBS=OFF
cmake --build build -j$(nproc) --target llama-server
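Targeting the RTX 50 series (compute capability 12.0) requires CUDA 12.8 or newer; older nvcc releases reject that architecture. The following is a minimal sketch that checks the installed toolkit before configuring. It assumes `nvcc` is on PATH and prints its usual `release X.Y` line. `version_ge` and `parse_nvcc_release` are hypothetical helpers, not part of llama.cpp.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit status 0 if dotted version $1 >= $2.
version_ge() {
  # sort -V orders version strings numerically; if $2 comes out first
  # (or the two are equal), then $1 is at least $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: read `nvcc --version` output on stdin and print
# the "X.Y" that follows "release", e.g. "12.8".
parse_nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Only run the check when nvcc is actually installed.
if command -v nvcc >/dev/null 2>&1; then
  cuda_ver=$(nvcc --version | parse_nvcc_release)
  if version_ge "$cuda_ver" 12.8; then
    echo "CUDA $cuda_ver: OK for CMAKE_CUDA_ARCHITECTURES=120"
  else
    echo "CUDA $cuda_ver: too old for 120; install CUDA 12.8 or newer" >&2
  fi
fi
```

`sort -V` is GNU coreutils, which matches the Linux setup implied by `nproc`.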
Set -DCMAKE_CUDA_ARCHITECTURES to match your GPU:
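- 120: Blackwell (RTX 50 series, compute capability 12.0; needs CUDA 12.8 or newer). Do not use 90 here: 90 is Hopper (H100). On an RTX 50 card, a 90 build runs only because the driver JIT-compiles its embedded PTX at load time.
- 89: Ada Lovelace (RTX 40 series)
- 86: Ampere (RTX 30 series)
- 75: Turing (RTX 20 series)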
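To look up the value instead of guessing, recent NVIDIA drivers report the compute capability directly with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` (for example `12.0` on an RTX 5060 Ti). Dropping the dot gives the CMake value. The following is a small sketch; `cc_to_cmake_arch` and `detect_cuda_archs` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a compute capability like "12.0" into the
# CMAKE_CUDA_ARCHITECTURES value "120" by removing the dot.
cc_to_cmake_arch() {
  printf '%s\n' "${1//./}"
}

# Hypothetical helper: query every visible GPU (needs a driver that
# supports the compute_cap field), convert each value, de-duplicate,
# and join the results with ";" as CMake expects for a list.
detect_cuda_archs() {
  nvidia-smi --query-gpu=compute_cap --format=csv,noheader \
    | while read -r cc; do cc_to_cmake_arch "$cc"; done \
    | sort -u | paste -sd';' -
}
```

Pass the result quoted, e.g. `-DCMAKE_CUDA_ARCHITECTURES="$(detect_cuda_archs)"`, because a multi-GPU list contains `;`. CMake 3.24 and newer also accept `native`, which detects the installed GPUs at configure time.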
The same commands work for forks. Change the git clone URL, and cd into the directory git creates, which is named after the repository:
- ik_llama.cpp: https://github.com/ikawrakow/ik_llama.cpp.git
- TurboQuant: https://github.com/TheTom/llama-cpp-turboquant.git (use branch feature/turboquant-kv-cache)
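For the TurboQuant fork, the branch can be selected at clone time with `git clone --branch`. The following is a sketch of a wrapper that clones a fork, optionally at a branch, enters the directory git creates, and runs the same configure and build steps. `build_fork` and `repo_dir` are hypothetical helper names, and the architecture value is left as a parameter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the directory git clone creates for a URL,
# e.g. https://github.com/ikawrakow/ik_llama.cpp.git -> ik_llama.cpp
repo_dir() {
  basename "$1" .git
}

# Hypothetical wrapper: build_fork URL ARCH [BRANCH]
build_fork() {
  local url=$1 arch=$2 branch=${3:-}
  local dir
  dir=$(repo_dir "$url")
  if [ -n "$branch" ]; then
    git clone --branch "$branch" "$url" || return 1
  else
    git clone "$url" || return 1
  fi
  # Subshell so the caller's working directory is left unchanged.
  (
    cd "$dir" || exit 1
    cmake -B build \
      -DGGML_CUDA=ON \
      -DCMAKE_CUDA_ARCHITECTURES="$arch" \
      -DBUILD_SHARED_LIBS=OFF &&
    cmake --build build -j"$(nproc)" --target llama-server
  )
}
```

For example, `build_fork https://github.com/TheTom/llama-cpp-turboquant.git 89 feature/turboquant-kv-cache` builds TurboQuant for an RTX 40 series card; substitute your GPU's value for 89.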