No GPU support while running llama-cpp-python inside a Docker container

by
Maya Patel

The Problem:

When running LlamaIndex with llama-cpp-python inside a Docker container, the build-time environment variables described in the llama-cpp-python documentation do not take effect. The expected behavior is for llama.cpp to report BLAS = 1, indicating that the GPU build is active; instead it reports BLAS = 0, meaning inference runs on the CPU. Setting the FORCE_CMAKE and CMAKE_ARGS variables in the Dockerfile, and even reinstalling llama-cpp-python inside the running container with those variables set, does not resolve the issue. The system's nvidia-smi output confirms that a GPU is present, yet llama.cpp never uses it.
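
One quick way to check which build is actually installed is to print llama.cpp's system info from inside the container (a minimal sketch; llama_print_system_info is part of llama-cpp-python's low-level bindings, and the exact output format may vary between versions):

    # Inside the container: print llama.cpp's compile-time flags.
    # "BLAS = 1" in the output means the library was built with cuBLAS (GPU) support.
    python3 -c 'import llama_cpp; print(llama_cpp.llama_print_system_info())'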

The Solutions:

Solution 1: Setting Environment Variables Correctly

Enabling GPU support for llama-cpp-python inside a Docker container comes down to setting the right environment variables at build time. This can be done by structuring the Dockerfile as follows (a complete sketch appears after the list):

  1. Base Image: Start from a base image with CUDA support, such as nvidia/cuda, choosing a tag whose CUDA version matches your host driver. The devel variant is needed so that the CUDA compiler is available when llama-cpp-python is built from source.
  2. Environment Variables: Set the CMAKE_ARGS environment variable to "-DLLAMA_CUBLAS=ON" before installing llama-cpp-python, so the package is compiled with CUDA support. Setting FORCE_CMAKE=1 as well ensures the package is rebuilt from source rather than installed from a prebuilt CPU-only wheel.
  3. Python and Dependency Installation: Install Python and the required dependencies within the container, e.g. RUN apt-get update && apt-get install -y python3 python3-pip for Python itself, followed by pip install --no-cache-dir --upgrade pip and pip install --no-cache-dir -r requirements.txt for the Python dependencies.
  4. Running the Server: Finally, specify the command to run the server within the container using CMD ["python3", "./server.py"].

With these modifications, the Dockerfile should correctly set the environment variables and install llama-cpp-python with GPU support enabled, allowing you to utilize GPU resources within the container.
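
Putting the four steps together, a minimal Dockerfile sketch might look like the following (server.py and requirements.txt are assumptions carried over from the steps above; adjust the CUDA tag to match your host driver):

    # Step 1: CUDA-enabled base image (devel variant so nvcc is available for the build).
    FROM nvidia/cuda:11.7.1-devel-ubuntu22.04

    # Step 2: build flags so llama-cpp-python compiles with cuBLAS support.
    ENV CMAKE_ARGS="-DLLAMA_CUBLAS=ON"
    ENV FORCE_CMAKE=1

    # Step 3: install Python, pip, and the application's dependencies.
    RUN apt-get update && apt-get install -y python3 python3-pip
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir --upgrade pip && \
        pip install --no-cache-dir -r requirements.txt

    COPY . .

    # Step 4: run the server.
    CMD ["python3", "./server.py"]

Note that building with cuBLAS only makes the GPU usable by the library; the container itself must also be started with GPU access, e.g. docker run --gpus all <image>, otherwise nvidia-smi and the GPU will be invisible inside the container.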

Q&A

How to enable GPU in llama container?

Set the environment variables CMAKE_ARGS="-DLLAMA_CUBLAS=ON" and FORCE_CMAKE=1 before installing llama-cpp-python, so the package is compiled from source with cuBLAS enabled.
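
If the package was already installed without GPU support, a forced reinstall with the same variables set is a reasonable fix (a sketch following llama-cpp-python's documented source-build workflow):

    # Rebuild llama-cpp-python from source with cuBLAS enabled.
    CMAKE_ARGS="-DLLAMA_CUBLAS=ON" FORCE_CMAKE=1 \
        pip install --force-reinstall --no-cache-dir llama-cpp-python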

Which base image should the Dockerfile use for GPU support?

FROM nvidia/cuda:11.7.1-devel-ubuntu22.04 (see the full Dockerfile sketch in Solution 1 above).

Video Explanation:

The following video, "Run Vicuna-13B On Your Local Computer | Tutorial (GPU)", offers additional, in-depth exploration of the topics discussed in this post.


In this video, I'll show you how to install and interact with the Vicuna-13B model, which is the best free chatbot according to GPT-4.