The Problem:
When running LlamaIndex with llama.cpp inside a Docker container, the environment variables described in the llama.cpp documentation do not take effect. The expected behavior is for BLAS to equal 1, indicating that llama.cpp is using the GPU. Instead, BLAS equals 0, meaning inference falls back to the CPU. Numerous attempts to set the environment variables in the Dockerfile, including the FORCE_CMAKE and CMAKE_ARGS build arguments, as well as reinstalling llama-cpp-python inside the running container with those variables set, have not resolved the problem. The output of nvidia-smi confirms that a GPU is present, but llama.cpp does not use it.
The Solutions:
Solution 1: Setting Environment Variables Correctly
The solution to enabling GPU support for llama-cpp-python within a Docker container involves correctly setting the necessary environment variables. This can be achieved by modifying the Dockerfile as follows:
- Base Image: Start with an appropriate base image that includes both Python and CUDA support. In this case, the recommended image is nvidia/cuda, with a tag that matches your CUDA version.
- Environment Variables: Set the CMAKE_ARGS environment variable before installing llama-cpp-python. The value of this variable should be "-DLLAMA_CUBLAS=ON", ensuring that llama-cpp-python is built with CUDA (cuBLAS) support enabled.
- Python and Dependency Installation: Install Python and any required dependencies within the container. This can be done with commands like RUN apt-get update && apt-get install -y python3 python3-pip to install Python, followed by pip install --no-cache-dir --upgrade pip and pip install -r requirements.txt --no-cache-dir to install the Python dependencies.
- Running the Server: Finally, specify the command to run the server within the container using CMD ["python3", "./server.py"].
With these modifications, the Dockerfile should correctly set the environment variables and install llama-cpp-python
with GPU support enabled, allowing you to utilize GPU resources within the container.
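The steps above can be combined into a complete Dockerfile. The following is a minimal sketch: the CUDA 11.7.1 tag, the requirements.txt file, and the server.py entry point are taken from this post as assumptions and may need to be adapted to your project.

```dockerfile
# Base image with both CUDA and Ubuntu userland (tag is an assumption from this post)
FROM nvidia/cuda:11.7.1-devel-ubuntu22.04

# Install Python and pip
RUN apt-get update && apt-get install -y python3 python3-pip

# Make pip build llama-cpp-python from source with cuBLAS enabled,
# so the runtime log reports BLAS = 1
ENV CMAKE_ARGS="-DLLAMA_CUBLAS=ON"
ENV FORCE_CMAKE=1

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade pip && \
    pip install -r requirements.txt --no-cache-dir

COPY . .
CMD ["python3", "./server.py"]
```

Note that the ENV lines must appear before the pip install step: pip compiles llama-cpp-python during installation, and the CMake flags only matter at build time.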
Q&A
How do I enable the GPU in a llama.cpp container?
Set environment variable CMAKE_ARGS="-DLLAMA_CUBLAS=ON" before installing llama-cpp-python.
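If llama-cpp-python was already installed without GPU support, it can be rebuilt in place inside the running container. This is a sketch of the reinstall command, assuming pip and the CUDA toolkit are available in the container:

```shell
# Force a from-source rebuild of llama-cpp-python with cuBLAS enabled
CMAKE_ARGS="-DLLAMA_CUBLAS=ON" FORCE_CMAKE=1 \
    pip install --force-reinstall --no-cache-dir llama-cpp-python
```

After the reinstall, loading a model with verbose logging should report BLAS = 1 instead of BLAS = 0.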
Which base image works for GPU support in the Dockerfile?
FROM nvidia/cuda:11.7.1-devel-ubuntu22.04
Video Explanation:
The following video, titled "Run Vicuna-13B On Your Local Computer | Tutorial (GPU)", provides additional insights and in-depth exploration related to the topics discussed in this post.
In this video, I'll show you how to install and interact with the Vicuna-13B model, which is the best free chat bot according to GPT-4.