[Solved] How do I slim down SBERT's sentence-transformers library? – Python

by Alexei Petrov
huggingface-datasets large-language-model python pytorch sentence-transformers

Quick Fix: To slim down SBERT's sentence-transformers library for CPU-only use, either start from the official PyTorch Docker image and add sentence-transformers, or build your own image with the CPU-only version of PyTorch so that the Nvidia GPU dependencies are never installed.

The Problem:

I am using the sentence-transformers library in Python to compute vector embeddings of text chunks. However, the installation is very large: it takes up over 7 GB of disk space and makes the image slow to build. How can I slim it down to the essential components without compromising functionality?

The Solutions:

Solution 1: Use --no-cache-dir and the CPU-only version of torch

  • Pass --no-cache-dir to pip so that downloaded packages are not cached inside the image, which can save 2.4 GB of space.
  • Install the CPU-only version of torch so that the GPU-specific dependencies are not pulled in. This can reduce the image size to 1.39 GB.

Dockerfile with --no-cache-dir only:

FROM python:3.11.2-slim-bullseye
RUN pip install --upgrade pip && pip install --no-cache-dir sentence-transformers

Dockerfile with --no-cache-dir and CPU-only torch:

FROM python:3.11.2-slim-bullseye
RUN pip install --no-cache-dir --upgrade pip
RUN pip install --no-cache-dir torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
RUN pip install --no-cache-dir sentence-transformers
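
Once an image is built, it can be worth checking that the CPU-only wheel really is the one that ended up installed. The following is a minimal sketch, not part of the original answer: the helper names `describe_torch_build` and `nvidia_packages` are made up for illustration, and the check assumes that Linux wheels from download.pytorch.org carry a `+cpu` or `+cuXYZ` suffix in their version string (wheels from PyPI usually carry none, so the result is then inconclusive).

```python
from importlib import metadata


def describe_torch_build(version):
    """Classify a torch version string by its local-version suffix."""
    # Assumption: wheels from https://download.pytorch.org/whl/cpu look like
    # "2.1.0+cpu", and CUDA wheels from the same site look like "2.1.0+cu121".
    _, _, local = version.partition("+")
    if local == "cpu":
        return "cpu-only"
    if local.startswith("cu"):
        return "cuda"
    # No recognised suffix: the version string alone does not tell us.
    return "unknown"


def nvidia_packages():
    """Return the sorted names of installed distributions starting with 'nvidia-'."""
    names = []
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name") or ""
        except Exception:
            # Skip distributions whose metadata cannot be read.
            continue
        if name.lower().startswith("nvidia-"):
            names.append(name)
    return sorted(names)


if __name__ == "__main__":
    try:
        version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        print("torch is not installed")
    else:
        print(f"torch {version}: {describe_torch_build(version)}")
    print("nvidia-* packages:", nvidia_packages() or "none")
```

Run with the image's own Python interpreter, a CPU-only install should list no nvidia-* packages. The script only reads package metadata, so it does not import torch itself.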

Q&A

What is the source of bloat in the SentenceTransformer library?

The primary cause of bloat is the underlying torch library and the nvidia-* packages that its default build pulls in as dependencies.

How can I minimize bloat on a CPU-only system?

Use the --no-cache-dir flag and install the CPU-only version of torch.

How can I minimize bloat on a GPU system?

I lack the necessary knowledge to provide specific recommendations for reducing bloat on a GPU system.
