How to check the number of documents in a vectorstore in LangChain?

by Liam Thompson
chromadb langchain large-language-model openaiembeddings vector-database

Quick Fix: To get the document count from a Chroma vectorstore, use len(vectorstore.get()['documents']). This works on vectorstores created with LangChain's Chroma class, for example via Chroma.from_documents.

The Problem:

Given a vector store created using the Chroma class from the langchain library, how can I determine the number of documents or embeddings stored within it?

The Solutions:

Solution 1: Get Number of Documents in LangChain VectorStore

To check the number of documents or embeddings inside a LangChain Chroma vectorstore, call `len()` on the relevant field of the result of `vectorstore.get()`. This method returns a single dictionary whose keys include 'ids', 'documents', 'metadatas' and 'embeddings'. The 'ids' and 'documents' values are parallel lists with one entry per stored document. The 'embeddings' value is None unless you request it explicitly with `include=['embeddings']`.
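As a rough illustration of the dictionary shape described above, the sketch below uses a hand-written stand-in instead of a real `vectorstore.get()` call. The ids, texts and metadata are made up, and the exact set of keys may vary between chromadb versions.

```python
# Hand-written stand-in for the dictionary that vectorstore.get() returns.
# All values are invented for illustration; real results come from Chroma.
sample_result = {
    "ids": ["doc-1", "doc-2", "doc-3"],
    "documents": ["first chunk", "second chunk", "third chunk"],
    "metadatas": [{"source": "a.txt"}, {"source": "a.txt"}, {"source": "b.txt"}],
    "embeddings": None,  # not populated unless explicitly requested
}

# 'ids' and 'documents' are parallel lists, so either one gives the count.
num_documents = len(sample_result["documents"])
num_ids = len(sample_result["ids"])

print(num_documents, num_ids)
```

Because 'embeddings' is None here, calling `len()` on it directly would raise a TypeError, which is why embeddings have to be requested explicitly before counting them.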

To get the number of documents in the vector store, you can use the following code:

num_documents = len(vectorstore.get()['documents'])

Similarly, to get the number of embeddings in the vector store, request them explicitly, because get() does not return embeddings by default:

num_embeddings = len(vectorstore.get(include=['embeddings'])['embeddings'])

Here’s a complete example:

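# In newer LangChain releases the import may be: from langchain_community.vectorstores import Chroma
from langchain.vectorstores import Chroma

# final_docs (a list of Document objects), embeddings (an embedding model) and
# persist_dir (a directory path) are assumed to be defined earlier.
vectorstore = Chroma.from_documents(documents=final_docs, embedding=embeddings, persist_directory=persist_dir)

num_documents = len(vectorstore.get()['documents'])
# Embeddings are only returned when requested explicitly.
num_embeddings = len(vectorstore.get(include=['embeddings'])['embeddings'])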

print(f"Number of documents: {num_documents}")
print(f"Number of embeddings: {num_embeddings}")

This code will print the number of documents and embeddings in the vector store.
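Counting can be made a little more robust with a small helper that tolerates a missing or None field, which is the case for 'embeddings' whenever they were not requested. This is only a sketch: `count_field` is a hypothetical name, not part of LangChain or chromadb, and it assumes the result is the dictionary returned by `get()`.

```python
from typing import Any, Mapping


def count_field(result: Mapping[str, Any], key: str) -> int:
    """Return the number of entries under `key`, treating a missing or None field as 0."""
    values = result.get(key)
    return 0 if values is None else len(values)


# Made-up result in the shape get() returns when embeddings were not requested.
example = {"ids": ["a", "b"], "documents": ["x", "y"], "embeddings": None}

docs_count = count_field(example, "documents")
embeddings_count = count_field(example, "embeddings")
print(docs_count, embeddings_count)
```

With a real vectorstore, the call would look like count_field(vectorstore.get(include=['embeddings']), 'embeddings').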

Solution 2: Pull the documents and count them

Another way to check the number of documents is to work with the underlying Chroma collection directly (here `collection` is a chromadb collection object): pull the documents and count them. Calling `get()` on the collection retrieves all the documents in a single request. The 'documents' field of the result is a list with one entry per document, so `len()` gives the number of documents in the collection. Note that this loads every document into memory, which can be slow for large collections. Here’s an example:

all_documents = collection.get()['documents']
total_records = len(all_documents)
print("Total records in the collection:", total_records)

This will print the total number of documents in the collection to the console.
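Pulling every document just to count it can be wasteful. chromadb collections also expose a `count()` method, so a possible alternative is to prefer it and only fall back to `get()` when it is unavailable. In the sketch below, `count_records` and `InMemoryCollection` are hypothetical names; the stand-in class exists only so the example runs without a database.

```python
class InMemoryCollection:
    """Hypothetical stand-in for a Chroma collection, so the sketch runs on its own."""

    def __init__(self, documents):
        self._documents = list(documents)

    def count(self):
        return len(self._documents)

    def get(self):
        ids = [str(i) for i in range(len(self._documents))]
        return {"ids": ids, "documents": list(self._documents)}


def count_records(collection) -> int:
    """Prefer the collection's own count(); otherwise count the ids returned by get()."""
    counter = getattr(collection, "count", None)
    if callable(counter):
        return counter()
    return len(collection.get()["ids"])


total_records = count_records(InMemoryCollection(["alpha", "beta", "gamma"]))
print("Total records in the collection:", total_records)
```

With a real chromadb collection, the same function could be called as count_records(collection).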

Q&A

How to get the number of docs inside a vectorstore?

You can get the document count with len(vectorstore.get()['documents']).

How to get the number of embeddings in a vectorstore?

Embeddings are not returned by default. Request them with vectorstore.get(include=['embeddings']) and take len() of the 'embeddings' field.

Video Explanation:

The following video, titled "Loaders, Indexes & Vectorstores in LangChain: Question Answering ...", provides additional insights and in-depth exploration related to the topics discussed in this post.


Full Text Tutorial: https://www.mlexpert.io/prompt-engineering/loaders In this tutorial, we dive deep into the functionalities of ...