Question multiple PDFs using OpenAI, Pinecone, LangChain – Pinecone

by
Ali Hasan
llama-cpp-python openai-api pinecone py-langchain

Quick Fix: Use PyPDFDirectoryLoader in Python to load every PDF in a directory with a single call.

The Solutions:

Solution 1: PyPDFDirectoryLoader

Here is a modified version of your code that uses `PyPDFDirectoryLoader` to load multiple PDFs:

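from langchain.document_loaders import PyPDFDirectoryLoader

# "docs" is the directory that contains your PDF files.
loader = PyPDFDirectoryLoader("docs")
# load() reads the PDFs in that directory and returns a list of Document objects.
data = loader.load()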

The PyPDFDirectoryLoader loads all the PDFs in a specified directory. You can then use the rest of your code to process these PDFs as you did with a single PDF.

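Note that the chunk_size parameter of the RecursiveCharacterTextSplitter sets the maximum size of each chunk, not of the whole collection, so it does not have to grow with the number of PDFs. Loading more PDFs simply produces more chunks. Adjust chunk_size only if the retrieved passages turn out too short or too long for your questions.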
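
The loading and splitting steps can be sketched roughly as follows. This is a minimal sketch, not the original poster's code: it assumes an older LangChain release that still exposes the `langchain.document_loaders` and `langchain.text_splitter` import paths, and it assumes the `pypdf` package is installed. The names `list_pdf_files` and `load_and_split`, and the default chunk values, are hypothetical choices made for illustration.

```python
from pathlib import Path


def list_pdf_files(directory):
    """Return the sorted names of the PDF files directly inside `directory`."""
    return sorted(p.name for p in Path(directory).glob("*.pdf"))


def load_and_split(directory, chunk_size=1000, chunk_overlap=100):
    """Load the PDFs in `directory` and split their pages into chunks."""
    # Imported lazily so this file can be imported without LangChain installed.
    # Assumption: an older LangChain release with these import paths, plus pypdf.
    from langchain.document_loaders import PyPDFDirectoryLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter

    pages = PyPDFDirectoryLoader(directory).load()
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,        # maximum size of each chunk
        chunk_overlap=chunk_overlap,  # text shared between neighbouring chunks
    )
    return splitter.split_documents(pages)
```

Calling `list_pdf_files("docs")` first is a cheap way to confirm the directory holds the files you expect before running `load_and_split("docs")`.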

Q&A

Can you give me a code that can load multiple PDFs at once?

Yes, you can load multiple PDFs with PyPDFDirectoryLoader.

How do you use PyPDFDirectoryLoader?

Pass the directory path to the PyPDFDirectoryLoader constructor, then call load() on the resulting loader. It is the load() call, not the constructor, that reads the PDFs in the directory.
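
After calling load(), it can help to check which files actually contributed pages. As far as I know, LangChain's PyPDF-based loaders return one document per page and record the file path under `metadata["source"]`. The sketch below relies on that assumption and uses stand-in objects with hypothetical file names so it runs without LangChain. The helper name `pages_per_source` is also hypothetical.

```python
from collections import Counter
from types import SimpleNamespace


def pages_per_source(documents):
    """Count how many loaded pages came from each source file."""
    return Counter(doc.metadata["source"] for doc in documents)


# Stand-in objects shaped like loaded documents (hypothetical file names).
sample = [
    SimpleNamespace(page_content="...", metadata={"source": "docs/a.pdf", "page": 0}),
    SimpleNamespace(page_content="...", metadata={"source": "docs/a.pdf", "page": 1}),
    SimpleNamespace(page_content="...", metadata={"source": "docs/b.pdf", "page": 0}),
]

for source, count in sorted(pages_per_source(sample).items()):
    print(source, count)
```

With real data you would pass the list returned by `loader.load()` instead of `sample`.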

How do I ask questions against multiple documents?

To ask questions against multiple documents, first embed the chunks from all of your PDFs with an embedding model (for example, one from OpenAI) and store them in a vector store such as Pinecone. Then use the vector store's similarity_search() method to retrieve the chunks most similar to your query, and pass those chunks together with your question to a language model, which generates the answer.
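
The retrieve-then-ask flow can be illustrated with a toy, dependency-free sketch. Everything here is a stand-in: `embed` is a bag-of-words counter rather than a real embedding model, the `similarity_search` function only mimics what a vector store does, and `build_prompt` shows one possible way to hand the retrieved chunks to a language model. The sample chunks are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similarity_search(chunks, query, k=2):
    """Return the k chunks most similar to the query (toy vector-store lookup)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)
    return ranked[:k]


def build_prompt(chunks, question):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(chunks)
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"


# Invented chunks, as if they came from several different PDFs.
chunks = [
    "Invoices are due within 30 days of receipt.",
    "The warranty covers parts and labor for two years.",
    "Support is available by email on weekdays.",
]
question = "How long is the warranty?"
top = similarity_search(chunks, question, k=1)
prompt = build_prompt(top, question)
print(prompt)
```

In a real pipeline the prompt would be sent to a language model. Because retrieval runs over chunks from every PDF at once, the question is effectively asked against all of the documents.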

Video Explanation:

The following video, titled "GPT-4 Tutorial: How to Chat With Multiple PDF Files (~1000 pages ...", provides additional insights and in-depth exploration related to the topics discussed in this post.

In this video we'll learn how to use OpenAI's new GPT-4 API to 'chat' with and analyze multiple PDF files.