How to retrieve source documents via LangChain's get_relevant_documents method only if the answer is from the custom knowledge base – Langchain

by
Alexei Petrov
langchain openai-api py-langchain python

Quick Fix: To return source documents only when the answer actually comes from the knowledge base, switch the chain type to RetrievalQA and wrap it in a tool that an agent can call. The source documents are then available only for the queries where the agent used that tool.

The Problem:

In a chatbot that retrieves knowledge from an external knowledge base called docs, the retriever returns documents for every query, including queries the bot answers without using the knowledge base. The goal is to return relevant documents only when the answer was actually derived from docs, so that unrelated documents are not shown alongside responses that have nothing to do with the knowledge base.

The Solutions:

Solution 1: Utilize RetrievalQA chain type to retrieve source documents selectively based on answer’s origin.

LangChain’s `get_relevant_documents` method returns documents for any query, whether or not the final answer relies on them. This solution works around that by introducing an agent and a tool, so that source documents are retrieved only when the answer is derived from the custom knowledge base.

  1. Utilizing RetrievalQA Chain Type: We switch from ConversationalRetrievalChain to RetrievalQA chain type, which offers more flexibility in controlling the retrieval process.
  2. Agent and Tool Setup: We define a tool named `doc_search_tool` that encapsulates the RetrievalQA chain and includes a description of its purpose. Then, we initialize an agent that utilizes this tool along with an OpenAI language model and a conversation buffer memory.
  3. Query Execution: We execute two queries. The first query pertains to a topic covered in the custom knowledge base, while the second is a general question not related to the knowledge base.
  4. Inspecting Results: After executing the queries, we examine the results. This requires the agent to be created with `return_intermediate_steps=True` and the RetrievalQA chain with `return_source_documents=True`. If the result contains a non-empty `"intermediate_steps"` key, the agent called the tool, so the answer was derived from the custom knowledge base, and the source documents are available through `result["intermediate_steps"][0][1]["source_documents"]`. If `"intermediate_steps"` is empty, the agent answered without accessing the knowledge base.

This solution effectively addresses the initial problem by allowing selective retrieval of source documents based on the origin of the answer.
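
The four steps above can be sketched roughly as follows. This assumes the legacy LangChain API (`RetrievalQA.from_chain_type`, `Tool`, `initialize_agent`), a chat model passed in as `llm`, and a `retriever` built elsewhere. The tool description, the `memory_key` value, and the helper name `sources_from_agent_result` are illustrative choices, not taken from the original answer.

```python
from typing import Any, Dict, List


def build_agent(llm: Any, retriever: Any) -> Any:
    """Wrap a RetrievalQA chain in a tool and hand it to a conversational agent."""
    # Imported here so the helper below can be used without LangChain installed.
    # Assumption: legacy LangChain import paths.
    from langchain.agents import AgentType, Tool, initialize_agent
    from langchain.chains import RetrievalQA
    from langchain.memory import ConversationBufferMemory

    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True,  # the chain output then includes the documents
    )
    doc_search_tool = Tool(
        name="doc_search_tool",
        # Return the whole chain output (a dict) so it shows up as the tool observation.
        func=lambda question: qa({"query": question}),
        description="Answers questions about the custom knowledge base.",  # hypothetical wording
    )
    memory = ConversationBufferMemory(
        memory_key="chat_history",
        return_messages=True,
        output_key="output",  # the agent returns several keys; memory stores this one
    )
    return initialize_agent(
        tools=[doc_search_tool],
        llm=llm,
        agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
        memory=memory,
        return_intermediate_steps=True,
    )


def sources_from_agent_result(result: Dict[str, Any]) -> List[Any]:
    """Collect source documents from every tool call; empty if the tool was never used."""
    docs: List[Any] = []
    for _action, observation in result.get("intermediate_steps") or []:
        if isinstance(observation, dict):
            docs.extend(observation.get("source_documents", []))
    return docs
```

With an agent built this way, `sources_from_agent_result(agent({"input": question}))` should return documents for a knowledge-base question and an empty list for a general one. Unlike indexing `[0][1]` directly, the helper also tolerates results where the agent called the tool more than once or not at all.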

Solution 2: Check for Answer Source Existence and Return Relevant Documents Accordingly

To ensure that you only retrieve relevant documents from the custom knowledge base in scenarios where the chatbot’s response is retrieved from that knowledge base, incorporate the following steps:

  1. Use Custom Knowledge Base Flag:

    Add a flag or condition to determine whether the answer generated by the chatbot is derived from your custom knowledge base.

  2. Check Custom Knowledge Base Flag:

    When retrieving relevant documents, check this flag before proceeding.

  3. Retrieve Relevant Documents:

    If the flag indicates the answer is from the custom knowledge base, use the `retriever.get_relevant_documents(query)` method to retrieve and display the relevant documents.

  4. Handle Non-Custom Knowledge Base Answers:

    If the flag indicates the answer is not from the custom knowledge base, handle this scenario as desired, such as by displaying a message indicating no relevant documents are available.

This ensures that relevant documents are only retrieved and displayed when the chatbot’s response is directly retrieved from the custom knowledge base, which aligns with your desired behavior.
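
A minimal sketch of this flag check, assuming the flag is computed elsewhere (this solution does not say how) and that the retriever exposes `get_relevant_documents(query)`. The `Retriever` protocol and the function name `documents_if_from_kb` are hypothetical names used only for illustration.

```python
from typing import Any, List, Protocol


class Retriever(Protocol):
    """Anything with a LangChain-style get_relevant_documents method."""

    def get_relevant_documents(self, query: str) -> List[Any]: ...


def documents_if_from_kb(
    retriever: Retriever, query: str, answer_from_kb: bool
) -> List[Any]:
    """Return relevant documents only when the answer came from the knowledge base."""
    if not answer_from_kb:
        # Answer did not use the knowledge base: skip retrieval entirely.
        return []
    return retriever.get_relevant_documents(query)
```

An empty list lets the caller decide how to handle the non-knowledge-base case, for example by showing a message that no relevant documents are available.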

Q&A

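Which parameter should be set to check whether an answer comes from the custom knowledge base?

Set `return_source_documents` to `True` so that the chain returns the source documents, and any URLs stored in their metadata, along with the answer.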

How do I obtain the URLs of the source documents?

Use a for loop to iterate over `source_documents` and print each document’s `metadata` attribute.
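
A small sketch of that loop, assuming `result` is the dict returned by a chain created with `return_source_documents=True` and that each document has a dict-like `metadata` attribute. The function name `source_metadata` is illustrative.

```python
from typing import Any, Dict, List


def source_metadata(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the metadata dict of each source document in a chain result."""
    collected: List[Dict[str, Any]] = []
    for doc in result.get("source_documents", []):
        collected.append(dict(doc.metadata))
    return collected
```

Printing each entry shows where a document came from. Many loaders store the URL or file path under a `"source"` key, but that depends on how the documents were loaded, so treat the key name as an assumption.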

Video Explanation:

The following video, titled "LangChain Retrieval QA Over Multiple Files with ChromaDB ...", provides additional insights and in-depth exploration related to the topics discussed in this post.
