LlamaIndex with ChatGPT taking too long to retrieve answers – Python

by
Ali Hasan
chatgpt-api llama-cpp-python llama-index openai-api

Quick Fix: To resolve the issue, add a retrain flag to st.session_state and set it to False at the end of your reindexing code. This way, LlamaIndex will only rebuild the index when the flag is explicitly set, instead of on every interaction, preventing excessive delays.

The Problem:

A chatbot that utilizes LlamaIndex and ChatGPT for domain knowledge retrieval is experiencing slow performance, with an average response time of 15-20 seconds. Optimizing LlamaIndex with different strategies has not significantly improved the speed. The chatbot behaves the same on multiple machines, which rules out hardware limitations as the cause. Suggestions are sought to enhance the performance of the chatbot, allowing for faster retrieval of answers.

The Solutions:

Solution 1: Set retrain flag to False after reindexing

Streamlit reruns your script from top to bottom on every interaction, so each interaction is treated as a fresh start. If the retrain flag is effectively True on every run, the entire indexing process, including document loading and index construction, is executed for each interaction.

To optimize performance and avoid unnecessary reindexing, the retrain flag should only be set to True when the index actually needs to be rebuilt. This can be done by adding a retrain key to st.session_state, which persists across reruns, and setting it back to False at the end of the reindexing step.
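Stripped of the Streamlit and LlamaIndex specifics, the underlying build-once, load-thereafter pattern can be sketched in plain Python. Here the JSON dictionary is a hypothetical stand-in for the real vector index, and get_index is an illustrative helper, not part of either library:

```python
import json
import os
import tempfile

def get_index(index_file, retrain):
    """Rebuild the 'index' only when asked to; otherwise load the saved copy."""
    if retrain or not os.path.exists(index_file):
        index = {"docs": ["doc1", "doc2"]}  # stand-in for the expensive build step
        with open(index_file, "w") as f:
            json.dump(index, f)
        return index, False  # the flag flips to False once the rebuild is done
    with open(index_file) as f:
        return json.load(f), False  # cheap path: reuse the index saved earlier

path = os.path.join(tempfile.gettempdir(), "demo_index.json")
index, retrain = get_index(path, retrain=True)   # first run: builds and saves
index2, retrain = get_index(path, retrain)       # later runs: loads from disk
```

Only the first call pays the build cost; every later call reads the saved file, which is exactly what the Streamlit snippet below achieves with st.session_state.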

Here’s a modified code snippet that incorporates this optimization:

import os
import streamlit as st
from llama_index import (LLMPredictor, GPTSimpleVectorIndex,
                         SimpleDirectoryReader, PromptHelper, ServiceContext)
from langchain import OpenAI

os.environ["OPENAI_API_KEY"] = ...

# Add retrain flag to st.session_state
if 'retrain' not in st.session_state:
    st.session_state['retrain'] = False

doc_path = 'docs'
index_file = 'index.json'
st.title("Chatbot")

index = None

def ask_ai():
    st.session_state.response = index.query(st.session_state.prompt)

if st.session_state['retrain']:
    # Rebuild the index from the documents on disk
    documents = SimpleDirectoryReader(doc_path).load_data()
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=128))
    num_output = 256
    max_chunk_overlap = 20
    max_input_size = 4096
    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    index = GPTSimpleVectorIndex.from_documents(
        documents, service_context=service_context
    )
    index.save_to_disk(index_file)
    st.session_state['retrain'] = False  # Set retrain flag to False after reindexing
elif os.path.exists(index_file):
    # Reuse the index saved on a previous run instead of rebuilding it
    index = GPTSimpleVectorIndex.load_from_disk(index_file)

if 'response' not in st.session_state:
    st.session_state.response = ''

if index is not None:
    st.text_input("Ask something: ", key='prompt')
    st.button("Send", on_click=ask_ai)
    if st.session_state.response:
        st.subheader("Response: ")
        st.success(st.session_state.response)

By incorporating this change, reindexing only occurs when it is actually needed, so subsequent interactions load the saved index from disk and the chatbot responds much faster.
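To verify the improvement, you can time each query. The following is a minimal sketch; fake_query and timed are hypothetical helpers, with fake_query standing in for a real index.query call:

```python
import time

def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def fake_query(prompt):
    # Stand-in for index.query(prompt); the real call hits the OpenAI API
    return f"answer to {prompt!r}"

answer, elapsed = timed(fake_query, "What is LlamaIndex?")
```

Wrapping the real index.query the same way lets you compare response times before and after the retrain-flag change and confirm the 15-20 second delay was indexing, not retrieval.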

Q&A

What causes the chatbot's slow performance?

Rebuilding the index on every interaction is the likely cause of the slowness.

How can the performance be improved?

Adding a retrain flag to st.session_state and setting it to False at the end of the reindexing step should resolve the issue.

Is hardware a factor in the slow loading of answers?

No; the chatbot behaves the same on multiple machines, so hardware limitations can be ruled out.

Video Explanation:

The following video, titled "Fully Functional Chatbot with Llama Index: Build a Custom ChatGPT ...", provides additional insights and in-depth exploration related to the topics discussed in this post.


Learn how to build your own ChatGPT with Llama Index. This ChatGPT will allow you to easily create custom prompts for your users.