Is there a way to stream output in FastAPI from the response I get from llama-index? – Llama-index

by
Ali Hasan
langchain llama-cpp-python llama-index

Quick Fix: Stream the llama-index response from FastAPI by combining Python's yield keyword with FastAPI's StreamingResponse.

Wrap the response generator in an async generator that re-yields each chunk, then pass that wrapper to StreamingResponse.

Modify your code like this:

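import asyncio

async def astreamer(generator):
    """Re-yield items from a synchronous generator as an async generator."""
    try:
        for i in generator:
            yield i
            # Brief pause hands control back to the event loop between chunks.
            await asyncio.sleep(0.1)
    except asyncio.CancelledError:
        # Raised when the client disconnects before the stream finishes.
        print('cancelled')
        raise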

Passing astreamer(...) to StreamingResponse sends each chunk to the client as soon as it is yielded, instead of waiting for the complete llama-index answer.

The Problem:

The goal is to stream output in FastAPI from the response obtained using llama-index. However, when attempting to stream the response, an error occurs, indicating that a generator object cannot be pickled. The objective is to establish a stream between the FastAPI API and the output, enabling the streaming of answers. Additionally, it would be helpful to know if a similar approach can be implemented using Flask instead of FastAPI.

The Solutions:

Solution 1: Using a Custom Async Streaming Generator

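To stream output in FastAPI from the response of llama-index, a custom async generator function, astreamer, is created. It wraps the token generator obtained from query_engine.query (in llama-index this is typically exposed as the response_gen attribute of the streaming response, which requires a query engine built with streaming enabled) and yields its values one by one with a slight delay (await asyncio.sleep(0.1)).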
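One caveat with wrapping a synchronous generator in a plain for loop is that each step blocks the event loop while the next token is produced. The sketch below is a variant, not the original answer's code: it pulls each item in a worker thread with asyncio.to_thread (Python 3.9+). The fake_tokens generator is a hypothetical stand-in for the llama-index token generator.

```python
import asyncio
from typing import AsyncIterator, Iterator

_DONE = object()  # sentinel that marks generator exhaustion


def fake_tokens() -> Iterator[str]:
    """Hypothetical stand-in for the token generator llama-index returns."""
    for token in ["Streaming ", "works ", "token ", "by ", "token."]:
        yield token


async def astreamer(generator: Iterator[str]) -> AsyncIterator[str]:
    """Yield items from a synchronous generator without blocking the event loop."""
    iterator = iter(generator)
    while True:
        # next() runs in a worker thread, so a slow step does not stall other requests.
        item = await asyncio.to_thread(next, iterator, _DONE)
        if item is _DONE:
            break
        yield item


async def collect() -> list[str]:
    """Consume the async generator chunk by chunk, as a streaming response would."""
    return [chunk async for chunk in astreamer(fake_tokens())]


chunks = asyncio.run(collect())
print("".join(chunks))
```

The sentinel is used because a StopIteration raised inside the worker thread cannot be propagated cleanly through the awaited call.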

The create_item endpoint in FastAPI receives the input text and uses the query engine to get the response. It then returns a StreamingResponse with the custom astreamer as the data source and text/event-stream as the media type.

This approach lets the API stream the llama-index response in real time, mirroring the token-by-token output seen in the console version. The text/event-stream media type tells the client that the body arrives incrementally rather than as a single payload.

Q&A

Is there a way to stream output in FastAPI from the response I get from llama-index?

Yes. As a quick fix, I used Python's yield keyword together with FastAPI's StreamingResponse.

Can I do a similar thing in Flask?

I am not too sure about Flask. But you can try adapting the same approach as in FastAPI.
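For readers who want to try the Flask route: Flask's Response accepts a plain synchronous generator as its body, so no async wrapper should be needed. The following is an untested-against-llama-index sketch under that assumption. The route path, the input_text query parameter, and fake_query (a stand-in for the llama-index token generator) are all hypothetical.

```python
from typing import Iterator

from flask import Flask, Response, request, stream_with_context

app = Flask(__name__)


def fake_query(text: str) -> Iterator[str]:
    """Hypothetical stand-in for the token generator of a streaming query."""
    for word in f"You asked: {text}".split():
        yield word + " "


@app.route("/query")  # route path and parameter name are assumptions
def create_item() -> Response:
    input_text = request.args.get("input_text", "")
    # stream_with_context keeps the request context alive while chunks are sent.
    return Response(
        stream_with_context(fake_query(input_text)),
        mimetype="text/event-stream",
    )
```

Whether chunks reach the client immediately also depends on the WSGI server and any proxy in front of it, since either may buffer the response.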

Is there any other workaround to do it in FastAPI?

I am not aware of any other workaround at the moment.

Video Explanation:

The following video, titled "Streaming for LangChain Agents + FastAPI - YouTube", provides additional insights and in-depth exploration related to the topics discussed in this post.
