Is there a way to stream output in FastAPI from the response I get from llama-index?

Quick Fix: Use Python's yield keyword in an async generator and pass that generator to FastAPI's StreamingResponse.

Modify your code like this:

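import asyncio

async def astreamer(generator):
    # Re-yield each chunk from a synchronous generator.
    try:
        for i in generator:
            yield i
            # Pause so the event loop can serve other requests.
            await asyncio.sleep(.1)
    except asyncio.CancelledError:
        # The client disconnected before the stream finished.
        print('cancelled')
        raise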

With astreamer passed to StreamingResponse, FastAPI sends each chunk of the llama-index response to the client as it is produced.

The Problem:

The goal is to stream the response obtained from llama-index through a FastAPI endpoint. However, attempting to stream the response raises an error indicating that a generator object cannot be pickled. What is needed is a stream between the FastAPI endpoint and the llama-index output, so that the answer reaches the client as it is generated. It would also be helpful to know whether a similar approach works with Flask instead of FastAPI.

The Solutions:

Solution 1: Using a Custom Async Streaming Generator

To stream output in FastAPI from the response of llama-index, define a custom async generator function, astreamer. It wraps the generator obtained from query_engine.query and yields its values one at a time, pausing briefly after each (await asyncio.sleep(.1)). The pause hands control back to the event loop, so other requests can be served between chunks.

The create_item endpoint in FastAPI receives the input text and uses the query engine to get the response. It then returns a StreamingResponse with the custom astreamer as the data source and text/event-stream as the media type.

This approach lets the API stream llama-index responses as they are generated, matching the streaming behavior seen in the console version. The text/event-stream media type is the one used for Server-Sent Events, which suits a response whose body arrives incrementally.

Q&A