The Problem:
The goal is to stream output in FastAPI from the response obtained using llama-index. However, when attempting to stream the response, an error occurs, indicating that a generator object cannot be pickled. The objective is to establish a stream between the FastAPI API and the output, enabling the streaming of answers. Additionally, it would be helpful to know if a similar approach can be implemented using Flask instead of FastAPI.
The Solutions:
Solution 1: Using a Custom Async Streaming Generator
To stream output in FastAPI from a llama-index response, a custom async generator function, astreamer, is created. This function wraps the generator returned by query_engine.query and yields its values one by one with a slight delay (await asyncio.sleep(0.1)).
The create_item endpoint in FastAPI receives the input text and uses the query engine to get the response. It then returns a StreamingResponse with the custom astreamer as the data source and text/event-stream as the media type.
This approach allows the API to stream the responses from llama-index in real-time, simulating the streaming behavior seen in the console version. The media type is chosen to suit the streaming nature of the response.
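A minimal, runnable sketch of the astreamer pattern described above. Note that token_source below is a hypothetical stand-in for the generator that llama-index's query_engine.query returns in streaming mode; only astreamer and the sleep delay come from the solution itself.

```python
import asyncio

async def astreamer(generator):
    """Wrap a synchronous token generator as an async generator,
    yielding chunks one by one with a small delay so the event
    loop can flush each chunk to the client."""
    try:
        for token in generator:
            yield token
            await asyncio.sleep(0.1)
    except asyncio.CancelledError:
        # Client disconnected; stop iterating.
        pass

def token_source():
    # Hypothetical stand-in for llama-index's streaming generator
    # (the object returned by query_engine.query in streaming mode).
    for token in ["Hello", ", ", "world", "!"]:
        yield token

async def main():
    # Collect the streamed chunks, as a client would receive them.
    chunks = []
    async for chunk in astreamer(token_source()):
        chunks.append(chunk)
    return chunks

if __name__ == "__main__":
    print("".join(asyncio.run(main())))
```

In the actual FastAPI endpoint, the same wrapper is handed to the response object, e.g. return StreamingResponse(astreamer(gen), media_type='text/event-stream'), so the client receives each token as it is produced.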
Q&A
Is there a way to stream output in FastAPI from the response I get from llama-index?
Yes. As a quick fix, I used Python's yield to build a generator and paired it with FastAPI's StreamingResponse.
Can I do a similar thing in Flask?
I am not too sure about Flask, but you can try adapting the same approach used in FastAPI.
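For what it's worth, Flask does support this pattern out of the box: passing a generator to flask.Response streams each yielded chunk to the client. The sketch below assumes this; token_source is again a hypothetical stand-in for llama-index's streaming generator, not part of the original answer.

```python
from flask import Flask, Response

app = Flask(__name__)

def token_source():
    # Hypothetical stand-in for llama-index's streaming generator
    # (the object returned by query_engine.query in streaming mode).
    for token in ["Hello", ", ", "world", "!"]:
        yield token

@app.route("/stream")
def stream():
    # Flask streams any generator passed to Response, sending each
    # yielded chunk to the client as it is produced.
    return Response(token_source(), mimetype="text/event-stream")
```

Unlike FastAPI, no async wrapper is needed here: Flask's WSGI model iterates the plain generator and flushes chunks as they are yielded.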
Is there any other workaround to do it in FastAPI ?
I am not aware of any other workaround at the moment.
Video Explanation:
The following video, titled "Streaming for LangChain Agents + FastAPI - YouTube", provides additional insights and in-depth exploration related to the topics discussed in this post.