The Problem:
How do you add a custom field to the metadata of a document using the LangChain JavaScript SDK?
The Solutions:
Solution 1: Custom metadata can be added through the 2nd parameter of createDocuments
The 2nd argument of the `createDocuments` method in the `CharacterTextSplitter` can be an array of objects, one per input text. The properties of each object are copied into the metadata of every element of the returned `documents` array that was produced from the text at the same index.
For Example:
// splitter, text and chunkHeader are assumed to be defined earlier.
const myMetaData = { url: "https://www.google.com" };
const documents = await splitter.createDocuments([text], [myMetaData], {
  chunkHeader,
  appendChunkOverlapHeader: true,
});
After this, `documents` will contain an array in which each element is an object with `pageContent` and `metadata` properties. The properties of `myMetaData` above appear under `metadata`, and `pageContent` has the text of `chunkHeader` prepended.
{
  pageContent: <chunkHeader plus the chunk>,
  metadata: <all properties of myMetaData plus loc (text line numbers of chunk)>
}
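To make it concrete how each metadata object applies to the text at the same index, here is a simplified stand-in written in plain JavaScript. It is not LangChain's implementation: `createDocumentsSketch`, its fixed-size chunking, and the second URL are assumptions made for illustration, and the sketch leaves out the `loc` field and chunk headers.

```javascript
// Simplified stand-in for illustration only; NOT LangChain's implementation.
// It splits each text into fixed-size chunks and copies metadatas[i]
// into the metadata of every chunk produced from texts[i].
function createDocumentsSketch(texts, metadatas = [], chunkSize = 20) {
  const documents = [];
  texts.forEach((text, i) => {
    for (let start = 0; start < text.length; start += chunkSize) {
      documents.push({
        pageContent: text.slice(start, start + chunkSize),
        // Spread into a fresh object so chunks do not share one metadata object.
        metadata: { ...(metadatas[i] ?? {}) },
      });
    }
  });
  return documents;
}

// Two input texts, each paired with its own (hypothetical) metadata object.
const docsSketch = createDocumentsSketch(
  ["first text that is long enough to split", "second text"],
  [{ url: "https://www.google.com" }, { url: "https://example.com" }]
);
```

Every chunk of the first text carries the first `url`, and the chunk of the second text carries the second one. With the real splitter the same pairing applies, so pass one metadata object per input text.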
Solution 2: Adding Field to Metadata
To add a field to the metadata of a document in LangChain, you can iterate through the documents after they have been created and modify their metadata directly. This changes the existing document objects in place. Here's an example that adds a field named `doc_id`:
// docs and doc_id are assumed to be defined earlier.
for (const _doc of docs) {
  _doc.metadata['doc_id'] = doc_id;
}
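If mutating the documents in place is not desirable, the same idea can be sketched as a copy. The helper name `withMetadataField`, the sample documents, and the `"abc-123"` value are hypothetical, and the plain objects below only stand in for documents with the `pageContent` and `metadata` shape.

```javascript
// Hypothetical helper: returns copies of the documents with one extra
// metadata field, leaving the original objects untouched.
function withMetadataField(docs, key, value) {
  return docs.map((doc) => ({
    ...doc,
    metadata: { ...doc.metadata, [key]: value },
  }));
}

// Plain objects standing in for LangChain documents (assumed shape).
const originalDocs = [
  { pageContent: "chunk one", metadata: { url: "https://www.google.com" } },
  { pageContent: "chunk two", metadata: {} },
];
const taggedDocs = withMetadataField(originalDocs, "doc_id", "abc-123");
```

Note that object spread produces plain objects, so if later code needs real `Document` instances, the direct loop above may be the simpler choice.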
Solution 3: Use `Document` class with `splitDocuments` method
To add a field to the metadata of a LangChain Document while using the `CharacterTextSplitter`, follow these steps:
- Create a new instance of the `Document` class, passing the `pageContent` and the desired metadata as parameters.
- Use the splitter's `splitDocuments()` method to split your text into multiple documents, passing an array that contains the newly created `Document` instance as an argument.
// pageContent and metadata belong to the same options object.
const docOutput = await splitter.splitDocuments([
  new Document({ pageContent: text, metadata: { someField: "someValue" } }),
]);
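Whichever approach is used, it can help to confirm that the field actually reached every chunk. The following is a minimal sketch of such a check; `findDocsMissingField` and the sample chunks are hypothetical, and the plain objects only stand in for the documents returned by the splitter.

```javascript
// Hypothetical helper: returns the indexes of documents whose metadata
// does not contain the given field.
function findDocsMissingField(docs, field) {
  const missing = [];
  docs.forEach((doc, index) => {
    if (!doc.metadata || !(field in doc.metadata)) missing.push(index);
  });
  return missing;
}

// Plain objects standing in for split documents (assumed shape).
const sampleChunks = [
  { pageContent: "a", metadata: { someField: "someValue" } },
  { pageContent: "b", metadata: {} },
];
const missingIndexes = findDocsMissingField(sampleChunks, "someField");
```

An empty result means every document carries the field; here the second chunk (index 1) is reported as missing it.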
Q&A
How should I add a field to the metadata of LangChain's Documents?
The 2nd argument of createDocuments can take an array of objects whose properties will be assigned into the metadata of every element of the returned documents array.
Video Explanation:
The following video, titled "Retrieval-Augmented Generation (RAG) using LangChain and ...", provides additional insights and in-depth exploration related to the topics discussed in this post.