Vector Store Writer
Store text and image content in a vector database, with optional metadata and document splitting
What is Vector Store Writer?
The Vector Store Writer is a node that stores text and image content in a vector database. It converts the content to embeddings, attaches optional metadata, and, for text input, can split large documents into chunks before storage. Use it to build knowledge bases and search systems that cover both content types.
How to use it?
To effectively store content in your vector database, follow these steps:
- Select Input Type:
  - Text: Store textual content such as documents, articles, or descriptions
  - Image: Store visual content with corresponding embeddings for similarity search
- Connect Vector Store:
  - Postgres: Use PostgreSQL with the pgvector extension for relational vector storage
  - Pinecone: Connect a managed Pinecone vector database for high-performance search
  - OpenSearch: Use OpenSearch for distributed vector storage and hybrid search
- Configure Storage Location:
  - For Pinecone/OpenSearch: Specify Index Name (default: "documents")
  - For Postgres: Specify Table Name (default: "embeddings")
  - Use the same name in every node that reads from or writes to this store
- Provide Reference Identifier:
  - Reference: Unique identifier for the content (URL, file path, custom ID)
  - Used for deduplication and content management
  - Required to retrieve or delete the content later
- Connect Content Input:
  - For Text: Connect text content from document processors, scrapers, or text inputs
  - For Image: Connect image files from file readers, API responses, or image processors
- Connect Processing Components:
  - Embedder: Text or Image embedder that converts content to vectors
  - Document Splitter: (Text only) Splits large documents into manageable chunks
  - Metadata: Optional structured data to store alongside the content
- Execute Storage:
  - The node processes the content, generates embeddings, and stores them in the vector database
  - The node produces no output; it acts as a terminal node for storage operations
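The steps above can be pictured with a small in-memory sketch. This is not the node's implementation: `InMemoryVectorStore`, `toy_embedder`, and `write_text` are hypothetical names, the embedder is a placeholder rather than a real model, and the replace-on-rewrite behaviour is only one plausible reading of "used for deduplication".

```python
from dataclasses import dataclass, field

# Default storage locations as listed in the steps above.
DEFAULT_LOCATION = {
    "postgres": "embeddings",    # table name
    "pinecone": "documents",     # index name
    "opensearch": "documents",   # index name
}


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a real vector store; rows are grouped by reference."""
    rows: dict = field(default_factory=dict)

    def write(self, reference, vectors, texts, metadata):
        # Assumption: writing the same reference again replaces its rows,
        # so re-running a workflow does not create duplicates.
        self.rows[reference] = [
            {"vector": v, "text": t, "metadata": dict(metadata)}
            for v, t in zip(vectors, texts)
        ]


def toy_embedder(text):
    # Placeholder only: deterministic numbers, not a semantic embedding.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def write_text(store, reference, text, embedder, splitter=None, metadata=None):
    """Split (optional), embed, and store text; returns nothing, like a terminal node."""
    chunks = splitter(text) if splitter else [text]
    vectors = [embedder(chunk) for chunk in chunks]
    store.write(reference, vectors, chunks, metadata or {})
```

Calling `write_text` twice with the same reference leaves a single set of rows for that reference, which is the property the Reference Identifier step relies on.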
Example of usage
Objective: Build a comprehensive knowledge base that supports both text documents and visual content with rich metadata.
Text Content Storage:
- Content Source: Website scraper or document reader provides text content
- Document Processing: Document splitter breaks large texts into searchable chunks
- Embeddings: OpenAI text embedder converts chunks to vector representations
- Metadata: Include source URL, creation date, content type, and tags
- Storage: Content and vectors stored in PostgreSQL with pgvector
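As a rough illustration of the metadata step, the sketch below attaches the fields named above (source URL, creation date, content type, tags) to each chunk before storage. The function name `build_records` and the record layout are assumptions made for this example, not the node's actual schema.

```python
from datetime import date


def build_records(reference, chunks, source_url, content_type, tags, created=None):
    """Pair every chunk with the shared document metadata (hypothetical layout)."""
    created_iso = (created or date.today()).isoformat()
    return [
        {
            "reference": reference,   # same reference for all chunks of one document
            "chunk_index": index,     # position of the chunk within the document
            "text": chunk,
            "metadata": {
                "source_url": source_url,
                "created": created_iso,
                "content_type": content_type,
                "tags": list(tags),
            },
        }
        for index, chunk in enumerate(chunks)
    ]
```

Keeping the reference identical across all chunks of a document is what lets a later step find or delete every chunk that came from the same source.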
Image Content Storage:
- Visual Input: Image files from file readers or API responses
- Image Processing: Image embedder generates visual embeddings
- Metadata: Include image dimensions, file type, source, and visual tags
- Reference System: Use image filename or hash as unique reference
- Storage: Images and embeddings stored alongside text content
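One way to derive a hash-based reference for an image is to hash the file's bytes. The sketch below uses SHA-256 from Python's standard library; the `img:` prefix and the function name `image_reference` are illustrative choices, not something the node requires.

```python
import hashlib


def image_reference(image_bytes: bytes, prefix: str = "img") -> str:
    """Return a stable reference derived from the image content."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    return f"{prefix}:{digest}"
```

A content hash stays the same when a file is renamed, so identical images map to the same reference, whereas a filename-based reference changes with the name.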
Additional information
Input Type Comparison:
Text Input Features:
- Document splitting for optimal chunk size
- Rich text content preservation
- Semantic embedding generation
- Metadata integration for context
Image Input Features:
- Visual similarity embedding generation
- Support for various image formats
- Visual metadata extraction capabilities
- Cross-modal search preparation
Document Splitting Benefits:
- Optimal Chunk Size: Improves embedding quality and search relevance
- Context Preservation: Maintains document structure and meaning
- Search Granularity: Enables precise content retrieval
- Memory Efficiency: Manages large document processing effectively
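To make the chunking idea concrete, here is a minimal character-based splitter with overlap. It is a sketch under simple assumptions: the actual Document Splitter may split on tokens, sentences, or document structure instead, and `split_text`, `chunk_size`, and `overlap` are hypothetical names.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list:
    """Split text into fixed-size character chunks that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.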
Common Use Cases:
- Knowledge base construction with mixed content types
- Content management systems with semantic search
- Research databases with visual and textual content
- Product catalogs with descriptions and images
- Educational platforms with multimedia learning materials