MongoDB Integration

Vector Store Implementation - MongoDB

Overview

The MongoDB vector store implementation in Rig integrates with MongoDB Atlas Vector Search, enabling semantic search over documents stored in a MongoDB collection through Atlas vector search indexes.

Key Features

Cosine similarity search
Custom search parameters
Automatic index validation
Similarity score returned with each result
Flexible document schema support
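Atlas computes the similarity on the server inside the vector search, so a client does not need code like this; the sketch below only illustrates what cosine similarity measures and how Atlas rescales it. Per the Atlas Vector Search documentation, scores for the `cosine` similarity are reported as (1 + cosine) / 2, which maps the raw range [-1, 1] onto [0, 1]. The function names `cosine_similarity` and `atlas_cosine_score` are hypothetical.

```rust
/// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
/// Returns None for empty or mismatched vectors, or when either has zero magnitude.
fn cosine_similarity(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Rescales a raw cosine similarity in [-1, 1] to the [0, 1] score range
/// that Atlas Vector Search reports for the `cosine` similarity.
fn atlas_cosine_score(cosine: f64) -> f64 {
    (1.0 + cosine) / 2.0
}
```

Under this rescaling, a score of 0.5 corresponds to orthogonal vectors rather than a "half match", which is worth keeping in mind when choosing a score cutoff.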

Basic Usage

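use rig::vector_store::VectorStoreIndex; // trait that provides `top_n`
use rig_mongodb::{MongoDbVectorIndex, SearchParams};
 
// Initialize the vector store. `new` is async and returns a Result because it
// validates the named search index on the collection.
let index = MongoDbVectorIndex::new(
    collection,      // a mongodb::Collection handle
    embedding_model, // the same model used to embed the stored documents
    "vector_index",  // name of the Atlas Vector Search index
    SearchParams::new()
).await?;
 
// Search for similar documents; returns Vec<(score, id, Document)>
let results = index.top_n::<Document>("search query", 5).await?;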
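`top_n` returns a `Vec<(f64, String, T)>` of (score, id, document) tuples. When checking results on a small dataset, an exact brute-force search can serve as a reference point. The hypothetical `brute_force_top_n` below is not part of Rig; it assumes the index uses `cosine` similarity, applies the same (1 + cos) / 2 rescaling Atlas reports for that metric, and returns the best match first.

```rust
/// Exact (brute-force) top-n search over in-memory (id, vector, document) records.
/// Returns (score, id, document) tuples sorted from best to worst match.
fn brute_force_top_n<T: Clone>(
    query: &[f64],
    records: &[(String, Vec<f64>, T)],
    n: usize,
) -> Vec<(f64, String, T)> {
    let mut scored: Vec<(f64, String, T)> = records
        .iter()
        // Skip vectors whose length does not match the query
        .filter(|(_, vector, _)| vector.len() == query.len())
        .map(|(id, vector, doc)| {
            let dot: f64 = query.iter().zip(vector).map(|(a, b)| a * b).sum();
            let norms = query.iter().map(|a| a * a).sum::<f64>().sqrt()
                * vector.iter().map(|b| b * b).sum::<f64>().sqrt();
            let cosine = if norms == 0.0 { 0.0 } else { dot / norms };
            // Same [0, 1] rescaling Atlas applies to cosine similarity
            ((1.0 + cosine) / 2.0, id.clone(), doc.clone())
        })
        .collect();
    // Highest score first; total_cmp gives a total order over f64
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.truncate(n);
    scored
}
```

Atlas approximate search (the default, non-exact mode) can miss some true nearest neighbours, so small differences between this reference and `top_n` are expected unless exact search is used.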

Implementation Details

Core Components

Vector Index Structure:

rig-mongodb/src/lib.rs [82-89]

/// The `MongoDbVectorIndex` struct is the core component for interacting with MongoDB's vector search capabilities.
/// It encapsulates the MongoDB collection, the embedding model, and the index name, along with search parameters.
///
/// - `collection`: The MongoDB collection where documents are stored.
/// - `model`: The embedding model used to generate vector representations of text.
/// - `index_name`: The name of the vector index in MongoDB.
/// - `search_params`: Parameters for customizing the search behavior.
pub struct MongoDbVectorIndex {
    collection: Collection<Document>,
    model: Box<dyn EmbeddingModel>,
    index_name: String,
    search_params: SearchParams,
}

Search Parameters:

Configurable field name for embeddings
Customizable number of candidates
Support for MongoDB-specific search options
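On the Atlas side, options like these correspond to fields of the `$vectorSearch` stage: `numCandidates` (how many candidates the approximate search considers), `limit`, an optional `filter`, and `exact`. The Atlas documentation requires `numCandidates` to be at least `limit` and at most 10,000 for approximate search, and it must be omitted when `exact` is true. The hypothetical `VectorSearchOptions` type below is not Rig's `SearchParams`; it only sketches how such values might be resolved and checked. The default of 20 × `limit` is an assumption based on MongoDB's general advice to oversample candidates, not a documented Rig default.

```rust
/// Hypothetical mirror of two `$vectorSearch` tuning knobs (not Rig's type).
#[derive(Debug, Default, Clone)]
struct VectorSearchOptions {
    /// Candidates considered by approximate (ANN) search; None = derive from limit.
    num_candidates: Option<u32>,
    /// true = exact (ENN) search, which must not specify numCandidates.
    exact: bool,
}

/// Upper bound on numCandidates accepted by Atlas.
const MAX_NUM_CANDIDATES: u32 = 10_000;

impl VectorSearchOptions {
    /// Returns the numCandidates value to send for `limit` results,
    /// None for exact search, or an error for an invalid combination.
    fn resolve_num_candidates(&self, limit: u32) -> Result<Option<u32>, String> {
        if self.exact {
            return match self.num_candidates {
                Some(_) => Err("numCandidates must be omitted when exact is true".into()),
                None => Ok(None),
            };
        }
        // Assumed default: oversample 20x the limit, capped at the Atlas maximum
        let candidates = self
            .num_candidates
            .unwrap_or_else(|| limit.saturating_mul(20).min(MAX_NUM_CANDIDATES));
        if candidates < limit {
            return Err(format!("numCandidates ({candidates}) must be >= limit ({limit})"));
        }
        if candidates > MAX_NUM_CANDIDATES {
            return Err(format!("numCandidates ({candidates}) exceeds {MAX_NUM_CANDIDATES}"));
        }
        Ok(Some(candidates))
    }
}
```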

Search Pipeline

The MongoDB implementation uses an aggregation pipeline with three main stages:

Search Stage: Performs vector similarity search
Score Stage: Calculates and normalizes similarity scores
Project Stage: Formats the output documents

Reference implementation (opening of `top_n`):

async fn top_n<T: for<'a> Deserialize<'a> + Send>(
    &self,
    query: &str,
    n: usize,
) -> Result<Vec<(f64, String, T)>, VectorStoreError> {
    // Embed the query with the index's embedding model
    let prompt_embedding = self.model.embed_text(query).await?;

    let mut cursor = self
        .collection
        .aggregate([
            // ... search, score, and project stages (excerpt ends here)
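To make the three stages concrete, the sketch below renders a pipeline of that shape as JSON. It is an assumption based on the Atlas `$vectorSearch` stage syntax and the `vectorSearchScore` metadata that Atlas exposes, not a copy of Rig's source: the `score` output field, the use of `$addFields` for the score stage, and the `embedding` path are illustrative choices, and the function assumes names need no JSON escaping.

```rust
/// Renders a three-stage Atlas Vector Search pipeline as a JSON string:
/// 1. `$vectorSearch` finds approximate nearest neighbours,
/// 2. `$addFields` copies the similarity score from search metadata,
/// 3. `$project` drops the (large) embedding field from each result.
fn vector_search_pipeline(
    index_name: &str,
    embedding_path: &str,
    query_vector: &[f64],
    num_candidates: u32,
    limit: u32,
) -> String {
    // Serialize the query vector as a comma-separated JSON array body
    let vector = query_vector
        .iter()
        .map(|v| v.to_string())
        .collect::<Vec<_>>()
        .join(", ");
    format!(
        r#"[
  {{ "$vectorSearch": {{ "index": "{index_name}", "path": "{embedding_path}", "queryVector": [{vector}], "numCandidates": {num_candidates}, "limit": {limit} }} }},
  {{ "$addFields": {{ "score": {{ "$meta": "vectorSearchScore" }} }} }},
  {{ "$project": {{ "{embedding_path}": 0 }} }}
]"#
    )
}
```

Excluding the embedding in the final stage keeps result documents small, since each vector can hold over a thousand numbers.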

Document Schema Requirements

Documents must include:

A unique identifier field (_id)
An embedding vector field (configurable name)
Any additional fields you want to store alongside the vector

Example schema:

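use rig::Embed;
use serde::Deserialize;

#[derive(Embed, Clone, Deserialize, Debug)]
struct Document {
    #[serde(rename = "_id")]
    id: String,
    // Text that gets embedded
    #[embed]
    content: String,
    // Stored vector; this field name must match the `path` in the index definition
    embedding: Vec<f64>,
}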
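Each stored embedding must have exactly as many components as the index's `numDimensions` (1536 for OpenAI's `text-embedding-ada-002`, the model used in the integration example). A pre-insert sanity check can catch mismatches early. The `StoredDocument` type and `check_document` function below are hypothetical and independent of Rig.

```rust
/// Hypothetical stored shape: `_id`, text content, and its embedding.
#[allow(dead_code)]
struct StoredDocument {
    id: String,
    content: String,
    embedding: Vec<f64>,
}

/// Sanity-checks a document before insertion: the embedding must match the
/// index's dimension count and contain only finite numbers.
fn check_document(doc: &StoredDocument, num_dimensions: usize) -> Result<(), String> {
    if doc.embedding.len() != num_dimensions {
        return Err(format!(
            "document {}: embedding has {} dimensions, index expects {}",
            doc.id,
            doc.embedding.len(),
            num_dimensions
        ));
    }
    if doc.embedding.iter().any(|x| !x.is_finite()) {
        return Err(format!("document {}: embedding contains NaN or infinity", doc.id));
    }
    Ok(())
}
```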

MongoDB Index Requirements

The collection must have a vector search index configured:

rig-mongodb/tests/integration_tests.rs [108-127]
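An Atlas Vector Search index definition contains a `fields` array; a vector entry has `type: "vector"` plus the `path` of the embedding field, its `numDimensions`, and a `similarity` of `euclidean`, `cosine`, or `dotProduct`. The index name (for example `vector_index`, as in Basic Usage) is supplied separately when the index is created. The helper below is a hypothetical sketch that renders such a definition body; it assumes the path needs no JSON escaping.

```rust
/// Similarity functions supported by Atlas Vector Search indexes.
#[derive(Debug, Clone, Copy)]
enum Similarity {
    Euclidean,
    Cosine,
    DotProduct,
}

impl Similarity {
    /// Name used for this metric in the index definition.
    fn as_str(self) -> &'static str {
        match self {
            Similarity::Euclidean => "euclidean",
            Similarity::Cosine => "cosine",
            Similarity::DotProduct => "dotProduct",
        }
    }
}

/// Renders the JSON body of a vector search index with one vector field.
fn vector_index_definition(path: &str, num_dimensions: u32, similarity: Similarity) -> String {
    let similarity = similarity.as_str();
    format!(
        r#"{{ "fields": [ {{ "type": "vector", "path": "{path}", "numDimensions": {num_dimensions}, "similarity": "{similarity}" }} ] }}"#
    )
}
```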

Special Considerations

Index Validation: The implementation automatically validates:
- Index existence
- Vector dimensions
- Similarity metric
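The exact checks Rig performs are defined in its source. As an illustration only, the hypothetical `validate_index` below compares an already-fetched index description (for example, output of the `$listSearchIndexes` aggregation stage reduced to a small struct) against the embedding model's dimension count and the expected metric, covering the three properties listed above.

```rust
/// Hypothetical summary of the vector field in an existing search index.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct VectorField {
    path: String,
    num_dimensions: usize,
    similarity: String,
}

/// Reasons an index can fail validation.
#[derive(Debug, PartialEq)]
enum IndexCheckError {
    MissingIndex(String),
    DimensionMismatch { index: usize, model: usize },
    UnexpectedSimilarity(String),
}

/// Checks that the index exists, that its vector length matches the
/// embedding model, and that it uses the expected similarity metric.
fn validate_index(
    index_name: &str,
    found: Option<&VectorField>,
    model_dimensions: usize,
    expected_similarity: &str,
) -> Result<(), IndexCheckError> {
    let field = found.ok_or_else(|| IndexCheckError::MissingIndex(index_name.to_string()))?;
    if field.num_dimensions != model_dimensions {
        return Err(IndexCheckError::DimensionMismatch {
            index: field.num_dimensions,
            model: model_dimensions,
        });
    }
    if field.similarity != expected_similarity {
        return Err(IndexCheckError::UnexpectedSimilarity(field.similarity.clone()));
    }
    Ok(())
}
```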
Error Handling: MongoDB-specific errors are converted to Rig’s error types:

rig-mongodb/src/lib.rs [54-56]
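A common Rust pattern for this kind of conversion is to box the driver error inside a catch-all variant of the library's error enum and implement `From`, so the `?` operator converts errors automatically. In the sketch below, `DriverError` stands in for the MongoDB driver's error type and `VectorStoreError` is a hypothetical, simplified stand-in for Rig's error type; both are assumptions.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for a database driver error (e.g. a failed aggregation).
#[derive(Debug)]
struct DriverError(String);

impl fmt::Display for DriverError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "driver error: {}", self.0)
    }
}

impl Error for DriverError {}

/// Simplified, hypothetical vector store error type.
#[derive(Debug)]
enum VectorStoreError {
    /// Any backend failure, boxed so the enum does not depend on one driver.
    DatastoreError(Box<dyn Error + Send + Sync + 'static>),
}

impl From<DriverError> for VectorStoreError {
    fn from(e: DriverError) -> Self {
        VectorStoreError::DatastoreError(Box::new(e))
    }
}

/// A backend call that may fail; `?` converts DriverError via the From impl.
fn run_query(fail: bool) -> Result<usize, VectorStoreError> {
    let rows = if fail { Err(DriverError("connection reset".into())) } else { Ok(3) };
    Ok(rows?)
}
```

Boxing keeps the public error type stable even if the underlying driver or its error type changes.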

Performance Optimization:
- Uses MongoDB’s native vector search capabilities
- Supports cursor-based result streaming
- Optimizes query projection

Integration Example

A complete example showing document embedding and search:


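use std::env;
use mongodb::{bson, options::ClientOptions, Client as MongoClient, Collection};
use rig::providers::openai::{Client, TEXT_EMBEDDING_ADA_002};
use rig::{embeddings::EmbeddingsBuilder, vector_store::VectorStoreIndex, Embed};
use rig_mongodb::{MongoDbVectorIndex, SearchParams};
use serde::{Deserialize, Serialize};

// Source data: the `definition` field is the text that gets embedded.
#[derive(Embed, Clone, Deserialize, Debug)]
struct Word {
    #[serde(rename = "_id")]
    id: String,
    #[embed]
    definition: String,
}

// Shape of each document stored in MongoDB, including its embedding.
#[derive(Serialize, Debug)]
struct StoredWord {
    #[serde(rename = "_id")]
    id: String,
    definition: String,
    embedding: Vec<f64>,
}

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Initialize OpenAI client
    let openai_api_key = env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY not set");
    let openai_client = Client::new(&openai_api_key);

    // Initialize MongoDB client
    let mongodb_connection_string =
        env::var("MONGODB_CONNECTION_STRING").expect("MONGODB_CONNECTION_STRING not set");
    let options = ClientOptions::parse(mongodb_connection_string)
        .await
        .expect("MongoDB connection string should be valid");
    let mongodb_client =
        MongoClient::with_options(options).expect("MongoDB client options should be valid");

    // Handle to the collection that backs the vector store
    let collection: Collection<bson::Document> = mongodb_client
        .database("knowledgebase")
        .collection("context");

    // Select the embedding model used for both documents and queries
    let model = openai_client.embedding_model(TEXT_EMBEDDING_ADA_002);

    let words = vec![
        Word {
            id: "doc0".to_string(),
            definition: "Definition of a *flurbo*: A flurbo is a green alien that lives on cold planets".to_string(),
        },
        Word {
            id: "doc1".to_string(),
            definition: "Definition of a *glarb-glarb*: A glarb-glarb is an ancient tool used by the ancestors of the inhabitants of planet Jiro to farm the land.".to_string(),
        },
        Word {
            id: "doc2".to_string(),
            definition: "Definition of a *linglingdong*: A term used by inhabitants of the far side of the moon to describe humans.".to_string(),
        }
    ];

    // Embed each definition and pair the vector with its source document
    let embeddings = EmbeddingsBuilder::new(model.clone()).documents(words)?.build().await?;
    let stored: Vec<StoredWord> = embeddings
        .into_iter()
        .map(|(Word { id, definition }, emb)| StoredWord { id, definition, embedding: emb.first().vec.clone() })
        .collect();
    collection.clone_with_type::<StoredWord>().insert_many(stored).await?;

    // A vector search index named "vector_index" must already exist on the collection
    let index = MongoDbVectorIndex::new(collection, model, "vector_index", SearchParams::new()).await?;
    let results = index.top_n::<Word>("What is a linglingdong?", 1).await?;
    println!("Results: {results:?}");

    Ok(())
}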

For detailed API reference and additional features, see the MongoDB Atlas Vector Search documentation and Rig’s API documentation.

API Reference (docs.rs)
