MongoDB Integration
Vector Store Implementation - MongoDB
Overview
The MongoDB vector store implementation in Rig provides integration with MongoDB Atlas Vector Search, allowing for semantic search capabilities using MongoDB’s vector search indexes.
Key Features
- Cosine similarity search
- Custom search parameters
- Automatic index validation
- Detailed score tracking
- Flexible document schema support
Basic Usage
use rig_mongodb::{MongoDbVectorIndex, SearchParams};
// Initialize the vector store
let index = MongoDbVectorIndex::new(
collection,
embedding_model,
"vector_index",
SearchParams::new()
).await?;
// Search for similar documents
let results = index.top_n::<Document>("search query", 5).await?;
Implementation Details
Core Components
- Vector Index Structure:
rig-mongodb/src/lib.rs [82-89]
/// The `MongoDbVectorIndex` struct is the core component for interacting with MongoDB's vector search capabilities.
/// It encapsulates the MongoDB collection, the embedding model, and the index name, along with search parameters.
///
/// ```rust
/// pub struct MongoDbVectorIndex {
/// collection: Collection<Document>,
/// model: Box<dyn EmbeddingModel>,
/// index_name: String,
/// search_params: SearchParams,
/// }
/// ```
///
/// - `collection`: The MongoDB collection where documents are stored.
/// - `model`: The embedding model used to generate vector representations of text.
/// - `index_name`: The name of the vector index in MongoDB.
/// - `search_params`: Parameters for customizing the search behavior.
pub struct MongoDbVectorIndex {
collection: Collection<Document>,
model: Box<dyn EmbeddingModel>,
index_name: String,
search_params: SearchParams,
}
- Search Parameters:
- Configurable field name for embeddings
- Customizable number of candidates
- Support for MongoDB-specific search options
Search Pipeline
The MongoDB implementation uses an aggregation pipeline with three main stages:
- Search Stage: Performs vector similarity search
- Score Stage: Calculates and normalizes similarity scores
- Project Stage: Formats the output documents
Reference implementation:
/// .top_n::<Definition>("My boss says I zindle too much, what does that mean?", 1)
/// .await?;
/// ```
async fn top_n<T: for<'a> Deserialize<'a> + Send>(
&self,
query: &str,
n: usize,
) -> Result<Vec<(f64, String, T)>, VectorStoreError> {
let prompt_embedding = self.model.embed_text(query).await?;
let mut cursor = self
.collection
.aggregate([
Document Schema Requirements
Documents must include:
- A unique identifier field (
_id
) - An embedding vector field (configurable name)
- Optional additional fields for storage
Example schema:
#[derive(Embed, Clone, Deserialize, Debug)]
struct Document {
#[serde(rename = "_id")]
id: String,
#[embed]
content: String,
embedding: Vec<f64>,
}
MongoDB Index Requirements
The collection must have a vector search index configured:
rig-mongodb/tests/integration_tests.rs [108-127]
Special Considerations
-
Index Validation: The implementation automatically validates:
- Index existence
- Vector dimensions
- Similarity metric
-
Error Handling: MongoDB-specific errors are converted to Rig’s error types:
rig-mongodb/src/lib.rs [54-56]
- Performance Optimization:
- Uses MongoDB’s native vector search capabilities
- Supports cursor-based result streaming
- Optimizes query projection
Integration Example
A complete example showing document embedding and search:
#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
// Initialize OpenAI client
let openai_api_key = env::var("OPENAI_API_KEY").expect("OPENAI_API_KEY not set");
let openai_client = Client::new(&openai_api_key);
// Initialize MongoDB client
let mongodb_connection_string =
env::var("MONGODB_CONNECTION_STRING").expect("MONGODB_CONNECTION_STRING not set");
let options = ClientOptions::parse(mongodb_connection_string)
.await
.expect("MongoDB connection string should be valid");
let mongodb_client =
MongoClient::with_options(options).expect("MongoDB client options should be valid");
// Initialize MongoDB vector store
let collection: Collection<bson::Document> = mongodb_client
.database("knowledgebase")
.collection("context");
// Select the embedding model and generate our embeddings
let model = openai_client.embedding_model(TEXT_EMBEDDING_ADA_002);
let words = vec![
Word {
id: "doc0".to_string(),
definition: "Definition of a *flurbo*: A flurbo is a green alien that lives on cold planets".to_string(),
},
Word {
id: "doc1".to_string(),
definition: "Definition of a *glarb-glarb*: A glarb-glarb is a ancient tool used by the ancestors of the inhabitants of planet Jiro to farm the land.".to_string(),
},
Word {
id: "doc2".to_string(),
definition: "Definition of a *linglingdong*: A term used by inhabitants of the far side of the moon to describe humans.".to_string(),
}
];
For detailed API reference and additional features, see the MongoDB Atlas Vector Search documentation and Rig’s API documentation.