Rig Loaders: File Loading and Processing
Rig’s loader system provides utilities for loading and preprocessing various file types, with a focus on structured data ingestion for LLM applications. The system is designed to be extensible and error-tolerant, with built-in support for common file types.
Core Components
FileLoader
The base loader for handling generic files with features for:
- Glob pattern matching
- Directory traversal
- Error handling
- Content preprocessing
use rig::loaders::FileLoader;
// Load all Rust files in examples directory
let examples = FileLoader::with_glob("examples/*.rs")?
.read_with_path()
.ignore_errors()
.into_iter();
PDF Loader (Optional)
Specialized loader for PDF documents with additional capabilities:
- Page-by-page extraction
- Metadata handling
- PDF-specific error handling
use rig::loaders::PdfFileLoader;
let documents = PdfFileLoader::with_glob("docs/*.pdf")?
.load_with_path()
.ignore_errors()
.by_page()
.into_iter();
Key Features
1. Error Handling
The loaders provide robust error handling through custom error types:
pub enum FileLoaderError {
InvalidGlobPattern(String),
IoError(std::io::Error),
PatternError(glob::PatternError),
GlobError(glob::GlobError),
}
2. Flexible Loading Patterns
Multiple ways to specify input sources:
// Using glob patterns
let glob_loader = FileLoader::with_glob("**/*.txt")?;
// Using directory
let dir_loader = FileLoader::with_dir("data/")?;
3. Content Processing
Built-in methods for common processing tasks:
let processed_files = FileLoader::with_glob("*.txt")?
.read() // Read contents
.ignore_errors() // Skip failed reads
.into_iter()
.collect::<Vec<_>>();
Integration with Rig
Agent Context Loading
Reference:
rig-core/examples/agent_with_loaders.rs [17:28]
// Load in all the rust examples
let examples = FileLoader::with_glob("rig-core/examples/*.rs")?
.read_with_path()
.ignore_errors()
.into_iter();
// Create an agent with multiple context documents
let agent = examples
.fold(AgentBuilder::new(model), |builder, (path, content)| {
builder.context(format!("Rust Example {:?}:\n{}", path, content).as_str())
})
.build();
PDF Document Processing
Reference:
rig-core/src/loaders/pdf.rs [31:45]
fn load(self) -> Result<Document, PdfLoaderError> {
Document::load(self).map_err(PdfLoaderError::PdfError)
}
fn load_with_path(self) -> Result<(PathBuf, Document), PdfLoaderError> {
let contents = Document::load(&self);
Ok((self, contents?))
}
}
impl<T: Loadable> Loadable for Result<T, PdfLoaderError> {
fn load(self) -> Result<Document, PdfLoaderError> {
self.map(|t| t.load())?
}
fn load_with_path(self) -> Result<(PathBuf, Document), PdfLoaderError> {
self.map(|t| t.load_with_path())?
}
Best Practices
-
Error Handling
- Use
ignore_errors()
for fault-tolerant processing - Handle specific error types when needed
- Log errors appropriately
- Use
-
Resource Management
- Process files in batches
- Consider memory usage with large files
- Clean up temporary resources
-
Content Processing
- Preprocess content before LLM ingestion
- Maintain file metadata when relevant
- Use appropriate loader for file type
Common Patterns
Basic File Loading
let loader = FileLoader::with_glob("data/*.txt")?;
for content in loader.read().ignore_errors() {
// Process content
}
PDF Processing
let pdf_loader = PdfFileLoader::with_glob("docs/*.pdf")?;
let pages = pdf_loader
.load_with_path()
.ignore_errors()
.by_page()
.into_iter();
Directory Processing
let dir_loader = FileLoader::with_dir("data/")?
.read_with_path()
.ignore_errors();
for (path, content) in dir_loader {
// Process files with path context
}
See Also
API Reference (Loaders)