Completion in Rig: LLM Interaction Layer

Rig’s completion system provides a layered approach to interacting with language models, offering both high-level convenience and low-level control. It is built around a set of traits, each defining a different level of abstraction for LLM interactions.

Core Traits

1. High-Level Interfaces

Prompt Trait

  • Simplest interface for one-shot interactions
  • Fire-and-forget prompting
  • Returns string responses
async fn prompt(&self, prompt: &str) -> Result<String, PromptError>;

Chat Trait

  • Conversation-aware interactions
  • Maintains chat history
  • Supports contextual responses
async fn chat(&self, prompt: &str, history: Vec<Message>) -> Result<String, PromptError>;

TypedPrompt Trait

  • Structured output interface for typed completions
  • Returns deserialized structured data instead of raw strings
  • The target type must implement serde::Deserialize and schemars::JsonSchema
pub trait TypedPrompt: WasmCompatSend + WasmCompatSync {
    type TypedRequest<'a, T>: IntoFuture<Output = Result<T, StructuredOutputError>>
       where Self: 'a,
             T: JsonSchema + DeserializeOwned + WasmCompatSend + 'a;
 
    // Required method
    fn prompt_typed<T>(
        &self,
        prompt: impl Into<Message> + WasmCompatSend,
    ) -> Self::TypedRequest<'_, T>
       where T: JsonSchema + DeserializeOwned + WasmCompatSend;
}

This is useful when you need the LLM to return structured data (e.g., JSON conforming to a specific schema) rather than free-form text. See the Structured Output section below for more details.

2. Streaming Interfaces

Rig provides streaming counterparts for all high-level traits. See Streaming for full details.

  • StreamingPrompt: Streaming one-shot prompts
  • StreamingChat: Streaming chat with history
  • StreamingCompletion: Low-level streaming completion interface
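Consuming streamed output typically means processing text chunks as they arrive. The following is a minimal, self-contained sketch of that pattern; a plain iterator of chunks stands in for rig's actual streaming response type, whose real API differs:

```rust
// A minimal sketch of consuming streamed output. A plain iterator of text
// chunks stands in for rig's streaming response type.
fn collect_stream(chunks: impl Iterator<Item = String>) -> String {
    let mut full = String::new();
    for chunk in chunks {
        // In a real application, each chunk would be rendered incrementally
        // (e.g., printed to the terminal) as it arrives.
        full.push_str(&chunk);
    }
    full
}

fn main() {
    let chunks = vec!["Quantum ".to_string(), "computing".to_string()];
    let text = collect_stream(chunks.into_iter());
    assert_eq!(text, "Quantum computing");
}
```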

3. Low-Level Control

Completion Trait

  • Fine-grained request configuration
  • Access to raw completion responses
  • Tool call handling
pub trait Completion<M: CompletionModel> {
    /// Generates a completion request builder for the given `prompt` and `chat_history`.
    /// Fields pre-populated by the implementing type (e.g., Agent preamble) can be
    /// overwritten by calling the corresponding method on the builder.
    fn completion(
        &self,
        prompt: &str,
        chat_history: Vec<Message>,
    ) -> impl Future<Output = Result<CompletionRequestBuilder<M>, CompletionError>> + Send;
}

CompletionModel Trait

The provider interface that must be implemented for each LLM backend. In v0.31.0, this trait lives at rig::completion::request::CompletionModel (re-exported via rig::completion).

pub trait CompletionModel:
    Clone
    + WasmCompatSend
    + WasmCompatSync {
    type Response: WasmCompatSend + WasmCompatSync + Serialize + DeserializeOwned;
    type StreamingResponse: Clone + Unpin + WasmCompatSend + WasmCompatSync + Serialize + DeserializeOwned + GetTokenUsage;
    type Client;
 
    // Required methods
    fn make(client: &Self::Client, model: impl Into<String>) -> Self;
    fn completion(
        &self,
        request: CompletionRequest,
    ) -> impl Future<Output = Result<CompletionResponse<Self::Response>, CompletionError>> + WasmCompatSend;
 
    fn stream(
        &self,
        request: CompletionRequest,
    ) -> impl Future<Output = Result<StreamingCompletionResponse<Self::StreamingResponse>, CompletionError>> + WasmCompatSend;
 
    // Provided method
    fn completion_request(
        &self,
        prompt: impl Into<Message>,
    ) -> CompletionRequestBuilder<Self> { ... }
}

Request Building

CompletionRequestBuilder

A fluent API for constructing requests:

let request = model.completion_request("prompt")
    .preamble("system instructions")
    .temperature(0.7)
    .max_tokens(1000)
    .documents(context_docs)
    .tools(available_tools)
    .build();

Response Handling

CompletionResponse

The CompletionResponse struct wraps the model’s response along with the raw provider-specific data:

pub struct CompletionResponse<T> {
    /// One or more assistant content items (text, tool calls, reasoning, etc.)
    pub choice: OneOrMany<AssistantContent>,
    /// The raw response from the provider
    pub raw_response: T,
}

AssistantContent

In v0.31.0, the old ModelChoice enum has been replaced by a richer AssistantContent enum (in rig::completion::message) that supports multimodal responses:

pub enum AssistantContent {
    /// Plain text response
    Text(Text),
    /// A tool call requested by the model
    ToolCall(ToolCall),
    /// Reasoning/chain-of-thought content (for models that support it)
    Reasoning(Reasoning),
}

The Text struct wraps a string, while ToolCall contains the tool call ID, function name, and arguments:

pub struct ToolCall {
    pub id: String,
    pub function: ToolFunction,
}
 
pub struct ToolFunction {
    pub name: String,
    pub arguments: serde_json::Value,
}
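A typical consumer matches on each content item to separate text from tool calls. The sketch below uses simplified local mirrors of the types above so it compiles standalone; in rig, `arguments` is a `serde_json::Value` rather than a string:

```rust
// Simplified local mirrors of the types above, to illustrate the matching
// pattern without depending on the rig crate.
enum AssistantContent {
    Text(String),
    ToolCall { id: String, name: String, arguments: String },
}

/// Dispatch on each content item, returning a printable summary.
fn handle(content: &AssistantContent) -> String {
    match content {
        AssistantContent::Text(text) => format!("text: {text}"),
        AssistantContent::ToolCall { id, name, arguments } => {
            // In a real application, this is where you would look up and
            // invoke the named tool with the given arguments.
            format!("call {name} ({id}) with {arguments}")
        }
    }
}

fn main() {
    let items = vec![
        AssistantContent::Text("Hello".into()),
        AssistantContent::ToolCall {
            id: "call_1".into(),
            name: "add".into(),
            arguments: r#"{"a": 1, "b": 2}"#.into(),
        },
    ];
    for item in &items {
        println!("{}", handle(item));
    }
}
```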

Message Types

The Message enum represents conversation messages with rich content support:

pub enum Message {
    User { content: OneOrMany<UserContent> },
    Assistant { content: OneOrMany<AssistantContent> },
}

UserContent supports text, images, audio, documents, video, and tool results:

pub enum UserContent {
    Text(Text),
    ToolResult(ToolResult),
    Image(Image),
    Audio(Audio),
    Document(Document),
    Video(Video),
}
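Both message variants carry their content in `OneOrMany`, a container that guarantees at least one element (a message can never be empty). The following is a simplified sketch of that invariant; rig's real `OneOrMany` has a larger API, including fallible constructors:

```rust
// A simplified sketch of rig's `OneOrMany<T>`: a container that always
// holds at least one element.
#[derive(Debug, Clone)]
struct OneOrMany<T> {
    first: T,
    rest: Vec<T>,
}

impl<T> OneOrMany<T> {
    fn one(first: T) -> Self {
        Self { first, rest: Vec::new() }
    }
    fn many(first: T, rest: Vec<T>) -> Self {
        Self { first, rest }
    }
    fn len(&self) -> usize {
        // The `first` element makes the count always at least 1.
        1 + self.rest.len()
    }
    fn iter(&self) -> impl Iterator<Item = &T> {
        std::iter::once(&self.first).chain(self.rest.iter())
    }
}

fn main() {
    let single = OneOrMany::one("Hello".to_string());
    assert_eq!(single.len(), 1);

    let multi = OneOrMany::many("Hello".to_string(), vec!["World".to_string()]);
    assert_eq!(multi.len(), 2);
    for item in multi.iter() {
        println!("{item}");
    }
}
```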

Token Usage

v0.31.0 adds a Usage struct and the GetTokenUsage trait for tracking token consumption:

pub struct Usage {
    pub prompt_tokens: u64,
    pub completion_tokens: u64,
    pub total_tokens: u64,
}

Implement the GetTokenUsage trait on your provider’s raw response type to expose token metrics.
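A sketch of such an implementation is shown below. The trait and struct are redefined locally so the example compiles standalone, and the `token_usage` method name plus the provider response fields are assumptions for illustration; check rig's API reference for the exact signature:

```rust
// Local sketch of the Usage struct and GetTokenUsage trait from above.
#[derive(Debug, Default, Clone, Copy)]
struct Usage {
    prompt_tokens: u64,
    completion_tokens: u64,
    total_tokens: u64,
}

trait GetTokenUsage {
    fn token_usage(&self) -> Option<Usage>;
}

// Hypothetical provider response carrying raw token counts.
struct MyProviderResponse {
    input_tokens: u64,
    output_tokens: u64,
}

impl GetTokenUsage for MyProviderResponse {
    fn token_usage(&self) -> Option<Usage> {
        // Map the provider's field names onto rig's Usage struct.
        Some(Usage {
            prompt_tokens: self.input_tokens,
            completion_tokens: self.output_tokens,
            total_tokens: self.input_tokens + self.output_tokens,
        })
    }
}

fn main() {
    let resp = MyProviderResponse { input_tokens: 12, output_tokens: 30 };
    let usage = resp.token_usage().unwrap();
    assert_eq!(usage.total_tokens, 42);
    println!("{usage:?}");
}
```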

Error Handling

Comprehensive error types:

pub enum CompletionError {
    HttpError(reqwest::Error),
    JsonError(serde_json::Error),
    RequestError(Box<dyn Error>),
    ResponseError(String),
    ProviderError(String),
}
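One practical use of these variants is deciding which failures are worth retrying. The sketch below mirrors the enum with plain strings in place of the `reqwest`/`serde_json` error types so it compiles standalone; the retry policy itself is illustrative:

```rust
// Simplified mirror of the error enum above; the reqwest/serde variants
// are replaced with plain strings so this sketch compiles standalone.
#[derive(Debug)]
enum CompletionError {
    HttpError(String),
    ResponseError(String),
    ProviderError(String),
}

/// Decide whether a failed request is worth retrying.
fn is_retryable(err: &CompletionError) -> bool {
    match err {
        // Network hiccups are usually transient.
        CompletionError::HttpError(_) => true,
        // Malformed responses and provider-side rejections generally are not.
        CompletionError::ResponseError(_) | CompletionError::ProviderError(_) => false,
    }
}

fn main() {
    let err = CompletionError::HttpError("connection reset".into());
    println!("retryable: {}", is_retryable(&err));
    assert!(!is_retryable(&CompletionError::ProviderError("invalid model".into())));
    assert!(!is_retryable(&CompletionError::ResponseError("unexpected payload".into())));
}
```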

For structured output, there is an additional error type:

pub enum StructuredOutputError {
    CompletionError(CompletionError),
    JsonError(serde_json::Error),
    // ...
}

Usage Patterns

Basic Completion

let openai = openai::Client::from_env();
let model = openai.completion_model("gpt-4o");
 
let response = model
    .prompt("Explain quantum computing")
    .await?;

Contextual Chat

use rig::completion::Message;
 
let chat_response = agent
    .chat(
        "Continue the discussion",
        vec![Message::user("Previous context")]
    )
    .await?;

Advanced Request Configuration

let response = model
    .completion_request("Complex query")
    .preamble("Expert system")
    .temperature(0.8)
    .documents(context)
    .tools(available_tools)
    .send()
    .await?;

Structured Output

Using the TypedPrompt trait (implemented by Agent), you can get structured responses:

use schemars::JsonSchema;
use serde::Deserialize;
 
#[derive(Deserialize, JsonSchema)]
struct SentimentAnalysis {
    /// The sentiment score from -1.0 to 1.0
    score: f64,
    /// The sentiment label
    label: String,
}
 
let result: SentimentAnalysis = agent
    .prompt_typed("Analyze the sentiment of: 'I love this product!'")
    .await?;

Provider Integration

Implementing New Providers

impl CompletionModel for CustomProvider {
    type Response = CustomResponse;
    type StreamingResponse = CustomStreamingResponse;
    type Client = CustomClient;

    fn make(client: &Self::Client, model: impl Into<String>) -> Self {
        // Build the model handle from the client and model name
    }

    async fn completion(
        &self,
        request: CompletionRequest,
    ) -> Result<CompletionResponse<Self::Response>, CompletionError> {
        // Provider-specific request/response mapping
    }

    async fn stream(
        &self,
        request: CompletionRequest,
    ) -> Result<StreamingCompletionResponse<Self::StreamingResponse>, CompletionError> {
        // Provider-specific streaming implementation
    }
}

Best Practices

  1. Interface Selection

    • Use Prompt for simple interactions
    • Use Chat for conversational flows
    • Use TypedPrompt for structured data extraction
    • Use Completion for fine-grained control
    • Use StreamingPrompt/StreamingChat when you need incremental output
  2. Error Handling

    • Handle provider-specific errors
    • Implement graceful fallbacks
    • Log raw responses for debugging
  3. Resource Management

    • Reuse model instances
    • Batch similar requests
    • Monitor token usage via the GetTokenUsage trait

See Also


  • API Reference (Completion)