LLM Providers Overview
The LLM is the "brain" of your voice agent - it understands user intent and generates appropriate responses. Choosing the right LLM is critical for both quality and latency.
Supported Providers
| Provider | Models | Latency | Cost | Best For |
|---|---|---|---|---|
| Google Gemini | 2.0 Flash, 2.5 Flash-Lite | ⚡ Fastest | 💰 Cheapest | Voice agents |
| Gemini Live | 2.0, 2.5 HD | ⚡⚡ Ultra-fast | 💰💰 | Native audio |
| OpenAI | GPT-4o, GPT-4o-mini | Fast | 💰💰💰 | Complex reasoning |
| Anthropic | Claude 3.5 Sonnet | Fast | 💰💰💰 | Long context |
| Azure OpenAI | GPT-4o, GPT-4o-mini | Fast | 💰💰💰 | Enterprise |
Quick Comparison
Time to First Token (lower is better):
Gemini Live 2.5     █████  50ms
Gemini 2.5 Lite     ██████████  100ms
Gemini 2.0 Flash    ███████████████  150ms
GPT-4o-mini         ██████████████████  180ms
Claude 3.5 Sonnet   ██████████████████████  220ms
GPT-4o              █████████████████████████  250ms
(one █ ≈ 10ms)
Choosing the Right Provider
For Voice Agents (Recommended)
Google Gemini 2.5 Flash-Lite
- Fastest time-to-first-token among the text-pipeline LLMs (~100ms)
- Excellent for conversational AI
- Best cost-performance ratio
- 1M token context window
{
  "llmProvider": "gemini-2.5",
  "llmModel": "gemini-2.5-flash-lite"
}
For Native Audio (Best Latency)
Gemini Live 2.0 / 2.5
- Bypasses STT and TTS entirely
- Audio-to-audio in ~50ms
- Natural voice with emotions
- 30 HD voices (2.5)
{
  "llmProvider": "gemini-live-2.5",
  "geminiliveVoice": "Kore"
}
For Complex Reasoning
OpenAI GPT-4o
- Best overall reasoning capability
- Function calling reliability
- Multi-modal understanding
- Higher latency (~250ms)
{
  "llmProvider": "openai",
  "llmModel": "gpt-4o"
}
For Enterprise / Compliance
Azure OpenAI
- Same models as OpenAI
- Enterprise SLAs
- Data residency options
- SOC 2, HIPAA compliant
{
  "llmProvider": "openai-azure",
  "llmModel": "gpt-4o"
}
Provider Configuration
Basic Setup
{
  "agent": {
    "name": "Customer Support",
    "llmProvider": "gemini-2.5",
    "llmModel": "gemini-2.5-flash-lite",
    "llmTemperature": 0.7,
    "prompt": "You are a helpful customer support agent..."
  }
}
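If you load this configuration in Go, a minimal struct for unmarshalling it could look like the sketch below; the struct and field names simply mirror the JSON keys above and are illustrative, not part of any SDK.
// AgentConfig mirrors the "agent" object in the JSON above (illustrative only).
type AgentConfig struct {
	Name           string  `json:"name"`
	LLMProvider    string  `json:"llmProvider"`
	LLMModel       string  `json:"llmModel"`
	LLMTemperature float64 `json:"llmTemperature"`
	Prompt         string  `json:"prompt"`
}

// loadConfig parses the {"agent": {...}} document using encoding/json.
func loadConfig(raw []byte) (AgentConfig, error) {
	var doc struct {
		Agent AgentConfig `json:"agent"`
	}
	err := json.Unmarshal(raw, &doc)
	return doc.Agent, err
}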
Environment Variables
# Google Gemini
GOOGLE_AI_API_KEY=your_google_ai_key
# OpenAI
OPENAI_API_KEY=your_openai_key
# Anthropic
ANTHROPIC_API_KEY=your_anthropic_key
# Azure OpenAI
AZURE_OPENAI_API_KEY=your_azure_key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=your-deployment-name
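In Go, a provider client would typically read its key from these variables at startup and fail fast if it is missing; a minimal sketch using only the standard os and fmt packages (requireEnv is an illustrative helper, not a framework function):
// requireEnv returns the named environment variable, or an error if it is
// unset, so a misconfigured provider fails at startup rather than mid-call.
func requireEnv(name string) (string, error) {
	value := os.Getenv(name)
	if value == "" {
		return "", fmt.Errorf("missing required environment variable %s", name)
	}
	return value, nil
}
For example, the Gemini provider would call requireEnv("GOOGLE_AI_API_KEY") before opening a session.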
Streaming
All providers support streaming for minimal latency:
// LLM generates tokens one at a time
for token := range llm.StreamGenerate(ctx, messages) {
	// Send each token to TTS immediately
	tts.QueueText(token)
}
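The loop above assumes StreamGenerate returns a channel of tokens. One common way to provide that shape is sketched below; geminiProvider and nextToken are hypothetical names, not the framework's actual implementation:
// StreamGenerate pushes tokens into a channel and closes it when the model
// finishes, so the range loop above terminates cleanly.
func (p *geminiProvider) StreamGenerate(ctx context.Context, messages []Message) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for {
			token, done, err := p.nextToken(ctx, messages) // hypothetical low-level streaming call
			if err != nil || done {
				return
			}
			select {
			case out <- token:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}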
Streaming Timeline
LLM Output: "Your order is on the way and will arrive tomorrow."

Token 1: "Your"
Token 2: "order"
Token 3: "is"
Token 4: "on"
Token 5: "the"
Token 6: "way..."

TTS starts generating audio from Token 1.
User hears audio while the LLM is still generating.
Function Calling
Every provider listed above supports function/tool calling:
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_order_status",
        "description": "Get the status of a customer order",
        "parameters": {
          "type": "object",
          "properties": {
            "order_id": {
              "type": "string",
              "description": "The order ID to look up"
            }
          },
          "required": ["order_id"]
        }
      }
    }
  ]
}
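When the model decides to call get_order_status, your agent receives the function name plus JSON-encoded arguments, runs the real lookup, and returns the result to the model. A minimal dispatch sketch, where ToolCall, handleToolCall, and lookupOrderStatus are illustrative names rather than framework APIs:
// ToolCall is an illustrative shape for a tool invocation from the LLM.
type ToolCall struct {
	Name      string
	Arguments json.RawMessage
}

// handleToolCall decodes the arguments and dispatches to your backend.
func handleToolCall(call ToolCall) (string, error) {
	switch call.Name {
	case "get_order_status":
		var args struct {
			OrderID string `json:"order_id"`
		}
		if err := json.Unmarshal(call.Arguments, &args); err != nil {
			return "", err
		}
		return lookupOrderStatus(args.OrderID) // placeholder for your real lookup
	default:
		return "", fmt.Errorf("unknown tool: %s", call.Name)
	}
}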
Provider Function Calling Reliability
| Provider | Reliability | Notes |
|---|---|---|
| GPT-4o | ⭐⭐⭐⭐⭐ | Best function calling |
| Gemini 2.0 | ⭐⭐⭐⭐ | Very good |
| Claude 3.5 | ⭐⭐⭐⭐ | Good |
| GPT-4o-mini | ⭐⭐⭐ | Sometimes misses |
Context Management
Context Window Sizes
| Provider | Context Window | Practical Limit |
|---|---|---|
| Gemini 2.0 | 1M tokens | 100K recommended |
| Gemini 1.5 Pro | 2M tokens | 200K recommended |
| GPT-4o | 128K tokens | 32K recommended |
| Claude 3.5 | 200K tokens | 100K recommended |
Optimizing Context
// Keep only recent conversation history
func trimContext(messages []Message, maxTokens int) []Message {
	// Always keep the system prompt (messages[0])
	system := messages[0]
	tokenCount := countTokens(system.Content)

	// Walk backwards from the newest message, keeping what fits
	var recent []Message
	for i := len(messages) - 1; i >= 1; i-- {
		msgTokens := countTokens(messages[i].Content)
		if tokenCount+msgTokens > maxTokens {
			break
		}
		// Prepend so the kept messages stay in chronological order
		recent = append([]Message{messages[i]}, recent...)
		tokenCount += msgTokens
	}
	return append([]Message{system}, recent...)
}
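In practice, trim against the "Practical Limit" column above rather than the full advertised window before each turn, for example:
// Trim to the recommended practical limit, not the full context window.
history = trimContext(history, 32000) // e.g. GPT-4o: 128K window, 32K recommended
response, err := llm.Generate(ctx, history)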
Cost Optimization
Cost per 1M Tokens (Input/Output)
| Provider | Input | Output | Monthly @ 1M calls |
|---|---|---|---|
| Gemini 2.5 Lite | $0.015 | $0.06 | ~$150 |
| Gemini 2.0 Flash | $0.075 | $0.30 | ~$750 |
| GPT-4o-mini | $0.15 | $0.60 | ~$1,500 |
| GPT-4o | $2.50 | $10.00 | ~$25,000 |
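As a rough sanity check on the monthly column: assuming an average of about 10,000 input tokens per call (an illustrative figure, not a measured one), GPT-4o at $2.50 per 1M input tokens costs roughly $0.025 per call, or ~$25,000 across 1M calls before output tokens; the same arithmetic gives ~$150 for Gemini 2.5 Lite.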
Cost Reduction Strategies
- Use Gemini for most calls - 10-100x cheaper than GPT-4o
- Keep prompts short - Every token costs money
- Cache common responses - Don't regenerate identical responses
- Route complex tasks - Use GPT-4o only when needed (see the routing sketch below)
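A minimal routing sketch, assuming the same LLMProvider interface used in the fallback example below; the token threshold is an illustrative cutoff, not a tuned recommendation:
// chooseProvider sends routine turns to the cheap model and reserves the
// expensive model for calls that need tools or a large context.
func chooseProvider(needsTools bool, inputTokens int, cheap, strong LLMProvider) LLMProvider {
	if needsTools || inputTokens > 8000 {
		return strong // e.g. GPT-4o
	}
	return cheap // e.g. Gemini 2.5 Flash-Lite
}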
Fallback Configuration
Configure fallback providers for reliability:
type LLMFallback struct {
	Primary   LLMProvider
	Secondary LLMProvider
	Tertiary  LLMProvider
}

func (f *LLMFallback) Generate(ctx context.Context, messages []Message) (string, error) {
	response, err := f.Primary.Generate(ctx, messages)
	if err == nil {
		return response, nil
	}
	log.Printf("Primary LLM failed: %v, trying secondary", err)

	response, err = f.Secondary.Generate(ctx, messages)
	if err == nil {
		return response, nil
	}
	log.Printf("Secondary LLM failed: %v, trying tertiary", err)

	return f.Tertiary.Generate(ctx, messages)
}
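Wiring it together might look like the following; the constructor names are placeholders for whatever your codebase exposes:
// Gemini handles the normal path; OpenAI and Azure only see traffic on failure.
fallback := &LLMFallback{
	Primary:   newGeminiProvider("gemini-2.5-flash-lite"), // hypothetical constructor
	Secondary: newOpenAIProvider("gpt-4o-mini"),           // hypothetical constructor
	Tertiary:  newAzureProvider("gpt-4o"),                 // hypothetical constructor
}
response, err := fallback.Generate(ctx, messages)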
Next Steps
- Gemini Configuration - Set up Google Gemini
- Gemini Live - Native audio-to-audio
- OpenAI Configuration - Set up GPT-4o
- Function Calling - Add tools to your agent