# Latency Optimization

Minimize the delay between the end of user speech and the start of the agent's response.
## Understanding Latency
Total latency = STT + LLM + TTS + Network
| Component | Typical Latency |
|---|---|
| STT | 100-500ms |
| LLM | 200-1000ms |
| TTS | 100-500ms |
| Network | 50-200ms |
| Total | 450-2200ms |
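For example, a pipeline with 200ms STT, 500ms LLM, 250ms TTS, and 100ms of network overhead responds in roughly 1050ms.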
## Optimization Strategies
### 1. Use Native Audio Models

Native audio (speech-to-speech) models process audio directly, bypassing the STT and TTS stages entirely:

```json
{
  "llmProvider": "gemini-live-2.5"
}
```
| Model | Avg Latency |
|---|---|
| Traditional Pipeline | ~1200ms |
| Gemini Live 2.5 | ~400ms |
| OpenAI Realtime | ~500ms |
### 2. Use Vertex AI (Gemini Live)

Vertex AI reduces Gemini Live latency significantly:

```json
{
  "llmConfig": {
    "gemini-live-2.5": {
      "vertexai": {
        "enabled": true,
        "region": "us-central1"
      }
    }
  }
}
```
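When enabling Vertex AI, choose the region closest to where your calls are handled; `us-central1` above is only an example value.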
### 3. Choose Fast Providers

**Fastest STT:**

- Deepgram Nova-2

**Fastest TTS:**

- Cartesia
- Deepgram
- Sarvam

**Fastest LLM:**

- Gemini 2.5 Flash-Lite
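Combining these picks, a full pipeline configuration might look like the sketch below. Only `llmProvider` appears elsewhere on this page; `sttProvider`, `ttsProvider`, and the provider identifier strings are assumed names that may differ on your platform.

```json
{
  "sttProvider": "deepgram-nova-2",
  "llmProvider": "gemini-2.5-flash-lite",
  "ttsProvider": "cartesia"
}
```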
### 4. Optimize LLM Response

Cap the response length and lower the temperature to keep replies focused:

```json
{
  "llmConfig": {
    "maxTokens": 150,
    "temperature": 0.5
  }
}
```

Shorter responses give TTS less text to synthesize, so turns complete faster.
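Prompting also helps: instructing the model to answer briefly cuts token count at the source. The `systemPrompt` key below is a hypothetical field name; use whichever prompt field your agent configuration exposes.

```json
{
  "llmConfig": {
    "maxTokens": 150,
    "temperature": 0.5,
    "systemPrompt": "Answer in one or two short sentences."
  }
}
```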
### 5. Reduce Network Latency

- Use the region nearest your users
- Check for local network issues (congestion, VPN overhead)
- Consider a dedicated connection for call traffic
## Measuring Latency

Check latency in Call History:

1. Open a call
2. View the Latency Breakdown
3. See per-component timing
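As a sketch only (not the actual export format), the per-component timing for a single turn might look like this:

```json
{
  "latencyBreakdown": {
    "sttMs": 180,
    "llmMs": 420,
    "ttsMs": 210,
    "networkMs": 90,
    "totalMs": 900
  }
}
```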
## Latency Targets
| Use Case | Target |
|---|---|
| Sales calls | <800ms |
| Support | <1000ms |
| IVR | <500ms |
## Quick Wins
- Switch to Gemini Live
- Enable Vertex AI
- Use Deepgram STT
- Keep responses short