# Latency Optimization

Minimize the delay between the end of user speech and the start of the agent's response.
## Understanding Latency
Total latency = STT + LLM + TTS + Network
| Component | Typical Latency |
|---|---|
| STT | 100-500ms |
| LLM | 200-1000ms |
| TTS | 100-500ms |
| Network | 50-200ms |
| Total | 450-2200ms |
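For example, a pipeline with 200ms STT, 500ms LLM, 250ms TTS, and 100ms of network overhead responds in roughly 1050ms.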
## Optimization Strategies
### 1. Use Native Audio Models

Native audio (speech-to-speech) models process audio directly, bypassing the STT and TTS stages entirely:

```json
{
  "llmProvider": "gemini-live-2.5"
}
```
| Model | Avg Latency |
|---|---|
| Traditional Pipeline | ~1200ms |
| Gemini Live 2.5 | ~400ms |
| OpenAI Realtime | ~500ms |
### 2. Use Vertex AI (Gemini Live)

Vertex AI reduces Gemini Live latency significantly:

```json
{
  "llmConfig": {
    "gemini-live-2.5": {
      "vertexai": {
        "enabled": true,
        "region": "us-central1"
      }
    }
  }
}
```
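When enabling Vertex AI, choose the region closest to where your calls are handled; `us-central1` above is only an example value.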
### 3. Choose Fast Providers

**Fastest STT:**

- Deepgram Nova-2

**Fastest TTS:**

- Cartesia
- Deepgram
- Sarvam

**Fastest LLM:**

- Gemini 2.5 Flash-Lite
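Combining these picks, a full pipeline configuration might look like the sketch below. Only `llmProvider` appears elsewhere on this page; `sttProvider`, `ttsProvider`, and the provider identifier strings are assumed names that may differ on your platform.

```json
{
  "sttProvider": "deepgram-nova-2",
  "llmProvider": "gemini-2.5-flash-lite",
  "ttsProvider": "cartesia"
}
```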
### 4. Optimize LLM Response

Cap the response length and lower the temperature to keep replies focused:

```json
{
  "llmConfig": {
    "maxTokens": 150,
    "temperature": 0.5
  }
}
```

Shorter responses give TTS less text to synthesize, so turns complete faster.
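Prompting also helps: instructing the model to answer briefly cuts token count at the source. The `systemPrompt` key below is a hypothetical field name; use whichever prompt field your agent configuration exposes.

```json
{
  "llmConfig": {
    "maxTokens": 150,
    "temperature": 0.5,
    "systemPrompt": "Answer in one or two short sentences."
  }
}
```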
### 5. Reduce Network Latency

- Use the region nearest your users
- Check for local network issues (congestion, VPN overhead)
- Consider a dedicated connection for call traffic
## Measuring Latency

Check latency in Call History:

1. Open a call
2. View the Latency Breakdown
3. See per-component timing
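As a sketch only (not the actual export format), the per-component timing for a single turn might look like this:

```json
{
  "latencyBreakdown": {
    "sttMs": 180,
    "llmMs": 420,
    "ttsMs": 210,
    "networkMs": 90,
    "totalMs": 900
  }
}
```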
## Latency Targets
| Use Case | Target |
|---|---|
| Sales calls | <800ms |
| Support | <1000ms |
| IVR | <500ms |
## Quick Wins
- Switch to Gemini Live
- Enable Vertex AI
- Use Deepgram STT
- Keep responses short