Google Gemini LLM
Google Gemini is our recommended LLM for voice agents due to its exceptional speed, low cost, and excellent performance with Indic languages.
Why Gemini?
| Feature | Gemini 2.5 Flash-Lite | Gemini 2.0 Flash | GPT-4o |
|---|---|---|---|
| Time to First Token | ~100ms | ~150ms | ~250ms |
| Cost (per 1M tokens) | $0.075 in / $0.30 out | $0.075 in / $0.30 out | $5 in / $15 out |
| Context Window | 1M tokens | 1M tokens | 128K |
| Indic Languages | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
Result: roughly 50-65x cheaper than GPT-4o at list prices, with about 60% lower time to first token
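As a back-of-the-envelope check on that claim, here is a minimal sketch of the per-turn cost math. The token counts are illustrative assumptions; the prices come from the table above.

// Illustrative cost comparison for one voice turn (assumed token counts).
func costPerTurnUSD(inputTokens, outputTokens float64) (gemini, gpt4o float64) {
    // List prices per token, from the table above ($ per 1M tokens / 1e6).
    gemini = inputTokens*0.075/1e6 + outputTokens*0.30/1e6
    gpt4o = inputTokens*5.0/1e6 + outputTokens*15.0/1e6
    return
}

// Example: a turn with ~500 input and ~100 output tokens costs
// about $0.0000675 on Flash-Lite vs about $0.004 on GPT-4o (~59x).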
Configuration
Basic Setup
{
  "agent": {
    "name": "Customer Support",
    "llmProvider": "gemini-2.5",
    "llmModel": "gemini-2.5-flash-lite",
    "llmTemperature": 0.7,
    "prompt": "You are a helpful customer support agent..."
  }
}
Environment Variables
GOOGLE_AI_API_KEY=your_google_ai_api_key
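With the key in place, the Go SDK client can be constructed once at startup. A minimal sketch using github.com/google/generative-ai-go/genai (the helper name newGeminiClient is ours):

import (
    "context"
    "os"

    "github.com/google/generative-ai-go/genai"
    "google.golang.org/api/option"
)

func newGeminiClient(ctx context.Context) (*genai.Client, error) {
    // Reads GOOGLE_AI_API_KEY from the environment, as configured above.
    return genai.NewClient(ctx, option.WithAPIKey(os.Getenv("GOOGLE_AI_API_KEY")))
}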
Advanced Configuration
{
  "llmProvider": "gemini-2.5",
  "llmModel": "gemini-2.5-flash-lite",
  "llmConfig": {
    "temperature": 0.7,
    "maxOutputTokens": 500,
    "topP": 0.95,
    "topK": 40
  }
}
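If you drive the Go SDK directly, the same knobs map onto the generation-config setters on genai.GenerativeModel. A minimal sketch mirroring the JSON above:

model := client.GenerativeModel("gemini-2.5-flash-lite")
model.SetTemperature(0.7)     // sampling randomness
model.SetMaxOutputTokens(500) // cap response length for voice
model.SetTopP(0.95)           // nucleus sampling
model.SetTopK(40)             // top-k sampling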
Model Comparison
| Model | Provider ID | Speed | Intelligence | Cost | Best For |
|---|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | gemini-2.5 | ⚡⚡⚡ Fastest | ⭐⭐⭐⭐ | 💰 Cheapest | Real-time voice agents |
| Gemini 2.0 Flash | gemini | ⚡⚡ Fast | ⭐⭐⭐⭐ | 💰 Cheap | Standard voice agents |
| Gemini 1.5 Pro | gemini-1.5-pro | 🚀 Moderate | ⭐⭐⭐⭐⭐ | 💰💰 | Complex reasoning |
When to Use Each
Gemini 2.5 Flash-Lite (Recommended for Voice):
├── Lowest latency (~100ms TTFT)
├── Best cost-performance ratio
├── Excellent for simple to moderate tasks
└── 1M token context window
Gemini 2.0 Flash:
├── Proven stability
├── Supports Gemini Live (native audio)
├── Great for Indic languages
└── Good balance of speed and capability
Gemini 1.5 Pro:
├── Best reasoning capabilities
├── 2M token context window
├── Complex multi-step tasks
└── Higher latency (not ideal for voice)
Implementation
Streaming Response
// Uses github.com/google/generative-ai-go/genai and google.golang.org/api/iterator.
type GeminiLLM struct {
    client *genai.Client
    model  string
}

func (g *GeminiLLM) StreamGenerate(ctx context.Context, messages []Message) <-chan string {
    tokenChan := make(chan string)
    go func() {
        defer close(tokenChan)
        model := g.client.GenerativeModel(g.model)
        model.SetTemperature(0.7)
        // Convert messages to Gemini format
        var parts []genai.Part
        for _, msg := range messages {
            parts = append(parts, genai.Text(msg.Content))
        }
        iter := model.GenerateContentStream(ctx, parts...)
        for {
            resp, err := iter.Next()
            if err == iterator.Done {
                return
            }
            if err != nil {
                // Don't swallow failures silently: log before closing the stream
                log.Printf("gemini stream error: %v", err)
                return
            }
            // Forward each text part as soon as it arrives
            for _, candidate := range resp.Candidates {
                for _, part := range candidate.Content.Parts {
                    if text, ok := part.(genai.Text); ok {
                        tokenChan <- string(text)
                    }
                }
            }
        }
    }()
    return tokenChan
}
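A caller can then consume the channel with a plain range loop. For example (llm here is a *GeminiLLM):

for token := range llm.StreamGenerate(ctx, messages) {
    fmt.Print(token) // in practice, forward each chunk to the TTS stage
}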
Function Calling
func (g *GeminiLLM) GenerateWithTools(ctx context.Context, messages []Message, tools []Tool) (*Response, error) {
    model := g.client.GenerativeModel(g.model)
    // Convert tools to Gemini format
    geminiTools := []*genai.Tool{
        {
            FunctionDeclarations: convertToGeminiFunctions(tools),
        },
    }
    model.Tools = geminiTools
    // Generate response
    resp, err := model.GenerateContent(ctx, genai.Text(messages[len(messages)-1].Content))
    if err != nil {
        return nil, err
    }
    // Check for function calls
    for _, candidate := range resp.Candidates {
        for _, part := range candidate.Content.Parts {
            if fc, ok := part.(genai.FunctionCall); ok {
                return &Response{
                    ToolCalls: []ToolCall{{
                        Name:      fc.Name,
                        Arguments: fc.Args,
                    }},
                }, nil
            }
        }
    }
    // Return text response
    return extractTextResponse(resp), nil
}
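convertToGeminiFunctions is left undefined above. A minimal sketch, assuming our Tool type carries a name, a description, and a ready-made *genai.Schema for its parameters (those field names are assumptions, not part of the SDK):

func convertToGeminiFunctions(tools []Tool) []*genai.FunctionDeclaration {
    decls := make([]*genai.FunctionDeclaration, 0, len(tools))
    for _, t := range tools {
        decls = append(decls, &genai.FunctionDeclaration{
            Name:        t.Name,
            Description: t.Description,
            // Parameters is a *genai.Schema; here we assume the Tool
            // already stores one (e.g. built with genai.TypeObject).
            Parameters: t.Parameters,
        })
    }
    return decls
}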
Indic Language Excellence
Gemini models excel at Indic languages:
Supported Languages
| Language | Code | Quality |
|---|---|---|
| Hindi | hi | ⭐⭐⭐⭐⭐ |
| Bengali | bn | ⭐⭐⭐⭐⭐ |
| Tamil | ta | ⭐⭐⭐⭐⭐ |
| Telugu | te | ⭐⭐⭐⭐⭐ |
| Marathi | mr | ⭐⭐⭐⭐ |
| Gujarati | gu | ⭐⭐⭐⭐ |
| Kannada | kn | ⭐⭐⭐⭐ |
| Malayalam | ml | ⭐⭐⭐⭐ |
| Punjabi | pa | ⭐⭐⭐⭐ |
| Odia | or | ⭐⭐⭐ |
| Assamese | as | ⭐⭐⭐ |
Hindi Voice Agent Example
{
  "agent": {
    "name": "Hindi Support",
    "language": "hi-IN",
    "llmProvider": "gemini-2.5",
    "llmModel": "gemini-2.5-flash-lite",
    "sttProvider": "google",
    "sttModel": "chirp_2",
    "ttsProvider": "azure",
    "ttsVoice": "hi-IN-SwaraNeural",
    "prompt": "आप एक मददगार ग्राहक सहायता एजेंट हैं..."
  }
}

(The Hindi prompt translates to: "You are a helpful customer support agent...")
Latency Optimization
1. Gemini 2.5 Flash-Lite First
Always try the fastest model first:
func selectGeminiModel(complexity string) string {
    switch complexity {
    case "simple", "moderate":
        return "gemini-2.5-flash-lite" // 100ms TTFT
    case "complex":
        return "gemini-2.0-flash" // 150ms TTFT
    case "reasoning":
        return "gemini-1.5-pro" // Not recommended for voice
    default:
        return "gemini-2.5-flash-lite"
    }
}
2. Pre-warming Connections
// Pre-connect to Gemini on startup
func warmUpGemini(client *genai.Client) {
    model := client.GenerativeModel("gemini-2.5-flash-lite")
    // Send a simple request to warm up the connection
    _, _ = model.GenerateContent(context.Background(), genai.Text("Hi"))
}
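Because the warm-up request costs a full round trip, it is best fired in the background during service startup rather than on the first user turn, e.g.:

go warmUpGemini(client) // fire-and-forget at startup; result is ignored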
3. Streaming with Early TTS
// Start TTS as soon as we get first tokens
func streamToTTS(llmStream <-chan string, tts TTS) {
    var buffer strings.Builder
    tokenCount := 0
    for token := range llmStream {
        buffer.WriteString(token)
        tokenCount++
        // Start TTS after collecting enough for natural speech
        if tokenCount > 5 || strings.ContainsAny(token, ".!?,") {
            text := buffer.String()
            buffer.Reset()
            tokenCount = 0
            tts.StreamSynthesize(text)
        }
    }
    // Flush any trailing text that never hit the token or punctuation threshold
    if buffer.Len() > 0 {
        tts.StreamSynthesize(buffer.String())
    }
}
Safety Settings
Configure content safety for your use case:
model := client.GenerativeModel("gemini-2.5-flash-lite")
model.SafetySettings = []*genai.SafetySetting{
    {
        Category:  genai.HarmCategoryHarassment,
        Threshold: genai.HarmBlockMediumAndAbove,
    },
    {
        Category:  genai.HarmCategoryHateSpeech,
        Threshold: genai.HarmBlockMediumAndAbove,
    },
    {
        Category:  genai.HarmCategoryDangerousContent,
        Threshold: genai.HarmBlockOnlyHigh,
    },
}
Prompt Engineering for Gemini
Voice-Optimized System Prompt
systemPrompt := `You are a helpful customer support agent.
VOICE CONVERSATION RULES:
- Keep responses SHORT (1-2 sentences)
- Use natural, conversational language
- Avoid bullet points and numbered lists
- Say numbers naturally: "one two three" not "123"
- Ask one question at a time
- Confirm before taking any actions
RESPONSE FORMAT:
- Direct, actionable responses
- No emojis or special characters
- No markdown formatting
You have access to these tools:
- get_order_status: Look up order information
- schedule_callback: Schedule a callback
- transfer_call: Transfer to human agent`
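With the Go SDK, a prompt like this can be attached once as a system instruction rather than re-sent on every turn. A minimal sketch (SystemInstruction is a field on genai.GenerativeModel, supported on Gemini 1.5 and later models):

model := client.GenerativeModel("gemini-2.5-flash-lite")
model.SystemInstruction = &genai.Content{
    Parts: []genai.Part{genai.Text(systemPrompt)},
}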
Handling Multi-turn Context
func buildGeminiContext(history []Message, userInput string) []genai.Part {
    var parts []genai.Part
    // Add system prompt
    parts = append(parts, genai.Text(systemPrompt))
    // Add conversation history (last 10 turns = 20 messages)
    recentHistory := history
    if len(history) > 20 {
        recentHistory = history[len(history)-20:]
    }
    for _, msg := range recentHistory {
        prefix := "User: "
        if msg.Role == "assistant" {
            prefix = "Assistant: "
        }
        parts = append(parts, genai.Text(prefix+msg.Content))
    }
    // Add current user input
    parts = append(parts, genai.Text("User: "+userInput))
    return parts
}
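Note that the Go SDK also has a native multi-turn primitive, genai.ChatSession, which tracks history for you. A minimal sketch of the same flow (the sample history turns are illustrative):

cs := model.StartChat()
// Seed prior turns; each history entry is a *genai.Content with a role.
cs.History = []*genai.Content{
    {Role: "user", Parts: []genai.Part{genai.Text("Where is my order?")}},
    {Role: "model", Parts: []genai.Part{genai.Text("Could you share your order ID?")}},
}
resp, err := cs.SendMessage(ctx, genai.Text(userInput))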
Error Handling
func (g *GeminiLLM) generateWithRetry(ctx context.Context, messages []Message) (*Response, error) {
    maxRetries := 3
    backoff := 200 * time.Millisecond
    for i := 0; i < maxRetries; i++ {
        resp, err := g.generate(ctx, messages)
        if err == nil {
            return resp, nil
        }
        // Rate limits: back off exponentially
        if strings.Contains(err.Error(), "429") || strings.Contains(err.Error(), "quota") {
            time.Sleep(backoff)
            backoff *= 2
            continue
        }
        // Transient server errors: retry after a fixed delay
        if strings.Contains(err.Error(), "500") || strings.Contains(err.Error(), "503") {
            time.Sleep(backoff)
            continue
        }
        return nil, err // Non-retryable
    }
    return nil, fmt.Errorf("max retries exceeded")
}
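Matching on err.Error() strings is fragile. Since the Go SDK is built on google.golang.org/api and typically surfaces HTTP failures as *googleapi.Error, a typed check is a sturdier alternative; a sketch, under that assumption:

func isRetryable(err error) bool {
    var gerr *googleapi.Error
    if errors.As(err, &gerr) {
        // 429 = rate limited; 5xx = transient server failure
        return gerr.Code == 429 || gerr.Code >= 500
    }
    return false
}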
Cost Tracking
func (g *GeminiLLM) trackUsage(resp *genai.GenerateContentResponse) {
    if resp.UsageMetadata != nil {
        inputTokens := resp.UsageMetadata.PromptTokenCount
        outputTokens := resp.UsageMetadata.CandidatesTokenCount
        metrics.RecordCounter("llm.gemini.input_tokens", int64(inputTokens))
        metrics.RecordCounter("llm.gemini.output_tokens", int64(outputTokens))
        // Gemini 2.5 Flash-Lite pricing
        inputCost := float64(inputTokens) * 0.000000075 // $0.075/1M tokens
        outputCost := float64(outputTokens) * 0.0000003 // $0.30/1M tokens
        metrics.RecordCounter("llm.gemini.cost_usd", inputCost+outputCost)
    }
}
Fallback to OpenAI
type LLMWithFallback struct {
    gemini *GeminiLLM
    openai *OpenAILLM
}

func (l *LLMWithFallback) Generate(ctx context.Context, messages []Message) (*Response, error) {
    // Try Gemini first (faster, cheaper)
    resp, err := l.gemini.Generate(ctx, messages)
    if err == nil {
        return resp, nil
    }
    log.Printf("Gemini failed: %v, falling back to OpenAI", err)
    // Fallback to OpenAI
    return l.openai.Generate(ctx, messages)
}
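For voice, it can also help to fall back on slowness, not just hard failure. A sketch that wraps the primary call in a latency budget (the 1.5s figure and the method name GenerateWithBudget are illustrative assumptions):

func (l *LLMWithFallback) GenerateWithBudget(ctx context.Context, messages []Message) (*Response, error) {
    // Give Gemini a bounded window before cutting over to OpenAI.
    budgetCtx, cancel := context.WithTimeout(ctx, 1500*time.Millisecond)
    defer cancel()
    resp, err := l.gemini.Generate(budgetCtx, messages)
    if err == nil {
        return resp, nil
    }
    // Timeout or error: retry on the caller's original context.
    return l.openai.Generate(ctx, messages)
}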
Next Steps
- Gemini Live - Native audio-to-audio
- OpenAI Configuration - For complex reasoning
- Function Calling - Add tools
- Latency Optimization - Reduce response time