# OpenAI LLM

OpenAI's GPT-4o provides the best overall reasoning capabilities and function-calling reliability, making it ideal for complex voice agent scenarios.
## Why OpenAI?

| Feature | GPT-4o | GPT-4o-mini |
|---|---|---|
| Reasoning | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Function Calling | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Time to First Token | ~250ms | ~180ms |
| Context Window | 128K | 128K |
| Cost per 1K tokens | $0.005 in / $0.015 out | $0.00015 in / $0.0006 out |
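As a rough worked example using the list prices above (the token counts are illustrative assumptions, not measurements), a typical voice turn on gpt-4o costs well under a cent:

```go
// Back-of-the-envelope cost per voice turn at gpt-4o list prices.
const (
	gpt4oInputPer1K  = 0.005 // USD per 1K input tokens
	gpt4oOutputPer1K = 0.015 // USD per 1K output tokens
)

func turnCostUSD(inputTokens, outputTokens int) float64 {
	return float64(inputTokens)/1000*gpt4oInputPer1K +
		float64(outputTokens)/1000*gpt4oOutputPer1K
}

// turnCostUSD(800, 60) ≈ $0.0049 — about half a cent per turn.
```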
## Configuration

### Basic Setup

```json
{
  "agent": {
    "name": "Customer Support",
    "llmProvider": "openai",
    "llmModel": "gpt-4o",
    "llmTemperature": 0.7,
    "prompt": "You are a helpful customer support agent..."
  }
}
```
### Environment Variables

```bash
OPENAI_API_KEY=your_openai_api_key
```
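If you construct the client yourself, reading the key from the environment is a one-liner. This sketch assumes the community `github.com/sashabaranov/go-openai` SDK, which matches the types used in the Go snippets below:

```go
import (
	"os"

	openai "github.com/sashabaranov/go-openai"
)

// newClient builds an OpenAI client from the environment variable above.
func newClient() *openai.Client {
	return openai.NewClient(os.Getenv("OPENAI_API_KEY"))
}
```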
### Advanced Configuration

```json
{
  "llmProvider": "openai",
  "llmModel": "gpt-4o",
  "llmConfig": {
    "temperature": 0.7,
    "max_tokens": 500,
    "top_p": 0.95,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0
  }
}
```
## Model Comparison

| Model | Speed | Intelligence | Cost | Best For |
|---|---|---|---|---|
| gpt-4o | 🚀 Fast | ⭐⭐⭐⭐⭐ | $$$$ | Complex reasoning, function calls |
| gpt-4o-mini | ⚡ Faster | ⭐⭐⭐⭐ | $ | Cost-sensitive, simpler tasks |
| gpt-4-turbo | 🐢 Slower | ⭐⭐⭐⭐⭐ | $$$$$ | Legacy, long context |
### When to Use Each

```text
gpt-4o:
├── Complex multi-step reasoning
├── Function calling with multiple tools
├── Handling ambiguous user requests
└── Premium customer experience

gpt-4o-mini:
├── Simple FAQ responses
├── Single function calls
├── High-volume, cost-sensitive
└── Straightforward conversations
```
## Implementation

### Streaming Response

```go
type OpenAILLM struct {
	client *openai.Client
	model  string
}

// StreamGenerate streams completion tokens over a channel so downstream
// stages (e.g., TTS) can start before the full response is generated.
func (o *OpenAILLM) StreamGenerate(ctx context.Context, messages []Message) <-chan string {
	tokenChan := make(chan string)

	go func() {
		defer close(tokenChan)

		// Convert our internal message format to the OpenAI request format.
		openaiMessages := convertMessages(messages)

		stream, err := o.client.CreateChatCompletionStream(ctx, openai.ChatCompletionRequest{
			Model:    o.model,
			Messages: openaiMessages,
			Stream:   true,
		})
		if err != nil {
			return // Closing the channel signals the caller that the stream ended.
		}
		defer stream.Close()

		for {
			response, err := stream.Recv()
			if err == io.EOF {
				return // Normal end of stream.
			}
			if err != nil {
				return // Transport or API error; end the stream.
			}
			if len(response.Choices) > 0 {
				if delta := response.Choices[0].Delta.Content; delta != "" {
					tokenChan <- delta
				}
			}
		}
	}()

	return tokenChan
}
```
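Consuming the stream is a plain `range` over the channel. A minimal sketch (`newClient` is the hypothetical helper from the setup section; in the real pipeline each token would feed the TTS stage instead of stdout):

```go
llm := &OpenAILLM{client: newClient(), model: "gpt-4o"}

for token := range llm.StreamGenerate(ctx, messages) {
	fmt.Print(token) // In production, forward tokens to TTS.
}
```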
### Function Calling

```go
func (o *OpenAILLM) GenerateWithTools(ctx context.Context, messages []Message, tools []Tool) (*Response, error) {
	openaiMessages := convertMessages(messages)
	openaiTools := convertTools(tools)

	resp, err := o.client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model:    o.model,
		Messages: openaiMessages,
		Tools:    openaiTools,
	})
	if err != nil {
		return nil, err
	}
	if len(resp.Choices) == 0 {
		return nil, fmt.Errorf("openai: empty response")
	}

	choice := resp.Choices[0]

	// If the model decided to call tools, surface the calls instead of text.
	if len(choice.Message.ToolCalls) > 0 {
		return &Response{
			ToolCalls: convertToolCalls(choice.Message.ToolCalls),
		}, nil
	}

	return &Response{
		Content: choice.Message.Content,
	}, nil
}
```
## Function Calling (Tools)

OpenAI has the most reliable function calling. Each tool is described with a JSON Schema definition:
### Define Tools

```go
tools := []openai.Tool{
	{
		Type: openai.ToolTypeFunction,
		Function: &openai.FunctionDefinition{
			Name:        "get_order_status",
			Description: "Get the current status of a customer order",
			Parameters: map[string]any{
				"type": "object",
				"properties": map[string]any{
					"order_id": map[string]any{
						"type":        "string",
						"description": "The order ID (e.g., ORD-12345)",
					},
				},
				"required": []string{"order_id"},
			},
		},
	},
	{
		Type: openai.ToolTypeFunction,
		Function: &openai.FunctionDefinition{
			Name:        "transfer_to_agent",
			Description: "Transfer the call to a human agent",
			Parameters: map[string]any{
				"type": "object",
				"properties": map[string]any{
					"department": map[string]any{
						"type": "string",
						"enum": []string{"sales", "support", "billing"},
					},
					"reason": map[string]any{
						"type": "string",
					},
				},
				"required": []string{"department"},
			},
		},
	},
}
```
### Handle Tool Calls

```go
func handleToolCall(call openai.ToolCall) (string, error) {
	var args map[string]any
	if err := json.Unmarshal([]byte(call.Function.Arguments), &args); err != nil {
		return "", fmt.Errorf("invalid tool arguments: %w", err)
	}

	switch call.Function.Name {
	case "get_order_status":
		orderID, ok := args["order_id"].(string)
		if !ok {
			return "", fmt.Errorf("order_id missing or not a string")
		}
		status := orderService.GetStatus(orderID)
		b, err := json.Marshal(status)
		if err != nil {
			return "", err
		}
		return string(b), nil
	case "transfer_to_agent":
		dept, ok := args["department"].(string)
		if !ok {
			return "", fmt.Errorf("department missing or not a string")
		}
		return transferToHuman(dept)
	default:
		return "", fmt.Errorf("unknown function: %s", call.Function.Name)
	}
}
```
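After executing a call, the result goes back to the model as a `tool`-role message so it can phrase the final answer for the caller. A sketch of the round trip, assuming the go-openai types used above and `resp` is the first response from a tools request:

```go
// Echo the assistant turn that requested the tools, then attach one
// tool-role message per call before asking the model to continue.
messages = append(messages, resp.Choices[0].Message)

for _, call := range resp.Choices[0].Message.ToolCalls {
	result, err := handleToolCall(call)
	if err != nil {
		result = fmt.Sprintf(`{"error": %q}`, err.Error())
	}
	messages = append(messages, openai.ChatCompletionMessage{
		Role:       openai.ChatMessageRoleTool,
		ToolCallID: call.ID,
		Content:    result,
	})
}

// Second request: the model now sees the tool output and answers in prose.
final, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
	Model:    "gpt-4o",
	Messages: messages,
})
if err != nil {
	return err
}
fmt.Println(final.Choices[0].Message.Content)
```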
## Prompt Engineering for Voice

### Voice-Optimized System Prompt

```go
systemPrompt := `You are a friendly customer support agent for Acme Corp.

IMPORTANT VOICE GUIDELINES:
- Keep responses SHORT (1-2 sentences max)
- Use conversational language
- Avoid bullet points and lists
- Don't say "I'd be happy to help" - just help
- Numbers: say "one two three" not "123"
- Dates: say "December twenty-ninth" not "12/29"

CONVERSATION RULES:
- Ask one question at a time
- Confirm before taking actions
- If unsure, ask for clarification

You have these tools:
- get_order_status: Look up order information
- transfer_to_agent: Transfer to human support`
```
### Response Length Control

```go
// Add a length instruction dynamically to keep spoken replies short.
func buildMessages(userInput string, context *ConversationContext) []Message {
	messages := []Message{
		{Role: "system", Content: systemPrompt},
	}

	// Add recent conversation history.
	messages = append(messages, context.RecentMessages...)

	// Add the user input with explicit length guidance.
	messages = append(messages, Message{
		Role:    "user",
		Content: userInput + "\n\n[Respond in 1-2 short sentences]",
	})

	return messages
}
```
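In the same spirit, capping how much history goes into `RecentMessages` bounds input tokens, and therefore latency and cost. A hypothetical trimming helper; the window size is an assumption to tune per agent:

```go
// trimHistory keeps only the most recent maxMessages entries; older turns
// add input tokens without much benefit in short voice interactions.
func trimHistory(history []Message, maxMessages int) []Message {
	if len(history) <= maxMessages {
		return history
	}
	return history[len(history)-maxMessages:]
}
```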
## Latency Optimization

### 1. Use gpt-4o-mini for Simple Tasks

```go
func selectModel(intent string) string {
	switch intent {
	case "greeting", "farewell", "confirmation":
		return "gpt-4o-mini" // Fast, simple
	case "complex_query", "multi_tool":
		return "gpt-4o" // Smarter, more reliable tool use
	default:
		return "gpt-4o-mini"
	}
}
```
### 2. Parallel Tool Calling

GPT-4o can return multiple tool calls in a single response, which you can then execute concurrently:

```text
User: "What's my order status and when is the store open?"

GPT-4o returns both tool calls at once:
ToolCalls: [
  {Name: "get_order_status", Args: {order_id: "12345"}},
  {Name: "get_store_hours", Args: {location: "downtown"}},
]
```

```go
// Execute the tool calls in parallel and collect results by index.
var wg sync.WaitGroup
results := make([]ToolResult, len(toolCalls))

for i, call := range toolCalls {
	wg.Add(1)
	go func(idx int, tc ToolCall) {
		defer wg.Done()
		results[idx] = executeToolCall(tc)
	}(i, call)
}
wg.Wait()
```
### 3. Streaming with Early TTS

```go
// Start TTS before the LLM finishes generating the full response.
func streamToTTS(llmStream <-chan string, tts TTS) {
	var buffer strings.Builder

	for token := range llmStream {
		buffer.WriteString(token)

		// Flush at each sentence boundary so TTS can start immediately.
		if strings.ContainsAny(token, ".!?") {
			tts.StreamSynthesize(buffer.String())
			buffer.Reset()
		}
	}

	// Flush any trailing text that didn't end with sentence punctuation.
	if buffer.Len() > 0 {
		tts.StreamSynthesize(buffer.String())
	}
}
```
## Error Handling

```go
func (o *OpenAILLM) generateWithRetry(ctx context.Context, messages []Message) (*Response, error) {
	maxRetries := 3
	backoff := 500 * time.Millisecond

	for i := 0; i < maxRetries; i++ {
		resp, err := o.generate(ctx, messages)
		if err == nil {
			return resp, nil
		}

		// Retry only rate limits and transient server errors.
		var apiErr *openai.APIError
		if errors.As(err, &apiErr) {
			switch apiErr.HTTPStatusCode {
			case 429: // Rate limit: back off exponentially.
				time.Sleep(backoff)
				backoff *= 2
				continue
			case 500, 502, 503: // Transient server error: retry after a pause.
				time.Sleep(backoff)
				continue
			default:
				return nil, err // Non-retryable (e.g., 400, 401).
			}
		}
		return nil, err
	}

	return nil, fmt.Errorf("max retries exceeded")
}
```
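One caveat with this sketch: `time.Sleep` ignores cancellation, so a retry can outlive a caller who has already hung up. A context-aware wait is a small, safe substitute (an assumed helper, not part of the SDK):

```go
// sleepCtx waits for d or until ctx is cancelled, whichever comes first.
func sleepCtx(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()

	select {
	case <-ctx.Done():
		return ctx.Err() // Caller hung up or request timed out.
	case <-timer.C:
		return nil
	}
}
```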
### Fallback Strategy

```go
type LLMWithFallback struct {
	primary   LLM // gpt-4o
	secondary LLM // gpt-4o-mini
	tertiary  LLM // gemini
}

func (l *LLMWithFallback) Generate(ctx context.Context, messages []Message) (*Response, error) {
	// Try primary (best quality).
	resp, err := l.primary.Generate(ctx, messages)
	if err == nil {
		return resp, nil
	}
	log.Printf("Primary LLM failed: %v", err)

	// Try secondary (faster, cheaper).
	resp, err = l.secondary.Generate(ctx, messages)
	if err == nil {
		return resp, nil
	}
	log.Printf("Secondary LLM failed: %v", err)

	// Try tertiary (different provider).
	return l.tertiary.Generate(ctx, messages)
}
```
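Wiring the chain is just ordering providers by preference. The constructors below (`NewOpenAILLM`, `NewGeminiLLM`) are hypothetical names for whatever factory functions your codebase exposes:

```go
llm := &LLMWithFallback{
	primary:   NewOpenAILLM("gpt-4o"),           // best quality
	secondary: NewOpenAILLM("gpt-4o-mini"),      // same provider, cheaper
	tertiary:  NewGeminiLLM("gemini-1.5-flash"), // different provider for outages
}
```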
## Cost Optimization

### Token Usage Tracking

```go
func (o *OpenAILLM) trackUsage(resp openai.ChatCompletionResponse) {
	metrics.RecordCounter("llm.openai.input_tokens", int64(resp.Usage.PromptTokens))
	metrics.RecordCounter("llm.openai.output_tokens", int64(resp.Usage.CompletionTokens))

	// Calculate cost at gpt-4o list prices.
	inputCost := float64(resp.Usage.PromptTokens) * 0.000005    // $5 / 1M tokens
	outputCost := float64(resp.Usage.CompletionTokens) * 0.000015 // $15 / 1M tokens
	metrics.RecordCounter("llm.openai.cost_usd", inputCost+outputCost)
}
```
### Prompt Compression

```go
// Remove unnecessary whitespace and formatting before sending the prompt.
func compressPrompt(prompt string) string {
	// Collapse runs of 3+ newlines into a single blank line.
	prompt = regexp.MustCompile(`\n{3,}`).ReplaceAllString(prompt, "\n\n")

	// Trim leading/trailing whitespace from each line.
	lines := strings.Split(prompt, "\n")
	for i, line := range lines {
		lines[i] = strings.TrimSpace(line)
	}
	return strings.Join(lines, "\n")
}
```
## Next Steps

- Gemini Configuration - Faster, cheaper alternative
- Gemini Live - Native audio-to-audio
- Function Calling - Advanced tool use
- Latency Optimization - Reduce response time