# OpenAI LLM

OpenAI's GPT-4o provides the best overall reasoning capabilities and function-calling reliability, making it ideal for complex voice agent scenarios.
## Why OpenAI?

| Feature | GPT-4o | GPT-4o-mini |
|---|---|---|
| Reasoning | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Function Calling | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Time to First Token | ~250ms | ~180ms |
| Context Window | 128K | 128K |
| Cost per 1K tokens | $0.005 in / $0.015 out | $0.00015 in / $0.0006 out |
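As a rough worked example using the list prices above (the token counts are illustrative assumptions, not measurements), a typical voice turn on gpt-4o costs well under a cent:

```go
// Back-of-the-envelope cost per voice turn at gpt-4o list prices.
const (
	gpt4oInputPer1K  = 0.005 // USD per 1K input tokens
	gpt4oOutputPer1K = 0.015 // USD per 1K output tokens
)

func turnCostUSD(inputTokens, outputTokens int) float64 {
	return float64(inputTokens)/1000*gpt4oInputPer1K +
		float64(outputTokens)/1000*gpt4oOutputPer1K
}

// turnCostUSD(800, 60) ≈ $0.0049 — about half a cent per turn.
```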
## Configuration

### Basic Setup

```json
{
  "agent": {
    "name": "Customer Support",
    "llmProvider": "openai",
    "llmModel": "gpt-4o",
    "llmTemperature": 0.7,
    "prompt": "You are a helpful customer support agent..."
  }
}
```
### Environment Variables

```bash
OPENAI_API_KEY=your_openai_api_key
```
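If you construct the client yourself, reading the key from the environment is a one-liner. This sketch assumes the community `github.com/sashabaranov/go-openai` SDK, which matches the types used in the Go snippets below:

```go
import (
	"os"

	openai "github.com/sashabaranov/go-openai"
)

// newClient builds an OpenAI client from the environment variable above.
func newClient() *openai.Client {
	return openai.NewClient(os.Getenv("OPENAI_API_KEY"))
}
```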
### Advanced Configuration

```json
{
  "llmProvider": "openai",
  "llmModel": "gpt-4o",
  "llmConfig": {
    "temperature": 0.7,
    "max_tokens": 500,
    "top_p": 0.95,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0
  }
}
```
## Model Comparison

| Model | Speed | Intelligence | Cost | Best For |
|---|---|---|---|---|
| gpt-4o | 🚀 Fast | ⭐⭐⭐⭐⭐ | $$$$ | Complex reasoning, function calls |
| gpt-4o-mini | ⚡ Faster | ⭐⭐⭐⭐ | $ | Cost-sensitive, simpler tasks |
| gpt-4-turbo | 🐢 Slower | ⭐⭐⭐⭐⭐ | $$$$$ | Legacy, long context |
### When to Use Each

```text
gpt-4o:
├── Complex multi-step reasoning
├── Function calling with multiple tools
├── Handling ambiguous user requests
└── Premium customer experience

gpt-4o-mini:
├── Simple FAQ responses
├── Single function calls
├── High-volume, cost-sensitive
└── Straightforward conversations
```
## Implementation

### Streaming Response

```go
type OpenAILLM struct {
	client *openai.Client
	model  string
}

// StreamGenerate streams completion tokens over a channel so downstream
// stages (e.g., TTS) can start before the full response is generated.
func (o *OpenAILLM) StreamGenerate(ctx context.Context, messages []Message) <-chan string {
	tokenChan := make(chan string)

	go func() {
		defer close(tokenChan)

		// Convert our internal message format to the OpenAI request format.
		openaiMessages := convertMessages(messages)

		stream, err := o.client.CreateChatCompletionStream(ctx, openai.ChatCompletionRequest{
			Model:    o.model,
			Messages: openaiMessages,
			Stream:   true,
		})
		if err != nil {
			return // Closing the channel signals the caller that the stream ended.
		}
		defer stream.Close()

		for {
			response, err := stream.Recv()
			if err == io.EOF {
				return // Normal end of stream.
			}
			if err != nil {
				return // Transport or API error; end the stream.
			}
			if len(response.Choices) > 0 {
				if delta := response.Choices[0].Delta.Content; delta != "" {
					tokenChan <- delta
				}
			}
		}
	}()

	return tokenChan
}
```
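Consuming the stream is a plain `range` over the channel. A minimal sketch (`newClient` is the hypothetical helper from the setup section; in the real pipeline each token would feed the TTS stage instead of stdout):

```go
llm := &OpenAILLM{client: newClient(), model: "gpt-4o"}

for token := range llm.StreamGenerate(ctx, messages) {
	fmt.Print(token) // In production, forward tokens to TTS.
}
```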
### Function Calling

```go
func (o *OpenAILLM) GenerateWithTools(ctx context.Context, messages []Message, tools []Tool) (*Response, error) {
	openaiMessages := convertMessages(messages)
	openaiTools := convertTools(tools)

	resp, err := o.client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
		Model:    o.model,
		Messages: openaiMessages,
		Tools:    openaiTools,
	})
	if err != nil {
		return nil, err
	}
	if len(resp.Choices) == 0 {
		return nil, fmt.Errorf("openai: empty response")
	}

	choice := resp.Choices[0]

	// If the model decided to call tools, surface the calls instead of text.
	if len(choice.Message.ToolCalls) > 0 {
		return &Response{
			ToolCalls: convertToolCalls(choice.Message.ToolCalls),
		}, nil
	}

	return &Response{
		Content: choice.Message.Content,
	}, nil
}
```
## Function Calling (Tools)

OpenAI has the most reliable function calling. Each tool is described with a JSON Schema definition:
### Define Tools

```go
tools := []openai.Tool{
	{
		Type: openai.ToolTypeFunction,
		Function: &openai.FunctionDefinition{
			Name:        "get_order_status",
			Description: "Get the current status of a customer order",
			Parameters: map[string]any{
				"type": "object",
				"properties": map[string]any{
					"order_id": map[string]any{
						"type":        "string",
						"description": "The order ID (e.g., ORD-12345)",
					},
				},
				"required": []string{"order_id"},
			},
		},
	},
	{
		Type: openai.ToolTypeFunction,
		Function: &openai.FunctionDefinition{
			Name:        "transfer_to_agent",
			Description: "Transfer the call to a human agent",
			Parameters: map[string]any{
				"type": "object",
				"properties": map[string]any{
					"department": map[string]any{
						"type": "string",
						"enum": []string{"sales", "support", "billing"},
					},
					"reason": map[string]any{
						"type": "string",
					},
				},
				"required": []string{"department"},
			},
		},
	},
}
```
### Handle Tool Calls

```go
func handleToolCall(call openai.ToolCall) (string, error) {
	var args map[string]any
	if err := json.Unmarshal([]byte(call.Function.Arguments), &args); err != nil {
		return "", fmt.Errorf("invalid tool arguments: %w", err)
	}

	switch call.Function.Name {
	case "get_order_status":
		orderID, ok := args["order_id"].(string)
		if !ok {
			return "", fmt.Errorf("order_id missing or not a string")
		}
		status := orderService.GetStatus(orderID)
		b, err := json.Marshal(status)
		if err != nil {
			return "", err
		}
		return string(b), nil
	case "transfer_to_agent":
		dept, ok := args["department"].(string)
		if !ok {
			return "", fmt.Errorf("department missing or not a string")
		}
		return transferToHuman(dept)
	default:
		return "", fmt.Errorf("unknown function: %s", call.Function.Name)
	}
}
```
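After executing a call, the result goes back to the model as a `tool`-role message so it can phrase the final answer for the caller. A sketch of the round trip, assuming the go-openai types used above and `resp` is the first response from a tools request:

```go
// Echo the assistant turn that requested the tools, then attach one
// tool-role message per call before asking the model to continue.
messages = append(messages, resp.Choices[0].Message)

for _, call := range resp.Choices[0].Message.ToolCalls {
	result, err := handleToolCall(call)
	if err != nil {
		result = fmt.Sprintf(`{"error": %q}`, err.Error())
	}
	messages = append(messages, openai.ChatCompletionMessage{
		Role:       openai.ChatMessageRoleTool,
		ToolCallID: call.ID,
		Content:    result,
	})
}

// Second request: the model now sees the tool output and answers in prose.
final, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
	Model:    "gpt-4o",
	Messages: messages,
})
if err != nil {
	return err
}
fmt.Println(final.Choices[0].Message.Content)
```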
## Prompt Engineering for Voice

### Voice-Optimized System Prompt

```go
systemPrompt := `You are a friendly customer support agent for Acme Corp.

IMPORTANT VOICE GUIDELINES:
- Keep responses SHORT (1-2 sentences max)
- Use conversational language
- Avoid bullet points and lists
- Don't say "I'd be happy to help" - just help
- Numbers: say "one two three" not "123"
- Dates: say "December twenty-ninth" not "12/29"

CONVERSATION RULES:
- Ask one question at a time
- Confirm before taking actions
- If unsure, ask for clarification

You have these tools:
- get_order_status: Look up order information
- transfer_to_agent: Transfer to human support`
```
### Response Length Control

```go
// Add a length instruction dynamically to keep spoken replies short.
func buildMessages(userInput string, context *ConversationContext) []Message {
	messages := []Message{
		{Role: "system", Content: systemPrompt},
	}

	// Add recent conversation history.
	messages = append(messages, context.RecentMessages...)

	// Add the user input with explicit length guidance.
	messages = append(messages, Message{
		Role:    "user",
		Content: userInput + "\n\n[Respond in 1-2 short sentences]",
	})

	return messages
}
```
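In the same spirit, capping how much history goes into `RecentMessages` bounds input tokens, and therefore latency and cost. A hypothetical trimming helper; the window size is an assumption to tune per agent:

```go
// trimHistory keeps only the most recent maxMessages entries; older turns
// add input tokens without much benefit in short voice interactions.
func trimHistory(history []Message, maxMessages int) []Message {
	if len(history) <= maxMessages {
		return history
	}
	return history[len(history)-maxMessages:]
}
```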
## Latency Optimization

### 1. Use gpt-4o-mini for Simple Tasks

```go
func selectModel(intent string) string {
	switch intent {
	case "greeting", "farewell", "confirmation":
		return "gpt-4o-mini" // Fast, simple
	case "complex_query", "multi_tool":
		return "gpt-4o" // Smarter, more reliable tool use
	default:
		return "gpt-4o-mini"
	}
}
```
### 2. Parallel Tool Calling

GPT-4o can return multiple tool calls in a single response, which you can then execute concurrently:

```text
User: "What's my order status and when is the store open?"

GPT-4o returns both tool calls at once:
ToolCalls: [
  {Name: "get_order_status", Args: {order_id: "12345"}},
  {Name: "get_store_hours", Args: {location: "downtown"}},
]
```

```go
// Execute the tool calls in parallel and collect results by index.
var wg sync.WaitGroup
results := make([]ToolResult, len(toolCalls))

for i, call := range toolCalls {
	wg.Add(1)
	go func(idx int, tc ToolCall) {
		defer wg.Done()
		results[idx] = executeToolCall(tc)
	}(i, call)
}
wg.Wait()
```
### 3. Streaming with Early TTS

```go
// Start TTS before the LLM finishes generating the full response.
func streamToTTS(llmStream <-chan string, tts TTS) {
	var buffer strings.Builder

	for token := range llmStream {
		buffer.WriteString(token)

		// Flush at each sentence boundary so TTS can start immediately.
		if strings.ContainsAny(token, ".!?") {
			tts.StreamSynthesize(buffer.String())
			buffer.Reset()
		}
	}

	// Flush any trailing text that didn't end with sentence punctuation.
	if buffer.Len() > 0 {
		tts.StreamSynthesize(buffer.String())
	}
}
```
## Error Handling

```go
func (o *OpenAILLM) generateWithRetry(ctx context.Context, messages []Message) (*Response, error) {
	maxRetries := 3
	backoff := 500 * time.Millisecond

	for i := 0; i < maxRetries; i++ {
		resp, err := o.generate(ctx, messages)
		if err == nil {
			return resp, nil
		}

		// Retry only rate limits and transient server errors.
		var apiErr *openai.APIError
		if errors.As(err, &apiErr) {
			switch apiErr.HTTPStatusCode {
			case 429: // Rate limit: back off exponentially.
				time.Sleep(backoff)
				backoff *= 2
				continue
			case 500, 502, 503: // Transient server error: retry after a pause.
				time.Sleep(backoff)
				continue
			default:
				return nil, err // Non-retryable (e.g., 400, 401).
			}
		}
		return nil, err
	}

	return nil, fmt.Errorf("max retries exceeded")
}
```
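One caveat with this sketch: `time.Sleep` ignores cancellation, so a retry can outlive a caller who has already hung up. A context-aware wait is a small, safe substitute (an assumed helper, not part of the SDK):

```go
// sleepCtx waits for d or until ctx is cancelled, whichever comes first.
func sleepCtx(ctx context.Context, d time.Duration) error {
	timer := time.NewTimer(d)
	defer timer.Stop()

	select {
	case <-ctx.Done():
		return ctx.Err() // Caller hung up or request timed out.
	case <-timer.C:
		return nil
	}
}
```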
### Fallback Strategy

```go
type LLMWithFallback struct {
	primary   LLM // gpt-4o
	secondary LLM // gpt-4o-mini
	tertiary  LLM // gemini
}

func (l *LLMWithFallback) Generate(ctx context.Context, messages []Message) (*Response, error) {
	// Try primary (best quality).
	resp, err := l.primary.Generate(ctx, messages)
	if err == nil {
		return resp, nil
	}
	log.Printf("Primary LLM failed: %v", err)

	// Try secondary (faster, cheaper).
	resp, err = l.secondary.Generate(ctx, messages)
	if err == nil {
		return resp, nil
	}
	log.Printf("Secondary LLM failed: %v", err)

	// Try tertiary (different provider).
	return l.tertiary.Generate(ctx, messages)
}
```
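Wiring the chain is just ordering providers by preference. The constructors below (`NewOpenAILLM`, `NewGeminiLLM`) are hypothetical names for whatever factory functions your codebase exposes:

```go
llm := &LLMWithFallback{
	primary:   NewOpenAILLM("gpt-4o"),           // best quality
	secondary: NewOpenAILLM("gpt-4o-mini"),      // same provider, cheaper
	tertiary:  NewGeminiLLM("gemini-1.5-flash"), // different provider for outages
}
```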
## Cost Optimization

### Token Usage Tracking

```go
func (o *OpenAILLM) trackUsage(resp openai.ChatCompletionResponse) {
	metrics.RecordCounter("llm.openai.input_tokens", int64(resp.Usage.PromptTokens))
	metrics.RecordCounter("llm.openai.output_tokens", int64(resp.Usage.CompletionTokens))

	// Calculate cost at gpt-4o list prices.
	inputCost := float64(resp.Usage.PromptTokens) * 0.000005    // $5 / 1M tokens
	outputCost := float64(resp.Usage.CompletionTokens) * 0.000015 // $15 / 1M tokens
	metrics.RecordCounter("llm.openai.cost_usd", inputCost+outputCost)
}
```
### Prompt Compression

```go
// Remove unnecessary whitespace and formatting before sending the prompt.
func compressPrompt(prompt string) string {
	// Collapse runs of 3+ newlines into a single blank line.
	prompt = regexp.MustCompile(`\n{3,}`).ReplaceAllString(prompt, "\n\n")

	// Trim leading/trailing whitespace from each line.
	lines := strings.Split(prompt, "\n")
	for i, line := range lines {
		lines[i] = strings.TrimSpace(line)
	}
	return strings.Join(lines, "\n")
}
```
## Next Steps

- Gemini Configuration - Faster, cheaper alternative
- Gemini Live - Native audio-to-audio
- Function Calling - Advanced tool use
- Latency Optimization - Reduce response time