# Anthropic Claude LLM

Claude 3.5 Sonnet provides excellent instruction following and nuanced responses, making it ideal for complex customer interactions.

## Why Claude?
| Feature | Claude 3.5 Sonnet | GPT-4o |
|---|---|---|
| Instruction Following | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Context Window | 200K tokens | 128K tokens |
| Time to First Token | ~220ms | ~250ms |
| Safety | Excellent | Good |
| Cost (input / output per 1M tokens) | $3 / $15 | $5 / $15 |
**Best for:** complex conversations requiring nuance and safety-critical applications.
## Configuration

### Basic Setup

```json
{
  "agent": {
    "name": "Premium Support",
    "llmProvider": "anthropic",
    "llmModel": "claude-3-5-sonnet-20241022",
    "llmTemperature": 0.7,
    "prompt": "You are a helpful customer support agent..."
  }
}
```
### Environment Variables

```bash
ANTHROPIC_API_KEY=your_anthropic_api_key
```
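Read the key from the environment at startup rather than hard-coding it. A minimal sketch, assuming the `NewAnthropicLLM` constructor from the implementation section below (imports `log` and `os`):

```go
apiKey := os.Getenv("ANTHROPIC_API_KEY")
if apiKey == "" {
	log.Fatal("ANTHROPIC_API_KEY is not set")
}
llm := NewAnthropicLLM(apiKey, "claude-3-5-sonnet-20241022")
```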
### Advanced Configuration

```json
{
  "llmProvider": "anthropic",
  "llmModel": "claude-3-5-sonnet-20241022",
  "llmConfig": {
    "temperature": 0.7,
    "max_tokens": 500,
    "top_p": 0.9,
    "top_k": 40
  }
}
```
## Model Comparison
| Model | Speed | Intelligence | Cost | Best For |
|---|---|---|---|---|
| claude-3-5-sonnet | 🚀 Fast | ⭐⭐⭐⭐⭐ | $$$ | Production voice |
| claude-3-opus | 🐢 Slower | ⭐⭐⭐⭐⭐ | $$$$ | Complex reasoning |
| claude-3-haiku | ⚡ Fastest | ⭐⭐⭐⭐ | $ | Cost-sensitive |
## Implementation

### Streaming Response

```go
type AnthropicLLM struct {
	client *anthropic.Client
	model  string
}

func NewAnthropicLLM(apiKey, model string) *AnthropicLLM {
	return &AnthropicLLM{
		client: anthropic.NewClient(apiKey),
		model:  model,
	}
}

// StreamGenerate yields tokens on the returned channel as they arrive.
// The channel is closed when the stream ends or an error occurs.
func (a *AnthropicLLM) StreamGenerate(ctx context.Context, messages []Message) <-chan string {
	tokenChan := make(chan string)
	go func() {
		defer close(tokenChan)

		// Convert to Anthropic's message format.
		anthropicMessages := convertMessages(messages)

		stream, err := a.client.Messages.Stream(ctx, anthropic.MessageCreateParams{
			Model:     a.model,
			MaxTokens: 500,
			Messages:  anthropicMessages,
		})
		if err != nil {
			return // closing the channel signals the end; surface the error in production code
		}
		defer stream.Close()

		for {
			event, err := stream.Recv()
			if err == io.EOF {
				return // stream finished normally
			}
			if err != nil {
				return
			}
			if delta, ok := event.(anthropic.ContentBlockDelta); ok {
				if text := delta.Delta.Text; text != "" {
					select {
					case tokenChan <- text:
					case <-ctx.Done():
						return // don't block if the caller has gone away
					}
				}
			}
		}
	}()
	return tokenChan
}
```
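The `convertMessages` helper above is referenced but not defined; here is a minimal sketch, assuming the wrapper's `Message` and the SDK's message type both carry `Role` and `Content` strings:

```go
// Sketch only: maps the wrapper's Message type onto the SDK's. The
// Anthropic API takes the system prompt as a top-level field, so
// system messages are skipped rather than sent as conversation turns.
func convertMessages(messages []Message) []anthropic.Message {
	out := make([]anthropic.Message, 0, len(messages))
	for _, m := range messages {
		if m.Role == "system" {
			continue
		}
		out = append(out, anthropic.Message{
			Role:    m.Role,
			Content: m.Content,
		})
	}
	return out
}
```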
### Function Calling (Tools)

```go
func (a *AnthropicLLM) GenerateWithTools(ctx context.Context, messages []Message, tools []Tool) (*Response, error) {
	// Convert tools to Anthropic's format.
	anthropicTools := make([]anthropic.Tool, len(tools))
	for i, tool := range tools {
		anthropicTools[i] = anthropic.Tool{
			Name:        tool.Name,
			Description: tool.Description,
			InputSchema: tool.Parameters,
		}
	}

	resp, err := a.client.Messages.Create(ctx, anthropic.MessageCreateParams{
		Model:     a.model,
		MaxTokens: 500,
		Messages:  convertMessages(messages),
		Tools:     anthropicTools,
	})
	if err != nil {
		return nil, err
	}

	// If the model decided to call a tool, return the first tool call.
	for _, block := range resp.Content {
		if toolUse, ok := block.(anthropic.ToolUseBlock); ok {
			return &Response{
				ToolCalls: []ToolCall{{
					ID:        toolUse.ID,
					Name:      toolUse.Name,
					Arguments: toolUse.Input,
				}},
			}, nil
		}
	}

	// Otherwise extract the plain text response.
	return extractTextResponse(resp), nil
}
```
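A hypothetical calling pattern for reference: the `get_order_status` name matches the system prompt example below, `Tool`, `ToolCall`, and `Response` are the wrapper types above, and `Parameters` is assumed to accept a JSON Schema as a map:

```go
tools := []Tool{{
	Name:        "get_order_status",
	Description: "Look up the status of an order by order number",
	Parameters: map[string]any{
		"type": "object",
		"properties": map[string]any{
			"order_id": map[string]any{"type": "string"},
		},
		"required": []string{"order_id"},
	},
}}

resp, err := llm.GenerateWithTools(ctx, messages, tools)
if err != nil {
	// Handle or retry; see the Error Handling section below.
}
for _, call := range resp.ToolCalls {
	if call.Name == "get_order_status" {
		// Run the lookup, then send the result back in a follow-up turn.
	}
}
```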
## System Prompt Best Practices

Claude excels at following detailed instructions:

```go
systemPrompt := `You are Alex, a customer support agent for Acme Corp.

<persona>
- Warm and professional tone
- Patient with frustrated customers
- Admits uncertainty rather than guessing
</persona>

<voice_guidelines>
- Keep responses to 1-2 sentences
- Use conversational language
- Avoid bullet points and lists
- Numbers: "one two three" not "123"
</voice_guidelines>

<boundaries>
- Only discuss Acme products and services
- Don't make promises about refunds over $100
- Transfer to human for: legal, compliance, security issues
</boundaries>

<tools>
You have access to:
- get_order_status: Look up order information
- transfer_call: Connect to human agent
</tools>

<examples>
User: "Where's my order?"
Good: "I'd be happy to check that for you. What's your order number?"
Bad: "I can help with that! To look up your order status, I'll need your order number. Our orders typically ship within 2-3 business days and..."
</examples>`
```
## Claude's Unique Strengths

### 1. Nuanced Understanding

Claude handles ambiguous requests well:

```
User: "I'm not sure if I want this anymore"
Claude: "I understand you're having second thoughts. Would you like me to explain the return process, or would you prefer to discuss what's making you hesitate?"
```
### 2. Safety and Boundaries

Claude naturally respects boundaries:

```go
systemPrompt := `You are a bank support agent.

NEVER:
- Share account numbers over the phone
- Process transactions without verification
- Discuss other customers' accounts

When asked to do these things, politely explain you cannot.`
// Claude will refuse gracefully without being preachy.
```
### 3. Long Context

Use the 200K-token context window for detailed history:

```go
func buildContext(history []Message, documents []Document) []Message {
	messages := []Message{{Role: "system", Content: systemPrompt}}

	// Add relevant documents (Claude handles long context well).
	for _, doc := range documents {
		messages = append(messages, Message{
			Role:    "user",
			Content: fmt.Sprintf("<document name=\"%s\">%s</document>", doc.Name, doc.Content),
		})
	}

	// Add conversation history.
	messages = append(messages, history...)
	return messages
}
```
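A long window is not free: larger payloads cost more and raise time-to-first-token, which matters on a live call. A rough sketch of capping history size, assuming the `Message` type above and the common (approximate) heuristic of four characters per token:

```go
// trimToBudget keeps the system prompt plus the most recent turns that
// fit within roughly maxTokens. Character count / 4 is an estimate, not
// an exact token count.
func trimToBudget(messages []Message, maxTokens int) []Message {
	budget := maxTokens * 4 // approximate character budget
	kept := []Message{}
	if len(messages) > 0 && messages[0].Role == "system" {
		kept = append(kept, messages[0]) // always keep the system prompt
		budget -= len(messages[0].Content)
		messages = messages[1:]
	}
	// Walk backwards so the most recent turns survive.
	start := len(messages)
	total := 0
	for i := len(messages) - 1; i >= 0; i-- {
		total += len(messages[i].Content)
		if total > budget {
			break
		}
		start = i
	}
	return append(kept, messages[start:]...)
}
```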
## Latency Optimization

### 1. Use Haiku for Simple Tasks

```go
func selectModel(intent string) string {
	switch intent {
	case "greeting", "confirmation", "farewell":
		return "claude-3-haiku-20240307" // fast and cheap
	case "complex_query", "reasoning":
		return "claude-3-5-sonnet-20241022" // more capable
	default:
		return "claude-3-haiku-20240307"
	}
}
```
### 2. Prompt Caching (Beta)

Reduce latency for repeated prompts:

```go
// Cache the system prompt across requests.
resp, err := client.Messages.Create(ctx, anthropic.MessageCreateParams{
	Model: "claude-3-5-sonnet-20241022",
	System: anthropic.SystemPrompt{
		Text: systemPrompt,
		CacheControl: &anthropic.CacheControl{
			Type: "ephemeral",
		},
	},
	Messages: messages,
})
```
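At the time of writing, ephemeral cache entries expire after roughly five minutes of inactivity, cache writes are billed above the normal input rate, and cache reads well below it, so caching pays off for long, stable system prompts reused across many turns.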
### 3. Shorter Responses

```go
systemPrompt := `Keep all responses under 30 words.
Be direct and actionable.
Ask only one question at a time.`
```
## Error Handling

```go
func (a *AnthropicLLM) generateWithRetry(ctx context.Context, messages []Message) (*Response, error) {
	maxRetries := 3
	backoff := 500 * time.Millisecond

	for i := 0; i < maxRetries; i++ {
		resp, err := a.generate(ctx, messages)
		if err == nil {
			return resp, nil
		}

		var apiErr *anthropic.APIError
		if errors.As(err, &apiErr) {
			switch apiErr.StatusCode {
			case 429:
				// Rate limited: honor the server's Retry-After hint.
				time.Sleep(parseRetryAfter(apiErr.Headers))
				continue
			case 529, 500, 502, 503:
				// Overloaded or transient server error: exponential backoff.
				time.Sleep(backoff)
				backoff *= 2
				continue
			default:
				// Client errors (4xx) won't succeed on retry.
				return nil, err
			}
		}
		return nil, err
	}
	return nil, fmt.Errorf("anthropic: max retries exceeded")
}
```
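The `parseRetryAfter` helper above is left to the reader; a minimal sketch, assuming the SDK surfaces response headers as an `http.Header` (imports `net/http`, `strconv`, and `time`):

```go
// Sketch only: falls back to one second when the Retry-After header
// is absent or malformed.
func parseRetryAfter(headers http.Header) time.Duration {
	if v := headers.Get("Retry-After"); v != "" {
		if secs, err := strconv.Atoi(v); err == nil && secs > 0 {
			return time.Duration(secs) * time.Second
		}
	}
	return time.Second
}
```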
## Cost Tracking

```go
func (a *AnthropicLLM) trackUsage(resp *anthropic.MessageResponse) {
	inputTokens := resp.Usage.InputTokens
	outputTokens := resp.Usage.OutputTokens

	metrics.RecordCounter("llm.anthropic.input_tokens", int64(inputTokens))
	metrics.RecordCounter("llm.anthropic.output_tokens", int64(outputTokens))

	// Claude 3.5 Sonnet pricing at the time of writing.
	inputCost := float64(inputTokens) * 0.000003   // $3 per 1M tokens
	outputCost := float64(outputTokens) * 0.000015 // $15 per 1M tokens

	// Counters take integers, so track cost in micro-dollars.
	metrics.RecordCounter("llm.anthropic.cost_microusd", int64((inputCost+outputCost)*1e6))
}
```
## Comparison with GPT-4o

| Aspect | Claude 3.5 Sonnet | GPT-4o |
|---|---|---|
| Instruction following | Excellent | Excellent |
| Function calling | Good | Best |
| Long context | 200K tokens | 128K tokens |
| Safety | More cautious | Less cautious |
| Creativity | More creative | More factual |
| Latency | Similar | Similar |
### When to Choose Claude
- Complex, nuanced conversations
- Safety-critical applications
- Long document context needed
- Creative problem-solving
### When to Choose GPT-4o
- Heavy function calling
- Structured output requirements
- Multi-modal (image) needs
## Next Steps
- OpenAI Configuration - GPT-4o setup
- Gemini Configuration - Faster alternative
- Function Calling - Add tools