Interruption Handling
Interruption handling (also known as barge-in) allows users to speak while the bot is talking, creating a more natural conversation experience.
Why Interruptions Matter
Without interruption handling:
─────────────────────────────────────────────────────────────
Bot: "I can help you with that. Your order number one two three
four five was shipped on December twentieth and is currently
in transit. The estimated delivery date is December twenty..."
User: "WAIT! That's the wrong order!"
Bot: "...ninth. Is there anything else I can help you with?"
← Bot continues, ignoring user
With interruption handling:
─────────────────────────────────────────────────────────────
Bot: "I can help you with that. Your order number one two three—"
User: "Wait, that's the wrong order!"
Bot: [Stops immediately] "I apologize. What's the correct order number?"
← Natural conversation
How Interruption Detection Works
                  ┌─────────────────────────────────────────┐
                  │                Pipeline                 │
                  │                                         │
User Audio ──────►│──► VAD ─────────────────────────────────│
                  │         │                               │
                  │         ▼                               │
                  │         Is bot speaking?                │
                  │         │                               │
                  │         ├── No:  Normal processing      │
                  │         │                               │
                  │         └── Yes: INTERRUPTION!          │
                  │                    │                    │
Bot Speaking ◄────│◄───────────────────┤                    │
                  │                    ▼                    │
                  │            1. Stop TTS playback         │
                  │            2. Cancel LLM generation     │
                  │            3. Clear audio queue         │
                  │            4. Process user speech       │
                  │                                         │
                  └─────────────────────────────────────────┘
Configuration
Enable/Disable Per Agent
{
"agent": {
"name": "Customer Support",
"allowInterruptions": true,
"interruptionConfig": {
"minSpeechDuration": 200,
"confidenceThreshold": 0.85
}
}
}
Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| `allowInterruptions` | bool | `true` | Enable barge-in |
| `minSpeechDuration` | int | `200` | Minimum speech duration (ms) before interrupting |
| `confidenceThreshold` | float | `0.8` | VAD confidence required to trigger |
| `ignoreFillerWords` | bool | `true` | Don't interrupt for "um", "uh" |
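For reference, here is one way these options could map onto a Go config struct. This is a sketch; the struct and field names are assumptions, not the framework's actual API.

// Hypothetical mapping of the JSON options above onto Go types.
type InterruptionConfig struct {
    MinSpeechDuration   int     `json:"minSpeechDuration"`   // milliseconds
    ConfidenceThreshold float32 `json:"confidenceThreshold"` // 0.0 to 1.0
    IgnoreFillerWords   bool    `json:"ignoreFillerWords"`
}

type AgentConfig struct {
    Name               string             `json:"name"`
    AllowInterruptions bool               `json:"allowInterruptions"`
    InterruptionConfig InterruptionConfig `json:"interruptionConfig"`
}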
Implementation
Interruption Handler
type InterruptionHandler struct {
pipeline *Pipeline
enabled bool
minDuration time.Duration
threshold float32
callback func()
}
func (h *InterruptionHandler) OnVADEvent(event VADEvent) {
if !h.enabled {
return
}
if event.Type != SpeechStart {
return
}
// Check if bot is currently speaking
if !h.pipeline.IsBotSpeaking() {
return
}
// Wait for the minimum speech duration to filter out false positives.
// Note: sleeping blocks this event path; a non-blocking, timestamp-based
// variant is shown under "Speech Duration Check" below.
time.Sleep(h.minDuration)
// Re-check if still speaking
if !h.pipeline.IsUserSpeaking() {
return // Brief noise, not real interruption
}
// Trigger interruption
h.triggerInterruption()
}
func (h *InterruptionHandler) triggerInterruption() {
log.Debug("Interruption detected, stopping bot output")
// 1. Stop audio playback immediately
h.pipeline.ClearAudioQueue()
// 2. Cancel LLM generation
h.pipeline.CancelLLMGeneration()
// 3. Stop TTS synthesis
h.pipeline.StopTTS()
// 4. Mark bot as not speaking
h.pipeline.SetBotSpeaking(false)
// 5. Enable STT for user speech
h.pipeline.UnmuteStt()
// 6. Notify callback
if h.callback != nil {
h.callback()
}
// Track metric
metrics.RecordCounter("pipeline.interruptions", 1)
}
Pipeline Integration
func (p *Pipeline) setupInterruptionHandling() {
if !p.config.AllowInterruptions {
return
}
handler := &InterruptionHandler{
pipeline: p,
enabled: true,
minDuration: time.Duration(p.config.MinSpeechDuration) * time.Millisecond,
threshold: p.config.InterruptionThreshold,
callback: p.onInterruption,
}
p.vadProcessor.SetInterruptionHandler(handler)
}
func (p *Pipeline) onInterruption() {
// Add context for LLM about interruption
p.context.AddEvent(InterruptionEvent{
Timestamp: time.Now(),
BotWasSaying: p.lastBotUtterance,
})
}
Audio Queue Management
When interrupted, clear pending audio immediately:
type AudioQueue struct {
queue [][]byte
mu sync.Mutex
playing bool
}
func (q *AudioQueue) Clear() {
q.mu.Lock()
defer q.mu.Unlock()
q.queue = nil
q.playing = false
}
func (q *AudioQueue) Enqueue(audio []byte) {
q.mu.Lock()
defer q.mu.Unlock()
q.queue = append(q.queue, audio)
}
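The queue also needs a consumer. A minimal sketch of the Dequeue a playback loop might call (not part of the snippet above; shown for completeness):

// Hypothetical consumer: returns the next chunk, or false when the queue
// is empty, e.g. right after Clear() runs during an interruption.
func (q *AudioQueue) Dequeue() ([]byte, bool) {
    q.mu.Lock()
    defer q.mu.Unlock()
    if len(q.queue) == 0 {
        q.playing = false
        return nil, false
    }
    chunk := q.queue[0]
    q.queue = q.queue[1:]
    q.playing = true
    return chunk, true
}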
// In telephony provider
func (p *TwilioProvider) ClearPlayback() {
// Clear local queue
p.audioQueue.Clear()
// Send clear message to Twilio
p.sendMessage(TwilioMessage{
Event: "clear",
StreamSid: p.streamSid,
})
}
LLM Cancellation
Cancel in-flight LLM requests:
type LLMProcessor struct {
cancelFunc context.CancelFunc
generating bool
}
func (l *LLMProcessor) Cancel() {
if l.cancelFunc != nil {
l.cancelFunc()
}
l.generating = false
}
func (l *LLMProcessor) Generate(ctx context.Context, messages []Message) <-chan string {
// Create cancellable context
ctx, l.cancelFunc = context.WithCancel(ctx)
l.generating = true
tokenChan := make(chan string)
go func() {
defer close(tokenChan)
defer func() { l.generating = false }()
for token := range l.llm.StreamGenerate(ctx, messages) {
select {
case <-ctx.Done():
return // Cancelled
case tokenChan <- token:
}
}
}()
return tokenChan
}
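A usage sketch tying the two together: the response loop drains the token channel, and Cancel (called from triggerInterruption) unblocks the generator goroutine through ctx.Done(). The tts.Speak call is a placeholder for whatever feeds TTS in your pipeline:

// Hypothetical call site: stream tokens into TTS until done or cancelled.
func (p *Pipeline) respond(ctx context.Context, messages []Message) {
    for token := range p.llm.Generate(ctx, messages) {
        p.tts.Speak(token) // placeholder TTS entry point
    }
    // The channel closes on completion or after Cancel(), ending the loop.
}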
Handling Interruption Context
Help the LLM understand the interruption:
func (p *Pipeline) processInterruption(userSpeech string) {
// What the bot was saying when interrupted
botContext := p.lastBotUtterance
// Build context message for LLM
interruptContext := fmt.Sprintf(
"[User interrupted. Bot was saying: \"%s\". User said: \"%s\"]",
truncate(botContext, 100),
userSpeech,
)
// Add to message history
p.context.AddMessage(Message{
Role: "system",
Content: interruptContext,
})
}
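The truncate helper isn't shown above; a straightforward implementation:

// Truncate to at most n runes, appending an ellipsis when cut short.
func truncate(s string, n int) string {
    r := []rune(s)
    if len(r) <= n {
        return s
    }
    return string(r[:n]) + "..."
}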
LLM Prompt for Handling Interruptions
systemPrompt := `You are a customer support agent.
INTERRUPTION HANDLING:
- If user interrupts, acknowledge briefly and address their concern
- Don't repeat what you were saying
- Stay focused on what the user wants
- Examples:
* "Got it, let me help with that instead."
* "Of course, what would you like to know?"
* "Sure, I'll look into that for you."
DO NOT:
- Say "I was saying..." or "As I was mentioning..."
- Apologize excessively for being interrupted
- Repeat the interrupted content`
Avoiding False Positives
Filler Word Detection
func (h *InterruptionHandler) isFillerWord(transcript string) bool {
fillers := []string{
"um", "uh", "hmm", "ah", "er", "like",
"you know", "i mean", "okay", "right",
}
lower := strings.ToLower(strings.TrimSpace(transcript))
for _, filler := range fillers {
if lower == filler {
return true
}
}
return false
}
func (h *InterruptionHandler) OnTranscript(transcript TranscriptEvent) {
if h.isFillerWord(transcript.Text) {
return // Don't interrupt for filler words
}
// Proceed with interruption handling
if transcript.IsFinal && h.pendingInterruption {
h.confirmInterruption()
}
}
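confirmInterruption isn't defined in these snippets; a plausible implementation clears the pending flag and reuses the trigger path from earlier:

// Hypothetical glue between the transcript/VAD checks and triggerInterruption.
func (h *InterruptionHandler) confirmInterruption() {
    if !h.pendingInterruption {
        return
    }
    h.pendingInterruption = false
    h.triggerInterruption()
}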
Speech Duration Check
func (h *InterruptionHandler) OnVADEvent(event VADEvent) {
if event.Type == SpeechStart {
h.speechStartTime = time.Now()
h.pendingInterruption = true
}
if event.Type == SpeechEnd {
duration := time.Since(h.speechStartTime)
if duration < h.minDuration {
// Too short, likely noise
h.pendingInterruption = false
return
}
// Real interruption
h.confirmInterruption()
}
}
Gemini Live Interruptions
Gemini Live has built-in interruption handling:
func (g *GeminiLiveClient) HandleInterruption() error {
// Send turn complete signal
msg := map[string]any{
"clientContent": map[string]any{
"turnComplete": true,
},
}
return g.conn.WriteJSON(msg)
}
// Gemini Live automatically:
// 1. Stops generating audio
// 2. Clears pending output
// 3. Listens for new user input
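On the receive side, the Live API signals barge-ins itself. Below is a sketch of reading that signal; the serverContent.interrupted field follows the BidiGenerateContent message format, but treat the exact shape (and the audioQueue field) as assumptions to verify against the current API docs.

// Sketch: watch for the server's interruption signal and drop queued audio.
func (g *GeminiLiveClient) readLoop() error {
    for {
        var msg struct {
            ServerContent struct {
                Interrupted bool `json:"interrupted"`
            } `json:"serverContent"`
        }
        if err := g.conn.ReadJSON(&msg); err != nil {
            return err
        }
        if msg.ServerContent.Interrupted {
            g.audioQueue.Clear() // assumed local queue; mirrors ClearPlayback()
        }
    }
}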
Best Practices
1. Buffer Before Confirming
Wait briefly before confirming interruption:
// Wait 200ms of continuous speech before interrupting
if speechDuration > 200*time.Millisecond {
triggerInterruption()
}
2. Don't Interrupt During Critical Info
func (p *Pipeline) ShouldAllowInterruption() bool {
// Don't allow during confirmation
if p.state == StateConfirmingAction {
return false
}
// Don't allow during sensitive info
if p.lastMessage.ContainsSensitiveInfo {
return false
}
return p.config.AllowInterruptions
}
3. Track Interruption Patterns
type InterruptionMetrics struct {
TotalInterruptions int
FalsePositives int
AveragePosition float64 // % into bot utterance
CommonPhrases map[string]int
}
func (m *InterruptionMetrics) Record(event InterruptionEvent) {
    m.TotalInterruptions++
    // Incremental running mean of how far into its utterance the bot was
    position := float64(event.BotPosition) / float64(event.BotTotalLength)
    m.AveragePosition += (position - m.AveragePosition) / float64(m.TotalInterruptions)
    if m.CommonPhrases == nil {
        m.CommonPhrases = make(map[string]int)
    }
    m.CommonPhrases[event.UserFirstWords]++
}
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Too many false interruptions | `minSpeechDuration` or `confidenceThreshold` too low | Increase `minSpeechDuration` |
| Missed interruptions | `confidenceThreshold` too high | Decrease `confidenceThreshold` |
| Delayed response after barge-in | Audio queue not cleared | Verify `ClearPlayback()` is called |
| Bot repeats itself | No interruption context in history | Add interruption context to the LLM |
Next Steps
- VAD Configuration - Tune voice detection
- Turn Detection - Conversation flow
- Gemini Live - Built-in interruptions