ElevenLabs Scribe STT
ElevenLabs Scribe provides speech recognition for 99 languages with particularly strong support for regional and less common languages.
Why ElevenLabs Scribe?
| Feature | ElevenLabs Scribe | Deepgram |
|---|---|---|
| Languages | 99 | 35+ |
| Regional Languages | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| Assamese Support | ✅ Yes | ❌ No |
| Cost | $0.007/min | $0.0043/min |
| Latency | ~150ms | ~80ms |
Best for: Assamese, Odia, Nepali, and other regional languages with limited provider support.
Configuration
Basic Setup
{
"agent": {
"name": "Regional Support",
"language": "as-IN",
"sttProvider": "elevenlabs"
}
}
Environment Variables
ELEVENLABS_API_KEY=your_elevenlabs_api_key
Advanced Configuration
{
"sttProvider": "elevenlabs",
"sttConfig": {
"language_code": "as",
"tag_audio_events": false,
"diarize": false,
"num_speakers": 1
}
}
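The keys above map naturally onto a typed struct. The sketch below is one way to parse and sanity-check them; the `STTConfig` type, the `parseSTTConfig` helper, and the two validation rules are assumptions made for illustration, not requirements stated by the provider.

```go
package main

import (
	"encoding/json"
	"errors"
)

// STTConfig mirrors the sttConfig keys shown above (hypothetical Go type).
type STTConfig struct {
	LanguageCode   string `json:"language_code"`
	TagAudioEvents bool   `json:"tag_audio_events"`
	Diarize        bool   `json:"diarize"`
	NumSpeakers    int    `json:"num_speakers"`
}

// providerConfig is the enclosing object from the example.
type providerConfig struct {
	STTProvider string    `json:"sttProvider"`
	STTConfig   STTConfig `json:"sttConfig"`
}

// parseSTTConfig decodes the JSON and applies this sketch's own rules:
// language_code must be present, and num_speakers defaults to 1.
func parseSTTConfig(raw []byte) (STTConfig, error) {
	var pc providerConfig
	if err := json.Unmarshal(raw, &pc); err != nil {
		return STTConfig{}, err
	}
	cfg := pc.STTConfig
	if cfg.LanguageCode == "" {
		return STTConfig{}, errors.New("sttConfig.language_code is required")
	}
	if cfg.NumSpeakers < 0 {
		return STTConfig{}, errors.New("sttConfig.num_speakers must not be negative")
	}
	if cfg.NumSpeakers == 0 {
		cfg.NumSpeakers = 1
	}
	return cfg, nil
}
```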
Implementation
WebSocket Connection
type ElevenLabsSTT struct {
apiKey string
language string
conn *websocket.Conn
eventChan chan TranscriptEvent
}
func (e *ElevenLabsSTT) Connect(ctx context.Context) error {
wsURL := "wss://api.elevenlabs.io/v1/speech-to-text/websocket"
headers := http.Header{}
headers.Set("xi-api-key", e.apiKey)
conn, _, err := websocket.DefaultDialer.DialContext(ctx, wsURL, headers)
if err != nil {
return fmt.Errorf("elevenlabs connect: %w", err)
}
e.conn = conn
// Send initial config
config := map[string]any{
"language_code": e.language,
"tag_audio_events": false,
"transcribe_mode": "realtime",
"sample_rate": 8000,
"encoding": "pcm_16",
}
if err := conn.WriteJSON(config); err != nil {
return fmt.Errorf("elevenlabs send config: %w", err)
}
go e.receiveLoop()
return nil
}
Sending Audio
func (e *ElevenLabsSTT) SendAudio(audio []byte) error {
// Encode audio as base64
encoded := base64.StdEncoding.EncodeToString(audio)
msg := map[string]any{
"audio": encoded,
}
return e.conn.WriteJSON(msg)
}
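Streaming usually works better with small, evenly sized frames than with arbitrary buffers. The sketch below splits raw PCM into 20 ms frames and builds the same `{"audio": ...}` payload that `SendAudio` writes. The 20 ms frame length and the helper names are assumptions; the sample rate and sample width come from the initial config shown earlier.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
)

const (
	sampleRate     = 8000 // Hz, as in the initial config
	bytesPerSample = 2    // 16-bit PCM
	frameMillis    = 20   // assumed frame length; tune for your pipeline
	frameBytes     = sampleRate * bytesPerSample * frameMillis / 1000
)

// splitFrames cuts raw PCM into frameBytes-sized chunks.
// The final chunk may be shorter than a full frame.
func splitFrames(pcm []byte) [][]byte {
	var frames [][]byte
	for len(pcm) > 0 {
		n := frameBytes
		if len(pcm) < n {
			n = len(pcm)
		}
		frames = append(frames, pcm[:n])
		pcm = pcm[n:]
	}
	return frames
}

// encodeFrame builds the JSON payload for one frame, base64-encoding the audio.
func encodeFrame(frame []byte) ([]byte, error) {
	return json.Marshal(map[string]string{
		"audio": base64.StdEncoding.EncodeToString(frame),
	})
}
```

At 8 kHz and 16 bits per sample, a 20 ms frame is 320 bytes, so a 700-byte buffer yields two full frames and one 60-byte remainder.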
Receiving Results
func (e *ElevenLabsSTT) receiveLoop() {
for {
_, msg, err := e.conn.ReadMessage()
if err != nil {
return
}
var response ElevenLabsResponse
if err := json.Unmarshal(msg, &response); err != nil {
// Skip malformed messages instead of emitting an empty event
continue
}
switch response.Type {
case "transcript":
e.eventChan <- TranscriptEvent{
Text: response.Text,
IsFinal: response.IsFinal,
}
case "audio_event":
// Handle audio events if enabled
log.Debug("Audio event: %s", response.Event)
}
}
}
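`receiveLoop` relies on an `ElevenLabsResponse` type that is not defined in this page. A minimal sketch of what it could look like follows; the field set is inferred from how the loop uses it, and the JSON key names (`type`, `text`, `is_final`, `event`) are assumptions that should be checked against the actual wire format.

```go
package main

import "encoding/json"

// ElevenLabsResponse is a guess at the message shape, inferred from the
// fields receiveLoop reads. The JSON tags are assumptions.
type ElevenLabsResponse struct {
	Type    string `json:"type"`
	Text    string `json:"text"`
	IsFinal bool   `json:"is_final"`
	Event   string `json:"event"`
}

// TranscriptEvent is what the loop forwards to consumers.
type TranscriptEvent struct {
	Text    string
	IsFinal bool
}

// decodeTranscript parses one message. The bool result is true only when
// the message is a transcript; other message types return false, nil.
func decodeTranscript(msg []byte) (TranscriptEvent, bool, error) {
	var r ElevenLabsResponse
	if err := json.Unmarshal(msg, &r); err != nil {
		return TranscriptEvent{}, false, err
	}
	if r.Type != "transcript" {
		return TranscriptEvent{}, false, nil
	}
	return TranscriptEvent{Text: r.Text, IsFinal: r.IsFinal}, true, nil
}
```

Keeping the decoding in a pure function like this makes the message handling testable without a live WebSocket connection.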
Language Support
Well-Supported Languages
| Language | Code | Accuracy | Notes |
|---|---|---|---|
| English | en | ⭐⭐⭐⭐⭐ | All variants |
| Spanish | es | ⭐⭐⭐⭐⭐ | |
| French | fr | ⭐⭐⭐⭐⭐ | |
| German | de | ⭐⭐⭐⭐⭐ | |
| Hindi | hi | ⭐⭐⭐⭐ | |
| Portuguese | pt | ⭐⭐⭐⭐⭐ | |
Regional Languages (Unique Support)
| Language | Code | Accuracy | Notes |
|---|---|---|---|
| Assamese | as | ⭐⭐⭐⭐ | Best option available |
| Odia | or | ⭐⭐⭐ | Limited alternatives |
| Nepali | ne | ⭐⭐⭐⭐ | |
| Sinhala | si | ⭐⭐⭐ | |
| Khmer | km | ⭐⭐⭐ | |
| Lao | lo | ⭐⭐⭐ | |
| Burmese | my | ⭐⭐⭐ | |
Audio Events
Detect non-speech audio:
{
"sttConfig": {
"tag_audio_events": true
}
}
Events detected:
- laughter: User laughing
- applause: Clapping sounds
- music: Background music
- silence: Extended silence
- noise: Background noise
func (e *ElevenLabsSTT) handleAudioEvent(event string) {
switch event {
case "laughter":
// User is happy, adjust tone
e.context.SetMood("positive")
case "noise":
// High background noise
log.Warn("High background noise detected")
}
}
Speaker Diarization
Identify multiple speakers:
{
"sttConfig": {
"diarize": true,
"num_speakers": 2
}
}
type DiarizedTranscript struct {
Speaker string
Text string
Start float64
End float64
}
// Response includes speaker labels
// Speaker 0: "Hello, I need help with my order"
// Speaker 1: "Of course, what's your order number?"
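Diarized output often arrives as many short segments. The sketch below merges consecutive segments from the same speaker into single turns, which is usually easier to hand to an LLM or a transcript view. It reuses the `DiarizedTranscript` type from above; `mergeTurns` is a hypothetical helper, and it assumes segments are already in time order.

```go
package main

// DiarizedTranscript matches the struct shown above.
type DiarizedTranscript struct {
	Speaker string
	Text    string
	Start   float64
	End     float64
}

// mergeTurns joins consecutive segments from the same speaker into one turn.
// It assumes segs is sorted by Start time.
func mergeTurns(segs []DiarizedTranscript) []DiarizedTranscript {
	var turns []DiarizedTranscript
	for _, s := range segs {
		if n := len(turns); n > 0 && turns[n-1].Speaker == s.Speaker {
			// Same speaker keeps talking: extend the current turn.
			turns[n-1].Text += " " + s.Text
			turns[n-1].End = s.End
			continue
		}
		turns = append(turns, s)
	}
	return turns
}
```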
Error Handling
func (e *ElevenLabsSTT) handleError(err error) {
switch {
case strings.Contains(err.Error(), "401"):
log.Error("Invalid API key")
case strings.Contains(err.Error(), "429"):
log.Warn("Rate limited, backing off")
time.Sleep(time.Second)
e.reconnect()
case strings.Contains(err.Error(), "language not supported"):
log.Error("Language %s not supported", e.language)
}
}
Cost Optimization
Pricing
| Tier | Price per Minute | Monthly Limit |
|---|---|---|
| Free | $0 | 30 minutes |
| Starter | $0.008 | Pay as you go |
| Creator | $0.007 | Included minutes |
| Pro | $0.006 | Volume discount |
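To compare tiers for a given call volume, the per-minute rates in the table can be plugged into a simple estimate. The sketch below ignores the free tier's 30-minute allowance and any included minutes, so treat the result as an upper bound; the function name and inputs are illustrative.

```go
package main

// pricePerMinute holds the paid-tier rates from the pricing table, in USD.
var pricePerMinute = map[string]float64{
	"starter": 0.008,
	"creator": 0.007,
	"pro":     0.006,
}

// estimateMonthlyCost returns callsPerDay * avgMinutes * days * rate.
// The bool is false when the tier is unknown.
func estimateMonthlyCost(tier string, callsPerDay int, avgMinutes float64, days int) (float64, bool) {
	rate, ok := pricePerMinute[tier]
	if !ok {
		return 0, false
	}
	return float64(callsPerDay) * avgMinutes * float64(days) * rate, true
}
```

For example, 1,000 calls per day averaging 3 minutes over 30 days is 90,000 minutes, which comes to roughly $630 on the Creator rate and roughly $540 on the Pro rate.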
Optimization Tips
// Only transcribe when needed
func (e *ElevenLabsSTT) shouldTranscribe(audio []byte) bool {
// Skip very quiet audio
volume := calculateVolume(audio)
if volume < 0.01 {
return false
}
// Skip very short clips (8 kHz, 16-bit mono is 16 bytes per millisecond)
durationMs := len(audio) * 1000 / (8000 * 2)
if durationMs < 100 { // < 100ms
return false
}
return true
}
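`calculateVolume` is used above but not defined here. One plausible implementation is the RMS level of the samples normalized to the 0..1 range, which fits the `0.01` threshold. The sketch assumes 16-bit little-endian mono PCM; adjust the decoding if your audio uses a different layout.

```go
package main

import (
	"encoding/binary"
	"math"
)

// calculateVolume returns the RMS level of 16-bit little-endian PCM,
// normalized so that full scale is 1.0. A trailing odd byte is ignored.
func calculateVolume(audio []byte) float64 {
	n := len(audio) / 2
	if n == 0 {
		return 0
	}
	var sumSquares float64
	for i := 0; i < n; i++ {
		sample := int16(binary.LittleEndian.Uint16(audio[2*i:]))
		v := float64(sample) / 32768.0
		sumSquares += v * v
	}
	return math.Sqrt(sumSquares / float64(n))
}
```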
Best Practices
1. Fallback Strategy
func (s *STTService) transcribe(audio []byte, language string) (string, error) {
// Try ElevenLabs for regional languages
if isRegionalLanguage(language) {
result, err := s.elevenlabs.Transcribe(audio)
if err == nil {
return result, nil
}
log.Warn("ElevenLabs failed: %v", err)
}
// Fall back to Google for Indic languages
if isIndicLanguage(language) {
return s.google.Transcribe(audio)
}
// Default to Deepgram
return s.deepgram.Transcribe(audio)
}
func isRegionalLanguage(lang string) bool {
regional := []string{"as", "or", "ne", "si", "km", "lo", "my"}
for _, r := range regional {
if lang == r || strings.HasPrefix(lang, r+"-") { // match "as" and "as-IN", but not "ast"
return true
}
}
return false
}
2. Handle Connection Drops
func (e *ElevenLabsSTT) maintainConnection() {
for {
select {
case <-e.pingTicker.C:
if err := e.conn.WriteJSON(map[string]string{"type": "ping"}); err != nil {
e.reconnect()
}
case <-e.ctx.Done():
return
}
}
}
3. Audio Quality Check
func checkAudioQuality(audio []byte) AudioQuality {
volume := calculateVolume(audio)
snr := estimateSNR(audio)
if volume < 0.01 {
return AudioQuality{Quality: "poor", Issue: "too quiet"}
}
if snr < 10 {
return AudioQuality{Quality: "poor", Issue: "noisy"}
}
return AudioQuality{Quality: "good"}
}
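`estimateSNR` is not defined in this page. A rough heuristic, sketched below, is to compare the energy of the loudest frames against the quietest frames and report the ratio in decibels. This is a crude stand-in for a real noise estimator, not a method taken from the provider; it assumes 16-bit little-endian mono PCM at 8 kHz, and the 20 ms frame size and 10% percentile are arbitrary choices.

```go
package main

import (
	"encoding/binary"
	"math"
	"sort"
)

const frameSamples = 160 // 20 ms at 8 kHz (assumed)

// estimateSNR returns an approximate signal-to-noise ratio in dB by treating
// the quietest 10% of frames as noise and the loudest 10% as signal.
// It returns 0 when there is too little audio or the clip is pure silence.
func estimateSNR(audio []byte) float64 {
	var energies []float64
	for off := 0; off+2*frameSamples <= len(audio); off += 2 * frameSamples {
		var sum float64
		for i := 0; i < frameSamples; i++ {
			s := float64(int16(binary.LittleEndian.Uint16(audio[off+2*i:])))
			sum += s * s
		}
		energies = append(energies, sum/frameSamples)
	}
	if len(energies) < 2 {
		return 0
	}
	sort.Float64s(energies)
	k := len(energies) / 10
	if k == 0 {
		k = 1
	}
	noise := mean(energies[:k])
	signal := mean(energies[len(energies)-k:])
	if signal == 0 {
		return 0
	}
	if noise == 0 {
		return math.Inf(1) // digitally silent background
	}
	return 10 * math.Log10(signal/noise)
}

// mean returns the arithmetic mean of xs; xs must be non-empty.
func mean(xs []float64) float64 {
	var sum float64
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}
```

With this estimator, a clip whose loud frames have 100 times the amplitude of its quiet frames scores about 40 dB, comfortably above the threshold of 10 used in `checkAudioQuality`.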
Assamese Configuration Example
Complete setup for an Assamese voice agent:
{
"agent": {
"name": "Assamese Support",
"language": "as-IN",
"llmProvider": "gemini-2.5",
"sttProvider": "elevenlabs",
"sttConfig": {
"language_code": "as"
},
"ttsProvider": "azure",
"ttsVoice": "as-IN-YashicaNeural",
"prompt": "আপুনি এজন সহায়ক গ্ৰাহক সেৱা প্ৰতিনিধি..."
}
}
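Note that the agent uses the regional tag `as-IN` while `sttConfig.language_code` uses the bare code `as`. If you would rather derive one from the other than maintain both, a small helper can extract the primary subtag. The function name is hypothetical, and it assumes the STT side always wants the lowercase primary language subtag.

```go
package main

import "strings"

// sttLanguageCode returns the primary language subtag of a locale tag,
// e.g. "as-IN" becomes "as". Underscore separators are accepted too.
func sttLanguageCode(tag string) string {
	tag = strings.ReplaceAll(strings.TrimSpace(tag), "_", "-")
	primary, _, _ := strings.Cut(tag, "-")
	return strings.ToLower(primary)
}
```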
Next Steps
- AssemblyAI - Alternative provider
- Language Support - Full language matrix
- Fallback Configuration - Provider fallbacks