Language Support
Edesy voice agents support 50+ languages with automatic provider selection optimized for each language.
Supported Languages
Tier 1 (Excellent Support)
| Language | Code | STT | TTS | LLM | Notes |
|---|---|---|---|---|---|
| English (US) | en-US | All | All | All | Best support |
| English (UK) | en-GB | All | All | All | British accent |
| English (IN) | en-IN | All | Azure | All | Indian accent |
| Spanish | es | All | All | All | Latin America + Spain |
| French | fr | All | All | All | France + Canada |
| German | de | All | All | All | |
| Portuguese | pt-BR | All | All | All | Brazil |
| Italian | it | All | All | All | |
| Japanese | ja | All | All | All | |
| Korean | ko | All | All | All | |
| Chinese (Mandarin) | zh-CN | All | All | All | Simplified |
Tier 2 (Good Support)
| Language | Code | Best STT | Best TTS | Notes |
|---|---|---|---|---|
| Hindi | hi-IN | Google Chirp | Azure | Excellent support |
| Bengali | bn-IN | Google Chirp | Azure | |
| Tamil | ta-IN | Google Chirp | Azure | |
| Telugu | te-IN | Google Chirp | Azure | |
| Marathi | mr-IN | Google Chirp | Azure | |
| Gujarati | gu-IN | Google Chirp | Azure | |
| Kannada | kn-IN | Google Chirp | Azure | |
| Malayalam | ml-IN | Google Chirp | Azure | |
| Punjabi | pa-IN | Google Chirp | Azure | |
| Dutch | nl | Deepgram | Azure | |
| Polish | pl | Deepgram | Azure | |
| Russian | ru | Deepgram | Azure | |
| Turkish | tr | Deepgram | Azure | |
| Arabic | ar | | Azure | MSA + dialects |
| Hebrew | he | | Azure | |
Tier 3 (Basic Support)
| Language | Code | Best STT | Best TTS | Notes |
|---|---|---|---|---|
| Assamese | as-IN | ElevenLabs | Azure | Limited vocabulary |
| Odia | or-IN | | Azure | |
| Nepali | ne | | Azure | |
| Sinhala | si | | Azure | |
| Thai | th | | Azure | |
| Vietnamese | vi | | Azure | |
| Indonesian | id | Deepgram | Azure | |
| Malay | ms | | Azure | |
| Filipino | fil | | Azure | |
| Ukrainian | uk | | Azure | |
| Czech | cs | Deepgram | Azure | |
| Romanian | ro | Deepgram | Azure | |
| Hungarian | hu | Deepgram | Azure | |
Configuration
Basic Language Setup
{
  "agent": {
    "name": "Hindi Support Agent",
    "language": "hi-IN",
    "llmProvider": "gemini-2.5",
    "sttProvider": "google",
    "sttModel": "chirp_2",
    "ttsProvider": "azure",
    "ttsVoice": "hi-IN-SwaraNeural"
  }
}
Auto-Detection
Enable language detection for multi-lingual support:
{
  "agent": {
    "name": "Multi-lingual Agent",
    "language": "auto",
    "supportedLanguages": ["en-US", "hi-IN", "es"],
    "sttConfig": {
      "enableLanguageDetection": true
    }
  }
}
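With `language` set to `auto`, a detector may return a tag that is more specific than the entries in `supportedLanguages` (for example `es-MX` when only `es` is listed). The following is a minimal sketch of one way to map a detected tag onto the configured list. `resolveLanguage`, `baseLanguage`, and the fallback rule are hypothetical helpers for illustration, not part of the Edesy API.

```go
package main

import (
	"fmt"
	"strings"
)

// baseLanguage returns the lowercased primary subtag of a language tag:
// "es-MX" -> "es", "hi-IN" -> "hi".
func baseLanguage(tag string) string {
	base, _, _ := strings.Cut(tag, "-")
	return strings.ToLower(base)
}

// resolveLanguage maps a detected tag onto the configured list.
// Hypothetical helper: an exact (case-insensitive) match wins, then the
// first entry sharing the base language, otherwise the fallback.
func resolveLanguage(detected string, supported []string, fallback string) string {
	for _, s := range supported {
		if strings.EqualFold(s, detected) {
			return s
		}
	}
	for _, s := range supported {
		if baseLanguage(s) == baseLanguage(detected) {
			return s
		}
	}
	return fallback
}

func main() {
	supported := []string{"en-US", "hi-IN", "es"}
	for _, detected := range []string{"hi-IN", "es-MX", "en-GB", "fr"} {
		fmt.Printf("%s -> %s\n", detected, resolveLanguage(detected, supported, "en-US"))
	}
}
```

In this sketch `es-MX` resolves to `es`, `en-GB` resolves to `en-US`, and `fr` falls back to the default because no entry shares its base language.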
Provider Recommendations by Language
English (en-US, en-GB)
{
  "sttProvider": "deepgram",
  "sttModel": "nova-3",
  "ttsProvider": "cartesia",
  "llmProvider": "gemini-2.5"
}
Why: Deepgram Nova-3 has the lowest latency and highest accuracy for English. Cartesia provides natural-sounding voices with fast streaming.
Hindi (hi-IN)
{
  "sttProvider": "google",
  "sttModel": "chirp_2",
  "ttsProvider": "azure",
  "ttsVoice": "hi-IN-SwaraNeural",
  "llmProvider": "gemini-2.5"
}
Why: Google Chirp 2 is specifically optimized for Indic languages. Azure has the best Hindi voices.
Available Hindi Voices (Azure)
| Voice | Gender | Style |
|---|---|---|
| hi-IN-SwaraNeural | Female | Warm, conversational |
| hi-IN-MadhurNeural | Male | Professional |
| hi-IN-AnanyaNeural | Female | Clear, formal |
| hi-IN-ArjunNeural | Male | Young, friendly |
Tamil (ta-IN)
{
  "sttProvider": "google",
  "sttModel": "chirp_2",
  "ttsProvider": "azure",
  "ttsVoice": "ta-IN-PallaviNeural",
  "llmProvider": "gemini-2.5"
}
Bengali (bn-IN)
{
  "sttProvider": "google",
  "sttModel": "chirp_2",
  "ttsProvider": "azure",
  "ttsVoice": "bn-IN-TanishaaNeural",
  "llmProvider": "gemini-2.5"
}
Assamese (as-IN)
{
  "sttProvider": "elevenlabs",
  "ttsProvider": "azure",
  "ttsVoice": "as-IN-YashicaNeural",
  "llmProvider": "gemini-2.5"
}
Note: Assamese has limited STT support. ElevenLabs Scribe provides the best accuracy.
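The recommendations above can be collected into a single lookup table so that agent setup code does not repeat them. This is only a sketch: the `ProviderConfig` type and `recommendationFor` function are hypothetical names, and the values are copied from the JSON snippets in this section (fields the section does not specify are left empty).

```go
package main

import "fmt"

// ProviderConfig mirrors the JSON keys used in the recommendations above.
// The type itself is an illustration, not an Edesy SDK type.
type ProviderConfig struct {
	STTProvider string
	STTModel    string
	TTSProvider string
	TTSVoice    string
	LLMProvider string
}

// recommended holds the per-language recommendations from this section.
var recommended = map[string]ProviderConfig{
	"en-US": {STTProvider: "deepgram", STTModel: "nova-3", TTSProvider: "cartesia", LLMProvider: "gemini-2.5"},
	"en-GB": {STTProvider: "deepgram", STTModel: "nova-3", TTSProvider: "cartesia", LLMProvider: "gemini-2.5"},
	"hi-IN": {STTProvider: "google", STTModel: "chirp_2", TTSProvider: "azure", TTSVoice: "hi-IN-SwaraNeural", LLMProvider: "gemini-2.5"},
	"ta-IN": {STTProvider: "google", STTModel: "chirp_2", TTSProvider: "azure", TTSVoice: "ta-IN-PallaviNeural", LLMProvider: "gemini-2.5"},
	"bn-IN": {STTProvider: "google", STTModel: "chirp_2", TTSProvider: "azure", TTSVoice: "bn-IN-TanishaaNeural", LLMProvider: "gemini-2.5"},
	"as-IN": {STTProvider: "elevenlabs", TTSProvider: "azure", TTSVoice: "as-IN-YashicaNeural", LLMProvider: "gemini-2.5"},
}

// recommendationFor returns the recommendation for a language code and
// reports whether one exists, so callers can choose their own default.
func recommendationFor(lang string) (ProviderConfig, bool) {
	cfg, ok := recommended[lang]
	return cfg, ok
}

func main() {
	if cfg, ok := recommendationFor("hi-IN"); ok {
		fmt.Printf("%+v\n", cfg)
	}
	if _, ok := recommendationFor("fr"); !ok {
		fmt.Println("no recommendation listed for fr")
	}
}
```

Returning a boolean instead of a silent default keeps the decision about unlisted languages with the caller.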
Language-Specific Configuration
Script Direction
For RTL languages (Arabic, Hebrew):
{
  "language": "ar",
  "ttsConfig": {
    "scriptDirection": "rtl"
  }
}
Number Formatting
Configure how numbers are spoken:
// Language-specific number formatting
type NumberFormat struct {
	Language     string
	GroupingSep  string // "," or "."
	DecimalSep   string // "." or ","
	SpokenFormat string // how numbers are read aloud; the values used here are "natural" and "indian"
}

var formats = map[string]NumberFormat{
	"en-US": {Language: "en-US", GroupingSep: ",", DecimalSep: ".", SpokenFormat: "natural"},
	"hi-IN": {Language: "hi-IN", GroupingSep: ",", DecimalSep: ".", SpokenFormat: "indian"},
	"de":    {Language: "de", GroupingSep: ".", DecimalSep: ",", SpokenFormat: "natural"},
}
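To make the grouping settings concrete, here is a small sketch of how a formatter might apply them before text is sent to TTS. It assumes that the `"indian"` style means lakh/crore grouping (last three digits, then pairs), which the configuration above does not spell out. `groupDigits` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// groupDigits formats n using the given grouping separator.
// Assumption: style "indian" groups the last three digits and then pairs
// (12,34,567); any other style groups in threes (1,234,567).
func groupDigits(n int64, style, sep string) string {
	digits := strconv.FormatInt(n, 10)
	sign := ""
	if strings.HasPrefix(digits, "-") {
		sign, digits = "-", digits[1:]
	}
	if len(digits) <= 3 {
		return sign + digits
	}
	// The rightmost group always has three digits.
	groups := []string{digits[len(digits)-3:]}
	rest := digits[:len(digits)-3]
	size := 3
	if style == "indian" {
		size = 2
	}
	// Peel groups off the right end until the remainder fits in one group.
	for len(rest) > size {
		groups = append([]string{rest[len(rest)-size:]}, groups...)
		rest = rest[:len(rest)-size]
	}
	groups = append([]string{rest}, groups...)
	return sign + strings.Join(groups, sep)
}

func main() {
	fmt.Println(groupDigits(1234567, "natural", ",")) // 1,234,567
	fmt.Println(groupDigits(1234567, "indian", ","))  // 12,34,567
	fmt.Println(groupDigits(1234567, "natural", ".")) // 1.234.567
}
```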
Honorifics
For languages with formal address:
{
  "language": "hi-IN",
  "prompt": "Always use 'आप' (respectful you) instead of 'तुम'. Address customers with 'जी' suffix.",
  "llmConfig": {
    "formalityLevel": "formal"
  }
}
Multi-lingual Agents
Code-Switching Support
Handle users who switch languages mid-conversation:
{
  "agent": {
    "name": "Bilingual Support",
    "primaryLanguage": "en-US",
    "secondaryLanguage": "hi-IN",
    "codeSwitchingEnabled": true,
    "prompt": "You can respond in both English and Hindi. Match the language the customer uses. Hinglish (mixed) is acceptable if the customer uses it."
  }
}
Language Detection
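// LanguageDetector wraps an STT provider that can identify the spoken language.
type LanguageDetector struct {
	stt STTProvider
}

// Detect returns the detected language code and the provider's confidence score.
func (d *LanguageDetector) Detect(audio []byte) (string, float32) {
	result := d.stt.DetectLanguage(audio)
	return result.Language, result.Confidence
}

// In pipeline: switch only when detection is confident and the language changed
func (p *Pipeline) handleLanguageDetection(audio []byte) {
	lang, confidence := p.langDetector.Detect(audio)
	if confidence > 0.8 && lang != p.currentLanguage {
		p.switchLanguage(lang)
	}
}

// switchLanguage reconfigures each pipeline stage for the new language.
func (p *Pipeline) switchLanguage(lang string) {
	// Update STT language
	p.stt.SetLanguage(lang)
	// Update TTS voice
	p.tts.SetVoice(getVoiceForLanguage(lang))
	// Update LLM context
	p.llm.SetLanguageContext(lang)
	// Record the active language so later detections compare against it
	p.currentLanguage = lang
}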
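A single confident detection can still be a false positive, for instance when a user drops one borrowed word into a sentence. One common mitigation, sketched here as an assumption rather than documented Edesy behavior, is to require several consecutive confident detections of the same new language before switching. `switchGate` and its fields are hypothetical names. The 0.8 threshold matches the check in the pipeline code above.

```go
package main

import "fmt"

// switchGate approves a language switch only after the same new language
// has been detected on `required` consecutive confident chunks.
type switchGate struct {
	required  int     // consecutive confident detections needed
	threshold float32 // minimum confidence, e.g. 0.8
	candidate string  // language currently being counted
	streak    int     // consecutive detections of candidate so far
}

// observe records one detection and reports whether to switch to lang now.
func (g *switchGate) observe(current, lang string, confidence float32) bool {
	// Same language or low confidence: reset any streak in progress.
	if lang == current || confidence <= g.threshold {
		g.candidate, g.streak = "", 0
		return false
	}
	if lang != g.candidate {
		g.candidate, g.streak = lang, 0
	}
	g.streak++
	if g.streak >= g.required {
		g.candidate, g.streak = "", 0
		return true
	}
	return false
}

func main() {
	gate := &switchGate{required: 2, threshold: 0.8}
	current := "en-US"
	detections := []struct {
		lang string
		conf float32
	}{{"hi-IN", 0.9}, {"en-US", 0.95}, {"hi-IN", 0.85}, {"hi-IN", 0.92}}
	for _, d := range detections {
		if gate.observe(current, d.lang, d.conf) {
			current = d.lang
		}
		fmt.Println("detected", d.lang, "active", current)
	}
}
```

In the example the first Hindi detection is ignored because English follows it. The switch happens only after two Hindi detections in a row. The trade-off is added delay before the agent changes language.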
Voice Selection by Language
English Voices
| Provider | Voice | Style | Best For |
|---|---|---|---|
| Cartesia | Sophie | Warm, friendly | Customer support |
| ElevenLabs | Rachel | Professional | Sales |
| Azure | en-US-JennyNeural | Clear | IVR |
Hindi Voices
| Provider | Voice | Style | Gender |
|---|---|---|---|
| Azure | hi-IN-SwaraNeural | Conversational | Female |
| Azure | hi-IN-MadhurNeural | Professional | Male |
| Google | hi-IN-Wavenet-A | Clear | Female |
Spanish Voices
| Provider | Voice | Accent | Gender |
|---|---|---|---|
| Azure | es-MX-DaliaNeural | Mexican | Female |
| Azure | es-ES-ElviraNeural | Castilian | Female |
| ElevenLabs | Valentina | Neutral | Female |
Accuracy Considerations
Word Error Rate (WER) by Language
| Language | Deepgram | Google | Azure | ElevenLabs |
|---|---|---|---|---|
| en-US | 8.4% | 10.2% | 11.5% | 12.1% |
| hi-IN | 22% | 12% | 15% | 18% |
| ta-IN | - | 14% | 18% | - |
| as-IN | - | 30% | - | 20% |
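Word error rate is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. To measure it on your own recordings, a word-level edit distance is enough. The sketch below is a generic implementation, not the method used to produce the table. `wordErrorRate` is a hypothetical name, and the function does no text normalization (case, punctuation), which real evaluations usually apply first.

```go
package main

import (
	"fmt"
	"strings"
)

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// wordErrorRate returns (substitutions + deletions + insertions) divided by
// the number of reference words, using word-level Levenshtein distance.
// WER is undefined for an empty reference; this sketch returns 0 in that case.
func wordErrorRate(reference, hypothesis string) float64 {
	ref := strings.Fields(reference)
	hyp := strings.Fields(hypothesis)
	if len(ref) == 0 {
		return 0
	}
	// prev[j] is the distance between the first i-1 reference words and
	// the first j hypothesis words; curr is the row being filled.
	prev := make([]int, len(hyp)+1)
	curr := make([]int, len(hyp)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ref); i++ {
		curr[0] = i
		for j := 1; j <= len(hyp); j++ {
			cost := 1
			if ref[i-1] == hyp[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return float64(prev[len(hyp)]) / float64(len(ref))
}

func main() {
	ref := "turn on the kitchen lights"
	hyp := "turn on kitchen light"
	// One deletion ("the") and one substitution ("lights" -> "light")
	// over five reference words.
	fmt.Printf("WER: %.2f\n", wordErrorRate(ref, hyp)) // WER: 0.40
}
```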
Improving Accuracy
- Custom Vocabulary: Add domain-specific terms
{
  "sttConfig": {
    "keywords": ["Edesy:2", "voice agent:2"]
  }
}
- Prompt Engineering: Help LLM understand context
"If the user's speech is unclear, ask them to repeat.
Common misheard words in Hindi: 'हां' (yes) and 'नहीं' (no)"
- Post-Processing: Correct common errors
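// Requires: import "strings"
// languageCorrections maps a language code to wrong -> right replacements
// and is assumed to be defined elsewhere.
func correctTranscript(text, language string) string {
	corrections := languageCorrections[language]
	for wrong, right := range corrections {
		text = strings.ReplaceAll(text, wrong, right)
	}
	return text
}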
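Because Go randomizes map iteration order, the loop above can give different results from run to run when one correction is a prefix of another. A possible alternative, sketched here with the standard library's `strings.NewReplacer`, sorts the corrections so that longer phrases take priority and applies them in a single pass. `buildReplacer` is a hypothetical helper, and the sample corrections are invented for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildReplacer returns a replacer that tries corrections in a fixed order:
// longer source phrases first, ties broken alphabetically.
func buildReplacer(corrections map[string]string) *strings.Replacer {
	keys := make([]string, 0, len(corrections))
	for wrong := range corrections {
		keys = append(keys, wrong)
	}
	sort.Slice(keys, func(i, j int) bool {
		if len(keys[i]) != len(keys[j]) {
			return len(keys[i]) > len(keys[j])
		}
		return keys[i] < keys[j]
	})
	// NewReplacer takes alternating old/new arguments and compares the
	// old strings in argument order at each position.
	pairs := make([]string, 0, 2*len(keys))
	for _, wrong := range keys {
		pairs = append(pairs, wrong, corrections[wrong])
	}
	return strings.NewReplacer(pairs...)
}

func main() {
	// Invented sample corrections, not real Edesy data.
	r := buildReplacer(map[string]string{
		"ed easy":       "Edesy",
		"ed easy agent": "Edesy voice agent",
	})
	fmt.Println(r.Replace("call the ed easy agent")) // call the Edesy voice agent
}
```

Unlike the sequential loop, a replacer makes one pass over the text, so the output of one correction is never rewritten by another.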
Next Steps
- Hindi Configuration - Detailed Hindi setup
- STT Providers - Provider comparison
- TTS Providers - Voice selection