Everything you need to integrate speech recognition. REST API for batch transcription, WebSocket for real-time streaming, and guides for each STT provider.
Sign up for an account and get your API key from the dashboard.
export EDESY_API_KEY="your_api_key_here"

Transcribe an audio file with a simple API call:
curl -X POST https://api.edesy.in/v1/speech-to-text \
  -H "Authorization: Bearer $EDESY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://example.com/audio.mp3",
    "provider": "deepgram",
    "language": "en"
  }'

Response includes the transcribed text and metadata:
{
  "id": "txn_abc123",
  "status": "completed",
  "text": "Hello, this is a sample transcription.",
  "confidence": 0.95,
  "duration_seconds": 5.2,
  "words": [
    { "word": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.98 },
    ...
  ]
}

Use the REST API to transcribe pre-recorded audio files. It accepts either a URL or base64-encoded audio.
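Because the endpoint accepts either a URL or base64-encoded audio, a local file has to be encoded before it is sent. The following is a minimal Node.js sketch of building such a request. `buildBatchRequest` and `transcribe` are hypothetical helper names, and the field names are taken from the request body documented in this section. Treating `audio_url` and `audio_base64` as mutually exclusive, and treating any non-2xx status as an error, are assumptions rather than documented behavior.

```javascript
// Endpoint taken from the quick-start curl example.
const ENDPOINT = 'https://api.edesy.in/v1/speech-to-text';

// Hypothetical helper: builds the arguments for fetch().
// `audio` is either { url } or { bytes } (Uint8Array or Buffer of the file contents).
function buildBatchRequest(apiKey, audio, { provider = 'deepgram', language = 'en', options } = {}) {
  // Assumption: exactly one audio source should be supplied per request.
  if (Boolean(audio.url) === Boolean(audio.bytes)) {
    throw new Error('Provide exactly one of audio.url or audio.bytes');
  }
  const body = { provider, language };
  if (audio.url) {
    body.audio_url = audio.url;
  } else {
    body.audio_base64 = Buffer.from(audio.bytes).toString('base64');
  }
  if (options) body.options = options;
  return {
    url: ENDPOINT,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request with the global fetch (Node.js 18+) and returns the parsed JSON.
async function transcribe(apiKey, audio, settings) {
  const { url, init } = buildBatchRequest(apiKey, audio, settings);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Transcription request failed: HTTP ${res.status}`);
  return res.json();
}
```

A call might look like `transcribe(process.env.EDESY_API_KEY, { bytes: fs.readFileSync('call.wav') }, { options: { diarize: true } })`, where `call.wav` is a placeholder file name. Keeping request construction separate from the network call makes the payload easy to inspect before anything is sent.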
POST /v1/speech-to-text

Request Body
{
  "audio_url": "string",       // URL to audio file
  "audio_base64": "string",    // OR base64-encoded audio
  "provider": "deepgram",      // STT provider
  "language": "en",            // Language code
  "options": {
    "punctuate": true,         // Add punctuation
    "diarize": false,          // Speaker diarization
    "timestamps": "word"       // word | sentence | none
  }
}

Use WebSocket for real-time transcription. Ideal for voice assistants, live calls, and interactive applications.
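Streaming means sending audio in small pieces, so it helps to know how many bytes a chunk of a given duration holds. The sketch below works that out for the 16 kHz, 16-bit PCM format used in the streaming example and splits a raw buffer accordingly. `bytesForDuration` and `chunkPcm` are hypothetical helpers, and the mono channel count and 100 ms default chunk length are assumptions, not documented requirements.

```javascript
// Format from the streaming example: 16 kHz, 16-bit PCM. Mono is an assumption.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2;
const CHANNELS = 1;

// Number of bytes holding `ms` milliseconds of audio, aligned to whole samples.
function bytesForDuration(ms) {
  const samples = Math.round((SAMPLE_RATE_HZ * ms) / 1000);
  return samples * BYTES_PER_SAMPLE * CHANNELS;
}

// Splits raw PCM (Uint8Array or Buffer) into chunks of `chunkMs` each.
// The final chunk may be shorter than the rest.
function chunkPcm(pcm, chunkMs = 100) {
  const size = bytesForDuration(chunkMs);
  if (size <= 0) throw new RangeError('chunkMs must be positive');
  const chunks = [];
  for (let offset = 0; offset < pcm.length; offset += size) {
    // subarray() creates a view on the same memory, so nothing is copied.
    chunks.push(pcm.subarray(offset, offset + size));
  }
  return chunks;
}
```

At this format, one millisecond of audio is 32 bytes, so a 100 ms chunk is 3,200 bytes. Once the socket is open, each chunk can be passed to `ws.send` in order.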
// The `headers` option below requires Node.js with the `ws` package;
// the browser WebSocket constructor does not accept custom headers.
import WebSocket from 'ws';

const ws = new WebSocket(
  'wss://api.edesy.in/v1/speech-to-text/stream?provider=deepgram&language=en',
  { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
);

ws.onopen = () => {
  console.log('Connected');
  // Start sending audio chunks
};

ws.onmessage = (event) => {
  const result = JSON.parse(event.data);
  console.log('Transcript:', result.text);
  console.log('Is Final:', result.is_final);
};

// Send audio data (16kHz, 16-bit PCM) once the connection is open
ws.send(audioChunk);

Compare STT providers based on accuracy, latency, language support, and pricing.
| Provider | Accuracy | Latency | Languages | Price |
|---|---|---|---|---|
| Deepgram | 95% | <100ms | 30+ | $0.0042/min |
| Google Chirp | 90% | ~200ms | 100+ | $0.016/min |
| Azure Speech | 92% | ~150ms | 85+ | $0.012/min |
| ElevenLabs Scribe | 88% | ~300ms | 20+ | $0.0067/min |
| AssemblyAI | 91% | ~200ms | 15+ | $0.0075/min |
| OpenAI Whisper | 85% | ~500ms | 50+ | $0.006/min |
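To turn the per-minute prices into a budget figure, multiply each rate by the expected monthly audio volume. The sketch below does that for every provider in the table. `estimateCosts` is a hypothetical helper, the prices are copied from the table, and the ranking considers list price only, ignoring accuracy, latency, and language coverage.

```javascript
// Per-minute list prices in USD, copied from the comparison table.
const PRICE_PER_MINUTE_USD = {
  'Deepgram': 0.0042,
  'Google Chirp': 0.016,
  'Azure Speech': 0.012,
  'ElevenLabs Scribe': 0.0067,
  'AssemblyAI': 0.0075,
  'OpenAI Whisper': 0.006,
};

// Returns [{ provider, costUsd }] for the given number of audio minutes,
// rounded to cents and sorted from cheapest to most expensive.
function estimateCosts(minutes) {
  return Object.entries(PRICE_PER_MINUTE_USD)
    .map(([provider, price]) => ({
      provider,
      costUsd: Number((price * minutes).toFixed(2)),
    }))
    .sort((a, b) => a.costUsd - b.costUsd);
}

// Example: 10,000 minutes of audio per month.
console.table(estimateCosts(10000));
```

For 10,000 minutes, the estimates range from $42 at $0.0042/min to $160 at $0.016/min.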
Low Latency
Use Deepgram for voice assistants and real-time applications.
Indian Languages
Use ElevenLabs Scribe or Google Chirp for Hindi, Tamil, etc.
Cost-Effective
Use OpenAI Whisper for high-volume batch transcription.
Get your API key and start integrating speech-to-text in minutes.