Documentation

Speech-to-Text API Documentation

Everything you need to integrate speech recognition. REST API for batch transcription, WebSocket for real-time streaming, and guides for each STT provider.

Quick Start: Get transcribing in 5 minutes

REST API: Batch transcription endpoints

WebSocket Streaming: Real-time transcription

Providers: Compare STT providers

Quick Start

Get Started in 5 Minutes

Get Your API Key

export EDESY_API_KEY="your_api_key_here"

Make Your First Request

Transcribe an audio file with a simple API call:

curl -X POST https://api.edesy.in/v1/speech-to-text \
  -H "Authorization: Bearer $EDESY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://example.com/audio.mp3",
    "provider": "deepgram",
    "language": "en"
  }'
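
The same request can be assembled in JavaScript. The sketch below only builds the URL, headers, and body that the curl command sends; it does not contact the API. The helper name buildTranscriptionRequest and its default arguments are hypothetical and not part of any official SDK.

```javascript
// Hypothetical helper: mirrors the curl example above, without sending anything.
function buildTranscriptionRequest(apiKey, audioUrl, provider = 'deepgram', language = 'en') {
  return {
    url: 'https://api.edesy.in/v1/speech-to-text',
    init: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      // Field names match the JSON payload in the curl example.
      body: JSON.stringify({ audio_url: audioUrl, provider, language }),
    },
  };
}

// Usage sketch (Node.js 18+ ships a global fetch):
// const { url, init } = buildTranscriptionRequest(
//   process.env.EDESY_API_KEY,
//   'https://example.com/audio.mp3'
// );
// const response = await fetch(url, init);
// const result = await response.json();
```

Keeping request construction separate from the network call makes the payload easy to inspect or log before anything is sent.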

Get Your Transcription

The response includes the transcribed text and metadata, including per-word timings and confidence scores:

{
  "id": "txn_abc123",
  "status": "completed",
  "text": "Hello, this is a sample transcription.",
  "confidence": 0.95,
  "duration_seconds": 5.2,
  "words": [
    { "word": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.98 },
    ...
  ]
}
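
The per-word confidence values can be used to flag parts of a transcript for review. The following is a minimal sketch that assumes the response shape shown above; the function name lowConfidenceWords and the 0.9 threshold are illustrative choices, and the sample data is made up.

```javascript
// Hypothetical helper: list words whose confidence falls below a threshold.
function lowConfidenceWords(result, threshold = 0.9) {
  // Tolerate responses without a words array (for example, timestamps disabled).
  const words = Array.isArray(result.words) ? result.words : [];
  return words
    .filter((w) => w.confidence < threshold)
    .map((w) => ({ word: w.word, start: w.start, end: w.end }));
}

// Made-up sample in the documented response shape.
const sample = {
  text: 'Hello there',
  words: [
    { word: 'Hello', start: 0.0, end: 0.5, confidence: 0.98 },
    { word: 'there', start: 0.5, end: 0.9, confidence: 0.72 },
  ],
};

console.log(lowConfidenceWords(sample));
```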

REST API

Batch Transcription API

Use the REST API to transcribe pre-recorded audio files. The endpoint accepts either a URL pointing to the audio file or the audio itself as a base64-encoded string.

POST /v1/speech-to-text

Request Body

{
  "audio_url": "string",        // URL to audio file
  "audio_base64": "string",     // OR base64-encoded audio
  "provider": "deepgram",       // STT provider
  "language": "en",             // Language code
  "options": {
    "punctuate": true,          // Add punctuation
    "diarize": false,           // Speaker diarization
    "timestamps": "word"        // word | sentence | none
  }
}
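
A request body in this shape can be assembled and checked on the client before it is sent. The sketch below is one possible approach: buildRequestBody is a hypothetical helper, and the rule that exactly one of audio_url or audio_base64 must be present is an assumption drawn from the "OR" in the schema comment, not a documented server behavior. It relies on the Node.js Buffer class for base64 encoding.

```javascript
// Allowed values taken from the schema comment: word | sentence | none.
const TIMESTAMP_MODES = ['word', 'sentence', 'none'];

// Hypothetical helper: build a body for POST /v1/speech-to-text.
function buildRequestBody({ audioUrl, audioBytes, provider, language, options = {} }) {
  // Assumption: the API expects one audio source, not both and not neither.
  if (Boolean(audioUrl) === Boolean(audioBytes)) {
    throw new Error('Provide exactly one of audioUrl or audioBytes');
  }
  if (options.timestamps !== undefined && !TIMESTAMP_MODES.includes(options.timestamps)) {
    throw new Error(`timestamps must be one of: ${TIMESTAMP_MODES.join(', ')}`);
  }
  const body = { provider, language, options };
  if (audioUrl) {
    body.audio_url = audioUrl;
  } else {
    // Encode raw bytes (Buffer or Uint8Array) as base64.
    body.audio_base64 = Buffer.from(audioBytes).toString('base64');
  }
  return body;
}

const body = buildRequestBody({
  audioBytes: Uint8Array.from([1, 2, 3]),
  provider: 'deepgram',
  language: 'en',
  options: { punctuate: true, diarize: false, timestamps: 'word' },
});
console.log(JSON.stringify(body));
```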

Real-Time Streaming

WebSocket Streaming API

Use WebSocket for real-time transcription. Ideal for voice assistants, live calls, and interactive applications.

WebSocket Connection

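// Node.js: the `ws` package accepts custom request headers.
// The browser WebSocket constructor does not take a headers option.
import WebSocket from 'ws';

const ws = new WebSocket(
  'wss://api.edesy.in/v1/speech-to-text/stream?provider=deepgram&language=en',
  { headers: { 'Authorization': 'Bearer YOUR_API_KEY' } }
);

ws.onopen = () => {
  console.log('Connected');
  // Send audio data (16kHz, 16-bit PCM) only once the connection is open.
  // audioChunk is a placeholder for a Buffer or ArrayBuffer of PCM audio.
  ws.send(audioChunk);
};

ws.onmessage = (event) => {
  // Each message is a JSON result with the transcript text and a finality flag.
  const result = JSON.parse(event.data);
  console.log('Transcript:', result.text);
  console.log('Is Final:', result.is_final);
};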
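
The example above specifies 16kHz, 16-bit PCM audio. A recording in that format can be split into fixed-duration chunks before sending, as sketched below. The helper chunkPcm is hypothetical, and both the mono channel layout and the 100 ms chunk duration are assumptions; the source does not state a required chunk size.

```javascript
const SAMPLE_RATE_HZ = 16000; // 16kHz, from the comment in the example
const BYTES_PER_SAMPLE = 2;   // 16-bit PCM; mono is assumed here

// Hypothetical helper: split raw PCM bytes into chunks of chunkMs milliseconds.
function chunkPcm(pcm, chunkMs = 100) {
  // Work in whole samples so a chunk never splits a 16-bit sample.
  const chunkBytes = Math.floor((SAMPLE_RATE_HZ * chunkMs) / 1000) * BYTES_PER_SAMPLE;
  if (chunkBytes <= 0) {
    throw new Error('chunkMs is too small to hold a single sample');
  }
  const chunks = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    // subarray clamps the end index, so the last chunk may be shorter.
    chunks.push(pcm.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}

// 8000 bytes is a quarter second of audio at 32000 bytes per second.
const chunks = chunkPcm(new Uint8Array(8000));
console.log(chunks.map((c) => c.length));
```

Each chunk could then be passed to ws.send once the connection has opened.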

The stream supports interim results for real-time feedback; the is_final field on each message indicates whether a transcript is final.
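
One way to turn a mix of interim and final messages into a running transcript is sketched below. It assumes that each interim message replaces the previous interim text for the current segment and that a message with is_final set to true closes that segment; the source does not spell out these semantics, and createTranscriptAccumulator is a hypothetical helper.

```javascript
// Hypothetical helper: fold streaming results into a single display string.
function createTranscriptAccumulator() {
  const finalized = []; // text of segments already marked final
  let interim = '';     // latest interim text for the open segment

  return {
    handle(result) {
      if (result.is_final) {
        finalized.push(result.text);
        interim = '';
      } else {
        // Assumption: a new interim result supersedes the previous one.
        interim = result.text;
      }
      return [...finalized, interim].filter(Boolean).join(' ');
    },
  };
}

const acc = createTranscriptAccumulator();
acc.handle({ text: 'Hello', is_final: false });
acc.handle({ text: 'Hello world', is_final: true });
const display = acc.handle({ text: 'how are', is_final: false });
console.log(display);
```

In the onmessage handler, the parsed result would be passed to handle and the returned string used to update the display.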

Provider Comparison

Choose the Right Provider

Compare STT providers based on accuracy, latency, language support, and pricing.

Provider           Accuracy  Latency  Languages  Price (USD/min)
Deepgram           95%       <100ms   30+        $0.0042
Google Chirp       90%       ~200ms   100+       $0.016
Azure Speech       92%       ~150ms   85+        $0.012
ElevenLabs Scribe  88%       ~300ms   20+        $0.0067
AssemblyAI         91%       ~200ms   15+        $0.0075
OpenAI Whisper     85%       ~500ms   50+        $0.006

Low Latency

Use Deepgram for voice assistants and real-time applications.

Indian Languages

Use ElevenLabs Scribe or Google Chirp for Hindi, Tamil, and other Indian languages.

Cost-Effective

Use OpenAI Whisper for high-volume batch transcription.
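
The figures in the comparison table can also drive a simple programmatic choice. The sketch below copies the table's latency, language, and price columns into a list and picks the cheapest provider that meets a latency ceiling and a minimum language count. The numeric latencies are approximations of the table's "<" and "~" values, the language counts are the table's lower bounds, and pickProvider and estimateCost are hypothetical helpers.

```javascript
// Values transcribed from the comparison table; latencies are approximate.
const PROVIDERS = [
  { name: 'Deepgram', latencyMs: 100, languages: 30, pricePerMin: 0.0042 },
  { name: 'Google Chirp', latencyMs: 200, languages: 100, pricePerMin: 0.016 },
  { name: 'Azure Speech', latencyMs: 150, languages: 85, pricePerMin: 0.012 },
  { name: 'ElevenLabs Scribe', latencyMs: 300, languages: 20, pricePerMin: 0.0067 },
  { name: 'AssemblyAI', latencyMs: 200, languages: 15, pricePerMin: 0.0075 },
  { name: 'OpenAI Whisper', latencyMs: 500, languages: 50, pricePerMin: 0.006 },
];

// Cheapest provider satisfying both constraints, or null if none qualifies.
function pickProvider({ maxLatencyMs = Infinity, minLanguages = 0 } = {}) {
  const candidates = PROVIDERS
    .filter((p) => p.latencyMs <= maxLatencyMs && p.languages >= minLanguages)
    .sort((a, b) => a.pricePerMin - b.pricePerMin);
  return candidates.length > 0 ? candidates[0] : null;
}

// Estimated cost in USD for a given number of audio minutes.
function estimateCost(provider, minutes) {
  return provider.pricePerMin * minutes;
}

const choice = pickProvider({ maxLatencyMs: 200, minLanguages: 50 });
console.log(choice.name, estimateCost(choice, 1000).toFixed(2));
```

This ranks on listed price, latency, and language count only; it does not weigh accuracy or support for any specific language.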

Ready to Start Transcribing?

Get your API key and start integrating speech-to-text in minutes.

View Pricing

