Quick Start Guide
Get your first AI voice agent up and running in minutes.
Prerequisites
- Node.js 18+ or Go 1.21+
- A Twilio or Exotel account
- API keys for your chosen providers (Deepgram, OpenAI, etc.)
Step 1: Clone the Repository
git clone https://github.com/edesy-labs/voice-agent.git
cd voice-agent/backend-go
Step 2: Configure Environment
Create a .env file with your provider credentials:
# Telephony
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+1234567890
# STT (Speech-to-Text)
DEEPGRAM_API_KEY=your_deepgram_key
# TTS (Text-to-Speech)
CARTESIA_API_KEY=your_cartesia_key
# LLM
OPENAI_API_KEY=your_openai_key
# Or for Gemini:
GOOGLE_AI_API_KEY=your_google_key
# Server
NGINX_DOMAIN=your-domain.com
REDIS_HOST=localhost
REDIS_PORT=6379
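A missing credential usually only surfaces once a call is already in progress, so it can help to check the configuration before starting the server. The following is a minimal sketch, not part of the repository: it assumes the variables are exported into the process environment, that every key listed above is required, and that either OPENAI_API_KEY or GOOGLE_AI_API_KEY is enough for the LLM. The helper name missingKeys is hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredKeys mirrors the variables from the .env example above.
// Treating all of them as mandatory is an assumption of this sketch.
var requiredKeys = []string{
	"TWILIO_ACCOUNT_SID",
	"TWILIO_AUTH_TOKEN",
	"TWILIO_PHONE_NUMBER",
	"DEEPGRAM_API_KEY",
	"CARTESIA_API_KEY",
	"NGINX_DOMAIN",
	"REDIS_HOST",
	"REDIS_PORT",
}

// missingKeys returns the names of required variables that are unset or
// blank. lookup has the same shape as os.LookupEnv so it can be faked.
func missingKeys(lookup func(string) (string, bool)) []string {
	isSet := func(key string) bool {
		v, ok := lookup(key)
		return ok && strings.TrimSpace(v) != ""
	}
	var missing []string
	for _, key := range requiredKeys {
		if !isSet(key) {
			missing = append(missing, key)
		}
	}
	// The LLM keys are alternatives: one of the two is sufficient.
	if !isSet("OPENAI_API_KEY") && !isSet("GOOGLE_AI_API_KEY") {
		missing = append(missing, "OPENAI_API_KEY or GOOGLE_AI_API_KEY")
	}
	return missing
}

func main() {
	if missing := missingKeys(os.LookupEnv); len(missing) > 0 {
		fmt.Println("missing configuration:", strings.Join(missing, ", "))
		return
	}
	fmt.Println("configuration looks complete")
}
```

If you use Exotel instead of Twilio, the Twilio entries in requiredKeys would need to be replaced with whatever variables your Exotel setup uses.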
Step 3: Build and Run
# Install dependencies
go mod tidy
# Build the server
go build -o voice-agent .
# Run
./voice-agent
The server starts on port 8080 by default.
Step 4: Configure Your Agent
Create an agent via the API or dashboard:
curl -X POST https://your-domain.com/api/agents \
-H "Content-Type: application/json" \
-d '{
"name": "Customer Support Agent",
"language": "en",
"greeting": "Hello! How can I help you today?",
"prompt": "You are a helpful customer support agent for Acme Corp...",
"sttProvider": "deepgram",
"ttsProvider": "cartesia",
"llmProvider": "openai"
}'
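The same request can be built from Go, which is convenient if agents are created from a provisioning script. This is a sketch under assumptions: the struct fields simply mirror the JSON keys in the curl example, the type and function names are hypothetical, and the response format is not documented in this guide, so the sketch stops at building the request.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// AgentConfig mirrors the JSON body of the curl example above.
type AgentConfig struct {
	Name        string `json:"name"`
	Language    string `json:"language"`
	Greeting    string `json:"greeting"`
	Prompt      string `json:"prompt"`
	STTProvider string `json:"sttProvider"`
	TTSProvider string `json:"ttsProvider"`
	LLMProvider string `json:"llmProvider"`
}

// newCreateAgentRequest builds the POST /api/agents request without
// sending it, so the payload can be inspected or tested first.
func newCreateAgentRequest(baseURL string, cfg AgentConfig) (*http.Request, error) {
	body, err := json.Marshal(cfg)
	if err != nil {
		return nil, fmt.Errorf("encode agent config: %w", err)
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/agents", bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCreateAgentRequest("https://your-domain.com", AgentConfig{
		Name:        "Customer Support Agent",
		Language:    "en",
		Greeting:    "Hello! How can I help you today?",
		Prompt:      "You are a helpful customer support agent for Acme Corp...",
		STTProvider: "deepgram",
		TTSProvider: "cartesia",
		LLMProvider: "openai",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// To send it: resp, err := http.DefaultClient.Do(req)
}
```

Keeping request construction separate from sending makes it possible to check the method, URL, and headers without a running server.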
Step 5: Make Your First Call
Option A: Inbound Calls
Set the voice webhook of your Twilio phone number to the URL below, replacing {agentId} with the ID of the agent created in Step 4:
https://your-domain.com/twiml/{agentId}
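This guide does not show what the /twiml/{agentId} endpoint returns. For orientation, a Twilio webhook that hands a call over to a media stream typically answers with a TwiML document containing a Connect element and a Stream element that points at a WebSocket URL. The sketch below generates such a document. The WebSocket path /ws/{agentId} and the function name are assumptions made for illustration and may differ from the real server.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// The types below model the TwiML shape
// <Response><Connect><Stream url="..."/></Connect></Response>.
type stream struct {
	URL string `xml:"url,attr"`
}

type connect struct {
	Stream stream `xml:"Stream"`
}

type twimlResponse struct {
	XMLName xml.Name `xml:"Response"`
	Connect connect  `xml:"Connect"`
}

// buildStreamTwiML returns a TwiML document that asks Twilio to open a
// bidirectional audio stream. The /ws/ path is hypothetical.
func buildStreamTwiML(domain, agentID string) (string, error) {
	doc := twimlResponse{
		Connect: connect{Stream: stream{URL: "wss://" + domain + "/ws/" + agentID}},
	}
	out, err := xml.Marshal(doc)
	if err != nil {
		return "", err
	}
	return xml.Header + string(out), nil
}

func main() {
	twiml, err := buildStreamTwiML("your-domain.com", "agent-123")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(twiml)
}
```

Fetching the webhook URL in a browser or with curl and comparing the response against this shape is a quick way to confirm that the inbound route is wired up.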
Option B: Outbound Calls
Trigger an outbound call via API:
curl -X POST https://your-domain.com/make-call \
-H "Content-Type: application/json" \
-d '{
"agent_id": "your_agent_id",
"phone_number": "+1234567890",
"workspace_id": "your_workspace_id"
}'
What Happens During a Call
- Call Initiated: Twilio/Exotel connects the call
- WebSocket Established: Audio stream connects to your server
- Greeting Played: Agent speaks the greeting message
- Conversation Loop:
  - User speaks → VAD detects speech
  - Audio sent to STT (Deepgram)
  - Transcript sent to LLM (OpenAI/Gemini)
  - Response sent to TTS (Cartesia)
  - Audio played back to user
- Call Ends: Disposition logged, recording saved
Monitoring Your Agent
View real-time logs:
tail -f logs/$(date +%Y-%m-%d)/*.log
Key metrics to watch:
- E2E Latency: Total time from the end of the user's speech to the start of the agent's reply (target: < 500 ms)
- STT Latency: Speech recognition time
- LLM Latency: Model response time
- TTS Latency: Audio generation time
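To see how the per-stage numbers relate to the end-to-end target, the sketch below adds up the three stage latencies for one turn and compares the total with the 500 ms target. It is a simplification: the type name is hypothetical, the durations in main are made-up illustrative values rather than measurements, and a plain sum ignores network overhead and any overlap gained from streaming.

```go
package main

import (
	"fmt"
	"time"
)

// e2eTarget is the end-to-end target stated in the metrics list above.
const e2eTarget = 500 * time.Millisecond

// TurnLatency holds the per-stage timings of a single turn.
type TurnLatency struct {
	STT time.Duration
	LLM time.Duration
	TTS time.Duration
}

// E2E approximates end-to-end latency as the sum of the stages.
func (t TurnLatency) E2E() time.Duration {
	return t.STT + t.LLM + t.TTS
}

// Slowest names the stage that contributes the most latency.
func (t TurnLatency) Slowest() string {
	name, worst := "STT", t.STT
	if t.LLM > worst {
		name, worst = "LLM", t.LLM
	}
	if t.TTS > worst {
		name = "TTS"
	}
	return name
}

func main() {
	// Illustrative values only, not measured results.
	turn := TurnLatency{
		STT: 120 * time.Millisecond,
		LLM: 310 * time.Millisecond,
		TTS: 90 * time.Millisecond,
	}
	fmt.Printf("e2e=%v within_target=%v slowest=%s\n",
		turn.E2E(), turn.E2E() < e2eTarget, turn.Slowest())
	// prints: e2e=520ms within_target=false slowest=LLM
}
```

When the total exceeds the target, the slowest stage is the first place to look.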
Next Steps
- Architecture Overview - Deep dive into system design
- Twilio Integration - Advanced Twilio configuration
- Exotel Integration - Set up Indian telephony
- Deepgram STT - Optimize speech recognition
Troubleshooting
Call Not Connecting
- Verify your domain is accessible via HTTPS
- Check Twilio/Exotel webhook configuration
- Ensure WebSocket endpoint is reachable
High Latency
- Check provider API response times
- Consider using Gemini 2.5 Flash-Lite for faster LLM responses
- Enable streaming for STT and TTS
Audio Quality Issues
- Verify audio encoding (μ-law for Twilio/Exotel)
- Check sample rate (8kHz for telephony)
- Review VAD threshold settings
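When checking the encoding and sample rate, it helps to know what a correct frame looks like. With μ-law at 8 kHz each sample is one byte, so 20 ms of audio is 160 bytes, and each byte expands to a 16-bit linear PCM sample. The sketch below implements the standard G.711 μ-law expansion and the frame-size arithmetic. The function names are hypothetical and the code is independent of the repository's audio handling.

```go
package main

import "fmt"

const (
	sampleRateHz = 8000 // telephony sample rate from the checklist above
	muLawBias    = 0x84 // bias added by G.711 mu-law before compression
)

// decodeMuLaw converts one G.711 mu-law byte to a 16-bit linear PCM sample.
func decodeMuLaw(b byte) int16 {
	b = ^b // mu-law bytes are stored bit-inverted
	// Rebuild the magnitude from the 4-bit mantissa and 3-bit exponent.
	magnitude := (int(b&0x0F)<<3 + muLawBias) << (uint(b&0x70) >> 4)
	if b&0x80 != 0 { // sign bit set means a negative sample
		return int16(muLawBias - magnitude)
	}
	return int16(magnitude - muLawBias)
}

// frameBytes returns how many mu-law bytes cover the given duration,
// since mu-law uses exactly one byte per sample.
func frameBytes(milliseconds int) int {
	return sampleRateHz * milliseconds / 1000
}

func main() {
	fmt.Println(decodeMuLaw(0xFF)) // silence decodes to 0
	fmt.Println(decodeMuLaw(0x00)) // most negative sample: -32124
	fmt.Println(decodeMuLaw(0x80)) // most positive sample: 32124
	fmt.Println(frameBytes(20))    // 160 bytes per 20 ms frame
}
```

If decoded audio sounds like loud static, the stream is probably being treated as linear PCM when it is μ-law, or the reverse. If it plays at the wrong speed or pitch, the sample rate is the more likely culprit.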