Quick Start Guide

Get your first AI voice agent up and running in minutes.

Prerequisites

Node.js 18+ or Go 1.21+ (this guide uses the Go backend)
A Twilio or Exotel account
API keys for your chosen providers (Deepgram, OpenAI, etc.)

Step 1: Clone the Repository

git clone https://github.com/edesy-labs/voice-agent.git
cd voice-agent/backend-go

Step 2: Configure Environment

Create a .env file with your provider credentials:

# Telephony
TWILIO_ACCOUNT_SID=your_account_sid
TWILIO_AUTH_TOKEN=your_auth_token
TWILIO_PHONE_NUMBER=+1234567890

# STT (Speech-to-Text)
DEEPGRAM_API_KEY=your_deepgram_key

# TTS (Text-to-Speech)
CARTESIA_API_KEY=your_cartesia_key

# LLM
OPENAI_API_KEY=your_openai_key
# Or for Gemini:
GOOGLE_AI_API_KEY=your_google_key

# Server
NGINX_DOMAIN=your-domain.com
REDIS_HOST=localhost
REDIS_PORT=6379
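Missing credentials are easier to catch before the first call than during it. The following is a minimal sketch of a startup check, not part of the project itself. It assumes the values from .env have already been exported into the process environment, and the choice of which keys count as mandatory (requiredKeys) is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// requiredKeys is a hypothetical list taken from the sample .env above.
// Treating exactly these keys as mandatory is an assumption of this sketch.
var requiredKeys = []string{
	"TWILIO_ACCOUNT_SID",
	"TWILIO_AUTH_TOKEN",
	"TWILIO_PHONE_NUMBER",
	"DEEPGRAM_API_KEY",
	"CARTESIA_API_KEY",
}

// missingKeys returns every key for which lookup reports no value or an
// empty value. Passing lookup in makes the check easy to test.
func missingKeys(keys []string, lookup func(string) (string, bool)) []string {
	var missing []string
	for _, k := range keys {
		if v, ok := lookup(k); !ok || v == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	missing := missingKeys(requiredKeys, os.LookupEnv)
	if len(missing) > 0 {
		fmt.Println("missing environment variables:", missing)
		return
	}
	fmt.Println("all required variables are set")
}
```

An LLM key is left out of the list because the sample allows either OPENAI_API_KEY or GOOGLE_AI_API_KEY.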

Step 3: Build and Run

# Install dependencies
go mod tidy

# Build the server
go build -o voice-agent .

# Run
./voice-agent

The server will start on port 8080 by default.

Step 4: Configure Your Agent

Create an agent via the API or dashboard:

curl -X POST https://your-domain.com/api/agents \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Customer Support Agent",
    "language": "en",
    "greeting": "Hello! How can I help you today?",
    "prompt": "You are a helpful customer support agent for Acme Corp...",
    "sttProvider": "deepgram",
    "ttsProvider": "cartesia",
    "llmProvider": "openai"
  }'
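The same request can be built from Go. The sketch below mirrors the field names of the curl example. The AgentConfig type and newCreateAgentRequest helper are hypothetical names, and the request is only constructed, not sent, so the example runs without a live server.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// AgentConfig mirrors the JSON fields shown in the curl example.
// The type itself is an illustration, not the project's own definition.
type AgentConfig struct {
	Name        string `json:"name"`
	Language    string `json:"language"`
	Greeting    string `json:"greeting"`
	Prompt      string `json:"prompt"`
	STTProvider string `json:"sttProvider"`
	TTSProvider string `json:"ttsProvider"`
	LLMProvider string `json:"llmProvider"`
}

// newCreateAgentRequest builds (but does not send) the POST /api/agents request.
func newCreateAgentRequest(baseURL string, cfg AgentConfig) (*http.Request, error) {
	body, err := json.Marshal(cfg)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/agents", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newCreateAgentRequest("https://your-domain.com", AgentConfig{
		Name:        "Customer Support Agent",
		Language:    "en",
		Greeting:    "Hello! How can I help you today?",
		Prompt:      "You are a helpful customer support agent for Acme Corp...",
		STTProvider: "deepgram",
		TTSProvider: "cartesia",
		LLMProvider: "openai",
	})
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL)
	// To send it: resp, err := http.DefaultClient.Do(req)
}
```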

Step 5: Make Your First Call

Option A: Inbound Calls

Configure your Twilio phone number's voice webhook to point to the following URL, replacing {agentId} with your agent's ID:

https://your-domain.com/twiml/{agentId}

Option B: Outbound Calls

Trigger an outbound call via API:

curl -X POST https://your-domain.com/make-call \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "your_agent_id",
    "phone_number": "+1234567890",
    "workspace_id": "your_workspace_id"
  }'
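A malformed phone number is a common reason for an outbound call to fail. The sketch below validates the number before building the request body. It assumes the endpoint expects E.164 format (as the +1234567890 example suggests, and as Twilio requires). CallRequest and buildCallPayload are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// e164 matches a plus sign followed by up to 15 digits with no leading zero.
var e164 = regexp.MustCompile(`^\+[1-9]\d{1,14}$`)

// CallRequest mirrors the JSON body of the /make-call example.
type CallRequest struct {
	AgentID     string `json:"agent_id"`
	PhoneNumber string `json:"phone_number"`
	WorkspaceID string `json:"workspace_id"`
}

// buildCallPayload rejects numbers that are not E.164, then encodes the body.
func buildCallPayload(agentID, phone, workspaceID string) ([]byte, error) {
	if !e164.MatchString(phone) {
		return nil, fmt.Errorf("phone number %q is not in E.164 format", phone)
	}
	return json.Marshal(CallRequest{
		AgentID:     agentID,
		PhoneNumber: phone,
		WorkspaceID: workspaceID,
	})
}

func main() {
	payload, err := buildCallPayload("your_agent_id", "+1234567890", "your_workspace_id")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(payload))
}
```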

What Happens During a Call

1. Call Initiated: Twilio/Exotel connects the call
2. WebSocket Established: Audio stream connects to your server
3. Greeting Played: Agent speaks the greeting message
4. Conversation Loop:
   - User speaks → VAD (voice activity detection) detects speech
   - Audio sent to STT (Deepgram)
   - Transcript sent to LLM (OpenAI/Gemini)
   - Response sent to TTS (Cartesia)
   - Audio played back to user
5. Call Ends: Disposition logged, recording saved
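One turn of the conversation loop can be sketched as three stages chained together. The interfaces and stub implementations below are hypothetical and exist only to show the data flow. A real pipeline would stream audio and text between providers rather than pass whole buffers, and would sit behind VAD.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stage interfaces, not the project's actual types.
type Transcriber interface {
	Transcribe(audio []byte) (string, error)
}
type Responder interface {
	Reply(transcript string) (string, error)
}
type Synthesizer interface {
	Synthesize(text string) ([]byte, error)
}

// handleTurn runs one pass of the loop: STT, then LLM, then TTS.
func handleTurn(stt Transcriber, llm Responder, tts Synthesizer, audio []byte) ([]byte, error) {
	transcript, err := stt.Transcribe(audio)
	if err != nil {
		return nil, fmt.Errorf("stt: %w", err)
	}
	reply, err := llm.Reply(transcript)
	if err != nil {
		return nil, fmt.Errorf("llm: %w", err)
	}
	out, err := tts.Synthesize(reply)
	if err != nil {
		return nil, fmt.Errorf("tts: %w", err)
	}
	return out, nil
}

// Stub stages that stand in for Deepgram, OpenAI/Gemini, and Cartesia.
type fakeSTT struct{}

func (fakeSTT) Transcribe(audio []byte) (string, error) { return string(audio), nil }

type fakeLLM struct{}

func (fakeLLM) Reply(t string) (string, error) { return "You said: " + t, nil }

type fakeTTS struct{}

func (fakeTTS) Synthesize(text string) ([]byte, error) {
	return []byte(strings.ToUpper(text)), nil
}

func main() {
	out, err := handleTurn(fakeSTT{}, fakeLLM{}, fakeTTS{}, []byte("hello"))
	if err != nil {
		fmt.Println("turn failed:", err)
		return
	}
	fmt.Println(string(out)) // prints "YOU SAID: HELLO"
}
```

Keeping each stage behind a small interface is one way to swap providers without touching the loop itself.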

Monitoring Your Agent

View real-time logs:

tail -f logs/$(date +%Y-%m-%d)/*.log

Key metrics to watch:

E2E (end-to-end) Latency: Total response time across STT, LLM, and TTS (target: < 500 ms)
STT Latency: Speech recognition time
LLM Latency: Model response time
TTS Latency: Audio generation time

Next Steps

Architecture Overview - Deep dive into system design
Twilio Integration - Advanced Twilio configuration
Exotel Integration - Set up Indian telephony
Deepgram STT - Optimize speech recognition

Troubleshooting

Call Not Connecting

Verify your domain is accessible via HTTPS
Check Twilio/Exotel webhook configuration
Ensure WebSocket endpoint is reachable

High Latency

Check provider API response times
Consider using Gemini 2.5 Flash-Lite for faster LLM responses
Enable streaming for STT and TTS

Audio Quality Issues

Verify audio encoding (μ-law for Twilio/Exotel)
Check sample rate (8kHz for telephony)
Review VAD threshold settings