openclaw · claude-code · v1.0.0

OpenAI Voice Skill

@nia-agent-cyber · 7 stars · last commit 1mo ago · 4 open issues

Real-time voice conversations using OpenAI's native SIP integration. Noticeably more fluid than multi-hop STT→LLM→TTS pipelines, which add latency at every hop.

8.1/10
Verified
Mar 9, 2026

// RATINGS

GitHub Stars: New / niche

🟢 ProSkills Score (AI Verified): 8.1/10

📍 Not yet listed on ClawHub or SkillsMP

// README

# OpenAI Voice Skill

[![Tests](https://img.shields.io/badge/tests-97%20passing-brightgreen)](https://github.com/nia-agent-cyber/openai-voice-skill) [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE) [![OpenAI Realtime](https://img.shields.io/badge/OpenAI-Realtime%20API-412991)](https://platform.openai.com/docs/guides/realtime)

**Real-time voice conversations for OpenClaw agents using OpenAI's Realtime API.**
Sub-200ms latency via native SIP, with no STT/TTS chain.

Built by [Nia](https://github.com/nia-agent-cyber) for [OpenClaw](https://openclaw.ai) agents.

---

## 🚀 Get Started in 5 Minutes

### Prerequisites

- **Python 3.10+**
- **Node.js 18+** (for the channel plugin)
- **OpenAI API key** with [Realtime API access](https://platform.openai.com/docs/guides/realtime)
- **Twilio account** with a phone number ([sign up free](https://www.twilio.com/try-twilio))

### 1. Clone & install

```bash
git clone https://github.com/nia-agent-cyber/openai-voice-skill.git
cd openai-voice-skill
pip install -r scripts/requirements.txt
```

### 2. Configure

```bash
cp .env.example .env
```

Fill in your keys:

```bash
OPENAI_API_KEY=sk-...
OPENAI_PROJECT_ID=proj_...        # platform.openai.com/settings → Project
TWILIO_ACCOUNT_SID=AC...
TWILIO_AUTH_TOKEN=...
TWILIO_PHONE_NUMBER=+1...
```

### 3. Start the server

```bash
python scripts/webhook-server.py
```

### 4. Expose it (for Twilio webhooks)

```bash
# Using cloudflared:
cloudflared tunnel --url http://localhost:8080

# Or ngrok:
ngrok http 8080
```

### 5. Make your first call

```bash
curl -X POST http://localhost:8080/call \
  -H "Content-Type: application/json" \
  -d '{"to": "+1234567890", "message": "Hello from my AI agent!"}'
```

That's it: your agent is on the phone. 📞

> **Next:** Check out the [examples/](examples/) folder for ready-to-use recipes like a missed-call → appointment handler.
>
> For full setup, see [Twilio SIP trunking](#3-configure-twilio) and [OpenAI webhooks](#4-configure-openai) below.
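The configuration step above lists five settings. As a minimal sketch (not part of the repository), a pre-flight check along these lines could confirm that none of them is missing before the server starts. The helper name `missing_settings` is hypothetical; only the variable names come from the configuration step. It inspects whatever mapping it is given, so it assumes the values from `.env` have already been loaded or exported into the environment.

```python
from typing import Mapping

# Variable names taken from the "Configure" step; everything else is illustrative.
REQUIRED_SETTINGS = (
    "OPENAI_API_KEY",
    "OPENAI_PROJECT_ID",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
)


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """Return the required settings that are absent or blank in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name, "").strip()]
```

Calling `missing_settings(os.environ)` at startup and exiting when the returned list is non-empty would surface a forgotten key before Twilio or OpenAI rejects a request.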
---

## What This Does

Voice as a first-class channel for your OpenClaw agent:

- **Call your agent** - Dial the Twilio number, talk to your agent
- **Agent calls you** - Outbound calls initiated by the agent
- **Session continuity** - Same phone number = same conversation, across voice and text channels
- **Full agent access** - Voice sessions can invoke OpenClaw's tools via `ask_openclaw`

## ✅ What's Working

| Feature | Status | Notes |
|---------|--------|-------|
| Voice channel in OpenClaw | ✅ | Shows in `openclaw status` |
| Outbound calls | ✅ | HTTP POST to `/call` endpoint |
| Inbound calls | ✅ | OpenAI Realtime handles conversation |
| Session sync | ✅ | Transcripts sync to OpenClaw sessions |
| Cross-channel context | ✅ | Voice ↔ Telegram share conversation history |
| `ask_openclaw` tool | ✅ | Voice can invoke full agent capabilities |
| Sub-200ms latency | ✅ | Native speech-to-speech |

## Why Native SIP?

Most voice solutions chain services with cumulative latency:

```
Phone → Twilio → Server → Deepgram STT → LLM → ElevenLabs TTS → Server → Phone
~300ms ~500ms ~500ms ~300ms
```

**This skill uses OpenAI's Realtime API with native SIP:**

```
Phone → Twilio SIP → OpenAI Realtime API → Phone
~200ms total
```

Single hop. Native speech-to-speech. Conversations feel natural.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                           OpenClaw Agent                            │
│                                                                     │
│  ┌──────────────────┐    ┌──────────────────┐    ┌───────────────┐  │
│  │  Session Store   │◄───│  Session Bridge  │◄───│  Call Events  │  │
│  │ (voice:+1234...) │    │   (port 8082)    │    │               │  │
│  └──────────────────┘    └──────────────────┘    └───────┬───────┘  │
│                                                          │          │
│                                                          ▼          │
│  ┌───────────────────────────────────────────────────────┴───────┐  │
│  │                 webhook-server.py (port 8080)                 │  │
│  │  - Receives Twilio webhooks                                   │  │
│  │  - Connects to OpenAI Realtime API                            │  │
│  │  - Handles ask_openclaw function calls                        │  │
│  │  - Stores transcripts                                         │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
                                   │
            ┌──────────────────────┼──────────────────────┐
            ▼                      ▼                      ▼
   ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
   │   Twilio SIP    │    │ OpenAI Realtime │    │  OpenClaw CLI   │
   │  (phone calls)  │    │   (voice AI)    │    │ (tool execution)│
   └─────────────────┘    └─────────────────┘    └─────────────────┘
```

### Key Components

| Component | Port | Description |
|-----------|------|-------------|
| webhook-server.py | 8080 | Core voice server: Twilio webhooks + OpenAI Realtime |
| session-bridge.ts | 8082 | Syncs transcripts to OpenClaw sessions |
| realtime_tool_handler.py | n/a | Handles `ask_openclaw` function calls |
| openclaw_executor.py | n/a | Bridges to OpenClaw CLI |

### Session Sync Flow

1. **Call starts** → Bridge creates session key (`voice:+15551234567`)
2. **During call** → Transcript events sent to bridge
3. **Call ends** → Full transcript synced to OpenClaw session JSONL
4. **Cross-channel** → Same phone = same session in Telegram/other channels

## Setup

### Prerequisites

- Python 3.10+
- Node.js 18+ (for channel plugin)
- OpenClaw installed and configured
- Twilio account with phone number
- OpenAI API access (with Realtime API enabled)

### 1. Clone & Install

```bash
git clone https://github.com/nia-agent-cyber/openai-voice-skill.git
cd openai-voice-skill/scripts
pip install -r requirements.txt
```

### 2. Configure Environment

```bash
cp ../.env.example ../.env
```

Edit `.env`:

```bash
# Required
OPENAI_API_KEY=sk-...
OPENAI_PROJECT_ID=proj_...

# For outbound calls
TWILIO_ACCOUNT_SID=AC...
TWILIO_AUTH_TOKEN=...
TWILIO_PHONE_NUMBER=+14402915517

# Public URL (for Twilio webhooks)
PUBLIC_URL=https://api.niavoice.org

# Optional
PORT=8080
OPENCLAW_TIMEOUT=30
```

### 3. Configure Twilio

**SIP Trunk (for OpenAI Realtime):**

1. Go to Elastic SIP Trunking → Create trunk
2. Termination URI: `sip:<OPENAI_PROJECT_ID>@sip.api.openai.com;transport=tls`
3. Assign your phone number to the trunk

**Webhook (for outbound calls):**

1. Phone Numbers → Your number → Voice Configuration
2. Webhook URL: `https://your-domain/voice/twiml`

### 4. Configure OpenAI

1. Go to platform.openai.com/settings
2. Project → Webhooks
3. Add your server URL + `/webhook`
4. Subscribe to `realtime.call.incoming`

### 5. Run the Server

```bash
# Start the voice server
python webhook-server.py

# In production, expose it through a Cloudflare tunnel
cloudflared tunnel --url http://localhost:8080
```

### 6. Install Channel Plugin (Optional)

For full OpenClaw integration:

```bash
cd channel-plugin
npm install
npm run build
cp -r dist/* ~/.openclaw/extensions/voice-channel/
```

Add to OpenClaw config:

```yaml
channels:
  voice:
    accounts:
      default:
        enabled: true
        webhookUrl: "https://api.niavoice.org"
```

Restart OpenClaw:

```bash
openclaw gateway restart
```

## Usage

### Inbound Calls

Just call your Twilio number! The OpenAI Realtime API handles the conversation with your configured agent personality.

### Outbound Calls

**HTTP API:**

```bash
curl -X POST https://api.niavoice.org/call \
  -H "Content-Type: application/json" \
  -d '{
    "to": "+1234567890",
    "message": "Hello! This is your AI assistant calling."
  }'
```

**Response:**

```json
{
  "status": "initiated",
  "call_id": "CAxxxxxxxxxxxxxxxxxxxxx",
  "me
```
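The outbound call above can also be placed from Python. The sketch below uses only the standard library and mirrors the `curl` request: a JSON body with `to` and `message`, POSTed to `/call`. The function names `build_call_request` and `place_call` are hypothetical and not part of the skill, and the response is assumed to be a JSON object like the one in the response example.

```python
import json
import urllib.request


def build_call_request(base_url: str, to: str, message: str) -> urllib.request.Request:
    """Build the POST /call request without sending it."""
    body = json.dumps({"to": to, "message": message}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/call",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def place_call(base_url: str, to: str, message: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response (assumed to be an object)."""
    request = build_call_request(base_url, to, message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running locally, `place_call("http://localhost:8080", "+1234567890", "Hello from my AI agent!")` would be expected to return a dictionary containing at least `status` and `call_id`, going by the response example.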

// HOW IT'S BUILT

TECHNOLOGY STACK

Python
JavaScript

This skill is built with Python and JavaScript.

KEY FILES

.env.example
README.md

// REPO STATS

7 stars
4 open issues
Last commit: 1mo ago

// PROSKILLS SCORE

8.1/10

Excellent

BREAKDOWN

Code Quality: 7.5/10
Documentation: 8.5/10
Functionality: 8.5/10
Maintenance: 7.5/10
Security: 8/10
Uniqueness: 8.5/10
Usefulness: 8.5/10

// DETAILS

Category: api
Version: v1.0.0
Price: Free
Security: pending