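Building an AI Phone Receptionist System with ElevenLabs Voice Cloning

Most AI phone receptionists sound robotic because they use generic TTS voices. ElevenLabs instant voice cloning can clone a realistic voice in 30 seconds.

This guide explains how to combine a cloned voice with Twilio inbound calls and VAPI to build an AI receptionist system that sounds like a real person. It covers the complete ElevenLabs→VAPI→Twilio architecture.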

How to Set Up ElevenLabs Voice Cloning for AI Phone Receptionists - DEV Community

• Originally published at callstack.tech


Most AI receptionists sound robotic because they use generic TTS voices. ElevenLabs instant voice cloning addresses this: clone a real voice in about 30 seconds, then route Twilio inbound calls through VAPI with that cloned voice as your assistant. The result is that callers hear a consistent, professional receptionist instead of a synthesized bot. The setup consists of an ElevenLabs API key, a voice ID, a VAPI assistant config, and a Twilio webhook, and can be production-ready in under 10 minutes.

You need active accounts with three services: ElevenLabs (voice cloning), Twilio (phone infrastructure), and VAPI (orchestration). Generate API keys from each dashboard and store them in a .env file; never hardcode them. ElevenLabs requires a paid tier (Starter or higher) to access voice cloning; the free tier blocks instant voice cloning.

You also need Node.js 16+ with npm or yarn, a machine with at least 512MB of free RAM for session management, and an HTTPS endpoint (ngrok or a production domain) for webhook callbacks. Do not expose the webhooks over plain HTTP.

For professional voice stability, provide 1-2 minute reference audio samples in WAV or MP3 format (16kHz mono, noise-free). Background noise degrades cloning quality significantly.

Credentials to Gather

ElevenLabs API key and Voice ID (generated after cloning)

Twilio Account SID, Auth Token, and phone number

VAPI API key and assistant configuration access


Step-by-Step Tutorial

Configuration & Setup

Voice cloning quality breaks down when you skip the recording quality check. ElevenLabs needs noise-free audio samples (minimum 1 minute, ideally 5-10 minutes) recorded at 44.1kHz or higher. Background hum, keyboard clicks, or mouth sounds can degrade voice stability below 70%, which makes your AI receptionist sound robotic.

Critical environment variables:

# .env - Production secrets

VAPI_API_KEY=your_vapi_private_key

ELEVENLABS_API_KEY=your_elevenlabs_api_key

TWILIO_ACCOUNT_SID=your_twilio_sid

TWILIO_AUTH_TOKEN=your_twilio_token

TWILIO_PHONE_NUMBER=+1234567890

WEBHOOK_SECRET=generate_random_32_char_string
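A missing variable is easier to diagnose at startup than on the first live call. The following is a minimal sketch of a startup check for the variables listed above, plus one way to produce the 32-character webhook secret. `missingEnvVars` and `generateWebhookSecret` are hypothetical helper names, not part of any of the three services' SDKs.

```javascript
// config.js - hypothetical startup check for the variables listed above
const crypto = require('crypto');

const REQUIRED_ENV_VARS = [
  'VAPI_API_KEY',
  'ELEVENLABS_API_KEY',
  'TWILIO_ACCOUNT_SID',
  'TWILIO_AUTH_TOKEN',
  'TWILIO_PHONE_NUMBER',
  'WEBHOOK_SECRET',
];

// Return the names of variables that are unset or blank.
function missingEnvVars(env = process.env) {
  return REQUIRED_ENV_VARS.filter((name) => !env[name] || env[name].trim() === '');
}

// 16 random bytes encode to a 32-character hex string.
function generateWebhookSecret() {
  return crypto.randomBytes(16).toString('hex');
}

module.exports = { REQUIRED_ENV_VARS, missingEnvVars, generateWebhookSecret };
```

Calling `missingEnvVars()` right after `require('dotenv').config()` and exiting when the returned list is non-empty makes a misconfigured deployment fail immediately.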


Install dependencies for webhook handling and voice synthesis. node-fetch is pinned to v2 because v3 is ESM-only and cannot be loaded with require():

npm install express body-parser dotenv node-fetch@2


A[Caller] -->|Dials Number| B[Twilio]

B -->|Webhook POST| C[Your Server]

C -->|Create Assistant| D[VAPI]

D -->|Voice Config| E[ElevenLabs API]

E -->|Cloned Voice Audio| D


The flow separates responsibilities: Twilio handles telephony, VAPI manages conversation state, and ElevenLabs synthesizes the cloned voice. Your server bridges them via webhooks. Pick one synthesis path only. If you configure VAPI to call ElevenLabs directly and also build server-side synthesis, the caller hears double audio, with the bot talking over itself.

Step-by-Step Implementation

Step 1: Clone the target voice in ElevenLabs

Record clean audio samples (no background noise, consistent tone). Upload them in the ElevenLabs dashboard → Voice Lab → Add Instant Voice Clone. Note the resulting voice_id; you'll need it for the VAPI configuration.
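The dashboard upload can also be scripted. The sketch below assumes the ElevenLabs `POST /v1/voices/add` endpoint with the `xi-api-key` header, multipart `name` and `files` fields, and a `voice_id` field in the JSON response; verify these against the current API reference before relying on them. It also assumes Node.js 18+ for the global `fetch`, `FormData`, and `Blob`. `buildCloneRequest` and `cloneVoice` are hypothetical helper names.

```javascript
// cloneVoice.js - sketch of the API equivalent of the dashboard upload
const fs = require('fs');
const path = require('path');

const ELEVENLABS_ADD_VOICE_URL = 'https://api.elevenlabs.io/v1/voices/add';

// Build the request pieces separately so they can be inspected without a network call.
function buildCloneRequest(apiKey, voiceName) {
  if (!apiKey) throw new Error('ELEVENLABS_API_KEY is not set');
  if (!voiceName) throw new Error('voiceName is required');
  return {
    url: ELEVENLABS_ADD_VOICE_URL,
    method: 'POST',
    headers: { 'xi-api-key': apiKey },
    fields: { name: voiceName },
  };
}

// Upload the samples and resolve to the new voice_id.
// Relies on the global fetch, FormData, and Blob (assumed: Node.js 18+).
async function cloneVoice(apiKey, voiceName, samplePaths) {
  const req = buildCloneRequest(apiKey, voiceName);
  const form = new FormData();
  form.append('name', req.fields.name);
  for (const samplePath of samplePaths) {
    const blob = new Blob([fs.readFileSync(samplePath)]);
    form.append('files', blob, path.basename(samplePath));
  }
  const res = await fetch(req.url, { method: req.method, headers: req.headers, body: form });
  if (!res.ok) throw new Error(`ElevenLabs returned ${res.status}: ${await res.text()}`);
  const data = await res.json();
  return data.voice_id;
}

module.exports = { buildCloneRequest, cloneVoice };
```

A call such as `cloneVoice(process.env.ELEVENLABS_API_KEY, 'Front Desk', ['sample.wav'])` would then resolve to the ID that the next step expects; the voice name and file name here are placeholders.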

Step 2: Configure VAPI assistant with cloned voice

// assistantConfig.js - VAPI assistant with ElevenLabs voice
const assistantConfig = {
  // LLM settings; add the provider and model name for your VAPI account
  model: {
    systemPrompt: "You are a professional receptionist for Acme Corp. Greet callers warmly, ask how you can help, and route calls appropriately."
  },
  // Cloned ElevenLabs voice used for text-to-speech
  voice: {
    provider: "11labs", // Assumed VAPI identifier for ElevenLabs; confirm in the VAPI docs
    voiceId: "your_cloned_voice_id_here", // From ElevenLabs Voice Lab
    stability: 0.75, // Higher = more consistent, lower = more expressive
    similarityBoost: 0.85, // Higher = closer to original voice
    model: "eleven_turbo_v2" // Lowest latency for phone calls
  },
  // Speech-to-text for the caller's audio
  transcriber: {
    provider: "deepgram",
    model: "nova-2-phonecall"
  },
  firstMessage: "Thank you for calling Acme Corp. How may I assist you today?"
};

module.exports = assistantConfig;
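Before registering the assistant, it can help to sanity-check the voice settings, since `stability` and `similarityBoost` are fractions between 0 and 1. The sketch below does that and then builds the request. It assumes VAPI's `POST https://api.vapi.ai/assistant` endpoint with a Bearer token, that the ElevenLabs settings are grouped in an object (read from a `voice` key when present), and Node.js 18+ for the global `fetch`. `voiceSettingErrors`, `buildCreateAssistantRequest`, and `createAssistant` are hypothetical helper names.

```javascript
// createAssistant.js - sketch: sanity-check voice settings, then register the assistant
const VAPI_ASSISTANT_URL = 'https://api.vapi.ai/assistant';

// Collect problems with the voice settings; an empty array means they look valid.
function voiceSettingErrors(voice) {
  const errors = [];
  if (!voice || !voice.voiceId) errors.push('voiceId is missing');
  for (const key of ['stability', 'similarityBoost']) {
    const value = voice ? voice[key] : undefined;
    if (typeof value !== 'number' || value < 0 || value > 1) {
      errors.push(`${key} must be a number between 0 and 1`);
    }
  }
  return errors;
}

// Build the fetch arguments without sending anything.
function buildCreateAssistantRequest(assistantConfig, apiKey) {
  return {
    url: VAPI_ASSISTANT_URL,
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(assistantConfig),
    },
  };
}

// Validate, send, and return the parsed response.
async function createAssistant(assistantConfig, apiKey) {
  const errors = voiceSettingErrors(assistantConfig.voice || assistantConfig);
  if (errors.length > 0) throw new Error(errors.join('; '));
  const { url, options } = buildCreateAssistantRequest(assistantConfig, apiKey);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`VAPI returned ${res.status}: ${await res.text()}`);
  return res.json();
}

module.exports = { voiceSettingErrors, buildCreateAssistantRequest, createAssistant };
```

Keeping the request builder separate from the network call makes the payload easy to log or inspect while tuning the voice settings.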


Step 3: Set up webhook server for Twilio integration

// server.js - Express webhook handler

const express = require('express');

const bodyParser = require('body-parser');

const fetch = require('node-fetch');

require('dotenv').config();

const app = express();

app.use(bodyParser.json());

app.use(bodyParser.urlencoded({ extended: true }));