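Building an AI Phone Receptionist with ElevenLabs Voice Cloning

Most AI phone receptionists sound robotic because they use generic TTS voices. ElevenLabs instant voice cloning can clone a real human voice in about 30 seconds, which changes that.

This tutorial shows how to combine a cloned voice with Twilio inbound calls and VAPI to build an AI receptionist that sounds like a real person. The full stack: ElevenLabs voice cloning → VAPI conversation engine → Twilio call routing.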


How to Set Up ElevenLabs Voice Cloning for AI Phone Receptionists - DEV Community

• Originally published at callstack.tech


Most AI receptionists sound robotic because they use generic TTS voices. ElevenLabs instant voice cloning fixes this—clone a real voice in 30 seconds, then route Twilio inbound calls through VAPI with that cloned voice as your assistant. Result: callers hear a consistent, professional receptionist instead of a synthesized bot. Setup: ElevenLabs API key + voice ID + VAPI assistant config + Twilio webhook. Production-ready in under 10 minutes.

You need active accounts with three services: ElevenLabs (voice cloning), Twilio (phone infrastructure), and VAPI (orchestration). Generate API keys from each dashboard—store them in .env files, never hardcode them. ElevenLabs requires a paid tier (Starter or higher) to access voice cloning; free tier blocks instant voice cloning features.

Node.js 16+ with npm or yarn. A machine with at least 512MB free RAM for session management. An HTTPS endpoint (ngrok or a production domain) for webhook callbacks: do not serve Twilio or VAPI webhooks over plain HTTP.
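A startup check along these lines can catch a plain-HTTP webhook URL before it is registered with Twilio or VAPI. This is a minimal sketch: `isHttpsUrl` is an illustrative helper name, and the example URLs are placeholders.

```javascript
// Returns true only for absolute https:// URLs.
// Anything that fails to parse as a URL counts as invalid.
function isHttpsUrl(value) {
  try {
    return new URL(value).protocol === 'https:';
  } catch {
    return false;
  }
}

// Example: an ngrok-style URL passes, a plain HTTP URL does not.
console.log(isHttpsUrl('https://example.ngrok.app/webhook/twilio'));
console.log(isHttpsUrl('http://localhost:3000/webhook/twilio'));
```

Running this once at boot, before any webhook is configured, is cheaper than debugging callbacks that never arrive.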

For professional voice stability, provide 1-2 minute reference audio samples in WAV or MP3 format (16kHz mono, noise-free). Background noise degrades cloning quality significantly.
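Before uploading a sample, its format can be checked locally. The sketch below reads the header of an uncompressed WAV file and reports channel count, sample rate, and duration. It assumes plain RIFF/WAVE input (MP3 is not handled), and `inspectWav` and `meetsMinimums` are illustrative names rather than part of any SDK.

```javascript
// Parses the RIFF/WAVE header of an uncompressed WAV buffer.
function inspectWav(buf) {
  const isWave =
    buf.length >= 12 &&
    buf.toString('ascii', 0, 4) === 'RIFF' &&
    buf.toString('ascii', 8, 12) === 'WAVE';
  if (!isWave) throw new Error('Not a RIFF/WAVE file');

  let fmt = null;
  let dataBytes = null;
  let offset = 12;
  // Walk the chunk list: 4-byte id, 4-byte little-endian size, then the payload.
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'fmt ' && offset + 24 <= buf.length) {
      fmt = {
        channels: buf.readUInt16LE(offset + 10),
        sampleRate: buf.readUInt32LE(offset + 12),
        byteRate: buf.readUInt32LE(offset + 16),
        bitsPerSample: buf.readUInt16LE(offset + 22),
      };
    } else if (id === 'data') {
      dataBytes = size;
    }
    offset += 8 + size + (size % 2); // chunks are padded to an even length
  }
  if (!fmt) throw new Error('No fmt chunk found');

  const durationSeconds = dataBytes === null ? null : dataBytes / fmt.byteRate;
  return { ...fmt, durationSeconds };
}

// Thresholds default to figures quoted in the text: 16 kHz and 1 minute.
function meetsMinimums(info, minSampleRate = 16000, minSeconds = 60) {
  return (
    info.sampleRate >= minSampleRate &&
    info.durationSeconds !== null &&
    info.durationSeconds >= minSeconds
  );
}
```

A typical call would be `inspectWav(require('fs').readFileSync('sample.wav'))`, where `sample.wav` is a placeholder path. The check covers format only; background noise still has to be judged by ear.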

Credentials to Gather

ElevenLabs API key and Voice ID (generated after cloning)

Twilio Account SID, Auth Token, and phone number

VAPI API key and assistant configuration access


Step-by-Step Tutorial

Configuration & Setup

Voice cloning breaks when you skip the recording quality check. ElevenLabs works best with noise-free audio samples of at least 1 minute (for an instant clone, 1-2 minutes of clean speech is typically enough), recorded at 16kHz or higher, with 44.1kHz a common choice. Background hum, keyboard clicks, or mouth sounds degrade voice stability and make your AI receptionist sound robotic.

Critical environment variables:

# .env - Production secrets
VAPI_API_KEY=your_vapi_private_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
TWILIO_ACCOUNT_SID=your_twilio_sid
TWILIO_AUTH_TOKEN=your_twilio_token
TWILIO_PHONE_NUMBER=+1234567890
WEBHOOK_SECRET=generate_random_32_char_string
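A missing secret tends to surface as a confusing 401 in the middle of a call, so it can help to fail fast at startup. The sketch below checks the variable names from the listing above; `findMissingEnv` and `assertEnv` are illustrative helper names.

```javascript
// Variable names taken from the .env listing in this tutorial.
const REQUIRED_ENV = [
  'VAPI_API_KEY',
  'ELEVENLABS_API_KEY',
  'TWILIO_ACCOUNT_SID',
  'TWILIO_AUTH_TOKEN',
  'TWILIO_PHONE_NUMBER',
  'WEBHOOK_SECRET',
];

// Returns the names that are unset or blank in the given environment object.
function findMissingEnv(env = process.env) {
  return REQUIRED_ENV.filter((name) => !env[name] || env[name].trim() === '');
}

// Throws a single error listing every missing variable.
function assertEnv(env = process.env) {
  const missing = findMissingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}
```

Call `assertEnv()` right after `require('dotenv').config()` so the process exits before it starts accepting webhooks.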


Install dependencies for webhook handling and voice synthesis (node-fetch is pinned to v2 because v3 is ESM-only and cannot be loaded with require):

npm install express body-parser dotenv node-fetch@2


A[Caller] -->|Dials Number| B[Twilio]
B -->|Webhook POST| C[Your Server]
C -->|Create Assistant| D[VAPI]
D -->|Voice Config| E[ElevenLabs API]
E -->|Cloned Voice Audio| D


The flow separates responsibilities: Twilio handles telephony, VAPI manages conversation state, ElevenLabs synthesizes the cloned voice. Your server bridges them via webhooks. Pick one synthesis path: either let VAPI call ElevenLabs directly or synthesize on your server, not both. Running both creates double audio where the bot talks over itself.
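Because the Twilio-to-server hop is a public webhook, it is worth confirming that each POST really came from Twilio. Twilio signs requests with an `X-Twilio-Signature` header: an HMAC-SHA1, keyed with your auth token, over the full URL followed by the POST parameters sorted by name. The sketch below reimplements that check for form-encoded requests using only Node's `crypto` module. The function names are illustrative, and the official `twilio` package offers its own `validateRequest` helper if you would rather not maintain this yourself.

```javascript
const crypto = require('crypto');

// Builds the signature Twilio is expected to send for a form-encoded POST:
// URL + each param name and value (sorted by name), HMAC-SHA1, base64.
function computeTwilioSignature(authToken, url, params) {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return crypto
    .createHmac('sha1', authToken)
    .update(Buffer.from(data, 'utf-8'))
    .digest('base64');
}

// Compares the received header against the expected value in constant time.
function isValidTwilioRequest(authToken, signatureHeader, url, params) {
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(String(signatureHeader || ''));
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}
```

The `url` argument must be the exact public URL Twilio called, including the scheme and any query string. Behind ngrok or a proxy this often differs from what the server sees locally.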

Step-by-Step Implementation

Step 1: Clone the target voice in ElevenLabs

Record clean audio samples (no background noise, consistent tone). Upload them in the ElevenLabs dashboard → Voice Lab → Add Instant Voice Clone. Note the voice_id; you'll need it for the VAPI configuration.
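If the voice_id was not copied from the dashboard, it can also be looked up by name through the ElevenLabs voices listing endpoint (`GET /v1/voices` with an `xi-api-key` header). The sketch below is one way to do that. `findClonedVoiceId` is an illustrative name, and the `fetchImpl` parameter is an assumption added so the function works with either the global `fetch` (Node 18+) or `node-fetch`.

```javascript
// Looks up a voice by its display name and returns its voice_id, or null.
async function findClonedVoiceId(apiKey, voiceName, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl('https://api.elevenlabs.io/v1/voices', {
    headers: { 'xi-api-key': apiKey },
  });
  if (!res.ok) {
    throw new Error(`ElevenLabs request failed with status ${res.status}`);
  }
  const { voices = [] } = await res.json();
  const match = voices.find((voice) => voice.name === voiceName);
  return match ? match.voice_id : null;
}
```

Example usage, with "Front Desk" standing in for whatever name the clone was given: `const voiceId = await findClonedVoiceId(process.env.ELEVENLABS_API_KEY, 'Front Desk');`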

Step 2: Configure VAPI assistant with cloned voice

// assistantConfig.js - VAPI assistant with ElevenLabs voice
// Note: the model / voice / transcriber grouping is reconstructed. The source
// listing lost those lines, including the LLM and voice provider names.
const assistantConfig = {
  model: {
    // LLM provider and model name are missing from the source; add your own.
    systemPrompt: "You are a professional receptionist for Acme Corp. Greet callers warmly, ask how you can help, and route calls appropriately."
  },
  voice: {
    voiceId: "your_cloned_voice_id_here", // From ElevenLabs Voice Lab
    stability: 0.75, // Higher = more consistent, lower = more expressive
    similarityBoost: 0.85, // Higher = closer to original voice
    model: "eleven_turbo_v2" // Lowest latency for phone calls
  },
  transcriber: {
    provider: "deepgram",
    model: "nova-2-phonecall"
  },
  firstMessage: "Thank you for calling Acme Corp. How may I assist you today?"
};

module.exports = assistantConfig;
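Once the configuration object is complete, it has to be registered with VAPI. The sketch below posts it to VAPI's REST API. The endpoint (`POST https://api.vapi.ai/assistant`) and Bearer authentication are stated here as assumptions to verify against the current VAPI API reference. `createAssistant` and the `fetchImpl` parameter are illustrative names.

```javascript
// Sends an assistant configuration to VAPI and returns the parsed response.
// Assumes the endpoint and auth scheme described above.
async function createAssistant(apiKey, assistantConfig, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl('https://api.vapi.ai/assistant', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(assistantConfig),
  });
  if (!res.ok) {
    // Include the response body: schema errors are usually explained there.
    throw new Error(`VAPI assistant creation failed: ${res.status} ${await res.text()}`);
  }
  return res.json();
}
```

Example usage: `const assistant = await createAssistant(process.env.VAPI_API_KEY, require('./assistantConfig'));` A rejected request usually points at a field name or nesting that does not match VAPI's schema, so log the error text while iterating on the config.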


Step 3: Set up webhook server for Twilio integration

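// server.js - Express webhook handler
const express = require('express');
const bodyParser = require('body-parser');
const fetch = require('node-fetch'); // needs node-fetch v2; v3 is ESM-only
require('dotenv').config(); // load secrets from .env into process.env

const app = express();
app.use(bodyParser.json()); // JSON webhook payloads
app.use(bodyParser.urlencoded({ extended: true })); // form-encoded bodies, as sent by Twilio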