### [Voicebox: An Open-Source AI Voice Studio, a Local Alternative to ElevenLabs + WisprFlow, with 34K+ Stars, That Lets AI Speak](https://www.willai.cc/article/2666)

**Published:** 2026-06-25T23:38:36

**Author:** hiyoho

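**Excerpt:** Voicebox is an open-source AI voice studio that provides locally run speech generation and voice input. It supports 7 TTS engines, 23 languages, and MCP integration, and is MIT licensed.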


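# 🎙️ Voicebox: An Open-Source AI Voice Studio

A free, full-stack AI voice solution that replaces ElevenLabs + WisprFlow. 34K+ stars, MIT licensed.

-   **34K+** GitHub stars
-   **7** TTS engines
-   **23** supported languages
-   **500+** developer followers
-   **MIT** open-source license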

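## 📌 Project overview

**Voicebox** is an open-source AI voice studio built by independent developer jamiepine to provide an AI voice solution that runs entirely on the local machine. It combines speech generation (an ElevenLabs replacement) and voice input (a WisprFlow replacement) in one tool. All models and data stay local with nothing uploaded to the cloud, which makes it a strong choice for privacy-first AI voice tooling.

The project uses a Tauri (Rust) desktop shell, a React/TypeScript frontend, and a FastAPI (Python) backend. It supports macOS, Windows, and Docker deployment, ships with 7 TTS engines, Whisper STT, and a local Qwen3 LLM, and natively supports MCP so that AI agents can "speak" too.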

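## ⚙️ Requirements and installation

### Requirements

-   **macOS**: Apple Silicon (M1+) or Intel Mac; 16 GB of RAM recommended
-   **Windows**: Windows 10+, with GPU acceleration via CUDA (NVIDIA) or DirectML (any GPU)
-   **Linux**: build from source; CUDA/ROCm GPU acceleration supported
-   **General**: Python 3.11+, Rust (for development builds), Bun (JS runtime)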

### Quick install (pre-built packages)

    # macOS (Apple Silicon)
    curl -L https://voicebox.sh/download/mac-arm -o Voicebox.dmg

    # macOS (Intel)
    curl -L https://voicebox.sh/download/mac-intel -o Voicebox.dmg

    # Windows
    # Download the MSI: https://voicebox.sh/download/windows

    # Docker one-step launch
    git clone https://github.com/jamiepine/voicebox.git
    cd voicebox
    docker compose up

### Building from source for development

    # Clone the repository
    git clone https://github.com/jamiepine/voicebox.git
    cd voicebox

    # Install just (a task runner)
    brew install just   # macOS
    # or: cargo install just

    # Install dependencies and start the dev server
    just setup
    just dev

    # Build a production release
    just build         # CPU build
    just build-local   # Windows + CUDA build

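## 🌟 Core features

### 🎤 7 TTS engines for every scenario

Voicebox integrates 7 open-source TTS engines, ranging from the ultra-light Kokoro (82M) to the high-quality HumeAI TADA (3B), to cover different needs:

| Engine | Languages | Model size | Key strengths |
| --- | --- | --- | --- |
| **Qwen3-TTS** | 10 | 0.6B/1.7B | High-quality multilingual cloning; supports delivery instructions |
| **Chatterbox Multilingual** | 23 | ~1GB | Widest language coverage, including Arabic and Finnish |
| **Chatterbox Turbo** | English | 350M | Very fast; supports \[laugh\]/\[sigh\] expression tags |
| **Kokoro** | 8 | 82M | Tiny model; 10x+ real-time speed on CPU |
| **LuxTTS** | English | ~1GB | 48 kHz output; 150x real-time speed on CPU |
| **HumeAI TADA** | 10 | 1B/3B | Speech-language model; supports 700s+ of coherent audio |
| **Qwen CustomVoice** | 10 | – | Natural-language control of delivery; no reference audio needed |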
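
To make the table easier to query, the sketch below encodes its rows as data and filters engines by language coverage. The `Engine` class and `engines_with_at_least` helper are hypothetical names for illustration and are not part of Voicebox; English-only engines are recorded as 1 language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    name: str
    languages: int  # number of supported languages, per the table above
    size: str       # model size label copied verbatim from the table


# Rows transcribed from the table above; English-only engines count as 1 language.
ENGINES = [
    Engine("Qwen3-TTS", 10, "0.6B/1.7B"),
    Engine("Chatterbox Multilingual", 23, "~1GB"),
    Engine("Chatterbox Turbo", 1, "350M"),
    Engine("Kokoro", 8, "82M"),
    Engine("LuxTTS", 1, "~1GB"),
    Engine("HumeAI TADA", 10, "1B/3B"),
    Engine("Qwen CustomVoice", 10, "–"),
]


def engines_with_at_least(min_languages: int) -> list[str]:
    """Return names of engines covering at least `min_languages`, widest first."""
    matches = [e for e in ENGINES if e.languages >= min_languages]
    # sorted() is stable, so engines with equal coverage keep table order.
    return [e.name for e in sorted(matches, key=lambda e: -e.languages)]
```

For example, `engines_with_at_least(20)` leaves only Chatterbox Multilingual, which matches the "widest language coverage" note in the table.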

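### 🗣️ Voice cloning + unlimited-length generation

Zero-shot voice cloning works from a few seconds of audio, and more than 50 curated preset voices from Kokoro and Qwen CustomVoice are built in. The "unlimited-length generation" mechanism splits text by sentence automatically, generates audio chunk by chunk, and joins the chunks with crossfades. It accepts up to **50,000 characters** of input text, which lifts the usual TTS length limits.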
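
The split, generate, and crossfade steps can be sketched roughly as follows. This is not Voicebox's actual code: the function names, the 400-character chunk size, and the linear fade are all assumptions, and the synthesis step between chunking and joining is left out. Only the 50,000-character limit comes from the article.

```python
import re

MAX_CHARS = 50_000  # input limit quoted in the article


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, keeping trailing punctuation (Latin and CJK)."""
    pieces = re.findall(r"[^.!?。！？]+[.!?。！？]*", text)
    return [p.strip() for p in pieces if p.strip()]


def chunk_text(text: str, max_chunk_chars: int = 400) -> list[str]:
    """Pack whole sentences into chunks of at most max_chunk_chars (assumed size)."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"input exceeds {MAX_CHARS} characters")
    chunks: list[str] = []
    current = ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chunk_chars:
            chunks.append(current)
            current = sentence  # a single over-long sentence becomes its own chunk
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two sample lists, fading `a` out and `b` in over `overlap` samples."""
    overlap = min(overlap, len(a), len(b))
    if overlap == 0:
        return a + b
    head, tail = a[:-overlap], a[-overlap:]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of b rises linearly toward 1
        mixed.append(tail[i] * (1 - w) + b[i] * w)
    return head + mixed + b[overlap:]
```

In a full pipeline each chunk from `chunk_text` would be synthesized separately and the resulting sample lists folded together with `crossfade`, so the seams between chunks are blended rather than hard cuts.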

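### 🎧 Global voice input (dictation)

Voice input is triggered by a global hotkey. On macOS it can paste automatically into the focused text field (push-to-talk or toggle mode). Whisper STT is built in, and an optional LLM pass removes filler words and pauses so dictated text reads more smoothly. In effect, it is an open-source WisprFlow.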

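### 🤖 Voice output for AI agents (MCP support)

A built-in local MCP server lets AI coding assistants such as Claude Code, Cursor, and Cline call the `voicebox.speak` tool, so an agent can "speak" in a cloned voice. Different agents can be bound to different voices for personalized output.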

    # One-step MCP setup for Claude Code
    claude mcp add voicebox --transport http --url http://127.0.0.1:17493/mcp --header "X-Voicebox-Client-Id: claude-code"
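
For readers curious what an MCP tool call looks like on the wire, here is a minimal sketch that builds (but does not send) a JSON-RPC `tools/call` request for `voicebox.speak`. The endpoint, tool name, and client-ID header come from the article; the `text` argument name and the `build_speak_request` helper are assumptions, and a real MCP client would first complete the protocol's `initialize` handshake.

```python
import json
import urllib.request

MCP_URL = "http://127.0.0.1:17493/mcp"  # endpoint from the setup command above


def build_speak_request(text: str, request_id: int = 1) -> urllib.request.Request:
    """Build an MCP tools/call request for voicebox.speak without sending it."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "voicebox.speak",
            # "text" is a hypothetical argument name; check the tool's schema.
            "arguments": {"text": text},
        },
    }
    return urllib.request.Request(
        MCP_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
            "X-Voicebox-Client-Id": "claude-code",
        },
        method="POST",
    )
```

In practice the assistant's MCP client builds and sends these messages itself, so the sketch is only useful for understanding or debugging the integration.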

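### 🎬 Voice story editor + audio post-processing

A built-in multi-track timeline editor supports dialogue, podcast, and narrative production, with drag-and-drop, audio trimming, and synchronized playback. Eight audio post-processing effects (pitch shift, reverb, delay, chorus, compression, and others) are provided through Spotify's pedalboard library, along with four preset effect chains: "Robot", "Radio", "Echo Chamber", and "Bass".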
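
An "effect chain" is simply an ordered list of effects applied one after another. The sketch below illustrates the idea in plain Python and deliberately does not use the pedalboard API; the `gain` and `delay` effects, the `run_chain` helper, and the `ECHO_PRESET` parameters are all hypothetical and are not Voicebox's actual presets.

```python
from typing import Callable

# An effect takes a list of samples and returns a new list of samples.
Effect = Callable[[list[float]], list[float]]


def gain(factor: float) -> Effect:
    """Scale every sample by `factor`."""
    return lambda samples: [s * factor for s in samples]


def delay(delay_samples: int, mix: float) -> Effect:
    """Add one delayed copy of the signal, scaled by `mix`."""
    def apply(samples: list[float]) -> list[float]:
        out = list(samples) + [0.0] * delay_samples  # room for the echo tail
        for i, s in enumerate(samples):
            out[i + delay_samples] += s * mix
        return out
    return apply


def run_chain(samples: list[float], chain: list[Effect]) -> list[float]:
    """Apply each effect in order, feeding its output to the next one."""
    for effect in chain:
        samples = effect(samples)
    return samples


# A hypothetical "echo" preset: a single echo, then a slight volume reduction.
ECHO_PRESET = [delay(delay_samples=2, mix=0.5), gain(0.8)]
```

Order matters in a chain: placing `gain` before `delay` here would give the same result only because both effects are linear, which is not true of effects such as compression.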

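## 💡 Typical use cases

### Scenario 1: Voice notifications from AI coding assistants

When a long-running programming task (such as model training or a test suite) finishes, the Voicebox MCP integration lets Claude Code or Cursor announce the result in a voice you like: "All tests passed: 42 cases in 3 minutes 12 seconds." You get the status by ear instead of watching the screen.

### Scenario 2: Multilingual content creation

With the Chatterbox Multilingual engine (23 languages) and voice cloning, content creators can produce multilingual video voice-overs and podcast content in their own voice, or any other. Qwen3-TTS also accepts delivery instructions (such as "speak slowly" or "speak softly") to make the generated speech sound more natural.

### Scenario 3: A local, privacy-first replacement for voice input

Voicebox replaces cloud-based voice input tools such as WisprFlow. All speech recognition and transcription runs locally (Whisper STT), and no voice data is uploaded to any cloud server. It suits privacy-sensitive users, corporate intranets, and offline use.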

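## 💬 Why recommend Voicebox?

**1\. Privacy first, fully local.** Models, voice data, and recordings are all stored locally, with no dependence on any cloud service. For developers who care about data privacy, this matters a great deal.

**2\. Two tools in one.** A single tool replaces both ElevenLabs (speech generation) and WisprFlow (voice input), so there is no need to subscribe to two services.

**3\. Broad engine coverage.** Seven TTS engines span 82M to 3B parameters, CPU to GPU acceleration, and English-only to 23 languages, which covers most use cases.

**4\. Native MCP support.** The AI agent ecosystem is growing quickly, and Voicebox is an early adopter of MCP, giving agents a voice output capability that is still uncommon among open-source projects.

**5\. Under active development.** The 485 open issues point to an engaged community and rapid iteration. The MIT license permits free modification and redistribution, which suits derivative work.

**Hands-on impressions:** The MCP integration is smooth: after a one-time setup, Claude Code can call voice output directly. Using it for voice notifications on long-running coding tasks is far nicer than watching a progress bar in the terminal. The one small drawback is that Linux has no pre-built package yet, so you have to build from source.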

## 📥 Downloads

[🌐 Official website](https://voicebox.sh)  
[🐙 GitHub repository](https://github.com/jamiepine/voicebox)  
[📖 Official documentation](https://docs.voicebox.sh)  
[🍎 macOS download](https://voicebox.sh/download/mac-arm)  
[🪟 Windows download](https://voicebox.sh/download/windows)

**Project info:**  
⭐ GitHub Stars: 34,192  
📜 License: MIT License  
💻 Tech stack: Tauri (Rust) + React/TypeScript + FastAPI (Python)  
🌐 Website: [voicebox.sh](https://voicebox.sh)  
📦 Docker: `docker compose up`  
Last updated: June 2026

**Tags:** AI, AI Agent, AI open-source projects, LLM, MCP, Python, TTS, TypeScript, open source, voice AI

**Categories:** Open-Source Projects

---