Building an end-to-end voice outreach pipeline in n8n: GPT-4o scripts, ElevenLabs cloning, and the tuning parameters I got wrong first
How I built an n8n workflow that turns a CRM lead into a personalized voicemail in a cloned voice in about 15 seconds, and what I learned about ElevenLabs stability, similarity boost, and why the defaults sound robotic.
The idea sounds simple: a lead enters your CRM, and your founder's cloned voice leaves them a personalized voicemail 15 seconds later. The execution involves more decisions than that summary suggests.
The pipeline shape
The n8n workflow is webhook-triggered. A POST to /new-lead kicks off:
- Normalize the lead payload: CRMs ship inconsistent shapes. HubSpot uses `firstName`, Pipedrive uses `first_name`, Apollo uses `first`. A single Code node normalizes everything before anything downstream touches it. Fail fast on missing required fields rather than letting a malformed prompt reach GPT-4o.
- Enrich via Apollo (soft-fail): If the lead comes in without a title or company context, Apollo fills the gap. The node uses `neverError: true` and an 8-second timeout. Enrichment is nice-to-have; the workflow must not block on a third-party rate limit.
- Build the prompt in JS: Prompt construction lives in a Code node, not expression syntax. The voicemail needs conditional logic based on seniority and industry that's unreadable as `{{ }}` expressions. Keeping the prompt next to the data shape that feeds it makes it maintainable.
- GPT-4o generates the script: Target: 70–95 words. No fewer (too abrupt), no more (ElevenLabs charges per character, and a 300-word script triples cost while breaking the 30-second voicemail format). The prompt enforces this hard.
- Parse and validate: Word count check before any TTS spend is committed. A malformed GPT response gets routed to the error branch, not to ElevenLabs.
- ElevenLabs synthesizes with a cloned voice: The audio comes back as binary. n8n handles it natively via the binary item channel.
- S3 upload → Airtable log → Slack notification: The last three run in parallel branches. The webhook responder waits only on Airtable (source of truth). Slack is fire-and-forget.
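The normalize and validate steps are the cheapest places to stop a bad run, so they are worth seeing as code. What follows is a minimal sketch of both as plain functions that could be pasted into Code nodes. Only the three first-name spellings and the 70–95 word target come from the workflow described above; the `phone`, `title` and `company` keys, the choice of required fields, and the names `normalizeLead` and `validateScript` are assumptions for illustration.

```javascript
// Only firstName / first_name / first come from the article. Every other
// field name here is an assumption; adjust to what your CRMs actually send.
const FIRST_NAME_KEYS = ["firstName", "first_name", "first"];
const REQUIRED = ["firstName", "phone"]; // hypothetical required set

// Return the first non-empty string found under any of the given keys.
function pickFirst(obj, keys) {
  for (const key of keys) {
    const value = obj[key];
    if (typeof value === "string" && value.trim() !== "") return value.trim();
  }
  return null;
}

function normalizeLead(raw) {
  const lead = {
    firstName: pickFirst(raw, FIRST_NAME_KEYS),
    phone: pickFirst(raw, ["phone", "phone_number"]),
    title: pickFirst(raw, ["title", "jobTitle", "job_title"]),
    company: pickFirst(raw, ["company", "company_name"]),
  };
  const missing = REQUIRED.filter((key) => lead[key] === null);
  if (missing.length > 0) {
    // Fail fast so a malformed lead never reaches the prompt builder.
    throw new Error(`Lead is missing required fields: ${missing.join(", ")}`);
  }
  return lead;
}

function countWords(text) {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Returns a verdict instead of throwing, so the workflow can branch on `ok`
// and route failures to the error branch before any TTS spend.
function validateScript(script, min = 70, max = 95) {
  const words = typeof script === "string" ? countWords(script) : 0;
  return { ok: words >= min && words <= max, words };
}
```

Returning `{ ok, words }` rather than throwing in `validateScript` is a design choice: an IF node can branch on `ok`, and the word count is available for the log entry either way.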
The voice tuning problem
This is the part I got wrong the first time.
ElevenLabs has four settings that matter: stability, similarity_boost, style, and use_speaker_boost. The defaults feel safe. They produce a voice that sounds like a well-rehearsed podcast intro — which is exactly wrong for a sales voicemail.
The settings I ended up with, and why:
```json
{
  "stability": 0.45,
  "similarity_boost": 0.85,
  "style": 0.20,
  "use_speaker_boost": true
}
```
Stability 0.45 — At 0.7+, the voice sounds practiced and flat. At 0.2, it gets emotionally exaggerated in a distracting way. 0.45 sits in the warm-but-conversational zone where a voicemail should live.
Similarity boost 0.85 — Below 0.7, a cloned voice starts to drift toward generic. 0.85 keeps it recognizably this person without amplifying recording artifact noise from the source clips.
Style 0.20 — Style exaggeration above 0.3 adds a performative quality. Fine for audiobooks, wrong for a 25-second message. 0.2 keeps it natural.
Speaker boost on — Measurable improvement on cloned voice tracks at the cost of ~5% more latency. Worth it.
I validated these through a blind A/B listener test across 12 parameter combinations with a five-person panel. The losing configurations either sounded like a corporate IVR or an over-excited podcast host.
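For context, here is roughly how those settings would travel in a request. This is a sketch that assumes ElevenLabs' text-to-speech endpoint (`POST /v1/text-to-speech/{voice_id}` with an `xi-api-key` header) and the `eleven_turbo_v2_5` model ID; `buildTtsRequest` is a hypothetical helper, and the voice ID and key are placeholders you would supply. It only builds the request, so the payload can be inspected before any characters are billed.

```javascript
// Voice settings from the article; everything else is illustrative.
const VOICE_SETTINGS = {
  stability: 0.45,
  similarity_boost: 0.85,
  style: 0.2,
  use_speaker_boost: true,
};

// Build (but do not send) a text-to-speech request description.
function buildTtsRequest({ voiceId, apiKey, text, modelId = "eleven_turbo_v2_5" }) {
  if (!voiceId || !apiKey || !text) {
    throw new Error("voiceId, apiKey and text are all required");
  }
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}`,
    method: "POST",
    headers: {
      "xi-api-key": apiKey,
      "Content-Type": "application/json",
      Accept: "audio/mpeg",
    },
    body: JSON.stringify({
      text,
      model_id: modelId,
      voice_settings: VOICE_SETTINGS,
    }),
  };
}
```

Outside n8n, the result could be passed to `fetch(req.url, req)`. Inside n8n, the same URL, headers and JSON body would map onto an HTTP Request node configured to return the response as a file.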
Cost per execution
| Component | Per run |
|---|---|
| GPT-4o (~600 in / ~120 out tokens) | ~$0.0036 |
| ElevenLabs turbo_v2_5 (~600 chars) | ~$0.18 |
| Apollo enrichment (1 call) | ~$0.01 |
| S3 + bandwidth | <$0.001 |
| Total | ≈ $0.20 |
Two optimizations baked in: turbo_v2_5 instead of multilingual_v2 (50% cheaper, near-identical quality on English), and Apollo is skipped entirely if the CRM payload already contains a title.
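The total is simple addition, but writing it out makes the effect of the Apollo skip visible. This sketch uses only the per-run figures from the table; the S3 line is entered at its $0.001 upper bound, and `costPerRun` is an illustrative name.

```javascript
// Per-run figures in USD, copied from the cost table.
// S3 is listed as "<$0.001", so its upper bound is used here.
const COSTS = { gpt4o: 0.0036, elevenlabs: 0.18, apollo: 0.01, s3: 0.001 };

function costPerRun({ hasTitle = false } = {}) {
  // Apollo is skipped when the CRM payload already carries a title.
  const apollo = hasTitle ? 0 : COSTS.apollo;
  return COSTS.gpt4o + COSTS.elevenlabs + apollo + COSTS.s3;
}

console.log(costPerRun().toFixed(4));                   // 0.1946
console.log(costPerRun({ hasTitle: true }).toFixed(4)); // 0.1846
```

On these numbers ElevenLabs accounts for roughly 92% of each run, which is why the script length cap matters more than any other cost lever.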
What n8n makes easy and what it doesn't
The Code node is the escape hatch. When an HTTP node's expression syntax gets unreadable, dropping into a Code node with proper JS is always the right call. Readable code beats a 4-line {{ }} mustache nightmare.
What n8n makes harder: binary file handling is not obvious. The ElevenLabs response comes back as audio bytes, and getting those into a named S3 upload required understanding how n8n's binary item channel works. Once it clicked, it was clean — but the documentation is sparse on this path.
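What eventually clicked for me can be shown as a data shape. As far as I understand n8n's item model, each item carries a `json` object plus an optional `binary` object keyed by property name, and in the default in-memory mode each binary entry holds base64 `data` alongside `mimeType` and `fileName`. The sketch below works under that assumption; `toBinaryItem`, the `audio` property name and the S3 key layout are hypothetical choices, not something n8n prescribes.

```javascript
// Wrap raw audio bytes in an n8n-style item (assumed shape, see above).
function toBinaryItem(audioBuffer, leadId, now = new Date()) {
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  const fileName = `${leadId}.mp3`;
  return {
    // JSON side: metadata that downstream nodes can reference in expressions.
    json: { leadId, s3Key: `voicemails/${day}/${fileName}` },
    // Binary side: the audio itself, stored under the "audio" property.
    binary: {
      audio: {
        data: audioBuffer.toString("base64"),
        mimeType: "audio/mpeg",
        fileName,
        fileExtension: "mp3",
      },
    },
  };
}
```

A downstream upload node would then be pointed at the `audio` binary property and take its object key from `s3Key`. If your instance stores binary data on disk instead of in memory, the `data` field may hold a reference, not the bytes themselves, so treat this as a mental model and not a drop-in.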
The error sub-workflow posts every failure to #automation-alerts with the failing node name and a direct execution link. That's table stakes for anything running in production.
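A sketch of the message formatting, assuming the payload shape that n8n's Error Trigger node emits (`execution.lastNodeExecuted`, `execution.url`, `execution.error.message`, `workflow.name`). `formatAlert` is a hypothetical helper, and the fallbacks exist because I would not rely on every field being present for every failure type.

```javascript
// Turn an Error Trigger payload (assumed shape) into a Slack message.
function formatAlert(event) {
  const execution = event.execution ?? {};
  const workflowName = event.workflow?.name ?? "unknown workflow";
  const node = execution.lastNodeExecuted ?? "unknown node";
  const message = execution.error?.message ?? "no error message";
  // Slack's <url|label> syntax renders as a clickable link.
  const link = execution.url
    ? `<${execution.url}|Open execution>`
    : "(no execution link)";
  return {
    channel: "#automation-alerts",
    text: `:rotating_light: ${workflowName} failed at "${node}": ${message}\n${link}`,
  };
}
```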