ElevenLabs is still the go-to for voice cloning in 2026, but the setup process trips people up in the same ways every time. This is a clean walkthrough.
What you need before you start
- An ElevenLabs account (the free tier works for testing, but cloning requires the Starter plan or higher)
- Audio samples: 1-10 minutes of clean speech, no background noise, no music
- The voice owner’s permission if it is not your voice
Step 1: Prepare your audio
This is where most clones fail. Common mistakes:
- Room echo makes the clone sound hollow
- Compression artifacts bleed into the clone
- Multiple speakers confuse the model
Record in a quiet room. Use a decent mic, but remember the room matters more: a phone in a padded closet beats a good mic in an echoey room. Export as WAV or high-bitrate MP3.
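Before uploading, it can help to sanity-check your files with a short script. The sketch below uses Python's built-in wave module to read basic properties and total up the length of your samples. It is not an ElevenLabs tool: the function names are made up for illustration, the 1 to 10 minute window is simply the guideline from the list above, and it only reads uncompressed PCM WAV files. It cannot hear echo or noise, so you still need to listen to every take.

```python
import wave


def inspect_wav(path):
    """Return basic properties of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "bit_depth": wav.getsampwidth() * 8,
            "seconds": frames / rate,
        }


def total_minutes(paths):
    """Add up the duration of all sample files, in minutes."""
    return sum(inspect_wav(p)["seconds"] for p in paths) / 60


def length_ok(paths, min_minutes=1.0, max_minutes=10.0):
    """Check the combined length against the 1-10 minute guideline (assumed window)."""
    return min_minutes <= total_minutes(paths) <= max_minutes
```

For example, length_ok(["take1.wav", "take2.wav"]) (hypothetical file names) returns True when the two takes together run between one and ten minutes.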
Step 2: Create the clone
- Go to Voices > Add Voice > Voice Cloning
- Upload your samples (multiple shorter files are fine)
- Set a name and description (the description helps with style matching)
- Click Add Voice
The clone is usually ready in under a minute.
Step 3: Test and iterate
Type a test sentence you did NOT include in your training audio. Listen for:
- Pronunciation accuracy
- Natural prosody (rhythm and stress)
- Artifacts or robotic moments
If the quality is low, add more varied samples. Monotone training audio produces monotone clones.
Step 4: Use it in Selendia
Once your voice is created, grab the Voice ID from the ElevenLabs dashboard. You can use it directly in any TTS workflow inside Selendia.
Stability vs Similarity settings
- Higher stability = more consistent, less expressive
- Higher similarity = closer to original, but can amplify imperfections
- For narration, stability 0.6 and similarity 0.75 are a solid starting point
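If you want to try the same Voice ID and settings outside Selendia, here is a rough sketch of a direct call to the ElevenLabs text-to-speech endpoint using only the Python standard library. Treat the details as assumptions to check against the current API reference: the similarity slider is sent as similarity_boost, the model ID shown is just one choice, the helper names and output file name are made up for this example, and the response is assumed to be MP3 audio (the API's default output format).

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(voice_id, text, api_key,
                      stability=0.6, similarity=0.75,
                      model_id="eleven_multilingual_v2"):
    """Build (but do not send) a text-to-speech request for a cloned voice."""
    payload = {
        "text": text,
        "model_id": model_id,  # assumed model choice; swap in the one you use
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity,  # the "similarity" slider
        },
    }
    return urllib.request.Request(
        f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(voice_id, text, api_key, out_path="narration.mp3", **settings):
    """Send the request and save the returned audio bytes to out_path."""
    request = build_tts_request(voice_id, text, api_key, **settings)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return out_path
```

Calling synthesize("YOUR_VOICE_ID", "A sentence that was not in the samples.", "YOUR_API_KEY") uses the narration defaults. Pass stability=0.4, for instance, to hear a more expressive read and compare.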
Have you tried cloning your own voice? What were your results?
Curated by Selendia AI 🎙️