How to Get Voice Output With Python: An ElevenLabs API Tutorial

For my AI-generated Japan Daily News podcast I use ElevenLabs and have been amazed at the quality of the voice output. It may not be the cheapest text-to-speech API but in my testing it seems to be the best (at the time of writing).

You don’t need an API key to get started. Just start coding and experimenting; once you reach the free usage limit, the API will tell you that you need to sign up to continue. By then, you should be familiar with how it works.

Making Python talk

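The Python package provided by ElevenLabs is easy to use, so let’s start there. Install the elevenlabs package and then import it in a new Python file. The examples below use the module-level functions (generate(), play(), save()) as they worked at the time of writing; if a newer release of the package behaves differently, install the version that matches this tutorial.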

$ pip install elevenlabs

import elevenlabs

Speech creation is done in two parts:

1. Generate the audio
2. Play (or save) the audio

The elevenlabs module contains a generate() function which takes at least two self-explanatory arguments: text and voice.

audio = elevenlabs.generate(
    text = "Hi, I'm from the future!",
    voice = "Bella"
)

The voice argument can be a voice name, a voice ID, or a voice object. (See the reference list at the end of this post for the premade ElevenLabs voice IDs.)
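Because the voice argument accepts either a name or an ID, it can be handy to keep a small lookup table in your own code. Here is a minimal sketch of that idea. The PREMADE_VOICE_IDS dictionary and the resolve_voice_id() helper are hypothetical names, not part of the elevenlabs package, and the IDs are copied from the reference list at the end of this post.

```python
# Hypothetical helper, not part of the elevenlabs package.
# The IDs are copied from the reference list at the end of this post.
PREMADE_VOICE_IDS = {
    "Bella": "EXAVITQu4vr4xnSDxMaL",
    "James": "ZQe5CZNOzWyzPSCn5a3c",
    "Rachel": "21m00Tcm4TlvDq8ikWAM",
}


def resolve_voice_id(name_or_id: str) -> str:
    """Return the voice ID for a known name, or the input unchanged."""
    return PREMADE_VOICE_IDS.get(name_or_id, name_or_id)
```

Anything that is not a known name is passed through untouched, so the helper can be given either a name or an ID.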

To hear the result, pass the generated audio to the play() function:

elevenlabs.play(audio)

If you want to output this audio as an MP3 file, use the save() function:

elevenlabs.save(audio, "audio.mp3")

I love that it’s so logical!
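If you would rather handle the file yourself, for example to choose the output folder, something like the following may do. It is only a sketch: it assumes that generate() returns the finished MP3 data as a bytes object, which this post does not confirm, and write_audio() is a hypothetical helper name rather than anything provided by the elevenlabs package.

```python
from pathlib import Path


def write_audio(audio: bytes, path: str) -> Path:
    """Write raw audio bytes to `path`, creating parent folders if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(audio)
    return target
```

Calling write_audio(audio, "output/audio.mp3") would then create the output folder if it does not exist yet.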

Tweaking the voice settings

* Note: The VoiceSettings class doesn’t seem to work in some environments, such as PyCharm. If you run into that, skip the settings and pass just the voice name or voice ID.

Now it’s time to take it up a notch by controlling how expressive the voice is. Although there are two settings available – stability and similarity_boost – I’ve found that only stability really makes a difference to how the output sounds. In any case, you can’t set just one of them, so here’s how to set both.

First, we need to create a voice object, specifying the voice ID and the two setting values we want to use. The range for both settings is 0 to 1. A stability setting of 1 makes the output sound quite boring, whereas a setting of 0 makes the speaker sound excited and emotional. Usually somewhere near the default of 0.5 is best, but it’s fun to play with the extremes!

voice = elevenlabs.Voice(
    voice_id = "ZQe5CZNOzWyzPSCn5a3c",
    settings = elevenlabs.VoiceSettings(
        stability = 0.3, # Lower is more expressive.
        similarity_boost = 0.75
    )
)
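Since both settings must fall between 0 and 1, it can be worth checking values before building the voice object, especially if they come from user input. This is a minimal sketch: clamp_setting() is a hypothetical helper, not part of the elevenlabs package, and clamping the value (rather than raising an error) is just one possible choice.

```python
def clamp_setting(value: float) -> float:
    """Force a stability or similarity_boost value into the 0 to 1 range."""
    return max(0.0, min(1.0, float(value)))
```

With this in place, stability = clamp_setting(1.5) would pass 1.0 instead of an out-of-range value.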

Once you have your voice object, pass it to generate() as the voice argument, then play or save the resulting audio with the same syntax as before. Here’s the working code in full.

import elevenlabs

voice = elevenlabs.Voice(
    voice_id = "ZQe5CZNOzWyzPSCn5a3c",
    settings = elevenlabs.VoiceSettings(
        stability = 0, # 0 is the most expressive extreme.
        similarity_boost = 0.75
    )
)

audio = elevenlabs.generate(
    text = "Hi, I'm from the future!",
    voice = voice
)

elevenlabs.play(audio)
elevenlabs.save(audio, "audio.mp3")
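To hear how much stability changes the result, one option is to render the same sentence at several values and compare the files. The sketch below only prepares the file names and settings; stability_variants() is a hypothetical helper, and the commented-out loop reuses the calls from the full example above, left as comments so the sketch runs without contacting the API.

```python
def stability_variants(values, similarity_boost=0.75):
    """Yield (filename, settings) pairs, one per stability value."""
    for stability in values:
        filename = f"stability_{stability:.2f}.mp3"
        settings = {
            "stability": stability,
            "similarity_boost": similarity_boost,
        }
        yield filename, settings


# Possible usage with the calls from the full example:
# for filename, settings in stability_variants([0.0, 0.5, 1.0]):
#     voice = elevenlabs.Voice(
#         voice_id = "ZQe5CZNOzWyzPSCn5a3c",
#         settings = elevenlabs.VoiceSettings(**settings)
#     )
#     audio = elevenlabs.generate(text = "Hi, I'm from the future!", voice = voice)
#     elevenlabs.save(audio, filename)
```

Keep in mind that every generated file counts towards your usage, so a short test sentence is a sensible choice.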

Adding your API key

Finally, once you hit the trial limits you can add your ElevenLabs API key with the built-in function like this:

elevenlabs.set_api_key("my-api-key")

I recommend you use environment variables to hide your API key instead of putting it directly in your code.
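Here is a minimal sketch of the environment variable approach. The variable name ELEVENLABS_API_KEY and the load_api_key() helper are assumptions made for illustration, so use whatever name suits your setup. The call to set_api_key() is left as a comment so the sketch runs without the package installed.

```python
import os


def load_api_key(var_name: str = "ELEVENLABS_API_KEY") -> str:
    """Read the API key from an environment variable, failing loudly if unset."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Possible usage with the function shown above:
# elevenlabs.set_api_key(load_api_key())
```

Set the variable in your shell or your editor's run configuration before starting the script, and the key never has to appear in your source files.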

It’s really fun to play with this technology and easier than I expected, so give it a go and see the ElevenLabs Python project for more documentation and examples.

Reference: ElevenLabs Voice IDs & Samples

Adam: pNInz6obpgDQGcFmaJgB
Antoni: ErXwobaYiN019PkySvjV
Arnold: VR6AewLTigWG4xSOukaG
Bella: EXAVITQu4vr4xnSDxMaL
Callum: N2lVS1w4EtoT3dr4eOWO
Charlie: IKne3meq5aSn9XLyUdCD
Charlotte: XB0fDUnXU5powFXDhCwa
Clyde: 2EiwWnXFnvU5JabPnv8n
Daniel: onwK4e9ZLuTAKqWW03F9
Dave: CYw3kZ02Hs0563khs1Fj
Domi: AZnzlk1XvdvUeBnXmlld
Dorothy: ThT5KcBeYPX3keUQqHPh
Elli: MF3mGyEYCl7XYWbV9V6O
Emily: LcfcDJNUP1GQjkzn1xUU
Ethan: g5CIjZEefAph4nQFvHAz
Fin: D38z5RcWu1voky8WS1ja
Freya: jsCqWAovK2LkecY7zXl4
Gigi: jBpfuIE2acCO8z3wKNLl
Giovanni: zcAOhNBS3c14rBihAFp1
Glinda: z9fAnlkpzviPz146aGWa
Grace: oWAxZDx7w5VEj9dCyTzz
Harry: SOYHLrjzK2X1ezoPC6cr
James: ZQe5CZNOzWyzPSCn5a3c
Jeremy: bVMeCyTHy58xNoL34h3p
Jessie: t0jbNlBVZ17f02VDIeMI
Joseph: Zlb1dXrM653N07WRdFW3
Josh: TxGEqnHWrfWFTfGW9XjX
Liam: TX3LPaxmHKxFdv7VOQHJ
Matilda: XrExE9yKIg1WjnnlVkGX
Matthew: Yko7PKHZNXotIFUBG7I9
Michael: flq6f7yk4E4fJM5XTYuZ
Mimi: zrHiDhphv9ZnVXBqCLjz
Nicole: piTKgcLEGmPE4e6mEKli
Patrick: ODq5zmih8GrVes37Dizd
Rachel: 21m00Tcm4TlvDq8ikWAM
Ryan: wViXBPUzp2ZZixB1xQuM
Sam: yoZ06aMxZJJ28mfd3POQ
Serena: pMsXgVXv3BLzUgSXRplE
Thomas: GBv7mTt0atIp3Br8iCZE
