PuppyCoding

Friendly Python & AI tutorials for beginner & intermediate programmers.


How to Get Voice Output With Python: An ElevenLabs API Tutorial


I use ElevenLabs for my AI-generated Japan Daily News podcast, and I have been amazed at the quality of the voice output. It may not be the cheapest text-to-speech API, but in my testing it produced the best results (at the time of writing).

First, you don’t need an API key to get started. Just start coding and experimenting; eventually the API will tell you that you need to sign up to continue. By then, you should be familiar with how it works.

Making Python talk

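The Python package provided by ElevenLabs is easy to use, so let’s start there. Install the elevenlabs package and then import it in a new Python file. Note that the code here uses the package’s original module-level functions. Releases from 1.0 onward moved speech generation onto a client object, so if elevenlabs.generate() is missing, installing a version below 1.0 (pip install "elevenlabs<1.0") should let you follow along.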

$ pip install elevenlabs

import elevenlabs

Speech creation is done in two parts:

  1. Generate the audio
  2. Play (or save) the audio

The elevenlabs module contains a generate() function, which takes at least two self-explanatory arguments, text and voice:

audio = elevenlabs.generate(
    text = "Hi, I'm from the future!",
    voice = "Bella"
)

The voice argument can be a voice name, voice ID, or voice object. (Jump below to see premade ElevenLabs voice IDs and a video of voice samples.)

To hear the result, pass the generated audio to the play() function:

elevenlabs.play(audio)

If you want to output this audio as an MP3 file, use the save() function:

elevenlabs.save(audio, "audio.mp3")

I love that it’s so logical!
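The two steps above can also be folded into one small helper. This is only a sketch: speak_to_file and its tts parameter are names made up for illustration, not part of the ElevenLabs package, and it assumes generate() and save() behave as shown above. Passing a stand-in object as tts lets you try the flow without using any API quota.

```python
def speak_to_file(text, voice, path, tts=None):
    """Generate speech for `text` and save it to `path` (hypothetical helper)."""
    if tts is None:
        # Imported lazily so the helper can be tried with a stand-in object.
        import elevenlabs as tts

    # Step 1: generate the audio.
    audio = tts.generate(text=text, voice=voice)
    # Step 2: save the audio.
    tts.save(audio, path)
    return path
```

With the package installed, speak_to_file("Hi, I'm from the future!", "Bella", "audio.mp3") should produce the same file as the separate generate() and save() calls.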

Tweaking the voice settings

* Note: The VoiceSettings class doesn’t seem to work in some environments, such as PyCharm. If you run into this, pass just the voice name or voice ID instead.

Now it’s time to take it up a notch by controlling how expressive the voice is. Although there are two settings available – stability and similarity_boost – I’ve found that only stability makes a noticeable difference to how the output sounds. In any case, you can’t set just one of them, so here’s how to set both.

First, we need to create a voice object, specifying the voice ID and the two setting values we want to use. Both settings range from 0 to 1. A stability of 1 makes the output sound quite flat, whereas a stability of 0 makes the speaker sound excited and emotional. A value near the default of 0.5 is usually best, but it’s fun to play with the extremes!

voice = elevenlabs.Voice(
    voice_id = "ZQe5CZNOzWyzPSCn5a3c",
    settings = elevenlabs.VoiceSettings(
        stability = 0.3, # Lower is more expressive.
        similarity_boost = 0.75
    )
)
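If you plan to experiment with the extremes, it can help to check the 0-to-1 range before building the voice object. The helper below is a minimal sketch; checked_settings is a hypothetical name, and the defaults are assumptions: 0.5 is the stability default mentioned above, and 0.75 is simply the similarity_boost value used in this post.

```python
def checked_settings(stability=0.5, similarity_boost=0.75):
    """Return both settings as a dict after checking each lies between 0 and 1."""
    values = {"stability": stability, "similarity_boost": similarity_boost}
    for name, value in values.items():
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return values
```

The returned dict can then be unpacked into the settings class, for example elevenlabs.VoiceSettings(**checked_settings(stability=0.3)).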

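Once you have your voice object, pass it to generate() as the voice argument, then play or save the audio with the same syntax as before. Here’s the working code in full (this time with stability set to 0, the most expressive extreme).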

import elevenlabs

voice = elevenlabs.Voice(
    voice_id = "ZQe5CZNOzWyzPSCn5a3c",
    settings = elevenlabs.VoiceSettings(
        stability = 0,
        similarity_boost = 0.75
    )
)

audio = elevenlabs.generate(
    text = "Hi, I'm from the future!",
    voice = voice
)

elevenlabs.play(audio)
elevenlabs.save(audio, "audio.mp3")

Adding your API key

Finally, once you hit the trial limits, you can add your ElevenLabs API key with the built-in set_api_key() function like this:

elevenlabs.set_api_key("my-api-key")

I recommend you use environment variables to hide your API key instead of putting it directly in your code.
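Here is one way that could look, as a rough sketch. The variable name ELEVENLABS_API_KEY and the two helper functions are my own choices for illustration, so use whatever name you set in your shell or your editor’s run configuration.

```python
import os


def load_api_key(env_var="ELEVENLABS_API_KEY"):
    """Return the key stored in the environment variable, or None if unset or blank."""
    key = os.environ.get(env_var, "").strip()
    return key or None


def configure_elevenlabs():
    """Pass the key to elevenlabs if one is set; return True when a key was applied."""
    api_key = load_api_key()
    if api_key is None:
        return False
    # Imported here so the key lookup above works even without the package installed.
    import elevenlabs
    elevenlabs.set_api_key(api_key)
    return True
```

Call configure_elevenlabs() once before generate(). If it returns False, no key was found and you are still on the keyless trial.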

It’s really fun to play with this technology and easier than I expected, so give it a go and see the ElevenLabs Python project for more documentation and examples.


Reference: ElevenLabs Voice IDs & Samples

  • Adam: pNInz6obpgDQGcFmaJgB
  • Antoni: ErXwobaYiN019PkySvjV
  • Arnold: VR6AewLTigWG4xSOukaG
  • Bella: EXAVITQu4vr4xnSDxMaL
  • Callum: N2lVS1w4EtoT3dr4eOWO
  • Charlie: IKne3meq5aSn9XLyUdCD
  • Charlotte: XB0fDUnXU5powFXDhCwa
  • Clyde: 2EiwWnXFnvU5JabPnv8n
  • Daniel: onwK4e9ZLuTAKqWW03F9
  • Dave: CYw3kZ02Hs0563khs1Fj
  • Domi: AZnzlk1XvdvUeBnXmlld
  • Dorothy: ThT5KcBeYPX3keUQqHPh
  • Elli: MF3mGyEYCl7XYWbV9V6O
  • Emily: LcfcDJNUP1GQjkzn1xUU
  • Ethan: g5CIjZEefAph4nQFvHAz
  • Fin: D38z5RcWu1voky8WS1ja
  • Freya: jsCqWAovK2LkecY7zXl4
  • Gigi: jBpfuIE2acCO8z3wKNLl
  • Giovanni: zcAOhNBS3c14rBihAFp1
  • Glinda: z9fAnlkpzviPz146aGWa
  • Grace: oWAxZDx7w5VEj9dCyTzz
  • Harry: SOYHLrjzK2X1ezoPC6cr
  • James: ZQe5CZNOzWyzPSCn5a3c
  • Jeremy: bVMeCyTHy58xNoL34h3p
  • Jessie: t0jbNlBVZ17f02VDIeMI
  • Joseph: Zlb1dXrM653N07WRdFW3
  • Josh: TxGEqnHWrfWFTfGW9XjX
  • Liam: TX3LPaxmHKxFdv7VOQHJ
  • Matilda: XrExE9yKIg1WjnnlVkGX
  • Matthew: Yko7PKHZNXotIFUBG7I9
  • Michael: flq6f7yk4E4fJM5XTYuZ
  • Mimi: zrHiDhphv9ZnVXBqCLjz
  • Nicole: piTKgcLEGmPE4e6mEKli
  • Patrick: ODq5zmih8GrVes37Dizd
  • Rachel: 21m00Tcm4TlvDq8ikWAM
  • Ryan: wViXBPUzp2ZZixB1xQuM
  • Sam: yoZ06aMxZJJ28mfd3POQ
  • Serena: pMsXgVXv3BLzUgSXRplE
  • Thomas: GBv7mTt0atIp3Br8iCZE
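
If you prefer to work with voice IDs but remember voices by name, a small lookup table saves a lot of copy-pasting. The sketch below uses three IDs copied from the list above; PREMADE_VOICE_IDS and voice_id_for are hypothetical names, and you can extend the table with any other entries you need.

```python
# A few premade voices copied from the reference list; add more as needed.
PREMADE_VOICE_IDS = {
    "Bella": "EXAVITQu4vr4xnSDxMaL",
    "James": "ZQe5CZNOzWyzPSCn5a3c",
    "Rachel": "21m00Tcm4TlvDq8ikWAM",
}


def voice_id_for(name):
    """Look up a premade voice ID by name, ignoring case and surrounding spaces."""
    wanted = name.strip().lower()
    for voice_name, voice_id in PREMADE_VOICE_IDS.items():
        if voice_name.lower() == wanted:
            return voice_id
    raise KeyError(f"No premade voice named {name!r} in the table")
```

The result can be used anywhere a voice ID is expected, such as voice_id = voice_id_for("James") when building a voice object.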

