PuppyCoding

Friendly Python & AI tutorials for beginner & intermediate programmers.


How to Print ChatGPT API Response as a Stream

Two furry monsters trying to work a typewriter. This is supposed to represent streaming output.

When using the ChatGPT API, sometimes we want to print the output as we receive it, as a stream, rather than waiting a few seconds for the full response. Here’s how, in 5 steps:

  1. Tweak your ChatGPT request
  2. Print in chunks
  3. Remove extra line breaks
  4. Print as a real-time stream
  5. Remove weird ending characters

1. Tweak your ChatGPT request

This first step is super-easy. Simply add stream = True to your ChatGPT request, alongside the model and messages parameters, like this:

result = openai.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages = [
        {
            "role": "user",
            "content": prompt
        }
    ],
    stream = True # Add this property to stream output.
)

2. Print in chunks

The stream = True parameter means the response from ChatGPT arrives as a stream of chunks (an iterable) rather than a single result, so the first printing step is to print within a loop, like this:

# Print the ChatGPT response within a loop.
for chunk in result:
    print(chunk)

Printing a whole chunk will give you an object full of info (most of it we don’t need):

ChatCompletionChunk(
    id = 'chatcmpl-8pWEQvvDUnE6g2GBn2tAvo0vDvL3A',
    choices = [
        Choice(
            delta = ChoiceDelta(
                content = 'Because',
                function_call = None,
                role = 'assistant',
                tool_calls = None
            ),
            finish_reason = None,
            index = 0,
            logprobs = None
        )
    ],
    created = 1707289318,
    model = 'gpt-3.5-turbo-0613',
    object = 'chat.completion.chunk',
    system_fingerprint = None
)

To get what we want from this object, we need to drill down into choices -> delta -> content, so let’s print that:

for chunk in result:
    print(chunk.choices[0].delta.content or "")

Tip: See that or "" bit? That’s a clever Python trick that’s effectively a compact if statement to prevent errors when a chunk carries no text. It basically means “use content if it has a value, or an empty string if it’s None”. (Note that the chunk objects in the current openai library are typed objects, not dictionaries, so we access content as an attribute rather than with a dictionary-style get().) Perfect!
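Why does the fallback matter? The chunk that ends a stream carries no text at all (its content is None), so printing it directly would output the word “None”. Here’s a quick self-contained check of the idiom, with plain values standing in for the delta’s content:

```python
# A chunk that carries text, like most chunks in the stream.
content = "Because"
print(content or "")  # prints: Because

# The final chunk of a stream, whose content is None.
content = None
print(content or "")  # prints an empty string, not the word "None"
```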

3. Remove extra line breaks

But we’re not there yet. Each chunk (e.g. a word or punctuation mark) is now printed on a new line, which is the default for the print() function. To prevent Python from adding a new line each time print() is run, add end = "" as a parameter:

for chunk in result:
    print(
        chunk.choices[0].delta.content or "",
        end = "" # This will put all the streamed chunks on one line.
    )

4. Print as a real-time stream

Even with these tweaks, Python still buffers the chunks before printing them, whereas we want them printed as soon as they’re ready. To do this, we add a flush = True parameter to tell Python to flush the buffer with each chunk.
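The full example at the end shows flush = True in the real loop; here’s a self-contained sketch of the same idea that fakes the stream with a short list of chunks, so it runs without an API key:

```python
import time

# Fake chunks standing in for a streamed ChatGPT response.
chunks = ["Because", " seven", " ate", " nine", "!"]

for piece in chunks:
    # flush=True forces Python to push each piece to the terminal
    # immediately instead of waiting for its output buffer to fill.
    print(piece, end="", flush=True)
    time.sleep(0.1)  # small pause so the streaming effect is visible

print()  # finish with a newline
```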

5. Remove weird ending characters

Finally, depending on your terminal you might notice a weird % character at the end of each response. That’s the shell (zsh does this) flagging output that doesn’t end in a newline. We can get rid of it by printing a blank line at the end of our program, using print().

Putting it all together

Here’s a full working example. Note that it uses dotenv to load the OpenAI API key from a hidden “.env” file, so the key stays out of your code.

import os
from dotenv import load_dotenv
import openai

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

prompt = input("Please enter a question or request: ")

result = openai.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages = [
        {
            "role": "user",
            "content": prompt
        }
    ],
    stream = True # Add this optional property.
)


for chunk in result:
    print(
chunk.choices[0].delta.content or "",
        end = "",
        flush = True
    )

print()

For more information, the OpenAI cookbook has a page about how to stream ChatGPT completions.


