When using the ChatGPT API, sometimes we want to print the output as we receive it, as a stream, rather than waiting a few seconds for the full response. Here’s how, in 5 steps:
- Tweak your ChatGPT request
- Print in chunks
- Remove extra line breaks
- Print as a real-time stream
- Remove weird ending characters
1. Tweak your ChatGPT request
This first step is super-easy. Simply add stream = True to your ChatGPT request, alongside the model and messages parameters, like this:
result = openai.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages = [
        {
            "role": "user",
            "content": prompt
        }
    ],
    stream = True  # Add this property to stream output.
)
2. Print in chunks
The stream = True parameter means the response from ChatGPT arrives as a stream of chunks (an iterable) rather than a single result, so the first printing step is to print within a loop, like this:
# Print the ChatGPT response within a loop.
for chunk in result:
    print(chunk)
Printing a whole chunk will give you an object full of info, most of which we don’t need:
ChatCompletionChunk(
    id = 'chatcmpl-8pWEQvvDUnE6g2GBn2tAvo0vDvL3A',
    choices = [
        Choice(
            delta = ChoiceDelta(
                content = 'Because',
                function_call = None,
                role = 'assistant',
                tool_calls = None
            ),
            finish_reason = None,
            index = 0,
            logprobs = None
        )
    ],
    created = 1707289318,
    model = 'gpt-3.5-turbo-0613',
    object = 'chat.completion.chunk',
    system_fingerprint = None
)
To get what we want from this object, we need to drill down into choices -> delta -> content, so let’s print that:
for chunk in result:
    print(chunk.choices[0].delta.content or "")
Tip: See that or "" bit? That’s a compact way to guard against missing values: the content attribute is None on the final streamed chunk (the one that carries finish_reason), so or "" swaps in an empty string instead of printing the word “None”. Perfect!
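As a quick standalone illustration (plain Python, no API call needed), here’s how that guard behaves for a chunk that has text versus the final chunk, whose content is None:

```python
# Simulate the two cases we see while streaming: a chunk with text,
# and the final chunk whose content is None.
with_text = "Because"
final_chunk_content = None

print(with_text or "")            # prints: Because
print(final_chunk_content or "")  # prints an empty line, not "None"
```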
3. Remove extra line breaks
But we’re not there yet. Each chunk (e.g. a word or punctuation mark) is now printed on a new line, which is the default for the print() function. To prevent Python from adding a new line each time print() is run, add end = "" as a parameter:
for chunk in result:
    print(
        chunk.choices[0].delta.content or "",
        end = ""  # This will put all the streamed chunks on one line.
    )
4. Print as a real-time stream
Even with these tweaks, Python may still buffer the chunks before printing them, whereas we want them printed as soon as they’re ready. To fix this, we add a flush = True parameter to the print() call, telling Python to flush the output buffer with each chunk.
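To see the effect without calling the API, here’s a self-contained sketch where a hard-coded list of words stands in for the streamed chunks (the list and the delay are just for illustration):

```python
import time

# Stand-in for the streamed ChatGPT chunks.
fake_chunks = ["Why", " did", " the", " chicken", " cross", " the", " road", "?"]

for piece in fake_chunks:
    # flush = True pushes each piece to the terminal immediately,
    # instead of waiting for Python's output buffer to fill up.
    print(piece, end = "", flush = True)
    time.sleep(0.1)  # Simulate the gap between streamed chunks.
print()
```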
5. Remove weird ending characters
Finally, depending on your shell you might notice a weird % character at the end of the response. That’s the shell flagging output that doesn’t end with a newline (every chunk was printed with end = ""). We can get rid of it by printing a blank line at the end of our program, using a bare print().
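Here’s a tiny pure-Python demonstration of why that final print() helps: the chunk-printing loop never emits a trailing newline, and the closing print() supplies it so the shell prompt starts on a clean line:

```python
chunks = ["Hello", ",", " world"]

for piece in chunks:
    print(piece, end = "")  # no newline after any chunk
print()  # one final newline, so no stray % in zsh and friends
```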
Putting it all together
Here’s a full working example. Note that it uses python-dotenv (loading a hidden “.env” file) to keep the OpenAI API key out of the code.
import os
from dotenv import load_dotenv
import openai

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

prompt = input("Please enter a question or request: ")

result = openai.chat.completions.create(
    model = "gpt-3.5-turbo",
    messages = [
        {
            "role": "user",
            "content": prompt
        }
    ],
    stream = True  # Add this optional property.
)

for chunk in result:
    print(
        chunk.choices[0].delta.content or "",
        end = "",
        flush = True
    )
print()
For more information, the OpenAI cookbook has a page about how to stream ChatGPT completions.