Exploring Image Processing with ChatGPT
From Vision to Creation: Using ChatGPT to Describe and Generate Images
FROM THE CTO'S DESK
Milena Georgieva
3/6/2025
4 min read
For some time now, I have been working on a project that requires extensive knowledge and skills. It’s a challenging project, and since challenges are fun, here is something I have been experimenting with for a while.
If you are one of those people who enjoy exploring new technologies, I am sure you have gained a lot of experience with large language models (LLMs) and have asked them a variety of questions. You have probably also noticed that these models are not always reliable and have heard of their hallucinations—cases where the AI confidently generates incorrect or misleading information.
If you started using ChatGPT right after the announcement of its first publicly available version, you surely remember how limited it was. For example, while you could paste an Excel sheet, processing it was a challenging task. Despite your prompts, the model struggled to determine what needed to be summed up or which columns should be used. Images? Well, that was an entirely different story. It couldn’t "understand" images at all.
However, this has now changed. You can simply paste an image in the prompt and ask the model to describe it for you.
For my project, I came up with the idea of passing images to the model and asking it to generate a description of the data. Of course, as a developer, I don’t read documentation. Instead, I simply come up with an idea and search for someone who has already done it. Yes, that’s how the world works! 😊
To my surprise, it was not only possible to send images to ChatGPT but also to generate images. And so, here we go again!
A Quick Clarification Before We Dive In
We use gpt-4-turbo here because it accepts image input. It is not the only option: newer models such as gpt-4o also process images, so check the current model list before you choose one!
Sending an Image to ChatGPT
Code Example
import openai

# Note: this example uses the legacy interface of the openai package (versions before 1.0)

# Set your OpenAI API key
openai.api_key = 'sk-ADD_YOUR_KEY_HERE'

# Use a publicly accessible image URL (not a local file)
image_url = "https://example.com/image.jpg"  # Placeholder, replace with your own image

try:
    # Use OpenAI's Vision API (GPT-4 Turbo with Vision)
    response = openai.ChatCompletion.create(
        model="gpt-4-turbo",  # Ensure this is the correct model name
        messages=[
            {"role": "system", "content": "You are an AI assistant that can analyze images."},
            {
                "role": "user",
                "content": [
                    # The key for a text part is "text", not "content"
                    {"type": "text", "text": "What do you see in this image?"},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
    )
    # Print AI's response
    print(response["choices"][0]["message"]["content"])
except openai.error.InvalidRequestError as e:
    # Raised when the request itself is malformed (wrong model name, bad payload, ...)
    print(f"InvalidRequestError: {e}")
except Exception as e:
    print(f"An error occurred: {e}")
What Does the Above Code Do?
To make any API call using ChatGPT, we first need to import the openai library. As you might guess, this library provides the functionality to build applications using OpenAI’s API.
If you’ve ever created connections before the API era, you know that in the past, you had to manually provide a username, password, communication port, IP address, and more. Now, all of that complexity is hidden within the library.
You simply create your request. In this case, we provide an API key, which authenticates us to OpenAI and ties the usage to our account, and then we construct the API call.
We use messages to define the role (if you’ve followed our previous publications, you’re already aware of the available roles) that ChatGPT will play. We also provide content, which can be either text or an image that will be sent to ChatGPT for processing.
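The message structure described above can be wrapped in a small helper. This is only a minimal sketch: build_vision_messages is a hypothetical name of my own, not part of the openai library, and it only builds the payload without calling the API.

```python
def build_vision_messages(
    question,
    image_url,
    system_prompt="You are an AI assistant that can analyze images.",
):
    """Build the messages list for a chat request that includes one image."""
    return [
        # The system message sets the role the model should play
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            # The user content is a list of parts: one text part, one image part
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ]


messages = build_vision_messages(
    "What do you see in this image?",
    "https://example.com/image.jpg",  # placeholder URL
)
print(messages[1]["content"][0]["text"])
```

The resulting list can be passed as the messages argument of the API call.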
JSON Structure for Sending an Image
Code:
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What do you see in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
  ]
}
The code above allows you to send an image located at a specific URL. Right before the image, we define a question for ChatGPT.
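If your image is a local file rather than a public URL, one option is to embed it as a base64 data URL. The sketch below assumes the API accepts a data URL in place of a regular one, which is my understanding of the vision documentation, so verify it before relying on it. The helper name to_data_url is hypothetical.

```python
import base64
import mimetypes


def to_data_url(path):
    """Encode a local image file as a base64 data URL."""
    # Guess the MIME type from the file extension, e.g. "image/png"
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image file: {path}")
    with open(path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string would then take the place of the URL inside the image_url part of the message.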
And that’s the goal of our test today! We want to send results from a machine learning algorithm, submit our prompt request, and receive a response that we can further process in our application.
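To process the response further in an application, it helps to pull the text out defensively instead of indexing straight into the result. Here is a minimal sketch, assuming the response behaves like a dictionary with the same shape used in the code example. extract_description is a hypothetical helper, and sample_response is hand-written for illustration, not real API output.

```python
def extract_description(response):
    """Return the assistant's text from a chat completion response, or None."""
    choices = response.get("choices") or []
    if not choices:
        return None
    content = choices[0].get("message", {}).get("content")
    # Only return plain text; anything else is treated as "no description"
    return content.strip() if isinstance(content, str) else None


# Hand-written stand-in for an API response (illustration only)
sample_response = {
    "choices": [
        {"message": {"role": "assistant", "content": " A forest floor covered in white flowers. "}}
    ]
}
print(extract_description(sample_response))
```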
Example Response from ChatGPT
"This image shows a lush forest floor densely carpeted with white flowers, likely snowdrops (Galanthus species), which bloom early in spring. The scene is set in a wooded area with various trees, some of which have ivy climbing on them. The varied density of the flowers creates a textured appearance across the landscape. The sunlight filtering through the canopy highlights portions of the carpet of flowers, enhancing the natural beauty of the scene."
The Cost Factor: Is It Worth It?
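💰 One request like the one above to gpt-4-turbo costs roughly $0.01 USD. The exact amount depends on the image size and on the number of tokens in the prompt and the response.
Yes, that’s quite expensive once you process images in bulk, so compare the pricing of the available vision-capable models before committing to one.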
🛑 Use it wisely, and ensure that your customers are aware of the pricing, especially if they plan to process a large volume of images.
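To see how quickly the cost adds up, here is a back-of-the-envelope sketch. It assumes one request per image and the flat $0.01 per request mentioned above. Real billing is usage-based, so treat the numbers as rough estimates. estimate_cost is a hypothetical helper.

```python
from decimal import Decimal


def estimate_cost(image_count, cost_per_request=Decimal("0.01")):
    """Estimate the total cost in USD, assuming one request per image."""
    if image_count < 0:
        raise ValueError("image_count must not be negative")
    # Decimal avoids floating-point rounding surprises with money
    return cost_per_request * image_count


for count in (100, 10_000, 1_000_000):
    print(f"{count:>9,} images: ${estimate_cost(count):,.2f}")
```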
Generating Images with ChatGPT
Now, let’s reverse the example. Instead of sending an image, we will generate one using the ChatGPT API and save it locally.
Code Example
import openai
import requests
import os

# Note: this example uses the legacy interface of the openai package (versions before 1.0)

# Set your OpenAI API key
openai.api_key = 'ADD_YOUR_API_KEY_HERE'

# Send request to OpenAI's API to generate an image
response = openai.Image.create(
    model="dall-e-3",
    prompt="A futuristic cyberpunk city at night with neon lights",
    n=1,  # Number of images to generate (dall-e-3 supports only n=1)
    size="1024x1024"  # For dall-e-3, choose from: "1024x1024", "1792x1024", "1024x1792"
)

# Get the image URL from the API response
image_url = response["data"][0]["url"]
print("Generated Image URL:", image_url)

# Download the image
image_data = requests.get(image_url).content

# Define the Downloads folder path
downloads_path = os.path.expanduser("~/Downloads/generated_image.png")

# Save the image to the Downloads folder
with open(downloads_path, "wb") as file:
    file.write(image_data)
print(f"Image saved at: {downloads_path}")
Final Thoughts
If you’re wondering why we need the os library: os.path.expanduser turns ~/Downloads into the full path of your Downloads folder, so that the generated image can be written to the hard drive.
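The saving step can also be written as a reusable function. This is a sketch using pathlib from the standard library instead of os, not the approach used in the example. save_image is a hypothetical helper, and it creates the target folder if it does not exist yet.

```python
from pathlib import Path


def save_image(image_data, directory, filename="generated_image.png"):
    """Write image bytes to directory/filename and return the full path."""
    # expanduser resolves a leading "~" to the user's home directory
    target_dir = Path(directory).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    target_path = target_dir / filename
    target_path.write_bytes(image_data)
    return target_path
```

With the downloaded bytes in image_data, calling save_image(image_data, "~/Downloads") would write the same file as the example.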
💡 Hint: The generated image was used as the header for this publication.
Things to Keep in Mind
✔ Generating images is expensive.
✔ You need to develop a cost-effective pricing model if you plan to offer this feature to customers.
✔ You can specify the number (n=1) and size (1024x1024) of the generated images. Note that dall-e-3 generates one image per request, so n must stay at 1 for that model.
✔ You retrieve the image using:
image_url = response["data"][0]["url"]
✔ You download and save the image locally using:
image_data = requests.get(image_url).content
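Since every failed request still costs time, it can help to check n and size before calling the API. The sketch below hard-codes the limits as I understand them from the image generation documentation, so treat the table as an assumption and verify it against the current docs. validate_image_request and MODEL_LIMITS are hypothetical names.

```python
# Assumed limits per model; verify against the current documentation
MODEL_LIMITS = {
    "dall-e-2": {"sizes": {"256x256", "512x512", "1024x1024"}, "max_n": 10},
    "dall-e-3": {"sizes": {"1024x1024", "1792x1024", "1024x1792"}, "max_n": 1},
}


def validate_image_request(model, n, size):
    """Raise ValueError if n or size is outside the assumed limits for model."""
    limits = MODEL_LIMITS.get(model)
    if limits is None:
        raise ValueError(f"Unknown model: {model}")
    if not 1 <= n <= limits["max_n"]:
        raise ValueError(f"{model} accepts n between 1 and {limits['max_n']}, got {n}")
    if size not in limits["sizes"]:
        raise ValueError(f"{model} does not support size {size}")


# The settings used in the example pass the check without raising
validate_image_request("dall-e-3", n=1, size="1024x1024")
```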
Is It Worth It?
✅ Yes, it’s a powerful feature!
❌ But it's quite expensive.
Use it wisely, and always inform your users about the cost before implementing it in production.
🚀 Happy experimenting! 🚀