This tutorial shows how to use the Google Gemini API in Python to analyze and describe images with AI. You'll learn how to load images from the web or local files, send them to the Gemini model using Google Colab, and get smart, readable descriptions as output — all with just a few lines of code.
To begin working with image-based prompts using the Gemini API, the first step is to retrieve an image from an online source. Using Python’s requests library, we can fetch image data directly from a URL and store it in memory. This data will later be sent to the Gemini API for analysis or interaction.
import requests
# URL of the image
image_url = "https://www.go2india.in/upimg/9565.jpg"
# Download image content
response = requests.get(image_url)
image_data = response.content
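If the download fails, `image_data` will not contain valid image bytes. As an optional safeguard (not part of the original steps), you can ask `requests` to raise an error for failed downloads before going any further:
# Optional: fail fast on a bad download (sketch)
response = requests.get(image_url, timeout=30)
response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
image_data = response.content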
Once the image is downloaded as binary content, we can use the PIL (Python Imaging Library) module to convert the byte stream into an image object. This allows further manipulation or display of the image in Python. The final print statement is used to preview a small portion of the binary data as a quick test.
from PIL import Image
from io import BytesIO
image = Image.open(BytesIO(image_data))
# for testing check the binary data
print(image_data[:20])
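As an extra optional check (an assumption on my part, not a step from the original tutorial), you can inspect the decoded PIL object to confirm the bytes really were an image:
# Quick sanity check on the decoded image (sketch)
print(image.format, image.size, image.mode)  # e.g. JPEG (1200, 800) RGB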
With the image loaded and prepared, the next step is to send it to the Gemini API for analysis. This example uses google.generativeai to configure the API, authenticate using a secure key from the Colab environment, and pass the image along with a prompt asking the model to describe it. The try-except block ensures that errors are handled gracefully, particularly if the API key is missing or the image object is not available.
import google.generativeai as genai
from google.colab import userdata
try:
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
    genai.configure(api_key=GOOGLE_API_KEY)

    # Initialize the model that supports generateContent
    model = genai.GenerativeModel('gemini-2.5-flash')

    prompt = ["Describe the image ", image]
    response = model.generate_content(prompt)
    print(response.text)
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please check your API key and ensure the 'image' variable is defined.")
Output:
This vibrant and bustling image captures a grand religious festival, most likely the Rath Yatra (Chariot Festival) in Puri, India, given the distinctive architecture and the large, decorated chariots.
In the foreground and midground, two towering, elaborately decorated chariots dominate the scene. The central chariot, slightly larger and more prominently featured, is predominantly red with bright yellow vertical stripes and intricate patterns, possibly depicting symbols and deities. It has a multi-tiered, conical canopy-like roof topped with a golden finial. The base of this chariot is adorned with colorful fabrics, garlands, and sculptural elements, and is surrounded by a dense crowd of people. Another similar, though partially obscured, chariot stands to its left, also red and yellow with ornate decorations.
The entire lower half of the image is filled with an immense congregation of people, packed tightly around the chariots and extending into the foreground. Many are dressed in traditional Indian attire, with a mix of colorful and light-colored garments. Some individuals are seen climbing wooden ramps leading up to the chariots, while others are on the chariots themselves. Security personnel in khaki uniforms are visible throughout the crowd, attempting to manage the large gathering.
In the background, the distinctive golden shikhara (spire) of a large temple, characteristic of Kalinga architecture and likely the Jagannath Temple, rises prominently. Its stepped layers are visible, and numerous spectators are perched on its lower roofs and outer walls, observing the festivities from above. To the far left, another smaller, cream-colored temple dome is visible. Various other buildings, some with traditional pitched roofs and others with flat roofs, are scattered throughout the background. One building wall features a Swastika symbol, and another has banners with text in Odia script, one displaying "4G" and a picture of what appears to be PM Modi, providing a contemporary context to the ancient ritual. Green trees are visible on the horizon to the right.
The sky is overcast, suggesting either an early morning or a cloudy day. The overall impression is one of intense spiritual energy, devotion, and a massive cultural celebration.
Before sending the image to the Gemini API, we resize it to a thumbnail of 512x512 pixels to ensure efficient handling. We then display the image using Colab’s IPython.display. The API response is formatted using Markdown for cleaner output, making the result easier to read directly in a notebook environment.
import google.generativeai as genai
from google.colab import userdata
from IPython.display import display, Markdown
# Resize and display image
image.thumbnail([512, 512])
display(image)
try:
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
    genai.configure(api_key=GOOGLE_API_KEY)

    # Initialize the model
    model = genai.GenerativeModel('gemini-2.5-flash')

    prompt = ["Describe the image ", image]
    response = model.generate_content(prompt)
    display(Markdown(response.text))
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please check your API key and ensure the 'image' variable is defined.")
In this step, instead of downloading the image from a URL, we use a locally available image file. Here, a handwritten note image named hand-written-text.jpg is loaded, resized, and displayed. The Gemini API then analyzes the content of the image and returns a descriptive output. This is especially useful for tasks like reading handwritten content or analyzing documents visually.
import google.generativeai as genai
from google.colab import userdata
from IPython.display import display, Markdown
from PIL import Image

# Load and resize a local handwritten image
image = Image.open('hand-written-text.jpg', mode="r")
image.thumbnail([512, 512])
display(image)

try:
    GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
    genai.configure(api_key=GOOGLE_API_KEY)

    model = genai.GenerativeModel('gemini-2.5-flash')
    prompt = ["Describe the image ", image]
    response = model.generate_content(prompt)
    display(Markdown(response.text))
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please check your API key and ensure the 'image' variable is defined.")
To run the same workflow outside Colab, store the key in a .env file in your project directory:
GOOGLE_API_KEY=your_actual_api_key_here
Here is the code to load the API key from the .env file and configure the model with it.
import os
from PIL import Image
import google.generativeai as genai
from dotenv import load_dotenv
# Load API key from .env file
load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
try:
    # Configure Gemini API
    genai.configure(api_key=GOOGLE_API_KEY)

    # Load and resize the image
    image = Image.open("your-image.jpg")  # Replace with your image filename and path
    image.thumbnail([512, 512])
    image.show()

    # Initialize Gemini model
    model = genai.GenerativeModel('gemini-2.5-flash')

    # Send prompt with image
    prompt = ["Describe the image", image]
    response = model.generate_content(prompt)

    # Print the AI response
    print(response.text)
except Exception as e:
    print(f"An error occurred: {e}")
    print("Please check your API key and ensure the image file is valid.")
This script uses Google Gemini AI and the ReportLab library to create a beautifully formatted PDF from a list of image URLs. Each page includes a resized image and a short AI-generated description. Ideal for creating coffee table books, travel journals, or AI-assisted photo essays, this tool blends automation and creativity seamlessly.
Using reportlab.pdfgen.canvas, the script dynamically creates one page per image, so the image and its description always stay on the same page. Pillow (PIL) is used to fetch, resize, and convert each image before embedding. The key steps look like this:
# Gemini AI prompt with image input
prompt = ["Describe the image in 200 words:", gemini_image]
result = model.generate_content(prompt)

# Prepare and wrap description text
full_text = result.text.strip().replace('\n', ' ')
wrapped_lines = wrap(full_text, width=90)[:3]

# Draw image and description on the canvas
c.drawImage(img_reader, image_x, image_y, width=img.width, height=img.height)
c.setFont("Helvetica", 12)
for line in wrapped_lines:
    c.drawString(50, text_y, line)
    text_y -= 18
The full code is given below.
# Required Libraries
import os
import requests
import io
from PIL import Image
from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas
from reportlab.lib.utils import ImageReader
from textwrap import wrap
import google.generativeai as genai
from dotenv import load_dotenv
# Load Gemini API Key
load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
genai.configure(api_key=GOOGLE_API_KEY)
# Initialize Gemini Model
model = genai.GenerativeModel("gemini-1.5-flash")
# List of Image URLs
image_urls = [
"https://www.go2india.in/upimg/9565.jpg",
"https://www.go2india.in/upimg/9561.jpg",
"https://www.go2india.in/upimg/9559.jpg"
]
# Output PDF Path
pdf_path = "E:\\testing3\\gemini\\travel_book1.pdf"
c = canvas.Canvas(pdf_path, pagesize=A4)
page_width, page_height = A4
for idx, url in enumerate(image_urls, start=1):
    try:
        print(f"Processing {url}")

        # Download and Convert Image
        response = requests.get(url)
        img = Image.open(io.BytesIO(response.content)).convert("RGB")

        # Resize Image to Fit PDF
        max_img_width = page_width - 100
        img.thumbnail((max_img_width, 400))
        img_reader = ImageReader(img)

        # Prepare Image for Gemini API
        img_bytes = io.BytesIO()
        img.save(img_bytes, format="JPEG")
        gemini_image = {
            "mime_type": "image/jpeg",
            "data": img_bytes.getvalue()
        }

        # Get AI-generated Description
        prompt = ["Describe the image in 100 words:", gemini_image]
        result = model.generate_content(prompt)

        # Wrap Text to Fit Page Width
        full_text = result.text.strip().replace('\n', ' ')
        wrapped_lines = wrap(full_text, width=90)

        # Draw Image on Page
        image_x = 50
        image_y = page_height - img.height - 100
        c.drawImage(img_reader, image_x, image_y, width=img.width, height=img.height)

        # Draw Description Text Below Image
        text_y = image_y - 30
        c.setFont("Helvetica", 12)
        for line in wrapped_lines:
            c.drawString(50, text_y, line)
            text_y -= 18

        c.showPage()
    except Exception as e:
        print(f"Error processing {url}: {e}")
# Finalize and Save the PDF
c.save()
print(f"\n✅ PDF saved at: {pdf_path}")
Frequently asked questions

How do you pass an image to the Gemini model?
You can use a vision-capable Gemini model via the `generate_content()` method by passing a prompt and an image (as a PIL object). This can be done in Google Colab or any local Python environment.

Which image formats are supported?
Common formats like JPG and PNG are supported as long as they are loaded as PIL Image objects.

Can I send a multiline prompt?
You can use `"\n"` or triple quotes (`"""`) to send a multiline prompt, or pass it as part of a list along with the image object.

What does the temperature setting do?
The temperature controls the randomness of the model's output. Higher values (e.g., 1.0) make the responses more creative; lower values (e.g., 0.2) make them more focused and deterministic.

Can I limit the length of the response?
Yes. By setting `max_output_tokens` in the `generation_config` passed to `generate_content()`, you can restrict the length of the response generated by Gemini. A short sketch covering both of these settings follows the last question below.

Does the Gemini API require an internet connection?
Yes, Gemini API calls are made over the internet and require a valid API key and an active internet connection.

Can I run the code outside Google Colab?
Yes, you can run the same code locally by securely loading the API key from a `.env` file and installing the required packages using pip.
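Since the temperature and maximum-output-token settings are not demonstrated elsewhere in this tutorial, here is a minimal sketch of how they can be passed to `generate_content()` through `generation_config`, together with a multiline prompt. The file name, parameter values, and the pip package list in the comments are illustrative assumptions, not part of the original code.
# Packages assumed for a local run (standard PyPI names):
#   pip install google-generativeai pillow python-dotenv requests reportlab
import os
import google.generativeai as genai
from PIL import Image
from dotenv import load_dotenv

load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-2.5-flash")

image = Image.open("your-image.jpg")  # any JPG/PNG loaded as a PIL Image (hypothetical file name)

# A multiline prompt passed as part of a list along with the image
prompt = [
    """Describe the image.
Then list three notable details on separate lines.""",
    image,
]

# generation_config controls sampling behaviour and response length
response = model.generate_content(
    prompt,
    generation_config={
        "temperature": 0.2,        # lower = more focused, deterministic output
        "max_output_tokens": 256,  # caps the length of the generated response
    },
)
print(response.text)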
Using the Gemini API with image inputs opens up powerful possibilities for AI-assisted visual understanding. Whether you're analyzing handwritten notes, product photos, or diagrams, this workflow in Google Colab is efficient and easy to extend. As Gemini continues to evolve, you’ll be able to build even smarter applications by combining text, images, and other media inputs. Stay tuned for more examples and advanced integrations.