Multimodal Image Analysis

Gemini's native multimodal processing is well suited to complex image analysis. This prompt pairs it with a structured extraction format and "[uncertain]" tagging so the output is reliable and easy to parse.

gemini-prompts, image_type, output_format
Prompt
Analyze the attached {{image_type}} and provide a comprehensive breakdown:

1. **Visual inventory**: List every distinct element you can identify (objects, text, colors, layout)
2. **Text extraction**: Transcribe ALL text visible in the image exactly as written
3. **Spatial relationships**: Describe how elements are positioned relative to each other
4. **Context clues**: What can you infer about when, where, and why this was created?
5. **Data extraction**: If this contains charts, tables, or diagrams, extract the data into a structured {{output_format}} format

For any element you're uncertain about, say "[uncertain]" rather than guessing.

Finally, suggest 3 follow-up questions I could ask about this image to get deeper insights.

Variables to customize

{{image_type}}, {{output_format}}
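
A minimal sketch of filling the two variables before sending the prompt. The `fill_prompt` helper, the shortened `TEMPLATE` excerpt, and the example values are assumptions made for illustration; they are not part of the original page or of any Gemini SDK.

```python
import re

# Excerpt of the prompt above; paste the full text in practice.
TEMPLATE = (
    "Analyze the attached {{image_type}} and provide a comprehensive breakdown:\n"
    "5. **Data extraction**: If this contains charts, tables, or diagrams, "
    "extract the data into a structured {{output_format}} format\n"
)

# Matches placeholders written as {{name}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_prompt(template: str, **values: str) -> str:
    """Replace each {{name}} placeholder; raise if a value is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# Example values are hypothetical.
prompt = fill_prompt(
    TEMPLATE,
    image_type="sales dashboard screenshot",
    output_format="JSON",
)
print(prompt)
```

Raising on a missing value, rather than leaving the placeholder in place, keeps a half-filled prompt such as "the attached {{image_type}}" from reaching the model unnoticed.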

Why this prompt works

The numbered sections give the response a predictable structure that is easy to parse, and the "[uncertain]" tag makes the model flag doubtful elements instead of guessing. Both build on Gemini's native multimodal processing of images.
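
As a rough illustration of how the "[uncertain]" tag can be used downstream, the sketch below scans a response for tagged lines so they can be reviewed by hand. The `flag_uncertain_lines` helper and the sample response are invented for this example; real model output will differ.

```python
from __future__ import annotations

UNCERTAIN_TAG = "[uncertain]"


def flag_uncertain_lines(response: str) -> list[tuple[int, str]]:
    """Return (line_number, text) for each response line containing the tag."""
    return [
        (number, line.strip())
        for number, line in enumerate(response.splitlines(), start=1)
        if UNCERTAIN_TAG in line
    ]


# Hypothetical model response, made up for illustration only.
sample_response = """\
1. Visual inventory: bar chart, legend, title
2. Text extraction: "Q3 Revenue", axis label [uncertain]
5. Data extraction: {"Q1": 120, "Q2": [uncertain]}
"""

flagged = flag_uncertain_lines(sample_response)
for number, text in flagged:
    print(f"line {number}: {text}")
```

If the requested output format is JSON, a bare "[uncertain]" inside the data will not parse as valid JSON, so it is worth running a scan like this before handing the extracted data to a parser.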
