Multimodal Image Analysis
A structured prompt for detailed image analysis that relies on Gemini's native multimodal processing. The structured extraction format and uncertainty tagging are intended to produce reliable, parseable output.
Analyze the attached {{image_type}} and provide a comprehensive breakdown:

1. **Visual inventory**: List every distinct element you can identify (objects, text, colors, layout)
2. **Text extraction**: Transcribe ALL text visible in the image exactly as written
3. **Spatial relationships**: Describe how elements are positioned relative to each other
4. **Context clues**: What can you infer about when, where, and why this was created?
5. **Data extraction**: If this contains charts, tables, or diagrams, extract the data into a structured {{output_format}} format

For any element you're uncertain about, say "[uncertain]" rather than guessing.

Finally, suggest 3 follow-up questions I could ask about this image to get deeper insights.
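Variables to customize
{{image_type}}: the kind of image being analyzed
{{output_format}}: the structured format to use for extracted chart, table, or diagram data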
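If you fill the placeholders programmatically rather than by hand, a minimal sketch might look like the following. The helper name `fill_prompt`, the shortened template, and the example values are assumptions made for illustration; they are not part of the original prompt.

```python
import re

# Shortened excerpt of the prompt, kept brief for the example (assumption:
# in practice you would paste the full prompt text here).
PROMPT_TEMPLATE = (
    "Analyze the attached {{image_type}} and provide a comprehensive breakdown. "
    "If this contains charts, tables, or diagrams, extract the data into a "
    "structured {{output_format}} format."
)


def fill_prompt(template: str, values: dict) -> str:
    """Replace every {{name}} placeholder; fail loudly if a value is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


# Hypothetical example values.
prompt = fill_prompt(
    PROMPT_TEMPLATE,
    {"image_type": "sales dashboard screenshot", "output_format": "JSON"},
)
print(prompt)
```

Raising on a missing value, instead of leaving the placeholder in place, avoids sending a prompt that still contains a literal `{{output_format}}`.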
Why this prompt works
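The five numbered sections give the model a fixed structure to fill in, which makes the response easier to scan and parse. The "[uncertain]" tag gives the model an explicit alternative to guessing, so doubtful readings are marked instead of being stated as fact. Gemini's native multimodal processing handles the image analysis itself.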
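Because the uncertainty marker is a fixed string, a response can be screened for it before the extracted data is used. The sketch below assumes the model reproduces the "[uncertain]" tag as the prompt asks; the function name `split_by_uncertainty` and the sample response are hypothetical.

```python
UNCERTAIN_TAG = "[uncertain]"


def split_by_uncertainty(response: str):
    """Return (confident_lines, uncertain_lines) from a model response.

    A line counts as uncertain if it contains the tag, matched
    case-insensitively. Blank lines are skipped.
    """
    confident, uncertain = [], []
    for line in response.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if UNCERTAIN_TAG in stripped.lower():
            uncertain.append(stripped)
        else:
            confident.append(stripped)
    return confident, uncertain


# Hypothetical response text, invented for this example.
sample = """1. Visual inventory: a bar chart with four bars
2. Text extraction: "Q3 Revenue"
   Footer text: [uncertain]

4. Context clues: [Uncertain] possibly an internal report
"""

confident, uncertain = split_by_uncertainty(sample)
print(f"{len(uncertain)} line(s) need manual review")
for line in uncertain:
    print(" -", line)
```

Lines in the second list are candidates for a follow-up question or a manual check against the image.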