Multimodal Image Analysis
Analyze the attached {{image_type}} and provide a comprehensive breakdown:

1. **Visual inventory**: List every distinct element you can identify (objects, text, colors, layout)
2. **Text extraction**: Transcribe ALL text visible in the image exactly as written
3. **Spatial relationships**: Describe how elements are positioned relative to each other
4. **Context clues**: What can you infer about when, where, and why this was created?
5. **Data extraction**: If this contains charts, tables, or diagrams, extract the data into a structured {{output_format}} format

For any element you're uncertain about, say "[uncertain]" rather than guessing.

Finally, suggest 3 follow-up questions I could ask about this image to get deeper insights.
Variables to customize

- {{image_type}}: the kind of image attached (e.g. screenshot, chart, photo, diagram)
- {{output_format}}: the structured format for extracted data (e.g. JSON, CSV, Markdown table)
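The {{variable}} placeholders can be filled programmatically before the prompt is sent to the model. A minimal sketch, assuming a simple string-substitution approach (the `fill_template` helper is illustrative, not part of any library):

```python
def fill_template(template: str, **variables: str) -> str:
    """Substitute each {{name}} placeholder with its supplied value."""
    for name, value in variables.items():
        template = template.replace("{{" + name + "}}", value)
    return template

prompt = fill_template(
    "Analyze the attached {{image_type}} and extract data as {{output_format}}.",
    image_type="dashboard screenshot",
    output_format="JSON",
)
print(prompt)
# → Analyze the attached dashboard screenshot and extract data as JSON.
```

Plain `str.replace` keeps the sketch dependency-free; a real pipeline might prefer a templating library that errors on unfilled placeholders.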
Why this prompt works
Gemini's native multimodal processing handles complex image analysis better than most models. The structured extraction format and uncertainty tagging produce reliable, parseable output.
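Because the prompt enforces an explicit "[uncertain]" tag instead of silent guessing, downstream code can surface low-confidence elements for human review. A minimal sketch, assuming the model's response is plain text with one element per line (the `flag_uncertain` helper and sample response are hypothetical):

```python
def flag_uncertain(analysis: str) -> list[str]:
    # Collect the lines the model explicitly tagged as low-confidence.
    return [line.strip() for line in analysis.splitlines()
            if "[uncertain]" in line]

sample = (
    "1. Logo in top-left corner\n"
    "2. Date stamp reads 2021 [uncertain]\n"
    "3. Blue header bar\n"
)
print(flag_uncertain(sample))
# → ['2. Date stamp reads 2021 [uncertain]']
```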
Related prompts
- Forcing the agent to plan before acting prevents premature execution and wasted steps. Explicit dependency mapping enables parallel execution and catches logical gaps early.
- **Tool Selection Agent**: The ReAct pattern (Reason + Act) creates an explicit reasoning trace that improves tool selection accuracy. The error-handling rule prevents infinite retry loops.
- **Prompt Compressor**: Explicitly requiring all functional requirements to be preserved prevents the model from over-compressing and losing critical instructions.
- **Memory Management Agent**: Explicit memory read/write instructions create agents that improve over time. Categorization keeps memories organized, and the deduplication rule prevents context bloat.