Multimodal Image + Text Analysis
GPT-4o processes images natively rather than converting to text descriptions. Asking it to reference specific locations in the image and categorize by priority produces structured, actionable output from visual input — much more useful than a generic description.
Analyze the attached {{imageType}} and provide a detailed assessment. **Focus areas:** {{focusAreas}} **For each issue or observation:** 1. Describe what you see (reference the specific location in the image) 2. Explain why it matters 3. Provide a specific, actionable recommendation **Output format:** - Priority: critical / important / minor - Category: {{categories}} - Description and recommendation Also provide a summary at the top with total counts by priority level. [Attach: {{imageDescription}}]
Variables to customize
Why this prompt works
GPT-4o processes images natively rather than converting to text descriptions. Asking it to reference specific locations in the image and categorize by priority produces structured, actionable output from visual input — much more useful than a generic description.
Save this prompt to your library
Organize, version, and access your best prompts across ChatGPT, Claude, and Cursor.
Related prompts
Forcing the agent to plan before acting prevents premature execution and wasted steps. Explicit dependency mapping enables parallel execution and catches logical gaps early.
Tool Selection AgentThe ReAct pattern (Reason + Act) creates an explicit reasoning trace that improves tool selection accuracy. The error-handling rule prevents infinite retry loops.
Prompt CompressorExplicitly requiring all functional requirements to be preserved prevents the model from over-compressing and losing critical instructions.
Memory Management AgentExplicit memory read/write instructions create agents that improve over time. Categorization keeps memories organized, and the deduplication rule prevents context bloat.