Back to guide/General Productivity

Multimodal Image + Text Analysis

GPT-4o processes images natively rather than converting to text descriptions. Asking it to reference specific locations in the image and categorize by priority produces structured, actionable output from visual input — much more useful than a generic description.

gpt-4o-promptsimageTypefocusAreascategories
Edit View
Prompt
Analyze the attached {{imageType}} and provide a detailed assessment.

**Focus areas:**
{{focusAreas}}

**For each issue or observation:**
1. Describe what you see (reference the specific location in the image)
2. Explain why it matters
3. Provide a specific, actionable recommendation

**Output format:**
- Priority: critical / important / minor
- Category: {{categories}}
- Description and recommendation

Also provide a summary at the top with total counts by priority level.

[Attach: {{imageDescription}}]

Variables to customize

{{imageType}}{{focusAreas}}{{categories}}{{imageDescription}}

Why this prompt works

GPT-4o processes images natively rather than converting to text descriptions. Asking it to reference specific locations in the image and categorize by priority produces structured, actionable output from visual input — much more useful than a generic description.

Save this prompt to your library

Organize, version, and access your best prompts across ChatGPT, Claude, and Cursor.