
Multimodal Task Prompt

A structured prompt for analyzing shared media, written to behave the same way on GPT-4o vision and Gemini multimodal.

Tags: chatgpt-vs-gemini, media_type, num_suggestions
Prompt
I'm sharing the following {{media_type}} with you. Analyze it and provide:

1. **Description**: What you see/hear in detail (be specific about elements, layout, colors, text)
2. **Key information extraction**: Pull out all data points, names, numbers, dates, or actionable items
3. **Quality assessment**: Rate the {{media_type}} quality on a 1-10 scale with justification
4. **Suggestions**: {{num_suggestions}} specific, actionable improvements
5. **Accessibility**: Note any accessibility concerns (alt text needed, contrast issues, readability)

Format the extracted information as a structured table where applicable.

Variables to customize

{{media_type}}
{{num_suggestions}}
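
The two placeholders can be filled programmatically before the prompt is sent. The snippet below is a minimal sketch, not part of the original page: `fill_prompt` is a hypothetical helper, the example values (`video`, `3`) are assumptions, and a two-line excerpt of the prompt stands in for the full text.

```python
import re

# Two-line excerpt of the prompt above; paste the full text here in practice.
PROMPT_EXCERPT = (
    "3. **Quality assessment**: Rate the {{media_type}} quality "
    "on a 1-10 scale with justification\n"
    "4. **Suggestions**: {{num_suggestions}} specific, actionable improvements"
)


def fill_prompt(template: str, values: dict) -> str:
    """Replace every {{name}} placeholder; raise KeyError if a value is missing."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return str(values[name])

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


# Example values are illustrative assumptions.
prompt = fill_prompt(PROMPT_EXCERPT, {"media_type": "video", "num_suggestions": 3})
print(prompt)
```

Raising on a missing value, rather than leaving `{{...}}` in place, keeps a half-filled prompt from reaching the model unnoticed.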

Why this prompt works

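The prompt names five numbered sections (description, key information extraction, quality assessment, suggestions, accessibility), so GPT-4o and Gemini are each asked for the same categories of information instead of being left to give a vague, free-form description. That shared structure also makes the two models' answers easier to compare side by side.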
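
One way to check that a model actually returned every requested category is to scan its reply for the five section labels. This is a rough sketch under stated assumptions: `missing_sections` is a hypothetical helper, and it assumes each label appears at the start of a line, optionally preceded by list numbering or markdown markers. Real replies may need looser matching.

```python
import re

# Section labels taken from the numbered list in the prompt.
EXPECTED_SECTIONS = [
    "Description",
    "Key information extraction",
    "Quality assessment",
    "Suggestions",
    "Accessibility",
]


def missing_sections(response: str) -> list:
    """Return the expected section labels that never start a line in the reply."""
    missing = []
    for name in EXPECTED_SECTIONS:
        # Allow leading whitespace, '#', '*', digits, '.', ')' or '-' before the label.
        pattern = rf"^[\s#*\d.)-]*{re.escape(name)}\b"
        if not re.search(pattern, response, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(name)
    return missing


# Made-up reply for illustration; it omits the accessibility section.
sample_reply = (
    "1. **Description**: A bar chart\n"
    "2. **Key information extraction**: Q3 revenue\n"
    "### Quality assessment\n"
    "7/10\n"
    "4. Suggestions: add axis labels\n"
)
print(missing_sections(sample_reply))
```

Running the same check on the GPT-4o reply and the Gemini reply gives a quick signal of whether both followed the structure before their content is compared.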
