Data Extraction from Documents

General Productivity · gemini-prompts · document_type · fields_list · output_format

Gemini's multimodal document understanding handles PDFs, images of forms, and scanned documents. Because the prompt spells out normalization rules for dates, currency, and names, and uses a single NOT_FOUND marker for missing values, the output can be loaded into a spreadsheet or database with little or no cleanup.
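The NOT_FOUND convention pays off at load time: one known sentinel can be mapped to SQL NULL instead of guessing what an empty cell meant. The sketch below assumes you chose CSV as the output format and uses hypothetical invoice columns (`invoice_number`, `invoice_date`, `total`, `currency`, `customer_name`); only the standard-library `csv` and `sqlite3` modules are used.

```python
import csv
import io
import sqlite3
from typing import Optional

# Hypothetical model output for an invoice extraction, in CSV form.
MODEL_OUTPUT = """invoice_number,invoice_date,total,currency,customer_name
INV-001,2024-03-15,1250.00,USD,"Doe, Jane"
INV-002,NOT_FOUND,980.50,EUR,"Smith, John"
"""


def load_rows(csv_text: str) -> list[dict[str, Optional[str]]]:
    """Parse CSV text, turning the NOT_FOUND sentinel into None (SQL NULL)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {key: (None if value == "NOT_FOUND" else value) for key, value in row.items()}
        for row in reader
    ]


def load_into_sqlite(rows: list[dict[str, Optional[str]]], conn: sqlite3.Connection) -> None:
    """Create a table for the hypothetical invoice columns and insert the rows."""
    conn.execute(
        "CREATE TABLE invoices (invoice_number TEXT, invoice_date TEXT, "
        "total REAL, currency TEXT, customer_name TEXT)"
    )
    # Named placeholders read values from each row dict; None is stored as NULL.
    conn.executemany(
        "INSERT INTO invoices VALUES "
        "(:invoice_number, :invoice_date, :total, :currency, :customer_name)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_into_sqlite(load_rows(MODEL_OUTPUT), conn)
missing_dates = conn.execute(
    "SELECT COUNT(*) FROM invoices WHERE invoice_date IS NULL"
).fetchone()[0]
```

Since dates arrive as YYYY-MM-DD text, they sort correctly and work with SQLite's date functions without extra parsing. A value carrying the " [AMBIGUOUS]" marker would still need that suffix stripped before it goes into a numeric column such as `total`.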

Prompt

Extract structured data from the attached {{document_type}}.

Extract these fields:
{{fields_list}}

Output as a {{output_format}} with one row per {{entity_unit}}.

Extraction rules:
- If a field appears multiple times, take the most recent / most specific value
- If a field is missing, use "NOT_FOUND" (not null, not empty)
- For dates, normalize to YYYY-MM-DD format regardless of input format
- For currency, normalize to numbers without symbols (include a "currency" column)
- For names, use "LastName, FirstName" format
- Flag any field whose extracted value is ambiguous by appending a trailing " [AMBIGUOUS]" marker to it

After the data, provide:
- Total records extracted
- Fields with the highest "NOT_FOUND" rate
- Any patterns or anomalies you noticed in the data
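The summary the model reports about itself (record count, NOT_FOUND rates) is worth cross-checking, along with whether each value actually follows the extraction rules. Here is a minimal sketch of such a check. The column names and the `RULES` mapping are hypothetical, and the regex checks are rough approximations of the prompt's date, currency, and name rules, not a complete validator.

```python
import re
from collections import Counter
from datetime import date

NOT_FOUND = "NOT_FOUND"
AMBIGUOUS = " [AMBIGUOUS]"


def is_iso_date(value: str) -> bool:
    """True for a real calendar date written exactly as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_plain_number(value: str) -> bool:
    """True for an amount with no currency symbol or thousands separator."""
    return re.fullmatch(r"-?\d+(\.\d+)?", value) is not None


def is_last_first(value: str) -> bool:
    """True for a name shaped like 'LastName, FirstName'."""
    return re.fullmatch(r"[^,]+, [^,]+", value) is not None


# Hypothetical column -> check mapping; adapt it to your {{fields_list}}.
RULES = {
    "invoice_date": is_iso_date,
    "total": is_plain_number,
    "customer_name": is_last_first,
}


def summarize(rows: list[dict[str, str]]) -> dict:
    """Recompute the prompt's summary and list values that break a rule."""
    fields = list(dict.fromkeys(key for row in rows for key in row))
    not_found = Counter()
    problems = []
    for i, row in enumerate(rows):
        for field, value in row.items():
            if value == NOT_FOUND:
                not_found[field] += 1
                continue
            # Drop the ambiguity flag before checking the value's format.
            if value.endswith(AMBIGUOUS):
                value = value[: -len(AMBIGUOUS)]
            check = RULES.get(field)
            if check is not None and not check(value):
                problems.append((i, field, value))
    rates = {f: not_found[f] / len(rows) for f in fields} if rows else {}
    top = max(rates.values(), default=0)
    return {
        "total_records": len(rows),
        "not_found_rate": rates,
        "highest_not_found": sorted(f for f, r in rates.items() if r == top and r > 0),
        "problems": problems,
    }


ROWS = [
    {"invoice_date": "2024-03-15", "total": "1250.00", "customer_name": "Doe, Jane"},
    {"invoice_date": "15/03/2024", "total": NOT_FOUND, "customer_name": "John Smith [AMBIGUOUS]"},
    {"invoice_date": NOT_FOUND, "total": "$980.50", "customer_name": "Roe, Ann"},
]
report = summarize(ROWS)
```

For the sample rows, the report flags the unnormalized date `15/03/2024`, the name `John Smith` (not in "LastName, FirstName" form), and the amount `$980.50` (symbol not stripped), and lists `invoice_date` and `total` as the fields with the highest NOT_FOUND rate.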

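Variables to customize

- {{document_type}}: the kind of document being processed, e.g. invoice, receipt, or intake form
- {{fields_list}}: the fields to extract
- {{output_format}}: the tabular output format, e.g. CSV table
- {{entity_unit}}: what each output row represents, e.g. one invoice or one line item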
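If you run this prompt from a script rather than pasting it by hand, the {{name}} placeholders can be filled with plain string substitution. The sketch below uses only Python's `re` module; `TEMPLATE` is an excerpt of the prompt, and `VALUES` and `render` are hypothetical names chosen for illustration. It fails loudly on a missing variable so a half-filled prompt is never sent.

```python
import re

# Matches {{name}} placeholders such as {{document_type}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

# Excerpt of the prompt above; the extraction rules are omitted for brevity.
TEMPLATE = """Extract structured data from the attached {{document_type}}.

Extract these fields:
{{fields_list}}

Output as a {{output_format}} with one row per {{entity_unit}}."""

# Hypothetical example values for an invoice-processing run.
VALUES = {
    "document_type": "invoice",
    "fields_list": "- invoice_number\n- invoice_date\n- total\n- customer_name",
    "output_format": "CSV table",
    "entity_unit": "invoice",
}


def render(template: str, values: dict[str, str]) -> str:
    """Fill every {{name}} placeholder, failing loudly if any value is missing."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"missing values for: {', '.join(missing)}")
    # A function replacement inserts values literally, so backslashes are safe.
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


prompt = render(TEMPLATE, VALUES)
print(prompt)
```

Checking for missing names before substituting matters most for {{entity_unit}}, which is easy to overlook because it appears only in the output line of the prompt.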

