
Data Extraction from Documents

Gemini's multimodal document understanding handles PDFs, images of forms, and scanned documents. The explicit normalization rules and NOT_FOUND convention make the output immediately usable in spreadsheets or databases.

Prompt
Extract structured data from the attached {{document_type}}.

Extract these fields:
{{fields_list}}

Output as a {{output_format}} with one row per {{entity_unit}}.

Extraction rules:
- If a field appears multiple times, take the most recent value; if recency cannot be determined, take the most specific one
- If a field is missing, use "NOT_FOUND" (not null, not empty)
- For dates, normalize to YYYY-MM-DD format regardless of input format
- For currency, output the amount as a plain number without symbols, and record the currency in a separate "currency" column
- For names, use "LastName, FirstName" format
- Flag any field where the extraction is ambiguous with a trailing " [AMBIGUOUS]" marker

After the data, provide:
- Total records extracted
- Fields with the highest "NOT_FOUND" rate
- Any patterns or anomalies you noticed in the data
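
Once the model has responded, the summary it reports can be cross-checked locally. The following is a minimal sketch, assuming the chosen {{output_format}} was CSV with a header row; the function name `not_found_rates` and the sample invoice data are hypothetical and only illustrate the idea.

```python
import csv
import io
from collections import Counter


def not_found_rates(csv_text):
    """Return (record count, share of NOT_FOUND cells per column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            # Short rows yield None values; only string cells are compared.
            if isinstance(value, str) and value.strip() == "NOT_FOUND":
                missing[field] += 1
    total = len(rows)
    fields = reader.fieldnames or []
    rates = {f: (missing[f] / total if total else 0.0) for f in fields}
    return total, rates


# Hypothetical model output, used only to exercise the function.
sample = """invoice_number,invoice_date,total,currency
INV-001,2024-03-05,1200.50,USD
INV-002,NOT_FOUND,89.00,EUR
INV-003,NOT_FOUND,NOT_FOUND,USD
"""

total, rates = not_found_rates(sample)
for field, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{field}: {rate:.0%} NOT_FOUND out of {total} records")
```

Comparing these figures with the totals the model reports is a cheap way to catch dropped or duplicated rows.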

Variables to customize

- {{document_type}}
- {{fields_list}}
- {{output_format}}
- {{entity_unit}}
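
As an illustration of how these variables might be filled in programmatically, here is a minimal Python sketch. The helper `fill_template` and the example values (an invoice, a CSV table, one row per line item) are assumptions made for this example, and the template is shortened to the opening lines of the prompt.

```python
import re

# Opening lines of the prompt, with the four placeholders left in place.
TEMPLATE = """Extract structured data from the attached {{document_type}}.

Extract these fields:
{{fields_list}}

Output as a {{output_format}} with one row per {{entity_unit}}."""

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def fill_template(template, values):
    """Replace each {{name}} placeholder; raise if a value is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError("no value supplied for placeholder: " + name)
        return values[name]
    return PLACEHOLDER.sub(substitute, template)


# Hypothetical values for an invoice-extraction run.
filled = fill_template(TEMPLATE, {
    "document_type": "invoice",
    "fields_list": "- invoice_number\n- invoice_date\n- total",
    "output_format": "CSV table",
    "entity_unit": "line item",
})
print(filled)
```

Raising on a missing value, rather than leaving the placeholder in the text, avoids sending a prompt that still contains a literal `{{...}}`.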

Why this prompt works

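Gemini's multimodal document understanding lets the same prompt run against PDFs, images of forms, and scanned documents. The explicit normalization rules (YYYY-MM-DD dates, symbol-free amounts with a separate "currency" column, "LastName, FirstName" names) keep every column in one consistent format, and the NOT_FOUND and [AMBIGUOUS] conventions make gaps and uncertain values easy to filter, so the output can be loaded into spreadsheets or databases with little cleanup.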
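
Because the conventions are fixed strings and formats, they can be checked mechanically before the data is loaded anywhere. The sketch below assumes each cell arrives as a string; the helper names and the column kinds "date" and "amount" are hypothetical labels chosen for this example.

```python
import re
from datetime import datetime

AMBIGUOUS = " [AMBIGUOUS]"
PLAIN_NUMBER = re.compile(r"-?\d+(\.\d+)?")


def split_flag(value):
    """Strip a trailing ' [AMBIGUOUS]' marker and report whether it was set."""
    if value.endswith(AMBIGUOUS):
        return value[: -len(AMBIGUOUS)], True
    return value, False


def is_iso_date(value):
    """True only for a real calendar date written as zero-padded YYYY-MM-DD."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return len(value) == 10  # strptime alone would also accept 2024-3-5


def is_plain_number(value):
    """True for digits with an optional sign and decimal part, no symbols."""
    return PLAIN_NUMBER.fullmatch(value) is not None


def check_cell(kind, raw):
    """Classify one cell as 'ok', 'missing', 'ambiguous', or 'invalid'."""
    value, flagged = split_flag(raw)
    if value == "NOT_FOUND":
        return "missing"
    validators = {"date": is_iso_date, "amount": is_plain_number}
    validator = validators.get(kind, lambda v: bool(v))
    if not validator(value):
        return "invalid"
    return "ambiguous" if flagged else "ok"


print(check_cell("date", "2024-03-05"))
print(check_cell("amount", "$1,200.50"))
```

Cells classified as "invalid" or "ambiguous" are good candidates for manual review against the source document.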
