Data Extraction from Documents
Extract structured data from the attached {{document_type}}.

Extract these fields: {{fields_list}}

Output as a {{output_format}} with one row per {{entity_unit}}.

Extraction rules:
- If a field appears multiple times, take the most recent / most specific value
- If a field is missing, use "NOT_FOUND" (not null, not empty)
- For dates, normalize to YYYY-MM-DD format regardless of input format
- For currency, normalize to numbers without symbols (include a "currency" column)
- For names, use "LastName, FirstName" format
- Flag any field where the extraction is ambiguous with a trailing " [AMBIGUOUS]" marker

After the data, provide:
- Total records extracted
- Fields with the highest "NOT_FOUND" rate
- Any patterns or anomalies you noticed in the data
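Variables to customize
- {{document_type}}: the kind of document attached
- {{fields_list}}: the fields to extract
- {{output_format}}: the format of the output table
- {{entity_unit}}: what each output row represents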
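The four placeholders can be filled in programmatically before the prompt is sent. The following is a minimal sketch, not part of the original prompt: the helper name fill_prompt and the example values (an invoice with four fields) are assumptions chosen for illustration, and only the first three lines of the prompt are reproduced to keep the block short.

```python
import re
from typing import Mapping

# First lines of the prompt above; the rule list is omitted here for brevity.
PROMPT_TEMPLATE = (
    "Extract structured data from the attached {{document_type}}.\n"
    "Extract these fields: {{fields_list}}\n"
    "Output as a {{output_format}} with one row per {{entity_unit}}.\n"
)


def fill_prompt(template: str, values: Mapping[str, str]) -> str:
    """Replace every {{name}} placeholder; fail loudly if a value is missing."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


# Hypothetical example values, not taken from the source.
prompt = fill_prompt(
    PROMPT_TEMPLATE,
    {
        "document_type": "invoice",
        "fields_list": "vendor name, invoice date, total, currency",
        "output_format": "CSV table",
        "entity_unit": "invoice",
    },
)
print(prompt)
```

Raising on a missing value, rather than leaving the placeholder in place, keeps a half-filled template from reaching the model.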
Why this prompt works
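Gemini's multimodal document understanding handles PDFs, images of forms, and scanned documents. The explicit normalization rules (YYYY-MM-DD dates, numeric amounts with a separate currency column, a fixed name format) and the NOT_FOUND convention give every cell a predictable shape, so the output can be loaded into a spreadsheet or database with little cleanup.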
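To illustrate how those conventions help downstream, here is a rough sketch of loading the model's output, assuming it was requested as CSV. The function name load_extraction, the column names, and the sample rows are all made up for the example. It maps NOT_FOUND to None, strips the " [AMBIGUOUS]" marker while recording it, and checks that date columns really are YYYY-MM-DD.

```python
import csv
import io
import re
from collections import Counter
from datetime import datetime

NOT_FOUND = "NOT_FOUND"
AMBIGUOUS_SUFFIX = " [AMBIGUOUS]"
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value):
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def load_extraction(csv_text, date_fields=()):
    """Return (records, not_found_counts, issues) for CSV text from the model."""
    records, issues = [], []
    not_found = Counter()
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_no, raw in enumerate(reader, start=1):
        record = {}
        for field, value in raw.items():
            if field is None:  # row has more cells than the header
                issues.append((row_no, None, "extra column"))
                continue
            value = (value or "").strip()
            if value.endswith(AMBIGUOUS_SUFFIX):
                value = value[: -len(AMBIGUOUS_SUFFIX)]
                issues.append((row_no, field, "ambiguous"))
            if value == NOT_FOUND:
                not_found[field] += 1
                record[field] = None
            else:
                if field in date_fields and not is_iso_date(value):
                    issues.append((row_no, field, "bad date"))
                record[field] = value
        records.append(record)
    return records, not_found, issues


# Made-up sample output; names are quoted because they contain commas.
sample = '''name,invoice_date,amount,currency
"Okafor, Ada",2024-03-07,1250.00,USD
"Lindqvist, Per [AMBIGUOUS]",NOT_FOUND,980.50,EUR
"Tanaka, Yui",07/03/2024,NOT_FOUND,NOT_FOUND
'''

records, not_found, issues = load_extraction(sample, date_fields={"invoice_date"})
print(len(records), dict(not_found), issues)
```

The per-field NOT_FOUND counts can be compared against the summary the prompt asks the model to report, which is a cheap consistency check on the extraction.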