Docento.app Logo
Docento.app
Code editor on a laptop screen
All Posts

Prompt Engineering for PDF Tasks

April 10, 2026·8 min read

Getting good results from an AI on PDF content is half about which model you use, half about how you ask. The default "summarize this" prompt produces forgettable output. A well-engineered prompt produces summaries, extractions, and analyses you can act on. This guide collects the patterns that work for everyday PDF tasks across the major chat AIs.

Why prompts matter especially for PDFs

PDFs come in many shapes: contracts, invoices, research papers, slide decks, scans, forms. A generic prompt cannot adapt to all of them. The model also has to handle imperfect text extraction, layout artifacts, and partial OCR. Giving the model context about what kind of document it is, what you need, and what to avoid produces measurably better results.

The five components of a good PDF prompt

A useful template:

  1. Role: who the AI should think it is for this task.
  2. Context: what kind of document, what domain.
  3. Task: what you want done, specifically.
  4. Constraints: what to avoid, what format to use.
  5. Verification clause: what to do if uncertain.

A worked example, for summarizing a legal contract:

You are a contracts analyst. This is a US-based vendor services agreement. Summarize the agreement in 5 bullets covering: parties, term, fees, termination conditions, and limitation of liability. Quote exact figures and dates. Use plain English. If a field is missing or unclear, say "Not specified" rather than guessing.

That prompt typically returns a usable summary on first pass. "Summarize this contract" does not.

Patterns by task

Summarization.

  • Specify length and granularity: "3 bullets," "one paragraph," "executive summary in 500 words."
  • Specify audience: "for a non-technical executive," "for a subject-matter expert."
  • Specify what to include: financial implications, technical methodology, key risks.
  • Specify what to exclude: "skip introductory and concluding material."

See AI PDF summarization explained for the broader picture.

Extraction (structured fields).

  • Provide a JSON schema example: "Return JSON with keys: vendor, invoice_number, total, due_date."
  • Set defaults: "Use null for any field not found."
  • Constrain types: "total as a number without currency symbol; due_date in YYYY-MM-DD."
  • Provide few-shot examples for tricky formats.
Return JSON like:
{
  "vendor": "Acme Corp",
  "invoice_number": "INV-2026-0042",
  "total": 1234.56,
  "due_date": "2026-04-30",
  "line_items": [{"description": "...", "quantity": 2, "unit_price": 100.00}]
}

Question answering.

  • Ground: "Answer using only the provided document."
  • Cite: "Quote the exact sentence that supports the answer."
  • Hedge: "Say 'I don't know' if the document does not contain the answer."

This combination dramatically reduces hallucinations.

Comparison.

  • Specify dimensions: "Compare the two contracts on payment terms, termination clauses, and indemnification."
  • Format: "Use a Markdown table with one row per dimension."
  • Flag absences: "If one document lacks the section, mark as N/A."

See how to compare two PDFs.

Translation.

  • Specify target language and register: "translate to Spanish, neutral register suitable for a business audience."
  • Preserve structure: "preserve all headings, lists, and tables."
  • Glossary if available: "use the following preferred translations for these terms: ..."

See AI PDF translation explained.

Classification.

  • Provide the category list explicitly.
  • Define each category briefly.
  • Ask for a confidence score.
  • Provide a tiebreaker rule for ambiguous cases.
Classify the document into one of: invoice, contract, resume, packing slip, other.
- invoice: contains line items and a total amount due
- contract: contains parties and binding clauses
- resume: contains professional experience and education
- packing slip: lists shipped items without prices
- other: anything else

Return JSON: {"category": "...", "confidence": 0.0 to 1.0, "reason": "one sentence"}

See classifying PDFs with machine learning.

Handling long documents

For PDFs longer than the model's context:

  1. Chunk by section (headings) rather than fixed token windows.
  2. Summarize per section with a consistent prompt.
  3. Aggregate with a final summarization pass.

For RAG-style answering over long PDFs, see building a RAG system with PDFs.

Handling tables and figures

Tables and figures usually need a multimodal model. See multimodal LLMs and PDF documents.

For text-only models with a table extracted as text, include the format hint:

The text below includes a table that was extracted from a PDF. Columns may be misaligned. Reconstruct the table as Markdown, using your judgment to align rows and columns. If a row looks corrupt, mark it [PROBABLY CORRUPT].

Handling OCR-extracted text

OCR introduces errors. A prompt-level hint helps:

The text below was extracted from a scanned PDF via OCR. Expect occasional misread characters, especially digits (0/O, 1/l, 5/S). When extracting numbers, prefer values that make sense in context.

For known systematic OCR errors (e.g., a specific font that misreads "rn" as "m"), include the patterns explicitly.

Reducing hallucinations

A combination of three techniques works:

  1. Grounding. "Answer only from the provided document."
  2. Quoting. "Quote the exact sentence supporting each claim."
  3. Hedging permission. "Say 'I don't know' rather than guessing."

Plus low temperature (0 to 0.2 for extraction tasks; 0.3 to 0.5 for summaries).

For factual extraction, the JSON-schema approach plus null defaults prevents most fabrication.

Few-shot examples

For consistent output format on tricky tasks:

Extract events from the document below. Here are two examples:

Input: "The contract was signed on March 5, 2026 in New York." Output: {"date": "2026-03-05", "event": "contract signed", "location": "New York"}

Input: "Delivery is expected by April 15." Output: {"date": "2026-04-15", "event": "delivery expected", "location": null}

Now extract events from: [document text]

Two to three examples often suffice. Five plus rarely adds quality and bloats the prompt.

System prompts vs user prompts

Most chat APIs distinguish:

  • System prompt: persistent instructions about role, format, and constraints.
  • User prompt: the specific task and document.

Put role, format, and constraints in the system prompt. Put the document and the specific question in the user prompt. This separation makes the constraints feel more authoritative to the model.

Temperature and other parameters

  • Temperature 0 to 0.2: extraction, classification, structured output.
  • Temperature 0.3 to 0.5: summarization, descriptive analysis.
  • Temperature 0.7 to 1.0: creative tasks (rare for PDFs).

For determinism in evaluation, fix the temperature and any random seed.

Verification

Always verify on a sample before trusting:

  • Run the prompt on 10 to 20 documents of varying types.
  • Compare to ground truth (manual extraction or known answers).
  • Adjust the prompt to fix observed errors.
  • Iterate until accuracy is acceptable.

Track which prompt version produced which results in your logs. Prompt changes can regress performance just like code changes.

Anti-patterns

Vague prompts. "Summarize this" gives default-shape summaries. Be specific.

Asking for opinions on factual tasks. "What do you think about the indemnification clause?" gets opinion mixed with fact. Separate.

Combining many tasks in one prompt. "Summarize, classify, extract fields, and translate." Each task is usually better in its own call.

Asking for citations without grounding. Models invent citations when not constrained to the source.

Trusting model self-reports. "How confident are you?" returns numbers that are weakly calibrated. Use it as a hint, not a guarantee.

Skipping verification. Models change. A prompt that worked on GPT-4 in 2024 may behave differently on GPT-4.1 today.

Model differences

Major models have personalities:

  • Claude is verbose by default, follows instructions tightly, hedges more.
  • GPT-4o is concise by default, freer with creative additions.
  • Gemini is strong at structured output, sometimes brittle on edge cases.

The same prompt produces noticeably different results across models. If switching models, re-verify your prompts.

Tools and frameworks

For production prompt engineering:

  • DSPy lets you write prompts as code and optimize them against datasets.
  • LangChain has prompt templates with versioning.
  • Promptfoo, Helicone, Langfuse for evaluation and observability.
  • Anthropic Workbench, OpenAI Playground for fast iteration.

For research papers and tax docs, dedicated extraction services often beat hand-prompted LLMs once you have volume. See AI data extraction from PDFs.

Practical recipe

For any new PDF task:

  1. Define the task precisely. Inputs, outputs, edge cases.
  2. Write a v1 prompt with role, context, task, constraints, verification clause.
  3. Run on 10 examples. Look at failures.
  4. Iterate. Add examples, tighten constraints, add hedging.
  5. Lock the prompt with a version number.
  6. Evaluate periodically on a held-out set.

Takeaway

Prompt engineering for PDF tasks is mostly about discipline: be specific, ground in the document, allow the model to say "I don't know," and verify. The investment of an hour refining a prompt typically saves dozens of hours of downstream cleanup. For PDF preparation steps before AI processing (cropping, splitting, redacting), Docento.app handles them in the browser without uploading. See also AI PDF summarization explained, building a RAG system with PDFs, and chatting with PDFs explained.

Related Posts