
PDF Workflows for Translators

May 10, 2026·7 min read

Translators face a specific PDF nuisance: the source document is fixed-layout, but the translation needs to flow into a different language with different word lengths, character sets, and reading directions. A clean workflow extracts text faithfully, runs it through translation memory (TM) and quality steps, then puts it back into a usable format. This guide covers the practical stack for professional translators in 2026.

The translator's PDF problem

A typical job:

  • Client provides a PDF (often the only source).
  • Translator extracts text.
  • Translates with TM and QA tools.
  • Returns deliverable: PDF, Word, or both.

The pain points:

  • PDFs are not designed for editing.
  • Source files (Word, InDesign, XML) are often unavailable.
  • Layout breaks when target language is longer or shorter than source.
  • Tables and figures need special handling.

Asking for the source

The first step is often: ask the client for the source file.

  • Word document.
  • InDesign (.indd or .idml).
  • Markdown or HTML.
  • XML / DITA / structured authoring.

Translating from the source is dramatically easier than from PDF. Many clients have the source but didn't think to send it. Ask first.

CAT tools and PDF

Computer-assisted translation (CAT) tools work on text segments:

  • Trados Studio: industry standard; handles many formats including PDF.
  • memoQ: strong competitor; project management features.
  • MateCat, Smartcat: cloud-based; subscription.
  • OmegaT: open source.
  • Across, Wordfast, Memsource (now Phrase): other established tools.

All major CAT tools include PDF import. Quality of the import varies based on the PDF's structure.

PDF preprocessing

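Before importing to a CAT tool, convert the PDF to an editable format such as Word, run OCR if the pages are scanned, and clean up the result.

Specialized tools for converting PDF to a translatable format: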

  • ABBYY FineReader: PDF OCR plus Word export with strong layout preservation.
  • Iceni Infix: edit PDFs directly with text-flow awareness.
  • Solid Converter: PDF-to-Word with layout retention.

Translation memory

The TM is the translator's compounding asset:

  • Segment-level matches: stores source/target pairs.
  • Fuzzy matches: similar but not identical segments get retrieved.
  • Concordance: search across all past translations for a phrase.

Build TMs per client and per domain. Over years, they save enormous time and improve consistency.

For PDFs specifically, ensure segments map cleanly to the source. Bad PDF extraction breaks segmentation.
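
To make the idea of a fuzzy match concrete, here is a minimal sketch of a TM lookup. It assumes a hypothetical in-memory TM (a plain dictionary) and uses Python's difflib character similarity as a stand-in for the scoring a real CAT tool applies; the 0.75 threshold is also an assumption.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory TM: source segment -> stored target segment.
TM = {
    "Press the power button to start the device.":
        "Drücken Sie die Ein/Aus-Taste, um das Gerät zu starten.",
    "Remove the battery before cleaning.":
        "Entfernen Sie vor der Reinigung den Akku.",
}


def tm_lookup(segment, tm, threshold=0.75):
    """Return (score, source, target) for the best match, or None.

    The score is difflib's character-level ratio (0.0 to 1.0), an
    illustrative stand-in, not the formula any specific CAT tool uses.
    """
    best = None
    for source, target in tm.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best
```

An identical segment scores 1.0, a segment with one changed word still clears the threshold, and an unrelated segment returns None. If PDF extraction splits a sentence in the middle, neither half is likely to reach the threshold, which is why clean segmentation matters.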

Terminology management

Term bases (TBs) ensure consistent terminology:

  • Per-client glossaries.
  • Per-domain glossaries (medical, legal, technical).
  • Approved translations for product names, key concepts.

Tools integrated with CAT: MultiTerm (Trados), QTerm (memoQ), built-in features in cloud CATs.
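
A term-base compliance check can be sketched in a few lines. The term base below is a hypothetical example, and the plain substring matching is an assumption that ignores inflection and word boundaries, which integrated TB tools generally handle better.

```python
# Hypothetical term base: source term -> approved target term.
TERM_BASE = {
    "power button": "Ein/Aus-Taste",
    "battery": "Akku",
}


def missing_terms(source, target, term_base):
    """Return source terms whose approved translation is absent from target.

    Uses case-insensitive substring matching only, so inflected forms
    of a target term may be reported as missing.
    """
    source_lower = source.lower()
    target_lower = target.lower()
    return [
        src_term
        for src_term, tgt_term in term_base.items()
        if src_term.lower() in source_lower
        and tgt_term.lower() not in target_lower
    ]
```

For example, translating "battery" as "Batterie" when the client glossary requires "Akku" would be flagged, while a target containing "Akku" passes.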

Quality assurance

Pre-delivery QA:

  • Automated checks: number consistency, punctuation, format codes, term-base compliance.
  • Spell check: in the target language.
  • Length checks: for UI strings or layout-constrained content.
  • Forbidden terms: client-specific blocklists.

Tools: built into CAT or standalone like Verifika, Xbench.
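
As an illustration of an automated number-consistency check, the sketch below compares the digits found in a source segment with those in the target. The function names are made up for this example. Comparing digits only is a deliberate simplification so that "5,000" and "5.000" count as the same number; it also means "1.5" and "15" would not be told apart.

```python
import re
from collections import Counter

# A run of digits, optionally grouped with "." or "," (e.g. 5,000 or 3.14).
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def digits_only(token):
    """Strip separators so locale formatting differences are ignored."""
    return re.sub(r"\D", "", token)


def number_mismatches(source, target):
    """Return (missing, extra): numbers absent from or added to the target."""
    src = Counter(digits_only(n) for n in NUMBER.findall(source))
    tgt = Counter(digits_only(n) for n in NUMBER.findall(target))
    missing = sorted((src - tgt).elements())
    extra = sorted((tgt - src).elements())
    return missing, extra
```

A segment pair with matching numbers returns two empty lists; a transposed number such as 12 becoming 21 shows up as one missing and one extra entry.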

Layout reconstruction

After translation, the question is where the translated text goes.

Option 1: deliver Word. Client takes the Word file and reformats as needed. Simplest for the translator.

Option 2: re-create the PDF. Translator (or DTP specialist) builds the PDF from the translated Word file. Adds time.

Option 3: edit the PDF directly. Tools like Infix or Acrobat Pro for text-flow PDF editing. Sometimes the only option when source is unavailable.

Option 4: client handles layout. Translator delivers translated text; client reformats. Often best when the layout is complex.

For DTP-heavy work (InDesign-laid-out PDFs), an InDesign workflow with .idml exchange is the cleanest.

Length expansion

Target text often differs in length from source:

  • EN to DE: target is typically 25-35% longer.
  • EN to JA: target is often shorter in character count, but each character is wider on the page or screen.
  • EN to ZH: target is significantly shorter in character count.
  • EN to AR: roughly the same character count, but the text runs right-to-left and needs bidirectional (bidi) handling.

Layout-constrained content (UI strings, buttons, headers) needs special attention. Some clients specify maximum lengths per segment.
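
A length check along these lines can be sketched as follows. The function names are hypothetical. Character counts use Python's len(), which counts code points, and the display-width estimate assumes that East Asian wide and fullwidth characters take two columns, which is only an approximation of real rendering.

```python
import unicodedata


def expansion_percent(source, target):
    """Percentage change in character count from source to target."""
    if not source:
        raise ValueError("source segment is empty")
    return round((len(target) - len(source)) / len(source) * 100, 1)


def display_width(text):
    """Approximate column width: wide/fullwidth characters count as 2."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def exceeds_limit(target, max_chars):
    """True if the target is longer than the client's per-segment limit."""
    return len(target) > max_chars
```

With "Settings" as the source, the German "Einstellungen" expands by 62.5%, while the Japanese "設定" shrinks to two characters yet still occupies about four columns.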

Multilingual PDFs

For PDFs that need multiple languages:

  • One PDF per language, common for marketing and product docs.
  • Bilingual PDF with side-by-side or page-pair layout.
  • Multilingual layout in a single PDF: more complex; specific to certain domains (legal, official).

For RTL languages (Arabic, Hebrew, Farsi), layout must support bidirectional text and right-to-left page order.
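
One way to flag segments that need an RTL-capable layout is to look at the Unicode bidirectional class of each character. This is a minimal sketch with a hypothetical helper name; it only detects the presence of right-to-left characters and does not implement bidi reordering.

```python
import unicodedata


def contains_rtl(text):
    """True if any character has a right-to-left bidi class.

    "R" covers Hebrew and similar scripts; "AL" covers Arabic letters,
    which are also used for Farsi.
    """
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)
```

Running this over extracted segments gives a quick list of pages or text frames to hand to whoever handles the layout.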

Sworn and certified translation

For legal, immigration, official documents:

  • Sworn translator signs and stamps the translation.
  • Certified translation (without stamp) for jurisdictions that don't require sworn.
  • Notarization required in some jurisdictions.
  • Original plus translation delivered together; sometimes bound as a single PDF.

For e-signatures and stamps on certified translations, see how to create an electronic signature and digital signatures vs electronic signatures.

Confidentiality

Translators often handle sensitive documents:

  • NDAs with clients.
  • Encrypted storage for client files.
  • Encrypted communication for delivery.
  • Right-of-deletion clauses for some content.
  • TM segregation: per-client TMs to avoid leaking across clients.

For sensitive content (medical, financial, legal), the privacy stack matters. See risks of using AI on confidential PDFs.

AI and MT in translation

Machine translation (MT) has been a translator's tool for years. In 2026:

  • DeepL, Google Translate, Microsoft Translator: traditional NMT.
  • GPT-4o, Claude Sonnet, Gemini: LLM-based MT with stronger context handling.
  • Specialized: Lilt, Phrase, ModelFront for translator-aimed AI.

The professional pattern: MT produces a draft and a human post-edits it (MTPE). For high-quality output, never deliver raw MT.

For PDFs specifically, see how to translate PDF documents and AI PDF translation explained.

Pricing models

Common translator pricing:

  • Per word (source or target).
  • Per hour for editing, DTP, complex work.
  • Per page for sworn translations.
  • Fuzzy match grids: 100% matches are billed at a reduced rate, fuzzy matches at intermediate rates, and new words at the full rate.

PDF-source jobs often command a premium because of the preprocessing overhead.
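
To show how a fuzzy match grid turns into a quote, here is a small worked sketch. The match bands, the percentages, and the per-word rate are all hypothetical; actual grids are negotiated per client or agency.

```python
# Hypothetical grid: match band -> fraction of the full per-word rate.
GRID = {
    "100%": 0.30,
    "85-99%": 0.60,
    "75-84%": 0.80,
    "new": 1.00,
}


def quote(word_counts, rate_per_word, grid=GRID):
    """Return (weighted word count, total price) for a CAT analysis."""
    weighted = sum(words * grid[band] for band, words in word_counts.items())
    return weighted, round(weighted * rate_per_word, 2)


# Example analysis for one job (hypothetical figures).
analysis = {"100%": 1000, "85-99%": 500, "75-84%": 200, "new": 3000}
weighted_words, total = quote(analysis, rate_per_word=0.10)
```

Here 4,700 raw words become 3,760 weighted words, so the job is priced at 376.00 instead of 470.00 at the assumed rate. Any PDF preprocessing premium would be added on top as a separate line item.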

Project management

For freelance translators:

  • Client database: contact info, rates, preferences.
  • Project tracker: source word count, due date, status.
  • Invoice generator.
  • TM and TB per client.

Tools: dedicated translator PM tools (Translation Office 3000, Plunet for agencies, Protemos for freelancers), or general PM tools (Notion, Airtable, Trello).

For freelancer PDF practices more broadly, see PDF workflows for freelancers.

Tools the translator uses

  • CAT: Trados, memoQ, Phrase, MateCat, OmegaT.
  • TM management: built into CAT.
  • QA: Verifika, ApSIC Xbench.
  • PDF prep: ABBYY FineReader, Iceni Infix, Solid Converter.
  • MT: DeepL, GPT-4o, Claude.
  • Local PDF editing: Acrobat, Docento.app for browser-based ops.
  • DTP: InDesign for typeset deliverables; Word for general.

Common gotchas

Text in images. Some text lives inside images (logos, captions). OCR can extract it, but placing the translation back into the image requires graphics work.

Hyphenation in extraction. A word split across a line break ("exam-\nple") extracts as "exam-ple". Run dehyphenation before importing.
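
A basic dehyphenation pass can be sketched with one regular expression. The function name is made up for this example, and the rule is deliberately naive: it joins every word split by a hyphen at a line break, so genuine compounds that happen to break at their hyphen (such as "well-\nknown") lose it and should be reviewed afterwards.

```python
import re

# A hyphen directly followed by a newline, with word characters on both sides.
LINE_BREAK_HYPHEN = re.compile(r"(?<=\w)-\n(?=\w)")


def dehyphenate(text):
    """Join words that were hyphenated across a line break."""
    return LINE_BREAK_HYPHEN.sub("", text)
```

Hyphens inside a line and ordinary line breaks are left untouched, so the pass is safe to run on the full extracted text before segmentation.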

Headers and footers. They repeat on every page; translate once, then place per page in layout.

Lost formatting. Bold, italic, color may not survive PDF extraction. Check and restore.

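Numbers translated. CAT tools should protect numbers, but verify. Changing "5,000" to "5.000" for a German target is only a separator convention; turning "$5,000" into "5.000 €" also changes the currency, which is a conversion that needs explicit client instruction.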
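
The separator part of that adaptation can be illustrated with a short sketch. The helper below is hypothetical and hard-codes German conventions ("." for thousands, "," for decimals); it reformats the number without touching its value or currency, and a production workflow would more likely rely on a locale-aware library.

```python
def format_number_de(value):
    """Format a number with German separators, two decimal places."""
    english = f"{value:,.2f}"  # e.g. "5,000.00"
    # Swap "," and "." in a single pass.
    return english.translate(str.maketrans(",.", ".,"))
```

The amount stays 5000 in both languages; only its written form changes from "5,000.00" to "5.000,00".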

Cultural adaptation. Sometimes translation isn't enough: date formats, addresses, units of measure, examples need localization.

Tags in CAT tools. Inline tags (font changes, links) must be preserved exactly.

Practical recipe

For a clean translator's PDF workflow:

  1. Always ask for the source file first.
  2. If PDF only, preprocess: convert to Word, clean up, OCR if needed.
  3. Import to CAT tool with TM and TB.
  4. Translate with TM and MT support.
  5. QA pass.
  6. Layout reconstruction as agreed with client.
  7. Delivery: PDF, Word, or both, per agreement.
  8. Update TM and TB for next time.

For local PDF tasks (cropping a chunk for translation samples, combining bilingual deliverables, signing certified translations), Docento.app handles operations in the browser.

Takeaway

Translation work on PDFs is doable but adds preprocessing time. The professional pattern is: ask for the source first; preprocess thoroughly when you can't get the source; use CAT tools with TM and TB; deliver in the requested format. Treat PDF preprocessing as a billable step, not as overhead. See also how to translate PDF documents, AI PDF translation explained, and how to convert PDF to Word.
