PDF Redaction Failures: Real Cases and How to Avoid Them

May 9, 2026·9 min read

Every few years, a high-profile organization releases a redacted PDF, and within hours someone discovers the redactions can be defeated by selecting the text behind the black bars. The leaked content was never removed, just visually covered. Court records, government filings, and corporate legal documents have all suffered this fate. The fix is straightforward, but the mistake is common enough to deserve its own deep dive. This guide walks through the real failure modes and how to redact PDFs correctly.

The fundamental misunderstanding

A redaction is not a black rectangle.

Drawing a black rectangle over text in a PDF editor does exactly what it looks like: it draws a rectangle on top of the page. The text underneath is still there. Anyone with a PDF reader can:

  • Select the text behind the rectangle and copy it
  • Use pdftotext or any extraction tool to retrieve the underlying text
  • Open the PDF in a different viewer that does not respect the rectangle's z-order
  • Read the file's content stream directly with a text editor

A real redaction permanently removes the underlying content. The black box is just the visual indicator.
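
To make this concrete, here is a minimal sketch in Python using a hand-written, uncompressed content stream. The stream, the name in it, and the helper `extract_shown_text` are illustrative assumptions, not output from any real tool, and the helper only understands simple `(...) Tj` operators. That is enough to show that a rectangle painted after the text leaves the text in place.

```python
import re

# A toy, uncompressed page content stream: show some text, then paint a
# filled black rectangle over the same area. Real streams are usually
# compressed and use font-specific encodings; this is a simplification.
CONTENT_STREAM = (
    b"BT /F1 12 Tf 72 700 Td (Witness: Jane Example) Tj ET\n"
    b"0 0 0 rg\n"            # set the fill color to black
    b"70 695 200 16 re f\n"  # draw and fill a rectangle over the text
)

def extract_shown_text(stream: bytes) -> list[str]:
    """Return the literal strings passed to simple (...) Tj operators."""
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", stream)]

# The rectangle changes what is painted, not what is stored.
print(extract_shown_text(CONTENT_STREAM))
```

The extraction ignores the rectangle entirely, which is what copy-paste and text extraction tools effectively do as well.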

Famous failures

A few well-publicized cases over the years:

  • Government filings where redacted names were recovered by selecting through the black bars.
  • Court documents where redacted financial figures were extracted via copy-paste.
  • Corporate disclosures where redacted strategy details surfaced in news stories after a journalist used pdftotext.
  • Military documents redacted with black highlighter PDF annotations that did not remove the text.

The common thread: the redactor used a visual cover rather than true content removal. The damage from such mistakes ranges from embarrassment to legal exposure to operational risk.

Real redaction: what it actually does

A proper redaction tool:

  1. Identifies the text or content to redact
  2. Removes that content from the page's content stream
  3. Draws a visible replacement (black bar, blank space, or "REDACTED" stamp)
  4. Optionally adds metadata noting the redaction occurred
  5. Saves the result

After step 2, the original content is gone: not hidden, but absent. No selection, no extraction tool, no clever workaround can recover it.
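
The steps above can be sketched on a toy content stream. This is a simplified illustration under stated assumptions, not how any particular product implements redaction: the function `redact_stream` is hypothetical, it only handles simple `(...) Tj` operators in an uncompressed stream, and it drops the whole text-showing operator rather than only the matching characters.

```python
import re

# Matches a simple literal string followed by the Tj (show text) operator.
TJ_OPERATOR = re.compile(rb"\((?:[^()\\]|\\.)*\)\s*Tj")

def redact_stream(stream: bytes, secret: bytes, box: tuple[int, int, int, int]) -> bytes:
    """Remove text operators containing `secret`, then append a black marker box."""

    def drop_if_secret(match: re.Match) -> bytes:
        # Steps 1 and 2: identify the operator and remove it from the stream.
        return b"" if secret in match.group(0) else match.group(0)

    cleaned = TJ_OPERATOR.sub(drop_if_secret, stream)
    x, y, w, h = box
    # Step 3: draw the visible replacement, isolated with q/Q so the fill
    # color does not affect anything else.
    marker = b"\nq 0 0 0 rg %d %d %d %d re f Q\n" % (x, y, w, h)
    return cleaned + marker

original = (
    b"BT /F1 12 Tf 72 700 Td (Witness: Jane Example) Tj ET\n"
    b"BT /F1 12 Tf 72 680 Td (Status: filed) Tj ET"
)
redacted = redact_stream(original, b"Jane Example", (70, 695, 200, 16))
print(b"Jane Example" in redacted)
```

The important difference from the overlay approach is the order of operations: the content is removed first, and the black box is added only as a marker.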

Tools that perform true redaction

Adobe Acrobat Pro. Tools → Redact. The redaction workflow has two distinct steps:

  1. Mark for redaction. Use the rectangle tool, text selection, or "Search and Redact" to identify content. This marks but does not yet redact.
  2. Apply redactions. A separate "Apply" step that actually removes the content. After applying, the file is permanently modified.

The two-step workflow exists precisely because applying is irreversible.

Foxit PDF Editor. Protect → Redact → Mark for Redaction → Apply Redaction. Same two-step pattern.

PDF-XChange Editor. Similar Mark / Apply workflow.

Browser-based. Docento.app supports true redaction with permanent content removal.

CLI/library options:

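  • pdftk: has no redaction command, but can uncompress content streams so they can be edited by hand; works but requires care
  • pikepdf: Python library that can parse and rewrite content stream tokens
  • MuPDF: its library API can apply redaction annotations and remove the covered content (PyMuPDF exposes this as add_redact_annot and apply_redactions). Note that mutool clean -d only decompresses streams; it does not redact by itself
  • Custom scripts using PDFBox or iText for batch redaction at scale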

For most one-off redactions, a GUI tool is the right choice. For batch redaction across thousands of documents, scripting is the only sane path.

The two-step process matters

The "Mark" and "Apply" separation is not a quirk; it is a safety net.

  • You mark what to redact
  • You review the marks
  • You apply, which is irreversible
  • The redacted file is saved as a new file (do not overwrite the original)

If you mark wrong, you can adjust before applying. After applying, the content is gone forever.

For high-stakes redaction, the workflow should include:

  1. Mark
  2. Save a copy of the marked-but-not-applied version (for audit)
  3. Independent review by a second person
  4. Apply
  5. Verify the apply succeeded by trying to extract redacted text
  6. Save final
  7. Delete intermediate copies if confidentiality requires

Common redaction failures

Black highlight annotation. A black highlighter mark is an annotation; the content underneath is untouched. Tools sometimes mistakenly call this "redaction".

Rectangle filled with black. A drawn shape covers content but does not remove it.

Black text on black background. Setting the text color to black and the background to black hides text visually but leaves it copyable.

Image overlay. A black image placed over content. Selectable text underneath persists.

Hidden via OCG (layers). Marking content as a hidden layer hides it from view but does not remove it from the file.

Cropped out. Reducing the page's visible CropBox to exclude redacted content. The content is still in the page's MediaBox; anyone can change the CropBox to reveal it.

Comment / annotation overlay. Any annotation type that visually covers without modifying the underlying content stream.

All of these are failures. The fix is to use a real redaction tool that modifies the underlying content.
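
The CropBox case is easy to reason about with plain rectangle arithmetic. The sketch below is illustrative only: the box values and the helper names are assumptions, and it works on coordinates you supply rather than reading a real PDF.

```python
Rect = tuple[float, float, float, float]  # (left, bottom, right, top) in points

def intersects(a: Rect, b: Rect) -> bool:
    """True when two rectangles share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def contains(outer: Rect, inner: Rect) -> bool:
    """True when `inner` lies completely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def cropped_but_present(content: Rect, media_box: Rect, crop_box: Rect) -> bool:
    """True when content is on the page (MediaBox) but entirely outside the visible CropBox."""
    return contains(media_box, content) and not intersects(content, crop_box)

media_box = (0, 0, 612, 792)      # full US Letter page
crop_box = (0, 100, 612, 792)     # bottom 100 points cropped away
footer_note = (72, 40, 300, 60)   # hypothetical position of the "hidden" text

print(cropped_but_present(footer_note, media_box, crop_box))
```

Content that satisfies this check is invisible in a normal viewer but still stored in the file, so resetting the CropBox brings it back.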

Beyond visible text

True redaction removes more than visible text:

  • Hidden metadata. Author names, organizational paths, version history. See hidden data in PDFs explained.
  • Annotations. Comments and review marks. Strip on redaction.
  • Form field values. Filled forms may carry data that should be redacted.
  • Attached files. A PDF can attach other files. Strip if redacting.
  • Embedded images. A photo of a witness, a screenshot, or a chart can be as sensitive as text. Image redaction works the same way as text redaction but requires the image to be replaced or removed.
  • Bookmarks and outline entries. Often forgotten; can leak section titles even when body text is removed.
  • JavaScript. PDFs can contain scripts that leak data. Remove during redaction.
  • Old revisions. PDFs can be saved as incremental updates, which append changes and leave earlier revisions in the file; those revisions may contain pre-redaction content. The redaction tool should rewrite the file as a single revision.

Acrobat Pro's "Sanitize Document" runs alongside redaction and handles many of these. See how to anonymize PDF documents for the broader workflow.
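
For the incremental-update item in particular, a rough first check is to count end-of-file markers, since each appended revision normally ends with its own `%%EOF`. The sketch below is a heuristic under that assumption, with a made-up helper name and toy byte strings. More than one marker is a prompt to investigate, not proof: linearized ("fast web view") files and embedded PDF attachments can also contain extra markers.

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count %%EOF markers; each saved revision normally ends with one."""
    return pdf_bytes.count(b"%%EOF")

# Toy stand-ins for file contents; a real check would read the file in binary mode.
single_revision = b"%PDF-1.7\n...objects...\nstartxref\n123\n%%EOF\n"
with_update = single_revision + b"...appended objects...\nstartxref\n456\n%%EOF\n"

print(count_eof_markers(single_revision), count_eof_markers(with_update))
```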

OCR-aware redaction

For scanned PDFs:

  1. Run OCR first (PDF OCR explained)
  2. The OCR adds a text layer over the image
  3. Redaction removes both the image content (the visible page) and the underlying text layer
  4. After redaction, no text layer exists for the redacted region; the image is also blanked

If you redact only the text layer but not the image, the visible scanned text remains. If you redact only the image but not the text layer, search and extraction tools can still find the redacted text. Always redact both.
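
A small model makes the "both layers" rule explicit. The sketch below represents a scanned page as a grid of pixel values plus a list of OCR words with bounding boxes; the `Word` class, the function names, and the tiny 4x4 page are all illustrative assumptions rather than any tool's data model.

```python
from dataclasses import dataclass

Box = tuple[int, int, int, int]  # (x0, y0, x1, y1), end coordinates exclusive

@dataclass
class Word:
    text: str
    box: Box

def overlaps(a: Box, b: Box) -> bool:
    """True when two boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def redact_scanned_page(pixels: list[list[int]], words: list[Word], region: Box):
    """Blank the image inside `region` and drop OCR words that overlap it."""
    x0, y0, x1, y1 = region
    new_pixels = [
        [0 if (x0 <= x < x1 and y0 <= y < y1) else value
         for x, value in enumerate(row)]
        for y, row in enumerate(pixels)
    ]  # image layer: 0 is black
    new_words = [w for w in words if not overlaps(w.box, region)]  # text layer
    return new_pixels, new_words

page_pixels = [[255] * 4 for _ in range(4)]
page_words = [Word("Jane", (0, 0, 2, 1)), Word("paid", (2, 2, 4, 3))]
clean_pixels, clean_words = redact_scanned_page(page_pixels, page_words, (0, 0, 2, 2))
print([w.text for w in clean_words])
```

Skipping either half of `redact_scanned_page` reproduces one of the two failures described above: visible scanned text, or a searchable text layer under a blank region.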

Verification: testing your redaction

After applying redactions:

  1. Copy-paste test. Try to select and copy the redacted region. Paste into a text editor. Confirm nothing comes through.
  2. Extraction test. Run pdftotext on the redacted file. Confirm the redacted text does not appear.
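  3. Hex dump test. Open the PDF in a binary viewer and search for the redacted strings. None should appear. Content streams are usually compressed, so decompress the file first (for example with qpdf --qdf or mutool clean -d); otherwise the search can miss text that is still present.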
  4. Visual test. Open in multiple readers. Confirm the redactions look right in all of them.
  5. Metadata check. Run exiftool or similar to confirm no metadata leaks the redacted content.

The hex-dump test, run against decompressed streams, is the strongest of these checks for confirming that content is truly gone. For the metadata step, see how to strip metadata from PDF.
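
A byte-level search can be scripted with the standard library. The following is a minimal sketch, not a complete PDF parser: `find_leaks` is a hypothetical helper that searches the raw bytes plus any stream that happens to inflate with zlib (the common Flate filter). It will not catch text stored with other filters, hex strings, or font-specific encodings, so treat a clean result as one signal among several.

```python
import re
import zlib

# Heuristic: grab whatever sits between "stream" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def find_leaks(pdf_bytes: bytes, terms: list[bytes]) -> set[bytes]:
    """Return the terms found in the raw bytes or in any zlib-inflatable stream."""
    haystacks = [pdf_bytes]
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            haystacks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate-compressed, or uses another filter; skip it
    return {term for term in terms if any(term in h for h in haystacks)}

# Toy file: one compressed stream that still contains a name.
sample = (
    b"%PDF-1.7\n1 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
    + zlib.compress(b"BT /F1 12 Tf (Witness: Jane Example) Tj ET")
    + b"\nendstream\nendobj\n%%EOF\n"
)
print(find_leaks(sample, [b"Jane Example", b"John Sample"]))
```

In practice you would read the redacted file with `open(path, "rb").read()` and pass every string that was supposed to be removed.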

Batch redaction

For organizations that redact at scale (legal e-discovery, FOIA responses):

  • Adobe Acrobat Pro Action Wizard automates "search → mark → apply" sequences across many documents
  • Commercial e-discovery platforms (Relativity, Disco, Everlaw) have integrated batch redaction
  • Scripted pipelines using PDFBox or iText for repetitive redaction patterns

Always include verification in the batch: automatically test that the redacted patterns no longer appear in the output.
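
One way to wire that verification into a pipeline is sketched below. It assumes the pdftotext command-line tool is installed and on the PATH for real runs; the function names and the file names in the demo are hypothetical, and the demo swaps in a stub extractor so it runs without any external tool.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable

def pdftotext_extract(path: Path) -> str:
    """Extract text by running `pdftotext <file> -` (writes the text to stdout)."""
    result = subprocess.run(
        ["pdftotext", str(path), "-"], capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")

def verify_batch(
    paths: Iterable[Path],
    forbidden: list[str],
    extract: Callable[[Path], str] = pdftotext_extract,
) -> dict[Path, list[str]]:
    """Map each file that still contains a forbidden term to the terms found."""
    failures: dict[Path, list[str]] = {}
    for path in paths:
        text = extract(path).casefold()
        hits = [term for term in forbidden if term.casefold() in text]
        if hits:
            failures[path] = hits
    return failures

# Demo with a stub extractor standing in for pdftotext.
fake_text = {
    Path("a_redacted.pdf"): "Payment approved.",
    Path("b_redacted.pdf"): "Signed by JANE EXAMPLE",
}
print(verify_batch(fake_text.keys(), ["Jane Example"], extract=lambda p: fake_text[p]))
```

A non-empty result should fail the batch. Text extraction is only one of the checks listed under verification, so a pipeline would ideally combine it with a byte-level search.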

Common gotchas

Black bars look like redaction but are not. Just because it looks black does not mean it is removed. Verify.

Same color blocks. Drawing a white rectangle on a white background "hides" content. Anyone can move the rectangle and reveal it.

Inconsistent redaction. Redacting one instance of a name but missing another. Use Search and Redact to find all instances.

Margin notes. A name redacted in body text may remain in a margin annotation. Strip annotations during redaction.

File metadata. The PDF's Title or Author fields may carry the redacted name. Check and clear metadata.

Bookmarks. The bookmark "Section 3: Discussion of Customer X" leaks the customer name even if the body is redacted. Edit bookmarks.

Other version exposure. A redacted PDF that is emailed and then re-edited may carry the original content as an incremental update. Use "Save As" rather than "Save" so the file is rewritten as a single revision.

Email message. Sending a redacted PDF in an email that says "Redacted X" in the subject line leaks the name. Sanitize the surrounding context too.
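
The inconsistent-redaction gotcha is largely a search problem: the same name can appear in different cases or be split across a line break. As a rough sketch of what a tolerant search looks like (the helper `name_pattern` and the sample text are assumptions, and this is not how any specific Search and Redact feature works):

```python
import re

def name_pattern(name: str) -> re.Pattern:
    """Match a name regardless of case, with any whitespace between its words."""
    parts = [re.escape(part) for part in name.split()]
    return re.compile(r"\s+".join(parts), re.IGNORECASE)

extracted_text = (
    "Payment to Jane Example.\n"
    "Approved by JANE\nEXAMPLE on behalf of jane  example."
)
spans = [m.span() for m in name_pattern("Jane Example").finditer(extracted_text)]
print(len(spans))
```

This still misses abbreviations, initials, and misspellings, so a human review of the marks remains part of the workflow.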

Legal and compliance context

Failed redaction has real consequences:

  • Privacy regulations. GDPR, HIPAA, CCPA, and similar regimes can impose penalties for unintentional disclosure.
  • Court rules. Many jurisdictions have specific rules about how documents are redacted for court filing.
  • Industry standards. SOC 2, ISO 27001, and similar compliance frameworks audit redaction processes.

If redaction is part of a regulated workflow, document the process, train users, and audit periodically. See HIPAA-compliant PDF handling and GDPR and PDF documents for specific compliance angles.

Practical recipe

For a critical document:

  1. Make a working copy of the original (do not edit the master)
  2. Open in Acrobat Pro / Foxit / Docento.app
  3. Use the Redact tool (not the highlight, rectangle, or text tool)
  4. Mark all sensitive content (use Search and Redact for repeated terms)
  5. Run Sanitize Document to clean metadata and hidden data
  6. Apply redactions (the irreversible step)
  7. Save as a new file
  8. Run verification: copy-paste, pdftotext, hex dump
  9. Open in a different reader to spot-check
  10. Distribute the verified redacted file

Takeaway

Redaction is not drawing a black rectangle. It is permanent content removal, performed by a dedicated tool that distinguishes "mark" from "apply". Always verify the result by trying to extract redacted content; the only safe redacted PDF is one that fails every extraction attempt. For high-stakes documents, separate the marking and applying steps with independent review and document the process for audit. For browser-based redaction alongside other security operations, Docento.app handles true content removal without installing tooling. For the broader context, see hidden data in PDFs explained and how to anonymize PDF documents.