Tagged PDF vs Untagged PDF: What the Difference Actually Means

April 20, 2026·7 min read

Open two PDFs side by side that look identical on screen. One has been written for visual layout only. The other carries an invisible scaffolding describing its structure: what is a heading, what is a paragraph, what is a list, what is a figure, and what the reading order is. The first is untagged. The second is tagged. You only notice the difference when you need to do something other than look at the file: searching, copying, screen-reading, converting, or reflowing on a small screen. Then the gap is enormous.

This article unpacks what tagging actually is, what it changes in practice, and how to tell which kind of PDF you have.

The two layers of every PDF

A PDF can carry two parallel content models:

  1. Content stream. A list of drawing commands: "draw this glyph at this coordinate", "show this image here", "fill this rectangle that color". This is the visual layer. It is what the renderer paints to the screen.
  2. Structure tree (tags). A hierarchical model of meaning: "this is a Document. Inside is a Heading 1. Below it is a Paragraph. Below that is a List with three List Items. Each List Item contains a Paragraph. Below the list is a Figure with alt text."

An untagged PDF has only the content stream. A tagged PDF has both, with the structure tree pointing back into the content stream so each tag knows which drawing commands belong to it.

You can think of tags as the "DOM" of a PDF. The analogy is not perfect, but it captures the shape.
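To make the two layers concrete, here is a minimal Python sketch. It is a toy model, not a PDF parser: the names CONTENT_STREAM, StructElem, and reading_order are invented for illustration, and the integer IDs stand in for the marked-content identifiers that a real tagged PDF uses to link each tag to its drawing commands.

```python
from dataclasses import dataclass, field

# Visual layer: drawing commands in paint order. The sidebar happens to be
# painted before the body paragraph. All values are made up for illustration.
CONTENT_STREAM = [
    {"mcid": 0, "x": 72, "y": 720, "text": "Quarterly Report"},
    {"mcid": 2, "x": 320, "y": 680, "text": "Sidebar: key figures."},
    {"mcid": 1, "x": 72, "y": 680, "text": "Revenue grew this quarter."},
]


@dataclass
class StructElem:
    """One node of the (hypothetical, simplified) structure tree."""
    tag: str
    mcids: list[int] = field(default_factory=list)   # content this tag owns
    kids: list["StructElem"] = field(default_factory=list)


# Meaning layer: a hierarchy that points back into the content stream.
STRUCTURE_TREE = StructElem("Document", kids=[
    StructElem("H1", mcids=[0]),
    StructElem("P", mcids=[1]),
    StructElem("P", mcids=[2]),
])


def reading_order(root, stream):
    """Walk the tree depth-first and return (tag, text) pairs."""
    text_by_mcid = {op["mcid"]: op["text"] for op in stream}
    result = []

    def walk(node):
        for mcid in node.mcids:
            result.append((node.tag, text_by_mcid[mcid]))
        for kid in node.kids:
            walk(kid)

    walk(root)
    return result


for tag, text in reading_order(STRUCTURE_TREE, CONTENT_STREAM):
    print(f"{tag}: {text}")
```

The paint order and the reading order differ here on purpose: the tree, not the drawing sequence, decides that the body paragraph comes before the sidebar.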

What tags enable

A reader application that knows how to use the structure tree can do things that an untagged file makes impossible or unreliable:

  • Screen reading. Walk the structure tree top to bottom, announcing tags in order. Headings can be navigated, lists can be skipped, figures get their alt text read. See our PDF accessibility guide.
  • Reflow. Display the document in a single column at any width, which is useful on phones and tablets and at high accessibility zoom levels. Without tags, reflow has to guess from layout and produces garbled results.
  • Reliable copy and paste. A multi-column document copies as continuous reading-order text instead of "first half of left column, first half of right column, second half of left column", which is what untagged copy-paste produces.
  • Conversion to other formats. Going from PDF to Word, HTML, EPUB, or Markdown is dramatically more accurate when tags exist, because the converter has structure to work from, not just a bag of glyphs.
  • Search and indexing. Headings and metadata feed into search systems and content extractors. Compliance reviewers can audit by section instead of by page.
  • Form interaction. Tagged form fields can have programmatic labels distinct from visible labels, supporting both accessibility and automated form filling.
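As a rough illustration of the conversion point in the list above, the Python sketch below turns a sequence of (tag, text) pairs, as a converter might read them from a structure tree, into Markdown. The function name to_markdown and the small prefix table are assumptions made for this example; a real converter handles far more tag types, nesting, and edge cases.

```python
# Hypothetical mapping from a few standard structure tags to Markdown prefixes.
MARKDOWN_PREFIX = {"H1": "# ", "H2": "## ", "H3": "### ", "LI": "- "}


def to_markdown(elements):
    """Convert (tag, text) pairs, given in reading order, to Markdown text.

    Tags without an entry in MARKDOWN_PREFIX (for example "P") are
    emitted as plain paragraphs.
    """
    blocks = [MARKDOWN_PREFIX.get(tag, "") + text for tag, text in elements]
    return "\n\n".join(blocks)


tagged = [
    ("H1", "Quarterly Report"),
    ("P", "Revenue grew this quarter."),
    ("LI", "Costs held steady."),
]
print(to_markdown(tagged))
```

With an untagged file there are no tag names to look up, so a converter has to infer "heading" or "list item" from font size and position instead.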

What an untagged PDF feels like

Untagged PDFs are not broken; they just have ceilings.

  • A two-column scientific paper, untagged, reads incorrectly with a screen reader: it interleaves the columns row by row.
  • Trying to extract text reliably from an untagged invoice requires OCR-like heuristics even when the text is already embedded.
  • Reflow on a phone shows fragments out of order, because the renderer has nothing better to do than walk the layout in geometric order.
  • Conversion to Word produces text boxes scattered across the page instead of flowing paragraphs.
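The column-interleaving problem in the list above is easy to reproduce. The Python sketch below is a simplified model, not the algorithm of any particular reader: FRAGMENTS and geometric_order are hypothetical names, and the coordinates are invented. It sorts text fragments top to bottom and then left to right, which is roughly what a tool falls back on when no structure tree exists.

```python
# Each fragment is (x, y, text). As in PDF user space, y grows upward,
# so text lower on the page has a smaller y value.
FRAGMENTS = [
    (72, 700, "Left column, line 1."),
    (72, 686, "Left column, line 2."),
    (320, 700, "Right column, line 1."),
    (320, 686, "Right column, line 2."),
]


def geometric_order(fragments):
    """Order fragments top-to-bottom, then left-to-right."""
    ordered = sorted(fragments, key=lambda f: (-f[1], f[0]))
    return [text for _x, _y, text in ordered]


for line in geometric_order(FRAGMENTS):
    print(line)
```

The output alternates between the left and right columns line by line. A tagged file avoids this because the tags record that the whole left column comes before the right one.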

For documents that will only ever be printed and pinned to a wall, untagged is fine. For anything that has to live a digital life, untagged is a recurring tax.

How to tell whether a PDF is tagged

Several quick ways:

  • Adobe Acrobat / Reader. Open the file and go to File → Properties. On the Description tab there is a row labeled "Tagged PDF" that says Yes or No.
  • PAC 2026 (PDF Accessibility Checker). Drop the file in, open the report. The first thing it reports is whether tags exist.
  • Look at the Tags panel. In Acrobat, View → Show/Hide → Navigation Panes → Tags. A tagged PDF shows a populated tree. An untagged PDF shows "No Tags available".
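  • Programmatic check. A library like pikepdf or iText can inspect the document's catalog for a /StructTreeRoot entry. If it is present, the file has a structure tree. A /MarkInfo dictionary with /Marked set to true is the file's own declaration that it follows tagged-PDF conventions.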
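A minimal sketch of the programmatic check in Python follows. The helper tagging_status is a made-up name that works on any dict-like catalog, so it can be tried on plain dictionaries; check_file assumes the third-party pikepdf library is installed and relies on pikepdf.open and the pdf.Root catalog object behaving like a mapping. Treat it as a starting point rather than a validator: it only reports whether the entries exist, not whether the tags are any good.

```python
def tagging_status(catalog):
    """Report tagging-related entries in a PDF document catalog.

    `catalog` is any dict-like object keyed by PDF names such as
    "/StructTreeRoot". A plain dict works for experiments.
    """
    has_tree = "/StructTreeRoot" in catalog
    mark_info = catalog.get("/MarkInfo")
    marked = bool(mark_info.get("/Marked", False)) if mark_info is not None else False
    return {"has_struct_tree": has_tree, "marked": marked}


def check_file(path):
    """Inspect a real file. Requires pikepdf (pip install pikepdf)."""
    import pikepdf  # imported lazily so the helper above has no dependencies

    with pikepdf.open(path) as pdf:
        return tagging_status(pdf.Root)


# Try the helper on two hand-written catalogs.
print(tagging_status({"/Type": "/Catalog"}))
print(tagging_status({"/StructTreeRoot": {}, "/MarkInfo": {"/Marked": True}}))
```

For a real document, call check_file("report.pdf") with your own path; the file name here is only a placeholder.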

How a PDF becomes tagged

Tagging is best done at authoring time. The author tool knows what is a heading and what is a paragraph; trying to recover this from a rendered PDF is hard. Tagging-aware authoring tools:

  • Microsoft Word. "Save as PDF" with "Document structure tags for accessibility" enabled (default in modern versions, but worth confirming).
  • LibreOffice Writer. "Tagged PDF" option in the Export As PDF dialog.
  • Adobe InDesign. Articles panel and Tag Order panel control export tags. Older InDesign files often have to be revised to produce well-tagged PDFs.
  • Google Docs. "Use accessibility settings during export" in the PDF export dialog.
  • LaTeX. The tagpdf package and the LaTeX Project's ongoing Tagged PDF work make tagged output workable. Not yet flawless.
  • Apple Pages. Tagged PDFs are the default in recent versions.

If you only have an untagged PDF and the source is gone, you can attempt to tag it after the fact:

  • Adobe Acrobat Pro. "Autotag Document" produces a draft tag tree that always needs human cleanup, especially around tables, columns, and figures.
  • Equidox, CommonLook, axesPDF. Commercial remediation tools that streamline the cleanup.
  • Open source. pikepdf plus careful scripting can patch tags onto a PDF if you know the document structure, but this is engineering work, not a tool you click.

Common pitfalls when tagging

  • Headings styled but not tagged. Bold + large font is not a heading. The PDF needs an actual H1, H2, H3 element.
  • Tables built from text boxes. A grid of independently-placed text frames is not a table to a screen reader. Use the table tools in your authoring app.
  • Figures with no alt text. A figure tagged as <Figure> but with no alternative description is announced as "Figure" and nothing else, which in some contexts is worse than no tag at all.
  • Untagged signatures or stamps. A signature image plonked onto a tagged document needs to be either tagged with alt text ("Signature of J. Doe, 17 May 2026") or marked as an artifact if purely decorative. See how to create an electronic signature for the workflow.
  • Tagged but with the wrong reading order. The visual page can show a callout box on the right, but the tag tree might place the callout between two paragraphs of the main column, breaking the flow.
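Two of the pitfalls above can be caught mechanically once the tag tree is available. The Python sketch below assumes the tree has already been extracted into nested dictionaries with "tag", "alt", and "kids" keys. That shape, and the names walk and lint_tags, are invented for this example. It cannot judge reading order or table quality, which still need a human or a dedicated checker.

```python
HEADING_TAGS = {"H", "H1", "H2", "H3", "H4", "H5", "H6"}


def walk(node):
    """Yield a node and all of its descendants, depth-first."""
    yield node
    for kid in node.get("kids", []):
        yield from walk(kid)


def lint_tags(root):
    """Return a list of human-readable problems found in a tag tree."""
    problems = []
    nodes = list(walk(root))
    # Styled-but-untagged headings leave the tree with no heading tags at all.
    if not any(n["tag"] in HEADING_TAGS for n in nodes):
        problems.append("no heading tags found")
    # A Figure with empty or missing alt text is announced as just "Figure".
    for n in nodes:
        if n["tag"] == "Figure" and not (n.get("alt") or "").strip():
            problems.append("Figure without alt text")
    return problems


doc = {"tag": "Document", "kids": [
    {"tag": "P"},
    {"tag": "Figure", "alt": ""},
]}
print(lint_tags(doc))
```

Checks like these are a first pass before publishing, not a substitute for a full validation run.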

For a deeper look at the structural standards involved, see PDF/UA explained and accessibility tags in PDF.

Does every PDF need to be tagged?

In practice, increasingly yes. If you publish documents to the public, the safest assumption is that someone will need them tagged. Even when accessibility is not a legal requirement for you, tagged PDFs convert better, search better, and reflow better.

Internal documents that will only ever be printed and never re-used can stay untagged without consequence. But the marginal cost of tagging at authoring time is small, usually a single dialog option, so there is rarely a reason not to.

Takeaway

Tags are the invisible structure that turns a PDF from a picture of a document into an actual document. An untagged PDF is fine to look at and bad at everything else. A tagged PDF preserves what authors meant (headings, lists, tables, figures, reading order) and unlocks accessibility, accurate conversion, reflow, and clean text extraction. Author with tagging enabled, validate before publishing, and the file works for everyone, including the version of yourself who needs to re-edit or re-extract it three years from now. Tools like Docento.app preserve existing tags when you make changes, so you can edit a properly-authored PDF without quietly breaking its structure.
