Docento.app

DjVu vs PDF: The Format Behind Scanned Books and Why You Might Still Care

April 22, 2026·6 min read

If you have ever downloaded an old scanned book from a digital library and the file extension was .djvu instead of .pdf, you have met DjVu. DjVu (pronounced "déjà vu") was designed in the late 1990s specifically for scanned documents such as newspapers, books, and technical manuals, and for many years it produced significantly smaller files than PDF for the same scanned content. PDF eventually caught up, but DjVu still has a niche, and there is a good chance you will encounter it. This article explains what it is and how it compares.

What DjVu is

DjVu was invented at AT&T Research in 1996 and released as an open file format in 2002. Its key trick is layered compression of scanned page images:

  • Foreground layer, the colors of the text and line art, stored at very low resolution (typically around 25 DPI) with the wavelet-based IW44 scheme, or as a small color palette.
  • Background layer, paper texture, colors, and any photographs, stored at lower resolution (typically 100 DPI) with IW44.
  • Mask, a bilevel image at full scan resolution (typically 300 DPI) that records the exact shapes of text and line art and tells the renderer which pixels take the foreground color. It is compressed with JB2, which stores repeated character shapes only once.
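
To make the mask's role concrete, here is a minimal sketch of how a decoder could composite the layers of one page. It is an illustration under assumed conditions (in-memory pixel lists, integer scale factors, nearest-neighbour upsampling), not DjVuLibre's actual decoding code, and the function names are hypothetical.

```python
# A sketch of DjVu-style layer compositing (illustrative, not DjVuLibre's code).
# Layers are lists of rows; colors are (r, g, b) tuples; mask bits are 0 or 1.
# Assumption: foreground and background are stored at lower resolution and are
# upsampled to the mask's resolution by integer factors (nearest neighbour).
Color = tuple[int, int, int]


def upsample(layer: list[list[Color]], factor: int, width: int, height: int) -> list[list[Color]]:
    """Repeat each stored pixel `factor` times in both directions, cropped to width x height."""
    return [[layer[y // factor][x // factor] for x in range(width)] for y in range(height)]


def composite(mask: list[list[int]], foreground: list[list[Color]], fg_factor: int,
              background: list[list[Color]], bg_factor: int) -> list[list[Color]]:
    """Paint the foreground color where the mask bit is set, the background elsewhere."""
    height, width = len(mask), len(mask[0])
    fg = upsample(foreground, fg_factor, width, height)
    bg = upsample(background, bg_factor, width, height)
    return [[fg[y][x] if mask[y][x] else bg[y][x] for x in range(width)]
            for y in range(height)]


# A 3x3 page: red "ink" on two corners over white paper, with both color
# layers stored at one third of the mask's resolution (a single pixel each).
mask = [[1, 0, 0],
        [0, 0, 0],
        [0, 0, 1]]
page = composite(mask, [[(255, 0, 0)]], 3, [[(255, 255, 255)]], 3)
print(page[0][0], page[1][1])  # (255, 0, 0) (255, 255, 255)
```

Because only the mask is stored at full resolution, the edges of the text stay as sharp as the scan, while the color layers can be tiny.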

The renderer composites the final page from these layers: wherever the mask is set, it paints the foreground color, and everywhere else it shows the background. Because the human visual system is far more sensitive to crisp edges in text than to absolute fidelity in background tones, this lets an encoder shrink a scanned page to a tiny fraction of its raw size while keeping the text sharp.
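
Some of that saving comes from the layering alone, before any compression runs. The worked arithmetic below assumes a US Letter page and the typical layer resolutions described above (bilevel mask at 300 DPI, color background at 100 DPI, text colors at 25 DPI); real files are far smaller still once JB2 and IW44 compress each layer.

```python
import math


def raw_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of a page image at the given resolution."""
    width_px = math.ceil(width_in * dpi)
    height_px = math.ceil(height_in * dpi)
    return math.ceil(width_px * height_px * bits_per_pixel / 8)


full = raw_bytes(8.5, 11, 300, 24)        # 24-bit color scan at 300 DPI
mask = raw_bytes(8.5, 11, 300, 1)         # bilevel mask at full resolution
background = raw_bytes(8.5, 11, 100, 24)  # color background at 100 DPI
foreground = raw_bytes(8.5, 11, 25, 24)   # text colors at 25 DPI (assumed)
layered = mask + background + foreground
print(f"{full:,} vs {layered:,} bytes ({layered / full:.1%})")
# → 25,245,000 vs 4,032,600 bytes (16.0%)
```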

For text-heavy scanned pages, a well-encoded DjVu file was often 5-10× smaller than a PDF of the time that stored the same scans, at the same DPI, as conventional page images. That advantage was decisive when bandwidth and storage were tight in the early 2000s.

How PDF caught up

PDF gradually closed the gap with a few additions:

  • JBIG2 compression (PDF 1.4+), bilevel image compression that uses pattern matching to compress repeated character glyphs. This is essentially the same idea as DjVu's JB2, applied to PDF.
  • JPEG 2000 (PDF 1.5+), wavelet-based image compression that competes with DjVu's IW44.
  • Mixed Raster Content (MRC), a layering technique that PDF producers implement by drawing a high-resolution bilevel text mask over a low-resolution background image, mirroring DjVu's foreground/background split.
  • Better-tuned production pipelines. Tools like ABBYY FineReader, IRIS Readiris, and OCRmyPDF produce small, searchable PDFs by applying JBIG2 carefully and, in some tools, MRC.
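
The pattern-matching idea behind JBIG2 and JB2 is easy to see in miniature: store each distinct glyph shape once and encode every occurrence as a reference to it. The sketch below shows only that dictionary-building step and assumes exact bitmap matches; real encoders also match near-identical shapes and record each glyph's position on the page. The names are hypothetical.

```python
# Simplified symbol-dictionary coding with exact matches only.
Glyph = tuple[tuple[int, ...], ...]  # a small bilevel bitmap: rows of 0/1


def build_dictionary(glyphs: list[Glyph]) -> tuple[list[Glyph], list[int]]:
    """Return (unique shapes, index of each input glyph within those shapes)."""
    index_of: dict[Glyph, int] = {}
    shapes: list[Glyph] = []
    refs: list[int] = []
    for glyph in glyphs:
        if glyph not in index_of:
            index_of[glyph] = len(shapes)  # first time this shape appears
            shapes.append(glyph)
        refs.append(index_of[glyph])       # later occurrences cost one index
    return shapes, refs


e = ((0, 1, 0), (1, 1, 1), (1, 0, 0))  # toy 3x3 bitmaps standing in for glyphs
o = ((0, 1, 0), (1, 0, 1), (0, 1, 0))
shapes, refs = build_dictionary([e, o, e, e])
print(len(shapes), refs)  # 2 [0, 1, 0, 0]
```

On a real page of text, the same few dozen letter shapes repeat thousands of times, which is why this approach compresses text so well.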

A modern OCR pipeline can often produce a searchable, MRC-encoded PDF whose size is within 10-20% of an equivalent DjVu file, close enough that DjVu's compression advantage no longer justifies a separate format. See PDF OCR explained and how to make a PDF searchable with OCR.

Where DjVu still wins

A handful of cases:

  • Massive historical scan collections. Many digital libraries and archives, including the Internet Archive, Wikisource, and certain national libraries, hold very large numbers of DjVu files. Replacing them with PDF means re-encoding the entire collection, and often re-running OCR.
  • Very tight bandwidth. For audiences on extremely slow networks, DjVu's smaller files still load faster.
  • Russian and Eastern European academic publishing. DjVu has unusually strong adoption in certain regional academic ecosystems for reasons of historical tooling.
  • OCR plus deep zoom. Because the text mask is kept at full scan resolution and pages decode progressively, DjVu viewers handle "zoom into a tiny detail" workflows well, which archival researchers value.

Where PDF wins

Almost everywhere else:

  • Universal reader support. Virtually every operating system and web browser opens PDF without extra software. DjVu requires a dedicated viewer (DjView, SumatraPDF, Okular, and a few others) or a browser extension.
  • Active standardization. PDF has PDF/A, PDF/UA, PDF/X, PDF 2.0, and so on. DjVu has not seen substantial spec evolution in over a decade.
  • Editing and annotation. PDF editing is well-supported; DjVu editing is rare and fragile.
  • Mobile. PDF viewing is built into iOS and most Android devices. DjVu readers exist but are rare and often unmaintained.
  • Toolchain. PDF tools are abundant: Acrobat, Ghostscript, qpdf, MuPDF, PDFBox, iText, pikepdf, and dozens more. DjVu tools are essentially DjVuLibre and a few derivatives.

What to do when you encounter a DjVu file

The pragmatic answer in 2026: convert it to PDF and continue.

Options:

  • DjVuLibre's ddjvu tool (ddjvu -format=pdf input.djvu output.pdf), or the separate djvu2pdf script, which wraps DjVuLibre. Free, scriptable, and the standard CLI conversion path.
  • Online converters. Several free services accept DjVu and emit PDF. Be cautious with sensitive content; see Are online PDF editors safe?
  • DjView or Okular. Open the DjVu file, choose File → Print, and select "Save as PDF" or another print-to-PDF printer.
  • Bulk conversion. A simple shell loop calling ddjvu (or djvu2pdf) over a directory can convert a whole archive unattended.
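
The bulk-conversion loop can be written in any scripting language. Below is a Python sketch, assuming DjVuLibre's ddjvu is on your PATH and that writing each PDF next to its source file is acceptable; the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def ddjvu_command(src: Path, dst: Path) -> list[str]:
    """Build a DjVuLibre ddjvu call that renders every page of src into a PDF."""
    return ["ddjvu", "-format=pdf", str(src), str(dst)]


def convert_directory(folder: Path, dry_run: bool = False) -> list[list[str]]:
    """Convert every *.djvu file in folder to a .pdf alongside it.

    Returns the commands it ran (or would run, when dry_run is True).
    """
    commands = []
    for src in sorted(folder.glob("*.djvu")):
        dst = src.with_suffix(".pdf")
        if dst.exists():
            continue  # skip files converted on an earlier run
        cmd = ddjvu_command(src, dst)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise if ddjvu reports an error
    return commands
```

Calling convert_directory(Path("scans"), dry_run=True) first lists what would be converted without touching any files. Check the text layer of the results afterward, since a plain ddjvu conversion may not carry the hidden OCR text over.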

After conversion, you can reduce file size, annotate, sign, or convert to other formats using normal PDF tools, including in-browser editing via Docento.app.

What if you specifically need to produce DjVu?

It is a rare requirement, but it happens: for example, when you contribute to an archive that mandates the format. The standard production toolchain:

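  • DjVuLibre's command-line encoders: cjb2 encodes black-and-white page images (PBM or TIFF) with JB2, c44 encodes grayscale or color images (PNM or JPEG) with IW44, and djvm bundles the resulting single-page files into one multi-page document.
  • csepdjvu builds DjVu pages from pre-separated input, in which the bilevel text layer and the background image have already been split apart by another tool.
  • ScanTailor + DjVu Converter, a popular workflow for scanned books.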
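
A page-by-page DjVuLibre pipeline can be sketched as a list of commands. This assumes cjb2, c44, and djvm are installed, and it chooses the encoder by file extension, which is a simplification: cjb2 expects genuinely black-and-white input, so a color TIFF would need converting first. All function and variable names are hypothetical.

```python
import subprocess
from pathlib import Path

BILEVEL = {".pbm", ".tif", ".tiff"}                # assumed black-and-white scans -> cjb2
PHOTO = {".pgm", ".ppm", ".pnm", ".jpg", ".jpeg"}  # grayscale/color images -> c44


def page_command(image: Path, dpi: int = 300) -> tuple[list[str], Path]:
    """Choose an encoder by file type and return (command, single-page output)."""
    out = image.with_suffix(".djvu")
    suffix = image.suffix.lower()
    if suffix in BILEVEL:
        return ["cjb2", "-dpi", str(dpi), str(image), str(out)], out
    if suffix in PHOTO:
        return ["c44", "-dpi", str(dpi), str(image), str(out)], out
    raise ValueError(f"no encoder chosen for {image}")


def book_commands(images: list[Path], book: Path) -> list[list[str]]:
    """Encode each page, then bundle the pages in order with djvm -c."""
    commands, pages = [], []
    for image in images:
        cmd, out = page_command(image)
        commands.append(cmd)
        pages.append(str(out))
    commands.append(["djvm", "-c", str(book), *pages])
    return commands


def run(commands: list[list[str]]) -> None:
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop at the first failing step
```

For example, run(book_commands([Path("p001.tif"), Path("p002.jpg")], Path("book.djvu"))) would encode two pages and bundle them into book.djvu.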

For most modern scanning workflows, producing a searchable PDF with OCR and then converting to DjVu if necessary is more practical than producing DjVu natively. The PDF version is more useful day to day, and the DjVu archive copy can be regenerated.

A side-by-side comparison

| | DjVu | PDF |
|--|--|--|
| Original purpose | Compressing scanned documents | Fixed-layout documents |
| Year introduced | 1996 (open format 2002) | 1993 |
| Standardization | Not an ISO standard | ISO 32000 family |
| Compression for scans | Excellent (MRC + JB2 + IW44) | Good with JBIG2 / JPEG 2000 / MRC |
| Reader support | Niche (DjView, Okular, SumatraPDF) | Universal |
| Editing tools | Few | Many |
| Accessibility | Limited | PDF/UA |
| Mobile | Rare | Universal |
| Web embedding | Minimal | Native browser support |
| Best use case | Massive scanned book archives | Almost everything |

Common gotchas

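OCR layer. DjVu files can contain a hidden OCR text layer, just as PDFs can. When converting, make sure your converter preserves the text layer; without it, the resulting PDF is image-only and not searchable. Simple conversion paths, such as printing to PDF, may drop the text, in which case you can re-run OCR on the PDF (for example, with OCRmyPDF).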
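
One quick check is to compare the amount of text in the two files. The sketch below assumes DjVuLibre's djvutxt and Poppler's pdftotext are installed; the 90% word-count threshold is an arbitrary, hypothetical cutoff, and the function names are illustrative.

```python
import subprocess


def extract_text(path: str) -> str:
    """Dump the hidden text layer with djvutxt (DjVu) or pdftotext (PDF)."""
    if path.lower().endswith(".djvu"):
        cmd = ["djvutxt", path]           # prints the text layer to stdout
    else:
        cmd = ["pdftotext", path, "-"]    # "-" sends the text to stdout
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def text_layer_survived(djvu_text: str, pdf_text: str, min_ratio: float = 0.9) -> bool:
    """Heuristic: the PDF should contain most of the words the DjVu file had."""
    source_words = len(djvu_text.split())
    if source_words == 0:
        return True  # the DjVu file had no text layer to lose
    return len(pdf_text.split()) >= min_ratio * source_words
```

For a converted pair, text_layer_survived(extract_text("book.djvu"), extract_text("book.pdf")) returning False is a sign the PDF needs OCR.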

Bookmarks and metadata. DjVu's outline and metadata model is simpler than PDF's. Conversion sometimes drops nested bookmarks or non-standard metadata.

Page numbering. Scanned books often have an offset between the page label ("page 1" of the text) and the file's physical page index. Preserve that offset by adding PDF page labels after conversion, so that going to "page 1" in a reader lands on the first page of the text.
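
PDF stores page labels as a /PageLabels number tree: each entry maps the physical page index where a numbering range starts to a style (/S) and an optional starting number (/St). A book with 12 front-matter pages might use /Nums [0 << /S /r >> 12 << /S /D >>]. The sketch below only shows how a viewer resolves a label from such ranges; writing them into the file is left to your PDF library, and the function names are hypothetical.

```python
def roman(n: int) -> str:
    """Lowercase Roman numeral for 1 <= n < 4000."""
    values = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
              (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
              (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, numeral in values:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


# Each range mirrors one /PageLabels entry: (first physical index, style, start).
# Styles: "D" decimal, "r"/"R" lower/upper Roman; letter styles are omitted here.
def page_label(index: int, ranges: list[tuple[int, str, int]]) -> str:
    start, style, first = max((r for r in ranges if r[0] <= index), key=lambda r: r[0])
    number = first + (index - start)
    if style == "D":
        return str(number)
    if style == "r":
        return roman(number)
    if style == "R":
        return roman(number).upper()
    raise ValueError(f"unsupported style {style!r}")


ranges = [(0, "r", 1), (12, "D", 1)]  # i-xii, then 1, 2, 3, ...
print(page_label(11, ranges), page_label(12, ranges), page_label(40, ranges))  # xii 1 29
```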

Color profiles. DjVu does not carry color management metadata the way PDF/X does. If color is critical, you need to add an ICC profile in the converted PDF.

Takeaway

DjVu solved a real problem of its era, compressing scanned documents to a fraction of PDF's size, and an entire generation of digital libraries still relies on it. PDF caught up technically and won everywhere except those legacy archives. In day-to-day work in 2026, if you encounter a DjVu file, convert it to PDF, preserving the OCR text layer, and continue with normal tooling. If you need to produce DjVu specifically for archive submission, start from a high-quality PDF master and convert that. For everyday viewing, editing, and signing, Docento.app and the broader PDF ecosystem are more pleasant places to work than DjVu's small, aging toolchain.
