A scanned PDF is just an image of a document. Each page is a picture; the words you see are pixels, not text. You cannot select, copy, search, or edit them in any normal way. To do any of that, you have to recover the underlying text, which means OCR. This guide walks through editing a scanned PDF end to end: OCR first, then the actual edits, then the cleanup.
What "scanned PDF" actually is
A scanned PDF page contains:
- A single full-page image (typically stored with JPEG, CCITT fax, JBIG2, or Flate compression)
- No selectable text
- No structure tree (paragraphs, headings, lists)
- No form fields
When you click on a word, your cursor lands on pixels. There is no text under it.
This is different from a "native" PDF, which contains real text characters drawn at specific positions. A native PDF lets you select, copy, and edit. A scanned PDF does not, until you OCR it.
Step 1: Identify whether you actually have a scanned PDF
Quick checks:
- Try to select text. If dragging draws a selection rectangle over the image instead of highlighting individual words, it is scanned.
- Press Ctrl+F and search for a word you can see on the page. If search returns nothing, there is no text layer.
- Check the file's PDF version and producer: in Acrobat or Reader, open File → Properties (Ctrl+D). Scanners often produce PDF 1.4 or 1.5 with a producer name like "ScanSnap" or "Brother MFC".
Sometimes a PDF has some text and some scanned pages, common when documents have been assembled from multiple sources. Treat each scanned section separately.
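If you need to check many files at once, you can approximate the checks above in a script. The sketch below is a rough, stdlib-only heuristic, not a standard test. It reads the raw bytes, pulls out the header version and the /Producer string, and treats "image objects but no /Font resources" as a sign of a scan. It can be fooled when objects are packed into compressed object streams (allowed since PDF 1.5), so treat the answer as a hint. A file that has already been OCR'd contains font resources for its invisible text, so it will correctly read as not scanned.

```python
import re
import sys

def inspect_pdf_bytes(data: bytes) -> dict:
    """Heuristic summary of raw PDF bytes (not a real PDF parser)."""
    version = re.match(rb"%PDF-(\d\.\d)", data)
    producer = re.search(rb"/Producer\s*\((.*?)\)", data, re.DOTALL)
    return {
        "version": version.group(1).decode() if version else None,
        "producer": producer.group(1).decode("latin-1") if producer else None,
        # Any font resource suggests real (or OCR'd) text somewhere in the file.
        "has_fonts": b"/Font" in data,
        # Image XObjects, written with or without a space after /Subtype.
        "has_images": re.search(rb"/Subtype\s*/Image", data) is not None,
    }

def looks_scanned(info: dict) -> bool:
    """Images but no fonts: probably a scan with no text layer."""
    return info["has_images"] and not info["has_fonts"]

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        info = inspect_pdf_bytes(f.read())
    print(info)
    print("probably scanned" if looks_scanned(info) else "has text, or undetermined")
```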
Step 2: Run OCR to add a text layer
OCR (Optical Character Recognition) analyzes the scanned image and produces text matching what it sees. The text is added as an invisible layer on top of the image, positioned to line up with the words it was read from. The visual appearance is unchanged, but the document is now searchable, selectable, and (sort of) editable.
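To make the "invisible layer" idea concrete, here is roughly what an OCR'd page's content stream looks like, built as a string in Python. It is a simplified sketch with a few assumptions. /Im0 and /F1 are hypothetical resource names that the page's /Resources dictionary would have to define. The page is assumed to be US Letter (612 × 792 points). Real OCR tools emit more detail per word. The operator that matters is 3 Tr (text rendering mode 3), which makes text neither filled nor stroked: it is there for search and selection but is never painted.

```python
def page_content(words, page_w=612, page_h=792, font="/F1"):
    """Content stream for one OCR'd page: draw the scan, then invisible words.

    words: list of (text, x_pt, y_pt, size_pt). "/Im0" and "/F1" are
    hypothetical resource names, not taken from any particular tool.
    """
    # Save state, scale the image XObject to fill the page, draw it, restore.
    ops = [f"q {page_w} 0 0 {page_h} 0 0 cm /Im0 Do Q"]
    for text, x, y, size in words:
        # Escape characters that are special inside PDF literal strings.
        escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        # 3 Tr = invisible text; Tm places it at (x, y) over the matching pixels.
        ops.append(f"BT {font} {size} Tf 3 Tr 1 0 0 1 {x} {y} Tm ({escaped}) Tj ET")
    return "\n".join(ops)

print(page_content([("Invoice", 72, 700, 12)]))
```

The image is drawn first and the text after it, so the text sits "on top". Because it is never painted, the page looks exactly like the scan.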
Tools:
- Adobe Acrobat Pro: Tools → Scan & OCR → Recognize Text. Reliable, paid.
- ABBYY FineReader: widely regarded as one of the most accurate commercial OCR engines, especially for non-English languages.
- OCRmyPDF: open-source CLI built on Tesseract. Running ocrmypdf input.pdf output.pdf is usually all it takes. Free and very good.
- Tesseract directly: the open-source OCR engine underneath OCRmyPDF. It takes images rather than PDFs as input and requires more setup, but it is very capable.
- Cloud OCR APIs: Google Document AI, Amazon Textract, Microsoft Azure. Often the most accurate, but billed per page, and your data leaves your network.
For day-to-day scanned-PDF work, OCRmyPDF is the right starting point.
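If you have a folder of scans, a small Python wrapper around the OCRmyPDF command line can process them in one pass. This sketch uses only documented OCRmyPDF flags: -l sets the Tesseract language, and --skip-text leaves pages that already have text alone. Without --skip-text, OCRmyPDF stops with an error on such pages, so the flag suits mixed documents. The folder layout and the function names are illustrative. plan_batch only builds the commands, so you can inspect them before run_batch executes anything.

```python
import shutil
import subprocess
from pathlib import Path

def plan_batch(in_dir, out_dir, language="eng"):
    """Return one ocrmypdf command per PDF in in_dir (nothing is executed)."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    return [
        ["ocrmypdf", "-l", language, "--skip-text", str(pdf), str(out_dir / pdf.name)]
        for pdf in sorted(in_dir.glob("*.pdf"))
    ]

def run_batch(in_dir, out_dir, language="eng"):
    """Run the planned commands; stops at the first file OCRmyPDF rejects."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in plan_batch(in_dir, out_dir, language):
        subprocess.run(cmd, check=True)
```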
For more on OCR fundamentals, see PDF OCR explained and how to make a PDF searchable OCR.
Step 3: Preprocess for better OCR
OCR accuracy depends heavily on input quality. Before running OCR:
- Deskew: straighten slightly tilted pages so text lines are horizontal. OCRmyPDF's --deskew does this.
- Rotate: fix pages that came in sideways or upside down. OCRmyPDF's --rotate-pages auto-detects the orientation.
- Clean: remove specks, lines, and noise. OCRmyPDF's --clean uses unpaper; the cleaned image is used only for OCR, and --clean-final puts it in the output as well.
- Convert to grayscale or bilevel: color noise hurts OCR. Bilevel (black and white) usually works best for text-only pages.
- Increase contrast: for faint scans, ImageMagick's convert input.png -level 30%,80% output.png makes text more legible to OCR.
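The grayscale, contrast, and bilevel steps are simple per-pixel arithmetic. The sketch below shows the idea on plain Python lists of RGB tuples rather than a real image library. The BT.601 luma weights and the fixed threshold of 128 are common defaults, and no particular tool is guaranteed to use them. level() mirrors the linear mapping of ImageMagick's -level 30%,80%: values at 30% of full scale or below become black, values at 80% or above become white.

```python
def to_gray(r, g, b):
    # ITU-R BT.601 luma weights, a common choice for RGB-to-gray conversion.
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def level(value, black_pct=30, white_pct=80):
    # Linear stretch: black_pct of full scale maps to 0, white_pct maps to 255.
    black = 255 * black_pct / 100
    white = 255 * white_pct / 100
    stretched = (value - black) * 255 / (white - black)
    return max(0, min(255, round(stretched)))

def to_bilevel(value, threshold=128):
    # Fixed global threshold; real tools often choose it automatically (e.g. Otsu's method).
    return 0 if value < threshold else 255

def preprocess(rows):
    """rows: list of rows of (r, g, b) tuples -> rows of 0/255 values."""
    return [[to_bilevel(level(to_gray(*px))) for px in row] for row in rows]

print(preprocess([[(200, 200, 200), (50, 50, 50), (120, 110, 100)]]))  # [[255, 0, 0]]
```

The contrast stretch runs before thresholding on purpose: it pushes faint gray strokes toward black, so they survive the cut to bilevel.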
A typical preprocessing command:
ocrmypdf --rotate-pages --deskew --clean --remove-background \
scanned.pdf clean-with-text.pdf
For more on cleanup before OCR, see how to recover a corrupted PDF and how to convert a scanned PDF to text.
Step 4: Edit the text
After OCR, you can use a PDF editor's text editing tools on the file. But: the underlying page is still an image. When you click into a word, you are clicking the invisible text layer above the image. Editing the text changes the text layer but does NOT change the image.
To make a visible edit, you have to:
- Cover the original visible text (a white rectangle on top of the offending word; this is only a visual cover, not a true redaction)
- Type the replacement in its place
- Update the underlying text layer to match
Acrobat Pro and similar editors do this semi-automatically when you use Edit Text on an OCR'd scanned PDF: they detect the OCR'd word, white it out, and let you type a replacement. The result is a PDF where the visible and the invisible text are updated together.
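The three steps can be modeled with a toy data structure. Word, Page, and replace_word below are hypothetical names chosen for illustration, not the internals of Acrobat or any other editor. The model shows why all three steps are needed: the cover hides the old pixels, the overlay shows the new word, and updating the OCR word keeps search consistent with what is visible.

```python
from dataclasses import dataclass, field

Box = tuple  # (x0, y0, x1, y1) in page coordinates

@dataclass
class Word:
    text: str   # OCR text in the invisible layer
    bbox: Box   # where the word sits on the page image

@dataclass
class Page:
    words: list                                   # OCR words (invisible text layer)
    covers: list = field(default_factory=list)    # filled rectangles drawn over the image
    overlays: list = field(default_factory=list)  # visible replacement text

def replace_word(page, old, new, background=(255, 255, 255)):
    """Replace the first OCR word equal to old; return False if none matches."""
    for word in page.words:
        if word.text == old:
            page.covers.append((word.bbox, background))  # 1. hide the scanned pixels
            page.overlays.append((new, word.bbox))       # 2. draw the replacement
            word.text = new                              # 3. keep the text layer in sync
            return True
    return False
```

If you skip step 3, the page looks right, but searching for the corrected word fails and searching for the old word still succeeds.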
The illusion is good for short edits. For substantial editing, this workflow breaks down. If you have major changes to make:
- Re-scan after physical correction if possible
- Re-author from scratch in a real document editor
- Convert to Word (see how to convert a PDF to Word) and edit there, then re-export to PDF
Specific edits
Fixing a typo. Cover the typo with a white rectangle matching the page background, type the correction in the same font and size, update the OCR text. Acrobat Pro's Edit Text handles this in one step on OCR'd documents.
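Matching "the same size" starts with converting the OCR word box from pixels to points. PDF measures in points (72 per inch), so a height of h pixels at d DPI is h / d × 72 points. This sketch makes two assumptions: that the word-box height is roughly the font size, and that rounding to the nearest half point is acceptable. Words without ascenders or descenders produce shorter boxes, so measure a word like "Typed" rather than "one".

```python
def pixels_to_points(pixels, dpi):
    # PDF user space uses 72 points per inch.
    return pixels * 72 / dpi

def estimate_font_size(word_height_px, dpi):
    """Rough font size from an OCR word box spanning ascenders to descenders.
    Rounds to the nearest half point (an assumption, not a rule)."""
    return round(pixels_to_points(word_height_px, dpi) * 2) / 2

print(estimate_font_size(42, 300))  # 10.0
```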
Adding a paragraph. Hard. The new paragraph has to be added on top of the image. Find an empty area, add a free-text annotation or real page text, and accept that the visual style may not match the surrounding scanned content perfectly.
Removing a section. Cover the section with a rectangle matching the page background, and the visible content disappears. For real "remove this from the file" semantics, see how to redact text in a PDF. Covering looks identical to redaction, but only redaction removes what is underneath: a plain cover leaves the OCR text layer, and the original image pixels, in the file for anyone to search or copy.
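You can see the difference on the text layer alone. In this sketch the text layer is simplified to a list of (word, box) pairs. A real redaction tool also removes the image pixels under the marked area, which covering leaves intact.

```python
def intersects(a, b):
    """True if two (x0, y0, x1, y1) boxes overlap."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def cover(text_layer, area):
    # Covering only paints a rectangle; every OCR word stays in the file.
    return list(text_layer)

def redact(text_layer, area):
    # Redaction drops every OCR word whose box overlaps the marked area.
    return [(word, box) for word, box in text_layer if not intersects(box, area)]

layer = [("Salary", (0, 0, 40, 10)), ("$90,000", (45, 0, 90, 10)), ("Notes", (0, 20, 30, 30))]
area = (44, 0, 95, 12)
print([w for w, _ in cover(layer, area)])   # ['Salary', '$90,000', 'Notes']
print([w for w, _ in redact(layer, area)])  # ['Salary', 'Notes']
```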
Replacing a logo or image. See how to replace an image in PDF. For scanned PDFs, the "image" is the entire page, so you would need to swap the whole-page image, which is usually impractical.
Filling form fields. A scanned form has no real form fields. You can add fields on top of the visible boxes (see how to make a PDF fillable), but for one-off fills, adding text annotations over each box is easier. See how to fill out a PDF form.
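To place annotations or fields over the boxes you can see, you often need to convert scan-image pixels to PDF user space. Image pixels have their origin at the top left with y pointing down. PDF user space has its origin at the bottom left, with y pointing up and 72 points per inch. The sketch assumes the scan image exactly fills the page and that the page has no /Rotate entry or offset crop box; check both assumptions on real files.

```python
def image_px_to_pdf_pt(x_px, y_px, dpi, page_height_pt):
    """Map an image pixel (origin top-left, y down) to PDF points (origin bottom-left, y up)."""
    return x_px * 72 / dpi, page_height_pt - y_px * 72 / dpi

def box_px_to_pdf_rect(box_px, dpi, page_height_pt):
    """Pixel box (left, top, right, bottom) -> PDF rect [llx, lly, urx, ury]."""
    x0, y0, x1, y1 = box_px
    left, top = image_px_to_pdf_pt(x0, y0, dpi, page_height_pt)
    right, bottom = image_px_to_pdf_pt(x1, y1, dpi, page_height_pt)
    return (left, bottom, right, top)

# A 1 x 0.5 inch box, one inch in from the top-left of a US Letter (792 pt tall) scan.
print(box_px_to_pdf_rect((300, 300, 600, 450), 300, 792))  # (72.0, 684.0, 144.0, 720.0)
```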
Adding annotations vs editing text
Annotations are layered on top of the page. They do not modify the original image. Sticky notes, shape annotations, and (once the page is OCR'd) text highlights work the same on a scanned PDF as on a native one. See the annotating a PDF guide.
If your "edit" is just commentary or markup, annotations are the right answer for scanned PDFs.
Quality limits
A scanned PDF has a ceiling on quality:
- Resolution: a 200 DPI scan cannot be edited as cleanly as a 600 DPI one. Typed edits render crisper than the surrounding scanned text, so the difference in sharpness shows.
- Font matching: the original font exists only as glyph shapes in the image. Your edit uses a real font that has to approximate them. Common typewriter and serif fonts (Times, Garamond, Courier) often match closely; brand-specific fonts may not.
- Color matching: a scanned page is rarely pure white; it is usually slightly off-white. White rectangles covering original content stand out unless you sample the background color first.
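Sampling the background can be as simple as taking the per-channel median of a thin band of pixels just outside the area you are about to cover. The median ignores the occasional speck or stray stroke of text in the band, which a mean would not. The pixel representation (rows of RGB tuples) and the 3-pixel margin are assumptions for illustration.

```python
from statistics import median

def sample_background(pixels, box, margin=3):
    """Median RGB of the pixels in a band just outside box.

    pixels: list of rows of (r, g, b); box: (x0, y0, x1, y1), x1/y1 exclusive.
    """
    x0, y0, x1, y1 = box
    height, width = len(pixels), len(pixels[0])
    samples = []
    for y in range(max(0, y0 - margin), min(height, y1 + margin)):
        for x in range(max(0, x0 - margin), min(width, x1 + margin)):
            if not (x0 <= x < x1 and y0 <= y < y1):  # skip the area being covered
                samples.append(pixels[y][x])
    if not samples:
        raise ValueError("no pixels around the box to sample")
    # zip(*samples) splits the tuples into R, G, and B channels.
    return tuple(round(median(channel)) for channel in zip(*samples))
```

Use the returned color as the fill of the covering rectangle instead of pure white.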
For high-stakes edits where the document needs to look pristine, the best path is to re-author from scratch.
Common gotchas
OCR makes errors that propagate. The text layer has occasional misreads (m vs rn, 0 vs O). If you search and replace based on OCR text, you may miss instances spelled differently. Spell-check the OCR output before relying on it for search.
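One way to soften this problem is to search with a pattern that accepts the usual confusions. The confusion groups below are an illustrative starting list, not an exhaustive table; extend them with the misreads you actually see in your scans. Matching is case-insensitive, so "O" and "o" are handled together.

```python
import re

# Illustrative groups of spellings that OCR commonly confuses.
CONFUSIONS = [("rn", "m"), ("0", "O"), ("1", "l", "I"), ("5", "S")]

def ocr_tolerant_pattern(word, groups=CONFUSIONS):
    """Compile a regex that matches word or its common OCR misreadings."""
    lowered = word.lower()
    parts, i = [], 0
    while i < len(word):
        for group in groups:
            # Longest spelling first, so "rn" is consumed as a unit.
            spellings = sorted(group, key=len, reverse=True)
            hit = next((s for s in spellings if lowered.startswith(s.lower(), i)), None)
            if hit is not None:
                parts.append("(?:" + "|".join(map(re.escape, spellings)) + ")")
                i += len(hit)
                break
        else:  # no confusion group applies at this position
            parts.append(re.escape(word[i]))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)

print(bool(ocr_tolerant_pattern("modern").search("the rnodem era")))  # True
```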
Page rotation survives OCR. A page scanned sideways gets OCR'd sideways. Pre-rotate before OCR or use --rotate-pages.
Skew distorts the text layer. A page scanned at a 2° angle gets a text layer at that same angle. OCRmyPDF's --deskew fixes the image and re-aligns the layer.
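A small angle adds up across a full line. The vertical drift is tan(angle) × line length. For a 2° skew across the 2550-pixel width of a US Letter page scanned at 300 DPI, that comes to about 89 pixels, which is roughly two lines of 10-point text.

```python
import math

def skew_drift_px(angle_deg, line_length_px):
    """Vertical offset between the two ends of a line tilted by angle_deg."""
    return math.tan(math.radians(angle_deg)) * line_length_px

width_px = 8.5 * 300                          # US Letter width at 300 DPI
print(round(skew_drift_px(2, width_px), 1))   # 89.0
```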
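File size can balloon after OCR. The invisible text layer itself is small; growth more often comes from page images being re-encoded during processing (for example after --deskew) or from PDF/A conversion. Use --optimize 3 in OCRmyPDF to recompress the images and shrink the file. Levels 2 and 3 are lossy, so check the result.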
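It helps to know how much raw image data a page holds before compression. The same arithmetic shows why bilevel scans are so much cheaper to store. The numbers below cover uncompressed pixel data only and ignore per-row byte padding. Actual file sizes depend on the compression (JPEG, JBIG2, CCITT, and so on) and are much smaller.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of one scanned page's pixel data, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

for label, bpp in [("24-bit color", 24), ("8-bit gray", 8), ("1-bit bilevel", 1)]:
    size = raw_page_bytes(8.5, 11, 300, bpp)
    print(f"{label}: {size / 1e6:.1f} MB")  # 25.2, 8.4, and 1.1 MB for US Letter at 300 DPI
```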
Edits look obvious. A typed correction on a scanned page looks different from the surrounding scanned text. For final output, consider whether you can hide the difference (matching font and size as closely as possible) or whether the user will accept "visibly edited".
Tagged PDF accessibility. Most scanned PDFs are untagged. After OCR, they remain untagged unless you explicitly auto-tag and clean up. See tagged PDF vs untagged PDF.
Signed documents. A digitally signed PDF cannot have its content edited without invalidating the signature. If you receive a signed scanned PDF, annotating is usually the most you can do, and the signer's permissions may rule out even that. See digital signatures vs electronic signatures.
Best practice: avoid scanned PDFs when you can
The strongest advice for editing scanned PDFs is to not need to. If you have control over the source:
- Author and edit in a real document editor
- Export to PDF for distribution
- Keep the source for future edits
If you must work with scanned PDFs:
- OCR them immediately on ingest
- Tag and structure them if accessibility matters
- Keep the original scanned file separate as the "archive"
Practical recipe
For a small edit on a scanned PDF:
- Run ocrmypdf --rotate-pages --deskew --clean scanned.pdf prepared.pdf
- Open prepared.pdf in Acrobat Pro or Docento.app
- Use Edit Text to modify the OCR'd word
- The editor covers the original and places your text in the same position
- Save as a new file
- Verify the result visually
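The first step of this recipe can be scripted together with the advice to keep the original scan as an archive. This sketch rests on stated assumptions: ocrmypdf is on PATH, and the archive folder layout and function names are hypothetical. The checksum comparison confirms that the archived copy is byte-identical to the scan before OCR touches anything.

```python
import hashlib
import shutil
import subprocess
from pathlib import Path

def sha256(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def archive_original(scan, archive_dir):
    """Copy the untouched scan into archive_dir; return (copy path, checksum)."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / Path(scan).name
    shutil.copy2(scan, target)
    digest = sha256(target)
    if digest != sha256(scan):
        raise OSError(f"archived copy of {scan} does not match the original")
    return target, digest

def prepare_for_editing(scan, prepared, archive_dir):
    """Archive the original, then write an OCR'd working copy for editing."""
    archived, digest = archive_original(scan, archive_dir)
    subprocess.run(["ocrmypdf", "--rotate-pages", "--deskew", "--clean",
                    str(scan), str(prepared)], check=True)
    return archived, digest
```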
For substantial editing:
- Convert to Word: see how to convert a PDF to Word
- Edit in Word
- Re-export as PDF
- Accept that the result is no longer "the scanned original"
Takeaway
Editing a scanned PDF is a two-step problem: OCR adds a text layer, and then a smart editor lets you change the visible content by overlaying corrections. For small edits this works well. For substantial editing, conversion to Word or re-authoring is faster and produces cleaner results. The right preprocessing (rotation, deskew, cleanup) substantially improves OCR quality and therefore editability. For browser-based OCR-aware editing combined with other operations like signing or combining PDFs, Docento.app handles the workflow without installing anything. And always preserve the original scan as a separate archive: it is the source of truth, and your edited copy is a derivative.