OCR PDF
Extract text from scanned PDFs or create a searchable PDF with a hidden text layer
OCR is ready: Tesseract Python bindings detected. Make sure the language pack you select is installed in your Tesseract tessdata directory; if it isn't, you'll get a clear error.
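If you'd rather check before running OCR than wait for the error, here is a minimal sketch. It assumes Tesseract's usual layout of one `<code>.traineddata` file per language in the tessdata directory and the `+` syntax for combining languages (e.g. `eng+deu`). The function name `missing_language_packs` is hypothetical and not part of this tool, and you supply the tessdata path yourself.

```python
from pathlib import Path
from typing import List


def missing_language_packs(lang: str, tessdata_dir: str) -> List[str]:
    """Return the language codes in `lang` (e.g. "eng+deu") that have no
    matching <code>.traineddata file in `tessdata_dir`.

    Hypothetical helper: assumes the standard one-file-per-language layout.
    """
    tessdata = Path(tessdata_dir)
    # Tesseract combines languages with "+", so split on it and skip empties.
    codes = [code for code in lang.split("+") if code]
    return [code for code in codes if not (tessdata / f"{code}.traineddata").is_file()]
```

Recent pytesseract releases also provide `pytesseract.get_languages()`, which asks the installed Tesseract binary directly and is the more reliable check when it is available.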
Two output modes:
- Searchable PDF: keeps the original page images and adds an invisible text layer aligned with them, so you can select, copy-paste and search the text. The PDF still looks identical to the scan.
- Extracted text: just the recognised text, as plain text.
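The two modes map naturally onto two pytesseract calls. The sketch below assumes one rendered image per page and a hypothetical helper `ocr_page`; it is not this tool's actual implementation. `image_to_pdf_or_hocr(..., extension="pdf")` returns the bytes of a one-page PDF with the image plus an invisible text layer, while `image_to_string` returns only the recognised text.

```python
from typing import Union

MODES = ("searchable_pdf", "text")


def ocr_page(image, mode: str, lang: str = "eng") -> Union[bytes, str]:
    """OCR one page image (a PIL image or file path) in the given mode.

    Hypothetical helper for illustration only.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
    # Imported lazily so this module loads even where Tesseract isn't installed.
    import pytesseract

    if mode == "searchable_pdf":
        # One-page PDF bytes: original image plus an invisible text layer.
        return pytesseract.image_to_pdf_or_hocr(image, lang=lang, extension="pdf")
    # Plain recognised text only.
    return pytesseract.image_to_string(image, lang=lang)
```

For a multi-page document you would call this once per page and then merge the per-page PDFs (or concatenate the text), which is outside this sketch.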
Higher DPI generally gives better OCR accuracy but is slower. 200 DPI is the sweet spot for most scans; bump to 300 or more for small fonts or low-quality scans.
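To see why higher DPI costs time, it helps to work out how many pixels each setting produces. PDF page sizes are measured in points (1/72 inch), so the rendered size is `points * dpi / 72`. The sketch below uses a US Letter page (612 x 792 pt); the helper name `render_size` is hypothetical, and treating OCR time as roughly proportional to pixel count is an approximation.

```python
from typing import Tuple


def render_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a PDF page rendered at `dpi` (1 pt = 1/72 inch)."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


# US Letter: 612 x 792 pt, i.e. 8.5 x 11 inches.
low = render_size(612, 792, 200)
high = render_size(612, 792, 300)
# Pixel count grows with the square of DPI: (300 / 200) ** 2.
ratio = (high[0] * high[1]) / (low[0] * low[1])
print(low, high, ratio)  # (1700, 2200) (2550, 3300) 2.25
```

Going from 200 to 300 DPI therefore gives Tesseract about 2.25 times as many pixels per page to process.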