Extract Text from an XPS File

XPS files store text as XAML with embedded font and position data. If the XPS was created from a real document (not a scan), that text is selectable; you just need a way to get at it. This converter does not output a .txt file directly, but converting XPS to PDF here is a reliable first step: the resulting PDF preserves the text layer, which you can then copy directly or process with a PDF text-extraction tool.

If your XPS is image-based (created from a scan), there is no hidden text layer to extract. That case requires OCR and is covered separately below.

How XPS stores text

An XPS file is a ZIP archive containing XAML pages. Text is represented as <Glyphs> elements with Unicode character data, x/y coordinates, and a reference to an embedded font. This means text is present as real characters — not a rasterised image — unless the document was created from a scan.
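
To make the structure concrete, here is a minimal Python sketch that reads the text runs out of one page's markup. The sample page is hand-written for illustration (the font path, sizes, and coordinates are made up), and the function name glyph_runs is hypothetical. It assumes the page uses the standard Glyphs attributes UnicodeString, OriginX, and OriginY.

```python
import xml.etree.ElementTree as ET

# Hand-written sample page, for illustration only. Real pages carry more
# attributes and the font path here is invented.
SAMPLE_PAGE = """<FixedPage xmlns="http://schemas.microsoft.com/xps/2005/06" Width="816" Height="1056">
  <Glyphs Fill="#FF000000" FontUri="/Resources/Fonts/font1.odttf" FontRenderingEmSize="12"
          OriginX="96" OriginY="120" UnicodeString="Hello, XPS" />
  <Glyphs Fill="#FF000000" FontUri="/Resources/Fonts/font1.odttf" FontRenderingEmSize="12"
          OriginX="96" OriginY="140" UnicodeString="Second line" />
</FixedPage>"""


def glyph_runs(page_xml):
    """Return (x, y, text) for every Glyphs element that carries Unicode text."""
    root = ET.fromstring(page_xml)
    runs = []
    for element in root.iter():
        # Tags look like "{namespace}Glyphs"; compare only the local name so
        # both the XPS and OpenXPS namespaces are accepted.
        if element.tag.rsplit("}", 1)[-1] != "Glyphs":
            continue
        text = element.get("UnicodeString", "")
        if text:
            x = float(element.get("OriginX", "0"))
            y = float(element.get("OriginY", "0"))
            runs.append((x, y, text))
    return runs


for x, y, text in glyph_runs(SAMPLE_PAGE):
    print(x, y, text)
```

Each tuple is one positioned run of characters, which is why the text survives conversion as real characters rather than pixels.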

When this converter re-renders the XPS to PDF, it preserves those text objects in the PDF output. The PDF is not a flat image; text remains selectable and searchable.

Copy text directly from the PDF

After converting to PDF:

Open the PDF in any PDF reader (Adobe Acrobat Reader, Firefox, Chrome, macOS Preview, Edge, or Okular on Linux).
Use Ctrl+A (or Cmd+A on Mac) to select all, then Ctrl+C / Cmd+C to copy.
Paste into a text editor, Word, or wherever you need it.

This is the quickest route for extracting a few paragraphs or checking whether the text layer is present. For large documents or batch extraction, the command-line route below is faster.

Step 1: Convert XPS to PDF (Text Preserved)

Up to 20 files at once · 25 MB per file · no watermark · files deleted within 60 minutes.

Extract text with pdftotext (poppler)

For automated or bulk extraction, pdftotext from the poppler toolkit extracts text from every page in one command:

pdftotext input.pdf output.txt

On Ubuntu/Debian: sudo apt install poppler-utils. On macOS with Homebrew: brew install poppler. On Windows, poppler binaries are available from third-party build repositories.

Useful flags:

-layout: attempts to preserve the spatial layout of columns and tables using whitespace.
-f 3 -l 5: extract only pages 3 to 5.
-enc UTF-8: set the output encoding explicitly (UTF-8 is already the default in poppler's pdftotext).
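
For bulk extraction, the command and flags above can be wrapped in a small script. This is a sketch, not part of this tool: the function names pdftotext_command and extract_folder are hypothetical, and it assumes pdftotext from poppler is installed and on your PATH.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf, txt, layout=False, first_page=None, last_page=None,
                      encoding="UTF-8"):
    """Build the pdftotext argument list for one PDF."""
    cmd = ["pdftotext", "-enc", encoding]
    if layout:
        cmd.append("-layout")            # keep columns and tables aligned
    if first_page is not None:
        cmd += ["-f", str(first_page)]   # first page to extract
    if last_page is not None:
        cmd += ["-l", str(last_page)]    # last page to extract
    cmd += [str(pdf), str(txt)]
    return cmd


def extract_folder(folder, **options):
    """Run pdftotext on every .pdf in a folder, writing a .txt next to each."""
    outputs = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        txt = pdf.with_suffix(".txt")
        # check=True raises CalledProcessError if pdftotext exits with an error.
        subprocess.run(pdftotext_command(pdf, txt, **options), check=True)
        outputs.append(txt)
    return outputs
```

For example, extract_folder("converted", layout=True) would process every PDF in a folder named converted (a placeholder name) and return the list of text files it wrote.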

Scanned XPS: OCR is required

If you convert to PDF and cannot select any text, the XPS was created from rasterised images rather than real document content. The PDF will also be image-only.

Options:

OCRmyPDF (free, open source): ocrmypdf input.pdf output.pdf adds a searchable text layer to the image-based PDF. Powered by Tesseract; available on Linux, macOS (Homebrew), and Windows (WSL or native build).
Adobe Acrobat Pro: Tools → Enhance Scans → Recognise Text. Paid, but accurate and handles complex layouts.
Google Docs: Upload the PDF to Google Drive and open it with Google Docs — it will OCR the content on import. Quality varies but it is free.
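
If you go the OCRmyPDF route for more than one file, the command can be driven from a script. The sketch below assumes ocrmypdf is installed and on your PATH; the function names are hypothetical, and the language value is a Tesseract language code such as eng, passed with OCRmyPDF's -l option.

```python
import shutil
import subprocess


def ocrmypdf_command(src, dst, language="eng"):
    """Build the ocrmypdf argument list; language is a Tesseract code."""
    return ["ocrmypdf", "-l", language, str(src), str(dst)]


def add_text_layer(src, dst, language="eng"):
    """Write a searchable copy of an image-only PDF to dst."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    # check=True raises CalledProcessError if OCR fails.
    subprocess.run(ocrmypdf_command(src, dst, language), check=True)
```

The output PDF can then go through the same copy or pdftotext steps as a PDF converted from a text-based XPS.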

Checking if your XPS has a text layer

Convert to PDF here, open the result in a browser (Chrome or Firefox), and try to select a word by dragging across it. If the text highlights, the layer is intact. If nothing highlights no matter where you drag, the file is image-based and you need OCR.
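
The same check can be scripted for many files. This sketch assumes pdftotext from poppler is on your PATH and relies on its documented behaviour of writing to standard output when the output file is given as "-". The function names and the 20-character threshold are arbitrary assumptions, chosen only to ignore stray marks on otherwise empty pages.

```python
import subprocess


def looks_like_text(extracted, min_chars=20):
    """True if the extracted string has at least min_chars visible characters."""
    # split() drops all whitespace, including the form feeds that pdftotext
    # emits between pages.
    visible = "".join(extracted.split())
    return len(visible) >= min_chars


def pdf_has_text_layer(pdf_path, min_chars=20):
    """Run pdftotext on the PDF and report whether any real text came out."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    text = result.stdout.decode("utf-8", errors="replace")
    return looks_like_text(text, min_chars)
```

A False result suggests an image-only PDF, but treat it as a hint rather than proof: a short document with very little text would also fall under the threshold.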

You can also inspect the raw XPS: rename the .xps to .zip, extract it, and open the page files (FixedPage XAML, usually .fpage files under Documents/1/Pages/). If the <Glyphs> elements have a UnicodeString attribute with readable characters, the text layer is present.
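
The manual inspection can also be done without renaming or extracting anything, since Python's zipfile module reads the archive directly. This is a rough sketch with a hypothetical function name; it assumes each page is stored as a single part whose name ends in .fpage, which is the usual layout but not guaranteed for every producer.

```python
import zipfile
import xml.etree.ElementTree as ET


def count_text_runs(xps_file):
    """Return (glyphs_elements, glyphs_elements_with_text) over all pages."""
    total = 0
    with_text = 0
    with zipfile.ZipFile(xps_file) as archive:
        for name in archive.namelist():
            # Assumption: pages are single parts named *.fpage.
            if not name.lower().endswith(".fpage"):
                continue
            root = ET.fromstring(archive.read(name))
            for element in root.iter():
                # Match on the local tag name, ignoring the XML namespace.
                if element.tag.rsplit("}", 1)[-1] != "Glyphs":
                    continue
                total += 1
                if element.get("UnicodeString", "").strip():
                    with_text += 1
    return total, with_text
```

A result such as (0, 0) points to an image-based file, while a high second number means most text runs carry readable characters. If the first number is large and the second is near zero, the text is likely stored as glyph indices only and may not extract cleanly.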

Frequently asked questions

Does this tool output a plain text file?

No — it outputs PDF or JPG. Convert to PDF (which preserves the text layer), then copy text manually from a PDF reader or use pdftotext from the poppler package.

Will the text be in the correct reading order?

Usually yes for standard documents. With complex multi-column layouts and tables, the text can come out in a different order from how you would read it, because extraction depends on how the original XPS ordered and positioned its Glyphs elements. Using pdftotext -layout often helps with multi-column documents.

My XPS was created from a scan. Can I still get text?

Yes, but you need OCR. Convert to PDF here first, then run the PDF through OCRmyPDF, Adobe Acrobat Pro, or Google Docs to add a text layer.

Are special characters and Unicode text preserved?

Yes, as long as the XPS embeds the font with the full character set (most do). The PDF output and pdftotext with -enc UTF-8 should give you the correct characters.

Can I extract text from a password-protected XPS?

No. This tool cannot bypass password protection or Microsoft IRM (Information Rights Management) on XPS files.

Last updated: June 2026