How to Remove Metadata from PDF Files

PDF files carry extensive metadata: author name, organization, software used, creation and modification timestamps, editing history, and sometimes GPS coordinates (for example, when the document was created on a phone). Leaking this data has caused real harm: whistleblowers have been identified from document metadata, and corporate negotiations have been compromised by revision history. This guide shows how to inspect and strip that metadata, and what stripping cannot remove.

What Metadata PDFs Contain

Use exiftool to inspect a file before cleaning it:

# Install exiftool
sudo apt install libimage-exiftool-perl   # Debian/Ubuntu
brew install exiftool                      # macOS

# View all metadata
exiftool document.pdf

Typical output reveals:

File Name           : document.pdf
Creator             : Microsoft Word
Author              : Jane Smith
Company             : Acme Corp
Create Date         : 2026-01-15 09:32:14+01:00
Modify Date         : 2026-03-20 14:55:02+01:00
Producer            : Adobe PDF Library 15.0
Document ID         : uuid:4a7b2c1d-...
Instance ID         : uuid:9f3e1a8b-...
GPS Latitude        : 51° 30' 26.54" N
GPS Longitude       : 0° 7' 39.41" W

The GPS coordinates here came from a PDF created on an iPhone. The Author and Company fields expose the creator’s identity.
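To focus on the fields that identify a person, organization, or device, the full listing can be filtered. The sketch below assumes exiftool's -s output (short tag names such as GPSLatitude instead of "GPS Latitude"). The helper name identifying_fields and the tag list are illustrative assumptions, not an exhaustive inventory.

```shell
# identifying_fields: hypothetical helper. Reads `exiftool -s` output on stdin
# and keeps only the lines for tags that commonly identify a person,
# organization, or device. The tag list is an assumption; extend it as needed.
identifying_fields() {
    grep -E '^(Author|Creator|CreatorTool|Company|Producer|Title|Subject|Keywords|GPS[A-Za-z]*|DocumentID|InstanceID)[[:space:]]*:' || true
}

# Usage:
#   exiftool -s document.pdf | identifying_fields
```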

Method 1: exiftool (Recommended for Scripting)

# Remove all metadata from a single file
exiftool -all= document.pdf

# This creates document.pdf_original (backup) and overwrites document.pdf
# To skip backup:
exiftool -all= -overwrite_original document.pdf

# Remove metadata from all PDFs in a directory
exiftool -all= -overwrite_original *.pdf

# Remove all metadata but keep the title (useful for document management)
exiftool -all= -Title="Document" -overwrite_original document.pdf

# Verify removal (hide the file-system and structural fields that always remain)
exiftool document.pdf | grep -Ev "^(ExifTool|File|Directory|MIME|PDF Version|Linearized|Page)"

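After running exiftool -all=, check that the Author, Creator, GPS*, and Document ID fields are gone. Be aware that exiftool writes PDF changes as an incremental update: the old metadata is no longer displayed, but it remains in the file and can be recovered until the file is rewritten, for example by running qpdf --linearize on the result.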
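Because exiftool appends its PDF changes rather than rewriting the file, a tag can vanish from exiftool's listing while the old bytes are still present. A rough way to check is to scan the raw file for common metadata markers. This is a minimal sketch: the helper name raw_metadata_hits and the marker list are assumptions, and markers stored inside compressed streams will not be seen, so a count of 0 is a hint rather than proof.

```shell
# raw_metadata_hits: hypothetical helper. Prints the number of lines in the
# raw file bytes that contain a common metadata marker. -a makes grep treat
# the binary PDF as text. Compressed streams are invisible to this scan.
raw_metadata_hits() {
    grep -a -c -E '/Author|/Creator|/Producer|<x:xmpmeta' "$1" || true
}

# Usage:
#   raw_metadata_hits document.pdf
```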

Method 2: qpdf (Preserves PDF Structure)

qpdf is a C++ library and command-line tool for structural PDF manipulation. It preserves PDF integrity better than some other tools when dealing with encrypted or form-containing PDFs.

# Install
sudo apt install qpdf   # Debian/Ubuntu
brew install qpdf       # macOS

# Rewrite the file in linearized form, in place
# (drops unreferenced objects; does not strip metadata by itself)
qpdf --linearize --replace-input document.pdf

# Or write to a new file
qpdf --linearize document.pdf cleaned.pdf

# For encrypted PDFs, provide the password
qpdf --password="password" --decrypt --linearize document.pdf cleaned.pdf

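Note: qpdf --linearize does not strip XMP metadata by itself. Combine it with exiftool for full cleaning. Run exiftool first and qpdf second: exiftool's PDF edits are reversible until qpdf rewrites the file without the orphaned metadata objects.

cp document.pdf temp.pdf && \
exiftool -all= -overwrite_original temp.pdf && \
qpdf --linearize temp.pdf cleaned.pdf && \
rm temp.pdf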
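The combined step can be wrapped in a reusable function. This is a sketch under stated assumptions: the name strip_pdf is hypothetical, and it runs exiftool before qpdf so that the linearizing rewrite happens after the metadata edit and discards the objects exiftool orphaned. The input file is never modified.

```shell
# strip_pdf: hypothetical wrapper. Writes a cleaned copy of INPUT to OUTPUT
# and leaves INPUT untouched.
# Note: qpdf exits with status 3 when it succeeded but emitted warnings;
# this sketch reports that as a failure so the result gets a manual look.
strip_pdf() {
    local in="$1" out="$2" workdir rc
    [ -f "$in" ] || { echo "strip_pdf: no such file: $in" >&2; return 1; }
    workdir=$(mktemp -d) || return 1
    cp "$in" "$workdir/work.pdf" &&
        exiftool -q -all= -overwrite_original "$workdir/work.pdf" &&
        qpdf --linearize "$workdir/work.pdf" "$out"
    rc=$?
    rm -rf "$workdir"
    return "$rc"
}

# Usage:
#   strip_pdf document.pdf cleaned.pdf
```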

Method 3: Ghostscript (Nuclear Option)

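Ghostscript re-interprets the entire PDF and writes a new file, discarding most of the original structure, including JavaScript. The pdfwrite device typically carries over the basic document information fields (such as Title and Author), so run exiftool and qpdf on the result as well: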

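# Install
sudo apt install ghostscript   # Debian/Ubuntu
brew install ghostscript       # macOS

# Re-render the PDF (rewrites the whole structure)
# -dSAFER keeps Ghostscript's file-access restrictions on, which matters
# when processing documents from untrusted sources
gs \
  -dBATCH \
  -dNOPAUSE \
  -dSAFER \
  -sDEVICE=pdfwrite \
  -dCompatibilityLevel=1.4 \
  -dPrinted=false \
  -sOutputFile=cleaned.pdf \
  document.pdf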

Ghostscript can slightly change the visual rendering of complex PDFs. Test on a sample page before processing important documents. It also re-encodes embedded fonts and substitutes system fonts for any that the source did not embed, which can change the appearance of custom-font documents.
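A quick sanity check after re-rendering is to confirm that no pages were lost. The sketch below uses qpdf's --show-npages option; the helper name same_page_count is hypothetical. It catches only gross failures, so a visual comparison of a few pages is still needed.

```shell
# same_page_count: hypothetical helper. Succeeds only when both files exist
# and qpdf reports the same number of pages for each.
same_page_count() {
    local a="$1" b="$2" na nb
    [ -f "$a" ] && [ -f "$b" ] || return 1
    na=$(qpdf --show-npages "$a") || return 1
    nb=$(qpdf --show-npages "$b") || return 1
    [ "$na" = "$nb" ]
}

# Usage:
#   same_page_count document.pdf cleaned.pdf || echo "page count differs"
```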

Method 4: mat2 (GUI and CLI, Thorough)

mat2 is designed specifically for metadata removal and handles dozens of file types including PDFs, images, Office documents, and audio files.

# Install
sudo apt install mat2   # Debian/Ubuntu
pip3 install mat2       # or via pip

# Check metadata
mat2 --show document.pdf

# Remove metadata
mat2 document.pdf
# Creates document.cleaned.pdf

# In-place (overwrites original)
mat2 --inplace document.pdf

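mat2 is often recommended for privacy activists and journalists because it strips metadata thoroughly across many file formats. For PDFs, its default mode re-renders each page as an image, so text in the cleaned file is no longer selectable; the --lightweight option keeps the text but removes less metadata.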

Batch Processing

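#!/bin/bash
# clean_pdfs.sh - Strip metadata from all PDFs in a directory
# Usage: clean_pdfs.sh INPUT_DIR [OUTPUT_DIR]

if [ -z "$1" ] || [ ! -d "$1" ]; then
    echo "Usage: $0 INPUT_DIR [OUTPUT_DIR]" >&2
    exit 1
fi

INPUT_DIR="$1"
OUTPUT_DIR="${2:-${INPUT_DIR}/cleaned}"

mkdir -p "$OUTPUT_DIR"

# Without nullglob, a directory with no PDFs would loop once over the
# literal string "*.pdf"
shopt -s nullglob

for pdf in "$INPUT_DIR"/*.pdf; do
    filename=$(basename "$pdf")
    echo "Processing: $filename"

    # Copy to output, then strip metadata
    cp "$pdf" "$OUTPUT_DIR/$filename"
    exiftool -all= -overwrite_original "$OUTPUT_DIR/$filename"

    # Rewrite the file so the exiftool edit cannot be reverted
    qpdf --linearize --replace-input "$OUTPUT_DIR/$filename"

    # Verify
    remaining=$(exiftool "$OUTPUT_DIR/$filename" | grep -Ec "Author|Creator|GPS|Company" || true)
    if [ "$remaining" -gt 0 ]; then
        echo "  WARNING: Some metadata may remain in $filename"
    else
        echo "  Clean: $filename"
    fi
done

echo "Done. Cleaned files in $OUTPUT_DIR"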

Run it:

chmod +x clean_pdfs.sh
./clean_pdfs.sh /path/to/documents /path/to/output
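The script above only sees files directly inside the input directory whose names end in lowercase .pdf. For nested directories or mixed-case extensions, find can drive the same commands. This is a minimal sketch with a hypothetical helper name, for_each_pdf; it relies on -iname, which GNU and BSD find both support. Unlike the script, it passes the original paths to the command, so point it at a copy of the documents if the command edits in place.

```shell
# for_each_pdf: hypothetical helper. Runs the given command once for every
# regular file under DIR whose name ends in .pdf (case-insensitive), with the
# file path appended as the last argument. find passes each path as a single
# argument, so names containing spaces are handled correctly.
for_each_pdf() {
    local dir="$1"
    shift
    find "$dir" -type f -iname '*.pdf' -exec "$@" {} \;
}

# Usage (edits files in place, so run it on a copy):
#   for_each_pdf /path/to/copy exiftool -all= -overwrite_original
```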

Handling Redacted Documents

If you are redacting sensitive content before sharing, do not just draw black boxes over text in a PDF editor: the underlying text remains selectable and searchable. After drawing the boxes, rasterize the pages so that only pixels remain:

# Flatten the PDF (rasterize each page to an image, then rebuild the PDF)
# This makes text non-selectable; best for truly redacted documents
gs \
  -dBATCH \
  -dNOPAUSE \
  -dSAFER \
  -sDEVICE=png16m \
  -r150 \
  -sOutputFile=pages_%03d.png \
  document.pdf

# Then convert images back to PDF
# (zero-padded names keep the pages in order; with ImageMagick 7 the
# command is "magick" instead of "convert")
convert pages_*.png -compress jpeg -quality 85 redacted.pdf

# Remove metadata from result
exiftool -all= -overwrite_original redacted.pdf
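Before sharing, it is worth confirming that no text layer survived. The sketch below assumes pdftotext (from poppler-utils) is installed and pipes its output into a hypothetical helper, has_visible_text, which reports whether anything other than whitespace came out.

```shell
# has_visible_text: hypothetical helper. Succeeds if stdin contains at least
# one non-whitespace character. pdftotext separates pages with form feeds,
# which count as whitespace here.
has_visible_text() {
    grep -q '[^[:space:]]'
}

# Usage:
#   if pdftotext redacted.pdf - | has_visible_text; then
#       echo "WARNING: extractable text is still present"
#   fi
```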

Or use the dedicated tool pdf-redact-tools, originally published by First Look Media (the project is archived and no longer maintained):

pip3 install pdf-redact-tools
pdf-redact-tools --sanitize document.pdf

What Metadata Cleaning Does Not Remove

Steganographic watermarks: Publishers such as Elsevier and IEEE reportedly embed invisible watermarks in PDFs, and these survive metadata stripping
Font-based fingerprinting: Some services generate slightly unique glyph positions per download
Content-based identifiers: The text itself is not modified, so names, writing style, or deliberately varied wording can still identify a source
Print tracking dots: Printouts from many color laser printers contain nearly invisible yellow dots that encode the printer's serial number and a timestamp

For maximum anonymity, re-type or re-create the document from scratch rather than cleaning an existing one.