class DocumentProtector
A class that protects PDF documents from editing by applying encryption and permission restrictions with pikepdf, and uses PyMuPDF to read back the resulting permissions during verification.
/tf/active/vicechatdev/document_auditor/src/security/document_protection.py
9 - 118
moderate
Purpose
DocumentProtector secures PDF files by setting an owner password and granular permissions (printing, copying, editing). It supports three encryption levels (40-bit, 128-bit, AES-256) and generates a random owner password when none is supplied. The user password is left empty, so anyone can open the document while modification stays restricted. After saving the encrypted copy, the class runs a best-effort verification (it reopens the file and logs its permissions) and then replaces the original file in place.
Source Code
class DocumentProtector:
    """Handles protecting PDF documents from editing"""

    def __init__(self):
        self.logger = logging.getLogger(__name__)

    def protect_document(self, pdf_path, password=None, allow_printing=True,
                         allow_copying=True, encryption_level=2):
        """
        Apply protection to prevent editing of the document

        Args:
            pdf_path (str): Path to the PDF file
            password (str, optional): Owner password to set (will be generated if None)
            allow_printing (bool): Whether to allow printing
            allow_copying (bool): Whether to allow copying of text
            encryption_level (int): Encryption level (1=40bit, 2=128bit, 3=AES-256)

        Returns:
            tuple: (path to protected PDF, owner password)
        """
        if not os.path.exists(pdf_path):
            raise FileNotFoundError(f"PDF file not found: {pdf_path}")

        # Generate a password if not provided
        if password is None:
            password = self._generate_secure_password()

        try:
            # Use pikepdf for more robust PDF encryption
            temp_output = f"{pdf_path}.protected_tmp"

            # Open the PDF
            with pikepdf.open(pdf_path) as pdf:
                # Set permissions
                user_perms = pikepdf.Permissions(
                    accessibility=True,  # Always allow accessibility
                    extract=allow_copying,
                    modify_annotation=False,
                    modify_assembly=False,
                    modify_form=False,
                    modify_other=False,
                    print_lowres=allow_printing,
                    print_highres=allow_printing
                )

                # Choose encryption method based on level
                if encryption_level == 1:
                    encryption = pikepdf.Encryption(
                        user="",  # Empty user password allows opening
                        owner=password,
                        allow=user_perms,
                        R=2  # Acrobat 3 (PDF 1.2) 40-bit
                    )
                elif encryption_level == 2:
                    encryption = pikepdf.Encryption(
                        user="",  # Empty user password allows opening
                        owner=password,
                        allow=user_perms,
                        R=4  # Acrobat 6 (PDF 1.5) 128-bit
                    )
                else:  # level 3
                    encryption = pikepdf.Encryption(
                        user="",  # Empty user password allows opening
                        owner=password,
                        allow=user_perms,
                        R=6  # Acrobat X (PDF 1.7) AES-256
                    )

                # Save with encryption
                pdf.save(temp_output, encryption=encryption)

            # Verify the protection
            self._verify_protection(temp_output, password)

            # Replace original with protected version
            os.replace(temp_output, pdf_path)

            self.logger.info(f"Applied document protection to: {pdf_path}")
            return pdf_path, password

        except Exception as e:
            self.logger.error(f"Error protecting document: {e}")
            raise

    def _verify_protection(self, pdf_path, password):
        """Verify that the PDF is properly protected"""
        try:
            # Try to open with pikepdf - should succeed with owner password
            with pikepdf.open(pdf_path, password=password) as pdf:
                pass  # Just testing we can open it

            # Try to open with PyMuPDF to verify permissions
            doc = fitz.open(pdf_path)

            # Check permissions using PyMuPDF permissions mask constants if available
            # Note: PyMuPDF versions may have different attribute names
            permissions = doc.permissions
            doc.close()

            # PyMuPDF permission constants might vary by version, so we'll just log the raw value
            self.logger.debug(f"Protection verification complete. Raw permissions value: {permissions}")

        except Exception as e:
            self.logger.warning(f"Protection verification failed: {e}")

    def _generate_secure_password(self, length=12):
        """Generate a secure random password"""
        characters = string.ascii_letters + string.digits + string.punctuation
        password = ''.join(random.choice(characters) for i in range(length))
        return password
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| bases | - | - | |
Parameter Details
No constructor parameters: The __init__ method takes no parameters. It only initializes a logger instance for the class.
Return Value
Instantiation returns a DocumentProtector object. The main method protect_document() returns a tuple containing (str: path to protected PDF, str: owner password used). The protected PDF path is the same as the input path since the original file is replaced. The password is either the one provided or an auto-generated secure password.
Class Interface
Methods
__init__(self)
Purpose: Initialize the DocumentProtector instance with a logger
Returns: None - initializes the instance
protect_document(self, pdf_path: str, password: str = None, allow_printing: bool = True, allow_copying: bool = True, encryption_level: int = 2) -> tuple[str, str]
Purpose: Apply protection to a PDF document to prevent editing, with configurable permissions and encryption
Parameters:
- pdf_path: Path to the PDF file to protect (must exist)
- password: Owner password to set for protection. If None, a password is auto-generated
- allow_printing: Whether to allow printing the document (both low and high resolution)
- allow_copying: Whether to allow copying/extracting text from the document
- encryption_level: Encryption strength: 1=40-bit (Acrobat 3), 2=128-bit (Acrobat 6), 3=AES-256 (Acrobat X)
Returns: Tuple of (protected_pdf_path: str, owner_password: str). The path is the same as input since original is replaced. Password is either provided or auto-generated.
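The mapping from encryption_level to the R value passed to pikepdf.Encryption can be sketched as a lookup table. This is only an illustration: LEVEL_TO_REVISION and resolve_revision are hypothetical names that do not appear in the source, and the strict validation is an assumption of this sketch. The original method treats any value other than 1 or 2 as level 3 (AES-256).

```python
# Hypothetical helper, not part of DocumentProtector: maps the documented
# encryption_level values to the R values used in the source code.
LEVEL_TO_REVISION = {
    1: 2,  # 40-bit
    2: 4,  # 128-bit
    3: 6,  # AES-256
}


def resolve_revision(encryption_level: int) -> int:
    """Return the R value for a level, rejecting unknown levels.

    Assumption: raising on unknown levels is a choice made for this sketch.
    The original method silently falls back to AES-256 instead.
    """
    try:
        return LEVEL_TO_REVISION[encryption_level]
    except KeyError:
        raise ValueError(
            f"encryption_level must be 1, 2, or 3, got {encryption_level!r}"
        ) from None


if __name__ == "__main__":
    for level in (1, 2, 3):
        print(level, "->", resolve_revision(level))
```

Validating the level up front makes a typo such as encryption_level=4 fail loudly rather than quietly selecting a different encryption scheme than the caller may have intended.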
_verify_protection(self, pdf_path: str, password: str) -> None
Purpose: Internal method to verify that PDF protection was applied correctly by attempting to open with password and checking permissions
Parameters:
- pdf_path: Path to the protected PDF file to verify
- password: Owner password to use for verification
Returns: None - logs verification results or warnings
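The verification step only logs the raw permissions value. A minimal sketch of how such a value could be decoded follows, using the permission bit positions defined for the PDF standard security handler. PERMISSION_FLAGS and decode_permissions are hypothetical names, and it is an assumption that the integer reported by a given PyMuPDF version follows this layout unchanged.

```python
# Permission bits of the PDF standard security handler (the /P entry).
# Bit positions are 1-based in the specification: 3, 4, 5, 6, 9, 10, 11, 12.
PERMISSION_FLAGS = {
    "print": 1 << 2,
    "modify": 1 << 3,
    "copy": 1 << 4,
    "annotate": 1 << 5,
    "fill_forms": 1 << 8,
    "accessibility": 1 << 9,
    "assemble": 1 << 10,
    "print_high_quality": 1 << 11,
}


def decode_permissions(raw: int) -> dict:
    """Translate a raw permissions integer into named boolean flags.

    Works for negative values too, since Python applies bitwise AND to
    integers as if they were in two's complement form.
    """
    return {name: bool(raw & flag) for name, flag in PERMISSION_FLAGS.items()}


if __name__ == "__main__":
    # Example value with printing, copying, and accessibility allowed.
    example = (
        PERMISSION_FLAGS["print"]
        | PERMISSION_FLAGS["print_high_quality"]
        | PERMISSION_FLAGS["copy"]
        | PERMISSION_FLAGS["accessibility"]
    )
    for name, allowed in decode_permissions(example).items():
        print(f"{name}: {allowed}")
```

A decoder like this would let the verification compare the applied permissions against the requested allow_printing and allow_copying values instead of only logging a number.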
_generate_secure_password(self, length: int = 12) -> str
Purpose: Internal method to generate a random password from letters, digits, and punctuation. It uses random.choice, which is not a cryptographically secure source of randomness.
Parameters:
length: Length of the password to generate (default 12 characters)
Returns: String containing a randomly generated password
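Because the source draws characters with random.choice, the generated password is not suitable as a secret on its own terms. A minimal sketch of an alternative based on the standard library's secrets module is shown below. The function name generate_password is hypothetical. The alphabet and default length mirror the original method.

```python
import secrets
import string


def generate_password(length: int = 12) -> str:
    """Generate a password using a cryptographically secure random source.

    Same alphabet as the original method: letters, digits, and punctuation.
    """
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
    print(generate_password(length=24))
```

Callers can also sidestep the built-in generator entirely by producing a password this way and passing it through the password argument of protect_document().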
Attributes
| Name | Type | Description | Scope |
|---|---|---|---|
| logger | logging.Logger | Logger instance for the class, initialized with the module name. Used to log info, debug, warning, and error messages during PDF protection operations. | instance |
Dependencies
- logging
- fitz
- os
- pikepdf
- tempfile
- random
- string
Required Imports
import logging
import fitz
import os
import pikepdf
import tempfile
import random
import string
Usage Example
import logging

# Import DocumentProtector from wherever document_protection.py lives in your
# project. The package prefix below is an assumption based on the file path.
# from src.security.document_protection import DocumentProtector

# Configure logging (optional but recommended)
logging.basicConfig(level=logging.INFO)

# Instantiate the protector
protector = DocumentProtector()

# Protect a PDF with default settings (auto-generated password, 128-bit encryption)
pdf_path = 'document.pdf'
protected_path, password = protector.protect_document(pdf_path)
print(f'Protected PDF: {protected_path}')
print(f'Owner password: {password}')

# Protect with custom settings
protected_path, password = protector.protect_document(
    pdf_path='document.pdf',
    password='MySecurePassword123!',
    allow_printing=True,
    allow_copying=False,
    encryption_level=3  # AES-256
)

# Protect with minimal permissions
protected_path, password = protector.protect_document(
    pdf_path='sensitive.pdf',
    allow_printing=False,
    allow_copying=False,
    encryption_level=3
)
Best Practices
- Always store the returned owner password securely - it's needed to remove protection later
- The original PDF file is replaced with the protected version - make a backup if needed before calling protect_document()
- Use encryption_level=3 (AES-256) for maximum security on modern systems
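- The class writes a temporary file named <pdf_path>.protected_tmp next to the original - ensure sufficient disk space and write permission in that directory; the temporary file is not cleaned up if an error occurs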
- User password is intentionally left empty to allow opening the PDF without a password, while owner password prevents editing
- Call protect_document() only once per PDF file - a repeated call is expected to re-encrypt the file with a new owner password, which makes any previously stored password stale
- Handle FileNotFoundError when the PDF path doesn't exist; any other exception raised while opening or saving is logged and re-raised
- The logger attribute can be configured externally for custom logging behavior
- Verification step (_verify_protection) never raises: failures are logged as warnings and the original file is still replaced with the protected version
- Auto-generated passwords include special characters - ensure your system can handle them if storing/displaying
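The backup advice above can be sketched as a small wrapper. This is a minimal illustration under stated assumptions: backup_pdf and protect_with_backup are hypothetical helpers that are not part of the source, the ".bak" suffix is an arbitrary choice, and protector is any object exposing protect_document() with the signature documented here.

```python
import shutil
from pathlib import Path


def backup_pdf(pdf_path: str, suffix: str = ".bak") -> str:
    """Copy pdf_path to pdf_path + suffix and return the backup path."""
    source = Path(pdf_path)
    if not source.is_file():
        raise FileNotFoundError(f"PDF file not found: {pdf_path}")
    backup = source.with_name(source.name + suffix)
    shutil.copy2(source, backup)  # copy2 also preserves file metadata
    return str(backup)


def protect_with_backup(protector, pdf_path: str, **options):
    """Back up the PDF, then delegate to protector.protect_document().

    Returns (protected_path, owner_password, backup_path). The backup is
    kept even when protection succeeds, so the caller decides when to
    delete it.
    """
    backup = backup_pdf(pdf_path)
    protected_path, password = protector.protect_document(pdf_path, **options)
    return protected_path, password, backup
```

Since protect_document() replaces the original in place, taking the backup before the call is the only point at which an unprotected copy can still be saved.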
Similar Components
AI-powered semantic similarity - components with related functionality:
- class DocumentProcessor 57.8% similar
- class Watermarker 56.5% similar
- class PDFManipulator 52.1% similar
- class ControlledDocumentConverter 51.7% similar
- class PDFAConverter 51.6% similar