class DocumentProtector

Maturity: 50

A class that protects PDF documents against editing by applying owner-password encryption and permission restrictions with pikepdf, and uses PyMuPDF to read back the resulting permissions.

File: /tf/active/vicechatdev/document_auditor/src/security/document_protection.py
Lines: 9 - 118
Complexity: moderate

Purpose

DocumentProtector secures PDF files by setting an owner password and configuring permissions: printing and text copying are configurable, while annotation, assembly, form, and other modifications are always denied. It supports three encryption levels (40-bit, 128-bit, AES-256) and generates a random owner password when none is supplied. The user password is left empty, so the document still opens without a password. After encrypting to a temporary file, the class runs a best-effort verification (failures are only logged as warnings) and then replaces the original file with the protected copy.

Source Code

class DocumentProtector:
    """Handles protecting PDF documents from editing"""
    
    def __init__(self):
        self.logger = logging.getLogger(__name__)
    
    def protect_document(self, pdf_path, password=None, allow_printing=True, 
                         allow_copying=True, encryption_level=2):
        """
        Apply protection to prevent editing of the document
        
        Args:
            pdf_path (str): Path to the PDF file
            password (str, optional): Owner password to set (will be generated if None)
            allow_printing (bool): Whether to allow printing
            allow_copying (bool): Whether to allow copying of text
            encryption_level (int): Encryption level (1=40bit, 2=128bit, 3=AES-256)
            
        Returns:
            tuple: (path to protected PDF, owner password)
        """
        if not os.path.exists(pdf_path):
            raise FileNotFoundError(f"PDF file not found: {pdf_path}")
        
        # Generate a password if not provided
        if password is None:
            password = self._generate_secure_password()
        
        try:
            # Use pikepdf for more robust PDF encryption
            temp_output = f"{pdf_path}.protected_tmp"
            
            # Open the PDF
            with pikepdf.open(pdf_path) as pdf:
                # Set permissions
                user_perms = pikepdf.Permissions(
                    accessibility=True,  # Always allow accessibility
                    extract=allow_copying,
                    modify_annotation=False,
                    modify_assembly=False,
                    modify_form=False,
                    modify_other=False,
                    print_lowres=allow_printing,
                    print_highres=allow_printing
                )
                
                # Choose encryption method based on level
                if encryption_level == 1:
                    encryption = pikepdf.Encryption(
                        user="",  # Empty user password allows opening
                        owner=password,
                        allow=user_perms,
                        R=2  # Acrobat 3 (PDF 1.2) 40-bit
                    )
                elif encryption_level == 2:
                    encryption = pikepdf.Encryption(
                        user="",  # Empty user password allows opening
                        owner=password,
                        allow=user_perms,
                        R=4  # Acrobat 6 (PDF 1.5) 128-bit
                    )
                else:  # level 3
                    encryption = pikepdf.Encryption(
                        user="",  # Empty user password allows opening
                        owner=password,
                        allow=user_perms,
                        R=6  # Acrobat X (PDF 1.7) AES-256
                    )
                
                # Save with encryption
                pdf.save(temp_output, encryption=encryption)
            
            # Verify the protection
            self._verify_protection(temp_output, password)
            
            # Replace original with protected version
            os.replace(temp_output, pdf_path)
            
            self.logger.info(f"Applied document protection to: {pdf_path}")
            return pdf_path, password
            
        except Exception as e:
            self.logger.error(f"Error protecting document: {e}")
            raise
    
    def _verify_protection(self, pdf_path, password):
        """Verify that the PDF is properly protected"""
        try:
            # Try to open with pikepdf - should succeed with owner password
            with pikepdf.open(pdf_path, password=password) as pdf:
                pass  # Just testing we can open it
                
            # Try to open with PyMuPDF to verify permissions
            doc = fitz.open(pdf_path)
            # Check permissions using PyMuPDF permissions mask constants if available
            # Note: PyMuPDF versions may have different attribute names
            permissions = doc.permissions
            doc.close()
            
            # PyMuPDF permission constants might vary by version, so we'll just log the raw value
            self.logger.debug(f"Protection verification complete. Raw permissions value: {permissions}")
            
        except Exception as e:
            self.logger.warning(f"Protection verification failed: {e}")
    
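    def _generate_secure_password(self, length=12):
        """Generate a secure random password"""
        # NOTE: random.choice is not a cryptographically secure source;
        # the standard-library secrets module is the usual choice for passwords.
        characters = string.ascii_letters + string.digits + string.punctuation
        password = ''.join(random.choice(characters) for _ in range(length))
        return password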

Parameters

Name Type Default Kind
bases - -

Parameter Details

No constructor parameters: The __init__ method takes no parameters. It only initializes a logger instance for the class.

Return Value

Instantiation returns a DocumentProtector object. The main method protect_document() returns a tuple (path to protected PDF, owner password), both strings. The path is identical to the input path because the original file is replaced in place. The password is the one supplied by the caller or, if none was given, an auto-generated 12-character password.

Class Interface

Methods

__init__(self)

Purpose: Initialize the DocumentProtector instance with a logger

Returns: None - initializes the instance

protect_document(self, pdf_path: str, password: str = None, allow_printing: bool = True, allow_copying: bool = True, encryption_level: int = 2) -> tuple[str, str]

Purpose: Apply protection to a PDF document to prevent editing, with configurable permissions and encryption

Parameters:

  • pdf_path: Path to the PDF file to protect (must exist)
  • password: Owner password to set for protection. If None, a secure password is auto-generated
  • allow_printing: Whether to allow printing the document (both low and high resolution)
  • allow_copying: Whether to allow copying/extracting text from the document
  • encryption_level: Encryption strength: 1=40-bit (Acrobat 3, R=2), 2=128-bit (Acrobat 6, R=4), 3=AES-256 (Acrobat X, R=6). Any value other than 1 or 2 is treated as level 3

Returns: Tuple of (protected_pdf_path: str, owner_password: str). The path is the same as input since original is replaced. Password is either provided or auto-generated.
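
The level-to-revision mapping described above can be sketched as a small lookup. This is an illustration only: the helper name revision_for_level and the LEVEL_TO_REVISION table are hypothetical and not part of the class. Unlike the source, which silently falls through to AES-256 for unknown values, this stricter variant rejects them.

```python
# Hypothetical helper, not part of DocumentProtector: maps the documented
# encryption_level values to the security handler revision (R) that
# protect_document() passes to pikepdf.Encryption.
LEVEL_TO_REVISION = {
    1: 2,  # 40-bit
    2: 4,  # 128-bit
    3: 6,  # AES-256
}


def revision_for_level(encryption_level: int) -> int:
    """Return the R value for a level, raising on unknown levels."""
    try:
        return LEVEL_TO_REVISION[encryption_level]
    except KeyError:
        raise ValueError(
            f"encryption_level must be 1, 2 or 3, got {encryption_level!r}"
        ) from None
```

Validating the level up front is a design choice; it surfaces typos such as encryption_level=4 instead of quietly applying the strongest setting.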

_verify_protection(self, pdf_path: str, password: str) -> None

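Purpose: Internal method that checks the protected PDF can be opened with the owner password via pikepdf, then opens it with PyMuPDF and logs the raw permissions value at debug level. Any exception is caught and logged as a warning, so a failed verification does not abort protect_document()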

Parameters:

  • pdf_path: Path to the protected PDF file to verify
  • password: Owner password to use for verification

Returns: None - logs verification results or warnings
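
Since _verify_protection only logs the raw permissions integer, the following minimal sketch shows one way that value could be interpreted. The bit positions follow the permission flags of the PDF standard security handler; the PdfPermission enum and the denied_permissions function are hypothetical names introduced here, not part of the class or of PyMuPDF.

```python
from enum import IntFlag


class PdfPermission(IntFlag):
    """Permission bits of the PDF standard security handler (P entry)."""
    PRINT = 1 << 2           # bit 3: print
    MODIFY = 1 << 3          # bit 4: modify contents
    COPY = 1 << 4            # bit 5: copy / extract text and graphics
    ANNOTATE = 1 << 5        # bit 6: add or modify annotations
    FORM = 1 << 8            # bit 9: fill in form fields
    ACCESSIBILITY = 1 << 9   # bit 10: extract for accessibility
    ASSEMBLE = 1 << 10       # bit 11: insert, rotate, delete pages
    PRINT_HQ = 1 << 11       # bit 12: high-quality print


def denied_permissions(raw: int) -> list[str]:
    """Return the names of permissions that are cleared in a raw mask.

    The raw value may be negative (a signed 32-bit integer); Python's
    bitwise AND on integers handles that case correctly.
    """
    return [flag.name for flag in PdfPermission if not (raw & flag.value)]
```

In the verification step, the input would be the value read from doc.permissions; a stricter check could then fail when MODIFY is missing from the returned list instead of only logging.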

_generate_secure_password(self, length: int = 12) -> str

Purpose: Internal method to generate a random password from letters, digits, and punctuation. It uses random.choice, which is not a cryptographically secure generator despite the method's name

Parameters:

  • length: Length of the password to generate (default 12 characters)

Returns: String containing the randomly generated password
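
The source builds the password with random.choice, which draws from Python's default pseudo-random generator rather than a cryptographically secure source. A hedged alternative using the standard-library secrets module could look like the sketch below; the function name is illustrative, and the default length simply mirrors the original.

```python
import secrets
import string


def generate_secure_password(length: int = 12) -> str:
    """Build a password from letters, digits and punctuation using a CSPRNG."""
    if length < 1:
        raise ValueError("length must be at least 1")
    alphabet = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice draws from the operating system's secure random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

The alphabet is the same as in the source, so callers that already handle punctuation in passwords would not need to change.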

Attributes

  • logger (logging.Logger, instance scope): Logger instance for the class, initialized with the module name. Used to log info, debug, warning, and error messages during PDF protection operations.

Dependencies

  • logging
  • fitz
  • os
  • pikepdf
  • tempfile
  • random
  • string

Required Imports

import logging
import fitz
import os
import pikepdf
import tempfile
import random
import string

Usage Example

import logging

# Assumed import path, derived from the file location of this class;
# adjust it to match your project layout
from src.security.document_protection import DocumentProtector

# Configure logging (optional but recommended)
logging.basicConfig(level=logging.INFO)

# Instantiate the protector
protector = DocumentProtector()

# Protect a PDF with default settings (auto-generated password, 128-bit encryption)
pdf_path = 'document.pdf'
protected_path, password = protector.protect_document(pdf_path)
print(f'Protected PDF: {protected_path}')
print(f'Owner password: {password}')

# Protect with custom settings
protected_path, password = protector.protect_document(
    pdf_path='document.pdf',
    password='MySecurePassword123!',
    allow_printing=True,
    allow_copying=False,
    encryption_level=3  # AES-256
)

# Protect with minimal permissions
protected_path, password = protector.protect_document(
    pdf_path='sensitive.pdf',
    allow_printing=False,
    allow_copying=False,
    encryption_level=3
)

Best Practices

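  • Store the returned owner password securely; it is needed to change or remove the protection in PDF tools that honor permissions
  • The original PDF file is replaced with the protected version, so make a backup before calling protect_document() if you need the unprotected copy
  • Use encryption_level=3 (AES-256) for the strongest encryption; level 1 (40-bit) is weak by modern standards and only useful for compatibility with very old readers
  • The method writes a temporary file named <pdf_path>.protected_tmp next to the original, so the directory must be writable and have enough free space; the source does not delete this file if an error occurs before the replace step
  • The user password is intentionally left empty so the PDF opens without a password; the owner password only governs permissions. Because the content can be opened without a password, these restrictions depend on the PDF reader enforcing them and should not be treated as confidentiality protection
  • Avoid calling protect_document() repeatedly on the same file: the method opens the PDF without a password, so a second call on an already-protected file would likely re-encrypt it with a new owner password
  • Handle FileNotFoundError when the PDF path doesn't exist; other exceptions are logged and re-raised
  • The logger attribute can be configured externally for custom logging behavior
  • The verification step (_verify_protection) only logs a warning on failure; it does not stop the original file from being replaced
  • Auto-generated passwords include punctuation characters, so make sure any system that stores or displays them handles these correctly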
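
Because protect_document() overwrites the input file, one way to follow the backup advice above is a thin wrapper that copies the file first. This is a sketch under stated assumptions: protect_with_backup and the .bak suffix are hypothetical, and protector stands for any object exposing the protect_document(pdf_path, ...) method documented on this page.

```python
import shutil
from pathlib import Path


def protect_with_backup(protector, pdf_path, backup_suffix=".bak", **kwargs):
    """Copy pdf_path to a sibling backup file, then protect the original.

    Returns (protected_path, owner_password, backup_path).
    """
    source = Path(pdf_path)
    if not source.is_file():
        raise FileNotFoundError(f"PDF file not found: {source}")

    # e.g. report.pdf -> report.pdf.bak (hypothetical naming scheme)
    backup = source.with_name(source.name + backup_suffix)
    shutil.copy2(source, backup)  # copy2 also preserves timestamps

    # Extra keyword arguments (password, allow_printing, ...) pass through.
    protected_path, password = protector.protect_document(str(source), **kwargs)
    return protected_path, password, backup
```

Note that the backup is an unprotected copy of the document, so it should be stored or deleted with the same care as the owner password.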

Similar Components

AI-powered semantic similarity - components with related functionality:

  • class DocumentProcessor 57.8% similar

    A comprehensive document processing class that converts documents to PDF, adds audit trails, applies security features (watermarks, signatures, hashing), and optionally converts to PDF/A format with document protection.

    From: /tf/active/vicechatdev/document_auditor/src/document_processor.py
  • class Watermarker 56.5% similar

    A class that adds watermark images to PDF documents with configurable opacity, scale, and positioning options.

    From: /tf/active/vicechatdev/document_auditor/src/security/watermark.py
  • class PDFManipulator 52.1% similar

    Manipulates existing PDF documents. This class provides methods to add watermarks, merge PDFs, extract pages, and perform other manipulation operations.

    From: /tf/active/vicechatdev/CDocs/utils/pdf_utils.py
  • class ControlledDocumentConverter 51.7% similar

    A comprehensive document converter class that transforms controlled documents into archived PDFs with signature pages, audit trails, hash-based integrity verification, and PDF/A compliance for long-term archival.

    From: /tf/active/vicechatdev/CDocs/utils/document_converter.py
  • class PDFAConverter 51.6% similar

    A class that converts PDF files to PDF/A format for long-term archiving and compliance, supporting multiple compliance levels (1b, 2b, 3b) with fallback conversion methods.

    From: /tf/active/vicechatdev/document_auditor/src/utils/pdf_utils.py