function main_v48
Entry point function that demonstrates document processing workflow by creating an audited, watermarked, and protected PDF/A document from a DOCX file with audit trail data.
/tf/active/vicechatdev/document_auditor/main.py
23 - 114
moderate
Purpose
This function serves as a demonstration and testing entry point for the document processing system. It creates a signatures directory if needed, checks that the sample DOCX document and JSON audit data exist (a missing watermark image only triggers a warning), and runs the document through the DocumentProcessor pipeline to produce a watermarked, signed PDF/A-2b output. It then logs the results of three checks: hash verification, PDF/A compliance, and protection status. The protection check only inspects attributes on the processor object; it does not reopen the generated PDF.
Source Code
def main():
    # Create sample directory structure if it doesn't exist
    signatures_dir = os.path.join(os.path.dirname(__file__), 'signatures')
    if not os.path.exists(signatures_dir):
        os.makedirs(signatures_dir)
        logger.info(f"Created signatures directory: {signatures_dir}")

    # Sample document and audit data
    sample_doc = os.path.join(os.path.dirname(__file__), './examples/test_document_original.docx')
    sample_json = os.path.join(os.path.dirname(__file__), './examples/sample_audit_data.json')
    output_pdf = os.path.join(os.path.dirname(__file__), './examples/audited_document.pdf')
    watermark_path = os.path.join(os.path.dirname(__file__), './examples/ViceBio_Logo_dark blue.png')

    # Check if files exist
    if not os.path.exists(sample_doc):
        logger.error(f"Sample document not found: {sample_doc}")
        return

    if not os.path.exists(sample_json):
        logger.error(f"Audit data JSON not found: {sample_json}")
        return

    if not os.path.exists(watermark_path):
        logger.warning(f"Watermark image not found: {watermark_path}")
        watermark_path = None

    # Initialize document processor
    processor = DocumentProcessor()

    # Process document
    try:
        output_path = processor.process_document(
            original_doc_path=sample_doc,
            json_path=sample_json,
            output_path=output_pdf,
            watermark_image=watermark_path,
            include_signatures=True,
            convert_to_pdfa=True,
            compliance_level='2b',
            finalize=True  # Add this parameter to lock the document
        )
        logger.info(f"Successfully created audited document: {output_path}")

        # Verify document hash using processor's stored hash if available
        if hasattr(processor, '_last_document_hash'):
            logger.info("Using stored document hash for verification")
            stored_hash = processor._last_document_hash
            extracted_hash = None

            try:
                with pikepdf.open(output_path) as pdf:
                    if "/DocumentHash" in pdf.docinfo:
                        hash_json = pdf.docinfo["/DocumentHash"]
                        hash_metadata = json.loads(str(hash_json))
                        extracted_hash = hash_metadata.get("hash")
            except Exception as e:
                logger.warning(f"Could not extract hash from PDF metadata: {e}")

            hash_verified = stored_hash == extracted_hash
            if hash_verified:
                logger.info(f"Document hash verification: Passed ✅")
            else:
                logger.warning(f"Document hash verification: Failed ❌")
        else:
            # Fall back to standard verification
            hash_verified = processor.hash_generator.verify_hash(output_path)
            if hash_verified:
                logger.info(f"Document hash verification: Passed ✅")
            else:
                logger.warning(f"Document hash verification: Failed ❌")

        # Verify PDF/A compliance
        pdfa_compliant = processor.pdfa_converter.validate_pdfa(output_path)
        if pdfa_compliant:
            logger.info(f"PDF/A compliance check: Passed ✅")
        else:
            logger.warning(f"PDF/A compliance check: Failed ❌")

        # Check if document is protected
        is_protected = hasattr(processor, 'document_protector') and hasattr(processor, '_last_owner_password')
        if is_protected:
            logger.info("🔒 Document is protected from editing")
            logger.info(f"Owner password: {getattr(processor, '_last_owner_password', 'Not available')}")
            logger.info("Keep this password in a secure location for administrative access")
        else:
            logger.info("⚠️ Document is not protected from editing")

        logger.info(f"Document processing complete. Output file: {output_path}")

    except Exception as e:
        logger.error(f"Error processing document: {e}", exc_info=True)
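The hash comparison in the source above can be isolated into a small, testable helper. The following is a minimal sketch, not part of main.py: the names `extract_hash` and `hashes_match` are hypothetical, and it assumes the `/DocumentHash` entry holds a JSON object with a "hash" key, as the source code suggests. It takes the metadata payload as a plain string so it can be exercised without pikepdf.

```python
import json
from typing import Optional


def extract_hash(hash_json: Optional[str]) -> Optional[str]:
    """Return the "hash" field from a /DocumentHash JSON payload, or None."""
    if hash_json is None:
        return None
    try:
        metadata = json.loads(hash_json)
    except json.JSONDecodeError:
        # Malformed metadata is treated the same as missing metadata.
        return None
    if not isinstance(metadata, dict):
        return None
    return metadata.get("hash")


def hashes_match(stored_hash: Optional[str], hash_json: Optional[str]) -> bool:
    """Compare a stored hash with the one embedded in the metadata payload."""
    extracted = extract_hash(hash_json)
    # A bare `stored_hash == extracted` would pass when both are None;
    # this sketch deliberately counts that case as a failure.
    return stored_hash is not None and stored_hash == extracted
```

One design difference from the source: there, `stored_hash == extracted_hash` evaluates to True if both values are None. Whether `_last_document_hash` can ever be None is not stated, so the stricter check here is a precaution rather than a confirmed bug fix.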
Return Value
This function always returns None. It works entirely through side effects: creating the signatures directory, writing the output PDF, and logging results. It returns early, after logging an error, if the sample DOCX document or the audit data JSON is not found.
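Because the function reports missing inputs only through the log, a caller has no programmatic way to tell why it stopped. A minimal sketch of the same file checks, returning a structured result instead, might look like the following. `InputCheck` and `check_inputs` are hypothetical names, not part of the documented module, and unlike the source this version collects every problem rather than stopping at the first.

```python
import os
from typing import List, NamedTuple, Optional


class InputCheck(NamedTuple):
    ok: bool                       # True when both required files exist
    watermark_path: Optional[str]  # None when the watermark is missing
    problems: List[str]            # human-readable messages, errors and warnings


def check_inputs(doc_path: str, json_path: str,
                 watermark_path: Optional[str] = None) -> InputCheck:
    """Mirror the existence checks in main() without logging or returning early."""
    problems = []
    if not os.path.exists(doc_path):
        problems.append(f"Sample document not found: {doc_path}")
    if not os.path.exists(json_path):
        problems.append(f"Audit data JSON not found: {json_path}")
    # Only the two files above are required; decide `ok` before the watermark.
    ok = not problems
    if watermark_path is not None and not os.path.exists(watermark_path):
        problems.append(f"Watermark image not found: {watermark_path}")
        watermark_path = None  # processing continues without a watermark
    return InputCheck(ok, watermark_path, problems)
```

A caller could then branch on `result.ok` and pass `result.watermark_path` straight to `process_document`.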
Dependencies
- os
- logging
- json
- sys
- pikepdf
Required Imports
import os
import logging
import json
import sys
import pikepdf
from src.document_processor import DocumentProcessor
Usage Example
# Ensure required files exist in examples directory:
# - examples/test_document_original.docx
# - examples/sample_audit_data.json
# - examples/ViceBio_Logo_dark blue.png (optional)
import os
import logging
import json
import sys
import pikepdf
from src.document_processor import DocumentProcessor
# Configure logger
logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)
# Run the main function (assumes main() is defined in this module, as in main.py)
if __name__ == '__main__':
    main()
# Output will be created at: ./examples/audited_document.pdf
# The function will log verification results for hash, PDF/A compliance, and protection status
Best Practices
- Ensure all required input files exist before calling this function
- Configure logging before calling main() to capture all log messages
- The function creates a 'signatures' directory in the script's directory - ensure write permissions
- The function writes the owner password to the log in plain text at INFO level - store the password in a secure location and restrict access to the log output
- The function expects specific file paths relative to __file__ - adjust paths if running from different locations
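- The function catches and logs every exception raised during processing and always returns None, so callers cannot detect failure from it - when reusing this logic in a larger application, call DocumentProcessor.process_document directly and handle exceptions there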
- The watermark image is optional - the function will continue without it if not found
- Review logged verification results (hash, PDF/A compliance, protection status) to ensure document integrity
- The finalize=True parameter locks the document - ensure this is desired behavior
- The function uses compliance_level='2b' for PDF/A-2b standard - adjust if different compliance is needed
Similar Components
AI-powered semantic similarity - components with related functionality:
- class DocumentProcessor (72.7% similar)
- function main_v18 (69.3% similar)
- function test_document_processing (68.8% similar)
- function main_v1 (66.4% similar)
- function test_document_processor (65.3% similar)