🔍 Code Extractor

function main

Maturity: 52

Main entry point function for a Legal Contract Data Extractor application that processes contracts from FileCloud, extracts data, and exports results to multiple formats (CSV, Excel, JSON).

File: /tf/active/vicechatdev/contract_validity_analyzer/extractor.py
Lines: 747-836
Complexity: complex

Purpose

This function orchestrates the entire contract extraction workflow: it parses command-line arguments, loads configuration, sets up logging, initializes the ContractDataExtractor, processes contracts from the specified FileCloud path, and saves the results as CSV, Excel, and JSON. A top-level try/except catches KeyboardInterrupt and any other exception, the run parameters and a completion summary are logged, and the function returns an exit code suitable for shell integration.

Source Code

def main():
    """Main entry point"""
    args = parse_arguments()
    
    try:
        # Load configuration
        if args.config:
            config = Config(args.config)
        else:
            config = Config()
        
        # Set up logging
        log_dir = "logs"
        os.makedirs(log_dir, exist_ok=True)
        
        log_config = config.get_section('logging')
        if args.verbose:
            log_config['level'] = 'DEBUG'
        
        setup_logging(log_config, log_dir)
        logger = get_logger(__name__)
        
        logger.info("="*80)
        logger.info("Legal Contract Data Extractor")
        logger.info("="*80)
        logger.info(f"FileCloud path: {args.path}")
        logger.info(f"Limit: {args.limit if args.limit else 'No limit'}")
        logger.info(f"Output: {args.output if args.output else 'Auto-generated'}")
        logger.info("="*80 + "\n")
        
        # Initialize extractor
        extractor = ContractDataExtractor(config, limit=args.limit)
        
        # Process contracts
        results = extractor.process_contracts(args.path)
        
        if not results:
            logger.warning("No contracts were successfully extracted")
            return 1
        
        # Generate output filenames
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        
        if args.output:
            # Remove extension if provided
            base_output = args.output.replace('.csv', '').replace('.xlsx', '').replace('.json', '')
            csv_output = f"{base_output}.csv"
            excel_output = f"{base_output}.xlsx"
            json_output = f"{base_output}.json"
        else:
            output_dir = config.get('output_dir', 'output')
            os.makedirs(output_dir, exist_ok=True)
            base_name = f"contracts_extracted_{timestamp}"
            csv_output = os.path.join(output_dir, f"{base_name}.csv")
            excel_output = os.path.join(output_dir, f"{base_name}.xlsx")
            json_output = os.path.join(output_dir, f"{base_name}.json")
        
        # Save to CSV
        extractor.save_to_csv(results, csv_output)
        
        # Save to Excel
        extractor.save_to_excel(results, excel_output)
        
        # Save to JSON
        if args.output_json:
            extractor.save_to_json(results, args.output_json)
        else:
            extractor.save_to_json(results, json_output)
        
        logger.info("\n" + "="*80)
        logger.info("✅ Extraction complete!")
        logger.info(f"Processed: {len(results)} contracts")
        logger.info(f"CSV Output: {csv_output}")
        logger.info(f"Excel Output: {excel_output}")
        if args.output_json:
            logger.info(f"JSON Output: {args.output_json}")
        else:
            logger.info(f"JSON Output: {json_output}")
        logger.info("="*80)
        
        return 0
        
    except KeyboardInterrupt:
        print("\n\nInterrupted by user")
        return 130
    except Exception as e:
        print(f"\n❌ Fatal error: {e}")
        import traceback
        traceback.print_exc()
        return 1

Return Value

Returns an integer exit code: 0 for successful completion, 1 when no contracts were extracted or when any other exception is raised, and 130 for a keyboard interrupt (user cancellation). The value 130 follows the Unix shell convention of 128 plus the signal number (SIGINT is 2).
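
Calling code can branch on these exit codes. The following is a minimal sketch of how a wrapper might interpret them; `EXIT_MESSAGES` and `describe_exit_code` are hypothetical names introduced for illustration and are not part of extractor.py.

```python
from typing import Dict

# Exit codes documented for main(). The mapping itself is illustrative
# and does not exist in extractor.py.
EXIT_MESSAGES: Dict[int, str] = {
    0: "extraction completed",
    1: "fatal error or no contracts extracted",
    130: "interrupted by user",
}


def describe_exit_code(code: int) -> str:
    """Translate an exit code returned by main() into a short message."""
    return EXIT_MESSAGES.get(code, f"unexpected exit code {code}")
```

A scheduler or shell wrapper could log `describe_exit_code(returncode)` after the run and retry only when the code is 1.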

Dependencies

  • os
  • sys
  • json
  • argparse
  • csv
  • pathlib
  • datetime
  • typing
  • re
  • openpyxl
  • openai
  • traceback
  • tempfile

Required Imports

import os
import sys
import json
import argparse
import csv
from pathlib import Path
from datetime import datetime
from datetime import timedelta
from typing import Dict, List, Optional, Any
import re
from config.config import Config
from utils.filecloud_client import FileCloudClient
from utils.document_processor import DocumentProcessor
from utils.logging_utils import setup_logging, get_logger
import openpyxl
from openpyxl.styles import Font, Alignment, PatternFill, Border, Side
from openai import OpenAI
import traceback
import tempfile

Usage Example

# This function is typically called as the main entry point of the application
# Example command-line usage:
# python extractor.py --path /contracts/folder --limit 10 --output results --verbose

# In code:
if __name__ == '__main__':
    sys.exit(main())

# The function expects these dependencies to be defined:
# - parse_arguments() returning args with: config, verbose, path, limit, output, output_json
# - Config class for configuration management
# - ContractDataExtractor class for processing contracts
# - Logging utilities: setup_logging(), get_logger()

# Example args structure:
# args.config = 'config.yaml'  # Optional config file path
# args.verbose = True  # Enable debug logging
# args.path = '/filecloud/contracts'  # FileCloud path to process
# args.limit = 100  # Max contracts to process (None for no limit)
# args.output = 'results'  # Base output filename
# args.output_json = 'results.json'  # Optional JSON output path
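
The real `parse_arguments()` is not shown on this page. The sketch below is one way it could be written so that it produces the attributes listed above. The attribute names come from `main()`; the flag spellings, defaults, and the choice to make `--path` required are assumptions, as is the optional `argv` parameter added here for testability.

```python
import argparse
from typing import List, Optional


def parse_arguments(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser yielding the attributes that main() reads."""
    parser = argparse.ArgumentParser(description="Legal Contract Data Extractor")
    # Flag names below are assumed; only the resulting attribute names
    # (path, limit, output, output_json, config, verbose) are confirmed by main().
    parser.add_argument("--path", required=True, help="FileCloud path to process")
    parser.add_argument("--limit", type=int, default=None,
                        help="Maximum number of contracts (default: no limit)")
    parser.add_argument("--output", default=None,
                        help="Base output filename shared by all formats")
    parser.add_argument("--output-json", default=None,
                        help="Explicit path for the JSON output")
    parser.add_argument("--config", default=None, help="Path to a config file")
    parser.add_argument("--verbose", action="store_true",
                        help="Enable DEBUG logging")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

Note that argparse converts the dash in `--output-json` to an underscore, which is why `main()` can read `args.output_json`.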

Best Practices

  • This function should be called as the main entry point using if __name__ == '__main__': sys.exit(main())
  • Ensure all required custom modules (Config, ContractDataExtractor, FileCloudClient, etc.) are properly implemented before calling
  • The function creates logs/ in the current working directory and, when args.output is not given, the configured output directory (output_dir, default output/); make sure the process has write permission in both locations
  • Handle the return code appropriately in shell scripts or calling code (0=success, 1=error, 130=interrupted)
  • Timestamp-based filenames are used only when args.output is not given; with an explicit base name, a rerun writes to the same paths, so earlier results may be overwritten
  • Verbose mode can be enabled via command-line args to get DEBUG level logging
  • The function catches KeyboardInterrupt separately to allow graceful user cancellation
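  • All exceptions are caught and reported with a full traceback via print() and traceback.print_exc(), not through the logger, because the failure may occur before logging is configured
  • All three formats (CSV, Excel, JSON) are always written: args.output sets the shared base name (a .csv, .xlsx, or .json suffix is stripped), and args.output_json, when given, overrides the JSON path only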
  • The limit parameter allows testing with a subset of contracts before processing all
  • Configuration can be loaded from a file or use defaults if no config file is specified
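
The output-naming rules can be sketched as a standalone helper. `derive_output_paths` is a hypothetical function, not part of extractor.py. Unlike the chained `str.replace` calls in `main()`, which would also remove `.csv`, `.xlsx`, or `.json` from the middle of a path, this version uses `os.path.splitext` so that only a trailing extension is stripped. It does not create directories and ignores the separate JSON override.

```python
import os
from datetime import datetime
from typing import Optional, Tuple

# Extensions that main() strips from a user-supplied base name.
KNOWN_EXTENSIONS = {".csv", ".xlsx", ".json"}


def derive_output_paths(
    output: Optional[str],
    output_dir: str = "output",
    now: Optional[datetime] = None,
) -> Tuple[str, str, str]:
    """Return (csv_path, excel_path, json_path) for one extraction run."""
    if output:
        base, ext = os.path.splitext(output)
        # Keep the full name when the suffix is not one of the known formats.
        if ext.lower() not in KNOWN_EXTENSIONS:
            base = output
    else:
        # Same timestamp format as main(): YYYYMMDD_HHMMSS.
        stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
        base = os.path.join(output_dir, f"contracts_extracted_{stamp}")
    return f"{base}.csv", f"{base}.xlsx", f"{base}.json"
```

Passing `now` explicitly keeps the helper deterministic in tests; in normal use it can be omitted.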

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function main_v4 77.0% similar

    Main entry point function for the Contract Validity Analyzer application that orchestrates configuration loading, logging setup, FileCloud connection, and contract analysis execution.

    From: /tf/active/vicechatdev/contract_validity_analyzer/main.py
  • function parse_arguments_v1 72.3% similar

    Parses command-line arguments for a legal contract extraction tool that processes documents from FileCloud storage.

    From: /tf/active/vicechatdev/contract_validity_analyzer/extractor.py
  • function main_v41 67.0% similar

    Entry point function that parses command-line arguments and orchestrates the FileCloud email processing workflow to find, download, and convert .msg files.

    From: /tf/active/vicechatdev/msg_to_eml.py
  • function test_llm_extraction 65.3% similar

    A test function that validates LLM-based contract data extraction by processing a sample contract and verifying the extracted fields against expected values.

    From: /tf/active/vicechatdev/contract_validity_analyzer/test_extractor.py
  • class ContractAnalyzer 65.1% similar

    Main class for analyzing contract validity from FileCloud documents.

    From: /tf/active/vicechatdev/contract_validity_analyzer/core/analyzer.py