function main
Main entry point function for a Legal Contract Data Extractor application that processes contracts from FileCloud, extracts data, and exports results to multiple formats (CSV, Excel, JSON).
/tf/active/vicechatdev/contract_validity_analyzer/extractor.py
747 - 836
complex
Purpose
This function orchestrates the entire contract extraction workflow: parses command-line arguments, loads configuration, sets up logging, initializes the ContractDataExtractor, processes contracts from a specified FileCloud path, and saves results to multiple output formats. It handles errors gracefully, provides progress logging, and returns appropriate exit codes for shell integration.
Source Code
def main():
    """Main entry point"""
    args = parse_arguments()
    try:
        # Load configuration
        if args.config:
            config = Config(args.config)
        else:
            config = Config()
        # Set up logging
        log_dir = "logs"
        os.makedirs(log_dir, exist_ok=True)
        log_config = config.get_section('logging')
        if args.verbose:
            log_config['level'] = 'DEBUG'
        setup_logging(log_config, log_dir)
        logger = get_logger(__name__)
        logger.info("="*80)
        logger.info("Legal Contract Data Extractor")
        logger.info("="*80)
        logger.info(f"FileCloud path: {args.path}")
        logger.info(f"Limit: {args.limit if args.limit else 'No limit'}")
        logger.info(f"Output: {args.output if args.output else 'Auto-generated'}")
        logger.info("="*80 + "\n")
        # Initialize extractor
        extractor = ContractDataExtractor(config, limit=args.limit)
        # Process contracts
        results = extractor.process_contracts(args.path)
        if not results:
            logger.warning("No contracts were successfully extracted")
            return 1
        # Generate output filenames
        timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
        if args.output:
            # Remove extension if provided
            base_output = args.output.replace('.csv', '').replace('.xlsx', '').replace('.json', '')
            csv_output = f"{base_output}.csv"
            excel_output = f"{base_output}.xlsx"
            json_output = f"{base_output}.json"
        else:
            output_dir = config.get('output_dir', 'output')
            os.makedirs(output_dir, exist_ok=True)
            base_name = f"contracts_extracted_{timestamp}"
            csv_output = os.path.join(output_dir, f"{base_name}.csv")
            excel_output = os.path.join(output_dir, f"{base_name}.xlsx")
            json_output = os.path.join(output_dir, f"{base_name}.json")
        # Save to CSV
        extractor.save_to_csv(results, csv_output)
        # Save to Excel
        extractor.save_to_excel(results, excel_output)
        # Save to JSON
        if args.output_json:
            extractor.save_to_json(results, args.output_json)
        else:
            extractor.save_to_json(results, json_output)
        logger.info("\n" + "="*80)
        logger.info("Extraction complete!")
        logger.info(f"Processed: {len(results)} contracts")
        logger.info(f"CSV Output: {csv_output}")
        logger.info(f"Excel Output: {excel_output}")
        if args.output_json:
            logger.info(f"JSON Output: {args.output_json}")
        else:
            logger.info(f"JSON Output: {json_output}")
        logger.info("="*80)
        return 0
    except KeyboardInterrupt:
        print("\n\nInterrupted by user")
        return 130
    except Exception as e:
        print(f"\nFatal error: {e}")
        import traceback
        traceback.print_exc()
        return 1
Return Value
Returns an integer exit code: 0 on successful completion; 1 when no contracts were extracted or a fatal exception occurred; 130 on keyboard interrupt (user cancellation). The value 130 follows the Unix shell convention of 128 plus the signal number (SIGINT is 2).
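The exit-code contract can be sketched from the caller's side. The helper below is hypothetical (`describe_exit_code` is not part of extractor.py); it only encodes the three codes described above and shows that 130 is 128 plus the SIGINT signal number.

```python
import signal

# Hypothetical helper, not part of extractor.py: maps the exit codes
# documented for main() to short human-readable descriptions.
EXIT_DESCRIPTIONS = {
    0: "success",
    1: "fatal error or no contracts extracted",
    128 + int(signal.SIGINT): "interrupted by user",  # 128 + 2 = 130
}


def describe_exit_code(code: int) -> str:
    """Return a short description for an exit code returned by main()."""
    return EXIT_DESCRIPTIONS.get(code, f"unexpected exit code {code}")
```

A wrapper script could, for example, pass the `returncode` attribute of the result of `subprocess.run(...)` to this helper to decide whether to retry, alert, or continue.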
Dependencies
os, sys, json, argparse, csv, pathlib, datetime, typing, re, openpyxl, openai, traceback, tempfile
Required Imports
import os
import sys
import json
import argparse
import csv
from pathlib import Path
from datetime import datetime
from datetime import timedelta
from typing import Dict, List, Optional, Any
import re
from config.config import Config
from utils.filecloud_client import FileCloudClient
from utils.document_processor import DocumentProcessor
from utils.logging_utils import setup_logging, get_logger
import openpyxl
from openpyxl.styles import Font, Alignment, PatternFill, Border, Side
from openai import OpenAI
import traceback
import tempfile
Usage Example
# This function is typically called as the main entry point of the application
# Example command-line usage:
# python extractor.py --path /contracts/folder --limit 10 --output results --verbose
# In code:
if __name__ == '__main__':
    sys.exit(main())
# The function expects these dependencies to be defined:
# - parse_arguments() returning args with: config, verbose, path, limit, output, output_json
# - Config class for configuration management
# - ContractDataExtractor class for processing contracts
# - Logging utilities: setup_logging(), get_logger()
# Example args structure:
# args.config = 'config.yaml' # Optional config file path
# args.verbose = True # Enable debug logging
# args.path = '/filecloud/contracts' # FileCloud path to process
# args.limit = 100 # Max contracts to process (None for no limit)
# args.output = 'results' # Base output filename
# args.output_json = 'results.json' # Optional JSON output path
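The source of parse_arguments() is not shown on this page, so the following is only a minimal sketch of a parser that would produce the attributes listed above. The flag spellings, the defaults, and the choice to make `--path` required are assumptions, not the actual implementation.

```python
import argparse
from typing import List, Optional


def parse_arguments(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical sketch of the parser that main() relies on."""
    parser = argparse.ArgumentParser(description="Legal Contract Data Extractor")
    # Assumption: the FileCloud path is mandatory.
    parser.add_argument("--path", required=True, help="FileCloud path to process")
    parser.add_argument("--config", default=None, help="Optional config file path")
    parser.add_argument("--limit", type=int, default=None,
                        help="Max contracts to process (omit for no limit)")
    parser.add_argument("--output", default=None, help="Base output filename")
    # argparse stores this flag as args.output_json.
    parser.add_argument("--output-json", default=None, help="Optional JSON output path")
    parser.add_argument("--verbose", action="store_true", help="Enable DEBUG logging")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

Accepting an explicit `argv` list keeps the parser testable without touching `sys.argv`.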
Best Practices
- This function should be called as the main entry point using if __name__ == '__main__': sys.exit(main())
- Ensure all required custom modules (Config, ContractDataExtractor, FileCloudClient, etc.) are properly implemented before calling
- The function automatically creates the logs/ directory and, when args.output is not set, the configured output directory (default output/); ensure the working directory is writable
- Handle the return code appropriately in shell scripts or calling code (0=success, 1=error, 130=interrupted)
- When args.output is not set, the function uses timestamp-based filenames to avoid overwriting previous results; an explicit args.output base name is used as-is, without a timestamp
- Verbose mode can be enabled via command-line args to get DEBUG level logging
- The function catches KeyboardInterrupt separately to allow graceful user cancellation
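- All other exceptions are caught, printed with a full traceback, and reported with exit code 1; they are printed rather than logged, so they appear even if logging setup itself failed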
- Output files are always generated in three formats (CSV, Excel, JSON); args.output sets a shared base name, and args.output_json overrides only the JSON path
- The limit parameter allows testing with a subset of contracts before processing all
- Configuration can be loaded from a file or use defaults if no config file is specified
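The output-naming rules can be sketched as a standalone function. `build_output_paths` is a hypothetical helper, not part of extractor.py. It mirrors the two branches in main() but strips only a trailing known extension via os.path.splitext, whereas the chained str.replace calls in the source remove ".csv", ".xlsx", or ".json" wherever they occur in the name (so "my.csv.data" would become "my.data"). Unlike main(), it does not create the output directory.

```python
import os
from datetime import datetime
from typing import Dict, Optional

KNOWN_EXTENSIONS = (".csv", ".xlsx", ".json")


def build_output_paths(output: Optional[str], output_dir: str = "output",
                       now: Optional[datetime] = None) -> Dict[str, str]:
    """Hypothetical helper: derive the CSV, Excel, and JSON output paths."""
    if output:
        base, ext = os.path.splitext(output)
        # Strip only a trailing known extension; keep any other name intact.
        if ext.lower() not in KNOWN_EXTENSIONS:
            base = output
    else:
        # No explicit name: fall back to a timestamped file in output_dir.
        timestamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
        base = os.path.join(output_dir, f"contracts_extracted_{timestamp}")
    return {"csv": f"{base}.csv", "excel": f"{base}.xlsx", "json": f"{base}.json"}
```

Passing `now` explicitly makes the timestamped branch deterministic, which is convenient in tests.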
Similar Components
AI-powered semantic similarity - components with related functionality:
- function main_v4 77.0% similar
- function parse_arguments_v1 72.3% similar
- function main_v41 67.0% similar
- function test_llm_extraction 65.3% similar
- class ContractAnalyzer 65.1% similar