🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "documentprocessor"

Found 29 matching components

  • class DocumentProcessor_v4

    Process different document types for RAG context extraction

    File: /tf/active/vicechatdev/offline_docstore_multi_vice.py

    class documentprocessor
  • class DocumentProcessor_v5

    Process different document types for RAG context extraction

    File: /tf/active/vicechatdev/offline_docstore_multi.py

    class documentprocessor
  • class DocumentProcessor_v3

    Handles document processing and text extraction using llmsherpa (same approach as offline_docstore_multi_vice.py).

    File: /tf/active/vicechatdev/docchat/document_processor.py

    class documentprocessor
  • class DocumentProcessor_v7

    Process different document types for indexing

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    class documentprocessor
  • class DocumentIndexer

    A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    document-indexing vector-database chromadb embeddings pdf-processing
  • function test_document_processor

    A test function that validates the DocumentProcessor component's ability to extract text from PDF files with improved error handling and llmsherpa integration.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_improved_processor.py

    testing document-processing pdf-extraction text-extraction integration-test
  • function test_extraction_methods

    A test function that compares two PDF text extraction methods (regular llmsherpa and OCR-based Tesseract) on a specific purchase order document from FileCloud, checking for vendor name detection.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_extraction_methods.py

    testing pdf-extraction ocr document-processing text-extraction
  • function test_local_document

    Integration test function that validates end date extraction from a local PDF document using document processing and LLM-based analysis.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_local_document.py

    testing integration-test document-processing pdf-extraction llm
  • function test_single_document

    Tests end date extraction from a specific PDF document by downloading it from FileCloud, extracting text, and using LLM-based analysis to identify contract expiry dates.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_single_document.py

    testing integration-test document-processing pdf-extraction contract-analysis
  • function test_document_processing

    A test function that validates document processing functionality by creating a test PDF file, processing it through a DocumentProcessor, and verifying the extraction results or error handling.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_implementation.py

    testing document-processing pdf-extraction integration-test contract-analysis
  • class TestDocumentProcessor

    A test subclass of DocumentProcessor that simulates llmsherpa PDF processing failures and triggers OCR fallback mechanisms for testing purposes.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_fallback.py

    testing document-processing pdf ocr fallback
  • class ContractDataExtractor

    Extract structured data from legal contracts using LLM analysis

    File: /tf/active/vicechatdev/contract_validity_analyzer/extractor.py

    class contractdataextractor
  • function debug_download

    A diagnostic function that downloads a PDF document from FileCloud, analyzes its content to verify it's a valid PDF, and tests text extraction capabilities.

    File: /tf/active/vicechatdev/contract_validity_analyzer/debug_download.py

    debugging diagnostics filecloud pdf download
  • function test_end_date_extraction

    Tests end date extraction functionality for contract documents that previously had missing end dates by downloading documents from FileCloud, extracting text, analyzing with LLM, and comparing results.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_missing_end_dates.py

    testing integration-test contract-analysis end-date-extraction document-processing
  • function explore_documents

    Explores and tests document accessibility across multiple FileCloud directory paths, attempting to download and validate document content from various locations in a hierarchical search pattern.

    File: /tf/active/vicechatdev/contract_validity_analyzer/explore_documents.py

    filecloud document-exploration diagnostic file-search pdf-validation
  • function test_ocr_retry_logic

    Tests the OCR retry logic for extracting contract end dates by first attempting normal text extraction, then falling back to OCR-based extraction if the end date is not found.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_retry.py

    testing ocr document-processing pdf-extraction contract-analysis
  • class ContractAnalyzer

    Main class for analyzing contract validity from FileCloud documents.

    File: /tf/active/vicechatdev/contract_validity_analyzer/core/analyzer.py

    class contractanalyzer
  • class DocumentProcessor_v1

    A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

    File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_new.py

    document-processing text-extraction pdf-processing word-processing llmsherpa
  • class DocumentProcessor_v2

    A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

    File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_old.py

    document-processing text-extraction pdf-processing word-processing llmsherpa
  • function main_v49

    Entry point function that demonstrates document processing workflow by creating an audited, watermarked, and protected PDF/A document from a DOCX file with audit trail data.

    File: /tf/active/vicechatdev/document_auditor/main.py

    document-processing pdf-generation audit-trail watermarking pdf-a-compliance
  • class DocumentProcessor

    A comprehensive document processing class that converts documents to PDF, adds audit trails, applies security features (watermarks, signatures, hashing), and optionally converts to PDF/A format with document protection.

    File: /tf/active/vicechatdev/document_auditor/src/document_processor.py

    document-processing pdf-generation audit-trail security watermarking
  • class DocumentProcessor_v6

    Lightweight document processor for chat upload functionality

    File: /tf/active/vicechatdev/vice_ai/document_processor.py

    class documentprocessor
  • function api_upload_document_v1

    Flask API endpoint that handles document file uploads, validates file type and size, stores the file temporarily, and extracts basic text content for processing.

    File: /tf/active/vicechatdev/vice_ai/new_app.py

    file-upload document-processing api-endpoint flask validation
  • function smartstat_upload_files

    Flask API endpoint that handles multi-file uploads (CSV, Excel, PDF, Word, PowerPoint) to a SmartStat session, processing data files as datasets and documents as information sheets.

    File: /tf/active/vicechatdev/vice_ai/new_app.py

    file-upload multi-file csv-processing excel-processing pdf-extraction
  • function test_enhanced_pdf_processing

    A comprehensive test function that validates PDF processing capabilities, including text extraction, cleaning, chunking, and table detection across multiple PDF processing libraries.

    File: /tf/active/vicechatdev/vice_ai/test_enhanced_pdf.py

    testing pdf-processing document-processing diagnostic text-extraction
  • function test_extraction_debugging

    A test function that validates the extraction debugging functionality of a DocumentProcessor by creating test files, simulating document extraction, and verifying debug log creation.

    File: /tf/active/vicechatdev/vice_ai/test_extraction_debug.py

    testing debugging document-processing file-operations unit-test
  • function publish_document_v1

    Publishes a controlled document by incrementing its version to the next major revision, converting it to a signed PDF, and updating its status to PUBLISHED.

    File: /tf/active/vicechatdev/CDocs/controllers/document_controller.py

    document-management publishing version-control pdf-generation workflow
  • function convert_document_to_pdf

    Converts a document version to PDF format with audit trail, signatures, watermarks, and PDF/A compliance options, then uploads the result to FileCloud storage.

    File: /tf/active/vicechatdev/CDocs/controllers/document_controller.py

    document-conversion pdf-generation document-management audit-trail compliance
  • function prepare_audit_data_for_document_processor

    Prepares comprehensive audit data for a controlled document version, aggregating information from document history, reviews, approvals, and audit events into a structured format for DocumentProcessor.

    File: /tf/active/vicechatdev/CDocs/controllers/document_controller.py

    audit document-management compliance data-aggregation pdf-generation
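
Several of the results above describe the same pattern: try a primary text extractor (llmsherpa), then fall back to other backends (PyPDF2, pdfplumber, OCR) when it fails or returns nothing. The listing does not show how any of these classes implement it, so the following is only a minimal sketch of such a fallback chain. The function extract_with_fallback and the three stub extractors are hypothetical stand-ins, not code from the listed files, and they do not call the real libraries.

```python
from typing import Callable, Optional, Sequence, Tuple

# An extractor takes a file path and returns the extracted text.
Extractor = Callable[[str], str]


def extract_with_fallback(
    path: str, extractors: Sequence[Tuple[str, Extractor]]
) -> Tuple[Optional[str], str]:
    """Try each extractor in order.

    Returns (method_name, text) for the first extractor that yields
    non-empty text, or (None, "") if every extractor fails.
    """
    for name, extract in extractors:
        try:
            text = extract(path)
        except Exception:
            # A failing backend should not abort the whole chain.
            continue
        if text and text.strip():
            return name, text
    return None, ""


# Hypothetical stubs standing in for real backends (assumptions for
# illustration only; none of these touch llmsherpa, PyPDF2, or Tesseract).
def primary_parser(path: str) -> str:
    raise RuntimeError("parser service unavailable")


def plain_pdf_reader(path: str) -> str:
    return ""  # e.g. a scanned PDF with no text layer


def ocr_reader(path: str) -> str:
    return "sample text from OCR"


method, text = extract_with_fallback(
    "contract.pdf",  # placeholder path; the stubs never open it
    [
        ("primary", primary_parser),
        ("pdf_reader", plain_pdf_reader),
        ("ocr", ocr_reader),
    ],
)
print(method, repr(text))
```

Keeping the backends in an ordered list makes the priority explicit and lets a test swap in a failing primary extractor, which is roughly what the TestDocumentProcessor result above is described as doing.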
