🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "OCR"

Found 17 matching components

  • class RegulatoryExtractor

    A class for extracting structured metadata from regulatory guideline PDF documents using LLM-based analysis and storing the results in an Excel tracking spreadsheet.

    File: /tf/active/vicechatdev/reg_extractor.py

    pdf-extraction regulatory-documents llm-extraction ocr data-extraction
  • class DocumentProcessor_v4

    Handles document processing and text extraction using llmsherpa (same approach as offline_docstore_multi_vice.py).

    File: /tf/active/vicechatdev/docchat/document_processor.py

    class documentprocessor
  • class DocumentProcessor_v8

    Processes different document types for indexing.

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    class documentprocessor
  • class DocumentIndexer

    A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    document-indexing vector-database chromadb embeddings pdf-processing
  • function test_extraction_methods

    A test function that compares two PDF text extraction methods (regular llmsherpa and OCR-based Tesseract) on a specific purchase order document from FileCloud, checking for vendor name detection.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_extraction_methods.py

    testing pdf-extraction ocr document-processing text-extraction
  • class TestDocumentProcessor

    A test subclass of DocumentProcessor that simulates llmsherpa PDF processing failures and triggers OCR fallback mechanisms for testing purposes.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_fallback.py

    testing document-processing pdf ocr fallback
  • function test_ocr_fallback

    A test function that validates OCR fallback functionality when the primary llmsherpa PDF text extraction method fails.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_fallback.py

    testing ocr pdf-processing text-extraction fallback-mechanism
  • class ContractDataExtractor

    Extracts structured data from legal contracts using LLM analysis.

    File: /tf/active/vicechatdev/contract_validity_analyzer/extractor.py

    class contractdataextractor
  • function setup_test_logging_v3

    Configures Python logging with both console and file output for test execution, returning a logger instance for the calling module.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_retry.py

    logging testing configuration setup debugging
  • function test_ocr_retry_logic

    Tests the OCR retry logic for extracting contract end dates by first attempting normal text extraction, then falling back to OCR-based extraction if the end date is not found.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_retry.py

    testing ocr document-processing pdf-extraction contract-analysis
  • class ContractAnalyzer

    Main class for analyzing contract validity from FileCloud documents.

    File: /tf/active/vicechatdev/contract_validity_analyzer/core/analyzer.py

    class contractanalyzer
  • class DocumentProcessor_v7

    Lightweight document processor for chat upload functionality.

    File: /tf/active/vicechatdev/vice_ai/document_processor.py

    class documentprocessor
  • function test_enhanced_pdf_processing

    A comprehensive test function that validates PDF processing capabilities, including text extraction, cleaning, chunking, and table detection across multiple PDF processing libraries.

    File: /tf/active/vicechatdev/vice_ai/test_enhanced_pdf.py

    testing pdf-processing document-processing diagnostic text-extraction
  • class DocumentProcessor_v3

    A comprehensive PDF document processor that handles text extraction, OCR (Optical Character Recognition), layout analysis, table detection, and metadata extraction from PDF files.

    File: /tf/active/vicechatdev/invoice_extraction/core/document_processor.py

    pdf-processing ocr text-extraction document-processing invoice-processing
  • class BEExtractor

    Belgium-specific invoice data extractor that uses an LLM (large language model) to extract structured invoice data from Belgian invoices in multiple languages (English, French, Dutch).

    File: /tf/active/vicechatdev/invoice_extraction/extractors/be_extractor.py

    invoice-extraction belgium llm ocr document-processing
  • class AUExtractor

    Australia-specific invoice data extractor that uses an LLM (large language model) to extract structured invoice data from Australian tax invoices, handling ABN, ACN, GST, and BSB numbers as well as Australian date formats.

    File: /tf/active/vicechatdev/invoice_extraction/extractors/au_extractor.py

    invoice-extraction australia llm ocr document-processing
  • class BaseExtractor

    Abstract base class that defines the interface and shared functionality for entity-specific invoice data extractors (UK, BE, AU), providing a multi-stage extraction pipeline for invoice processing.

    File: /tf/active/vicechatdev/invoice_extraction/extractors/base_extractor.py

    invoice-processing data-extraction abstract-base-class OCR document-processing
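
Several of the results above (TestDocumentProcessor, test_ocr_fallback, test_ocr_retry_logic) describe the same pattern: try the regular text extraction first, then fall back to OCR if it fails or if the needed field is missing. The following is a minimal sketch of that pattern only, not the code from the listed files. The function name `extract_with_ocr_fallback`, its parameters, and the stand-in extractors are all hypothetical; in the real components the two callables would presumably wrap llmsherpa and Tesseract.

```python
from typing import Callable, Optional, Tuple


def extract_with_ocr_fallback(
    pdf_path: str,
    primary: Callable[[str], str],
    ocr: Callable[[str], str],
    is_sufficient: Optional[Callable[[str], bool]] = None,
) -> Tuple[str, str]:
    """Return (text, method), where method is "primary" or "ocr".

    Hypothetical helper: `primary` and `ocr` are injected callables that
    take a file path and return extracted text.
    """
    # Default check: any non-whitespace text counts as a usable extraction.
    check = is_sufficient or (lambda text: bool(text.strip()))
    try:
        text = primary(pdf_path)
    except Exception:
        # A failure in the primary extractor triggers the fallback
        # instead of propagating to the caller.
        text = ""
    if check(text):
        return text, "primary"
    # Fallback (or retry): run the OCR-based extractor on the same file.
    return ocr(pdf_path), "ocr"


# Stand-in extractors for illustration; they do not read any file.
def failing_primary(path: str) -> str:
    raise RuntimeError("simulated parser failure")


def fake_ocr(path: str) -> str:
    return "Contract end date: 2025-12-31"


text, method = extract_with_ocr_fallback("contract.pdf", failing_primary, fake_ocr)
```

Passing an `is_sufficient` check such as `lambda t: "end date" in t.lower()` would cover the retry case, where the primary extraction succeeds but the field of interest is absent from its output.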

Search Examples