🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "tesseract"

Found 4 matching component(s)

  • class DocumentProcessor_v4

    Handles document processing and text extraction using llmsherpa (same approach as offline_docstore_multi_vice.py).

    File: /tf/active/vicechatdev/docchat/document_processor.py

    class documentprocessor
  • class DocumentProcessor_v8

    Process different document types for indexing

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    class documentprocessor
  • function test_extraction_methods

    A test function that compares two PDF text extraction methods (regular llmsherpa and OCR-based Tesseract) on a specific purchase order document from FileCloud, checking for vendor name detection.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_extraction_methods.py

    testing pdf-extraction ocr document-processing text-extraction
  • class DocumentProcessor_v3

    A comprehensive PDF document processor that handles text extraction, OCR (Optical Character Recognition), layout analysis, table detection, and metadata extraction from PDF files.

    File: /tf/active/vicechatdev/invoice_extraction/core/document_processor.py

    pdf-processing ocr text-extraction document-processing invoice-processing

Search Examples