🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "llmsherpa"

Found 16 matching components

  • class RegulatoryExtractor

    A class for extracting structured metadata from regulatory guideline PDF documents using LLM-based analysis and storing the results in an Excel tracking spreadsheet.

    File: /tf/active/vicechatdev/reg_extractor.py

    pdf-extraction regulatory-documents llm-extraction ocr data-extraction
  • class DocumentProcessor_v5

    Process different document types for RAG context extraction

    File: /tf/active/vicechatdev/offline_docstore_multi_vice.py

    class documentprocessor
  • class DocumentProcessor_v6

    Process different document types for RAG context extraction

    File: /tf/active/vicechatdev/offline_docstore_multi.py

    class documentprocessor
  • class DocumentProcessor_v4

    Handles document processing and text extraction using llmsherpa (same approach as offline_docstore_multi_vice.py).

    File: /tf/active/vicechatdev/docchat/document_processor.py

    class documentprocessor
  • class DocumentProcessor_v8

    Process different document types for indexing

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    class documentprocessor
  • function test_document_processor

    A test function that validates the DocumentProcessor component's ability to extract text from PDF files with improved error handling and llmsherpa integration.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_improved_processor.py

    testing document-processing pdf-extraction text-extraction integration-test
  • function test_extraction_methods

    A test function that compares two PDF text extraction methods (regular llmsherpa and OCR-based Tesseract) on a specific purchase order document from FileCloud, checking for vendor name detection.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_extraction_methods.py

    testing pdf-extraction ocr document-processing text-extraction
  • class TestDocumentProcessor

    A test subclass of DocumentProcessor that simulates llmsherpa PDF processing failures and triggers OCR fallback mechanisms for testing purposes.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_fallback.py

    testing document-processing pdf ocr fallback
  • function test_ocr_fallback

    A test function that validates OCR fallback functionality when the primary llmsherpa PDF text extraction method fails.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_fallback.py

    testing ocr pdf-processing text-extraction fallback-mechanism
  • class DocumentProcessor_v1

    A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

    File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_new.py

    document-processing text-extraction pdf-processing word-processing llmsherpa
  • class DocumentProcessor_v2

    A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

    File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_old.py

    document-processing text-extraction pdf-processing word-processing llmsherpa
  • class DocumentProcessor_v7

    Lightweight document processor for chat upload functionality

    File: /tf/active/vicechatdev/vice_ai/document_processor.py

    class documentprocessor
  • function test_enhanced_pdf_processing

    A comprehensive test function that validates PDF processing capabilities, including text extraction, cleaning, chunking, and table detection across multiple PDF processing libraries.

    File: /tf/active/vicechatdev/vice_ai/test_enhanced_pdf.py

    testing pdf-processing document-processing diagnostic text-extraction
  • class PDFTextExtractor

    A class for extracting text, images, and structured content from PDF documents with layout preservation capabilities.

    File: /tf/active/vicechatdev/CDocs/utils/pdf_utils.py

    pdf text-extraction document-processing layout-analysis markdown-conversion
  • class DocumentDetail_v2

    Document detail view component

    File: /tf/active/vicechatdev/CDocs/ui/document_detail.py

    class documentdetail
  • class DocumentDownloader

    A client class for downloading documents (primarily PDFs) from various sources, managing download caching, respecting rate limits per domain, and processing documents using llmsherpa for content extraction.

    File: /tf/active/vicechatdev/QA_updater/data_access/document_downloader.py

    document-download pdf-processing rate-limiting caching llmsherpa
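
Several of the results above describe the same pattern: try llmsherpa first, then fall back to another extractor (PyPDF2, pdfplumber, OCR) when it fails or returns nothing. The sketch below illustrates that pattern only; it is not taken from any of the listed files. The names `extract_with_fallback`, `llmsherpa_extract`, and `pdfplumber_extract` are hypothetical, as is the `LLMSHERPA_API_URL` value, which assumes a locally hosted parsing service.

```python
from typing import Callable, Iterable, Tuple

# Assumption: URL of a self-hosted llmsherpa-compatible parsing service.
LLMSHERPA_API_URL = "http://localhost:5010/api/parseDocument?renderFormat=all"

Extractor = Callable[[str], str]


def extract_with_fallback(
    path: str, extractors: Iterable[Tuple[str, Extractor]]
) -> Tuple[str, str]:
    """Try each (name, extractor) in order.

    Returns (name, text) for the first extractor that yields non-blank text.
    Raises RuntimeError listing every failure if none succeeds.
    """
    errors = []
    for name, extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # any failure triggers the next fallback
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))


def llmsherpa_extract(path: str) -> str:
    """Primary method: layout-aware parsing via llmsherpa (hypothetical wrapper)."""
    from llmsherpa.readers import LayoutPDFReader  # imported lazily

    reader = LayoutPDFReader(LLMSHERPA_API_URL)
    return reader.read_pdf(path).to_text()


def pdfplumber_extract(path: str) -> str:
    """Fallback method: page-by-page text via pdfplumber (hypothetical wrapper)."""
    import pdfplumber  # imported lazily

    with pdfplumber.open(path) as pdf:
        # extract_text() may return None for pages without a text layer
        return "\n".join(page.extract_text() or "" for page in pdf.pages)
```

A call such as `extract_with_fallback("contract.pdf", [("llmsherpa", llmsherpa_extract), ("pdfplumber", pdfplumber_extract)])` would return the name of the method that succeeded together with its text. An OCR step could be appended to the list in the same way.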

Search Examples