Search - Code Extractor

Search Components

Search Results for "llmsherpa"

Found 16 matching components.

class RegulatoryExtractor

A class for extracting structured metadata from regulatory guideline PDF documents using LLM-based analysis and storing the results in an Excel tracking spreadsheet.

File: /tf/active/vicechatdev/reg_extractor.py

Tags: pdf-extraction, regulatory-documents, llm-extraction, ocr, data-extraction

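The listing does not show how RegulatoryExtractor turns an LLM reply into a spreadsheet row. The following is a minimal sketch of one common pattern, assuming the LLM is asked to answer with a JSON object. The column names, `parse_llm_metadata` and `to_row` are hypothetical and are not taken from reg_extractor.py.

```python
import json
import re
from typing import Dict, List

# Hypothetical tracker columns; the real spreadsheet layout is not shown in this listing.
COLUMNS: List[str] = ["title", "issuing_agency", "document_number", "effective_date"]


def parse_llm_metadata(response: str) -> Dict[str, str]:
    """Extract the JSON object from an LLM reply and map it onto the fixed columns."""
    match = re.search(r"\{.*\}", response, re.DOTALL)  # first "{" to last "}"
    if match is None:
        raise ValueError("no JSON object found in LLM response")
    data = json.loads(match.group(0))
    # Missing or null fields become empty cells; unexpected keys are ignored.
    return {col: "" if data.get(col) is None else str(data[col]) for col in COLUMNS}


def to_row(record: Dict[str, str]) -> List[str]:
    """Order the values to match COLUMNS, ready to append as a spreadsheet row."""
    return [record[col] for col in COLUMNS]


# Illustrative LLM reply with surrounding chatter and a null field.
reply = ('Here is the extracted metadata: {"title": "Guideline on X", '
         '"issuing_agency": "Example Agency", "effective_date": null} '
         'Let me know if anything is missing.')
record = parse_llm_metadata(reply)
row = to_row(record)
```

A row produced this way could then be written to the tracking workbook, for example with openpyxl's `Worksheet.append`; whether the original class uses openpyxl is not stated in the listing.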
class DocumentProcessor_v5

Process different document types for RAG context extraction

File: /tf/active/vicechatdev/offline_docstore_multi_vice.py

Tags: class, documentprocessor

class DocumentProcessor_v6

Process different document types for RAG context extraction

File: /tf/active/vicechatdev/offline_docstore_multi.py

Tags: class, documentprocessor

class DocumentProcessor_v4

Handles document processing and text extraction using llmsherpa (same approach as offline_docstore_multi_vice.py).

File: /tf/active/vicechatdev/docchat/document_processor.py

Tags: class, documentprocessor

class DocumentProcessor_v8

Process different document types for indexing

File: /tf/active/vicechatdev/docchat/document_indexer.py

Tags: class, documentprocessor

function test_document_processor

A test function that validates the DocumentProcessor component's ability to extract text from PDF files with improved error handling and llmsherpa integration.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_improved_processor.py

Tags: testing, document-processing, pdf-extraction, text-extraction, integration-test

function test_extraction_methods

A test function that compares two PDF text extraction methods (regular llmsherpa and OCR-based Tesseract) on a specific purchase order document from FileCloud, checking whether each method detects the vendor name.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_extraction_methods.py

Tags: testing, pdf-extraction, ocr, document-processing, text-extraction

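The comparison described for test_extraction_methods can be sketched with plain callables. `compare_extraction_methods` and the stub extractors are hypothetical: in the real test the two callables would wrap llmsherpa and Tesseract OCR, and its vendor check may differ from the case-insensitive substring match assumed here.

```python
from typing import Callable, Dict

# Each method takes a file path and returns extracted text (hypothetical signature).
Extractor = Callable[[str], str]


def compare_extraction_methods(path: str, methods: Dict[str, Extractor],
                               vendor_name: str) -> Dict[str, dict]:
    """Run every method on the same file and record whether the vendor name appears."""
    report: Dict[str, dict] = {}
    needle = vendor_name.casefold()
    for name, extract in methods.items():
        try:
            text = extract(path)
        except Exception as exc:  # record the failure and keep comparing
            report[name] = {"chars": 0, "vendor_found": False, "error": str(exc)}
            continue
        report[name] = {
            "chars": len(text),
            "vendor_found": needle in text.casefold(),  # case-insensitive substring match
            "error": None,
        }
    return report


# Stubs standing in for llmsherpa and Tesseract OCR output on the same purchase order.
def llmsherpa_stub(path: str) -> str:
    return "PURCHASE ORDER\nTotal due: 1,200.00"  # vendor block missing from the text layer

def ocr_stub(path: str) -> str:
    return "PURCHASE ORDER\nVendor: ACME Supplies Ltd\nTotal due: 1,200.00"

report = compare_extraction_methods("purchase_order.pdf",
                                    {"llmsherpa": llmsherpa_stub, "tesseract": ocr_stub},
                                    vendor_name="Acme Supplies")
```

Returning a per-method report rather than asserting inside the loop lets one run show which method found the vendor and which failed outright.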
class TestDocumentProcessor

A test subclass of DocumentProcessor that simulates llmsherpa PDF processing failures and triggers OCR fallback mechanisms for testing purposes.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_fallback.py

Tags: testing, document-processing, pdf, ocr, fallback

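Overriding the primary extraction step in a subclass is a simple way to force the fallback path. A minimal sketch, assuming a base class whose `extract_text` tries llmsherpa first and OCR second; the method names `_extract_with_llmsherpa` and `_extract_with_ocr` are hypothetical and not the names used in test_ocr_fallback.py.

```python
class DocumentProcessor:
    """Hypothetical stand-in: llmsherpa first, OCR only if that fails or returns nothing."""

    def extract_text(self, path: str) -> str:
        try:
            text = self._extract_with_llmsherpa(path)
            if text and text.strip():
                return text
        except Exception:
            pass  # any primary-path error falls through to OCR
        return self._extract_with_ocr(path)

    def _extract_with_llmsherpa(self, path: str) -> str:
        raise NotImplementedError("would call llmsherpa's LayoutPDFReader here")

    def _extract_with_ocr(self, path: str) -> str:
        raise NotImplementedError("would run Tesseract OCR here")


class TestDocumentProcessor(DocumentProcessor):
    """Forces the primary path to fail and records that the OCR fallback ran."""

    def __init__(self) -> None:
        self.ocr_calls = 0

    def _extract_with_llmsherpa(self, path: str) -> str:
        raise ConnectionError("simulated llmsherpa failure")

    def _extract_with_ocr(self, path: str) -> str:
        self.ocr_calls += 1
        return f"OCR text from {path}"


processor = TestDocumentProcessor()
result = processor.extract_text("scanned_contract.pdf")
```

Because only the two private hooks are overridden, the fallback logic in `extract_text` is exercised exactly as production code would run it.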
function test_ocr_fallback

A test function that validates OCR fallback functionality when the primary llmsherpa PDF text extraction method fails.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_fallback.py

Tags: testing, ocr, pdf-processing, text-extraction, fallback-mechanism

class DocumentProcessor_v1

A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_new.py

Tags: document-processing, text-extraction, pdf-processing, word-processing, llmsherpa

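The primary-plus-fallback design described for DocumentProcessor_v1 can be sketched as an ordered chain of backends per file type. This is a hypothetical illustration: `FallbackTextExtractor` and the stub backends are not from document_processor_new.py, and real backends would wrap llmsherpa, PyPDF2, pdfplumber and python-docx.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

# Each backend takes a file path and returns extracted text (hypothetical signature).
Extractor = Callable[[str], str]


class FallbackTextExtractor:
    """Try each registered backend for a file type, in order, until one returns text."""

    def __init__(self, backends: Dict[str, List[Tuple[str, Extractor]]]):
        # Maps a lowercase suffix such as ".pdf" to an ordered list of (name, extractor).
        self.backends = backends

    def extract(self, path: str) -> Tuple[str, str]:
        suffix = Path(path).suffix.lower()
        chain = self.backends.get(suffix)
        if not chain:
            raise ValueError(f"unsupported file type: {suffix or '(none)'}")
        errors: List[str] = []
        for name, extractor in chain:
            try:
                text = extractor(path)
            except Exception as exc:  # a failing backend should not stop the chain
                errors.append(f"{name}: {exc}")
                continue
            if text and text.strip():
                return name, text  # report which backend succeeded
            errors.append(f"{name}: empty result")
        raise RuntimeError("all extractors failed: " + "; ".join(errors))


# Stub backends standing in for llmsherpa, PyPDF2, pdfplumber and python-docx.
def llmsherpa_stub(path: str) -> str:
    raise ConnectionError("parser service unavailable")

def pypdf2_stub(path: str) -> str:
    return "   "  # e.g. a scanned PDF with no text layer

def pdfplumber_stub(path: str) -> str:
    return "Contract text"

def docx_stub(path: str) -> str:
    return "Word text"

extractor = FallbackTextExtractor({
    ".pdf": [("llmsherpa", llmsherpa_stub), ("PyPDF2", pypdf2_stub), ("pdfplumber", pdfplumber_stub)],
    ".docx": [("python-docx", docx_stub)],
})
used, text = extractor.extract("Contract.PDF")
```

Treating whitespace-only output as a failure matters for PDFs: a scanned page often "succeeds" with an empty string, which should still trigger the next backend.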
class DocumentProcessor_v2

A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_old.py

Tags: document-processing, text-extraction, pdf-processing, word-processing, llmsherpa

class DocumentProcessor_v7

Lightweight document processor for chat upload functionality

File: /tf/active/vicechatdev/vice_ai/document_processor.py

Tags: class, documentprocessor

function test_enhanced_pdf_processing

A comprehensive test function that validates PDF processing capabilities, including text extraction, cleaning, chunking, and table detection across multiple PDF processing libraries.

File: /tf/active/vicechatdev/vice_ai/test_enhanced_pdf.py

Tags: testing, pdf-processing, document-processing, diagnostic, text-extraction

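The cleaning and chunking steps that test_enhanced_pdf_processing checks are not spelled out here. The following sketch shows one generic way to do both, assuming character-based chunks with a fixed overlap; `clean_text`, `chunk_text` and their rules are hypothetical, not the code in test_enhanced_pdf.py.

```python
import re
from typing import List


def clean_text(raw: str) -> str:
    """Light clean-up typical of PDF output (hypothetical rules)."""
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)  # re-join words hyphenated across lines
    text = re.sub(r"[ \t]+", " ", text)           # collapse runs of spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)        # keep at most one blank line
    return text.strip()


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Skip start positions whose window would lie entirely inside the previous window.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


cleaned = clean_text("Table detec-\ntion   and  chunk-\ning\n\n\n\nNext section")
chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

The overlap keeps a sentence that straddles a chunk boundary partially visible in both chunks, which tends to help retrieval in RAG pipelines.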
class PDFTextExtractor

A class for extracting text, images, and structured content from PDF documents with layout preservation capabilities.

File: /tf/active/vicechatdev/CDocs/utils/pdf_utils.py

Tags: pdf, text-extraction, document-processing, layout-analysis, markdown-conversion

class DocumentDetail_v2

Document detail view component

File: /tf/active/vicechatdev/CDocs/ui/document_detail.py

Tags: class, documentdetail

class DocumentDownloader

A client class that downloads documents (primarily PDFs) from various sources, caches downloads, enforces per-domain rate limits, and extracts content with llmsherpa.

File: /tf/active/vicechatdev/QA_updater/data_access/document_downloader.py

Tags: document-download, pdf-processing, rate-limiting, caching, llmsherpa
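The caching and per-domain rate limiting described for DocumentDownloader can be sketched with the standard library alone. This is a hypothetical illustration: `CachingRateLimitedDownloader`, its parameters and the cache-file naming are assumptions, and the llmsherpa processing step is omitted.

```python
import hashlib
import tempfile
import time
from pathlib import Path
from typing import Callable, Dict
from urllib.parse import urlparse
from urllib.request import urlopen


def _default_fetch(url: str) -> bytes:
    # Plain stdlib HTTP GET; a real client would add headers, retries and error handling.
    with urlopen(url, timeout=30) as resp:
        return resp.read()


class CachingRateLimitedDownloader:
    """Hypothetical downloader: on-disk cache keyed by URL plus a minimum delay per domain."""

    def __init__(self, cache_dir: str, min_interval: float = 1.0,
                 fetch: Callable[[str], bytes] = _default_fetch,
                 sleep: Callable[[float], None] = time.sleep):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.min_interval = min_interval  # seconds between requests to the same domain
        self.fetch = fetch
        self.sleep = sleep
        self._last_request: Dict[str, float] = {}  # domain -> time.monotonic() of last request

    def _cache_path(self, url: str) -> Path:
        # Hash the URL so any URL maps to a safe, fixed-length file name.
        return self.cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".bin")

    def _wait_for_domain(self, domain: str) -> None:
        last = self._last_request.get(domain)
        if last is not None:
            remaining = self.min_interval - (time.monotonic() - last)
            if remaining > 0:
                self.sleep(remaining)
        self._last_request[domain] = time.monotonic()

    def download(self, url: str) -> Path:
        path = self._cache_path(url)
        if path.exists():
            return path  # cache hit: no request and no rate-limit wait
        self._wait_for_domain(urlparse(url).netloc)
        path.write_bytes(self.fetch(url))
        return path


# Usage with a stubbed fetch and sleep so nothing touches the network.
fetched, slept = [], []

def fake_fetch(url: str) -> bytes:
    fetched.append(url)
    return b"%PDF-1.4 stub"

dl = CachingRateLimitedDownloader(tempfile.mkdtemp(), min_interval=2.0,
                                  fetch=fake_fetch, sleep=slept.append)
dl.download("https://example.org/a.pdf")
dl.download("https://example.org/b.pdf")    # same domain: waits before fetching
dl.download("https://example.org/a.pdf")    # cached: no fetch, no wait
dl.download("https://other.example/c.pdf")  # new domain: no wait
```

Checking the cache before the rate limiter means repeated requests for already-downloaded documents never pay the per-domain delay.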
