Search - Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "OCR"

Found 25 matching components

class RegulatoryExtractor

A class for extracting structured metadata from regulatory guideline PDF documents using LLM-based analysis and storing the results in an Excel tracking spreadsheet.

File: /tf/active/vicechatdev/reg_extractor.py

pdf-extraction regulatory-documents llm-extraction ocr data-extraction

class DocumentProcessor_v4

Handles document processing and text extraction using llmsherpa (same approach as offline_docstore_multi_vice.py).

File: /tf/active/vicechatdev/docchat/document_processor.py

class documentprocessor

class DocumentProcessor_v8

Processes different document types for indexing.

File: /tf/active/vicechatdev/docchat/document_indexer.py

class documentprocessor

class DocumentIndexer

A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.

File: /tf/active/vicechatdev/docchat/document_indexer.py

document-indexing vector-database chromadb embeddings pdf-processing

function test_extraction_methods

A test function that compares two PDF text extraction methods (regular llmsherpa and OCR-based Tesseract) on a specific purchase order document from FileCloud, checking for vendor name detection.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_extraction_methods.py

testing pdf-extraction ocr document-processing text-extraction

class TestDocumentProcessor

A test subclass of DocumentProcessor that simulates llmsherpa PDF processing failures and triggers OCR fallback mechanisms for testing purposes.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_fallback.py

testing document-processing pdf ocr fallback

function test_ocr_fallback

A test function that validates OCR fallback functionality when the primary llmsherpa PDF text extraction method fails.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_fallback.py

testing ocr pdf-processing text-extraction fallback-mechanism
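The fallback behaviour described for test_ocr_fallback can be illustrated with a minimal sketch. It assumes two interchangeable extractor callables; the names extract_with_fallback, failing_primary, and fake_ocr are hypothetical and are not taken from test_ocr_fallback.py, and no llmsherpa or Tesseract calls are made.

```python
from typing import Callable


def extract_with_fallback(
    path: str,
    primary: Callable[[str], str],
    ocr: Callable[[str], str],
) -> tuple[str, str]:
    """Return (text, method); use OCR if the primary extractor fails or yields nothing."""
    try:
        text = primary(path)
        if text.strip():
            return text, "primary"
    except Exception:
        # Any primary failure triggers the fallback; real code would likely log it.
        pass
    return ocr(path), "ocr"


# Stand-ins for the real extractors (hypothetical).
def failing_primary(path: str) -> str:
    raise RuntimeError("simulated primary extraction failure")


def fake_ocr(path: str) -> str:
    return f"OCR text from {path}"


text, method = extract_with_fallback("contract.pdf", failing_primary, fake_ocr)
print(method, "->", text)
```

Passing the extractors in as callables is one way to let a test force the failure path without touching a real PDF service.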
class ContractDataExtractor

Extracts structured data from legal contracts using LLM analysis.

File: /tf/active/vicechatdev/contract_validity_analyzer/extractor.py

class contractdataextractor

function setup_test_logging_v3

Configures Python logging with both console and file output for test execution, returning a logger instance for the calling module.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_retry.py

logging testing configuration setup debugging
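A logger with both console and file output, as described for setup_test_logging_v3, might be configured along these lines. This is a sketch using only the standard logging module; the function name, default file name, format string, and log levels are assumptions rather than the contents of test_ocr_retry.py.

```python
import logging
import sys


def setup_test_logging(name: str, log_file: str = "test_run.log") -> logging.Logger:
    """Configure a logger with console and file handlers and return it."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # avoid duplicate lines via the root logger

    if not logger.handlers:  # calling twice must not stack handlers
        fmt = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

        console = logging.StreamHandler(sys.stdout)
        console.setLevel(logging.INFO)  # console shows INFO and above
        console.setFormatter(fmt)

        file_handler = logging.FileHandler(log_file, encoding="utf-8")
        file_handler.setLevel(logging.DEBUG)  # file keeps everything
        file_handler.setFormatter(fmt)

        logger.addHandler(console)
        logger.addHandler(file_handler)
    return logger
```

A calling module would typically use it as `logger = setup_test_logging(__name__)`.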
function test_ocr_retry_logic

Tests the OCR retry logic for extracting contract end dates by first attempting normal text extraction, then falling back to OCR-based extraction if the end date is not found.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_retry.py

testing ocr document-processing pdf-extraction contract-analysis
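The retry logic described for test_ocr_retry_logic differs from a plain failure fallback: OCR is tried when normal extraction succeeds but the end date is missing. A minimal sketch follows, with a regular expression standing in for the real end-date detection (which is not shown in the listing and may well be LLM-based). All names here are hypothetical.

```python
import re
from typing import Callable, Optional

# Hypothetical pattern: an ISO date shortly after "end date" or "expires".
END_DATE_RE = re.compile(
    r"(?:end date|expires)\D{0,20}(\d{4}-\d{2}-\d{2})", re.IGNORECASE
)


def find_end_date(text: str) -> Optional[str]:
    """Return the first end date found in the text, or None."""
    match = END_DATE_RE.search(text)
    return match.group(1) if match else None


def end_date_with_ocr_retry(
    path: str,
    extract_text: Callable[[str], str],
    extract_text_ocr: Callable[[str], str],
) -> tuple[Optional[str], bool]:
    """Return (end_date, used_ocr); OCR runs only if the first pass finds no date."""
    end_date = find_end_date(extract_text(path))
    if end_date is not None:
        return end_date, False
    return find_end_date(extract_text_ocr(path)), True
```

Returning the used_ocr flag alongside the date lets a test assert which path produced the result.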
class ContractAnalyzer

Main class for analyzing contract validity from FileCloud documents.

File: /tf/active/vicechatdev/contract_validity_analyzer/core/analyzer.py

class contractanalyzer

class DocumentProcessor_v7

Lightweight document processor for chat upload functionality.

File: /tf/active/vicechatdev/vice_ai/document_processor.py

class documentprocessor

function test_enhanced_pdf_processing

A comprehensive test function that validates PDF processing capabilities, including text extraction, cleaning, chunking, and table detection across multiple PDF processing libraries.

File: /tf/active/vicechatdev/vice_ai/test_enhanced_pdf.py

testing pdf-processing document-processing diagnostic text-extraction

class DocumentProcessor_v3

A comprehensive PDF document processor that handles text extraction, OCR (Optical Character Recognition), layout analysis, table detection, and metadata extraction from PDF files.

File: /tf/active/vicechatdev/invoice_extraction/core/document_processor.py

pdf-processing ocr text-extraction document-processing invoice-processing

class BEExtractor

Belgium-specific invoice data extractor that uses LLM (Large Language Model) to extract structured invoice data from Belgian invoices in multiple languages (English, French, Dutch).

File: /tf/active/vicechatdev/invoice_extraction/extractors/be_extractor.py

invoice-extraction belgium llm ocr document-processing

class AUExtractor

Australia-specific invoice data extractor that uses LLM (Large Language Model) to extract structured invoice data from Australian tax invoices, handling ABN, ACN, GST, BSB numbers and Australian date formats.

File: /tf/active/vicechatdev/invoice_extraction/extractors/au_extractor.py

invoice-extraction australia llm ocr document-processing

class BaseExtractor

Abstract base class that defines the interface and shared functionality for entity-specific invoice data extractors (UK, BE, AU), providing a multi-stage extraction pipeline for invoice processing.

File: /tf/active/vicechatdev/invoice_extraction/extractors/base_extractor.py

invoice-processing data-extraction abstract-base-class OCR document-processing
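The BaseExtractor description (an abstract interface plus a shared multi-stage pipeline) matches the template-method pattern. The sketch below shows that pattern in general terms only: the class names, stage names, and the simple string parsing are hypothetical stand-ins, not the stages or LLM calls used in base_extractor.py.

```python
from abc import ABC, abstractmethod
from typing import Any


class InvoiceExtractorSketch(ABC):
    """Hypothetical base class: shared pipeline, entity-specific stages."""

    entity: str = ""

    def extract(self, text: str) -> dict[str, Any]:
        # Template method: the stage order is fixed here, the stages vary per entity.
        data = self.extract_header(text)
        data.update(self.extract_totals(text))
        data["entity"] = self.entity
        return self.validate(data)

    @abstractmethod
    def extract_header(self, text: str) -> dict[str, Any]: ...

    @abstractmethod
    def extract_totals(self, text: str) -> dict[str, Any]: ...

    def validate(self, data: dict[str, Any]) -> dict[str, Any]:
        # Shared check; subclasses may extend it.
        data["valid"] = data.get("total") is not None
        return data


class DemoExtractor(InvoiceExtractorSketch):
    """Toy subclass; plain string parsing stands in for LLM extraction."""

    entity = "DEMO"

    def extract_header(self, text: str) -> dict[str, Any]:
        return {"first_line": text.splitlines()[0] if text else ""}

    def extract_totals(self, text: str) -> dict[str, Any]:
        for line in text.splitlines():
            if line.lower().startswith("total:"):
                return {"total": float(line.split(":", 1)[1])}
        return {"total": None}


result = DemoExtractor().extract("Invoice 1\nTotal: 12.50")
print(result)
```

Because the stage methods are abstract, instantiating the base class directly raises TypeError, which forces every entity-specific extractor to supply its own stages.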
class DocumentAnalyzer

Analyzes PDF documents using OCR and an LLM.

File: /tf/active/vicechatdev/mailsearch/document_analyzer.py

class documentanalyzer

function main_v10

Command-line interface function that orchestrates PDF document analysis using OCR and LLM processing, with configurable input/output paths and processing limits.

File: /tf/active/vicechatdev/mailsearch/document_analyzer.py

cli command-line entry-point pdf-processing ocr

function generate_failure_report

Analyzes processing results from a JSON file, generates a comprehensive failure report with statistics and error categorization, and exports detailed failure information to a CSV file.

File: /tf/active/vicechatdev/mailsearch/generate_failure_report.py

reporting error-analysis document-processing failure-analysis csv-export
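The flow described for generate_failure_report (read JSON results, categorize errors, print statistics, export failures to CSV) could look roughly like the sketch below. The JSON schema (a list of objects with "file", "status", and "error" keys), the keyword-based categories, and the function names are all assumptions made for illustration; the real file format is not shown in the listing.

```python
import csv
import json
from collections import Counter


def categorize(error: str) -> str:
    """Map an error message to a coarse category (hypothetical keyword rules)."""
    lowered = error.lower()
    if "timeout" in lowered:
        return "timeout"
    if "ocr" in lowered:
        return "ocr"
    if "pdf" in lowered:
        return "pdf"
    return "other"


def generate_failure_report_sketch(results_path: str, csv_path: str) -> Counter:
    """Summarize failures from a JSON results file and export them to CSV."""
    with open(results_path, encoding="utf-8") as fh:
        results = json.load(fh)  # assumed: list of {"file", "status", "error"}

    failures = [r for r in results if r.get("status") != "success"]
    counts = Counter(categorize(r.get("error") or "") for r in failures)

    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "category", "error"])
        for r in failures:
            err = r.get("error") or ""
            writer.writerow([r.get("file", ""), categorize(err), err])

    print(f"{len(failures)} of {len(results)} documents failed")
    for category, n in counts.most_common():
        print(f"  {category}: {n}")
    return counts
```

Returning the Counter keeps the statistics available to callers in addition to the printed summary.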
function create_handwritten_question

Generates a synthetic handwritten-style question image about photosynthesis with formatted text and decorative elements, saved as a PNG file.

File: /tf/active/vicechatdev/e-ink-llm/demo.py

image-generation demo testing PIL Pillow

function create_math_problem

Generates a PNG image file containing a sample algebra math problem with workspace lines for solving.

File: /tf/active/vicechatdev/e-ink-llm/demo.py

image-generation PIL Pillow math-problem education

function create_test_image

Creates a synthetic test image with text rendered in a handwritten-style font on a white background and saves it to disk.

File: /tf/active/vicechatdev/e-ink-llm/test.py

image-generation test-data PIL Pillow text-rendering

function run_tests

Asynchronous test suite function that creates test images with various text prompts, processes them through an E-Ink LLM processor, and reports usage statistics and results.

File: /tf/active/vicechatdev/e-ink-llm/test.py

testing async e-ink llm openai

class MultiPagePDFProcessor

A class for processing multi-page PDF documents with context-aware analysis, OCR, and summarization capabilities.

File: /tf/active/vicechatdev/e-ink-llm/multi_page_processor.py

pdf-processing document-analysis ocr multi-page context-aware

Search Examples

validation - Find validation functions
database - Find database-related components
email - Find email processing functions
api - Find API-related components
authentication - Find auth components
