Search - Code Extractor

class DocumentProcessor_v5

Processes different document types for RAG context extraction.

File: /tf/active/vicechatdev/offline_docstore_multi_vice.py

class documentprocessor

class DocumentProcessor_v6

Processes different document types for RAG context extraction.

File: /tf/active/vicechatdev/offline_docstore_multi.py

class documentprocessor

class DocumentProcessor_v4

Handles document processing and text extraction using llmsherpa (same approach as offline_docstore_multi_vice.py).

File: /tf/active/vicechatdev/docchat/document_processor.py

class documentprocessor

class DocumentProcessor_v8

Processes different document types for indexing.

File: /tf/active/vicechatdev/docchat/document_indexer.py

class documentprocessor

class DocumentIndexer

A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.

File: /tf/active/vicechatdev/docchat/document_indexer.py

document-indexing vector-database chromadb embeddings pdf-processing
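
The listing does not say how DocumentIndexer implements "smart incremental indexing" or chunk management. One common approach is to keep a content hash per file, skip files whose hash is unchanged, and split new text into overlapping chunks. The sketch below assumes that approach. `needs_reindex`, `chunk_text`, and the in-memory `seen` dict are hypothetical names, not the class's actual API, and no ChromaDB calls are shown.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a stable fingerprint of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def needs_reindex(path: str, data: bytes, seen: dict[str, str]) -> bool:
    """Return True if the file is new or changed since it was last indexed.

    `seen` maps file path -> last indexed hash (a hypothetical stand-in for
    whatever bookkeeping the real indexer uses).
    """
    digest = content_hash(data)
    if seen.get(path) == digest:
        return False  # unchanged, skip
    seen[path] = digest
    return True


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```

Calling `needs_reindex` twice with the same bytes returns `True` and then `False`, so only new or modified files would be re-chunked and re-embedded.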

function test_document_processor

A test function that validates the DocumentProcessor component's ability to extract text from PDF files with improved error handling and llmsherpa integration.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_improved_processor.py

testing document-processing pdf-extraction text-extraction integration-test

function test_extraction_methods

A test function that compares two PDF text extraction methods (regular llmsherpa and OCR-based Tesseract) on a specific purchase order document from FileCloud, checking for vendor name detection.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_extraction_methods.py

testing pdf-extraction ocr document-processing text-extraction
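
A comparison like the one test_extraction_methods describes can be reduced to running each method and checking its output for the expected vendor name. The sketch below assumes the extracted texts are already available as strings. `compare_extractions` and the method labels are hypothetical, and no llmsherpa or Tesseract calls are made.

```python
def compare_extractions(results: dict[str, str], vendor_name: str) -> dict[str, dict]:
    """Summarise each extraction method's output.

    `results` maps a method label (e.g. "llmsherpa", "ocr") to the text
    that method produced. Matching is case-insensitive.
    """
    needle = vendor_name.casefold()
    return {
        method: {
            "chars": len(text),
            "vendor_found": needle in text.casefold(),
        }
        for method, text in results.items()
    }
```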

function test_local_document

Integration test function that validates end date extraction from a local PDF document using document processing and LLM-based analysis.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_local_document.py

testing integration-test document-processing pdf-extraction llm

function test_single_document

Tests end date extraction from a specific PDF document by downloading it from FileCloud, extracting text, and using LLM-based analysis to identify contract expiry dates.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_single_document.py

testing integration-test document-processing pdf-extraction contract-analysis

function test_document_processing

A test function that validates document processing functionality by creating a test PDF file, processing it through a DocumentProcessor, and verifying the extraction results or error handling.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_implementation.py

testing document-processing pdf-extraction integration-test contract-analysis

class TestDocumentProcessor

A test subclass of DocumentProcessor that simulates llmsherpa PDF processing failures and triggers OCR fallback mechanisms for testing purposes.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_fallback.py

testing document-processing pdf ocr fallback

class ContractDataExtractor

Extracts structured data from legal contracts using LLM analysis.

File: /tf/active/vicechatdev/contract_validity_analyzer/extractor.py

class contractdataextractor

function debug_download

A diagnostic function that downloads a PDF document from FileCloud, analyzes its content to verify it's a valid PDF, and tests text extraction capabilities.

File: /tf/active/vicechatdev/contract_validity_analyzer/debug_download.py

debugging diagnostics filecloud pdf download

function test_end_date_extraction

Tests end date extraction functionality for contract documents that previously had missing end dates by downloading documents from FileCloud, extracting text, analyzing with LLM, and comparing results.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_missing_end_dates.py

testing integration-test contract-analysis end-date-extraction document-processing

function explore_documents

Explores and tests document accessibility across multiple FileCloud directory paths, attempting to download and validate document content from various locations in a hierarchical search pattern.

File: /tf/active/vicechatdev/contract_validity_analyzer/explore_documents.py

filecloud document-exploration diagnostic file-search pdf-validation

function test_ocr_retry_logic

Tests the OCR retry logic for extracting contract end dates by first attempting normal text extraction, then falling back to OCR-based extraction if the end date is not found.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_ocr_retry.py

testing ocr document-processing pdf-extraction contract-analysis
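
The retry flow described for test_ocr_retry_logic (normal extraction first, OCR only if no end date is found) can be sketched as below. The function name and the three injected callables are assumptions made for illustration. The real code's signatures are not shown in this listing.

```python
from typing import Callable, Optional


def extract_end_date_with_retry(
    pdf_path: str,
    extract_text: Callable[[str], str],
    extract_text_ocr: Callable[[str], str],
    find_end_date: Callable[[str], Optional[str]],
) -> tuple[Optional[str], str]:
    """Return (end_date, method), where method is "text" or "ocr".

    OCR is only attempted when the normal text yields no end date.
    If both attempts fail, the result is (None, "ocr").
    """
    end_date = find_end_date(extract_text(pdf_path))
    if end_date is not None:
        return end_date, "text"
    # Fall back to the slower OCR path.
    end_date = find_end_date(extract_text_ocr(pdf_path))
    return end_date, "ocr"
```

Passing the extractors in as callables keeps the retry decision testable without a PDF, an OCR engine, or an LLM.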

class ContractAnalyzer

Main class for analyzing contract validity from FileCloud documents.

File: /tf/active/vicechatdev/contract_validity_analyzer/core/analyzer.py

class contractanalyzer

class DocumentProcessor_v1

A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_new.py

document-processing text-extraction pdf-processing word-processing llmsherpa
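
The "primary method with fallback support" pattern in this description can be sketched as an ordered list of extractors tried in turn. The code below is a minimal illustration under that assumption. `extract_with_fallback` is a hypothetical helper, and the extractors are passed in as plain callables instead of calling llmsherpa, PyPDF2, pdfplumber, or python-docx directly.

```python
from typing import Callable, Sequence


def extract_with_fallback(
    path: str,
    extractors: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, extractor) in order and return (name, text).

    An extractor counts as failed if it raises or returns only whitespace.
    Raises RuntimeError listing every failure if none succeeds.
    """
    errors = []
    for name, extractor in extractors:
        try:
            text = extractor(path)
        except Exception as exc:  # any failure moves on to the next method
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError("all extractors failed: " + "; ".join(errors))
```

Returning the name of the method that succeeded makes it easy to log how often the primary extractor is bypassed.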

class DocumentProcessor_v2

A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_old.py

document-processing text-extraction pdf-processing word-processing llmsherpa

function main_v88

Entry point function that demonstrates the document processing workflow by creating an audited, watermarked, and protected PDF/A document from a DOCX file with audit trail data.

File: /tf/active/vicechatdev/document_auditor/main.py

document-processing pdf-generation audit-trail watermarking pdf-a-compliance

class DocumentProcessor

A comprehensive document processing class that converts documents to PDF, adds audit trails, applies security features (watermarks, signatures, hashing), and optionally converts to PDF/A format with document protection.

File: /tf/active/vicechatdev/document_auditor/src/document_processor.py

document-processing pdf-generation audit-trail security watermarking

class DocumentProcessor_v7

Lightweight document processor for chat upload functionality.

File: /tf/active/vicechatdev/vice_ai/document_processor.py

class documentprocessor

function api_upload_document_v1

Flask API endpoint that handles document file uploads, validates file type and size, stores the file temporarily, and extracts basic text content for processing.

File: /tf/active/vicechatdev/vice_ai/new_app.py

file-upload document-processing api-endpoint flask validation
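
The type and size validation that this endpoint is described as performing could look like the sketch below. The allowed extensions and the 10 MB limit are assumptions chosen for illustration, `validate_upload` is a hypothetical helper, and the Flask request handling itself is left out.

```python
from pathlib import Path

# Assumed values for illustration; the real endpoint's limits are not listed.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int) -> tuple[bool, str]:
    """Return (ok, message) for an uploaded file's name and size."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported file type: {ext or '(none)'}"
    if size_bytes <= 0:
        return False, "file is empty"
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file too large"
    return True, "ok"
```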

function smartstat_upload_files

Flask API endpoint that handles multi-file uploads (CSV, Excel, PDF, Word, PowerPoint) to a SmartStat session, processing data files as datasets and documents as information sheets.

File: /tf/active/vicechatdev/vice_ai/new_app.py

file-upload multi-file csv-processing excel-processing pdf-extraction
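
The split between data files (handled as datasets) and documents (handled as information sheets) can be sketched as routing by file extension. The extension sets and the `route_uploads` helper below are assumptions. The listing only names the file families (CSV, Excel, PDF, Word, PowerPoint), not the exact extensions accepted.

```python
from pathlib import Path

# Assumed extension sets for the file families named in the description.
DATA_EXTENSIONS = {".csv", ".xlsx", ".xls"}
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".doc", ".pptx", ".ppt"}


def route_uploads(filenames: list[str]) -> dict[str, list[str]]:
    """Group uploaded filenames by how they would be processed."""
    routed = {"datasets": [], "information_sheets": [], "rejected": []}
    for name in filenames:
        ext = Path(name).suffix.lower()
        if ext in DATA_EXTENSIONS:
            routed["datasets"].append(name)
        elif ext in DOCUMENT_EXTENSIONS:
            routed["information_sheets"].append(name)
        else:
            routed["rejected"].append(name)
    return routed
```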

function test_enhanced_pdf_processing

A comprehensive test function that validates PDF processing capabilities, including text extraction, cleaning, chunking, and table detection across multiple PDF processing libraries.

File: /tf/active/vicechatdev/vice_ai/test_enhanced_pdf.py

testing pdf-processing document-processing diagnostic text-extraction

function test_extraction_debugging

A test function that validates the extraction debugging functionality of a DocumentProcessor by creating test files, simulating document extraction, and verifying debug log creation.

File: /tf/active/vicechatdev/vice_ai/test_extraction_debug.py

testing debugging document-processing file-operations unit-test

function publish_document_v3

Publishes a controlled document by incrementing its version to the next major revision, converting it to a signed PDF, and updating its status to PUBLISHED.

File: /tf/active/vicechatdev/CDocs/controllers/document_controller.py

document-management publishing version-control pdf-generation workflow
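
"Incrementing to the next major revision" is sketched below under the assumption that versions are `major.minor` strings such as `0.3` or `1.2`. The version format and the `next_major_version` helper are not confirmed by this listing.

```python
def next_major_version(version: str) -> str:
    """Return the next major revision, resetting the minor part to 0.

    Assumes a "major.minor" (or bare "major") version string and raises
    ValueError if the major part is not an integer.
    """
    major = int(version.split(".", 1)[0])
    return f"{major + 1}.0"
```

Under that assumption, a draft at `0.3` would be published as `1.0`, and a revised `1.2` as `2.0`.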

function convert_document_to_pdf_v1

Converts a document version to PDF format with audit trail, signatures, watermarks, and PDF/A compliance options, then uploads the result to FileCloud storage.

File: /tf/active/vicechatdev/CDocs/controllers/document_controller.py

document-conversion pdf-generation document-management audit-trail compliance

function prepare_audit_data_for_document_processor_v1

Prepares comprehensive audit data for a controlled document version, aggregating information from document history, reviews, approvals, and audit events into a structured format for DocumentProcessor.

File: /tf/active/vicechatdev/CDocs/controllers/document_controller.py

audit document-management compliance data-aggregation pdf-generation

class InvoiceProcessor

Main orchestrator class that coordinates the complete invoice processing pipeline from PDF extraction through validation to Excel generation.

File: /tf/active/vicechatdev/invoice_extraction/main.py

invoice-processing document-processing pdf-extraction entity-classification language-detection

class DocumentProcessor_v3

A comprehensive PDF document processor that handles text extraction, OCR (Optical Character Recognition), layout analysis, table detection, and metadata extraction from PDF files.

File: /tf/active/vicechatdev/invoice_extraction/core/document_processor.py

pdf-processing ocr text-extraction document-processing invoice-processing

class LanguageDetector

A language detection class that identifies whether invoice documents are written in English, French, or Dutch using both rule-based keyword matching and LLM-based detection.

File: /tf/active/vicechatdev/invoice_extraction/core/language_detector.py

language-detection nlp invoice-processing text-analysis multilingual
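
The two-stage detection described here (keyword rules first, LLM when the rules are inconclusive) can be sketched as follows. The keyword lists, the `min_hits` threshold, and the `llm_detect` callable are assumptions for illustration. The class's actual keywords and LLM interface are not shown in this listing.

```python
import re
from typing import Callable, Optional

# Assumed, deliberately small keyword sets; real lists would be longer.
KEYWORDS = {
    "en": {"invoice", "amount", "payment", "due", "vat"},
    "fr": {"facture", "montant", "paiement", "échéance", "tva"},
    "nl": {"factuur", "bedrag", "betaling", "vervaldatum", "btw"},
}


def detect_language(
    text: str,
    llm_detect: Optional[Callable[[str], str]] = None,
    min_hits: int = 2,
) -> Optional[str]:
    """Return "en", "fr", or "nl"; defer to llm_detect when rules are unclear."""
    words = set(re.findall(r"[^\W\d_]+", text.lower()))
    scores = {lang: len(words & kws) for lang, kws in KEYWORDS.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, best_hits), (_, runner_up_hits) = ranked[0], ranked[1]
    # Accept the rule-based answer only if it is clear and unambiguous.
    if best_hits >= min_hits and best_hits > runner_up_hits:
        return best
    if llm_detect is not None:
        return llm_detect(text)
    return None
```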

class EntityClassifier

Classifies which ViceBio entity (UK, Belgium, or Australia) an invoice is addressed to using rule-based pattern matching and LLM fallback.

File: /tf/active/vicechatdev/invoice_extraction/core/entity_classifier.py

classification entity-recognition invoice-processing pattern-matching regex
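
Rule-based classification with an LLM fallback can be sketched as one regular expression per entity, with the fallback used when zero or several entities match. The patterns below are placeholders chosen for illustration, not the rules EntityClassifier actually uses, and `llm_classify` is a hypothetical callable.

```python
import re
from typing import Callable, Optional

# Placeholder patterns (assumptions); the real rules are not listed here.
RULES = {
    "UK": re.compile(r"\b(united kingdom|england|london)\b", re.IGNORECASE),
    "BE": re.compile(r"\b(belgium|belgique|belgië)\b", re.IGNORECASE),
    "AU": re.compile(r"\b(australia|abn)\b", re.IGNORECASE),
}


def classify_entity(
    text: str,
    llm_classify: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Return "UK", "BE", or "AU"; use llm_classify when the rules are unclear."""
    matches = [entity for entity, pattern in RULES.items() if pattern.search(text)]
    if len(matches) == 1:
        return matches[0]
    # No match or conflicting matches: defer to the fallback if provided.
    if llm_classify is not None:
        return llm_classify(text)
    return None
```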

class BEExtractor

Belgium-specific invoice data extractor that uses an LLM (large language model) to extract structured invoice data from Belgian invoices in multiple languages (English, French, Dutch).

File: /tf/active/vicechatdev/invoice_extraction/extractors/be_extractor.py

invoice-extraction belgium llm ocr document-processing

class AUExtractor

Australia-specific invoice data extractor that uses an LLM (large language model) to extract structured invoice data from Australian tax invoices, handling ABN, ACN, GST, and BSB numbers as well as Australian date formats.

File: /tf/active/vicechatdev/invoice_extraction/extractors/au_extractor.py

invoice-extraction australia llm ocr document-processing

class BaseExtractor

Abstract base class that defines the interface and shared functionality for entity-specific invoice data extractors (UK, BE, AU), providing a multi-stage extraction pipeline for invoice processing.

File: /tf/active/vicechatdev/invoice_extraction/extractors/base_extractor.py

invoice-processing data-extraction abstract-base-class ocr document-processing
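
An abstract base class that fixes the pipeline stages while leaving the entity-specific step to subclasses is usually written with the template method pattern. The sketch below assumes three stages (preprocess, extract fields, validate) and two required fields. The stage names, `ExtractorSketch`, and `ToyExtractor` are hypothetical and do not reflect BaseExtractor's real interface.

```python
import re
from abc import ABC, abstractmethod


class ExtractorSketch(ABC):
    """Shared pipeline; subclasses supply the entity-specific field extraction."""

    required_fields: tuple[str, ...] = ("invoice_number", "total")

    def extract(self, raw_text: str) -> dict:
        # The stage order is fixed here, once, for every subclass.
        text = self.preprocess(raw_text)
        fields = self.extract_fields(text)
        return self.validate(fields)

    def preprocess(self, raw_text: str) -> str:
        """Collapse runs of whitespace so patterns see one clean line."""
        return " ".join(raw_text.split())

    @abstractmethod
    def extract_fields(self, text: str) -> dict:
        """Entity-specific extraction (regex here; an LLM call in practice)."""

    def validate(self, fields: dict) -> dict:
        missing = [name for name in self.required_fields if not fields.get(name)]
        return {**fields, "missing_fields": missing}


class ToyExtractor(ExtractorSketch):
    def extract_fields(self, text: str) -> dict:
        number = re.search(r"Invoice\s+No\.?\s*(\S+)", text)
        total = re.search(r"Total\s*:?\s*([\d.,]+)", text)
        return {
            "invoice_number": number.group(1) if number else None,
            "total": total.group(1) if total else None,
        }
```

With this layout, a UK, BE, or AU subclass would override only `extract_fields` (and optionally `validate`) while inheriting the shared stages.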

class UKExtractor

UK-specific invoice data extractor.

File: /tf/active/vicechatdev/invoice_extraction/extractors/uk_extractor.py

class ukextractor

function convert_document_to_pdf

Converts a controlled document version to PDF format with audit trail, signatures, watermarking, and PDF/A compliance capabilities, then uploads the result to FileCloud storage.

File: /tf/active/vicechatdev/CDocs single class/controllers/document_controller.py

document-conversion pdf-generation pdf-a audit-trail compliance

function prepare_audit_data_for_document_processor

Prepares comprehensive audit data for a controlled document version, including revision history, reviews, approvals, and event history, formatted for DocumentProcessor consumption.

File: /tf/active/vicechatdev/CDocs single class/controllers/document_controller.py

audit document-management compliance metadata data-aggregation

Search Components

Search Results for "documentprocessor"
