Search - Code Extractor

function check_fixes

A diagnostic function that prints a comprehensive summary report of email notification fixes implemented in a CDocs system, verifying template files and documenting debugging enhancements.

File: /tf/active/vicechatdev/fix_summary.py

diagnostic verification email-templates debugging documentation

function create_word_report

Generates a formatted Microsoft Word report of warranty disclosures, with a table of contents, metadata, and a structured section for each warranty.

File: /tf/active/vicechatdev/convert_disclosures_to_table.py

document-generation word-document docx report-generation warranty

function test_reference_system_completeness

A diagnostic test function that prints a comprehensive overview of a reference system's architecture, including backend storage, API endpoints, reference types, and content flow verification.

File: /tf/active/vicechatdev/reference_system_verification.py

testing documentation diagnostic reference-system api-endpoints

class DocumentProcessor_v5

Processes different document types to extract context for RAG.

File: /tf/active/vicechatdev/offline_docstore_multi_vice.py

class documentprocessor

function show_critical_errors

Displays critical data quality errors in treatment records, focusing on date anomalies: dates in the year 1900, extreme future dates, and extreme past dates relative to flock lifecycles.

File: /tf/active/vicechatdev/data_quality_dashboard.py

data-quality validation error-reporting date-validation data-cleaning
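The three date checks described above can be sketched as a single classifier. This is a minimal sketch under stated assumptions: 1900 dates are treated as placeholder or default values, and the 365-day tolerances and the function name `classify_treatment_date` are hypothetical, not the dashboard's actual thresholds or API.

```python
from datetime import date
from typing import Optional

# Hypothetical tolerances; the real dashboard's cut-offs are not documented here.
MAX_DAYS_AFTER_FLOCK_END = 365
MAX_DAYS_BEFORE_FLOCK_START = 365


def classify_treatment_date(treated: date, flock_start: date, flock_end: date) -> Optional[str]:
    """Return an error label for a suspicious treatment date, or None if it looks plausible."""
    if treated.year == 1900:
        # Assumption: 1900 is a placeholder/default date left by spreadsheet exports.
        return "placeholder_1900"
    if (treated - flock_end).days > MAX_DAYS_AFTER_FLOCK_END:
        return "extreme_future"
    if (flock_start - treated).days > MAX_DAYS_BEFORE_FLOCK_START:
        return "extreme_past"
    return None
```

The 1900 check runs first so that a placeholder date is reported as such rather than as an "extreme past" date.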

function show_problematic_flocks

Analyzes and displays problematic flocks from a dataset by identifying those with systematic timing issues in their treatment records, categorizing them by severity and volume.

File: /tf/active/vicechatdev/data_quality_dashboard.py

data-quality reporting diagnostics livestock-management data-validation

class DocumentProcessor_v6

Processes different document types to extract context for RAG.

File: /tf/active/vicechatdev/offline_docstore_multi.py

class documentprocessor

class DocumentExtractor

A document text extraction class that supports multiple file formats including Word, PowerPoint, PDF, and plain text files, with automatic format detection and conversion capabilities.

File: /tf/active/vicechatdev/leexi/document_extractor.py

document-processing text-extraction pdf word powerpoint

function test_document_extractor

A test function that validates the DocumentExtractor class by testing file type support detection, text extraction from various document formats, and error handling.

File: /tf/active/vicechatdev/leexi/test_document_extractor.py

testing document-extraction file-processing validation text-extraction

function test_language_detection_and_translation

A test function that validates multi-language query processing capabilities including language detection, translation, and query expansion across multiple supported languages.

File: /tf/active/vicechatdev/docchat/test_multilanguage.py

testing multi-language language-detection translation RAG

class DocChatRAG

Main RAG engine with three operating modes: (1) Basic RAG (similarity search), (2) Extensive (full document retrieval with preprocessing), and (3) Full Reading (processing all documents).

File: /tf/active/vicechatdev/docchat/rag_engine.py

class docchatrag

class DocumentIndexer

A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.

File: /tf/active/vicechatdev/docchat/document_indexer.py

document-indexing vector-database chromadb embeddings pdf-processing
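"Smart incremental indexing" usually means re-indexing only files whose content changed since the last run. The sketch below assumes change detection by SHA-256 of file bytes and an in-memory `index_state` dict; both are hypothetical, and the real indexer may instead use modification times or ChromaDB metadata. A `force` flag re-indexes everything.

```python
import hashlib
import tempfile
from pathlib import Path


def file_fingerprint(path: Path) -> str:
    """SHA-256 of the file bytes, used to detect content changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def plan_incremental_index(paths, index_state: dict, force: bool = False) -> dict:
    """Return {path_key: fingerprint} for files that are new or changed (all files if force).

    index_state maps str(path) -> fingerprint recorded at the last successful indexing.
    The caller should merge the result into index_state only after indexing succeeds.
    """
    pending = {}
    for path in paths:
        fingerprint = file_fingerprint(path)
        key = str(path)
        if force or index_state.get(key) != fingerprint:
            pending[key] = fingerprint
    return pending


# Usage: index two files, re-run with no changes, then edit one file.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp, "a.txt")
    a.write_text("first")
    b = Path(tmp, "b.txt")
    b.write_text("second")
    state = {}
    first = plan_incremental_index([a, b], state)   # both files are new
    state.update(first)                             # pretend indexing succeeded
    second = plan_incremental_index([a, b], state)  # nothing changed
    a.write_text("first, edited")
    third = plan_incremental_index([a, b], state)   # only a.txt changed
    forced = plan_incremental_index([a, b], state, force=True)
```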

function test_incremental_indexing

Comprehensive test function that validates incremental indexing functionality of a document indexing system, including initial indexing, change detection, re-indexing, and force re-indexing scenarios.

File: /tf/active/vicechatdev/docchat/test_incremental_indexing.py

testing incremental-indexing document-indexing integration-test file-system

function get_llm_instance

Factory function that creates and returns an appropriate LLM (Large Language Model) instance based on the specified model name, automatically detecting the provider (OpenAI, Azure OpenAI, or Anthropic) and configuring it with the given parameters.

File: /tf/active/vicechatdev/docchat/llm_factory.py

llm factory-pattern openai azure anthropic
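The provider-detection step described above might look like the following. The prefix rules (including an `azure` prefix for Azure OpenAI) and both function names are assumptions for illustration; `get_llm_config` only returns the settings a real factory would pass to an SDK client and does not call any provider library.

```python
def detect_provider(model_name: str) -> str:
    """Guess the provider from a model name using hypothetical prefix rules."""
    name = model_name.lower()
    if name.startswith("azure"):
        # Checked first so that e.g. "azure-gpt-4" is not routed to plain OpenAI.
        return "azure_openai"
    if name.startswith("claude"):
        return "anthropic"
    if name.startswith(("gpt", "o1", "o3")):
        return "openai"
    raise ValueError(f"Unknown model name: {model_name!r}")


def get_llm_config(model_name: str, temperature: float = 0.0, max_tokens: int = 1024) -> dict:
    """Stand-in for the factory: collect the settings a real SDK client would receive."""
    return {
        "provider": detect_provider(model_name),
        "model": model_name,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
```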

function test_extraction_methods

A test function that compares two PDF text extraction methods (regular llmsherpa and OCR-based Tesseract) on a specific purchase order document from FileCloud, checking for vendor name detection.

File: /tf/active/vicechatdev/contract_validity_analyzer/test_extraction_methods.py

testing pdf-extraction ocr document-processing text-extraction

class DocumentProcessor_v1

A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_new.py

document-processing text-extraction pdf-processing word-processing llmsherpa
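The "primary method with fallbacks" strategy described above can be sketched as a generic chain of extractor callables tried in order. The stub extractors in the usage are hypothetical stand-ins; no real llmsherpa, PyPDF2, or pdfplumber calls are made.

```python
from typing import Callable, Sequence, Tuple


def extract_with_fallback(
    path: str, extractors: Sequence[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, str]:
    """Try each (name, extractor) in order; return (name, text) from the first non-empty result."""
    errors = []
    for name, extract in extractors:
        try:
            text = extract(path)
        except Exception as exc:  # a failing backend just moves us on to the next one
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty result")
    raise RuntimeError(f"All extractors failed for {path}: " + "; ".join(errors))


# Hypothetical stubs standing in for real backends.
def broken(path: str) -> str:
    raise ConnectionError("llmsherpa server unreachable")


def empty(path: str) -> str:
    return "   "


def works(path: str) -> str:
    return "Purchase order text"


used, text = extract_with_fallback(
    "po.pdf", [("llmsherpa", broken), ("pypdf2", empty), ("pdfplumber", works)]
)
```

Treating whitespace-only output as a failure matters for scanned PDFs, where a parser may "succeed" without recovering any text.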

class DocumentProcessor_v2

A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_old.py

document-processing text-extraction pdf-processing word-processing llmsherpa

function main_v105

Command-line interface function that orchestrates the cleaning of ChromaDB collections by removing duplicates and similar documents, with options to skip collections and customize the cleaning process.

File: /tf/active/vicechatdev/chromadb-cleanup/main.py

cli command-line chromadb database-cleaning deduplication

function clean_collection

Cleans a ChromaDB collection by removing duplicate and similar documents using hash-based and similarity-based deduplication techniques, then saves the cleaned data to a new collection.

File: /tf/active/vicechatdev/chromadb-cleanup/main.py

data-cleaning deduplication chromadb vector-database similarity-detection

function main_v89

Command-line interface function that orchestrates a ChromaDB collection cleaning pipeline by removing duplicate and similar documents through hashing and similarity screening.

File: /tf/active/vicechatdev/chromadb-cleanup/main copy.py

cli command-line data-cleaning deduplication chromadb

function test_nearly_similar_text_handling

A pytest test function that verifies the SimilarityCleaner's ability to identify and remove nearly similar text entries while preserving distinct ones.

File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

testing pytest text-processing similarity-detection deduplication

function test_similarity_threshold_effect

A pytest test function that validates the behavior of SimilarityCleaner with different similarity threshold values, ensuring that higher thresholds retain more texts while lower thresholds are more aggressive in removing similar content.

File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

testing pytest text-deduplication similarity-detection data-cleaning
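The threshold behaviour this test checks can be illustrated with a greedy filter. This sketch swaps in `difflib.SequenceMatcher.ratio` as the similarity measure; that choice and the name `remove_similar` are assumptions, and the real SimilarityCleaner may use embeddings instead. A text is dropped when its similarity to an already-kept text reaches the threshold, so a higher threshold treats fewer pairs as near-duplicates and retains more texts.

```python
from difflib import SequenceMatcher


def remove_similar(texts, threshold: float):
    """Keep a text only if its similarity to every already-kept text is below threshold."""
    kept = []
    for text in texts:
        if all(SequenceMatcher(None, text, k).ratio() < threshold for k in kept):
            kept.append(text)
    return kept


texts = [
    "the quick brown fox",
    "the quick brown fox!",
    "the quick brown cat",
    "completely different sentence",
]
lenient = remove_similar(texts, threshold=0.99)  # only near-identical texts are removed
strict = remove_similar(texts, threshold=0.8)    # the two fox/cat variants are removed
```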

class TestCombinedCleaner

A unittest test class that validates the functionality of the CombinedCleaner class, testing its ability to remove duplicate and similar texts from collections.

File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_combined_cleaner.py

unittest testing text-cleaning deduplication similarity-detection

function find_similar_documents

Identifies pairs of similar documents by comparing their embeddings and returns those exceeding a specified similarity threshold, sorted by similarity score.

File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/similarity_utils.py

document-similarity embedding-comparison duplicate-detection cosine-similarity nlp
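Pairwise comparison by cosine similarity, as described above, can be sketched in pure Python. The name `find_similar_pairs`, the `(i, j, score)` return shape, and the strict `>` comparison are assumptions; a production version would normally vectorize this with NumPy, since the loop is O(n²) in the number of documents.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar_pairs(embeddings: Sequence[Sequence[float]], threshold: float = 0.9) -> List[Tuple[int, int, float]]:
    """Return (i, j, similarity) for every pair above threshold, most similar first."""
    pairs = []
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine_similarity(embeddings[i], embeddings[j])
        if sim > threshold:
            pairs.append((i, j, sim))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs


embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [1.0, 0.0]]
pairs = find_similar_pairs(embeddings, threshold=0.9)
```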

function hash_text

Creates a SHA-256 hash of normalized text content to serve as a content fingerprint for each document, enabling duplicate detection and content comparison.

File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/hash_utils.py

hashing text-processing deduplication content-fingerprinting sha256
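A minimal sketch of a function like `hash_text`, assuming "normalized" means collapsing runs of whitespace, trimming, and lowercasing; the actual normalization rules are not stated in the description.

```python
import hashlib
import re


def hash_text(text: str) -> str:
    """SHA-256 hex digest of text after (assumed) whitespace and case normalization."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

With this normalization, `"Hello   World\n"` and `"hello world"` hash identically, while any change in wording or punctuation produces a different digest.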

function identify_duplicates

Identifies duplicate documents by computing hash values of their text content and grouping documents with identical hashes.

File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/hash_utils.py

deduplication document-processing hashing data-cleaning duplicate-detection
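Grouping by hash, as described above, can be sketched with a `defaultdict`. The document shape (dicts with a `"text"` key), the trim-and-lowercase normalization, and the return value (hash mapped to the indices of its duplicates) are assumptions for illustration.

```python
import hashlib
from collections import defaultdict


def _content_hash(text: str) -> str:
    # Assumed normalization: trim and lowercase before hashing.
    return hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()


def identify_duplicates(documents):
    """Group document indices by content hash; keep only groups with more than one member."""
    groups = defaultdict(list)
    for idx, doc in enumerate(documents):
        groups[_content_hash(doc["text"])].append(idx)
    return {h: ids for h, ids in groups.items() if len(ids) > 1}


docs = [{"text": "Alpha"}, {"text": "beta"}, {"text": "  alpha "}]
dups = identify_duplicates(docs)
```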

function get_unique_documents

Identifies and separates unique documents from duplicates in a list by comparing hash values of document text content.

File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/hash_utils.py

deduplication document-processing data-cleaning hashing text-processing
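Separating unique documents from duplicates can be done in one pass with a set of seen hashes. This sketch assumes the first occurrence of each text is the one kept and that documents are dicts with a `"text"` key; the real function's keep-policy and return shape may differ.

```python
import hashlib


def get_unique_documents(documents):
    """Return (unique, duplicates); the first document with a given hash counts as unique."""
    seen = set()
    unique, duplicates = [], []
    for doc in documents:
        digest = hashlib.sha256(doc["text"].strip().lower().encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append(doc)
        else:
            seen.add(digest)
            unique.append(doc)
    return unique, duplicates


docs = [
    {"id": 1, "text": "Report A"},
    {"id": 2, "text": "Report B"},
    {"id": 3, "text": "report a"},
]
unique, dupes = get_unique_documents(docs)
```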

function check_and_fix_corruption

Scans a SQLite database for corrupted chat_session_id values in the text_sections table and automatically fixes them by setting invalid entries to NULL.

File: /tf/active/vicechatdev/vice_ai/direct_corruption_checker.py

database sqlite data-integrity corruption-detection data-cleaning
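The scan-and-NULL repair described above can be sketched with the standard `sqlite3` module. The description does not define "corrupted", so this sketch assumes a valid `chat_session_id` is a well-formed UUID string; the table schema and the name `fix_corrupted_session_ids` are hypothetical.

```python
import sqlite3
import uuid


def _is_valid_uuid(value) -> bool:
    """Assumed validity rule: the value must parse as a UUID string."""
    if not isinstance(value, str):
        return False
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def fix_corrupted_session_ids(conn: sqlite3.Connection) -> int:
    """Set invalid text_sections.chat_session_id values to NULL; return how many were fixed."""
    rows = conn.execute(
        "SELECT id, chat_session_id FROM text_sections WHERE chat_session_id IS NOT NULL"
    ).fetchall()
    bad_ids = [row_id for row_id, value in rows if not _is_valid_uuid(value)]
    conn.executemany(
        "UPDATE text_sections SET chat_session_id = NULL WHERE id = ?",
        [(row_id,) for row_id in bad_ids],
    )
    conn.commit()
    return len(bad_ids)


# Usage against a throwaway in-memory database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE text_sections (id INTEGER PRIMARY KEY, chat_session_id TEXT)")
good_id = str(uuid.uuid4())
conn.executemany(
    "INSERT INTO text_sections (id, chat_session_id) VALUES (?, ?)",
    [(1, good_id), (2, "[object Object]"), (3, None)],
)
fixed = fix_corrupted_session_ids(conn)
```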

function process_markdown_content

Parses markdown-formatted text content and converts it into a structured list of content elements with type annotations and formatting metadata suitable for document export.

File: /tf/active/vicechatdev/vice_ai/complex_app.py

markdown parser document-processing text-processing content-conversion
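A line-based parser that turns markdown into typed elements might look like this. It is a minimal sketch covering only headings, bullet and numbered items, and paragraphs with a bold flag; the element schema and the name `parse_markdown` are assumptions, and the real function likely handles more syntax (tables, code, nested lists).

```python
import re
from typing import Dict, List

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
BULLET = re.compile(r"^[-*]\s+(.*)$")
NUMBERED = re.compile(r"^\d+\.\s+(.*)$")
BOLD = re.compile(r"\*\*(.+?)\*\*")


def parse_markdown(text: str) -> List[Dict]:
    """Convert markdown into a list of {"type": ..., "text": ...} elements."""
    elements: List[Dict] = []
    paragraph_lines: List[str] = []

    def flush_paragraph() -> None:
        # Consecutive non-blank lines are joined into one paragraph element.
        if paragraph_lines:
            joined = " ".join(paragraph_lines)
            elements.append({"type": "paragraph", "text": joined, "has_bold": bool(BOLD.search(joined))})
            paragraph_lines.clear()

    for raw_line in text.splitlines():
        line = raw_line.strip()
        heading, bullet, numbered = HEADING.match(line), BULLET.match(line), NUMBERED.match(line)
        if not line:
            flush_paragraph()
        elif heading:
            flush_paragraph()
            elements.append({"type": "heading", "level": len(heading.group(1)), "text": heading.group(2).strip()})
        elif bullet:
            flush_paragraph()
            elements.append({"type": "bullet", "text": bullet.group(1)})
        elif numbered:
            flush_paragraph()
            elements.append({"type": "numbered", "text": numbered.group(1)})
        else:
            paragraph_lines.append(line)
    flush_paragraph()
    return elements


md = "# Title\n\nSome **bold** text\ncontinues here.\n\n- first\n- second\n1. step one"
elements = parse_markdown(md)
```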

function add_table_to_word_v1

Adds a formatted table to a Microsoft Word document using the python-docx library, with automatic column detection, header row styling, and debug logging.

File: /tf/active/vicechatdev/vice_ai/complex_app.py

word-document table-generation docx document-formatting python-docx

class ScriptExecutor

A sandboxed Python script executor that safely runs user-provided Python code with timeout controls, security restrictions, and isolated execution environments for data analysis tasks.

File: /tf/active/vicechatdev/vice_ai/script_executor.py

sandbox script-execution security code-validation data-analysis
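The timeout and isolation aspects described above can be approximated by running user code in a separate interpreter process. This is a sketch, not a security sandbox: `python -I` (isolated mode) only ignores environment variables and the user site directory, and a temporary working directory limits stray file writes by convention, not enforcement. The name `run_script` and its return shape are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(code: str, timeout: float = 5.0) -> dict:
    """Run code in a child Python process with a wall-clock timeout."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir, "user_script.py")
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,  # subprocess.run kills the child when this expires
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout}s"}
        return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
```

Real isolation would additionally need OS-level controls (containers, resource limits, restricted users), which are outside what this sketch shows.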

function convert_european_decimals

Detects and converts numeric data with European decimal format (comma as decimal separator) to standard format (dot as decimal separator) in a pandas DataFrame, handling mixed formats and missing data patterns.
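The detect-then-convert logic described above can be sketched per value in plain Python. The rules are assumptions: when both separators appear, the last one is the decimal mark; a single comma is a decimal comma; several commas are thousands separators; the `MISSING` tokens are hypothetical. With pandas, these helpers could be applied column-wise, for example `df[col] = df[col].map(parse_number)` when `looks_european(df[col])` is true.

```python
import re
from typing import Iterable, Optional

MISSING = {"", "na", "n/a", "nan", "null", "-"}  # assumed missing-data markers
COMMA_DECIMAL = re.compile(r"^-?\d{1,3}(\.\d{3})*,\d+$|^-?\d+,\d+$")


def looks_european(values: Iterable, min_share: float = 0.5) -> bool:
    """True if at least min_share of non-missing values look like comma-decimal numbers."""
    candidates = [str(v).strip() for v in values if v is not None and str(v).strip().lower() not in MISSING]
    if not candidates:
        return False
    hits = sum(1 for v in candidates if COMMA_DECIMAL.match(v))
    return hits / len(candidates) >= min_share


def parse_number(value) -> Optional[float]:
    """Parse European or standard formatted numbers; return None for missing/unparseable values."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip().replace(" ", "")
    if text.lower() in MISSING:
        return None
    if "," in text and "." in text:
        if text.rfind(",") > text.rfind("."):
            text = text.replace(".", "").replace(",", ".")  # "1.234,56" -> "1234.56"
        else:
            text = text.replace(",", "")  # "1,234.56" -> "1234.56"
    elif text.count(",") == 1:
        text = text.replace(",", ".")  # "3,14" -> "3.14"
    elif text.count(",") > 1:
        text = text.replace(",", "")  # "1,234,567" -> "1234567"
    try:
        return float(text)
    except ValueError:
        return None
```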