🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "detection"

Found 50 matching component(s)

  • function check_fixes

    A diagnostic function that prints a comprehensive summary report of email notification fixes implemented in a CDocs system, verifying template files and documenting debugging enhancements.

    File: /tf/active/vicechatdev/fix_summary.py

    diagnostic verification email-templates debugging documentation
  • function create_word_report

    Generates a formatted Microsoft Word document report containing warranty disclosures with a table of contents, metadata, and structured sections for each warranty.

    File: /tf/active/vicechatdev/convert_disclosures_to_table.py

    document-generation word-document docx report-generation warranty
  • function test_reference_system_completeness

    A diagnostic test function that prints a comprehensive overview of a reference system's architecture, including backend storage, API endpoints, reference types, and content flow verification.

    File: /tf/active/vicechatdev/reference_system_verification.py

    testing documentation diagnostic reference-system api-endpoints
  • class DocumentProcessor_v5

    Processes different document types for RAG context extraction.

    File: /tf/active/vicechatdev/offline_docstore_multi_vice.py

    class documentprocessor
  • function show_critical_errors

    Displays critical data quality errors in treatment records, focusing on date anomalies including 1900 dates, extreme future dates, and extreme past dates relative to flock lifecycles.

    File: /tf/active/vicechatdev/data_quality_dashboard.py

    data-quality validation error-reporting date-validation data-cleaning
  • function show_problematic_flocks

    Analyzes and displays problematic flocks from a dataset by identifying those with systematic timing issues in their treatment records, categorizing them by severity and volume.

    File: /tf/active/vicechatdev/data_quality_dashboard.py

    data-quality reporting diagnostics livestock-management data-validation
  • class DocumentProcessor_v6

    Processes different document types for RAG context extraction.

    File: /tf/active/vicechatdev/offline_docstore_multi.py

    class documentprocessor
  • class DocumentExtractor

    A document text extraction class that supports multiple file formats including Word, PowerPoint, PDF, and plain text files, with automatic format detection and conversion capabilities.

    File: /tf/active/vicechatdev/leexi/document_extractor.py

    document-processing text-extraction pdf word powerpoint
  • function test_document_extractor

    A test function that validates the DocumentExtractor class by testing file type support detection, text extraction from various document formats, and error handling.

    File: /tf/active/vicechatdev/leexi/test_document_extractor.py

    testing document-extraction file-processing validation text-extraction
  • function test_language_detection_and_translation

    A test function that validates multi-language query processing capabilities including language detection, translation, and query expansion across multiple supported languages.

    File: /tf/active/vicechatdev/docchat/test_multilanguage.py

    testing multi-language language-detection translation RAG
  • class DocChatRAG

    Main RAG engine with three operating modes: (1) Basic RAG (similarity search), (2) Extensive (full document retrieval with preprocessing), and (3) Full Reading (process all documents).

    File: /tf/active/vicechatdev/docchat/rag_engine.py

    class docchatrag
  • class DocumentIndexer

    A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    document-indexing vector-database chromadb embeddings pdf-processing
  • function test_incremental_indexing

    Comprehensive test function that validates incremental indexing functionality of a document indexing system, including initial indexing, change detection, re-indexing, and force re-indexing scenarios.

    File: /tf/active/vicechatdev/docchat/test_incremental_indexing.py

    testing incremental-indexing document-indexing integration-test file-system
  • function get_llm_instance

    Factory function that creates and returns an appropriate LLM (Large Language Model) instance based on the specified model name, automatically detecting the provider (OpenAI, Azure OpenAI, or Anthropic) and configuring it with the given parameters.

    File: /tf/active/vicechatdev/docchat/llm_factory.py

    llm factory-pattern openai azure anthropic
  • function test_extraction_methods

    A test function that compares two PDF text extraction methods (regular llmsherpa and OCR-based Tesseract) on a specific purchase order document from FileCloud, checking for vendor name detection.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_extraction_methods.py

    testing pdf-extraction ocr document-processing text-extraction
  • class DocumentProcessor_v1

    A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

    File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_new.py

    document-processing text-extraction pdf-processing word-processing llmsherpa
  • class DocumentProcessor_v2

    A document processing class that extracts text from PDF and Word documents using llmsherpa as the primary method with fallback support for PyPDF2, pdfplumber, and python-docx.

    File: /tf/active/vicechatdev/contract_validity_analyzer/utils/document_processor_old.py

    document-processing text-extraction pdf-processing word-processing llmsherpa
  • function main_v105

    Command-line interface function that orchestrates the cleaning of ChromaDB collections by removing duplicates and similar documents, with options to skip collections and customize the cleaning process.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py

    cli command-line chromadb database-cleaning deduplication
  • function clean_collection

    Cleans a ChromaDB collection by removing duplicate and similar documents using hash-based and similarity-based deduplication techniques, then saves the cleaned data to a new collection.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py

    data-cleaning deduplication chromadb vector-database similarity-detection
  • function main_v89

    Command-line interface function that orchestrates a ChromaDB collection cleaning pipeline by removing duplicate and similar documents through hashing and similarity screening.

    File: /tf/active/vicechatdev/chromadb-cleanup/main copy.py

    cli command-line data-cleaning deduplication chromadb
  • function test_nearly_similar_text_handling

    A pytest test function that verifies the SimilarityCleaner's ability to identify and remove nearly similar text entries while preserving distinct ones.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing pytest text-processing similarity-detection deduplication
  • function test_similarity_threshold_effect

    A pytest test function that validates the behavior of SimilarityCleaner with different similarity threshold values, ensuring that higher thresholds retain more texts while lower thresholds are more aggressive in removing similar content.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing pytest text-deduplication similarity-detection data-cleaning
  • class TestCombinedCleaner

    A unittest test class that validates the functionality of the CombinedCleaner class, testing its ability to remove duplicate and similar texts from collections.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_combined_cleaner.py

    unittest testing text-cleaning deduplication similarity-detection
  • function find_similar_documents

    Identifies pairs of similar documents by comparing their embeddings and returns those exceeding a specified similarity threshold, sorted by similarity score.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/similarity_utils.py

    document-similarity embedding-comparison duplicate-detection cosine-similarity nlp
  • function hash_text

    Creates a SHA-256 hash of normalized text content to generate a unique identifier for documents, enabling duplicate detection and content comparison.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/hash_utils.py

    hashing text-processing deduplication content-fingerprinting sha256
  • function identify_duplicates

    Identifies duplicate documents by computing hash values of their text content and grouping documents with identical hashes.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/hash_utils.py

    deduplication document-processing hashing data-cleaning duplicate-detection
  • function get_unique_documents

    Identifies and separates unique documents from duplicates in a list by comparing hash values of document text content.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/hash_utils.py

    deduplication document-processing data-cleaning hashing text-processing
  • function check_and_fix_corruption

    Scans a SQLite database for corrupted chat_session_id values in the text_sections table and automatically fixes them by setting invalid entries to NULL.

    File: /tf/active/vicechatdev/vice_ai/direct_corruption_checker.py

    database sqlite data-integrity corruption-detection data-cleaning
  • function process_markdown_content

    Parses markdown-formatted text content and converts it into a structured list of content elements with type annotations and formatting metadata suitable for document export.

    File: /tf/active/vicechatdev/vice_ai/complex_app.py

    markdown parser document-processing text-processing content-conversion
  • function add_table_to_word_v1

    Adds a formatted table to a Microsoft Word document using the python-docx library, with automatic column detection, header row styling, and debug logging.

    File: /tf/active/vicechatdev/vice_ai/complex_app.py

    word-document table-generation docx document-formatting python-docx
  • class ScriptExecutor

    A sandboxed Python script executor that safely runs user-provided Python code with timeout controls, security restrictions, and isolated execution environments for data analysis tasks.

    File: /tf/active/vicechatdev/vice_ai/script_executor.py

    sandbox script-execution security code-validation data-analysis
  • function convert_european_decimals

    Detects and converts numeric data with European decimal format (comma as decimal separator) to standard format (dot as decimal separator) in a pandas DataFrame, handling mixed formats and missing data patterns.

    File: /tf/active/vicechatdev/vice_ai/smartstat_service.py

    data-processing data-cleaning decimal-conversion european-format locale-handling
  • function smart_read_csv

    Automatically detects CSV file delimiters (comma, semicolon, tab) and handles regional decimal formats (European comma vs US/UK point) to reliably parse CSV files from different locales.

    File: /tf/active/vicechatdev/vice_ai/smartstat_service.py

    csv data-loading file-parsing delimiter-detection regional-formats
  • function validate_sheet_format

    Analyzes Excel sheet structure using multiple heuristics to classify it as tabular data, information sheet, or mixed format, returning quality metrics and extraction recommendations.

    File: /tf/active/vicechatdev/vice_ai/smartstat_service.py

    data-validation excel-processing sheet-classification data-quality heuristic-analysis
  • function detect_table_boundaries

    Detects distinct tables within a pandas DataFrame by identifying empty rows as table boundaries and returns metadata about each detected table region.

    File: /tf/active/vicechatdev/vice_ai/smartstat_service.py

    data-processing excel table-detection boundary-detection pandas
  • function extract_table_as_markdown

    Extracts a specified row range from a pandas DataFrame and converts it into a properly formatted markdown table with automatic header detection and data cleaning.

    File: /tf/active/vicechatdev/vice_ai/smartstat_service.py

    markdown table-formatting data-conversion pandas dataframe
  • function extract_sheet_context

    Extracts comprehensive text context from Excel DataFrame sheets that contain mixed structured and unstructured content, converting them into markdown-formatted text while preserving table structures, key-value pairs, and section headers.

    File: /tf/active/vicechatdev/vice_ai/smartstat_service.py

    excel-processing data-extraction markdown-generation text-parsing table-detection
  • class SmartStatService

    Service for running SmartStat analysis sessions in Vice AI

    File: /tf/active/vicechatdev/vice_ai/smartstat_service.py

    class smartstatservice
  • class DocumentProcessor_v7

    Lightweight document processor for chat upload functionality

    File: /tf/active/vicechatdev/vice_ai/document_processor.py

    class documentprocessor
  • function test_models_integration

    Integration test function that validates the import and instantiation of data models including SectionType, TextSection, DataAnalysisSession, AnalysisStatus, DataSource, and DataSourceType.

    File: /tf/active/vicechatdev/vice_ai/test_integration.py

    testing integration-test models validation import-test
  • function check_specific_corruption

    Detects and fixes specific corruption patterns in the chat_session_id column of a SQLite database's text_sections table, replacing invalid values with NULL.

    File: /tf/active/vicechatdev/vice_ai/check_specific_corruption.py

    database sqlite data-cleaning corruption-detection data-repair
  • function test_enhanced_pdf_processing

    A comprehensive test function that validates PDF processing capabilities, including text extraction, cleaning, chunking, and table detection across multiple PDF processing libraries.

    File: /tf/active/vicechatdev/vice_ai/test_enhanced_pdf.py

    testing pdf-processing document-processing diagnostic text-extraction
  • function remove_outliers_iqr

    Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/analysis_2.py

    data-cleaning outlier-detection IQR interquartile-range data-preprocessing
  • function remove_outliers_iqr_v1

    Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a 3×IQR threshold.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/project_1/analysis.py

    data-cleaning outlier-detection IQR interquartile-range statistics
  • function remove_outliers

    Removes outliers from a pandas DataFrame based on the Interquartile Range (IQR) method for a specified column.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py

    data-cleaning outlier-detection IQR interquartile-range data-preprocessing
  • function detect_outliers_iqr

    Detects extreme outliers in a pandas Series using the Interquartile Range (IQR) method with a configurable multiplier (default 3.0).

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5021ab2a-8cdd-44cb-81ad-201598352e39/analysis_1.py

    outlier-detection IQR interquartile-range data-cleaning anomaly-detection
  • function detect_outliers_iqr_v1

    Detects outliers in a dataset using the Interquartile Range (IQR) method, returning boolean indices of outliers and the calculated bounds.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/328d2f87-3367-495e-89f7-e633ff8c5b3d/analysis_2.py

    outlier-detection IQR interquartile-range statistics data-cleaning
  • function detect_outliers_zscore

    Detects outliers in numerical data using the Z-score statistical method, identifying data points that deviate significantly from the mean.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/328d2f87-3367-495e-89f7-e633ff8c5b3d/analysis_2.py

    outlier-detection statistics data-cleaning anomaly-detection z-score
  • function detect_outliers_iqr_v2

    Detects statistical outliers in a dataset using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/84b9ac09-e646-4422-9d3a-e9f96529a553/analysis_1.py

    outlier-detection statistics data-cleaning IQR interquartile-range
  • function main_v8

    Main execution function that orchestrates the import of controlled documents from FileCloud into a Neo4j database, checking for duplicates and managing document metadata.

    File: /tf/active/vicechatdev/CDocs/FC_sync.py

    document-management filecloud neo4j import batch-processing
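
Two techniques recur across the results above: hash-based duplicate detection (hash_text, identify_duplicates) and IQR-based outlier detection with a 3×IQR threshold (detect_outliers_iqr and its variants). The sketch below illustrates both ideas using only the Python standard library. It is not the code from the indexed files: the function names are borrowed from the listing for readability, while the signatures, the normalization rule (lowercasing and collapsing whitespace), the "id"/"text" dictionary keys, and the quartile interpolation method are all assumptions made for illustration.

```python
import hashlib
import statistics
from collections import defaultdict


def hash_text(text: str) -> str:
    """Return the SHA-256 hex digest of normalized text.

    Normalization here (lowercase + collapsed whitespace) is an assumption;
    the indexed implementation may normalize differently.
    """
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def identify_duplicates(docs: list[dict]) -> dict[str, list[str]]:
    """Group document ids by content hash and keep groups with 2+ members.

    Each doc is assumed to be a dict with hypothetical "id" and "text" keys.
    """
    groups = defaultdict(list)
    for doc in docs:
        groups[hash_text(doc["text"])].append(doc["id"])
    return {digest: ids for digest, ids in groups.items() if len(ids) > 1}


def detect_outliers_iqr(values: list[float], multiplier: float = 3.0):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR], with k = multiplier.

    Returns a list of booleans (True = outlier) and the (lower, upper) bounds.
    Quartiles use linear interpolation between data points ("inclusive").
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    flags = [v < lower or v > upper for v in values]
    return flags, (lower, upper)


if __name__ == "__main__":
    docs = [
        {"id": "a", "text": "Hello  World"},
        {"id": "b", "text": "hello world"},
        {"id": "c", "text": "something else"},
    ]
    print(identify_duplicates(docs))

    flags, bounds = detect_outliers_iqr([10, 11, 12, 13, 14, 100])
    print(flags, bounds)
```

Hashing catches only documents that are identical after normalization; near-duplicates need a similarity comparison on embeddings, as the find_similar_documents result describes. A multiplier of 3.0 flags only extreme values, whereas the more common 1.5 would flag milder ones as well.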
