🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "cleaner"

Found 25 matching components

  • function quick_clean

    Cleans flock data by identifying and removing flocks that have treatment records with timing inconsistencies (treatments administered outside the flock's start/end date range).

    File: /tf/active/vicechatdev/quick_cleaner.py

    data-cleaning data-quality flock-management livestock poultry
  • function main_page

    Renders the main navigation page for the FileCloud Data Processor application, providing authenticated users with access to various modules including document audit, controlled documents, settings, and reports.

    File: /tf/active/vicechatdev/datacapture_integrated.py

    streamlit ui navigation authentication dashboard
  • function select_dataset

    Interactive command-line function that prompts users to select the original flock dataset, the cleaned flock dataset, or a comparison of the datasets for analysis.

    File: /tf/active/vicechatdev/data_quality_dashboard.py

    user-interface dataset-selection interactive command-line data-loading
  • function generate_html_from_msg

    Converts an email message object into a formatted HTML representation with styling, headers, body content, and attachment information.

    File: /tf/active/vicechatdev/msg_to_eml.py

    email html-generation email-parsing formatting msg-file
  • function generate_simple_html_from_eml

    Converts an email.message.Message object into a clean, styled HTML representation with embedded inline images and attachment listings.

    File: /tf/active/vicechatdev/msg_to_eml.py

    email html-generation email-parsing mime inline-images
  • class DocumentProcessor_v8

    Processes different document types for indexing.

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    class documentprocessor
  • function clean_collection

    Cleans a ChromaDB collection by removing duplicate and similar documents using hash-based and similarity-based deduplication techniques, then saves the cleaned data to a new collection.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py

    data-cleaning deduplication chromadb vector-database similarity-detection
  • function main_v52

    Command-line interface function that orchestrates a ChromaDB collection cleaning pipeline by removing duplicate and similar documents through hashing and similarity screening.

    File: /tf/active/vicechatdev/chromadb-cleanup/main copy.py

    cli command-line data-cleaning deduplication chromadb
  • function setup_similarity_cleaner

    A pytest fixture that creates and returns a configured SimilarityCleaner instance with a threshold of 0.8 for use in test cases.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    pytest fixture testing similarity data-cleaning
  • function test_identical_text_removal

    A pytest test function that verifies the SimilarityCleaner's ability to remove identical duplicate text entries from a list while preserving unique documents.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing pytest unit-test deduplication text-processing
  • function test_nearly_similar_text_handling

    A pytest test function that verifies the SimilarityCleaner's ability to identify and remove nearly similar text entries while preserving distinct ones.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing pytest text-processing similarity-detection deduplication
  • function test_empty_input

    A pytest test function that verifies the SimilarityCleaner correctly handles empty input by returning an empty list.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing unit-test pytest edge-case empty-input
  • function test_single_text_input

    A pytest test function that verifies the SimilarityCleaner correctly handles a single text document by returning it unchanged.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing unit-test pytest text-processing similarity
  • function test_similarity_threshold_effect

    A pytest test function that validates the behavior of SimilarityCleaner with different similarity threshold values, ensuring that higher thresholds retain more texts while lower thresholds are more aggressive in removing similar content.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing pytest text-deduplication similarity-detection data-cleaning
  • class TestCombinedCleaner

    A unittest test class that validates the functionality of the CombinedCleaner class, testing its ability to remove duplicate and similar texts from collections.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_combined_cleaner.py

    unittest testing text-cleaning deduplication similarity-detection
  • function hash_cleaner

    A pytest fixture that instantiates and returns a HashCleaner object for use in test cases.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_hash_cleaner.py

    pytest fixture testing hash cleaner
  • function test_remove_identical_chunks

    A pytest test function that verifies the HashCleaner's ability to remove duplicate text chunks from a list while preserving order and unique entries.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_hash_cleaner.py

    testing pytest unit-test deduplication text-processing
  • function test_empty_input_v1

    A pytest test function that verifies the HashCleaner's behavior when processing an empty list of text chunks.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_hash_cleaner.py

    testing unit-test pytest edge-case boundary-condition
  • function test_no_identical_chunks

    A unit test function that verifies the HashCleaner's behavior when processing a list of unique text chunks, ensuring no chunks are removed when all are distinct.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_hash_cleaner.py

    unit-test pytest hash-cleaner deduplication text-processing
  • function test_identical_chunks_with_different_cases

    A unit test function that verifies that the HashCleaner's duplicate removal is case-sensitive: strings differing only in case are treated as distinct entries and are not removed.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_hash_cleaner.py

    unit-test pytest deduplication case-sensitive text-processing
  • class HashCleaner

    A document deduplication cleaner that removes documents with identical content by comparing hash values of document text.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/cleaners/hash_cleaner.py

    deduplication data-cleaning hash-based document-processing duplicate-removal
  • class CombinedCleaner

    A document cleaner that combines hash-based and similarity-based cleaning approaches to remove both exact and near-duplicate documents in a two-stage process.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/cleaners/combined_cleaner.py

    document-cleaning deduplication data-processing hash-based similarity-based
  • class SimilarityCleaner

    A document cleaning class that identifies and removes duplicate or highly similar documents based on embedding vector similarity, keeping only representative documents from each similarity group.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/cleaners/similarity_cleaner.py

    document-processing deduplication similarity embeddings clustering
  • class BaseCleaner

    Abstract base class that defines the interface for document cleaning implementations, providing methods to remove redundancy from document collections and track cleaning statistics.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/cleaners/base_cleaner.py

    abstract-base-class document-processing data-cleaning redundancy-removal statistics
  • class DocumentDashboard

    Dashboard for viewing and managing controlled documents.

    File: /tf/active/vicechatdev/CDocs/ui/document_dashboard.py

    class documentdashboard
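
The HashCleaner, SimilarityCleaner, and CombinedCleaner entries above describe a two-stage deduplication: exact duplicates are removed by hashing, then near-duplicates by embedding similarity. The sketch below illustrates that idea only; it is not the code in the listed files. All function names are hypothetical, SHA-256 is an assumed hash choice, and toy_embed is a letter-frequency stand-in for a real embedding model. The default threshold of 0.8 mirrors the value mentioned in the setup_similarity_cleaner entry.

```python
import hashlib
import math


def hash_dedupe(texts):
    """Drop exact duplicates, keeping the first occurrence of each text in order.

    Hashing the raw text makes the comparison case-sensitive.
    SHA-256 is an assumption; the listed HashCleaner may use another hash.
    """
    seen = set()
    kept = []
    for text in texts:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(text)
    return kept


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a 26-dim letter-count vector."""
    vec = [0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1
    return vec


def similarity_dedupe(texts, threshold=0.8, embed=toy_embed):
    """Keep a text only if it is below `threshold` similarity to every kept text."""
    kept, kept_vecs = [], []
    for text in texts:
        vec = embed(text)
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept


def combined_dedupe(texts, threshold=0.8):
    """Two-stage cleaning: the cheap exact-match pass runs before the similarity pass."""
    return similarity_dedupe(hash_dedupe(texts), threshold)
```

In this sketch, a lower threshold removes more texts and a higher one retains more, which matches the behavior the test_similarity_threshold_effect entry describes. Running the hash pass first reduces the number of pairwise similarity comparisons the second stage has to make.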
