🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "clustering"

Found 14 matching components

  • class PatternBasedExtractor

    Extract flocks based on farm-level In-Ovo usage patterns.

    File: /tf/active/vicechatdev/pattern_based_extraction.py

    class patternbasedextractor
  • function main_v5

    Command-line interface function that orchestrates pattern-based extraction of poultry flock data, including data loading, pattern classification, geocoding, and export functionality.

    File: /tf/active/vicechatdev/pattern_based_extraction.py

    cli command-line-interface data-extraction poultry-data pattern-analysis
  • function clean_collection

    Cleans a ChromaDB collection by removing duplicate and similar documents using hash-based and similarity-based deduplication techniques, then saves the cleaned data to a new collection.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py

    data-cleaning deduplication chromadb vector-database similarity-detection
  • function load_data_from_chromadb

    Connects to a ChromaDB instance and retrieves all documents from a specified collection, returning them as a list of dictionaries with document IDs, text content, embeddings, and metadata.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py

    chromadb vector-database data-loading document-retrieval embeddings
  • function save_data_to_chromadb_v1

    Saves a list of document dictionaries to a ChromaDB collection, with support for batch processing, embeddings, and metadata storage.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py

    chromadb vector-database document-storage embeddings batch-processing
  • function main_v51

    Command-line interface function that orchestrates a ChromaDB collection cleaning pipeline by removing duplicate and similar documents through hashing and similarity screening.

    File: /tf/active/vicechatdev/chromadb-cleanup/main copy.py

    cli command-line data-cleaning deduplication chromadb
  • function load_data_from_chromadb_v1

    Retrieves all documents from a specified ChromaDB collection, including their IDs, text content, embeddings, and metadata.

    File: /tf/active/vicechatdev/chromadb-cleanup/main copy.py

    chromadb database document-retrieval vector-database embeddings
  • function save_data_to_chromadb

    Saves a list of document dictionaries to a ChromaDB vector database collection, optionally including embeddings and metadata.

    File: /tf/active/vicechatdev/chromadb-cleanup/main copy.py

    chromadb vector-database document-storage embeddings persistence
  • class Config_v6

    A dataclass that stores configuration settings for a ChromaDB cleanup process, including connection parameters, cleaning/clustering options, and summarization settings.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/config.py

    configuration dataclass chromadb settings cleanup
  • class TextClusterer

    A class that clusters similar documents based on their embeddings using various clustering algorithms (K-means, Agglomerative, DBSCAN) and optionally generates summaries for each cluster.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/clustering/text_clusterer.py

    clustering document-clustering embeddings machine-learning kmeans
  • function build_similarity_matrix

    Computes a pairwise cosine similarity matrix for a collection of embedding vectors, where each cell (i,j) represents the similarity between embedding i and embedding j.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/similarity_utils.py

    embeddings similarity cosine-similarity matrix nlp
  • function find_similar_documents

    Identifies pairs of similar documents by comparing their embeddings and returns those exceeding a specified similarity threshold, sorted by similarity score.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/similarity_utils.py

    document-similarity embedding-comparison duplicate-detection cosine-similarity nlp
  • function summarize_text

    A deprecated standalone function originally designed to summarize groups of similar documents; it now returns the input documents unchanged and emits a deprecation warning.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/summarization/summarizer.py

    deprecated text-summarization document-processing nlp text-clustering
  • class SimilarityCleaner

    A document cleaning class that identifies and removes duplicate or highly similar documents based on embedding vector similarity, keeping only representative documents from each similarity group.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/cleaners/similarity_cleaner.py

    document-processing deduplication similarity embeddings clustering
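
Several of the results above (build_similarity_matrix, find_similar_documents, SimilarityCleaner) are described as working on pairwise cosine similarity between embeddings. The listing does not show their source, so the following is only a minimal pure-Python sketch of that general technique, not the code in similarity_utils.py. The function names `sketch_similarity_matrix` and `sketch_find_similar_pairs`, their signatures, and the default threshold of 0.9 are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def sketch_similarity_matrix(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return an n x n matrix where cell (i, j) is the cosine similarity
    between embedding i and embedding j. Hypothetical helper."""
    n = len(embeddings)
    # Precompute the Euclidean norm of every vector once.
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in embeddings]
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if norms[i] == 0.0 or norms[j] == 0.0:
                # Cosine similarity is undefined for a zero vector; use 0.0 here.
                sim = 0.0
            else:
                dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
                sim = dot / (norms[i] * norms[j])
            # The matrix is symmetric, so fill both cells at once.
            matrix[i][j] = sim
            matrix[j][i] = sim
    return matrix


def sketch_find_similar_pairs(
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.9,  # assumed default, not taken from the listing
) -> List[Tuple[int, int, float]]:
    """Return (i, j, similarity) for every pair with i < j whose similarity
    is at or above the threshold, sorted from most to least similar."""
    matrix = sketch_similarity_matrix(embeddings)
    n = len(embeddings)
    pairs = [
        (i, j, matrix[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] >= threshold
    ]
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs


# Toy two-dimensional "embeddings" for illustration only.
example_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
example_pairs = sketch_find_similar_pairs(example_embeddings, threshold=0.7)
```

With the toy vectors above, the first two entries are identical, so they come back as the most similar pair. A deduplication step could then keep one document from each such pair. The nested loops are quadratic in the number of documents, so a real collection would likely call for a vectorized or approximate approach.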
