🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "similarity"

Found 44 matching components

  • class SimpleDataHandle

    A data handler class that manages multiple data sources with different types (dataframes, vector stores, databases) and their associated processing configurations.

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

    data-management registry vector-store RAG dataframe
  • class OneCo_hybrid_RAG

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

    class oneco_hybrid_rag
  • class pathobrowser_base

    Base class that contains all static elements of the app.
    Parameters:
      image (str): An image UID that may be passed on app startup; the app immediately redirects to that image.
    Attributes:
      current_user (Userclass): A class containing various information on the user.
      workspace (panel.layout.Column): The main container of the app.
      sidebar (panel.layout.Column): Container showing items on the side of the app.
      head (panel.layout.Row): The header of the app.
      modal (panel.layout.Column): The container for the modal window of the app.

    File: /tf/active/vicechatdev/datacapture_integrated.py

    class pathobrowser_base
  • class OneCo_hybrid_RAG_v1

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG_old.py

    class oneco_hybrid_rag
  • class OneCo_hybrid_RAG_v2

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

    class oneco_hybrid_rag
  • class ExtensiveSearchManager

    Manages extensive search functionality including full document retrieval, summarization, and enhanced context gathering.

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

    class extensivesearchmanager
  • class pathobrowser_base_v1

    Base class that contains all static elements of the app.
    Parameters:
      image (str): An image UID that may be passed on app startup; the app immediately redirects to that image.
    Attributes:
      current_user (Userclass): A class containing various information on the user.
      workspace (panel.layout.Column): The main container of the app.
      sidebar (panel.layout.Column): Container showing items on the side of the app.
      head (panel.layout.Row): The header of the app.
      modal (panel.layout.Column): The container for the modal window of the app.

    File: /tf/active/vicechatdev/datacapture.py

    class pathobrowser_base
  • class DocChatRAG

    Main RAG engine with three operating modes: (1) Basic RAG (similarity search), (2) Extensive (full document retrieval with preprocessing), and (3) Full Reading (processes all documents).

    File: /tf/active/vicechatdev/docchat/rag_engine.py

    class docchatrag
  • class DocumentIndexer

    A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    document-indexing vector-database chromadb embeddings pdf-processing
  • function main_v61

    Command-line interface function that orchestrates the cleaning of ChromaDB collections by removing duplicate and similar documents, with options to skip collections and customize the cleaning process.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py

    cli command-line chromadb database-cleaning deduplication
  • function clean_collection

    Cleans a ChromaDB collection by removing duplicate and similar documents using hash-based and similarity-based deduplication techniques, then saves the cleaned data to a new collection.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py

    data-cleaning deduplication chromadb vector-database similarity-detection
  • function load_data_from_chromadb

    Connects to a ChromaDB instance and retrieves all documents from a specified collection, returning them as a list of dictionaries with document IDs, text content, embeddings, and metadata.

    File: /tf/active/vicechatdev/chromadb-cleanup/main.py

    chromadb vector-database data-loading document-retrieval embeddings
  • function main_v52

    Command-line interface function that orchestrates a ChromaDB collection cleaning pipeline by removing duplicate and similar documents through hashing and similarity screening.

    File: /tf/active/vicechatdev/chromadb-cleanup/main copy.py

    cli command-line data-cleaning deduplication chromadb
  • function setup_similarity_cleaner

    A pytest fixture that creates and returns a configured SimilarityCleaner instance with a threshold of 0.8 for use in test cases.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    pytest fixture testing similarity data-cleaning
  • function test_identical_text_removal

    A pytest test function that verifies the SimilarityCleaner's ability to remove identical duplicate text entries from a list while preserving unique documents.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing pytest unit-test deduplication text-processing
  • function test_nearly_similar_text_handling

    A pytest test function that verifies the SimilarityCleaner's ability to identify and remove nearly similar text entries while preserving distinct ones.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing pytest text-processing similarity-detection deduplication
  • function test_empty_input

    A pytest test function that verifies the SimilarityCleaner correctly handles empty input by returning an empty list.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing unit-test pytest edge-case empty-input
  • function test_single_text_input

    A pytest test function that verifies the SimilarityCleaner correctly handles a single text document by returning it unchanged.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing unit-test pytest text-processing similarity
  • function test_similarity_threshold_effect

    A pytest test function that validates the behavior of SimilarityCleaner with different similarity threshold values, ensuring that higher thresholds retain more texts while lower thresholds are more aggressive in removing similar content.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_similarity_cleaner.py

    testing pytest text-deduplication similarity-detection data-cleaning
  • class TestCombinedCleaner

    A unittest test class that validates the functionality of the CombinedCleaner class, testing its ability to remove duplicate and similar texts from collections.

    File: /tf/active/vicechatdev/chromadb-cleanup/tests/test_combined_cleaner.py

    unittest testing text-cleaning deduplication similarity-detection
  • class Config_v6

    A dataclass that stores configuration settings for a ChromaDB cleanup process, including connection parameters, cleaning/clustering options, and summarization settings.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/config.py

    configuration dataclass chromadb settings cleanup
  • class TextClusterer

    A class that clusters similar documents based on their embeddings using various clustering algorithms (K-means, Agglomerative, DBSCAN) and optionally generates summaries for each cluster.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/clustering/text_clusterer.py

    clustering document-clustering embeddings machine-learning kmeans
  • function calculate_similarity

    Computes the cosine similarity between two embedding vectors, returning a normalized score between 0 and 1 that measures their directional alignment.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/similarity_utils.py

    cosine-similarity vector-comparison embeddings similarity-metric machine-learning
  • function build_similarity_matrix

    Computes a pairwise cosine similarity matrix for a collection of embedding vectors, where each cell (i,j) represents the similarity between embedding i and embedding j.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/similarity_utils.py

    embeddings similarity cosine-similarity matrix nlp
  • function find_similar_documents

    Identifies pairs of similar documents by comparing their embeddings and returns those exceeding a specified similarity threshold, sorted by similarity score.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/similarity_utils.py

    document-similarity embedding-comparison duplicate-detection cosine-similarity nlp
  • class CombinedCleaner

    A document cleaner that combines hash-based and similarity-based cleaning approaches to remove both exact and near-duplicate documents in a two-stage process.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/cleaners/combined_cleaner.py

    document-cleaning deduplication data-processing hash-based similarity-based
  • class SimilarityCleaner

    A document cleaning class that identifies and removes duplicate or highly similar documents based on embedding vector similarity, keeping only representative documents from each similarity group.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/cleaners/similarity_cleaner.py

    document-processing deduplication similarity embeddings clustering
  • function api_send_chat_message

    Flask API endpoint that handles sending a message in a chat session, processes it through a hybrid RAG engine with configurable search and memory settings, and returns an AI-generated response with references.

    File: /tf/active/vicechatdev/vice_ai/complex_app.py

    flask api chat rag hybrid-rag
  • function process_chat_request_background

    Processes a chat request in a background thread.

    File: /tf/active/vicechatdev/vice_ai/app.py

    function process_chat_request_background
  • function api_chat_v1

    Handles chat API requests, with support for long-running tasks.

    File: /tf/active/vicechatdev/vice_ai/app.py

    function api_chat
  • function api_send_chat_message_v1

    Flask API endpoint that handles sending messages in a chat session, processes them through a RAG (Retrieval-Augmented Generation) engine with configurable LLM models, and returns AI-generated responses with references.

    File: /tf/active/vicechatdev/vice_ai/new_app.py

    chat api rag llm conversational-ai
  • class TextSectionService

    Service class for managing TextSection entities, providing CRUD operations, versioning, chat functionality, and search capabilities.

    File: /tf/active/vicechatdev/vice_ai/services.py

    service-layer text-management versioning crud-operations chat-integration
  • class OneCo_hybrid_RAG_v3

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/vice_ai/hybrid_rag_engine.py

    class oneco_hybrid_rag
  • class ExtensiveSearchManager_v1

    Manages extensive search functionality including full document retrieval, summarization, and enhanced context gathering.

    File: /tf/active/vicechatdev/vice_ai/hybrid_rag_engine.py

    class extensivesearchmanager
  • class VersionComparisonService

    A service class that compares two versions of a document using LLM-based analysis, implementing smart segmentation and chunking for handling large documents efficiently.

    File: /tf/active/vicechatdev/CDocs/utils/version_comparison.py

    document-comparison version-control llm openai text-analysis
  • class EnhancedSQLWorkflow

    Enhanced SQL workflow with iterative optimization

    File: /tf/active/vicechatdev/full_smartstat/enhanced_sql_workflow.py

    class enhancedsqlworkflow
  • class VendorEmailExtractor

    Extracts vendor email addresses from all organizational mailboxes.

    File: /tf/active/vicechatdev/find_email/vendor_email_extractor.py

    class vendoremailextractor
  • class ChromaManager

    ChromaManager is a class that manages interactions with a Chroma vector database, providing methods to create collections, add documents with embeddings, and query for similar documents.

    File: /tf/active/vicechatdev/QA_updater/knowledge_store/chroma_manager.py

    vector-database chromadb embeddings semantic-search document-retrieval
  • class LanguageDetector

    A language detection class that identifies whether invoice documents are written in English, French, or Dutch using both rule-based keyword matching and LLM-based detection.

    File: /tf/active/vicechatdev/invoice_extraction/core/language_detector.py

    language-detection nlp invoice-processing text-analysis multilingual
  • class OneCo_hybrid_RAG_v4

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/datacapture_backup_16072025/OneCo_hybrid_RAG.py

    class oneco_hybrid_rag
  • class pathobrowser_base_v2

    Base class that contains all static elements of the app.
    Parameters:
      image (str): An image UID that may be passed on app startup; the app immediately redirects to that image.
    Attributes:
      current_user (Userclass): A class containing various information on the user.
      workspace (panel.layout.Column): The main container of the app.
      sidebar (panel.layout.Column): Container showing items on the side of the app.
      head (panel.layout.Row): The header of the app.
      modal (panel.layout.Column): The container for the modal window of the app.

    File: /tf/active/vicechatdev/datacapture_backup_16072025/datacapture.py

    class pathobrowser_base
  • class OneCo_hybrid_RAG_v5

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/data_capture_backup_18072025/OneCo_hybrid_RAG.py

    class oneco_hybrid_rag
  • class ExtensiveSearchManager_v2

    Manages extensive search functionality including full document retrieval, summarization, and enhanced context gathering.

    File: /tf/active/vicechatdev/data_capture_backup_18072025/OneCo_hybrid_RAG.py

    class extensivesearchmanager
  • class pathobrowser_base_v3

    Base class that contains all static elements of the app.
    Parameters:
      image (str): An image UID that may be passed on app startup; the app immediately redirects to that image.
    Attributes:
      current_user (Userclass): A class containing various information on the user.
      workspace (panel.layout.Column): The main container of the app.
      sidebar (panel.layout.Column): Container showing items on the side of the app.
      head (panel.layout.Row): The header of the app.
      modal (panel.layout.Column): The container for the modal window of the app.

    File: /tf/active/vicechatdev/data_capture_backup_18072025/datacapture.py

    class pathobrowser_base
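
The three helpers in similarity_utils.py (calculate_similarity, build_similarity_matrix, find_similar_documents) are described only at summary level in the listing. What follows is a minimal pure-Python sketch of the same ideas, not the repository's code. The names cosine_similarity, normalized_similarity, similarity_matrix and find_similar_pairs are hypothetical. Raw cosine similarity lies in [-1, 1]. The listing says calculate_similarity returns a score between 0 and 1, and rescaling with (cos + 1) / 2 is only one assumed way to get there. For embeddings whose components are all non-negative, the raw score is already in [0, 1].

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Raw cosine similarity in [-1, 1]; 0.0 if either vector has zero length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalized_similarity(a: Vector, b: Vector) -> float:
    """Assumed normalization: map [-1, 1] linearly onto [0, 1]."""
    return (cosine_similarity(a, b) + 1.0) / 2.0


def similarity_matrix(embeddings: Sequence[Vector]) -> List[List[float]]:
    """Symmetric n x n matrix; cell (i, j) is the similarity of embeddings i and j."""
    n = len(embeddings)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = cosine_similarity(embeddings[i], embeddings[j])
            matrix[i][j] = score
            matrix[j][i] = score  # cosine similarity is symmetric
    return matrix


def find_similar_pairs(
    embeddings: Sequence[Vector], threshold: float
) -> List[Tuple[int, int, float]]:
    """Pairs (i, j, score) with i < j and score >= threshold, highest score first."""
    matrix = similarity_matrix(embeddings)
    pairs = [
        (i, j, matrix[i][j])
        for i in range(len(embeddings))
        for j in range(i + 1, len(embeddings))
        if matrix[i][j] >= threshold
    ]
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(find_similar_pairs(docs, threshold=0.9))
```

Building the full matrix costs O(n²) comparisons. That is fine for a sketch, but a large collection would normally rely on the vector store's nearest-neighbour queries instead.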
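
The listing describes CombinedCleaner as a two-stage process. Hash-based cleaning first removes exact duplicates, and similarity-based cleaning then removes near duplicates, with SimilarityCleaner keeping one representative per similarity group. The sketch below makes two assumptions. Exact duplicates are found by hashing the stripped text with SHA-256. Near duplicates are found by a greedy pass that drops any document whose cosine similarity to an already kept document is at or above the threshold (0.8 here, mirroring the pytest fixture). The function names hash_stage, similarity_stage and combined_clean are hypothetical, and the original classes may form similarity groups differently.

```python
import hashlib
import math
from typing import List, Sequence

Vector = Sequence[float]


def _cosine(a: Vector, b: Vector) -> float:
    """Raw cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def hash_stage(texts: List[str]) -> List[int]:
    """Stage 1: keep the index of the first occurrence of each exact text."""
    seen = set()
    kept = []
    for i, text in enumerate(texts):
        # Assumption: only surrounding whitespace is ignored before hashing.
        digest = hashlib.sha256(text.strip().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(i)
    return kept


def similarity_stage(
    indices: List[int], embeddings: Sequence[Vector], threshold: float
) -> List[int]:
    """Stage 2: greedily keep a document unless it is at least `threshold`
    similar to a document that has already been kept."""
    kept: List[int] = []
    for i in indices:
        if all(_cosine(embeddings[i], embeddings[k]) < threshold for k in kept):
            kept.append(i)
    return kept


def combined_clean(
    texts: List[str], embeddings: Sequence[Vector], threshold: float = 0.8
) -> List[str]:
    """Run both stages and return the surviving texts in their original order."""
    if len(texts) != len(embeddings):
        raise ValueError("texts and embeddings must have the same length")
    survivors = similarity_stage(hash_stage(texts), embeddings, threshold)
    return [texts[i] for i in survivors]


if __name__ == "__main__":
    texts = ["alpha", "alpha", "alpha beta", "gamma"]
    embeddings = [[1.0, 0.0], [1.0, 0.0], [0.95, 0.05], [0.0, 1.0]]
    print(combined_clean(texts, embeddings, threshold=0.8))  # ['alpha', 'gamma']
```

Because a document is dropped only when its similarity reaches the threshold, raising the threshold retains more texts. This matches the behaviour test_similarity_threshold_effect is described as checking. Running the cheap hash stage first also means the quadratic similarity stage sees fewer documents.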
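
DocChatRAG's first mode, Basic RAG, is described as similarity search. The sketch below shows that retrieval step and assumes chunk embeddings are already available as plain lists of floats. It scores every chunk against the query embedding by cosine similarity and keeps the k best. The names retrieve_top_k and build_context are hypothetical. A real engine would more likely delegate the search to its vector store (for example ChromaDB, which the DocumentIndexer entry mentions) than scan every chunk in Python.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _cosine(a: Vector, b: Vector) -> float:
    """Raw cosine similarity; 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def retrieve_top_k(
    query: Vector, chunk_embeddings: Sequence[Vector], k: int = 3
) -> List[Tuple[int, float]]:
    """Return up to k (chunk_index, score) pairs, best match first."""
    scored = ((i, _cosine(query, emb)) for i, emb in enumerate(chunk_embeddings))
    # nlargest avoids sorting the whole list when k is small.
    return heapq.nlargest(k, scored, key=lambda item: item[1])


def build_context(chunks: Sequence[str], hits: List[Tuple[int, float]]) -> str:
    """Join the retrieved chunks, numbered by rank, into a context block for a prompt."""
    return "\n\n".join(
        f"[{rank}] {chunks[i]}" for rank, (i, _score) in enumerate(hits, start=1)
    )


if __name__ == "__main__":
    chunks = ["chunk A", "chunk B", "chunk C"]
    embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    hits = retrieve_top_k([1.0, 0.1], embeddings, k=2)
    print(build_context(chunks, hits))
```

The numbered context lets the generated answer cite chunks by rank. That is one plausible way to produce the references returned by the chat endpoints listed above, not a confirmed detail of them.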
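
TextClusterer is described as clustering document embeddings with K-means, Agglomerative, or DBSCAN. The listing does not say which library it uses. The following is only an illustrative pure-Python version of plain K-means (Lloyd's algorithm), with a seeded random initialisation so runs are reproducible. The kmeans function is hypothetical, and the optional per-cluster summarization step is omitted.

```python
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: Sequence[Vector], k: int, max_iter: int = 100, seed: int = 0
) -> List[int]:
    """Lloyd's algorithm: return one cluster label (0..k-1) per point."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    if max_iter < 1:
        raise ValueError("max_iter must be at least 1")
    rng = random.Random(seed)
    # Initialise the centroids from k points chosen at random.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    assert labels is not None
    return labels


if __name__ == "__main__":
    embeddings = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    print(kmeans(embeddings, k=2))
```

K-means needs the number of clusters up front. DBSCAN, which the listing also names, instead infers the number of clusters from density parameters.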

Search Examples