🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "nlp"

Found 32 matching components

  • function clean_text

    Cleans and normalizes text content by removing HTML tags, normalizing whitespace, and stripping markdown formatting elements.

    File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

    text-processing text-cleaning normalization html-removal markdown-removal
  • function validate_and_alternatives

    Validates whether a given keyword is a valid chemical compound, biochemical concept, or drug-related term using GPT-4, and returns alternative names/synonyms if valid.

    File: /tf/active/vicechatdev/offline_parser_docstore.py

    validation chemistry biochemistry drug-research llm
  • class MyEmbeddingFunction_v1

    A custom embedding function class that generates embeddings for documents using OpenAI's API, with built-in text summarization for long documents and token management.

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

    embeddings openai chromadb vector-database text-summarization
  • class MyEmbeddingFunction_v2

    A custom embedding function class that generates embeddings for text documents using OpenAI's embedding models, with automatic text summarization and token management for large documents.

    File: /tf/active/vicechatdev/offline_docstore_multi_vice.py

    embeddings openai chromadb text-processing summarization
  • class MyEmbeddingFunction

    Custom embedding function class that integrates OpenAI's embedding API with Chroma DB for generating vector embeddings from text documents.

    File: /tf/active/vicechatdev/project_victoria_disclosure_generator.py

    embeddings openai chroma vector-database nlp
  • class MyEmbeddingFunction_v3

    A custom embedding function class that generates embeddings for text documents using OpenAI's embedding models, with automatic text summarization and token limit handling for large documents.

    File: /tf/active/vicechatdev/offline_docstore_multi.py

    embeddings openai vector-database chromadb text-processing
  • function test_attendee_extraction_comprehensive

    A comprehensive test function that validates the attendee extraction logic for meeting transcripts, comparing actual speakers against mentioned names and demonstrating integration with meeting minutes generation.

    File: /tf/active/vicechatdev/leexi/test_attendee_comprehensive.py

    testing attendee-extraction meeting-minutes transcript-parsing speaker-identification
  • function test_language_detection_and_translation

    A test function that validates multi-language query processing capabilities including language detection, translation, and query expansion across multiple supported languages.

    File: /tf/active/vicechatdev/docchat/test_multilanguage.py

    testing multi-language language-detection translation RAG
  • function full_reading_example

    Demonstrates the full reading mode of a RAG (Retrieval-Augmented Generation) system by processing all documents to answer a comprehensive query about key findings.

    File: /tf/active/vicechatdev/docchat/example_usage.py

    example demonstration RAG retrieval-augmented-generation full-reading
  • function api_chat

    Flask API endpoint that handles chat requests asynchronously, processing user queries through a RAG (Retrieval-Augmented Generation) engine with support for multiple modes, memory, web search, and custom configurations.

    File: /tf/active/vicechatdev/docchat/app.py

    flask api chat async rag
  • class QueryBasedExtractor

    A class that extracts relevant information from documents using a small language model (LLM), designed for Extensive and Full Reading modes in RAG systems.

    File: /tf/active/vicechatdev/docchat/rag_engine.py

    information-extraction document-processing llm rag query-based
  • class DocChatEmbeddingFunction

    A custom ChromaDB embedding function that generates OpenAI embeddings with automatic text summarization for documents exceeding token limits.

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    embeddings chromadb openai text-processing summarization
  • class DocumentIndexer

    A class for indexing documents into ChromaDB with support for multiple file formats (PDF, Word, PowerPoint, Excel, text files), smart incremental indexing, and document chunk management.

    File: /tf/active/vicechatdev/docchat/document_indexer.py

    document-indexing vector-database chromadb embeddings pdf-processing
  • function test_local_document

    Integration test function that validates end date extraction from a local PDF document using document processing and LLM-based analysis.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_local_document.py

    testing integration-test document-processing pdf-extraction llm
  • function test_llm_client

    Tests the LLM client functionality by analyzing a sample contract text and verifying the extraction of key contract metadata such as third parties, dates, and status.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_implementation.py

    testing llm contract-analysis integration-test validation
  • function test_llm_extraction

    A test function that validates LLM-based contract data extraction by processing a sample contract and verifying the extracted fields against expected values.

    File: /tf/active/vicechatdev/contract_validity_analyzer/test_extractor.py

    testing contract-extraction llm validation integration-test
  • class TextClusterer

    A class that clusters similar documents based on their embeddings using various clustering algorithms (K-means, Agglomerative, DBSCAN) and optionally generates summaries for each cluster.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/clustering/text_clusterer.py

    clustering document-clustering embeddings machine-learning kmeans
  • function calculate_similarity

    Computes the cosine similarity between two embedding vectors, returning a normalized score between 0 and 1 that measures their directional alignment.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/similarity_utils.py

    cosine-similarity vector-comparison embeddings similarity-metric machine-learning
  • function build_similarity_matrix

    Computes a pairwise cosine similarity matrix for a collection of embedding vectors, where each cell (i,j) represents the similarity between embedding i and embedding j.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/similarity_utils.py

    embeddings similarity cosine-similarity matrix nlp
  • function find_similar_documents

    Identifies pairs of similar documents by comparing their embeddings and returns those exceeding a specified similarity threshold, sorted by similarity score.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/similarity_utils.py

    document-similarity embedding-comparison duplicate-detection cosine-similarity nlp
  • function get_unique_documents

    Identifies and separates unique documents from duplicates in a list by comparing hash values of document text content.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/hash_utils.py

    deduplication document-processing data-cleaning hashing text-processing
  • function summarize_text

    A deprecated standalone function that was originally designed to summarize groups of similar documents but now only returns the input documents unchanged with a deprecation warning.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/summarization/summarizer.py

    deprecated text-summarization document-processing nlp text-clustering
  • function create_summary

    Creates a text summary using OpenAI's GPT models, or returns a truncated version as a fallback when no API key is available.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/summarization/summarizer.py

    summarization text-processing openai gpt nlp
  • class SummarizationConfig

    A configuration wrapper class that manages settings for a text summarization model by encapsulating a SummarizationModel instance.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/summarization/models.py

    configuration summarization model-config wrapper nlp
  • class SimilarityCleaner

    A document cleaning class that identifies and removes duplicate or highly similar documents based on embedding vector similarity, keeping only representative documents from each similarity group.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/cleaners/similarity_cleaner.py

    document-processing deduplication similarity embeddings clustering
  • class QueryBasedExtractor_v1

    A class that performs targeted information extraction from text using LLM-based query-guided extraction, with support for handling long documents through chunking and token management.

    File: /tf/active/vicechatdev/vice_ai/hybrid_rag_engine.py

    information-extraction llm openai text-processing query-based
  • class StatisticalAgent

    LLM-powered statistical analysis agent.

    File: /tf/active/vicechatdev/vice_ai/statistical_agent.py

    class statisticalagent
  • class TableSelectionResult

    A dataclass that encapsulates the results of a table selection operation, including selected tables, reasoning, confidence score, and suggested joins.

    File: /tf/active/vicechatdev/full_smartstat/two_pass_sql_workflow.py

    dataclass table-selection database query-generation sql
  • class StatisticalAgent_v1

    LLM-powered statistical analysis agent.

    File: /tf/active/vicechatdev/full_smartstat/statistical_agent.py

    class statisticalagent
  • class StatisticalAgent_v2

    LLM-powered statistical analysis agent.

    File: /tf/active/vicechatdev/smartstat/statistical_agent.py

    class statisticalagent
  • class LLMClient_v1

    A client class for interacting with Large Language Models (LLMs), specifically designed to work with OpenAI's chat completion API.

    File: /tf/active/vicechatdev/QA_updater/core/llm_client.py

    llm openai gpt chat-completion api-client
  • class QueryParser

    A parser class that converts LLM-generated query response text into structured dictionaries containing various search query types, metadata, and parameters.

    File: /tf/active/vicechatdev/QA_updater/core/query_parser.py

    parser LLM query-processing text-parsing structured-data
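
Several results above (calculate_similarity, build_similarity_matrix, find_similar_documents) describe cosine-similarity utilities over embedding vectors. The sketch below illustrates that general technique only; it is not the code from similarity_utils.py, and the names cosine_similarity, similarity_matrix, and similar_pairs, as well as the default threshold of 0.9, are hypothetical. Raw cosine similarity lies in [-1, 1]; the listing mentions a score between 0 and 1, and how the actual function maps to that range is not shown here, so this sketch returns the raw value.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Raw cosine similarity of two equal-length vectors, in [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: a zero vector has no direction, so treat it as dissimilar.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """Pairwise matrix where cell (i, j) compares embedding i with embedding j."""
    n = len(embeddings)
    return [
        [cosine_similarity(embeddings[i], embeddings[j]) for j in range(n)]
        for i in range(n)
    ]


def similar_pairs(embeddings, threshold=0.9):
    """Index pairs (i, j, score) with score >= threshold, highest score first."""
    pairs = []
    for i, j in combinations(range(len(embeddings)), 2):
        score = cosine_similarity(embeddings[i], embeddings[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real embeddings have hundreds of dimensions.
    docs = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
    for i, j, score in similar_pairs(docs):
        print(f"documents {i} and {j} look similar (score {score:.3f})")
```

Comparing every pair is quadratic in the number of documents, which is fine for a small cleanup pass but would likely need batching or an approximate nearest-neighbour index for a large collection.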

Search Examples