🔍 Code Extractor

Search Components

Full-Text: Fast keyword matching | Semantic: AI-powered understanding of intent (finds similar concepts)

Search Results for "preprocessing"

Found 32 matching components

  • function clean_text

    Cleans and normalizes text content by removing HTML tags, normalizing whitespace, and stripping markdown formatting elements.

    File: /tf/active/vicechatdev/improved_convert_disclosures_to_table.py

    text-processing text-cleaning normalization html-removal markdown-removal
  • class MyEmbeddingFunction_v1

    A custom embedding function class that generates embeddings for documents using OpenAI's API, with built-in text summarization for long documents and token management.

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG copy.py

    embeddings openai chromadb vector-database text-summarization
  • class OneCo_hybrid_RAG_v2

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/OneCo_hybrid_RAG.py

    class oneco_hybrid_rag
  • function extract_previous_reports_summary

    Extracts and summarizes key information from previous meeting report files using document extraction and OpenAI's GPT-4o-mini model to provide context for upcoming meetings.

    File: /tf/active/vicechatdev/leexi/app.py

    meeting-analysis document-extraction text-summarization llm openai
  • function test_multiple_files

    A test function that validates the extraction of text content from multiple document files using a DocumentExtractor instance, displaying extraction results and simulating combined content processing.

    File: /tf/active/vicechatdev/leexi/test_multiple_files.py

    testing document-extraction file-processing text-extraction multiple-files
  • class DocChatRAG

    Main RAG engine with three operating modes: (1) Basic RAG (similarity search); (2) Extensive (full document retrieval with preprocessing); (3) Full Reading (process all documents).

    File: /tf/active/vicechatdev/docchat/rag_engine.py

    class docchatrag
  • function get_unique_documents

    Identifies and separates unique documents from duplicates in a list by comparing hash values of document text content.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/utils/hash_utils.py

    deduplication document-processing data-cleaning hashing text-processing
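
The hash-based separation described for get_unique_documents might look roughly like the sketch below. The function name, the "text" key, and the choice of SHA-256 are assumptions made for illustration; the listing does not show the actual signature or hash algorithm.

```python
import hashlib


def split_unique_and_duplicates(documents):
    """Separate first occurrences from later repeats by hashing each text.

    Hypothetical helper: assumes each document is a dict with a "text" key.
    """
    seen = set()
    unique, duplicates = [], []
    for doc in documents:
        # Identical text always yields the same digest, so a repeated
        # digest marks a duplicate of an earlier document.
        digest = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates.append(doc)
        else:
            seen.add(digest)
            unique.append(doc)
    return unique, duplicates


docs = [
    {"id": "a", "text": "hello world"},
    {"id": "b", "text": "hello world"},
    {"id": "c", "text": "something else"},
]
unique, duplicates = split_unique_and_duplicates(docs)
```

Keeping only digests in the `seen` set avoids holding every full text in memory a second time, which matters when the documents are large.
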
  • class HashCleaner

    A document deduplication cleaner that removes documents with identical content by comparing hash values of document text.

    File: /tf/active/vicechatdev/chromadb-cleanup/src/cleaners/hash_cleaner.py

    deduplication data-cleaning hash-based document-processing duplicate-removal
  • function process_inline_markdown

    Processes inline markdown formatting by unescaping HTML entities in text. Currently performs basic cleanup while preserving markdown syntax for downstream processing.

    File: /tf/active/vicechatdev/vice_ai/complex_app.py

    markdown text-processing html-entities preprocessing formatting
  • function smart_read_csv

    Automatically detects CSV file delimiters (comma, semicolon, tab) and handles regional decimal formats (European comma vs US/UK point) to reliably parse CSV files from different locales.

    File: /tf/active/vicechatdev/vice_ai/smartstat_service.py

    csv data-loading file-parsing delimiter-detection regional-formats
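
As a rough illustration of the delimiter and decimal detection described for smart_read_csv, the sketch below uses only the standard library. All names are hypothetical, and the rule "a semicolon delimiter implies a decimal comma" is a simplifying assumption, not something the listing confirms about the real function.

```python
import csv
import io


def detect_delimiter(sample, candidates=(",", ";", "\t")):
    """Pick the candidate that occurs most often in the header line."""
    first_line = sample.splitlines()[0]
    return max(candidates, key=first_line.count)


def parse_cell(cell, decimal):
    """Return the cell as a float when possible, otherwise unchanged."""
    normalized = cell.replace(decimal, ".") if decimal != "." else cell
    try:
        return float(normalized)
    except ValueError:
        return cell


def read_rows(text):
    delimiter = detect_delimiter(text)
    # Assumption: semicolon-delimited files use a decimal comma.
    decimal = "," if delimiter == ";" else "."
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    rows = [[parse_cell(cell, decimal) for cell in row] for row in reader]
    return header, rows


european = "name;weight\nA;1,5\nB;2,25\n"
header, rows = read_rows(european)
```

A production version would likely inspect more than the header line and check the decimal format per column rather than per file.
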
  • function validate_sheet_format

    Analyzes Excel sheet structure using multiple heuristics to classify it as tabular data, information sheet, or mixed format, returning quality metrics and extraction recommendations.

    File: /tf/active/vicechatdev/vice_ai/smartstat_service.py

    data-validation excel-processing sheet-classification data-quality heuristic-analysis
  • function clean_for_json_v1

    Recursively traverses nested data structures (dicts, lists) and replaces NaN and Infinity float values with None to ensure JSON serialization compatibility.

    File: /tf/active/vicechatdev/vice_ai/new_app.py

    json serialization data-cleaning nan-handling infinity-handling
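
The recursive cleanup shared by the clean_for_json variants can be sketched as follows. This is a minimal standard-library version under an assumed name; it does not cover the NumPy type conversion that some of the variants also mention.

```python
import json
import math


def replace_non_finite(value):
    """Recursively replace NaN and +/-Infinity floats with None."""
    if isinstance(value, dict):
        return {key: replace_non_finite(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_non_finite(item) for item in value]
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


payload = {"mean": float("nan"), "bounds": [1.5, float("inf")], "n": 3}
cleaned = replace_non_finite(payload)

# allow_nan=False makes json.dumps raise on NaN/Infinity, so a successful
# call confirms that no non-finite float survived the cleanup.
encoded = json.dumps(cleaned, allow_nan=False)
```

By default `json.dumps` emits the tokens `NaN` and `Infinity`, which are not valid JSON and are rejected by strict parsers. That is the usual motivation for a cleanup pass of this kind.
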
  • function clean_nan_for_json

    Recursively traverses nested data structures (dicts, lists) and converts NaN, null, and invalid numeric values to None for safe JSON serialization.

    File: /tf/active/vicechatdev/vice_ai/data_analysis_service.py

    json-serialization data-cleaning nan-handling recursive data-preprocessing
  • function remove_outliers_iqr

    Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/analysis_2.py

    data-cleaning outlier-detection IQR interquartile-range data-preprocessing
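
The 3×IQR rule mentioned above can be illustrated without pandas. The sketch below works on a plain list instead of a DataFrame column, and its function name and `k` parameter are hypothetical. The "inclusive" quantile method is chosen because it uses linear interpolation, which should match the pandas default.

```python
import statistics


def remove_outliers_iqr_sketch(values, k=3.0):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


data = [10, 12, 11, 13, 12, 11, 100]
filtered = remove_outliers_iqr_sketch(data)
```

For this sample Q1 is 11.0 and Q3 is 12.5, so the fences sit at 6.5 and 17.0 and only the value 100 is dropped. The conventional multiplier is 1.5; using 3 removes only the more extreme points.
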
  • function remove_outliers_iqr_v1

    Removes outliers from a pandas DataFrame column using the Interquartile Range (IQR) method with a 3×IQR threshold.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/42b81361-ba7e-4d79-9598-3090af68384b/project_1/analysis.py

    data-cleaning outlier-detection IQR interquartile-range statistics
  • function clean_for_json_v4

    Recursively traverses nested data structures (dicts, lists, arrays) and converts NaN and Inf float values to None for safe JSON serialization, while also converting NumPy types to native Python types.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/7372154d-807e-4723-a769-4668761944b5/analysis_2.py

    json serialization data-cleaning numpy nan-handling
  • function remove_outliers

    Removes outliers from a pandas DataFrame based on the Interquartile Range (IQR) method for a specified column.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/f5da873e-41e6-4f34-b3e4-f7443d4d213b/analysis_5.py

    data-cleaning outlier-detection IQR interquartile-range data-preprocessing
  • function clean_for_json_v5

    Recursively traverses nested data structures (dictionaries, lists) and sanitizes numeric values by converting NaN and Inf to None, and normalizing NumPy numeric types to native Python types for JSON serialization.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/e4e8cb00-c17d-4282-aa80-5af67f32952f/analysis_1.py

    data-cleaning json-serialization numpy data-preprocessing nan-handling
  • function clean_for_json_v9

    Recursively sanitizes Python objects (dicts, lists, floats) to ensure they are JSON-serializable by converting NaN and infinity values to None and ensuring all dictionary keys are strings.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/c385e1f5-fbf6-4832-8fd4-78ef8b72fc53/project_2/analysis.py

    json serialization data-cleaning sanitization nan-handling
  • function clean_for_json_v6

    Recursively traverses nested data structures (dicts, lists) and sanitizes floating-point values by replacing NaN and Inf with None, while also converting NumPy numeric types to native Python types.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/d1e252f5-950c-4ad7-b425-86b4b02c3c62/analysis_4.py

    json serialization data-cleaning numpy nan-handling
  • function clean_for_json_v2

    Recursively traverses nested data structures (dicts, lists) and sanitizes numeric values by converting NaN and Inf to None, and numpy types to native Python types for JSON serialization.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/e9b7c942-87b5-4a6f-865e-e7a0d62fb0a1/analysis_2.py

    json serialization data-cleaning numpy nan-handling
  • function explore_data

    Performs comprehensive exploratory data analysis on a pandas DataFrame, printing dataset overview, data types, missing values, descriptive statistics, and identifying categorical and numerical variables.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py

    data-exploration EDA exploratory-data-analysis data-profiling pandas
  • function identify_variables

    Categorizes DataFrame columns into Eimeria infection variables, performance measure variables, and grouping variables based on keyword matching in column names.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py

    data-preprocessing variable-classification keyword-matching veterinary-research eimeria
  • function detect_outliers_iqr_v1

    Detects outliers in a dataset using the Interquartile Range (IQR) method, returning boolean indices of outliers and the calculated bounds.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/328d2f87-3367-495e-89f7-e633ff8c5b3d/analysis_2.py

    outlier-detection IQR interquartile-range statistics data-cleaning
  • function detect_outliers_zscore

    Detects outliers in numerical data using the Z-score statistical method, identifying data points that deviate significantly from the mean.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/328d2f87-3367-495e-89f7-e633ff8c5b3d/analysis_2.py

    outlier-detection statistics data-cleaning anomaly-detection z-score
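
A minimal sketch of Z-score outlier detection is shown below. The function name, the default threshold of 3.0, and the use of the population standard deviation are all assumptions; the listing does not say which conventions the real function follows.

```python
import statistics


def zscore_outlier_flags(values, threshold=3.0):
    """Flag values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values are identical, so nothing deviates from the mean.
        return [False] * len(values)
    return [abs((v - mean) / std) > threshold for v in values]


data = [10, 11, 9, 10, 11, 9, 10, 50]
flags = zscore_outlier_flags(data, threshold=2.0)
```

With only eight points, a single extreme value cannot reach a Z-score of 3 under the population standard deviation, which is why the example lowers the threshold to 2.0. In small samples an IQR-based rule is often the more robust choice.
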
  • function detect_outliers_iqr_v2

    Detects statistical outliers in a dataset using the Interquartile Range (IQR) method with a conservative 3×IQR threshold.

    File: /tf/active/vicechatdev/vice_ai/smartstat_scripts/84b9ac09-e646-4422-9d3a-e9f96529a553/analysis_1.py

    outlier-detection statistics data-cleaning IQR interquartile-range
  • class DataProcessor

    Handles data loading, validation, and preprocessing.

    File: /tf/active/vicechatdev/full_smartstat/data_processor.py

    class dataprocessor
  • class DataProcessor_v1

    Handles data loading, validation, and preprocessing.

    File: /tf/active/vicechatdev/smartstat/data_processor.py

    class dataprocessor
  • function isfinite

    Extended version of numpy.isfinite that handles additional data types including None, strings, datetime objects, masked arrays, and dask arrays.

    File: /tf/active/vicechatdev/patches/util.py

    validation data-processing numpy pandas dask
  • function unique_array

    Returns an array of unique values from the input array while preserving the original order of first occurrence.

    File: /tf/active/vicechatdev/patches/util.py

    array-processing deduplication unique-values data-cleaning order-preserving
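
Order-preserving deduplication of the kind described for unique_array can be sketched in plain Python as shown below. The function name is hypothetical, and the approach assumes hashable elements; the actual utility presumably works on NumPy arrays.

```python
def unique_preserving_order(values):
    """Return unique values in order of first occurrence."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(values))


result = unique_preserving_order([3, 1, 3, 2, 1])
```

This differs from `sorted(set(values))`, which also deduplicates but discards the original ordering.
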
  • function is_float

    A type-checking utility function that determines whether a given object is a floating-point scalar value, supporting both Python's native float type and NumPy floating-point types.

    File: /tf/active/vicechatdev/patches/util.py

    type-checking validation float numpy scalar
  • class OneCo_hybrid_RAG_v5

    A class named OneCo_hybrid_RAG

    File: /tf/active/vicechatdev/data_capture_backup_18072025/OneCo_hybrid_RAG.py

    class oneco_hybrid_rag
