
function load_analysis_data

Maturity: 42

Loads CSV dataset(s) into pandas DataFrames based on dataset configuration, supporting both single dataset loading and comparison mode with two datasets.

File: /tf/active/vicechatdev/data_quality_dashboard.py
Lines: 56 - 74
Complexity: simple

Purpose

This function is the data loader for analysis workflows. It handles two loading patterns: (1) comparison mode, where two CSV files (original and cleaned versions) are loaded for comparative analysis, and (2) single dataset mode, where one CSV file is loaded. It returns a dictionary containing the loaded DataFrame(s) and a 'type' key that echoes the dataset type from the input.

Source Code

def load_analysis_data(dataset_info):
    """Load analysis data based on dataset selection."""
    if dataset_info['type'] == 'compare':
        print("Loading data for comparison analysis...")
        # Load both datasets for comparison
        original_flocks = pd.read_csv(dataset_info['original'])
        cleaned_flocks = pd.read_csv(dataset_info['cleaned'])
        return {
            'original_flocks': original_flocks,
            'cleaned_flocks': cleaned_flocks,
            'type': 'compare'
        }
    else:
        print(f"Loading {dataset_info['type']} dataset...")
        flocks = pd.read_csv(dataset_info['path'])
        return {
            'flocks': flocks,
            'type': dataset_info['type']
        }

Parameters

Name          Type  Default  Kind
dataset_info  -     -        positional_or_keyword

Parameter Details

dataset_info: A dictionary containing dataset configuration. For comparison mode, must include keys: 'type' (set to 'compare'), 'original' (path to original CSV), and 'cleaned' (path to cleaned CSV). For single dataset mode, must include keys: 'type' (dataset type identifier, any string except 'compare') and 'path' (path to CSV file). All paths should be valid file system paths to CSV files.
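The key requirements above can be checked before any file is read. The following is a minimal sketch of such a check; `validate_dataset_info` is a hypothetical helper, not part of data_quality_dashboard.py, and it only inspects keys (it does not verify that the paths exist).

```python
def validate_dataset_info(dataset_info):
    """Return a list of problems; an empty list means the keys look usable.

    Hypothetical helper: mirrors the key requirements described for
    load_analysis_data, but is not part of the original module.
    """
    if not isinstance(dataset_info, dict):
        return ["dataset_info must be a dict"]
    if "type" not in dataset_info:
        return ["missing required key 'type'"]

    dataset_type = dataset_info["type"]
    # Comparison mode needs two paths; every other type needs a single 'path'.
    if dataset_type == "compare":
        required = ("original", "cleaned")
    else:
        required = ("path",)

    return [
        f"missing required key '{key}' for type '{dataset_type}'"
        for key in required
        if key not in dataset_info
    ]


# Example: a comparison config that forgot the 'cleaned' path.
problems = validate_dataset_info({"type": "compare", "original": "data/original_flocks.csv"})
print(problems)
```

Returning a list of problems rather than raising lets the caller decide whether to abort or prompt for a corrected configuration.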

Return Value

Returns a dictionary with different structures based on dataset type. For comparison mode (type='compare'): {'original_flocks': DataFrame, 'cleaned_flocks': DataFrame, 'type': 'compare'}. For single dataset mode: {'flocks': DataFrame, 'type': <dataset_type_string>}. All DataFrames are pandas DataFrame objects loaded from CSV files.
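Because the two return shapes use different keys, calling code may find it convenient to normalize them. The sketch below assumes only the key names documented above; `frames_by_name` is a hypothetical helper, and the demonstration uses placeholder strings in place of real DataFrames so that it runs without any CSV files.

```python
def frames_by_name(result):
    """Return {key: DataFrame} for whichever shape load_analysis_data produced.

    Hypothetical helper: relies only on the documented keys
    'original_flocks', 'cleaned_flocks', 'flocks', and 'type'.
    """
    if result["type"] == "compare":
        keys = ("original_flocks", "cleaned_flocks")
    else:
        keys = ("flocks",)
    return {key: result[key] for key in keys}


# Placeholder values stand in for DataFrames in this demonstration.
single_result = {"flocks": "<single DataFrame>", "type": "cleaned"}
compare_result = {
    "original_flocks": "<original DataFrame>",
    "cleaned_flocks": "<cleaned DataFrame>",
    "type": "compare",
}

for result in (single_result, compare_result):
    for name, frame in frames_by_name(result).items():
        print(f"{result['type']}: {name} -> {frame}")
```

With this in place, downstream code can iterate over the frames uniformly instead of branching on 'type' at every call site.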

Dependencies

  • pandas

Required Imports

import pandas as pd

Usage Example

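import pandas as pd  # required by load_analysis_data itself

# Assumes load_analysis_data (see Source Code) is defined or imported in this scope.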

# Example 1: Load single dataset
dataset_config = {
    'type': 'original',
    'path': 'data/flocks.csv'
}
result = load_analysis_data(dataset_config)
flocks_df = result['flocks']
print(f"Loaded {len(flocks_df)} rows of type {result['type']}")

# Example 2: Load comparison datasets
comparison_config = {
    'type': 'compare',
    'original': 'data/original_flocks.csv',
    'cleaned': 'data/cleaned_flocks.csv'
}
result = load_analysis_data(comparison_config)
original_df = result['original_flocks']
cleaned_df = result['cleaned_flocks']
print(f"Loaded {len(original_df)} original and {len(cleaned_df)} cleaned records")

Best Practices

  • Ensure CSV files exist before calling this function to avoid FileNotFoundError
  • Validate the structure of dataset_info dictionary before passing to ensure required keys are present
  • Handle potential pandas CSV parsing errors (e.g., encoding issues, malformed CSV) with try-except blocks when calling this function
  • The function prints status messages to stdout; consider redirecting or capturing output in production environments
  • For large CSV files, consider memory constraints as entire datasets are loaded into memory
  • The returned dictionary structure differs based on dataset type; always check the 'type' key before accessing DataFrame keys
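The error-handling and stdout suggestions above can be combined in a small wrapper. This is a sketch under stated assumptions, not part of the original module: `load_quietly` is a hypothetical name, and the loader is passed in as an argument so the wrapper can be exercised without pandas or real files. To the best of my knowledge, pandas' `ParserError` and `EmptyDataError` derive from `ValueError` (as does the built-in `UnicodeDecodeError`), so catching `ValueError` should cover the common parsing and encoding failures; verify this against the pandas version in use.

```python
import contextlib
import io


def load_quietly(loader, dataset_info):
    """Call loader(dataset_info), capturing its printed status messages.

    Hypothetical wrapper. Returns (result, captured_output, error), where
    exactly one of result and error is None.
    """
    buffer = io.StringIO()
    try:
        # Redirect the loader's print() calls into the buffer instead of stdout.
        with contextlib.redirect_stdout(buffer):
            result = loader(dataset_info)
    except (FileNotFoundError, KeyError, ValueError) as exc:
        # FileNotFoundError: missing CSV; KeyError: missing config key;
        # ValueError: assumed to cover CSV parsing and decoding errors.
        return None, buffer.getvalue(), exc
    return result, buffer.getvalue(), None


# Stand-in loader for demonstration; in practice pass load_analysis_data.
def fake_loader(dataset_info):
    print(f"Loading {dataset_info['type']} dataset...")
    return {"flocks": [], "type": dataset_info["type"]}


result, output, error = load_quietly(fake_loader, {"type": "original"})
```

In real use the call would be `load_quietly(load_analysis_data, dataset_config)`, with the captured output forwarded to a logger if desired.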

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function load_dataset 74.2% similar

    Loads a CSV dataset from a specified file path using pandas and returns it as a DataFrame with error handling for file not found and general exceptions.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/e1ecec5f-4ea5-49c5-b4f5-d051ce851294/project_1/analysis.py
  • function load_data 73.8% similar

    Loads a CSV dataset from a specified filepath using pandas, with fallback to creating sample data if the file is not found.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function compare_datasets 61.0% similar

    Analyzes and compares two pandas DataFrames containing flock data (original vs cleaned), printing detailed statistics about removed records, type distributions, and impact assessment.

    From: /tf/active/vicechatdev/data_quality_dashboard.py
  • function explore_data 59.8% similar

    Performs comprehensive exploratory data analysis on a pandas DataFrame, printing dataset overview, data types, missing values, descriptive statistics, and identifying categorical and numerical variables.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function select_dataset 56.5% similar

    Interactive command-line function that prompts users to select between original, cleaned, or comparison of flock datasets for analysis.

    From: /tf/active/vicechatdev/data_quality_dashboard.py