function load_analysis_data
Loads CSV dataset(s) into pandas DataFrames based on dataset configuration, supporting both single dataset loading and comparison mode with two datasets.
/tf/active/vicechatdev/data_quality_dashboard.py
56 - 74
simple
Purpose
This function serves as a data loader for analysis workflows, handling two distinct loading patterns: (1) comparison mode where two CSV files (original and cleaned versions) are loaded for comparative analysis, and (2) single dataset mode where one CSV file is loaded. It returns a dictionary containing the loaded DataFrame(s) and metadata about the dataset type.
Source Code
def load_analysis_data(dataset_info):
    """Load analysis data based on dataset selection."""
    if dataset_info['type'] == 'compare':
        print("Loading data for comparison analysis...")
        # Load both datasets for comparison
        original_flocks = pd.read_csv(dataset_info['original'])
        cleaned_flocks = pd.read_csv(dataset_info['cleaned'])
        return {
            'original_flocks': original_flocks,
            'cleaned_flocks': cleaned_flocks,
            'type': 'compare'
        }
    else:
        print(f"Loading {dataset_info['type']} dataset...")
        flocks = pd.read_csv(dataset_info['path'])
        return {
            'flocks': flocks,
            'type': dataset_info['type']
        }
Parameters
| Name | Type | Default | Kind |
|---|---|---|---|
| dataset_info | - | - | positional_or_keyword |
Parameter Details
dataset_info: A dictionary containing dataset configuration. For comparison mode, must include keys: 'type' (set to 'compare'), 'original' (path to original CSV), and 'cleaned' (path to cleaned CSV). For single dataset mode, must include keys: 'type' (dataset type identifier, any string except 'compare') and 'path' (path to CSV file). All paths should be valid file system paths to CSV files.
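The key requirements above can be checked before any file is read. The following is a minimal sketch, not part of the original module: `validate_dataset_info` is a hypothetical helper introduced only for illustration, and it checks key presence, not whether the paths exist.

```python
def validate_dataset_info(dataset_info):
    """Return a list of problems found in dataset_info (empty if none).

    Hypothetical helper; not part of data_quality_dashboard.py.
    """
    if not isinstance(dataset_info, dict):
        return ["dataset_info must be a dict"]
    if "type" not in dataset_info:
        return ["missing required key: 'type'"]

    # Comparison mode needs two paths; every other type needs a single 'path'.
    if dataset_info["type"] == "compare":
        required = ("original", "cleaned")
    else:
        required = ("path",)

    return [f"missing required key: {key!r}" for key in required
            if key not in dataset_info]


problems = validate_dataset_info(
    {"type": "compare", "original": "data/original_flocks.csv"}
)
print(problems)  # ["missing required key: 'cleaned'"]
```

Returning a list of problems rather than raising on the first one lets a caller report every missing key at once; raising a `KeyError` or `ValueError` instead would be an equally reasonable design.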
Return Value
Returns a dictionary with different structures based on dataset type. For comparison mode (type='compare'): {'original_flocks': DataFrame, 'cleaned_flocks': DataFrame, 'type': 'compare'}. For single dataset mode: {'flocks': DataFrame, 'type': <dataset_type_string>}. All DataFrames are pandas DataFrame objects loaded from CSV files.
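Because the two modes return differently named keys, downstream code may want a uniform view of the loaded frames. A minimal sketch, assuming only the return structure described above: `frames_by_name` is a hypothetical helper, and the string values in the demo are stand-ins for real DataFrames.

```python
def frames_by_name(result):
    """Map each data key in a load_analysis_data() result to its value.

    Hypothetical helper: it drops the 'type' metadata entry so callers can
    iterate over the loaded frames without branching on the mode.
    """
    return {name: frame for name, frame in result.items() if name != "type"}


# Stand-in strings are used instead of real DataFrames to keep the demo small.
compare_result = {"original_flocks": "df_a", "cleaned_flocks": "df_b", "type": "compare"}
single_result = {"flocks": "df_c", "type": "original"}

print(list(frames_by_name(compare_result)))  # ['original_flocks', 'cleaned_flocks']
print(list(frames_by_name(single_result)))   # ['flocks']
```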
Dependencies
pandas
Required Imports
import pandas as pd
Usage Example
import pandas as pd

# Example 1: Load single dataset
dataset_config = {
    'type': 'original',
    'path': 'data/flocks.csv'
}
result = load_analysis_data(dataset_config)
flocks_df = result['flocks']
print(f"Loaded {len(flocks_df)} rows of type {result['type']}")

# Example 2: Load comparison datasets
comparison_config = {
    'type': 'compare',
    'original': 'data/original_flocks.csv',
    'cleaned': 'data/cleaned_flocks.csv'
}
result = load_analysis_data(comparison_config)
original_df = result['original_flocks']
cleaned_df = result['cleaned_flocks']
print(f"Loaded {len(original_df)} original and {len(cleaned_df)} cleaned records")
Best Practices
- Ensure CSV files exist before calling this function to avoid FileNotFoundError
- Validate the structure of dataset_info dictionary before passing to ensure required keys are present
- Handle potential pandas CSV parsing errors (e.g., encoding issues, malformed CSV) with try-except blocks when calling this function
- The function prints status messages to stdout; consider redirecting or capturing output in production environments
- For large CSV files, consider memory constraints as entire datasets are loaded into memory
- The returned dictionary structure differs based on dataset type; always check the 'type' key before accessing DataFrame keys
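Two of the points above, checking that the files exist and keeping status messages off stdout, can be combined in a small wrapper. This is a sketch under stated assumptions: `load_quietly` is a hypothetical name, and the loader is passed in as an argument so the snippet stays independent of the module.

```python
import contextlib
import io
from pathlib import Path


def load_quietly(loader, dataset_info):
    """Call loader(dataset_info) after a pre-flight path check, capturing prints.

    Hypothetical wrapper. Returns (result, captured_stdout_text).
    Raises FileNotFoundError listing every missing path.
    """
    # Comparison mode references two files; any other type references one.
    if dataset_info["type"] == "compare":
        path_keys = ("original", "cleaned")
    else:
        path_keys = ("path",)

    missing = [dataset_info[key] for key in path_keys
               if not Path(dataset_info[key]).is_file()]
    if missing:
        raise FileNotFoundError(f"CSV file(s) not found: {missing}")

    # Redirect anything the loader prints into an in-memory buffer.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = loader(dataset_info)
    return result, buffer.getvalue()
```

A call would look like `result, log_text = load_quietly(load_analysis_data, dataset_config)`. Parsing failures such as `pd.errors.ParserError` or `UnicodeDecodeError` are not handled here and would still need a try-except block around the call.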
Similar Components
AI-powered semantic similarity - components with related functionality:
- function load_dataset 74.2% similar
- function load_data 73.8% similar
- function compare_datasets 61.0% similar
- function explore_data 59.8% similar
- function select_dataset 56.5% similar