🔍 Code Extractor

function create_test_dataset

Maturity: 42

Creates a test CSV dataset with sample product sales data across different regions and months, saving it to a temporary file.

File: /tf/active/vicechatdev/vice_ai/test_integration.py
Lines: 133 - 152
Complexity: simple

Purpose

This function generates a synthetic dataset for testing: 100 rows of sales data built by repeating five fixed records 20 times. Each record pairs a product (A-E) with a sales figure (100, 150, 200, 120, 180), a region (North, South, East, West, Central), and a month (Jan-May), so product A always appears with sales of 100, region North, and month Jan. The data is written to a temporary CSV file that can be used to test data analysis workflows, CSV parsing, or data processing pipelines without requiring external data sources.
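To make the repetition concrete, the sketch below rebuilds the four columns in plain Python, without pandas, and inspects the resulting rows. The variable names (`products`, `rows`, and so on) are illustrative only and do not appear in the original file.

```python
# The four columns, built with the same list repetition as create_test_dataset().
products = ['A', 'B', 'C', 'D', 'E'] * 20
sales = [100, 150, 200, 120, 180] * 20
regions = ['North', 'South', 'East', 'West', 'Central'] * 20
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May'] * 20

# Zip the columns into row tuples, one tuple per CSV row.
rows = list(zip(products, sales, regions, months))

print(len(rows))           # 100
print(len(set(rows)))      # 5 distinct records
print(rows[0])             # ('A', 100, 'North', 'Jan')
print(rows[5] == rows[0])  # True: the pattern restarts every five rows
print(sum(sales))          # 15000
```

Because only five distinct records exist, any grouping by product, region, or month yields identical groups, which is worth keeping in mind when the dataset feeds aggregation tests.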

Source Code

def create_test_dataset():
    """Create a test CSV dataset for testing"""
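    import pandas as pd
    import tempfile  # not in the original excerpt; presumably imported at module level in test_integration.py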
    
    # Generate sample data
    data = {
        'product': ['A', 'B', 'C', 'D', 'E'] * 20,
        'sales': [100, 150, 200, 120, 180] * 20,
        'region': ['North', 'South', 'East', 'West', 'Central'] * 20,
        'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'] * 20
    }
    
    df = pd.DataFrame(data)
    
    # Save to temporary file
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.csv')
    df.to_csv(temp_file.name, index=False)
    print(f"✅ Test dataset created: {temp_file.name}")
    
    return temp_file.name

Return Value

Returns a string containing the file path to the created temporary CSV file. The file contains 100 rows with columns: 'product', 'sales', 'region', and 'month'. The file is created with a '.csv' suffix and is not automatically deleted (delete=False), so the caller is responsible for cleanup.
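Since the caller owns the returned path, one option is to wrap the call in a context manager that deletes the file on exit. The following is a minimal sketch, not part of the original module: `cleaned_up` and `_make_placeholder_csv` are hypothetical names, and the placeholder factory stands in for create_test_dataset() so the sketch runs without pandas.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def cleaned_up(path_factory):
    """Call path_factory(), yield the path it returns, then delete that file."""
    path = path_factory()
    try:
        yield path
    finally:
        # Runs even if the body of the with-block raised an exception.
        if os.path.exists(path):
            os.unlink(path)


def _make_placeholder_csv():
    # Hypothetical stand-in for create_test_dataset(): writes one header
    # line and one data row to a temporary .csv file and returns its path.
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as handle:
        handle.write("product,sales,region,month\nA,100,North,Jan\n")
    return path


with cleaned_up(_make_placeholder_csv) as csv_path:
    exists_inside = os.path.exists(csv_path)  # file is available here

exists_after = os.path.exists(csv_path)  # file is gone after the block
```

In real use, `cleaned_up(create_test_dataset)` would be passed instead of the placeholder factory.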

Dependencies

  • pandas
  • tempfile

Required Imports

import pandas as pd
import tempfile

Usage Example

import pandas as pd
import tempfile
import os

def create_test_dataset():
    # Five fixed records repeated 20 times give 100 rows
    data = {
        'product': ['A', 'B', 'C', 'D', 'E'] * 20,
        'sales': [100, 150, 200, 120, 180] * 20,
        'region': ['North', 'South', 'East', 'West', 'Central'] * 20,
        'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'] * 20
    }
    df = pd.DataFrame(data)
    # delete=False keeps the file on disk after the handle is closed
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.csv')
    temp_file.close()  # release the open handle before pandas writes by name
    df.to_csv(temp_file.name, index=False)
    print(f"✅ Test dataset created: {temp_file.name}")
    return temp_file.name

# Usage
test_file_path = create_test_dataset()
try:
    df = pd.read_csv(test_file_path)
    print(df.head())
finally:
    # Clean up even if reading or processing fails
    os.unlink(test_file_path)

Best Practices

  • Always delete the temporary file after use with os.unlink() or os.remove() so that test files do not accumulate on disk
  • The function creates the file with delete=False, so it persists after the NamedTemporaryFile object is closed; note that the function as listed never closes that object explicitly
  • Consider wrapping usage in a try-finally block to ensure cleanup even if errors occur
  • The dataset has a predictable pattern (five fixed records repeated 20 times), which may not be suitable for tests that need varied or randomized data
  • For broader reuse, consider adding parameters to customize the dataset size, columns, or data patterns
  • The function prints to stdout; consider adding a silent mode parameter for automated testing
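The last two suggestions could look roughly like the sketch below. It is a hypothetical variant, not the original function: the name `create_test_dataset_custom` and the parameters `repeats` and `verbose` are assumptions, and it deliberately swaps pandas for the standard-library `csv` module so that it runs without third-party dependencies.

```python
import csv
import os
import tempfile

# The five fixed records used by the original function.
BASE_ROWS = [
    ("A", 100, "North", "Jan"),
    ("B", 150, "South", "Feb"),
    ("C", 200, "East", "Mar"),
    ("D", 120, "West", "Apr"),
    ("E", 180, "Central", "May"),
]


def create_test_dataset_custom(repeats=20, verbose=False):
    """Write BASE_ROWS repeated `repeats` times to a temp CSV; return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["product", "sales", "region", "month"])
        writer.writerows(BASE_ROWS * repeats)
    if verbose:
        # Silent by default so automated test output stays clean.
        print(f"Test dataset created: {path}")
    return path


# Usage: a small 10-row dataset, removed again in a finally block.
path = create_test_dataset_custom(repeats=2)
try:
    with open(path, newline="") as handle:
        loaded = list(csv.DictReader(handle))
finally:
    os.unlink(path)

print(len(loaded))  # 10
```

Note that `csv.DictReader` returns every value as a string, so `loaded[0]["sales"]` is `"100"` rather than the integer that pandas would infer.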

Similar Components

AI-powered semantic similarity - components with related functionality:

  • function create_sample_data_v1 72.9% similar

    Generates a synthetic dataset with 200 samples containing group-based measurements, quality scores, environmental data, and temporal information, then saves it to a CSV file.

    From: /tf/active/vicechatdev/full_smartstat/demo.py
  • function create_test_file 57.8% similar

    Creates a temporary text file with predefined multi-chapter test content for testing document extraction and processing functionality.

    From: /tf/active/vicechatdev/vice_ai/test_extraction_debug.py
  • function load_data 57.4% similar

    Loads a CSV dataset from a specified filepath using pandas, with fallback to creating sample data if the file is not found.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function create_sample_data_v2 57.0% similar

    Generates a synthetic dataset of 200 poultry research records with multiple treatment groups, challenge regimens, and performance metrics for demonstration purposes.

    From: /tf/active/vicechatdev/vice_ai/smartstat_scripts/5a059cb7-3903-4020-8519-14198d1f39c9/analysis_1.py
  • function test_european_csv 55.7% similar

    A test function that validates the ability to read and parse European-formatted CSV files (semicolon delimiters, comma decimal separators) and convert them to proper numeric types.

    From: /tf/active/vicechatdev/vice_ai/test_regional_formats.py