function create_test_dataset
Creates a test CSV dataset with sample product sales data across different regions and months, saving it to a temporary file.
/tf/active/vicechatdev/vice_ai/test_integration.py
133 - 152
simple
Purpose
This function generates a synthetic dataset for testing purposes: 100 rows of sales data built by repeating five fixed records 20 times. Each record pairs a product (A-E) with a fixed sales figure (100, 150, 200, 120, 180), a region (North/South/East/West/Central), and a month (Jan-May), so the columns are perfectly correlated. The data is written to a temporary CSV file that can be used for testing data analysis workflows, CSV parsing, or data processing pipelines without requiring external data sources.
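To make the shape of the data concrete, the following sketch rebuilds the four columns with plain Python lists (no pandas, no file I/O) and counts the distinct rows. The variable names `columns`, `rows`, and `distinct_rows` are illustrative and do not appear in the original module.

```python
# Rebuild the same four columns the function passes to pd.DataFrame.
columns = {
    'product': ['A', 'B', 'C', 'D', 'E'] * 20,
    'sales': [100, 150, 200, 120, 180] * 20,
    'region': ['North', 'South', 'East', 'West', 'Central'] * 20,
    'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'] * 20,
}

# Pair the columns up row by row, as the DataFrame would.
rows = list(zip(*columns.values()))
distinct_rows = sorted(set(rows))

print(len(rows))           # 100
print(len(distinct_rows))  # 5
print(distinct_rows[0])    # ('A', 100, 'North', 'Jan')
```

Because every column cycles with the same period, product A always appears with sales of 100, region North, and month Jan. Tests that need independent variation between columns would need a different generator.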
Source Code
def create_test_dataset():
    """Create a test CSV dataset for testing"""
    import pandas as pd

    # Generate sample data
    data = {
        'product': ['A', 'B', 'C', 'D', 'E'] * 20,
        'sales': [100, 150, 200, 120, 180] * 20,
        'region': ['North', 'South', 'East', 'West', 'Central'] * 20,
        'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'] * 20
    }
    df = pd.DataFrame(data)

    # Save to temporary file
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.csv')
    df.to_csv(temp_file.name, index=False)
    print(f"✅ Test dataset created: {temp_file.name}")
    return temp_file.name
Return Value
Returns a string containing the file path to the created temporary CSV file. The file contains 100 rows with columns: 'product', 'sales', 'region', and 'month'. The file is created with a '.csv' suffix and is not automatically deleted (delete=False), so the caller is responsible for cleanup.
Dependencies
- pandas
- tempfile
Required Imports
import pandas as pd
import tempfile
Usage Example
import pandas as pd
import tempfile
import os

def create_test_dataset():
    data = {
        'product': ['A', 'B', 'C', 'D', 'E'] * 20,
        'sales': [100, 150, 200, 120, 180] * 20,
        'region': ['North', 'South', 'East', 'West', 'Central'] * 20,
        'month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'] * 20
    }
    df = pd.DataFrame(data)
    temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.csv')
    df.to_csv(temp_file.name, index=False)
    print(f"✅ Test dataset created: {temp_file.name}")
    return temp_file.name

# Usage
test_file_path = create_test_dataset()
df = pd.read_csv(test_file_path)
print(df.head())

# Clean up when done
os.unlink(test_file_path)
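The usage example above deletes the file only if every preceding line succeeds. One way to guarantee cleanup is to wrap the call in a context manager built on try/finally. The sketch below is an assumption-laden illustration, not part of the original module: `temporary_dataset` is a hypothetical helper, and `create_test_dataset_stub` is a stand-in that writes the same columns with the standard `csv` module so the example runs without pandas. In real use the original `create_test_dataset` could be passed as `factory` instead.

```python
import contextlib
import csv
import os
import tempfile


def create_test_dataset_stub():
    """Hypothetical stand-in for create_test_dataset (stdlib csv, no pandas)."""
    rows = zip(
        ['A', 'B', 'C', 'D', 'E'] * 20,
        [100, 150, 200, 120, 180] * 20,
        ['North', 'South', 'East', 'West', 'Central'] * 20,
        ['Jan', 'Feb', 'Mar', 'Apr', 'May'] * 20,
    )
    # delete=False keeps the file on disk after the handle is closed.
    with tempfile.NamedTemporaryFile(
        mode='w', newline='', delete=False, suffix='.csv'
    ) as handle:
        writer = csv.writer(handle)
        writer.writerow(['product', 'sales', 'region', 'month'])
        writer.writerows(rows)
    return handle.name


@contextlib.contextmanager
def temporary_dataset(factory=create_test_dataset_stub):
    """Yield the dataset path and remove the file on exit, even on error."""
    path = factory()
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.unlink(path)


with temporary_dataset() as path:
    with open(path, newline='') as f:
        row_count = sum(1 for _ in csv.DictReader(f))
    existed_inside = os.path.exists(path)

existed_after = os.path.exists(path)
print(row_count, existed_inside, existed_after)
```

The file exists only inside the `with` block, so a failing assertion or parsing error in the test body no longer leaves a stray CSV behind.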
Best Practices
- Always clean up the temporary file after use with os.unlink() or os.remove() to prevent disk space accumulation
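- The function creates a file with delete=False, meaning it persists after the NamedTemporaryFile object is closed. Note that the function never closes the handle explicitly; it is released only when the object is garbage-collected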
- Consider wrapping usage in a try-finally block to ensure cleanup even if errors occur
- The dataset has a predictable pattern (repeating values 20 times) which may not be suitable for all testing scenarios
- For broader reuse across test suites, consider adding parameters to customize the dataset size, columns, or data patterns
- The function prints to stdout; consider adding a silent mode parameter for automated testing
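The last three points can be sketched together as a parameterised variant. Everything here is hypothetical and not part of the original module: the function name `create_random_test_dataset`, its `n_rows`, `seed`, and `verbose` parameters, and the choice of a 100-200 sales range are illustrative assumptions. The sketch uses the standard `csv` and `random` modules rather than pandas so it runs without third-party dependencies.

```python
import csv
import os
import random
import tempfile

PRODUCTS = ['A', 'B', 'C', 'D', 'E']
REGIONS = ['North', 'South', 'East', 'West', 'Central']
MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May']


def create_random_test_dataset(n_rows=100, seed=0, verbose=False):
    """Hypothetical variant: randomised rows, fixed seed, optional output."""
    rng = random.Random(seed)  # seeded so test runs are reproducible
    with tempfile.NamedTemporaryFile(
        mode='w', newline='', delete=False, suffix='.csv'
    ) as handle:
        writer = csv.writer(handle)
        writer.writerow(['product', 'sales', 'region', 'month'])
        for _ in range(n_rows):
            # Each column is drawn independently, unlike the original.
            writer.writerow([
                rng.choice(PRODUCTS),
                rng.randint(100, 200),
                rng.choice(REGIONS),
                rng.choice(MONTHS),
            ])
    if verbose:
        print(f"Test dataset created: {handle.name}")
    return handle.name


path = create_random_test_dataset(n_rows=50, seed=42)
try:
    with open(path, newline='') as f:
        rows = list(csv.DictReader(f))
finally:
    os.unlink(path)  # cleanup runs even if reading fails
```

With `verbose=False` as the default, automated test runs stay quiet, and the fixed seed keeps the randomised data identical between runs.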
Similar Components
AI-powered semantic similarity - components with related functionality:
- function create_sample_data_v1 (72.9% similar)
- function create_test_file (57.8% similar)
- function load_data (57.4% similar)
- function create_sample_data_v2 (57.0% similar)
- function test_european_csv (55.7% similar)